LLM Providers

SPADE_LLM supports multiple LLM providers through a unified interface, enabling seamless switching between different AI services.

Provider Architecture

SPADE_LLM uses LiteLLM as a unified backend, giving you access to 100+ LLM providers through a single interface.

graph TD
    A[LLMProvider] --> B[LiteLLM]
    B --> C[OpenAI]
    B --> D[Ollama]
    B --> E[Anthropic]
    B --> F[Azure OpenAI]
    B --> G[Any OpenAI-compatible API]

    C --> H[GPT-5-nano]
    D --> I[Llama / Mistral / Gemma]
    E --> J[Claude 3.5]
    G --> K[vLLM / LM Studio / etc.]

Supported Providers

The unified LLMProvider uses the LiteLLM model format (provider/model-name):

  • OpenAI - GPT models (model name directly, e.g. gpt-5-nano)
  • Anthropic - Claude models (prefix anthropic/, e.g. anthropic/claude-sonnet-4-6)
  • Ollama - Local models (prefix ollama/, e.g. ollama/gpt-oss-20b)
  • OpenAI-compatible API - For vLLM, LM Studio, or custom endpoints, use the openai/ prefix together with a custom base_url.
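
The naming convention above can be sketched as a small helper. This is purely illustrative (LiteLLM does its own model routing internally); it just demonstrates how the provider/model-name format is read:

```python
def split_model_id(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model id into (provider, model name).

    Ids without a provider prefix are treated as OpenAI models,
    matching the convention above.
    """
    if "/" in model:
        provider, name = model.split("/", 1)
        return provider, name
    return "openai", model

split_model_id("gpt-5-nano")          # -> ("openai", "gpt-5-nano")
split_model_id("ollama/llama3.1:8b")  # -> ("ollama", "llama3.1:8b")
```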

OpenAI

Cloud-based LLM service with state-of-the-art models:

from spade_llm.providers import LLMProvider

provider = LLMProvider(
    model="gpt-5-nano",
    api_key="your-api-key",
    temperature=0.7,
)

Ollama

Local deployment for privacy and control:

provider = LLMProvider(
    model="ollama/llama3.1:8b",
)

OpenAI-Compatible APIs (LM Studio, vLLM, etc.)

For any OpenAI-compatible API server, use the openai/ prefix with a custom base_url:

# LM Studio
provider = LLMProvider(
    model="openai/local-model",
    base_url="http://localhost:1234/v1",
)

# vLLM
provider = LLMProvider(
    model="openai/gpt-oss-120b",
    base_url="http://localhost:8000/v1",
)

Start vLLM server:

python -m vllm.entrypoints.openai.api_server \
    --model openai/gpt-oss-120b \
    --port 8000

Embeddings

LLMProvider supports generating embeddings for RAG (Retrieval-Augmented Generation) systems.

What are Embeddings?

Embeddings are dense vector representations of text that capture semantic meaning in high-dimensional space. Similar texts have similar embeddings, enabling:

  • Semantic search: Find documents by meaning, not just keywords
  • Clustering: Group similar content together
  • Recommendations: Suggest related documents
  • RAG systems: Retrieve relevant context for LLM queries
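
The "similar texts have similar embeddings" idea boils down to comparing vectors, most commonly with cosine similarity. A minimal, dependency-free sketch (the 3-dimensional vectors here are toy values, not real embeddings, which have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means
    semantically similar, close to 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": the first two documents are about related topics.
doc_cats = [0.9, 0.1, 0.0]
doc_felines = [0.8, 0.2, 0.1]
doc_taxes = [0.0, 0.1, 0.9]

cosine_similarity(doc_cats, doc_felines)  # high: related topics
cosine_similarity(doc_cats, doc_taxes)    # low: unrelated topics
```

Semantic search is then just "embed the query, rank documents by similarity", which is what a vector store does at scale.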

OpenAI Embeddings

from spade_llm.providers import LLMProvider

provider = LLMProvider(
    model="text-embedding-3-small",
    api_key="your-api-key",
)

# Generate embeddings
embeddings = await provider.get_embeddings([
    "First document text",
    "Second document text"
])

# Each embedding is a list of floats
print(f"Dimensions: {len(embeddings[0])}")  # e.g., 1536

Ollama Embeddings

provider = LLMProvider(
    model="ollama/nomic-embed-text",
)

embeddings = await provider.get_embeddings([
    "Document content here",
    "Another document"
])

Usage in RAG Systems

from spade_llm.rag import Chroma, VectorStoreRetriever
from spade_llm.providers import LLMProvider

# Setup embedding provider
embedding_provider = LLMProvider(
    model="ollama/nomic-embed-text",
)

# Use in vector store
vector_store = Chroma(
    collection_name="knowledge_base",
    embedding_fn=embedding_provider.get_embeddings
)

await vector_store.initialize()

Important: Use the same embedding model for both indexing documents and querying.

Best Practices

  • Test multiple providers during development to find the best fit
  • Implement fallback systems for critical applications
  • Use environment variables for easy configuration management
  • Monitor provider performance and costs in production
  • Choose models based on your specific use case requirements
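
The environment-variable and fallback suggestions can be combined in a small sketch. The variable names and the call_with_fallback helper are illustrative assumptions, not part of SPADE_LLM's API:

```python
import os

# Read model ids from the environment with sensible defaults.
# LLM_MODEL and LLM_FALLBACK_MODEL are hypothetical variable names,
# not ones SPADE_LLM defines.
PRIMARY_MODEL = os.environ.get("LLM_MODEL", "gpt-5-nano")
FALLBACK_MODEL = os.environ.get("LLM_FALLBACK_MODEL", "ollama/llama3.1:8b")

def call_with_fallback(primary, fallback, prompt: str) -> str:
    """Try the primary provider callable; on any error, retry with
    the fallback. A real system would log the failure and possibly
    distinguish retryable errors from fatal ones."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)
```

In practice, primary and fallback would wrap two LLMProvider instances built from PRIMARY_MODEL and FALLBACK_MODEL; the point is that switching providers is a configuration change, not a code change.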

Next Steps

  • Tools System - Add tool capabilities to your providers
  • RAG System - Use embeddings for retrieval-augmented generation
  • Architecture - Understanding the provider layer
  • Routing - Route responses based on provider capabilities