LLM Providers¶
SPADE_LLM supports multiple LLM providers through a unified interface, enabling seamless switching between different AI services.
Provider Architecture¶
SPADE-LLM uses LiteLLM as a unified backend, giving you access to 100+ LLM providers through a single interface.
```mermaid
graph TD
    A[LLMProvider] --> B[LiteLLM]
    B --> C[OpenAI]
    B --> D[Ollama]
    B --> E[Anthropic]
    B --> F[Azure OpenAI]
    B --> G[Any OpenAI-compatible API]
    C --> H[GPT-5-nano]
    D --> I[Llama / Mistral / Gemma]
    E --> J[Claude 3.5]
    G --> K[vLLM / LM Studio / etc.]
```
Supported Providers¶
The unified `LLMProvider` uses the LiteLLM model format (`provider/model-name`):

- **OpenAI** - GPT models (model name directly, e.g. `gpt-5-nano`)
- **Anthropic** - Claude models (prefix `anthropic/`, e.g. `anthropic/claude-sonnet-4-6`)
- **Ollama** - Local models (prefix `ollama/`, e.g. `ollama/gpt-oss-20b`)
- **OpenAI-compatible API** - For vLLM, LM Studio, or custom endpoints; use the `openai/` prefix with a custom `base_url`
OpenAI¶
Cloud-based LLM service with state-of-the-art models:
```python
from spade_llm.providers import LLMProvider

provider = LLMProvider(
    model="gpt-5-nano",
    api_key="your-api-key",
    temperature=0.7,
)
```
Ollama¶
Local deployment for privacy and control:
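A minimal sketch mirroring the OpenAI example above; the model name is illustrative, and the commented-out `base_url` assumes Ollama's default local port (11434):

```python
from spade_llm.providers import LLMProvider

provider = LLMProvider(
    model="ollama/gpt-oss-20b",  # the ollama/ prefix routes requests to the local Ollama server
    # base_url="http://localhost:11434",  # only needed if Ollama runs elsewhere
)
```

No API key is required for local Ollama models.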
OpenAI-Compatible APIs (LM Studio, vLLM, etc.)¶
For any OpenAI-compatible API server, use the `openai/` prefix with a custom `base_url`:
```python
from spade_llm.providers import LLMProvider

# LM Studio
provider = LLMProvider(
    model="openai/local-model",
    base_url="http://localhost:1234/v1",
)

# vLLM
provider = LLMProvider(
    model="openai/gpt-oss-120b",
    base_url="http://localhost:8000/v1",
)
```
Start vLLM server:
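A typical invocation, assuming a recent vLLM release with the `vllm serve` CLI; the model name matches the example above but any model your hardware supports works:

```shell
# Launch vLLM's OpenAI-compatible API server on port 8000
vllm serve openai/gpt-oss-120b --port 8000
```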
Embeddings¶
LLMProvider supports generating embeddings for RAG (Retrieval-Augmented Generation) systems.
What are Embeddings?¶
Embeddings are dense vector representations of text that capture semantic meaning in high-dimensional space. Similar texts have similar embeddings, enabling:
- Semantic search: Find documents by meaning, not just keywords
- Clustering: Group similar content together
- Recommendations: Suggest related documents
- RAG systems: Retrieve relevant context for LLM queries
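"Similar texts have similar embeddings" is usually made precise with cosine similarity. A self-contained sketch using toy 3-dimensional vectors (real embedding models return hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": doc_a and doc_b point in similar directions, doc_c does not
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]
doc_c = [0.0, 0.1, 0.9]

print(cosine_similarity(doc_a, doc_b))  # close to 1: semantically similar
print(cosine_similarity(doc_a, doc_c))  # close to 0: unrelated
```

Semantic search is then just "return the documents whose embeddings have the highest cosine similarity to the query embedding".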
OpenAI Embeddings¶
```python
from spade_llm.providers import LLMProvider

provider = LLMProvider(
    model="text-embedding-3-small",
    api_key="your-api-key",
)

# Generate embeddings
embeddings = await provider.get_embeddings([
    "First document text",
    "Second document text",
])

# Each embedding is a list of floats
print(f"Dimensions: {len(embeddings[0])}")  # e.g., 1536
```
Ollama Embeddings¶
```python
from spade_llm.providers import LLMProvider

provider = LLMProvider(
    model="ollama/nomic-embed-text",
)

embeddings = await provider.get_embeddings([
    "Document content here",
    "Another document",
])
```
Usage in RAG Systems¶
```python
from spade_llm.rag import Chroma, VectorStoreRetriever
from spade_llm.providers import LLMProvider

# Setup embedding provider
embedding_provider = LLMProvider(
    model="ollama/nomic-embed-text",
)

# Use in vector store
vector_store = Chroma(
    collection_name="knowledge_base",
    embedding_fn=embedding_provider.get_embeddings,
)
await vector_store.initialize()
```
**Important:** Use the same embedding model for both indexing documents and querying; vectors produced by different models live in different embedding spaces and are not comparable.
Best Practices¶
- Test multiple providers during development to find the best fit
- Implement fallback systems for critical applications
- Use environment variables for easy configuration management
- Monitor provider performance and costs in production
- Choose models based on your specific use case requirements
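The environment-variable practice above can be sketched as follows; the variable names are illustrative, not an official SPADE_LLM convention:

```python
import os

# Read provider settings from the environment, with a local default model
from_env = {
    "model": os.environ.get("SPADE_LLM_MODEL", "ollama/gpt-oss-20b"),
    "api_key": os.environ.get("SPADE_LLM_API_KEY"),   # unset is fine for local providers
    "base_url": os.environ.get("SPADE_LLM_BASE_URL"), # unset means the provider default
}

# Drop unset optional values so provider defaults apply
config = {k: v for k, v in from_env.items() if v is not None}
print(config)
```

Switching from a local Ollama model to a cloud provider then requires only changing environment variables, not code.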
Next Steps¶
- Tools System - Add tool capabilities to your providers
- RAG System - Use embeddings for retrieval-augmented generation
- Architecture - Understanding the provider layer
- Routing - Route responses based on provider capabilities