RAG API Reference¶

Complete API documentation for SPADE-LLM's RAG (Retrieval-Augmented Generation) components.

Document Class¶

`spade_llm.rag.Document`¶

Core data structure for representing documents in the RAG system.

from spade_llm.rag import Document

doc = Document(
    content="Document text content",
    metadata={"source": "file.txt", "author": "user"}
)

Attributes¶

content (str): The text content of the document
metadata (dict, optional): Dictionary containing metadata about the document. Default: {}
id (str, optional): Unique identifier for the document. Auto-generated UUID if not provided

Methods¶

to_dict(): Convert the Document to a dictionary with keys "id", "content", "metadata"
from_dict(data: dict) (classmethod): Create a Document instance from a dictionary

Properties¶

text: Alias for the content attribute

Example¶

doc = Document(
    content="SPADE-LLM enables building multi-agent LLM systems.",
    metadata={
        "source": "documentation.md",
        "section": "introduction",
        "timestamp": "2024-01-15",
        "version": "1.0"
    }
)

print(doc.content)  # Access content
print(doc.text)  # Same as content
print(doc.metadata["source"])
print(doc.id)

doc_dict = doc.to_dict()
new_doc = Document.from_dict(doc_dict)
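To make the documented behaviour concrete, here is a minimal stand-in class that mirrors the attributes, property and methods listed above. It is a sketch that assumes `id` defaults to a random UUID4 string and that `to_dict()`/`from_dict()` round-trip all three fields. `SketchDocument` is a hypothetical name, not part of SPADE-LLM.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    """Hypothetical stand-in mirroring the documented Document fields."""

    content: str
    metadata: dict = field(default_factory=dict)
    # Assumption: a random UUID4 string when no id is supplied
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def text(self) -> str:
        # Alias for content, as described under Properties
        return self.content

    def to_dict(self) -> dict:
        # Copy metadata so the dict can be mutated without touching the document
        return {"id": self.id, "content": self.content, "metadata": dict(self.metadata)}

    @classmethod
    def from_dict(cls, data: dict) -> "SketchDocument":
        # Reuse the stored id when present; otherwise generate a new one
        return cls(
            content=data["content"],
            metadata=dict(data.get("metadata", {})),
            id=data.get("id") or str(uuid.uuid4()),
        )


doc = SketchDocument(
    content="SPADE-LLM enables building multi-agent LLM systems.",
    metadata={"source": "documentation.md"},
)
restored = SketchDocument.from_dict(doc.to_dict())
print(restored == doc)  # True: dataclass equality compares content, metadata and id
```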

Document Loaders¶

`spade_llm.rag.BaseDocumentLoader`¶

Abstract base class for document loaders.

class BaseDocumentLoader:
    async def load_stream(self) -> AsyncGenerator[Document, None]:
        """Stream documents from the source as an async generator."""
        raise NotImplementedError

    async def load(self) -> list[Document]:
        """Load all documents from the source into a list."""
        return [doc async for doc in self.load_stream()]
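Because `load()` is built on top of `load_stream()`, a subclass only has to implement streaming. The sketch below uses a hypothetical `InMemoryLoader` and `Doc` type (neither is part of SPADE-LLM) and does not inherit from `BaseDocumentLoader`, so that it runs without the library installed; a real loader would subclass it and yield `spade_llm.rag.Document` objects.

```python
import asyncio
from collections.abc import AsyncGenerator
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical minimal document type so the sketch runs standalone."""

    content: str
    metadata: dict = field(default_factory=dict)


class InMemoryLoader:
    """Hypothetical loader following the load_stream()/load() contract above."""

    def __init__(self, sources: dict[str, str]):
        self.sources = sources  # maps a source name to its text

    async def load_stream(self) -> AsyncGenerator[Doc, None]:
        for name, text in self.sources.items():
            await asyncio.sleep(0)  # give other tasks a turn, as real I/O would
            yield Doc(content=text, metadata={"source": name})

    async def load(self) -> list[Doc]:
        # Same body as the base class: drain the stream into a list
        return [doc async for doc in self.load_stream()]


async def demo() -> list[str]:
    loader = InMemoryLoader({"a.txt": "alpha", "b.txt": "beta"})
    streamed = []
    async for doc in loader.load_stream():  # one document at a time
        streamed.append(doc.metadata["source"])
    return streamed


print(asyncio.run(demo()))  # ['a.txt', 'b.txt']
```

Streaming is useful for large sources, where holding every document in memory at once would be wasteful; `load()` remains available when a list is more convenient.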

`spade_llm.rag.TextLoader`¶

Load documents from a single text file.

from spade_llm.rag import TextLoader

loader = TextLoader(file_path="README.md", encoding="utf-8")
documents = await loader.load()

Parameters¶

file_path (str): Path to the text file to load
encoding (str, optional): File encoding. Default: "utf-8"

Returns¶

list[Document]: List containing a single Document with file content

Example¶

loader = TextLoader(file_path="./data/article.txt")
documents = await loader.load()

print(f"Loaded {len(documents)} document")
print(f"Content length: {len(documents[0].content)} characters")
print(f"Source: {documents[0].metadata['source']}")

`spade_llm.rag.DirectoryLoader`¶

Load multiple documents from a directory.

from spade_llm.rag import DirectoryLoader

loader = DirectoryLoader(
    path="./docs",
    glob_pattern="**/*.md"
)
documents = await loader.load()

Parameters¶

path (str): Directory path to load documents from
glob_pattern (str, optional): Glob pattern to match files. Default: "**/*"
recursive (bool, optional): If True, search subdirectories. Overridden by glob_pattern. Default: True
suffixes (list[str], optional): List of file suffixes to include (e.g., ['.txt', '.md']). If None, uses all extensions from loader_map
encoding (str, optional): File encoding. Default: "utf-8"
metadata (dict, optional): Base metadata to attach to all loaded documents
loader_map (dict, optional): A map of file extensions to loader classes

Returns¶

list[Document]: List of Documents, one per matched file

Glob Patterns¶

# All text files recursively
"**/*.txt"

# All markdown files in current directory only
"*.md"

# Multiple file types
"**/*.{md,txt,rst}"

# Specific subdirectory
"docs/**/*.md"

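The patterns above use standard glob syntax. The sketch below checks them with Python's `pathlib` on a throwaway directory tree; it assumes `DirectoryLoader` matches patterns the way `pathlib` does, which this page does not state. Note that `pathlib` (like the `glob` and `fnmatch` modules) does not expand `{md,txt,rst}` braces, so if the loader relies on it, filtering on the file suffix, which is what the `suffixes` parameter describes, is the portable way to select several file types.

```python
import tempfile
from pathlib import Path


def matched(root: Path, pattern: str) -> list[str]:
    """Return files under root matching pattern, as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "guide").mkdir(parents=True)
    for name in ["README.md", "notes.txt", "docs/intro.md", "docs/api.rst", "docs/guide/setup.md"]:
        (root / name).write_text("example", encoding="utf-8")

    recursive_md = matched(root, "**/*.md")       # every .md file, at any depth
    top_level_md = matched(root, "*.md")          # .md files directly under root
    docs_md = matched(root, "docs/**/*.md")       # .md files anywhere under docs/
    braces = matched(root, "**/*.{md,txt,rst}")   # braces are literal to pathlib
    # Portable alternative: walk everything and filter on the file suffix
    by_suffix = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in {".md", ".txt", ".rst"}
    )

print(recursive_md)  # ['README.md', 'docs/guide/setup.md', 'docs/intro.md']
print(braces)        # []
```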
Example¶

loader = DirectoryLoader(
    path="./documentation",
    glob_pattern="**/*.{md,rst,txt}"
)

documents = await loader.load()
print(f"Loaded {len(documents)} documents")

# Access metadata
for doc in documents[:3]:
    source = doc.metadata.get("source", "unknown")
    print(f"- {source}: {len(doc.content)} chars")

Text Splitters¶

`spade_llm.rag.TextSplitter`¶

Abstract base class for text splitters.

class TextSplitter:
    def split_documents(self, documents: list[Document]) -> list[Document]:
        """Split documents into chunks."""
        raise NotImplementedError

`spade_llm.rag.CharacterTextSplitter`¶

Split documents into chunks by character count, using a single separator string.

from spade_llm.rag import CharacterTextSplitter

splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separator="\n\n"
)
chunks = splitter.split_documents(documents)

Parameters¶

chunk_size (int): Maximum size of each chunk in characters. Default: 2000
chunk_overlap (int): Number of overlapping characters between chunks. Default: 200
separator (str, optional): String to split on. Default: "\n\n"
is_separator_regex (bool, optional): Whether separator is a regex pattern. Default: False
keep_separator (bool | Literal["start", "end"], optional): Whether to keep the separator and where to place it. Can be False, True (equivalent to "start"), "start", or "end". Default: False
length_function (callable, optional): Function to measure chunk length. Default: len
add_start_index (bool, optional): If True, records each chunk's start index in its metadata. Default: True
strip_whitespace (bool, optional): If True, strips whitespace from chunks. Default: True

Returns¶

list[Document]: List of Document chunks with preserved metadata

Example¶

splitter = CharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separator="\n\n"
)

documents = [
    Document(
        content="Long document text...",
        metadata={"source": "doc.txt"}
    )
]

chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")

# Metadata is preserved in each chunk
for chunk in chunks:
    print(chunk.metadata["source"])  # Same as original
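A simplified sketch of the chunking idea: split on the separator, then greedily pack the pieces back together so each chunk stays within `chunk_size`, carrying up to `chunk_overlap` characters of trailing pieces into the next chunk. `split_with_overlap` is a hypothetical helper modeled on the parameters above, not SPADE-LLM's code; it ignores `keep_separator`, regex separators, whitespace stripping and start indices, and keeps a piece longer than `chunk_size` as one oversized chunk.

```python
def split_with_overlap(text: str, separator: str = "\n\n",
                       chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text on separator and pack the pieces into overlapping chunks."""
    pieces = [p for p in text.split(separator) if p]
    chunks: list[str] = []
    current: list[str] = []

    def size(parts: list[str]) -> int:
        # Length of the parts once re-joined with the separator
        return len(separator.join(parts))

    for piece in pieces:
        if current and size(current + [piece]) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until what remains fits the overlap budget
            # and leaves room for the next piece
            while current and (size(current) > chunk_overlap
                               or size(current + [piece]) > chunk_size):
                current.pop(0)
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


text = "aaaa\n\nbbbb\n\ncccc\n\ndddd"
print(split_with_overlap(text, chunk_size=10, chunk_overlap=4))
# ['aaaa\n\nbbbb', 'bbbb\n\ncccc', 'cccc\n\ndddd']
```

Each chunk after the first starts with the last piece of the previous one; that repetition is what `chunk_overlap` buys, so context straddling a chunk boundary appears in both chunks.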

`spade_llm.rag.RecursiveCharacterTextSplitter`¶

Split documents recursively, trying an ordered list of separators from coarsest to finest.

from spade_llm.rag import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = splitter.split_documents(documents)

Parameters¶

chunk_size (int): Maximum size of each chunk in characters. Default: 2000
chunk_overlap (int): Number of overlapping characters between chunks. Default: 200
separators (list[str], optional): Ordered list of separators to try. Default: ["\n\n", "\n", " ", ""]
keep_separator (bool | Literal["start", "end"], optional): Whether and where to keep the separator. Default: True
is_separator_regex (bool, optional): Whether separators are regex patterns. Default: False
length_function (callable, optional): Function to measure chunk length. Default: len
add_start_index (bool, optional): If True, records each chunk's start index in its metadata. Default: True
strip_whitespace (bool, optional): If True, strips whitespace from chunks. Default: True

Returns¶

list[Document]: List of Document chunks

How It Works¶

Try splitting on the first separator (e.g., "\n\n" for paragraphs)
If any piece is still larger than chunk_size, split that piece on the next separator (e.g., "\n" for lines)
Continue down the separator list until every chunk fits within chunk_size
Preserves document structure better than character splitting
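The steps above can be sketched as a short recursive function. `recursive_split` is hypothetical and deliberately simplified: it assumes a non-empty separator list that ends with `""` (meaning "split into single characters"), merges small neighbouring pieces with the separator that produced them, and omits chunk overlap.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text into pieces of at most chunk_size, preferring earlier separators."""
    # Use the first separator that actually occurs in the text ("" always applies)
    index = len(separators) - 1
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            index = i
            break
    sep, finer = separators[index], separators[index + 1:]
    parts = list(text) if sep == "" else text.split(sep)

    chunks: list[str] = []
    buffer: list[str] = []  # small parts waiting to be merged
    for part in parts:
        if not part:
            continue
        if len(part) <= chunk_size:
            # Start a new chunk if adding this part would overflow the current one
            if buffer and len(sep.join(buffer + [part])) > chunk_size:
                chunks.append(sep.join(buffer))
                buffer = []
            buffer.append(part)
        else:
            if buffer:
                chunks.append(sep.join(buffer))
                buffer = []
            # Too large: recurse with the next, finer separator if one is left
            chunks.extend(recursive_split(part, finer, chunk_size) if finer else [part])
    if buffer:
        chunks.append(sep.join(buffer))
    return chunks


text = "Para one is here.\n\nSecond paragraph is a bit longer than the limit."
for chunk in recursive_split(text, ["\n\n", "\n", " ", ""], chunk_size=20):
    print(repr(chunk))
# 'Para one is here.', 'Second paragraph is', 'a bit longer than', 'the limit.'
```

The short first paragraph survives intact, and only the oversized second paragraph falls through to word-level splitting.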

Example¶

# Recommended for most use cases
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

documents = [
    Document(content="""
# Introduction

This is a paragraph about RAG systems.
They combine retrieval with generation.

## Benefits

RAG provides several advantages:
- Better accuracy
- Up-to-date information
- Reduced hallucinations
    """)
]

chunks = splitter.split_documents(documents)

# Each chunk respects document structure
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:")
    print(chunk.content[:100])
    print()

Vector Stores¶

`spade_llm.rag.VectorStore`¶

Abstract base class for vector stores.

class VectorStore:
    async def initialize(self): ...
    async def add_documents(self, documents: list[Document]): ...
    async def similarity_search(self, query: str, k: int = 4): ...
    async def delete_collection(self): ...
    async def cleanup(self): ...

`spade_llm.rag.Chroma`¶

ChromaDB vector store implementation.

from spade_llm.rag import Chroma
from spade_llm.providers import LLMProvider

provider = LLMProvider(model="ollama/nomic-embed-text")

vector_store = Chroma(
    collection_name="my_collection",
    embedding_fn=provider.get_embeddings,
    persist_directory="./vector_db"
)

await vector_store.initialize()

Parameters¶

collection_name (str): Name of the ChromaDB collection. Default: "documents"
persist_directory (str, optional): Directory to persist the database (for persistent client). If None, uses in-memory storage
host (str, optional): Host for ChromaDB server (for HTTP client)
port (int, optional): Port for ChromaDB server (for HTTP client). Default: 8000
ssl (bool, optional): Whether to use SSL for HTTP client. Default: False
headers (dict, optional): Optional HTTP headers for HTTP client
tenant (str, optional): Tenant ID. Default: 'default_tenant'
database (str, optional): Database name. Default: 'default_database'
embedding_fn (callable, optional): Async function that generates embeddings. Signature: async def(texts: list[str]) -> list[list[float]]
collection_metadata (dict, optional): Optional metadata for the collection
collection_configuration (CreateCollectionConfiguration, optional): Configuration for the collection. Use chromadb.api.collection_configuration.CreateCollectionConfiguration to define index properties (e.g., distance function). Example: CreateCollectionConfiguration(hnsw={"space": "cosine"})
client_settings (Settings, optional): Optional Chroma client settings (chromadb.config.Settings)
client (optional): Optional pre-configured Chroma client
relevance_score_fn (callable, optional): Function to convert distance to relevance score. If None, auto-selected based on collection's distance function
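`relevance_score_fn` maps a raw distance to a relevance score. As an illustration, for a collection created with `CreateCollectionConfiguration(hnsw={"space": "cosine"})`, Chroma's distance is 1 minus the cosine similarity, so `1 - distance` recovers the similarity. The functions below are hypothetical examples of such a callable, not the ones SPADE-LLM auto-selects.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as used by Chroma's "cosine" space: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def cosine_relevance(distance: float) -> float:
    """Hypothetical relevance_score_fn: turn a cosine distance back into a similarity."""
    return 1.0 - distance


for other in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    d = cosine_distance([1.0, 0.0], other)
    print(f"distance={d:.1f} relevance={cosine_relevance(d):.1f}")
# distance=0.0 relevance=1.0
# distance=1.0 relevance=0.0
# distance=2.0 relevance=-1.0
```

A function like `cosine_relevance` would be passed as `relevance_score_fn=cosine_relevance` when constructing the store; leaving the parameter as None keeps the automatic selection described above.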

Methods¶

`initialize()`¶

Initialize the vector store and collection.

await vector_store.initialize()

Returns: None

`add_documents(documents: list[Document])`¶

Add documents to the vector store.

documents = [
    Document(content="Text 1", id="doc1"),
    Document(content="Text 2", id="doc2")
]

await vector_store.add_documents(documents)

Parameters:

documents (list[Document]): Documents to add

Returns: None

`similarity_search(query: str, k: int = 4, where: dict = None)`¶

Search for similar documents.

results = await vector_store.similarity_search(
    query="What is RAG?",
    k=5,
    where={"category": "documentation"}
)

Parameters:

query (str): Search query text
k (int): Number of results to return. Default: 4
where (dict, optional): Metadata filters

Returns: list[Document]

`similarity_search_with_score(query: str, k: int = 4, where: dict = None)`¶

Search for similar documents and return each match together with its score.

results = await vector_store.similarity_search_with_score(
    query="embeddings",
    k=3
)

for doc, score in results:
    print(f"Score: {score:.3f}")
    print(f"Content: {doc.content[:100]}")

Parameters:

query (str): Search query text
k (int): Number of results to return. Default: 4
where (dict, optional): Metadata filters

Returns: list[tuple[Document, float]] - List of (document, score) tuples

`max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filters: dict = None, **kwargs)`¶

Return documents selected using maximal marginal relevance.

results = await vector_store.max_marginal_relevance_search(
    query="RAG systems",
    k=5,
    fetch_k=20,
    lambda_mult=0.5
)

Parameters:

query (str): Search query text
k (int): Number of documents to return. Default: 4
fetch_k (int): Number of documents to fetch for MMR algorithm. Default: 20
lambda_mult (float): Balance between diversity (0) and similarity (1). Default: 0.5
filters (dict, optional): Filter by metadata
**kwargs: Additional keyword arguments

Returns: list[Document]
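MMR picks documents one at a time, scoring each remaining candidate as `lambda_mult * sim(candidate, query) - (1 - lambda_mult) * max sim(candidate, already_selected)`. The sketch below implements that selection step over plain vectors with cosine similarity. It assumes the `fetch_k` candidates have already been retrieved and embedded, and `mmr_select` is a hypothetical helper rather than the store's internal code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query: list[float], candidates: list[list[float]],
               k: int = 4, lambda_mult: float = 0.5) -> list[int]:
    """Return the indices of k candidates chosen by maximal marginal relevance."""
    relevance = [cosine(query, c) for c in candidates]
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def mmr_score(i: int) -> float:
        # Penalise similarity to the closest document already selected
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]  # two duplicates and one distinct vector
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # [0, 2]
```

With `lambda_mult=1.0` the redundancy term vanishes and both identical vectors are returned; at 0.3 the penalty for repeating the first pick outweighs the duplicate's higher relevance, so the distinct vector is chosen instead.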

`get_by_ids(ids: list[str])`¶

Get documents by their IDs.

docs = await vector_store.get_by_ids(["id1", "id2", "id3"])

Parameters:

ids (list[str]): List of document IDs to retrieve

Returns: list[Document]

`get(ids: list[str] = None, where: dict = None, limit: int = None, offset: int = None, where_document: dict = None, include: list[str] = None)`¶

Get documents by ID, by metadata filter, or by document content, with optional paging through limit and offset.

result = await vector_store.get(
    where={"source": "manual.pdf"},
    limit=10
)

ids = result.get("ids", [])
metadatas = result.get("metadatas", [])
documents = result.get("documents", [])

Parameters:

ids (list[str], optional): Document IDs to retrieve
where (dict, optional): Filter results by metadata. E.g. {"color": "red", "price": {"$lt": 5.0}}
limit (int, optional): Number of documents to return
offset (int, optional): Offset to start returning results from. Useful for paging results with limit
where_document (dict, optional): Filter by document content. E.g. {"$contains": "hello"}
include (list[str], optional): What to include in the results. Can contain "embeddings", "metadatas", "documents". IDs are always included. Defaults to ["metadatas", "documents"]

Returns: dict with keys "ids", "embeddings" (if requested), "metadatas", "documents"
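The `where` argument uses Chroma's operator syntax. To illustrate what a filter such as `{"color": "red", "price": {"$lt": 5.0}}` selects, here is a hypothetical pure-Python evaluator covering the comparison operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`) and the `$and`/`$or` combinators. It treats several top-level keys as an implicit AND; depending on the Chroma version, the server may instead require those conditions to be wrapped in an explicit `$and`. Its handling of missing keys is a simplification.

```python
from typing import Any

# Comparison operators; "value" comes from the metadata, "target" from the filter
OPERATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value is not None and value > target,
    "$gte": lambda value, target: value is not None and value >= target,
    "$lt": lambda value, target: value is not None and value < target,
    "$lte": lambda value, target: value is not None and value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if metadata satisfies every condition in where."""
    for key, condition in where.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif isinstance(condition, dict):
            value = metadata.get(key)
            ok = all(OPERATORS[op](value, target) for op, target in condition.items())
        else:
            ok = metadata.get(key) == condition  # a bare value means equality
        if not ok:
            return False
    return True


catalog = [
    {"color": "red", "price": 4.2},
    {"color": "red", "price": 7.0},
    {"color": "blue", "price": 1.0},
]
print([m for m in catalog if matches(m, {"color": "red", "price": {"$lt": 5.0}})])
# [{'color': 'red', 'price': 4.2}]
```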

`update_document(document_id: str, document: Document)`¶

Update a single document in the collection.

updated_doc = Document(
    content="Updated content",
    metadata={"version": "2.0"}
)
await vector_store.update_document("doc_id", updated_doc)

Parameters:

document_id (str): ID of the document to update
document (Document): New document content

Returns: None

`update_documents(ids: list[str], documents: list[Document])`¶

Update multiple documents in the collection.

await vector_store.update_documents(
    ["id1", "id2"],
    [doc1, doc2]
)

Parameters:

ids (list[str]): List of document IDs to update
documents (list[Document]): List of new document contents

Returns: None

`delete(ids: list[str])`¶

Delete documents by ID.

# Get IDs to delete
result = await vector_store.get(where={"version": "old"})
ids = result.get("ids", [])

# Delete them
success = await vector_store.delete(ids)
print(f"Deleted: {success}")

Parameters:

ids (list[str]): Document IDs to delete

Returns: bool - True if successful

`delete_collection()`¶

Delete the entire collection.

await vector_store.delete_collection()

Returns: None

`reset_collection()`¶

Reset the collection by deleting and recreating it. Useful for testing or when you want to completely clear and reinitialize a collection.

await vector_store.reset_collection()

Returns: None

`from_documents(documents: list[Document], embedding_fn: callable = None, collection_name: str = "documents", **kwargs)` (classmethod)¶

Create a Chroma vector store from a list of documents.

store = await Chroma.from_documents(
    documents=docs,
    embedding_fn=provider.get_embeddings,
    collection_name="my_docs",
    persist_directory="./db"
)

Parameters:

documents (list[Document]): List of Document objects to add
embedding_fn (callable, optional): Async function that takes a list of texts and returns embeddings
collection_name (str): Name of the collection to create. Default: "documents"
**kwargs: Additional arguments to pass to Chroma constructor

Returns: Initialized Chroma vector store with documents added

`from_texts(texts: list[str], embedding_fn: callable = None, metadatas: list[dict] = None, ids: list[str] = None, collection_name: str = "documents", **kwargs)` (classmethod)¶

Create a Chroma vector store from a list of texts.

store = await Chroma.from_texts(
    texts=["Text 1", "Text 2"],
    embedding_fn=provider.get_embeddings,
    metadatas=[{"source": "a"}, {"source": "b"}]
)

Parameters:

texts (list[str]): List of text strings to add
embedding_fn (callable, optional): Async function that takes a list of texts and returns embeddings
metadatas (list[dict], optional): Optional list of metadata dicts for each text
ids (list[str], optional): Optional list of IDs for each text
collection_name (str): Name of the collection to create. Default: "documents"
**kwargs: Additional arguments to pass to Chroma constructor

Returns: Initialized Chroma vector store with texts added

`get_document_count()`¶

Get the total number of documents in the collection.

count = await vector_store.get_document_count()
print(f"Total documents: {count}")

Returns: int

`cleanup()`¶

Clean up resources.

await vector_store.cleanup()

Returns: None

Complete Example¶

from spade_llm.rag import Chroma, Document
from spade_llm.providers import LLMProvider

async def main():
    # Setup
    provider = LLMProvider(model="ollama/nomic-embed-text")

    vector_store = Chroma(
        collection_name="docs",
        embedding_fn=provider.get_embeddings,
        persist_directory="./my_db"
    )

    await vector_store.initialize()

    # Add documents
    doc1 = Document(
        content="RAG combines retrieval and generation",
        metadata={"topic": "rag"}
    )
    doc2 = Document(
        content="Vector stores enable semantic search",
        metadata={"topic": "vectors"}
    )
    docs = [doc1, doc2]

    await vector_store.add_documents(docs)

    # Search
    results = await vector_store.similarity_search("What is RAG?", k=2)
    for doc in results:
        print(doc.content)

    # Search with scores
    scored_results = await vector_store.similarity_search_with_score(
        "semantic search", k=1
    )
    for doc, score in scored_results:
        print(f"Score: {score:.3f} - {doc.content}")

    # Get by metadata
    rag_docs = await vector_store.get(where={"topic": "rag"})
    print(f"Found {len(rag_docs['ids'])} RAG documents")

    # Delete specific documents by ID
    await vector_store.delete([doc1.id])

    # Check count
    count = await vector_store.get_document_count()
    print(f"Remaining: {count} documents")

    # Cleanup
    await vector_store.cleanup()


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())

Retrievers¶

`spade_llm.rag.BaseRetriever`¶

Abstract base class for retrievers.

class BaseRetriever:
    async def retrieve(
        self,
        query: str,
        k: int = 4,
        **kwargs
    ) -> list[Document]:
        """Retrieve relevant documents for query."""
        raise NotImplementedError
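Any class that implements `retrieve()` with this signature can stand in for a vector-store-backed retriever. The sketch below is a hypothetical keyword-overlap retriever that needs no embeddings; to stay runnable without SPADE-LLM it defines its own `Doc` type and does not subclass `BaseRetriever`, which real code would do.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical minimal document type so the sketch runs standalone."""

    content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Hypothetical retriever ranking documents by shared query words."""

    def __init__(self, documents: list[Doc]):
        self.documents = documents

    async def retrieve(self, query: str, k: int = 4, **kwargs) -> list[Doc]:
        terms = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(terms & set(doc.content.lower().split()))
            if overlap:  # skip documents sharing no words with the query
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep their original order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:k]]


docs = [
    Doc("agents exchange messages"),
    Doc("vector stores enable semantic search"),
    Doc("agents use vector search"),
]
results = asyncio.run(KeywordRetriever(docs).retrieve("vector search agents", k=2))
print([d.content for d in results])
# ['agents use vector search', 'vector stores enable semantic search']
```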

`spade_llm.rag.VectorStoreRetriever`¶

Retriever that uses a vector store for document retrieval.

from spade_llm.rag import VectorStoreRetriever

retriever = VectorStoreRetriever(
    vector_store=vector_store
)

results = await retriever.retrieve(
    query="How do I configure agents?",
    k=5,
    search_type="similarity"
)

Parameters¶

vector_store (VectorStore): Initialized vector store instance

Methods¶

`retrieve_similarity(query: str, k: int = 4, sim_threshold: float = float("inf"), filters: dict = None, **kwargs)`¶

Retrieve documents based on vector similarity.

results = await retriever.retrieve_similarity(
    query="agent communication patterns",
    k=5,
    sim_threshold=0.7,
    filters={"category": "architecture"}
)

Parameters:

query (str): Search query text
k (int): Number of documents to retrieve. Default: 4
sim_threshold (float): Minimum similarity score threshold. Documents with scores below this will be filtered out. Use float("inf") to disable filtering. Default: float("inf")
filters (dict, optional): Optional metadata filters to narrow down the search
**kwargs: Additional keyword arguments passed to the vector store

Returns: list[Document]
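Read literally, the `sim_threshold` description combines two rules: scores below the threshold are dropped, and `float("inf")` switches filtering off rather than rejecting everything. A sketch of that reading over `(document, score)` pairs, with the infinity check as an explicit sentinel, follows. `apply_threshold` is hypothetical, and whether the library compares scores exactly this way (or how its scores are scaled) is an assumption.

```python
import math


def apply_threshold(scored_docs, sim_threshold=float("inf")):
    """Keep documents whose score is at least sim_threshold.

    Assumption: an infinite threshold is a sentinel meaning "do not filter".
    """
    if math.isinf(sim_threshold):
        return [doc for doc, _ in scored_docs]
    return [doc for doc, score in scored_docs if score >= sim_threshold]


scored = [("intro.md", 0.91), ("faq.md", 0.65), ("agents.md", 0.72)]
print(apply_threshold(scored, sim_threshold=0.7))  # ['intro.md', 'agents.md']
print(apply_threshold(scored))                     # all three, unfiltered
```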

`retrieve_mmr(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filters: dict = None, **kwargs)`¶

Retrieve documents using Maximal Marginal Relevance for diversity.

results = await retriever.retrieve_mmr(
    query="LLM providers",
    k=5,
    fetch_k=20,
    lambda_mult=0.5
)

Parameters:

query (str): Search query text
k (int): Number of documents to return. Default: 4
fetch_k (int): Number of documents to fetch for MMR algorithm. The algorithm will first fetch this many documents, then select k diverse documents from them. Default: 20
lambda_mult (float): Balance between similarity (1.0) and diversity (0.0). Default: 0.5
filters (dict, optional): Optional metadata filters to narrow down the search
**kwargs: Additional keyword arguments passed to the vector store

Returns: list[Document]

Raises: NotImplementedError if the underlying vector store doesn't support MMR

`retrieve(query: str, k: int = 4, search_type: str = "similarity", **kwargs)`¶

Generic retrieval method that dispatches to specific search methods.

results = await retriever.retrieve(
    query="What is multi-agent coordination?",
    k=10,
    search_type="similarity",
    filters={"category": "architecture"}
)

Parameters:

query (str): Search query text
k (int): Number of documents to retrieve. Default: 4
search_type (str): Type of search to perform. Options: "similarity" (default) or "mmr"
**kwargs: Additional arguments passed to the specific retrieval method. For similarity: sim_threshold, filters. For mmr: fetch_k, lambda_mult, filters

Returns: list[Document]

Note: For a more explicit and type-safe API, prefer using the specific methods: retrieve_similarity() or retrieve_mmr()

Search Types:

"similarity": Standard similarity search
"mmr": Maximal Marginal Relevance (balances relevance and diversity)

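The dispatch described above can be sketched as a lookup from `search_type` to the specific method, forwarding `k` and any extra keyword arguments. `DispatchingRetriever` is hypothetical: its two search methods are stubs that just echo their arguments, and raising `ValueError` for an unknown `search_type` is an assumption about how the real method reports errors.

```python
import asyncio


class DispatchingRetriever:
    """Hypothetical retriever showing how retrieve() can route by search_type."""

    async def retrieve_similarity(self, query, k=4, sim_threshold=float("inf"),
                                  filters=None, **kwargs):
        return [f"similarity:{query}:k={k}:filters={filters}"]  # stub result

    async def retrieve_mmr(self, query, k=4, fetch_k=20, lambda_mult=0.5,
                           filters=None, **kwargs):
        return [f"mmr:{query}:k={k}:fetch_k={fetch_k}:lambda={lambda_mult}"]  # stub result

    async def retrieve(self, query, k=4, search_type="similarity", **kwargs):
        handlers = {"similarity": self.retrieve_similarity, "mmr": self.retrieve_mmr}
        if search_type not in handlers:
            raise ValueError(f"Unsupported search_type: {search_type!r}")
        # Extra keyword arguments (filters, fetch_k, ...) go to the chosen method
        return await handlers[search_type](query, k=k, **kwargs)


retriever = DispatchingRetriever()
print(asyncio.run(retriever.retrieve("LLM providers", k=5, search_type="mmr", fetch_k=20)))
# ['mmr:LLM providers:k=5:fetch_k=20:lambda=0.5']
```

Routing through a dictionary keeps `retrieve()` a thin wrapper, which is why the dedicated methods remain the more explicit choice when the search type is known in advance.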
Examples¶

Basic Similarity Search:

results = await retriever.retrieve(
    query="agent communication patterns",
    k=5
)

for doc in results:
    print(f"Source: {doc.metadata.get('source')}")
    print(f"Content: {doc.content[:200]}...")
    print()

MMR Search (Diversity):

# Get diverse results
results = await retriever.retrieve(
    query="LLM providers",
    k=5,
    search_type="mmr",
    fetch_k=20,  # Fetch more candidates
    lambda_mult=0.5  # Balance relevance (1.0) vs diversity (0.0)
)

Filtered Retrieval:

# Only retrieve from specific sources
results = await retriever.retrieve(
    query="installation steps",
    k=10,
    filters={
        "section": "getting-started",
        "version": "latest"
    }
)

Complete Workflow:

from spade_llm.rag import (
    Chroma,
    VectorStoreRetriever,
    RecursiveCharacterTextSplitter,
    DirectoryLoader
)
from spade_llm.providers import LLMProvider

async def setup_retrieval():
    # Load and chunk documents
    loader = DirectoryLoader(path="./docs", glob_pattern="**/*.md")
    documents = await loader.load()

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,
        chunk_overlap=100
    )
    chunks = splitter.split_documents(documents)

    # Initialize vector store
    provider = LLMProvider(model="ollama/nomic-embed-text")
    vector_store = Chroma(
        collection_name="documentation",
        embedding_fn=provider.get_embeddings
    )
    await vector_store.initialize()
    await vector_store.add_documents(chunks)

    # Create retriever
    retriever = VectorStoreRetriever(vector_store=vector_store)

    # Retrieve documents
    results = await retriever.retrieve(
        query="How do I create a custom tool?",
        k=5,
        search_type="similarity"
    )

    return results
