Knowledge bases provide agents with access to domain-specific information, documents, and organizational knowledge through semantic search.
Available Knowledge Base Types
Milvus
Vector Database: Production-ready vector database with semantic search, hybrid search, and query expansion.
UKnow
Cloud Storage Search: Search documents in SharePoint, OneDrive, Google Drive, and Confluence via the UKnow API.
Milvus Knowledge Base
Milvus is a vector database that enables semantic search over your documents using dense vector embeddings. Knowledge bases support customizable chunking strategies, embedding models, and structured data processing.
Basic Configuration
Creating with API
Knowledge bases are created via the API with full control over processing configuration. See Create Knowledge Base API for complete documentation.
Instance Fields
Unique identifier for this knowledge base instance. Must contain only alphanumeric characters, hyphens, and underscores.
Knowledge base type: "milvus" or "uknow".
Toggles the knowledge base on or off without removing it from the configuration.
ID of the KnowledgeBase model used for database lookup (required for Milvus to resolve collection information).
Adapter-specific retrieval configuration (the config object described below).
Human-readable description of the knowledge base
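The identifier and type rules above can be checked on the client before calling the API. A minimal sketch, assuming "alphanumeric" means ASCII letters and digits and that an empty identifier is invalid; validate_instance is a hypothetical helper, not part of any Atthene SDK.

```python
import re

# From the docs: only alphanumeric characters, hyphens, and underscores.
# Assumption: "alphanumeric" means ASCII letters and digits, and empty IDs are invalid.
INSTANCE_ID = re.compile(r"[A-Za-z0-9_-]+")

# The two documented knowledge base types.
KB_TYPES = {"milvus", "uknow"}


def validate_instance(instance_id, kb_type):
    """Return a list of problems with the identifier and type (empty if both look valid)."""
    errors = []
    if not INSTANCE_ID.fullmatch(instance_id):
        errors.append(f"invalid identifier: {instance_id!r}")
    if kb_type not in KB_TYPES:
        errors.append(f"unknown knowledge base type: {kb_type!r}")
    return errors
```

For example, validate_instance("hr-policies_v2", "milvus") returns an empty list, while "hr policies" is rejected because of the space.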
Milvus Retrieval Configuration
The config object controls how documents are retrieved from Milvus:
Number of most relevant results to return. Range: 1-1000.
Search strategy for retrieval. Options:
dense - Vector-based semantic search only (default, fastest)
hybrid - Combines dense vector search with BM25 keyword search
bm25 - BM25 keyword-based search only
hybrid_rrf - Hybrid search with Reciprocal Rank Fusion for improved ranking
HNSW search parameter controlling the accuracy vs. speed trade-off. Higher values are more accurate but slower.
Minimum similarity score threshold. Range: 0.0-1.0. Only results scoring above this threshold are returned.
Number of results to skip (for pagination)
Embedding provider for vectorization: "mistral", "azure_openai", or "telekom_otc".
Embedding model name.
LLM used for query expansion. Must be one of the available LLM models.
Number of expanded queries to generate (0-5). When greater than 0, the query_model generates variations of the original query to improve recall.
Raw Milvus expression for pre-search filtering. Supports Milvus-native operators such as ARRAY_CONTAINS.
Haystack-style metadata filters for document retrieval.
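The two filter styles take different shapes: a Milvus expression is a boolean string, while Haystack-style filters are nested dictionaries. The sketch below builds one of each, assuming the Haystack 2.x filter syntax (comparison dicts with field, operator, and value, combined by logical operators). The field names tags, year, meta.department, and meta.year are hypothetical, and whether the platform passes these values through unchanged is an assumption.

```python
import json


def array_contains_expr(field, value):
    """Build a Milvus expression matching rows whose array field contains value."""
    # json.dumps yields a double-quoted literal with quotes and backslashes escaped.
    return f"ARRAY_CONTAINS({field}, {json.dumps(value, ensure_ascii=False)})"


def haystack_compare(field, operator, value):
    """One Haystack 2.x comparison condition (==, !=, >, >=, <, <=, in, not in)."""
    return {"field": field, "operator": operator, "value": value}


def haystack_and(*conditions):
    """Combine conditions with a logical AND."""
    return {"operator": "AND", "conditions": list(conditions)}


# Hypothetical field names, for illustration only.
milvus_expr = array_contains_expr("tags", "hr") + " and year >= 2023"
haystack_filters = haystack_and(
    haystack_compare("meta.department", "==", "hr"),
    haystack_compare("meta.year", ">=", 2023),
)
```

Building the string literal with json.dumps avoids hand-escaping values that contain quotes.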
Usage Examples
Basic Knowledge Base Agent
Hybrid Search with RRF Fusion
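The hybrid_rrf strategy fuses the dense and BM25 result lists by rank rather than by raw score. The sketch below shows standard Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant k=60 is the value commonly used in the literature; whether Atthene's hybrid_rrf uses the same constant or tie-breaking is an assumption, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. Documents missing from a list simply get no score from it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]  # ranked by vector similarity
bm25_hits = ["c", "a", "d"]   # ranked by keyword relevance
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Here "a" ranks first because it sits near the top of both lists, and "c" overtakes "b" because it also appears in both.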
Query Expansion
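Query expansion runs the original query alongside LLM-generated variations and merges the results. A sketch under stated assumptions: expand stands in for the query_model call and search for a single retrieval, both hypothetical callables here, and keeping each document's best score is one common merge choice, not necessarily the platform's.

```python
def expanded_search(query, expand, search, num_expansions):
    """Search with the original query plus up to num_expansions LLM variations.

    expand(query, n) stands in for the query_model call; search(q) returns
    (doc_id, score) pairs. A document found by several queries keeps its best score.
    """
    if not 0 <= num_expansions <= 5:
        raise ValueError("num_expansions must be between 0 and 5")
    queries = [query]
    if num_expansions > 0:
        queries += expand(query, num_expansions)[:num_expansions]
    best = {}
    for q in queries:
        for doc_id, score in search(q):
            if doc_id not in best or score > best[doc_id]:
                best[doc_id] = score
    # Highest score first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Stubbed LLM and index, for illustration only.
def fake_expand(query, n):
    return [f"{query} policy", f"{query} rules"][:n]


fake_index = {
    "vacation": [("d1", 0.8)],
    "vacation policy": [("d1", 0.6), ("d2", 0.9)],
    "vacation rules": [("d3", 0.5)],
}
results = expanded_search("vacation", fake_expand, lambda q: fake_index.get(q, []), 2)
```

The expanded queries surface d2 and d3, which the original query alone would have missed; this is the recall gain query expansion aims for.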
UKnow Cloud Storage Search
Multiple Knowledge Bases
Chunking Strategies
Knowledge bases support three chunking strategies, configured during creation:
Recursive (Default)
Splits text using multiple separators in order (paragraphs → sentences → words). Best for general documents. Parameters:
strategy: "recursive"
chunk_size: Size in words or characters (default: 500)
chunk_overlap: Overlap between chunks (default: 50)
split_by: "word" | "char" (only these two options)
recursive_separators: Array of separators to try in order
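A minimal character-based sketch of recursive splitting (the split_by: "char" case). It assumes a separator list of paragraph break, sentence end, and space, and falls back to hard cuts when no separator is left; the separators, the function name, and the fallback are assumptions, and chunk_overlap is left out to keep the example short.

```python
def recursive_split(text, chunk_size, separators=("\n\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters.

    Tries each separator in order and only falls back to the next one for
    pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut the text.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep growing the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the next separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


chunks = recursive_split("aaaa bbbb\n\ncccc dddd eeee", chunk_size=10)
```

The first paragraph fits as-is, while the second is too long and gets split again at spaces.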
Hierarchical
Creates multi-level chunks that preserve document structure. Ideal for academic papers and structured documents. Parameters:
strategy: "hierarchical"
hierarchical_block_sizes: Descending array of block sizes (e.g., [700, 350, 150])
chunk_overlap: Overlap between chunks
split_by: "word" | "sentence"
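A word-based sketch of hierarchical chunking (split_by: "word"). It assumes each level is cut from its parent block and that every chunk keeps a pointer to its parent, so a small matched chunk can be expanded to broader context. The function name and output shape are hypothetical, and chunk_overlap is omitted.

```python
def hierarchical_chunks(words, block_sizes):
    """Split a list of words into nested blocks, one level per block size.

    Returns a flat list of chunks. Each records its level (0 = largest), the
    index of its parent chunk (None at level 0), and its text.
    """
    if not block_sizes or any(a <= b for a, b in zip(block_sizes, block_sizes[1:])):
        raise ValueError("block_sizes must be a non-empty, strictly descending sequence")
    chunks = []

    def split(start, end, level, parent):
        size = block_sizes[level]
        for s in range(start, end, size):
            e = min(s + size, end)
            index = len(chunks)
            chunks.append({"level": level, "parent": parent, "text": " ".join(words[s:e])})
            if level + 1 < len(block_sizes):
                # Cut this block further at the next, smaller size.
                split(s, e, level + 1, index)

    split(0, len(words), 0, None)
    return chunks


words = [f"w{i}" for i in range(10)]
chunks = hierarchical_chunks(words, [4, 2])
```

With sizes [700, 350, 150] the same function produces three levels; the toy sizes here just keep the output small.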
Fixed
Simple fixed-size chunks. Fastest processing for straightforward documents. Parameters:
strategy: "fixed"
chunk_size: Fixed chunk size
chunk_overlap: Overlap between chunks
split_by: "word" | "sentence"
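A sketch of fixed-size chunking with overlap for split_by: "word", assuming chunk_overlap is the number of words that consecutive chunks share. Each new chunk starts chunk_size - chunk_overlap words after the previous one; the helper name is hypothetical.

```python
def fixed_chunks(words, chunk_size, chunk_overlap):
    """Cut a list of words into fixed-size chunks that share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = fixed_chunks(text.split(), chunk_size=4, chunk_overlap=1)
```

The overlap means a sentence cut at a chunk boundary still appears partly in the next chunk, at the cost of storing a few words twice.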
Structured Data Processing
CSV and Excel files (.csv, .xlsx, .xls) support specialized processing:
Configuration
Structured Config Parameters
Number of rows to combine into one searchable chunk (1-20).
Lower values = more precise retrieval, slower ingestion
Higher values = faster ingestion, broader context
Output format for table data. Options:
csv - Comma-separated values
markdown - Markdown table format
Column name containing the main text content (CSV only). Required for row mode processing.
Processing mode for CSV files. Options:
row - One document per row (precise retrieval)
file - One document per file (holistic context)
If enabled, Excel files are processed with openpyxl's streaming mode to avoid loading entire sheets into memory. Recommended for Excel files larger than 50 MB.
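The row grouping, output format, and processing mode settings can be sketched as two small functions. This is an illustration under assumptions, not the platform's implementation: the function and argument names are hypothetical, the header row is repeated in every chunk so each chunk stands on its own, and in row mode the non-content columns become metadata.

```python
import csv
import io


def rows_to_chunks(header, rows, rows_per_chunk, output_format):
    """Group rows into chunks of rows_per_chunk and render each as CSV or Markdown.

    The header is repeated in every chunk. Pipe characters in cells are not escaped.
    """
    if not 1 <= rows_per_chunk <= 20:
        raise ValueError("rows_per_chunk must be between 1 and 20")
    if output_format not in ("csv", "markdown"):
        raise ValueError(f"unknown output format: {output_format!r}")
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        group = rows[start:start + rows_per_chunk]
        if output_format == "csv":
            buffer = io.StringIO()
            csv.writer(buffer, lineterminator="\n").writerows([header, *group])
            chunks.append(buffer.getvalue())
        else:
            lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
            lines += ["| " + " | ".join(map(str, row)) + " |" for row in group]
            chunks.append("\n".join(lines))
    return chunks


def csv_to_documents(csv_text, mode, content_column=None):
    """Build documents from CSV text in "row" or "file" mode.

    row:  one document per row; content_column supplies the text and the
          remaining columns become metadata (an assumption of this sketch).
    file: the whole file becomes a single document.
    """
    if mode == "file":
        return [{"content": csv_text, "meta": {}}]
    if mode != "row":
        raise ValueError(f"unknown mode: {mode!r}")
    if content_column is None:
        raise ValueError("row mode requires a content column")
    documents = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        content = record.pop(content_column)
        documents.append({"content": content, "meta": record})
    return documents
```

Smaller groups give each chunk a tighter topic, which is why lower row counts trade ingestion speed for retrieval precision.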
Embedding Providers
Choose your embedding model during knowledge base creation:
Azure OpenAI
text-embedding-ada-002 (1536 dimensions)
Mistral AI
mistral-embed (1024 dimensions)
Telekom OTC
text-embedding-bge-m3 (1024 dimensions) - BGE multilingual model
jina-embeddings-v2-base-de (768 dimensions) - German-optimized
jina-embeddings-v2-base-code (768 dimensions) - Code-optimized
tsi-embedding-colqwen2-2b-v1 (1024 dimensions) - TSI ColQwen2
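Dimensions matter because a Milvus vector field's dimension is fixed when the collection is created, so vectors of a different size cannot be stored in it. The sketch below records the dimensions listed above and checks a vector before ingestion. Keying each model by the provider values from the retrieval configuration ("azure_openai", "mistral", "telekom_otc") is an assumption, and check_vector and same_dimension are hypothetical helpers.

```python
# Dimensions from the list above, keyed by (provider, model).
EMBEDDING_DIMENSIONS = {
    ("azure_openai", "text-embedding-ada-002"): 1536,
    ("mistral", "mistral-embed"): 1024,
    ("telekom_otc", "text-embedding-bge-m3"): 1024,
    ("telekom_otc", "jina-embeddings-v2-base-de"): 768,
    ("telekom_otc", "jina-embeddings-v2-base-code"): 768,
    ("telekom_otc", "tsi-embedding-colqwen2-2b-v1"): 1024,
}


def check_vector(provider, model, vector):
    """Raise ValueError if vector does not have the dimension listed for the model."""
    try:
        expected = EMBEDDING_DIMENSIONS[(provider, model)]
    except KeyError:
        raise ValueError(f"unknown embedding model: {provider}/{model}") from None
    if len(vector) != expected:
        raise ValueError(f"{model} produces {expected}-dimensional vectors, got {len(vector)}")


def same_dimension(model_a, model_b):
    """True if two (provider, model) pairs produce vectors of the same size."""
    return EMBEDDING_DIMENSIONS[model_a] == EMBEDDING_DIMENSIONS[model_b]
```

Matching dimensions only means the vectors fit the same field. Embeddings from different models are still not comparable with each other, so switching models means re-embedding the documents.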
Best Practices
Chunking
Retrieval
Next Steps
Agent Capabilities
Explore all agent capabilities including tools and streaming
Agent Types
Learn about different agent types