The Knowledge Base section is where you upload documents, create collections, and build retrieval systems that your agents can query. Give your agents access to your company’s data, documentation, and knowledge.

Two Main Sections

Cloud Storage

Connect external storage providers to automatically sync and index documents:
  • SharePoint
  • OneDrive
  • Google Drive

RAG Pipeline

Build the retrieval system in 4 steps:
  1. Data Sources - Upload files or connect to storage
  2. Collections - Organize data into collections
  3. Knowledge Bases - Configure retrieval settings
  4. Retrieval Methods - Set up how agents search your data

Connecting Cloud Storage

  1. Click Cloud Storage Tab - At the top of the Knowledge Base page
  2. Click Connect Cloud Storage
  3. Choose Your Provider - Select SharePoint, OneDrive, or Google Drive
  4. Authenticate - Follow the OAuth flow to grant access
  5. Select Folders - Choose which folders to sync and index
Files are automatically synced and indexed. Updates to documents are reflected in your knowledge bases.

Building a RAG Pipeline

Step 1: Data Sources

Upload files or connect to storage:
  • Upload Files: PDF, DOC, DOCX, TXT, Markdown, LOG, CSV, Excel
  • Cloud Storage: Connect SharePoint, OneDrive, or Google Drive (see Cloud Storage section above)
Supported file types:
  • Documents: PDF, DOC, DOCX, TXT, MD (Markdown), LOG
  • Structured Data: CSV, XLSX, XLS (Excel)

Step 2: Collections

Group related data sources:
  • Create collections by topic or project
  • Add multiple data sources to a collection
  • Chunking and embedding settings are configured per knowledge base (Step 3), not per collection
Collection organization:
  • Collections are reusable groups of data sources
  • Use collections to organize data by topic, project, or department
  • One data source can belong to multiple collections

Step 3: Knowledge Bases

Configure processing and retrieval for your data:
  • Name your knowledge base
  • Select collections and/or individual data sources
  • Configure chunking strategy and embedding model
  • Set up retrieval parameters
Processing Configuration:

Chunking Strategies

Control how documents are split:
  • Recursive (default): Splits by paragraphs → sentences → words
  • Hierarchical: Multi-level chunks preserving structure
  • Fixed: Simple fixed-size chunks
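To make the recursive strategy more concrete, here is a minimal Python sketch of the paragraphs → sentences → words idea. It is an illustration under assumptions, not the platform's implementation: the function name `recursive_split`, the regular expressions used as separators, and the word-based size limit are all hypothetical.

```python
import re

# Hypothetical separators, tried from coarsest to finest:
# paragraphs, then sentences, then individual words.
SEPARATORS = [r"\n\s*\n", r"(?<=[.!?])\s+", r"\s+"]


def word_count(text: str) -> int:
    return len(text.split())


def recursive_split(text: str, chunk_size: int, level: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    A piece that is still too large is split again with the next, finer
    separator. Small neighbouring pieces are merged (joined by a space)
    so that chunks stay close to chunk_size.
    """
    text = text.strip()
    if not text:
        return []
    if word_count(text) <= chunk_size or level >= len(SEPARATORS):
        return [text]

    pieces = [p for p in re.split(SEPARATORS[level], text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if word_count(piece) > chunk_size:
            # Flush the pending chunk, then recurse with a finer separator.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, level + 1))
        elif current and word_count(current) + word_count(piece) > chunk_size:
            chunks.append(current)
            current = piece
        else:
            current = f"{current} {piece}".strip()
    if current:
        chunks.append(current)
    return chunks


text = "One two three. Four five six.\n\nSeven eight nine ten eleven twelve thirteen."
for chunk in recursive_split(text, chunk_size=4):
    print(repr(chunk))
```

In this sketch, sentence boundaries are kept whenever a sentence fits within the limit, and word-level splitting is only used as a last resort for oversized sentences.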
Chunking parameters:
  • Chunk size: 100-1000 words or characters, depending on the Split by setting (default: 500)
  • Chunk overlap: Amount of text shared between consecutive chunks (default: 50)
  • Split by: word or character (recursive only; hierarchical and fixed support word only)
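The interaction between chunk size and chunk overlap can be sketched as a sliding window. The Python example below is a minimal illustration, assuming word-based splitting and an overlap measured in words; `fixed_chunks` is a hypothetical helper, not a platform API.

```python
def fixed_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks of up to chunk_size words.

    Each chunk starts with the last `overlap` words of the previous chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(fixed_chunks(sample, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Under these assumptions, the defaults (500 and 50) turn a 1,200-word document into three chunks that start at words 0, 450, and 900. A larger overlap reduces the chance that a sentence is cut off from its context, at the cost of storing more chunks.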
Smart Extraction (Coming Soon):
AI-powered entity extraction is currently in development. When released, it will automatically extract entities (PERSON, ORGANIZATION, CONCEPT, etc.) and metadata for enhanced retrieval accuracy.

Structured Data Processing

For CSV and Excel files:
  • Rows per batch: Combine 1-20 rows per searchable chunk
  • Table format: CSV or Markdown output
  • CSV content column: Specify which column contains text
  • Processing mode: Row-level (one doc per row) or file-level
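As a rough illustration of the rows-per-batch and table-format options, the Python sketch below groups CSV rows into batches and renders each batch as one searchable chunk. The function `rows_to_chunks` and the choice to repeat the header in every chunk are assumptions made for this example; the platform's actual output may differ.

```python
import csv
import io


def rows_to_chunks(csv_text: str, rows_per_batch: int, table_format: str) -> list[str]:
    """Group CSV rows into batches and render each batch as one chunk."""
    if not 1 <= rows_per_batch <= 20:
        raise ValueError("rows_per_batch must be between 1 and 20")
    if table_format not in ("csv", "markdown"):
        raise ValueError("table_format must be 'csv' or 'markdown'")

    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []
    rows = [row for row in reader if row]  # skip blank lines

    chunks = []
    for start in range(0, len(rows), rows_per_batch):
        batch = rows[start:start + rows_per_batch]
        if table_format == "markdown":
            lines = ["| " + " | ".join(header) + " |",
                     "| " + " | ".join("---" for _ in header) + " |"]
            lines += ["| " + " | ".join(row) + " |" for row in batch]
            chunks.append("\n".join(lines))
        else:
            # Repeat the header so every chunk is self-describing.
            buffer = io.StringIO()
            csv.writer(buffer, lineterminator="\n").writerows([header, *batch])
            chunks.append(buffer.getvalue().rstrip("\n"))
    return chunks


data = "id,question\n1,How do I reset?\n2,Where is billing?\n3,Contact support\n"
for chunk in rows_to_chunks(data, rows_per_batch=2, table_format="markdown"):
    print(chunk, end="\n\n")
```

Setting `rows_per_batch` to 1 corresponds to the row-level idea of one document per row; larger batches give each chunk more surrounding context.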

Embedding Configuration

Choose your embedding provider:
  • Azure OpenAI: text-embedding-ada-002 (1536 dimensions)
  • Mistral AI: mistral-embed (1024 dimensions)
  • Telekom OTC: BGE-M3, Jina v2 Base (DE/Code), TSI ColQwen2
Retrieval options:
  • Top K: Number of results to return (default: 10)
  • Search EF: HNSW search-time candidate list size; higher values are more accurate but slower (default: 64)
  • Metric Type: Distance metric - COSINE (recommended), L2, or IP
  • Score threshold: Minimum relevance score (0.0 - 1.0, optional)
  • Offset: Skip N results for pagination (optional)
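The retrieval options above can be illustrated with a small Python sketch: the three metric types computed by hand, and a hypothetical `select_results` helper that applies score threshold, offset, and Top K to a list of scored results. The order in which the platform applies these filters is not documented here, so the order below (threshold first, then offset, then Top K) is an assumption.

```python
import math


def inner_product(a, b):
    """IP: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """COSINE: inner product of the normalized vectors (assumes non-zero vectors)."""
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))


def l2_distance(a, b):
    """L2: Euclidean distance, where lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_results(scored, top_k=10, score_threshold=None, offset=0):
    """Pick one page of results from (doc_id, score) pairs.

    Assumes a higher score is better, as with COSINE or IP.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if score_threshold is not None:
        ranked = [pair for pair in ranked if pair[1] >= score_threshold]
    return ranked[offset:offset + top_k]


query = [1.0, 0.0]
docs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]

print(select_results(scored, top_k=2, score_threshold=0.5))  # "a" and "b" pass the threshold
print(select_results(scored, top_k=1, offset=1))             # second page of size 1: "b"
```

Because a raised threshold removes results before Top K is applied, a strict threshold can return fewer than Top K results, or none at all.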

Step 4: Retrieval Methods

Set up how agents query your data:
  • Similarity search: Vector-based semantic search
  • Keyword search: Traditional keyword matching
  • Hybrid search: Combines vector and keyword search, so results can match on both meaning and exact terms
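For intuition about what keyword search scores, here is a minimal Python sketch of the standard Okapi BM25 formula. It is a textbook version with common default parameters (`k1 = 1.5`, `b = 0.75`) and naive whitespace tokenization; these choices are assumptions and are not taken from the platform.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, documents: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document (higher = better keyword match)."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "reset your password in settings",
    "billing and invoices overview",
    "password policy for admins",
]
print(bm25_scores("reset password", documents))
```

Unlike similarity search, this scoring only rewards exact term overlap: the second document scores zero because it shares no words with the query, even if it were semantically related.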

Using Knowledge Bases in Agents

Once created, add knowledge bases to your agents:

Basic Configuration

agents:
  - name: "support_agent"
    agent_type: "llm_agent"
    
    knowledge_bases:
      - name: "product_docs"
        knowledge_base_type: "milvus"
        knowledge_base_id: "kb_abc123"
        retrieval_config:
          strategy: "dense"  # dense, hybrid, bm25, or hybrid_rrf
          top_k: 10
          search_ef: 64
          metric_type: "COSINE"
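
Before deploying a configuration like the one above, it can help to check the `retrieval_config` values against the documented options. The Python sketch below assumes the YAML has already been loaded into a dictionary; `validate_retrieval_config` is a hypothetical helper, not part of the platform.

```python
ALLOWED_STRATEGIES = {"dense", "hybrid", "bm25", "hybrid_rrf"}
ALLOWED_METRICS = {"COSINE", "L2", "IP"}


def validate_retrieval_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    strategy = config.get("strategy")
    if strategy is not None and strategy not in ALLOWED_STRATEGIES:
        problems.append(f"unknown strategy: {strategy!r}")
    metric = config.get("metric_type")
    if metric is not None and metric not in ALLOWED_METRICS:
        problems.append(f"unknown metric_type: {metric!r}")
    top_k = config.get("top_k")
    if top_k is not None and (not isinstance(top_k, int) or top_k < 1):
        problems.append("top_k must be a positive integer")
    offset = config.get("offset")
    if offset is not None and (not isinstance(offset, int) or offset < 0):
        problems.append("offset must be a non-negative integer")
    threshold = config.get("score_threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        problems.append("score_threshold must be between 0.0 and 1.0")
    return problems


basic = {"strategy": "dense", "top_k": 10, "search_ef": 64, "metric_type": "COSINE"}
print(validate_retrieval_config(basic))  # []
print(validate_retrieval_config({"strategy": "semantic", "score_threshold": 1.5}))
```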

Advanced Configuration

agents:
  - name: "advanced_agent"
    agent_type: "llm_agent"
    
    knowledge_bases:
      - name: "company_kb"
        knowledge_base_type: "milvus"
        knowledge_base_id: "kb_xyz789"
        retrieval_config:
          # Search strategy
          strategy: "hybrid_rrf"  # Options: dense, hybrid, bm25, hybrid_rrf
          
          # Result count
          top_k: 20
          offset: 0  # For pagination
          
          # Search quality
          search_ef: 128  # Higher = more accurate but slower
          metric_type: "COSINE"  # COSINE, L2, or IP
          score_threshold: 0.7  # Filter low-relevance results
          
          # Query expansion (optional) - Improves recall by generating query variations
          query_model: "gpt-4o"  # LLM for generating alternative phrasings
          max_num_query_expansions: 3  # Generate 3 additional query variants
Search Strategy Options:
  • dense: Vector-based semantic search only (default, fastest)
  • hybrid: Combines dense vector + BM25 keyword search
  • bm25: Keyword-based search only
  • hybrid_rrf: Hybrid with Reciprocal Rank Fusion for better result ranking
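To show what Reciprocal Rank Fusion does, here is a minimal Python sketch of the general technique: each document earns `1 / (k + rank)` from every ranked list it appears in, and the totals decide the final order. The constant `k = 60` is the value commonly used for RRF and is an assumption here; the platform's exact parameters are not documented in this section.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Ranks start at 1. Documents that rank well in several lists
    accumulate the highest total score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_results = ["a", "b", "c"]  # hypothetical vector search ranking
bm25_results = ["b", "d", "a"]   # hypothetical keyword search ranking
print(reciprocal_rank_fusion([dense_results, bm25_results]))
# ['b', 'a', 'd', 'c']
```

Because RRF works on ranks rather than raw scores, it avoids having to put cosine similarities and BM25 scores on a common scale.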
You can also use the + button in Playground to insert knowledge bases. For the complete YAML configuration reference and advanced retrieval options, see Knowledge Base Setup.

Managing Your Data

Updating Documents

Cloud storage: Files auto-sync when changed.
Manual uploads:
  1. Go to Data Sources
  2. Click the data source
  3. Upload new version or delete old files

Deleting Data

  • Delete a data source: Removes it from all collections
  • Delete a collection: Knowledge bases using it will stop working
  • Delete a knowledge base: Agents using it will fail
Always check which agents are using a knowledge base before deleting it
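One way to perform this check is to scan your agent configuration for references to the knowledge base ID. The Python sketch below assumes the agent YAML has been loaded into a dictionary with the same structure as the configuration examples on this page; `agents_using_knowledge_base` is a hypothetical helper, not a platform API.

```python
def agents_using_knowledge_base(config: dict, knowledge_base_id: str) -> list[str]:
    """Return the names of agents that reference the given knowledge base ID."""
    return [
        agent["name"]
        for agent in config.get("agents", [])
        if any(
            kb.get("knowledge_base_id") == knowledge_base_id
            for kb in agent.get("knowledge_bases", [])
        )
    ]


# Example configuration mirroring the YAML structure shown earlier.
config = {
    "agents": [
        {
            "name": "support_agent",
            "knowledge_bases": [{"name": "product_docs", "knowledge_base_id": "kb_abc123"}],
        },
        {
            "name": "advanced_agent",
            "knowledge_bases": [{"name": "company_kb", "knowledge_base_id": "kb_xyz789"}],
        },
    ]
}
print(agents_using_knowledge_base(config, "kb_abc123"))  # ['support_agent']
```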

Best Practices

Organize by topic: Create collections for different knowledge domains (product docs, policies, code, etc.)
Use cloud storage for dynamic content: If your docs change frequently, connect cloud storage instead of manual uploads
Test retrieval quality: Use the search preview in Knowledge Bases to test if you’re getting relevant results
Start with hybrid search: Combines the best of vector and keyword search for most use cases

Common Issues

“No results found”: Check your score threshold, which might be too high. Try lowering it to 0.5 or 0.6
“Irrelevant results”: Increase the score threshold or reduce top_k to get more focused results
“Cloud sync failed”: Re-authenticate your cloud storage connection in Settings → Integrations

What’s Next?

  • Knowledge Base Setup: YAML configuration and advanced retrieval options
  • Playground: Add knowledge bases to agents
  • Milvus: Vector database details
  • Uknow Cloud Storage: Cloud storage integration