Create a new knowledge base from collections and data sources with customizable chunking and embedding configurations.
Request Body
processing_engine
Processing engine: milvus. Currently only Milvus is supported as the processing engine.
selected_collections
Array of collection IDs to include. Data sources from these collections are extracted automatically. Either selected_collections or selected_datasources (or both) must be provided.
selected_datasources
Array of individual data source IDs to include. Either selected_collections or selected_datasources (or both) must be provided.
chunking_strategy
Chunking configuration for text processing. See Chunking Strategies for details. Default: recursive strategy with 500-word chunks and a 50-word overlap.
embedding_config
Embedding model configuration. See Embedding Configuration for details. Required fields:
provider: "azure_openai" | "mistral" | "telekom_otc"
model: Model name (e.g., "text-embedding-ada-002", "mistral-embed")
dimensions: Vector dimensions (e.g., 1536, 1024)
save_as_collection
Whether to save the selected data sources as a new collection
new_collection_name
Name for the new collection (required if save_as_collection is true)
new_collection_description
Description for the new collection (optional)
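The two conditional rules in the request body (at least one of selected_collections or selected_datasources, and new_collection_name when save_as_collection is true) can be checked on the client before the request is sent. The following is a minimal sketch, not part of the API; build_kb_payload is a hypothetical helper name, and the field names are taken from the examples on this page.

```python
from typing import Any, Optional


def build_kb_payload(
    name: str,
    selected_collections: Optional[list[str]] = None,
    selected_datasources: Optional[list[str]] = None,
    save_as_collection: bool = False,
    new_collection_name: Optional[str] = None,
    new_collection_description: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, enforcing the two documented rules.

    Hypothetical client-side helper; the server performs its own validation.
    """
    if not selected_collections and not selected_datasources:
        raise ValueError(
            "Provide selected_collections, selected_datasources, or both"
        )
    if save_as_collection and not new_collection_name:
        raise ValueError(
            "new_collection_name is required when save_as_collection is true"
        )

    payload: dict[str, Any] = {"name": name, "processing_engine": "milvus"}
    if selected_collections:
        payload["selected_collections"] = list(selected_collections)
    if selected_datasources:
        payload["selected_datasources"] = list(selected_datasources)
    if save_as_collection:
        payload["save_as_collection"] = True
        payload["new_collection_name"] = new_collection_name
        if new_collection_description:
            payload["new_collection_description"] = new_collection_description
    return payload
```

Optional fields that are not set are left out of the payload, so the server-side defaults for chunking and embedding apply.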
Chunking Strategies
Control how documents are split into searchable chunks:
Recursive (Default)
Splits text using multiple separators in order (paragraphs → sentences → words).
"chunking_strategy": {
  "strategy": "recursive",
  "chunk_size": 500,
  "chunk_overlap": 50,
  "split_by": "word", // "word" or "char" only
  "recursive_separators": ["\n\n", "\n", ". ", " "]
}
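To make the separator ordering concrete, here is a rough sketch of how recursive splitting by word count can work. It illustrates the general technique only and is not the service's implementation: recursive_split and word_count are hypothetical names, chunk_overlap is left out for brevity, and the merge step assumes whitespace-containing separators such as the defaults.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split text so every chunk has at most chunk_size words.

    Separators are tried in order; a piece that is still too large is
    split again with the remaining, finer separators.
    """
    if word_count(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to hard word windows.
        words = text.split()
        return [
            " ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)
        ]

    sep, finer = separators[0], separators[1:]
    pieces: list[str] = []
    for part in text.split(sep):
        pieces.extend(recursive_split(part, finer, chunk_size))

    # Greedily merge neighbouring pieces back together up to chunk_size.
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if current and word_count(candidate) > chunk_size:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With the default separators, short paragraphs are kept together and only oversized ones are broken down further, first by line, then by sentence, then by word.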
Hierarchical
Creates multi-level chunks preserving document structure.
"chunking_strategy": {
  "strategy": "hierarchical",
  "chunk_overlap": 50,
  "split_by": "word", // "word" only
  "hierarchical_block_sizes": [700, 350, 150]
}
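One way to picture multi-level chunks is as a tree in which each block of the first size is subdivided into blocks of the next size. The sketch below is an assumption about the general idea, not a description of how the service builds its hierarchy: hierarchical_chunks is a hypothetical name, the node layout (level, parent, text) is invented for illustration, and chunk_overlap is ignored.

```python
from typing import Optional


def hierarchical_chunks(text: str, block_sizes: list[int]) -> list[dict]:
    """Return a flat list of nodes, each tagged with its level and parent index.

    block_sizes are word counts from coarsest to finest, e.g. [700, 350, 150].
    """
    nodes: list[dict] = []

    def split(words: list[str], level: int, parent: Optional[int]) -> None:
        size = block_sizes[level]
        for start in range(0, len(words), size):
            block = words[start:start + size]
            nodes.append(
                {"level": level, "parent": parent, "text": " ".join(block)}
            )
            index = len(nodes) - 1
            # Subdivide this block with the next, smaller block size.
            if level + 1 < len(block_sizes):
                split(block, level + 1, index)

    split(text.split(), 0, None)
    return nodes
```

Keeping a parent index on every node lets a retriever match on a small chunk and then return the larger surrounding block for context.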
Fixed
Simple fixed-size chunks with specified overlap.
"chunking_strategy": {
  "strategy": "fixed",
  "chunk_size": 500,
  "chunk_overlap": 50,
  "split_by": "word" // "word" only
}
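The interaction between chunk_size and chunk_overlap is easiest to see as a sliding window over words: each new chunk starts chunk_size minus chunk_overlap words after the previous one. This is a minimal sketch of that technique under the assumption that overlap is measured in words; fixed_chunks is a hypothetical name and the service's exact boundary handling may differ.

```python
def fixed_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into word windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    words = text.split()
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

With the defaults, consecutive chunks start 450 words apart, so the last 50 words of one chunk reappear at the start of the next.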
Structured Data (CSV/Excel)
For .csv, .xlsx, and .xls files, use a separate configuration:
"chunking_strategy": {
  "document_config": {
    "strategy": "recursive",
    "chunk_size": 500,
    "chunk_overlap": 50,
    "split_by": "word"
  },
  "structured_config": {
    "rows_per_batch": 10,
    "table_format": "csv",
    "csv_content_column": "text",
    "csv_conversion_mode": "row"
  }
}
Structured Config Fields:
rows_per_batch: Rows to combine per chunk (1-20)
table_format: Output format ("csv" or "markdown")
csv_content_column: Column name for text content (CSV only)
csv_conversion_mode: "row" (one doc per row) or "file" (one doc per file)
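As an illustration of what rows_per_batch and table_format control, the sketch below groups CSV rows into batches and renders each batch as either CSV or a Markdown table. It is an assumption about the general behaviour, not the service's code: batch_rows is a hypothetical name, and repeating the header row in every chunk is a design choice made here so that each chunk is readable on its own.

```python
import csv
import io


def batch_rows(csv_text: str, rows_per_batch: int = 10, table_format: str = "csv") -> list[str]:
    """Group data rows into chunks of rows_per_batch, each with the header."""
    if not 1 <= rows_per_batch <= 20:
        raise ValueError("rows_per_batch must be between 1 and 20")
    if table_format not in ("csv", "markdown"):
        raise ValueError('table_format must be "csv" or "markdown"')

    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]

    chunks: list[str] = []
    for start in range(0, len(body), rows_per_batch):
        batch = body[start:start + rows_per_batch]
        if table_format == "markdown":
            lines = [
                "| " + " | ".join(header) + " |",
                "| " + " | ".join("---" for _ in header) + " |",
            ]
            lines += ["| " + " | ".join(row) + " |" for row in batch]
            chunks.append("\n".join(lines))
        else:
            out = io.StringIO()
            writer = csv.writer(out, lineterminator="\n")
            writer.writerow(header)
            writer.writerows(batch)
            chunks.append(out.getvalue().rstrip("\n"))
    return chunks
```

Smaller batches give more precise retrieval hits per row, while larger batches keep related rows together in one chunk.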
Embedding Configuration
Supported Providers:
- Azure OpenAI:
  - text-embedding-ada-002 (1536 dimensions)
- Mistral AI:
  - mistral-embed (1024 dimensions)
- Telekom OTC:
  - text-embedding-bge-m3 (1024 dimensions)
  - jina-embeddings-v2-base-de (768 dimensions)
  - jina-embeddings-v2-base-code (768 dimensions)
  - tsi-embedding-colqwen2-2b-v1 (1024 dimensions)
"embedding_config": {
  "provider": "mistral",
  "model": "mistral-embed",
  "dimensions": 1024
}
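Because provider, model, and dimensions must agree with each other, it can help to check an embedding_config against the provider list above before sending it. The following sketch hard-codes that list as a lookup table; SUPPORTED_EMBEDDINGS and check_embedding_config are hypothetical names, and the table would need updating whenever the supported models change.

```python
# Provider -> model -> vector dimensions, copied from the list above.
SUPPORTED_EMBEDDINGS = {
    "azure_openai": {"text-embedding-ada-002": 1536},
    "mistral": {"mistral-embed": 1024},
    "telekom_otc": {
        "text-embedding-bge-m3": 1024,
        "jina-embeddings-v2-base-de": 768,
        "jina-embeddings-v2-base-code": 768,
        "tsi-embedding-colqwen2-2b-v1": 1024,
    },
}


def check_embedding_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    missing = [
        f"missing required field: {field}"
        for field in ("provider", "model", "dimensions")
        if field not in config
    ]
    if missing:
        return missing

    models = SUPPORTED_EMBEDDINGS.get(config["provider"])
    if models is None:
        return [f"unknown provider: {config['provider']}"]
    expected = models.get(config["model"])
    if expected is None:
        return [f"model {config['model']} is not listed for {config['provider']}"]
    if config["dimensions"] != expected:
        return [
            f"{config['model']} uses {expected} dimensions, "
            f"got {config['dimensions']}"
        ]
    return []
```

Returning a list of messages rather than raising makes it easy to show every problem to the user at once.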
Examples
Basic Example
curl -X POST https://api-be.atthene.com/api/v1/knowledge-bases/ \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "selected_collections": ["coll_123", "coll_456"]
  }'
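For callers who prefer Python over curl, the same request can be prepared with the standard library alone. This is a sketch under the assumption that the endpoint and headers are exactly those of the curl command above; build_create_request and create_knowledge_base are hypothetical helper names, and error handling is omitted.

```python
import json
import urllib.request

API_URL = "https://api-be.atthene.com/api/v1/knowledge-bases/"


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for creating a knowledge base."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_knowledge_base(api_key: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response body.

    Requires network access and a valid API key.
    """
    request = build_create_request(api_key, payload)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload and headers easy to inspect or log before anything goes over the network.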
Advanced Example with Custom Chunking
curl -X POST https://api-be.atthene.com/api/v1/knowledge-bases/ \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers KB",
    "description": "Academic research with hierarchical chunking",
    "processing_engine": "milvus",
    "selected_datasources": ["ds_789", "ds_012"],
    "chunking_strategy": {
      "strategy": "hierarchical",
      "chunk_overlap": 70,
      "split_by": "word",
      "hierarchical_block_sizes": [700, 350, 150]
    },
    "embedding_config": {
      "provider": "azure_openai",
      "model": "text-embedding-ada-002",
      "dimensions": 1536
    },
    "save_as_collection": true,
    "new_collection_name": "Research Collection"
  }'
Response
{
  "success": true,
  "message": "Knowledge base 'Product Knowledge Base' created successfully",
  "knowledge_base": {
    "id": "kb_789",
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "dataset_name": "",
    "normalized_dataset_name": "product_knowledge_base",
    "status": "pending",
    "progress": 0,
    "datasources": [],
    "datasource_count": 2,
    "total_datasources": 2,
    "ingested_datasources": [],
    "failed_datasources": [],
    "datasource_errors": {},
    "datasource_snapshots": {},
    "success_rate": 0,
    "chunking_strategy": {
      "strategy": "recursive",
      "chunk_size": 500,
      "chunk_overlap": 50,
      "split_by": "word",
      "recursive_separators": ["\n\n", "\n", ". ", " "]
    },
    "embedding_config": {
      "provider": "mistral",
      "model": "mistral-embed",
      "dimensions": 1024
    },
    "llm_config": {},
    "processing_metadata": {},
    "created_by_name": "John Doe",
    "company_name": "Acme Corp",
    "created_at": "2025-01-15T10:30:00Z",
    "updated_at": "2025-01-15T10:30:00Z"
  }
}