Create a new knowledge base from collections and data sources with customizable chunking and embedding configurations.
Request Body
Knowledge base name
Optional description
Processing engine: milvus
Currently only Milvus is supported as the processing engine.
Array of collection IDs to include. Data sources from these collections will be extracted automatically.
Either selected_collections or selected_datasources (or both) must be provided.
Array of individual data source IDs to include.
Either selected_collections or selected_datasources (or both) must be provided.
Chunking configuration for text processing. See Chunking Strategies for details.
Default: Recursive strategy with 500-word chunks and a 50-word overlap.
Embedding model configuration. See Embedding Configuration for details.
Required fields:
- provider: "azure_openai" | "mistral" | "telekom_otc"
- model: Model name (e.g., "text-embedding-ada-002", "mistral-embed")
- dimensions: Vector dimensions (e.g., 1536, 1024)
Whether to save the selected data sources as a new collection
Name for the new collection (required if save_as_collection is true)
Description for the new collection (optional)
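The two conditional rules in the request body (at least one selection array, and a collection name when save_as_collection is true) can be checked on the client before sending. The sketch below is only an illustration: selected_collections, selected_datasources, and save_as_collection are the field names given above, while new_collection_name is a hypothetical key chosen for this example, and treating an empty array as "not provided" is also an assumption.

```python
def validate_kb_request(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    # At least one of the two selection arrays must be present.
    # Assumption: an empty array counts as "not provided".
    if not body.get("selected_collections") and not body.get("selected_datasources"):
        problems.append(
            "provide selected_collections, selected_datasources, or both"
        )

    # "new_collection_name" is a HYPOTHETICAL key used for illustration;
    # check the real field name before relying on it.
    if body.get("save_as_collection") and not body.get("new_collection_name"):
        problems.append(
            "a collection name is required when save_as_collection is true"
        )

    return problems


# Example with placeholder IDs (not real identifiers).
ok_body = {"selected_collections": ["collection-id-1"], "save_as_collection": False}
bad_body = {"save_as_collection": True}

print(validate_kb_request(ok_body))
print(validate_kb_request(bad_body))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.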
Chunking Strategies
Control how documents are split into searchable chunks:
Recursive (Default)
Splits text using multiple separators in order (paragraphs → sentences → words).
Hierarchical
Creates multi-level chunks preserving document structure.
Fixed
Simple fixed-size chunks with specified overlap.
Structured Data (CSV/Excel)
For .csv, .xlsx, and .xls files, use a separate configuration:
- rows_per_batch: Rows to combine per chunk (1-20)
- table_format: Output format ("csv" or "markdown")
- csv_content_column: Column name for text content (CSV only)
- csv_conversion_mode: "row" (one doc per row) or "file" (one doc per file)
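To make chunk size and overlap concrete, here is a minimal sketch of fixed-size chunking measured in words, using the default figures quoted above (500 words, 50 overlap). It is not the service's implementation; the function name chunk_words and the whitespace-based word split are assumptions made for illustration.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()          # assumption: words are whitespace-separated
    step = chunk_size - overlap   # how far each new chunk advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a chunk has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk starts chunk_size - overlap words after the previous one, so the last `overlap` words of one chunk reappear at the start of the next. With the defaults, a 1,200-word document yields three chunks starting at words 0, 450, and 900.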
Embedding Configuration
Supported Providers:
- Azure OpenAI:
  - text-embedding-ada-002 (1536 dimensions)
- Mistral AI:
  - mistral-embed (1024 dimensions)
- Telekom OTC:
  - text-embedding-bge-m3 (1024 dimensions)
  - jina-embeddings-v2-base-de (768 dimensions)
  - jina-embeddings-v2-base-code (768 dimensions)
  - tsi-embedding-colqwen2-2b-v1 (1024 dimensions)
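Because dimensions must match the chosen model, a small lookup can catch mismatches before a request is sent. The sketch below copies the model names and dimensions from the list above. The helper name check_embedding_config is hypothetical, and pairing each display name with a provider identifier (for example, Azure OpenAI with "azure_openai") is an assumption based on the provider values listed for the request body.

```python
# Model -> dimensions, grouped by provider identifier (values from the list above).
KNOWN_DIMENSIONS = {
    "azure_openai": {"text-embedding-ada-002": 1536},
    "mistral": {"mistral-embed": 1024},
    "telekom_otc": {
        "text-embedding-bge-m3": 1024,
        "jina-embeddings-v2-base-de": 768,
        "jina-embeddings-v2-base-code": 768,
        "tsi-embedding-colqwen2-2b-v1": 1024,
    },
}


def check_embedding_config(config: dict) -> None:
    """Raise ValueError if provider, model, and dimensions are inconsistent."""
    provider = config.get("provider")
    model = config.get("model")

    models = KNOWN_DIMENSIONS.get(provider)
    if models is None:
        raise ValueError(f"unsupported provider: {provider!r}")

    expected = models.get(model)
    if expected is None:
        raise ValueError(f"unknown model for {provider}: {model!r}")

    if config.get("dimensions") != expected:
        raise ValueError(
            f"{model} expects {expected} dimensions, got {config.get('dimensions')!r}"
        )


# Passes silently: the three required fields agree with each other.
check_embedding_config(
    {"provider": "mistral", "model": "mistral-embed", "dimensions": 1024}
)
```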