Create a new knowledge base from collections and data sources with customizable chunking and embedding configurations.

Request Body

name
string
required
Knowledge base name
description
string
Optional description
processing_engine
string
required
Processing engine: "milvus". Currently only Milvus is supported as the processing engine.
selected_collections
array
Array of collection IDs to include. Data sources from these collections will be extracted automatically. Either selected_collections or selected_datasources (or both) must be provided.
selected_datasources
array
Array of individual data source IDs to include. Either selected_collections or selected_datasources (or both) must be provided.
chunking_strategy
object
Chunking configuration for text processing. See Chunking Strategies for details. Default: recursive strategy with 500-word chunks and 50-word overlap.
embedding_config
object
Embedding model configuration. See Embedding Configuration for details. Required fields:
  • provider: "azure_openai" | "mistral" | "telekom_otc"
  • model: Model name (e.g., "text-embedding-ada-002", "mistral-embed")
  • dimensions: Vector dimensions (e.g., 1536, 1024)
save_as_collection
boolean
default: false
Whether to save the selected data sources as a new collection
new_collection_name
string
Name for the new collection (required if save_as_collection is true)
new_collection_description
string
Description for the new collection (optional)
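
The constraints above (either selected_collections or selected_datasources, and new_collection_name when save_as_collection is true) can be sketched as a small payload builder. This is an illustrative helper, not part of the API; field names match the Request Body table.

```python
def build_kb_payload(name, selected_collections=None, selected_datasources=None,
                     save_as_collection=False, new_collection_name=None, **extra):
    """Assemble a create-knowledge-base request body, enforcing the
    documented either/or and conditional-field rules before sending."""
    if not selected_collections and not selected_datasources:
        raise ValueError("provide selected_collections and/or selected_datasources")
    if save_as_collection and not new_collection_name:
        raise ValueError("new_collection_name is required when save_as_collection is true")
    payload = {"name": name, "processing_engine": "milvus"}
    if selected_collections:
        payload["selected_collections"] = selected_collections
    if selected_datasources:
        payload["selected_datasources"] = selected_datasources
    if save_as_collection:
        payload["save_as_collection"] = True
        payload["new_collection_name"] = new_collection_name
    payload.update(extra)  # e.g. description, chunking_strategy, embedding_config
    return payload

payload = build_kb_payload("Product KB", selected_collections=["coll_123"])
```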

Chunking Strategies

Control how documents are split into searchable chunks:

Recursive (Default)

Splits text using multiple separators in order (paragraphs → sentences → words).
"chunking_strategy": {
  "strategy": "recursive",
  "chunk_size": 500,
  "chunk_overlap": 50,
  "split_by": "word",  // "word" or "char" only
  "recursive_separators": ["\n\n", "\n", ". ", " "]
}
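
To illustrate the separator order, here is a minimal sketch of recursive splitting: try each separator in turn and recurse into pieces that are still over chunk_size words. It omits chunk_overlap and the re-merging of small pieces that a production splitter performs.

```python
def recursive_split(text, chunk_size, separators):
    """Split on the first separator present in the text; recurse into
    oversized pieces with the remaining separators. Sizes are counted
    in words, matching split_by="word"."""
    if len(text.split()) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, rest)
    out = []
    for piece in text.split(sep):
        out.extend(recursive_split(piece, chunk_size, rest))
    return out
```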

Hierarchical

Creates multi-level chunks preserving document structure.
"chunking_strategy": {
  "strategy": "hierarchical",
  "chunk_overlap": 50,
  "split_by": "word",  // "word" only
  "hierarchical_block_sizes": [700, 350, 150]
}
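
A rough sketch of what the hierarchical_block_sizes produce: one chunk layer per listed size, each re-chunking the full text. Real hierarchical chunking also links children to their parent chunks and applies chunk_overlap; this only shows the multi-level sizing.

```python
def hierarchical_chunks(text, block_sizes=(700, 350, 150)):
    """Produce one list of word-sized chunks per level, keyed by block size."""
    words = text.split()
    return {
        size: [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
        for size in block_sizes
    }
```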

Fixed

Simple fixed-size chunks with specified overlap.
"chunking_strategy": {
  "strategy": "fixed",
  "chunk_size": 500,
  "chunk_overlap": 50,
  "split_by": "word"  // "word" only
}
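
Fixed chunking is a sliding window: each chunk holds chunk_size words, and the window advances by chunk_size minus chunk_overlap, so consecutive chunks share chunk_overlap words. A minimal sketch:

```python
def fixed_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size word chunks with the given overlap."""
    words = text.split()
    step = chunk_size - chunk_overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]
```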

Structured Data (CSV/Excel)

For .csv, .xlsx, and .xls files, use a separate configuration:
"chunking_strategy": {
  "document_config": {
    "strategy": "recursive",
    "chunk_size": 500,
    "chunk_overlap": 50,
    "split_by": "word"
  },
  "structured_config": {
    "rows_per_batch": 10,
    "table_format": "csv",
    "csv_content_column": "text",
    "csv_conversion_mode": "row"
  }
}
Structured Config Fields:
  • rows_per_batch: Rows to combine per chunk (1-20)
  • table_format: Output format ("csv" or "markdown")
  • csv_content_column: Column name for text content (CSV only)
  • csv_conversion_mode: "row" (one doc per row) or "file" (one doc per file)
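
The "row" conversion mode with rows_per_batch can be pictured as follows: read the named content column and combine rows_per_batch rows into each chunk. This is an illustrative sketch, not the service's implementation.

```python
import csv
import io

def batch_csv_rows(csv_text, rows_per_batch=10, csv_content_column="text"):
    """Combine rows_per_batch values from the content column into one
    chunk each, mirroring csv_conversion_mode="row"."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return [
        "\n".join(r[csv_content_column] for r in rows[i:i + rows_per_batch])
        for i in range(0, len(rows), rows_per_batch)
    ]
```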

Embedding Configuration

Supported Providers:
  • Azure OpenAI: text-embedding-ada-002 (1536 dimensions)
  • Mistral AI: mistral-embed (1024 dimensions)
  • Telekom OTC:
    • text-embedding-bge-m3 (1024 dimensions)
    • jina-embeddings-v2-base-de (768 dimensions)
    • jina-embeddings-v2-base-code (768 dimensions)
    • tsi-embedding-colqwen2-2b-v1 (1024 dimensions)
"embedding_config": {
  "provider": "mistral",
  "model": "mistral-embed",
  "dimensions": 1024
}
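
Mismatched dimensions values are an easy mistake, so it can help to check the config against the pairs listed above before sending. The table may not be exhaustive for your deployment, so this sketch skips unknown models rather than rejecting them.

```python
# Provider/model -> dimensions, taken from the provider list above.
KNOWN_DIMENSIONS = {
    ("azure_openai", "text-embedding-ada-002"): 1536,
    ("mistral", "mistral-embed"): 1024,
    ("telekom_otc", "text-embedding-bge-m3"): 1024,
    ("telekom_otc", "jina-embeddings-v2-base-de"): 768,
    ("telekom_otc", "jina-embeddings-v2-base-code"): 768,
    ("telekom_otc", "tsi-embedding-colqwen2-2b-v1"): 1024,
}

def check_embedding_config(cfg):
    """Raise if a known model is paired with the wrong dimensions."""
    expected = KNOWN_DIMENSIONS.get((cfg["provider"], cfg["model"]))
    if expected is not None and expected != cfg["dimensions"]:
        raise ValueError(f"expected {expected} dimensions for {cfg['model']}")
```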

Examples

Basic Example

curl -X POST https://api-be.atthene.com/api/v1/knowledge-bases/ \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "selected_collections": ["coll_123", "coll_456"]
  }'
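
The same request can be made from Python with only the standard library. The endpoint and payload are taken from the curl example above; API_KEY is a placeholder for your own key.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder

payload = {
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "selected_collections": ["coll_123", "coll_456"],
}

# Build the POST request with the same headers as the curl example.
req = urllib.request.Request(
    "https://api-be.atthene.com/api/v1/knowledge-bases/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(req) as resp:  # uncomment to actually send
#     print(json.load(resp))
```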

Advanced Example with Custom Chunking

curl -X POST https://api-be.atthene.com/api/v1/knowledge-bases/ \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers KB",
    "description": "Academic research with hierarchical chunking",
    "processing_engine": "milvus",
    "selected_datasources": ["ds_789", "ds_012"],
    "chunking_strategy": {
      "strategy": "hierarchical",
      "chunk_overlap": 70,
      "split_by": "word",
      "hierarchical_block_sizes": [700, 350, 150]
    },
    "embedding_config": {
      "provider": "azure_openai",
      "model": "text-embedding-3-small",
      "dimensions": 1536
    },
    "save_as_collection": true,
    "new_collection_name": "Research Collection"
  }'

Response

{
  "success": true,
  "message": "Knowledge base 'Product Knowledge Base' created successfully",
  "knowledge_base": {
    "id": "kb_789",
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "dataset_name": "",
    "normalized_dataset_name": "product_knowledge_base",
    "status": "pending",
    "progress": 0,
    "datasources": [],
    "datasource_count": 2,
    "total_datasources": 2,
    "ingested_datasources": [],
    "failed_datasources": [],
    "datasource_errors": {},
    "datasource_snapshots": {},
    "success_rate": 0,
    "chunking_strategy": {
      "strategy": "recursive",
      "chunk_size": 500,
      "chunk_overlap": 50,
      "split_by": "word",
      "recursive_separators": ["\n\n", "\n", ". ", " "]
    },
    "embedding_config": {
      "provider": "mistral",
      "model": "mistral-embed",
      "dimensions": 1024
    },
    "llm_config": {},
    "processing_metadata": {},
    "created_by_name": "John Doe",
    "company_name": "Acme Corp",
    "created_at": "2025-01-15T10:30:00Z",
    "updated_at": "2025-01-15T10:30:00Z"
  }
}