Create Knowledge Base

Create a new knowledge base from collections and data sources with customizable chunking and embedding configurations.

Request Body

name

string

required

Knowledge base name

description

string

Optional description

processing_engine

string

required

Processing engine: "milvus". Currently only Milvus is supported as the processing engine.

selected_collections

array

Array of collection IDs to include. Data sources from these collections will be extracted automatically. Either selected_collections or selected_datasources (or both) must be provided.

selected_datasources

array

Array of individual data source IDs to include. Either selected_collections or selected_datasources (or both) must be provided.

chunking_strategy

object

Chunking configuration for text processing. See Chunking Strategies for details. Default: recursive strategy with 500-word chunks and a 50-word overlap.

embedding_config

object

Embedding model configuration. See Embedding Configuration for details. Required fields:

provider: "azure_openai" | "mistral" | "telekom_otc"
model: Model name (e.g., "text-embedding-ada-002", "mistral-embed")
dimensions: Vector dimensions (e.g., 1536, 1024)

save_as_collection

boolean

default: false

Whether to save the selected data sources as a new collection

new_collection_name

string

Name for the new collection (required if save_as_collection is true)

new_collection_description

string

Description for the new collection (optional)
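The constraints in the field list can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_create_kb_payload` is a hypothetical helper name, and it encodes only the rules stated above (required `name` and `processing_engine`, at least one of `selected_collections` or `selected_datasources`, and `new_collection_name` when `save_as_collection` is true). The server may enforce additional rules.

```python
from typing import Any, Dict, List, Optional


def build_create_kb_payload(
    name: str,
    processing_engine: str = "milvus",
    selected_collections: Optional[List[str]] = None,
    selected_datasources: Optional[List[str]] = None,
    save_as_collection: bool = False,
    new_collection_name: Optional[str] = None,
    **optional: Any,
) -> Dict[str, Any]:
    """Assemble a request body and check the documented field rules (hypothetical helper)."""
    if not name:
        raise ValueError("name is required")
    if processing_engine != "milvus":
        raise ValueError("only 'milvus' is documented as a processing engine")
    if not selected_collections and not selected_datasources:
        raise ValueError("provide selected_collections, selected_datasources, or both")
    if save_as_collection and not new_collection_name:
        raise ValueError("new_collection_name is required when save_as_collection is true")

    payload: Dict[str, Any] = {"name": name, "processing_engine": processing_engine}
    if selected_collections:
        payload["selected_collections"] = list(selected_collections)
    if selected_datasources:
        payload["selected_datasources"] = list(selected_datasources)
    if save_as_collection:
        payload["save_as_collection"] = True
        payload["new_collection_name"] = new_collection_name
    # Pass through optional fields such as description, chunking_strategy,
    # embedding_config, or new_collection_description when they are set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, `build_create_kb_payload("Product Knowledge Base", selected_collections=["coll_123"])` returns a body with only the required fields and the collection list, so the server defaults apply to everything else.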

Chunking Strategies

Control how documents are split into searchable chunks:

Recursive (Default)

Splits text using multiple separators in order (paragraphs → sentences → words).

"chunking_strategy": {
  "strategy": "recursive",
  "chunk_size": 500,
  "chunk_overlap": 50,
  "split_by": "word",  // "word" or "char" only
  "recursive_separators": ["\n\n", "\n", ". ", " "]
}
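To illustrate the idea behind the recursive strategy, here is a simplified Python sketch. It is not the service's implementation: the function names are hypothetical, separators are discarded when splitting, and `chunk_overlap` is ignored for brevity. The text is split on the coarsest separator first, finer separators are tried only for pieces that are still too long, and neighbouring pieces are then merged back up to the size limit (counted in words).

```python
from typing import List


def _word_count(text: str) -> int:
    return len(text.split())


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Break text into pieces of at most chunk_size words, coarse separators first."""
    if _word_count(text) <= chunk_size:
        return [text.strip()] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard split on word boundaries.
        words = text.split()
        return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
    head, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(head):
        pieces.extend(recursive_split(part, chunk_size, rest))
    return pieces


def merge_pieces(pieces: List[str], chunk_size: int) -> List[str]:
    """Greedily join adjacent pieces while the chunk stays within chunk_size words."""
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for piece in pieces:
        n = _word_count(piece)
        if current and count + n > chunk_size:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(piece)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def recursive_chunks(text: str, chunk_size: int = 500,
                     separators: List[str] = ("\n\n", "\n", ". ", " ")) -> List[str]:
    return merge_pieces(recursive_split(text, chunk_size, list(separators)), chunk_size)
```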

Hierarchical

Creates multi-level chunks preserving document structure.

"chunking_strategy": {
  "strategy": "hierarchical",
  "chunk_overlap": 50,
  "split_by": "word",  // "word" only
  "hierarchical_block_sizes": [700, 350, 150]
}
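One way to picture multi-level chunking is as a tree in which each block of a given size is subdivided into blocks of the next smaller size. The sketch below works under that assumption only. How the service actually builds and links its levels is not documented here, the names `Node` and `hierarchical_chunks` are hypothetical, and `chunk_overlap` is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    level: int             # 0 = largest block size
    text: str
    parent: Optional[int]  # index of the parent node in the result list, None at level 0


def hierarchical_chunks(text: str, block_sizes: List[int]) -> List[Node]:
    """Split text into nested word blocks, e.g. block_sizes=[700, 350, 150]."""
    if not block_sizes or any(size <= 0 for size in block_sizes):
        raise ValueError("block_sizes must be a non-empty list of positive integers")
    nodes: List[Node] = []

    def build(segment: List[str], level: int, parent: Optional[int]) -> None:
        size = block_sizes[level]
        for start in range(0, len(segment), size):
            block = segment[start:start + size]
            nodes.append(Node(level, " ".join(block), parent))
            index = len(nodes) - 1
            if level + 1 < len(block_sizes):
                # Subdivide this block using the next, smaller block size.
                build(block, level + 1, index)

    build(text.split(), 0, None)
    return nodes
```

In such a layout the smallest blocks would typically be the ones matched at search time, while the `parent` links give access to the wider context around a match.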

Fixed

Simple fixed-size chunks with specified overlap.

"chunking_strategy": {
  "strategy": "fixed",
  "chunk_size": 500,
  "chunk_overlap": 50,
  "split_by": "word"  // "word" only
}
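The interaction between `chunk_size` and `chunk_overlap` can be shown with a short sketch: each chunk starts `chunk_size - chunk_overlap` words after the previous one. The function name `fixed_chunks` is hypothetical, and the code illustrates the general sliding-window technique rather than the service's exact behaviour.

```python
from typing import List


def fixed_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into word windows of chunk_size that share chunk_overlap words."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

With the defaults, a 1,000-word document yields windows starting at words 0, 450, and 900, so consecutive chunks share 50 words.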

Structured Data (CSV/Excel)

For .csv, .xlsx, and .xls files, use a separate configuration:

"chunking_strategy": {
  "document_config": {
    "strategy": "recursive",
    "chunk_size": 500,
    "chunk_overlap": 50,
    "split_by": "word"
  },
  "structured_config": {
    "rows_per_batch": 10,
    "table_format": "csv",
    "csv_content_column": "text",
    "csv_conversion_mode": "row"
  }
}

Structured Config Fields:

rows_per_batch: Rows to combine per chunk (1-20)
table_format: Output format ("csv" or "markdown")
csv_content_column: Column name for text content (CSV only)
csv_conversion_mode: "row" (one doc per row) or "file" (one doc per file)
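As a rough illustration of `rows_per_batch` and `table_format`, the sketch below groups table rows into batches and renders each batch as CSV or as a Markdown table. It rests on assumptions: the helper name `batch_rows` is hypothetical, repeating the header in every chunk is a choice made here, and `csv_content_column` and `csv_conversion_mode` are not modelled.

```python
import csv
import io
from typing import List, Sequence


def batch_rows(header: Sequence[str], rows: Sequence[Sequence[str]],
               rows_per_batch: int = 10, table_format: str = "csv") -> List[str]:
    """Render rows as text chunks of at most rows_per_batch rows each."""
    if not 1 <= rows_per_batch <= 20:
        raise ValueError("rows_per_batch must be between 1 and 20")
    if table_format not in ("csv", "markdown"):
        raise ValueError('table_format must be "csv" or "markdown"')
    chunks: List[str] = []
    for start in range(0, len(rows), rows_per_batch):
        batch = rows[start:start + rows_per_batch]
        if table_format == "csv":
            buffer = io.StringIO()
            writer = csv.writer(buffer, lineterminator="\n")
            writer.writerow(header)  # assumption: header repeated per chunk
            writer.writerows(batch)
            chunks.append(buffer.getvalue())
        else:
            lines = ["| " + " | ".join(header) + " |",
                     "| " + " | ".join("---" for _ in header) + " |"]
            lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in batch]
            chunks.append("\n".join(lines))
    return chunks
```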

Embedding Configuration

Supported Providers:

Azure OpenAI: text-embedding-ada-002 (1536 dimensions)
Mistral AI: mistral-embed (1024 dimensions)
Telekom OTC:
- text-embedding-bge-m3 (1024 dimensions)
- jina-embeddings-v2-base-de (768 dimensions)
- jina-embeddings-v2-base-code (768 dimensions)
- tsi-embedding-colqwen2-2b-v1 (1024 dimensions)

"embedding_config": {
  "provider": "mistral",
  "model": "mistral-embed",
  "dimensions": 1024
}
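Because `dimensions` has to match the chosen model, it can be convenient to derive the whole object from a lookup table. The sketch below copies the provider and model list above into a dictionary. `SUPPORTED_EMBEDDINGS` and `make_embedding_config` are hypothetical names, and the check may be stricter than what the API itself accepts.

```python
from typing import Dict, Union

# Provider -> model -> vector dimensions, taken from the list of supported providers.
SUPPORTED_EMBEDDINGS: Dict[str, Dict[str, int]] = {
    "azure_openai": {"text-embedding-ada-002": 1536},
    "mistral": {"mistral-embed": 1024},
    "telekom_otc": {
        "text-embedding-bge-m3": 1024,
        "jina-embeddings-v2-base-de": 768,
        "jina-embeddings-v2-base-code": 768,
        "tsi-embedding-colqwen2-2b-v1": 1024,
    },
}


def make_embedding_config(provider: str, model: str) -> Dict[str, Union[str, int]]:
    """Build an embedding_config object with the dimensions that belong to the model."""
    models = SUPPORTED_EMBEDDINGS.get(provider)
    if models is None:
        raise ValueError("unknown provider: %r" % provider)
    if model not in models:
        raise ValueError("model %r is not listed for provider %r" % (model, provider))
    return {"provider": provider, "model": model, "dimensions": models[model]}
```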

Examples

Basic Example

curl -X POST https://api-be.atthene.com/api/v1/knowledge-bases/ \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "selected_collections": ["coll_123", "coll_456"]
  }'

Advanced Example with Custom Chunking

curl -X POST https://api-be.atthene.com/api/v1/knowledge-bases/ \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Research Papers KB",
    "description": "Academic research with hierarchical chunking",
    "processing_engine": "milvus",
    "selected_datasources": ["ds_789", "ds_012"],
    "chunking_strategy": {
      "strategy": "hierarchical",
      "chunk_overlap": 70,
      "split_by": "word",
      "hierarchical_block_sizes": [700, 350, 150]
    },
    "embedding_config": {
      "provider": "azure_openai",
      "model": "text-embedding-ada-002",
      "dimensions": 1536
    },
    "save_as_collection": true,
    "new_collection_name": "Research Collection"
  }'
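The same calls can be made from Python using only the standard library. The sketch below builds the request object without sending it. The URL and headers are taken from the curl examples, while the function name `build_create_request` is hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api-be.atthene.com/api/v1/knowledge-bases/"


def build_create_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST request equivalent to the curl examples."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_request(
    "YOUR_API_KEY",
    {
        "name": "Product Knowledge Base",
        "processing_engine": "milvus",
        "selected_collections": ["coll_123", "coll_456"],
    },
)

# Sending the request performs a real network call, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```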

Response

{
  "success": true,
  "message": "Knowledge base 'Product Knowledge Base' created successfully",
  "knowledge_base": {
    "id": "kb_789",
    "name": "Product Knowledge Base",
    "description": "Product documentation and guides",
    "processing_engine": "milvus",
    "dataset_name": "",
    "normalized_dataset_name": "product_knowledge_base",
    "status": "pending",
    "progress": 0,
    "datasources": [],
    "datasource_count": 2,
    "total_datasources": 2,
    "ingested_datasources": [],
    "failed_datasources": [],
    "datasource_errors": {},
    "datasource_snapshots": {},
    "success_rate": 0,
    "chunking_strategy": {
      "strategy": "recursive",
      "chunk_size": 500,
      "chunk_overlap": 50,
      "split_by": "word",
      "recursive_separators": ["\n\n", "\n", ". ", " "]
    },
    "embedding_config": {
      "provider": "mistral",
      "model": "mistral-embed",
      "dimensions": 1024
    },
    "llm_config": {},
    "processing_metadata": {},
    "created_by_name": "John Doe",
    "company_name": "Acme Corp",
    "created_at": "2025-01-15T10:30:00Z",
    "updated_at": "2025-01-15T10:30:00Z"
  }
}
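A client will usually want only a few of these fields, for instance the new ID and the initial status. The following sketch extracts them from a response body. The helper name `summarize_create_response` is hypothetical, and the assumption that a failed call returns `"success": false` together with a `message` is not confirmed by this page.

```python
import json
from typing import Any, Dict


def summarize_create_response(body: str) -> Dict[str, Any]:
    """Pick out the fields a caller typically needs after creating a knowledge base."""
    data = json.loads(body)
    if not data.get("success"):
        # Assumed failure shape: success=false plus a human-readable message.
        raise RuntimeError(data.get("message", "knowledge base creation failed"))
    kb = data["knowledge_base"]
    return {
        "id": kb["id"],
        "status": kb["status"],
        "progress": kb["progress"],
        "total_datasources": kb["total_datasources"],
    }
```

Applied to the response above, this returns the ID `kb_789` with status `pending` and progress `0`.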
