

Overview

Atthene supports multiple LLM providers and models, each with unique capabilities. The framework automatically maps model names to their providers, handles authentication, and validates configurations.

Supported Providers

Atthene integrates with the following LLM providers:

Azure OpenAI

Enterprise-grade OpenAI models hosted on Azure

Telekom OTC

Open Telekom Cloud with Qwen, Claude, and custom models

Mistral AI

Fast and efficient European AI models

Google Gemini

Google’s multimodal AI models

Vertex AI Anthropic

Claude models hosted on Google Cloud Vertex AI

Model Selection

Basic Configuration

Specify a model in your agent’s llm_config section:
agents:
  - name: analyzer
    agent_type: llm_agent
    llm_config:
      model: gpt-4o
      temperature: 0.7
      max_tokens: 4096

Automatic Provider Detection

Atthene automatically detects the provider based on the model name. You don’t need to specify the provider explicitly:
# These automatically map to their respective providers
llm_config:
  model: gpt-4o           # → Azure OpenAI
  model: mistral-large    # → Mistral AI
  model: gemini-2.5-pro   # → Google Gemini
  model: claude-sonnet-4  # → Telekom OTC
The framework uses case-insensitive matching for model names, so gpt-4o, GPT-4O, and Gpt-4o are all valid.
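Conceptually, the detection behaves like a lowercase lookup table. The sketch below is illustrative only, built from the examples above; the mapping table and provider identifiers are assumptions, not Atthene's actual internals:

```python
# Illustrative model → provider mapping (names taken from the examples above;
# the real framework's table and provider identifiers may differ).
MODEL_PROVIDERS = {
    "gpt-4o": "azure_openai",
    "mistral-large": "mistral",
    "gemini-2.5-pro": "google_gemini",
    "claude-sonnet-4": "telekom_otc",
}

def detect_provider(model_name: str) -> str:
    """Resolve a provider from a model name, ignoring case."""
    try:
        return MODEL_PROVIDERS[model_name.lower()]
    except KeyError:
        raise ValueError(f"Model not found: {model_name}")
```

Because lookup happens on the lowercased name, `detect_provider("GPT-4O")` and `detect_provider("gpt-4o")` resolve to the same provider.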

Available Models

Azure OpenAI Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
|---|---|---|---|
| gpt-4o | ✅ Yes | Text, Image | 128K tokens |
Azure OpenAI uses deployment names. Ensure your deployment name matches the model configuration.

Telekom OTC Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
|---|---|---|---|
| Llama-3.3-70B | ✅ Yes | Text | 128K tokens |
| Qwen3-VL-30B | ✅ Yes | Text, Image | 32K tokens |
| Qwen3-30B | ✅ Yes | Text | 32K tokens |
| Qwen3-Next-80B | ✅ Yes | Text | 32K tokens |
| Teuken-7B | ❌ No | Text | 32K tokens |
| claude-3-7-sonnet | ✅ Yes | Text, Image | 128K tokens |
| claude-sonnet-4 | ✅ Yes | Text, Image | 128K tokens |
| gpt-oss-120b | ✅ Yes | Text | 32K tokens |
Teuken-7B does not support tool calling. Use it only for text generation tasks without tools.

Mistral AI Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
|---|---|---|---|
| mistral-large | ✅ Yes | Text, Image | 128K tokens |
| mistral-medium | ✅ Yes | Text, Image | 128K tokens |
| mistral-small | ✅ Yes | Text, Image | 128K tokens |
| ministral-3b | ✅ Yes | Text, Image | 128K tokens |
| ministral-8b | ✅ Yes | Text, Image | 128K tokens |
| ministral-14b | ✅ Yes | Text, Image | 128K tokens |
| magistral-medium | ❌ No | Text, Image | 128K tokens |
| magistral-small | ❌ No | Text, Image | 128K tokens |
| pixtral-large | ✅ Yes | Text, Image | 128K tokens |
| codestral | ✅ Yes | Text | 128K tokens |
| devstral-small | ✅ Yes | Text | 128K tokens |
| devstral-medium | ❌ No | Text | 128K tokens |
Magistral and devstral-medium models do not support tool calling. Use them only for text generation tasks without tools.

Google Gemini Models

| Model Name (use in YAML) | Tool Support | Thinking | Modalities | Context Window | Max Output |
|---|---|---|---|---|---|
| gemini-2.5-flash | ✅ Yes | ✅ Yes | Text, Image, Audio, Video | 1M tokens | 8K tokens |
| gemini-2.5-pro | ✅ Yes | ✅ Yes | Text, Image, Audio, Video | 1M tokens | 8K tokens |
| gemini-3-flash | ✅ Yes | ✅ Yes | Text, Image, Audio, Video | 1M tokens | 64K tokens |
| gemini-3.1-flash-lite | ✅ Yes | ✅ Yes | Text, Image, Audio, Video | ~1M tokens | 64K tokens |
| gemini-3.1-pro | ✅ Yes | ✅ Yes | Text, Image, Audio, Video | ~1M tokens | 64K tokens |
Gemini models combine very large context windows (1M tokens) with support for all modalities (text, image, audio, video) and thinking mode. Enable thinking with show_reasoning: true in your streaming config.

Vertex AI Anthropic Models

| Model Name (use in YAML) | Tool Support | Thinking | Modalities | Context Window | Max Output |
|---|---|---|---|---|---|
| claude-opus-4-6 | ✅ Yes | ✅ Yes | Text, Image | 1M tokens | 128K tokens |
| claude-sonnet-4-6 | ✅ Yes | ✅ Yes | Text, Image | 1M tokens | 128K tokens |

Multimodal Models & OCR

Understanding Modalities

Atthene models support different input types (modalities):
  • Text: Standard text input/output
  • Image: Image understanding and OCR capabilities

Image Processing (OCR)

Models with image modality support can process images directly. You can send images via the Frontend UI or REST API.

Agent Configuration

Configure your agent with a vision-capable model:
agents:
  - name: document_analyzer
    agent_type: llm_agent
    llm_config:
      model: gpt-4o  # Supports text + image
      temperature: 0.3
    prompt_config:
      system_prompt: |
        You are a document analyzer. Extract text and structured data from images.
        Provide detailed analysis of visual elements.

Sending Images via Frontend

Use the Atthene GPT frontend to upload and process images:
  1. Click the attachment icon in the chat interface
  2. Select your image file (JPEG, PNG, etc.)
  3. Add your text prompt
  4. Send the message
The frontend automatically handles image encoding and multimodal message formatting.

Sending Images via REST API

Send multimodal content using the /api/v1/sessions/{session_id}/execute endpoint:
POST /api/v1/sessions/{session_id}/execute
Content-Type: application/json
Authorization: Bearer <your_token>

{
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": "<base64_encoded_image_data>"
      }
    }
  ]
}
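A request body like the one above can be assembled programmatically before POSTing it. The sketch below uses only the Python standard library; the function name and structure are illustrative, and sending the request with your HTTP client of choice is left out:

```python
import base64
import json

def build_image_message(prompt: str, image_bytes: bytes,
                        media_type: str = "image/jpeg") -> str:
    """Build the JSON body for a multimodal execute request:
    one text part plus one base64-encoded image part."""
    payload = {
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": media_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                },
            },
        ]
    }
    return json.dumps(payload)
```

POST the resulting string to /api/v1/sessions/{session_id}/execute with Content-Type: application/json and your bearer token.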
Alternative: Using Image URLs

For better performance, use publicly accessible image URLs instead of base64:
{
  "content": [
    {
      "type": "text",
      "text": "Analyze this document"
    },
    {
      "type": "image",
      "source": {
        "type": "url",
        "media_type": "image/jpeg",
        "url": "https://example.com/document.jpg"
      }
    }
  ]
}

PDF Processing

PDFs are automatically converted to images when sent to vision-capable models:
  • Each PDF page is converted to a JPEG image (150 DPI)
  • Images are compressed to max 800KB per page (1568px max dimension)
  • Quality is optimized for LLM vision APIs (85-40% JPEG quality)
  • Conversion happens automatically in the backend
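The resizing rule above (longest side capped at 1568px, aspect ratio preserved) is pure arithmetic and can be sketched as follows. This is illustrative only; the actual rasterization and JPEG compression happen in the backend:

```python
def scaled_page_size(width_px: int, height_px: int,
                     max_dim: int = 1568) -> tuple[int, int]:
    """Scale a rendered PDF page down so its longest side is <= max_dim,
    preserving aspect ratio. Pages already within the limit are untouched."""
    scale = min(1.0, max_dim / max(width_px, height_px))
    return round(width_px * scale), round(height_px * scale)
```

For example, an A4 page rendered at 150 DPI is roughly 1240 × 1754 px, so its longest side is scaled down to 1568px before JPEG encoding.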

Sending PDFs via REST API

{
  "content": [
    {
      "type": "text",
      "text": "Extract data from this invoice"
    },
    {
      "type": "file",
      "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "filename": "invoice.pdf",
        "data": "<base64_encoded_pdf_data>"
      }
    }
  ]
}
The backend automatically:
  1. Detects the PDF file content
  2. Converts each page to a compressed JPEG image
  3. Sends the images to the multimodal model
  4. Processes the response
Note: Only vision-capable models can process images and PDFs. Text-only models will reject multimodal content with a clear error message listing available vision models.

Supported Image Models:
  • Azure OpenAI: gpt-4o
  • Telekom OTC: Qwen3-VL-30B, claude-3-7-sonnet, claude-sonnet-4
  • Mistral AI: mistral-large, mistral-medium, mistral-small, ministral-3b/8b/14b, magistral-medium/small, pixtral-large
  • Google Gemini: All Gemini models (also support audio and video)
  • Vertex AI Anthropic: claude-opus-4-6, claude-sonnet-4-6

Multimodal Use Cases

  • OCR & Data Extraction: Use vision models to extract text from scanned documents, invoices, receipts, and forms. Best models: gpt-4o, claude-sonnet-4, pixtral-large
  • Image Analysis: Analyze images for content, objects, scenes, and context. Best models: gemini-2.5-pro, gpt-4o, Qwen3-VL-30B

BYOK (Bring Your Own Key)

Override API keys for specific agents using the api_key field in llm_config. This allows different agents to use different API keys, enabling multi-tenant isolation:
agents:
  - name: customer_agent
    agent_type: llm_agent
    llm_config:
      model: mistral-large
      api_key: your-mistral-api-key  # Direct key
      temperature: 0.7
You can also reference environment variables using the ${ENV_VAR} syntax:
llm_config:
  model: mistral-large
  api_key: ${TENANT_A_MISTRAL_KEY}  # References an environment variable
Security Notes: API keys provided directly in the YAML configuration are stored in plain text. For production deployments, use environment variable references (${ENV_VAR} syntax) instead of hardcoding keys.
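Conceptually, ${ENV_VAR} resolution works like the sketch below. This is illustrative only; Atthene performs the substitution internally, and the function name is an assumption:

```python
import os
import re

# Matches ${ENV_VAR}-style references in config values.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def resolve_api_key(value: str) -> str:
    """Expand ${ENV_VAR} references in a config value using the
    process environment; plain strings pass through unchanged."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"Environment variable not set: {name}")
        return os.environ[name]
    return _ENV_REF.sub(replace, value)
```

This keeps secrets out of the YAML file: the config only names the variable, and the key is injected from the environment at runtime.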

Model Configuration Reference

LLMConfig Schema

llm_config:
  model: string        # Model name (required unless using the environment default)
  temperature: float   # 0.0 - 2.0 (default: 0.7)
  max_tokens: int      # Max output tokens (optional)
  api_key: string      # Optional BYOK override

Field Descriptions

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | Environment default | Model name or display name |
| temperature | float | 0.7 | Sampling temperature (0.0 = deterministic, 2.0 = creative) |
| max_tokens | int | Model default | Maximum tokens to generate |
| api_key | string | Environment variable | Optional API key override |
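The defaults and ranges above can be captured in a small validation sketch. Field names follow the schema; the dataclass and its checks are illustrative, not Atthene's implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMConfig:
    """Mirror of the llm_config fields with the documented defaults."""
    model: Optional[str] = None        # falls back to the environment default
    temperature: float = 0.7
    max_tokens: Optional[int] = None   # falls back to the model default
    api_key: Optional[str] = None      # optional BYOK override

    def __post_init__(self) -> None:
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
```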

Temperature Guidelines

Low Temperature (0.0 - 0.3)
Use for: Data extraction, classification, structured output
Behavior: Deterministic, focused, consistent

Medium Temperature (0.4 - 0.8)
Use for: General conversation, Q&A, analysis
Behavior: Balanced creativity and consistency

High Temperature (0.9 - 2.0)
Use for: Creative writing, brainstorming, diverse outputs
Behavior: Creative, varied, less predictable

Token Limits & Context Windows

Each model has specific token limits that affect how much context can be processed:
# Example: Using a model with large context window
agents:
  - name: long_document_analyzer
    agent_type: llm_agent
    llm_config:
      model: gemini-2.5-pro  # 1M token context window
      temperature: 0.5
    prompt_config:
      include_history: true  # Can include extensive history
Token Calculation: Input tokens + Output tokens ≤ Max Total Tokens

The framework automatically validates token limits and will raise an error if they are exceeded.
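The budget check itself is a one-line inequality; a sketch (real token counts come from the model's tokenizer, and the function name here is an assumption):

```python
def validate_token_budget(input_tokens: int, output_tokens: int,
                          max_total_tokens: int) -> None:
    """Raise if input + output would exceed the model's total token budget."""
    used = input_tokens + output_tokens
    if used > max_total_tokens:
        raise ValueError(
            f"Token limit exceeded: {used} > {max_total_tokens}"
        )
```

For example, a 128K-token model given a 120K-token prompt leaves at most 8K tokens for the response.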

Best Practices

Model selection:
  1. Start with balanced models (gpt-4o, mistral-large)
  2. Use specialized models for specific tasks (e.g., codestral for code)
  3. Consider cost vs. performance tradeoffs
  4. Test with different temperatures to find optimal settings

Multimodal processing:
  1. Use vision models only when processing images
  2. Optimize image sizes before sending to the API
  3. Consider token costs for image processing
  4. Validate modality support before deployment

API keys (BYOK):
  1. Implement key rotation for production systems
  2. Monitor usage and costs per key
  3. Use BYOK for multi-tenant isolation

Cost control:
  1. Set appropriate max_tokens to avoid waste
  2. Use smaller models for simple tasks
  3. Monitor token usage and optimize prompts

Troubleshooting

Error: Unknown provider type or Model not found
Solution:
  • Check model name spelling (matching is case-insensitive)
  • Verify the model is in the supported list
  • Use the display name or full model name

Error: 401 Unauthorized or Invalid API key
Solution:
  • Verify the API key is correct
  • Check that environment variables are set
  • Ensure the BYOK key has proper permissions
  • Validate that the API base URL is correct

Error: Token limit exceeded or Context too long
Solution:
  • Reduce the max_tokens setting
  • Limit conversation history (e.g., include_history: 5)
  • Use a model with a larger context window
  • Optimize prompt length

Error: Model does not support tools
Solution:
  • Check the model's supports_tools capability
  • Use a different model (e.g., avoid Magistral models)
  • Remove tools from the agent configuration

Examples

Multi-Provider System

name: multi_provider_system
description: System using different providers for different tasks
architecture: workflow

agents:
  - name: fast_classifier
    agent_type: llm_agent
    llm_config:
      model: mistral-small  # Fast Mistral model
      temperature: 0.2
    prompt_config:
      system_prompt: Classify user intent quickly
  
  - name: deep_analyzer
    agent_type: llm_agent
    llm_config:
      model: gemini-2.5-pro  # Large context Gemini
      temperature: 0.5
    prompt_config:
      system_prompt: Perform deep analysis with full context
  
  - name: image_processor
    agent_type: llm_agent
    llm_config:
      model: gpt-4o  # Vision-capable model
      temperature: 0.3
    prompt_config:
      system_prompt: Extract information from images

edges:
  - from: START
    to: fast_classifier
  - from: fast_classifier
    to: deep_analyzer
  - from: deep_analyzer
    to: image_processor

BYOK Multi-Tenant Example

agents:
  - name: tenant_a_agent
    agent_type: llm_agent
    llm_config:
      model: mistral-large
      api_key: ${TENANT_A_MISTRAL_KEY}  # Tenant A's Mistral key
      temperature: 0.7
  
  - name: tenant_b_agent
    agent_type: llm_agent
    llm_config:
      model: Qwen3-30B  # Telekom OTC
      api_key: ${TENANT_B_TELEKOM_KEY}  # Tenant B's Telekom key
      temperature: 0.7

Next Steps

Prompt Configuration

Learn how to configure prompts and context

Agent Types

Explore different agent types and capabilities

Agent Capabilities

Explore tools, streaming, and other capabilities

YAML Configuration

Learn about complete agent configuration