Overview

Atthene supports multiple LLM providers and models, each with unique capabilities. The framework automatically maps model names to their providers, handles authentication, and validates configurations.

Supported Providers

Atthene integrates with the following LLM providers:

Azure OpenAI

Enterprise-grade OpenAI models hosted on Azure

Telekom OTC

Open Telekom Cloud with Qwen, Claude, and custom models

Mistral AI

Fast and efficient European AI models

Google Gemini

Google’s multimodal AI models

Model Selection

Basic Configuration

Specify a model in your agent’s llm_config section:
agents:
  - name: analyzer
    agent_type: llm_agent
    llm_config:
      model: gpt-4o
      temperature: 0.7
      max_tokens: 4096

Automatic Provider Detection

Atthene automatically detects the provider based on the model name. You don’t need to specify the provider explicitly:
# These automatically map to their respective providers
llm_config:
  model: gpt-4o           # → Azure OpenAI
  model: mistral-large    # → Mistral AI
  model: gemini-2.5-pro   # → Google Gemini
  model: claude-sonnet-4  # → Telekom OTC
The framework uses case-insensitive matching for model names, so gpt-4o, GPT-4O, and Gpt-4o are all valid.
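The detection described above can be sketched as a case-insensitive prefix lookup. This is an illustrative sketch only: the prefix table, provider identifiers, and function name are assumptions, not Atthene's actual internals.

```python
# Illustrative sketch of case-insensitive provider detection.
# Prefixes and provider names are assumptions based on the model tables below.
PROVIDER_PREFIXES = {
    "gpt-oss-": "telekom_otc",   # more specific prefixes must come first
    "gpt-": "azure_openai",
    "mistral-": "mistral",
    "gemini-": "google_gemini",
    "claude-": "telekom_otc",
    "qwen": "telekom_otc",
}

def detect_provider(model_name: str) -> str:
    """Map a model name to a provider via case-insensitive prefix matching."""
    normalized = model_name.lower()
    for prefix, provider in PROVIDER_PREFIXES.items():
        if normalized.startswith(prefix):
            return provider
    raise ValueError(f"Unknown provider for model: {model_name}")
```

Because matching is on the lowercased name, gpt-4o, GPT-4O, and Gpt-4o all resolve to the same provider.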

Available Models

Azure OpenAI Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| gpt-4o | ✅ Yes | Text, Image | 128K tokens |
Azure OpenAI uses deployment names. Ensure your deployment name matches the model configuration.

Telekom OTC Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| Qwen2.5-Coder-32B | ✅ Yes | Text | 32K tokens |
| Qwen2.5-VL-72B | ✅ Yes | Text, Image | 32K tokens |
| Qwen3-235B-A22B | ✅ Yes | Text | 32K tokens |
| Qwen3-30B | ✅ Yes | Text | 32K tokens |
| Teuken-7B | ✅ Yes | Text | 32K tokens |
| claude-3-7-sonnet | ✅ Yes | Text, Image | 128K tokens |
| claude-sonnet-4 | ✅ Yes | Text, Image | 128K tokens |
| gpt-oss-120b | ✅ Yes | Text | 32K tokens |

Mistral AI Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| mistral-large | ✅ Yes | Text | 128K tokens |
| mistral-medium | ✅ Yes | Text, Image | 128K tokens |
| mistral-small | ✅ Yes | Text, Image | 128K tokens |
| pixtral-large | ✅ Yes | Text, Image | 128K tokens |
| codestral | ✅ Yes | Text | 128K tokens |
| ministral-3b | ✅ Yes | Text | 128K tokens |
| ministral-8b | ✅ Yes | Text | 128K tokens |
| devstral-small | ✅ Yes | Text | 128K tokens |
| devstral-medium | ❌ No | Text | 128K tokens |
| magistral-medium | ❌ No | Text, Image | 128K tokens |
| magistral-small | ❌ No | Text, Image | 128K tokens |
| mistral-moderation | ❌ No | Text | 128K tokens |
| mistral-saba | ❌ No | Text | 128K tokens |
Magistral, devstral-medium, mistral-moderation, and mistral-saba models do not support tool calling. Use them only for text generation tasks without tools.

Google Gemini Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| gemini-2.5-pro | ✅ Yes | Text, Image | 1M tokens |
| gemini-2.5-flash | ✅ Yes | Text, Image | 1M tokens |
| gemini-3-pro | ✅ Yes | Text, Image | 1M tokens |
Gemini models have massive context windows (1M tokens) and support all modalities, making them ideal for complex multimodal tasks.

Multimodal Models & OCR

Understanding Modalities

Atthene models support different input types (modalities):
  • Text: Standard text input/output
  • Image: Image understanding and OCR capabilities

Image Processing (OCR)

Models with image modality support can process images directly. You can send images via the Frontend UI or REST API.

Agent Configuration

Configure your agent with a vision-capable model:
agents:
  - name: document_analyzer
    agent_type: llm_agent
    llm_config:
      model: gpt-4o  # Supports text + image
      temperature: 0.3
    prompt_config:
      system_prompt: |
        You are a document analyzer. Extract text and structured data from images.
        Provide detailed analysis of visual elements.

Sending Images via Frontend

Use the Atthene GPT frontend to upload and process images:
  1. Click the attachment icon in the chat interface
  2. Select your image file (JPEG, PNG, etc.)
  3. Add your text prompt
  4. Send the message
The frontend automatically handles image encoding and multimodal message formatting.

Sending Images via REST API

Send multimodal content using the /api/v1/sessions/{session_id}/execute endpoint:
POST /api/v1/sessions/{session_id}/execute
Content-Type: application/json
Authorization: Bearer <your_token>

{
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": "<base64_encoded_image_data>"
      }
    }
  ]
}
Alternative: Using Image URLs

For better performance, use publicly accessible image URLs instead of base64:
{
  "content": [
    {
      "type": "text",
      "text": "Analyze this document"
    },
    {
      "type": "image",
      "source": {
        "type": "url",
        "media_type": "image/jpeg",
        "url": "https://example.com/document.jpg"
      }
    }
  ]
}

PDF Processing

PDFs are automatically converted to images when sent to vision-capable models:
  • Each PDF page is converted to a JPEG image (150 DPI)
  • Images are compressed to max 800KB per page (1568px max dimension)
  • Quality is optimized for LLM vision APIs (85-40% JPEG quality)
  • Conversion happens automatically in the backend
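The 1568px dimension cap above is simple aspect-ratio arithmetic. This sketch shows the scaling rule in isolation; the function name is illustrative and the actual backend implementation may differ.

```python
MAX_DIMENSION = 1568  # per-page pixel cap from the conversion rules above

def fit_dimensions(width: int, height: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most MAX_DIMENSION,
    preserving the aspect ratio. Pages already within the limit pass through."""
    longest = max(width, height)
    if longest <= MAX_DIMENSION:
        return width, height
    scale = MAX_DIMENSION / longest
    return round(width * scale), round(height * scale)
```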

Sending PDFs via REST API

{
  "content": [
    {
      "type": "text",
      "text": "Extract data from this invoice"
    },
    {
      "type": "file",
      "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "filename": "invoice.pdf",
        "data": "<base64_encoded_pdf_data>"
      }
    }
  ]
}
The backend automatically:
  1. Detects the PDF file content
  2. Converts each page to a compressed JPEG image
  3. Sends the images to the multimodal model
  4. Processes the response
Note: Only vision-capable models (e.g., gpt-4o, claude-sonnet-4, Qwen2.5-VL-72B) can process images and PDFs. Text-only models will reject multimodal content.

Supported Image Models:
  • Azure OpenAI: gpt-4o
  • Telekom OTC: Qwen2.5-VL-72B, claude-3-7-sonnet, claude-sonnet-4
  • Mistral AI: mistral-medium, mistral-small, pixtral-large, magistral-medium, magistral-small
  • Google Gemini: All Gemini models

Multimodal Use Cases

  • Document OCR: Use vision models to extract text from scanned documents, invoices, receipts, and forms. Best models: gpt-4o, claude-sonnet-4, pixtral-large
  • Image analysis: Analyze images for content, objects, scenes, and context. Best models: gemini-2.5-pro, gpt-4o, Qwen2.5-VL-72B

BYOK (Bring Your Own Key)

BYOK is currently supported for Telekom OTC and Mistral AI providers only. Azure OpenAI and Google Gemini do not support BYOK at this time.
Override API keys for specific agents using the api_key field:
agents:
  - name: customer_agent
    agent_type: llm_agent
    llm_config:
      model: mistral-large  # BYOK supported
      api_key: your-mistral-api-key  # BYOK
      temperature: 0.7
Security Note: API keys are currently stored in plain text when you save an agent. In v0.3 we are introducing a secure key vault that can hold an API key to be used globally across all provider calls.

Model Configuration Reference

LLMConfig Schema

llm_config:
  model: string              # Model name (optional; falls back to environment default)
  temperature: float         # 0.0 - 2.0 (default: 0.7)
  max_tokens: int            # Max output tokens (optional)
  api_key: string            # Optional BYOK override

Field Descriptions

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | Environment default | Model name or display name |
| temperature | float | 0.7 | Sampling temperature (0.0 = deterministic, 2.0 = creative) |
| max_tokens | int | Model default | Maximum tokens to generate |
| api_key | string | Environment variable | Optional API key override |
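The constraints above can be checked before an agent is saved. This is a minimal validation sketch, assuming a plain dict for llm_config; it is not Atthene's actual validator.

```python
def validate_llm_config(cfg: dict) -> list[str]:
    """Return a list of validation errors for an llm_config mapping.

    Illustrative sketch of the checks implied by the schema above.
    """
    errors = []
    temperature = cfg.get("temperature", 0.7)  # documented default
    if not 0.0 <= temperature <= 2.0:
        errors.append(f"temperature must be between 0.0 and 2.0, got {temperature}")
    max_tokens = cfg.get("max_tokens")
    if max_tokens is not None and max_tokens <= 0:
        errors.append(f"max_tokens must be positive, got {max_tokens}")
    return errors
```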

Temperature Guidelines

Low Temperature (0.0 - 0.3)

Use for: Data extraction, classification, structured output
Behavior: Deterministic, focused, consistent

Medium Temperature (0.4 - 0.8)

Use for: General conversation, Q&A, analysis
Behavior: Balanced creativity and consistency

High Temperature (0.9 - 2.0)

Use for: Creative writing, brainstorming, diverse outputs
Behavior: Creative, varied, less predictable

Token Limits & Context Windows

Each model has specific token limits that affect how much context can be processed:
# Example: Using a model with large context window
agents:
  - name: long_document_analyzer
    agent_type: llm_agent
    llm_config:
      model: gemini-2.5-pro  # 1M token context window
      temperature: 0.5
    prompt_config:
      include_history: true  # Can include extensive history
Token Calculation: Input tokens + Output tokens ≤ Max Total Tokens

The framework automatically validates token limits and raises an error if they are exceeded.

Best Practices

Model selection:
  1. Start with balanced models (gpt-4o, mistral-large)
  2. Use specialized models for specific tasks (e.g., codestral for code)
  3. Consider cost vs. performance tradeoffs
  4. Test with different temperatures to find optimal settings

Multimodal processing:
  1. Use vision models only when processing images
  2. Optimize image sizes before sending them to the API
  3. Consider token costs for image processing
  4. Validate modality support before deployment

API key management:
  1. Implement key rotation for production systems
  2. Monitor usage and costs per key
  3. Use BYOK for multi-tenant isolation

Cost optimization:
  1. Set an appropriate max_tokens to avoid waste
  2. Use smaller models for simple tasks
  3. Monitor token usage and optimize prompts

Troubleshooting

Error: Unknown provider type or Model not found
Solution:
  • Check the model name spelling (matching is case-insensitive)
  • Verify the model is in the supported list
  • Use the display name or full model name

Error: 401 Unauthorized or Invalid API key
Solution:
  • Verify the API key is correct
  • Check that environment variables are set
  • Ensure the BYOK key has proper permissions
  • Validate that the API base URL is correct

Error: Token limit exceeded or Context too long
Solution:
  • Reduce the max_tokens setting
  • Limit conversation history (include_history: 5)
  • Use a model with a larger context window
  • Optimize prompt length

Error: Model does not support tools
Solution:
  • Check the model's supports_tools capability
  • Use a different model (e.g., avoid Magistral models)
  • Remove tools from the agent configuration

Examples

Multi-Provider System

name: multi_provider_system
description: System using different providers for different tasks
architecture: workflow

agents:
  - name: fast_classifier
    agent_type: llm_agent
    llm_config:
      model: mistral-small  # Fast Mistral model
      temperature: 0.2
    prompt_config:
      system_prompt: Classify user intent quickly
  
  - name: deep_analyzer
    agent_type: llm_agent
    llm_config:
      model: gemini-2.5-pro  # Large context Gemini
      temperature: 0.5
    prompt_config:
      system_prompt: Perform deep analysis with full context
  
  - name: image_processor
    agent_type: llm_agent
    llm_config:
      model: gpt-4o  # Vision-capable model
      temperature: 0.3
    prompt_config:
      system_prompt: Extract information from images

edges:
  - from: START
    to: fast_classifier
  - from: fast_classifier
    to: deep_analyzer
  - from: deep_analyzer
    to: image_processor

BYOK Multi-Tenant Example

agents:
  - name: tenant_a_agent
    agent_type: llm_agent
    llm_config:
      model: mistral-large  # BYOK supported
      api_key: ${TENANT_A_MISTRAL_KEY}  # Tenant A's Mistral key
      temperature: 0.7
  
  - name: tenant_b_agent
    agent_type: llm_agent
    llm_config:
      model: Qwen3-235B-A22B  # BYOK supported (Telekom OTC)
      api_key: ${TENANT_B_TELEKOM_KEY}  # Tenant B's Telekom key
      temperature: 0.7

Next Steps