Overview

Atthene supports multiple LLM providers and models, each with unique capabilities. The framework automatically maps model names to their providers, handles authentication, and validates configurations.

Supported Providers

Atthene integrates with the following LLM providers:

Azure OpenAI

Enterprise-grade OpenAI models hosted on Azure

Telekom OTC

Open Telekom Cloud with Qwen, Claude, and custom models

Mistral AI

Fast and efficient European AI models

Google Gemini

Google’s multimodal AI models

Model Selection

Basic Configuration

Specify a model in your agent’s llm_config section:
agents:
  - name: analyzer
    agent_type: llm_agent
    llm_config:
      model: gpt-4o
      temperature: 0.7
      max_tokens: 4096

Automatic Provider Detection

Atthene automatically detects the provider based on the model name. You don’t need to specify the provider explicitly:
# These automatically map to their respective providers
llm_config:
  model: gpt-4o           # → Azure OpenAI
  model: mistral-large    # → Mistral AI
  model: gemini-2.5-pro   # → Google Gemini
  model: claude-sonnet-4  # → Telekom OTC
The framework uses case-insensitive matching for model names, so gpt-4o, GPT-4O, and Gpt-4o are all valid.
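The detection described above can be sketched as a case-insensitive prefix lookup. This is an illustrative sketch only: the prefix table, provider identifiers, and function name are assumptions, not Atthene's actual internals.

```python
# Illustrative sketch of case-insensitive provider detection.
# Prefixes and provider names are assumptions based on the model tables below.
PROVIDER_PREFIXES = {
    "gpt-oss-": "telekom_otc",   # more specific prefixes must come first
    "gpt-": "azure_openai",
    "mistral-": "mistral",
    "gemini-": "google_gemini",
    "claude-": "telekom_otc",
    "qwen": "telekom_otc",
}

def detect_provider(model_name: str) -> str:
    """Map a model name to a provider via case-insensitive prefix matching."""
    normalized = model_name.lower()
    for prefix, provider in PROVIDER_PREFIXES.items():
        if normalized.startswith(prefix):
            return provider
    raise ValueError(f"Unknown provider for model: {model_name}")
```

Because matching is on the lowercased name, gpt-4o, GPT-4O, and Gpt-4o all resolve to the same provider.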

Available Models

Azure OpenAI Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| gpt-4o | ✅ Yes | Text, Image | 128K tokens |
Azure OpenAI uses deployment names. Ensure your deployment name matches the model configuration.

Telekom OTC Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| Qwen2.5-Coder-32B | ✅ Yes | Text | 32K tokens |
| Qwen2.5-VL-72B | ✅ Yes | Text, Image | 32K tokens |
| Qwen3-235B-A22B | ✅ Yes | Text | 32K tokens |
| Qwen3-30B | ✅ Yes | Text | 32K tokens |
| Teuken-7B | ✅ Yes | Text | 32K tokens |
| claude-3-7-sonnet | ✅ Yes | Text, Image | 128K tokens |
| claude-sonnet-4 | ✅ Yes | Text, Image | 128K tokens |
| gpt-oss-120b | ✅ Yes | Text | 32K tokens |

Mistral AI Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| mistral-large | ✅ Yes | Text | 128K tokens |
| mistral-medium | ✅ Yes | Text, Image | 128K tokens |
| mistral-small | ✅ Yes | Text, Image | 128K tokens |
| pixtral-large | ✅ Yes | Text, Image | 128K tokens |
| codestral | ✅ Yes | Text | 128K tokens |
| ministral-3b | ✅ Yes | Text | 128K tokens |
| ministral-8b | ✅ Yes | Text | 128K tokens |
| devstral-small | ✅ Yes | Text | 128K tokens |
| devstral-medium | ❌ No | Text | 128K tokens |
| magistral-medium | ❌ No | Text, Image | 128K tokens |
| magistral-small | ❌ No | Text, Image | 128K tokens |
| mistral-moderation | ❌ No | Text | 128K tokens |
| mistral-saba | ❌ No | Text | 128K tokens |
Magistral, devstral-medium, mistral-moderation, and mistral-saba models do not support tool calling. Use them only for text generation tasks without tools.

Google Gemini Models

| Model Name (use in YAML) | Tool Support | Modalities | Context Window |
| --- | --- | --- | --- |
| gemini-2.5-pro | ✅ Yes | Text, Image | 1M tokens |
| gemini-2.5-flash | ✅ Yes | Text, Image | 1M tokens |
| gemini-3-pro | ✅ Yes | Text, Image | 1M tokens |
Gemini models have massive context windows (1M tokens) and support all modalities, making them ideal for complex multimodal tasks.

Multimodal Models & OCR

Understanding Modalities

Atthene models support different input types (modalities):
  • Text: Standard text input/output
  • Image: Image understanding and OCR capabilities

Image Processing (OCR)

Models with image modality support can process images directly. You can send images via the Frontend UI or REST API.

Agent Configuration

Configure your agent with a vision-capable model:
agents:
  - name: document_analyzer
    agent_type: llm_agent
    llm_config:
      model: gpt-4o  # Supports text + image
      temperature: 0.3
    prompt_config:
      system_prompt: |
        You are a document analyzer. Extract text and structured data from images.
        Provide detailed analysis of visual elements.

Sending Images via Frontend

Use the Atthene GPT frontend to upload and process images:
  1. Click the attachment icon in the chat interface
  2. Select your image file (JPEG, PNG, etc.)
  3. Add your text prompt
  4. Send the message
The frontend automatically handles image encoding and multimodal message formatting.

Sending Images via REST API

Send multimodal content using the /api/v1/sessions/{session_id}/execute endpoint:
POST /api/v1/sessions/{session_id}/execute
Content-Type: application/json
Authorization: Bearer <your_token>

{
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image",
      "source": {
        "type": "base64",
        "media_type": "image/jpeg",
        "data": "<base64_encoded_image_data>"
      }
    }
  ]
}
Alternative: Using Image URLs

For better performance, use publicly accessible image URLs instead of base64:
{
  "content": [
    {
      "type": "text",
      "text": "Analyze this document"
    },
    {
      "type": "image",
      "source": {
        "type": "url",
        "media_type": "image/jpeg",
        "url": "https://example.com/document.jpg"
      }
    }
  ]
}

PDF Processing

PDFs are automatically converted to images when sent to vision-capable models:
  • Each PDF page is converted to a JPEG image (150 DPI)
  • Images are compressed to max 800KB per page (1568px max dimension)
  • Quality is optimized for LLM vision APIs (85-40% JPEG quality)
  • Conversion happens automatically in the backend
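The 1568px dimension cap above is simple aspect-ratio arithmetic. This sketch shows the scaling rule in isolation; the function name is illustrative and the actual backend implementation may differ.

```python
MAX_DIMENSION = 1568  # per-page pixel cap from the conversion rules above

def fit_dimensions(width: int, height: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most MAX_DIMENSION,
    preserving the aspect ratio. Pages already within the limit pass through."""
    longest = max(width, height)
    if longest <= MAX_DIMENSION:
        return width, height
    scale = MAX_DIMENSION / longest
    return round(width * scale), round(height * scale)
```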

Sending PDFs via REST API

{
  "content": [
    {
      "type": "text",
      "text": "Extract data from this invoice"
    },
    {
      "type": "file",
      "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "filename": "invoice.pdf",
        "data": "<base64_encoded_pdf_data>"
      }
    }
  ]
}
The backend automatically:
  1. Detects the PDF file content
  2. Converts each page to a compressed JPEG image
  3. Sends the images to the multimodal model
  4. Processes the response
Note: Only vision-capable models (e.g., gpt-4o, claude-sonnet-4, Qwen2.5-VL-72B) can process images and PDFs. Text-only models will reject multimodal content.

Supported Image Models:
  • Azure OpenAI: gpt-4o
  • Telekom OTC: Qwen2.5-VL-72B, claude-3-7-sonnet, claude-sonnet-4
  • Mistral AI: mistral-medium, mistral-small, pixtral-large, magistral-medium, magistral-small
  • Google Gemini: All Gemini models

Multimodal Use Cases

  • Document OCR: Use vision models to extract text from scanned documents, invoices, receipts, and forms. Best models: gpt-4o, claude-sonnet-4, pixtral-large
  • Image analysis: Analyze images for content, objects, scenes, and context. Best models: gemini-2.5-pro, gpt-4o, Qwen2.5-VL-72B

BYOK (Bring Your Own Key)

BYOK is currently supported for Telekom OTC and Mistral AI providers only. Azure OpenAI and Google Gemini do not support BYOK at this time.
Override API keys for specific agents using the api_key field:
agents:
  - name: customer_agent
    agent_type: llm_agent
    llm_config:
      model: mistral-large  # BYOK supported
      api_key: your-mistral-api-key  # BYOK
      temperature: 0.7
Security Note: API keys are currently stored in plain text when you save an agent. In v0.3 we are introducing a secure key vault that can hold an API key to be used globally across all provider calls.

Model Configuration Reference

LLMConfig Schema

llm_config:
  model: string              # Model name (optional; falls back to environment default)
  temperature: float         # 0.0 - 2.0 (default: 0.7)
  max_tokens: int            # Max output tokens (optional)
  api_key: string            # Optional BYOK override

Field Descriptions

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | Environment default | Model name or display name |
| temperature | float | 0.7 | Sampling temperature (0.0 = deterministic, 2.0 = creative) |
| max_tokens | int | Model default | Maximum tokens to generate |
| api_key | string | Environment variable | Optional API key override |
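The constraints above can be checked before an agent is saved. This is a minimal validation sketch, assuming a plain dict for llm_config; it is not Atthene's actual validator.

```python
def validate_llm_config(cfg: dict) -> list[str]:
    """Return a list of validation errors for an llm_config mapping.

    Illustrative sketch of the checks implied by the schema above.
    """
    errors = []
    temperature = cfg.get("temperature", 0.7)  # documented default
    if not 0.0 <= temperature <= 2.0:
        errors.append(f"temperature must be between 0.0 and 2.0, got {temperature}")
    max_tokens = cfg.get("max_tokens")
    if max_tokens is not None and max_tokens <= 0:
        errors.append(f"max_tokens must be positive, got {max_tokens}")
    return errors
```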

Temperature Guidelines

Low Temperature (0.0 - 0.3)

Use for: Data extraction, classification, structured output
Behavior: Deterministic, focused, consistent

Medium Temperature (0.4 - 0.8)

Use for: General conversation, Q&A, analysis
Behavior: Balanced creativity and consistency

High Temperature (0.9 - 2.0)

Use for: Creative writing, brainstorming, diverse outputs
Behavior: Creative, varied, less predictable

Token Limits & Context Windows

Each model has specific token limits that affect how much context can be processed:
# Example: Using a model with large context window
agents:
  - name: long_document_analyzer
    agent_type: llm_agent
    llm_config:
      model: gemini-2.5-pro  # 1M token context window
      temperature: 0.5
    prompt_config:
      include_history: true  # Can include extensive history
Token Calculation: Input tokens + Output tokens ≤ Max Total Tokens

The framework automatically validates token limits and raises an error if they are exceeded.

Best Practices

Model selection:
  1. Start with balanced models (gpt-4o, mistral-large)
  2. Use specialized models for specific tasks (e.g., codestral for code)
  3. Consider cost vs. performance tradeoffs
  4. Test with different temperatures to find optimal settings

Multimodal processing:
  1. Use vision models only when processing images
  2. Optimize image sizes before sending them to the API
  3. Consider token costs for image processing
  4. Validate modality support before deployment

API key management:
  1. Implement key rotation for production systems
  2. Monitor usage and costs per key
  3. Use BYOK for multi-tenant isolation

Cost optimization:
  1. Set an appropriate max_tokens to avoid waste
  2. Use smaller models for simple tasks
  3. Monitor token usage and optimize prompts

Troubleshooting

Error: Unknown provider type or Model not found
Solution:
  • Check the model name spelling (matching is case-insensitive)
  • Verify the model is in the supported list
  • Use the display name or full model name

Error: 401 Unauthorized or Invalid API key
Solution:
  • Verify the API key is correct
  • Check that environment variables are set
  • Ensure the BYOK key has proper permissions
  • Validate that the API base URL is correct

Error: Token limit exceeded or Context too long
Solution:
  • Reduce the max_tokens setting
  • Limit conversation history (include_history: 5)
  • Use a model with a larger context window
  • Optimize prompt length

Error: Model does not support tools
Solution:
  • Check the model's supports_tools capability
  • Use a different model (e.g., avoid Magistral models)
  • Remove tools from the agent configuration

Examples

Multi-Provider System

name: multi_provider_system
description: System using different providers for different tasks
architecture: workflow

agents:
  - name: fast_classifier
    agent_type: llm_agent
    llm_config:
      model: mistral-small  # Fast Mistral model
      temperature: 0.2
    prompt_config:
      system_prompt: Classify user intent quickly
  
  - name: deep_analyzer
    agent_type: llm_agent
    llm_config:
      model: gemini-2.5-pro  # Large context Gemini
      temperature: 0.5
    prompt_config:
      system_prompt: Perform deep analysis with full context
  
  - name: image_processor
    agent_type: llm_agent
    llm_config:
      model: gpt-4o  # Vision-capable model
      temperature: 0.3
    prompt_config:
      system_prompt: Extract information from images

edges:
  - from: START
    to: fast_classifier
  - from: fast_classifier
    to: deep_analyzer
  - from: deep_analyzer
    to: image_processor

BYOK Multi-Tenant Example

agents:
  - name: tenant_a_agent
    agent_type: llm_agent
    llm_config:
      model: mistral-large  # BYOK supported
      api_key: ${TENANT_A_MISTRAL_KEY}  # Tenant A's Mistral key
      temperature: 0.7
  
  - name: tenant_b_agent
    agent_type: llm_agent
    llm_config:
      model: Qwen3-235B-A22B  # BYOK supported (Telekom OTC)
      api_key: ${TENANT_B_TELEKOM_KEY}  # Tenant B's Telekom key
      temperature: 0.7

Next Steps