
Multimodal Attachments

Agent Kernel supports multimodal input processing — users can send images and files alongside text, and the framework automatically handles description generation, storage, and context injection.

Overview

Key Design Decisions

  • No raw binary in session history — Images/files are stored externally; only text descriptions enter the conversation. This prevents session bloat.
  • Pluggable storage — Choose between in-memory, Redis, or DynamoDB depending on your deployment.
  • Automatic description — A vision-capable LLM generates brief descriptions of each attachment.
  • System tool for recall — The agent can call analyze_attachments to retrieve previously stored images/files.
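The first and third decisions can be sketched as below. This is a minimal illustration under stated assumptions, not Agent Kernel's implementation: `store_and_describe`, the `describe` callback, the `att_` ID format, and the history line format are all hypothetical names chosen for the example.

```python
import uuid
from typing import Callable, Dict, List


def store_and_describe(
    data: bytes,
    store: Dict[str, bytes],
    history: List[str],
    describe: Callable[[bytes], str],
    description_max_length: int = 200,
) -> str:
    """Keep raw bytes outside the history; append only a text description."""
    attachment_id = f"att_{uuid.uuid4().hex[:8]}"  # hypothetical ID format
    store[attachment_id] = data  # binary goes to external storage
    # Truncate the generated description to the configured maximum length.
    description = describe(data)[:description_max_length]
    history.append(f"[attachment {attachment_id}] {description}")
    return attachment_id


# Usage with a stub in place of the vision-capable LLM.
store: Dict[str, bytes] = {}
history: List[str] = []
att_id = store_and_describe(b"\x89PNG...", store, history, lambda _: "A dog in a park.")
```

The history only ever holds short strings, so its size does not depend on how large the attachments are.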

Enabling Multimodal Support

Environment Variables

export AK_MULTIMODAL__ENABLED=true

Configuration File

multimodal:
  enabled: true
  max_attachments: 10          # Max attachments per session
  description_max_length: 200  # Max chars for auto-generated descriptions
  storage_type: in_memory      # Default; no session bloat

Attachment Storage

Attachments are stored outside the session to prevent session bloat. The storage backend is independent of your session storage — you can use Redis sessions with in-memory attachment storage, or vice versa.

In-Memory (Default)

Fast, ephemeral storage. Attachments live in a module-level dictionary — not inside the session object.

export AK_MULTIMODAL__STORAGE_TYPE=in_memory

| Trait | Value |
| --- | --- |
| Session bloat | ❌ None |
| Persistence | ❌ Lost on restart |
| Setup | ✅ None required |
| Best for | Development, testing |
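A module-level dictionary store might look like the sketch below. The function names and the `(session_id, attachment_id)` key are assumptions made for illustration; the framework's actual data layout may differ.

```python
from typing import Dict, Optional, Tuple

# Module-level dictionary: it lives only as long as the Python process does,
# and it is separate from any session object.
_ATTACHMENTS: Dict[Tuple[str, str], bytes] = {}


def put_attachment(session_id: str, attachment_id: str, data: bytes) -> None:
    """Store raw attachment bytes under a (session, attachment) key."""
    _ATTACHMENTS[(session_id, attachment_id)] = data


def get_attachment(session_id: str, attachment_id: str) -> Optional[bytes]:
    """Return the stored bytes, or None if the attachment is unknown."""
    return _ATTACHMENTS.get((session_id, attachment_id))


put_attachment("s1", "att_123", b"image-bytes")
```

Because the dictionary belongs to one process, its contents disappear on restart and are not visible to other worker processes.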

Redis

Persistent storage for production. Requires a Redis server.

export AK_MULTIMODAL__STORAGE_TYPE=redis
export AK_MULTIMODAL__REDIS__URL=redis://localhost:6379
export AK_MULTIMODAL__REDIS__PREFIX=ak:attachments:
export AK_MULTIMODAL__REDIS__TTL=3600

| Trait | Value |
| --- | --- |
| Session bloat | ❌ None |
| Persistence | ✅ Across restarts |
| Setup | 🔧 Redis server |
| Best for | Containerized production |
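The prefix and TTL settings could combine as in the sketch below. It assumes that the key is the prefix followed by the attachment ID and that the TTL is given in seconds; neither is confirmed by this page. `save_attachment` is a hypothetical helper, and `client` is expected to behave like a redis-py `redis.Redis` instance (for example one created with `redis.Redis.from_url(url)`).

```python
from typing import Any


def attachment_key(prefix: str, attachment_id: str) -> str:
    """Build the storage key (assumed layout: prefix + attachment ID)."""
    return f"{prefix}{attachment_id}"


def save_attachment(
    client: Any,
    attachment_id: str,
    data: bytes,
    prefix: str = "ak:attachments:",
    ttl: int = 3600,
) -> str:
    """Write the attachment with an expiry so stale entries clean themselves up."""
    key = attachment_key(prefix, attachment_id)
    client.set(key, data, ex=ttl)  # redis-py: ex= is the expiry in seconds
    return key
```

With the values shown above, an attachment stored as `att_123` would live under `ak:attachments:att_123` and expire after one hour.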

DynamoDB

Serverless storage for AWS deployments.

export AK_MULTIMODAL__STORAGE_TYPE=dynamodb
export AK_MULTIMODAL__DYNAMODB__TABLE_NAME=ak-attachments
export AK_MULTIMODAL__DYNAMODB__REGION=us-east-1
export AK_MULTIMODAL__DYNAMODB__TTL=3600

| Trait | Value |
| --- | --- |
| Session bloat | ❌ None |
| Persistence | ✅ Fully managed |
| Setup | 🔧 AWS account + table |
| Best for | AWS Lambda deployments |
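DynamoDB's built-in TTL feature expires items based on a Number attribute that holds an epoch timestamp in seconds. The sketch below shows how a TTL setting of 3600 could translate into such an attribute. The attribute names (`attachment_id`, `data`, `expires_at`) and the `build_item` helper are hypothetical, and the sketch assumes the TTL setting is a lifetime in seconds.

```python
import time
from typing import Any, Dict, Optional


def build_item(
    attachment_id: str,
    data: bytes,
    ttl: int = 3600,
    now: Optional[float] = None,
) -> Dict[str, Any]:
    """Build an item whose expires_at attribute DynamoDB TTL could act on."""
    now = time.time() if now is None else now
    return {
        "attachment_id": attachment_id,  # hypothetical partition key
        "data": data,
        "expires_at": int(now) + ttl,  # epoch seconds
    }


# A fixed clock value keeps the example deterministic.
item = build_item("att_123", b"image-bytes", ttl=3600, now=1_700_000_000)
```

An item like this could then be written with boto3's `Table.put_item(Item=item)`, provided TTL is enabled on the table for the chosen attribute.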

Session Cache (Legacy)

Warning: This stores attachments inside the session object, causing session size to grow with each attachment. Use only for backward compatibility.

export AK_MULTIMODAL__STORAGE_TYPE=session_cache

The analyze_attachments System Tool

When multimodal is enabled, a system tool called analyze_attachments is automatically registered on all agents. This allows the agent to retrieve and re-analyze previously stored attachments.

# attachment_ids usually come from the multimodal storage layer
analyze_attachments(
    attachment_ids=["att_123", "att_456"],
    prompt="What breed is the dog?",
)

The tool:

  1. Takes a list of attachment IDs (returned when attachments are stored by the multimodal pre-hook or storage backend; see the attachment_id in the sequence diagram above)
  2. Fetches those attachments from storage
  3. Sends them (with the prompt) to the vision LLM
  4. Returns a detailed analysis

This enables multi-turn conversations about images:

User: [sends photo of a dog]
Agent: I see a golden retriever sitting in a park.

User: What breed is it exactly?
Agent: [calls analyze_attachments] It's a Golden Retriever, approximately 2-3 years old...
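The four steps the tool performs can be sketched as a plain function. This is an illustration only: the real tool takes just `attachment_ids` and `prompt`, while here the storage (`store`) and the vision model (`vision_llm`) are passed in explicitly as stand-ins, and the error message format is invented.

```python
from typing import Callable, Dict, List


def analyze_attachments_sketch(
    attachment_ids: List[str],
    prompt: str,
    store: Dict[str, bytes],
    vision_llm: Callable[[List[bytes], str], str],
) -> str:
    """Fetch stored attachments and ask a vision model about them."""
    missing = [a for a in attachment_ids if a not in store]
    if missing:
        # Report unknown IDs instead of calling the model with partial input.
        return f"Attachments not found: {', '.join(missing)}"
    blobs = [store[a] for a in attachment_ids]  # step 2: fetch from storage
    return vision_llm(blobs, prompt)  # steps 3 and 4: analyze and return


# Usage with a stub model that just echoes what it received.
store = {"att_123": b"dog-photo"}
stub_llm = lambda blobs, prompt: f"{len(blobs)} attachment(s) analyzed for: {prompt}"
answer = analyze_attachments_sketch(["att_123"], "What breed is the dog?", store, stub_llm)
```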

Configuration Reference

Full config.yaml Example

multimodal:
  enabled: true
  max_attachments: 10
  description_max_length: 200
  storage_type: in_memory  # in_memory | redis | dynamodb | session_cache

  redis:
    url: "redis://localhost:6379"
    prefix: "ak:attachments:"
    ttl: 3600

  dynamodb:
    table_name: "ak-attachments"
    region: "us-east-1"
    ttl: 3600

Environment Variables

# Core
export AK_MULTIMODAL__ENABLED=true
export AK_MULTIMODAL__MAX_ATTACHMENTS=10
export AK_MULTIMODAL__DESCRIPTION_MAX_LENGTH=200
export AK_MULTIMODAL__STORAGE_TYPE=in_memory

# Redis storage
export AK_MULTIMODAL__REDIS__URL=redis://localhost:6379
export AK_MULTIMODAL__REDIS__PREFIX=ak:attachments:
export AK_MULTIMODAL__REDIS__TTL=3600

# DynamoDB storage
export AK_MULTIMODAL__DYNAMODB__TABLE_NAME=ak-attachments
export AK_MULTIMODAL__DYNAMODB__REGION=us-east-1
export AK_MULTIMODAL__DYNAMODB__TTL=3600
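The variable names suggest a simple mapping onto the YAML structure: strip the `AK_` prefix, lowercase the rest, and treat each double underscore as one level of nesting. The sketch below illustrates that reading. `env_to_config` is a hypothetical helper, not the framework's loader, and it leaves every value as a string rather than coercing types.

```python
from typing import Any, Dict, Mapping


def env_to_config(
    env: Mapping[str, str],
    prefix: str = "AK_",
    delimiter: str = "__",
) -> Dict[str, Any]:
    """Turn prefixed environment variables into a nested dictionary."""
    config: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables
        path = name[len(prefix):].lower().split(delimiter)
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})  # descend, creating levels as needed
        node[path[-1]] = value
    return config


config = env_to_config({
    "AK_MULTIMODAL__ENABLED": "true",
    "AK_MULTIMODAL__MAX_ATTACHMENTS": "10",
    "AK_MULTIMODAL__REDIS__URL": "redis://localhost:6379",
    "PATH": "/usr/bin",
})
```

Single underscores stay part of the key name, so `MAX_ATTACHMENTS` maps to `max_attachments` rather than to another nesting level.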

Storage Backend Comparison

| Feature | In-Memory | Redis | DynamoDB | Session Cache |
| --- | --- | --- | --- | --- |
| Session Bloat | ❌ None | ❌ None | ❌ None | ⚠️ Yes |
| Persistence | ❌ Lost on restart | ✅ Persistent | ✅ Persistent | ✅ With session |
| Multi-Process | ❌ Single process | ✅ Distributed | ✅ Distributed | Depends on session |
| Setup | ✅ None | 🔧 Redis server | 🔧 AWS account | ✅ None |
| Best For | Development | Production | Serverless | Legacy only |

Supported Integrations

Multimodal attachments are supported on the following platforms:

| Platform | Images | Files | Notes |
| --- | --- | --- | --- |
| Telegram | ✅ | ✅ | Photos + documents |
| REST API | ✅ | ✅ | Via AgentRequestImage / AgentRequestFile |
| CLI | ❌ | ❌ | Text only |