# AWS Serverless Deployment

Deploy agents to AWS Lambda for auto-scaling, serverless execution.
## Architecture
## Prerequisites

- AWS CLI configured
- AWS credentials with Lambda/API Gateway permissions
- Agent Kernel with AWS extras:

```bash
pip install "agentkernel[aws]"
```
## Deployment

### 1. Install Dependencies

```bash
pip install "agentkernel[aws,openai]"
```

### 2. Configure

Refer to the Terraform modules for configuration details.

### 3. Deploy

```bash
terraform init && terraform apply
```
## Lambda Handler

Your agent code stays the same; just expose the Lambda handler:

```python
from agents import Agent as OpenAIAgent

from agentkernel.aws import Lambda
from agentkernel.openai import OpenAIModule

agent = OpenAIAgent(name="assistant", ...)
OpenAIModule([agent])

handler = Lambda.handler
```
## API Endpoints

After deployment:

```
POST https://{api-id}.execute-api.us-east-1.amazonaws.com/prod/chat
```

Body:

```json
{
  "agent": "assistant",
  "message": "Hello!",
  "session_id": "user-123"
}
```
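For local testing, the request above can be assembled as an API Gateway proxy-style event before anything is deployed (a sketch; the field names mirror the body shown above, and the event shape follows the standard API Gateway proxy format):

```python
import json

# Hypothetical API Gateway proxy event for the POST /chat request above --
# roughly what the deployed endpoint hands to the Lambda handler.
event = {
    "httpMethod": "POST",
    "path": "/chat",
    "headers": {"Content-Type": "application/json"},
    "body": json.dumps(
        {
            "agent": "assistant",
            "message": "Hello!",
            "session_id": "user-123",
        }
    ),
}

# A handler decodes the body back into the request fields.
payload = json.loads(event["body"])
```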
## Cost Optimization

### Lambda Configuration

- Memory: 512 MB
- Timeout: 30 seconds

Refer to the Terraform modules to update these settings.
### Cold Start Mitigation
- Use provisioned concurrency for critical endpoints
- Keep Lambda warm with scheduled pings
- Optimize package size
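One common way to implement the scheduled-ping idea is a short-circuit at the top of the handler. A minimal sketch, assuming an EventBridge rule that sends `{"warmer": true}` every few minutes (the `warmer` field is an invented convention for illustration, not part of Agent Kernel):

```python
def handler(event, context=None):
    # Answer scheduled keep-warm pings before doing any agent work, so the
    # ping costs almost nothing but keeps the execution environment alive.
    if isinstance(event, dict) and event.get("warmer"):
        return {"statusCode": 200, "body": "warm"}
    # ... normal request handling (delegate to the real agent handler) ...
    return {"statusCode": 200, "body": "handled"}
```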
## Fault Tolerance

AWS Lambda deployments are fault-tolerant by default: the underlying infrastructure is fully managed by AWS.

### Serverless Resilience by Design

Lambda provides substantial fault tolerance out of the box:

Key Features:

- Multi-AZ execution by default
- No infrastructure to manage
- Automatic scaling with demand
- Built-in retry mechanisms (for asynchronous invocations)
- AWS handles infrastructure failures; application errors remain your responsibility
### Multi-AZ Architecture

Automatic Distribution:

- Lambda runs functions across multiple Availability Zones in the Region
- No configuration required
- Survives the failure of an entire AZ
- Transparent to application code
Benefits:
- Zone-level isolation
- Geographic redundancy
- No single point of failure
- AWS-managed failover
### Automatic Retry Logic

Retry behavior depends on how the function is invoked:

Synchronous Invocations (API Gateway):

Lambda does not retry synchronous invocations. A function error is returned to the caller, which decides whether to retry:

```
1st attempt → Failure
    ↓
Error response to client
    ↓
Client retries (ideally with backoff)
```

Asynchronous Invocations (events):

Lambda retries failed asynchronous invocations up to two more times, with delays between attempts. Errors that commonly lead to retries:

- Function errors (unhandled exceptions)
- Throttling errors (429)
- Service errors (5xx)
- Timeout errors
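For synchronous (API Gateway) calls, the client side often adds its own retry with exponential backoff rather than relying on the platform. A minimal sketch (an illustrative helper, not an Agent Kernel API):

```python
import time

def invoke_with_retry(call, max_attempts=3, base_delay=0.5):
    """Retry a synchronous call with exponential backoff (0.5s, 1s, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In production you would typically retry only retryable failures (429 throttles, 5xx responses) and add jitter so many clients do not retry in lockstep.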
### Scaling and Availability

Elastic Scaling:

- Automatically scales to handle request volume, up to account concurrency quotas
- Each request runs in an isolated execution environment
- No capacity planning needed
- No manual intervention required
Concurrency Management:

```hcl
# Optional: reserve capacity for critical functions
resource "aws_lambda_function" "agent" {
  # ... function_name, runtime, handler, and role elided ...
  reserved_concurrent_executions = 100
}

# Optional: provisioned concurrency (eliminates cold starts)
resource "aws_lambda_provisioned_concurrency_config" "agent" {
  function_name                     = aws_lambda_function.agent.function_name
  qualifier                         = aws_lambda_function.agent.version  # requires publish = true
  provisioned_concurrent_executions = 10
}
```
Benefits:
- Handle traffic spikes automatically
- No over-provisioning
- Pay only for actual usage
- Scales within AWS account concurrency quotas
### State Persistence with DynamoDB

Serverless-native state management with strong resilience:

```bash
export AK_SESSION__TYPE=dynamodb
export AK_SESSION__DYNAMODB__TABLE_NAME=agent-kernel-sessions
```
DynamoDB Fault Tolerance:

- Multi-AZ replication - data automatically replicated across three AZs
- Point-in-time recovery (PITR) - restore to any second in the last 35 days
- Continuous backups - automatic and continuous
- 99.99% availability SLA (99.999% for global tables)
- Global tables (optional) - multi-region replication
### Recovery Time and Point Objectives

Recovery Time Objective (RTO):

- Function failure: seconds (retried automatically for asynchronous invocations; clients retry synchronous ones)
- AZ failure: ~0 seconds (multi-AZ by default)
- Region failure: requires a multi-region setup
Recovery Point Objective (RPO):
- DynamoDB: Continuous (synchronous multi-AZ replication)
- Data loss: 0 (with proper DynamoDB configuration)
### Fault Tolerance Benefits

Compared to Traditional Servers:

- ✅ No servers to fail over or replace
- ✅ No patching required (managed by AWS)
- ✅ No capacity planning
- ✅ Automatic scaling
- ✅ Built-in redundancy
Compared to ECS:
- ✅ Zero infrastructure management
- ✅ Elastic scaling with no capacity to manage
- ✅ Pay only for usage
- ⚠️ Higher latency (cold starts)
- ⚠️ 15-minute execution limit
## Session Storage

For serverless deployments, use DynamoDB or ElastiCache Redis for session persistence.

### DynamoDB (Recommended for Serverless)

```bash
export AK_SESSION__TYPE=dynamodb
export AK_SESSION__DYNAMODB__TABLE_NAME=agent-kernel-sessions
export AK_SESSION__DYNAMODB__TTL=3600  # 1 hour
```
Benefits:
- Serverless, fully managed
- Auto-scaling
- No cold starts
- Pay-per-use
- AWS-native integration
Requirements:

- DynamoDB table with partition key `session_id` (String) and sort key `key` (String)
- Lambda IAM role with DynamoDB permissions (`dynamodb:GetItem`, `dynamodb:PutItem`, `dynamodb:UpdateItem`, `dynamodb:DescribeTable`)
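DynamoDB's TTL feature deletes items based on an epoch-seconds attribute, so the `AK_SESSION__DYNAMODB__TTL=3600` setting above amounts to stamping each session item with an expiry timestamp. A stdlib sketch of that arithmetic (the `expires_at` attribute name and item layout are illustrative assumptions):

```python
import time

SESSION_TTL_SECONDS = 3600  # mirrors AK_SESSION__DYNAMODB__TTL above

def expires_at(now=None):
    """Epoch-seconds timestamp that DynamoDB's TTL feature checks on each item."""
    now = time.time() if now is None else now
    return int(now) + SESSION_TTL_SECONDS

# Illustrative session item; key names follow the table schema above.
item = {
    "session_id": "user-123",  # partition key (String)
    "key": "history",          # sort key (String)
    "expires_at": expires_at(),
}
```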
### ElastiCache Redis

```bash
export AK_SESSION__TYPE=redis
export AK_SESSION__REDIS__URL=redis://elasticache-endpoint:6379
```
Benefits:
- High performance
- Shared cache across functions
Note: Redis requires VPC configuration for Lambda, which can impact cold start times.
## Monitoring

CloudWatch metrics are automatically available:
- Invocation count
- Duration
- Errors
- Concurrent executions
## Best Practices
- Use DynamoDB for session storage (serverless-native)
- Alternatively, use Redis for session storage if already using ElastiCache
- Set appropriate timeout (30-60s for LLM calls)