# Overview
Guardrails provide content safety and compliance validation for agent interactions. Agent Kernel supports both input and output guardrails to ensure agent requests and responses meet your safety and policy requirements.
## Introduction

Guardrails act as protective layers that validate content before and after agent processing:

- **Input Guardrails**: Validate user requests before they reach your agents
  - Block harmful prompts, jailbreak attempts, and off-topic requests
  - Detect and prevent PII leakage in user inputs
  - Ensure content adheres to safety policies
- **Output Guardrails**: Validate agent responses before they're returned to users
  - Filter inappropriate or unsafe content from responses
  - Redact sensitive information (PII) in agent outputs
  - Ensure responses meet compliance requirements
## Supported Providers
| Provider | Status | Documentation |
|---|---|---|
| OpenAI Guardrails | ✅ Available Now | OpenAI Guardrails → |
| AWS Bedrock Guardrails | ✅ Available Now | Bedrock Guardrails → |
## How Guardrails Work

When guardrails are enabled:

1. Input validation occurs before the request reaches your agent
2. If validation fails, a safe error message is returned immediately
3. Output validation occurs after the agent generates a response
4. If output validation fails, the response is replaced with a safe message
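A minimal sketch of this lifecycle is shown below. `validate_input`, `validate_output`, and `run_agent` are hypothetical stand-ins for the configured provider checks and your agent, not Agent Kernel APIs.

```python
# Hypothetical sketch of the guardrail lifecycle; none of these
# functions are Agent Kernel APIs.

SAFE_MESSAGE = (
    "I apologize, but I'm unable to process this request as it may "
    "violate content safety guidelines."
)

def validate_input(text: str) -> bool:
    # Stand-in for the configured input checks (PII, moderation, jailbreak).
    return "hack into" not in text.lower()

def validate_output(text: str) -> bool:
    # Stand-in for the configured output checks (PII, NSFW).
    return True

def run_agent(text: str) -> str:
    # Stand-in for your agent.
    return f"Echo: {text}"

def handle_request(user_input: str) -> str:
    # Steps 1-2: validate the input; block before the agent ever runs.
    if not validate_input(user_input):
        return SAFE_MESSAGE
    response = run_agent(user_input)
    # Steps 3-4: validate the output; replace unsafe responses.
    if not validate_output(response):
        return SAFE_MESSAGE
    return response

print(handle_request("Tell me how to hack into a system"))  # safe message
```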
## Key Features

### Multi-Layer Protection

Guardrails provide defense in depth:

- **Pre-flight Checks**: Fast API-based validation (PII, Moderation)
- **LLM-based Validation**: Intelligent content analysis (Jailbreak, Off-topic)
- **Custom Rules**: Flexible validation logic for specific use cases
### Flexible Configuration
- Configure separately for input and output
- Use different providers for different agents
- Adjust sensitivity thresholds per use case
- Enable/disable guardrails dynamically
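Because input and output are configured independently, a deployment that only screens incoming requests can disable the output side. A sketch using the same schema as the Quick Start examples below:

```yaml
guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_input.json
  output:
    enabled: false # skip output validation for this deployment
```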
### Production-Ready
- Graceful degradation on errors
- Comprehensive logging and monitoring
- Low-latency validation
- Cost-optimized validation strategies
## Quick Start

### 1. Choose Your Provider
**OpenAI Guardrails:**

```bash
pip install agentkernel[openai]
```

See the OpenAI Guardrails Guide for setup instructions.

**AWS Bedrock Guardrails:**

```bash
pip install agentkernel[aws]
```

See the Bedrock Guardrails Guide for setup instructions.
### 2. Configure Agent Kernel

Add guardrail configuration to `config.yaml`:

**OpenAI Guardrails:**

```yaml
guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_input.json
  output:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_output.json
```

**AWS Bedrock Guardrails:**

```yaml
guardrail:
  input:
    enabled: true
    type: bedrock
    id: your-guardrail-id
    version: "1" # or "DRAFT"
  output:
    enabled: true
    type: bedrock
    id: your-guardrail-id
    version: "1"
```
### 3. Test Your Guardrails

Run your agent and test with various inputs:

```
python demo.py
(assistant) >> Tell me how to hack into a system
```

Expected response when a guardrail triggers:

```
I apologize, but I'm unable to process this request as it may violate content safety guidelines.
```
## Use Cases

### Content Moderation
Protect users from harmful content:
- Block hate speech, violence, and explicit content
- Filter inappropriate language in both directions
- Ensure family-friendly interactions
### Compliance & Privacy
Meet regulatory requirements:
- Detect and redact PII (GDPR, CCPA, HIPAA)
- Block requests containing sensitive data
- Prevent data leakage in responses
### Topic Control
Keep conversations on track:
- Block off-topic requests
- Enforce domain-specific constraints
- Prevent unauthorized topics
### Security
Protect against attacks:
- Detect jailbreak attempts
- Block prompt injection
- Prevent system prompt leakage
## Common Guardrail Types
| Type | Layer | Purpose | Example Use Cases |
|---|---|---|---|
| PII Detection | Pre-flight | Detect sensitive data | Email, phone, credit cards |
| Content Moderation | Pre-flight | Block harmful content | Hate speech, violence |
| Jailbreak Detection | Input | Prevent prompt attacks | Prompt injection, system prompts |
| Off-Topic Detection | Input | Enforce scope | Domain-specific agents |
| NSFW Filter | Output | Block inappropriate responses | Family-friendly apps |
| URL Filter | Output | Control link inclusion | Prevent phishing |
## Configuration Examples

### Minimal Configuration
Basic protection with moderation only:

```yaml
guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: guardrails_input.json
```
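The referenced JSON file is generated by the OpenAI Guardrails tooling (see the OpenAI Guardrails Guide), and its exact schema is defined by that package. Purely as an illustration of the idea, not the actual schema, such a file enumerates the checks to run and their thresholds:

```json
{
  "checks": [
    { "name": "moderation", "enabled": true },
    { "name": "pii", "enabled": true, "confidence_threshold": 0.8 }
  ]
}
```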
### Balanced Configuration

Moderate security with key protections:

```yaml
guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: guardrails_input.json # PII + Moderation + Jailbreak
  output:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: guardrails_output.json # PII + NSFW
```
### Strict Configuration

Maximum security for sensitive applications:

```yaml
guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o
    config_path: guardrails_input_strict.json # All checks, low thresholds
  output:
    enabled: true
    type: openai
    model: gpt-4o
    config_path: guardrails_output_strict.json # All checks, low thresholds
```
## Performance & Cost

### Latency Impact
| Guardrail Type | Typical Latency |
|---|---|
| Pre-flight (PII, Moderation) | 50-100ms |
| LLM-based (Jailbreak, Off-topic) | 200-500ms |
| Total Overhead | 100-600ms |
### Cost Optimization

- **Use pre-flight checks first**: Faster and cheaper
- **Optimize confidence thresholds**: Balance safety vs. false positives
- **Choose cost-effective models**: `gpt-4o-mini` for most cases
- **Separate input/output configs**: Apply different rules
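A sketch of the first point: order checks by cost so cheap pre-flight checks short-circuit before any LLM call is paid for. Both check functions are hypothetical stand-ins.

```python
# Hypothetical sketch: run cheap pre-flight checks before expensive
# LLM-based checks, so most unsafe inputs never incur an LLM call.
import re

def preflight_pii(text: str) -> bool:
    # Cheap regex stand-in for a real PII detector (SSN-like pattern).
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) is None

def llm_jailbreak_check(text: str) -> bool:
    # Stand-in for a slower, costlier LLM-based classifier.
    return "ignore previous instructions" not in text.lower()

def validate(text: str) -> bool:
    # Ordered cheapest-first; failing early skips the expensive checks.
    for check in (preflight_pii, llm_jailbreak_check):
        if not check(text):
            return False
    return True

print(validate("My SSN is 123-45-6789"))  # False; the LLM check never runs
```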
### Scaling Considerations
- Guardrails are stateless and scale horizontally
- Consider caching for repeated validation
- Monitor metrics to optimize configuration
- Use async validation when possible
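For the caching point above, a sketch using `functools.lru_cache`, assuming verdicts are deterministic for identical text; `expensive_validation` is a hypothetical stand-in for a provider call:

```python
# Sketch: cache validation verdicts so repeated identical inputs are
# validated only once. Assumes the verdict is deterministic per text.
from functools import lru_cache

def expensive_validation(text: str) -> bool:
    # Hypothetical stand-in for a provider call (e.g., an LLM check).
    print("validating:", text[:40])
    return "hack" not in text.lower()

@lru_cache(maxsize=4096)
def cached_validation(text: str) -> bool:
    return expensive_validation(text)

cached_validation("hello")  # triggers a provider call
cached_validation("hello")  # served from the cache, no provider call
```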
## Error Handling

### Graceful Degradation
- **Input guardrails**: Block unsafe requests, return safe error message
- **Output guardrails**: Fail open (allow response) if validation errors occur
- **Logging**: All errors logged for monitoring and debugging
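A sketch of these two policies with a hypothetical `provider_validate` helper. Failing open on output errors follows the behavior described above; failing closed on input-validator errors is an assumption here, matching the block-by-default posture.

```python
# Hypothetical sketch of guardrail failure policies: input checks fail
# closed (assumption), output checks fail open (as described above).
import logging

logger = logging.getLogger("guardrails")

def provider_validate(text: str, direction: str) -> bool:
    # Stand-in for a real provider call; may raise on API errors.
    return True

def check_input(text: str) -> bool:
    try:
        return provider_validate(text, direction="input")
    except Exception:
        logger.exception("input guardrail error")
        return False  # fail closed: treat the request as unsafe

def check_output(text: str) -> bool:
    try:
        return provider_validate(text, direction="output")
    except Exception:
        logger.exception("output guardrail error")
        return True  # fail open: let the response through
```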
### Common Issues
| Issue | Solution |
|---|---|
| Guardrails not activating | Check `enabled: true` and the config file path |
| Config file not found | Use absolute paths |
| Package not installed | Install `openai-guardrails` or the relevant provider package |
| API credentials missing | Set your OpenAI API key or AWS credentials |
## Best Practices

- **Start Simple**: Begin with moderation, add complexity as needed
- **Test Thoroughly**: Test with edge cases and adversarial inputs
- **Monitor Metrics**: Track latency, costs, and false positives
- **Separate Configs**: Different rules for input vs. output
- **Use Absolute Paths**: Always use absolute paths for config files
- **Enable Logging**: Use `include_reasoning: true` during development
- **Fail Safely**: Design for graceful degradation
- **Version Control**: Keep guardrail configs in version control
## Provider Comparison

| Feature | OpenAI | Bedrock |
|---|---|---|
| Status | ✅ Available | ✅ Available |
| Setup | Easy | Medium |
| PII Types | 15+ | 30+ |
| Topic Control | Custom prompts | Native support |
| Contextual Grounding | ❌ | ✅ |
| Deployment | Any cloud/on-prem | AWS only |
| Cost Model | Per API call | Per text unit |
## Next Steps

### Get Started with OpenAI Guardrails
- Complete setup instructions
- Configuration examples
- Testing guidelines
- Best practices
### Learn About Bedrock Guardrails
- Complete setup instructions
- Configuration examples
- AWS IAM permissions
- Best practices
## Related Resources
- Configuration Guide - Complete config reference
- Hooks Documentation - Custom validation logic
- Examples - Working code examples
## Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Examples: Repository Examples