
Overview

Guardrails provide content safety and compliance validation for agent interactions. Agent Kernel supports both input and output guardrails to ensure agent requests and responses meet your safety and policy requirements.

Introduction

Guardrails act as protective layers that validate content before and after agent processing:

  • Input Guardrails: Validate user requests before they reach your agents

    • Block harmful prompts, jailbreak attempts, and off-topic requests
    • Detect and prevent PII leakage in user inputs
    • Ensure content adheres to safety policies
  • Output Guardrails: Validate agent responses before they're returned to users

    • Filter inappropriate or unsafe content from responses
    • Redact sensitive information (PII) in agent outputs
    • Ensure responses meet compliance requirements

Supported Providers

| Provider | Status | Documentation |
|----------|--------|---------------|
| OpenAI Guardrails | ✅ Available Now | OpenAI Guardrails → |
| AWS Bedrock Guardrails | ✅ Available Now | Bedrock Guardrails → |

How Guardrails Work

When guardrails are enabled, each request flows through the following stages (see the sketch after this list):

  1. Input validation occurs before the request reaches your agent
  2. If validation fails, a safe error message is returned immediately
  3. Output validation occurs after the agent generates a response
  4. If output validation fails, the response is replaced with a safe message
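
The sketch below mirrors those four steps. It is illustrative only, not the Agent Kernel API; validate_input, run_agent, and validate_output are hypothetical stand-ins for the configured guardrails and your agent.

# Hypothetical sketch of the pipeline above; these helpers are
# illustrative stand-ins, not Agent Kernel APIs.

SAFE_MESSAGE = (
    "I apologize, but I'm unable to process this request as it may "
    "violate content safety guidelines."
)

def validate_input(text: str) -> bool:
    return True  # placeholder: configured input guardrail call

def run_agent(text: str) -> str:
    return "agent response"  # placeholder: your agent

def validate_output(text: str) -> bool:
    return True  # placeholder: configured output guardrail call

def handle_request(user_input: str) -> str:
    if not validate_input(user_input):  # 1. validate before the agent runs
        return SAFE_MESSAGE             # 2. fail fast with a safe message
    response = run_agent(user_input)
    if not validate_output(response):   # 3. validate the generated response
        return SAFE_MESSAGE             # 4. replace unsafe output
    return response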

Key Features

Multi-Layer Protection

Guardrails provide defense in depth, layering fast checks in front of deeper analysis (see the sketch after this list):

  • Pre-flight Checks: Fast API-based validation (PII, Moderation)
  • LLM-based Validation: Intelligent content analysis (Jailbreak, Off-topic)
  • Custom Rules: Flexible validation logic for specific use cases
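
A minimal sketch of that layering, assuming hypothetical check functions: cheap checks run first, so the expensive LLM-based pass only runs on content that survives them.

# Illustrative layering only; each check function is a placeholder,
# not an Agent Kernel or provider API.
import re

def preflight_checks(text: str) -> bool:
    # Fast, pattern/API-based screening (e.g. PII, moderation).
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text) is None

def llm_checks(text: str) -> bool:
    # Slower, LLM-based analysis (jailbreak, off-topic); stubbed here.
    return True

def custom_rules(text: str) -> bool:
    # Application-specific logic, e.g. a length cap.
    return len(text) <= 4000

def validate(text: str) -> bool:
    # Short-circuits: later, costlier layers only run if earlier ones pass.
    return preflight_checks(text) and llm_checks(text) and custom_rules(text)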

Flexible Configuration

  • Configure separately for input and output
  • Use different providers for different agents
  • Adjust sensitivity thresholds per use case
  • Enable/disable guardrails dynamically

Production-Ready

  • Graceful degradation on errors
  • Comprehensive logging and monitoring
  • Low-latency validation
  • Cost-optimized validation strategies

Quick Start

1. Choose Your Provider

OpenAI Guardrails:

pip install agentkernel[openai]

See the OpenAI Guardrails Guide for setup instructions.

AWS Bedrock Guardrails:

pip install agentkernel[aws]

See the Bedrock Guardrails Guide for setup instructions.

2. Configure Agent Kernel

Add guardrail configuration to config.yaml:

OpenAI Guardrails:

guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_input.json
  output:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_output.json

AWS Bedrock Guardrails:

guardrail:
  input:
    enabled: true
    type: bedrock
    id: your-guardrail-id
    version: "1" # or "DRAFT"
  output:
    enabled: true
    type: bedrock
    id: your-guardrail-id
    version: "1"

3. Test Your Guardrails

Run your agent and test with various inputs:

python demo.py
(assistant) >> Tell me how to hack into a system

Expected response when guardrail triggers:

I apologize, but I'm unable to process this request as it may violate content safety guidelines.

Use Cases

Content Moderation

Protect users from harmful content:

  • Block hate speech, violence, and explicit content
  • Filter inappropriate language in both directions
  • Ensure family-friendly interactions

Compliance & Privacy

Meet regulatory requirements:

  • Detect and redact PII (GDPR, CCPA, HIPAA)
  • Block requests containing sensitive data
  • Prevent data leakage in responses

Topic Control

Keep conversations on track:

  • Block off-topic requests
  • Enforce domain-specific constraints
  • Prevent unauthorized topics

Security

Protect against attacks:

  • Detect jailbreak attempts
  • Block prompt injection
  • Prevent system prompt leakage

Common Guardrail Types

| Type | Layer | Purpose | Example Use Cases |
|------|-------|---------|-------------------|
| PII Detection | Pre-flight | Detect sensitive data | Email, phone, credit cards |
| Content Moderation | Pre-flight | Block harmful content | Hate speech, violence |
| Jailbreak Detection | Input | Prevent prompt attacks | Prompt injection, system prompts |
| Off-Topic Detection | Input | Enforce scope | Domain-specific agents |
| NSFW Filter | Output | Block inappropriate responses | Family-friendly apps |
| URL Filter | Output | Control link inclusion | Prevent phishing |
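
As a toy illustration of the pre-flight PII layer (pattern-based only; the actual providers use far more robust detection than these regexes):

# Toy pre-flight PII detector; illustrative, not a provider implementation.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def detect_pii(text: str) -> list[str]:
    # Names of PII categories found in the text.
    return [name for name, pat in PII_PATTERNS.items() if pat.search(text)]

def redact_pii(text: str) -> str:
    # Replace each match with a category marker.
    for name, pat in PII_PATTERNS.items():
        text = pat.sub(f"[{name.upper()} REDACTED]", text)
    return text

print(detect_pii("Call me at 555-123-4567"))  # ['phone']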

Configuration Examples

Minimal Configuration

Basic protection with moderation only:

guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: guardrails_input.json

Balanced Configuration

Moderate security with key protections:

guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: guardrails_input.json # PII + Moderation + Jailbreak
  output:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: guardrails_output.json # PII + NSFW

Strict Configuration

Maximum security for sensitive applications:

guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o
    config_path: guardrails_input_strict.json # All checks, low thresholds
  output:
    enabled: true
    type: openai
    model: gpt-4o
    config_path: guardrails_output_strict.json # All checks, low thresholds

Performance & Cost

Latency Impact

| Guardrail Type | Typical Latency |
|----------------|-----------------|
| Pre-flight (PII, Moderation) | 50-100ms |
| LLM-based (Jailbreak, Off-topic) | 200-500ms |
| Total Overhead | 100-600ms |

Cost Optimization

  1. Use pre-flight checks first - Faster and cheaper
  2. Optimize confidence thresholds - Balance safety vs. false positives (see the sketch after this list)
  3. Choose cost-effective models - gpt-4o-mini for most cases
  4. Separate input/output configs - Apply different rules
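
How a confidence threshold trades safety against false positives, as a sketch; the actual threshold fields and their semantics depend on the provider's config format:

# Illustrative only: lower thresholds block more aggressively (stricter,
# more false positives); higher thresholds block less (more misses).
def should_block(confidence: float, threshold: float) -> bool:
    return confidence >= threshold

# A jailbreak detector that is 0.62 confident:
print(should_block(0.62, threshold=0.5))  # True  (strict)
print(should_block(0.62, threshold=0.8))  # False (lenient)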

Scaling Considerations

  • Guardrails are stateless and scale horizontally
  • Consider caching for repeated validation (see the sketch after this list)
  • Monitor metrics to optimize configuration
  • Use async validation when possible
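
A sketch of verdict caching, assuming the guardrail call is deterministic for identical text; in a multi-process deployment you would key an external cache (e.g. Redis) on a content hash instead:

# Cache guardrail verdicts for repeated inputs; call_guardrail is a
# placeholder for the real (slow, metered) provider call.
from functools import lru_cache

def call_guardrail(text: str) -> bool:
    return "hack" not in text.lower()  # placeholder check

@lru_cache(maxsize=10_000)
def validate_cached(text: str) -> bool:
    # Identical inputs reuse the cached verdict instead of paying for
    # another validation round-trip.
    return call_guardrail(text)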

Error Handling

Graceful Degradation

  • Input guardrails: Block unsafe requests, return safe error message
  • Output guardrails: Fail open (allow the response) if validation errors occur (see the sketch after this list)
  • Logging: All errors logged for monitoring and debugging
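
A sketch of this asymmetry, with a placeholder check function; failing closed on input errors is one plausible policy, shown here for contrast with the documented fail-open output behavior:

# Error-handling policy sketch only; check_text is a placeholder for
# the provider validation call.
import logging

logger = logging.getLogger("guardrails")

def check_text(text: str) -> bool:
    return True  # placeholder guardrail call

def input_allowed(text: str) -> bool:
    try:
        return check_text(text)
    except Exception:
        logger.exception("input guardrail error")
        return False  # assumed policy: fail closed, block the request

def output_allowed(text: str) -> bool:
    try:
        return check_text(text)
    except Exception:
        logger.exception("output guardrail error")
        return True   # documented policy: fail open, allow the response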

Common Issues

| Issue | Solution |
|-------|----------|
| Guardrails not activating | Check enabled: true and config file path |
| Config file not found | Use absolute paths |
| Package not installed | Install openai-guardrails or the provider package |
| API credentials missing | Set OpenAI API key or AWS credentials |

Best Practices

  1. Start Simple: Begin with moderation, add complexity as needed
  2. Test Thoroughly: Test with edge cases and adversarial inputs
  3. Monitor Metrics: Track latency, costs, and false positives
  4. Separate Configs: Different rules for input vs. output
  5. Use Absolute Paths: Always use absolute paths for config files
  6. Enable Logging: Use include_reasoning: true during development
  7. Fail Safely: Design for graceful degradation
  8. Version Control: Keep guardrail configs in version control

Provider Comparison

| Feature | OpenAI | Bedrock |
|---------|--------|---------|
| Status | ✅ Available | ✅ Available |
| Setup | Easy | Medium |
| PII Types | 15+ | 30+ |
| Topic Control | Custom prompts | Native support |
| Contextual Grounding | ❌ | ✅ |
| Deployment | Any cloud/on-prem | AWS only |
| Cost Model | Per API call | Per text unit |

Next Steps

Get Started with OpenAI Guardrails

👉 OpenAI Guardrails Guide

  • Complete setup instructions
  • Configuration examples
  • Testing guidelines
  • Best practices

Learn About Bedrock Guardrails

👉 Bedrock Guardrails Guide

  • Complete setup instructions
  • Configuration examples
  • AWS IAM permissions
  • Best practices
