OpenAI Guardrails
OpenAI Guardrails provide comprehensive content safety validation for Agent Kernel using the openai-guardrails library. This integration enables you to validate both user inputs and agent outputs against configurable safety policies.
Overview
OpenAI Guardrails support:
- Pre-flight Checks: Fast validation before LLM calls (PII detection, content moderation)
- Input Validation: Jailbreak detection, off-topic prompts, custom checks
- Output Validation: PII filtering, NSFW content blocking, URL filtering
Installation
Install Agent Kernel with the `openai` extra, which includes the `openai-guardrails` package:

```bash
pip install agentkernel[openai]
```
Configuration
Step 1: Create Guardrail Configuration Files
Create JSON configuration files that define your guardrail rules. You can use the interactive OpenAI Guardrails Builder to generate configurations.
Input Guardrails Configuration
Example (guardrails_input.json):
```json
{
  "version": 1,
  "pre_flight": {
    "version": 1,
    "guardrails": [
      {
        "name": "Contains PII",
        "config": {
          "entities": [
            "CREDIT_CARD",
            "EMAIL_ADDRESS",
            "PHONE_NUMBER",
            "CVV",
            "CRYPTO",
            "DATE_TIME",
            "IBAN_CODE",
            "BIC_SWIFT",
            "IP_ADDRESS",
            "LOCATION",
            "MEDICAL_LICENSE",
            "NRP",
            "PERSON",
            "URL"
          ]
        }
      },
      {
        "name": "Moderation",
        "config": {
          "categories": [
            "sexual",
            "sexual/minors",
            "hate",
            "hate/threatening",
            "harassment",
            "harassment/threatening",
            "self-harm",
            "self-harm/intent",
            "self-harm/instructions",
            "violence",
            "violence/graphic",
            "illicit",
            "illicit/violent"
          ]
        }
      }
    ]
  },
  "input": {
    "version": 1,
    "guardrails": [
      {
        "name": "Jailbreak",
        "config": {
          "confidence_threshold": 0.7,
          "model": "gpt-4o-mini",
          "include_reasoning": false
        }
      },
      {
        "name": "Off Topic Prompts",
        "config": {
          "confidence_threshold": 0.7,
          "model": "gpt-4o-mini",
          "system_prompt_details": "You are a helpful assistant for customer service. Keep responses focused on customer service topics.",
          "include_reasoning": false
        }
      },
      {
        "name": "Custom Prompt Check",
        "config": {
          "confidence_threshold": 0.7,
          "model": "gpt-4o-mini",
          "system_prompt_details": "You are a general knowledge assistant. Raise the guardrail if questions aren't focused on general knowledge.",
          "include_reasoning": false
        }
      }
    ]
  },
  "output": {
    "version": 1,
    "guardrails": []
  }
}
```
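Because a malformed configuration file is easy to ship by accident, it can help to sanity-check the file before deployment. The sketch below uses only the standard library and validates the structure shown above (the three stages, their versions, and the `name`/`config` shape of each guardrail); it is an illustrative helper, not part of the `openai-guardrails` API:

```python
import json

# The three stages the configuration schema above defines.
EXPECTED_STAGES = ("pre_flight", "input", "output")


def validate_guardrail_config(path: str) -> list[str]:
    """Load a guardrail config file and return the guardrail names found,
    raising if the structure deviates from the documented schema."""
    with open(path) as f:
        config = json.load(f)

    if config.get("version") != 1:
        raise ValueError("unsupported top-level version")

    names = []
    for stage in EXPECTED_STAGES:
        section = config[stage]  # KeyError if a stage is missing entirely
        if section.get("version") != 1:
            raise ValueError(f"unsupported version in {stage!r}")
        for guardrail in section["guardrails"]:
            if "name" not in guardrail or "config" not in guardrail:
                raise ValueError(f"malformed guardrail entry in {stage!r}")
            names.append(guardrail["name"])
    return names
```

Running this against both `guardrails_input.json` and `guardrails_output.json` at startup catches typos in stage names or missing `config` keys before any request reaches the guardrails.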
Output Guardrails Configuration
Example (guardrails_output.json):
```json
{
  "version": 1,
  "pre_flight": {
    "version": 1,
    "guardrails": []
  },
  "input": {
    "version": 1,
    "guardrails": []
  },
  "output": {
    "version": 1,
    "guardrails": [
      {
        "name": "Contains PII",
        "config": {
          "entities": [
            "CREDIT_CARD",
            "CVV",
            "CRYPTO",
            "DATE_TIME",
            "EMAIL_ADDRESS",
            "IBAN_CODE",
            "BIC_SWIFT",
            "IP_ADDRESS",
            "LOCATION",
            "MEDICAL_LICENSE",
            "PHONE_NUMBER",
            "URL"
          ],
          "block": true
        }
      },
      {
        "name": "URL Filter",
        "config": {}
      },
      {
        "name": "Custom Prompt Check",
        "config": {
          "confidence_threshold": 0.7,
          "model": "gpt-4o-mini",
          "system_prompt_details": "You are a general knowledge assistant. Raise the guardrail if the response isn't appropriate.",
          "include_reasoning": false
        }
      },
      {
        "name": "NSFW Text",
        "config": {
          "confidence_threshold": 0.7,
          "model": "gpt-4o-mini",
          "include_reasoning": false
        }
      },
      {
        "name": "Moderation",
        "config": {
          "categories": [
            "sexual",
            "hate",
            "harassment",
            "violence"
          ]
        }
      }
    ]
  }
}
```
Step 2: Configure Agent Kernel
Configure guardrails in your config.yaml:
```yaml
guardrail:
  input:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_input.json
  output:
    enabled: true
    type: openai
    model: gpt-4o-mini
    config_path: /path/to/guardrails_output.json
```
Or via environment variables:
```bash
# Input guardrails
export AK_GUARDRAIL__INPUT__ENABLED=true
export AK_GUARDRAIL__INPUT__TYPE=openai
export AK_GUARDRAIL__INPUT__MODEL=gpt-4o-mini
export AK_GUARDRAIL__INPUT__CONFIG_PATH=/path/to/guardrails_input.json

# Output guardrails
export AK_GUARDRAIL__OUTPUT__ENABLED=true
export AK_GUARDRAIL__OUTPUT__TYPE=openai
export AK_GUARDRAIL__OUTPUT__MODEL=gpt-4o-mini
export AK_GUARDRAIL__OUTPUT__CONFIG_PATH=/path/to/guardrails_output.json
```
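The `__` delimiter maps each environment variable onto a nested configuration key, so `AK_GUARDRAIL__INPUT__ENABLED` corresponds to `guardrail.input.enabled` in the YAML above. A rough sketch of that mapping (illustrative only; Agent Kernel's actual settings parser may handle casing and type coercion differently):

```python
import os


def nested_settings(prefix: str = "AK_") -> dict:
    """Fold prefix-matched environment variables into a nested dict,
    splitting keys on the double-underscore delimiter."""
    settings: dict = {}
    for key, value in os.environ.items():
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].lower().split("__")
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return settings

# AK_GUARDRAIL__INPUT__ENABLED=true becomes
# {"guardrail": {"input": {"enabled": "true"}}}
```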
Available Guardrail Types
Pre-flight Guardrails
Pre-flight guardrails run before any LLM calls and are typically faster and more cost-effective:
Contains PII
Detects personally identifiable information in requests or responses.
Supported Entities:
- `CREDIT_CARD` - Credit card numbers
- `CVV` - Card verification values
- `CRYPTO` - Cryptocurrency addresses
- `DATE_TIME` - Date and time information
- `EMAIL_ADDRESS` - Email addresses
- `IBAN_CODE` - International bank account numbers
- `BIC_SWIFT` - Bank identification codes
- `IP_ADDRESS` - IP addresses
- `LOCATION` - Location information
- `MEDICAL_LICENSE` - Medical license numbers
- `NRP` - National registration numbers
- `PERSON` - Person names
- `PHONE_NUMBER` - Phone numbers
- `URL` - Web addresses
Configuration:
```json
{
  "name": "Contains PII",
  "config": {
    "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
    "block": true
  }
}
```
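To build intuition for what an entity check does, here is a toy detector for two of the entity types. This is purely illustrative: the deliberately simplified regexes below are my own, and the library's actual PII detection is far more robust than pattern matching alone:

```python
import re

# Toy patterns for illustration; real detectors handle many more formats.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_pii(text: str, entities: list[str]) -> list[str]:
    """Return the configured entity types that appear in the text."""
    return [
        entity
        for entity in entities
        if entity in PII_PATTERNS and PII_PATTERNS[entity].search(text)
    ]
```

With `"block": true`, a match like the email in `"reach me at jane@example.com"` would cause the guardrail to reject the content rather than merely flag it.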
Moderation
Uses OpenAI's content moderation API to detect harmful content.
Supported Categories:
- `sexual` - Sexual content
- `sexual/minors` - Sexual content involving minors
- `hate` - Hate speech
- `hate/threatening` - Hateful threatening content
- `harassment` - Harassment
- `harassment/threatening` - Harassing threats
- `self-harm` - Self-harm content
- `self-harm/intent` - Intent to self-harm
- `self-harm/instructions` - Self-harm instructions
- `violence` - Violent content
- `violence/graphic` - Graphic violence
- `illicit` - Illicit content
- `illicit/violent` - Violent illicit content
Configuration:
```json
{
  "name": "Moderation",
  "config": {
    "categories": [
      "sexual",
      "hate",
      "harassment",
      "self-harm",
      "violence"
    ]
  }
}
```
Input Guardrails
Input guardrails validate user requests using LLM-based checks:
Jailbreak
Detects prompt injection and jailbreak attempts.
Configuration:
```json
{
  "name": "Jailbreak",
  "config": {
    "confidence_threshold": 0.7,
    "model": "gpt-4o-mini",
    "include_reasoning": false
  }
}
```
Off Topic Prompts
Ensures requests stay within the defined scope of your agent.
Configuration:
```json
{
  "name": "Off Topic Prompts",
  "config": {
    "confidence_threshold": 0.7,
    "model": "gpt-4o-mini",
    "system_prompt_details": "You are a customer service assistant. Raise the guardrail if questions aren't about customer service.",
    "include_reasoning": false
  }
}
```
Custom Prompt Check
Define custom validation logic based on your specific requirements.
Configuration:
```json
{
  "name": "Custom Prompt Check",
  "config": {
    "confidence_threshold": 0.7,
    "model": "gpt-4o-mini",
    "system_prompt_details": "Custom validation instructions here.",
    "include_reasoning": false
  }
}
```
Output Guardrails
Output guardrails validate agent responses:
Contains PII (Output)
Prevents sensitive information from being included in responses.
Configuration:
```json
{
  "name": "Contains PII",
  "config": {
    "entities": ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD"],
    "block": true
  }
}
```
NSFW Text
Detects and blocks not-safe-for-work content in responses.
Configuration:
```json
{
  "name": "NSFW Text",
  "config": {
    "confidence_threshold": 0.7,
    "model": "gpt-4o-mini",
    "include_reasoning": false
  }
}
```
URL Filter
Controls URL inclusion in responses.
Configuration:
```json
{
  "name": "URL Filter",
  "config": {}
}
```
Custom Prompt Check (Output)
Custom validation for agent responses.
Configuration:
```json
{
  "name": "Custom Prompt Check",
  "config": {
    "confidence_threshold": 0.7,
    "model": "gpt-4o-mini",
    "system_prompt_details": "Custom output validation instructions.",
    "include_reasoning": false
  }
}
```
Configuration Options
Common Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `confidence_threshold` | float | Sensitivity level (0.0 - 1.0) | `0.7` |
| `model` | string | LLM model for validation | `gpt-4o-mini` |
| `include_reasoning` | boolean | Include explanation in logs | `false` |
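Across the LLM-based checks, `confidence_threshold` gates the decision the same way: the validation model produces a confidence score for the suspected violation, and the guardrail trips once the score reaches the threshold. A minimal sketch of that decision (the exact comparison used internally, strict or inclusive, is an assumption here):

```python
def guardrail_tripped(confidence: float, threshold: float = 0.7) -> bool:
    """Trip the guardrail when the violation confidence reaches the
    threshold. Raising the threshold makes the check less sensitive,
    so more borderline content passes through."""
    return confidence >= threshold
```

For example, with the default threshold a score of 0.85 trips the guardrail while 0.55 does not; raising the threshold to 0.95 would let the 0.85 case through.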