Walled AI Guardrails
Walled AI guardrails provide safety validation and PII redaction for Agent Kernel interactions. This integration validates user inputs, masks sensitive values before agent execution, and replaces placeholders with the original values in output responses.
What Walled AI Guardrails Provide
- Input safety validation using Walled AI Protect
- Input PII redaction using Walled AI Redact
- Output unmasking using session-stored placeholder mappings
- Provider-level integration without JSON guardrail rule files
Prerequisites
Install Agent Kernel with Walled AI extras:
pip install agentkernel[walledai]
Set the required API key. AK_DEBUG=true is optional and only enables debug logs:
export WALLED_API_KEY=your-walledai-api-key
export AK_DEBUG=true
Configuration
Configure Walled AI in your config.yaml:
guardrail:
  input:
    enabled: true
    type: walledai
    pii: true
  output:
    enabled: true
    type: walledai
    pii: true
Equivalent environment-variable configuration:
export AK_GUARDRAIL__INPUT__ENABLED=true
export AK_GUARDRAIL__INPUT__TYPE=walledai
export AK_GUARDRAIL__INPUT__PII=true
export AK_GUARDRAIL__OUTPUT__ENABLED=true
export AK_GUARDRAIL__OUTPUT__TYPE=walledai
export AK_GUARDRAIL__OUTPUT__PII=true
# Optional: disable Walled AI PII redaction/unmasking while keeping safety checks
# export AK_GUARDRAIL__INPUT__PII=false
# export AK_GUARDRAIL__OUTPUT__PII=false
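The YAML keys and the environment variables above line up one-to-one: the AK_ prefix is dropped and each double underscore marks one level of nesting. The sketch below illustrates that naming convention in plain Python. It is not Agent Kernel's own loader, and the helper name env_to_nested is hypothetical.

```python
def env_to_nested(environ, prefix="AK_", delimiter="__"):
    """Fold prefixed variables into a nested dict, splitting names on the delimiter.

    Illustrative only: values stay strings, whereas a real loader would
    also coerce types (for example "true" to a boolean).
    """
    config = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as WALLED_API_KEY are skipped
        path = name[len(prefix):].lower().split(delimiter)
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return config


example_env = {
    "AK_GUARDRAIL__INPUT__ENABLED": "true",
    "AK_GUARDRAIL__INPUT__TYPE": "walledai",
    "AK_GUARDRAIL__INPUT__PII": "false",
    "WALLED_API_KEY": "not-an-AK-variable",
}
nested = env_to_nested(example_env)
print(nested)
```

Reading the result against config.yaml makes it easy to spot a misspelled variable, since a typo shows up as an unexpected key in the nested structure.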
How It Works
Input Guardrails
- Iterate over incoming request objects
- For each text request, validate text with Walled AI Protect (safety)
- For each text request, redact sensitive entities with Walled AI Redact
- Preserve non-text requests unchanged (for example file/image inputs)
- Store placeholder mapping in session cache
- Forward masked text requests to the agent
If safety validation fails, Agent Kernel returns a safe refusal response.
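The input steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the request classes, fake_protect, fake_redact, and the refusal text are hypothetical stand-ins, not the Agent Kernel or Walled AI APIs.

```python
from dataclasses import dataclass


@dataclass
class TextRequest:
    text: str


@dataclass
class ImageRequest:
    data: bytes


REFUSAL = "Sorry, I can't help with that request."


def fake_protect(text):
    """Stand-in for the Walled AI Protect safety check (hypothetical rule)."""
    return "<unsafe>" not in text


def fake_redact(text):
    """Stand-in for Walled AI Redact: masks one hard-coded name (hypothetical)."""
    mapping = {}
    if "john" in text:
        mapping["[Person_1]"] = "john"
        text = text.replace("john", "[Person_1]")
    return text, mapping


def run_input_guardrail(requests, session_cache, pii=True):
    """Return (masked_requests, refusal); refusal is None when every text passes."""
    masked = []
    for request in requests:
        if not isinstance(request, TextRequest):
            masked.append(request)  # non-text inputs pass through unchanged
            continue
        if not fake_protect(request.text):
            return [], REFUSAL  # stop early; nothing reaches the agent
        text = request.text
        if pii:
            text, mapping = fake_redact(text)
            session_cache.setdefault("placeholders", {}).update(mapping)
        masked.append(TextRequest(text))
    return masked, None


cache = {}
image = ImageRequest(b"\x89PNG")
forwarded, refusal = run_input_guardrail(
    [TextRequest("my name is john"), image], cache
)
print(forwarded, refusal, cache)
```

Setting pii=False in this sketch mirrors the optional configuration in which safety checks still run but no redaction is applied.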
Output Guardrails
- Extract outgoing agent text
- Load stored placeholder mapping from session cache
- Replace placeholders with original values
- Return unmasked response
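A minimal sketch of the unmasking step, assuming the stored mapping is a plain dictionary from placeholder to original value. The function name unmask and the cache layout are hypothetical. A single regular-expression pass is used so that a restored value is never scanned again for placeholders.

```python
import re


def unmask(text, mapping):
    """Replace each known placeholder in text with its stored original value."""
    if not mapping:
        return text
    # Longest placeholders first, so a shorter key can never shadow a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: mapping[match.group(0)], text)


session_cache = {"placeholders": {"[Person_1]": "john"}}
reply = "Nice to meet you, [Person_1]!"
restored = unmask(reply, session_cache.get("placeholders", {}))
print(restored)
```

Placeholders that are not in the mapping are left as they are, which is why a missing or expired mapping shows up as raw placeholders in the response.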
Session Mapping Behavior
Walled AI redaction placeholders are persisted in session cache to support follow-up turns and restart-tolerant flows when durable session storage is enabled.
Recommended controls for production:
- Minimize retained mapping scope
- Apply retention/TTL policies
- Restrict storage access
- Encrypt data at rest
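As one way to picture a retention/TTL policy, the sketch below keeps each session's mapping only for a fixed time. It is an in-memory illustration under assumed names (ExpiringMappingStore, ttl_seconds), not Agent Kernel's session storage. A durable backend would normally rely on its own expiry and encryption features.

```python
import time


class ExpiringMappingStore:
    """In-memory placeholder store that forgets a session's mapping after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # session_id -> (expires_at, mapping)

    def put(self, session_id, mapping):
        self._entries[session_id] = (self._clock() + self._ttl, dict(mapping))

    def get(self, session_id):
        entry = self._entries.get(session_id)
        if entry is None:
            return {}
        expires_at, mapping = entry
        if self._clock() >= expires_at:
            del self._entries[session_id]  # drop expired values instead of returning them
            return {}
        return dict(mapping)


# A controllable clock keeps the example deterministic.
now = [0.0]
store = ExpiringMappingStore(ttl_seconds=900, clock=lambda: now[0])
store.put("session-1", {"[Person_1]": "john"})
fresh = store.get("session-1")
now[0] = 901.0
expired = store.get("session-1")
print(fresh, expired)
```

The trade-off is visible in the last two lines: once the mapping expires, later replies for that session can no longer be unmasked.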
Optional: Local WalledGuard-Edge Moderation
If you want to run local moderation experiments, you can use walledai/walledguard-edge from Hugging Face.
According to Walled AI's announcement, WalledGuard-Edge is a 0.6B open-source model (Apache-2.0) positioned as stronger than LlamaGuard3 (1B) on multilingual inputs and across multiple jailbreak categories.
- API access and product updates: www.walled.ai
Manual setup
For local inference steps and the latest runnable example code, follow the model card:
- Hugging Face model page: walledai/walledguard-edge
Typical local dependencies include torch and transformers.
This local flow is optional and separate from the default Agent Kernel Walled AI API integration.
Example
Input:
my name is john
Masked request sent to agent:
my name is [Person_1]
When the reply contains [Person_1], the output guardrail restores it to john before returning the response.
Troubleshooting
Guardrails not triggering
- Ensure input/output guardrails are enabled in config
- Verify type: walledai for both input and output
- Confirm WALLED_API_KEY is set in the runtime environment
- Enable debug logs with AK_DEBUG=true
Missing API key or provider errors
Verify the shell environment used to start the runtime includes:
export WALLED_API_KEY=your-walledai-api-key
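A quick way to confirm what the runtime process actually sees is a small preflight check run from the same shell. This is a hypothetical helper rather than part of Agent Kernel. It only inspects the WALLED_API_KEY variable named above and never prints its value.

```python
import os

PLACEHOLDER = "your-walledai-api-key"


def api_key_problem(environ):
    """Return a human-readable problem string, or None if the key looks usable."""
    key = environ.get("WALLED_API_KEY", "").strip()
    if not key:
        return "WALLED_API_KEY is not set in this process environment"
    if key == PLACEHOLDER:
        return "WALLED_API_KEY still holds the placeholder value from the docs"
    return None


if __name__ == "__main__":
    print(api_key_problem(os.environ) or "WALLED_API_KEY looks set")
```

If the check passes in your shell but the runtime still reports a missing key, the runtime is likely started from a different environment, such as a service manager or container.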
Unexpected masked placeholders in response
- Ensure output guardrails are enabled
- Ensure session ID is stable across turns
- Verify session storage mode and persistence expectations