Runtime Policy Enforcement for Autonomous AI Systems

Semantic Guards and Dynamic Redaction

Abstract

Autonomous AI agents present fundamental governance challenges. Without runtime policy enforcement, agents can access unauthorized integrations, expose sensitive data, perform destructive operations, and violate compliance requirements. Traditional approaches rely on prompt engineering or post-execution audits, neither of which provides deterministic guarantees about agent behavior.

This paper introduces a runtime policy enforcement architecture for autonomous systems, comprising two complementary primitives: Semantic Guards for pre-execution policy validation and Dynamic Redaction for post-execution data masking. Together, these primitives enable declarative policy configuration that enforces integration allowlists, side effect controls, PII protection, and scope verification without requiring agents to reason about compliance.

The system integrates with Cognitive Trust Certificates (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems) to provide cryptographic proof that workflows were validated against policies before execution and that sensitive data was redacted according to configuration. Policy enforcement operates over Semio’s typed tool contracts (see Semio: A Semantic Interface Layer for Tool-Oriented AI Systems), where side effect classifications, PII field annotations, and budget constraints are declared as part of the tool’s semantic surface. This transforms agent deployment from “trust and audit later” to “verify then execute with proof.”


Problem Landscape

Agents as a Security Liability

Modern AI agents have broad capabilities but lack granular runtime controls. When an agent is given access to enterprise systems, it typically receives:

  • Unrestricted integration access - All authenticated APIs are available
  • Full operation permissions - Read, write, and delete capabilities
  • Unfiltered data access - Complete visibility into API responses
  • No compliance boundaries - No enforcement of regulatory requirements

This creates unacceptable risk:

  • Agents can access integrations they shouldn’t (e.g., production database when only dev is authorized)
  • Agents can perform destructive operations (e.g., bulk delete) without approval
  • Agents can expose PII in logs, tool outputs, or LLM context
  • Agents can violate scope restrictions despite OAuth policies

Prompt Engineering is Insufficient

A common approach is instructing agents to follow policies through system prompts:

You are an AI assistant. You must:
- Never access production databases
- Never delete records without confirmation
- Always redact PII before logging
- Only use approved integrations

This fails because:

  • Non-deterministic - LLMs can ignore instructions
  • Brittle - Adversarial prompts can override policies
  • No guarantees - Cannot prove policy compliance
  • Poor UX - Agents spend tokens reasoning about permissions

Post-Execution Audits Are Too Late

Another approach is auditing agent actions after execution:

  1. Agent performs operations
  2. Logs are reviewed later
  3. Violations are detected
  4. Damage control begins

This is inadequate because:

  • Reactive - Damage occurs before detection
  • Incomplete - Not all violations leave audit trails
  • Resource intensive - Manual log review doesn’t scale
  • No prevention - Only detection after the fact

Compliance as a Blocking Concern

Enterprises face regulatory requirements that agents must satisfy:

GDPR (EU):

  • Article 25: Data protection by design
  • Article 32: Security of processing
  • Article 30: Records of processing activities

CCPA (California):

  • Reasonable security procedures
  • Data minimization requirements
  • Consumer rights enforcement

HIPAA (Healthcare):

  • Access controls (Section 164.312(a))
  • Audit controls (Section 164.312(b))
  • Data integrity (Section 164.312(c))

SOC 2 (Trust Services):

  • CC6.1: Logical access controls
  • CC6.6: Management of system operations
  • CC7.2: System monitoring

Without deterministic policy enforcement, agents cannot operate in regulated environments.


Design Principles

1. Declarative Policies, Not Prompt Engineering

Policies should be configuration, not instructions to the LLM. This enables:

  • Deterministic enforcement - Same input, same output
  • Independent verification - Third parties can validate policies
  • No token overhead - Agents don’t reason about permissions
  • Fail-safe defaults - Deny unless explicitly allowed

2. Defense in Depth

Multiple enforcement points provide layered security:

  • Pre-execution guards - Block disallowed operations before they start
  • Post-execution redaction - Mask sensitive data in responses
  • Policy validation in CTCs - Cryptographic proof of compliance checks
  • Audit logging - Record all policy decisions for review

3. Zero-Trust for Agents

Agents should have minimal default permissions:

  • Allowlist integrations - Only explicitly approved services
  • Constrain side effects - Read-only by default, write only when needed
  • Block destructive ops - Delete/bulk operations require approval
  • Verify scopes - Ensure OAuth permissions match requirements

4. Transparent to Agents

Policy enforcement should not require agent awareness (see Beyond MCP: The Missing Infrastructure Layer for the broader argument that intelligence infrastructure should be invisible to agents):

  • No prompt instructions - Agents don’t need to know about policies
  • Clean error messages - Policy violations return standard errors
  • Automatic compliance - Happens at infrastructure layer
  • No reasoning overhead - Zero tokens spent on permission checks

Semantic Guard Architecture

Purpose

Semantic Guards are pre-execution policy validators that check if a tool invocation is allowed before any API call occurs. They evaluate policies based on tool metadata and runtime context.

Guard Primitives

Integration Allowlists

Control which services agents can access:

policy = %{
  allowed_integrations: ["salesforce", "slack", "notion"]
}

# Agent tries to call "stripe" tool
SemanticGuard.allow?(server_id, %{integration: "stripe"})
# => {:error, %{code: -32600, message: "integration_not_allowed"}}

Use cases:

  • Separate dev/staging/production environments
  • Limit agents to approved vendors
  • Enforce integration budget constraints
  • Enable gradual rollout of new integrations

Side Effect Controls

Restrict read vs. write operations:

policy = %{
  allow_side_effects: ["read"]  # Only read operations allowed
}

# Agent tries to create a record
SemanticGuard.allow?(server_id, %{side_effect: "write"})
# => {:error, %{code: -32600, message: "side_effect_blocked"}}

Side effect classifications:

  • none - Pure computation, no external state change
  • read - Fetch data, no mutations
  • write - Create or update records
  • delete - Remove data (most restricted)

Use cases:

  • Read-only agents for reporting
  • Write protection during testing
  • Approval workflows for mutations
  • Audit requirements for data changes

Destructive Operation Blocking

Prevent irreversible actions:

policy = %{
  allow_destructive: false
}

# Agent tries bulk delete
SemanticGuard.allow?(server_id, %{destructive: true})
# => {:error, %{code: -32600, message: "destructive_blocked"}}

Destructive markers:

  • Bulk operations (>10 records)
  • Permanent deletions
  • Schema migrations
  • Configuration changes

Use cases:

  • Prevent accidental data loss
  • Require human approval for deletions
  • Enforce backup-before-delete workflows
  • Compliance with retention policies

Scope Verification

Ensure OAuth permissions are sufficient:

policy = %{
  granted_scopes: ["read:contacts", "read:leads"]
}

# Tool requires write permission
SemanticGuard.allow?(server_id, %{requires_scopes: ["write:contacts"]})
# => {:error, %{code: -32602, message: "missing_scopes", 
#      data: %{missing: ["write:contacts"]}}}

Use cases:

  • Enforce least-privilege OAuth grants
  • Prevent scope creep over time
  • Detect permission mismatches
  • Support multiple auth contexts
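The scope check itself reduces to a set difference. A minimal Python sketch (the function name and shape are illustrative, not the platform's actual API):

```python
def check_scopes(granted, required):
    """Return a structured error when required OAuth scopes are missing, else None.

    Mirrors the -32602 "missing_scopes" error shape shown above.
    """
    missing = sorted(set(required) - set(granted))
    if missing:
        return {"code": -32602, "message": "missing_scopes",
                "data": {"missing": missing}}
    return None  # scopes sufficient
```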

PII Protection

Block tools that expose sensitive data when policy forbids:

policy = %{
  pii_allowed: false
}

# Tool is marked as exposing PII
SemanticGuard.allow?(server_id, %{pii: true})
# => {:error, %{code: -32600, message: "pii_blocked"}}

PII classifications (tool metadata):

  • pii: true - Tool may return email, phone, SSN, etc.
  • pii: false - Tool returns only non-sensitive data
  • pii_fields: ["email", "ssn"] - Specific fields contain PII

Use cases:

  • GDPR/CCPA compliance
  • Minimize data exposure to LLMs
  • Separate dev/prod data policies
  • Support data residency requirements

Dynamic Risk Assessment

Detect PII in arguments even if tool isn’t flagged:

policy = %{
  pii_allowed: false
}

# Agent passes email in arguments
SemanticGuard.allow?(server_id, %{pii: false}, %{
  email: "user@example.com"
})
# => {:error, %{code: -32600, message: "pii_detected_in_args"}}

Detection heuristics:

  • Email patterns (@ symbol, domain format)
  • SSN patterns (XXX-XX-XXXX)
  • Credit card patterns (16 digits)
  • Phone patterns (parentheses, hyphens)

Use cases:

  • Catch accidental PII exposure
  • Prevent agents from leaking data
  • Enforce input sanitization
  • Support zero-trust data handling
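The heuristics above can be approximated with a small pattern-based detector. A Python sketch (these regular expressions are simplified stand-ins; real detectors would add checksums, validation, and context):

```python
import re

# Simplified approximations of the detection heuristics listed above
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}

def detect_pii(args):
    """Return the PII categories found anywhere in the argument values."""
    found = set()
    for value in args.values():
        if isinstance(value, dict):
            found.update(detect_pii(value))  # walk nested arguments
        elif isinstance(value, str):
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    found.add(label)
    return sorted(found)
```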

Policy Composition

Policies can be layered with inheritance:

# Server-level policy (base)
server_policy = %{
  allow_side_effects: ["read", "write"],
  allow_destructive: false,
  pii_allowed: false
}

# Integration-specific override
salesforce_policy = Map.merge(server_policy, %{
  pii_allowed: true,  # Salesforce needs PII access
  required_scopes: ["read:leads", "write:leads"]
})

Inheritance rules:

  • Server policy is the base default
  • Integration policies merge on top, per integration
  • Tool policies can further constrain
  • Most restrictive value wins when layers conflict implicitly; an integration can loosen a field (as in the Salesforce override above) only through explicit configuration

Conflict Resolution

Policy conflicts are resolved by monotonic restriction: when layers disagree implicitly, the merge selects the more restrictive value at each field, so an integration can never silently grant itself permissions that the server-level policy denies. Loosening is possible only through an explicit administrator-configured override, as when an integration with a legitimate PII need is granted pii_allowed: true over a server default of false. Tightening never requires an override: an integration that blocks PII where the server would allow it is always valid.

When a tool requires access that the resolved policy forbids (e.g., a workflow requires PII access to function but policy blocks it), the tool invocation is rejected with a structured error. There is no automatic escalation or override. The user must explicitly adjust the policy. This fail-closed design ensures that policy violations are never silently resolved.

Enforcement Invariant

Every tool invocation, regardless of entry point, passes through the same enforcement chain: Lookup -> Guard -> Execute -> Redact. This pattern is an architectural invariant maintained across all execution paths. The MCP gateway, interactive sandbox, and agentic loop all apply identical policy enforcement. No execution path bypasses the guard or the redactor.


Dynamic Redaction Architecture

Purpose

Redactors mask sensitive data in tool responses after execution but before returning to agents. This provides defense-in-depth: even if a guard is bypassed, sensitive data is still protected.

Redaction Strategies

Field-Aware Masking

Target specific fields with appropriate masking:

policy = %{
  field_redactions: %{
    "email" => "mask_email",
    "phone" => "mask_phone",
    "ssn" => "mask_all"
  }
}

# Original response
%{
  name: "John Doe",
  email: "john.doe@example.com",
  phone: "(555) 123-4567"
}

# Redacted response
%{
  name: "John Doe",
  email: "j*******@e******.com",
  phone: "(***) ***-****"
}

Masking strategies:

  • mask_email - Preserve structure, mask local/domain parts
  • mask_phone - Replace digits with asterisks
  • mask_all - Replace entire value with fixed-length mask
  • apron - Keep N chars at start/end, mask middle
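The first three strategies can be sketched in Python (illustrative implementations matching the example shapes above, not the canonical algorithms):

```python
def mask_email(value):
    """Preserve structure: keep the first char of the local part and of the
    domain name, mask the rest, keep the TLD."""
    local, _, domain = value.partition("@")
    name, _, tld = domain.rpartition(".")
    return (local[0] + "*" * (len(local) - 1) + "@"
            + name[0] + "*" * (len(name) - 1) + "." + tld)

def mask_phone(value):
    """Replace every digit with an asterisk, keep punctuation."""
    return "".join("*" if ch.isdigit() else ch for ch in value)

def mask_all(value, length=8):
    """Replace the entire value with a fixed-length mask."""
    return "*" * length
```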

Apron Masking

Preserve prefix/suffix for readability:

policy = %{
  field_redactions: %{
    "api_key" => %{
      "strategy" => "apron",
      "apron" => 3  # Keep 3 chars on each end
    }
  }
}

# Original: "sk_live_4eC39HqLyjWDarjtT1zdp7dc"
# Redacted: "sk_**************************7dc"

Configuration:

  • apron: N - Number of characters to preserve
  • mask_char: "*" - Character for masking (default: *)
  • fixed_length: M - Always produce M-length mask

Use cases:

  • API keys (show prefix for debugging)
  • Tokens (preserve format hints)
  • IDs (maintain length for UI spacing)
  • Account numbers (last 4 digits visible)
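Apron masking can be sketched as follows in Python (parameter names mirror the configuration keys above; the short-string fallback is an assumption):

```python
def apron_mask(value, apron=3, mask_char="*", fixed_length=None):
    """Keep `apron` characters at each end and mask the middle.

    If the value is too short to preserve both ends, mask it entirely
    (an assumed fail-safe, not documented behavior).
    """
    if len(value) <= 2 * apron:
        return mask_char * (fixed_length or len(value))
    middle = fixed_length if fixed_length is not None else len(value) - 2 * apron
    return value[:apron] + mask_char * middle + value[-apron:]
```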

Structure-Preserving Scrambling

Generate realistic fake data that maintains format:

policy = %{
  field_redactions: %{
    "email" => "scramble",
    "phone" => "scramble",
    "ssn" => "scramble"
  }
}

# Original
%{
  email: "john.smith@acme.com",
  phone: "(555) 123-4567",
  ssn: "123-45-6789"
}

# Scrambled (realistic fakes)
%{
  email: "jane.jones@example.com",
  phone: "(722) 456-8901",
  ssn: "234-56-7890"
}

Scrambled values are deterministic: the same input produces the same fake output across invocations, ensuring consistency in logs and audit trails without exposing real data. Determinism is achieved through keyed hashing with a deployment-specific salt, so scrambled values cannot be reversed or predicted across deployments.
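A minimal sketch of how keyed-hash determinism could work, in Python (the salt value, helper names, and digit derivation are assumptions, not the actual algorithm):

```python
import hashlib
import hmac

SALT = b"deployment-specific-salt"  # assumption: a per-deployment secret key

def _stable_digits(value, n):
    """Derive n stable pseudo-random digits from a keyed hash of the input."""
    digest = hmac.new(SALT, value.encode(), hashlib.sha256).hexdigest()
    return str(int(digest, 16)).zfill(n)[:n]

def scramble_ssn(value):
    """Deterministically replace an SSN with a structurally valid fake:
    same input always yields the same fake within one deployment."""
    d = _stable_digits(value, 9)
    return f"{d[:3]}-{d[3:5]}-{d[5:9]}"
```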

Generation rules:

  • Emails: Random first/last name + test domains
  • Phones: Random area codes (200-999 range)
  • SSNs: Random valid format (not real numbers)
  • Dates: Random dates in past 50 years
  • Names: Pool of common first/last names
  • Addresses: Random street numbers + common streets

Use cases:

  • Development/testing with realistic data
  • Demo environments with fake but structured data
  • UI testing without exposing real PII
  • Training datasets with privacy preservation

Auto-Masking on Policy Mismatch

Aggressive redaction when tool PII flag conflicts with policy:

policy = %{
  pii_allowed: false
}

tool_meta = %{
  pii: true  # Tool may return PII
}

# Response contains obvious PII
response = %{
  contact: "john@example.com"
}

# Auto-masked despite passing guard
%{
  contact: "j***@e******.com"
}

Triggers:

  • Tool marked pii: true but policy says pii_allowed: false
  • Pattern matching finds email/phone even though the tool is not flagged
  • Dynamic risk detection in arguments

Use cases:

  • Belt-and-suspenders PII protection
  • Catch tool metadata errors
  • Enforce data minimization
  • Support defense-in-depth

Configurable Per-Integration

Different integrations need different policies:

salesforce_policy = %{
  field_redactions: %{
    "Email" => "apron_2",
    "Phone" => "mask_phone"
  }
}

stripe_policy = %{
  field_redactions: %{
    "card" => %{
      "strategy" => "apron",
      "apron" => 4  # Show last 4 digits
    },
    "ssn_last_4" => "mask_all"
  }
}

Integration with Cognitive Trust Certificates

Identity-Anchored Enforcement

Policy enforcement presupposes authenticated identity: the system must know who is subject to a policy before it can evaluate what they are allowed to do. In substrate and conduit environments, identity is established through the platform’s Certificate Authority, which issues short-lived X.509 certificates for mutual TLS authentication (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents for the certificate lifecycle). Every policy evaluation occurs within an mTLS session where the caller’s identity has already been cryptographically verified against the CA chain. This means policy decisions are bound to a verified principal, not to a bearer token that could be replayed or shared.

The same CA root that authenticates the caller also anchors the CTC signatures that prove policy compliance after validation. Identity verification and compliance proof share a common trust root, creating an unbroken chain from “who is asking” through “what are they allowed to do” to “proof that what they did was validated.”

Policy Validation During CTC Generation

When a CTC is generated for a workflow plan (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems), every step is checked against the active policy. If any step violates policy (blocked integration, disallowed side effect, PII exposure without clearance), CTC generation fails and the plan is rejected before execution.

The CTC records which policies were enforced, including a snapshot of the policy configuration at validation time. This snapshot is included in the CTC evidence, ensuring that auditors can verify what rules were in effect when the plan was approved.

Execution Requires Valid CTC

Workflows cannot execute without a valid CTC that includes policy validation evidence. The execution layer verifies the CTC signature and confirms the policy snapshot before allowing any tool invocation.

Audit Trail Completeness

The combination of CTCs and policy enforcement provides a complete audit trail:

  1. Pre-execution: CTC proves policies were validated
  2. During execution: Guards enforce policies in real-time
  3. Post-execution: Redactor masks sensitive data
  4. Audit: CTC + execution logs show complete lineage

For compliance audits: show the CTC proving validation, show the policy configuration at execution time, show redaction was applied, and prove cryptographically that the workflow matched the validated plan.


Enterprise Implications

Compliance Enablement

GDPR Article 25 (Data Protection by Design):

  • Policies enforce data minimization by default
  • PII protection is architectural, not procedural
  • Redaction ensures “privacy by design and by default”

GDPR Article 30 (Records of Processing):

  • CTCs provide immutable record of what tools accessed what data
  • Policy snapshots show what controls were in place
  • Audit logs demonstrate continuous compliance

SOC 2 CC6.1 (Logical Access Controls):

  • Integration allowlists enforce least-privilege access
  • Scope verification ensures proper authorization
  • Policy inheritance supports role-based access

HIPAA Section 164.312(a)(1) (Access Control):

  • Semantic Guards implement required access controls
  • Policies define who/what can access PHI
  • Audit trails support compliance demonstrations

Risk Mitigation

Prevent Data Breaches:

  • PII detection stops accidental exposure
  • Redaction provides defense-in-depth
  • Integration allowlists limit blast radius

Prevent Destructive Operations:

  • Destructive operation blocking prevents data loss
  • Approval workflows for high-risk actions
  • Side effect controls enforce read-only where appropriate

Enable Incident Response:

  • Complete audit trail for forensics
  • Policy snapshots show what was allowed
  • CTC evidence supports root cause analysis

Operational Benefits

Developer Experience:

  • Declarative policies are readable
  • No prompt engineering for permissions
  • Policies version-controlled with code

Scalability:

  • Policies enforce automatically at runtime
  • No manual review of agent actions
  • Deterministic enforcement across all agents

Gradual Rollout:

  • Start with restrictive policies
  • Expand permissions incrementally
  • Test in dev before prod

Multi-Tenancy:

  • Per-server policies for isolation
  • Per-integration overrides for flexibility
  • Inheritance reduces duplication

Policy Configuration Examples

Development Environment

Restrictive policies for testing:

allow_side_effects:
  - read
allow_destructive: false
pii_allowed: false
allowed_integrations:
  - salesforce_sandbox
  - slack_dev
field_redactions:
  email: scramble
  phone: scramble
  ssn: mask_all

Production Environment

More permissive but with redaction:

allow_side_effects:
  - read
  - write
allow_destructive: false  # Still no destructive ops
pii_allowed: true  # Allow PII access
allowed_integrations:
  - salesforce_prod
  - slack_prod
  - stripe_prod
field_redactions:
  email: apron_2
  phone: mask_phone
  card: apron_4
  ssn: mask_all

Read-Only Analyst Agent

For reporting and analytics:

allow_side_effects:
  - read
allow_destructive: false
pii_allowed: false
allowed_integrations:
  - warehouse
  - analytics
  - reporting
field_redactions:
  email: mask_email
  user_id: fixed_length_10

Administrative Agent

Higher privileges, human approval required:

allow_side_effects:
  - read
  - write
  - delete  # Only for admin agent
allow_destructive: true  # With approval workflow
pii_allowed: true
allowed_integrations: any
field_redactions:
  ssn: mask_all  # Always mask SSN
  card: apron_4  # Show last 4

Performance Considerations

Guard Evaluation Overhead

Policy checks add minimal latency:

  • Integration check: O(1) hash lookup
  • Side effect check: O(1) set membership
  • Scope verification: O(n) where n = number of scopes
  • PII detection: O(m) where m = argument depth

Typical overhead: <5ms per tool call

Redaction Performance

Masking strategies have different costs:

  • Simple masking: O(n) where n = string length
  • Apron masking: O(n) string operations
  • Scrambling: O(n) + random generation
  • Deep walk: O(d * k) where d = depth, k = keys

Typical overhead: <10ms per response

Optimization Strategies

Policy Caching:

  • Pre-compile policies at server startup
  • Cache policy lookups by server_id
  • Invalidate cache on policy updates

Lazy Evaluation:

  • Check integration allowlist first (fail fast)
  • Skip PII detection if policy allows PII
  • Short-circuit on first violation

Batching:

  • Validate multiple steps in single pass
  • Apply redaction to batched responses
  • Amortize policy fetch overhead

Future Directions

ML-Based Policy Suggestions

Analyze agent behavior to recommend policies:

  • Identify unused integration permissions
  • Detect overly permissive scopes
  • Suggest tighter redaction rules
  • Propose least-privilege configurations

Anomaly Detection

Flag suspicious patterns:

  • Sudden spike in destructive operations
  • Unusual integration access patterns
  • Policy violation attempts
  • Scope escalation requests

Policy Evolution

Support policy changes over time:

  • Staged rollout of policy updates
  • A/B testing of policy configurations
  • Automatic policy optimization
  • Rollback on excessive violations

Cross-Organization Policy Sharing

Enable policy templates:

  • Industry-specific policy baselines (healthcare, finance)
  • Compliance-driven templates (GDPR, HIPAA)
  • Best practice policies from community
  • Vendor-recommended configurations

Advanced Redaction

Enhanced masking capabilities:

  • Tokenization (reversible masking for authorized users)
  • Format-preserving encryption
  • Differential privacy guarantees
  • Contextual redaction (mask based on recipient)

Appendix: Policy Decision Flow

Pre-Execution (Semantic Guard)

Agent requests tool execution
         |
Extract policy for server/integration
         |
Check integration allowlist
  - Not allowed -> Deny
  - Allowed -> Continue
         |
Check side effect permissions
  - Not allowed -> Deny
  - Allowed -> Continue
         |
Check destructive flag
  - Blocked -> Deny
  - Allowed -> Continue
         |
Check OAuth scopes
  - Missing scopes -> Deny
  - Scopes sufficient -> Continue
         |
Check PII flag
  - PII blocked -> Deny
  - Allowed -> Continue
         |
Check for PII in arguments
  - PII detected -> Deny
  - Clean -> Allow
         |
Execute tool
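The flow above can be sketched as a fail-fast chain, here in Python (field names follow this paper's examples; treating an absent allowlist as deny-all and side effect "none" as always permitted are assumptions consistent with fail-safe defaults):

```python
def semantic_guard(policy, tool, args):
    """Evaluate the pre-execution checks in order, failing fast.

    Returns None when the call is allowed, else a structured error.
    Illustrative sketch, not the platform's actual API.
    """
    def deny(code, message):
        return {"code": code, "message": message}

    if tool.get("integration") not in policy.get("allowed_integrations", []):
        return deny(-32600, "integration_not_allowed")   # deny unless allowlisted
    if tool.get("side_effect", "none") not in ["none"] + policy.get("allow_side_effects", []):
        return deny(-32600, "side_effect_blocked")
    if tool.get("destructive") and not policy.get("allow_destructive"):
        return deny(-32600, "destructive_blocked")
    if set(tool.get("requires_scopes", [])) - set(policy.get("granted_scopes", [])):
        return deny(-32602, "missing_scopes")
    if tool.get("pii") and not policy.get("pii_allowed"):
        return deny(-32600, "pii_blocked")
    if not policy.get("pii_allowed") and any("@" in str(v) for v in args.values()):
        return deny(-32600, "pii_detected_in_args")      # crude stand-in detector
    return None  # all checks passed -> execute tool
```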

Post-Execution (Redactor)

Tool returns response
         |
Extract redaction policy
         |
Walk response structure
  - For each field:
    - Check field_redactions config
    - Apply masking strategy
    - Replace value
  - Continue
         |
Check for auto-masking conditions
  - Tool PII=true, Policy PII=false
    - Apply aggressive masking
  - Continue
         |
Return redacted response
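The redaction walk can be sketched as a recursive traversal, here in Python with two simplified stand-in strategies (an illustrative sketch of the flow, not the full strategy set):

```python
def redact(value, field_redactions, key=None):
    """Deep-walk a response, masking fields named in field_redactions."""
    strategies = {
        "mask_all": lambda v: "*" * 8,
        "mask_phone": lambda v: "".join("*" if c.isdigit() else c for c in str(v)),
    }
    if isinstance(value, dict):
        # Recurse, passing each field name down for strategy lookup
        return {k: redact(v, field_redactions, key=k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, field_redactions) for v in value]
    strategy = strategies.get(field_redactions.get(key))
    return strategy(value) if strategy else value
```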

This document describes the conceptual architecture of runtime policy enforcement for autonomous systems. Specific detection heuristics, masking algorithms, and performance optimizations are withheld to protect operational security while enabling understanding of the governance model.

Author: Nicholas Wright

Title: Co-Founder & Chief Architect, DataGrout AI

Affiliation: DataGrout Labs

Version: 1.0

Published: January 2026

For questions or collaboration: labs@datagrout.ai