Runtime Policy Enforcement for Autonomous AI Systems

Semantic Guards and Dynamic Redaction

Abstract

Autonomous AI agents present fundamental governance challenges. Without runtime policy enforcement, agents can access unauthorized integrations, expose sensitive data, perform destructive operations, and violate compliance requirements. Traditional approaches rely on prompt engineering or post-execution audits, neither of which provides deterministic guarantees about agent behavior.

This paper introduces a runtime policy enforcement architecture for autonomous systems, comprising two complementary primitives: Semantic Guards for pre-execution policy validation and Dynamic Redaction for post-execution data masking. Together, these primitives enable declarative policy configuration that enforces integration allowlists, side effect controls, PII protection, and scope verification without requiring agents to reason about compliance.

The system integrates with Cognitive Trust Certificates (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems) to provide cryptographic proof that workflows were validated against policies before execution and that sensitive data was redacted according to configuration. Policy enforcement operates over Semio’s typed tool contracts (see Semio: A Semantic Interface Layer for Tool-Oriented AI Systems), where side effect classifications, PII field annotations, and budget constraints are declared as part of the tool’s semantic surface. This transforms agent deployment from “trust and audit later” to “verify then execute with proof.”


Problem Landscape

Agents as a Security Liability

Modern AI agents have broad capabilities but lack granular runtime controls. When an agent is given access to enterprise systems, it typically receives:

  • Unrestricted integration access - All authenticated APIs are available
  • Full operation permissions - Read, write, and delete capabilities
  • Unfiltered data access - Complete visibility into API responses
  • No compliance boundaries - No enforcement of regulatory requirements

This creates unacceptable risk:

  • Agents can access integrations they shouldn’t (e.g., production database when only dev is authorized)
  • Agents can perform destructive operations (e.g., bulk delete) without approval
  • Agents can expose PII in logs, tool outputs, or LLM context
  • Agents can violate scope restrictions despite OAuth policies

Prompt Engineering is Insufficient

A common approach is instructing agents to follow policies through system prompts:

You are an AI assistant. You must:
- Never access production databases
- Never delete records without confirmation
- Always redact PII before logging
- Only use approved integrations

This fails because:

  • Non-deterministic - LLMs can ignore instructions
  • Brittle - Adversarial prompts can override policies
  • No guarantees - Cannot prove policy compliance
  • Poor UX - Agents spend tokens reasoning about permissions

Post-Execution Audits Are Too Late

Another approach is auditing agent actions after execution:

  1. Agent performs operations
  2. Logs are reviewed later
  3. Violations are detected
  4. Damage control begins

This is inadequate because:

  • Reactive - Damage occurs before detection
  • Incomplete - Not all violations leave audit trails
  • Resource intensive - Manual log review doesn’t scale
  • No prevention - Only detection after the fact

Compliance as a Blocking Concern

Enterprises face regulatory requirements that agents must satisfy:

GDPR (EU):

  • Article 25: Data protection by design
  • Article 32: Security of processing
  • Article 30: Records of processing activities

CCPA (California):

  • Reasonable security procedures
  • Data minimization requirements
  • Consumer rights enforcement

HIPAA (Healthcare):

  • Access controls (Section 164.312(a))
  • Audit controls (Section 164.312(b))
  • Data integrity (Section 164.312(c))

SOC 2 (Trust Services):

  • CC6.1: Logical access controls
  • CC6.6: Management of system operations
  • CC7.2: System monitoring

Without deterministic policy enforcement, agents cannot operate in regulated environments.


Design Principles

1. Declarative Policies, Not Prompt Engineering

Policies should be configuration, not instructions to the LLM. This enables:

  • Deterministic enforcement - Same input, same output
  • Independent verification - Third parties can validate policies
  • No token overhead - Agents don’t reason about permissions
  • Fail-safe defaults - Deny unless explicitly allowed

2. Defense in Depth

Multiple enforcement points provide layered security:

  • Pre-execution guards - Block disallowed operations before they start
  • Post-execution redaction - Mask sensitive data in responses
  • Policy validation in CTCs - Cryptographic proof of compliance checks
  • Audit logging - Record all policy decisions for review

3. Zero-Trust for Agents

Agents should have minimal default permissions:

  • Allowlist integrations - Only explicitly approved services
  • Constrain side effects - Read-only by default, write only when needed
  • Block destructive ops - Delete/bulk operations require approval
  • Verify scopes - Ensure OAuth permissions match requirements

4. Transparent to Agents

Policy enforcement should not require agent awareness (see Beyond MCP: The Missing Infrastructure Layer for the broader argument that intelligence infrastructure should be invisible to agents):

  • No prompt instructions - Agents don’t need to know about policies
  • Clean error messages - Policy violations return standard errors
  • Automatic compliance - Happens at infrastructure layer
  • No reasoning overhead - Zero tokens spent on permission checks

Semantic Guard Architecture

Purpose

Semantic Guards are pre-execution policy validators that check if a tool invocation is allowed before any API call occurs. They evaluate policies based on tool metadata and runtime context.

Guard Primitives

Integration Allowlists

Control which services agents can access:

policy = %{
  allowed_integrations: ["salesforce", "slack", "notion"]
}

# Agent tries to call "stripe" tool
SemanticGuard.allow?(server_id, %{integration: "stripe"})
# => {:error, %{code: -32600, message: "integration_not_allowed"}}

Use cases:

  • Separate dev/staging/production environments
  • Limit agents to approved vendors
  • Enforce integration budget constraints
  • Enable gradual rollout of new integrations

Side Effect Controls

Restrict read vs. write operations:

policy = %{
  allow_side_effects: ["read"]  # Only read operations allowed
}

# Agent tries to create a record
SemanticGuard.allow?(server_id, %{side_effect: "write"})
# => {:error, %{code: -32600, message: "side_effect_blocked"}}

Side effect classifications:

  • none - Pure computation, no external state change
  • read - Fetch data, no mutations
  • write - Create or update records
  • delete - Remove data (most restricted)

Use cases:

  • Read-only agents for reporting
  • Write protection during testing
  • Approval workflows for mutations
  • Audit requirements for data changes

Destructive Operation Blocking

Prevent irreversible actions:

policy = %{
  allow_destructive: false
}

# Agent tries bulk delete
SemanticGuard.allow?(server_id, %{destructive: true})
# => {:error, %{code: -32600, message: "destructive_blocked"}}

Destructive markers:

  • Bulk operations (>10 records)
  • Permanent deletions
  • Schema migrations
  • Configuration changes

Use cases:

  • Prevent accidental data loss
  • Require human approval for deletions
  • Enforce backup-before-delete workflows
  • Compliance with retention policies

Scope Verification

Ensure OAuth permissions are sufficient:

policy = %{
  granted_scopes: ["read:contacts", "read:leads"]
}

# Tool requires write permission
SemanticGuard.allow?(server_id, %{requires_scopes: ["write:contacts"]})
# => {:error, %{code: -32602, message: "missing_scopes", 
#      data: %{missing: ["write:contacts"]}}}

Use cases:

  • Enforce least-privilege OAuth grants
  • Prevent scope creep over time
  • Detect permission mismatches
  • Support multiple auth contexts
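The scope check itself reduces to a set difference. A minimal Python sketch (the function name and shape are illustrative, not the platform's actual API):

```python
def check_scopes(granted, required):
    """Return a structured error when required OAuth scopes are missing, else None.

    Mirrors the -32602 "missing_scopes" error shape shown above.
    """
    missing = sorted(set(required) - set(granted))
    if missing:
        return {"code": -32602, "message": "missing_scopes",
                "data": {"missing": missing}}
    return None  # scopes sufficient
```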

PII Protection

Block tools that expose sensitive data when policy forbids:

policy = %{
  pii_allowed: false
}

# Tool is marked as exposing PII
SemanticGuard.allow?(server_id, %{pii: true})
# => {:error, %{code: -32600, message: "pii_blocked"}}

PII classifications (tool metadata):

  • pii: true - Tool may return email, phone, SSN, etc.
  • pii: false - Tool returns only non-sensitive data
  • pii_fields: ["email", "ssn"] - Specific fields contain PII

Use cases:

  • GDPR/CCPA compliance
  • Minimize data exposure to LLMs
  • Separate dev/prod data policies
  • Support data residency requirements

Dynamic Risk Assessment

Detect PII in arguments even if tool isn’t flagged:

policy = %{
  pii_allowed: false
}

# Agent passes email in arguments
SemanticGuard.allow?(server_id, %{pii: false}, %{
  email: "user@example.com"
})
# => {:error, %{code: -32600, message: "pii_detected_in_args"}}

Detection heuristics:

  • Email patterns (@ symbol, domain format)
  • SSN patterns (XXX-XX-XXXX)
  • Credit card patterns (16 digits)
  • Phone patterns (parentheses, hyphens)

Use cases:

  • Catch accidental PII exposure
  • Prevent agents from leaking data
  • Enforce input sanitization
  • Support zero-trust data handling
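The heuristics above can be approximated with a small pattern-based detector. A Python sketch (these regular expressions are simplified stand-ins; real detectors would add checksums, validation, and context):

```python
import re

# Simplified approximations of the detection heuristics listed above
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
}

def detect_pii(args):
    """Return the PII categories found anywhere in the argument values."""
    found = set()
    for value in args.values():
        if isinstance(value, dict):
            found.update(detect_pii(value))  # walk nested arguments
        elif isinstance(value, str):
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    found.add(label)
    return sorted(found)
```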

Policy Composition

Policies can be layered with inheritance:

# Server-level policy (base)
server_policy = %{
  allow_side_effects: ["read", "write"],
  allow_destructive: false,
  pii_allowed: false
}

# Integration-specific override
salesforce_policy = Map.merge(server_policy, %{
  pii_allowed: true,  # Salesforce needs PII access
  required_scopes: ["read:leads", "write:leads"]
})

Inheritance rules:

  • Server policy is the base default
  • Integration policies merge on top, per integration
  • Tool policies can further constrain
  • Most restrictive value wins when layers conflict implicitly; an integration can loosen a field (as in the Salesforce override above) only through explicit configuration

Conflict Resolution

Policy conflicts are resolved by monotonic restriction: when layers disagree implicitly, the merge selects the more restrictive value at each field, so an integration can never silently grant itself permissions that the server-level policy denies. Loosening is possible only through an explicit administrator-configured override, as when an integration with a legitimate PII need is granted pii_allowed: true over a server default of false. Tightening never requires an override: an integration that blocks PII where the server would allow it is always valid.

When a tool requires access that the resolved policy forbids (e.g., a workflow requires PII access to function but policy blocks it), the tool invocation is rejected with a structured error. There is no automatic escalation or override. The user must explicitly adjust the policy. This fail-closed design ensures that policy violations are never silently resolved.

Enforcement Invariant

Every tool invocation, regardless of entry point, passes through the same enforcement chain: Lookup -> Guard -> Execute -> Redact. This pattern is an architectural invariant maintained across all execution paths. The MCP gateway, interactive sandbox, and agentic loop all apply identical policy enforcement. No execution path bypasses the guard or the redactor.


Dynamic Redaction Architecture

Purpose

Redactors mask sensitive data in tool responses after execution but before returning to agents. This provides defense-in-depth: even if a guard is bypassed, sensitive data is still protected.

Redaction Strategies

Field-Aware Masking

Target specific fields with appropriate masking:

policy = %{
  field_redactions: %{
    "email" => "mask_email",
    "phone" => "mask_phone",
    "ssn" => "mask_all"
  }
}

# Original response
%{
  name: "John Doe",
  email: "john.doe@example.com",
  phone: "(555) 123-4567"
}

# Redacted response
%{
  name: "John Doe",
  email: "j*******@e******.com",
  phone: "(***) ***-****"
}

Masking strategies:

  • mask_email - Preserve structure, mask local/domain parts
  • mask_phone - Replace digits with asterisks
  • mask_all - Replace entire value with fixed-length mask
  • apron - Keep N chars at start/end, mask middle
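The first three strategies can be sketched in Python (illustrative implementations matching the example shapes above, not the canonical algorithms):

```python
def mask_email(value):
    """Preserve structure: keep the first char of the local part and of the
    domain name, mask the rest, keep the TLD."""
    local, _, domain = value.partition("@")
    name, _, tld = domain.rpartition(".")
    return (local[0] + "*" * (len(local) - 1) + "@"
            + name[0] + "*" * (len(name) - 1) + "." + tld)

def mask_phone(value):
    """Replace every digit with an asterisk, keep punctuation."""
    return "".join("*" if ch.isdigit() else ch for ch in value)

def mask_all(value, length=8):
    """Replace the entire value with a fixed-length mask."""
    return "*" * length
```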

Apron Masking

Preserve prefix/suffix for readability:

policy = %{
  field_redactions: %{
    "api_key" => %{
      "strategy" => "apron",
      "apron" => 3  # Keep 3 chars on each end
    }
  }
}

# Original: "sk_live_4eC39HqLyjWDarjtT1zdp7dc"
# Redacted: "sk_**************************7dc"

Configuration:

  • apron: N - Number of characters to preserve
  • mask_char: "*" - Character for masking (default: *)
  • fixed_length: M - Always produce M-length mask

Use cases:

  • API keys (show prefix for debugging)
  • Tokens (preserve format hints)
  • IDs (maintain length for UI spacing)
  • Account numbers (last 4 digits visible)
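Apron masking can be sketched as follows in Python (parameter names mirror the configuration keys above; the short-string fallback is an assumption):

```python
def apron_mask(value, apron=3, mask_char="*", fixed_length=None):
    """Keep `apron` characters at each end and mask the middle.

    If the value is too short to preserve both ends, mask it entirely
    (an assumed fail-safe, not documented behavior).
    """
    if len(value) <= 2 * apron:
        return mask_char * (fixed_length or len(value))
    middle = fixed_length if fixed_length is not None else len(value) - 2 * apron
    return value[:apron] + mask_char * middle + value[-apron:]
```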

Structure-Preserving Scrambling

Generate realistic fake data that maintains format:

policy = %{
  field_redactions: %{
    "email" => "scramble",
    "phone" => "scramble",
    "ssn" => "scramble"
  }
}

# Original
%{
  email: "john.smith@acme.com",
  phone: "(555) 123-4567",
  ssn: "123-45-6789"
}

# Scrambled (realistic fakes)
%{
  email: "jane.jones@example.com",
  phone: "(722) 456-8901",
  ssn: "234-56-7890"
}

Scrambled values are deterministic: the same input produces the same fake output across invocations, ensuring consistency in logs and audit trails without exposing real data. Determinism is achieved through keyed hashing with a deployment-specific salt, so scrambled values cannot be reversed or predicted across deployments.
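A minimal sketch of how keyed-hash determinism could work, in Python (the salt value, helper names, and digit derivation are assumptions, not the actual algorithm):

```python
import hashlib
import hmac

SALT = b"deployment-specific-salt"  # assumption: a per-deployment secret key

def _stable_digits(value, n):
    """Derive n stable pseudo-random digits from a keyed hash of the input."""
    digest = hmac.new(SALT, value.encode(), hashlib.sha256).hexdigest()
    return str(int(digest, 16)).zfill(n)[:n]

def scramble_ssn(value):
    """Deterministically replace an SSN with a structurally valid fake:
    same input always yields the same fake within one deployment."""
    d = _stable_digits(value, 9)
    return f"{d[:3]}-{d[3:5]}-{d[5:9]}"
```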

Generation rules:

  • Emails: Random first/last name + test domains
  • Phones: Random area codes (200-999 range)
  • SSNs: Random valid format (not real numbers)
  • Dates: Random dates in past 50 years
  • Names: Pool of common first/last names
  • Addresses: Random street numbers + common streets

Use cases:

  • Development/testing with realistic data
  • Demo environments with fake but structured data
  • UI testing without exposing real PII
  • Training datasets with privacy preservation

Auto-Masking on Policy Mismatch

Aggressive redaction when tool PII flag conflicts with policy:

policy = %{
  pii_allowed: false
}

tool_meta = %{
  pii: true  # Tool may return PII
}

# Response contains obvious PII
response = %{
  contact: "john@example.com"
}

# Auto-masked despite passing guard
%{
  contact: "j***@e******.com"
}

Triggers:

  • Tool marked pii: true but policy says pii_allowed: false
  • Pattern matching finds email/phone even though the tool is not flagged
  • Dynamic risk detection in arguments

Use cases:

  • Belt-and-suspenders PII protection
  • Catch tool metadata errors
  • Enforce data minimization
  • Support defense-in-depth

Configurable Per-Integration

Different integrations need different policies:

salesforce_policy = %{
  field_redactions: %{
    "Email" => "apron_2",
    "Phone" => "mask_phone"
  }
}

stripe_policy = %{
  field_redactions: %{
    "card" => %{
      "strategy" => "apron",
      "apron" => 4  # Show last 4 digits
    },
    "ssn_last_4" => "mask_all"
  }
}

Integration with Cognitive Trust Certificates

Identity-Anchored Enforcement

Policy enforcement presupposes authenticated identity: the system must know who is subject to a policy before it can evaluate what they are allowed to do. In substrate and conduit environments, identity is established through the platform’s Certificate Authority, which issues short-lived X.509 certificates for mutual TLS authentication (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents for the certificate lifecycle). Every policy evaluation occurs within an mTLS session where the caller’s identity has already been cryptographically verified against the CA chain. This means policy decisions are bound to a verified principal, not to a bearer token that could be replayed or shared.

The same CA root that authenticates the caller also anchors the CTC signatures that prove policy compliance after validation. Identity verification and compliance proof share a common trust root, creating an unbroken chain from “who is asking” through “what are they allowed to do” to “proof that what they did was validated.”

Policy Validation During CTC Generation

When a CTC is generated for a workflow plan (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems), every step is checked against the active policy. If any step violates policy (blocked integration, disallowed side effect, PII exposure without clearance), CTC generation fails and the plan is rejected before execution.

The CTC records which policies were enforced, including a snapshot of the policy configuration at validation time. This snapshot is included in the CTC evidence, ensuring that auditors can verify what rules were in effect when the plan was approved.

Execution Requires Valid CTC

Workflows cannot execute without a valid CTC that includes policy validation evidence. The execution layer verifies the CTC signature and confirms the policy snapshot before allowing any tool invocation.

Audit Trail Completeness

The combination of CTCs and policy enforcement provides a complete audit trail:

  1. Pre-execution: CTC proves policies were validated
  2. During execution: Guards enforce policies in real-time
  3. Post-execution: Redactor masks sensitive data
  4. Audit: CTC + execution logs show complete lineage

For compliance audits: show the CTC proving validation, show the policy configuration at execution time, show redaction was applied, and prove cryptographically that the workflow matched the validated plan.


Enterprise Implications

Compliance Enablement

GDPR Article 25 (Data Protection by Design):

  • Policies enforce data minimization by default
  • PII protection is architectural, not procedural
  • Redaction ensures “privacy by design and by default”

GDPR Article 30 (Records of Processing):

  • CTCs provide immutable record of what tools accessed what data
  • Policy snapshots show what controls were in place
  • Audit logs demonstrate continuous compliance

SOC 2 CC6.1 (Logical Access Controls):

  • Integration allowlists enforce least-privilege access
  • Scope verification ensures proper authorization
  • Policy inheritance supports role-based access

HIPAA Section 164.312(a)(1) (Access Control):

  • Semantic Guards implement required access controls
  • Policies define who/what can access PHI
  • Audit trails support compliance demonstrations

Risk Mitigation

Prevent Data Breaches:

  • PII detection stops accidental exposure
  • Redaction provides defense-in-depth
  • Integration allowlists limit blast radius

Prevent Destructive Operations:

  • Destructive operation blocking prevents data loss
  • Approval workflows for high-risk actions
  • Side effect controls enforce read-only where appropriate

Enable Incident Response:

  • Complete audit trail for forensics
  • Policy snapshots show what was allowed
  • CTC evidence supports root cause analysis

Operational Benefits

Developer Experience:

  • Declarative policies are readable
  • No prompt engineering for permissions
  • Policies version-controlled with code

Scalability:

  • Policies enforce automatically at runtime
  • No manual review of agent actions
  • Deterministic enforcement across all agents

Gradual Rollout:

  • Start with restrictive policies
  • Expand permissions incrementally
  • Test in dev before prod

Multi-Tenancy:

  • Per-server policies for isolation
  • Per-integration overrides for flexibility
  • Inheritance reduces duplication

Policy Configuration Examples

Development Environment

Restrictive policies for testing:

allow_side_effects:
  - read
allow_destructive: false
pii_allowed: false
allowed_integrations:
  - salesforce_sandbox
  - slack_dev
field_redactions:
  email: scramble
  phone: scramble
  ssn: mask_all

Production Environment

More permissive but with redaction:

allow_side_effects:
  - read
  - write
allow_destructive: false  # Still no destructive ops
pii_allowed: true  # Allow PII access
allowed_integrations:
  - salesforce_prod
  - slack_prod
  - stripe_prod
field_redactions:
  email: apron_2
  phone: mask_phone
  card: apron_4
  ssn: mask_all

Read-Only Analyst Agent

For reporting and analytics:

allow_side_effects:
  - read
allow_destructive: false
pii_allowed: false
allowed_integrations:
  - warehouse
  - analytics
  - reporting
field_redactions:
  email: mask_email
  user_id: fixed_length_10

Administrative Agent

Higher privileges, human approval required:

allow_side_effects:
  - read
  - write
  - delete  # Only for admin agent
allow_destructive: true  # With approval workflow
pii_allowed: true
allowed_integrations: any
field_redactions:
  ssn: mask_all  # Always mask SSN
  card: apron_4  # Show last 4

Performance Considerations

Guard Evaluation Overhead

Policy checks add minimal latency:

  • Integration check: O(1) hash lookup
  • Side effect check: O(1) set membership
  • Scope verification: O(n) where n = number of scopes
  • PII detection: O(m) where m = argument depth

Typical overhead: <5ms per tool call

Redaction Performance

Masking strategies have different costs:

  • Simple masking: O(n) where n = string length
  • Apron masking: O(n) string operations
  • Scrambling: O(n) + random generation
  • Deep walk: O(d * k) where d = depth, k = keys

Typical overhead: <10ms per response

Optimization Strategies

Policy Caching:

  • Pre-compile policies at server startup
  • Cache policy lookups by server_id
  • Invalidate cache on policy updates

Lazy Evaluation:

  • Check integration allowlist first (fail fast)
  • Skip PII detection if policy allows PII
  • Short-circuit on first violation

Batching:

  • Validate multiple steps in single pass
  • Apply redaction to batched responses
  • Amortize policy fetch overhead

Future Directions

ML-Based Policy Suggestions

Analyze agent behavior to recommend policies:

  • Identify unused integration permissions
  • Detect overly permissive scopes
  • Suggest tighter redaction rules
  • Propose least-privilege configurations

Anomaly Detection

Flag suspicious patterns:

  • Sudden spike in destructive operations
  • Unusual integration access patterns
  • Policy violation attempts
  • Scope escalation requests

Policy Evolution

Support policy changes over time:

  • Staged rollout of policy updates
  • A/B testing of policy configurations
  • Automatic policy optimization
  • Rollback on excessive violations

Cross-Organization Policy Sharing

Enable policy templates:

  • Industry-specific policy baselines (healthcare, finance)
  • Compliance-driven templates (GDPR, HIPAA)
  • Best practice policies from community
  • Vendor-recommended configurations

Advanced Redaction

Enhanced masking capabilities:

  • Tokenization (reversible masking for authorized users)
  • Format-preserving encryption
  • Differential privacy guarantees
  • Contextual redaction (mask based on recipient)

Appendix: Policy Decision Flow

Pre-Execution (Semantic Guard)

Agent requests tool execution
         |
Extract policy for server/integration
         |
Check integration allowlist
  - Not allowed -> Deny
  - Allowed -> Continue
         |
Check side effect permissions
  - Not allowed -> Deny
  - Allowed -> Continue
         |
Check destructive flag
  - Blocked -> Deny
  - Allowed -> Continue
         |
Check OAuth scopes
  - Missing scopes -> Deny
  - Scopes sufficient -> Continue
         |
Check PII flag
  - PII blocked -> Deny
  - Allowed -> Continue
         |
Check for PII in arguments
  - PII detected -> Deny
  - Clean -> Allow
         |
Execute tool
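The flow above can be sketched as a fail-fast chain, here in Python (field names follow this paper's examples; treating an absent allowlist as deny-all and side effect "none" as always permitted are assumptions consistent with fail-safe defaults):

```python
def semantic_guard(policy, tool, args):
    """Evaluate the pre-execution checks in order, failing fast.

    Returns None when the call is allowed, else a structured error.
    Illustrative sketch, not the platform's actual API.
    """
    def deny(code, message):
        return {"code": code, "message": message}

    if tool.get("integration") not in policy.get("allowed_integrations", []):
        return deny(-32600, "integration_not_allowed")   # deny unless allowlisted
    if tool.get("side_effect", "none") not in ["none"] + policy.get("allow_side_effects", []):
        return deny(-32600, "side_effect_blocked")
    if tool.get("destructive") and not policy.get("allow_destructive"):
        return deny(-32600, "destructive_blocked")
    if set(tool.get("requires_scopes", [])) - set(policy.get("granted_scopes", [])):
        return deny(-32602, "missing_scopes")
    if tool.get("pii") and not policy.get("pii_allowed"):
        return deny(-32600, "pii_blocked")
    if not policy.get("pii_allowed") and any("@" in str(v) for v in args.values()):
        return deny(-32600, "pii_detected_in_args")      # crude stand-in detector
    return None  # all checks passed -> execute tool
```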

Post-Execution (Redactor)

Tool returns response
         |
Extract redaction policy
         |
Walk response structure
  - For each field:
    - Check field_redactions config
    - Apply masking strategy
    - Replace value
  - Continue
         |
Check for auto-masking conditions
  - Tool PII=true, Policy PII=false
    - Apply aggressive masking
  - Continue
         |
Return redacted response
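The redaction walk can be sketched as a recursive traversal, here in Python with two simplified stand-in strategies (an illustrative sketch of the flow, not the full strategy set):

```python
def redact(value, field_redactions, key=None):
    """Deep-walk a response, masking fields named in field_redactions."""
    strategies = {
        "mask_all": lambda v: "*" * 8,
        "mask_phone": lambda v: "".join("*" if c.isdigit() else c for c in str(v)),
    }
    if isinstance(value, dict):
        # Recurse, passing each field name down for strategy lookup
        return {k: redact(v, field_redactions, key=k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v, field_redactions) for v in value]
    strategy = strategies.get(field_redactions.get(key))
    return strategy(value) if strategy else value
```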

This document describes the conceptual architecture of runtime policy enforcement for autonomous systems. Specific detection heuristics, masking algorithms, and performance optimizations are withheld to protect operational security while enabling understanding of the governance model.

Author: Nicholas Wright

Title: Co-Founder & Chief Architect, DataGrout AI

Affiliation: DataGrout Labs

Version: 1.0

Published: January 2026

For questions or collaboration: labs@datagrout.ai