Governor: Neuro-Symbolic Runtime for Token-Efficient Agent Cognition

Reflex and Reflection Cycles for Bounded Autonomous Operation

Abstract

Long-running AI agents face a token economics problem. Continuous operation patterns (polling for state changes, repeated status checks, reactive monitoring) generate unbounded LLM inference costs. An agent checking “is it daytime yet?” every 30 minutes burns nearly $30 overnight at Opus pricing, consuming roughly 2M tokens for a decision that should cost zero.

Governor is an optional add-on module for the Arbiter Substrate that provides continuous autonomous execution through a neuro-symbolic runtime. Agent cognition is split into two cycles: Reflex (the symbolic cycle), a lightweight loop that evaluates Prolog triggers over a continuously updated fact database every ~30 seconds, and Reflection (the neural cycle), a full agentic loop that fires when triggers match or on a minimum heartbeat interval (~6 hours). Between cycles, percepts (sensors) update the agent’s world model in real time without consuming inference tokens.

Reflection reflects on accumulated information, updates Prolog rules governing the world model, and defines new triggers for the next Reflex period. This creates a learning loop where agents become progressively more efficient: routine conditions are handled symbolically, and LLM inference is reserved for genuine reasoning.

Key results: 10-100x token reduction, sub-10ms trigger evaluation, automatic conversion of polling patterns to event-driven percepts, and progressive efficiency gains through pattern learning.


Problem Landscape

Token Costs in Long-Running Agents

Production AI agents operate continuously. Traditional architectures use polling:

import time

while True:
    state = check_current_state()  # LLM call
    if should_act(state):          # LLM call
        take_action()              # LLM call
    time.sleep(interval)

Cost analysis for a reminder agent:

  • Query: “Is it time to remind the user?” (120k tokens per call)
  • Model: Claude Opus 4.5 ($15 per million input tokens)
  • Frequency: Every 30 minutes
  • Cost per check: $1.80
  • Overnight (8 hours, 16 checks): $28.80
  • Monthly (overnight checks alone): $864

This is economically untenable. Polling-based continuous operation scales linearly with time, not value delivered. Agents need continuous operation but cannot afford continuous inference.


Governor Architecture

Reflex (Symbolic Cycle)

Reflex is a lightweight loop that runs approximately every 30 seconds. Each iteration evaluates a set of triggers against the current fact database. Triggers are Prolog queries defined during the previous Reflection (see The Symbolic Backbone: Why Agent Systems Need Logic Programming for the design rationale behind Prolog). Examples include time-of-day conditions, file system state changes with count thresholds, and budget remaining checks. Each trigger is a declarative pattern that fires when facts satisfy its conditions.

Execution:

  1. Reflex wakes from sleep (~30 seconds)
  2. Evaluates each trigger query against the current fact database
  3. If no triggers match: sleep and repeat
  4. If a trigger matches: fire Reflection with the trigger context
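The execution steps above can be sketched in a few lines. This is a minimal illustration, assuming facts are stored as a set of ground tuples and triggers as (name, condition) pairs; none of these names reflect the actual implementation.

```python
import time

# Hypothetical world model: facts as ground tuples, triggers as
# (name, condition) pairs where the condition is a fact that must
# be present for the trigger to fire.
facts = {("budget_remaining", 42)}
triggers = [("morning_reminder", ("time_of_day", "morning"))]

def matching_triggers(facts, triggers):
    """Return the name of every trigger whose condition the facts satisfy."""
    return [name for name, condition in triggers if condition in facts]

def reflex_cycle(facts, triggers, fire_reflection, interval=30):
    """One symbolic pass per interval: match triggers, otherwise sleep."""
    while True:
        matched = matching_triggers(facts, triggers)
        if matched:
            fire_reflection(matched)  # hand control to the neural cycle
        time.sleep(interval)
```

Trigger matching here is a pure set-membership check, which is what keeps each cycle deterministic and token-free.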

Characteristics:

  • Deterministic trigger matching over Prolog facts
  • Zero token consumption
  • Sub-10ms evaluation latency per cycle in the typical case
  • Handles the vast majority of cycles without invoking inference

Reflex does not make decisions about commands or policies. Command-level policy enforcement is the domain of ArbiterService and its rule packs (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents). Reflex’s sole purpose is monitoring the world model for conditions that warrant deeper reasoning.

Reflection (Neural Cycle)

Reflection is a full agentic loop that activates when a Reflex trigger fires or when a minimum heartbeat interval (~6 hours) elapses. The heartbeat ensures the agent periodically reflects even if no triggers match, preventing edge cases where important changes go unprocessed.

Reflection process:

  1. Review all new information accumulated since the last Reflection (percept updates, trigger context)
  2. Run agentic reasoning loop (LLM-powered, iterative until satisfied)
  3. Reflect on patterns, anomalies, and new conditions in the world model
  4. Create or update Prolog rules
  5. Define new triggers for the next Reflex period
  6. Go dormant until next trigger or heartbeat

Reflection runs until it is satisfied that it has processed all relevant information and configured appropriate triggers. There is no fixed time limit; it operates as a standard agentic loop with access to tools, facts, and the Prolog rule engine.

What Reflection produces:

  • Updated Prolog rules that refine the world model
  • New or modified triggers for future Reflex evaluation
  • Direct actions (tool calls, notifications, task execution)
  • Percept configuration changes (new sensors, adjusted thresholds)
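The four kinds of output listed above can be modeled as a simple record that a Reflection hands back to the runtime. The field names here are illustrative assumptions, not the actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class ReflectionResult:
    """Illustrative container for what one Reflection produces."""
    new_rules: list = field(default_factory=list)        # Prolog rules, as text
    new_triggers: list = field(default_factory=list)     # (name, condition) pairs
    actions: list = field(default_factory=list)          # tool calls to execute
    percept_changes: list = field(default_factory=list)  # sensor reconfigurations

# Example: a Reflection that only installs one new trigger.
result = ReflectionResult(
    new_triggers=[("morning_reminder", ("time_of_day", "morning"))],
)
```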

Cost per Reflection:

  • Tokens: 100-400k (depending on complexity)
  • Cost: $1.50-6.00
  • Value: creates reusable triggers and rules that handle future conditions symbolically
  • Break-even: 1-2 future cycles where symbolic handling avoids inference

Percepts (World Model Sensors)

Percepts are event-driven sensors that update the fact database in real time. They bridge the external environment and the agent’s world model, ensuring Reflex always evaluates triggers against current state.

Percept types include cron/time percepts (asserting schedule facts such as time_of_day(morning)), file system watchers (directory and file state changes), and process percepts (container and service state).

When a percept fires, it updates facts in the Prolog database – asserting new facts and retracting stale ones. The next Reflex iteration evaluates triggers against these updated facts. Percepts do not trigger Reflection directly; they update facts, and triggers determine when reasoning is needed.
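The assert/retract behavior of a percept can be sketched as below. The file-watcher callback shape and the fact names are assumptions for illustration only.

```python
def on_new_file(facts, path, pending_count):
    """Hypothetical file-system percept callback: retract the stale
    count fact and assert the current state."""
    facts.discard(("pending_files", pending_count - 1))  # retract stale fact
    facts.add(("pending_files", pending_count))          # assert fresh fact
    facts.add(("new_file", path))

facts = {("pending_files", 2)}
on_new_file(facts, "/data/incoming/report.csv", 3)
# A trigger such as ("backlog", ("pending_files", 3)) would now match
# on the next Reflex iteration -- no inference tokens were spent.
```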


Token Economics

Percept-Driven vs. Polling

Conditions that traditional agents evaluate through repeated LLM calls (directory monitoring, state checks, time conditions) are handled by percepts at zero token cost. A percept watching a directory for new files costs nothing between events – it updates the fact database when something changes, and Reflex evaluates the trigger symbolically. A polling equivalent checking once per minute at 80k tokens per check would consume 4.8M tokens per hour ($72/hour at Opus pricing).

Reflex vs. Reflection Cost Distribution

Scenario: 100 conditions evaluated over 8 hours

Pure LLM approach:

  • 100 evaluations * 150k tokens = 15M tokens
  • Cost: $225 (Opus pricing)

Governor (Reflex handles most, Reflection fires selectively):

  • ~90 conditions resolved by Reflex trigger evaluation: 0 tokens = $0
  • ~10 conditions requiring Reflection: 2M tokens = $30
  • Total: $30

Savings: 87%
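The scenario above works out as follows, using the Opus input price of $15 per million tokens and the figures from this section:

```python
PRICE_PER_TOKEN = 15 / 1_000_000  # $15 per million input tokens

# Pure LLM: every condition is an inference call.
pure_llm = 100 * 150_000 * PRICE_PER_TOKEN       # 15M tokens

# Governor: ~90 conditions resolved symbolically (0 tokens),
# ~10 conditions escalate to Reflection (~2M tokens total).
governor = 2_000_000 * PRICE_PER_TOKEN

savings = 1 - governor / pure_llm
print(f"pure LLM ${pure_llm:.0f}, Governor ${governor:.0f}, savings {savings:.0%}")
# prints: pure LLM $225, Governor $30, savings 87%
```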

Progressive Efficiency Gains

As Reflection synthesizes rules and triggers, Reflex handles an increasing share of conditions:

  • Week 1: 60% Reflex, 40% Reflection -> $120/day
  • Week 4: 90% Reflex, 10% Reflection -> $18/day
  • Month 3: 95% Reflex, 5% Reflection -> $9/day

Efficiency compounds over time as each Reflection converts experiences into symbolic rules and triggers that Reflex evaluates at zero token cost.


Pattern Learning and Rule Synthesis

From Reflection to Reflex

When Reflection runs, it creates new symbolic rules and triggers:

Observation (Reflection analyzes accumulated percept data):

Percept log since last Reflection:
  08:00 - time_of_day(morning) asserted
  08:02 - user_query("check docker webapp status")
  08:30 - user_query("check docker webapp status")
  09:00 - user_query("check docker webapp status")
  
Pattern: Same query, 30-minute intervals
Intent: Checking if container is running
Current handling: Each query triggers full Reflection

Synthesis (Reflection creates):

  • Identifies polling pattern
  • Extracts intent (container status monitoring)
  • Installs a process percept that watches container state
  • Creates a trigger that fires when the container’s state changes to stopped
  • Future monitoring: $0 (percept updates facts, Reflex checks trigger)
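The synthesized percept-plus-trigger pair can be sketched as follows; the container name, fact shapes, and callback are illustrative assumptions.

```python
# Trigger synthesized by Reflection: fire when the watched container stops.
triggers = [("webapp_down", ("container_state", "webapp", "stopped"))]

def on_container_state(facts, name, state):
    """Hypothetical process-percept callback: replace the container's
    previous state fact with its current one."""
    facts = {f for f in facts
             if not (f[0] == "container_state" and f[1] == name)}
    facts.add(("container_state", name, state))
    return facts

facts = on_container_state(set(), "webapp", "running")
# No match while running; Reflex sleeps at zero cost.
facts = on_container_state(facts, "webapp", "stopped")
# The stopped-state fact now matches the trigger, so the next
# Reflex iteration fires Reflection instead of another LLM poll.
```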

Pattern categories learned:

  • Temporal patterns: Polling loops -> Event watchers + triggers
  • Resource monitoring: Repeated checks -> Percepts with threshold triggers
  • State verification: “Is X ready?” -> State change percepts with conditional triggers
  • Anomaly detection: Unusual fact patterns -> Triggers for investigative Reflection

Example: Polling Pattern Conversion

The Overnight Reminder

Agent task: “Remind me tomorrow morning to review the report”

Without symbolic reasoning (the problem): LLMs default to polling because they lack symbolic temporal reasoning, each inference is stateless, and they don’t understand “wait” as a primitive operation. The cost:

19:00 - Check: Is it morning? No. Cost: $1.80
19:30 - Check: Is it morning? No. Cost: $1.80
20:00 - Check: Is it morning? No. Cost: $1.80
...
08:00 - Check: Is it morning? Yes! Cost: $1.80

Total: 26 checks * $1.80 = $46.80
Actual work: 1 reminder delivered

With Governor (the solution):

First Reflection:

  1. Agent receives task during an active Reflection
  2. Reflection analyzes: this requires waiting for a time condition
  3. Reflection installs cron percept for 08:00 that will assert time_of_day(morning)
  4. Reflection creates trigger: trigger(morning_reminder, time_of_day(morning))
  5. Reflection goes dormant
  6. Cost: $3.00 (one Reflection)

Overnight (Reflex running every ~30 seconds):

  • Reflex evaluates trigger(morning_reminder, time_of_day(morning))
  • Fact time_of_day(morning) not yet asserted
  • Reflex sleeps. Repeat. Cost: $0

At 08:00:

  • Cron percept fires, asserts time_of_day(morning)
  • Next Reflex iteration: trigger matches
  • Reflection fires, delivers reminder, clears trigger
  • Cost: $1.50 (minimal Reflection)
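The overnight flow can be traced end to end in a few lines, reusing the set-of-tuples fact store and (name, condition) trigger shape assumed earlier in this document's sketches.

```python
facts = set()
triggers = [("morning_reminder", ("time_of_day", "morning"))]

def matched(facts, triggers):
    """Names of triggers whose condition fact is present."""
    return [name for name, cond in triggers if cond in facts]

# Overnight: the fact is absent, so every Reflex pass is a no-op.
assert matched(facts, triggers) == []

# 08:00: the cron percept fires and asserts the fact.
facts.add(("time_of_day", "morning"))

# Next Reflex iteration: the trigger matches and Reflection is fired.
assert matched(facts, triggers) == ["morning_reminder"]
```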

Total cost comparison:

  • Without Governor: $46.80 (26 polling checks)
  • With Governor: $4.50 (2 Reflections + percept)
  • Savings: 90%

Integration with Arbiter Substrate

Governor is an optional module that plugs into the Arbiter Substrate. It is not a policy engine.

Arbiter Substrate provides OS-level governance for autonomous agents through ArbiterService, which evaluates commands against signed rule packs using Prolog-based symbolic pattern matching. This is the system that enforces command-level policy: denying dangerous operations, requiring confirmations, validating workflow suitability. All decisions are cryptographically signed with Ed25519, and Substrate instances authenticate via mutual TLS (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents).

Governor adds continuous cognition on top of this substrate. Where Arbiter Substrate governs what an agent is allowed to do, Governor governs what an agent chooses to do and when.

Percepts (sensors)
  | update facts in real time
Fact Database (Prolog world model)
  | evaluated every ~30s
Reflex (symbolic cycle)
  +- No trigger match -> Sleep
  +- Trigger match -> Fire Reflection
                        |
Reflection (neural cycle)
  +- Reflect on new information
  +- Update rules and triggers
  +- Execute actions (via Arbiter Substrate)
  +- Go dormant

When Reflection decides to execute an action (a tool call, a system command, an API request), that action routes through Arbiter Substrate’s policy layer as normal. Governor decides what to do; Arbiter Substrate decides whether it’s allowed.

Governor operates as a hosted service on the DataGrout platform, enabling centralized world model management and historical analysis of agent cognition patterns.


Operational Characteristics

Latency Distribution

Reflex trigger evaluation:

  • P50: 3ms, P95: 8ms, P99: 15ms

Reflection execution (when triggered):

  • P50: 2.3s, P95: 5.8s, P99: 12.0s

Reflex runs continuously with negligible overhead. Reflections are infrequent and produce durable rules that reduce future Reflection frequency.

Learning Rate

Trigger and rule growth over time:

  • Week 1: 50 base rules + 15 learned = 65 total
  • Week 4: 118 total (growth slows as common patterns are captured)

Long-tail novelty and the 6-hour heartbeat ensure Reflections continue to fire, but with decreasing frequency as the world model matures.


Comparison with Existing Approaches

vs. Pure LLM Agents

Aspect           | Pure LLM                | Governor
Decision speed   | 1-3 seconds             | <10ms (Reflex trigger evaluation)
Token cost       | 100-200k per decision   | 0 (Reflex); ~200k (Reflection when needed)
Polling overhead | Unbounded               | Zero (percept-driven)
Learning         | None (stateless)        | Automatic (Reflection -> triggers + rules)
Continuity       | Stateless between calls | Persistent world model

vs. Rule-Based Systems

Aspect              | Rule-Based            | Governor
Rule authoring      | Manual                | Automatic (Reflection synthesizes)
Adaptability        | Static                | Dynamic (learns from experience)
Novel situations    | Fail or require human | Reflection handles via agentic loop
Minimum maintenance | Continuous            | 6-hour heartbeat only

Governor combines the symbolic efficiency of rule-based systems with the adaptive learning of LLM agents, applied specifically to the problem of continuous autonomous operation.

For the economic primitives that Governor references, see Credit System: Economic Primitives for Autonomous Systems and Virtual Resource Accounting: Decoupled Agent Budgets for Autonomous Systems. For the policy enforcement layer that evaluates agent actions at the command level, see Runtime Policy Enforcement for Autonomous AI Systems.


Conclusion

Long-running AI agents face unsustainable token economics. Polling-based continuous operation generates unbounded LLM costs that scale linearly with time, not value delivered.

Governor solves this through neuro-symbolic cognition split into two cycles. Reflex evaluates Prolog triggers over a continuously updated fact database every ~30 seconds at zero token cost. Reflection provides full agentic reasoning when triggers match or a minimum heartbeat interval elapses. Percepts bridge the external environment to the world model in real time, converting polling patterns into event-driven updates.

The architecture achieves 10-100x token reduction with progressive efficiency gains: each Reflection converts experiences into symbolic rules and triggers, shifting an increasing share of conditions from costly inference to zero-cost symbolic evaluation.

Governor is an optional module for the Arbiter Substrate. It does not replace Arbiter Substrate’s policy enforcement, which is handled by ArbiterService and signed rule packs. Governor adds continuous cognition for autonomous systems that need to operate indefinitely with bounded costs.


February 2026

This document describes the Governor architecture for neuro-symbolic continuous agent cognition. Implementation details withheld.

Author: Nicholas Wright

Title: Co-Founder & Chief Architect, DataGrout AI

Affiliation: DataGrout Labs

Version: 1.0

Published: February 2026

For questions or collaboration: labs@datagrout.ai