Governor: Neuro-Symbolic Runtime for Token-Efficient Agent Cognition

Reflex and Reflection Cycles for Bounded Autonomous Operation

Abstract

Long-running AI agents face a token economics problem. Continuous operation patterns (polling for state changes, repeated status checks, reactive monitoring) generate unbounded LLM inference costs. An agent checking “is it daytime yet?” every 30 minutes burns nearly $30 overnight at Opus pricing, consuming roughly 2M tokens for a decision that should cost zero.

Governor is an optional add-on module for the Arbiter Substrate that provides continuous autonomous execution through a neuro-symbolic runtime. Agent cognition is split into two cycles: Reflex (the symbolic cycle), a lightweight loop that evaluates Prolog triggers over a continuously updated fact database every ~30 seconds, and Reflection (the neural cycle), a full agentic loop that fires when triggers match or on a minimum heartbeat interval (~6 hours). Between cycles, percepts (sensors) update the agent’s world model in real time without consuming inference tokens.

Reflection reflects on accumulated information, updates Prolog rules governing the world model, and defines new triggers for the next Reflex period. This creates a learning loop where agents become progressively more efficient: routine conditions are handled symbolically, and LLM inference is reserved for genuine reasoning.

Key results: 10-100x token reduction, sub-10ms trigger evaluation, automatic conversion of polling patterns to event-driven percepts, and progressive efficiency gains through pattern learning.


Problem Landscape

Token Costs in Long-Running Agents

Production AI agents operate continuously. Traditional architectures use polling:

import time

while True:
    state = check_current_state()  # LLM call
    if should_act(state):          # LLM call
        take_action()              # LLM call
    time.sleep(interval)

Cost analysis for a reminder agent:

  • Query: “Is it time to remind the user?” (120k tokens per call)
  • Model: Claude Opus 4.5 ($15 per million input tokens)
  • Frequency: Every 30 minutes
  • Cost per check: $1.80
  • Overnight (8 hours, 16 checks): $28.80
  • Monthly (overnight checks alone): $864

This is economically untenable. Polling-based continuous operation scales linearly with time, not value delivered. Agents need continuous operation but cannot afford continuous inference.


Governor Architecture

Reflex (Symbolic Cycle)

Reflex is a lightweight loop that runs approximately every 30 seconds. Each iteration evaluates a set of triggers against the current fact database. Triggers are Prolog queries defined during the previous Reflection (see The Symbolic Backbone: Why Agent Systems Need Logic Programming for the design rationale behind Prolog). Examples include time-of-day conditions, file system state changes with count thresholds, and budget remaining checks. Each trigger is a declarative pattern that fires when facts satisfy its conditions.

Execution:

  1. Reflex wakes from sleep (~30 seconds)
  2. Evaluates each trigger query against the current fact database
  3. If no triggers match: sleep and repeat
  4. If a trigger matches: fire Reflection with the trigger context
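The execution steps above can be sketched in a few lines. This is a minimal illustration, assuming facts are stored as a set of ground tuples and triggers as (name, condition) pairs; none of these names reflect the actual implementation.

```python
import time

# Hypothetical world model: facts as ground tuples, triggers as
# (name, condition) pairs where the condition is a fact that must
# be present for the trigger to fire.
facts = {("budget_remaining", 42)}
triggers = [("morning_reminder", ("time_of_day", "morning"))]

def matching_triggers(facts, triggers):
    """Return the name of every trigger whose condition the facts satisfy."""
    return [name for name, condition in triggers if condition in facts]

def reflex_cycle(facts, triggers, fire_reflection, interval=30):
    """One symbolic pass per interval: match triggers, otherwise sleep."""
    while True:
        matched = matching_triggers(facts, triggers)
        if matched:
            fire_reflection(matched)  # hand control to the neural cycle
        time.sleep(interval)
```

Trigger matching here is a pure set-membership check, which is what keeps each cycle deterministic and token-free.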

Characteristics:

  • Deterministic trigger matching over Prolog facts
  • Zero token consumption
  • Sub-10ms evaluation latency per cycle in the typical case
  • Handles the vast majority of cycles without invoking inference

Reflex does not make decisions about commands or policies. Command-level policy enforcement is the domain of ArbiterService and its rule packs (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents). Reflex’s sole purpose is monitoring the world model for conditions that warrant deeper reasoning.

Reflection (Neural Cycle)

Reflection is a full agentic loop that activates when a Reflex trigger fires or when a minimum heartbeat interval (~6 hours) elapses. The heartbeat ensures the agent periodically reflects even if no triggers match, preventing edge cases where important changes go unprocessed.

Reflection process:

  1. Review all new information accumulated since the last Reflection (percept updates, trigger context)
  2. Run agentic reasoning loop (LLM-powered, iterative until satisfied)
  3. Reflect on patterns, anomalies, and new conditions in the world model
  4. Create or update Prolog rules
  5. Define new triggers for the next Reflex period
  6. Go dormant until next trigger or heartbeat

Reflection runs until it is satisfied that it has processed all relevant information and configured appropriate triggers. There is no fixed time limit; it operates as a standard agentic loop with access to tools, facts, and the Prolog rule engine.

What Reflection produces:

  • Updated Prolog rules that refine the world model
  • New or modified triggers for future Reflex evaluation
  • Direct actions (tool calls, notifications, task execution)
  • Percept configuration changes (new sensors, adjusted thresholds)
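The four kinds of output listed above can be modeled as a simple record that a Reflection hands back to the runtime. The field names here are illustrative assumptions, not the actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class ReflectionResult:
    """Illustrative container for what one Reflection produces."""
    new_rules: list = field(default_factory=list)        # Prolog rules, as text
    new_triggers: list = field(default_factory=list)     # (name, condition) pairs
    actions: list = field(default_factory=list)          # tool calls to execute
    percept_changes: list = field(default_factory=list)  # sensor reconfigurations

# Example: a Reflection that only installs one new trigger.
result = ReflectionResult(
    new_triggers=[("morning_reminder", ("time_of_day", "morning"))],
)
```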

Cost per Reflection:

  • Tokens: 100-400k (depending on complexity)
  • Cost: $1.50-6.00
  • Value: creates reusable triggers and rules that handle future conditions symbolically
  • Break-even: 1-2 future cycles where symbolic handling avoids inference

Percepts (World Model Sensors)

Percepts are event-driven sensors that update the fact database in real time. They bridge the external environment and the agent’s world model, ensuring Reflex always evaluates triggers against current state.

Percept types include cron/time percepts (asserting schedule facts such as time_of_day(morning)), file system watchers (directory and file state changes), and process percepts (container and service state).

When a percept fires, it updates facts in the Prolog database – asserting new facts and retracting stale ones. The next Reflex iteration evaluates triggers against these updated facts. Percepts do not trigger Reflection directly; they update facts, and triggers determine when reasoning is needed.
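The assert/retract behavior of a percept can be sketched as below. The file-watcher callback shape and the fact names are assumptions for illustration only.

```python
def on_new_file(facts, path, pending_count):
    """Hypothetical file-system percept callback: retract the stale
    count fact and assert the current state."""
    facts.discard(("pending_files", pending_count - 1))  # retract stale fact
    facts.add(("pending_files", pending_count))          # assert fresh fact
    facts.add(("new_file", path))

facts = {("pending_files", 2)}
on_new_file(facts, "/data/incoming/report.csv", 3)
# A trigger such as ("backlog", ("pending_files", 3)) would now match
# on the next Reflex iteration -- no inference tokens were spent.
```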


Token Economics

Percept-Driven vs. Polling

Conditions that traditional agents evaluate through repeated LLM calls (directory monitoring, state checks, time conditions) are handled by percepts at zero token cost. A percept watching a directory for new files costs nothing between events – it updates the fact database when something changes, and Reflex evaluates the trigger symbolically. A polling equivalent checking once per minute at 80k tokens per check would consume 4.8M tokens per hour ($72/hour at Opus pricing).

Reflex vs. Reflection Cost Distribution

Scenario: 100 conditions evaluated over 8 hours

Pure LLM approach:

  • 100 evaluations * 150k tokens = 15M tokens
  • Cost: $225 (Opus pricing)

Governor (Reflex handles most, Reflection fires selectively):

  • ~90 conditions resolved by Reflex trigger evaluation: 0 tokens = $0
  • ~10 conditions requiring Reflection: 2M tokens = $30
  • Total: $30

Savings: 87%
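The scenario above works out as follows, using the Opus input price of $15 per million tokens and the figures from this section:

```python
PRICE_PER_TOKEN = 15 / 1_000_000  # $15 per million input tokens

# Pure LLM: every condition is an inference call.
pure_llm = 100 * 150_000 * PRICE_PER_TOKEN       # 15M tokens

# Governor: ~90 conditions resolved symbolically (0 tokens),
# ~10 conditions escalate to Reflection (~2M tokens total).
governor = 2_000_000 * PRICE_PER_TOKEN

savings = 1 - governor / pure_llm
print(f"pure LLM ${pure_llm:.0f}, Governor ${governor:.0f}, savings {savings:.0%}")
# prints: pure LLM $225, Governor $30, savings 87%
```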

Progressive Efficiency Gains

As Reflection synthesizes rules and triggers, Reflex handles an increasing share of conditions:

  • Week 1: 60% Reflex, 40% Reflection -> $120/day
  • Week 4: 90% Reflex, 10% Reflection -> $18/day
  • Month 3: 95% Reflex, 5% Reflection -> $9/day

Efficiency compounds over time as each Reflection converts experiences into symbolic rules and triggers that Reflex evaluates at zero token cost.


Pattern Learning and Rule Synthesis

From Reflection to Reflex

When Reflection runs, it creates new symbolic rules and triggers:

Observation (Reflection analyzes accumulated percept data):

Percept log since last Reflection:
  08:00 - time_of_day(morning) asserted
  08:02 - user_query("check docker webapp status")
  08:30 - user_query("check docker webapp status")
  09:00 - user_query("check docker webapp status")
  
Pattern: Same query, 30-minute intervals
Intent: Checking if container is running
Current handling: Each query triggers full Reflection

Synthesis (Reflection creates):

  • Identifies polling pattern
  • Extracts intent (container status monitoring)
  • Installs a process percept that watches container state
  • Creates a trigger that fires when the container’s state changes to stopped
  • Future monitoring: $0 (percept updates facts, Reflex checks trigger)
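The synthesized percept-plus-trigger pair can be sketched as follows; the container name, fact shapes, and callback are illustrative assumptions.

```python
# Trigger synthesized by Reflection: fire when the watched container stops.
triggers = [("webapp_down", ("container_state", "webapp", "stopped"))]

def on_container_state(facts, name, state):
    """Hypothetical process-percept callback: replace the container's
    previous state fact with its current one."""
    facts = {f for f in facts
             if not (f[0] == "container_state" and f[1] == name)}
    facts.add(("container_state", name, state))
    return facts

facts = on_container_state(set(), "webapp", "running")
# No match while running; Reflex sleeps at zero cost.
facts = on_container_state(facts, "webapp", "stopped")
# The stopped-state fact now matches the trigger, so the next
# Reflex iteration fires Reflection instead of another LLM poll.
```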

Pattern categories learned:

  • Temporal patterns: Polling loops -> Event watchers + triggers
  • Resource monitoring: Repeated checks -> Percepts with threshold triggers
  • State verification: “Is X ready?” -> State change percepts with conditional triggers
  • Anomaly detection: Unusual fact patterns -> Triggers for investigative Reflection

Example: Polling Pattern Conversion

The Overnight Reminder

Agent task: “Remind me tomorrow morning to review the report”

Without symbolic reasoning (the problem): LLMs default to polling because they lack symbolic temporal reasoning, each inference is stateless, and they don’t understand “wait” as a primitive operation. The cost:

19:00 - Check: Is it morning? No. Cost: $1.80
19:30 - Check: Is it morning? No. Cost: $1.80
20:00 - Check: Is it morning? No. Cost: $1.80
...
08:00 - Check: Is it morning? Yes! Cost: $1.80

Total: 26 checks * $1.80 = $46.80
Actual work: 1 reminder delivered

With Governor (the solution):

First Reflection:

  1. Agent receives task during an active Reflection
  2. Reflection analyzes: this requires waiting for a time condition
  3. Reflection installs cron percept for 08:00 that will assert time_of_day(morning)
  4. Reflection creates trigger: trigger(morning_reminder, time_of_day(morning))
  5. Reflection goes dormant
  6. Cost: $3.00 (one Reflection)

Overnight (Reflex running every ~30 seconds):

  • Reflex evaluates trigger(morning_reminder, time_of_day(morning))
  • Fact time_of_day(morning) not yet asserted
  • Reflex sleeps. Repeat. Cost: $0

At 08:00:

  • Cron percept fires, asserts time_of_day(morning)
  • Next Reflex iteration: trigger matches
  • Reflection fires, delivers reminder, clears trigger
  • Cost: $1.50 (minimal Reflection)
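The overnight flow can be traced end to end in a few lines, reusing the set-of-tuples fact store and (name, condition) trigger shape assumed earlier in this document's sketches.

```python
facts = set()
triggers = [("morning_reminder", ("time_of_day", "morning"))]

def matched(facts, triggers):
    """Names of triggers whose condition fact is present."""
    return [name for name, cond in triggers if cond in facts]

# Overnight: the fact is absent, so every Reflex pass is a no-op.
assert matched(facts, triggers) == []

# 08:00: the cron percept fires and asserts the fact.
facts.add(("time_of_day", "morning"))

# Next Reflex iteration: the trigger matches and Reflection is fired.
assert matched(facts, triggers) == ["morning_reminder"]
```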

Total cost comparison:

  • Without Governor: $46.80 (26 polling checks)
  • With Governor: $4.50 (2 Reflections + percept)
  • Savings: 90%

Integration with Arbiter Substrate

Governor is an optional module that plugs into the Arbiter Substrate. It is not a policy engine.

Arbiter Substrate provides OS-level governance for autonomous agents through ArbiterService, which evaluates commands against signed rule packs using Prolog-based symbolic pattern matching. This is the system that enforces command-level policy: denying dangerous operations, requiring confirmations, validating workflow suitability. All decisions are cryptographically signed with Ed25519, and Substrate instances authenticate via mutual TLS (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents).

Governor adds continuous cognition on top of this substrate. Where Arbiter Substrate governs what an agent is allowed to do, Governor governs what an agent chooses to do and when.

Percepts (sensors)
  | update facts in real time
Fact Database (Prolog world model)
  | evaluated every ~30s
Reflex (symbolic cycle)
  +- No trigger match -> Sleep
  +- Trigger match -> Fire Reflection
                        |
Reflection (neural cycle)
  +- Reflect on new information
  +- Update rules and triggers
  +- Execute actions (via Arbiter Substrate)
  +- Go dormant

When Reflection decides to execute an action (a tool call, a system command, an API request), that action routes through Arbiter Substrate’s policy layer as normal. Governor decides what to do; Arbiter Substrate decides whether it’s allowed.

Governor operates as a hosted service on the DataGrout platform, enabling centralized world model management and historical analysis of agent cognition patterns.


Operational Characteristics

Latency Distribution

Reflex trigger evaluation:

  • P50: 3ms, P95: 8ms, P99: 15ms

Reflection execution (when triggered):

  • P50: 2.3s, P95: 5.8s, P99: 12.0s

Reflex runs continuously with negligible overhead. Reflections are infrequent and produce durable rules that reduce future Reflection frequency.

Learning Rate

Trigger and rule growth over time:

  • Week 1: 50 base rules + 15 learned = 65 total
  • Week 4: 118 total (growth slows as common patterns are captured)

Long-tail novelty and the 6-hour heartbeat ensure Reflections continue to fire, but with decreasing frequency as the world model matures.


Comparison with Existing Approaches

vs. Pure LLM Agents

Aspect           | Pure LLM                | Governor
Decision speed   | 1-3 seconds             | <10ms (Reflex trigger evaluation)
Token cost       | 100-200k per decision   | 0 (Reflex); ~200k (Reflection when needed)
Polling overhead | Unbounded               | Zero (percept-driven)
Learning         | None (stateless)        | Automatic (Reflection -> triggers + rules)
Continuity       | Stateless between calls | Persistent world model

vs. Rule-Based Systems

Aspect              | Rule-Based            | Governor
Rule authoring      | Manual                | Automatic (Reflection synthesizes)
Adaptability        | Static                | Dynamic (learns from experience)
Novel situations    | Fail or require human | Reflection handles via agentic loop
Minimum maintenance | Continuous            | 6-hour heartbeat only

Governor combines the symbolic efficiency of rule-based systems with the adaptive learning of LLM agents, applied specifically to the problem of continuous autonomous operation.

For the economic primitives that Governor references, see Credit System: Economic Primitives for Autonomous Systems and Virtual Resource Accounting: Decoupled Agent Budgets for Autonomous Systems. For the policy enforcement layer that evaluates agent actions at the command level, see Runtime Policy Enforcement for Autonomous AI Systems.


Conclusion

Long-running AI agents face unsustainable token economics. Polling-based continuous operation generates unbounded LLM costs that scale linearly with time, not value delivered.

Governor solves this through neuro-symbolic cognition split into two cycles. Reflex evaluates Prolog triggers over a continuously updated fact database every ~30 seconds at zero token cost. Reflection provides full agentic reasoning when triggers match or a minimum heartbeat interval elapses. Percepts bridge the external environment to the world model in real time, converting polling patterns into event-driven updates.

The architecture achieves 10-100x token reduction with progressive efficiency gains: each Reflection converts experiences into symbolic rules and triggers, shifting an increasing share of conditions from costly inference to zero-cost symbolic evaluation.

Governor is an optional module for the Arbiter Substrate. It does not replace Arbiter Substrate’s policy enforcement, which is handled by ArbiterService and signed rule packs. Governor adds continuous cognition for autonomous systems that need to operate indefinitely with bounded costs.


February 2026

This document describes the Governor architecture for neuro-symbolic continuous agent cognition. Implementation details withheld.

Author: Nicholas Wright

Title: Co-Founder & Chief Architect, DataGrout AI

Affiliation: DataGrout Labs

Version: 1.0

Published: February 2026

For questions or collaboration: labs@datagrout.ai