Governor: Neuro-Symbolic Runtime for Token-Efficient Agent Cognition
Reflex and Reflection Cycles for Bounded Autonomous Operation
Abstract
Long-running AI agents face a token economics problem. Continuous operation patterns (polling for state changes, repeated status checks, reactive monitoring) generate unbounded LLM inference costs. An agent checking “is it daytime yet?” every 30 minutes burns nearly $30 overnight at Opus pricing, consuming close to 2M tokens for a decision that should cost zero.
Governor is an optional add-on module for the Arbiter Substrate that provides continuous autonomous execution through a neuro-symbolic runtime. Agent cognition is split into two cycles: Reflex (the symbolic cycle), a lightweight loop that evaluates Prolog triggers over a continuously updated fact database every ~30 seconds, and Reflection (the neural cycle), a full agentic loop that fires when triggers match or on a minimum heartbeat interval (~6 hours). Between cycles, percepts (sensors) update the agent’s world model in real time without consuming inference tokens.
Reflection reflects on accumulated information, updates Prolog rules governing the world model, and defines new triggers for the next Reflex period. This creates a learning loop where agents become progressively more efficient: routine conditions are handled symbolically, and LLM inference is reserved for genuine reasoning.
Key results: 10-100x token reduction, sub-10ms trigger evaluation, automatic conversion of polling patterns to event-driven percepts, and progressive efficiency gains through pattern learning.
Problem Landscape
Token Costs in Long-Running Agents
Production AI agents operate continuously. Traditional architectures use polling:
```python
while True:
    state = check_current_state()   # LLM call
    if should_act(state):           # LLM call
        take_action()               # LLM call
    sleep(interval)
```
Cost analysis for a reminder agent:
- Query: “Is it time to remind the user?” (120k tokens per call)
- Model: Claude Opus 4.5 ($15 per million input tokens)
- Frequency: Every 30 minutes
- Cost per check: $1.80
- Overnight (8 hours, 16 checks): $28.80
- Monthly: $864
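The figures above follow from a few lines of arithmetic. The sketch below reproduces them using the token count and per-token price quoted in this section (not live pricing):

```python
# Reproducing the reminder-agent cost analysis above.
TOKENS_PER_CHECK = 120_000        # tokens per "is it time yet?" query
PRICE_PER_MTOK = 15.00            # USD per million input tokens (Opus, as quoted)
CHECKS_PER_NIGHT = 16             # 8 hours at one check every 30 minutes

cost_per_check = TOKENS_PER_CHECK / 1_000_000 * PRICE_PER_MTOK
overnight = cost_per_check * CHECKS_PER_NIGHT
monthly = overnight * 30

print(f"per check: ${cost_per_check:.2f}")   # $1.80
print(f"overnight: ${overnight:.2f}")        # $28.80
print(f"monthly:   ${monthly:.2f}")          # $864.00
```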
This is economically untenable. Polling-based continuous operation scales linearly with time, not value delivered. Agents need continuous operation but cannot afford continuous inference.
Governor Architecture
Reflex (Symbolic Cycle)
Reflex is a lightweight loop that runs approximately every 30 seconds. Each iteration evaluates a set of triggers against the current fact database. Triggers are Prolog queries defined during the previous Reflection (see The Symbolic Backbone: Why Agent Systems Need Logic Programming for the design rationale behind Prolog). Examples include time-of-day conditions, file system state changes with count thresholds, and budget remaining checks. Each trigger is a declarative pattern that fires when facts satisfy its conditions.
Execution:
- Reflex wakes from sleep (~30 seconds)
- Evaluates each trigger query against the current fact database
- If no triggers match: sleep and repeat
- If a trigger matches: fire Reflection with the trigger context
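The iteration above can be sketched in a few lines of Python. Everything here is illustrative: facts are modeled as a set of tuples and triggers as named predicates, simplified stand-ins for the Prolog database and queries Governor actually uses, and `fire_reflection` is a hypothetical callback into the neural cycle.

```python
import time

# Minimal sketch of the Reflex loop over a fact database.
facts = set()                      # updated asynchronously by percepts
triggers = {
    "morning_reminder": lambda f: ("time_of_day", "morning") in f,
}

def reflex_iteration(facts, triggers):
    """Return the names of triggers whose conditions hold."""
    return [name for name, query in triggers.items() if query(facts)]

def reflex_loop(fire_reflection, interval=30):
    while True:
        matched = reflex_iteration(facts, triggers)
        if matched:
            fire_reflection(matched)   # hand trigger context to Reflection
        time.sleep(interval)

# No facts yet: nothing matches, no inference is spent.
print(reflex_iteration(facts, triggers))   # []
facts.add(("time_of_day", "morning"))      # a percept asserts a fact
print(reflex_iteration(facts, triggers))   # ['morning_reminder']
```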
Characteristics:
- Deterministic trigger matching over Prolog facts
- Zero token consumption
- <10ms evaluation latency per cycle
- Handles the vast majority of cycles without invoking inference
Reflex does not make decisions about commands or policies. Command-level policy enforcement is the domain of ArbiterService and its rule packs (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents). Reflex’s sole purpose is monitoring the world model for conditions that warrant deeper reasoning.
Reflection (Neural Cycle)
Reflection is a full agentic loop that activates when a Reflex trigger fires or when a minimum heartbeat interval (~6 hours) elapses. The heartbeat ensures the agent periodically reflects even if no triggers match, preventing edge cases where important changes go unprocessed.
Reflection process:
- Review all new information accumulated since the last Reflection (percept updates, trigger context)
- Run agentic reasoning loop (LLM-powered, iterative until satisfied)
- Reflect on patterns, anomalies, and new conditions in the world model
- Create or update Prolog rules
- Define new triggers for the next Reflex period
- Go dormant until next trigger or heartbeat
Reflection runs until it is satisfied that it has processed all relevant information and configured appropriate triggers. There is no fixed time limit; it operates as a standard agentic loop with access to tools, facts, and the Prolog rule engine.
What Reflection produces:
- Updated Prolog rules that refine the world model
- New or modified triggers for future Reflex evaluation
- Direct actions (tool calls, notifications, task execution)
- Percept configuration changes (new sensors, adjusted thresholds)
Cost per Reflection:
- Tokens: 100-400k (depending on complexity)
- Cost: $1.50-6.00
- Value: creates reusable triggers and rules that handle future conditions symbolically
- Break-even: 1-2 future cycles where symbolic handling avoids inference
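The break-even estimate follows directly from the costs just listed. A minimal check, assuming a mid-range $3.00 Reflection whose triggers replace $1.80 polling-style evaluations:

```python
import math

# Break-even sketch: a Reflection pays for itself once its triggers
# replace enough LLM evaluations. Costs are the ones quoted above.
reflection_cost = 3.00      # one Reflection (mid-range of $1.50-6.00)
avoided_check_cost = 1.80   # one polling-style LLM evaluation it replaces

break_even_cycles = math.ceil(reflection_cost / avoided_check_cost)
print(break_even_cycles)    # 2
```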
Percepts (World Model Sensors)
Percepts are event-driven sensors that update the fact database in real time. They bridge the external environment and the agent’s world model, ensuring Reflex always evaluates triggers against current state.
Percept types:
- File system: File creates, deletes, modifications (Watchman/inotify)
- Process: Start, stop, crash, resource usage
- Network: Connection events, traffic patterns
- Time: Scheduled triggers, deadlines (cron)
- External: Webhooks from integrated systems
- Budget: Virtual budget state changes (see Virtual Resource Accounting: Decoupled Agent Budgets for Autonomous Systems)
When a percept fires, it updates facts in the Prolog database – asserting new facts and retracting stale ones. The next Reflex iteration evaluates triggers against these updated facts. Percepts do not trigger Reflection directly; they update facts, and triggers determine when reasoning is needed.
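The assert/retract contract can be sketched as follows. `FactDB` and the event shape are hypothetical simplifications of the Prolog database and percept interface, not Governor's API:

```python
# Sketch of the percept -> fact-database contract: a percept callback
# asserts new facts and retracts stale ones; it never calls the LLM.
class FactDB:
    def __init__(self):
        self.facts = set()

    def assert_fact(self, fact):
        self.facts.add(fact)

    def retract_matching(self, predicate):
        self.facts = {f for f in self.facts if f[0] != predicate}

db = FactDB()

def container_percept(event):
    """Called by a process watcher when a container changes state."""
    db.retract_matching("container_state")   # drop the stale fact
    db.assert_fact(("container_state", event["name"], event["state"]))

container_percept({"name": "webapp", "state": "running"})
container_percept({"name": "webapp", "state": "stopped"})
print(db.facts)   # {('container_state', 'webapp', 'stopped')}
```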
Token Economics
Percept-Driven vs. Polling
Conditions that traditional agents evaluate through repeated LLM calls (directory monitoring, state checks, time conditions) are handled by percepts at zero token cost. A percept watching a directory for new files costs nothing between events – it updates the fact database when something changes, and Reflex evaluates the trigger symbolically. The polling equivalent of the same task, one LLM check per minute at 80k tokens each, would consume 4.8M tokens per hour ($72/hour at Opus pricing).
Reflex vs. Reflection Cost Distribution
Scenario: 100 conditions evaluated over 8 hours
Pure LLM approach:
- 100 evaluations * 150k tokens = 15M tokens
- Cost: $225 (Opus pricing)
Governor (Reflex handles most, Reflection fires selectively):
- ~90 conditions resolved by Reflex trigger evaluation: 0 tokens = $0
- ~10 conditions requiring Reflection: 2M tokens = $30
- Total: $30
Savings: 87%
Progressive Efficiency Gains
As Reflection synthesizes rules and triggers, Reflex handles an increasing share of conditions:
- Week 1: 60% Reflex, 40% Reflection -> $120/day
- Week 4: 90% Reflex, 10% Reflection -> $18/day
- Month 3: 95% Reflex, 5% Reflection -> $9/day
Efficiency compounds over time as each Reflection converts experiences into symbolic rules and triggers that Reflex evaluates at zero token cost.
Pattern Learning and Rule Synthesis
From Reflection to Reflex
When Reflection runs, it creates new symbolic rules and triggers:
Observation (Reflection analyzes accumulated percept data):
```
Percept log since last Reflection:
  08:00 - time_of_day(morning) asserted
  08:02 - user_query("check docker webapp status")
  08:30 - user_query("check docker webapp status")
  09:00 - user_query("check docker webapp status")
```
- Pattern: same query, 30-minute intervals
- Intent: checking if the container is running
- Current handling: each query triggers a full Reflection
Synthesis (Reflection creates):
- Identifies polling pattern
- Extracts intent (container status monitoring)
- Installs a process percept that watches container state
- Creates a trigger that fires when the container’s state changes to stopped
- Future monitoring: $0 (percept updates facts, Reflex checks trigger)
Pattern categories learned:
- Temporal patterns: Polling loops -> Event watchers + triggers
- Resource monitoring: Repeated checks -> Percepts with threshold triggers
- State verification: “Is X ready?” -> State change percepts with conditional triggers
- Anomaly detection: Unusual fact patterns -> Triggers for investigative Reflection
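The first category, converting polling loops into event watchers, hinges on detecting fixed-interval repetition in the percept log. A minimal sketch of such a heuristic (the detection rule and tolerance are assumptions for illustration, not Governor's actual synthesis logic):

```python
from statistics import pstdev

def looks_like_polling(timestamps, tolerance=60):
    """True if >= 3 occurrences of the same query arrive at roughly
    fixed intervals (timestamps and tolerance in seconds)."""
    if len(timestamps) < 3:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return pstdev(gaps) <= tolerance

# "check docker webapp status" at 08:02, 08:30, 09:00
# expressed as seconds since 08:00:
print(looks_like_polling([120, 1800, 3600]))   # True
```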
Example: Polling Pattern Conversion
The Overnight Reminder
Agent task: “Remind me tomorrow morning to review the report”
Without symbolic reasoning (the problem): LLMs default to polling because they lack symbolic temporal reasoning, each inference is stateless, and they don’t understand “wait” as a primitive operation. The cost:
```
19:00 - Check: Is it morning? No.  Cost: $1.80
19:30 - Check: Is it morning? No.  Cost: $1.80
20:00 - Check: Is it morning? No.  Cost: $1.80
...
08:00 - Check: Is it morning? Yes! Cost: $1.80
```
Total: 26 checks * $1.80 = $46.80
Actual work: 1 reminder delivered
With Governor (the solution):
First Reflection:
- Agent receives task during an active Reflection
- Reflection analyzes: this requires waiting for a time condition
- Reflection installs a cron percept for 08:00 that will assert time_of_day(morning)
- Reflection creates trigger: trigger(morning_reminder, time_of_day(morning))
- Reflection goes dormant
- Cost: $3.00 (one Reflection)
Overnight (Reflex running every ~30 seconds):
- Reflex evaluates trigger(morning_reminder, time_of_day(morning))
- Fact time_of_day(morning) not yet asserted
- Reflex sleeps. Repeat. Cost: $0
At 08:00:
- Cron percept fires, asserts time_of_day(morning)
- Next Reflex iteration: trigger matches
- Reflection fires, delivers reminder, clears trigger
- Cost: $1.50 (minimal Reflection)
Total cost comparison:
- Without Governor: $46.80 (26 polling checks)
- With Governor: $4.50 (2 Reflections + percept)
- Savings: 90%
Integration with Arbiter Substrate
Governor is an optional module that plugs into the Arbiter Substrate. It is not a policy engine.
Arbiter Substrate provides OS-level governance for autonomous agents through ArbiterService, which evaluates commands against signed rule packs using Prolog-based symbolic pattern matching. This is the system that enforces command-level policy: denying dangerous operations, requiring confirmations, validating workflow suitability. All decisions are cryptographically signed with Ed25519, and Substrate instances authenticate via mutual TLS (see Arbiter Substrate: OS-Level Governance for Autonomous AI Agents).
Governor adds continuous cognition on top of this substrate. Where Arbiter Substrate governs what an agent is allowed to do, Governor governs what an agent chooses to do and when.
```
Percepts (sensors)
      | update facts in real time
      v
Fact Database (Prolog world model)
      | evaluated every ~30s
      v
Reflex (symbolic cycle)
  +- No trigger match -> sleep
  +- Trigger match -> fire Reflection
                          |
                          v
              Reflection (neural cycle)
                +- Reflect on new information
                +- Update rules and triggers
                +- Execute actions (via Arbiter Substrate)
                +- Go dormant
```
When Reflection decides to execute an action (a tool call, a system command, an API request), that action routes through Arbiter Substrate’s policy layer as normal. Governor decides what to do; Arbiter Substrate decides whether it’s allowed.
Governor operates as a hosted service on the DataGrout platform, enabling centralized world model management and historical analysis of agent cognition patterns.
Operational Characteristics
Latency Distribution
Reflex trigger evaluation:
- P50: 3ms, P95: 8ms, P99: 15ms
Reflection execution (when triggered):
- P50: 2.3s, P95: 5.8s, P99: 12.0s
Reflex runs continuously with negligible overhead. Reflections are infrequent and produce durable rules that reduce future Reflection frequency.
Learning Rate
Trigger and rule growth over time:
- Week 1: 50 base rules + 15 learned = 65 total
- Week 4: 118 total (growth slows as common patterns are captured)
Long-tail novelty and the 6-hour heartbeat ensure Reflections continue to fire, but with decreasing frequency as the world model matures.
Comparison with Existing Approaches
vs. Pure LLM Agents
| Aspect | Pure LLM | Governor |
|---|---|---|
| Decision speed | 1-3 seconds | <10ms (Reflex trigger evaluation) |
| Token cost | 100-200k per decision | 0 tokens (Reflex), 100-400k (Reflection when needed) |
| Polling overhead | Unbounded | Zero (percept-driven) |
| Learning | None (stateless) | Automatic (Reflection -> triggers + rules) |
| Continuity | Stateless between calls | Persistent world model |
vs. Rule-Based Systems
| Aspect | Rule-Based | Governor |
|---|---|---|
| Rule authoring | Manual | Automatic (Reflection synthesizes) |
| Adaptability | Static | Dynamic (learns from experience) |
| Novel situations | Fail or require human | Reflection handles via agentic loop |
| Minimum maintenance | Continuous | 6-hour heartbeat only |
Governor combines the symbolic efficiency of rule-based systems with the adaptive learning of LLM agents, applied specifically to the problem of continuous autonomous operation.
For the economic primitives that Governor references, see Credit System: Economic Primitives for Autonomous Systems and Virtual Resource Accounting: Decoupled Agent Budgets for Autonomous Systems. For the policy enforcement layer that evaluates agent actions at the command level, see Runtime Policy Enforcement for Autonomous AI Systems.
Conclusion
Long-running AI agents face unsustainable token economics. Polling-based continuous operation generates unbounded LLM costs that scale linearly with time, not value delivered.
Governor solves this through neuro-symbolic cognition split into two cycles. Reflex evaluates Prolog triggers over a continuously updated fact database every ~30 seconds at zero token cost. Reflection provides full agentic reasoning when triggers match or a minimum heartbeat interval elapses. Percepts bridge the external environment to the world model in real time, converting polling patterns into event-driven updates.
The architecture achieves 10-100x token reduction with progressive efficiency gains: each Reflection converts experiences into symbolic rules and triggers, shifting an increasing share of conditions from costly inference to zero-cost symbolic evaluation.
Governor is an optional module for the Arbiter Substrate. It does not replace Arbiter Substrate’s policy enforcement, which is handled by ArbiterService and signed rule packs. Governor adds continuous cognition for autonomous systems that need to operate indefinitely with bounded costs.
February 2026
This document describes the Governor architecture for neuro-symbolic continuous agent cognition. Implementation details withheld.