Semio: A Semantic Interface Layer for Tool-Oriented AI Systems
Typed Interoperability for Scalable Agent Infrastructure
Abstract
Modern AI agents face a fundamental interoperability problem: tool surfaces are syntactically heterogeneous but semantically similar. A customer record in Salesforce, HubSpot, and Stripe represents the same conceptual entity but exposes different schemas, field names, and access patterns. Current approaches rely on LLM reasoning to bridge these gaps, resulting in probabilistic failures, high token costs, and brittle integrations.
Semio introduces a semantic interface layer that enables deterministic tool composition through typed contracts. Rather than requiring agents to reason about schema differences at runtime, Semio provides a declarative type system where tools announce their semantic capabilities and adapters handle structural transformations. This approach reduces integration fragility, lowers inference overhead, and enables formal verification of multi-step workflows.
The system operates as a compatibility substrate: tools declare inputs and outputs using semantic types (e.g., billing.customer@1), the planner reasons about type compatibility, and adapters bridge structural differences when needed. Identity anchors (keys) enable cross-system entity resolution without forcing schema normalization.
Problem Landscape
Tool Sprawl and Schema Fragmentation
Enterprise systems expose thousands of API endpoints across hundreds of services. Each system evolved independently, resulting in:
- Incompatible schemas - A “customer” in one system has different fields than another
- Naming variations - customer_id, customerId, external_id, cust_ref all mean “customer identifier”
- Type mismatches - Dates as strings, ISO timestamps, or Unix epochs depending on the API
- Identity fragmentation - No canonical way to reference the same entity across systems
Manual Integration Brittleness
Traditional integration approaches require hand-coded glue logic for every tool pair. This results in:
- O(N^2) integration complexity - Every new tool requires adapters for every existing tool
- Maintenance burden - API changes break existing workflows
- Hidden assumptions - Implicit schema mappings that fail silently
- No reusability - Integration logic cannot be shared or composed
LLM Probabilistic Failures
Using LLM reasoning alone to bridge integration gaps introduces:
- Non-deterministic behavior - Same request produces different results
- Token waste - Extensive context needed to describe schemas and mappings
- Silent failures - Type mismatches discovered at execution time, not plan time
- Hallucination risk - LLMs invent plausible but incorrect field mappings
Design Principles
Semio is built on five core principles:
1. Semantic Over Syntactic Compatibility
Tools declare what they produce and consume semantically, not syntactically. A tool outputs billing.invoice@1, not “an object with fields id, amount, customer_id.”
2. Partial Type Coverage
Not all fields must be declared. Types can be partially specified, with core fields annotated and extended fields available but not semantically indexed. This balances expressiveness with maintenance burden.
3. Identity-First Interoperability
Cross-system composition requires identity anchors. Semio defines key kinds (email, id, external_id) that enable entity resolution without forcing globally unique identifiers.
4. Declarative Tool Contracts
Tools announce their capabilities through typed contracts. The planner reasons about compatibility without executing tools. This enables pre-execution verification and cost estimation.
5. Safety Through Structural Constraints
Type mismatches are caught at plan time, not execution time. Adapters are explicitly modeled and verified, preventing silent data corruption.
Semio Model Overview
Semantic Types
Types follow a versioned naming convention:
<family>.<entity>@<version>
Examples:
- crm.account@1 - CRM account record
- billing.invoice@1 - Billing invoice
- core.email@1 - Email address (primitive)
- crm.account.list@1 - Collection of accounts
Each type declares:
- Family - Domain grouping (crm, billing, hr, etc.)
- Label - Human-readable name
- Keys - Identity anchors available on this type
- Fields - Named properties with types and tiers
- Containers - List and page variants
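The naming convention above can be parsed mechanically. The following is a minimal sketch, assuming lowercase dotted segments and an integer version; the SemanticType class, the parse_type_id function, and the rule that a trailing "list" or "page" segment marks a container are illustrative assumptions, not part of the Semio specification.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed grammar: lowercase dotted segments, "@", integer version.
TYPE_ID = re.compile(r"^([a-z][a-z0-9_]*)((?:\.[a-z][a-z0-9_]*)+)@(\d+)$")
CONTAINERS = {"list", "page"}  # container variants named in the text


@dataclass(frozen=True)
class SemanticType:
    family: str
    entity: str
    version: int
    container: Optional[str] = None


def parse_type_id(type_id: str) -> SemanticType:
    """Split '<family>.<entity>@<version>' into its parts."""
    match = TYPE_ID.match(type_id)
    if match is None:
        raise ValueError(f"not a semantic type id: {type_id!r}")
    family, rest, version = match.groups()
    segments = rest[1:].split(".")  # drop the leading dot
    container = None
    # Assumption: a trailing "list"/"page" segment marks a container variant.
    if len(segments) > 1 and segments[-1] in CONTAINERS:
        container = segments.pop()
    return SemanticType(family, ".".join(segments), int(version), container)
```

Under these assumptions, crm.account.list@1 parses to the crm family, the account entity, version 1, and the list container.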
Type Lifting and Duck Typing
Vendor-specific data is automatically recognized as semantic types through structural matching. Objects that provide all required fields of a type implicitly satisfy that type.
Example:
# Salesforce returns:
%{"Id" => "00Q...", "Email" => "user@example.com", "Company" => "Acme Corp"}
# After field mapping ("Id" -> "id", "Email" -> "email"), recognized as crm.lead@1 (both required fields present)
# Vendor type (salesforce.Lead) preserved alongside semantic type
# Usable anywhere crm.lead@1 is accepted
Cross-type matching uses shared anchors (like email), not type-specific IDs:
# Lead: {id: "00Q...", email: "user@example.com"}
# Account: {id: "001...", email: "user@example.com"}
# Match via "email" anchor, not "id" (different ID namespaces)
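Structural matching and anchor-based matching can be sketched in a few lines. This is an illustration under stated assumptions, not Semio's implementation: the lift and shared_anchor functions and the LEAD_FIELD_MAP table are hypothetical names, and the vendor-to-semantic field mapping is assumed to be supplied by the integration.

```python
from typing import Optional

# Hypothetical mapping from semantic field names to Salesforce field names.
LEAD_FIELD_MAP = {"id": "Id", "email": "Email", "company": "Company"}


def lift(record: dict, field_map: dict, required: list) -> Optional[dict]:
    """Return the semantic view of a vendor record, or None if a required field is missing."""
    semantic = {name: record[vendor] for name, vendor in field_map.items() if vendor in record}
    if all(semantic.get(field) is not None for field in required):
        return semantic
    return None


def shared_anchor(a: dict, b: dict, anchors=("email", "external_id")) -> Optional[str]:
    """Return the first anchor on which both records agree.

    "id" is deliberately left out of the default anchors: lead and account
    ids live in different namespaces, so equal ids would prove nothing.
    """
    for key in anchors:
        if a.get(key) is not None and a.get(key) == b.get(key):
            return key
    return None


lead = lift({"Id": "00Q...", "Email": "user@example.com", "Company": "Acme Corp"},
            LEAD_FIELD_MAP, required=["id", "email"])
account = {"id": "001...", "email": "user@example.com"}
anchor = shared_anchor(lead, account)  # the two records match on "email"
```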
Keys: Identity Anchors
Keys enable cross-system entity resolution:
keys: [email, id, external_id]
A tool that outputs crm.account@1 with keys [email, id] can feed a tool requiring billing.customer@1, provided an adapter bridges the two types and both types share a common key (e.g., email).
Fields and Tiers
Fields are categorized into strategic tiers that guide planning and optimization:
- Core - Essential properties required for basic operations (id, name, email)
- Useful - Valuable fields that enhance workflows but aren’t strictly required (company, status, owner)
- PII - Fields containing personally identifiable information requiring redaction (email, phone)
- Index - Fields optimized for search and lookup operations (email, company)
Example type definition with tiers:
{
  "$id": "crm.lead@1",
  "type": "object",
  "properties": {
    "id": { "$ref": "core.entity_ref@1" },
    "name": { "type": "string" },
    "company": { "type": "string" },
    "email": { "type": "string", "format": "email" },
    "status": {
      "type": "string",
      "enum": ["new", "working", "qualified", "unqualified", "other"]
    },
    "owner": { "$ref": "core.user_ref@1" },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["id", "email"],
  "x_tiers": {
    "core": ["id", "name", "email"],
    "useful": ["company", "status", "owner"],
    "pii": ["email"],
    "index": ["email", "company"]
  }
}
This tiering system enables cost-aware planning: core fields are always fetched, useful fields are included when credits permit, and PII fields require policy clearance.
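The tier rules in the preceding paragraph can be sketched as a small selection function. The plan_fields name, the flat per-field credit cost, and the choice to withhold uncleared PII fields from the request are assumptions made for illustration only.

```python
X_TIERS = {
    "core": ["id", "name", "email"],
    "useful": ["company", "status", "owner"],
    "pii": ["email"],
    "index": ["email", "company"],
}


def plan_fields(x_tiers, credits, pii_cleared, cost_per_useful_field=1.0):
    """Return (fields to fetch, PII fields withheld for lack of clearance)."""
    selected = list(x_tiers["core"])  # core fields are always fetched
    for field in x_tiers["useful"]:
        # Useful fields are added only while credits remain (assumed flat cost).
        if credits >= cost_per_useful_field:
            selected.append(field)
            credits -= cost_per_useful_field
    pii = set(x_tiers["pii"])
    withheld = [] if pii_cleared else [f for f in selected if f in pii]
    return [f for f in selected if f not in withheld], withheld
```

With two credits and no PII clearance, this sketch fetches id, name, company, and status, and reports email as withheld.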
Adapters: Type Bridges
Adapters transform one type to another:
adapter:
  from: crm.account@1
  to: billing.customer@1
  anchor: email
  confidence: 0.9
  cost: 1.0
Confidence represents how reliably an adapter bridges two types. A direct field mapping with no information loss scores 1.0, while a lossy or heuristic mapping (e.g., inferring a billing customer from a CRM lead where not all fields carry over) scores lower. The planner treats adapters as weighted edges in a type graph, using confidence and cost as search criteria when discovering transformation paths.
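To make the weighted-edge view concrete, here is a minimal sketch of a cheapest-path search over adapter edges using Dijkstra's algorithm. It is not Semio's planner: the cheapest_path function, the example edges, and the rule that confidences multiply along a path are all assumptions.

```python
import heapq


def cheapest_path(edges, source, target):
    """edges: iterable of (from_type, to_type, confidence, cost), costs >= 0.

    Returns the lowest-cost path with its compounded confidence, or None.
    """
    graph = {}
    for src, dst, conf, cost in edges:
        graph.setdefault(src, []).append((dst, conf, cost))
    # Heap entries: (total cost, negated path confidence, path so far).
    heap = [(0.0, -1.0, [source])]
    settled = set()
    while heap:
        cost, neg_conf, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            return {"path": path, "cost": cost, "confidence": -neg_conf}
        if node in settled:
            continue
        settled.add(node)
        for dst, edge_conf, edge_cost in graph.get(node, []):
            if dst not in settled:
                # Assumption: confidence compounds multiplicatively along a path.
                conf = -neg_conf * edge_conf
                heapq.heappush(heap, (cost + edge_cost, -conf, path + [dst]))
    return None


# Hypothetical adapter edges for illustration.
EDGES = [
    ("crm.lead@1", "crm.account@1", 0.8, 1.0),
    ("crm.account@1", "billing.customer@1", 0.9, 1.0),
    ("crm.lead@1", "billing.customer@1", 0.7, 3.0),
]
best = cheapest_path(EDGES, "crm.lead@1", "billing.customer@1")
```

This sketch optimizes cost alone and only reports confidence. A planner that trades the two off against each other would need a multi-objective search instead.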
Tool Contracts
Tools declare their semantic surface with rich metadata:
tool: salesforce.query_accounts
inputs:
  - name: email
    type: core.email@1
    required: true
outputs:
  - type: crm.account@1
    mode: one
provides_keys: [id, email, external_id]
requires_keys: [email]
supports_select: true
select_fields: [id, name, email, company, status, owner, created_at]
jmespath_selector: "records[0]"
Key contract features:
- Field selection - Tools can specify which semantic fields they support (supports_select, select_fields)
- JMESPath selectors - Integration-specific paths for extracting typed fields from responses
- Identity anchors - Which keys are provided for cross-system resolution
- Enrichment capability - Whether tool can augment partial data
JMESPath Field Mapping:
Integrations declare how to map vendor-specific JSON fields to semantic types using JMESPath selectors. JMESPath is a JSON query language (like XPath for JSON) that extracts data from complex API responses.
Example - Salesforce SOQL Query:
API returns:
{
  "totalSize": 1,
  "done": true,
  "records": [{
    "Id": "00Q5G00000ABC123",
    "Email": "user@example.com",
    "Company": "Acme Corp",
    "Status": "New"
  }]
}
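Semantic field mappings for crm.lead@1 (lowercase is not one of the standard JMESPath built-in functions; it is assumed here to be a custom function registered by the integration):
{
  "id": "records[0].Id",
  "email": "records[0].Email",
  "company": "records[0].Company",
  "status": "records[0].Status | lowercase(@)"
}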
Result after extraction:
{
  "id": "00Q5G00000ABC123",
  "email": "user@example.com",
  "company": "Acme Corp",
  "status": "new"
}
This normalized data satisfies crm.lead@1 and can be used in cross-system workflows without additional transformation.
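The extraction step can be illustrated without a JMESPath engine. The sketch below swaps in a deliberately tiny path resolver that understands only names and [index] steps (for example records[0].Id); it is a stand-in for JMESPath, not an implementation of it. The resolve and extract functions are hypothetical, and the lowercase step is applied as a plain Python transform.

```python
import re

# Matches either a field name or an [index] step; dots are simply skipped.
STEP = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(path, data):
    """Walk a path such as 'records[0].Id'; return None if any step is missing."""
    current = data
    for name, index in STEP.findall(path):
        try:
            current = current[name] if name else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None  # JMESPath likewise yields null for missing data
    return current


def extract(response, mappings, transforms):
    """Build a semantic record from a vendor response."""
    record = {}
    for field, path in mappings.items():
        value = resolve(path, response)
        if value is not None and field in transforms:
            value = transforms[field](value)
        record[field] = value
    return record


RESPONSE = {"totalSize": 1, "done": True, "records": [{
    "Id": "00Q5G00000ABC123", "Email": "user@example.com",
    "Company": "Acme Corp", "Status": "New"}]}
MAPPINGS = {"id": "records[0].Id", "email": "records[0].Email",
            "company": "records[0].Company", "status": "records[0].Status"}
TRANSFORMS = {"status": str.lower}  # stands in for the lowercase step

lead = extract(RESPONSE, MAPPINGS, TRANSFORMS)
```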
Common patterns:
- records[0] - First item from paginated list
- data.items[*] - All items from nested array
- response.user.{id: id, email: email} - Multi-field projection
- results[?active=='true'] - Filtered selection
Enrichment annotations:
Tools can declare their ability to augment partial data. Enrichment capabilities are declared as structured facts in the planning knowledge base, specifying which tools can augment which types, what keys they require for lookup, and which fields they add.
When the planner encounters incomplete data (e.g., only has id and email but needs status), it automatically searches for enrichment tools that can fill the gaps using available keys.
This contract system enables:
- Static validation - Verify inputs before execution
- Capability discovery - Find tools by semantic output
- Cost estimation - Calculate credit costs before running
- Formal verification - Prove workflow correctness
- Automatic enrichment - Fill missing fields using available data
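Static validation, the first capability in the list above, might look like the following sketch. The contract is the salesforce.query_accounts example rewritten as a Python dict; the validate_call function and its error strings are hypothetical.

```python
CONTRACT = {
    "tool": "salesforce.query_accounts",
    "inputs": [{"name": "email", "type": "core.email@1", "required": True}],
    "requires_keys": ["email"],
}


def validate_call(contract, args, available_keys):
    """Return a list of problems; an empty list means the call passes static checks."""
    declared = {spec["name"] for spec in contract["inputs"]}
    errors = [f"unknown input: {name}" for name in sorted(set(args) - declared)]
    for spec in contract["inputs"]:
        if spec.get("required") and spec["name"] not in args:
            errors.append(f"missing required input: {spec['name']}")
    for key in contract.get("requires_keys", []):
        if key not in available_keys:
            errors.append(f"missing identity key: {key}")
    return errors
```

Because the check reads only the contract and the proposed arguments, it can run before any tool is invoked.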
Example Workflow Walkthrough
Scenario: Invoice Generation from CRM Lead
Goal: Generate an invoice for a customer using only their email address.
Available Types:
- User provides: core.email@1
- Goal requires: billing.invoice@1
Step 1: Discovery
The planner searches for tools that can bridge the gap:
have: [core.email@1]
want: billing.invoice@1
Discovery finds:
- salesforce.get_lead - Outputs crm.lead@1, requires email key
- adapter: crm.lead@1 -> billing.customer@1 - Bridges CRM to billing domain
- stripe.create_invoice - Outputs billing.invoice@1, requires billing.customer@1
Step 2: Type Path Construction
The planner constructs a typed path:
core.email@1
-> [tool: salesforce.get_lead]
-> crm.lead@1 {email, id}
-> [adapter: crm.lead@1 -> billing.customer@1]
-> billing.customer@1 {email, external_id}
-> [tool: stripe.create_invoice]
-> billing.invoice@1 {id, amount, customer_id}
Step 3: Adapter Bridging
The adapter crm.lead@1 -> billing.customer@1 uses the email anchor:
from: crm.lead@1
to: billing.customer@1
anchor: email # Both types provide email
transform:
- map: lead.email -> customer.email
- map: lead.id -> customer.external_id
Step 4: Execution
The plan executes deterministically:
- Call salesforce.get_lead(email: "user@example.com") -> Returns lead record
- Apply adapter -> Transform lead fields to customer fields
- Call stripe.create_invoice(customer: {email, external_id}) -> Returns invoice
The entire workflow was verified at plan time. No LLM reasoning needed during execution.
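The four steps of the walkthrough can be sketched end to end with stubbed tools. Nothing here calls Salesforce or Stripe: get_lead and create_invoice are stand-in functions, the invoice id and amount are placeholders, and verify and execute are hypothetical helpers that separate the plan-time type check from execution.

```python
def get_lead(email):
    # Stub standing in for salesforce.get_lead; returns a crm.lead@1 record.
    return {"id": "00Q5G00000ABC123", "email": email}


def lead_to_customer(lead):
    # Adapter crm.lead@1 -> billing.customer@1, using the mappings from Step 3.
    return {"email": lead["email"], "external_id": lead["id"]}


def create_invoice(customer):
    # Stub standing in for stripe.create_invoice; id and amount are placeholders.
    return {"id": "inv_placeholder", "amount": 0, "customer_id": customer["external_id"]}


# Each step: (input type, output type, callable).
PLAN = [
    ("core.email@1", "crm.lead@1", get_lead),
    ("crm.lead@1", "billing.customer@1", lead_to_customer),
    ("billing.customer@1", "billing.invoice@1", create_invoice),
]


def verify(plan, have, want):
    """Plan-time check: each step's input type must equal the previous output type."""
    for needs, gives, _ in plan:
        if needs != have:
            return False
        have = gives
    return have == want


def execute(plan, value):
    """Run the steps in order; no type reasoning happens here."""
    for _, _, step in plan:
        value = step(value)
    return value


if verify(PLAN, "core.email@1", "billing.invoice@1"):
    invoice = execute(PLAN, "user@example.com")
```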
Automatic Data Enrichment
The Enrichment Problem
Plans often encounter incomplete data. A workflow receives a lead with only {id, email} but needs status to proceed. Traditional approaches fail here or require manual intervention.
Discovery-Driven Enrichment
Semio’s planner automatically detects “holes” in data and searches for enrichment tools. Hole-filling is integrated into the planning search itself, not a separate post-processing pass, so enrichment steps are discovered, costed, and validated alongside primary tool calls within the same search space.
Enrichment discovery:
- Detect missing fields - Plan requires crm.lead@1 with [id, email, status]
- Current data - Have crm.lead@1 with [id, email] (missing status)
- Search enrichment tools - Find tools that accept crm.lead@1 (with id key) and add status
- Inject enrichment step - Automatically insert get_lead_details before the step that needs status
Example plan with automatic enrichment:
Step 1: get_lead(email) -> crm.lead@1 {id, email}
Step 2: [AUTO-ENRICHMENT] get_lead_details(id) -> crm.lead@1 {id, email, status, owner}
Step 3: check_lead_status(status) -> ...
Benefits:
- No manual patching - Planner fills gaps automatically
- Key-based lookup - Uses available keys (id, email) for enrichment
- Cost-aware - Enrichment steps included in cost estimate
- Deterministic - Same missing fields -> same enrichment strategy
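The hole-detection and injection steps described above can be sketched as follows. This is a simplified linear pass over a fixed step list, whereas the text describes hole-filling as part of the planning search itself; the ENRICHERS fact format and the plan_with_enrichment function are assumptions.

```python
# Hypothetical enrichment fact: which tool adds which fields, given which key.
ENRICHERS = [
    {"tool": "get_lead_details", "type": "crm.lead@1",
     "requires_key": "id", "adds": {"status", "owner"}},
]


def plan_with_enrichment(steps, initial_fields, enrichers):
    """steps: list of (tool, fields it needs, fields it produces)."""
    have = set(initial_fields)
    plan = []
    for tool, needs, produces in steps:
        for field in sorted(set(needs) - have):  # sorted for a deterministic order
            if field in have:
                continue  # already filled by an enrichment inserted for an earlier field
            enricher = next((e for e in enrichers
                             if field in e["adds"] and e["requires_key"] in have), None)
            if enricher is None:
                raise LookupError(f"no enrichment tool can supply {field!r}")
            plan.append("[AUTO-ENRICHMENT] " + enricher["tool"])
            have |= enricher["adds"]
        plan.append(tool)
        have |= set(produces)
    return plan


STEPS = [
    ("get_lead", {"email"}, {"id", "email"}),
    ("check_lead_status", {"status"}, set()),
]
plan = plan_with_enrichment(STEPS, {"email"}, ENRICHERS)
```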
Field Selection and Projection
Tools declare which semantic fields they support for selective retrieval. The planner uses this information to request minimal fields, optimize API calls, and identify which fields require additional enrichment lookups.
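As a small illustration, the split between fields a tool can return directly and fields left for enrichment might be computed like this. The split_selection function is hypothetical; the contract keys mirror supports_select and select_fields from the tool contract example.

```python
QUERY_ACCOUNTS = {
    "supports_select": True,
    "select_fields": ["id", "name", "email", "company", "status", "owner", "created_at"],
}


def split_selection(wanted, contract):
    """Return (fields to request from the tool, fields left for enrichment)."""
    if not contract.get("supports_select", False):
        raise ValueError("tool does not support field selection")
    supported = set(contract["select_fields"])
    select = [f for f in wanted if f in supported]
    needs_enrichment = [f for f in wanted if f not in supported]
    return select, needs_enrichment
```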
Integration Surface
Tool Authors: Declaring Contracts
Tool developers annotate their endpoints with semantic type declarations specifying outputs, identity keys, required keys, and output mode. The annotation format integrates with the host language’s existing metadata system.
Adapter Configuration
Platform operators define adapters between semantically equivalent types:
adapters:
  - from: crm.account@1
    to: billing.customer@1
    anchor: email
    confidence: 0.95
    cost: 1.0
    rationale: "Both represent customer entities"
Platform Resolution
The Semio engine:
- Indexes all tool contracts into a semantic graph
- Resolves types to families (e.g., crm.* types)
- Discovers adapter chains via heuristic search over the semantic graph
- Evaluates plans across cost, latency, and risk objectives
- Computes Pareto frontier of non-dominated solutions
- Validates key availability for each transformation
- Returns optimal plans with multi-objective metrics
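The Pareto step in the list above has a compact definition: a plan is kept unless some other plan is at least as good on every objective and strictly better on one. A minimal sketch follows, with hypothetical plan records and lower values treated as better on all three objectives.

```python
OBJECTIVES = ("cost", "latency", "risk")  # lower is better for each (assumption)


def dominates(a, b):
    """True if plan a is no worse than b everywhere and strictly better somewhere."""
    return (all(a[k] <= b[k] for k in OBJECTIVES)
            and any(a[k] < b[k] for k in OBJECTIVES))


def pareto_frontier(plans):
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical candidate plans.
PLANS = [
    {"name": "A", "cost": 1.0, "latency": 5.0, "risk": 0.1},
    {"name": "B", "cost": 2.0, "latency": 3.0, "risk": 0.1},
    {"name": "C", "cost": 2.0, "latency": 6.0, "risk": 0.2},
]
frontier = pareto_frontier(PLANS)
```

Here plan C is dropped because A beats it on all three objectives, while A and B both survive because each wins on a different objective.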
Why Symbolic Planning
Plan generation over the type graph uses logic programming (Prolog) rather than LLM reasoning. This is a deliberate architectural choice. The planning problem (exhaustive search over typed facts with backtracking, unification, and constraint propagation) maps directly to capabilities that logic programming provides natively and that LLMs approximate probabilistically.
Symbolic planning offers properties that are difficult to achieve with neural approaches alone: deterministic outputs (same inputs produce the same plan), exhaustive search (all valid plans are found, not just the first plausible guess), and proof generation (the planner can explain why a plan is valid through its derivation trace). These properties are prerequisites for formal verification via Cognitive Trust Certificates.
The LLM’s role is constrained to intent parsing (natural language to structured query) and optional result ranking. The planning itself is symbolic. This separation is explored in detail in The Symbolic Backbone: Why Agent Systems Need Logic Programming.
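The determinism and exhaustiveness properties can be illustrated outside Prolog. The sketch below enumerates every plan over a small set of typed facts with a backtracking depth-first search; it is written in Python only for illustration and is not the planner described here. The fact format and the billing.lookup_customer tool are hypothetical.

```python
# Facts: (input type, step name, output type). billing.lookup_customer is hypothetical.
FACTS = [
    ("core.email@1", "salesforce.get_lead", "crm.lead@1"),
    ("crm.lead@1", "adapter:crm.lead->billing.customer", "billing.customer@1"),
    ("core.email@1", "billing.lookup_customer", "billing.customer@1"),
    ("billing.customer@1", "stripe.create_invoice", "billing.invoice@1"),
]


def all_plans(facts, have, want, visited=None):
    """Yield every step sequence from have to want, never revisiting a type."""
    visited = visited or (have,)
    if have == want:
        yield []
        return
    # Sorting fixes the exploration order, so the same facts always
    # produce the same plans in the same order.
    for src, step, dst in sorted(facts):
        if src == have and dst not in visited:
            for rest in all_plans(facts, dst, want, visited + (dst,)):
                yield [step] + rest


plans = list(all_plans(FACTS, "core.email@1", "billing.invoice@1"))
```

Both valid plans are returned, not just the first one found, and shuffling the fact list does not change the result.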
Prior Art
Typed service composition has precedent in the semantic web services literature. Projects such as OWL-S and WSMO explored similar ideas (typed service contracts, semantic matching, and automated composition) during the 2000s. These efforts produced valuable theoretical foundations but failed to achieve practical adoption, largely due to the knowledge acquisition bottleneck: manually authoring ontologies and service descriptions was prohibitively expensive.
The neuro-symbolic approach resolves this bottleneck. LLMs can infer tool semantics from documentation and API schemas, automatically generating the typed contracts that semantic web systems required humans to author. The symbolic planning layer then operates over these contracts with the same rigor the earlier systems intended, but without the manual overhead that prevented their adoption.
Safety and Governance Layer
Semio’s type system integrates with the broader DataGrout governance stack. Policy enforcement, cost accounting, and formal verification are covered in dedicated companion papers; this section summarizes how Semio’s typed contracts participate in each.
Policy Enforcement
Semio integrates with DataGrout’s Semantic Guard layer (see Runtime Policy Enforcement for Autonomous AI Systems) to enforce:
- Side effect classification - Tools declare side effect classes (none, read, write, delete) that gates enforce at runtime
- PII handling - Fields marked with pii: true trigger Dynamic Redaction before data reaches agent context
- Approval requirements - Write operations route through the approval system for human confirmation
- Budget constraints - Plans rejected if estimated cost exceeds credit allocation (see Credit System: Economic Primitives for Autonomous Systems)
Read/Write Classification
Tools declare side effect classes:
tool: stripe.create_invoice
side_effect_class: write
requires_approval: true
The planner respects policy constraints configured per server:
policy:
  allow_side_effect: [read] # Blocks write operations
  max_cost: 10.0 # Budget constraint
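A plan-time policy check over these declarations could be sketched as below. The check_policy function, the per-step cost values, and the rule that the none class is always permitted are assumptions for illustration.

```python
POLICY = {"allow_side_effect": ["read"], "max_cost": 10.0}

# Hypothetical plan steps with declared side effect classes and estimated costs.
PLAN_STEPS = [
    {"tool": "salesforce.get_lead", "side_effect_class": "read", "cost": 1.0},
    {"tool": "stripe.create_invoice", "side_effect_class": "write", "cost": 2.0},
]


def check_policy(steps, policy):
    """Return a list of violations; an empty list means the plan is admissible."""
    # Assumption: tools with no side effects are always allowed.
    allowed = set(policy["allow_side_effect"]) | {"none"}
    violations = [
        f"{step['tool']}: side effect class '{step['side_effect_class']}' not allowed"
        for step in steps
        if step["side_effect_class"] not in allowed
    ]
    total = sum(step["cost"] for step in steps)
    if total > policy["max_cost"]:
        violations.append(f"estimated cost {total} exceeds max_cost {policy['max_cost']}")
    return violations
```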
Redaction
Fields with pii: true are automatically redacted in logs and traces via DataGrout’s Dynamic Redaction engine:
fields:
  - name: email
    type: core.email@1
    pii: true # Triggers redaction
Redaction strategies (masking, apron, scrambling) are configured per integration and enforced transparently. See Runtime Policy Enforcement for Autonomous AI Systems for the full redaction architecture.
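Of the strategies named above, masking is the simplest to illustrate. The sketch below is an assumption about what masking might look like, not DataGrout's Dynamic Redaction engine; the mask and redact functions and the mask format are hypothetical, and the other strategies are left out because this document does not define them.

```python
FIELD_SPECS = [
    {"name": "id", "type": "core.entity_ref@1"},
    {"name": "email", "type": "core.email@1", "pii": True},
]


def mask(value):
    """Keep the first character and the domain of an email; mask anything else entirely."""
    local, sep, domain = value.partition("@")
    if sep:
        return local[:1] + "***@" + domain
    return "***"


def redact(record, field_specs):
    """Return a copy of record with every pii: true field masked."""
    pii = {spec["name"] for spec in field_specs if spec.get("pii")}
    return {key: mask(str(value)) if key in pii else value
            for key, value in record.items()}


safe = redact({"id": "00Q5G00000ABC123", "email": "user@example.com"}, FIELD_SPECS)
```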
Formal Verification
Plans generated through Semio’s type graph are validated via Cognitive Trust Certificates (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems). The CTC validator checks cycle-freedom, type safety, policy compliance, budget adherence, credential availability, and input consumption before any execution occurs.
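One of the listed checks, cycle-freedom, can be sketched with Kahn's algorithm over step dependencies. This covers only that single check and is not the CTC validator; the is_acyclic function and the dependency format are assumptions.

```python
from collections import deque


def is_acyclic(deps):
    """deps maps each step to the steps it depends on (all of which must be keys)."""
    indegree = {step: len(requires) for step, requires in deps.items()}
    dependents = {step: [] for step in deps}
    for step, requires in deps.items():
        for required in requires:
            dependents[required].append(step)
    # Repeatedly release steps whose dependencies have all been processed.
    ready = deque(step for step, count in indegree.items() if count == 0)
    processed = 0
    while ready:
        step = ready.popleft()
        processed += 1
        for nxt in dependents[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Steps stuck on a cycle never reach indegree zero.
    return processed == len(deps)


WALKTHROUGH = {
    "salesforce.get_lead": [],
    "adapter": ["salesforce.get_lead"],
    "stripe.create_invoice": ["adapter"],
}
```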
Auditability
Every plan execution generates:
- Type trace - Sequence of type transformations
- Adapter chain - Which bridges were applied
- Credit breakdown - Cost per step (itemized in execution receipts)
- Policy snapshot - What constraints were active
- CTC proof - Cryptographically signed validation evidence
Implications for Agent Architectures
Reduced Hallucination Risk
By moving schema reasoning from runtime (LLM) to plan time (symbolic), Semio eliminates a major source of agent errors. The LLM describes intent; the type system handles compatibility.
Deterministic Composition
Given the same inputs and available tools, Semio produces the same plan. This predictability is critical for production systems where non-determinism creates operational risk.
Lower Inference Overhead
Compact type representations reduce prompt size. Instead of including full API schemas in context, the planner sees:
tool: salesforce.query_accounts
out: crm.account@1
keys: [email, id]
At about three short lines per tool, even a catalog of thousands of tools can fit in a single prompt, context window permitting.
Planner Compatibility
Semio’s type graph integrates with existing planners:
- Prolog-based - Native support for typed facts and rules
- LLM-based - Types as structured prompts
- Hybrid - Symbolic planning with LLM refinement
Scalable Orchestration
Adding a new tool requires:
- Declare its semantic contract
- Optionally define adapters to existing types
- Index into the graph
No O(N^2) integration work is needed: the planner discovers new composition paths automatically.
Future Work
Community and Federated Type Registries
Currently, types are platform-defined. A federated type registry would enable:
- Shared semantic definitions across organizations
- Community-contributed adapters
- Type versioning and deprecation workflows
- Cross-company workflow composition (organizations publishing internal type catalogs)
- Standardized industry types (healthcare, finance, etc.)
- Marketplace for commercial adapters
Cross-Platform Interoperability Standards
Semio could evolve into an interchange format:
- Standard serialization for type contracts
- Adapter portability across platforms
- Tool compatibility guarantees
Formalization and Ecosystem Tooling
Potential areas for standardization:
- Type inference from OpenAPI specs
- Adapter validation and testing frameworks
- Performance benchmarks for plan complexity
Appendix: Cross-System Type Definitions
The cross-system examples above rely on two type definitions and the adapter between them: a CRM type (crm.account@1 in the adapter examples, crm.lead@1 in the invoice generation walkthrough; both keyed on id and email), billing.customer@1 (billing domain, keyed on id, email, and external_id), and an adapter that bridges them via the shared email anchor.
Each type definition specifies: required and optional properties, field tiers (core, useful, PII, index), identity keys for cross-system resolution, and JSON Schema compatibility for validation. Adapter contracts specify the source and target types, the anchor key used for identity continuity, transformation logic, confidence score, cost, and tier preservation rules.
The specific JSON schemas, tier assignments, and adapter transformation specifications are part of the operational implementation.
This document describes the conceptual architecture of Semio. Implementation details and optimization strategies are omitted to protect operational IP; the aim is conceptual understanding.