Semio: A Semantic Interface Layer for Tool-Oriented AI Systems

Typed Interoperability for Scalable Agent Infrastructure

Abstract

Modern AI agents face a fundamental interoperability problem: tool surfaces are syntactically heterogeneous but semantically similar. A customer record in Salesforce, HubSpot, and Stripe represents the same conceptual entity but exposes different schemas, field names, and access patterns. Current approaches rely on LLM reasoning to bridge these gaps, resulting in probabilistic failures, high token costs, and brittle integrations.

Semio introduces a semantic interface layer that enables deterministic tool composition through typed contracts. Rather than requiring agents to reason about schema differences at runtime, Semio provides a declarative type system where tools announce their semantic capabilities and adapters handle structural transformations. This approach reduces integration fragility, lowers inference overhead, and enables formal verification of multi-step workflows.

The system operates as a compatibility substrate: tools declare inputs and outputs using semantic types (e.g., billing.customer@1), the planner reasons about type compatibility, and adapters bridge structural differences when needed. Identity anchors (keys) enable cross-system entity resolution without forcing schema normalization.

Problem Landscape

Tool Sprawl and Schema Fragmentation

Enterprise systems expose thousands of API endpoints across hundreds of services. Each system evolved independently, resulting in:

Incompatible schemas - A “customer” in one system has different fields than another
Naming variations - customer_id, customerId, external_id, cust_ref all mean “customer identifier”
Type mismatches - Dates as free-form strings, ISO 8601 timestamps, or Unix epochs depending on the API
Identity fragmentation - No canonical way to reference the same entity across systems

Manual Integration Brittleness

Traditional integration approaches require hand-coded glue logic for every tool pair. This results in:

O(N^2) integration complexity - Every new tool requires adapters for every existing tool
Maintenance burden - API changes break existing workflows
Hidden assumptions - Implicit schema mappings that fail silently
No reusability - Integration logic cannot be shared or composed

LLM Probabilistic Failures

Using LLM reasoning alone to bridge integration gaps introduces:

Non-deterministic behavior - The same request can produce different results across runs
Token waste - Extensive context needed to describe schemas and mappings
Silent failures - Type mismatches discovered at execution time, not plan time
Hallucination risk - LLMs invent plausible but incorrect field mappings

Design Principles

Semio is built on five core principles:

1. Semantic Over Syntactic Compatibility

Tools declare what they produce and consume semantically, not syntactically. A tool outputs billing.invoice@1, not “an object with fields id, amount, customer_id.”

2. Partial Type Coverage

Not all fields must be declared. Types can be partially specified, with core fields annotated and extended fields available but not semantically indexed. This balances expressiveness with maintenance burden.

3. Identity-First Interoperability

Cross-system composition requires identity anchors. Semio defines key kinds (email, id, external_id) that enable entity resolution without forcing globally unique identifiers.

4. Declarative Tool Contracts

Tools announce their capabilities through typed contracts. The planner reasons about compatibility without executing tools. This enables pre-execution verification and cost estimation.

5. Safety Through Structural Constraints

Type mismatches are caught at plan time, not execution time. Adapters are explicitly modeled and verified, preventing silent data corruption.

Semio Model Overview

Semantic Types

Types follow a versioned naming convention:

<family>.<entity>@<version>

Examples:

crm.account@1 - CRM account record
billing.invoice@1 - Billing invoice
core.email@1 - Email address (primitive)
crm.account.list@1 - Collection of accounts (list container variant, written as a suffix after the entity)

Each type declares:

Family - Domain grouping (crm, billing, hr, etc.)
Label - Human-readable name
Keys - Identity anchors available on this type
Fields - Named properties with types and tiers
Containers - List and page variants
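
The naming convention above can be checked mechanically. The following is a minimal Python sketch, not Semio's actual parser: the regular expression, the TypeName class, and the parse_type_name function are hypothetical names, and the optional container segment is an assumption inferred from the crm.account.list@1 example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern for <family>.<entity>[.<container>]@<version>.
# The optional container segment (list or page) is an assumption
# based on the crm.account.list@1 example.
_TYPE_NAME = re.compile(
    r"^(?P<family>[a-z_]+)\.(?P<entity>[a-z_]+)"
    r"(?:\.(?P<container>list|page))?@(?P<version>\d+)$"
)


@dataclass(frozen=True)
class TypeName:
    family: str
    entity: str
    container: Optional[str]
    version: int


def parse_type_name(name: str) -> TypeName:
    """Split a semantic type name into its parts, or raise ValueError."""
    match = _TYPE_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid semantic type name: {name!r}")
    return TypeName(
        family=match["family"],
        entity=match["entity"],
        container=match["container"],  # None when no container suffix is present
        version=int(match["version"]),
    )
```

For example, parse_type_name("billing.invoice@1") yields family "billing", entity "invoice", no container, and version 1, while a name without a version suffix is rejected.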

Type Lifting and Duck Typing

Vendor-specific data is automatically recognized as semantic types through structural matching. Objects that provide all required fields of a type implicitly satisfy that type.

Example:

# Salesforce returns:
%{"Id" => "00Q...", "Email" => "user@example.com", "Company" => "Acme Corp"}

# Automatically recognized as crm.lead@1 (after field mapping, has required "id" and "email")
# Vendor type (salesforce.Lead) preserved alongside semantic type
# Usable anywhere crm.lead@1 is accepted

Cross-type matching uses shared anchors (like email), not type-specific IDs:

# Lead: {id: "00Q...", email: "user@example.com"}
# Account: {id: "001...", email: "user@example.com"}
# Match via "email" anchor, not "id" (different ID namespaces)
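
As an illustration of the two ideas above, here is a small Python sketch of structural lifting and anchor-based matching. It assumes a simple rename table from vendor field names to semantic field names. REQUIRED_FIELDS, SALESFORCE_LEAD_FIELDS, lift, and shared_anchor are hypothetical names, not part of Semio.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical registry: semantic type -> required field names.
REQUIRED_FIELDS = {"crm.lead@1": {"id", "email"}}

# Hypothetical vendor-to-semantic field names for Salesforce leads.
SALESFORCE_LEAD_FIELDS = {"Id": "id", "Email": "email", "Company": "company"}


def lift(
    record: Dict[str, Any],
    field_map: Dict[str, str],
    semantic_type: str,
    vendor_type: str,
) -> Optional[Dict[str, Any]]:
    """Rename vendor fields, then accept the record only if every required
    field of the semantic type is present (a structural match)."""
    lifted = {field_map[k]: v for k, v in record.items() if k in field_map}
    if not REQUIRED_FIELDS[semantic_type].issubset(lifted):
        return None
    # The vendor type is kept alongside the semantic type.
    return {"semantic_type": semantic_type, "vendor_type": vendor_type, "data": lifted}


def shared_anchor(
    left: Dict[str, Any],
    right: Dict[str, Any],
    anchors: Sequence[str] = ("email", "external_id"),
) -> Optional[str]:
    """Return the first anchor on which both records agree, else None.
    Type-specific ids are deliberately not in the default anchor list."""
    for anchor in anchors:
        if anchor in left and left.get(anchor) == right.get(anchor):
            return anchor
    return None
```

A record that lacks a required field is not lifted at all in this sketch, which mirrors the rule that only objects providing all required fields satisfy the type.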

Keys: Identity Anchors

Keys enable cross-system entity resolution:

keys: [email, id, external_id]

A tool that outputs crm.account@1 with keys [email, id] can provide input to a tool requiring billing.customer@1 if an adapter bridges the two types and both types share a common key (e.g., email).

Fields and Tiers

Fields are tagged with strategic tiers that guide planning and optimization. Tiers are not mutually exclusive: a field such as email can be core, PII, and index at once.

Core - Essential properties required for basic operations (id, name, email)
Useful - Valuable fields that enhance workflows but aren’t strictly required (company, status, owner)
PII - Fields containing personally identifiable information requiring redaction (email, phone)
Index - Fields optimized for search and lookup operations (email, company)

Example type definition with tiers:

{
  "$id": "crm.lead@1",
  "type": "object",
  "properties": {
    "id": { "$ref": "core.entity_ref@1" },
    "name": { "type": "string" },
    "company": { "type": "string" },
    "email": { "type": "string", "format": "email" },
    "status": { 
      "type": "string", 
      "enum": ["new", "working", "qualified", "unqualified", "other"] 
    },
    "owner": { "$ref": "core.user_ref@1" },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["id", "email"],
  "x_tiers": {
    "core": ["id", "name", "email"],
    "useful": ["company", "status", "owner"],
    "pii": ["email"],
    "index": ["email", "company"]
  }
}

This tiering system enables cost-aware planning: core fields are always fetched, useful fields are included when credits permit, and PII fields require policy clearance.
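
The cost-aware selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the planner's actual logic: the flat cost_per_field, the select_fields function, and the rule that the PII gate takes precedence over the core tier are all assumptions introduced here.

```python
from typing import Dict, List

# Tier assignments copied from the crm.lead@1 example.
X_TIERS = {
    "core": ["id", "name", "email"],
    "useful": ["company", "status", "owner"],
    "pii": ["email"],
    "index": ["email", "company"],
}


def select_fields(
    tiers: Dict[str, List[str]],
    credits: int,
    pii_cleared: bool,
    cost_per_field: int = 1,  # assumption: every useful field costs the same
) -> Dict[str, List[str]]:
    """Start from core fields, add useful fields while credits remain,
    and withhold PII fields unless policy clearance was granted."""
    selected = list(tiers["core"])
    for name in tiers["useful"]:
        if credits < cost_per_field:
            break
        selected.append(name)
        credits -= cost_per_field
    if pii_cleared:
        return {"fetch": selected, "withheld": []}
    pii = set(tiers["pii"])
    return {
        "fetch": [f for f in selected if f not in pii],
        "withheld": [f for f in selected if f in pii],
    }
```

With two credits and clearance, the sketch fetches the three core fields plus company and status. With no credits and no clearance, it fetches only id and name and reports email as withheld.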

Adapters: Type Bridges

Adapters transform one type to another:

adapter:
  from: crm.account@1
  to: billing.customer@1
  anchor: email
  confidence: 0.9
  cost: 1.0

Confidence represents how reliably an adapter bridges two types. A direct field mapping with no information loss scores 1.0, while a lossy or heuristic mapping (e.g., inferring a billing customer from a CRM lead where not all fields carry over) scores lower. The planner treats adapters as weighted edges in a type graph, using confidence and cost as search criteria when discovering transformation paths.
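
To make the weighted-edge view concrete, the sketch below runs Dijkstra's algorithm over a small adapter graph in Python. It assumes that confidence multiplies along a chain, which the text does not state, so the search minimizes the sum of -log(confidence) and breaks ties on cost. Only the first adapter comes from this document. The other two edges and the name best_adapter_chain are invented for illustration.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# (from_type, to_type, confidence, cost). The first edge is the adapter
# shown above; the other two are made-up examples.
ADAPTERS = [
    ("crm.account@1", "billing.customer@1", 0.9, 1.0),
    ("crm.lead@1", "crm.account@1", 0.8, 1.0),
    ("crm.lead@1", "billing.customer@1", 0.6, 1.0),
]


def best_adapter_chain(
    adapters: List[Tuple[str, str, float, float]], source: str, target: str
) -> Optional[Tuple[List[str], float, float]]:
    """Dijkstra over -log(confidence): finds the chain with the highest
    product of confidences. Returns (types, confidence, cost) or None."""
    graph: Dict[str, List[Tuple[str, float, float]]] = {}
    for src, dst, confidence, cost in adapters:
        graph.setdefault(src, []).append((dst, confidence, cost))

    # Heap entries: (accumulated -log confidence, accumulated cost, path).
    heap = [(0.0, 0.0, [source])]
    settled = set()
    while heap:
        neg_log_conf, cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            return path, math.exp(-neg_log_conf), cost
        if node in settled:
            continue
        settled.add(node)
        for dst, confidence, edge_cost in graph.get(node, []):
            if dst not in settled:
                heapq.heappush(
                    heap,
                    (neg_log_conf - math.log(confidence), cost + edge_cost, path + [dst]),
                )
    return None
```

In this toy graph the two-hop chain through crm.account@1 (0.8 x 0.9 = 0.72) beats the direct but lossier lead-to-customer adapter (0.6), at the price of a higher total cost. A real planner would weigh those two criteria according to its own objectives.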

Tool Contracts

Tools declare their semantic surface with rich metadata:

tool: salesforce.query_accounts
inputs:
  - name: email
    type: core.email@1
    required: true
outputs:
  - type: crm.account@1
    mode: one
provides_keys: [id, email, external_id]
requires_keys: [email]
supports_select: true
select_fields: [id, name, email, company, status, owner, created_at]
jmespath_selector: "records[0]"

Key contract features:

Field selection - Tools can specify which semantic fields they support (supports_select, select_fields)
JMESPath selectors - Integration-specific paths for extracting typed fields from responses
Identity anchors - Which keys are provided for cross-system resolution
Enrichment capability - Whether the tool can augment partial data

JMESPath Field Mapping:

Integrations declare how to map vendor-specific JSON fields to semantic types using JMESPath selectors. JMESPath is a JSON query language (like XPath for JSON) that extracts data from complex API responses.

Example - Salesforce SOQL Query:

API returns:

{
  "totalSize": 1,
  "done": true,
  "records": [{
    "Id": "00Q5G00000ABC123",
    "Email": "user@example.com",
    "Company": "Acme Corp",
    "Status": "New"
  }]
}

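Semantic field mappings for crm.lead@1 (lowercase is not a built-in JMESPath function; it is assumed here to be a custom function registered by the integration):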

{
  "id": "records[0].Id",
  "email": "records[0].Email",
  "company": "records[0].Company",
  "status": "records[0].Status | lowercase(@)"
}

Result after extraction:

{
  "id": "00Q5G00000ABC123",
  "email": "user@example.com",
  "company": "Acme Corp",
  "status": "new"
}

This normalized data satisfies crm.lead@1 and can be used in cross-system workflows without additional transformation.
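
The extraction step can be reproduced with the Python standard library alone. The sketch below is a stand-in for a real JMESPath engine: the resolve helper is hypothetical and handles only dotted keys and numeric indexes, and the status value is lowercased in plain Python rather than inside the selector.

```python
import re
from typing import Any, Dict

RESPONSE = {
    "totalSize": 1,
    "done": True,
    "records": [{
        "Id": "00Q5G00000ABC123",
        "Email": "user@example.com",
        "Company": "Acme Corp",
        "Status": "New",
    }],
}

# Path per semantic field; only the key/index subset of JMESPath is used.
FIELD_PATHS = {
    "id": "records[0].Id",
    "email": "records[0].Email",
    "company": "records[0].Company",
    "status": "records[0].Status",
}

# Matches either an identifier or a bracketed numeric index.
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def resolve(path: str, document: Any) -> Any:
    """Walk a dotted path with numeric indexes; return None if a step is missing."""
    current = document
    for key, index in _TOKEN.findall(path):
        try:
            current = current[key] if key else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


def extract_lead(response: Dict[str, Any]) -> Dict[str, Any]:
    """Apply every field path, then normalize status to lowercase."""
    lead = {name: resolve(path, response) for name, path in FIELD_PATHS.items()}
    if isinstance(lead["status"], str):
        lead["status"] = lead["status"].lower()
    return lead
```

Calling extract_lead(RESPONSE) produces the normalized record shown above. A missing path resolves to None instead of raising, so a later required-field check can reject the record.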

Common patterns:

records[0] - First item from paginated list
data.items[*] - All items from nested array
response.user.{id: id, email: email} - Multi-field projection
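results[?active==`true`] - Filtered selection on a boolean field (a single-quoted 'true' is a string literal and would match only the string value)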

Enrichment annotations:

Tools can declare their ability to augment partial data: which types they can enrich, which keys they require for lookup, and which fields they add.

When the planner encounters incomplete data (e.g., only has id and email but needs status), it automatically searches for enrichment tools that can fill the gaps using available keys.

This contract system enables:

Static validation - Verify inputs before execution
Capability discovery - Find tools by semantic output
Cost estimation - Calculate credit costs before running
Formal verification - Prove workflow correctness
Automatic enrichment - Fill missing fields using available data
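
Two of the capabilities listed above, capability discovery and static validation, can be illustrated with a small Python sketch. The ToolContract class and both functions are hypothetical and cover only a subset of the contract fields. All inputs are treated as required, which is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToolContract:
    name: str
    inputs: Dict[str, str]  # input name -> semantic type (all treated as required)
    output_type: str
    provides_keys: Tuple[str, ...] = ()


CONTRACTS = [
    ToolContract(
        "salesforce.query_accounts",
        {"email": "core.email@1"},
        "crm.account@1",
        ("id", "email", "external_id"),
    ),
    ToolContract(
        "salesforce.get_lead",
        {"email": "core.email@1"},
        "crm.lead@1",
        ("id", "email"),
    ),
]


def find_by_output(contracts: List[ToolContract], output_type: str) -> List[str]:
    """Capability discovery: names of tools that produce the given type."""
    return [c.name for c in contracts if c.output_type == output_type]


def validate_call(contract: ToolContract, provided: Dict[str, str]) -> List[str]:
    """Static validation: compare provided argument types with the contract
    without executing the tool. An empty list means the call is well-typed."""
    problems = []
    for name, expected in contract.inputs.items():
        actual = provided.get(name)
        if actual is None:
            problems.append(f"missing input {name!r}")
        elif actual != expected:
            problems.append(f"input {name!r}: expected {expected}, got {actual}")
    return problems
```

Because both checks read only contract metadata, they can run at plan time, before any API call is made.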

Example Workflow Walkthrough

Scenario: Invoice Generation from CRM Lead

Goal: Generate an invoice for a customer using only their email address.

Available Types:

User provides: core.email@1
Goal requires: billing.invoice@1

Step 1: Discovery

The planner searches for tools that can bridge the gap:

have: [core.email@1]
want: billing.invoice@1

Discovery finds:

salesforce.get_lead - Outputs crm.lead@1, requires email key
adapter: crm.lead@1 -> billing.customer@1 - Bridges CRM to billing domain
stripe.create_invoice - Outputs billing.invoice@1, requires billing.customer@1

Step 2: Type Path Construction

The planner constructs a typed path:

core.email@1 
  -> [tool: salesforce.get_lead] 
  -> crm.lead@1 {email, id}
  -> [adapter: crm.lead@1 -> billing.customer@1] 
  -> billing.customer@1 {email, external_id}
  -> [tool: stripe.create_invoice]
  -> billing.invoice@1 {id, amount, customer_id}
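
The path above can be found with a plain breadth-first search over typed steps. The Python sketch below simplifies each tool and adapter to a single input type and a single output type, which is an assumption made for brevity. The STEPS table and find_type_path are illustrative names.

```python
from collections import deque
from typing import List, Optional, Tuple

# (step label, input type, output type): the tools and adapter from the walkthrough.
STEPS = [
    ("tool: salesforce.get_lead", "core.email@1", "crm.lead@1"),
    ("adapter: crm.lead@1 -> billing.customer@1", "crm.lead@1", "billing.customer@1"),
    ("tool: stripe.create_invoice", "billing.customer@1", "billing.invoice@1"),
]


def find_type_path(
    steps: List[Tuple[str, str, str]], have: List[str], want: str
) -> Optional[List[str]]:
    """Breadth-first search from the available types to the wanted type.
    Returns the step labels on a shortest path, or None if no path exists."""
    queue = deque((type_name, []) for type_name in have)
    seen = set(have)
    while queue:
        current, path = queue.popleft()
        if current == want:
            return path
        # Steps are scanned in declaration order, so the result is deterministic.
        for label, src, dst in steps:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [label]))
    return None
```

Given have = ["core.email@1"] and want = "billing.invoice@1", the search returns the three steps in the order shown in the typed path.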

Step 3: Adapter Bridging

The adapter crm.lead@1 -> billing.customer@1 uses the email anchor:

from: crm.lead@1
to: billing.customer@1
anchor: email  # Both types provide email
transform:
  - map: lead.email -> customer.email
  - map: lead.id -> customer.external_id
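
The two map rules above amount to a field-renaming table. A minimal Python sketch follows. LEAD_TO_CUSTOMER and apply_adapter are hypothetical names, and rejecting a lead without the anchor is an assumed policy rather than documented behavior.

```python
from typing import Any, Dict

# Declarative mapping from the adapter above: source field -> target field.
LEAD_TO_CUSTOMER = {"email": "email", "id": "external_id"}
ANCHOR = "email"


def apply_adapter(
    lead: Dict[str, Any],
    mapping: Dict[str, str] = LEAD_TO_CUSTOMER,
    anchor: str = ANCHOR,
) -> Dict[str, Any]:
    """Build a billing.customer@1 payload from a crm.lead@1 record.
    Raises instead of emitting a customer that cannot be resolved later."""
    if not lead.get(anchor):
        raise ValueError(f"lead is missing anchor field {anchor!r}")
    # Only mapped fields carry over; unmapped lead fields are dropped.
    return {target: lead[source] for source, target in mapping.items() if source in lead}
```

Fields outside the mapping, such as company, are intentionally not copied, which is one reason an adapter like this would carry a confidence below 1.0.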

Step 4: Execution

The plan executes deterministically:

Call salesforce.get_lead(email: "user@example.com") -> Returns lead record
Apply adapter -> Transform lead fields to customer fields
Call stripe.create_invoice(customer: {email, external_id}) -> Returns invoice

The entire workflow was verified at plan time. No LLM reasoning is needed during execution.

Automatic Data Enrichment

The Enrichment Problem

Plans often encounter incomplete data. A workflow receives a lead with only {id, email} but needs status to proceed. Traditional approaches fail here or require manual intervention.

Discovery-Driven Enrichment

Semio’s planner automatically detects “holes” in data and searches for enrichment tools. Hole-filling is integrated into the planning search itself, not a separate post-processing pass, so enrichment steps are discovered, costed, and validated alongside primary tool calls within the same search space.

Enrichment capabilities are declared as structured facts in the planning knowledge base, specifying which tools can augment which types, what keys they require for lookup, and which fields they add.

Enrichment discovery:

Detect missing fields - Plan requires crm.lead@1 with [id, email, status]
Current data - Have crm.lead@1 with [id, email] (missing status)
Search enrichment tools - Find tools that accept crm.lead@1 (with id key) and add status
Inject enrichment step - Automatically insert get_lead_details before the step that needs status

Example plan with automatic enrichment:

Step 1: get_lead(email) -> crm.lead@1 {id, email}
Step 2: [AUTO-ENRICHMENT] get_lead_details(id) -> crm.lead@1 {id, email, status, owner}
Step 3: check_lead_status(status) -> ...
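
The hole-detection logic can be sketched as follows in Python. The ENRICHERS table stands in for the enrichment facts in the planning knowledge base. Its shape, and the plan_enrichment function, are assumptions for illustration. The actual planner integrates this into its search rather than running it as a standalone function.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical enrichment facts: tool -> (type, lookup key, fields it adds).
ENRICHERS: Dict[str, Tuple[str, str, Set[str]]] = {
    "get_lead_details": ("crm.lead@1", "id", {"status", "owner"}),
}


def plan_enrichment(
    type_name: str,
    have_fields: Set[str],
    need_fields: Set[str],
    enrichers: Dict[str, Tuple[str, str, Set[str]]] = ENRICHERS,
) -> List[str]:
    """Return the enrichment tools to insert so that every needed field is
    available. Tools are visited in sorted order so the same holes always
    yield the same steps. Raises LookupError if a hole cannot be filled."""
    missing = set(need_fields) - set(have_fields)
    steps = []
    for tool in sorted(enrichers):
        if not missing:
            break
        enriched_type, lookup_key, adds = enrichers[tool]
        usable = enriched_type == type_name and lookup_key in have_fields
        if usable and adds & missing:
            steps.append(tool)
            missing -= adds
    if missing:
        raise LookupError(f"no enrichment tool provides: {sorted(missing)}")
    return steps
```

For a lead with {id, email} that needs status, the sketch returns ["get_lead_details"], matching the auto-enrichment step in the example plan.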

Benefits:

No manual patching - Planner fills gaps automatically
Key-based lookup - Uses available keys (id, email) for enrichment
Cost-aware - Enrichment steps included in cost estimate
Deterministic - Same missing fields -> same enrichment strategy

Field Selection and Projection

Tools declare which semantic fields they support for selective retrieval. The planner uses this information to request minimal fields, optimize API calls, and identify which fields require additional enrichment lookups.
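
A minimal Python sketch of that split, assuming the tool's select_fields list is available to the planner. The function name plan_projection is hypothetical.

```python
from typing import List, Set, Tuple

# select_fields as declared in the salesforce.query_accounts contract.
SELECT_FIELDS = ["id", "name", "email", "company", "status", "owner", "created_at"]


def plan_projection(needed: Set[str], select_fields: List[str]) -> Tuple[List[str], List[str]]:
    """Split needed fields into those the tool can return directly (kept in
    the tool's declared order) and those that require an enrichment lookup."""
    request = [name for name in select_fields if name in needed]
    unresolved = sorted(needed - set(select_fields))
    return request, unresolved
```

Requesting only the first list keeps the API call minimal, and the second list is what an enrichment search would then try to fill.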

Integration Surface

Tool Authors: Declaring Contracts

Tool developers annotate their endpoints with semantic type declarations specifying outputs, identity keys, required keys, and output mode. The annotation format integrates with the host language’s existing metadata system.

Adapter Configuration

Platform operators define adapters between semantically equivalent types:

adapters:
  - from: crm.account@1
    to: billing.customer@1
    anchor: email
    confidence: 0.9
    cost: 1.0
    rationale: "Both represent customer entities"

Platform Resolution

The Semio engine:

Indexes all tool contracts into a semantic graph
Resolves types to families (e.g., crm.* types)
Discovers adapter chains via heuristic search over the semantic graph
Evaluates plans across cost, latency, and risk objectives
Computes Pareto frontier of non-dominated solutions
Validates key availability for each transformation
Returns the non-dominated plans with their multi-objective metrics
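
The Pareto step can be illustrated with a short Python sketch. It assumes each plan has already been scored on cost, latency, and risk, and that lower is better on all three. The Plan tuple and the sample numbers are invented for the example.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    name: str
    cost: float
    latency: float
    risk: float


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (lower is better for all three)."""
    pairs = list(zip(a[1:], b[1:]))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Keep every plan that no other plan dominates, preserving input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


CANDIDATES = [
    Plan("A", cost=1.0, latency=5.0, risk=0.1),
    Plan("B", cost=2.0, latency=3.0, risk=0.1),
    Plan("C", cost=2.0, latency=6.0, risk=0.2),  # worse than A on every objective
]
```

Here plans A and B trade cost against latency, so both stay on the frontier, while C is dropped because A is better on all three objectives.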

Why Symbolic Planning

Plan generation over the type graph uses logic programming (Prolog) rather than LLM reasoning. This is a deliberate architectural choice. The planning problem (exhaustive search over typed facts with backtracking, unification, and constraint propagation) maps directly to capabilities that logic programming provides natively and that LLMs approximate probabilistically.

Symbolic planning offers properties that are difficult to achieve with neural approaches alone: deterministic outputs (same inputs produce the same plan), exhaustive search (all valid plans are found, not just the first plausible guess), and proof generation (the planner can explain why a plan is valid through its derivation trace). These properties are prerequisites for formal verification via Cognitive Trust Certificates.

The LLM’s role is constrained to intent parsing (natural language to structured query) and optional result ranking. The planning itself is symbolic. This separation is explored in detail in The Symbolic Backbone: Why Agent Systems Need Logic Programming.

Prior Art

Typed service composition has precedent in the semantic web services literature. Projects such as OWL-S and WSMO explored similar ideas (typed service contracts, semantic matching, and automated composition) during the 2000s. These efforts produced valuable theoretical foundations but failed to achieve practical adoption, largely due to the knowledge acquisition bottleneck: manually authoring ontologies and service descriptions was prohibitively expensive.

The neuro-symbolic approach resolves this bottleneck. LLMs can infer tool semantics from documentation and API schemas, automatically generating the typed contracts that semantic web systems required humans to author. The symbolic planning layer then operates over these contracts with the same rigor the earlier systems intended, but without the manual overhead that prevented their adoption.

Governance Integration

Semio’s typed contracts participate in the broader governance stack. Side effect classifications, PII field annotations, and budget constraints declared in tool contracts are consumed by:

Semantic Guards - pre-execution policy enforcement based on tool metadata (see Runtime Policy Enforcement for Autonomous AI Systems)
Dynamic Redaction - PII-annotated fields trigger automatic masking in outputs
Cognitive Trust Certificates - plans generated through the type graph are validated for cycle-freedom, type safety, policy compliance, and input consumption before execution (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems)
Credit accounting - cost estimates are computed from the plan and included in execution receipts (see Credit System: Economic Primitives for Autonomous Systems)

Implications for Agent Architectures

Reduced Hallucination Risk

By moving schema reasoning from runtime (LLM) to plan time (symbolic), Semio eliminates a major source of agent errors. The LLM describes intent; the type system handles compatibility.

Deterministic Composition

Given the same inputs and available tools, Semio produces the same plan. This predictability is critical for production systems where non-determinism creates operational risk.

Lower Inference Overhead

Compact type representations reduce prompt size. Instead of including full API schemas in context, the planner sees:

tool: salesforce.query_accounts
out: crm.account@1
keys: [email, id]

At three short lines per tool, a catalog of thousands of tools can fit in a single large-context prompt.

Planner Compatibility

Semio’s type graph integrates with existing planners:

Prolog-based - Native support for typed facts and rules
LLM-based - Types as structured prompts
Hybrid - Symbolic planning with LLM refinement

Scalable Orchestration

Adding a new tool requires:

Declare its semantic contract
Optionally define adapters to existing types
Index into the graph

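No O(N^2) integration work is needed: the new tool connects to shared semantic types rather than to every existing tool, and the planner automatically discovers new composition paths.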

Future Work

Federated Type Registries

Currently, types are platform-defined. A federated registry would enable shared semantic definitions across organizations, community-contributed adapters, and standardized industry types.

Ecosystem Tooling

Potential areas include type inference from OpenAPI specs, adapter validation frameworks, and cross-platform interoperability standards.

Appendix: Cross-System Type Definitions

The invoice generation walkthrough above relies on three interoperating definitions: crm.lead@1 (CRM domain, keyed on id and email), billing.customer@1 (Billing domain, keyed on id, email, and external_id), and an adapter that bridges them via the shared email anchor.

Each type definition specifies: required and optional properties, field tiers (core, useful, PII, index), identity keys for cross-system resolution, and JSON Schema compatibility for validation. Adapter contracts specify the source and target types, the anchor key used for identity continuity, transformation logic, confidence score, cost, and tier preservation rules.

The specific JSON schemas, tier assignments, and adapter transformation specifications are part of the operational implementation.

This document describes the conceptual architecture of Semio. Implementation details and optimization strategies are omitted to protect operational IP; the aim is conceptual understanding.