Semio: A Semantic Interface Layer for Tool-Oriented AI Systems

Typed Interoperability for Scalable Agent Infrastructure

Abstract

Modern AI agents face a fundamental interoperability problem: tool surfaces are syntactically heterogeneous but semantically similar. A customer record in Salesforce, HubSpot, and Stripe represents the same conceptual entity but exposes different schemas, field names, and access patterns. Current approaches rely on LLM reasoning to bridge these gaps, resulting in probabilistic failures, high token costs, and brittle integrations.

Semio introduces a semantic interface layer that enables deterministic tool composition through typed contracts. Rather than requiring agents to reason about schema differences at runtime, Semio provides a declarative type system where tools announce their semantic capabilities and adapters handle structural transformations. This approach reduces integration fragility, lowers inference overhead, and enables formal verification of multi-step workflows.

The system operates as a compatibility substrate: tools declare inputs and outputs using semantic types (e.g., billing.customer@1), the planner reasons about type compatibility, and adapters bridge structural differences when needed. Identity anchors (keys) enable cross-system entity resolution without forcing schema normalization.


Problem Landscape

Tool Sprawl and Schema Fragmentation

Enterprise systems expose thousands of API endpoints across hundreds of services. Each system evolved independently, resulting in:

  • Incompatible schemas - A “customer” in one system has different fields than another
  • Naming variations - customer_id, customerId, external_id, cust_ref all mean “customer identifier”
  • Type mismatches - Dates as strings, ISO timestamps, or Unix epochs depending on the API
  • Identity fragmentation - No canonical way to reference the same entity across systems

Manual Integration Brittleness

Traditional integration approaches require hand-coded glue logic for every tool pair. This results in:

  • O(N^2) integration complexity - Every new tool requires adapters for every existing tool
  • Maintenance burden - API changes break existing workflows
  • Hidden assumptions - Implicit schema mappings that fail silently
  • No reusability - Integration logic cannot be shared or composed

LLM Probabilistic Failures

Using LLM reasoning alone to bridge integration gaps introduces:

  • Non-deterministic behavior - Same request produces different results
  • Token waste - Extensive context needed to describe schemas and mappings
  • Silent failures - Type mismatches discovered at execution time, not plan time
  • Hallucination risk - LLMs invent plausible but incorrect field mappings

Design Principles

Semio is built on five core principles:

1. Semantic Over Syntactic Compatibility

Tools declare what they produce and consume semantically, not syntactically. A tool outputs billing.invoice@1, not “an object with fields id, amount, customer_id.”

2. Partial Type Coverage

Not all fields must be declared. Types can be partially specified, with core fields annotated and extended fields available but not semantically indexed. This balances expressiveness with maintenance burden.

3. Identity-First Interoperability

Cross-system composition requires identity anchors. Semio defines key kinds (email, id, external_id) that enable entity resolution without forcing global unique identifiers.

4. Declarative Tool Contracts

Tools announce their capabilities through typed contracts. The planner reasons about compatibility without executing tools. This enables pre-execution verification and cost estimation.

5. Safety Through Structural Constraints

Type mismatches are caught at plan time, not execution time. Adapters are explicitly modeled and verified, preventing silent data corruption.


Semio Model Overview

Semantic Types

Types follow a versioned naming convention:

<family>.<entity>@<version>

Examples:

  • crm.account@1 - CRM account record
  • billing.invoice@1 - Billing invoice
  • core.email@1 - Email address (primitive)
  • crm.account.list@1 - Collection of accounts

Each type declares:

  • Family - Domain grouping (crm, billing, hr, etc.)
  • Label - Human-readable name
  • Keys - Identity anchors available on this type
  • Fields - Named properties with types and tiers
  • Containers - List and page variants
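The naming convention and declaration components above can be sketched as a small descriptor. This is an illustrative sketch only; the names `SemanticType` and `parse_type_name` are assumptions, not part of Semio's actual API:

```python
import re
from dataclasses import dataclass, field

# Hypothetical descriptor for a semantic type; names are illustrative.
# Entity segments may themselves contain dots (e.g. "account.list").
TYPE_NAME = re.compile(r"^(?P<family>[a-z_]+)\.(?P<entity>[a-z_.]+)@(?P<version>\d+)$")

@dataclass
class SemanticType:
    family: str                                  # domain grouping, e.g. "crm"
    entity: str                                  # entity name, e.g. "account.list"
    version: int                                 # schema version
    keys: list = field(default_factory=list)     # identity anchors
    fields: dict = field(default_factory=dict)   # field name -> spec

def parse_type_name(name: str) -> SemanticType:
    m = TYPE_NAME.match(name)
    if not m:
        raise ValueError(f"not a semantic type name: {name!r}")
    return SemanticType(m["family"], m["entity"], int(m["version"]))

t = parse_type_name("crm.account.list@1")
# t.family == "crm", t.entity == "account.list", t.version == 1
```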

Type Lifting and Duck Typing

Vendor-specific data is automatically recognized as semantic types through structural matching. Objects that provide all required fields of a type implicitly satisfy that type.

Example:

# Salesforce returns:
%{"Id" => "00Q...", "Email" => "user@example.com", "Company" => "Acme Corp"}

# Automatically recognized as crm.lead@1 (has required "id" and "email")
# Vendor type (salesforce.Lead) preserved alongside semantic type
# Usable anywhere crm.lead@1 is accepted

Cross-type matching uses shared anchors (like email), not type-specific IDs:

# Lead: {id: "00Q...", email: "user@example.com"}
# Account: {id: "001...", email: "user@example.com"}
# Match via "email" anchor, not "id" (different ID namespaces)
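Both mechanisms above can be sketched in a few lines. The alias table (mapping vendor casing like "Id" to the semantic "id") and the function names are assumptions for illustration, not Semio's actual lifting rules:

```python
# Illustrative sketch of type lifting and anchor-based matching.
# FIELD_ALIASES is an assumed vendor-to-semantic mapping.
FIELD_ALIASES = {"Id": "id", "Email": "email", "Company": "company"}

def lift(raw: dict, required: set):
    """Normalize vendor field names; lift only if required fields exist."""
    normalized = {FIELD_ALIASES.get(k, k.lower()): v for k, v in raw.items()}
    if required <= normalized.keys():
        return normalized            # object duck-types as the semantic type
    return None                      # missing required fields -> no lift

def same_entity(a: dict, b: dict, anchor: str = "email") -> bool:
    """Match records via a shared anchor, never via type-specific IDs."""
    return a.get(anchor) is not None and a.get(anchor) == b.get(anchor)

sf = {"Id": "00Q...", "Email": "user@example.com", "Company": "Acme Corp"}
lead = lift(sf, required={"id", "email"})          # lifts to crm.lead@1
account = {"id": "001...", "email": "user@example.com"}
# same_entity(lead, account) -> True via the email anchor, despite distinct ids
```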

Keys: Identity Anchors

Keys enable cross-system entity resolution:

keys: [email, id, external_id]

A tool that outputs crm.account@1 with keys [email, id] can provide input to a tool requiring billing.customer@1 if an adapter bridges the type difference and the two types share a common key (e.g., email).

Fields and Tiers

Fields are categorized into strategic tiers that guide planning and optimization:

  • Core - Essential properties required for basic operations (id, name, email)
  • Useful - Valuable fields that enhance workflows but aren’t strictly required (company, status, owner)
  • PII - Fields containing personally identifiable information requiring redaction (email, phone)
  • Index - Fields optimized for search and lookup operations (email, company)

Example type definition with tiers:

{
  "$id": "crm.lead@1",
  "type": "object",
  "properties": {
    "id": { "$ref": "core.entity_ref@1" },
    "name": { "type": "string" },
    "company": { "type": "string" },
    "email": { "type": "string", "format": "email" },
    "status": { 
      "type": "string", 
      "enum": ["new", "working", "qualified", "unqualified", "other"] 
    },
    "owner": { "$ref": "core.user_ref@1" },
    "created_at": { "type": "string", "format": "date-time" }
  },
  "required": ["id", "email"],
  "x_tiers": {
    "core": ["id", "name", "email"],
    "useful": ["company", "status", "owner"],
    "pii": ["email"],
    "index": ["email", "company"]
  }
}

This tiering system enables cost-aware planning: core fields are always fetched, useful fields are included when credits permit, and PII fields require policy clearance.
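The cost-aware selection rule just described can be sketched as follows, using the x_tiers block from the example above; the policy flags and function name are illustrative assumptions:

```python
# Sketch of tier-driven field selection; mirrors the x_tiers example above.
TIERS = {
    "core": ["id", "name", "email"],
    "useful": ["company", "status", "owner"],
    "pii": ["email"],
    "index": ["email", "company"],
}

def select_fields(tiers: dict, credits_remaining: float, pii_cleared: bool) -> list:
    fields = list(tiers["core"])            # core fields are always fetched
    if credits_remaining > 0:
        fields += tiers["useful"]           # useful fields when credits permit
    if not pii_cleared:                     # PII requires policy clearance
        fields = [f for f in fields if f not in tiers["pii"]]
    return fields

select_fields(TIERS, credits_remaining=5.0, pii_cleared=False)
# -> ["id", "name", "company", "status", "owner"]  (email dropped: PII)
```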

Adapters: Type Bridges

Adapters transform one type to another:

adapter:
  from: crm.account@1
  to: billing.customer@1
  anchor: email
  confidence: 0.9
  cost: 1.0

Confidence represents how reliably an adapter bridges two types. A direct field mapping with no information loss scores 1.0, while a lossy or heuristic mapping (e.g., inferring a billing customer from a CRM lead where not all fields carry over) scores lower. The planner treats adapters as weighted edges in a type graph, using confidence and cost as search criteria when discovering transformation paths.
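The weighted-edge search described above can be sketched as a cost-ordered graph traversal with a minimum-confidence cutoff. The adapter table and function name are illustrative assumptions, not Semio's planner:

```python
import heapq

# Sketch: adapters as weighted edges (from, to, confidence, cost) in a
# type graph, searched by cumulative cost. Data is illustrative.
ADAPTERS = [
    ("crm.lead@1", "crm.account@1", 0.95, 0.5),
    ("crm.account@1", "billing.customer@1", 0.9, 1.0),
    ("crm.lead@1", "billing.customer@1", 0.7, 1.0),   # lossy direct hop
]

def cheapest_path(src, dst, min_confidence=0.8):
    frontier = [(0.0, src, [])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for frm, to, conf, hop_cost in ADAPTERS:
            if frm == node and conf >= min_confidence:
                heapq.heappush(frontier, (cost + hop_cost, to, path + [(frm, to)]))
    return None

cheapest_path("crm.lead@1", "billing.customer@1")
# the 0.7-confidence direct hop is filtered out, so the two-hop chain wins
```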

Tool Contracts

Tools declare their semantic surface with rich metadata:

tool: salesforce.query_accounts
inputs:
  - name: email
    type: core.email@1
    required: true
outputs:
  - type: crm.account@1
    mode: one
provides_keys: [id, email, external_id]
requires_keys: [email]
supports_select: true
select_fields: [id, name, email, company, status, owner, created_at]
jmespath_selector: "records[0]"

Key contract features:

  • Field selection - Tools can specify which semantic fields they support (supports_select, select_fields)
  • JMESPath selectors - Integration-specific paths for extracting typed fields from responses
  • Identity anchors - Which keys are provided for cross-system resolution
  • Enrichment capability - Whether tool can augment partial data
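A contract like the one above can be checked statically before execution. The record shape and `validate_call` helper below are illustrative assumptions that mirror the contract fields, not Semio's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical contract record; field names mirror the contract above.
@dataclass
class ToolContract:
    name: str
    inputs: dict                 # input name -> semantic type
    output: str                  # semantic output type
    provides_keys: list
    requires_keys: list
    select_fields: list = field(default_factory=list)

def validate_call(contract: ToolContract, available: dict, keys: set) -> list:
    """Return a list of problems; an empty list means the call is plan-safe."""
    problems = []
    for name, sem_type in contract.inputs.items():
        if available.get(name) != sem_type:
            problems.append(f"missing input {name}: {sem_type}")
    for key in contract.requires_keys:
        if key not in keys:
            problems.append(f"missing identity key: {key}")
    return problems

query = ToolContract(
    name="salesforce.query_accounts",
    inputs={"email": "core.email@1"},
    output="crm.account@1",
    provides_keys=["id", "email", "external_id"],
    requires_keys=["email"],
)
validate_call(query, available={"email": "core.email@1"}, keys={"email"})  # -> []
```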

JMESPath Field Mapping:

Integrations declare how to map vendor-specific JSON fields to semantic types using JMESPath selectors. JMESPath is a JSON query language (like XPath for JSON) that extracts data from complex API responses.

Example - Salesforce SOQL Query:

API returns:

{
  "totalSize": 1,
  "done": true,
  "records": [{
    "Id": "00Q5G00000ABC123",
    "Email": "user@example.com",
    "Company": "Acme Corp",
    "Status": "New"
  }]
}

Semantic field mappings for crm.lead@1:

{
  "id": "records[0].Id",
  "email": "records[0].Email",
  "company": "records[0].Company",
  "status": "records[0].Status | lowercase(@)"
}

Result after extraction:

{
  "id": "00Q5G00000ABC123",
  "email": "user@example.com",
  "company": "Acme Corp",
  "status": "new"
}

This normalized data satisfies crm.lead@1 and can be used in cross-system workflows without additional transformation.
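A real deployment would use a JMESPath library for this extraction. The hand-rolled resolver below covers only the `records[0].Field` subset shown above, and hard-codes a lowercase step where the `lowercase(@)` pipe appears (in actual JMESPath that would be a custom extension function):

```python
import re

# Minimal resolver for the "records[0].Field" selector subset above.
# Sketch only: a production system would use a JMESPath implementation.
def extract(doc, selector: str):
    selector, _, pipe = selector.partition(" | ")
    value = doc
    for part in selector.split("."):
        m = re.match(r"(\w+)\[(\d+)\]$", part)
        if m:                                   # indexed step, e.g. records[0]
            value = value[m.group(1)][int(m.group(2))]
        else:                                   # plain field access
            value = value[part]
    return value.lower() if pipe.startswith("lowercase") else value

response = {"records": [{"Id": "00Q5G00000ABC123", "Status": "New"}]}
mappings = {"id": "records[0].Id", "status": "records[0].Status | lowercase(@)"}
{k: extract(response, sel) for k, sel in mappings.items()}
# -> {"id": "00Q5G00000ABC123", "status": "new"}
```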

Common patterns:

  • records[0] - First item from paginated list
  • data.items[*] - All items from nested array
  • response.user.{id: id, email: email} - Multi-field projection
  • results[?active=='true'] - Filtered selection

Enrichment annotations:

Tools can declare their ability to augment partial data. Enrichment capabilities are declared as structured facts in the planning knowledge base, specifying which tools can augment which types, what keys they require for lookup, and which fields they add.

When the planner encounters incomplete data (e.g., only has id and email but needs status), it automatically searches for enrichment tools that can fill the gaps using available keys.

This contract system enables:

  • Static validation - Verify inputs before execution
  • Capability discovery - Find tools by semantic output
  • Cost estimation - Calculate credit costs before running
  • Formal verification - Prove workflow correctness
  • Automatic enrichment - Fill missing fields using available data

Example Workflow Walkthrough

Scenario: Invoice Generation from CRM Lead

Goal: Generate an invoice for a customer using only their email address.

Available Types:

  • User provides: core.email@1
  • Goal requires: billing.invoice@1

Step 1: Discovery

The planner searches for tools that can bridge the gap:

have: [core.email@1]
want: billing.invoice@1

Discovery finds:

  1. salesforce.get_lead - Outputs crm.lead@1, requires email key
  2. adapter: crm.lead@1 -> billing.customer@1 - Bridges CRM to billing domain
  3. stripe.create_invoice - Outputs billing.invoice@1, requires billing.customer@1

Step 2: Type Path Construction

The planner constructs a typed path:

core.email@1 
  -> [tool: salesforce.get_lead] 
  -> crm.lead@1 {email, id}
  -> [adapter: crm.lead@1 -> billing.customer@1] 
  -> billing.customer@1 {email, id}
  -> [tool: stripe.create_invoice]
  -> billing.invoice@1 {id, amount, customer_id}

Step 3: Adapter Bridging

The adapter crm.lead@1 -> billing.customer@1 uses the email anchor:

from: crm.lead@1
to: billing.customer@1
anchor: email  # Both types provide email
transform:
  - map: lead.email -> customer.email
  - map: lead.id -> customer.external_id

Step 4: Execution

The plan executes deterministically:

  1. Call salesforce.get_lead(email: "user@example.com") -> Returns lead record
  2. Apply adapter -> Transform lead fields to customer fields
  3. Call stripe.create_invoice(customer: {email, external_id}) -> Returns invoice

The entire workflow was verified at plan time. No LLM reasoning needed during execution.
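The plan-time check behind this guarantee can be sketched as a walk over the typed path: each step's required input type must equal the running output type. The step tuples and function name are illustrative assumptions:

```python
# Sketch of plan-time verification for the walkthrough above.
# Each step: (name, input type needed, output type produced).
PLAN = [
    ("salesforce.get_lead", "core.email@1", "crm.lead@1"),
    ("adapter:lead->customer", "crm.lead@1", "billing.customer@1"),
    ("stripe.create_invoice", "billing.customer@1", "billing.invoice@1"),
]

def verify(plan, start: str, goal: str) -> bool:
    current = start
    for name, needs, produces in plan:
        if needs != current:            # type mismatch caught before execution
            raise TypeError(f"{name} needs {needs}, have {current}")
        current = produces
    return current == goal

verify(PLAN, "core.email@1", "billing.invoice@1")  # -> True
```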


Automatic Data Enrichment

The Enrichment Problem

Plans often encounter incomplete data. A workflow receives a lead with only {id, email} but needs status to proceed. Traditional approaches fail here or require manual intervention.

Discovery-Driven Enrichment

Semio’s planner automatically detects “holes” in data and searches for enrichment tools. Hole-filling is integrated into the planning search itself, not a separate post-processing pass, so enrichment steps are discovered, costed, and validated alongside primary tool calls within the same search space.

Enrichment discovery:

  1. Detect missing fields - Plan requires crm.lead@1 with [id, email, status]
  2. Current data - Have crm.lead@1 with [id, email] (missing status)
  3. Search enrichment tools - Find tools that accept crm.lead@1 (with id key) and add status
  4. Inject enrichment step - Automatically insert get_lead_details before the step that needs status

Example plan with automatic enrichment:

Step 1: get_lead(email) -> crm.lead@1 {id, email}
Step 2: [AUTO-ENRICHMENT] get_lead_details(id) -> crm.lead@1 {id, email, status, owner}
Step 3: check_lead_status(status) -> ...
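The hole detection and tool search behind the auto-enrichment step can be sketched as follows; the fact tuple shape and function name are illustrative assumptions about how enrichment facts might be represented:

```python
# Sketch of discovery-driven enrichment: detect missing fields, then find a
# declared enrichment tool whose lookup keys are available and whose added
# fields cover the gap. Fact shape is an illustrative assumption.
ENRICHMENT_FACTS = [
    # (tool, semantic type, lookup keys required, fields added)
    ("get_lead_details", "crm.lead@1", {"id"}, {"status", "owner"}),
]

def plan_enrichment(sem_type, have_fields, need_fields, have_keys):
    missing = set(need_fields) - set(have_fields)
    if not missing:
        return None                      # no hole to fill
    for tool, t, keys, adds in ENRICHMENT_FACTS:
        if t == sem_type and keys <= have_keys and missing <= adds:
            return tool                  # inject this step before the consumer
    raise LookupError(f"no enrichment tool fills {missing}")

plan_enrichment("crm.lead@1", {"id", "email"}, {"id", "email", "status"}, {"id", "email"})
# -> "get_lead_details"
```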

Benefits:

  • No manual patching - Planner fills gaps automatically
  • Key-based lookup - Uses available keys (id, email) for enrichment
  • Cost-aware - Enrichment steps included in cost estimate
  • Deterministic - Same missing fields -> same enrichment strategy

Field Selection and Projection

Tools declare which semantic fields they support for selective retrieval. The planner uses this information to request minimal fields, optimize API calls, and identify which fields require additional enrichment lookups.


Integration Surface

Tool Authors: Declaring Contracts

Tool developers annotate their endpoints with semantic type declarations specifying outputs, identity keys, required keys, and output mode. The annotation format integrates with the language’s existing metadata system.

Adapter Configuration

Platform operators define adapters between semantically equivalent types:

adapters:
  - from: crm.account@1
    to: billing.customer@1
    anchor: email
    confidence: 0.95
    cost: 1.0
    rationale: "Both represent customer entities"

Platform Resolution

The Semio engine:

  1. Indexes all tool contracts into a semantic graph
  2. Resolves types to families (e.g., crm.* types)
  3. Discovers adapter chains via heuristic search over the semantic graph
  4. Evaluates plans across cost, latency, and risk objectives
  5. Computes Pareto frontier of non-dominated solutions
  6. Validates key availability for each transformation
  7. Returns optimal plans with multi-objective metrics
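Step 5, the Pareto frontier, keeps every plan not dominated on the (cost, latency, risk) objectives. A minimal sketch, with illustrative plan tuples:

```python
# Sketch of the Pareto step over (cost, latency, risk); lower is better.
def dominates(a, b):
    """a dominates b if no worse on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(plans):
    return [p for p in plans
            if not any(dominates(q["scores"], p["scores"]) for q in plans)]

plans = [
    {"id": "A", "scores": (1.0, 200, 0.1)},   # cheap but slow
    {"id": "B", "scores": (3.0, 50, 0.1)},    # pricier but fast
    {"id": "C", "scores": (3.5, 60, 0.2)},    # worse than B on every axis
]
[p["id"] for p in pareto_frontier(plans)]      # -> ["A", "B"]
```

Neither A nor B dominates the other (each wins on a different objective), so both survive; C is dropped because B beats it everywhere.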

Why Symbolic Planning

Plan generation over the type graph uses logic programming (Prolog) rather than LLM reasoning. This is a deliberate architectural choice. The planning problem (exhaustive search over typed facts with backtracking, unification, and constraint propagation) maps directly to capabilities that logic programming provides natively and that LLMs approximate probabilistically.

Symbolic planning offers properties that are difficult to achieve with neural approaches alone: deterministic outputs (same inputs produce the same plan), exhaustive search (all valid plans are found, not just the first plausible guess), and proof generation (the planner can explain why a plan is valid through its derivation trace). These properties are prerequisites for formal verification via Cognitive Trust Certificates.

The LLM’s role is constrained to intent parsing (natural language to structured query) and optional result ranking. The planning itself is symbolic. This separation is explored in detail in The Symbolic Backbone: Why Agent Systems Need Logic Programming.

Prior Art

Typed service composition has precedent in the semantic web services literature. Projects such as OWL-S and WSMO explored similar ideas (typed service contracts, semantic matching, and automated composition) during the 2000s. These efforts produced valuable theoretical foundations but failed to achieve practical adoption, largely due to the knowledge acquisition bottleneck: manually authoring ontologies and service descriptions was prohibitively expensive.

The neuro-symbolic approach resolves this bottleneck. LLMs can infer tool semantics from documentation and API schemas, automatically generating the typed contracts that semantic web systems required humans to author. The symbolic planning layer then operates over these contracts with the same rigor the earlier systems intended, but without the manual overhead that prevented their adoption.


Safety and Governance Layer

Semio’s type system integrates with the broader DataGrout governance stack. Policy enforcement, cost accounting, and formal verification are covered in dedicated companion papers; this section summarizes how Semio’s typed contracts participate in each.

Policy Enforcement

Semio integrates with DataGrout’s Semantic Guard layer (see Runtime Policy Enforcement for Autonomous AI Systems) to enforce:

  • Side effect classification - Tools declare side effect classes (none, read, write, delete) that gates enforce at runtime
  • PII handling - Fields marked with pii: true trigger Dynamic Redaction before data reaches agent context
  • Approval requirements - Write operations route through the approval system for human confirmation
  • Budget constraints - Plans rejected if estimated cost exceeds credit allocation (see Credit System: Economic Primitives for Autonomous Systems)

Read/Write Classification

Tools declare side effect classes:

tool: stripe.create_invoice
side_effect_class: write
requires_approval: true

The planner respects policy constraints configured per server:

policy:
  allow_side_effect: [read]  # Blocks write operations
  max_cost: 10.0             # Budget constraint
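The policy above can be applied at plan time as a simple filter: reject any plan containing a disallowed side effect class or exceeding the budget. The dict shapes and function name are illustrative assumptions:

```python
# Sketch of plan-time policy checks mirroring the config above.
POLICY = {"allow_side_effect": {"read"}, "max_cost": 10.0}

def check_policy(plan_steps, policy):
    violations = []
    for step in plan_steps:
        if step["side_effect_class"] not in policy["allow_side_effect"]:
            violations.append(f"{step['tool']}: {step['side_effect_class']} blocked")
    total = sum(s["cost"] for s in plan_steps)      # budget constraint
    if total > policy["max_cost"]:
        violations.append(f"cost {total} exceeds budget {policy['max_cost']}")
    return violations

steps = [
    {"tool": "salesforce.get_lead", "side_effect_class": "read", "cost": 1.0},
    {"tool": "stripe.create_invoice", "side_effect_class": "write", "cost": 2.0},
]
check_policy(steps, POLICY)   # -> ["stripe.create_invoice: write blocked"]
```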

Redaction

Fields with pii: true are automatically redacted in logs and traces via DataGrout’s Dynamic Redaction engine:

fields:
  - name: email
    type: core.email@1
    pii: true  # Triggers redaction

Redaction strategies (masking, apron, scrambling) are configured per integration and enforced transparently. See Runtime Policy Enforcement for Autonomous AI Systems for the full redaction architecture.
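The masking strategy can be sketched as a field-level pass over records whose fields carry pii: true. The masking rule below is an illustrative assumption, not DataGrout's Dynamic Redaction engine:

```python
# Sketch of field-level PII masking before data reaches logs or traces.
PII_FIELDS = {"email", "phone"}

def redact(record: dict, pii_fields=PII_FIELDS) -> dict:
    def mask(v: str) -> str:
        head, _, domain = v.partition("@")          # works for emails and plain strings
        return head[:1] + "***" + ("@" + domain if domain else "")
    return {k: mask(v) if k in pii_fields else v for k, v in record.items()}

redact({"id": "00Q...", "email": "user@example.com", "company": "Acme Corp"})
# -> {"id": "00Q...", "email": "u***@example.com", "company": "Acme Corp"}
```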

Formal Verification

Plans generated through Semio’s type graph are validated via Cognitive Trust Certificates (see Cognitive Trust Certificates: Verifiable Execution Proofs for Autonomous Systems). The CTC validator checks cycle-freedom, type safety, policy compliance, budget adherence, credential availability, and input consumption before any execution occurs.

Auditability

Every plan execution generates:

  • Type trace - Sequence of type transformations
  • Adapter chain - Which bridges were applied
  • Credit breakdown - Cost per step (itemized in execution receipts)
  • Policy snapshot - What constraints were active
  • CTC proof - Cryptographically signed validation evidence

Implications for Agent Architectures

Reduced Hallucination Risk

By moving schema reasoning from runtime (LLM) to plan time (symbolic), Semio eliminates a major source of agent errors. The LLM describes intent; the type system handles compatibility.

Deterministic Composition

Given the same inputs and available tools, Semio produces the same plan. This predictability is critical for production systems where non-determinism creates operational risk.

Lower Inference Overhead

Compact type representations reduce prompt size. Instead of including full API schemas in context, the planner sees:

tool: salesforce.query_accounts
out: crm.account@1
keys: [email, id]

This fits thousands of tools in a single prompt.

Planner Compatibility

Semio’s type graph integrates with existing planners:

  • Prolog-based - Native support for typed facts and rules
  • LLM-based - Types as structured prompts
  • Hybrid - Symbolic planning with LLM refinement

Scalable Orchestration

Adding a new tool requires:

  1. Declare its semantic contract
  2. Optionally define adapters to existing types
  3. Index into the graph

No O(N^2) integration work. The planner automatically discovers new composition paths.


Future Work

Community and Federated Type Registries

Currently, types are platform-defined. A federated type registry would enable:

  • Shared semantic definitions across organizations
  • Community-contributed adapters
  • Type versioning and deprecation workflows
  • Cross-company workflow composition (organizations publishing internal type catalogs)
  • Standardized industry types (healthcare, finance, etc.)
  • Marketplace for commercial adapters

Cross-Platform Interoperability Standards

Semio could evolve into an interchange format:

  • Standard serialization for type contracts
  • Adapter portability across platforms
  • Tool compatibility guarantees

Formalization and Ecosystem Tooling

Potential areas for standardization:

  • Type inference from OpenAPI specs
  • Adapter validation and testing frameworks
  • Performance benchmarks for plan complexity

Appendix: Cross-System Type Definitions

The invoice generation walkthrough above relies on three interoperating type definitions: crm.account@1 (CRM domain, keyed on id and email), billing.customer@1 (Billing domain, keyed on id, email, and external_id), and an adapter that bridges them via the shared email anchor.

Each type definition specifies: required and optional properties, field tiers (core, useful, PII, index), identity keys for cross-system resolution, and JSON Schema compatibility for validation. Adapter contracts specify the source and target types, the anchor key used for identity continuity, transformation logic, confidence score, cost, and tier preservation rules.

The specific JSON schemas, tier assignments, and adapter transformation specifications are part of the operational implementation.


This document describes the conceptual architecture of Semio. Implementation details and optimization strategies are not included to protect operational IP while enabling conceptual understanding.

Author: Nicholas Wright

Title: Co-Founder & Chief Architect, DataGrout AI

Affiliation: DataGrout Labs

Version: 1.0

Published: January 2026

For questions or collaboration: labs@datagrout.ai