Expression Layer for Data Agents

DATA EXPRESSIONS FOR MULTI-AGENT WORK

Agent work is monolithic. A session produces a result, but nothing inside it can be verified, reused, or built on. A human reviews the output but can’t reproduce it. The problem isn’t better tools — it’s a missing primitive.

EXPRESSIONS AS EVIDENCE AND MATERIAL

An expression is the unit of work: Git-distributed, Arrow-native, composable, with lineage by default. A declarative object that carries its own identity, its own history, its own cache.

Every expression has a dual nature. Looking backward, it’s evidence: what was computed, from what inputs, with what hash. Same expression, same hash: agents that arrive at the same expression over the same inputs converge automatically, without coordination.

Looking forward, it’s material: a composable part that snaps into larger computations without knowing who made it or when. The catalog isn’t a document store agents read. It’s a surface agents build on.

Same object. Two faces. That’s the primitive.
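
A minimal sketch of "same expression, same hash", assuming an expression can be serialized to a canonical form. The dict layout and the `expression_hash` helper are hypothetical illustrations, not Xorq's actual format or API.

```python
import hashlib
import json


def expression_hash(spec: dict) -> str:
    """Hash a declarative spec via a canonical (key-sorted) JSON encoding."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two agents describe the same computation independently (hypothetical spec layout).
agent_a = {"op": "group_by", "source": "orders", "keys": ["channel"], "agg": "churn_rate"}
agent_b = {"agg": "churn_rate", "keys": ["channel"], "source": "orders", "op": "group_by"}

# A different computation: grouped by region instead of channel.
agent_c = {"op": "group_by", "source": "orders", "keys": ["region"], "agg": "churn_rate"}

print(expression_hash(agent_a) == expression_hash(agent_b))  # same spec, same hash
print(expression_hash(agent_a) == expression_hash(agent_c))  # different spec, different hash
```

Because the encoding is canonical, the order in which each agent wrote its fields does not matter. Only the content of the specification does.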

SEARCH, COMPOSE, VERIFY

  • xorq catalog list — search the catalog. Find what exists before computing anything new.
  • xorq run-unbound — execute an expression with inputs resolved at runtime.
  • xorq lineage — trace any result to its source. Verify provenance through the graph.
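
+-------------+-------------------------+-------------+
| Layer       | Role                    | Protocol    |
|-------------|-------------------------|-------------|
| Expression  | Specification           | Ibis / xorq |
| Catalog     | Identity + lineage      | Git         |
| Cache       | Input-addressed storage | Parquet     |
| Execution   | Pluggable engines       | Arrow       |
+-------------+-------------------------+-------------+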
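
The "trace any result to its source" step can be pictured as a walk over a dependency graph. The sketch below is an assumption about the shape of that graph, not the output or internals of `xorq lineage`; the entry names and the `trace_sources` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical lineage graph: each entry lists the entries it was built from.
LINEAGE: Dict[str, List[str]] = {
    "churn_by_channel": ["churn_flags", "channels"],
    "churn_flags": ["orders"],
    "channels": [],
    "orders": [],
}


def trace_sources(entry: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return the root inputs (entries with no parents) that `entry` depends on."""
    sources, seen, stack = [], set(), [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = graph[node]
        if not parents:
            sources.append(node)
        stack.extend(parents)
    return sorted(sources)


print(trace_sources("churn_by_channel", LINEAGE))
```

Verifying provenance then amounts to checking that every path from a result ends at a source you trust.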

THE AGENT LOOP

Agents share context between sessions. They can’t share data. The catalog does it for them.

Agent receives a question. Searches the catalog for existing entries. Finds partial coverage — some expressions already computed, cached, verified. Composes a new expression from existing parts plus new logic. Executes. The result becomes a new entry.

Second session. Same question. Cache hit. No recomputation. The work compounded.

Different agent. Related question. Finds the entry, extends it. Doesn’t need to know who computed it or when. The hash is the handshake.

Third case. Upstream data changed. Same expression, new input fingerprint. The cache misses, the engine re-executes, and a fresh result is stored. Lineage stays intact: same recipe, different run.
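
The three cases above (first run, repeat question, changed upstream data) can be sketched with a toy cache. Everything here is hypothetical: the `Catalog` class, the fingerprint strings, and the churn values are placeholders chosen for illustration, not Xorq's API or real results.

```python
import hashlib
from typing import Callable, Dict


class Catalog:
    """Toy input-addressed cache: key = hash(expression text + input fingerprint)."""

    def __init__(self) -> None:
        self.entries: Dict[str, object] = {}
        self.runs = 0  # counts real executions, for illustration

    def execute(self, expression: str, fingerprint: str, compute: Callable[[], object]) -> object:
        key = hashlib.sha256(f"{expression}|{fingerprint}".encode()).hexdigest()
        if key not in self.entries:      # cache miss: run and store
            self.entries[key] = compute()
            self.runs += 1
        return self.entries[key]         # cache hit: no recomputation


catalog = Catalog()
expr = "churn by channel"

catalog.execute(expr, "orders@v1", lambda: {"web": 0.12})  # first session: miss
catalog.execute(expr, "orders@v1", lambda: {"web": 0.12})  # second session: hit
catalog.execute(expr, "orders@v2", lambda: {"web": 0.15})  # upstream changed: miss
print(catalog.runs, len(catalog.entries))
```

Three requests produce two executions and two entries: the repeated question costs nothing, and the changed input gets its own entry under the same expression.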

%%{init: {'theme': 'base', 'themeVariables': {'actorBkg': '#0a2a2e', 'actorBorder': '#C1F0FF', 'actorTextColor': '#C1F0FF', 'actorLineColor': '#1a4a50', 'signalColor': '#C1F0FF', 'signalTextColor': '#C1F0FF', 'noteBkgColor': '#0f3538', 'noteBorderColor': '#1a4a50', 'noteTextColor': '#8fd4e8', 'activationBkgColor': '#0f3538', 'activationBorderColor': '#1a4a50', 'labelBoxBkgColor': '#05181A', 'labelBoxBorderColor': '#1a4a50', 'labelTextColor': '#5ab0c8', 'loopTextColor': '#5ab0c8', 'background': '#05181A', 'mainBkg': '#0a2a2e', 'lineColor': '#1a4a50', 'textColor': '#C1F0FF', 'primaryColor': '#0a2a2e', 'primaryBorderColor': '#C1F0FF', 'primaryTextColor': '#C1F0FF', 'secondaryColor': '#0f3538', 'tertiaryColor': '#0f3538', 'fontFamily': 'FK Grotesk Mono, monospace', 'fontSize': '13px'}}}%%
sequenceDiagram
    participant User
    participant Agent
    participant Catalog
    participant Engine

    User->>Agent: churn by channel?
    activate Agent
    Agent->>Catalog: search / compose / lineage
    Agent->>Catalog: execute
    Catalog->>Engine: cache miss → run
    activate Engine
    Engine-->>Agent: result
    deactivate Engine
    Agent->>Catalog: add
    Agent->>User: result
    deactivate Agent

    rect rgba(15, 53, 56, 0.3)
        Note over Agent,Catalog: NEW SESSION
        User->>Agent: churn by channel?
        activate Agent
        Agent->>Catalog: execute
        Catalog-->>Agent: cache hit
        Agent->>User: result
        deactivate Agent
    end

Agent session lifecycle — search, compose, lineage, add

HOW AGENTS REACH THE CATALOG

  • MCP Server — the catalog appears as a tool in Claude Desktop or any MCP-native framework. Agents search, compose, and execute expressions without SDK integration.
  • Hooks — lifecycle callbacks (PreToolUse, PostToolUse, ToolFailure) that validate inputs and verify outputs at each step. Guardrails set by humans, enforced on every session.
  • Plugins — slash commands inside agent UIs. Type /xorq to search the catalog or run an expression inline.
  • Skills — reusable capabilities that coding agents like Claude Code invoke directly to read from or write to the catalog.
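
To make the hooks idea concrete, here is a rough sketch of lifecycle callbacks wrapped around a tool call. The event names come from the list above; the registry, the `on` decorator, and `run_tool` are hypothetical and do not reflect how any particular agent framework or Xorq registers hooks.

```python
from typing import Callable, Dict, List

# Event names come from the text; the registry itself is a hypothetical sketch.
HOOKS: Dict[str, List[Callable[[dict], None]]] = {
    "PreToolUse": [], "PostToolUse": [], "ToolFailure": [],
}
log: List[str] = []


def on(event: str):
    """Decorator that registers a callback for a lifecycle event."""
    def register(fn):
        HOOKS[event].append(fn)
        return fn
    return register


@on("PreToolUse")
def require_expression(payload: dict) -> None:
    # Validate inputs before the tool runs.
    if not payload.get("expression"):
        raise ValueError("missing expression")


@on("PostToolUse")
def record_success(payload: dict) -> None:
    log.append(f"ok:{payload['expression']}")


@on("ToolFailure")
def record_failure(payload: dict) -> None:
    log.append(f"failed:{payload['error']}")


def run_tool(payload: dict, tool: Callable[[dict], object]) -> object:
    """Run `tool` with pre/post hooks; route any error to ToolFailure hooks."""
    try:
        for hook in HOOKS["PreToolUse"]:
            hook(payload)
        result = tool(payload)
        for hook in HOOKS["PostToolUse"]:
            hook(payload)
        return result
    except Exception as exc:
        for hook in HOOKS["ToolFailure"]:
            hook({**payload, "error": str(exc)})
        return None


run_tool({"expression": "churn_by_channel"}, lambda p: "result")
run_tool({}, lambda p: "result")  # rejected by the PreToolUse guardrail
print(log)
```

The guardrail is defined once and applied to every call, which is the point of enforcing it at the lifecycle level rather than inside each agent.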

INTEGRATIONS

Xorq sits between agents and the infrastructure you already run. No migrations.

Code Agents

  • Claude Code, Codex, Cortex Code (Snowflake)

Data Infrastructure

  • Snowflake, Databricks, S3

Libraries and Frameworks

  • Scikit-learn, Feast, LangChain

PRICING

Pay for catalog storage and bytes transferred over Arrow Flight. Compute stays pluggable.

CATALOG STORAGE

+----------------------------------+-------------------+
| Tier                             | Price / Month     |
|----------------------------------|-------------------|
| Starter (up to 10 GB namespaces) | $99               |
| Team (up to 25 GB)               | $299              |
| Enterprise (unlimited)           | Custom            |
+----------------------------------+-------------------+

BYTES

+----------------------------------+-------------------+
| Endpoint Type                    | Price / GB        |
|----------------------------------|-------------------|
| Flight DoGet (read)              | $0.10             |
| Flight DoPut (write)             | $0.05             |
| Flight DoExchange (transform)    | $0.01             |
+----------------------------------+-------------------+
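
As a worked example, a monthly bill can be estimated from the two tables above. This assumes the storage price and the per-GB transfer charges simply add up; actual billing rules are not stated here, and the usage figures are made-up inputs.

```python
# Prices copied from the tables above; the usage figures are made-up inputs.
STORAGE_PER_MONTH = {"Starter": 99.00, "Team": 299.00}
PRICE_PER_GB = {"DoGet": 0.10, "DoPut": 0.05, "DoExchange": 0.01}


def monthly_cost(tier: str, gb_by_endpoint: dict) -> float:
    """Storage tier price plus per-GB Arrow Flight transfer charges."""
    transfer = sum(PRICE_PER_GB[name] * gb for name, gb in gb_by_endpoint.items())
    return round(STORAGE_PER_MONTH[tier] + transfer, 2)


# Example: Starter tier, 200 GB read, 50 GB written, 1,000 GB transformed.
print(monthly_cost("Starter", {"DoGet": 200, "DoPut": 50, "DoExchange": 1000}))
```

Under these assumptions the transfer portion is $20.00 + $2.50 + $10.00 = $32.50, for a total of $131.50.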

INSTALL XORQ

pip install xorq
xorq init

FAQS

What is Xorq?

» You write a declarative expression; Xorq saves it as an immutable catalog entry (expression + metadata + cached results) that can be executed, diffed, shared, and served.

How is this different from a data catalog?

» Traditional catalogs document tables for humans — name-addressed, passively consumed. Xorq catalogs executable expressions for agents — input-addressed, actively composed. A traditional catalog answers “what tables exist?” Xorq answers “has this exact computation been done before?” That’s the difference between documentation and coordination.

Do I need to migrate to Xorq?

» No. Xorq connects directly to your existing infrastructure (Snowflake, Databricks, S3). No migrations required.

What does “input-addressed” caching mean?

» The identity of a cached result is determined by hashing the expression and all of its inputs, not the output content. If the inputs haven’t changed, the result is reused without re-execution. This is input-addressed, not content-addressed: trust comes from knowing the recipe is identical, not from inspecting the output. Traditional caching asks “is this expired?” Input-addressed caching asks “are the inputs the same?”

What does it mean to “serve an expression”?

» Serving exposes a compiled expression as a remote endpoint over Arrow Flight so other services or agents can send inputs and receive results without re-implementing pipeline logic.

How does Xorq compare to dbt Fusion?

» dbt Fusion gives you a faster, Rust-powered dbt with SQL-aware validation. Xorq gives you an expression graph that spans languages and engines, with a catalog that versions, caches, and governs the full pipeline. If your world is SQL models, Fusion is a meaningful upgrade. If your world is SQL + Python + ML across multiple engines, Xorq is built for that from the ground up.

What if I need imperative code or custom logic?

» Expression-first does not eliminate flexibility. Escape hatches exist via UDFs and Arrow-based interfaces — opaque stages can conform to the same contract. Declarative at the top level, imperative underneath when necessary. Like unsafe in Rust — controlled, explicit, bounded.

What about non-deterministic steps like model training?

» The expression captures the full specification including the seed. If nothing changes, the cache avoids re-execution entirely. If you do re-run, reproducibility depends on the algorithm — that’s a property of the math, not the system. Xorq makes the specification explicit so you can tell the difference.

What happens when upstream data changes between sessions?

» You choose the invalidation strategy: modification-time (re-run if the source has been touched) or snapshot with TTL (trust the cache for a window, then re-validate). The expression stays the same. The cache policy is separate. Staleness tolerance is a business decision, not something the system should hide from you.
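
One way to picture the two strategies is as two ways of deriving the input token that feeds the cache key. This is a sketch under that assumption only; `mtime_token` and `ttl_token` are hypothetical helpers, not Xorq functions.

```python
import os
import tempfile
import time


def mtime_token(path: str) -> str:
    """Modification-time strategy: the token changes whenever the file is touched."""
    return f"mtime:{os.stat(path).st_mtime_ns}"


def ttl_token(now: float, ttl_seconds: int) -> str:
    """Snapshot-with-TTL strategy: the token is stable within each TTL window."""
    return f"window:{int(now // ttl_seconds)}"


# Modification-time: touching the source changes the token, forcing a re-run.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
before = mtime_token(path)
os.utime(path, ns=(0, 1_000_000_000))  # simulate a write by setting an explicit mtime
after = mtime_token(path)
os.remove(path)
print(before != after)

# Snapshot with TTL: same token inside a one-hour window, new token after it.
print(ttl_token(1000.0, 3600) == ttl_token(3599.0, 3600))
print(ttl_token(1000.0, 3600) == ttl_token(3600.0, 3600))

# In practice `now` would be the current time.
current = ttl_token(time.time(), 3600)
```

In both cases the expression is untouched. Only the token that stands in for "the inputs" changes, which is why the cache policy can be chosen separately.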

How does Xorq handle merge conflicts?

» It doesn’t, because there aren’t any. Same expression, same hash, automatic dedup. Different expression, different hash, no collision. In code, parallel work creates merge conflicts. In an input-addressed catalog, parallel work either converges or coexists. No locking, no conflicts, no hand-written deduplication logic.
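
A toy illustration of "converges or coexists", assuming catalog entries are keyed purely by a hash of the expression. The `key` helper and the two catalog dicts are hypothetical, and the sketch ignores input fingerprints and the (negligible) chance of hash collisions.

```python
import hashlib


def key(expression: str) -> str:
    """Short hash used as the catalog key in this toy example."""
    return hashlib.sha256(expression.encode()).hexdigest()[:12]


# Two agents work in parallel on separate (hypothetical) catalog copies.
catalog_a = {key(e): e for e in ["churn by channel", "revenue by month"]}
catalog_b = {key(e): e for e in ["churn by channel", "churn by region"]}

# Merging is a plain dict union: equal keys always carry equal values,
# so overlapping entries converge and distinct entries coexist.
merged = {**catalog_a, **catalog_b}
print(len(merged))
```

Four entries written across two catalogs collapse to three after the merge, with nothing to resolve by hand.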

WHAT COMPOUNDS

Agents are ephemeral. Sessions end. Models get swapped. Prompts change.

What compounds is the catalog — each entry verified evidence that the next agent builds on. Not accumulated output. Composable, reproducible work.

Not the agents. The work.

REQUEST A DEMO

hello@xorq.dev