v2.b · iter-2b + VGS k=2 · RL-trained agent · KARL category · EARLY ACCESS · 2026

Scaling AI
collaboration.

Deep Horizon trains agents that reason over your team's knowledge —
handling the complex queries where frontier models break, surfacing
context before you ask, and giving every AI in your stack a shared understanding of your work.

/ Quality · vs. frontier on extraction
+6.4 pp
over Claude Sonnet on structured knowledge extraction
/ Efficiency · reasoning steps
8×
fewer steps than brute-force majority vote
/ Training · RL compounding
5.5×
8.2% → 45.5% across OAPL training iterations plus Value-Guided Search
/01
◆ Three pillars · what RL-trained agents unlock

Train the agent.
Not the prompt.

/ Thesis

Every AI memory product gives the model more context. Deep Horizon takes a different approach — we train the agent itself, using the same OAPL algorithm behind Databricks' KARL, until it handles the queries frontier models break on.

/01·B
◆ Industries · same agent · six deployments

One agent.
Six knowledge
frontiers.

Same RL-trained policy. Same API. Specialized to the way each industry actually stores, references, and recalls knowledge — and the exact queries each team gets stuck on.

/ 01 · classified knowledge under hard isolation

Defense
& Intel.

Operational decision recall across classification boundaries. Tool gating respects compartment walls — the agent never queries a corpus it wasn't admitted to. Self-hosted on your GPUs, your network, your authority to operate.

14 tool calls
Median to a grounded answer · vs. 110 on majority vote
0 hops
In-process · sub-ms tool dispatch · zero egress
◆ Example queries handled
◆ COA reconstruction · J3 / planning

"Reconstruct the COA review chain for OP-77 Phoenix — every dissent, every reason, every cited assessment."

◆ Analyst attribution · cyber

"Who on the intel team has reported on Helios-cluster activity in the last 90 days, and which assessments converged?"

◆ Cross-source synthesis · logistics

"What did the J4 conclude about the November supply gap, who supported that conclusion, and who pushed back?"

/ 02 · institutional memory across PIs, papers, grants

Research
Labs.

Institutional memory across PIs, papers, grants, and lab notebooks. Resolves authorship chains, cites the meeting where a hypothesis was first floated, surfaces who actually knows what across the lab.

50.2%
Profile completeness on 18-person research bench
67 leaves
Avg. structured profile slots populated per entity
◆ Example queries handled
◆ Profile extraction · PI onboarding

"Build me Aisha Patel's research profile — papers, collaborators, grant history, expertise, recent topics."

◆ Collaborator discovery · cross-lab

"Who in the lab has worked on attention mechanisms with Boris Katz, and what did they each conclude independently?"

◆ Hypothesis archaeology · internal

"When did we first propose the gradient-routing approach, who pushed back, and what was the resolution?"

/ 03 · deal & relationship intelligence

Financial
Services.

Deal & relationship intelligence over IB chat, CRM, and email. Answers "who at Lazard did we last talk to about MidCap Energy" with a citation chain — not a document hit.

$0.012
Per complex query · VGS k=2 · agent-callable
15–25×
Cheaper than frontier API per reasoning query
◆ Example queries handled
◆ Touchpoint reconstruction · coverage

"What was the last touchpoint with Goldman on the Helios financing, and what did they push back on?"

◆ Account history · relationship

"Who from the team has covered the MidCap Energy account in the last 18 months, and what's been said about credit?"

◆ Internal sentiment · risk

"Synthesize all internal commentary on Q3 default risk for the EMEA book — by sector, by analyst, by date."

/ 04 · clinical-trial memory · protocol recall

Life
Sciences.

Clinical-trial memory and protocol recall across CRO threads, IRB amendments, and investigator notes. Time-aware reasoning resolves "as of the v3 amendment" without re-indexing the corpus.

±3 pp
Parity gate · deployed API reproduces eval-harness accuracy within this margin
audit-ready
Every answer carries a citation chain to source
◆ Example queries handled
◆ Amendment trace · protocol

"What did the principal investigator decide about the v3 dosing change, and what was the IRB rationale?"

◆ Site signal · operations

"Which sites flagged the cohort B drop-out spike, when, and what was the resolution per site?"

◆ Decision archaeology · steering committee

"Reconstruct every protocol amendment to TRIAL-42, who drove each, and what evidence supported it."

/ 05 · case & precedent recall

Legal
& Compliance.

Case & precedent recall across discovery, depositions, and partner correspondence. The agent surfaces the analogous matter — not just the matching word — and grounds every answer in source.

12 → 1
Hours of associate review collapsed to one query
cited
Every claim traced back to the source document
◆ Example queries handled
◆ Precedent search · firm-wide

"Which of our matters since 2022 involve a similar consent-decree carve-out to Acme v. Olson?"

◆ Drafting history · deal

"Who drafted the indemnification clause in the Phoenix deal and what was the negotiation history with opposing counsel?"

◆ Correspondence pull · discovery

"Surface every email between us and opposing counsel about the privilege log in this matter — chronologically."

/ 06 · inter-agency knowledge · namespace isolation

Public
Sector.

Inter-agency knowledge with hard team isolation. Each agency gets its own memory namespace. API keys are team-scoped. No cross-jurisdiction data leakage by construction — enforced by the gating plugin, not policy.

0 egress
Self-hosted · per-agency namespace · sub-ms dispatch
N keys
Per-agency bearer-auth · no cross-namespace queries
◆ Example queries handled
◆ Cleared synthesis · DHS / DOJ

"What did DHS conclude about the December supply-chain incident, and who has clearance to see the underlying source?"

◆ Policy archaeology · program office

"Reconstruct the policy reasoning behind the 2025 grant program revision — every memo, every author, every dissent."

◆ Working group lookup · inter-agency

"Who across the inter-agency working group has worked on rural broadband, and what did each agency conclude?"

/02
◆ Problem · the queries frontier models break on

AI breaks on the
questions that
matter most.

Simple questions are solved. The ones teams actually ask are multi-hop, cross-person, temporally scattered — answers live across dozens of conversations, weeks apart, involving different people. No single document contains them. No retrieval system returns them.

◆ 01 · multi-hop reasoning

"What was the reasoning behind the auth rewrite, who drove it, and what compliance constraint forced the decision?"

◆ 02 · cross-person synthesis

"Build me a complete profile of this person — everything the team knows about their work, expertise, collaborators, and decisions."

◆ 03 · knowledge overlap

"Which team members have overlapping knowledge about distributed caching, and what did they each conclude independently?"

This isn't a retrieval problem. It's a reasoning problem. You need an agent that searches iteratively, cross-references evidence, follows chains across people and time, and synthesizes grounded answers. An agent that gets better through training — not prompt engineering.
/03
◆ Demo · zero-prompt context in action

The AI starts informed.
You start typing.

Hooked into Claude Code, Cursor, your IDE. Architecture decisions, who's working on what, prior solutions to similar problems — injected into the session before you finish your sentence.

~ engineering / auth-middleware · deep horizon attached
● CONNECTED · iter2b-vgs-k2
$ user> Fix the auth middleware timeout issue
◆ deep horizon — context injected automatically · 12 tool calls · 4.2 s
architecture: auth middleware rewrite in progress (compliance-driven)
team activity: [sarah] merged auth-token-rotation PR 3 days ago
related decision: rotating tokens chosen on 2026-03-05 · #arch-review
prior work: identical timeout fixed via connection pooling (PR #847)
▸ claude> Based on team context, the timeout is likely related to
Sarah's new token rotation. PR #847 had identical symptoms — the
fix was increasing pool size from 5 to 20 in the session store
config. Let me check if that applies here
No copy-paste. No re-explaining. No "let me give you some context." The AI already has context because Deep Horizon gave it context — automatically, from team knowledge, before the human typed a word.
/04
◆ Training loop · why RL, not prompts

The agent learns which
search strategies
actually work.

A prompted agent uses the same strategy every time. An RL-trained agent has learned — from thousands of rollout trajectories — when to cross-reference, when to go deeper, and when it has enough evidence to commit. This is learned behavior, not instructed behavior.

/ 01 · input
Corpus
Team conversations, decisions, docs, code reviews — the substrate.
/ 02 · generate
Question gen.
Model generates its own training questions from the corpus.
/ 03 · explore
Agent rollouts
Search · reason · answer. Thousands of trajectories per iteration.
/ 04 · score
Reward signal
Nugget-coverage scoring (Voorhees) — did the agent get it right?
/ 05 · update
OAPL update
Off-policy RL. The policy learns which strategies actually pay off.
◂ iterate · 5.5× compound improvement from base (no RL) to iter 2b + VGS
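The scoring step (/ 04) can be illustrated with a minimal sketch. It assumes each training question carries a list of gold "nuggets" and that coverage is judged by a case-insensitive substring match. The function name `nugget_coverage` and the matching rule are illustrative assumptions; the actual reward function is not described here and may well use a model-based judge.

```python
def nugget_coverage(answer: str, nuggets: list[str]) -> float:
    """Fraction of gold nuggets found in the answer, from 0.0 to 1.0.

    Assumption: substring matching stands in for whatever judge
    the real pipeline uses.
    """
    if not nuggets:
        return 0.0
    text = answer.casefold()
    hits = sum(1 for nugget in nuggets if nugget.casefold() in text)
    return hits / len(nuggets)


# Two of the three nuggets are present, so the reward is 2/3.
reward = nugget_coverage(
    "Sarah Chen led the auth rewrite for compliance reasons.",
    ["Sarah Chen", "compliance", "March 5"],
)
```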
◆ Why RL, not just prompting

A prompted agent runs the same heuristic every query. A trained agent has learned — over thousands of rollouts — which search strategies pay off for which question shapes. Learned behavior, not instructed behavior.

◆ Test-time compute · each step counts

Value-Guided Search picks the highest-scoring action per step: one smart trajectory. Parallel Thinking runs N trajectories and merges results: maximum coverage. The value model is trained on Deep Horizon rollouts, not on generic data.

/05
◆ Benchmarks · measured against frontier models · open eval harness

Numbers, not
narratives.

Evaluated on real team corpora and academic benchmarks. Every claimed improvement has a parity gate: the deployed API must reproduce eval-harness numbers within ±3 percentage points.
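As a rough illustration, the parity gate reduces to a tolerance check. The function name and signature below are assumptions made for this sketch, not part of the published harness.

```python
def passes_parity_gate(
    harness_acc_pct: float,
    api_acc_pct: float,
    tolerance_pp: float = 3.0,
) -> bool:
    """True when the deployed API is within tolerance_pp points of the harness."""
    return abs(api_acc_pct - harness_acc_pct) <= tolerance_pp


ok = passes_parity_gate(45.5, 43.0)      # 2.5 pp apart
drifted = passes_parity_gate(45.5, 41.0)  # 4.5 pp apart
```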

/A · 18-person bench · profile extraction

Structured-profile coverage

Given a person's name and a team memory corpus, extract a complete structured profile across identity, professional background, education, relationships, publications, and all known facts.

Model | Accuracy | Cost / Profile
Deep Horizon iter-2b + PT N=10 | 50.2% | $0.15
Claude Sonnet 4.6 | 43.8% | $1.14
GLM 4.5 Air (base, no RL) | 8.2% | $0.04

+6.4 pp over Sonnet. 7.6× cheaper. 5× faster. Method: 10 independent agent rollouts with per-leaf union aggregation — a novel mechanical merge across rollouts for maximum coverage with no hallucination risk.

/B · OAPL 101-question bench · factoid QA

Short-answer factual recall

Answer short factual questions (1–5 words) about people, events, and relationships in the team corpus. Tool-call efficiency matters as much as accuracy.

Model | Accuracy | Tool calls
Deep Horizon iter-2b + VGS k=2 | 45.5% | ~14
DH iter-2b + Majority Vote N=10 | 39.6% | ~110
GLM 4.5 Air (base, no RL) | 23.0% | ~12

+5.9 pp over majority vote with 8× fewer tool calls. Value-Guided Search uses a trained value model (Qwen3-4B) to pick the best action at each step — smarter, not just more compute.

/05·B
◆ Training trajectory · 5.5× improvement through RL

From 8% to 45.5%.

[Bar chart: accuracy on the OAPL 101-Q factoid QA bench by training iteration, axis 0% to 50%; values are listed in the table below]
Iteration | Acc. | What changed
Base · no RL | 8.2% | GLM 4.5 Air out-of-the-box
Iter 1 | 23.0% | First OAPL training run (reward-function fix was critical)
Iter 1.1 | 28.7% | Added extraction questions to training mix
Iter 2b | 33.7% | Multi-benchmark training (HotpotQA, MuSiQue, QAMPARI, FinanceBench)
Iter 2b + VGS | 45.5% | Test-time compute · Value-Guided Search
/06
◆ Test-time compute · dispatched per task shape

Two inference
strategies. Pick
the right one.

The dispatcher inspects the request, picks a strategy, and the agent inherits the right inference budget. Parallel Thinking buys breadth. Value-Guided Search buys depth.

Best for · profile extraction · schema-fill tasks

Parallel Thinking.

[Diagram: QUERY → N=10 parallel rollouts → ∪ union]

How it works. Spawn N independent agent rollouts in parallel. Each searches the corpus independently and produces a candidate answer. For structured profiles, aggregate with per-leaf union (our novel aggregator); for short answers, an LLM aggregator.

Why it works. Different rollouts find different facts. Union aggregation combines coverage from all rollouts without hallucination — a fact only ships if a rollout cited it.
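A per-leaf union can be sketched as follows. This is an illustration under stated assumptions, not the production aggregator: each rollout is assumed to return a flat mapping from leaf path to `(value, source_id)` pairs, and the source ids such as `msg-101` are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# (value, cited source id or None); the shape is an assumption for this sketch.
Leaf = tuple[str, Optional[str]]


def per_leaf_union(rollouts: list[dict[str, list[Leaf]]]) -> dict[str, list[str]]:
    """Union cited values per leaf across rollouts, preserving first-seen order."""
    merged: dict[str, list[str]] = defaultdict(list)
    for profile in rollouts:
        for path, leaves in profile.items():
            for value, source in leaves:
                if source is None:
                    continue  # uncited values never ship
                if value not in merged[path]:
                    merged[path].append(value)
    return dict(merged)


rollout_a = {
    "owns": [("auth middleware", "msg-101")],
    "role": [("Senior Backend Engineer", "msg-007")],
}
rollout_b = {"owns": [("session mgmt", "msg-230"), ("billing", None)]}
merged = per_leaf_union([rollout_a, rollout_b])
```

Here `merged["owns"]` combines the cited values from both rollouts, while the uncited "billing" entry is dropped.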

50.2% on profile extraction · +6.4 pp over Sonnet · $0.15 / profile
Best for · factoid questions · short-answer retrieval

Value-Guided Search.

[Diagram: QUERY → k=2 candidates, value-scored at each step (e.g. v=0.12, v=0.81 ✓, v=0.94 ✓, v=0.31) → answer]

How it works. At each step, sample k candidate actions from the policy. Score each candidate with a trained value model (Qwen3-4B fine-tuned on Deep Horizon rollouts). Execute the highest-scoring action. Repeat until the agent commits.

Why it works. Instead of more rollouts (breadth), VGS makes each rollout smarter (depth). The value model learns which search queries and reasoning paths lead to correct answers.
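The loop described above can be sketched in a few lines. The `propose` and `score` callables stand in for the policy and the value model; their signatures, the `answer:` prefix convention, and the toy stubs are all assumptions made for illustration.

```python
from typing import Callable


def value_guided_search(
    question: str,
    propose: Callable[[list[str], int], list[str]],
    score: Callable[[list[str], str], float],
    k: int = 2,
    max_steps: int = 20,
) -> list[str]:
    """At each step, sample k candidate actions and execute the best-scored one."""
    trajectory = [question]
    for _ in range(max_steps):
        candidates = propose(trajectory, k)
        best = max(candidates, key=lambda action: score(trajectory, action))
        trajectory.append(best)
        if best.startswith("answer:"):  # the agent commits
            break
    return trajectory


def toy_propose(trajectory: list[str], k: int) -> list[str]:
    # Toy policy: always offers one search action and one answer action.
    step = len(trajectory)
    return [f"search: query {step}", "answer: Sarah Chen"][:k]


def toy_score(trajectory: list[str], action: str) -> float:
    # Toy value model: prefers two searches, then prefers answering.
    wants_answer = len(trajectory) >= 3
    return 1.0 if action.startswith("answer:") == wants_answer else 0.0


path = value_guided_search("Who owns auth?", toy_propose, toy_score)
```

With these stubs the trajectory contains two search steps followed by the committed answer.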

45.5% on factoid QA · +5.9 pp over majority vote · 8× fewer tool calls
/07
◆ Architecture · the training + inference substrate

Same agent.
Trained, then
inferred.

/ layer 02
Environment
lifecycle plugins · KARL-faithful

◆ KARL-faithful lifecycle plugins

Compression, step budgeting, and tool gating are composable interceptors — not hardcoded logic. Add new plugins without touching the agent loop.
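One way to picture composable interceptors is a list of plugins that each get a chance to veto a tool call before it runs. The class names, the `before_call` hook, and the `dispatch` helper below are hypothetical and do not reflect the actual harness interface.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class ToolCall:
    tool: str
    corpus: str


class PluginRejected(Exception):
    """Raised by a plugin to veto a tool call."""


class Plugin(Protocol):
    def before_call(self, call: ToolCall) -> None: ...


@dataclass
class ToolGate:
    allowed_corpora: frozenset

    def before_call(self, call: ToolCall) -> None:
        if call.corpus not in self.allowed_corpora:
            raise PluginRejected(f"corpus {call.corpus!r} is not admitted")


@dataclass
class StepBudget:
    max_steps: int
    used: int = 0

    def before_call(self, call: ToolCall) -> None:
        if self.used >= self.max_steps:
            raise PluginRejected("step budget exhausted")
        self.used += 1


def dispatch(call: ToolCall, plugins: list, run_tool: Callable[[ToolCall], str]) -> str:
    """Run every plugin in order, then the tool; the agent loop stays untouched."""
    for plugin in plugins:
        plugin.before_call(call)
    return run_tool(call)


# Gate first, so a rejected call does not consume budget.
plugins = [ToolGate(frozenset({"engineering"})), StepBudget(max_steps=2)]
result = dispatch(
    ToolCall("search", "engineering"),
    plugins,
    lambda c: f"ran {c.tool} on {c.corpus}",
)
```

Adding a new behavior in this sketch means appending another object with a `before_call` method to the list.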

◆ In-process retrieval

The agent's search tool runs in-process. Sub-millisecond tool dispatch. No HTTP hops between agent and corpus.

◆ Open-weight policy

GLM 4.5 Air is fully open. No API dependency for the core reasoning model. Deploy on your infrastructure, your GPUs, your classification boundary.

◆ Trained value model

Qwen3-4B fine-tuned on Deep Horizon rollout data. Scores search trajectories to guide the agent at inference time. ~$0.0001 per call.

/08
◆ Lineage · KARL (Databricks, 2025)

Same research lineage.
Different application.

KARL proved that reinforcement learning can train open-weight models into knowledge agents that beat frontier models. They trained on academic benchmarks. Deep Horizon takes the same algorithm, the same harness, the same test-time compute — and applies it to team collaboration.

Aspect | KARL · Databricks | Deep Horizon
Training algorithm | OAPL | OAPL · same
Architecture | aroll harness + lifecycle plugins | aroll harness + lifecycle plugins · same framework
Test-time compute | Parallel Thinking + Value-Guided Search | PT + VGS · same strategies
Application domain | Academic QA benchmarks · HotpotQA, MuSiQue, QAMPARI | Team knowledge · people, decisions, relationships, expertise
Novel contribution | Proved RL works for knowledge agents | Per-leaf union for structured extraction · promptless recall · agent-to-agent API
Target user | Researchers | Engineering teams using AI daily
Deep Horizon is KARL for teams. Same proven RL training pipeline. Applied to the queries your team actually asks. Deployed as an API any agent can call — the addition KARL doesn't address.
/09
◆ Agent-to-agent · the collaboration layer

The next step is
AI-to-AI.
Your stack is ready.

Human-AI collaboration is solved. The next step is AI-AI. Every agent in your stack needs the same team context. Same multi-hop reasoning. Same API. Structured JSON in, structured JSON out.

◆ code review
needs module ownership and the constraints the owner set.
◆ planning
needs architectural decisions and the rationale behind them.
◆ incident response
needs the history of similar issues and how they were resolved.
◆ onboarding
needs months of team context synthesized for a new hire.
◆ release notes
needs cross-PR narrative — who shipped what, why, what landed together.
// any agent in your stack can call this
# Agent-to-agent knowledge query
curl -X POST https://api.deephorizon.dev/v1/agent/search \
  -H "Authorization: Bearer $AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who owns the payment processing module and what were the last 3 architectural decisions affecting it?",
    "team_id": "engineering",
    "model": "iter2b-vgs-k2",
    "caller": "code-review-agent"
  }'

# Same API. Same quality. Agent or human.
{
  "owner": "Sarah Chen",
  "decisions": […3 cited entries…],
  "n_tool_calls": 14,
  "cost_usd": 0.014
}
This is what "scaling AI collaboration" means. Not just human + AI. Human + AI + AI + AI — all reasoning over the same team knowledge, all getting smarter as the underlying agent improves through training.
/10
◆ Economics · frontier reasoning, open-model prices

Predictable
by the month.
Not the token.

The policy model (GLM 4.5 Air) runs on Modal GPUs at ~$0.0005 per call. The value model (Qwen3-4B) costs ~$0.0001 per call. All orchestration runs on a $30/month CPU server. No per-token API pricing creeping into your run rate.

Operation | Deep Horizon | Claude Sonnet | Savings
Profile extraction | $0.15 / profile | $1.14 / profile | 7.6× cheaper
Factoid search · VGS k=2 | $0.012 / query | $0.18 – 0.30 / query | 15 – 25× cheaper
Factoid search · PT N=10 | $0.055 / query | $0.18 – 0.30 / query | 3 – 5× cheaper
Always-on orchestration | ~$30 / month | per-call pricing | predictable
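The savings column follows from dividing the Claude Sonnet cost by the Deep Horizon cost. A short check, using only the per-unit prices listed in the table (the helper name is illustrative):

```python
def savings_ratio(frontier_cost: float, deep_horizon_cost: float) -> float:
    """How many times cheaper the Deep Horizon price is."""
    return frontier_cost / deep_horizon_cost


profile = savings_ratio(1.14, 0.15)    # profile extraction
vgs_low = savings_ratio(0.18, 0.012)   # VGS k=2, low end of the Sonnet range
vgs_high = savings_ratio(0.30, 0.012)  # VGS k=2, high end
pt_low = savings_ratio(0.18, 0.055)    # PT N=10, low end
pt_high = savings_ratio(0.30, 0.055)   # PT N=10, high end

print(round(profile, 1), round(vgs_low), round(vgs_high), round(pt_low), round(pt_high))
```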
/11
◆ Interface · two endpoints · human or agent caller

Two endpoints.
That's it.

RESTful API. Bearer token. Drop it into any workflow. Model selection is a single field — the dispatcher does the rest.

/v1/agent/search · complex reasoning query
# Complex reasoning query
curl -X POST https://api.deephorizon.dev/v1/agent/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the reasoning behind the auth rewrite and who drove it?",
    "team_id": "engineering"
  }'

# Response
{
  "answer": "The auth middleware rewrite was driven by legal/compliance requirements around session token storage. Sarah Chen led the effort, decision finalized March 5. Key constraint: tokens must rotate every 24h...",
  "model_used": "iter2b-vgs-k2",
  "n_tool_calls": 12,
  "elapsed_seconds": 68.4,
  "cost_usd": 0.014
}
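For callers that are not shell scripts, the same request can be built with the Python standard library. This is a sketch that mirrors the curl example; `build_search_request` is a hypothetical helper, `YOUR_API_KEY` is a placeholder, and the JSON content type is an assumption based on the JSON body shown above. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.deephorizon.dev"  # base URL as shown in the curl examples


def build_search_request(api_key: str, query: str, team_id: str) -> urllib.request.Request:
    """Build a POST request for /v1/agent/search without sending it."""
    body = json.dumps({"query": query, "team_id": team_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/agent/search",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request(
    "YOUR_API_KEY",
    "What was the reasoning behind the auth rewrite and who drove it?",
    "engineering",
)
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```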
/v1/agent/extract · structured knowledge
# Structured knowledge extraction
curl -X POST https://api.deephorizon.dev/v1/agent/extract \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "target_entity": "Sarah Chen",
    "team_id": "engineering"
  }'

# Response
{
  "profile": {
    "name": "Sarah Chen",
    "role": "Senior Backend Engineer",
    "owns": ["auth middleware", "session mgmt"],
    "recent_decisions": ["token rotation"],
    "collaborators": ["Boris K.", "Alex M."],
    "expertise": ["security", "distributed sys"]
  },
  "n_leaves_populated": 67,
  "cost_usd": 0.15
}
/11·B
◆ Available models
Model | Best for | Method | Default for
iter2b-vgs-k2 | Factoid questions | Value-Guided Search | /search
iter2b-pt-n10 | Profile extraction | Parallel Thinking + per-leaf union | /extract
iter2b-single | Quick baseline | Single rollout, no TTC | -
claude-sonnet-4-6 | Fallback | Frontier API path | -
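The defaults in this table suggest a simple selection rule: use the requested model if one is given, otherwise fall back to the endpoint's default. The sketch below is an assumption about how such a rule could look; `pick_model` is hypothetical and the real dispatcher may inspect more of the request.

```python
from typing import Optional

# Model names and per-endpoint defaults are taken from the table above.
KNOWN_MODELS = {"iter2b-vgs-k2", "iter2b-pt-n10", "iter2b-single", "claude-sonnet-4-6"}
DEFAULT_MODEL = {
    "/v1/agent/search": "iter2b-vgs-k2",
    "/v1/agent/extract": "iter2b-pt-n10",
}


def pick_model(endpoint: str, requested: Optional[str] = None) -> str:
    """Return the requested model if valid, else the endpoint's default."""
    if requested is not None:
        if requested not in KNOWN_MODELS:
            raise ValueError(f"unknown model: {requested}")
        return requested
    return DEFAULT_MODEL[endpoint]


search_default = pick_model("/v1/agent/search")
override = pick_model("/v1/agent/extract", "iter2b-single")
```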
/12
◆ Field comparison · not a memory store · a trained agent

How Deep Horizon
compares.

Capability | Deep Horizon | KARL · Databricks | Mem0 / Zep | Frontier APIs
RL-trained reasoning agent | ● yes | ● yes | ○ no | ○ no
Test-time compute · PT + VGS | ● yes | ● yes | ○ no | ○ no
Beats frontier on extraction | +6.4 pp over Sonnet | +pp over GPT-4 (reported) | n/a | baseline
Promptless context injection | ● yes | ○ no | ○ no | ○ no
Agent-to-agent knowledge API | ● yes | ○ no | key-value store | n/a
Structured profile extraction | 50.2% accuracy | not addressed | ○ no | prompt-only
Open-weight policy model | ● yes | ● yes | n/a | ○ no
Application domain | team collaboration | academic benchmarks | memory storage | general
Pricing · per complex query | $0.012 | research only | SaaS tiers | $0.18+
One row that matters: Mem0 and Zep store memories. KARL and Deep Horizon train agents to reason over them. We're in the KARL column — with the addition of promptless context and agent-to-agent collaboration KARL doesn't address.
/13
◆ Early access · 2026 · the collaboration layer for your AI stack

Your AI stack is missing
a collaboration layer.

Every agent in your stack starts from zero. Every session forgets.
Every complex question gets a shallow answer. Deep Horizon fixes this —
with RL-trained agents that reason over your team's knowledge, get smarter through training,
and serve every AI tool you use through a single API.