Deep Horizon trains agents that reason over your team's knowledge —
handling the complex queries where frontier models break, surfacing
context before you ask, and giving every AI in your stack a shared understanding of your work.
Every AI memory product gives the model more context. Deep Horizon takes a different approach — we train the agent itself, using the same OAPL algorithm behind Databricks' KARL, until it handles the queries frontier models break on.
Search. Cross-reference. Synthesize. The agent issues a search, reads the results, decides what to search next, follows chains across people and time, and commits to an answer only when grounded. This loop is RL-trained — learned, not instructed.
Your AI starts informed, not blank. Deep Horizon hooks into Claude Code, Cursor, your IDE — and injects relevant team knowledge before you type. Architecture decisions, prior solutions, who's working on what. No copy-paste. No "let me give you some context."
One substrate. Every AI in your stack. Code review needs module ownership. Planning needs architectural rationale. Onboarding needs months of team history. They all call the same endpoint — and finally reason over the same team context humans do.
Same RL-trained policy. Same API. Specialized to the way each industry actually stores, references, and recalls knowledge — and the exact queries each team gets stuck on.
Operational decision recall across classification boundaries. Tool gating respects compartment walls — the agent never queries a corpus it wasn't admitted to. Self-hosted on your GPUs, your network, your authority to operate.
"Reconstruct the COA review chain for OP-77 Phoenix — every dissent, every reason, every cited assessment."
"Who on the intel team has reported on Helios-cluster activity in the last 90 days, and which assessments converged?"
"What did the J4 conclude about the November supply gap, who supported that conclusion, and who pushed back?"
Institutional memory across PIs, papers, grants, and lab notebooks. Resolves authorship chains, cites the meeting where a hypothesis was first floated, surfaces who actually knows what across the lab.
"Build me Aisha Patel's research profile — papers, collaborators, grant history, expertise, recent topics."
"Who in the lab has worked on attention mechanisms with Boris Katz, and what did they each conclude independently?"
"When did we first propose the gradient-routing approach, who pushed back, and what was the resolution?"
Deal & relationship intelligence over IB chat, CRM, and email. Answers "who at Lazard did we last talk to about MidCap Energy" with a citation chain — not a document hit.
"What was the last touchpoint with Goldman on the Helios financing, and what did they push back on?"
"Who from the team has covered the MidCap Energy account in the last 18 months, and what's been said about credit?"
"Synthesize all internal commentary on Q3 default risk for the EMEA book — by sector, by analyst, by date."
Clinical-trial memory and protocol recall across CRO threads, IRB amendments, and investigator notes. Time-aware reasoning resolves "as of the v3 amendment" without re-indexing the corpus.
"What did the principal investigator decide about the v3 dosing change, and what was the IRB rationale?"
"Which sites flagged the cohort B drop-out spike, when, and what was the resolution per site?"
"Reconstruct every protocol amendment to TRIAL-42, who drove each, and what evidence supported it."
Case & precedent recall across discovery, depositions, and partner correspondence. The agent surfaces the analogous matter — not just the matching word — and grounds every answer in source.
"Which of our matters since 2022 involve a similar consent-decree carve-out to Acme v. Olson?"
"Who drafted the indemnification clause in the Phoenix deal and what was the negotiation history with opposing counsel?"
"Surface every email between us and opposing counsel about the privilege log in this matter — chronologically."
Inter-agency knowledge with hard team isolation. Each agency gets its own memory namespace. API keys are team-scoped. No cross-jurisdiction data leakage by construction — enforced by the gating plugin, not policy.
"What did DHS conclude about the December supply-chain incident, and who has clearance to see the underlying source?"
"Reconstruct the policy reasoning behind the 2025 grant program revision — every memo, every author, every dissent."
"Who across the inter-agency working group has worked on rural broadband, and what did each agency conclude?"
Simple questions are solved. The ones teams actually ask are multi-hop, cross-person, temporally scattered — answers live across dozens of conversations, weeks apart, involving different people. No single document contains them. No retrieval system returns them.
"What was the reasoning behind the auth rewrite, who drove it, and what compliance constraint forced the decision?"
"Build me a complete profile of this person — everything the team knows about their work, expertise, collaborators, and decisions."
"Which team members have overlapping knowledge about distributed caching, and what did they each conclude independently?"
Hooked into Claude Code, Cursor, your IDE. Architecture decisions, who's working on what, prior solutions to similar problems — injected into the session before you finish your sentence.
A prompted agent uses the same strategy every time. An RL-trained agent has learned — from thousands of rollout trajectories — when to cross-reference, when to go deeper, and when it has enough evidence to commit. This is learned behavior, not instructed behavior.
A prompted agent runs the same heuristic every query. A trained agent has learned — over thousands of rollouts — which search strategies pay off for which question shapes. Learned behavior, not instructed behavior.
Value-Guided Search picks the highest-scoring action per step — one smart trajectory. Parallel Thinking runs N trajectories and merges results — maximum coverage. The value model is trained on Deep Horizon rollouts, not generic.
Evaluated on real team corpora and academic benchmarks. Every claimed improvement has a parity gate: the deployed API must reproduce eval-harness numbers within ±3 percentage points.
Given a person's name and a team memory corpus, extract a complete structured profile across identity, professional background, education, relationships, publications, and all known facts.
+6.4 pp over Sonnet. 7.6× cheaper. 5× faster. Method: 10 independent agent rollouts with per-leaf union aggregation — a novel mechanical merge across rollouts for maximum coverage with no hallucination risk.
Answer short factual questions (1–5 words) about people, events, and relationships in the team corpus. Tool-call efficiency matters as much as accuracy.
+5.9 pp over majority vote with 8× fewer tool calls. Value-Guided Search uses a trained value model (Qwen3-4B) to pick the best action at each step — smarter, not just more compute.
The dispatcher inspects the request, picks a strategy, and the agent inherits the right inference budget. Parallel Thinking buys breadth. Value-Guided Search buys depth.
How it works. Spawn N independent agent rollouts in parallel. Each searches the corpus independently and produces a candidate answer. For structured profiles, aggregate with per-leaf union (our novel aggregator); for short answers, an LLM aggregator.
Why it works. Different rollouts find different facts. Union aggregation combines coverage from all rollouts without hallucination — a fact only ships if a rollout cited it.
How it works. At each step, sample k candidate actions from the policy. Score each candidate with a trained value model (Qwen3-4B fine-tuned on Deep Horizon rollouts). Execute the highest-scoring action. Repeat until the agent commits.
Why it works. Instead of more rollouts (breadth), VGS makes each rollout smarter (depth). The value model learns which search queries and reasoning paths lead to correct answers.
Compression, step budgeting, and tool gating are composable interceptors — not hardcoded logic. Add new plugins without touching the agent loop.
The agent's search tool runs in-process. Sub-millisecond tool dispatch. No HTTP hops between agent and corpus.
GLM 4.5 Air is fully open. No API dependency for the core reasoning model. Deploy on your infrastructure, your GPUs, your classification boundary.
Qwen3-4B fine-tuned on Deep Horizon rollout data. Scores search trajectories to guide the agent at inference time. ~$0.0001 per call.
KARL proved that reinforcement learning can train open-weight models into knowledge agents that beat frontier models. They trained on academic benchmarks. Deep Horizon takes the same algorithm, the same harness, the same test-time compute — and applies it to team collaboration.
| KARL · Databricks | Deep Horizon | |
|---|---|---|
| Training algorithm | OAPL | OAPL · same |
| Architecture | aroll harness + lifecycle plugins | aroll harness + lifecycle plugins · same framework |
| Test-time compute | Parallel Thinking + Value-Guided Search | PT + VGS · same strategies |
| Application domain | Academic QA benchmarks · HotpotQA, MuSiQue, QAMPARI | Team knowledge · people, decisions, relationships, expertise |
| Novel contribution | Proved RL works for knowledge agents | Per-leaf union for structured extraction · promptless recall · agent-to-agent API |
| Target user | Researchers | Engineering teams using AI daily |
Human-AI collaboration is solved. The next step is AI-AI. Every agent in your stack needs the same team context. Same multi-hop reasoning. Same API. Structured JSON in, structured JSON out.
# Agent-to-agent knowledge query curl -X POST https://api.deephorizon.dev/v1/agent/search \ -H "Authorization: Bearer $AGENT_KEY" \ -d '{ "query": "Who owns the payment processing module and what were the last 3 architectural decisions affecting it?", "team_id": "engineering", "model": "iter2b-vgs-k2", "caller": "code-review-agent" }' # Same API. Same quality. Agent or human. { "owner": "Sarah Chen", "decisions": […3 cited entries…], "n_tool_calls": 14, "cost_usd": 0.014 }
The policy model (GLM 4.5 Air) runs on Modal GPUs at ~$0.0005 per call. The value model (Qwen3-4B) costs ~$0.0001. All orchestration runs on a $30/month CPU server. No per-token API pricing creeping into your runrate.
| Operation | Deep Horizon | Claude Sonnet | Savings |
|---|---|---|---|
| Profile extraction | $0.15 / profile | $1.14 / profile | 7.6× cheaper |
| Factoid search · VGS k=2 | $0.012 / query | $0.18 – 0.30 / query | 15 – 25× cheaper |
| Factoid search · PT N=10 | $0.055 / query | $0.18 – 0.30 / query | 3 – 5× cheaper |
| Always-on orchestration | ~$30 / month | per-call pricing | predictable |
RESTful API. Bearer token. Drop it into any workflow. Model selection is a single field — the dispatcher does the rest.
# Complex reasoning query curl -X POST https://api.deephorizon.dev/v1/agent/search \ -H "Authorization: Bearer $API_KEY" \ -d '{ "query": "What was the reasoning behind the auth rewrite and who drove it?", "team_id": "engineering" }' # Response { "answer": "The auth middleware rewrite was driven by legal/compliance requirements around session token storage. Sarah Chen led the effort, decision finalized March 5. Key constraint: tokens must rotate every 24h...", "model_used": "iter2b-vgs-k2", "n_tool_calls": 12, "elapsed_seconds": 68.4, "cost_usd": 0.014 }
# Structured knowledge extraction curl -X POST https://api.deephorizon.dev/v1/agent/extract \ -H "Authorization: Bearer $API_KEY" \ -d '{ "target_entity": "Sarah Chen", "team_id": "engineering" }' # Response { "profile": { "name": "Sarah Chen", "role": "Senior Backend Engineer", "owns": ["auth middleware", "session mgmt"], "recent_decisions": ["token rotation", …], "collaborators": ["Boris K.", "Alex M."], "expertise": ["security", "distributed sys"] }, "n_leaves_populated": 67, "cost_usd": 0.15 }
| Model | Best for | Method | Default for |
|---|---|---|---|
| iter2b-vgs-k2 | Factoid questions | Value-Guided Search | /search |
| iter2b-pt-n10 | Profile extraction | Parallel Thinking + per-leaf union | /extract |
| iter2b-single | Quick baseline | Single rollout, no TTC | — |
| claude-sonnet-4-6 | Fallback | Frontier API path | — |
| Capability | Deep Horizon | KARL · Databricks | Mem0 / Zep | Frontier APIs |
|---|---|---|---|---|
| RL-trained reasoning agent | ● yes | ● yes | ○ no | ○ no |
| Test-time compute · PT + VGS | ● yes | ● yes | ○ no | ○ no |
| Beats frontier on extraction | +6.4 pp over Sonnet | +pp over GPT-4 (reported) | n/a | baseline |
| Promptless context injection | ● yes | ○ no | ○ no | ○ no |
| Agent-to-agent knowledge API | ● yes | ○ no | key-value store | n/a |
| Structured profile extraction | 50.2% accuracy | not addressed | ○ no | prompt-only |
| Open-weight policy model | ● yes | ● yes | n/a | ○ no |
| Application domain | team collaboration | academic benchmarks | memory storage | general |
| Pricing · per complex query | $0.012 | research only | SaaS tiers | $0.18+ |
Every agent in your stack starts from zero. Every session forgets.
Every complex question gets a shallow answer. Deep Horizon fixes this —
with RL-trained agents that reason over your team's knowledge, get smarter through training,
and serve every AI tool you use through a single API.