AI Workforce Platform

Build Your Parallel Workforce.

Name: DeepBench
Author: John Leonard

Stop re-training humans who leave. Encode your expertise into agents that work 24/7, get smarter with every task, and never forget what you taught them.

DeepBench lets you build a bench of specialized AI agents, assign them real work tasks, and manage output through a single dashboard. The long game: train an agent on your own knowledge — so your expertise generates deliverables, briefings, and research even when you're not in the room.

Try it Live

Your bench — 7 agents, 4 active right now

deepbench · Your Bench

＋ Add a Player

Bench Size: 7

Working Now: 4

Annual Value: $1.1M

Always On: 24/7

📊

Robyn Castellanos

NIGP Consultant · RAG+Deep

Working

🌐

Brent Matthews

Web Agent · ReAct+Memory

Fetching

📋

Michelle Manning

Project Planner

Ready

🎯

Mike Alvarez

Senior Analyst · LLM Deep

Trained

Active work right now

Robyn — Austin FY2025 Spend Analysis · In progress

Brent — Maryland Comptroller live data fetch · Running

Mike — Vendor consolidation briefing · Needs Review

The Problem

"I can't take on more clients without cloning myself."

Every enterprise AI deployment shares the same invisible failure mode: the model forgets. AI Amnesia — the inability to carry institutional knowledge, reasoning history, and domain context forward — means organizations re-inject identical context on every call while compounding value is left untouched.

Beneath that sits a ceiling every consultant, business owner, and enterprise team faces: capacity is capped by the number of skilled humans on the team. Those humans leave, taking years of knowledge with them. The result: hire, train, lose, repeat. The expertise never compounds.

The Five Layers of Enterprise AI

LLM: Base reasoning engine · Solved

Prompt: Instructions given at call time · Solved

Memory: What the agent retains across sessions · Unsolved

Reasoning: Domain-tuned logic and judgment · Unsolved

Deliverable: Structured, governed, auditable output · Partial

Memory and Reasoning are the unsolved layers — and the highest-value ones. DeepBench is architected to solve for all five.

👤
You're the bottleneck
Every critical task routes through one or two people. Growth means either burning them out or hiring, and hoping the new person sticks.

🔁
Training is permanent overhead
Every new hire costs months of ramp time. When they leave, the knowledge walks out the door. You start over.

⏱
Your value is in analysis, not data wrangling
The most skilled people on your team spend their best hours on prep work (cleaning data, mapping columns, running reports) instead of insight that moves the needle.

🏢
Enterprise AI isn't built for you
The platforms built for Fortune 500 teams cost millions and take months to deploy. They don't fit consultants, mid-market firms, or government agencies with real deadlines.

🧠
Your expertise isn't transferable
The most valuable thing in your organization is what your best people know. There's no way to capture it, scale it, or keep it when they leave. Until now.

How It Works

Three Steps.
Your Parallel Workforce.

Build once. Assign forever. Every task makes your agents better.

Build Your Agent

Choose a domain — analysis, research, writing, compliance. Name your agent, define their specialty, set their skill level. In minutes you have a permanent member of your bench with a full personnel file.


Teach Your Agent

Upload documents, training materials, and past work. Your agents build a RAG knowledge base from everything you teach them — grounding every future response in your methodology, not generic AI defaults. Depth is measurable: agents move from General → Trained → Expert → Proprietary as you invest in their knowledge base. Deeper capability means higher-quality output — and a compounding asset you own.
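As a rough illustration of the retrieval step behind a RAG knowledge base, the sketch below ranks stored passages by cosine similarity to a query vector and returns the best matches for grounding a response. It is a toy, in-memory version under stated assumptions: the `Passage` type, the `top_k` helper, the sample passages, and the three-dimensional vectors are all hypothetical, and a real deployment would use an embedding model and a vector store rather than hand-written numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    embedding: list[float]  # hypothetical: would come from an embedding model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], passages: list[Passage], k: int = 2) -> list[tuple[float, Passage]]:
    """Return the k passages most similar to the query, best match first."""
    scored = [(cosine_similarity(query, p.embedding), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy knowledge base with made-up 3-dimensional embeddings.
KNOWLEDGE_BASE = [
    Passage("How to classify spend against a commodity taxonomy", [0.9, 0.1, 0.0]),
    Passage("Checklist for vendor concentration reviews", [0.1, 0.9, 0.1]),
    Passage("Template for an executive briefing", [0.0, 0.2, 0.9]),
]

if __name__ == "__main__":
    for score, passage in top_k([0.8, 0.3, 0.0], KNOWLEDGE_BASE):
        print(f"{score:.3f}  {passage.text}")
```

The retrieved passages would then be placed in the prompt so the agent answers from taught material rather than from the base model alone.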


Assign the Work

Describe the task. Your planning agent generates a step-by-step execution plan, assigns the right agents to each step, and flags where human judgment is required. Approve the plan, and let them work.

Who It's For

Three Ceilings.
One Solution.

DeepBench meets you where your capacity actually breaks down.

Solo Consultant

"I can't take on more clients without cloning myself."

Build an AI bench that handles the analytical heavy lifting for every engagement. Upload client data, assign the analysis to your agents, and arrive to kickoffs with findings already in hand. More engagements. Same you.

Scale Without Hiring

Business Owner

"I'm tired of re-training people who leave."

Encode your standard operating procedures, best practices, and institutional knowledge into agents that never forget and never quit. Every new team member inherits the full depth of your organization's expertise on day one.

Encode Your Expertise

Enterprise / Association

"Our practice can't scale to every member."

Deploy a bench of specialist agents trained on your methodology, standards, and knowledge base. Give every member — or every client — access to the same quality of expert guidance you currently reserve for your largest accounts.

Methodology at Scale

Live Demo — City of Austin, FY2025

$372M Analyzed.
11,711 Transactions.
28 Seconds.

The City of Austin's full fiscal year procurement dataset — loaded, classified, flagged, and briefed by DeepBench's NIGP Consultant agent (Robyn Castellanos) in under a minute. What used to take a consultant half a day of Excel work is now a live task result you can share via URL.

View Live Demo Task ↗

$372M · Total procurement spend analyzed

11,711 · Individual transactions classified

$33.6M · Maverick spend surfaced (9% of total)

28s · Upload to full analysis report

✦ AI Flags Detected

Maverick Spend · $33,600,000 · HIGH

Oct Spending Spike · $60,400,000 · 2.1× · MED

Top Vendor (Motorola) · $29,200,000 · 7.8% · MED

Vendor HHI Score · ~850 (Competitive) · OK
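For readers unfamiliar with the vendor HHI flag, the Herfindahl-Hirschman Index is the sum of squared vendor shares, expressed in percent, so it runs from near 0 (many small vendors) to 10,000 (a single vendor). The sketch below shows that calculation and cross-checks the percentages quoted above. It is a minimal illustration, not DeepBench's engine: the function names, the sample transactions, and the 1,500 / 2,500 cutoffs used for labeling are assumptions, since the page does not state which cutoffs the product applies.

```python
from collections import defaultdict


def vendor_hhi(transactions):
    """HHI on a 0-10,000 scale from (vendor, amount) pairs."""
    totals = defaultdict(float)
    for vendor, amount in transactions:
        totals[vendor] += amount
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total spend must be positive")
    # Square each vendor's percentage share and add them up.
    return sum((100.0 * spend / grand_total) ** 2 for spend in totals.values())


def concentration_label(hhi, moderate=1500.0, high=2500.0):
    """Map an HHI value to a label. The cutoffs are illustrative assumptions."""
    if hhi < moderate:
        return "Competitive"
    if hhi < high:
        return "Moderately concentrated"
    return "Highly concentrated"


def share_pct(part, whole):
    """Percentage share rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Made-up transactions; vendor "A" appears twice to show aggregation.
    sample = [("A", 30.0), ("A", 20.0), ("B", 30.0), ("C", 20.0)]
    print(vendor_hhi(sample), concentration_label(vendor_hhi(sample)))
    # Figures from the flags above, in millions of dollars.
    print(share_pct(33.6, 372.0), share_pct(29.2, 372.0))
```

Under these assumed cutoffs, a score of about 850 falls in the lowest band, which matches the "Competitive" label shown in the flag.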

Origin

"We didn't start with a platform idea. We started with a real government procurement problem."

DeepBench grew out of the NIGP Spend Analyzer — a production tool built for government chief procurement officers and consultants that classifies transactions against the NIGP taxonomy, surfaces compliance risks, and generates AI executive briefings in under a minute.

That tool is still live. It analyzed $372M in City of Austin procurement spend as a single DeepBench task. It's not a demo — it's the proof that domain-specific AI agents, grounded in real methodology, produce results that generic AI tools can't.

DeepBench is the platform generalization of that concept. The NIGP Analyzer became the first agent on the bench. The architecture that made it work — deterministic logic as the analytical foundation, LLMs as the intelligence layer on top — became the design identity for everything that followed.

The Design Identity

Deterministic + Generative Hybrid

Domain logic runs the analysis. LLMs interpret the findings. Never the other way around. The numbers are always right — the AI adds the narrative.
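One way to picture this split is a pipeline where plain code computes every figure and the model only receives those figures to narrate. The sketch below is a minimal illustration under stated assumptions: the row fields (`vendor`, `amount`, `on_contract`), the function names, and the treatment of off-contract purchases as "maverick" spend are hypothetical, and `narrate` stands in for whatever LLM call a real system would make.

```python
from typing import Callable


def compute_findings(rows: list[dict]) -> dict:
    """Deterministic step: every number in the briefing comes from here."""
    if not rows:
        raise ValueError("no transactions to analyze")
    total = sum(r["amount"] for r in rows)
    off_contract = sum(r["amount"] for r in rows if not r["on_contract"])
    by_vendor: dict[str, float] = {}
    for r in rows:
        by_vendor[r["vendor"]] = by_vendor.get(r["vendor"], 0.0) + r["amount"]
    top_vendor = max(by_vendor, key=by_vendor.get)
    return {
        "total_spend": total,
        "off_contract_spend": off_contract,
        "off_contract_pct": round(100.0 * off_contract / total, 1) if total else 0.0,
        "top_vendor": top_vendor,
        "top_vendor_spend": by_vendor[top_vendor],
    }


def build_prompt(findings: dict) -> str:
    """Generative step input: the model is handed the figures, not the raw data."""
    facts = "\n".join(f"- {key}: {value}" for key, value in findings.items())
    return (
        "Write a short executive briefing using ONLY the figures below. "
        "Do not compute or estimate any new numbers.\n" + facts
    )


def brief(rows: list[dict], narrate: Callable[[str], str]) -> dict:
    """Run the deterministic analysis, then ask the model for narrative only."""
    findings = compute_findings(rows)
    return {"findings": findings, "narrative": narrate(build_prompt(findings))}


if __name__ == "__main__":
    rows = [
        {"vendor": "Acme", "amount": 60.0, "on_contract": True},
        {"vendor": "Beta", "amount": 40.0, "on_contract": False},
    ]
    # A stub replaces the model so the example runs offline.
    print(brief(rows, narrate=lambda prompt: "(narrative would go here)"))
```

Because the findings dictionary is returned alongside the narrative, a reviewer can check the prose against the computed figures.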

Multi-Agent Orchestration

A planning agent (Michelle) coordinates specialist agents across every task. Each agent has a distinct role, a trainable knowledge base, and a composable place in any workflow.

Self-Improving Web Agent

Brent's ReAct loop — browser perception → AI decision → Playwright action → knowledge write-back — gets better at every portal it visits. No retraining required.
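The four stages named above (perception, decision, action, write-back) can be sketched as a small loop. This is an illustration only, not Brent's implementation: the browser and the model are replaced by stand-ins so the example runs offline, and `PortalMemory`, `react_fetch`, `FakePortal`, and `scripted_decide` are hypothetical names. No Playwright calls are used here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PortalMemory:
    """Hypothetical store of action sequences that worked before, keyed by portal."""
    known_paths: dict[str, list[str]] = field(default_factory=dict)


def react_fetch(
    portal: str,
    observe: Callable[[], str],
    decide: Callable[[str, list[str]], str],
    act: Callable[[str], None],
    memory: PortalMemory,
    max_steps: int = 10,
) -> list[str]:
    """Run observe -> decide -> act until `decide` returns "done", then save the path."""
    hints = memory.known_paths.get(portal, [])
    taken: list[str] = []
    for _ in range(max_steps):
        page_state = observe()              # perception
        action = decide(page_state, hints)  # decision (a model call in a real agent)
        if action == "done":
            memory.known_paths[portal] = taken  # knowledge write-back
            return taken
        act(action)                         # action (browser automation in a real agent)
        taken.append(action)
    raise RuntimeError(f"gave up on {portal} after {max_steps} steps")


class FakePortal:
    """Stand-in for a browser session: three pages reached by fixed actions."""
    TRANSITIONS = {("home", "open_search"): "search", ("search", "submit_query"): "results"}

    def __init__(self) -> None:
        self.page = "home"

    def observe(self) -> str:
        return self.page

    def act(self, action: str) -> None:
        self.page = self.TRANSITIONS.get((self.page, action), self.page)


def scripted_decide(page_state: str, hints: list[str]) -> str:
    """Stand-in for the model's decision; a real one could consult `hints`."""
    return {"home": "open_search", "search": "submit_query"}.get(page_state, "done")


if __name__ == "__main__":
    portal, memory = FakePortal(), PortalMemory()
    print(react_fetch("demo-portal", portal.observe, scripted_decide, portal.act, memory))
    print(memory.known_paths)
```

On a later run against the same portal, the saved path is passed back in as `hints`, which is where the "gets better at every portal" behavior would come from in a fuller implementation.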

Human-in-the-Loop by Design

HITL gates are built into the task planning layer — not bolted on. Every consequential decision has a natural pause point for human judgment before proceeding.


The original NIGP Analyzer is still live

Government CPOs and NIGP consultants use it today. It's the production proof of concept that DeepBench was built on top of.

nigp.roadmapventure.com ↗

Platform Capabilities

Everything Your Bench Needs.
Nothing They Don't.

Purpose-built for knowledge work. Every capability in service of getting real analysis done faster.

🧠

RAG Knowledge Base

Train agents on your documents, methodology, and past work. Every agent response is grounded in what you've taught them — not generic AI defaults. Knowledge compounds with every training session.

Trainable Agents

📋

Plan-and-Execute Tasks

Describe work in plain language. Your planning agent generates a step-by-step execution plan, assigns the right specialist agents, and flags human-in-the-loop gates automatically. You approve; they execute.

AI Task Planning

🌐

Live Web Agent

Brent Matthews is a Playwright-powered web agent who navigates real government portals, fetches live procurement data, and builds memory from every successful run. Self-improving with every task.

ReAct + Memory

📊

NIGP Spend Analysis

The full NIGP classification engine lives inside DeepBench as a first-class task type. Upload any government procurement CSV — classification, health flags, vendor HHI, and AI executive briefing in under a minute.

NIGP Native

💬

Agent Chat Panel

Chat directly with any agent on your bench. Intelligent routing suggests the right agent for your question. Every response shows its knowledge provenance — Trained, Informed, or General. Save any answer as a task assignment.

Consultative AI

✦

Full AI Transparency

Every AI-touched element is labeled with a ✦ AI badge. A live activity panel shows cost, latency, and call type for every AI operation. You always know exactly where AI is being used and what it's doing.

AI Audit Layer
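A per-call audit record of the kind described here (call type, model, tokens, cost, latency) can be sketched as a thin wrapper around each AI operation. The example below is an assumption-laden illustration: `AuditRecord`, `audited_call`, the model name `example-model`, and the per-token prices are all hypothetical, and the log is a plain list rather than a database table.

```python
import time
from dataclasses import dataclass


@dataclass
class AuditRecord:
    call_type: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float


# Hypothetical prices in USD per million tokens; real prices vary by provider and model.
PRICES_PER_MTOK = {"example-model": {"input": 3.00, "output": 15.00}}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one call from token counts and the assumed price table."""
    price = PRICES_PER_MTOK[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def audited_call(call_type: str, model: str, fn, log: list):
    """Run fn(), which must return (result, input_tokens, output_tokens), and log the call."""
    start = time.perf_counter()
    result, input_tokens, output_tokens = fn()
    latency_ms = (time.perf_counter() - start) * 1000.0
    log.append(
        AuditRecord(
            call_type=call_type,
            model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
            cost_usd=estimate_cost(model, input_tokens, output_tokens),
            latency_ms=latency_ms,
        )
    )
    return result


if __name__ == "__main__":
    audit_log: list[AuditRecord] = []
    # A stub replaces the model call: it returns text plus token counts.
    audited_call("briefing", "example-model", lambda: ("draft text", 1000, 200), audit_log)
    print(audit_log[0])
```

Keeping the wrapper separate from the model client means every call type is logged the same way, which is what makes a single activity panel possible.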

No Vaporware

Every Claim.
Verified Live.

The AI industry is full of demos that don't survive contact with reality. Here's what's actually running in production today — and what's designed but not yet shipped.

Live in Production

RAG + Vector Embeddings

pgvector · Supabase · OpenAI embeddings — live at query time

✅ Live

ReAct Agent Loop

Brent — Railway/Playwright — reason, act, observe cycles

✅ Live

Per-Agent Guardrails

Always/never rules stored in Supabase, enforced on every call

✅ Live

AI Cost Audit

Per-call: model, tokens, cost, latency → Supabase log

✅ Live

Self-Learning / Knowledge Reinforcement

Brent writes fetch results back as training entries automatically

✅ Live

Structured Tool Use

Claude tool use throughout — no free-text JSON parsing anywhere

✅ Live

Prompt Caching

Anthropic caching on system prompts: up to 90% lower cost on cached input tokens

✅ Live

Multi-Agent Orchestration

Michelle plans, assigns, and coordinates specialist agents per task

✅ Live
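The per-agent guardrails listed above (always/never rules enforced on every call) could take a shape like the following. This is a hedged sketch, not the product's mechanism: the `Guardrails` type, the idea of rendering rules into the system prompt, and the literal `banned_terms` check on output are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    always: list[str] = field(default_factory=list)        # instructions the agent must follow
    never: list[str] = field(default_factory=list)         # instructions the agent must avoid
    banned_terms: list[str] = field(default_factory=list)  # hypothetical: literal strings to block


def apply_to_system_prompt(base_prompt: str, rules: Guardrails) -> str:
    """Append the rules to the system prompt so they accompany every call."""
    lines = [base_prompt.strip()]
    lines += [f"ALWAYS: {rule}" for rule in rules.always]
    lines += [f"NEVER: {rule}" for rule in rules.never]
    return "\n".join(lines)


def violations(output: str, rules: Guardrails) -> list[str]:
    """Return any banned terms found in the output (case-insensitive)."""
    lowered = output.lower()
    return [term for term in rules.banned_terms if term.lower() in lowered]


if __name__ == "__main__":
    rules = Guardrails(
        always=["Cite the source document for every figure."],
        never=["Recommend a specific vendor."],
        banned_terms=["guaranteed savings"],
    )
    print(apply_to_system_prompt("You are a procurement analyst.", rules))
    print(violations("This plan offers Guaranteed Savings.", rules))
```

A prompt-side rule shapes what the model tries to do, while an output-side check catches what slipped through, so the two are usually paired.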

Designed — Shipping Next

Capability Depth Spectrum (1–4)

General → Trained → Expert → Proprietary — priced by depth

🔶 Designed

Human-in-the-Loop Execution Gates

Step execution pauses for human review before proceeding

🔶 Designed

Multi-Tenancy + Auth

tenant_id on every table today — Clerk auth wraps in v6

🔶 v6

Agent Marketplace

Build, train, and publish agents — earn revenue share

🔶 v6

BYOK + Per-Agent LLM Assignment

Tenant API keys, per-agent model selection

🔶 v7

MCP — Agent Integration

DeepBench agents callable from Claude Desktop, Cursor, any MCP environment

🔶 v7

MCP means a trained DeepBench procurement agent becomes callable from any AI tool — Claude Desktop, Cursor, or any MCP-compatible environment. Distribution without the UI.

Your Starter Bench

Eight Specialists.
Ready to Work Today.

Every DeepBench account starts with a pre-built bench of specialist agents. Trainable, extensible, and ready to be assigned real work the moment you log in.

📊

Robyn Castellanos

NIGP Consultant

RAG + deep reasoning. The specialist for government procurement spend analysis and NIGP taxonomy work. Trained on procurement methodology.

🌐

Brent Matthews

Web Agent

Playwright-powered ReAct loop with persistent memory. Navigates government portals, fetches live data, and self-improves from every run.

📋

Michelle Manning

Project Planner

Generates step-by-step execution plans, assigns the right agents, and flags human-in-the-loop gates. The conductor of every multi-step task.

🔍

Bob Whitfield

Professional Analyst

RAG-augmented professional-grade analysis. Deep knowledge base queries, structured output, and briefing-quality deliverables.

📈

Mike Alvarez

Senior Analyst

Deep LLM reasoning for complex analytical tasks. Executive-quality structured output. The go-to for nuanced interpretation and strategy drafts.

🎨

Christy Park

Marketing Designer

Formatting and presentation specialist. Transforms raw analysis into polished client-facing deliverables. Pitch decks, one-pagers, executive summaries.

📝

Chloe Okafor

Junior Analyst

Fast, lightweight classification and routing tasks. Perfect for first-pass triage, categorization, and routing decisions before senior agents take over.

＋

Add Your Own

Build a specialist agent trained on your domain, your methodology, your knowledge base. Your bench, your rules.

Try it Live

AI You Can Trust

Every AI Decision.
Visible.

✦ AI Transparency Layer — Built In

DeepBench doesn't hide where AI is being used. Every output generated by an AI agent carries a ✦ AI badge. A live activity panel shows exactly which model ran, what it cost, and how long it took — for every operation in the session.

Human-in-the-loop gates are flagged automatically in every task plan, marking each step that needs your judgment before work proceeds. The design goal: no AI agent makes a consequential decision without the opportunity for human review.

Deterministic algorithms — procurement health flags, HHI scoring — carry no AI badge. You always know the difference between a rule-based calculation and a model-generated insight.

✦ Task Plan — Spend Analysis Engagement

Michelle Manning · Planner

Brent Matthews · Web Agent

Fetch Maryland Comptroller FY2025 procurement data via live portal agent. Playwright navigation with memory recall from prior runs.

⚑ HITL Review fetched dataset before proceeding — confirm date range and column mapping is correct.

Robyn Castellanos · NIGP Consultant

Run full NIGP classification, 6 procurement health flags, vendor HHI scoring, and AI executive briefing on confirmed dataset.

Christy Park · Marketing Designer

Format health flag findings and AI briefing into a client-ready executive summary deck for the CPO presentation.

⚑ HITL Review final deliverable before sharing with client — confirm key findings are accurate and recommendations are appropriately scoped.
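The plan above can be modeled as an ordered list of steps, some of which carry a review gate. The sketch below shows one possible way an executor might pause at those gates and resume after approval. It is an illustration under stated assumptions: `Step`, `Plan`, `run_until_gate`, and `approve` are hypothetical names, and `execute` is a stub for the real agent work.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    agent: str
    task: str
    gate: Optional[str] = None  # review note; when set, execution pauses after this step


@dataclass
class Plan:
    steps: list[Step]
    next_index: int = 0
    executed: set[int] = field(default_factory=set)
    approved: set[int] = field(default_factory=set)


def run_until_gate(plan: Plan, execute: Callable[[Step], None]) -> Optional[int]:
    """Run steps in order. Return the index awaiting review, or None when finished."""
    while plan.next_index < len(plan.steps):
        index = plan.next_index
        step = plan.steps[index]
        if index not in plan.executed:
            execute(step)
            plan.executed.add(index)
        if step.gate is not None and index not in plan.approved:
            return index  # paused: a human must approve before the next step runs
        plan.next_index += 1
    return None


def approve(plan: Plan, index: int) -> None:
    """Record a human approval for the gated step at `index`."""
    plan.approved.add(index)


if __name__ == "__main__":
    plan = Plan([
        Step("Brent Matthews", "Fetch procurement data", gate="Confirm date range and column mapping"),
        Step("Robyn Castellanos", "Run classification and health flags"),
        Step("Christy Park", "Format executive summary", gate="Confirm findings before sharing"),
    ])
    paused_at = run_until_gate(plan, lambda step: print("running:", step.agent))
    print("paused at step", paused_at)
```

Calling `approve(plan, paused_at)` and then `run_until_gate` again continues from the gated step without re-running it.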

The Long Game

"The end goal isn't just automating tasks. It's training an agent that represents your knowledge — and works on your behalf."

Every expert has a body of knowledge that took years to build. Methodologies, judgment calls, pattern recognition, institutional memory. Right now, that expertise exists only in one place: inside one person's head.

DeepBench's training layer is designed for this. Upload your documents, your frameworks, your past work. Teach your agents how you think. Over time, you build an AI that doesn't just do tasks — it does tasks the way you would do them.

What this looks like in practice

01. A consultant uploads every engagement deliverable they've produced over 10 years. Their agents now draft using that methodology, not generic AI defaults.

02. A business owner encodes their standard operating procedures. New hires inherit the full institutional playbook on day one, and agents answer questions the same way their founder would.

03. An association trains agents on their published standards and best practices. Every member gets access to expert-level guidance that previously required a direct consulting engagement.

Your expertise · Encoded · Permanently on your bench

The Compounding Moat

A deeply trained agent built over 12 months is not easily replaced. The more you invest in training, the more you own — and the less you'd ever want to leave. Switching cost is the compounding advantage.

Built By

A Product Leader.
Not an Engineer.

Every line of code in DeepBench was written through AI-assisted development. Every architecture decision, design principle, product priority, and session rule was mine.

DeepBench is the artifact that proves what a product leader can build when they combine deep domain knowledge, strong architectural thinking, and the discipline to apply both consistently across 50+ documented build sessions.

I've spent my career at the intersection of enterprise technology, business strategy, and applied AI — with deep roots in government procurement intelligence. NIGP standards, vendor concentration risk, spend analysis, and compliance flags aren't features I designed abstractly — they're problems I've worked from the inside.

John Leonard

Product Leader · AI Systems Architect · Principal, Roadmap Venture

roadmapventure.com ↗ linkedin ↗

"My ultimate goal: build an agent that approaches answers the way I do — same diagnostic process, same reasoning arc, same domain priorities. At minimum, outputs equivalent to mine. Ultimately: faster and smarter than I could alone."

If I had the strength of a company behind this, it would be in market today. This is the proof the bones are there — and that I know exactly what to build when the team arrives.

~18K lines of code

Your Bench Is Ready to Work.