Software that's smarter every month
than the day you shipped it.
Dendra is a classification primitive that wraps any decision site in your codebase with a self-improving lifecycle: rule today, LLM in shadow tomorrow, a sub-millisecond ML head when evidence has earned it. The rule stays behind everything as the safety floor.
Python ships in v1.0. TypeScript, Go, Rust, and Java clients are v1.1 follow-ons, alongside a WASM build of the gate primitive — so every client will share one audit-chain format and one statistical contract.
We've already audited the eight LLM-broker libraries enterprise teams build on:
LangChain, LlamaIndex, Haystack, AutoGen, CrewAI, DSPy, LiteLLM, Instructor.
10,889 files scanned, 919 classification sites surfaced. To re-derive the results on any of them:

```shell
pip install dendra && dendra analyze /path/to/repo
```
Real analyzer output, real codebases
See the classification sites Dendra finds in real OSS Python repos. And the LLM bill they'd retire.
Pick a repo. Drag the sliders. Watch the savings move. Every row below comes from the v1.0 analyzer running on the real source.
v1.1 Run live on any public GitHub repo
Live custom-repo analysis ships in v1.1 (Pyodide-in-browser;
the analyzer fetches the repo from GitHub and runs in
your tab — your code never touches our server). For now,
pip install dendra && dendra analyze .
runs the same scan on your machine in a few seconds.
If this repo's LLM bill looked like…
Sliders are your inputs; the savings number is what graduating to in-process ML heads recovers, weighted by how many of the detected sites are realistically Dendra-fit (score ≥ 3.0).
Click a provider to set the slider to a typical classifier-call cost at that rate.
Want to run this on your private repo? Same scan, same JSON, on your own machine in under a minute.
5 minutes on your machine
Install Dendra and see your first classification site, locally.
The walkthrough below is the same path you'd run in your terminal right now. Steps 4 and 7 are optional (auto-lift and the MCP server); the other five are the core install path. Every command is copy-able. Every output snippet is from the v1.0 CLI, not a mockup.
1. Install

No GPU, no torch, no service deps. Sklearn-class only.

```shell
$ pip install dendra
Successfully installed dendra-1.0.0
```

~30 s on a warm pip cache, up to a minute cold. `uv pip install dendra` works too if you've moved on.
2. Find your classification sites

Pure-Python AST scan. No network, no upload — your code stays on your machine.

```shell
$ dendra analyze .
Dendra static analyzer — classification sites
============================================================
Root: /Users/you/myapp
Files scanned: 84
Sites found: 7

file:line             function       ptn  labels  regime  fit
------------------------------------------------------------------------
src/triage.py:14      triage_ticket  P1   3       narrow  5.0
src/router.py:88      route_intent   P1   7       narrow  4.5
src/moderation.py:31  classify_post  P1   3       narrow  4.5
…

By regime: narrow: 5  high_card: 1  unknown: 1

Next step: dendra init <file>:<function> --author @you:team
```

Runs in ~270 ms on 1,000 files. It's the same JSON shape the website demo uses; pass `--json` if you want it programmatic.
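If you do take the `--json` route, filtering a scan down to the wrappable sites takes a few lines. A sketch under assumptions: the field names below mirror the table columns above and are illustrative, not a pinned payload schema.

```python
import json

# Hypothetical --json payload; field names mirror the table columns
# (site, function, labels, regime, fit) and are assumptions.
scan = json.loads("""
[
  {"site": "src/triage.py:14", "function": "triage_ticket", "labels": 3, "regime": "narrow", "fit": 5.0},
  {"site": "src/router.py:88", "function": "route_intent", "labels": 7, "regime": "narrow", "fit": 4.5},
  {"site": "src/util.py:7",    "function": "guess_kind",   "labels": 2, "regime": "unknown", "fit": 1.5}
]
""")

# Keep only the realistically Dendra-fit sites (score >= 3.0, the
# same cut the savings demo uses).
fit_sites = [s for s in scan if s["fit"] >= 3.0]
print([s["function"] for s in fit_sites])  # ['triage_ticket', 'route_intent']
```

The same filter works unchanged on a file written by `dendra analyze . --json > scan.json`, assuming the real payload keeps these column names.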
3. Preview the wrapper

See the AST-injected diff before you commit a single line.

```shell
$ dendra init src/triage.py:triage_ticket --author @you:team --dry-run
# would modify src/triage.py: 3 labels (inferred), phase=RULE
--- src/triage.py (before dendra init)
+++ src/triage.py (after dendra init)
@@ -1,3 +1,10 @@
+from dendra import ml_switch, Phase, SwitchConfig
+
+@ml_switch(
+    labels=['auto_close', 'escalate', 'queue'],
+    author='@you:team',
+    config=SwitchConfig(phase=Phase.RULE),
+)
 def triage_ticket(ticket: dict) -> str:
     title = (ticket.get("title") or "").lower()
```

The body of your function is unchanged. Drop `--dry-run` when you're ready and the file is rewritten in place.
4. Auto-lift the wrapper (optional)

Extract per-branch handlers and hidden-state evidence into a Switch subclass, so the LLM/ML head can graduate.

```shell
$ dendra init src/triage.py:triage_ticket --author @you:team --auto-lift
# would modify src/triage.py and create __dendra_generated__/triage__triage_ticket.py
--- src/triage.py (before)
+++ src/triage.py (after)
@@ -1,3 +1,5 @@
+from __dendra_generated__.triage__triage_ticket import TriageTicketSwitch
+
 def triage_ticket(ticket: dict) -> str:
-    # original body unchanged
-    ...
+    return TriageTicketSwitch().dispatch(ticket).label

+++ __dendra_generated__/triage__triage_ticket.py (new)
@@ -0,0 +1,18 @@
+from dendra import Switch
+
+class TriageTicketSwitch(Switch):
+    def _evidence_title(self, ticket: dict) -> str:
+        return (ticket.get("title") or "").lower()
+
+    def _rule(self, evidence) -> str:
+        if "crash" in evidence.title or "outage" in evidence.title:
+            return "escalate"
+        if evidence.title.endswith("?"):
+            return "queue"
+        return "auto_close"
+
+    def _on_escalate(self, ticket): ...    # extracted side effect
+    def _on_queue(self, ticket): ...       # extracted side effect
+    def _on_auto_close(self, ticket): ...  # extracted side effect
```

`--auto-lift` adds the per-branch (`_on_*`) and per-evidence (`_evidence_*`) machinery the analyzer would otherwise flag as missing. Skip it on simple sites; reach for it once a site grows hidden state or branch-specific side effects.
5. Run your code as usual

No command. The wrapper does the work.

Outcomes log silently to `runtime/dendra/<switch>/` every time your app calls `triage_ticket(...)`. Phase 0: the rule still decides; the switch just records. Hand the same record IDs to `switch.record_verdict(record_id, Verdict.CORRECT)` when downstream signals (CSAT, resolution code, human review) tell you whether the classification was right.
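As a plain-Python illustration of that loop (a toy stand-in, not the dendra API; every name here is hypothetical), the log-then-verdict lifecycle looks like this:

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid

class Verdict(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"

@dataclass
class OutcomeLog:
    """Toy stand-in for runtime/dendra/<switch>/: append-only records
    keyed by id, with verdicts attached later."""
    records: dict = field(default_factory=dict)

    def log(self, evidence, label):
        record_id = str(uuid.uuid4())
        self.records[record_id] = {"evidence": evidence, "label": label, "verdict": None}
        return record_id

    def record_verdict(self, record_id, verdict):
        self.records[record_id]["verdict"] = verdict

log = OutcomeLog()

# Phase 0: the rule decides; the switch just records.
rid = log.log({"title": "app crash on login"}, "escalate")

# Later, a downstream signal (resolution code, human review) arrives:
log.record_verdict(rid, Verdict.CORRECT)
print(log.records[rid]["verdict"])  # Verdict.CORRECT
```

The point of the shape: classification and verdict arrive at different times, joined by the record ID, which is what lets graduation evidence accumulate without blocking the call site.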
6. See your ROI

Self-measured from your own outcome logs. Knobs are exposed for your own assumptions.

```shell
$ dendra roi runtime/dendra/
Dendra ROI report
============================================================
Sites tracked: 1
Outcomes logged: 4,210
Verified outcomes: 3,876 (92%)

Per-site projection:
  triage_ticket    $1,200/mo low - $3,800/mo high
  (eng-cost saved + LLM-bill avoided post-graduation)

Total projected annual value: $14,400 - $45,600
```

Override the assumption bands with `--monthly-value-low` / `--monthly-value-high` / `--engineer-cost-per-week`, or pipe `--json` into your own dashboard.
7. Drive Dendra from your IDE (optional)

Dendra ships an MCP server (stdio) so Claude Code, Cursor, and any MCP-aware client can drive the CLI directly.

```shell
$ dendra mcp
```

```json
// ~/.config/claude-code/mcp.json (or your client's equivalent)
{
  "mcpServers": {
    "dendra": { "command": "dendra", "args": ["mcp"] }
  }
}
```

The same 14 CLI verbs (`analyze`, `init`, `roi`, `benchmark`, `report`, `graduate`, `verdict`, ...) are reachable as MCP tools. Your assistant calls `dendra init` and `dendra benchmark` for you, with deterministic output.
That's it. The same path runs on every classifier in your codebase.
Close the loop
Prove the savings on your data.
Dendra ships a benchmark + report harness that turns projected savings into measured savings on your repo, with your data. Run `dendra benchmark <site>` after every change to a switch (rule edit, label add, evidence tweak); `dendra report` rolls every recorded run up into a per-switch timeseries (phase transitions, cost per call, cumulative dollars saved). The same JSON the report emits is what `--json` returns, so you can pipe it into whatever dashboard your team already runs. Nothing leaves your machine.
```shell
$ dendra benchmark src/triage.py:triage_ticket
3 passed in 0.04s
triage_ticket: first run (phase=RULE, cost $0.00002-$0.00032/call)

$ dendra report
Dendra report - 1 switch, 14 days of data
triage_ticket  RULE -> MODEL_PRIMARY  2026-05-04  (after 312 verdicts)
  cost: $0.0042 -> $0.00031/call  (-92%)
  estimated saved this week: $128
```
Wrap your if/else. Walk away. Come back to a
classifier paid in microseconds.
```python
from dendra import ml_switch
from myapp.support import auto_close, queue_for_human, escalate_to_oncall

# Each label is paired with a downstream action.
# The decorator wires classification, dispatch, and outcome
# logging at the call site. The body is the exact if/else
# your team would have inlined anyway.
@ml_switch(labels={
    "auto_close": auto_close,
    "queue": queue_for_human,
    "escalate": escalate_to_oncall,
})
def triage_ticket(ticket: dict) -> str:
    title = (ticket.get("title") or "").lower()
    if "crash" in title or "outage" in title:
        return "escalate"
    if title.endswith("?"):
        return "queue"
    return "auto_close"

triage_ticket(ticket)  # classifies AND fires the matching handler
```
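The label-to-handler dispatch idea stands on its own. Here is a minimal sketch of such a decorator in plain Python; `label_dispatch` is hypothetical and does none of Dendra's logging, training, or phase routing:

```python
import functools

def label_dispatch(labels):
    """Minimal sketch of label->handler dispatch: the wrapped function
    returns a label; the decorator fires the paired handler and hands
    the label back. (Dendra's ml_switch also logs outcomes and routes
    phases; this toy does not.)"""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            label = fn(*args, **kwargs)
            labels[label](*args, **kwargs)  # fire the matching handler
            return label
        return wrapper
    return decorate

fired = []  # stand-in for real side effects

@label_dispatch(labels={
    "escalate": lambda t: fired.append(("escalate", t["title"])),
    "queue": lambda t: fired.append(("queue", t["title"])),
    "auto_close": lambda t: fired.append(("auto_close", t["title"])),
})
def triage_ticket(ticket: dict) -> str:
    title = (ticket.get("title") or "").lower()
    if "crash" in title or "outage" in title:
        return "escalate"
    if title.endswith("?"):
        return "queue"
    return "auto_close"

print(triage_ticket({"title": "DB outage in eu-west"}))  # escalate
```

Because the call site only ever sees "function in, label out," the classifying brain behind it can be swapped without touching callers, which is the property the lifecycle depends on.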
Zero behavior change on day one. Your team ships the exact
if/else they would have inlined
anyway. Dendra does the work that usually takes a six-month
ML migration project: outcome logging, training, shadow
evaluation, graduation. By month six, that same
triage_ticket(ticket) call is paid in
microseconds and fractions of a cent. Nobody touches the
call site.
What you actually ship
An LLM bill that retires itself
Most production classifications eventually accumulate enough evidence to retire the LLM tier. Dendra is the substrate that earns it — automatically, with a statistical floor under every promotion. By month six you're paying microseconds and pennies, not seconds and dollars, for the same call.
Code that compounds
Every classification site is a wrapper that gathers evidence on its own clock. The decision quality improves with use. The migration to ML never blocks a sprint. The call site is permanent — only the brain underneath moves.
A rule still behind everything
Your hand-written rule is preserved structurally at every phase. When the ML head is uncertain, it falls through to the LLM; when the LLM is uncertain, it falls through to your rule. Safety-critical sites cap at "ML-with-fallback" — no ML-primary for authorization, ever, by construction.
How it works
The mechanism, for the curious.
None of what follows is required reading to use Dendra —
pip install dendra and the decorator above
are sufficient. The rest of this page is for the engineer
asking "yes, but how do you actually know when to
graduate?"
Six phases. Three evidence-gated graduations. One rule floor.
Each phase routes decisions the same way every time. The lifecycle only graduates to the next phase once enough outcome evidence has accumulated to prove the upgrade is real, not a coincidence. The bar is conservative by default and the math is in the paper for those who want to verify it.
- `RULE`: Your function decides.
- `MODEL_SHADOW`: Rule decides; LLM watches.
- `MODEL_PRIMARY`: LLM decides; rule is fallback.
- `ML_SHADOW`: ML trains from outcomes.
- `ML_WITH_FALLBACK`: ML decides; on uncertainty, falls through to LLM, then rule.
- `ML_PRIMARY`: ML decides; a circuit breaker reverts to the rule on anomaly. `safety_critical=True` refuses this phase at construction.
Predecessor cascade. Each phase's low-confidence fallback is its predecessor's full routing. Phase 4 → Phase 3 → Phase 2 → Rule, in the order each tier was earned. Promotions add tiers; uncertainty walks them back down. The rule fires when every learnable tier above it is below threshold.
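A minimal simulation of the cascade's shape (the confidence threshold and tier signatures here are assumptions for illustration, not Dendra's contract):

```python
def cascade(tiers, rule, evidence, threshold=0.8):
    """Sketch of the predecessor cascade: try the most-senior earned
    tier first; any tier below the confidence threshold falls through
    to its predecessor; the rule always answers at the floor."""
    for name, predict in tiers:  # senior-most tier first
        label, confidence = predict(evidence)
        if confidence >= threshold:
            return name, label
    return "rule", rule(evidence)

# Illustrative tiers: an uncertain ML head above a confident LLM tier.
ml = lambda e: ("queue", 0.55)
llm = lambda e: ("escalate", 0.91)
rule = lambda e: "auto_close"

print(cascade([("ml", ml), ("llm", llm)], rule, {"title": "outage"}))
# → ('llm', 'escalate')
```

If the LLM tier had also come in under threshold, the call would have walked all the way down and returned the rule's answer, which is the "rule fires when every learnable tier above it is below threshold" behavior described above.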
Drift detection rides along. The same evidence test runs in reverse: if accumulated outcomes show the rule has reclaimed the lead, the lifecycle demotes by one phase. Safety-critical sites cap at `ML_WITH_FALLBACK` — no ML-primary for authorization decisions, ever, by construction.
Eight public benchmarks across four text domains. All clear the gate.
Eight text benchmarks across intent classification (ATIS, HWU64, Banking77, CLINC150, Snips), question categorization (TREC-6), news topics (AG News), and programming-language detection (codelangs, including FORTRAN sourced from NJOY2016 nuclear-data processing code). Every benchmark cleared the evidence gate at p < 0.01. Most graduated within 250 outcomes; the slowest needed 2,000 (Snips, where the rule briefly beat the ML head — the gate held until the ML head overtook).
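For a mechanical feel of what clearing a gate at p < 0.01 can mean, here is a self-contained one-sided exact sign test on discordant outcomes, i.e. the pairs where exactly one of the two tiers was right. This is an illustrative stand-in, not the paper's actual statistic:

```python
from math import comb

def graduation_gate(challenger_wins, incumbent_wins, alpha=0.01):
    """One-sided exact sign test on discordant pairs. Under the null
    (no real upgrade) each discordant pair is a fair coin; graduate
    only if the challenger's win count is improbably high.
    Illustrative only -- Dendra's published gate may differ."""
    n = challenger_wins + incumbent_wins
    if n == 0:
        return False  # no discordant evidence yet: hold the phase
    # P(X >= challenger_wins) for X ~ Binomial(n, 0.5)
    p = sum(comb(n, k) for k in range(challenger_wins, n + 1)) / 2**n
    return p < alpha

print(graduation_gate(48, 20))  # True  — decisive lead, gate clears
print(graduation_gate(12, 9))   # False — could be coincidence, gate holds
```

This also shows why Snips took 2,000 outcomes: while the rule held a narrow lead, the challenger's win count stayed inside the coin-flip band and the gate refused to fire.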
Three regimes by cardinality × rule-keyword affinity
Regime I — rule near optimum
codelangs (12 langs, 87.8% rule). Rigid syntax keywords (`def`, `function`, `subroutine`) leave the rule within ~10 points of ML. Graduation is evidence-justifiable but the lift is modest; the lifecycle's value here is the audit chain and drift symmetry.
Regime II — rule usable
ATIS (70.0% rule), TREC-6 (43.0% rule). Mid-cardinality with strong-to-moderate keyword affinity. The rule ships on day one. ML decisively wins (88.7%, 85.2%) and the gate clears at the first 250-outcome checkpoint.
Regime III — rule at floor
HWU64, Banking77, CLINC150 by high cardinality; Snips, AG News by weak keyword affinity. The rule is at or near chance on day one. Dendra's role here is cold-start substrate: outcome logging while a zero-shot LLM runs in front and the trained head warms up underneath.
The mechanism is modality-agnostic by design. The gate operates on right/wrong outcome streams that any classifier produces — text, image, audio, or structured data. Image and audio benchmarks with pretrained-embedding heads (CLIP, ViT, Wav2Vec2) ship in a companion paper.
dendra bench →
Paper preprint coming with v1.0 launch (arXiv, May 2026).
The same evidence that lifts accuracy retires the LLM bill.
When the lifecycle reaches its final phase, every classification has earned two things at once: the right to trust the ML head's accuracy, and the right to skip the LLM tier permanently. Latency drops from hundreds of milliseconds to sub-millisecond. Per-call cost drops to essentially zero. The math, illustrative not predictive, at three scales:
| Workload | LLM tier (~$0.005/call) | ML head (post-graduation) | Annualized swing |
|---|---|---|---|
| 10⁴/day · small SaaS | ~$50/day | ~$0/day | ~$18k/year |
| 10⁶/day · mid-scale | ~$5,000/day | ~$0/day | ~$1.8M/year |
| 10⁸/day · large platform | ~$500,000/day | ~$0/day | ~$182M/year |
Linear in unit cost. A frontier-tier model at $0.05/call multiplies the savings 10×; a small-tier model at $0.0005 divides them 10×. The shape that matters is constant: once the lifecycle has graduated, the ML-head tier is essentially free per call. Workloads that started on an LLM because their rule was a non-starter see the most visible economic effect.
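The table's arithmetic, in a form you can point at your own volume and unit cost (the $0.005/call figure is the table's illustrative rate, not a quote):

```python
def annual_swing(calls_per_day, llm_cost_per_call, ml_cost_per_call=0.0):
    """Annualized spend difference between the LLM tier and a
    post-graduation ML head. Pure arithmetic; plug in your own rates.
    The ~$0 ML-head cost mirrors the table's post-graduation column."""
    daily = calls_per_day * (llm_cost_per_call - ml_cost_per_call)
    return daily * 365

for volume in (10_000, 1_000_000, 100_000_000):
    print(f"{volume:>11,}/day -> ${annual_swing(volume, 0.005):,.0f}/year")
```

Swapping in a frontier-tier $0.05/call or a small-tier $0.0005/call rate reproduces the 10×-either-way scaling the paragraph above describes.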
Find the classifiers in your codebase. Free. 30 seconds.
```shell
dendra analyze ./my-repo
```
Runs entirely locally. No upload. No signup. Walks your Python source, identifies classification decision points via six AST patterns, scores each for Dendra-fit, and outputs a JSON artifact for CI diff tracking.
```shell
$ dendra analyze ./my-repo
Scanned 12,408 Python files; found 7 classification sites.

src/support/triage.py:42 — 5 labels, medium cardinality
  Dendra-fit: 4.5/5
  Regime: narrow-domain rule-viable (ATIS-like)
  Estimated: rule accuracy ~70%; ML would add ~15-20pp after ~500 outcomes

src/mod/content_score.py:88 — 3 labels, binary-ish
  Dendra-fit: 4/5
  Regime: safety-critical boundary
  Recommend: Phase 4 cap (ML_WITH_FALLBACK, never ML_PRIMARY)

... 5 more

Report written to .dendra/analyze-2026-04-22.json
```
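For a flavor of what AST-level detection means (the six patterns themselves aren't published here), a self-contained stdlib-`ast` sketch of one plausible pattern: a function whose every return is a string literal. Purely illustrative; the real analyzer's heuristics are richer.

```python
import ast

SOURCE = '''
def triage_ticket(ticket):
    title = (ticket.get("title") or "").lower()
    if "crash" in title:
        return "escalate"
    if title.endswith("?"):
        return "queue"
    return "auto_close"

def total(xs):
    return sum(xs)
'''

def label_returning_functions(source):
    """Flag functions where every return is a string literal and at
    least two distinct labels appear: one crude stand-in for a
    classification site. A real analyzer needs more patterns."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            returns = [n.value for n in ast.walk(node) if isinstance(n, ast.Return)]
            labels = [v for v in returns
                      if isinstance(v, ast.Constant) and isinstance(v.value, str)]
            if returns and len(labels) == len(returns):
                names = sorted({v.value for v in labels})
                if len(names) >= 2:
                    sites.append((node.name, names))
    return sites

print(label_returning_functions(SOURCE))
# [('triage_ticket', ['auto_close', 'escalate', 'queue'])]
```

Note that `total` is skipped: its return is a call, not a label, which is exactly the discrimination a classification-site scanner needs to make.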
Volume-based pricing. No per-seat fees. Free forever for the library.
| Tier | Price | Classifications / mo |
|---|---|---|
| OSS library | Free — install now → | unlimited (self-hosted) |
| Free hosted | $0 | 10,000 |
| Solo | $19/mo | 100,000 |
| Team | $99/mo | 1,000,000 |
| Pro | $499/mo | 10,000,000 |
| Scale | $2,499/mo | 100,000,000 |
| Metered | $0.01 / 1k | above Scale |
| Enterprise | Custom | Custom |
Every paid tier has a published price. No "contact us" gating below Enterprise. Volume-priced so adding another classifier doesn't cost you another seat. Cancel anytime.
Where Dendra has measurable impact today
- Customer-support triage
- Chatbot intent routing
- LLM output moderation / PII filtering
- Fraud and anomaly triage
- SOC alert classification
- Content moderation
- Clinical coding (ICD-10, CPT)
- RAG retrieval-strategy selection
- Agent tool routing
Four more categories in the full list.
What this replaces
| Approach | What it solves | Where Dendra differs |
|---|---|---|
| LLM-response caching (LiteLLM, Langfuse cache) | Cuts repeat-call cost on identical inputs. | Dendra retires the LLM tier permanently when an in-process ML head clears the gate — not only on cache hits. |
| Fine-tuning the LLM | Better LLM accuracy on your domain. | Dendra's terminal classifier is not a tuned LLM. It's a sklearn-class in-process head: sub-ms, ~$0/call, no GPU, no inference API. |
| Roll your own ML migration | Full control over outcome plumbing, training, shadow rollout. | Dendra is one decorator at every classification site, with a uniform statistical contract and audit chain — and the rule preserved as the safety floor. |
| Feature flags + manual ramp | Operator-driven rollout with an off switch. | Dendra's gate is statistical, not vibes-based: graduation only fires when accumulated outcomes prove the upgrade is real. |
When Dendra isn't the right fit. If your LLM bill is dominated by generation, summarization, or agent loops rather than classification, the savings story scales down — you'll get the safety-floor and audit-chain benefits but not the "$5,000/day retires itself" story. If you have abundant day-one labeled data, train a classifier directly. If your verdicts never arrive (no downstream signal, no human review, no inferred outcome), the gate has nothing to fire on.
Your AI coding assistant already knows how to install Dendra.
Dendra ships a SKILL.md that Claude Code, Cursor, and Copilot Workspaces can load as context. Just ask:
"Add Dendra to the triage function in
src/support/triage.py."
Your assistant will wrap the function, add the import, infer the labels, and leave a minimal, reviewable diff.
```shell
dendra init src/support/triage.py:triage --author "@you:team"
```

`dendra init` is the deterministic CLI path that skips the LLM's risk of hallucinating decorator syntax. Your assistant should reach for it by default.