The Problem
Ask an AI language model to write a training program and you get something that reads like expertise: confident prose, plausible rep schemes, the right vocabulary. The trouble is that the same fluent surface attaches just as easily to advice that isn't grounded in anything — a claim about deficit size, joint loading, or injury risk that sounds authoritative and is quietly wrong. On the page, a sourced recommendation and an invented one look identical. For something you're going to do with your own body for ten weeks, that gap matters.
There is a second problem, specific to training. A plan is only as good as whether it actually gets used — logged, referenced, adjusted week to week. A beautiful PDF that lives in a downloads folder changes nothing. The recommendation and the tool that carries it are really one problem: the plan has to be trustworthy and it has to be livable.
This project took both on at once, on a deliberately checkable subject: one of our own. Taylor was returning to training after time off, carrying a couple of real constraints (an achy hip, a recently healed wrist). The goal was a program grounded claim by claim in published research, delivered through an app built to be opened every session.
The Approach
The work moved through a fixed sequence rather than open-ended prompting. Each stage produces something the next stage depends on, and the final program carries its claims back to verified sources.
- Frame the real question. Taylor came in with named goals (drop body fat at a muscle-sparing rate, rebuild an aerobic base, keep skill work alive) and named constraints (a hip to program around, a healed wrist to ramp carefully). The constraints were written down before any source was opened — the plan had to answer them, not ignore them.
- Gather and verify the evidence. Across running form, nutrition, resistance training, swimming, mobility, and even the safety of running with a dog, candidate sources were retrieved and captured as verified content. A claim couldn't enter the program unless a captured passage actually supported it. Where a publisher blocked automated access, the source was captured from a mirror of the same record instead. That is why a few claims are grounded against a published abstract rather than the full text, a limit the research writeup names directly.
- Ground every load-bearing claim. Nineteen findings became the backbone of the program — each tied to a specific passage of a specific source by a content fingerprint. Where the evidence was thin or drawn from a small or analogous study, the writeup says so rather than overstating it.
- Design the program around the findings. The research shaped real decisions: how aggressive the calorie deficit should be, which running cue to build and which to drop, whether a swim stroke was safe to chase. (The cue that changed is covered below.)
- Build the app to carry it. The program was delivered through a logging application — not a static document — with two new tracking features added specifically for Taylor's needs.
- Audit before publishing. The public writeup was scanned for internal-terminology leakage, checked for honest disclosure of what the method does not guarantee, and held to a no-overclaim standard on every health-adjacent statement.
The Build
The program runs on a reusable training-app engine — a single-file web app that renders a structured program, lets the athlete log every set with a one-tap movement-quality marker, and shows the previous session's numbers inline so progress is visible at a glance. For Taylor's program, two new tracking features were added on top of the shared engine without disturbing it:
| App feature | What it does | Why it was added |
|---|---|---|
| Program & set logging | Renders the 10-week program; logs reps, load, and a Clean / Grind / Miss quality marker per set; surfaces last session's numbers inline | The shared engine — makes the plan livable, not just readable |
| Cardio tracker (new) | Captures distance, time, perceived effort, and heart rate for runs and swims in one row | This program is multi-modal — running and swimming needed first-class logging, not a notes field |
| Daily Fuel panel (new) | Logs calories, protein, carbs, fat, and bodyweight against the program's targets, flagging on-target vs. under | The nutrition findings are central to a muscle-sparing cut; the target had to be visible every day |
| Local-first storage | Everything saves on the device; an optional database sync is available but never required | Personal training data should stay private by default |
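The set-logging row can be sketched in a few lines. This is an illustration of the logic only, written in Python for readability; the app itself is a web app and its code is not reproduced here. Every name below (`SetEntry`, `next_quality`, `last_session_sets`) is hypothetical, and the starting state of the marker and the session numbering are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; the real app's data model is not shown in this writeup.
QUALITY_CYCLE = ["Clean", "Grind", "Miss"]


def next_quality(current: Optional[str]) -> str:
    """Advance the one-tap marker: unset -> Clean -> Grind -> Miss -> Clean (assumed order)."""
    if current is None:
        return QUALITY_CYCLE[0]
    index = QUALITY_CYCLE.index(current)
    return QUALITY_CYCLE[(index + 1) % len(QUALITY_CYCLE)]


@dataclass
class SetEntry:
    session: int                   # session counter, assumed to increase over time
    exercise: str
    reps: int
    load: float                    # in whatever unit the athlete logs
    quality: Optional[str] = None  # one of QUALITY_CYCLE, or None if not yet marked


def last_session_sets(log: list[SetEntry], exercise: str, current_session: int) -> list[SetEntry]:
    """Return this exercise's sets from the most recent earlier session, for inline display."""
    earlier = [e for e in log if e.exercise == exercise and e.session < current_session]
    if not earlier:
        return []
    latest = max(e.session for e in earlier)
    return [e for e in earlier if e.session == latest]
```

The lookup filters by exercise before finding the latest session, so a session that skipped the exercise does not hide older numbers.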
The finding that changed the plan. Going in, Taylor had a favorite running cue — lean forward and “push the ground out behind you.” The research didn't support it: the impact-loading benefit it was supposed to deliver didn't show up in trials, and the cue tends to add load at the ankle and Achilles, the exact area that was already taxed. The grounded answer was different — build form around a slightly quicker cadence and a whole-body lean from the ankle. A program written to sound good would have kept the cue Taylor already liked. Verification is what replaced it with the one the evidence backs.
The Outcome
The result is a working program Taylor actually trains on, and two artifacts anyone can inspect: a live demo of the app, and a research writeup where every load-bearing claim links to a verifiable source. Five training domains were covered; the recommendations that made it in are the ones a captured source supported.
The demo below is the real app with the database connection switched off: you can log a set, watch the quality marker cycle, and open the Daily Fuel panel, and nothing leaves your browser. It comes pre-loaded with sample entries so the logging features appear populated. The research writeup is published as a sibling page; every load-bearing claim in it carries a citation link you can open and read for yourself.
Technical details: how the app and the grounding work
For readers who want the working parts rather than the surface, this appendix describes the structure at category level. The same patterns carry across our other work.
One engine, many athletes
The app is a single-file progressive web app assembled from a small set of shared modules: program rendering, set logging, session history, and storage. A given athlete's program is just data injected into that engine at build time. The two features added for this program (the cardio tracker and the Daily Fuel panel) are additive: they activate only for a program that asks for them, so every other athlete's build is byte-for-byte unchanged. New capability without regression is the point of the architecture.
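One way to picture the additive design is a build step that always includes the shared modules and appends an optional module only when the program data names it. The sketch below is in Python for illustration only. The module identifiers, the `features` key, and `build_modules` are assumptions, not the engine's real build code.

```python
# Hypothetical module names; the engine's real identifiers are not given in this writeup.
CORE_MODULES = ["program_rendering", "set_logging", "session_history", "storage"]
OPTIONAL_MODULES = ["cardio_tracker", "daily_fuel"]


def build_modules(program: dict) -> list[str]:
    """Return the modules a build includes: the shared core plus any opted-in extras."""
    requested = program.get("features", [])
    unknown = [name for name in requested if name not in OPTIONAL_MODULES]
    if unknown:
        raise ValueError(f"unknown feature(s): {unknown}")
    # Optional modules are appended in a fixed order, so the same request always
    # yields the same build, and a program with no "features" key gets the core alone.
    return CORE_MODULES + [name for name in OPTIONAL_MODULES if name in requested]


# A program that asks for nothing extra gets exactly the shared engine.
print(build_modules({"athlete": "example"}))
# A program that opts in gets the two additional trackers on top.
print(build_modules({"athlete": "example", "features": ["cardio_tracker", "daily_fuel"]}))
```

Because the default path never reads the optional list, adding a new optional module cannot change the output for programs that do not request it.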
Local-first, sync optional
Training data is written to the device first and stays there. Syncing to a database is a configuration choice, not a requirement — and the public demo ships with it switched off entirely, so the demo page contains no account credentials and makes no network calls for your data. Privacy by default is a structural property here, not a promise.
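The write path can be sketched as follows, assuming a store that always writes locally and calls a sync function only if one was configured. It is in Python for illustration, with a JSON file on disk standing in for on-device browser storage. `LocalFirstStore` and its `sync` parameter are hypothetical names, not the app's API.

```python
import json
import os
import tempfile
from typing import Callable, Optional


class LocalFirstStore:
    """Write every entry to a local file first; sync only if a sync function is configured."""

    def __init__(self, path: str, sync: Optional[Callable[[dict], None]] = None):
        self.path = path
        self.sync = sync  # None means sync is switched off, as in the public demo

    def load(self) -> list:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)

    def save(self, entry: dict) -> None:
        entries = self.load()
        entries.append(entry)
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(entries, handle)
        # The local write has already completed before any optional sync is attempted.
        if self.sync is not None:
            self.sync(entry)


# Usage: with sync left unset, saving touches only the local file.
demo_path = os.path.join(tempfile.mkdtemp(), "training_log.json")
store = LocalFirstStore(demo_path)
store.save({"exercise": "squat", "reps": 5})
print(store.load())
```

With `sync` left as `None` there is no code path that reaches a network, which is the sense in which privacy is structural in this sketch.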
The grounding standard
Every load-bearing claim in the program rests on a passage of verified source content, bound to it by a content fingerprint — not on a language model's unaided memory of what a study says, which is never acceptable as a citation. Two checks gate any claim before it ships:
- Ground-or-flag. Every factual claim is either tied to a captured source or explicitly marked as judgement. The reader is never left to guess which is which.
- Match check. The cited passage has to actually say what the claim says: a match on substance, not just a plausible-looking link. This is the check that caught the running-cue problem.
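The mechanical part of these checks can be sketched as below, in Python for illustration. The writeup does not say how the content fingerprint is computed, so the use of SHA-256 over whitespace-normalized text is an assumption, and `Claim`, `fingerprint`, and `check_claim` are hypothetical names. The sketch covers ground-or-flag and the binding of a passage to its source; whether the passage actually supports the claim is a reading judgement that code like this does not make.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not affect comparisons."""
    return " ".join(text.split())


def fingerprint(text: str) -> str:
    """Assumed fingerprint: SHA-256 of the whitespace-normalized passage."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


@dataclass
class Claim:
    text: str
    source_id: Optional[str] = None            # key into the captured sources
    passage: Optional[str] = None              # quoted supporting passage
    passage_fingerprint: Optional[str] = None  # recorded when the passage was captured
    is_judgement: bool = False


def check_claim(claim: Claim, captured_sources: dict[str, str]) -> str:
    """Return 'judgement' or 'grounded'; raise ValueError if the claim is neither."""
    if claim.is_judgement:
        return "judgement"
    if claim.source_id is None or claim.passage is None:
        raise ValueError("claim is neither grounded nor flagged as judgement")
    source_text = captured_sources.get(claim.source_id)
    if source_text is None:
        raise ValueError(f"source {claim.source_id!r} was never captured")
    if fingerprint(claim.passage) != claim.passage_fingerprint:
        raise ValueError("passage does not match its recorded fingerprint")
    if normalize(claim.passage) not in normalize(source_text):
        raise ValueError("passage not found in the captured source")
    return "grounded"
```

A claim that raises here never reaches the program. A claim that passes still needs a person to read the passage against it, which is the match check described above.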
What this does not guarantee
No method removes every source of error, and presenting one as if it did would contradict the standard the rest of this page describes. What the grounding guarantees is bounded:
- It establishes that load-bearing claims rest on verified source content.
- It establishes that judgement calls are labelled as judgement.
- It establishes that every citation can be audited by the reader.
- It does not establish that the underlying studies are correct on the merits, or that a finding from a small or analogous study generalizes.
- It does not turn a grounded program into medical or dietary advice. A hip that hurts and an aggressive cut are a clinician's and a dietitian's domain — the program says so, and is built to be programmed around those constraints, not through them.
Disclosing these limits is part of the method, not a departure from it.
Verification discipline, applied where you can check the work
This program was built for one person, on a subject any reader can sanity-check, precisely so the method is visible. The same grounding and the same build discipline carry to research and tools where the stakes are higher and the work is harder to audit from the outside.