For clinicians — ALPINE AI

Why accuracy alone doesn't work here

ALPINE is built to be errorless: the design keeps her error rate low on purpose, so a single accuracy percentage carries very little clinical information — it saturates near ceiling quickly and stops discriminating between "just met the target" and "flying." Instead of one accuracy number, learning shows up in two complementary ways:

The support shift. The proportion of her sentence-builds completed with no error-triggered support at all, rising over sessions for a given target. Because our nudges only ever fire in response to an actual mismatch (not on a fixed schedule), this shift is a direct read on her growing independence.
Fluency past the ceiling. Independent builds per minute, for a target she has already stopped needing support on. This keeps discriminating progress even after the support-shift has maxed out — the same "celeration" idea used in precision-teaching.

The support taxonomy

Every completed sentence-build is classified into exactly one of four levels, derived automatically from her taps — nothing extra for her to do, nothing visible to her.

Level	What it means	Counts toward advancement?
L0 — independent	No nudge ever fired, no reselection. A clean, unprompted build.	Yes — the criterion metric
L0b — self-corrected	She changed a tile before finishing, but no nudge ever fired. Shown as its own signal — an emerging-mastery sign — but kept separate.	No — displayed, not counted
L1 — visual nudge	At least one gentle "look again" nudge fired for a tile that didn't fit the grammar rule.	No
L2 — prompt redirect	At least one photo-mismatch redirect fired (the sentence didn't match what's in the photo).	No

Only true L0 builds count toward the advance criterion below — L0b is deliberately excluded from that math, even though it is displayed, because a self-correction still involved an internal mismatch worth seeing separately from a fully anticipatory, unprompted build.

The advance criterion — evidence, not a verdict

The default reference criterion is about 90% independent (L0) builds across three sittings, on at least two distinct days. It is deliberately labeled a reference, not a rule: every threshold — the percentage, the number of sittings, the day-spread requirement — is configurable by the SLP for this learner. The figure traces back to a small published study (N = 3 children); that population caveat is displayed alongside the criterion everywhere it appears, not buried in a footnote.

A sitting is derived automatically, not hand-logged: it's the longest run of activity where no gap between events exceeds 30 minutes (also SLP-adjustable). This keeps "three sittings" meaningful even though the tablet is used in short, irregular bursts rather than scheduled clinical sessions.

The evidence table

The dashboard's central artifact is a per-target evidence table. Every row names what the data bears on — never what to do about it — and every figure shows its raw denominator so a small sample is never dressed up as a strong signal.

Metric	What it shows	What it's evidence for	Minimum data
Independent rate	% of builds at L0 (L0b excluded), this session and last three sessions	Advancement, against the SLP's criterion	≥5 builds/session
Support-shift trajectory	%L0 vs %L0b vs %L1+L2, per session, over time	Fading or adjusting support	≥5 builds/session
Fluency rate	Independent builds per minute, within a session	Progress past the accuracy ceiling	≥3 builds
Trials-to-criterion	Builds needed until criterion is first met	Teaching efficiency; comparing conditions	n/a
Cold-probe rate	Each session's first build on a target, treated as a natural probe	Regression flag; maintenance evidence	≥3 probes
Error anatomy	Which grammar rules and tile choices trigger support, from the tap record	What to re-teach or emphasize next	≥3 support events
Self-correction rate	L0b / (L0 + L0b)	Emerging-mastery nuance (display-only)	≥5 builds
Side-bias check	Tile position chosen vs. correct position, per column	Guessing alert	≥20 taps/column
Abandonment	Builds started but never finished, and where in the session	Engagement	≥10 builds

Rows phrase results as evidence ("9 of 10 independent, last 3 sessions; criterion reference 90%×3") — never as a readiness or regression verdict. Below the floor, the dashboard says so plainly: "insufficient data" rather than guessing.

The dashboard

The dashboard is the guardian- and clinician-facing view of everything above: a session log, a per-map view, and the export tools described below.

The guardian and clinician dashboard's sign-in page. — Dashboard sign-in.

The dashboard's Practice tab, showing a session log, a per-map view, and export buttons. — Session log, per-map view, and exports.

A close-up of the dashboard's CSV export buttons. — CSV exports, close up.

The CSV exports

Every trial is captured as detailed tap-by-tap telemetry, invisible to her and free of any extra step on her side. The CSV exports are the primary clinical artifact — built so an SLP can bring the data into her own tools rather than being limited to our charts.

Export	Grain	Contains
Trials	One row per sentence-build	IDs and timestamps, pack and target, support level (L0/L0b/L1/L2), error anatomy summary, condition stamp, cold-probe flag, sitting membership.
Taps	One row per tile tap, long format	Timestamp within trial, slot and tile identity, position in column, reselection flag, feedback fired (continue / look-again / prompt-mismatch / reward), and any rule violated.
Pack registry	One row per pack version	Slots, choice sizes, distractor classes, grammar targets, and sentence length — joined against trials so a condition or pack change is always interpretable in context.

The one chart the dashboard always shows is a per-target time series: percent independent (solid line), percent self-corrected (dotted), percent supported (muted), with phase markers at any pack or map change and cold-probe points overlaid — no smoothing, no cumulative tricks.

What the app deliberately does not do

A short list of anti-metrics commitments, held on purpose:

No overall accuracy headline
No mastery badges
No cross-target score
No auto-advancement
No camera, gaze, or affect capture
No learner-visible metrics
No third-party analytics

The app reports. The clinician judges.

All data belongs to the family: single-tenant storage, indefinite retention until the family deletes it, access limited to one account. Every criterion default is SLP-configurable, and every population caveat behind a default (like the N = 3 study behind the 90%×3 reference) is shown alongside the number, not hidden.

Episode 3

The adult side: practice mode, packs, advancing maps

Episode 5

Measurement that respects her — and respects you