For clinicians

Measurement that respects her — and respects you

The app reports. The clinician judges. This page is the full measurement model behind that line.

Why accuracy alone doesn't work here

ALPINE is built to be errorless: the design keeps her error rate low on purpose, so a single accuracy percentage carries very little clinical information — it saturates near ceiling quickly and stops discriminating between "just met the target" and "flying." Instead of one accuracy number, learning shows up in two complementary ways:

  1. The support shift. The proportion of her sentence-builds completed with no error-triggered support at all, rising over sessions for a given target. Because our nudges only ever fire in response to an actual mismatch (not on a fixed schedule), this shift is a direct read on her growing independence.
  2. Fluency past the ceiling. Independent builds per minute, for a target she has already stopped needing support on. This keeps discriminating progress even after the support-shift has maxed out — the same "celeration" idea used in precision-teaching.

The support taxonomy

Every completed sentence-build is classified into exactly one of four levels, derived automatically from her taps — nothing extra for her to do, nothing visible to her.

LevelWhat it meansCounts toward advancement?
L0 — independentNo nudge ever fired, no reselection. A clean, unprompted build.Yes — the criterion metric
L0b — self-correctedShe changed a tile before finishing, but no nudge ever fired. Shown as its own signal — an emerging-mastery sign — but kept separate.No — displayed, not counted
L1 — visual nudgeAt least one gentle "look again" nudge fired for a tile that didn't fit the grammar rule.No
L2 — prompt redirectAt least one photo-mismatch redirect fired (the sentence didn't match what's in the photo).No

Only true L0 builds count toward the advance criterion below — L0b is deliberately excluded from that math, even though it is displayed, because a self-correction still involved an internal mismatch worth seeing separately from a fully anticipatory, unprompted build.

The advance criterion — evidence, not a verdict

The default reference criterion is about 90% independent (L0) builds across three sittings, on at least two distinct days. It is deliberately labeled a reference, not a rule: every threshold — the percentage, the number of sittings, the day-spread requirement — is configurable by the SLP for this learner. The figure traces back to a small published study (N = 3 children); that population caveat is displayed alongside the criterion everywhere it appears, not buried in a footnote.

A sitting is derived automatically, not hand-logged: it's the longest run of activity where no gap between events exceeds 30 minutes (also SLP-adjustable). This keeps "three sittings" meaningful even though the tablet is used in short, irregular bursts rather than scheduled clinical sessions.

The evidence table

The dashboard's central artifact is a per-target evidence table. Every row names what the data bears on — never what to do about it — and every figure shows its raw denominator so a small sample is never dressed up as a strong signal.

MetricWhat it showsWhat it's evidence forMinimum data
Independent rate% of builds at L0 (L0b excluded), this session and last three sessionsAdvancement, against the SLP's criterion≥5 builds/session
Support-shift trajectory%L0 vs %L0b vs %L1+L2, per session, over timeFading or adjusting support≥5 builds/session
Fluency rateIndependent builds per minute, within a sessionProgress past the accuracy ceiling≥3 builds
Trials-to-criterionBuilds needed until criterion is first metTeaching efficiency; comparing conditionsn/a
Cold-probe rateEach session's first build on a target, treated as a natural probeRegression flag; maintenance evidence≥3 probes
Error anatomyWhich grammar rules and tile choices trigger support, from the tap recordWhat to re-teach or emphasize next≥3 support events
Self-correction rateL0b / (L0 + L0b)Emerging-mastery nuance (display-only)≥5 builds
Side-bias checkTile position chosen vs. correct position, per columnGuessing alert≥20 taps/column
AbandonmentBuilds started but never finished, and where in the sessionEngagement≥10 builds

Rows phrase results as evidence ("9 of 10 independent, last 3 sessions; criterion reference 90%×3") — never as a readiness or regression verdict. Below the floor, the dashboard says so plainly: "insufficient data" rather than guessing.

The dashboard

The dashboard is the guardian- and clinician-facing view of everything above: a session log, a per-map view, and the export tools described below.

The guardian and clinician dashboard's sign-in page.
Dashboard sign-in.
The dashboard's Practice tab, showing a session log, a per-map view, and export buttons.
Session log, per-map view, and exports.
A close-up of the dashboard's CSV export buttons.
CSV exports, close up.

The CSV exports

Every trial is captured as detailed tap-by-tap telemetry, invisible to her and free of any extra step on her side. The CSV exports are the primary clinical artifact — built so an SLP can bring the data into her own tools rather than being limited to our charts.

ExportGrainContains
Trials One row per sentence-build IDs and timestamps, pack and target, support level (L0/L0b/L1/L2), error anatomy summary, condition stamp, cold-probe flag, sitting membership.
Taps One row per tile tap, long format Timestamp within trial, slot and tile identity, position in column, reselection flag, feedback fired (continue / look-again / prompt-mismatch / reward), and any rule violated.
Pack registry One row per pack version Slots, choice sizes, distractor classes, grammar targets, and sentence length — joined against trials so a condition or pack change is always interpretable in context.

The one chart the dashboard always shows is a per-target time series: percent independent (solid line), percent self-corrected (dotted), percent supported (muted), with phase markers at any pack or map change and cold-probe points overlaid — no smoothing, no cumulative tricks.

What the app deliberately does not do

A short list of anti-metrics commitments, held on purpose:

  • No overall accuracy headline
  • No mastery badges
  • No cross-target score
  • No auto-advancement
  • No camera, gaze, or affect capture
  • No learner-visible metrics
  • No third-party analytics

The app reports. The clinician judges.

All data belongs to the family: single-tenant storage, indefinite retention until the family deletes it, access limited to one account. Every criterion default is SLP-configurable, and every population caveat behind a default (like the N = 3 study behind the 90%×3 reference) is shown alongside the number, not hidden.

Episode 3

The adult side: practice mode, packs, advancing maps

Episode 5

Measurement: dashboard, evidence, CSVs