OpenMic · FluentPlay
01 / 09
PAD v2 · Speech signal analysis

What the signal
tells us

Every word spoken generates a structured record of motor planning stability — at the session, word, syllable, and phoneme level.

Per-phoneme resolution · Adaptive personal baseline · Longitudinal tracking · Clinical override
02 / 09
Resolution stack

Four levels of analysis,
every utterance

01 Session: PAD score · stability floor · event cost · WPM · stutter count · disfluency rate · fluency index · σ variance · 3D PAD profile
02 Word: PAD score · gap · duration · syllable rate · confidence · disfluency flags · challenge tag · POA · chunk position
03 Syllable: Duration vs. speaker median · stress position · prolongation detection · voicing onset timing
04 Phoneme: Per-phoneme duration · intra-phoneme repetition · acoustic onset · coda analysis · IPA mapping

Most speech tools report a fluency score. OpenMic reports a structured acoustic record at four resolution levels — any of which can be the target of measurement, practice, or research.

03 / 09
Session-level view

The full picture of
a session's motor stability

PAD over session · per word
Disfluency feature stream · frame-level acoustic state
betty butter bought some butter but she said butter's bitter if I put it in my batter it will make my batter bitter but a bit of better butter will make my batter better
3D PAD Profile · session result
Stability 81 · Block-free 100 · Mean PAD 84 · WPM 45 · Fluency 76 · Voiced 91
Session PAD: 84
Floor: 93
Stutters detected: 5
Disfluencies: 7
Challenge word hits: 0
Top POA: Bilabial (5)
Words: 35
Duration: 17s
04 / 09
Word-level analysis

Every word is a
scored motor event

"bitter" PAD 68 · 2 syl · 495ms/syl
Base score: 100
Duration 495ms/syl (baseline 265ms): Penalty
Repetition detected: Penalty
Azure insertion error → part-word: Penalty
Final PAD: 68
Classified: Part-word repetition · Stutter
Place of articulation: Bilabial
  • Inter-word gap — silence before word onset, compared to that speaker's rolling median
  • Syllable rate — duration per syllable vs. speaker baseline; prolongation fires at 1.8×
  • Disfluency flags — block, prolongation, repetition, filler, articulation error, omission — scored independently
  • Stutter convergence — fires when two or more flags converge (or on any flag for a challenge word), not on a single threshold; this keeps the classification clinically meaningful rather than over-sensitive
  • Challenge word tag — feared words scored at elevated sensitivity; any flag = stutter classification
  • POA mapping — onset phoneme cross-referenced against user's declared motor difficulty zones
  • Dual-source detection — Azure word timing + DFS acoustic stream; each catches what the other misses
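
The syllable-rate check above can be sketched in a few lines. This is a minimal illustration, not the product's implementation: the function names are hypothetical, and the strict "greater than" comparison is an assumption. The 1.8× threshold and the 495ms/265ms figures for "bitter" come from the slides.

```python
PROLONGATION_RATIO = 1.8  # threshold quoted in the syllable-rate bullet


def syllable_rate_ratio(ms_per_syl: float, baseline_ms_per_syl: float) -> float:
    """Per-syllable duration divided by the speaker's baseline (hypothetical helper)."""
    if baseline_ms_per_syl <= 0:
        raise ValueError("baseline must be positive")
    return ms_per_syl / baseline_ms_per_syl


def is_prolonged(ms_per_syl: float, baseline_ms_per_syl: float,
                 threshold: float = PROLONGATION_RATIO) -> bool:
    """True when the ratio exceeds the threshold (strict comparison is assumed)."""
    return syllable_rate_ratio(ms_per_syl, baseline_ms_per_syl) > threshold


# "bitter": 495 ms/syl against a 265 ms baseline
ratio = syllable_rate_ratio(495, 265)
print(f"bitter: {ratio:.2f}x baseline, prolonged={is_prolonged(495, 265)}")
```

For "bitter" the ratio is about 1.87, so the prolongation flag fires. A word at 410ms/syl against the same baseline (about 1.55×) would not trip this check on its own.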
05 / 09
Phoneme-level resolution

The instability inside the word

The acoustic signal resolves to individual phonemes — duration, voicing onset, and intra-phoneme repetition. A word can score 99% recognition confidence and still contain a detectable motor planning failure at the phoneme level.

Phonemes heard — "bitter"
B (419ms) · IH · T · ER · ER (repeat)
⚠ Intra-phoneme repetition · Coda B · 419ms · 1.8× word mean
Duration bar · phoneme timeline

What phoneme-level data unlocks

  • Intra-phoneme repetition — two voicing onsets within a single phoneme window; invisible to word-level scoring alone
  • Coda vs. onset asymmetry — where in the syllable the instability fires; critical for POA-targeted practice
  • Duration outliers per phoneme — which specific sound is held, not just that the word was slow
  • IPA mapping — phoneme identity cross-referenced against declared POA difficulty zones
  • Research relevance — per-phoneme duration and voicing onset data matches the variables measured in auditory feedback perturbation studies — now available in naturalistic speech, continuously
Why this matters for stuttering: Most stuttering occurs at word-initial consonants, particularly bilabials and velars. Phoneme-level resolution tells you exactly which sound is failing, not just which word.
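
One way to picture intra-phoneme repetition detection is to count voicing onsets inside a single phoneme window. The sketch below assumes the acoustic stream can be reduced to one boolean "voiced" flag per frame. That representation and both function names are hypothetical, not the product's actual data model.

```python
from typing import Sequence


def count_voicing_onsets(voiced: Sequence[bool]) -> int:
    """Count unvoiced-to-voiced transitions; a voiced first frame counts as an onset."""
    onsets = 0
    previous = False
    for frame in voiced:
        if frame and not previous:
            onsets += 1
        previous = frame
    return onsets


def has_intra_phoneme_repetition(voiced: Sequence[bool]) -> bool:
    """Two or more onsets inside one phoneme window suggests a repeated attempt."""
    return count_voicing_onsets(voiced) >= 2


# Illustrative frames for one phoneme window: voiced, pause, voiced again
window = [False, True, True, False, False, True, True, True]
print(count_voicing_onsets(window), has_intra_phoneme_repetition(window))
```

A real detector would also need a minimum pause length so that single-frame dropouts are not counted as separate productions.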
06 / 09
Personalized baseline

Scoring is relative
to the speaker, not a norm

Adaptive baseline

Rolling 30-word window

Gap, duration, and rate deviations are scored against that speaker's own rolling median. A naturally slow speaker and a fast speaker can both score 100.
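
A rolling-median baseline of this kind could look like the following sketch. The 30-word window comes from the slide. The class name, its methods, and the choice to express deviation as a simple ratio are assumptions made for illustration.

```python
from collections import deque
from statistics import median
from typing import Optional


class RollingBaseline:
    """Keeps the most recent `window` observations and reports their median."""

    def __init__(self, window: int = 30):
        self._values = deque(maxlen=window)  # oldest values drop off automatically

    def update(self, value: float) -> None:
        self._values.append(value)

    def median(self) -> Optional[float]:
        return median(self._values) if self._values else None

    def deviation_ratio(self, value: float) -> Optional[float]:
        """Observation relative to the speaker's own median (1.0 means typical)."""
        base = self.median()
        if not base:
            return None
        return value / base


# A slow and a fast speaker, each producing a word at their own typical rate
slow, fast = RollingBaseline(), RollingBaseline()
for _ in range(30):
    slow.update(400.0)  # ms per syllable, illustrative values
    fast.update(200.0)
print(slow.deviation_ratio(400.0), fast.deviation_ratio(200.0))
```

Both speakers get a ratio of 1.0 at their own typical rate, which is why neither is penalized for habitual tempo.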

Challenge words

Feared word sensitivity

User-declared feared words scored at elevated sensitivity. Any disfluency flag on a challenge word is classified as a stutter — tracking anticipatory motor load directly.

birthday · butter · business
Motor difficulty zones

POA mapping

User declares which places of articulation cause motor difficulty. Any word whose onset phoneme uses a declared POA is scored with elevated sensitivity — independent of challenge word tags.
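
As a rough sketch, POA mapping can be modeled as a lookup from onset phoneme to place of articulation, checked against the zones the user declared. The table below is a partial set of consonants in ARPAbet-style symbols (an assumption based on the B, IH, T, ER labels shown earlier). The function names are hypothetical.

```python
from typing import Optional, Set

# Partial consonant table; symbols are assumed to be ARPAbet-style.
PLACE_BY_PHONEME = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "TH": "dental", "DH": "dental",
    "T": "alveolar", "D": "alveolar", "N": "alveolar",
    "S": "alveolar", "Z": "alveolar", "L": "alveolar",
    "K": "velar", "G": "velar", "NG": "velar",
    "HH": "glottal",
}


def place_of_articulation(onset: str) -> Optional[str]:
    """Place for an onset phoneme, or None if it is not in the table."""
    return PLACE_BY_PHONEME.get(onset.upper())


def in_declared_zone(onset: str, declared_zones: Set[str]) -> bool:
    """True when the onset's place is one the user declared as difficult."""
    place = place_of_articulation(onset)
    return place is not None and place in declared_zones


declared = {"bilabial"}
print(in_declared_zone("B", declared), in_declared_zone("F", declared))
```

With "bilabial" declared, words such as "birthday", "butter", and "business" all qualify through their B onset, whether or not they are also tagged as challenge words.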

In practice: The PAD score for a given word reflects that speaker's motor planning reality — accounting for their typical rate, feared vocabulary, and known articulatory challenge zones. Progress is measured against the speaker's own history, longitudinally, across sessions.

07 / 09
Waveform inspection layer

Any frame, any moment —
isolate and listen

Audio waveform · word region
Selection handles: start · end
Isolated 159ms · drag handles to adjust · click to start new isolation
Markers: Azure recognition · Acoustic onset
Part-word repetition (Azure insertion)

Intra-phoneme acoustic detection: Phoneme F contains 2 voicing onsets in its 170ms audio window — Azure assigned a single phoneme but the acoustic signal shows 2 separate productions (F-F).

⚠ Also classified as stutter
Convergence rule(s) triggered · legacy classification

What this layer enables

  • Frame-level isolation — drag handles to select any window within a word's audio region; inspect exactly the moment of instability
  • Azure vs. acoustic comparison — two markers on the waveform show where Azure recognition fired vs. where the acoustic onset actually occurred; gap between them is measurable
  • Intra-phoneme detection — when the acoustic signal shows two productions inside a window Azure classified as one phoneme, the discrepancy is flagged and the region is isolatable
  • Playback modes — play the isolated selection, play the full canonical word, or play the word as produced; compare what was intended vs. what was delivered
  • Self-analysis — speaker can hear the exact frame where the motor plan broke down; not a score but a direct auditory confrontation with the event
  • Clinical use — SLP can isolate, annotate, and use the waveform as a teaching surface; ground truth override applies to the same word
"for" · PAD 70 · 1 syl · 410ms/syl
A single-syllable function word at 410ms (1.55× the speaker's baseline) with 2 voicing onsets inside the F phoneme window. Azure heard one phoneme. The acoustic signal recorded two attempts.
08 / 09
Disfluency taxonomy + clinical layer

Six types. Auto-detected.
Clinician-correctable.

Block

Building state >400ms without voicing onset.

Prolongation

Duration >1.8× speaker median per syllable.

Repetition

Word or part-word; intra-phoneme detection.

Filler

Planning load signal, not motor disruption.

Articulation error

Phoneme substitution or distortion. Relevant for motor speech disorders.

Omission

Word skipped vs. scripted reference. Completeness signal.

Stutter fires only on convergence — two or more flags, or any flag on a challenge word.
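
The convergence rule stated above is small enough to write out directly. In this sketch the enum and function names are hypothetical. The rule itself (two or more flags, or any flag on a challenge word) is taken from the slide. How POA-declared words adjust sensitivity is not specified in the deck, so it is left out.

```python
from enum import Enum
from typing import Iterable


class Flag(Enum):
    BLOCK = "block"
    PROLONGATION = "prolongation"
    REPETITION = "repetition"
    FILLER = "filler"
    ARTICULATION_ERROR = "articulation_error"
    OMISSION = "omission"


def is_stutter(flags: Iterable[Flag], is_challenge_word: bool = False) -> bool:
    """Two or more distinct flags, or any single flag on a challenge word."""
    distinct = set(flags)
    required = 1 if is_challenge_word else 2
    return len(distinct) >= required


# "bitter" carried a duration penalty and a repetition penalty
print(is_stutter([Flag.PROLONGATION, Flag.REPETITION]))
# A lone filler is not a stutter unless the word is a declared challenge word
print(is_stutter([Flag.FILLER]), is_stutter([Flag.FILLER], is_challenge_word=True))
```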

Ground truth override
✓ Mark Fluent · Stutter · Block · Prolongation · Repetition · Filler · Articulation · Omission

Override propagates immediately through all counters, filters, and session metrics. Every data point is editable — auto-detection is a starting point, not a verdict.
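
One simple design that gives immediate propagation is to store the clinician's override next to the automatic label and derive every counter from the effective label on demand. The sketch below illustrates that idea only. The record fields, label strings, and function name are assumptions, not the product's schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordRecord:
    text: str
    auto_label: str                 # label from automatic detection
    override: Optional[str] = None  # clinician's ground-truth label, if any

    @property
    def label(self) -> str:
        """Effective label: the override wins whenever one is set."""
        return self.override if self.override is not None else self.auto_label


def label_counts(words: List[WordRecord]) -> Counter:
    """Counters are recomputed from effective labels, never cached."""
    return Counter(w.label for w in words)


words = [
    WordRecord("bitter", "stutter"),
    WordRecord("for", "stutter"),
    WordRecord("butter", "fluent"),
]
before = label_counts(words)
words[1].override = "fluent"  # clinician marks "for" as fluent
after = label_counts(words)
print(before["stutter"], after["stutter"])
```

Because nothing is cached, any filter or session metric built on `label` reflects the override the next time it is read.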

Roadmap: speaker model training
Ground truth assignments accumulate into a speaker-specific recognition model, improving accuracy on that speaker's idiolect and reducing the false-positive rate over time.
09 / 09
What can be practiced and measured

Every dimension is a
target for improvement

Fluency
  • Session PAD score
  • Stability floor
  • PAD variance (σ)
  • Block-free rate
  • Stutter frequency
  • Disfluency rate by type
Timing
  • Words per minute
  • Syllable rate (ms/syl)
  • Inter-word gap distribution
  • Prolongation frequency
  • Block duration
  • Phoneme duration outliers
Context
  • Challenge word hit rate
  • POA-specific disfluency rate
  • Stressed vs. unstressed instability
  • Session-over-session PAD trend
  • Scripted vs. unscripted comparison
  • Warm-up vs. fatigue window analysis

The instrument doesn't prescribe what to practice. It exposes every dimension of the motor planning signal — and tracks progress on whichever dimensions the speaker, clinician, or researcher chooses to target.
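
As one example of tracking a chosen dimension, a session-over-session PAD trend could be summarized as a least-squares slope. This is a sketch under assumptions: the deck does not say how the trend is computed, the function name is hypothetical, and the session scores below are made-up illustrative values.

```python
from typing import Sequence


def pad_trend(session_pads: Sequence[float]) -> float:
    """Least-squares slope of PAD against session index (points per session)."""
    n = len(session_pads)
    if n < 2:
        raise ValueError("need at least two sessions")
    mean_x = (n - 1) / 2
    mean_y = sum(session_pads) / n
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(session_pads))
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    return sxy / sxx


# Illustrative session PAD scores, oldest first
print(round(pad_trend([78, 80, 84]), 2))
```

A positive slope means the session PAD is rising over time. The same function applies to any per-session metric in the lists above, such as block-free rate or stutter frequency.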