OpenMic · FluentPlay


PAD v2 · Speech signal analysis

What the signal
tells us

Every word spoken generates a structured record of motor planning stability — at the session, word, syllable, and phoneme level.

Per-phoneme resolution · Adaptive personal baseline · Longitudinal tracking · Clinical override


Resolution stack

Four levels of analysis,
every utterance

01 Session: PAD score · stability floor · event cost · WPM · stutter count · disfluency rate · fluency index · σ variance · 3D PAD profile

02 Word: PAD score · gap · duration · syllable rate · confidence · disfluency flags · challenge tag · POA · chunk position

03 Syllable: Duration vs. speaker median · stress position · prolongation detection · voicing onset timing

04 Phoneme: Per-phoneme duration · intra-phoneme repetition · acoustic onset · coda analysis · IPA mapping

Most speech tools report a single fluency score. OpenMic reports a structured acoustic record at four resolution levels, any of which can be the target of measurement, practice, or research.
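One way to hold the four levels in memory is as nested records, each level owning the one below. This is a hypothetical layout, not OpenMic's schema: the class and field names (`Phoneme`, `Syllable`, `Word`, `Session`, `onsets`, `gap_ms`) are assumptions chosen to mirror the fields listed above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Phoneme:
    symbol: str          # ARPAbet-style label such as "B" (assumed label set)
    start_ms: float
    end_ms: float
    onsets: int = 1      # acoustic onsets detected inside this phoneme's window

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


@dataclass
class Syllable:
    phonemes: List[Phoneme]
    stressed: bool = False

    @property
    def duration_ms(self) -> float:
        # Assumes contiguous phonemes: the syllable spans first start to last end.
        return self.phonemes[-1].end_ms - self.phonemes[0].start_ms


@dataclass
class Word:
    text: str
    syllables: List[Syllable]
    gap_ms: float = 0.0                           # silence before word onset
    flags: Set[str] = field(default_factory=set)  # e.g. {"repetition"}
    pad: Optional[float] = None                   # filled in by scoring

    @property
    def ms_per_syllable(self) -> float:
        return sum(s.duration_ms for s in self.syllables) / len(self.syllables)


@dataclass
class Session:
    words: List[Word]

    def phonemes(self) -> Iterator[Phoneme]:
        # Walk down to the finest level; every level stays addressable.
        for word in self.words:
            for syllable in word.syllables:
                yield from syllable.phonemes
```

Because every level is a plain record, a query at any resolution (all phonemes with more than one onset, all words above a gap threshold) is a simple traversal.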


Session-level view

The full picture of
a session's motor stability

PAD over session · per word

Disfluency feature stream · frame-level acoustic state

betty butter bought some butter but she said butter's bitter if I put it in my batter it will make my batter bitter but a bit of better butter will make my batter better

3D PAD Profile · session result


Word-level analysis

Every word is a
scored motor event

"bitter" · PAD 68 · 2 syl · 495ms/syl

Base score: 100
Duration 495ms/syl (baseline 265ms): penalty
Repetition detected: penalty
Azure insertion error → part-word: penalty
Final PAD: 68

Classified: Part-word repetition · Stutter
Place of articulation: Bilabial
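The scorecard reads as a base of 100 minus independent penalties. A minimal sketch under stated assumptions: the penalty weights, the linear duration penalty, and the flag names are placeholders, since the deck does not give PAD v2's actual values.

```python
from typing import Dict, Iterable

# Hypothetical weights: PAD v2's actual penalty values are not stated in this deck.
FLAG_WEIGHTS: Dict[str, float] = {
    "repetition": 10.0,
    "insertion": 8.0,       # Azure insertion error reinterpreted as part-word repetition
    "block": 15.0,
    "prolongation": 6.0,
    "filler": 3.0,
    "articulation_error": 8.0,
    "omission": 10.0,
}


def word_pad(ms_per_syl: float, baseline_ms: float, flags: Iterable[str],
             duration_weight: float = 10.0,
             flag_weights: Dict[str, float] = FLAG_WEIGHTS) -> float:
    """Start at 100, subtract a duration penalty and one penalty per distinct flag."""
    score = 100.0
    ratio = ms_per_syl / baseline_ms
    # Only words slower than the speaker's baseline are penalized for duration.
    score -= duration_weight * max(0.0, ratio - 1.0)
    score -= sum(flag_weights.get(flag, 0.0) for flag in set(flags))
    # Clamp so the score stays on the 0-100 scale.
    return max(0.0, min(100.0, score))
```

With these placeholder weights the "bitter" example does not land on 68; the sketch only shows the shape of the calculation (base, additive penalties, clamp).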

Inter-word gap — silence before word onset, compared to that speaker's rolling median
Syllable rate — duration per syllable vs. speaker baseline; prolongation fires at 1.8×
Disfluency flags — block, prolongation, repetition, filler, articulation error, omission — scored independently
Stutter convergence — fires only when multiple signals agree, not on a single threshold; clinically meaningful without being over-sensitive
Challenge word tag — feared words scored at elevated sensitivity; any flag = stutter classification
POA mapping — onset phoneme cross-referenced against user's declared motor difficulty zones
Dual-source detection — Azure word timing + DFS acoustic stream; each catches what the other misses
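Dual-source detection amounts to a union of two flag sets, keeping track of which detector raised each flag so the streams can be compared afterwards. This sketch assumes both streams have already been aligned to shared word ids; the alignment step itself, the function name, and the source labels "azure" and "dfs" are illustrative.

```python
from typing import Dict, Hashable, Set

FlagMap = Dict[Hashable, Set[str]]


def merge_sources(azure: FlagMap, dfs: FlagMap) -> Dict[Hashable, Dict[str, Set[str]]]:
    """Union per-word flags from both detectors, recording which source raised each flag."""
    merged: Dict[Hashable, Dict[str, Set[str]]] = {}
    for source, flags_by_word in (("azure", azure), ("dfs", dfs)):
        for word_id, flags in flags_by_word.items():
            for flag in flags:
                merged.setdefault(word_id, {}).setdefault(flag, set()).add(source)
    return merged
```

Keeping provenance makes the "each catches what the other misses" claim checkable: flags with a single source are exactly the cases one detector would have missed.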


Phoneme-level resolution

The instability inside the word

The acoustic signal resolves to individual phonemes — duration, voicing onset, and intra-phoneme repetition. A word can score 99% recognition confidence and still contain a detectable motor planning failure at the phoneme level.

Phonemes heard — "bitter"

B · 419ms
IH · —
T · —
ER · —
ER · repeat

⚠ Intra-phoneme repetition · Onset B · 419ms · 1.8× word mean

Duration bar · phoneme timeline

What phoneme-level data unlocks

Intra-phoneme repetition — two voicing onsets within a single phoneme window; invisible to word-level scoring alone
Coda vs. onset asymmetry — where in the syllable the instability fires; critical for POA-targeted practice
Duration outliers per phoneme — which specific sound is held, not just that the word was slow
IPA mapping — phoneme identity cross-referenced against declared POA difficulty zones
Research relevance — per-phoneme duration and voicing-onset data correspond to the variables measured in auditory feedback perturbation studies, now available continuously in naturalistic speech
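Intra-phoneme repetition and per-phoneme duration outliers can both be sketched over simple inputs. Two assumptions: onsets are counted on a frame-energy envelope with hysteresis, a stand-in for whatever voicing detector the pipeline actually uses; and the outlier rule compares each phoneme with the mean phoneme duration of its word, using the 1.8× figure from the "bitter" example. Function names and thresholds are hypothetical.

```python
from typing import List, Sequence


def count_onsets(energy: Sequence[float], on: float, off: float,
                 min_gap_frames: int = 2) -> int:
    """Count rising crossings of `on` in a frame-energy envelope.

    Hysteresis: after an onset, the envelope must stay below `off` for
    `min_gap_frames` consecutive frames before another onset can register,
    so a brief dip inside one production is not counted twice.
    """
    onsets, active, below = 0, False, 0
    for e in energy:
        if not active:
            if e >= on:
                onsets += 1
                active, below = True, 0
        elif e < off:
            below += 1
            if below >= min_gap_frames:
                active = False
        else:
            below = 0
    return onsets


def duration_outliers(durations_ms: Sequence[float], ratio: float = 1.8) -> List[int]:
    """Indices of phonemes lasting at least `ratio` times the word's mean phoneme duration."""
    mean = sum(durations_ms) / len(durations_ms)
    return [i for i, d in enumerate(durations_ms) if d >= ratio * mean]
```

A count above 1 inside one phoneme window is the intra-phoneme repetition case; word-level scoring never sees it because the word still has one recognized phoneme sequence.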

Why this matters for stuttering: Stuttering occurs most often on word-initial consonants, particularly bilabials and velars. Phoneme-level resolution identifies exactly which sound is failing, not just which word.


Personalized baseline

Scoring is relative
to the speaker, not a norm

Adaptive baseline

Rolling 30-word window

Gap, duration, and rate deviations are scored against that speaker's own rolling median. A naturally slow speaker and a fast speaker can both score 100.
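The rolling baseline can be sketched as a bounded window with a median. `RollingBaseline` is a hypothetical name; the 30-word window comes from the slide, while scoring each word before adding it to the window (so a word is never compared against itself) is an assumed design choice.

```python
from collections import deque
from statistics import median


class RollingBaseline:
    """Rolling median of one per-word measure, e.g. ms per syllable or inter-word gap."""

    def __init__(self, window: int = 30):
        self.values = deque(maxlen=window)   # oldest values drop out automatically

    def ratio(self, value: float) -> float:
        """Value as a multiple of the current median (1.0 = typical for this speaker)."""
        if not self.values:
            return 1.0                        # no history yet: treat as typical
        return value / median(self.values)

    def update(self, value: float) -> None:
        self.values.append(value)
```

With a median of 265ms, a 410ms/syl word gives a ratio of about 1.55 (the "for" example), and 495ms/syl gives about 1.87, above the 1.8× prolongation line. A slow speaker whose median is 410ms scores the same 410ms word at 1.0.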

Challenge words

Feared word sensitivity

User-declared feared words scored at elevated sensitivity. Any disfluency flag on a challenge word is classified as a stutter — tracking anticipatory motor load directly.

birthday · butter · business

Motor difficulty zones

POA mapping

User declares which places of articulation cause motor difficulty. Any word whose onset phoneme uses a declared POA is scored with elevated sensitivity — independent of challenge word tags.
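Challenge-word tags and POA zones are two independent triggers for elevated sensitivity. This sketch expresses "elevated sensitivity" as a lower prolongation threshold; the 1.5 value, the ARPAbet phoneme labels, and the function names are assumptions. The place-of-articulation table covers English consonants only.

```python
from typing import Optional, Sequence, Set

# Place of articulation for ARPAbet consonants (partial table; vowels map to None).
POA = {
    "P": "bilabial", "B": "bilabial", "M": "bilabial",
    "F": "labiodental", "V": "labiodental",
    "TH": "dental", "DH": "dental",
    "T": "alveolar", "D": "alveolar", "N": "alveolar",
    "S": "alveolar", "Z": "alveolar", "L": "alveolar",
    "SH": "postalveolar", "ZH": "postalveolar",
    "CH": "postalveolar", "JH": "postalveolar",
    "Y": "palatal",
    "K": "velar", "G": "velar", "NG": "velar",
    "HH": "glottal",
}


def onset_poa(phonemes: Sequence[str]) -> Optional[str]:
    return POA.get(phonemes[0]) if phonemes else None


def prolongation_threshold(word: str, phonemes: Sequence[str],
                           challenge_words: Set[str], difficult_poa: Set[str],
                           base: float = 1.8, elevated: float = 1.5) -> float:
    """Ratio to the speaker median above which prolongation fires for this word.

    Challenge words and declared POA zones are checked independently; either
    one lowers the threshold (higher sensitivity). `elevated` is a placeholder.
    """
    is_challenge = word.lower().strip(".,!?;:\"'") in challenge_words
    in_zone = onset_poa(phonemes) in difficult_poa
    return elevated if (is_challenge or in_zone) else base
```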

In practice: The PAD score for a given word reflects that speaker's motor planning reality — accounting for their typical rate, feared vocabulary, and known articulatory challenge zones. Progress is measured against the speaker's own history, longitudinally, across sessions.


Waveform inspection layer

Any frame, any moment —
isolate and listen

Audio waveform · word region (start and end handles)

Isolated 159ms · drag handles to adjust · click to start a new isolation
Markers: Azure recognition · Acoustic onset

Part-word repetition (Azure insertion)

Intra-phoneme acoustic detection: Phoneme F contains 2 voicing onsets in its 170ms audio window — Azure assigned a single phoneme but the acoustic signal shows 2 separate productions (F-F).

⚠ Also classified as stutter

Convergence rule(s) triggered · legacy classification

What this layer enables

Frame-level isolation — drag handles to select any window within a word's audio region; inspect exactly the moment of instability
Azure vs. acoustic comparison — two markers on the waveform show where Azure recognition fired vs. where the acoustic onset actually occurred; gap between them is measurable
Intra-phoneme detection — when the acoustic signal shows two productions inside a window Azure classified as one phoneme, the discrepancy is flagged and the region is isolatable
Playback modes — play the isolated selection, play the full canonical word, or play the word as produced; compare what was intended vs. what was delivered
Self-analysis — speaker can hear the exact frame where the motor plan broke down; not a score but a direct auditory confrontation with the event
Clinical use — SLP can isolate, annotate, and use the waveform as a teaching surface; ground truth override applies to the same word
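Isolation and the Azure-versus-acoustic comparison reduce to time-to-sample arithmetic. A minimal sketch, assuming a mono sample buffer and millisecond timestamps relative to the clip start; both function names are hypothetical.

```python
from typing import Sequence


def isolate(samples: Sequence[float], sample_rate: int,
            start_ms: float, end_ms: float) -> Sequence[float]:
    """Samples between two selection handles, with times relative to the clip start."""
    start = max(0, round(start_ms * sample_rate / 1000))
    end = min(len(samples), round(end_ms * sample_rate / 1000))
    return samples[start:end]


def onset_offset_ms(azure_onset_ms: float, acoustic_onset_ms: float) -> float:
    """Positive when Azure's word boundary comes after the acoustic onset."""
    return azure_onset_ms - acoustic_onset_ms
```

At 16 kHz, a 159ms selection is 2,544 samples; clamping both ends keeps a handle dragged past the word region from indexing outside the buffer.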

"for" · PAD 70 · 1 syl · 410ms/syl
A single-syllable function word at 410ms, 1.55× the speaker's baseline, with 2 voicing onsets inside the F phoneme window. Azure heard one phoneme; the acoustic signal recorded two attempts.


Disfluency taxonomy + clinical layer

Six types. Auto-detected.
Clinician-correctable.

Block

Articulatory build-up >400ms without voicing onset.

Prolongation

Duration >1.8× speaker median per syllable.

Repetition

Word or part-word; intra-phoneme detection.

Filler

Planning load signal, not motor disruption.

Articulation error

Phoneme substitution or distortion. Relevant for motor speech disorders.

Omission

Word skipped vs. scripted reference. Completeness signal.
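The thresholded types can be sketched as independent rules. The 400ms and 1.8× values come from the taxonomy above; the `silent_hold_ms` input (length of the pre-voicing articulatory hold), the filler lexicon, and the same-word repetition check are assumptions. Articulation error and omission are left out because they need alignment against a phoneme reference or script.

```python
from typing import Optional, Set

FILLERS = {"um", "uh", "er", "erm"}   # hypothetical filler lexicon

BLOCK_MS = 400            # taxonomy: >400ms without voicing onset
PROLONGATION_RATIO = 1.8  # taxonomy: >1.8x speaker median per syllable


def detect_flags(text: str, silent_hold_ms: float, ms_per_syl: float,
                 median_ms_per_syl: float, max_phoneme_onsets: int,
                 prev_text: Optional[str] = None) -> Set[str]:
    """Rule-based flags for the four types that need no scripted reference."""
    flags: Set[str] = set()
    word = text.lower()
    if silent_hold_ms > BLOCK_MS:
        flags.add("block")
    if ms_per_syl > PROLONGATION_RATIO * median_ms_per_syl:
        flags.add("prolongation")
    # Intra-phoneme repetition, or the same word produced twice in a row.
    if max_phoneme_onsets > 1 or (prev_text is not None and prev_text.lower() == word):
        flags.add("repetition")
    if word in FILLERS:
        flags.add("filler")
    return flags
```

Each rule fires independently, which is what lets a separate convergence step decide whether the combination amounts to a stutter.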

Stutter fires only on convergence — two or more flags, or any flag on a challenge word.
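The convergence rule itself is a one-line predicate. A sketch, assuming flags arrive as a collection of type names; `is_stutter` is a hypothetical name.

```python
from typing import Iterable


def is_stutter(flags: Iterable[str], challenge_word: bool) -> bool:
    """Convergence rule: two or more distinct flags, or any flag on a challenge word."""
    n = len(set(flags))
    return n >= 2 or (challenge_word and n >= 1)
```

A word flagged for both prolongation and repetition qualifies anywhere; a single flag qualifies only on a challenge word.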

Ground truth override

Override labels: ✓ Mark fluent · Stutter · Block · Prolongation · Repetition · Filler · Articulation · Omission

Override propagates immediately through all counters, filters, and session metrics. Every data point is editable — auto-detection is a starting point, not a verdict.
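Immediate propagation falls out naturally if counters are derived from the word records instead of being stored separately. A hypothetical sketch: one label per word (the real system may attach several), with the override taking precedence whenever it is set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WordLabel:
    auto: str                       # auto-detected label, e.g. "stutter" or "fluent"
    override: Optional[str] = None  # clinician ground truth, if assigned

    @property
    def effective(self) -> str:
        # Ground truth always wins over auto-detection.
        return self.override if self.override is not None else self.auto


def session_counts(words: Iterable[WordLabel]) -> Counter:
    # Recomputed from the word records on every read, so an override shows up
    # in all counters immediately rather than patching cached totals.
    return Counter(w.effective for w in words)
```

Keeping `auto` alongside `override` also preserves the pairs that later serve as training data for a speaker-specific model.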

Roadmap: speaker model training. Ground truth assignments accumulate into a speaker-specific recognition model, improving accuracy on that speaker's idiolect and reducing the false-positive rate over time.


What can be practiced and measured

Every dimension is a
target for improvement

Fluency

Session PAD score
Stability floor
PAD variance (σ)
Block-free rate
Stutter frequency
Disfluency rate by type

Timing

Words per minute
Syllable rate (ms/syl)
Inter-word gap distribution
Prolongation frequency
Block duration
Phoneme duration outliers

Context

Challenge word hit rate
POA-specific disfluency rate
Stressed vs. unstressed instability
Session-over-session PAD trend
Scripted vs. unscripted comparison
Warm-up vs. fatigue window analysis

The instrument doesn't prescribe what to practice. It exposes every dimension of the motor planning signal — and tracks progress on whichever dimensions the speaker, clinician, or researcher chooses to target.
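Several of the session-level measures can be sketched from per-word PAD scores. The deck does not define the stability floor, so this sketch assumes it is a low percentile (10th, nearest rank) of word PADs; stutter frequency is expressed per 100 words, also an assumption (clinical practice often counts per 100 syllables). The test values are illustrative, not real session data.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Sequence


def session_metrics(word_pads: Sequence[float], duration_s: float,
                    stutter_count: int, floor_pct: float = 10) -> Dict[str, float]:
    """Session-level summary from per-word PAD scores."""
    n = len(word_pads)
    ranked = sorted(word_pads)
    # Stability floor taken as a nearest-rank low percentile (an assumption).
    k = max(0, math.ceil(floor_pct / 100 * n) - 1)
    return {
        "pad": mean(word_pads),
        "sigma": pstdev(word_pads),
        "floor": ranked[k],
        "wpm": n / (duration_s / 60),
        "stutters_per_100_words": 100 * stutter_count / n,
    }
```

Computing every session metric from the same word records keeps the session-over-session trend comparable even after ground truth overrides change individual words.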
