This article develops a general detection theory for speech analysis based on
time-varying autoregressive models, which themselves generalize the classical
linear predictive speech analysis framework. This theory leads to a
computationally efficient decision-theoretic procedure that may be applied to
detect the presence of vocal tract variation in speech waveform data. A
corresponding generalized likelihood ratio test is derived and studied both
empirically for short data records, using formant-like synthetic examples, and
asymptotically, leading to constant false alarm rate hypothesis tests for
changes in vocal tract configuration. Two in-depth case studies then serve to
illustrate the practical efficacy of this procedure across different time
scales of speech dynamics: first, the detection of formant changes on the scale
of tens of milliseconds of data, and second, the identification of glottal
opening and closing instants on time scales below ten milliseconds.

Comment: 12 pages, 12 figures; revised versio
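
As a rough illustration of the kind of generalized likelihood ratio test summarized above, the following sketch compares a constant-coefficient AR(p) fit (no vocal tract variation) against a time-varying AR(p) fit. It is a minimal example under stated assumptions, not the article's implementation: the function name `tvar_glrt`, the polynomial basis of degree `q`, the least-squares fitting, and the white Gaussian excitation model are all choices made here for illustration. Under these assumptions and for long records, the statistic under the null hypothesis is approximately chi-square distributed with `p * q` degrees of freedom, which is what allows a threshold to be set for a chosen false alarm rate.

```python
import numpy as np

def tvar_glrt(x, p=2, q=1):
    """Return (statistic, degrees_of_freedom) for a hypothetical GLRT of
    H0: constant AR(p) against H1: AR(p) whose coefficients are
    polynomials of degree q in time. Illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    n = np.arange(p, len(x))

    # Lagged regressors: column i holds x[n - i - 1].
    lags = np.column_stack([x[n - i - 1] for i in range(p)])
    y = x[n]

    # Polynomial basis 1, t, ..., t^q on normalized time t in [-1, 1].
    t = 2.0 * (n - n[0]) / (n[-1] - n[0]) - 1.0
    basis = np.column_stack([t ** j for j in range(q + 1)])

    # H1 design: every lag multiplied by every basis function.
    # The j = 0 block equals `lags`, so H0 is nested inside H1.
    h1 = np.column_stack([lags * basis[:, [j]] for j in range(q + 1)])

    def rss(design):
        # Residual sum of squares of the least-squares fit.
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        r = y - design @ coef
        return float(r @ r)

    # Gaussian log-likelihood ratio reduces to a ratio of residual energies.
    stat = len(y) * np.log(rss(lags) / rss(h1))
    return stat, p * q
```

A detector built on this sketch would declare a change when the statistic exceeds a chi-square quantile with `p * q` degrees of freedom; for short data records that asymptotic threshold is only approximate.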