373 research outputs found

    Time-Varying Autoregressions in Speech: Detection Theory and Applications

    This article develops a general detection theory for speech analysis based on time-varying autoregressive models, which themselves generalize the classical linear predictive speech analysis framework. This theory leads to a computationally efficient decision-theoretic procedure that may be applied to detect the presence of vocal tract variation in speech waveform data. A corresponding generalized likelihood ratio test is derived and studied both empirically for short data records, using formant-like synthetic examples, and asymptotically, leading to constant false alarm rate hypothesis tests for changes in vocal tract configuration. Two in-depth case studies then serve to illustrate the practical efficacy of this procedure across different time scales of speech dynamics: first, the detection of formant changes on the scale of tens of milliseconds of data, and second, the identification of glottal opening and closing instants on time scales below ten milliseconds. Comment: 12 pages, 12 figures; revised version

    Neurophysiological Vocal Source Modeling for Biomarkers of Disease

    Speech is potentially a rich source of biomarkers for detecting and monitoring neuropsychological disorders. Current biomarkers typically comprise acoustic descriptors extracted from behavioral measures of source, filter, prosodic and linguistic cues. In contrast, in this paper, we extract vocal features based on a neurocomputational model of speech production, reflecting latent or internal motor control parameters that may be more sensitive to individual variation under neuropsychological disease. These features, which are constrained by neurophysiology, may be resilient to artifacts and provide an articulatory complement to acoustic features. Our features represent a mapping from a low-dimensional acoustics-based feature space to a high-dimensional space that captures the underlying neural process, including articulatory commands and auditory and somatosensory feedback errors. In particular, we demonstrate a neurophysiological vocal source model that generates biomarkers of disease by modeling vocal source control. By using the fundamental frequency contour and a biophysical representation of the vocal source, we infer two neuromuscular time series whose coordination provides vocal features that are applied to depression and Parkinson’s disease as examples. These vocal source coordination features alone, on a single held vowel, outperform or are comparable to other feature sets and reflect a significant compression of the feature space. United States. Air Force (Contract No. FA8721-05-C-0002); United States. Air Force (Contract No. FA8702-15-D-0001)

    The Pole Behaviour of the Phase Derivative of the Short-Time Fourier Transform

    The short-time Fourier transform (STFT) is a time-frequency representation widely used in applications, for example in audio signal processing. Recently it has been shown that not only the amplitude, but also the phase of this representation can be successfully exploited for improved analysis and processing. In this paper we describe a rather peculiar pole phenomenon in the phase derivative: a recurring pattern that appears in a characteristic way in the neighborhood of any zero of the STFT, namely a negative peak followed by a positive one. We describe this phenomenon numerically and provide a complete analytical explanation. Comment: 15 pages, 4 figures; Applied and Computational Harmonic Analysis (in press), available online 22 October 201

    Long Term Suboxone™ Emotional Reactivity As Measured by Automatic Detection in Speech

    Addictions to illicit drugs are among the nation’s most critical public health and societal problems. The current opioid prescription epidemic, the need for buprenorphine/naloxone (Suboxone®; SUBX) as an opioid maintenance substance, and its growing street diversion provided impetus to determine affective states (“true ground emotionality”) in long-term SUBX patients. Toward the goal of effective monitoring, we utilized emotion detection in speech as a measure of “true” emotionality in 36 SUBX patients compared to 44 individuals from the general population (GP) and 33 members of Alcoholics Anonymous (AA). Other less objective studies have investigated emotional reactivity of heroin, methadone and opioid abstinent patients. These studies indicate that current opioid users have abnormal emotional experience, characterized by heightened response to unpleasant stimuli and blunted response to pleasant stimuli. However, this is the first study to our knowledge to evaluate “true ground” emotionality in long-term buprenorphine/naloxone combination (Suboxone™). We found in long-term SUBX patients a significantly flat affect (p<0.01), and they had less self-awareness of being happy, sad, and anxious compared to both the GP and AA groups. We caution against definitive interpretation of these seemingly important results until we compare the emotional reactivity of an opioid abstinent control using automatic detection in speech. These findings encourage continued research strategies in SUBX patients to target the specific brain regions responsible for relapse prevention of opioid addiction. United States. Dept. of Defense. Assistant Secretary of Defense for Research & Engineering (Air Force Contract FA8721-05-C-0002)

    Digital Signal Processing

    Contains research objectives and summary of research on seven research projects. Joint Services Electronics Program (Contract DAAB07-76-C-1400); U.S. Navy - Office of Naval Research (Contract N00014-75-C-0951-NR 049-308); National Science Foundation (Grant ENG71-02319-AO2)

    Digital Signal Processing

    Contains research objectives and summary of research on seven research projects. U.S. Navy - Office of Naval Research (Contract N00014-75-C-0951); National Science Foundation (Grant ENG71-02319-A02)

    Digital Signal Processing

    Contains a research summary and reports on fifteen research projects. National Science Foundation Fellowship; Joint Services Electronics Program (Contract DAAG29-78-C-0020); National Science Foundation (Grant ENG76-24117); U.S. Navy - Office of Naval Research (Contract N00014-75-C-0951); National Science Foundation (Grant ENG76-24117); Schlumberger-Doll Research Center Fellowship; Hertz Foundation Fellowship; National Aeronautics and Space Administration (Grant NSG-5157); U.S. Navy - Office of Naval Research (Contract N00014-77-C-0196)

    Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement

    New auditory-inspired speech processing methods are presented in this paper, combining spectral subtraction and two-dimensional non-linear filtering techniques originally conceived for image processing purposes. In particular, mathematical morphology operations, like erosion and dilation, are applied to noisy speech spectrograms using specifically designed structuring elements inspired by the masking properties of the human auditory system. This is complemented with a pre-processing stage that includes the conventional spectral subtraction procedure and auditory filterbanks. These methods were tested in both speech enhancement and automatic speech recognition tasks. For the first, time-frequency anisotropic structuring elements over grey-scale spectrograms were found to provide better perceptual quality than isotropic ones, proving more appropriate (under a number of perceptual quality estimation measures and several signal-to-noise ratios on the Aurora database) for retaining the structure of speech while removing background noise. For the second, the combination of spectral subtraction and auditory-inspired morphological filtering was found to improve recognition rates in a noise-contaminated version of the Isolet database. This work has been partially supported by the Spanish Ministry of Science and Innovation CICYT Project No. TEC2008-06382/TEC.
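The erosion and dilation operations mentioned in this abstract can be illustrated with a minimal sketch. It assumes a flat rectangular structuring element and a toy spectrogram; the function names (`erode`, `dilate`, `opening`) and the element sizes are illustrative choices, not the auditory-masking-derived elements the paper describes.

```python
import numpy as np

def _windows(img, shape, pad_value):
    """Return all (freq, time) neighbourhoods of `img` for an odd-sized window."""
    fh, tw = shape
    assert fh % 2 == 1 and tw % 2 == 1, "structuring element sizes must be odd"
    padded = np.pad(np.asarray(img, dtype=float),
                    ((fh // 2, fh // 2), (tw // 2, tw // 2)),
                    constant_values=pad_value)
    return np.lib.stride_tricks.sliding_window_view(padded, (fh, tw))

def erode(img, shape):
    """Grey-scale erosion: minimum over the flat structuring element."""
    return _windows(img, shape, np.inf).min(axis=(2, 3))

def dilate(img, shape):
    """Grey-scale dilation: maximum over the flat structuring element."""
    return _windows(img, shape, -np.inf).max(axis=(2, 3))

def opening(img, shape):
    """Erosion followed by dilation: removes details smaller than the element."""
    return dilate(erode(img, shape), shape)

# Toy spectrogram (rows = frequency bins, columns = time frames).
spec = np.zeros((5, 9))
spec[2, 1:8] = 1.0   # a ridge extended along time, standing in for a harmonic
spec[0, 4] = 1.0     # an isolated speck, standing in for noise

# Anisotropic element: 1 frequency bin by 3 time frames (hypothetical sizes).
opened = opening(spec, (1, 3))
```

With the time-elongated element, the opening keeps the ridge and suppresses the isolated speck, which is the intuition behind preferring anisotropic elements for structures that extend along one axis of the spectrogram.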

    Digital Signal Processing

    Contains reports on twelve research projects. U.S. Navy - Office of Naval Research (Contract N00014-75-C-0951); National Science Foundation (Grant ENG76-24117); National Aeronautics and Space Administration (Grant NSG-5157); Joint Services Electronics Program (Contract DAABO7-76-C-1400); U.S. Navy - Office of Naval Research (Contract N00014-77-C-0196); Woods Hole Oceanographic Institution; U.S. Navy - Office of Naval Research (Contract N00014-75-C-0852); Department of Ocean Engineering, M.I.T.; National Science Foundation subcontract to Grant GX 41962 to Woods Hole Oceanographic Institution

    Domain adaptation for enhancing speech-based depression detection in natural environmental conditions using dilated CNNs

    Depression disorders are a major growing concern worldwide, especially given the unmet need for widely deployable depression screening for use in real-world environments. Speech-based depression screening technologies have shown promising results, but primarily in systems that are trained using laboratory-recorded speech. They do not generalize well to data from more naturalistic settings. This paper addresses the generalizability issue by proposing multiple adaptation strategies that update pre-trained models based on a dilated convolutional neural network (CNN) framework and that improve depression detection performance in both clean and naturalistic environments. Experimental results on two depression corpora show that feature representations in CNN layers need to be adapted to accommodate environmental changes, and that increases in data quantity and quality are helpful for pre-training models for adaptation. The cross-corpus adapted systems produce relative improvements of 29.4% and 17.2% in unweighted average recall over non-adapted systems for the clean and naturalistic corpora, respectively.
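The metric quoted in this abstract, unweighted average recall (UAR), is the mean of the per-class recalls, so each class counts equally regardless of its size. The sketch below shows that computation and a relative improvement between two systems. The labels and predictions are made-up toy data, and the helper names are illustrative, not taken from the paper.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class has equal weight."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def relative_improvement(baseline, adapted):
    """Fractional gain of `adapted` over `baseline`."""
    return (adapted - baseline) / baseline

# Toy, imbalanced data: six examples of class 0 and two of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_base = [0, 0, 0, 0, 0, 0, 0, 1]    # recalls: class 0 = 6/6, class 1 = 1/2
y_adapt = [0, 0, 0, 0, 0, 1, 1, 1]   # recalls: class 0 = 5/6, class 1 = 2/2

uar_base = unweighted_average_recall(y_true, y_base)
uar_adapt = unweighted_average_recall(y_true, y_adapt)
gain = relative_improvement(uar_base, uar_adapt)
```

In this toy case both systems have the same plain accuracy (7 of 8 correct), yet the second has the higher UAR because it recovers the minority class, which is why UAR is often preferred for imbalanced screening data.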