2,355 research outputs found

The MIT Summit Speech Recognition System: A Progress Report

Author: James Glass
Michael Phillips
Stephanie Seneff
Victor Zue
Publication date: 01/01/1989

Recently, we initiated a project to develop a phonetically-based spoken language understanding system called SUMMIT. In contrast to many of the past efforts that make use of heuristic rules whose development requires intense knowledge engineering, our approach attempts to express the speech knowledge within a formal framework using well-defined mathematical tools. In our system, features and decision strategies are discovered and trained automatically, using a large body of speech data. This paper describes the system, and documents its current performance.

CiteSeerX

Crossref

Phonologically-Informed Speech Coding for Automatic Speech Recognition-based Foreign Language Pronunciation Training

Author: Vicario Anthony J
Publication venue: CUNY Academic Works
Publication date: 01/02/2020

Automatic speech recognition (ASR) and computer-assisted pronunciation training (CAPT) systems used in foreign-language educational contexts are often not developed with the specific task of second-language acquisition in mind. Systems that are built for this task are often excessively targeted to one native language (L1) or a single phonemic contrast and are therefore burdensome to train. Current algorithms have been shown to provide erroneous feedback to learners and show inconsistencies between human and computer perception. These discrepancies have thus far hindered more extensive application of ASR in educational systems. This thesis reviews the computational models of the human perception of American English vowels for use in an educational context, exploring and comparing two types of acoustic representation: a low-dimensionality linguistically-informed formant representation and more traditional Mel frequency cepstral coefficients (MFCCs). We first compare two algorithms for phoneme classification (support vector machines and long short-term memory recurrent neural networks) trained on American English vowel productions from the TIMIT corpus. We then conduct a perceptual study of non-native English vowel productions perceived by native American English speakers. We compare the results of the computational experiment and the human perception experiment to assess human/model agreement. Dissimilarities between human and model classification are explored. More phonologically-informed audio signal representations should create a more human-aligned, less L1-dependent vowel classification system with higher interpretability that can be further refined with more phonetic- and/or phonological-based research. Results show that linguistically-informed speech coding produces results that better align with human classification, supporting use of the proposed coding for ASR-based CAPT.

City University of New York

Speech Communication

Author: Abramson Katie
Allen Jonathan
Alwan Abeer A.
Bateman Nicholas P. T.
Bickley Corine A.
Boyce Suzanne E.
Chapin Ringo Carol
Daly Nancy
Espy-Wilson Carol Y.
Forestell Ann F.
Furtado Xavier
Glass James R.
Glicksman Laura B.
Goldhor Richard S.
Hall Seth M.
Halle Morris
Hillman Robert E.
Hirahara Tatsuya
Holmberg Eva B.
Howitt Andrew William
Huang Caroline B.
Ihionu Peter
Isaacs Katy
Jankowski Charles
Kassel Rob
Kawasaki Haruko
Kennedy Fred G.
Keyser Samuel J.
Klatt Dennis H.
Kuru Tunay
Lamel Lori
Lane Harlan L.
Larkey Leah S.
Leung Hong
Locke John L.
Makhoul John I.
Manuel Sharon Y.
Marcus Jeff
Matelli Joan
McCandless Michael K.
Menard Hope
Meng Helen
Mitra Haruko
North Keith
Ono Shigeru
Palay Vicky
Pao Christine
Perkell Joseph S.
Phillips Michael
Pitrelli John
Randolph Mark A.
Seneff Stephanie
Shattuck-Hufnagel Stephanie
Shaw Andy
Stevens Kenneth N.
Suzuki Noriko
Svirsky Mario A.
Takeda Kasuya
Webster Jane W.
Whitney Dave
Wilde Lorin F.
Wong Davin
Zue Victor W.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)

Contains table of contents for Part IV, table of contents for Section 1 and reports on five research projects. Sponsors: Apple Computer, Inc.; C.J. Lebel Fellowship; National Institutes of Health (Grant T32-NS07040); National Institutes of Health (Grant R01-NS04332); National Institutes of Health (Grant R01-NS21183); National Institutes of Health (Grant P01-NS23734); U.S. Navy / Naval Electronic Systems Command (Contract N00039-85-C-0254); U.S. Navy - Office of Naval Research (Contract N00014-82-K-0727)

DSpace@MIT

Articulatory features for conversational speech recognition

Author: Metze Florian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005

KITopen

Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

Author: Wang Jianglin
Publication venue: e-Publications@Marquette
Publication date: 01/10/2013

Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms including cross-lingual speaker identification, cross song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.

epublications@Marquette

Voice Analysis to Differentiate the Dopaminergic Response in People With Parkinson's Disease

Author: Abedinpour Kian
Asaei Afsaneh
Caliskan Mine Melodi
Cernak Milos
Fietzek Urban M.
Jain Anubhav
Pfister Franz M. J.
Polat Ozgur
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 31/05/2021

The human voice offers the widest variety of motor phenomena of any human activity. However, its clinical evaluation in people with movement disorders such as Parkinson's disease (PD) lags behind current knowledge on advanced analytical automatic speech processing methodology. Here, we use deep learning-based speech processing to differentially analyze voice recordings in 14 people with PD before and after dopaminergic medication using personalized Convolutional Recurrent Neural Networks (p-CRNN) and Phone Attribute Codebooks (PAC). p-CRNN yields an accuracy of 82.35% in the binary classification of ON and OFF motor states at a sensitivity/specificity of 0.86/0.78. The PAC-based approach's accuracy was slightly lower with 73.08% at a sensitivity/specificity of 0.69/0.77, but this method offers easier interpretation and understanding of the computational biomarkers. Both p-CRNN and PAC provide a differentiated view and novel insights into the distinctive components of the speech of persons with PD. Both methods detect voice qualities that are amenable to dopaminergic treatment, including active phonetic and prosodic features. Our findings may pave the way for quantitative measurements of speech in persons with PD.

Open Access LMU

PubMed Central

Role of Selected Spectral Attributes in the Perception of Synthetic Vowels

Author: Savela Janne
Publication venue: Turku Centre for Computer Science
Publication date: 26/06/2009

This thesis is an experimental study regarding the identification and discrimination of vowels, studied using synthetic stimuli. The acoustic attributes of synthetic stimuli vary, which raises the question of how different spectral attributes are linked to the behaviour of the subjects. The spectral attributes used are formants and spectral moments (centre of gravity, standard deviation, skewness and kurtosis). Two types of experiments are used, related to the identification and discrimination of the stimuli, respectively. The discrimination is studied by using both the attentive procedures that require a response from the subject, and the preattentive procedures that require no response. Together, the studies offer information about the identification and discrimination of synthetic vowels in 15 different languages. Furthermore, this thesis discusses the role of various spectral attributes in the speech perception processes. The thesis is divided into three studies. The first is based only on attentive methods, whereas the other two concentrate on the relationship between identification and discrimination experiments. The neurophysiological methods (EEG recordings) reveal the role of attention in processing, and are used in discrimination experiments, while the results reveal differences in perceptual processes based on the language, attention and experimental procedure.

UTUPub

A detection-based pattern recognition framework and its applications

Author: Ma Chengyuan
Publication venue: Georgia Institute of Technology
Publication date: 06/04/2010

The objective of this dissertation is to present a detection-based pattern recognition framework and demonstrate its applications in automatic speech recognition and broadcast news video story segmentation. Inspired by the studies of modern cognitive psychology and real-world pattern recognition systems, a detection-based pattern recognition framework is proposed to provide an alternative solution for some complicated pattern recognition problems. The primitive features are first detected and the task-specific knowledge hierarchy is constructed level by level; then a variety of heterogeneous information sources are combined together and the high-level context is incorporated as additional information at certain stages. A detection-based framework is a “divide-and-conquer” design paradigm for pattern recognition problems, which will decompose a conceptually difficult problem into many elementary sub-problems that can be handled directly and reliably. Some information fusion strategies will be employed to integrate the evidence from a lower level to form the evidence at a higher level. Such a fusion procedure continues until reaching the top level. Generally, a detection-based framework has many advantages: (1) more flexibility in both detector design and fusion strategies, as these two parts can be optimized separately; (2) parallel and distributed computational components in primitive feature detection. In such a component-based framework, any primitive component can be replaced by a new one while other components remain unchanged; (3) incremental information integration; (4) high level context information as additional information sources, which can be combined with bottom-up processing at any stage. This dissertation presents the basic principles, criteria, and techniques for detector design and hypothesis verification based on the statistical detection and decision theory. In addition, evidence fusion strategies were investigated in this dissertation.
Several novel detection algorithms and evidence fusion methods were proposed and their effectiveness was justified in automatic speech recognition and broadcast news video segmentation system. We believe such a detection-based framework can be employed in more applications in the future. Ph.D. Committee Chair: Lee, Chin-Hui; Committee Member: Clements, Mark; Committee Member: Ghovanloo, Maysam; Committee Member: Romberg, Justin; Committee Member: Yuan, Min

Scholarly Materials And Research @ Georgia Tech

Speech knowledge modelling for speech recognition: a study based on distinctive features

Author: Ran Shuping
Publication date: 19/07/2018

The Australian National University