5,215 research outputs found

Dynamic Bayesian Networks for multi-band automatic speech recognition

Author: Antoine Christophe
Daoudi Khalid
Fohr Dominique
Publication venue: 'Elsevier BV'
Publication date: 01/01/2003

Article in a peer-reviewed scientific journal. This paper presents a new approach to multi-band automatic speech recognition that overcomes many limitations of classical multi-band systems. The principle of this new approach is to build a speech model in the time-frequency domain using the formalism of dynamic Bayesian networks. In contrast to classical multi-band modeling, this formalism leads to a probabilistic speech model which allows communication between the different sub-bands and, consequently, no recombination step is required in recognition. We develop efficient learning and decoding algorithms for both isolated and continuous speech recognition. We present illustrative experiments on isolated and connected digit recognition tasks. These experiments show that this new approach is very promising in the field of noisy speech recognition.

Crossref

INRIA a CCSD electronic archive server

Information fusion for subband-HMM speaker recognition

Author: Damper R. I.
Dodd T. J.
Higgins J. E.
Publication date: 01/01/2001

Southampton (e-Prints Soton)

Text-Independent Speaker Identification Systems Based on Multi-Layer Gaussian Mixture Models

Author: 賴友仁

Project number: NSC92-2213-E032-026. Research period: 2003-08 to 2004-07. Research funding: 541,000. Sponsorship: National Science Council, Executive Yuan

Tamkang University Institutional Repository

Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition

Author: Bengio Yoshua
De Mori Renato
Linarès Georges
Morchid Mohamed
Parcollet Titouan
Trabelsi Chiheb
Zhang Ying
Publication date: 20/06/2018

Recently, the connectionist temporal classification (CTC) model, coupled with recurrent (RNN) or convolutional neural networks (CNN), has made it easier to train speech recognition systems in an end-to-end fashion. However, in real-valued models, time frame components such as mel-filter-bank energies and the cepstral coefficients obtained from them, together with their first and second order derivatives, are processed as individual elements, while a natural alternative is to process such components as composed entities. We propose to group such elements in the form of quaternions and to process these quaternions using the established quaternion algebra. Quaternion numbers and quaternion neural networks have shown their efficiency in processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with fewer learning parameters than real-valued models. This paper proposes to integrate multiple feature views in a quaternion-valued convolutional neural network (QCNN), to be used for sequence-to-sequence mapping with the CTC model. Promising results are reported using simple QCNNs in phoneme recognition experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme error rate (PER) with fewer learning parameters than a competing model based on real-valued CNNs. Comment: Accepted at INTERSPEECH 201

arXiv.org e-Print Archive

Crossref

Towards Robust and Adaptive Speech Recognition Models

Author: B Kingsbury
H Hermansky
H Mcgurk
J Allen
S Rao
T Houtgast
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2004

Crossref

Localization and Selection of Speaker Specific Information with Statistical Modeling

Author: Besacier L
Bonastre J.F.
Fredouille C.
Publication venue: Elsevier : North-Holland
Publication date: 01/01/2000

International audience. Statistical modeling of the speech signal has been widely used in speaker recognition. The performance obtained with this type of modeling is excellent in laboratories but decreases dramatically for telephone or noisy speech. Moreover, it is difficult to know which piece of information is taken into account by the system. In order to solve this problem and to improve the current systems, a better understanding of the nature of the information used by statistical methods is needed. This knowledge should make it possible to select only the relevant information or to add new sources of information. The first part of this paper presents experiments that aim at localizing the most useful acoustic events for speaker recognition. The relation between the discriminant ability and the nature of the speech events is studied. In particular, the phonetic content, the signal stability and the frequency domain are explored. Finally, the potential of dynamic information contained in the relation between a frame and its p neighbours is investigated. In the second part, the authors suggest a new selection procedure designed to select the pertinent features. Conventional feature selection techniques (ascendant selection, knockout) allow only global and a posteriori knowledge about the relevance of an information source. However, some speech clusters may be very effective for recognizing a particular speaker, whereas they can be uninformative for another one. Moreover, some information classes may be corrupted or even missing for particular recording conditions. This necessity fo

22q11.2 deletion syndrome

Author: Bassett Anne S.
Emanuel Beverly S.
Marino Bruno
McDonald-McGinn Donna M.
Morrow Bernice E.
Philip Nicole
Scambler Peter J.
Sullivan Kathleen E.
Swillen Ann
Vermeesch Joris R.
Vorstman Jacob A. S.
Zackai Elaine H.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

22q11.2 deletion syndrome (22q11.2DS) is the most common chromosomal microdeletion disorder, estimated to result mainly from de novo non-homologous meiotic recombination events occurring in approximately 1 in every 1,000 fetuses. The first description in the English language of the constellation of findings now known to be due to this chromosomal difference was made in the 1960s in children with DiGeorge syndrome, who presented with the clinical triad of immunodeficiency, hypoparathyroidism and congenital heart disease. The syndrome is now known to have a heterogeneous presentation that includes multiple additional congenital anomalies and later-onset conditions, such as palatal, gastrointestinal and renal abnormalities, autoimmune disease, variable cognitive delays, behavioural phenotypes and psychiatric illness - all far extending the original description of DiGeorge syndrome. Management requires a multidisciplinary approach involving paediatrics, general medicine, surgery, psychiatry, psychology, interventional therapies (physical, occupational, speech, language and behavioural) and genetic counselling. Although common, lack of recognition of the condition and/or lack of familiarity with genetic testing methods, together with the wide variability of clinical presentation, delays diagnosis. Early diagnosis, preferably prenatally or neonatally, could improve outcomes, thus stressing the importance of universal screening. Equally important, 22q11.2DS has become a model for understanding rare and frequent congenital anomalies, medical conditions, psychiatric and developmental disorders, and may provide a platform to better understand these disorders while affording opportunities for translational strategies across the lifespan for both patients with 22q11.2DS and those with these associated features in the general population

PubMed Central

Archivio della ricerca- Università di Roma La Sapienza