Dynamic Bayesian Networks for multi-band automatic speech recognition
Peer-reviewed journal article. This paper presents a new approach to multi-band automatic speech recognition that overcomes many limitations of classical multi-band systems. The principle of this new approach is to build a speech model in the time-frequency domain using the formalism of dynamic Bayesian networks. In contrast to classical multi-band modeling, this formalism leads to a probabilistic speech model that allows communication between the different sub-bands; consequently, no recombination step is required in recognition. We develop efficient learning and decoding algorithms for both isolated and continuous speech recognition. We present illustrative experiments on isolated and connected digit recognition tasks. These experiments show that this new approach is very promising in the field of noisy speech recognition.
Text-Independent Speaker Identification Systems Based on Multi-Layer Gaussian Mixture Models
Project number: NSC92-2213-E032-026. Research period: 2003-08 to 2004-07. Research funding: 541,000. Sponsorship: National Science Council, Executive Yuan
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Recently, the connectionist temporal classification (CTC) model coupled with
recurrent (RNN) or convolutional neural networks (CNN) has made it easier to train
speech recognition systems in an end-to-end fashion. However, in real-valued
models, time frame components such as mel-filter-bank energies and the cepstral
coefficients obtained from them, together with their first and second order
derivatives, are processed as individual elements, while a natural alternative
is to process such components as composed entities. We propose to group such
elements in the form of quaternions and to process these quaternions using the
established quaternion algebra. Quaternion numbers and quaternion neural
networks have proven efficient at processing multidimensional inputs as
entities, encoding internal dependencies, and solving many tasks with fewer
learning parameters than real-valued models. This paper proposes to integrate
multiple feature views in a quaternion-valued convolutional neural network
(QCNN), to be used for sequence-to-sequence mapping with the CTC model.
Promising results are reported using simple QCNNs in phoneme recognition
experiments with the TIMIT corpus. More precisely, QCNNs obtain a lower phoneme
error rate (PER) with fewer learning parameters than a competing model based on
real-valued CNNs. Comment: Accepted at INTERSPEECH 201
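The grouping described in this abstract can be illustrated with a minimal sketch. It assumes each time-frequency component is packed into one quaternion with a zero real part and the static value and its first and second order derivatives as the three imaginary parts; that layout, and the helper names `hamilton_product` and `frame_to_quaternions`, are illustrative assumptions and are not taken from the paper. The Hamilton product itself is the standard quaternion multiplication.

```python
from typing import List, Sequence, Tuple

# A quaternion r + x*i + y*j + z*k stored as (r, x, y, z).
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Standard (non-commutative) quaternion multiplication p * q."""
    r1, x1, y1, z1 = p
    r2, x2, y2, z2 = q
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def frame_to_quaternions(
    static: Sequence[float],
    delta: Sequence[float],
    delta_delta: Sequence[float],
) -> List[Quaternion]:
    """Pack one frame's features into quaternions, one per frequency bin.

    Assumed layout (hypothetical): real part 0, imaginary parts hold the
    static value, its first derivative, and its second derivative.
    """
    return [(0.0, s, d, dd) for s, d, dd in zip(static, delta, delta_delta)]


if __name__ == "__main__":
    # Two frequency bins of a single frame, with made-up feature values.
    frame = frame_to_quaternions([1.0, 2.0], [0.1, 0.2], [0.01, 0.02])
    weight: Quaternion = (0.5, 0.5, 0.5, 0.5)  # one quaternion-valued weight
    for q in frame:
        # Each output component mixes all four input components.
        print(hamilton_product(weight, q))
```

Because one quaternion weight carries four real numbers yet mixes all four components of its input, a quaternion map from four real values to four real values needs 4 parameters where an unconstrained real-valued linear map would need 16. This is one way to read the abstract's claim about fewer learning parameters.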
Localization and Selection of Speaker Specific Information with Statistical Modeling
International audience. Statistical modeling of the speech signal has been widely used in speaker recognition. The performance obtained with this type of modeling is excellent in laboratory conditions but decreases dramatically for telephone or noisy speech. Moreover, it is difficult to know which piece of information is taken into account by the system. In order to solve this problem and to improve current systems, a better understanding of the nature of the information used by statistical methods is needed. This knowledge should make it possible to select only the relevant information or to add new sources of information. The first part of this paper presents experiments that aim at localizing the most useful acoustic events for speaker recognition. The relation between discriminant ability and the nature of the speech events is studied. In particular, the phonetic content, the signal stability and the frequency domain are explored. Finally, the potential of the dynamic information contained in the relation between a frame and its p neighbours is investigated. In the second part, the authors suggest a new selection procedure designed to select the pertinent features. Conventional feature selection techniques (ascendant selection, knockout) allow only global and a posteriori knowledge about the relevance of an information source. However, some speech clusters may be very effective for recognizing a particular speaker, whereas they can be uninformative for another one. Moreover, some information classes may be corrupted or even missing for particular recording conditions. This necessity fo
22q11.2 deletion syndrome
22q11.2 deletion syndrome (22q11.2DS) is the most common chromosomal microdeletion disorder, estimated to result mainly from de novo non-homologous meiotic recombination events occurring in approximately 1 in every 1,000 fetuses. The first description in the English language of the constellation of findings now known to be due to this chromosomal difference was made in the 1960s in children with DiGeorge syndrome, who presented with the clinical triad of immunodeficiency, hypoparathyroidism and congenital heart disease. The syndrome is now known to have a heterogeneous presentation that includes multiple additional congenital anomalies and later-onset conditions, such as palatal, gastrointestinal and renal abnormalities, autoimmune disease, variable cognitive delays, behavioural phenotypes and psychiatric illness, all extending far beyond the original description of DiGeorge syndrome. Management requires a multidisciplinary approach involving paediatrics, general medicine, surgery, psychiatry, psychology, interventional therapies (physical, occupational, speech, language and behavioural) and genetic counselling. Although the condition is common, lack of recognition of it and/or lack of familiarity with genetic testing methods, together with the wide variability of clinical presentation, delays diagnosis. Early diagnosis, preferably prenatally or neonatally, could improve outcomes, thus stressing the importance of universal screening. Equally important, 22q11.2DS has become a model for understanding rare and frequent congenital anomalies, medical conditions, psychiatric and developmental disorders, and may provide a platform to better understand these disorders while affording opportunities for translational strategies across the lifespan for both patients with 22q11.2DS and those with these associated features in the general population.