18 research outputs found
Deep Learning for Distant Speech Recognition
Deep learning is an emerging technology that is considered one of the most
promising directions for reaching higher levels of artificial intelligence.
Among the other achievements, building computers that understand speech
represents a crucial leap towards intelligent machines. Despite the great
efforts of the past decades, however, a natural and robust human-machine speech
interaction still appears to be out of reach, especially when users interact
with a distant microphone in noisy and reverberant environments. The latter
disturbances severely hamper the intelligibility of a speech signal, making
Distant Speech Recognition (DSR) one of the major open challenges in the field.
This thesis addresses the latter scenario and proposes some novel techniques,
architectures, and algorithms to improve the robustness of distant-talking
acoustic models. We first elaborate on methodologies for realistic data
contamination, with a particular emphasis on DNN training with simulated data.
We then investigate approaches for better exploiting speech contexts,
proposing some original methodologies for both feed-forward and recurrent
neural networks. Lastly, inspired by the idea that cooperation across different
DNNs could be the key for counteracting the harmful effects of noise and
reverberation, we propose a novel deep learning paradigm called network of deep
neural networks. The analysis of the original concepts was based on extensive
experimental validations conducted on both real and simulated data, considering
different corpora, microphone configurations, environments, noisy conditions,
and ASR tasks. Comment: PhD Thesis Unitn, 201
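As a rough illustration of what simulated data contamination usually means, the sketch below convolves a clean signal with a room impulse response and adds noise scaled to a target signal-to-noise ratio. This is a common textbook formulation, not necessarily the procedure used in the thesis. The function names, the toy impulse response, and the SNR value are all assumptions made for the example.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full discrete convolution of two sequences (pure Python)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def contaminate(clean, impulse_response, noise, snr_db):
    """Reverberate `clean`, then add `noise` scaled to the requested SNR in dB.

    Hypothetical helper: y[n] = (x * h)[n] + g * v[n], with the gain g chosen
    so that power(x * h) / power(g * v) equals 10 ** (snr_db / 10).
    """
    reverberated = convolve(clean, impulse_response)
    if len(noise) < len(reverberated):
        raise ValueError("noise must be at least as long as the reverberated signal")
    noise = noise[:len(reverberated)]
    gain = math.sqrt(power(reverberated) / (power(noise) * 10 ** (snr_db / 10)))
    return [r + gain * v for r, v in zip(reverberated, noise)]


# Toy usage with made-up values (not real speech or a measured room response).
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
impulse_response = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # direct path plus two echoes
noise = [rng.gauss(0.0, 1.0) for _ in range(len(clean) + len(impulse_response) - 1)]
noisy = contaminate(clean, impulse_response, noise, snr_db=10.0)
```

In practice the impulse response would be measured or simulated for a specific room and microphone position, and the noise would come from recordings rather than a random generator.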
Learning Feature Representation for Automatic Speech Recognition
Feature extraction in automatic speech recognition (ASR) can be regarded
as learning representations from lower-level to more abstract higher-level features.
Lower-level features can be viewed as features from the signal domain,
such as perceptual linear predictive (PLP) and Mel-frequency cepstral coefficients
(MFCCs) features. Higher-level feature representations can be considered
as bottleneck features (BNFs) learned using deep neural networks
(DNNs). In this thesis, we focus on improving feature extraction at different
levels mainly for ASR.
The first part of this thesis focuses on learning features from the signal
domain that help ASR. Hand-crafted spectral and cepstral features such as
MFCC are the main features used in most conventional ASR systems; all are
inspired by physiological models of the human auditory system. However, some
aspects of the signal such as pitch cannot be easily extracted from spectral
features, but are found to be useful for ASR. We explore a new algorithm to extract
a pitch feature directly from a signal for ASR and show that this feature, appended
to the other features, gives consistent improvements in various languages,
especially tonal languages.
We then investigate replacing the conventional features with joint training
from the signal domain, using time-domain and frequency-domain approaches.
The results show that our time-domain joint feature learning setup
achieves state-of-the-art performance using MFCC, while our frequency-domain
setup outperforms them on various datasets.
Joint feature extraction results in learning data- or language-dependent filter
banks, which can degrade the performance in unseen noise and channel conditions
or other languages. To tackle this, we investigate joint universal feature
learning across different languages using the proposed direct-from-signal
setups. We then investigate the filter banks learned in this setup and propose
a new set of features as an extension to conventional Mel filter banks. The results
show consistent word error rate (WER) improvement, especially in clean
conditions.
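For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence. The function name and the whitespace tokenization are assumptions for illustration, and the scoring tool actually used in the thesis is not stated here.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper: words are obtained by splitting on whitespace, with no
    text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.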
The second part of this thesis focuses on learning higher-level feature embeddings.
We investigate learning and transferring deep feature representations
across different domains using multi-task learning and weight transfer
approaches. They have been adopted to explicitly learn intermediate-level features
that are useful for several different tasks
Rapid Generation of Pronunciation Dictionaries for new Domains and Languages
This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Depending on various conditions, solutions are proposed and developed. Starting from the straightforward scenario in which the target language is present in written form on the Internet and the mapping between speech and written language is close, up to the difficult scenario in which no written form for the target language exists
Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme
Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie
Pattern Recognition
Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional, the processing is done in real time or takes hours and days, some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition
Esprit '90. Proceedings of the annual Esprit conference. Brussels, 12-15 November 1990. EUR 13148 EN
Advances in knowledge discovery and data mining Part II
19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II
Data bases and data base systems related to NASA's aerospace program. A bibliography with indexes
This bibliography lists 1778 reports, articles, and other documents introduced into the NASA scientific and technical information system, 1975 through 1980
Abstracts on Radio Direction Finding (1899 - 1995)
The files on this record represent the various databases that originally composed the CD-ROM issue of "Abstracts on Radio Direction Finding" database, which is now part of the Dudley Knox Library's Abstracts and Selected Full Text Documents on Radio Direction Finding (1899 - 1995) Collection. (See Calhoun record https://calhoun.nps.edu/handle/10945/57364 for further information on this collection and the bibliography).
Due to issues of technological obsolescence preventing current and future audiences from accessing the bibliography, DKL exported and converted into the three files on this record the various databases contained in the CD-ROM.
The contents of these files are:
1) RDFA_CompleteBibliography_xls.zip [RDFA_CompleteBibliography.xls: Metadata for the complete bibliography, in Excel 97-2003 Workbook format; RDFA_Glossary.xls: Glossary of terms, in Excel 97-2003 Workbook format; RDFA_Biographies.xls: Biographies of leading figures, in Excel 97-2003 Workbook format];
2) RDFA_CompleteBibliography_csv.zip [RDFA_CompleteBibliography.TXT: Metadata for the complete bibliography, in CSV format; RDFA_Glossary.TXT: Glossary of terms, in CSV format; RDFA_Biographies.TXT: Biographies of leading figures, in CSV format];
3) RDFA_CompleteBibliography.pdf: A human-readable display of the bibliographic data, as a means of double-checking any possible deviations due to conversion