Introduction to the Special Issue on End-to-End Speech and Language Processing
The eleven papers in this special section focus on end-to-end speech and language processing (SLP), which comprises a series of sequence-to-sequence learning problems. Conventional SLP systems map input to output sequences through module-based architectures in which each module is trained independently. These systems have a number of limitations, including local optima, assumptions about intermediate models and features, and complex steps driven by expert knowledge, which make it difficult for non-experts to use them and to develop new applications. Integrated end-to-end (E2E) systems aim to simplify the solution to these problems by using a single network architecture that maps an input sequence directly to the desired output sequence, without the need for intermediate module representations. E2E models rely on flexible and powerful machine learning models such as recurrent neural networks. The emergence of end-to-end speech processing models has lowered the barriers to entry into serious speech research. This special issue showcases the power of novel machine learning methods in end-to-end speech and language processing.
Borderline Aggregation Kinetics in "Dry" and "Wet" Environments
We investigate the kinetics of constant-kernel aggregation which is augmented
by either: (a) evaporation of monomers from finite-mass clusters, or (b)
continuous cluster growth, i.e., condensation. The rate equations for these
two processes are analyzed using both exact and asymptotic methods. In
aggregation-evaporation, if the evaporation is mass conserving, i.e., the
monomers which evaporate remain in the system and continue to be reactive, the
competition between evaporation and aggregation leads to several asymptotic
outcomes. For weak evaporation, the kinetics is similar to that of aggregation
with no evaporation, while equilibrium is quickly reached in the opposite case.
At a critical evaporation rate, the cluster mass distribution decays as
$k^{-5/2}$, where $k$ is the mass, while the typical cluster mass grows with
time as $t^{2/3}$. In aggregation-condensation, we consider the process with a
growth rate $L_k$ for clusters of mass $k$ which is: (i) independent of $k$,
(ii) proportional to $k$, and (iii) proportional to $k^\mu$, with $0<\mu<1$. In
the first case, the mass distribution attains a conventional scaling form, but
with the typical cluster mass growing as $t\ln t$. When $L_k\propto k$, the
typical mass grows exponentially in time, while the mass distribution again
scales. In the intermediate case of $L_k\propto k^\mu$, scaling generally
applies, with the typical mass growing as $t^{1/(1-\mu)}$. We also give an
exact solution for the linear growth model, $L_k=k$, in one dimension.
Comment: plain TeX, 17 pages, no figures, macro file prepended
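The abstract does not write out the rate equations. As a minimal sketch, assuming the standard constant-kernel form with a condensation term, dc_k/dt = Σ_{i+j=k} c_i c_j − 2 c_k N + L_{k−1} c_{k−1} − L_k c_k, where c_k is the density of clusters of mass k, N = Σ_k c_k, and L_k is the growth rate (kernel normalized so that dN/dt = −N²), the code below integrates them numerically. The mass cutoff `kmax`, the RK4 integrator, the monodisperse initial condition, and the function names are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: constant-kernel aggregation plus condensation (growth rate L_k),
# integrated with classical RK4 on masses 1..kmax. Not the paper's own code.


def rhs(c, growth):
    """Time derivatives of c, where c[k] is the density of mass-k clusters (c[0] unused)."""
    kmax = len(c) - 1
    n_total = sum(c[1:])
    dc = [0.0] * (kmax + 1)
    for k in range(1, kmax + 1):
        # Gain from merging ordered pairs (i, k - i); loss from merging with any cluster.
        gain = sum(c[i] * c[k - i] for i in range(1, k))
        dc[k] = gain - 2.0 * c[k] * n_total
        # Condensation: clusters of mass k-1 grow into mass k at rate L_{k-1}.
        if k > 1:
            dc[k] += growth(k - 1) * c[k - 1]
        # Clusters at the cutoff kmax are not allowed to grow further (keeps N exact).
        if k < kmax:
            dc[k] -= growth(k) * c[k]
    return dc


def rk4_step(c, growth, dt):
    """One classical fourth-order Runge-Kutta step."""
    d1 = rhs(c, growth)
    d2 = rhs([x + 0.5 * dt * d for x, d in zip(c, d1)], growth)
    d3 = rhs([x + 0.5 * dt * d for x, d in zip(c, d2)], growth)
    d4 = rhs([x + dt * d for x, d in zip(c, d3)], growth)
    return [x + dt / 6.0 * (a + 2.0 * b + 2.0 * g + h)
            for x, a, b, g, h in zip(c, d1, d2, d3, d4)]


def evolve(growth, t_end, kmax=80, dt=0.01):
    """Start from monomers only (c_1 = 1) and return (N, M) at time t_end."""
    c = [0.0] * (kmax + 1)
    c[1] = 1.0
    for _ in range(round(t_end / dt)):
        c = rk4_step(c, growth, dt)
    n_total = sum(c[1:])
    mass = sum(k * c[k] for k in range(1, kmax + 1))
    return n_total, mass
```

With L_k = 1, condensation leaves N = 1/(1+t) unchanged but adds mass at rate N, so M = 1 + ln(1+t) and the typical mass M/N = (1+t)[1 + ln(1+t)] grows like t ln t at late times. Accordingly, `evolve(lambda k: 1.0, t_end=2.0)` should return values close to N = 1/3 and M = 1 + ln 3 ≈ 2.099.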
Multilingual representations for low resource speech recognition and keyword search
© 2015 IEEE. This paper examines the impact of multilingual (ML) acoustic representations on automatic speech recognition (ASR) and keyword search (KWS) for low-resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program. The task is to develop Swahili ASR and KWS systems within two weeks using as little as 3 hours of transcribed data. Multilingual acoustic representations proved crucial for building these systems under strict time constraints. The paper discusses several key insights into how these representations are derived and used. First, we present a data sampling strategy that can speed up the training of multilingual representations without appreciable loss in ASR performance. Second, we show that fusing diverse multilingual representations developed at different LORELEI sites yields substantial ASR and KWS gains. Speaker adaptation and data augmentation of these representations improve both ASR and KWS performance (by up to 8.7% relative). Third, incorporating untranscribed data through semi-supervised learning improves both the word error rate (WER) and KWS performance. Finally, we show that these multilingual representations significantly improve ASR and KWS performance (9% relative for WER and 5% relative for maximum term-weighted value, MTWV) even when forty hours of transcribed audio in the target language are available. Multilingual representations contributed significantly to the LORELEI KWS systems winning the OpenKWS15 evaluation.
Spike pattern recognition by supervised classification in low dimensional embedding space
© The Author(s) 2016. This article is published with open access at Springerlink.com under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Epileptiform discharges in interictal electroencephalography (EEG) form the mainstay of epilepsy diagnosis and of localizing seizure onset. Visual analysis is rater-dependent and time-consuming, especially for long-term recordings, whereas computerized methods can make reviewing long EEG recordings more efficient. This paper presents a machine learning approach for the automated detection of epileptiform discharges (spikes). The proposed method first detects spike patterns by calculating their similarity to a coarse shape model of a spike waveform and then refines the results by identifying subtle differences between actual spikes and false detections. Pattern classification is performed using support vector machines in a low-dimensional space into which the original waveforms are embedded by locality preserving projections. The automatic detection results are compared with experts' manual annotations (101 spikes) on a whole-night sleep EEG recording. The high sensitivity (97%) and the low false positive rate (0.1 min⁻¹), calculated by intra-patient cross-validation, highlight the potential of the method for automated interictal EEG assessment.
Peer reviewed. Final published version.
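The abstract does not specify the similarity measure used in the first detection stage. A minimal sketch of that stage, assuming Pearson correlation between a sliding window of the EEG trace and a fixed spike template, and keeping only local maxima above a threshold, is shown below. The template values, the 0.8 threshold, and the function names are hypothetical; the second stage (locality preserving projections followed by a support vector machine) is not shown.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def detect_candidates(signal, template, threshold=0.8):
    """Return start indices where the window best matches the (hypothetical) spike template."""
    w = len(template)
    scores = [pearson(signal[i:i + w], template) for i in range(len(signal) - w + 1)]
    hits = []
    for i, s in enumerate(scores):
        # Keep only local maxima above the threshold so one spike yields one candidate.
        left = scores[i - 1] if i > 0 else -1.0
        right = scores[i + 1] if i + 1 < len(scores) else -1.0
        if s >= threshold and s >= left and s > right:
            hits.append(i)
    return hits
```

Correlation is insensitive to amplitude scaling and baseline offset, so a single coarse template can match spikes of different sizes; the false detections this admits are what the classifier-based refinement stage described in the abstract is meant to remove.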
Factor Analysis Invariant To Linear Transformations Of Data
Modeling data with Gaussian distributions is an important statistical problem. To obtain robust models, one imposes constraints on the means and covariances of these distributions [6, 4, 10, 8]. Constrained maximum likelihood (ML) modeling implies the existence of optimal feature spaces in which the constraints are more valid [2, 3]. This paper introduces one such constrained ML modeling technique, called factor analysis invariant to linear transformations (FACILT), which is essentially factor analysis in optimal feature spaces. FACILT generalizes several existing methods for modeling covariances. This paper presents an EM algorithm for FACILT modeling.
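The abstract gives no equations. For orientation only, the sketch below fits standard maximum-likelihood factor analysis by EM, i.e. the model x = μ + Λz + ε with z ~ N(0, I_q) and diagonal noise covariance Ψ. It is not FACILT itself: FACILT additionally works in an optimal (linearly transformed) feature space, which is omitted here. The function name, the random initialization, and the variance floor are illustrative assumptions.

```python
import numpy as np


def fa_em(X, q, n_iter=200, seed=0):
    """Fit standard factor analysis x = mu + Lam z + eps by EM (sketch, not FACILT)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)        # ML (1/n) sample covariance
    rng = np.random.default_rng(seed)
    Lam = rng.normal(scale=0.1, size=(d, q))      # factor loadings, d x q
    Psi = np.diag(S).copy()                       # diagonal noise variances
    history = []
    for _ in range(n_iter):
        C = Lam @ Lam.T + np.diag(Psi)            # model covariance
        _, logdet = np.linalg.slogdet(C)
        # Average log-likelihood per sample under the current parameters.
        history.append(-0.5 * (d * np.log(2.0 * np.pi) + logdet
                               + np.trace(np.linalg.solve(C, S))))
        # E-step: beta = Lam^T C^{-1} maps a centered x to E[z | x].
        beta = np.linalg.solve(C, Lam).T
        Ezz = np.eye(q) - beta @ Lam + beta @ S @ beta.T   # average E[z z^T | x]
        # M-step: closed-form updates of the loadings and the noise variances.
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        Psi = np.maximum(np.diag(S - Lam @ beta @ S), 1e-6)
    return mu, Lam, Psi, history
```

Because each EM iteration cannot decrease the likelihood, the returned `history` should be non-decreasing, which is a convenient sanity check on any implementation of this kind.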
Method for the isolation of Escherichia coli relaxed mutants, utilizing near-ultraviolet irradiation
What's in a vote? The short- and long-run impact of dual-class equity on IPO firm values
We find that relative to fundamentals, dual-class firms trade at lower prices than do single-class firms, both at the IPO and for at least the subsequent 5 years. The lower prices attached to duals do not foreshadow abnormally low stock or accounting returns. Moreover, some types of CEO turnover are less frequent among duals, and in general CEO turnover is sensitive to firm performance for singles but not for duals. Finally, when duals unify their share classes, statistically and economically significant value gains occur. Collectively, our results suggest that the governance associated with dual-class equity influences the pricing of duals.