Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
Rapid population aging has stimulated the development of assistive
devices that provide personalized medical support to people suffering from
various etiologies. One prominent clinical application is a computer-assisted
speech training system that enables personalized speech therapy for patients
with communication disorders in their own home environment. Such a
system relies on robust automatic speech recognition (ASR) technology to
provide accurate articulation feedback. With the long-term aim of
developing off-the-shelf ASR systems that can be incorporated in a clinical
context without prior speaker information, we compare the ASR performance of
speaker-independent bottleneck and articulatory features on dysarthric speech
used in conjunction with dedicated neural network-based acoustic models that
have been shown to be robust against spectrotemporal deviations. We report ASR
performance of these systems on two dysarthric speech datasets of different
characteristics to quantify the achieved performance gains. Despite the
remaining performance gap between dysarthric and normal speech, significant
improvements are reported on both datasets using speaker-independent ASR
architectures.
Comment: to appear in Computer Speech & Language -
https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial
text overlap with arXiv:1807.1094
Through precision straits to next standard model heights
After the LHC Run 1, the standard model (SM) of particle physics has been
completed. Yet, despite its successes, the SM has shortcomings vis-à-vis
cosmological and other observations. At the same time, while the LHC restarts
for Run 2 at 13 TeV, there is presently a lack of direct evidence for new
physics phenomena at the accelerator energy frontier. From this state of
affairs arises the need for a consistent theoretical framework in which
deviations from the SM predictions can be calculated and compared to precision
measurements. Such a framework should be able to comprehensively make use of
all measurements in all sectors of particle physics, including LHC Higgs
measurements, past electroweak precision data, electric dipole moments,
penguins and flavor physics, neutrino scattering, deep inelastic scattering,
low-energy scattering, mass measurements, and any search for
physics beyond the SM. By simultaneously describing all existing measurements,
this framework then becomes an intermediate step, pointing us toward the next
SM, and hopefully revealing the underlying symmetries. We review the role that
the standard model effective field theory (SMEFT) could play in this context,
as a consistent, complete, and calculable generalization of the SM in the
absence of light new physics. We discuss the relationship of the SMEFT with the
existing kappa-framework for Higgs boson couplings characterization and the use
of pseudo-observables, which insulate experimental results from refinements due
to ever-improving calculations. The LHC context, as well as that of previous
and future accelerators and experiments, is also addressed.
Comment: 19 pages, 3 figures
Non-autoregressive Transformer-based End-to-end ASR using BERT
Transformer-based models have driven significant innovation across a range of
classic and practical fields, including speech processing, natural language
processing, and computer vision. Building on the transformer, attention-based
end-to-end automatic speech recognition (ASR) models have become popular in
recent years. In particular, non-autoregressive modeling, which achieves fast
inference while delivering performance comparable to conventional
autoregressive methods, is an emerging research topic. In the
context of natural language processing, the bidirectional encoder
representations from transformers (BERT) model has received widespread
attention, partly owing to its ability to infer contextualized word
representations and to achieve strong performance on downstream tasks with
only simple fine-tuning. In order to not only inherit the advantages
of non-autoregressive ASR modeling, but also receive benefits from a
pre-trained language model (e.g., BERT), a non-autoregressive transformer-based
end-to-end ASR model based on BERT is presented in this paper. A series of
experiments conducted on the AISHELL-1 dataset demonstrates that the proposed
model achieves results competitive with, or superior to, state-of-the-art ASR
systems.
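The single-pass, parallel character of non-autoregressive decoding can be illustrated with a toy mask-predict loop (a hypothetical sketch, not the paper's actual model): every position starts masked, all positions are filled in parallel, and the least confident ones are re-masked for another round. The `score_fn` below is an illustrative stand-in for a BERT-like scorer.

```python
import numpy as np

def mask_predict(score_fn, length, iterations=3):
    """Toy mask-predict decoder: fill every position in parallel,
    then re-mask the least confident fraction and predict again."""
    tokens = np.full(length, -1)          # -1 marks a masked slot
    for it in range(1, iterations + 1):
        probs = score_fn(tokens)          # (length, vocab) posteriors
        tokens = probs.argmax(axis=-1)    # parallel argmax: no left-to-right loop
        confidence = probs.max(axis=-1)
        n_mask = int(length * (1 - it / iterations))
        if n_mask > 0:                    # re-mask the least confident slots
            tokens[np.argsort(confidence)[:n_mask]] = -1
    return tokens
```

With a scorer whose posteriors sharpen as context fills in, each round commits the most confident tokens first; inference cost is a small constant number of parallel passes rather than one pass per output token.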
Roadmap for Next-Generation Accountability Systems
Offers a framework for designing and implementing state accountability systems that enable consistent, aligned goals to ensure college and career readiness; valid measurement, support, and interventions; transparent reporting; and continuous improvement.
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e., audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and open questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
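As a concrete illustration of the dominant log-mel representation mentioned above, here is a minimal NumPy sketch; parameter values such as `n_fft=512`, `hop=160`, and `n_mels=40` are illustrative assumptions, not taken from the article.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=40):
    # Frame the signal and apply a Hann window.
    frames = np.array([signal[s:s + n_fft] * np.hanning(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank: filter centers equally spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        if c > l:
            fbank[i - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[i - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    # Log compression; the small floor avoids log of zero.
    return np.log(power @ fbank.T + 1e-10)
```

The result is a (frames x mel-bands) matrix, the standard 2-D input for convolutional or recurrent acoustic models.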
Incremental interpretation and prediction of utterance meaning for interactive dialogue
We present techniques for the incremental interpretation and prediction of utterance meaning in dialogue systems. These techniques open possibilities for systems to initiate responsive overlap behaviors during user speech, such as interrupting, acknowledging, or completing a user's utterance while it is still in progress. In an implemented system, we show that relatively high accuracy can be achieved in understanding of spontaneous utterances before utterances are completed. Further, we present a method for determining when a system has reached a point of maximal understanding of an ongoing user utterance, and show that this determination can be made with high precision. Finally, we discuss a prototype implementation that shows how systems can use these abilities to strategically initiate system completions of user utterances. More broadly, this framework facilitates the implementation of a range of overlap behaviors that are common in human dialogue, but have been largely absent in dialogue systems.
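The idea of detecting a point of maximal understanding can be caricatured with a simple stability heuristic (a hypothetical sketch, not the authors' trained model): commit to an interpretation once it has remained unchanged across k successive partial ASR results.

```python
def stable_commit(partials, k=3):
    """Return (index, hypothesis) once the same interpretation has been
    seen k times in a row in a stream of partial results, else (None, None)."""
    last, streak = None, 0
    for i, hyp in enumerate(partials):
        streak = streak + 1 if hyp == last else 1
        last = hyp
        if streak >= k:
            return i, hyp          # commit point: hypothesis has stabilized
    return None, None              # never stabilized within the stream
```

A real system would combine such stability evidence with confidence scores and prosodic cues, but the heuristic shows why incremental interpretation enables acting before the utterance ends.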
Data Balancing for Efficient Training of Hybrid ANN/HMM Automatic Speech Recognition Systems
Hybrid speech recognizers, in which the estimation of the emission pdfs of the states of Hidden Markov Models (HMMs), usually carried out using Gaussian Mixture Models (GMMs), is replaced by Artificial Neural Networks (ANNs), have several advantages over classical systems. However, the performance improvements come at a heavily increased computational cost because of the need to train the ANN. Starting from the observation that speech data are remarkably skewed across classes, this paper proposes sifting the training set to balance the number of samples per class. With this method the training time is reduced by a factor of 18 while achieving performance similar to, or even better than, that obtained with the whole database, especially in noisy environments. However, the application of these reduced sets is not straightforward: to avoid the mismatch between training and testing conditions created by modifying the distribution of the training data, the a posteriori probabilities must be properly scaled and the context window resized, as demonstrated in the paper. This work was supported in part by the regional grant (Comunidad Autónoma de Madrid-UC3M) CCG06-UC3M/TIC-0812 and in part by a project funded by the Spanish Ministry of Science and Innovation (TEC 2008-06382).
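The posterior scaling the paper calls for follows from Bayes' rule: the HMM decoder needs scaled likelihoods p(x|s) ∝ p(s|x)/p(s), so the network's posteriors must be divided by the class priors it was actually trained on, which change when the training set is balanced. A minimal sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, train_priors, eps=1e-10):
    """Convert ANN state posteriors to HMM-ready scaled likelihoods:
    log p(x|s) = log p(s|x) - log p(s) + const. The priors must match
    the (balanced) class distribution the network was trained on."""
    return log_posteriors - np.log(train_priors + eps)
```

With a fully balanced training set the priors are uniform, so the subtraction is a constant shift that leaves the per-frame ranking of states unchanged; with skewed priors it can change which state wins.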