The Kumaraswamy Generalized Power Weibull Distribution
A new family of distributions called the Kumaraswamy-generalized power Weibull (Kgpw) distribution is proposed and studied. This family includes a number of well-known sub-models, such as the Weibull, exponentiated Weibull, Kumaraswamy Weibull, and generalized power Weibull distributions, as well as new sub-models, namely the exponentiated generalized power Weibull and Kumaraswamy generalized power exponential distributions. Some statistical properties of the new distribution, including its moments, moment generating function, quantile function and hazard function, are derived. In addition, maximum likelihood estimates of the model parameters are obtained. An application, as well as comparisons of the Kgpw and its sub-distributions, is given. Keywords: Generalized power Weibull distribution, Kumaraswamy distribution, Maximum likelihood estimation, Moment generating function, Hazard rate function.
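The Kumaraswamy-G construction applies F(x) = 1 - (1 - G(x)^a)^b to a baseline CDF G. A minimal sketch of the Kgpw CDF, assuming one common parameterization of the generalized power Weibull baseline, G(x) = 1 - exp{1 - (1 + λx^α)^θ} (the abstract does not fix the parameterization, so the symbols below are an assumption):

```python
import math

def gpw_cdf(x, alpha, theta, lam):
    """Baseline generalized power Weibull CDF (assumed parameterization):
    G(x) = 1 - exp(1 - (1 + lam * x**alpha)**theta), for x >= 0."""
    return 1.0 - math.exp(1.0 - (1.0 + lam * x**alpha)**theta)

def kgpw_cdf(x, a, b, alpha, theta, lam):
    """Kumaraswamy-G construction applied to the GPW baseline:
    F(x) = 1 - (1 - G(x)**a)**b.  Setting a = b = 1 recovers the baseline,
    which is how the sub-models listed in the abstract arise."""
    g = gpw_cdf(x, alpha, theta, lam)
    return 1.0 - (1.0 - g**a)**b
```

With a = b = 1 the family collapses to the generalized power Weibull, illustrating the sub-model structure described above.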
Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings
Models of acoustic word embeddings (AWEs) learn to map variable-length spoken
word segments onto fixed-dimensionality vector representations such that
different acoustic exemplars of the same word are projected nearby in the
embedding space. In addition to their speech technology applications, AWE
models have been shown to predict human performance on a variety of auditory
lexical processing tasks. Current AWE models are based on neural networks and
trained in a bottom-up approach that integrates acoustic cues to build up a
word representation given an acoustic or symbolic supervision signal.
Therefore, these models do not leverage or capture high-level lexical knowledge
during the learning process. In this paper, we propose a multi-task learning
model that incorporates top-down lexical knowledge into the training procedure
of AWEs. Our model learns a mapping between the acoustic input and a lexical
representation that encodes high-level information such as word semantics in
addition to bottom-up form-based supervision. We experiment with three
languages and demonstrate that incorporating lexical knowledge improves the
embedding space discriminability and encourages the model to better separate
lexical categories. Comment: Accepted in INTERSPEECH 202
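The multi-task objective described above can be sketched as a weighted sum of a bottom-up, form-based term and a top-down, meaning-based term. The triplet margin, the cosine formulation, and the weighting coefficient below are illustrative assumptions, not the paper's exact losses:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def triplet_form_loss(anchor, positive, negative, margin=0.4):
    """Bottom-up term: exemplars of the same word (anchor, positive) should be
    closer than exemplars of different words (anchor, negative)."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

def semantic_match_loss(embedding, lexical_vector):
    """Top-down term: align the acoustic embedding with a lexical vector that
    encodes high-level information such as word semantics."""
    return 1.0 - cosine(embedding, lexical_vector)

def multitask_loss(anchor, positive, negative, lexical_vector, alpha=0.5):
    """Combined objective; alpha (the task weighting) is an assumption."""
    return (triplet_form_loss(anchor, positive, negative)
            + alpha * semantic_match_loss(anchor, lexical_vector))
```

When same-word exemplars coincide and the embedding already matches its lexical vector, both terms vanish; mismatches in either form or meaning raise the loss.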
An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech
Self-supervised representation learning for speech often involves a
quantization step that transforms the acoustic input into discrete units.
However, it remains unclear how to characterize the relationship between these
discrete units and abstract phonetic categories such as phonemes. In this
paper, we develop an information-theoretic framework whereby we represent each
phonetic category as a distribution over discrete units. We then apply our
framework to two different self-supervised models (namely wav2vec 2.0 and XLSR)
and use American English speech as a case study. Our study demonstrates that
the entropy of phonetic distributions reflects the variability of the
underlying speech sounds, with phonetically similar sounds exhibiting similar
distributions. While our study confirms the lack of direct, one-to-one
correspondence, we find an intriguing, indirect relationship between phonetic
categories and discrete units. Comment: Accepted in Interspeech 202
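The framework's core quantities can be sketched directly: represent a phonetic category as an empirical distribution over the discrete units aligned to it, measure its variability with Shannon entropy, and compare two categories with a symmetric divergence. The unit labels and the choice of Jensen-Shannon divergence below are illustrative assumptions:

```python
import math
from collections import Counter

def unit_distribution(units):
    """Empirical distribution over discrete units observed for one phonetic
    category (e.g. all units a quantizer assigned to tokens of one phoneme)."""
    counts = Counter(units)
    total = sum(counts.values())
    return {u: c / total for u, c in counts.items()}

def entropy(dist):
    """Shannon entropy in bits; higher values indicate more variable
    realizations of the underlying speech sound."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (bits): 0 for identical distributions, 1 for
    disjoint ones, so phonetically similar sounds should score near 0."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in keys if a.get(k, 0.0) > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Under this reading, the abstract's claim is that entropy tracks acoustic variability while low pairwise divergence tracks phonetic similarity, without any one-to-one unit-to-phoneme mapping.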
Dynamic Extension of ASR Lexicon Using Wikipedia Data
Despite recent progress in developing Large Vocabulary Continuous Speech Recognition (LVCSR) systems, these systems suffer from Out-Of-Vocabulary (OOV) words. In many cases, the OOV words are Proper Nouns (PNs). The correct recognition of PNs is essential for broadcast news, audio indexing, etc. In this article, we address the problem of OOV PN retrieval in the framework of broadcast news LVCSR. We focus on dynamic (document-dependent) extension of the LVCSR lexicon. To retrieve relevant OOV PNs, we propose to use a very large multipurpose text corpus: Wikipedia. This corpus contains a huge number of PNs, which are grouped into semantically similar classes using word embeddings. We use a two-step approach: first, we select pertinent OOV PN classes with a multi-class Deep Neural Network (DNN); second, we rank the OOVs of the selected classes. Experiments on French broadcast news show that the Bi-GRU model outperforms the other studied models. Speech recognition experiments demonstrate the effectiveness of the proposed methodology.
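The two-step retrieval above can be sketched as: select the top-scoring PN classes, then rank the proper nouns of those classes against a document representation. A minimal sketch, assuming class scores come from the paper's multi-class DNN (here they are simply given) and that ranking uses cosine similarity between word-embedding vectors:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_oov_candidates(doc_vector, class_scores, class_members, top_classes=2):
    """Step 1: keep the top-scoring semantic PN classes (scores would come
    from the multi-class DNN; here they are inputs).
    Step 2: rank the OOV proper nouns of those classes by similarity between
    their embedding and the document vector."""
    selected = sorted(class_scores, key=class_scores.get, reverse=True)[:top_classes]
    candidates = [(pn, vec) for c in selected for pn, vec in class_members[c]]
    ranked = sorted(candidates, key=lambda pv: cosine(doc_vector, pv[1]), reverse=True)
    return [pn for pn, _ in ranked]
```

The best-ranked PNs would then be added to the recognizer's lexicon for that document; the number of classes kept and the similarity measure are assumptions for illustration.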