
    The Kumaraswamy Generalized Power Weibull Distribution

    A new family of distributions called the Kumaraswamy-generalized power Weibull (Kgpw) distribution is proposed and studied. This family includes a number of well-known sub-models, such as the Weibull, exponentiated Weibull, Kumaraswamy Weibull and generalized power Weibull distributions, as well as new sub-models, namely the exponentiated generalized power Weibull and Kumaraswamy generalized power exponential distributions. Some statistical properties of the new distribution, including its moments, moment generating function, quantile function and hazard function, are derived. In addition, maximum likelihood estimates of the model parameters are obtained. An application, together with comparisons of the Kgpw distribution and its sub-distributions, is given.
    Keywords: Generalized power Weibull distribution, Kumaraswamy distribution, Maximum likelihood estimation, Moment generating function, Hazard rate function.
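    A minimal sketch of how such a family is typically constructed, assuming the standard Kumaraswamy-G composition applied to a generalized power Weibull baseline; the parameter names a, b, lambda, gamma, theta are illustrative rather than the paper's notation:

```latex
% Assumed construction (sketch): Kumaraswamy-G family applied to a
% generalized power Weibull baseline CDF G(x).
\[
  G(x) = 1 - \exp\!\left\{ 1 - \left(1 + \lambda x^{\gamma}\right)^{\theta} \right\}, \qquad x > 0,
\]
\[
  F(x) = 1 - \left[\, 1 - G(x)^{a} \,\right]^{b}, \qquad a, b > 0.
\]
```

    Under this construction, setting a = b = 1 recovers the generalized power Weibull baseline, and taking theta = 1 in the baseline reduces it to the ordinary Weibull distribution, consistent with the sub-models listed above.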

    Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

    Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up fashion, integrating acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. As a result, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics, in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the discriminability of the embedding space and encourages the model to better separate lexical categories.
    Comment: Accepted at INTERSPEECH 202
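    As a rough illustration of the kind of objective the abstract describes, the sketch below combines a form-based (contrastive) loss over acoustic word embeddings with a meaning-based loss that regresses onto pretrained semantic word vectors. The architecture, loss forms, and weighting are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical multi-task AWE sketch (PyTorch); not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskAWE(nn.Module):
    """GRU acoustic encoder with two heads: a form head producing the acoustic
    word embedding and a meaning head predicting a semantic word vector."""
    def __init__(self, feat_dim=39, hid_dim=256, emb_dim=128, sem_dim=300):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hid_dim, batch_first=True, bidirectional=True)
        self.form_head = nn.Linear(2 * hid_dim, emb_dim)
        self.meaning_head = nn.Linear(2 * hid_dim, sem_dim)

    def forward(self, x):
        _, h = self.encoder(x)               # h: (2, batch, hid_dim)
        h = torch.cat([h[0], h[1]], dim=-1)  # concatenate both directions
        return self.form_head(h), self.meaning_head(h)

def multitask_loss(form_emb, sem_pred, word_ids, sem_targets, alpha=0.5, margin=0.4):
    """Form loss: pull same-word pairs together, push different words apart.
    Meaning loss: cosine regression to pretrained word vectors (assumed form)."""
    z = F.normalize(form_emb, dim=-1)
    sim = z @ z.t()                                    # pairwise cosine similarities
    same = word_ids.unsqueeze(0) == word_ids.unsqueeze(1)
    eye = torch.eye(len(word_ids), dtype=torch.bool)
    pos = sim[same & ~eye].mean() if (same & ~eye).any() else sim.new_tensor(0.0)
    neg = sim[~same].mean() if (~same).any() else sim.new_tensor(0.0)
    form_loss = F.relu(margin - pos + neg)
    meaning_loss = 1.0 - F.cosine_similarity(sem_pred, sem_targets, dim=-1).mean()
    return alpha * form_loss + (1.0 - alpha) * meaning_loss

# Toy usage: 8 word segments of 50 frames, 39-dim acoustic features.
model = MultiTaskAWE()
feats = torch.randn(8, 50, 39)
word_ids = torch.randint(0, 4, (8,))
sem_targets = torch.randn(8, 300)      # stand-in for pretrained word vectors
form_emb, sem_pred = model(feats)
multitask_loss(form_emb, sem_pred, word_ids, sem_targets).backward()
```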

    An Information-Theoretic Analysis of Self-supervised Discrete Representations of Speech

    Self-supervised representation learning for speech often involves a quantization step that transforms the acoustic input into discrete units. However, it remains unclear how to characterize the relationship between these discrete units and abstract phonetic categories such as phonemes. In this paper, we develop an information-theoretic framework in which each phonetic category is represented as a distribution over discrete units. We then apply our framework to two different self-supervised models (namely wav2vec 2.0 and XLSR), using American English speech as a case study. Our study demonstrates that the entropy of phonetic distributions reflects the variability of the underlying speech sounds, with phonetically similar sounds exhibiting similar distributions. While our study confirms the lack of a direct, one-to-one correspondence, we find an intriguing, indirect relationship between phonetic categories and discrete units.
    Comment: Accepted at Interspeech 202
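    As a rough illustration of the analysis the abstract describes, the sketch below builds, for each phonetic category, a distribution over discrete units from aligned (phoneme, unit) pairs, then computes its entropy and a pairwise Jensen-Shannon divergence. The exact quantities and alignment procedure used in the paper may differ.

```python
# Hypothetical sketch of phoneme-to-unit distributional analysis; not the paper's code.
import numpy as np

def unit_distributions(pairs, n_units):
    """pairs: iterable of (phoneme_label, unit_id) from frame-level alignments.
    Returns {phoneme: probability vector over discrete units}."""
    counts = {}
    for phone, unit in pairs:
        counts.setdefault(phone, np.zeros(n_units))[unit] += 1
    return {p: c / c.sum() for p, c in counts.items()}

def entropy(p, eps=1e-12):
    """Shannon entropy in bits; higher values suggest more variable realizations."""
    return float(-(p * np.log2(p + eps)).sum())

def js_divergence(p, q, eps=1e-12):
    """Symmetric divergence between two phoneme distributions over units."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float((a * (np.log2(a + eps) - np.log2(b + eps))).sum())
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy usage with 3 discrete units and two phoneme labels.
pairs = [("AA", 0), ("AA", 0), ("AA", 1), ("IY", 2), ("IY", 2), ("IY", 1)]
dists = unit_distributions(pairs, n_units=3)
print({p: round(entropy(d), 3) for p, d in dists.items()})
print(round(js_divergence(dists["AA"], dists["IY"]), 3))
```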

    Dynamic Extension of ASR Lexicon Using Wikipedia Data

    Despite recent progress in developing Large Vocabulary Continuous Speech Recognition (LVCSR) systems, these systems suffer from Out-Of-Vocabulary (OOV) words. In many cases, the OOV words are Proper Nouns (PNs), and the correct recognition of PNs is essential for broadcast news, audio indexing, etc. In this article, we address the problem of OOV PN retrieval in the framework of broadcast news LVCSR, focusing on dynamic (document-dependent) extension of the LVCSR lexicon. To retrieve relevant OOV PNs, we propose to use a very large multipurpose text corpus, Wikipedia, which contains a huge number of PNs. These PNs are grouped into semantically similar classes using word embeddings. We use a two-step approach: first, we select pertinent OOV PN classes with a multi-class Deep Neural Network (DNN); second, we rank the OOVs of the selected classes. Experiments on French broadcast news show that the Bi-GRU model outperforms the other models studied, and speech recognition experiments demonstrate the effectiveness of the proposed methodology.
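    A minimal sketch of the two-step idea described above. The paper's class selection uses a multi-class DNN; here it is replaced by simple cosine scoring against class centroids, and all names, vectors, and function signatures are toy assumptions for illustration.

```python
# Hypothetical two-step OOV proper-noun retrieval sketch; not the paper's implementation.
import numpy as np

def rank_oov_proper_nouns(doc_vec, class_vecs, class_members, pn_vecs, top_classes=2):
    """doc_vec: embedding of the document (e.g. averaged word vectors).
    class_vecs: {class_name: centroid vector} for semantically grouped PN classes.
    class_members: {class_name: [proper_noun, ...]} built from Wikipedia.
    pn_vecs: {proper_noun: embedding vector}.
    Returns candidate PNs sorted by cosine similarity to the document."""
    cos = lambda a, b: float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    # Step 1: select the most pertinent classes (a DNN classifier in the paper;
    # plain cosine scoring is used here as a stand-in).
    best = sorted(class_vecs, key=lambda c: cos(doc_vec, class_vecs[c]), reverse=True)[:top_classes]
    # Step 2: rank the proper nouns belonging to the selected classes.
    candidates = {pn for c in best for pn in class_members[c]}
    return sorted(candidates, key=lambda pn: cos(doc_vec, pn_vecs[pn]), reverse=True)

# Toy usage with random vectors and made-up class/PN names.
rng = np.random.default_rng(0)
doc_vec = rng.normal(size=50)
class_vecs = {"politicians": rng.normal(size=50), "athletes": rng.normal(size=50)}
class_members = {"politicians": ["Macron", "Merkel"], "athletes": ["Mbappe"]}
pn_vecs = {pn: rng.normal(size=50) for m in class_members.values() for pn in m}
print(rank_oov_proper_nouns(doc_vec, class_vecs, class_members, pn_vecs, top_classes=1))
```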