Search CORE

3,731 research outputs found

Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

Author: Nagasaka Shogo
Nakashima Ryo
Taniguchi Tadahiro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/03/2016
Field of study

Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming HDP-HLM as a generative model of observed time series data, and by inferring latent variables of the model, the method can analyze latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. The novel unsupervised double articulation analyzer is called NPB-DAA. The NPB-DAA can automatically estimate double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on Autonomous Mental Development (TAMD

arXiv.org e-Print Archive

Transfer Learning for Speech and Language Processing

Author: Wang Dong
Zheng Thomas Fang
Publication venue
Publication date: 19/11/2015
Field of study

Transfer learning is a vital technique that generalizes models trained for one setting or task to other settings or tasks. For example in speech recognition, an acoustic model trained for one language can be used to recognize speech in another language, with little or no re-training data. Transfer learning is closely related to multi-task learning (cross-lingual vs. multilingual), and is traditionally studied in the name of `model adaptation'. Recent advance in deep learning shows that transfer learning becomes much easier and more effective with high-level abstract features learned by deep models, and the `transfer' can be conducted not only between data distributions and data types, but also between model structures (e.g., shallow nets and deep nets) or even model types (e.g., Bayesian models and neural models). This review paper summarizes some recent prominent research towards this direction, particularly for speech and language processing. We also report some results from our group and highlight the potential of this very interesting research field.Comment: 13 pages, APSIPA 201

arXiv.org e-Print Archive

Crossref

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012
Field of study

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Tied Probabilistic Linear Discriminant Analysis for Speech Recognition

Author: Lu Liang
Renals Steve
Publication venue
Publication date: 30/11/2014
Field of study

Edinburgh Research Explorer

An Efficient Probabilistic Deep Learning Model for the Oral Proficiency Assessment of Student Speech Recognition and Classification

Author: Li Wenyi
Mohamad Maslawati
Publication venue: Auricle Global Society of Education and Research
Publication date: 30/07/2023
Field of study

Natural Language Processing is a branch of artificial intelligence (AI) that focuses on the interaction between computers and human language. Speech recognition systems utilize machine learning algorithms and statistical models to analyze acoustic features of speech, such as pitch, duration, and frequency, to convert spoken words into written text. The Student English Oral Proficiency Assessment and Feedback System provides students with a comprehensive evaluation of their spoken English skills and offers tailored feedback to help them improve. It can be used in language learning institutions, universities, or online platforms to support language education and enhance oral communication abilities. In this paper constructed a framework stated as Latent Dirichlet Integrated Deep Learning (LDiDL) for the assessment of student English proficiency assessment. The system begins by collecting a comprehensive dataset of spoken English samples, encompassing various proficiency levels. Relevant features are extracted from the samples, including acoustic characteristics and linguistic attributes. Leveraging Latent Dirichlet Allocation (LDA), the system uncovers latent topics within the data, enabling a deeper understanding of the underlying themes present in the spoken English. To further enhance the analysis, a deep learning model is developed, integrating the LDA topics with the extracted features. This model is trained using appropriate techniques and evaluated using performance metrics. Utilizing the predictions made by the model, the system generates personalized feedback for each student, focusing on areas of improvement such as vocabulary, grammar, fluency, and pronunciation. Simulation mode uses the native English speech audio for the LDiDL training and classification. The experimental analysis stated that the proposed LDiDL model achieves an accuracy of 99% for the assessment of English Proficiency

International Journal on Recent and Innovation Trends in Computing and Communication

Context based multimedia information retrieval

Author: Mølgaard Lasse Lohilahti
Publication venue: Technical University of Denmark
Publication date: 01/12/2009
Field of study

Online Research Database In Technology