Search CORE

690 research outputs found

Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals

Author: Nagasaka Shogo
Nakashima Ryo
Taniguchi Tadahiro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 09/03/2016
Field of study

Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. In this paper, we develop a novel machine learning method called nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we developed a novel double articulation analyzer. By assuming HDP-HLM as a generative model of observed time series data, and by inferring latent variables of the model, the method can analyze latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. The novel unsupervised double articulation analyzer is called NPB-DAA. The NPB-DAA can automatically estimate double articulation structure embedded in speech signals. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on Autonomous Mental Development (TAMD

arXiv.org e-Print Archive

A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

Author: Bennett Erin
Borschinger Benjamin
Chiu Justin
Church Kenneth
Clark Pascal
Dunbar Ewan
Dupoux Emmanuel
Feldman Naomi
Fourtassi Abdallah
Goldwater Sharon
Harwath David
Hermansky Hynek
Jansen Aren
Johnson Mark
Khudanpur Sanjeev
Lee Chia-ying
Levin Keith
McGraw Ian
Metze Florian
Norouzian Atta
Peddinti Vijay
Richardson Rachel
Rose Richard
Schatz Thomas
Seltzer Mike
Thomas Samuel
Varadarajan Balakrishnan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s

Crossref

Edinburgh Research Explorer

Macquarie University ResearchOnline

Distributional cues to word segmentation: Context is important

Author: Goldwater Sharon
Griffiths Thomas L.
Johnson Mark
Publication venue
Publication date: 01/01/2007
Field of study

Edinburgh Research Explorer

Modeling Human Performance on Statistical Word Segmentation Tasks

Author: Frank Michael C.
Goldwater Sharon
Griffiths Thomas L.
Mansinghka Vikash
Tenenbaum Joshua B.
Publication venue
Publication date: 01/01/2007
Field of study

Edinburgh Research Explorer

Modeling Human Performance on Statistical Word Segmentation Tasks

Author: Cai Xinlun
Chen Lifeng
Ho Daniel
Li Huanlu
Phillips David B.
Wang Xuyang
Yu Siyuan
Zhou Xiaoqi
Zhu Jiangbo
Publication venue
Publication date: 01/01/2007
Field of study

Harnessing the orbital angular momentum (OAM) of light is an appealing approach to developing photonic technologies for future applications in optical communications and high-dimensional quantum key distribution (QKD) systems. An outstanding challenge to the widespread uptake of the OAM resource is its efficient generation. In this work we design a new device that can directly emit an OAM-carrying light beam from a low-cost semiconductor laser. By fabricating micro-scale spiral phase plates within the aperture of a vertical-cavity surface-emitting laser (VCSEL), the linearly polarized Gaussian beam emitted by the VCSEL is converted into a beam carrying specific OAM modes and their superposition states, with high efficiency and high beam quality. This new approach to OAM generation may be particularly useful in the field of OAM-based optical and quantum communications, especially for short-reach data interconnects and QKD

Northumbria University Research Portal

Crossref

Edinburgh Research Explorer

Enlighten

Explore Bristol Research

Unsupervised extraction of recurring words from infant-directed speech

Author: Goldwater Sharon
McInnes Fergus R.
Publication venue
Publication date: 01/01/2011
Field of study

To date, most computational models of infant word segmentation have worked from phonemic or phonetic input, or have used toy datasets. In this paper, we present an algorithm for word extraction that works directly from naturalistic acoustic input: infant-directed speech from the CHILDES corpus. The algorithm identifies recurring acoustic patterns that are candidates for identification as words or phrases, and then clusters together the most similar patterns. The recurring patterns are found in a single pass through the corpus using an incremental method, where only a small number of utterances are considered at once. Despite this limitation, we show that the algorithm is able to extract a number of recurring words, including some that infants learn earliest, such as Mommy and the child’s name. We also introduce a novel information-theoretic evaluation measure

CiteSeerX

Edinburgh Research Explorer

eScholarship - University of California

Symbol Emergence in Robotics: A Survey

Author: Asoh Hideki
Iwahashi Naoto
Nagai Takayuki
Nakamura Tomoaki
Ogata Tetsuya
Taniguchi Tadahiro
Publication venue
Publication date: 29/09/2015
Field of study

Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-art research topics concerning SER, e.g., multimodal categorization, word discovery, and a double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory--motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.Comment: submitted to Advanced Robotic

arXiv.org e-Print Archive