2,956 research outputs found
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s
Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals
Human infants can discover words directly from unsegmented speech signals
without any explicitly labeled data. In this paper, we develop a novel machine
learning method called nonparametric Bayesian double articulation analyzer
(NPB-DAA) that can directly acquire language and acoustic models from observed
continuous speech signals. For this purpose, we propose an integrative
generative model that combines a language model and an acoustic model into a
single generative model called the "hierarchical Dirichlet process hidden
language model" (HDP-HLM). The HDP-HLM is obtained by extending the
hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by
Johnson et al. An inference procedure for the HDP-HLM is derived using the
blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure
enables the simultaneous and direct inference of language and acoustic models
from continuous speech signals. Based on the HDP-HLM and its inference
procedure, we developed a novel double articulation analyzer. By assuming
HDP-HLM as a generative model of observed time series data, and by inferring
latent variables of the model, the method can analyze latent double
articulation structure, i.e., hierarchically organized latent words and
phonemes, of the data in an unsupervised manner. The novel unsupervised double
articulation analyzer is called NPB-DAA.
The NPB-DAA can automatically estimate double articulation structure embedded
in speech signals. We also carried out two evaluation experiments using
synthetic data and actual human continuous speech signals representing Japanese
vowel sequences. In the word acquisition and phoneme categorization tasks, the
NPB-DAA outperformed a conventional double articulation analyzer (DAA) and
baseline automatic speech recognition system whose acoustic model was trained
in a supervised manner.Comment: 15 pages, 7 figures, Draft submitted to IEEE Transactions on
Autonomous Mental Development (TAMD
An Empirical Evaluation of Zero Resource Acoustic Unit Discovery
Acoustic unit discovery (AUD) is a process of automatically identifying a
categorical acoustic unit inventory from speech and producing corresponding
acoustic unit tokenizations. AUD provides an important avenue for unsupervised
acoustic model training in a zero resource setting where expert-provided
linguistic knowledge and transcribed speech are unavailable. Therefore, to
further facilitate zero-resource AUD process, in this paper, we demonstrate
acoustic feature representations can be significantly improved by (i)
performing linear discriminant analysis (LDA) in an unsupervised self-trained
fashion, and (ii) leveraging resources of other languages through building a
multilingual bottleneck (BN) feature extractor to give effective cross-lingual
generalization. Moreover, we perform comprehensive evaluations of AUD efficacy
on multiple downstream speech applications, and their correlated performance
suggests that AUD evaluations are feasible using different alternative language
resources when only a subset of these evaluation resources can be available in
typical zero resource applications.Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term, requires an
understanding of the dynamics of symbol systems and is crucially important. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory--motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.Comment: submitted to Advanced Robotic
Topic Identification for Speech without ASR
Modern topic identification (topic ID) systems for speech use automatic
speech recognition (ASR) to produce speech transcripts, and perform supervised
classification on such ASR outputs. However, under resource-limited conditions,
the manually transcribed speech required to develop standard ASR systems can be
severely limited or unavailable. In this paper, we investigate alternative
unsupervised solutions to obtaining tokenizations of speech in terms of a
vocabulary of automatically discovered word-like or phoneme-like units, without
depending on the supervised training of ASR systems. Moreover, using automatic
phoneme-like tokenizations, we demonstrate that a convolutional neural network
based framework for learning spoken document representations provides
competitive performance compared to a standard bag-of-words representation, as
evidenced by comprehensive topic ID evaluations on both single-label and
multi-label classification tasks.Comment: 5 pages, 2 figures; accepted for publication at Interspeech 201
Bayesian Models for Unit Discovery on a Very Low Resource Language
Developing speech technologies for low-resource languages has become a very
active research field over the last decade. Among others, Bayesian models have
shown some promising results on artificial examples but still lack of in situ
experiments. Our work applies state-of-the-art Bayesian models to unsupervised
Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also
show that Bayesian models can naturally integrate information from other
resourceful languages by means of informative prior leading to more consistent
discovered units. Finally, discovered acoustic units are used, either as the
1-best sequence or as a lattice, to perform word segmentation. Word
segmentation results show that this Bayesian approach clearly outperforms a
Segmental-DTW baseline on the same corpus.Comment: Accepted to ICASSP 201
Nonparametric Dark Energy Reconstruction from Supernova Data
Understanding the origin of the accelerated expansion of the Universe poses
one of the greatest challenges in physics today. Lacking a compelling
fundamental theory to test, observational efforts are targeted at a better
characterization of the underlying cause. If a new form of mass-energy, dark
energy, is driving the acceleration, the redshift evolution of the equation of
state parameter w(z) will hold essential clues as to its origin. To best
exploit data from observations it is necessary to develop a robust and accurate
reconstruction approach, with controlled errors, for w(z). We introduce a new,
nonparametric method for solving the associated statistical inverse problem
based on Gaussian Process modeling and Markov chain Monte Carlo sampling.
Applying this method to recent supernova measurements, we reconstruct the
continuous history of w out to redshift z=1.5.Comment: 4 pages, 2 figures, accepted for publication in Physical Review
Letter
- …