Search CORE

740 research outputs found

Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

Author: Eisenstein Jacob
Elsner Micha
Goldwater Sharon
Publication date: 01/07/2012

A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

Author: Bennett Erin
Borschinger Benjamin
Chiu Justin
Church Kenneth
Clark Pascal
Dunbar Ewan
Dupoux Emmanuel
Feldman Naomi
Fourtassi Abdallah
Goldwater Sharon
Harwath David
Hermansky Hynek
Jansen Aren
Johnson Mark
Khudanpur Sanjeev
Lee Chia-ying
Levin Keith
McGraw Ian
Metze Florian
Norouzian Atta
Peddinti Vijay
Richardson Rachel
Rose Richard
Schatz Thomas
Seltzer Mike
Thomas Samuel
Varadarajan Balakrishnan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies. (5 pages)

Edinburgh Research Explorer

Macquarie University ResearchOnline
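
The workshop summary above mentions applying Bayesian word segmentation algorithms to subword unit tokenizations. As a rough illustration only, the Python sketch below scores a candidate segmentation under a Dirichlet-process unigram model of the kind commonly used for Bayesian word segmentation: each word is either reused in proportion to how often it has already appeared or generated afresh from a base distribution P0. The base distribution (geometric word length with stopping probability `p_stop`, uniform phonemes over `alphabet_size` symbols), the function names, and all parameter values are assumptions made for this example, not the workshop's implementation.

```python
import math
from collections import Counter

def base_prob(word, alphabet_size, p_stop):
    """Hypothetical base distribution P0: geometric word length, uniform phonemes."""
    length_prob = p_stop * (1 - p_stop) ** (len(word) - 1)
    return length_prob * (1.0 / alphabet_size) ** len(word)

def log_prob_segmentation(words, alpha=1.0, alphabet_size=26, p_stop=0.5):
    """Log probability of a word sequence under a Dirichlet-process unigram model.

    The i-th word (0-based) is drawn with probability
    (count of that word so far + alpha * P0(word)) / (i + alpha).
    """
    counts = Counter()
    logp = 0.0
    for i, word in enumerate(words):
        p = (counts[word] + alpha * base_prob(word, alphabet_size, p_stop)) / (i + alpha)
        logp += math.log(p)
        counts[word] += 1
    return logp

if __name__ == "__main__":
    # Three alternative segmentations of the same unsegmented input "thedogthedog".
    for seg in (["the", "dog", "the", "dog"], ["thedog", "thedog"], ["thedogthedog"]):
        print(seg, round(log_prob_segmentation(seg), 2))
```

Comparing such scores across alternative analyses of the same input is what lets a model of this kind prefer one segmentation over another. With the default toy settings, `["thedog", "thedog"]` scores higher than `["the", "dog", "the", "dog"]`, a small reminder that a unigram model can favor under-segmented units.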

A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

Author: Elsner Micha
Feldman Naomi
Goldwater Sharon
Wood Frank
Publication date: 01/01/2013

Edinburgh Research Explorer

Computational and Robotic Models of Early Language Development: A Review

Author: Kachergis George
Oudeyer Pierre-Yves
Schueller William
Publication date: 25/03/2019

We review computational and robotic models of early language learning and development. We first explain why and how these models are used to better understand how children learn language. We argue that they provide concrete theories of language learning as a complex dynamic system, complementing traditional methods in psychology and linguistics. We review different modeling formalisms, grounded in techniques from machine learning and artificial intelligence such as Bayesian and neural network approaches. We then discuss their role in understanding several key mechanisms of language development: cross-situational statistical learning, embodiment, situated social interaction, intrinsically motivated learning, and cultural evolution. We conclude by discussing future challenges for research, including modeling of large-scale empirical data about language acquisition in real-world environments.
Keywords: early language learning, computational and robotic models, machine learning, development, embodiment, social interaction, intrinsic motivation, self-organization, dynamical systems, complexity.
Comment: to appear in International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledge.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Liaison acquisition: debates, critical issues, future research

Author: Chevrot Jean-Pierre
Dugua Céline
Harnois-Delpiano Mylène
Siccardi Anne
Spinelli Elsa
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Liaison is a sandhi phenomenon in French. Over the last four decades, it has given rise to many different models spanning the whole range of phonological theories. More recently, new studies have documented its acquisition by French-speaking children as well as by adult learners of French as a second language. These studies have resulted in two models of the acquisition process: (1) the constructionist model (Chevrot, Dugua & Fayol, 2009; Nicoladis & Paradis, 2011), developed within the framework of usage-based theories; and (2) the phonological model (Wauquier, 2009), which is framed within nonlinear phonology. Our aim is to re-examine the usage-based model in the light of the criticisms and suggestions made by Wauquier (2009). We first present the two models and then examine the issues under discussion. We then present longitudinal data testing a prediction made by the phonological model with regard to the generalization process in L1 and L2 acquisition. To conclude, we identify the points that remain to be clarified for each model and the directions future research should take.

Hal - Université Grenoble Alpes

HAL Université de Savoie

HAL Université de Tours

A role for the developing lexicon in phonetic category acquisition

Author: Feldman Naomi H.
Goldwater Sharon
Griffiths Thomas L.
Morgan James L.
Publication date: 01/01/2013

Infants segment words from fluent speech during the same period when they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior from artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning.

Crossref

PubMed Central

Edinburgh Research Explorer

Bootstrapping Lexical Choice via Multiple-Sequence Alignment

Author: Barzilay Regina
Lee Lillian
Publication date: 01/01/2002

An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora, that is, datasets that supply several verbalizations of the corresponding semantics rather than just one. We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and overall-output basis. For example, in evaluations involving a dozen human judges, our system produced output whose readability and faithfulness to the semantic input rivaled that of a traditional generation system.
Comment: 8 pages; to appear in the proceedings of EMNLP-2002.

arXiv.org e-Print Archive

CiteSeerX

Columbia University Academic Commons
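
Multiple-sequence alignment, as used in the Barzilay and Lee abstract above, builds on pairwise alignment. The Python sketch below shows only that building block: Needleman-Wunsch global alignment of two word-token sequences, with hypothetical scores (`match=1`, `mismatch=-1`, `gap=-1`). It is not the authors' multiple-pass algorithm; a progressive multiple alignment would apply a step like this repeatedly to merge several verbalizations into one shared alignment.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of two token sequences.

    Returns (score, aligned_a, aligned_b), where None marks a gap.
    """
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(None); i -= 1
        else:
            out_a.append(None); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]

if __name__ == "__main__":
    total, top, bottom = global_align("the proof is done".split(),
                                      "the proof is complete".split())
    print(total)
    for x, y in zip(top, bottom):
        print(f"{x or '-':>8} {y or '-':>8}")
```

Aligned columns where the two verbalizations differ (here `done` against `complete`) are the kind of slots from which alternative realizations could be collected.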

SHOE: The extraction of hierarchical structure for machine learning of natural language

Author: Daelemans W.M.P.
Powers D.M.W.
Publication venue: Institute for Language Technology and Artificial Intelligence, Tilburg University
Publication date: 01/01/1991

Tilburg University Repository

Infants' First Words are not Phonetically Specified: Own Name Recognition in British English-Learning 5-Month-Olds

Author: Aslin
Benavides-Varela
Bergelson
Bertoncini
Bijeljac-Babic
Bleses
Bouchon
Bull
Burnham
Christophe
Curtin
Davis
Delattre
Dell
Delle Luche
Dilley
Dodane
Eilers
Eimas
Elsner
Englund
Fear
Feldman
Fenson
Floccia
Fougeron
Fourtassi
Giegrich
Hallé
Hamilton
Havy
Hilaire
Hirst
Hochmann
Houston
Höhle
Højen
Iverson
Jusczyk
Kuhl
Lahiri
Lilienfeld
Mack
Malécot
Mandel
Mani
Martin
Narayan
Nazzi
Nespor
Ngon
Nittrouer
Patel
Pater
Polka
Poltrock
Ramus
Skandera
Skoruppa
Song
Stevens
Swingley
Tincoff
Vihman
Werker
Werner
White
Yeung
Yoshida
Publication venue: 'Wiley'
Publication date: 01/05/2017

University of Essex Research Repository

Crossref

Plymouth Electronic Archive and Research Library

Rhythmic unit extraction and modelling for automatic language identification

Author: André-Obrecht Régine
Farinas Jérôme
Pellegrino François
Rouas Jean-Luc
Publication venue: Elsevier : North-Holland
Publication date: 01/01/2005

This paper deals with an approach to automatic language identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to be considered for language identification, even if its extraction and modelling are not straightforward. In fact, one of the main problems to address is what to model. In this paper, an algorithm of rhythm extraction is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

HAL

Hal-Diderot
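
The rhythm abstract above models duration parameters of rhythmic units with Gaussian mixtures. A minimal sketch of the scoring side is shown below in Python, assuming diagonal covariances, one already-trained mixture per class, and independence between units (so per-unit log-likelihoods are summed). The feature layout (consonantal duration in seconds, vowel duration in seconds, number of consonants in the cluster) follows the parameters listed in the abstract, but the function names and the numbers in `TOY_MODELS` are invented for illustration, and training (for example by EM) is omitted.

```python
import math

# Each rhythmic unit is a feature vector:
# (consonantal duration in s, vowel duration in s, consonant cluster size).
# A GMM is a list of (weight, mean_vector, variance_vector) components.

def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, gmm):
    """Log p(x) under a mixture, computed with log-sum-exp for numerical stability."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(s - top) for s in logs))

def classify(units, models):
    """Return the class whose GMM gives the highest total log-likelihood,
    treating units as independent (their log-likelihoods are summed)."""
    totals = {name: sum(gmm_log_likelihood(u, gmm) for u in units)
              for name, gmm in models.items()}
    return max(totals, key=totals.get)

# Hypothetical toy models; the numbers are invented, not estimated from data.
TOY_MODELS = {
    "stress-timed": [
        (0.5, [0.15, 0.10, 2.5], [0.003, 0.002, 1.0]),
        (0.5, [0.20, 0.12, 3.0], [0.004, 0.003, 1.0]),
    ],
    "syllable-timed": [
        (0.5, [0.08, 0.09, 1.2], [0.001, 0.001, 0.3]),
        (0.5, [0.06, 0.11, 1.0], [0.001, 0.002, 0.3]),
    ],
}

if __name__ == "__main__":
    units = [[0.15, 0.10, 3.0], [0.14, 0.11, 2.0]]
    print(classify(units, TOY_MODELS))
```

Summing log-likelihoods over all units of an utterance is the simplest way to turn per-unit scores into an utterance-level decision; it ignores any dependence between neighbouring units.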