Search CORE

46 research outputs found

A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition

Author: Bennett Erin
Borschinger Benjamin
Chiu Justin
Church Kenneth
Clark Pascal
Dunbar Ewan
Dupoux Emmanuel
Feldman Naomi
Fourtassi Abdallah
Goldwater Sharon
Harwath David
Hermansky Hynek
Jansen Aren
Johnson Mark
Khudanpur Sanjeev
Lee Chia-ying
Levin Keith
McGraw Ian
Metze Florian
Norouzian Atta
Peddinti Vijay
Richardson Rachel
Rose Richard
Schatz Thomas
Seltzer Mike
Thomas Samuel
Varadarajan Balakrishnan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013
Field of study

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.5 page(s

Crossref

Edinburgh Research Explorer

Macquarie University ResearchOnline

Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

Author: Arthur Philip
Besacier Laurent
Black Alan
Ciannella Francesco
Du Mingxing
Dupoux Emmanuel
Godard Pierre
Hasegawa-Johnson Mark
Larsen Elin
Merkx Danny
Metze Florian
Mueller Markus
Neubig Graham
Ondel Lucas
Palaskar Shruti
Riad Rachid
Scharenborg Odette
Stueker Sebastian
Wang Liming
Publication venue
Publication date: 14/02/2018
Field of study

We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.Comment: Accepted to ICASSP 201

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Visually grounded learning of keyword prediction from untranscribed speech

Author: Kamper Herman
Livescu Karen
Settle Shane
Shakhnarovich Gregory
Publication venue
Publication date: 25/05/2017
Field of study

During language acquisition, infants have the benefit of visual cues to ground spoken language. Robots similarly have access to audio and visual sensors. Recent work has shown that images and spoken captions can be mapped into a meaningful common space, allowing images to be retrieved using speech and vice versa. In this setting of images paired with untranscribed spoken captions, we consider whether computer vision systems can be used to obtain textual labels for the speech. Concretely, we use an image-to-words multi-label visual classifier to tag images with soft textual labels, and then train a neural network to map from the speech to these soft targets. We show that the resulting speech system is able to predict which words occur in an utterance---acting as a spoken bag-of-words classifier---without seeing any parallel speech and text. We find that the model often confuses semantically related words, e.g. "man" and "person", making it even more effective as a semantic keyword spotter.Comment: 5 pages, 3 figures, 5 tables; small updates, added link to code; accepted to Interspeech 201

arXiv.org e-Print Archive

Crossref