Weak top-down constraints for unsupervised acoustic model training
Typical supervised acoustic model training relies on strong top-down constraints provided by dynamic programming alignment of the input observations to phonetic sequences derived from orthographic word transcripts and pronunciation dictionaries. This paper investigates a much weaker form of top-down supervision for use in place of transcripts and dictionaries in the zero resource setting. Our proposed constraints, which can be produced using recent spoken term discovery systems, come in the form of pairs of isolated word examples that share the same unknown type. For each pair, we perform a dynamic programming alignment of the acoustic observations of the two constituent examples, generating an inventory of cross-speaker frame pairs that each provide evidence that the same subword unit model should account for them. We find these weak top-down constraints are capable of improving model speaker independence by up to 57% relative over bottom-up training alone.
Index Terms — speaker independent acoustic models, unsupervised training, spectral clustering, top-down constraints
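The pairwise alignment step described above can be illustrated with a minimal dynamic time warping sketch. This is not the paper's implementation; the function name, distance metric (Euclidean over generic feature frames), and transition set (diagonal, up, left) are assumptions chosen to show how an alignment path yields frame pairs.

```python
import numpy as np

def dtw_align(x, y):
    """Dynamic programming alignment of two acoustic feature sequences.

    x: (n, d) array, y: (m, d) array of frames (e.g. spectral features).
    Returns a list of (i, j) index pairs along the minimum-cost path;
    each pair is one "frame pair" in the sense of the abstract above.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between all frames of x and y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # Accumulated cost table with standard (diag, up, left) transitions.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrace from the end of both sequences to recover the path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Aligning two examples of the same (unknown) word type this way produces the cross-speaker frame pairs used as weak supervision.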
A summary of the 2012 JHU CLSP Workshop on Zero Resource Speech Technologies and Models of Early Language Acquisition
We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.
Unsupervised Learning of Semantic Audio Representations
Even in the absence of any explicit semantic annotation, vast collections of
audio recordings provide valuable information for learning the categorical
structure of sounds. We consider several class-agnostic semantic constraints
that apply to unlabeled nonspeech audio: (i) noise and translations in time do
not change the underlying sound category, (ii) a mixture of two sound events
inherits the categories of the constituents, and (iii) the categories of events
in close temporal proximity are likely to be the same or related. Without
labels to ground them, these constraints are incompatible with classification
loss functions. However, they may still be leveraged to identify geometric
inequalities needed for triplet loss-based training of convolutional neural
networks. The result is low-dimensional embeddings of the input spectrograms
that recover 41% and 84% of the performance of their fully-supervised
counterparts when applied to downstream query-by-example sound retrieval and
sound event classification tasks, respectively. Moreover, in
limited-supervision settings, our unsupervised embeddings double the
state-of-the-art classification performance.
Comment: Submitted to ICASSP 201