    Frame-by-frame language identification in short utterances using deep neural networks

    This is the author’s version of a work that was accepted for publication in Neural Networks. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neural Networks, vol. 64 (2015), DOI 10.1016/j.neunet.2014.08.006.

    This work addresses the use of deep neural networks (DNNs) for automatic language identification (LID), focusing on short test utterances. Motivated by their recent success in acoustic modelling for speech recognition, we adapt DNNs to the problem of identifying the language of a given utterance from short-term acoustic features. We show that DNNs are particularly suitable for real-time LID because they can emit a language identification posterior at each new frame of the test utterance. We then analyse different aspects of the system, such as the amount of training data required, the number of hidden layers, the relevance of contextual information and the effect of test utterance duration. Finally, we propose several methods to combine frame-by-frame posteriors. Experiments are conducted on two datasets: the public NIST Language Recognition Evaluation 2009 (3 s task) and a much larger corpus of 5 million utterances, known as Google 5M LID, obtained from different Google services. Reported results show relative improvements of DNNs over the i-vector system of 40% on the LRE09 3 s task and 76% on Google 5M LID.
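    The abstract notes that the DNN emits a language posterior at every frame and that these frame-level posteriors are then combined into an utterance-level decision. As a hedged illustration only, the sketch below assumes the network's softmax outputs are already available as a (frames × languages) NumPy array and combines them by averaging log posteriors. This is a common combination rule, not necessarily one of the methods proposed in the paper, and the function names `combine_frame_posteriors` and `running_decisions` are hypothetical.

```python
import numpy as np


def combine_frame_posteriors(frame_posteriors, eps=1e-10):
    """Utterance-level scores: mean of per-frame log posteriors.

    Assumption: frame_posteriors has shape (T, L), one softmax row per frame.
    """
    # Clip to avoid log(0) for frames where a language gets zero probability.
    log_post = np.log(np.clip(frame_posteriors, eps, 1.0))
    return log_post.mean(axis=0)


def running_decisions(frame_posteriors, eps=1e-10):
    """Best language index after each new frame (streaming-style decisions)."""
    log_post = np.log(np.clip(frame_posteriors, eps, 1.0))
    # Cumulative sum over frames gives the accumulated evidence so far.
    cumulative = np.cumsum(log_post, axis=0)
    # Divide by the number of frames seen so the scores are per-frame averages.
    counts = np.arange(1, len(log_post) + 1)[:, None]
    return np.argmax(cumulative / counts, axis=1)


if __name__ == "__main__":
    # Toy posteriors for 3 frames and 2 languages (illustrative numbers only).
    posts = np.array([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]])
    print(running_decisions(posts))  # prints [0 1 1]
    print(int(np.argmax(combine_frame_posteriors(posts))))  # prints 1
```

    Because the cumulative sum can be extended one frame at a time, a provisional decision is available after every frame, which is what makes frame-level output attractive for real-time use.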

    Phonetic cues to depression: A sociolinguistic perspective

    Phonetic data are used in several ways outside the core field of phonetics. This paper offers the perspective of one such field, sociophonetics, on another, the study of acoustic cues to clinical depression. While sociophonetics is interested in how, when, and why phonetic variables cue information about the world, the study of acoustic cues to depression focuses on how phonetic variables can be used by medical professionals as diagnostic tools. The latter is interested only in identifying phonetic cues to depression, while the former is interested in how phonetic variation cues anything at all. Although the two fields differ fundamentally in ontology, epistemology, and methodology, I argue that there are nonetheless possible avenues for future engagement, collaboration, and investigation. Ultimately, both fields need to engage with Crip Linguistics for any intervention on the relationship between speech and depression to succeed.

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Methods in Contemporary Linguistics

    The present volume is a broad overview of methods and methodologies in linguistics, illustrated with examples from concrete research. It collects insights from a wide range of linguistic sub-disciplines, from core disciplines to topics in cross-linguistic and language-internal diversity and to contributions on language, space and society. Given its critical and innovative nature, the volume is a valuable source for students and researchers with a broad range of linguistic interests.