192 research outputs found

The CMU-InterACT 2008 Mandarin Transcription System

Author: Fuhs Mark
Hsiao Roger
Jin Qin
Schultz Tanja
Wilson Tam
Publication date: 04/08/2008

Integration of Language Identification into a Recognition System for Spoken Conversations Containing Code-Switches

Author: Dau-Cheng Lyu
Dominic Telaar
Eng-Siong Chng
Florian Metze
Haizhou Li
Jochen Weiner
Ngoc Thang Vu
Tanja Schultz
Publication date: 01/01/2012

This paper describes the integration of language identification (LID) into a multilingual automatic speech recognition (ASR) system for spoken conversations containing code-switches between Mandarin and English. We apply a multistream approach to combine the acoustic model score and the language information at the frame level, where the latter is provided by an LID component. Furthermore, we advance this multistream approach with a new method called "Language Lookahead", in which the language information of subsequent frames is used to improve accuracy. Both methods are evaluated using a set of controlled LID results with varying frame accuracies. Our results show that both approaches improve ASR performance by at least 4% relative if the LID achieves a minimum frame accuracy of 85%.
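The abstract does not give the combination formula, so the following is only a minimal sketch under stated assumptions: a weighted log-linear combination of the acoustic log-score with the LID posterior for each frame, and a lookahead that simply averages the LID posteriors over the next few frames. The function names, the weight of 0.3, the horizon of 2 frames, and the toy posteriors are all hypothetical and are not taken from the paper.

```python
import math

def combine_scores(acoustic_logp, lid_posterior, weight=0.3):
    """Assumed log-linear combination of two streams for a single frame."""
    return (1.0 - weight) * acoustic_logp + weight * math.log(lid_posterior)

def lookahead_posterior(lid_frames, t, language, horizon=2):
    """Average the LID posterior for `language` over frames t..t+horizon.

    The window is clipped at the end of the utterance.
    """
    window = lid_frames[t : t + horizon + 1]
    return sum(frame[language] for frame in window) / len(window)

# Toy per-frame LID posteriors around a Mandarin-to-English switch (made up).
lid_frames = [
    {"zh": 0.9, "en": 0.1},
    {"zh": 0.6, "en": 0.4},
    {"zh": 0.2, "en": 0.8},
    {"zh": 0.1, "en": 0.9},
]

# Toy acoustic log-scores for the best state of each language (made up).
acoustic_logp = {"zh": -4.0, "en": -4.2}

def best_language(t, use_lookahead):
    """Return the language whose combined score is highest at frame t."""
    scores = {}
    for lang, logp in acoustic_logp.items():
        if use_lookahead:
            posterior = lookahead_posterior(lid_frames, t, lang)
        else:
            posterior = lid_frames[t][lang]
        scores[lang] = combine_scores(logp, posterior)
    return max(scores, key=scores.get)

without_lookahead = best_language(1, use_lookahead=False)
with_lookahead = best_language(1, use_lookahead=True)
```

In this toy setup, frame 1 sits just before the switch: using only the current frame's LID posterior favors Mandarin, while averaging over the following frames shifts the decision toward English.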

CiteSeerX

Patterns of Interaction Between Moderators and Learners during Synchronous Oral Discussions Online

Author: Bayle Aurélie
Youngs Bonnie
Publication venue: Computer Assisted Language Instruction Consortium (CALICO)
Publication date: 01/01/2013

This paper describes research involving two groups who used Second Life software and Moodle to carry out oral tasks synchronously: French university graduate students serving as moderators, enrolled in a Master's program on using technology to teach French as a foreign language, and advanced undergraduate students learning French at an American university. For fall 2011, the researchers designed five tasks (étapes) that paralleled the undergraduates' course curriculum. Transcripts of two of the six groups of moderators and learners show that unintended differences in moderator behavior influenced learners' interactions with each other and with the moderators. The authors show that students were less able to engage with each other independently when faced with more rigid questioning behaviors by the moderators.

HAL Clermont Université

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

On the Development of Adaptive and User-Centred Interactive Multimodal Interfaces

Author: Callejas Zoraida
Espejo Gonzalo
Griol David
López-Cózar Ramón
Ábalos Nieves
Publication venue: 'IGI Global'
Publication date: 01/01/2012

Multimodal systems have attracted increased attention in recent years, which has made possible important improvements in the technologies for recognition, processing, and generation of multimodal information. However, many issues related to multimodality remain unclear, for example, the principles that would allow these systems to resemble human-human multimodal communication. This chapter focuses on some of the most important challenges that researchers have recently envisioned for future multimodal interfaces. It also describes current efforts to develop intelligent, adaptive, proactive, portable and affective multimodal interfaces.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

CMU GALE Speech-to-Text System

Author: Florian Metze
Qin Jin
Roger Hsiao
Tanja Schultz
Udhyakumar Nallasamy
Publication date: 01/01/2010

This paper describes the latest Speech-to-Text system developed for the Global Autonomous Language Exploitation ("GALE") domain by Carnegie Mellon University (CMU). This system uses discriminative training, bottle-neck features and other techniques that were not used in previous versions of our system, and is trained on 1150 hours of data from a variety of Arabic speech sources. In this paper, we show how different lexica, pre-processing, and system combination techniques can be used to improve the final output, and provide analysis of the improvements achieved by the individual techniques.

CiteSeerX

Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

Author: Vu Ngoc Thang
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014

This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on finding approaches that allow data from multiple languages to be used to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.

KITopen

Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

Author: Abdullah Hadi
Butler Kevin R. B.
Garcia Washington
Peeters Christian
Traynor Patrick
Wilson Joseph
Publication date: 01/01/2019

Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (black-box) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.
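The claim that different source samples can share similar feature vectors can be illustrated with one well-known property of the discrete Fourier transform: reversing a real-valued frame in time leaves its magnitude spectrum unchanged. The sketch below is an illustration chosen here, not a reproduction of the paper's perturbation classes; the naive `dft_magnitudes` helper and the sample values are hypothetical.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude spectrum of a frame, computed with a naive O(n^2) DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(coefficient))
    return magnitudes

# One short frame of made-up audio samples and its time-reversed copy.
window = [0.0, 0.5, 1.0, 0.25, -0.75, -0.5, 0.1, 0.3]
reversed_window = window[::-1]

mags_original = dft_magnitudes(window)
mags_reversed = dft_magnitudes(reversed_window)

# Largest difference between the two magnitude spectra (floating-point noise only).
max_diff = max(abs(a - b) for a, b in zip(mags_original, mags_reversed))
```

The two frames differ sample by sample, yet a feature extractor that keeps only spectral magnitudes cannot tell them apart, which is the kind of ambiguity the abstract describes.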

arXiv.org e-Print Archive

Crossref