12 research outputs found

    PRESENCE: A human-inspired architecture for speech-based human-machine interaction

    Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and performance appears to be asymptotic to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. This paper addresses these issues and presents a novel architecture for speech-based human-machine interaction inspired by recent findings in the neurobiology of living systems. Called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), this new architecture blurs the distinction between the core components of a traditional spoken language dialogue system and instead focuses on a recursive hierarchical feedback control structure. Cooperative and communicative behavior emerges as a by-product of an architecture that is founded on a model of interaction in which the system has in mind the needs and intentions of the user, and the user has in mind the needs and intentions of the system.
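    The recursive feedback idea can be made concrete with a toy example. The sketch below is purely illustrative and not taken from the paper: it assumes a two-level hierarchy in which each layer predicts the next observation, corrects its own estimate, and forwards only the residual error to the layer above, which is one simplified reading of a predictive sensorimotor control loop. All class and parameter names here are invented for the example.

```python
# Illustrative sketch (not from the paper): a recursive predictive-control
# hierarchy in the spirit of PRESENCE. Each layer predicts the next
# observation, corrects its own estimate, and passes only the prediction
# error up to the layer above.

from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlLayer:
    name: str
    gain: float = 0.5                      # how strongly this layer corrects itself
    parent: Optional["ControlLayer"] = None
    belief: float = 0.0                    # current estimate of the incoming signal

    def predict(self) -> float:
        """Emulate the interlocutor: predict the next observation."""
        return self.belief

    def observe(self, actual: float) -> float:
        """Compare prediction with reality, correct, and pass the error up."""
        error = actual - self.predict()
        self.belief += self.gain * error   # feedback correction at this level
        if self.parent is not None:
            # Higher layers see only the residual the lower layer could not explain.
            self.parent.observe(error)
        return error


# Toy usage: a two-level hierarchy tracking a drifting acoustic feature.
intent_layer = ControlLayer("intent", gain=0.1)
acoustic_layer = ControlLayer("acoustic", gain=0.6, parent=intent_layer)

for observation in [0.0, 0.2, 0.4, 0.8, 1.0, 1.0]:
    residual = acoustic_layer.observe(observation)
    print(f"obs={observation:.1f}  residual={residual:+.3f}  "
          f"acoustic belief={acoustic_layer.belief:.3f}")
```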

    Keyword detection for languages without training data (Detección de palabras claves en lenguajes sin datos de entrenamiento)

    We study the keyword-spotting problem for languages that lack corpora of recordings with phonetic transcriptions. The problem is central to enabling search over databases of speech recordings. Using the Boston University Radio Speech Corpus as a reference corpus, we analyze various hidden Markov model topologies and parameterizations for word detection in continuous speech. The models rely on "filler" models for non-keyword speech, with phonemes as the minimal detection units. For the experiments we use a set of 20 keywords trained on 14 minutes of transcribed data and fillers trained on 7 hours of untranscribed data. The results show that the best model achieves an average figure of merit (FOM) above 0.47, a correct detection rate of 72.1%, and 3.95 false alarms per hour per keyword. (XI Workshop Bases de Datos y Minería de Datos, Red de Universidades con Carreras de Informática, RedUNCI)
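    The figure of merit (FOM) quoted above is conventionally the keyword detection rate averaged over operating points between 1 and 10 false alarms per hour per keyword. The sketch below is a simplified, illustrative computation under that assumption; it is not the evaluation code used in the study, and the function name and toy data are invented for the example.

```python
# Hedged sketch: a simplified Figure of Merit (FOM) computation for keyword
# spotting. FOM is commonly defined as the detection rate averaged over
# operating points from 1 to 10 false alarms per hour per keyword; exact
# evaluation tools differ, so treat this as illustrative only.

from typing import List, Tuple


def figure_of_merit(detections: List[Tuple[float, bool]],
                    n_true_occurrences: int,
                    hours_of_audio: float,
                    n_keywords: int = 1) -> float:
    """detections: (score, is_correct) pairs pooled over the keyword set."""
    # Sweep the decision threshold from the highest score downward.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits, false_alarms = 0, 0
    rates = []                      # detection rate sampled at 1..10 FA/hour/keyword
    next_fa_target = 1
    for _, is_correct in ranked:
        if is_correct:
            hits += 1
        else:
            false_alarms += 1
            fa_per_hour = false_alarms / (hours_of_audio * n_keywords)
            while next_fa_target <= 10 and fa_per_hour >= next_fa_target:
                rates.append(hits / n_true_occurrences)
                next_fa_target += 1
    while len(rates) < 10:          # never reached 10 FA/h: pad with the final rate
        rates.append(hits / n_true_occurrences)
    return sum(rates) / 10.0


# Toy usage with made-up detection scores.
dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False)]
print(f"FOM = {figure_of_merit(dets, n_true_occurrences=4, hours_of_audio=1.0):.2f}")
```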

    Human-Machine Comparison for Speech Recognition (Konuşma Tanıma için İnsan-makine Karşılaştırması)

    Speech/voice recognition by machines has been a topic of interest since the 1950s. Research that initially adopted dynamic programming methodologies now mostly uses hidden Markov models for speech recognition. Nevertheless, even the most advanced speech recognition systems make, depending on the context, 2-20 times more errors than humans. Although the basic principles behind human speech recognition are not completely understood, there are theories that attempt to explain its biological mechanisms. This paper provides a review of these theories as well as a brief history of developments in automatic speech recognition technology. Furthermore, the paper discusses some recent studies on Turkish speech recognition. The paper concludes with a comparison between human and machine speech recognition performance.

    Error Correction based on Error Signatures applied to automatic speech recognition

    Phonetic Event-based Whole-Word Modeling Approaches for Speech Recognition

    Speech is composed of basic speech sounds called phonemes, and these subword units are the foundation of most speech recognition systems. While detailed acoustic models of phones (and phone sequences) are common, most recognizers model words themselves as a simple concatenation of phonemes and do not closely model the temporal relationships between phonemes within words. Human speech production is constrained by the movement of speech articulators, and there is abundant evidence to indicate that human speech recognition is inextricably linked to the temporal patterns of speech sounds. Structures such as the hidden Markov model (HMM) have proved extremely useful and effective because they offer a convenient framework for combining acoustic modeling of phones with powerful probabilistic language models. However, this convenience masks deficiencies in temporal modeling. Additionally, robust recognition requires complex automatic speech recognition (ASR) systems and entails non-trivial computational costs. As an alternative, we extend previous work on the point process model (PPM) for keyword spotting, an approach to speech recognition expressly based on whole-word modeling of the temporal relations of phonetic events. In our research, we have investigated and advanced a number of major components of this system. First, we have considered alternative methods of determining phonetic events from phone posteriorgrams. We have introduced several parametric approaches to modeling intra-word phonetic timing distributions which allow us to cope with data sparsity issues. We have substantially improved the algorithms used to compute keyword detections, capitalizing on the sparse nature of the phonetic input, which permits the system to be scaled to large data sets. We have considered enhanced CART-based modeling of phonetic timing distributions based on related text-to-speech synthesis work. Lastly, we have developed a point process based spoken term detection system and applied it to the conversational telephone speech task of the 2006 NIST Spoken Term Detection evaluation. We demonstrate the PPM system to be competitive with state-of-the-art phonetic search systems while requiring significantly fewer computational resources.
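    One component mentioned above, deriving sparse phonetic events from frame-level phone posteriorgrams, can be illustrated with simple peak picking. The sketch below is an assumption-laden stand-in rather than the detector described in the thesis: the threshold, frame shift, and function names are invented for the example.

```python
# Illustrative sketch (assumptions labelled): one common way to turn a
# frame-level phone posteriorgram into sparse "phonetic events" for a
# point-process model is peak picking with a confidence threshold. The
# actual event detector used in the work may differ.

import numpy as np


def posteriorgram_to_events(posteriors: np.ndarray,
                            phone_labels: list,
                            frame_shift_s: float = 0.01,
                            threshold: float = 0.5):
    """posteriors: (n_frames, n_phones) array of per-frame phone posteriors.
    Returns a list of (time_in_seconds, phone_label) events at local maxima."""
    events = []
    n_frames, n_phones = posteriors.shape
    for p in range(n_phones):
        track = posteriors[:, p]
        for t in range(1, n_frames - 1):
            # A local maximum above the threshold counts as one phonetic event.
            if track[t] >= threshold and track[t] > track[t - 1] and track[t] >= track[t + 1]:
                events.append((t * frame_shift_s, phone_labels[p]))
    events.sort()
    return events


# Toy usage: 5 frames, 2 phones.
post = np.array([[0.1, 0.9],
                 [0.2, 0.7],
                 [0.8, 0.1],
                 [0.6, 0.2],
                 [0.1, 0.1]])
print(posteriorgram_to_events(post, ["AA", "B"]))
```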