51 research outputs found

    Factoid question answering for spoken documents

    In this dissertation, we present a factoid question answering system specifically tailored for Question Answering (QA) on spoken documents. This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken-document scenario. More specifically, we study new information retrieval (IR) techniques designed for speech, and exploit several levels of linguistic information for the speech-based QA task: named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and coreference resolution. Our approach is largely based on supervised machine learning techniques, with a special focus on the answer extraction step, and makes little use of handcrafted knowledge; consequently, it should be easily adaptable to other domains and languages. As part of the work behind this thesis, we have promoted and coordinated the creation of an evaluation framework for the task of QA on spoken documents. The framework, named QAst (Question Answering on Speech Transcripts), provides multi-lingual corpora, evaluation questions, and answer keys. These corpora were used in the QAst evaluations held at the CLEF workshops of 2007, 2008 and 2009, thus helping the development of state-of-the-art techniques for this particular topic. The presented QA system and all its modules are extensively evaluated on the English European Parliament Plenary Sessions (EPPS) corpus, composed of manual transcripts and automatic transcripts obtained with three different Automatic Speech Recognition (ASR) systems that exhibit significantly different word error rates. This data belongs to the CLEF 2009 track for QA on speech transcripts. The main results confirm that syntactic information is very useful for learning to rank answer candidates, improving results on both manual and automatic transcripts unless the ASR quality is very low.
Overall, the performance of our system is comparable to or better than the state of the art on this corpus, confirming the validity of our approach.
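The answer extraction step described above learns to rank candidate answers from linguistic features. As a minimal hypothetical sketch (the feature names, weights, and candidate answers below are invented for illustration and are not the thesis's actual model or data), a linear ranker over such features might look like:

```python
# Hypothetical sketch of supervised answer ranking: each candidate answer
# found in the retrieved passages carries a small feature vector (lexical
# overlap with the question, whether its named-entity type matches the
# expected answer type, a syntactic-distance feature), and a linear model
# with learned weights scores the candidates. All values are illustrative.

def score(weights, features):
    """Linear ranking score: dot product of weights and feature values."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(weights, candidates):
    """Return the candidate answer with the highest ranking score."""
    return max(candidates, key=lambda c: score(weights, c["features"]))

# Illustrative weights, e.g. as learned from annotated question-answer pairs.
weights = {"keyword_overlap": 1.0, "ne_type_match": 2.0, "syntactic_distance": -0.5}

candidates = [
    {"text": "Strasbourg",
     "features": {"keyword_overlap": 0.4, "ne_type_match": 1.0, "syntactic_distance": 2.0}},
    {"text": "the rapporteur",
     "features": {"keyword_overlap": 0.6, "ne_type_match": 0.0, "syntactic_distance": 1.0}},
]

print(best_candidate(weights, candidates)["text"])
```

The ranking view explains why syntactic features help: they let the model prefer candidates that stand in the right structural relation to the question terms, not merely near them.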

    TechNews digests: Jan - Mar 2010

    TechNews is a technology news and analysis service aimed at anyone in the education sector keen to stay informed about technology developments, trends and issues. TechNews focuses on emerging technologies and other technology news. The TechNews service comprises digests from September 2004 to May 2010; analysis pieces and news are published together every two to three months.

    Personal long-term memory aids

    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, February 2005. MIT Institute Archives copy: p. 101-132 bound in reverse order. Includes bibliographical references (p. 126-132). The prevalence and affordability of personal and environmental recording apparatuses are leading to increased documentation of our daily lives. This trend is bound to continue, and academic, industry, and government groups are showing increased interest in such endeavors for various purposes. In the present case, I assert that such documentation can be used to help remedy common memory problems. Assuming a long-term personal archive exists, when confronted with a memory problem one faces a new challenge: finding relevant memory triggers. This dissertation examines the use of information-retrieval technologies on long-term archives of personal experiences toward remedying certain types of long-term forgetting. The approach focuses on capturing audio for the content. Research on Spoken Document Retrieval examines the pitfalls of applying information-retrieval techniques to error-prone speech-recognizer-generated transcripts, and these challenges carry over to the present task. However, "memory retrieval" can benefit from the person's familiarity with the recorded data and the context in which it was recorded to help guide the effort. To study this, I constructed memory-retrieval tools designed to leverage a person's familiarity with their past to optimize their search task. To evaluate the utility of these tools for solving long-term memory problems, I (1) recorded public events and evaluated witnesses' memory-retrieval approaches using these tools; and (2) conducted a longer-term memory-retrieval study based on recordings of several years of my personal and research-related conversations.
    Subjects succeeded at memory-retrieval tasks in both studies, typically finding answers within minutes. This is far less time than the alternative of re-listening to hours of recordings. Subjects' memories of the past events, in particular their ability to narrow the window of time in which past events occurred, improved their ability to find answers. In addition to the results from the memory-retrieval studies, I present a technique called "speed listening": by using a transcript (even one with many errors), it allows people to reduce listening time while maintaining comprehension. Finally, I report on my experiences recording events in my life over 2.5 years. By Sunil Vemuri. Ph.D.
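The finding above, that narrowing the time window of a remembered event makes keyword search over transcripts far more effective, can be sketched as a simple query over timestamped segments. The segment data, field names, and scoring are invented for illustration; the actual tools in the dissertation are more elaborate.

```python
# Sketch of a memory-retrieval query: first filter timestamped transcript
# segments to the time window the person remembers, then rank the survivors
# by keyword overlap with the query. All data and names are hypothetical.

from datetime import datetime

segments = [
    {"time": datetime(2004, 3, 14, 10, 5),
     "text": "we discussed the demo for the media lab open house"},
    {"time": datetime(2004, 9, 2, 15, 30),
     "text": "remember to email the speech recognizer logs"},
    {"time": datetime(2005, 1, 20, 9, 0),
     "text": "the open house demo went well last spring"},
]

def search(query, start, end):
    """Return segments in [start, end], best keyword overlap first."""
    words = set(query.lower().split())
    window = [s for s in segments if start <= s["time"] <= end]
    return sorted(window, key=lambda s: -len(words & set(s["text"].split())))

# Narrowing to 2004 excludes the similar-looking 2005 segment up front.
hits = search("open house demo", datetime(2004, 1, 1), datetime(2004, 12, 31))
print(hits[0]["text"])
```

The time filter is what makes noisy transcripts tolerable: even with many recognition errors, the candidate set is small enough that a coarse keyword match surfaces the right recording.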

    Digital imaging technology assessment: Digital document storage project

    An ongoing technical assessment and requirements definition project is examining the potential role of digital imaging technology at NASA's STI facility. The focus is on the basic components of imaging technology in today's marketplace as well as the components anticipated in the near future. Presented is a requirement specification for a prototype project, an initial examination of current image processing at the STI facility, and an initial summary of image processing projects at other sites. Operational imaging systems incorporate scanners, optical storage, high resolution monitors, processing nodes, magnetic storage, jukeboxes, specialized boards, optical character recognition gear, pixel addressable printers, communications, and complex software processes.

    Error analysis in automatic speech recognition and machine translation

    Automatic speech recognition and machine translation are well-known terms in the translation world nowadays. Systems that carry out these processes are increasingly taking over work from humans, mainly because of the speed at which the tasks are performed and their lower cost. However, the quality of these systems is debatable: they are not yet capable of delivering the same performance as human transcribers or translators. A lack of creativity, of the ability to interpret texts, and of a sense of language is often cited as the reason why machines do not yet perform at the level of human translation or transcription work. Despite this, some companies use these machines in their production pipelines. Unbabel, an online translation platform powered by artificial intelligence, is one of them: through a combination of human translators and machines, Unbabel tries to provide its customers with translations of good quality. This internship report was written with the aim of gaining an overview of the performance of these systems and the errors they produce, and of building a picture of the error patterns each system exhibits. The work consists of an extensive analysis of the errors produced by automatic speech recognition and machine translation systems after automatically transcribing and translating 10 English videos into Dutch. Different kinds of videos were deliberately chosen to see whether the error patterns differed significantly between videos. The data and results generated by this work aim to suggest possible ways of improving the quality of the services mentioned.
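Transcription errors of the kind analysed in such a report are conventionally summarized by word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis transcript into the reference, divided by the reference length. A minimal dynamic-programming sketch (the example sentences are invented, and this is the standard metric rather than the report's own tooling):

```python
# Word error rate (WER) via edit distance between word sequences.
# d[i][j] = edit distance between the first i reference words and the
# first j hypothesis words; WER = d[len(ref)][len(hyp)] / len(ref).

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match / substitution
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the session is open", "the session was open"))  # 0.25
```

Note that WER only counts surface mismatches; the qualitative error patterns the report is after (misinterpretation, lost sense of language) require the kind of manual analysis it performs.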

    Automatic topic detection of multi-lingual news stories.

    Wong Kam Lai. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 92-98). Abstracts in English and Chinese. Contents:
    Chapter 1 -- Introduction: 1.1 Our Contributions; 1.2 Organization of this Thesis
    Chapter 2 -- Literature Review: 2.1 Dragon Systems; 2.2 Carnegie Mellon University (CMU); 2.3 University of Massachusetts (UMass); 2.4 IBM T.J. Watson Research Center; 2.5 BBN Technologies; 2.6 National Taiwan University (NTU); 2.7 Drawbacks of Existing Approaches
    Chapter 3 -- Overview of Proposed Approach: 3.1 News Source; 3.2 Story Preprocessing; 3.3 Concept Term Generation; 3.4 Named Entity Extraction; 3.5 Gross Translation of Chinese to English; 3.6 Topic Detection Method (3.6.1 Deferral Period; 3.6.2 Detection Approach)
    Chapter 4 -- Concept Term Model: 4.1 Background of Contextual Analysis; 4.2 Concept Term Generation (4.2.1 Concept Generation Algorithm; 4.2.2 Concept Term Representation for Detection)
    Chapter 5 -- Topic Detection Model: 5.1 Text Representation and Term Weights (5.1.1 Story Representation; 5.1.2 Topic Representation; 5.1.3 Similarity Score; 5.1.4 Time Adjustment Scheme); 5.2 Gross Translation Method; 5.3 The Detection System (5.3.1 Detection Requirement; 5.3.2 The Top Level Model); 5.4 The Clustering Algorithm (5.4.1 Similarity Calculation; 5.4.2 Grouping Related Elements; 5.4.3 Topic Identification)
    Chapter 6 -- Experimental Results and Analysis: 6.1 Evaluation Model (6.1.1 Evaluation Methodology); 6.2 Experiments on the effects of tuning the parameter; 6.3 Experiments on the effects of named entities and concept terms; 6.4 Experiments on the effect of using time adjustment; 6.5 Experiments on mono-lingual detection (6.2-6.5 each comprising Experiment Setup, and Results and Analysis)
    Chapter 7 -- Conclusions and Future Work: 7.1 Conclusions; 7.2 Future Work
    Appendix A -- List of Topics annotated for TDT3 Corpus; Appendix B -- Matching evaluation topics to hypothesized topics; Bibliography
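The detection pipeline outlined in the contents (story representation as weighted term vectors, a similarity score, then clustering into topics) is typically realized as single-pass clustering. A minimal sketch under that assumption; the threshold, the raw bag-of-words weighting, and the sample stories are illustrative only, not the thesis's actual settings:

```python
# Single-pass topic detection sketch: each incoming story is compared by
# cosine similarity against existing topic clusters; it joins the best
# match above a threshold, otherwise it seeds a new topic.

import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def detect_topics(stories, threshold=0.3):
    topics = []  # each topic is a list of story term vectors
    for story in stories:
        vec = Counter(story.lower().split())
        best, best_sim = None, 0.0
        for topic in topics:
            # Compare against the topic's seed story (a simple centroid
            # would be another common choice).
            sim = cosine(vec, topic[0])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best.append(vec)
        else:
            topics.append([vec])
    return topics

stories = [
    "earthquake strikes taiwan coast",
    "taiwan earthquake rescue effort continues",
    "parliament debates new budget",
]
print(len(detect_topics(stories)))  # 2
```

The thesis's refinements (concept terms, named entities, time adjustment, gross Chinese-to-English translation) slot into this skeleton as changes to how `vec` is built and how `sim` is adjusted.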

    Symposium on the future: focus on firms

    https://egrove.olemiss.edu/aicpa_comm/1205/thumbnail.jp