
    Automatic summarization of voicemail messages using lexical and prosodic features

    This article presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems, as well as human transcriptions of voicemail speech.
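    As a rough illustration of the approach, the sketch below trains a word-level classifier on toy lexical/prosodic feature vectors and approximates Parcel's ROC-based selection by ranking individual features by ROC AUC (Parcel itself builds ROC convex hulls over feature and classifier combinations); all data and feature names here are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy data: one row per word; columns stand in for lexical/prosodic
# features (e.g. tf-idf weight, word duration, F0 range); y marks the
# words a human annotator kept in the summary.
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 1).astype(int)

# Approximate ROC-based selection: rank single features by how far
# their individual ROC AUC is from chance (0.5).
aucs = [abs(roc_auc_score(y, X[:, j]) - 0.5) for j in range(X.shape[1])]
keep = np.argsort(aucs)[-3:]            # keep the 3 most discriminative

clf = LogisticRegression().fit(X[:, keep], y)
summary_mask = clf.predict(X[:, keep])  # 1 = word goes into the summary
```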

    Noisy Text Clustering

    This work presents document clustering experiments performed over noisy texts (i.e. texts that have been extracted through an automatic process such as speech or character recognition). The effect of recognition errors on different clustering techniques is measured by comparing the results obtained with clean (manually typed texts) and noisy (automatic speech transcripts affected by a 30% word error rate) versions of the TDT2 corpus (~600 hours of spoken data from broadcast news). The results suggest that clustering can be performed over noisy data with an acceptable performance degradation.
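    A minimal sketch of this clean-versus-noisy comparison, using toy documents in place of the TDT2 corpus: cluster both versions and measure the degradation as a drop in agreement with reference topic labels (adjusted Rand index here; the paper's own clustering techniques and metrics may differ).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

clean = ["stocks fell sharply today", "the senate passed the budget bill",
         "heavy rain flooded the coast", "markets rallied after the report"]
# Simulated ASR output with recognition errors for the same stories.
noisy = ["socks fell sharply to day", "the senate past the budge it bill",
         "heavy rein flooded the cost", "markets rallied after the report"]
labels = [0, 1, 2, 0]   # reference topic labels

def cluster(docs, k=3, seed=0):
    X = TfidfVectorizer().fit_transform(docs)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)

# Degradation shows up as a drop in agreement with the reference topics.
print("clean ARI:", adjusted_rand_score(labels, cluster(clean)))
print("noisy ARI:", adjusted_rand_score(labels, cluster(noisy)))
```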

    The Impact of Accents on Automatic Recognition of South African English Speech: A Preliminary Investigation

    The accent with which words are spoken can have a strong effect on the performance of a speech recognition system. In a multilingual country such as South Africa, where English is not the first language of most citizens, the need to address this issue is critical when building speech-based systems. In this project we trained two sets of hidden Markov models for isolated-word English speech. The first set of models was trained with native English speakers and the second set was trained with non-native speakers from a representative sample of major South African accent groups. We compared the recognition accuracies of the two sets of models and found that the models trained with accented English performed better. This preliminary research indicates that there is merit in committing resources to training on accented speech.
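    The isolated-word setup can be sketched as one HMM per word, with recognition picking the model that assigns the utterance the highest likelihood; the sketch below uses hmmlearn's GaussianHMM over precomputed MFCC sequences as a stand-in for the paper's actual toolkit and features.

```python
# A minimal sketch, assuming MFCC sequences are already extracted.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def train_models(examples):
    """examples: {word: list of (T_i, n_mfcc) feature arrays}."""
    models = {}
    for word, seqs in examples.items():
        X = np.vstack(seqs)                  # stack all repetitions
        lengths = [len(s) for s in seqs]     # sequence boundaries
        m = GaussianHMM(n_components=4, covariance_type="diag", n_iter=20)
        models[word] = m.fit(X, lengths=lengths)
    return models

def recognise(models, seq):
    # Pick the word whose model assigns the utterance the highest likelihood.
    return max(models, key=lambda w: models[w].score(seq))

def accuracy(models, test):
    pairs = [(w, s) for w, seqs in test.items() for s in seqs]
    return np.mean([recognise(models, s) == w for w, s in pairs])

# Train one model set per speaker population, then compare, e.g.:
#   accuracy(train_models(native_data), test_data)
#   accuracy(train_models(accented_data), test_data)
```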

    Effect of Recognition Errors on Text Clustering

    This paper presents clustering experiments performed over noisy texts (i.e. texts that have been extracted through an automatic process such as character or speech recognition). The effect of recognition errors is investigated by comparing clustering results obtained over both clean (manually typed data) and noisy (automatic speech transcriptions) versions of the same speech recording corpus.
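    One way to study this effect in isolation is to inject word errors at a controlled rate into clean text before clustering; the corruption model below (uniform substitutions, deletions and insertions) is an illustrative stand-in for real recognition errors, not the paper's methodology.

```python
import random

def corrupt(text, wer=0.3, vocab=("the", "a", "news", "report"), seed=0):
    """Apply substitutions, deletions and insertions at roughly `wer` rate."""
    rng = random.Random(seed)
    out = []
    for w in text.split():
        r = rng.random()
        if r < wer / 3:           # substitution
            out.append(rng.choice(vocab))
        elif r < 2 * wer / 3:     # deletion
            continue
        elif r < wer:             # insertion (keep word, add a spurious one)
            out.extend([w, rng.choice(vocab)])
        else:                     # word recognised correctly
            out.append(w)
    return " ".join(out)

print(corrupt("markets rallied after the quarterly report"))
```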

    Content-based access to spoken audio

    The amount of archived audio material in digital form is increasing rapidly as advantage is taken of the growth in available storage and processing power. Computational resources are becoming less of a bottleneck to digitally recording and archiving vast amounts of spoken material, both television and radio broadcasts and individual conversations. However, listening to this ever-growing amount of spoken audio sequentially is too slow, and the bottleneck will become the development of effective ways to access content in these voluminous archives. Providing accurate and efficient computer-mediated content access is a challenging task because spoken audio combines information from multiple levels (phonetic, acoustic, syntactic, semantic and discourse). Most systems that assist humans in accessing spoken audio content have approached the problem by performing automatic speech recognition, followed by text-based information access. These systems have addressed diverse tasks including indexing and retrieving voicemail messages, searching broadcast news, and extracting information from recordings of meetings and lectures. Spoken audio content is far richer than what a simple textual transcription can capture, as it carries additional cues that disclose the intended meaning and the speaker's emotional state. However, the text transcription alone still provides a great deal of useful information in applications. This article describes approaches to content-based access to spoken audio with a qualitative and tutorial emphasis. We describe how the analysis, retrieval and delivery phases contribute to making spoken audio content more accessible, and we outline a number of outstanding research issues. We also discuss the main application domains and try to identify important issues for future developments. The structure of the article is based on the general system architecture for content-based access depicted in Figure 1. Although the tasks within each processing stage may appear unconnected, the interdependencies among them and the sequence in which they take place vary.
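    The recognise-then-retrieve architecture described here can be sketched as an ASR step followed by ordinary text indexing and search; transcribe() below is a hypothetical placeholder for any speech recognizer, and the inverted index is deliberately minimal.

```python
from collections import defaultdict

def transcribe(audio_path: str) -> str:
    # Placeholder: plug in any ASR system here and feed its output
    # to TranscriptIndex.add(doc_id, transcribe(audio_path)).
    raise NotImplementedError

class TranscriptIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of recording ids

    def add(self, doc_id: str, transcript: str):
        for term in transcript.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str):
        # Rank recordings by how many query terms their transcript contains.
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc in self.postings.get(term, ()):
                scores[doc] += 1
        return sorted(scores, key=scores.get, reverse=True)

index = TranscriptIndex()
index.add("msg01", "please call me back about the budget meeting")
index.add("msg02", "your flight on friday was cancelled")
print(index.search("budget meeting"))   # -> ['msg01']
```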

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
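    One representative SCR technique covered by such surveys is to index recognition alternatives rather than committing to the 1-best transcript, for example by accumulating posterior-weighted expected term counts from an n-best list; the hypotheses and probabilities below are invented for illustration.

```python
from collections import defaultdict

# (hypothesis, posterior probability) pairs for one utterance.
nbest = [("recognise speech", 0.6),
         ("wreck a nice beach", 0.3),
         ("recognise peach", 0.1)]

expected_tf = defaultdict(float)
for hyp, p in nbest:
    for term in hyp.split():
        expected_tf[term] += p   # expected count under the ASR posterior

# "speech" gets weight 0.6 and "beach" 0.3: retrieval can match either,
# weighted by how confident the recogniser was.
print(dict(expected_tf))
```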

    Improvements to the complex question answering models

    In recent years the amount of information on the web has increased dramatically. As a result, it has become a challenge for researchers to find effective ways to query and extract meaning from these large repositories. Standard document search engines try to address the problem by presenting users with a ranked list of relevant documents. In most cases this is not enough, as the end user has to go through an entire document to find the answer they are looking for. Question answering, the retrieval of answers to natural language questions from a document collection, tries to remove this onus on the end user by providing direct access to relevant information. This thesis is concerned with open-domain complex question answering. Unlike simple questions, complex questions cannot be answered easily, as they often require inference and the synthesis of information from multiple documents. Hence, we treat the task of complex question answering as query-focused multi-document summarization. In this thesis, to improve complex question answering we experimented with both empirical and machine learning approaches. We extracted several features of different types (i.e. lexical, lexical-semantic, syntactic and semantic) for each sentence in the document collection in order to measure its relevance to the user query. We have formulated the task of complex question answering in a reinforcement learning framework, which to the best of our knowledge has not been applied to this task before and has the potential to improve itself by fine-tuning the feature weights from user feedback. We have also used unsupervised machine learning techniques (random walk, manifold ranking) and augmented them with semantic and syntactic information. Finally, we experimented with question decomposition, where instead of trying to find the answer to the complex question directly, we decomposed it into a set of simple questions and synthesized their answers to obtain the final result.
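    The sentence-ranking view of query-focused summarization can be sketched as a weighted combination of per-sentence features; the two features and the fixed weights below are illustrative stand-ins for the thesis's richer feature set, and the weights are exactly the quantities a reinforcement-style loop would tune from user feedback.

```python
def overlap(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)   # Jaccard similarity

def score(sentence: str, query: str, pos: int, w=(0.7, 0.3)) -> float:
    lexical = overlap(sentence, query)   # query-relevance feature
    lead = 1.0 / (1 + pos)               # position-in-document feature
    return w[0] * lexical + w[1] * lead

def answer(query, documents, k=2):
    # Pool sentences from all documents, rank them, keep the top k.
    cands = [(s, i) for doc in documents for i, s in enumerate(doc)]
    ranked = sorted(cands, key=lambda c: score(c[0], query, c[1]), reverse=True)
    return " ".join(s for s, _ in ranked[:k])

docs = [["Oil prices rose after the storm.", "Analysts expect volatility."],
        ["The storm disrupted refineries.", "Production should recover soon."]]
print(answer("Why did oil prices rise after the storm?", docs))
```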