Robust audio indexing for Dutch spoken-word collections

Author: Huijbregts Marijn
Jong Franciska de
Leeuwen David van
Ordelman Roeland
Publication venue: KNAW
Publication date: 01/01/2005

Abstract: Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created are lagging behind. This is especially the case in the oral history domain, and much of the rich content in these collections runs the risk of remaining inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral-history case study. It is concluded that despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections.

University of Twente Research Information

Automated speech and audio analysis for semantic access to multimedia

Author: Huijbregts Marijn
Jong Franciska de
Ordelman Roeland
Publication venue: Springer Verlag
Publication date: 01/01/2006

The deployment and integration of audio processing tools can enhance the semantic annotation of multimedia content and, as a consequence, improve the effectiveness of conceptual access tools. This paper gives an overview of the various ways in which automatic speech and audio analysis can contribute to increased granularity of automatically extracted metadata. A number of techniques will be presented, including the alignment of speech and text resources, large-vocabulary speech recognition, keyword spotting and speaker classification. The applicability of the techniques will be discussed from a media-crossing perspective. The added value of the techniques and their potential contribution to the content value chain will be illustrated by the description of two (complementary) demonstrators for browsing broadcast news archives.

University of Twente Research Information

Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech

Author: Franco Horacio
Mitra Vikramjit
Sivaraman Ganesh
Yılmaz Emre
Publication venue: Elsevier BV
Publication date: 01/01/2019

Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people in need suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system which enables personalized speech therapy for patients impaired by communicative disorders in the patient's home environment. Such a system relies on robust automatic speech recognition (ASR) technology to be able to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be incorporated in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report ASR performance of these systems on two dysarthric speech datasets of different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements have been reported on both datasets using speaker-independent ASR architectures.

Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094

arXiv.org e-Print Archive

Radboud Repository

ScholarBank@NUS

Robust Grammatical Analysis for Spoken Dialogue Systems

Author: Bouma Gosse
Koeling Rob
Nederhof Mark-Jan
van Noord Gertjan
Publication date: 01/01/1998

We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.

Comment: Accepted for JNL

arXiv.org e-Print Archive

CiteSeerX

Phonetic Temporal Neural Model for Language Identification

Author: Abel Andrew
Chen Yixiang
Li Lantian
Tang Zhiyuan
Wang Dong
Publication date: 25/08/2017

Deep neural models, particularly the LSTM-RNN model, have shown great potential for language identification (LID). However, the use of phonetic information has been largely overlooked by most existing neural LID methods, although this information has been used very successfully in conventional phonetic LID systems. We present a phonetic temporal neural model for LID, which is an LSTM-RNN LID system that accepts phonetic features produced by a phone-discriminative DNN as the input, rather than raw acoustic features. This new model is similar to traditional phonetic LID methods, but the phonetic knowledge here is much richer: it is at the frame level and involves compacted information of all phones. Our experiments conducted on the Babel database and the AP16-OLR database demonstrate that the temporal phonetic neural approach is very effective, and significantly outperforms existing acoustic neural models. It also outperforms the conventional i-vector approach on short utterances and in noisy conditions.

Comment: Submitted to TASL

arXiv.org e-Print Archive

University of Strathclyde Institutional Repository

Robust semantic analysis for adaptive speech interfaces

Author: Cheadle Maria
Gambäck Björn
Publication venue: Informa UK Limited
Publication date: 01/01/2003

The DUMAS project develops speech-based applications that are adaptable to different users and domains. The paper describes the project's robust semantic analysis strategy, used both in the generic framework for the development of multilingual speech-based dialogue systems, which is the main project goal, and in the initial test application, a mobile phone-based e-mail interface.

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive