803 research outputs found
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error
detection and their use for improving spoken language understanding (SLU)
systems. In this study, the SLU task consists in automatically extracting, from
ASR transcriptions , semantic concepts and concept/values pairs in a e.g
touristic information system. An approach is proposed for enriching the set of
semantic labels with error specific labels and by using a recently proposed
neural approach based on word embeddings to compute well calibrated ASR
confidence measures. Experimental results are reported showing that it is
possible to decrease significantly the Concept/Value Error Rate with a state of
the art system, outperforming previously published results performance on the
same experimental data. It also shown that combining an SLU approach based on
conditional random fields with a neural encoder/decoder attention based
architecture , it is possible to effectively identifying confidence islands and
uncertain semantic output segments useful for deciding appropriate error
handling actions by the dialogue manager strategy .Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
Dealing with Metonymic Readings of Named Entities
The aim of this paper is to propose a method for tagging named entities (NE),
using natural language processing techniques. Beyond their literal meaning,
named entities are frequently subject to metonymy. We show the limits of
current NE type hierarchies and detail a new proposal aiming at dynamically
capturing the semantics of entities in context. This model can analyze complex
linguistic phenomena like metonymy, which are known to be difficult for natural
language processing but crucial for most applications. We present an
implementation and some test using the French ESTER corpus and give significant
results
Recommended from our members
Speaker diarisation and longitudinal linking in multi-genre broadcast data
This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non speech segmenter. A newly developed linking stage is next added to the basic diarisation output aiming at the identification of speakers across multiple episodes of the same series. The longitudinal constraint imposes an incremental processing of the episodes, where speaker labels for each episode can be obtained using only material from the episode in question, and those broadcast earlier in time. The nature of the data as well as the longitudinal linking constraint position this diarisation task as a new open-research topic, and a particularly challenging one. Different linking clustering metrics are compared and the lowest within-episode and cross-episode DER scores are achieved on the MGB challenge evaluation set.This work is in part supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). C. Zhang is also supported by a Cambridge International Scholarship from the Cambridge Commonwealth, European & International Trust.This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ASRU.2015.740485
Albayzin 2010 Evaluation campaign: speaker diarization
In this paper we present the evaluation results for the task of
speaker diarization in broadcast news domain as part of the Albayzin
2010 evaluation campaign of language and speech technologies.
The evaluation data was a subset of the Catalan broadcast
news database recorded from the 3/24 TV channel. Six
competing systems from five different universities were submitted
for the Albayzin 2010: Speaker diarization session and the
lowest diarization error rate obtained was 30.4%.Postprint (published version
Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance
This paper analyzes the gender representation in four major corpora of French
broadcast. These corpora being widely used within the speech processing
community, they are a primary material for training automatic speech
recognition (ASR) systems. As gender bias has been highlighted in numerous
natural language processing (NLP) applications, we study the impact of the
gender imbalance in TV and radio broadcast on the performance of an ASR system.
This analysis shows that women are under-represented in our data in terms of
speakers and speech turns. We introduce the notion of speaker role to refine
our analysis and find that women are even fewer within the Anchor category
corresponding to prominent speakers. The disparity of available data for both
gender causes performance to decrease on women. However this global trend can
be counterbalanced for speaker who are used to speak in the media when
sufficient amount of data is available.Comment: Accepted to ACM Workshop AI4T
Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media
The IberSpeech-RTVE Challenge presented at IberSpeech 2018 is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla (RTTH)). That series was focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporacion Radio Television Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available for scientists. The dataset included about 20 programs of different kinds and topics produced and broadcast by RTVE between 2015 and 2018. The programs presented different challenges from the point of view of speech technologies such as: the diversity of Spanish accents, overlapping speech, spontaneous speech, acoustic variability, background noise, or specific vocabulary. This paper describes the database and the evaluation process and summarizes the results obtained
Comparison of Spectral Properties of Read, Prepared and Casual Speech in French
International audienceIn this paper, we investigate the acoustic properties of phonemes in three speaking styles: read speech, prepared speech and spontaneous speech. Our aim is to better understand why speech recognition systems still fails to achieve good performances on spontaneous speech. This work follows the work of Nakamura et al. \cite{nakamura2008} on Japanese speaking styles, with the difference that we here focus on French. Using Nakamura's method, we use classical speech recognition features, MFCC, and try to represent the effects of the speaking styles on the spectral space. Two measurements are defined in order to represent the spectral space reduction and the spectral variance extension. Experiments are then carried on to investigate if indeed we find some differences between the three speaking styles using these measurements. We finally compare our results to those obtained by Nakamura on Japanese to see if the same phenomenon appears
Proposal for an Extension of Traditional Named Entitites: from Guidelines to Evaluation, an Overview
International audienceWithin the framework of the construction of a fact database, we defined guidelines to extract named entities, using a taxonomy based on an extension of the usual named entities defini- tion. We thus defined new types of entities with broader coverage including substantive- based expressions. These extended named en- tities are hierarchical (with types and compo- nents) and compositional (with recursive type inclusion and metonymy annotation). Human annotators used these guidelines to annotate a 1.3M word broadcast news corpus in French. This article presents the definition and novelty of extended named entity annotation guide- lines, the human annotation of a global corpus and of a mini reference corpus, and the evalu- ation of annotations through the computation of inter-annotator agreement. Finally, we dis- cuss our approach and the computed results, and outline further work
- …