4 research outputs found
ASR error management for improving spoken language understanding
This paper addresses the problem of automatic speech recognition (ASR) error
detection and the use of detected errors for improving spoken language
understanding (SLU) systems. In this study, the SLU task consists of
automatically extracting, from ASR transcriptions, semantic concepts and
concept/value pairs in, e.g., a tourist information system. An approach is
proposed for enriching the set of semantic labels with error-specific labels
and for using a recently proposed neural approach based on word embeddings to
compute well-calibrated ASR confidence measures. Experimental results are
reported showing that it is possible to significantly decrease the
Concept/Value Error Rate with a state-of-the-art system, outperforming
previously published results on the same experimental data. It is also shown
that, by combining an SLU approach based on conditional random fields with a
neural encoder/decoder attention-based architecture, it is possible to
effectively identify confidence islands and uncertain semantic output segments
useful for deciding appropriate error handling actions by the dialogue manager
strategy.
Comment: Interspeech 2017, Aug 2017, Stockholm, Sweden. 201
The MGB Challenge: Evaluating Multi-genre Broadcast Media Recognition
This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks of BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition, several hundred million words of BBC subtitle text were provided for language modelling. A novel aspect of the evaluation was the exploration of speech recognition and speaker diarization in a longitudinal setting, i.e. recognition of several episodes of the same show, and speaker diarization across these episodes, linking speakers. The longitudinal tasks also offered the opportunity for systems to make use of supplied metadata including show title, genre tag, and date/time of transmission. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained.
Speaker Diarization
Defended.
The thesis focuses on the topic of speaker diarization, a speech processing task that is commonly characterized by the question "Who speaks when?". It also addresses the related task of overlapping speech detection, which is very relevant for diarization.
The theoretical part of the thesis provides an overview of existing diarization approaches, both offline and online, and discusses some of the problematic areas that were identified in the early stages of the author's research. The thesis also includes an extensive comparison of existing diarization systems, with a focus on their reported performance. One chapter is dedicated to the topic of overlapping speech and the methods for its detection.
The experimental part of the thesis then presents the work done on speaker diarization, which focused mostly on a GMM-based online diarization system and an i-vector-based system with both offline and online variants. The final section also details a newly proposed approach for detecting overlapping speech using a convolutional neural network.