2 research outputs found
Albayzin 2018 Evaluation: The IberSpeech-RTVE Challenge on Speech Technologies for Spanish Broadcast Media
The IberSpeech-RTVE Challenge presented at IberSpeech 2018 is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla, RTTH). The series focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporación Radio Televisión Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available to scientists. The dataset included about 20 programs of different kinds and topics produced and broadcast by RTVE between 2015 and 2018. The programs presented different challenges from the point of view of speech technologies, such as the diversity of Spanish accents, overlapping speech, spontaneous speech, acoustic variability, background noise, and specific vocabulary. This paper describes the database and the evaluation process and summarizes the results obtained.
Speaker Diarization
Thesis status: defended.
The thesis focuses on the topic of speaker diarization, a speech processing task that is commonly characterized by the question "Who speaks when?". It also addresses the related task of overlapping speech detection, which is very relevant for diarization.
The theoretical part of the thesis provides an overview of existing diarization approaches, both offline and online, and discusses some of the problematic areas that were identified in the early stages of the author's research. The thesis also includes an extensive comparison of existing diarization systems, with a focus on their reported performance. One chapter is dedicated to the topic of overlapping speech and the methods for detecting it.
The experimental part of the thesis then presents the work done on speaker diarization, which focused mostly on a GMM-based online diarization system and an i-vector-based system with both offline and online variants. The final section also details a newly proposed approach for detecting overlapping speech using a convolutional neural network.