33 research outputs found

Transformer-based encoder-encoder architecture for Spoken Term Detection

Author: Lehečka Jan
Šmídl Luboš
Švec Jan
Publication date: 02/11/2022

The paper presents a method for spoken term detection (STD) based on the Transformer architecture. We propose an encoder-encoder architecture employing two BERT-like encoders with additional modifications, including convolutional and upsampling layers, attention masking, and shared parameters. The encoders project a recognized hypothesis and a searched term into a shared embedding space, where the score of the putative hit is computed using a calibrated dot product. In the experiments, we used the Wav2Vec 2.0 speech recognizer, and the proposed system outperformed a baseline method based on deep LSTMs on the English and Czech STD datasets based on the USC Shoah Foundation Visual History Archive (MALACH). Comment: Submitted to ICASSP 202
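The scoring step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the dot product is calibrated, so the logistic mapping and the `scale` and `bias` parameters below are assumptions made for illustration, not the authors' formulation.

```python
import math


def calibrated_dot_product(hyp_vec, term_vec, scale=1.0, bias=0.0):
    """Map the dot product of two embeddings to a score in (0, 1).

    `scale` and `bias` are hypothetical calibration parameters; the
    logistic mapping is an assumption, not taken from the paper.
    """
    if len(hyp_vec) != len(term_vec):
        raise ValueError("embeddings must have the same dimension")
    raw = sum(h * q for h, q in zip(hyp_vec, term_vec))
    z = scale * raw + bias
    # Numerically stable logistic function for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With such a mapping, orthogonal embeddings score 0.5 at the default parameters, and a detection threshold can be chosen on the calibrated score rather than on the raw dot product.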

arXiv.org e-Print Archive

Expanding Decision Support Systems Outside Company Gates

Author: Jiří Hodík
Jiří Vokřínek
Josef Psutka
Luboš Šmídl
Michal Pěchouček
Petr Bečvář
Publication venue: 'IntechOpen'
Publication date: 01/01/2010

IntechOpen

Air Traffic Control Communication

Author: Šmídl Luboš
Publication venue: University of West Bohemia, Department of Cybernetics
Publication date: 15/12/2011

The corpus contains recordings of communication between air traffic controllers and pilots. The speech is manually transcribed and labeled with information about the speaker (pilot or controller, not the full identity of the person). The corpus is currently small (20 hours), but we plan to search for additional data next year. The audio data format is 8 kHz, 16-bit PCM, mono.
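A quick way to confirm that a recording matches the stated audio format is to read its header. The sketch below uses Python's standard `wave` module and assumes the audio is stored in a WAV container, which the description does not state; `make_silent_wav` is a hypothetical helper that only builds demonstration input.

```python
import io
import wave


def describe_wav(source):
    """Return (sample_rate_hz, bits_per_sample, channels) of a PCM WAV."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def matches_stated_format(source):
    """True if the audio is 8 kHz, 16-bit, mono, as the description states."""
    return describe_wav(source) == (8000, 16, 1)


def make_silent_wav(rate_hz=8000, sample_width=2, channels=1, n_frames=80):
    """Build a short in-memory WAV of silence (demonstration input only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (n_frames * sample_width * channels))
    buffer.seek(0)
    return buffer
```

`describe_wav` accepts either a file path or an open binary file object, so the same check works on files on disk.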

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

ATCC: Pronunciation lexicon and n-gram counts for ASR module

Author: Šmídl Luboš
Publication venue: University of West Bohemia, Department of Cybernetics
Publication date: 01/01/2013

The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used for constructing a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0)
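To show how n-gram counts feed a language model, here is a minimal sketch of an unsmoothed maximum-likelihood trigram estimate. The file layout of the published counts is not described here, so the example builds its own counts from a toy token list; the function names are hypothetical, and a practical model would add smoothing or back-off.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def trigram_probability(trigram, trigram_counts, bigram_counts):
    """Unsmoothed estimate P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)."""
    history = tuple(trigram[:2])
    denominator = bigram_counts.get(history, 0)
    if denominator == 0:
        return 0.0  # unseen history; a real model would back off instead
    return trigram_counts.get(tuple(trigram), 0) / denominator


# Toy data, not taken from the corpus.
tokens = "cleared for takeoff cleared for landing".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
p_takeoff = trigram_probability(("cleared", "for", "takeoff"), trigrams, bigrams)
```

In the toy data the history "cleared for" occurs twice and is followed by "takeoff" once, so `p_takeoff` is 0.5.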

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Air Traffic Control Communication

Author: Šmídl Luboš
Publication venue: University of West Bohemia, Department of Cybernetics
Publication date: 15/12/2011

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

ATCC: Pronunciation lexicon and n-gram counts for ASR module

Author: Šmídl Luboš
Publication venue: University of West Bohemia, Department of Cybernetics
Publication date: 01/01/2013

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

OVM – Otázky Václava Moravce

Author: Pražák Aleš
Šmídl Luboš
Publication venue: University of West Bohemia, Department of Cybernetics
Publication date: 04/01/2013

The corpus consists of transcribed recordings from the Czech political discussion broadcast “Otázky Václava Moravce”. It contains 35 hours of speech and corresponding word-by-word transcriptions, including the transcription of some non-speech events. Speakers’ names are also assigned to the corresponding segments. The resulting corpus is suitable both for acoustic model training for ASR purposes and for training speaker identification and/or verification systems. The archive contains 16 sound files (WAV PCM, 16-bit, 48 kHz, mono) and transcriptions in the XML-based standard Transcriber format (http://trans.sourceforge.net)

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Rozpoznání spojité spontánní řeči s velkým slovníkem a v reálném čase pro dialogové systémy [Real-time large-vocabulary recognition of continuous spontaneous speech for dialogue systems]

Author: Šmídl Luboš
Švec Jan
Publication venue: IEEE Press
Publication date: 01/01/2011

This paper describes a method for modifying a baseline speech recognition system so that it is suitable for use in a spoken dialog system with mixed initiative and natural user input. We present three approaches for extending the recognition vocabulary to ensure that the spoken dialog system is able to recognize all entities in the given domain. A method for normalizing colloquial text is also proposed. Experiments performed on a spontaneous speech corpus suggest that the proposed method is very important for languages in which the formal written language and common colloquial speech differ substantially. The overall word error rate was reduced by 16.7%.
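For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the reference and the recognized text, divided by the number of reference words. The sketch below is a generic implementation of that definition, not the authors' evaluation code, and the abstract does not say whether the 16.7% reduction is relative or absolute.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the hypothesis "a x c" gives one substitution and one deletion over four reference words.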

University of West Bohemia Digital Library

DSpace at University of West Bohemia

Insight of neural network by removing synapses

Author: Bulín Martin
Šmídl Luboš
Publication venue: Západočeská univerzita v Plzni
Publication date: 01/01/2017

University of West Bohemia Digital Library

DSpace at University of West Bohemia

Czech Parliament Meetings

Author: Pražák Aleš
Šmídl Luboš
Publication venue: University of West Bohemia, Department of Cybernetics
Publication date: 28/03/2012

The corpus consists of recordings from the Chamber of Deputies of the Parliament of the Czech Republic. It currently consists of 88 hours of speech data, which corresponds roughly to 0.5 million tokens. The annotation process is semi-automatic, as we are able to perform speech recognition on the data with high accuracy (over 90%) and consequently align the resulting automatic transcripts with the speech. The annotator’s task is then to check the transcripts, correct errors, add proper punctuation and label speech sections with information about the speaker. The resulting corpus is therefore suitable both for acoustic model training for ASR purposes and for training speaker identification and/or verification systems. The archive contains 18 sound files (WAV PCM, 16-bit, 44.1 kHz, mono) and corresponding transcriptions in the XML-based standard Transcriber format (http://trans.sourceforge.net). The date of airing of a particular recording is encoded in the filename in the form SOUND_YYMMDD_*. Note that the recordings are usually aired in the early morning on the day following the actual Parliament session. If a recording is too long to fit in the broadcasting scheme, it is divided into several parts and aired on consecutive days.
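The filename convention described above can be decoded with a short sketch. The example filename is hypothetical, and `likely_session_date` encodes only the "usually aired the following morning" rule of thumb, so it will be wrong for recordings split into parts aired on later days.

```python
import re
from datetime import date, datetime, timedelta

# Matches the SOUND_YYMMDD_ prefix described in the corpus notes.
_AIRING_PATTERN = re.compile(r"^SOUND_(\d{6})_")


def airing_date(filename: str) -> date:
    """Extract the airing date encoded as YYMMDD in the filename."""
    match = _AIRING_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return datetime.strptime(match.group(1), "%y%m%d").date()


def likely_session_date(filename: str) -> date:
    """Heuristic: assume the session took place the day before airing."""
    return airing_date(filename) - timedelta(days=1)
```

For a hypothetical file named `SOUND_120328_part1.wav`, `airing_date` yields 28 March 2012 and `likely_session_date` yields 27 March 2012.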

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University