
Integration of Voice Technologies on Mobile Platforms

Author: Černičko Sergij
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2013

The aim of the thesis is to become familiar with the methods and techniques used in speech processing and to describe the current state of research and development in speech technology. A server-side speech recognizer that uses BSAPI is designed and implemented, and a client that uses this server for speech recognition is integrated into the mobile dictionaries of the company Lingea.

Digital library of Brno University of Technology

National Repository of Grey Literature

Multiplatform Application for Speaker Verification

Author: Görig Jan
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2010

This bachelor's thesis deals with text-independent speaker recognition, that is, recognition without knowledge of the spoken message. It describes the feature extraction methods in current use and their evaluation using Gaussian mixture models (GMMs). The practical output of the work is an application that visualizes the recognition process. The application is cross-platform and uses the Qt and BSAPI libraries.
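
As an illustration of GMM-based evaluation, the sketch below computes a text-independent verification score as the average per-frame log-likelihood ratio between a speaker model and a background model. It assumes diagonal-covariance GMMs with hypothetical, hand-picked parameters; it does not use BSAPI, whose API is not described here, and is not necessarily the scoring used in the thesis.

```python
import math
from typing import List, Sequence, Tuple

# A diagonal-covariance GMM: list of (weight, means, variances) components.
Component = Tuple[float, Sequence[float], Sequence[float]]

def log_gauss_diag(x, mean, var) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm: List[Component]) -> float:
    """Log-likelihood of one feature vector, using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def verification_score(frames, speaker_gmm, background_gmm) -> float:
    """Average per-frame log-likelihood ratio; > 0 favours the claimed speaker."""
    return sum(
        gmm_loglik(f, speaker_gmm) - gmm_loglik(f, background_gmm) for f in frames
    ) / len(frames)

# Hypothetical 2-D models and feature frames, for illustration only.
speaker = [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [2.0, 0.0], [0.5, 0.5])]
background = [(1.0, [0.0, 0.0], [2.0, 2.0])]
frames = [[1.1, 0.9], [1.9, 0.2], [1.0, 1.2]]
print(verification_score(frames, speaker, background))
```

In a real system the frames would be cepstral feature vectors and the decision threshold would be tuned on development data.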

Digital library of Brno University of Technology

National Repository of Grey Literature

Text to Audio Alignment

Author: Šuba Adam
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2018

This bachelor's thesis studies a tool for automatic text-to-audio alignment at the level of individual graphemes and phonemes. It also discusses possible approaches to alignment and the limitations and difficulties that must be taken into account. The studied tool uses an approach based on grapheme-to-phoneme conversion with joint-sequence models. The experiments use TV broadcast recordings taken from the Multi-Genre Broadcast Challenge 2015.

Digital library of Brno University of Technology

National Repository of Grey Literature

In the Traces of Leoš Janáček - Conversion of Speech to Music

Author: Marciniak Petr
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2010

The aim of this bachelor's thesis is to develop an application that automatically converts a speech recording in WAV format into music based on the speech melody, stored in MIDI format. The introduction presents the problem, followed by the theoretical background of speech processing and of the subsequent music generation. Initial experiments, such as generating the elementary melody, averaging tones, and detecting syllables, are then discussed in order to establish which of these techniques improve the listenability of the resulting music and should therefore be implemented in the final application. Next, basic criteria of beauty for music generation are defined, and various compositional techniques, such as note inversion and tempo changes, are discussed. The implementation is then described and user testing is evaluated. The conclusion assesses the work as a whole and briefly considers possible future directions for the system.
The appendix contains the user manual for the application and a "cook book" listing the tools used in its development.
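
A central step in turning a speech melody into MIDI is mapping fundamental frequency (F0) values to note numbers. The minimal sketch below assumes F0 has already been estimated per frame and grouped by syllable (pitch tracking and syllable detection are not shown), marks unvoiced frames with 0, and uses the standard equal-temperament mapping n = 69 + 12 log2(f / 440 Hz) together with a simple per-syllable averaging of tones. The function names are hypothetical.

```python
import math
from statistics import mean

def hz_to_midi(f0_hz: float) -> int:
    """Nearest equal-tempered MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(f0_hz / 440.0))

def syllables_to_notes(syllable_f0s):
    """Average the voiced F0 values of each syllable into a single note.

    Unvoiced frames are assumed to be marked with 0 and are skipped;
    a syllable with no voiced frames yields None (a rest).
    """
    notes = []
    for f0s in syllable_f0s:
        voiced = [f for f in f0s if f > 0]
        notes.append(hz_to_midi(mean(voiced)) if voiced else None)
    return notes

print(hz_to_midi(440.0), hz_to_midi(261.63))
print(syllables_to_notes([[220.0, 225.0, 0.0], [0.0, 0.0], [330.0]]))
```

Averaging in the frequency domain is the simplest choice; averaging on the MIDI (logarithmic) scale or taking a median would be reasonable alternatives when pitch estimates contain octave errors.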

Digital library of Brno University of Technology

National Repository of Grey Literature

Gaze-Based Keyboard

Author: Sznapka Jakub
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2013

The goal of this bachelor's thesis is to create a tool for gaze typing. It deals with the problems of gaze tracking and of evaluating the tracked gaze. It describes the Swype method, which is used for typing on touch-screen devices, followed by an analysis of different ways to model the language used by the tool. The main part of the thesis is dedicated to the design of the gaze typing tool and its implementation using the Kaldi toolkit.
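
To make the Swype idea concrete, the sketch below decodes a continuous trace with a common heuristic: collapse repeated keys, then keep lexicon words that start and end at the trace endpoints and whose letters occur in order along the trace. It assumes gaze fixations have already been mapped to the letters of the keys they land on; it is an illustrative stand-in, not the Kaldi-based decoder of the thesis, and ranking the candidates with a language model is not shown. All names and the sample trace are hypothetical.

```python
from itertools import groupby

def collapse(trace: str) -> str:
    """Merge consecutive fixations on the same key, e.g. 'hhheeel' -> 'hel'."""
    return "".join(key for key, _ in groupby(trace))

def is_subsequence(word: str, trace: str) -> bool:
    """True if the letters of word appear in trace in the same order."""
    it = iter(trace)
    return all(ch in it for ch in word)  # `in` advances the iterator

def candidates(trace: str, lexicon):
    """Lexicon words consistent with a gaze trace over the keyboard."""
    path = collapse(trace)
    return [
        w for w in lexicon
        if w and w[0] == path[0] and w[-1] == path[-1]
        and is_subsequence(collapse(w), path)  # 'hello' -> 'helo'
    ]

lexicon = ["hello", "help", "hold", "halo"]
# Hypothetical trace sweeping h -> e -> l -> o across a QWERTY layout.
print(candidates("hgtrewerfghjkllkio", lexicon))
```

Collapsing the word as well as the trace lets double letters such as the "ll" in "hello" match a single dwell on the key.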

Digital library of Brno University of Technology

National Repository of Grey Literature

Suggestion, creation and analysis of speech corpus of telephonic records for speech and speaker recognition task

Author: Pražák Jan
Publication date: 02/12/2018

DSpace@TUL

Voice Activity Detection

Author: Břenek Roman
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2011

This thesis describes techniques for voice activity detection (VAD) in audio recordings. A detector must correctly classify all non-speech segments and, conversely, detect all speech, even in noisy environments. The thesis describes the whole VAD process: digitizing the audio signal, feature extraction, training the classifier, recognition, post-processing, and final evaluation. Three systems are compared: the first is based on phoneme recognition using neural networks, and the other two are variants of Gaussian mixture models (GMMs). Each system was tested on three datasets: the Tactical Speaker Identification Speech Corpus (TSID), Ham Radio (HR), and Rich Transcription Evaluation (RT05-RT07). The best results of each system are compared with third-party results.
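
The thesis compares neural-network and GMM-based detectors. As a much simpler illustration of the same pipeline (features, per-frame classification, post-processing), the sketch below scores each frame's log-energy under two hypothetical single-Gaussian models, one for speech and one for non-speech, and then smooths the decisions with a sliding majority vote. The model parameters and function names are assumptions for illustration, not values or code from the thesis.

```python
import math
from typing import List, Sequence

def log_energy(frame: Sequence[float]) -> float:
    """Log of the mean squared amplitude; the small floor avoids log(0)."""
    return math.log(sum(s * s for s in frame) / len(frame) + 1e-10)

def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def vad(frames, speech=(-2.0, 1.0), nonspeech=(-9.0, 2.0), win=3) -> List[bool]:
    """Frame-level speech (True) / non-speech (False) decisions.

    speech / nonspeech are hypothetical (mean, variance) pairs for log-energy;
    a real system would train GMMs on richer features such as MFCCs.
    """
    raw = [
        log_gauss(e, *speech) > log_gauss(e, *nonspeech)
        for e in map(log_energy, frames)
    ]
    # Post-processing: majority vote over `win` frames (a median filter for
    # binary decisions); ties at the edges count as non-speech.
    half = win // 2
    smoothed = []
    for i in range(len(raw)):
        window = raw[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) * 2 > len(window))
    return smoothed

quiet, loud = [0.001] * 160, [0.5] * 160
frames = [quiet, quiet, loud, loud, quiet, loud, loud, quiet, quiet]
print(vad(frames))
```

The smoothing step fills the one-frame gap inside the speech region, which is the kind of correction the post-processing stage is meant to provide.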

Digital library of Brno University of Technology

National Repository of Grey Literature

Multi-Task Neural Networks for Speech Recognition

Author: Egorova Ekaterina
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2014

The first part of this Master's thesis covers the theory of neural networks, including their use in speech recognition. It then summarizes the operating principles of multi-task neural networks and related recent experiments. The practical part describes changes made to a neural network training tool so that it supports multi-task training, as well as the preparation of the experimental environment, including several dedicated scripts.
The experiments explore the use of articulatory characteristics of phonemes as secondary tasks in multi-task training. They are conducted on two speech databases that differ in quality and size and represent different languages, English and Vietnamese. Articulatory characteristics are also combined with other secondary tasks, such as context, to test their complementarity. Networks of different sizes are compared to determine how network size affects the effectiveness of multi-task training. The experiments show that multi-task training with articulatory characteristics as secondary tasks can improve training and yield better phoneme accuracy. Finally, the multi-task networks are used as feature extractors in a speech recognition system.
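
In multi-task training, the output heads for all tasks sit on shared hidden layers, and the training objective combines the loss of the primary task (here, phoneme classification) with the losses of the secondary tasks (for example, articulatory characteristics). The minimal sketch below shows such a combined per-frame objective. It assumes softmax outputs for every task and a hypothetical fixed weight for the secondary losses; the abstract does not state how the thesis weights the tasks.

```python
import math
from typing import Dict, Sequence

def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one frame, computed stably via log-sum-exp."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]

def multitask_loss(outputs: Dict[str, Sequence[float]],
                   targets: Dict[str, int],
                   primary: str = "phoneme",
                   secondary_weight: float = 0.5) -> float:
    """Primary-task loss plus a weighted sum of the secondary-task losses.

    `outputs` maps each task name to the logits of that task's output layer;
    all heads are assumed to share the same hidden layers (not shown).
    """
    loss = cross_entropy(outputs[primary], targets[primary])
    for task, logits in outputs.items():
        if task != primary:
            loss += secondary_weight * cross_entropy(logits, targets[task])
    return loss

frame_outputs = {
    "phoneme": [2.0, 0.5, -1.0],  # hypothetical phoneme logits
    "place": [0.0, 0.0],          # hypothetical articulatory head (place)
}
print(multitask_loss(frame_outputs, {"phoneme": 0, "place": 1}))
```

After training, the secondary heads can be discarded and the shared layers used on their own, for example as a feature extractor, which matches the final use described in the abstract.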

Digital library of Brno University of Technology

National Repository of Grey Literature

Visualization of Outputs from Speech Technologies for Contact Centers

Author: Zhezhela Oleksandr
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2014

This Master's thesis deals with visualizing data obtained by speech technologies for the needs of contact centers. It examines methods for extracting information from speech signals and existing tools that solve similar tasks, and it defines the range of metadata that can be mined from speech. The processes and standards used in contact centers are also examined. Based on requirements gathered from contact center employees, a user interface for data visualization and an audio player displaying the speech data were designed and implemented. The resulting solutions were integrated into the Speech Analytics Server (SPAS) tool.

Digital library of Brno University of Technology

National Repository of Grey Literature