7 research outputs found
Adaptation of speech recognition systems to selected real-world deployment conditions
This habilitation thesis deals with adaptation of automatic speech
recognition (ASR) systems to selected real-world deployment conditions.
It is presented in the form of a collection of twelve articles
dealing with this task; I am the main author or a co-author of all of
them. These articles were published during my work on several consecutive
research projects, in which I took part both as a member of the
research team and as the investigator or a co-investigator.
These articles can be divided into three main groups according to
their topics. They have in common the effort to adapt a particular
ASR system to a specific factor or deployment condition that affects
its function or accuracy.
The first group of articles is focused on an unsupervised speaker
adaptation task, where the ASR system adapts its parameters to
the specific voice characteristics of one particular speaker. The second
part deals with a) methods allowing the system to identify
non-speech events in the input, and b) the related task of recognizing
speech with non-speech events, particularly music, in the
background. Finally, the third part is devoted to methods
that allow the transcription of an audio signal containing utterances
in more than one language. It includes a) approaches for adapting an existing
recognition system to a new language and b) methods for identifying
the language from the audio signal.
The two identification tasks mentioned above are investigated mainly
under the demanding and less explored frame-wise scenario,
which is the only one suitable for on-line deployment, e.g., for processing streamed data.
IberSPEECH 2020: XI Jornadas en Tecnología del Habla and VII Iberian SLTech
IberSPEECH2020 is a two-day event that brings together the best researchers and practitioners in speech and language technologies for Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentations of projects, laboratory activities, recent PhD theses, discussion panels, a round table, and awards for the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions, distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ukraine, and Slovenia). Furthermore, extended versions of selected papers will be published as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session. Red Española de Tecnologías del Habla. Universidad de Valladoli
Deliverable D1.1 State of the art and requirements analysis for hypervideo
This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present several use-case (viewer) scenarios in the LinkedTV project and, by analyzing the distinctive needs and demands of each scenario, we point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of the audiovisual content in order to effectively support the annotation process. Considering that the multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three streams. Finally, we present various annotation tools that could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each of the classes of techniques discussed in the deliverable, we present the evaluation results from applying one such method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary
Human-Computer Interaction
In this book the reader will find a collection of 31 papers presenting different facets of Human-Computer Interaction: the results of research projects and experiments, as well as new approaches to designing user interfaces. The book is organized around the following main topics, in order: new interaction paradigms, multimodality, usability studies on several interaction mechanisms, human factors, universal design, and development methodologies and tools.
PROSIDING SEMINAR TAHUNAN LINGUISTIK UNIVERSITAS PENDIDIKAN INDONESIA (SETALI 2018) TINGKAT INTERNASIONAL : Language in the Digital Era: Opportunities or Threats?
Seminar Tahunan Linguistik, commonly known as SETALI, is an annual seminar organized by the Linguistics Study Program of the School of Postgraduate Studies, Universitas Pendidikan Indonesia (SPs UPI), in cooperation with the UPI chapter of the professional organization Masyarakat Linguistik Indonesia (MLI). In 2018 the seminar was held again on 5-6 May under the theme “Language in the Digital Era: Opportunities or Threats?”. The theme was prompted by the distinctive language phenomena of the digital era, which play an important role in how language is applied. Around 200 selected papers are included for presentation at Setali 2018. The papers collected in these proceedings were selected through a long process and careful consideration. Language and digitalization are interrelated and inseparable. Language use in digital spaces, across various media, gives rise to many variants. Language used in communication in the digital era sometimes conforms to well-formed usage, but it often deviates from it. The many deviations that occur in language use in digital spaces may have negative effects on the language attitudes of Indonesian speakers in general. Accordingly, the public is expected to respond carefully to the various phenomena of language use, which are difficult to contain. Although there are many threats to the existence of language in this era, there are undeniably also many opportunities that language users can take up as positive and beneficial. To date, various polemics have arisen in linguistics concerning language issues spreading through the digital world. Language practitioners are expected to study the practice and role of language in the digital era extensively.
The theme “Language in the Digital Era: Opportunities or Threats?” is expected to give all elements of society a forum to participate in and contribute to assessing and examining the position of language from diverse points of view, and so to bring about a diversity of perspectives in Indonesian linguistics. Finally, asking for the guidance and blessing of Allah Swt., I hope that Setali 2018 will run in an orderly and smooth manner. I also hope that academic documentation such as this can make a real contribution to the development of linguistics in Indonesia. On this occasion, I would like to thank all the parties who helped Setali 2018 run well. Enjoy the seminar.