8 research outputs found

LUX-ASR: Building an ASR system for the Luxembourgish language

Author: Gilles Peter
Hillah Léopold Edem Ayité
Hosseini Kivanani Nina
Publication date: 01/01/2023

We present a first system for automatic speech recognition (ASR) for the low-resource language Luxembourgish. By applying transfer learning, we fine-tuned Meta’s wav2vec2-xls-r-300m checkpoint on 35 hours of labeled Luxembourgish speech data. The best word error rate achieved is 14.47

Open Repository and Bibliography - Luxembourg

Adaptation of speech recognition systems to selected real-world deployment conditions

Author: Červa Petr
Publication date: 26/11/2021

This habilitation thesis deals with the adaptation of automatic speech recognition (ASR) systems to selected real-world deployment conditions. It is presented as a collection of twelve articles on this task, of which I am the main author or a co-author. The articles were published during my work on several consecutive research projects, in which I participated as a member of the research team as well as an investigator or co-investigator. The articles can be divided into three main groups according to topic. What they have in common is the effort to adapt a particular ASR system to a specific factor or deployment condition that affects its function or accuracy. The first group focuses on unsupervised speaker adaptation, where the ASR system adapts its parameters to the specific voice characteristics of one particular speaker. The second part deals with a) methods that allow the system to identify non-speech events in its input, and b) the related task of recognizing speech with non-speech events, particularly music, in the background. Finally, the third part is devoted to methods that allow the transcription of an audio signal containing multilingual utterances. It includes a) approaches for adapting an existing recognition system to a new language and b) methods for identifying the language from the audio signal. The two identification tasks are investigated in particular under the demanding and less explored frame-wise scenario, which is the only one suitable for processing on-line data streams.

DSpace@TUL

Using Comparable Corpora to Augment Statistical Machine Translation Models in Low Resource Settings

Author: Irvine Ann
Publication venue: Johns Hopkins University
Publication date: 01/01/2014

Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, or pairs of translated sentences. In this thesis, we directly incorporate comparable corpora into the estimation of end-to-end SMT models. In contrast to parallel corpora, comparable corpora are pairs of monolingual corpora that have some cross-lingual similarities, for example topic or publication date, but that do not necessarily contain any direct translations. Comparable corpora are more readily available in large quantities than parallel corpora, which require significant human effort to compile. We use comparable corpora to estimate machine translation model parameters and show that doing so improves performance in settings where a limited amount of parallel data is available for training. The major contributions of this thesis are the following:

* We release ‘language packs’ for 151 human languages, which include bilingual dictionaries, comparable corpora of Wikipedia document pairs, comparable corpora of time-stamped news text that we harvested from the web, and, for non-roman script languages, dictionaries of name pairs, which are likely to be transliterations.
* We present a novel technique for using a small number of example word translations to learn a supervised model for bilingual lexicon induction which takes advantage of a wide variety of signals of translation equivalence that can be estimated over comparable corpora.
* We show that using comparable corpora to induce new translations and estimate new phrase table feature functions improves end-to-end statistical machine translation performance for low resource language pairs as well as domains.
* We present a novel algorithm for composing multiword phrase translations from multiple unigram translations and then use comparable corpora to prune the large space of hypothesis translations. We show that these induced phrase translations improve machine translation performance beyond that of component unigrams.

This thesis focuses on critical low resource machine translation settings, where insufficient parallel corpora exist for training statistical models. We experiment with both low resource language pairs and low resource domains of text. We present results from our novel error analysis methodology, which show that most translation errors in low resource settings are due to unseen source language words and phrases and unseen target language translations. We also find room for fixing errors due to how different translations are weighted, or scored, in the models. We target both error types; we use comparable corpora to induce new word and phrase translations and estimate novel translation feature scores. Our experiments show that augmenting baseline SMT systems with new translations and features estimated over comparable corpora improves translation performance significantly. Additionally, our techniques expand the applicability of statistical machine translation to those language pairs for which zero parallel text is available.

CiteSeerX

JScholarship

Unmet goals of tracking: within-track heterogeneity of students' expectations for

Author: Demanet Jannick
Van den Broeck Laura
Van Houtte Mieke
Publication date: 01/01/2015

Educational systems are often characterized by some form(s) of ability grouping, like tracking. Although substantial variation in the implementation of these practices exists, the aim is always to improve teaching efficiency by creating homogeneous groups of students in terms of capabilities and performances as well as expected pathways. If students’ expected pathways (university, graduate school, or working) are in line with the goals of tracking, one might presume that these expectations are rather homogeneous within tracks and heterogeneous between tracks. In Flanders (the northern region of Belgium), the educational system consists of four tracks. Many students start out in the most prestigious, academic track. If they fail to gain the necessary credentials, they move to the less esteemed technical and vocational tracks. Therefore, the educational system has been called a 'cascade system'. We presume that this cascade system creates homogeneous expectations in the academic track, but heterogeneous expectations in the technical and vocational tracks. We use data from the International Study of City Youth (ISCY), gathered during the 2013-2014 school year from 2354 tenth-grade pupils across 30 secondary schools in the city of Ghent, Flanders. Preliminary results suggest that the technical and vocational tracks show more heterogeneity in students’ expectations than the academic track. If tracking does not fulfill the desired goals in some tracks, tracking practices should be questioned, as tracking occurs along social and ethnic lines, causing social inequality.

Ghent University Academic Bibliography

Epidemiology of Injury in English Women's Super league Football: A Cohort Study

Author: Francis P
Hind K
Jones G
Mayhew L
McPee J
Publication venue: ISBN 978-3-9818414-1-1
Publication date: 07/07/2018

INTRODUCTION: The epidemiology of injury in male professional football has been well documented (Ekstrand, Hägglund, & Waldén, 2011) and used as a basis to understand injury trends for a number of years. The prevalence and incidence of injuries occurring in women’s Super League football are unknown. The aim of this study is to estimate the prevalence and incidence of injury in an English Super League Women’s Football squad. METHODS: Following ethical approval from Leeds Beckett University, players (n = 25) signed to a Women’s Super League Football club provided written informed consent to complete a self-administered injury survey. Measures of exposure, injury and performance over a 12-month period were gathered. Participants were classified as injured if they reported a football injury that required medical attention or withdrawal from participation for one day or more. Injuries were categorised as either traumatic or overuse, and as a new injury and/or a re-injury of the same anatomical site. RESULTS: 43 injuries, including re-injuries, were reported by the 25 participants, giving a clinical incidence of 1.72 injuries per player. Total incidence of injury was 10.8/1000 h (95% CI: 7.5 to 14.03). Participants were at higher risk of injury during a match compared with training (32.4 (95% CI: 15.6 to 48.4) vs 8.0 (95% CI: 5.0 to 10.85)/1000 hours, p 28 days) of which there were three non-contact anterior cruciate ligament (ACL) injuries. The epidemiological incidence proportion was 0.80 (95% CI: 0.64 to 0.95) and the average probability that any player on this team will sustain at least one injury was 80.0% (95% CI: 64.3% to 95.6%). CONCLUSION: This is the first report capturing exposure and injury incidence by anatomical site from a cohort of English players, and the incidence is comparable to that found in Europe (6.3/1000 h (95% CI 5.4 to 7.36) Larruskain et al 2017). The number of ACL injuries highlights a potential injury burden for a squad of this size. Multi-site prospective investigations into the incidence and prevalence of injury in women’s football are required.

Leeds Beckett Repository

Esa 12th Conference: Differences, Inequalities and Sociological Imagination: Abstract Book

Author: Klimczuk Andrzej
Publication date: 01/01/2015

PhilPapers

Synchronizing fields – understanding the success of European social science projects: The case of the European Value Study and the European Social Survey

Author: Kropp Kristoffer
Publication date: 24/01/2014

Roskilde Universitet

Copenhagen University Research Information System

Study on media plurality and diversity online

Publication venue: Publications Office of the European Union
Publication date: 01/01/2022

Published online: 16 September 2022
Corporate authors: Centre on Media Pluralism and Media Freedom (CMPF), CiTiP (Centre for Information Technology and Intellectual Property) of KU Leuven, Directorate-General for Communications Networks, Content and Technology (European Commission), Institute for Information Law of the University of Amsterdam (IViR/UvA), Vrije Universiteit Brussel (Studies in Media Innovation and Technology VUB-SMIT)
Personal authors: Parcu, Pier Luigi; Brogi, Elda; Verza, Sofia; Da Costa Leite Borges, Danielle; Carlini, Roberta; Trevisan, Matteo; Tambini, Damian; Mazzoli, Eleonora Maria; Klimkiewicz, Beata; Broughton Micova, Sally; Petković, Brankica; Rossi, Maria Alessandra; Stasi, Maria Luisa; Valcke, Peggy; Lambrecht, Ingrid; Irion, Kristina; Fahy, Ronan; Idiz, Daphne; Meiring, Arlette; Seipp, Theresa; Poort, Joost; Ranaivoson, Heritiana; Afilipoaie, Adelaida; Domazetovikj, Nino

The Study on Media Plurality and Diversity Online investigates the value of safeguarding media pluralism and diversity online, focusing on (i) the prominence and discoverability of general interest content and services, and on (ii) market plurality and the concentration of economic resources. With a focus on Europe, the project is funded by a tender from the European Commission to produce a study on Media Plurality and Diversity Online and involves four partner universities: CMPF (EUI); CiTiP (Centre for Information Technology and Intellectual Property) of KU Leuven; the Institute for Information Law of the University of Amsterdam (IViR/UvA); imec-SMIT-Vrije Universiteit Brussel. The purpose of the assignment was to describe, analyse and evaluate the existing regulatory and business practices in the two areas mentioned above, and finally to elaborate some policy recommendations. Data were collected from the database of the Media Pluralism Monitor (CMPF) and through desk research, online consultations and interviews with stakeholders. The contractor was able to call on a network of national experts across the Member States to support this work.

Cadmus, EUI Research Repository