Acoustic Modelling for Under-Resourced Languages
Automatic speech recognition systems have so far been developed only for very few languages out of the 4,000-7,000 existing ones.
In this thesis we examine methods to rapidly create acoustic models in new, possibly under-resourced languages, in a time- and cost-effective manner. For this we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on finding approaches that allow using data from multiple languages to improve performance for those languages on different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
K prozodii mluvené češtiny metodami korpusové lingvistiky (On the prosody of spoken Czech via corpus-linguistic methods)
Prosody is a key aspect of spoken language, yet it is currently underrepresented in the spoken Czech corpora on offer at the Czech National Corpus. This is mainly because spoken corpora are already very expensive and labor-intensive to build, and adding more annotation manually is infeasible. The present dissertation thus charts a way to provide an automatic prosodic annotation for the spoken corpora of the CNC using the Prosogram framework, in combination with other tools and various custom postprocessing strategies and heuristics. A case is also made in favor of theory-light, predominantly descriptive approaches when preparing general-purpose spoken corpus annotations for the consumption of the linguistics research community at large, in a variety of contexts and research tasks. This case is philosophically anchored in a discriminative approach to meaning, which is shown to be the correct, paradox-free alternative to the currently more dominant paradigm of compositionality. Finally, a selection of results based on the Prosogram-generated annotation is presented. A particular focus is given to pitch range, which is characteristically restricted in Czech compared to other languages like English, but other features such as glissandos are also considered. Keywords: Czech, speech, prosody, corpus linguistics,...
Institute of Germanic Studies, Faculty of Arts
Learning representations for speech recognition using artificial neural networks
Learning representations is a central challenge in machine learning. For speech
recognition, we are interested in learning robust representations that are stable
across different acoustic environments, recording equipment and irrelevant inter-
and intra-speaker variabilities. This thesis is concerned with representation
learning for acoustic model adaptation to speakers and environments, construction
of acoustic models in low-resource settings, and learning representations from
multiple acoustic channels. The investigations are primarily focused on the hybrid
approach to acoustic modelling based on hidden Markov models and artificial
neural networks (ANN).
The first contribution concerns acoustic model adaptation. This comprises
two new adaptation transforms operating in ANN parameter space. Both operate
at the level of activation functions and treat a trained ANN acoustic model as
a canonical set of fixed-basis functions, from which one can later derive variants
tailored to the specific distribution present in adaptation data. The first technique,
termed Learning Hidden Unit Contributions (LHUC), depends on learning
distribution-dependent linear combination coefficients for hidden units. This
technique is then extended to altering groups of hidden units with parametric and
differentiable pooling operators. We found that the proposed adaptation techniques
possess many desirable properties: they are relatively low-dimensional, do not overfit
and can work in both a supervised and an unsupervised manner. For LHUC we
also present extensions to speaker adaptive training and environment factorisation.
On average, depending on the characteristics of the test set, 5-25% relative
word error rate (WERR) reductions are obtained in an unsupervised two-pass
adaptation setting.
The second contribution concerns building acoustic models in low-resource
data scenarios. In particular, we are concerned with insufficient amounts of
transcribed acoustic material for estimating acoustic models in the target language
– thus assuming resources like lexicons or texts to estimate language models
are available. First, we propose an ANN with a structured output layer
which models both context-dependent and context-independent speech units,
with the context-independent predictions used at runtime to aid the prediction
of context-dependent states. We also propose to perform multi-task adaptation
with a structured output layer. We obtain consistent WERR reductions up to
6.4% in low-resource speaker-independent acoustic modelling. Adapting those
models in a multi-task manner with LHUC decreases WERRs by an additional
13.6%, compared to 12.7% for non multi-task LHUC. We then demonstrate that
one can build better acoustic models with unsupervised multi- and cross-lingual
initialisation and find that pre-training is largely language-independent. Up to
14.4% WERR reductions are observed, depending on the amount of the available
transcribed acoustic data in the target language.
The third contribution concerns building acoustic models from multi-channel
acoustic data. For this purpose we investigate various ways of integrating and
learning multi-channel representations. In particular, we investigate channel concatenation
and the applicability of convolutional layers for this purpose. We
propose a multi-channel convolutional layer with cross-channel pooling, which
can be seen as a data-driven non-parametric auditory attention mechanism. We
find that for unconstrained microphone arrays, our approach is able to match the
performance of the comparable models trained on beamform-enhanced signals.
Mathematical linguistics
but in fact this is still an early draft, version 0.56, August 1 2001. Please d
Word Knowledge and Word Usage
Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies and requires synergic integration of a wide range of methods, techniques, and empirical and experimental findings. The present book approaches a few central issues concerning the organization, structure and functioning of the Mental Lexicon by asking domain experts to look at common, central topics from complementary standpoints and to discuss the advantages of developing converging perspectives. The book explores the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information-theoretic measures of word families, statistical correlations across psycholinguistic and cognitive evidence, principles of machine learning, and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula, and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.
Social work with airports passengers
Social work at the airport consists in offering social services to passengers. The main
methodological position is that people are under stress, which is characterized by a
particular set of features in appearance and behavior. In such circumstances,
a passenger's actions attract some attention. Only a person whom he trusts can help him
with documents or psychologically.