10 research outputs found

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. To this end we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.

    Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information

    This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.

    K prozodii mluvené češtiny metodami korpusové lingvistiky

    Prosody is a key aspect of spoken language, yet it is currently underrepresented in the spoken Czech corpora offered by the Czech National Corpus (CNC). This is mainly because spoken corpora are already very expensive and labor-intensive to build, so adding more annotation manually is infeasible. The present dissertation thus charts a way to provide automatic prosodic annotation for the spoken corpora of the CNC using the Prosogram framework, in combination with other tools and various custom postprocessing strategies and heuristics. A case is also made in favor of theory-light, predominantly descriptive approaches when preparing general-purpose spoken corpus annotations for the consumption of the linguistics research community at large, in a variety of contexts and research tasks. The theoretical justification for choosing Prosogram as the annotation tool includes an analysis of how meaning works in language. This analysis is philosophically anchored in a discriminative approach to meaning, which is shown to be the correct, paradox-free alternative to the currently more dominant paradigm of compositionality. Finally, a selection of results based on the Prosogram-generated annotation is presented. A particular focus is given to pitch range, which is characteristically restricted in Czech compared to other languages like English, but other features such as glissandos are also considered. Keywords: Czech, speech, prosody, corpus linguistics, ...
    Institute of Germanic Studies, Faculty of Arts

    Learning representations for speech recognition using artificial neural networks

    Learning representations is a central challenge in machine learning. For speech recognition, we are interested in learning robust representations that are stable across different acoustic environments, recording equipment and irrelevant inter- and intra-speaker variabilities. This thesis is concerned with representation learning for acoustic model adaptation to speakers and environments, construction of acoustic models in low-resource settings, and learning representations from multiple acoustic channels. The investigations are primarily focused on the hybrid approach to acoustic modelling based on hidden Markov models and artificial neural networks (ANN). The first contribution concerns acoustic model adaptation. This comprises two new adaptation transforms operating in ANN parameter space. Both operate at the level of activation functions and treat a trained ANN acoustic model as a canonical set of fixed-basis functions, from which one can later derive variants tailored to the specific distribution present in adaptation data. The first technique, termed Learning Hidden Unit Contributions (LHUC), depends on learning distribution-dependent linear combination coefficients for hidden units. This technique is then extended to altering groups of hidden units with parametric and differentiable pooling operators. We found that the proposed adaptation techniques possess many desirable properties: they are relatively low-dimensional, do not overfit and can work in both a supervised and an unsupervised manner. For LHUC we also present extensions to speaker adaptive training and environment factorisation. On average, depending on the characteristics of the test set, 5-25% relative word error rate (WERR) reductions are obtained in an unsupervised two-pass adaptation setting. The second contribution concerns building acoustic models in low-resource data scenarios. In particular, we are concerned with insufficient amounts of transcribed acoustic material for estimating acoustic models in the target language, thus assuming that resources like lexicons or texts to estimate language models are available. First we propose an ANN with a structured output layer which models both context-dependent and context-independent speech units, with the context-independent predictions used at runtime to aid the prediction of context-dependent states. We also propose to perform multi-task adaptation with a structured output layer. We obtain consistent WERR reductions up to 6.4% in low-resource speaker-independent acoustic modelling. Adapting those models in a multi-task manner with LHUC decreases WERRs by an additional 13.6%, compared to 12.7% for non-multi-task LHUC. We then demonstrate that one can build better acoustic models with unsupervised multi- and cross-lingual initialisation and find that pre-training is largely language-independent. Up to 14.4% WERR reductions are observed, depending on the amount of available transcribed acoustic data in the target language. The third contribution concerns building acoustic models from multi-channel acoustic data. For this purpose we investigate various ways of integrating and learning multi-channel representations. In particular, we investigate channel concatenation and the applicability of convolutional layers for this purpose. We propose a multi-channel convolutional layer with cross-channel pooling, which can be seen as a data-driven non-parametric auditory attention mechanism. We find that for unconstrained microphone arrays, our approach is able to match the performance of comparable models trained on beamform-enhanced signals.
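
    The LHUC idea summarised in the abstract above (learning per-unit combination coefficients on top of a fixed, already trained network) can be illustrated with a minimal sketch. The abstract does not give the parameterisation, so the amplitude function 2*sigmoid(r), the helper names and the toy weights below are all assumptions made for illustration, not the thesis's implementation. Only the rescaling step is shown; learning r from adaptation data is omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_layer(x, weights, biases):
    """Speaker-independent hidden layer: sigmoid(W x + b), kept fixed."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def lhuc_scale(hidden, r):
    """Rescale each hidden unit by an amplitude in (0, 2).

    r holds one adaptation parameter per hidden unit (assumed form:
    amplitude = 2 * sigmoid(r)); r = 0 gives amplitude 1, i.e. no change.
    """
    return [2.0 * sigmoid(ri) * h for ri, h in zip(r, hidden)]

# Toy example: 2 inputs, 3 hidden units (all values are made up).
W = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0]

h = hidden_layer(x, W, b)
unadapted = lhuc_scale(h, [0.0, 0.0, 0.0])  # same activations as h
adapted = lhuc_scale(h, [1.5, 0.0, -1.5])   # unit 0 boosted, unit 2 damped
```

    In this sketch only the vector r would be updated during adaptation, which is one way to read the abstract's remark that the transform is relatively low-dimensional: one parameter per hidden unit rather than a full weight matrix.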

    Mathematical linguistics

    but in fact this is still an early draft, version 0.56, August 1 2001. Please d

    Word Knowledge and Word Usage

    Word storage and processing define a multi-factorial domain of scientific inquiry whose thorough investigation goes well beyond the boundaries of traditional disciplinary taxonomies and requires the synergistic integration of a wide range of methods, techniques, and empirical and experimental findings. The present book approaches a few central issues concerning the organization, structure and functioning of the Mental Lexicon by asking domain experts to look at common, central topics from complementary standpoints and to discuss the advantages of developing converging perspectives. The book explores the connections between computational and algorithmic models of the mental lexicon, word frequency distributions and information-theoretical measures of word families, statistical correlations across psycholinguistic and cognitive evidence, principles of machine learning, and integrative brain models of word storage and processing. The main goal of the book is to map out the landscape of future research in this area, to foster the development of interdisciplinary curricula, and to help single-domain specialists understand and address issues and questions as they are raised in other disciplines.

    Social work with airports passengers

    Social work at the airport consists of offering social services to passengers. The main methodological premise is that passengers are under stress, which is characterized by a particular set of features in appearance and behavior. In such circumstances a passenger's actions attract attention. Only a person whom the passenger trusts can help with documents or provide psychological support.