19 research outputs found

    Rapid Generation of Pronunciation Dictionaries for new Domains and Languages

    Get PDF
    This dissertation presents innovative strategies and methods for the rapid generation of pronunciation dictionaries for new domains and languages. Depending on various conditions, solutions are proposed and developed. Starting from the straightforward scenario in which the target language is present in written form on the Internet and the mapping between speech and written language is close up to the difficult scenario in which no written form for the target language exists

    Low Resource Efficient Speech Retrieval

    Get PDF
    Speech retrieval refers to the task of retrieving the information, which is useful or relevant to a user query, from speech collection. This thesis aims to examine ways in which speech retrieval can be improved in terms of requiring low resources - without extensively annotated corpora on which automated processing systems are typically built - and achieving high computational efficiency. This work is focused on two speech retrieval technologies, spoken keyword retrieval and spoken document classification. Firstly, keyword retrieval - also referred to as keyword search (KWS) or spoken term detection - is defined as the task of retrieving the occurrences of a keyword specified by the user in text form, from speech collections. We make advances in an open vocabulary KWS platform using context-dependent Point Process Model (PPM). We further accomplish a PPM-based lattice generation framework, which improves KWS performance and enables automatic speech recognition (ASR) decoding. Secondly, the massive volumes of speech data motivate the effort to organize and search speech collections through spoken document classification. In classifying real-world unstructured speech into predefined classes, the wildly collected speech recordings can be extremely long, of varying length, and contain multiple class label shifts at variable locations in the audio. For this reason each spoken document is often first split into sequential segments, and then each segment is independently classified. We present a general purpose method for classifying spoken segments, using a cascade of language independent acoustic modeling, foreign-language to English translation lexicons, and English-language classification. Next, instead of classifying each segment independently, we demonstrate that exploring the contextual dependencies across sequential segments can provide large classification performance improvements. Lastly, we remove the need of any orthographic lexicon and instead exploit alternative unsupervised approaches to decoding speech in terms of automatically discovered word-like or phoneme-like units. We show that the spoken segment representations based on such lexical or phonetic discovery can achieve competitive classification performance as compared to those based on a domain-mismatched ASR or a universal phone set ASR

    Out-of-vocabulary spoken term detection

    Get PDF
    Spoken term detection (STD) is a fundamental task for multimedia information retrieval. A major challenge faced by an STD system is the serious performance reduction when detecting out-of-vocabulary (OOV) terms. The difficulties arise not only from the absence of pronunciations for such terms in the system dictionaries, but from intrinsic uncertainty in pronunciations, significant diversity in term properties and a high degree of weakness in acoustic and language modelling. To tackle the OOV issue, we first applied the joint-multigram model to predict pronunciations for OOV terms in a stochastic way. Based on this, we propose a stochastic pronunciation model that considers all possible pronunciations for OOV terms so that the high pronunciation uncertainty is compensated for. Furthermore, to deal with the diversity in term properties, we propose a termdependent discriminative decision strategy, which employs discriminative models to integrate multiple informative factors and confidence measures into a classification probability, which gives rise to minimum decision cost. In addition, to address the weakness in acoustic and language modelling, we propose a direct posterior confidence measure which replaces the generative models with a discriminative model, such as a multi-layer perceptron (MLP), to obtain a robust confidence for OOV term detection. With these novel techniques, the STD performance on OOV terms was improved substantially and significantly in our experiments set on meeting speech data

    24th Nordic Conference on Computational Linguistics (NoDaLiDa)

    Get PDF

    CLARIN. The infrastructure for language resources

    Get PDF
    CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

    CLARIN

    Get PDF
    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium

    Grammar and Corpora 2016

    Get PDF
    In recent years, the availability of large annotated corpora, together with a new interest in the empirical foundation and validation of linguistic theory and description, has sparked a surge of novel work using corpus methods to study the grammar of natural languages. This volume presents recent developments and advances, firstly, in corpus-oriented grammar research with a special focus on Germanic, Slavic, and Romance languages and, secondly, in corpus linguistic methodology as well as the application of corpus methods to grammar-related fields. The volume results from the sixth international conference Grammar and Corpora (GaC 2016), which took place at the Institute for the German Language (IDS) in Mannheim, Germany, in November 2016

    CLARIN

    Get PDF
    The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as an Europ. Research Infrastructure Consortium

    RECOVERY OF ACRONYMS, OUT-OF-LATTICE WORDS AND PRONUNCIATIONS FROM PARALLEL MULTILINGUAL SPEECH

    No full text
    In this work we present a set of techniques which explore information from multiple, different language versions of the same speech, to improve Automatic Speech Recognition (ASR) performance. Using this redundant information we are able to recover acronyms, words that cannot be found in the multiple hypotheses produced by the ASR systems, and pronunciations absent from their pronunciation dictionaries. When used together, the three techniques yield a relative improvement of 5.0 % over the WER of our baseline system, and 24.8 % relative when compared with standard speech recognition, in an Europarl Committee dataset with three different languages (Portuguese, Spanish and English). One full iteration of the system has a parallel Real Time Factor (RTF) of 3.08 and a sequential RTF of 6.44. Index Terms — speech recognition, machine translation, pronunciation, out-of-lattice, acronyms 1

    The Story of Swahili

    Get PDF
    Swahili was once an obscure dialect of an East African Bantu language. Today more than one hundred million people use it: Swahili is to eastern and central Africa what English is to the world. From its embrace in the 1960s by the black freedom movement in the United States to its adoption in 2004 as the African Union’s official language, Swahili has become a truly international language. How this came about and why, of all African languages, it happened only to Swahili is the story that John M. Mugane sets out to explore. The remarkable adaptability of Swahili has allowed Africans and others to tailor the language to their needs, extending its influence far beyond its place of origin. Its symbolic as well as its practical power has evolved from its status as a language of contact among diverse cultures, even as it embodies the history of communities in eastern and central Africa and throughout the Indian Ocean world. The Story of Swahili calls for a reevaluation of the widespread assumption that cultural superiority, military conquest, and economic dominance determine a language’s prosperity. This sweeping history gives a vibrant, living language its due, highlighting its nimbleness from its beginnings to its place today in the fast-changing world of global communication.https://ohioopen.library.ohio.edu/oupress/1013/thumbnail.jp
    corecore