
    One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

    We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task. We use labeled data from a high-resource language to increase performance on a low-resource language. In experiments on 21 language pairs from four different language families, we obtain up to 58% higher accuracy than without transfer and show that even zero-shot and one-shot learning are possible. We further find that the degree of language relatedness strongly influences the ability to transfer morphological knowledge.
    Comment: Accepted at ACL 201
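The task setup can be illustrated with a minimal sketch (not the paper's code; the example pair, helper name, and tag symbols are invented): inflection is cast as character-level sequence transduction, where the encoder input is the lemma's characters plus morphological tag symbols and the decoder target is the characters of the inflected form.

```python
# Hypothetical illustration of the paradigm-completion data format:
# one training pair for an encoder-decoder inflection model.

def make_example(lemma, tags, form):
    """Build one (source, target) pair for sequence transduction."""
    src = list(lemma) + [f"<{t}>" for t in tags]  # lemma chars + tag symbols
    tgt = list(form)                              # inflected-form chars
    return src, tgt

# A high-resource (here: Spanish) pair of the kind used for transfer:
src, tgt = make_example("hablar", ["V", "PRS", "3", "SG"], "habla")
print(src)  # ['h', 'a', 'b', 'l', 'a', 'r', '<V>', '<PRS>', '<3>', '<SG>']
print(tgt)  # ['h', 'a', 'b', 'l', 'a']
```

In cross-lingual transfer, pairs like this from the high-resource language and a handful (or none) from the low-resource language are fed to the same model, so the low-resource language can reuse the learned character-level mappings.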

    Linguistic probes into human history

    This thesis in linguistics includes five published articles and one study to appear, in which I review, test, and use computational linguistic methods to classify languages and dialects on the basis of samples consisting of lexical items, the sort of material that is generally readily available from linguistic atlases and databases. To compare linguistic varieties and classify them, we use methods that yield a linguistic distance matrix. The studies reported concern, respectively, the classification of Dutch dialects from the Netherlands; languages and dialects from Spain; Bantu varieties from Gabon and Tanzania; and, finally, Turkic and Indo-Iranian languages spoken in Kyrgyzstan, Tajikistan, and Uzbekistan.

    In a multidisciplinary perspective aimed at providing a higher level of anthropological synthesis, linguistic diversity is used as a proxy for the cultural differences of the corresponding populations and is then compared to the variability of family names (their number, frequency, and geographic distribution) or to genetic differences based on molecular markers in the DNA. The analysis of family names can reveal migrations that took place in historical times and allows us to distinguish regions that have received many immigrants from those that have remained demographically more stable. We conjecture that such migration patterns have influenced dialect and language contact. This is a novel perspective from which we may examine the effects of migration on language change; for example, it appears that the languages of Spain have remained lively because the regions where they are spoken have often been quite isolated demographically.
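A common way to build such a linguistic distance matrix from lexical samples is the average length-normalized Levenshtein distance between word forms for the same concepts. The sketch below illustrates that idea; the word lists and variety names are invented toy data, not the thesis's material.

```python
# Toy sketch of a lexical distance matrix via normalized edit distance.

def levenshtein(a, b):
    """Classic edit distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def variety_distance(words_a, words_b):
    """Mean length-normalized edit distance over aligned concept lists."""
    dists = [levenshtein(a, b) / max(len(a), len(b))
             for a, b in zip(words_a, words_b)]
    return sum(dists) / len(dists)

# Invented lexical samples (three concepts) for three hypothetical varieties:
samples = {
    "A": ["huis", "water", "melk"],
    "B": ["hus", "woater", "melke"],
    "C": ["casa", "agua", "leche"],
}
names = sorted(samples)
matrix = {(x, y): variety_distance(samples[x], samples[y])
          for x in names for y in names}
print(round(matrix[("A", "B")], 3), round(matrix[("A", "C")], 3))  # 0.206 0.867
```

The resulting matrix (small distance between A and B, large distance to C) is the kind of input a clustering or multidimensional-scaling step can then turn into a classification of varieties.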

    ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION

    Get PDF
    Current Automatic Speech Recognition (ASR) systems fall far short of human speech recognition performance because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria.

    Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation arises from a combination of factors including speaking style and speaking rate; this phenomenon is commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology instead accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space.

    To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a proof-of-concept study using synthetically generated speech showed that articulatory gestures can indeed be recognized from the speech signal, and that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitates the gesture recognition task. Since no natural speech database currently contains articulatory gesture annotation, an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs.

    Two natural speech databases, X-ray microbeam and Aurora-2, were annotated in this way; the former was used to train a TV-estimator and the latter to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs estimated from the acoustic speech signal. In this setup the articulatory gestures were modeled as hidden random variables, eliminating the need for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help account for coarticulatory variations but also significantly improve the noise robustness of ASR systems.
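The TV-estimator described above is, at its core, a frame-wise mapping from acoustic features to articulatory trajectories. The sketch below illustrates that framing with a closed-form ridge regression on synthetic data; the dimensions, the data, and the choice of a linear model are all invented stand-ins for the dissertation's actual estimator.

```python
import numpy as np

# Illustrative only: learn a frame-wise map from MFCC-like features to
# TV-like trajectories. Real TV estimation uses a trained nonlinear model
# on annotated speech; here synthetic data makes the sketch self-contained.

rng = np.random.default_rng(0)
n_frames, n_mfcc, n_tv = 200, 13, 8   # invented dimensions

X = rng.normal(size=(n_frames, n_mfcc))                    # "MFCC" frames
W_true = rng.normal(size=(n_mfcc, n_tv))                   # hidden mapping
Y = X @ W_true + 0.1 * rng.normal(size=(n_frames, n_tv))   # noisy "TV" targets

# Closed-form ridge solution: W = (X'X + lam*I)^{-1} X'Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_mfcc), X.T @ Y)

mse = float(np.mean((X @ W - Y) ** 2))
print(f"frame-wise TV reconstruction MSE: {mse:.4f}")
```

The estimated trajectories would then serve as the second observation stream (alongside MFCCs) in a DBN-style recognizer, with gestures left as hidden variables.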