842 research outputs found

In search of isoglosses: continuous and discrete language embeddings in Slavic historical phonology

Author: Cathcart Chundra A.
Wandl Florian
Publication date: 01/01/2020

This paper investigates the ability of neural network architectures to effectively learn diachronic phonological generalizations in a multilingual setting. We employ models using three different types of language embedding (dense, sigmoid, and straight-through). We find that the Straight-Through model outperforms the other two in terms of accuracy, but the Sigmoid model's language embeddings show the strongest agreement with the traditional subgrouping of the Slavic languages. We find that the Straight-Through model has learned coherent, semi-interpretable information about sound change, and outline directions for future research.

arXiv.org e-Print Archive

Crossref

ZORA

Modeling the Relationship among Linguistic Typological Features with Hierarchical Dirichlet Process

Author: Lin Chu-Cheng
Tsai Richard Tzong-Han
Wang Yu-Chun
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

Learning Language Representations for Typology Prediction

Author: Littell Patrick
Malaviya Chaitanya
Neubig Graham
Publication date: 01/01/2017

One central mystery of neural NLP is what neural models "know" about their subject matter. When a neural machine translation system learns to translate from one language to another, does it learn the syntax or semantics of the languages? Can this knowledge be extracted from the system to fill holes in human scientific knowledge? Existing typological databases contain relatively full feature specifications for only a few hundred languages. Exploiting the existence of parallel texts in more than a thousand languages, we build a massive many-to-one neural machine translation (NMT) system from 1017 languages into English, and use this to predict information missing from typological databases. Experiments show that the proposed method is able to infer not only syntactic, but also phonological and phonetic inventory features, and improves over a baseline that has access to information about the languages' geographic and phylogenetic neighbors. Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

Reconstructing Native Language Typology from Foreign Language Usage

Author: Berzak Yevgeni
Katz Boris
Reichart Roi
Publication date: 01/01/2014

Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as Second Language (ESL) texts and equivalent similarities obtained from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and perform prediction of typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources. Comment: CoNLL 201

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref

Beyond binary dependencies in language structure

Author: Blasi Damian
Roberts Sean
Publication date: 10/05/2017

ZENODO

MPG.PuRe

Explore Bristol Research

Linguistic diversity through data

Author: Blasi Damian
Publication date: 27/04/2018

Qucosa - Publikationsserver der Universität Leipzig

Innovative technologies for under-resourced language documentation: The BULB Project

Author: Adda Gilles
Adda-Decker Martine
Ambouroue Odette
Besacier Laurent
Blachon David
Bonneau-Maynard Hélène
Gauthier Elodie
Godard Pierre
Hamlaoui Fatima
Idiatov Dmitry
Kouarata Guy-Noël
Lamel Lori
Makasso Emmanuel-Moselly
Rialland Annie
Stuker Sebastian
Van De Velde Mark
Yvon François
Zerbian Sabine
Publication venue: HAL CCSD
Publication date: 01/05/2016

The project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims at supporting linguists in documenting unwritten languages. To achieve this, we will develop tools tailored to the needs of documentary linguists by building upon technology and expertise from the area of natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed for this, we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at phoneme level and of the French translation at word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that will support the linguists in their work, taking into account the linguists' needs and the technology's capabilities. The data collection has begun for the three languages. For this we use standard mobile devices and a dedicated application, LIG-AIKUMA, which offers a range of speech collection modes (recording, respeaking, translation and elicitation). LIG-AIKUMA's improved features include smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping.

Hal - Université Grenoble Alpes

Hal-Diderot