12 research outputs found
Learning Language Representations for Typology Prediction
One central mystery of neural NLP is what neural models "know" about their
subject matter. When a neural machine translation system learns to translate
from one language to another, does it learn the syntax or semantics of the
languages? Can this knowledge be extracted from the system to fill holes in
human scientific knowledge? Existing typological databases contain relatively
full feature specifications for only a few hundred languages. Exploiting the
existence of parallel texts in more than a thousand languages, we build a
massive many-to-one neural machine translation (NMT) system from 1017 languages
into English, and use this to predict information missing from typological
databases. Experiments show that the proposed method is able to infer not only
syntactic, but also phonological and phonetic inventory features, and improves
over a baseline that has access to information about the languages' geographic
and phylogenetic neighbors.
Comment: EMNLP 201
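The abstract above describes learning a vector representation for each source language inside the many-to-one NMT system and using it to fill gaps in typological databases. A minimal sketch of that final prediction step, assuming language vectors are already available (the toy vectors, language codes, feature values, and the nearest-neighbor classifier below are all illustrative stand-ins, not the paper's actual model):

```python
# Hedged sketch: predict a missing typological feature for a language
# from its learned embedding, by copying the feature value of the most
# similar language with a known value. All vectors/labels are toy data.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(a * a for a in v) ** 0.5
    return dot / (nu * nv)

def predict_feature(target_vec, known):
    """Assign the feature value of the nearest known language."""
    best = max(known, key=lambda lang: cosine(target_vec, known[lang][0]))
    return known[best][1]

# Toy embeddings: language code -> (vector, word-order feature value).
known = {
    "deu": ([0.9, 0.1, 0.2], "SOV-like"),
    "fra": ([0.1, 0.9, 0.3], "SVO"),
}
print(predict_feature([0.85, 0.2, 0.1], known))  # prints "SOV-like"
```

The paper additionally compares against a baseline using geographic and phylogenetic neighbors; a nearest-neighbor rule over learned embeddings is just the simplest classifier one could plug in here.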
Bayesian Agglomerative Clustering with Coalescents
We introduce a new Bayesian model for hierarchical clustering based on a
prior over trees called Kingman's coalescent. We develop novel greedy and
sequential Monte Carlo inferences which operate in a bottom-up agglomerative
fashion. We show experimentally the superiority of our algorithms over others,
and demonstrate our approach in document clustering and phylolinguistics.
Comment: NIPS 200
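The greedy inference described above operates bottom-up, repeatedly joining clusters into a tree. A toy sketch of that generic agglomerative scheme (the real method scores merges under Kingman's coalescent prior rather than by raw distance, which this simplification omits; all data is illustrative):

```python
# Minimal bottom-up agglomerative clustering over 1-D points: greedily
# merge the closest pair of clusters until one tree remains. The
# coalescent-based model replaces this distance rule with a
# probabilistic merge score; this sketch only shows the greedy loop.

def agglomerate(points):
    """Return a nested-tuple merge tree over the input point indices."""
    clusters = [(p, (i,)) for i, p in enumerate(points)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(clusters[i][0] - clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, ti), (cj, tj) = clusters[i], clusters[j]
        merged = ((ci + cj) / 2, (ti, tj))  # centroid + nested tree
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][1]

print(agglomerate([0.0, 0.1, 5.0]))  # prints ((2,), ((0,), (1,)))
```

Points 0 and 1 are merged first (distance 0.1), and the resulting cluster is then joined with point 2, yielding the nested tree shown.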
Uncovering Probabilistic Implications in Typological Knowledge Bases
The study of linguistic typology is rooted in the implications we find
between linguistic features, such as the fact that languages with object-verb
word ordering tend to have post-positions. Uncovering such implications
typically amounts to time-consuming manual processing by trained and
experienced linguists, which potentially leaves key linguistic universals
unexplored. In this paper, we present a computational model which successfully
identifies known universals, including Greenberg universals, but also uncovers
new ones, worthy of further linguistic investigation. Our approach outperforms
baselines previously used for this problem, as well as a strong baseline from
knowledge base population.
Comment: To appear in Proceedings of ACL 201
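The abstract's running example (object-verb ordering implying postpositions) suggests the basic shape of implication mining: find feature pairs where one feature's presence makes the other near-certain. A toy sketch using plain conditional-probability counting over a feature table (the languages, feature names, threshold, and counting rule are illustrative assumptions, not the paper's probabilistic model):

```python
# Hedged sketch: mine candidate implications "A -> B" from a table of
# language feature sets, keeping pairs where P(B | A) meets a threshold.
# Data and threshold are toy values for illustration only.

def implications(table, threshold=0.9):
    """Return (a, b) pairs with P(b | a) >= threshold over the table."""
    feats = sorted({f for fs in table.values() for f in fs})
    out = []
    for a in feats:
        with_a = [fs for fs in table.values() if a in fs]
        if not with_a:
            continue
        for b in feats:
            if b == a:
                continue
            p = sum(b in fs for fs in with_a) / len(with_a)
            if p >= threshold:
                out.append((a, b))
    return out

# Toy data echoing Greenberg-style universals: OV languages here
# all use postpositions.
table = {
    "jpn": {"OV", "postpositions"},
    "tur": {"OV", "postpositions"},
    "hin": {"OV", "postpositions"},
    "eng": {"VO", "prepositions"},
}
print(implications(table))
```

A model like the paper's must additionally separate genuine universals from correlations explained by shared ancestry or areal contact, which simple co-occurrence counting cannot do.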
A Statistical Approach to Language Change and Phylogeny
Abstract available. In: The Present State of Statistical Language Research