Learning the Ordering of Coordinate Compounds and Elaborate Expressions in Hmong, Lahu, and Chinese
Coordinate compounds (CCs) and elaborate expressions (EEs) are coordinate
constructions common in languages of East and Southeast Asia. Mortensen (2006)
claims that (1) the linear ordering of EEs and CCs in Hmong, Lahu, and Chinese
can be predicted via phonological hierarchies and (2) these phonological
hierarchies lack a clear phonetic rationale. These claims are significant
because morphosyntax has often been seen as in a feed-forward relationship with
phonology, and phonological generalizations have often been assumed to be
phonetically "natural". We investigate whether the ordering of CCs and EEs can
be learned empirically and whether computational models (classifiers and
sequence labeling models) learn unnatural hierarchies similar to those posited
by Mortensen (2006). We find that decision trees and SVMs learn to predict the
order of CCs/EEs on the basis of phonology, with DTs learning hierarchies
strikingly similar to those proposed by Mortensen. However, we also find that a
neural sequence labeling model is able to learn the ordering of elaborate
expressions in Hmong very effectively without using any phonological
information. We argue that EE ordering can be learned through two independent
routes: phonology and lexical distribution, presenting a more nuanced picture
than previous work. [ISO 639-3:hmn, lhu, cmn]
Comment: To be published in NAACL202
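The hierarchy-based ordering the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's model: the tone ranking and example syllables below are hypothetical, standing in for whatever phonological hierarchy a decision tree might recover.

```python
# Hypothetical phonological hierarchy over tone categories:
# lower rank = surfaces earlier in the coordinate compound.
TONE_RANK = {"high": 0, "mid": 1, "low": 2}

def predict_order(syll_a, syll_b):
    """Order two (form, tone) pairs by tone rank; keep input order on ties."""
    if TONE_RANK[syll_a[1]] <= TONE_RANK[syll_b[1]]:
        return (syll_a, syll_b)
    return (syll_b, syll_a)

print(predict_order(("ba", "low"), ("ka", "high")))
# -> (('ka', 'high'), ('ba', 'low'))
```

A classifier trained on phonological features would, in effect, be learning a ranking function of this shape; the paper's point is that a sequence labeler can reach similar accuracy without any such features.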
Lexical prefixes and Tibeto-Burman laryngeal contrasts
Proceedings of the 37th Annual Meeting of the Berkeley Linguistics Society (2013), pp. 272-28
ChatGPT MT: Competitive for High- (but not Low-) Resource Languages
Large language models (LLMs) implicitly learn to perform a range of language
tasks, including machine translation (MT). Previous studies explore aspects of
LLMs' MT capabilities. However, there exists a wide variety of languages for
which recent LLM MT performance has never before been evaluated. Without
published experimental evidence on the matter, it is difficult for speakers of
the world's diverse languages to know how and whether they can use LLMs for
their languages. We present the first experimental evidence for an expansive
set of 204 languages, along with MT cost analysis, using the FLORES-200
benchmark. Trends reveal that GPT models approach or exceed traditional MT
model performance for some high-resource languages (HRLs) but consistently lag
for low-resource languages (LRLs), under-performing traditional MT for 84.1% of
the languages we covered. Our analysis reveals that a language's resource level
is the most important feature in determining ChatGPT's relative ability to
translate it and suggests that ChatGPT is especially disadvantaged for LRLs
and African languages.
Comment: 27 pages, 9 figures, 14 tables
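The headline comparison in this abstract is a per-language tally against a traditional MT baseline. A sketch of that tally, with illustrative (invented) chrF-style scores rather than the paper's actual numbers:

```python
# Hypothetical per-language scores for an LLM and a traditional MT baseline
# (e.g. on FLORES-200); the values below are illustrative only.
gpt_scores      = {"fra": 62.0, "deu": 64.5, "swh": 38.0, "lug": 21.0}
baseline_scores = {"fra": 60.1, "deu": 63.0, "swh": 49.2, "lug": 40.3}

# Languages where the LLM lags the baseline, and their share of the total.
lagging = [lang for lang in gpt_scores
           if gpt_scores[lang] < baseline_scores[lang]]
share = 100 * len(lagging) / len(gpt_scores)

print(sorted(lagging), share)  # -> ['lug', 'swh'] 50.0
```

Run over 204 languages, the same comparison yields the paper's 84.1% figure; here the toy data makes the lower-resourced languages the lagging ones.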
Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods
We perform statistical analysis of the phenomenon of neology, the process by which new words emerge in a language, using large diachronic corpora of English. We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm. We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis. Besides presenting a new linguistic application of distributional semantics, this study tackles the linguistic question of the role of language-internal factors (in our case, sparsity) in language change motivated by language-external factors (reflected in frequency growth).
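One of the two factors, semantic sparsity, can be operationalized as the average distance from a word's vector to its nearest neighbors. A toy sketch under that assumption, with made-up 3-dimensional vectors rather than embeddings from the paper's diachronic corpora:

```python
import math

# Illustrative word vectors (not from the paper's corpora).
vecs = {
    "blog":    [0.90, 0.10, 0.20],
    "journal": [0.80, 0.20, 0.30],
    "diary":   [0.85, 0.15, 0.25],
    "quark":   [0.10, 0.90, 0.40],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def sparsity(word, k=2):
    """Mean cosine distance to the k nearest neighbours of `word`:
    higher = sparser semantic neighbourhood."""
    dists = sorted(1 - cosine(vecs[word], v)
                   for w, v in vecs.items() if w != word)
    return sum(dists[:k]) / k

# A word in a dense neighbourhood scores lower than an isolated one.
print(sparsity("blog") < sparsity("quark"))  # -> True
```

The second factor, neighbor frequency growth, would replace the distance term with the frequency growth rate of each neighbor over time, averaged the same way.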
Interarm differences in systolic blood pressure and mortality among US army veterans: aetiological associations and risk prediction in the Vietnam experience study
Background: Differences between the arms in systolic blood pressure (SBP) of ≥10 mmHg have been associated with an increased risk of mortality in patients with hypertensive and chronic renal disease. For the first time, we examined these relationships in a non-clinical population.
Design: Cohort study.
Methods: Participants were 4419 men (mean age 38.37 years) from the Vietnam Experience Study. Bilateral SBP and diastolic BP (DBP), serum lipids, fasting glucose, erythrocyte sedimentation rate, metabolic syndrome, and ankle brachial index were assessed in 1986.
Results: Ten per cent of men had an interarm difference of ≥10 mmHg and 2.4% of ≥15 mmHg. A 15-year follow-up period gave rise to 246 deaths (64 from cardiovascular disease, CVD). Interarm differences of ≥10 mmHg were associated with an elevated risk of all-cause mortality (hazard ratio, HR, 1.49, 95% confidence interval, CI, 1.04–2.14) and CVD mortality (HR 1.93, 95% CI 1.01–3.69). After adjusting for SBP, DBP, lipids, fasting glucose, and erythrocyte sedimentation rate, associations between interarm differences of ≥10 mmHg and all-cause mortality (HR 1.35, 95% CI 0.94–1.95) and CVD mortality (1.62, 95% CI 0.84–3.14) were significantly attenuated.
Conclusions: In this non-clinical cohort study, interarm differences in SBP were not associated with mortality after accounting for traditional CVD risk factors. Interarm differences might not be valuable as an additional risk factor for mortality in populations with a low risk of CVD.
Towards Zero-shot Learning for Automatic Phonemic Transcription
Automatic phonemic transcription tools are useful for low-resource language
documentation. However, due to the lack of training sets, only a tiny fraction
of languages have phonemic transcription tools. Fortunately, multilingual
acoustic modeling provides a solution given limited audio training data. A more
challenging problem is to build phonemic transcribers for languages with zero
training data. The difficulty of this task is that phoneme inventories often
differ between the training languages and the target language, making it
infeasible to recognize unseen phonemes. In this work, we address this problem
by adopting the idea of zero-shot learning. Our model is able to recognize
unseen phonemes in the target language without any training data. In our model,
we decompose phonemes into corresponding articulatory attributes such as vowel
and consonant. Instead of predicting phonemes directly, we first predict
distributions over articulatory attributes, and then compute phoneme
distributions with a customized acoustic model. We evaluate our model by
training it using 13 languages and testing it using 7 unseen languages. We find
that it achieves a 7.7% better phoneme error rate on average than a standard
multilingual model.
Comment: AAAI 202
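The decomposition step the abstract describes, scoring phonemes via distributions over articulatory attributes, can be sketched as follows. The attribute inventory and binary phoneme signatures here are illustrative placeholders, not the paper's customized acoustic model.

```python
# Attributes (hypothetical inventory): vowel, consonant, voiced, nasal.
# Phoneme -> binary attribute signature (illustrative, not the paper's).
SIGNATURES = {
    "a": [1, 0, 1, 0],  # voiced vowel
    "m": [0, 1, 1, 1],  # voiced nasal consonant
    "t": [0, 1, 0, 0],  # voiceless oral consonant
}

def phoneme_scores(attr_probs):
    """Score each phoneme by how well predicted attribute probabilities
    match its signature (product of per-attribute matched probabilities)."""
    scores = {}
    for phoneme, sig in SIGNATURES.items():
        p = 1.0
        for prob, bit in zip(attr_probs, sig):
            p *= prob if bit else (1 - prob)
        scores[phoneme] = p
    return scores

# A frame whose attribute predictor strongly suggests a voiced nasal consonant:
scores = phoneme_scores([0.05, 0.90, 0.85, 0.80])
print(max(scores, key=scores.get))  # -> m
```

Because the scoring only needs a signature, an unseen phoneme in the target language can be recognized simply by adding its attribute signature, with no target-language audio, which is the core of the zero-shot setup.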