Learning the Ordering of Coordinate Compounds and Elaborate Expressions in Hmong, Lahu, and Chinese
Coordinate compounds (CCs) and elaborate expressions (EEs) are coordinate
constructions common in languages of East and Southeast Asia. Mortensen (2006)
claims that (1) the linear ordering of EEs and CCs in Hmong, Lahu, and Chinese
can be predicted via phonological hierarchies and (2) these phonological
hierarchies lack a clear phonetic rationale. These claims are significant
because morphosyntax has often been seen as in a feed-forward relationship with
phonology, and phonological generalizations have often been assumed to be
phonetically "natural". We investigate whether the ordering of CCs and EEs can
be learned empirically and whether computational models (classifiers and
sequence labeling models) learn unnatural hierarchies similar to those posited
by Mortensen (2006). We find that decision trees and SVMs learn to predict the
order of CCs/EEs on the basis of phonology, with DTs learning hierarchies
strikingly similar to those proposed by Mortensen. However, we also find that a
neural sequence labeling model is able to learn the ordering of elaborate
expressions in Hmong very effectively without using any phonological
information. We argue that EE ordering can be learned through two independent
routes: phonology and lexical distribution, presenting a more nuanced picture
than previous work. [ISO 639-3:hmn, lhu, cmn]
Comment: To be published in NAACL202
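The hierarchy-based ordering the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's model: the tone ranking and example syllables below are hypothetical, standing in for whatever phonological hierarchy a decision tree might recover.

```python
# Hypothetical phonological hierarchy over tone categories:
# lower rank = surfaces earlier in the coordinate compound.
TONE_RANK = {"high": 0, "mid": 1, "low": 2}

def predict_order(syll_a, syll_b):
    """Order two (form, tone) pairs by tone rank; keep input order on ties."""
    if TONE_RANK[syll_a[1]] <= TONE_RANK[syll_b[1]]:
        return (syll_a, syll_b)
    return (syll_b, syll_a)

print(predict_order(("ba", "low"), ("ka", "high")))
# -> (('ka', 'high'), ('ba', 'low'))
```

A classifier trained on phonological features would, in effect, be learning a ranking function of this shape; the paper's point is that a sequence labeler can reach similar accuracy without any such features.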
Lexical prefixes and Tibeto-Burman laryngeal contrasts
Proceedings of the 37th Annual Meeting of the Berkeley Linguistics Society (2013), pp. 272-28
ChatGPT MT: Competitive for High- (but not Low-) Resource Languages
Large language models (LLMs) implicitly learn to perform a range of language
tasks, including machine translation (MT). Previous studies explore aspects of
LLMs' MT capabilities. However, there exists a wide variety of languages for
which recent LLM MT performance has never before been evaluated. Without
published experimental evidence on the matter, it is difficult for speakers of
the world's diverse languages to know how and whether they can use LLMs for
their languages. We present the first experimental evidence for an expansive
set of 204 languages, along with MT cost analysis, using the FLORES-200
benchmark. Trends reveal that GPT models approach or exceed traditional MT
model performance for some high-resource languages (HRLs) but consistently lag
for low-resource languages (LRLs), under-performing traditional MT for 84.1% of
the languages we covered. Our analysis reveals that a language's resource level
is the most important feature in determining ChatGPT's relative ability to
translate it and suggests that ChatGPT is especially disadvantaged for LRLs
and African languages.
Comment: 27 pages, 9 figures, 14 tables
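The headline comparison in this abstract is a per-language tally against a traditional MT baseline. A sketch of that tally, with illustrative (invented) chrF-style scores rather than the paper's actual numbers:

```python
# Hypothetical per-language scores for an LLM and a traditional MT baseline
# (e.g. on FLORES-200); the values below are illustrative only.
gpt_scores      = {"fra": 62.0, "deu": 64.5, "swh": 38.0, "lug": 21.0}
baseline_scores = {"fra": 60.1, "deu": 63.0, "swh": 49.2, "lug": 40.3}

# Languages where the LLM lags the baseline, and their share of the total.
lagging = [lang for lang in gpt_scores
           if gpt_scores[lang] < baseline_scores[lang]]
share = 100 * len(lagging) / len(gpt_scores)

print(sorted(lagging), share)  # -> ['lug', 'swh'] 50.0
```

Run over 204 languages, the same comparison yields the paper's 84.1% figure; here the toy data makes the lower-resourced languages the lagging ones.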
Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods
We perform statistical analysis of the phenomenon of neology, the process by which new words emerge in a language, using large diachronic corpora of English. We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm. We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis. Besides presenting a new linguistic application of distributional semantics, this study tackles the linguistic question of the role of language-internal factors (in our case, sparsity) in language change motivated by language-external factors (reflected in frequency growth).
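One of the two factors, semantic sparsity, can be operationalized as the average distance from a word's vector to its nearest neighbors. A toy sketch under that assumption, with made-up 3-dimensional vectors rather than embeddings from the paper's diachronic corpora:

```python
import math

# Illustrative word vectors (not from the paper's corpora).
vecs = {
    "blog":    [0.90, 0.10, 0.20],
    "journal": [0.80, 0.20, 0.30],
    "diary":   [0.85, 0.15, 0.25],
    "quark":   [0.10, 0.90, 0.40],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def sparsity(word, k=2):
    """Mean cosine distance to the k nearest neighbours of `word`:
    higher = sparser semantic neighbourhood."""
    dists = sorted(1 - cosine(vecs[word], v)
                   for w, v in vecs.items() if w != word)
    return sum(dists[:k]) / k

# A word in a dense neighbourhood scores lower than an isolated one.
print(sparsity("blog") < sparsity("quark"))  # -> True
```

The second factor, neighbor frequency growth, would replace the distance term with the frequency growth rate of each neighbor over time, averaged the same way.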
Interarm differences in systolic blood pressure and mortality among US army veterans: aetiological associations and risk prediction in the Vietnam experience study
Background: Differences between the arms in systolic blood pressure (SBP) of ≥10 mmHg have been associated with an increased risk of mortality in patients with hypertensive and chronic renal disease. For the first time, we examined these relationships in a non-clinical population.
Design: Cohort study.
Methods: Participants were 4419 men (mean age 38.37 years) from the Vietnam Experience Study. Bilateral SBP and diastolic BP (DBP), serum lipids, fasting glucose, erythrocyte sedimentation rate, metabolic syndrome, and ankle brachial index were assessed in 1986.
Results: Ten per cent of men had an interarm difference of ≥10 mmHg and 2.4% of ≥15 mmHg. A 15-year follow-up period gave rise to 246 deaths (64 from cardiovascular disease, CVD). Interarm differences of ≥10 mmHg were associated with an elevated risk of all-cause mortality (hazard ratio, HR, 1.49, 95% confidence interval, CI, 1.04–2.14) and CVD mortality (HR 1.93, 95% CI 1.01–3.69). After adjusting for SBP, DBP, lipids, fasting glucose, and erythrocyte sedimentation rate, associations between interarm differences of ≥10 mmHg and all-cause mortality (HR 1.35, 95% CI 0.94–1.95) and CVD mortality (1.62, 95% CI 0.84–3.14) were significantly attenuated.
Conclusions: In this non-clinical cohort study, interarm differences in SBP were not associated with mortality after accounting for traditional CVD risk factors. Interarm differences might not be valuable as an additional risk factor for mortality in populations with a low risk of CVD.
Towards Zero-shot Learning for Automatic Phonemic Transcription
Automatic phonemic transcription tools are useful for low-resource language
documentation. However, due to the lack of training sets, only a tiny fraction
of languages have phonemic transcription tools. Fortunately, multilingual
acoustic modeling provides a solution given limited audio training data. A more
challenging problem is to build phonemic transcribers for languages with zero
training data. The difficulty of this task is that phoneme inventories often
differ between the training languages and the target language, making it
infeasible to recognize unseen phonemes. In this work, we address this problem
by adopting the idea of zero-shot learning. Our model is able to recognize
unseen phonemes in the target language without any training data. In our model,
we decompose phonemes into corresponding articulatory attributes such as vowel
and consonant. Instead of predicting phonemes directly, we first predict
distributions over articulatory attributes, and then compute phoneme
distributions with a customized acoustic model. We evaluate our model by
training it using 13 languages and testing it using 7 unseen languages. We find
that it achieves a 7.7% better phoneme error rate on average than a standard
multilingual model.
Comment: AAAI 202
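The decomposition step the abstract describes, scoring phonemes via distributions over articulatory attributes, can be sketched as follows. The attribute inventory and binary phoneme signatures here are illustrative placeholders, not the paper's customized acoustic model.

```python
# Attributes (hypothetical inventory): vowel, consonant, voiced, nasal.
# Phoneme -> binary attribute signature (illustrative, not the paper's).
SIGNATURES = {
    "a": [1, 0, 1, 0],  # voiced vowel
    "m": [0, 1, 1, 1],  # voiced nasal consonant
    "t": [0, 1, 0, 0],  # voiceless oral consonant
}

def phoneme_scores(attr_probs):
    """Score each phoneme by how well predicted attribute probabilities
    match its signature (product of per-attribute matched probabilities)."""
    scores = {}
    for phoneme, sig in SIGNATURES.items():
        p = 1.0
        for prob, bit in zip(attr_probs, sig):
            p *= prob if bit else (1 - prob)
        scores[phoneme] = p
    return scores

# A frame whose attribute predictor strongly suggests a voiced nasal consonant:
scores = phoneme_scores([0.05, 0.90, 0.85, 0.80])
print(max(scores, key=scores.get))  # -> m
```

Because the scoring only needs a signature, an unseen phoneme in the target language can be recognized simply by adding its attribute signature, with no target-language audio, which is the core of the zero-shot setup.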