246 research outputs found

    Tools For Assessing Relatedness In Understudied Language Varieties: A Survey Of Mixtec Varieties In Western Oaxaca, Mexico

    This thesis presents findings of research conducted on the relatedness of seven Mixtec varieties spoken in indigenous language communities in western Oaxaca, Mexico. Mixtec varieties vary widely from one community to the next, and it is necessary to determine the relatedness of Mixtec varieties in order to best serve the language development needs of communities. Understanding the relatedness of these varieties is also an important step in measuring their intelligibility. I used three research tools to gather data: a General Wordlist, a Tone Wordlist, and a Sociolinguistic Questionnaire. I present five analyses: percentage of phonologically similar forms, displaying phonological correspondences using isoglosses, two analyses of tone patterns, and reported intelligibility. Taken together, the first four analyses provide a clear picture of the linguistic relations of the Mixtec varieties studied. The analyses of tone and use of isoglosses are of particular note, as they present new strategies for analyzing unstudied tonal languages and language families. Findings on linguistic relatedness are then compared to the reported intelligibility of native speakers from the Questionnaire. With minor exceptions, the proposed relatedness matches up closely with intelligibility reported by survey participants. I then clarify how preexisting linguistic designations for this region could be improved, based on my findings. The Ethnologue currently includes all seven of the language varieties surveyed under a single designation, but my findings show that it is necessary to list YUC in a separate designation from the other six communities. The Instituto Nacional de Lenguas Indígenas (INALI, National Institute of Indigenous Languages) needs to revise its current designations so that YUC is left under its current designation, the mixteco del oeste alto (High Western Mixtec), while the other six varieties surveyed should be under the mixteco del oeste (Western Mixtec) designation

    Advanced Techniques for the Decipherment of Ancient Scripts

    This contribution explores modern and traditional approaches to the decipherment of ancient writing systems. It surveys methods used by paleographers and epigraphers and state-of-the-art applications of computational linguistics, such as models based on neural networks. It frames the contextual problems scholars encounter in dealing with ancient codes, the situations and preconditions of the unknown codes, their idiosyncrasies and peculiarities, and the potential solutions afforded by both traditional and novel methods of investigation

    Computational Approaches to Historical Language Comparison

    The chapter discusses recently developed computational techniques providing concrete help in addressing various tasks in historical language comparison, focusing specifically on those tasks which are typically subsumed under the framework of the comparative method. These include the proof of relationship, cognate and correspondence detection, phonological reconstruction and sound law induction, and the reconstruction of evolutionary scenarios

    Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach

    False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure for cross-lingual similarity between words based on using the Web as a corpus through analyzing the words’ local contexts extracted from the text snippets returned by searching in Google. The statistical and semantic measures are further combined into an improved algorithm for identification of false friends that achieves results almost twice as good as those of previously known algorithms. The evaluation is performed for identifying cognates between Bulgarian and Russian but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available

    Common sense: continuing in the comparative tradition

    Newton Goes East: Natural Philosophy in the First Malay Grammar (1736) and the First Malay Bible (1733)

    George Henrik Werndly’s work in Malay grammar, literature, and Bible translation can be understood and explained in the context of late seventeenth- and early eighteenth-century natural philosophy, especially natural philosophy in the spirit of Newton. The Dutch natural philosopher Lambert ten Kate, who was deeply influenced by Isaac Newton, is one of the main channels through which the ideas of the natural philosophy tradition reached Werndly. Ten Kate had applied the methodologies of natural philosophy to linguistics in ways that inspired Werndly to follow the same approach in his grammar of Malay

    Probabilistic Models for Alignment of Etymological Data

    Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011. Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa. NEALT Proceedings Series, Vol. 11 (2011), 246-253. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16955

    Information-theoretic causal inference of lexical flow

    This volume seeks to infer large phylogenetic networks from phonetically encoded lexical data and contribute in this way to the historical study of language varieties. The technical step that enables progress in this case is the use of causal inference algorithms. Sample sets of words from language varieties are preprocessed into automatically inferred cognate sets, and then modeled as information-theoretic variables based on an intuitive measure of cognate overlap. Causal inference is then applied to these variables in order to determine the existence and direction of influence among the varieties. The directed arcs in the resulting graph structures can be interpreted as reflecting the existence and directionality of lexical flow, a unified model which subsumes inheritance and borrowing as the two main ways of transmission that shape the basic lexicon of languages. A flow-based separation criterion and domain-specific directionality detection criteria are developed to make existing causal inference algorithms more robust against imperfect cognacy data, giving rise to two new algorithms. The Phylogenetic Lexical Flow Inference (PLFI) algorithm requires lexical features of proto-languages to be reconstructed in advance, but yields fully general phylogenetic networks, whereas the more complex Contact Lexical Flow Inference (CLFI) algorithm treats proto-languages as hidden common causes, and only returns hypotheses of historical contact situations between attested languages. The algorithms are evaluated both against a large lexical database of Northern Eurasia spanning many language families, and against simulated data generated by a new model of language contact that builds on the opening and closing of directional contact channels as primary evolutionary events. The algorithms are found to infer the existence of contacts very reliably, whereas the inference of directionality remains difficult. This currently limits the new algorithms to a role as exploratory tools for quickly detecting salient patterns in large lexical datasets, but it should soon be possible for the framework to be enhanced e.g. by confidence values for each directionality decision

    Bora loans in Resígaro: Massive morphological and little lexical borrowing in a moribund Arawakan language

    This study analyzes the influence of Bora (Boran) on Resígaro (Arawakan), two languages of the Colombian-Peruvian Amazon region, using a newly discovered Resígaro wordlist from the 1930s (Manuel María de Mataró no date), another wordlist from the late 1920s (Rivet & Wavrin 1951), and another from the early 1970s (Allin 1976:382-458). It shows that despite heavy structural and morphological influence (Aikhenvald 2001:182-190) Resígaro has borrowed relatively few lexical items, around 5% in all three sources. It also shows that the borrowing of entire sets of grammatical morphemes, including classifiers, number markers, and bound grammatical roots that is observable in contemporary Resígaro (Seifart 2011) goes back to at least the early 20th century. This suggests that this remarkable case of massive morphological borrowing is not merely an effect of language decay, linked to the current language endangerment situation of Resígaro, with only two surviving speakers