254 research outputs found

    Roots of the Mongolian State: Genghis Khan’s Survival and Pragmatism as related in the Secret History of the Mongols

    The genesis of the first Mongol State (1206) was overseen and led by Genghis Khan, whose conquests remain a formidable series of historical events. The Secret History of the Mongols narrates his biography as a tale of surviving repeated threats to his life and defeating major enemies. From this history, I have extracted an existential framework to explain how he survived in a dangerous natural, social and political environment. The rise of this State compressed what occurred in most other historical States, and I will summarize my Anthrocentric Security Theory as a general explanation of this phenomenon, drawing on Western philosophy, especially philosophical anthropology. The framework consists of four levels of Being: state of nature, life-community, State, and civil society. Each level has enabled humans to devise several Security Action Platforms from which particular security actions are launched, culminating in the State. Successful in three stages, but not in creating a civil society, the Mongol State assimilated and absorbed the strengths of natural men and life-communities, enabling the expansion into a Eurasian empire under his sons and grandsons.

    Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines

    Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.

    Statistical Parsing by Machine Learning from a Classical Arabic Treebank

    Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computational linguistics, relative to its worldwide reach as the language of the Quran. The thesis is based on seven publications that make significant contributions to knowledge relating to annotating and parsing Classical Arabic. Classical Arabic has been studied in depth by grammarians for over a thousand years using a traditional grammar known as i’rāb (إعراب). Using this grammar to develop a representation for parsing is challenging, as it describes syntax using a hybrid of phrase-structure and dependency relations. This work aims to advance the state-of-the-art for hybrid parsing by introducing a formal representation for annotation and a resource for machine learning. The main contributions are the first treebank for Classical Arabic and the first statistical dependency-based parser in any language for ellipsis, dropped pronouns and hybrid representations. A central argument of this thesis is that using a hybrid representation closely aligned to traditional grammar leads to improved parsing for Arabic. To test this hypothesis, two approaches are compared. As a reference, a pure dependency parser is adapted using graph transformations, resulting in an 87.47% F1-score. This is compared to an integrated parsing model with an F1-score of 89.03%, demonstrating that joint dependency-constituency parsing is better suited to Classical Arabic. The Quran was chosen for annotation as a large body of work exists providing detailed syntactic analysis. Volunteer crowdsourcing is used for annotation in combination with expert supervision.
    A practical result of the annotation effort is the corpus website: http://corpus.quran.com, an educational resource with over two million users per year.

    Universal Discourse Representation Structure Parsing

    We consider the task of crosslingual semantic parsing in the style of Discourse Representation Theory (DRT), where knowledge from annotated corpora in a resource-rich language is transferred via bitext to guide learning in other languages. We introduce Universal Discourse Representation Theory (UDRT), a variant of DRT that explicitly anchors semantic representations to tokens in the linguistic input. We develop a semantic parsing framework based on the Transformer architecture and utilize it to obtain semantic resources in multiple languages following two learning schemes. The many-to-one approach translates non-English text to English, and then runs a relatively accurate English parser on the translated text, while the one-to-many approach translates gold standard English to non-English text and trains multiple parsers (one per language) on the translations. Experimental results on the Parallel Meaning Bank show that our proposal outperforms strong baselines by a wide margin and can be used to construct (silver-standard) meaning banks for 99 languages.

    Quantitative computational syntax: some initial results

    In the computational study of human intelligence, the language sciences are in the unique position of resting both on sophisticated theories and representations and on large amounts of observational data available for many languages. In this paper, we discuss some recent results, where large-scale, data-intensive computational modelling techniques are used to address fundamental linguistic questions on the quantitative properties of abstract grammatical representations. Specifically, we present a programme of research exemplified in three case studies to identify the causes of frequency differentials. In the area of word order, we discuss work that investigates whether typological and corpus frequencies are systematically correlated to abstract syntactic structures and to higher-level structural principles of minimisation and efficiency. In the area of verb meaning, corpus-based computational models are discussed that investigate how frequencies are correlated to well-known lexical effects in causative alternations and morphological marking. The large corpus-based, cross-linguistic component of the work and the abstract grammatical hypotheses on word order and verb meaning provide new empirical and computational evidence for the important debate on language variation, its extent and its limits, and illustrate how to bring corpus-based computational methodology to bear on theoretical syntactic issues. In so doing, we help reduce the current gap between theoretical and computational linguistics.

    The optimality of word lengths. Theoretical foundations and an empirical study

    Zipf's law of abbreviation, namely the tendency of more frequent words to be shorter, has been viewed as a manifestation of compression, i.e. the minimization of the length of forms, a universal principle of natural communication. Although the claim that languages are optimized has become trendy, attempts to measure the degree of optimization of languages have been rather scarce. Here we present two optimality scores that are dually normalized, namely, they are normalized with respect to both the minimum and the random baseline. We analyze the theoretical and statistical pros and cons of these and other scores. Harnessing the best score, we quantify for the first time the degree of optimality of word lengths in languages. This indicates that languages are optimized to 62 or 67 percent on average (depending on the source) when word lengths are measured in characters, and to 65 percent on average when word lengths are measured in time. In general, spoken word durations are more optimized than written word lengths in characters. Our work paves the way to measure the degree of optimality of the vocalizations or gestures of other species, and to compare them against written, spoken, or signed human languages.

    Alignment and Adjacency in Optimality Theory: evidence from Warlpiri and Arrernte

    The goal of this thesis is to explore alignment and adjacency of constituents in the framework of Optimality Theory. Under the notion of alignment, certain categories, prosodic and morphological, are required to correspond to certain other categories, prosodic or morphological. The alignment of categories is achieved through the operation of constraints which evaluate the well-formedness of outputs. The constraints on the alignment of categories and the ranking of these constraints are examined with emphasis on two Australian languages, Warlpiri and Arrernte. The aim is to provide an adequate account in the theory of Optimality of the processes of stress, reduplication and vowel harmony evident in the data. The thesis expands on the range of edges for the alignment of feet. Foot alignment is developed to account for the fact that the edges of intonational phrases, morphemes, and specific morphemes, as well as phonologically specific syllables, play an active role in determining the location of feet. An additional finding is that the location of feet can also be determined by adjacency, resolving conflict between morphological alignment, and ensuring rhythmic harmony. Requirements on adjacency are further supported to account for segmental harmony, where harmony provides evidence for the simultaneous action of segmental and prosodic processes. The analysis provides a unified account of binary and ternary rhythm, recommending modifications to alignment of certain categories, thereby laying the groundwork to deal with variation. The account of variation involves relaxing certain constraints. In addition, the notion of rhythm is expanded to account for onset sensitivity to stress, with evidence of this sensitivity found in reduplication and allomorphy.
    The interaction of prosodic categories with each other and with morphological categories can be directly captured in OT, providing a unified and coherent account of phenomena, some of which were previously seen as exceptions and therefore unrelated and arbitrary.