
    Morphologic, Syntactic, and Phonologic Distance Between Japanese and Altaic, Dravidian, Austronesian, and Korean Languages

    The present study measures the resemblance of Japanese to the Altaic languages (Turkic, Tungusic, Mongolic, Nivkh), the Dravidian language Tamil, the Austronesian languages (Western Malayo-Polynesian, Malayo-Sumbawan, Central Luzon, Central Malayo-Polynesian), and Korean, in an effort to pin down the genealogy of Japanese. Morphologic, syntactic, and phonologic distances are calculated from corpus data, and the chi-square homogeneity test and Euclidean distances are used for statistical analysis. Morphologically, judging by preferences in causative/inchoative verb alternation patterns and in the morphemes that mark the alternation, Japanese and Korean are close for the most part. Syntactically, the Altaic languages and Tamil mark case with suffixes, whereas the Austronesian languages mark case with prefixes; Japanese and Korean are alike in marking case with particles. Phonologically, Tamil and the Austronesian languages resemble each other in vowel-height harmony; Korean, the Altaic languages, and the Austronesian languages show similarities in vowel-backness harmony; and Japanese, the Altaic languages, and the Austronesian language Madurese display vowel-consonant harmony. Pulling these strands together, the study concludes that Japanese is most closely related to Korean.
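    The two statistics named above can be sketched as follows. This is a minimal illustration, assuming each language is represented by counts over the same feature categories (for example, how often each verb-alternation pattern occurs); the counts below are hypothetical and are not the study's data, and whether the study computes distances on raw counts or on proportions is an assumption here. The chi-square statistic tests whether the languages share one distribution over the categories; the Euclidean distance gives a single dissimilarity score between two profiles.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic for a contingency table.

    Rows are languages, columns are feature categories. Categories whose
    column total is zero must be dropped first (expected count would be 0).
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that all rows share one distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


def proportions(counts):
    """Normalize raw counts so profiles from corpora of different sizes are comparable."""
    total = sum(counts)
    return [c / total for c in counts]


def euclidean_distance(p, q):
    """Straight-line distance between two feature profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical counts for two languages over three alternation patterns.
lang_a = [30, 50, 20]
lang_b = [25, 55, 20]

stat, dof = chi_square_homogeneity([lang_a, lang_b])
dist = euclidean_distance(proportions(lang_a), proportions(lang_b))
print(f"chi2={stat:.3f}, dof={dof}, distance={dist:.4f}")
```

    A small statistic relative to the chi-square distribution with `dof` degrees of freedom (for instance via `scipy.stats.chi2.sf(stat, dof)`) means the hypothesis of a shared distribution is not rejected; a small Euclidean distance likewise signals similar profiles.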

    Myanmar named entity corpus and its use in syllable-based neural named entity recognition

    Myanmar is a low-resource language, which is one of the main reasons why natural language processing (NLP) for Myanmar has lagged behind that for other languages. Currently, there is no publicly available named entity corpus for Myanmar. As part of this work, the first manually annotated named entity corpus for Myanmar was developed and proposed to support the evaluation of named entity extraction. At present, the corpus contains approximately 170,000 named entities and 60,000 sentences. This work also contributes the first evaluation of various deep neural network architectures for Myanmar named entity recognition (NER). Results of 10-fold cross-validation show that syllable-based neural sequence models without additional feature engineering can outperform a baseline conditional random field (CRF) model. The work also examines the effectiveness of neural network approaches to text processing for Myanmar and aims to promote future research on this understudied language.
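    The 10-fold cross-validation mentioned above can be sketched generically as below. This is not the authors' code: the abstract does not say how folds were built, so the shuffled, sentence-level split and the `seed` parameter are assumptions. Each sentence lands in exactly one test fold, and a model is trained on the other nine folds each time.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Items (here: sentence indices) are shuffled once with a fixed seed,
    then dealt round-robin into k folds, so every item is tested exactly once.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# With roughly 60,000 sentences, each of the 10 test folds holds 6,000 sentences.
sizes = [len(test) for _, test in k_fold_splits(60_000)]
print(sizes)
```

    In use, one would train the tagger (CRF baseline or neural model) on `train`, score it on `test`, and report the mean of the ten fold scores.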

    Abstract Syntax as Interlingua: Scaling Up the Grammatical Framework from Controlled Languages to Robust Pipelines

    Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, an area where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view of many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches with methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.
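    The core idea of abstract syntax as an interlingua can be illustrated with a toy sketch. Real GF grammars are written in GF's own grammar language, not in Python, and nothing below is GF's API; the mini-grammar (`Pred`, the kind and quality names, the English and German lexicons) is hypothetical. One language-independent tree is linearized by a separate concrete syntax per language, and translation is parsing into the tree with one concrete syntax and linearizing with another. Parsing here is brute force over all trees, which only works because the toy grammar is finite.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List

# Abstract syntax: language-independent constructors (hypothetical toy grammar).
KINDS = ("Pizza", "Wine")
QUALITIES = ("Delicious", "Expensive")


@dataclass(frozen=True)
class Pred:
    """Abstract tree for 'this <kind> is <quality>'."""
    kind: str
    quality: str


# Concrete syntax 1: English.
EN_KIND = {"Pizza": "pizza", "Wine": "wine"}
EN_QUAL = {"Delicious": "delicious", "Expensive": "expensive"}


def lin_eng(t: Pred) -> str:
    return f"this {EN_KIND[t.kind]} is {EN_QUAL[t.quality]}"


# Concrete syntax 2: German, where the determiner agrees with the noun's gender.
DE_KIND = {"Pizza": ("Pizza", "f"), "Wine": ("Wein", "m")}
DE_THIS = {"f": "diese", "m": "dieser"}
DE_QUAL = {"Delicious": "lecker", "Expensive": "teuer"}


def lin_ger(t: Pred) -> str:
    noun, gender = DE_KIND[t.kind]
    return f"{DE_THIS[gender]} {noun} ist {DE_QUAL[t.quality]}"


def all_trees() -> List[Pred]:
    """Enumerate every abstract tree the toy grammar allows."""
    return [Pred(k, q) for k, q in product(KINDS, QUALITIES)]


def parse(sentence: str, linearize: Callable[[Pred], str]) -> List[Pred]:
    """Brute-force parsing: keep the trees whose linearization matches."""
    return [t for t in all_trees() if linearize(t) == sentence]


def translate(sentence: str, source, target) -> List[str]:
    """Interlingual translation: source string -> abstract tree(s) -> target string(s)."""
    return [target(t) for t in parse(sentence, source)]


print(translate("this wine is expensive", lin_eng, lin_ger))
print(translate("diese Pizza ist lecker", lin_ger, lin_eng))
```

    Adding a language means writing one more linearization function against the same abstract trees; the trees themselves, and every other language, stay untouched. This is the property that lets a shared abstract syntax serve as a pivot between many languages.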