1,360 research outputs found

ANNOTATION MODEL FOR LOANWORDS IN INDONESIAN CORPUS: A LOCAL GRAMMAR FRAMEWORK

Author: Prihantoro Prihantoro
Publication date: 02/07/2013

There is a considerable number of loanwords in the Indonesian language, as it has been, and continues to be, in contact with other languages. The contact takes place via different media; one of them is the machine-readable medium. As information in different languages can be obtained with a mouse click these days, the contact becomes more and more intense. This paper aims at proposing an annotation model and lexical resource for loanwords in Indonesian. The lexical resource is applied to a corpus by a corpus processing software called UNITEX. This software works under the local grammar framework

Diponegoro University Institutional Repository

Review of Christine Sommerschuh, Einführung in die tibetische Schriftsprache: Lehrbuch für den Unterricht und das vertiefende Selbststudium. Nordstedt: Books on Demand GmbH, 2008.

Author: Hill Nathan W.
Publication venue: Brill
Publication date: 01/01/2010

Crossref

SOAS Research Online

SHR++: An Interface for Morpho-syntactic annotation of Sanskrit Corpora

Author: Chawla Dilpreet
Goyal Pawan
Krishna Amrith
Sambhavi Sruti
Vidhyut Shiv
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/02/2020

The IT University of Copenhagen's Repository

Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

Author: Sandhan Jivnesh
Publication date: 17/08/2023

The primary focus of this thesis is to make Sanskrit manuscripts more accessible to the end-users through natural language technologies. The morphological richness, compounding, free word orderliness, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks, which are crucial for developing a robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing task for any other downstream applications. However, it is challenging due to the sandhi phenomenon that modifies characters at word boundaries. Similarly, the existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications like question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. While addressing these challenges, this thesis makes various contributions: (1) The thesis proposes linguistically-informed neural architectures for these tasks. (2) We showcase the interpretability and multilingual extension of the proposed systems. (3) Our proposed systems report state-of-the-art performance. (4) Finally, we present a neural toolkit named SanskritShala, a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit. Comment: Ph.D. dissertation

arXiv.org e-Print Archive

A BRIEF INTRODUCTION TO AYURVEDIC SYSTEM OF MEDICINE: PROBLEMS AND PROSPECTS OF DATABASE

Author: Unnikrishnan P.M.
Viswanathan M.V.
Publication venue: 富山医科薬科大学和漢薬研究所
Publication date: 01/06/1999

Today the medical world faces complex challenges. The times thus demand an integrated and pluralistic approach towards healthcare to cope effectively with this situation. There has been a growing interest in Ayurveda in the past few years. To initiate fruitful dialogues between Ayurveda and modern science, an in-depth understanding of both systems becomes an essential prerequisite. Such an exercise should emerge from a standpoint accepting that there are different world views existing in the world, Ayurveda being one among them. This may sound quite contrary to the common belief that science is only one, as expressed in the modern scientific paradigm. Both modern science and Ayurveda have universal attributes and share the common objective of the well-being of mankind. But they are quite different in their philosophical and epistemological foundations, conceptual framework and practical outlook. So, let us see what the fundamental differences between Sastra (Ayurveda) and modern science are

University of Toyama Repository

Aspirated and Unaspirated Voiceless Consonants in Old Tibetan

Author: Hill Nathan W.
Publication date: 01/01/2007

Although Tibetan orthography distinguishes aspirated and unaspirated voiceless consonants, various authors have viewed this distinction as not phonemic. An examination of the unaspirated voiceless initials in the Old Tibetan Inscriptions, together with unaspirated voiceless consonants in several Tibetan dialects confirms that aspiration was either not phonemic in Old Tibetan, or only just emerging as a distinction due to loan words. The data examined also affords evidence for the nature of the phonetic word in Old Tibetan

SOAS Research Online

New functions and updates of the resource DiACL - Diachronic Atlas of Comparative Linguistics

Author: Carling Gerd
Larsson Filip
Lundgren Olof
Nilsson Linus
Verhoeven Rob
Publication venue: Pavia University Press
Publication date: 01/01/2021

Lund University Publications

The complexity of phonology

Author: András Kornai
Frank Robert
Kaplan Ron
Kornai András
Moore Gordon
Nettle David
Publication date: 01/01/2009

CiteSeerX

Crossref

SZTAKI Publication Repository

The contribution of corpus linguistics to lexicography and the future of Tibetan dictionaries

Author: Garrett Edward
Hill Nathan W.
Kilgarriff Adam
Vadlapudi Ravikiran
Zadoks Abel
Publication venue: INIST-CNRS
Publication date: 01/01/2015

The first alphabetized dictionary of Tibetan appeared in 1829 (cf. Bray 2008) and the intervening 184 years have witnessed the publication of scores of other Tibetan dictionaries (cf. Simon 1964). Hundreds of Tibetan dictionaries are now available; these include bilingual dictionaries, both to and from such languages as English, French, German, Latin, Japanese, etc. and specialized dictionaries focusing on medicine, plants, dialects, archaic terms, neologisms, etc. (cf. Walter 2006, McGrath 2008). However, if one classifies Tibetan dictionaries by the methods of their compilation the accomplishments of Tibetan lexicography are less impressive. Methodologies of dictionary compilation divide heuristically into three types. First, some dictionaries lack explicit methodology; these works assemble words in an ad hoc manner and illustrate them with invented examples. Second, there are dictionaries that are compiled over very long periods of time on the basis of collections of slips recording attestations of words as used in context. Third, more recent dictionaries are compiled on the basis of electronic text corpora, which are processed computationally to aid in the precision, consistency and speed of dictionary compilation. These methods may be called respectively the 'informal method', the 'traditional method', and the 'modern method'. The overwhelming majority of Tibetan dictionaries were compiled with the informal method. Only five Tibetan dictionaries use the traditional methodology. No Tibetan dictionary yet compiled makes use of the modern method

SOAS Research Online