ANNOTATION MODEL FOR LOANWORDS IN INDONESIAN CORPUS: A LOCAL GRAMMAR FRAMEWORK
There is a considerable number of loanwords in the Indonesian language, as it has been,
and continues to be, in contact with other languages. The contact takes place via different
media, one of which is the machine-readable medium. As information in different languages
can be obtained by a mouse click these days, the contact becomes more and more intense. This
paper aims at proposing an annotation model and lexical resource for loanwords in
Indonesian. The lexical resource is applied to a corpus using corpus-processing software called
UNITEX. This software works under local grammar framewor
Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
The primary focus of this thesis is to make Sanskrit manuscripts more
accessible to the end-users through natural language technologies. The
morphological richness, compounding, free word orderliness, and low-resource
nature of Sanskrit pose significant challenges for developing deep learning
solutions. We identify four fundamental tasks, which are crucial for developing
a robust NLP technology for Sanskrit: word segmentation, dependency parsing,
compound type identification, and poetry analysis. The first task, Sanskrit
Word Segmentation (SWS), is a fundamental text processing task for any other
downstream applications. However, it is challenging due to the sandhi
phenomenon that modifies characters at word boundaries. Similarly, the existing
dependency parsing approaches struggle with morphologically rich and
low-resource languages like Sanskrit. Compound type identification is also
challenging for Sanskrit due to the context-sensitive semantic relation between
components. All these challenges result in sub-optimal performance in NLP
applications like question answering and machine translation. Finally, Sanskrit
poetry has not been extensively studied in computational linguistics.
While addressing these challenges, this thesis makes various contributions:
(1) The thesis proposes linguistically-informed neural architectures for these
tasks. (2) We showcase the interpretability and multilingual extension of the
proposed systems. (3) Our proposed systems report state-of-the-art performance.
(4) Finally, we present a neural toolkit named SanskritShala, a web-based
application that provides real-time analysis of input for various NLP tasks.
Overall, this thesis contributes to making Sanskrit manuscripts more accessible
by developing robust NLP technology and releasing various resources, datasets,
and a web-based toolkit. Comment: Ph.D. dissertatio
A BRIEF INTRODUCTION TO AYURVEDIC SYSTEM OF MEDICINE: PROBLEMS AND PROSPECTS OF DATABASE
Today the medical world faces complex challenges. The times therefore demand an integrated and pluralistic approach to healthcare to cope effectively with this situation. There has been a growing interest in Ayurveda in the past few years. To initiate fruitful dialogue between Ayurveda and modern science, an in-depth understanding of both systems becomes an essential prerequisite. Such an exercise should emerge from a standpoint that accepts that different world views exist, Ayurveda being one among them. This may sound quite contrary to the common belief that there is only one science, as expressed in the modern scientific paradigm. Both modern science and Ayurveda have universal attributes and share the common objective of the well-being of mankind. But they are quite different in their philosophical and epistemological foundations, conceptual framework and practical outlook. So, let us see what the fundamental differences are between Sastra (Ayurveda) and modern science
Aspirated and Unaspirated Voiceless Consonants in Old Tibetan
Although Tibetan orthography distinguishes aspirated and unaspirated voiceless consonants, various authors have viewed this distinction as not phonemic. An examination of the unaspirated voiceless initials in the Old Tibetan Inscriptions, together with unaspirated voiceless consonants in several Tibetan dialects, confirms that aspiration was either not phonemic in Old Tibetan, or only just emerging as a distinction due to loan words. The data examined also afford evidence for the nature of the phonetic word in Old Tibetan
The contribution of corpus linguistics to lexicography and the future of Tibetan dictionaries
The first alphabetized dictionary of Tibetan appeared in 1829 (cf. Bray 2008) and the intervening 184 years have witnessed the publication of scores of other Tibetan dictionaries (cf. Simon 1964). Hundreds of Tibetan dictionaries are now available; these include bilingual dictionaries, both to and from such languages as English, French, German, Latin, Japanese, etc. and specialized dictionaries focusing on medicine, plants, dialects, archaic terms, neologisms, etc. (cf. Walter 2006, McGrath 2008). However, if one classifies Tibetan dictionaries by the methods of their compilation, the accomplishments of Tibetan lexicography are less impressive.
Methodologies of dictionary compilation divide heuristically into three types. First, some dictionaries lack explicit methodology; these works assemble words in an ad hoc manner and illustrate them with invented examples. Second, there are dictionaries that are compiled over very long periods of time on the basis of collections of slips recording attestations of words as used in context. Third, more recent dictionaries are compiled on the basis of electronic text corpora, which are processed computationally to aid in the precision, consistency and speed of dictionary compilation. These methods may be called respectively the 'informal method', the 'traditional method', and the 'modern method'. The overwhelming majority of Tibetan dictionaries were compiled with the informal method. Only five Tibetan dictionaries use the traditional methodology. No Tibetan dictionary yet compiled makes use of the modern method