3 research outputs found
Enrichment of OntoSenseNet: Adding a Sense-annotated Telugu lexicon
The paper describes the enrichment of OntoSenseNet - a verb-centric lexical
resource for Indian Languages. This resource contains a newly developed
Telugu-Telugu dictionary. It is important because native speakers can better
annotate the senses when both the word and its meaning are in Telugu. Hence
efforts are made to develop a soft copy of a Telugu dictionary. Our resource also
has a manually annotated gold-standard corpus consisting of 8483 verbs, 253 adverbs
and 1673 adjectives. Annotations are done by native speakers according to
defined annotation guidelines. In this paper, we provide an overview of the
annotation procedure and present the validation of our resource through
inter-annotator agreement. Concepts of sense-class and sense-type are
discussed. Additionally, we discuss the potential of lexical sense-annotated
corpora in improving word sense disambiguation (WSD) tasks. Telugu WordNet is
crowd-sourced for annotation of individual words in synsets and is compared
with the developed sense-annotated lexicon (OntoSenseNet) to examine the
improvement. Also, we present a special categorization (spatio-temporal
classification) of adjectives.
Comment: Accepted Long Paper at 19th International Conference on Computational
Linguistics and Intelligent Text Processing, March 2018, Hanoi, Vietnam
Towards Enhancing Lexical Resource and Using Sense-annotations of OntoSenseNet for Sentiment Analysis
This paper illustrates the interface of the tool we developed for
crowdsourcing and explains the annotation procedure in detail. Our tool is named
'Parupalli Padajaalam', which means web of words by Parupalli. The aim of
this tool is to populate OntoSenseNet, a sentiment-polarity-annotated Telugu
resource. Recent works have shown the importance of word-level annotations for
sentiment analysis. With this as basis, we aim to analyze the importance of
sense-annotations obtained from OntoSenseNet in performing the task of
sentiment analysis. We explain the features extracted from OntoSenseNet
(Telugu). Furthermore, we compute and explain the adverbial class distribution
of verbs in OntoSenseNet. This task is known to aid in disambiguating
word-senses which helps in enhancing the performance of word-sense
disambiguation (WSD) task(s).
Comment: Accepted at 3rd Workshop on Semantic Deep Learning (SemDeep-3) at The
27th International Conference on Computational Linguistics, COLING (August
2018) in Santa Fe, New Mexico, US
The 6th Workshop on Asian Language Resources, 2008 ASSESSMENT AND DEVELOPMENT OF POS TAG SET FOR TELUGU
In this paper, we first made an overall study of existing POS tag sets for European and Indian languages. Until now, most of the research on POS tagging has been done for English. We observed that, even though research on POS tagging for English has been exhaustive, part-of-speech annotation is not comparable across research applications, owing to variations in tag set definitions. We understand that the morphosyntactic features of the language, the desired granularity in representing these features, the domain, etc., decide the tags in the tag set. We then examined how POS tag set design has to be handled for Indian languages, taking Telugu into consideration.