MS-TR: A Morphologically Enriched Sentiment Treebank and Recursive Deep Models for Compositional Semantics in Turkish
Recursive Deep Models have been used as powerful models to learn
compositional representations of text for many natural language processing tasks.
However, they require structured input (i.e., a sentiment treebank) that encodes sentences
according to their tree structure, enabling them to learn the latent semantics
of words using recursive composition functions. In this paper, we present our
contributions and efforts for the Turkish Sentiment Treebank construction. We
introduce MS-TR, a Morphologically Enriched Sentiment Treebank, which was
implemented for training Recursive Deep Models to address compositional sentiment
analysis for Turkish, one of the well-known morphologically rich languages (MRLs).
We propose a semi-supervised automatic annotation method, following a distant-supervision
approach, that uses morphological features of words to infer the polarity of
the inner nodes of MS-TR as positive or negative. The proposed annotation model
has four different annotation levels: morph-level, stem-level, token-level, and
review-level. Each annotation level’s contribution was tested using three different
domain datasets, including product reviews, movie reviews, and the Turkish Natural
Corpus essays. Comparative results were obtained with the Recursive Neural Tensor Networks (RNTN) model which is operated over MS-TR, and conventional machine learning methods. Experiments proved that RNTN outperformed the baseline methods and achieved much better accuracy results compared to the baseline methods, which cannot accurately capture the aggregated sentiment information
Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling
In this paper we propose and carefully evaluate a sequence labeling framework
which solely utilizes sparse indicator features derived from dense distributed
word representations. The proposed model obtains (near) state-of-the-art
performance for both part-of-speech tagging and named entity recognition for a
variety of languages. Our model relies only on a few thousand sparse
coding-derived features, without applying any modification of the word
representations employed for the different tasks. The proposed model has
favorable generalization properties, as it retains over 89.8% of its average POS
tagging accuracy when trained on just 1.2% of the total available training data,
i.e. 150 sentences per language.
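The sparse indicator features described above can be sketched as follows. The ISTA solver, the fixed random dictionary, and the `sc_i` feature names are assumptions for illustration; in the paper the dictionary is derived from the dense word embeddings themselves.

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200, step=0.05):
    """Solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA
    (a gradient step on the quadratic term, then soft thresholding)."""
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)      # gradient of the reconstruction term
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # shrink
    return a

def indicator_features(a, eps=1e-8):
    """Binary indicator features: one feature per non-zero coefficient."""
    return {f"sc_{i}" for i, v in enumerate(a) if abs(v) > eps}

# Toy usage: a fixed random dictionary stands in for one learned from embeddings.
rng = np.random.default_rng(1)
D = rng.standard_normal((8, 20))
D /= np.linalg.norm(D, axis=0)        # unit-norm dictionary atoms
dense_vec = D[:, 3]                   # a "word embedding" aligned with atom 3
features = indicator_features(sparse_code(dense_vec, D, lam=0.2))
```

Because only a handful of coefficients survive the L1 penalty, each word contributes a few discrete features to the tagger instead of a dense real-valued vector.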
Automatic PropBank generation for Turkish
Semantic role labeling (SRL) is an important task for understanding natural language, where the objective is to analyse the propositions expressed by a verb and to identify each word that bears a semantic role. It provides an extensive dataset to enhance NLP applications such as information retrieval, machine translation, information extraction, and question answering. However, creating SRL models is difficult. In some languages, it is even infeasible to create SRL models with predicate-argument structure due to a lack of linguistic resources. In this paper, we present our method for creating an automatic Turkish PropBank by exploiting parallel data from the translated sentences of the English PropBank. Experiments show that our method gives promising results. © 2019 Association for Computational Linguistics (ACL).
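The transfer step behind such an automatic PropBank can be sketched as annotation projection over word alignments; the data structures and the `project_roles` helper below are illustrative assumptions, not the paper's actual pipeline.

```python
def project_roles(src_roles, alignment):
    """Transfer predicate-argument labels from source (English) token
    positions to target (Turkish) token positions via word alignments.

    src_roles:  {source_token_index: role_label}
    alignment:  {source_token_index: [target_token_indices]}
    """
    tgt_roles = {}
    for s_idx, role in src_roles.items():
        for t_idx in alignment.get(s_idx, []):
            # first role to claim a target token wins (a simple tie-break)
            tgt_roles.setdefault(t_idx, role)
    return tgt_roles

# Toy usage: "He opened the door" -> "Kapıyı açtı"
src_roles = {0: "ARG0", 1: "PRED", 3: "ARG1"}
alignment = {1: [1], 3: [0]}          # "opened"->"açtı", "door"->"Kapıyı"
projected = project_roles(src_roles, alignment)
```

Unaligned source tokens (here the pronoun "He", dropped in Turkish) simply project nothing, which is one reason projected annotations need the kind of evaluation the abstract reports.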
- …