4,643 research outputs found
Introduction to the special issue on cross-language algorithms and applications
With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of
Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special
issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment
analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version
MS-TR: A Morphologically Enriched Sentiment Treebank and Recursive Deep Models for Compositional Semantics in Turkish
Recursive Deep Models have been used as powerful models to learn
compositional representations of text for many natural language processing tasks.
However, they require structured input (i.e. sentiment treebank) to encode sentences
based on their tree-based structure to enable them to learn latent semantics
of words using recursive composition functions. In this paper, we present our
contributions and efforts for the Turkish Sentiment Treebank construction. We
introduce MS-TR, a Morphologically Enriched Sentiment Treebank, which was
implemented for training Recursive Deep Models to address compositional sentiment
analysis for Turkish, which is one of the well-known Morphologically Rich
Language (MRL). We propose a semi-supervised automatic annotation, as a distantsupervision
approach, using morphological features of words to infer the polarity of
the inner nodes of MS-TR as positive and negative. The proposed annotation model
has four different annotation levels: morph-level, stem-level, token-level, and
review-level. Each annotation level’s contribution was tested using three different
domain datasets, including product reviews, movie reviews, and the Turkish Natural
Corpus essays. Comparative results were obtained with the Recursive Neural Tensor Networks (RNTN) model which is operated over MS-TR, and conventional machine learning methods. Experiments proved that RNTN outperformed the baseline methods and achieved much better accuracy results compared to the baseline methods, which cannot accurately capture the aggregated sentiment information
DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging
Tagging news articles or blog posts with relevant tags from a collection of
predefined ones is coined as document tagging in this work. Accurate tagging of
articles can benefit several downstream applications such as recommendation and
search. In this work, we propose a novel yet simple approach called DocTag2Vec
to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two
popular models for learning distributed representation of words and documents.
In DocTag2Vec, we simultaneously learn the representation of words, documents,
and tags in a joint vector space during training, and employ the simple
-nearest neighbor search to predict tags for unseen documents. In contrast
to previous multi-label learning methods, DocTag2Vec directly deals with raw
text instead of provided feature vector, and in addition, enjoys advantages
like the learning of tag representation, and the ability of handling newly
created tags. To demonstrate the effectiveness of our approach, we conduct
experiments on several datasets and show promising results against
state-of-the-art methods.Comment: 10 page
- …