TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification
This paper describes the participation of the team "TwiSE" in the SemEval
2016 challenge. Specifically, we participated in Task 4, namely "Sentiment
Analysis in Twitter" for which we implemented sentiment classification systems
for subtasks A, B, C and D. Our approach consists of two steps. In the first
step, we generate and validate diverse feature sets for Twitter sentiment
evaluation, inspired by the work of participants of previous editions of such
challenges. In the second step, we focus on the optimization of the evaluation
measures of the different subtasks. To this end, we examine different learning
strategies by validating them on the data provided by the task organisers. For
our final submissions we used an ensemble learning approach (stacked
generalization) for Subtask A and single linear models for the rest of the
subtasks. In the official leaderboard we were ranked 9/35, 8/19, 1/11 and 2/14
for subtasks A, B, C and D, respectively. We make the code available
for research purposes at
https://github.com/balikasg/SemEval2016-Twitter_Sentiment_Evaluation.
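The stacked generalization mentioned in the abstract above can be illustrated with a small, self-contained sketch. It is not the TwiSE system: the base learners (one-feature threshold classifiers), the lookup-table meta-learner, the two-fold split, and the toy data are all assumptions chosen only to keep the example short. The idea it shows is that the meta-learner is trained on out-of-fold predictions of the base learners, so it never sees predictions made on data a base learner was fitted on.

```python
from collections import Counter, defaultdict


def fit_stump(X, y, feature):
    """Base learner (hypothetical): predict 1 when x[feature] > threshold."""
    best = None
    for t in sorted({x[feature] for x in X}):
        acc = sum((x[feature] > t) == bool(label) for x, label in zip(X, y))
        if best is None or acc > best[0]:
            best = (acc, t)
    threshold = best[1]
    return lambda x: int(x[feature] > threshold)


def fit_table(Z, y):
    """Meta-learner (hypothetical): majority label per tuple of base predictions."""
    votes = defaultdict(Counter)
    for z, label in zip(Z, y):
        votes[tuple(z)][label] += 1
    default = Counter(y).most_common(1)[0][0]
    table = {z: c.most_common(1)[0][0] for z, c in votes.items()}
    return lambda z: table.get(tuple(z), default)


def fit_stacked(X, y, features=(0, 1), n_folds=2):
    """Stacked generalization: meta-learner fitted on out-of-fold base predictions."""
    Z = [None] * len(X)
    for k in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != k]
        held = [i for i in range(len(X)) if i % n_folds == k]
        models = [fit_stump([X[i] for i in train], [y[i] for i in train], f)
                  for f in features]
        for i in held:
            Z[i] = [m(X[i]) for m in models]
    meta = fit_table(Z, y)
    # Refit the base learners on all data for use at prediction time.
    base = [fit_stump(X, y, f) for f in features]
    return lambda x: meta([m(x) for m in base])


# Toy two-feature data; label 1 corresponds to large feature values.
X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9),
     (0.15, 0.3), (0.3, 0.25), (0.85, 0.7), (0.7, 0.95)]
y = [0, 0, 1, 1, 0, 0, 1, 1]
model = fit_stacked(X, y)
print(model((0.05, 0.05)), model((0.95, 0.9)))
```

In practice the base learners would be full classifiers over rich feature sets and the meta-learner a regularized linear model; the fold logic stays the same.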
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
WordRank: Learning Word Embeddings via Robust Ranking
Embedding words in a vector space has gained a lot of attention in recent
years. While state-of-the-art methods provide efficient computation of word
similarities via a low-dimensional matrix embedding, their motivation is often
left unclear. In this paper, we argue that word embedding can be naturally
viewed as a ranking problem due to the ranking nature of the evaluation
metrics. Based on this insight, we propose a novel framework, WordRank,
that efficiently estimates word representations via robust ranking, in which
the attention mechanism and robustness to noise are readily achieved via the
DCG-like ranking losses. The performance of WordRank is measured in word
similarity and word analogy benchmarks, and the results are compared to the
state-of-the-art word embedding techniques. Our algorithm is highly competitive
with the state of the art on large corpora, and it outperforms these methods by a
significant margin when the training set is limited (i.e., sparse and noisy).
With 17 million tokens, WordRank performs almost as well as existing methods
using 7.2 billion tokens on a popular word similarity benchmark. Our multi-node
distributed implementation of WordRank is publicly available for general usage.
Comment: Conference on Empirical Methods in Natural Language Processing
(EMNLP), November 1-5, 2016, Austin, Texas, US
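To make the "DCG-like ranking loss" in the WordRank abstract concrete, here is a minimal sketch. It assumes a logarithmic discount, log2(1 + rank), applied to the rank of a target context among candidate contexts. The function names, the scores, and the exact form of the loss are illustrative assumptions, not the objective defined in the paper.

```python
import math


def rank_of(target, scores):
    """Count the other candidates that score at least as high as the target.

    A rank of 0 means the target is the top-scoring candidate.
    """
    return sum(1 for c, s in scores.items()
               if c != target and s >= scores[target])


def dcg_like_loss(target, scores):
    """Concave, DCG-style penalty on the rank of the target (assumed form)."""
    return math.log2(1 + rank_of(target, scores))


# Hypothetical similarity scores between one word and three candidate contexts.
scores = {"king": 2.0, "queen": 1.5, "apple": -0.3}
for context in scores:
    print(context, dcg_like_loss(context, scores))
```

Because the logarithm is concave, moving a context from rank 0 to rank 1 costs more than moving it from rank 1 to rank 2. Errors near the top of the list therefore dominate the loss, while badly ranked (possibly noisy) pairs contribute only slowly growing penalties.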