4,674 research outputs found

    TwiSE at SemEval-2016 Task 4: Twitter Sentiment Classification

    Full text link
    This paper describes the participation of the team "TwiSE" in the SemEval 2016 challenge. Specifically, we participated in Task 4, namely "Sentiment Analysis in Twitter" for which we implemented sentiment classification systems for subtasks A, B, C and D. Our approach consists of two steps. In the first step, we generate and validate diverse feature sets for twitter sentiment evaluation, inspired by the work of participants of previous editions of such challenges. In the second step, we focus on the optimization of the evaluation measures of the different subtasks. To this end, we examine different learning strategies by validating them on the data provided by the task organisers. For our final submissions we used an ensemble learning approach (stacked generalization) for Subtask A and single linear models for the rest of the subtasks. In the official leaderboard we were ranked 9/35, 8/19, 1/11 and 2/14 for subtasks A, B, C and D respectively.\footnote{We make the code available for research purposes at \url{https://github.com/balikasg/SemEval2016-Twitter\_Sentiment\_Evaluation}.

    Fidelity-Weighted Learning

    Full text link
    Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations.Comment: Published as a conference paper at ICLR 201

    On using Twitter to monitor political sentiment and predict election results

    Get PDF
    The body of content available on Twitter undoubtedly contains a diverse range of political insight and commentary. But, to what extent is this representative of an electorate? Can we model political sentiment effectively enough to capture the voting intentions of a nation during an election capaign? We use the recent Irish General Election as a case study for investigating the potential to model political sentiment through mining of social media. Our approach combines sentiment analysis using supervised learning and volume-based measures. We evaluate against the conventional election polls and the final election result. We find that social analytics using both volume-based measures and sentiment analysis are predictive and wemake a number of observations related to the task of monitoring public sentiment during an election campaign, including examining a variety of sample sizes, time periods as well as methods for qualitatively exploring the underlying content

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Get PDF
    • 

    corecore