75 research outputs found

    Multilingual sentiment analysis in social media.

    252 p. This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language, a tool providing only analysis for Basque would not be enough for a real-world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish. The thesis addresses the following challenges to build such a system:
    - Analysing methods for creating sentiment lexicons suitable for less-resourced languages.
    - Analysis of social media (specifically Twitter): tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.
    - Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well-known social media benchmarks.
    - Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations.


    Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data

    In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.

    Explainable Argument Mining


    Towards Robust Word Embeddings for Noisy Texts

    [Abstract] Research on word embeddings has mainly focused on improving their performance on standard corpora, disregarding the difficulties posed by noisy texts in the form of tweets and other types of non-standard writing from social media. In this work, we propose a simple extension to the skipgram model in which we introduce the concept of bridge-words, which are artificial words added to the model to strengthen the similarity between standard words and their noisy variants. Our new embeddings outperform baseline models on noisy texts on a wide range of evaluation tasks, both intrinsic and extrinsic, while retaining a good performance on standard texts. To the best of our knowledge, this is the first explicit approach to dealing with these types of noisy texts at the word embedding level that goes beyond the support for out-of-vocabulary words.
    Funding: Ministerio de Economía, Industria y Competitividad (MINECO), TIN2017-85160-C2-2-R and TIN2017-85160-C2-1-R; European Social Fund (ESF), BES-2015-073768; Xunta de Galicia, ED431D 2017/12, ED431B 2017/01, ED431C 2020/11, ED431G/0

    ADDRESSING INFORMALITY IN PROCESSING CHINESE MICROTEXT

    Ph.D. (Doctor of Philosophy)

    Beyond data collection: Objectives and methods of research using VGI and geo-social media for disaster management

    This paper investigates research using VGI and geo-social media in the disaster management context. Relying on the method of systematic mapping, it develops a classification schema that captures three levels (main category, focus, and intended use) and analyzes their relationships with the employed data sources and analysis methods. It restricts its scope to the pioneering field of disaster management, but the described approach and the developed classification schema are easily adaptable to different application domains or future developments. The results show that a hypothesized consolidation of research, characterized by the building of canonical bodies of knowledge and advanced application cases with refined methodology, has not yet happened. The majority of the studies investigate the challenges and potential solutions of data handling, with fewer studies focusing on socio-technological issues or advanced applications. This trend currently shows no sign of change, highlighting that VGI research is still very much technology-driven as opposed to theory- or application-driven. From the results of the systematic mapping study, the authors formulate and discuss several research objectives for future work, which could lead to a stronger, more theory-driven treatment of the topic of VGI in GIScience.
    Carlos Granell has been partly funded by the Ramón y Cajal Programme (grant number RYC-2014-16913).