63 research outputs found

    Determining word–emotion associations from tweets by multi-label classification

    The automatic detection of emotions in Twitter posts is a challenging task due to the informal nature of the language used on this platform. In this paper, we propose a methodology for expanding the NRC word-emotion association lexicon for the language used in Twitter. We perform this expansion using multi-label classification of words and compare different word-level features extracted from unlabelled tweets such as unigrams, Brown clusters, POS tags, and word2vec embeddings. The results show that the expanded lexicon achieves major improvements over the original lexicon when classifying tweets into emotional categories. In contrast to previous work, our methodology does not depend on tweets annotated with emotional hashtags, thus enabling the identification of emotional words from any domain-specific collection using unlabelled tweets.

    Analyzing Twitter Feeds to Facilitate Crises Informatics and Disaster Response During Mass Emergencies

    It is common practice these days for the general public to use various micro-blogging platforms, predominantly Twitter, to share ideas, opinions and information about things and life. Twitter is also increasingly used as a source of information sharing during natural disasters and mass emergencies to update and communicate the extent of the geographic phenomena, report the affected population and casualties, request or provide volunteering services, and share the status of the disaster recovery process initiated by humanitarian-aid and disaster-management organizations. Recent research in this area has affirmed the potential use of such social media data for various disaster response tasks. Even though social media data is massive, open and free, making sense of it is significantly limited by its high volume, variety, velocity, value, variability and veracity. The current work provides a comprehensive framework of text processing and analysis performed on several thousand tweets shared on Twitter during natural disaster events. Specifically, this work employs state-of-the-art machine learning techniques from natural language processing on tweet content to process the very large volume of data generated at the time of disasters. This study shall serve as a basis for providing useful, actionable information to crisis management and mitigation teams in planning and preparing an effective disaster response, and for facilitating the development of future automated systems for handling crisis situations.

    Classification of socially generated medical data

    The growth of online health communities, particularly those involving socially generated content, can provide considerable value for society. Participants can gain knowledge of medical information or interact with peers on medical forum platforms. However, the sheer volume of information so generated – and the consequent ‘noise’ associated with large data volumes – can create difficulties for information consumers. We propose a solution to this problem by applying high-level analytics to the data – primarily sentiment analysis, but also content and topic analysis – for accurate classification. We believe that such analysis can be of significant value to data users, such as identifying a particular aspect of an information space, determining themes that predominate among a large dataset, and allowing people to summarize topics within a big dataset. In this thesis, we apply machine learning strategies to identify sentiments expressed in online medical forums that discuss Lyme Disease. As part of this process, we distinguish a complete and relevant set of categories that can be used to characterize Lyme Disease discourse. We present a feature-based model that employs supervised learning algorithms and assess the feasibility and accuracy of this sentiment classification model. We further evaluate our model by assessing its ability to adapt to an online medical forum discussing a disease with similar characteristics, Lupus. The experimental results demonstrate the effectiveness of our approach. In many sentiment analysis applications, labelled training datasets are expensive to obtain, whereas unlabelled datasets are readily available. Therefore, we present an adaptation of a well-known semi-supervised learning technique, in which co-training is implemented by combining labelled and unlabelled data. Our results suggest that the model can learn even with limited labelled data.
    In addition, we investigate complementary analytic techniques – content and topic analysis – to make the best use of the data for various consumer groups. Within the work described in this thesis, some particular research issues are addressed, specifically when applied to socially generated medical/health datasets:
    • When applying binary sentiment analysis to short-form text data (e.g. Twitter), could meta-level features improve classification performance?
    • When applying more complex multi-class sentiment analysis to classification of long-form, content-rich text data, would meta-level features be a useful addition to more conventional features?
    • Can this multi-class analysis approach be generalised to other medical/health domains?
    • How would alternative classification strategies benefit different groups of information consumers?

    Finding polarised communities and tracking information diffusion on Twitter: The Irish Abortion Referendum

    The analysis of social networks enables the understanding of social interactions, polarisation of ideas, and the spread of information and therefore plays an important role in society. We use Twitter data - as it is a popular venue for the expression of opinion and dissemination of information - to identify opposing sides of a debate and, importantly, to observe how information spreads between these groups in our current polarised climate. To achieve this, we collected over 688,000 Tweets from the Irish Abortion Referendum of 2018 to build a conversation network from user mentions with sentiment-based homophily. From this network, community detection methods allow us to isolate yes- or no-aligned supporters with high accuracy (90.9%). We supplement this by tracking how information spreads via over 31,000 retweet cascades. We found that very little information spread between polarised communities. This provides a valuable methodology for extracting and studying information diffusion on large networks by isolating ideologically polarised groups and exploring the propagation of information within and between these groups. Comment: 44 pages, 4 appendices, 18 figures

    Sentiment analysis: the case of twitch chat - Mining user feedback from livestream chats

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management. In a world where users often share their thoughts and opinions through online communication channels, applications that can tap into these channels to extract consumer feedback have become increasingly valuable. Traditional marketing research techniques such as interviews or surveys offer results that pale in comparison to sentiment analysis applications that can extract organic feedback from an extremely large selection, with very few resources and in real time. This thesis focuses on proposing and developing one of these tools that targets livestreams, which have, over the years, seen a massive increase in popularity in terms of both user base and brand involvement. We chose the livestreaming platform “Twitch” as the target of research and developed a sentiment analysis model, using rule-based approaches, capable of interpreting user chat messages and identifying whether those messages are negative, positive or neutral. Additionally, an application was developed to better view and analyze the results of the model. By segmenting our results by product reveal, we also show how the application allows for the extraction of various insights about the public’s opinion of that product.