3 research outputs found

    Transfer Learning for Multi-language Twitter Election Classification

    Both politicians and citizens are increasingly embracing social media as a means to disseminate information and comment on various topics, particularly during significant political events, such as elections. Such commentary during elections is also of interest to social scientists and pollsters. To facilitate the study of social media during elections, there is a need to automatically identify posts that are topically related to those elections. However, current studies have focused on elections within English-speaking regions, and hence the resultant election content classifiers are applicable only to elections in countries where the predominant language is English. On the other hand, as social media becomes more prevalent worldwide, there is an increasing need for election classifiers that can be generalised across different languages, without building a training dataset for each election. In this paper, based upon transfer learning, we study the development of effective and reusable election classifiers for use on social media across multiple languages. We combine transfer learning with different classifiers such as Support Vector Machines (SVM) and state-of-the-art Convolutional Neural Networks (CNN), which make use of word embedding representations for each social media post. We generalise the learned classifier models for cross-language classification by using a linear translation approach to map the word embedding vectors from one language into another. Experiments conducted over two election datasets in different languages show that, without using any training data from the target language, linear translations outperform a classical transfer learning approach, namely Transfer Component Analysis (TCA), by 80% in recall and 25% in F1 measure.
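
The abstract does not say how the linear translation is fitted. One plausible reading, sketched here under that assumption, is a least-squares map `W` learned from a small seed dictionary of paired word vectors, so that `W x` approximates the target-language vector for each source-language vector `x`. The function names, the toy 2-dimensional vectors, and the gradient-descent settings are all illustrative and not taken from the paper.

```python
# Minimal sketch (assumption): learn a linear map W between two embedding
# spaces by minimising the mean squared error over paired word vectors.
# Pure Python for self-containment; real embeddings would call for NumPy.

def apply_map(W, x):
    """Return the matrix-vector product W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def fit_linear_map(src, tgt, lr=0.1, steps=2000):
    """Fit W minimising mean ||W x - z||^2 by plain gradient descent."""
    n, d_in, d_out = len(src), len(src[0]), len(tgt[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for x, z in zip(src, tgt):
            pred = apply_map(W, x)
            for i in range(d_out):
                err = pred[i] - z[i]
                for j in range(d_in):
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(d_out):
            for j in range(d_in):
                W[i][j] -= lr * grad[i][j]
    return W

# Hypothetical seed dictionary: source-language vectors and the vectors of
# their translations. Here the target space is the source rotated by 90 degrees.
SRC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
TGT = [[0.0, -1.0], [1.0, 0.0], [1.0, -1.0], [1.0, -2.0]]

W = fit_linear_map(SRC, TGT)
mapped = apply_map(W, [3.0, -1.0])  # an unseen source-language vector
```

Once source-language vectors are mapped this way, a classifier trained on one language's embeddings could in principle be applied to posts in the other language without target-language training data, which is the setting the abstract describes.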

    Word Level Approach for Tweets Classification based on its Content

    Twitter has become the largest microblogging platform, where users can interact with each other, expressing opinions, thoughts and feelings related to any topic or source of news in a compressed 280-character message, called a tweet. Hashtags are popular keywords used to label these tweets according to their content. This work tries to find out whether the usage of hashtags to label tweets with similar content is accurate enough. To do so, tweets from different popular hashtags have been retrieved and processed in order to obtain a dataset with content as close to reality as possible. Several embedding methods and learning algorithms have been studied to classify tweets from different hashtags based on their content. Results showed that the best performance is achieved when using the Tf-idf embedding method and support vector machines. The learning algorithm obtained a precision of around 90% for classification on 10 classes and above 70% when dealing with 100 classes, trained on datasets of only 13680 and 143067 samples respectively. The results also indicated that the BoW and Tf-idf methods outperformed methods that are state of the art for other natural language processing tasks, such as GloVe or Word2Vec.
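
As a rough illustration of the Tf-idf representation mentioned in the abstract, the sketch below weights each term by its relative frequency in a tweet times the log of the inverse document frequency. Tf-idf has several variants and the abstract does not say which one was used, so the formula, the whitespace tokenizer, the `tfidf` function name and the toy tweets are all assumptions. The classifier step (a support vector machine in the abstract) is left out.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (c / len(toks)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors

# Hypothetical toy tweets, each carrying one hashtag.
tweets = [
    "vote today #election",
    "match today #football",
    "vote count #election",
]
vectors = tfidf(tweets)
```

In the second tweet, "match" occurs in only one document and so receives a larger weight than "today", which occurs in two. This is the property that lets a downstream classifier focus on terms that discriminate between hashtags.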

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis for insights have been carried out using Twitter data in various domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment, emotion, and the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories.