133 research outputs found

    Sentiment analysis on Twitter

    In recent years, more and more people have been connecting through social networks, and Twitter is one of the most widely used. The huge amount of information it generates is attracting the interest of companies, which can use it to detect public opinion about their brands and thus improve their business value. Transforming the information present in social networks into knowledge requires several steps. This project aims to describe them and to provide tools able to perform the task. The first problem is how to retrieve the data. Several ways are available, each with its own pros and cons. It is then necessary to study and define proper queries in order to retrieve the information needed. Once the data is retrieved, it may need to be filtered and explored. For this task a topic model algorithm (LDA) has been studied and analyzed. LDA has shown positive results when it is properly tuned and combined with appropriate visualization techniques. The difference between a topic model algorithm and other clustering/segmentation techniques is that topic models allow each “document” (instance) to belong to more than one topic (cluster). LDA does not natively work well on Twitter because tweets are very short; an investigation of the literature has revealed a solution to this problem. Another problem common in clustering is how to validate the algorithm and how to choose the proper number of topics (clusters); for this problem several metrics from the literature have been explored. Afterwards, sentiment analysis techniques can be applied to measure the opinion of the users. The literature presents several approaches to solving this problem. This work focuses on the polarity detection task with three classes, that is, classifying whether a tweet expresses a positive, a negative, or a neutral sentiment. Here, reaching accurate results can be challenging due to the messy nature of Twitter posts. Several approaches have been tested and compared. The baseline method tested is the use of sentiment dictionaries. After that, since the real sentiment of the Twitter posts is not available, a sample has been manually labeled and several supervised approaches combined with various feature selection/transformation techniques have been tested. Finally, an entirely new experimental approach, inspired by the soft labeling technique present in the literature, has been defined and tested. This method tries to avoid the costly task of manually labeling a sample in order to validate a model. In the literature this problem is solved for the two-class case, considering only positive and negative tweets. This work tries to extend the soft-labeling approach to the three-class problem.

    Assessing Trust and Veracity of Data in Social Media

    Social media highly impacts our knowledge and perception of the world. With the tremendous amount of data that is circulating in social media and initiated by a vast number of users from all over the world, extracting useful information from such data and assessing its veracity has become much more challenging. Data veracity refers to the trustworthiness and certainty of data. The challenges of handling textual data in social media have raised the need for efficient tools to extract, understand, and assess the veracity of information circulating in social media at a given time. In this thesis, we present three research problems to address major challenges of handling textual data in social media. First, overwhelming the user with huge volumes of short, noisy, and unstructured textual data complicates the task of understanding what topics are discussed by users in micro-blogging websites. Topic models were proposed to automatically learn a set of keywords that better describe each topic covered by a large corpus of text documents to enable fast and effective browsing and exploration of its contents. However, in order for the results of topic modeling algorithms to be useful, these results have to be interpretable. Applying topic models to social media data to get meaningful results is not a trivial task. In this thesis, we study the problem of improving interpretation of topic modeling of micro-posts in social media. We propose a new method that incorporates topic modeling, a lexical database, and the set of hashtags available in the corpus of micro-posts to produce a higher quality representation of each extracted topic. Extensive experiments on two real-life datasets collected from Twitter show that our method outperforms the state-of-the-art model in terms of perplexity, topics' coherence, and their quality. 
Second, the nature and flexibility of social media facilitate the posting of unverified information, especially during the rapid diffusion of breaking news. Efficiently detecting and acting upon unverified breaking news rumors throughout social media is of high importance to minimizing their harmful effect. However, detecting them is not a trivial task, because they belong to unseen topics or events that are not covered in the training dataset. In this thesis, we study the problem of assessing the veracity of information contained in micro-posts regarding emerging stories and topics of breaking news. We propose a new approach that jointly learns word embeddings and trains a neural network model with two different objectives to automatically identify unverified micro-posts spreading in social media during breaking news. Extensive experiments on real-life datasets show that our proposed model outperforms the state-of-the-art classifier as well as other baseline classifiers in terms of precision, recall, and F1. Finally, the uncertainty and chaos associated with hot and sensitive breaking news and emergencies facilitate the explosive spread of high-engagement breaking news rumors that might be extremely damaging. In such a case, authorities have to prioritize the rumor verification process and act upon high-engagement breaking news rumors quickly to reduce their damaging consequences. However, this is an extremely challenging task. In this thesis, we study the problem of identifying the rumor micro-posts that are most likely to become viral and achieve high engagement rates among recipients in social media during breaking news. We propose a multi-task neural network to jointly learn the two tasks of breaking news rumor detection and breaking news rumor popularity prediction. Extensive experiments on real-life datasets show that our joint learning model outperforms other baseline classifiers in terms of precision, recall, and F1 and is capable of identifying high-engagement breaking news rumors with high accuracy.

    Toward Effective Knowledge Discovery in Social Media Streams

    The last few decades have seen an unprecedented growth in the amount of new data. New computing and communications resources, such as cloud data platforms and mobile devices, have enabled individuals to contribute new ideas, share points of view and exchange newsworthy bits with each other at a previously unfathomable rate. While there are many ways a modern person can communicate digitally with others, social media outlets, such as Twitter or Facebook, have been occupying much of the focus of inter-person social networking in recent years. The millions of pieces of content published on social media sites have been both a blessing and a curse for those trying to make sense of the discourse. On one hand, the sheer amount of easily available, real time, contextually relevant content has been a cause of much excitement in academia and the industry. On the other hand, however, the amount of new diverse content that is being continuously published on social sites makes it difficult for researchers and industry participants to effectively grasp. Therefore, the goal of this thesis is to discover a set of approaches and techniques that would help enable data miners to quickly develop intuitions regarding the happenings in the social media space. To that aim, I concentrate on effectively visualizing social media streams as hierarchical structures, as such structures have been shown to be useful in human sense making. Ph.D., Information Studies -- Drexel University, 201

    Taste or Addiction?: Using Play Logs to Infer Song Selection Motivation

    Online music services are increasing in popularity. They enable us to analyze people's music listening behavior based on play logs. Although it is known that people listen to music based on topic (e.g., rock or jazz), we assume that when a user is addicted to an artist, s/he chooses the artist's songs regardless of topic. Based on this assumption, in this paper, we propose a probabilistic model to analyze people's music listening behavior. Our main contributions are three-fold. First, to the best of our knowledge, this is the first study modeling music listening behavior by taking into account the influence of addiction to artists. Second, by using real-world datasets of play logs, we showed the effectiveness of our proposed model. Third, we carried out qualitative experiments and showed that taking addiction into account enables us to analyze music listening behavior from a new viewpoint in terms of how people listen to music according to the time of day, how an artist's songs are listened to by people, etc. We also discuss the possibility of applying the analysis results to applications such as artist similarity computation and song recommendation. Comment: Accepted by The 21st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2017)

    Improving Topic Model Clustering of Newspaper Comments for Summarisation


    Hybrid Recommender for Online Petitions with Social Network and Psycholinguistic Features

    The online petition has become one of the most important channels of civic participation. Most state-of-the-art online platforms, however, tend to use simple indicators (such as popularity) to rank petitions, creating a situation where the most popular petitions dominate the ranking and attract most people's attention. Petitions that focus on specific issues are often at a disadvantage on the list. For example, a petition about a local environmental problem may not be seen by many people who are really concerned with it, simply because it takes multiple pages to reach it. Therefore, the simple ranking mechanism adopted by most online petition platforms cannot effectively link most petitions with those who are really concerned with them. According to previous studies, the seriousness of online petitions has been questioned because they rarely succeed: fewer than 10% of online petitions get the chance to fulfill their causes. To solve this problem, we present a design of a novel recommender system (PETREC). It leverages social interaction features, psycholinguistic features, and latent topic features to provide a personalized ranking to different users. Hence, it can give users better petition recommendations fitting their unique concerns. We evaluate PETREC against two baseline recommenders: matrix factorization collaborative filtering and content-based filtering with bag-of-words (BoW) features. PETREC outperformed matrix factorization collaborative filtering, BoW petition-based content filtering, and BoW user-based content filtering, improving Root Mean Square Error (RMSE) by 4.2%, 1.7%, and 2.8%, respectively. The recommendation system described in this paper has the potential to improve the user experience of online petition platforms and could thus encourage more public participation. Eventually, it will help citizens make a real difference by actively participating in online petitions that match their personal concerns.