172 research outputs found

    Sentiment analysis and real-time microblog search

    This thesis sets out to examine the role played by sentiment in real-time microblog search. The recent prominence of the real-time web is proving both challenging and disruptive for a number of areas of research, notably information retrieval and web data mining. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user query at a given point in time, automated methods are required to enable users to sift through this information. As an area of research reaching maturity, sentiment analysis offers a promising direction for modelling the text content in microblog streams. In this thesis we review the real-time web as a new area of focus for sentiment analysis, with a specific focus on microblogging. We propose a system and method for evaluating the effect of sentiment on perceived search quality in real-time microblog search scenarios. Initially we provide an evaluation of sentiment analysis using supervised learning for classifying the short, informal content in microblog posts. We then evaluate our sentiment-based filtering system for microblog search in a user study with simulated real-time scenarios. Lastly, we conduct real-time user studies for the live broadcast of the popular television programme, the X Factor, and for the Leaders Debate during the Irish General Election. We find that we are able to satisfactorily classify positive, negative and neutral sentiment in microblog posts. We also find a significant role played by sentiment in many microblog search scenarios, observing some detrimental effects in filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes, and users’ prior topic sentiment.

    Sentiment Analysis in Social Streams

    In this chapter we review and discuss the state of the art on sentiment analysis in social streams (such as web forums, micro-blogging systems, and social networks), aiming to clarify how user opinions, affective states, and intended emotional effects are extracted from user generated content, how they are modeled, and how they could be finally exploited. We explain why sentiment analysis tasks are more difficult for social streams than for other textual sources, and entail going beyond classic text-based opinion mining techniques. We show, for example, that social streams may use vocabularies and expressions that exist outside the mainstream of standard, formal languages, and may reflect complex dynamics in the opinions and sentiments expressed by individuals and communities.

    Doctor of Philosophy

    Due to the popularity of Web 2.0 and Social Media in the last decade, the percolation of user generated content (UGC) has rapidly increased. In the financial realm, this results in the emergence of virtual investing communities (VIC) to the investing public. There is an on-going debate among scholars and practitioners on whether such UGC contain valuable investing information or mainly noise. I investigate two major studies in my dissertation. First, I examine the relationship between peer influence and information quality in the context of individual characteristics in stock microblogging. Surprisingly, I discover that the set of individual characteristics that relate to peer influence is not synonymous with those that relate to high information quality. In relating to information quality, influentials who are frequently mentioned by peers due to their name value are likely to possess higher information quality, while those who are better at diffusing information via retweets are likely to associate with lower information quality. Second, I propose a study to explore predictability of stock microblog dimensions and features over stock price directional movements using data mining classification techniques. I find that the author-ticker-day dimension produces the highest predictive accuracy, implying that this dimension is able to capture both relevant author and ticker information as compared to author-day and ticker-day. In addition to these two studies, I also explore two topics: network structure of co-tweeted tickers and sentiment annotation via crowdsourcing. I do this in order to understand and uncover new features as well as new outcome indicators with the objective of improving predictive accuracy of the classification or saliency of the explanatory models. My dissertation work extends the frontier in understanding the relationship between financial UGC, specifically stock microblogging, with relevant phenomena as well as predictive outcomes.

    Sentiment Analysis in Social Streams

    In this chapter, we review and discuss the state of the art on sentiment analysis in social streams (such as web forums, microblogging systems, and social networks), aiming to clarify how user opinions, affective states, and intended emotional effects are extracted from user generated content, how they are modeled, and how they could be finally exploited. We explain why sentiment analysis tasks are more difficult for social streams than for other textual sources, and entail going beyond classic text-based opinion mining techniques. We show, for example, that social streams may use vocabularies and expressions that exist outside the mainstream of standard, formal languages, and may reflect complex dynamics in the opinions and sentiments expressed by individuals and communities.

    Information Reliability on the Social Web - Models and Applications in Intelligent User Interfaces

    The Social Web is undergoing continued evolution, changing the paradigm of information production, processing and sharing. Information sources have shifted from institutions to individual users, vastly increasing the amount of information available online. To overcome the information overload problem, modern filtering algorithms have enabled people to find relevant information in efficient ways. However, noisy, false and otherwise useless information remains a problem. We believe that the concept of information reliability needs to be considered along with information relevance to adapt filtering algorithms to today's Social Web. This approach helps to improve information search and discovery and can also improve user experience by communicating aspects of information reliability. This thesis first shows the results of a cross-disciplinary study into perceived reliability by reporting on a novel user experiment. This is followed by a discussion of modeling, validating, and communicating information reliability, including its various definitions across disciplines. A selection of important reliability attributes such as source credibility, competence, influence and timeliness are examined through different case studies. Results show that perceived reliability of information can vary greatly across contexts. Finally, recent studies on visual analytics, including algorithm explanations and interactive interfaces are discussed with respect to their impact on the perception of information reliability in a range of application domains.

    Mining microblogs for culture-awareness in web adaptation

    Prior studies in sociology and human-computer interaction indicate that persons from different countries and cultural origins tend to have their preferences in real-life communication and the usage of web and social media applications. With Twitter data, statistical and machine learning tools, this study advances our understanding of microblogging in respect of cultural differences and demonstrates possible solutions of inferring and exploiting cultural origins for building adaptive web applications. Our findings reveal statistically significant differences in Twitter feature usage in respect of geographic locations of users. These differences in microblogger behaviour and user language defined in user profiles enabled us to infer user country origins with an accuracy of more than 90%. Other user origin predictive solutions we proposed do not require other data sources and human involvement for training the models, enabling the high accuracy of user country inference when exploiting information extracted from a user followers’ network, or with data derived from Twitter profiles. With origin predictive models, we analysed communication and privacy preferences and built a culture-aware recommender system. Our analysis of friend responses shows that Twitter users tend to communicate mostly within their cultural regions. Usage of privacy settings showed that privacy perceptions differ across cultures. Finally, we created and evaluated movie recommendation strategies considering user cultural groups, and addressed a cold-start scenario with a new user. We believe that the findings discussed give insights into the sociological and web research, in particular on cultural differences in online communication.

    Toward Geo-social Information Systems: Methods and Algorithms

    The widespread adoption of GPS-enabled tagging of social media content via smartphones and social media services (e.g., Facebook, Twitter, Foursquare) uncovers a new window into the spatio-temporal activities of hundreds of millions of people. These "footprints" open new possibilities for understanding how people can organize for societal impact and lay the foundation for new crowd-powered geo-social systems. However, there are key challenges to delivering on this promise: the slow adoption of location sharing, the inherent bias in the users that do share location, imbalanced location granularity, respecting location privacy, among many others. With these challenges in mind, this dissertation aims to develop the framework, algorithms, and methods for a new class of geo-social information systems. The dissertation is structured in two main parts: the first focuses on understanding the capacity of existing footprints; the second demonstrates the potential of new geo-social information systems through two concrete prototypes. First, we investigate the capacity of using these geo-social footprints to build new geo-social information systems. (i): we propose and evaluate a probabilistic framework for estimating a microblog user's location based purely on the content of the user's posts. With the help of a classification component for automatically identifying words in tweets with a strong local geo-scope, the location estimator places 51% of Twitter users within 100 miles of their actual location. (ii): we investigate a set of 22 million check-ins across 220,000 users and report a quantitative assessment of human mobility patterns by analyzing the spatial, temporal, social, and textual aspects associated with these footprints. Concretely, we observe that users follow simple reproducible mobility patterns. (iii): we compare a set of 35 million publicly shared check-ins with a set of over 400 million private query logs recorded by a commercial hotel search engine.
Although generated by users with fundamentally different intentions, we find common conclusions may be drawn from both data sources, indicating the viability of publicly shared location information to complement (and replace, in some cases), privately held location information. Second, we introduce a couple of prototypes of new geo-social information systems that utilize the collective intelligence from the emerging geo-social footprints. Concretely, we propose an activity-driven search system, and a local expert finding system that both take advantage of the collective intelligence. Specifically, we study location-based activity patterns revealed through location sharing services and find that these activity patterns can identify semantically related locations, and help with both unsupervised location clustering, and supervised location categorization with a high confidence. Based on these results, we show how activity-driven semantic organization of locations may be naturally incorporated into location-based web search. In addition, we propose a local expert finding system that identifies top local experts for a topic in a location. Concretely, the system utilizes semantic labels that people label each other, people's locations in current location-based social networks, and can identify top local experts with a high precision. We also observe that the proposed local authority metrics that utilize collective intelligence from expert candidates' core audience (list labelers), significantly improve the performance of local expert finding over the more intuitive approach that only considers candidates' locations.

    Health Misinformation in Search and Social Media

    People increasingly rely on the Internet in order to search for and share health-related information. Indeed, searching for and sharing information about medical treatments are among the most frequent uses of online data. While this is a convenient and fast method to collect information, online sources may contain incorrect information that has the potential to cause harm, especially if people believe what they read without further research or professional medical advice. The goal of this thesis is to address the misinformation problem in two of the most commonly used online services: search engines and social media platforms. We examined how people use these platforms to search for and share health information. To achieve this, we designed controlled laboratory user studies and employed large-scale social media data analysis tools. The solutions proposed in this thesis can be used to build systems that better support people's health-related decisions. The techniques described in this thesis addressed online searching and social media sharing in the following manner. First, with respect to search engines, we aimed to determine the extent to which people can be influenced by search engine results when trying to learn about the efficacy of various medical treatments. We conducted a controlled laboratory study wherein we biased the search results towards either correct or incorrect information. We then asked participants to determine the efficacy of different medical treatments. Results showed that people were significantly influenced both positively and negatively by search results bias. More importantly, when the subjects were exposed to incorrect information, they made more incorrect decisions than when they had no interaction with the search results. Following from this work, we extended the study to gain insights into strategies people use during this decision-making process, via the think-aloud method. 
We found that, even with verbalization, people were strongly influenced by the search results bias. We also noted that people paid attention to what the majority states, authoritativeness, and content quality when evaluating online content. Understanding the effects of cognitive biases that can arise during online search is a complex undertaking because of the presence of unconscious biases (such as the search results ranking) that the think-aloud method fails to show. Moving to social media, we first proposed a solution to detect and track misinformation in social media. Using Zika as a case study, we developed a tool for tracking misinformation on Twitter. We collected 13 million tweets regarding the Zika outbreak and tracked rumors outlined by the World Health Organization and the Snopes fact-checking website. We incorporated health professionals, crowdsourcing, and machine learning to capture health-related rumors as well as clarification communications. In this way, we illustrated insights that the proposed tools provide into potentially harmful information on social media, allowing public health researchers and practitioners to respond with targeted and timely action. From identifying rumor-bearing tweets, we examined individuals on social media who are posting questionable health-related information, in particular those promoting cancer treatments that have been shown to be ineffective. Specifically, we studied 4,212 Twitter users who have posted about one of 139 ineffective "treatments" and compared them to a baseline of users generally interested in cancer. Considering features that capture user attributes, writing style, and sentiment, we built a classifier that is able to identify users prone to propagating such misinformation. This classifier achieved an accuracy of over 90%, providing a potential tool for public health officials to identify such individuals for preventive intervention.

    FINE-GRAINED EMOTION DETECTION IN MICROBLOG TEXT

    Automatic emotion detection in text is concerned with using natural language processing techniques to recognize emotions expressed in written discourse. Endowing computers with the ability to recognize emotions in a particular kind of text, microblogs, has important applications in sentiment analysis and affective computing. In order to build computational models that can recognize the emotions represented in tweets we need to identify a set of suitable emotion categories. Prior work has mainly focused on building computational models for only a small set of six basic emotions (happiness, sadness, fear, anger, disgust, and surprise). This thesis describes a taxonomy of 28 emotion categories, an expansion of these six basic emotions, developed inductively from data. This set of 28 emotion categories represents a set of fine-grained emotion categories that are representative of the range of emotions expressed in tweets, microblog posts on Twitter. The ability of humans to recognize these fine-grained emotion categories is characterized using inter-annotator reliability measures based on annotations provided by expert and novice annotators. A set of 15,553 human-annotated tweets form a gold standard corpus, EmoTweet-28. For each emotion category, we have extracted a set of linguistic cues (i.e., punctuation marks, emoticons, emojis, abbreviated forms, interjections, lemmas, hashtags and collocations) that can serve as salient indicators for that emotion category. We evaluated the performance of automatic classification techniques on the set of 28 emotion categories through a series of experiments using several classifier and feature combinations. Our results show that it is feasible to extend machine learning classification to fine-grained emotion detection in tweets (i.e., as many as 28 emotion categories) with results that are comparable to state-of-the-art classifiers that detect six to eight basic emotions in text.
Classifiers using features extracted from the linguistic cues associated with each category match or exceed the performance of conventional corpus-based and lexicon-based features for fine-grained emotion classification. This thesis makes an important theoretical contribution in the development of a taxonomy of emotion in text. In addition, this research also makes several practical contributions, particularly in the creation of language resources (i.e., corpus and lexicon) and machine learning models for fine-grained emotion detection in text.