440 research outputs found

    On the Impact of Entity Linking in Microblog Real-Time Filtering

    Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, whether foreseeable or unforeseeable, makes applications of real-time filtering of great practical interest. We propose the use of Entity Linking (EL) to improve retrieval effectiveness by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing, in unstructured text, mentions of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL significantly improves over non-EL-based versions of the filtering methods. Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 - 17, 201

    Characterizing Geo-located Tweets in Brazilian Megacities

    This work presents a framework for collecting, processing and mining geo-located tweets in order to extract meaningful and actionable knowledge in the context of smart cities. We collected and characterized more than 9M tweets from the two biggest cities in Brazil, Rio de Janeiro and São Paulo. We performed topic modeling using the Latent Dirichlet Allocation model to produce an unsupervised distribution of semantic topics over the stream of geo-located tweets as well as a distribution of words over those topics. We manually labeled and aggregated similar topics, obtaining a total of 29 different topics across both cities. Results showed similarities in the majority of topics for both cities, reflecting similar interests and concerns among the population of Rio de Janeiro and São Paulo. Nevertheless, some specific topics are more predominant in one of the cities.

    Sentiment analysis and real-time microblog search

    This thesis sets out to examine the role played by sentiment in real-time microblog search. The recent prominence of the real-time web is proving both challenging and disruptive for a number of areas of research, notably information retrieval and web data mining. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user query at a given point in time, automated methods are required to enable users to sift through this information. As an area of research reaching maturity, sentiment analysis offers a promising direction for modelling the text content in microblog streams. In this thesis we review the real-time web as a new area of focus for sentiment analysis, with a specific focus on microblogging. We propose a system and method for evaluating the effect of sentiment on perceived search quality in real-time microblog search scenarios. Initially we provide an evaluation of sentiment analysis using supervised learning for classifying the short, informal content in microblog posts. We then evaluate our sentiment-based filtering system for microblog search in a user study with simulated real-time scenarios. Lastly, we conduct real-time user studies for the live broadcast of the popular television programme, the X Factor, and for the Leaders Debate during the Irish General Election. We find that we are able to satisfactorily classify positive, negative and neutral sentiment in microblog posts. We also find a significant role played by sentiment in many microblog search scenarios, observing some detrimental effects in filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes, and users’ prior topic sentiment.

    Seminar Users in the Arabic Twitter Sphere

    We introduce the notion of "seminar users", who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus our approach could work for detecting seminar users in other parts of the world and in other languages. We further explored a controversial political topic to observe the prevalence and potential potency of such users. In our case study, we found that 25% of the users engaged in the topic are in fact seminar users and that their tweets make up nearly a third of the on-topic tweets. Moreover, they are often successful in affecting mainstream discourse with coordinated hashtag campaigns. Comment: to appear in SocInfo 201