26 research outputs found

    Towards reproducible research of event detection techniques for Twitter

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Searching for superspreaders of information in real-world social media

    A number of predictors have been suggested to detect the most influential spreaders of information in online social media such as Twitter or Facebook. In particular, degree, PageRank, k-core and other centralities have been adopted to rank the spreading capability of users in information dissemination media. So far, the proposed predictors have been validated by simulating the spreading dynamics rather than by following real information flow in social networks. Consequently, only model-dependent and contradictory results have been obtained for the best predictor. Here, we address this issue directly. We search for influential spreaders by following the real spreading dynamics in a wide range of networks. We find that the widely used degree and PageRank fail in ranking users' influence. We find that the best spreaders are consistently located in the k-core across dissimilar social platforms such as Twitter, Facebook, LiveJournal and scientific publishing in the American Physical Society. Furthermore, when the complete global network structure is unavailable, we find that the sum of the nearest neighbors' degrees is a reliable local proxy for a user's influence. Our analysis provides practical instructions for the optimal design of strategies for "viral" information dissemination in relevant applications.
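
    The two structural measures named in this abstract, the k-core (k-shell) index and the sum of the nearest neighbors' degrees, can be illustrated with a minimal sketch. It assumes an undirected graph given as an edge list; the function names and the toy graph are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def core_numbers(adj):
    """Return the k-shell index of every node.

    Standard peeling: repeatedly remove the node with the smallest
    remaining degree; its core number is the largest such minimum
    degree seen so far. This simple version is quadratic in the
    number of nodes, which is fine for an illustration.
    """
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


def neighbor_degree_sum(adj):
    """Local proxy: for each node, the sum of the degrees of its neighbors."""
    return {node: sum(len(adj[m]) for m in nbrs) for node, nbrs in adj.items()}


# Toy graph (hypothetical): a triangle a-b-c with a tail a-d-e.
graph = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e")])
print(core_numbers(graph))
print(neighbor_degree_sum(graph))
```

    The neighbor-degree sum needs only each user's immediate neighborhood, which is why it can serve as a proxy when the global network is unavailable, whereas the k-shell index requires the whole graph.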

    Inverted Index Entry Invalidation Strategy for Real Time Search

    The impressive rise of user-generated content on the web, driven by sites like Twitter, poses new challenges for search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed the moment they are generated, and users expect content to be "searchable" within seconds. This has led to the development of data structures and algorithms that can meet this challenge efficiently. In this work, we introduce the concept of an index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and for selectively invalidating and evicting those inverted index entries whose removal does not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller, which may increase overall efficiency. We study the dynamics of the vocabulary using a real dataset and also provide an evaluation of the proposed strategy using a search engine specifically designed for real-time indexing and search. XII Workshop Bases de Datos y Minería de Datos (WBDDM), Red de Universidades con Carreras en Informática (RedUNCI).
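
    The abstract does not state the criterion the invalidator uses, so the following is only a sketch of the general idea under a stated assumption: a term's posting list is evicted once the term has not appeared in the last `max_idle` indexed documents. The class name, the `max_idle` parameter and the staleness rule are hypothetical.

```python
class InvalidatingIndex:
    """Toy in-memory inverted index with staleness-based entry eviction.

    Assumption (not from the paper): an entry is considered safe to
    evict when its term has not occurred in the last `max_idle` documents.
    """

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self.postings = {}   # term -> list of document ids
        self.last_seen = {}  # term -> clock value of the latest occurrence
        self.clock = 0       # number of documents indexed so far

    def add(self, doc_id, text):
        """Index one document and refresh the last-seen time of its terms."""
        self.clock += 1
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)
            self.last_seen[term] = self.clock

    def invalidate(self):
        """Evict entries for stale terms and return the evicted terms, sorted."""
        stale = [term for term, seen in self.last_seen.items()
                 if self.clock - seen > self.max_idle]
        for term in stale:
            del self.postings[term]
            del self.last_seen[term]
        return sorted(stale)

    def search(self, term):
        """Return the ids of documents containing the term (empty if evicted)."""
        return list(self.postings.get(term.lower(), []))


index = InvalidatingIndex(max_idle=1)
index.add(1, "quake in tokyo")
index.add(2, "tokyo traffic")
index.add(3, "traffic jam")
print(index.invalidate())     # terms last seen more than one document ago
print(index.search("tokyo"))  # still indexed
print(index.search("quake"))  # evicted, so no results
```

    A real invalidator would also have to measure how much retrieval effectiveness is lost by each eviction, which this sketch does not attempt.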

    Creating extended gender labelled datasets of Twitter users

    The gender of a Twitter user is not known a priori when analysing Twitter data, because user registration does not include gender information. This paper proposes an approach for creating extended gender labelled datasets of Twitter users. The process involves creating a smaller database of active Twitter users and manually labelling their gender. It continues by extracting features from the unstructured information found on each user profile and by creating a gender classification model. The model is then applied to a larger dataset, providing automatic labels and corresponding confidence scores, which can be used to identify the most accurately labelled users. The resulting databases can be further enriched with additional information extracted, for example, from the profile picture and from the user location. The proposed approach was successfully applied to English and Portuguese users, leading to two large datasets containing more than 57K labelled users each.
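
    The pipeline described here (a small manually labelled seed set, a classifier, then automatic labels filtered by confidence) can be sketched as follows. The abstract does not say which features or model the authors use; this sketch assumes bag-of-words tokens from the profile text and a naive Bayes classifier with add-one smoothing, and all function names, the seed data and the threshold are hypothetical.

```python
import math
from collections import Counter


def tokens(profile_text):
    """Very rough feature extraction: lower-cased whitespace tokens."""
    return profile_text.lower().split()


def train(labelled):
    """Fit naive Bayes counts from (profile_text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in labelled:
        label_counts[label] += 1
        counter = word_counts.setdefault(label, Counter())
        for tok in tokens(text):
            counter[tok] += 1
            vocab.add(tok)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return (label, confidence), where confidence is the posterior of the label."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)
        for tok in tokens(text):
            if tok in vocab:  # tokens unseen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def label_confident(model, profiles, threshold):
    """Keep only automatic labels whose confidence reaches the threshold."""
    kept = []
    for profile in profiles:
        label, confidence = predict(model, profile)
        if confidence >= threshold:
            kept.append((profile, label, confidence))
    return kept


# Hypothetical seed set standing in for the manually labelled users.
seed = [("father of two", "m"), ("proud mother", "f"),
        ("husband and dad", "m"), ("wife and mom", "f")]
model = train(seed)
print(label_confident(model, ["proud mother of one", "hello world"], threshold=0.6))
```

    Raising the threshold trades dataset size for label accuracy, which is the role the confidence scores play in the abstract.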

    Detecting the Influence of Spreading in Social Networks with Excitable Sensor Networks

    Detecting spreading outbreaks in social networks with sensors is of great significance in applications. Inspired by the mechanism through which human physical sensations form in response to external stimuli, we propose a new method to detect the influence of spreading by constructing excitable sensor networks. Exploiting the amplifying effect of excitable sensor networks, our method can better detect small-scale spreading processes. At the same time, it can also distinguish large-scale diffusion instances due to the self-inhibition effect of excitable elements. Through simulations of diverse spreading dynamics on typical real-world social networks (Facebook, coauthor and email social networks), we find that the excitable sensor networks are capable of detecting and ranking spreading processes in a much wider range of influence than other commonly used sensor placement methods, such as random, targeted, acquaintance and distance strategies. In addition, we validate the efficacy of our method with diffusion data from a real-world online social system, Twitter. We find that our method can detect more spreading topics in practice. Our approach provides a new direction in spreading detection and should be useful for designing effective detection methods.

    Report on the Evaluation-as-a-Service (EaaS) Expert Workshop

    In this report, we summarize the outcome of the "Evaluation-as-a-Service" workshop that was held on the 5th and 6th of March 2015 in Sierre, Switzerland. The objective of the meeting was to bring together initiatives that use cloud infrastructures, virtual machines, APIs (Application Programming Interfaces) and related projects that provide evaluation of information retrieval or machine learning tools as a service.

    Tools for the Analysis and Visualization of Twitter Language Data

    The microblogging service Twitter provides vast amounts of user-generated language data. In this article I give an overview of related work on Twitter as an object of study. I also describe the anatomy of a Twitter message and discuss typical uses of the Twitter platform. The Twitter Application Programming Interface (API) will be introduced in a generic, non-technical way to provide a basic understanding of existing opportunities but also limitations when working with Twitter data. I propose a basic classification system for existing tools that can be used for collecting and analyzing Twitter data and introduce some exemplary tools for each category. Then, I present a more comprehensive workflow for conducting studies with Twitter data, which comprises the following steps: crawling, annotation, analysis and visualization. Finally, I illustrate the generic workflow by describing an exemplary study from the context of social TV research. At the end of the article, the main issues concerning tools and methods for the analysis of Twitter data are briefly addressed.