26 research outputs found
Towards reproducible research of event detection techniques for Twitter
Searching for superspreaders of information in real-world social media
A number of predictors have been suggested to detect the most influential
spreaders of information in online social media across various domains such as
Twitter or Facebook. In particular, degree, PageRank, k-core and other
centralities have been adopted to rank the spreading capability of users in
information dissemination media. So far, validation of the proposed predictors
has been done by simulating the spreading dynamics rather than following real
information flow in social networks. Consequently, only model-dependent
contradictory results have been achieved so far for the best predictor. Here,
we address this issue directly. We search for influential spreaders by
following the real spreading dynamics in a wide range of networks. We find that
the widely-used degree and PageRank fail in ranking users' influence. We find
that the best spreaders are consistently located in the k-core across
dissimilar social platforms such as Twitter, Facebook, Livejournal and
scientific publishing in the American Physical Society. Furthermore, when the
complete global network structure is unavailable, we find that the sum of the
nearest neighbors' degree is a reliable local proxy for a user's influence. Our
analysis provides practical instructions for optimal design of strategies for
"viral" information dissemination in relevant applications. Comment: 12 pages, 7 figures
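
The two structural predictors named in this abstract, the k-core index and the sum of the nearest neighbors' degrees, can be illustrated with a minimal sketch. The toy graph, the helper names (`build_adjacency`, `core_numbers`, `neighbor_degree_sum`) and the peeling implementation are assumptions made for illustration; they are not the authors' code or data.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def core_numbers(adj):
    """Return the k-core index of every node by repeatedly peeling low-degree nodes."""
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Nodes whose remaining degree is at most k are removed at level k.
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1
            continue
        for n in peel:
            core[n] = k
            remaining.discard(n)
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
    return core

def neighbor_degree_sum(adj, node):
    """Local proxy: sum of the degrees of a node's nearest neighbors."""
    return sum(len(adj[m]) for m in adj[node])

# Hypothetical toy graph: a triangle (a, b, c) plus a hub h with three leaves.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"),
         ("d", "h"), ("h", "x"), ("h", "y"), ("h", "z")]
adj = build_adjacency(edges)
core = core_numbers(adj)
```

In this toy graph the hub `h` has the largest degree but sits in a shallow shell, while the triangle nodes have lower degree and a higher k-core index, which is the kind of disagreement between degree and k-core that the abstract refers to.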
Inverted Index Entry Invalidation Strategy for Real Time Search
The impressive rise of user-generated content on the web, driven by sites like Twitter, imposes new challenges on search systems. The concept of real-time search emerges, increasing the role that efficient indexing and retrieval algorithms play in this scenario. Thousands of new updates need to be processed at the very moment they are generated, and users expect content to be “searchable” within seconds. This has led to the development of efficient data structures and algorithms that can meet this challenge. In this work, we introduce the concept of an index entry invalidator, a strategy responsible for keeping track of the evolution of the underlying vocabulary and for selectively invalidating and evicting those inverted index entries whose removal does not considerably degrade retrieval effectiveness. Consequently, the index becomes smaller, which may increase overall efficiency. We study the dynamics of the vocabulary using a real dataset and also provide an evaluation of the proposed strategy using a search engine specifically designed for real-time indexing and search. XII Workshop Bases de Datos y Minería de Datos (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI)
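
The idea of invalidating inverted index entries can be sketched as follows. The abstract does not state the eviction criterion, so the rule used here (evict a term that has not appeared in the last `max_idle_docs` documents) is purely an assumption, as are the class and method names.

```python
from collections import defaultdict

class InvalidatingIndex:
    """Toy inverted index with a hypothetical staleness-based invalidator."""

    def __init__(self, max_idle_docs):
        self.max_idle_docs = max_idle_docs
        self.postings = defaultdict(list)  # term -> list of document ids
        self.last_seen = {}                # term -> id of the last document containing it
        self.doc_count = 0

    def add(self, text):
        """Index one document and record when each of its terms was last seen."""
        doc_id = self.doc_count
        self.doc_count += 1
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
            self.last_seen[term] = doc_id
        return doc_id

    def invalidate(self):
        """Evict entries for terms absent from the most recent documents (assumed rule)."""
        newest = self.doc_count - 1
        stale = [t for t, seen in self.last_seen.items()
                 if newest - seen >= self.max_idle_docs]
        for term in stale:
            del self.postings[term]
            del self.last_seen[term]
        return sorted(stale)

    def search(self, term):
        """Return the ids of the documents that contain the term."""
        return list(self.postings.get(term.lower(), []))

index = InvalidatingIndex(max_idle_docs=2)
index.add("goal scored")
index.add("match today")
index.add("match result")
evicted = index.invalidate()
```

After the call to `invalidate`, terms that only occurred in the oldest document no longer have postings, so the index is smaller at the cost of no longer answering queries for those terms.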
Creating extended gender labelled datasets of Twitter users
The gender of a Twitter user is not known a priori when analysing Twitter data, because user registration does not include gender information. This paper proposes an approach for creating extended gender labelled datasets of Twitter users. The process involves creating a smaller database of active Twitter users and manually labelling their gender. It continues by extracting features from unstructured information found on each user profile and by creating a gender classification model. The model is then applied to a larger dataset, thus providing automatic labels and corresponding confidence scores, which can be used to estimate the most accurately labeled users. The resulting databases can be further enriched with additional information extracted, for example, from the profile picture and from the user location. The proposed approach was successfully applied to English and Portuguese users, leading to two large datasets containing more than 57K labeled users each.
Detecting the Influence of Spreading in Social Networks with Excitable Sensor Networks
Detecting spreading outbreaks in social networks with sensors is of great
significance in applications. Inspired by the formation mechanism of human
physical sensations to external stimuli, we propose a new method to detect the
influence of spreading by constructing excitable sensor networks. Exploiting
the amplifying effect of excitable sensor networks, our method can better
detect small-scale spreading processes. At the same time, it can also
distinguish large-scale diffusion instances due to the self-inhibition effect
of excitable elements. Through simulations of diverse spreading dynamics on
typical real-world social networks (Facebook, coauthor and email social
networks), we find that the excitable sensor networks are capable of detecting
and ranking spreading processes in a much wider range of influence than other
commonly used sensor placement methods, such as random, targeted, acquaintance
and distance strategies. In addition, we validate the efficacy of our method
with diffusion data from a real-world online social system, Twitter. We find
that our method can detect more spreading topics in practice. Our approach
provides a new direction in spreading detection and should be useful for
designing effective detection methods.
Report on the Evaluation-as-a-Service (EaaS) Expert Workshop
In this report, we summarize the outcome of the "Evaluation-as-a-Service" workshop that was held on the 5th and 6th March 2015 in Sierre, Switzerland. The objective of the meeting was to bring together initiatives that use cloud infrastructures, virtual machines, APIs (Application Programming Interfaces) and related projects that provide evaluation of information retrieval or machine learning tools as a service.
This Account Doesn’t Exist: Tweet Decay and the Politics of Deletion in the Brexit Debate
Literature on influence operations has identified metrics that are indicative of social media manipulation, but few studies have explored the lifecycle of low-quality information. We contribute to this literature by reconstructing nearly 3 million messages posted by 1 million users in the last days of the Brexit referendum campaign. While previous studies have found that on average only 4% of tweets disappear, we found that 33% of the tweets leading up to the referendum vote are no longer available. Only about half of the most active accounts that tweeted about the referendum continue to operate publicly, and 20% of all accounts are no longer active. We tested whether partisan content was more likely to disappear and found more messages from the Leave campaign that disappeared than the entire universe of tweets affiliated with the Remain campaign. We compare these results with an assorted set of 45 hashtags posted in the same period and find that political campaigns present much higher ratios of user and tweet decay. These results are validated by inspecting 2 million Brexit-related tweets posted over a period of nearly 4 years. The article concludes with an overview of these findings and recommendations for future research.
Tools for the Analysis and Visualization of Twitter Language Data
The microblogging service Twitter provides vast amounts of user-generated language data. In this article I give an overview of related work on Twitter as an object of study. I also describe the anatomy of a Twitter message and discuss typical uses of the Twitter platform. The Twitter Application Programming Interface (API) will be introduced in a generic, non-technical way to provide a basic understanding of existing opportunities but also limitations when working with Twitter data. I propose a basic classification system for existing tools that can be used for collecting and analyzing Twitter data and introduce some exemplary tools for each category. Then, I present a more comprehensive workflow for conducting studies with Twitter data, which comprises the following steps: crawling, annotation, analysis and visualization. Finally, I illustrate the generic workflow by describing an exemplary study from the context of social TV research. At the end of the article, the main issues concerning tools and methods for the analysis of Twitter data are briefly addressed.