13,886 research outputs found
Quantifying Biases in Online Information Exposure
Our consumption of online information is mediated by filtering, ranking, and
recommendation algorithms that introduce unintentional biases as they attempt
to deliver relevant and engaging content. It has been suggested that our
reliance on online technologies such as search engines and social media may
limit exposure to diverse points of view and make us vulnerable to manipulation
by disinformation. In this paper, we mine a massive dataset of Web traffic to
quantify two kinds of bias: (i) homogeneity bias, which is the tendency to
consume content from a narrow set of information sources, and (ii) popularity
bias, which is the selective exposure to content from top sites. Our analysis
reveals different bias levels across several widely used Web platforms. Search
exposes users to a diverse set of sources, while social media traffic tends to
exhibit high popularity and homogeneity bias. When we focus our analysis on
traffic to news sites, we find higher levels of popularity bias, with smaller
differences across applications. Overall, our results quantify the extent to
which our choices of online systems confine us inside "social bubbles."Comment: 25 pages, 10 figures, to appear in the Journal of the Association for
Information Science and Technology (JASIST
Is That Twitter Hashtag Worth Reading
Online social media such as Twitter, Facebook, Wikis and Linkedin have made a
great impact on the way we consume information in our day to day life. Now it
has become increasingly important that we come across appropriate content from
the social media to avoid information explosion. In case of Twitter, popular
information can be tracked using hashtags. Studying the characteristics of
tweets containing hashtags becomes important for a number of tasks, such as
breaking news detection, personalized message recommendation, friends
recommendation, and sentiment analysis among others.
In this paper, we have analyzed Twitter data based on trending hashtags,
which is widely used nowadays. We have used event based hashtags to know users'
thoughts on those events and to decide whether the rest of the users might find
it interesting or not. We have used topic modeling, which reveals the hidden
thematic structure of the documents (tweets in this case) in addition to
sentiment analysis in exploring and summarizing the content of the documents. A
technique to find the interestingness of event based twitter hashtag and the
associated sentiment has been proposed. The proposed technique helps twitter
follower to read, relevant and interesting hashtag.Comment: 10 pages, 6 figures, Presented at the Third International Symposium
on Women in Computing and Informatics (WCI-2015
Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend aware recommendation and automatic vocabulary selection for video
concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that depending on one's requirements one does not necessarily have to turn to
Twitter for information about current events and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics in the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics compared to baseline methods and achieve a mean
average percentage error of 45-19% for time periods of up to 14 days.Comment: ACM Multimedia 201
Early Prediction of Movie Box Office Success based on Wikipedia Activity Big Data
Use of socially generated "big data" to access information about collective
states of the minds in human societies has become a new paradigm in the
emerging field of computational social science. A natural application of this
would be the prediction of the society's reaction to a new product in the sense
of popularity and adoption rate. However, bridging the gap between "real time
monitoring" and "early predicting" remains a big challenge. Here we report on
an endeavor to build a minimalistic predictive model for the financial success
of movies based on collective activity data of online users. We show that the
popularity of a movie can be predicted much before its release by measuring and
analyzing the activity level of editors and viewers of the corresponding entry
to the movie in Wikipedia, the well-known online encyclopedia.Comment: 13 pages, Including Supporting Information, 7 Figures, Download the
dataset from: http://wwm.phy.bme.hu/SupplementaryDataS1.zi
Social media analytics: a survey of techniques, tools and platforms
This paper is written for (social science) researchers seeking to analyze the wealth of social media now available. It presents a comprehensive review of software tools for social networking media, wikis, really simple syndication feeds, blogs, newsgroups, chat and news feeds. For completeness, it also includes introductions to social media scraping, storage, data cleaning and sentiment analysis. Although principally a review, the paper also provides a methodology and a critique of social media tools. Analyzing social media, in particular Twitter feeds for sentiment analysis, has become a major research and business activity due to the availability of web-based application programming interfaces (APIs) provided by Twitter, Facebook and News services. This has led to an ‘explosion’ of data services, software tools for scraping and analysis and social media analytics platforms. It is also a research area undergoing rapid change and evolution due to commercial pressures and the potential for using social media data for computational (social science) research. Using a simple taxonomy, this paper provides a review of leading software tools and how to use them to scrape, cleanse and analyze the spectrum of social media. In addition, it discussed the requirement of an experimental computational environment for social media research and presents as an illustration the system architecture of a social media (analytics) platform built by University College London. The principal contribution of this paper is to provide an overview (including code fragments) for scientists seeking to utilize social media scraping and analytics either in their research or business. The data retrieval techniques that are presented in this paper are valid at the time of writing this paper (June 2014), but they are subject to change since social media data scraping APIs are rapidly changing
- …