Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend aware recommendation and automatic vocabulary selection for video
concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that depending on one's requirements one does not necessarily have to turn to
Twitter for information about current events and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics in the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics compared to baseline methods and achieve a mean
average percentage error of 45-19% for time periods of up to 14 days. Comment: ACM Multimedia 201
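The nearest-neighbor idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors standing in for "semantic similarity", the cosine measure, the value of `k`, and the toy view counts are all assumptions made for illustration, and the error metric is assumed to be the usual mean absolute percentage error.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def knn_forecast(query_vec, history, k=2):
    """Average the view series of the k past topics most similar to the query.

    history: list of (feature_vector, daily_views) pairs (hypothetical layout);
    all series are assumed to have the same length.
    """
    ranked = sorted(history, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    neighbors = [series for _, series in ranked[:k]]
    horizon = len(neighbors[0])
    return [sum(s[d] for s in neighbors) / len(neighbors) for d in range(horizon)]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

# Toy data: two past topics close to the new one, one unrelated topic.
history = [
    ([1.0, 0.0], [100, 50, 25]),
    ([0.9, 0.1], [200, 100, 50]),
    ([0.0, 1.0], [10, 10, 10]),
]
forecast = knn_forecast([1.0, 0.05], history, k=2)
```

Here the forecast for the new topic is the day-by-day mean of the two most similar past topics, so the unrelated third topic does not influence it.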
A Latent Source Model for Nonparametric Time Series Classification
For classifying time series, a nearest-neighbor approach is widely used in
practice with performance often competitive with or better than more elaborate
methods such as neural networks, decision trees, and support vector machines.
We develop theoretical justification for the effectiveness of
nearest-neighbor-like classification of time series. Our guiding hypothesis is
that in many applications, such as forecasting which topics will become trends
on Twitter, there aren't actually that many prototypical time series to begin
with, relative to the number of time series we have access to, e.g., topics
become trends on Twitter only in a few distinct manners whereas we can collect
massive amounts of Twitter data. To operationalize this hypothesis, we propose
a latent source model for time series, which naturally leads to a "weighted
majority voting" classification rule that can be approximated by a
nearest-neighbor classifier. We establish nonasymptotic performance guarantees
of both weighted majority voting and nearest-neighbor classification under our
model accounting for how much of the time series we observe and the model
complexity. Experimental results on synthetic data show weighted majority
voting achieving the same misclassification rate as nearest-neighbor
classification while observing less of the time series. We then use weighted
majority to forecast which news topics on Twitter become trends, where we are
able to detect such "trending topics" in advance of Twitter 79% of the time,
with a mean early advantage of 1 hour and 26 minutes, a true positive rate of
95%, and a false positive rate of 4%. Comment: Advances in Neural Information Processing Systems (NIPS 2013)
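A rough sketch of the two decision rules named in this abstract follows. It assumes that each labeled training series votes with a weight that decays exponentially in its squared Euclidean distance to the observed prefix; the decay parameter `gamma`, the labels, and the toy series are hypothetical, and any alignment over time shifts is omitted for brevity.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_majority_vote(observed, training, gamma=1.0):
    """Each training series votes for its label with weight exp(-gamma * distance).

    Only the first len(observed) points of each training series are compared,
    mimicking classification from a partially observed time series.
    """
    totals = {}
    for series, label in training:
        prefix = series[:len(observed)]
        weight = math.exp(-gamma * sq_dist(observed, prefix))
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def nearest_neighbor(observed, training):
    """Return the label of the single closest training series."""
    _, label = min(training, key=lambda item: sq_dist(observed, item[0][:len(observed)]))
    return label

# Toy training set (assumed data): two "trend" shapes and two "no_trend" shapes.
training = [
    ([0, 1, 4, 9], "trend"),
    ([0, 1, 3, 8], "trend"),
    ([0, 0, 1, 1], "no_trend"),
    ([0, 1, 1, 0], "no_trend"),
]
vote_label = weighted_majority_vote([0, 1, 3], training)
nn_label = nearest_neighbor([0, 1, 3], training)
```

With a large `gamma` the closest series dominates the vote, which is one way to see why a nearest-neighbor classifier can approximate the weighted rule.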
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Social media is often viewed as a sensor into various societal events such as
disease outbreaks, protests, and elections. We describe the use of social media
as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our
approach detects a broad range of cyber-attacks (e.g., distributed denial of
service (DDOS) attacks, data breaches, and account hijacking) in an
unsupervised manner using just a limited fixed set of seed event triggers. A
new query expansion strategy based on convolutional kernels and dependency
parses helps model reporting structure and aids in identifying key event
characteristics. Through a large-scale analysis over Twitter, we demonstrate
that our approach consistently identifies and encodes events, outperforming
existing methods. Comment: 13 single column pages, 5 figures, submitted to KDD 201
The Trending Topic Phenomenon on Twitter: A Discourse Analysis of #SaveHajiLulung Tweets
Twitter is the most actively used social media platform in Indonesia. Its users are frequently involved in topics that are currently being hotly discussed online, so it is no surprise that Twitter's trending topics are often dominated by topics originating from Indonesia. This phenomenon, which has given rise to freedom of expression on social media, also creates a problem: when a topic is disliked, netizens do not hesitate to collectively bully particular parties, as in the case of Haji Lulung. The aim of this study is to obtain a picture of the discourse of the #SaveHajiLulung tweets that became a trending topic on Twitter. The study uses van Dijk's discourse analysis method, which covers macrostructure, superstructure, and microstructure. The results conclude that the most prominent theme is the negative (satirical) portrayal of Haji Lulung by netizens. The schema of netizen opinion follows the day-to-day development of the case. The Haji Lulung discourse first circulated in the mass media and then drew many responses from netizens. The discourse then developed beyond tweets and retweets to include memes (humorous images) about Haji Lulung. The meanings emphasized mostly contain elements of parody and tend toward hyperbole (exaggeration) and repetition/alteration (repeating over and over).
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with r^2 up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art. Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarit
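As an illustration of the linear-model approach described in this abstract, the sketch below fits a one-variable least-squares line from article access counts to case counts and reports the coefficient of determination. The single-predictor form, the lag helper, and the toy numbers are assumptions for illustration only, not the authors' models or data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """Coefficient of determination of the fitted line on (x, y)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def lagged_pairs(x, y, lag):
    """Pair x at time t with y at time t + lag, so x is used to forecast y."""
    return x[:len(x) - lag], y[lag:]

# Toy weekly data (hypothetical): article views and reported cases.
views = [10, 20, 30, 40, 50]
cases = [3, 5, 7, 9, 11]
intercept, slope = fit_line(views, cases)
fit_quality = r_squared(views, cases, intercept, slope)
```

Refitting on `lagged_pairs(views, cases, lag)` for increasing `lag` is one simple way to probe how far ahead the access logs carry forecasting value.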
Event Detection in Twitter Using Multi Timing Chained Windows
Twitter is a popular microblogging and social networking service. Twitter posts are generated continuously and are well suited to knowledge discovery using various data mining techniques. We present a novel near real-time approach for processing tweets and detecting events. The proposed method, Multi Timing Chained Windows (MTCW), is independent of the language of the tweets. MTCW defines several Timing Windows and links them to each other like a chain: the output of each smaller window becomes the input of the next larger one. Using MTCW, events can be detected within a few minutes. To evaluate this idea, the required dataset was collected using the Twitter API. The evaluation results show the accuracy and effectiveness of our approach compared with other state-of-the-art methods for event detection on Twitter.
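The chaining of windows described in this abstract can be sketched as follows. Only the chaining itself comes from the abstract; the `TimingWindow` class, the use of token counts as the data passed along the chain, the window capacities, and the sample tokens are all hypothetical, and no event-detection rule is implied.

```python
from collections import Counter

class TimingWindow:
    """Merges a fixed number of inputs into one summary and passes it downstream."""

    def __init__(self, name, capacity, downstream=None):
        self.name = name
        self.capacity = capacity      # number of inputs merged per emitted summary
        self.downstream = downstream  # next, larger window in the chain (or None)
        self.pending = []             # inputs received since the last emission
        self.emitted = []             # summaries produced so far

    def push(self, counts):
        self.pending.append(counts)
        if len(self.pending) == self.capacity:
            summary = sum(self.pending, Counter())
            self.pending = []
            self.emitted.append(summary)
            if self.downstream is not None:
                # The output of this window is the input of the larger one.
                self.downstream.push(summary)

# A two-link chain: the small window merges 3 batches, the large one merges 2 summaries.
large = TimingWindow("large", capacity=2)
small = TimingWindow("small", capacity=3, downstream=large)

batches = [
    Counter(["goal"]), Counter(["goal", "rain"]), Counter(["rain"]),
    Counter(["goal", "goal"]), Counter(["goal"]), Counter(["quake"]),
]
for batch in batches:
    small.push(batch)
```

Because the windows only count opaque tokens, nothing in this sketch depends on the language of the tweets.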
#mytweet via Instagram: Exploring User Behaviour across Multiple Social Networks
We study how users of multiple online social networks (OSNs) employ and share
information by studying a common user pool that uses six OSNs - Flickr, Google+,
Instagram, Tumblr, Twitter, and YouTube. We analyze the temporal and topical
signature of users' sharing behaviour, showing how they exhibit distinct
behavioural patterns on different networks. We also examine cross-sharing
(i.e., the act of users broadcasting their activity to multiple OSNs
near-simultaneously), a previously-unstudied behaviour, and demonstrate how
certain OSNs play the roles of originating source and destination sinks. Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 2015. This is the pre-peer reviewed version and the
final version is available at
http://wing.comp.nus.edu.sg/publications/2015/lim-et-al-15.pd