Analysis and Forecasting of Trending Topics in Online Media Streams
Among the vast information available on the web, social media streams capture
what people currently pay attention to and how they feel about certain topics.
Awareness of such trending topics plays a crucial role in multimedia systems
such as trend-aware recommendation and automatic vocabulary selection for
video concept detection systems.
Correctly utilizing trending topics requires a better understanding of their
various characteristics in different social media streams. To this end, we
present the first comprehensive study across three major online and social
media streams, Twitter, Google, and Wikipedia, covering thousands of trending
topics during an observation period of an entire year. Our results indicate
that, depending on one's requirements, one does not necessarily have to turn
to Twitter for information about current events, and that some media streams
strongly emphasize content of specific categories. As our second key
contribution, we further present a novel approach for the challenging task of
forecasting the life cycle of trending topics in the very moment they emerge.
Our fully automated approach is based on a nearest neighbor forecasting
technique exploiting our assumption that semantically similar topics exhibit
similar behavior.
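The nearest-neighbor idea above can be sketched as follows. The embeddings, the use of cosine similarity, and the toy data are our assumptions for illustration, not the paper's actual features or similarity measure:

```python
import numpy as np

def knn_forecast(query_vec, neighbor_vecs, neighbor_series, k=3):
    """Forecast a new topic's view curve as the mean of the view
    curves of its k most semantically similar known topics.
    Cosine similarity over topic embeddings is assumed here."""
    sims = neighbor_vecs @ query_vec / (
        np.linalg.norm(neighbor_vecs, axis=1) * np.linalg.norm(query_vec))
    top = np.argsort(sims)[-k:]               # indices of k nearest topics
    return neighbor_series[top].mean(axis=0)  # average their life cycles

# toy example: 4 known topics with hypothetical 8-d embeddings
# and 14-day page-view curves
rng = np.random.default_rng(0)
vecs = rng.normal(size=(4, 8))
series = rng.poisson(1000, size=(4, 14)).astype(float)
forecast = knn_forecast(vecs[0] + 0.01 * rng.normal(size=8), vecs, series)
print(forecast.shape)
```

A query topic is matched against known topics purely by semantic similarity, so a forecast is available the moment the topic emerges, before any of its own view history exists.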
We demonstrate on a large-scale dataset of Wikipedia page view statistics
that forecasts by the proposed approach are about 9-48k views closer to the
actual viewing statistics than those of baseline methods, and achieve a mean
average percentage error of 45-19% for time periods of up to 14 days.
Comment: ACM Multimedia 201
Scalable Bayesian modeling, monitoring and analysis of dynamic network flow data
Traffic flow count data in networks arise in many applications, such as
automobile or aviation transportation, certain directed social network
contexts, and Internet studies. Using an example of Internet browser traffic
flow through site-segments of an international news website, we present
Bayesian analyses of two linked classes of models which, in tandem, allow fast,
scalable and interpretable Bayesian inference. We first develop flexible
state-space models for streaming count data, able to adaptively characterize
and quantify network dynamics efficiently in real-time. We then use these
models as emulators of more structured, time-varying gravity models that allow
formal dissection of network dynamics. This yields interpretable inferences on
traffic flow characteristics, and on dynamics in interactions among network
nodes. Bayesian monitoring theory defines a strategy for sequential model
assessment and adaptation in cases when network flow data deviates from
model-based predictions. Exploratory and sequential monitoring analyses of
evolving traffic on a network of web site-segments in e-commerce demonstrate
the utility of this coupled Bayesian emulation approach to analysis of
streaming network count data.
Comment: 29 pages, 16 figures
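The paper's state-space models are considerably richer than this, but the flavor of sequential Bayesian updating on a streaming count series can be illustrated with a minimal discount-based Poisson-gamma filter. The discount factor and conjugate update are standard Bayesian dynamic-model devices, not the authors' exact specification:

```python
def poisson_gamma_filter(counts, a0=1.0, b0=1.0, delta=0.95):
    """Sequentially update a Gamma(a, b) belief about a Poisson rate.
    Between observations the belief is discounted (a and b scaled by
    delta < 1), which inflates uncertainty and lets the rate drift.
    Returns the filtered posterior mean a/b after each count."""
    a, b = a0, b0
    means = []
    for y in counts:
        a, b = delta * a, delta * b   # evolution: discount old information
        a, b = a + y, b + 1.0         # conjugate Poisson-gamma update
        means.append(a / b)
    return means

# a flow rate that jumps mid-stream; the filtered mean tracks the change
rates = poisson_gamma_filter([5, 7, 6, 40, 42, 41])
print([round(r, 1) for r in rates])
```

Because each update is a constant-time conjugate step, this kind of filter scales to many parallel node-pair flows, which is the property the abstract's "fast, scalable" claim turns on.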
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with r² up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
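The modeling step described above can be sketched as an ordinary least-squares fit of case counts on article access counts. The article set and all numbers below are fabricated purely for illustration:

```python
import numpy as np

# toy weekly data: access counts for two Wikipedia articles (features)
# versus reported case counts (target); all values are fabricated
views = np.array([[120, 30], [340, 80], [560, 150],
                  [410, 100], [200, 55], [90, 20]], dtype=float)
cases = np.array([14, 40, 66, 49, 25, 11], dtype=float)

X = np.column_stack([np.ones(len(views)), views])   # add intercept column
beta, *_ = np.linalg.lstsq(X, cases, rcond=None)    # OLS fit
pred = X @ beta

# coefficient of determination, the goodness-of-fit metric quoted above
ss_res = np.sum((cases - pred) ** 2)
ss_tot = np.sum((cases - cases.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))
```

Shifting the view columns backward in time relative to the case counts turns the same fit into a forecasting model, which is how lead times of up to several weeks can be evaluated.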
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarity
AUGUR: Forecasting the Emergence of New Research Topics
Being able to rapidly recognise new research trends is strategic for many stakeholders, including universities, institutional funding bodies, academic publishers and companies. The literature presents several approaches to identifying the emergence of new research topics, which rely on the assumption that the topic is already exhibiting a certain degree of popularity and is consistently referred to by a community of researchers. However, detecting the emergence of a new research area at an embryonic stage, i.e., before the topic has been consistently labelled by a community of researchers and associated with a number of publications, is still an open challenge. We address this issue by introducing Augur, a novel approach to the early detection of research topics. Augur analyses the diachronic relationships between research areas and is able to detect clusters of topics that exhibit dynamics correlated with the emergence of new research topics. Here we also present the Advanced Clique Percolation Method (ACPM), a new community detection algorithm developed specifically for supporting this task. Augur was evaluated on a gold standard of 1,408 debutant topics in the 2000-2011 interval and outperformed four alternative approaches in terms of both precision and recall.
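The classic clique percolation idea that ACPM builds on can be sketched in a few lines: k-cliques that share k-1 nodes are merged into one community. This is only the standard method for illustration, not the paper's ACPM extension, and the toy graph is ours:

```python
from itertools import combinations

def k_clique_communities(edges, k=3):
    """Basic clique percolation: find all k-cliques, then merge any
    two cliques that overlap in k-1 nodes into one community."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)
    cliques = [set(c) for c in combinations(nodes, k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # union-find over cliques that share k-1 nodes
    parent = list(range(len(cliques)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    comms = {}
    for i, c in enumerate(cliques):
        comms.setdefault(find(i), set()).update(c)
    return sorted(map(sorted, comms.values()))

# two triangles sharing an edge percolate into one community;
# an isolated triangle forms another
edges = [(1, 2), (2, 3), (1, 3), (2, 4), (3, 4),
         (5, 6), (6, 7), (5, 7)]
print(k_clique_communities(edges))  # → [[1, 2, 3, 4], [5, 6, 7]]
```

In the paper's setting the nodes would be research topics and the edges co-occurrence or semantic relationships, with ACPM adding the dynamics needed to flag embryonic topics.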
Weather, climate, and hydrologic forecasting for the US Southwest: A survey
As part of a regional integrated assessment of climate vulnerability, a survey was conducted from June 1998 to May 2000 of weather, climate, and hydrologic forecasts with coverage of the US Southwest and an emphasis on the Colorado River Basin. The survey addresses the types of forecasts that were issued, the organizations that provided them, and techniques used in their generation. It reflects discussions with key personnel from organizations involved in producing or issuing forecasts, providing data for making forecasts, or serving as a link for communicating forecasts. During the survey period, users faced a complex and constantly changing mix of forecast products available from a variety of sources. The abundance of forecasts was not matched in the provision of corresponding interpretive materials, documentation about how the forecasts were generated, or reviews of past performance. Potential existed for confusing experimental and research products with others that had undergone a thorough review process, including official products issued by the National Weather Service. Contrasts between the state of meteorologic and hydrologic forecasting were notable, especially in the former's greater operational flexibility and more rapid incorporation of new observations and research products. Greater attention should be given to forecast content and communication, including visualization, expression of probabilistic forecasts, and presentation of ancillary information. Regional climate models and the use of climate forecasts in water supply forecasting offer rapid improvements in predictive capabilities for the Southwest. Forecasts and production details should be archived, and publicly available forecasts should be accompanied by performance evaluations that are relevant to users.
Fostering collective intelligence education
New educational models are necessary to adapt learning environments to digitally shared communication and information. Collective intelligence is an emerging field that already has a significant impact in many areas and will have major implications for education, not only as a source of new methodologies but also as a challenge for education itself. This paper proposes an approach to a collective-intelligence model of teaching that uses the Internet to combine two strategies: idea management and real-time assessment in class. A digital tool named Fabricius has been created to support these two elements and foster the collaboration and engagement of students in the learning process. As a result of the research we propose a list of KPIs that attempt to measure individual and collective performance. We are conscious that this is just a first approach to defining which aspects of a class following a course can be qualified and quantified.