DancingLines: An Analytical Scheme to Depict Cross-Platform Event Popularity
Events today typically break and spread online through multiple
media, such as social networks and search engines. Various studies
discuss event dissemination trends on individual media, while
few analyze event popularity from a cross-platform
perspective. Challenges come from the vast diversity of events and media,
limited access to aligned datasets across different media, and a great deal of
noise in the datasets. In this paper, we design DancingLines, an innovative
scheme that captures and quantitatively analyzes event popularity between
pairwise text media. It contains two models: TF-SW, a semantic-aware popularity
quantification model, based on an integrated weight coefficient leveraging
Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series
alignment model matching different event phases adapted from Dynamic Time
Warping. We also propose three metrics to interpret event popularity trends
between pairwise social platforms. Experimental results on eighteen real-world
event datasets from an influential social network and a popular search engine
validate the effectiveness and applicability of our scheme. DancingLines is
shown to have broad application potential for discovering
knowledge about various aspects of events and different media.
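For orientation, the alignment idea behind Dynamic Time Warping can be sketched as below. This is plain, textbook DTW with an absolute-difference local cost, not the wDTW-CD model described in the abstract; the function name `dtw_distance` and the toy series are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    Local cost is |x - y|; this is the textbook recurrence, not wDTW-CD.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two toy popularity curves: the second holds its middle value longer.
platform_a = [1, 2, 3]
platform_b = [1, 2, 2, 3]
print(dtw_distance(platform_a, platform_b))
```

Because warping lets one point match several points of the other series, two curves with the same shape but different pacing can align at low cost, which is the property that makes DTW-style methods attractive for comparing event phases across platforms.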
Extreme value prediction: an application to sport records
Extreme value theory studies the extreme deviations from the central
portion of a probability distribution.
Results in this field have considerable importance in assessing the risk
that characterises rare events, such as collapse of the stock market,
or earthquakes of exceptional intensity, or floods.
In recent years, the application of extreme value theory to the prediction
of sport records has received increased interest from the scientific community.
In this work we address the problem of constructing prediction limits for series
of extreme values coming from sport data.
We propose the use of a calibration procedure applied to the generalised
extreme value distribution, in order to obtain a proper predictive distribution
for future records.
The calibrated procedure is applied to series of real data related
to sport records. In particular, we consider sequences of annual maxima
for different athletic events.
Using the proposed calibrated predictive distribution, we show how to correctly predict
the probability of future records and we discuss the existence and
interpretation of ultimate records.
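As background for the question of ultimate records, the standard form of the generalised extreme value distribution function is recalled here. The symbols $\mu$ (location), $\sigma$ (scale) and $\xi$ (shape) follow the usual textbook convention and are not taken from the abstract.

```latex
\[
  G(x;\mu,\sigma,\xi)
  = \exp\left\{ -\left[ 1 + \xi\,\frac{x-\mu}{\sigma} \right]^{-1/\xi} \right\},
  \qquad 1 + \xi\,\frac{x-\mu}{\sigma} > 0,\quad \sigma > 0,\ \xi \neq 0 .
\]
% For \xi < 0 the support is bounded above, with finite upper endpoint
\[
  x^{*} = \mu - \frac{\sigma}{\xi} \qquad (\xi < 0).
\]
```

Under this parametrisation, a negative shape parameter implies a finite upper endpoint, which is one common way to read the notion of an ultimate record for annual maxima.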
Simultaneous calibrated prediction intervals for time series
This paper deals with simultaneous prediction for time series models. In
particular, it presents a simple procedure which gives well-calibrated simultaneous
predictive intervals with coverage probability equal to or close to the target nominal
value. Although the exact computation of the proposed intervals is usually not feasible,
an approximation can be easily obtained by means of a suitable bootstrap simulation
procedure. This new predictive solution is much simpler to compute than
those already proposed in the literature based on asymptotic calculations. An
application of the bootstrap calibrated procedure to first order autoregressive models
is presented.
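To make the idea of bootstrap calibration concrete, the following is a minimal sketch for a zero-mean AR(1) model with Gaussian errors. It is not the paper's exact procedure: it uses a parametric bootstrap to pick a single multiplier for the half-widths of a simultaneous band, and all function names (`fit_ar1`, `simulate_ar1`, `forecast_sd`, `calibrated_band`) and defaults are hypothetical.

```python
import math
import random


def fit_ar1(x):
    """Least-squares fit of x[t] = phi * x[t-1] + e[t] (zero mean assumed)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    phi = num / den
    resid = [x[t] - phi * x[t - 1] for t in range(1, len(x))]
    sigma = math.sqrt(sum(r * r for r in resid) / len(resid))
    return phi, sigma


def simulate_ar1(phi, sigma, n, rng, start=0.0):
    """Simulate n steps of a Gaussian AR(1) path continuing from `start`."""
    path, prev = [], start
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        path.append(prev)
    return path


def forecast_sd(phi, sigma, h):
    """Std. dev. of the h-step forecast error when parameters are known."""
    return sigma * math.sqrt(sum(phi ** (2 * k) for k in range(h)))


def calibrated_band(x, horizon=3, level=0.9, n_boot=500, seed=0):
    """Simultaneous band whose multiplier q is calibrated by bootstrap."""
    rng = random.Random(seed)
    phi, sigma = fit_ar1(x)
    stats = []
    for _ in range(n_boot):
        # Treat the fitted model as the truth: draw a sample and its future.
        xb = simulate_ar1(phi, sigma, len(x), rng)
        future = simulate_ar1(phi, sigma, horizon, rng, start=xb[-1])
        phi_b, sigma_b = fit_ar1(xb)
        # Largest standardized miss of the plug-in forecast over the horizon.
        stats.append(max(
            abs(future[h - 1] - phi_b ** h * xb[-1])
            / forecast_sd(phi_b, sigma_b, h)
            for h in range(1, horizon + 1)
        ))
    stats.sort()
    q = stats[min(n_boot - 1, math.ceil(level * n_boot) - 1)]
    band = []
    for h in range(1, horizon + 1):
        centre = phi ** h * x[-1]
        half = q * forecast_sd(phi, sigma, h)
        band.append((centre - half, centre + half))
    return band


data = simulate_ar1(0.5, 1.0, 200, random.Random(1))
for lower, upper in calibrated_band(data, n_boot=200):
    print(round(lower, 2), round(upper, 2))
```

Because the multiplier is taken from the bootstrap distribution of the maximum standardized error over the whole horizon, the band targets joint coverage of the future path while also reflecting parameter estimation error, which a plug-in normal quantile would ignore.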
Robust prediction limits based on M-estimators
We discuss a robust solution to the problem of prediction. Extending Barndorff-Nielsen and Cox [1996. Prediction and asymptotics. Bernoulli 2, 319-340] and Vidoni [1998. A note on modified estimative prediction limits and distributions. Biometrika 85, 949-953], we propose improved prediction limits based on M-estimators. To compute them, the expressions of the bias and variance of an M-estimator are required. In view of this, a general asymptotic approximation for the bias of an M-estimator is derived. Moreover, by means of comparative studies in the context of affine transformation models, we show that the proposed robust procedure for prediction can be successfully used in a parametric setting.
Keywords: Bias; Influence function; Prediction; Robustness; Scale-regression model
A Characterization of Monotone and Regular Divergences
Keywords: Differential geometry; divergence; embedding invariance; Markov embedding; α-connection
Preprint submitted for publication in a scientific journal: Annals of the Institute of Statistical Mathematics, (1998), volume 50, no. 3, pp. 433–450. [http://doi.org/10.1023/A:1003569210573]
In this paper we characterize the local structure of monotone and regular divergences, which include f-divergences as a particular case, by giving their Taylor expansion up to fourth order. We extend a previous result obtained by Čencov, using the invariance properties of Amari's α-connections.
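For readers unfamiliar with the term, the usual definition of an f-divergence is recalled below. The notation ($p$, $q$ for densities with respect to a dominating measure $\mu$) is the standard textbook one and is an assumption here, not taken from the abstract.

```latex
\[
  D_f(p \,\|\, q) = \int q(x)\, f\!\left( \frac{p(x)}{q(x)} \right) \mathrm{d}\mu(x),
  \qquad f \text{ convex},\quad f(1) = 0 .
\]
% Example: f(t) = t \log t recovers the Kullback-Leibler divergence
\[
  D_{\mathrm{KL}}(p \,\|\, q) = \int p(x)\, \log \frac{p(x)}{q(x)} \,\mathrm{d}\mu(x).
\]
```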
A study on microblog and search engine user behaviors: how Twitter trending topics help predict Google hot queries
Once every five minutes, Twitter publishes a list of trending topics by monitoring and analyzing tweets from its users. Similarly, Google makes available hourly a list of hot queries that have been issued to the search engine. We claim that social trends sparked on Twitter may help explain and predict web trends derived from Google. Indeed, we argue that information flowing in near real time across the Twitter social network could anticipate the set of topics that users will later search on the Web. In this work, we analyze the time series derived from the daily volume index of each trend, from either Twitter or Google.
Our study on a real-world dataset reveals that about 26% of the trending topics arising from Twitter "as is" are also found as hot queries issued to Google.
Also, we find that about 72% of the similar trends appear first on Twitter. Thus, we assess the relation between comparable Twitter and Google trends by testing three classes of time series regression models.
First, we find that Google on its own is not able to effectively predict the time behavior of its trends. Indeed, we show that autoregressive models, which try to fit time series of Google trends, perform poorly.
On the other hand, we validate the forecasting power of Twitter by showing that models that use Google as the dependent variable and Twitter as the explanatory variable retain the past values of Twitter as significant 60% of the time. Moreover, we discover that a Twitter trend causes a similar Google trend to occur later about 43% of the time. In the end, we show that
the very best-performing models are those using past values of both Twitter and Google.
On the relationships between α-connections and the asymptotic properties of predictive distributions
Preprint submitted for publication in a scientific journal: Bernoulli, 1999, vol. 5, no. 1, pp. 163-176. [http://projecteuclid.org/euclid.bj/1173707099]
In a recent paper Komaki studies the second-order asymptotic properties
of predictive distributions, using the Kullback-Leibler divergence
as a loss function. He shows that estimative distributions with asymptotically
efficient estimators can be improved by predictive distributions
that do not belong to the model. The model is assumed to be a multidimensional
curved exponential family. In this paper we generalize the
result, taking any f-divergence as the loss function. A relationship emerges
between the α-connections and the optimal predictive distributions.
In particular, when an α-divergence is used to measure the goodness of a predictive
distribution, the optimal shift of the estimative distribution is related
to α-covariant derivatives. The expression we obtain for the asymptotic
risk is also useful for studying the higher-order asymptotic properties of
an estimator in the mentioned class of loss functions.