4 research outputs found
Truncated sequential Monte Carlo test with exact power.
Monte Carlo hypothesis testing is extensively used for statistical
inference. Surprisingly, despite the many theoretical advances in the field,
the statistical power performance of Monte Carlo tests remains an open question.
Because this claim may sound questionable to some, the first goal of
this paper is to show that the power performance of truncated Monte Carlo
tests is still an unsolved question. The second goal is to present a solution
to this issue: we introduce a truncated sequential Monte Carlo
procedure with statistical power arbitrarily close to the power of the theoretical
exact test. The most significant contribution of this work is the validity of
our method for the general case of any test statistic.
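For readers unfamiliar with the terminology, the sketch below contrasts a conventional Monte Carlo p-value with a truncated sequential one. It is not the procedure proposed in the paper, whose details the abstract does not give; it follows the well-known sequential p-value in the style of Besag and Clifford. The function names and the parameters `max_sims` and `stop_after` are assumptions made for illustration.

```python
import random

def mc_p_value(observed, simulate, m, rng):
    """Conventional Monte Carlo p-value from a fixed number m of simulations."""
    exceed = sum(simulate(rng) >= observed for _ in range(m))
    return (exceed + 1) / (m + 1)

def sequential_mc_p_value(observed, simulate, max_sims, stop_after, rng):
    """Truncated sequential Monte Carlo p-value (illustrative sketch).

    Simulation stops as soon as `stop_after` simulated statistics are at
    least as large as the observed one, or after `max_sims` simulations,
    whichever comes first. Returns (p_value, simulations_used).
    """
    exceed = 0
    for n in range(1, max_sims + 1):
        if simulate(rng) >= observed:
            exceed += 1
            if exceed == stop_after:
                # Early stop: the observed value is clearly not extreme.
                return exceed / n, n
    # Truncation reached: fall back to the usual Monte Carlo estimate.
    return (exceed + 1) / (max_sims + 1), max_sims

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical null model: the test statistic is Uniform(0, 1).
    null_statistic = lambda r: r.random()
    print(mc_p_value(0.97, null_statistic, 999, rng))
    print(sequential_mc_p_value(0.40, null_statistic, 999, 10, rng))
```

The sequential version spends few simulations when the observed statistic is unremarkable and the full budget only when it is extreme, which is why the effect of truncation on power is the question of interest.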
A critical look at prospective surveillance using a scan statistic.
The scan statistic is a very popular surveillance technique for purely spatial, purely temporal, and spatial-temporal
disease data. It was extended to the prospective surveillance case, and it has been applied quite
extensively in this situation. When the usual signal rules, such as those implemented in SaTScan™ (Boston, MA, USA)
software, are used, we show that the scan statistic method is not appropriate for the prospective case. The reason
is that it does not adjust properly for the sequential and repeated tests carried out during the surveillance. We
demonstrate that the nominal significance level α is not meaningful and there is no relationship between α and the
recurrence interval or the average run length (ARL). In some cases, the ARL may be equal to ∞, which makes
the method ineffective. This lack of control of the type-I error probability and of the ARL leads us to strongly
oppose the use of the scan statistic with the usual signal rules in the prospective context.
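To see why unadjusted repeated testing matters, consider the idealized textbook case of independent tests, each run at level α. This is a simplifying assumption and not the scan statistic setting analysed in the paper, where the abstract states that no such relationship with α holds. The function names below are illustrative.

```python
def false_alarm_probability(alpha, n_tests):
    """Probability of at least one false alarm in n_tests independent
    tests, each carried out at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def arl_independent(alpha):
    """In-control average run length when every test is independent and
    signals with probability alpha (mean of a geometric distribution)."""
    return 1.0 / alpha

if __name__ == "__main__":
    alpha = 0.05
    for n_tests in (1, 20, 52):
        print(n_tests, round(false_alarm_probability(alpha, n_tests), 4))
    print("ARL under independence:", arl_independent(alpha))
```

Even in this idealized case the overall false alarm probability grows quickly with the number of looks at the data, so a nominal level attached to a single test says little about the behaviour of the whole surveillance run.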
Exploring multiple evidence to infer users' location in Twitter.
Online social networks are valuable sources of information to monitor real-time events,
such as earthquakes and epidemics. For this type of surveillance, users' location is an
essential piece of information, but a substantial number of users choose not to disclose their
geographical location. However, characteristics of the users' behavior, such as the friends they
associate with and the types of messages published, may hint at their spatial location. In this
paper, we propose a method to infer the spatial location of Twitter users. Unlike the approaches
proposed so far, it incorporates two sources of information to learn geographical position: the
text posted by users and their friendship network. We propose a probabilistic approach that
jointly models the geographical labels and Twitter texts of users organized in the form of a graph
representing the friendship network. We use the Markov random field probability model to represent
the network, and learning is carried out through a Markov Chain Monte Carlo simulation technique
to approximate the posterior probability distribution of the missing geographical labels. We show
the accuracy of the algorithm on a large dataset of Twitter users, where the ground truth is the
location given by GPS. The method presents promising results, with little
sensitivity to parameters and high values of precision.
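The general idea of inferring missing labels on a friendship graph by Markov Chain Monte Carlo can be sketched with a small Gibbs sampler. This is a minimal sketch under stated assumptions, not the model from the paper: it assumes a Potts-type Markov random field in which friends tend to share a location, with an interaction strength `beta`, and an optional per-user text log-likelihood `text_loglik`. All names and parameters here are hypothetical.

```python
import math
import random
from collections import Counter

def gibbs_infer_labels(nodes, edges, known, labels, beta=1.0,
                       text_loglik=None, sweeps=500, burn_in=100, seed=0):
    """Approximate posterior marginals of missing node labels.

    known:       dict node -> observed label (e.g. from GPS)
    text_loglik: optional dict node -> {label: log-likelihood of the
                 node's text under that label}; treated as 0 if absent
    Returns dict node -> {label: estimated posterior probability}.
    """
    rng = random.Random(seed)
    text = text_loglik or {}
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Observed labels stay fixed; missing ones start at random.
    state = dict(known)
    unknown = [v for v in nodes if v not in known]
    for v in unknown:
        state[v] = rng.choice(labels)

    counts = {v: Counter() for v in unknown}
    for sweep in range(sweeps):
        for v in unknown:
            # Full conditional: agreement with neighbors plus text evidence.
            weights = []
            for lab in labels:
                agree = sum(state[u] == lab for u in neighbors[v])
                weights.append(math.exp(beta * agree
                                        + text.get(v, {}).get(lab, 0.0)))
            state[v] = rng.choices(labels, weights=weights)[0]
        if sweep >= burn_in:
            for v in unknown:
                counts[v][state[v]] += 1

    kept = sweeps - burn_in
    return {v: {lab: counts[v][lab] / kept for lab in labels}
            for v in unknown}

if __name__ == "__main__":
    # Toy graph: user "c" has two friends whose location is known.
    posterior = gibbs_infer_labels(
        nodes=["a", "b", "c"], edges=[("a", "c"), ("b", "c")],
        known={"a": "NY", "b": "NY"}, labels=["NY", "LA"], beta=2.0)
    print(posterior)
```

In this toy graph the unlabeled user inherits the location of both friends with high posterior probability; supplying `text_loglik` lets text evidence reinforce or override the network signal.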
In search of a stochastic model for the E-News Reader.
E-news readers increasingly have at their disposal a broad set of news articles to read. Online newspaper sites use recommender systems to predict and offer relevant articles to their users. Typically, these recommender systems do not leverage users' reading behavior. If we know how the topics read change during a reading session, we may arrive at fine-tuned recommendations; for example, after a user has read a certain number of sports items, it may be counter-productive to keep recommending other sports news. The motivation for this article is the assumption that understanding user behavior when reading successive online news articles can help in developing better recommender systems. We propose five categories of stochastic models to describe this behavior, depending on how the previous reading history affects the future choices of topics. We instantiated these five classes with many different stochastic processes covering short-term memory, revealed-preference, cumulative advantage, and geometric sojourn models. Our empirical study is based on large datasets of E-news from two online newspapers. We collected data from more than 13 million users who generated more than 23 million reading sessions, each composed of the successive clicks of the user on the posted news. We reduce each user session to the sequence of news topics read. The models were fitted and compared using the Akaike Information Criterion and the Brier Score. We found that the best models are those in which the user moves through topics influenced only by their most recent readings. Our models were also better at predicting the next reading than the recommender systems currently used by these newspapers, showing that our models can improve user satisfaction.
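As an illustration of a short-term memory model of this kind, the sketch below fits a first-order Markov chain over topics and scores it with the Akaike Information Criterion and a multi-class Brier score. It is a minimal sketch, not the authors' models or data: the add-one `smoothing` parameter, the function names, and the toy sessions are assumptions, and AIC strictly applies to a maximum-likelihood fit, which the smoothed estimate only approximates.

```python
import math
from collections import Counter, defaultdict

def fit_first_order(sessions, topics, smoothing=1.0):
    """Estimate P(next topic | current topic) with additive smoothing."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in topics:
        total = sum(counts[prev].values()) + smoothing * len(topics)
        probs[prev] = {t: (counts[prev][t] + smoothing) / total
                       for t in topics}
    return probs

def log_likelihood(sessions, probs):
    """Log-likelihood of all observed topic transitions."""
    return sum(math.log(probs[prev][nxt])
               for session in sessions
               for prev, nxt in zip(session, session[1:]))

def aic(loglik, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * loglik

def brier_score(sessions, probs, topics):
    """Mean squared error between predicted next-topic probabilities
    and the one-hot encoding of the topic actually read next."""
    total, n = 0.0, 0
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            total += sum((probs[prev][t] - (1.0 if t == nxt else 0.0)) ** 2
                         for t in topics)
            n += 1
    return total / n

if __name__ == "__main__":
    topics = ["sports", "politics"]
    sessions = [["sports", "sports", "politics"], ["politics", "sports"]]
    probs = fit_first_order(sessions, topics)
    k = len(topics)
    # A first-order chain over k topics has k * (k - 1) free parameters.
    print("AIC:", aic(log_likelihood(sessions, probs), k * (k - 1)))
    print("Brier:", brier_score(sessions, probs, topics))
```

Competing models, for example ones with longer memory, would be fitted on the same sessions and compared by the same two scores, with lower values preferred for both.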