4 research outputs found

    Truncated sequential Monte Carlo test with exact power.

    Monte Carlo hypothesis testing is extensively used for statistical inference. Surprisingly, despite the many theoretical advances in the field, the statistical power of Monte Carlo tests remains an open question. Because this claim may sound questionable to some, the first goal of this paper is to show that the power performance of truncated Monte Carlo tests is still an unsolved question. The second goal is to present a solution to this issue: we introduce a truncated sequential Monte Carlo procedure with statistical power arbitrarily close to the power of the theoretical exact test. The most significant contribution of this work is the validity of our method for the general case of any test statistic.
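    For orientation, a conventional fixed-size Monte Carlo test can be sketched as follows. This is not the truncated sequential procedure the abstract introduces, only the baseline kind of test it discusses. The function names, the toy null model (the mean of ten standard normal draws), and the choice m = 999 are assumptions made for illustration.

```python
import random


def monte_carlo_p_value(observed, simulate_null_statistic, m=999, rng=None):
    """Conventional fixed-m Monte Carlo p-value for an upper-tailed test.

    simulate_null_statistic(rng) must return one draw of the test
    statistic under the null hypothesis (hypothetical callback).
    """
    rng = rng or random.Random()
    # Count simulated statistics at least as extreme as the observed one.
    exceed = sum(1 for _ in range(m) if simulate_null_statistic(rng) >= observed)
    # The +1 terms count the observed value itself among the m + 1 values.
    return (exceed + 1) / (m + 1)


def null_mean_of_ten(rng):
    """Toy null model (assumption): mean of ten standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) for _ in range(10)) / 10


# An observed value far in the upper tail gives a small p-value;
# one far in the lower tail gives a p-value of 1.
p_small = monte_carlo_p_value(2.0, null_mean_of_ten, m=999, rng=random.Random(0))
p_large = monte_carlo_p_value(-2.0, null_mean_of_ten, m=999, rng=random.Random(0))
```

    With m simulations the smallest attainable p-value is 1/(m + 1), which is one reason the number of simulations matters for power.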

    A critical look at prospective surveillance using a scan statistic.

    The scan statistic is a very popular surveillance technique for purely spatial, purely temporal, and spatio-temporal disease data. It was extended to the prospective surveillance case, and it has been applied quite extensively in this situation. When the usual signal rules, such as those implemented in the SaTScan™ (Boston, MA, USA) software, are used, we show that the scan statistic method is not appropriate for the prospective case. The reason is that it does not adjust properly for the sequential and repeated tests carried out during the surveillance. We demonstrate that the nominal significance level α is not meaningful and that there is no relationship between α and the recurrence interval or the average run length (ARL). In some cases, the ARL may be equal to ∞, which makes the method ineffective. This lack of control of the type-I error probability and of the ARL leads us to strongly oppose the use of the scan statistic with the usual signal rules in the prospective context.

    Exploring multiple evidence to infer users' location in Twitter.

    Online social networks are valuable sources of information to monitor real-time events, such as earthquakes and epidemics. For this type of surveillance, users' location is an essential piece of information, but a substantial number of users choose not to disclose their geographical location. However, characteristics of the users' behavior, such as the friends they associate with and the types of messages they publish, may hint at their spatial location. In this paper, we propose a method to infer the spatial location of Twitter users. Unlike the approaches proposed so far, it incorporates two sources of information to learn geographical position: the text posted by users and their friendship network. We propose a probabilistic approach that jointly models the geographical labels and Twitter texts of users organized in the form of a graph representing the friendship network. We use the Markov random field probability model to represent the network, and learning is carried out through a Markov chain Monte Carlo simulation technique to approximate the posterior probability distribution of the missing geographical labels. We show the accuracy of the algorithm on a large dataset of Twitter users, where the ground truth is the location given by GPS. The method presents promising results, with little sensitivity to parameters and high values of precision.

    In search of a stochastic model for the E-News Reader.

    E-news readers increasingly have at their disposal a broad set of news articles to read. Online newspaper sites use recommender systems to predict and offer relevant articles to their users. Typically, these recommender systems do not leverage users' reading behavior. If we know how the topics read change during a reading session, we may arrive at fine-tuned recommendations. For example, after a user has read a certain number of sports items, it may be counter-productive to keep recommending other sports news. The motivation for this article is the assumption that understanding user behavior when reading successive online news articles can help in developing better recommender systems. We propose five categories of stochastic models to describe this behavior, depending on how the previous reading history affects the future choices of topics. We instantiated these five classes with many different stochastic processes covering short-term memory, revealed-preference, cumulative advantage, and geometric sojourn models. Our empirical study is based on large datasets of E-news from two online newspapers. We collected data from more than 13 million users who generated more than 23 million reading sessions, each one composed of the successive clicks of the users on the posted news. We reduce each user session to the sequence of news topics read. The models were fitted and compared using the Akaike Information Criterion and the Brier Score. We found that the best models are those in which the user moves through topics influenced only by their most recent readings. Our models were also better at predicting the next reading than the recommender systems currently used by these newspapers, showing that our models can improve user satisfaction.
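    As a rough illustration of the two comparison criteria named in the abstract, the sketch below fits a first-order Markov chain over topics (one possible short-memory model, not necessarily one of the five classes proposed by the authors) and defines the Akaike Information Criterion and a multi-category Brier score. The toy sessions, topic names, and function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_first_order(sessions):
    """Estimate P(next topic | previous topic) from topic sequences."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {topic: c / sum(ctr.values()) for topic, c in ctr.items()}
        for prev, ctr in counts.items()
    }


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def brier_score(forecasts, outcomes, topics):
    """Multi-category Brier score: mean squared distance between the
    forecast distribution and the one-hot observed topic (lower is better)."""
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        total += sum(
            (probs.get(t, 0.0) - (1.0 if t == actual else 0.0)) ** 2
            for t in topics
        )
    return total / len(outcomes)


# Hypothetical toy sessions, each reduced to its sequence of topics.
sessions = [["sports", "sports", "politics"], ["politics", "sports"]]
model = fit_first_order(sessions)

topics = ["sports", "politics"]
perfect = brier_score([{"sports": 1.0}], ["sports"], topics)
uniform = brier_score([{"sports": 0.5, "politics": 0.5}], ["sports"], topics)
```

    A forecast that puts all probability on the topic actually read scores 0, while a uniform forecast over two topics scores 0.5, so lower Brier scores indicate better next-topic prediction.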