Explaining Normal Quantile-Quantile Plots Through Animation: The Water-Filling Analogy

Author: Stine Robert A
Publication venue: ScholarlyCommons
Publication date: 01/01/2017

A normal quantile-quantile (QQ) plot is an important diagnostic for checking the assumption of normality. Though useful, these plots confuse students in my introductory statistics classes. A water-filling analogy, however, intuitively conveys the underlying concept. This analogy characterizes a QQ plot as a parametric plot of the water levels in two gradually filling vases. Each vase takes its shape from a probability distribution or sample. If the vases share a common shape, then the water levels match throughout the filling, and the QQ plot traces a diagonal line. The R package qqvases provides an interactive animation of this process and is suitable for classroom use.
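
As a rough illustration of the plot this abstract describes, the sketch below pairs each sorted observation with the matching standard normal quantile. It is written in Python rather than R and is not the qqvases package; the function name and the plotting positions (i - 0.5)/n are assumptions chosen for the example.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot."""
    n = len(sample)
    observed = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical sample; if the data are close to normal, the pairs fall near a straight line.
for theo, obs in normal_qq_points([4.1, 5.0, 5.3, 4.7, 6.2]):
    print(f"{theo:+.3f}  {obs:.1f}")
```

In the vase picture, each pair records the two water levels at the same fill fraction, one level per vase.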

ScholarlyCommons@Penn

The Francis Crick Institute

Model Selection Using Information Theory and the MDL Principle

Author: Stine Robert A
Publication venue: ScholarlyCommons
Publication date: 01/11/2004

Information theory offers a coherent, intuitive view of model selection. This perspective arises from thinking of a statistical model as a code, an algorithm for compressing data into a sequence of bits. The description length is the length of this code for the data plus the length of a description of the model itself. The length of the code for the data measures the fit of the model to the data, whereas the length of the code for the model measures its complexity. The minimum description length (MDL) principle picks the model with the smallest description length, balancing fit versus complexity. The conversion of a model into a code is flexible; one can represent a regression model, for example, with codes that reproduce the AIC and BIC as well as motivate other model selection criteria. Going further, information theory allows one to choose from among various types of non-nested models, such as tree-based models and regressions identified from different sets of predictors. A running example that compares several models for the well-known Boston housing data illustrates the ideas.
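
To make the two-part code concrete, here is a minimal sketch of one possible description length for a Gaussian regression. It assumes a BIC-style code (about half of log2(n) bits per parameter) and drops constants shared by all models; the function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def data_bits(rss, n):
    """Bits to encode the residuals under a Gaussian model with fitted
    variance rss / n, up to a constant shared by every candidate model."""
    return 0.5 * n * math.log2(rss / n)


def model_bits(k, n):
    """Bits to encode k parameters, assuming about (1/2) log2(n) bits each."""
    return 0.5 * k * math.log2(n)


def description_length(rss, n, k):
    """Two-part description length: fit term plus complexity term."""
    return data_bits(rss, n) + model_bits(k, n)


def pick_model(candidates, n):
    """candidates maps a model name to (rss, k); return the shortest code."""
    return min(
        candidates,
        key=lambda name: description_length(candidates[name][0], n, candidates[name][1]),
    )


# Hypothetical residual sums of squares for two regressions fit to n = 100 cases.
print(pick_model({"small": (120.0, 2), "large": (118.0, 10)}, n=100))
```

Because shared constants are dropped, the totals can be negative; only differences between candidate models matter when choosing among them.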

ScholarlyCommons@Penn

Autocovariance Structure of Markov Regime Switching Models and Model Selection

Authors: Stine Robert A; Zhang Jing
Publication venue: ScholarlyCommons
Publication date: 01/01/2001

We show that the covariance function of a second-order stationary vector Markov regime switching time series has a vector ARMA(p,q) representation, where upper bounds for p and q are elementary functions of the number of regimes. These bounds apply to vector Markov regime switching processes with both mean–variance and autoregressive switching. This result yields an easily computed method for setting a lower bound on the number of underlying Markov regimes from an estimated autocovariance function.

ScholarlyCommons@Penn

Alpha-Investing: A Procedure for Sequential Control of Expected False Discoveries

Authors: Foster Dean; Stine Robert A
Publication venue: ScholarlyCommons
Publication date: 01/04/2008

Alpha-investing is an adaptive, sequential methodology that encompasses a large family of procedures for testing multiple hypotheses. All of these procedures control mFDR, which is the ratio of the expected number of false rejections to the expected number of rejections. mFDR is a weaker criterion than FDR, which is the expected value of the ratio. We compensate for this weakness by showing that alpha-investing controls mFDR at every rejected hypothesis. Alpha-investing resembles alpha-spending used in sequential trials, but possesses a key difference. When a test rejects a null hypothesis, alpha-investing earns additional probability toward subsequent tests. Alpha-investing hence allows one to incorporate domain knowledge into the testing procedure and improve the power of the tests. In this way, alpha-investing enables the statistician to design a testing procedure for a specific problem while guaranteeing control of mFDR.
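
The earn-on-rejection idea can be sketched in a few lines. The abstract does not state the bookkeeping, so the details below are assumptions made for illustration: a rejection adds a fixed payout to the alpha-wealth, a failed test costs alpha / (1 - alpha), and each test bids a fixed fraction of the largest affordable level. All names and default values are hypothetical.

```python
def alpha_investing(p_values, initial_wealth=0.05, payout=0.05, spend_fraction=0.5):
    """Test hypotheses in order; return (indices rejected, final alpha-wealth)."""
    wealth = initial_wealth
    rejected = []
    for j, p in enumerate(p_values):
        if wealth <= 0:
            break  # nothing left to invest
        # wealth / (1 + wealth) is the largest level whose cost on failure,
        # alpha / (1 - alpha), does not exceed the current wealth; this
        # illustrative rule bids a fixed fraction of it.
        alpha = spend_fraction * wealth / (1.0 + wealth)
        if p <= alpha:
            rejected.append(j)
            wealth += payout  # a rejection earns wealth for later tests
        else:
            wealth -= alpha / (1.0 - alpha)  # a failed test pays for its level
    return rejected, wealth


# Hypothetical p-values, tested in the order given.
print(alpha_investing([0.001, 0.5, 0.01, 0.9]))
```

Reordering the hypotheses so that the most promising ones come first is one way domain knowledge could enter such a procedure, since early rejections fund later tests.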

ScholarlyCommons@Penn

Spatio-Temporal Low Count Processes with Application to Violent Crime Events

Authors: Aldor-Noiman Sivan; Brown Lawrence D.; Fox Emily B.; Stine Robert A.
Publication date: 20/04/2013

There is significant interest in being able to predict where crimes will happen, for example, to aid in the efficient tasking of police and other protective measures. We aim to model both the temporal and spatial dependencies often exhibited by violent crimes in order to make such predictions. The temporal variation of crimes typically follows patterns familiar in time series analysis, but the spatial patterns are irregular and do not vary smoothly across the area. Instead we find that spatially disjoint regions exhibit correlated crime patterns. It is this indeterminate inter-region correlation structure along with the low-count, discrete nature of counts of serious crimes that motivates our proposed forecasting tool. In particular, we propose to model the crime counts in each region using an integer-valued first order autoregressive process. We take a Bayesian nonparametric approach to flexibly discover a clustering of these region-specific time series. We then describe how to account for covariates within this framework. Both approaches adjust for seasonality. We demonstrate our approach through an analysis of weekly reported violent crimes in Washington, D.C. from 2001 to 2008. Our forecasts outperform standard methods while additionally providing useful tools such as prediction intervals.
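
For readers unfamiliar with integer-valued autoregressions, the sketch below simulates a single series from a textbook INAR(1) process: each of last period's events carries over with a fixed probability (binomial thinning), and new events arrive as Poisson counts. The Poisson innovations, the parameter names, and the values used are assumptions for illustration; the clustering, covariates, and seasonal adjustment described in the abstract are not modeled.

```python
import math
import random


def poisson(rng, rate):
    """Draw a Poisson count by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_inar1(n_steps, survive_prob, innovation_rate, x0=0, seed=0):
    """Simulate X_t = (binomial thinning of X_{t-1}) + Poisson(innovation_rate)."""
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        # Each of the x previous events survives independently with survive_prob.
        survivors = sum(rng.random() < survive_prob for _ in range(x))
        x = survivors + poisson(rng, innovation_rate)
        path.append(x)
    return path


# Hypothetical weekly counts for one region.
print(simulate_inar1(12, survive_prob=0.4, innovation_rate=1.5, seed=1))
```

Under these assumptions the stationary mean is innovation_rate / (1 - survive_prob), so the settings above give counts that hover around 2.5 per period.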

arXiv.org e-Print Archive

CiteSeerX

The Competitive Complexity Ratio

Authors: Foster Dean P; Stine Robert A
Publication venue: ScholarlyCommons
Publication date: 01/01/2000

The competitive complexity ratio is the worst case ratio of the regret of a data-driven model to that obtained by a model which benefits from side information. The side information bounds the sizes of unknown parameters. The ratio requires the use of a variation on parametric complexity, which we call the unconditional parametric complexity. We show that the optimal competitive complexity ratio is bounded and contrast this result with comparable results in statistics.

CiteSeerX

ScholarlyCommons@Penn

Risk Inflation of Sequential Tests Controlled by Alpha Investing

Authors: Foster Dean P; Stine Robert A
Publication venue: ScholarlyCommons
Publication date: 01/01/2015

Streaming feature selection is a greedy approach to variable selection that evaluates potential explanatory variables sequentially. It selects significant features as soon as they are discovered rather than testing them all and picking the best one. Because it is so greedy, streaming selection can rapidly explore large collections of features. If significance is defined by an alpha investing protocol, then the rate of false discoveries will be controlled. The focus of attention in variable selection, however, should be on fit rather than hypothesis testing. Yet little is known about the risk of estimators produced by streaming selection and how the configuration of these estimators influences the risk. To meet these needs, we provide a computational framework based on stochastic dynamic programming that allows fast calculation of the minimax risk of a sequential estimator relative to an alternative. The alternative can be data driven or derived from an oracle. This framework allows us to compute and contrast the risk inflation of sequential estimators derived from various alpha investing rules. We find that a universal investing rule performs well over a variety of models and that estimators allowed to have larger than conventional rates of false discoveries produce generally smaller risk.

ScholarlyCommons@Penn

Being Warren Buffett: A Classroom Simulation of Risk and Wealth When Investing in the Stock Market

Authors: Foster Dean P; Stine Robert A
Publication venue: ScholarlyCommons
Publication date: 01/01/2006

Students who are new to Statistics and its role in modern Finance have a hard time making the connection between variance and risk. To link these, we developed a classroom simulation in which groups of students roll dice that simulate the success of three investments. The simulated investments behave quite differently: one remains almost constant, another drifts slowly upward, and the third climbs to extremes or plummets. As the simulation proceeds, some groups have great success with this last investment – they become the “Warren Buffetts” of the class, accumulating far greater wealth than their classmates. For most groups, however, this last investment leads to ruin because of its volatility, the variance in its returns. The marked difference in outcomes surprises students who discover how hard it is to separate luck from skill. The simulation also demonstrates how portfolios, weighted combinations of investments, reduce the variance. Students discover that a mixture of two poor investments emerges as a surprising performer. After this experience, our students immediately associate financial volatility with variance. This lesson also introduces students to the history of the stock market in the US. We calibrated the returns on two simulated investments to mimic returns on US Treasury Bills and stocks.
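
The effect of volatility on long-run wealth can be checked without rolling any dice by enumerating the faces. The sketch below compares a volatile investment, a nearly constant one, and an equal mix rebalanced every round. The multipliers per die face are invented for this example and are not the values used in the classroom simulation.

```python
import math
from itertools import product
from statistics import mean, pvariance

# Hypothetical wealth multipliers for the six faces of each die.
VOLATILE = [0.1, 0.3, 1.0, 2.5, 2.5, 3.0]
STEADY = [0.95, 1.0, 1.0, 1.0, 1.0, 1.1]


def log_growth(multipliers):
    """Expected log multiplier per round; its sign gives the typical long-run trend."""
    return mean(math.log(m) for m in multipliers)


def mixed(a, b, weight=0.5):
    """Multipliers of a portfolio rebalanced each round, over all 36 joint rolls."""
    return [weight * x + (1 - weight) * y for x, y in product(a, b)]


MIX = mixed(VOLATILE, STEADY)

for name, mults in [("volatile", VOLATILE), ("steady", STEADY), ("mix", MIX)]:
    print(f"{name:8s} mean={mean(mults):.3f} "
          f"variance={pvariance(mults):.3f} log growth={log_growth(mults):+.3f}")
```

With these made-up numbers the volatile investment has the highest average return but a negative expected log growth, while the mix has a lower variance and a higher expected log growth than either component.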

ScholarlyCommons@Penn