
    Alpha-Investing: A Procedure for Sequential Control of Expected False Discoveries

    Alpha-investing is an adaptive, sequential methodology that encompasses a large family of procedures for testing multiple hypotheses. All control mFDR, which is the ratio of the expected number of false rejections to the expected number of rejections. mFDR is a weaker criterion than FDR, which is the expected value of the ratio. We compensate for this weakness by showing that alpha-investing controls mFDR at every rejected hypothesis. Alpha-investing resembles alpha-spending used in sequential trials, but possesses a key difference. When a test rejects a null hypothesis, alpha-investing earns additional probability toward subsequent tests. Alpha-investing hence allows one to incorporate domain knowledge into the testing procedure and improve the power of the tests. In this way, alpha-investing enables the statistician to design a testing procedure for a specific problem while guaranteeing control of mFDR.
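    A minimal sketch of one alpha-investing rule may help fix ideas. It uses the convention that a rejection earns a payout omega of alpha-wealth while a failure to reject costs alpha_j/(1 - alpha_j); the fixed-fraction spending rule and the starting wealth below are illustrative assumptions, not the paper's recommended policy.

```python
# Illustrative alpha-investing sketch (not the authors' reference implementation).
# A rejection earns a payout `omega`; a failed test costs alpha_j / (1 - alpha_j).

def alpha_investing(p_values, wealth=0.05, omega=0.05, spend_frac=0.5):
    """Return indices of rejected hypotheses for a stream of p-values."""
    rejections = []
    for j, p in enumerate(p_values):
        if wealth <= 0:
            break
        # Spend a fixed fraction of current wealth, capped so the potential
        # loss alpha/(1 - alpha) never exceeds the remaining wealth.
        alpha_j = min(spend_frac * wealth, wealth / (1.0 + wealth))
        if p <= alpha_j:
            rejections.append(j)
            wealth += omega                      # earn alpha-wealth on rejection
        else:
            wealth -= alpha_j / (1.0 - alpha_j)  # pay for the failed test
    return rejections

# Example: early rejections replenish the wealth available for later tests.
print(alpha_investing([0.001, 0.2, 0.01, 0.8, 0.03, 0.5]))
```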

    Perdido Key Beach nourishment project: Gulf Islands National Seashore 1991 annual report

    This report is the second annual report in a continuing series documenting a field project within the Gulf Islands National Seashore at Perdido Key, Florida. The field project includes the monitoring of a number of physical parameters related to the evolution of the Perdido Key beach nourishment project. Approximately 4.1 million m³ of dredge spoil from Pensacola Pass was placed along approximately 7 km of the Gulf of Mexico beaches of Perdido Key between November 1989 and September 1990. Beach profile data describing the evolution of the nourished beach are included, as well as wave, current, tide, wind, temperature, and rainfall data to describe the forces influencing the evolution. Data describing the sediment sizes throughout the project area are also included. A brief discussion of the data is included; a more detailed analysis and interpretation will be presented in the lead author's Ph.D. dissertation. (313 pp.)

    Risk Inflation of Sequential Tests Controlled by Alpha Investing

    Streaming feature selection is a greedy approach to variable selection that evaluates potential explanatory variables sequentially. It selects significant features as soon as they are discovered rather than testing them all and picking the best one. Because it is so greedy, streaming selection can rapidly explore large collections of features. If significance is defined by an alpha investing protocol, then the rate of false discoveries will be controlled. The focus of attention in variable selection, however, should be on fit rather than hypothesis testing. Little is known about the risk of estimators produced by streaming selection or about how the configuration of these estimators influences that risk. To meet these needs, we provide a computational framework based on stochastic dynamic programming that allows fast calculation of the minimax risk of a sequential estimator relative to an alternative. The alternative can be data driven or derived from an oracle. This framework allows us to compute and contrast the risk inflation of sequential estimators derived from various alpha investing rules. We find that a universal investing rule performs well over a variety of models and that estimators allowed to have larger than conventional rates of false discoveries produce generally smaller risk.
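    The greedy, accept-as-you-go character of streaming selection can be sketched as follows. Testing each candidate against the current residuals and the simple wealth-spending rule are simplifications assumed for illustration; they are not the paper's estimators or its investing rules (such as the universal rule).

```python
# Hedged sketch: streaming feature selection gated by an alpha-investing wealth
# protocol. Each candidate is tested by regressing the current residuals on it,
# and is kept immediately if its p-value clears the level bought with wealth.

import numpy as np
from scipy import stats

def streaming_select(X, y, wealth=0.25, omega=0.05):
    n, m = X.shape
    resid = y - y.mean()
    selected = []
    for j in range(m):
        if wealth <= 0:
            break
        alpha_j = wealth / (1.0 + wealth) / 2     # spend a capped share of wealth
        x = X[:, j] - X[:, j].mean()
        beta = x @ resid / (x @ x)
        new_resid = resid - beta * x
        dof = n - 2 - len(selected)               # rough residual degrees of freedom
        se = np.sqrt((new_resid @ new_resid) / dof / (x @ x))
        p = 2 * stats.t.sf(abs(beta / se), dof)   # two-sided t-test for the new term
        if p <= alpha_j:                          # keep the feature as soon as found
            selected.append(j)
            resid = new_resid
            wealth += omega
        else:
            wealth -= alpha_j / (1.0 - alpha_j)
    return selected

# Toy usage: two true signal columns hidden among noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 2 * X[:, 3] - X[:, 10] + rng.normal(size=200)
print(streaming_select(X, y))   # typically recovers columns 3 and 10
```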

    Being Warren Buffett: A Classroom Simulation of Risk and Wealth When Investing in the Stock Market

    Students who are new to Statistics and its role in modern Finance have a hard time making the connection between variance and risk. To link these, we developed a classroom simulation in which groups of students roll dice that simulate the success of three investments. The simulated investments behave quite differently: one remains almost constant, another drifts slowly upward, and the third climbs to extremes or plummets. As the simulation proceeds, some groups have great success with this last investment: they become the “Warren Buffetts” of the class, accumulating far greater wealth than their classmates. For most groups, however, this last investment leads to ruin because of its volatility, the variance in its returns. The marked difference in outcomes surprises students who discover how hard it is to separate luck from skill. The simulation also demonstrates how portfolios, weighted combinations of investments, reduce the variance. Students discover that a mixture of two poor investments emerges as a surprising performer. After this experience, our students immediately associate financial volatility with variance. This lesson also introduces students to the history of the stock market in the US. We calibrated the returns on two simulated investments to mimic returns on US Treasury Bills and stocks.
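    A hedged sketch of such a dice simulation is shown below. The multiplier tables are invented stand-ins, not the paper's calibration to US Treasury Bills and stocks, but they reproduce the qualitative behavior: the volatile investment alone usually ends in ruin, while a rebalanced 50/50 mix fares much better.

```python
# Illustrative dice simulation in the spirit of the classroom exercise.
# The gross one-"year" returns below are made-up values for illustration only.

import numpy as np

RETURNS = {
    "stable":   [0.95, 1.00, 1.00, 1.00, 1.00, 1.10],  # stays almost constant
    "drifting": [0.85, 0.95, 1.05, 1.10, 1.15, 1.20],  # drifts slowly upward
    "volatile": [0.05, 0.20, 1.00, 3.00, 3.00, 3.00],  # climbs to extremes or plummets
}

def simulate(investment, years=20, start=1000.0, rng=None):
    """Compound one die roll (gross return) per year for a single investment."""
    rng = np.random.default_rng() if rng is None else rng
    rolls = rng.integers(0, 6, size=years)
    return start * float(np.prod([RETURNS[investment][r] for r in rolls]))

def simulate_portfolio(weights, years=20, start=1000.0, rng=None):
    """Rebalanced portfolio: each year wealth grows by the weighted average roll."""
    rng = np.random.default_rng() if rng is None else rng
    wealth = start
    for _ in range(years):
        wealth *= sum(w * RETURNS[name][rng.integers(0, 6)]
                      for name, w in weights.items())
    return wealth

rng = np.random.default_rng(1)
print("volatile alone:", [round(simulate("volatile", rng=rng), 2) for _ in range(5)])
print("50/50 stable+volatile:",
      round(simulate_portfolio({"stable": 0.5, "volatile": 0.5}, rng=rng), 2))
```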

    The Competitive Complexity Ratio

    The competitive complexity ratio is the worst case ratio of the regret of a data-driven model to that obtained by a model which benefits from side information. The side information bounds the sizes of unknown parameters. The ratio requires the use of a variation on parametric complexity, which we call the unconditional parametric complexity. We show that the optimal competitive complexity ratio is bounded and contrast this result with comparable results in statistics.

    Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy

    We develop and illustrate a methodology for fitting models to large, complex data sets. The methodology uses standard regression techniques that make few assumptions about the structure of the data. We accomplish this with three small modifications to stepwise regression: (1) We add interactions to capture non-linearities and indicator functions to capture missing values; (2) We exploit modern decision theoretic variable selection criteria; and (3) We estimate standard errors using a conservative approach that works for heteroscedastic data. Omitting any one of these modifications leads to poor performance. We illustrate our methodology by predicting the onset of personal bankruptcy among users of credit cards. This application presents many challenges, ranging from the rarity of bankruptcy events to the size of the available database. Only 2,244 bankruptcy events appear among some 3 million months of customer activity. To predict these, we begin with 255 features to which we add missing value indicators and pairwise interactions that expand to a set of over 67,000 potential predictors. From these, our method selects a model with 39 predictors chosen by sequentially comparing estimates of their significance to a series of thresholds. The resulting model not only avoids over-fitting the data, it also predicts well out of sample. To find half of the 1800 bankruptcies hidden in a validation sample of 2.3 million observations, one need only search the 8500 cases having the largest model predictions.
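    A minimal sketch of the feature-expansion step (missing-value indicators plus pairwise interactions) is given below. The mean imputation, column naming, and inclusion of squared terms are assumptions for illustration; the paper's pipeline and its sequential selection thresholds are more involved.

```python
# Hedged sketch: expand raw columns with missing-value indicators and
# pairwise interactions before any variable selection is run.

import numpy as np
import pandas as pd

def expand_features(df):
    """Return a design matrix with missing-value indicators and interactions."""
    out = {}
    for col in df.columns:
        x = df[col]
        out[col] = x.fillna(x.mean())              # impute so interactions are defined
        if x.isna().any():
            out[f"{col}_missing"] = x.isna().astype(float)
    base = pd.DataFrame(out)
    cols = list(base.columns)
    for i, a in enumerate(cols):                   # pairwise interactions, squares included
        for b in cols[i:]:
            base[f"{a}*{b}"] = base[a] * base[b]
    return base

# A few hundred raw features plus indicators expand to tens of thousands of
# candidate predictors, which is why a sequential, threshold-based rule is needed.
raw = pd.DataFrame({"income": [50.0, np.nan, 80.0], "balance": [1.2, 3.4, np.nan]})
print(expand_features(raw).shape)
```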