54 research outputs found

    Bootstrap schemes for time series (in Russian)

    We review and compare block, sieve and local bootstraps for time series and thereby illuminate theoretical aspects of the procedures as well as their performance on finite-sample data. Our view is selective, with the intention of providing a new and fair picture of some particular aspects of bootstrapping time series. The generality of the block bootstrap is contrasted with the sieve bootstrap. We discuss implementational advantages and disadvantages, and argue that the sieve often outperforms the block method. Local bootstraps, designed for nonparametric smoothing problems, are easy to use and implement but in some cases exhibit low performance.
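
    To make the block method concrete, here is a minimal moving-block bootstrap sketch for the sample mean (not taken from the paper); the block length, replicate count and AR(1) example data are illustrative choices.

        import numpy as np

        def block_bootstrap_mean(x, block_len=20, n_boot=1000, rng=None):
            """Moving-block bootstrap distribution of the sample mean."""
            rng = np.random.default_rng(rng)
            x = np.asarray(x, dtype=float)
            n = len(x)
            n_blocks = int(np.ceil(n / block_len))
            starts = np.arange(n - block_len + 1)   # all overlapping block start points
            means = np.empty(n_boot)
            for b in range(n_boot):
                picks = rng.choice(starts, size=n_blocks, replace=True)
                series = np.concatenate([x[s:s + block_len] for s in picks])[:n]
                means[b] = series.mean()
            return means

        # usage: bootstrap standard error of the mean of a simulated AR(1) series
        rng = np.random.default_rng(0)
        x = np.zeros(500)
        for t in range(1, 500):
            x[t] = 0.6 * x[t - 1] + rng.standard_normal()
        print(block_bootstrap_mean(x, block_len=25).std())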

    Model Selection over Partially Ordered Sets

    In problems such as variable selection and graph estimation, models are characterized by Boolean logical structure such as the presence or absence of a variable or an edge. Consequently, false positive and false negative errors can be specified as the number of variables or edges that are incorrectly included or excluded in an estimated model. However, there are several other problems, such as ranking, clustering, and causal inference, in which the associated model classes do not admit transparent notions of false positive and false negative errors due to the lack of an underlying Boolean logical structure. In this paper, we present a generic approach to endow a collection of models with partial order structure, which leads to a hierarchical organization of model classes as well as natural analogs of false positive and false negative errors. We describe model selection procedures that provide false positive error control in our general setting, and we illustrate their utility with numerical experiments.
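
    For orientation only, a small sketch of the classical special case that the paper generalizes: in variable selection the models are subsets of variables ordered by inclusion, and false positives/negatives are counted by set differences. The function and example sets below are illustrative, not the paper's general construction.

        # Classical special case: models = subsets of variables, ordered by inclusion.
        # False positives / negatives are the set differences with the true model.
        def fp_fn(selected, true):
            selected, true = set(selected), set(true)
            fp = len(selected - true)   # variables included but not in the true model
            fn = len(true - selected)   # true variables that were missed
            return fp, fn

        print(fp_fn(selected={1, 2, 5}, true={1, 3}))  # (2, 1)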

    PENALIZED LIKELIHOOD AND BAYESIAN METHODS FOR SPARSE CONTINGENCY TABLES: AN ANALYSIS OF ALTERNATIVE SPLICING IN FULL-LENGTH cDNA LIBRARIES

    We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum likelihood estimation of log-linear model coefficients breaks down because of zero cell entries. Therefore new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a simulation study, and we apply the proposed methods to full-length cDNA libraries, yielding valuable insight into the biological process of alternative splicing.
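
    A rough sketch of the Lasso-type idea, not the authors' implementation: an L1-penalized Poisson log-linear model fitted by proximal gradient descent on a toy 2x2 table; the design, penalty level, step size and iteration count are placeholder choices.

        import numpy as np

        def soft_threshold(z, t):
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def lasso_poisson(X, y, lam=1.0, step=5e-3, n_iter=10000):
            """L1-penalized Poisson log-linear regression via proximal gradient."""
            beta = np.zeros(X.shape[1])
            for _ in range(n_iter):
                mu = np.exp(X @ beta)          # fitted cell means
                grad = X.T @ (mu - y)          # gradient of the Poisson negative log-likelihood
                beta = soft_threshold(beta - step * grad, step * lam)
            return beta

        # toy 2x2 contingency table: intercept, two main effects, one interaction
        levels = [(a, b) for a in (0, 1) for b in (0, 1)]
        X = np.array([[1.0, a, b, a * b] for a, b in levels])
        rng = np.random.default_rng(1)
        y = rng.poisson(np.exp(X @ np.array([1.5, 0.8, -0.5, 0.0])))
        print(lasso_poisson(X, y, lam=0.5))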

    Sieve bootstrap for time series

    We study a bootstrap method which is based on the method of sieves. A linear process is approximated by a sequence of autoregressive processes of order p = p(n), where p(n) → ∞, p(n) = o(n) as the sample size n → ∞. For given data, we then estimate such an AR(p(n)) model and generate a bootstrap sample by resampling from the residuals. This sieve bootstrap enjoys a nice nonparametric property. We show its consistency for a class of nonlinear estimators and compare the procedure with the blockwise bootstrap, which has been proposed by Künsch (1989). In particular, the sieve bootstrap variance of the mean is shown to have a better rate of convergence if the dependence between separated values of the underlying process decreases sufficiently fast with growing separation. Finally, a simulation study helps to illustrate the advantages and disadvantages of the sieve compared to the blockwise bootstrap.
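
    A minimal numpy sketch of the sieve bootstrap for the sample mean, assuming an AR(p) fitted by least squares on the centred series; the order p, burn-in length and replicate count below are illustrative, not the paper's prescriptions.

        import numpy as np

        def sieve_bootstrap_mean(x, p=5, n_boot=500, rng=None):
            """AR(p) sieve bootstrap distribution of the sample mean (rough sketch)."""
            rng = np.random.default_rng(rng)
            x = np.asarray(x, dtype=float)
            n = len(x)
            xc = x - x.mean()
            # fit AR(p) by least squares: xc[t] on xc[t-1], ..., xc[t-p]
            Z = np.column_stack([xc[p - k - 1:n - k - 1] for k in range(p)])
            phi, *_ = np.linalg.lstsq(Z, xc[p:], rcond=None)
            resid = xc[p:] - Z @ phi
            resid = resid - resid.mean()                 # centred residuals
            means = np.empty(n_boot)
            for b in range(n_boot):
                e = rng.choice(resid, size=n + 100, replace=True)
                xb = np.zeros(n + 100)
                for t in range(p, n + 100):
                    xb[t] = phi @ xb[t - p:t][::-1] + e[t]
                means[b] = x.mean() + xb[100:].mean()    # discard burn-in
            return means

        # usage: bootstrap standard error of the mean of a simulated AR(1) series
        rng = np.random.default_rng(0)
        x = np.zeros(400)
        for t in range(1, 400):
            x[t] = 0.5 * x[t - 1] + rng.standard_normal()
        print(sieve_bootstrap_mean(x, p=4).std())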

    Extreme events from the return-volume process: a discretization approach for complexity reduction

    We propose the discretization of real-valued financial time series into few ordinal values and use sparse Markov chains within the framework of generalized linear models for such categorical time series. The discretization operation causes a large reduction in the complexity of the data. We analyse daily return and volume data and estimate the probability structure of the process of lower extreme, upper extreme and the complementary usual events. Knowing the whole probability law of such ordinal-valued vector processes of extreme events of return and volume allows us to quantify non-linear associations. In particular, we find a new kind of asymmetry in the return-volume relationship. Estimated probabilities are also used to compute the MAP predictor, whose power is found to be remarkably high.
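
    A simplified univariate sketch of the discretization step and the count-based estimate of a first-order Markov chain (the paper works with the joint return-volume process and a sparse GLM parameterization); the 10%/90% thresholds and simulated data are placeholders.

        import numpy as np

        def discretize(x, lower_q=0.1, upper_q=0.9):
            """Map a real-valued series to ordinal states 0/1/2 (lower extreme / usual / upper extreme)."""
            lo, hi = np.quantile(x, [lower_q, upper_q])
            return np.where(x <= lo, 0, np.where(x >= hi, 2, 1))

        def transition_matrix(states, n_states=3):
            """First-order Markov transition probabilities estimated from counts."""
            counts = np.zeros((n_states, n_states))
            for s, t in zip(states[:-1], states[1:]):
                counts[s, t] += 1
            rows = counts.sum(axis=1, keepdims=True)
            return counts / np.where(rows == 0, 1.0, rows)

        # usage with simulated "returns"
        rng = np.random.default_rng(0)
        ret = rng.standard_normal(2000)
        states = discretize(ret)
        print(transition_matrix(states).round(3))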

    Empirical Modeling of Extreme Events from Return-Volume Time Series in Stock Market

    We propose the discretization of real-valued financial time series into few ordinal values and use non-linear likelihood modeling for sparse Markov chains within the framework of generalized linear models for categorical time series. We analyze daily return and volume data and estimate the probability structure of the process of extreme lower, extreme upper and the complementary usual events. Knowing the whole probability law of such ordinal-valued vector processes of extreme events of return and volume allows us to quantify non-linear associations. In particular, we find a (new kind of) asymmetry in the return-volume relationship which is a partial answer to a research issue posed by Karpoff (1987). We also propose a simple prediction algorithm which is based on an empirically selected model.
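
    Continuing the sketch above, and again not the authors' code: given estimated transition probabilities, the MAP predictor simply returns the most probable next state; the numbers in the example matrix are made up.

        import numpy as np

        def map_predict(trans, current_state):
            """MAP one-step-ahead prediction: the most probable next state given the current one."""
            return int(np.argmax(trans[current_state]))

        # usage with a transition matrix as estimated in the previous sketch (illustrative numbers)
        trans = np.array([[0.12, 0.78, 0.10],
                          [0.10, 0.80, 0.10],
                          [0.09, 0.79, 0.12]])
        print(map_predict(trans, current_state=0))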

    Volatility and risk estimation with linear and nonlinear methods based on high frequency data

    Accurate volatility predictions are crucial for the successful implementation of risk management. The use of high frequency data approximately renders volatility from a latent to an observable quantity, and opens new directions to forecast future volatilities. The goals in this paper are: (i) to select an accurate forecasting procedure for predicting volatilities based on high frequency data from various standard models and modern prediction tools; (ii) to evaluate the predictive potential of those volatility forecasts for both the realized and the true latent volatility; and (iii) to quantify the differences between volatility forecasts based on high frequency data and those from a GARCH model for low frequency (e.g. daily) data, and to study the implications for risk management with two widely used risk measures. The pay-off from using high frequency data for the true latent volatility is empirically found to be still present, but of smaller magnitude than suggested by simple analysis.
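
    As a rough illustration of the high-frequency ingredient only, not the paper's forecasting models: realized variance as the sum of squared intraday returns, followed by a naive AR(1) forecast of log realized variance; the simulated five-minute returns and all parameters are placeholders.

        import numpy as np

        def realized_variance(intraday_returns):
            """Daily realized variance: sum of squared intraday returns."""
            return np.sum(np.asarray(intraday_returns) ** 2, axis=-1)

        def ar1_forecast(y):
            """One-step forecast from an AR(1) fitted by least squares (with intercept)."""
            Z = np.column_stack([np.ones(len(y) - 1), y[:-1]])
            coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
            return coef[0] + coef[1] * y[-1]

        # usage: simulate 250 days x 78 five-minute returns, forecast next day's realized variance
        rng = np.random.default_rng(0)
        intraday = 0.001 * rng.standard_normal((250, 78))
        log_rv = np.log(realized_variance(intraday))
        print(np.exp(ar1_forecast(log_rv)))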

    Explaining Bagging

    Bagging is one of the most effective computationally intensive procedures to improve on unstable estimators or classifiers, useful especially for high-dimensional data problems. Here we formalize the notion of instability and derive theoretical results to explain a variance reduction effect of bagging (or its variant) in hard decision problems, which include estimation after testing in regression and decision trees for continuous regression functions and classifiers. Hard decisions create instability, and bagging is shown to smooth such hard decisions, yielding smaller variance and mean squared error. With theoretical explanations, we motivate subagging based on subsampling as an alternative aggregation scheme. It is computationally cheaper but still shows approximately the same accuracy as bagging. Moreover, our theory reveals improvements in first order and is in line with simulation studies; in contrast with the second-order explanation of Friedman and Hall (2000) for smooth functional...
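
    A small sketch of subagging with a regression tree, assuming scikit-learn is available; the subsample fraction, tree depth and number of aggregations are illustrative choices, and a bagging variant would instead resample n points with replacement.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        def subagging_predict(X, y, X_new, frac=0.5, n_agg=100, rng=None):
            """Average tree predictions over many subsamples drawn without replacement."""
            rng = np.random.default_rng(rng)
            n = len(y)
            m = int(frac * n)
            preds = np.zeros(len(X_new))
            for _ in range(n_agg):
                idx = rng.choice(n, size=m, replace=False)   # subsample, not bootstrap
                tree = DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx])
                preds += tree.predict(X_new)
            return preds / n_agg

        # usage on a noisy sine curve
        rng = np.random.default_rng(0)
        X = np.sort(rng.uniform(0, 6, size=200))[:, None]
        y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(200)
        print(subagging_predict(X, y, X_new=np.array([[1.0], [3.0]])))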