Bootstrap schemes for time series (in Russian)
We review and compare block, sieve and local bootstraps for time series, thereby illuminating theoretical aspects of the procedures as well as their finite-sample performance. Our view is selective, with the intention of providing a new and fair picture of some particular aspects of bootstrapping time series. The generality of the block bootstrap is contrasted with the sieve bootstrap. We discuss implementational advantages and disadvantages, and argue that the sieve often outperforms the block method. Local bootstraps, designed for nonparametric smoothing problems, are easy to use and implement but in some cases exhibit low performance.
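The block scheme compared above can be summarised in a few lines. Below is a minimal NumPy sketch of a moving-block bootstrap for the sample mean (the function name, block length and simulated AR(1) example are illustrative choices, not the paper's setup); the sieve variant is sketched under the corresponding abstract further below.

```python
import numpy as np

def block_bootstrap_mean(x, block_len=10, n_boot=1000, rng=None):
    """Moving-block bootstrap for the sample mean of a stationary series."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = np.arange(n - block_len + 1)          # all overlapping block starts
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        chosen = rng.choice(starts, size=n_blocks, replace=True)
        # glue randomly chosen blocks together and truncate to the original length
        series = np.concatenate([x[s:s + block_len] for s in chosen])[:n]
        boot_means[b] = series.mean()
    return boot_means

# usage: bootstrap standard error of the mean of a simulated AR(1) series
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
print(block_bootstrap_mean(x, block_len=20).std())
```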
Model Selection over Partially Ordered Sets
In problems such as variable selection and graph estimation, models are characterized by Boolean logical structure such as presence or absence of a variable or an edge. Consequently, false positive and false negative errors can be specified as the number of variables or edges that are incorrectly included/excluded in an estimated model. However, there are several other problems such as ranking, clustering, and causal inference in which the associated model classes do not admit transparent notions of false positive and false negative errors due to the lack of an underlying Boolean logical structure. In this paper, we present a generic approach to endow a collection of models with partial order structure, which leads to a hierarchical organization of model classes as well as natural analogs of false positive and false negative errors. We describe model selection procedures that provide false positive error control in our general setting and we illustrate their utility with numerical experiments.
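For the Boolean case mentioned above, the false positive and false negative counts are simply set differences between the estimated and true supports; a toy illustration (all variable names hypothetical):

```python
# False positive / false negative counts for variable selection,
# where a model is identified with its set of selected variables.
true_support = {"x1", "x3", "x7"}        # variables truly in the model
estimated    = {"x1", "x2", "x7", "x9"}  # variables selected by some procedure

false_positives = len(estimated - true_support)   # incorrectly included
false_negatives = len(true_support - estimated)   # incorrectly excluded
print(false_positives, false_negatives)           # 2 1
```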
Penalized Likelihood and Bayesian Methods for Sparse Contingency Tables: An Analysis of Alternative Splicing in Full-Length cDNA Libraries
We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables, in order to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum likelihood estimation of log-linear model coefficients fails because of zero cell entries, so new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a simulation study and apply the proposed methods to full-length cDNA libraries, yielding valuable insight into the biological process of alternative splicing.
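As a rough sketch of the Lasso-type idea: a log-linear model for cell counts can be fitted as an L1-penalised Poisson regression, which keeps coefficient estimates finite even when cells are empty. The snippet below assumes statsmodels' GLM.fit_regularized with an elastic-net penalty and L1_wt=1.0; the table, design matrix and penalty level are illustrative, not the paper's data or tuning.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical sparse 2x2x2 table flattened to counts, with several zero cells.
counts = np.array([12, 0, 3, 0, 0, 5, 0, 1])

# Design matrix for a small log-linear model: intercept, three main effects
# and one interaction (a real analysis would build the full factorial design).
A = np.array([0, 0, 0, 0, 1, 1, 1, 1])
B = np.array([0, 0, 1, 1, 0, 0, 1, 1])
C = np.array([0, 1, 0, 1, 0, 1, 0, 1])
X = np.column_stack([np.ones(8), A, B, C, A * B])

# Lasso-type (L1) penalised Poisson/log-linear fit: the penalty shrinks
# interaction terms and keeps estimates finite despite the zero cells.
model = sm.GLM(counts, X, family=sm.families.Poisson())
fit = model.fit_regularized(alpha=0.1, L1_wt=1.0)
print(fit.params)
```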
Sieve bootstrap for time series
We study a bootstrap method which is based on the method of sieves. A linear process is approximated by a sequence of autoregressive processes of order p = p(n), where p(n) → ∞ and p(n) = o(n) as the sample size n → ∞. For given data, we then estimate such an AR(p(n)) model and generate a bootstrap sample by resampling from the residuals. This sieve bootstrap enjoys a nice nonparametric property. We show its consistency for a class of nonlinear estimators and compare the procedure with the blockwise bootstrap, which has been proposed by Künsch (1989). In particular, the sieve bootstrap variance of the mean is shown to have a better rate of convergence if the dependence between separated values of the underlying process decreases sufficiently fast with growing separation. Finally, a simulation study helps illustrate the advantages and disadvantages of the sieve compared to the blockwise bootstrap.
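A minimal NumPy sketch of the AR-sieve idea for the mean (the fixed order p and the function name are illustrative; in practice p(n) would grow with n and be chosen data-dependently, e.g. by AIC):

```python
import numpy as np

def sieve_bootstrap_mean(x, p=5, n_boot=1000, rng=None):
    """AR-sieve bootstrap for the sample mean: fit an AR(p) by least squares,
    resample the centred residuals and regenerate bootstrap series."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, burn = len(x), 100
    xc = x - x.mean()
    # least-squares fit of the AR(p) coefficients (phi[0] is the lag-1 coefficient)
    Y = xc[p:]
    Z = np.column_stack([xc[p - j:n - j] for j in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ phi
    resid -= resid.mean()
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n + burn, replace=True)
        y = np.zeros(n + burn)
        for t in range(p, n + burn):
            # AR recursion driven by resampled residuals (with burn-in)
            y[t] = phi @ y[t - p:t][::-1] + eps[t]
        boot_means[b] = x.mean() + y[burn:].mean()
    return boot_means
```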
Extreme events from the return-volume process: a discretization approach for complexity reduction
We propose the discretization of real-valued financial time series into few ordinal values and use sparse Markov chains within the framework of generalized linear models for such categorical time series. The discretization operation causes a large reduction in the complexity of the data. We analyse daily return and volume data and estimate the probability structure of the process of lower extreme, upper extreme and the complementary usual events. Knowing the whole probability law of such ordinal-valued vector processes of extreme events of return and volume allows us to quantify non-linear associations. In particular, we find a new kind of asymmetry in the return-volume relationship. Estimated probabilities are also used to compute the MAP predictor, whose power is found to be remarkably high.
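A much-simplified sketch of the discretise-then-model idea for a single series (the paper treats the joint return-volume process with sparse Markov chains in a GLM framework; the quantile thresholds, state coding and simulated data below are illustrative):

```python
import numpy as np

def discretize(x, lower_q=0.1, upper_q=0.9):
    """Map a real-valued series to ordinal states: 0 = lower extreme,
    1 = usual event, 2 = upper extreme, using empirical quantiles."""
    lo, hi = np.quantile(x, [lower_q, upper_q])
    return np.where(x <= lo, 0, np.where(x >= hi, 2, 1))

rng = np.random.default_rng(1)
returns = rng.standard_t(df=4, size=2000)        # toy stand-in for daily returns
states = discretize(returns)

# first-order Markov chain: estimated transition matrix P(s_t | s_{t-1})
counts = np.zeros((3, 3))
for a, b in zip(states[:-1], states[1:]):
    counts[a, b] += 1
trans = counts / counts.sum(axis=1, keepdims=True)

# MAP predictor: given today's state, predict tomorrow's most probable state
map_pred = trans.argmax(axis=1)
print(np.round(trans, 3), map_pred)
```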
Empirical Modeling of Extreme Events from Return-Volume Time Series in Stock Market
We propose the discretization of real-valued financial time series into few ordinal values and use non-linear likelihood modeling for sparse Markov chains within the framework of generalized linear models for categorical time series. We analyze daily return and volume data and estimate the probability structure of the process of extreme lower, extreme upper and the complementary usual events. Knowing the whole probability law of such ordinal-valued vector processes of extreme events of return and volume allows us to quantify non-linear associations. In particular, we find a (new kind of) asymmetry in the return-volume relationship, which is a partial answer to a research issue raised by Karpoff (1987). We also propose a simple prediction algorithm which is based on an empirically selected model.
Double-estimation-friendly inference for high-dimensional misspecified models
All models may be wrong, but that is not necessarily a problem for inference. Consider the standard t-test for the significance of a variable X for predicting a response Y whilst controlling for other covariates Z in a random design linear model. This yields correct asymptotic type I error control for the null hypothesis that X is conditionally independent of Y given Z under an arbitrary regression model of Y on (X, Z), provided that a linear regression model for X on Z holds. An analogous robustness to misspecification, which we term the "double-estimation-friendly" (DEF) property, also holds for Wald tests in generalised linear models, with some small modifications.
In this expository paper we explore this phenomenon, and propose methodology for high-dimensional regression settings that respects the DEF property. We advocate specifying (sparse) generalised linear regression models for both the response Y and the covariate of interest X; our framework gives valid inference for the conditional independence null if either of these holds. In the special case where both specifications are linear, our proposal amounts to a small modification of the popular debiased Lasso test. We also investigate constructing confidence intervals for the regression coefficient of X via inverting our tests; these have coverage guarantees even in partially linear models where the contribution of Z to Y can be arbitrary. Numerical experiments demonstrate the effectiveness of the methodology.
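The low-dimensional intuition behind the DEF property can be illustrated by partialling out the covariates and correlating residuals, which is numerically equivalent to the classical t-test; this sketch only illustrates that classical case, not the authors' high-dimensional debiased-Lasso procedure (function and variable names are hypothetical):

```python
import numpy as np
from scipy import stats

def partial_out_test(y, x, Z):
    """Test whether x helps predict y given controls Z by correlating the
    two sets of OLS residuals (equivalent to the t-test for x in y ~ x + Z)."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]   # residual of y on Z
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]   # residual of x on Z
    n, df = len(y), len(y) - Z1.shape[1] - 1
    r = np.corrcoef(ry, rx)[0, 1]
    t = r * np.sqrt(df / (1 - r ** 2))
    return t, 2 * stats.t.sf(abs(t), df=df)

# usage: x is driven by Z but has no extra effect on y, so the null holds
rng = np.random.default_rng(3)
Z = rng.standard_normal((200, 3))
x = Z @ [1.0, -0.5, 0.2] + rng.standard_normal(200)
y = Z @ [0.3, 0.3, 0.3] + rng.standard_normal(200)
print(partial_out_test(y, x, Z))
```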
Volatility and risk estimation with linear and nonlinear methods based on high frequency data
Accurate volatility predictions are crucial for the successful implementation of risk management. The use of high frequency data approximately renders volatility from a latent to an observable quantity, and opens new directions for forecasting future volatilities. The goals in this paper are: (i) to select an accurate forecasting procedure for predicting volatilities based on high frequency data from various standard models and modern prediction tools; (ii) to evaluate the predictive potential of those volatility forecasts for both the realized and the true latent volatility; and (iii) to quantify the differences between volatility forecasts based on high frequency data and those from a GARCH model for low frequency (e.g. daily) data, and to study the implications in risk management for two widely used risk measures. The pay-off from using high frequency data for the true latent volatility is empirically found to be still present, but of smaller magnitude than suggested by simple analysis.
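A minimal sketch of the standard realized-volatility proxy that makes volatility "approximately observable" from intraday returns (the sampling frequency and simulated prices are illustrative):

```python
import numpy as np

def realized_volatility(intraday_prices):
    """Daily realized volatility: square root of the sum of squared
    intraday log-returns, a standard high-frequency volatility proxy."""
    log_ret = np.diff(np.log(intraday_prices))
    return np.sqrt(np.sum(log_ret ** 2))

# toy example: 5-minute prices over one trading day (78 observations)
rng = np.random.default_rng(2)
prices = 100 * np.exp(np.cumsum(0.001 * rng.standard_normal(78)))
print(realized_volatility(prices))
```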
Explaining Bagging
Bagging is one of the most effective computationally intensive procedures for improving on unstable estimators or classifiers, and is especially useful for high-dimensional data problems. Here we formalize the notion of instability and derive theoretical results to explain the variance reduction effect of bagging (or its variants) in hard decision problems, which include estimation after testing in regression and decision trees for continuous regression functions and classifiers. Hard decisions create instability, and bagging is shown to smooth such hard decisions, yielding smaller variance and mean squared error. With these theoretical explanations, we motivate subagging, based on subsampling, as an alternative aggregation scheme. It is computationally cheaper but still shows approximately the same accuracy as bagging. Moreover, our theory reveals improvements in first order, in line with simulation studies; this is in contrast with the second-order explanation of Friedman and Hall (2000) for smooth functional...
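A compact sketch contrasting subagging with bagging for a regression tree as the hard-decision base learner (scikit-learn's DecisionTreeRegressor stands in; the subsample fraction and tree depth are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def subagging_predict(X, y, X_new, n_estimators=50, subsample=0.5, rng=None):
    """Subagging: average tree predictions over random subsamples drawn
    without replacement (bagging would instead use bootstrap resamples)."""
    rng = np.random.default_rng(rng)
    n = len(y)
    m = int(subsample * n)
    preds = np.zeros(len(X_new))
    for _ in range(n_estimators):
        idx = rng.choice(n, size=m, replace=False)   # replace=True -> bagging
        tree = DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx])
        preds += tree.predict(X_new)
    return preds / n_estimators
```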