488 research outputs found

    On High-Dimensional Misspecified Quantile Regression

    In this dissertation we develop theory for inference and uncertainty quantification for potentially misspecified quantile regression processes when the number of predictor variables increases with or exceeds the sample size. Potential misspecification of the fitted model is a fundamental problem in statistics which is exacerbated by today's high-dimensional datasets, and quantile regression is often used in complex situations in which misspecification is highly likely. We make the following contributions: First, we establish a uniform-in-model strong Bahadur representation for misspecified quantile regression processes when the number of predictor variables increases and provide tight error bounds on its remainder term which hold uniformly over growing collections of quantile regression functions. Second, we derive an almost sure de-biased representation of the Lasso-penalized high-dimensional misspecified quantile regression process and analyze the theoretical properties of the misspecified post-Lasso quantile regression estimator. Third, to quantify the uncertainty associated with a misspecified quantile regression function we analyze its predictive risk and expected optimism. We propose uniformly consistent estimators for both quantities when the number of regression functions grows moderately with the sample size. Empirical evidence shows that our estimators perform favorably against cross-validation estimates. Fourth, we develop a set of new exponential and maximal inequalities which allow us to control the fluctuations of a collection of suprema of empirical processes over classes of unbounded functions when both the collection of function classes and the complexity of each individual function class grow with the sample size. These new inequalities are instrumental in deriving the theoretical results in this dissertation.
    PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/146023/1/giessing_1.pd

    Anti-concentration of Suprema of Gaussian Processes and Gaussian Order Statistics

    We derive, up to a constant factor, matching lower and upper bounds on the concentration functions of suprema of separable centered Gaussian processes and order statistics of Gaussian random fields. These bounds reveal that suprema of separable centered Gaussian processes $\{X_u : u \in U\}$ exhibit the same anti-concentration properties as a single Gaussian random variable with mean zero and variance $\mathrm{Var}(\sup_{u \in U} X_u)$. To apply these results to high-dimensional statistical problems, it is therefore essential to understand the asymptotic behavior of $\mathrm{Var}(\sup_{u \in U} X_u)$ as the dimension or metric entropy of the index set $U$ increases. Consequently, we also derive lower and upper bounds on this quantity.
    Comment: 2

    Eliminating small cells from census counts tables: empirical vs. design transition probabilities

    The software SAFE has been developed at the State Statistical Institute Berlin-Brandenburg and has been in regular use there for several years now. It involves an algorithm that yields a controlled cell frequency perturbation. When a microdata set has been protected by this method, any table which can be computed on the basis of this microdata set will not contain any small cells, e.g. cells with frequency counts 1 or 2. We compare empirically observed transition probabilities resulting from this pre-tabular method to transition matrices in the context of variants of microdata-key-based post-tabular random perturbation methods suggested in the literature, e.g. Shlomo, N., Young, C. (2008) and Fraser, B., Wooton, J. (2006).
    Peer Reviewe

    Gaussian and Bootstrap Approximations for Suprema of Empirical Processes

    In this paper we develop non-asymptotic Gaussian approximation results for the sampling distribution of suprema of empirical processes when the indexing function class $\mathcal{F}_n$ varies with the sample size $n$ and may not be Donsker. Prior approximations of this type required upper bounds on the metric entropy of $\mathcal{F}_n$ and uniform lower bounds on the variance of $f \in \mathcal{F}_n$, both of which limited their applicability to high-dimensional inference problems. In contrast, the results in this paper hold under simpler conditions on boundedness, continuity, and the strong variance of the approximating Gaussian process. The results are broadly applicable and yield a novel procedure for bootstrapping the distribution of empirical process suprema based on the truncated Karhunen-Loève decomposition of the approximating Gaussian process. We demonstrate the flexibility of this new bootstrap procedure by applying it to three fundamental problems in high-dimensional statistics: simultaneous inference on parameter vectors, inference on the spectral norm of covariance matrices, and construction of simultaneous confidence bands for functions in reproducing kernel Hilbert spaces.
    Comment: 95 page

    Biogeochemical Fate of Sediment-Associated PAH: Effect of Animal Processing

    Biotransformation and fate of polycyclic aromatic hydrocarbons (PAHs) in marine invertebrates and sediment have been studied. Invertebrates can accumulate and metabolize sediment-associated PAHs to polar and aqueous PAH-derived compounds. The objectives of this study are to identify metabolites of PAHs in species of deposit-feeding polychaetes and to examine the biogeochemical fate and microbial degradation of the identified metabolites. Two metabolites, 1-hydroxypyrene and 1-hydroxypyrene glucuronide, were identified as the primary phase I and phase II metabolites of the tetracyclic PAH pyrene in Nereis diversicolor. Identification was performed using high pressure liquid chromatography with diode array and fluorescence detection (HPLC/DAD/F) and an ion-trap mass spectrometer for positive identification of 1-hydroxypyrene glucuronide. A fast synchronous fluorescence spectrometry (SFS) method was developed for detection of pyrene metabolites in polychaete tissue. A good correlation between 1-hydroxypyrene measured by SFS and HPLC/F was observed. 1-hydroxypyrene was identified as the single phase I metabolite in tissue of three additional polychaete species: Nereis virens, Arenicola marina, and Capitella sp. I. A tentative aqueous metabolite identification scheme indicates that Nereid polychaetes predominantly make use of glucuronide conjugation whereas Capitella sp. I and Arenicola marina appear to utilize sulfate and/or glucoside conjugation. Gut fluid from Nereis virens, Arenicola brasiliensis, Arenicola marina, and Parastichopus californicus could catalyze oxidative coupling of 1-hydroxypyrene in an apparent enzymatic reaction. Oxidative coupling will reduce subsequent bioavailability, toxicity, and transport of PAH metabolites in marine environments by formation of stable covalent bonds. An antioxidant enzyme, presumably a peroxidase, capable of oxidative coupling and with high oxyradical scavenging capacity was tentatively identified in gut fluid from Nereis virens. Nereis virens gut fluid could also catalyze formation of dityrosine, a marker of oxidative damage in proteins. Oxidative coupling of PAHs represents a new sink for organic contaminants in marine sediments and suggests a biological mechanism for the formation of aquatic humic material in general. Production of aqueous and polar metabolites by marine invertebrates does not enhance microbial degradation of pyrene either directly or in co-metabolic processes. The evidence suggests that enhanced degradation of larger PAHs in marine sediments is primarily due to bioturbation and irrigation processes of infauna.

    A Bootstrap Hypothesis Test for High-Dimensional Mean Vectors

    This paper is concerned with testing global null hypotheses about population mean vectors of high-dimensional data. Current tests require either strong mixing (independence) conditions on the individual components of the high-dimensional data or high-order moment conditions. In this paper, we propose a novel class of bootstrap hypothesis tests based on $\ell_p$-statistics with $p \in [1, \infty]$ which requires neither of these assumptions. We study asymptotic size, unbiasedness, consistency, and Bahadur slope of these tests. Capitalizing on these theoretical insights, we develop a modified bootstrap test with improved power properties and a self-normalized bootstrap test for elliptically distributed data. We then propose two novel bias correction procedures to improve the accuracy of the bootstrap test in finite samples, which leverage measure concentration and hypercontractivity properties of $\ell_p$-norms in high dimensions. Numerical experiments support our theoretical results in finite samples.
    Comment: 86 pages, 4 figure

    Inference on Heterogeneous Quantile Treatment Effects via Rank-Score Balancing

    Understanding treatment effect heterogeneity in observational studies is of great practical importance to many scientific fields because the same treatment may affect different individuals differently. Quantile regression provides a natural framework for modeling such heterogeneity. In this paper, we propose a new method for inference on heterogeneous quantile treatment effects that incorporates high-dimensional covariates. Our estimator combines a debiased $\ell_1$-penalized regression adjustment with a quantile-specific covariate balancing scheme. We present a comprehensive study of the theoretical properties of this estimator, including weak convergence of the heterogeneous quantile treatment effect process to the sum of two independent, centered Gaussian processes. We illustrate the finite-sample performance of our approach through Monte Carlo experiments and an empirical example dealing with the differential effect of mothers' education on infant birth weights.
    Comment: 94 pages, 3 figures, 2 table