On High-Dimensional Misspecified Quantile Regression
In this dissertation we develop theory for inference and uncertainty quantification for potentially misspecified quantile regression processes when the number of predictor variables increases with or exceeds the sample size. Potential misspecification of the fitted model is a fundamental problem in statistics which is exacerbated by today's high-dimensional datasets, and quantile regression is often used in complex situations in which misspecifications are highly likely. We make the following contributions: First, we establish a uniform-in-model strong Bahadur representation for misspecified quantile regression processes when the number of predictor variables increases and provide tight error bounds on its remainder term which hold uniformly over growing collections of quantile regression functions. Second, we derive an almost sure de-biased representation of the Lasso-penalized high-dimensional misspecified quantile regression process and analyze the theoretical properties of the misspecified post-Lasso quantile regression estimator. Third, to quantify the uncertainty associated with a misspecified quantile regression function we analyze its predictive risk and expected optimism. We propose uniformly consistent estimators for both quantities when the number of regression functions is growing moderately with the sample size. Empirical evidence shows that our estimators perform favorably against cross-validation estimates. Fourth, we develop a set of new exponential and maximal inequalities which allow us to control the fluctuations of a collection of suprema of empirical processes over classes of unbounded functions when both the collection of function classes and the complexity of each individual function class grow with the sample size. These new inequalities are instrumental in deriving the theoretical results in this dissertation.
PHD; Statistics; University of Michigan, Horace H. Rackham School of Graduate Studies; https://deepblue.lib.umich.edu/bitstream/2027.42/146023/1/giessing_1.pd
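As a point of reference for the quantile regression setting above, the following minimal sketch shows the check loss whose minimizer defines a quantile. It covers only the simplest intercept-only, unpenalized case and is not the dissertation's estimator; the function names and the toy sample are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile_by_check_loss(sample, tau):
    """Return a sample point that minimizes the average check loss.

    Assumption: we search only over observed values, which is enough
    because the average check loss is piecewise linear in q with kinks
    at the observations.
    """
    def risk(q):
        return sum(check_loss(y - q, tau) for y in sample) / len(sample)

    return min(sorted(sample), key=risk)


# Hypothetical toy sample, for illustration only.
sample = [2.0, 7.0, 1.0, 9.0, 4.0]
median = empirical_quantile_by_check_loss(sample, 0.5)
upper = empirical_quantile_by_check_loss(sample, 0.9)
```

Replacing the constant q by a linear function of covariates, and adding a Lasso penalty, leads toward the high-dimensional estimators the abstract describes.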
Anti-concentration of Suprema of Gaussian Processes and Gaussian Order Statistics
We derive, up to a constant factor, matching lower and upper bounds on the
concentration functions of suprema of separable centered Gaussian processes and
order statistics of Gaussian random fields. These bounds reveal that suprema of
separable centered Gaussian processes $X = \{X_u : u \in U\}$ exhibit the same
anti-concentration properties as a single Gaussian random variable with mean
zero and variance $\mathrm{Var}(\sup_{u \in U} X_u)$. To apply these results to
high-dimensional statistical problems, it is therefore essential to understand
the asymptotic behavior of $\mathrm{Var}(\sup_{u \in U} X_u)$ as the dimension
or metric entropy of the index set $U$ increases. Consequently, we also derive
lower and upper bounds on this quantity.
Comment: 2
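For orientation, one common way to formalize anti-concentration is the Lévy concentration function. The notation below ($\mathcal{L}$, $Z$, $\sigma$, $\Phi$) is an assumption made for illustration and is not taken from the abstract; the display records only the elementary single-Gaussian case that the abstract uses as its benchmark.

```latex
% Levy concentration function of a real random variable Z (one common definition):
\mathcal{L}(Z, \varepsilon) = \sup_{x \in \mathbb{R}} P\bigl(|Z - x| \le \varepsilon\bigr),
\qquad \varepsilon > 0.

% For Z ~ N(0, \sigma^2) the supremum is attained at x = 0, and since the
% density is bounded by 1 / (\sigma \sqrt{2\pi}):
\mathcal{L}(Z, \varepsilon) = 2\,\Phi\!\left(\frac{\varepsilon}{\sigma}\right) - 1
\le \sqrt{\frac{2}{\pi}}\,\frac{\varepsilon}{\sigma}.
```

The bound shrinks as $\sigma$ grows, which is why the size of the relevant variance governs how useful such anti-concentration statements are.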
Eliminating small cells from census counts tables: empirical vs. design transition probabilities
The software SAFE has been developed at the State Statistical Institute Berlin-Brandenburg
and has been in regular use there for several years now. It involves an algorithm that yields a
controlled cell frequency perturbation. When a microdata set has been protected by this method,
any table which can be computed on the basis of this microdata set will not contain any small cells,
e.g. cells with frequency counts 1 or 2. We compare empirically observed transition probabilities
resulting from this pre-tabular method to transition matrices in the context of variants of microdata
key based post-tabular random perturbation methods suggested in the literature, e.g. Shlomo, N.,
Young, C. (2008) and Fraser, B., Wooton, J. (2006).
Peer Reviewed
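To make the notion of empirically observed transition probabilities concrete, here is a minimal sketch that estimates P(perturbed count = j | original count = i) from paired cell counts. It does not reproduce the SAFE algorithm or any of the cited post-tabular methods; the function name and the toy counts are hypothetical.

```python
from collections import Counter, defaultdict


def empirical_transition_probabilities(original, perturbed):
    """Estimate P(perturbed = j | original = i) from paired cell counts."""
    if len(original) != len(perturbed):
        raise ValueError("count lists must have the same length")
    pair_counts = Counter(zip(original, perturbed))  # how often i became j
    row_totals = Counter(original)                   # how often i occurred
    probs = defaultdict(dict)
    for (i, j), n in pair_counts.items():
        probs[i][j] = n / row_totals[i]
    return dict(probs)


# Hypothetical toy data: counts of 1 and 2 never survive perturbation.
original = [1, 1, 2, 2, 3, 3, 0, 5]
perturbed = [0, 3, 3, 0, 3, 3, 0, 5]
table = empirical_transition_probabilities(original, perturbed)
```

Each row of `table` sums to one, so it can be compared entry by entry with a design transition matrix specified in advance by a post-tabular method.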
Gaussian and Bootstrap Approximations for Suprema of Empirical Processes
In this paper we develop non-asymptotic Gaussian approximation results for
the sampling distribution of suprema of empirical processes when the indexing
function class $\mathcal{F}_n$ varies with the sample size $n$ and may not be
Donsker. Prior approximations of this type required upper bounds on the metric
entropy of $\mathcal{F}_n$ and uniform lower bounds on the variance of $f \in \mathcal{F}_n$, both of which limited their applicability to high-dimensional
inference problems. In contrast, the results in this paper hold under simpler
conditions on boundedness, continuity, and the strong variance of the
approximating Gaussian process. The results are broadly applicable and yield a
novel procedure for bootstrapping the distribution of empirical process suprema
based on the truncated Karhunen-Loève decomposition of the approximating
Gaussian process. We demonstrate the flexibility of this new bootstrap
procedure by applying it to three fundamental problems in high-dimensional
statistics: simultaneous inference on parameter vectors, inference on the
spectral norm of covariance matrices, and construction of simultaneous
confidence bands for functions in reproducing kernel Hilbert spaces.
Comment: 95 pages
Biogeochemical Fate of Sediment-Associated PAH: Effect of Animal Processing
Biotransformation and fate of polycyclic aromatic hydrocarbons (PAHs) in marine invertebrates and sediment have been studied. Invertebrates can accumulate and metabolize sediment-associated PAHs to polar and aqueous PAH-derived compounds. The objectives of this study are to identify metabolites of PAHs in species of deposit-feeding polychaetes and to examine biogeochemical fate and microbial degradation of the identified metabolites. Two metabolites, 1-hydroxypyrene and 1-hydroxypyrene glucuronide, were identified as the primary phase I and phase II metabolites of the tetracyclic PAH pyrene in Nereis diversicolor. Identification was performed using high pressure liquid chromatography with diode array and fluorescence detection (HPLC/DAD/F) and an ion-trap mass spectrometer for positive identification of 1-hydroxypyrene glucuronide. A fast synchronous fluorescence spectrometry (SFS) method was developed for detection of pyrene metabolites in polychaete tissue. A good correlation between 1-hydroxypyrene measured by SFS and HPLC/F was observed. 1-hydroxypyrene was identified as the single phase I metabolite in tissue of three additional polychaete species: Nereis virens, Arenicola marina, and Capitella sp. I. A tentative aqueous metabolite identification scheme indicates that Nereid polychaetes predominantly make use of glucuronide conjugation whereas Capitella sp. I and Arenicola marina appear to utilize sulfate and/or glucoside conjugation. Gut fluid from Nereis virens, Arenicola brasiliensis, Arenicola marina, and Parastichopus californicus could catalyze oxidative coupling of 1-hydroxypyrene in an apparent enzymatic reaction. Oxidative coupling will reduce subsequent bioavailability, toxicity, and transport of PAH metabolites in marine environments by formation of stable covalent bonds. An antioxidant enzyme, presumably a peroxidase, capable of oxidative coupling and with high oxyradical scavenging capacity was tentatively identified in gut fluid from Nereis virens. Nereis virens gut fluid could also catalyze formation of dityrosine, a marker of oxidative damage in proteins. Oxidative coupling of PAHs represents a new sink for organic contaminants in marine sediments and suggests a biological mechanism for the formation of aquatic humic material in general. Production of aqueous and polar metabolites by marine invertebrates does not enhance microbial degradation of pyrene either directly or in co-metabolic processes. The evidence suggests that enhanced degradation of larger PAHs in marine sediments is primarily due to bioturbation and irrigation processes of infauna.
A Bootstrap Hypothesis Test for High-Dimensional Mean Vectors
This paper is concerned with testing global null hypotheses about population
mean vectors of high-dimensional data. Current tests require either strong
mixing (independence) conditions on the individual components of the
high-dimensional data or high-order moment conditions. In this paper, we
propose a novel class of bootstrap hypothesis tests based on
$\ell_p$-statistics with $p \in [1, \infty]$ which requires neither of these
assumptions. We study asymptotic size, unbiasedness, consistency, and Bahadur
slope of these tests. Capitalizing on these theoretical insights, we develop a
modified bootstrap test with improved power properties and a self-normalized
bootstrap test for elliptically distributed data. We then propose two novel
bias correction procedures to improve the accuracy of the bootstrap test in
finite samples, which leverage measure concentration and hypercontractivity
properties of $\ell_p$-norms in high dimensions. Numerical experiments support
our theoretical results in finite samples.
Comment: 86 pages, 4 figures
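The abstract does not spell out the bootstrap scheme, so the following is only a sketch of a generic Gaussian multiplier bootstrap for the global null that the mean vector is zero. It assumes the test statistic is a p-norm of the scaled sample mean; it is not the paper's modified, self-normalized, or bias-corrected procedure, and all names and defaults are hypothetical.

```python
import math
import random


def lp_norm(v, p):
    """p-norm of a vector; p = math.inf gives the maximum absolute entry."""
    if math.isinf(p):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def multiplier_bootstrap_pvalue(data, p=math.inf, n_boot=200, seed=0):
    """Bootstrap p-value for H0: mean vector = 0 (illustrative sketch)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    stat = lp_norm([math.sqrt(n) * m for m in mean], p)
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    exceed = 0
    for _ in range(n_boot):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian multipliers
        draw = [
            sum(g[i] * centered[i][j] for i in range(n)) / math.sqrt(n)
            for j in range(d)
        ]
        if lp_norm(draw, p) >= stat:
            exceed += 1
    return (1 + exceed) / (n_boot + 1)


# Hypothetical simulated data: n = 100 observations in d = 20 dimensions.
data_rng = random.Random(1)
null_data = [[data_rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(100)]
shifted_data = [[x + 1.0 for x in row] for row in null_data]
p_null = multiplier_bootstrap_pvalue(null_data)
p_shifted = multiplier_bootstrap_pvalue(shifted_data, p=2)
```

Because the multipliers are applied to centered observations, the bootstrap draws mimic the null distribution even when the data have a nonzero mean, which is what lets the shifted sample produce a small p-value.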
Inference on Heterogeneous Quantile Treatment Effects via Rank-Score Balancing
Understanding treatment effect heterogeneity in observational studies is of
great practical importance to many scientific fields because the same treatment
may affect different individuals differently. Quantile regression provides a
natural framework for modeling such heterogeneity. In this paper, we propose a
new method for inference on heterogeneous quantile treatment effects that
incorporates high-dimensional covariates. Our estimator combines a debiased
$\ell_1$-penalized regression adjustment with a quantile-specific covariate
balancing scheme. We present a comprehensive study of the theoretical
properties of this estimator, including weak convergence of the heterogeneous
quantile treatment effect process to the sum of two independent, centered
Gaussian processes. We illustrate the finite-sample performance of our approach
through Monte Carlo experiments and an empirical example, dealing with the
differential effect of mothers' education on infant birth weights.
Comment: 94 pages, 3 figures, 2 tables