
    Orthogonal Statistical Learning

    We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data. We analyze a two-stage sample splitting meta-algorithm that takes as input two arbitrary estimation algorithms: one for the target parameter and one for the nuisance parameter. We show that if the population risk satisfies a condition called Neyman orthogonality, the impact of the nuisance estimation error on the excess risk bound achieved by the meta-algorithm is of second order. Our theorem is agnostic to the particular algorithms used for the target and nuisance and only makes an assumption on their individual performance. This enables the use of a plethora of existing results from statistical learning and machine learning to give new guarantees for learning with a nuisance component. Moreover, by focusing on excess risk rather than parameter estimation, we can give guarantees under weaker assumptions than in previous works and accommodate settings in which the target parameter belongs to a complex nonparametric class. We provide conditions on the metric entropy of the nuisance and target classes such that oracle rates (rates of the same order as if the nuisance parameter were known) are achieved. We also derive new rates for specific estimation algorithms such as variance-penalized empirical risk minimization, neural network estimation and sparse high-dimensional linear model estimation. We highlight the applicability of our results in four settings of central importance: 1) heterogeneous treatment effect estimation, 2) offline policy optimization, 3) domain adaptation, and 4) learning with missing data.
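    A minimal sketch of the two-stage sample-splitting idea, illustrated on heterogeneous treatment effect estimation with a doubly robust (Neyman-orthogonal) pseudo-outcome; the simulated data, the particular learners, and all variable names below are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch: two-stage sample splitting with a Neyman-orthogonal target objective,
# illustrated on conditional average treatment effect (CATE) estimation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
e = 1 / (1 + np.exp(-X[:, 0]))               # true propensity score
T = rng.binomial(1, e)                        # treatment indicator
tau = 1.0 + X[:, 1]                           # true heterogeneous effect
Y = X[:, 0] + tau * T + rng.normal(size=n)    # outcome

# Stage 1 (fold A): estimate the nuisances -- outcome regressions and propensity.
half = n // 2
A, B = np.arange(half), np.arange(half, n)
mu_hat = {t: GradientBoostingRegressor().fit(X[A][T[A] == t], Y[A][T[A] == t])
          for t in (0, 1)}
e_hat = LogisticRegression().fit(X[A], T[A])

# Stage 2 (fold B): plug the nuisance estimates into a doubly robust
# (Neyman-orthogonal) pseudo-outcome and fit any target-parameter learner to it.
m0, m1 = mu_hat[0].predict(X[B]), mu_hat[1].predict(X[B])
p = np.clip(e_hat.predict_proba(X[B])[:, 1], 0.05, 0.95)
psi = m1 - m0 + T[B] * (Y[B] - m1) / p - (1 - T[B]) * (Y[B] - m0) / (1 - p)
cate = LinearRegression().fit(X[B], psi)      # second-stage (target) learner
print("estimated CATE coefficients:", np.round(cate.coef_, 2))
```

    Because the pseudo-outcome is orthogonal, first-order errors in the estimated regressions and propensity affect the second stage only at second order, which is what allows the target learner to be an arbitrary off-the-shelf algorithm.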

    Optimal statistical inference in the presence of systematic uncertainties using neural network optimization based on binned Poisson likelihoods with nuisance parameters

    Data analysis in science, e.g., high-energy particle physics, is often subject to an intractable likelihood if the observables and observations span a high-dimensional input space. Typically the problem is solved by reducing the dimensionality using feature engineering and histograms, whereby the latter technique allows the likelihood to be built using Poisson statistics. However, in the presence of systematic uncertainties represented by nuisance parameters in the likelihood, the optimal dimensionality reduction with a minimal loss of information about the parameters of interest is not known. This work presents a novel strategy to construct the dimensionality reduction with neural networks for feature engineering and a differentiable formulation of histograms so that the full workflow can be optimized with the result of the statistical inference, e.g., the variance of a parameter of interest, as the objective. We discuss how this approach yields an estimate of the parameters of interest that is close to optimal, and we demonstrate the applicability of the technique with a simple example based on pseudo-experiments and a more complex example from high-energy particle physics.
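    As a rough illustration of the ingredients described above, the sketch below implements a smooth ("soft") histogram and a binned Poisson negative log-likelihood with one Gaussian-constrained nuisance parameter; the bin edges, the sigmoid softening, and the signal-plus-background parametrisation are assumptions made for illustration, and in the paper's setting the analogous quantities would live inside an autodiff framework so that the network, the binning, and the likelihood can be optimized jointly.

```python
# Hedged sketch: a differentiable ("soft") histogram feeding a binned Poisson
# likelihood with a nuisance parameter. Not the paper's exact construction.
import numpy as np

def soft_histogram(x, edges, bandwidth=0.1):
    """Smooth bin counts: each event contributes a sigmoid-difference weight per bin."""
    lo, hi = edges[:-1], edges[1:]
    w = 1 / (1 + np.exp(-(x[:, None] - lo) / bandwidth)) - \
        1 / (1 + np.exp(-(x[:, None] - hi) / bandwidth))
    return w.sum(axis=0)                       # smooth in x (e.g., a network output)

def binned_poisson_nll(n_obs, s, b, mu, theta, sigma_theta=1.0):
    """-2 log L for expected counts mu*s + (1+theta)*b, with a Gaussian
    constraint on the nuisance parameter theta (e.g., a background scale)."""
    lam = mu * s + (1.0 + theta) * b
    return 2 * np.sum(lam - n_obs * np.log(lam)) + theta**2 / sigma_theta**2

edges = np.linspace(-3, 3, 9)
signal = soft_histogram(np.random.normal(1.0, 0.5, 500), edges)
backgr = soft_histogram(np.random.normal(-0.5, 1.0, 2000), edges)
observed = np.random.poisson(signal + backgr)
print(binned_poisson_nll(observed, signal, backgr, mu=1.0, theta=0.0))
```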

    Preprocessing Solar Images while Preserving their Latent Structure

    Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics Observatory, a NASA satellite, collect massive streams of high resolution images of the Sun through multiple wavelength filters. Reconstructing pixel-by-pixel thermal properties based on these images can be framed as an ill-posed inverse problem with Poisson noise, but this reconstruction is computationally expensive and there is disagreement among researchers about what regularization or prior assumptions are most appropriate. This article presents an image segmentation framework for preprocessing such images in order to reduce the data volume while preserving as much thermal information as possible for later downstream analyses. The resulting segmented images reflect thermal properties but do not depend on solving the ill-posed inverse problem. This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of ~10 segments rather than on each of ~10^7 pixels, reducing computing time by a factor of ~10^6. We employ a parametric class of dissimilarities that can be expressed as cosine dissimilarity functions or Hellinger distances between nonlinearly transformed vectors of multi-passband observations in each pixel. We develop a decision theoretic framework for choosing the dissimilarity that minimizes the expected loss that arises when estimating identifiable thermal properties based on segmented images rather than on a pixel-by-pixel basis. We also examine the efficacy of different dissimilarities for recovering clusters in the underlying thermal properties. The expected losses are computed under scientifically motivated prior distributions. Two simulation studies guide our choices of dissimilarity function. We illustrate our method by segmenting images of a coronal hole observed on 26 February 2015.
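    The sketch below writes out the two dissimilarities named in the abstract, cosine dissimilarity and Hellinger distance between nonlinearly transformed passband vectors, on simulated pixel data; the square-root transform, the normalisation, and the k-means step are illustrative stand-ins, not the article's parametric family or its segmentation procedure.

```python
# Hedged sketch: dissimilarities between per-pixel multi-passband vectors,
# followed by a generic clustering step as a stand-in for segmentation.
import numpy as np
from sklearn.cluster import KMeans

def transform(counts, power=0.5):
    """Nonlinear (here: square-root) transform of a pixel's passband vector."""
    return np.power(np.asarray(counts, dtype=float), power)

def cosine_dissimilarity(u, v):
    u, v = transform(u), transform(v)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def hellinger_distance(u, v):
    p = transform(u, 1.0); p /= p.sum()        # treat the passband vector as a distribution
    q = transform(v, 1.0); q /= q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Toy "image": 10,000 pixels observed in 6 passbands, grouped into ~10 segments.
pixels = np.random.gamma(shape=2.0, scale=50.0, size=(10_000, 6))
feats = transform(pixels)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
labels = KMeans(n_clusters=10, n_init=10).fit_predict(feats)
print("pixels per segment:", np.bincount(labels))
print(cosine_dissimilarity(pixels[0], pixels[1]), hellinger_distance(pixels[0], pixels[1]))
```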

    Visual Representations: Defining Properties and Deep Approximations

    Visual representations are defined in terms of minimal sufficient statistics of visual data, for a class of tasks, that are also invariant to nuisance variability. Minimal sufficiency guarantees that we can store a representation in lieu of raw data with the smallest complexity and no performance loss on the task at hand. Invariance guarantees that the statistic is constant with respect to uninformative transformations of the data. We derive analytical expressions for such representations and show they are related to feature descriptors commonly used in computer vision, as well as to convolutional neural networks. This link highlights the assumptions and approximations tacitly made by these methods and explains empirical practices such as clamping, pooling and joint normalization. Comment: UCLA CSD TR140023, Nov. 12, 2014; revised April 13, 2015, November 13, 2015, February 28, 201
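    As a toy illustration of invariance via pooling, the sketch below shows a max-pooled statistic that is unchanged by a small translation of the input (a nuisance transformation); the 1-D signal, window size, and shift are arbitrary choices, not taken from the report.

```python
# Hedged sketch: pooling as a simple invariance-inducing statistic.
import numpy as np

def max_pool(signal, window=4):
    """Non-overlapping max pooling: discards within-window position information."""
    trimmed = signal[: len(signal) // window * window]
    return trimmed.reshape(-1, window).max(axis=1)

x = np.zeros(32); x[10] = 1.0                 # a feature at position 10
x_shifted = np.roll(x, 1)                     # nuisance: a small translation
print(np.array_equal(max_pool(x), max_pool(x_shifted)))  # True: pooled statistic is invariant
```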

    Machine Learning for Set-Identified Linear Models

    This paper provides estimation and inference methods for an identified set where the selection among a very large number of covariates is based on modern machine learning tools. I characterize the boundary of the identified set (i.e., support function) using a semiparametric moment condition. Combining Neyman orthogonality and sample splitting ideas, I construct a root-N consistent, uniformly asymptotically Gaussian estimator of the support function and propose a weighted bootstrap procedure to conduct inference about the identified set. I provide a general method to construct a Neyman-orthogonal moment condition for the support function. Applying my method to Lee (2008)'s endogenous selection model, I provide the asymptotic theory for the sharp (i.e., the tightest possible) bounds on the Average Treatment Effect in the presence of high-dimensional covariates. Furthermore, I relax the conventional monotonicity assumption and allow the sign of the treatment effect on the selection (e.g., employment) to be determined by covariates. Using the JobCorps data set with very rich baseline characteristics, I substantially tighten the bounds on the JobCorps effect on wages under the weakened monotonicity assumption.
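    For context, the sketch below computes the classical Lee (2008) trimming bounds under monotonicity on simulated data with no covariates; the paper's contribution, orthogonal moment conditions, cross-fitting, and machine-learning covariate selection to tighten such bounds, is not implemented here, and the data-generating process is an assumption for illustration.

```python
# Hedged sketch: baseline Lee (2008) trimming bounds on a wage effect with
# endogenous selection (employment), assuming monotonicity and no covariates.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
D = rng.binomial(1, 0.5, n)                        # treatment assignment
S = rng.binomial(1, 0.6 + 0.1 * D)                 # selection (employment)
W = np.where(S == 1, 1.0 + 0.2 * D + rng.normal(size=n), np.nan)  # wage if employed

p1, p0 = S[D == 1].mean(), S[D == 0].mean()
trim = 1 - p0 / p1                                 # share of "extra" employed treated units
w1 = np.sort(W[(D == 1) & (S == 1)])
w0_mean = np.nanmean(W[(D == 0) & (S == 1)])
k = int(np.floor(trim * len(w1)))
lower = w1[: len(w1) - k].mean() - w0_mean         # trim the top of the treated wage dist.
upper = w1[k:].mean() - w0_mean                    # trim the bottom
print(f"Lee bounds on the wage effect for the always-employed: [{lower:.3f}, {upper:.3f}]")
```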