Orthogonal Statistical Learning
We provide non-asymptotic excess risk guarantees for statistical learning in
a setting where the population risk with respect to which we evaluate the
target parameter depends on an unknown nuisance parameter that must be
estimated from data. We analyze a two-stage sample splitting meta-algorithm
that takes as input two arbitrary estimation algorithms: one for the target
parameter and one for the nuisance parameter. We show that if the population
risk satisfies a condition called Neyman orthogonality, the impact of the
nuisance estimation error on the excess risk bound achieved by the
meta-algorithm is of second order. Our theorem is agnostic to the particular
algorithms used for the target and nuisance and only makes an assumption on
their individual performance. This enables the use of a plethora of existing
results from statistical learning and machine learning to give new guarantees
for learning with a nuisance component. Moreover, by focusing on excess risk
rather than parameter estimation, we can give guarantees under weaker
assumptions than in previous works and accommodate settings in which the target
parameter belongs to a complex nonparametric class. We provide conditions on
the metric entropy of the nuisance and target classes such that oracle
rates---rates of the same order as if we knew the nuisance parameter---are
achieved. We also derive new rates for specific estimation algorithms such as
variance-penalized empirical risk minimization, neural network estimation and
sparse high-dimensional linear model estimation. We highlight the applicability
of our results in four settings of central importance: 1) heterogeneous
treatment effect estimation, 2) offline policy optimization, 3) domain
adaptation, and 4) learning with missing data.
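A minimal sketch of the two-stage sample-splitting meta-algorithm, using heterogeneous treatment effect estimation (setting 1) as the running example. The residual-on-residual ("R-learner" style) squared loss below is one standard Neyman-orthogonal choice; the scikit-learn learners and all variable names are illustrative assumptions, not estimators prescribed by the paper:

```python
# Illustrative two-stage meta-algorithm with sample splitting.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data: covariates X, binary treatment T, outcome Y.
n, d = 2000, 5
X = rng.normal(size=(n, d))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
tau = 1.0 + X[:, 1]                                # true heterogeneous effect
Y = X[:, 0] + tau * T + rng.normal(size=n)

# Split the sample: nuisances are fit on fold A only.
idx = rng.permutation(n)
a, b = idx[: n // 2], idx[n // 2:]

# Stage 1 (fold A): estimate the nuisance parameters
#   m(x) = E[Y | X = x]  and  e(x) = P(T = 1 | X = x).
m_hat = RandomForestRegressor(min_samples_leaf=20).fit(X[a], Y[a])
e_hat = GradientBoostingClassifier().fit(X[a], T[a])

# Stage 2 (fold B): minimize the Neyman-orthogonal residual-on-residual risk
#   sum_i [(Y_i - m(X_i)) - tau(X_i) * (T_i - e(X_i))]^2
# over a linear class tau(x) = b0 + x @ beta, which is ordinary least squares
# of the outcome residual on the treatment residual and its interactions.
Y_res = Y[b] - m_hat.predict(X[b])
T_res = T[b] - e_hat.predict_proba(X[b])[:, 1]
Z = np.column_stack([T_res, X[b] * T_res[:, None]])
target = LinearRegression(fit_intercept=False).fit(Z, Y_res)
print("estimated tau coefficient on X[:, 1]:", target.coef_[2])  # truth: 1.0
```

Because the loss is orthogonal, first-order errors in m_hat and e_hat cancel, which is why the two stages can use arbitrary off-the-shelf learners.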
Optimal statistical inference in the presence of systematic uncertainties using neural network optimization based on binned Poisson likelihoods with nuisance parameters
Data analysis in science, e.g., high-energy particle physics, is often
subject to an intractable likelihood if the observables and observations span a
high-dimensional input space. Typically, the problem is solved by reducing the dimensionality using feature engineering and histograms, where the latter technique allows one to build the likelihood using Poisson statistics. However, in
the presence of systematic uncertainties represented by nuisance parameters in
the likelihood, the optimal dimensionality reduction with a minimal loss of
information about the parameters of interest is not known. This work presents a
novel strategy to construct the dimensionality reduction with neural networks
for feature engineering and a differentiable formulation of histograms so that
the full workflow can be optimized with the result of the statistical
inference, e.g., the variance of a parameter of interest, as the objective. We discuss how this approach yields an estimate of the parameters of interest that is close to optimal, and we demonstrate the applicability of the technique with a simple example based on pseudo-experiments and with a more complex example from high-energy particle physics.
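A minimal sketch of the two ingredients the abstract combines, under an assumed PyTorch implementation: a neural network for dimensionality reduction and a "soft" histogram whose bin counts feed a binned Poisson negative log-likelihood, so that gradients flow back through the histogram to the network weights. The paper optimizes an inference result such as the variance of a parameter of interest; for brevity this sketch backpropagates the likelihood itself:

```python
# Illustrative differentiable histogram + binned Poisson likelihood (PyTorch).
import torch

def soft_histogram(x, edges, bandwidth=0.1):
    """Differentiable surrogate for a histogram: each event contributes a
    sigmoid-smoothed indicator to every bin instead of a hard 0/1 count."""
    left, right = edges[:-1], edges[1:]
    w = (torch.sigmoid((x[:, None] - left) / bandwidth)
         - torch.sigmoid((x[:, None] - right) / bandwidth))
    return w.sum(dim=0)                     # soft count per bin

def binned_poisson_nll(observed, expected):
    """Negative log-likelihood of binned Poisson counts (constants dropped)."""
    return (expected - observed * torch.log(expected)).sum()

# Toy usage: a scalar NN output histogrammed into 8 bins on [0, 1].
net = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, 1), torch.nn.Sigmoid())
events = torch.randn(1000, 4)
edges = torch.linspace(0.0, 1.0, 9)
counts = soft_histogram(net(events).squeeze(-1), edges)
observed = torch.full((8,), 125.0)          # placeholder observed counts
loss = binned_poisson_nll(observed, counts + 1e-6)
loss.backward()                             # gradients reach the network
```

As the bandwidth shrinks, the soft counts approach the ordinary hard histogram while keeping the workflow differentiable end to end.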
Preprocessing Solar Images while Preserving their Latent Structure
Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics
Observatory, a NASA satellite, collect massive streams of high resolution
images of the Sun through multiple wavelength filters. Reconstructing
pixel-by-pixel thermal properties based on these images can be framed as an
ill-posed inverse problem with Poisson noise, but this reconstruction is
computationally expensive and there is disagreement among researchers about
what regularization or prior assumptions are most appropriate. This article
presents an image segmentation framework for preprocessing such images in order
to reduce the data volume while preserving as much thermal information as
possible for later downstream analyses. The resulting segmented images reflect
thermal properties but do not depend on solving the ill-posed inverse problem.
This allows users to avoid the Poisson inverse problem altogether or to tackle it on each of ~10 segments rather than on each of ~10^7 pixels, reducing computing time by a factor of ~10^6. We employ a parametric
class of dissimilarities that can be expressed as cosine dissimilarity
functions or Hellinger distances between nonlinearly transformed vectors of
multi-passband observations in each pixel. We develop a decision theoretic
framework for choosing the dissimilarity that minimizes the expected loss that
arises when estimating identifiable thermal properties based on segmented
images rather than on a pixel-by-pixel basis. We also examine the efficacy of
different dissimilarities for recovering clusters in the underlying thermal
properties. The expected losses are computed under scientifically motivated
prior distributions. Two simulation studies guide our choices of dissimilarity
function. We illustrate our method by segmenting images of a coronal hole
observed on 26 February 2015.
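A minimal sketch of the dissimilarity family described above, assuming a simple power transform as the nonlinear transformation of each pixel's multi-passband vector; the exponent and the toy passband counts are illustrative, not the paper's fitted choices:

```python
# Illustrative Hellinger / cosine dissimilarities between pixel vectors.
import numpy as np

def transformed(x, power=0.5):
    """Nonlinear (here: power) transform of a pixel's passband vector,
    L1-normalized so that the Hellinger distance applies."""
    y = np.maximum(x, 0.0) ** power
    return y / y.sum()

def hellinger(p, q):
    """Hellinger distance between two nonnegative, L1-normalized vectors."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def cosine_dissimilarity(u, v):
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy usage: two pixels observed in six passbands. For L1-normalized p and q,
# hellinger(p, q)**2 equals cosine_dissimilarity(sqrt(p), sqrt(q)), which is
# the sense in which the two families coincide after a square-root transform.
pix_a = np.array([120.0, 340.0, 80.0, 15.0, 60.0, 200.0])
pix_b = np.array([110.0, 360.0, 70.0, 20.0, 55.0, 190.0])
p, q = transformed(pix_a), transformed(pix_b)
print(hellinger(p, q) ** 2, cosine_dissimilarity(np.sqrt(p), np.sqrt(q)))
```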
Visual Representations: Defining Properties and Deep Approximations
Visual representations are defined in terms of minimal sufficient statistics
of visual data, for a class of tasks, that are also invariant to nuisance
variability. Minimal sufficiency guarantees that we can store a representation
in lieu of raw data with smallest complexity and no performance loss on the
task at hand. Invariance guarantees that the statistic is constant with respect
to uninformative transformations of the data. We derive analytical expressions
for such representations and show they are related to feature descriptors
commonly used in computer vision, as well as to convolutional neural networks.
This link highlights the assumptions and approximations tacitly made by these methods and explains empirical practices such as clamping, pooling, and
joint normalization.
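A minimal sketch of the invariance claim as it applies to pooling: max-pooling over non-overlapping windows yields a statistic unchanged by translations smaller than the window. This is a toy illustration, not the paper's derivation:

```python
# Illustrative translation invariance of a pooled statistic.
import numpy as np

def max_pool_1d(x, size=4):
    """Max over non-overlapping windows of length `size`."""
    return x[: len(x) // size * size].reshape(-1, size).max(axis=1)

signal = np.zeros(32)
signal[10] = 1.0                      # a feature at position 10
shifted = np.roll(signal, 1)          # nuisance: shift by one pixel
# The pooled statistic is unchanged: the shift stays within one window.
print(np.array_equal(max_pool_1d(signal), max_pool_1d(shifted)))  # True
```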
Machine Learning for Set-Identified Linear Models
This paper provides estimation and inference methods for an identified set
where the selection among a very large number of covariates is based on modern
machine learning tools. I characterize the boundary of the identified set
(i.e., support function) using a semiparametric moment condition. Combining
Neyman-orthogonality and sample splitting ideas, I construct a root-N
consistent, uniformly asymptotically Gaussian estimator of the support function
and propose a weighted bootstrap procedure to conduct inference about the
identified set. I provide a general method to construct a Neyman-orthogonal
moment condition for the support function. Applying my method to the endogenous selection model of Lee (2008), I provide the asymptotic theory for the sharp (i.e., tightest possible) bounds on the Average Treatment Effect in the presence of high-dimensional covariates. Furthermore, I relax the conventional monotonicity assumption and allow the sign of the treatment effect on selection (e.g., employment) to be determined by covariates. Using the JobCorps data set with very rich baseline characteristics, I substantially tighten the bounds on the JobCorps effect on wages under the weakened monotonicity assumption.
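A minimal sketch of the machinery the abstract combines: cross-fitted nuisance estimation, a Neyman-orthogonal (here, doubly robust AIPW-type) moment, and a weighted (multiplier) bootstrap. The paper targets the support function of an identified set; this sketch applies the same recipe to a point-identified scalar, the Average Treatment Effect, and every learner and variable name is an illustrative assumption:

```python
# Illustrative cross-fitting + orthogonal moment + weighted bootstrap.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(1)
n, d = 4000, 5
X = rng.normal(size=(n, d))
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
Y = X[:, 0] + 2.0 * T + rng.normal(size=n)          # true ATE = 2

scores = np.empty(n)
folds = np.array_split(rng.permutation(n), 2)
for k, test in enumerate(folds):                    # cross-fitting
    train = folds[1 - k]
    e = RandomForestClassifier().fit(X[train], T[train])
    m1 = RandomForestRegressor().fit(X[train][T[train] == 1],
                                     Y[train][T[train] == 1])
    m0 = RandomForestRegressor().fit(X[train][T[train] == 0],
                                     Y[train][T[train] == 0])
    p = np.clip(e.predict_proba(X[test])[:, 1], 0.01, 0.99)
    g1, g0 = m1.predict(X[test]), m0.predict(X[test])
    # Neyman-orthogonal (doubly robust) score for E[Y(1) - Y(0)]: errors in
    # the nuisances e, m1, m0 enter the moment only at second order.
    scores[test] = (g1 - g0
                    + T[test] * (Y[test] - g1) / p
                    - (1 - T[test]) * (Y[test] - g0) / (1 - p))

theta_hat = scores.mean()
# Weighted bootstrap: perturb the empirical moment with i.i.d. mean-one
# exponential weights instead of resampling, then read off quantiles.
boot = np.array([np.average(scores, weights=rng.exponential(size=n))
                 for _ in range(500)])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(f"ATE estimate {theta_hat:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Reweighting the fixed scores rather than refitting the learners is what makes the weighted bootstrap cheap; the orthogonality of the moment is what justifies ignoring nuisance-estimation error at first order.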