Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received comparatively little attention, especially in the setting where the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last applies to exa-scale data dimensions. We illustrate this high-dimensional framework for the problem of
correlation mining, where the object of interest is the matrix of pairwise and partial correlations among the variables. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
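The variable-rich, sample-starved regime the abstract describes can be illustrated with a minimal simulation (all names and sizes here are hypothetical, not from the paper): with the sample size held fixed, the largest spurious pairwise correlation among completely independent variables grows with the dimension.

```python
import numpy as np

def max_spurious_correlation(n, p, seed=0):
    """Largest absolute pairwise sample correlation among p independent
    standard-normal variables observed with only n samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    C = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(C, 0.0)   # ignore the trivial self-correlations
    return np.abs(C).max()

# Fixed sample size, growing dimension: the maximum spurious correlation
# creeps upward even though every variable is independent of the others.
n = 20
for p in (10, 100, 1000):
    print(p, round(max_spurious_correlation(n, p), 2))
```

This is exactly why sample complexity, not just computational complexity, matters when correlation mining at large p with small n.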
Tensor Regression with Applications in Neuroimaging Data Analysis
Classical regression methods treat covariates as a vector and estimate a
corresponding vector of regression coefficients. Modern applications in medical
imaging generate covariates of more complex form such as multidimensional
arrays (tensors). Traditional statistical and computational methods are proving
insufficient for analysis of these high-throughput data due to their ultrahigh
dimensionality as well as complex structure. In this article, we propose a new
family of tensor regression models that efficiently exploit the special
structure of tensor covariates. Under this framework, ultrahigh dimensionality
is reduced to a manageable level, resulting in efficient estimation and
prediction. A fast and highly scalable estimation algorithm is proposed for
maximum likelihood estimation and its associated asymptotic properties are
studied. Effectiveness of the new methods is demonstrated on both synthetic and
real MRI data.
Comment: 27 pages, 4 figures
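The dimension reduction the abstract refers to can be seen in a simple parameter count: a low-rank CP (CANDECOMP/PARAFAC) factorization of the coefficient tensor replaces one parameter per tensor entry with one factor matrix per mode. A minimal sketch, with hypothetical sizes chosen only for illustration:

```python
# Parameter counts for a full coefficient tensor versus a rank-R CP
# factorization, the kind of low-rank structure tensor regression exploits.

def full_params(dims):
    """Number of unknowns in an unstructured coefficient tensor."""
    out = 1
    for d in dims:
        out *= d
    return out

def cp_params(dims, rank):
    """Number of unknowns in a rank-R CP model: one (d, rank) factor
    matrix per mode."""
    return rank * sum(dims)

dims = (64, 64, 64)          # e.g. a 3D image volume of covariates
print(full_params(dims))     # 64^3 = 262144 unknowns, unstructured
print(cp_params(dims, 3))    # 3 * (64+64+64) = 576 unknowns at rank 3
```

At rank 3 the ultrahigh-dimensional estimation problem shrinks by three orders of magnitude, which is what makes maximum likelihood estimation tractable in this setting.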
Sparse variational regularization for visual motion estimation
The computation of visual motion is a key component in numerous computer vision tasks such as object detection, visual object tracking and activity recognition. Despite extensive research effort, efficient handling of motion discontinuities, occlusions and illumination changes remains elusive in visual motion estimation. The work presented in this thesis utilizes variational methods to handle the aforementioned problems because these methods allow the integration of various mathematical concepts into a single energy minimization framework. This thesis applies concepts from signal sparsity to variational regularization for visual motion estimation. The regularization is designed in such a way that it handles motion discontinuities and can detect object occlusions.
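Sparse regularizers of the kind used in such variational energies are typically handled with a proximal step; the workhorse is the soft-thresholding operator, which shrinks small coefficients exactly to zero. A minimal sketch (the coefficient values are made up for illustration, not taken from the thesis):

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm, used in proximal-gradient
    schemes for sparse variational regularization: entries smaller in
    magnitude than lam are set exactly to zero, the rest are shrunk."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Toy example: sparsifying a set of flow-field expansion coefficients.
coeffs = np.array([2.0, -0.1, 0.05, -1.5, 0.2])
print(soft_threshold(coeffs, 0.3))   # prints [ 1.7 -0.  0. -1.2 -0. ] up to signs of zero
```

The zeroed coefficients are what let an L1-type regularizer preserve motion discontinuities rather than smoothing across them.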
Semi-Global Stereo Matching with Surface Orientation Priors
Semi-Global Matching (SGM) is a widely-used efficient stereo matching
technique. It works well for textured scenes, but fails on untextured slanted
surfaces due to its fronto-parallel smoothness assumption. To remedy this
problem, we propose a simple extension, termed SGM-P, to utilize precomputed
surface orientation priors. Such priors favor different surface slants in
different 2D image regions or 3D scene regions and can be derived in various
ways. In this paper we evaluate plane orientation priors derived from stereo
matching at a coarser resolution and show that such priors can yield
significant performance gains for difficult weakly-textured scenes. We also
explore surface normal priors derived from Manhattan-world assumptions, and we
analyze the potential performance gains using oracle priors derived from
ground-truth data. SGM-P only adds a minor computational overhead to SGM and is
an attractive alternative to more complex methods employing higher-order
smoothness terms.
Comment: extended draft of 3DV 2017 (spotlight) paper
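The fronto-parallel smoothness assumption the abstract mentions lives in SGM's scanline aggregation recurrence, where a small penalty P1 is paid for one-pixel disparity changes and a larger penalty P2 for bigger jumps. A minimal one-directional sketch (array shapes and values are illustrative, not from the paper; SGM-P would additionally modulate the penalties with orientation priors):

```python
import numpy as np

def aggregate_scanline(cost, P1=1.0, P2=4.0):
    """One-directional SGM cost aggregation along a scanline.

    cost: (width, ndisp) array of per-pixel matching costs.
    Implements L(x, d) = C(x, d)
        + min(L(x-1, d), L(x-1, d±1) + P1, min_k L(x-1, k) + P2)
        - min_k L(x-1, k).
    """
    W, D = cost.shape
    L = np.empty_like(cost)
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        best = np.minimum(prev, m + P2)           # same disparity, or a big jump
        best[1:] = np.minimum(best[1:], prev[:-1] + P1)   # disparity d-1 -> d
        best[:-1] = np.minimum(best[:-1], prev[1:] + P1)  # disparity d+1 -> d
        L[x] = cost[x] + best - m                 # subtract m to bound growth
    return L

# Winner-take-all disparity after aggregation on a toy 3-pixel scanline:
cost = np.array([[0., 5., 5.],
                 [5., 0., 5.],
                 [5., 5., 0.]])
print(aggregate_scanline(cost).argmin(axis=1))   # prints [0 1 2]
```

Because P1 is charged for every one-step disparity change, a slanted surface accumulates penalty along the scanline; precomputed orientation priors are one way to redistribute that cost.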
Matching in Selective and Balanced Representation Space for Treatment Effects Estimation
Observational data are becoming dramatically more available across many domains of science and technology, which facilitates the study of causal inference. However, estimating treatment effects from
observational data faces two major challenges: missing counterfactual outcomes and treatment selection bias. Matching methods are among the most
widely used and fundamental approaches to estimating treatment effects, but
existing matching methods perform poorly on data with high-dimensional and complicated variables. We propose a feature selection
representation matching (FSRM) method based on deep representation learning and
matching, which maps the original covariate space into a selective, nonlinear,
and balanced representation space, and then conducts matching in the learned
representation space. FSRM adopts deep feature selection to minimize the
influence of irrelevant variables for estimating treatment effects and
incorporates a regularizer based on the Wasserstein distance to learn balanced
representations. We evaluate the performance of our FSRM method on three
datasets, and the results demonstrate superiority over the state-of-the-art
methods.
Comment: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20)
An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure
We present an exploration of the rich theoretical connections between several
classes of regularized models, network flows, and recent results in submodular
function theory. This work unifies key aspects of these problems under a common
theory, leading to novel methods for working with several important models of
interest in statistics, machine learning and computer vision.
In Part 1, we review the concepts of network flows and submodular function
optimization theory foundational to our results. We then examine the
connections between network flows and the minimum-norm algorithm from
submodular optimization, extending and improving several current results. This
leads to a concise representation of the structure of a large class of pairwise
regularized models important in machine learning, statistics and computer
vision.
In Part 2, we describe the full regularization path of a class of penalized
regression problems with dependent variables that includes the graph-guided
LASSO and total variation constrained models. This description motivates a practical algorithm that efficiently finds the regularization path of discretized TV-penalized models. Ultimately, our new
algorithms scale up to high-dimensional problems with millions of variables.
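The pairwise dependent regularizers discussed above, such as the total variation penalty behind the graph-guided LASSO's chain-graph special case, are sums of absolute differences between neighboring variables. A minimal sketch of the penalty itself (the signals are made up for illustration):

```python
import numpy as np

def tv_penalty(x):
    """Discrete 1D total variation: the sum of absolute successive
    differences, a pairwise dependent regularizer on a chain graph."""
    return np.abs(np.diff(x)).sum()

piecewise = np.array([0., 0., 0., 3., 3., 3.])
noisy     = np.array([0., 1., 0., 3., 2., 3.])
print(tv_penalty(piecewise))   # prints 3.0: a single jump is cheap
print(tv_penalty(noisy))       # prints 7.0: oscillation is penalized
```

Minimizing a loss plus this penalty favors piecewise-constant solutions, and it is the structure of this penalty, viewed through network flows and submodularity, that the two parts of the paper exploit.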