Speeding up Permutation Testing in Neuroimaging
Multiple hypothesis testing is a significant problem in nearly all
neuroimaging studies. In order to correct for this phenomenon, we require a
reliable estimate of the Family-Wise Error Rate (FWER). The well-known
Bonferroni correction method, while simple to implement, is quite conservative
and can substantially under-power a study because it ignores dependencies
between test statistics. Permutation testing, on the other hand, is an exact,
non-parametric method of estimating the FWER for a given α-threshold,
but for acceptably low thresholds the computational burden can be prohibitive.
In this paper, we show that permutation testing in fact amounts to populating
the columns of a very large matrix. By analyzing the spectrum of this
matrix, under certain conditions, we see that it has a low-rank plus a
low-variance residual decomposition, which makes it suitable for highly
sub-sampled matrix completion methods. Based on this observation, we propose a
novel permutation testing methodology which offers a large speedup without
sacrificing the fidelity of the estimated FWER. Our evaluations on four
different neuroimaging datasets show that a substantial computational speedup
can be achieved while recovering the FWER distribution to very high accuracy.
Further, we show that the estimated threshold is also recovered faithfully and
is stable.
Comment: NIPS 1
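A minimal Python/NumPy sketch of the object this abstract describes: each permutation fills one column of a voxel-by-permutation matrix of test statistics, and the FWER-controlling threshold is a quantile of the column-wise maxima. The data layout, the two-sample t-statistic, and all names are illustrative assumptions, and the paper's matrix-completion speedup is not implemented here.

    import numpy as np

    def permutation_fwer_threshold(data, labels, n_perm=10000, alpha=0.05, seed=0):
        # data: (n_subjects, n_voxels) array; labels: binary group label per subject
        rng = np.random.default_rng(seed)
        n_subjects, n_voxels = data.shape
        T = np.empty((n_voxels, n_perm))              # the large matrix: one column per permutation
        for j in range(n_perm):
            perm = rng.permutation(labels)            # randomly relabel the two groups
            g1, g2 = data[perm == 1], data[perm == 0]
            se = np.sqrt(g1.var(axis=0, ddof=1) / len(g1) +
                         g2.var(axis=0, ddof=1) / len(g2))
            T[:, j] = (g1.mean(axis=0) - g2.mean(axis=0)) / se   # voxel-wise t statistics
        max_stats = np.abs(T).max(axis=0)             # maximum statistic controls the FWER
        return np.quantile(max_stats, 1 - alpha)      # reject voxels whose statistic exceeds this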
Automatic Dimension Selection for a Non-negative Factorization Approach to Clustering Multiple Random Graphs
We consider a problem of grouping multiple graphs into several clusters using
singular value thresholding and non-negative factorization. We derive a model
selection information criterion to estimate the number of clusters. We
demonstrate our approach using the "Swimmer data set" as well as a simulated data
set, and compare its performance with two standard clustering algorithms.
Comment: This paper has been withdrawn by the author due to a newer version
with overlapping content.
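As a rough illustration of the pipeline sketched in this abstract (assuming each graph is given as an adjacency matrix), one can denoise the stacked, vectorized graphs by soft-thresholding their singular values and then cluster them with a non-negative factorization; the information criterion that selects the number of clusters is not reproduced, and the threshold below is arbitrary.

    import numpy as np
    from sklearn.decomposition import NMF

    def cluster_graphs(adjacencies, n_clusters, sv_threshold=1.0):
        # stack the vectorized adjacency matrices into a graphs-by-edges data matrix
        X = np.stack([A.reshape(-1) for A in adjacencies])
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - sv_threshold, 0.0)                  # singular value thresholding
        X_denoised = np.clip(U @ np.diag(s) @ Vt, 0.0, None)   # keep entries non-negative for NMF
        W = NMF(n_components=n_clusters, init="nndsvda", max_iter=500).fit_transform(X_denoised)
        return W.argmax(axis=1)                                # cluster label for each graph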
Assessing Information Bias and Food Safety
Imperfect information can lead to market failure and be an external factor impacting managers of agribusiness firms. A matrix-method approach to content analysis was conducted by independent judges based upon established typologies. Food safety articles from consumer publications were examined, and the information received by consumers was found to be biased.
Keywords: food safety, information bias, consumers, media, Food Consumption/Nutrition/Food Safety, Marketing, Q10, Q13, Q16
Sparse Recovery via Differential Inclusions
In this paper, we recover sparse signals from their noisy linear measurements
by solving nonlinear differential inclusions, which is based on the notion of
inverse scale space (ISS) developed in applied mathematics. Our goal here is to
bring this idea to address a challenging problem in statistics, \emph{i.e.}
finding the oracle estimator which is unbiased and sign-consistent using
dynamics. We call our dynamics \emph{Bregman ISS} and \emph{Linearized Bregman
ISS}. A well-known shortcoming of LASSO and any convex regularization
approaches lies in the bias of estimators. However, we show that under proper
conditions, there exists a bias-free and sign-consistent point on the solution
paths of such dynamics, which corresponds to a signal that is the unbiased
estimate of the true signal and whose entries have the same signs as the
true signal, \emph{i.e.} the oracle estimator. Therefore, their solution
paths are better regularization paths than the LASSO path, since
the points on the latter are biased when sign-consistency is reached. We
also show how to efficiently compute their solution paths in both continuous
and discretized settings: the full solution paths can be exactly computed piece
by piece, and a discretization leads to \emph{Linearized Bregman iteration},
which is a simple iterative thresholding rule and easy to parallelize.
Theoretical guarantees such as sign-consistency and minimax optimal $\ell_2$-error
bounds are established in both continuous and discrete settings for specific
points on the paths. Early-stopping rules for identifying these points are
given. The key treatment relies on the development of differential inequalities
for differential inclusions and their discretizations, which extends the
previous results and leads to exponentially fast recovery of sparse signals
before wrong ones are selected.
Comment: In Applied and Computational Harmonic Analysis, 201
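The Linearized Bregman iteration mentioned above is a two-line update, a gradient step on an auxiliary variable followed by soft-thresholding; below is a minimal NumPy sketch under illustrative parameter choices (the variable names and step-size rule are assumptions, not the authors' notation).

    import numpy as np

    def linearized_bregman(A, y, kappa=5.0, n_iter=2000):
        # step size chosen so that tau * kappa * ||A||_2^2 < 2, which keeps the iteration stable
        tau = 1.0 / (kappa * np.linalg.norm(A, 2) ** 2)
        z = np.zeros(A.shape[1])
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            z = z - tau * A.T @ (A @ x - y)                            # gradient step on the auxiliary variable
            x = kappa * np.sign(z) * np.maximum(np.abs(z) - 1.0, 0.0)  # soft-thresholding (shrinkage)
        return x

    # usage: y = A @ x_true + noise for a sparse x_true; stopping the iterations early
    # traces out the regularization path discussed in the abstract.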
The Augmented Synthetic Control Method
The synthetic control method (SCM) is a popular approach for estimating the
impact of a treatment on a single unit in panel data settings. The "synthetic
control" is a weighted average of control units that balances the treated
unit's pre-treatment outcomes as closely as possible. A critical feature of the
original proposal is to use SCM only when the fit on pre-treatment outcomes is
excellent. We propose Augmented SCM as an extension of SCM to settings where
such pre-treatment fit is infeasible. Analogous to bias correction for inexact
matching, Augmented SCM uses an outcome model to estimate the bias due to
imperfect pre-treatment fit and then de-biases the original SCM estimate. Our
main proposal, which uses ridge regression as the outcome model, directly
controls pre-treatment fit while minimizing extrapolation from the convex hull.
This estimator can also be expressed as a solution to a modified synthetic
controls problem that allows negative weights on some donor units. We bound the
estimation error of this approach under different data generating processes,
including a linear factor model, and show how regularization helps to avoid
over-fitting to noise. We demonstrate gains from Augmented SCM with extensive
simulation studies and apply this framework to estimate the impact of the 2012
Kansas tax cuts on economic growth. We implement the proposed method in the new
augsynth R package.
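The ridge-augmented estimate itself is a one-line correction on top of ordinary SCM weights: the outcome model, fit on the donor units, maps the treated unit's pre-treatment fit discrepancy into a bias estimate. Below is a minimal Python sketch for a single post-treatment period; the weights w are assumed to come from a standard SCM solver, and all names are illustrative (the authors' reference implementation is the augsynth R package).

    import numpy as np
    from sklearn.linear_model import Ridge

    def augmented_scm_effect(Y0_pre, Y0_post, y1_pre, y1_post, w, alpha=1.0):
        # Y0_pre:  (N0, T0) pre-treatment outcomes of the donor units
        # Y0_post: (N0,)    post-treatment outcomes of the donor units
        # y1_pre:  (T0,)    pre-treatment outcomes of the treated unit
        # y1_post: float    post-treatment outcome of the treated unit
        # w:       (N0,)    synthetic-control weights from a standard SCM solver
        model = Ridge(alpha=alpha).fit(Y0_pre, Y0_post)        # outcome model fit on the donors
        scm = w @ Y0_post                                      # plain SCM counterfactual
        bias = model.predict(y1_pre.reshape(1, -1))[0] - w @ model.predict(Y0_pre)
        return y1_post - (scm + bias)                          # de-biased treatment effect estimate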
On the Power of Adaptivity in Matrix Completion and Approximation
We consider the related tasks of matrix completion and matrix approximation
from missing data and propose adaptive sampling procedures for both problems.
We show that adaptive sampling allows one to eliminate standard incoherence
assumptions on the matrix row space that are necessary for passive sampling
procedures. For exact recovery of a low-rank matrix, our algorithm judiciously
selects a few columns to observe in full and, with few additional measurements,
projects the remaining columns onto their span. This algorithm exactly recovers
a low-rank matrix using a number of observations governed by a coherence
parameter on the column space of the matrix. In
addition to completely eliminating any row space assumptions that have pervaded
the literature, this algorithm enjoys a better sample complexity than any
existing matrix completion algorithm. To certify that this improvement is due
to adaptive sampling, we establish that row space coherence is necessary for
passive sampling algorithms to achieve non-trivial sample complexity bounds.
For constructing a low-rank approximation to a high-rank input matrix, we
propose a simple algorithm that thresholds the singular values of a zero-filled
version of the input matrix. The algorithm computes an approximation that is
nearly as good as the best approximation of the target rank, using a number of
samples governed by a slightly different coherence parameter on the matrix
columns. Again, we eliminate assumptions on the row space.
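A minimal NumPy sketch of the column-space-tracking idea for exact recovery described above, assuming query access to single entries and to full columns (observe_entries and observe_column are hypothetical callables); the per-column sample size and tolerance are illustrative, and the paper's sampling-size guarantees are not reproduced.

    import numpy as np

    def adaptively_complete(observe_entries, observe_column, n_rows, n_cols, m, tol=1e-8, seed=0):
        rng = np.random.default_rng(seed)
        U = np.zeros((n_rows, 0))                  # current orthonormal basis for the column space
        M_hat = np.zeros((n_rows, n_cols))
        for j in range(n_cols):
            rows = rng.choice(n_rows, size=m, replace=False)
            vals = observe_entries(j, rows)        # sample a few entries of column j
            if U.shape[1] == 0:
                residual = np.inf                  # no basis yet: the first column is observed fully
            else:
                coef, *_ = np.linalg.lstsq(U[rows, :], vals, rcond=None)
                residual = np.linalg.norm(vals - U[rows, :] @ coef)
            if residual > tol:                     # new direction detected: observe the column in full
                col = observe_column(j)
                M_hat[:, j] = col
                res = col - U @ (U.T @ col)        # orthogonalize against the current basis
                U = np.column_stack([U, res / np.linalg.norm(res)])
            else:                                  # column lies in the span: reconstruct by projection
                M_hat[:, j] = U @ coef
        return M_hat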
Relevance Singular Vector Machine for low-rank matrix sensing
In this paper we develop a new Bayesian inference method for low rank matrix
reconstruction. We call the new method the Relevance Singular Vector Machine
(RSVM) where appropriate priors are defined on the singular vectors of the
underlying matrix to promote low rank. To accelerate computations, a
numerically efficient approximation is developed. The proposed algorithms are
applied to matrix completion and matrix reconstruction problems and their
performance is studied numerically.
Comment: International Conference on Signal Processing and Communications
(SPCOM), 5 pages