Near-optimal mean estimators with respect to general norms
We study the problem of estimating the mean of a random vector in $\mathbb{R}^d$
based on an i.i.d. sample, when the accuracy of the estimator
is measured by a general norm on $\mathbb{R}^d$. We construct an estimator
(that depends on the norm) that achieves an essentially optimal
accuracy/confidence tradeoff under the sole assumption that the random vector
has a well-defined covariance matrix. The estimator is built on a uniform
median-of-means estimator over a class of real-valued functions, a construction
that may be of independent interest.
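The basic building block behind such estimators is the scalar median-of-means estimate. The Python sketch below shows only that one-dimensional version. It assumes equal-size blocks and discards leftover samples. It is not the norm-dependent, uniform construction described in the abstract, and the helper name `median_of_means` is hypothetical.

```python
import statistics


def median_of_means(samples, k):
    """Scalar median-of-means (hypothetical helper, for illustration only).

    Splits the samples into k equal-size blocks, averages each block and
    returns the median of the block means. The last len(samples) % k
    samples are discarded so that every block has the same size.
    """
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(samples)")
    size = n // k
    block_means = [
        statistics.fmean(samples[j * size:(j + 1) * size]) for j in range(k)
    ]
    return statistics.median(block_means)


# A single gross outlier drags the empirical mean far more than the estimate.
data = [1, 1, 1, 2, 2, 2, 3, 3, 1000]
print(statistics.fmean(data))      # empirical mean, pulled up by the 1000
print(median_of_means(data, k=3))  # block means 1.0, 2.0, 335.33... -> 2.0
```

In the usual analysis the number of blocks is tied to the target confidence level (on the order of $\log(1/\delta)$ blocks for failure probability $\delta$). Here `k` is simply passed in.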
Algorithms and Theory for Robust PCA and Phase Retrieval
In this dissertation, we investigate two problems, both of which require the recovery of unknowns from measurements that are potentially corrupted by outliers. The first part focuses on the problem of \emph{robust principal component analysis} (PCA), which aims to recover an unknown low-rank matrix from a corrupted and partially observed matrix.
The robust PCA problem, which is itself nonconvex, has been solved in the literature via a convex-relaxation approach, \emph{principal component pursuit} (PCP).
However, previous works assume that the sparse errors are spread uniformly over the entire matrix, and characterize the conditions under which PCP guarantees exact recovery. We generalize these results by allowing non-uniform error corruption over the low-rank matrix. Specifically, we characterize conditions on the corruption probability of each individual entry, based on the local coherence of the low-rank matrix, under which PCP is guaranteed to recover the matrix correctly. Our results also yield new insights into the graph clustering problem beyond the existing literature.
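For reference, principal component pursuit with partial observations is commonly written as the convex program below. Here $M$ is the observed matrix, $\Omega$ the set of observed entries, $\mathcal{P}_\Omega$ the projection onto those entries, $\|\cdot\|_*$ the nuclear norm, $\|\cdot\|_1$ the entrywise $\ell_1$ norm, and $\lambda > 0$ a weighting parameter. This is the standard form, stated here as an assumption. The thesis's exact formulation, and its non-uniform model in which entry $(i,j)$ is corrupted with its own probability $\rho_{ij}$, may differ in details.

```latex
\begin{aligned}
\min_{L,\,S}\quad & \|L\|_* + \lambda \|S\|_1 \\
\text{subject to}\quad & \mathcal{P}_\Omega(L + S) = \mathcal{P}_\Omega(M).
\end{aligned}
```

The nuclear norm acts as a convex surrogate for rank, and the $\ell_1$ norm as a convex surrogate for the number of nonzero (corrupted) entries.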
The second part of the thesis studies the phase retrieval problem, which requires recovering an unknown vector from only the magnitudes of its measurements. Unlike in the first part, we solve this problem directly by optimizing nonconvex objectives. Since the nonconvex objective is typically constructed so that the true vector is its global optimizer, the difficulty lies in designing algorithms that find the global optimizer efficiently and provably.
To this end, we propose a gradient-like algorithm named reshaped Wirtinger flow (RWF). For random Gaussian measurements, we show that RWF converges linearly to a global optimizer as long as the number of measurements is on the order of the dimension of the unknown vector. This achieves the optimal order of sample complexity as well as state-of-the-art computational efficiency.
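To make the gradient step concrete, here is a minimal real-valued sketch in the spirit of RWF. It uses pure Python and runs gradient descent on the reshaped loss $f(z) = \frac{1}{2m}\sum_i (|a_i^\top z| - y_i)^2$, whose gradient is $\frac{1}{m}\sum_i (a_i^\top z - y_i\,\mathrm{sign}(a_i^\top z))\,a_i$. Several assumptions apply. It uses a plain spectral initialization in place of the thesis's own initializer. The step size 0.8, the iteration counts and the helper names are illustrative choices. The measurements are assumed real Gaussian.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def reshaped_wirtinger_flow(A, y, iters=300, step=0.8, power_iters=100, seed=0):
    """Hypothetical real-valued RWF-style solver (sketch, not the thesis code).

    A: list of m measurement vectors a_i of length n.
    y: magnitudes y_i = |a_i^T x|.
    Returns an estimate of x, identifiable only up to a global sign.
    """
    m, n = len(A), len(A[0])
    rng = random.Random(seed)

    # Assumed plain spectral initialization: leading eigenvector of
    # Y = (1/m) * sum_i y_i^2 a_i a_i^T, found by power iteration.
    Y = [[sum(y[i] ** 2 * A[i][p] * A[i][q] for i in range(m)) / m
          for q in range(n)] for p in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(power_iters):
        w = [dot(row, v) for row in Y]
        norm_w = math.sqrt(dot(w, w))
        v = [wi / norm_w for wi in w]
    # For Gaussian a_i, E[y_i^2] = ||x||^2, which sets the initial norm.
    scale = math.sqrt(sum(yi * yi for yi in y) / m)
    z = [scale * vi for vi in v]

    # Gradient descent on the reshaped (magnitude-based) loss.
    for _ in range(iters):
        grad = [0.0] * n
        for a, yi in zip(A, y):
            az = dot(a, z)
            residual = az - yi * (1.0 if az >= 0 else -1.0)
            for p in range(n):
                grad[p] += residual * a[p] / m
        z = [zp - step * gp for zp, gp in zip(z, grad)]
    return z


# Demo on synthetic Gaussian measurements (illustrative sizes only).
demo_rng = random.Random(1)
n_demo, m_demo = 4, 300
x_demo = [demo_rng.gauss(0.0, 1.0) for _ in range(n_demo)]
A_demo = [[demo_rng.gauss(0.0, 1.0) for _ in range(n_demo)] for _ in range(m_demo)]
y_demo = [abs(dot(a, x_demo)) for a in A_demo]
z_demo = reshaped_wirtinger_flow(A_demo, y_demo)
# Relative error, minimized over the unavoidable global sign ambiguity.
err = min(math.dist(z_demo, x_demo),
          math.dist(z_demo, [-t for t in x_demo])) / math.sqrt(dot(x_demo, x_demo))
print(err)
```

The loss uses magnitudes $|a_i^\top z|$ rather than squared magnitudes, so its gradient is only piecewise smooth. The sign term handles the kink, and near the solution the update behaves like a least-squares step.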
Moreover, we study the phase retrieval problem when the measurements are corrupted by adversarial outliers, which model situations with missing data or sensor failures. To resist possible outliers in the observations in an oblivious manner, we propose a novel median-truncation approach that modifies the nonconvex approach in both the initialization and the gradient descent steps. We apply the median-truncation approach to the Poisson loss and the reshaped quadratic loss, respectively, and obtain two algorithms, \emph{median-TWF} and \emph{median-RWF}. We show that both algorithms recover the signal from a near-optimal number of independent Gaussian measurements, even when a constant fraction of the measurements is corrupted. We further show that both algorithms are stable when measurements are corrupted by both sparse arbitrary outliers and dense bounded noise. We establish these performance guarantees by developing non-trivial concentration properties of median-related quantities, which may be of independent interest.
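The idea of median truncation can be sketched for a single gradient step on the reshaped loss. Samples whose residual is large relative to the sample median of all residuals are dropped, so a few huge corrupted magnitudes cannot dominate the update. The specific rule below (keep sample $i$ if $|y_i - |a_i^\top z|| \le \alpha \cdot \mathrm{median}_j |y_j - |a_j^\top z||$), the default $\alpha = 3$, and the function name are simplified assumptions for illustration. The thesis's exact truncation rules, including the truncated initialization, are not reproduced.

```python
import statistics


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def median_truncated_gradient(A, y, z, alpha=3.0):
    """Reshaped-loss gradient over samples that pass a median test.

    Hypothetical simplified rule: sample i is kept only if
        |y_i - |a_i^T z|| <= alpha * median_j |y_j - |a_j^T z||.
    Returns (gradient, number_of_kept_samples).
    """
    m, n = len(A), len(z)
    proj = [dot(a, z) for a in A]
    resid = [abs(yi - abs(p)) for yi, p in zip(y, proj)]
    threshold = alpha * statistics.median(resid)
    grad = [0.0] * n
    kept = 0
    for a, yi, p, r in zip(A, y, proj, resid):
        if r > threshold:
            continue  # treat as a likely outlier and skip it in this step
        kept += 1
        g = p - yi * (1.0 if p >= 0 else -1.0)
        for k in range(n):
            grad[k] += g * a[k] / m
    return grad, kept


# At the true signal, one corrupted magnitude is filtered out, so the
# truncated gradient vanishes even though the data contain an outlier.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]
x = [1.0, 2.0]
y = [abs(dot(a, x)) for a in A]
y[4] = 100.0  # simulated sensor failure
print(median_truncated_gradient(A, y, x))  # ([0.0, 0.0], 4)
```

Using the median rather than the mean to set the threshold matters here: the median of the residuals is barely affected by a constant fraction of arbitrarily large outliers, whereas a mean-based threshold would be inflated by them.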
Loschmidt echoes in two-body random matrix ensembles
Fidelity decay is studied for quantum many-body systems with a dominant
independent-particle Hamiltonian, resulting, e.g., from a mean-field theory, and a
weak two-body interaction. The diagonal terms of the interaction are included
in the unperturbed Hamiltonian, while the off-diagonal terms constitute the
perturbation that distorts the echo. We give the linear response solution for
this problem in a random matrix framework. While the ensemble average shows no
surprising behavior, we find that the typical ensemble member as represented by
the median displays a very slow fidelity decay known as ``freeze''. Numerical
calculations confirm this result and show that the ground state displays the
freeze even on average. This may contribute to an explanation of the
``unreasonable'' success of mean-field theories.
Comment: 9 pages, 5 figures (6 eps files), RevTex; v2: slight modifications
following the referees' suggestions
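As a point of reference for the linear-response regime, the standard second-order time-dependent perturbation expansion can be written out for an unperturbed eigenstate. The notation is ours, not the paper's, and assumes $\hbar = 1$, $H_0|k\rangle = E_k|k\rangle$, and a perturbation $\lambda V$ with vanishing diagonal elements $V_{kk} = 0$, matching the split of diagonal and off-diagonal terms described above. It also assumes $\omega_{jk} = E_j - E_k$ and the interaction-picture operator $\tilde V(\tau) = e^{iH_0\tau} V e^{-iH_0\tau}$.

```latex
\begin{aligned}
f_k(t) &= \langle k |\, e^{iH_0 t}\, e^{-i(H_0 + \lambda V)t} \,| k \rangle, \\
F_k(t) = |f_k(t)|^2
  &\approx 1 - \lambda^2 \int_0^t \! d\tau \int_0^t \! d\tau' \,
     \langle k | \tilde V(\tau)\, \tilde V(\tau') | k \rangle \\
  &= 1 - \lambda^2 \sum_{j \neq k} |V_{jk}|^2 \,
     \frac{4 \sin^2(\omega_{jk} t / 2)}{\omega_{jk}^2}.
\end{aligned}
```

The last line follows from $\langle k|\tilde V(\tau)\tilde V(\tau')|k\rangle = \sum_j |V_{jk}|^2 e^{-i\omega_{jk}(\tau-\tau')}$ and $\bigl|\int_0^t e^{-i\omega\tau}\,d\tau\bigr|^2 = 4\sin^2(\omega t/2)/\omega^2$. Averaging this expression over a random-matrix ensemble gives the mean linear-response decay. The abstract's point is that the median over ensemble members can behave quite differently from this average. The paper's explicit results for the two-body ensembles are not reproduced here.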
A cubic algorithm for the generalized rank median of three genomes
The area of genome rearrangements has given rise to a number of interesting biological, mathematical, and algorithmic problems. Among these, one of the most intractable has been finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes with respect to this distance can be computed exactly in polynomial time $O(n^\omega)$, where $\omega \le 3$, when the median is allowed to be an arbitrary orthogonal matrix.
Results: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal and symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes.
Conclusions: We test our method on both simulated and real data. We find that the majority of realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well, reconstructing a genomic matrix whose score is close to the lower bound while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially for ancestral genome reconstruction.
FUNDAÇÃO DE AMPARO À PESQUISA DO ESTADO DE SÃO PAULO - FAPESP 2016/01511-
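To make the distance concrete, here is a small sketch of the rank distance between two genome matrices. It assumes the matrix convention from the earlier rank-distance literature, which is not spelled out in this abstract. Gene extremities are labeled $0,\dots,N-1$. An adjacency $\{a,b\}$ sets $M_{ab} = M_{ba} = 1$, and an unpaired extremity (a telomere) $t$ sets $M_{tt} = 1$. The distance is then $\mathrm{rank}(A - B)$. The helper names are hypothetical, and the median computation itself is not shown.

```python
from fractions import Fraction


def genome_matrix(n_extremities, adjacencies):
    """Assumed convention: adjacency {a, b} -> M[a][b] = M[b][a] = 1,
    every extremity not in an adjacency is a telomere -> M[t][t] = 1."""
    M = [[0] * n_extremities for _ in range(n_extremities)]
    paired = set()
    for a, b in adjacencies:
        M[a][b] = M[b][a] = 1
        paired.update((a, b))
    for t in range(n_extremities):
        if t not in paired:
            M[t][t] = 1
    return M


def rank(matrix):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [x - factor * p for x, p in zip(rows[i], rows[r])]
        r += 1
    return r


def rank_distance(A, B):
    """Rank distance d(A, B) = rank(A - B)."""
    return rank([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)])


A = genome_matrix(4, [(0, 1), (2, 3)])
B = genome_matrix(4, [(0, 2), (1, 3)])
C = genome_matrix(4, [(2, 3)])  # extremities 0 and 1 are telomeres
print(rank_distance(A, B), rank_distance(A, C))  # 2 1
```

Exact rational arithmetic avoids floating-point rank decisions, which is affordable for small examples like these. A numerical library would be the practical choice at genome scale.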
Self organizing maps for outlier detection
In this paper we address the problem of multivariate outlier detection using the (unsupervised) self-organizing map (SOM) algorithm introduced by Kohonen. We examine a number of techniques, based on summary statistics and graphics derived from the trained SOM, and conclude that they work well in combination. Useful tools include the median interneuron distance matrix and the projection of the trained map (via Sammon's projection). SOM quantization errors provide an important complementary source of information for certain types of outlying behavior. Empirical results are reported on both artificial and real data.
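Two of the statistics named above can be computed directly from a trained map. The sketch assumes the map is given as a rectangular grid of weight vectors, uses Euclidean distances and an 8-neighborhood on the grid, and omits training. The paper's exact definitions may differ, and the function names are hypothetical. A large median interneuron distance flags a unit that sits far from its grid neighbors, for example one pulled toward outlying data. A large quantization error flags an observation that no unit represents well.

```python
import math
import statistics


def quantization_error(x, weights):
    """Distance from observation x to its best-matching unit on the map."""
    return min(math.dist(x, w) for row in weights for w in row)


def median_interneuron_distances(weights):
    """Median distance from each unit's weight vector to those of its
    (up to 8) grid neighbors. Assumes the grid has at least two units."""
    rows, cols = len(weights), len(weights[0])
    mid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(weights[r][c], weights[rr][cc])
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            mid[r][c] = statistics.median(dists)
    return mid


# Toy 2x2 "trained" map: three units at the origin, one pulled to (3, 4).
W = [[(0.0, 0.0), (0.0, 0.0)],
     [(0.0, 0.0), (3.0, 4.0)]]
print(median_interneuron_distances(W))  # [[0.0, 0.0], [0.0, 5.0]]
print(quantization_error((0.0, 5.0), W))
```

The median is used rather than the mean so that a unit is not flagged merely because one of its neighbors is outlying. In the toy map only the displaced unit gets a large value.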
Nearness to Local Subspace Algorithm for Subspace and Motion Segmentation
There is growing interest in computer science, engineering, and mathematics
in modeling signals in terms of unions of subspaces and manifolds. Subspace
segmentation and clustering of high-dimensional data drawn from a union of
subspaces are especially important, with many practical applications in computer
vision, image and signal processing, communications, and information theory.
This paper presents a clustering algorithm for high-dimensional data that come
from a union of lower-dimensional subspaces of equal and known dimensions. Such
cases occur in many data clustering problems, such as motion segmentation and
face recognition. The algorithm is reliable in the presence of noise and, when
applied to the Hopkins 155 Dataset, produces the best motion segmentation results
to date. The segmentation rates for the two-motion sequences, the three-motion
sequences, and all video sequences overall are 99.43%, 98.69%, and 99.24%, respectively.
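One plausible reading of the "nearness to local subspace" idea can be sketched in a few lines. For each point $x_i$, estimate a $d$-dimensional subspace from $x_i$ and its $k$ nearest neighbors. Then record how far every other point lies from that local subspace. The local fit below uses Gram-Schmidt on the point and its neighbors, a simplification where the paper may well use an SVD-based fit. The names `local_subspace` and `nearness_matrix` are hypothetical. The resulting matrix would then feed a clustering step that is not shown.

```python
import math


def gram_schmidt(vectors, tol=1e-10):
    """Orthonormal basis for the span of `vectors`, dropping dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def local_subspace(points, i, k, d):
    """Simplified local fit: span of x_i and its k nearest neighbors,
    truncated to at most d orthonormal directions."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    neighbors = [points[j] for j in others[:k]]
    return gram_schmidt([points[i]] + neighbors)[:d]


def distance_to_subspace(x, basis):
    """Norm of the component of x orthogonal to an orthonormal basis."""
    residual = list(x)
    for b in basis:
        c = sum(ri * bi for ri, bi in zip(residual, b))
        residual = [ri - c * bi for ri, bi in zip(residual, b)]
    return math.sqrt(sum(r * r for r in residual))


def nearness_matrix(points, k, d):
    """Entry [i][j] is the distance of x_j to the local subspace at x_i."""
    subspaces = [local_subspace(points, i, k, d) for i in range(len(points))]
    return [[distance_to_subspace(p, S) for p in points] for S in subspaces]


# Two 1-D subspaces (lines through the origin) in R^2.
pts = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.5), (0.0, 2.5), (0.0, 3.5)]
for row in nearness_matrix(pts, k=1, d=1):
    print(row)
```

Points on the same subspace get distance zero to each other's local subspaces, while points on the other line do not. This block structure is what a subsequent clustering step would exploit.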