
    Near-optimal mean estimators with respect to general norms

    We study the problem of estimating the mean of a random vector in $\mathbb{R}^d$ based on an i.i.d. sample, when the accuracy of the estimator is measured by a general norm on $\mathbb{R}^d$. We construct an estimator (depending on the norm) that achieves an essentially optimal accuracy/confidence tradeoff under the sole assumption that the random vector has a well-defined covariance matrix. The estimator is based on the construction of a uniform median-of-means estimator over a class of real-valued functions, which may be of independent interest.
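    The classical ingredient behind this construction can be sketched in a few lines; the following is the plain coordinatewise median-of-means estimator (the paper's uniform, norm-dependent estimator is more involved, and the function name and block count here are illustrative):

    ```python
    import numpy as np

    def median_of_means(sample, k):
        """Median-of-means estimate of the mean of i.i.d. vectors.

        Split the sample into k blocks, average each block, then take the
        coordinatewise median of the block means.  Robust to heavy tails
        whenever the covariance matrix exists.
        """
        blocks = np.array_split(np.asarray(sample), k)
        block_means = np.array([b.mean(axis=0) for b in blocks])
        return np.median(block_means, axis=0)
    ```

    With heavy-tailed data (finite variance, no higher moments) the blockwise median concentrates around the true mean far better than the empirical average of a single sample.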

    Algorithms and Theory for Robust PCA and Phase Retrieval

    In this dissertation, we investigate two problems, both of which require the recovery of unknowns from measurements that are potentially corrupted by outliers. The first part focuses on the problem of \emph{robust principal component analysis} (PCA), which aims to recover an unknown low-rank matrix from a corrupted and partially observed matrix. The robust PCA problem, originally nonconvex itself, has been solved in the literature via a convex-relaxation approach, \emph{principal component pursuit} (PCP). However, previous works assume that the sparse errors are spread uniformly over the entire matrix, and characterize the condition under which PCP guarantees exact recovery. We generalize these results by allowing non-uniform error corruptions over the low-rank matrix, and characterize conditions on the error corruption probability of each individual entry, based on the local coherence of the low-rank matrix, under which correct recovery is guaranteed by PCP. Our results yield new insights on the graph clustering problem beyond the existing literature. The second part of the thesis studies the phase retrieval problem, which requires recovering an unknown vector from only magnitude measurements. Unlike in the first part, we solve this problem directly by optimizing nonconvex objectives. As the nonconvex objective is constructed so that the true vector is its global optimizer, the difficulty is to design algorithms that find the global optimizer efficiently and provably. To this end, we propose a gradient-like algorithm named reshaped Wirtinger flow (RWF). For random Gaussian measurements, we show that RWF enjoys linear convergence to a global optimizer as long as the number of measurements is on the order of the dimension of the unknown vector. This achieves the best possible sample complexity as well as state-of-the-art computational efficiency.
    Moreover, we study the phase retrieval problem when the measurements are corrupted by adversarial outliers, which models situations with missing data or sensor failures. In order to resist possible observation outliers in an oblivious manner, we propose a novel median truncation approach that modifies the nonconvex approach in both the initialization and the gradient descent steps. We apply the median truncation approach to the Poisson loss and the reshaped quadratic loss, respectively, and obtain two algorithms, \emph{median-TWF} and \emph{median-RWF}. We show that both algorithms recover the signal from a near-optimal number of independent Gaussian measurements, even when a constant fraction of the measurements is corrupted. We further show that both algorithms are stable when measurements are corrupted by both sparse arbitrary outliers and dense bounded noise. We establish our performance guarantees via the development of non-trivial concentration results for median-related quantities, which may be of independent interest.
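    A minimal sketch of the median-truncation idea applied to reshaped-quadratic-loss gradient descent (real-valued case) might look as follows; the step size, truncation constant, and the simplified spectral initialization are illustrative choices, not the constants analyzed in the thesis:

    ```python
    import numpy as np

    def median_rwf(A, y, iters=200, step=0.8, trunc=3.0):
        """Sketch of median-truncated reshaped Wirtinger flow (real case).

        A: (m, n) Gaussian measurement matrix, y = |A @ x_true| (possibly
        corrupted).  At each step, samples whose residual is far above the
        median residual are treated as outliers and dropped from the
        gradient.  Illustrative constants, not the analyzed ones.
        """
        m, n = A.shape
        # Simplified spectral initialization (no truncation shown here).
        Y = (A.T * y**2) @ A / m
        _, V = np.linalg.eigh(Y)                      # ascending eigenvalues
        x = V[:, -1] * np.sqrt(np.mean(y**2))         # scale by norm estimate
        for _ in range(iters):
            p = A @ x
            r = np.abs(p) - y                         # reshaped residuals
            keep = np.abs(r) <= trunc * np.median(np.abs(r))
            grad = A[keep].T @ (r[keep] * np.sign(p[keep])) / m
            x = x - step * grad
        return x
    ```

    Since the sign of the signal is unrecoverable from magnitudes, accuracy is measured up to a global sign flip.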

    Loschmidt echoes in two-body random matrix ensembles

    Fidelity decay is studied for quantum many-body systems with a dominant independent-particle Hamiltonian resulting, e.g., from a mean-field theory with a weak two-body interaction. The diagonal terms of the interaction are included in the unperturbed Hamiltonian, while the off-diagonal terms constitute the perturbation that distorts the echo. We give the linear-response solution for this problem in a random matrix framework. While the ensemble average shows no surprising behavior, we find that the typical ensemble member, as represented by the median, displays a very slow fidelity decay known as ``freeze''. Numerical calculations confirm this result and show that the ground state displays the freeze even on average. This may contribute to an explanation of the ``unreasonable'' success of mean-field theories.
    Comment: 9 pages, 5 figures (6 eps files), RevTex; v2: slight modifications following referees' suggestions
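    A fidelity (Loschmidt echo) curve of this kind can be computed by exact diagonalization. The sketch below uses a generic Hermitian perturbation rather than a two-body random matrix ensemble, and all names are illustrative; averaging such curves over an ensemble and comparing the mean with the median reproduces the kind of comparison discussed above:

    ```python
    import numpy as np

    def loschmidt_echo(h0, v, eps, psi, times):
        """Fidelity F(t) = |<psi| e^{+i H0 t} e^{-i (H0 + eps V) t} |psi>|^2.

        h0, v: Hermitian matrices (unperturbed Hamiltonian, perturbation);
        psi: normalized initial state.  Exact diagonalization, hbar = 1.
        """
        def evolve(h, t, state):
            w, u = np.linalg.eigh(h)
            return u @ (np.exp(-1j * w * t) * (u.conj().T @ state))
        return np.array([abs(np.vdot(evolve(h0, t, psi),
                                     evolve(h0 + eps * v, t, psi))) ** 2
                         for t in times])
    ```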

    A cubic algorithm for the generalized rank median of three genomes

    The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable has been finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time $O(n^\omega)$, where $\omega \le 3$, with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix.
    Results: We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal and symmetric, and conserves any adjacencies or telomeres present in at least 2 out of the 3 input genomes.
    Conclusions: We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding and, potentially, ancestral genome reconstruction.
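    The rank distance and the median objective themselves are easy to state in code; the polynomial-time median algorithm via the five fundamental subspaces is beyond this sketch, and the function names are illustrative:

    ```python
    import numpy as np

    def rank_distance(a, b):
        """Rank distance d(A, B) = rank(A - B) between the matrix
        representations of two genomes (e.g. permutation matrices)."""
        return np.linalg.matrix_rank(np.asarray(a) - np.asarray(b))

    def median_score(m, genomes):
        """Score of a candidate median M: total rank distance to the
        three input genomes; the median minimizes this sum."""
        return sum(rank_distance(m, g) for g in genomes)
    ```

    For example, the identity and the permutation matrix swapping two elements are at rank distance 1, since their difference has a single independent nonzero row.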

    Self organizing maps for outlier detection

    In this paper we address the problem of multivariate outlier detection using the (unsupervised) self-organizing map (SOM) algorithm introduced by Kohonen. We examine a number of techniques, based on summary statistics and graphics derived from the trained SOM, and conclude that they work well in cooperation with each other. Useful tools include the median interneuron distance matrix and the projection of the trained map (via Sammon's projection). SOM quantization errors provide an important complementary source of information for certain types of outlying behavior. Empirical results are reported on both artificial and real data.
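    A minimal sketch of the quantization-error tool: train a small SOM and flag points whose distance to their best-matching unit is unusually large. This is a generic illustration (grid size and decay schedules are arbitrary choices), not the paper's exact toolkit:

    ```python
    import numpy as np

    def train_som(data, grid=(5, 5), iters=3000, lr0=0.5, sigma0=2.0, seed=0):
        """Minimal Kohonen SOM; returns a (grid_x * grid_y, d) codebook."""
        rng = np.random.default_rng(seed)
        gx, gy = grid
        coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
        w = data[rng.integers(len(data), size=gx * gy)].astype(float)
        for t in range(iters):
            x = data[rng.integers(len(data))]
            bmu = np.argmin(((w - x) ** 2).sum(axis=1))   # best-matching unit
            frac = 1.0 - t / iters                        # linear decay
            lr, sigma = lr0 * frac, sigma0 * frac + 0.3
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1)
                       / (2 * sigma ** 2))                # neighborhood kernel
            w += lr * h[:, None] * (x - w)
        return w

    def quantization_error(data, w):
        """Distance from each sample to its best-matching unit; unusually
        large values flag candidate outliers."""
        return np.linalg.norm(data[:, None, :] - w[None, :, :], axis=2).min(axis=1)
    ```

    In practice the quantization errors are inspected together with the median interneuron distances; a point far from every codebook vector is a candidate outlier.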

    Nearness to Local Subspace Algorithm for Subspace and Motion Segmentation

    There is growing interest in computer science, engineering, and mathematics in modeling signals in terms of unions of subspaces and manifolds. Subspace segmentation and the clustering of high-dimensional data drawn from a union of subspaces are especially important, with many practical applications in computer vision, image and signal processing, communications, and information theory. This paper presents a clustering algorithm for high-dimensional data drawn from a union of lower-dimensional subspaces of equal and known dimensions. Such cases occur in many data clustering problems, such as motion segmentation and face recognition. The algorithm is reliable in the presence of noise, and, applied to the Hopkins 155 Dataset, it generates the best results to date for motion segmentation. The two-motion, three-motion, and overall segmentation rates for the video sequences are 99.43%, 98.69%, and 99.24%, respectively.
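    One generic primitive used by local subspace methods, the distance from a point to the best-fit d-dimensional subspace of its nearest neighbors, can be sketched as follows (a hypothetical helper for illustration, not the paper's algorithm):

    ```python
    import numpy as np

    def local_subspace_residual(data, idx, k, d):
        """Distance from data[idx] to the best-fit d-dimensional affine
        subspace of its k nearest neighbors, via truncated SVD.  Points
        lying on the same underlying subspace get near-zero residuals."""
        x = data[idx]
        dists = np.linalg.norm(data - x, axis=1)
        nbrs = data[np.argsort(dists)[1:k + 1]]     # exclude the point itself
        mu = nbrs.mean(axis=0)
        _, _, vt = np.linalg.svd(nbrs - mu, full_matrices=False)
        basis = vt[:d]                              # top-d right singular vectors
        r = x - mu
        return np.linalg.norm(r - basis.T @ (basis @ r))
    ```

    Thresholding or comparing such residuals across candidate neighborhoods is one way to decide which points share a subspace before the final segmentation step.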