
    L0 Sparse Inverse Covariance Estimation

    Recently, there has been focus on penalized log-likelihood covariance estimation for sparse inverse covariance (precision) matrices. The penalty is responsible for inducing sparsity, and a very common choice is the convex $\ell_1$ norm. However, the best estimator performance is not always achieved with this penalty. The most natural sparsity-promoting "norm" is the non-convex $\ell_0$ penalty, but its lack of convexity has deterred its use in sparse maximum likelihood estimation. In this paper we consider non-convex $\ell_0$ penalized log-likelihood inverse covariance estimation and present a novel cyclic descent algorithm for its optimization. Convergence to a local minimizer is proved, which is highly non-trivial, and we demonstrate via simulations the reduced bias and superior quality of the $\ell_0$ penalty as compared to the $\ell_1$ penalty.
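    As a concrete illustration, a minimal sketch of the objective in question: the $\ell_0$-penalized negative Gaussian log-likelihood, here with the penalty applied to the off-diagonal entries of the precision matrix. The function name and the off-diagonal-only penalty are illustrative assumptions; the paper's actual contribution, the cyclic descent minimizer, is not reproduced here.

```python
import numpy as np

def l0_penalized_nll(Theta, S, lam):
    """Sketch of the l0-penalized negative log-likelihood.

    Theta: candidate precision matrix (must be positive definite)
    S:     sample covariance matrix
    lam:   penalty weight (assumed scalar)
    """
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        return np.inf  # outside the positive definite cone
    nll = -logdet + np.trace(S @ Theta)
    # l0 "norm": count of non-zero off-diagonal entries
    off_diag = Theta - np.diag(np.diag(Theta))
    return nll + lam * np.count_nonzero(off_diag)
```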

    Regularized Block Toeplitz Covariance Matrix Estimation via Kronecker Product Expansions

    In this work we consider the estimation of spatio-temporal covariance matrices in the low-sample, non-Gaussian regime. We impose covariance structure in the form of a sum of Kronecker products decomposition (Tsiligkaridis et al. 2013, Greenewald et al. 2013) with diagonal correction (Greenewald et al.), which we refer to as DC-KronPCA, in the estimation of multiframe covariance matrices. This paper extends the approaches of (Tsiligkaridis et al.) in two directions. First, we modify the diagonally corrected method of (Greenewald et al.) to include a block Toeplitz constraint imposing temporal stationarity structure. Second, we improve the conditioning of the estimate in the very low sample regime by using Ledoit-Wolf type shrinkage regularization similar to (Chen, Hero et al. 2010). For improved robustness to heavy-tailed distributions, we modify the KronPCA to incorporate robust shrinkage estimation (Chen, Hero et al. 2011). Results of numerical simulations establish benefits in terms of estimation MSE when compared to previous methods. Finally, we apply our methods to a real-world network spatio-temporal anomaly detection problem and achieve superior results. Comment: To appear at IEEE SSP 2014, 4 pages.
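    The device underlying KronPCA is the Van Loan-Pitsianis rearrangement, under which a sum of r Kronecker products becomes a rank-r matrix recoverable by a truncated SVD. A rough sketch of that step, without the diagonal correction, block Toeplitz constraint, or shrinkage described above (function names are illustrative):

```python
import numpy as np

def rearrange(Sigma, p, q):
    # Map the (p*q x p*q) covariance into a (p^2 x q^2) matrix whose
    # rank-1 terms correspond to Kronecker factors B_k (x) C_k.
    R = np.empty((p * p, q * q))
    for i in range(p):
        for j in range(p):
            block = Sigma[i*q:(i+1)*q, j*q:(j+1)*q]
            R[i*p + j] = block.ravel()
    return R

def kron_pca(Sigma, p, q, r):
    # Truncated SVD of the rearranged matrix yields the best
    # sum of r Kronecker products in Frobenius norm.
    R = rearrange(Sigma, p, q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    approx = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(r))
    # Invert the rearrangement to recover the covariance estimate.
    Sigma_hat = np.zeros_like(Sigma)
    for i in range(p):
        for j in range(p):
            Sigma_hat[i*q:(i+1)*q, j*q:(j+1)*q] = approx[i*p + j].reshape(q, q)
    return Sigma_hat
```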

    Scalable Hash-Based Estimation of Divergence Measures

    We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables $X$ and $Y$ whose densities have bounded support. We consider a particular locality-sensitive random hashing and, for each hash bin containing a non-zero number of $Y$ samples, the ratio of the numbers of $X$ and $Y$ samples in that bin. We prove that the weighted average of these ratios over all of the hash bins converges to the f-divergence between the two sample sets. We show that the proposed estimator is optimal in terms of both MSE rate and computational complexity. We derive the MSE rates for two families of smooth functions: the Hölder smoothness class and differentiable functions. In particular, it is proved that if the density functions have bounded derivatives up to order $d/2$, where $d$ is the dimension of the samples, the optimal parametric MSE rate of $O(1/N)$ can be achieved. The computational complexity is shown to be $O(N)$, which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate. Comment: 11 pages, Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS) 2018, Lanzarote, Spain.
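    A plain histogram plug-in estimator conveys the flavor of the approach: hash each sample to a grid cell and average a function of the per-bin count ratios. This sketch estimates the KL divergence with a fixed grid hash and simple truncation of empty bins; it is a toy under stated assumptions, not the paper's optimal estimator:

```python
import numpy as np
from collections import Counter

def hash_bins(samples, eps):
    # Simple grid-based locality-sensitive hash: each sample maps to
    # the integer lattice cell containing it at resolution eps.
    return [tuple(np.floor(x / eps).astype(int)) for x in samples]

def kl_divergence_hash(X, Y, eps=0.1):
    # Plug-in estimate of KL(P_X || P_Y) from per-bin count ratios.
    cx, cy = Counter(hash_bins(X, eps)), Counter(hash_bins(Y, eps))
    n, m = len(X), len(Y)
    est = 0.0
    for b, nx in cx.items():
        ny = cy.get(b, 0)
        if ny > 0:  # crude truncation of bins with no Y mass
            est += (nx / n) * np.log((nx / n) / (ny / m))
    return est
```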

    Decomposable Principal Component Analysis

    We consider principal component analysis (PCA) in decomposable Gaussian graphical models. We exploit the prior information in these models in order to distribute the computation. For this purpose, we reformulate the problem in the sparse inverse covariance (concentration) domain and solve the global eigenvalue problem using a sequence of local eigenvalue problems in each of the cliques of the decomposable graph. We demonstrate the application of our methodology in the context of decentralized anomaly detection in the Abilene backbone network. Based on the topology of the network, we propose an approximate statistical graphical model and distribute the computation of PCA.
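    A hedged sketch of the local step only: given the concentration matrix and the cliques of the decomposable graph, solve an eigenvalue problem on each clique's principal submatrix. How the paper stitches these local solutions into the global eigenvectors is not shown here, and the function name is an assumption:

```python
import numpy as np

def clique_eigs(K, cliques):
    # K: concentration (inverse covariance) matrix, dense here for simplicity
    # cliques: list of index lists for the cliques of a decomposable graph
    # Returns the local eigen-decomposition of each clique submatrix.
    local = []
    for c in cliques:
        sub = K[np.ix_(c, c)]          # principal submatrix for the clique
        w, V = np.linalg.eigh(sub)     # local eigenvalue problem
        local.append((c, w, V))
    return local
```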

    Node Removal Vulnerability of the Largest Component of a Network

    The connectivity structure of a network can be very sensitive to the removal of certain nodes. In this paper, we study the sensitivity of the largest component size to node removals. We prove that minimizing the largest component size is equivalent to solving a matrix one-norm minimization problem whose column vectors are orthogonal and sparse and form a basis of the null space of the associated graph Laplacian matrix. A greedy node removal algorithm is then proposed based on the matrix one-norm minimization. In comparison with other node centralities such as node degree and betweenness, experimental results on the US power grid dataset validate the effectiveness of the proposed approach in terms of reduction of the largest component size with relatively few node removals. Comment: Published in IEEE GlobalSIP 2013.
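    For comparison, a brute-force greedy baseline (using networkx) that removes, at each step, the node whose deletion most shrinks the largest connected component. The paper's algorithm instead derives its greedy choice from the matrix one-norm formulation, which this sketch does not implement:

```python
import networkx as nx

def greedy_node_removal(G, k):
    # Remove k nodes one at a time, each time picking the node whose
    # removal minimizes the resulting largest-component size.
    G = G.copy()
    removed = []
    for _ in range(k):
        best, best_size = None, len(G) + 1
        for v in G.nodes():
            H = G.copy()
            H.remove_node(v)
            size = max((len(c) for c in nx.connected_components(H)), default=0)
            if size < best_size:
                best, best_size = v, size
        if best is None:
            break
        G.remove_node(best)
        removed.append(best)
    return removed, G
```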

    Dynamic Metric Learning from Pairwise Comparisons

    Recent work in distance metric learning has focused on learning transformations of data that best align with specified pairwise similarity and dissimilarity constraints, often supplied by a human observer. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we address the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes in either the ground-truth clustering used to generate constraints or the feature subspaces in which the class structure is apparent. We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD), a general adaptive online approach for learning and tracking optimal metrics as they change over time, which is highly robust to a variety of nonstationary behaviors in the changing metric. We apply the OCELAD framework to an ensemble of online learners. Specifically, we create a retro-initialized composite objective mirror descent (COMID) ensemble (RICE) consisting of a set of parallel COMID learners with different learning rates. We demonstrate RICE-OCELAD on both real and synthetic data sets and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms. Comment: To appear at Allerton 2016. arXiv admin note: substantial text overlap with arXiv:1603.0367
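    A single COMID update for a Mahalanobis metric illustrates the building block that the RICE ensemble runs in parallel at different learning rates. The hinge loss, the trace-norm-style regularizer, and the margin parameter below are illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def comid_step(M, x1, x2, label, eta, lam, mu=1.0):
    # One composite objective mirror descent step for a Mahalanobis
    # metric M. label is +1 for a similar pair, -1 for a dissimilar one.
    d = (x1 - x2).reshape(-1, 1)
    dist = float(d.T @ M @ d)
    # Pairwise hinge: similar pairs should sit within mu, dissimilar beyond.
    margin = mu - dist if label == +1 else dist - mu
    if margin < 1:  # constraint active: take a subgradient step
        g = d @ d.T if label == +1 else -(d @ d.T)
        M = M - eta * g
    # Proximal step for a trace-norm-style penalty: soft-threshold the
    # eigenvalues, which also projects back onto the PSD cone.
    w, V = np.linalg.eigh((M + M.T) / 2)
    w = np.maximum(w - eta * lam, 0.0)
    return (V * w) @ V.T
```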