
    Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization

    Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. The fresh look advocated here brings benefits from variable selection and compressive sampling to bear on robustifying PCA against outliers. A least-trimmed squares estimator of a low-rank bilinear factor analysis model is shown to be closely related to the estimator obtained from an ℓ0-(pseudo)norm-regularized criterion that encourages sparsity in a matrix explicitly modeling the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling the sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate novel and scalable algorithms that: i) track the low-rank signal subspace robustly as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, to unveil communities in social networks, and to spot intruders in video surveillance data.
    Comment: 30 pages, submitted to the IEEE Transactions on Signal Processing.
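    As a concrete illustration of the ℓ1-relaxed outlier model, the following is a minimal sketch (ours, not the authors' code) of one natural alternating scheme: data Y are modeled as a rank-r component plus a sparse outlier matrix O, with r, the penalty lam, and the iteration count all hypothetical tuning choices.

```python
import numpy as np

def soft_threshold(R, lam):
    """Entrywise soft-thresholding: prox operator of lam * ||.||_1."""
    return np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)

def robust_pca(Y, r, lam, n_iter=100):
    """Alternating minimization of 0.5*||Y - L - O||_F^2 + lam*||O||_1
    with rank(L) <= r; nonzero entries of O flag outliers."""
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        # L-step: best rank-r fit of outlier-corrected data (Eckart-Young).
        U, s, Vt = np.linalg.svd(Y - O, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
        # O-step: soft-threshold the residual; larger lam => sparser outliers.
        O = soft_threshold(Y - L, lam)
    return L, O
```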

    From Sparse Signals to Sparse Residuals for Robust Sensing

    One of the key challenges in sensor networks is the extraction of information by fusing data from a multitude of distinct, but possibly unreliable, sensors. Recovering information from the maximum number of dependable sensors while identifying the unreliable ones is critical for robust sensing. This sensing task is formulated here as finding the maximum number of feasible subsystems of linear equations, and is proven to be NP-hard. Useful links are established with compressive sampling, which aims at recovering vectors that are sparse. In contrast, the signals here are not sparse, but they give rise to sparse residuals. Capitalizing on this form of sparsity, four sensing schemes with complementary strengths are developed. The first scheme is a convex relaxation of the original problem, expressed as a second-order cone program (SOCP). It is shown that when the involved sensing matrices are Gaussian and the reliable measurements are sufficiently many, the SOCP can recover the optimal solution with overwhelming probability. The second scheme is obtained by replacing the initial objective function with a concave one. The third and fourth schemes are tailored for noisy sensor data. The noisy case is cast as a combinatorial problem that is subsequently surrogated by a (weighted) SOCP. Interestingly, the derived cost functions fall into the framework of robust multivariate linear regression, and an efficient block-coordinate descent algorithm is developed for their minimization. The robust sensing capabilities of all schemes are verified by simulated tests.
    Comment: Under review for publication in the IEEE Transactions on Signal Processing (revised version).
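    To make the first scheme tangible: assuming per-sensor linear models y_m ≈ A_m x (our notation) and using the cvxpy modeling package, the SOCP relaxation can be written as minimizing the sum of per-sensor residual norms, so the few unreliable sensors surface as nonzero residual blocks. A rough sketch, not the paper's implementation:

```python
import cvxpy as cp

def socp_sensing(A_list, y_list):
    """Convex relaxation: minimize the sum of per-sensor residual norms.
    Dependable sensors end up with (near-)zero residuals; outliers do not."""
    n = A_list[0].shape[1]
    x = cp.Variable(n)
    cost = sum(cp.norm(y - A @ x, 2) for A, y in zip(A_list, y_list))
    cp.Problem(cp.Minimize(cost)).solve()
    return x.value
```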

    Robust Rotation Synchronization via Low-rank and Sparse Matrix Decomposition

    This paper deals with the rotation synchronization problem, which arises in the global registration of 3D point sets and in structure from motion. The problem is formulated, in an unprecedented way, as a "low-rank and sparse" matrix decomposition that handles both outliers and missing data. A minimization strategy, dubbed R-GoDec, is also proposed and evaluated experimentally against state-of-the-art algorithms on simulated and real data. The results show that R-GoDec is the fastest among the robust algorithms.
    Comment: The material contained in this paper is part of a manuscript submitted to CVI
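    For intuition on the "low-rank and sparse" decomposition, here is a generic GoDec-style sketch (not R-GoDec itself, which additionally handles missing data): a rank-r projection alternates with a hard-thresholding step that keeps the k largest-magnitude residual entries; r and k are hypothetical parameters.

```python
import numpy as np

def godec_like(X, r, k, n_iter=50):
    """Split X into low-rank L plus sparse S by alternating projections:
    L = rank-r SVD truncation of X - S; S = k largest entries of X - L."""
    S = np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r]
        R = X - L
        S = np.zeros_like(X)
        keep = np.unravel_index(np.argsort(np.abs(R), axis=None)[-k:], R.shape)
        S[keep] = R[keep]          # hard thresholding: sparse outlier part
    return L, S
```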

    Foundational principles for large scale inference: Illustrations through correlation mining

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far smaller than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of methods proposed for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high-dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last applies to exa-scale data dimensions. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
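    A quick simulation (ours, purely illustrative) shows why the sample-starved regime is delicate: even when all p variables are independent, the largest pairwise sample correlation creeps upward as p grows with n fixed, so discoveries in correlation mining must be calibrated against this null behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # fixed, small sample size
for p in (50, 500, 2000):                # growing variable dimension
    X = rng.standard_normal((n, p))      # p mutually independent variables
    C = np.corrcoef(X, rowvar=False)     # p x p sample correlation matrix
    np.fill_diagonal(C, 0.0)
    # The maximum spurious correlation grows toward 1 as p increases.
    print(f"p={p:5d}  max |corr| = {np.abs(C).max():.3f}")
```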

    Shrinkage Based Particle Filters for Tracking in Wireless Sensor Networks with Correlated Sparse Measurements

    This thesis focuses on the development of mobile tracking approaches in wireless sensor networks (WSNs) with correlated and sparse measurements. In wireless networks, devices transfer information over the network nodes via wireless signals. The strength of a wireless signal at a receiver is referred to as the received signal strength (RSS), and many wireless technologies, such as Wi-Fi, ZigBee, the Global Positioning System (GPS), and other satellite systems, provide RSS measurements for transmitted signals. Owing to this availability, various tracking approaches in WSNs have been developed based on RSS measurements. Unfortunately, the feasibility of tracking with RSS measurements depends heavily on the connectivity of the wireless signals: connectivity may be intermittently disrupted by low battery levels or temporary malfunctions at the sensor nodes. In ad-hoc networks, the number of available RSS observations changes rapidly as the network nodes and the mobile user move. As a result, tracking algorithms have limited data with which to perform state inference, and this prevents accurate tracking.

    Furthermore, consecutive RSS measurements obtained from nearby sensor nodes exhibit spatio-temporal correlation, which provides extra information to be exploited: using the statistical structure of the measurement noise covariance matrix increases tracking accuracy. When the number of observations is relatively large, estimating this covariance matrix is feasible; when it is small, the estimate becomes ill-conditioned and non-invertible. Moreover, when the RSS measurements are corrupted by outliers (arising from sudden environmental disturbances, temporary sensor failures, or the intrinsic noise of the sensor device), state inference can be misleading, so the presence of outliers must be accounted for to avoid false and poor estimates.

    This thesis first proposes a shrinkage-based particle filter for mobile tracking in WSNs. It estimates the correlation in the RSS measurements using a shrinkage estimator, which overcomes the ill-conditioning and non-invertibility of the measurement noise covariance matrix; the estimated covariance matrix is then used within the particle filter. Secondly, it develops a robust shrinkage-based particle filter to handle outliers in the RSS measurements. The proposed algorithm provides a non-parametric shrinkage estimate and takes the form of a multiple-model particle filter. The performance of both proposed filters is demonstrated in challenging mobile tracking scenarios.
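    The role of the shrinkage estimator can be seen in a generic Ledoit-Wolf-style example (a standard construction, not necessarily the thesis' exact estimator): blending the sample covariance with a scaled-identity target keeps the measurement noise covariance well conditioned, and hence invertible, even when observations are scarce.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(1)
n, p = 8, 30                              # fewer samples than dimensions
X = rng.standard_normal((n, p))

S = np.cov(X, rowvar=False)               # sample covariance: singular for n < p
lw = LedoitWolf().fit(X)                  # shrinkage toward a scaled identity
print(np.linalg.matrix_rank(S))           # < p: S cannot be inverted
print(np.linalg.cond(lw.covariance_))     # finite: safe to use in the filter
```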

    Regularized Estimation of High-dimensional Covariance Matrices.

    Many signal processing methods are fundamentally related to the estimation of covariance matrices. When there is a large number of covariates, the dimension of the covariance matrix is much larger than the number of available data samples. This is especially true in applications where data acquisition is constrained by limited resources such as time, energy, storage, and bandwidth. This dissertation develops necessary components for covariance estimation in the high-dimensional setting, making contributions in two main areas: (1) high-dimensional shrinkage-regularized covariance estimation; and (2) recursive online complexity-regularized estimation, with applications to anomaly detection, graph tracking, and compressive sensing.

    New shrinkage covariance estimation methods are proposed that significantly outperform previous approaches in terms of mean squared error. Two multivariate data scenarios are considered: (1) independent Gaussian-distributed data; and (2) heavy-tailed elliptically contoured data. For the former, we improve on the Ledoit-Wolf (LW) shrinkage estimator using the principle of Rao-Blackwell conditioning and iterative approximation of the clairvoyant estimator. For the latter, we apply a variance-normalizing transformation and propose an iterative robust LW shrinkage estimator that is distribution-free within the elliptical family. The proposed robustified estimator is implemented via fixed-point iterations with provable convergence to a unique limit. A recursive online covariance estimator is proposed for tracking changes in an underlying time-varying graphical model; covariance estimation is decomposed into multiple decoupled adaptive regression problems, and a recursive group lasso is derived using a homotopy approach that generalizes online lasso methods to group-sparse system identification. By reducing the memory of the objective function, this leads to a group-lasso-regularized LMS that provably dominates standard LMS. Finally, we introduce a state-of-the-art sampling system, the Modulated Wideband Converter (MWC), which is based on recently developed analog compressive sensing theory. By inferring the block-sparse structure of the high-dimensional covariance matrix from a set of random projections, the MWC achieves sub-Nyquist sampling of multiband signals with arbitrary carrier frequencies over a wide bandwidth.
    Ph.D. dissertation, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/86396/1/yilun_1.pd
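    A minimal sketch of the robust fixed-point idea for elliptical data (patterned after the iteration the abstract describes; the shrinkage weight rho, the plain-identity target, and the iteration count are our hypothetical choices): each sample is down-weighted by its current Mahalanobis-type scale, and the weighted scatter is blended with the identity and trace-normalized.

```python
import numpy as np

def robust_shrinkage_cov(X, rho, n_iter=50):
    """Fixed-point iteration: Tyler-type weighted scatter shrunk toward I.
    Assumes rows of X are nonzero samples and 0 < rho <= 1."""
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(n_iter):
        Sinv = np.linalg.inv(Sigma)
        w = 1.0 / np.einsum('ij,jk,ik->i', X, Sinv, X)    # 1 / (x_i' Sigma^-1 x_i)
        scatter = (p / n) * (X * w[:, None]).T @ X        # weighted scatter matrix
        Sigma = (1.0 - rho) * scatter + rho * np.eye(p)   # shrink toward identity
        Sigma *= p / np.trace(Sigma)                      # trace normalization
    return Sigma
```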

    Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain

    This paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, all located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are offered for each hotel studied.
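    For readers unfamiliar with Stochastic Frontier Analysis, the discrimination between noise and inefficiency rests on a composed error: log-output is modeled as y = x'beta + v - u, with symmetric noise v ~ N(0, sigma_v^2) and one-sided inefficiency u >= 0. A minimal maximum-likelihood sketch under the standard normal/half-normal specification (a textbook variant, not necessarily the paper's exact model):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_loglik(theta, y, X):
    """Normal/half-normal SFA: eps = y - X @ beta = v - u has density
    f(eps) = (2/sigma) * phi(eps/sigma) * Phi(-eps*lam/sigma),
    where sigma^2 = sigma_u^2 + sigma_v^2 and lam = sigma_u/sigma_v."""
    k = X.shape[1]
    beta, su, sv = theta[:k], np.exp(theta[k]), np.exp(theta[k + 1])
    sigma, lam = np.hypot(su, sv), su / sv
    eps = y - X @ beta
    ll = (np.log(2.0) - np.log(sigma) + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

# Usage: res = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 2),
#                       args=(y, X), method='BFGS')
```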

    In-network Sparsity-regularized Rank Minimization: Algorithms and Applications

    Given a limited number of entries from the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, recovery of the low-rank and sparse components is a fundamental task subsuming compressed sensing, matrix completion, and principal components pursuit. This paper develops algorithms for distributed sparsity-regularized rank minimization over networks, where the nuclear norm and the ℓ1-norm are used as surrogates for the rank and the nonzero entry count of the sought matrices, respectively. While nuclear-norm minimization has well-documented merits when centralized processing is viable, the non-separability of the singular-value sum challenges its distributed minimization. To overcome this limitation, an alternative characterization of the nuclear norm is adopted, which leads to a separable, yet non-convex, cost minimized via the alternating-direction method of multipliers. The novel distributed iterations entail reduced-complexity per-node tasks and affordable message passing among single-hop neighbors. Interestingly, upon convergence the distributed (non-convex) estimator provably attains the global optimum of its centralized counterpart, regardless of initialization. Several application domains are outlined to highlight the generality and impact of the proposed framework, including unveiling traffic anomalies in backbone networks, predicting network-wide path latencies, and mapping the RF ambiance using wireless cognitive radios. Simulations with synthetic and real network data corroborate the convergence of the novel distributed algorithm and its centralized performance guarantees.
    Comment: 30 pages, submitted for publication in the IEEE Transactions on Signal Processing.
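    The separable characterization alluded to above is, in all likelihood, the well-known bilinear variational form of the nuclear norm (stated here in our notation; consult the paper for its exact formulation): for factor matrices P, Q with at least rank(X) columns,

```latex
\|X\|_{*} \;=\; \min_{P,\,Q \;:\; X = P Q^{\top}} \tfrac{1}{2}\left( \|P\|_F^{2} + \|Q\|_F^{2} \right).
```

    Because the Frobenius-norm terms decompose as sums over rows, nodes can update local factor blocks using only single-hop exchanges, which is precisely what enables the distributed (albeit non-convex) minimization.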