43 research outputs found

    Motion Segmentation Using Global and Local Sparse Subspace Optimization


    On the Application of Dictionary Learning to Image Compression

    Signal models are a cornerstone of contemporary signal- and image-processing methodology. In this chapter, a particular signal modelling method, called synthesis sparse representation, is studied; it has proven effective for many signal classes, such as natural images, and has been used successfully in a wide range of applications. In this kind of signal modelling, the signal is represented with respect to a dictionary, and the choice of dictionary plays an important role in the success of the entire model. One main approach to dictionary design is based on machine learning, which provides a simple and expressive framework for building adaptable and efficient dictionaries. This chapter focuses on a direct application of sparse representation, namely image compression. Two image codecs based on adaptive sparse representation over a trained dictionary are introduced. Experimental results show that the presented methods outperform existing image coding standards such as JPEG and JPEG2000.
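    As a rough illustration of the synthesis sparse model behind such codecs, the sketch below learns a patch dictionary and sparse-codes image patches with scikit-learn. It is not the codec introduced in the chapter; the random test image, patch size, dictionary size, and sparsity level are assumed values chosen only to make the example self-contained.

```python
# Minimal sketch: learn a patch dictionary and sparse-code image patches.
# Illustrative only, not the chapter's codec; all parameters are assumed.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
image = rng.random((64, 64))                      # stand-in for a natural image

patches = extract_patches_2d(image, (8, 8), max_patches=500, random_state=0)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)                # remove each patch's DC component

# Learn an overcomplete dictionary and sparse-code every patch with OMP.
dico = MiniBatchDictionaryLearning(
    n_components=128,                             # 128 atoms for 64-dimensional patches
    transform_algorithm="omp",
    transform_n_nonzero_coefs=5,                  # sparsity level per patch
    random_state=0,
)
codes = dico.fit(X).transform(X)                  # sparse coefficient matrix
X_hat = codes @ dico.components_                  # synthesis: D * alpha approximates each patch

print("nonzeros per patch:", np.count_nonzero(codes, axis=1).mean())
print("reconstruction MSE:", np.mean((X - X_hat) ** 2))
```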

    Sparse subspace clustering-based motion segmentation with complete occlusion handling

    Motion segmentation is part of the computer vision field and aims to find the moving parts in a video sequence. It is used in applications such as autonomous driving, surveillance, robotics, human motion analysis, and video indexing. Because the applications are so varied, motion segmentation is ill-defined and the research field is vast. Despite advances in the research over the years, existing methods still fall far short of human capabilities. Problems such as changes in illumination, camera motion, noise, mixtures of motion, missing data, and occlusion remain challenges. Feature-based approaches have grown in popularity over the years, especially manifold clustering methods, owing to their strong mathematical foundation. Methods exploiting sparse and low-rank representations are often used, since the dimensionality of the data is reduced while useful information about the motion segments is extracted. However, these methods cannot effectively handle large and complete occlusions or missing data, since they tend to fail when the amount of missing data becomes too large.
    An algorithm based on Sparse Subspace Clustering (SSC) has been proposed to address occlusions and missing data, so that SSC can handle these cases with high accuracy. A frame-to-frame analysis was adopted as a pre-processing step to identify motion segments between consecutive frames, called inter-frame motion segments. The pre-processing step, called Multiple Split-And-Merge (MSAM), is based on the classic top-down split-and-merge algorithm. Only points present in both frames of a pair are segmented, which means that an occluded point is only assigned to a motion class once it has been visible for two consecutive frames after re-entering the camera view. Once all the inter-frame segments have been extracted, the results are combined in a single matrix and used as the input to the classic SSC algorithm; SSC therefore segments inter-frame motion segments rather than point trajectories. The resulting algorithm is referred to as MSAM-SSC.
    MSAM-SSC outperformed some of the most popular manifold clustering methods on the Hopkins155 and KT3DMoSeg datasets. It was also able to handle complete occlusions, sequences with 50% missing data, and outliers, and it can handle mixtures of motions and different numbers of motions. However, MSAM-SSC was found to be better suited to traffic and articulated motion scenes, which are common in applications such as robotics, surveillance, and autonomous driving. For future work, the algorithm can be optimised to reduce execution time for real-time applications, and the number of moving objects in the scene can be estimated so that the method does not rely on prior knowledge.
    Dissertation (MEng (Computer Engineering)), University of Pretoria, 2021.
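    For context, the sketch below implements plain Sparse Subspace Clustering on synthetic data, not the MSAM-SSC pipeline described above: each point is written as a sparse combination of the remaining points via a Lasso problem, and the coefficient magnitudes form the affinity for spectral clustering. The subspace dimensions, the regularization weight, and the number of clusters are assumed values.

```python
# Minimal sketch of plain Sparse Subspace Clustering on synthetic data
# (not MSAM-SSC). Each point is expressed as a sparse combination of the
# other points; the coefficient magnitudes define the spectral affinity.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

def sample_subspace(n_points, ambient_dim=10, subspace_dim=2):
    """Draw points from a random low-dimensional subspace."""
    basis = np.linalg.qr(rng.standard_normal((ambient_dim, subspace_dim)))[0]
    return basis @ rng.standard_normal((subspace_dim, n_points))

# Two 2-D subspaces in R^10, 40 points each; columns are points.
X = np.hstack([sample_subspace(40), sample_subspace(40)])
X /= np.linalg.norm(X, axis=0, keepdims=True)          # unit-norm columns
n = X.shape[1]

# Sparse self-representation: x_i ~ X c_i with the constraint c_ii = 0.
C = np.zeros((n, n))
lasso = Lasso(alpha=0.01, max_iter=5000)
for i in range(n):
    others = [j for j in range(n) if j != i]
    lasso.fit(X[:, others], X[:, i])
    C[others, i] = lasso.coef_

# Symmetric affinity from coefficient magnitudes, then spectral clustering.
W = np.abs(C) + np.abs(C).T
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(W)
print(labels)
```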

    Estimation of Subspace Arrangements with Applications in Modeling and Segmenting Mixed Data

    In recent years, subspace arrangements have become an increasingly popular class of mathematical objects for modeling a multivariate mixed data set that is (approximately) piecewise linear. A subspace arrangement is a union of multiple subspaces. Each subspace can be conveniently used to model a homogeneous subset of the data; hence, all the subspaces together can capture the heterogeneous structures within the data set. In this paper, we give a comprehensive introduction to a new approach for the estimation of subspace arrangements, known as generalized principal component analysis. We provide a summary of important algebraic properties and statistical facts that are crucial for making the inference of subspace arrangements both efficient and robust, even when the given data are corrupted with noise or contaminated by outliers. This new method in many ways improves and generalizes extant methods for modeling or clustering mixed data. There have been successful applications of this new method to many real-world problems in computer vision, image processing, and system identification, and we examine a couple of those representative applications.
    National Science Foundation (NSF CAREER IIS-0347456, NSF CRS-EHS-0509151, NSF CCF-TF-0514955, NSF CAREER DMS-034901); ONR YIP N00014-05-1-0633.
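    The algebraic core of generalized principal component analysis can be illustrated in a toy setting. The sketch below is an assumed, noise-free example of two lines through the origin in R^2, not code from the paper: it embeds the data with the degree-2 Veronese map, recovers the vanishing polynomial from the null space of the embedded data matrix, and segments the points by the direction of the polynomial's gradient, which is normal to the line containing each point.

```python
# Toy GPCA illustration (assumed setting): two noise-free lines through
# the origin in R^2, segmented via the Veronese embedding and the gradient
# of the vanishing polynomial.
import numpy as np

rng = np.random.default_rng(0)

def line_points(angle_deg, n=50):
    """Points on a line through the origin with the given direction angle."""
    d = np.array([np.cos(np.radians(angle_deg)), np.sin(np.radians(angle_deg))])
    return np.outer(rng.standard_normal(n), d)

X = np.vstack([line_points(20), line_points(100)])          # rows are points (x, y)

# Degree-2 Veronese embedding: nu(x, y) = [x^2, x*y, y^2].
V = np.column_stack([X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2])

# Coefficients of the polynomial vanishing on both lines span the null
# space of the embedded data matrix.
_, _, Vt = np.linalg.svd(V)
c = Vt[-1]                                  # p(x, y) = c0*x^2 + c1*x*y + c2*y^2

# The gradient of p at a point is normal to the line containing that point;
# cluster points by the direction of this normal.
grad = np.column_stack([2 * c[0] * X[:, 0] + c[1] * X[:, 1],
                        c[1] * X[:, 0] + 2 * c[2] * X[:, 1]])
grad /= np.linalg.norm(grad, axis=1, keepdims=True)
ref = grad[0]
labels = (np.abs(grad @ ref) > 0.9).astype(int)             # same vs. different normal
print(labels)
```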

    Robust Subspace Estimation via Low-Rank and Sparse Decomposition and Applications in Computer Vision

    Recent advances in robust subspace estimation have made dimensionality reduction and the suppression of noise and outliers an active area of research, alongside continuous improvements in computer vision applications. Because image and video signals require a high-dimensional representation, their storage, processing, transmission, and analysis is often a difficult task. It is therefore desirable to obtain a low-dimensional representation for such signals and, at the same time, to correct for corruptions, errors, and outliers, so that the signals can readily be used for later processing. Major recent advances in low-rank modelling in this context were initiated by the work of Candès et al. [17], where the authors provided a solution to the long-standing problem of decomposing a matrix into low-rank and sparse components in a Robust Principal Component Analysis (RPCA) framework. However, for computer vision applications RPCA is often too complex and/or may not yield desirable results. The low-rank component obtained by RPCA usually has an unnecessarily high rank, while certain tasks require lower-dimensional representations. RPCA can robustly estimate noise and outliers and separate them from the low-rank component through a sparse part, but it offers no insight into the structure of that sparse solution, nor a way to further decompose it into random noise and a structured sparse component, which would be advantageous in many computer vision tasks. Moreover, since video is usually captured by a moving camera, obtaining a low-rank component with RPCA becomes impossible.
    In this thesis, novel Approximated RPCA algorithms are presented, targeting different shortcomings of RPCA. The RPCA solution was analysed to identify its most time-consuming steps, which were replaced with simpler yet tractable alternatives. The proposed method obtains the exact desired rank for the low-rank component while estimating a global transformation that describes camera-induced motion. Furthermore, it decomposes the sparse part into a foreground sparse component and a random-noise part that contains no useful information for computer vision processing. The foreground sparse component is obtained by several novel structured sparsity-inducing norms that better encapsulate the pixel structure needed in visual signals. Moreover, algorithms for reducing the complexity of low-rank estimation are proposed that achieve significant complexity reduction without sacrificing the visual representation of video and image information.
    The proposed algorithms are applied to several fundamental computer vision tasks, namely high-efficiency video coding, batch image alignment, inpainting and recovery, video stabilisation, background modelling and foreground segmentation, robust subspace clustering and motion estimation, face recognition, and ultra-high-definition image and video super-resolution. The algorithms proposed in this thesis, including batch image alignment and recovery, background modelling and foreground segmentation, robust subspace clustering and motion segmentation, and ultra-high-definition image and video super-resolution, achieve results that are state-of-the-art or comparable to existing methods.
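    For reference, the sketch below implements the baseline RPCA decomposition M = L + S (principal component pursuit) with a simple ADMM-style iteration built from singular value thresholding and soft thresholding. It is not the thesis's Approximated RPCA; the weight lambda, the step size mu, and the iteration count follow common heuristics and are assumed values.

```python
# Minimal sketch of baseline RPCA (principal component pursuit), not the
# thesis's Approximated RPCA. Parameters follow common heuristics.
import numpy as np

def soft_threshold(X, tau):
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(soft_threshold(s, tau)) @ Vt

def rpca(M, n_iter=200):
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard weight on the sparse term
    mu = m * n / (4.0 * np.abs(M).sum())    # common heuristic step size
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)       # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)      # sparse update
        Y = Y + mu * (M - L - S)                          # dual update
    return L, S

# Toy example: rank-2 matrix plus sparse corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 40))
S0 = np.zeros((60, 40))
mask = rng.random((60, 40)) < 0.05
S0[mask] = rng.standard_normal(mask.sum()) * 5
L_hat, S_hat = rpca(L0 + S0)
print("largest singular values of recovered L:",
      np.round(np.linalg.svd(L_hat, compute_uv=False)[:4], 3))
```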

    Exploiting Low-Dimensional Structures in Motion Problems


    Sparse representation frameworks for inference problems in visual sensor networks

    Visual sensor networks (VSNs) form a new research area that merges computer vision and sensor networks. VSNs consist of small visual sensor nodes called camera nodes, which integrate an image sensor, an embedded processor, and a wireless transceiver. Having multiple cameras in a wireless network poses unique and challenging problems that exist neither in computer vision nor in sensor networks. Because of the resource constraints of the camera nodes, such as battery power and bandwidth, it is crucial to perform data processing and collaboration efficiently. This thesis presents a number of sparse-representation-based methods for surveillance tasks in VSNs. Performing surveillance tasks such as tracking and recognition in a communication-constrained VSN environment is extremely challenging. Compressed sensing is a technique for acquiring and reconstructing a signal from a small number of measurements by exploiting the prior knowledge that the signal has a sparse representation in a suitable space. The ability of sparse-representation tools to reconstruct signals from few observations fits well with the limitations on processing, communication, and collaboration in VSNs. Hence, this thesis presents novel sparsity-driven methods for action recognition and human tracking applications in VSNs.
    A sparsity-driven action recognition method is proposed by casting the classification problem as an optimization problem. We solve the optimization problem by enforcing sparsity through ℓ1 regularization and perform action recognition. We have demonstrated the superiority of our method when observations are low-resolution, occluded, and noisy. To the best of our knowledge, this is the first action recognition method that uses sparse representation. In addition, we have proposed an adaptation of this method to VSN resource constraints, and we have analysed the role of sparsity in classification for two different action recognition problems.
    We have also proposed a feature compression framework for human tracking applications in visual sensor networks. In this framework we perform decentralized tracking: each camera extracts useful features from the images it has observed and sends them to a fusion node, which collects the multi-view image features and performs tracking. Feature extraction in tracking usually results in a likelihood function. To reduce communication in the network, we compress the likelihoods by first splitting them into blocks, then transforming each block to a suitable domain and keeping only the most significant coefficients of this representation. To the best of our knowledge, compression of features computed in the context of tracking in a VSN has not been proposed in previous work. We have applied our method to indoor and outdoor tracking scenarios. Experimental results show that our approach can save up to 99.6% of the bandwidth compared to centralized approaches that compress raw images to decrease communication, and that it outperforms existing decentralized approaches.
    Furthermore, we have extended this tracking framework and proposed a sparsity-driven approach for human tracking in VSNs. We have designed special overcomplete dictionaries that exploit the specific known geometry of the measurement scenario and used these dictionaries for sparse representation of likelihoods. By obtaining dictionaries that match the structure of the likelihood functions, we can represent likelihoods with few coefficients and thereby decrease communication in the network. This is the first method in the literature that uses sparse representation to compress likelihood functions and applies this idea to VSNs. We have tested our approach in indoor and outdoor tracking scenarios and demonstrated that it achieves greater bandwidth reduction than our feature compression framework, and that it outperforms existing decentralized and distributed approaches.
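    To give a concrete flavour of the likelihood-compression idea, the sketch below sparse-codes a one-dimensional likelihood over an overcomplete dictionary with orthogonal matching pursuit and keeps only a handful of coefficients. The Gaussian-bump dictionary and all parameters are assumed stand-ins for the geometry-matched dictionaries designed in the thesis.

```python
# Minimal sketch: compress a 1-D likelihood by sparse-coding it over an
# overcomplete dictionary and transmitting only the nonzero coefficients.
# The Gaussian-bump dictionary is an assumed stand-in, not the thesis's.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

grid = np.linspace(0.0, 10.0, 200)                # 1-D position grid (assumed units)

# Overcomplete dictionary of Gaussian bumps at several centres and widths.
centres = np.linspace(0.0, 10.0, 80)
widths = [0.3, 0.6, 1.2]
atoms = [np.exp(-0.5 * ((grid - c) / w) ** 2) for w in widths for c in centres]
D = np.column_stack([a / np.linalg.norm(a) for a in atoms])   # 200 x 240 dictionary

# A bimodal likelihood, standing in for a camera's feature-extraction output.
likelihood = (np.exp(-0.5 * ((grid - 3.1) / 0.4) ** 2)
              + 0.6 * np.exp(-0.5 * ((grid - 7.4) / 0.7) ** 2))

# Keep only 6 coefficients; only their indices and values need to be sent.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=6, fit_intercept=False)
omp.fit(D, likelihood)
recon = D @ omp.coef_

print("coefficients sent:", np.count_nonzero(omp.coef_), "instead of", grid.size)
print("relative error:", np.linalg.norm(likelihood - recon) / np.linalg.norm(likelihood))
```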