564 research outputs found

    Simultaneous subspace clustering and cluster number estimating based on triplet relationship

    In this paper we propose a unified framework that simultaneously discovers the number of clusters and groups the data points into clusters using subspace clustering. Real data distributed in a high-dimensional space can be disentangled into a union of low-dimensional subspaces, which can benefit various applications. To explore such intrinsic structure, state-of-the-art subspace clustering approaches often optimize a self-representation problem among all samples to construct a pairwise affinity graph for spectral clustering. However, a graph with pairwise similarities lacks robustness for segmentation, especially for samples that lie on the intersection of two subspaces. To address this problem, we design a hyper-correlation based data structure termed the triplet relationship, which reveals high relevance and local compactness among three samples. The triplet relationship can be derived from the self-representation matrix and used to iteratively assign the data points to clusters. Based on the triplet relationship, we propose a unified optimization scheme to automatically calculate clustering assignments. Specifically, we optimize a model selection reward and a fusion reward by simultaneously maximizing the similarity of triplets from different clusters while minimizing the correlation of triplets from the same cluster. The proposed algorithm also automatically reveals the number of clusters and fuses groups to avoid over-segmentation. Extensive experimental results on both synthetic and real-world datasets validate the effectiveness and robustness of the proposed method.
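    The pairwise affinity graph mentioned in this abstract can be illustrated with a minimal sketch. It assumes the symmetrization W = |C| + |C|^T that is common in sparse subspace clustering; the abstract does not state which form the authors use, and the function name and the matrix `C` are hypothetical.

```python
def affinity_from_self_representation(C):
    """Build a symmetric affinity matrix W = |C| + |C|^T.

    C is a square self-representation matrix (list of lists), where
    C[i][j] is the coefficient of sample j when reconstructing sample i.
    This symmetrization is an assumption, not taken from the abstract.
    """
    n = len(C)
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]


# Toy 2x2 example with a zero diagonal (no sample represents itself).
C = [[0.0, 0.5],
     [-0.25, 0.0]]
W = affinity_from_self_representation(C)
```

    The resulting matrix is symmetric and non-negative, so it can serve as input to a spectral clustering routine.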

    A Cluster Elastic Net for Multivariate Regression

    We propose a method for estimating coefficients in multivariate regression when there is a clustering structure to the response variables. The proposed method includes a fusion penalty, to shrink the difference in fitted values from responses in the same cluster, and an L1 penalty for simultaneous variable selection and estimation. The method can be used whether the grouping structure of the response variables is known or unknown. When the clustering structure is unknown, the method simultaneously estimates the clusters of the responses and the regression coefficients. Theoretical results are presented for the penalized least squares case, including asymptotic results allowing for p >> n. We extend our method to the setting where the responses are binomial variables. We propose a coordinate descent algorithm for both the normal and binomial likelihoods, which can easily be extended to other generalized linear model (GLM) settings. Simulations and data examples from business operations and genomics are presented to show the merits of both the least squares and binomial methods. Comment: 37 pages, 11 figures
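    The combination of penalties described in this abstract can be sketched as a single objective for the least squares case with known clusters. This is an illustrative form only: the scaling constants, the pairwise form of the fusion term, and the names `lam` and `gamma` are assumptions, not the paper's exact formulation.

```python
def cluster_elastic_net_objective(X, Y, B, clusters, lam, gamma):
    """Squared-error loss + L1 penalty + within-cluster fusion penalty.

    X: n x p predictors, Y: n x r responses, B: p x r coefficients
    (all lists of lists). clusters: list of lists of response indices.
    lam and gamma are hypothetical tuning-parameter names.
    """
    n, p, r = len(X), len(X[0]), len(Y[0])
    # Fitted values F[i][k] = sum_j X[i][j] * B[j][k]
    F = [[sum(X[i][j] * B[j][k] for j in range(p)) for k in range(r)]
         for i in range(n)]
    loss = sum((Y[i][k] - F[i][k]) ** 2
               for i in range(n) for k in range(r)) / (2 * n)
    l1 = sum(abs(B[j][k]) for j in range(p) for k in range(r))
    # Fusion: squared differences of fitted values for each pair of
    # responses that share a cluster.
    fusion = 0.0
    for members in clusters:
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                fusion += sum((F[i][a] - F[i][b]) ** 2 for i in range(n))
    return loss + lam * l1 + gamma * fusion


X = [[1.0], [2.0]]
Y = [[1.0, 2.0], [2.0, 4.0]]
B = [[1.0, 2.0]]
same_cluster = cluster_elastic_net_objective(X, Y, B, [[0, 1]], 0.5, 0.1)
separate = cluster_elastic_net_objective(X, Y, B, [[0], [1]], 0.5, 0.1)
```

    In this toy case the fit is exact, so the two values differ only by the fusion term: placing both responses in one cluster adds a penalty for the gap between their fitted values.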

    Downstream Task Self-Supervised Learning for Object Recognition and Tracking

    This dissertation addresses three limitations of deep learning methods in image and video understanding-based machine vision applications. Firstly, although deep convolutional neural networks (CNNs) are efficient for image recognition applications such as object detection and segmentation, they perform poorly under perspective distortions. In real-world applications, the camera perspective is a common problem that can be addressed by annotating large amounts of data, which limits the applicability of the deep learning models. Secondly, the typical approach for single-camera tracking problems is to use separate motion and appearance models, which are expensive in terms of computation and training data requirements. Finally, conventional multi-camera video understanding techniques use supervised learning algorithms to determine temporal relationships among objects. In large-scale applications, these methods are also limited by the requirement of extensive manually annotated data and computational resources.
    To address these limitations, we develop an uncertainty-aware self-supervised learning (SSL) technique that captures a model's instance or semantic segmentation uncertainty from overhead images and guides the model to learn the impact of the new perspective on object appearance. The test-time data augmentation-based pseudo-label refinement technique continuously trains a model until convergence on new perspective images. The proposed method can be applied for both self-supervision and semi-supervision, thus increasing the effectiveness of a deep pre-trained model in new domains. Extensive experiments demonstrate the effectiveness of the SSL technique in both object detection and semantic segmentation problems. In video understanding applications, we introduce simultaneous segmentation and tracking as an unsupervised spatio-temporal latent feature clustering problem. The jointly learned multi-task features leverage the task-dependent uncertainty to generate discriminative features in multi-object videos. Experiments show that the proposed tracker outperforms several state-of-the-art supervised methods. Finally, we propose an unsupervised multi-camera tracklet association (MCTA) algorithm to track multiple objects in real time. MCTA leverages the self-supervised detector model for single-camera tracking and solves the multi-camera tracking problem using multiple pair-wise camera associations modeled as a connected graph. The graph optimization method generates a global solution for partially or fully overlapping camera networks.

    From images to augmented 3D models: improved visual SLAM and augmented point cloud modeling

    This thesis investigates the problem of using monocular image sequences to generate augmented models. The problem is decomposed into two subproblems: monocular visual simultaneous localization and mapping (VSLAM), and point cloud data (PCD) modeling. Accordingly, the thesis comprises two major parts. The first part, including Chapters 2, 3 and 4, aims to leverage system observability theory to improve VSLAM accuracy. In Chapter 2, a piece-wise linear system is developed to model VSLAM, and two necessary conditions are proved to make the VSLAM completely observable. Based on the first condition, an instantaneous condition for complete observability, the "Optimally Observable and Minimal Cardinality (OOMC) VSLAM" is presented in Chapter 3. The OOMC algorithm selects the feature subset of minimal required cardinality to form the strongest observable VSLAM subsystem. The selected feature subset is further used to improve the data association in VSLAM. Based on the second condition, a temporal condition for complete observability, the "Good Features (GF) to Track for VSLAM" is presented in Chapter 4. The GF algorithm ranks the individual features according to their contributions to system observability. Benchmarking experiments of both OOMC and GF algorithms demonstrate improvements in VSLAM performance. The second part, including Chapters 5 and 6, aims to solve the PCD modeling problem in a geometry-driven manner. Chapter 5 presents an algorithm to model PCDs with planar patches via a sparsity-inducing optimization. Chapter 6 extends the PCD modeling to models based on quadratic surface primitives. A method is further developed to retrieve the high-level semantic information of the model components. Evaluation on the PCDs generated from VSLAM demonstrates the effectiveness of these geometry-driven PCD modeling approaches. Ph.D.

    Non-Rigid Structure from Motion

    This thesis revisits a challenging classical problem in geometric computer vision known as "Non-Rigid Structure-from-Motion" (NRSfM). It is a well-known problem where the task is to recover the 3D shape and motion of a non-rigidly moving object from image data. A reliable solution to this problem is valuable in several industrial applications such as virtual reality, medical surgery, and animated movies. Nevertheless, to date, no algorithm exists that can solve NRSfM for all kinds of conceivable motion. As a result, additional constraints and assumptions are often employed to solve NRSfM. The task is challenging due to the inherently unconstrained nature of the problem itself, as many varying 3D configurations can have similar image projections. The problem becomes even more challenging if the camera is moving along with the object. The thesis takes a modern view of this challenging problem and proposes a few algorithms that have set a new performance benchmark for solving NRSfM. The thesis not only discusses the classical work in NRSfM but also proposes some powerful elementary modifications to it. The foundation of this thesis goes beyond traditional single-object NRSfM and for the first time provides an effective formulation to realise multi-body NRSfM. Most techniques for NRSfM under factorisation can only handle sparse feature correspondences. These sparse features are then used to construct a scene using the organisation of points, lines, planes or other elementary geometric primitives. Nevertheless, a sparse representation of the scene provides incomplete information about the scene. This thesis goes from sparse NRSfM to dense NRSfM for a single object, and then gradually lifts the intuition to realise dense 3D reconstruction of the entire dynamic scene as a global as-rigid-as-possible deformation problem. The core of this work goes beyond the traditional approach to dealing with deformation. It shows that relative scales for multiple deforming objects can be recovered under some mild assumptions about the scene. The work proposes a new approach for dense, detailed 3D reconstruction of a complex dynamic scene from two perspective frames. Since the method does not need any depth information, nor does it assume a template prior, per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios including YouTube videos. Lastly, this thesis provides a new way to perceive the depth of a dynamic scene, which essentially removes motion estimation as a compulsory step in solving this problem. Conventional geometric methods for depth estimation require a reliable estimate of motion parameters for each moving object, which is difficult to obtain and validate. In contrast, this thesis introduces a new motion-free approach to estimate the dense depth map of a complex dynamic scene for successive/multiple frames. The work shows that, given per-pixel optical flow correspondences between two consecutive frames and the sparse depth prior for the reference frame, we can recover the dense depth map for the successive frames without solving for motion parameters. By assigning the locally rigid structure to the piece-wise planar approximation of a dynamic scene, which transforms as rigidly as possible over frames, we can bypass the motion estimation step. Experimental results and MATLAB code for relevant examples are provided to validate the motion-free idea.

    Probabilistic analysis of the human transcriptome with side information

    Understanding the functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views on the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected based on individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community. Comment: Doctoral thesis. 103 pages, 11 figures