36,150 research outputs found

    Learning deep representations for robotics applications

    Get PDF
    In this thesis, two hierarchical learning representations are explored in computer vision tasks. First, a novel graph theoretic method for statistical shape analysis, called Compositional Hierarchy of Parts (CHOP), was proposed. The method utilises line-based features as its building blocks for the representation of shapes. A deep, multi-layer vocabulary is learned by recursively compressing this initial representation. The key contribution of this work is to formulate layerwise learning as a frequent sub-graph discovery problem, solved using the Minimum Description Length (MDL) principle. The experiments show that CHOP employs part shareability and data compression features, and yields state-of- the-art shape retrieval performance on 3 benchmark datasets. In the second part of the thesis, a hybrid generative-evaluative method was used to solve the dexterous grasping problem. This approach combines a learned dexterous grasp generation model with two novel evaluative models based on Convolutional Neural Networks (CNNs). The data- efficient generative method learns from a human demonstrator. The evaluative models are trained in simulation, using the grasps proposed by the generative approach and the depth images of the objects from a single view. On a real grasp dataset of 49 scenes with previously unseen objects, the proposed hybrid architecture outperforms the purely generative method, with a grasp success rate of 77.7% to 57.1%. The thesis concludes by comparing the two families of deep architectures, compositional hierarchies and DNNs, providing insights on their strengths and weaknesses

    A Graph Theoretic Approach for Object Shape Representation in Compositional Hierarchies Using a Hybrid Generative-Descriptive Model

    Full text link
    A graph theoretic approach is proposed for object shape representation in a hierarchical compositional architecture called Compositional Hierarchy of Parts (CHOP). In the proposed approach, vocabulary learning is performed using a hybrid generative-descriptive model. First, statistical relationships between parts are learned using a Minimum Conditional Entropy Clustering algorithm. Then, selection of descriptive parts is defined as a frequent subgraph discovery problem, and solved using a Minimum Description Length (MDL) principle. Finally, part compositions are constructed by compressing the internal data representation with discovered substructures. Shape representation and computational complexity properties of the proposed approach and algorithms are examined using six benchmark two-dimensional shape image datasets. Experiments show that CHOP can employ part shareability and indexing mechanisms for fast inference of part compositions using learned shape vocabularies. Additionally, CHOP provides better shape retrieval performance than the state-of-the-art shape retrieval methods.Comment: Paper : 17 pages. 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, September 6-12, 2014, Proceedings, Part III, pp 566-581. Supplementary material can be downloaded from http://link.springer.com/content/esm/chp:10.1007/978-3-319-10578-9_37/file/MediaObjects/978-3-319-10578-9_37_MOESM1_ESM.pd

    Teaching Compositionality to CNNs

    Full text link
    Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.Comment: Preprint appearing in CVPR 201

    Visual Concepts and Compositional Voting

    Get PDF
    It is very attractive to formulate vision in terms of pattern theory \cite{Mumford2010pattern}, where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real world images is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black-boxes which are hard to interpret and can easily be fooled by adding occluding objects. It is natural to wonder whether by better understanding deep networks we can extract building blocks which can be used to develop pattern theoretic models. This motivates us to study the internal representations of a deep network using vehicle images from the PASCAL3D+ dataset. We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles. To analyze this we annotate these vehicles by their semantic parts to create a new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised part detectors. We show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines (SVM). We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this, we use the visual concepts as building blocks for a simple pattern theoretical model, which we call compositional voting. In this model several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like SVM and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.Comment: It is accepted by Annals of Mathematical Sciences and Application

    Joint Object and Part Segmentation using Deep Learned Potentials

    Full text link
    Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision. In this paper, we propose a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation. Specifically, we first introduce the concept of semantic compositional parts (SCP) in which similar semantic parts are grouped and shared among different objects. A two-channel fully convolutional network (FCN) is then trained to provide the SCP and object potentials at each pixel. At the same time, a compact set of segments can also be obtained from the SCP predictions of the network. Given the potentials and the generated segments, in order to explore long-range context, we finally construct an efficient fully connected conditional random field (FCRF) to jointly predict the final object and part labels. Extensive evaluation on three different datasets shows that our approach can mutually enhance the performance of object and part segmentation, and outperforms the current state-of-the-art on both tasks

    Resolving Lexical Ambiguity in Tensor Regression Models of Meaning

    Full text link
    This paper provides a method for improving tensor-based compositional distributional models of meaning by the addition of an explicit disambiguation step prior to composition. In contrast with previous research where this hypothesis has been successfully tested against relatively simple compositional models, in our work we use a robust model trained with linear regression. The results we get in two experiments show the superiority of the prior disambiguation method and suggest that the effectiveness of this approach is model-independent
    corecore