Search CORE

15,433 research outputs found

A Graph Theoretic Approach for Object Shape Representation in Compositional Hierarchies Using a Hybrid Generative-Descriptive Model

Author: A. Delong
A. Levinshtein
A. Shokoufandeh
A. Torsello
A.L. Zhu
A.M. Bronstein
B. Ommer
C.-E. Guo
D. Raviv
D.J. Cook
H. Li
H. Ling
I. Kokkinos
K. Siddiqi
L. Bo
L. Nanni
L.L. Zhu
P. Felzenszwalb
R. Salakhutdinov
R.H. Davies
S. Fidler
V. Ferrari
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

A graph theoretic approach is proposed for object shape representation in a hierarchical compositional architecture called Compositional Hierarchy of Parts (CHOP). In the proposed approach, vocabulary learning is performed using a hybrid generative-descriptive model. First, statistical relationships between parts are learned using a Minimum Conditional Entropy Clustering algorithm. Then, selection of descriptive parts is defined as a frequent subgraph discovery problem, and solved using a Minimum Description Length (MDL) principle. Finally, part compositions are constructed by compressing the internal data representation with discovered substructures. Shape representation and computational complexity properties of the proposed approach and algorithms are examined using six benchmark two-dimensional shape image datasets. Experiments show that CHOP can employ part shareability and indexing mechanisms for fast inference of part compositions using learned shape vocabularies. Additionally, CHOP provides better shape retrieval performance than the state-of-the-art shape retrieval methods.Comment: Paper : 17 pages. 13th European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, September 6-12, 2014, Proceedings, Part III, pp 566-581. Supplementary material can be downloaded from http://link.springer.com/content/esm/chp:10.1007/978-3-319-10578-9_37/file/MediaObjects/978-3-319-10578-9_37_MOESM1_ESM.pd

arXiv.org e-Print Archive

Crossref

University of Birmingham Research Portal

Teaching Compositionality to CNNs

Author: George Dileep
Liu Yi
Phoenix D. Scott
Stark Michael
Stone Austin
Wang Huayan
Publication venue
Publication date: 14/06/2017
Field of study

Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.Comment: Preprint appearing in CVPR 201

arXiv.org e-Print Archive

Crossref

Joint Object and Part Segmentation using Deep Learned Potentials

Author: Cohen Scott
Lin Zhe
Price Brian
Shen Xiaohui
Wang Peng
Yuille Alan
Publication venue
Publication date: 01/05/2015
Field of study

Segmenting semantic objects from images and parsing them into their respective semantic parts are fundamental steps towards detailed object understanding in computer vision. In this paper, we propose a joint solution that tackles semantic object and part segmentation simultaneously, in which higher object-level context is provided to guide part segmentation, and more detailed part-level localization is utilized to refine object segmentation. Specifically, we first introduce the concept of semantic compositional parts (SCP) in which similar semantic parts are grouped and shared among different objects. A two-channel fully convolutional network (FCN) is then trained to provide the SCP and object potentials at each pixel. At the same time, a compact set of segments can also be obtained from the SCP predictions of the network. Given the potentials and the generated segments, in order to explore long-range context, we finally construct an efficient fully connected conditional random field (FCRF) to jointly predict the final object and part labels. Extensive evaluation on three different datasets shows that our approach can mutually enhance the performance of object and part segmentation, and outperforms the current state-of-the-art on both tasks

arXiv.org e-Print Archive

CiteSeerX

Crossref

Visual Concepts and Compositional Voting

Author: Premachandran Vittal
Wang Jianyu
Xie Cihang
Xie Lingxi
Yuille Alan
Zhang Zhishuai
Zhou Yuyin
Zhu Jun
Publication venue
Publication date: 13/11/2017
Field of study

It is very attractive to formulate vision in terms of pattern theory \cite{Mumford2010pattern}, where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real world images is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black-boxes which are hard to interpret and can easily be fooled by adding occluding objects. It is natural to wonder whether by better understanding deep networks we can extract building blocks which can be used to develop pattern theoretic models. This motivates us to study the internal representations of a deep network using vehicle images from the PASCAL3D+ dataset. We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles. To analyze this we annotate these vehicles by their semantic parts to create a new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised part detectors. We show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines (SVM). We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this, we use the visual concepts as building blocks for a simple pattern theoretical model, which we call compositional voting. In this model several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like SVM and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.Comment: It is accepted by Annals of Mathematical Sciences and Application

arXiv.org e-Print Archive

DSpace@MIT