6,177 research outputs found
Visual Concepts and Compositional Voting
It is very attractive to formulate vision in terms of pattern theory
\cite{Mumford2010pattern}, where patterns are defined hierarchically by
compositions of elementary building blocks. But applying pattern theory to real
world images is currently less successful than discriminative methods such as
deep networks. Deep networks, however, are black-boxes which are hard to
interpret and can easily be fooled by adding occluding objects. It is natural
to wonder whether by better understanding deep networks we can extract building
blocks which can be used to develop pattern theoretic models. This motivates us
to study the internal representations of a deep network using vehicle images
from the PASCAL3D+ dataset. We use clustering algorithms to study the
population activities of the features and extract a set of visual concepts
which we show are visually tight and correspond to semantic parts of vehicles.
To analyze this we annotate these vehicles by their semantic parts to create a
new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised
part detectors. We show that visual concepts perform fairly well but are
outperformed by supervised discriminative methods such as Support Vector
Machines (SVM). We next give a more detailed analysis of visual concepts and
how they relate to semantic parts. Following this, we use the visual concepts
as building blocks for a simple pattern theoretical model, which we call
compositional voting. In this model several visual concepts combine to detect
semantic parts. We show that this approach is significantly better than
discriminative methods like SVM and deep networks trained specifically for
semantic part detection. Finally, we return to studying occlusion by creating
an annotated dataset with occlusion, called VehicleOcclusion, and show that
compositional voting outperforms even deep networks when the amount of
occlusion becomes large.Comment: It is accepted by Annals of Mathematical Sciences and Application
Picasso, Matisse, or a Fake? Automated Analysis of Drawings at the Stroke Level for Attribution and Authentication
This paper proposes a computational approach for analysis of strokes in line
drawings by artists. We aim at developing an AI methodology that facilitates
attribution of drawings of unknown authors in a way that is not easy to be
deceived by forged art. The methodology used is based on quantifying the
characteristics of individual strokes in drawings. We propose a novel algorithm
for segmenting individual strokes. We designed and compared different
hand-crafted and learned features for the task of quantifying stroke
characteristics. We also propose and compare different classification methods
at the drawing level. We experimented with a dataset of 300 digitized drawings
with over 80 thousands strokes. The collection mainly consisted of drawings of
Pablo Picasso, Henry Matisse, and Egon Schiele, besides a small number of
representative works of other artists. The experiments shows that the proposed
methodology can classify individual strokes with accuracy 70%-90%, and
aggregate over drawings with accuracy above 80%, while being robust to be
deceived by fakes (with accuracy 100% for detecting fakes in most settings)
Object-Oriented Deep Learning
We investigate an unconventional direction of research that aims at converting neural networks, a class of distributed, connectionist, sub-symbolic models into a symbolic level with the ultimate goal of achieving AI interpretability and safety. To that end, we propose Object-Oriented Deep Learning, a novel computational paradigm of deep learning that adopts interpretable “objects/symbols” as a basic representational atom instead of N-dimensional tensors (as in traditional “feature-oriented” deep learning). For visual processing, each “object/symbol” can explicitly package common properties of visual objects like its position, pose, scale, probability of being an object, pointers to parts, etc., providing a full spectrum of interpretable visual knowledge throughout all layers. It achieves a form of “symbolic disentanglement”, offering one solution to the important problem of disentangled representations and invariance. Basic computations of the network include predicting high-level objects and their properties from low-level objects and binding/aggregating relevant objects together. These computations operate at a more fundamental level than convolutions, capturing convolution as a special case while being significantly more general than it. All operations are executed in an input-driven fashion, thus sparsity and dynamic computation per sample are naturally supported, complementing recent popular ideas of dynamic networks and may enable new types of hardware accelerations. We experimentally show on CIFAR-10 that it can perform flexible visual processing, rivaling the performance of ConvNet, but without using any convolution. Furthermore, it can generalize to novel rotations of images that it was not trained for.This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF – 1231216
- …