Deep Epitomic Convolutional Neural Networks
Deep convolutional neural networks have recently proven extremely competitive
in challenging image recognition tasks. This paper proposes the epitomic
convolution as a new building block for deep neural networks. An epitomic
convolution layer replaces a pair of consecutive convolution and max-pooling
layers found in standard deep convolutional neural networks. The main version
of the proposed model uses mini-epitomes in place of filters and computes
responses invariant to small translations by epitomic search instead of
max-pooling over image positions. The topographic version of the proposed model
uses large epitomes to learn filter maps organized in translational
topographies. We show that error back-propagation can successfully learn
multiple epitomic layers in a supervised fashion. The effectiveness of the
proposed method is assessed in image classification tasks on standard
benchmarks. Our experiments on Imagenet indicate improved recognition
performance compared to standard convolutional neural networks of similar
architecture. Our models pre-trained on Imagenet perform excellently on
Caltech-101. We also obtain competitive image classification results on the
small-image MNIST and CIFAR-10 datasets.
Comment: 9 pages
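The core idea of the abstract above, replacing max-pooling over image positions with a search over filter positions inside a mini-epitome, can be sketched as follows. This is a minimal illustrative implementation, not the paper's code; the function name and the brute-force loop are assumptions for clarity.

```python
import numpy as np

def epitomic_response(patch, epitome, filt_size):
    """Toy sketch of epitomic search (illustrative, not the paper's code).

    patch:    an input image patch of shape (filt_size, filt_size)
    epitome:  a mini-epitome slightly larger than the filter,
              e.g. (filt_size + 2, filt_size + 2)

    Instead of max-pooling responses over image positions, we slide a
    filt_size window over the epitome and keep the maximum response,
    which yields invariance to small translations.
    """
    best = -np.inf
    e = epitome.shape[0]
    for dy in range(e - filt_size + 1):
        for dx in range(e - filt_size + 1):
            f = epitome[dy:dy + filt_size, dx:dx + filt_size]
            best = max(best, float(np.sum(patch * f)))
    return best
```

In a real layer this search would run over all epitome positions in parallel on GPU; the nested loops here are only for readability.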
Untangling Local and Global Deformations in Deep Convolutional Networks for Image Classification and Sliding Window Detection
Deep Convolutional Neural Networks (DCNNs) commonly use generic 'max-pooling'
(MP) layers to extract deformation-invariant features, but we argue in favor of
a more refined treatment. First, we introduce epitomic convolution as a
building block alternative to the common convolution-MP cascade of DCNNs; while
having identical complexity to MP, epitomic convolution allows for parameter
sharing across different filters, resulting in faster convergence and better
generalization. Second, we introduce a Multiple Instance Learning approach to
explicitly accommodate global translation and scaling when training a DCNN
exclusively with class labels. For this, we rely on a 'patchwork' data structure
that efficiently lays out all image scales and positions as candidates to a
DCNN. Factoring global and local deformations allows a DCNN to 'focus its
resources' on the treatment of non-rigid deformations and yields a substantial
classification accuracy improvement. Third, further pursuing this idea, we
develop an efficient DCNN sliding window object detector that employs explicit
search over position, scale, and aspect ratio. We provide competitive image
classification and localization results on the ImageNet dataset and object
detection results on the Pascal VOC 2007 benchmark.
Comment: 13 pages, 7 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:1406.273
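The 'patchwork' idea above, laying out all image scales and positions in one canvas so a single forward pass scores every candidate, can be illustrated with a toy packer. This is a sketch under assumed simplifications (nearest-neighbour resizing, simple horizontal packing); the paper's actual layout scheme may differ.

```python
import numpy as np

def build_patchwork(image, scales):
    """Pack rescaled copies of `image` side by side into one canvas.

    A single DCNN forward pass over the canvas then evaluates all
    scale/position candidates at once. (Illustrative sketch: real
    implementations use proper interpolation and tighter packing.)
    """
    resized = []
    for s in scales:
        h = max(1, int(round(image.shape[0] * s)))
        w = max(1, int(round(image.shape[1] * s)))
        # Nearest-neighbour resize, to keep the sketch dependency-free.
        ys = (np.arange(h) / s).astype(int).clip(0, image.shape[0] - 1)
        xs = (np.arange(w) / s).astype(int).clip(0, image.shape[1] - 1)
        resized.append(image[np.ix_(ys, xs)])
    H = max(r.shape[0] for r in resized)
    W = sum(r.shape[1] for r in resized)
    canvas = np.zeros((H, W), dtype=image.dtype)
    x, offsets = 0, []
    for r in resized:
        canvas[:r.shape[0], x:x + r.shape[1]] = r
        offsets.append(x)  # remember where each scale starts, to map
        x += r.shape[1]    # detections back to image coordinates
    return canvas, offsets
```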
Complexity of Representation and Inference in Compositional Models with Part Sharing
This paper performs a complexity analysis of a class of serial and parallel compositional models of multiple objects and shows that they enable efficient representation and rapid inference. Compositional models are generative and represent objects in a hierarchically distributed manner in terms of parts and subparts, which are constructed recursively by part-subpart compositions. Parts are represented more coarsely at higher levels of the hierarchy, so that the upper levels give coarse summary descriptions (e.g., there is a horse in the image) while the lower levels represent the details (e.g., the positions of the legs of the horse). This hierarchically distributed representation obeys the executive summary principle, meaning that a high-level executive only requires a coarse summary description and can, if necessary, get more details by consulting lower-level executives. The parts and subparts are organized in terms of hierarchical dictionaries, which enable part sharing between different objects and hence efficient representation of many objects.
The first main contribution of this paper is to show that compositional models can be mapped onto a parallel visual architecture similar to that used by bio-inspired visual models such as deep convolutional networks, but more explicit in terms of representation, hence enabling part detection as well as object detection, and suitable for complexity analysis. Inference algorithms can be run on this architecture to exploit the gains afforded by part sharing and executive summary. Effectively, this compositional architecture enables us to perform exact inference simultaneously over a large class of generative models of objects.
The second contribution is an analysis of the complexity of compositional models in terms of computation time (for serial computers) and numbers of nodes (e.g., "neurons") for parallel computers. In particular, we compute the complexity gains from part sharing and executive summary and their dependence on how the dictionary scales with the level of the hierarchy. We explore three regimes of scaling behavior, where the dictionary size (i) increases exponentially with the level of the hierarchy, (ii) is determined by an unsupervised compositional learning algorithm applied to real data, or (iii) decreases exponentially with scale. This analysis shows that in some regimes the use of shared parts enables algorithms which can perform inference in time linear in the number of levels for an exponential number of objects. In other regimes, part sharing has little advantage for serial computers but can enable linear processing on parallel computers.
This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216, and also by ARO 62250-CS.
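The serial-computation argument above can be made concrete with a toy cost accounting. The formulas below are illustrative assumptions, not the paper's exact expressions: without sharing, each object's part tree is evaluated independently; with sharing, each dictionary entry at each level is evaluated once.

```python
def inference_cost(num_objects, levels, branch, dict_sizes):
    """Toy serial-inference cost with vs. without part sharing.

    num_objects: number of object models
    levels:      depth of the part hierarchy
    branch:      subparts per part (uniform, for simplicity)
    dict_sizes:  shared-dictionary size at each level, len == levels + 1

    Illustrative accounting only; real costs include per-part
    matching constants omitted here.
    """
    # Without sharing: every object re-evaluates its full part tree,
    # which has 1 + branch + branch^2 + ... + branch^levels nodes.
    no_sharing = num_objects * sum(branch ** l for l in range(levels + 1))
    # With sharing: each dictionary part is evaluated once, however
    # many objects reuse it.
    with_sharing = sum(dict_sizes[l] for l in range(levels + 1))
    return no_sharing, with_sharing
```

When low-level dictionaries are small and heavily reused, the shared cost grows far more slowly than the per-object cost, which is the regime where inference becomes linear in the number of levels.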
Visual Concepts and Compositional Voting
It is very attractive to formulate vision in terms of pattern theory
\cite{Mumford2010pattern}, where patterns are defined hierarchically by
compositions of elementary building blocks. But applying pattern theory to real
world images is currently less successful than discriminative methods such as
deep networks. Deep networks, however, are black-boxes which are hard to
interpret and can easily be fooled by adding occluding objects. It is natural
to wonder whether by better understanding deep networks we can extract building
blocks which can be used to develop pattern theoretic models. This motivates us
to study the internal representations of a deep network using vehicle images
from the PASCAL3D+ dataset. We use clustering algorithms to study the
population activities of the features and extract a set of visual concepts
which we show are visually tight and correspond to semantic parts of vehicles.
To analyze this we annotate these vehicles by their semantic parts to create a
new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised
part detectors. We show that visual concepts perform fairly well but are
outperformed by supervised discriminative methods such as Support Vector
Machines (SVM). We next give a more detailed analysis of visual concepts and
how they relate to semantic parts. Following this, we use the visual concepts
as building blocks for a simple pattern theoretical model, which we call
compositional voting. In this model several visual concepts combine to detect
semantic parts. We show that this approach is significantly better than
discriminative methods like SVM and deep networks trained specifically for
semantic part detection. Finally, we return to studying occlusion by creating
an annotated dataset with occlusion, called VehicleOcclusion, and show that
compositional voting outperforms even deep networks when the amount of
occlusion becomes large.
Comment: Accepted by Annals of Mathematical Sciences and Applications
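The compositional-voting idea above, several visual concepts jointly providing evidence for a semantic part, can be sketched as a weighted vote. The weights and threshold here are illustrative placeholders, not the paper's learned values, and the real model combines evidence over spatial positions rather than a single score per concept.

```python
def compositional_vote(concept_scores, weights, threshold=0.5):
    """Toy sketch of compositional voting (illustrative values only).

    concept_scores: evidence score from each visual-concept detector
                    for a semantic part at one location
    weights:        how strongly each concept supports that part

    Returns the combined vote and whether it clears the detection
    threshold. Because evidence is pooled across concepts, the part
    can still be detected when some concepts are occluded.
    """
    votes = sum(w * s for w, s in zip(weights, concept_scores))
    return votes, votes > threshold
```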
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in
object category classification and detection on hundreds of object categories
and millions of images. The challenge has been run annually from 2010 to
present, attracting participation from more than fifty institutions.
This paper describes the creation of this benchmark dataset and the advances
in object recognition that have been possible as a result. We discuss the
challenges of collecting large-scale ground truth annotation, highlight key
breakthroughs in categorical object recognition, provide a detailed analysis of
the current state of the field of large-scale image classification and object
detection, and compare the state-of-the-art computer vision accuracy with human
accuracy. We conclude with lessons learned in the five years of the challenge,
and propose future directions and improvements.
Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references
Following Tradition
Following Tradition is an expansive examination of the history of tradition, one of the most common as well as most contested terms in English-language usage, in Americans' thinking and discourse about culture. Tradition in use becomes problematic because of its multiple meanings and its conceptual softness. As a term and a concept, it has been important in the development of all scholarly fields that study American culture. Folklore, history, American studies, anthropology, cultural studies, and others assign different value and meaning to tradition. It is a frequent point of reference in popular discourse concerning everything from politics to lifestyles to sports and entertainment. Politicians and social advocates appeal to it as prima facie evidence of the worth of their causes. Entertainment and other media mass-produce it, or at least a facsimile of it. In a society that frequently seeks to reinvent itself, tradition as a cultural anchor to be reverenced or rejected is an essential, if elusive, concept. Simon Bronner's wide net captures the historical, rhetorical, philosophical, and psychological dimensions of tradition. As he notes, he has written a book about an American tradition: arguing about it. His elucidation of those arguments makes fascinating and thoughtful reading. An essential text for folklorists, Following Tradition will be a valuable reference as well for historians and anthropologists; students of American studies, popular culture, and cultural studies; and anyone interested in the continuing place of tradition in American culture.