Sparse Image Representation with Epitomes
Sparse coding, which is the decomposition of a vector using only a few basis
elements, is widely used in machine learning and image processing. The basis
set, also called dictionary, is learned to adapt to specific data. This
approach has proven to be very effective in many image processing tasks.
Traditionally, the dictionary is an unstructured "flat" set of atoms. In this
paper, we study structured dictionaries which are obtained from an epitome, or
a set of epitomes. The epitome is itself a small image, and the atoms are all
the patches of a chosen size inside this image. This considerably reduces the
number of parameters to learn and provides sparse image decompositions with
shift-invariance properties. We propose a new formulation and an algorithm for
learning the structured dictionaries associated with epitomes, and illustrate
their use in image denoising tasks.
Comment: Computer Vision and Pattern Recognition, Colorado Springs, United States (2011)
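The core structural idea of the first abstract, that every patch of a small epitome image is a dictionary atom, can be sketched directly. The following is an illustrative sketch only, not the authors' learning algorithm; the function name and shapes are our own assumptions.

```python
import numpy as np

def epitome_dictionary(epitome, patch_size):
    """Extract every patch of a given size from a small epitome image.

    Each patch becomes one dictionary atom (flattened and l2-normalized),
    so an E x E epitome with p x p patches yields (E - p + 1)^2 atoms
    while storing only E * E parameters -- far fewer than an
    unstructured "flat" dictionary with the same number of atoms.
    """
    E = epitome.shape[0]
    p = patch_size
    atoms = []
    for i in range(E - p + 1):
        for j in range(E - p + 1):
            patch = epitome[i:i + p, j:j + p].ravel()
            norm = np.linalg.norm(patch)
            atoms.append(patch / norm if norm > 0 else patch)
    return np.stack(atoms, axis=1)  # columns are the dictionary atoms

rng = np.random.default_rng(0)
D = epitome_dictionary(rng.standard_normal((12, 12)), 6)
print(D.shape)  # (36, 49): 36-pixel atoms, (12-6+1)^2 = 49 of them
```

Because neighbouring atoms are shifted copies of the same pixels, sparse decompositions over this dictionary inherit the shift-invariance properties the abstract mentions.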
Analyzing structural characteristics of object category representations from their semantic-part distributions
Studies from neuroscience show that part-mapping computations are employed by
human visual system in the process of object recognition. In this work, we
present an approach for analyzing semantic-part characteristics of object
category representations. For our experiments, we use category-epitome, a
recently proposed sketch-based spatial representation for objects. To enable
part-importance analysis, we first obtain semantic-part annotations of
hand-drawn sketches originally used to construct the corresponding epitomes. We
then examine the extent to which the semantic-parts are present in the epitomes
of a category and visualize the relative importance of parts as a word cloud.
Finally, we show how such word cloud visualizations provide an intuitive
understanding of category-level structural trends that exist in the
category-epitome object representations.
Deep Epitomic Convolutional Neural Networks
Deep convolutional neural networks have recently proven extremely competitive
in challenging image recognition tasks. This paper proposes the epitomic
convolution as a new building block for deep neural networks. An epitomic
convolution layer replaces a pair of consecutive convolution and max-pooling
layers found in standard deep convolutional neural networks. The main version
of the proposed model uses mini-epitomes in place of filters and computes
responses invariant to small translations by epitomic search instead of
max-pooling over image positions. The topographic version of the proposed model
uses large epitomes to learn filter maps organized in translational
topographies. We show that error back-propagation can successfully learn
multiple epitomic layers in a supervised fashion. The effectiveness of the
proposed method is assessed in image classification tasks on standard
benchmarks. Our experiments on Imagenet indicate improved recognition
performance compared to standard convolutional neural networks of similar
architecture. Our models pre-trained on Imagenet perform excellently on
Caltech-101. We also obtain competitive image classification results on the
small-image MNIST and CIFAR-10 datasets.
Comment: 9 pages
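The epitomic-convolution idea above, matching each image patch against every sub-filter of a mini-epitome and keeping the best response instead of max-pooling over image positions, can be illustrated with a minimal single-channel sketch. This is our own toy rendering of the mechanism as described, not the paper's implementation; function name and shapes are assumptions.

```python
import numpy as np

def epitomic_conv(image, epitome, p):
    """Toy single-channel epitomic convolution.

    For every p x p image patch, search all p x p sub-filters of the
    mini-epitome and keep the maximum correlation: the "epitomic search"
    over filter positions that replaces max-pooling over image positions.
    """
    H, W = image.shape
    E = epitome.shape[0]
    # enumerate every candidate filter inside the mini-epitome
    filters = [epitome[a:a + p, b:b + p]
               for a in range(E - p + 1) for b in range(E - p + 1)]
    out = np.empty((H - p + 1, W - p + 1))
    for y in range(H - p + 1):
        for x in range(W - p + 1):
            patch = image[y:y + p, x:x + p]
            out[y, x] = max(float(np.sum(patch * f)) for f in filters)
    return out

rng = np.random.default_rng(1)
r = epitomic_conv(rng.standard_normal((8, 8)), rng.standard_normal((5, 5)), 3)
print(r.shape)  # (6, 6)
```

Because small translations of an input pattern can be matched by a shifted sub-filter of the same epitome, the responses are invariant to small translations, which is why this single layer can replace a convolution/max-pooling pair.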
Untangling Local and Global Deformations in Deep Convolutional Networks for Image Classification and Sliding Window Detection
Deep Convolutional Neural Networks (DCNNs) commonly use generic `max-pooling'
(MP) layers to extract deformation-invariant features, but we argue in favor of
a more refined treatment. First, we introduce epitomic convolution as a
building block alternative to the common convolution-MP cascade of DCNNs; while
having identical complexity to MP, Epitomic Convolution allows for parameter
sharing across different filters, resulting in faster convergence and better
generalization. Second, we introduce a Multiple Instance Learning approach to
explicitly accommodate global translation and scaling when training a DCNN
exclusively with class labels. For this we rely on a `patchwork' data structure
that efficiently lays out all image scales and positions as candidates to a
DCNN. Factoring global and local deformations allows a DCNN to `focus its
resources' on the treatment of non-rigid deformations and yields a substantial
classification accuracy improvement. Third, further pursuing this idea, we
develop an efficient DCNN sliding window object detector that employs explicit
search over position, scale, and aspect ratio. We provide competitive image
classification and localization results on the ImageNet dataset and object
detection results on the Pascal VOC 2007 benchmark.
Comment: 13 pages, 7 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:1406.273
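The Multiple Instance Learning step described above, laying out all image scales and positions as candidates and letting only the best one drive the class label, reduces at inference time to a max over per-position score maps. The sketch below is our own minimal illustration of that pooling, under assumed names and shapes; it is not the paper's patchwork implementation.

```python
import numpy as np

def mil_image_score(score_maps):
    """MIL-style pooling over a patchwork of scales.

    Each entry of score_maps is a per-position class-score map computed
    at one scale of the patchwork; the image-level score is the maximum
    over every position and scale, so only the best candidate window
    contributes, accommodating global translation and scaling.
    """
    return max(float(m.max()) for m in score_maps)

# two scales: a 2x2 score map and a coarser 1x1 score map
maps = [np.array([[0.1, 0.4], [0.2, 0.3]]), np.array([[0.9]])]
print(mil_image_score(maps))  # 0.9
```

Training with only class labels then back-propagates through this max, so the network learns from whichever window scored highest, the usual MIL relaxation.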
Repositioning the Base Level of Bibliographic Relationships: or, A Cataloguer, a Post-Modernist and a Chatbot Walk Into a Bar
Designers and maintainers of library catalogues are facing fresh challenges representing bibliographic relationships, due both to changes in cataloguing standards and to a broader information environment that has grown increasingly diverse, sophisticated and complex. This paper presents three different paradigms, drawn from three different fields of study, for representing relationships between bibliographic entities beyond the FRBR/LRM models: superworks, as developed in information studies; adaptation, as developed in literary studies; and artificial intelligence, as developed in computer science. Theories of literary adaptation remain focused on “the work,” as traditionally conceived. The concept of the superwork reminds us that some works serve as ancestors for entire families of works, and that those familial relationships are still useful. Crowd-sourcing projects often make more granular connections, a trend which has escalated significantly with current and emerging artificial intelligence systems. While the artificial intelligence paradigm is proving more pervasive outside conventional library systems, it could lead to a seismic shift in knowledge organization, one in which the power both to arrange information and to use it is moving beyond the control of users and intermediaries alike.
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic annotations are vital for training models for object recognition,
semantic segmentation or scene understanding. Unfortunately, pixelwise
annotation of images at very large scale is labor-intensive, and only limited
labeled data is available, particularly at the instance level and for street
scenes. In this paper, we propose to tackle this problem by lifting the
semantic instance labeling task from 2D into 3D. Given reconstructions from
stereo or laser data, we annotate static 3D scene elements with rough bounding
primitives and develop a model which transfers this information into the image
domain. We leverage our method to obtain 2D labels for a novel suburban video
dataset which we have collected, resulting in 400k semantic and instance image
annotations. A comparison of our method to state-of-the-art label transfer
baselines reveals that 3D information enables more efficient annotation while
at the same time resulting in improved accuracy and time-coherent labels.
Comment: 10 pages, in Conference on Computer Vision and Pattern Recognition (CVPR), 201