Semantic Pose using Deep Networks Trained on Synthetic RGB-D
In this work we address the problem of indoor scene understanding from RGB-D
images. Specifically, we propose to find instances of common furniture classes,
their spatial extent, and their pose with respect to generalized class models.
To accomplish this, we use a deep, wide, multi-output convolutional neural
network (CNN) that predicts class, pose, and location of possible objects
simultaneously. To overcome the lack of large annotated RGB-D training sets
(especially those with pose), we use an on-the-fly rendering pipeline that
generates realistic cluttered room scenes in parallel to training. We then
perform transfer learning on the relatively small amount of publicly available
annotated RGB-D data, and find that our model is able to successfully annotate
even highly challenging real scenes. Importantly, our trained network is able
to understand noisy and sparse observations of highly cluttered scenes with a
remarkable degree of accuracy, inferring class and pose from a very limited set
of cues. Additionally, our neural network is only moderately deep and computes
class, pose and position in tandem, so the overall run-time is significantly
faster than existing methods, estimating all output parameters simultaneously
in parallel on a GPU in seconds.
Comment: ICCV 2015 Submission
Generating realistic scaled complex networks
Research on generative models is a central project in the emerging field of
network science, and it studies how statistical patterns found in real networks
could be generated by formal rules. Output from these generative models is then
the basis for designing and evaluating computational methods on networks, and
for verification and simulation studies. During the last two decades, a variety
of models has been proposed with an ultimate goal of achieving comprehensive
realism for the generated networks. In this study, we (a) introduce a new
generator, termed ReCoN; (b) explore how ReCoN and some existing models can be
fitted to an original network to produce a structurally similar replica; (c)
use ReCoN to produce networks much larger than the original exemplar, and
finally (d) discuss open problems and promising research directions. In a
comparative experimental study, we find that ReCoN is often superior to many
other state-of-the-art network generation methods. We argue that ReCoN is a
scalable and effective tool for modeling a given network while preserving
important properties at both micro- and macroscopic scales, and for scaling the
exemplar data by orders of magnitude in size.
Comment: 26 pages, 13 figures, extended version, a preliminary version of the
paper was presented at the 5th International Workshop on Complex Networks and
their Applications
A deep representation for depth images from synthetic data
Convolutional Neural Networks (CNNs) trained on large scale RGB databases
have become the secret sauce in the majority of recent approaches for object
categorization from RGB-D data. Thanks to colorization techniques, these
methods exploit the filters learned from 2D images to extract meaningful
representations in 2.5D. Still, the perceptual signature of these two kinds of
images is very different, with the first usually strongly characterized by
textures, and the second mostly by silhouettes of objects. Ideally, one would
like to have two CNNs, one for RGB and one for depth, each trained on a
suitable data collection, able to capture the perceptual properties of each
channel for the task at hand. This has not been possible so far, due to the
lack of a suitable depth database. This paper addresses this issue, proposing
to opt for synthetically generated images rather than collecting by hand a 2.5D
large scale database. While being clearly a proxy for real data, synthetic
images allow one to trade quality for quantity, making it possible to generate a
virtually infinite amount of data. We show that a network trained on such a
data collection, using the very same architecture typically used on visual
data, learns very different filters, resulting in depth features (a) able to
better characterize the different facets of depth images, and (b) complementary
with respect to those derived from CNNs pre-trained on 2D datasets. Experiments
on two publicly available databases show the power of our approach.
Multi-task Self-Supervised Visual Learning
We investigate methods for combining multiple self-supervised tasks--i.e.,
supervised tasks where data can be collected without manual labeling--in order
to train a single visual representation. First, we provide an apples-to-apples
comparison of four different self-supervised tasks using the very deep
ResNet-101 architecture. We then combine tasks to jointly train a network. We
also explore lasso regularization to encourage the network to factorize the
information in its representation, and methods for "harmonizing" network inputs
in order to learn a more unified representation. We evaluate all methods on
ImageNet classification, PASCAL VOC detection, and NYU depth prediction. Our
results show that deeper networks work better, and that combining tasks--even
via a naive multi-head architecture--always improves performance. Our best
joint network nearly matches the PASCAL performance of a model pre-trained on
ImageNet classification, and matches the ImageNet network on NYU depth
prediction.
Comment: Published at ICCV 201
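The abstract above mentions lasso regularization on a multi-head network but does not say how it is applied. As a rough illustration only, the sketch below assumes each task head holds a small weight matrix over shared features and applies a plain L1 penalty with a soft-thresholding (proximal) step. The names `l1_penalty`, `soft_threshold`, and `proximal_step` are hypothetical and are not taken from the paper.

```python
def l1_penalty(weights, strength):
    """Lasso term: strength times the sum of absolute weight values."""
    return strength * sum(abs(w) for row in weights for w in row)


def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def proximal_step(weights, lr, strength):
    """Apply the L1 proximal update to every weight (assumed per-head matrix)."""
    t = lr * strength
    return [[soft_threshold(w, t) for w in row] for row in weights]


if __name__ == "__main__":
    head = [[1.0, -2.0], [0.05, 0.0]]  # made-up head weights over shared features
    print(l1_penalty(head, 0.1))
    print(proximal_step(head, lr=1.0, strength=0.1))
```

Small weights are driven to exactly zero, which is the sense in which an L1 term can encourage each head to rely on a subset of the shared features.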
A survey of exemplar-based texture synthesis
Exemplar-based texture synthesis is the process of generating, from an input
sample, new texture images of arbitrary size that are perceptually
equivalent to the sample. The two main approaches are statistics-based methods
and patch re-arrangement methods. In the first class, a texture is
characterized by a statistical signature; then, a random sampling conditioned
on this signature produces genuinely different texture images. The second class
boils down to a clever "copy-paste" procedure, which stitches together large
regions of the sample. Hybrid methods try to combine ideas from both approaches
to avoid their hurdles. The recent approaches using convolutional neural
networks fit this classification, some being statistical and others
performing patch re-arrangement in the feature space. They produce impressive
synthesis on various kinds of textures. Nevertheless, we found that most real
textures are organized at multiple scales, with global structures revealed at
coarse scales and highly varying details at finer ones. Thus, when confronted
with large natural images of textures the results of state-of-the-art methods
degrade rapidly, and the problem of modeling them remains wide open.
Comment: v2: Added comments and typo fixes. New section added to describe
FRAME. New method presented: CNNMR
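The "copy-paste" idea behind patch re-arrangement methods can be illustrated with a toy one-dimensional sketch. It is not any of the surveyed algorithms: it assumes the "texture" is a list of numbers, copies fixed-length patches from the sample, and picks each new patch so that its first `overlap` values best match the end of the output so far. The function name `synthesize` and all parameters are hypothetical.

```python
import random


def synthesize(sample, out_len, patch=4, overlap=1, seed=0):
    """Grow a sequence by stitching patches copied from `sample`.

    Assumes 1 <= overlap < patch <= len(sample).
    """
    rng = random.Random(seed)
    starts = list(range(len(sample) - patch + 1))
    first = rng.choice(starts)
    out = list(sample[first:first + patch])
    while len(out) < out_len:
        tail = out[-overlap:]

        def cost(s):
            # Squared mismatch between the output tail and the patch start.
            return sum((a - b) ** 2 for a, b in zip(tail, sample[s:s + overlap]))

        best = min(cost(s) for s in starts)
        candidates = [s for s in starts if cost(s) == best]
        s = rng.choice(candidates)  # random tie-break keeps outputs varied
        out.extend(sample[s + overlap:s + patch])  # paste the non-overlapping part
    return out[:out_len]


if __name__ == "__main__":
    sample = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1]
    print(synthesize(sample, 30))
```

Because every stitched piece is copied verbatim, local structure is preserved exactly, while anything larger than a patch is only loosely controlled. This is in line with the abstract's remark that results degrade on textures organized at multiple scales.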