19 research outputs found
Semantic Pose using Deep Networks Trained on Synthetic RGB-D
In this work we address the problem of indoor scene understanding from RGB-D
images. Specifically, we propose to find instances of common furniture classes,
their spatial extent, and their pose with respect to generalized class models.
To accomplish this, we use a deep, wide, multi-output convolutional neural
network (CNN) that predicts class, pose, and location of possible objects
simultaneously. To overcome the lack of large annotated RGB-D training sets
(especially those with pose), we use an on-the-fly rendering pipeline that
generates realistic cluttered room scenes in parallel to training. We then
perform transfer learning on the relatively small amount of publicly available
annotated RGB-D data, and find that our model is able to successfully annotate
even highly challenging real scenes. Importantly, our trained network is able
to understand noisy and sparse observations of highly cluttered scenes with a
remarkable degree of accuracy, inferring class and pose from a very limited set
of cues. Additionally, our neural network is only moderately deep and computes
class, pose and position in tandem, so the overall run-time is significantly
faster than existing methods, estimating all output parameters simultaneously
in parallel on a GPU in seconds.
Comment: ICCV 2015 submission
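The multi-output design described above, a shared feature extractor feeding separate class, pose, and location heads so that all outputs are estimated in tandem, can be sketched minimally as follows. The layer sizes, head names, and the dense stand-in for the convolutional trunk are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

# Shared trunk: stands in for the convolutional feature extractor.
feat_dim, n_classes, n_pose_bins = 64, 10, 8
w_trunk = rng.normal(size=(128, feat_dim)) * 0.01
b_trunk = np.zeros(feat_dim)

# Three heads computed from the same features "in tandem".
w_cls  = rng.normal(size=(feat_dim, n_classes)) * 0.01
b_cls  = np.zeros(n_classes)
w_pose = rng.normal(size=(feat_dim, n_pose_bins)) * 0.01
b_pose = np.zeros(n_pose_bins)
w_loc  = rng.normal(size=(feat_dim, 3)) * 0.01   # (x, y, z) offset
b_loc  = np.zeros(3)

def forward(x):
    h = np.maximum(dense(x, w_trunk, b_trunk), 0.0)  # ReLU trunk features
    return {
        "class": dense(h, w_cls, b_cls),    # logits over furniture classes
        "pose":  dense(h, w_pose, b_pose),  # logits over discretized poses
        "loc":   dense(h, w_loc, b_loc),    # regressed 3D location
    }

out = forward(rng.normal(size=(2, 128)))  # batch of 2 input feature vectors
```

Because the heads share one forward pass through the trunk, all three predictions come at essentially the cost of one, which is the source of the run-time advantage the abstract claims.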
A deep representation for depth images from synthetic data
Convolutional Neural Networks (CNNs) trained on large scale RGB databases
have become the secret sauce in the majority of recent approaches for object
categorization from RGB-D data. Thanks to colorization techniques, these
methods exploit the filters learned from 2D images to extract meaningful
representations in 2.5D. Still, the perceptual signature of these two kinds of
images is very different, with the first usually strongly characterized by
textures, and the second mostly by silhouettes of objects. Ideally, one would
like to have two CNNs, one for RGB and one for depth, each trained on a
suitable data collection, able to capture the perceptual properties of each
channel for the task at hand. This has not been possible so far, due to the
lack of a suitable depth database. This paper addresses this issue, proposing
to opt for synthetically generated images rather than collecting by hand a 2.5D
large scale database. While clearly a proxy for real data, synthetic images
allow one to trade quality for quantity, making it possible to generate a
virtually infinite amount of data. We show that the very same architecture
typically used on visual data, when trained on such a collection, learns very
different filters, resulting in depth features (a) able to
better characterize the different facets of depth images, and (b) complementary
with respect to those derived from CNNs pre-trained on 2D datasets. Experiments
on two publicly available databases show the power of our approach.
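The colorization step mentioned above maps a single-channel depth image into three channels so that filters learned on RGB data can be reused. A minimal stand-in is sketched below; the particular channel choice (normalized depth plus its two spatial gradients, which emphasize the object silhouettes the abstract highlights) is an illustrative assumption, and published pipelines use a variety of encodings.

```python
import numpy as np

def colorize_depth(depth):
    """Map a single-channel depth image (H x W) to a three-channel
    image (H x W x 3) so RGB-pretrained filters can be applied.
    Channels (illustrative choice, not a specific published scheme):
    normalized depth plus the two spatial gradients."""
    d = depth.astype(np.float64)
    d = (d - d.min()) / (np.ptp(d) + 1e-8)   # normalize depth to [0, 1)
    gy, gx = np.gradient(d)                  # silhouette-like edge responses
    return np.stack([d, gx, gy], axis=-1)

depth = np.random.default_rng(1).uniform(0.5, 4.0, size=(8, 8))
rgbish = colorize_depth(depth)               # shape (8, 8, 3)
```

The resulting three-channel array has the same layout a pretrained RGB network expects, which is what makes the transfer of 2D filters to 2.5D data mechanically possible.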
Playing for Data: Ground Truth from Computer Games
Recent progress in computer vision has been driven by high-capacity models
trained on large datasets. Unfortunately, creating large datasets with
pixel-level labels has been extremely costly due to the amount of human effort
required. In this paper, we present an approach to rapidly creating
pixel-accurate semantic label maps for images extracted from modern computer
games. Although the source code and the internal operation of commercial games
are inaccessible, we show that associations between image patches can be
reconstructed from the communication between the game and the graphics
hardware. This enables rapid propagation of semantic labels within and across
images synthesized by the game, with no access to the source code or the
content. We validate the presented approach by producing dense pixel-level
semantic annotations for 25 thousand images synthesized by a photorealistic
open-world computer game. Experiments on semantic segmentation datasets show
that using the acquired data to supplement real-world images significantly
increases accuracy and that the acquired data enables reducing the amount of
hand-labeled real-world data: models trained with game data and just 1/3 of the
CamVid training set outperform models trained on the complete CamVid training
set.
Comment: Accepted to the 14th European Conference on Computer Vision (ECCV 2016)
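The reported experiment, in which game-derived labels supplement just one third of the hand-labeled CamVid training set, amounts to a simple dataset-mixing step. The file names, counts, and sampling here are hypothetical placeholders, not the paper's actual data layout.

```python
import random

# Hypothetical record lists: paths to (image, label) pairs.
game_data = [f"gta/{i:05d}.png" for i in range(250)]     # synthetic, cheap labels
camvid    = [f"camvid/{i:04d}.png" for i in range(300)]  # real, hand-labeled

# Keep only a third of the hand-labeled set, as in the reported setup.
random.seed(0)
camvid_third = random.sample(camvid, k=len(camvid) // 3)

# Train on the union of cheap synthetic labels and the reduced real set.
train_set = game_data + camvid_third
random.shuffle(train_set)
```

The claim being tested is that a model trained on `train_set` can match or beat one trained on all of `camvid`, i.e. that synthetic labels substitute for most of the hand-labeling effort.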
Real-Time Identification of Artifacts: Synthetic Data for AI Model
The collections represent the constitutive element and the raison d'être of each museum. Their management, care and dissemination are therefore a task of primary importance for every museum. Applying new Artificial Intelligence technologies in this area could lead to new initiatives. However, the development of such tools requires structured and labeled datasets for the training phases, which are not always readily available. The proposed contribution lies in the construction of domain-specific datasets with low-budget tools, and it explores the results of a first step in this direction by testing algorithms for the recognition and labeling of heritage objects. The developed workflow is part of a first prototype that could be used both in heritage dissemination or gamification applications and in heritage research tools.
A short survey on modern virtual environments that utilize AI and synthetic data
Within a rather abstract computational framework, Artificial Intelligence (AI) may be defined as intelligence exhibited by machines. In computer science, though, the field of AI research defines itself as the study of “intelligent agents.” In this context, interaction with popular virtual environments, as for instance in virtual game playing, has recently gained a lot of attention, since it provides innovative aspects of AI perception that had not previously occurred to researchers. Such aspects are typically formed by the computationally intelligent behavior captured through interaction with the virtual environment, as well as by the study of graphic models and biologically inspired learning techniques such as evolutionary computation, neural networks, and reinforcement learning. In this short survey paper, we attempt to provide an overview of the most recent research on these novel, yet quite interesting, research domains, which have come into sight only over the last few years and which we feel form an attractive topic for fellow researchers. We initiate our study by presenting a brief overview of our motivation and continue with some basic information on recent virtual graphic model utilization and the state of the art in virtual environments, which constitute the two clearly identifiable components of the summarization attempted here. We then continue by briefly reviewing the interesting territory of video games and by identifying its useful types, thus envisioning possible further utilization scenarios for the collected information. A short discussion on the identified trends and a couple of future research directions conclude the paper.
Object Localization, Segmentation, and Classification in 3D Images
We address the problem of identifying objects of interest in 3D images as a set of related tasks: localizing objects within a scene, segmenting observed object instances from other scene elements, classifying detected objects into semantic categories, and estimating the 3D pose of detected objects within the scene. The increasing availability of 3D sensors motivates us to leverage large amounts of 3D data to train machine learning models for these tasks. Leveraging recent advances in deep learning has allowed us to develop models capable of addressing these tasks and of optimizing them jointly, reducing the errors that propagate when each task is solved independently.
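Joint optimization of this kind, where localization, segmentation, classification, and pose estimation share one objective instead of being solved in an error-propagating cascade, typically reduces to minimizing a weighted sum of per-task losses. The sketch below uses hypothetical loss names and weights, not the thesis's actual objective.

```python
def joint_loss(losses, weights=None):
    """Combine per-task losses into a single scalar objective so the
    tasks are optimized jointly rather than as an independent cascade.
    `losses` maps task name -> scalar loss; missing weights default to 1."""
    weights = weights or {k: 1.0 for k in losses}
    return sum(weights[k] * v for k, v in losses.items())

# Hypothetical per-task loss values for one training batch.
total = joint_loss({"loc": 0.8, "seg": 1.2, "cls": 0.4, "pose": 0.6})
```

Because the gradient of the combined objective flows into the shared parameters from every head at once, a mistake in one task (say, a bad localization) is penalized jointly rather than silently corrupting the downstream tasks.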