Geometric Image Synthesis
The task of generating natural images from 3D scenes has been a long-standing
goal in computer graphics. Meanwhile, recent developments in deep neural
networks allow for trainable models that can produce natural-looking images
with little or no knowledge of the scene structure. While the generated images
often contain realistic-looking local patterns, their overall structure is
frequently inconsistent. In this work we propose a trainable, geometry-aware
image generation method that leverages various types of scene information,
including geometry and segmentation, to create realistic-looking natural images
that match the desired scene structure. Our geometrically consistent image
synthesis method is a deep neural network, called the Geometry to Image
Synthesis (GIS) framework, which retains the advantages of a trainable method,
e.g., differentiability and adaptiveness, but at the same time takes a step
toward the generalizability, control, and output quality of modern graphics
rendering engines. We utilize the GIS framework to insert vehicles into outdoor
driving scenes, as well as to generate novel views of objects from the Linemod
dataset. We qualitatively show that our network generalizes beyond the training
set to novel scene geometries, object shapes, and segmentations. Furthermore,
we quantitatively show that the GIS framework can be used to synthesize large
amounts of training data, which proves beneficial for training instance
segmentation models.
Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects
Using synthetic data for training deep neural networks for robotic
manipulation holds the promise of an almost unlimited amount of pre-labeled
training data, generated safely out of harm's way. One of the key challenges of
synthetic data, to date, has been to bridge the so-called reality gap, so that
networks trained on synthetic data operate correctly when exposed to real-world
data. We explore the reality gap in the context of 6-DoF pose estimation of
known objects from a single RGB image. We show that for this problem the
reality gap can be successfully spanned by a simple combination of domain
randomized and photorealistic data. Using synthetic data generated in this
manner, we introduce a one-shot deep neural network that is able to perform
competitively against a state-of-the-art network trained on a combination of
real and synthetic data. To our knowledge, this is the first deep network
trained only on synthetic data that is able to achieve state-of-the-art
performance on 6-DoF object pose estimation. Our network also generalizes
better to novel environments including extreme lighting conditions, for which
we show qualitative results. Using this network we demonstrate a real-time
system estimating object poses with sufficient accuracy for real-world semantic
grasping of known household objects in clutter by a real robot.
Comment: Conference on Robot Learning (CoRL) 2018
The ISTI Rapid Response on Exploring Cloud Computing 2018
This report describes eighteen projects that explored how commercial cloud
computing services can be utilized for scientific computation at national
laboratories. These demonstrations ranged from deploying proprietary software
in a cloud environment to leveraging established cloud-based analytics
workflows for processing scientific datasets. By and large, the projects were
successful and collectively they suggest that cloud computing can be a valuable
computational resource for scientific computation at national laboratories.
A Deformable Interface for Human Touch Recognition using Stretchable Carbon Nanotube Dielectric Elastomer Sensors and Deep Neural Networks
User interfaces provide an interactive window between physical and virtual
environments. A new concept in the field of human-computer interaction is the
soft user interface: a compliant surface that facilitates touch interaction
through deformation. Despite the potential of these interfaces, they currently
lack a signal processing framework that can efficiently extract information
from their deformation. Here we present OrbTouch, a device that uses
statistical learning algorithms, based on convolutional neural networks, to map
deformations from human touch to categorical labels (i.e., gestures) and touch
location using stretchable capacitor signals as inputs. We demonstrate this
approach by using the device to control the popular game Tetris. OrbTouch
provides a modular, robust framework to interpret deformation in soft media,
laying a foundation for new modes of human-computer interaction through
shape-changing solids.
XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
We propose two efficient approximations to standard convolutional neural
networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks,
the filters are approximated with binary values resulting in 32x memory saving.
In XNOR-Networks, both the filters and the input to convolutional layers are
binary. XNOR-Networks approximate convolutions using primarily binary
operations. This results in 58x faster convolutional operations and 32x memory
savings. XNOR-Nets offer the possibility of running state-of-the-art networks
on CPUs (rather than GPUs) in real-time. Our binary networks are simple,
accurate, efficient, and work on challenging visual tasks. We evaluate our
approach on the ImageNet classification task. The classification accuracy with
a Binary-Weight-Network version of AlexNet is only 2.9% less than the
full-precision AlexNet (in top-1 measure). We compare our method with recent
network binarization methods, BinaryConnect and BinaryNets, and outperform
these methods by large margins on ImageNet, by more than 16% in top-1 accuracy.
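The binary-weight approximation can be sketched in a few lines: each real-valued filter W is replaced by alpha * sign(W), where the scaling factor alpha = mean(|W|) minimizes the L2 reconstruction error. This is the Binary-Weight-Network idea; the toy filter size below is purely illustrative:

```python
import numpy as np

def binarize_weights(W):
    """Approximate a real-valued filter W by alpha * sign(W).
    For fixed sign pattern, alpha = mean(|W|) minimizes ||W - alpha*B||_2."""
    alpha = np.mean(np.abs(W))
    B = np.sign(W)
    B[B == 0] = 1.0  # map exact zeros to +1 so B is strictly binary
    return alpha, B

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3))          # toy 3x3 filter
alpha, B = binarize_weights(W)

# A binary convolution then reduces to additions/subtractions scaled by alpha:
x = rng.standard_normal((3, 3))          # toy input patch
approx = alpha * np.sum(B * x)           # binary "convolution" at one location
exact = np.sum(W * x)                    # full-precision reference
```

Because B holds only +1/-1, each 32-bit weight collapses to a single bit plus one shared scale per filter, which is the source of the 32x memory saving.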
GAN-based Virtual Re-Staining: A Promising Solution for Whole Slide Image Analysis
Histopathological cancer diagnosis is based on visual examination of stained
tissue slides. Hematoxylin and eosin (H\&E) is a standard stain routinely
employed worldwide. It is easy to acquire and cost-effective, but cells and
tissue components show low contrast with varying tones of dark blue and pink,
which makes visual assessment, digital image analysis, and quantification
difficult. These limitations can be overcome by immunohistochemistry (IHC)
staining of target proteins on the tissue slide. IHC provides selective,
high-contrast imaging of cells and tissue components, but its use is largely
limited by significantly more complex laboratory processing and high cost. We
propose a conditional CycleGAN (cCGAN) network to transform H\&E-stained images
into IHC-stained images, facilitating virtual IHC staining on the same slide.
This data-driven method requires only a limited amount of labelled data but
generates pixel-level segmentation results. The proposed cCGAN model improves
the original network \cite{zhu_unpaired_2017} by adding category conditions and
introducing two structural loss functions, which realize a multi-subdomain
translation and improve translation accuracy as well. Experiments demonstrate
that the proposed model outperforms the original method in unpaired image
translation with multiple subdomains. We also explore the potential of applying
the unpaired image-to-image translation method to other histology image tasks
with different staining techniques.
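The cycle-consistency idea underlying CycleGAN-style models can be sketched numerically: translating an image to the other stain domain and back should reproduce the input. The toy "generators" below are invented stand-ins, not the paper's networks:

```python
import numpy as np

def cycle_consistency_loss(x, G, F):
    """L1 cycle loss ||F(G(x)) - x||_1: mapping to the other stain domain
    and back should recover the input image."""
    return np.mean(np.abs(F(G(x)) - x))

# Invented stand-in generators: exact inverses close the cycle perfectly.
G = lambda x: 2.0 * x + 1.0      # pretend H&E -> IHC translator
F = lambda y: (y - 1.0) / 2.0    # pretend IHC -> H&E translator

x = np.linspace(0.0, 1.0, 16)    # pretend pixel intensities
perfect = cycle_consistency_loss(x, G, F)           # ~0: cycle is closed
broken = cycle_consistency_loss(x, G, lambda y: y)  # large: cycle not closed
```

In training, this loss is added to the adversarial terms so the unpaired translators stay mutually consistent; the cCGAN additionally conditions both mappings on category labels.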
Efficient and Scalable View Generation from a Single Image using Fully Convolutional Networks
Single-image-based view generation (SIVG) is important for producing 3D
stereoscopic content. Here, handling different spatial resolutions as input and
optimizing both reconstruction accuracy and processing speed is desirable.
The latest approaches are based on convolutional neural networks (CNNs), and
they generate promising results. However, their use of fully connected layers
as well as pre-trained VGG forces a compromise between reconstruction accuracy
and processing speed. In addition, this approach is limited to a specific
spatial resolution. To remedy these problems, we propose exploiting fully
convolutional networks (FCNs) for SIVG. We present two FCN architectures for
SIVG. The first, called DeepView-ren, is based on the combination of an FCN and
a view-rendering network. The second, denoted DeepView-dec, consists of
decoupled networks for luminance and chrominance signals. To train our
solutions we present a large dataset of 2M stereoscopic images. Results show
that both of our architectures improve accuracy and speed over the state of the
art. DeepView-ren generates accuracy competitive with the state of the art, but
with the fastest processing speed of all: 5x faster speed and 24x lower memory
consumption compared to the state of the art. DeepView-dec has much higher
accuracy, with 2.5x faster speed and 12x lower memory consumption. We evaluated
our approach with both objective and subjective studies.
Comment: 8 pages, 6 figures
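The second architecture's split into luminance and chrominance signals can be illustrated with a standard BT.601 RGB-to-YCbCr conversion, a common way to separate the two; the paper's exact color transform is not specified here, so this is only an assumed example:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> YCbCr: Y carries luminance,
    Cb/Cr carry chrominance (neutral gray maps to Cb = Cr = 128)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

white = np.full((2, 2, 3), 255.0)        # toy 2x2 white image
ycc = rgb_to_ycbcr(white)                # Y = 255, Cb = Cr = 128 everywhere
```

Processing Y with one network and Cb/Cr with another lets each branch match the statistics of its signal, which is the motivation for the decoupled design.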
Deformable Shape Completion with Graph Convolutional Autoencoders
The availability of affordable and portable depth sensors has made scanning
objects and people simpler than ever. However, dealing with occlusions and
missing parts is still a significant challenge. The problem of reconstructing a
(possibly non-rigidly moving) 3D object from a single or multiple partial scans
has received increasing attention in recent years. In this work, we propose a
novel learning-based method for the completion of partial shapes. Unlike the
majority of existing approaches, our method focuses on objects that can undergo
non-rigid deformations. The core of our method is a variational autoencoder
with graph convolutional operations that learns a latent space for complete
realistic shapes. At inference, we optimize to find the representation in this
latent space that best fits the generated shape to the known partial input. The
completed shape exhibits a realistic appearance on the unknown part. We show
promising results towards the completion of synthetic and real scans of human
body and face meshes exhibiting different styles of articulation and
partiality.
Comment: CVPR 2018
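The inference-time fitting step, optimizing a latent code so the decoded shape agrees with the known partial input, can be illustrated with a toy linear decoder. The real model decodes through a graph-convolutional variational autoencoder; the matrix, latent size, and vertex count below are invented for illustration:

```python
import numpy as np

# Toy linear "decoder" standing in for the learned graph-convolutional decoder.
A = np.array([[1., 0.], [0., 1.], [1., 1.], [1., -1.],
              [2., 1.], [0., 2.], [3., -1.], [1., 2.]])
z_true = np.array([1.0, -0.5])
full_shape = A @ z_true                  # ground-truth "complete shape"
mask = np.zeros(8, dtype=bool)
mask[:5] = True                          # only 5 of 8 vertices were scanned

z = np.zeros(2)                          # latent code, optimized at inference
for _ in range(200):                     # gradient descent on masked L2 error
    residual = A[mask] @ z - full_shape[mask]
    z -= 0.1 * (A[mask].T @ residual)

completed = A @ z                        # decoder fills in the unseen vertices
```

Because the decoder only produces complete, realistic shapes, fitting the latent code to the observed vertices automatically hallucinates a plausible completion for the missing ones.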
Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters
As autonomous vehicles become an everyday reality, high-accuracy pedestrian
detection is of paramount practical importance. Pedestrian detection is a
highly researched topic with mature methods, but most datasets focus on common
scenes of people engaged in typical walking poses on sidewalks. But performance
is most crucial for dangerous scenarios, such as children playing in the street
or people using bicycles/skateboards in unexpected ways. Such "in-the-tail"
data is notoriously hard to observe, making both training and testing
difficult. To analyze this problem, we have collected a novel annotated dataset
of dangerous scenarios called the Precarious Pedestrian dataset. Even given a
dedicated collection effort, it is relatively small by contemporary standards
(around 1000 images). To allow for large-scale data-driven learning, we explore
the use of synthetic data generated by a game engine. A significant challenge
is selecting the right "priors" or parameters for synthesis: we would like
realistic data with poses and object configurations that mimic true Precarious
Pedestrians. Inspired by Generative Adversarial Networks (GANs), we generate a
massive amount of synthetic data and train a discriminative classifier to
select a realistic subset, which we deem the Adversarial Imposters. We
demonstrate that this simple pipeline allows one to synthesize realistic
training data by making use of rendering/animation engines within a GAN
framework. Interestingly, we also demonstrate that such data can be used to
rank algorithms, suggesting that Adversarial Imposters can also be used for
"in-the-tail" validation at test-time, a notoriously difficult challenge for
real-world deployment.
Comment: To appear in CVPR 2017
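The imposter-selection step, scoring a large pool of synthetic samples with a discriminative classifier and keeping only the most realistic subset, can be sketched as follows; the scoring function and data are hypothetical stand-ins, not the paper's trained discriminator:

```python
import numpy as np

def select_imposters(samples, realism_score, keep_frac=0.25):
    """Rank synthetic samples by a discriminator's realism score and
    keep the top fraction: the 'Adversarial Imposters' subset."""
    scores = np.array([realism_score(s) for s in samples])
    k = max(1, int(len(samples) * keep_frac))
    top = np.argsort(scores)[::-1][:k]       # indices of most realistic samples
    return [samples[i] for i in sorted(top)]

# Hypothetical discriminator: realism is closeness to a target pose value 0.5.
score = lambda s: -abs(s - 0.5)
synthetic = list(np.linspace(0.0, 1.0, 8))   # pretend pool of rendered samples
imposters = select_imposters(synthetic, score, keep_frac=0.25)
```

The same scored ranking can be reused at test time: if an algorithm degrades on the high-scoring imposters, that flags weakness on "in-the-tail" scenarios before real-world deployment.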
Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization
We present a system for training deep neural networks for object detection
using synthetic images. To handle the variability in real-world data, the
system relies upon the technique of domain randomization, in which the
parameters of the simulator (such as lighting, pose, object textures, etc.)
are randomized in non-realistic ways to force the neural network to
learn the essential features of the object of interest. We explore the
importance of these parameters, showing that it is possible to produce a
network with compelling performance using only non-artistically-generated
synthetic data. With additional fine-tuning on real data, the network yields
better performance than using real data alone. This result opens up the
possibility of using inexpensive synthetic data for training neural networks
while avoiding the need to collect large amounts of hand-annotated real-world
data or to generate high-fidelity synthetic worlds, both of which remain
bottlenecks for many applications. The approach is evaluated on bounding box
detection of cars on the KITTI dataset.
Comment: CVPR 2018 Workshop on Autonomous Driving
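Domain randomization amounts to sampling simulator parameters from deliberately wide, non-realistic ranges so the network cannot overfit to any one rendering style. A minimal sketch, with parameter names and ranges invented for illustration rather than taken from the paper's simulator:

```python
import random

def randomize_scene(rng):
    """Sample one non-realistic scene configuration. The keys and ranges
    here are illustrative assumptions, not the paper's actual parameters."""
    return {
        "light_intensity": rng.uniform(0.1, 10.0),   # deliberately extreme range
        "object_texture": rng.choice(["noise", "checker", "flat", "gradient"]),
        "object_pose_deg": [rng.uniform(0.0, 360.0) for _ in range(3)],
        "camera_distance": rng.uniform(0.5, 20.0),
    }

rng = random.Random(42)                  # seeded for reproducible data generation
scenes = [randomize_scene(rng) for _ in range(1000)]  # one config per image
```

Rendering one training image per sampled configuration yields a dataset whose only consistent signal is the object of interest, which is exactly what pushes the network to learn its essential features.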