DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving
Today, there are two major paradigms for vision-based autonomous driving
systems: mediated perception approaches that parse an entire scene to make a
driving decision, and behavior reflex approaches that directly map an input
image to a driving action by a regressor. In this paper, we propose a third
paradigm: a direct perception approach to estimate the affordance for driving.
We propose to map an input image to a small number of key perception indicators
that directly relate to the affordance of a road/traffic state for driving. Our
representation provides a set of compact yet complete descriptions of the scene
to enable a simple controller to drive autonomously. Falling in between the two
extremes of mediated perception and behavior reflex, we argue that our direct
perception representation provides the right level of abstraction. To
demonstrate this, we train a deep Convolutional Neural Network using recordings
from 12 hours of human driving in a video game and show that our model can work
well to drive a car in a very diverse set of virtual environments. We also
train a model for car distance estimation on the KITTI dataset. Results show
that our direct perception approach can generalize well to real driving images.
Source code and data are available on our project website.
LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop
While there has been remarkable progress in the performance of visual
recognition algorithms, the state-of-the-art models tend to be exceptionally
data-hungry. Large labeled training datasets, expensive and tedious to produce,
are required to optimize millions of parameters in deep network models. Lagging
behind the growth in model capacity, the available datasets are quickly
becoming outdated in terms of size and density. To circumvent this bottleneck,
we propose to amplify human effort through a partially automated labeling
scheme, leveraging deep learning with humans in the loop. Starting from a large
set of candidate images for each category, we iteratively sample a subset, ask
people to label them, classify the others with a trained model, split the set
into positives, negatives, and unlabeled based on the classification
confidence, and then iterate with the unlabeled set. To assess the
effectiveness of this cascading procedure and enable further progress in visual
recognition research, we construct a new image dataset, LSUN. It contains
around one million labeled images for each of 10 scene categories and 20 object
categories. We experiment with training popular convolutional networks and find
that they achieve substantial performance gains when trained on this dataset.
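The iterative labeling loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`amplify_labels`, `train_threshold`), the sample size, the confidence thresholds `lo` and `hi`, and the toy one-dimensional "classifier" are all hypothetical and do not come from the paper.

```python
import math
import random


def train_threshold(labeled):
    """Toy stand-in for a trained classifier (an assumption, not the paper's model).

    Items are plain floats; positives are assumed to have larger values.
    Returns a function mapping an item to a confidence in [0, 1].
    """
    pos = [x for x, y in labeled if y]
    neg = [x for x, y in labeled if not y]
    if not pos or not neg:
        return lambda x: 0.5  # only one class seen so far: stay uncertain
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 / (1 + math.exp(-4 * (x - mid)))


def amplify_labels(items, ask_human, train, sample_size=20,
                   lo=0.05, hi=0.95, max_rounds=10, seed=0):
    """Return (positives, negatives, unlabeled) after the iterative procedure."""
    rng = random.Random(seed)
    positives, negatives = [], []
    unlabeled = list(items)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # 1) Sample a subset and ask people to label it.
        k = min(sample_size, len(unlabeled))
        picked = set(rng.sample(range(len(unlabeled)), k))
        labeled = [(unlabeled[i], ask_human(unlabeled[i])) for i in sorted(picked)]
        rest = [x for i, x in enumerate(unlabeled) if i not in picked]
        for x, y in labeled:
            (positives if y else negatives).append(x)
        # 2) Train a model on the human labels and score the remaining items.
        score = train(labeled)
        # 3) Split by classification confidence; uncertain items go to the next round.
        still_unlabeled = []
        for x in rest:
            p = score(x)
            if p >= hi:
                positives.append(x)
            elif p <= lo:
                negatives.append(x)
            else:
                still_unlabeled.append(x)
        unlabeled = still_unlabeled
    return positives, negatives, unlabeled
```

In this sketch only the sampled subset costs human effort in each round; confidently classified items are labeled by the model, and the loop continues on whatever remains uncertain.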
Discrete Object Generation with Reversible Inductive Construction
The success of generative modeling in continuous domains has led to a surge
of interest in generating discrete data such as molecules, source code, and
graphs. However, construction histories for these discrete objects are
typically not unique and so generative models must reason about intractably
large spaces in order to learn. Additionally, structured discrete domains are
often characterized by strict constraints on what constitutes a valid object
and generative models must respect these requirements in order to produce
useful novel samples. Here, we present a generative model for discrete objects
employing a Markov chain where transitions are restricted to a set of local
operations that preserve validity. Building off of generative interpretations
of denoising autoencoders, the Markov chain alternates between producing 1) a
sequence of corrupted objects that are valid but not from the data
distribution, and 2) a learned reconstruction distribution that attempts to fix
the corruptions while also preserving validity. This approach constrains the
generative model to only produce valid objects, requires the learner to only
discover local modifications to the objects, and avoids marginalization over an
unknown and potentially large space of construction histories. We evaluate the
proposed approach on two highly structured discrete domains, molecules and
Laman graphs, and find that it compares favorably to alternative methods at
capturing distributional statistics for a host of semantically relevant
metrics.
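The alternation between corruption and reconstruction over validity-preserving local edits can be illustrated on a toy domain. The sketch below is an assumption-laden illustration, not the paper's method: it uses balanced-parenthesis strings instead of molecules or Laman graphs, and the `score` function is a hypothetical stand-in for the learned reconstruction distribution.

```python
import math
import random


def is_valid(s):
    """A string is valid if its parentheses are balanced."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def local_moves(s):
    """All strings reachable by one validity-preserving local edit."""
    # Inserting an adjacent pair "()" anywhere keeps the string balanced.
    moves = [s[:i] + "()" + s[i:] for i in range(len(s) + 1)]
    # Deleting an adjacent pair "()" also keeps it balanced.
    moves += [s[:i] + s[i + 2:] for i in range(len(s) - 1) if s[i:i + 2] == "()"]
    return moves


def corrupt(s, rng, steps=2):
    """Apply random local edits: the result is valid but drifts off-distribution."""
    for _ in range(steps):
        s = rng.choice(local_moves(s))
    return s


def reconstruct(s, score, rng, steps=2):
    """Sample local edits (or stay put) with probability proportional to exp(score)."""
    for _ in range(steps):
        options = [s] + local_moves(s)
        weights = [math.exp(score(o)) for o in options]
        s = rng.choices(options, weights=weights, k=1)[0]
    return s


def run_chain(x, score, n_iters=50, seed=0):
    """Alternate corruption and reconstruction; every visited state stays valid."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_iters):
        x_tilde = corrupt(x, rng)
        x = reconstruct(x_tilde, score, rng)
        samples.append(x)
    return samples


# Hypothetical score: prefer strings of length 6 (a trained model would replace this).
samples = run_chain("()", score=lambda s: -abs(len(s) - 6))
```

Because every transition is drawn from `local_moves`, the chain never leaves the set of valid objects, and the reconstruction step only has to choose among local modifications rather than reason about full construction histories.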
Anatomy-specific classification of medical images using deep convolutional nets
Automated classification of human anatomy is an important prerequisite for
many computer-aided diagnosis systems. The spatial complexity and variability
of anatomy throughout the human body make classification difficult. "Deep
learning" methods such as convolutional networks (ConvNets) outperform other
state-of-the-art methods in image classification tasks. In this work, we
present a method for organ- or body-part-specific anatomical classification of
medical images acquired using computed tomography (CT) with ConvNets. We train
a ConvNet, using 4,298 separate axial 2D key-images to learn 5 anatomical
classes. Key-images were mined from a hospital PACS archive, using a set of
1,675 patients. We show that a data augmentation approach can help to enrich
the data set and improve classification performance. Using ConvNets and data
augmentation, we achieve an anatomy-specific classification error of 5.9% and
an average area-under-the-curve (AUC) value of 0.998 in testing. We
demonstrate that deep learning can be used to train very reliable and accurate
classifiers that could initialize further computer-aided diagnosis.
Comment: Presented at: 2015 IEEE International Symposium on Biomedical
Imaging, April 16-19, 2015, New York Marriott at Brooklyn Bridge, NY, US
Interleaved text/image Deep Mining on a large-scale radiology database
Despite tremendous progress in computer vision, effective learning on very large-scale (> 100K patients) medical image databases has been vastly hindered. We present an interleaved text/image deep learning system to extract and mine the semantic interactions of radiology images and reports from a national research hospital’s picture archiving and communication system. Instead of using full 3D medical volumes, we focus on a collection of representative ~216K 2D key images/slices (selected by clinicians for diagnostic reference) with text-driven scalar and vector labels. Our system interleaves between unsupervised learning (e.g., latent Dirichlet allocation, recurrent neural net language models) on document- and sentence-level texts to generate semantic labels and supervised learning via deep convolutional neural networks (CNNs) to map from images to label spaces. Disease-related key words can be predicted for radiology images in a retrieval manner. We have demonstrated promising quantitative and qualitative results. The large-scale datasets of extracted key images and their categorization, embedded vector labels and sentence descriptions can be harnessed to alleviate the deep learning “data-hungry” obstacle in the medical domain.