The Devil is in the Decoder: Classification, Regression and GANs
Many machine vision applications, such as semantic segmentation and depth
prediction, require predictions for every pixel of the input image. Models for
such problems usually consist of encoders which decrease spatial resolution
while learning a high-dimensional representation, followed by decoders that
recover the original input resolution and produce low-dimensional
predictions. While encoders have been studied rigorously, relatively few
studies address the decoder side. This paper presents an extensive comparison
of a variety of decoders across pixel-wise tasks ranging from
classification and regression to synthesis. Our contributions are: (1) Decoders
matter: we observe significant variance in results between different types of
decoders on various problems. (2) We introduce new residual-like connections
for decoders. (3) We introduce a novel decoder: bilinear additive upsampling.
(4) We explore prediction artifacts.
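As a rough illustration of the bilinear additive upsampling idea, the sketch below (written in PyTorch; the framework and the channel-reduction factor of 4 are assumptions, not the paper's reference implementation) bilinearly enlarges a feature map and then sums consecutive groups of channels, yielding a parameter-free upsampling step.

import torch
import torch.nn.functional as F

def bilinear_additive_upsample(x, scale=2, channel_reduction=4):
    # Parameter-free upsampling: bilinearly enlarge the feature map,
    # then sum consecutive groups of channels to reduce their number.
    # x: (N, C, H, W) with C divisible by channel_reduction.
    n, c, h, w = x.shape
    assert c % channel_reduction == 0
    up = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
    up = up.view(n, c // channel_reduction, channel_reduction, h * scale, w * scale)
    return up.sum(dim=2)

# Example: a 64-channel map becomes a 16-channel map at twice the resolution.
feat = torch.randn(1, 64, 32, 32)
print(bilinear_additive_upsample(feat).shape)  # torch.Size([1, 16, 64, 64])

Because the operation has no learned parameters, its output can also be added to that of a learned convolution, which is one way to realize the residual-like decoder connections of contribution (2).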
Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network
With more and more household objects built on planned obsolescence and
consumed by a fast-growing population, hazardous waste recycling has become a
critical challenge. Given the large variability of household waste, current
recycling platforms mostly rely on human operators to analyze the scene,
typically composed of many object instances piled up in bulk. Assisting them by
robotizing the extraction of individual objects is a key challenge for speeding up this
tedious process. Whereas supervised deep learning has proven very effective for such
object-level scene understanding, e.g., generic object detection and
segmentation in everyday scenes, it requires large sets of per-pixel
labeled images, which are rarely available in many application contexts,
including industrial robotics. We thus propose a step towards a practical
interactive application for generating an object-oriented robotic grasp,
requiring as inputs only one depth map of the scene and one user click on the
next object to extract. More precisely, this paper addresses the intermediate
problem of object segmentation in top views of piles of bulk objects, given a
pixel location, namely a seed, provided interactively by a human operator. We
propose a twofold framework for generating edge-driven instance segments.
First, we repurpose a state-of-the-art fully convolutional object contour
detector for seed-based instance segmentation by introducing the notion of
edge-mask duality with a novel patch-free and contour-oriented loss function.
Second, we train one model using only synthetic scenes, instead of manually
labeled training data. Our experimental results show that considering edge-mask
duality for training an encoder-decoder network, as we suggest, outperforms a
state-of-the-art patch-based network in the present application context.
Comment: This is a pre-print of an article published in Human Friendly Robotics, 10th International Workshop (Springer Proceedings in Advanced Robotics, vol. 7, edited by Bruno Siciliano and Oussama Khatib). The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-89327-3_16
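To make the seed-based, edge-driven setting concrete, here is an illustrative Python sketch (a generic edge-bounded region growing, not the contour-oriented loss or the exact inference procedure proposed in the paper): given a predicted contour-probability map and the operator's click, it grows an instance mask from the seed and stops at strong contours, which is the intuition behind edge-mask duality.

import numpy as np
from collections import deque

def seed_to_mask(contour_prob, seed, edge_thresh=0.5):
    # contour_prob: (H, W) array of predicted contour probabilities.
    # seed: (row, col) pixel clicked by the operator.
    # Grows a 4-connected region from the seed, blocked by strong contours.
    h, w = contour_prob.shape
    blocked = contour_prob >= edge_thresh
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x] or blocked[y, x]:
            continue
        mask[y, x] = True
        queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return mask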
Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion
The predictive performance of supervised learning algorithms depends on the
quality of labels. In a typical label collection process, multiple annotators
provide subjective noisy estimates of the "truth" under the influence of their
varying skill-levels and biases. Blindly treating these noisy labels as the
ground truth limits the accuracy of learning algorithms in the presence of
strong disagreement. This problem is critical for applications in domains such
as medical imaging where both the annotation cost and inter-observer
variability are high. In this work, we present a method for simultaneously
learning the individual annotator model and the underlying true label
distribution, using only noisy observations. Each annotator is modeled by a
confusion matrix that is jointly estimated along with the classifier
predictions. We propose to add a regularization term to the loss function that
encourages convergence to the true annotator confusion matrix. We provide a
theoretical argument for why the regularization is essential to our approach
in both the single-annotator and multiple-annotator cases. Despite the
simplicity of the idea, experiments on image classification tasks with both
simulated and real labels show that our method either outperforms or performs
on par with the state-of-the-art methods and is capable of estimating the
skills of annotators even with a single label available per image.
Comment: CVPR 2019; code snippets included.
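A minimal sketch of the confusion-matrix idea follows (PyTorch; the class names, the softmax row parameterization, and the trace weight are assumptions, not the paper's released code): each annotator's confusion matrix is estimated jointly with the classifier, the likelihood of an observed label is obtained by multiplying the classifier's class probabilities with that matrix, and the trace of the matrices is added to the noisy-label cross-entropy as the regularizer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AnnotatorConfusion(nn.Module):
    # Per-annotator confusion matrices estimated jointly with the classifier.
    def __init__(self, num_annotators, num_classes):
        super().__init__()
        # Unconstrained parameters; softmax over the last axis makes each row
        # a distribution p(observed label | true class), initialized near identity.
        self.logits = nn.Parameter(torch.eye(num_classes).repeat(num_annotators, 1, 1) * 2.0)

    def matrices(self):
        return F.softmax(self.logits, dim=-1)  # (R, K, K)

    def forward(self, class_probs):
        # class_probs: (B, K) classifier output p(y|x); returns (B, R, K),
        # the predicted distribution over each annotator's observed label.
        return torch.einsum("bk,rkj->brj", class_probs, self.matrices())

def regularized_loss(annotator_probs, noisy_labels, confusion, trace_weight=0.01):
    # Cross-entropy on the noisy labels plus the trace regularizer that
    # encourages recovery of the true annotator confusion matrices.
    # annotator_probs: (B, R, K); noisy_labels: (B, R) integer labels.
    nll = F.nll_loss(torch.log(annotator_probs + 1e-8).permute(0, 2, 1), noisy_labels)
    trace = confusion.matrices().diagonal(dim1=-2, dim2=-1).sum()
    return nll + trace_weight * trace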
Learning Shape Priors for Single-View 3D Completion and Reconstruction
The problem of single-view 3D shape completion or reconstruction is
challenging, because among the many possible shapes that explain an
observation, most are implausible and do not correspond to natural objects.
Recent research in the field has tackled this problem by exploiting the
expressiveness of deep convolutional networks. In fact, there is another level
of ambiguity that is often overlooked: among plausible shapes, there are still
multiple shapes that fit the 2D image equally well; i.e., the ground truth
shape is non-deterministic given a single-view input. Existing fully supervised
approaches fail to address this issue, and often produce blurry mean shapes
with smooth surfaces but no fine details.
In this paper, we propose ShapeHD, pushing the limit of single-view shape
completion and reconstruction by integrating deep generative models with
adversarially learned shape priors. The learned priors serve as a regularizer,
penalizing the model only if its output is unrealistic, not if it deviates from
the ground truth. Our design thus overcomes both of the aforementioned levels of
ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the
art by a large margin in both shape completion and shape reconstruction on
multiple real datasets.
Comment: ECCV 2018. The first two authors contributed equally to this work.
Project page: http://shapehd.csail.mit.edu
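As a loose illustration of using an adversarially learned prior as a regularizer (a sketch under stated assumptions; ShapeHD's exact objective, discriminator, and weighting may differ), the loss below combines a supervised reconstruction term with a realism penalty from a frozen, pretrained shape discriminator, so the model is penalized when its output is unrealistic rather than whenever it deviates from the ground truth.

import torch
import torch.nn.functional as F

def completion_loss(pred_voxels, target_voxels, discriminator, lam=0.1):
    # pred_voxels / target_voxels: occupancy probabilities in [0, 1].
    # discriminator: a frozen, adversarially pretrained network mapping a voxel
    # grid to a realism score in (0, 1); lam is an assumed weighting.
    recon = F.binary_cross_entropy(pred_voxels, target_voxels)
    realism = discriminator(pred_voxels)               # gradients reach pred_voxels
    prior_penalty = -torch.log(realism + 1e-8).mean()  # high when shapes look unrealistic
    return recon + lam * prior_penalty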
ICT for Sustainability — Current and future research directions
This workshop brings together researchers from the entire iSchools community to propose, share, and discuss their current research and future research agendas, and to foster collaborations on ICT for Sustainability. ICT plays a major role in sustainability. It threatens sustainability, as ICT devices cause carbon emissions and produce e-waste, but it can also be an enabler of sustainability, in the form of systems that support the protection of natural resources, and of social sustainability, in the form of systems that foster communities and participation. These supporting systems come from many intellectual traditions within and beyond the information field and design. The iSchools community provides an excellent place to discuss this crucial topic at the intersection of information, society, and technology.
This workshop will bring together scholars from across the information field studying ICT for sustainability, to foster new interdisciplinary and multidisciplinary collaborations.
TF-Slim: A Lightweight Library for Defining, Training and Evaluating Complex Models in TensorFlow
Presented at CS 7643 Deep Learning on September 7, 2017, from 4:30 p.m. to 5:45 p.m. in the Clough Undergraduate Learning Commons (CULC), Room 144, Georgia Tech.
Nathan Silberman is the Lead Deep Learning Scientist
at 4Catalyzer where he works on a variety of healthcare related
projects. His machine learning interests include semantic
segmentation, detection, and reinforcement learning, and how best to
apply them to high-impact problems in the medical world. Prior to
joining 4Catalyzer, Nathan was a researcher at Google where, among
various projects, he co-wrote TensorFlow-Slim, which is now a
major component of the TensorFlow library. Nathan received his
Ph.D. in 2015 from New York University under Rob Fergus and
David Sontag.
CS 7643 Deep Learning. Runtime: 68:05 minutes.
TF-Slim is a TensorFlow-based library with various components. These include modules for easily defining neural network models in a few lines of code, routines for training and evaluating such models in a highly distributed fashion, and utilities for creating efficient data-loading pipelines.
Additionally, the TF-Slim Image Models library provides many commonly used networks (ResNet,
Inception, VGG, etc.) that make replicating results and creating new networks from existing components
simple and straightforward. I will discuss some of the design choices and constraints that guided our
development process as well as several high-impact projects in the medical domain that utilize most or
all components of the TF-Slim library.
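For context, defining a small model with TF-Slim looks roughly like the following (TensorFlow 1.x, where TF-Slim lives in tf.contrib.slim; the layer sizes are illustrative and not taken from the talk):

import tensorflow as tf
import tensorflow.contrib.slim as slim

def simple_cnn(images, num_classes=10):
    # A few layers defined with slim's compact layer functions; arg_scope sets
    # shared defaults (activation, weight regularization) for all listed ops.
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_regularizer=slim.l2_regularizer(1e-4)):
        net = slim.conv2d(images, 32, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.conv2d(net, 64, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        net = slim.flatten(net)
        logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')
    return logits

images = tf.placeholder(tf.float32, [None, 32, 32, 3])
logits = simple_cnn(images)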