Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network
With more and more household objects built on planned obsolescence and
consumed by a fast-growing population, hazardous waste recycling has become a
critical challenge. Given the large variability of household waste, current
recycling platforms mostly rely on human operators to analyze the scene,
typically composed of many object instances piled up in bulk. Helping them by
robotizing the unitary extraction is a key challenge to speed up this tedious
process. Whereas supervised deep learning has proven very efficient for such
object-level scene understanding, e.g., generic object detection and
segmentation in everyday scenes, it requires large sets of per-pixel
labeled images, which are rarely available in many application contexts,
including industrial robotics. We thus propose a step towards a practical
interactive application for generating an object-oriented robotic grasp,
requiring as inputs only one depth map of the scene and one user click on the
next object to extract. More precisely, we address in this paper the intermediate
problem of object segmentation in top views of piles of bulk objects given a
pixel location, namely a seed, provided interactively by a human operator. We
propose a twofold framework for generating edge-driven instance segments.
First, we repurpose a state-of-the-art fully convolutional object contour
detector for seed-based instance segmentation by introducing the notion of
edge-mask duality with a novel patch-free and contour-oriented loss function.
Second, we train one model using only synthetic scenes, instead of manually
labeled training data. Our experimental results show that considering edge-mask
duality for training an encoder-decoder network, as we suggest, outperforms a
state-of-the-art patch-based network in the present application context.
Comment: This is a pre-print of an article published in Human Friendly
Robotics, 10th International Workshop, Springer Proceedings in Advanced
Robotics, vol 7. The final authenticated version is available online at:
https://doi.org/10.1007/978-3-319-89327-3_16, Springer Proceedings in
Advanced Robotics, Siciliano Bruno, Khatib Oussama, In press, Human Friendly
Robotics, 10th International Workshop
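The relation between a contour map and the mask of the clicked object can be illustrated with a small sketch. This is not the network or the loss function from the abstract above: it only assumes that a binary contour map is already available (here a hand-written toy grid) and grows a region from the seed pixel until it meets a contour. The function name `segment_from_seed` and the 4-connectivity choice are assumptions made for illustration.

```python
from collections import deque

def segment_from_seed(edges, seed):
    """Return the set of pixels reachable from `seed` without crossing a contour.

    edges: list of rows, 1 = contour pixel, 0 = non-contour pixel
           (hypothetical thresholded output of a contour detector).
    seed:  (row, col) of the user click.
    """
    rows, cols = len(edges), len(edges[0])
    r0, c0 = seed
    if edges[r0][c0]:
        return set()  # the click landed on a contour pixel; nothing to grow
    mask = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connectivity
            nr, nc = r + dr, c + dc
            inside_image = 0 <= nr < rows and 0 <= nc < cols
            if inside_image and not edges[nr][nc] and (nr, nc) not in mask:
                mask.add((nr, nc))
                queue.append((nr, nc))
    return mask

# Toy 5x5 contour map: a closed square contour around the centre pixel.
edges = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
inside = segment_from_seed(edges, (2, 2))   # click inside the contour
outside = segment_from_seed(edges, (0, 0))  # click outside the contour
```

A closed contour fully determines the mask it encloses, which is why a click inside and a click outside the square yield disjoint regions; an open contour would let the region leak.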
Depth Estimation via Affinity Learned with Convolutional Spatial Propagation Network
Depth estimation from a single image is a fundamental problem in computer
vision. In this paper, we propose a simple yet effective convolutional spatial
propagation network (CSPN) to learn the affinity matrix for depth prediction.
Specifically, we adopt an efficient linear propagation model, where the
propagation is performed as a recurrent convolutional operation,
and the affinity among neighboring pixels is learned through a deep
convolutional neural network (CNN). We apply the designed CSPN to two depth
estimation tasks given a single image: (1) to refine the depth output from
existing state-of-the-art (SOTA) methods; and (2) to convert sparse depth
samples to a dense depth map by embedding the depth samples within the
propagation procedure. The second task is inspired by the availability of
LIDARs that provide sparse but accurate depth measurements. We evaluated
the proposed CSPN on two popular benchmarks for depth estimation, i.e. NYU v2
and KITTI, where we show that our proposed approach improves on prior SOTA
methods not only in quality (e.g., 30% more reduction in depth error) but also
in speed (e.g., 2 to 5 times faster).
Comment: 14 pages, 8 figures, ECCV 201
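The idea of linear propagation with pixel affinities, including the re-imposition of sparse depth samples, can be sketched on a single row of pixels. This is a simplified illustration, not the CSPN formulation: the affinities are hand-set rather than predicted by a CNN, the neighbourhood is one-dimensional, and the self-weight is fixed to 1. The name `propagate` and all numeric values are assumptions.

```python
def propagate(depth, affinity, sparse=None, steps=10):
    """Refine a 1-D row of depths by repeated affinity-weighted averaging.

    depth:    initial depth estimates (list of floats).
    affinity: affinity[i] is the weight between pixel i and pixel i + 1
              (hand-set here; the abstract learns it with a CNN).
    sparse:   optional {index: measured depth}, re-imposed after every step.
    """
    sparse = sparse or {}
    d = list(depth)
    for i, v in sparse.items():
        d[i] = v
    n = len(d)
    for _ in range(steps):
        new = []
        for i in range(n):
            w_left = affinity[i - 1] if i > 0 else 0.0
            w_right = affinity[i] if i < n - 1 else 0.0
            left = d[i - 1] if i > 0 else 0.0
            right = d[i + 1] if i < n - 1 else 0.0
            total = 1.0 + w_left + w_right  # self-weight fixed to 1
            new.append((d[i] + w_left * left + w_right * right) / total)
        d = new
        for i, v in sparse.items():  # keep measured samples unchanged
            d[i] = v
    return d

# Two surfaces separated by a depth edge between pixels 2 and 3,
# so the affinity across that boundary is set to zero.
depth = [2.0, 2.4, 1.8, 5.0, 5.3]
affinity = [1.0, 1.0, 0.0, 1.0]
refined = propagate(depth, affinity, steps=200)
anchored = propagate(depth, affinity, sparse={0: 2.0, 4: 5.2}, steps=200)
```

With zero affinity across the depth edge, each surface is smoothed independently; with sparse samples supplied, each surface is pulled towards its measured value while the measured pixels themselves stay fixed.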
Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion
The predictive performance of supervised learning algorithms depends on the
quality of labels. In a typical label collection process, multiple annotators
provide subjective noisy estimates of the "truth" under the influence of their
varying skill-levels and biases. Blindly treating these noisy labels as the
ground truth limits the accuracy of learning algorithms in the presence of
strong disagreement. This problem is critical for applications in domains such
as medical imaging where both the annotation cost and inter-observer
variability are high. In this work, we present a method for simultaneously
learning the individual annotator model and the underlying true label
distribution, using only noisy observations. Each annotator is modeled by a
confusion matrix that is jointly estimated along with the classifier
predictions. We propose to add a regularization term to the loss function that
encourages convergence to the true annotator confusion matrix. We provide a
theoretical argument as to why the regularization is essential to our approach
in both the single-annotator and multiple-annotator cases. Despite the
simplicity of the idea, experiments on image classification tasks with both
simulated and real labels show that our method either outperforms or performs
on par with the state-of-the-art methods and is capable of estimating the
skills of annotators even with a single label available per image.
Comment: CVPR 2019, code snippets include
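How a per-annotator confusion matrix enters the loss can be sketched for a single sample and a single annotator. The abstract does not state the form of the regularization term; the sketch below assumes a penalty on the trace of the confusion matrix, and the function name `noisy_label_loss`, the weight `lam`, and all numbers are hypothetical.

```python
import math

def noisy_label_loss(p_true, confusion, noisy_label, lam=0.01):
    """Cross-entropy on the annotator's label plus a trace penalty (assumed form).

    p_true:      classifier's estimate of the true label distribution.
    confusion:   confusion[i][j] = P(annotator says j | true label is i);
                 each row sums to 1.
    noisy_label: index of the label the annotator actually gave.
    lam:         weight of the regularization term (hypothetical value).
    """
    n = len(p_true)
    # Predicted distribution over the annotator's label:
    # p_noisy[j] = sum_i p_true[i] * confusion[i][j]
    p_noisy = [sum(p_true[i] * confusion[i][j] for i in range(n))
               for j in range(n)]
    cross_entropy = -math.log(p_noisy[noisy_label])
    trace = sum(confusion[i][i] for i in range(n))
    return cross_entropy + lam * trace, p_noisy

# Two classes; the annotator is fairly reliable on class 0, less so on class 1.
loss, p_noisy = noisy_label_loss(
    p_true=[0.9, 0.1],
    confusion=[[0.8, 0.2], [0.3, 0.7]],
    noisy_label=0,
)
```

In a training loop both `p_true` (through the classifier weights) and `confusion` would be optimized jointly; here they are fixed so that the two terms of the loss can be inspected directly.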
Ariel - Volume 2 Number 6
Editors
Richard J. Bonanno
Robin A. Edwards
Associate Editors
Steven Ager
Stephen Flynn
Shep Dickman
Tom Williams
Lay-out Editor
Eugenia Miller
Contributing Editors
Michael J. Blecker
W. Cherry Light
James J. Nocon
Lynne Porter
Editors Emeritus
Delvyn C. Case, Jr.
Paul M. Fernhof
Dual mechanism of brain injury and novel treatment strategy in maple syrup urine disease
Maple syrup urine disease (MSUD) is an inherited disorder of branched-chain amino acid metabolism presenting with life-threatening cerebral oedema and dysmyelination in affected individuals. Treatment requires life-long dietary restriction and monitoring of branched-chain amino acids to avoid brain injury. Despite careful management, children commonly suffer metabolic decompensation in the context of catabolic stress associated with non-specific illness. The mechanisms underlying this decompensation and brain injury are poorly understood. Using recently developed mouse models of classic and intermediate maple syrup urine disease, we assessed biochemical, behavioural and neuropathological changes that occurred during encephalopathy in these mice. Here, we show that rapid brain leucine accumulation displaces other essential amino acids resulting in neurotransmitter depletion and disruption of normal brain growth and development. A novel approach of administering norleucine to heterozygous mothers of classic maple syrup urine disease pups reduced branched-chain amino acid accumulation in milk as well as blood and brain of these pups to enhance survival. Similarly, norleucine substantially delayed encephalopathy in intermediate maple syrup urine disease mice placed on a high protein diet that mimics the catabolic stress shown to cause encephalopathy in human maple syrup urine disease. Current findings suggest two converging mechanisms of brain injury in maple syrup urine disease including: (i) neurotransmitter deficiencies and growth restriction associated with branched-chain amino acid accumulation and (ii) energy deprivation through Krebs cycle disruption associated with branched-chain ketoacid accumulation. Both classic and intermediate models appear to be useful to study the mechanism of brain injury and potential treatment strategies for maple syrup urine disease.
Norleucine should be further tested as a potential treatment to prevent encephalopathy in children with maple syrup urine disease during catabolic stress.
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D have been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate our representation
achieves state-of-the-art wide baseline feature matching results without
requiring a priori rectification (unlike SIFT and the majority of learned
features). We also show 6DOF camera pose estimation given a pair of local image
patches. The accuracy on both supervised tasks is comparable to that of humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions.
Comment: Published in ECCV16. See the project website
http://3drepresentation.stanford.edu/ and dataset website
https://github.com/amir32002/3D_Street_Vie
NODIS: Neural Ordinary Differential Scene Understanding
Semantic image understanding is a challenging topic in computer vision. It
requires not only detecting all objects in an image, but also identifying all the
relations between them. Detected objects, their labels and the discovered
relations can be used to construct a scene graph which provides an abstract
semantic interpretation of an image. In previous works, relations were
identified by solving an assignment problem formulated as Mixed-Integer Linear
Programs. In this work, we interpret that formulation as an Ordinary Differential
Equation (ODE). The proposed architecture performs scene graph inference by
solving a neural variant of an ODE by end-to-end learning. It achieves
state-of-the-art results on all three benchmark tasks: scene graph generation
(SGGen), classification (SGCls) and visual relationship detection (PredCls) on
the Visual Genome benchmark.
Scene Segmentation Driven by Deep Learning and Surface Fitting
This paper proposes a joint color and depth segmentation scheme that exploits both geometric cues and a learning stage. The approach starts from an initial over-segmentation based on spectral clustering. The input data is also fed to a Convolutional Neural Network (CNN), producing a per-pixel descriptor vector for each scene sample. An iterative merging procedure is then used to recombine the segments into regions corresponding to the various objects and surfaces. The algorithm first considers all pairs of adjacent segments and computes a similarity metric from the CNN features. The pairs of segments with higher similarity are considered for merging. Finally, the algorithm applies a NURBS surface fitting scheme to the segments to determine whether the selected pairs correspond to a single surface. A comparison with state-of-the-art methods shows that the proposed method provides accurate and reliable scene segmentation.
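The merging loop described above can be sketched as a greedy procedure over an adjacency set. This is a minimal illustration under stated assumptions, not the authors' algorithm: cosine similarity between mean descriptors stands in for the similarity metric, the `threshold` value is arbitrary, and the NURBS surface-fitting check is replaced by a caller-supplied predicate `accept`. All names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def merge_segments(features, sizes, adjacency, threshold=0.9,
                   accept=lambda a, b: True):
    """Greedily merge adjacent segments whose mean descriptors are similar.

    features:  {segment id: mean descriptor vector}
    sizes:     {segment id: pixel count}, used to average descriptors on merge
    adjacency: set of frozenset({a, b}) pairs of neighbouring segments
    accept:    hypothetical stand-in for the surface-fitting check
    """
    features = {k: list(v) for k, v in features.items()}
    sizes = dict(sizes)
    adjacency = set(adjacency)
    rejected = set()
    while True:
        candidates = [(cosine(features[a], features[b]), a, b)
                      for a, b in (sorted(p) for p in adjacency - rejected)]
        candidates = [c for c in candidates if c[0] >= threshold]
        if not candidates:
            return features, sizes
        _, a, b = max(candidates)  # most similar adjacent pair
        if not accept(a, b):
            rejected.add(frozenset((a, b)))
            continue
        # Merge b into a: size-weighted mean descriptor, then rewire b's neighbours.
        total = sizes[a] + sizes[b]
        features[a] = [(sizes[a] * x + sizes[b] * y) / total
                       for x, y in zip(features[a], features[b])]
        sizes[a] = total
        del features[b], sizes[b]
        adjacency = {frozenset(a if s == b else s for s in pair)
                     for pair in adjacency}
        adjacency = {pair for pair in adjacency if len(pair) == 2}
        rejected = set()  # descriptors changed, so re-examine earlier rejections

# Three segments in a row: 0 and 1 look alike, 2 is different.
features = {0: [1.0, 0.0], 1: [0.9, 0.1], 2: [0.0, 1.0]}
sizes = {0: 10, 1: 10, 2: 10}
adjacency = {frozenset({0, 1}), frozenset({1, 2})}
merged_features, merged_sizes = merge_segments(features, sizes, adjacency)
```

Passing a stricter `accept` predicate reproduces the role of the final check: a pair that is similar in descriptor space is still left unmerged if the predicate decides the two segments do not lie on a single surface.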