
    The Devil is in the Decoder: Classification, Regression and GANs

    Full text link
    Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders, which decrease spatial resolution while learning a high-dimensional representation, followed by decoders, which recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. This paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise tasks ranging from classification and regression to synthesis. Our contributions are: (1) decoders matter: we observe significant variance in results between different types of decoders on various problems; (2) we introduce new residual-like connections for decoders; (3) we introduce a novel decoder: bilinear additive upsampling; (4) we explore prediction artifacts.
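
    Contribution (3), bilinear additive upsampling, can be summarized in a few lines: bilinearly upsample the feature map, then sum each group of consecutive channels, so spatial resolution grows while the channel count shrinks, all without adding parameters. Below is a minimal NumPy/SciPy sketch of this idea; the function name and the group size of 4 are illustrative choices, not taken from the paper's code.

        import numpy as np
        from scipy.ndimage import zoom

        def bilinear_additive_upsample(x, factor=2, group=4):
            # x: feature map of shape (H, W, C); C must be divisible by `group`.
            h, w, c = x.shape
            assert c % group == 0, "channel count must divide evenly into groups"
            # Parameter-free bilinear upsampling of the spatial dimensions only.
            up = zoom(x, (factor, factor, 1), order=1)
            # Sum each run of `group` consecutive channels: (H', W', C) -> (H', W', C // group).
            return up.reshape(factor * h, factor * w, c // group, group).sum(axis=-1)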

    Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network

    Get PDF
    With more and more household objects built on planned obsolescence and consumed by a fast-growing population, hazardous waste recycling has become a critical challenge. Given the large variability of household waste, current recycling platforms mostly rely on human operators to analyze the scene, typically composed of many object instances piled up in bulk. Helping them by robotizing unitary object extraction is a key challenge in speeding up this tedious process. Whereas supervised deep learning has proven very efficient for such object-level scene understanding, e.g., generic object detection and segmentation in everyday scenes, it requires large sets of per-pixel labeled images, which are hardly available for numerous application contexts, including industrial robotics. We thus propose a step towards a practical interactive application for generating an object-oriented robotic grasp, requiring as inputs only one depth map of the scene and one user click on the next object to extract. More precisely, we address in this paper the intermediate problem of object segmentation in top views of piles of bulk objects, given a pixel location (the seed) provided interactively by a human operator. We propose a twofold framework for generating edge-driven instance segments. First, we repurpose a state-of-the-art fully convolutional object contour detector for seed-based instance segmentation by introducing the notion of edge-mask duality with a novel patch-free and contour-oriented loss function. Second, we train the model using only synthetic scenes, instead of manually labeled training data. Our experimental results show that training an encoder-decoder network with edge-mask duality, as we suggest, outperforms a state-of-the-art patch-based network in the present application context. Comment: This is a pre-print of an article published in Human Friendly Robotics, 10th International Workshop (Springer Proceedings in Advanced Robotics, vol. 7, edited by Bruno Siciliano and Oussama Khatib). The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-89327-3_16
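
    A simple way to picture the edge-mask duality the authors exploit: once a network predicts object contours, a single user click suffices to recover an instance mask by flood-filling the non-contour region containing the seed. The sketch below illustrates only that duality; it is not the paper's patch-free, contour-oriented loss, and all names are illustrative.

        from collections import deque
        import numpy as np

        def mask_from_edges(edge_map, seed, threshold=0.5):
            # edge_map: (H, W) array of predicted contour probabilities.
            # seed: (row, col) pixel clicked by the human operator.
            h, w = edge_map.shape
            blocked = edge_map >= threshold  # pixels treated as object contours
            mask = np.zeros((h, w), dtype=bool)
            queue = deque([seed])
            while queue:
                y, x = queue.popleft()
                if not (0 <= y < h and 0 <= x < w) or mask[y, x] or blocked[y, x]:
                    continue
                mask[y, x] = True
                # Grow the region in the four pixel-connected directions.
                queue.extend([(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)])
            return mask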

    Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion

    Get PDF
    The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective, noisy estimates of the "truth" under the influence of their varying skill levels and biases. Blindly treating these noisy labels as the ground truth limits the accuracy of learning algorithms in the presence of strong disagreement. This problem is critical for applications in domains such as medical imaging, where both the annotation cost and inter-observer variability are high. In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy observations. Each annotator is modeled by a confusion matrix that is jointly estimated along with the classifier predictions. We propose to add a regularization term to the loss function that encourages convergence to the true annotator confusion matrices. We provide a theoretical argument as to why the regularization is essential to our approach, for both the single-annotator and multiple-annotator cases. Despite the simplicity of the idea, experiments on image classification tasks with both simulated and real labels show that our method either outperforms or performs on par with state-of-the-art methods, and is capable of estimating the skills of annotators even with a single label available per image. Comment: CVPR 2019
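
    The core of the method fits in a few lines: each annotator's confusion matrix maps the classifier's class probabilities to a distribution over that annotator's noisy labels, and a trace penalty on the confusion matrices is added to the cross-entropy loss. The NumPy sketch below shows the loss computation only (no gradients); the shapes, the convention that rows index the true class, and the variable names are my assumptions, not the authors' code.

        import numpy as np

        def annotator_confusion_loss(class_probs, confusions, noisy_labels, trace_weight=0.01):
            # class_probs: (N, K) classifier output p(true class | image).
            # confusions: (R, K, K) per-annotator matrices; entry [r, i, j] is
            #   an estimate of P(annotator r labels j | true class is i).
            # noisy_labels: (R, N) integer label from each annotator for each image.
            n_annotators, n_items = noisy_labels.shape
            loss = 0.0
            for r in range(n_annotators):
                # Distribution over the labels annotator r would produce.
                annotator_probs = class_probs @ confusions[r]  # (N, K)
                picked = annotator_probs[np.arange(n_items), noisy_labels[r]]
                loss -= np.log(picked + 1e-12).mean()  # cross-entropy on noisy labels
            # Trace regularizer: penalizing the trace drives the estimated
            # matrices toward the true annotator confusions (the paper argues
            # this under a diagonal-dominance assumption).
            loss += trace_weight * np.trace(confusions, axis1=1, axis2=2).mean()
            return loss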

    Learning Shape Priors for Single-View 3D Completion and Reconstruction

    Full text link
    The problem of single-view 3D shape completion or reconstruction is challenging because, among the many possible shapes that explain an observation, most are implausible and do not correspond to natural objects. Recent research in the field has tackled this problem by exploiting the expressiveness of deep convolutional networks. In fact, there is another level of ambiguity that is often overlooked: among plausible shapes, there are still multiple shapes that fit the 2D image equally well; i.e., the ground-truth shape is non-deterministic given a single-view input. Existing fully supervised approaches fail to address this issue, and often produce blurry mean shapes with smooth surfaces but no fine details. In this paper, we propose ShapeHD, pushing the limit of single-view shape completion and reconstruction by integrating deep generative models with adversarially learned shape priors. The learned priors serve as a regularizer, penalizing the model only if its output is unrealistic, not if it deviates from the ground truth. Our design thus overcomes both of the aforementioned levels of ambiguity. Experiments demonstrate that ShapeHD outperforms the state of the art by a large margin in both shape completion and shape reconstruction on multiple real datasets. Comment: ECCV 2018. The first two authors contributed equally to this work. Project page: http://shapehd.csail.mit.edu
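
    In training terms, the learned shape prior amounts to adding a naturalness term from an adversarially trained shape discriminator on top of the usual supervised reconstruction loss. The sketch below is a minimal, hypothetical rendering of that combination; the weighting, the mean-squared supervised term, and all names are assumptions rather than ShapeHD's actual objective.

        import numpy as np

        def prior_regularized_loss(pred, target, realism_score, alpha=0.1):
            # Supervised term: penalizes deviation from the ground-truth shape.
            supervised = np.mean((pred - target) ** 2)
            # Naturalness term: realism_score(pred) in (0, 1] comes from a
            # pre-trained adversarial shape discriminator; only unrealistic
            # outputs (low scores) incur a large penalty, so realistic shapes
            # that merely differ from the ground truth are not punished twice.
            naturalness = -np.log(realism_score(pred) + 1e-12)
            return supervised + alpha * naturalness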

    ICT for Sustainability — Current and future research directions

    Get PDF
    This workshop brings together researchers from the entire iSchools community to propose, share, and discuss their current research and future research agendas, and to foster collaborations on ICT for Sustainability. ICT plays a dual role in sustainability: it threatens sustainability, as ICT devices cause carbon emissions and produce e-waste, but it can also enable sustainability, in the form of systems that support the protection of natural resources and systems that foster social sustainability through communities and participation. These supporting systems come from many intellectual traditions within and beyond the information field and design. The iSchools community provides an excellent place to discuss this crucial topic at the intersection of information, society, and technology. This workshop will bring together scholars from across the information field studying ICT for sustainability, to foster new interdisciplinary and multidisciplinary collaborations.

    TF-Slim: A Lightweight Library for Defining, Training and Evaluating Complex Models in TensorFlow

    No full text
    Presented at CS 7643 Deep Learning on September 7, 2017, from 4:30 p.m. to 5:45 p.m. in the Clough Undergraduate Learning Commons (CULC), Room 144, Georgia Tech. Runtime: 68:05.

    Nathan Silberman is the Lead Deep Learning Scientist at 4Catalyzer, where he works on a variety of healthcare-related projects. His machine learning interests include semantic segmentation, detection, and reinforcement learning, and how best to apply them to high-impact problems in the medical world. Prior to joining 4Catalyzer, Nathan was a researcher at Google, where, among various projects, he co-wrote TensorFlow-Slim, which is now a major component of the TensorFlow library. Nathan received his Ph.D. in 2015 from New York University under Rob Fergus and David Sontag.

    TF-Slim is a TensorFlow-based library with various components. These include modules for easily defining neural network models with few lines of code, routines for training and evaluating such models in a highly distributed fashion, and utilities for creating efficient data loading pipelines. Additionally, the TF-Slim Image Models library provides many commonly used networks (ResNet, Inception, VGG, etc.) that make replicating results and creating new networks from existing components simple and straightforward. I will discuss some of the design choices and constraints that guided our development process, as well as several high-impact projects in the medical domain that utilize most or all components of the TF-Slim library.
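
    As a taste of the "few lines of code" claim, here is a small model definition using TF-Slim's TF 1.x API (slim.arg_scope, slim.repeat, and the layer wrappers); the toy architecture itself is made up for illustration.

        import tensorflow as tf
        import tensorflow.contrib.slim as slim  # TF 1.x

        def toy_net(images, num_classes=10):
            # arg_scope sets shared defaults for every layer type listed in it.
            with slim.arg_scope([slim.conv2d, slim.fully_connected],
                                activation_fn=tf.nn.relu,
                                weights_regularizer=slim.l2_regularizer(1e-4)):
                # slim.repeat stacks the same layer twice under auto-numbered scopes.
                net = slim.repeat(images, 2, slim.conv2d, 64, [3, 3], scope='conv')
                net = slim.max_pool2d(net, [2, 2], scope='pool')
                net = slim.flatten(net)
                net = slim.fully_connected(net, 256, scope='fc')
                return slim.fully_connected(net, num_classes, activation_fn=None,
                                            scope='logits')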