Attentive Single-Tasking of Multiple Tasks
In this work we address task interference in universal networks by
considering that a network is trained on multiple tasks, but performs one task
at a time, an approach we refer to as "single-tasking multiple tasks". The
network thus modifies its behaviour through task-dependent feature adaptation,
or task attention. This gives the network the ability to accentuate the
features that are adapted to a task, while shunning irrelevant ones. We further
reduce task interference by forcing the task gradients to be statistically
indistinguishable through adversarial training, ensuring that the common
backbone architecture serving all tasks is not dominated by any of the
task-specific gradients. Results in three multi-task dense labelling problems
consistently show: (i) a large reduction in the number of parameters while
preserving, or even improving, performance and (ii) a smooth trade-off between
computation and multi-task accuracy. We provide our system's code and
pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/
Comment: CVPR 2019 Camera Ready
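The task-attention idea can be illustrated with a minimal sketch: a per-task channel gate rescales the shared backbone features, so that running the network in "single-tasking" mode accentuates the channels relevant to the active task. This is a simplified NumPy illustration under assumed shapes and hypothetical names, not the paper's implementation (ASTMT uses learned SE-style adapters inside a CNN backbone, plus the adversarial gradient training described above).

```python
import numpy as np

def task_attention(features, task_weights):
    """Scale shared backbone features with a per-task channel gate.

    features:     (C, H, W) feature map from the shared backbone
    task_weights: (C,) learned gating parameters for the active task
    """
    gate = 1.0 / (1.0 + np.exp(-task_weights))   # sigmoid -> gate in (0, 1)
    return features * gate[:, None, None]        # accentuate task-relevant channels

# One shared feature map, two hypothetical tasks with different learned gates
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))
gates = {"edges": rng.standard_normal(4), "depth": rng.standard_normal(4)}
out_edges = task_attention(feats, gates["edges"])  # network "single-tasks" edges
out_depth = task_attention(feats, gates["depth"])  # same backbone, different behaviour
```

The same backbone output yields different task-adapted features depending only on which gate is applied, which is the mechanism that lets one set of shared parameters serve multiple tasks one at a time.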
Deep Extreme Cut: From Extreme Points to Object Segmentation
This paper explores the use of extreme points in an object (left-most,
right-most, top, bottom pixels) as input to obtain precise object segmentation
for images and videos. We do so by adding an extra channel to the image in the
input of a convolutional neural network (CNN), which contains a Gaussian
centered in each of the extreme points. The CNN learns to transform this
information into a segmentation of an object that matches those extreme points.
We demonstrate the usefulness of this approach for guided segmentation
(grabcut-style), interactive segmentation, video object segmentation, and dense
segmentation annotation. We show that we obtain the most precise results to
date, also with less user input, in an extensive and varied selection of
benchmarks and datasets. All our models and code are publicly available on
http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr/
Comment: CVPR 2018 camera ready. Project webpage and code:
http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr
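The extra input channel described above is simple to construct: one 2D Gaussian per extreme point, merged by taking the per-pixel maximum. The NumPy sketch below uses hypothetical point coordinates and an illustrative sigma, not the paper's exact setting.

```python
import numpy as np

def extreme_point_channel(shape, points, sigma=10.0):
    """Build the extra input channel: a Gaussian centered at each extreme point.

    shape:  (H, W) of the image
    points: list of (row, col) extreme points (left/right/top/bottom-most pixels)
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    channel = np.zeros(shape)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        channel = np.maximum(channel, g)   # keep the strongest response per pixel
    return channel

# Four extreme points of a hypothetical object in a 100x100 image
heatmap = extreme_point_channel((100, 100), [(50, 10), (50, 90), (10, 50), (90, 50)])
# Stack onto a (placeholder) 3-channel image to form the 4-channel CNN input
rgb_plus = np.concatenate([np.zeros((3, 100, 100)), heatmap[None]], axis=0)
```

The CNN then receives this 4-channel tensor and learns to produce a segmentation consistent with the marked extreme points.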
Detection-aided liver lesion segmentation using deep learning
A fully automatic technique for segmenting the liver and localizing its
unhealthy tissues is a convenient tool for diagnosing hepatic diseases
and assessing the response to the corresponding treatments. In this work we
propose a method to segment the liver and its lesions from Computed Tomography
(CT) scans using Convolutional Neural Networks (CNNs), which have achieved good
results in a variety of computer vision tasks, including medical imaging. The network that
segments the lesions consists of a cascaded architecture that first focuses
on the liver region in order to segment the lesions within it. Moreover, we
train a detector to localize the lesions, and mask the results of the
segmentation network with the positive detections. The segmentation
architecture is based on DRIU, a Fully Convolutional Network (FCN) with side
outputs that work on feature maps of different resolutions, to finally benefit
from the multi-scale information learned by different stages of the network.
The main contribution of this work is the use of a detector to localize the
lesions, which we show to be beneficial to remove false positives triggered by
the segmentation network. Source code and models are available at
https://imatge-upc.github.io/liverseg-2017-nipsws/
Comment: NIPS 2017 Workshop on Machine Learning for Health (ML4H)
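The masking step, keeping segmentation output only where the detector fires, can be sketched as follows. This is a hedged NumPy illustration with a hypothetical box format and threshold; the actual pipeline operates on CT volumes with trained detection and segmentation networks.

```python
import numpy as np

def mask_with_detections(lesion_probs, boxes, threshold=0.5):
    """Keep segmentation output only inside positively detected boxes.

    lesion_probs: (H, W) per-pixel lesion probabilities from the segmentation net
    boxes:        list of (r0, c0, r1, c1) positive detections, half-open ranges
    """
    keep = np.zeros_like(lesion_probs, dtype=bool)
    for r0, c0, r1, c1 in boxes:
        keep[r0:r1, c0:c1] = True
    masked = np.where(keep, lesion_probs, 0.0)  # suppress responses outside detections
    return masked > threshold

probs = np.full((8, 8), 0.9)                    # toy map: lesion-like response everywhere
mask = mask_with_detections(probs, [(2, 2, 5, 5)])  # but only one detection is positive
```

False positives produced by the segmentation network outside any detected region are zeroed out, which is exactly the benefit the abstract attributes to the detector.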
CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
We propose a method for annotating videos of complex multi-object scenes with
a globally-consistent 3D representation of the objects. We annotate each object
with a CAD model from a database, and place it in the 3D coordinate frame of
the scene with a 9-DoF pose transformation. Our method is semi-automatic and
works on commonly-available RGB videos, without requiring a depth sensor. Many
steps are performed automatically, and the tasks performed by humans are
simple, well-specified, and require only limited reasoning in 3D. This makes
them feasible for crowd-sourcing and has allowed us to construct a large-scale
dataset by annotating real-estate videos from YouTube. Our dataset CAD-Estate
offers 101k instances of 12k unique CAD models placed in the 3D representations
of 20k videos. In comparison to Scan2CAD, the largest existing dataset with CAD
model annotations on real scenes, CAD-Estate has 7x more instances and 4x more
unique CAD models. We showcase the benefits of pre-training a Mask2CAD model on
CAD-Estate for the task of automatic 3D object reconstruction and pose
estimation, demonstrating that it leads to performance improvements on the
popular Scan2CAD benchmark. The dataset is available at
https://github.com/google-research/cad-estate
Comment: Project page: https://github.com/google-research/cad-estate
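A 9-DoF pose, 3 degrees of freedom each for rotation, translation, and per-axis scale, can be applied to a CAD model's vertices as sketched below. The rotation is passed as a 3x3 matrix for simplicity (its 3 underlying parameters could be Euler angles or an axis-angle vector); the function name and conventions are illustrative assumptions, not CAD-Estate's actual code.

```python
import numpy as np

def apply_9dof(points, rotation, translation, scale):
    """Place a CAD model in the scene coordinate frame with a 9-DoF pose.

    points:      (N, 3) vertices in the model's local coordinate frame
    rotation:    (3, 3) rotation matrix (3 DoF)
    translation: (3,) translation vector (3 DoF)
    scale:       (3,) per-axis scale factors (3 DoF)
    """
    return (points * scale) @ rotation.T + translation  # scale, rotate, then translate

# Identity rotation, unit scale on x/y, double on z, shifted by (1, 0, 0)
verts = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
placed = apply_9dof(verts, np.eye(3), np.array([1.0, 0.0, 0.0]),
                    np.array([1.0, 1.0, 2.0]))
```

Each annotated instance in the dataset is a database CAD model plus one such transformation into the scene's global 3D frame.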
Object Contour and Edge Detection with RefineContourNet
A ResNet-based multi-path refinement CNN is used for object contour
detection. For this task, we prioritise the effective utilization of the
high-level abstraction capability of a ResNet, which leads to state-of-the-art
results for edge detection. With this focus in mind, we fuse the high-, mid-,
and low-level features in that specific order, which differs from many other
approaches. The network takes the tensor with the highest-level features as the
starting point and combines it layer by layer with features of progressively
lower abstraction levels until the lowest level is reached. We train this network on a
modified PASCAL VOC 2012 dataset for object contour detection and evaluate on a
refined PASCAL-val dataset, reaching excellent performance with an Optimal
Dataset Scale (ODS) score of 0.752. Furthermore, by fine-tuning on the BSDS500
dataset we reach state-of-the-art results for edge detection with an ODS of
0.824.
Comment: Keywords: Object Contour Detection, Edge Detection, Multi-Path Refinement CNN
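The high-to-low fusion order can be sketched in a few lines: start from the highest-level (coarsest) feature tensor and repeatedly upsample and merge it with the next lower level. This NumPy sketch assumes equal channel counts across levels, nearest-neighbour upsampling, and simple addition as the merge; the actual network uses learned refinement blocks for each step.

```python
import numpy as np

def refine_top_down(feature_maps):
    """Fuse multi-level features high -> low, as in the refinement path above.

    feature_maps: list of (C, H_i, W_i) arrays ordered low -> high level,
                  each level at half the spatial resolution of the previous one.
    """
    fused = feature_maps[-1]                                   # highest-level features
    for lower in reversed(feature_maps[:-1]):
        c, h, w = lower.shape
        upsampled = fused.repeat(2, axis=1).repeat(2, axis=2)  # nearest-neighbour 2x
        fused = upsampled[:, :h, :w] + lower                   # merge with lower level
    return fused                                               # full-resolution output

# Toy 3-level pyramid: low, mid, and high-level features (hypothetical shapes)
pyramid = [np.ones((8, 32, 32)), np.ones((8, 16, 16)), np.ones((8, 8, 8))]
out = refine_top_down(pyramid)
```

Starting from the most abstract features and refining downward is the design choice the abstract highlights: the semantically strongest signal steers the fusion, with lower levels adding spatial detail.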