Learning Less is More - 6D Camera Localization via 3D Surface Regression
Popular research areas like autonomous driving and augmented reality have
renewed the interest in image-based camera localization. In this work, we
address the task of predicting the 6D camera pose from a single RGB image in a
given 3D environment. With the advent of neural networks, previous works have
either learned the entire camera localization process, or multiple components
of a camera localization pipeline. Our key contribution is to demonstrate and
explain that learning a single component of this pipeline is sufficient. This
component is a fully convolutional neural network for densely regressing
so-called scene coordinates, defining the correspondence between the input
image and the 3D scene space. The neural network is prepended to a new
end-to-end trainable pipeline. Our system is efficient, highly accurate, robust
in training, and exhibits outstanding generalization capabilities. It exceeds
state-of-the-art consistently on indoor and outdoor datasets. Interestingly,
our approach surpasses existing techniques even without utilizing a 3D model of
the scene during training, since the network is able to discover 3D scene
geometry automatically, solely from single-view constraints.
Comment: CVPR 201
Segmentation-Based Deep-Learning Approach for Surface-Defect Detection
Automated surface-anomaly detection using machine learning has become an
interesting and promising area of research, with a very high and direct impact
on the application domain of visual inspection. Deep-learning methods have
become the most suitable approaches for this task. They allow the inspection
system to learn to detect the surface anomaly by simply showing it a number of
exemplar images. This paper presents a segmentation-based deep-learning
architecture that is designed for the detection and segmentation of surface
anomalies and is demonstrated on a specific domain of surface-crack detection.
The design of the architecture enables the model to be trained using a small
number of samples, which is an important requirement for practical
applications. The proposed model is compared with the related deep-learning
methods, including the state-of-the-art commercial software, showing that the
proposed approach outperforms the related methods on the specific domain of
surface-crack detection. The large number of experiments also sheds light on the
required precision of the annotation, the number of required training samples
and on the required computational cost. Experiments are performed on a newly
created dataset based on a real-world quality control case and demonstrate
that the proposed approach is able to learn from a small number of defective
surfaces, using only approximately 25-30 defective training samples, instead of
hundreds or thousands, which is usually the case in deep-learning applications.
This makes the deep-learning method practical for use in industry where the
number of available defective samples is limited. The dataset is also made
publicly available to encourage the development and evaluation of new methods
for surface-defect detection.
Comment: Journal of Intelligent Manufacturing 201
Automatic Detection of Knee Joints and Quantification of Knee Osteoarthritis Severity using Convolutional Neural Networks
This paper introduces a new approach to automatically quantify the severity
of knee OA using X-ray images. Automatically quantifying knee OA severity
involves two steps: first, automatically localizing the knee joints; next,
classifying the localized knee joint images. We introduce a new approach to
automatically detect the knee joints using a fully convolutional neural network
(FCN). We train convolutional neural networks (CNN) from scratch to
automatically quantify the knee OA severity optimizing a weighted ratio of two
loss functions: categorical cross-entropy and mean-squared loss. This joint
training further improves the overall quantification of knee OA severity, with
the added benefit of naturally producing simultaneous multi-class
classification and regression outputs. Two public datasets are used to evaluate
our approach, the Osteoarthritis Initiative (OAI) and the Multicenter
Osteoarthritis Study (MOST), with extremely promising results that outperform
existing approaches.
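The joint objective described in the abstract above can be illustrated with a minimal sketch. It assumes the "weighted ratio" is a convex combination of the two losses and that the regression target is the class index itself. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def categorical_cross_entropy(probs, true_class):
    """Cross-entropy for one sample, given predicted class probabilities."""
    eps = 1e-12  # guards against log(0)
    return -math.log(max(probs[true_class], eps))


def squared_error(predicted_grade, true_grade):
    """Squared error between a regressed severity value and its target."""
    return (predicted_grade - true_grade) ** 2


def joint_loss(probs, predicted_grade, true_class, weight=0.5):
    """Weighted sum of the classification and regression losses.

    `weight` is a hypothetical mixing parameter: 1.0 uses only
    cross-entropy, 0.0 uses only the squared error.
    """
    ce = categorical_cross_entropy(probs, true_class)
    se = squared_error(predicted_grade, float(true_class))
    return weight * ce + (1.0 - weight) * se
```

With one network producing both outputs, a single backward pass through such a combined loss would update the shared layers for classification and regression at once.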
Weakly supervised training of pixel resolution segmentation models on whole slide images
We present a novel approach to train pixel resolution segmentation models on
whole slide images in a weakly supervised setup. The model is trained to
classify patches extracted from slides. As a result, training relies on
noisily labeled data. We address this problem with two complementary
strategies. First, the patches are sampled online using the model's knowledge
by focusing on regions where the model's confidence is higher. Second, we
propose an extension of the KL divergence that is robust to noisy labels. Our
preliminary experiment on the CAMELYON 16 data set shows promising results. The
model can successfully segment tumor areas with strong morphological
consistency.
Comment: Performance updat
Automatic Renal Segmentation in DCE-MRI using Convolutional Neural Networks
Kidney function evaluation using dynamic contrast-enhanced MRI (DCE-MRI)
images could help in diagnosis and treatment of kidney diseases of children.
Automatic segmentation of renal parenchyma is an important step in this
process. In this paper, we propose a time and memory efficient fully automated
segmentation method which achieves high segmentation accuracy with running time
in the order of seconds in both normal kidneys and kidneys with hydronephrosis.
The proposed method is based on a cascaded application of two 3D convolutional
neural networks that employ spatial and temporal information at the same time
in order to learn the tasks of localization and segmentation of kidneys,
respectively. Segmentation performance is evaluated on both normal and abnormal
kidneys with varying levels of hydronephrosis. We achieved mean Dice
coefficients of 91.4 and 83.6 for normal and abnormal kidneys of pediatric
patients, respectively.
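For reference, the Dice coefficient used as the evaluation metric above can be computed as in the following sketch. Masks are assumed to be flat sequences of 0/1 values, and the function name is illustrative. The function returns a value in [0, 1]; the figures in the abstract appear to be on a percentage scale.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement
        # is a convention, not something the abstract specifies.
        return 1.0
    return 2.0 * intersection / total
```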
A spatiotemporal model with visual attention for video classification
High level understanding of sequential visual input is important for safe and
stable autonomy, especially in localization and object detection. While
traditional object classification and tracking approaches are specifically
designed to handle variations in rotation and scale, current state-of-the-art
approaches based on deep learning achieve better performance. This paper
focuses on developing a spatiotemporal model to handle videos containing moving
objects with rotation and scale changes. Built on models that combine
Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to
classify sequential data, this work investigates the effectiveness of
incorporating attention modules in the CNN stage for video classification. The
superiority of the proposed spatiotemporal model is demonstrated on the Moving
MNIST dataset augmented with rotation and scaling.
Comment: Accepted by Robotics: Science and Systems 2017 Workshop on Articulated Model Trackin
Adversarial Learning for Image Forensics Deep Matching with Atrous Convolution
Constrained image splicing detection and localization (CISDL) is a newly
proposed challenging task for image forensics, which investigates two input
suspected images and identifies whether one image has suspected regions pasted
from the other. In this paper, we propose a novel adversarial learning
framework to train the deep matching network for CISDL. Our framework mainly
consists of three building blocks: 1) the deep matching network based on atrous
convolution (DMAC) aims to generate two high-quality candidate masks which
indicate the suspected regions of the two input images, 2) the detection
network is designed to rectify inconsistencies between the two corresponding
candidate masks, 3) the discriminative network drives the DMAC network to
produce masks that are hard to distinguish from ground-truth ones. In DMAC,
atrous convolution is adopted to extract features with rich spatial
information, the correlation layer based on the skip architecture is proposed
to capture hierarchical features, and atrous spatial pyramid pooling is
constructed to localize tampered regions at multiple scales. The detection
network and the discriminative network act as the losses with auxiliary
parameters to supervise the training of DMAC in an adversarial way. Extensive
experiments, conducted on 21 generated testing sets and two public datasets,
demonstrate the effectiveness of the proposed framework and the superior
performance of DMAC.
Comment: 13 pages, 8 figure
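Atrous (dilated) convolution, which the abstract above relies on, can be illustrated in one dimension. This is a generic sketch of the operation, not the DMAC network itself. The function name and the `rate` parameter are illustrative.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid-mode 1D cross-correlation with dilation `rate`.

    Kernel taps are spaced `rate` samples apart, so the receptive
    field grows without adding parameters.
    """
    if rate < 1:
        raise ValueError("rate must be a positive integer")
    span = (len(kernel) - 1) * rate + 1  # receptive field of the dilated kernel
    output = []
    for start in range(len(signal) - span + 1):
        output.append(
            sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        )
    return output
```

With a three-tap kernel, `rate=1` covers 3 input samples per output and `rate=2` covers 5. Widening the spatial context this way requires no pooling and no extra parameters.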
Graph-based Proprioceptive Localization Using a Discrete Heading-Length Feature Sequence Matching Approach
Proprioceptive localization refers to a new class of robot egocentric
localization methods that do not rely on the perception and recognition of
external landmarks. These methods are naturally immune to bad weather, poor
lighting conditions, or other extreme environmental conditions that may hinder
exteroceptive sensors such as a camera or a laser range finder. These methods
depend on proprioceptive sensors such as inertial measurement units (IMUs)
and/or wheel encoders. Assisted by magnetoreception, the sensors can provide a
rudimentary estimation of vehicle trajectory which is used to query a prior
known map to obtain location. Our method, named graph-based proprioceptive
localization (GBPL), provides a low-cost fallback solution for localization under
challenging environmental conditions. As a robot/vehicle travels, we extract a
sequence of heading-length values for straight segments from the trajectory and
match the sequence with a pre-processed heading-length graph (HLG) abstracted
from the prior known map to localize the robot under a graph-matching approach.
Using the information from HLG, our location alignment and verification module
compensates for trajectory drift, wheel slip, or tire inflation level. We have
implemented our algorithm and tested it in both simulated and physical
experiments. The algorithm continuously finds the robot location and achieves
localization accuracy at the level that the prior map allows (less than 10 m).
Comment: 13 pages, 32 figure
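The idea of reducing a trajectory to a sequence of heading-length values can be sketched as follows. This is a simplified illustration, not the authors' extraction procedure. It assumes a planar trajectory given as (x, y) points and merges consecutive steps whose heading stays within a hypothetical tolerance `tol_deg` of the heading that started the segment.

```python
import math


def heading_length_sequence(points, tol_deg=10.0):
    """Collapse a 2D trajectory into (heading_deg, length) straight segments."""
    segments = []  # each entry is [heading of the segment's first step, total length]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        if step == 0.0:
            continue  # skip repeated points
        heading = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0
        if segments:
            diff = abs(heading - segments[-1][0]) % 360.0
            diff = min(diff, 360.0 - diff)  # smallest angle between the headings
            if diff <= tol_deg:
                segments[-1][1] += step  # still on the same straight segment
                continue
        segments.append([heading, step])
    return [(heading, length) for heading, length in segments]
```

For example, a path that runs two units east and then three units north reduces to two segments with headings of 0 and 90 degrees. A sequence of this kind is what would be matched against the heading-length graph.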
ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks
This paper aims to classify and locate objects accurately and efficiently,
without using bounding box annotations. It is challenging as objects in the
wild could appear at arbitrary locations and in different scales. In this
paper, we propose a novel classification architecture ProNet based on
convolutional neural networks. It uses computationally efficient neural
networks to propose image regions that are likely to contain objects, and
applies more powerful but slower networks on the proposed regions. The basic
building block is a multi-scale fully-convolutional network which assigns
object confidence scores to boxes at different locations and scales. We show
that such networks can be trained effectively using image-level annotations,
and can be connected into cascades or trees for efficient object
classification. ProNet outperforms previous state-of-the-art significantly on
PASCAL VOC 2012 and MS COCO datasets for object classification and point-based
localization.
Comment: CVPR 2016 (fixed reference issue
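The cascade idea in the abstract above, where a cheap model proposes regions and a stronger model examines only those, can be sketched as below. The scoring callables, the `top_k` parameter, and the function name are hypothetical stand-ins, and the ProNet architecture itself is not reproduced.

```python
def cascade_classify(boxes, cheap_score, strong_score, top_k=2):
    """Score all boxes cheaply, then rescore only the top_k proposals.

    `cheap_score` and `strong_score` are placeholder callables that map a
    box to a confidence value; they stand in for the fast and slow networks.
    """
    if not boxes:
        raise ValueError("at least one candidate box is required")
    # Stage 1: rank every candidate with the inexpensive scorer.
    proposals = sorted(boxes, key=cheap_score, reverse=True)[:top_k]
    # Stage 2: run the expensive scorer on the surviving proposals only.
    rescored = [(strong_score(box), box) for box in proposals]
    best_score, best_box = max(rescored, key=lambda pair: pair[0])
    return best_box, best_score
```

The expensive scorer runs at most `top_k` times regardless of how many candidates exist, which is where the efficiency gain of a cascade comes from.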
BAOD: Budget-Aware Object Detection
We study the problem of object detection from a novel perspective in which
annotation budget constraints are taken into consideration, appropriately
coined Budget Aware Object Detection (BAOD). When provided with a fixed budget,
we propose a strategy for building a diverse and informative dataset that can
be used to optimally train a robust detector. We investigate both optimization
and learning-based methods to sample which images to annotate and what type of
annotation (strongly or weakly supervised) to annotate them with. We adopt a
hybrid supervised learning framework to train the object detector from both
these types of annotation. We conduct a comprehensive empirical study showing
that a handcrafted optimization method outperforms other selection techniques
including random sampling, uncertainty sampling and active learning. By
combining an optimal image/annotation selection scheme with hybrid supervised
learning to solve the BAOD problem, we show that one can achieve the
performance of a strongly supervised detector on PASCAL-VOC 2007 while saving
12.8% of its original annotation budget. Furthermore, when of the
budget is used, it surpasses this performance by 2.0 mAP percentage points
- …