RGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter
Autonomous robotic manipulation in clutter is challenging. A large variety of
objects must be perceived in complex scenes, where they are partially occluded
and embedded among many distractors, often in restricted spaces. To tackle
these challenges, we developed a deep-learning approach that combines object
detection and semantic segmentation. The manipulation scenes are captured with
RGB-D cameras, for which we developed a depth fusion method. Employing
pretrained features makes learning from small annotated robotic data sets
possible. We evaluate our approach on two challenging data sets: one captured
for the Amazon Picking Challenge 2016, where our team NimbRo came in second in
the Stowing and third in the Picking task, and one captured in
disaster-response scenarios. The experiments show that object detection and
semantic segmentation complement each other and can be combined to yield
reliable object perception.
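The abstract does not say how detections and segmentations are combined. One simple rule is to keep, inside each detection box, only the pixels that the semantic segmentation assigns to the detected class. The box format and the function name mask_from_detection below are hypothetical, and this is an illustrative sketch, not the NimbRo pipeline.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


def mask_from_detection(seg: List[List[int]], box: Box, cls: int) -> List[List[bool]]:
    """Object mask = pixels inside the detection box whose semantic label equals cls."""
    x0, y0, x1, y1 = box
    return [
        [x0 <= x < x1 and y0 <= y < y1 and label == cls for x, label in enumerate(row)]
        for y, row in enumerate(seg)
    ]


# Usage: a 3x3 label map and one detection of class 1 covering columns 1-2, rows 0-1.
seg = [[0, 1, 1],
       [0, 1, 2],
       [2, 2, 2]]
mask = mask_from_detection(seg, (1, 0, 3, 2), 1)
```

The detection supplies the instance extent and the segmentation supplies the pixel-accurate boundary, which is one way the two outputs can complement each other.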
Review of Visual Saliency Detection with Comprehensive Information
A visual saliency detection model simulates the human visual system to perceive
the scene, and such models have been widely used in many vision tasks. With the
development of acquisition technology, more comprehensive information, such as depth cues,
inter-image correspondence, or temporal relationships, has become available to extend
image saliency detection to RGBD saliency detection, co-saliency detection, or
video saliency detection. An RGBD saliency detection model focuses on extracting
the salient regions from RGBD images by incorporating depth information.
A co-saliency detection model introduces the inter-image correspondence
constraint to discover the common salient object in an image group. The goal of
a video saliency detection model is to locate the motion-related salient object
in video sequences by jointly considering the motion cue and spatiotemporal
constraints. In this paper, we review different types of saliency
detection algorithms, summarize the important issues of the existing methods,
and discuss open problems and future work. Moreover, the evaluation
datasets and quantitative measurements are briefly introduced, and
experimental analysis and discussion are conducted to provide a holistic
overview of different saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on
Circuits and Systems for Video Technology 2018, https://rmcong.github.io
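Two quantitative measurements commonly reported for saliency maps are the mean absolute error (MAE) and the F-measure. The sketch below assumes flattened maps, a saliency map scaled to [0, 1], a fixed binarization threshold of 0.5, and beta² = 0.3 (a common choice in the saliency literature); the review itself may use other thresholds or additional metrics.

```python
from typing import Sequence


def mae(sal: Sequence[float], gt: Sequence[int]) -> float:
    """Mean absolute error between a [0, 1] saliency map and a binary ground truth."""
    return sum(abs(s - g) for s, g in zip(sal, gt)) / len(gt)


def f_measure(sal: Sequence[float], gt: Sequence[int],
              threshold: float = 0.5, beta2: float = 0.3) -> float:
    """Weighted harmonic mean of precision and recall after binarizing sal at threshold."""
    pred = [s >= threshold for s in sal]
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    n_pred, n_gt = sum(pred), sum(gt)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    if precision + recall == 0:
        return 0.0
    # beta2 < 1 weights precision more heavily than recall.
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


sal = [0.9, 0.8, 0.2, 0.1]  # predicted saliency, flattened
gt = [1, 0, 0, 1]           # binary ground-truth mask, flattened
```

MAE rewards calibrated continuous maps, while the F-measure depends only on the binarized map, so the two are usually reported together.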
Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
nonoverlapping meaningful regions according to human perception, and it has
been a classic topic since the early days of computer vision. A great deal of
research has been conducted, resulting in many applications. However,
while many segmentation algorithms exist, only a few sparse and
outdated surveys are available, and an overview of recent achievements and
issues is lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics, including not only the classic bottom-up
approaches but also recent developments in superpixels, interactive methods,
object proposals, semantic image parsing, and image cosegmentation. In addition,
we review the existing influential datasets and evaluation metrics.
Finally, we suggest some design choices and research directions for future
research in image segmentation.
Comment: submitted to Elsevier Journal of Visual Communications and Image
Representation
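Among the evaluation metrics used for semantic segmentation, intersection over union (IoU) averaged over classes (mean IoU) is widely reported. The minimal sketch below works on flattened label maps; skipping classes that appear in neither map is one convention among several and is assumed here.

```python
from typing import List


def mean_iou(pred: List[int], gt: List[int], num_classes: int) -> float:
    """Mean over classes of |pred == c AND gt == c| / |pred == c OR gt == c|."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


pred = [0, 0, 1, 1]
gt = [0, 1, 1, 1]
score = mean_iou(pred, gt, num_classes=2)  # class 0: 1/2, class 1: 2/3
```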
Detachable Object Detection: Segmentation and Depth Ordering From Short-Baseline Video
We describe an approach for segmenting an image into regions that correspond
to surfaces in the scene that are partially surrounded by the medium. It
integrates both appearance and motion statistics into a cost functional that
is seeded with occluded regions and minimized efficiently by solving a linear
programming problem. Where a short observation time is insufficient to
determine whether the object is detachable, the results of the minimization can
be used to seed a more costly optimization based on a longer sequence of video
data. The result is an entirely unsupervised scheme to detect and segment an
arbitrary and unknown number of objects. We test our scheme to highlight the
potential, as well as the limitations, of our approach.
SIGNet: Semantic Instance Aided Unsupervised 3D Geometry Perception
Unsupervised learning for geometric perception (depth, optical flow, etc.) is
of great interest to autonomous systems. Recent works on unsupervised learning
have made considerable progress on perceiving geometry; however, they usually
ignore the coherence of objects and perform poorly in dark
and noisy environments. In contrast, supervised learning algorithms, which are
robust, require large labeled geometric datasets. This paper introduces SIGNet,
a novel framework that provides robust geometry perception without requiring
geometrically informative labels. Specifically, SIGNet integrates semantic
information to make depth and flow predictions consistent with objects and
robust to low lighting conditions. SIGNet is shown to improve upon the
state-of-the-art unsupervised learning for depth prediction by 30% (in squared
relative error). In particular, SIGNet improves the dynamic object class
performance by 39% in depth prediction and 29% in flow prediction. Our code
will be made available at https://github.com/mengyuest/SIGNet.
Comment: To appear at CVPR 201
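SIGNet reports its depth gain in squared relative error (Sq Rel), a standard monocular-depth metric: the mean of (d - d*)² / d* over pixels with valid ground truth d*. The sketch below assumes flattened depth maps with non-positive values marking missing ground truth, and reads "improves by 30%" as a relative reduction of the baseline error; both readings are assumptions, not taken from the paper.

```python
from typing import Sequence


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Squared relative error: mean of (d - d*)^2 / d* over pixels with d* > 0."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]  # ignore missing ground truth
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


def relative_improvement(baseline: float, new: float) -> float:
    """Fractional error reduction, e.g. 0.30 for a 30% improvement."""
    return (baseline - new) / baseline


pred = [2.0, 4.0, 5.0]
gt = [1.0, 4.0, 0.0]  # the last pixel has no ground truth and is skipped
```

Dividing by d* penalizes a fixed absolute error more heavily at short range, which matters for nearby dynamic objects.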
Fusion Based Holistic Road Scene Understanding
This paper addresses the problem of holistic road scene understanding based
on the integration of visual and range data. To achieve this goal, we
propose an approach that jointly tackles object-level image segmentation and
semantic region labeling within a conditional random field (CRF) framework.
Specifically, we first generate semantic object hypotheses by clustering 3D
points, learning their prior appearance models, and using a deep learning
method for reasoning about their semantic categories. The learned priors, together
with spatial and geometric contexts, are incorporated in CRF. With this
formulation, visual and range data are fused thoroughly, and moreover, the
coupled segmentation and semantic labeling problem can be inferred via Graph
Cuts. Our approach is validated on the challenging KITTI dataset that contains
diverse complicated road scenarios. Both quantitative and qualitative
evaluations demonstrate its effectiveness.
Comment: 14 pages, 11 figures
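The abstract does not say which algorithm clusters the 3D points into object hypotheses. As an assumption, the sketch below uses simple Euclidean clustering: points are linked when they lie within a radius of each other, and connected groups become hypotheses. The function name, radius, and min_size are hypothetical parameters.

```python
import math
from collections import deque
from typing import List, Tuple

Point = Tuple[float, float, float]


def euclidean_clusters(points: List[Point], radius: float, min_size: int = 1) -> List[List[int]]:
    """Group point indices into connected components under a distance threshold."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force O(n) neighbour search; a k-d tree would be used for real scans.
            near = [j for j in unvisited if math.dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:  # drop tiny clusters as noise
            clusters.append(sorted(cluster))
    return clusters


pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 5.0, 5.0), (5.4, 5.0, 5.0), (20.0, 0.0, 0.0)]
hypotheses = euclidean_clusters(pts, radius=1.0, min_size=2)
```

Each resulting cluster can then be projected into the image to define an object-level segment whose category is inferred separately.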
Human Centred Object Co-Segmentation
Co-segmentation is the automatic extraction of the common semantic regions
given a set of images. Different from previous approaches mainly based on
object visuals, in this paper, we propose a human centred object
co-segmentation approach, which uses the human as an additional strong cue. In
order to discover the rich internal structure of the objects reflecting their
human-object interactions and visual similarities, we propose an unsupervised
fully connected CRF auto-encoder incorporating the rich object features and a
novel human-object interaction representation. We propose an efficient learning
and inference algorithm to allow the full connectivity of the CRF with the
auto-encoder, which establishes pairwise relations on all pairs of the object
proposals in the dataset. Moreover, the auto-encoder learns the parameters from
the data itself rather than supervised learning or manually assigned parameters
in the conventional CRF. In extensive experiments on four datasets, we show
that our approach is able to extract the common objects more accurately than
the state-of-the-art co-segmentation algorithms.
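To make "full connectivity" concrete, the sketch below runs naive mean-field inference on a fully connected CRF whose nodes are object proposals, with a Gaussian kernel on proposal features and a Potts penalty between every pair. This is a swapped-in standard technique, not the paper's CRF auto-encoder; the energy, parameter names (w, bandwidth, iters), and features are assumptions.

```python
import math
from typing import List


def mean_field(unary: List[List[float]], feats: List[List[float]], w: float = 1.0,
               bandwidth: float = 1.0, iters: int = 10) -> List[List[float]]:
    """Approximate marginals Q[i][l] for E(x) = sum_i unary[i][x_i]
    + sum_{i<j} w * k(f_i, f_j) * [x_i != x_j] with a Gaussian kernel k."""
    n, n_labels = len(unary), len(unary[0])
    # Affinity between every pair of proposals (fully connected, no self-loops).
    k = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(feats[i], feats[j])) / (2 * bandwidth ** 2))
          for j in range(n)] for i in range(n)]

    def normalize(scores: List[float]) -> List[float]:
        m = max(scores)  # subtract max for numerical stability
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        return [v / z for v in e]

    q = [normalize([-u for u in unary[i]]) for i in range(n)]
    for _ in range(iters):
        # Potts message: expected penalty for label l is w * sum_j k_ij * (1 - Q_j(l)).
        q = [normalize([-unary[i][l] - w * sum(k[i][j] * (1.0 - q[j][l]) for j in range(n))
                        for l in range(n_labels)])
             for i in range(n)]
    return q


# Proposals 0 and 1 have similar features; proposal 2 is far away.
unary = [[0.0, 5.0], [0.1, 0.0], [5.0, 0.0]]
feats = [[0.0], [0.1], [10.0]]
marginals = mean_field(unary, feats, w=5.0)
```

Proposal 1 weakly prefers label 1 on its own, but its strong affinity to proposal 0 pulls it to label 0, which is the kind of cross-image agreement co-segmentation relies on. The naive update costs O(N² L) per iteration.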
A Novel Semantics and Feature Preserving Perspective for Content Aware Image Retargeting
There is an increasing need for efficient image retargeting techniques
to adapt content to various forms of digital media. With the rapid growth of
mobile communications and dynamic web page layouts, one often needs to resize
media content to fit the desired display sizes. Given the various layouts of
web pages and the typically small screens of handheld devices, the
important content of the original image becomes obscured when it is resized by
uniform scaling. Thus, there is a need to resize
images in a content-aware manner that automatically discards irrelevant
information from the image and presents the salient features more
prominently. Several image retargeting techniques have been proposed that take
the content of the input image into account. However, these techniques
do not prove effective across all kinds of images and desired sizes.
The major problem is that these algorithms cannot process such
images with minimal visual distortion while also retaining the meaning conveyed
by the image. In this dissertation, we present a novel perspective on
content-aware image retargeting that can be implemented in real time. We
introduce a novel method of analysing semantic information within the input
image while also preserving the important and visually significant features.
We present the various nuances of our algorithm mathematically and logically,
and show that the results are better than those of state-of-the-art techniques.
Comment: 74 Pages, 46 Figures, Master's Thesis
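For contrast with uniform scaling, the sketch below implements seam carving, a classic content-aware retargeting baseline: it repeatedly removes the connected vertical path of pixels with the lowest gradient energy. It is not the dissertation's semantic method; the energy function (absolute differences to clamped neighbours) and the grayscale list-of-lists image are assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, indexed image[row][col]


def energy(img: Image) -> Image:
    """Sum of absolute differences to left/right and up/down neighbours (clamped at borders)."""
    h, w = len(img), len(img[0])
    e = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            e[y][x] = abs(right - c) + abs(c - left) + abs(down - c) + abs(c - up)
    return e


def find_vertical_seam(e: Image) -> List[int]:
    """Dynamic programming: cheapest 8-connected top-to-bottom path, one column per row."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    for y in range(1, h):
        for x in range(w):
            cost[y][x] += min(cost[y - 1][max(x - 1, 0):min(x + 2, w)])
    # Backtrack from the cheapest bottom pixel through neighbouring columns.
    seam = [min(range(w), key=lambda c: cost[h - 1][c])]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        seam.append(min(range(max(x - 1, 0), min(x + 2, w)), key=lambda c: cost[y][c]))
    seam.reverse()
    return seam


def remove_vertical_seam(img: Image, seam: List[int]) -> Image:
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def retarget_width(img: Image, new_width: int) -> Image:
    """Shrink the image width one low-energy seam at a time."""
    while len(img[0]) > new_width:
        img = remove_vertical_seam(img, find_vertical_seam(energy(img)))
    return img


# A flat image with one bright column: uniform scaling would blur it, seam carving keeps it.
img = [[0, 0, 9, 0] for _ in range(3)]
narrow = retarget_width(img, 3)
```

Because seams pass through low-gradient regions first, high-contrast structure survives; its weakness, which motivates semantic approaches such as the one described above, is that gradient energy does not capture what the image means.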
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
through all the papers to generating ideas and writing papers.
Comment: Survey Paper
Incorporating Near-Infrared Information into Semantic Image Segmentation
Recent progress in computational photography has shown that we can acquire
near-infrared (NIR) information in addition to the normal visible (RGB) band,
with only slight modifications to standard digital cameras. Due to the
proximity of the NIR band to visible radiation, NIR images share many
properties with visible images. However, as a result of the material-dependent
reflection in the NIR part of the spectrum, such images reveal different
characteristics of the scene. We investigate how to effectively exploit these
differences to improve performance on the semantic image segmentation task.
Based on a state-of-the-art segmentation framework and a novel manually
segmented image database (both indoor and outdoor scenes) that contains
4-channel images (RGB+NIR), we study how to best incorporate the specific
characteristics of the NIR response. We show that adding NIR leads to improved
performance for classes that correspond to a specific type of material in both
outdoor and indoor scenes. We also discuss the results with respect to the
physical properties of the NIR response.
- …