RGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter

Author: Behnke Sven
Milan Anton
Periyasamy Arul Selvam
Schwarz Max
Publication venue: SAGE Publications
Publication date: 01/10/2018

Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cameras, for which we developed a depth fusion method. Employing pretrained features makes learning from small annotated robotic data sets possible. We evaluate our approach on two challenging data sets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo came in second in the Stowing and third in the Picking task, and one captured in disaster-response scenarios. The experiments show that object detection and semantic segmentation complement each other and can be combined to yield reliable object perception.

arXiv.org e-Print Archive
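
The abstract does not say how NimbRo merges the two outputs. As a minimal illustration of why a box detector and a per-pixel segmenter complement each other, the sketch below uses a hypothetical helper, `instance_mask` (not the authors' method), that keeps only the pixels lying inside a detection box and carrying the detected class in the segmentation map. Images are plain nested lists so the example needs only the standard library.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive.
Box = Tuple[int, int, int, int]


def instance_mask(label_map: List[List[int]], box: Box, class_id: int) -> List[List[bool]]:
    """Hypothetical fusion rule: a pixel belongs to the detected object only if it
    lies inside the detection box AND the semantic segmentation assigns it the
    detected class."""
    x0, y0, x1, y1 = box
    return [
        [x0 <= x < x1 and y0 <= y < y1 and label == class_id
         for x, label in enumerate(row)]
        for y, row in enumerate(label_map)
    ]


if __name__ == "__main__":
    labels = [
        [0, 1, 1],
        [0, 1, 2],
        [1, 1, 2],
    ]
    # The box covers columns 1-2 of rows 0-1; the class-1 pixels in row 2 lie outside it.
    for row in instance_mask(labels, (1, 0, 3, 2), class_id=1):
        print(row)
```

Here the box suppresses class-1 pixels that belong to a different object below it, while the label map trims the box down to the object's outline.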

Review of Visual Saliency Detection with Comprehensive Information

Author: Cheng Ming-Ming
Cong Runmin
Fu Huazhu
Huang Qingming
Lei Jianjun
Lin Weisi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/09/2018

A visual saliency detection model simulates how the human visual system perceives a scene, and such models have been widely used in many vision tasks. With the development of acquisition technology, more comprehensive information, such as depth cues, inter-image correspondence, or temporal relationships, is available to extend image saliency detection to RGBD saliency detection, co-saliency detection, or video saliency detection. An RGBD saliency detection model focuses on extracting the salient regions from RGBD images by incorporating depth information. A co-saliency detection model introduces an inter-image correspondence constraint to discover the common salient object in an image group. The goal of a video saliency detection model is to locate motion-related salient objects in video sequences by jointly considering motion cues and spatiotemporal constraints. In this paper, we review different types of saliency detection algorithms, summarize the important issues of existing methods, and discuss open problems and future work. Moreover, the evaluation datasets and quantitative measurements are briefly introduced, and experimental analysis and discussion are conducted to provide a holistic overview of different saliency detection methods.
Comment: 18 pages, 11 figures, 7 tables, Accepted by IEEE Transactions on Circuits and Systems for Video Technology 2018, https://rmcong.github.io

arXiv.org e-Print Archive
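
The abstract mentions quantitative measurements without naming them. Two measures widely used in the saliency literature are the mean absolute error (MAE) between a continuous saliency map and the binary ground truth, and the F-measure with beta^2 = 0.3, which weights precision more heavily than recall. The sketch below computes both. The fixed binarization threshold of 0.5 is a simplifying assumption (papers often use an adaptive threshold or sweep all thresholds), and which metrics the review itself reports is not stated here.

```python
from typing import List

Map = List[List[float]]
Mask = List[List[int]]


def mae(saliency: Map, gt: Mask) -> float:
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    diffs = [abs(s - g) for s_row, g_row in zip(saliency, gt) for s, g in zip(s_row, g_row)]
    return sum(diffs) / len(diffs)


def f_measure(saliency: Map, gt: Mask, threshold: float = 0.5, beta2: float = 0.3) -> float:
    """F-measure of the map binarized at `threshold` (an assumed fixed value)."""
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, gt):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            if predicted and g:
                tp += 1          # salient in both prediction and ground truth
            elif predicted:
                fp += 1          # predicted salient, ground truth background
            elif g:
                fn += 1          # missed salient pixel
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


if __name__ == "__main__":
    sal = [[0.9, 0.2], [0.6, 0.1]]
    gt = [[1, 0], [0, 0]]
    print(round(mae(sal, gt), 4))        # 0.25
    print(round(f_measure(sal, gt), 4))  # 0.5652 (precision 0.5, recall 1.0)
```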

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

Author: Cai Jianfei
Lu Shijian
Meng Fanman
Zhu Hongyuan
Publication date: 02/02/2015

Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision. A great deal of research has been conducted, resulting in many applications. However, while many segmentation algorithms exist, only a few sparse and outdated summaries are available, and an overview of recent achievements and issues is lacking. We aim to provide a comprehensive review of recent progress in this field. Covering 180 publications, we give an overview of broad areas of segmentation, including not only the classic bottom-up approaches but also recent developments in superpixels, interactive methods, object proposals, semantic image parsing, and image cosegmentation. In addition, we review the existing influential datasets and evaluation metrics. Finally, we suggest some design flavors and directions for future research in image segmentation.
Comment: submitted to Elsevier Journal of Visual Communications and Image Representation

arXiv.org e-Print Archive
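
One evaluation metric commonly used for semantic segmentation is intersection over union (IoU) per class, averaged over classes as mean IoU (mIoU). A minimal sketch, assuming integer label maps of equal size and skipping classes that appear in neither the prediction nor the ground truth; the function names are illustrative only.

```python
import math
from typing import List

LabelMap = List[List[int]]


def per_class_iou(pred: LabelMap, gt: LabelMap, num_classes: int) -> List[float]:
    """IoU_c = |pred==c AND gt==c| / |pred==c OR gt==c|; NaN if class c is absent from both."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            if p == g:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel counts toward the union of both classes involved.
                union[p] += 1
                union[g] += 1
    return [i / u if u else float("nan") for i, u in zip(inter, union)]


def mean_iou(pred: LabelMap, gt: LabelMap, num_classes: int) -> float:
    """Average IoU over the classes present in the prediction or the ground truth."""
    ious = [v for v in per_class_iou(pred, gt, num_classes) if not math.isnan(v)]
    return sum(ious) / len(ious)


if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    gt = [[0, 1], [1, 1]]
    print(per_class_iou(pred, gt, num_classes=3))  # class 0: 1/2, class 1: 2/3, class 2: nan
    print(mean_iou(pred, gt, num_classes=3))
```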

Detachable Object Detection: Segmentation and Depth Ordering From Short-Baseline Video

Author: Ayvaci Alper
Soatto Stefano
Publication date: 21/09/2011

We describe an approach for segmenting an image into regions that correspond to surfaces in the scene that are partially surrounded by the medium. It integrates both appearance and motion statistics into a cost functional that is seeded with occluded regions and minimized efficiently by solving a linear programming problem. Where a short observation time is insufficient to determine whether an object is detachable, the results of the minimization can be used to seed a more costly optimization based on a longer video sequence. The result is an entirely unsupervised scheme to detect and segment an arbitrary and unknown number of objects. We test our scheme to highlight both the potential and the limitations of our approach.

arXiv.org e-Print Archive

SIGNet: Semantic Instance Aided Unsupervised 3D Geometry Perception

Author: Bansal Gaurav
Bharadia Dinesh
Guo Rui
Javidi Tara
Lu Yongxi
Meng Yue
Raj Aman
Sunarjo Samuel
Publication date: 04/04/2019

Unsupervised learning for geometric perception (depth, optical flow, etc.) is of great interest to autonomous systems. Recent work on unsupervised learning has made considerable progress in perceiving geometry; however, it usually ignores the coherence of objects and performs poorly in dark and noisy environments. In contrast, supervised learning algorithms, which are robust, require large labeled geometric datasets. This paper introduces SIGNet, a novel framework that provides robust geometry perception without requiring geometrically informative labels. Specifically, SIGNet integrates semantic information to make depth and flow predictions consistent with objects and robust to low-lighting conditions. SIGNet is shown to improve upon the state-of-the-art unsupervised learning for depth prediction by 30% (in squared relative error). In particular, SIGNet improves performance on dynamic object classes by 39% in depth prediction and 29% in flow prediction. Our code will be made available at https://github.com/mengyuest/SIGNet
Comment: To appear at CVPR 201

arXiv.org e-Print Archive
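
For reference, the squared relative error (Sq Rel) used in unsupervised depth-prediction work is the mean over valid pixels of (d - d*)^2 / d*, where d is the predicted depth and d* the ground-truth depth. The sketch below computes it on flattened depth maps. Treating pixels with non-positive ground truth as missing is an assumption about how sparse depth labels are stored, and this is not SIGNet's evaluation code.

```python
from typing import Sequence


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Squared relative error: mean of (pred - gt)^2 / gt over pixels with gt > 0.

    Pixels with gt <= 0 are treated as having no ground truth (an assumption)."""
    terms = [(p - g) ** 2 / g for p, g in zip(pred, gt) if g > 0]
    if not terms:
        raise ValueError("no valid ground-truth pixels")
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Flattened depth maps in metres; the third pixel has no ground truth.
    print(sq_rel([2.0, 4.0, 7.0], [1.0, 4.0, 0.0]))  # (1.0 + 0.0) / 2 = 0.5
```

Dividing by the ground-truth depth keeps errors on distant points from dominating the metric, while squaring still penalizes large misses heavily.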

Fusion Based Holistic Road Scene Understanding

Author: Gong Xiaojin
Huang Wenqi
Publication date: 29/06/2014

This paper addresses the problem of holistic road scene understanding based on the integration of visual and range data. To achieve this goal, we propose an approach that jointly tackles object-level image segmentation and semantic region labeling within a conditional random field (CRF) framework. Specifically, we first generate semantic object hypotheses by clustering 3D points, learning their prior appearance models, and using a deep learning method to infer their semantic categories. The learned priors, together with spatial and geometric contexts, are incorporated into the CRF. With this formulation, visual and range data are thoroughly fused, and the coupled segmentation and semantic labeling problem can be solved by inference via Graph Cuts. Our approach is validated on the challenging KITTI dataset, which contains diverse, complicated road scenarios. Both quantitative and qualitative evaluations demonstrate its effectiveness.
Comment: 14 pages, 11 figures

arXiv.org e-Print Archive
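
The abstract does not give the exact CRF potentials. For orientation, a generic pairwise CRF over pixels scores a labeling x as E(x) = sum_i U_i(x_i) + w * (number of neighbouring pairs (i, j) with x_i != x_j): a unary data term plus a Potts smoothness term. Graph cuts are a standard way to minimize energies of this kind, exactly for two labels and approximately (for example via alpha-expansion) for more. The sketch below only evaluates E(x) on a 4-connected grid; the Potts term and the weight are assumptions that stand in for the paper's spatial and geometric context terms.

```python
from typing import List


def crf_energy(labels: List[List[int]], unary: List[List[List[float]]], weight: float) -> float:
    """E(x) = sum_i U_i(x_i) + weight * #{4-neighbour pairs (i, j) with x_i != x_j}.

    unary[y][x][l] is the cost of assigning label l to pixel (x, y)."""
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for y in range(height):
        for x in range(width):
            energy += unary[y][x][labels[y][x]]
            # Count each horizontal and vertical neighbour pair exactly once.
            if x + 1 < width and labels[y][x] != labels[y][x + 1]:
                energy += weight
            if y + 1 < height and labels[y][x] != labels[y + 1][x]:
                energy += weight
    return energy


if __name__ == "__main__":
    unary = [[[0.2, 0.8], [0.5, 0.1]]]  # one row, two pixels, two labels
    print(crf_energy([[0, 1]], unary, weight=1.0))  # unary-optimal but pays the smoothness cost
    print(crf_energy([[0, 0]], unary, weight=1.0))  # smoother labeling with lower total energy
```

With a large enough weight, the smoothness term overrides the per-pixel preference, which is the behaviour a graph-cut solver exploits when it searches for the minimum-energy labeling.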

Human Centred Object Co-Segmentation

Author: Savarese Silvio
Saxena Ashutosh
Wu Chenxia
Zhang Jiemi
Publication date: 12/06/2016

Co-segmentation is the automatic extraction of the common semantic regions from a set of images. Unlike previous approaches, which rely mainly on the objects' visual appearance, in this paper we propose a human-centred object co-segmentation approach that uses the human as an additional strong cue. To discover the rich internal structure of objects, reflecting both their human-object interactions and their visual similarities, we propose an unsupervised, fully connected CRF auto-encoder that incorporates rich object features and a novel human-object interaction representation. We propose an efficient learning and inference algorithm that allows full connectivity of the CRF with the auto-encoder, establishing pairwise relations between all pairs of object proposals in the dataset. Moreover, the auto-encoder learns its parameters from the data itself, rather than relying on supervised learning or manually assigned parameters as in the conventional CRF. In extensive experiments on four datasets, we show that our approach extracts the common objects more accurately than state-of-the-art co-segmentation algorithms.

arXiv.org e-Print Archive

A Novel Semantics and Feature Preserving Perspective for Content Aware Image Retargeting

Author: Dragotti Pier Luigi
Shankar Sukrit
Publication date: 12/08/2015

There is an increasing need for efficient image retargeting techniques that adapt content to various forms of digital media. With the rapid growth of mobile communications and dynamic web page layouts, media content often has to be resized to fit the desired display size. Given the varied layouts of web pages and the typically small screens of handheld devices, the important content of the original image gets obscured when the image is resized by uniform scaling. Images therefore need to be resized in a content-aware manner that automatically discards irrelevant information and presents the salient features more prominently. Several image retargeting techniques have been proposed that take the content of the input image into account. However, these techniques are not globally effective across various kinds of images and target sizes. The major problem is that these algorithms cannot process images with minimal visual distortion while also retaining the meaning the image conveys. In this dissertation, we present a novel perspective on content-aware image retargeting that can be implemented in real time. We introduce a novel method for analysing semantic information within the input image while also maintaining its important, visually significant features. We present the nuances of our algorithm mathematically and logically, and show that its results are better than those of state-of-the-art techniques.
Comment: 74 Pages, 46 Figures, Master's Thesis

arXiv.org e-Print Archive

cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey

Author: Abe Kaori
Hoshino Hironori
Imanari Takaaki
Kataoka Hirokatsu
Kato Ryo
Kobayashi Naomichi
Miyashita Yudai
Morita Shinichiro
Nakamura Akio
Sato Shin'ichi
Shirakabe Soma
Yamabe Tomoaki
Publication date: 26/05/2016

The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers on computer vision, pattern recognition, and related fields. For this particular review, we focused on reading all 602 conference papers presented at CVPR2015, the premier annual computer vision event held in June 2015, in order to grasp the trends in the field. Further, we propose "DeepSurvey" as a mechanism embodying the entire process, from reading through all the papers and generating ideas to writing a paper.
Comment: Survey Paper

arXiv.org e-Print Archive

Incorporating Near-Infrared Information into Semantic Image Segmentation

Author: Csurka Gabriela
Larlus Diane
Salamati Neda
Süsstrunk Sabine
Publication date: 24/06/2014

Recent progress in computational photography has shown that we can acquire near-infrared (NIR) information in addition to the normal visible (RGB) band, with only slight modifications to standard digital cameras. Due to the proximity of the NIR band to visible radiation, NIR images share many properties with visible images. However, as a result of material-dependent reflection in the NIR part of the spectrum, such images reveal different characteristics of the scene. We investigate how to effectively exploit these differences to improve performance on the semantic image segmentation task. Based on a state-of-the-art segmentation framework and a novel, manually segmented image database (covering both indoor and outdoor scenes) of 4-channel images (RGB+NIR), we study how best to incorporate the specific characteristics of the NIR response. We show that adding NIR improves performance for classes that correspond to a specific type of material in both outdoor and indoor scenes. We also discuss the results with respect to the physical properties of the NIR response.

arXiv.org e-Print Archive
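
The simplest way to feed such 4-channel data to a segmentation model is early fusion: append the NIR value as a fourth channel to each registered RGB pixel. Since the paper studies how best to incorporate NIR, this is only one baseline option and not necessarily the authors' choice; the helper name `stack_rgb_nir` and the nested-list image layout are assumptions for illustration.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[float, float, float]]]
NIRImage = List[List[float]]
RGBNImage = List[List[Tuple[float, float, float, float]]]


def stack_rgb_nir(rgb: RGBImage, nir: NIRImage) -> RGBNImage:
    """Early fusion: append the NIR value to each RGB pixel, giving an H x W x 4 image.

    Assumes the two images are already registered (pixel-aligned)."""
    if len(rgb) != len(nir) or any(len(r) != len(n) for r, n in zip(rgb, nir)):
        raise ValueError("RGB and NIR images must have the same height and width")
    return [[(*px, n) for px, n in zip(r_row, n_row)] for r_row, n_row in zip(rgb, nir)]


if __name__ == "__main__":
    rgb = [[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]]
    nir = [[0.9, 0.8]]
    print(stack_rgb_nir(rgb, nir))  # one row of two RGB+NIR pixels
```

The extra channel lets a model pick up material cues that are weak in RGB; vegetation, for instance, reflects strongly in the NIR band.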