    Fast Graph-Based Object Segmentation for RGB-D Images

    Object segmentation is an important capability for robotic systems, in particular for grasping. We present a graph-based approach for segmenting simple objects from RGB-D images. We are interested in segmenting objects that vary widely in appearance, from textureless to strongly textured, for the task of robotic grasping. The algorithm does not rely on image features or machine learning. We propose a modified Canny edge detector that uses depth information to extract robust edges, and two simple cost functions for combining color and depth cues. The cost functions are used to build an undirected graph, which is partitioned using the concept of internal and external differences between graph regions. The partitioning is fast, with O(N log N) complexity. We also discuss ways to deal with missing depth information. We test the approach on different publicly available RGB-D object datasets, such as the Rutgers APC RGB-D dataset and the RGB-D Object Dataset, and compare the results with other existing methods.
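
    The abstract names the internal/external-difference partitioning criterion (the Felzenszwalb-Huttenlocher scheme) but not its exact cost functions, so the sketch below is a minimal, hypothetical Python version: a 4-connected image graph whose edge weights mix color and depth differences through an assumed blend parameter `alpha`, merged by sorted-edge union-find. Sorting the edges dominates the run time, which is where the O(N log N) complexity mentioned above comes from.

```python
# Hypothetical sketch of internal/external-difference region merging on an
# RGB-D image graph; the color/depth blend and threshold k are assumptions.
import numpy as np

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest merging-edge weight per component

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)

def segment(color, depth, k=300.0, alpha=0.7):
    """Merge pixels whose external difference stays below both internal ones."""
    h, w = depth.shape
    edges = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # 4-connected grid
                if ny < h and nx < w:
                    dc = np.linalg.norm(color[y, x] - color[ny, nx])
                    dd = abs(depth[y, x] - depth[ny, nx])
                    edges.append((alpha * dc + (1 - alpha) * dd,
                                  y * w + x, ny * w + nx))
    edges.sort()  # O(N log N): this sort dominates the run time
    ds = DisjointSet(h * w)
    for wgt, a, b in edges:
        ra, rb = ds.find(a), ds.find(b)
        if ra != rb and wgt <= min(ds.internal[ra] + k / ds.size[ra],
                                   ds.internal[rb] + k / ds.size[rb]):
            ds.union(ra, rb, wgt)
    labels = [ds.find(i) for i in range(h * w)]
    return np.array(labels).reshape(h, w)

rgb = np.random.rand(20, 20, 3)   # toy inputs
d = np.random.rand(20, 20)
print(len(np.unique(segment(rgb, d))), "regions")
```

    A real implementation would add the paper's depth-aware Canny edge handling and a strategy for pixels with missing depth; both are omitted here.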

    Automated Semantic Content Extraction from Images

    In this study, an automatic semantic segmentation and object recognition methodology is implemented that bridges the semantic gap between low-level features of image content and high-level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing, or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image, we propose a new object recognition framework with four major components: segmentation, scene detection, conceptual cueing, and object recognition. The new segmentation methodology developed in this research extends Felzenszwalb's cost function to include new surface-index and depth features, as well as color, texture, and normal features, to overcome the issues of occlusion and shadowing commonly found in images. Adding depth allows the object recognition stage to capture new features and achieve high accuracy compared to the current state of the art. The goal was to develop an approach to capture and label perceptually important regions, which often reflect a global representation and understanding of the image. We developed a system that uses contextual and common-sense information to improve object recognition and scene detection, and fuses the information from scenes and objects to reduce the level of uncertainty. In addition to improving segmentation, scene detection, and object recognition, this study can be used in applications that require physical parsing of the image into objects, surfaces, and their relations. The applications include robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation, a structural framework (ontology) is developed that generates text descriptions based on an understanding of the objects, structures, and attributes of an image.
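
    One concrete component the abstract describes is fusing scene-level and object-level evidence to reduce uncertainty. The snippet below is a loose, hypothetical Python illustration of that idea: raw detector scores are re-weighted by an assumed scene-object compatibility table. The table values, helper names, and normalization are all inventions for illustration, not the dissertation's actual model.

```python
# Hypothetical fusion of scene context with per-object detector scores;
# the compatibility table below is made up for illustration.
COMPATIBILITY = {
    "kitchen": {"stove": 0.90, "bed": 0.05, "sofa": 0.10},
    "bedroom": {"stove": 0.05, "bed": 0.90, "sofa": 0.30},
}

def fuse(scene_probs, object_scores):
    """Re-weight raw detector scores by scene context, then renormalize."""
    fused = {}
    for obj, score in object_scores.items():
        context = sum(p * COMPATIBILITY[scene].get(obj, 0.01)
                      for scene, p in scene_probs.items())
        fused[obj] = score * context
    total = sum(fused.values()) or 1.0
    return {obj: v / total for obj, v in fused.items()}

# a "bed" detection in a probable kitchen is suppressed relative to "stove"
print(fuse({"kitchen": 0.8, "bedroom": 0.2},
           {"stove": 0.50, "bed": 0.45, "sofa": 0.40}))
```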

    SeGAN: Segmenting and Generating the Invisible

    Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction, and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects. Doing so requires knowing which pixels to paint (segmenting the invisible parts of objects) and what color to paint them (generating the invisible parts). Our proposed novel solution, SeGAN, jointly optimizes for both segmentation and generation of the invisible parts of objects. Our experimental results show that: (a) SeGAN can learn to generate the appearance of the occluded parts of objects; (b) SeGAN outperforms state-of-the-art segmentation baselines for the invisible parts of objects; (c) trained on synthetic photorealistic images, SeGAN can reliably segment natural images; (d) by reasoning about occluder-occludee relations, our method can infer depth layering. Comment: Accepted to CVPR 2018 as a spotlight.
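
    The paper's key idea, jointly optimizing segmentation (which pixels to paint) and generation (what color to paint them), can be outlined as a toy PyTorch training step. Everything below, the layer sizes, the shared encoder, and the particular losses, is an assumed stand-in for illustration; SeGAN's actual architecture also includes an adversarial discriminator that this sketch omits.

```python
# Toy joint segmentation + generation step; not SeGAN's actual architecture.
import torch
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(16, 1, 1)  # logits: which pixels to paint
        self.gen_head = nn.Conv2d(16, 3, 1)  # RGB: what color to paint them

    def forward(self, x):
        feat = self.encoder(x)
        return self.seg_head(feat), self.gen_head(feat)

net = JointNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

x = torch.rand(2, 3, 64, 64)                 # occluded input (dummy batch)
mask_gt = torch.rand(2, 1, 64, 64).round()   # invisible-region ground truth
rgb_gt = torch.rand(2, 3, 64, 64)            # full-appearance ground truth

opt.zero_grad()
mask_logits, rgb_pred = net(x)
# one joint objective covers both sub-tasks; a GAN loss would be added here
loss = bce(mask_logits, mask_gt) + l1(rgb_pred * mask_gt, rgb_gt * mask_gt)
loss.backward()
opt.step()
```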

    Deep Learning With Effective Hierarchical Attention Mechanisms in Perception of Autonomous Vehicles

    Autonomous vehicles need to gather and understand information from their surroundings to drive safely. Just as we look around and understand what's happening on the road, these vehicles need to see and make sense of dynamic objects like other cars, pedestrians, and cyclists, and static objects like crosswalks, road barriers, and stop lines. In this dissertation, we aim to find better ways for computers to understand their surroundings in the 3D object detection task and the map segmentation task. The 3D object detection task automatically spots objects in 3D (like cars or cyclists), and the map segmentation task automatically divides maps into different sections. To do this, we use attention modules to help the computer focus on important items. We create one network to find 3D objects such as cars on a highway, and one network to divide different parts of a map into different regions. Each of the networks utilizes the attention module and its hierarchical attention module to achieve results comparable with the best methods on challenging benchmarks. We name the 3D object detection network the Point Cloud Detection Network (PCDet); it utilizes LiDAR sensors to obtain point cloud inputs with accurate depth information. To solve the problem of lacking multi-scale features and using high-semantic features ineffectively, the proposed PCDet utilizes Hierarchical Double-branch Spatial Attention (HDSA) to capture high-level and low-level features at the same time. PCDet applies the Double-branch Spatial Attention (DSA) at the early and late stages of the network, which helps to use the high-level features at the beginning of the network and to obtain multi-scale features. However, HDSA does not consider global relational information. This limitation is addressed by Hierarchical Residual Graph Convolutional Attention (HRGCA). PCDet applies the HRGCA module, which contains both graph and coordinate information, to not only effectively acquire global information but also efficiently estimate the contextual relationships of that information in the 3D point cloud. We name the map segmentation network Multi-View Segmentation in Bird's-Eye-View (BEVSeg); it utilizes multiple cameras to obtain multi-view image inputs rich in color and texture information. The proposed BEVSeg aims to utilize high-level features effectively and to address the overfitting problems common in map segmentation tasks. Specifically, BEVSeg utilizes an Aligned BEV domain data Augmentation (ABA) module to flip, rotate, and scale the BEV feature map, repeating the same process on its ground truths to address overfitting. It further incorporates the hierarchical attention mechanisms, namely HDSA and HRGCA, to effectively capture high-level and low-level features and to estimate global relationships between different regions in the early and late stages of the network, respectively. In general, the proposed HDSA captures high-level features and helps utilize them effectively in both the LiDAR-based 3D object detection and the multi-camera map segmentation tasks, i.e., PCDet and BEVSeg. In addition, we propose a new, effective HRGCA to further capture global relationships between different regions, improving both 3D object detection accuracy and map segmentation performance.
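
    The abstract does not spell out the DSA branch design, so the following PyTorch block is a generic double-branch spatial attention sketch consistent with the description: one branch summarizes channels by pooling, the other learns a 1x1 projection, and their sum gates the feature map. Treat the branch design as an assumption.

```python
# Generic double-branch spatial attention; the exact DSA design is assumed.
import torch
import torch.nn as nn

class DoubleBranchSpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool_branch = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.proj_branch = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        # branch 1: mean- and max-pooled channel summaries -> spatial map
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        # branch 2: learned per-pixel projection of all channels
        attn = torch.sigmoid(self.pool_branch(pooled) + self.proj_branch(x))
        return x * attn  # spatially re-weighted features

feat = torch.rand(1, 64, 32, 32)
print(DoubleBranchSpatialAttention(64)(feat).shape)  # (1, 64, 32, 32)
```

    Placing one such block near the network input and another before the output heads would mirror the early-stage/late-stage placement described above.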

    Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications

    Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion, and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow-based occlusion reasoning to determine the depth ordinal, ii) object segmentation using improved region-growing from masks of the determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal-based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications.
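
    Once per-object depth estimates exist, the second view can be synthesized by depth-image-based rendering. The sketch below is a simplified, hypothetical warp in Python: each pixel shifts horizontally by a disparity inversely related to its depth, with far pixels painted first so nearer ones win occlusion conflicts. The disparity scaling is an assumption, and hole filling for disocclusions is omitted.

```python
# Simplified DIBR warp for a right-eye view; disparity scaling is assumed.
import numpy as np

def render_right_view(left, depth, max_disp=16):
    """Shift pixels by depth-derived disparity; near pixels overwrite far ones."""
    h, w = depth.shape
    right = np.zeros_like(left)                 # disocclusion holes stay black
    disp = (max_disp * (1.0 - depth / depth.max())).astype(int)  # near = big shift
    order = np.argsort(depth, axis=None)[::-1]  # far pixels first
    for flat in order:
        y, x = divmod(int(flat), w)
        nx = x - disp[y, x]
        if 0 <= nx < w:
            right[y, nx] = left[y, x]
    return right

left = np.random.rand(48, 64, 3)     # toy left view
depth = np.random.rand(48, 64) + 0.1
print(render_right_view(left, depth).shape)  # (48, 64, 3)
```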

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth), so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from the monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a substantially more compact model. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
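
    The gating idea, a larger pooling field for nearby (large-scale) content and a smaller one for distant content, can be illustrated with a hard-selection toy in PyTorch. The paper's module is learned and soft; here depth is simply binned, and the bin thresholds and kernel sizes are assumptions.

```python
# Toy hard-gated multi-scale pooling driven by depth; the real module is learned.
import torch
import torch.nn.functional as F

def depth_gated_pool(feat, depth, scales=(5, 3, 1)):
    """feat: (B,C,H,W); depth in [0,1], small = near. Bin 0 (nearest pixels)
    takes the largest kernel, so small details survive for distant pixels."""
    pooled = [F.avg_pool2d(feat, k, stride=1, padding=k // 2) for k in scales]
    bins = torch.clamp((depth * len(scales)).long(), max=len(scales) - 1)
    out = torch.zeros_like(feat)
    for i, p in enumerate(pooled):
        out = torch.where(bins == i, p, out)  # broadcasts over channels
    return out

feat = torch.rand(1, 8, 16, 16)
depth = torch.rand(1, 1, 16, 16)
print(depth_gated_pool(feat, depth).shape)  # torch.Size([1, 8, 16, 16])
```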