    Fast Graph-Based Object Segmentation for RGB-D Images

    Object segmentation is an important capability for robotic systems, in particular for grasping. We present a graph-based approach for segmenting simple objects from RGB-D images. We are interested in segmenting objects that vary widely in appearance, from textureless to strongly textured, for the task of robotic grasping. The algorithm does not rely on image features or machine learning. We propose a modified Canny edge detector that uses depth information to extract robust edges, and two simple cost functions for combining color and depth cues. The cost functions are used to build an undirected graph, which is partitioned using the concept of internal and external differences between graph regions. The partitioning is fast, with O(N log N) complexity. We also discuss ways to deal with missing depth information. We test the approach on different publicly available RGB-D object datasets, such as the Rutgers APC RGB-D dataset and the RGB-D Object Dataset, and compare the results with other existing methods.
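
    The abstract names the internal/external-difference partitioning criterion (the Felzenszwalb-Huttenlocher scheme) but not its exact cost functions, so the sketch below is a minimal, hypothetical Python version: a 4-connected image graph whose edge weights mix color and depth differences through an assumed blend parameter `alpha`, merged by sorted-edge union-find. Sorting the edges dominates the run time, which is where the O(N log N) complexity mentioned above comes from.

```python
# Hypothetical sketch of internal/external-difference region merging on an
# RGB-D image graph; the color/depth blend and threshold k are assumptions.
import numpy as np

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # largest merging-edge weight per component

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)

def segment(color, depth, k=300.0, alpha=0.7):
    """Merge pixels whose external difference stays below both internal ones."""
    h, w = depth.shape
    edges = []
    for y in range(h):
        for x in range(w):
            for ny, nx in ((y, x + 1), (y + 1, x)):  # 4-connected grid
                if ny < h and nx < w:
                    dc = np.linalg.norm(color[y, x] - color[ny, nx])
                    dd = abs(depth[y, x] - depth[ny, nx])
                    edges.append((alpha * dc + (1 - alpha) * dd,
                                  y * w + x, ny * w + nx))
    edges.sort()  # O(N log N): this sort dominates the run time
    ds = DisjointSet(h * w)
    for wgt, a, b in edges:
        ra, rb = ds.find(a), ds.find(b)
        if ra != rb and wgt <= min(ds.internal[ra] + k / ds.size[ra],
                                   ds.internal[rb] + k / ds.size[rb]):
            ds.union(ra, rb, wgt)
    labels = [ds.find(i) for i in range(h * w)]
    return np.array(labels).reshape(h, w)

rgb = np.random.rand(20, 20, 3)   # toy inputs
d = np.random.rand(20, 20)
print(len(np.unique(segment(rgb, d))), "regions")
```

    A real implementation would add the paper's depth-aware Canny edge handling and a strategy for pixels with missing depth; both are omitted here.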

    Automated Semantic Content Extraction from Images

    In this study, an automatic semantic segmentation and object recognition methodology is implemented that bridges the semantic gap between low-level features of image content and high-level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing, or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image, we propose a new object recognition framework with four major components: segmentation, scene detection, conceptual cueing, and object recognition. The new segmentation methodology developed in this research extends Felzenszwalb's cost function to include new surface-index and depth features, as well as color, texture, and normal features, to overcome the issues of occlusion and shadowing commonly found in images. Adding depth allows the object recognition stage to capture new features and achieve high accuracy compared to the current state of the art. The goal was to develop an approach to capture and label perceptually important regions, which often reflect a global representation and understanding of the image. We developed a system that uses contextual and common-sense information to improve object recognition and scene detection, and fuses the information from scenes and objects to reduce the level of uncertainty. In addition to improving segmentation, scene detection, and object recognition, this study can be used in applications that require physical parsing of the image into objects, surfaces, and their relations. The applications include robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation, a structural framework (ontology) is developed that generates text descriptions based on an understanding of the objects, structures, and attributes of an image.
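
    One concrete component the abstract describes is fusing scene-level and object-level evidence to reduce uncertainty. The snippet below is a loose, hypothetical Python illustration of that idea: raw detector scores are re-weighted by an assumed scene-object compatibility table. The table values, helper names, and normalization are all inventions for illustration, not the dissertation's actual model.

```python
# Hypothetical fusion of scene context with per-object detector scores;
# the compatibility table below is made up for illustration.
COMPATIBILITY = {
    "kitchen": {"stove": 0.90, "bed": 0.05, "sofa": 0.10},
    "bedroom": {"stove": 0.05, "bed": 0.90, "sofa": 0.30},
}

def fuse(scene_probs, object_scores):
    """Re-weight raw detector scores by scene context, then renormalize."""
    fused = {}
    for obj, score in object_scores.items():
        context = sum(p * COMPATIBILITY[scene].get(obj, 0.01)
                      for scene, p in scene_probs.items())
        fused[obj] = score * context
    total = sum(fused.values()) or 1.0
    return {obj: v / total for obj, v in fused.items()}

# a "bed" detection in a probable kitchen is suppressed relative to "stove"
print(fuse({"kitchen": 0.8, "bedroom": 0.2},
           {"stove": 0.50, "bed": 0.45, "sofa": 0.40}))
```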

    SeGAN: Segmenting and Generating the Invisible

    Objects often occlude each other in scenes; inferring their appearance beyond their visible parts plays an important role in scene understanding, depth estimation, object interaction, and manipulation. In this paper, we study the challenging problem of completing the appearance of occluded objects. Doing so requires knowing which pixels to paint (segmenting the invisible parts of objects) and what color to paint them (generating the invisible parts). Our proposed novel solution, SeGAN, jointly optimizes for both segmentation and generation of the invisible parts of objects. Our experimental results show that: (a) SeGAN can learn to generate the appearance of the occluded parts of objects; (b) SeGAN outperforms state-of-the-art segmentation baselines for the invisible parts of objects; (c) trained on synthetic photorealistic images, SeGAN can reliably segment natural images; (d) by reasoning about occluder-occludee relations, our method can infer depth layering. Comment: Accepted to CVPR 2018 as a spotlight.
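
    The paper's key idea, jointly optimizing segmentation (which pixels to paint) and generation (what color to paint them), can be outlined as a toy PyTorch training step. Everything below, the layer sizes, the shared encoder, and the particular losses, is an assumed stand-in for illustration; SeGAN's actual architecture also includes an adversarial discriminator that this sketch omits.

```python
# Toy joint segmentation + generation step; not SeGAN's actual architecture.
import torch
import torch.nn as nn

class JointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(16, 1, 1)  # logits: which pixels to paint
        self.gen_head = nn.Conv2d(16, 3, 1)  # RGB: what color to paint them

    def forward(self, x):
        feat = self.encoder(x)
        return self.seg_head(feat), self.gen_head(feat)

net = JointNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

x = torch.rand(2, 3, 64, 64)                 # occluded input (dummy batch)
mask_gt = torch.rand(2, 1, 64, 64).round()   # invisible-region ground truth
rgb_gt = torch.rand(2, 3, 64, 64)            # full-appearance ground truth

opt.zero_grad()
mask_logits, rgb_pred = net(x)
# one joint objective covers both sub-tasks; a GAN loss would be added here
loss = bce(mask_logits, mask_gt) + l1(rgb_pred * mask_gt, rgb_gt * mask_gt)
loss.backward()
opt.step()
```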

    Deep Learning With Effective Hierarchical Attention Mechanisms in Perception of Autonomous Vehicles

    Autonomous vehicles need to gather and understand information from their surroundings to drive safely. Just as we look around and understand what's happening on the road, these vehicles need to see and make sense of dynamic objects like other cars, pedestrians, and cyclists, and static objects like crosswalks, road barriers, and stop lines. In this dissertation, we aim to find better ways for computers to understand their surroundings in the 3D object detection task and the map segmentation task. The 3D object detection task automatically spots objects in 3D (like cars or cyclists), and the map segmentation task automatically divides maps into different sections. To do this, we use attention modules to help the computer focus on important items. We create one network to find 3D objects such as cars on a highway, and one network to divide different parts of a map into different regions. Each of the networks utilizes the attention module and its hierarchical attention module to achieve results comparable with the best methods on challenging benchmarks. We name the 3D object detection network the Point Cloud Detection Network (PCDet); it utilizes LiDAR sensors to obtain point cloud inputs with accurate depth information. To solve the problem of lacking multi-scale features and using high-semantic features ineffectively, the proposed PCDet utilizes Hierarchical Double-branch Spatial Attention (HDSA) to capture high-level and low-level features at the same time. PCDet applies the Double-branch Spatial Attention (DSA) at the early and late stages of the network, which helps to use the high-level features at the beginning of the network and to obtain multi-scale features. However, HDSA does not consider global relational information. This limitation is addressed by Hierarchical Residual Graph Convolutional Attention (HRGCA). PCDet applies the HRGCA module, which contains both graph and coordinate information, to not only effectively acquire global information but also efficiently estimate the contextual relationships of that information in the 3D point cloud. We name the map segmentation network Multi-View Segmentation in Bird's-Eye-View (BEVSeg); it utilizes multiple cameras to obtain multi-view image inputs rich in color and texture information. The proposed BEVSeg aims to utilize high-level features effectively and to address the overfitting problems common in map segmentation tasks. Specifically, BEVSeg utilizes an Aligned BEV domain data Augmentation (ABA) module to flip, rotate, and scale the BEV feature map, repeating the same process on its ground truths to address overfitting. It further incorporates the hierarchical attention mechanisms, namely HDSA and HRGCA, to effectively capture high-level and low-level features and to estimate global relationships between different regions in the early and late stages of the network, respectively. In general, the proposed HDSA captures high-level features and helps utilize them effectively in both the LiDAR-based 3D object detection and the multi-camera map segmentation tasks, i.e., PCDet and BEVSeg. In addition, we propose a new, effective HRGCA to further capture global relationships between different regions, improving both 3D object detection accuracy and map segmentation performance.
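
    The abstract does not spell out the DSA branch design, so the following PyTorch block is a generic double-branch spatial attention sketch consistent with the description: one branch summarizes channels by pooling, the other learns a 1x1 projection, and their sum gates the feature map. Treat the branch design as an assumption.

```python
# Generic double-branch spatial attention; the exact DSA design is assumed.
import torch
import torch.nn as nn

class DoubleBranchSpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.pool_branch = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.proj_branch = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):
        # branch 1: mean- and max-pooled channel summaries -> spatial map
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        # branch 2: learned per-pixel projection of all channels
        attn = torch.sigmoid(self.pool_branch(pooled) + self.proj_branch(x))
        return x * attn  # spatially re-weighted features

feat = torch.rand(1, 64, 32, 32)
print(DoubleBranchSpatialAttention(64)(feat).shape)  # (1, 64, 32, 32)
```

    Placing one such block near the network input and another before the output heads would mirror the early-stage/late-stage placement described above.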

    Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications

    Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences in comparison to conventional two-dimensional (2D) TV. However, its application has been constrained by the lack of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the huge media resources available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues like focus blur, motion, and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and appear inconsistent with the real scenes. To help solve this problem, a novel method for object-based stereoscopic video generation is proposed which features i) optical-flow-based occlusion reasoning to determine the depth ordinal, ii) object segmentation using improved region-growing from masks of the determined depth layers, and iii) a hybrid depth estimation scheme using content-based matching (inside a small library of true stereo image pairs) and depth-ordinal-based regularization. Comprehensive experiments have validated the effectiveness of our proposed 2D-to-3D conversion method in generating stereoscopic videos of consistent depth measurements for 3D-TV applications.
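
    Once per-object depth estimates exist, the second view can be synthesized by depth-image-based rendering. The sketch below is a simplified, hypothetical warp in Python: each pixel shifts horizontally by a disparity inversely related to its depth, with far pixels painted first so nearer ones win occlusion conflicts. The disparity scaling is an assumption, and hole filling for disocclusions is omitted.

```python
# Simplified DIBR warp for a right-eye view; disparity scaling is assumed.
import numpy as np

def render_right_view(left, depth, max_disp=16):
    """Shift pixels by depth-derived disparity; near pixels overwrite far ones."""
    h, w = depth.shape
    right = np.zeros_like(left)                 # disocclusion holes stay black
    disp = (max_disp * (1.0 - depth / depth.max())).astype(int)  # near = big shift
    order = np.argsort(depth, axis=None)[::-1]  # far pixels first
    for flat in order:
        y, x = divmod(int(flat), w)
        nx = x - disp[y, x]
        if 0 <= nx < w:
            right[y, nx] = left[y, x]
    return right

left = np.random.rand(48, 64, 3)     # toy left view
depth = np.random.rand(48, 64) + 0.1
print(render_right_view(left, depth).shape)  # (48, 64, 3)
```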

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth), so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from the monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a substantially more compact model. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
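
    The gating idea, a larger pooling field for nearby (large-scale) content and a smaller one for distant content, can be illustrated with a hard-selection toy in PyTorch. The paper's module is learned and soft; here depth is simply binned, and the bin thresholds and kernel sizes are assumptions.

```python
# Toy hard-gated multi-scale pooling driven by depth; the real module is learned.
import torch
import torch.nn.functional as F

def depth_gated_pool(feat, depth, scales=(5, 3, 1)):
    """feat: (B,C,H,W); depth in [0,1], small = near. Bin 0 (nearest pixels)
    takes the largest kernel, so small details survive for distant pixels."""
    pooled = [F.avg_pool2d(feat, k, stride=1, padding=k // 2) for k in scales]
    bins = torch.clamp((depth * len(scales)).long(), max=len(scales) - 1)
    out = torch.zeros_like(feat)
    for i, p in enumerate(pooled):
        out = torch.where(bins == i, p, out)  # broadcasts over channels
    return out

feat = torch.rand(1, 8, 16, 16)
depth = torch.rand(1, 1, 16, 16)
print(depth_gated_pool(feat, depth).shape)  # torch.Size([1, 8, 16, 16])
```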