
    SmartAnnotator An Interactive Tool for Annotating Indoor RGBD Images

    Get PDF
    RGBD images with high-quality annotations, both in the form of geometric (i.e., segmentation) and structural (i.e., how the segments mutually relate in 3D) information, provide valuable priors for a diverse range of applications in scene understanding and image manipulation. While it is now simple to acquire RGBD images, annotating them, automatically or manually, remains challenging. We present SmartAnnotator, an interactive system to facilitate annotating raw RGBD images. The system performs the tedious tasks of grouping pixels, creating potential abstracted cuboids, and inferring object interactions in 3D, and generates an ordered list of hypotheses. The user simply flips through the suggested segment labels, finalizes a selection, and the system updates the remaining hypotheses. As annotations are finalized, the process becomes simpler, with fewer ambiguities to resolve. Moreover, as more scenes are annotated, the system makes better suggestions based on the structural and geometric priors learned from previous annotation sessions. We test the system on a large number of indoor scenes across different users and experimental settings, validate the results on existing benchmark datasets, and report significant improvements over low-level annotation alternatives. Code and benchmark datasets are publicly available on the project page.
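
    The interactive loop described above (ranked label hypotheses that are re-scored as the user confirms annotations) can be illustrated with a small sketch. This is a hypothetical toy, not the authors' implementation; the class, the co-occurrence prior, and all scores are invented for illustration.

        # Toy sketch of a hypothesis-driven annotation loop: each segment keeps a
        # ranked list of label hypotheses, and finalizing one label re-scores the
        # remaining segments with a simple label co-occurrence prior.
        from collections import defaultdict

        class AnnotationSession:
            def __init__(self, segments, cooccurrence):
                # segments: {segment_id: {label: score}} initial hypothesis scores
                # cooccurrence: {(label_a, label_b): weight}, e.g. learned from past sessions
                self.hypotheses = {s: dict(scores) for s, scores in segments.items()}
                self.cooccurrence = defaultdict(float, cooccurrence)
                self.finalized = {}

            def ranked(self, segment_id):
                # Ordered hypothesis list the user flips through.
                return sorted(self.hypotheses[segment_id].items(),
                              key=lambda kv: kv[1], reverse=True)

            def finalize(self, segment_id, label):
                # User confirms a label; boost compatible labels on the other segments.
                self.finalized[segment_id] = label
                del self.hypotheses[segment_id]
                for scores in self.hypotheses.values():
                    for other in scores:
                        scores[other] += self.cooccurrence[(label, other)]

        session = AnnotationSession(
            {"seg1": {"bed": 0.6, "sofa": 0.4}, "seg2": {"pillow": 0.5, "box": 0.5}},
            {("bed", "pillow"): 0.3})
        session.finalize("seg1", "bed")
        print(session.ranked("seg2"))  # "pillow" now outranks "box"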

    A LiDAR Point Cloud Generator: from a Virtual World to Autonomous Driving

    Full text link
    3D LiDAR scanners are playing an increasingly important role in autonomous driving, as they can generate depth information about the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms, which are often data-hungry. We present a framework to rapidly create point clouds with accurate point-level labels from a computer game. The framework supports data collection from both auto-driving scenes and user-configured scenes. Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network, with the resulting falsifying examples used to make the network more robust through retraining. In addition, scene images can be captured simultaneously for sensor fusion tasks, and a method is proposed for automatic calibration between the point clouds and the captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show that, by testing and retraining the network on point clouds from user-configured scenes, the weaknesses and blind spots of the neural network can be fixed.
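
    The sensor fusion step relies on a calibration between the LiDAR and the camera. The paper proposes its own automatic calibration method; the sketch below only shows the standard way a calibrated extrinsic/intrinsic pair is then used to project LiDAR points into the image plane. The matrices and points are placeholder values.

        # Minimal sketch of LiDAR-to-image projection with known calibration:
        # transform points into the camera frame with the extrinsic matrix,
        # then apply the pinhole intrinsics and a perspective divide.
        import numpy as np

        def project_lidar_to_image(points_xyz, extrinsic, intrinsic):
            """points_xyz: (N, 3) LiDAR points; returns (N, 2) pixel coordinates."""
            n = points_xyz.shape[0]
            homogeneous = np.hstack([points_xyz, np.ones((n, 1))])   # (N, 4)
            cam = (extrinsic @ homogeneous.T)[:3]                    # (3, N) camera frame
            pix = intrinsic @ cam                                    # (3, N)
            return (pix[:2] / pix[2]).T                              # perspective divide

        extrinsic = np.eye(4)                       # placeholder LiDAR->camera pose
        intrinsic = np.array([[720.0,   0.0, 640.0],  # placeholder pinhole intrinsics
                              [  0.0, 720.0, 360.0],
                              [  0.0,   0.0,   1.0]])
        points = np.array([[2.0, 0.5, 1.0], [5.0, -1.0, 0.2]])
        print(project_lidar_to_image(points, extrinsic, intrinsic))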

    View suggestion for interactive segmentation of indoor scenes

    Get PDF
    Point cloud segmentation is a fundamental problem. Due to the complexity of real-world scenes and the limitations of 3D scanners, interactive segmentation is currently the only way to cope with all kinds of point clouds. However, interactively segmenting complex and large-scale scenes is very time-consuming. In this paper, we present a novel interactive system for segmenting point cloud scenes. Our system automatically suggests a series of camera views in which users can conveniently specify segmentation guidance. In this way, users can focus on specifying segmentation hints instead of manually searching for desirable views of unsegmented objects, thus significantly reducing user effort. To achieve this, we introduce a novel view preference model based on a set of dedicated view attributes, with weights learned from a user study. We also introduce support relations for both graph-cut-based segmentation and finding similar objects. Our experiments show that our segmentation technique helps users quickly segment various types of scenes, outperforming alternative methods.
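
    The view preference model is described as a set of dedicated view attributes with weights learned from a user study. The sketch below is one plausible reading of that idea, not the paper's model: candidate views are scored by a weighted sum of attributes, with weights fit to user ratings by least squares. The attribute names, ratings, and fitting procedure are illustrative assumptions.

        # Toy view preference model: linear score over view attributes,
        # with weights fit to user-study ratings.
        import numpy as np

        def fit_weights(attribute_matrix, user_ratings):
            # attribute_matrix: (num_views, num_attributes); user_ratings: (num_views,)
            weights, *_ = np.linalg.lstsq(attribute_matrix, user_ratings, rcond=None)
            return weights

        def suggest_view(candidate_attributes, weights):
            scores = candidate_attributes @ weights
            return int(np.argmax(scores)), scores

        # Made-up attributes per view: [visible unsegmented area, object centredness, occlusion]
        study_views = np.array([[0.9, 0.7, 0.1],
                                [0.4, 0.9, 0.5],
                                [0.2, 0.3, 0.8]])
        study_ratings = np.array([0.95, 0.6, 0.2])   # preferences from a (toy) user study
        w = fit_weights(study_views, study_ratings)

        candidates = np.array([[0.8, 0.5, 0.2], [0.3, 0.8, 0.6]])
        best, scores = suggest_view(candidates, w)
        print(best, scores)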

    Shallow2Deep: Indoor scene modeling by single image understanding

    Get PDF
    Dense indoor scene modeling from 2D images has been bottlenecked by the absence of depth information and cluttered occlusions. We present an automatic indoor scene modeling approach using deep features from neural networks. Given a single RGB image, our method simultaneously recovers semantic content, 3D geometry and object relationships by reasoning about the indoor environment context. In particular, we design a shallow-to-deep architecture based on convolutional networks for semantic scene understanding and modeling. It involves multi-level convolutional networks that parse indoor semantics and geometry into non-relational and relational knowledge. Non-relational knowledge extracted from shallow-end networks (e.g. room layout, object geometry) is fed forward into deeper levels to parse relational semantics (e.g. support relationships). A Relation Network is proposed to infer the support relationship between objects. All the structured semantics and geometry above are assembled to guide a global optimization for 3D scene modeling. Qualitative and quantitative analysis demonstrates the feasibility of our method in understanding and modeling semantics-enriched indoor scenes, evaluated in terms of reconstruction accuracy, computational performance and scene complexity.
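
    The Relation Network reasons over pairs of detected objects to predict support relations. As a rough illustration only (not the paper's architecture, features, or trained weights), the sketch below scores every ordered pair of object feature vectors with one shared two-layer perceptron.

        # Toy pairwise support scorer: concatenate the features of object i and
        # object j and pass them through a shared MLP; weights are random
        # placeholders, whereas a real model would be trained on annotated scenes.
        import numpy as np

        rng = np.random.default_rng(0)
        feat_dim, hidden = 8, 16
        W1 = rng.normal(size=(2 * feat_dim, hidden))
        W2 = rng.normal(size=(hidden, 1))

        def support_scores(object_features):
            # object_features: (num_objects, feat_dim) per-object descriptors
            n = object_features.shape[0]
            scores = np.zeros((n, n))
            for i in range(n):
                for j in range(n):
                    if i == j:
                        continue
                    pair = np.concatenate([object_features[i], object_features[j]])
                    hidden_act = np.maximum(pair @ W1, 0.0)              # ReLU
                    scores[i, j] = 1.0 / (1.0 + np.exp(-(hidden_act @ W2)[0]))
            return scores  # scores[i, j] ~ probability that object i rests on object j

        objects = rng.normal(size=(3, feat_dim))   # e.g. lamp, table, floor descriptors
        print(support_scores(objects).round(2))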

    Holistic indoor scene understanding, modelling and reconstruction from single images.

    Get PDF
    3D indoor scene understanding in computer vision refers to perceiving the semantic and geometric information of a 3D indoor environment from partial observations (e.g. images or depth scans). Semantics in a scene generally involves conceptual knowledge such as the room layout, object categories, and their interrelationships (e.g. support relationships). These scene semantics are usually coupled with object and room geometry for 3D scene understanding, for example the layout plan (i.e. the location of walls, ceiling and floor), the shapes of in-room objects, and the camera pose of the observer. This thesis focuses on the problem of holistic 3D scene understanding from single images to model or reconstruct the indoor geometry with enriched scene semantics. This challenging task requires computers to perform on par with the human visual system in perceiving and understanding indoor contents from colour intensities. Existing works either focus on a sub-problem (e.g. layout estimation, 3D detection or object reconstruction) or address the entire problem with independent subtasks, while this thesis aims at an integrated and unified solution towards semantic scene understanding and reconstruction. In this thesis, scene semantics and geometry are regarded as intertwined and complementary: understanding each part (semantics or geometry) helps to perceive the other, which enables joint scene understanding, modelling and reconstruction. We start with the problem of semantic scene modelling. To estimate object semantics and shapes from a single image, a feasible scene modelling pipeline is proposed. It is built on fully convolutional networks to learn 2D semantics and geometry, and powered by top-down shape retrieval for object modelling. After this, we build a unified and more efficient visual system for semantic scene modelling. Scene semantics are divided into relational (i.e. support relationships) and non-relational (i.e. object segmentation and geometry, room layout) knowledge. A Relation Network is proposed to estimate the support relations between objects to guide the object modelling process. Afterwards, we focus on the problem of holistic, end-to-end scene understanding and reconstruction. Instead of modelling scenes by top-down shape retrieval, this method bridges the gap between scene understanding and object mesh reconstruction, without relying on any external CAD repositories. Camera poses, room layout, object bounding boxes and meshes are predicted end-to-end from an RGB image with a single network architecture. Finally, we extend our work to a different input modality, single-view depth scans, to explore object reconstruction performance. A skeleton-bridged approach is proposed to predict the meso-skeleton of a shape as an intermediate representation to guide surface reconstruction, outperforming prior art in shape completion. Overall, this thesis provides a series of novel approaches towards holistic 3D indoor scene understanding, modelling and reconstruction. It aims at automatic 3D scene perception that enables machines to understand and predict 3D contents as human vision does, which we hope will advance the boundaries of 3D vision in machine perception, robotics and Artificial Intelligence.
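
    The end-to-end stage predicts camera pose, room layout, object boxes and meshes from one RGB image in a single forward pass. A hypothetical container for that kind of holistic output might look as follows; the field names and shapes are illustrative assumptions, not the thesis' data structures.

        # Hypothetical holistic scene prediction returned for one RGB image.
        from dataclasses import dataclass, field
        from typing import List
        import numpy as np

        @dataclass
        class ScenePrediction:
            camera_pose: np.ndarray                     # 4x4 world-from-camera matrix
            layout_box: np.ndarray                      # room layout as 8 box corners (8, 3)
            object_boxes: List[np.ndarray] = field(default_factory=list)   # per-object 3D boxes
            object_meshes: List[np.ndarray] = field(default_factory=list)  # per-object vertex arrays

        prediction = ScenePrediction(camera_pose=np.eye(4),
                                     layout_box=np.zeros((8, 3)))
        print(prediction.camera_pose.shape, prediction.layout_box.shape)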