    Holistic indoor scene understanding by context supported instance segmentation

    Intelligent robots require advanced vision capabilities to perceive and interact with the real physical world. While computer vision has made great strides in recent years, its predominant paradigm still focuses on building deep-learning networks or handcrafted features to achieve semantic labeling or instance segmentation separately and independently. However, the two tasks should be synergistically unified in the recognition flow since they are complementary in scene understanding. This dissertation presents the detection of instances at multiple levels of scene understanding, with representations that enable intelligent systems not only to recognize what is seen (e.g., does that pixel represent a chair?) but also to predict contextual information about the complete 3D scene as a whole (e.g., how big is the chair? Is the chair placed next to a table?). More specifically, it presents a flow of understanding from local information to global fitness. First, we investigate the 3D geometry information of instances and present a new approach to generating tight cuboids for objects. Then, we take advantage of trained semantic labeling networks by using the intermediate layer output as a per-category local detector. Instance hypotheses are generated to help traditional optimization methods achieve higher instance segmentation accuracy. After that, to bring the local detection results to holistic scene understanding, our method optimizes object instance segmentation considering both spatial fitness and relational compatibility. The context information is implemented using graphical models that represent scene-level object placement in three ways: horizontal, vertical, and non-placement hanging relations. Finally, the context information is incorporated into a network structure: a deep learning-based re-inferencing framework is proposed to boost any pixel-level labeling output using our local collaborative object presence (LoCOP) feature as global-to-local guidance. This dissertation demonstrates that uniting pixel-level detection and instance segmentation not only significantly improves the overall performance for localized and individualized analysis, but also paves the way for holistic scene understanding.