3 research outputs found

    Source-free Depth for Object Pop-out

    Full text link
    Depth cues are known to be useful for visual perception. However, direct measurement of depth is often impractical. Fortunately, modern learning-based methods offer promising depth maps by inference in the wild. In this work, we adapt such depth inference models for object segmentation using the objects' “pop-out” prior in 3D. The “pop-out” prior is a simple composition prior which assumes that objects reside on the background surface. Such a compositional prior allows us to reason about objects in 3D space. More specifically, we adapt the inferred depth maps so that objects can be localized using only 3D information. This separation, however, requires knowledge of the contact surface, which we learn using the weak supervision of the segmentation mask. Our intermediate representation of the contact surface, and thereby our reasoning about objects purely in 3D, allows us to better transfer depth knowledge into semantics. The proposed adaptation method uses only the depth model, without needing the source data used for training, which makes the learning process efficient and practical. Our experiments on eight datasets across two challenging tasks, namely camouflaged object detection and salient object detection, consistently demonstrate the benefit of our method in terms of both performance and generalizability.
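    To make the pop-out prior concrete, here is a minimal Python/NumPy sketch under a strong simplifying assumption: the contact surface is approximated by a single least-squares plane fit to the inferred depth map (the paper instead learns this surface with weak supervision), and pixels that sit closer to the camera than the fitted plane by more than a threshold are marked as object. The function name popout_segmentation and the threshold value are hypothetical.

    import numpy as np

    def popout_segmentation(depth, thresh=0.05):
        # Sketch of the pop-out prior: objects rest on a background
        # surface, so they appear closer to the camera than it does.
        # ASSUMPTION: the contact surface is a least-squares plane;
        # the paper learns it from weak segmentation supervision.
        h, w = depth.shape
        ys, xs = np.mgrid[0:h, 0:w]
        # Fit the plane z = a*x + b*y + c to the depth map.
        A = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=1)
        coeffs, *_ = np.linalg.lstsq(A, depth.ravel(), rcond=None)
        surface = (A @ coeffs).reshape(h, w)
        # Pixels closer than the fitted surface "pop out" as objects.
        return (surface - depth) > thresh

    Such a mask can be computed from any off-the-shelf monocular depth network's output, which is what makes the adaptation source-free: only the depth model's predictions are needed, never its training data.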

    Object Discovery with a Mobile Robot

    Get PDF
    The world is full of objects: cups, phones, computers, books, and countless other things. For many tasks, robots need to understand that this object is a stapler, that object is a textbook, and this other object is a gallon of milk. The classic approach to this problem is object recognition, which classifies each observation into one of several previously-defined classes. While modern object recognition algorithms perform well, they require extensive supervised training: in a standard benchmark, the training data average more than four hundred images of each object class.

    The cost of manually labeling the training data prohibits these techniques from scaling to general environments. Homes and workplaces can contain hundreds of unique objects, and the objects in one environment may not appear in another.

    We propose a different approach: object discovery. Rather than rely on manual labeling, we describe unsupervised algorithms that leverage the unique capabilities of a mobile robot to discover the objects (and classes of objects) in an environment. Because our algorithms are unsupervised, they scale gracefully to large, general environments over long periods of time. To validate our results, we collected 67 robotic runs through a large office environment. This dataset, which we have made available to the community, is the largest of its kind.

    At each step, we treat the problem as one of robotics, not disembodied computer vision. The scale and quality of our results demonstrate the merit of this perspective, and prove the practicality of long-term large-scale object discovery.

    Dissertation

    Image composition for object pop-out

    No full text
    We propose a new data-driven framework for novel object detection and segmentation, or “object pop-out”. Traditionally, this task is approached via background subtraction, which requires continuous observation from a stationary camera. Instead, we consider this an image matching problem. We detect novel objects in the scene using an unordered, sparse database of previously captured images of the same general environment. The problem is formulated in a new image composition framework: 1) given an input image, we find a small set of similar matching images; 2) each of the matches is aligned with the input by proposing a set of homography transformations; 3) regions from different transformed matches are stitched together into a single composite image that best matches the input; 4) the difference between the input and the composite is used to “pop out” new or changed objects.
    Figure 1. Given a single input image (a), we are able to “explain” it with bits and pieces of similar images taken previously (b), so as to generate a faithful representation of the input image (c) and detect the novel object (d).
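    The four-step pipeline can be sketched compactly in Python with OpenCV, assuming the matching images from step 1 have already been retrieved and that all images are grayscale uint8 arrays of equal size. ORB features with RANSAC homography estimation stand in for the paper's alignment procedure, and a greedy per-pixel composite stands in for its stitching step; the function names and the threshold are hypothetical.

    import cv2
    import numpy as np

    def align_match(input_img, match_img):
        # Step 2: align one database match to the input via a homography.
        # ASSUMPTION: ORB + RANSAC; the paper proposes a set of
        # homography transformations per matching image.
        orb = cv2.ORB_create(2000)
        kp_in, des_in = orb.detectAndCompute(input_img, None)
        kp_m, des_m = orb.detectAndCompute(match_img, None)
        bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        pairs = bf.match(des_m, des_in)  # match image -> input image
        src = np.float32([kp_m[p.queryIdx].pt for p in pairs]).reshape(-1, 1, 2)
        dst = np.float32([kp_in[p.trainIdx].pt for p in pairs]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        h, w = input_img.shape
        return cv2.warpPerspective(match_img, H, (w, h))

    def pop_out(input_img, match_imgs, thresh=40):
        # Steps 3-4: stitch a composite that best matches the input by
        # taking, per pixel, the aligned value closest to the input,
        # then threshold the input-composite difference.
        aligned = np.stack([align_match(input_img, m) for m in match_imgs])
        diffs = np.abs(aligned.astype(np.int16) - input_img.astype(np.int16))
        best = np.argmin(diffs, axis=0)  # (H, W) index of best match per pixel
        composite = np.take_along_axis(aligned, best[None], axis=0)[0]
        residual = np.abs(composite.astype(np.int16) - input_img.astype(np.int16))
        return residual > thresh, composite

    Choosing the per-pixel aligned value closest to the input makes the composite “explain” as much of the input as possible, so the remaining residual concentrates on genuinely new or changed objects.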