    Gaussian Processes with Context-Supported Priors for Active Object Localization

    We devise an algorithm using a Bayesian optimization framework in conjunction with contextual visual data for the efficient localization of objects in still images. Recent research has demonstrated substantial progress in object localization and related tasks for computer vision. However, many current state-of-the-art object localization procedures still suffer from inaccuracy and inefficiency, in addition to failing to provide a principled and interpretable system amenable to high-level vision tasks. We address these issues with the current research. Our method encompasses an active search procedure that uses contextual data to generate initial bounding-box proposals for a target object. We train a convolutional neural network to approximate an offset distance from the target object. Next, we use a Gaussian Process to model this offset response signal over the search space of the target. We then employ a Bayesian active search for accurate localization of the target. In experiments, we compare our approach to a state-of-the-art bounding-box regression method for a challenging pedestrian localization task. Our method exhibits a substantial improvement over this baseline regression method. Comment: 10 pages, 4 figures
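
The search loop described in this abstract (a Gaussian Process fitted to an offset signal, then used to choose where to look next) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `offset_fn` is a hypothetical stand-in for the trained network, the squared-exponential kernel and its `length_scale` are assumed, and the lower-confidence-bound acquisition rule is one common choice that the abstract does not specify.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between point sets of shape (n, d) and (m, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_posterior(x_obs, y_obs, x_query, length_scale=1.0, variance=1.0, noise=1e-3):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs, length_scale, variance) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query, length_scale, variance)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_oq.T @ alpha
    v = np.linalg.solve(chol, k_oq)
    var = np.maximum(variance - np.sum(v**2, axis=0), 0.0)
    return mean, np.sqrt(var)

def active_search(offset_fn, candidates, initial_idx, n_steps=20, kappa=2.0,
                  length_scale=20.0):
    """Repeatedly evaluate the candidate with the lowest confidence bound on offset.

    offset_fn is a hypothetical stand-in for the learned offset predictor.
    """
    visited = list(initial_idx)
    offsets = [offset_fn(candidates[i]) for i in visited]
    for _ in range(n_steps):
        y = np.asarray(offsets, dtype=float)
        scale = y.std() if y.std() > 0 else 1.0
        # Standardize the observations so the zero-mean, unit-variance prior fits.
        mean, std = gp_posterior(candidates[visited], (y - y.mean()) / scale,
                                 candidates, length_scale=length_scale)
        lcb = mean - kappa * std      # low predicted offset or high uncertainty
        lcb[visited] = np.inf         # never re-evaluate a visited candidate
        nxt = int(np.argmin(lcb))
        visited.append(nxt)
        offsets.append(offset_fn(candidates[nxt]))
    best = int(np.argmin(offsets))
    return candidates[visited[best]], offsets[best], visited
```

In this sketch `candidates` would be a grid of possible box centers, and `initial_idx` would come from the context-based proposals the abstract mentions.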

    Active Object Localization in Visual Situations

    We describe a method for performing active localization of objects in instances of visual situations. A visual situation is an abstract concept (e.g., "a boxing match", "a birthday party", "walking the dog", "waiting for a bus") whose image instantiations are linked more by their common spatial and semantic structure than by low-level visual similarity. Our system combines given and learned knowledge of the structure of a particular situation, and adapts that knowledge to a new situation instance as it actively searches for objects. More specifically, the system learns a set of probability distributions describing spatial and other relationships among relevant objects. The system uses those distributions to iteratively sample object proposals on a test image, but also continually uses information from those object proposals to adaptively modify the distributions based on what the system has detected. We test our approach's ability to efficiently localize objects, using a situation-specific image dataset created by our group. We compare the results with several baselines and variations on our method, and demonstrate the strong benefit of using situation knowledge and active context-driven localization. Finally, we contrast our method with several other approaches that use context as well as active search for object localization in images. Comment: 14 pages
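
The sample-then-update loop in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the system the abstract describes: locations are modeled as 2-D Gaussians, `score_fn` is a hypothetical detector score in [0, 1], and `relations` is a hypothetical table of learned offsets between object categories.

```python
import numpy as np

def conditioned_prior(anchor_xy, mean_offset, cov):
    """Location distribution for one object, given a detected related object."""
    mean = np.asarray(anchor_xy, dtype=float) + np.asarray(mean_offset, dtype=float)
    return mean, np.asarray(cov, dtype=float)

def active_localize(score_fn, priors, relations, n_iters=200, threshold=0.5, seed=0):
    """Sample proposals per object and tighten the other priors after each detection.

    priors:    {name: (mean_xy, cov)} situation-level location priors (assumed Gaussian).
    relations: {(detected, other): (mean_offset, cov)} hypothetical learned offsets.
    """
    rng = np.random.default_rng(seed)
    current = {name: (np.asarray(m, dtype=float), np.asarray(c, dtype=float))
               for name, (m, c) in priors.items()}
    found = {}
    for _ in range(n_iters):
        pending = [name for name in current if name not in found]
        if not pending:
            break
        for name in pending:
            mean, cov = current[name]
            proposal = rng.multivariate_normal(mean, cov)
            if score_fn(name, proposal) >= threshold:
                found[name] = proposal
                # A detection changes where the remaining objects are expected.
                for other in current:
                    if other not in found and (name, other) in relations:
                        offset, rel_cov = relations[(name, other)]
                        current[other] = conditioned_prior(proposal, offset, rel_cov)
    return found
```

The point of the sketch is the feedback step: each accepted proposal replaces the broad prior of every related, still-missing object with a narrower distribution anchored on the detection.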

    The eyes have it

    Adaptive Gaze Control for Object Detection

    We propose a novel gaze-control model for detecting objects in images. The model, named act-detect, uses the information from local image samples in order to shift its gaze towards object locations. The model makes two main contributions. The first contribution is that the model’s setup makes it computationally highly efficient in comparison with existing window-sliding methods for object detection, while retaining an acceptable detection performance. act-detect is evaluated on a face-detection task using a publicly available image set. In terms of detection performance, act-detect slightly outperforms the window-sliding methods that have been applied to the face-detection task. In terms of computational efficiency, act-detect clearly outperforms the window-sliding methods: it requires on the order of hundreds fewer samples for detection. The second contribution of the model lies in its more extensive use of local samples than previous models: instead of merely using them to verify object presence at the gaze location, the model uses them to determine a direction and distance to the object of interest. The simultaneous adaptation of both the model’s visual features and its gaze-control strategy leads to the discovery of features and strategies for exploiting the local context of objects. For example, the model uses the spatial relations between the bodies of the persons in the images and their faces. The resulting gaze control is a temporal process, in which the object’s context is exploited at different scales and at different image locations relative to the object.
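
The idea that each local sample yields a direction and distance to the object, rather than only a presence check, can be illustrated with a short loop. This is a sketch under assumptions and not the act-detect model itself: `predict_shift` is a hypothetical function standing in for the learned mapping from a local sample to a gaze shift, and the stopping rule is assumed.

```python
import numpy as np

def gaze_search(predict_shift, start_xy, image_shape, max_steps=10, stop_dist=1.0):
    """Follow predicted gaze shifts until the predicted shift becomes negligible.

    predict_shift: hypothetical callable mapping a gaze position (x, y) to a
                   displacement (dx, dy) towards the object.
    image_shape:   (height, width) of the image, used to keep the gaze in bounds.
    """
    gaze = np.asarray(start_xy, dtype=float)
    path = [gaze.copy()]
    upper = [image_shape[1] - 1, image_shape[0] - 1]
    for _ in range(max_steps):
        shift = np.asarray(predict_shift(gaze), dtype=float)
        if np.linalg.norm(shift) < stop_dist:
            break  # the sample suggests the object is at the current gaze location
        gaze = np.clip(gaze + shift, 0, upper)
        path.append(gaze.copy())
    return gaze, path
```

Each iteration costs one local sample, so the length of `path` gives the number of samples used, which is the quantity the abstract compares against window-sliding methods.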