281,932 research outputs found

    Research in interactive scene analysis

    Get PDF
    An interactive scene interpretation system (ISIS) was developed as a tool for constructing and experimenting with man-machine and automatic scene analysis methods tailored for particular image domains. A recently developed region analysis subsystem based on the paradigm of Brice and Fennema is described. Using this subsystem a series of experiments was conducted to determine good criteria for initially partitioning a scene into atomic regions and for merging these regions into a final partition of the scene along object boundaries. Semantic (problem-dependent) knowledge is essential for complete, correct partitions of complex real-world scenes. An interactive approach to semantic scene segmentation was developed and demonstrated on both landscape and indoor scenes. This approach provides a reasonable methodology for segmenting scenes that cannot be processed completely automatically, and is a promising basis for a future automatic system. A program is described that can automatically generate strategies for finding specific objects in a scene based on manually designated pictorial examples

    A knowledge-based machine vision system for space station automation

    Get PDF
    A simple knowledge-based approach to the recognition of objects in man-made scenes is being developed. Specifically, the system under development is a proposed enhancement to a robot arm for use in the space station laboratory module. The system will take a request from a user to find a specific object, and locate that object by using its camera input and information from a knowledge base describing the scene layout and attributes of the object types included in the scene. In order to use realistic test images in developing the system, researchers are using photographs of actual NASA simulator panels, which provide similar types of scenes to those expected in the space station environment. Figure 1 shows one of these photographs. In traditional approaches to image analysis, the image is transformed step by step into a symbolic representation of the scene. Often the first steps of the transformation are done without any reference to knowledge of the scene or objects. Segmentation of an image into regions generally produces a counterintuitive result in which regions do not correspond to objects in the image. After segmentation, a merging procedure attempts to group regions into meaningful units that will more nearly correspond to objects. Here, researchers avoid segmenting the image as a whole, and instead use a knowledge-directed approach to locate objects in the scene. The knowledge-based approach to scene analysis is described and the categories of knowledge used in the system are discussed

    Fusion of monocular cues to detect man-made structures in aerial imagery

    Get PDF
    The extraction of buildings from aerial imagery is a complex problem for automated computer vision. It requires locating regions in a scene that possess properties distinguishing them as man-made objects as opposed to naturally occurring terrain features. It is reasonable to assume that no single detection method can correctly delineate or verify buildings in every scene. A cooperative-methods paradigm is useful in approaching the building extraction problem. Using this paradigm, each extraction technique provides information which can be added or assimilated into an overall interpretation of the scene. Thus, the main objective is to explore the development of computer vision system that integrates the results of various scene analysis techniques into an accurate and robust interpretation of the underlying three dimensional scene. The problem of building hypothesis fusion in aerial imagery is discussed. Building extraction techniques are briefly surveyed, including four building extraction, verification, and clustering systems. A method for fusing the symbolic data generated by these systems is described, and applied to monocular image and stereo image data sets. Evaluation methods for the fusion results are described, and the fusion results are analyzed using these methods

    Solving Visual Madlibs with Multiple Cues

    Get PDF
    This paper focuses on answering fill-in-the-blank style multiple choice questions from the Visual Madlibs dataset. Previous approaches to Visual Question Answering (VQA) have mainly used generic image features from networks trained on the ImageNet dataset, despite the wide scope of questions. In contrast, our approach employs features derived from networks trained for specialized tasks of scene classification, person activity prediction, and person and object attribute prediction. We also present a method for selecting sub-regions of an image that are relevant for evaluating the appropriateness of a putative answer. Visual features are computed both from the whole image and from local regions, while sentences are mapped to a common space using a simple normalized canonical correlation analysis (CCA) model. Our results show a significant improvement over the previous state of the art, and indicate that answering different question types benefits from examining a variety of image cues and carefully choosing informative image sub-regions

    Detecting shadows and low-lying objects in indoor and outdoor scenes using homographies

    Get PDF
    Many computer vision applications apply background suppression techniques for the detection and segmentation of moving objects in a scene. While these algorithms tend to work well in controlled conditions they often fail when applied to unconstrained real-world environments. This paper describes a system that detects and removes erroneously segmented foreground regions that are close to a ground plane. These regions include shadows, changing background objects and other low-lying objects such as leaves and rubbish. The system uses a set-up of two or more cameras and requires no 3D reconstruction or depth analysis of the regions. Therefore, a strong camera calibration of the set-up is not necessary. A geometric constraint called a homography is exploited to determine if foreground points are on or above the ground plane. The system takes advantage of the fact that regions in images off the homography plane will not correspond after a homography transformation. Experimental results using real world scenes from a pedestrian tracking application illustrate the effectiveness of the proposed approach

    Neural differentiation is moderated by age in scene- but not face-selective cortical regions

    Get PDF
    The aging brain is characterized by neural dedifferentiation, an apparent decrease in the functional selectivity of category-selective cortical regions. Age-related reductions in neural differentiation have been proposed to play a causal role in cognitive aging. Recent findings suggest, however, that age-related dedifferentiation is not equally evident for all stimulus categories and, additionally, that the relationship between neural differentiation and cognitive performance is not moderated by age. In light of these findings, in the present experiment, younger and older human adults (males and females) underwent fMRI as they studied words paired with images of scenes or faces before a subsequent memory task. Neural selectivity was measured in two scene-selective (parahippocampal place area (PPA) and retrosplenial cortex (RSC)] and two face-selective [fusiform face area (FFA) and occipital face area (OFA)] regions using both a univariate differentiation index and multivoxel pattern similarity analysis. Both methods provided highly convergent results, which revealed evidence of age-related reductions in neural dedifferentiation in scene-selective but not face-selective cortical regions. Additionally, neural differentiation in the PPA demonstrated a positive, age-invariant relationship with subsequent source memory performance (recall of the image category paired with each recognized test word). These findings extend prior findings suggesting that age-related neural dedifferentiation is not a ubiquitous phenomenon, and that the specificity of neural responses to scenes is predictive of subsequent memory performance independently of age

    From Volcano to Toyshop: Adaptive Discriminative Region Discovery for Scene Recognition

    Full text link
    As deep learning approaches to scene recognition emerge, they have continued to leverage discriminative regions at multiple scales, building on practices established by conventional image classification research. However, approaches remain largely generic, and do not carefully consider the special properties of scenes. In this paper, inspired by the intuitive differences between scenes and objects, we propose Adi-Red, an adaptive approach to discriminative region discovery for scene recognition. Adi-Red uses a CNN classifier, which was pre-trained using only image-level scene labels, to discover discriminative image regions directly. These regions are then used as a source of features to perform scene recognition. The use of the CNN classifier makes it possible to adapt the number of discriminative regions per image using a simple, yet elegant, threshold, at relatively low computational cost. Experimental results on the scene recognition benchmark dataset SUN397 demonstrate the ability of Adi-Red to outperform the state of the art. Additional experimental analysis on the Places dataset reveals the advantages of Adi-Red, and highlight how they are specific to scenes. We attribute the effectiveness of Adi-Red to the ability of adaptive region discovery to avoid introducing noise, while also not missing out on important information.Comment: To appear at the ACM International Conference on Multimedia (ACM MM 2018). Code available at https://github.com/ZhengyuZhao/Adi-Red-Scen

    RGB-D Scene Representations for Prosthetic Vision

    Get PDF
    This thesis presents a new approach to scene representation for prosthetic vision. Structurally salient information from the scene is conveyed through the prosthetic vision display. Given the low resolution and dynamic range of the display, this enables robust identification and reliable interpretation of key structural features that are missed when using standard appearance-based scene representations. Specifically, two different types of salient structure are investigated: salient edge structure, for depiction of scene shape to the user; and salient object structure, for emulation of biological attention deployment when viewing a scene. This thesis proposes and evaluates novel computer vision algorithms for extracting salient edge and salient object structure from RGB-D input. Extraction of salient edge structure from the scene is first investigated through low-level analysis of surface shape. Our approach is based on the observation that regions of irregular surface shape, such as the boundary between the wall and the floor, tend to be more informative of scene structure than uniformly shaped regions. We detect these surface irregularities through multi-scale analysis of iso-disparity contour orientations, providing a real time method that robustly identifies important scene structure. This approach is then extended by using a deep CNN to learn high level information for distinguishing salient edges from structural texture. A novel depth input encoding called the depth surface descriptor (DSD) is presented, which better captures scene geometry that corresponds to salient edges, improving the learned model. These methods provide robust detection of salient edge structure in the scene. The detection of salient object structure is first achieved by noting that salient objects often have contrasting shape from their surroundings. Contrasting shape in the depth image is captured through the proposed histogram of surface orientations (HOSO) feature. This feature is used to modulate depth and colour contrast in a saliency detection framework, improving the precision of saliency seed regions and through this the accuracy of the final detection. After this, a novel formulation of structural saliency is introduced based on the angular measure of local background enclosure (LBE). This formulation addresses fundamental limitations of depth contrast methods and is not reliant on foreground depth contrast in the scene. Saliency is instead measured through the degree to which a candidate patch exhibits foreground structure. The effectiveness of the proposed approach is evaluated through both standard datasets as well as user studies that measure the contribution of structure-based representations. Our methods are found to more effectively measure salient structure in the scene than existing methods. Our approach results in improved performance compared to standard methods during practical use of an implant display
    corecore