
    Recognition of 3-D Objects from Multiple 2-D Views by a Self-Organizing Neural Architecture

    The recognition of 3-D objects from sequences of their 2-D views is modeled by a neural architecture, called VIEWNET, that uses View Information Encoded With NETworks. VIEWNET illustrates how several types of noise and variability in image data can be progressively removed while incomplete image features are restored and invariant features are discovered using an appropriately designed cascade of processing stages. VIEWNET first processes 2-D views of 3-D objects using the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and removes noise from the images. Boundary regularization and completion are achieved by the same mechanisms that suppress image noise. A log-polar transform is taken with respect to the centroid of the resulting figure and then re-centered to achieve 2-D scale and rotation invariance. The invariant images are coarse coded to further reduce noise, reduce foreshortening effects, and increase generalization. These compressed codes are input into a supervised learning system based on the fuzzy ARTMAP algorithm. Recognition categories of 2-D views are learned before evidence from sequences of 2-D view categories is accumulated to improve object recognition. Recognition is studied with noisy and clean images using slow and fast learning. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 2-D views of jet aircraft with and without additive noise. A recognition rate of 90% is achieved with one 2-D view category and 98.5% with three 2-D view categories.
    National Science Foundation (IRI 90-24877); Office of Naval Research (N00014-91-J-1309, N00014-91-J-4100, N00014-92-J-0499); Air Force Office of Scientific Research (F9620-92-J-0499, 90-0083)
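    The centroid-anchored log-polar step can be sketched as follows (an illustrative reconstruction, not VIEWNET's actual code; the grid sizes and the weighted-centroid choice are assumptions). Under this mapping, scaling the figure becomes a shift along the log-radius axis and rotating it a shift along the angle axis, so both reduce to translations that are easy to normalize away.

```python
import numpy as np

def log_polar(image, n_r=32, n_theta=32):
    """Map a 2-D intensity image into log-polar coordinates about its
    intensity centroid. Scaling the input shifts the result along the
    log-r axis; rotating it shifts the result along the theta axis."""
    ys, xs = np.nonzero(image)
    w = image[ys, xs].astype(float)
    cy, cx = np.average(ys, weights=w), np.average(xs, weights=w)
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx)
    r = np.where(r < 1.0, 1.0, r)        # avoid log(0) at the centroid
    log_r = np.log(r)
    # Histogram the figure's mass on a (log r, theta) grid.
    out, _, _ = np.histogram2d(
        log_r, theta, bins=[n_r, n_theta],
        range=[[0.0, np.log(max(image.shape))], [-np.pi, np.pi]],
        weights=w)
    return out
```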

    Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

    A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset.
    Comment: CVPR 2018 Spotlight presentation

    The city as one thing

    This paper summarises the latest theories in the field of space syntax. It opens with a discussion of the relationship between the form of urban grids and the process by which cities are formed by human activity, through a comprehensive review of space syntax theory from its starting point in the 1970s. The paper then presents research into how cities balance the micro-economic factors which shape the spatial structure of cities with the cultural factors that shape the underlying form of residential areas. It goes on to discuss the relationship between activity and space, and how this relationship is formed by the way different activities make different demands on movement and co-presence. The paper ends with a discussion of the manner in which patterns of spatial integration influence the location of different classes and social groups in the city and contribute to the pathology of housing estates. It concludes that spatial form needs to be understood as a contributing factor in forming the patterns of integration and segregation in cities.
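    Spatial integration in space syntax is conventionally derived from the mean topological depth of each axial line from all the others in the axial map; lower mean depth means higher integration. The sketch below (a toy illustration under that standard definition, not code from the paper; the example graph is invented) computes mean depth by breadth-first search.

```python
from collections import deque

def mean_depth(adj, start):
    """Mean topological depth (number of steps) from one axial line
    to every other line in the axial map, via breadth-first search."""
    depth = {start: 0}
    q = deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                q.append(v)
    return sum(depth.values()) / (len(depth) - 1)

# A toy axial map: line 'a' intersects every other line,
# so it is the shallowest (most integrated) line.
axial = {'a': ['b', 'c', 'd'],
         'b': ['a', 'c'],
         'c': ['a', 'b'],
         'd': ['a']}
depths = {line: mean_depth(axial, line) for line in axial}
```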

    Enhancing urban analysis through lacunarity multiscale measurement

    Urban spatial configurations in most parts of the developing world show particular urban forms associated with the more informal urban development of these areas. Latin American cities are prime examples of this sort, but investigation of these urban forms using up-to-date computational and analytical techniques is still scarce. The purpose of this paper is to examine and extend the methodology of multiscale analysis for the evaluation of urban spatial patterns. We explain and explore the use of lacunarity-based measurements to follow a line of research that might make more use of new satellite imagery information in urban planning contexts. A set of binary classifications is performed at different thresholds on selected neighbourhoods of a small Brazilian town. The classifications are appraised and lacunarity measurements are compared in the face of the different georeferenced information for the same neighbourhood areas. It was found that even with the simple image classification procedure, an important amount of spatial configuration characteristics could be extracted with the analytical procedure, which, in turn, may be used in planning and other urban studies.
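    Lacunarity is commonly computed with the gliding-box algorithm: slide a box of side r over every position of the binary image, record the occupied-cell mass S of each window, and take Λ(r) = E[S²]/E[S]². A minimal sketch of that standard algorithm for one box size (an illustration, not the paper's code; the summed-area-table trick is an implementation choice):

```python
import numpy as np

def lacunarity(binary, box):
    """Gliding-box lacunarity for one box size on a binary image:
    count occupied cells in every box x box window and return
    E[S^2] / E[S]^2 over all window positions."""
    # Summed-area table: each window's mass in O(1).
    sat = np.pad(binary.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s = (sat[box:, box:] - sat[:-box, box:]
         - sat[box:, :-box] + sat[:-box, :-box]).ravel()
    m = s.mean()
    return float((s ** 2).mean() / m ** 2) if m > 0 else float('inf')
```

A homogeneous image yields Λ = 1 at every scale; the more clustered (gappy) the pattern, the larger Λ grows, which is what makes the measure useful for separating compact from informal urban textures at the same density.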

    What makes for effective detection proposals?

    Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines regarding proposal repeatability, ground truth annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM, R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object detection improving proposal localisation accuracy is as important as improving recall. We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detection performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.
    Comment: TPAMI final version, duplicate proposals removed in experiments
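    The average recall idea can be sketched as follows. The paper integrates recall over IoU thresholds between 0.5 and 1; the hypothetical helper below approximates that integral with a discrete set of thresholds, matching each ground-truth box to its best-overlapping proposal.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_recall(gt_boxes, proposals, thresholds):
    """Average recall over a range of IoU thresholds: keep each
    ground-truth box's best-matching proposal IoU, then average the
    recall curve across the thresholds."""
    best = [max(iou(g, p) for p in proposals) for g in gt_boxes]
    recalls = [sum(b >= t for b in best) / len(best) for t in thresholds]
    return sum(recalls) / len(recalls)
```

Because a proposal that only loosely overlaps a ground-truth box contributes at the low thresholds but not the high ones, AR rewards good localisation as well as raw recall, which is the property the paper credits for its correlation with detection performance.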