Recognition of 3-D Objects from Multiple 2-D Views by a Self-Organizing Neural Architecture
The recognition of 3-D objects from sequences of their 2-D views is modeled by a neural architecture called VIEWNET (View Information Encoded With NETworks). VIEWNET illustrates how several types of noise and variability in image data can be progressively removed while incomplete image features are restored and invariant features are discovered using an appropriately designed cascade of processing stages. VIEWNET first processes 2-D views of 3-D objects using the CORT-X 2 filter, which discounts the illuminant, regularizes and completes figural boundaries, and removes noise from the images. Boundary regularization and completion are achieved by the same mechanisms that suppress image noise. A log-polar transform is taken with respect to the centroid of the resulting figure and then re-centered to achieve 2-D scale and rotation invariance. The invariant images are coarse coded to further reduce noise, reduce foreshortening effects, and increase generalization. These compressed codes are input into a supervised learning system based on the fuzzy ARTMAP algorithm. Recognition categories of 2-D views are learned before evidence from sequences of 2-D view categories is accumulated to improve object recognition. Recognition is studied with noisy and clean images using slow and fast learning. VIEWNET is demonstrated on an MIT Lincoln Laboratory database of 2-D views of jet aircraft with and without additive noise. A recognition rate of 90% is achieved with one 2-D view category and 98.5% with three 2-D view categories.
National Science Foundation (IRI 90-24877); Office of Naval Research (N00014-91-J-1309, N00014-91-J-4100, N00014-92-J-0499); Air Force Office of Scientific Research (F9620-92-J-0499, 90-0083)
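The log-polar re-centering step that the abstract describes can be sketched in a few lines; this is an illustrative reconstruction (grid sizes, centroid weighting, and nearest-neighbour sampling are assumptions), not the paper's implementation:

```python
import numpy as np

def log_polar(img, n_r=32, n_theta=32):
    """Sample an image on a log-polar grid centred on its intensity centroid.

    After this mapping, a 2-D rescaling of the figure becomes a shift along
    the log-radius axis and a rotation becomes a shift along the angle axis,
    which is the basis of the scale/rotation-invariance step.
    """
    ys, xs = np.nonzero(img)
    w = img[ys, xs].astype(float)
    # Intensity-weighted centroid of the figure.
    cy, cx = np.average(ys, weights=w), np.average(xs, weights=w)
    r_max = np.hypot(*img.shape) / 2.0
    out = np.zeros((n_r, n_theta))
    for i, log_r in enumerate(np.linspace(0.0, np.log(r_max), n_r)):
        r = np.exp(log_r)
        for j, th in enumerate(np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)):
            # Nearest-neighbour sample at (centroid + r * unit direction).
            y = int(np.round(cy + r * np.sin(th)))
            x = int(np.round(cx + r * np.cos(th)))
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                out[i, j] = img[y, x]
    return out
```

A figure rotated about its centroid produces the same log-polar array cyclically shifted along the angle axis, so the downstream coarse coding sees a translation rather than a rotation.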
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
A robot that can carry out a natural-language instruction has been a dream
since before the Jetsons cartoon series imagined a life of leisure mediated by
a fleet of attentive robot helpers. It is a dream that remains stubbornly
distant. However, recent advances in vision and language methods have made
incredible progress in closely related areas. This is significant because a
robot interpreting a natural-language navigation instruction on the basis of
what it sees is carrying out a vision and language process that is similar to
Visual Question Answering. Both tasks can be interpreted as visually grounded
sequence-to-sequence translation problems, and many of the same methods are
applicable. To enable and encourage the application of vision and language
methods to the problem of interpreting visually-grounded navigation
instructions, we present the Matterport3D Simulator -- a large-scale
reinforcement learning environment based on real imagery. Using this simulator,
which can in future support a range of embodied vision and language tasks, we
provide the first benchmark dataset for visually-grounded natural language
navigation in real buildings -- the Room-to-Room (R2R) dataset.
Comment: CVPR 2018 Spotlight presentation
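The framing of instruction following as visually grounded sequence-to-sequence translation can be sketched as an encoder-decoder whose every decoding step conditions on the current visual observation. This is a toy illustration (the function names, shapes, and recurrence are assumptions, not the paper's architecture):

```python
import numpy as np

def encode(instruction_ids, emb, W):
    """Toy recurrent encoder: fold the embedded instruction tokens
    into a single hidden state (a stand-in for an LSTM encoder)."""
    h = np.zeros(W.shape[0])
    for t in instruction_ids:
        h = np.tanh(W @ h + emb[t])
    return h

def decode_step(h, img_feat, V):
    """One decoding step: combine the language state with the current
    visual observation and score a discrete navigation action
    (e.g. turn-left / turn-right / move-forward)."""
    x = np.concatenate([h, img_feat])
    logits = V @ x
    return int(np.argmax(logits))
```

The same encoder-decoder shape covers Visual Question Answering if the decoder emits answer tokens instead of actions, which is the similarity the abstract points to.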
The city as one thing
This paper summarises the latest theories in the field of space syntax. It opens with a discussion of the relationship between the form of urban grids and the process of how cities are formed by human activity; this is done through a comprehensive review of space syntax theory from its starting point in the 1970s. The paper goes on to present research into how cities balance the micro-economic factors which shape the spatial structure of cities with the cultural factors that shape the underlying form of residential areas. It then discusses the relationship between activity and space and how this relationship is formed by the way different activities make different demands on movement and co-presence. The paper ends with a discussion of the manner in which patterns of spatial integration influence the location of different classes and social groups in the city and contribute to the pathology of housing estates. The paper concludes that spatial form needs to be understood as a contributing factor in forming the patterns of integration and segregation in cities.
Enhancing urban analysis through lacunarity multiscale measurement
Urban spatial configurations in most parts of the developing world show particular urban forms associated with the more informal urban development of these areas. Latin American cities are prime examples of this sort, but investigations of these urban forms using up-to-date computational and analytical techniques are still scarce. The purpose of this paper is to examine and extend the methodology of multiscale analysis for the evaluation of urban spatial patterns. We explain and explore the use of lacunarity-based measurements to follow a line of research that might make more use of new satellite imagery information in urban planning contexts. A set of binary classifications is performed at different thresholds on selected neighbourhoods of a small Brazilian town. The classifications are appraised and lacunarity measurements are compared against the different georeferenced information for the same neighbourhood areas. It was found that, even with this simple image classification procedure, an important amount of spatial configuration characteristics could be extracted with the analytical procedure, which in turn may be used for planning and other urban studies purposes.
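The lacunarity measurement the abstract relies on is usually computed with the gliding-box algorithm; a minimal sketch for a binary raster follows (standard method, but the box sizes and thresholding choices here are assumptions, not the paper's settings):

```python
import numpy as np

def lacunarity(binary, box_sizes=(2, 4, 8)):
    """Gliding-box lacunarity of a 2-D binary image.

    For each box size r, slide an r-by-r window over the image, record the
    occupied-cell mass S at each position, and report
    Lambda(r) = E[S^2] / E[S]^2. Homogeneous patterns give values near 1;
    gappy, clustered patterns give larger values.
    """
    b = np.asarray(binary, dtype=float)
    out = {}
    for r in box_sizes:
        masses = []
        for i in range(b.shape[0] - r + 1):
            for j in range(b.shape[1] - r + 1):
                masses.append(b[i:i + r, j:j + r].sum())
        m = np.array(masses)
        out[r] = float((m ** 2).mean() / m.mean() ** 2)
    return out
```

Comparing Lambda(r) across box sizes for the same neighbourhood, classified at different thresholds, is the multiscale comparison the paper describes.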
What makes for effective detection proposals?
Current top performing object detectors employ detection proposals to guide
the search for objects, thereby avoiding exhaustive sliding window search
across images. Despite the popularity and widespread use of detection
proposals, it is unclear which trade-offs are made when using them during
object detection. We provide an in-depth analysis of twelve proposal methods
along with four baselines regarding proposal repeatability, ground truth
annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM,
R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object
detection improving proposal localisation accuracy is as important as improving
recall. We introduce a novel metric, the average recall (AR), which rewards
both high recall and good localisation and correlates surprisingly well with
detection performance. Our findings show common strengths and weaknesses of
existing methods, and provide insights and metrics for selecting and tuning
proposal methods.
Comment: TPAMI final version; duplicate proposals removed in experiments
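The average recall (AR) metric introduced above can be read as recall against ground truth averaged over a range of IoU thresholds; a simplified sketch (the threshold grid and one-to-many matching here are assumptions, not the paper's exact protocol):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def average_recall(gt_boxes, proposals, thresholds=np.arange(0.5, 1.0, 0.05)):
    """Recall of the ground-truth boxes by the proposal set, averaged over
    IoU thresholds in [0.5, 1.0). Rewards both finding every object (high
    recall) and covering it tightly (good localisation)."""
    # Best overlap any proposal achieves for each ground-truth box.
    best = [max(iou(g, p) for p in proposals) for g in gt_boxes]
    recalls = [np.mean([b >= t for b in best]) for t in thresholds]
    return float(np.mean(recalls))
```

A proposal set that covers every object only loosely (IoU near 0.5) scores far lower than one with tight boxes, which is why AR tracks detector performance better than recall at a single threshold.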