18,317 research outputs found
Deep Affordance-grounded Sensorimotor Object Recognition
It is well-established by cognitive neuroscience that human perception of
objects constitutes a complex process, where object appearance information is
combined with evidence about the so-called object "affordances", namely the
types of actions that humans typically perform when interacting with them. This
fact has recently motivated the "sensorimotor" approach to the challenging task
of automatic object recognition, where both information sources are fused to
improve robustness. In this work, the aforementioned paradigm is adopted,
surpassing current limitations of sensorimotor object recognition research.
Specifically, the deep learning paradigm is introduced to the problem for the
first time, developing a number of novel neuro-biologically and
neuro-physiologically inspired architectures that utilize state-of-the-art
neural networks for fusing the available information sources in multiple ways.
The proposed methods are evaluated using a large RGB-D corpus, which is
specifically collected for the task of sensorimotor object recognition and is
made publicly available. Experimental results demonstrate the utility of
affordance information to object recognition, achieving an up to 29% relative
error reduction by its inclusion.Comment: 9 pages, 7 figures, dataset link included, accepted to CVPR 201
3D Visual Perception for Self-Driving Cars using a Multi-Camera System: Calibration, Mapping, Localization, and Obstacle Detection
Cameras are a crucial exteroceptive sensor for self-driving cars as they are
low-cost and small, provide appearance information about the environment, and
work in various weather conditions. They can be used for multiple purposes such
as visual navigation and obstacle detection. We can use a surround multi-camera
system to cover the full 360-degree field-of-view around the car. In this way,
we avoid blind spots which can otherwise lead to accidents. To minimize the
number of cameras needed for surround perception, we utilize fisheye cameras.
Consequently, standard vision pipelines for 3D mapping, visual localization,
obstacle detection, etc. need to be adapted to take full advantage of the
availability of multiple cameras rather than treat each camera individually. In
addition, processing of fisheye images has to be supported. In this paper, we
describe the camera calibration and subsequent processing pipeline for
multi-fisheye-camera systems developed as part of the V-Charge project. This
project seeks to enable automated valet parking for self-driving cars. Our
pipeline is able to precisely calibrate multi-camera systems, build sparse 3D
maps for visual navigation, visually localize the car with respect to these
maps, generate accurate dense maps, as well as detect obstacles based on
real-time depth map extraction
Binocular fusion and invariant category learning due to predictive remapping during scanning of a depthful scene with eye movements
How does the brain maintain stable fusion of 3D scenes when the eyes move? Every eye movement causes each retinal position to process a different set of scenic features, and thus the brain needs to binocularly fuse new combinations of features at each position after an eye movement. Despite these breaks in retinotopic fusion due to each movement, previously fused representations of a scene in depth often appear stable. The 3D ARTSCAN neural model proposes how the brain does this by unifying concepts about how multiple cortical areas in the What and Where cortical streams interact to coordinate processes of 3D boundary and surface perception, spatial attention, invariant object category learning, predictive remapping, eye movement control, and learned coordinate transformations. The model explains data from single neuron and psychophysical studies of covert visual attention shifts prior to eye movements. The model further clarifies how perceptual, attentional, and cognitive interactions among multiple brain regions (LGN, V1, V2, V3A, V4, MT, MST, PPC, LIP, ITp, ITa, SC) may accomplish predictive remapping as part of the process whereby view-invariant object categories are learned. These results build upon earlier neural models of 3D vision and figure-ground separation and the learning of invariant object categories as the eyes freely scan a scene. A key process concerns how an object's surface representation generates a form-fitting distribution of spatial attention, or attentional shroud, in parietal cortex that helps maintain the stability of multiple perceptual and cognitive processes. Predictive eye movement signals maintain the stability of the shroud, as well as of binocularly fused perceptual boundaries and surface representations.Published versio
END-TO-END LEARNING UTILIZING TEMPORAL INFORMATION FOR VISION- BASED AUTONOMOUS DRIVING
End-to-End learning models trained with conditional imitation learning (CIL) have demonstrated their capabilities in driving autonomously in dynamic environments. The performance of such models however is limited as most of them fail to utilize the temporal information, which resides in a sequence of observations. In this work, we explore the use of temporal information with a recurrent network to improve driving performance. We propose a model that combines a pre-trained, deeper convolutional neural network to better capture image features with a long short-term memory network to better explore temporal information. Experimental results indicate that the proposed model achieves performance gain in several tasks in the CARLA benchmark, compared to the state-of-the-art models. In particular, comparing with other CIL-based models in the most challenging task, navigation in dynamic environments, we achieve a 96% success rate while other CIL-based models had 82-92% in training conditions; we also achieved 88% while other CIL-based models did 42-90% in the new town and new weather conditions. The subsequent ablation study also shows that all the major features of the proposed model are essential for improving performance. We, therefore, believe that this work contributes significantly towards safe, efficient, clean autonomous driving for future smart cities
Multi-layered reasoning by means of conceptual fuzzy sets
The real world consists of a very large number of instances of events and continuous numeric values. On the other hand, people represent and process their knowledge in terms of abstracted concepts derived from generalization of these instances and numeric values. Logic based paradigms for knowledge representation use symbolic processing both for concept representation and inference. Their underlying assumption is that a concept can be defined precisely. However, as this assumption hardly holds for natural concepts, it follows that symbolic processing cannot deal with such concepts. Thus symbolic processing has essential problems from a practical point of view of applications in the real world. In contrast, fuzzy set theory can be viewed as a stronger and more practical notation than formal, logic based theories because it supports both symbolic processing and numeric processing, connecting the logic based world and the real world. In this paper, we propose multi-layered reasoning by using conceptual fuzzy sets (CFS). The general characteristics of CFS are discussed along with upper layer supervision and context dependent processing
An Overview about Emerging Technologies of Autonomous Driving
Since DARPA started Grand Challenges in 2004 and Urban Challenges in 2007,
autonomous driving has been the most active field of AI applications. This
paper gives an overview about technical aspects of autonomous driving
technologies and open problems. We investigate the major fields of self-driving
systems, such as perception, mapping and localization, prediction, planning and
control, simulation, V2X and safety etc. Especially we elaborate on all these
issues in a framework of data closed loop, a popular platform to solve the long
tailed autonomous driving problems
- …