
    Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

    Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach that simultaneously predicts the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that captures object location and scale, as well as pixel-level observations, in separate streams for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly, since it captures information about motion as well as appearance change. We also find that explicitly modeling the future motion of the ego-vehicle improves prediction accuracy, which could be especially beneficial in intelligent and automated vehicles with motion-planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic. (To appear at ICRA 2019.)
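
    A minimal PyTorch sketch of a two-stream encoder-decoder in the spirit described above: one GRU stream encodes past bounding-box location and scale, another encodes pooled dense-optical-flow features, and a decoder rolls out future boxes. All module names, dimensions, and the single-context decoding scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TwoStreamBoxPredictor(nn.Module):
    """Illustrative multi-stream RNN encoder-decoder for future box prediction.

    Stream (a) encodes past bounding boxes [x, y, w, h]; stream (b) encodes
    pooled dense optical-flow features. Hyperparameters are placeholders.
    """
    def __init__(self, flow_dim=50, hidden=128, horizon=10):
        super().__init__()
        self.box_enc = nn.GRU(4, hidden, batch_first=True)
        self.flow_enc = nn.GRU(flow_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(2 * hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)  # future [x, y, w, h] per step
        self.horizon = horizon

    def forward(self, boxes, flow):
        # boxes: (B, T, 4); flow: (B, T, flow_dim)
        _, h_box = self.box_enc(boxes)
        _, h_flow = self.flow_enc(flow)
        ctx = torch.cat([h_box[-1], h_flow[-1]], dim=-1)      # (B, 2*hidden)
        dec_in = ctx.unsqueeze(1).repeat(1, self.horizon, 1)  # context each step
        out, _ = self.decoder(dec_in)
        return self.head(out)                                 # (B, horizon, 4)

# e.g. predict 10 future boxes from 8 observed frames:
model = TwoStreamBoxPredictor()
pred = model(torch.randn(2, 8, 4), torch.randn(2, 8, 50))    # (2, 10, 4)
```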

    Anticipating Daily Intention using On-Wrist Motion Triggered Sensing

    Anticipating human intention by observing one's actions has many applications. For instance, picking up a cellphone and then a charger (actions) implies that one wants to charge the cellphone (intention). By anticipating the intention, an intelligent system can guide the user to the closest power outlet. We propose an on-wrist motion-triggered sensing system for anticipating daily intentions, where the on-wrist sensors help us to persistently observe one's actions. The core of the system is a novel Recurrent Neural Network (RNN) and Policy Network (PN), where the RNN encodes visual and motion observations to anticipate intention, and the PN parsimoniously triggers the process of visual observation to reduce the computational requirements. We train the whole network jointly using policy gradient and cross-entropy losses. For evaluation, we collected the first daily intention dataset, consisting of 2379 videos with 34 intentions and 164 unique action sequences. Our method achieves 92.68%, 90.85%, and 97.56% accuracy on three users while processing only 29% of the visual observations on average.
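
    The joint training objective can be pictured as below: a cross-entropy term for intention anticipation plus a REINFORCE term for the policy network's binary frame-trigger decisions. This is a minimal sketch; the reward definition, weighting, and tensor layout are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(intent_logits, intent_labels, trigger_logprobs, reward, pg_weight=1.0):
    """Sketch of a joint objective: cross-entropy on the anticipated intention
    plus REINFORCE on the policy network's sampled 0/1 'process this frame'
    decisions.

    intent_logits:    (B, num_intentions) final intention prediction
    intent_labels:    (B,) ground-truth intention indices
    trigger_logprobs: (B, T) log-probabilities of the sampled trigger actions
    reward:           (B,) assumed to be anticipation accuracy minus a cost
                      proportional to the fraction of frames processed
    """
    ce = F.cross_entropy(intent_logits, intent_labels)
    # REINFORCE: maximize E[R]  =>  minimize -E[R * log pi(sampled actions)]
    pg = -(reward.unsqueeze(1) * trigger_logprobs).sum(dim=1).mean()
    return ce + pg_weight * pg
```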

    Mapping Wide Row Crops with Video Sequences Acquired from a Tractor Moving at Treatment Speed

    This paper presents a mapping method for wide-row crop fields. The resulting map shows the crop rows and the weeds present in the inter-row spacing. Because the field videos are acquired with a camera mounted on top of an agricultural vehicle, a method for image-sequence stabilization was needed, and one was therefore designed and developed. The proposed stabilization method uses the centers of some crop rows in the image sequence as features to be tracked, which compensates for the lateral movement (sway) of the camera and leaves the pitch unchanged. A region of interest is selected using the tracked features, and an inverse perspective technique transforms the selected region into a bird’s-eye view that is centered on the image and that enables map generation. The algorithm has been tested on several video sequences from different fields, recorded at different times and under different lighting conditions, with good initial results. Indeed, lateral displacements of up to 66% of the inter-row spacing were suppressed through the stabilization process, and the crop rows in the resulting maps appear straight.
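
    The two geometric operations described, sway compensation from a tracked row centre and the inverse perspective transform to a bird's-eye view, could be sketched with OpenCV roughly as follows. The point correspondences and output size are placeholder assumptions; in the paper they come from crop-row tracking and the camera geometry.

```python
import cv2
import numpy as np

def compensate_sway(frame, row_center_x, ref_center_x):
    """Shift the frame horizontally so a tracked crop-row centre stays at a
    reference column: compensates lateral sway, leaves pitch unchanged."""
    dx = ref_center_x - row_center_x
    M = np.float32([[1, 0, dx], [0, 1, 0]])
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, M, (w, h))

def birds_eye_view(frame, src_pts, out_size=(400, 600)):
    """Inverse perspective mapping: warp four image points of a ground-plane
    rectangle (e.g. spanning adjacent crop rows) to a top-down rectangle."""
    w, h = out_size
    dst_pts = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(np.float32(src_pts), dst_pts)
    return cv2.warpPerspective(frame, H, out_size)
```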

    Humans combine the optic flow with static depth cues for robust perception of heading

    The retinal flow during normal locomotion contains components due to rotation and translation of the observer. The translatory part of the flow pattern is informative of heading, because it radiates outward from the direction of heading. However, it is not directly accessible from the retinal flow. Nevertheless, humans can perceive their direction of heading from the compound retinal flow without the need for extra-retinal signals that indicate the rotation. Two classes of models have been proposed to explain the visual decomposition of the retinal flow into its constituent parts. One type relies on local operations to remove the rotational part of the flow field. The other type explicitly determines the direction and magnitude of the rotation from the global retinal flow, for subsequent removal. According to the former type of model, nearby points are the most reliable for estimating one's heading. In the latter type of model, the quality of the heading estimate depends on the accuracy with which the ego-rotation is determined, and is therefore most reliable when based on the most distant points. We report that subjects underestimate the eccentricity of heading, relative to the fixated point in the ground plane, when the visible range of the ground plane is reduced. Moreover, we find that in the perception of heading, humans can tolerate more noise than the optimal observer (in the least-squares sense) could if using only the optic flow. The latter finding argues against both schemes, because ultimately both classes of model are limited in their noise tolerance to that of the optimal observer, which uses all the information available in the optic flow. Apparently, humans use more information than is present in the optic flow. Both aspects of human performance are consistent with the use of static depth information, in addition to the optic flow, to select the most distant points. Processing of the flow of these selected points provides the most reliable estimate of the ego-rotation. Subsequent estimates of the heading direction, obtained from the translatory component of the flow, are robust with respect to noise. In such a scheme, heading estimates are subject to systematic errors, similar to those reported here, if the most distant points are not much further away than the fixation point, because the ego-rotation is then underestimated.
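
    For concreteness, the "optimal observer (in the least-squares sense)" baseline for a purely translational flow field can be written in a few lines: every flow vector lies on a line through the focus of expansion (the heading point), giving a linear system. This is a sketch only; it assumes the rotational component has already been removed.

```python
import numpy as np

def focus_of_expansion(points, flows):
    """Least-squares focus of expansion (FOE) from translational flow.

    For pure observer translation each flow vector (vx, vy) at image point
    (x, y) satisfies  vy*(fx - x) - vx*(fy - y) = 0,  a linear constraint
    on the FOE (fx, fy). Assumes rotation has been removed from the flow.
    """
    vx, vy = flows[:, 0], flows[:, 1]
    A = np.stack([vy, -vx], axis=1)       # coefficients of (fx, fy)
    b = vy * points[:, 0] - vx * points[:, 1]
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe

# Synthetic check: flow expanding about (10, -5) recovers that heading point.
rng = np.random.default_rng(0)
pts = rng.uniform(-50, 50, size=(100, 2))
flw = 0.02 * (pts - np.array([10.0, -5.0]))
print(focus_of_expansion(pts, flw))       # ~[10. -5.]
```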

    Perceived Surface Slant Is Systematically Biased in the Actively-Generated Optic Flow

    Humans make systematic errors in the 3D interpretation of the optic flow in both passive and active vision. These systematic distortions can be predicted by a biologically inspired model which disregards self-motion information resulting from head movements (Caudek, Fantoni, & Domini, 2011). Here, we tested two predictions of this model: (1) a plane that is stationary in an earth-fixed reference frame will be perceived as changing its slant if the movement of the observer's head causes a variation of the optic flow; and (2) a surface that rotates in an earth-fixed reference frame will be perceived to be stationary if the surface rotation is appropriately yoked to the head movement so as to generate a variation of the surface slant but not of the optic flow. Both predictions were corroborated by two experiments in which observers judged the perceived slant of a random-dot planar surface during egomotion. We found qualitatively similar biases for monocular and binocular viewing of the simulated surfaces, although, in principle, the simultaneous presence of disparity and motion cues allows for a veridical recovery of surface slant.

    The Southampton-York Natural Scenes (SYNS) dataset: statistics of surface attitude

    Recovering 3D scenes from 2D images is an under-constrained task; optimal estimation depends upon knowledge of the underlying scene statistics. Here we introduce the Southampton-York Natural Scenes dataset (SYNS: https://syns.soton.ac.uk), which provides comprehensive scene statistics useful for understanding biological vision and for improving machine vision systems. In order to capture the diversity of environments that humans encounter, scenes were surveyed at random locations within 25 indoor and outdoor categories. Each survey includes (i) spherical LiDAR range data, (ii) high-dynamic-range spherical imagery, and (iii) a panorama of stereo image pairs. We envisage many uses for the dataset and present one example: an analysis of surface attitude statistics, conditioned on scene category and viewing elevation. Surface normals were estimated using a novel adaptive scale selection algorithm. Across categories, surface attitude below the horizon is dominated by the ground plane (0° tilt). Near the horizon, probability density is elevated at 90°/270° tilt due to vertical surfaces (trees, walls). Above the horizon, probability density is elevated near 0° slant due to overhead structure such as ceilings and leaf canopies. These structural regularities represent potentially useful prior assumptions for human and machine observers, and may predict human biases in perceived surface attitude.
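
    As a point of reference for the normal-estimation step, a fixed-scale local plane fit via PCA (the standard alternative to the paper's adaptive scale selection) and the conversion from a normal to slant/tilt might look as follows. The neighbourhood size and the slant/tilt axis conventions here are assumptions.

```python
import numpy as np

def normal_from_neighbors(neighbors):
    """Fixed-scale plane fit: the eigenvector of the neighbourhood covariance
    with the smallest eigenvalue is the estimated surface normal.
    (SYNS itself uses an adaptive scale selection algorithm instead.)"""
    centered = neighbors - neighbors.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(centered.T @ centered)  # ascending order
    return eigvecs[:, 0]

def slant_tilt(normal, view_dir=np.array([0.0, 0.0, 1.0])):
    """Slant: angle between the normal and the line of sight (here +z).
    Tilt: orientation of the normal's projection in the image plane.
    These axis conventions are assumptions, not the dataset's."""
    n = normal / np.linalg.norm(normal)
    if n @ view_dir < 0:                  # orient the normal toward the viewer
        n = -n
    slant = np.degrees(np.arccos(np.clip(n @ view_dir, -1.0, 1.0)))
    tilt = np.degrees(np.arctan2(n[1], n[0])) % 360.0
    return slant, tilt

# A patch of ground plane (normal along y) viewed along z: slant ~90 deg.
pts = np.column_stack([np.random.rand(50), np.zeros(50), np.random.rand(50)])
print(slant_tilt(normal_from_neighbors(pts)))
```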

    A neurobiological and computational analysis of target discrimination in visual clutter by the insect visual system.

    Some insects have the capability to detect and track small moving objects, often against cluttered moving backgrounds. Determining how this task is performed is an intriguing challenge, from both a physiological and a computational perspective. Previous research has characterized higher-order neurons within the fly brain known as 'small target motion detectors' (STMD) that respond selectively to targets, even within complex moving surrounds. Interestingly, these cells still respond robustly when the velocity of the target is matched to the velocity of the background (i.e., with no relative motion cues). We performed intracellular recordings from intermediate-order neurons in the fly visual system (the medulla). These full-wave rectifying, transient cells (RTC) reveal independent adaptation to luminance changes of opposite signs (suggesting separate 'on' and 'off' channels) and fast adaptive temporal mechanisms (as seen in some previously described cell types). We show, via electrophysiological experiments, that the RTC is temporally responsive to rapidly changing stimuli and is well suited to serving an important function in a proposed target-detecting pathway. To model this target discrimination, we use high-dynamic-range (HDR) natural images to represent 'real-world' luminance values that serve as inputs to a biomimetic representation of photoreceptor processing. Adaptive spatiotemporal high-pass filtering (1st-order interneurons) shapes the transient 'edge-like' responses, useful for feature discrimination. Following this, a model for the RTC implements a nonlinear facilitation between the rapidly adapting and independent polarity-contrast channels, each with centre-surround antagonism. The recombination of the channels results in increased discrimination of small targets, of approximately the size of a single pixel, without the need for relative motion cues. This method of feature discrimination contrasts with traditional target and background motion-field computations. We show that our RTC-based target detection model is well matched to properties described for the higher-order STMD neurons, such as contrast sensitivity, height tuning and velocity tuning. The model output shows that the spatiotemporal profile of small targets is sufficiently rare within natural scene imagery to allow our highly nonlinear 'matched filter' to successfully detect many targets from the background. The model produces robust target discrimination across a biologically plausible range of target sizes and a range of velocities. We show that the output of the small-target motion detection model is highly correlated with the velocity of the stimulus, but not with other background statistics, such as local brightness or local contrast, which normally influence target detection tasks. From an engineering perspective, we examine model elaborations for improved target discrimination via inhibitory interactions from correlation-type motion detectors, using a form of antagonism between our feature correlator and the more typical motion correlator. We also observe that the changing optimal threshold is highly correlated with the value of observer ego-motion. We present an elaborated target detection model that allows for the implementation of a static optimal threshold, by scaling the target discrimination mechanism with a model-derived velocity estimation of ego-motion. Finally, we investigate the physiological relevance of this target discrimination model. We show that, via very subtle image manipulation of the visual stimulus, our model accurately predicts dramatic changes in observed electrophysiological responses from STMD neurons. Thesis (Ph.D.) - University of Adelaide, School of Molecular and Biomedical Science, 200
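
    A highly simplified caricature of the processing stages described above, applied to a (time x position) luminance sequence: temporal high-pass filtering, half-wave rectified 'on'/'off' channels with independent fast divisive adaptation and centre-surround antagonism, then nonlinear recombination of ON with delayed OFF. All time constants and kernels are placeholder assumptions; this sketches the style of RTC/STMD-pathway model, not the thesis implementation.

```python
import numpy as np

def rtc_response(frames, dt=1.0, tau_hp=2.0, tau_adapt=5.0, delay=2):
    """Caricature of the target-discrimination pathway on a (T, N) array of
    1-D luminance rows over time. Constants are illustrative placeholders."""
    T, N = frames.shape
    a_hp, a_ad = dt / (tau_hp + dt), dt / (tau_adapt + dt)
    lowpass = frames[0].astype(float).copy()
    on_state, off_state = np.zeros(N), np.zeros(N)
    off_hist = np.zeros((delay + 1, N))
    surround = np.array([0.25, 0.5, 0.25])
    out = np.zeros((T, N))
    for t in range(T):
        lowpass += a_hp * (frames[t] - lowpass)
        hp = frames[t] - lowpass                      # temporal high-pass
        on, off = np.maximum(hp, 0.0), np.maximum(-hp, 0.0)  # 'on'/'off' split
        on_state += a_ad * (on - on_state)            # independent fast
        off_state += a_ad * (off - off_state)         # adaptation per polarity
        on, off = on / (1.0 + on_state), off / (1.0 + off_state)
        on -= np.convolve(on, surround, mode="same")  # centre-surround
        off -= np.convolve(off, surround, mode="same")
        on, off = np.maximum(on, 0.0), np.maximum(off, 0.0)
        off_hist = np.roll(off_hist, 1, axis=0)
        off_hist[0] = off
        out[t] = on * off_hist[-1]                    # ON x delayed OFF
    return out
```

    The intuition behind the final product: a small dark target passing a pixel produces an OFF transient at its leading edge and, shortly after, an ON transient at its trailing edge, so ON x delayed-OFF peaks for target-sized features, while extended edges are suppressed by the centre-surround stage.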

    No Evidence that Binocular Vision Enhances Online Corrections for Reaches in the Lower-Visual Field

    Some work has proposed that an increased density of retinal ganglion cells in the superior hemiretina confers a functional advantage on goal-directed reaches in the lower visual field (loVF). Furthermore, reaches performed with binocular stereo-cues exhibit optimized feedback-based trajectory corrections (i.e., online control). The present study examined whether the purported loVF advantage is restricted to binocular reaches implemented via a primarily online mode of control. Participants completed binocular and monocular reaches to loVF and upper-visual-field (upVF) targets. Separate groups were provided vision during response planning and control (closed-loop group: CL) or during response planning only (open-loop group: OL). The binocular condition and the CL group exhibited more online corrections than reaches in the monocular condition or the OL group. Notably, however, loVF and upVF reaches did not reliably differ in any experimental condition, a result demonstrating no systematic loVF advantage for online control.