View-based approaches to spatial representation in human vision
In an immersive virtual environment, observers fail to notice the expansion of a room around them and consequently make gross errors when comparing the size of objects. This result is difficult to explain if the visual system continuously generates a 3-D model of the scene based on known baseline information from interocular separation or proprioception as the observer walks. An alternative is that observers use view-based methods to guide their actions and to represent the spatial layout of the scene. In this case, they may have an expectation of the images they will receive but be insensitive to the rate at which images arrive as they walk. We describe the way in which the eye movement strategy of animals simplifies motion processing if their goal is to move towards a desired image and discuss dorsal and ventral stream processing of moving images in that context. Although many questions about view-based approaches to scene representation remain unanswered, the solutions are likely to be highly relevant to understanding biological 3-D vision.
Active Estimation of Distance in a Robotic Vision System that Replicates Human Eye Movement
Many visual cues, both binocular and monocular, provide 3D information. When an agent moves with respect to a scene, an important cue is the different motion of objects located at various distances. While motion parallax is evident for large translations of the agent, in most head/eye systems a small parallax also occurs during rotations of the cameras. A similar parallax is also present in the human eye. During a relocation of gaze, the shift in the retinal projection of an object depends not only on the amplitude of the movement, but also on the distance of the object with respect to the observer. This study proposes a method for estimating distance on the basis of the parallax that emerges from rotations of a camera. A pan/tilt system specifically designed to reproduce the oculomotor parallax present in the human eye was used to replicate the oculomotor strategy by which humans scan visual scenes. We show that the oculomotor parallax provides accurate estimation of distance during sequences of eye movements. In a system that actively scans a visual scene, challenging tasks such as image segmentation and figure/ground segregation greatly benefit from this cue. (Supported by National Science Foundation grants BIC-0432104 and CCF-0130851.)
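The geometry behind this cue fits in a few lines. A camera rotating about an axis that does not pass through its nodal point translates that point slightly, and whatever retinal shift the rotation alone cannot explain is inversely related to target distance. The offset value, the function names, and the small-angle simplification below are illustrative assumptions, not the paper's implementation:

```python
import math

def nodal_translation(r, theta):
    """Chord travelled by the camera's nodal point when the camera
    rotates by theta radians about an axis offset by r from it."""
    return 2.0 * r * math.sin(theta / 2.0)

def distance_from_parallax(r, theta, retinal_shift):
    """Estimate target distance from the residual (parallax) part of the
    retinal shift after subtracting the pure-rotation component.
    Small-angle approximation; target assumed near the optical axis."""
    baseline = nodal_translation(r, theta)
    parallax = retinal_shift - theta   # what rotation alone cannot explain
    if abs(parallax) < 1e-12:
        return float('inf')            # no parallax: target at infinity
    return baseline / parallax
```

With an offset of about 11 mm and a 0.1 rad gaze shift, a target at 0.5 m leaves a residual shift of roughly 2.2 mrad, so the cue is small but measurable.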
Different Motion Cues Are Used to Estimate Time-to-arrival for Frontoparallel and Looming Trajectories
Estimation of time-to-arrival for moving objects is critical to obstacle interception and avoidance, as well as to timing actions such as reaching and grasping moving objects. The source of motion information that conveys arrival time varies with the trajectory of the object, raising the question of whether multiple context-dependent mechanisms are involved in this computation. To address this question, we conducted a series of psychophysical studies to measure observers' performance on time-to-arrival estimation when object trajectory was specified by angular motion ("gap closure" trajectories in the frontoparallel plane), looming (colliding trajectories, TTC), or both (passage courses, TTP). We measured performance of time-to-arrival judgments in the presence of irrelevant motion, in which a perpendicular motion vector was added to the object trajectory. Data were compared to models of expected performance based on the use of different components of optical information. Our results demonstrate that for gap closure, performance depended only on the angular motion, whereas for TTC and TTP, both angular and looming motion affected performance. This dissociation of inputs suggests that gap closures are mediated by a separate mechanism from that used for the detection of time-to-collision and time-to-passage. We show that existing models of TTC and TTP estimation make systematic errors in predicting subject performance, and suggest that a model which weights motion cues by their relative time-to-arrival provides a better account of performance.
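The optical variables involved can be made concrete with the classic first-order "tau" estimate; the cue-weighting function below is one hypothetical reading of the abstract's suggestion, not the authors' fitted model:

```python
def tau(angle, angle_rate):
    """First-order time-to-arrival: the ratio of a visual angle to its
    rate of change (the classic 'tau' variable). For looming (TTC) the
    angle is the object's angular size; for gap closure it is the
    angular gap between object and goal."""
    if angle_rate <= 0:
        raise ValueError("angle must be changing toward arrival")
    return angle / angle_rate

def weighted_arrival(t_loom, t_angular):
    """Hypothetical combination rule: weight each cue's estimate by its
    urgency (the inverse of its own arrival estimate). Illustrative
    only; the paper's actual weighting scheme may differ."""
    w_l, w_a = 1.0 / t_loom, 1.0 / t_angular
    return (w_l * t_loom + w_a * t_angular) / (w_l + w_a)
```

For example, an object subtending 0.02 rad and expanding at 0.004 rad/s yields a 5 s collision estimate; the urgency weighting pulls a combined estimate toward the sooner of the two cues.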
One Object at a Time: Accurate and Robust Structure From Motion for Robots
A gaze-fixating robot perceives distance to the fixated object and relative
positions of surrounding objects immediately, accurately, and robustly. We show
how fixation, which is the act of looking at one object while moving, exploits
regularities in the geometry of 3D space to obtain this information. These
regularities introduce rotation-translation couplings that are not commonly
used in structure from motion. To validate, we use a Franka Emika Robot with an
RGB camera. We a) find that error in distance estimate is less than 5 mm at a
distance of 15 cm, and b) show how relative position can be used to find
obstacles under challenging scenarios. We combine accurate distance estimates
and obstacle information into a reactive robot behavior that is able to pick up
objects of unknown size, while impeded by unforeseen obstacles. Project page:
https://oxidification.com/p/one-object-at-a-time/
Accepted at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
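One such rotation-translation coupling can be sketched directly: during fixation, the gaze rotation rate is tied to the translation speed and the distance of the fixated point. This planar one-liner is an illustrative simplification, not the paper's estimator:

```python
def fixation_distance(lateral_speed, gaze_rate):
    """Distance to a fixated point from the coupling of fixation:
    keeping gaze locked on a point while translating laterally at
    speed v forces the camera to rotate at w = v / Z, so Z = v / w.
    Variable names and the planar setup are illustrative."""
    if gaze_rate == 0.0:
        return float('inf')   # no compensating rotation: point at infinity
    return lateral_speed / gaze_rate
```

At 3 cm/s of lateral motion, a compensating gaze rotation of 0.2 rad/s implies a fixated point 15 cm away, the working range quoted in the abstract.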
Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between centre and surround
classes. Discriminant power of features for the classification is measured as
mutual information between distributions of image features and corresponding
classes. As the estimated discrepancy depends strongly on the considered scale
level, multi-scale structure and discriminant power are integrated by employing
discrete wavelet features and Hidden Markov Tree (HMT). With wavelet
coefficients and Hidden Markov Tree parameters, quad-tree like label structures
are constructed and utilized in maximum a posteriori (MAP) estimation of hidden
class variables at the corresponding dyadic sub-squares. Then, a saliency value for
each square block at each scale level is computed with the discriminant-power
principle. Finally, the per-scale maps are integrated into the final saliency map
by an information-maximization rule. Both standard quantitative tools such as
NSS, LCC, AUC and qualitative assessments are used for evaluating the proposed
multi-scale discriminant saliency (MDIS) method against the well-known
information-based approach AIM on its released image collection with
eye-tracking data. Simulation results are presented and analysed to verify the
validity of MDIS as well as point out its limitation for further research
directions.
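The discriminant-power measure at the heart of the method, mutual information between a centre/surround label and feature values, can be sketched for a single scale. This toy uses raw feature histograms rather than the paper's wavelet/HMT machinery, and the function name is illustrative:

```python
import math
from collections import Counter

def discriminant_power(center_feats, surround_feats):
    """Mutual information (bits) between a binary centre/surround label
    and a discrete feature value, estimated from sample histograms.
    High values mean the feature discriminates centre from surround,
    i.e. the block is salient under the discriminant principle."""
    n_c, n_s = len(center_feats), len(surround_feats)
    n = n_c + n_s
    p_c, p_s = n_c / n, n_s / n
    hc, hs = Counter(center_feats), Counter(surround_feats)
    mi = 0.0
    for v in set(hc) | set(hs):
        p_v = (hc[v] + hs[v]) / n            # marginal of the feature value
        for h, p_class in ((hc, p_c), (hs, p_s)):
            p_joint = h[v] / n               # joint of value and class
            if p_joint > 0:
                mi += p_joint * math.log2(p_joint / (p_v * p_class))
    return mi
```

A perfectly discriminative feature yields 1 bit for balanced classes; identical centre and surround distributions yield 0.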
Biophysically motivated efficient estimation of the spatially isotropic R2* component from a single gradient-recalled echo measurement
Purpose
To propose and validate an efficient method, based on a biophysically motivated signal model, for removing the orientation-dependent part of R2* using a single gradient-recalled echo (GRE) measurement.
Methods
The proposed method utilized a temporal second-order approximation of the hollow-cylinder-fiber model, in which the parameter describing the linear signal decay corresponded to the orientation-independent part of R2*. The estimated parameters were compared to the classical, mono-exponential decay model for R2* in a sample of an ex vivo human optic chiasm (OC). The OC was measured at 16 distinct orientations relative to the external magnetic field using GRE at 7 T. To show that the proposed signal model can remove the orientation dependence of R2*, it was compared to the established phenomenological method for separating R2* into orientation-dependent and -independent parts.
Results
Using the phenomenological method on the classical signal model, the well-known separation of R2* into orientation-dependent and -independent parts was verified. For the proposed model, no significant orientation dependence in the linear signal decay parameter was observed.
Conclusions
Since the proposed second-order model features orientation-dependent and -independent components at distinct temporal orders, it can be used to remove the orientation dependence of R2* using only a single GRE measurement.
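The estimation described here reduces to fitting a quadratic to the log-signal, where the linear coefficient is the orientation-independent decay rate and the quadratic term absorbs the orientation-dependent part. The minimal three-echo solve below is an illustrative sketch under that reading of the model, not the authors' fitting procedure:

```python
import math

def fit_r2star_second_order(te, signal):
    """Solve ln S(t) = ln S0 - r2_iso * t - c2 * t**2 exactly from three
    echo times and magnitudes (Newton divided differences). The linear
    coefficient r2_iso is the orientation-independent part of R2*;
    variable names are illustrative, not from the paper."""
    t0, t1, t2 = te
    y = [math.log(s) for s in signal]
    # Divided differences give the unique quadratic through the points:
    d01 = (y[1] - y[0]) / (t1 - t0)
    d12 = (y[2] - y[1]) / (t2 - t1)
    a = (d12 - d01) / (t2 - t0)           # coefficient of t**2
    b = d01 - a * (t0 + t1)               # coefficient of t
    c = y[0] - b * t0 - a * t0 * t0       # constant term, ln S0
    return -b, -a, math.exp(c)            # r2_iso, c2, S0
```

In practice one would least-squares fit many echoes rather than solve three exactly, but the role of each coefficient is the same.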
Emerging hypothesis verification using function-based geometric models and active vision strategies
This paper describes an investigation into the use of parametric 2D models describing the movement of edges for the determination of possible 3D shape, and hence function, of an object. An assumption of this research is that the camera can foveate and track particular features. It is argued that simple 2D analytic descriptions of the movement of edges can infer 3D shape while the camera is moved. This exploits an advantage of foveation: the problem becomes object-centred. The problem of correspondence for numerous edge points is overcome by the use of a tree-based representation for the competing hypotheses. Numerous hypotheses are maintained simultaneously, and the method does not rely on a single kinematic model that assumes constant velocity or acceleration. The numerous advantages of this strategy are described.
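The multiple-hypothesis idea can be illustrated for a single tracked point: run several kinematic predictors in parallel and score each by its accumulated prediction error. This toy omits the tree over many edge points; the model names and unit time step are assumptions:

```python
def track_hypotheses(observations):
    """Maintain competing kinematic hypotheses for one tracked edge
    point and score each by accumulated one-step prediction error,
    keeping all hypotheses alive rather than committing to one model."""
    def const_pos(xs): return xs[-1]                            # static edge
    def const_vel(xs): return 2 * xs[-1] - xs[-2]               # constant velocity
    def const_acc(xs): return 3 * xs[-1] - 3 * xs[-2] + xs[-3]  # constant acceleration
    models = {'static': const_pos,
              'velocity': const_vel,
              'acceleration': const_acc}
    errors = {name: 0.0 for name in models}
    for t in range(3, len(observations)):
        history = observations[:t]
        for name, predict in models.items():
            errors[name] += (predict(history) - observations[t]) ** 2
    return min(errors, key=errors.get), errors
```

Because every hypothesis keeps a running score, a change in the object's motion simply shifts which hypothesis wins, instead of breaking a single committed kinematic model.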