    View-based approaches to spatial representation in human vision

    In an immersive virtual environment, observers fail to notice the expansion of a room around them and consequently make gross errors when comparing the size of objects. This result is difficult to explain if the visual system continuously generates a 3-D model of the scene based on known baseline information from interocular separation or proprioception as the observer walks. An alternative is that observers use view-based methods to guide their actions and to represent the spatial layout of the scene. In this case, they may have an expectation of the images they will receive but be insensitive to the rate at which images arrive as they walk. We describe the way in which the eye movement strategy of animals simplifies motion processing if their goal is to move towards a desired image, and we discuss dorsal and ventral stream processing of moving images in that context. Although many questions about view-based approaches to scene representation remain unanswered, the solutions are likely to be highly relevant to understanding biological 3-D vision.
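
    A minimal sketch of what such a view-based ("snapshot") strategy could look like, assuming a stored goal image and a greedy move that reduces the difference between the current and stored panoramic views; the landmark layout, the toy panoramic renderer, and the step size are illustrative, not the authors' model.

    import numpy as np

    # Toy scene: a few point landmarks around the goal location (illustrative).
    LANDMARKS = np.array([[3.0, 1.0], [-2.0, 4.0], [0.5, -3.5], [4.0, -2.0]])

    def panoramic_view(position, n_pixels=72):
        """Toy 1-D panoramic image: one smooth bump per landmark bearing."""
        bearings = np.arctan2(LANDMARKS[:, 1] - position[1],
                              LANDMARKS[:, 0] - position[0])
        pixels = np.linspace(-np.pi, np.pi, n_pixels, endpoint=False)
        image = np.zeros(n_pixels)
        for b in bearings:
            d = np.angle(np.exp(1j * (pixels - b)))   # wrapped angular distance
            image += np.exp(-(d / 0.2) ** 2)
        return image

    def homing_step(position, goal_image, step=0.25):
        """Try small candidate moves; keep the one whose view best matches the goal."""
        candidates = [position + step * np.array([np.cos(a), np.sin(a)])
                      for a in np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)]
        errors = [np.sum((panoramic_view(c) - goal_image) ** 2) for c in candidates]
        return candidates[int(np.argmin(errors))]

    goal_image = panoramic_view(np.array([0.0, 0.0]))   # snapshot stored at the goal
    pos = np.array([3.0, 3.0])
    for _ in range(40):                                 # walk towards the desired image
        pos = homing_step(pos, goal_image)
    print("final distance to goal:", np.linalg.norm(pos))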

    Active Estimation of Distance in a Robotic Vision System that Replicates Human Eye Movement

    Many visual cues, both binocular and monocular, provide 3D information. When an agent moves with respect to a scene, an important cue is the different motion of objects located at various distances. While motion parallax is most evident for large translations of the agent, in most head/eye systems a small parallax also occurs during rotations of the cameras. A similar parallax is present in the human eye. During a relocation of gaze, the shift in the retinal projection of an object depends not only on the amplitude of the movement, but also on the distance of the object with respect to the observer. This study proposes a method for estimating distance on the basis of the parallax that emerges from rotations of a camera. A pan/tilt system specifically designed to reproduce the oculomotor parallax present in the human eye was used to replicate the oculomotor strategy by which humans scan visual scenes. We show that the oculomotor parallax provides accurate estimation of distance during sequences of eye movements. In a system that actively scans a visual scene, challenging tasks such as image segmentation and figure/ground segregation greatly benefit from this cue. Funding: National Science Foundation (BIC-0432104, CCF-0130851).
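
    The underlying geometry can be illustrated with a small simulation, assuming a pan rotation about a vertical axis whose centre is offset from the optical (nodal) point, so that a "pure" rotation of the camera also translates that point and produces a depth-dependent image shift. The focal length, offset, rotation angle, and brute-force depth search below are illustrative stand-ins, not the paper's calibrated system.

    import numpy as np

    F = 800.0      # focal length in pixels (assumed)
    OFFSET = 0.01  # nodal point sits 10 mm in front of the rotation centre (assumed)

    def pan_pose(theta):
        """Camera rotation and optical-centre position after a pan by theta (rad)."""
        R = np.array([[np.cos(theta),  0.0, np.sin(theta)],
                      [0.0,            1.0, 0.0          ],
                      [-np.sin(theta), 0.0, np.cos(theta)]])
        return R, R @ np.array([0.0, 0.0, OFFSET])   # optical centre swings on an arc

    def project(P, theta):
        """Horizontal image coordinate of world point P after a pan by theta."""
        R, c = pan_pose(theta)
        p = R.T @ (P - c)                             # point in camera coordinates
        return F * p[0] / p[2]

    def estimate_distance(u0, u1, theta, depths=np.linspace(0.05, 5.0, 2000)):
        """Scan candidate depths along the initial viewing ray and keep the one
        whose predicted post-rotation image position best matches u1."""
        _, c0 = pan_pose(0.0)
        ray = np.array([u0 / F, 0.0, 1.0])
        ray /= np.linalg.norm(ray)                    # unit ray through pixel u0
        return min(depths, key=lambda z: abs(project(c0 + z * ray, theta) - u1))

    # Simulated observation: a point 40 cm away, slightly off the optical axis.
    P_true = np.array([0.03, 0.0, 0.40])
    u_before = project(P_true, 0.0)
    u_after = project(P_true, np.deg2rad(10.0))
    print("estimated distance along the ray (m):",
          estimate_distance(u_before, u_after, np.deg2rad(10.0)))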

    Different Motion Cues Are Used to Estimate Time-to-arrival for Frontoparallel and Looming Trajectories

    Estimation of time-to-arrival for moving objects is critical to obstacle interception and avoidance, as well as to timing actions such as reaching and grasping moving objects. The source of motion information that conveys arrival time varies with the trajectory of the object, raising the question of whether multiple context-dependent mechanisms are involved in this computation. To address this question we conducted a series of psychophysical studies to measure observers’ performance on time-to-arrival estimation when object trajectory was specified by angular motion (“gap closure” trajectories in the frontoparallel plane), looming (colliding trajectories, TTC), or both (passage courses, TTP). We measured performance of time-to-arrival judgments in the presence of irrelevant motion, in which a perpendicular motion vector was added to the object trajectory. Data were compared to models of expected performance based on the use of different components of optical information. Our results demonstrate that for gap closure, performance depended only on the angular motion, whereas for TTC and TTP, both angular and looming motion affected performance. This dissociation of inputs suggests that gap closures are mediated by a mechanism separate from that used for the detection of time-to-collision and time-to-passage. We show that existing models of TTC and TTP estimation make systematic errors in predicting subject performance, and we suggest that a model that weights motion cues by their relative time-to-arrival provides a better account of performance.
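
    The two optical quantities at issue can be written down directly: for a looming (colliding) trajectory, the tau relation gives time-to-contact ≈ θ / (dθ/dt), with θ the object's angular size, while for a frontoparallel gap closure the arrival time is the angular gap divided by its closure rate. A short worked example with illustrative numbers (not the study's stimuli):

    # Looming (colliding) trajectory: tau = theta / (d theta / dt).
    theta = 2.0          # angular size of the object, degrees
    theta_dot = 0.8      # rate of angular expansion, degrees per second
    ttc_from_looming = theta / theta_dot      # 2.5 s until contact

    # Frontoparallel gap closure: time = angular gap / angular closure rate.
    gap = 5.0            # angular gap between object and target position, degrees
    gap_rate = 2.0       # closure rate, degrees per second
    tta_from_gap = gap / gap_rate             # 2.5 s until arrival

    print(ttc_from_looming, tta_from_gap)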

    One Object at a Time: Accurate and Robust Structure From Motion for Robots

    A gaze-fixating robot perceives the distance to the fixated object and the relative positions of surrounding objects immediately, accurately, and robustly. We show how fixation, which is the act of looking at one object while moving, exploits regularities in the geometry of 3D space to obtain this information. These regularities introduce rotation-translation couplings that are not commonly used in structure from motion. To validate, we use a Franka Emika Robot with an RGB camera. We (a) find that the error in the distance estimate is less than 5 mm at a distance of 15 cm, and (b) show how relative position can be used to find obstacles in challenging scenarios. We combine accurate distance estimates and obstacle information into a reactive robot behavior that is able to pick up objects of unknown size while impeded by unforeseen obstacles. Project page: https://oxidification.com/p/one-object-at-a-time/. Comment: v3: add link to project page; v2: update DOI; v1: accepted at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
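
    One way to see the rotation-translation coupling that fixation provides (a generic illustration, not the paper's estimator): if the camera rotates at rate ω just fast enough to keep one object centred while the robot translates with velocity v, then ω = |v⊥| / Z, where v⊥ is the translation component perpendicular to the gaze direction, so the distance Z of the fixated object follows from quantities the robot already measures.

    import numpy as np

    # Illustrative values, not the paper's data: commanded camera translation,
    # gaze direction, and the rotation rate needed to hold fixation.
    v = np.array([0.05, 0.0, 0.02])     # camera translation velocity, m/s (assumed)
    gaze = np.array([0.0, 0.0, 1.0])    # unit gaze direction towards the fixated object
    omega = 0.25                        # measured fixational rotation rate, rad/s (assumed)

    v_perp = v - np.dot(v, gaze) * gaze            # translation perpendicular to gaze
    Z = np.linalg.norm(v_perp) / omega             # distance to the fixated object
    print("distance to fixated object (m):", Z)    # 0.05 / 0.25 = 0.20 m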

    Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling

    Bottom-up saliency, an early stage of human visual attention, can be considered a binary classification problem between centre and surround classes. The discriminant power of features for this classification is measured as the mutual information between the distributions of image features and the corresponding classes. As the estimated discrepancy depends strongly on the scale considered, multi-scale structure and discriminant power are integrated by employing discrete wavelet features and a Hidden Markov Tree (HMT). From the wavelet coefficients and HMT parameters, quad-tree-like label structures are constructed and used for maximum a posteriori (MAP) estimation of the hidden class variables at the corresponding dyadic sub-squares. A saliency value for each square block at each scale level is then computed with the discriminant power principle. Finally, the saliency maps across multiple scales are integrated into the final saliency map by an information maximization rule. Both standard quantitative tools such as NSS, LCC, and AUC, and qualitative assessments are used to evaluate the proposed multi-scale discriminant saliency (MDIS) method against the well-known information-based approach AIM on its released image collection with eye-tracking data. Simulation results are presented and analysed to verify the validity of MDIS and to point out its limitations as directions for further research. Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
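
    A minimal sketch of the discriminant-power measure, assuming two equally likely classes and histogram estimates of the class-conditional feature distributions; the Gaussian samples below stand in for wavelet coefficients and are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    centre = rng.normal(1.5, 1.0, 5000)     # feature samples from the centre class
    surround = rng.normal(0.0, 1.0, 5000)   # feature samples from the surround class

    def discriminant_power(x_centre, x_surround, n_bins=32):
        """I(feature; class) in bits for two equally likely classes, via histograms."""
        edges = np.histogram_bin_edges(np.concatenate([x_centre, x_surround]), n_bins)
        p_xc = np.histogram(x_centre, edges)[0] / x_centre.size      # p(x | centre)
        p_xs = np.histogram(x_surround, edges)[0] / x_surround.size  # p(x | surround)
        p_x = 0.5 * (p_xc + p_xs)                                    # marginal p(x)
        mi = 0.0
        for cond in (p_xc, p_xs):
            nz = cond > 0
            mi += 0.5 * np.sum(cond[nz] * np.log2(cond[nz] / p_x[nz]))
        return mi   # larger mutual information = more discriminant = more salient

    print("discriminant power:", discriminant_power(centre, surround), "bits")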

    Biophysically motivated efficient estimation of the spatially isotropic R2* component from a single gradient‐recalled echo measurement

    Purpose: To propose and validate an efficient method, based on a biophysically motivated signal model, for removing the orientation‐dependent part of R2* using a single gradient‐recalled echo (GRE) measurement. Methods: The proposed method utilized a temporal second‐order approximation of the hollow‐cylinder‐fiber model, in which the parameter describing the linear signal decay corresponded to the orientation‐independent part of R2*. The estimated parameters were compared to the classical, mono‐exponential decay model for R2* in a sample of an ex vivo human optic chiasm (OC). The OC was measured at 16 distinct orientations relative to the external magnetic field using GRE at 7T. To show that the proposed signal model can remove the orientation dependence of R2*, it was compared to the established phenomenological method for separating R2* into orientation‐dependent and ‐independent parts. Results: Using the phenomenological method on the classical signal model, the well‐known separation of R2* into orientation‐dependent and ‐independent parts was verified. For the proposed model, no significant orientation dependence in the linear signal decay parameter was observed. Conclusions: Since the proposed second‐order model features orientation‐dependent and ‐independent components at distinct temporal orders, it can be used to remove the orientation dependence of R2* using only a single GRE measurement.
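
    A minimal sketch of the estimation step this implies, assuming multi-echo magnitude data and a second-order log-signal model ln S(TE) ≈ ln S0 − R2*_iso · TE − c · TE², in which the linear coefficient is the orientation-independent decay rate; the echo times, decay rates, and noise level below are illustrative, not the ex vivo protocol.

    import numpy as np

    # Synthetic multi-echo GRE magnitude data (illustrative values, assumed units).
    te = np.arange(2e-3, 40e-3, 2e-3)       # echo times in seconds
    r2s_iso, quad = 40.0, 300.0             # isotropic R2* (1/s) and quadratic term (1/s^2)
    rng = np.random.default_rng(1)
    signal = np.exp(-r2s_iso * te - quad * te**2) * (1.0 + 0.01 * rng.normal(size=te.size))

    # Fit ln S(TE) with a second-order polynomial; the linear coefficient is the
    # orientation-independent decay rate, while the quadratic term absorbs the
    # orientation-dependent dephasing.
    coeffs = np.polyfit(te, np.log(signal), deg=2)   # [quadratic, linear, constant]
    print("estimated orientation-independent R2* (1/s):", -coeffs[1])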

    Emerging hypothesis verification using function-based geometric models and active vision strategies

    This paper describes an investigation into the use of parametric 2D models describing the movement of edges for the determination of possible 3D shape, and hence function, of an object. An assumption of this research is that the camera can foveate and track particular features. It is argued that simple 2D analytic descriptions of the movement of edges can be used to infer 3D shape while the camera is moved. This exploits an advantage of foveation, i.e. the problem becomes object-centred. The problem of correspondence for numerous edge points is overcome by the use of a tree-based representation for the competing hypotheses. Numerous hypotheses are maintained simultaneously, and the method does not rely on a single kinematic model that assumes constant velocity or acceleration. The numerous advantages of this strategy are described.
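
    A generic sketch of such a tree of competing hypotheses (not the paper's exact formulation): each hypothesis is one possible correspondence history for a tracked edge, every new frame branches it over the candidate edge positions, and branches are scored against several simple kinematic predictions rather than a single constant-velocity or constant-acceleration model.

    def predictions(track):
        """Competing kinematic predictions for the next 1-D edge position."""
        if len(track) < 2:
            return [track[-1]]
        cv = 2 * track[-1] - track[-2]                       # constant velocity
        if len(track) < 3:
            return [cv]
        ca = 3 * track[-1] - 3 * track[-2] + track[-3]       # constant acceleration
        return [cv, ca]

    def extend(hypotheses, candidates, beam=4):
        """Branch every hypothesis over every candidate edge position, then prune."""
        branches = []
        for track, cost in hypotheses:
            for cand in candidates:
                err = min(abs(cand - p) for p in predictions(track))
                branches.append((track + [cand], cost + err))
        return sorted(branches, key=lambda b: b[1])[:beam]   # keep the best branches

    # Toy sequence: an edge moving at constant velocity plus a static distractor edge.
    hypotheses = [([0.0], 0.0)]
    for frame in range(1, 6):
        candidates = [2.0 * frame, 7.0]                      # true edge and a distractor
        hypotheses = extend(hypotheses, candidates)
    print("best correspondence history:", hypotheses[0][0])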