    Visual motion processing and human tracking behavior

    Accurate visual tracking of a moving object is a fundamental human skill that reduces the relative slip and instability of the object's image on the retina, thus granting stable, high-quality vision. To optimize tracking performance over time, a quick estimate of the object's global motion properties needs to be fed to the oculomotor system and dynamically updated. Concurrently, latency and accuracy can be greatly improved by taking predictive cues into account, especially under variable visibility conditions and in the presence of ambiguous retinal information. Here, we review several recent studies focusing on the integration of retinal and extra-retinal information for the control of human smooth pursuit. By dynamically probing tracking performance with paradigms that are well established in the visual perception and oculomotor literature, we provide the basis for testing theoretical hypotheses within the framework of dynamic probabilistic inference. In particular, we present the applications of these results in light of state-of-the-art computer vision algorithms.
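
As an illustration of what combining retinal and extra-retinal information under dynamic probabilistic inference can look like, the following minimal sketch runs a scalar Kalman filter over target velocity. It is not taken from the reviewed studies: the function name, the constant-velocity assumption, and the use of `None` to mark samples without retinal input (for example, a transient occlusion) are all assumptions made for this example.

```python
def kalman_velocity(measurements, prior_mean, prior_var, process_var, meas_var):
    """Scalar Kalman filter over target velocity (deg/s).

    measurements: noisy retinal velocity samples; None marks a sample with
                  no retinal input, where only the prediction is available.
    prior_mean, prior_var: predictive (extra-retinal) expectation and its variance.
    process_var: how much the true velocity may drift between samples (assumed).
    meas_var: variance of the retinal measurement noise (assumed).
    """
    mean, var = prior_mean, prior_var
    estimates = []
    for z in measurements:
        # Predict: velocity is assumed roughly constant, so only uncertainty grows.
        var += process_var
        if z is not None:
            # Update: weight the retinal sample by its reliability.
            gain = var / (var + meas_var)
            mean += gain * (z - mean)
            var *= 1.0 - gain
        estimates.append(mean)
    return estimates
```

With a reliable prior the estimate starts close to the expected velocity and changes little; with a vague prior it is pulled quickly toward the retinal samples. During samples marked `None` the estimate is simply carried forward by the prediction.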

    Gaze control modelling and robotic implementation

    Although we have the impression that we can process the entire visual field in a single fixation, in reality we would be unable to fully process the information outside of foveal vision if we were unable to move our eyes. Because of acuity limitations in the retina, eye movements are necessary for processing the details of the array. Our ability to discriminate fine detail drops off markedly outside of the fovea in the parafovea (extending out to about 5 degrees on either side of fixation) and in the periphery (everything beyond the parafovea). While we are reading or searching a visual array for a target or simply looking at a new scene, our eyes move every 200-350 ms. These eye movements serve to move the fovea (the high resolution part of the retina encompassing 2 degrees at the centre of the visual field) to an area of interest in order to process it in greater detail. During the actual eye movement (or saccade), vision is suppressed and new information is acquired only during the fixation (the period of time when the eyes remain relatively still). While it is true that we can move our attention independently of where the eyes are fixated, it does not seem to be the case in everyday viewing. The separation between attention and fixation is often attained in very simple tasks; however, in tasks like reading, visual search, and scene perception, covert attention and overt attention (the exact eye location) are tightly linked. Because eye movements are essentially motor movements, it takes time to plan and execute a saccade. In addition, the end-point is pre-selected before the beginning of the movement. There is considerable evidence that the nature of the task influences eye movements. Depending on the task, there is considerable variability both in terms of fixation durations and saccade lengths. It is possible to outline five separate movement systems that put the fovea on a target and keep it there. 
Each of these movement systems shares the same effector pathway—the three bilateral groups of oculomotor neurons in the brain stem. These five systems include three that keep the fovea on a visual target in the environment and two that stabilize the eye during head movement. Saccadic eye movements shift the fovea rapidly to a visual target in the periphery. Smooth pursuit movements keep the image of a moving target on the fovea. Vergence movements move the eyes in opposite directions so that the image is positioned on both foveae. Vestibulo-ocular movements hold images still on the retina during brief head movements and are driven by signals from the vestibular system. Optokinetic movements hold images during sustained head rotation and are driven by visual stimuli. All eye movements but vergence movements are conjugate: each eye moves the same amount in the same direction. Vergence movements are disconjugate: The eyes move in different directions and sometimes by different amounts. Finally, there are times that the eye must stay still in the orbit so that it can examine a stationary object. Thus, a sixth system, the fixation system, holds the eye still during intent gaze. This requires active suppression of eye movement. Vision is most accurate when the eyes are still. When we look at an object of interest a neural system of fixation actively prevents the eyes from moving. The fixation system is not as active when we are doing something that does not require vision, for example, mental arithmetic. Our eyes explore the world in a series of active fixations connected by saccades. The purpose of the saccade is to move the eyes as quickly as possible. Saccades are highly stereotyped; they have a standard waveform with a single smooth increase and decrease of eye velocity. Saccades are extremely fast, occurring within a fraction of a second, at speeds up to 900°/s. Only the distance of the target from the fovea determines the velocity of a saccadic eye movement. 
We can change the amplitude and direction of our saccades voluntarily but we cannot change their velocities. Ordinarily there is no time for visual feedback to modify the course of the saccade; corrections to the direction of movement are made in successive saccades. Only fatigue, drugs, or pathological states can slow saccades. Accurate saccades can be made not only to visual targets but also to sounds, tactile stimuli, memories of locations in space, and even verbal commands (“look left”). The smooth pursuit system keeps the image of a moving target on the fovea by calculating how fast the target is moving and moving the eyes accordingly. The system requires a moving stimulus in order to calculate the proper eye velocity. Thus, a verbal command or an imagined stimulus cannot produce smooth pursuit. Smooth pursuit movements have a maximum velocity of about 100°/s, much slower than saccades. The saccadic and smooth pursuit systems have very different central control systems. A coherent integration of these different eye movements, together with the other movements, essentially corresponds to a gating-like effect on the brain areas controlled. Gaze control can be viewed as comprising one system that decides which action should be enabled and which should be inhibited, and another that improves the performance of the action when it is executed. It follows that the guiding principle of gaze control is the kind of stimuli presented to the system, which in turn links gaze to the task to be executed. This thesis aims at validating the strong relation between actions and gaze. In the first part, a gaze controller was studied and implemented on a robotic platform in order to understand the specific features of prediction and learning shown by the biological system. The integration of eye movements raises the problem of which action should be selected when a new stimulus is presented.
The action selection problem is solved by the basal ganglia, brain structures that respond to the different salience values of the environment. In the second part of this work, gaze behaviour was studied during a locomotion task. The final objective is to show how different tasks, such as locomotion, determine the salience values that drive the gaze.

    Optimizations and applications in head-mounted video-based eye tracking

    Video-based eye tracking techniques have become increasingly attractive in many research fields, such as visual perception and human-computer interface design. The technique primarily relies on the positional difference between the center of the eye's pupil and the first-surface reflection at the cornea, the corneal reflection (CR). This difference vector is mapped to determine an observer's point of regard (POR). Current head-mounted video-based eye trackers are limited in several respects, such as inadequate measurement range and misdetection of eye features (pupil and CR). This research first proposes a new 'structured illumination' configuration, using multiple IREDs to illuminate the eye, to ensure that eye positions can still be tracked even during extreme eye movements (up to ±45° horizontally and ±25° vertically). Eye features are then detected by a two-stage processing approach. First, potential CRs and the pupil are isolated based on statistical information in an eye image. Second, genuine CRs are distinguished by a novel CR location prediction technique based on the well-correlated relationship between the offset of the pupil and that of the CR. The optical relationship between the pupil and CR offsets derived in this thesis can be applied to two typical illumination configurations, collimated and near-source, in the video-based eye tracking system. The relationship from the optical derivation and that from an experimental measurement match well. Two application studies of smooth pursuit dynamics, in controlled static (laboratory) and unconstrained vibrating (car) environments, were conducted.
In the first study, the extended stimuli (color photographs subtending 2° and 17°, respectively) were found to enhance smooth pursuit movements induced by realistic images, and the eye velocity for tracking a small dot (subtending <0.1°) saturated at about 64 deg/sec, while saturation occurred at higher velocities for the extended images. The difference in gain due to target size was significant between the dot and the two extended stimuli, while no statistical difference existed between the two extended stimuli. In the second study, two visual stimuli, the same as in the first study, were used. Visual performance was impaired dramatically by the whole-body motion in the car, even when tracking a slowly moving target (2 deg/sec); the eye was unable to pursue as smoothly as in the static environment, even though the unconstrained head motion in the unstable condition was expected to enhance visual performance.
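
The mapping from the pupil-minus-CR difference vector to the point of regard is typically obtained from a short calibration. The sketch below shows one simple way to do this, assuming an affine mapping fitted by least squares. The abstract does not state which mapping the thesis uses, so the affine form and every function name here are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_axis(vectors, targets):
    """Least-squares fit of target = c0 + c1*dx + c2*dy (normal equations)."""
    rows = [[1.0, dx, dy] for dx, dy in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear(ata, atb)


def calibrate(pupils, crs, por_points):
    """Fit one affine map per screen axis from calibration fixations."""
    vectors = [(p[0] - c[0], p[1] - c[1]) for p, c in zip(pupils, crs)]
    coef_x = fit_axis(vectors, [q[0] for q in por_points])
    coef_y = fit_axis(vectors, [q[1] for q in por_points])
    return coef_x, coef_y


def estimate_por(pupil, cr, coef_x, coef_y):
    """Map a single pupil/CR pair (image coordinates) to a point of regard."""
    dx, dy = pupil[0] - cr[0], pupil[1] - cr[1]
    return (coef_x[0] + coef_x[1] * dx + coef_x[2] * dy,
            coef_y[0] + coef_y[1] * dx + coef_y[2] * dy)
```

At least three calibration fixations that are not collinear in difference-vector space are needed for the fit to be well defined. Real systems often use higher-order polynomial terms to cover a wide measurement range, which this sketch leaves out.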

    I can see it in your eyes: what the Xenopus laevis eye can teach us about motion perception

    Displacing a tracked object during eyeblinks: behavioural and neuromagnetic findings

    The visual world is perceived as continuous despite frequent interruptions of sensory data due to eyeblinks and rapid eye movements. To create the perception of constancy, the brain makes use of fill-in mechanisms. This study presents an experiment in which the location of an object tracked with smooth pursuit is altered during eyeblinks. The experiment investigates how blink suppression and fill-in mechanisms cloud the discrimination of these changes. We employed a motion-tracking task, which promotes accurate evaluation of the object's trajectory and can thus counteract the fill-in mechanisms. Six subjects took part in the experiment, during which they were asked to report any perceived anomalies in the trajectory. Eye movements were monitored with video-based tracking and brain responses with simultaneous MEG recordings. Discrimination success was found to depend on the direction of the displacement, and was significantly modulated by prior knowledge of the triggered effect. Eye-movement data were congruent with previous findings and revealed a smooth transition from blink recovery to object locating. MEG recordings were analysed for condition-dependent evoked and induced responses; however, intersubject variability was too large to draw clear conclusions regarding the brain basis of the fill-in mechanisms.

    Correction of Errors in Time of Flight Cameras

    This thesis addresses the correction of errors in depth cameras based on Time of Flight (ToF). Among the most recent technologies, Continuous Wave Modulation (CWM) ToF cameras are a promising alternative for building compact and fast sensors. However, a wide variety of errors significantly affects the depth measurement, compromising potential applications. Correcting these errors is a challenging problem. Currently, two main sources of error are considered: i) systematic and ii) non-systematic. While the former can be calibrated, the latter depends on the geometry and relative motion of the scene. This thesis proposes methods that address i) systematic depth distortion and two of the most relevant non-systematic error sources: ii.a) Multipath Interference (MpI) and ii.b) motion artefacts. Systematic depth distortion in ToF cameras arises mainly from the use of imperfect sinusoidal signals for modulation. As a result, depth measurements appear distorted, and the distortion can be reduced with a calibration stage. This thesis proposes a calibration method based on showing the camera a plane in different positions and orientations. The method requires no calibration patterns and can therefore use the planes that appear naturally in the scene. The proposed method finds a function that gives the depth correction for each pixel. This thesis improves on existing methods in terms of accuracy, efficiency and suitability. Multipath interference arises from the superposition of the signal reflected along different paths with the direct reflection, producing distortions that are most noticeable on convex surfaces. MpI causes significant depth estimation errors in CWM ToF cameras. This thesis proposes a method that removes MpI from a single depth map. The proposed approach requires no information about the scene beyond the ToF measurements. The method is based on a radiometric model of the measurements, which is used to estimate the undistorted depth map very accurately. One of the leading technologies for ToF depth imaging is based on the Photonic Mixer Device (PMD), which obtains depth by sequentially sampling the correlation between the modulation signal and the signal returning from the scene at different phase shifts. Under motion, PMD pixels capture different depths at each sampling stage, producing motion artefacts. The method proposed in this thesis for correcting these artefacts stands out for its speed and simplicity, and could easily be included in the camera hardware. The depth of each pixel is recovered thanks to the consistency between the correlation samples of the PMD pixel and those of its local neighbourhood. This method obtains accurate corrections, greatly reducing motion artefacts. In addition, as a by-product of this method, the optical flow at moving contours can be obtained from a single capture. Despite being a very promising alternative for depth acquisition, ToF cameras still have to solve challenging problems regarding the correction of systematic and non-systematic errors. This thesis proposes effective methods to deal with these errors.
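
To make the sequential correlation sampling concrete, the sketch below computes depth from four correlation samples taken at phase shifts of 0°, 90°, 180° and 270°, a common scheme for continuous-wave ToF. The abstract does not give the number of samples or the sign convention, so the four-sample layout, the ideal cosine correlation model, and all names are assumptions for illustration only.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def correlation_samples(depth_m, f_mod_hz, amplitude=1.0, offset=2.0):
    """Ideal correlation samples at 0, 90, 180 and 270 degrees (assumed model)."""
    phase = 4.0 * math.pi * f_mod_hz * depth_m / C  # round-trip phase delay
    return [offset + amplitude * math.cos(phase + k * math.pi / 2.0)
            for k in range(4)]


def depth_from_samples(samples, f_mod_hz):
    """Recover depth from four phase-shifted correlation samples."""
    a0, a1, a2, a3 = samples
    # The constant offset cancels in both differences.
    phase = math.atan2(a3 - a1, a0 - a2) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * f_mod_hz)


def moving_edge_samples(depth_first, depth_second, f_mod_hz):
    """Samples for a pixel whose depth changes halfway through the sequence."""
    first = correlation_samples(depth_first, f_mod_hz)
    second = correlation_samples(depth_second, f_mod_hz)
    return first[:2] + second[2:]
```

For a static pixel, `depth_from_samples(correlation_samples(d, f), f)` returns `d` as long as `d` lies within the unambiguous range `C / (2 * f)`. Feeding the output of `moving_edge_samples` into `depth_from_samples` mixes samples from two depths, so the result generally matches neither surface, which is the kind of motion artefact described above.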

    Real-time synthetic primate vision

