The effects of viewpoint on the virtual space of pictures
Pictorial displays whose primary purpose is to convey accurate information about the 3-D spatial layout of an environment are discussed, along with how, and how well, pictures can convey such information. It is suggested that picture perception is not best approached as a unitary, indivisible process. Rather, it is a complex process depending on multiple, partially redundant, interacting sources of visual information for both the real surface of the picture and the virtual space beyond. Each picture must be assessed for the particular information that it makes available. This will determine how accurately the virtual space represented by the picture is seen, as well as how it is distorted when seen from the wrong viewpoint.
Robust Image-Based Visual Servo Control of an Uncertain Missile Airframe
A nonlinear vision-based guidance law is presented for a missile-target scenario in the presence of model uncertainty and unknown target evasive maneuvers. To improve the readability of this thesis, detailed explanations of the relevant mathematical tools are provided, including stability definitions, the procedure of Lyapunov-based stability analysis, sliding mode control fundamentals, basics of visual servo control, and other basic nonlinear control tools. To develop the vision-based guidance law, projective geometric relationships are utilized to combine the image kinematics with the missile dynamics in an integrated visual dynamic system. The guidance law is designed using an image-based visual servo control method in conjunction with a sliding-mode control strategy, which is shown to achieve asymptotic target interception in the presence of the aforementioned uncertainties. A Lyapunov-based stability analysis is presented to prove the theoretical result, and numerical simulation results are provided to demonstrate the performance of the proposed robust controller for both stationary and non-stationary targets.
An Active Visual Estimator for Dexterous Manipulation
We present a working implementation of a dynamics-based architecture for visual sensing. This architecture provides field-rate estimates of the positions and velocities of two independent falling balls in the face of repeated visual occlusions and departures from the field of view. The practical success of this system can be attributed to the interconnection of two strongly nonlinear dynamical systems: a novel triangulating state estimator and an image plane window controller. We detail the architecture of this active sensor, provide data documenting its performance, and offer an analysis of its soundness in the form of a convergence proof for the estimator and a boundedness proof for the manager.
InGVIO: A Consistent Invariant Filter for Fast and High-Accuracy GNSS-Visual-Inertial Odometry
Combining Global Navigation Satellite System (GNSS) with visual and inertial
sensors can give smooth pose estimation without drifting in geographical
coordinates. The fusion system gradually degrades to Visual-Inertial Odometry
(VIO) with the number of satellites decreasing, which guarantees robust global
navigation in GNSS unfriendly environments. In this letter, we propose an
open-sourced invariant filter-based platform, InGVIO, to tightly fuse
monocular/stereo visual-inertial measurements, along with raw data from GNSS,
i.e. pseudo ranges and Doppler shifts. InGVIO gives highly competitive results
in terms of accuracy and computational load compared to current graph-based and
'naive' EKF-based algorithms. Thanks to our proposed key-frame marginalization
strategies, the baseline for triangulation is large although only a few cloned
poses are kept. Besides, landmarks are anchored to a single cloned pose to fit
the nonlinear log-error form of the invariant filter while achieving decoupled
propagation with IMU states. Moreover, we exploit the infinitesimal symmetries
of the system, which gives equivalent results for the pattern of degenerate
motions and the structure of unobservable subspaces compared to our previous
work using observability analysis. We show that the properly-chosen invariant
error captures such symmetries and has intrinsic consistency properties. InGVIO
is tested on both open datasets and our proposed fixed-wing datasets with
variable levels of difficulty. The latter, to the best of our knowledge, are
the first datasets open-sourced to the community on a fixed-wing aircraft with
raw GNSS.
Comment: 8 pages, 8 figures; manuscript will be submitted to IEEE RA-L for possible publication
3D motion : encoding and perception
The visual system supports perception and inferences about events in a dynamic, three-dimensional (3D) world. While remarkable progress has been made in the study of visual information processing, the existing paradigms for examining visual perception and its relation to neural activity often fail to generalize to perception in the real world which has complex dynamics and 3D spatial structure. This thesis focuses on the case of 3D motion, developing dynamic tasks for studying visual perception and constructing a neural coding framework to relate neural activity to perception in a 3D environment.
First, I introduce target-tracking as a psychophysical method and develop an analysis framework based on state space models and the Kalman filter. I demonstrate that target-tracking in conjunction with a Kalman filter analysis framework produces estimates of visual sensitivity that are comparable to those obtained with a traditional forced-choice task and a signal detection theory analysis. Next, I use the target-tracking paradigm in a series of experiments examining 3D motion perception, specifically comparing the perception of frontoparallel motion with the perception of motion-through-depth. I find that continuous tracking of motion-through-depth is selectively impaired due to the relatively small retinal projections resulting from motion-through-depth and the slower processing of binocular disparities.
The thesis then turns to the neural representation of 3D motion and how that underlies perception. First, I introduce a theoretical framework that extends the standard neural coding approach, incorporating the environment-to-retina transformation. Neural coding typically treats the visual stimulus as a direct proxy for the pattern of stimulation that falls on the retina. Incorporating the environment-to-retina transformation results in a neural representation fundamentally shaped by the projective geometry of the world onto the retina. This model explains substantial anomalies in existing neurophysiological recordings in primate visual cortical neurons during presentations of 3D motion and in psychophysical studies of human perception. In a series of psychophysical experiments, I systematically examine the predictions of the model for human perception by observing how perceptual performance changes as a function of viewing distance and eccentricity. Performance in these experiments suggests a reliance on a neural representation similar to the one described by the model.
Taken together, the experimental and theoretical findings reported here advance the understanding of the neural representation and perception of the dynamic 3D world, and add to the behavioral tools available to vision scientists.
Angular variation as a monocular cue for spatial perception
Monocular cues are spatial sensory inputs which are picked up exclusively from one eye. They are mostly static features that
provide depth information and are extensively used in graphic art to create realistic representations of a scene. Since the spatial
information contained in these cues is picked up from the retinal image, the existence of a link between it and the theory of direct
perception can be conveniently assumed. According to this theory, spatial information of an environment is directly contained in the
optic array. Thus, this assumption makes possible the modeling of visual perception processes through computational approaches.
In this thesis, angular variation is considered as a monocular cue, and the concept of direct perception is adopted by a computer
vision approach that considers it as a suitable principle from which innovative techniques to calculate spatial information can be
developed.
The expected spatial information to be obtained from this monocular cue is the position and orientation of an object with respect to
the observer, which in computer vision is a well known field of research called 2D-3D pose estimation. In this thesis, the attempt to
establish the angular variation as a monocular cue and thus the achievement of a computational approach to direct perception is
carried out by the development of a set of pose estimation methods. Starting from conventional strategies to solve the pose
estimation problem, a first approach imposes constraint equations to relate object and image features. In this sense, two algorithms
based on a simple line rotation motion analysis were developed. These algorithms successfully provide pose information; however,
they depend strongly on scene data conditions. To overcome this limitation, a second approach inspired by the biological processes
performed by the human visual system was developed. It is based on the content of the image itself and defines a computational
approach to direct perception.
The set of developed algorithms analyzes the visual properties provided by angular variations. The aim is to gather valuable data
from which spatial information can be obtained and used to emulate a visual perception process by establishing a 2D-3D metric
relation. Since it is considered fundamental in the visual-motor coordination and consequently essential to interact with the
environment, a significant cognitive effect is produced by the application of the developed computational approach in environments
mediated by technology. In this work, this cognitive effect is demonstrated by an experimental study where a number of participants
were asked to complete an action-perception task. The main purpose of the study was to analyze visually guided behavior in
teleoperation and the cognitive effect caused by the addition of 3D information. The results showed a significant influence of the
3D aid on skill improvement, together with an enhanced sense of presence.
Precise and Robust Visual SLAM with Inertial Sensors and Deep Learning.
Endowing robots with the sense of perception stands out as the most important component for achieving fully autonomous machines. Once machines are able to perceive the world, they will be able to interact with it. In this regard, simultaneous localization and mapping (SLAM) comprises all the techniques that allow robots to estimate their position and reconstruct a map of their environment at the same time, using only their onboard sensors. SLAM is the key element of machine perception and is already present in technologies and applications such as autonomous driving, virtual and augmented reality, and service robots. Increasing the robustness of SLAM would expand its use and application, making machines safer and requiring less human intervention.
In this thesis we have combined inertial (IMU) and visual sensors to increase the robustness of SLAM against fast motions, brief occlusions, and low-texture environments. First, we have proposed two fast techniques for initializing the inertial sensor with low scale error. These have made it possible to start using the IMU as early as 2 seconds after launching the system. One of these initializations has been integrated into a new visual-inertial SLAM system, named ORB-SLAM3, which represents the main contribution of this thesis. It is the most complete open-source visual-inertial SLAM system to date, working with monocular or stereo cameras, pinhole or fisheye, and with multi-map capabilities. ORB-SLAM3 relies on a maximum a posteriori formulation, both in initialization and in refinement and visual-inertial bundle adjustment. It also exploits short-, mid-, and long-term data association. All of this makes ORB-SLAM3 the most accurate visual-inertial SLAM system, as shown by our results in public experiments.
In addition, we have explored the application of deep learning techniques to improve the robustness of SLAM. In this respect, we first proposed DynaSLAM II, a stereo SLAM system for dynamic environments. Dynamic objects are segmented by a neural network, and their points and measurements are efficiently included in the bundle adjustment optimization. This makes it possible to estimate and track moving objects while improving the estimate of the camera trajectory. Second, we have developed a direct monocular SLAM based on depth predictions from neural networks. We jointly optimize both the depth prediction residuals and the photometric residuals from different views, which yields a monocular system capable of estimating scale. It does not suffer from scale drift, and it is more robust and several times more accurate than classical monocular systems.
A Neural Model of How the Brain Computes Heading from Optic Flow in Realistic Scenes
Animals avoid obstacles and approach goals in novel cluttered environments using visual information, notably optic flow, to compute heading, or direction of travel, with respect to objects in the environment. We present a neural model of how heading is computed that describes interactions among neurons in several visual areas of the primate magnocellular pathway, from retina through V1, MT+, and MSTd. The model produces outputs which are qualitatively and quantitatively similar to human heading estimation data in response to complex natural scenes. The model estimates heading to within 1.5° in random dot or photo-realistically rendered scenes and within 3° in video streams from driving in real-world environments. Simulated rotations of less than 1° per second do not affect model performance, but faster simulated rotation rates deteriorate performance, as in humans. The model is part of a larger navigational system that identifies and tracks objects while navigating in cluttered environments.
National Science Foundation (SBE-0354378, BCS-0235398); Office of Naval Research (N00014-01-1-0624); National-Geospatial Intelligence Agency (NMA201-01-1-2016)
Learning, Moving, And Predicting With Global Motion Representations
To respond to and influence the world they inhabit effectively, animals and other intelligent agents must understand and predict the state of the world and its dynamics. An agent that can characterize how the world moves is better equipped to engage it. Current methods of motion computation rely on local representations of motion (such as optical flow) or simple, rigid global representations (such as camera motion). These methods are useful, but they are difficult to estimate reliably and limited in their applicability to real-world settings, where agents frequently must reason about complex, highly nonrigid motion over long time horizons. In this dissertation, I present methods developed with the goal of building more flexible and powerful notions of motion needed by agents facing the challenges of a dynamic, nonrigid world. This work is organized around a view of motion as a global phenomenon that is not adequately addressed by local or low-level descriptions, but that is best understood when analyzed at the level of whole images and scenes. I develop methods to: (i) robustly estimate camera motion from noisy optical flow estimates by exploiting the global, statistical relationship between the optical flow field and camera motion under projective geometry; (ii) learn representations of visual motion directly from unlabeled image sequences using learning rules derived from a formulation of image transformation in terms of its group properties; (iii) predict future frames of a video by learning a joint representation of the instantaneous state of the visual world and its motion, using a view of motion as transformations of world state. I situate this work in the broader context of ongoing computational and biological investigations into the problem of estimating motion for intelligent perception and action.