Angular variation as a monocular cue for spatial perception
Monocular cues are spatial sensory inputs picked up exclusively by one eye. They are mostly static features that
provide depth information and are widely used in graphic art to create realistic representations of a scene. Since the spatial
information contained in these cues is extracted from the retinal image, a link between them and the theory of direct
perception can reasonably be assumed. According to this theory, the spatial information of an environment is directly contained in the
optic array. This assumption therefore makes it possible to model visual perception processes through computational approaches.
In this thesis, angular variation is considered as a monocular cue, and the concept of direct perception is adopted by a computer
vision approach that takes it as a suitable principle from which innovative techniques for computing spatial information can be
developed.
The spatial information expected from this monocular cue is the position and orientation of an object with respect to
the observer, which in computer vision is a well-known field of research called 2D-3D pose estimation. In this thesis, establishing
angular variation as a monocular cue, and thereby achieving a computational approach to direct perception, is carried out
through the development of a set of pose estimation methods. Starting from conventional strategies for solving the pose
estimation problem, a first approach imposes constraint equations that relate object and image features. To this end, two algorithms
based on the analysis of a simple line rotation motion were developed. These algorithms successfully provide pose information; however,
they depend strongly on scene conditions. To overcome this limitation, a second approach inspired by the biological processes
performed by the human visual system was developed. It is based on the content of the image itself and defines a computational
approach to direct perception.
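
As an illustration of the constraint-equation idea (a minimal sketch, not the thesis's actual algorithms; the camera model, anchor point, and all numerical values are assumptions), the following Python fragment models how the image angle of a line rotating in a 3D plane varies with the plane's orientation; inverting this forward model, for instance by nonlinear least squares over the measured angles, recovers the orientation component of the pose.

    # A minimal sketch, assuming a pinhole camera at the origin looking down +Z
    # with focal length f. Illustrative only.
    import numpy as np

    def image_angle(theta, n, f=1.0):
        """Image angle of a line rotated by `theta` inside a plane with unit
        normal `n`, anchored at a point 5 units in front of the camera."""
        u = np.cross(n, [0.0, 0.0, 1.0])     # orthonormal basis (u, v) of the
        u = u / np.linalg.norm(u)            # plane (valid while n is not
        v = np.cross(n, u)                   # parallel to the optical axis)
        d = np.cos(theta) * u + np.sin(theta) * v   # 3D line direction
        p0 = np.array([0.0, 0.0, 5.0])
        p1 = p0 + d
        q0 = f * p0[:2] / p0[2]              # perspective projection of the
        q1 = f * p1[:2] / p1[2]              # two endpoints
        dq = q1 - q0
        return np.arctan2(dq[1], dq[0])

    # Sweeping theta gives a sequence of image angles; fitting n to reproduce
    # that sequence recovers the orientation part of the pose.
    n_true = np.array([0.2, 0.1, 0.97]); n_true = n_true / np.linalg.norm(n_true)
    angles = [image_angle(t, n_true) for t in np.linspace(0.0, np.pi, 8)]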
The set of developed algorithms analyzes the visual properties provided by angular variations. The aim is to gather valuable data
from which spatial information can be obtained and used to emulate a visual perception process by establishing a 2D-3D metric
relation. Since this relation is considered fundamental to visuomotor coordination, and consequently essential for interacting with the
environment, applying the developed computational approach in technology-mediated environments produces a significant cognitive
effect. In this work, this cognitive effect is demonstrated by an experimental study in which a number of participants
were asked to complete an action-perception task. The main purpose of the study was to analyze visually guided behavior in
teleoperation and the cognitive effect caused by the addition of 3D information. The results showed a significant influence of the
3D aid on skill improvement, reflected in an enhanced sense of presence.
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras starting from their working principle, the sensors that are
currently available, and the tasks they have been applied to, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
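
As a minimal illustration of the event data model described above (a sketch, not any specific camera's API; names and values are illustrative), each event can be represented as a (t, x, y, polarity) tuple, and a short time window of events can be accumulated into an image-like array that frame-based algorithms can consume:

    import numpy as np
    from collections import namedtuple

    Event = namedtuple("Event", ["t", "x", "y", "polarity"])  # polarity: -1 or +1

    def accumulate(events, width, height, t0, dt):
        """Sum signed event polarities per pixel over the window [t0, t0 + dt)."""
        frame = np.zeros((height, width), dtype=np.int32)
        for e in events:
            if t0 <= e.t < t0 + dt:
                frame[e.y, e.x] += e.polarity
        return frame

    # Two brightness increases and one decrease, at microsecond timestamps:
    evts = [Event(10e-6, 3, 4, +1), Event(12e-6, 3, 4, +1), Event(30e-6, 7, 2, -1)]
    img = accumulate(evts, width=16, height=16, t0=0.0, dt=50e-6)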
Combined Learned and Classical Methods for Real-Time Visual Perception in Autonomous Driving
Autonomy, robotics, and Artificial Intelligence (AI) are among the main defining themes of next-generation societies. Among the most important applications of these technologies is driving automation, which spans from Advanced Driver Assistance Systems (ADAS) to fully self-driving vehicles. Driving automation promises to reduce accidents, increase safety, and widen access to mobility for more people, such as the elderly and the handicapped. However, one of the main challenges facing autonomous vehicles is robust perception, which can enable safe interaction and decision making. Among the many sensors used to perceive the environment, each with its own capabilities and limitations, vision is by far one of the main sensing modalities: cameras are cheap and can provide rich information about the observed scene. Therefore, this dissertation develops a set of visual perception algorithms with a focus on autonomous driving as the target application area. The dissertation starts by addressing the problem of real-time motion estimation of an agent using only the visual input from a camera attached to it, a problem known as visual odometry. The visual odometry algorithm achieves low drift rates over long traveled distances, made possible by the innovative local mapping approach used. This visual odometry algorithm was then combined with my multi-object detection and tracking system. The tracking system operates in a tracking-by-detection paradigm using an object detector based on convolutional neural networks (CNNs). The combined system can therefore detect and track other traffic participants both in the image domain and in the 3D world frame while simultaneously estimating vehicle motion, a necessary requirement for obstacle avoidance and safe navigation. Finally, the operational range of traditional monocular cameras was expanded with the capability to infer depth, and thus replace stereo and RGB-D cameras, through a single-stream convolutional neural network which outputs both depth prediction and semantic segmentation. Semantic segmentation is the process of classifying each pixel in an image and is an important step toward scene understanding. A literature survey, algorithm descriptions, and comprehensive evaluations on real-world datasets are presented. Ph.D. dissertation, College of Engineering & Computer Science, University of Michigan. https://deepblue.lib.umich.edu/bitstream/2027.42/153989/1/Mohamed Aladem Final Dissertation.pdf
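
As a hedged sketch of the single-stream idea described in the abstract (not the dissertation's actual architecture; the layer sizes and class count are invented for illustration), a network with a shared encoder and two decoder heads can output per-pixel depth and semantic class logits in one forward pass:

    import torch
    import torch.nn as nn

    def head(c_out):
        """Small decoder head upsampling shared features back to input size."""
        return nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, c_out, 4, stride=2, padding=1),
        )

    class DepthSegNet(nn.Module):
        def __init__(self, num_classes=19):
            super().__init__()
            self.encoder = nn.Sequential(        # shared feature extractor
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.depth_head = head(1)            # per-pixel depth
            self.seg_head = head(num_classes)    # per-pixel class logits

        def forward(self, x):
            feats = self.encoder(x)
            return self.depth_head(feats), self.seg_head(feats)

    depth, seg = DepthSegNet()(torch.randn(1, 3, 64, 64))  # (1,1,64,64), (1,19,64,64)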
Motorcycles that see: Multifocal stereo vision sensor for advanced safety systems in tilting vehicles
Advanced driver assistance systems (ADAS) have shown the potential to anticipate crashes and effectively assist road users in critical traffic situations. This is not the case for motorcyclists; in fact, ADAS for motorcycles are still barely developed. Our aim was to study a camera-based sensor for the application of preventive safety in tilting vehicles. We identified two road conflict situations for which automotive remote sensors installed in a tilting vehicle are likely to fail in the identification of critical obstacles. Accordingly, we set up two experiments in real traffic conditions to test our stereo vision sensor. Our promising results support the application of this type of sensor for advanced motorcycle safety applications.
Visual control of flight speed in Drosophila melanogaster
Flight control in insects depends on self-induced image motion (optic flow), which the visual system must process to generate appropriate corrective steering maneuvers. Classic experiments in tethered insects applied rigorous system identification techniques to the analysis of turning reactions in the presence of rotating pattern stimuli delivered in open loop. However, the functional relevance of these measurements for visual free-flight control remains equivocal due to the largely unknown effects of the highly constrained experimental conditions. To perform a systems analysis of the visual flight speed response under free-flight conditions, we implemented a `one-parameter open-loop' paradigm using `TrackFly' in a wind tunnel equipped with real-time tracking and virtual reality display technology. Flies flying upwind were stimulated with sine gratings of varying temporal and spatial frequencies, and their flight speed responses were measured. To control flight speed, the visual system of the fruit fly extracts linear pattern velocity robustly over a broad range of spatio-temporal frequencies. The speed signal is used for proportional control of flight speed within locomotor limits. The extraction of pattern velocity over a broad spatio-temporal frequency range may require more sophisticated motion processing mechanisms than those identified in flies so far. In Drosophila, the neuromotor pathways underlying flight speed control may be suitably explored by applying advanced genetic techniques, for which our data can serve as a baseline. Finally, the high-level control principles identified in the fly can be meaningfully transferred into a robotic context, such as the robust and efficient control of autonomous flying micro air vehicles.
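
The identified control principle, proportional regulation of flight speed from the perceived pattern velocity within locomotor limits, can be written as a one-line controller. This is a minimal sketch; the gain, set point, and limits below are illustrative assumptions, not values from the study:

    def speed_command(pattern_velocity, set_point=0.0, gain=0.5,
                      v_min=0.0, v_max=1.2):
        """Proportional controller: deviation of the perceived pattern (retinal
        slip) velocity from the set point drives the commanded flight speed,
        clipped to locomotor limits (m/s)."""
        v = gain * (set_point - pattern_velocity)
        return max(v_min, min(v_max, v))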
Invariance of visual operations at the level of receptive fields
Receptive field profiles registered by cell recordings have shown that
mammalian vision has developed receptive fields tuned to different sizes and
orientations in the image domain as well as to different image velocities in
space-time. This article presents a theoretical model by which families of
idealized receptive field profiles can be derived mathematically from a small
set of basic assumptions that correspond to structural properties of the
environment. The article also presents a theory for how basic invariance
properties to variations in scale, viewing direction and relative motion can be
obtained from the output of such receptive fields, using complementary
selection mechanisms that operate over the output of families of receptive
fields tuned to different parameters. Thereby, the theory shows how basic
invariance properties of a visual system can be obtained already at the level
of receptive fields, and we can explain the different shapes of receptive field
profiles found in biological vision from a requirement that the visual system
should be invariant to the natural types of image transformations that occur in
its environment.
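
A hedged sketch of the kind of idealized receptive field family such a theory derives: first-order directional Gaussian derivatives tuned to different scales and orientations (the affine- and velocity-adapted space-time fields are omitted, and the kernel size and tuning values are illustrative):

    import numpy as np

    def gaussian_derivative_rf(size, sigma, orientation):
        """First-order directional Gaussian derivative kernel."""
        r = np.arange(size) - size // 2
        X, Y = np.meshgrid(r, r)
        g = np.exp(-(X**2 + Y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
        # Derivative of g along the direction (cos a, sin a):
        a = orientation
        return -(np.cos(a) * X + np.sin(a) * Y) / sigma**2 * g

    # A family tuned to three scales and four orientations:
    family = [gaussian_derivative_rf(33, s, a)
              for s in (2.0, 4.0, 8.0)
              for a in np.linspace(0.0, np.pi, 4, endpoint=False)]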
Proceedings of the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory
This book is a collection of 15 reviewed technical reports summarizing the presentations at the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory. The covered topics include image processing, optical signal processing, visual inspection, pattern recognition and classification, human-machine interaction, world and situation modeling, autonomous system localization and mapping, information fusion, and trust propagation in sensor networks.
Biologically Inspired Visual Control of Flying Robots
Insects possess an incredible ability to navigate their environment at high speed, despite
having small brains and limited visual acuity. Through selective pressure they have
evolved computationally efficient means for simultaneously performing navigation tasks
and instantaneous control responses. The insect’s main source of information is visual,
and through a hierarchy of processes this information is used for perception: at the
lowest level are local neurons that detect image motion and edges; at higher levels
are interneurons that spatially integrate the output of the previous stages. These higher
level processes could be considered as models of the insect's environment, reducing the
amount of information to only that which evolution has determined relevant. The scope
of this thesis is experimenting with biologically inspired visual control of flying robots
through information processing, models of the environment, and flight behaviour.
In order to test these ideas I developed a custom quadrotor robot and experimental
platform: the 'wasp' system. All algorithms ran on the robot, in real time or better,
and hypotheses were always verified with flight experiments.
I developed a new optical flow algorithm that is computationally efficient and can
be applied in a regular pattern across the image. This technique is used later in my
work when considering patterns in the image motion field.
Using optical flow in the log-polar coordinate system I developed attitude estimation
and time-to-contact algorithms. I find that the log-polar domain is useful for
analysing global image motion, and in many ways equivalent to the retinotopic
arrangement of neurons in the optic lobe of insects, used for the same task.
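
A minimal sketch of the log-polar time-to-contact idea, under the assumption of a pure approach toward the scene: radial image expansion becomes a uniform shift along the log-radius axis u = log(rho), so du/dt = (1/rho) d(rho)/dt equals the inverse time-to-contact. The feature radii and frame interval below are invented for illustration:

    import numpy as np

    def time_to_contact(rho_prev, rho_curr, dt):
        """TTC from radial feature positions at two instants (same units)."""
        du = np.log(rho_curr) - np.log(rho_prev)   # shift along log-radius
        rate = np.mean(du) / dt                    # uniform over the image
        return 1.0 / rate

    # Features expanding about 5% per 20 ms frame give a TTC of roughly 0.41 s:
    r0 = np.array([10.0, 25.0, 40.0])
    tau = time_to_contact(r0, 1.05 * r0, dt=0.02)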
I investigated the role of depth in insect flight using two experiments. In the first
experiment, to study how concurrent visual control processes might be combined, I
developed a control system using the combined output of two algorithms. The first
algorithm was a wide-field optical flow balance strategy and the second an obstacle
avoidance strategy which used inertial information to estimate the depth of objects in
the environment, objects whose depth was significantly different from their
surroundings. In the second experiment I created an altitude control system which used a model
of the environment in the Hough space, and a biologically inspired sampling strategy,
to efficiently detect the ground. Both control systems were used to control the flight
of a quadrotor in an indoor environment.
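
As an illustration of the wide-field flow-balance component of the first experiment (a simplified sketch with an invented gain and sign convention, not the thesis's combined controller), the yaw command can be taken proportional to the difference between the mean flow magnitudes in the left and right halves of the image:

    import numpy as np

    def yaw_from_flow(flow_x, gain=0.8):
        """flow_x: per-pixel horizontal optical flow (pixels/frame).
        Returns a yaw-rate command steering away from the faster (nearer) side."""
        h, w = flow_x.shape
        left = np.mean(np.abs(flow_x[:, : w // 2]))
        right = np.mean(np.abs(flow_x[:, w // 2 :]))
        return gain * (left - right)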
The methods that insects use to perceive edges and control their flight in response
had not been applied to artificial systems before. I developed a quadrotor control
system that used the distribution of edges in the environment to regulate the robot
height and avoid obstacles. I also developed a model that predicted the distribution of
edges in a static scene, and using this prediction was able to estimate the quadrotor
altitude.