12 research outputs found

    Combining segmentation and attention: a new foveal attention model

    Artificial vision systems cannot process all the information that they receive from the world in real time, because doing so is highly expensive and inefficient in terms of computational cost. Inspired by biological perception systems, artificial attention models pursue the selection of only the relevant parts of the scene. In human vision, it is also well established that these units of attention are not merely spatial but closely related to perceptual objects (proto-objects). This implies a strong bidirectional relationship between segmentation and attention processes. While the segmentation process is responsible for extracting the proto-objects from the scene, attention can guide segmentation, giving rise to the concept of foveal attention. When the focus of attention is deployed from one visual unit to another, the rest of the scene is still perceived, but at a lower resolution than the focused object. The result is a multi-resolution visual perception in which the fovea, a dimple on the central retina, provides the highest resolution vision. In this paper, a bottom-up foveal attention model is presented. In this model the input image is a foveal image represented using a Cartesian Foveal Geometry (CFG), which encodes the field of view of the sensor as a fovea (placed at the focus of attention) surrounded by a set of concentric rings of decreasing resolution. Multi-resolution perceptual segmentation is then performed by building a foveal polygon using the Bounded Irregular Pyramid (BIP). Bottom-up attention is enclosed in the same structure, allowing the fovea to be set over the most salient image proto-object. Saliency is computed as a linear combination of multiple low-level features such as color and intensity contrast, symmetry, orientation and roundness. Results obtained from natural images show that the combination of hierarchical foveal segmentation and saliency estimation performs well in terms of accuracy and speed.
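
    A minimal Python sketch of the saliency stage described above: normalised low-level feature maps (color and intensity contrast, symmetry, orientation, roundness) are combined linearly, and the most salient location is taken as the next fixation point. This is not the authors' implementation; the equal default weights and the min-max normalisation are assumptions made purely for illustration.

    import numpy as np

    def saliency_map(feature_maps, weights=None):
        # Min-max normalise each feature map, then combine them linearly.
        # Equal weights are an illustrative assumption.
        maps = [(m - m.min()) / (np.ptp(m) + 1e-9) for m in feature_maps]
        if weights is None:
            weights = np.full(len(maps), 1.0 / len(maps))
        return sum(w * m for w, m in zip(weights, maps))

    def next_fixation(saliency):
        # Row/column of the most salient pixel, where the fovea would be placed.
        return np.unravel_index(np.argmax(saliency), saliency.shape)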

    Nonlinear Classifier Stacking on Riemannian and Grassmann Manifolds with Application to Video Analysis

    This research is devoted to the problem of overfitting in Machine Learning and Pattern Recognition, with the aim of improving generalisation ability and boosting accuracy on small and/or difficult classification datasets. These two problems are addressed in two different ways: by splitting the datasets into functional groups according to classification difficulty using a consensus of classifiers, and by embedding the data obtained during classifier stacking into nonlinear spaces, namely Riemannian and Grassmann manifolds. These two techniques are the main contributions of the thesis. The insight behind the first approach is that we do not use the entire training subset to train our classifiers, but only the part of it that approximates the true geometry and properties of the classes. In Data Science terms, this process can also be understood as Data Cleaning. In this approach, instances with high positive (easy) and negative (misclassified) margins are excluded from training, since they do not improve (or even worsen) the estimation of the true geometry of the classes. The main goal of using Riemannian geometry is to embed our classes in nonlinear spaces in which a geometry that makes classification easier can be obtained. Before embedding the classes on Riemannian and Grassmann manifolds, several data transformations are performed using different variants of Classifier Stacking. Riemannian manifolds of Symmetric Positive Definite matrices are created from the classifier interactions, while Grassmann manifolds are built from Decision Profiles. The purpose of both approaches is Data Complexity reduction; there is a consensus among researchers that reducing Data Complexity should decrease overfitting and enhance classification accuracy. Experiments were carried out on various datasets from the UCI Machine Learning Repository, as well as on two datasets related to Video Analysis: the Gesture Phase Segmentation dataset from the UCI repository and the Deepfake Detection Challenge dataset. To apply the approach to the second problem, some image processing was carried out. Numerous experiments on general-purpose datasets and on Video Analysis problems show the consistency and efficiency of the proposed techniques, and comparisons with state-of-the-art techniques show the superiority of our approaches in most cases. The significance of the research and the obtained results lies in a better representation and evaluation of the geometry of classes that may overlap in feature space only because of improper measurements, errors, noise, or the selection of features that do not represent the classes well. The research carried out is pioneering in terms of Data Cleaning and Classifier Ensemble Learning in Riemannian geometry.
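
    As a concrete illustration of the manifold side of this work, the sketch below builds a Symmetric Positive Definite matrix from a stack of decision profiles (here simply their regularised covariance, an assumption made for illustration rather than the construction used in the thesis) and measures distances between such matrices with the log-Euclidean metric, one standard way to work on the SPD manifold.

    import numpy as np
    from scipy.linalg import logm

    def spd_from_profiles(profiles, eps=1e-6):
        # profiles: (n_samples, n_classifiers) array of classifier outputs.
        # A regularised covariance is one simple way to obtain an SPD matrix;
        # this construction is illustrative, not the one used in the thesis.
        cov = np.cov(np.asarray(profiles), rowvar=False)
        return cov + eps * np.eye(cov.shape[0])

    def log_euclidean_distance(a, b):
        # Log-Euclidean distance between two SPD matrices.
        return np.linalg.norm(np.real(logm(a)) - np.real(logm(b)), ord="fro")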

    Image Registration Workshop Proceedings

    Automatic image registration has often been considered a preliminary step for higher-level processing, such as object recognition or data fusion. But with the unprecedented amounts of data that are being, and will continue to be, generated by newly developed sensors, automatic image registration has itself become an important research topic. This workshop presents a collection of very high quality work grouped into four main areas: (1) theoretical aspects of image registration; (2) applications to satellite imagery; (3) applications to medical imagery; and (4) image registration for computer vision research.

    Proceedings of the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory

    This book is a collection of 15 reviewed technical reports summarizing the presentations at the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory. The covered topics include image processing, optical signal processing, visual inspection, pattern recognition and classification, human-machine interaction, world and situation modeling, autonomous system localization and mapping, information fusion, and trust propagation in sensor networks.

    Towards adaptive and autonomous humanoid robots: from vision to actions

    Although robotics research has seen advances over the last decades, robots are still not in widespread use outside industrial applications. Yet a range of proposed scenarios have robots working together with, helping, and coexisting with humans in daily life, and all of these raise a clear need to deal with a more unstructured, changing environment. I herein present a system that aims to overcome the limitations of highly complex robotic systems in terms of autonomy and adaptation. The main focus of the research is to investigate the use of visual feedback for improving the reaching and grasping capabilities of complex robots. To facilitate this, an integration of computer vision and machine learning techniques is employed. From a robot vision point of view, combining domain knowledge from both image processing and machine learning can expand the capabilities of robots. I present a novel framework called Cartesian Genetic Programming for Image Processing (CGP-IP). CGP-IP can be trained to detect objects in the incoming camera streams and has been successfully demonstrated on many different problem domains. The approach is fast, scalable, and robust, and requires only small training sets (it was tested with 5 to 10 images per experiment). Additionally, it can generate human-readable programs that can be further customised and tuned. While CGP-IP is a supervised learning technique, I show an integration on the iCub that allows for the autonomous learning of object detection and identification. Finally, this dissertation includes two proofs of concept that integrate the motion and action sides. First, reactive reaching and grasping is shown: it allows the robot to avoid obstacles detected in the visual stream while reaching for the intended target object. This integration also enables the robot to be used in non-static environments, i.e. the reach is adapted on-the-fly from the visual feedback received, e.g. when an obstacle is moved into the trajectory. The second integration highlights the capabilities of these frameworks by improving visual detection through object manipulation actions.
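
    To make the CGP-IP idea more tangible, here is a toy Python sketch of a Cartesian-Genetic-Programming-style image pipeline: a genotype is a fixed list of nodes, each selecting a primitive image operation and an earlier node as its input, and the last node is taken as the program output. The primitive set and genotype layout are assumptions chosen for brevity and are not the actual CGP-IP implementation.

    import numpy as np

    # Illustrative primitive set; CGP-IP uses a much richer set of image operations.
    PRIMITIVES = [
        lambda img: np.clip(img * 1.5, 0, 1),           # gain
        lambda img: np.abs(np.gradient(img)[0]),        # vertical gradient
        lambda img: np.abs(np.gradient(img)[1]),        # horizontal gradient
        lambda img: (img > img.mean()).astype(float),   # adaptive threshold
    ]

    def evaluate(genotype, image):
        # genotype: list of (primitive_index, input_index) pairs.
        # Node 0 is the input image; each node may read any earlier node.
        nodes = [image]
        for prim_idx, in_idx in genotype:
            nodes.append(PRIMITIVES[prim_idx](nodes[in_idx]))
        return nodes[-1]  # the last node is the program output

    # Example: gradient magnitude followed by thresholding, a plausible
    # hand-written stand-in for an evolved object-detection filter.
    img = np.random.rand(64, 64)
    mask = evaluate([(1, 0), (3, 1)], img)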

    Visual Cortex

    The neurosciences have experienced tremendous and wonderful progress in many areas, and the spectrum they encompass is expansive. Suffice it to mention a few classical fields: electrophysiology, genetics, physics, computer science, and more recently, social and marketing neurosciences. Of course, this large growth has resulted in the production of many books. Perhaps the visual system and the visual cortex were in the vanguard because most animals do not produce their own light and thus offer the invaluable advantage of allowing investigators to conduct experiments in full control of the stimulus. In addition, the fascinating evolution of scientific techniques, the immense productivity of recent research, and the ensuing literature make it virtually impossible to publish in a single volume all the worthwhile work accomplished throughout the scientific world. The days when a single individual, such as Diderot, could undertake the production of an encyclopedia are gone forever. Indeed, most approaches to studying the nervous system are valid, and neuroscientists produce an almost astronomical amount of interesting data accompanied by extremely worthy hypotheses, which in turn generate new ventures in search of brain functions. Yet it is fully justified to make an encore and publish a book dedicated to the visual cortex and beyond. Many reasons validate a book assembling chapters written by active researchers: each has the opportunity to bind together data and explore original ideas whose fate will not fall into the hands of uncompromising reviewers of traditional journals. This book focuses on the cerebral cortex with a large emphasis on vision. Yet it offers the reader diverse approaches employed to investigate the brain, for instance computer simulation, cellular responses, or rivalry between various targets and goal-directed actions. This volume thus covers a large spectrum of research, even though it is impossible to include all topics in the extremely diverse field of the neurosciences.

    Development of architectures for gaze control processing in active vision systems with variable spatial resolution

    This thesis addresses the implementation of a complete active vision system that captures and generates images of variable spatial resolution. The whole system is integrated into a single AP SoC (All Programmable System on Chip) device, which allows a hardware-software co-design in which the intensive preprocessing blocks are implemented in the programmable logic and the more complex control and processing algorithms in software. The goal is to process a moderate number of frames per second while working with a field of view on the order of Megapixels. The multiresolution images are generated from uniform-resolution sensors with zero latency, so that the variable-resolution image is ready at the very instant the capture of the original image is completed. As an innovation with respect to the first contributions related to this thesis, images are processed with full color information. This requires designing converters between different color spaces to adapt the information to the type of processing that will be performed on it. These blocks are integrated without altering the delivery latency of successive frames. Processing these multiresolution images produces a saliency map that allows the fovea to be moved towards the region considered most relevant in the scene. The image content is structured into a hierarchy of abstraction levels. Unlike other architectures of this kind, such as the regular pyramid and the foveal polygon, which work with uniform-resolution images at the different levels of the hierarchy, the foveal irregular pyramid proposed in this thesis combines the idea of working with a truly multiresolution image, covering the complete field of view spanned by the sensor and optics, with the hierarchical processing characteristic of irregular pyramids. To this end, the thesis proposes an irregular decimation algorithm that, taking the multiresolution image as its base, produces a pyramidal structure whose levels are not images but graphs oriented towards solving the segmentation and saliency estimation problem. The whole system is integrated around the AXI bus architecture, which interconnects all the cores developed in the programmable logic and provides access to the memory shared with the algorithms implemented in software. This is made possible by the AXI-VDMA direct memory access blocks, in a configuration that allows both the perfectly coordinated transfer of the generated multiresolution image to the working area of the segmentation algorithm and its retrieval for later visualisation of the processing result, all at a throughput that improves on the results of similar platforms.
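
    As a rough software analogue of the multiresolution image generation described above, the sketch below builds a Cartesian foveal representation from a uniform-resolution image: a full-resolution fovea window around the fixation point, surrounded by concentric windows decimated by increasing factors. The window sizes and the factor-of-two decimation per ring are illustrative assumptions; the thesis implements this in programmable logic with zero latency.

    import numpy as np

    def foveal_image(image, cy, cx, fovea=32, rings=3):
        # Keep a full-resolution window (the fovea) around (cy, cx) and
        # re-sample each surrounding concentric window at half the resolution
        # of the previous one (illustrative factor-of-two decimation).
        levels = []
        half = fovea // 2
        for level in range(rings + 1):
            step = 2 ** level                      # decimation factor per ring
            r = half * step                        # half-size of this window
            y0, y1 = max(cy - r, 0), min(cy + r, image.shape[0])
            x0, x1 = max(cx - r, 0), min(cx + r, image.shape[1])
            levels.append(image[y0:y1:step, x0:x1:step])  # subsampled window
        return levels  # levels[0] is the fovea; later entries cover wider fields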