156 research outputs found

    Pattern Recognition

    Get PDF
    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional, the processing is done in real- time or takes hours and days, some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition

    Particle Filters for Colour-Based Face Tracking Under Varying Illumination

    Get PDF
    Automatic human face tracking is the basis of robotic and active vision systems used for facial feature analysis, automatic surveillance, video conferencing, intelligent transportation, human-computer interaction and many other applications. Superior human face tracking will allow future safety surveillance systems which monitor drowsy drivers, or patients and elderly people at the risk of seizure or sudden falls and will perform with lower risk of failure in unexpected situations. This area has actively been researched in the current literature in an attempt to make automatic face trackers more stable in challenging real-world environments. To detect faces in video sequences, features like colour, texture, intensity, shape or motion is used. Among these feature colour has been the most popular, because of its insensitivity to orientation and size changes and fast process-ability. The challenge of colour-based face trackers, however, has been dealing with the instability of trackers in case of colour changes due to the drastic variation in environmental illumination. Probabilistic tracking and the employment of particle filters as powerful Bayesian stochastic estimators, on the other hand, is increasing in the visual tracking field thanks to their ability to handle multi-modal distributions in cluttered scenes. Traditional particle filters utilize transition prior as importance sampling function, but this can result in poor posterior sampling. The objective of this research is to investigate and propose stable face tracker capable of dealing with challenges like rapid and random motion of head, scale changes when people are moving closer or further from the camera, motion of multiple people with close skin tones in the vicinity of the model person, presence of clutter and occlusion of face. The main focus has been on investigating an efficient method to address the sensitivity of the colour-based trackers in case of gradual or drastic illumination variations. The particle filter is used to overcome the instability of face trackers due to nonlinear and random head motions. To increase the traditional particle filter\u27s sampling efficiency an improved version of the particle filter is introduced that considers the latest measurements. This improved particle filter employs a new colour-based bottom-up approach that leads particles to generate an effective proposal distribution. The colour-based bottom-up approach is a classification technique for fast skin colour segmentation. This method is independent to distribution shape and does not require excessive memory storage or exhaustive prior training. Finally, to address the adaptability of the colour-based face tracker to illumination changes, an original likelihood model is proposed based of spatial rank information that considers both the illumination invariant colour ordering of a face\u27s pixels in an image or video frame and the spatial interaction between them. The original contribution of this work lies in the unique mixture of existing and proposed components to improve colour-base recognition and tracking of faces in complex scenes, especially where drastic illumination changes occur. Experimental results of the final version of the proposed face tracker, which combines the methods developed, are provided in the last chapter of this manuscript

    Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks

    Full text link
    We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing it is implicitly learning a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first two authors contributed equally to this work. Project page: http://visualdynamics.csail.mit.ed

    Vehicle Tracking and Motion Estimation Based on Stereo Vision Sequences

    Get PDF
    In this dissertation, a novel approach for estimating trajectories of road vehicles such as cars, vans, or motorbikes, based on stereo image sequences is presented. Moving objects are detected and reliably tracked in real-time from within a moving car. The resulting information on the pose and motion state of other moving objects with respect to the own vehicle is an essential basis for future driver assistance and safety systems, e.g., for collision prediction. The focus of this contribution is on oncoming traffic, while most existing work in the literature addresses tracking the lead vehicle. The overall approach is generic and scalable to a variety of traffic scenes including inner city, country road, and highway scenarios. A considerable part of this thesis addresses oncoming traffic at urban intersections. The parameters to be estimated include the 3D position and orientation of an object relative to the ego-vehicle, as well as the object's shape, dimension, velocity, acceleration and the rotational velocity (yaw rate). The key idea is to derive these parameters from a set of tracked 3D points on the object's surface, which are registered to a time-consistent object coordinate system, by means of an extended Kalman filter. Combining the rigid 3D point cloud model with the dynamic model of a vehicle is one main contribution of this thesis. Vehicle tracking at intersections requires covering a wide range of different object dynamics, since vehicles can turn quickly. Three different approaches for tracking objects during highly dynamic turn maneuvers up to extreme maneuvers such as skidding are presented and compared. These approaches allow for an online adaptation of the filter parameter values, overcoming manual parameter tuning depending on the dynamics of the tracked object in the scene. This is the second main contribution. Further issues include the introduction of two initialization methods, a robust outlier handling, a probabilistic approach for assigning new points to a tracked object, as well as mid-level fusion of the vision-based approach with a radar sensor. The overall system is systematically evaluated both on simulated and real-world data. The experimental results show the proposed system is able to accurately estimate the object pose and motion parameters in a variety of challenging situations, including night scenes, quick turn maneuvers, and partial occlusions. The limits of the system are also carefully investigated.In dieser Dissertation wird ein Ansatz zur Trajektorienschätzung von Straßenfahrzeugen (PKW, Lieferwagen, Motorräder,...) anhand von Stereo-Bildfolgen vorgestellt. Bewegte Objekte werden in Echtzeit aus einem fahrenden Auto heraus automatisch detektiert, vermessen und deren Bewegungszustand relativ zum eigenen Fahrzeug zuverlässig bestimmt. Die gewonnenen Informationen liefern einen entscheidenden Grundstein für zukünftige Fahrerassistenz- und Sicherheitssysteme im Automobilbereich, beispielsweise zur Kollisionsprädiktion. Während der Großteil der existierenden Literatur das Detektieren und Verfolgen vorausfahrender Fahrzeuge in Autobahnszenarien adressiert, setzt diese Arbeit einen Schwerpunkt auf den Gegenverkehr, speziell an städtischen Kreuzungen. Der Ansatz ist jedoch grundsätzlich generisch und skalierbar für eine Vielzahl an Verkehrssituationen (Innenstadt, Landstraße, Autobahn). Die zu schätzenden Parameter beinhalten die räumliche Lage des anderen Fahrzeugs relativ zum eigenen Fahrzeug, die Objekt-Geschwindigkeit und -Längsbeschleunigung, sowie die Rotationsgeschwindigkeit (Gierrate) des beobachteten Objektes. Zusätzlich werden die Objektabmaße sowie die Objektform rekonstruiert. Die Grundidee ist es, diese Parameter anhand der Transformation von beobachteten 3D Punkten, welche eine ortsfeste Position auf der Objektoberfläche besitzen, mittels eines rekursiven Schätzers (Kalman Filter) zu bestimmen. Ein wesentlicher Beitrag dieser Arbeit liegt in der Kombination des Starrkörpermodells der Punktewolke mit einem Fahrzeugbewegungsmodell. An Kreuzungen können sehr unterschiedliche Dynamiken auftreten, von einer Geradeausfahrt mit konstanter Geschwindigkeit bis hin zum raschen Abbiegen. Um eine manuelle Parameteradaption abhängig von der jeweiligen Szene zu vermeiden, werden drei verschiedene Ansätze zur automatisierten Anpassung der Filterparameter an die vorliegende Situation vorgestellt und verglichen. Dies stellt den zweiten Hauptbeitrag der Arbeit dar. Weitere wichtige Beiträge sind zwei alternative Initialisierungsmethoden, eine robuste Ausreißerbehandlung, ein probabilistischer Ansatz zur Zuordnung neuer Objektpunkte, sowie die Fusion des bildbasierten Verfahrens mit einem Radar-Sensor. Das Gesamtsystem wird im Rahmen dieser Arbeit systematisch anhand von simulierten und realen Straßenverkehrsszenen evaluiert. Die Ergebnisse zeigen, dass das vorgestellte Verfahren in der Lage ist, die unbekannten Objektparameter auch unter schwierigen Umgebungsbedingungen, beispielsweise bei Nacht, schnellen Abbiegemanövern oder unter Teilverdeckungen, sehr präzise zu schätzen. Die Grenzen des Systems werden ebenfalls sorgfältig untersucht

    Extraction et analyse des caractéristiques faciales : application à l'hypovigilance chez le conducteur

    Get PDF
    Studying facial features has attracted increasing attention in both academic and industrial communities. Indeed, these features convey nonverbal information that plays a key role in humancommunication. Moreover, they are very useful to allow human-machine interactions. Therefore, the automatic study of facial features is an important task for various applications includingrobotics, human-machine interfaces, behavioral science, clinical practice and monitoring driver state. In this thesis, we focus our attention on monitoring driver state through its facial features analysis. This problematic solicits a universal interest caused by the increasing number of road accidents, principally induced by deterioration in the driver vigilance level, known as hypovigilance. Indeed, we can distinguish three hypovigilance states. The first and most critical one is drowsiness, which is manifested by an inability to keep awake and it is characterized by microsleep intervals of 2-6 seconds. The second one is fatigue, which is defined by the increasing difficulty of maintaining a task and it is characterized by an important number of yawns. The third and last one is the inattention that occurs when the attention is diverted from the driving activity and it is characterized by maintaining the head pose in a non-frontal direction.The aim of this thesis is to propose facial features based approaches allowing to identify driver hypovigilance. The first approach was proposed to detect drowsiness by identifying microsleepintervals through eye state analysis. The second one was developed to identify fatigue by detecting yawning through mouth analysis. Since no public hypovigilance database is available,we have acquired and annotated our own database representing different subjects simulating hypovigilance under real lighting conditions to evaluate the performance of these two approaches. Next, we have developed two driver head pose estimation approaches to detect its inattention and also to determine its vigilance level even if the facial features (eyes and mouth) cannot be analyzed because of non-frontal head positions. We evaluated these two estimators on the public database Pointing'04. Then, we have acquired and annotated a driver head pose database to evaluate our estimators in real driving conditions.L'étude des caractéristiques faciales a suscité l'intérêt croissant de la communauté scientifique et des industriels. En effet, ces caractéristiques véhiculent des informations non verbales qui jouent un rôle clé dans la communication entre les hommes. De plus, elles sont très utiles pour permettre une interaction entre l'homme et la machine. De ce fait, l'étude automatique des caractéristiques faciales constitue une tâche primordiale pour diverses applications telles que les interfaces homme-machine, la science du comportement, la pratique clinique et la surveillance de l'état du conducteur. Dans cette thèse, nous nous intéressons à la surveillance de l'état du conducteur à travers l'analyse de ses caractéristiques faciales. Cette problématique sollicite un intérêt universel causé par le nombre croissant des accidents routiers, dont une grande partie est provoquée par une dégradation de la vigilance du conducteur, connue sous le nom de l'hypovigilance. En effet, nous pouvons distinguer trois états d'hypovigilance. Le premier, et le plus critique, est la somnolence qui se manifeste par une incapacité à se maintenir éveillé et se caractérise par les périodes de micro-sommeil correspondant à des endormissements de 2 à 6 secondes. Le second est la fatigue qui se définit par la difficulté croissante à maintenir une tâche à terme et se caractérise par une augmentation du nombre de bâillements. Le troisième est l'inattention qui se produit lorsque l'attention est détournée de l'activité de conduite et se caractérise par le maintien de la pose de la tête en une direction autre que frontale. L'objectif de cette thèse est de concevoir des approches permettant de détecter l'hypovigilance chez le conducteur en analysant ses caractéristiques faciales. En premier lieu, nous avons proposé une approche dédiée à la détection de la somnolence à partir de l'identification des périodes de micro-sommeil à travers l'analyse des yeux. En second lieu, nous avons introduit une approche permettant de relever la fatigue à partir de l'analyse de la bouche afin de détecter les bâillements. Du fait qu'il n'existe aucune base de données publique dédiée à la détection de l'hypovigilance, nous avons acquis et annoté notre propre base de données représentant différents sujets simulant des états d'hypovigilance sous des conditions d'éclairage réelles afin d'évaluer les performances de ces deux approches. En troisième lieu, nous avons développé deux nouveaux estimateurs de la pose de la tête pour permettre à la fois de détecter l'inattention du conducteur et de déterminer son état, même quand ses caractéristiques faciales (yeux et bouche) ne peuvent être analysées suite à des positions non-frontales de la tête. Nous avons évalué ces deux estimateurs sur la base de données publique Pointing'04. Ensuite, nous avons acquis et annoté une base de données représentant la variation de la pose de la tête du conducteur pour valider nos estimateurs sous un environnement de conduite

    Multi-Modal Similarity Learning for 3D Deformable Registration of Medical Images

    Get PDF
    Alors que la perspective de la fusion d images médicales capturées par des systèmes d imageries de type différent est largement contemplée, la mise en pratique est toujours victime d un obstacle théorique : la définition d une mesure de similarité entre les images. Des efforts dans le domaine ont rencontrés un certain succès pour certains types d images, cependant la définition d un critère de similarité entre les images quelle que soit leur origine et un des plus gros défis en recalage d images déformables. Dans cette thèse, nous avons décidé de développer une approche générique pour la comparaison de deux types de modalités donnés. Les récentes avancées en apprentissage statistique (Machine Learning) nous ont permis de développer des solutions innovantes pour la résolution de ce problème complexe. Pour appréhender le problème de la comparaison de données incommensurables, nous avons choisi de le regarder comme un problème de plongement de données : chacun des jeux de données est plongé dans un espace commun dans lequel les comparaisons sont possibles. A ces fins, nous avons exploré la projection d un espace de données image sur l espace de données lié à la seconde image et aussi la projection des deux espaces de données dans un troisième espace commun dans lequel les calculs sont conduits. Ceci a été entrepris grâce à l étude des correspondances entre les images dans une base de données images pré-alignées. Dans la poursuite de ces buts, de nouvelles méthodes ont été développées que ce soit pour la régression d images ou pour l apprentissage de métrique multimodale. Les similarités apprises résultantes sont alors incorporées dans une méthode plus globale de recalage basée sur l optimisation discrète qui diminue le besoin d un critère différentiable pour la recherche de solution. Enfin nous explorons une méthode qui permet d éviter le besoin d une base de données pré-alignées en demandant seulement des données annotées (segmentations) par un spécialiste. De nombreuses expériences sont conduites sur deux bases de données complexes (Images d IRM pré-alignées et Images TEP/Scanner) dans le but de justifier les directions prises par nos approches.Even though the prospect of fusing images issued by different medical imagery systems is highly contemplated, the practical instantiation of it is subject to a theoretical hurdle: the definition of a similarity between images. Efforts in this field have proved successful for select pairs of images; however defining a suitable similarity between images regardless of their origin is one of the biggest challenges in deformable registration. In this thesis, we chose to develop generic approaches that allow the comparison of any two given modality. The recent advances in Machine Learning permitted us to provide innovative solutions to this very challenging problem. To tackle the problem of comparing incommensurable data we chose to view it as a data embedding problem where one embeds all the data in a common space in which comparison is possible. To this end, we explored the projection of one image space onto the image space of the other as well as the projection of both image spaces onto a common image space in which the comparison calculations are conducted. This was done by the study of the correspondences between image features in a pre-aligned dataset. In the pursuit of these goals, new methods for image regression as well as multi-modal metric learning methods were developed. The resulting learned similarities are then incorporated into a discrete optimization framework that mitigates the need for a differentiable criterion. Lastly we investigate on a new method that discards the constraint of a database of images that are pre-aligned, only requiring data annotated (segmented) by a physician. Experiments are conducted on two challenging medical images data-sets (Pre-Aligned MRI images and PET/CT images) to justify the benefits of our approach.CHATENAY MALABRY-Ecole centrale (920192301) / SudocSudocFranceF

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Full text link
    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science

    A Developmental Model of Trust in Humanoid Robots

    Get PDF
    Trust between humans and artificial systems has recently received increased attention due to the widespread use of autonomous systems in our society. In this context trust plays a dual role. On the one hand it is necessary to build robots that are perceived as trustworthy by humans. On the other hand we need to give to those robots the ability to discriminate between reliable and unreliable informants. This thesis focused on the second problem, presenting an interdisciplinary investigation of trust, in particular a computational model based on neuroscientific and psychological assumptions. First of all, the use of Bayesian networks for modelling causal relationships was investigated. This approach follows the well known theory-theory framework of the Theory of Mind (ToM) and an established line of research based on the Bayesian description of mental processes. Next, the role of gaze in human-robot interaction has been investigated. The results of this research were used to design a head pose estimation system based on Convolutional Neural Networks. The system can be used in robotic platforms to facilitate joint attention tasks and enhance trust. Finally, everything was integrated into a structured cognitive architecture. The architecture is based on an actor-critic reinforcement learning framework and an intrinsic motivation feedback given by a Bayesian network. In order to evaluate the model, the architecture was embodied in the iCub humanoid robot and used to replicate a developmental experiment. The model provides a plausible description of children's reasoning that sheds some light on the underlying mechanism involved in trust-based learning. In the last part of the thesis the contribution of human-robot interaction research is discussed, with the aim of understanding the factors that influence the establishment of trust during joint tasks. Overall, this thesis provides a computational model of trust that takes into account the development of cognitive abilities in children, with a particular emphasis on the ToM and the underlying neural dynamics.THRIVE, Air Force Office of Scientific Research, Award No. FA9550-15-1-002

    Visual Attention in Dynamic Environments and its Application to Playing Online Games

    Get PDF
    Abstract In this thesis we present a prototype of Cognitive Programs (CPs) - an executive controller built on top of Selective Tuning (ST) model of attention. CPs enable top-down control of visual system and interaction between the low-level vision and higher-level task demands. Abstract We implement a subset of CPs for playing online video games in real time using only visual input. Two commercial closed-source games - Canabalt and Robot Unicorn Attack - are used for evaluation. Their simple gameplay and minimal controls put the emphasis on reaction speed and attention over planning. Abstract Our implementation of Cognitive Programs plays both games at human expert level, which experimentally proves the validity of the concept. Additionally we resolved multiple theoretical and engineering issues, e.g. extending the CPs to dynamic environments, finding suitable data structures for describing the task and information flow within the network and determining the correct timing for each process
    • …
    corecore