    Visual Homing in Dynamic Indoor Environments

    Institute of Perception, Action and Behaviour
    Our dissertation concerns robotic navigation in dynamic indoor environments using image-based visual homing. Image-based visual homing infers the direction to a goal location S from the navigator's current location C using the similarity between panoramic images I_S and I_C captured at those locations. There are several ways to compute this similarity. One of the contributions of our dissertation is to identify a robust image similarity measure for dynamic indoor environments: mutual image information. We crafted novel methods to speed the computation of mutual image information on both parallel and serial processors and demonstrated that these time-savers had little negative effect on homing success. Image-based visual homing requires a homing agent to move so as to optimise the mutual image information signal. As the mutual information signal is corrupted by sensor noise, we turned to the stochastic optimisation literature for appropriate optimisation algorithms. We tested a number of these algorithms in both simulated and real dynamic laboratory environments and found that gradient descent (with gradients computed by one-sided differences) works best.
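
    The two core ingredients of this approach, the mutual-information similarity measure and the one-sided-difference gradient step, can be sketched in a few lines. This is a minimal illustration, not the dissertation's implementation: `capture` is a hypothetical camera interface, and the step and probe sizes are arbitrary.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Mutual image information between two equal-sized greyscale
    (uint8) panoramas, computed from their joint grey-level histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)   # marginal of img_b
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def home_step(pos, goal_image, capture, step=0.05, probe=0.10):
    """One homing step: estimate the MI gradient by one-sided differences
    and move along it (MI grows toward the goal, so we ascend).
    `capture(pos)` is a hypothetical robot interface returning the
    panoramic image at 2D position `pos`."""
    f0 = mutual_information(capture(pos), goal_image)
    grad = np.zeros(2)
    for i in range(2):
        probed = pos.copy()
        probed[i] += probe                # one-sided difference probe
        grad[i] = (mutual_information(capture(probed), goal_image) - f0) / probe
    return pos + step * grad / (np.linalg.norm(grad) + 1e-9)
```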

    Edge detection using neural network arbitration

    A human observer is able to recognise and describe most parts of an object by its contour, provided the contour is properly traced and reflects the shape of the object itself. Machine vision systems have approached this recognition task using a similar technique, which prompted the development of many diverse edge detection algorithms. The work described in this thesis is based on the visual observation that edge maps produced by different algorithms display different properties of the original image as the image degrades. Our objective is to improve the edge map through arbitration between edge maps produced by edge detection algorithms that are diverse in nature, approach and performance. As image processing tools are repeatedly applied to similar images, we believe this objective can be achieved by a learning process based on sample images. It is shown that such an approach is feasible, using an artificial neural network to perform the arbitration. The network is trained on sets extracted from sample images. The arbitration system is implemented on a parallel processing platform, and its performance is presented through examples of diverse types of image. Comparisons with a neural network edge detector (also developed within this thesis) and with conventional edge detectors show that the proposed system offers significant advantages.
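
    The arbitration idea can be illustrated with a small sketch: several diverse detectors each respond per pixel, and a neural network learns from a hand-labelled sample image how to combine their responses. The detector choice, network size and training interface below are illustrative assumptions, not the thesis's setup.

```python
import numpy as np
from scipy.ndimage import sobel, laplace, gaussian_gradient_magnitude
from sklearn.neural_network import MLPClassifier

def detector_stack(image):
    """Per-pixel features: responses of several diverse edge detectors."""
    image = np.asarray(image, dtype=float)
    maps = [
        np.hypot(sobel(image, 0), sobel(image, 1)),    # first-derivative magnitude
        np.abs(laplace(image)),                        # second-derivative response
        gaussian_gradient_magnitude(image, sigma=1.5), # smoothed gradient
    ]
    return np.stack([m.ravel() / (m.max() + 1e-9) for m in maps], axis=1)

arbiter = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500)

def train_arbiter(sample_image, edge_labels):
    """Teach the arbiter from a hand-labelled sample image (0/1 per pixel)."""
    arbiter.fit(detector_stack(sample_image), edge_labels.ravel())

def arbitrate(image):
    """Arbitrated edge map for a new image of the same kind."""
    return arbiter.predict(detector_stack(image)).reshape(np.asarray(image).shape)
```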

    Contributions for the automatic description of multimodal scenes

    Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia, Universidade do Porto. 200

    Enhancing person annotation for personal photo management using content and context based technologies

    Rapid technological growth and the decreasing cost of photo capture mean that we are all taking more digital photographs than ever before. However, the lack of technology for automatically organising personal photo archives has left many users with poorly annotated photos, causing great frustration when such photo collections are browsed or searched at a later time. As a result, there has recently been significant research interest in technologies for supporting effective annotation. This thesis addresses an important sub-problem of the broad annotation problem, namely "person annotation" in personal digital photo management. Solutions to this problem are provided using content analysis tools in combination with context data within an experimental photo management framework called "MediAssist". Readily available image metadata, such as location and date/time, are captured by digital cameras with in-built GPS functionality and thus provide knowledge about when and where each photo was taken. This information is then used to identify the "real-world" events corresponding to certain activities in the photo capture process. The problem of enabling effective person annotation is formulated so that both "within-event" and "cross-event" relationships of persons' appearances are captured. The research reported in the thesis is built upon a firm foundation of content-based analysis technologies, namely face detection, face recognition and body-patch matching, together with data fusion. Two annotation models are investigated, namely progressive and non-progressive. The effectiveness of each model is evaluated against varying proportions of initial annotation and against the type of initial annotation, based on individual and combined face, body-patch and person-context information sources. The results reported in the thesis strongly validate the use of multiple information sources for person annotation whilst emphasising the advantage of event-based photo analysis in real-life photo management systems.
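
    The event-identification step can be sketched as a simple rule over the time/GPS metadata stream: a new event begins whenever consecutive photos are far apart in time or space. The thresholds below are illustrative assumptions, not MediAssist's actual parameters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import radians, sin, cos, asin, sqrt

@dataclass
class Photo:
    taken: datetime   # capture timestamp from camera metadata
    lat: float        # GPS latitude
    lon: float        # GPS longitude

def haversine_km(a, b):
    """Great-circle distance between two photos' GPS fixes, in km."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def group_into_events(photos, max_gap=timedelta(hours=2), max_km=1.0):
    """Split a photo stream into events: start a new event whenever
    consecutive photos are far apart in time or space."""
    events, current = [], []
    for p in sorted(photos, key=lambda p: p.taken):
        if current and (p.taken - current[-1].taken > max_gap
                        or haversine_km(current[-1], p) > max_km):
            events.append(current)
            current = []
        current.append(p)
    if current:
        events.append(current)
    return events
```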

    Artificial Neural Networks in Agriculture

    Modern agriculture needs high production efficiency combined with high quality of the products obtained. This applies to both crop and livestock production. To meet these requirements, advanced methods of data analysis are used increasingly often, including methods derived from artificial intelligence. Artificial neural networks (ANNs) are one of the most popular tools of this kind. They are widely used in solving various classification and prediction tasks, and for some time also in the broadly defined field of agriculture, where they can form part of precision farming and decision support systems. Artificial neural networks can replace classical methods of modelling many issues and are one of the main alternatives to classical mathematical models. Their spectrum of applications is very wide: for a long time, researchers from all over the world have been using these tools to support agricultural production, making it more efficient and helping to provide the highest-quality products possible.

    Deep learning architectures for human action recognition in monocular RGB-D video sequences. Application to surveillance in public transport

    This thesis deals with the automatic recognition of human actions from monocular RGB-D video sequences. Our main goal is to recognise which human actions occur in unknown videos. This is a challenging task due to a number of obstacles caused by the variability of the acquisition conditions, including the lighting, position, orientation and field of view of the camera, as well as the variability in how actions are performed, notably in terms of speed. To tackle these problems, we first review and evaluate the most prominent state-of-the-art techniques to identify the current state of human action recognition in videos. We then propose a new approach for skeleton-based action recognition using Deep Neural Networks (DNNs). Two key questions are addressed. First, how to efficiently represent the spatio-temporal patterns of skeletal data so as to fully exploit the capacity of Deep Convolutional Neural Networks (D-CNNs) to learn high-level representations. Second, how to design a powerful D-CNN architecture able to learn discriminative features from the proposed representation for the classification task. As a result, we introduce two new 3D motion representations, SPMF (Skeleton Posture-Motion Feature) and Enhanced-SPMF, that encode skeleton poses and their motions into color images. For the learning and classification tasks, we design and train different D-CNN architectures based on the Residual Network (ResNet), Inception-ResNet-v2, Densely Connected Convolutional Network (DenseNet) and Efficient Neural Architecture Search (ENAS) to extract robust features from the color-coded images and classify them. Experimental results on various public and challenging human action recognition datasets (MSR Action3D, Kinect Activity Recognition Dataset, SBU Kinect Interaction, and NTU-RGB+D) show that the proposed approach outperforms the current state of the art. We also study the problem of 3D human pose estimation from monocular RGB video sequences and exploit the estimated 3D poses for the recognition task. Specifically, the deep learning-based model OpenPose is deployed to detect 2D human poses, and a DNN is then proposed and trained to learn the 2D-to-3D mapping from the detected 2D keypoints to 3D poses. Our experiments on the Human3.6M dataset verify the effectiveness of the proposed method. These results open a new research direction for human action recognition from 3D skeletal data when depth cameras are not available. In addition, we collect and introduce the CEMEST database, a new RGB-D dataset depicting passengers' behaviours in public transport. It consists of 203 untrimmed real-world surveillance videos, collected in a metro station, of realistic "normal" and "abnormal" events. We achieve promising results on CEMEST with the support of data augmentation and transfer learning techniques, enabling the construction of real-world, deep learning-based applications for enhancing the quality of public transport services.
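
    The core encoding idea, turning a skeleton sequence into a color image that a CNN can classify, can be sketched as follows. This is a simplified illustration of the pose part only; the actual SPMF and Enhanced-SPMF definitions also encode joint motions between frames and apply further enhancement.

```python
import numpy as np

def skeleton_to_image(seq):
    """Encode a 3D skeleton sequence as an RGB image, in the spirit of SPMF.

    seq: array of shape (frames, joints, 3). Each joint's (x, y, z)
    becomes one (R, G, B) pixel; rows index joints, columns index frames,
    so the image captures both posture (vertical) and time (horizontal).
    """
    lo, hi = seq.min(axis=(0, 1)), seq.max(axis=(0, 1))
    norm = (seq - lo) / (hi - lo + 1e-9)   # normalise each axis to [0, 1]
    img = (norm * 255).astype(np.uint8)    # quantise to 8-bit color values
    return img.transpose(1, 0, 2)          # (joints, frames, 3) image
```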

    Fail-Safe Vehicle Pose Estimation in Lane-Level Maps Using Pose Graph Optimization

    Highly accurate pose estimation for autonomous vehicles, both in HD maps and relative to the lane, is essential to guarantee safe vehicle guidance. For series production, highly accurate but expensive single sensors are deliberately avoided for cost and packaging reasons; instead, a multitude of sensors is used whose data can also serve other modules besides pose estimation. The focus of this work is the uncertainty estimation, assessment and fusion of these sensor data. In contrast to classical filtering approaches such as the Kalman or particle filter, pose graph optimization for sensor fusion is distinguished by its robustness against faulty measurements and its flexibility in modelling. Pose graph optimization was first applied on mobile robot platforms to solve so-called SLAM problems. These methods were developed further and, in particular, successfully demonstrated for purely camera-based localization of autonomous vehicles in 3D point clouds. For the development and release of safety-relevant systems according to ISO 26262, however, a statement about the quality and fail-safety of these systems is required in addition to their accuracy. Besides estimating the map-relative and lane-relative pose, this work therefore also addresses the estimation of the pose uncertainty and the mutual integrity of the sensor data. On this basis, an assessment of the fail-safety of the localization module becomes possible. Motivated by the Ko-HAF project, only lane markings are used for localization in HD maps. The memory-efficient representation of these maps enables high-frequency updates of the map contents by a vehicle fleet. The presented approach was implemented as a prototype on an Opel Insignia. The test vehicle was extended with a front and a rear camera as well as a GNSS receiver. First, the estimation of the map-relative and lane-relative vehicle pose, the GNSS signal evaluation and the vehicle motion estimation are presented. By comparing these estimates with one another, the uncertainties of the individual modules are computed. The localization problem is then solved by an optimizer. Using the computed uncertainties, an assessment of the individual modules is carried out in a subsequent step. To evaluate the approach, both highly dynamic manoeuvres on a test track and drives on public motorways were analysed.
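
    The principle behind pose graph optimization, stacking all relative (odometry-like) and absolute (GNSS-like) constraints into one weighted least-squares problem, can be shown with a deliberately tiny 1D toy example. This is an illustration of the principle only; the thesis's system works on full vehicle poses, with estimated uncertainties as weights and integrity checks layered on top.

```python
import numpy as np

def solve_pose_graph(n, odometry, absolute):
    """Minimal 1D pose-graph least squares.

    n        : number of poses x_0..x_{n-1} along a line
    odometry : list of (i, j, d, w) -> constraint x_j - x_i ~ d, weight w
    absolute : list of (i, z, w)    -> constraint x_i ~ z (GNSS-like fix)

    Each constraint contributes one weighted row to a linear system
    A x ~ b, solved in closed form.
    """
    rows = len(odometry) + len(absolute)
    A = np.zeros((rows, n))
    b = np.zeros(rows)
    r = 0
    for i, j, d, w in odometry:          # relative-motion constraints
        A[r, i], A[r, j], b[r] = -w, w, w * d
        r += 1
    for i, z, w in absolute:             # absolute-position constraints
        A[r, i], b[r] = w, w * z
        r += 1
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Three poses, slightly inconsistent odometry, strong anchors at both ends:
# the optimizer spreads the residual error over the whole trajectory.
x = solve_pose_graph(3,
                     odometry=[(0, 1, 1.0, 1.0), (1, 2, 1.1, 1.0)],
                     absolute=[(0, 0.0, 10.0), (2, 2.0, 10.0)])
```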