    Saliency-based approaches for multidimensional explainability of deep networks

    In deep learning, visualization techniques extract the salient patterns exploited by deep networks to perform a task (e.g. image classification), focusing on single images. These methods allow a better understanding of these complex models and enable the identification of the most informative parts of the input data. Beyond understanding deep networks, visual saliency is useful for many quantitative purposes and applications in both the 2D and 3D domains, such as analysing the generalization capabilities of a classifier and autonomous navigation. In this thesis, we describe an approach to cope with the interpretability problem of a convolutional neural network and propose our ideas on how to exploit visualization for applications like image classification and active object recognition. After a brief overview of common visualization methods producing attention/saliency maps, we address two separate points. First, we describe how visual saliency can be effectively used in the 2D domain (e.g. RGB images) to boost image classification performance: visual summaries, i.e. compact representations of ensembles of saliency maps, can be used to improve the classification accuracy of a network through summary-driven specializations. Then, we present a 3D active recognition system that considers different views of a target object, overcoming the single-view hypothesis of classical object recognition and making the classification problem much easier in principle. Here we adopt such attention maps in a quantitative fashion by building a 3D dense saliency volume that fuses together saliency maps obtained from different viewpoints, obtaining a continuous proxy for which parts of an object are more discriminative for a given classifier. Finally, we show how to inject these representations into a real-world application, so that an agent (e.g. a robot) can move knowing the capabilities of its classifier.
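    As a rough illustration of the fusion step, the following Python sketch back-projects per-view 2D saliency maps into a shared voxel grid and averages the contributions per voxel. The function name, the pinhole projection matrices and the unit-cube grid are assumptions made for illustration, not the thesis' actual fusion rule.

        import numpy as np

        def fuse_saliency_volume(saliency_maps, projections, grid_shape=(64, 64, 64)):
            """Fuse per-view 2D saliency maps into a dense 3D saliency volume.

            saliency_maps: list of (H, W) arrays in [0, 1], one per viewpoint.
            projections:   list of (3, 4) camera matrices mapping homogeneous
                           world points to image coordinates (assumed known).
            """
            volume = np.zeros(grid_shape)
            counts = np.zeros(grid_shape)

            # Voxel centres of a unit cube centred at the origin.
            zs, ys, xs = np.meshgrid(*[np.linspace(-0.5, 0.5, n) for n in grid_shape],
                                     indexing="ij")
            points = np.stack([xs.ravel(), ys.ravel(), zs.ravel(), np.ones(xs.size)])

            for sal, P in zip(saliency_maps, projections):
                h, w = sal.shape
                proj = P @ points                      # project voxels into this view
                uv = proj[:2] / np.clip(proj[2], 1e-6, None)
                u, v = np.round(uv).astype(int)
                valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
                vals = np.zeros(points.shape[1])
                vals[valid] = sal[v[valid], u[valid]]  # sample saliency where visible
                volume += vals.reshape(grid_shape)
                counts += valid.reshape(grid_shape)

            return volume / np.clip(counts, 1, None)   # average over contributing views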

    Real-time appearance-based gaze tracking.

    Gaze tracking technology is widely used in Human Computer Interaction applications, such as interfaces for assisting people with disabilities and driver attention monitoring. However, commercially available gaze trackers are expensive, and their performance deteriorates if the user is not positioned in front of the camera and facing it; head motion or being far from the device also degrades their accuracy. This thesis focuses on the development of real-time appearance-based gaze tracking algorithms using low-cost devices, such as a webcam or Kinect. The proposed algorithms are developed by considering accuracy, robustness to head pose variation, and the ability to generalise to different persons. In order to deal with head pose variation, we propose to estimate the head pose and then compensate for the appearance change and the bias it introduces to a gaze estimator. Head pose is estimated by a novel method that utilizes tensor-based regressors at the leaf nodes of a random forest. For a baseline gaze estimator we use an SVM-based appearance regressor. To compensate for the appearance variation introduced by the head pose we use a geometric model, and to compensate for the bias we use a regression function trained on a training set. Our methods are evaluated on publicly available datasets.
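    A minimal sketch of the bias-compensation idea, assuming scikit-learn and synthetic data: a baseline SVM-based gaze regressor is trained on appearance features, and a second regressor learns the pose-dependent residual bias on the training set, which is then added back at test time. The geometric appearance compensation and the tensor-based head pose forest are omitted here; all names and data below are hypothetical.

        import numpy as np
        from sklearn.svm import SVR
        from sklearn.multioutput import MultiOutputRegressor
        from sklearn.linear_model import Ridge

        # Hypothetical training data: eye-appearance features, head poses
        # (yaw, pitch, roll) and ground-truth 2D gaze angles.
        rng = np.random.default_rng(0)
        X_app, pose = rng.normal(size=(500, 128)), rng.normal(size=(500, 3))
        gaze = rng.normal(size=(500, 2))

        # 1. Baseline appearance-based gaze regressor (frontal assumption).
        base = MultiOutputRegressor(SVR(kernel="rbf")).fit(X_app, gaze)

        # 2. Model the pose-induced bias of the baseline as a function of head pose.
        residual = gaze - base.predict(X_app)
        bias = Ridge().fit(pose, residual)

        # 3. At test time, correct the baseline estimate with the predicted bias.
        def estimate_gaze(features, head_pose):
            return base.predict(features) + bias.predict(head_pose)

        print(estimate_gaze(X_app[:1], pose[:1]))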

    Artificial Intelligence Tools for Facial Expression Analysis.

    Inner emotions show visibly upon the human face and are understood as a basic guide to an individual’s inner world. It is therefore possible to determine a person’s attitudes, and the effects of others’ behaviour on their deeper feelings, by examining facial expressions. In real-world applications, machines that interact with people need strong facial expression recognition, which holds advantages for varied applications in affective computing, advanced human-computer interaction, security, stress and depression analysis, robotic systems, and machine learning. This thesis starts by proposing a benchmark of dynamic versus static methods for facial Action Unit (AU) detection. An AU is the activation of a local set of individual facial muscles; AUs occur in unison to constitute natural facial expression events. Detecting AUs automatically can provide explicit benefits since it considers both static and dynamic facial features. For this research, AU occurrence detection was conducted by extracting static and dynamic features, using both nominal hand-crafted and deep learning representations, from each static image of a video. This confirmed the superior ability of a pretrained model, which yields a leap in performance. Next, temporal modelling was investigated to detect the underlying temporal variation phases using supervised and unsupervised methods on dynamic sequences. During these processes, the importance of stacking dynamic features on top of static ones was discovered when encoding deep features for learning temporal information, combining the spatial and temporal schemes simultaneously. This study also found that fusing both spatial and temporal features gives richer long-term temporal pattern information. Moreover, we hypothesised that using an unsupervised method would enable the learning of invariant information from dynamic textures. Recently, fresh cutting-edge developments have been created by approaches based on Generative Adversarial Networks (GANs). In the second part of this thesis, we propose a model based on an unsupervised DCGAN for facial feature extraction and classification, to achieve the following: the creation of facial expression images under different arbitrary poses (frontal, multi-view, and in the wild), and the recognition of emotion categories and AUs, in an attempt to resolve the problem of recognising the seven static classes of emotion in the wild. Thorough experimentation with the proposed cross-database setting demonstrates that this approach can improve generalization results. Additionally, we showed that the features learnt by the DCGAN are poorly suited to encoding facial expressions when observed under multiple views, or when trained from a limited number of positive examples. Finally, this research focuses on disentangling identity from expression for facial expression recognition. A novel technique was implemented for emotion recognition from a single monocular image. A large-scale dataset (Face vid) was created from facial image videos rich in variations and distribution of facial dynamics, appearance, identities, expressions, and 3D poses. This dataset was used to train a DCNN (ResNet) to regress the expression parameters of a 3D Morphable Model jointly with a back-end classifier.
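    To make the stacking of dynamic on static features concrete, here is a minimal sketch assuming per-frame descriptors are already available: dynamic features are taken as first-order temporal differences, stacked with the static ones, and fed to a linear SVM as a per-AU occurrence detector. The feature dimensions, data and classifier choice are illustrative assumptions, not the thesis' exact pipeline.

        import numpy as np
        from sklearn.svm import LinearSVC

        # Hypothetical per-frame static descriptors for one video (frames x D).
        rng = np.random.default_rng(0)
        frames = rng.normal(size=(100, 256))
        au_labels = rng.integers(0, 2, size=100)   # per-frame AU occurrence (0/1)

        # Dynamic features: first-order temporal differences of the static stream.
        dynamic = np.vstack([np.zeros((1, frames.shape[1])), np.diff(frames, axis=0)])

        # Stack dynamic on top of static, as in the spatio-temporal scheme above.
        stacked = np.hstack([frames, dynamic])

        clf = LinearSVC().fit(stacked, au_labels)   # one binary detector per AU
        print(clf.score(stacked, au_labels))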

    Deep Learning Approaches for Seizure Video Analysis: A Review

    Seizure events can manifest as transient disruptions in the control of movements, which may be organized in distinct behavioral sequences, accompanied or not by other observable features such as altered facial expressions. The analysis of these clinical signs, referred to as semiology, is subject to observer variation when specialists evaluate video-recorded events in the clinical setting. To enhance the accuracy and consistency of evaluations, computer-aided video analysis of seizures has emerged as a natural avenue. In the field of medical applications, deep learning and computer vision approaches have driven substantial advancements. Historically, these approaches have been used for disease detection, classification, and prediction using diagnostic data; however, there has been limited exploration of their application to video-based motion detection in the clinical epileptology setting. While vision-based technologies do not aim to replace clinical expertise, they can significantly contribute to medical decision-making and patient care by providing quantitative evidence and decision support. Behavior monitoring tools offer several advantages, such as providing objective information, detecting events that are challenging to observe, reducing documentation effort, and extending assessment capabilities to areas with limited expertise. The main applications of these tools could be (1) improved seizure detection methods and (2) refined semiology analysis for predicting seizure type and cerebral localization. In this paper, we detail the foundation technologies used in vision-based systems for the analysis of seizure videos, highlighting their success in semiology detection and analysis and focusing on work published in the last 7 years. Additionally, we illustrate how existing technologies can be interconnected through an integrated system for video-based semiology analysis. (Comment: Accepted in Epilepsy & Behavior.)
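    A common recipe behind such vision-based systems is a per-frame CNN encoder followed by a recurrent model over time. The PyTorch sketch below is an illustrative instance of that pattern, not any specific published seizure model; the architecture, sizes and names are assumptions.

        import torch
        import torch.nn as nn

        class VideoSemiologyNet(nn.Module):
            """Frame-wise CNN features pooled by an LSTM: a generic pattern for
            video-based seizure analysis (illustrative, not a published model)."""

            def __init__(self, feat_dim=64, hidden=128, n_classes=2):
                super().__init__()
                self.encoder = nn.Sequential(            # tiny per-frame CNN encoder
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                    nn.Linear(32, feat_dim),
                )
                self.temporal = nn.LSTM(feat_dim, hidden, batch_first=True)
                self.head = nn.Linear(hidden, n_classes)

            def forward(self, clips):                    # clips: (B, T, 3, H, W)
                b, t = clips.shape[:2]
                feats = self.encoder(clips.flatten(0, 1)).view(b, t, -1)
                out, _ = self.temporal(feats)
                return self.head(out[:, -1])             # classify from the last step

        logits = VideoSemiologyNet()(torch.randn(2, 16, 3, 64, 64))
        print(logits.shape)  # torch.Size([2, 2])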

    Performance Driven Facial Animation with Blendshapes


    Brain Network Modelling


    Video metadata extraction in a videoMail system

    Currently the world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes like entertainment, information, education or communication. The rapid growth of today’s video archives, with sparsely available editorial data, creates a major retrieval problem. Humans see a video as a complex interplay of cognitive concepts. As a result there is a need to build a bridge between numeric values and semantic concepts, establishing a connection that will facilitate video retrieval by humans. The critical aspect of this bridge is video annotation. The process can be done manually or automatically. Manual annotation is very tedious, subjective and expensive; therefore automatic annotation is being actively studied. In this thesis we focus on automatic annotation of multimedia content, namely the use of analysis techniques for information retrieval that allow metadata to be extracted automatically from video in a videomail system, including the identification of text, people, actions, spaces and objects (including animals and plants). Hence it will be possible to align multimedia content with the text presented in the email message and to create applications for semantic video database indexing and retrieval.
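    As one concrete building block, automatic annotation typically starts by selecting keyframes on which detectors for text, people or objects are then run. The following sketch, assuming OpenCV, picks keyframes by simple frame differencing; the threshold, function name and file name are illustrative assumptions, not the system described above.

        import cv2
        import numpy as np

        def extract_keyframes(path, threshold=30.0):
            """Pick candidate keyframes where the scene changes markedly: a common
            first step before running detectors that tag people, text or objects."""
            cap = cv2.VideoCapture(path)   # e.g. a video attached to a videomail
            keyframes, prev, idx = [], None, 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(float)
                # Mean absolute difference to the previous frame as a change score.
                if prev is None or np.abs(gray - prev).mean() > threshold:
                    keyframes.append((idx, frame))
                prev, idx = gray, idx + 1
            cap.release()
            return keyframes

        # Hypothetical usage: run taggers on these frames to build the metadata.
        # frames = extract_keyframes("message.mp4")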

    Kinematics and control of precision grip grasping

    This thesis is about the kind of signals used in our central nervous system for guiding skilled motor behavior. In the first two projects, a currently very influential theory on the flow of visual information inside our brain was tested. According to A. D. Milner and Goodale (1995), there exist two largely independent visual streams: the dorsal stream is supposed to transmit visual information for the guidance of action, while the ventral stream is thought to generate a conscious percept of the environment. The streams are said to use different parts of the visual information and to differ in temporal characteristics. Namely, the dorsal stream is proposed to have a lower sensitivity for color and a more rapid decay of information than the ventral stream. In the first project, the role of chromatic information in action guidance was probed. We let participants grasp colored stimuli which varied in luminance. Critically, some of these stimuli were completely isoluminant with the background. These stimuli thus could only be discriminated from their surroundings by means of chromatic contrast, a poor input signal for the dorsal stream. Nevertheless, our participants were perfectly able to guide their grip to these targets as well. In the second project, the temporal characteristics of the two streams were probed. For a certain group of neurological patients it has been argued that they are able to switch from dorsal to ventral control when visual information is removed. These optic ataxia patients are normally quite bad at executing visually guided movements such as pointing or grasping. Different researchers, however, demonstrated that their accuracy does improve when there is a delay between target presentation and movement execution. Using different delay times and pointing movements, Himmelbach and Karnath (2005) had shown that this improvement increases linearly with longer delays. We aimed at a replication of this result and a generalization to precision grip movements. Our results from two patients, however, did not show any improvement in grasping due to longer delay times. In pointing, an effect was found only in one of the patients and only in one of several measures of pointing accuracy. Taken together, the results of the first two projects do not support the idea of two independent visual streams and are more in line with the idea of a single visual representation of target objects. The third project aimed at closing a gap in existing model approaches to precision grip kinematics. The available models need the target points of a movement as an input on which they can operate. From the literature on human and robotic grasping we extracted the most plausible set of rules for grasp point selection. We created objects suitable to put these rules into conflict with each other, and thereby estimated the individual contribution of each rule. We validated the model by predicting grasp points on a completely novel set of objects. Our straightforward approach showed a very good performance in predicting the preferred contact points of human actors.
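    The grasp point selection model of the third project can be pictured as a weighted combination of rule costs over candidate contact points. The NumPy sketch below is a hypothetical two-rule instance (grasp axis near the centre of mass, small grip aperture); the actual rules and the weights estimated in the thesis differ, and all names here are assumptions.

        import numpy as np

        def select_grasp(candidates, com, weights=(1.0, 0.5)):
            """Score candidate contact-point pairs by a weighted sum of rule costs
            and return the pair with the lowest total cost (illustrative rules).

            candidates: (N, 2, 3) array of thumb/finger contact point pairs.
            com:        (3,) centre of mass of the object.
            """
            midpoints = candidates.mean(axis=1)
            # Rule 1: the grasp axis should pass close to the centre of mass,
            # otherwise the object rotates under gravity when lifted.
            torque_cost = np.linalg.norm(midpoints - com, axis=1)
            # Rule 2: prefer small grip apertures within the hand's comfort range.
            aperture_cost = np.linalg.norm(candidates[:, 0] - candidates[:, 1], axis=1)
            total = weights[0] * torque_cost + weights[1] * aperture_cost
            return candidates[np.argmin(total)]

        rng = np.random.default_rng(0)
        best = select_grasp(rng.normal(size=(50, 2, 3)), com=np.zeros(3))
        print(best)

    In the thesis the relative weight of each rule was estimated empirically by constructing objects that put the rules in conflict; in this sketch the weights are simply placeholders.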

    Convolutional neural networks for the segmentation of small rodent brain MRI

    Image segmentation is a common step in the analysis of preclinical brain MRI, and is often performed manually. This is a time-consuming procedure subject to inter- and intra-rater variability. A possible alternative is automated, registration-based segmentation, which suffers from a bias owing to the limited capacity of registration to adapt to pathological conditions such as Traumatic Brain Injury (TBI). In this work, a novel method is developed for the segmentation of small rodent brain MRI based on Convolutional Neural Networks (CNNs). The experiments presented here show how CNNs provide a fast, robust and accurate alternative to both manual and registration-based methods. This is demonstrated by accurately segmenting three large datasets of MRI scans of healthy and Huntington disease model mice, as well as TBI rats. MU-Net and MU-Net-R, the CNNs presented here, achieve human-level accuracy while eliminating intra-rater variability, alleviating the biases of registration-based segmentation, and with an inference time of less than one second per scan. Using these segmentation masks, I designed a geometric construction to extract 39 parameters describing the position and orientation of the hippocampus, and later used them to classify epileptic vs. non-epileptic rats with a balanced accuracy of 0.80, five months after TBI. This clinically transferable geometric approach detects subjects at high risk of post-traumatic epilepsy, paving the way towards subject stratification for antiepileptogenesis studies.
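    The encoder-decoder-with-skip-connections pattern underlying such segmentation CNNs can be sketched in a few lines of PyTorch. The network below is a deliberately tiny illustrative stand-in, not the published MU-Net architecture; channel counts, depth and the number of output labels are assumptions.

        import torch
        import torch.nn as nn

        def block(c_in, c_out):
            return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

        class TinyUNet(nn.Module):
            """One-level encoder-decoder with a skip connection: the basic pattern
            behind CNN brain segmentation (a sketch, not the published MU-Net)."""

            def __init__(self, n_classes=4):
                super().__init__()
                self.enc = block(1, 16)                  # single-channel MRI input
                self.down = nn.MaxPool2d(2)
                self.mid = block(16, 32)
                self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
                self.dec = block(32, 16)                 # 32 = skip (16) + upsampled (16)
                self.out = nn.Conv2d(16, n_classes, 1)   # per-pixel region labels

            def forward(self, x):
                e = self.enc(x)
                m = self.mid(self.down(e))
                d = self.dec(torch.cat([e, self.up(m)], dim=1))
                return self.out(d)                       # (B, n_classes, H, W) logits

        masks = TinyUNet()(torch.randn(1, 1, 128, 128)).argmax(dim=1)
        print(masks.shape)  # torch.Size([1, 128, 128])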