
    Out-of-plane action unit recognition using recurrent neural networks

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2015. The face is a fundamental tool for interpersonal communication and interaction. Humans use facial expressions to consciously or subconsciously express emotional states such as anger or surprise. People can easily identify changes in facial expressions even in complicated scenarios, but facial expression recognition and analysis is a complex and challenging task for a computer. The automatic analysis of facial expressions has applications in psychology, neurology, pain assessment, lie detection, intelligent environments, psychiatry, and emotion and paralinguistic communication. We look at methods of facial expression recognition, in particular the recognition of the Action Units (AUs) of the Facial Action Coding System (FACS). FACS encodes the movements of individual facial muscles from slight, instantaneous changes in facial appearance; each AU is associated with the contraction of specific facial muscles. We use Speeded Up Robust Features (SURF) to extract keypoints from the face and build feature vectors from the SURF descriptors. SURF produces smaller feature vectors than other commonly used feature extraction techniques, is comparable to or better than other methods in distinctiveness, robustness, and repeatability, and is much faster than other feature detectors and descriptors. The SURF descriptor is scale and rotation invariant and robust to small viewpoint and illumination changes. We use the SURF feature vectors to train a recurrent neural network (RNN) to recognize AUs from the Cohn-Kanade database.
An RNN can handle the temporal data in image sequences in which an AU or a combination of AUs develops from a neutral face. We recognize AUs because they provide a more fine-grained measurement that is independent of age, ethnicity, gender, and differences in how expressions appear. In addition to recognizing FACS AUs in the Cohn-Kanade database, we use our trained RNNs to recognize the development of pain in human subjects, using the UNBC-McMaster pain database, which contains image sequences of people experiencing pain. In some cases, the pain causes the subject's face to move out of plane or to undergo some in-plane movement. The temporal processing ability of RNNs can help classify AUs when the face is occluded or not facing the camera frontally for part of the sequence. Results on the Cohn-Kanade database are promising, with higher overall recognition rates for upper-face AUs than for lower-face AUs. Since our system extracts keypoints globally from the face, local feature extraction could improve recognition in future work. We also obtain satisfactory recognition results on samples with out-of-plane head movement, demonstrating the temporal processing ability of RNNs.
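The recurrent stage described above can be illustrated with a minimal sketch, assuming each frame's SURF descriptors have already been pooled into one fixed-length vector. The pooling step, the 64-dimensional input, the layer sizes, and the class name `ElmanRNN` are hypothetical and not taken from the dissertation. The sketch shows an untrained Elman-style forward pass: the hidden state carries information from earlier frames, and one independent sigmoid per AU allows several AUs to be active at once. Training (for example, backpropagation through time) is omitted.

```python
import math
import random

def matvec(W, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ElmanRNN:
    """Untrained Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""

    def __init__(self, n_in, n_hidden, n_aus, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a trained model would learn these.
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = init(n_hidden, n_in)
        self.W_hh = init(n_hidden, n_hidden)
        self.W_hy = init(n_aus, n_hidden)
        self.b_h = [0.0] * n_hidden
        self.b_y = [0.0] * n_aus

    def forward(self, frames):
        """Run over a sequence of per-frame feature vectors; return one probability per AU."""
        h = [0.0] * len(self.b_h)  # neutral starting state
        for x in frames:
            pre = [a + b + c for a, b, c in
                   zip(matvec(self.W_xh, x), matvec(self.W_hh, h), self.b_h)]
            h = [math.tanh(p) for p in pre]
        # Independent sigmoids: several AUs may be present at the same time.
        return [sigmoid(z + b) for z, b in zip(matvec(self.W_hy, h), self.b_y)]

# Hypothetical usage: 10 frames, each pooled to a 64-D vector, 5 target AUs.
rng = random.Random(42)
sequence = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(10)]
model = ElmanRNN(n_in=64, n_hidden=16, n_aus=5)
print([round(p, 3) for p in model.forward(sequence)])
```

Because the final hidden state summarizes the whole sequence, frames in which the face turns away can be partly compensated by the frames before them, which is the intuition behind using an RNN for out-of-plane movement.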

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture and playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed that was designed from the outset to be offloaded from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN) whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of ANN evaluation is offloaded to a dedicated hardware accelerator capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors, and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed, which reduces energy by minimising the redundant operations inherent in binary data. Both architectures are shown to compare favourably with the relevant prior art.
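Modified Booth recoding, one of the arithmetic techniques listed above, can be sketched in software. Assuming radix-4 recoding of an even-width two's-complement multiplier (the 8-bit default and the function names are illustrative assumptions, not the thesis's circuit), each overlapping triple of multiplier bits becomes a digit in {-2, -1, 0, 1, 2}. A hardware multiplier therefore needs only about half as many partial products, each of which is 0, ±x, or ±2x and is cheap to generate with shifts and negation.

```python
def booth_radix4_digits(y, n_bits=8):
    """Recode an n_bits two's-complement multiplier y into radix-4 Booth digits (LSB first).

    y must fit in n_bits, i.e. -2**(n_bits-1) <= y < 2**(n_bits-1).
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even")
    u = y & ((1 << n_bits) - 1)  # two's-complement bit pattern of y

    def bit(k):
        # Implicit y_{-1} = 0 to the right of the least significant bit.
        return (u >> k) & 1 if k >= 0 else 0

    # Digit i examines the overlapping triple (y_{2i+1}, y_{2i}, y_{2i-1}).
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n_bits // 2)]

def booth_multiply(x, y, n_bits=8):
    """Sum of shifted partial products d_i * x * 4**i, each d_i in {-2, -1, 0, 1, 2}."""
    return sum((d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, n_bits)))

print(booth_radix4_digits(100))  # [0, 1, -2, 2], since 4 - 2*16 + 2*64 = 100
print(booth_multiply(-7, 100))
```

Four partial products replace the eight that a plain shift-and-add multiplier would need for an 8-bit multiplier; in hardware these would then be reduced with compressors and carry-save adders.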

    Evolutionary design of deep neural networks

    International Mention in the doctoral degree. For three decades, neuroevolution has applied evolutionary computation to the optimization of the topology of artificial neural networks, with most works focusing on very simple architectures. However, times have changed: convolutional neural networks are now the standard in industry and academia for solving a variety of problems, many of which remained unsolved before these networks emerged. Convolutional neural networks involve complex topologies, and designing these topologies manually for a given problem is expensive and inefficient. In this thesis, we aim to use neuroevolution to evolve the architecture of convolutional neural networks. To do so, we try two different techniques: genetic algorithms and grammatical evolution. We implement a niching scheme that preserves genetic diversity in order to ease the construction of ensembles of neural networks. These techniques were validated on the MNIST database for handwritten digit recognition, achieving a test error rate of 0.28%, and on the OPPORTUNITY data set for human activity recognition, attaining an F1 score of 0.9275. Both results are very competitive with the state of the art. In all cases, ensembles performed better than individual models. Later, the topologies learned for MNIST were tested on EMNIST, a database introduced in 2017 that includes more samples and a set of letters for character recognition. The topologies optimized for MNIST perform well on EMNIST, showing that architectures can be reused across domains with similar characteristics. In summary, neuroevolution is an effective approach for automatically designing convolutional neural network topologies. However, it remains a largely unexplored field due to hardware limitations.
Current advances, however, should fuel the emergence of this field, and further research should start now. This Ph.D. dissertation has been partially supported by the Spanish Ministry of Education, Culture and Sports under FPU fellowship FPU13/03917. The research stay was partially co-funded by the Spanish Ministry of Education, Culture and Sports under FPU short-stay grant EST15/00260. Official Doctoral Programme in Computer Science and Technology (Programa Oficial de Doctorado en Ciencia y Tecnología Informática). Committee: Chair: María Araceli Sanchís de Miguel; Secretary: Francisco Javier Segovia Pérez; Member: Simon Luca
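Grammatical evolution, one of the two techniques named above, evolves integer genomes that a grammar maps to architectures. The following is a minimal sketch of that genotype-to-phenotype mapping, assuming a toy grammar of hypothetical layer tokens (`conv3x3-32`, `dense-128`, and so on) rather than the thesis's actual grammar. Each codon, taken modulo the number of available productions, selects how the leftmost non-terminal is expanded, and the genome is reused (wrapped) a limited number of times before the individual is declared invalid.

```python
# Hypothetical toy grammar: a stack of conv layers followed by one dense layer.
GRAMMAR = {
    "<net>": [["<convs>", "<dense>"]],
    "<convs>": [["<conv>"], ["<conv>", "<convs>"]],
    "<conv>": [["conv3x3-32"], ["conv3x3-64"], ["conv3x3-128"]],
    "<dense>": [["dense-128"], ["dense-256"]],
}

def ge_map(genome, grammar, start="<net>", max_wraps=2):
    """Map an integer genome to a layer string by leftmost derivation; None if invalid."""
    stack = [start]  # symbols still to expand; the leftmost one is at the end
    layers = []
    used = 0
    budget = len(genome) * (max_wraps + 1)  # codons available, including wrapping
    while stack:
        symbol = stack.pop()
        if symbol not in grammar:  # terminal: emit it
            layers.append(symbol)
            continue
        choices = grammar[symbol]
        if len(choices) == 1:  # this variant reads no codon when there is no choice
            production = choices[0]
        else:
            if used >= budget:
                return None  # ran out of codons: invalid individual
            production = choices[genome[used % len(genome)] % len(choices)]
            used += 1
        stack.extend(reversed(production))
    return " -> ".join(layers)

print(ge_map([1, 0, 0, 2, 1], GRAMMAR))  # conv3x3-32 -> conv3x3-128 -> dense-256
```

In a full system, the decoded string would be compiled into a CNN, trained, and scored on validation data as the individual's fitness; that step is outside this sketch.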

    Brain Tumor Diagnosis Support System: A Decision Fusion Framework

    Early and accurate detection is an important factor in providing effective and efficient therapy for brain tumors, and it can increase survival rates. Current image-based tumor detection and diagnosis techniques depend heavily on interpretation by neuro-specialists and/or radiologists, making the evaluation process time-consuming and prone to human error and subjectivity. Moreover, widespread use of MR spectroscopy requires specialized processing and assessment of the data, as well as clear and fast presentation of the results as images or maps for routine clinical interpretation of an exam. Automatic brain tumor detection and classification have the potential to offer greater efficiency and more accurate predictions. However, the accuracy of automatic detection and classification techniques tends to depend on the specific image modality and is well known to vary from technique to technique. For this reason, it is prudent to examine how these methods vary in order to obtain consistently high accuracy. The goal of the proposed framework is to design, implement, and evaluate classification software that distinguishes various brain tumor types on magnetic resonance imaging (MRI) using textural features. This thesis introduces a brain tumor detection support system that uses a variety of tumor classifiers. The system is designed as a decision fusion framework that enables these multiple classifiers to analyze medical images, such as those obtained from MRI. The fusion procedure is grounded in the Dempster-Shafer theory of evidence. Numerous experimental scenarios were implemented to validate the efficiency of the proposed framework. Compared with alternative approaches, the methodology developed in this thesis shows higher accuracy and higher computational efficiency.
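Dempster's rule of combination, the standard operator of Dempster-Shafer theory, is the kind of fusion step the abstract refers to. A minimal sketch follows; the tumor class names and mass values are invented for illustration, and the thesis may derive masses from its classifiers differently. Masses on intersecting hypotheses are multiplied and accumulated, mass that falls on the empty set is treated as conflict K, and the result is renormalized by 1 - K.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0  # K: total mass assigned to contradictory (disjoint) pairs
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; Dempster's rule is undefined")
    # Renormalize so the non-conflicting masses sum to 1.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

# Hypothetical frame of discernment and classifier outputs (illustrative numbers only).
THETA = frozenset({"glioma", "meningioma", "pituitary"})
G, M = frozenset({"glioma"}), frozenset({"meningioma"})
texture_clf = {G: 0.6, G | M: 0.3, THETA: 0.1}
second_clf = {M: 0.5, G: 0.3, THETA: 0.2}

fused = dempster_combine(texture_clf, second_clf)
for hyp, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(hyp), round(mass, 3))
```

Here the two classifiers conflict with mass K = 0.30, and after renormalization "glioma" receives about 0.60 of the fused belief. Note that mass assigned to sets such as {glioma, meningioma} lets a classifier express partial ignorance instead of forcing a single label.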

    Quantum neural networks

    Quantum computing is one of the most exciting research areas of the last decades. At the same time, machine learning methods have started to dominate science, industry, and our everyday life. In this thesis, we combine these two essential research topics of the 21st century and introduce dissipative quantum neural networks (DQNNs), which are designed for fully quantum learning tasks, are capable of universal quantum computation, and have low memory requirements during training. We start the discussion of this interdisciplinary topic by introducing artificial neural networks, a very common tool in classical machine learning. Next, we give an overview of quantum information, focusing on the quantum algorithms and circuits used to implement quantum neural networks. Moreover, we explain the opportunities and challenges arising with today's quantum computers. The discussion of the architecture and training algorithm of the DQNNs forms the core of this work. These networks are optimised with training data pairs in the form of input and desired output states and can therefore be used to characterise unknown or untrusted quantum devices. We not only demonstrate the generalisation behaviour of these quantum neural networks using classical simulations, but also implement them successfully on actual quantum computers. To understand the ultimate limits of such quantum machine learning methods, we discuss the quantum no free lunch theorem, which bounds the probability that a quantum device, modelled as a unitary process and optimised with quantum examples, gives an incorrect output for a random input. This gives us a tool to review the learning behaviour of quantum neural networks in general and of DQNNs in particular. Moreover, we expand the area of application of DQNNs in two directions.
In the first extension, we include information beyond the training data pairs: since quantum devices are always structured, the resulting data is structured as well. We modify the DQNN training algorithm so that knowledge about the graph structure of the training data pairs is included in the training process, and show that this can lead to better generalisation. Both the original DQNN and the DQNN including graph structure are trained with data pairs to characterise an underlying relation. In the second extension, by contrast, we aim to learn the characteristics of a set of quantum states in order to extend it with quantum states that have similar properties. To this end, we build a generative adversarial model in which two DQNNs, the generator and the discriminator, are trained competitively. Overall, we observe that DQNNs can not only be trained efficiently but also, like their classical counterparts, be adapted to different applications.
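The training described above optimises a network on pairs of input and desired output states. Assuming a fidelity-based objective (a common choice when learning a map between pure states), a heavily simplified single-qubit sketch is shown below: the "network" is just a y-rotation `ry(theta)`, the untrusted device is another rotation, and the objective is the average overlap |⟨target|U|input⟩|². This is not the DQNN itself, whose layered, dissipative structure is not reproduced here, and the grid search merely stands in for the actual training algorithm; all names are hypothetical.

```python
import math

def ry(theta):
    """Single-qubit rotation about the y axis as a 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply(U, psi):
    """Matrix-vector product U|psi> for a single qubit."""
    return [U[0][0] * psi[0] + U[0][1] * psi[1],
            U[1][0] * psi[0] + U[1][1] * psi[1]]

def overlap(a, b):
    """Inner product <a|b>, conjugating the bra."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def average_fidelity(U, pairs):
    """Mean of |<target|U|input>|^2 over training pairs; 1.0 means a perfect fit."""
    return sum(abs(overlap(target, apply(U, inp))) ** 2 for inp, target in pairs) / len(pairs)

# Hypothetical untrusted device: a rotation the learner only sees through examples.
unknown = ry(0.8)
r = 1 / math.sqrt(2)
inputs = [[1 + 0j, 0j], [0j, 1 + 0j], [r + 0j, r + 0j]]  # |0>, |1>, |+>
pairs = [(psi, apply(unknown, psi)) for psi in inputs]

# Crude stand-in for training: pick the angle that maximises the average fidelity.
grid = [k * 0.01 for k in range(629)]
best = max(grid, key=lambda t: average_fidelity(ry(t), pairs))
print(round(best, 2), round(average_fidelity(ry(best), pairs), 4))
```

For these inputs the average fidelity works out to cos²((0.8 - θ)/2), so the search recovers θ ≈ 0.8; a real DQNN would instead update its perceptron unitaries with a dedicated training rule.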

    Connecting mathematical models for image processing and neural networks

    This thesis deals with the connections between mathematical models for image processing and deep learning. While data-driven deep learning models such as neural networks are flexible and perform well, they are often used as black boxes. This makes it hard to provide theoretical guarantees and scientific insights. On the other hand, more traditional, model-driven approaches such as diffusion, wavelet shrinkage, and variational models offer a rich set of mathematical foundations. Our goal is to transfer these foundations to neural networks. To this end, we pursue three strategies. First, we design trainable variants of traditional models and reduce their parameter set after training to obtain transparent and adaptive models. Second, we investigate the architectural design of numerical solvers for partial differential equations and translate them into building blocks of popular neural network architectures. This yields criteria for stable networks and inspires novel design concepts. Lastly, we present novel hybrid models for inpainting that rely on our theoretical findings. These strategies provide three ways of combining the best of the model-driven and data-driven worlds. Our work contributes to the overarching goal of closing the gap in performance and understanding that still exists between these worlds.
ERC Advanced Grant INCOVI
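The link between numerical PDE solvers and network building blocks can be illustrated with the simplest case. Assuming 1-D homogeneous diffusion with grid size h = 1 and reflecting boundaries (a textbook setting, not the thesis's specific models), one explicit time step u + τΔu has exactly the form x + F(x) of a residual block. For τ ≤ 1/2, each new value is a convex combination of old neighbouring values, so the maximum-minimum principle holds; larger step sizes can create new extrema. The function names are illustrative.

```python
def laplacian_1d(u):
    """Discrete Laplacian with reflecting (homogeneous Neumann) boundaries, grid size h = 1."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]       # mirror at the left boundary
        right = u[i + 1] if i < n - 1 else u[i]  # mirror at the right boundary
        out.append(left - 2 * u[i] + right)
    return out

def diffusion_residual_block(u, tau):
    """One explicit diffusion step, written as a residual update u + F(u) with F = tau * Laplacian."""
    return [ui + tau * lu for ui, lu in zip(u, laplacian_1d(u))]

def diffuse(u, tau, steps):
    """Stack `steps` identical residual blocks, i.e. run the explicit scheme."""
    for _ in range(steps):
        u = diffusion_residual_block(u, tau)
    return u

u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
print(diffuse(u0, 0.25, 50))  # stable: values stay in [0, 1]
print(diffuse(u0, 0.6, 1))    # unstable step size: the centre undershoots below 0
```

With τ = 0.25 the sketch keeps all values in [0, 1] and preserves the mean; with τ = 0.6 a single step already drives the centre value to about -0.2, which is the kind of instability a stability criterion for such residual blocks is meant to rule out.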

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. First, we present several use-case scenarios (from the viewers' side) in the LinkedTV project and, by analysing the distinctive needs and demands of each scenario, identify the technical requirements from a user-side perspective. Next, we study methods for the automatic and semi-automatic decomposition of audiovisual content in order to support the annotation process effectively. Since multimedia content comprises different types of information, i.e., visual, textual, and audio, we report various methods for analysing each of these three streams. Finally, we present various annotation tools that could integrate the developed analysis results to support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on initial progress in building the LinkedTV annotation tool. For each class of techniques discussed in the deliverable, we present the evaluation results of applying one such method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed to achieve higher levels of performance (e.g., in accuracy and time efficiency), as necessary.
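Temporal decomposition of a video into shots is a typical first step in the kind of audiovisual analysis surveyed above. The deliverable does not specify a method at this point, so the following is only a minimal sketch of one common baseline, assuming grayscale frames given as flat lists of 0-255 intensities: a cut is declared between consecutive frames whose normalized intensity histograms differ by more than a threshold. The bin count, the threshold, and the function names are hypothetical, and gradual transitions (fades, dissolves) are not handled.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    hist = [0] * bins
    for p in frame:
        hist[p * bins // 256] += 1
    return [h / len(frame) for h in hist]

def shot_boundaries(frames, threshold=0.5, bins=8):
    """Return indices i where a hard cut is detected between frame i-1 and frame i."""
    if not frames:
        return []
    cuts = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        # Half the L1 distance lies in [0, 1]; 1 means the histograms do not overlap.
        if 0.5 * sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts

# Hypothetical clip: three dark frames followed by three bright frames.
clip = [[10] * 16] * 3 + [[240] * 16] * 3
print(shot_boundaries(clip))
```

Histogram comparison ignores where pixels are, which makes it tolerant of camera and object motion within a shot but blind to cuts between shots with similar intensity distributions.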