39 research outputs found

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing may run in real time or take hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Addressing Variability in Speech when Recognizing Emotion and Mood In-the-Wild

    Bipolar disorder is a chronic mental illness, affecting 4% of Americans, that is characterized by periodic mood changes ranging from severe depression to extreme compulsive highs. Both mania and depression profoundly impact the behavior of affected individuals, resulting in potentially devastating personal and social consequences. Bipolar disorder is managed clinically with regular interactions with care providers, who assess mood, energy levels, and the form and content of speech. Recent work has proposed smartphones for automatically monitoring mood using speech. Much of the early work in speech-centered mood detection has been done in the laboratory or clinic and is not reflective of the variability found in real-world conversations and conditions. Outside of these settings, automatic mood detection is hard, as the recordings include environmental noise, differences in recording devices, and variations in subject speaking patterns. Without addressing these issues, it is difficult to move towards a passive mobile health system. My research addresses the variability present in speech so that such a system can be created, allowing for interventions to mitigate the life-changing effects of mood transitions. However, detecting mood directly from speech is difficult, as mood varies over the course of days or weeks, while speech fluctuates rapidly. To address this, my thesis explores how an intermediate step can be used to aid in this prediction. For example, one of the major symptoms of bipolar disorder is emotion dysregulation: changes in the way emotions are perceived and a lack of inhibition in their expression. My work has supported the relationship between automatically extracted emotion estimates and mood. Because of this, my thesis explores how to mitigate the variability found when detecting emotion from speech.
The remainder of my thesis is focused on applying these emotion-based features, as well as features based on language content, to real-world applications. This dissertation is divided into the following parts: Part I: I address the direct classification of mood from speech. This is accomplished by addressing variability due to recording device using preprocessing and multi-task learning. I then show how both subject-specific and population-general information can be combined to significantly improve mood detection. Part II: I explore the automatic detection of emotion from speech and how to control for the other factors of variability present in the speech signal. I use progressive networks as a method to augment emotion with other paralinguistic data including gender and speaker, as well as other datasets. Additionally, I introduce a novel domain generalization method for cross-corpus detection. Part III: I demonstrate real-world applications of speech mood monitoring using everyday conversations. I show how the previously introduced generalized model can predict emotion from the speech of individuals with suicidal ideation, demonstrating its effectiveness across domains. Furthermore, I use these predictions to distinguish individuals with suicidal thoughts from healthy controls. Lastly, I introduce a novel framework for intervention detection in individuals with bipolar disorder. I then create a natural speech mood monitoring system based on features derived from measures of emotion and automatic speech recognition (ASR) transcripts and show effective intervention detection. I conclude this dissertation with the following future directions: (1) Extending my emotion generalization system to include multiple modalities and factors of variability; (2) Expanding natural speech mood monitoring by including more devices, exploring other data besides speech, and investigating mood rating causality.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/153461/1/gideonjn_1.pd

    Machine Learning in Resource-constrained Devices: Algorithms, Strategies, and Applications

    The ever-increasing growth of technologies is changing people's everyday life. As a major consequence: 1) the amount of available data is growing and 2) several applications rely on battery-supplied devices that are required to process data in real time. In this scenario, the need for ad-hoc strategies for the development of low-power and low-latency intelligent systems capable of learning inductive rules from data using a modest amount of computational resources is becoming vital. At the same time, one needs to develop specific methodologies to manage complex patterns such as text and images. This thesis presents different approaches and techniques for the development of fast learning models explicitly designed to be hosted on embedded systems. The proposed methods proved able to achieve state-of-the-art performance in terms of the trade-off between generalization capabilities and area requirements when implemented in low-cost digital devices. In addition, advanced strategies for efficient sentiment analysis in text and images are proposed.

    The Application of Evolutionary Algorithms to the Classification of Emotion from Facial Expressions

    Emotions are an integral part of human daily life as they can influence behaviour. A reliable emotion detection system may help people in varied areas, such as social contact, health care and gaming experience. Emotions can often be identified by facial expressions, but this can be difficult to achieve reliably as people are different and a person can mask or suppress an expression. For these reasons, computing the motion as an expression occurs plays a more important role than analysing a static image. The work described in this thesis considers an automated and objective approach to recognition of facial expressions using extracted optical flow, which may be a reliable alternative to human interpretation. Farneback’s fast estimation has been used for the dense optical flow extraction. Evolutionary algorithms, inspired by Darwinian evolution, have been shown to perform well on complex, nonlinear datasets and are considered as the basis of this automated approach. Specifically, Cartesian Genetic Programming (CGP), which finds computer programs that solve user-defined tasks by evolving solutions, is implemented and modified to work as a classifier for the analysis of extracted flow data. Its performance is compared with that of the Support Vector Machine (SVM), which has been widely used in expression recognition problems, on a range of pre-recorded facial expressions obtained from two separate databases (MMI and FG-NET). CGP proved flexible to optimise in the experiments: the imbalanced data classification problem is sharply reduced by applying an Area under Curve (AUC) based fitness function. The results presented suggest that CGP is capable of achieving better performance than SVM. An automatic expression recognition system has also been implemented based on the method described in the thesis. Proposed future work is to investigate an ensemble classifier that combines CGP and SVM.
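The abstract above credits an AUC-based fitness function with reducing the imbalanced-data problem. The following is a minimal sketch of why that can help, not the thesis's implementation: `auc_fitness`, `accuracy` and the toy scores are hypothetical names and data introduced here for illustration. It computes AUC as the fraction of positive/negative pairs that a classifier ranks correctly and compares it with plain accuracy on an imbalanced toy set.

```python
from typing import Sequence


def auc_fitness(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    Hypothetical helper for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of examples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Imbalanced toy data (assumed): 8 negatives, 2 positives.
labels = [0] * 8 + [1] * 2
# A degenerate candidate that scores every example the same (always "negative").
lazy_scores = [0.1] * 10
# A candidate that ranks both positives above every negative.
ranked_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.05, 0.35, 0.4, 0.6, 0.45]

print(accuracy(lazy_scores, labels), auc_fitness(lazy_scores, labels))
print(accuracy(ranked_scores, labels), auc_fitness(ranked_scores, labels))
```

On this toy data the degenerate candidate still reaches high accuracy by ignoring the minority class, while its AUC stays at chance level, so an evolutionary search guided by AUC would have no incentive to keep it.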

    Combining visual recognition and computational linguistics : linguistic knowledge for visual recognition and natural language descriptions of visual content

    Extensive efforts are being made to improve visual recognition and semantic understanding of language. However, surprisingly little has been done to exploit the mutual benefits of combining both fields. In this thesis we show how the different fields of research can profit from each other. First, we scale recognition to 200 unseen object classes and show how to extract robust semantic relatedness from linguistic resources. Our novel approach extends zero-shot to few-shot recognition and exploits unlabeled data by adopting label propagation for transfer learning. Second, we capture the high variability but low availability of composite activity videos by extracting the essential information from text descriptions. For this we recorded and annotated a corpus for fine-grained activity recognition. We show improvements in a supervised case but we are also able to recognize unseen composite activities. Third, we present a corpus of videos and aligned descriptions. We use it for grounding activity descriptions and for learning how to automatically generate natural language descriptions for a video. We show that our proposed approach is also applicable to image description and that it outperforms baselines and related work. In summary, this thesis presents a novel approach for automatic video description and shows the benefits of extracting linguistic knowledge for object and activity recognition as well as the advantage of visual recognition for understanding activity descriptions.

    Despite extensive efforts to improve visual recognition and the automatic understanding of language, little has been done so far to combine these two research areas. In this dissertation we show how the two can benefit from each other. First, we scale object recognition to 200 unseen classes and show how to robustly extract semantic similarities from linguistic resources. Our new approach combines transfer learning and semi-supervised learning, so it can exploit unannotated data and work with no training examples as well as with only a few. Second, we capture the high variability but low availability of videos of composite activities by extracting the essential information from text descriptions. We improve supervised training as well as the recognition of unseen activities. Third, we present a parallel dataset of videos and descriptions. We use it for grounding activity descriptions and for learning to automatically generate natural language for a video. We show that our approach can also be used for image description and that it outperforms previous approaches. In summary, the dissertation presents a new approach to automatic video description and shows the benefits of language-based similarity measures for object and activity recognition, and vice versa.

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we apply this theory to the case where one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where gaze is the social signal considered.

    Detection of Driver Drowsiness and Distraction Using Computer Vision and Machine Learning Approaches

    Drowsiness and distracted driving are leading factors in most car crashes and near-crashes. This research study investigates the application of both conventional computer vision and deep learning approaches to the detection of drowsiness and distraction in drivers. In the first part of this MPhil research study, conventional computer vision approaches were studied to develop a robust drowsiness and distraction detection system based on yawning detection, head pose detection and eye blinking detection. These algorithms were implemented using existing human-crafted features. Experiments on detection and classification were performed with small image datasets to measure the performance of the system. It was observed that the use of human-crafted features together with a robust classifier such as SVM gives better performance in comparison to previous approaches. Though the results were satisfactory, there are many drawbacks and challenges associated with conventional computer vision approaches, such as the definition and extraction of human-crafted features, which make these conventional algorithms subjective in nature and less adaptive in practice. In contrast, deep learning approaches automate the feature selection process and can be trained to learn the most discriminative features without any human input. In the second half of this research study, the use of deep learning approaches for the detection of distracted driving was investigated. One observed advantage of the applied methodology for distraction detection is that CNN enhancement contributes to better pattern recognition accuracy and can learn features from various regions of the human body simultaneously. We compared the performance of four convolutional deep net architectures (AlexNet, ResNet, MobileNet and NASNet), investigated triplet training and explored the impact of combining a support vector classifier (SVC) with a trained deep net. The images used in our experiments with the deep nets are from the State Farm Distracted Driver Detection dataset hosted on Kaggle, each of which captures the entire body of a driver. The best results were obtained with the NASNet trained using triplet loss and combined with an SVC. It was observed that one of the advantages of deep learning approaches is their ability to learn discriminative features from various regions of a human body simultaneously. This ability has enabled deep learning approaches to reach human-level accuracy.
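The abstract above reports its best results with a network trained using triplet loss. As a minimal sketch of that loss, assuming the common margin-based form with Euclidean distance (the abstract does not state which variant or margin was used, and the function names and embedding values here are hypothetical), the loss is zero once the anchor is closer to the positive than to the negative by at least the margin:

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumption for illustration.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (assumed): anchor and positive share a class.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
easy_negative = [3.0, 4.0]   # far from the anchor, so no penalty
hard_negative = [0.0, 1.5]   # nearly as close as the positive, so penalised

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Embeddings trained this way cluster by class, which is one plausible reason a separate classifier such as an SVC can then be fitted on top of them.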