30 research outputs found

    Two and three dimensional segmentation of multimodal imagery

    The role of segmentation in image understanding/analysis, computer vision, pattern recognition, remote sensing and medical imaging has been significantly augmented in recent years by accelerated scientific advances in the acquisition of image data. This low-level analysis protocol is critical to numerous applications, with the primary goal of expediting and improving the effectiveness of subsequent high-level operations by providing a condensed and pertinent representation of image information. In this research, we propose a novel unsupervised segmentation framework for meaningful segregation of 2-D/3-D image data across multiple modalities (color, remote-sensing and biomedical imaging) into non-overlapping partitions using several spatial-spectral attributes. Initially, our framework exploits the information obtained from detecting edges inherent in the data. Using a vector gradient detection technique, edge-free pixels are grouped and individually labeled to partition an initial portion of the input image content. Pixels with higher gradient densities are incorporated through the dynamic generation of segments as the algorithm progresses, yielding an initial region map. Subsequently, texture modeling is performed, and the obtained gradient, texture and intensity information, along with the aforementioned initial partition map, are used in a multivariate refinement procedure that fuses groups with similar characteristics to yield the final output segmentation. Experimental results, obtained in comparison to published state-of-the-art segmentation techniques for color as well as multi/hyperspectral imagery, demonstrate the advantages of the proposed method. Furthermore, to achieve improved computational efficiency, we extend the aforementioned methodology in a multi-resolution framework, demonstrated on color images. Finally, this research also encompasses a 3-D extension of the algorithm, demonstrated on medical (Magnetic Resonance Imaging / Computed Tomography) volumes.
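The edge-driven initialization described above can be sketched as a toy example. This is a minimal illustration under assumed details (the function name, gradient threshold and two-region test image are hypothetical, not the authors' implementation): low-gradient pixels are grouped into labeled regions, leaving high-gradient pixels unassigned for later processing.

```python
import numpy as np
from scipy import ndimage

def initial_region_map(image, grad_thresh=0.1):
    """Group edge-free (low vector-gradient) pixels into labeled regions.
    High-gradient pixels are left unlabeled (0) for later assignment."""
    # Vector gradient magnitude: root of summed squared per-channel gradients.
    grads = []
    for c in range(image.shape[2]):
        gy, gx = np.gradient(image[:, :, c])
        grads.append(gx ** 2 + gy ** 2)
    grad_mag = np.sqrt(np.sum(grads, axis=0))
    # Pixels below the gradient threshold form the edge-free mask.
    edge_free = grad_mag < grad_thresh
    # Connected-component labelling yields the initial partition.
    labels, n_regions = ndimage.label(edge_free)
    return labels, n_regions

# Toy image: two flat colour regions separated by a vertical edge.
img = np.zeros((32, 32, 3))
img[:, 16:, :] = 1.0
labels, n = initial_region_map(img)  # n == 2: one region per flat half
```

The dynamic-segment generation and texture-based refinement stages of the actual framework would then operate on this partial labeling.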

    Machine Learning and Its Application to Reacting Flows

    This open access book introduces and explains machine learning (ML) algorithms and techniques developed for statistical inference on complex processes or systems, and their application to simulations of chemically reacting turbulent flows. These two fields, ML and turbulent combustion, each have a large body of work and knowledge of their own; this book brings them together and explains the complexities and challenges involved in applying ML techniques to simulate and study reacting flows. This matters for the world’s total primary energy supply (TPES): more than 90% of this supply comes through combustion technologies, and combustion has non-negligible effects on the environment. Although alternative technologies based on renewable energies are emerging, their share of the TPES is currently less than 5%, and a complete paradigm shift would be needed to replace combustion sources. Whether this is practical is an entirely different question, and the answer depends on the respondent. However, a pragmatic analysis suggests that the combustion share of TPES is likely to remain above 70% even by 2070. Hence, it is prudent to take advantage of ML techniques to improve combustion science and technology so that efficient and “greener” combustion systems, friendlier to the environment, can be designed. The book covers the current state of the art in these two topics and outlines the challenges involved, along with the merits and drawbacks of using ML for turbulent combustion simulations, including avenues that can be explored to overcome the challenges. The required mathematical equations and background are discussed, with ample references for readers who wish to find further detail. This book is unique in its coverage of topics, ranging from big data analysis and machine learning algorithms to their applications in combustion science and system design for energy generation.

    Decoding attentional load in visual perception: a signal processing approach

    Previous research has established that visual perception tasks high in attentional load (or ‘perceptual load’, defined operationally to include either a larger number of items or a greater perceptual processing demand) result in reduced perceptual sensitivity and cortical response for visual stimuli outside the focus of attention. However, there are three challenges facing the load theory of attention today. The first is to describe a neural mechanism by which load-induced perceptual deficits are explained; the second is to clarify the concept of perceptual load and develop a method for estimating the load induced by a visual task a priori, without recourse to measures of secondary perceptual effects; and the third is to extend the study of attentional load to natural, real-world visual tasks. In this thesis we employ signal processing and machine learning approaches to address these challenges. In Chapters 3 and 4 it is shown that high perceptual load degrades the perception of orientation by modulating the tuning curves of neural populations in early visual cortex. The combination of tuning curve modulations reported is unique to perceptual load, inducing broadened tuning as well as reductions in tuning amplitude and overall neural activity, and so provides a novel low-level mechanism for behaviourally relevant failures of vision such as inattentional blindness. In Chapter 5, a predictive model of perceptual load during the task of driving is produced. The high variation in perceptual demands during real-world driving allows the construction of a direct fine-scale mapping between high-resolution natural imagery, captured from a driver's point of view, and induced perceptual load. The model therefore constitutes the first system able to produce a priori estimates of load directly from visual characteristics of a natural task, extending research into the antecedents of perceptual load beyond the realm of austere laboratory displays. Taken together, the findings of this thesis represent major theoretical advances into both the causes and effects of high perceptual load.
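The combination of tuning-curve modulations reported above (broadened tuning together with reduced amplitude and reduced overall activity) can be illustrated with a simple Gaussian tuning-curve model; all parameter values below are hypothetical and purely for illustration, not fits to the thesis data.

```python
import numpy as np

def tuning_curve(theta, pref, amp, width, baseline):
    # Gaussian orientation tuning curve over angles in degrees.
    return baseline + amp * np.exp(-0.5 * ((theta - pref) / width) ** 2)

theta = np.linspace(-90.0, 90.0, 181)
# Low load: sharp, strong tuning around the preferred orientation (0 deg).
low_load = tuning_curve(theta, pref=0.0, amp=1.0, width=20.0, baseline=0.2)
# High load: broader tuning, reduced amplitude, reduced overall activity.
high_load = tuning_curve(theta, pref=0.0, amp=0.7, width=30.0, baseline=0.1)
```

Normalizing each curve shows that the high-load curve retains proportionally more response at off-preferred orientations (i.e. broader tuning) while its peak and total activity are lower, matching the qualitative pattern described in the abstract.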

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density) and atomic (molecules, proteins, materials, and interactions) to the macro (fluids, climate, and subsurface) scale, and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification. To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified, and hope this initial effort may trigger more community interest and effort to further advance AI4Science.
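The equivariance property discussed above can be checked numerically on a toy example: an update rule built only from relative displacements commutes with rotation. The rule below is an illustrative construction for the sake of the check, not a technique taken from this survey.

```python
import numpy as np

def rotate(points, angle):
    # Rotate 2-D points (rows) by `angle` radians about the origin.
    c, s = np.cos(angle), np.sin(angle)
    return points @ np.array([[c, -s], [s, c]]).T

def equivariant_update(points):
    # Pull each point toward the centroid, weighted by inverse distance.
    # Built only from relative displacements, whose norms are rotation-
    # invariant, so the whole update commutes with rotation.
    disp = points.mean(axis=0) - points
    w = 1.0 / (np.linalg.norm(disp, axis=1, keepdims=True) + 1e-8)
    return points + 0.1 * w * disp

pts = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]])
# Rotating then updating equals updating then rotating (equivariance).
a = equivariant_update(rotate(pts, 0.7))
b = rotate(equivariant_update(pts), 0.7)
```

Equivariant deep learning layers for quantum, atomistic and continuum systems generalize exactly this idea to learned functions and larger symmetry groups.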

    Emotion Recognition with Deep Neural Networks

    Automatic recognition of human emotion has been studied for decades. It is one of the key components in human computer interaction, with applications in health care, education, entertainment and advertisement.
    Emotion recognition is a challenging task as it involves predicting abstract emotional states from multi-modal input data. These modalities include video, audio and physiological signals. The visual modality is one of the most informative channels; especially facial expressions, which have been shown to be strong cues for the emotional state of a subject. A common automated emotion recognition system includes several processing steps, each of which has to be tuned and integrated into a pipeline. Such pipelines are often hand-engineered, which can introduce strong assumptions about the properties of the task and data. Limiting assumptions and learning the processing pipeline from data often yields more general solutions. In recent years, deep learning methods have been shown to be able to learn good representations for various modalities. For many computer vision benchmarks, the gap between state-of-the-art algorithms based on deep neural networks and human performance is shrinking rapidly. These networks learn hierarchies of features. With increasing depth, these hierarchies can describe increasingly abstract concepts. This development suggests exploring the applications of such learning methods to facial analysis and emotion recognition. This thesis is based on a preliminary study and three articles, which contribute to the field of emotion recognition. The preliminary study introduces a new variant of Local Binary Patterns (LBPs), which is used as a high-dimensional binary representation of facial images. It is common to create histograms of LBP features within regions of input images. However, in this work, they are used as high-dimensional binary vectors that are extracted at multiple scales around detected facial keypoints. We examine a pipeline consisting of unsupervised and supervised dimensionality reduction, using Principal Component Analysis (PCA) and Local Fisher Discriminant Analysis (LFDA), followed by a Support Vector Machine (SVM) classifier for prediction of facial expressions. The experiments show that the dimensionality reduction steps provide robustness in the presence of noisy keypoints. This approach achieved state-of-the-art performance at the time in facial expression recognition on the Extended Cohn-Kanade (CK+) data set (Lucey et al, 2010) and smile detection on the GENKI data set (GENKI-4K, 2008). For the smile detection task, a deep Convolutional Neural Network (CNN) was used as a strong baseline. Emotion recognition in close-to-real-world videos, such as the Hollywood film clips in the Emotion Recognition in the Wild (EmotiW) challenge (Dhall et al, 2013), is much harder than in controlled lab environments. The first article is an in-depth analysis of the EmotiW 2013 challenge winning entry (Kahou et al, 2013), with additional experiments on the data set of the 2014 challenge. The pipeline consists of a combination of deep learning models, each specializing in one modality. The models include the following: a novel aggregation of per-frame features that transfers powerful CNN features, learned on a large pooled data set of facial expression images, to the video domain; a Deep Belief Network (DBN) that learns audio features; an activity recognition pipeline that captures spatio-temporal motion features; and a k-means based bag-of-mouths model that extracts features around the mouth region. Several approaches for fusing the predictions of modality-specific models are compared. The performance after re-training on the 2014 data set with a few adaptations is still competitive with the new state of the art. One drawback of the method described in the first article is the aggregation approach of the visual modality, which involves pooling per-frame features into a fixed-length vector. This ignores the temporal order inside the pooled segments. Recurrent Neural Networks (RNNs) are neural networks built for sequential processing of data, which can address this issue by summarizing frames in a real-valued state vector that is updated at each time-step. In general, RNNs provide a way of learning an aggregation approach in a data-driven manner. The second article analyzes the application of an RNN on CNN features for emotion recognition in video. A comparison of the RNN with the pooling-based approach shows a significant improvement in classification performance. It also includes a feature-level fusion and decision-level fusion of models for different modalities. In addition to the RNN, the same activity pipeline as previous work, an SVM-based audio model and the old aggregation model are fused to boost performance on the EmotiW 2015 challenge data set. This approach was the second runner-up in the challenge, within 1% classification accuracy of the challenge winner. The last article focuses on a more general computer vision problem, namely visual tracking. An RNN is augmented with a neural attention mechanism that allows it to focus on task-related information, ignoring potential distractors in input frames. The approach is formulated in a modular neural framework consisting of three components: a recurrent attention module controlling where to look, a feature-extraction module providing a representation of what is seen, and an objective module which indicates why an attentional behaviour is learned. Each module is fully differentiable, allowing simple gradient-based optimization. Such a framework could be used to design an end-to-end solution for emotion recognition in vision, potentially not requiring initial steps of face detection or keypoint localization. The approach is tested on three tracking data sets including one real-world data set. In summary, this thesis explores and develops a multitude of deep learning techniques, making significant steps towards the long-term goal of building an end-to-end trainable system for emotion recognition.
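The Local Binary Pattern representation used in the preliminary study can be sketched in its simplest form. The code below is a minimal 8-neighbour LBP on a single-channel image; it is an assumed simplification, not the thesis's high-dimensional multi-scale keypoint variant.

```python
import numpy as np

def lbp_codes(img):
    # 8-neighbour Local Binary Pattern over the image interior:
    # each pixel gets an 8-bit code, one bit per neighbour >= centre.
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:],    img[2:, 1:-1],
                  img[2:, :-2],  img[1:-1, :-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(np.uint8) << bit
    return codes

# A dark centre surrounded by brighter neighbours sets all eight bits.
patch = np.array([[5, 5, 5],
                  [5, 1, 5],
                  [5, 5, 5]], dtype=float)
code = lbp_codes(patch)  # single interior pixel, code value 255
```

Histograms of such codes over image regions, or raw binary vectors extracted around facial keypoints as in the preliminary study, then serve as input features for the dimensionality reduction and SVM stages.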

    Perceptual texture similarity estimation

    This thesis evaluates the ability of computational features to estimate perceptual texture similarity. In the first part of this thesis, we conducted two evaluation experiments on the ability of 51 computational feature sets to estimate perceptual texture similarity using two different evaluation methods, namely, pair-of-pairs based and retrieval based evaluations. These experiments compared the computational features to two sets of human-derived ground-truth data, both of which are higher resolution than those commonly used. The first was obtained by free-grouping and the second by pair-of-pairs experiments. Using these higher resolution data, we found that the feature sets do not perform well when compared to human judgements. Our analysis shows that these computational feature sets either (1) only exploit power spectrum information or (2) only compute higher order statistics (HoS) on, at most, small local neighbourhoods. In other words, they cannot capture aperiodic, long-range spatial relationships. As we hypothesise that these long-range interactions are important for the human perception of texture similarity, we carried out two more pair-of-pairs experiments, the results of which indicate that long-range interactions do provide humans with important cues for the perception of texture similarity. In the second part of this thesis we develop new texture features that can encode such data. We first examine the importance of three different types of visual information for human perception of texture. Our results show that contours are the most critical type of information for human discrimination of textures. Finally, we report the development of a new set of contour-based features which performed well on the free-grouping data and outperformed the 51 feature sets and another contour-type feature set with the pair-of-pairs data.
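As a concrete, hypothetical example of the first limitation identified above, a feature that only exploits power-spectrum information can be sketched as a radially averaged Fourier power spectrum. Such a feature is blind to phase, and therefore to the long-range aperiodic structure the thesis argues is perceptually important; the function below is illustrative, not one of the 51 evaluated feature sets.

```python
import numpy as np

def power_spectrum_feature(img, n_bins=8):
    # Radially averaged Fourier power spectrum, normalised to sum to 1.
    # Depends only on spectral magnitudes, never on phase structure.
    f = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)  # radius of each frequency
    bins = np.linspace(0.0, r.max() + 1e-9, n_bins + 1)
    feat = np.array([power[(r >= lo) & (r < hi)].mean()
                     for lo, hi in zip(bins[:-1], bins[1:])])
    return feat / feat.sum()

texture = np.random.default_rng(1).random((32, 32))
feat = power_spectrum_feature(texture)  # 8-bin spectral signature
```

Two textures with identical magnitude spectra but different phase (hence different contour and long-range structure) would receive identical feature vectors, illustrating why contour-based features are pursued in the second part of the thesis.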

    Digital Image Processing

    This book presents several recent advances that are related to, or fall under the umbrella of, 'digital image processing', with the purpose of providing insight into the possibilities offered by digital image processing algorithms in various fields. The presented mathematical algorithms are accompanied by graphical representations and illustrative examples for enhanced readability. The chapters are written in a manner that allows even a reader with basic experience and knowledge of digital image processing to properly understand the presented algorithms. Concurrently, the structure of the information in this book is such that fellow scientists will be able to use it to push the development of the presented subjects even further.