3,037 research outputs found

    Mode Variational LSTM Robust to Unseen Modes of Variation: Application to Facial Expression Recognition

    Spatio-temporal feature encoding is essential for encoding the dynamics in video sequences. Recurrent neural networks, particularly long short-term memory (LSTM) units, have been popular as an efficient tool for encoding spatio-temporal features in sequences. In this work, we investigate the effect of mode variations on the encoded spatio-temporal features using LSTMs. We show that the LSTM retains information related to the mode variation in the sequence, which is irrelevant to the task at hand (e.g. classification of facial expressions). In fact, the LSTM forget mechanism is not robust enough to mode variations and preserves information that could negatively affect the encoded spatio-temporal features. We propose the mode variational LSTM to encode spatio-temporal features robust to unseen modes of variation. The mode variational LSTM modifies the original LSTM structure by adding an extra cell state that focuses on encoding the mode variation in the input sequence. To efficiently regulate what features should be stored in the additional cell state, additional gating functionality is also introduced. The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task. Comparative experiments on publicly available datasets verified that the proposed mode variational LSTM outperforms existing methods. Moreover, a new dynamic facial expression dataset with different modes of variation, such as pose and illumination variations, was collected to comprehensively evaluate the proposed mode variational LSTM. Experimental results verified that the proposed mode variational LSTM encodes spatio-temporal features robust to unseen modes of variation. Comment: Accepted in AAAI-1

    An original framework for understanding human actions and body language by using deep neural networks

    The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements. All the modules were tested on challenging, well-known datasets, showing remarkable results compared to current methods in the literature.

    A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

    Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.

    A Spatio-Temporal Probabilistic Framework for Dividing and Predicting Facial Action Units

    This thesis proposed a probabilistic approach to divide the Facial Action Units (AUs) based on the physiological relations among the facial muscle groups and the strengths of those relations. The physiological relations and their strengths were captured from given databases using a Static Bayesian Network (SBN). A data-driven spatio-temporal probabilistic scoring function was introduced to divide the AUs into: (i) frequently occurring and strongly connected AUs (FSAUs) and (ii) infrequently occurring and weakly connected AUs (IWAUs). In addition, a predictive mechanism based on a Dynamic Bayesian Network (DBN) was implemented to predict the IWAUs from the FSAUs. The combined spatio-temporal modeling enabled a framework to predict a full set of AUs in real-time. Empirical analyses were performed to illustrate the efficacy and utility of the proposed approach. Four different datasets of varying degrees of complexity and diversity were used for performance validation and perturbation analysis. Empirical results suggest that the IWAUs can be robustly predicted from the FSAUs in real-time and that the prediction is robust against noise.

    Annotated Bibliography: Anticipation


    Gesture tracking and neural activity segmentation in head-fixed behaving mice by deep learning methods

    The typical approach used by neuroscientists is to study the response of laboratory animals to a stimulus while recording their neural activity at the same time. With the advent of calcium imaging technology, researchers can now study neural activity at sub-cellular resolutions in vivo. Similarly, recording the behaviour of laboratory animals is also becoming more affordable. Although it is now easier to record behavioural and neural data, this data comes with its own set of challenges. The biggest challenge, given the sheer volume of the data, is annotation. A traditional approach is to annotate the data manually, frame by frame. With behavioural data, manual annotation is done by looking at each frame and tracing the animals; with neural data, this is carried out by a trained neuroscientist. In this research, we propose automated tools based on deep learning that can aid in the processing of behavioural and neural data. These tools will help neuroscientists annotate and analyse the data they acquire in an automated and reliable way.