233 research outputs found

    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, allowing it to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figures
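    The core ModDrop idea described above, randomly dropping whole modality channels during fusion training so the network learns cross-modality correlations without depending on any single channel, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the drop probability and the at-least-one-channel safeguard are assumptions.

```python
import numpy as np

def moddrop(modalities, drop_prob=0.5, rng=None):
    """Zero out entire modality channels at random during fusion training
    (illustrative sketch of ModDrop; drop_prob is an assumed value)."""
    rng = rng or np.random.default_rng()
    kept = [rng.random() >= drop_prob for _ in modalities]
    if not any(kept):                          # keep at least one channel alive
        kept[rng.integers(len(modalities))] = True
    # dropped channels become all-zero inputs of the same shape
    return [x if k else np.zeros_like(x) for x, k in zip(modalities, kept)]
```

    At each training step the fused network would be fed the output of `moddrop` instead of the raw modality stack, so it occasionally sees inputs with one or more channels silenced.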

    Deep Learning Applied to PMU Data in Power Systems

    With the advent of Wide Area Measurement Systems (WAMS) and the consequent proliferation of digital measurement devices such as PMUs, control centers are being flooded with growing amounts of data. Operators therefore need efficient techniques to digest the incoming data and enhance grid operations through knowledge extraction. Driven by the volumes of data involved, innovative methods from the field of Artificial Intelligence are emerging for harnessing information without specifying complex analytical models. In fact, learning to recognize patterns appears to be the answer to the challenges imposed by processing the huge volumes of raw data involved in PMU-based WAMS. Hence, Deep Learning frameworks are applied as computational learning techniques to extract features from electrical frequency records collected by the Brazilian Medfasee BT Project. More specifically, this work proposes a classifier of dynamic events, such as generation loss and load shedding, based on frequency changes
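    To make the classification target concrete: generation loss drives system frequency down, while load shedding drives it up. A toy rule-based stand-in for the learned classifier, using only the sign and magnitude of the frequency deviation, might look like this (the threshold value and function name are illustrative assumptions, not from the paper):

```python
import numpy as np

def classify_frequency_event(freq, threshold=0.05):
    """Toy rule-based stand-in for a learned PMU event classifier:
    label a frequency record (Hz) by the sign of its largest deviation
    from the pre-event value. `threshold` (Hz) is an assumed setting."""
    deviation = freq - freq[0]                     # deviation from pre-event frequency
    peak = deviation[np.argmax(np.abs(deviation))] # largest signed excursion
    if abs(peak) < threshold:
        return "no event"
    return "generation loss" if peak < 0 else "load shedding"
```

    A deep model replaces this hand-written rule with features learned directly from the raw frequency records.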

    A study of deep neural networks for human activity recognition

    Human activity recognition and deep learning are two fields that have attracted attention in recent years. The former is relevant in many application domains, such as ambient assisted living or health monitoring, and the latter has recently achieved excellent performance in domains such as image and speech recognition. In this article, an extensive analysis of the deep learning architectures best suited to activity recognition is conducted to compare their performance in terms of accuracy, speed, and memory requirements. In particular, convolutional neural networks (CNN), long short-term memory networks (LSTM), bidirectional LSTM (biLSTM), gated recurrent unit networks (GRU), and deep belief networks (DBN) have been tested on a total of 10 publicly available datasets, with different sensors, sets of activities, and sampling rates. All tests have been designed under a multimodal approach to take advantage of synchronized raw sensor signals. Results show that CNNs are efficient at capturing local temporal dependencies of activity signals, as well as at identifying correlations among sensors. Their performance in activity classification is comparable with, and in most cases better than, the performance of recurrent models. Their faster response and lower memory footprint make them the architecture of choice for wearable and IoT devices
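    The finding that CNNs capture local temporal dependencies and cross-sensor correlations comes down to convolving small kernels that span all sensor channels over short time windows. A bare-bones numpy sketch of that feature-extraction step (kernel values here are arbitrary placeholders, not trained weights):

```python
import numpy as np

def conv1d_features(window, kernels):
    """Minimal 1D-convolution feature extractor over a multichannel
    sensor window (channels x time). Each kernel spans all channels
    (cross-sensor correlation) and a short time span (local temporal
    dependency); ReLU + global max-pooling yields one feature per kernel."""
    n_ch, T = window.shape
    feats = []
    for k in kernels:                              # k: (channels x width)
        w = k.shape[1]
        resp = np.array([(window[:, t:t + w] * k).sum()
                         for t in range(T - w + 1)])
        feats.append(np.maximum(resp, 0).max())    # ReLU + global max-pool
    return np.array(feats)
```

    A real HAR CNN stacks several such layers and learns the kernels by backpropagation, but the inductive bias is the same.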

    Cybersecurity of multi-cloud healthcare systems: A hierarchical deep learning approach

    With the increase in sophistication and connectedness of healthcare networks, their attack surfaces and vulnerabilities increase significantly. Malicious agents threaten patients’ health and lives by stealing or altering data as it flows among the multiple domains of healthcare networks. The problem is likely to worsen with the increasing use of IoT devices, edge, and core clouds in next-generation healthcare networks. Presented in this paper is MUSE, a system of deep hierarchical stacked neural networks for timely and accurate detection of malicious activity that leads to alteration of meta-information or payload of the dataflow between the IoT gateway, edge, and core clouds. Smaller models at the edge clouds take substantially less time to train than the large models in the core cloud. To improve the training speed and detection accuracy of large core-cloud models, the MUSE system uses a novel method of merging and aggregating layers of trained edge-cloud models to construct a partly pre-trained core-cloud model. As a result, the model in the core cloud converges in a substantially smaller number of epochs (6 to 8) and, consequently, less time than the edge-cloud models, which take 35 to 40 epochs to converge. Extensive evaluations show that with the MUSE system, large, merged models can be trained in significantly less time than unmerged models created independently in the core cloud. Across several runs, the merged models give an average 26.2% reduction in training time. The experimental evaluation also demonstrates that, along with fast training, the merged MUSE model achieves high training and test accuracies, ranging from 95% to 100%, in detecting unknown attacks on dataflows. The merged model thus generalizes very well on the test data. This is a marked improvement over the accuracy of the unmerged model as well as accuracies reported by other researchers with newer datasets
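    The pre-training step, building a core-cloud model from the layers of already-trained edge-cloud models, can be illustrated with simple layer-wise parameter averaging. Note that averaging is an assumption chosen for the sketch; the paper's actual merge/aggregate operation may differ.

```python
import numpy as np

def merge_edge_models(edge_weights):
    """Combine several trained edge-cloud models into initial weights for
    a core-cloud model by averaging corresponding layers. `edge_weights`
    is a list of models, each a list of per-layer weight arrays with
    matching shapes. (Averaging is an assumed merge rule, for illustration.)"""
    merged = []
    for layers in zip(*edge_weights):      # layer-by-layer across models
        merged.append(np.mean(layers, axis=0))
    return merged
```

    The core-cloud model initialized this way starts close to the edge models' learned solutions, which is consistent with the reported drop from 35-40 training epochs to 6-8.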

    Learning Generalizable Visual Patterns Without Human Supervision

    Owing to the existence of large labeled datasets, Deep Convolutional Neural Networks have ushered in a renaissance in computer vision. However, almost all of the visual data we generate daily - several human lifetimes' worth of it - remains unlabeled and thus out of reach of today’s dominant supervised learning paradigm. This thesis focuses on techniques that steer deep models towards learning generalizable visual patterns without human supervision. Our primary tool in this endeavor is the design of Self-Supervised Learning tasks, i.e., pretext tasks whose labels require no human labor. Besides enabling learning from large amounts of unlabeled data, we demonstrate how self-supervision can capture relevant patterns that supervised learning largely misses. For example, we design learning tasks that learn deep representations capturing shape from images, motion from video, and 3D pose features from multi-view data. Notably, these tasks' design follows a common principle: the recognition of data transformations. The strong performance of the learned representations on downstream vision tasks such as classification, segmentation, action recognition, or pose estimation validates this pretext-task design. This thesis also explores the use of Generative Adversarial Networks (GANs) for unsupervised representation learning. Besides leveraging generative adversarial learning to define image transformations for self-supervised learning tasks, we also address training instabilities of GANs through the use of noise. While unsupervised techniques can significantly reduce the burden of supervision, in the end we still rely on some annotated examples to fine-tune learned representations towards a target task. To improve learning from scarce or noisy labels, we describe a supervised learning algorithm with improved generalization in these challenging settings
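    The common principle named above, recognition of data transformations, means applying a known transformation to unlabeled data and training a model to predict which one was applied, so the transformation index itself becomes a free label. Rotation prediction is one well-known instance of this principle, used here purely as an illustration of how such pretext labels are generated:

```python
import numpy as np

ROTATIONS = (0, 1, 2, 3)   # multiples of 90 degrees

def make_pretext_batch(images, rng=None):
    """Build a self-supervised pretext batch: rotate each unlabeled image
    by a random multiple of 90 degrees and use the rotation index as the
    label. (Rotation prediction is an assumed example of the
    transformation-recognition principle, not the thesis's exact tasks.)"""
    rng = rng or np.random.default_rng()
    xs, ys = [], []
    for img in images:
        k = int(rng.choice(ROTATIONS))    # the free, human-labor-free label
        xs.append(np.rot90(img, k))
        ys.append(k)
    return np.stack(xs), np.array(ys)
```

    A classifier trained on `(xs, ys)` must learn orientation-sensitive visual structure to solve the task, and its internal representations can then be fine-tuned on a downstream task with few labels.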