49 research outputs found

    Spatio-temporal human action detection and instance segmentation in videos

    With the exponential growth in the number of video capturing devices and digital video content, automatic video understanding is now at the forefront of computer vision research. This thesis presents a series of models for automatic human action detection in videos and also addresses the space-time action instance segmentation problem. Both action detection and instance segmentation play vital roles in video understanding. Firstly, we propose a novel human action detection approach based on a frame-level deep feature representation combined with a two-pass dynamic programming approach. The method obtains a frame-level action representation by leveraging recent advances in deep learning based action recognition and object detection methods. To combine the complementary appearance and motion cues, we introduce a new fusion technique which significantly improves the detection performance. Further, we cast temporal action detection as two energy optimisation problems which are solved using the Viterbi algorithm. Exploiting a video-level representation further allows the network to learn the inter-frame temporal correspondence between action regions, and it is bound to be a better solution to the action detection problem than a frame-level representation. Secondly, we propose a novel deep network architecture which learns a video-level action representation by classifying and regressing 3D region proposals spanning two successive video frames. The proposed model is end-to-end trainable and can be jointly optimised for both proposal generation and action detection objectives in a single training step. We name our new network "AMTnet" (Action Micro-Tube regression Network). We further extend the AMTnet model by incorporating optical flow features to encode motion patterns of actions. Finally, we address the problem of action instance segmentation in which multiple concurrent actions of the same class may be segmented out of an image sequence. 
By taking advantage of recent work on action foreground-background segmentation, we are able to associate each action tube with class-specific segmentations. We demonstrate the performance of our proposed models on challenging action detection benchmarks, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time
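
The abstract above mentions linking frame-level detections with Viterbi-style dynamic programming. The following is a minimal sketch of that general idea, not the thesis's actual formulation: it assumes each frame yields a list of scored boxes, and it picks one box per frame so that the sum of detection scores plus a weighted overlap between consecutive boxes is maximal. The names `iou`, `link_detections` and `overlap_weight` are hypothetical and introduced only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(
    boxes: List[List[Box]],
    scores: List[List[float]],
    overlap_weight: float = 1.0,
) -> List[int]:
    """Return, for each frame, the index of the box on the best path.

    The path energy (an assumption of this sketch) is the sum of box
    scores plus overlap_weight times the IoU of consecutive boxes.
    """
    best = [list(scores[0])]        # best[t][j]: best energy ending at box j
    back: List[List[int]] = []      # back[t-1][j]: best predecessor of box j
    for t in range(1, len(boxes)):
        cur_best, cur_back = [], []
        for j, box in enumerate(boxes[t]):
            candidates = [
                best[-1][i] + overlap_weight * iou(prev, box)
                for i, prev in enumerate(boxes[t - 1])
            ]
            i_star = max(range(len(candidates)), key=candidates.__getitem__)
            cur_best.append(candidates[i_star] + scores[t][j])
            cur_back.append(i_star)
        best.append(cur_best)
        back.append(cur_back)

    # Backtrack from the best final box to recover the whole path.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path
```

With `overlap_weight=0` the path reduces to picking the highest-scoring box in each frame independently; a positive weight trades detection score against spatial consistency between frames.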

    Natural Language Processing (Almost) from Scratch

    We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements

    Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition

    This thesis tackles the problems of modularity in Large-Vocabulary Continuous Speech Recognition with the use of neural networks

    Modelling of Electrical Appliance Signatures for Energy Disaggregation

    The rapid development of technology in the electrical sector within the last 20 years has led to growing electric power needs through the increased number of electrical appliances and the automation of tasks. In contrast, a reduction of overall energy consumption as well as efficient energy management are needed in order to reduce global warming and meet global climate protection goals. These requirements have led to the recent adoption of smart meters and smart grids, as well as to the rise of Non-Intrusive Load Monitoring. Non-Intrusive Load Monitoring aims to extract the energy consumption of individual electrical appliances through disaggregation of the total power consumption as measured by a single smart meter at the inlet of a household. Non-Intrusive Load Monitoring is therefore a highly under-determined problem which aims to estimate multiple variables from a single observation, and thus cannot be solved analytically. In order to find accurate estimates of the unknown variables, three fundamentally different approaches, namely deep learning, pattern matching and single-channel source separation, have been investigated in the literature. While Non-Intrusive Load Monitoring has multiple areas of application, including energy reduction through consumer awareness, load scheduling for energy cost optimization or reduction of peak demands, the focus of this thesis is on the performance of the disaggregation algorithm, the key part of the Non-Intrusive Load Monitoring architecture. In detail, optimizations are proposed for all three architectures, while the focus lies on deep-learning based approaches. Furthermore, the transferability of the deep-learning based approach is investigated and a NILM-specific transfer architecture is proposed. The main contribution of the thesis is threefold. 
First, with Non-Intrusive Load Monitoring being a time-series problem, the incorporation of temporal information is crucial for accurate modelling of the appliance signatures and the change of signatures over time. Therefore, previously published architectures based on deep learning have focused on utilizing regression models which intrinsically incorporate temporal information. In this work, the idea of incorporating temporal information is extended by modelling temporal patterns of appliances not only in the regression stage but also in the input feature vector, i.e. by using fractional calculus, feature concatenation or high-frequency double Fourier integral signatures. Additionally, multi variance matching is utilized for Non-Intrusive Load Monitoring in order to have additional degrees of freedom for a pattern matching based solution. Second, with Non-Intrusive Load Monitoring systems expected to operate in real time as well as being low-cost applications, computational complexity as well as storage limitations must be considered. Therefore, in this thesis an approximation for frequency domain features is presented in order to reduce computational complexity. Furthermore, reduced sampling frequencies and their impact on disaggregation performance have been evaluated. Additionally, different elastic matching techniques have been compared in order to reduce training times and to utilize models without trainable parameters. Third, in order to fully utilize Non-Intrusive Load Monitoring techniques, accurate transfer models, i.e. models which are trained on one data domain and tested on a different data domain, are needed. In this context it is crucial to transfer time-variant and manufacturer-dependent appliance signatures to manufacturer-invariant signatures in order to assure accurate transfer modelling. 
Therefore, a transfer learning architecture specifically adapted to the needs of Non-Intrusive Load Monitoring is presented. Overall, this thesis contributes to the topic of Non-Intrusive Load Monitoring by improving the performance of the disaggregation stage while comparing three fundamentally different approaches to the disaggregation problem
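
To make the disaggregation task described above concrete, here is a minimal sketch of a brute-force on/off search over appliance states. This is a deliberately simple stand-in and not one of the approaches studied in the thesis: it assumes every appliance draws one fixed, known power when switched on, and the appliance names and wattages in the usage note are invented for illustration.

```python
from itertools import product
from typing import Dict


def disaggregate(total_watts: float, appliance_watts: Dict[str, float]) -> Dict[str, bool]:
    """Guess which appliances are on from a single aggregate reading.

    Tries every on/off combination and keeps the one whose summed power
    is closest to the measurement. The cost is exponential in the number
    of appliances, so this only illustrates the problem statement.
    """
    names = list(appliance_watts)
    best_states, best_error = None, float("inf")
    for states in product((0, 1), repeat=len(names)):
        estimate = sum(s * appliance_watts[n] for s, n in zip(states, names))
        error = abs(total_watts - estimate)
        if error < best_error:  # strict '<' keeps the first of any tied combinations
            best_states, best_error = states, error
    return {n: bool(s) for n, s in zip(names, best_states)}
```

For example, `disaggregate(2125, {"fridge": 120, "kettle": 2000, "tv": 80})` selects the fridge and the kettle. If two appliances have the same rating, several combinations explain the reading equally well, which is one simple way to see why the problem is under-determined.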

    Human Action Recognition from Active Acoustics: Physics Modelling for Representation Learning and Inference Using Generative Probabilistic Graphical Models

    This dissertation explores computational methods to address the problem of physics-based modeling and ultimately doing inference from data in multiple modalities where there exist large amounts of low-dimensional data complementary to a much smaller set of high-dimensional data. In this instance the low-dimensional time-series data are active acoustics from a micro-Doppler sensor that include no or very limited spatial information, and the high-dimensional data is RGB-Depth skeleton data from a Microsoft Kinect sensor. The task is that of human action recognition from the active acoustic data. To accomplish this, statistical models, trained simultaneously on both the micro-Doppler modulations induced by human actions and symbolic representations of skeletal poses, are developed. This enables the model to learn correlations between the rich temporal structure of the micro-Doppler modulations and the high-dimensional motion sequences of human action. During runtime, the model then relies purely on the active acoustic data to infer the human action. In order to adapt this methodology to situations not observed in the training data, a physical model of the human body is combined with a physics-based simulation of the Doppler phenomenon to predict the acoustic data for a sequence of skeletal poses and a configurable sensor geometry. The physics model is then combined with a generative statistical model for human actions to create a generative physics-based model of micro-Doppler modulations for human action

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed, in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at fixed time, to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernov

    Localizing spatially and temporally objects and actions in videos

    The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization. Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding-boxes from still images. Recently, video has been used as an alternative source of data. Yet, training an object detector on one domain (either still images or videos) and testing on the other one results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this performance gap. We define and evaluate several domain shift factors: spatial location accuracy, appearance diversity, image quality, aspect distribution, and object size and camera framing. We examine the impact of these factors by comparing the detection performance before and after cancelling them out. The results show that all five factors affect the performance of the detectors and their combined effect explains the performance gap. While most existing approaches for detection in videos focus on objects or human actions separately, in the second part of this thesis we aim at detecting non-human centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both tasks of object and action detection benefit from this joint learning. In experiments on the A2D dataset [Xu et al., 2015], we obtain state-of-the-art results on segmentation of object-action pairs. 
In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. In the same way that modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids: it takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms the state of the art on the UCF-Sports [Rodriguez et al., 2008], J-HMDB [Jhuang et al., 2013a], and UCF-101 [Soomro et al., 2012] action localization datasets, especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested
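
For readers unfamiliar with the traditional autoregressive baseline mentioned above, the sketch below shows one common way to test whether a series `x` Granger-causes a series `y`: fit a linear model of `y` on its own lags, fit a second model that also includes lags of `x`, and compare the residuals with an F statistic. This illustrates the classical test only, not the classification-based methods the article investigates; the function name and the synthetic data in the usage note are assumptions made for this example.

```python
import numpy as np


def granger_f_statistic(x, y, lag: int = 1) -> float:
    """F statistic for the hypothesis that lags of x do not help predict y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y) - lag
    target = y[lag:]
    # Column k-1 holds the series delayed by k steps, aligned with target.
    y_lags = np.column_stack([y[lag - k: len(y) - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k: len(x) - k] for k in range(1, lag + 1)])
    intercept = np.ones((n, 1))
    restricted = np.hstack([intercept, y_lags])            # y's own past only
    unrestricted = np.hstack([intercept, y_lags, x_lags])  # plus x's past

    def residual_sum_of_squares(design: np.ndarray) -> float:
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r = residual_sum_of_squares(restricted)
    rss_u = residual_sum_of_squares(unrestricted)
    dof = n - unrestricted.shape[1]
    return ((rss_r - rss_u) / lag) / (rss_u / dof)
```

On synthetic data where `y` is driven by the previous value of `x`, the statistic for the direction `x` to `y` should come out far larger than for the reverse direction. In practice it would be compared against an F distribution with `lag` and `dof` degrees of freedom.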