49 research outputs found
Spatio-temporal human action detection and instance segmentation in videos
With an exponential growth in the number of video capturing devices and digital video content, automatic video understanding is now at the forefront of computer vision research. This thesis presents a series of models for automatic human action detection in videos and also addresses the space-time action instance segmentation problem. Both action detection and instance segmentation play vital roles in video understanding.
Firstly, we propose a novel human action detection approach based on a frame-level deep feature representation combined with a two-pass dynamic programming scheme. The method obtains a frame-level action representation by leveraging recent advances in deep learning based action recognition and object detection. To combine the complementary appearance and motion cues, we introduce a new fusion technique which significantly improves detection performance. Further, we cast temporal action detection as two energy optimisation problems which are solved using the Viterbi algorithm.
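The dynamic-programming idea can be illustrated with a toy Viterbi recursion: given per-frame class scores, pick one label per frame while penalising label switches, which yields temporally consistent action segments. This is only an illustrative sketch (the score matrix, penalty value and two-label setup are made up), not the thesis's exact energy formulation.

```python
import numpy as np

def viterbi_temporal_labels(scores, switch_penalty=2.0):
    """Assign one label per frame by maximising the summed frame scores
    minus a fixed penalty for every label change (Viterbi recursion)."""
    T, L = scores.shape
    dp = np.zeros((T, L))            # best cumulative score ending in label l at frame t
    back = np.zeros((T, L), dtype=int)
    dp[0] = scores[0]
    for t in range(1, T):
        # staying in the same label is free; switching costs `switch_penalty`
        trans = dp[t - 1][:, None] - switch_penalty * (1 - np.eye(L))
        back[t] = trans.argmax(axis=0)
        dp[t] = trans.max(axis=0) + scores[t]
    labels = np.zeros(T, dtype=int)  # backtrack the best path
    labels[-1] = dp[-1].argmax()
    for t in range(T - 2, -1, -1):
        labels[t] = back[t + 1, labels[t + 1]]
    return labels

# six frames whose scores favour label 0 first, then label 1
frame_scores = np.array([[5., 0.], [5., 0.], [5., 0.],
                         [0., 5.], [0., 5.], [0., 5.]])
segmentation = viterbi_temporal_labels(frame_scores)  # → [0 0 0 1 1 1]
```

The switch penalty plays the role of the pairwise term in the energy: larger values smooth out isolated noisy frames at the cost of delayed boundaries.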
Exploiting a video-level representation further allows the network to learn the inter-frame temporal correspondence between action regions, and is likely to yield a better solution to the action detection problem than a frame-level representation. Secondly, we propose a novel deep network architecture which learns a video-level action representation by classifying and regressing 3D region proposals spanning two successive video frames. The proposed model is end-to-end trainable and can be jointly optimised for both proposal generation and action detection objectives in a single training step. We name our new network "AMTnet" (Action Micro-Tube regression Network). We further extend the AMTnet model by incorporating optical flow features to encode the motion patterns of actions.
Finally, we address the problem of action instance segmentation, in which multiple concurrent actions of the same class may be segmented out of an image sequence. By taking advantage of recent work on action foreground-background segmentation, we are able to associate each action tube with class-specific segmentations.
We demonstrate the performance of our proposed models on challenging action detection benchmarks, achieving new state-of-the-art results across the board and significantly increasing detection speed at test time.
Natural Language Processing (Almost) from Scratch
We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.
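The core of such a unified tagger can be sketched as a window-based scorer: each word is mapped to a learned embedding, the embeddings of a small context window are concatenated, and a shared layer produces one score per tag. The sketch below is a bare-bones illustration with made-up vocabulary, random (untrained) weights and a linear scorer standing in for the full network; it only shows the data flow, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and tag set; real embeddings would be learned from
# large unlabelled corpora, here they are random placeholders.
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}
tags = ["DET", "NOUN", "VERB"]
emb_dim, win = 4, 3                                  # window of 3 words centred on the target

E = rng.normal(size=(len(vocab), emb_dim))           # word embedding table
W = rng.normal(size=(len(tags), win * emb_dim))      # shared scoring layer
b = np.zeros(len(tags))

def tag_scores(token_ids, position):
    """Score each tag for the word at `position` from its context window."""
    pad = vocab["<pad>"]
    ctx = [token_ids[position + o] if 0 <= position + o < len(token_ids) else pad
           for o in (-1, 0, 1)]                      # pad outside the sentence
    x = np.concatenate([E[i] for i in ctx])          # concatenated window embedding
    return W @ x + b                                 # one score per tag

sent = [vocab[w] for w in ["the", "cat", "sat"]]
scores = tag_scores(sent, 1)                         # tag scores for "cat"
```

Because the embedding table and scoring layer are shared across tasks, the same pipeline can emit scores for POS tags, chunk labels or entity labels by swapping the output layer.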
Modularity and Neural Integration in Large-Vocabulary Continuous Speech Recognition
This thesis tackles the problem of modularity in Large-Vocabulary Continuous Speech Recognition with the use of Neural Networks.
Modelling of Electrical Appliance Signatures for Energy Disaggregation
The rapid development of technology in the electrical sector over the last 20 years has led to growing electric power needs through the increased number of electrical appliances and the automation of tasks. At the same time, a reduction of overall energy consumption as well as efficient energy management are needed in order to reduce global warming and meet the global climate protection goals. These requirements have led to the recent adoption of smart meters and smart grids, as well as to the rise of Non-Intrusive Load Monitoring.
Non-Intrusive Load Monitoring aims to extract the energy consumption of individual electrical appliances through disaggregation of the total power consumption as measured by a single smart meter at the inlet of a household. As such, Non-Intrusive Load Monitoring is a highly under-determined problem which aims to estimate multiple variables from a single observation, and thus cannot be solved analytically. In order to find accurate estimates of the unknown variables, three fundamentally different approaches, namely deep learning, pattern matching and single-channel source separation, have been investigated in the literature to solve the Non-Intrusive Load Monitoring problem.
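Why the problem is under-determined can be seen in a minimal combinatorial sketch: given known appliance power draws, one searches for the on/off combination that best explains a single aggregate reading, and several combinations can fit almost equally well. The appliance loads below are invented round numbers for illustration only; real NILM systems work on noisy time series, not a single sample.

```python
from itertools import product

def disaggregate(total_power, appliance_loads):
    """Brute-force the on/off combination of known appliance loads that
    best explains one aggregate power reading (smallest residual)."""
    best_states, best_err = None, float("inf")
    for states in product([0, 1], repeat=len(appliance_loads)):
        estimate = sum(s * p for s, p in zip(states, appliance_loads))
        err = abs(total_power - estimate)
        if err < best_err:
            best_states, best_err = states, err
    return best_states, best_err

# e.g. fridge 150 W, kettle 2000 W, lamp 60 W; aggregate reading 2150 W
states, err = disaggregate(2150, [150, 2000, 60])  # → (1, 1, 0), residual 0
```

The exponential search space and the ambiguity between similar load levels are what the deep-learning, pattern-matching and source-separation approaches each try to tame in their own way.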
While Non-Intrusive Load Monitoring has multiple areas of application, including energy reduction through consumer awareness, load scheduling for energy cost optimization, or reduction of peak demands, the focus of this thesis is on the performance of the disaggregation algorithm, the key part of the Non-Intrusive Load Monitoring architecture. In detail, optimizations are proposed for all three approaches, with the focus on deep-learning based methods. Furthermore, the transferability of the deep-learning based approach is investigated and a NILM-specific transfer architecture is proposed. The main contribution of the thesis is threefold.
First, with Non-Intrusive Load Monitoring being a time-series problem, the incorporation of temporal information is crucial for accurate modelling of the appliance signatures and of the change of signatures over time. Previously published architectures based on deep learning have therefore focused on regression models which intrinsically incorporate temporal information. In this work, the idea of incorporating temporal information is extended by modelling temporal patterns of appliances not only in the regression stage, but also in the input feature vector, i.e. by using fractional calculus, feature concatenation or high-frequency double Fourier integral signatures. Additionally, multi-variance matching is utilized for Non-Intrusive Load Monitoring in order to gain additional degrees of freedom for a pattern matching based solution.
Second, with Non-Intrusive Load Monitoring systems expected to operate in real time as well as to be low-cost applications, computational complexity as well as storage limitations must be considered. Therefore, this thesis presents an approximation for frequency-domain features in order to reduce computational complexity. Furthermore, reduced sampling frequencies and their impact on disaggregation performance have been evaluated. Additionally, different elastic matching techniques have been compared in order to reduce training times and to utilize models without trainable parameters.
Third, in order to fully utilize Non-Intrusive Load Monitoring techniques, accurate transfer models, i.e. models which are trained on one data domain and tested on a different data domain, are needed. In this context it is crucial to map time-variant and manufacturer-dependent appliance signatures to manufacturer-invariant signatures in order to ensure accurate transfer modelling. Therefore, a transfer learning architecture specifically adapted to the needs of Non-Intrusive Load Monitoring is presented.
Overall, this thesis contributes to the topic of Non-Intrusive Load Monitoring by improving the performance of the disaggregation stage while comparing three fundamentally different approaches to the disaggregation problem.
Human Action Recognition from Active Acoustics: Physics Modelling for Representation Learning and Inference Using Generative Probabilistic Graphical Models
This dissertation explores computational methods to address the problem of physics-based modeling and, ultimately, performing inference from data in multiple modalities where large amounts of low-dimensional data are complementary to a much smaller set of high-dimensional data. In this instance the low-dimensional time-series data are active acoustics from a micro-Doppler sensor that include no or very limited spatial information, and the high-dimensional data are RGB-Depth skeleton data from a Microsoft Kinect sensor. The task is that of human action recognition from the active acoustic data. To accomplish this, statistical models, trained simultaneously on both the micro-Doppler modulations induced by human actions and symbolic representations of skeletal poses, are developed. This enables the model to learn correlations between the rich temporal structure of the micro-Doppler modulations and the high-dimensional motion sequences of human action. At runtime, the model then relies purely on the active acoustic data to infer the human action. In order to adapt this methodology to situations not observed in the training data, a physical model of the human body is combined with a physics-based simulation of the Doppler phenomenon to predict the acoustic data for a sequence of skeletal poses and a configurable sensor geometry. The physics model is then combined with a generative statistical model for human actions to create a generative physics-based model of micro-Doppler modulations for human action.
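The physical quantity at the heart of such a simulation is the two-way Doppler shift: for an active sensor, sound reflecting off a body part moving with radial velocity v is shifted by roughly 2vf/c. The sketch below only evaluates this textbook small-velocity approximation; the 40 kHz carrier and 2 m/s hand speed are invented example values, not parameters from the dissertation.

```python
def micro_doppler_shift(radial_velocity, carrier_freq, c=343.0):
    """Two-way Doppler shift (Hz) for an active acoustic sensor:
    sound reflects off a scatterer moving at `radial_velocity` (m/s)
    toward the sensor; small-velocity approximation df ≈ 2 v f / c,
    with c the speed of sound in air (m/s)."""
    return 2.0 * radial_velocity * carrier_freq / c

# e.g. a hand swinging at 2 m/s toward a 40 kHz ultrasonic sensor
shift = micro_doppler_shift(2.0, 40_000.0)  # ≈ 466 Hz
```

Summing such shifts over all moving body parts, each weighted by its reflectivity, is what produces the characteristic micro-Doppler modulation pattern of an action.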
Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations
The study of low-dimensional, noisy manifolds embedded in a higher dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of the manifolds has helped in describing their essential properties and how they vary in space. However, when the manifold is evolving through time, a joint spatio-temporal modelling is needed in order to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a Supernova.
Scalable Tools for Information Extraction and Causal Modeling of Neural Data
Systems neuroscience has entered, in the past 20 years, an era that one might call "large-scale systems neuroscience". From tuning curves and single-neuron recordings, there has been a conceptual shift towards a more holistic understanding of how neural circuits work and, as a result, how their representations produce neural tunings.
With the introduction of a plethora of datasets across various scales, modalities, animals, and systems, we as a community have witnessed invaluable insights gained from the collective view of a neural circuit that were not possible with small-scale experimentation. The concurrence of advances in neural recordings, such as wide-field imaging technologies and Neuropixels, with developments in statistical machine learning and specifically deep learning has brought systems neuroscience one step closer to data science. With this abundance of data, the need for developing computational models has become crucial. We need to make sense of the data, and thus we need to build models that are constrained by an acceptable amount of biological detail and probe those models in search of neural mechanisms.
This thesis consists of sections covering a wide range of ideas from computer vision, statistics, machine learning, and dynamical systems. But all of these ideas share a common purpose, which is to help automate the neuroscientific experimentation process at different levels. In chapters 1, 2, and 3, I develop tools that automate the process of extracting useful information from raw neuroscience data in the model organism C. elegans. The goal of this is to avoid manual labor and pave the way for high-throughput data collection aiming at better quantification of variability across the population of worms. Due to its high level of structural and functional stereotypy, and its relative simplicity, the nematode C. elegans has been an attractive model organism for systems and developmental research. With 383 neurons in males and 302 neurons in hermaphrodites, the positions and functions of neurons are remarkably conserved across individuals. Furthermore, C. elegans remains the only organism for which a complete cellular, lineage, and anatomical map of the entire nervous system has been described for both sexes. Here, I describe the analysis pipeline that we developed for the recently proposed NeuroPAL technique in C. elegans. Our proposed pipeline consists of atlas building (chapter 1), registration, segmentation, neural tracking (chapter 2), and signal extraction (chapter 3). I emphasize that categorizing the analysis techniques as a pipeline consisting of the above steps is general and can be applied to virtually every animal model and emerging imaging modality. I use the language of probabilistic generative modeling and graphical models to communicate the ideas in a rigorous form, so some familiarity with those concepts could help the reader navigate through the chapters of this thesis more easily.
In chapters 4 and 5 I build models that aim to automate hypothesis testing and causal interrogation of neural circuits. The notion of functional connectivity (FC) has been instrumental in our understanding of how information propagates in a neural circuit. However, an important limitation is that current techniques do not dissociate between causal connections and purely functional connections with no mechanistic correspondence. I start chapter 4 by introducing causal inference as a unifying language for the following chapters. In chapter 4 I define the notion of interventional connectivity (IC) as a way to summarize the effect of stimulation in a neural circuit providing a more mechanistic description of the information flow. I then investigate which functional connectivity metrics are best predictive of IC in simulations and real data. Following this framework, I discuss how stimulations and interventions can be used to improve fitting and generalization properties of time series models. Building on the literature of model identification and active causal discovery I develop a switching time series model and a method for finding stimulation patterns that help the model to generalize to the vicinity of the observed neural trajectories. Finally in chapter 5 I develop a new FC metric that separates the transferred information from one variable to the other into unique and synergistic sources.
In all projects, I have abstracted out concepts that are specific to the datasets at hand and developed the methods in the most general form. This makes the presented methods applicable to a broad range of datasets, potentially leading to new findings. In addition, all projects are accompanied with extensible and documented code packages, allowing theorists to repurpose the modules for novel applications and experimentalists to run analysis on their datasets efficiently and scalably.
In summary, my main contributions in this thesis are the following:
1) Building the first atlases of hermaphrodite and male C. elegans and developing a generic statistical framework for constructing atlases for a broad range of datasets.
2) Developing a semi-automated analysis pipeline for neural registration, segmentation, and tracking in C. elegans.
3) Extending the framework of non-negative matrix factorization to datasets with deformable motion and developing algorithms for joint tracking and signal demixing from videos of semi-immobilized C. elegans.
4) Defining the notion of interventional connectivity (IC) as a way to summarize the effect of stimulation in a neural circuit and investigating which functional connectivity metrics are best predictive of IC in simulations and real data.
5) Developing a switching time series model and a method for finding stimulation patterns that help the model to generalize to the vicinity of the observed neural trajectories.
6) Developing a new functional connectivity metric that separates the transferred information from one variable to the other into unique and synergistic sources.
7) Implementing extensible, well-documented, open-source code packages for each of the above contributions.
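Contribution 3 builds on non-negative matrix factorization, whose classic form can be sketched with the standard multiplicative updates of Lee and Seung: factorise a non-negative data matrix V into non-negative factors W and H. This is the textbook baseline only, not the thesis's deformable-motion extension; the matrix sizes and iteration count are arbitrary.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Classic multiplicative-update NMF: find non-negative W, H with
    V ≈ W @ H by alternately rescaling each factor (Lee-Seung updates,
    which monotonically decrease the Frobenius reconstruction error)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ (H @ H.T) + eps) # update components
    return W, H
```

In demixing applications, the columns of W play the role of spatial footprints (here, of individual neurons) and the rows of H their time-varying activity; the thesis's extension additionally lets the footprints deform over time.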
Localizing spatially and temporally objects and actions in videos
The rise of deep learning has facilitated remarkable progress in video understanding.
This thesis addresses three important tasks of video understanding: video object detection,
joint object and action detection, and spatio-temporal action localization.
Object class detection is one of the most important challenges in computer vision.
Object detectors are usually trained on bounding-boxes from still images. Recently,
video has been used as an alternative source of data. Yet, training an object detector
on one domain (either still images or videos) and testing on the other one results in a
significant performance gap compared to training and testing on the same domain. In
the first part of this thesis, we examine the reasons behind this performance gap. We
define and evaluate several domain shift factors: spatial location accuracy, appearance
diversity, image quality, aspect distribution, and object size and camera framing. We
examine the impact of these factors by comparing the detection performance before
and after cancelling them out. The results show that all five factors affect the performance
of the detectors and their combined effect explains the performance gap.
While most existing approaches for detection in videos focus on objects or human
actions separately, in the second part of this thesis we aim at detecting non-human
centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We
introduce an end-to-end multitask objective that jointly learns object-action relationships.
We compare it with different training objectives, validate its effectiveness for
detecting object-action pairs in videos, and show that both tasks of object and action
detection benefit from this joint learning. In experiments on the A2D dataset [Xu et al.,
2015], we obtain state-of-the-art results on segmentation of object-action pairs.
In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. In the same way that modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids: it takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms the state of the art on the UCF-Sports [Rodriguez et al., 2008], J-HMDB [Jhuang et al., 2013a], and UCF-101 [Soomro et al., 2012] action localization datasets, especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization.
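The anchor-cuboid idea can be sketched in a few lines: take the usual 2-D anchor boxes and extend each one through time, so a cuboid is the same box position repeated over K frames, serving as the initial hypothesis that the network then scores and regresses per frame. The box coordinates and frame count below are arbitrary illustration values, not the detector's actual anchor configuration.

```python
def anchor_cuboids(anchor_boxes, num_frames):
    """Extend 2-D anchor boxes through time: each anchor cuboid is one
    box position repeated over `num_frames` consecutive frames."""
    return [[box] * num_frames for box in anchor_boxes]

boxes = [(0, 0, 32, 32), (16, 16, 64, 64)]   # (x1, y1, x2, y2) anchors
cuboids = anchor_cuboids(boxes, num_frames=6)  # one 6-frame cuboid per anchor
```

Per-frame regression offsets then bend each rigid cuboid into a tubelet that follows the actor's motion across the sequence.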
Analyzing Granger causality in climate data with time series classification methods
Attribution studies in climate science aim to scientifically ascertain the influence of natural or anthropogenic factors on climatic variations. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
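The autoregressive Granger test underlying such studies can be sketched at lag 1: fit y on its own past (restricted model) and on its own past plus the past of x (full model); if the full model's residual variance is clearly lower, x is said to Granger-cause y. The synthetic series below, where x drives y with a one-step delay, is a made-up illustration, not climate data, and a real test would compare the residual sums via an F statistic.

```python
import numpy as np

def granger_residuals(x, y):
    """Lag-1 Granger check: least-squares fit of y[t] ~ y[t-1]
    (restricted) and y[t] ~ y[t-1] + x[t-1] (full); returns both
    residual sums of squares."""
    Y = y[1:]
    restricted = np.column_stack([np.ones(len(Y)), y[:-1]])
    full = np.column_stack([np.ones(len(Y)), y[:-1], x[:-1]])
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return float(r @ r)
    return rss(restricted), rss(full)

# synthetic example: x drives y with one step of delay
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()
rss_restricted, rss_full = granger_residuals(x, y)  # full fits far better
```

The time-series-classification methods studied in the article replace the linear autoregressive fits with learned classifiers, but the inference logic (does the candidate cause improve prediction?) stays the same.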