5 research outputs found

    Recognising and localising human actions

    Get PDF
    Human action recognition in challenging video data is becoming an increasingly important research area. Given the growing number of cameras and robots pointing their lenses at humans, the need for automatic recognition of human actions arises, promising Google-style video search and automatic video summarisation/description. Furthermore, for any autonomous robotic system to interact with humans, it must rst be able to understand and quickly react to human actions. Although the best action classication methods aggregate features from the entire video clip in which the action unfolds, this global representation may include irrelevant scene context and movements which are shared amongst multiple action classes. For example, a waving action may be performed whilst walking, however if the walking movement appears in distinct action classes, then it should not be included in training a waving movement classier. For this reason, we propose an action classication framework in which more discriminative action subvolumes are learned in a weakly supervised setting, owing to the diculty of manually labelling massive video datasets. The learned models are used to simultaneously classify video clips and to localise actions to a given space-time subvolume. Each subvolume is cast as a bag-of-features (BoF) instance in a multiple-instance-learning framework, which in turn is used to learn its class membership. We demonstrate quantitatively that even with single xed-sized subvolumes, the classication performance of our proposed algorithm is superior to our BoF baseline on the majority of performance measures, and shows promise for space-time action localisation on the most challenging video datasets. Exploiting spatio-temporal structure in the video should also improve results, just as deformable part models have proven highly successful in object recognition. However, whereas objects have clear boundaries which means we can easily dene a ground truth for initialisation, 3D space-time actions are inherently ambiguous and expensive to annotate in large datasets. Thus, it is desirable to adapt pictorial star models to action datasets without location annotation, and to features invariant to changes in pose such as bag-of-feature and Fisher vectors, rather than low-level HoG. Thus, we propose local deformable spatial bag-of-features (LDSBoF) in which local discriminative regions are split into axed grid of parts that are allowed to deform in both space and time at test-time. In our experimental evaluation we demonstrate that by using local, deformable space-time action parts, we are able to achieve very competitive classification performance, whilst being able to localise actions even in the most challenging video datasets. A recent trend in action recognition is towards larger and more challenging datasets, an increasing number of action classes and larger visual vocabularies. For the global classication of human action video clips, the bag-of-visual-words pipeline is currently the best performing. However, the strategies chosen to sample features and construct a visual vocabulary are critical to performance, in fact often dominating performance. Thus, we provide a critical evaluation of various approaches to building a vocabulary and show that good practises do have a signicant impact. By subsampling and partitioning features strategically, we are able to achieve state-of-the-art results on 5 major action recognition datasets using relatively small visual vocabularies. Another promising approach to recognise human actions first encodes the action sequence via a generative dynamical model. However, using classical distances for their classication does not necessarily deliver good results. Therefore we propose a general framework for learning distance functions between dynamical models, given a training set of labelled videos. The optimal distance function is selected among a family of `pullback' ones, induced by a parametrised mapping of the space of models. We focus here on hidden Markov models and their model space, and show how pullback distance learning greatly improves action recognition performances with respect to base distances. Finally, the action classication systems that use a single global representation for each video clip are tailored for oine batch classication benchmarks. For human-robot interaction however, current systems fall short, either because they can only detect one human action per video frame, or because they assume the video is available ahead of time. In this work we propose an online human action detection system that can incrementally detect multiple concurrent space-time actions. In this way, it becomes possible to learn new action classes on-the-fly, allowing multiple people to actively teach and interact with a robot

    Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

    No full text
    Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms are the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a \emph{scalable} approach for controlling partially observable Markov decision processes (POMDPs). In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting. Directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember. ..

    Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

    No full text
    Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms are the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a \emph{scalable} approach for controlling partially observable Markov decision processes (POMDPs). In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting. Directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember. ..
    corecore