657 research outputs found

    LSTM Pose Machines

    Full text link
    We observed that recent state-of-the-art results on single image human pose estimation were achieved by multi-stage Convolution Neural Networks (CNN). Notwithstanding the superior performance on static images, the application of these models on videos is not only computationally intensive, it also suffers from performance degeneration and flicking. Such suboptimal results are mainly attributed to the inability of imposing sequential geometric consistency, handling severe image quality degradation (e.g. motion blur and occlusion) as well as the inability of capturing the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we were to impose the weight sharing scheme to the multi-stage CNN, it could be re-written as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster speed in invoking the network for videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found such memory augmented RNN is very effective in imposing geometric consistency among frames. It also well handles input quality degradation in videos while successfully stabilizes the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights on why such mechanism would benefit the prediction for video-based pose estimations.Comment: Poster in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201

    Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming

    Get PDF
    In this paper we consider the problem of human pose estimation in real-world videos of swimmers. Swimming channels allow filming swimmers simultaneously above and below the water surface with a single stationary camera. These recordings can be used to quantitatively assess the athletes' performance. The quantitative evaluation, so far, requires manual annotations of body parts in each video frame. We therefore apply the concept of CNNs in order to automatically infer the required pose information. Starting with an off-the-shelf architecture, we develop extensions to leverage activity information - in our case the swimming style of an athlete - and the continuous nature of the video recordings. Our main contributions are threefold: (a) We apply and evaluate a fine-tuned Convolutional Pose Machine architecture as a baseline in our very challenging aquatic environment and discuss its error modes, (b) we propose an extension to input swimming style information into the fully convolutional architecture and (c) modify the architecture for continuous pose estimation in videos. With these additions we achieve reliable pose estimates with up to +16% more correct body joint detections compared to the baseline architecture.Comment: 10 pages, 9 figures, accepted at WACV 201

    Modellbasiertes Regressionstesten von Varianten und Variantenversionen

    Get PDF
    The quality assurance of software product lines (SPL) achieved via testing is a crucial and challenging activity of SPL engineering. In general, the application of single-software testing techniques for SPL testing is not practical as it leads to the individual testing of a potentially vast number of variants. Testing each variant in isolation further results in redundant testing processes by means of redundant test-case executions due to the shared commonality. Existing techniques for SPL testing cope with those challenges, e.g., by identifying samples of variants to be tested. However, each variant is still tested separately without taking the explicit knowledge about the shared commonality and variability into account to reduce the overall testing effort. Furthermore, due to the increasing longevity of software systems, their development has to face software evolution. Hence, quality assurance has also to be ensured after SPL evolution by testing respective versions of variants. In this thesis, we tackle the challenges of testing redundancy as well as evolution by proposing a framework for model-based regression testing of evolving SPLs. The framework facilitates efficient incremental testing of variants and versions of variants by exploiting the commonality and reuse potential of test artifacts and test results. Our contribution is divided into three parts. First, we propose a test-modeling formalism capturing the variability and version information of evolving SPLs in an integrated fashion. The formalism builds the basis for automatic derivation of reusable test cases and for the application of change impact analysis to guide retest test selection. Second, we introduce two techniques for incremental change impact analysis to identify (1) changing execution dependencies to be retested between subsequently tested variants and versions of variants, and (2) the impact of an evolution step to the variant set in terms of modified, new and unchanged versions of variants. Third, we define a coverage-driven retest test selection based on a new retest coverage criterion that incorporates the results of the change impact analysis. The retest test selection facilitates the reduction of redundantly executed test cases during incremental testing of variants and versions of variants. The framework is prototypically implemented and evaluated by means of three evolving SPLs showing that it achieves a reduction of the overall effort for testing evolving SPLs.Testen ist ein wichtiger Bestandteil der Entwicklung von Softwareproduktlinien (SPL). Aufgrund der potentiell sehr großen Anzahl an Varianten einer SPL ist deren individueller Test im Allgemeinen nicht praktikabel und resultiert zudem in redundanten Testfallausführungen, die durch die Gemeinsamkeiten zwischen Varianten entstehen. Existierende SPL-Testansätze adressieren diese Herausforderungen z.B. durch die Reduktion der Anzahl an zu testenden Varianten. Jedoch wird weiterhin jede Variante unabhängig getestet, ohne dabei das Wissen über Gemeinsamkeiten und Variabilität auszunutzen, um den Testaufwand zu reduzieren. Des Weiteren muss sich die SPL-Entwicklung mit der Evolution von Software auseinandersetzen. Dies birgt weitere Herausforderungen für das SPL-Testen, da nicht nur für Varianten sondern auch für ihre Versionen die Qualität sichergestellt werden muss. In dieser Arbeit stellen wir ein Framework für das modellbasierte Regressionstesten von evolvierenden SPL vor, das die Herausforderungen des redundanten Testens und der Software-Evolution adressiert. Das Framework vereint Testmodellierung, Änderungsauswirkungsanalyse und automatische Testfallselektion, um einen inkrementellen Testprozess zu definieren, der Varianten und Variantenversionen unter Ausnutzung des Wissens über gemeinsame Funktionalität und dem Wiederverwendungspotential von Testartefakten und -resultaten effizient testet. Für die Testmodellierung entwickeln wir einen Ansatz, der Variabilitäts- sowie Versionsinformation von evolvierenden SPL gleichermaßen für die Modellierung einbezieht. Für die Änderungsauswirkungsanalyse definieren wir zwei Techniken, um zum einen Änderungen in Ausführungsabhängigkeiten zwischen zu testenden Varianten und ihren Versionen zu identifizieren und zum anderen die Auswirkungen eines Evolutionsschrittes auf die Variantenmenge zu bestimmen und zu klassifizieren. Für die Testfallselektion schlagen wir ein Abdeckungskriterium vor, das die Resultate der Auswirkungsanalyse einbezieht, um automatisierte Entscheidungen über einen Wiederholungstest von wiederverwendbaren Testfällen durchzuführen. Die abdeckungsgetriebene Testfallselektion ermöglicht somit die Reduktion der redundanten Testfallausführungen während des inkrementellen Testens von Varianten und Variantenversionen. Das Framework ist prototypisch implementiert und anhand von drei evolvierenden SPL evaluiert. Die Resultate zeigen, dass eine Aufwandsreduktion für das Testen evolvierender SPL erreicht wird

    Human activity recognition for pervasive interaction

    Get PDF
    PhD ThesisThis thesis addresses the challenge of computing food preparation context in the kitchen. The automatic recognition of fine-grained human activities and food ingredients is realized through pervasive sensing which we achieve by instrumenting kitchen objects such as knives, spoons, and chopping boards with sensors. Context recognition in the kitchen lies at the heart of a broad range of real-world applications. In particular, activity and food ingredient recognition in the kitchen is an essential component for situated services such as automatic prompting services for cognitively impaired kitchen users and digital situated support for healthier eating interventions. Previous works, however, have addressed the activity recognition problem by exploring high-level-human activities using wearable sensing (i.e. worn sensors on human body) or using technologies that raise privacy concerns (i.e. computer vision). Although such approaches have yielded significant results for a number of activity recognition problems, they are not applicable to our domain of investigation, for which we argue that the technology itself must be genuinely “invisible”, thereby allowing users to perform their activities in a completely natural manner. In this thesis we describe the development of pervasive sensing technologies and algorithms for finegrained human activity and food ingredient recognition in the kitchen. After reviewing previous work on food and activity recognition we present three systems that constitute increasingly sophisticated approaches to the challenge of kitchen context recognition. Two of these systems, Slice&Dice and Classbased Threshold Dynamic Time Warping (CBT-DTW), recognize fine-grained food preparation activities. Slice&Dice is a proof-of-concept application, whereas CBT-DTW is a real-time application that also addresses the problem of recognising unknown activities. The final system, KitchenSense is a real-time context recognition framework that deals with the recognition of a more complex set of activities, and includes the recognition of food ingredients and events in the kitchen. For each system, we describe the prototyping of pervasive sensing technologies, algorithms, as well as real-world experiments and empirical evaluations that validate the proposed solutions.Vietnamese government’s 322 project, executed by the Vietnamese Ministry of Education and Training

    Semantic Composition via Probabilistic Model Theory

    Get PDF
    Semantic composition remains an open problem for vector space models of semantics. In this paper, we explain how the probabilistic graphical model used in the framework of Functional Distributional Semantics can be interpreted as a probabilistic version of model theory. Building on this, we explain how various semantic phenomena can be recast in terms of conditional probabilities in the graphical model. This connection between formal semantics and machine learning is helpful in both directions: it gives us an explicit mechanism for modelling context-dependent meanings (a challenge for formal semantics), and also gives us well-motivated techniques for composing distributed representations (a challenge for distributional semantics). We present results on two datasets that go beyond word similarity, showing how these semantically-motivated techniques improve on the performance of vector models.Schiff Foundatio

    Second-order Temporal Pooling for Action Recognition

    Full text link
    Deep learning models for video-based action recognition usually generate features for short clips (consisting of a few frames); such clip-level features are aggregated to video-level representations by computing statistics on these features. Typically zero-th (max) or the first-order (average) statistics are used. In this paper, we explore the benefits of using second-order statistics. Specifically, we propose a novel end-to-end learnable feature aggregation scheme, dubbed temporal correlation pooling that generates an action descriptor for a video sequence by capturing the similarities between the temporal evolution of clip-level CNN features computed across the video. Such a descriptor, while being computationally cheap, also naturally encodes the co-activations of multiple CNN features, thereby providing a richer characterization of actions than their first-order counterparts. We also propose higher-order extensions of this scheme by computing correlations after embedding the CNN features in a reproducing kernel Hilbert space. We provide experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained datasets such as MPII Cooking activities and JHMDB, as well as the recent Kinetics-600. Our results demonstrate the advantages of higher-order pooling schemes that when combined with hand-crafted features (as is standard practice) achieves state-of-the-art accuracy.Comment: Accepted in the International Journal of Computer Vision (IJCV
    corecore