657 research outputs found
LSTM Pose Machines
We observed that recent state-of-the-art results on single image human pose
estimation were achieved by multi-stage Convolutional Neural Networks (CNNs).
Notwithstanding the superior performance on static images, the application of
these models to videos is not only computationally intensive but also suffers
from performance degradation and flickering. Such suboptimal results are mainly
attributed to the inability to impose sequential geometric consistency, to
handle severe image quality degradation (e.g. motion blur and occlusion), and
to capture the temporal correlation among video frames.
In this paper, we proposed a novel recurrent network to tackle these problems.
We showed that if we impose a weight-sharing scheme on the
multi-stage CNN, it can be rewritten as a Recurrent Neural Network (RNN).
This property decouples the relationship among multiple network stages and
results in significantly faster speed in invoking the network for videos. It
also enables the adoption of Long Short-Term Memory (LSTM) units between video
frames. We found that such a memory-augmented RNN is very effective in imposing
geometric consistency among frames. It also handles input quality
degradation in videos well while successfully stabilizing the sequential outputs. The
experiments showed that our approach significantly outperformed current
state-of-the-art methods on two large-scale video pose estimation benchmarks.
We also explored the memory cells inside the LSTM and provided insights on why
such a mechanism would benefit video-based pose estimation.
Comment: Poster in IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 201
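The claim that a weight-shared multi-stage network can be rewritten as a recurrent network can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' architecture: `make_stage` is a hypothetical one-parameter refinement function standing in for a CNN stage, and `run_video` uses a plain recurrent update in place of LSTM units.

```python
from typing import Callable, List

Vector = List[float]
Stage = Callable[[Vector, Vector], Vector]


def make_stage(weight: float, bias: float = 0.0) -> Stage:
    """Hypothetical toy 'stage': blends the previous belief with image features."""
    def stage(features: Vector, belief: Vector) -> Vector:
        return [weight * b + (1.0 - weight) * f + bias
                for f, b in zip(features, belief)]
    return stage


def multi_stage(features: Vector, stages: List[Stage]) -> Vector:
    """Multi-stage refinement: each stage may have its own parameters."""
    belief = [0.0] * len(features)
    for stage in stages:
        belief = stage(features, belief)
    return belief


def recurrent(features: Vector, shared_stage: Stage, steps: int) -> Vector:
    """Same computation when all stages share weights: a simple recurrence."""
    belief = [0.0] * len(features)
    for _ in range(steps):
        belief = shared_stage(features, belief)
    return belief


def run_video(frames: List[Vector], shared_stage: Stage) -> List[Vector]:
    """One recurrent step per frame, carrying the belief across frames.

    Assumption: a plain recurrent state stands in for the LSTM memory.
    """
    state = [0.0] * len(frames[0])
    outputs = []
    for features in frames:
        state = shared_stage(features, state)
        outputs.append(state)
    return outputs
```

With identical stages, `multi_stage(x, [s, s, s])` and `recurrent(x, s, 3)` perform the same operations in the same order, which is the sense in which weight sharing turns the stage stack into a recurrence.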
Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming
In this paper we consider the problem of human pose estimation in real-world
videos of swimmers. Swimming channels allow filming swimmers simultaneously
above and below the water surface with a single stationary camera. These
recordings can be used to quantitatively assess the athletes' performance. The
quantitative evaluation, so far, requires manual annotations of body parts in
each video frame. We therefore apply the concept of CNNs in order to
automatically infer the required pose information. Starting with an
off-the-shelf architecture, we develop extensions to leverage activity
information - in our case the swimming style of an athlete - and the continuous
nature of the video recordings. Our main contributions are threefold: (a) We
apply and evaluate a fine-tuned Convolutional Pose Machine architecture as a
baseline in our very challenging aquatic environment and discuss its error
modes, (b) we propose an extension to input swimming style information into the
fully convolutional architecture and (c) modify the architecture for continuous
pose estimation in videos. With these additions we achieve reliable pose
estimates with up to +16% more correct body joint detections compared to the
baseline architecture.
Comment: 10 pages, 9 figures, accepted at WACV 201
Model-based regression testing of variants and variant versions (Modellbasiertes Regressionstesten von Varianten und Variantenversionen)
The quality assurance of software product lines (SPL) achieved via testing is a crucial and challenging activity of SPL engineering. In general, the application of single-software testing techniques for SPL testing is not practical as it leads to the individual testing of a potentially vast number of variants. Testing each variant in isolation further results in redundant testing processes by means of redundant test-case executions due to the shared commonality. Existing techniques for SPL testing cope with those challenges, e.g., by identifying samples of variants to be tested. However, each variant is still tested separately without taking the explicit knowledge about the shared commonality and variability into account to reduce the overall testing effort. Furthermore, due to the increasing longevity of software systems, their development has to face software evolution. Hence, quality assurance has also to be ensured after SPL evolution by testing respective versions of variants. In this thesis, we tackle the challenges of testing redundancy as well as evolution by proposing a framework for model-based regression testing of evolving SPLs. The framework facilitates efficient incremental testing of variants and versions of variants by exploiting the commonality and reuse potential of test artifacts and test results. Our contribution is divided into three parts. First, we propose a test-modeling formalism capturing the variability and version information of evolving SPLs in an integrated fashion. The formalism builds the basis for automatic derivation of reusable test cases and for the application of change impact analysis to guide retest test selection. 
Second, we introduce two techniques for incremental change impact analysis to identify (1) changing execution dependencies to be retested between subsequently tested variants and versions of variants, and (2) the impact of an evolution step on the variant set in terms of modified, new and unchanged versions of variants. Third, we define a coverage-driven retest test selection based on a new retest coverage criterion that incorporates the results of the change impact analysis. The retest test selection facilitates the reduction of redundantly executed test cases during incremental testing of variants and versions of variants. The framework is prototypically implemented and evaluated by means of three evolving SPLs, showing that it achieves a reduction of the overall effort for testing evolving SPLs.
Testing is an important part of the development of software product lines (SPLs). Because of the potentially very large number of variants of an SPL, testing each variant individually is generally impractical and also results in redundant test-case executions arising from the commonalities between variants. Existing SPL testing approaches address these challenges, e.g., by reducing the number of variants to be tested. However, each variant is still tested independently, without exploiting knowledge about commonality and variability to reduce the testing effort. Furthermore, SPL development must deal with software evolution. This poses further challenges for SPL testing, since quality must be ensured not only for variants but also for their versions. In this thesis, we present a framework for model-based regression testing of evolving SPLs that addresses the challenges of redundant testing and software evolution.
The framework combines test modeling, change impact analysis, and automatic test-case selection to define an incremental testing process that efficiently tests variants and variant versions by exploiting knowledge about shared functionality and the reuse potential of test artifacts and test results. For test modeling, we develop an approach that incorporates both the variability and the version information of evolving SPLs into the model. For change impact analysis, we define two techniques: one to identify changes in execution dependencies between the variants under test and their versions, and one to determine and classify the effects of an evolution step on the set of variants. For test-case selection, we propose a coverage criterion that incorporates the results of the impact analysis in order to make automated decisions about retesting reusable test cases. The coverage-driven test-case selection thus enables the reduction of redundant test-case executions during incremental testing of variants and variant versions. The framework is prototypically implemented and evaluated using three evolving SPLs. The results show that a reduction in the effort for testing evolving SPLs is achieved.
Human activity recognition for pervasive interaction
PhD Thesis
This thesis addresses the challenge of computing food preparation context in the kitchen. The automatic
recognition of fine-grained human activities and food ingredients is realized through pervasive sensing
which we achieve by instrumenting kitchen objects such as knives, spoons, and chopping boards with
sensors. Context recognition in the kitchen lies at the heart of a broad range of real-world applications. In
particular, activity and food ingredient recognition in the kitchen is an essential component for situated
services such as automatic prompting services for cognitively impaired kitchen users and digital situated
support for healthier eating interventions. Previous works, however, have addressed the activity
recognition problem by exploring high-level human activities using wearable sensing (i.e. sensors worn
on the human body) or using technologies that raise privacy concerns (i.e. computer vision). Although such
approaches have yielded significant results for a number of activity recognition problems, they are not
applicable to our domain of investigation, for which we argue that the technology itself must be genuinely
“invisible”, thereby allowing users to perform their activities in a completely natural manner.
In this thesis we describe the development of pervasive sensing technologies and algorithms for fine-grained
human activity and food ingredient recognition in the kitchen. After reviewing previous work on
food and activity recognition we present three systems that constitute increasingly sophisticated
approaches to the challenge of kitchen context recognition. Two of these systems, Slice&Dice and Class-based
Threshold Dynamic Time Warping (CBT-DTW), recognize fine-grained food preparation
activities. Slice&Dice is a proof-of-concept application, whereas CBT-DTW is a real-time application
that also addresses the problem of recognising unknown activities. The final system, KitchenSense, is a
real-time context recognition framework that deals with the recognition of a more complex set of
activities, and includes the recognition of food ingredients and events in the kitchen. For each system, we
describe the prototyping of pervasive sensing technologies, algorithms, as well as real-world experiments
and empirical evaluations that validate the proposed solutions.
Vietnamese government’s 322 project, executed by the Vietnamese Ministry of
Education and Training
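The abstract does not describe how CBT-DTW works internally. The sketch below shows one plausible reading of "class-based threshold" Dynamic Time Warping, offered only as an assumption: a signal is matched to its nearest template by DTW distance and is rejected as unknown if that distance exceeds a threshold specific to the matched class. The function names, the one-dimensional signals, and the threshold values are hypothetical.

```python
from typing import Dict, List

Signal = List[float]


def dtw_distance(a: Signal, b: Signal) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def classify(signal: Signal,
             templates: Dict[str, List[Signal]],
             thresholds: Dict[str, float]) -> str:
    """Nearest-template label, or 'unknown' if beyond that class's threshold.

    Assumption: per-class thresholds are given; how they are chosen is not
    specified in the abstract.
    """
    best_label, best_dist = None, float("inf")
    for label, examples in templates.items():
        for example in examples:
            dist = dtw_distance(signal, example)
            if dist < best_dist:
                best_label, best_dist = label, dist
    if best_label is None or best_dist > thresholds[best_label]:
        return "unknown"
    return best_label
```

For example, with a hypothetical template `[0, 1, 0, 1, 0]` for a "chop" class, a time-stretched copy of that pattern aligns at zero cost, while a signal unlike any template exceeds every class threshold and is labelled unknown.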
Semantic Composition via Probabilistic Model Theory
Semantic composition remains an open problem for vector space models of semantics. In this paper, we explain how the probabilistic graphical model used in the framework of Functional Distributional Semantics can be interpreted as a probabilistic version of model theory. Building on this, we explain how various semantic phenomena can be recast in terms of conditional probabilities in the graphical model. This connection between formal semantics and machine learning is helpful in both directions: it gives us an explicit mechanism for modelling context-dependent meanings (a challenge for formal semantics), and also gives us well-motivated techniques for composing distributed representations (a challenge for distributional semantics). We present results on two datasets that go beyond word similarity, showing how these semantically-motivated techniques improve on the performance of vector models.
Schiff Foundatio
Second-order Temporal Pooling for Action Recognition
Deep learning models for video-based action recognition usually generate
features for short clips (consisting of a few frames); such clip-level features
are aggregated to video-level representations by computing statistics on these
features. Typically, zeroth-order (max) or first-order (average) statistics are
used. In this paper, we explore the benefits of using second-order statistics.
Specifically, we propose a novel end-to-end learnable feature aggregation
scheme, dubbed temporal correlation pooling, that generates an action descriptor
for a video sequence by capturing the similarities between the temporal
evolution of clip-level CNN features computed across the video. Such a
descriptor, while being computationally cheap, also naturally encodes the
co-activations of multiple CNN features, thereby providing a richer
characterization of actions than their first-order counterparts. We also
propose higher-order extensions of this scheme by computing correlations after
embedding the CNN features in a reproducing kernel Hilbert space. We provide
experiments on benchmark datasets such as HMDB-51 and UCF-101, fine-grained
datasets such as MPII Cooking activities and JHMDB, as well as the recent
Kinetics-600. Our results demonstrate the advantages of higher-order pooling
schemes that, when combined with hand-crafted features (as is standard practice),
achieve state-of-the-art accuracy.
Comment: Accepted in the International Journal of Computer Vision (IJCV
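The contrast between first-order and second-order aggregation of clip-level features can be made concrete with a small sketch. It assumes Pearson correlation as the similarity between feature trajectories and uses hypothetical function names. The paper's exact formulation, its normalization, and the end-to-end learning are not reproduced here.

```python
import math
from typing import List

Clips = List[List[float]]  # T clips, each a d-dimensional feature vector


def average_pool(clips: Clips) -> List[float]:
    """First-order statistic: per-feature mean over time."""
    return [sum(col) / len(clips) for col in zip(*clips)]


def max_pool(clips: Clips) -> List[float]:
    """Per-feature maximum over time."""
    return [max(col) for col in zip(*clips)]


def correlation_pool(clips: Clips) -> List[List[float]]:
    """Second-order statistic: d x d similarities between feature trajectories.

    Assumption: Pearson correlation is used as the similarity measure.
    Constant trajectories (zero variance) are given correlation 0.
    """
    centered = []
    for trajectory in zip(*clips):  # one length-T trajectory per feature
        mean = sum(trajectory) / len(trajectory)
        centered.append([x - mean for x in trajectory])
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    d = len(centered)
    descriptor = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue
            dot = sum(a * b for a, b in zip(centered[i], centered[j]))
            descriptor[i][j] = dot / (norms[i] * norms[j])
    return descriptor
```

The average and max descriptors have d entries and discard how features co-vary over time, whereas the d x d correlation descriptor records which features rise and fall together across the video.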