Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets
We present an example-based approach to pose recovery, using histograms of oriented gradients as image descriptors. Tests on the HumanEva-I and HumanEva-II data sets provide insight into the strengths and limitations of an example-based approach. We report mean relative 3D errors of approximately 65 mm per joint on HumanEva-I, and 175 mm on HumanEva-II. We discuss our results using single and multiple views. We also perform experiments to assess the algorithm's generalization to unseen subjects, actions and viewpoints. We plan to incorporate the temporal aspect of human motion analysis to reduce orientation ambiguities and increase pose recovery accuracy.
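To make the example-based pipeline concrete, here is a minimal sketch: HOG descriptors index a database of exemplar poses, and the pose of the nearest exemplar is returned for a query image. The database, image size, joint count and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np
from skimage.feature import hog
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic exemplar database: 500 grayscale crops with known 3D joint poses.
images = rng.random((500, 128, 64))        # (N, H, W), hypothetical crops
poses = rng.random((500, 15, 3))           # (N, joints, xyz), 15 joints assumed

def describe(image):
    """HOG descriptor of a single grayscale image."""
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Index every exemplar by its HOG descriptor.
index = NearestNeighbors(n_neighbors=1).fit(
    np.stack([describe(im) for im in images]))

def recover_pose(image):
    """Return the pose of the exemplar with the closest HOG descriptor."""
    _, idx = index.kneighbors(describe(image).reshape(1, -1))
    return poses[idx[0, 0]]

print(recover_pose(rng.random((128, 64))).shape)   # (15, 3)
```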
Online backchannel synthesis evaluation with the switching Wizard of Oz
In this paper, we evaluate a backchannel synthesis algorithm in an online conversation between a human speaker and a virtual listener. We adopt the Switching Wizard of Oz (SWOZ) approach to assess behavior synthesis algorithms online. A human speaker watches a virtual listener that is controlled either by a human listener or by an algorithm, with the source switching at random intervals. Speakers indicate when they feel they are no longer talking to a human listener. Analysis of these responses reveals patterns of inappropriate behavior in terms of the quantity and timing of backchannels.
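As a rough illustration of the SWOZ protocol, the sketch below simulates a session timeline in which the listener source flips between human and algorithm at random intervals, and speaker judgements ("no longer a human") are logged against the source active at that moment. The session length, switch intervals and judgement times are illustrative assumptions, not values from the study.

```python
import random

random.seed(1)
SESSION_LEN = 300.0                          # session length in seconds (assumed)

# Build the switching schedule as (start_time, source) segments.
schedule, t, source = [], 0.0, "human"
while t < SESSION_LEN:
    schedule.append((t, source))
    t += random.uniform(15.0, 45.0)          # random switch interval (assumed)
    source = "algorithm" if source == "human" else "human"

def source_at(time, schedule):
    """Return the listener source active at a given time."""
    active = schedule[0][1]
    for start, src in schedule:
        if start <= time:
            active = src
    return active

# Simulated speaker judgements: moments a "not a human" response was given.
presses = sorted(random.uniform(0, SESSION_LEN) for _ in range(8))
for p in presses:
    print(f"{p:6.1f}s -> listener source was {source_at(p, schedule)}")
```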
Automatic behavior analysis in tag games: from traditional spaces to interactive playgrounds
Tag is a popular children's playground game. It revolves around taggers who chase and then tag runners, upon which their roles switch. There are many variations of the game that aim to keep children engaged by presenting them with challenges and different types of gameplay. We argue that the introduction of sensing and floor-projection technology in the playground can help provide both variation and challenge. To this end, we need to understand players' behavior in the playground and steer the interactions using projections accordingly. In this paper, we first analyze the behavior of taggers and runners in a traditional tag setting, focusing on behavioral cues that differ between the two roles. Based on these cues, we present a probabilistic role recognition model. We then move to an interactive setting and evaluate the model on tag sessions in an interactive tag playground. Our model achieves 77.96% accuracy, which demonstrates the feasibility of our approach. We identify several avenues for improvement; eventually, these should lead to a more thorough understanding of what happens in the playground, not only regarding player roles but also regarding when play breaks down, for example when players are bored or cheat.
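The sketch below illustrates one plausible form of such a probabilistic role recognition model: a Gaussian naive Bayes classifier over simple per-window movement cues. The two cues used here (mean speed and distance to the nearest other player) and the synthetic data are illustrative stand-ins for the paper's actual cues and model.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)

# Synthetic training windows: taggers move faster and close in on others.
n = 200
taggers = np.column_stack([rng.normal(1.8, 0.3, n),    # mean speed (m/s)
                           rng.normal(1.0, 0.4, n)])   # nearest-player dist (m)
runners = np.column_stack([rng.normal(1.2, 0.3, n),
                           rng.normal(2.5, 0.6, n)])
X = np.vstack([taggers, runners])
y = np.array([1] * n + [0] * n)              # 1 = tagger, 0 = runner

model = GaussianNB().fit(X, y)

# Posterior role probabilities for a new observation window.
window = np.array([[1.7, 1.2]])              # fast and close to another player
print(model.predict_proba(window))           # [[P(runner), P(tagger)]]
```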
Learn to cycle: Time-consistent feature discovery for action recognition
Generalizing over temporal variations is a prerequisite for effective action recognition in videos. Despite significant advances in deep neural networks, it remains a challenge to focus on short-term discriminative motions in relation to the overall performance of an action. We address this challenge by allowing some flexibility in discovering relevant spatio-temporal features. We introduce Squeeze and Recursion Temporal Gates (SRTG), an approach that favors inputs with similar activations, allowing for potential temporal variations. We implement this idea with a novel CNN block that uses an LSTM to encapsulate feature dynamics, in conjunction with a temporal gate that is responsible for evaluating the consistency of the discovered dynamics and the modeled features. We show consistent improvement when using SRTG blocks, with only a minimal increase in the number of GFLOPs. On Kinetics-700, we perform on par with current state-of-the-art models, and outperform these on HACS, Moments in Time, UCF-101 and HMDB-51.
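A hedged PyTorch sketch of a squeeze-and-recursion style block follows: spatial squeeze, an LSTM over time to model channel dynamics, and a gate that applies the recalibration only when the modeled dynamics agree (by cosine similarity) with the original activations. The threshold, gating rule and wiring are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SRTGBlock(nn.Module):
    """Squeeze-and-recursion with a temporal consistency gate (sketch)."""

    def __init__(self, channels, threshold=0.5):
        super().__init__()
        self.lstm = nn.LSTM(channels, channels, batch_first=True)
        self.threshold = threshold                 # assumed gating threshold

    def forward(self, x):                          # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        squeezed = x.mean(dim=(3, 4)).transpose(1, 2)      # squeeze: (B, T, C)
        dynamics, _ = self.lstm(squeezed)                  # modeled dynamics
        # Gate: mean cosine similarity between modeled and observed dynamics.
        sim = F.cosine_similarity(dynamics, squeezed, dim=-1).mean(dim=1)
        gate = (sim > self.threshold).float().view(b, 1, 1, 1, 1)
        weights = torch.sigmoid(dynamics).permute(0, 2, 1).reshape(b, c, t, 1, 1)
        # Recalibrate only when dynamics and features are consistent.
        return gate * (x * weights) + (1 - gate) * x

x = torch.randn(2, 64, 8, 14, 14)
print(SRTGBlock(64)(x).shape)                      # torch.Size([2, 64, 8, 14, 14])
```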
Example-based pose estimation in monocular images using compact Fourier descriptors
Automatically estimating human poses from visual input is useful but challenging due to variations in image space and the high dimensionality of the pose space. In this paper, we assume that a human silhouette can be extracted from monocular visual input. We compare the recovery performance of Fourier descriptors with between 8 and 128 coefficients, and two different sampling methods. An example-based approach is taken to recover upper-body poses from the descriptors. We test the robustness of our approach by investigating how shape deformations due to changes in body dimensions, viewpoint and noise affect the recovery of the pose. The average error per joint is approximately 16-17° for equidistant sampling and slightly higher for extreme-point sampling. Increasing the number of descriptors does not influence performance. Noise and small changes in viewpoint have only a very small effect on recovery performance, but we obtain higher error scores when recovering poses using silhouettes from a person with different body dimensions.
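For concreteness, the sketch below computes compact Fourier descriptors for a closed silhouette contour: sample the boundary equidistantly by arc length, treat the points as complex numbers, take the FFT, and keep only the lowest-frequency coefficients. The normalization here (dropping the DC term for translation invariance, scaling by |F[1]| for scale invariance) is a standard simplification, not necessarily the paper's exact scheme.

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=16, n_samples=128):
    """Compact Fourier descriptors of an ordered (N, 2) boundary contour."""
    z = contour[:, 0] + 1j * contour[:, 1]         # boundary as complex signal
    # Equidistant resampling by arc length (closed contour).
    d = np.abs(np.diff(z, append=z[:1]))           # segment lengths
    arc = np.concatenate([[0.0], np.cumsum(d)])[:-1]
    u = np.linspace(0, arc[-1], n_samples, endpoint=False)
    samples = np.interp(u, arc, z.real) + 1j * np.interp(u, arc, z.imag)
    F = np.fft.fft(samples)
    F = F / np.abs(F[1])                           # scale invariance (assumed)
    return np.abs(F[1:n_coeffs + 1])               # drop DC -> translation invariant

# Toy example: a unit circle; energy concentrates in the first coefficient.
t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
circle = np.column_stack([np.cos(t), np.sin(t)])
print(fourier_descriptors(circle, n_coeffs=8).round(3))
```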
Multi-Temporal Convolutions for Human Action Recognition in Videos
Effective extraction of temporal patterns is crucial for the recognition of temporally varying actions in video. We argue that the fixed-size spatio-temporal convolution kernels used in convolutional neural networks (CNNs) can be improved to extract informative motions that are executed at different time scales. To address this challenge, we present a novel spatio-temporal convolution block that is capable of extracting spatio-temporal patterns at multiple temporal resolutions. Our proposed multi-temporal convolution (MTConv) blocks utilize two branches that focus on brief and prolonged spatio-temporal patterns, respectively. The extracted time-varying features are aligned in a third branch, with respect to global motion patterns, through recurrent cells. The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture, introducing a substantial reduction in computational costs. Extensive experiments on the Kinetics, Moments in Time and HACS action recognition benchmark datasets demonstrate competitive performance of MTConvs compared to the state-of-the-art, with a significantly lower computational footprint.
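The following PyTorch sketch captures the multi-branch idea: two 3D convolution branches with short and long temporal kernel extents for brief and prolonged motions, and a lightweight recurrent branch over globally pooled features that aligns them with global motion. The kernel sizes, GRU choice and sigmoid fusion are illustrative assumptions, not the paper's exact block.

```python
import torch
import torch.nn as nn

class MTConvBlock(nn.Module):
    """Multi-temporal convolution block (sketch)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        half = out_ch // 2
        self.brief = nn.Conv3d(in_ch, half, kernel_size=(3, 3, 3),
                               padding=(1, 1, 1))        # short temporal extent
        self.prolonged = nn.Conv3d(in_ch, out_ch - half, kernel_size=(7, 3, 3),
                                   padding=(3, 1, 1))    # long temporal extent
        self.gru = nn.GRU(out_ch, out_ch, batch_first=True)

    def forward(self, x):                                # x: (B, C, T, H, W)
        y = torch.cat([self.brief(x), self.prolonged(x)], dim=1)
        b, c, t, h, w = y.shape
        pooled = y.mean(dim=(3, 4)).transpose(1, 2)      # (B, T, C') global motion
        g, _ = self.gru(pooled)
        weights = torch.sigmoid(g).permute(0, 2, 1).reshape(b, c, t, 1, 1)
        return y * weights                               # align branches globally

x = torch.randn(2, 32, 16, 28, 28)
print(MTConvBlock(32, 64)(x).shape)          # torch.Size([2, 64, 16, 28, 28])
```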
AdaPool: Exponential Adaptive Pooling for Information-Retaining Downsampling
Pooling layers are essential building blocks of convolutional neural networks (CNNs) that reduce computational overhead and increase the receptive fields of subsequent convolutional operations. They aim to produce downsampled volumes that closely resemble the input volume while, ideally, also being computationally and memory efficient. Meeting both requirements jointly is a challenge. To this end, we propose an adaptive and exponentially weighted pooling method named adaPool. Our proposed method uses a parameterized fusion of two sets of pooling kernels that are based on the exponent of the Dice-Sørensen coefficient and the exponential maximum, respectively. A key property of adaPool is its bidirectional nature: in contrast to common pooling methods, the weights can be used to upsample a downsampled activation map, a method we term adaUnPool. We demonstrate how adaPool improves the preservation of detail across a range of tasks including image and video classification and object detection. We then evaluate adaUnPool on image and video frame super-resolution and frame interpolation tasks. For benchmarking, we introduce Inter4K, a novel high-quality, high-frame-rate video dataset. Our combined experiments demonstrate that adaPool systematically achieves better results across tasks and backbone architectures, while introducing only minor additional computational and memory overhead.
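As a simplified, single-scale reading of the idea, the sketch below pools non-overlapping 2x2 regions with two exponential weightings, a softmax of the activations (exponential maximum) and the exponent of each activation's Dice-Sørensen similarity to the region mean, fused by a learnable parameter. This is an illustrative re-implementation, not the reference kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaPool2x2(nn.Module):
    """Exponentially weighted adaptive pooling over 2x2 regions (sketch)."""

    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.tensor(0.0))      # learnable fusion logit

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Gather each non-overlapping 2x2 region as a length-4 vector.
        r = F.unfold(x.reshape(b * c, 1, h, w), kernel_size=2, stride=2)
        # Exponential maximum: softmax-weighted average of the region.
        em = (r.softmax(dim=1) * r).sum(dim=1)
        # Exponential Dice-Sorensen weights against the region mean.
        mu = r.mean(dim=1, keepdim=True)
        dsc = 2 * (r * mu) / (r * r + mu * mu + 1e-6)    # pairwise similarity
        w_dsc = torch.exp(dsc)
        edscw = (w_dsc * r).sum(dim=1) / w_dsc.sum(dim=1)
        beta = torch.sigmoid(self.beta)                  # keep fusion in [0, 1]
        return (beta * edscw + (1 - beta) * em).reshape(b, c, h // 2, w // 2)

x = torch.randn(2, 3, 8, 8)
print(AdaPool2x2()(x).shape)                             # torch.Size([2, 3, 4, 4])
```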