Deep Directional Statistics: Pose Estimation with Uncertainty Quantification
Modern deep learning systems successfully solve many perception tasks such as
object pose estimation when the input image is of high quality. However, in
challenging imaging conditions such as on low-resolution images or when the
image is corrupted by imaging artifacts, current systems degrade considerably
in accuracy. While a loss in performance is unavoidable, we would like our
models to quantify their uncertainty in order to achieve robustness against
images of varying quality. Probabilistic deep learning models combine the
expressive power of deep learning with uncertainty quantification. In this
paper, we propose a novel probabilistic deep learning model for the task of
angular regression. Our model uses von Mises distributions to predict a
distribution over object pose angles. Since a single von Mises distribution
makes strong assumptions about the shape of the distribution, we extend the
basic model to predict a mixture of von Mises distributions. We show how to
learn a mixture model using a finite and infinite number of mixture components.
Our model allows for likelihood-based training and efficient inference at test
time. We demonstrate on a number of challenging pose estimation datasets that
our model produces calibrated probability predictions and competitive or
superior point estimates compared to the current state-of-the-art.
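The mixture-of-von-Mises likelihood described above can be sketched in a few lines; this is a minimal NumPy illustration under our own naming, not the authors' implementation:

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    # Density of a von Mises distribution on the circle; np.i0 is the
    # modified Bessel function of the first kind, order 0, which
    # normalizes the density over a 2*pi interval.
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * np.i0(kappa))

def mixture_nll(theta, weights, mus, kappas):
    # Negative log-likelihood of angle(s) theta under a finite mixture
    # of von Mises components; weights are assumed to sum to 1.
    density = sum(w * von_mises_pdf(theta, m, k)
                  for w, m, k in zip(weights, mus, kappas))
    return -np.log(density)
```

A multimodal pose distribution (e.g. a front/back ambiguity) is then expressed by placing components at the competing angles, and training minimizes `mixture_nll` over the dataset.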
Towards a Principled Integration of Multi-Camera Re-Identification and Tracking through Optimal Bayes Filters
With the rise of end-to-end learning through deep learning, person detectors
and re-identification (ReID) models have recently become very strong.
Multi-camera multi-target (MCMT) tracking has not fully gone through this
transformation yet. We intend to take another step in this direction by
presenting a theoretically principled way of integrating ReID with tracking
formulated as an optimal Bayes filter. This conveniently side-steps the need
for data-association and opens up a direct path from full images to the core of
the tracker. While the results are still sub-par, we believe that this new,
tight integration opens many interesting research opportunities and leads the
way towards full end-to-end tracking from raw pixels.
Comment: First two authors have equal contribution. This is initial work into
a new direction, not a benchmark-beating method. v2 only adds
acknowledgements and fixes a typo in e-mail
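The optimal-Bayes-filter formulation can be illustrated with a discrete-state toy sketch in which a ReID appearance similarity plays the role of the measurement likelihood; the names and setup are our own assumptions, not the paper's method:

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    # One predict/update cycle of a discrete Bayes filter.
    # belief:        prior over states, sums to 1
    # transition:    transition[i, j] = P(next state j | current state i)
    # likelihood:    likelihood[j] = P(observation | state j), e.g. a ReID
    #                appearance similarity converted to a probability
    predicted = belief @ transition      # predict step
    posterior = predicted * likelihood   # measurement update
    return posterior / posterior.sum()   # renormalize
```

Because the ReID score enters directly as a likelihood over states, no hard data association is needed: every hypothesis is reweighted rather than assigned.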
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social
settings (e.g., a cocktail party) is gratifying due to the wealth of information
available at the group (mining social networks) and individual (recognizing
innate behavioral and personality traits) levels. However, analyzing social
scenes involving FCGs is also highly challenging due to the difficulty in
extracting behavioral cues such as target locations, their speaking activity
and head/body pose, owing to crowdedness and the presence of extreme occlusions. To
this end, we propose SALSA, a novel dataset facilitating multimodal and
Synergetic sociAL Scene Analysis, and make two main contributions to research
on automated social interaction analysis: (1) SALSA records social interactions
among 18 participants in a natural, indoor environment for over 60 minutes,
under the poster presentation and cocktail party contexts presenting
difficulties in the form of low-resolution images, lighting variations,
numerous occlusions, reverberations and interfering sound sources; (2) To
alleviate these problems we facilitate multimodal analysis by recording the
social interplay using four static surveillance cameras and sociometric badges
worn by each participant, comprising the microphone, accelerometer, bluetooth
and infrared sensors. In addition to raw data, we also provide annotations
concerning individuals' personality as well as their position, head, body
orientation and F-formation information over the entire event duration. Through
extensive experiments with state-of-the-art approaches, we show (a) the
limitations of current methods and (b) how the recorded multiple cues
synergetically aid automatic analysis of social interactions. SALSA is
available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
No Matter Where You Are: Flexible Graph-Guided Multi-task Learning for Multi-view Head Pose Classification under Target Motion
We propose a novel Multi-Task Learning framework (FEGA-MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. As the target (person) moves, distortions in facial appearance owing to camera perspective and scale severely impede performance of traditional head pose classification methods. FEGA-MTL operates on a dense uniform spatial grid and learns appearance relationships across partitions as well as partition-specific appearance variations for a given head pose to build region-specific classifiers. Guided by two graphs which a-priori model appearance similarity among (i) grid partitions based on camera geometry and (ii) head pose classes, the learner efficiently clusters appearance-wise related grid partitions to derive the optimal partitioning. For pose classification, upon determining the target's position using a person tracker, the appropriate region-specific classifier is invoked. Experiments confirm that FEGA-MTL achieves state-of-the-art classification with few training data.
A Multi-task Learning Framework for Head Pose Estimation under Target Motion
Recently, head pose estimation (HPE) from low-resolution surveillance data has gained in importance. However, monocular and multi-view HPE approaches still work poorly under target motion, as facial appearance distorts owing to camera perspective and scale changes when a person moves around. To this end, we propose FEGA-MTL, a novel framework based on Multi-Task Learning (MTL) for classifying the head pose of a person who moves freely in an environment monitored by multiple, large field-of-view surveillance cameras. Upon partitioning the monitored scene into a dense uniform spatial grid, FEGA-MTL simultaneously clusters grid partitions into regions with similar facial appearance, while learning region-specific head pose classifiers. In the learning phase, guided by two graphs which a-priori model the similarity among (1) grid partitions based on camera geometry and (2) head pose classes, FEGA-MTL derives the optimal scene partitioning and associated pose classifiers. Upon determining the target's position using a person tracker at test time, the corresponding region-specific classifier is invoked for HPE. The FEGA-MTL framework naturally extends to a weakly supervised setting where the target's walking direction is employed as a proxy in lieu of head orientation. Experiments confirm that FEGA-MTL significantly outperforms competing single-task and multi-task learning methods in multi-view settings.
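At test time, the region-specific lookup described above amounts to quantizing the tracked position to a grid cell and retrieving the classifier for that cell's region; a minimal sketch, where the grid layout and all names are illustrative assumptions rather than the paper's implementation:

```python
def select_region_classifier(position, cell_to_region, region_classifiers,
                             cell_size=1.0):
    # Quantize the tracker's (x, y) estimate to a grid cell, look up the
    # appearance region that cell was clustered into during learning,
    # and return the head-pose classifier trained for that region.
    gx = int(position[0] // cell_size)
    gy = int(position[1] // cell_size)
    region = cell_to_region[(gx, gy)]
    return region_classifiers[region]
```

The clustering step of FEGA-MTL would populate `cell_to_region`, so that appearance-wise similar cells share one classifier instead of each cell needing its own training data.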
MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking
Standardized benchmarks have been crucial in pushing the performance of
computer vision algorithms, especially since the advent of deep learning.
Although leaderboards should not be over-claimed, they often provide the most
objective measure of performance and are therefore important guides for
research. We present MOTChallenge, a benchmark for single-camera Multiple
Object Tracking (MOT) launched in late 2014, to collect existing and new data,
and create a framework for the standardized evaluation of multiple object
tracking methods. The benchmark is focused on multiple people tracking, since
pedestrians are by far the most studied object in the tracking community, with
applications ranging from robot navigation to self-driving cars. This paper
collects the first three releases of the benchmark: (i) MOT15, along with
numerous state-of-the-art results that were submitted in the last years, (ii)
MOT16, which contains new challenging videos, and (iii) MOT17, that extends
MOT16 sequences with more precise labels and evaluates tracking performance on
three different object detectors. The second and third releases not only offer
a significant increase in the number of labeled boxes but also provide labels
for multiple object classes besides pedestrians, as well as the level of
visibility for every single object of interest. We finally provide a
categorization of state-of-the-art trackers and a broad error analysis. This
will help newcomers understand the related work and research trends in the MOT
community, and hopefully shed some light on potential future research
directions.
Comment: Accepted at IJC