620 research outputs found

TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting

Author: Choudhury Rohan
Jeni Laszlo A.
Kitani Kris
Publication date: 14/09/2023

Existing volumetric methods for predicting 3D human pose estimation are accurate, but computationally expensive and optimized for single time-step prediction. We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatiotemporal representation, improving pose accuracy while also tracking and forecasting human pose. We significantly reduce computation compared to the state-of-the-art by recurrently computing per-person 2D pose features, fusing both spatial and temporal information into a single representation. In doing so, our model is able to use spatiotemporal context to predict more accurate human poses without sacrificing efficiency. We further use this representation to track human poses over time as well as predict future poses. Finally, we demonstrate that our model is able to generalize across datasets without scene-specific fine-tuning. TEMPO achieves 10% better MPJPE with a 33× improvement in FPS compared to TesseTrack on the challenging CMU Panoptic Studio dataset. Comment: Accepted at ICCV 202

arXiv.org e-Print Archive

Predicting Future Instance Segmentation by Forecasting Convolutional Features

Author: A Yang
B Romera-Paredes
J Walker
KM Kitani
PO Pinheiro
R Sutton
T Lan
T-Y Lin
Publication date: 08/09/2018

Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of fixed-sized convolutional features of the Mask R-CNN instance segmentation model. We apply the "detection head" of Mask R-CNN on the predicted features to produce the instance segmentation of future frames. Experiments show that this approach significantly improves over strong baselines based on optical flow and repurposed instance segmentation architectures.

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL-Rennes 1

Exploring and Understanding the High Dimensional and Sparse Image Face Space: a Self-Organized Manifold Mapping

Author: A. Giraldi Gilson
C. Kitani Edson
E. Thomaz Carlos
M. Hernandez Emilio
Publication venue: 'IntechOpen'
Publication date: 01/08/2011

IntechOpen

Crossref

Dimensionality Reduction, Classification and Reconstruction Problems in Statistical Learning Approaches

Author: Giraldi Gilson A.
Kitani Edson C.
Rodrigues Paulo S.
Thomaz Carlos E.
Publication venue: 'Universidade Federal do Rio Grande do Sul'
Publication date: 24/09/2008

Statistical learning theory explores ways of estimating functional dependency from a given collection of data. The specific sub-area of supervised statistical learning covers important models like Perceptron, Support Vector Machines (SVM) and Linear Discriminant Analysis (LDA). In this paper we review the theory of such models and compare their separating hypersurfaces for extracting group-differences between samples. Classification and reconstruction are the main goals of this comparison. We show recent advances in this topic of research illustrating their application on face and medical image databases.

Em Questao

Archives of the Faculty of Veterinary Medicine UFRGS

Survey on Vision-based Path Prediction

Author: A Lerner
A Robicquet
CG Keller
D Helbing
D Munoz
D Weinland
E Shelhamer
H Zhu
JANE BROMLEY
JFP Kooij
KM Kitani
L Ballan
Nicolas Schneider
R Benenson
S Huang
S Singh
S Yi
SZ Bokhari
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2018

Path prediction is a fundamental task for estimating how pedestrians or vehicles are going to move in a scene. Because path prediction as a task of computer vision uses video as input, various kinds of information used for prediction, such as the environment surrounding the target and the internal state of the target, need to be estimated from the video in addition to predicting paths. Many prediction approaches that include understanding the environment and the internal state have been proposed. In this survey, we systematically summarize methods of path prediction that take video as input and extract features from the video. Moreover, we introduce datasets used to evaluate path prediction methods quantitatively. Comment: DAPI 201

arXiv.org e-Print Archive

Crossref

CAR-Net: Clairvoyant Attentive Recurrent Network

Author: A Robicquet
B Zhao
BT Morris
CK Williams
D Helbing
D Makris
D Xie
HS Koppula
J Quiñonero-Candela
JM Wang
KM Kitani
L Ballan
N Graham
O Russakovsky
P McCullagh
R Vesel
RE Kalman
S Hochreiter
S Pellegrini
Publication date: 16/07/2018

We present an interpretable framework for path prediction that leverages dependencies between agents' behaviors and their spatial navigation environment. We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-view image of the navigation scene. We propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns where to look in a large image of the scene when solving the path prediction task. Our method can attend to any area, or combination of areas, within the raw image (e.g., road intersections) when predicting the trajectory of the agent. This allows us to visualize fine-grained semantic elements of navigation scenes that influence the prediction of trajectories. To study the impact of space on agents' trajectories, we build a new dataset made of top-view images of hundreds of scenes (Formula One racing tracks) where agents' behaviors are heavily influenced by known areas in the images (e.g., upcoming turns). CAR-Net successfully attends to these salient regions. Additionally, CAR-Net reaches state-of-the-art accuracy on the standard trajectory forecasting benchmark, Stanford Drone Dataset (SDD). Finally, we show CAR-Net's ability to generalize to unseen scenes. Comment: The 2nd and 3rd authors contributed equall

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Charge distribution and electroluminescence in cross-linked polyethylene under dc field

Author: A See
Alison J M
Augé J L
Bert C
C Laurent
Dissado L
F Palmieri
Fothergill J C
Fukuma M
G C Montanari
G Teyssedre
Ho Y F F
J C Fothergill
Kitani I
L A Dissado
Laurent C
Li Y
Mary D
Montanari G C
Suzuoki Y
Zeller H R
Publication venue: 'IOP Publishing'
Publication date: 01/01/2001

The intent of this paper is to cross-correlate the information obtained by space charge distribution analysis and electroluminescence (EL) detection in cross-linked polyethylene samples submitted to dc fields, with the objective to make a link between space charge phenomena and energy release as revealed by the detection of visible photons. Space charge measurements carried out at different field levels by the pulsed electro-acoustic method show the presence of a low-field threshold, close to 15-20 kV mm⁻¹, above which considerable space charge begins to accumulate in the insulation. Charges are seen to cross the insulation thickness through a packet-like behaviour at higher fields, starting at about 60-70 kV mm⁻¹. EL measurements show the existence of two distinct thresholds, one related to the continuous excitation of EL under voltage, the other being transient EL detected upon specimen short circuit. The former occurs at values of field corresponding to charge packet formation and the latter to the onset of space charge accumulation. The correspondence between pertinent values of the electric field obtained through space charge and EL analyses provides support for the existence of degradation thresholds in insulating materials. Special emphasis is given to the relationship between charge packet formation and propagation, and EL. Although the two phenomena are observed in the same field range, it is found that the onset of continuous EL follows the formation at the electrodes of positive and negative space charge regions that extend into the bulk prior to the propagation of charge packets. Charge recombination appears to be the excitation process of EL since oppositely charged domains meet in the material bulk. To gain an insight into specific light-excitation processes associated with charge packet propagation, EL has been recorded for several hours under fields at which charge packet dynamics were evidenced. It is shown that current and luminescence oscillations are detected during charge packet propagation, and that they are in phase. The mechanisms underlying EL and charge packets are further considered on the basis of these results.

City Research Online

Crossref

Leicester Research Archive

Mhealth interventions to address physical activity and sedentary behavior in cancer survivors: A systematic review

Author: Al-Kitani M.
Ansari Payam
Khoo S.
Mohbin N.
Müller A. M.
Publication date: 01/06/2021

This review aimed to identify, evaluate, and synthesize the scientific literature on mobile health (mHealth) interventions to promote physical activity (PA) or reduce sedentary behavior (SB) in cancer survivors. We searched six databases from 2000 to 13 April 2020 for controlled and non-controlled trials published in any language. We conducted best evidence syntheses on controlled trials to assess the strength of the evidence. All 31 interventions included in this review measured PA outcomes, with 10 of them also evaluating SB outcomes. Most study participants were adults/older adults with various cancer types. The majority (n = 25) of studies implemented multi-component interventions, with activity trackers being the most commonly used mHealth technology. There is strong evidence for mHealth interventions, including personal contact components, in increasing moderate-to-vigorous intensity PA among cancer survivors. However, there is inconclusive evidence to support mHealth interventions in increasing total activity and step counts. There is inconclusive evidence on SB potentially due to the limited number of studies. mHealth interventions that include personal contact components are likely more effective in increasing PA than mHealth interventions without such components. Future research should address social factors in mHealth interventions for PA and SB in cancer survivors.

Bournemouth University Research Online

Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video

Author: A Furnari
CY Chen
D Damen
G di Pellegrino
HS Koppula
JM Wang
KM Kitani
L-Y Gui
M Rushworth
R Hari
SM Aglioti
V Delaitre
V Pavlovic
W James
Y Huang
Y Li
Y Shen
Publication date: 19/07/2020

We address the challenging task of anticipating human-object interaction in first person videos. Most existing methods ignore how the camera wearer interacts with the objects, or simply consider body motion as a separate modality. In contrast, we observe that the intentional hand movement reveals critical information about the future activity. Motivated by this, we adopt intentional hand movement as a future representation and propose a novel deep network that jointly models and predicts the egocentric hand motion, interaction hotspots and future action. Specifically, we consider the future hand motion as the motor attention, and model this attention using latent variables in our deep model. The predicted motor attention is further used to characterise the discriminative spatial-temporal visual features for predicting actions and interaction hotspots. We present extensive experiments demonstrating the benefit of the proposed joint model. Importantly, our model produces new state-of-the-art results for action anticipation on both EGTEA Gaze+ and the EPIC-Kitchens datasets. Our project page is available at https://aptx4869lm.github.io/ForecastingHOI

arXiv.org e-Print Archive

Crossref