3,544 research outputs found
Articulated Pose Estimation Using Hierarchical Exemplar-Based Models
Exemplar-based models have achieved great success on localizing the parts of
semi-rigid objects. However, their efficacy on highly articulated objects such
as humans is yet to be explored. Inspired by hierarchical object representation
and recent application of Deep Convolutional Neural Networks (DCNNs) on human
pose estimation, we propose a novel formulation that incorporates both
hierarchical exemplar-based models and DCNNs in the spatial terms.
Specifically, we obtain more expressive spatial models by assuming independence
between exemplars at different levels in the hierarchy; we also obtain stronger
spatial constraints by inferring the spatial relations between parts at the
same level. As our method strikes a good balance between expressiveness and
strength of spatial models, it is both effective and generalizable, achieving
state-of-the-art results on different benchmarks: Leeds Sports Dataset and
CUB-200-2011.Comment: 8 pages, 6 figure
Recommended from our members
Object Part Localization Using Exemplar-based Models
​Object part localization is a fundamental problem in computer vision, which aims to let machines understand object in an image as a configuration of parts. As the visual features at parts are usually weak and misleading, spatial models are needed to constrain the part configuration, ensuring that the estimated part locations respect both image cue and shape prior. Unlike most of the state-of-the-art techniques that employ parametric spatial models, we turn to non-parametric exemplars of part configurations. The benefit is twofold: instead of assuming any parametric yet imprecise distributions on the spatial relations of parts, exemplars literally encode such relations present in the training samples; exemplars allow us to prune the search space of part configurations with high confidence.
This thesis consists of two parts: fine-grained classification and object part localization. We first verify the efficacy of parts in fine-grained classification, where we build working systems that automatically identify dog breeds, fish species, and bird species using localized parts on the object. Then we explore multiple ways to enhance exemplar-based models, such that they can be well applied to deformable objects such as bird and human body. Specifically, we propose to enforce pose and subcategory consistency in exemplar matching, thus obtaining more reliable hypotheses of configuration. We also propose part-pair representation that features novel shape composing with multiple promising hypotheses. In the end, we adapt exemplars to hierarchical representation, and design a principled formulation to predict the part configuration based on multi-scale image cues and multi-level exemplars. These efforts consistently improve the accuracy of object part localization
Unsupervised Video Understanding by Reconciliation of Posture Similarities
Understanding human activity and being able to explain it in detail surpasses
mere action classification by far in both complexity and value. The challenge
is thus to describe an activity on the basis of its most fundamental
constituents, the individual postures and their distinctive transitions.
Supervised learning of such a fine-grained representation based on elementary
poses is very tedious and does not scale. Therefore, we propose a completely
unsupervised deep learning procedure based solely on video sequences, which
starts from scratch without requiring pre-trained networks, predefined body
models, or keypoints. A combinatorial sequence matching algorithm proposes
relations between frames from subsets of the training data, while a CNN is
reconciling the transitivity conflicts of the different subsets to learn a
single concerted pose embedding despite changes in appearance across sequences.
Without any manual annotation, the model learns a structured representation of
postures and their temporal development. The model not only enables retrieval
of similar postures but also temporal super-resolution. Additionally, based on
a recurrent formulation, next frames can be synthesized.Comment: Accepted by ICCV 201
Robust recognition and segmentation of human actions using HMMs with missing observations
This paper describes the integration of missing observation data with hidden Markov models to create a framework that is able to segment and classify individual actions from a stream of human motion using an incomplete 3D human pose estimation. Based on this framework, a model is trained to automatically segment and classify an activity sequence into its constituent subactions during inferencing. This is achieved by introducing action labels into the observation vector and setting these labels as missing data during inferencing, thus forcing the system to infer the probability of each action label. Additionally, missing data provides recognition-level support for occlusions and imperfect silhouette segmentation, permitting the use of a fast (real-time) pose estimation that delegates the burden of handling undetected limbs onto the action recognition system. Findings show that the use of missing data to segment activities is an accurate and elegant approach. Furthermore, action recognition can be accurate even when almost half of the pose feature data is missing due to occlusions, since not all of the pose data is important all of the time
Advances in Monocular Exemplar-based Human Body Pose Analysis: Modeling, Detection and Tracking
Esta tesis contribuye en el análisis de la postura del cuerpo humano a partir de secuencias de imágenes adquiridas con una sola cámara. Esta temática presenta un amplio rango de potenciales aplicaciones en video-vigilancia, video-juegos o aplicaciones biomédicas. Las técnicas basadas en patrones han tenido éxito, sin embargo, su precisión depende de la similitud del punto de vista de la cámara y de las propiedades de la escena entre las imágenes de entrenamiento y las de prueba. Teniendo en cuenta un conjunto de datos de entrenamiento capturado mediante un número reducido de cámaras fijas, paralelas al suelo, se han identificado y analizado tres escenarios posibles con creciente nivel de dificultad: 1) una cámara estática paralela al suelo, 2) una cámara de vigilancia fija con un ángulo de visión considerablemente diferente, y 3) una secuencia de video capturada con una cámara en movimiento o simplemente una sola imagen estática
- …