710 research outputs found

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Get PDF
    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available, whole body, parts or point trajectories. Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, that corrects motion leakage between correctly detected objects, while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits long range motion of point trajectories and large spatial support of image regions. We show resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard to detect, fast moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation in popular datasets

    Efficient Human Pose Estimation with Image-dependent Interactions

    Get PDF
    Human pose estimation from 2D images is one of the most challenging and computationally-demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically connected joints or limbs, leading to inference cost that is quadratic in the number of pixels. As a result, researchers and practitioners have restricted themselves to simple models which only measure the quality of limb-pair possibilities by their 2D geometric plausibility. In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interactions. First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered. Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. Finally, we develop a local linear approach that learns bases centered around modes in the training data, giving us image-dependent local models which are fast and accurate. These techniques allow for sparse and efficient inference on the order of minutes or seconds per image. As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and multiple modes. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets

    Computational Anatomy for Multi-Organ Analysis in Medical Imaging: A Review

    Full text link
    The medical image analysis field has traditionally been focused on the development of organ-, and disease-specific methods. Recently, the interest in the development of more 20 comprehensive computational anatomical models has grown, leading to the creation of multi-organ models. Multi-organ approaches, unlike traditional organ-specific strategies, incorporate inter-organ relations into the model, thus leading to a more accurate representation of the complex human anatomy. Inter-organ relations are not only spatial, but also functional and physiological. Over the years, the strategies 25 proposed to efficiently model multi-organ structures have evolved from the simple global modeling, to more sophisticated approaches such as sequential, hierarchical, or machine learning-based models. In this paper, we present a review of the state of the art on multi-organ analysis and associated computation anatomy methodology. The manuscript follows a methodology-based classification of the different techniques 30 available for the analysis of multi-organs and multi-anatomical structures, from techniques using point distribution models to the most recent deep learning-based approaches. With more than 300 papers included in this review, we reflect on the trends and challenges of the field of computational anatomy, the particularities of each anatomical region, and the potential of multi-organ analysis to increase the impact of 35 medical imaging applications on the future of healthcare.Comment: Paper under revie

    Holistic interpretation of visual data based on topology:semantic segmentation of architectural facades

    Get PDF
    The work presented in this dissertation is a step towards effectively incorporating contextual knowledge in the task of semantic segmentation. To date, the use of context has been confined to the genre of the scene with a few exceptions in the field. Research has been directed towards enhancing appearance descriptors. While this is unarguably important, recent studies show that computer vision has reached a near-human level of performance in relying on these descriptors when objects have stable distinctive surface properties and in proper imaging conditions. When these conditions are not met, humans exploit their knowledge about the intrinsic geometric layout of the scene to make local decisions. Computer vision lags behind when it comes to this asset. For this reason, we aim to bridge the gap by presenting algorithms for semantic segmentation of building facades making use of scene topological aspects. We provide a classification scheme to carry out segmentation and recognition simultaneously.The algorithm is able to solve a single optimization function and yield a semantic interpretation of facades, relying on the modeling power of probabilistic graphs and efficient discrete combinatorial optimization tools. We tackle the same problem of semantic facade segmentation with the neural network approach.We attain accuracy figures that are on-par with the state-of-the-art in a fully automated pipeline.Starting from pixelwise classifications obtained via Convolutional Neural Networks (CNN). These are then structurally validated through a cascade of Restricted Boltzmann Machines (RBM) and Multi-Layer Perceptron (MLP) that regenerates the most likely layout. In the domain of architectural modeling, there is geometric multi-model fitting. We introduce a novel guided sampling algorithm based on Minimum Spanning Trees (MST), which surpasses other propagation techniques in terms of robustness to noise. We make a number of additional contributions such as measure of model deviation which captures variations among fitted models
    • …
    corecore