34 research outputs found

    Loss Guided Activation for Action Recognition in Still Images

    One significant problem of deep-learning-based human action recognition is that it can be easily misled by the presence of irrelevant objects or backgrounds. Existing methods commonly address this problem by employing bounding boxes on the target humans as part of the input, in both training and testing stages. The bounding boxes enable these methods to ignore irrelevant contexts and extract only human features. However, we consider this solution inefficient, since the bounding boxes might not be available. Hence, instead of using a person bounding box as an input, we introduce a human-mask loss to automatically guide the activations of the feature maps to the target human who is performing the action, and hence suppress the activations of misleading contexts. We propose a multi-task deep learning method that jointly predicts the human action class and human location heatmap. Extensive experiments demonstrate that our approach is more robust than the baseline methods in the presence of irrelevant misleading contexts. Our method achieves 94.06% and 40.65% (in terms of mAP) on the Stanford40 and MPII datasets respectively, which are 3.14% and 12.6% relative improvements over the best results reported in the literature, and thus sets new state-of-the-art results. Additionally, unlike some existing methods, we eliminate the requirement of using a person bounding box as an input during testing. Comment: Accepted to appear in ACCV 201
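
The joint objective described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the form of the human-mask loss, so the mean squared error used here, the `mask_weight` parameter, and all function names are assumptions chosen for illustration only.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of one example, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]


def mask_loss(heatmap, mask):
    """Mean squared error between a predicted heatmap and a binary human mask.

    Assumption: the abstract names a "human-mask loss" but not its form.
    """
    flat_pred = [p for row in heatmap for p in row]
    flat_mask = [t for row in mask for t in row]
    return sum((p - t) ** 2 for p, t in zip(flat_pred, flat_mask)) / len(flat_pred)


def multi_task_loss(logits, target_index, heatmap, mask, mask_weight=1.0):
    """Action-classification loss plus a weighted human-location loss."""
    return softmax_cross_entropy(logits, target_index) + mask_weight * mask_loss(heatmap, mask)
```

Because the mask term only enters the training objective, a model trained this way would need no bounding box at test time, which matches the claim in the abstract.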

    STAR: Sparse Trained Articulated Human Body Regressor

    The SMPL body model is widely used for the estimation, synthesis, and analysis of 3D human pose and shape. While popular, we show that SMPL has several limitations and introduce STAR, which is quantitatively and qualitatively superior to SMPL. First, SMPL has a huge number of parameters resulting from its use of global blend shapes. These dense pose-corrective offsets relate every vertex on the mesh to all the joints in the kinematic tree, capturing spurious long-range correlations. To address this, we define per-joint pose correctives and learn the subset of mesh vertices that are influenced by each joint movement. This sparse formulation results in more realistic deformations and significantly reduces the number of model parameters to 20% of SMPL. When trained on the same data as SMPL, STAR generalizes better despite having far fewer parameters. Second, SMPL factors pose-dependent deformations from body shape while, in reality, people with different shapes deform differently. Consequently, we learn shape-dependent pose-corrective blend shapes that depend on both body pose and BMI. Third, we show that the shape space of SMPL is not rich enough to capture the variation in the human population. We address this by training STAR with an additional 10,000 scans of male and female subjects, and show that this results in better model generalization. STAR is compact, generalizes better to new bodies and is a drop-in replacement for SMPL. STAR is publicly available for research purposes at http://star.is.tue.mpg.de. Comment: ECCV 202
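
The idea of sparse, per-joint pose correctives can be shown with a toy sketch. This is not STAR's actual formulation: the scalar per-vertex offsets, the `influence` table, and both function names are hypothetical simplifications. The point is only that a joint contributes to a vertex when its learned weight is non-zero, so the number of active parameters can be far smaller than a dense joints-by-vertices table.

```python
def pose_corrective_offsets(joint_activations, influence):
    """Return one scalar offset per vertex (toy version of a pose corrective).

    joint_activations[j]: how far joint j is from its rest pose (scalar, assumed)
    influence[j][v]: weight of joint j on vertex v; zero means no influence
    """
    num_vertices = len(influence[0])
    offsets = [0.0] * num_vertices
    for j, activation in enumerate(joint_activations):
        for v, weight in enumerate(influence[j]):
            if weight != 0.0:  # only influenced vertices receive a contribution
                offsets[v] += weight * activation
    return offsets


def count_active_weights(influence):
    """Number of non-zero joint-to-vertex weights, i.e. parameters actually used."""
    return sum(1 for row in influence for w in row if w != 0.0)
```

With two joints and three vertices, a dense table would hold six weights; if each joint influences a single vertex, `count_active_weights` reports two.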

    In Good Shape: Robust People Detection based on Appearance and Shape


    Articulated People Detection and Pose Estimation in Challenging Real World Environments


    Learning to Refine Human Pose Estimation


    Website Design as a Fundamental Principle of Promotion in the Digital Space

    Analysis of the principles of website design. Development of a methodological approach to designing the TPU website, taking into account the theory of generations and other principles.

    2D Human Pose Estimation: New Benchmark and State of the Art Analysis

    Human pose estimation has made significant progress in recent years. However, current datasets are limited in their coverage of the overall pose estimation challenges. Still, these serve as the common sources on which to evaluate, train and compare different models. In this paper we introduce a novel benchmark, “MPII Human Pose”, that makes a significant advance in terms of diversity and difficulty, a contribution that we feel is required for future developments in human body models. This comprehensive dataset was collected using an established taxonomy of over 800 human activities [1]. The collected images cover a wider variety of human activities than previous datasets, including various recreational, occupational and household activities, and capture people from a wider range of viewpoints. We provide a rich set of labels including positions of body joints, full 3D torso and head orientation, occlusion labels for joints and body parts, and activity labels. For each image we provide adjacent video frames to facilitate the use of motion information. Given these rich annotations we perform a detailed analysis of leading human pose estimation approaches and gain insights into the successes and failures of these methods.

    Poselet Conditioned Pictorial Structures

    In this paper we consider the challenging problem of articulated human pose estimation in still images. We observe that despite high variability of the body articulations, human motions and activities often simultaneously constrain the positions of multiple body parts. Modelling such higher order part dependencies seemingly comes at a cost of more expensive inference, which resulted in their limited use in state-of-the-art methods. In this paper we propose a model that incorporates higher order part dependencies while remaining efficient. We achieve this by defining a conditional model in which all body parts are connected a-priori, but which becomes a tractable tree-structured pictorial structures model once the image observations are available. In order to derive a set of conditioning variables we rely on the poselet-based features that have been shown to be effective for people detection but have so far found limited application for articulated human pose estimation. We demonstrate the effectiveness of our approach on three publicly available pose estimation benchmarks, improving on or being on par with the state of the art in each case.