
    Understanding Everyday Hands in Action from RGB-D Images

    We analyze functional manipulations of handheld objects, formalizing the problem as one of fine-grained grasp classification. To do so, we make use of a recently developed fine-grained taxonomy of human-object grasps. We introduce a large dataset of 12,000 RGB-D images covering 71 everyday grasps in natural interactions. Our dataset differs from past work (typically addressed from a robotics perspective) in its scale, diversity, and combination of RGB and depth data. From a computer-vision perspective, our dataset allows for exploration of contact and force prediction (crucial concepts in functional grasp analysis) from perceptual cues. We present extensive experimental results with state-of-the-art baselines, illustrating the role of segmentation, object context, and 3D understanding in functional grasp analysis. We demonstrate a nearly 2X improvement over prior work and a naive deep baseline, while pointing out important directions for improvement.
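
    As a rough illustration of what a naive deep baseline for this task might look like, the sketch below classifies a 4-channel RGB-D crop into one of the 71 grasp classes with a small convolutional network. The architecture, layer sizes, and names are illustrative assumptions, not the model evaluated in the paper.

    # Minimal sketch of a naive deep baseline for fine-grained grasp
    # classification from RGB-D input; layers and sizes are assumptions.
    import torch
    import torch.nn as nn

    NUM_GRASPS = 71  # grasp classes in the dataset

    class GraspNet(nn.Module):
        def __init__(self, num_classes: int = NUM_GRASPS):
            super().__init__()
            # 4 input channels: RGB plus depth stacked as a fourth channel
            self.features = nn.Sequential(
                nn.Conv2d(4, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(128, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, 4, H, W) RGB-D crop around the hand
            return self.classifier(self.features(x).flatten(1))

    # Usage: score one random 128x128 RGB-D crop
    model = GraspNet()
    logits = model(torch.randn(1, 4, 128, 128))
    print(logits.argmax(dim=1))  # predicted grasp class index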

    Depth-based hand pose estimation: data, methods, and challenges

    Hand pose estimation has matured rapidly in recent years. The introduction of commodity depth sensors and a multitude of practical applications have spurred new advances. We provide an extensive analysis of the state of the art, focusing on hand pose estimation from a single depth frame. To do so, we have implemented a considerable number of systems, and will release all software and evaluation code. We summarize important conclusions here: (1) Pose estimation appears roughly solved for scenes with isolated hands. However, methods still struggle to analyze cluttered scenes where hands may be interacting with nearby objects and surfaces. To spur further progress, we introduce a challenging new dataset with diverse, cluttered scenes. (2) Many methods evaluate themselves with disparate criteria, making comparisons difficult. We define a consistent evaluation criterion, rigorously motivated by human experiments. (3) We introduce a simple nearest-neighbor baseline that outperforms most existing systems. This implies that most systems do not generalize beyond their training sets. It also reinforces the under-appreciated point that training data is as important as the model itself. We conclude with directions for future progress.
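
    The nearest-neighbor baseline in conclusion (3) can be sketched very simply: return the pose annotation of the training depth frame closest to the query. The data layout, distance metric, and joint count below are illustrative assumptions, not the paper's exact protocol.

    # Sketch of a nearest-neighbor pose baseline: copy the pose of the
    # closest training depth frame. Layout and metric are assumptions.
    import numpy as np

    def nearest_neighbor_pose(query, train_depths, train_poses):
        """query: (H, W) depth image; train_depths: (N, H, W);
        train_poses: (N, J, 3) joint positions per training frame."""
        # Euclidean distance from the query to every training frame
        dists = np.linalg.norm(
            train_depths.reshape(len(train_depths), -1) - query.ravel(),
            axis=1)
        return train_poses[dists.argmin()]  # pose of the closest frame

    # Usage with random stand-in data: 100 frames, 21 joints
    depths = np.random.rand(100, 64, 64).astype(np.float32)
    poses = np.random.rand(100, 21, 3).astype(np.float32)
    print(nearest_neighbor_pose(depths[0], depths, poses).shape)  # (21, 3)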

    Precise determination of phonon constants in lead-free monoclinic (K0.5Na0.5)NbO3 single crystals

    A polarized Raman analysis of ferroelectric (K0.5Na0.5)NbO3 (KNN) single crystals is presented. The Raman modes of KNN single crystals are assigned to monoclinic symmetry. Angular-dependent intensities of A′, A″, and mixed A′ + A″ phonons have been theoretically calculated and compared with the experimental data, allowing the precise determination of the Raman tensor coefficients for (non-leaking) modes in single-domain monoclinic KNN. This study is the basis for non-destructive assessments of domain distribution by Raman spectroscopy in KNN-based lead-free ferroelectrics.
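
    The angular dependence referred to here follows the standard polarized Raman selection rule, with intensity proportional to |e_s . R . e_i|^2, where R is the mode's Raman tensor and e_i, e_s are the incident and scattered polarization directions. The sketch below evaluates this rule for an assumed in-plane tensor block; the actual coefficients are exactly what the paper determines, so the values here are placeholders.

    # Sketch of the polarized Raman rule I(theta) ~ (e_s . R . e_i)^2 in a
    # parallel-polarization rotation geometry; tensor values are placeholders.
    import numpy as np

    def raman_intensity(R, theta):
        """Parallel-polarized intensity vs. in-plane rotation angle (rad)."""
        e = np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # (..., 2)
        # e_s . R . e_i with e_s = e_i = e (parallel configuration)
        return np.einsum('...i,ij,...j->...', e, R, e) ** 2

    R_mode = np.array([[1.0, 0.3],
                       [0.3, 0.6]])  # placeholder in-plane tensor block
    angles = np.linspace(0.0, 2.0 * np.pi, 361)
    intensity = raman_intensity(R_mode, angles)
    print(intensity.min(), intensity.max())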

    Long-Term Tracking by Decision Making

    Cameras can naturally capture sequences of images, or videos, and for computers to understand videos, they must track to connect the past with the present. We focus on two problems that challenge current state-of-the-art trackers. First, we address the challenge of long-term occlusion. For this challenge, a tracker must know when it has lost track and how to reinitialize tracking when the target reappears. We tackle reinitialization by building good appearance models for humans and hands, with a particular emphasis on robustness to occlusion. For the second challenge, appearance variation, the tracker must know when and how to re-learn (or update) an appearance model. Common solutions to this challenge encounter the classic problem of drift: aggressively learning putative appearance changes allows small errors to compound, as elements of the background environment pollute the appearance model. We propose two solutions. First, we consider self-paced learning, wherein a tracker begins by learning from frames it finds easy; as the tracker becomes better at recognizing the target, it begins to learn from harder frames. Second, we develop a data-driven approach in which we train a tracking policy to decide when and how to update an appearance model. To take this direct approach to “learning when to learn”, we exploit large-scale Internet data through reinforcement learning. We interpret the resulting policy and conclude with extensions for tracking multiple objects. By solving these tracking challenges, we advance applications in augmented reality, vehicle automation, healthcare, and security.
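
    The self-paced learning idea can be abstracted as a curriculum loop: on each round the tracker learns only from frames whose loss under the current model is low, then relaxes the admission threshold as it improves. The loop below is an illustrative abstraction with stand-in quantities, not the thesis implementation.

    # Illustrative self-paced learning loop: train on "easy" (low-loss)
    # frames first, then gradually admit harder frames. The loss decay is
    # a stand-in for actually retraining the tracker on the chosen frames.
    import numpy as np

    def self_paced_rounds(losses, threshold, growth, rounds):
        """Yield, per round, indices of frames easy enough to learn from."""
        for _ in range(rounds):
            easy = np.flatnonzero(losses <= threshold)
            yield easy              # caller would retrain on these frames
            losses = losses * 0.9   # stand-in: retraining lowers losses
            threshold *= growth     # admit harder frames next round

    # Usage with random stand-in losses for 50 frames
    losses = np.random.rand(50)
    for r, easy in enumerate(self_paced_rounds(losses, 0.2, 1.5, 3)):
        print(r, len(easy))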
