
    Practical Color-Based Motion Capture

    Motion capture systems have been widely used for high-quality content creation and virtual reality, but are rarely used in consumer applications due to their price and setup cost. In this paper, we propose a motion capture system built from commodity components that can be deployed in a matter of minutes. Our approach uses one or more webcams and a color shirt to track the upper body at interactive rates. We describe a robust color calibration system that enables our color-based tracking to work against cluttered backgrounds and under multiple illuminants. We demonstrate our system in several real-world indoor and outdoor settings.
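
A minimal sketch of the color-classification idea this abstract describes, assuming each calibrated garment color is modeled as a Gaussian in RGB space and pixels are labeled by Mahalanobis distance. The paper's actual calibration procedure is more involved; the function names and threshold here are illustrative.

```python
import numpy as np

def fit_color_model(samples):
    """Fit mean and inverse covariance to an (N, 3) array of pixel samples."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(3)  # regularize
    return mean, np.linalg.inv(cov)

def classify_pixels(pixels, models, max_dist=3.0):
    """Label each pixel with the index of the closest color model,
    or -1 where no model is within max_dist (background)."""
    dists = []
    for mean, inv_cov in models:
        d = pixels - mean
        # Per-pixel squared Mahalanobis distance d_i^T * P * d_i
        dists.append(np.einsum('ij,jk,ik->i', d, inv_cov, d))
    dists = np.sqrt(np.array(dists))        # (num_models, num_pixels)
    labels = dists.argmin(axis=0)
    labels[dists.min(axis=0) > max_dist] = -1
    return labels
```

Thresholding on Mahalanobis rather than Euclidean distance is what lets the classifier adapt to each color's spread under a given illuminant.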

    Practical color-based motion capture

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 93-101). Motion capture systems track the 3-D pose of the human body and are widely used for high quality content creation, gestural user input and virtual reality. However, these systems are rarely deployed in consumer applications due to their price and complexity. In this thesis, we propose a motion capture system built from commodity components that can be deployed in a matter of minutes. Our approach uses one or more webcams and a color garment to track either the user's upper body or hands for motion capture and user input. We demonstrate that custom designed color garments can simplify difficult computer vision problems and lead to efficient and robust algorithms for hand and upper body tracking. Specifically, our highly descriptive color patterns alleviate ambiguities that are commonly encountered when tracking only silhouettes or edges, allowing us to employ a nearest-neighbor approach to track either the hands or the upper body at interactive rates. We also describe a robust color calibration system that enables our color-based tracking to work against cluttered backgrounds and under multiple illuminants. We demonstrate our system in several real-world indoor and outdoor settings and describe proof-of-concept applications enabled by our system that we hope will provide a foundation for new interactions in computer aided design, animation control and augmented reality. By Robert Yuanbo Wang. Ph.D.
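
The nearest-neighbor tracking idea mentioned in this abstract can be sketched as follows, assuming a precomputed database of (descriptor, pose) pairs; the descriptor construction and database contents are placeholders, not the thesis's actual design.

```python
import numpy as np

class PoseDatabase:
    """Toy nearest-neighbor pose lookup over precomputed descriptors."""

    def __init__(self, descriptors, poses):
        self.descriptors = np.asarray(descriptors)  # (N, D) image descriptors
        self.poses = np.asarray(poses)              # (N, pose_dim) matching poses

    def lookup(self, query, k=3):
        """Return the mean pose of the k database entries whose
        descriptors are closest to the query descriptor."""
        d2 = ((self.descriptors - query) ** 2).sum(axis=1)
        nearest = np.argsort(d2)[:k]
        return self.poses[nearest].mean(axis=0)
```

For example, `PoseDatabase([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]], [[0.0], [1.0], [10.0]]).lookup(np.array([0.1, 0.0]), k=2)` averages the two nearest poses. In practice the descriptor would encode the observed color pattern, which is what makes the lookup discriminative.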

    It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data

    We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Despite showing considerable success for 2D pose estimation, the application of supervised machine learning to 3D pose estimation in real-world images is currently hampered by the lack of varied training images with corresponding 3D poses. Most existing 3D pose estimation algorithms train on data that has either been collected in carefully controlled studio settings or has been generated synthetically. Instead, we take a different approach and propose a 3D human pose estimation algorithm that only requires relative estimates of depth at training time. Such a training signal, although noisy, can be easily collected from crowd annotators and is of sufficient quality to enable successful training and evaluation of 3D pose algorithms. Our results are competitive with fully supervised regression-based approaches on the Human3.6M dataset, despite using significantly weaker training data. Our proposed algorithm opens the door to using existing widespread 2D datasets for 3D pose estimation by allowing fine-tuning with noisy relative constraints, resulting in more accurate 3D poses. Comment: BMVC 2018. Project page available at http://www.vision.caltech.edu/~mronchi/projects/RelativePos
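
One way the relative-depth training signal described here could enter a loss function is as a pairwise ranking penalty. This is a hedged illustration under invented conventions, not the paper's exact formulation: an annotation `r = +1` says joint `i` should be deeper than joint `j`, `r = -1` the reverse, and `r = 0` roughly equal.

```python
import numpy as np

def relative_depth_loss(pred_z, pairs):
    """pred_z: (J,) predicted joint depths.
    pairs: list of (i, j, r) with r in {+1, -1, 0}."""
    total = 0.0
    for i, j, r in pairs:
        diff = pred_z[i] - pred_z[j]
        if r == 0:
            # Annotated as roughly equal depth: penalize any gap.
            total += diff ** 2
        else:
            # Ranking term: small when r * diff is large and positive,
            # i.e. when the predicted ordering matches the annotation.
            total += np.log1p(np.exp(-r * diff))
    return total / len(pairs)
```

A correctly ordered pair incurs a small loss while a reversed pair incurs a large one, so gradients push predicted depths toward the annotated ordering without ever needing metric depth values.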

    A seamless solution for 3D real-time interaction: design and evaluation

    This paper aims to propose and evaluate a markerless solution for capturing hand movements in real time to allow 3D interactions in virtual environments (VEs). Tools such as keyboards and mice are not sufficient for interacting in a 3D VE, and current motion capture systems are expensive and require wearing equipment. We developed a solution to allow more natural interactions with objects and the VE for navigation and manipulation tasks. We conducted an experimental study involving 20 participants. The goal was to perform object manipulation (moving, orientation, scaling) and navigation tasks in the VE. We compared our solution (3DCam, based on the Microsoft Kinect) with data gloves and magnetic sensors (3DGloves) regarding two criteria: performance and acceptability. Results demonstrate similar performance (precision, execution time) but better overall acceptability for our solution. Participants' preferences are mostly in favor of the 3DCam, mainly for the criteria of comfort, freedom of movement, and handiness. Our solution can be considered a real alternative to conventional systems for object manipulation in virtual reality.

    Vision for Social Robots: Human Perception and Pose Estimation

    In order to extract the underlying meaning from a scene captured from the surrounding world in a single still image, social robots will need to learn the human ability to detect different objects, understand their arrangement and relationships relative both to their own parts and to each other, and infer the dynamics under which they are evolving. Furthermore, they will need to develop and hold a notion of context to allow assigning different meanings (semantics) to the same visual configuration (syntax) of a scene. The underlying thread of this Thesis is the investigation of new ways for enabling interactions between social robots and humans, by advancing the visual perception capabilities of robots when they process images and videos in which humans are the main focus of attention. First, we analyze the general problem of scene understanding, as social robots moving through the world need to be able to interpret scenes without having been assigned a specific preset goal. Throughout this line of research, i) we observe that human actions and interactions which can be visually discriminated from an image follow a very heavy-tailed distribution; ii) we develop an algorithm that can obtain a spatial understanding of a scene by only using cues arising from the effect of perspective on a picture of a person’s face; and iii) we define a novel taxonomy of errors for the task of estimating the 2D body pose of people in images to better explain the behavior of algorithms and highlight their underlying causes of error. Second, we focus on the specific task of 3D human pose and motion estimation from monocular 2D images using weakly supervised training data, as accurately predicting human pose will open up the possibility of richer interactions between humans and social robots. 
We show that when 3D ground-truth data is only available in small quantities, or not at all, it is possible to leverage knowledge about the physical properties of the human body, along with additional constraints related to alternative types of supervisory signals, to learn models that can regress the full 3D pose of the human body and predict its motions from monocular 2D images. Taken in its entirety, the intent of this Thesis is to highlight the importance of, and provide novel methodologies for, social robots' ability to interpret their surrounding environment, learn in a way that is robust to low data availability, and generalize previously observed behaviors to unknown situations in a similar way to humans.
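
As a hedged illustration of one "physical property" constraint of the kind mentioned above, a bone-length penalty on predicted 3D joints might look like the sketch below; the skeleton edges and reference lengths are invented for the example and are not the Thesis's actual model.

```python
import numpy as np

def bone_length_loss(joints, edges, ref_lengths):
    """joints: (J, 3) predicted 3D joint positions.
    edges: list of (parent, child) index pairs defining the skeleton.
    ref_lengths: (E,) known limb lengths to match."""
    lengths = np.array([np.linalg.norm(joints[c] - joints[p])
                        for p, c in edges])
    # Mean squared deviation of predicted bone lengths from the reference.
    return ((lengths - ref_lengths) ** 2).mean()
```

Because limb lengths are constant for a given person, a term like this supplies supervision even for frames with no 3D annotation at all.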

    Acquiring 3D Full-body Motion from Noisy and Ambiguous Input

    Natural human motion is in high demand and widely used in a variety of applications such as video games and virtual reality. However, acquisition of full-body motion remains challenging because the system must be capable of accurately capturing a wide variety of human actions while not requiring a considerable amount of time and skill to assemble. For instance, commercial optical motion capture systems such as Vicon can capture human motion with high accuracy and resolution, but they often require post-processing by experts, which is time-consuming and costly. Microsoft Kinect, despite its high popularity and wide applications, does not provide accurate reconstruction of complex movements when significant occlusions occur. This dissertation explores two different approaches that accurately reconstruct full-body human motion from noisy and ambiguous input data captured by commercial motion capture devices. The first approach automatically generates high-quality human motion from noisy data obtained from commercial optical motion capture systems, eliminating the need for post-processing. The second approach accurately captures a wide variety of human motion, even under significant occlusions, by using color/depth data captured by a single Kinect camera. The common theme underlying the two approaches is the use of prior knowledge embedded in a pre-recorded motion capture database to reduce the reconstruction ambiguity caused by noisy and ambiguous input and to constrain the solution to lie in the natural motion space. More specifically, the first approach constructs a series of spatial-temporal filter bases from pre-captured human motion data and employs them, along with robust statistics techniques, to filter motion data corrupted by noise/outliers.
The second approach formulates the problem in a Maximum a Posteriori (MAP) framework and generates the most likely pose that both explains the observations and is consistent with the patterns embedded in the pre-recorded motion capture database. We demonstrate the effectiveness of our approaches through extensive numerical evaluations on synthetic data and comparisons against results created by commercial motion capture systems. The first approach can effectively denoise a wide variety of noisy motion data, including walking, running, jumping and swimming, while the second approach is shown to be capable of accurately reconstructing a wider range of motions than Microsoft Kinect.
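
The MAP formulation can be sketched under the simplifying assumption of a Gaussian motion prior and a quadratic observation term (stand-ins for the dissertation's actual likelihood and prior models): the estimated pose minimizes a data term, measuring how well the pose explains the observation, plus a prior term, measuring consistency with the motion database.

```python
import numpy as np

def map_objective(pose, observation, prior_mean, prior_inv_cov, obs_weight=1.0):
    """Negative log-posterior up to a constant: data term + prior term."""
    data_term = obs_weight * ((pose - observation) ** 2).sum()
    d = pose - prior_mean
    prior_term = d @ prior_inv_cov @ d
    return data_term + prior_term

def map_estimate(observation, prior_mean, prior_inv_cov, obs_weight=1.0):
    """Closed-form minimizer of the quadratic objective above:
    (w*I + P) x = w*y + P*m, from setting the gradient to zero."""
    A = obs_weight * np.eye(len(observation)) + prior_inv_cov
    b = obs_weight * observation + prior_inv_cov @ prior_mean
    return np.linalg.solve(A, b)
```

With both terms quadratic the MAP pose is a weighted blend of the observation and the prior mean; a richer prior (e.g. one learned from the motion database) would require iterative optimization instead of a linear solve.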