185 research outputs found

    XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera

    No full text
    We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions both by other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that neither extracted global body positions nor joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes

    {Mo2Cap2}: Real-time Mobile {3D} Motion Capture with a Cap-mounted Fisheye Camera

    Get PDF
    We propose the first real-time approach for the egocentric estimation of 3D human body pose in a wide range of unconstrained everyday activities. This setting has a unique set of challenges, such as mobility of the hardware setup, and robustness to long capture sessions with fast recovery from tracking failures. We tackle these challenges based on a novel lightweight setup that converts a standard baseball cap to a device for high-quality pose estimation based on a single cap-mounted fisheye camera. From the captured egocentric live stream, our CNN based 3D pose estimation approach runs at 60Hz on a consumer-level GPU. In addition to the novel hardware setup, our other main contributions are: 1) a large ground truth training corpus of top-down fisheye images and 2) a novel disentangled 3D pose estimation approach that takes the unique properties of the egocentric viewpoint into account. As shown by our evaluation, we achieve lower 3D joint error as well as better 2D overlay than the existing baselines

    Promoting Connectivity of Network-Like Structures by Enforcing Region Separation

    Full text link
    We propose a novel, connectivity-oriented loss function for training deep convolutional networks to reconstruct network-like structures, like roads and irrigation canals, from aerial images. The main idea behind our loss is to express the connectivity of roads, or canals, in terms of disconnections that they create between background regions of the image. In simple terms, a gap in the predicted road causes two background regions, that lie on the opposite sides of a ground truth road, to touch in prediction. Our loss function is designed to prevent such unwanted connections between background regions, and therefore close the gaps in predicted roads. It also prevents predicting false positive roads and canals by penalizing unwarranted disconnections of background regions. In order to capture even short, dead-ending road segments, we evaluate the loss in small image crops. We show, in experiments on two standard road benchmarks and a new data set of irrigation canals, that convnets trained with our loss function recover road connectivity so well, that it suffices to skeletonize their output to produce state of the art maps. A distinct advantage of our approach is that the loss can be plugged in to any existing training setup without further modifications

    Real-time landing place assessment in man-made environments

    Get PDF
    We propose a novel approach to the real-time landing site detection and assessment in unconstrained man-made environments using passive sensors. Because this task must be performed in a few seconds or less, existing methods are often limited to simple local intensity and edge variation cues. By contrast, we show how to efficiently take into account the potential sites' global shape, which is a critical cue in man-made scenes. Our method relies on a new segmentation algorithm and shape regularity measure to look for polygonal regions in video sequences. In this way, we enforce both temporal consistency and geometric regularity, resulting in very reliable and consistent detections. We demonstrate our approach for the detection of landable sites such as rural fields, building rooftops and runways from color and infrared monocular sequences significantly outperforming the state-of-the-art

    Domain Adaptation for Microscopy Imaging

    Get PDF
    Electron and Light Microscopy imaging can now deliver high-quality image stacks of neural structures. However, the amount of human annotation effort required to analyze them remains a major bottleneck. While Machine Learning algorithms can be used to help automate this process, they require training data, which is time-consuming to obtain manually, especially in image stacks. Furthermore, due to changing experimental conditions, successive stacks often exhibit differences that are severe enough to make it difficult to use a classifier trained for a specific one on another. This means that this tedious annotation process has to be repeated for each new stack. In this paper we present a domain adaptation algorithm that addresses this issue by effectively leveraging labeled examples across different acquisitions and significantly reducing the annotation requirements. Our approach can handle complex, non-linear image feature transformations and scales to large microscopy datasets that often involve high-dimensional feature spaces and large 3D data volumes. We evaluate our approach on four challenging Electron and Light Microscopy applications that exhibit very different image modalities and where annotation is very costly. Across all applications we achieve a significant improvement over the state-of-the-art Machine Learning methods and demonstrate our ability to greatly reduce human annotation effort

    Real-time landing place assessment in man-made environments

    Get PDF
    We propose a novel approach to the real-time landing site detection and assessment in unconstrained man-made environments using passive sensors. Because this task must be performed in a few seconds or less, existing methods are often limited to simple local intensity and edge variation cues. By contrast, we show how to efficiently take into account the potential sites' global shape, which is a critical cue in man-made scenes. Our method relies on a new segmentation algorithm and shape regularity measure to look for polygonal regions in video sequences. In this way, we enforce both temporal consistency and geometric regularity, resulting in very reliable and consistent detections. We demonstrate our approach for the detection of landable sites such as rural fields, building rooftops and runways from color and infrared monocular sequences significantly outperforming the state-of-the-art

    The effects of aging on neuropil structure in mouse somatosensory cortex-A 3D electron microscopy analysis of layer 1

    Get PDF
    This study has used dense reconstructions from serial EM images to compare the neuropil ultrastructure and connectivity of aged and adult mice. The analysis used models of axons, dendrites, and their synaptic connections, reconstructed from volumes of neuropil imaged in layer 1 of the somatosensory cortex. This shows the changes to neuropil structure that accompany a general loss of synapses in a well-defined brain region. The loss of excitatory synapses was balanced by an increase in their size such that the total amount of synaptic surface, per unit length of axon, and per unit volume of neuropil, stayed the same. There was also a greater reduction of inhibitory synapses than excitatory, particularly those found on dendritic spines, resulting in an increase in the excitatory/inhibitory balance. The close correlations, that exist in young and adult neurons, between spine volume, bouton volume, synaptic size, and docked vesicle numbers are all preserved during aging. These comparisons display features that indicate a reduced plasticity of cortical circuits, with fewer, more transient, connections, but nevertheless an enhancement of the remaining connectivity that compensates for a generalized synapse loss

    Local and global skeleton fitting techniques for optical motion capture

    Get PDF
    Identifying a precise anatomic skeleton is important in order to ensure high-quality motion capture. In this paper, we discuss two skeleton-fitting techniques based on 3D optical marker data. First, a local technique is proposed, based on relative marker trajectories. Then it is compared to a global optimization of a skeleton model. Various proposals are made to handle the skin deformation proble

    State of the Art in Dense Monocular Non-Rigid 3D Reconstruction

    Get PDF
    3D reconstruction of deformable (or non-rigid) scenes from a set of monocular2D image observations is a long-standing and actively researched area ofcomputer vision and graphics. It is an ill-posed inverse problem,since--without additional prior assumptions--it permits infinitely manysolutions leading to accurate projection to the input 2D images. Non-rigidreconstruction is a foundational building block for downstream applicationslike robotics, AR/VR, or visual content creation. The key advantage of usingmonocular cameras is their omnipresence and availability to the end users aswell as their ease of use compared to more sophisticated camera set-ups such asstereo or multi-view systems. This survey focuses on state-of-the-art methodsfor dense non-rigid 3D reconstruction of various deformable objects andcomposite scenes from monocular videos or sets of monocular views. It reviewsthe fundamentals of 3D reconstruction and deformation modeling from 2D imageobservations. We then start from general methods--that handle arbitrary scenesand make only a few prior assumptions--and proceed towards techniques makingstronger assumptions about the observed objects and types of deformations (e.g.human faces, bodies, hands, and animals). A significant part of this STAR isalso devoted to classification and a high-level comparison of the methods, aswell as an overview of the datasets for training and evaluation of thediscussed techniques. We conclude by discussing open challenges in the fieldand the social aspects associated with the usage of the reviewed methods.<br
    • …
    corecore