
    Improving Surface Normals Based Action Recognition in Depth Images

    In this paper, we propose a new local descriptor for action recognition in depth images. The proposed descriptor jointly encodes shape and motion cues using surface normals in the 4D space of depth, time, and spatial coordinates, together with higher-order partial derivatives of the depth values along the spatial coordinates. In a traditional Bag-of-Words (BoW) approach, local descriptors extracted from a depth sequence are encoded to form a global representation of the sequence. In our approach, local descriptors are instead encoded using Sparse Coding (SC) and Fisher Vectors (FV), which have recently been shown to be effective for action recognition. Action recognition is then performed with a simple linear SVM classifier. The proposed action descriptor is evaluated on two public benchmark datasets, MSRAction3D and MSRGesture3D. The experimental results show the effectiveness of the proposed method on both datasets.
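
    As a rough illustration of the 4D-normal idea, the sketch below estimates unit normals of the depth surface z = d(x, y, t) with finite differences and notes how the encoded features would feed a linear SVM. The `depth_video`, `train_fv`, and `train_labels` names, and the use of NumPy and scikit-learn, are assumptions for illustration, not the authors' code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def surface_normals_4d(depth_video):
    """Unit normals of the depth surface z = d(x, y, t) in 4D space.

    depth_video: float array of shape (T, H, W) holding depth values.
    Returns an array of shape (T, H, W, 4).
    """
    # Finite-difference partial derivatives along t, y and x.
    dz_dt, dz_dy, dz_dx = np.gradient(depth_video.astype(np.float64))
    # A normal of the surface z = d(x, y, t) is (dz/dx, dz/dy, dz/dt, -1).
    normals = np.stack([dz_dx, dz_dy, dz_dt, -np.ones(depth_video.shape)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)

# Once local descriptors are pooled into per-sequence features `train_fv`
# (e.g. via Sparse Coding or Fisher Vectors), classification is a plain SVM:
# clf = LinearSVC(C=1.0).fit(train_fv, train_labels)
```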

    Multimodal Deep Learning for Robust RGB-D Object Recognition

    Robust object recognition is a crucial ingredient of many, if not all, real-world robotics applications. This paper leverages recent progress on Convolutional Neural Networks (CNNs) and proposes a novel RGB-D architecture for object recognition. Our architecture is composed of two separate CNN processing streams, one for each modality, which are combined with a late fusion network. We focus on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, we introduce a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs: first, an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets; second, a data augmentation scheme for robust learning with depth images that corrupts them with realistic noise patterns. We present state-of-the-art results on the RGB-D Object dataset and show recognition in challenging, noisy real-world RGB-D settings. Comment: Final version submitted to IROS 2015; results unchanged, with reformulation of some text passages in the abstract and introduction.
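
    A minimal PyTorch sketch of the two-stream, late-fusion layout described above; the trunk networks, feature dimension, and fusion head below are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class LateFusionRGBD(nn.Module):
    """Two CNN streams (RGB and depth) joined by a late fusion head."""

    def __init__(self, rgb_stream, depth_stream, feat_dim, num_classes):
        super().__init__()
        self.rgb_stream = rgb_stream      # CNN trunk mapping images to feat_dim features
        self.depth_stream = depth_stream  # same, for the (encoded) depth input
        self.fusion = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_classes),
        )

    def forward(self, rgb, depth):
        # Late fusion: concatenate per-stream features, then classify jointly.
        feats = torch.cat([self.rgb_stream(rgb), self.depth_stream(depth)], dim=1)
        return self.fusion(feats)
```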

    Lifting GIS Maps into Strong Geometric Context for Scene Understanding

    Contextual information can have a substantial impact on the performance of visual tasks such as semantic segmentation, object detection, and geometric estimation. Data stored in Geographic Information Systems (GIS) offers a rich source of contextual information that has been largely untapped by computer vision. We propose to leverage such information for scene understanding by combining GIS resources with large sets of unorganized photographs using Structure from Motion (SfM) techniques. We present a pipeline to quickly generate strong 3D geometric priors from 2D GIS data using SfM models aligned with minimal user input. Given an image resectioned against this model, we generate robust predictions of depth, surface normals, and semantic labels. We show that the predicted geometry is substantially more accurate than that of other single-image depth estimation methods. We then demonstrate the utility of these contextual constraints for re-scoring pedestrian detections, and use these GIS contextual features alongside object detection score maps to improve a CRF-based semantic segmentation framework, boosting accuracy over baseline models.
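
    As one illustration of using a geometric prior to re-score detections, the toy function below penalizes pedestrian boxes whose pixel height disagrees with the height implied by the predicted depth under a pinhole camera model. The function name, the fixed person height, and the focal length are hypothetical; the paper's actual re-scoring features may differ.

```python
import numpy as np

def rescore_with_depth_prior(boxes, scores, depth_map,
                             person_height_m=1.7, focal_px=1000.0):
    """Penalize detections whose pixel height disagrees with the height
    implied by a depth prior (illustrative sketch, not the paper's model)."""
    rescored = []
    for (x0, y0, x1, y1), s in zip(boxes, scores):
        crop = depth_map[int(y0):int(y1), int(x0):int(x1)]
        if crop.size == 0:
            rescored.append(s)
            continue
        z = float(np.median(crop))                             # scene depth at the box
        expected_h = focal_px * person_height_m / max(z, 1e-6) # pinhole model
        mismatch = abs(np.log(max((y1 - y0) / expected_h, 1e-6)))
        rescored.append(s - mismatch)                          # zero penalty when heights agree
    return np.asarray(rescored)
```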

    Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

    This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows fine-tuning of existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method is evaluated on the Large-scale Continuous Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. It achieved a Mean Jaccard Index of 0.2655 and ranked 3rd in this challenge.
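
    The sketch below collapses a depth sequence into a single motion image by accumulating absolute frame-to-frame differences, a common depth-motion-map construction; the paper's exact IDMM formulation may differ in detail.

```python
import numpy as np

def depth_motion_map(depth_seq):
    """Collapse a depth sequence (T, H, W) into one motion image."""
    # Accumulate absolute depth changes between consecutive frames.
    diffs = np.abs(np.diff(depth_seq.astype(np.float32), axis=0))
    dmm = diffs.sum(axis=0)
    # Rescale to [0, 255] so the image suits an ImageNet-pretrained ConvNet.
    dmm = 255.0 * (dmm - dmm.min()) / max(float(dmm.max() - dmm.min()), 1e-6)
    return dmm.astype(np.uint8)
```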

    Cross-stitch Networks for Multi-task Learning

    Multi-task learning in Convolutional Networks has displayed remarkable success in the field of recognition. This success can be largely attributed to learning shared representations from multiple supervisory tasks. However, existing multi-task approaches rely on enumerating multiple network architectures specific to the tasks at hand, which do not generalize. In this paper, we propose a principled approach to learn shared representations in ConvNets using multi-task learning. Specifically, we propose a new sharing unit: the "cross-stitch" unit. These units combine the activations from multiple networks and can be trained end-to-end. A network with cross-stitch units can learn an optimal combination of shared and task-specific representations. Our proposed method generalizes across multiple tasks and shows dramatically improved performance over baseline methods for categories with few training examples. Comment: To appear in CVPR 2016 (Spotlight).
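
    A minimal PyTorch sketch of a cross-stitch unit: a learned 2x2 matrix linearly mixes the activations of two task-specific streams, initialized near the identity so each stream starts mostly task-specific (the initial values here are illustrative).

```python
import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    """Learned 2x2 mixing of activations from two task streams."""

    def __init__(self):
        super().__init__()
        # Near-identity start: each stream initially keeps its own activations.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b
```

    Placed after matching layers of two single-task networks, such units are trained end-to-end together with both streams.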

    Improving shape from shading with interactive Tabu search

    Optimisation-based shape from shading (SFS) is sensitive to initialization: errors in initialization are a significant cause of poor overall shape reconstruction. In this paper, we present a method to help overcome this problem by means of user interaction. There are two key elements in our method. Firstly, we extend SFS to consider a set of initializations, rather than a single one. Secondly, we efficiently explore this initialization space using a heuristic search method, tabu search, guided by user evaluation of the reconstruction quality. Reconstruction results on both synthetic and real images demonstrate the effectiveness of our method in providing more desirable shape reconstructions.
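
    A generic tabu-search skeleton over a discrete set of SFS initializations; `reconstruct` (runs SFS from a given initialization) and `rate_quality` (standing in for the user's interactive rating) are assumed callables, not the paper's implementation.

```python
import random

def tabu_search(initializations, reconstruct, rate_quality,
                n_iters=20, tabu_size=5):
    """Return the initialization whose reconstruction rates best."""
    current = random.choice(initializations)
    best, best_score = current, rate_quality(reconstruct(current))
    tabu = [current]
    for _ in range(n_iters):
        candidates = [i for i in initializations if i not in tabu]
        if not candidates:
            break
        # Move to the best non-tabu candidate even if it scores worse than
        # the incumbent: this is how tabu search escapes local optima.
        score, current = max(((rate_quality(reconstruct(c)), c) for c in candidates),
                             key=lambda pair: pair[0])
        tabu.append(current)
        if len(tabu) > tabu_size:
            tabu.pop(0)  # the oldest move becomes admissible again
        if score > best_score:
            best, best_score = current, score
    return best
```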