
    Depth sensor based object detection using surface curvature

    An object detection system finds objects in an image or video sequence of the real world. Good object detection performance has been largely driven by the development of well-established, robust feature sets. Using conventional color images as input, researchers have achieved major success. Recent dramatic advances in depth imaging technology have triggered significant interest in revisiting object detection using depth images as input. Using depth information, we propose a feature, the Histogram of Oriented Curvature (HOC), designed specifically to capture local surface shape for object detection with depth sensors. We form the HOC feature as a concatenation of local histograms of Gaussian curvature and mean curvature. A linear Support Vector Machine (SVM) is employed for the object detection task in this work. We evaluate the proposed HOC feature on two widely used datasets and compare the results with other well-known object detection methods applied to both RGB and depth images. Our experimental results show that the proposed HOC feature generally outperforms the HOG and HOGD features in the object detection task, and achieves similar or higher results compared with the state-of-the-art depth descriptor HONV on some object categories.
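    A minimal sketch of how such a curvature-histogram descriptor could be computed, assuming the depth map is treated as a Monge patch z = f(x, y); the cell size, bin count, and curvature ranges below are illustrative choices, not the authors' parameters:

    import numpy as np

    def hoc_like_feature(depth, cell=16, bins=8,
                         k_range=(-0.01, 0.01), h_range=(-0.1, 0.1)):
        """Concatenate per-cell histograms of Gaussian (K) and mean (H) curvature."""
        fy, fx = np.gradient(depth)                  # first partial derivatives
        fyy, fyx = np.gradient(fy)                   # second partial derivatives
        fxy, fxx = np.gradient(fx)
        g = 1.0 + fx**2 + fy**2                      # Monge-patch metric term
        K = (fxx * fyy - fxy**2) / g**2              # Gaussian curvature
        H = ((1 + fx**2) * fyy - 2 * fx * fy * fxy
             + (1 + fy**2) * fxx) / (2 * g**1.5)     # mean curvature
        feats = []
        for i in range(0, depth.shape[0] - cell + 1, cell):
            for j in range(0, depth.shape[1] - cell + 1, cell):
                hk, _ = np.histogram(K[i:i+cell, j:j+cell], bins=bins, range=k_range)
                hh, _ = np.histogram(H[i:i+cell, j:j+cell], bins=bins, range=h_range)
                feats.append(np.concatenate([hk, hh]) / float(cell * cell))
        return np.concatenate(feats)

    The resulting vector would then feed a linear SVM (e.g. sklearn.svm.LinearSVC) inside a sliding-window detector.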

    Real-time Monocular Object SLAM

    We present a real-time object-based SLAM system that leverages the largest object database to date. Our approach comprises two main components: 1) a monocular SLAM algorithm that exploits object rigidity constraints to improve the map and recover its real scale, and 2) a novel object recognition algorithm based on bags of binary words, which provides live detections from a database of 500 3D objects. The two components work together and benefit each other: the SLAM algorithm accumulates information from observations of the objects, anchors object features to special map landmarks, and sets constraints on the optimization. At the same time, objects partially or fully located within the map are used as a prior to guide the recognition algorithm, achieving higher recall. We evaluate our proposal in five real environments, showing improvements in map accuracy and efficiency with respect to other state-of-the-art techniques.
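    A minimal sketch of recognition with binary features, assuming OpenCV's ORB descriptors; brute-force Hamming matching stands in for the paper's bag-of-binary-words index, so this illustrates the matching criterion but not the inverted-index speedup that makes a live 500-object database feasible:

    import cv2

    def recognize_object(query_img, object_db, ratio=0.75, min_matches=20):
        """Return the database object with the most ratio-test-filtered ORB matches."""
        orb = cv2.ORB_create(nfeatures=1000)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING)    # Hamming distance on binary words
        _, q_desc = orb.detectAndCompute(query_img, None)
        best_name, best_count = None, 0
        for name, obj_img in object_db.items():
            _, o_desc = orb.detectAndCompute(obj_img, None)
            if q_desc is None or o_desc is None:
                continue
            pairs = matcher.knnMatch(q_desc, o_desc, k=2)
            good = [p[0] for p in pairs
                    if len(p) == 2 and p[0].distance < ratio * p[1].distance]
            if len(good) > best_count:
                best_name, best_count = name, len(good)
        return best_name if best_count >= min_matches else None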

    Actionness Ranking with Lattice Conditional Ordinal Random Fields

    Action analysis in images and video has been attracting more and more attention in computer vision. Recognizing specific actions in video clips has been the main focus. We move in a new, more general direction in this paper and ask the critical fundamental questions: what is action, how is action different from motion, and in a given image or video, where is the action? We study the philosophical and visual characteristics of action, which lead us to define actionness: intentional bodily movement of biological agents (people, animals). To solve the general problem, we propose the lattice conditional ordinal random field model, which incorporates local evidence as well as neighboring order agreement. We implement the new model in the continuous domain and apply it to scoring actionness in both image and video datasets. Our experiments demonstrate not only that our new model can outperform the popular ranking SVM but also that action is indeed distinct from motion.
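    A much-simplified continuous sketch of the model's two ingredients: each lattice cell has a local actionness estimate (unary evidence), and a pairwise term encourages neighboring cells to agree. The quadratic agreement penalty below is a stand-in for the paper's ordinal potentials:

    import numpy as np

    def smooth_actionness(unary, lam=1.0, iters=200, lr=0.05):
        """Gradient descent on sum_i (y_i - u_i)^2 + lam * sum_{i~j} (y_i - y_j)^2."""
        y = unary.astype(float).copy()
        for _ in range(iters):
            grad = 2.0 * (y - unary)                 # local-evidence term
            for axis, shift in [(0, 1), (0, -1), (1, 1), (1, -1)]:
                diff = y - np.roll(y, shift, axis=axis)
                if axis == 0:                        # zero wrap-around edges
                    diff[0 if shift == 1 else -1, :] = 0.0
                else:
                    diff[:, 0 if shift == 1 else -1] = 0.0
                grad += 2.0 * lam * diff             # neighbor-agreement term
            y -= lr * grad
        return y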

    Towards Viewpoint Invariant 3D Human Pose Estimation

    We propose a viewpoint-invariant model for 3D human pose estimation from a single depth image. To achieve this, our discriminative model embeds local regions into a learned viewpoint-invariant feature space. Formulated as a multi-task learning problem, our model is able to selectively predict partial poses in the presence of noise and occlusion. Our approach leverages a convolutional and recurrent network architecture with a top-down error-feedback mechanism that self-corrects previous pose estimates in an end-to-end manner. We evaluate our model on a previously published depth dataset and a newly collected human pose dataset containing 100K annotated depth images from extreme viewpoints. Experiments show that our model achieves competitive performance on frontal views while achieving state-of-the-art performance on alternate viewpoints.
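    A minimal sketch of the error-feedback loop: instead of regressing the pose once, the model repeatedly predicts a correction to its own previous estimate. DummyPoseNet below is a hypothetical, untrained stand-in for the paper's convolutional-recurrent network:

    import numpy as np

    class DummyPoseNet:
        """Hypothetical stand-in: a fixed linear map from (pooled depth, pose) to a correction."""
        def __init__(self, n_joints=15, seed=0):
            rng = np.random.default_rng(seed)
            self.W = rng.normal(scale=0.01, size=(n_joints * 3, n_joints * 3 + 16))

        def predict_correction(self, depth, pose):
            h, w = depth.shape
            # crude 16-d depth summary: mean depth over a 4x4 grid of blocks
            feat = (depth[:h // 4 * 4, :w // 4 * 4]
                    .reshape(4, h // 4, 4, w // 4).mean(axis=(1, 3)).ravel())
            return (self.W @ np.concatenate([pose.ravel(), feat])).reshape(pose.shape)

    def estimate_pose(model, depth, n_joints=15, steps=4):
        pose = np.zeros((n_joints, 3))               # neutral initial estimate
        for _ in range(steps):
            pose = pose + model.predict_correction(depth, pose)  # self-correct
        return pose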

    Local k-NNs pattern in omni-direction graph convolution neural network for 3D point clouds

    Effective representation of objects in irregular and unordered point clouds is one of the core challenges in 3D vision. Transforming a point cloud into regular structures, such as 2D images and 3D voxels, is not ideal: it either obscures the inherent geometric information of the 3D data or results in high computational complexity. Learning permutation-invariant features directly from raw 3D point clouds with deep neural networks is a growing trend, exemplified by PointNet and its variants, which are effective and computationally efficient. However, these methods are weak at revealing the spatial structure of 3D point clouds. Our method is designed to capture both the global and local spatial layout of a point cloud through a Local k-NNs Pattern in Omni-Direction Graph Convolution Neural Network architecture, called LKPO-GNN. Our method converts the unordered 3D point cloud into an ordered 1D sequence, which facilitates feeding the raw data into neural networks while reducing computational complexity. LKPO-GNN selects multi-directional k-NNs to form the local topological structure around a centroid, which describes local shapes in the point cloud. Afterwards, a GNN is used to combine the local spatial structures and represent the unordered point cloud as a global graph. Experiments on the ModelNet40, ShapeNetPart, ScanNet, and S3DIS datasets demonstrate that our proposed method outperforms most existing methods, verifying its effectiveness and advantage. Additionally, a deep analysis illustrating the rationality of our approach, in terms of the learned topological structure features, is provided. Source code is available at https://github.com/zwj12377/LKPO-GNN.git
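    A minimal sketch of the multi-directional neighbor selection: offsets from a centroid are grouped by octant, and the nearest neighbor per octant is kept, yielding a fixed direction order that can be read off as a 1D sequence. Using 8 octants with one neighbor each is an illustrative simplification:

    import numpy as np

    def omni_directional_knn(points, centroid_idx):
        """Index of the nearest neighbor in each of 8 octants around a centroid (-1 if empty)."""
        offsets = points - points[centroid_idx]
        dists = np.linalg.norm(offsets, axis=1)
        # octant id in [0, 8): one bit per sign of (x, y, z)
        octant = ((offsets[:, 0] >= 0).astype(int)
                  + 2 * (offsets[:, 1] >= 0).astype(int)
                  + 4 * (offsets[:, 2] >= 0).astype(int))
        octant[centroid_idx] = -1                    # exclude the centroid itself
        pattern = np.full(8, -1)
        for o in range(8):
            idx = np.where(octant == o)[0]
            if idx.size:
                pattern[o] = idx[np.argmin(dists[idx])]
        return pattern                               # fixed direction order -> 1D sequence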

    Learning Structured Hough Voting for Joint Object Detection and Occlusion Reasoning

    We propose a structured Hough voting method for detecting objects with heavy occlusion in indoor environments. First, we extend the Hough hypothesis space to include both object location and its visibility pattern, and design a new score function that ac…
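    A minimal sketch of the plain Hough voting the paper builds on: local features cast votes for the object center via learned offsets, and the strongest accumulator cell becomes the detection hypothesis. The codebook of integer (dy, dx) offsets is a hypothetical input that a real system would learn from training data:

    import numpy as np

    def hough_vote(feature_positions, feature_words, codebook_offsets, img_shape):
        """Accumulate object-center votes from local features; return the peak cell."""
        acc = np.zeros(img_shape)
        for (y, x), w in zip(feature_positions, feature_words):
            for dy, dx in codebook_offsets.get(w, []):
                cy, cx = y + dy, x + dx              # voted center location
                if 0 <= cy < img_shape[0] and 0 <= cx < img_shape[1]:
                    acc[cy, cx] += 1                 # one vote per matched offset
        return np.unravel_index(np.argmax(acc), acc.shape)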