8,547 research outputs found

    Robust Duplicate Detection of 2D and 3D Objects

    In this paper, we analyze our graph-based approach for 2D and 3D object duplicate detection in still images. A graph model is used to represent the 3D spatial information of the object based on the features extracted from training images, so that explicit and complex 3D object modeling is avoided. As a result, improved performance can be achieved in comparison to existing methods in terms of both robustness and computational complexity. The limitations of our approach are analyzed by evaluating performance with respect to the number of training images and by calculating optimal parameters in a number of applications. Furthermore, the effectiveness of our object duplicate detection algorithm is measured over different object classes. Our method is shown to be robust in detecting the same objects even when the images containing them are taken from very different viewpoints or distances.

    Object Duplicate Detection

    With the evolution of digital acquisition and storage technologies, millions of images and video sequences are captured every day and shared in online services. One way of exploring this huge volume of images and videos is to search for a particular object depicted in them by means of object duplicate detection. The need for research on object duplicate detection is therefore motivated by several image and video retrieval applications, such as tag propagation, augmented reality, surveillance, mobile visual search, and television statistic measurement. Object duplicate detection is the task of detecting objects that are visually the same as, or very similar to, a query. The input is not restricted to a single image; it can be several images of an object or even a video. This dissertation describes the author's contributions to solving problems in object duplicate detection in computer vision. A novel graph-based approach is introduced for 2D and 3D object duplicate detection in still images. A graph model is used to represent the 3D spatial information of the object based on the local features extracted from training images, so that explicit and complex 3D object modeling is avoided. As a result, improved performance can be achieved in comparison to existing methods in terms of both robustness and computational complexity. Our method is shown to be robust in detecting the same objects even when images containing the objects are taken from very different viewpoints or distances. Furthermore, we apply our object duplicate detection method to video, where the training images are added iteratively to the video sequence in order to compensate for 3D view variations, illumination changes and partial occlusions. Finally, we show several mobile applications for object duplicate detection, such as an object recognition based museum guide, money recognition and flower recognition.
    General object duplicate detection may fail to detect chess figures; however, by considering context, such as chess board position and the height of the chess figure, detection can be made more accurate. We show, through a game called Epitome, that user interaction further improves image retrieval compared to pure content-based methods.

    3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks

    Human activity understanding with 3D/depth sensors has received increasing attention in multimedia processing and interactions. This work aims to develop a novel deep model for automatic activity recognition from RGB-D videos. We represent each human activity as an ensemble of cubic-like video segments, and learn to discover the temporal structures for a category of activities, i.e., how the activities are to be decomposed for classification. Our model can be regarded as a structured deep architecture, as it extends convolutional neural networks (CNNs) by incorporating structure alternatives. Specifically, we build the network from 3D convolutions and max-pooling operators over the video segments, and introduce latent variables in each convolutional layer that manipulate the activation of neurons. Our model thus advances existing approaches in two aspects: (i) it acts directly on the raw inputs (grayscale-depth data) to conduct recognition instead of relying on hand-crafted features, and (ii) the model structure can be dynamically adjusted to account for the temporal variations of human activities, i.e., the network configuration is allowed to be partially activated during inference. For model training, we propose an EM-type optimization method that iteratively (i) discovers the latent structure by determining the decomposed actions for each training example, and (ii) learns the network parameters by using the back-propagation algorithm. Our approach is validated in challenging scenarios, and outperforms state-of-the-art methods. A large human activity database of RGB-D videos is presented in addition. Comment: This manuscript has 10 pages with 9 figures, and a preliminary version was published in ACM MM'14 conference

    Autonomous Robot Navigation with Rich Information Mapping in Nuclear Storage Environments

    This paper presents our method for an unmanned ground vehicle (UGV) to perform inspection tasks in nuclear environments using rich information maps. To reduce inspectors' exposure to elevated radiation levels, an autonomous navigation framework for the UGV has been developed to perform routine inspections such as counting containers, recording their ID tags and performing gamma measurements on some of them. To achieve autonomy, a rich information map is generated which includes not only the 2D global cost map consisting of obstacle locations for path planning, but also the location and orientation information for the objects of interest from the inspector's perspective. The UGV's autonomy framework uses this information to prioritize the locations it navigates to in order to perform the inspections. In this paper, we present our method of generating this rich information map, originally developed to meet the requirements of the International Atomic Energy Agency (IAEA) Robotics Challenge. We demonstrate the performance of our method in a simulated testbed environment containing uranium hexafluoride (UF6) storage container mock-ups.
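As a rough illustration of the "rich information map" idea in the abstract above, the sketch below pairs a 2D cost grid with a list of objects of interest that carry a position and an orientation. All names (`ObjectOfInterest`, `RichInfoMap`, `inspection_order`) are hypothetical, and the nearest-first ordering is only an assumed stand-in for whatever prioritization the paper actually uses.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ObjectOfInterest:
    tag: str      # e.g. a container ID tag (hypothetical field)
    x: float      # position in map coordinates
    y: float
    yaw: float    # orientation in radians


@dataclass
class RichInfoMap:
    cost: list                                   # 2D grid of obstacle costs
    objects: list = field(default_factory=list)  # objects of interest

    def inspection_order(self, robot_xy):
        """Order objects nearest-first (an assumed priority rule)."""
        rx, ry = robot_xy
        return sorted(self.objects,
                      key=lambda o: math.hypot(o.x - rx, o.y - ry))


grid = [[0, 0, 0], [0, 100, 0], [0, 0, 0]]  # 100 marks an obstacle cell
rich_map = RichInfoMap(cost=grid, objects=[
    ObjectOfInterest("A", 5.0, 0.0, 0.0),
    ObjectOfInterest("B", 1.0, 0.0, math.pi / 2),
])
order = [o.tag for o in rich_map.inspection_order((0.0, 0.0))]
```

Keeping the object list separate from the cost grid lets a path planner consume the grid unchanged while the inspection logic reads only the object poses.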

    Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene

    The goal of this paper is to take a single 2D image of a scene and recover the 3D structure in terms of a small set of factors: a layout representing the enclosing surfaces as well as a set of objects represented in terms of shape and pose. We propose a convolutional neural network-based approach to predict this representation and benchmark it on a large dataset of indoor scenes. Our experiments evaluate a number of practical design questions, demonstrate that we can infer this representation, and quantitatively and qualitatively demonstrate its merits compared to alternate representations. Comment: Project url with code: https://shubhtuls.github.io/factored3

    DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration

    We present DeepICP - a novel end-to-end learning-based 3D point cloud registration framework that achieves registration accuracy comparable to prior state-of-the-art geometric methods. Unlike other keypoint-based methods, where a RANSAC procedure is usually needed, we use various deep neural network structures to establish an end-to-end trainable network. Our keypoint detector is trained through this end-to-end structure and enables the system to avoid the inference of dynamic objects, leverage sufficiently salient features on stationary objects, and, as a result, achieve high robustness. Rather than searching for the corresponding points among existing points, the key contribution is that we generate them based on learned matching probabilities among a group of candidates, which can boost the registration accuracy. Our loss function incorporates both local similarity and global geometric constraints to ensure that all of the above network designs converge in the right direction. We comprehensively validate the effectiveness of our approach using both the KITTI dataset and the Apollo-SouthBay dataset. Results demonstrate that our method achieves comparable or better performance than the state-of-the-art geometry-based methods. Detailed ablation and visualization analyses are included to further illustrate the behavior and insights of our network. The low registration error and high robustness of our method make it attractive for substantial applications relying on the point cloud registration task. Comment: 10 pages, 6 figures, 3 tables, typos corrected, experimental results updated, accepted by ICCV 201
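The idea of generating a corresponding point from matching probabilities, rather than picking an existing point, can be sketched as a probability-weighted average over candidates. This is a minimal illustration under stated assumptions, not the DeepICP network: the function name `soft_correspondence` is hypothetical, and the softmax over fixed similarity scores stands in for the learned matching probabilities.

```python
import math


def soft_correspondence(similarities, candidates):
    """Return the probability-weighted mean of candidate 3D points.

    similarities: one score per candidate (assumed given; learned in the paper)
    candidates:   list of (x, y, z) tuples
    """
    # Softmax over the scores, shifted by the max for numerical stability.
    top = max(similarities)
    exps = [math.exp(s - top) for s in similarities]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The generated point is a convex combination of the candidates.
    return tuple(
        sum(p * c[axis] for p, c in zip(probs, candidates))
        for axis in range(3)
    )


candidates = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
midpoint = soft_correspondence([1.0, 1.0], candidates)     # equal scores
near_first = soft_correspondence([50.0, 0.0], candidates)  # first dominates
```

Because the output is a weighted sum rather than an arg-max selection, it varies smoothly with the scores, which is what makes this kind of step usable inside an end-to-end trainable pipeline.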