3,593 research outputs found

    FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image

    Full text link
    In this work, we introduce the novel problem of identifying dense canonical 3D coordinate frames from a single RGB image. We observe that each pixel in an image corresponds to a surface in the underlying 3D geometry, where a canonical frame can be identified as represented by three orthogonal axes, one along its normal direction and two in its tangent plane. We propose an algorithm to predict these axes from RGB. Our first insight is that canonical frames computed automatically with recently introduced direction field synthesis methods can provide training data for the task. Our second insight is that networks designed for surface normal prediction provide better results when trained jointly to predict canonical frames, and even better when trained to also predict 2D projections of canonical frames. We conjecture this is because projections of canonical tangent directions often align with local gradients in images, and because those directions are tightly linked to 3D canonical frames through projective geometry and orthogonality constraints. In our experiments, we find that our method predicts 3D canonical frames that can be used in applications ranging from surface normal estimation, feature matching, and augmented reality

    Development of An Android Application for Object Detection Based on Color, Shape, or Local Features

    Full text link
    Object detection and recognition is an important task in many computer vision applications. In this paper an Android application was developed using Eclipse IDE and OpenCV3 Library. This application is able to detect objects in an image that is loaded from the mobile gallery, based on its color, shape, or local features. The image is processed in the HSV color domain for better color detection. Circular shapes are detected using Circular Hough Transform and other shapes are detected using Douglas-Peucker algorithm. BRISK (binary robust invariant scalable keypoints) local features were applied in the developed Android application for matching an object image in another scene image. The steps of the proposed detection algorithms are described, and the interfaces of the application are illustrated. The application is ported and tested on Galaxy S3, S6, and Note1 Smartphones. Based on the experimental results, the application is capable of detecting eleven different colors, detecting two dimensional geometrical shapes including circles, rectangles, triangles, and squares, and correctly match local features of object and scene images for different conditions. The application could be used as a standalone application, or as a part of another application such as Robot systems, traffic systems, e-learning applications, information retrieval and many others

    Estimating 6D Pose From Localizing Designated Surface Keypoints

    Full text link
    In this paper, we present an accurate yet effective solution for 6D pose estimation from an RGB image. The core of our approach is that we first designate a set of surface points on target object model as keypoints and then train a keypoint detector (KPD) to localize them. Finally a PnP algorithm can recover the 6D pose according to the 2D-3D relationship of keypoints. Different from recent state-of-the-art CNN-based approaches that rely on a time-consuming post-processing procedure, our method can achieve competitive accuracy without any refinement after pose prediction. Meanwhile, we obtain a 30% relative improvement in terms of ADD accuracy among methods without using refinement. Moreover, we succeed in handling heavy occlusion by selecting the most confident keypoints to recover the 6D pose. For the sake of reproducibility, we will make our code and models publicly available soon

    Drought Stress Classification using 3D Plant Models

    Full text link
    Quantification of physiological changes in plants can capture different drought mechanisms and assist in selection of tolerant varieties in a high throughput manner. In this context, an accurate 3D model of plant canopy provides a reliable representation for drought stress characterization in contrast to using 2D images. In this paper, we propose a novel end-to-end pipeline including 3D reconstruction, segmentation and feature extraction, leveraging deep neural networks at various stages, for drought stress study. To overcome the high degree of self-similarities and self-occlusions in plant canopy, prior knowledge of leaf shape based on features from deep siamese network are used to construct an accurate 3D model using structure from motion on wheat plants. The drought stress is characterized with a deep network based feature aggregation. We compare the proposed methodology on several descriptors, and show that the network outperforms conventional methods.Comment: Appears in Workshop on Computer Vision Problems in Plant Phenotyping (CVPPP), International Conference on Computer Vision (ICCV) 201

    A Hierarchical Distributed Processing Framework for Big Image Data

    Full text link
    This paper introduces an effective processing framework nominated ICP (Image Cloud Processing) to powerfully cope with the data explosion in image processing field. While most previous researches focus on optimizing the image processing algorithms to gain higher efficiency, our work dedicates to providing a general framework for those image processing algorithms, which can be implemented in parallel so as to achieve a boost in time efficiency without compromising the results performance along with the increasing image scale. The proposed ICP framework consists of two mechanisms, i.e. SICP (Static ICP) and DICP (Dynamic ICP). Specifically, SICP is aimed at processing the big image data pre-stored in the distributed system, while DICP is proposed for dynamic input. To accomplish SICP, two novel data representations named P-Image and Big-Image are designed to cooperate with MapReduce to achieve more optimized configuration and higher efficiency. DICP is implemented through a parallel processing procedure working with the traditional processing mechanism of the distributed system. Representative results of comprehensive experiments on the challenging ImageNet dataset are selected to validate the capacity of our proposed ICP framework over the traditional state-of-the-art methods, both in time efficiency and quality of results

    Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency

    Full text link
    In this paper, we introduce a novel unsupervised domain adaptation technique for the task of 3D keypoint prediction from a single depth scan or image. Our key idea is to utilize the fact that predictions from different views of the same or similar objects should be consistent with each other. Such view consistency can provide effective regularization for keypoint prediction on unlabeled instances. In addition, we introduce a geometric alignment term to regularize predictions in the target domain. The resulting loss function can be effectively optimized via alternating minimization. We demonstrate the effectiveness of our approach on real datasets and present experimental results showing that our approach is superior to state-of-the-art general-purpose domain adaptation techniques.Comment: ECCV 201

    Texture Object Segmentation Based on Affine Invariant Texture Detection

    Full text link
    To solve the issue of segmenting rich texture images, a novel detection methods based on the affine invariable principle is proposed. Considering the similarity between the texture areas, we first take the affine transform to get numerous shapes, and utilize the KLT algorithm to verify the similarity. The transforms include rotation, proportional transformation and perspective deformation to cope with a variety of situations. Then we propose an improved LBP method combining canny edge detection to handle the boundary in the segmentation process. Moreover, human-computer interaction of this method which helps splitting the matched texture area from the original images is user-friendly.Comment: 6pages, 15 figure

    ENFT: Efficient Non-Consecutive Feature Tracking for Robust Structure-from-Motion

    Full text link
    Structure-from-motion (SfM) largely relies on feature tracking. In image sequences, if disjointed tracks caused by objects moving in and out of the field of view, occasional occlusion, or image noise, are not handled well, corresponding SfM could be affected. This problem becomes severer for large-scale scenes, which typically requires to capture multiple sequences to cover the whole scene. In this paper, we propose an efficient non-consecutive feature tracking (ENFT) framework to match interrupted tracks distributed in different subsequences or even in different videos. Our framework consists of steps of solving the feature `dropout' problem when indistinctive structures, noise or large image distortion exists, and of rapidly recognizing and joining common features located in different subsequences. In addition, we contribute an effective segment-based coarse-to-fine SfM algorithm for robustly handling large datasets. Experimental results on challenging video data demonstrate the effectiveness of the proposed system.Comment: 15 pages, 12 figure

    Simultaneous Joint and Object Trajectory Templates for Human Activity Recognition from 3-D Data

    Full text link
    The availability of low-cost range sensors and the development of relatively robust algorithms for the extraction of skeleton joint locations have inspired many researchers to develop human activity recognition methods using the 3-D data. In this paper, an effective method for the recognition of human activities from the normalized joint trajectories is proposed. We represent the actions as multidimensional signals and introduce a novel method for generating action templates by averaging the samples in a "dynamic time" sense. Then in order to deal with the variations in the speed and style of performing actions, we warp the samples to the action templates by an efficient algorithm and employ wavelet filters to extract meaningful spatiotemporal features. The proposed method is also capable of modeling the human-object interactions, by performing the template generation and temporal warping procedure via the joint and object trajectories simultaneously. The experimental evaluation on several challenging datasets demonstrates the effectiveness of our method compared to the state-of-the-arts

    Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

    Get PDF
    This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises the cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning with respect to the problem has not only revealed the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey not only presents an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and to look up for a possible solution accordingly
    • …
    corecore