2,884 research outputs found

    Appearance Descriptors for Person Re-identification: a Comprehensive Review

    In video surveillance, person re-identification is the task of recognising whether an individual has already been observed over a network of cameras. Typically, this is achieved by exploiting the clothing appearance, as classical biometric traits like the face are impractical in real-world video surveillance scenarios. Clothing appearance is represented by means of low-level local and/or global features of the image, usually extracted according to some part-based body model so that different body parts (e.g. torso and legs) are treated independently. This paper provides a comprehensive review of current approaches to building appearance descriptors for person re-identification. The most relevant techniques are described in detail and categorised according to the body models and features used. The aim of this work is to provide a structured body of knowledge and a starting point for researchers willing to conduct novel investigations on this challenging topic.
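
    As a rough illustration of the kind of part-based descriptor this review surveys, the sketch below splits a pedestrian image into fixed horizontal stripes (head, torso, legs) and concatenates per-part colour histograms; the stripe boundaries, bin count and distance measure are illustrative assumptions, not taken from any particular method covered by the review.

```python
import numpy as np

def part_based_descriptor(image, parts=((0.0, 0.2), (0.2, 0.6), (0.6, 1.0)), bins=8):
    """Concatenate per-part colour histograms (head, torso, legs).

    `image` is an HxWx3 array; the part boundaries are fractions of the
    image height (a hypothetical fixed body model, for illustration only).
    """
    h = image.shape[0]
    descriptor = []
    for top, bottom in parts:
        strip = image[int(top * h):int(bottom * h)]
        # one normalised histogram per colour channel
        for c in range(3):
            hist, _ = np.histogram(strip[..., c], bins=bins, range=(0, 255))
            descriptor.append(hist / max(hist.sum(), 1))
    return np.concatenate(descriptor)

# toy usage: compare two detections with a simple Euclidean distance
a = part_based_descriptor(np.random.randint(0, 256, (128, 48, 3)))
b = part_based_descriptor(np.random.randint(0, 256, (128, 48, 3)))
print(np.linalg.norm(a - b))
```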

    Drought Stress Classification using 3D Plant Models

    Quantification of physiological changes in plants can capture different drought mechanisms and assist in the selection of tolerant varieties in a high-throughput manner. In this context, an accurate 3D model of the plant canopy provides a more reliable representation for drought stress characterization than 2D images. In this paper, we propose a novel end-to-end pipeline for drought stress study, including 3D reconstruction, segmentation and feature extraction, leveraging deep neural networks at various stages. To overcome the high degree of self-similarity and self-occlusion in the plant canopy, prior knowledge of leaf shape, based on features from a deep siamese network, is used to construct an accurate 3D model of wheat plants using structure from motion. Drought stress is then characterized with deep-network-based feature aggregation. We compare the proposed methodology against several descriptors and show that the network outperforms conventional methods. Comment: Appears in Workshop on Computer Vision Problems in Plant Phenotyping (CVPPP), International Conference on Computer Vision (ICCV) 201
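
    The pipeline's leaf-shape prior comes from a deep siamese network; below is a minimal PyTorch sketch of a siamese embedding trained with a contrastive loss on pairs of leaf patches. The architecture, margin and patch size are placeholder assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

class LeafEmbedding(nn.Module):
    """Tiny convolutional branch shared by both arms of a siamese network."""
    def __init__(self, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )

    def forward(self, x):
        return nn.functional.normalize(self.features(x), dim=1)

def contrastive_loss(za, zb, same, margin=1.0):
    """Pull embeddings of matching leaf patches together, push non-matching apart."""
    d = (za - zb).pow(2).sum(dim=1).sqrt()
    return (same * d.pow(2) + (1 - same) * (margin - d).clamp(min=0).pow(2)).mean()

net = LeafEmbedding()
a, b = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)   # toy patch pairs
loss = contrastive_loss(net(a), net(b), torch.tensor([1., 0., 1., 0.]))
print(loss.item())
```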

    DF-SLAM: A Deep-Learning Enhanced Visual SLAM System based on Deep Local Features

    As the foundation of driverless vehicles and intelligent robots, Simultaneous Localization and Mapping (SLAM) has attracted much attention in recent years. However, the non-geometric modules of traditional SLAM algorithms are limited by data association tasks and have become a bottleneck preventing the development of SLAM. To deal with such problems, many researchers have turned to deep learning for help, but most of these studies are limited to virtual datasets or specific environments, and some even sacrifice efficiency for accuracy, so they are not practical enough. We propose the DF-SLAM system, which uses deep local feature descriptors obtained by a neural network as a substitute for traditional hand-crafted features. Experimental results demonstrate its improvements in efficiency and stability. DF-SLAM outperforms popular traditional SLAM systems in various scenes, including challenging scenes with intense illumination changes. Its versatility and mobility fit well with the need to explore new environments. Since we adopt a shallow network to extract local descriptors and keep the rest of the pipeline the same as in the original SLAM system, our DF-SLAM can still run in real time on a GPU.
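
    Since the learned descriptors simply stand in for hand-crafted ones in the SLAM front end, the data-association step reduces to nearest-neighbour descriptor matching; a minimal sketch is given below, assuming L2-normalised descriptors and a Lowe-style ratio test (both illustrative choices, not details taken from the paper).

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with a ratio test.

    desc_a, desc_b: (N, D) and (M, D) L2-normalised learned descriptors.
    Returns index pairs (i, j) of putative correspondences.
    """
    # pairwise Euclidean distances between the two descriptor sets
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:       # keep only unambiguous matches
            matches.append((i, j1))
    return matches

# toy usage with random unit-norm descriptors
a = np.random.randn(100, 128).astype(np.float32)
a /= np.linalg.norm(a, axis=1, keepdims=True)
b = np.random.randn(120, 128).astype(np.float32)
b /= np.linalg.norm(b, axis=1, keepdims=True)
print(len(match_descriptors(a, b)))
```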

    A Performance Evaluation of Local Features for Image Based 3D Reconstruction

    This paper performs a comprehensive and comparative evaluation of state-of-the-art local features for the task of image-based 3D reconstruction. The evaluated local features cover both recently developed features learned with powerful machine learning techniques and elaborately designed handcrafted features. To obtain a comprehensive evaluation, we include both float-type and binary features. Two kinds of datasets are used in this evaluation. One is a dataset of many different scene types with ground-truth 3D points, containing images of different scenes captured at fixed positions, for quantitative performance evaluation of different local features under controlled image-capturing conditions. The other contains Internet-scale image sets of several landmarks with many unrelated images, used for qualitative performance evaluation of different local features on free image collections. Our experimental results show that binary features are able to reconstruct scenes from controlled image sequences in only a fraction of the processing time required by float-type features. However, for large-scale image sets with many distracting images, float-type features show a clear advantage over binary ones.
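
    The efficiency gap between the two feature families is usually attributed to the matching cost: binary descriptors are compared with Hamming distance on packed bits, float descriptors with Euclidean distance. The toy comparison below illustrates the two distance computations; descriptor sizes and the timing harness are arbitrary and do not reproduce the paper's evaluation protocol.

```python
import time
import numpy as np

def hamming_distances(a, b):
    """Pairwise Hamming distances between packed binary descriptors (uint8 bytes)."""
    xor = np.bitwise_xor(a[:, None, :], b[None, :, :])
    return np.unpackbits(xor, axis=2).sum(axis=2)

def l2_distances(a, b):
    """Pairwise Euclidean distances between float descriptors."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

# 256-bit binary descriptors packed into 32 bytes (ORB-like) versus
# 128-D float descriptors (SIFT-like); sizes chosen only for illustration
bin_a = np.random.randint(0, 256, (300, 32), dtype=np.uint8)
bin_b = np.random.randint(0, 256, (300, 32), dtype=np.uint8)
flt_a = np.random.rand(300, 128).astype(np.float32)
flt_b = np.random.rand(300, 128).astype(np.float32)

t0 = time.perf_counter(); hamming_distances(bin_a, bin_b)
t1 = time.perf_counter(); l2_distances(flt_a, flt_b)
t2 = time.perf_counter()
print(f"hamming: {t1 - t0:.4f}s  euclidean: {t2 - t1:.4f}s")
```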

    From BoW to CNN: Two Decades of Texture Representation for Texture Classification

    Texture is a fundamental characteristic of many types of images, and texture representation is one of the essential and challenging problems in computer vision and pattern recognition, one that has attracted extensive research attention. Since 2000, texture representations based on Bag of Words (BoW) and on Convolutional Neural Networks (CNNs) have been extensively studied with impressive performance. Given this period of remarkable evolution, this paper aims to present a comprehensive survey of advances in texture representation over the last two decades. More than 200 major publications are cited in this survey, covering different aspects of the research, including (i) problem description; (ii) recent advances in the broad categories of BoW-based, CNN-based and attribute-based methods; and (iii) evaluation issues, specifically benchmark datasets and state-of-the-art results. Looking back at what has been achieved so far, the survey discusses open challenges and directions for future research. Comment: Accepted by IJC
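
    For readers unfamiliar with the BoW side of the survey, a minimal sketch of the classical pipeline follows: cluster local texture descriptors into a visual vocabulary, then represent each image as a normalised histogram of visual words. The descriptor dimensionality and vocabulary size are arbitrary placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(local_descriptors, k=64):
    """Learn a visual vocabulary by clustering local texture descriptors."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(local_descriptors)

def bow_histogram(codebook, image_descriptors):
    """Quantise an image's local descriptors into a normalised word histogram."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# toy data standing in for filter-bank responses or densely sampled patches
train = np.random.rand(5000, 32)
codebook = build_codebook(train, k=64)
print(bow_histogram(codebook, np.random.rand(300, 32)).shape)
```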

    Orientation Driven Bag of Appearances for Person Re-identification

    Person re-identification (re-id) consists of associating individuals across a camera network, which is valuable for intelligent video surveillance and has drawn wide attention. Although person re-identification research is making progress, it still faces challenges such as varying poses, illumination and viewpoints. For feature representation in re-identification, existing works usually use low-level descriptors that do not take full advantage of body structure information, resulting in low representation ability. To solve this problem, this paper proposes a mid-level body-structure-based feature representation (BSFR) which introduces a body structure pyramid for codebook learning and feature pooling in the vertical direction of the human body. Besides, varying viewpoints in the horizontal direction of the human body usually cause a data missing problem, i.e., the appearances obtained in different orientations of the same person can vary significantly. To address this problem, the orientation driven bag of appearances (ODBoA) is proposed to utilize person orientation information extracted by an orientation estimation technique. To properly evaluate the proposed approach, we introduce a new re-identification dataset (Market-1203) based on the Market-1501 dataset and propose a new re-identification dataset (PKU-Reid). Both datasets contain multiple images captured in different body orientations for each person. Experimental results on three public datasets and two proposed datasets demonstrate the superiority of the proposed approach, indicating the effectiveness of body structure and orientation information for improving re-identification performance. Comment: 13 pages, 15 figures, 3 tables, submitted to IEEE Transactions on Circuits and Systems for Video Technology
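
    A rough sketch of the vertical body-structure pyramid pooling idea is shown below: given a per-pixel codeword map (assumed to come from an earlier codebook-assignment step), word histograms are pooled over vertical stripes at several pyramid levels and concatenated. The stripe counts and codebook size are illustrative assumptions, not the exact BSFR configuration.

```python
import numpy as np

def body_structure_pyramid_pooling(word_map, n_words=128, levels=(1, 2, 4)):
    """Pool visual-word histograms over vertical body stripes at several pyramid levels.

    `word_map` is an HxW array of codeword indices for one pedestrian image;
    each level splits the body vertically into that many stripes.
    """
    h = word_map.shape[0]
    pooled = []
    for level in levels:
        for s in range(level):
            stripe = word_map[s * h // level:(s + 1) * h // level]
            hist = np.bincount(stripe.ravel(), minlength=n_words).astype(float)
            pooled.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(pooled)

# toy usage: a random codeword map for a 128x48 pedestrian image
desc = body_structure_pyramid_pooling(np.random.randint(0, 128, (128, 48)))
print(desc.shape)   # (1 + 2 + 4) stripes x 128 words
```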

    Learning a 3D descriptor for cross-source point cloud registration from synthetic data

    With the development of 3D sensors, registration of 3D data (e.g. point clouds) coming from different kinds of sensors has become indispensable and is in great demand. However, point cloud registration between different sensors is challenging because of variations in density, missing data, differing viewpoints, noise and outliers, and geometric transformations. In this paper, we propose a method to learn a 3D descriptor for finding correspondences between these challenging point clouds. To train the deep learning framework, we use synthetic 3D point clouds as input. Starting from a synthetic dataset, we use a region-based sampling method to select reasonable, large and diverse training samples. Then, we use data augmentation to make our network robust to rotation transformations. We focus our work on the more general case in which point clouds come from different sensors, termed cross-source point clouds. The experiments show that our descriptor is able to generalize not only to new scenes but also to different sensors. The results demonstrate that the proposed method successfully aligns two 3D cross-source point clouds and outperforms state-of-the-art methods.
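
    The rotation augmentation mentioned above can be pictured as replicating each synthetic training cloud under random 3D rotations; a minimal numpy sketch follows, with Euler-angle sampling and the number of copies chosen purely for illustration.

```python
import numpy as np

def random_rotation_matrix(rng):
    """Random 3D rotation built from three random Euler angles."""
    a, b, c = rng.uniform(0, 2 * np.pi, size=3)
    rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return rz @ ry @ rx

def augment(points, n_copies=8, seed=0):
    """Rotation augmentation: replicate a synthetic point cloud under random rotations."""
    rng = np.random.default_rng(seed)
    return [points @ random_rotation_matrix(rng).T for _ in range(n_copies)]

cloud = np.random.rand(1024, 3)          # stand-in for one synthetic Nx3 training sample
for rotated in augment(cloud, n_copies=2):
    print(rotated.shape)
```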

    Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation

    How do computers and intelligent agents view the world around them? Feature extraction and representation constitutes one of the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is no "one size fits all" approach that satisfies all requirements. In recent years, the rising popularity of deep learning has resulted in a myriad of end-to-end solutions to many computer vision problems. These approaches, while successful, tend to lack scalability and can't easily exploit information learned by other systems. Instead, we propose SAND features, a dedicated deep learning solution to feature extraction capable of providing hierarchical context information. This is achieved by employing sparse relative labels indicating relationships of similarity/dissimilarity between image locations. The nature of these labels results in an almost infinite set of dissimilar examples to choose from. We demonstrate how the selection of negative examples during training can be used to modify the feature space and vary its properties. To demonstrate the generality of this approach, we apply the proposed features to a multitude of tasks, each requiring different properties. This includes disparity estimation, semantic segmentation, self-localisation and SLAM. In all cases, we show how incorporating SAND features results in better or comparable results to the baseline, whilst requiring little to no additional training. Code can be found at: https://github.com/jspenmar/SAND_features Comment: CVPR201
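
    The sparse relative labels amount to pairs of image locations marked similar or dissimilar, which suggests a pairwise hinge-style objective over the dense feature map. The PyTorch sketch below shows one such loss; the margin and the way pairs are supplied are placeholder assumptions, and varying how the negative pairs are sampled is what would steer the feature-space properties, as the abstract describes.

```python
import torch
import torch.nn.functional as F

def relative_label_loss(feat, pos_pairs, neg_pairs, margin=0.5):
    """Hinge-style loss on sparse relative labels over a dense feature map.

    feat: (C, H, W) dense features; pos_pairs / neg_pairs: lists of
    ((y1, x1), (y2, x2)) pixel pairs marked similar / dissimilar.
    """
    def dist(p, q):
        return (feat[:, p[0], p[1]] - feat[:, q[0], q[1]]).pow(2).sum().sqrt()

    pos = torch.stack([dist(p, q) for p, q in pos_pairs]).pow(2).mean()
    neg = torch.stack([F.relu(margin - dist(p, q)) for p, q in neg_pairs]).pow(2).mean()
    return pos + neg

# toy usage on a random, channel-normalised feature map
feat = F.normalize(torch.randn(16, 64, 64, requires_grad=True), dim=0)
loss = relative_label_loss(feat,
                           pos_pairs=[((3, 4), (3, 5))],
                           neg_pairs=[((3, 4), (50, 60)), ((10, 10), (40, 2))])
print(loss.item())
```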

    Data-Driven Shape Analysis and Processing

    Data-driven methods play an increasingly important role in discovering geometric, structural, and semantic relationships between 3D shapes in collections, and applying this analysis to support intelligent modeling, editing, and visualization of geometric data. In contrast to traditional approaches, a key feature of data-driven approaches is that they aggregate information from a collection of shapes to improve the analysis and processing of individual shapes. In addition, they are able to learn models that reason about properties and relationships of shapes without relying on hard-coded rules or explicitly programmed instructions. We provide an overview of the main concepts and components of these techniques, and discuss their application to shape classification, segmentation, matching, reconstruction, modeling and exploration, as well as scene analysis and synthesis, by reviewing the literature and relating existing works with both qualitative and numerical comparisons. We conclude our report with ideas that can inspire future research in data-driven shape analysis and processing. Comment: 10 pages, 19 figures

    Recovering 6D Object Pose and Predicting Next-Best-View in the Crowd

    Object detection and 6D pose estimation in the crowd (scenes with multiple object instances, severe foreground occlusions and background distractors) has become an important problem in many rapidly evolving technological areas such as robotics and augmented reality. Single-shot 6D pose estimators with manually designed features are still unable to tackle the above challenges, motivating research towards unsupervised feature learning and next-best-view estimation. In this work, we present a complete framework for both single-shot 6D object pose estimation and next-best-view prediction based on Hough Forests, a state-of-the-art object pose estimator that performs classification and regression jointly. Rather than using manually designed features, we a) propose unsupervised features learnt from depth-invariant patches using a Sparse Autoencoder and b) offer an extensive evaluation of various state-of-the-art features. Furthermore, taking advantage of the clustering performed in the leaf nodes of Hough Forests, we learn to estimate the reduction of uncertainty in other views, formulating the problem of selecting the next best view. To further improve pose estimation, we propose an improved joint registration and hypotheses verification module as a final refinement step to reject false detections. We provide two additional challenging datasets inspired by realistic scenarios to extensively evaluate the state of the art and our framework; one is related to domestic environments and the other depicts a bin-picking scenario mostly found in industrial settings. We show that our framework significantly outperforms the state of the art both on public datasets and on ours. Comment: CVPR 2016 accepted paper, project page: http://www.iis.ee.ic.ac.uk/rkouskou/6D_NBV.htm
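
    The unsupervised features come from a Sparse Autoencoder trained on depth-invariant patches; a minimal PyTorch sketch of such an autoencoder is given below, using an L1 penalty on the code as a stand-in for whatever sparsity regulariser the authors actually use, with the patch size and hidden width as arbitrary assumptions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Single-hidden-layer autoencoder for flattened depth patches."""
    def __init__(self, patch_dim=16 * 16, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(patch_dim, hidden), nn.Sigmoid())
        self.decoder = nn.Linear(hidden, patch_dim)

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

def sparse_ae_loss(recon, target, code, sparsity_weight=1e-3):
    """Reconstruction error plus an L1 penalty that keeps the code sparse."""
    return nn.functional.mse_loss(recon, target) + sparsity_weight * code.abs().mean()

model = SparseAutoencoder()
patches = torch.rand(32, 16 * 16)        # stand-in for depth-invariant patches
recon, code = model(patches)
print(sparse_ae_loss(recon, patches, code).item())
```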