
    DART: Distribution Aware Retinal Transform for Event-based Cameras

    We introduce a generic visual descriptor, termed the Distribution Aware Retinal Transform (DART), that encodes structural context using log-polar grids for event cameras. The DART descriptor is applied to four different problems, namely object classification, tracking, detection, and feature matching: (1) The DART features are directly employed as local descriptors in a bag-of-features classification framework, and testing is carried out on four standard event-based object datasets (N-MNIST, MNIST-DVS, CIFAR10-DVS, NCaltech-101). (2) Extending the classification system, tracking is demonstrated using two key novelties: (i) to overcome the low-sample problem in the one-shot learning of a binary classifier, statistical bootstrapping is leveraged with online learning; (ii) to achieve tracker robustness, the scale and rotation equivariance of the DART descriptors is exploited for the one-shot learning. (3) To solve the long-term object tracking problem, an object detector is designed using the principle of cluster majority voting. The detection scheme is then combined with the tracker, resulting in a high intersection-over-union score against augmented ground-truth annotations on the publicly available event camera dataset. (4) Finally, the event context encoded by DART greatly simplifies the feature correspondence problem, especially for spatio-temporal slices far apart in time, a setting that has not been explicitly tackled in the event-based vision domain.
    Comment: 12 pages, revision submitted to TPAMI in Nov 201
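    A minimal sketch of the log-polar encoding idea follows: event coordinates around a keypoint are binned into a log-polar histogram, so that a scaling of the pattern becomes a shift along the ring axis. The function name, parameters, and binning details are illustrative assumptions, not the actual DART construction, which uses distribution-aware retinal sampling.

        import numpy as np

        def log_polar_histogram(events_xy, center, n_rings=8, n_wedges=16, r_max=31.0):
            # Hypothetical sketch, not the DART reference implementation.
            # Offsets of each event from the keypoint of interest.
            d = events_xy - np.asarray(center, dtype=float)
            r = np.hypot(d[:, 0], d[:, 1])
            theta = np.arctan2(d[:, 1], d[:, 0])
            keep = (r > 0) & (r <= r_max)
            r, theta = r[keep], theta[keep]
            # Logarithmic radial bins: scaling the event pattern shifts it
            # along the ring axis, the property exploited for equivariance.
            ring = np.clip(np.floor(n_rings * np.log(r) / np.log(r_max)).astype(int),
                           0, n_rings - 1)
            wedge = np.floor((theta + np.pi) / (2 * np.pi) * n_wedges).astype(int) % n_wedges
            hist = np.zeros((n_rings, n_wedges))
            np.add.at(hist, (ring, wedge), 1.0)   # accumulate event counts per cell
            return hist / max(hist.sum(), 1.0)    # normalise by the event count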

    Scale-invariance in local heat kernel descriptors without scale selection and normalization

    Today, only a small fraction of Internet repositories of geometric data is accessible through text search. The fast growth of these repositories makes content-based retrieval one of the next grand challenges in the search and organization of such information. Particularly difficult is the problem of shape retrieval, as geometric shapes manifest vast variability due to differences in scale, orientation, non-rigid deformations, and missing data, and also appear in a variety of formats and representations. One of the biggest challenges in non-rigid shape retrieval and comparison is the design of a shape descriptor that maintains invariance under a wide class of transformations the shape can undergo. Recently, the heat kernel signature was introduced as an intrinsic local shape descriptor based on diffusion scale-space analysis. In this paper, we develop a scale-invariant version of the heat kernel descriptor. Our construction is based on a logarithmically sampled scale-space in which shape scaling corresponds, up to a multiplicative constant, to a translation. This translation is undone using the magnitude of the Fourier transform. The proposed scale-invariant local descriptors can be used in the bag-of-features framework for shape retrieval in the presence of transformations such as isometric deformations, missing data, topological noise, and global and local scaling. We obtain significant performance improvements over state-of-the-art algorithms on recently established non-rigid shape retrieval benchmarks.
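    The construction described above reduces to a short pipeline: evaluate the heat kernel signature on a logarithmically sampled time axis, take the logarithm and a discrete derivative to cancel the multiplicative constant, and keep the Fourier transform magnitude, which is blind to the residual translation. The sketch below is a minimal reading of that pipeline, assuming precomputed Laplace-Beltrami eigenvalues and eigenfunctions; the variable names and sampling choices are illustrative.

        import numpy as np

        def scale_invariant_hks(evals, evecs, times):
            # evals: (k,) Laplace-Beltrami eigenvalues; evecs: (n_points, k)
            # eigenfunctions; times must be logarithmically sampled so that
            # shape scaling acts as a translation along the time axis.
            # Heat kernel signature h(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2
            hks = (evecs ** 2) @ np.exp(-np.outer(evals, times))
            # The log turns the multiplicative scaling constant into an
            # additive one; differencing along the time axis removes it.
            h = np.diff(np.log(hks), axis=1)
            # What remains of scaling is a translation, which the Fourier
            # transform magnitude discards.
            return np.abs(np.fft.fft(h, axis=1))[:, : h.shape[1] // 2]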

    ShapeNet: Convolutional Neural Networks on Non-Euclidean Manifolds

    Feature descriptors play a crucial role in a wide range of geometry analysis and processing applications, including shape correspondence, retrieval, and segmentation. In this paper, we propose ShapeNet, a generalization of the popular convolutional neural network (CNN) paradigm to non-Euclidean manifolds. Our construction is based on a local geodesic system of polar coordinates to extract "patches", which are then passed through a cascade of filters and linear and non-linear operators. The coefficients of the filters and the linear combination weights are optimization variables that are learned to minimize a task-specific cost function. We use ShapeNet to learn invariant shape feature descriptors that significantly outperform recent state-of-the-art methods, and show that previous approaches such as heat and wave kernel signatures, optimal spectral descriptors, and intrinsic shape contexts can be obtained as particular configurations of ShapeNet.
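    As a minimal sketch of the patch-filtering step, the snippet below applies a filter bank to precomputed geodesic polar patches and max-pools over angular rotations of the filters, a common way to handle the arbitrary angular origin of such local charts. The pooling choice and all names are assumptions for illustration; the full ShapeNet cascade adds further linear and non-linear layers whose weights are learned against a task-specific cost.

        import numpy as np

        def geodesic_conv(patches, filters):
            # patches: (n_points, n_rings, n_wedges) local polar patches,
            # assumed already extracted from the mesh.
            # filters: (n_filters, n_rings, n_wedges) learnable filter bank.
            n_wedges = patches.shape[-1]
            responses = []
            for shift in range(n_wedges):
                f = np.roll(filters, shift, axis=-1)   # rotate the filters angularly
                responses.append(np.einsum('prw,frw->pf', patches, f))
            # Max over rotations removes the arbitrary angular origin.
            return np.max(np.stack(responses, axis=0), axis=0)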

    Automatic Alignment of 3D Multi-Sensor Point Clouds

    Automatic 3D point cloud alignment is a major research topic in photogrammetry, computer vision, and computer graphics. In this research, two keypoint feature matching approaches are developed and proposed for the automatic alignment of 3D point clouds that have been acquired from different sensor platforms and are in different 3D conformal coordinate systems. The first proposed approach is based on 3D keypoint feature matching. First, surface curvature information is utilized for scale-invariant 3D keypoint extraction. Adaptive non-maxima suppression (ANMS) is then applied to retain the most distinct and well-distributed set of keypoints. Afterwards, every keypoint is characterized by a scale-, rotation-, and translation-invariant 3D surface descriptor, called the radial geodesic distance-slope histogram. Similar keypoint descriptors on the source and target datasets are then matched using bipartite graph matching, followed by a modified RANSAC for outlier removal. The second proposed method is based on 2D keypoint matching performed on height map images of the 3D point clouds. Height map images are generated by projecting the 3D point clouds onto a planimetric plane. Afterwards, a multi-scale wavelet 2D keypoint detector with ANMS is proposed to extract keypoints on the height maps. Then, a scale-, rotation-, and translation-invariant 2D descriptor, referred to as the Gabor, Log-Polar-Rapid Transform descriptor, is computed for all keypoints. Finally, source and target height map keypoint correspondences are determined using bi-directional nearest-neighbour matching, together with the modified RANSAC for outlier removal. Each method is assessed on multi-sensor, urban, and non-urban 3D point cloud datasets. Results show that, unlike the 3D-based method, the height-map-based approach is able to align source and target datasets with differences in point density, point distribution, and missing point data. Findings also show that the 3D-based method obtained lower transformation errors and a greater number of correspondences when the source and target have similar point characteristics. The 3D-based approach attained absolute mean alignment differences in the range of 0.23 m to 2.81 m, whereas the height map approach had a range from 0.17 m to 1.21 m. These differences meet the proximity requirements of the data characteristics and permit the further application of fine co-registration approaches.
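    The correspondence step shared by both pipelines can be illustrated with a generic sketch of bi-directional (mutual) nearest-neighbour matching: a pair is kept only when each descriptor is the other's nearest neighbour. This is a plain reading of that step, not the thesis code, and the subsequent modified-RANSAC outlier removal is omitted.

        import numpy as np

        def mutual_nn_matches(desc_src, desc_tgt):
            # Pairwise Euclidean distances between the two descriptor sets.
            d = np.linalg.norm(desc_src[:, None, :] - desc_tgt[None, :, :], axis=-1)
            fwd = d.argmin(axis=1)        # best target for each source keypoint
            bwd = d.argmin(axis=0)        # best source for each target keypoint
            src_idx = np.arange(len(desc_src))
            keep = bwd[fwd] == src_idx    # keep only mutually agreeing pairs
            return np.stack([src_idx[keep], fwd[keep]], axis=1)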

    Dense Scale Invariant Descriptors for Images and Surfaces

    Local descriptors are ubiquitous in image and shape analysis, as they allow the compact and robust description of the local content of a signal (an image or a 3D shape). A common problem that emerges in the computation of local descriptors is the variability of the signal scale. The standard approach to coping with this is scale selection, which consists of estimating a characteristic scale around the few image or shape points where scale estimation can be performed reliably. However, it is often desirable to have a scale-invariant descriptor that can be constructed densely, namely at every point of the image or 3D shape. In this work, we construct scale-invariant signal descriptors by introducing a method that does not rely on scale selection, which allows us to apply our method at any point. Our method relies on a combination of logarithmic sampling with multi-scale signal processing that turns scaling in the original signal domain into a translation in a new domain. Scale invariance can then be guaranteed by computing the Fourier transform magnitude (FTM), which is unaffected by signal translations. We use our technique to construct scale- and rotation-invariant descriptors for images and scale- and isometry-invariant descriptors for 3D surfaces, and demonstrate that our descriptors outperform state-of-the-art descriptors on standard benchmarks.
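    The core trick, logarithmic sampling followed by the Fourier transform magnitude, can be sketched for an image patch as below: resampling on a log-polar grid maps scaling and rotation about the centre to translations along the two axes, which the 2D FTM then discards. The nearest-neighbour pixel lookup and parameter choices are simplifying assumptions; the actual method combines log-sampling with multi-scale signal processing rather than raw pixel resampling.

        import numpy as np

        def log_polar_ftm(patch, center, n_r=16, n_theta=32, r_max=15.0):
            # Log-polar sample grid around `center` (row, col).
            log_r = np.linspace(0.0, np.log(r_max), n_r)
            theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)
            rr, tt = np.meshgrid(np.exp(log_r), theta, indexing='ij')
            rows = np.clip(np.round(center[0] + rr * np.sin(tt)).astype(int),
                           0, patch.shape[0] - 1)
            cols = np.clip(np.round(center[1] + rr * np.cos(tt)).astype(int),
                           0, patch.shape[1] - 1)
            log_polar = patch[rows, cols]      # scaling/rotation now act as shifts
            # The FTM is unaffected by translations, hence (approximately)
            # invariant to the original scaling and rotation.
            return np.abs(np.fft.fft2(log_polar))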

    Feature Fusion for Fingerprint Liveness Detection

    For decades, fingerprints have been the most widely used biometric trait in identity recognition systems, thanks to their natural uniqueness, even in rare cases such as identical twins. Recently, we have witnessed a growth in the use of fingerprint-based recognition systems in a large variety of devices and applications. This, as a consequence, has increased the benefits for offenders capable of attacking these systems. One of the main issues with current fingerprint authentication systems is that, even though they are quite accurate in terms of identity verification, they can be easily spoofed by presenting to the input sensor an artificial replica of the fingertip skin's ridge-valley patterns. Due to the criticality of this threat, it is crucial to develop countermeasures capable of facing and preventing these kinds of attacks. The most effective counter-spoofing methods are those that try to distinguish between a "live" and a "fake" fingerprint before it is actually submitted to the recognition system. According to the technology used, these methods are mainly divided into hardware- and software-based systems. Hardware-based methods rely on extra sensors to gather additional information regarding the vitality of the fingerprint owner. On the contrary, software-based methods rely solely on analyzing the fingerprint images acquired by the scanner. Software-based methods can be further divided into dynamic methods, aimed at analyzing sequences of images to capture the vital signs typical of a real fingerprint, and static methods, which process a single fingerprint impression. Among these different approaches, static software-based methods come with three main benefits. First, they are cheaper, since they do not require the deployment of any additional sensor to perform liveness detection. Second, they are faster, since the information they require is extracted from the same input image acquired for the identification task. Third, they are potentially capable of tackling novel forms of attack through a software update. The interest in this type of counter-spoofing method is at the basis of this dissertation, which addresses fingerprint liveness detection from a perspective that stems from the following consideration. Generally speaking, this problem has been tackled in the literature with many different approaches. Most of them are based on first identifying the image features most suitable for the problem at hand and then developing some classification system based on them. In particular, most published methods rely on a single type of feature to perform this task. Each of these individual features can be more or less discriminative and often highlights peculiar characteristics of the data under analysis, often complementary to those of other features. Thus, one possible way to improve classification accuracy is to find effective ways to combine them, in order to mutually exploit their individual strengths and, at the same time, soften their weaknesses. However, such a "multi-view" approach has been relatively overlooked in the literature. Based on this observation, the first part of this work investigates feature fusion methods capable of improving the generalization and robustness of fingerprint liveness detection systems and enhancing their classification strength.
Then, in the second part, it approaches feature fusion in a different way: first dividing the fingerprint image into smaller patches, then extracting evidence about the liveness of each patch and, finally, combining all these pieces of information to take the final classification decision. The different approaches have been thoroughly analyzed and assessed by comparing their results (on a large number of datasets and using the same experimental protocol) with those of other works in the literature. The experimental results discussed in this dissertation show that the proposed approaches obtain state-of-the-art results, thus demonstrating their effectiveness.
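    As a minimal sketch of feature-level fusion under the "multi-view" idea described above, the snippet below normalises several complementary descriptors per feature type and concatenates them before training one linear classifier. The descriptor names and the classifier choice are illustrative assumptions, not the specific combinations studied in the dissertation.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import LinearSVC

        def fuse_features(feature_sets):
            # Normalise each descriptor type separately so that no single
            # feature dominates the concatenation by virtue of its scale.
            return np.hstack([StandardScaler().fit_transform(f) for f in feature_sets])

        # Hypothetical usage: lbp, wavelet, bsif are (n_images, d_i) matrices of
        # three complementary texture descriptors; labels holds 1 (live) / 0 (fake).
        # fused = fuse_features([lbp, wavelet, bsif])
        # clf = LinearSVC().fit(fused, labels)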