71,338 research outputs found

    Improving Object Localization with Fitness NMS and Bounded IoU Loss

    Full text link
    We demonstrate that many detection methods are designed to identify only a sufficently accurate bounding box, rather than the best available one. To address this issue we propose a simple and fast modification to the existing methods called Fitness NMS. This method is tested with the DeNet model and obtains a significantly improved MAP at greater localization accuracies without a loss in evaluation rate, and can be used in conjunction with Soft NMS for additional improvements. Next we derive a novel bounding box regression loss based on a set of IoU upper bounds that better matches the goal of IoU maximization while still providing good convergence properties. Following these novelties we investigate RoI clustering schemes for improving evaluation rates for the DeNet wide model variants and provide an analysis of localization performance at various input image dimensions. We obtain a MAP of 33.6%@79Hz and 41.8%@5Hz for MSCOCO and a Titan X (Maxwell). Source code available from: https://github.com/lachlants/denetComment: CVPR2018 Main Conference (Poster

    From 3D Point Clouds to Pose-Normalised Depth Maps

    Get PDF
    We consider the problem of generating either pairwise-aligned or pose-normalised depth maps from noisy 3D point clouds in a relatively unrestricted poses. Our system is deployed in a 3D face alignment application and consists of the following four stages: (i) data filtering, (ii) nose tip identification and sub-vertex localisation, (iii) computation of the (relative) face orientation, (iv) generation of either a pose aligned or a pose normalised depth map. We generate an implicit radial basis function (RBF) model of the facial surface and this is employed within all four stages of the process. For example, in stage (ii), construction of novel invariant features is based on sampling this RBF over a set of concentric spheres to give a spherically-sampled RBF (SSR) shape histogram. In stage (iii), a second novel descriptor, called an isoradius contour curvature signal, is defined, which allows rotational alignment to be determined using a simple process of 1D correlation. We test our system on both the University of York (UoY) 3D face dataset and the Face Recognition Grand Challenge (FRGC) 3D data. For the more challenging UoY data, our SSR descriptors significantly outperform three variants of spin images, successfully identifying nose vertices at a rate of 99.6%. Nose localisation performance on the higher quality FRGC data, which has only small pose variations, is 99.9%. Our best system successfully normalises the pose of 3D faces at rates of 99.1% (UoY data) and 99.6% (FRGC data)

    Towards Automatic SAR-Optical Stereogrammetry over Urban Areas using Very High Resolution Imagery

    Full text link
    In this paper we discuss the potential and challenges regarding SAR-optical stereogrammetry for urban areas, using very-high-resolution (VHR) remote sensing imagery. Since we do this mainly from a geometrical point of view, we first analyze the height reconstruction accuracy to be expected for different stereogrammetric configurations. Then, we propose a strategy for simultaneous tie point matching and 3D reconstruction, which exploits an epipolar-like search window constraint. To drive the matching and ensure some robustness, we combine different established handcrafted similarity measures. For the experiments, we use real test data acquired by the Worldview-2, TerraSAR-X and MEMPHIS sensors. Our results show that SAR-optical stereogrammetry using VHR imagery is generally feasible with 3D positioning accuracies in the meter-domain, although the matching of these strongly hetereogeneous multi-sensor data remains very challenging. Keywords: Synthetic Aperture Radar (SAR), optical images, remote sensing, data fusion, stereogrammetr

    Generic 3D Representation via Pose Estimation and Matching

    Full text link
    Though a large body of computer vision research has investigated developing generic semantic representations, efforts towards developing a similar representation for 3D has been limited. In this paper, we learn a generic 3D representation through solving a set of foundational proxy 3D tasks: object-centric camera pose estimation and wide baseline feature matching. Our method is based upon the premise that by providing supervision over a set of carefully selected foundational tasks, generalization to novel tasks and abstraction capabilities can be achieved. We empirically show that the internal representation of a multi-task ConvNet trained to solve the above core problems generalizes to novel 3D tasks (e.g., scene layout estimation, object pose estimation, surface normal estimation) without the need for fine-tuning and shows traits of abstraction abilities (e.g., cross-modality pose estimation). In the context of the core supervised tasks, we demonstrate our representation achieves state-of-the-art wide baseline feature matching results without requiring apriori rectification (unlike SIFT and the majority of learned features). We also show 6DOF camera pose estimation given a pair local image patches. The accuracy of both supervised tasks come comparable to humans. Finally, we contribute a large-scale dataset composed of object-centric street view scenes along with point correspondences and camera pose information, and conclude with a discussion on the learned representation and open research questions.Comment: Published in ECCV16. See the project website http://3drepresentation.stanford.edu/ and dataset website https://github.com/amir32002/3D_Street_Vie

    Contour Generator Points for Threshold Selection and a Novel Photo-Consistency Measure for Space Carving

    Full text link
    Space carving has emerged as a powerful method for multiview scene reconstruction. Although a wide variety of methods have been proposed, the quality of the reconstruction remains highly-dependent on the photometric consistency measure, and the threshold used to carve away voxels. In this paper, we present a novel photo-consistency measure that is motivated by a multiset variant of the chamfer distance. The new measure is robust to high amounts of within-view color variance and also takes into account the projection angles of back-projected pixels. Another critical issue in space carving is the selection of the photo-consistency threshold used to determine what surface voxels are kept or carved away. In this paper, a reliable threshold selection technique is proposed that examines the photo-consistency values at contour generator points. Contour generators are points that lie on both the surface of the object and the visual hull. To determine the threshold, a percentile ranking of the photo-consistency values of these generator points is used. This improved technique is applicable to a wide variety of photo-consistency measures, including the new measure presented in this paper. Also presented in this paper is a method to choose between photo-consistency measures, and voxel array resolutions prior to carving using receiver operating characteristic (ROC) curves
    • …
    corecore