12 research outputs found

    A Parametric Algorithm for Skyline Extraction

    Get PDF
    International audienceThis paper is dedicated to the problem of automatic skyline extraction in digital images. The study is motivated by the needs, expressed by urbanists, to describe in terms of geometrical features, the global shape created by man-made buildings in urban areas. Skyline extraction has been widely studied for navigation of Unmanned Aerial Vehicles (drones) or for geolocalization, both in natural and urban contexts. In most of these studies, the skyline is defined by the limit between sky and ground objects, and can thus be resumed to the sky segmentation problem in images. In our context, we need a more generic definition of skyline, which makes its extraction more complex and even variable. The skyline can be extracted for different depths, depending on the interest of the user (far horizon, intermediate buildings, near constructions , ...), and thus requires a human interaction. The main steps of our method are as follows: we use a Canny filter to extract edges and allow the user to interact with filter's parameters. With a high sensitivity , all the edges will be detected, whereas with lower values, only most contrasted contours will be kept by the filter. From the obtained edge map, an upper envelope is extracted, which is a disconnected approximation of the skyline. A graph is then constructed and a shortest path algorithm is used to link discontinuities. Our approach has been tested on several public domain urban and natural databases, and have proven to give better results that previously published methods

    Semantic Cross-View Matching

    Full text link
    Matching cross-view images is challenging because the appearance and viewpoints are significantly different. While low-level features based on gradient orientations or filter responses can drastically vary with such changes in viewpoint, semantic information of images however shows an invariant characteristic in this respect. Consequently, semantically labeled regions can be used for performing cross-view matching. In this paper, we therefore explore this idea and propose an automatic method for detecting and representing the semantic information of an RGB image with the goal of performing cross-view matching with a (non-RGB) geographic information system (GIS). A segmented image forms the input to our system with segments assigned to semantic concepts such as traffic signs, lakes, roads, foliage, etc. We design a descriptor to robustly capture both, the presence of semantic concepts and the spatial layout of those segments. Pairwise distances between the descriptors extracted from the GIS map and the query image are then used to generate a shortlist of the most promising locations with similar semantic concepts in a consistent spatial layout. An experimental evaluation with challenging query images and a large urban area shows promising results

    PlaNet - Photo Geolocation with Convolutional Neural Networks

    Full text link
    Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model

    Disparate View Matching

    Get PDF
    Matching of disparate views has gained significance in computer vision due to its role in many novel application areas. Being able to match images of the same scene captured during day and night, between a historic and contemporary picture of a scene, and between aerial and ground-level views of a building facade all enable novel applications ranging from loop-closure detection for structure-from-motion and re-photography to geo-localization of a street-level image using reference imagery captured from the air. The goal of this work is to develop novel features and methods that address matching problems where direct appearance-based correspondences are either difficult to obtain or infeasible because of the lack of appearance similarity altogether. To address these problems, we propose methods that span the appearance-geometry spectrum in terms of both the use of these cues as well as the ability of each method to handle variations in appearance and geometry. First, we consider the problem of geo-localization of a query street-level image using a reference database of building facades captured from a bird\u27s eye view. To address this wide-baseline facade matching problem, a novel scale-selective self-similarity feature that avoids direct comparison of appearance between disparate facade images is presented. Next, to address image matching problems with more extreme appearance variation, a novel representation for matchable images expressed in terms of the eigen-functions of the joint graph of the two images is presented. This representation is used to derive features that are persistent across wide variations in appearance. Next, the problem setting of matching between a street-level image and a digital elevation map (DEM) is considered. Given the limited appearance information available in this scenario, the matching approach has to rely more significantly on geometric cues. Therefore, a purely geometric method to establish correspondences between building corners in the DEM and the visible corners in the query image is presented. Finally, to generalize this problem setting we address the problem of establishing correspondences between 3D and 2D point clouds using geometric means alone. A novel framework for incorporating purely geometric constraints into a higher-order graph matching framework is presented with specific formulations for the three-point calibrated absolute camera pose problem (P3P), two-point upright camera pose problem (Up2p) and the three-plus-one relative camera pose problem

    The Geometry and Usage of the Supplementary Fisheye Lenses in Smartphones

    Get PDF
    Nowadays, mobile phones are more than a device that can only satisfy the communication need between people. Since fisheye lenses integrated with mobile phones are lightweight and easy to use, they are advantageous. In addition to this advantage, it is experimented whether fisheye lens and mobile phone combination can be used in a photogrammetric way, and if so, what will be the result. Fisheye lens equipment used with mobile phones was tested in this study. For this, standard calibration of ‘Olloclip 3 in one’ fisheye lens used with iPhone 4S mobile phone and ‘Nikon FC‐E9’ fisheye lens used with Nikon Coolpix8700 are compared based on equidistant model. This experimental study shows that Olloclip 3 in one fisheye lens developed for mobile phones has at least the similar characteristics with classic fisheye lenses. The dimensions of fisheye lenses used with smart phones are getting smaller and the prices are reducing. Moreover, as verified in this study, the accuracy of fisheye lenses used in smartphones is better than conventional fisheye lenses. The use of smartphones with fisheye lenses will give the possibility of practical applications to ordinary users in the near future

    Deep Probabilistic Models for Camera Geo-Calibration

    Get PDF
    The ultimate goal of image understanding is to transfer visual images into numerical or symbolic descriptions of the scene that are helpful for decision making. Knowing when, where, and in which direction a picture was taken, the task of geo-calibration makes it possible to use imagery to understand the world and how it changes in time. Current models for geo-calibration are mostly deterministic, which in many cases fails to model the inherent uncertainties when the image content is ambiguous. Furthermore, without a proper modeling of the uncertainty, subsequent processing can yield overly confident predictions. To address these limitations, we propose a probabilistic model for camera geo-calibration using deep neural networks. While our primary contribution is geo-calibration, we also show that learning to geo-calibrate a camera allows us to implicitly learn to understand the content of the scene
    corecore