649 research outputs found

    Efficient Error-Correcting Geocoding

    We study the problem of resolving a possibly misspelled address into the geographic coordinates (latitude and longitude) of the corresponding location. Our data structure solves this problem within a few milliseconds, even for misspelled and fragmentary queries. Compared to major geographic search engines such as Google or Bing, we achieve results of significantly better quality.
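    The abstract does not describe the data structure itself, but the underlying task, matching a possibly misspelled query against a reference set of addresses, can be illustrated with a plain edit-distance lookup. The sketch below only illustrates that matching criterion: the tiny gazetteer and all names are invented, and a brute-force scan is nowhere near the millisecond performance the authors report at scale.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Illustrative gazetteer: normalised address -> (latitude, longitude).
GAZETTEER = {
    "karlsruhe kaiserstrasse 12": (49.0094, 8.4044),
    "berlin unter den linden 1": (52.5170, 13.3889),
}

def geocode(query: str, max_edits: int = 3):
    """Return the coordinates of the closest reference address, or None."""
    q = query.lower().strip()
    best = min(GAZETTEER, key=lambda addr: edit_distance(q, addr))
    return GAZETTEER[best] if edit_distance(q, best) <= max_edits else None

print(geocode("karlsruhe kaiserstrase 12"))   # tolerates the misspelling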

    Engineering efficient error-correcting geocoding


    Historical collaborative geocoding

    The latest developments in digital technology have provided large data sets that can increasingly easily be accessed and used. These data sets often contain indirect localisation information, such as historical addresses. Historical geocoding is the process of transforming this indirect localisation information into direct localisation that can be placed on a map, which enables spatial analysis and cross-referencing. Many efficient geocoders exist for current addresses, but they do not deal with the temporal aspect and are based on a strict hierarchy (..., city, street, house number) that is hard or impossible to use with historical data. Indeed, historical data are full of uncertainties (temporal aspect, semantic aspect, spatial precision, confidence in the historical source, ...) that cannot be resolved, as there is no way to go back in time to check. We propose an open source, open data, extensible solution for geocoding that is based on building gazetteers composed of geohistorical objects extracted from historical topographical maps. Once the gazetteers are available, geocoding a historical address is a matter of finding the geohistorical object in the gazetteers that best matches the historical address. The matching criteria are customisable and cover several dimensions (fuzzy semantic, fuzzy temporal, scale, spatial precision, ...). As the goal is to facilitate historical work, we also propose web-based user interfaces that help geocode addresses (individually or in batch mode) and display them over current or historical topographical maps, so that results can be checked and collaboratively edited. The system is tested on the city of Paris for the 19th-20th centuries, shows a high return rate and is fast enough to be used interactively.
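    The matching step described above can be pictured as a weighted multi-criteria score over gazetteer candidates. The following sketch is only an illustration of that idea, not the project's implementation: the weights, field names and scoring functions are assumptions, and only the fuzzy semantic and fuzzy temporal dimensions are shown.

from difflib import SequenceMatcher

def semantic_score(query_name: str, candidate_name: str) -> float:
    """Fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, query_name.lower(), candidate_name.lower()).ratio()

def temporal_score(query_year: int, valid_from: int, valid_to: int) -> float:
    """1.0 inside the candidate's validity interval, decaying outside it."""
    if valid_from <= query_year <= valid_to:
        return 1.0
    gap = min(abs(query_year - valid_from), abs(query_year - valid_to))
    return max(0.0, 1.0 - gap / 50.0)   # assumed 50-year tolerance

def match(query_name, query_year, candidates, w_sem=0.6, w_temp=0.4):
    """Return the gazetteer candidate with the highest combined score."""
    def score(c):
        return (w_sem * semantic_score(query_name, c["name"])
                + w_temp * temporal_score(query_year, c["from"], c["to"]))
    return max(candidates, key=score)

gazetteer = [
    {"name": "Rue de Rivoli", "from": 1802, "to": 1900, "xy": (48.8559, 2.3601)},
    {"name": "Rue Rivoly",    "from": 1790, "to": 1801, "xy": (48.8550, 2.3590)},
]
print(match("rue de rivoly", 1795, gazetteer)["xy"])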

    An effective and efficient approach for manually improving geocoded data

    Background: The process of geocoding produces output coordinates of varying degrees of quality. Previous studies have revealed that simply excluding records with low-quality geocodes from analysis can introduce significant bias, but, depending on the number and severity of the inaccuracies, their inclusion may also lead to bias. Little quantitative research has been presented on the cost and/or effectiveness of correcting geocodes through manual interactive processes, so the most cost-effective methods for improving geocoded data are unclear. The present work investigates the time and effort required to correct the geocodes contained in five health-related datasets that represent examples of data commonly used in Health GIS.
    Results: Geocode correction was attempted on five health-related datasets containing a total of 22,317 records. The complete processing of these data took 11.4 weeks (427 hours), averaging 69 seconds of processing time per record. Overall, the geocodes associated with 12,280 (55%) of the records were successfully improved, taking on average 95 seconds of processing time per corrected record across all five datasets. Geocode correction improved the overall match rate (the number of successful matches out of the total attempted) from 79.3% to 95%. The spatial shift between the locations of the original successfully matched geocodes and their corrected counterparts averaged 9.9 km per corrected record. After geocode correction, the number of city-accuracy and USPS ZIP-code-accuracy geocodes was reduced from 10,959 and 1,031 to 6,284 and 200, respectively, while the number of building-centroid-accuracy geocodes increased from 0 to 2,261.
    Conclusion: The results indicate that manual geocode correction using a web-based interactive approach is a feasible and cost-effective method for improving the quality of geocoded data. The level of effort required varies depending on the type of data geocoded. These results can be used to choose between data improvement options (e.g., manual intervention, pseudocoding/geo-imputation, field GPS readings).

    Improving an Open Source Geocoding Server

    A common problem in geocoding is that the postal addresses requested by the user differ from the addresses described in the database. The online, open source geocoder Nominatim is one of the most widely used geocoders today. However, it lacks the interactivity that most online geocoders already offer: it provides no feedback to the user while an address is being typed, and it cannot deal with misspellings introduced by the user in the requested address. This thesis is about extending the functionality of the Nominatim geocoder to provide fuzzy search and autocomplete features. In this work I propose a new index and search strategy for the OpenStreetMap reference dataset. I also extend the search algorithm to geocode new address types such as street intersections. Both the original Nominatim geocoder and the proposed solution are compared using metrics such as the precision of the results, the match rate and the keystrokes saved by the autocomplete feature. The test addresses used in this work are a subset of the Swedish addresses available in the OpenStreetMap dataset. The results show that the proposed geocoder performs better than the original Nominatim geocoder: users get address suggestions as they type, adding interactivity to the original geocoder, and the proposed geocoder is able to find the right address in the presence of errors in the user query, with a match rate of 98%.

    The demand for geospatial information has been increasing in recent years. More and more mobile applications and services require users to enter information about where they are, or the address of the place they want to find. The systems that convert postal addresses or place descriptions into coordinates are called geocoders. How good a geocoder is depends not only on the information it contains, but also on how easy it is for users to find the desired addresses. There are many well-known web sites that we use in our everyday life to find the location of an address; sites like Google Maps, Bing Maps or Yahoo Maps are accessed by millions of users every day for such services. Among the main features of these geocoders are the ability to predict the address the user is writing in the search box and, sometimes, to correct misspellings introduced by the user. To make it more complicated, the predictions and error corrections these systems perform are done in real time. The owners of these address search engines usually impose a limit on the number of addresses a user is allowed to search monthly, above which the user needs to pay a fee to keep using the system. This limit is usually high enough for the end user, but it might not be enough for software developers who want to use geospatial data in their products. A free alternative to the address search engines mentioned above is Nominatim, an open source project whose purpose is to search addresses in the OpenStreetMap dataset. OpenStreetMap is a collaborative project that maps places in the real world to coordinates. The main drawback of Nominatim is that its usability is not as good as that of its competitors: it is unable to find addresses that are not correctly spelled, nor does it predict the user's needs. For this address search engine to be among the most used, prediction and error correction features need to be added. In this thesis work I extend the search algorithms of Nominatim to add the functionality mentioned above. The address search engine proposed in this thesis offers a free and open source alternative to users and systems that require access to geospatial data without restrictions.
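    Neither the proposed index layout nor Nominatim's internals are reproduced here, but the two added behaviours, autocomplete and tolerance to misspellings, can be sketched independently of the OpenStreetMap backend. In the snippet below the toy address list and all names are assumptions: prefix suggestions come from a binary search over a sorted list, and a fuzzy match serves as the fallback when the prefix search returns nothing.

import bisect
from difflib import get_close_matches

# Toy reference set standing in for indexed OpenStreetMap addresses.
ADDRESSES = sorted([
    "storgatan 1, lund",
    "storgatan 2, lund",
    "stortorget 5, malmo",
])

def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Return up to `limit` addresses starting with the typed prefix."""
    p = prefix.lower()
    start = bisect.bisect_left(ADDRESSES, p)
    out = []
    for addr in ADDRESSES[start:start + limit]:
        if not addr.startswith(p):
            break
        out.append(addr)
    return out

def fuzzy_search(query: str) -> list[str]:
    """Fallback for misspelled queries: closest addresses by similarity."""
    return get_close_matches(query.lower(), ADDRESSES, n=3, cutoff=0.6)

print(autocomplete("storg"))                 # suggestions while typing
print(fuzzy_search("stortorjet 5, malmo"))   # tolerates the misspelling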

    a Berlin case study

    Through the process of urbanisation, humankind is altering the Earth's surface on a large scale and in an irreversible way. Optical remote sensing is a form of Earth observation that can improve our understanding of this dynamic process and its consequences. This thesis investigates the extent to which hyperspectral data can provide information on surface imperviousness for the integrated analysis of urban human-environment relationships. To this end, the entire processing chain, from the preprocessing of the raw data to the creation of georeferenced maps of land cover and imperviousness, is examined using Hyperspectral Mapper data of Berlin as a case study. The traditional processing chain is extended and modified in several places. The radiometric preprocessing is extended by a normalisation of brightness gradients caused by the directional reflectance properties of urban surfaces. The classification into five spectrally complex land use classes is performed with support vector machines, without additional feature extraction or differentiation of subclasses...
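    As a rough illustration of the classification step mentioned above, applying a support vector machine directly to pixel spectra without prior feature extraction, the following sketch uses scikit-learn on synthetic data; the band count, labels and data are placeholders, not the Hyperspectral Mapper scene of Berlin used in the thesis.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder "spectra": 500 labelled pixels with 100 spectral bands,
# assigned to 5 classes, standing in for real hyperspectral training data.
X = rng.normal(size=(500, 100))
y = rng.integers(0, 5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# RBF-kernel SVM applied directly to the spectra (no feature extraction).
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))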

    Building Blocks for Mapping Services

    Mapping services are ubiquitous on the Internet and enjoy a considerable user base. It is often overlooked, however, that providing such a service on a global scale to virtually millions of users has been the playground of an oligopoly, a select few service providers able to do so. Unfortunately, the literature on these solutions is more than scarce. This thesis adds a number of building blocks to that literature, explaining how to design and implement a number of such features.

    Geodetic monitoring of complex shaped infrastructures using Ground-Based InSAR

    In the context of climate change, alternatives to fossil energy sources need to be used as much as possible to produce electricity, and hydroelectric power generation using dams is one of the most effective of these approaches. To ensure their safe operation, various monitoring sensors can be installed, with different characteristics in terms of spatial resolution, temporal resolution and accuracy. Among the available techniques, ground-based synthetic aperture radar (GB-SAR) has not yet been widely adopted for this purpose: despite its good balance between the aforementioned attributes, its sensitivity to atmospheric disturbances, its specific acquisition geometry and the need for phase unwrapping have constrained its use. Several processing strategies are developed in this thesis to exploit the full potential of GB-SAR systems, namely continuous, flexible and autonomous observation combined with high resolution and accuracy.

    The first challenge is to accurately localise the GB-SAR and estimate its azimuth in order to improve the geocoding of the image in the subsequent step. A ray-tracing algorithm and tomographic techniques are used to recover these external parameters of the sensor, and validation with corner reflectors confirms a significant error reduction. For the subsequent geocoding, however, challenges persist in scenarios involving vertical structures, because foreshortening and layover notably degrade the geocoding quality of the observed points. These issues arise when multiple points at varying elevations fall within a single resolution cell, making it difficult to pinpoint the precise location of the scattering point responsible for the signal return. To overcome this, a Bayesian approach based on intensity models is formulated as a tool to enhance the accuracy of the geocoding process. The approach is validated on a dam in the Black Forest in Germany characterised by a very specific structure.

    The second part of this thesis focuses on the feasibility of using GB-SAR systems for long-term geodetic monitoring of large structures. A first assessment tests large temporal baselines between acquisitions for epoch-wise monitoring. Because of the large displacements involved, phase unwrapping cannot recover all the information; an improvement is obtained by adapting the geometry of the signal processing using principal component analysis. The main case study consists of several campaigns from different stations at Enguri Dam in Georgia. The consistency of the estimated displacement map is assessed by comparing it with a numerical model calibrated on plumbline data. The two results show strong agreement, which supports the use of GB-SAR for epoch-wise monitoring, as it can measure several thousand points on the dam, and also demonstrates the possibility of detecting local anomalies in the numerical model.

    Finally, the instrument has been installed for continuous monitoring at Enguri Dam for over two years. A dedicated processing workflow is developed to eliminate the drift that occurs with classical interferometric algorithms, in order to achieve the accuracy required for geodetic monitoring. The analysis of the obtained time series confirms a very plausible result when compared with classical parametric models of dam deformation. Moreover, the results of this processing strategy are also compared with the numerical model and show high consistency. A final reassuring result is the comparison of the GB-SAR time series with the output of four GNSS stations installed on the dam crest. The developed algorithms and methods extend the capabilities of GB-SAR for dam monitoring in different configurations, making it a valuable supplement to other classical sensors for long-term geodetic observation as well as for short-term monitoring during particular dam operations.
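    The abstract gives no formulas, but the basic quantity behind the displacement estimates it mentions is standard radar interferometry: an unwrapped interferometric phase change maps to a line-of-sight displacement through the radar wavelength. A minimal illustration follows; the Ku-band wavelength is a typical GB-SAR value assumed here, not a figure from the thesis, and the sign convention varies between processors.

import math

def los_displacement(delta_phase_rad: float, wavelength_m: float) -> float:
    """Line-of-sight displacement for a two-way (monostatic) radar path:
    d = -wavelength * delta_phi / (4 * pi)."""
    return -wavelength_m * delta_phase_rad / (4.0 * math.pi)

# Example: 1 rad of unwrapped phase at a Ku-band wavelength of about 17.4 mm
# corresponds to roughly 1.4 mm of line-of-sight motion (sign depends on convention).
print(los_displacement(1.0, 0.0174))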

    Improving the geospatial consistency of digital libraries metadata

    Consistency is an essential aspect of the quality of metadata. Inconsistent metadata records are harmful: given a themed query, the set of retrieved metadata records would contain descriptions of unrelated or irrelevant resources, and may even omit resources considered obvious. This is even worse when the description of the location is inconsistent. Inconsistent spatial descriptions may yield invisible or hidden geographical resources that cannot be retrieved by means of spatially themed queries. Therefore, ensuring spatial consistency should be a primary goal when reusing, sharing and developing georeferenced digital collections. We present a methodology able to detect geospatial inconsistencies in metadata collections based on the combination of spatial ranking, reverse geocoding, geographic knowledge organization systems and information-retrieval techniques. This methodology has been applied to a collection of metadata records describing maps and atlases belonging to the Library of Congress. The proposed approach was able to automatically identify inconsistent metadata records (870 out of 10,575) and to propose fixes for most of them (91.5%). These results support the ability of the proposed methodology to assess the impact of spatial inconsistency on the retrievability and visibility of metadata records and to improve their spatial consistency.
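    As a simplified illustration of the kind of check such a methodology can combine (this is not the authors' pipeline; the gazetteer, field names and bounding boxes are invented), a record can be flagged as potentially inconsistent when the point coordinates in its spatial metadata fall outside the bounding box of the place named in its subject headings:

# Toy gazetteer: place name -> bounding box (min_lon, min_lat, max_lon, max_lat).
GAZETTEER = {
    "france": (-5.1, 41.3, 9.6, 51.1),
    "spain": (-9.4, 35.9, 3.4, 43.8),
}

def is_consistent(record: dict) -> bool:
    """True if the record's point coordinates fall inside the bounding box
    of the place named in its subject field."""
    place = record["subject_place"].lower()
    if place not in GAZETTEER:
        return True   # cannot judge; do not flag
    min_lon, min_lat, max_lon, max_lat = GAZETTEER[place]
    lon, lat = record["lon"], record["lat"]
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

record = {"subject_place": "France", "lon": 2.35, "lat": 48.85}   # Paris
print(is_consistent(record))   # True: coordinates agree with the named place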

    Multilevel logistic regression modelling for crash mapping in metropolitan areas

    The spatial nature of traffic crashes makes crash locations one of the most important and informative attributes of crash databases. It is, however, very likely that recorded crash locations in terms of easting and northing coordinates, distances from junctions, addresses, road names and types are inaccurately reported. Improving the quality of crash locations therefore has the potential to enhance the accuracy of many spatial crash analyses. Determining correct crash locations usually requires a combination of crash and network attributes with suitable crash mapping methods, and urban road networks are more sensitive to erroneous matches due to their high road density and inherent complexity. This paper presents a novel crash mapping method suitable for urban and metropolitan areas, applied to all crashes that occurred in London from 2010 to 2012. The method is based on a hierarchical data structure of crashes (candidate road links are nested within vehicles and vehicles within crashes) and employs a multilevel logistic regression model to estimate the probability distribution of mapping a crash onto a set of candidate road links. The road link with the highest probability is considered to be the correct segment for mapping the crash. This is based on two primary variables: (a) the distance between the crash location and a candidate segment and (b) the difference between the vehicle direction just before the collision and the link direction. Although road names were not considered, owing to the limited availability of this variable in the applied crash database, the developed method achieves 97.1% (±1%) accurate matches (N=1,000). The method was compared with two simpler, non-probabilistic crash mapping algorithms, and the results were used to demonstrate the effect of crash location data quality on a crash risk analysis.
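    The selection rule described above, scoring candidate road links by the crash-to-link distance and the difference between vehicle heading and link direction, can be sketched with a plain logistic score. The coefficients below are invented for illustration; the paper estimates them with a multilevel logistic regression over the nested crash/vehicle/link structure.

import math

def link_score(distance_m: float, angle_diff_deg: float,
               b0: float = 4.0, b_dist: float = -0.08, b_angle: float = -0.05) -> float:
    """Logistic probability-like score for a candidate link; closer links with
    a smaller heading difference score higher. Coefficients are illustrative."""
    z = b0 + b_dist * distance_m + b_angle * angle_diff_deg
    return 1.0 / (1.0 + math.exp(-z))

def map_crash(candidates: list[dict]) -> dict:
    """Pick the candidate road link with the highest score."""
    return max(candidates, key=lambda c: link_score(c["distance_m"], c["angle_diff_deg"]))

candidates = [
    {"link_id": "A", "distance_m": 12.0, "angle_diff_deg": 5.0},
    {"link_id": "B", "distance_m": 8.0,  "angle_diff_deg": 80.0},
]
print(map_crash(candidates)["link_id"])   # "A": nearly parallel link wins despite being farther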