
    A method and a tool for geocoding and record linkage

    For many years, researchers have presented the geocoding of postal addresses as a challenge, and several studies have been devoted to the geocoding process. This paper presents theoretical and technical aspects of geolocation, geocoding, and record linkage. It reviews the possibilities and limitations of existing methods and commercial software, identifying areas for further research. In particular, we present a methodology and a computing tool for correcting and geocoding mailing addresses. The methodology has two main steps: the first, preliminary step is address correction (address matching), and the second carries out the geocoding of the identified addresses. We also present results from processing real data sets. Finally, the discussion identifies areas for further research.
    Keywords: address correction; geocoding; matching; data management; record linkage
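    To make the two steps concrete (correct the address, then geocode it), here is a minimal sketch that is not the authors' tool. It assumes a small reference table of known addresses with placeholder coordinates, a hypothetical abbreviation list, and uses Python's standard `difflib.SequenceMatcher` ratio as a stand-in similarity measure with an assumed acceptance threshold of 0.85.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a production system would need a much larger one.
ABBREVIATIONS = {"ST": "STREET", "AVE": "AVENUE", "RD": "ROAD", "BLVD": "BOULEVARD"}


def normalize(address):
    """Uppercase, replace punctuation with spaces, and expand abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def correct_and_geocode(raw_address, reference, threshold=0.85):
    """Step 1 (correction): find the most similar reference address.
    Step 2 (geocoding): return it with its coordinates and score, or None
    if no candidate reaches the similarity threshold."""
    query = normalize(raw_address)
    best_address, best_score = None, 0.0
    for candidate in reference:
        score = SequenceMatcher(None, query, normalize(candidate)).ratio()
        if score > best_score:
            best_address, best_score = candidate, score
    if best_address is None or best_score < threshold:
        return None
    return best_address, reference[best_address], best_score


# Hypothetical reference data: address -> (latitude, longitude), placeholder values.
REFERENCE = {
    "12 Main Street": (45.50, -73.57),
    "40 Oak Avenue": (45.51, -73.55),
}
```

    With this data, "12 main st." normalizes to exactly "12 MAIN STREET", and a misspelling such as "12 Main Stret" still resolves to "12 Main Street"; a string-similarity threshold is only one possible matching rule, and real record linkage usually compares address components separately.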

    The spatial-temporal prediction of various crime types in Houston, TX based on hot-spot techniques

    A series of hotspot mapping theories and methods have been proposed to predict where and when a crime will happen. Each method has its strengths and weaknesses, and the predictive accuracy of each varies with the study area, crime type, parameter settings, and other factors. Predictive accuracy can be quantified by three measures: the hit rate, the predictive accuracy index (PAI), and the recapture rate index (RRI). This thesis applied eight hotspot mapping techniques from the crime analysis field to predict crime hotspot patterns, and compared and evaluated them to determine whether a single method outperforms all others on the three measures. This comparison is carried out for all Part 1 Crimes combined and, individually, for five of the nine Part 1 Crime types. In addition to the spatial analysis, a spatial-temporal analysis of the same crime dataset was conducted to investigate the distribution of crime clusters in both space and time. The reported crime data analyzed in this study come from the city of Houston, TX, from January 2011 to December 2012. The results show that predictive accuracy is affected by both the hotspot mapping method and the crime type, although the crime type has a more moderate effect. Considering all three measures, kernel density estimation was identified as the method that most accurately predicts overall Part 1 Crimes in Houston. For each of the five crime types examined, nearest neighbor hierarchical clustering and kernel density estimation were identified as the best methods based on PAI and RRI, respectively.
The spatial-temporal analysis also indicates that more crimes occurred from September to December 2011 around the center and in the southwestern part of Houston, TX.
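    The three accuracy measures can be computed as below. This sketch assumes the definitions commonly used in crime analysis, which the abstract itself does not state: the hit rate is the share of crimes falling inside hotspots; PAI divides the hit rate by the share of the study area covered by hotspots; RRI divides the ratio of hotspot crime counts between two periods by the same ratio for all crimes in the study area.

```python
def hit_rate(crimes_in_hotspots, total_crimes):
    """Share of all crimes that fall inside the predicted hotspots."""
    return crimes_in_hotspots / total_crimes


def pai(crimes_in_hotspots, total_crimes, hotspot_area, study_area):
    """Predictive accuracy index: hit rate divided by the share of area flagged."""
    return hit_rate(crimes_in_hotspots, total_crimes) / (hotspot_area / study_area)


def rri(hotspot_crimes_t1, hotspot_crimes_t2, total_crimes_t1, total_crimes_t2):
    """Recapture rate index: change in hotspot crime between two periods,
    relative to the change in crime across the whole study area."""
    return (hotspot_crimes_t2 / hotspot_crimes_t1) / (total_crimes_t2 / total_crimes_t1)
```

    For example, if 400 of 1,000 crimes fall in hotspots covering 5 of 100 km², the hit rate is 0.4 and the PAI is about 8: the hotspots capture crime eight times more densely than their share of the area. PAI rewards small, dense hotspots, which is why it can favor a different method than the hit rate alone.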

    Minimum geocoding match rates: an international study of the impact of data and areal unit sizes

    The analysis of geographically referenced data, specifically point data, is predicated on the accurate geocoding of those data. Geocoding refers to the process in which geographically referenced data (addresses, for example) are placed on a map. This process may lead to issues with positional accuracy or to the inability to geocode an address. In this paper, we conduct an international investigation into the impact of the (in)ability to geocode an address on the resulting spatial pattern. We use a variety of point data sets of crime events (varying in the number of events and the type of crime), a variety of areal units of analysis (varying in number and size), data from several countries (with different underlying administrative systems), and a locally based spatial point pattern test to find the geocoding match rates needed to maintain the spatial patterns of the original data when addresses are missing at random. We find that the required level of geocoding success depends on the number of points and the number of areal units under analysis, and that the necessary levels are generally lower than found in previous research. This finding is consistent across different national contexts.
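    The idea of testing whether a partially geocoded data set preserves the original spatial pattern can be sketched with a simplified similarity index. This is an assumption-laden illustration, not the paper's test: it simulates one geocoding run in which failures are missing at random, bootstraps the geocoded subset to build a 95% interval for each areal unit's share of events, and reports the fraction of units whose full-data share falls inside that interval. The function names and the bootstrap settings are hypothetical.

```python
import random
from collections import Counter


def area_shares(area_ids, areas):
    """Share of points falling in each areal unit (units with no points get 0)."""
    counts = Counter(area_ids)
    total = len(area_ids)
    return {area: counts.get(area, 0) / total for area in areas}


def similarity_index(area_ids, match_rate, n_boot=200, seed=1):
    """Fraction of areal units whose full-data share lies inside the 95% bootstrap
    interval of the share computed from a simulated, partially geocoded subset.

    area_ids: the areal unit of every event, as if all addresses had geocoded.
    """
    rng = random.Random(seed)
    areas = sorted(set(area_ids))
    full = area_shares(area_ids, areas)
    k = round(match_rate * len(area_ids))
    if k == 0:
        raise ValueError("match rate too low: no events would be geocoded")
    # Simulate one geocoding run; failures are assumed to be missing at random.
    geocoded = rng.sample(area_ids, k)
    # Bootstrap the geocoded subset to get a per-area distribution of shares.
    boot = {area: [] for area in areas}
    for _ in range(n_boot):
        resample = rng.choices(geocoded, k=len(geocoded))
        shares = area_shares(resample, areas)
        for area in areas:
            boot[area].append(shares[area])
    similar = 0
    for area in areas:
        values = sorted(boot[area])
        low = values[int(0.025 * (n_boot - 1))]
        high = values[int(0.975 * (n_boot - 1))]
        if low <= full[area] <= high:
            similar += 1
    return similar / len(areas)
```

    Repeating the call over a grid of match rates (and many seeds) and finding where the index drops below a chosen cut-off gives a rough "minimum match rate" for a given number of points and areal units.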

    Spatial Concentration of Opioid Overdose Deaths in Indianapolis: An Application of the Law of Crime Concentration at Place to a Public Health Epidemic

    The law of crime concentration at place has become a criminological axiom and the foundation for one of the strongest evidence-based policing strategies to date. Using longitudinal data from three sources, namely emergency medical service calls, death toxicology reports from the Marion County (Indiana) Coroner’s Office, and police crime data, we make four unique contributions to this literature. First, this study provides the first estimate of the spatial concentration of opioid-related deaths. Second, our findings support the spatial concentration of opioid deaths and the feasibility of this approach for public health incidents that often fall outside the purview of traditional policing. Third, we find that opioid overdose death hot spots spatially overlap with areas of concentrated violence. Finally, we apply a recent method, the corrected Gini coefficient, to better specify concentrations of low-N incidents, and we propose a novel method to address a shortcoming of this approach. Implications for research and interventions are discussed.
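    To see why a correction is needed for low-N incidents, consider the Gini coefficient of event counts across places. When there are fewer events than places, even the most spread-out arrangement (every event at a different place) leaves most places at zero, so the ordinary Gini cannot fall below 1 - n/N. The sketch below assumes one form of correction, a rescaling that maps that most even attainable arrangement to 0; it is illustrative and is not necessarily the exact formula used in the paper.

```python
def gini(counts):
    """Population Gini coefficient of event counts across places (0 = perfectly even)."""
    xs = sorted(counts)
    n_places = len(xs)
    total = sum(xs)  # assumed > 0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n_places * total) - (n_places + 1) / n_places


def corrected_gini(counts):
    """Assumed low-N correction: when there are fewer events than places, rescale G
    so that the most even attainable spread (one event per place) scores 0."""
    n_places = len(counts)
    n_events = sum(counts)
    g = gini(counts)
    if n_events >= n_places:
        return g  # no correction needed
    g_min = 1 - n_events / n_places  # Gini when every event is at a different place
    return (g - g_min) / (1 - g_min)
```

    For counts [1, 1, 0, 0] the ordinary Gini is 0.5 even though the two events are as spread out as possible; the corrected value is 0.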

    An Interactive Method for Tracking & Geocoding

    Tracking has been used commercially for many years. Important applications include RFID tracking of animals and marine life, and military applications. Tracking and navigation are performed using a wireless sensor with a certain transmission power. In geocoding, local place names are matched to their respective latitudes and longitudes; once an object has been tracked, geocoding is used to place its location on the map. In this paper, we survey the various methods used for geocoding and tracking.

    Spatio-temporal methods for the analysis of crime and traffic safety data

    Since the physician John Snow analyzed the spatial distribution of cholera cases detected in the 1854 London epidemic, many disciplines have benefited from spatio-temporal statistical methods, including agriculture, astronomy, biology, epidemiology, geology, hydrology, meteorology, and remote sensing. This thesis focuses on the development and application of spatio-temporal methods in two disciplines: traffic safety analysis and criminology.
In particular, a main objective has been to detect research gaps in the currently available literature. Investigating several problems that commonly arise in these two fields, each requiring a specific statistical approach, has led to the following structure. After an introductory chapter, two traffic safety studies in which a linear network structure is fundamental are presented. Chapter 2 contains a street-level multivariate analysis of traffic accident occurrence that distinguishes between intersection and non-intersection segments. Chapter 3 presents and applies a method for detecting differential risk "hotspots" along a network. Chapter 4 includes a spatio-temporal analysis of a burglary dataset focused on the near-repeat phenomenon, which is central in criminology. The classic Knox test is adapted to account for spatio-temporal heterogeneity in burglary risk, which provides a more accurate picture of the magnitude of the phenomenon. Specifically, an adjustment is proposed that is suitable when spatio-temporal variation is absent in both the exposure variable and the covariates. Chapter 5 includes a detailed study of the modifiable areal unit problem (MAUP) in traffic safety analysis. As a novelty compared to previous studies, the scale and zoning of the spatial structures considered are explicitly controlled. Furthermore, the analysis focuses not only on the final consequences in terms of model estimation and precision, but also on the changes that occur in the variables involved. Chapter 6 compares several methodologies for analyzing how proximity to certain places influences the incidence of an event of interest.
Specifically, this comparison assesses the relationship between traffic accidents and the location of educational centers. Chapter 7 analyzes an issue that has received great attention in quantitative criminology: the loss of reliability of analyses caused by non-geocoded events. It has been estimated that an 85% geocoding success rate is sufficient for further analysis of the data. This thesis re-estimates that percentage, considering factors and methods not included in the initial estimate, and concludes that an 85% success rate may not be sufficient under certain conditions. Finally, Chapter 8 describes two R packages developed during this thesis: SpNetPrep, which allows the preprocessing and curation of a linear network, and DRHotNet, which implements the "hotspot" detection procedure described in Chapter 3.
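    For readers unfamiliar with the Knox test mentioned for Chapter 4, the sketch below shows only the classic version, not the heterogeneity-adjusted version the thesis proposes. It counts pairs of events that are close in both space and time and compares that count with counts obtained after randomly reassigning event times to event locations. The space and time bands, the projected (x, y) coordinates, and the function names are assumptions for illustration.

```python
import math
import random
from itertools import combinations


def knox_statistic(events, space_band, time_band):
    """Count event pairs that are close in both space and time.

    events: list of (x, y, t) with x, y in projected units and t in days.
    """
    count = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(events, 2):
        if math.hypot(x1 - x2, y1 - y2) <= space_band and abs(t1 - t2) <= time_band:
            count += 1
    return count


def knox_test(events, space_band, time_band, n_perm=999, seed=0):
    """Classic Knox test with a Monte Carlo p-value from permuted event times."""
    rng = random.Random(seed)
    observed = knox_statistic(events, space_band, time_band)
    coords = [(x, y) for x, y, _ in events]
    times = [t for _, _, t in events]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(times)  # breaks any space-time link but keeps both marginals
        permuted = [(x, y, t) for (x, y), t in zip(coords, times)]
        if knox_statistic(permuted, space_band, time_band) >= observed:
            at_least_as_extreme += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

    Because the permutation keeps each location's overall risk but ignores how risk varies over space and time jointly, the classic test can overstate near-repeat effects when risk is heterogeneous, which is the motivation for the adjustment described above.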

    Region-based Dynamic Weighting Probabilistic Geocoding

    Geocoding has been a widely used technology in daily life and scientific research for at least four decades. In scientific research especially, geocoding serves as a generator of spatial data for further analysis, so it is extremely important that geocoding results be as accurate as possible. Existing global-weighting approaches to geocoding assume spatial stationarity of addressing systems and of the distributions of address data characteristics across space, resulting in heuristics and approaches that apply global parameters to produce geocodes for addresses in all regions. However, different regions in the United States (US) have different values and densities of address attributes, which increases the error of standard algorithms that assume global parameters and calculation weights. Region-based dynamic weighting can be used in probabilistic geocoding approaches to stabilize and reduce incorrect match probability assignments caused by place-specific naming conventions that vary from region to region across the US. This study tested the spatial accuracy and time efficiency of a region-based dynamic weighting probabilistic geocoding system against a set of manually corrected geocoding results in the City of Los Angeles. The results show that the region-based dynamic weighting probabilistic method improves the spatial accuracy of geocoding results and has a moderate influence on the time efficiency of the geocoding system.
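    The difference between global and region-based weighting can be illustrated with Fellegi-Sunter-style match weights, a common probabilistic record-linkage scheme assumed here rather than taken from the study. Each compared field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees; region-based weighting swaps in region-specific u-probabilities. All probability values and region names below are hypothetical.

```python
import math

# Hypothetical m-probabilities: P(field agrees | candidate is the true match).
M_PROB = {"street_name": 0.95, "street_type": 0.90, "zip": 0.97}

# Hypothetical u-probabilities: P(field agrees | candidate is NOT the match).
# A global approach uses one table everywhere; region-based weighting looks up
# region-specific values, e.g. where many streets share the same name.
U_PROB = {
    "global": {"street_name": 0.01, "street_type": 0.30, "zip": 0.05},
    "region_a": {"street_name": 0.05, "street_type": 0.60, "zip": 0.05},
}


def match_weight(agreements, region):
    """Sum of log2 likelihood ratios over compared fields.

    agreements: dict field -> True if the input and candidate agree on that field.
    Unknown regions fall back to the global table.
    """
    u_table = U_PROB.get(region, U_PROB["global"])
    total = 0.0
    for field, agrees in agreements.items():
        m, u = M_PROB[field], u_table[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total
```

    In "region_a", agreement on street name and type is more common among non-matches, so the same agreement pattern earns a lower weight there than under the global table; that is the kind of region-to-region difference a single global parameter set cannot capture.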

    Criminal Mobility Of Robbery Offenders

    The current paper addresses the mobility and willingness to travel of robbery offenders. A five-category robbery typology was constructed: personal robbery, commercial robbery, carjacking robbery, home-invasion robbery, and robbery by sudden snatching. Mobility was defined as the straight-line distance between the offender's home residence and the location of the robbery offense, and the extent of criminal mobility was analyzed for each type of robbery. Using geographic information system (GIS) technologies and, more specifically, geocoding software, the latitude and longitude of each offender's home and offense location were determined. A subset of robbery offenders was found to exhibit relatively high mobility across all five robbery types. However, distinct mobility patterns also emerged between the different types of robbery offenses. Policy and research implications of these findings are discussed.
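    Once home and offense locations are geocoded to latitude and longitude, the straight-line distance can be computed with the haversine formula. This is a sketch under a spherical-Earth assumption (mean radius 6,371 km); the record layout and function names are hypothetical, and the paper may have measured distances in a projected coordinate system instead.

```python
import math
from collections import defaultdict
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def straight_line_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_by_type(records):
    """records: iterable of (robbery_type, (home_lat, home_lon), (offense_lat, offense_lon))."""
    distances = defaultdict(list)
    for robbery_type, home, offense in records:
        distances[robbery_type].append(straight_line_km(*home, *offense))
    return {robbery_type: mean(values) for robbery_type, values in distances.items()}
```

    As a sanity check, one degree of longitude along the equator comes out at roughly 111 km. Comparing the per-type means (or full distributions) is one simple way to look for the distinct mobility patterns between robbery types.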

    Improving Geocoding Rates in Preparation for Crime Data Analysis

    A new geocoding toolkit was developed to improve the "hit" rate, that is, the rate at which a batch of crimes can be accurately located on a map (geocoding here means matching a crime to the geographic location where it occurred). The toolkit is not meant to replace commercial address-matching software such as Matchcode or QAS, but to enhance the outcome of the geocoding process by building additional steps and tools around these existing products. It is a five-stage process. The first stage cleans common errors that arise in the address fields of crime data. In the second stage, the crime data are passed through commercial address-matching software, which attaches geographic coordinates to the crime location based on a street address. All addresses successfully geocoded at this stage are given the validation code "L1," indicating that the crime has been linked to an individual property address at the highest level of accuracy. The third stage focuses on crimes with nonaddress locations. Most of these are street junctions, which can be found in the free-text field that describes the venue of the crime; other nonaddress locations include railway stations, bus stations, and prominent landmarks. Junctions are text-mined by searching for keywords. In the fourth stage, all remaining records with a valid unit postcode (mail delivery point) are geocoded at the postcode level. The final stage geocodes all remaining records by street name. A test of the system in a British police force raised the "hit" rate for accurate crime location by an additional 65 percent, to a rate of 91 percent.
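    The five stages form a fallback cascade: each record is tried at the most precise level first and drops to coarser levels only if that fails. The sketch below is a hypothetical illustration of that structure, not the toolkit itself. The commercial matcher is stubbed as a dictionary lookup, the junction keywords and the simplified UK postcode pattern are assumptions, and only the "L1" validation code comes from the description; "L2" to "L4" are invented labels.

```python
import re

# Simplified pattern for a UK unit postcode such as "AB1 2CD" (an assumption;
# the full postcode grammar has more cases).
POSTCODE = re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}")
# Hypothetical junction keywords searched for in the free-text venue field.
JUNCTION = re.compile(r"(?:JUNCTION|CORNER) OF (.+?) (?:AND|WITH) (.+)")


def clean(text):
    """Stage 1: normalise case and whitespace (a stand-in for real error cleaning)."""
    return re.sub(r"\s+", " ", text.strip().upper())


def geocode_record(record, address_index, junction_index, postcode_index, street_index):
    """Run the staged fallbacks in order; return (coordinates, validation_code).

    Only "L1" comes from the toolkit description; "L2"-"L4" are hypothetical codes.
    """
    address = clean(record.get("address", ""))
    if address in address_index:  # Stage 2: address matcher, stubbed as a lookup
        return address_index[address], "L1"
    junction = JUNCTION.search(clean(record.get("venue", "")))
    if junction:  # Stage 3: junction text-mined from the free-text venue field
        key = tuple(sorted(junction.groups()))  # order-independent street pair
        if key in junction_index:
            return junction_index[key], "L2"
    postcode = clean(record.get("postcode", ""))
    if POSTCODE.fullmatch(postcode) and postcode in postcode_index:  # Stage 4
        return postcode_index[postcode], "L3"
    street = clean(record.get("street", ""))
    if street in street_index:  # Stage 5: street-name level
        return street_index[street], "L4"
    return None, None


def hit_rate(records, *indexes):
    """Share of records that receive coordinates at any stage."""
    located = sum(geocode_record(r, *indexes)[0] is not None for r in records)
    return located / len(records)
```

    Keeping the validation code alongside the coordinates lets an analyst report the overall hit rate while still filtering to high-precision ("L1") locations for analyses that need them.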

    A workflow for geocoding South African addresses

    Many industries have long used Geographical Information Systems (GIS) for spatial analysis. In many parts of the world, however, GIS has gained less popularity because of inaccurate geocoding methods and a lack of data standardization. Commercial services can also be expensive, so smaller businesses have been reluctant to make a financial commitment to spatial analytics. This thesis discusses the challenges specific to South Africa as well as those inherent in bad address data. The main goal of this research is to highlight the potential error rates of geocoded, user-captured address data and to provide a workflow that reduces the error rate without intensive manual data cleansing. We developed a six-step workflow and software package to prepare address data for spatial analysis and determine the potential error rate. We used three geocoding methods: a gazetteer postal code file, a free web API, and an international commercial product. To protect the privacy of the clients and the businesses, addresses were aggregated to postcode or suburb centroid precision. Geocoding results were analysed before and after each step. Two businesses were analysed: a mid-to-large business with a large, structured client address database and a small private business with a 20-year-old unstructured client address database. The companies come from two completely different industries: the larger is in the financial industry, and the smaller is an independent magazine publisher.
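    Analysing results "before and after each step" amounts to re-measuring the match rate after every cleaning operation. The sketch below illustrates this for the gazetteer postal-code method only. The gazetteer entries and their coordinates are placeholders, not real centroids, and the three cleaning steps are hypothetical examples (including the assumption that four-digit postcodes may have lost a leading zero, for instance in a spreadsheet); they are not the thesis's six steps.

```python
import re

# Hypothetical gazetteer: postcode -> centroid (lat, lon); placeholder values only.
GAZETTEER = {"0181": (-25.7, 28.2), "2000": (-26.2, 28.0), "8001": (-33.9, 18.4)}


def strip_spaces(code):
    return code.strip()


def digits_only(code):
    return re.sub(r"\D", "", code)


def zero_pad(code):
    # Assumes four-digit postcodes whose leading zero may have been dropped.
    return code.zfill(4) if code else code


STEPS = [("strip whitespace", strip_spaces), ("keep digits", digits_only), ("zero-pad to 4", zero_pad)]


def match_rate(codes):
    """Share of postcodes found in the gazetteer."""
    return sum(code in GAZETTEER for code in codes) / len(codes)


def run_workflow(raw_codes):
    """Apply each step in turn, recording the match rate before and after each one."""
    codes = list(raw_codes)
    report = [("raw", match_rate(codes))]
    for name, step in STEPS:
        codes = [step(code) for code in codes]
        report.append((name, match_rate(codes)))
    return codes, report
```

    With the inputs " 2000", "8001 ", "181", "P.O. 2000", and "9999", the match rate climbs from 0.0 to 0.4, 0.6, and finally 0.8; the remaining record fails because its code is simply absent from the gazetteer, which is the kind of residual error rate the workflow is meant to expose.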