Modeling the probability distribution of positional errors incurred by residential address geocoding
BACKGROUND: The assignment of a point-level geocode to each subject's residence is an important data-assimilation component of many geographic public health studies. These assignments are often made by automated geocoding, which attempts to match each subject's address to an address-ranged street segment georeferenced within a streetline database and then interpolates the position of the address along that segment. Unfortunately, this process introduces positional errors. Our study sought to model the probability distribution of the positional errors associated with automated geocoding and E911 geocoding. RESULTS: Positional errors were determined for 1423 rural addresses in Carroll County, Iowa, as the vector difference between each 100%-matched automated geocode and its true location as determined from orthophoto and parcel information. Errors were also determined for 1449 60%-matched geocodes and 2354 E911 geocodes. Huge (> 15 km) outliers occurred among the 60%-matched geocoding errors; outliers also occurred for the other two types of geocoding errors, but they were much smaller. E911 geocoding was more accurate (median error length = 44 m) than 100%-matched automated geocoding (median error length = 168 m). The empirical distributions of positional errors associated with 100%-matched automated geocoding and E911 geocoding exhibited a distinctive Greek-cross shape and had many other interesting features that could not be fitted adequately by a single bivariate normal or t distribution. However, mixtures of t distributions with two or three components fit the errors very well. CONCLUSION: Mixtures of bivariate t distributions with few components appear to be flexible enough to fit many positional-error datasets associated with geocoding, yet parsimonious enough to be feasible for nascent applications of measurement-error methodology to spatial epidemiology.
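The error model in this abstract can be illustrated with a short Python sketch. It computes a positional error as the vector difference between a geocode and its true location, takes the median error length, and evaluates the density of a mixture of bivariate t distributions. The function names and the two-component `CROSS` parameters are hypothetical illustrations, not the values fitted to the Carroll County data, and fitting the mixture to data is not shown.

```python
import math

def error_vector(geocode, truth):
    """Positional error as the vector difference geocode - truth (projected x, y in metres)."""
    return (geocode[0] - truth[0], geocode[1] - truth[1])

def error_length(vec):
    """Euclidean length of an error vector."""
    return math.hypot(vec[0], vec[1])

def median(values):
    """Median of a non-empty sequence."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])

def bivariate_t_pdf(x, y, mu, cov, nu):
    """Density of a bivariate Student t with location mu, 2x2 scale matrix cov, df nu."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x - mu[0], y - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the explicit 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    p = 2  # dimension of the error vector
    log_norm = (math.lgamma((nu + p) / 2) - math.lgamma(nu / 2)
                - (p / 2) * math.log(nu * math.pi) - 0.5 * math.log(det))
    return math.exp(log_norm - (nu + p) / 2 * math.log1p(q / nu))

def t_mixture_pdf(x, y, components):
    """Mixture density; components is a list of (weight, mu, cov, nu), weights summing to 1."""
    return sum(w * bivariate_t_pdf(x, y, mu, cov, nu) for w, mu, cov, nu in components)

# Hypothetical (not fitted) parameters: one component stretched east-west and one
# stretched north-south, both centred on the true location, giving a cross-shaped
# density similar in spirit to the Greek-cross pattern reported in the abstract.
CROSS = [
    (0.5, (0.0, 0.0), ((400.0 ** 2, 0.0), (0.0, 30.0 ** 2)), 3.0),
    (0.5, (0.0, 0.0), ((30.0 ** 2, 0.0), (0.0, 400.0 ** 2)), 3.0),
]
```

With these illustrative parameters, `t_mixture_pdf(300.0, 0.0, CROSS)` is far larger than the density at the same distance along a diagonal, `t_mixture_pdf(212.1, 212.1, CROSS)`, which is the cross-shaped behaviour a single elliptical distribution cannot reproduce.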
Geocoding accuracy and the recovery of relationships between environmental exposures and health
BACKGROUND: This research develops methods for determining the effect of geocoding quality on relationships between environmental exposures and health. The likelihood of detecting an existing relationship (statistical power) between measures of environmental exposure and health depends not only on the strength of the relationship but also on the positional accuracy and completeness of the geocodes from which the exposure measures are derived. This paper summarizes the results of simulation studies conducted to examine the impact of inaccuracies in geocoded addresses generated by three types of geocoding process: (a) addresses located on orthophoto maps; (b) addresses matched to TIGER files (U.S. Census street files or their derivatives); and (c) E-911 geocodes (developed by local authorities for emergency dispatch). RESULTS: The simulated odds of disease using exposures modelled from the highest-quality geocodes could be sufficiently recovered using other, more commonly used geocoding processes such as TIGER and E-911; however, the strength of the odds relationship between disease and exposures modelled at geocodes generally declined as geocoding accuracy decreased. CONCLUSION: Although these specific results cannot be generalized to new situations, the methods used to determine the sensitivity of the results can be applied in new situations. Estimated measures of positional accuracy must be used when interpreting the results of analyses that investigate relationships between health outcomes and exposures measured at residential locations. Analyses similar to those employed in this paper can be used to validate the interpretation of results from empirical analyses that use geocoded locations with estimated measures of positional accuracy.
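A toy Monte Carlo experiment in Python can show how positional error weakens an estimated exposure-disease association. This is not the authors' simulation design: it assumes a single hypothetical point source, a binary exposure defined as living within 2 km of it, fixed disease risks, and isotropic Gaussian geocoding error (1 km standard deviation per axis) as a stand-in for a less accurate geocoder. All names and parameter values are illustrative.

```python
import math
import random

def odds_ratio(table):
    """Odds ratio from a 2x2 table indexed as table[exposed][diseased]."""
    return (table[1][1] * table[0][0]) / (table[1][0] * table[0][1])

def simulate_odds_ratios(n=20000, half_width=5000.0, radius=2000.0,
                         p_unexposed=0.05, p_exposed=0.15,
                         error_sd=1000.0, seed=1):
    """Return (OR at true locations, OR at error-perturbed geocodes)."""
    rng = random.Random(seed)
    true_table = [[0, 0], [0, 0]]
    geo_table = [[0, 0], [0, 0]]
    for _ in range(n):
        # True residence, uniform over a square study area (metres) with a
        # single hypothetical pollution source at the origin.
        x = rng.uniform(-half_width, half_width)
        y = rng.uniform(-half_width, half_width)
        exposed = math.hypot(x, y) < radius
        diseased = rng.random() < (p_exposed if exposed else p_unexposed)
        # Geocode = true location + isotropic Gaussian positional error.
        gx = x + rng.gauss(0.0, error_sd)
        gy = y + rng.gauss(0.0, error_sd)
        geo_exposed = math.hypot(gx, gy) < radius
        true_table[int(exposed)][int(diseased)] += 1
        geo_table[int(geo_exposed)][int(diseased)] += 1
    return odds_ratio(true_table), odds_ratio(geo_table)

or_true, or_geo = simulate_odds_ratios()
print(f"OR at true locations: {or_true:.2f}; OR at geocodes: {or_geo:.2f}")
```

Because the simulated error is independent of disease status, the resulting exposure misclassification tends to pull the geocode-based odds ratio toward 1; setting `error_sd=0.0` reproduces the true-location odds ratio exactly. Repeating the run over many seeds and recording how often the association is detected would turn this sketch into a rough power estimate.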
A probabilistic sampling method (PSM) for estimating geographic distance to health services when only the region of residence is known
BACKGROUND: The need to estimate the distance from an individual to a service provider is common in public health research. However, estimated distances are often imprecise and, we suspect, biased because of a lack of specific residential location data. In many cases, to protect subject confidentiality, data sets contain only a ZIP Code or a county. RESULTS: This paper describes an algorithm, the probabilistic sampling method (PSM), which was used to create a distribution of estimated distances to a health facility for a person whose region of residence was known, using demographic details and centroids available for smaller areas within that region. From this distribution, the median distance is taken as the most likely distance to the facility. Using Monte Carlo sampling, the algorithm drew a probabilistic sample from all the smaller areas (Census blocks) within each participant's reported region (ZIP Code), weighting each area by the number of residents in the same age group as the participant. To test the PSM, we used data from a large cross-sectional study that screened women at a clinic for intimate partner violence (IPV). We had data on each woman's age and ZIP Code, but not her precise residential address. We used the PSM to select a sample of census blocks and then calculated network distances from each census block's centroid to the closest IPV facility, yielding a distribution of distances from these locations to the geocoded locations of known IPV services. We selected the median distance as the most likely distance traveled and computed confidence intervals that give the shortest and longest distances within which any given percentage of the distance estimates lie. We compared our results with those obtained using two other geocoding approaches and show that one method overestimated the most likely distance and the other underestimated it. Neither of the alternative methods produced confidence intervals for the distance estimates. The algorithm was implemented in R. CONCLUSIONS: The PSM has a number of benefits over traditional geocoding approaches. This methodology improves the precision of estimates of geographic access to services when complete residential address information is unavailable and, by computing the expected distribution of possible distances for any respondent together with the associated distance confidence limits, makes sensitivity analyses on distance-based access measures possible. Faulty or imprecise distance measures may compromise decisions about service location and misdirect scarce resources.
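The sampling steps described for the PSM can be sketched in Python. This is a hypothetical illustration, not the authors' R implementation: the block centroids, age-group counts, and function names are made up, straight-line (Euclidean) distance is substituted for the network distance used in the study, and the interval is assumed to be an equal-tailed percentile interval of the sampled distances. `math.dist` requires Python 3.8 or later.

```python
import math
import random

def psm_distances(blocks, age_group, facilities, n_draws=1000, seed=7):
    """Monte Carlo sample of blocks in one region, weighted by residents in age_group;
    returns the distance from each sampled block centroid to its nearest facility."""
    rng = random.Random(seed)
    weights = [block["pop"].get(age_group, 0) for block in blocks]
    sampled = rng.choices(blocks, weights=weights, k=n_draws)
    # Straight-line distance stands in for the network distance used in the study.
    return [min(math.dist(block["centroid"], f) for f in facilities) for block in sampled]

def percentile(sorted_values, q):
    """Linearly interpolated percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def summarize(distances, coverage=0.95):
    """Median distance plus an equal-tailed interval holding `coverage` of the draws."""
    s = sorted(distances)
    tail = (1.0 - coverage) / 2.0
    return percentile(s, 0.5), (percentile(s, tail), percentile(s, 1.0 - tail))

# Hypothetical ZIP Code with two census blocks (coordinates in km) and one facility.
blocks = [
    {"centroid": (3.0, 4.0), "pop": {"25-34": 90}},
    {"centroid": (0.0, 12.0), "pop": {"25-34": 10, "65+": 20}},
]
facilities = [(0.0, 0.0)]
median_km, (low_km, high_km) = summarize(psm_distances(blocks, "25-34", facilities))
```

With 90 of the 100 women aged 25-34 living in the nearer block, the median sampled distance is 5 km and the 95% interval spans the two block distances (5 km to 12 km); a participant aged 65+ would be assigned only the farther block, because the nearer one has no residents in that group.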
Institutions Sharing Geographic Information—NCGIA Research Initiative 9, Closing Report
This report describes the results of NCGIA Research Initiative 9 on Institutions Sharing Geographic Information. The initiative was active from June 1992 to summer 1994. Its focus was to identify and understand the behavioral and organizational impediments and incentives to sharing geographic information within different kinds of geographic-information-user environments. Distinctions were drawn between the protocols that facilitate sharing among decision makers in the public and private sectors and those that facilitate sharing among scientists. Consideration was also given to the types of spatial data shared and the types of problems addressed by decision makers. Spatial metadata was seen as a critical priority for enhancing the sharing of geographic information.
NCGIA Research Initiative 9, Institutions Sharing Geographic Information: Scientific Report for the Specialist Meeting, 26-29 February 1992 (92-5)
Sharing geographic information involves more than simple data exchange. To facilitate sharing, the GIS research and user communities must deal with both the technical and the institutional aspects of collecting, structuring, analyzing, presenting, disseminating, integrating, and maintaining spatial data. Significant efforts are already underway to address the technical difficulties inherent in sharing spatial data. Those efforts need to be bolstered by increased activity and research addressing institutional, organizational, and behavioral problems. To spur research on these topics, a group of specialists was brought together to explore the behavioral, organizational, and institutional issues that act as impediments or incentives to sharing geographic information among and within organizations. The specialists met for three days of presentations and discussions in San Diego from February 27 through February 29, 1992. During the meeting, participants took on the task of suggesting areas of research likely to be fruitful in addressing both near- and long-term problems in sharing geographic information. For each suggested research topic, theoretical frameworks and methodological approaches for pursuing it were proposed. This document reports the discussions and research recommendations arising from the specialist meeting process.