21 research outputs found

    A Power-Enhanced Algorithm for Spatial Anomaly Detection in Binary Labelled Point Data Using the Spatial Scan Statistic [postprint]

    This paper presents a novel modification to an existing algorithm for spatial anomaly detection in binary labeled point data sets, using the Bernoulli version of the Spatial Scan Statistic. We identify a potential ambiguity in p-values produced by Monte Carlo testing, which (by the selection of the most conservative p-value) can lead to sub-optimal power. When such ambiguity occurs, the modification uses a very inexpensive secondary test to suggest a less conservative p-value. Using benchmark tests, we show that this appears to restore power to the expected level, whilst having similar retest variance to the original. The modification also appears to produce a small but significant improvement in overall detection performance when multiple anomalies are present

    A spatial accuracy assessment of an alternative circular scan method for Kulldorff's spatial scan statistic

    This paper concerns the Bernoulli version of Kulldorff’s spatial scan statistic, and how accurately it identifies the exact centre of approximately circular regions of increased spatial density in point data. We present an alternative method of selecting circular regions that appears to give greater accuracy. Performance is tested in an epidemiological context using manifold synthetic case-control datasets. A small, but statistically significant, improvement is reported. The power of the alternative method is yet to be assessed

    A pilot inference study for a beta-Bernoulli spatial scan statistic

    The Bernoulli spatial scan statistic is used to detect localised clusters in binary labelled point data, such as that used in spatial or spatio-temporal case/control studies. We test the inferential capability of a recently developed beta-Bernoulli spatial scan statistic, which adds a beta prior to the original statistic. This pilot study, which includes two test scenarios with 6,000 data sets each, suggests a marked increase in power for a given false alert rate. We suggest a more extensive study would be worthwhile to corroborate the findings. We also speculate on an explanation for the observed improvement

    A graph-theory method for pattern identification in geographical epidemiology - a preliminary application to deprivation and mortality

    Background: Graph theoretical methods are extensively used in the field of computational chemistry to search datasets of compounds to see if they contain particular molecular substructures or patterns. We describe a preliminary application of a graph theoretical method, developed in computational chemistry, to geographical epidemiology in relation to testing a prior hypothesis. We tested the methodology on the hypothesis that if a socioeconomically deprived neighbourhood is situated in a wider deprived area, then that neighbourhood would experience greater adverse effects on mortality compared with a similarly deprived neighbourhood which is situated in a wider area with generally less deprivation. Methods: We used the Trent Region Health Authority area for this study, which contained 10,665 census enumeration districts (CED). Graphs are mathematical representations of objects and their relationships and within the context of this study, nodes represented CEDs and edges were determined by whether or not CEDs were neighbours (shared a common boundary). The overall area in this study was represented by one large graph comprising all CEDs in the region, along with their adjacency information. We used mortality data from 1988-1998, CED level population estimates and the Townsend Material Deprivation Index as an indicator of neighbourhood level deprivation. We defined deprived CEDs as those in the top 20% most deprived in the Region. We then set out to classify these deprived CEDs into seven groups defined by increasing deprivation levels in the neighbouring CEDs. 506 (24.2%) of the deprived CEDs had five adjacent CEDs and we limited pattern development and searching to these CEDs. We developed seven query patterns and used the RASCAL (Rapid Similarity Calculator) program to carry out the search for each of the query patterns. This program used a maximum common subgraph isomorphism method which was modified to handle geographical data. 
Results: Of the 506 deprived CEDs, 10 were not identified as belonging to any of the seven groups because they were adjacent to a CED with a missing deprivation category quintile, and none fell within query Group 1 (a deprived CED for which all five adjacent CEDs were affluent). Only four CEDs fell within Group 2, which was defined as having four affluent adjacent CEDs and one non-affluent adjacent CED. The numbers of CEDs in Groups 3-7 were 17, 214, 95, 81 and 85 respectively. Age and sex adjusted mortality rate ratios showed a non-significant trend towards increasing mortality risk across Groups (Chi-square = 3.26, df = 1, p = 0.07). Conclusion: Graph theoretical methods developed in computational chemistry may be a useful addition to the current GIS based methods available for geographical epidemiology but further developmental work is required. An important requirement will be the development of methods for specifying multiple complex search patterns. Further work is also required to examine the utility of using distance, as opposed to adjacency, to describe edges in graphs, and to examine methods for pattern specification when the nodes have multiple attributes attached to them

    New developments in the spatial scan statistic

    The quantity and variety of spatial data have increased over recent years, and the variety and sophistication of tools for analysing this type of data have also increased. One such tool is the spatial scan statistic, which is freely available (www.satscan.org) and has been the subject of much scholarly research since its introduction in 1995 owing to its numerous applications in epidemiology, criminology and other fields. This paper provides readers with a non-technical introduction to the spatial scan statistic, together with an overview of associated research, which focuses particularly on work conducted at the University of Sheffield’s Information School, in collaboration with the School of Health and Related Research. This work falls into three main areas. First, we provide an examination of the probability of obtaining false alerts when using the statistic, and ways in which this can be managed. Second, we describe the development of a definitive way of measuring the spatial accuracy of the statistic. Third, and potentially the most important in terms of impact, we discuss a means of substantially increasing the detection capability of the statistic by placing a realistic constraint on the strength of any cluster that is likely to be present in the data. The paper also provides a discussion of potential future research directions

    Use of graph theory to identify patterns of deprivation, high morbidity and mortality in public health datasets

    Objective: An important part of public health is identifying patterns of poor health and deprivation. Specific patterns of poor health may be associated with features of the geographic environment where contamination or pollution may be occurring. For example, there may be clusters of poor health surrounding nuclear power stations, whereas major roads or rivers may be associated with areas of poor health alongside the feature in chains. Current methods are limited in their capacity to search for complex patterns in geographic data sets. The objective of this study was to determine whether graph theory could be used to identify patterns of geographic areas that have high levels of deprivation, morbidity, and mortality in a public health database. The geographic areas used in the study were enumeration districts (EDs), which are the lowest level of census geography in England and Wales, representing on average 200 households in the 1991 census. More specifically, the study aimed to identify chains of EDs with high deprivation, morbidity, and mortality that might be adjacent to specific types of geographic features, i.e., rivers or major roads. Design: The maximum common subgraph (MCS) algorithm was used to search for seven query patterns of deprivation and poor health within the Trent region. Query pattern 1 represented a linear chain of five EDs and query patterns 2 to 7 represented the possible clusters of the five EDs. To identify chains of EDs with high deprivation, morbidity, and mortality, the results from the query patterns 2 to 7 were used to remove patterns (option 1) and EDs (option 2) from the results of query pattern 1. Measurements: Data on the Townsend Material Deprivation Index, standardized long-term limiting illness and standardized all-cause mortality rates were used for the 10,665 EDs within the Trent region. Results: The MCS algorithm retrieved a range of patterns and EDs from the database for the queries. 
Query pattern 1 identified 3,838 patterns containing a total of 195 EDs. When the patterns retrieved using query patterns 2 to 7 were removed from the 3,838 patterns using option 1, 1,704 patterns remained containing 161 EDs. When the EDs retrieved using query patterns 2 to 7 were removed from the 195 EDs identified by query pattern 1 using option 2, 12 EDs remained. The MCS algorithm was therefore able to reduce the numbers of patterns and EDs to allow manual examination for chains of EDs and for that which might be associated with them. Conclusion: The study demonstrates the potential of the MCS algorithm for searching for specific patterns of need. This method has potential for identifying such patterns in relation to local geographic features for public health

    Validation of graph-theoretical methods for pattern identification in public health datasets

    Pattern identification issues are commonly used in public health practice to identify disease clusters and tendencies towards clustering. The basic building blocks or units for such patterns may be individuals or geographical units, but the key factor is the association between units in terms of time, space or other complex links. A range of methods has been developed for cluster detection but these methods are not designed to handle complex pattern searching. This paper describes early work in developing a novel method of tackling this problem, using graph theoretical techniques developed for computational chemistry. A modified version of the maximum common subgraph isomorphism method was used to search and retrieve enumeration districts (EDs) using 27 user-defined patterns from a set of 106 EDs. The results were then checked manually to ensure that all the appropriate and no additional patterns and EDs were retrieved. The program successfully retrieved all the relevant patterns and EDs and did not retrieve any patterns not specified by the query patterns. This study demonstrates the applicability of using graph theory for identifying and retrieving patterns in public health datasets

    Use of graph theory for data mining in public health

    Data mining problems are common in public health, for example for identifying disease clusters and multidimensional patterns within large databases, e.g. socioeconomic differentials in health. Although numerous data mining methods have been developed, currently available methods are not designed to handle complex pattern searching queries and no satisfactory methods are available for this purpose. The aim of the study reported here was to test graph-theoretical methods for data mining in public health databases to identify areas of high deprivation that are surrounded by affluent areas and deprived areas surrounded by deprived areas. Graph theory (using the maximum common subgraph isomorphism (mcs) method) was used to search a database containing information on the 10920 enumeration districts (EDs) for the Trent Region of England. Each ED was allocated to a deprivation quintile based on the Townsend Deprivation Score. The mcs program was used to identify deprived EDs that are adjacent to deprived EDs and deprived EDs that are adjacent to affluent EDs. The mcs program identified 1528 deprived EDs adjacent to at least two deprived EDs, 1181 deprived EDs adjacent to at least three deprived EDs, 802 deprived EDs adjacent to at least four deprived EDs, and 505 deprived EDs adjacent to at least five deprived EDs. The program successfully identified 147 deprived EDs adjacent to at least two affluent EDs, 54 deprived EDs adjacent to at least three affluent EDs, 14 deprived EDs adjacent to at least four affluent EDs, and six deprived EDs adjacent to at least five affluent EDs. The retrieved EDs were then used for hypothesis testing using statistical methods. The study demonstrates the potential of graph theoretical techniques for data mining in public health databases