9 research outputs found

    Geospatial Data Science to Identify Patterns of Evasion

    University of Minnesota Ph.D. dissertation. January 2018. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); x, 153 pages. Over the last decade, there has been a significant growth in the availability of cheap raw spatial data in the form of GPS trajectories, activity/event locations, temporally detailed road networks, satellite imagery, etc. These data are being collected, often around the clock, from location-aware applications, sensor technologies, etc. and represent an unprecedented opportunity to study our economic, social, and natural systems and their interactions. For example, finding hotspots (areas with unusually high concentration of activities/events) from activity/event locations plays a crucial role in epidemiology since it may help public health officials prevent further spread of an infectious disease. In order to extract useful information from these datasets, many geospatial data tools have been proposed in recent years. However, these tools are often used as a “black box”, where a trial-and-error strategy is used with multiple approaches from different scientific disciplines (e.g., statistics, mathematics, and computer science) to find the best solution with little or no consideration of the actual phenomena being investigated. Hence, the results may be biased or some important information may be missed. To address this problem, we need geospatial data science with a stronger scientific foundation to understand the actual phenomena, develop reliable and trustworthy models and extract information through a scientific process. Thus, my thesis investigates a wide-lens perspective on geospatial data science, considering it as a transdisciplinary field comprising statistics, mathematics, and computer science.
This approach aims to reduce redundant work across disciplines and to define the scientific boundaries of geospatial data science, distinguishing it from a black box that claims to solve every possible geospatial problem. In my proposed approaches, I used ideas from those three disciplines, e.g., spatial scan statistics from statistical science to reduce chance patterns in the output and provide statistical robustness; mathematical definitions of geometric shapes of the patterns, which maintain correctness and completeness; and computational approaches (along with a prune-and-refine framework and dynamic programming ideas) to scale up to large spatial datasets. In addition, the proposed approaches incorporate domain-specific geographic theories (e.g., routine activity theory in criminology) for applicability in those domains that are interested in specific patterns, which occur due to the actual phenomena, from geospatial datasets. The proposed techniques have been applied to real-world disease and crime datasets, and the evaluations confirmed that our techniques outperform current state-of-the-art techniques such as density-based clustering approaches and circular hotspot detection methods.

    Cancer

    Maps are well recognized as an effective means of presenting and communicating health data, such as cancer incidence and mortality rates. These data can be linked to geographic features like counties or census tracts and their associated attributes for mapping and analysis. Such visualization and analysis provide insights regarding the geographic distribution of cancer and can be important for advancing effective cancer prevention and control programs. Applying a spatial approach allows users to identify location-based patterns and trends related to risk factors, health outcomes, and population health. Geographic information science (GIScience) is the discipline that applies Geographic Information Systems (GIS) and other spatial concepts and methods in research. This review explores the current state and evolution of GIScience in cancer research by addressing fundamental topics and issues regarding spatial data and analysis that need to be considered. GIScience, along with its health-specific application in the spatial epidemiology of cancer, incorporates multiple geographic perspectives pertaining to the individual, the health care infrastructure, and the environment. Challenges addressing these perspectives and the synergies among them can be explored through GIScience methods and associated technologies as integral parts of epidemiologic research, analysis efforts, and solutions. The authors suggest GIScience is a powerful tool for cancer research, bringing additional context to cancer data analysis and potentially informing decision-making and policy, ultimately aimed at reducing the burden of cancer. (CC999999/ImCDC/Intramural CDC HHS/United States; 2019-08-01; 31145834; PMC6625915)

    Exploring Decomposition for Solving Pattern Mining Problems

    This article introduces a highly efficient pattern mining technique called Clustering-based Pattern Mining (CBPM). This technique discovers relevant patterns by studying the correlation between transactions in the transaction database based on clustering techniques. The set of transactions is first clustered, such that highly correlated transactions are grouped together. Next, we derive the relevant patterns by applying a pattern mining algorithm to each cluster. We present two different pattern mining algorithms, one applying an approximation-based strategy and another based on an exact strategy. The approximation-based strategy takes into account only the clusters, whereas the exact strategy takes into account both clusters and shared items between clusters. To boost the performance of the CBPM, a GPU-based implementation is investigated. To evaluate the CBPM framework, we perform extensive experiments on several pattern mining problems. The results from the experimental evaluation show that the CBPM provides a reduction in both the runtime and memory usage. Also, CBPM based on the approximate strategy provides good accuracy, demonstrating its effectiveness and feasibility. Our GPU implementation achieves significant speedup of up to 552× on a single GPU using big transaction databases.
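
The cluster-then-mine idea in this abstract can be sketched roughly as follows. This is only an illustration, not the CBPM implementation: the greedy Jaccard clustering, the brute-force itemset counter, and every name and parameter (`cluster_transactions`, `frequent_itemsets`, `mine_by_cluster`, `threshold`, `min_support`, `max_size`) are assumptions made for the example. It mirrors the approximation-style strategy in that each cluster is mined on its own and items shared between clusters are ignored.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_transactions(transactions, threshold=0.3):
    """Greedy single-pass clustering (an assumption, not the paper's method).

    A transaction joins the first cluster whose first member is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []
    for t in transactions:
        items = frozenset(t)
        for cluster in clusters:
            if jaccard(items, cluster[0]) >= threshold:
                cluster.append(items)
                break
        else:
            clusters.append([items])
    return clusters


def frequent_itemsets(cluster, min_support, max_size=3):
    """Brute-force support counting of itemsets up to max_size in one cluster."""
    counts = Counter()
    for items in cluster:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                counts[combo] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


def mine_by_cluster(transactions, threshold=0.3, min_support=2):
    """Mine each cluster independently and merge the per-cluster supports."""
    merged = Counter()
    for cluster in cluster_transactions(transactions, threshold):
        merged.update(frequent_itemsets(cluster, min_support))
    return dict(merged)


# Toy data: two obviously separate groups of transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"x", "y"}, {"x", "y", "z"}]
patterns = mine_by_cluster(transactions, threshold=0.3, min_support=2)
```

Because support is counted per cluster, an itemset that is frequent only when occurrences in different clusters are added together would be missed here; the abstract's exact strategy is described as handling that case by also considering shared items between clusters.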

    A Computational Framework for Finding Interestingness Hotspots in Spatial Datasets

    The significant growth of spatial data has increased the need for automated discovery of spatial knowledge. An important task when analyzing spatial data is hotspot discovery. In this dissertation, we propose a novel methodology for discovering interestingness hotspots in spatial datasets. We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert’s notion of interestingness captured by an interestingness function. We propose computational methods for finding interestingness hotspots in point-based and polygonal spatial datasets, and gridded spatial-temporal datasets. The proposed framework identifies hotspots maximizing an externally given interestingness function defined on any number of spatial or non-spatial attributes using a five-step methodology, which consists of: (1) identifying neighboring objects in the dataset, (2) generating hotspot seeds, (3) growing hotspots from identified hotspot seeds, (4) post-processing to remove highly overlapping neighboring redundant hotspots, and (5) finding the scope of hotspots. In particular, we introduce novel hotspot growing algorithms that grow hotspots from hotspot seeds. A novel growing algorithm for point-based datasets is introduced that operates on Gabriel Graphs, capturing the neighboring relationships of objects in a spatial dataset. Moreover, we present a novel graph-based post-processing algorithm, which removes highly overlapping hotspots and employs a graph simplification step that significantly improves the runtime of finding a maximum weight independent set in the overlap graph of hotspots. The proposed post-processing algorithm is quite generic and can be used with any method to cope with overlapping hotspots or clusters. Additionally, the employed graph simplification step can be adapted as a preprocessing step by algorithms that find maximum weight cliques and maximum weight independent sets in graphs.
Furthermore, we propose a computational framework for finding the scope of two-dimensional point-based hotspots. We evaluate our framework in case studies using a gridded air-pollution dataset and point-based crime and taxicab datasets, in which we find hotspots based on different interestingness functions, and we compare our framework with a state-of-the-art hotspot discovery technique. Experiments show that our methodology succeeds in accurately discovering interestingness hotspots and does well in comparison to traditional hotspot detection methods.
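
Step (3) of the five-step methodology, growing a hotspot from a seed, might look something like the sketch below. It is a generic greedy region-growing loop under stated assumptions, not the dissertation's algorithm: the neighbor relation is a plain adjacency dictionary standing in for the Gabriel graph, and `grow_hotspot`, `reward`, the attribute values, and the threshold of 3 are all hypothetical.

```python
def grow_hotspot(seed, neighbors, interestingness):
    """Greedily grow a region from a seed.

    At each step, add the adjacent object that yields the highest
    interestingness; stop when no addition improves the current score.
    """
    region = set(seed)
    best = interestingness(region)
    while True:
        # Objects adjacent to the region but not yet part of it.
        frontier = {n for obj in region for n in neighbors[obj]} - region
        candidates = [(interestingness(region | {n}), n) for n in sorted(frontier)]
        if not candidates:
            break
        score, pick = max(candidates)
        if score <= best:
            break
        region.add(pick)
        best = score
    return region, best


# Hypothetical attribute values on a chain a - b - c - d.
values = {"a": 5, "b": 4, "c": 1, "d": 5}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}


def reward(region):
    """Hypothetical interestingness: total amount by which values exceed 3."""
    return sum(values[obj] - 3 for obj in region)


region, score = grow_hotspot({"a"}, neighbors, reward)
```

In this toy run the region stops at `{"a", "b"}` because adding the low-valued object `c` would lower the score, even though `d` beyond it is interesting. Greedy growth of this kind can stop at such local optima, which is one reason seeds are generated in several places and overlapping results are pruned afterwards.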

    Spatial Big Data Analytics: Classification Techniques for Earth Observation Imagery

    University of Minnesota Ph.D. dissertation. August 2016. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); xi, 120 pages. Spatial Big Data (SBD), e.g., earth observation imagery, GPS trajectories, temporally detailed road networks, etc., refers to geo-referenced data whose volume, velocity, and variety exceed the capability of current spatial computing platforms. SBD has the potential to transform our society. Vehicle GPS trajectories together with engine measurement data provide a new way to recommend environmentally friendly routes. Satellite and airborne earth observation imagery plays a crucial role in hurricane tracking, crop yield prediction, and global water management. The potential value of earth observation data is so significant that the White House recently declared that full utilization of this data is one of the nation's highest priorities. However, SBD poses significant challenges to current big data analytics. In addition to its huge dataset size (NASA collects petabytes of earth images every year), SBD exhibits four unique properties related to the nature of spatial data that must be accounted for in any data analysis. First, SBD exhibits spatial autocorrelation effects. In other words, we cannot assume that nearby samples are statistically independent. Current analytics techniques that ignore spatial autocorrelation often perform poorly, with problems such as low prediction accuracy and salt-and-pepper noise (i.e., pixels predicted as different from neighbors by mistake). Second, spatial interactions are not isotropic and vary across directions. Third, spatial dependency exists in multiple spatial scales. Finally, spatial big data exhibits heterogeneity, i.e., identical feature values may correspond to distinct class labels in different regions. Thus, learned predictive models may perform poorly in many local regions. My thesis investigates novel SBD analytics techniques to address some of these challenges.
To date, I have mostly focused on the challenges of spatial autocorrelation and anisotropy by developing novel spatial classification models such as spatial decision trees for raster SBD (e.g., earth observation imagery). To scale up the proposed models, I developed efficient learning algorithms via computational pruning. The proposed techniques have been applied to real-world remote sensing imagery for wetland mapping. I also developed a spatial ensemble learning framework to address the challenge of spatial heterogeneity, particularly the class ambiguity issue in geographical classification, i.e., samples with the same feature values belong to different classes in different spatial zones. Evaluations on three real-world remote sensing datasets confirmed that the proposed spatial ensemble learning framework outperforms current approaches such as bagging, boosting, and mixture of experts when class ambiguity exists.

    Crime Hotspot Detection: A Computational Perspective

    Given a set of crime locations, a statistically significant crime hotspot is an area where the concentration of crimes inside is significantly higher than outside. The motivation of crime hotspot detection is twofold: detecting crime hotspots to focus the deployment of police enforcement and predicting the potential residence of a serial criminal. Crime hotspot detection is computationally challenging due to the difficulty of enumerating all potential hotspot areas, selecting an interest measure to compare these with the overall crime intensity, and testing for statistical significance to reduce chance patterns. This chapter focuses on statistically significant crime hotspots. First, the foundations of spatial scan statistics and its applications (e.g., SaTScan) to circular hotspot detection are reviewed. Next, ring-shaped hotspot detection is introduced. Third, linear hotspot detection is described since most crimes occur along a road network. The chapter concludes with future research directions in crime hotspot detection.
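
The three ingredients named in this abstract (enumerating candidate areas, an interest measure, and a significance test) can be illustrated with a small circular scan. This is a simplified sketch under stated assumptions, not SaTScan or the chapter's algorithm: the study area is assumed to be a square of side `side`, circles are centred only on observed points, clipping of circles at the border is ignored, and the function names and toy data are made up for the example. The interest measure is the usual Poisson log-likelihood ratio, and significance is estimated by Monte Carlo simulation of uniformly random points.

```python
import math
import random


def log_likelihood_ratio(c, e, n):
    """Poisson scan-statistic LLR for c observed vs e expected points out of n."""
    if c <= e or e <= 0:
        return 0.0  # only areas with more points than expected are of interest
    inside = c * math.log(c / e)
    outside = (n - c) * math.log((n - c) / (n - e)) if c < n else 0.0
    return inside + outside


def best_circle(points, radii, side):
    """Scan circles centred on each point; return (max LLR, centre, radius)."""
    n = len(points)
    best = (0.0, None, None)
    for cx, cy in points:
        for r in radii:
            c = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= r)
            # Expected count under uniform randomness (border clipping ignored).
            e = n * math.pi * r * r / (side * side)
            llr = log_likelihood_ratio(c, e, n)
            if llr > best[0]:
                best = (llr, (cx, cy), r)
    return best


def monte_carlo_p_value(points, radii, side, trials=99, seed=0):
    """Share of random datasets whose best LLR reaches the observed one."""
    rng = random.Random(seed)
    observed = best_circle(points, radii, side)[0]
    n = len(points)
    exceed = 0
    for _ in range(trials):
        sim = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        if best_circle(sim, radii, side)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (trials + 1)


# Toy data: 20 uniform background points plus a tight cluster near (5, 5).
rng = random.Random(1)
side = 10.0
background = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(20)]
cluster = [(5 + 0.1 * math.cos(k), 5 + 0.1 * math.sin(k)) for k in range(15)]
points = background + cluster

llr, centre, radius = best_circle(points, radii=[0.5, 1.0], side=side)
p_value = monte_carlo_p_value(points, radii=[0.5, 1.0], side=side)
```

Comparing the observed maximum against the maximum from each simulated dataset, rather than testing every circle separately, is what keeps chance patterns from being reported merely because many candidate circles were examined.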