11 research outputs found

    Spatial Big Data Analytics: Classification Techniques for Earth Observation Imagery

    University of Minnesota Ph.D. dissertation. August 2016. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); xi, 120 pages. Spatial Big Data (SBD), e.g., earth observation imagery, GPS trajectories, temporally detailed road networks, etc., refers to geo-referenced data whose volume, velocity, and variety exceed the capability of current spatial computing platforms. SBD has the potential to transform our society. Vehicle GPS trajectories together with engine measurement data provide a new way to recommend environmentally friendly routes. Satellite and airborne earth observation imagery plays a crucial role in hurricane tracking, crop yield prediction, and global water management. The potential value of earth observation data is so significant that the White House recently declared that full utilization of this data is one of the nation's highest priorities. However, SBD poses significant challenges to current big data analytics. In addition to its huge dataset size (NASA collects petabytes of earth images every year), SBD exhibits four unique properties related to the nature of spatial data that must be accounted for in any data analysis. First, SBD exhibits spatial autocorrelation effects. In other words, we cannot assume that nearby samples are statistically independent. Current analytics techniques that ignore spatial autocorrelation often perform poorly, yielding low prediction accuracy and salt-and-pepper noise (i.e., pixels mistakenly predicted as different from their neighbors). Second, spatial interactions are not isotropic and vary across directions. Third, spatial dependency exists at multiple spatial scales. Finally, spatial big data exhibits heterogeneity, i.e., identical feature values may correspond to distinct class labels in different regions. Thus, learned predictive models may perform poorly in many local regions. My thesis investigates novel SBD analytics techniques to address some of these challenges. 
To date, I have mostly focused on the challenges of spatial autocorrelation and anisotropy by developing novel spatial classification models such as spatial decision trees for raster SBD (e.g., earth observation imagery). To scale up the proposed models, I developed efficient learning algorithms via computational pruning. The proposed techniques have been applied to real-world remote sensing imagery for wetland mapping. I also developed a spatial ensemble learning framework to address the challenge of spatial heterogeneity, particularly the class ambiguity issue in geographical classification, i.e., samples with the same feature values belong to different classes in different spatial zones. Evaluations on three real-world remote sensing datasets confirmed that the proposed spatial ensemble learning outperforms current approaches such as bagging, boosting, and mixture of experts when class ambiguity exists.
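The spatial autocorrelation property described in this abstract can be illustrated with Moran's I, a standard statistic that is not claimed here to be part of the dissertation's method. The sketch below assumes a small raster stored as a list of rows, rook (4-neighbor) adjacency, and a weight of 1 per neighbor pair; the function name `morans_i` and the example grids are hypothetical.

```python
def morans_i(grid):
    """Moran's I for a 2-D grid, assuming rook (4-neighbor) adjacency
    with weight 1 for every neighboring pair (an illustrative choice)."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[v - mean for v in row] for row in grid]

    cross = 0.0        # sum over ordered neighbor pairs of dev_i * dev_j
    weight_total = 0   # total number of ordered neighbor pairs (W)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    weight_total += 1

    variance_sum = sum(d * d for row in dev for d in row)
    if variance_sum == 0 or weight_total == 0:
        raise ValueError("Moran's I is undefined for a constant grid or a grid without neighbors")
    # I = (N / W) * (sum_ij w_ij * dev_i * dev_j) / (sum_i dev_i^2)
    return n * cross / (weight_total * variance_sum)


# Hypothetical 4x4 rasters: alternating pixels versus two homogeneous blocks.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
blocks = [[0, 0, 1, 1] for _ in range(4)]

print(morans_i(checkerboard))  # close to -1: neighbors always differ
print(morans_i(blocks))        # clearly positive: neighbors mostly agree
```

A value near zero would suggest that neighboring pixels are roughly independent; values far from zero indicate that the independence assumption made by many classifiers does not hold for the raster.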

    Geospatial Data Science to Identify Patterns of Evasion

    University of Minnesota Ph.D. dissertation. January 2018. Major: Computer Science. Advisor: Shashi Shekhar. 1 computer file (PDF); x, 153 pages. Over the last decade, there has been significant growth in the availability of cheap raw spatial data in the form of GPS trajectories, activity/event locations, temporally detailed road networks, satellite imagery, etc. These data are being collected, often around the clock, from location-aware applications, sensor technologies, etc. and represent an unprecedented opportunity to study our economic, social, and natural systems and their interactions. For example, finding hotspots (areas with an unusually high concentration of activities/events) from activity/event locations plays a crucial role in epidemiology since it may help public health officials prevent further spread of an infectious disease. In order to extract useful information from these datasets, many geospatial data tools have been proposed in recent years. However, these tools are often used as a "black box", where a trial-and-error strategy is used with multiple approaches from different scientific disciplines (e.g., statistics, mathematics, and computer science) to find the best solution with little or no consideration of the actual phenomena being investigated. Hence, the results may be biased or some important information may be missed. To address this problem, we need geospatial data science with a stronger scientific foundation to understand the actual phenomena, develop reliable and trustworthy models, and extract information through a scientific process. Thus, my thesis investigates a wide-lens perspective on geospatial data science, considering it as a transdisciplinary field comprising statistics, mathematics, and computer science. 
This approach aims to reduce redundant work across disciplines as well as define the scientific boundaries of geospatial data science to distinguish it from a black box that claims to solve every possible geospatial problem. In my proposed approaches, I used ideas from those three disciplines, e.g., spatial scan statistics from statistical science to reduce chance patterns in the output and provide statistical robustness; mathematical definitions of the geometric shapes of the patterns, which maintain correctness and completeness; and computational approaches (along with a prune-and-refine framework and dynamic programming ideas) to scale up to large spatial datasets. In addition, the proposed approaches incorporate domain-specific geographic theories (e.g., routine activity theory in criminology) for applicability in domains that are interested in specific patterns in geospatial datasets that arise from the actual phenomena. The proposed techniques have been applied to real-world disease and crime datasets, and the evaluations confirmed that our techniques outperform the current state of the art, such as density-based clustering approaches and circular hotspot detection methods.

    Discovery of Spatiotemporal Event Sequences

    Finding frequent patterns plays a vital role in many analytics tasks such as finding itemsets, associations, correlations, and sequences. In recent decades, spatiotemporal frequent pattern mining has emerged with the main goal of developing data-driven analysis frameworks for understanding underlying spatial and temporal characteristics in massive datasets. In this thesis, we focus on discovering spatiotemporal event sequences from large-scale region trajectory datasets with event annotations. Spatiotemporal event sequences are series of event types whose trajectory-based instances follow each other in a spatiotemporal context. We introduce new data models for storing and processing evolving region trajectories, provide a novel framework for modeling spatiotemporal follow relationships, and present novel spatiotemporal event sequence mining algorithms.

    A method to detect and represent temporal patterns from time series data and its application for analysis of physiological data streams

    In critical care, complex systems and sensors continuously monitor patients' physiological features such as heart rate and respiratory rate, thus generating significant amounts of data every second. This results in more than 2 million records generated per patient in an hour. It's an immense challenge for anyone trying to utilize this data when making critical decisions about patient care. Temporal abstraction and data mining are two research fields that have tried to synthesize time-oriented data to detect hidden relationships that may exist in the data. Various researchers have looked at techniques for generating abstractions from clinical data. However, the variety and speed of the data streams generated often overwhelm current systems, which are not designed to handle such data. Other attempts have sought to understand the complexity in time series data using mining techniques; however, existing models are not designed to detect temporal relationships that might exist in time series data (Inibhunu & McGregor, 2016). To address this challenge, this thesis proposes a method that extends the existing knowledge discovery frameworks to include components for detecting and representing temporal relationships in time series data. The developed method is instantiated within the knowledge discovery component of Artemis, a cloud-based platform for processing physiological data streams. This is a unique approach that utilizes pattern recognition principles to facilitate functions for: (a) temporal representation of time series data with abstractions, (b) temporal pattern generation and quantification, (c) frequent pattern identification, and (d) building a classification system. This method is applied to a neonatal intensive care case study with the motivating problem that discovery of specific patterns from patient data could be crucial for making improved decisions within patient care. 
Another application is in chronic care, to detect temporal relationships in ambulatory patient data before the occurrence of an adverse event. The research premise is that discovery of hidden relationships and patterns in data would be valuable in building a classification system that automatically characterizes physiological data streams. Such characterization could aid in the detection of new normal and abnormal behaviors in patients who may have life-threatening conditions.
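As a rough illustration of step (a), temporal representation with abstractions, the sketch below collapses a stream of timestamped readings into labeled state intervals. It is a generic state-abstraction example and not the method implemented in Artemis; the function name `abstract_states`, the `low`/`high` thresholds, and the sample values are all hypothetical.

```python
from itertools import groupby


def abstract_states(samples, low, high):
    """Collapse (timestamp, value) samples into (state, start, end) intervals.

    `low` and `high` are hypothetical thresholds: values below `low` are
    labeled "low", values above `high` are labeled "high", and everything
    in between is labeled "normal". Samples must be in time order.
    """
    def label(value):
        if value < low:
            return "low"
        if value > high:
            return "high"
        return "normal"

    intervals = []
    # groupby merges runs of consecutive samples that share the same label.
    for state, run in groupby(samples, key=lambda s: label(s[1])):
        run = list(run)
        intervals.append((state, run[0][0], run[-1][0]))
    return intervals


# Hypothetical heart-rate readings: (seconds since start, beats per minute).
heart_rate = [(0, 120), (1, 125), (2, 190), (3, 195), (4, 130)]
print(abstract_states(heart_rate, low=100, high=180))
# [('normal', 0, 1), ('high', 2, 3), ('normal', 4, 4)]
```

Intervals of this kind are far fewer than the raw records, and relationships between them (for example, one interval preceding or overlapping another) are the sort of temporal pattern the later steps could count and classify.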

    Cascading Spatio-Temporal Pattern Discovery


    Cascading Spatio-temporal pattern discovery: A summary of results

    Given a collection of Boolean spatio-temporal (ST) event types, the cascading spatio-temporal pattern (CSTP) discovery process finds partially ordered subsets of event types whose instances are located together and occur in stages. For example, analysis of crime datasets may reveal frequent occurrence of misdemeanors and drunk driving after bar closings on weekends and after large gatherings such as football games. Discovering CSTPs from ST datasets is important for application domains such as public safety (e.g., crime attractors and generators) and natural disaster planning (e.g., hurricanes). However, CSTP discovery is challenging for several reasons, including both the lack of computationally efficient, statistically meaningful metrics to quantify interestingness and the large cardinality of candidate pattern sets, which is exponential in the number of event types. Existing literature on ST data mining focuses on mining totally ordered sequences or unordered subsets. In contrast, this paper models CSTPs as partially ordered subsets of Boolean ST event types. We propose a new CSTP interest measure (the Cascade Participation Index) that is computationally cheap (O(n^2) vs. exponential, where n is the dataset size) as well as statistically meaningful. We propose a novel algorithm exploiting the ST nature of datasets and evaluate filtering strategies to quickly prune uninteresting candidates. We present a case study to find CSTPs from real crime reports and provide a statistical explanation. Experimental results indicate that the proposed multiresolution spatio-temporal (MST) filtering strategy leads to significant savings in computational costs.
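The abstract does not define the Cascade Participation Index, so the following is only a sketch under a stated assumption: that the measure has the same form as the participation index used for co-location patterns, i.e., the minimum over event types of the fraction of instances that take part in the pattern. It handles only a two-type pattern A followed by B; the neighborhood test, the function names, and the sample events are hypothetical.

```python
from math import hypot


def follows(a, b, radius, window):
    """True if event b = (x, y, t) occurs within `radius` of event a and
    strictly after it, by at most `window` time units (assumed relation)."""
    close = hypot(a[0] - b[0], a[1] - b[1]) <= radius
    later = 0 < b[2] - a[2] <= window
    return close and later


def participation_index(events_a, events_b, radius, window):
    """Assumed index for the pattern A -> B: the minimum of the fractions
    of A instances and B instances that appear in at least one pair."""
    if not events_a or not events_b:
        return 0.0
    # Pairwise join over all instances: quadratic in the dataset size.
    pairs = [
        (i, j)
        for i, a in enumerate(events_a)
        for j, b in enumerate(events_b)
        if follows(a, b, radius, window)
    ]
    ratio_a = len({i for i, _ in pairs}) / len(events_a)
    ratio_b = len({j for _, j in pairs}) / len(events_b)
    return min(ratio_a, ratio_b)


# Hypothetical events as (x, y, t): bar closings and later assaults.
bar_closings = [(0.0, 0.0, 0), (10.0, 10.0, 0)]
assaults = [(0.5, 0.0, 1), (0.0, 0.5, 2), (50.0, 50.0, 1)]
print(participation_index(bar_closings, assaults, radius=1.0, window=3))  # 0.5
```

In this example one of two bar closings and two of three assaults participate, so the index is min(1/2, 2/3). A pattern with a low index can be discarded early, which is the role filtering strategies play in pruning candidates.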