21 research outputs found

    A Computational Framework for Finding Interestingness Hotspots in Spatial Datasets

    The significant growth of spatial data increased the need for automated discovery of spatial knowledge. An important task when analyzing spatial data is hotspot discovery. In this dissertation, we propose a novel methodology for discovering interestingness hotspots in spatial datasets. We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert’s notion of interestingness captured by an interestingness function. We propose computational methods for finding interestingness hotspots in point-based and polygonal spatial datasets, and gridded spatial-temporal datasets. The proposed framework identifies hotspots maximizing an externally given interestingness function defined on any number of spatial or non-spatial attributes using a five-step methodology, which consists of: (1) identifying neighboring objects in the dataset, (2) generating hotspot seeds, (3) growing hotspots from identified hotspot seeds, (4) post-processing to remove highly overlapping neighboring redundant hotspots, and (5) finding the scope of hotspots. In particular, we introduce novel hotspot growing algorithms that grow hotspots from hotspot seeds. A novel growing algorithm for point-based datasets is introduced that operates on Gabriel Graphs, capturing the neighboring relationships of objects in a spatial dataset. Moreover, we present a novel graph-based post-processing algorithm, which removes highly overlapping hotspots and employs a graph simplification step that significantly improves the runtime of finding maximum weight independent set in the overlap graph of hotspots. The proposed post-processing algorithm is quite generic and can be used with any methods to cope with overlapping hotspots or clusters. Additionally, the employed graph simplification step can be adapted as a preprocessing step by algorithms that find maximum weight clique and maximum weight independent sets in graphs. 
Furthermore, we propose a computational framework for finding the scope of two-dimensional point-based hotspots. We evaluate our framework in case studies using a gridded air-pollution dataset and point-based crime and taxicab datasets, in which we find hotspots based on different interestingness functions, and we compare our framework with a state-of-the-art hotspot discovery technique. Experiments show that our methodology succeeds in accurately discovering interestingness hotspots and does well in comparison to traditional hotspot detection methods.
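The abstract above mentions Gabriel Graphs as the structure that captures neighboring relationships between point objects. As a rough illustration only (not the dissertation's implementation), the sketch below builds a Gabriel graph by brute force: two points are connected when no third point lies strictly inside the disk whose diameter is the segment between them. The function name `gabriel_edges` and the sample coordinates are hypothetical, and the cubic-time loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def gabriel_edges(points):
    """Return the set of index pairs (i, j), i < j, that are Gabriel neighbors.

    Brute-force O(n^3) sketch for 2D points given as (x, y) tuples.
    """
    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        # A third point k lies strictly inside the disk with diameter (i, j)
        # exactly when |ik|^2 + |jk|^2 < |ij|^2 (the angle at k is obtuse).
        blocked = any(
            sq_dist(points[i], points[k]) + sq_dist(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k not in (i, j)
        )
        if not blocked:
            edges.add((i, j))
    return edges


# Hypothetical sample: the third point sits between the first two,
# so the long edge (0, 1) is blocked.
sample = [(0, 0), (4, 0), (2, 1)]
print(sorted(gabriel_edges(sample)))
```

A hotspot-growing procedure of the kind described could then expand a seed region along these edges, although the growing criteria themselves are not specified in the abstract and are not sketched here.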

    Discovery of Spatiotemporal Event Sequences

    Finding frequent patterns plays a vital role in many analytics tasks such as finding itemsets, associations, correlations, and sequences. In recent decades, spatiotemporal frequent pattern mining has emerged with the main goal focused on developing data-driven analysis frameworks for understanding underlying spatial and temporal characteristics in massive datasets. In this thesis, we focus on discovering spatiotemporal event sequences from large-scale region trajectory datasets with event annotations. Spatiotemporal event sequences are series of event types whose trajectory-based instances follow each other in a spatiotemporal context. We introduce new data models for storing and processing evolving region trajectories, provide a novel framework for modeling spatiotemporal follow relationships, and present novel spatiotemporal event sequence mining algorithms.

    Localizing the media, locating ourselves: a critical comparative analysis of socio-spatial sorting in locative media platforms (Google AND Flickr 2009-2011)

    In this thesis I explore media geocoding (i.e., geotagging or georeferencing), the process of inscribing the media with geographic information, which enables distinct forms of producing, storing, and distributing information based on location. Historically, geographic information technologies have served a biopolitical function producing knowledge of populations. In their current guise as locative media platforms, these systems build rich databases of places facilitated by user-generated geocoded media. These geoindexes, this thesis argues, render places and the users of these services subject to novel forms of computational modelling and economic capture. Thus, the possibility of tying information, people and objects to location sets the conditions for the emergence of new communicative practices as well as new forms of governmentality (management of populations). This project is an attempt to develop an understanding of the socio-economic forces and media regimes structuring contemporary forms of location-aware communication, by carrying out a comparative analysis of two of the main current location-enabled platforms: Google and Flickr. Drawing from the medium-specific approach to media analysis characteristic of the subfield of Software Studies, together with the methodological apparatus of Cultural Analytics (data mining and visualization methods), the thesis focuses on examining how social space is coded and computed in these systems. In particular, it looks at the databases’ underlying ontologies supporting the platforms' geocoding capabilities and their respective algorithmic logics.
In the final analysis the thesis argues that the way social space is translated in the form of POIs (Points of Interest) and business-biased categorizations, as well as the geodemographical ordering underpinning the way it is computed, are pivotal if we are to understand what kind of socio-spatial relations are actualized in these systems, and what modalities of governing urban mobility are enabled.

    Geographic Feature Mining: Framework and Fundamental Tasks for Geographic Knowledge Discovery from User-generated Data

    We live in a data-rich environment where massive amounts of data such as text messages, articles, images, and search queries are continuously generated by users. In this environment, new opportunities to discover and utilize knowledge about the real world arise, such as the extraction and description of places and events from social media records, the organization of documents by spatio-temporal topics, and the prediction of epidemics by search engine queries. Major challenges addressed in these data- and application-specific works arise from the unstructured and complex nature of the data, and the high level of uncertainty and sparsity of the attributes. Despite the evident progress in utilizing specific data sources for different applications, there remains a lack of common concepts and techniques on how to exploit the data as high-quality sensors of geographic space in a general manner. However, such a general point of view makes it possible to address the common challenges and to define fundamental building blocks to deal with problems in fields like information retrieval, recommender systems, market research, health surveillance, and social sciences. In this thesis, we develop concepts and techniques to utilize various kinds of user-generated data as a steady source of information about geographic processes and entities (together called geographic phenomena). For this, we introduce a novel conceptual data mining framework, called geographic feature mining, that provides the foundation to discover and extract highly informative and discriminative dimensions of geographic space in a unifying and systematic fashion. This is achieved by representing the qualitative and geographic information in the records as geographic feature signals, each constituting a potential dimension to describe geographic space.
The mining process then determines highly informative features or feature combinations from the candidate sets that can be used as a steady source of auxiliary information for domain-specific applications. In developing the framework, we make contributions to several fundamental problems: (1) We introduce a novel probabilistic model to extract high-quality geographic feature signals. The signals are robust to noise and background distributions, and the model makes it possible to exploit diverse kinds of qualitative and geographic information in the records. This flexibility is achieved by utilizing a Bayesian network model and the robustness by choosing appropriate prior distributions. (2) We address the problem of categorizing and selecting geographic features based on their spatio-temporal type, such as feature signals having landmark, regional, or global semantics. For this, we introduce representations of the signals by interaction characteristics and evaluate their performance in clustering and data summarization tasks. (3) To extract a small number of highly informative feature combinations that reflect geographic phenomena, we introduce a model that extracts latent geographic features from the candidate signals using dimensionality reduction. We show that this model outperforms document-centric topic models with respect to the informativeness of the extracted phenomena, and we exhaustively evaluate how different statistical properties of the approaches affect the characteristics of the resulting feature combinations.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
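For readers unfamiliar with the traditional autoregressive approach to Granger causality mentioned in the abstract above, the following is a minimal sketch of that baseline, not of the time series classification methods the article studies. It fits two least-squares models for a target series `y` (one using only lags of `y`, one adding lags of `x`) and compares their residual sums of squares with an F statistic. The function names, the lag order of 2, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def lagged_matrix(series, lags):
    """Columns are series[t-1], ..., series[t-lags] for t = lags .. n-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])


def granger_f_statistic(x, y, lags=2):
    """F statistic for 'x Granger-causes y' from two least-squares AR fits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lags:]
    ones = np.ones((len(target), 1))
    # Restricted model: intercept plus past values of y only.
    restricted = np.hstack([ones, lagged_matrix(y, lags)])
    # Unrestricted model: additionally uses past values of x.
    unrestricted = np.hstack([restricted, lagged_matrix(x, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic example (assumed data): y depends on the previous value of x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.5 * rng.normal(size=499)
print("x -> y:", granger_f_statistic(x, y))
print("y -> x:", granger_f_statistic(y, x))
```

A large F statistic in one direction and a small one in the other suggests that past values of `x` help predict `y` but not the reverse; in practice the statistic would be compared against an F distribution to obtain a p-value.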

    Event detection in high throughput social media

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, and at the same time demonstrates opportunities for using big data for geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models to concrete data structures and algorithms.

    Spatial Keyword Querying: Ranking Evaluation and Efficient Query Processing
