    Spatial Data Mining Analytical Environment for Large Scale Geospatial Data

    Nowadays, many applications continuously generate large-scale geospatial data. Vehicle GPS tracking data, aerial surveillance drones, LiDAR (Light Detection and Ranging), worldwide spatial networks, and high-resolution optical or Synthetic Aperture Radar (SAR) imagery all produce huge amounts of geospatial data. However, even as data collection increases, our ability to process this large-scale geospatial data flexibly remains limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism for analyzing large-scale geospatial data. In this work, we extend HBase with a spatial index (R-tree) and HDFS to support geospatial data, and we demonstrate the framework's analytical use with some common geospatial data types and the data mining capabilities of the R language. The resulting framework can robustly analyze large-scale geospatial data using spatial data mining and make its outputs available to end users.
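An R-tree indexes geometries by their bounding boxes, so a window query only descends into nodes whose boxes overlap the window. The following is a minimal, in-memory Python sketch of an R-tree bulk-loaded with Sort-Tile-Recursive (STR) packing. It is not the authors' HBase integration. The names `build_rtree`, `query`, and `capacity` are hypothetical.

```python
import math
from typing import Any, List, Optional, Tuple

# Axis-aligned bounding box: (min_x, min_y, max_x, max_y)
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enclose(boxes: List[BBox]) -> BBox:
    """Smallest bounding box containing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


class Node:
    def __init__(self, bbox: BBox, entries: List[Tuple[BBox, Any]], leaf: bool):
        self.bbox = bbox
        self.entries = entries  # (bbox, payload) in leaves, (bbox, child Node) otherwise
        self.leaf = leaf


def _str_groups(entries: List[Tuple[BBox, Any]], capacity: int) -> List[List[Tuple[BBox, Any]]]:
    """STR packing: sort into vertical slices by x-center, then chunk each slice by y-center."""
    num_nodes = math.ceil(len(entries) / capacity)
    slice_size = math.ceil(math.sqrt(num_nodes)) * capacity
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][2])
    groups = []
    for i in range(0, len(by_x), slice_size):
        column = sorted(by_x[i:i + slice_size], key=lambda e: e[0][1] + e[0][3])
        for j in range(0, len(column), capacity):
            groups.append(column[j:j + capacity])
    return groups


def build_rtree(items: List[Tuple[BBox, Any]], capacity: int = 8) -> Optional[Node]:
    """Bulk-load an R-tree bottom-up from (bbox, payload) pairs."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not items:
        return None
    level = [Node(enclose([b for b, _ in g]), g, leaf=True)
             for g in _str_groups(list(items), capacity)]
    while len(level) > 1:  # pack each level into parents until one root remains
        parents = _str_groups([(n.bbox, n) for n in level], capacity)
        level = [Node(enclose([b for b, _ in g]), g, leaf=False) for g in parents]
    return level[0]


def query(root: Optional[Node], window: BBox) -> List[Any]:
    """Return payloads whose bounding boxes intersect the query window."""
    hits: List[Any] = []
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        if not intersects(node.bbox, window):
            continue  # prune the whole subtree
        for bbox, value in node.entries:
            if intersects(bbox, window):
                if node.leaf:
                    hits.append(value)
                else:
                    stack.append(value)
    return hits
```

Bulk loading may suit batch-ingest workloads on HDFS/HBase, where data arrive in large batches. Frequent single inserts would instead call for a dynamic R-tree with node splitting.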

    Development of a New Framework for Distributed Processing of Geospatial Big Data

    Geospatial technology still lacks “out of the box” distributed processing solutions suited to the volume and heterogeneity of geodata, particularly for use cases requiring a rapid response. Moreover, most current distributed computing frameworks have important limitations that hinder transparent and flexible control of processing (and/or storage) nodes and of the distribution of data chunks. We investigated the design of distributed processing systems and existing solutions related to Geospatial Big Data. This research area is highly dynamic in terms of both new developments and the re-use of existing solutions (that is, the re-use of certain modules to implement further specific developments), with new implementations continuously emerging in areas such as disaster management, environmental monitoring and earth observation. This paper focuses on the distributed processing of raster data sets, as we believe that raster data partitioning is far from trivial: a number of tiling and stitching requirements must be addressed to meet the needs of efficient image processing beyond the pixel level. We compare the terms Big Data, Geospatial Big Data and traditional Geospatial Data in order to clarify their typical differences, to compare them in terms of storage and processing backgrounds for different data representations, and to categorize the common processing systems from the perspective of distributed raster processing. This clarification is necessary because these categories of data behave differently on the processing side, and particular processing solutions need to be developed according to their characteristics. Furthermore, we compare parallel and distributed computing, since these terms are used improperly in several cases. We also briefly assess the widely known MapReduce paradigm in the context of geospatial applications.
The second half of the article reports on a new processing framework initiative, currently at the concept and early development stages, which aims to process raster, vector and point cloud data in a distributed IT ecosystem. The developed system is modular, places no restrictions on the programming language environment, and can execute scripts written in any development language (e.g. Python, R or C#).
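Raster tiling needs care because neighborhood operations need pixels from adjacent tiles. Below is a minimal Python sketch of tiling with a halo, under two assumptions that do not come from the paper: the raster is held as a list of rows, and a 3x3 mean filter stands in for the real operation. Each tile is read with a halo of extra cells, processed independently, and only its core is written back. The function names and the `halo` parameter are hypothetical.

```python
from typing import Callable, List

Grid = List[List[float]]


def mean3x3(grid: Grid) -> Grid:
    """3x3 mean filter; out-of-range neighbors are clamped to the nearest edge cell."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += grid[rr][cc]
            out[r][c] = total / 9.0
    return out


def process_tiled(grid: Grid, tile: int, halo: int,
                  op: Callable[[Grid], Grid] = mean3x3) -> Grid:
    """Run `op` tile by tile, reading `halo` extra cells around each tile, then stitch cores."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r0 in range(0, h, tile):
        for c0 in range(0, w, tile):
            r1, c1 = min(r0 + tile, h), min(c0 + tile, w)
            # Expanded read window, clipped to the raster bounds.
            er0, ec0 = max(r0 - halo, 0), max(c0 - halo, 0)
            er1, ec1 = min(r1 + halo, h), min(c1 + halo, w)
            window = [row[ec0:ec1] for row in grid[er0:er1]]
            result = op(window)  # in a distributed setting this would run on a worker
            # Keep only the tile's core; halo cells are discarded when stitching.
            for r in range(r0, r1):
                for c in range(c0, c1):
                    out[r][c] = result[r - er0][c - ec0]
    return out
```

With `halo=1`, the stitched output matches filtering the whole raster at once. With `halo=0`, pixels along internal tile edges differ, which is the stitching problem described above. A k×k kernel would need a halo of at least k // 2.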

    MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data

    Geospatial datasets are growing in size, complexity and heterogeneity. High-performance systems are needed to analyze such data and produce actionable insights efficiently. For polygonal (a.k.a. vector) datasets, operations such as I/O, data partitioning, communication and load balancing become challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library that we designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well-Known Text (WKT). It makes MPI aware of spatial data and spatial primitives, and it supports spatial data types embedded within collective computation and communication using the MPI message-passing library. These abstractions, along with parallel I/O support, are useful for developing parallel Geographic Information System (GIS) applications on HPC platforms.
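Partitioning irregular vector data usually starts from each geometry's bounding box. The Python sketch below illustrates one simple scheme:

* Parse 2D WKT coordinates with a regular expression.
* Assign each geometry to every cell of a uniform grid that its box overlaps.
* Distribute the cells across ranks round-robin.

It does not use MPI or the MPI-Vector-IO API. `grid_partition`, `wkt_bbox` and the round-robin policy are hypothetical choices. The parser assumes 2D coordinates only and does not handle EMPTY geometries.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Signed decimal numbers, optionally in scientific notation.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def wkt_bbox(wkt: str) -> Tuple[float, float, float, float]:
    """Bounding box of a 2D WKT geometry, computed from its coordinate pairs."""
    nums = [float(n) for n in _NUMBER.findall(wkt)]
    xs, ys = nums[0::2], nums[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def _cell_range(lo: float, hi: float, origin: float, size: float, n: int) -> range:
    """Indices of grid cells along one axis covered by [lo, hi], clamped to the grid."""
    first = min(max(int((lo - origin) // size), 0), n - 1)
    last = min(max(int((hi - origin) // size), 0), n - 1)
    return range(first, last + 1)


def grid_partition(wkts: Sequence[str], extent: Tuple[float, float, float, float],
                   nx: int, ny: int, num_ranks: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each geometry to every overlapping grid cell; assign cells to ranks round-robin.

    Returns {rank: [(cell_id, geometry_index), ...]}.
    """
    minx, miny, maxx, maxy = extent
    cw, ch = (maxx - minx) / nx, (maxy - miny) / ny
    per_rank: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for idx, wkt in enumerate(wkts):
        x0, y0, x1, y1 = wkt_bbox(wkt)
        for cy in _cell_range(y0, y1, miny, ch, ny):
            for cx in _cell_range(x0, x1, minx, cw, nx):
                cell = cy * nx + cx
                per_rank[cell % num_ranks].append((cell, idx))
    return dict(per_rank)
```

Geometries that straddle cell boundaries are replicated into every overlapping cell. This keeps each cell self-contained, at the cost of duplicate results that must be filtered after operations such as spatial joins.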

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought both challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling and visualization. This book highlights recent advances in integrating new computing approaches, spatial methods and data management strategies to tackle geospatial big data challenges, while demonstrating opportunities for using big data in geospatial applications. Crucial to these advances are the integration of computational thinking with spatial thinking and the transformation of abstract ideas and models into concrete data structures and algorithms.

    A study of three paradigms for storing geospatial data: distributed-cloud model, relational database, and indexed flat file

    Geographic Information Systems (GIS) and related applications of geospatial data were once a small software niche; today nearly all Internet and mobile users rely on some sort of mapping or location-aware software. This widespread use reaches beyond mere consumption of geodata: projects like OpenStreetMap (OSM) represent a new source of geodata production, sometimes dubbed “Volunteered Geographic Information.” The volume of geodata produced and the user demand for geodata will surely continue to grow, so storage and query techniques for geospatial data must evolve accordingly. This thesis compares three paradigms for systems that manage vector data. Over the past few decades these methodologies have fallen in and out of favor. Today, one is considered new and experimental (distributed), another nearly forgotten (flat file), and the third is the workhorse of present-day GIS (relational database). Each is well suited to some use cases and poorly suited to others. This thesis investigates exemplars of each paradigm.

    A case study of advancing remote sensing image analysis

    Big data and cloud computing are two phenomena that have gained significant prominence over the last few years. In computer science, the focus has shifted towards distributed architectures and high-performance computing. In geographical information systems (GIS) and remote sensing image analysis, these new paradigms have already been applied successfully to several problems, and systems have been developed to support the processing of geographical and remote sensing data in the cloud. However, for various reasons, many previous workflows have to be reconsidered and redesigned. Our goal is to show how existing approaches to remote sensing image analysis can be advanced to take advantage of these new paradigms. Migrating the algorithms should require only moderate effort and must avoid a complete redesign and reimplementation of the existing approaches. We present the whole journey as a case study, using an existing industrial workflow for demonstration. In addition, we define rules of thumb that can come in handy when migrating any existing GIS workflow. Our case study is the workflow of waterlogging and flood detection, an operational task at the Institute of Geodesy, Cartography and Remote Sensing (FÖMI). This task is currently performed with a semi-automatic, single-machine approach involving multiple software tools. The workflow is neither efficient nor scalable, so it is not applicable in emergency situations where a quick response is required. We present an approach utilizing distributed computing that enables automated execution of this task on large input data with much better response time. The approach is based on the well-known MapReduce paradigm, its open-source implementation, the Apache Hadoop framework, and the AEGIS geospatial toolkit. This enables multiple software tools to be replaced by a single, generic framework.
Results show that significant performance benefits can be achieved at the expense of a minor loss in accuracy.
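The MapReduce structure can be illustrated without Hadoop. The pure-Python sketch below simulates the map, shuffle and reduce phases for a hypothetical water-detection step:

* Each mapper thresholds the per-pixel water-index values of one tile.
* It emits partial counts keyed by region.
* The reducer combines the counts into a water-cover fraction.

The threshold, the index and all function names are assumptions for illustration. They are not taken from the FÖMI workflow or from the AEGIS or Hadoop APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical per-pixel threshold on a water index; an assumption, not from the paper.
WATER_THRESHOLD = 0.0

Tile = Tuple[str, List[float]]  # (region_id, per-pixel index values)


def map_tile(tile: Tile) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Map: classify each pixel and emit (region_id, (water_pixels, total_pixels))."""
    region_id, pixels = tile
    water = sum(1 for value in pixels if value > WATER_THRESHOLD)
    yield region_id, (water, len(pixels))


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, int]]]) -> Dict[str, List[Tuple[int, int]]]:
    """Group mapper output by key, as the framework's shuffle phase would."""
    groups: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_region(region_id: str, counts: List[Tuple[int, int]]) -> Tuple[str, float]:
    """Reduce: combine partial counts into a water-cover fraction for the region."""
    water = sum(w for w, _ in counts)
    total = sum(t for _, t in counts)
    return region_id, (water / total if total else 0.0)


def run_job(tiles: Iterable[Tile]) -> Dict[str, float]:
    """Chain map, shuffle and reduce over all tiles in a single process."""
    mapped = (pair for tile in tiles for pair in map_tile(tile))
    return dict(reduce_region(k, v) for k, v in shuffle(mapped).items())
```

Mappers emit counts rather than fractions, so partial results from any number of tiles combine correctly in the reducer.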