Development of a New Framework for Distributed Processing of Geospatial Big Data

Abstract

Geospatial technology is still facing a lack of “out of the box” distributed processing solutions which are suitable for the amount and heterogeneity of geodata, and particularly for use cases requiring a rapid response. Moreover, most of the current distributed computing frameworks have important limitations hindering the transparent and flexible control of processing (and/or storage) nodes and control of distribution of data chunks. We investigated the design of distributed processing systems and existing solutions related to Geospatial Big Data. This research area is highly dynamic in terms of new developments and the re-use of existing solutions (that is, the re-use of certain modules to implement further specific developments), with new implementations continuously emerging in areas such as disaster management, environmental monitoring and earth observation. The distributed processing of raster data sets is the focus of this paper, as we believe that the problem of raster data partitioning is far from trivial: a number of tiling and stitching requirements need to be addressed to be able to fulfil the needs of efficient image processing beyond pixel level. We attempt to compare the terms Big Data, Geospatial Big Data and the traditional Geospatial Data in order to clarify the typical differences, to compare them in terms of storage and processing backgrounds for different data representations and to categorize the common processing systems from the aspect of distributed raster processing. This clarification is necessary due to the fact that they behave differently on the processing side, and particular processing solutions need to be developed according to their characteristics. Furthermore, we compare parallel and distributed computing, taking into account the fact that these are used improperly in several cases. We also briefly assess the widely-known MapReduce paradigm in the context of geospatial applications. The second half of the article reports on a new processing framework initiative, currently at the concept and early development stages, which aims to be capable of processing raster, vector and point cloud data in a distributed IT ecosystem. The developed system is modular, has no limitations on programming language environment, and can execute scripts written in any development language (e.g. Python, R or C#)

    Similar works