
957 research outputs found

Spatial Data Mining Analytical Environment for Large Scale Geospatial Data

Author: Yang Zhao
Publication venue: ScholarWorks@UNO
Publication date: 16/12/2016

Nowadays, many applications continuously generate large-scale geospatial data. Vehicle GPS tracking, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high-resolution optical or Synthetic Aperture Radar imagery all generate huge amounts of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion is still limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-Tree) and HDFS to support geospatial data, and demonstrate its analytical use with some common geospatial data types and data mining technology provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and to make its outputs available to end users.

University of New Orleans
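
The abstract above mentions extending HBase with an R-Tree spatial index. As a rough illustration of why such an index helps, the following is a minimal sketch of a single-level, bulk-loaded index in which each leaf stores a minimum bounding rectangle (MBR) so that whole leaves can be skipped during a range query. It is a simplification, not a full R-Tree and not the thesis's HBase implementation; the names `Leaf`, `build_leaves`, and `range_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Leaf:
    mbr: Box             # minimum bounding rectangle of the points in this leaf
    points: List[Point]


def build_leaves(points: List[Point], capacity: int = 4) -> List[Leaf]:
    """Bulk-load: sort points by (x, y) and pack them into fixed-size leaves."""
    ordered = sorted(points)
    leaves = []
    for i in range(0, len(ordered), capacity):
        chunk = ordered[i:i + capacity]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        leaves.append(Leaf((min(xs), min(ys), max(xs), max(ys)), chunk))
    return leaves


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_query(leaves: List[Leaf], query: Box) -> List[Point]:
    """Return all points inside the query box, skipping leaves whose MBR misses it."""
    hits = []
    for leaf in leaves:
        if not intersects(leaf.mbr, query):
            continue  # the whole leaf is pruned without touching its points
        hits.extend(
            p for p in leaf.points
            if query[0] <= p[0] <= query[2] and query[1] <= p[1] <= query[3]
        )
    return hits
```

A real R-Tree adds further levels of bounding rectangles above the leaves, so that pruning applies to groups of leaves as well.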

Development of a New Framework for Distributed Processing of Geospatial Big Data

Authors: Kristóf Dániel, Olasz Angéla, Thai Nguyen Binh
Publication venue: Publications Office of the European Union
Publication date: 01/01/2017

Geospatial technology is still facing a lack of “out of the box” distributed processing solutions which are suitable for the amount and heterogeneity of geodata, and particularly for use cases requiring a rapid response. Moreover, most of the current distributed computing frameworks have important limitations hindering the transparent and flexible control of processing (and/or storage) nodes and control of distribution of data chunks. We investigated the design of distributed processing systems and existing solutions related to Geospatial Big Data. This research area is highly dynamic in terms of new developments and the re-use of existing solutions (that is, the re-use of certain modules to implement further specific developments), with new implementations continuously emerging in areas such as disaster management, environmental monitoring and earth observation. The distributed processing of raster data sets is the focus of this paper, as we believe that the problem of raster data partitioning is far from trivial: a number of tiling and stitching requirements need to be addressed to be able to fulfil the needs of efficient image processing beyond pixel level. We attempt to compare the terms Big Data, Geospatial Big Data and the traditional Geospatial Data in order to clarify the typical differences, to compare them in terms of storage and processing backgrounds for different data representations and to categorize the common processing systems from the aspect of distributed raster processing. This clarification is necessary due to the fact that they behave differently on the processing side, and particular processing solutions need to be developed according to their characteristics. Furthermore, we compare parallel and distributed computing, taking into account the fact that these are used improperly in several cases. We also briefly assess the widely-known MapReduce paradigm in the context of geospatial applications. 
The second half of the article reports on a new processing framework initiative, currently at the concept and early development stages, which aims to be capable of processing raster, vector and point cloud data in a distributed IT ecosystem. The developed system is modular, places no restrictions on the programming language environment, and can execute scripts written in any development language (e.g. Python, R or C#).

International Journal of Spatial Data Infrastructures Research (Joint Research Centre of the European Commission)

Repository of the Academy's Library
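
The abstract above notes that raster partitioning involves tiling and stitching requirements once processing goes beyond the pixel level. A minimal sketch of one common approach follows, assuming a 3x3 neighbourhood operation: each tile is read with a one-pixel overlap ("halo"), filtered independently, and only its core region is written back. The function names are hypothetical and the raster is a plain list of lists; this is not the framework described in the paper.

```python
from typing import Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (row0, row1, col0, col1), end-exclusive
Grid = List[List[int]]


def tile_windows(height: int, width: int, tile: int, halo: int) -> Iterator[Tuple[Window, Window]]:
    """Yield (core, padded) windows; padded adds a halo clipped to the raster."""
    for r0 in range(0, height, tile):
        for c0 in range(0, width, tile):
            r1 = min(r0 + tile, height)
            c1 = min(c0 + tile, width)
            padded = (max(r0 - halo, 0), min(r1 + halo, height),
                      max(c0 - halo, 0), min(c1 + halo, width))
            yield (r0, r1, c0, c1), padded


def max_filter(grid: Grid) -> Grid:
    """3x3 maximum filter; neighbourhoods are clipped at the grid edges."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = max(
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
            )
    return out


def tiled_max_filter(grid: Grid, tile: int) -> Grid:
    """Filter each padded tile on its own, then stitch the core regions together."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for (r0, r1, c0, c1), (pr0, pr1, pc0, pc1) in tile_windows(h, w, tile, halo=1):
        block = [row[pc0:pc1] for row in grid[pr0:pr1]]  # what one worker would read
        filtered = max_filter(block)
        for r in range(r0, r1):
            out[r][c0:c1] = filtered[r - pr0][c0 - pc0:c1 - pc0]
    return out
```

Because every core pixel sees its full neighbourhood inside the padded block, the stitched result matches filtering the whole raster at once. Without the halo, seams would appear along tile borders.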

MPI-Vector-IO: Parallel I/O and Partitioning for Geospatial Vector Data

Authors: Paudel Anmol, Prasad Sushil K., Puri Satish
Publication venue: e-Publications@Marquette
Publication date: 01/01/2018

In recent times, geospatial datasets are growing in terms of size, complexity and heterogeneity. High performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal (a.k.a. vector) datasets, operations such as I/O, data partitioning, communication, and load balancing become challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data and spatial primitives, and provides support for spatial data types embedded within collective computation and communication using the MPI message-passing library. These abstractions, along with parallel I/O support, are useful for parallel Geographic Information System (GIS) application development on HPC platforms.

epublications@Marquette
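
The abstract above concerns partitioning and reading irregular vector formats such as Well Known Text in parallel. The difficulty is that records have variable length, so equal byte ranges rarely end on record boundaries. The sketch below illustrates one common rule, assuming one WKT geometry per line: a record belongs to the process whose byte range contains its first byte. It uses plain file reads rather than MPI-IO, and `read_partition` is a hypothetical name, not part of MPI-Vector-IO.

```python
import os
from typing import List


def read_partition(path: str, rank: int, size: int) -> List[str]:
    """Return the newline-delimited records owned by process `rank` out of `size`.

    The file is split into `size` near-equal byte ranges. Each process skips the
    record that straddles its range start (owned by the previous process) and
    reads past its range end to finish the last record it owns.
    """
    file_size = os.path.getsize(path)
    chunk = -(-file_size // size)          # ceiling division
    start = min(rank * chunk, file_size)
    end = min(start + chunk, file_size)

    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte: if it is a newline, a record begins exactly at
            # `start`; otherwise this consumes the tail of a straddling record.
            f.seek(start - 1)
            f.readline()
        pos = f.tell()
        while pos < end:
            line = f.readline()
            if not line:
                break
            pos = f.tell()
            if line.strip():
                records.append(line.decode("utf-8").rstrip("\r\n"))
    return records
```

Concatenating the results for ranks 0 to size - 1 yields every record exactly once, in file order. In an MPI program, `rank` and `size` would come from the communicator, and each process would call the function once.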

Big Data Computing for Geospatial Applications

Author
Publication venue: MDPI AG
Publication date: 01/05/2021

The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data for geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models to concrete data structures and algorithms.

Directory of Open Access Books (DOAB)


An Architecture for Big Data Analytics

Author: Chan Joseph O.
Publication venue: CSUSB ScholarWorks
Publication date: 19/06/2014

Big Data is the new experience curve in the new economy, driven by data with high volume, velocity, variety, and veracity. These data come from various sources that include the Internet, mobile devices, social media, geospatial devices, sensors, and other machine-generated data. Unlocking the value of Big Data allows businesses to better sense and respond to the environment, and is becoming a key to creating competitive advantages in a complex and rapidly changing market. Government is also taking notice of the Big Data phenomenon and has created initiatives to exploit Big Data in many areas such as science and engineering, healthcare and national security. Traditional data processing and analysis of structured data using RDBMS and data warehousing no longer meet the challenges of Big Data. Technology trends for Big Data embrace open source software, commodity servers, and massively parallel distributed processing platforms. Analytics is at the core of extracting value from Big Data to create consumable insights for business and government. This paper presents an architecture for Big Data Analytics and explores Big Data technologies that include NoSQL databases, Hadoop Distributed File System and MapReduce.

CSUSB ScholarWorks

A study of three paradigms for storing geospatial data: distributed-cloud model, relational database, and indexed flat file

Author: Toups Matthew A
Publication venue: ScholarWorks@UNO
Publication date: 13/05/2016

Geographic Information Systems (GIS) and related applications of geospatial data were once a small software niche; today nearly all Internet and mobile users utilize some sort of mapping or location-aware software. This widespread use reaches beyond mere consumption of geodata; projects like OpenStreetMap (OSM) represent a new source of geodata production, sometimes dubbed “Volunteered Geographic Information.” The volume of geodata produced and the user demand for geodata will surely continue to grow, so the storage and query techniques for geospatial data must evolve accordingly. This thesis compares three paradigms for systems that manage vector data. Over the past few decades these methodologies have fallen in and out of favor. Today, some are considered new and experimental (distributed), others nearly forgotten (flat file), and others are the workhorse of present-day GIS (relational database). Each is well-suited to some use cases, and poorly-suited to others. This thesis investigates exemplars of each paradigm.

University of New Orleans

A case study of advancing remote sensing image analysis

Authors: Fekete István, Giachetta Roberto
Publication venue
Publication date: 01/01/2015

Big data and cloud computing are two phenomena that have gained significant attention over the last few years. In computer science, the approach has shifted towards distributed architectures and high performance computing. In the case of geographical information systems (GIS) and remote sensing image analysis, the new paradigms have already been successfully applied to several problems, and systems have been developed to support processing of geographical and remote sensing data in the cloud. However, due to different circumstances, many previous workflows have to be reconsidered and redesigned. Our goal is to show how the existing approaches to remote sensing image analysis can be advanced to take advantage of these new paradigms. Shifting the algorithms should require only a moderate effort and must avoid the complete redesign and reimplementation of the existing approaches. We present the whole journey as a case study, using an existing industrial workflow for demonstration. We also define rules of thumb that can come in handy when shifting any existing GIS workflow. Our case study is the workflow of waterlogging and flood detection, which is an operative task at the Institute of Geodesy, Cartography and Remote Sensing (FÖMI). This task is currently carried out using a semi-automatic, single-machine approach involving multiple software packages. The workflow is neither efficient nor scalable, thus it is not applicable in emergency situations where quick response is required. We present an approach utilizing distributed computing, which enables the automated execution of this task on large input data with much better response time. The approach is based on the well-known MapReduce paradigm, its open-source implementation, the Apache Hadoop framework, and the AEGIS geospatial toolkit. This enables the replacement of multiple software packages with a single, generic framework.
Results show that significant performance benefits can be achieved at the expense of minor accuracy loss.

University of Szeged
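
The abstract above bases its flood detection workflow on the MapReduce paradigm. To show the shape of such a job, the sketch below runs the map, shuffle, and reduce phases in a single process over image tiles. The classification rule (a fixed threshold on a per-pixel value) and all names are hypothetical placeholders; they do not reproduce the FÖMI workflow, Hadoop, or the AEGIS toolkit.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

WATER_THRESHOLD = 0.3  # hypothetical cut-off, chosen only for illustration


def map_tile(tile_id: str, pixels: List[float]) -> Iterator[Tuple[str, int]]:
    """Map phase: classify one tile's pixels and emit per-class counts."""
    water = sum(1 for value in pixels if value > WATER_THRESHOLD)
    yield "water", water
    yield "dry", len(pixels) - water


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one class."""
    return key, sum(values)


def run_job(tiles: Dict[str, List[float]]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially over all tiles."""
    emitted = (pair for tile_id, pixels in tiles.items()
               for pair in map_tile(tile_id, pixels))
    grouped = shuffle(emitted)
    return dict(reduce_counts(key, values) for key, values in grouped.items())
```

In a distributed setting each `map_tile` call would run on the node that holds the tile, and only the small per-class counts would cross the network to the reducers.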