
    Big spatial data processing frameworks: feature and performance evaluation: experiments & analyses

    Nowadays, a vast amount of data is generated and collected every moment, and this data often has a spatial and/or temporal aspect. To analyze such massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged, and extensions that take spatial characteristics into account were created for them. In this paper, we analyze and compare existing solutions for spatial data processing on Hadoop and Spark. In our comparison, we investigate their features as well as their performance in a micro-benchmark of spatial filter and join queries. Based on the results and our experience with these frameworks, we outline the requirements for a general spatio-temporal benchmark for Big Spatial Data processing platforms and sketch first solutions to the identified problems.
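
    The following sketch is editorial, not from the paper: a minimal illustration of the two query types such a micro-benchmark exercises, written as plain PySpark bounding-box predicates rather than with any of the evaluated spatial extensions. The column names, sample coordinates, and the assumption that pyspark is installed are illustrative only.

```python
# Minimal sketch (assumes pyspark is installed): a spatial range filter and a
# naive point-in-rectangle join expressed directly in Spark SQL, i.e. without
# the spatial partitioning and indexing that dedicated frameworks add on top.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spatial-query-sketch").getOrCreate()

# Hypothetical point and region datasets (lon/lat points, axis-aligned boxes).
points = spark.createDataFrame(
    [(1, 8.40, 49.01), (2, 8.70, 49.40), (3, 13.40, 52.52)],
    ["point_id", "lon", "lat"],
)
regions = spark.createDataFrame(
    [(10, 8.0, 48.5, 9.0, 49.5), (11, 13.0, 52.0, 14.0, 53.0)],
    ["region_id", "min_lon", "min_lat", "max_lon", "max_lat"],
)

# Spatial filter query: all points inside one query window.
in_window = points.filter(
    (F.col("lon") >= 8.0) & (F.col("lon") <= 9.0) &
    (F.col("lat") >= 48.5) & (F.col("lat") <= 49.5)
)

# Spatial join query: a cross join constrained by a range predicate; spatial
# frameworks avoid the full cross product by co-partitioning the data on space.
joined = points.join(
    regions,
    (points.lon >= regions.min_lon) & (points.lon <= regions.max_lon) &
    (points.lat >= regions.min_lat) & (points.lat <= regions.max_lat),
)

in_window.show()
joined.select("point_id", "region_id").show()
```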

    Big Data – a step change for SDI?

    The globally hyped notion of Big Data has increasingly influenced scientific and technical debates about the handling and management of geospatial information. Accordingly, we see a need to recall what has happened over the past years, to present the recent Big Data landscape from an infrastructural perspective, and to outline the major implications for the SDI community. We primarily conclude that it would be too simple and naïve to consider only the technological aspects underpinning geospatial (web) services. Instead, we call on SDI researchers, engineers, providers and consumers to develop new methodologies and capacities for dealing with (geo)spatial information as part of broader knowledge infrastructures.

    A SPARK BASED COMPUTING FRAMEWORK FOR SPATIAL DATA


    Spatial Data Mining Analytical Environment for Large Scale Geospatial Data

    Nowadays, many applications continuously generate large-scale geospatial data. Vehicle GPS tracking, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high-resolution optical or Synthetic Aperture Radar imagery all generate huge amounts of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion remains limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-tree) and HDFS to support geospatial data, and demonstrate its analytical use with common geospatial data types and the data mining facilities provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and makes its outputs available to end users.
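
    As an editorial illustration of the R-tree lookup pattern described above, the sketch below uses the Python rtree package (libspatialindex bindings) as an in-memory stand-in for the HBase-backed index; the feature ids, bounding boxes and query window are invented, and the actual framework performs this step inside HBase before handing data to R.

```python
# In-memory stand-in for the R-tree index described above (assumes the
# `rtree` package is installed); the records and extents are invented.
from rtree import index

idx = index.Index()

# Index features by id with their (min_x, min_y, max_x, max_y) extents,
# e.g. GPS track segments or image footprints.
features = {
    1: (8.35, 48.95, 8.45, 49.05),
    2: (8.60, 49.30, 8.80, 49.50),
    3: (13.30, 52.45, 13.55, 52.60),
}
for fid, bbox in features.items():
    idx.insert(fid, bbox)

# A window query returns only the candidates whose extents intersect the
# window, which is what narrows the scan before any spatial data mining runs.
query_window = (8.0, 48.5, 9.0, 49.5)
print(sorted(idx.intersection(query_window)))  # -> [1, 2]
```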

    Cost optimization for Big GeoSpatial Data processing in a cloud computing environment

    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2018. Geospatial data represents abstractions of real-world entities and can be obtained in various ways. It has properties that differentiate it from other types of data, such as complex structure, dynamism, and volume. In recent years, with the growing volume of geospatial data, referred to as big geospatial data, tools have been developed to process this data efficiently, among them SpatialHadoop, a framework built on top of Hadoop. Choosing the right index for the dataset to be processed, as well as for the queries and operations to be performed, is essential for these applications to perform well. Moreover, since public cloud providers charge according to the resources used, it is important to optimize application execution to avoid unnecessary expense. This work proposes a Knowledge Base and an Inference Engine that seek to minimize the cost of processing big geospatial data on public cloud providers. In addition, it compares the services offered by the three main public cloud providers for processing large volumes of data. The tests performed demonstrate that using the rules generated by the Inference Engine and choosing the lowest-cost provider can reduce the total processing cost by up to 71%.
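
    The toy sketch below is editorial and illustrates only the general idea of a rule-based Knowledge Base and Inference Engine choosing an index and the cheapest provider; the rules, thresholds, provider names and prices are invented and are not the dissertation's actual knowledge base.

```python
# Toy illustration of rule-based cost optimization: hypothetical rules map
# workload characteristics to an index choice, and a made-up price table is
# used to pick the cheapest provider. Nothing here is from the dissertation.
from dataclasses import dataclass

@dataclass
class Workload:
    dataset_gb: float   # input size
    query: str          # "range", "join", "knn", ...
    skewed: bool        # spatially skewed data or not

def choose_index(w: Workload) -> str:
    """Hypothetical rules for picking a SpatialHadoop index."""
    if w.query == "join" and w.dataset_gb > 100:
        return "r+tree"
    if w.skewed:
        return "quadtree"
    return "grid"

# Invented on-demand prices (USD per node-hour) for three generic providers.
PRICE_PER_NODE_HOUR = {"provider_a": 0.19, "provider_b": 0.23, "provider_c": 0.17}

def provider_costs(nodes: int, hours: float) -> dict:
    return {p: round(nodes * hours * rate, 2) for p, rate in PRICE_PER_NODE_HOUR.items()}

w = Workload(dataset_gb=250, query="join", skewed=False)
costs = provider_costs(nodes=8, hours=3.5)
cheapest = min(costs, key=costs.get)
print(choose_index(w), cheapest, costs[cheapest])  # e.g. r+tree provider_c 4.76
```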

    Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems

    Due to the ubiquitous use of spatial data applications and the large amounts of spatial data they manage, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Distance Join Queries (DJQs) are important and frequently used operations in numerous applications, including data mining, multimedia and spatial databases. DJQs (e.g., k Nearest Neighbor Join Query, k Closest Pair Query, ε Distance Join Query, etc.) are costly operations, since they involve both a join and a distance-based search, and performing DJQs efficiently is a challenging task. Recent Big Data developments have motivated the emergence of novel technologies for distributed processing of large-scale spatial data in clusters of computers, leading to Distributed Spatial Data Management Systems (DSDMSs). Distributed cluster-based computing systems can be classified as Hadoop-based or Spark-based. Based on this classification, in this paper, we compare two of the most recent and leading DSDMSs, SpatialHadoop and LocationSpark, by evaluating the performance of several existing and newly proposed parallel and distributed DJQ algorithms under various settings with large real-world spatial datasets. A general conclusion arising from the execution of the distributed DJQ algorithms studied is that, while SpatialHadoop is a robust and efficient system when large spatial datasets are joined (since it is built on top of the mature Hadoop platform), LocationSpark is the clear winner in total execution time when medium-sized spatial datasets are combined (due to the in-memory processing provided by Spark). However, LocationSpark requires higher memory allocation when large spatial datasets are involved in DJQs (even more so when k and ε are large). Finally, this detailed performance study demonstrates that the new distributed DJQ algorithms we propose are efficient, robust and scalable with respect to different parameters, such as dataset sizes, k, ε and the number of computing nodes.
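
    For readers unfamiliar with DJQs, the editorial sketch below shows, in brute-force single-machine form, what an ε Distance Join Query and a k Nearest Neighbor Join Query compute; the distributed algorithms evaluated in the paper produce the same results but partition the work spatially across a cluster. The sample points are invented.

```python
# Brute-force, single-machine definitions of two DJQs: the epsilon-distance
# join (all pairs within distance eps) and the kNN join (for each point of P,
# its k nearest neighbors in Q). The sample coordinates are invented.
from math import hypot

P = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
Q = [(0.5, 0.5), (4.0, 4.5), (9.0, 9.0)]

def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])

def eps_distance_join(P, Q, eps):
    """All pairs (p, q) in P x Q with dist(p, q) <= eps."""
    return [(p, q) for p in P for q in Q if dist(p, q) <= eps]

def knn_join(P, Q, k):
    """For each p in P, its k nearest neighbors in Q."""
    return {p: sorted(Q, key=lambda q: dist(p, q))[:k] for p in P}

print(eps_distance_join(P, Q, eps=1.0))
print(knn_join(P, Q, k=1))
```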

    Towards intelligent geo-database support for earth system observation: Improving the preparation and analysis of big spatio-temporal raster data

    The European COPERNICUS program provides an unprecedented breakthrough in the broad use and application of satellite remote sensing data. Maintained on a sustainable basis, the COPERNICUS system is operated under a free-and-open data policy. Its guaranteed long-term availability attracts a broader community to remote sensing applications. In general, the increasing amount of satellite remote sensing data opens the door to diverse and advanced analyses of this data for earth system science. However, the preparation of the data for dedicated processing is still inefficient, as it requires time-consuming operator interaction based on advanced technical skills. Thus, the involved scientists have to spend significant parts of the available project budget on data preparation rather than on science. In addition, analyzing the rich content of remote sensing data requires new concepts for better extraction of promising structures and signals as an effective basis for further analysis. In this paper we propose approaches to improve the preparation of satellite remote sensing data with a geo-database, so that the time needed and the errors possibly introduced by human interaction are minimized. In addition, we recommend improving data quality and data analysis by incorporating Artificial Intelligence methods. A use case for data preparation and analysis is presented for earth surface deformation analysis in the Upper Rhine Valley, Germany, based on Persistent Scatterer Interferometric Synthetic Aperture Radar data. Finally, we give an outlook on our future research.