
    Big spatial data processing frameworks: feature and performance evaluation: experiments & analyses

    Nowadays, a vast amount of data is generated and collected every moment, and this data often has a spatial and/or temporal aspect. To analyze such massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged, and extensions that take spatial characteristics into account were created for them. In this paper, we analyze and compare existing solutions for spatial data processing on Hadoop and Spark. In our comparison, we investigate their features as well as their performance in a micro-benchmark of spatial filter and join queries. Based on the results and our experience with these frameworks, we outline the requirements for a general spatio-temporal benchmark for Big Spatial Data processing platforms and sketch first solutions to the identified problems.
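
    The following sketch is editorial, not from the paper: a minimal illustration of the two query types such a micro-benchmark exercises, written as plain PySpark bounding-box predicates rather than with any of the evaluated spatial extensions. The column names, sample coordinates, and the assumption that pyspark is installed are illustrative only.

```python
# Minimal sketch (assumes pyspark is installed): a spatial range filter and a
# naive point-in-rectangle join expressed directly in Spark SQL, i.e. without
# the spatial partitioning and indexing that dedicated frameworks add on top.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spatial-query-sketch").getOrCreate()

# Hypothetical point and region datasets (lon/lat points, axis-aligned boxes).
points = spark.createDataFrame(
    [(1, 8.40, 49.01), (2, 8.70, 49.40), (3, 13.40, 52.52)],
    ["point_id", "lon", "lat"],
)
regions = spark.createDataFrame(
    [(10, 8.0, 48.5, 9.0, 49.5), (11, 13.0, 52.0, 14.0, 53.0)],
    ["region_id", "min_lon", "min_lat", "max_lon", "max_lat"],
)

# Spatial filter query: all points inside one query window.
in_window = points.filter(
    (F.col("lon") >= 8.0) & (F.col("lon") <= 9.0) &
    (F.col("lat") >= 48.5) & (F.col("lat") <= 49.5)
)

# Spatial join query: a cross join constrained by a range predicate; spatial
# frameworks avoid the full cross product by co-partitioning the data on space.
joined = points.join(
    regions,
    (points.lon >= regions.min_lon) & (points.lon <= regions.max_lon) &
    (points.lat >= regions.min_lat) & (points.lat <= regions.max_lat),
)

in_window.show()
joined.select("point_id", "region_id").show()
```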

    Big Data – a step change for SDI?

    The globally hyped notion of Big Data has increasingly influenced scientific and technical debates about the handling and management of geospatial information. Accordingly, we see a need to recall what has happened over the past years, to present the recent Big Data landscape from an infrastructural perspective, and to outline the major implications for the SDI community. We primarily conclude that it would be too simple and naïve to consider only the technological aspects underpinning geospatial (web) services. Instead, we call on SDI researchers, engineers, providers and consumers to develop new methodologies and capacities for dealing with (geo)spatial information as part of broader knowledge infrastructures.

    A SPARK BASED COMPUTING FRAMEWORK FOR SPATIAL DATA


    Spatial Data Mining Analytical Environment for Large Scale Geospatial Data

    Nowadays, many applications continuously generate large-scale geospatial data. Vehicle GPS tracking, aerial surveillance drones, LiDAR (Light Detection and Ranging), world-wide spatial networks, and high-resolution optical or Synthetic Aperture Radar imagery all generate huge amounts of geospatial data. However, as data collection increases, our ability to process this large-scale geospatial data in a flexible fashion remains limited. We propose a framework for processing and analyzing large-scale geospatial and environmental data using a “Big Data” infrastructure. Existing Big Data solutions do not include a specific mechanism to analyze large-scale geospatial data. In this work, we extend HBase with a spatial index (R-tree) and HDFS to support geospatial data, and demonstrate its analytical use with common geospatial data types and the data mining facilities provided by the R language. The resulting framework has a robust capability to analyze large-scale geospatial data using spatial data mining and makes its outputs available to end users.
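
    As an editorial illustration of the R-tree lookup pattern described above, the sketch below uses the Python rtree package (libspatialindex bindings) as an in-memory stand-in for the HBase-backed index; the feature ids, bounding boxes and query window are invented, and the actual framework performs this step inside HBase before handing data to R.

```python
# In-memory stand-in for the R-tree index described above (assumes the
# `rtree` package is installed); the records and extents are invented.
from rtree import index

idx = index.Index()

# Index features by id with their (min_x, min_y, max_x, max_y) extents,
# e.g. GPS track segments or image footprints.
features = {
    1: (8.35, 48.95, 8.45, 49.05),
    2: (8.60, 49.30, 8.80, 49.50),
    3: (13.30, 52.45, 13.55, 52.60),
}
for fid, bbox in features.items():
    idx.insert(fid, bbox)

# A window query returns only the candidates whose extents intersect the
# window, which is what narrows the scan before any spatial data mining runs.
query_window = (8.0, 48.5, 9.0, 49.5)
print(sorted(idx.intersection(query_window)))  # -> [1, 2]
```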

    Cost optimization for Big GeoSpatial Data processing in a cloud computing environment

    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2018. Geospatial data represents abstractions of real-world entities and can be obtained in various ways. It has properties that differentiate it from other types of data, such as complex structure, dynamism, and volume. In recent years, with the growing volume of geospatial data, referred to as big geospatial data, tools have been developed to process this data efficiently, among them SpatialHadoop, a framework built on top of Hadoop. Choosing the right index for the dataset to be processed, as well as for the queries and operations to be performed, is essential for these applications to perform well. Moreover, since public cloud providers charge according to the resources used, it is important to optimize application execution to avoid unnecessary expense. This work proposes a Knowledge Base and an Inference Engine that seek to minimize the cost of processing big geospatial data on public cloud providers. In addition, it compares the services offered by the three main public cloud providers for processing large volumes of data. The tests performed demonstrate that using the rules generated by the Inference Engine and choosing the lowest-cost provider can reduce the total processing cost by up to 71%.
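
    The toy sketch below is editorial and illustrates only the general idea of a rule-based Knowledge Base and Inference Engine choosing an index and the cheapest provider; the rules, thresholds, provider names and prices are invented and are not the dissertation's actual knowledge base.

```python
# Toy illustration of rule-based cost optimization: hypothetical rules map
# workload characteristics to an index choice, and a made-up price table is
# used to pick the cheapest provider. Nothing here is from the dissertation.
from dataclasses import dataclass

@dataclass
class Workload:
    dataset_gb: float   # input size
    query: str          # "range", "join", "knn", ...
    skewed: bool        # spatially skewed data or not

def choose_index(w: Workload) -> str:
    """Hypothetical rules for picking a SpatialHadoop index."""
    if w.query == "join" and w.dataset_gb > 100:
        return "r+tree"
    if w.skewed:
        return "quadtree"
    return "grid"

# Invented on-demand prices (USD per node-hour) for three generic providers.
PRICE_PER_NODE_HOUR = {"provider_a": 0.19, "provider_b": 0.23, "provider_c": 0.17}

def provider_costs(nodes: int, hours: float) -> dict:
    return {p: round(nodes * hours * rate, 2) for p, rate in PRICE_PER_NODE_HOUR.items()}

w = Workload(dataset_gb=250, query="join", skewed=False)
costs = provider_costs(nodes=8, hours=3.5)
cheapest = min(costs, key=costs.get)
print(choose_index(w), cheapest, costs[cheapest])  # e.g. r+tree provider_c 4.76
```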

    Efficient Distance Join Query Processing in Distributed Spatial Data Management Systems

    Due to the ubiquitous use of spatial data applications and the large amounts of spatial data they manage, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Distance Join Queries (DJQs) are important and frequently used operations in numerous applications, including data mining, multimedia and spatial databases. DJQs (e.g., k Nearest Neighbor Join Query, k Closest Pair Query, ε Distance Join Query, etc.) are costly operations, since they involve both a join and a distance-based search, and performing DJQs efficiently is a challenging task. Recent Big Data developments have motivated the emergence of novel technologies for distributed processing of large-scale spatial data in clusters of computers, leading to Distributed Spatial Data Management Systems (DSDMSs). Distributed cluster-based computing systems can be classified as Hadoop-based or Spark-based. Based on this classification, in this paper, we compare two of the most recent and leading DSDMSs, SpatialHadoop and LocationSpark, by evaluating the performance of several existing and newly proposed parallel and distributed DJQ algorithms under various settings with large real-world spatial datasets. A general conclusion arising from the execution of the distributed DJQ algorithms studied is that, while SpatialHadoop is a robust and efficient system when large spatial datasets are joined (since it is built on top of the mature Hadoop platform), LocationSpark is the clear winner in total execution time when medium-sized spatial datasets are combined (due to the in-memory processing provided by Spark). However, LocationSpark requires higher memory allocation when large spatial datasets are involved in DJQs (even more so when k and ε are large). Finally, this detailed performance study demonstrates that the new distributed DJQ algorithms we propose are efficient, robust and scalable with respect to different parameters, such as dataset sizes, k, ε and the number of computing nodes.
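
    For readers unfamiliar with DJQs, the editorial sketch below shows, in brute-force single-machine form, what an ε Distance Join Query and a k Nearest Neighbor Join Query compute; the distributed algorithms evaluated in the paper produce the same results but partition the work spatially across a cluster. The sample points are invented.

```python
# Brute-force, single-machine definitions of two DJQs: the epsilon-distance
# join (all pairs within distance eps) and the kNN join (for each point of P,
# its k nearest neighbors in Q). The sample coordinates are invented.
from math import hypot

P = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
Q = [(0.5, 0.5), (4.0, 4.5), (9.0, 9.0)]

def dist(p, q):
    return hypot(p[0] - q[0], p[1] - q[1])

def eps_distance_join(P, Q, eps):
    """All pairs (p, q) in P x Q with dist(p, q) <= eps."""
    return [(p, q) for p in P for q in Q if dist(p, q) <= eps]

def knn_join(P, Q, k):
    """For each p in P, its k nearest neighbors in Q."""
    return {p: sorted(Q, key=lambda q: dist(p, q))[:k] for p in P}

print(eps_distance_join(P, Q, eps=1.0))
print(knn_join(P, Q, k=1))
```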

    Towards intelligent geo-database support for earth system observation: Improving the preparation and analysis of big spatio-temporal raster data

    The European COPERNICUS program provides an unprecedented breakthrough in the broad use and application of satellite remote sensing data. Maintained on a sustainable basis, the COPERNICUS system is operated under a free-and-open data policy. Its guaranteed long-term availability attracts a broader community to remote sensing applications. In general, the increasing amount of satellite remote sensing data opens the door to diverse and advanced analyses of this data for earth system science. However, the preparation of the data for dedicated processing is still inefficient, as it requires time-consuming operator interaction based on advanced technical skills. Thus, the involved scientists have to spend significant parts of the available project budget on data preparation rather than on science. In addition, analyzing the rich content of remote sensing data requires new concepts for better extraction of promising structures and signals as an effective basis for further analysis. In this paper we propose approaches to improve the preparation of satellite remote sensing data with a geo-database, so that the time needed and the errors possibly introduced by human interaction are minimized. In addition, we recommend improving data quality and data analysis by incorporating Artificial Intelligence methods. A use case for data preparation and analysis is presented for earth surface deformation analysis in the Upper Rhine Valley, Germany, based on Persistent Scatterer Interferometric Synthetic Aperture Radar data. Finally, we give an outlook on our future research.