
    SAT-hadoop-processor: a distributed remote sensing big data processing software for earth observation applications

    Nowadays, several environmental applications take advantage of remote sensing techniques. A considerable volume of this remote sensing data arrives in near real-time. Such data are diverse and are provided with high velocity and variety, their pre-processing requires large computing capacities, and a fast execution time is critical. This paper proposes a new distributed software for remote sensing data pre-processing and ingestion using cloud computing technology, specifically OpenStack. The developed software discarded 86% of the unneeded daily files and removed around 20% of the erroneous and inaccurate datasets. Parallel processing reduced the total execution time by 90%. Finally, the software efficiently processed and integrated the data into the Hadoop storage system, notably HDFS, HBase, and Hive. This research was funded by the Erasmus+ KA 107 program, and the UPC funded the APC. This work has received funding from the Spanish Government under contracts PID2019-106774RBC21, PCI2019-111851-2 (LeadingEdge CHIST-ERA), and PCI2019-111850-2 (DiPET CHIST-ERA).
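
    As a hedged illustration of the pre-processing and ingestion step described above, the sketch below filters a daily manifest of granules and loads the surviving records into Hive. It is not the SAT-hadoop-processor code: it assumes a PySpark session with Hive support and a hypothetical manifest layout (columns granule_path, product, qa_flag), and the product codes and quality flag values are made up.

```python
# Hedged sketch only: filter out unneeded and low-quality granules before
# registering the cleaned catalogue in Hive. Column names, product codes and
# the qa_flag convention are illustrative assumptions, not the paper's schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("rs-ingest-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical daily manifest of downloaded granules stored on HDFS.
manifest = spark.read.csv("hdfs:///ingest/manifest/*.csv", header=True)

wanted_products = ["L2_AEROSOL", "L2_SST"]                  # illustrative product codes
clean = (manifest
         .filter(F.col("product").isin(wanted_products))    # drop unneeded daily files
         .filter(F.col("qa_flag") == "GOOD"))               # drop erroneous granules

# Append the cleaned catalogue to a Hive table for later queries.
clean.write.mode("append").saveAsTable("rs_catalog.daily_granules")
```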

    Development of a New Framework for Distributed Processing of Geospatial Big Data

    Geospatial technology is still facing a lack of “out of the box” distributed processing solutions which are suitable for the amount and heterogeneity of geodata, and particularly for use cases requiring a rapid response. Moreover, most current distributed computing frameworks have important limitations that hinder transparent and flexible control of processing (and/or storage) nodes and of the distribution of data chunks. We investigated the design of distributed processing systems and existing solutions related to Geospatial Big Data. This research area is highly dynamic in terms of new developments and the re-use of existing solutions (that is, the re-use of certain modules to implement further specific developments), with new implementations continuously emerging in areas such as disaster management, environmental monitoring and earth observation. The distributed processing of raster data sets is the focus of this paper, as we believe that the problem of raster data partitioning is far from trivial: a number of tiling and stitching requirements need to be addressed to fulfil the needs of efficient image processing beyond the pixel level. We compare the terms Big Data, Geospatial Big Data and traditional Geospatial Data in order to clarify their typical differences, to compare their storage and processing backgrounds for different data representations, and to categorize the common processing systems from the aspect of distributed raster processing. This clarification is necessary because these data types behave differently on the processing side, and particular processing solutions need to be developed according to their characteristics. Furthermore, we compare parallel and distributed computing, since the two terms are often used improperly. We also briefly assess the widely known MapReduce paradigm in the context of geospatial applications. The second half of the article reports on a new processing framework initiative, currently at the concept and early development stages, which aims to be capable of processing raster, vector and point cloud data in a distributed IT ecosystem. The developed system is modular, has no limitations on the programming language environment, and can execute scripts written in any development language (e.g. Python, R or C#).
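
    The tiling and stitching requirement highlighted above can be made concrete with a minimal sketch: each tile carries an overlapping halo so that neighbourhood operations (filters, segmentation) remain valid near tile borders, and the halo is cropped again when results are stitched. This is a plain NumPy illustration with assumed tile and overlap sizes, not the partitioning scheme of the framework described in the article.

```python
import numpy as np

def tile_raster(raster, tile_size=512, overlap=16):
    """Split a 2-D raster into tiles with an overlapping halo so that
    neighbourhood operations stay valid at tile edges; the halo is meant to
    be cropped again when processed tiles are stitched back together."""
    h, w = raster.shape
    tiles = []
    for row in range(0, h, tile_size):
        for col in range(0, w, tile_size):
            r0 = max(row - overlap, 0)
            c0 = max(col - overlap, 0)
            r1 = min(row + tile_size + overlap, h)
            c1 = min(col + tile_size + overlap, w)
            # Keep the core tile origin so results can be placed back later.
            tiles.append(((row, col), raster[r0:r1, c0:c1]))
    return tiles

# Example: a 2048x2048 synthetic band split into 16 core tiles plus halos.
band = np.random.rand(2048, 2048).astype(np.float32)
chunks = tile_raster(band)
print(len(chunks))  # 16
```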

    Big Geospatial Data processing in the IQmulus Cloud

    Remote sensing instruments are continuously evolving in terms of spatial, spectral and temporal resolution and hence provide exponentially increasing amounts of raw data. These volumes increase significantly faster than computing speeds. All these techniques record vast amounts of data, yet in different data models and representations; therefore, the resulting datasets require harmonization and integration prior to deriving meaningful information from them. All in all, huge datasets are available, but raw data is of almost no value if it is not processed, semantically enriched and quality checked. The derived information needs to be transferred and published to all levels of possible users (from decision makers to citizens). Up to now, there are only limited automatic procedures for this; thus, a wealth of information remains latent in many datasets. This paper presents the first achievements of the IQmulus EU FP7 research and development project with respect to the processing and analysis of big geospatial data in the context of flood and waterlogging detection.

    A case study of advancing remote sensing image analysis

    Big data and cloud computing are two phenomena which have gained significant attention over the last few years. In computer science, the approach has shifted towards distributed architectures and high-performance computing. In the case of geographical information systems (GIS) and remote sensing image analysis, the new paradigms have already been successfully applied to several problems, and systems have been developed to support the processing of geographical and remote sensing data in the cloud. However, due to different circumstances, many previous workflows have to be reconsidered and redesigned. Our goal is to show how existing approaches to remote sensing image analysis can be advanced to take advantage of these new paradigms. Shifting the algorithms should require only moderate effort and must avoid a complete redesign and reimplementation of the existing approaches. We present the whole journey as a case study using an existing industrial workflow for demonstration, and we define rules of thumb that can come in handy when shifting any existing GIS workflow. Our case study is the workflow of waterlogging and flood detection, which is an operative task at the Institute of Geodesy, Cartography and Remote Sensing (FÖMI). This task is currently carried out using a semi-automatic, single-machine approach involving multiple software tools. The workflow is neither efficient nor scalable, and thus it is not applicable in emergency situations where a quick response is required. We present an approach utilizing distributed computing, which enables the automated execution of this task on large input data with much better response times. The approach is based on the well-known MapReduce paradigm, its open-source implementation, the Apache Hadoop framework, and the AEGIS geospatial toolkit. This enables the replacement of multiple software tools with a single, generic framework. Results show that significant performance benefits can be achieved at the expense of a minor loss in accuracy.
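
    To illustrate the MapReduce pattern the case study builds on, without reproducing the FÖMI/AEGIS workflow, the sketch below lets a mapper classify water pixels per tile with a simple NDWI threshold and a reducer aggregate per-scene counts. The band names, threshold value and local simulation are illustrative assumptions only.

```python
# Hedged illustration of MapReduce applied to raster tiles: map emits
# (scene_id, water_pixel_count) per tile, reduce sums counts per scene.
# Not the authors' implementation; threshold and bands are assumptions.
import numpy as np
from collections import defaultdict

def map_tile(scene_id, green, nir, threshold=0.2):
    """Map step: count water pixels in one tile via an NDWI threshold."""
    ndwi = (green - nir) / np.maximum(green + nir, 1e-6)
    yield scene_id, int(np.count_nonzero(ndwi > threshold))

def reduce_counts(pairs):
    """Reduce step: sum the water pixel counts per scene."""
    totals = defaultdict(int)
    for scene_id, count in pairs:
        totals[scene_id] += count
    return dict(totals)

# Local simulation over two synthetic tiles of one scene.
tiles = [("scene-001", np.random.rand(256, 256), np.random.rand(256, 256))
         for _ in range(2)]
emitted = [kv for args in tiles for kv in map_tile(*args)]
print(reduce_counts(emitted))
```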

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has great potential in terms of gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems, such as the Earth and its ecosystem, and human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which, however, should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate the human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society. Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    Performance-Aware High-Performance Computing for Remote Sensing Big Data Analytics

    The incredible increase in the volume of data emerging along with recent technological developments has made analysis processes that use traditional approaches more difficult for many organizations. Applications involving subjects that require timely processing of big data, such as satellite imagery, sensor data, bank operations, web servers, and social networks, especially require efficient mechanisms for collecting, storing, processing, and analyzing these data. At this point, big data analytics, which comprises data mining, machine learning, statistics, and similar techniques, comes to the aid of organizations for end-to-end management of the data. In this chapter, we introduce a novel high-performance computing system on a geo-distributed private cloud for remote sensing applications, which takes advantage of the network topology, exploits the utilization and workloads of CPU, storage, and memory resources in a distributed fashion, and optimizes resource allocation for realizing big data analytics efficiently.
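
    The "performance-aware" allocation idea can be sketched, under assumed metrics and weights, as picking the worker node that currently offers the best mix of free CPU, free memory and network proximity to the input data. The scoring function below is a toy model for illustration, not the chapter's actual allocation algorithm.

```python
# Hedged sketch of performance-aware node selection. Node metrics, weights
# and the scoring formula are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_free: float      # fraction of idle CPU, 0..1
    mem_free: float      # fraction of free memory, 0..1
    hops_to_data: int    # network distance to the input data blocks

def score(node, w_cpu=0.4, w_mem=0.3, w_net=0.3):
    # Fewer network hops is better, so the distance term is inverted.
    return (w_cpu * node.cpu_free
            + w_mem * node.mem_free
            + w_net / (1 + node.hops_to_data))

def pick_node(nodes):
    """Return the node with the highest combined utilization/locality score."""
    return max(nodes, key=score)

cluster = [Node("edge-a", 0.7, 0.5, 1),
           Node("dc-b",   0.9, 0.8, 4),
           Node("dc-c",   0.2, 0.9, 2)]
print(pick_node(cluster).name)
```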

    SAT-CEP-monitor: An air quality monitoring software architecture combining complex event processing with satellite remote sensing

    Air pollution is a major problem today that causes serious damage to human health. Urban areas are the most affected by the degradation of air quality caused by anthropogenic gas emissions. Although there are multiple proposals for air quality monitoring, in most cases two limitations apply: the impossibility of processing data in Near Real-Time (NRT) for remote sensing approaches, and the impossibility of reaching areas of limited accessibility or low network coverage for ground data approaches. We propose a software architecture that efficiently combines complex event processing with remote sensing data from various satellite sensors to monitor air quality in NRT, giving support to decision-makers. We illustrate the proposed solution by calculating the air quality levels for several areas of Morocco and Spain, extracting and processing satellite information in NRT. This study also validates the air quality measured by ground stations and satellite sensor data. This work was partially supported by the Spanish Ministry of Science and Innovation and the European Regional Development Fund (ERDF) under project FAME [RTI2018-093608-B-C33]. The corresponding author thanks the ERASMUS+ KA107 program for the grant and acknowledges the University of Cadiz for the academic supervision and their research facilities, grant numbers 2017-1-ES01-KA107-037422 and 2018-1-ES01-KA107-049705. The authors of this work are also thankful to the Andalusian and Madrid regional governments for providing us with the NRT MGS data.
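
    A minimal sketch of the complex event processing idea, under assumed window length, pollutant and threshold: simple events (individual satellite-derived NO2 readings per area) are aggregated over a sliding window, and a higher-level air quality alert event is emitted when the windowed average exceeds a limit. This is an illustration only, not the SAT-CEP-monitor implementation or its CEP engine.

```python
# Hedged CEP-style sketch: derive a complex "air quality alert" event from a
# sliding window of simple measurement events. Window size, pollutant and
# threshold are illustrative assumptions, not values from the paper.
from collections import deque, defaultdict

WINDOW = 6            # number of most recent observations kept per area
NO2_THRESHOLD = 120   # hypothetical alert level

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def on_measurement(area, no2):
    """Handle one simple event: a satellite-derived NO2 reading for an area."""
    w = windows[area]
    w.append(no2)
    avg = sum(w) / len(w)
    if len(w) == WINDOW and avg > NO2_THRESHOLD:
        # Complex event: sustained poor air quality over the whole window.
        return {"event": "AIR_QUALITY_ALERT", "area": area, "avg_no2": round(avg, 1)}
    return None

# Feed a small synthetic stream for one area and print any derived alerts.
stream = [("Cadiz", v) for v in (90, 110, 130, 140, 150, 160, 170)]
for area, value in stream:
    alert = on_measurement(area, value)
    if alert:
        print(alert)
```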