16 research outputs found

    An Approach for Stitching Satellite Images in a Big Data MapReduce Framework

    A MapReduce-Based Big Spatial Data Framework for Solving the Problem of Covering a Polygon with Orthogonal Rectangles

    The polygon covering problem is an important class of problems in computational geometry, with slightly different versions depending on the types of polygons addressed. In this paper, we focus on deciding whether an orthogonal rectangle, or spatial query window, is fully covered by a set of smaller orthogonal rectangles. This problem is encountered in many application domains, including object recognition/extraction/tracing, spatial analyses, topological analyses, and augmented reality applications. In many real-world applications, traditional centralized computation techniques become a performance bottleneck when working with real-world data. The work presented in this paper proposes a high-performance MapReduce-based big data framework that solves the polygon covering problem when the query is a spatial query window and the data are represented as a set of orthogonal rectangles. Orthogonal rectangular polygons are represented in the form of minimum bounding boxes, and spatial query windows are also called range queries. The proposed spatial big data framework is evaluated in terms of horizontal scalability; in addition, efficiency and speed-up metrics are measured for the two proposed algorithms.
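
    As a hedged illustration of the core predicate described above (deciding whether a rectangular query window is fully covered by a set of smaller axis-aligned rectangles, i.e. minimum bounding boxes), the sketch below shows one simple centralized way to compute it. The coordinate-compression approach, function names, and example data are assumptions for illustration only; the paper's MapReduce distribution of this test is not reproduced here.

    # A minimal centralized sketch of the coverage test; not the paper's implementation.
    # Rectangles are axis-aligned tuples (xmin, ymin, xmax, ymax).
    from bisect import bisect_left

    def covers(query, rects):
        """Return True if the query window is fully covered by the union of rects."""
        qx1, qy1, qx2, qy2 = query
        # Clip every rectangle to the query window and drop the empty ones.
        clipped = []
        for x1, y1, x2, y2 in rects:
            x1, y1 = max(x1, qx1), max(y1, qy1)
            x2, y2 = min(x2, qx2), min(y2, qy2)
            if x1 < x2 and y1 < y2:
                clipped.append((x1, y1, x2, y2))
        if not clipped:
            return False
        # Coordinate compression: each elementary grid cell is either entirely
        # inside some clipped rectangle or not covered at all.
        xs = sorted({qx1, qx2, *[r[0] for r in clipped], *[r[2] for r in clipped]})
        ys = sorted({qy1, qy2, *[r[1] for r in clipped], *[r[3] for r in clipped]})
        covered = [[False] * (len(ys) - 1) for _ in range(len(xs) - 1)]
        for x1, y1, x2, y2 in clipped:
            for i in range(bisect_left(xs, x1), bisect_left(xs, x2)):
                for j in range(bisect_left(ys, y1), bisect_left(ys, y2)):
                    covered[i][j] = True
        return all(all(row) for row in covered)

    # Example: a 10x10 query window; the second set of tiles leaves a gap.
    print(covers((0, 0, 10, 10), [(0, 0, 6, 10), (5, 0, 10, 10)]))  # True
    print(covers((0, 0, 10, 10), [(0, 0, 6, 10), (7, 0, 10, 10)]))  # False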

    Dutkat: A Privacy-Preserving System for Automatic Catch Documentation and Illegal Activity Detection in the Fishing Industry

    United Nations' Sustainable Development Goal 14 aims to conserve and sustainably use the oceans and their resources for the benefit of people and the planet. This includes protecting marine ecosystems, preventing pollution and overfishing, and increasing scientific understanding of the oceans. Achieving this goal will help ensure the health and well-being of marine life and of the millions of people who rely on the oceans for their livelihoods. To ensure sustainable fishing practices, it is important to have a system in place for automatic catch documentation. This thesis presents our research on the design and development of Dutkat, a privacy-preserving, edge-based system for catch documentation and detection of illegal activities in the fishing industry. Utilising machine learning techniques, Dutkat can analyse large amounts of data and identify patterns that may indicate illegal activities such as overfishing or illegal discarding of catch. Additionally, the system can assist in catch documentation by automating the identification and counting of fish species, thus reducing potential human error and increasing efficiency. Specifically, our research has consisted of developing various components of the Dutkat system, evaluating them through experimentation, exploring existing data, and organising machine learning competitions. We have also followed a compliance-by-design approach to ensure that the system complies with data protection laws and regulations such as the GDPR. Our goal with Dutkat is to promote sustainable fishing practices, in line with Sustainable Development Goal 14, while simultaneously protecting the privacy and rights of fishing crews.

    Integration, control and monitoring of object-based image analysis and data mining through the InterCloud distributed platform

    Doctoral thesis, Universidade de Brasília, Instituto de Geociências, Graduate Program in Applied Geosciences, 2018. Funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Currently, huge volumes of remote sensing data are generated in a short time, and handling such data is a challenge for remote sensing (RS) professionals and researchers, who need more efficient tools and models for image processing and interpretation. This work presents a new online method for integrating a distributed object-based image classification platform with a machine learning decision tree algorithm to create statistical interpretation models. Through the InterCloud system, a new image interpretation platform designed to run on computer networks (physical clusters or cloud computing infrastructure), together with the distributed computing frameworks Apache Hive (which creates virtual tables), Apache Spark's MLlib machine learning library, and the Apache Zeppelin web notebook, it was possible to make data, tables, and plots of pixel values available for statistical interpretation modelling. In the implemented prototype, Apache Zeppelin provided the means to use the scikit-learn Python machine learning library to build a classification model (decision tree), which was then simulated in InterCloud by means of a Pig script. We also evaluated the approach with an object-based land-cover image interpretation application performed on a 103 km² (19k by 23k pixels) GeoEye-1 scene, using resources of a commercial cloud computing infrastructure service. Twenty-four attributes (spectral and morphological) and 11 object classes, including urban and rural targets, were considered. In addition to assessing the accuracy of the final classification by means of a confusion matrix and agreement coefficients, we evaluated InterCloud's ability to perform different tasks (distributed segmentation, feature extraction, and distributed classification) under different cloud infrastructure configurations, varying the number of nodes in the cluster.
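
    As a minimal sketch of the model-building step described above (training a decision tree classifier on per-object attributes and assessing it with a confusion matrix), the following uses the scikit-learn library mentioned in the abstract. The file name, column names, and train/test split are hypothetical placeholders, not the thesis' actual data, and the distributed InterCloud/Pig part of the workflow is not reproduced.

    # Hypothetical table: one row per segmented image object, with spectral and
    # morphological attribute columns plus a reference land-cover label.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score, confusion_matrix

    samples = pd.read_csv("object_attributes.csv")   # placeholder file name
    X = samples.drop(columns=["label"])              # e.g. 24 attribute columns
    y = samples["label"]                             # e.g. 11 land-cover classes

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=42)

    tree = DecisionTreeClassifier(max_depth=10, random_state=42)
    tree.fit(X_train, y_train)

    pred = tree.predict(X_test)
    print("overall accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))  # accuracy assessment via confusion matrix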

    Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)

    The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012 to 2016. Four Helmholtz centres, six universities and another research institution in Germany joined forces to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life Cycle Labs, data experts performed joint R&D together with the scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities.

    Near Data Processing for Efficient and Trusted Systems

    We live in a world which constantly produces data at a rate that only increases with time. Conventional processor architectures fail to process this abundant data efficiently, as they expend significant energy in instruction processing and in moving data over deep memory hierarchies. Furthermore, to process large amounts of data in a cost-effective manner, there is increased demand for remote computation. While cloud service providers have come up with innovative solutions to cater to this increased demand, the security concerns users have for their data remain a strong impediment to wide-scale adoption. An exciting technique in our repertoire to deal with these challenges is near-data processing (NDP), a data-centric paradigm which moves computation to where data resides. This dissertation exploits NDP both to process the data deluge we face efficiently and to design low-overhead secure hardware. To this end, we first propose Compute Caches, a novel NDP technique. Simple augmentations to the underlying SRAM design enable caches to perform commonly used operations. In-place computation in caches not only avoids excessive data movement over the memory hierarchy, but also significantly reduces instruction processing energy, as independent sub-units inside caches perform computation in parallel. Compute Caches significantly improve performance and reduce the energy expended for a suite of data-intensive applications. Second, this dissertation identifies security advantages of NDP. While the memory bus side channel has received much attention, a low-overhead hardware design which defends against it remains elusive. We observe that smart memory, memory with compute capability, can dramatically simplify this problem. To exploit this observation, we propose InvisiMem, which uses the logic layer in smart memory to implement cryptographic primitives that address the memory bus side channel efficiently. Our solutions obviate the need for expensive constructs like Oblivious RAM (ORAM) and Merkle trees, and have one to two orders of magnitude lower overheads for performance, space, energy, and memory bandwidth compared to prior solutions. This dissertation also addresses the related vulnerability of the page fault side channel, in which the Operating System (OS) induces page faults to learn an application's address trace and deduces application secrets from it. To tackle it, we propose Sanctuary, which obfuscates the page fault channel while allowing the OS to manage memory as a resource. To do so, we design a novel construct, Oblivious Page Management (OPAM), which is derived from ORAM but customized for the page management context. We employ near-memory page moves to reduce OPAM overhead and also propose a novel memory partition to reduce the number of OPAM transactions required. For a suite of cloud applications which process sensitive data, we show that the page fault channel can be tackled at reasonable overheads. PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144139/1/shaizeen_1.pd

    GEOBIA 2016 : Solutions and Synergies., 14-16 September 2016, University of Twente Faculty of Geo-Information and Earth Observation (ITC): open access e-book

    Urban Informatics

    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity
