Search CORE

754 research outputs found

CD/CV: Blockchain-based schemes for continuous verifiability and traceability of IoT data for edge-fog-cloud

Author: Carretero Pérez Jesús
Gonzalez-Compean Jose Luis
Martinez Rendon Cristhian
Sánchez Gallegos Dante D.
Publication venue: Elsevier
Publication date: 01/01/2023

This paper presents a continuous delivery/continuous verifiability (CD/CV) method for IoT dataflows in edge-fog-cloud environments. The CD model, based on an extraction, transformation, and load (ETL) mechanism and the construction of a directed acyclic graph (DAG), enables end-users to create efficient schemes for the continuous verification and validation of application execution in edge-fog-cloud infrastructures. These schemes also verify and validate established execution sequences and the integrity of digital assets. The CV model converts the ETL and DAG into a business model implemented as smart contracts in a private blockchain, which automatically and transparently registers the transactions performed by each application in the workflows/pipelines created by the CD model, without altering either the applications or the edge-fog-cloud workflows. This model ensures that IoT dataflows deliver verifiable information, so that organizations can carry out critical decision-making processes with certainty. A containerized parallelism model solves portability issues and reduces or compensates for the overhead produced by CD/CV operations. We developed a prototype to create CD/CV schemes and evaluated it in a case study in which user mobility information is used to identify points of interest, patterns, and maps. The experimental evaluation revealed the efficiency of CD/CV in registering the transactions performed in IoT dataflows through edge-fog-cloud infrastructures in a private blockchain network, in comparison with state-of-the-art solutions. This work has been partially supported by the project “CABAHLA-CM: Convergencia Big data-Hpc: de los sensores a las Aplicaciones” S2018/TCS-4423 from the Madrid Regional Government, Spain; by the Spanish Ministry of Science and Innovation project “New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)”, Ref. PID2019-107858GB-I00; and by project 41756 “Plataforma tecnológica para la gestión, aseguramiento, intercambio preservación de grandes volúmenes de datos en salud construcción de un repositorio nacional de servicios de análisis de datos de salud” of PRONACES-CONACYT, Mexico.
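The abstract describes the CV checks only at a high level: execution must follow the sequence fixed by the DAG, and the data passed between stages must not be altered. As a rough illustration of those two checks, here is a minimal Python sketch. It assumes a plain SHA-256 hash chain in place of a private blockchain and smart contracts, and the names `Ledger`, `record`, `intact`, `verify`, and the dictionary-based DAG are hypothetical, not the authors' API.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA-256 hex digest used for both payloads and blocks."""
    return hashlib.sha256(data).hexdigest()


GENESIS = "0" * 64


class Ledger:
    """Hypothetical append-only hash chain; a stand-in for a private blockchain, not a real one."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    def record(self, stage: str, inputs: list[str], output: str) -> None:
        """Register one stage execution: digests of consumed inputs and produced output."""
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"stage": stage, "inputs": sorted(inputs), "output": output, "prev": prev}
        body["hash"] = digest(json.dumps(body, sort_keys=True).encode())
        self.blocks.append(body)

    def intact(self) -> bool:
        """Recompute every block hash and check each block links to its predecessor."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            recomputed = digest(json.dumps(body, sort_keys=True).encode())
            if block["prev"] != prev or recomputed != block["hash"]:
                return False
            prev = block["hash"]
        return True


def verify(ledger: Ledger, dag: dict[str, list[str]]) -> bool:
    """Check chain integrity, DAG execution order, and stage-to-stage data integrity.

    `dag` maps each stage name to the stages whose outputs it consumes.
    """
    if not ledger.intact():
        return False
    outputs: dict[str, str] = {}
    for block in ledger.blocks:
        stage, preds = block["stage"], dag.get(block["stage"])
        if preds is None or stage in outputs:
            return False  # unknown stage, or a stage registered twice
        if any(p not in outputs for p in preds):
            return False  # executed before one of its predecessors
        if preds and block["inputs"] != sorted(outputs[p] for p in preds):
            return False  # consumed data differs from what predecessors produced
        outputs[stage] = block["output"]
    return set(outputs) == set(dag)  # every stage in the DAG was executed
```

For an extract, transform, load pipeline, `dag = {"extract": [], "transform": ["extract"], "load": ["transform"]}`; recording the stages in that order, with each stage's inputs equal to its predecessor's output digest, passes `verify`, while a reordered run, a substituted input, or an edited block fails it. Only digests go on the chain, so the payloads themselves can stay in the edge-fog-cloud storage.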

Universidad Carlos III de Madrid e-Archivo

Liquid: unifying nearline and offline big data integration

Author: Castro Fernandez R
Koshy J
Kreps J
Lin D
Narkhede N
Pietzuch PR
Rao J
Riccomini C
Wang G
Publication date: 01/10/2014

Spiral - Imperial College Digital Repository

Big Data Computing for Geospatial Applications

Publication venue: MDPI AG
Publication date: 01/05/2021

The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, while also demonstrating opportunities for using big data in geospatial applications. Crucial to the advancements highlighted in this book are the integration of computational thinking and spatial thinking, and the transformation of abstract ideas and models into concrete data structures and algorithms.

Directory of Open Access Books (DOAB)

Deep Learning in the Automotive Industry: Applications and Tools

Author: Ashcraft Nathan
Cook Matthew
Djerekarov Emil
Luckow Andre
Vorster Bennie
Weill Edwin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 30/04/2017

Deep learning refers to a set of machine learning techniques that use neural networks with many hidden layers for tasks such as image classification, speech recognition, and language understanding. Deep learning has proven very effective in these domains and is pervasively used by many Internet services. In this paper, we describe different automotive use cases for deep learning, in particular in the domain of computer vision. We survey the current state of the art in libraries, tools, and infrastructures (e.g., GPUs and clouds) for implementing, training, and deploying deep neural networks. We particularly focus on convolutional neural networks and computer vision use cases, such as the visual inspection process in manufacturing plants and the analysis of social media data. Curated and labeled datasets are essential for training neural networks; however, both the availability and the scope of such datasets are typically very limited. A main contribution of this paper is the creation of an automotive dataset that allows us to learn and automatically recognize different vehicle properties. We describe an end-to-end deep learning application that uses a mobile app for data collection and process support, and an Amazon-based cloud backend for storage and training. For training, we evaluate the use of cloud and on-premises infrastructures (including multiple GPUs) in conjunction with different neural network architectures and frameworks. We assess both the training times and the accuracy of the classifier. Finally, we demonstrate the effectiveness of the trained classifier in a real-world setting during the manufacturing process. Comment: 10 pages

arXiv.org e-Print Archive

Crossref

New techniques to integrate blockchain in Internet of Things scenarios for massive data management

Author: Martínez Rendón Cristhian
Publication date: 04/07/2023

International Mention in the doctoral degree.
Nowadays, regardless of the use case, most IoT data is processed using workflows that are executed on different infrastructures (edge-fog-cloud), which produces dataflows from the IoT through the edge to the fog/cloud. In many cases, these workflows also involve several actors (organizations and users), which makes it challenging for organizations to verify the transactions performed by the participants in the dataflows built by workflow engines and pipeline frameworks. It is essential for organizations not only to verify that applications are executed in the strict sequence previously established in a DAG by authenticated participants, but also to verify that the incoming and outgoing IoT data of each stage of a workflow/pipeline have not been altered by third parties or by users associated with the organizations participating in the workflow/pipeline. Blockchain technology, with its mechanism for recording immutable transactions in a distributed and decentralized manner, is an ideal technology to address these challenges, since it allows the generated records to be verified securely. However, integrating blockchain technology with workflows for IoT data processing is not trivial, because it is difficult to preserve the generality of workflow and/or pipeline engines, which must be modified to include the embedded blockchain module. The main objective of this doctoral research was to create new techniques to use blockchain in the Internet of Things (IoT); specifically, the main goal of this thesis is to develop new techniques to integrate blockchain in Internet of Things scenarios for massive data management in edge-fog-cloud environments.
To fulfill this general objective, we designed a content delivery model for processing big IoT data in Edge-Fog-Cloud computing using micro/nanoservice composition and a blockchain-based continuous verification model to register significant events from the continuous delivery model, and we selected techniques to integrate blockchain in quasi-real systems that ensure traceability and non-repudiation of data obtained from devices and sensors. The proposal has been thoroughly evaluated, showing its feasibility and good performance.
Nowadays, regardless of the use case, most IoT data is processed using workflows that run on different infrastructures (edge-fog-cloud), from the IoT through the edge to the fog/cloud. In many cases, they also involve several actors (organizations and users), which poses a challenge for organizations when verifying the transactions performed by the participants in the dataflows. It is essential for organizations not only to verify that applications are executed in the sequence previously established in a DAG and by authenticated participants, but also to verify that the incoming and outgoing IoT data of each stage of a workflow have not been altered by third parties or by users associated with the organizations participating in it. Blockchain technology, thanks to its mechanism for recording transactions in a distributed and decentralized manner, is an ideal technology to support the aforementioned challenges, since it allows the generated records to be verified securely. However, integrating blockchain technology with IoT workflows is not trivial, considering that it is a challenge to provide the necessary performance without losing the generality of the workflow engines, which must be modified to include the integrated blockchain module.
The main objective of this doctoral research is to develop new techniques to integrate blockchain into the Internet of Things (IoT) for massive data management in an edge-fog-cloud environment. To fulfill this general objective, a flow model was designed to process big IoT data in Edge-Fog-Cloud computing using micro/nanoservice composition, along with a blockchain-based continuous verification model to register significant events from the continuous data delivery model, and techniques were selected to integrate blockchain into quasi-real systems that ensure traceability and non-repudiation of data obtained from devices and sensors. The proposal has been thoroughly evaluated, showing its feasibility and good performance.
This work has been partially supported by the project "CABAHLA-CM: Convergencia Big data-Hpc: de los sensores a las Aplicaciones" S2018/TCS-4423 from the Madrid Regional Government.
Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid. Chair: Paolo Trunfio. Secretary: David Exposito Singh. Member: Rafael Mayo Garcí

Universidad Carlos III de Madrid e-Archivo

Fine-Grained Provenance And Applications To Data Analytics Computation

Author: Zheng Nan
Publication venue: ScholarlyCommons
Publication date: 01/01/2021

Data provenance tools seek to facilitate reproducible data science and auditable data analyses by capturing the analytics steps used to generate data analysis results. However, analysts must choose among workflow provenance systems, which allow arbitrary code but only track provenance at the granularity of files; provenance APIs, which provide tuple-level provenance but incur overhead in all computations; and database provenance tools, which track tuple-level provenance through relational operators and support optimization, but support only a limited subset of data science tasks. None of these solutions is well suited for tracing errors introduced during common ETL, record alignment, and matching tasks for data types such as strings and images. Additionally, we need a provenance archival layer to store and manage the tracked fine-grained provenance, enabling future sophisticated reasoning about why individual output results appear or fail to appear. For reproducibility and auditing, the provenance archival system should be tamper-resistant. On the other hand, provenance collected over time or within the same query computation tends to be partially repeated (i.e., the same operation applied to the same input records in an intermediate computation step). Hence, provenance storage should be efficient (i.e., it should compress repeated results). We address these challenges with novel formalisms and algorithms, implemented in the PROVision system, for reconstructing fine-grained provenance for a broad class of ETL-style workflows. We extend database-style provenance techniques to capture equivalences, support optimizations, and enable lazy evaluation. We develop solutions for storing fine-grained provenance in relational storage systems while both compressing it and protecting it via cryptographic hashes. We experimentally validate our proposed solutions using both scientific and OLAP workloads.
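The abstract states that fine-grained provenance is stored in relational storage while being both compressed and protected via cryptographic hashes, without giving details. The sketch below shows one simple way those two properties can coexist, assuming a content-addressed SQLite table in which each row's primary key is the SHA-256 of its content: a repeated (operation, inputs, output) entry is stored only once, and an edited row no longer matches its key. This is not the PROVision design; `ProvenanceStore` and its methods are hypothetical.

```python
import hashlib
import json
import sqlite3


class ProvenanceStore:
    """Hypothetical content-addressed provenance table.

    Identical entries are stored once, and each row's key is the SHA-256 of its
    content, so edits that do not also recompute the key are detectable.
    """

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE prov (key TEXT PRIMARY KEY, operation TEXT, inputs TEXT, output TEXT)"
        )

    @staticmethod
    def _key(operation: str, inputs: str, output: str) -> str:
        """Hash the row content deterministically (inputs is already canonical JSON)."""
        return hashlib.sha256(json.dumps([operation, inputs, output]).encode()).hexdigest()

    def add(self, operation: str, input_ids: list[str], output_id: str) -> str:
        """Record that `operation` over `input_ids` produced `output_id`; return the row key."""
        inputs = json.dumps(sorted(input_ids))  # input order does not matter
        key = self._key(operation, inputs, output_id)
        # INSERT OR IGNORE: a repeated (operation, inputs, output) triple adds no new row.
        self.db.execute(
            "INSERT OR IGNORE INTO prov VALUES (?, ?, ?, ?)",
            (key, operation, inputs, output_id),
        )
        return key

    def count(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM prov").fetchone()[0]

    def tampered(self) -> list[str]:
        """Return keys of rows whose content no longer matches their hash."""
        rows = self.db.execute("SELECT key, operation, inputs, output FROM prov").fetchall()
        return [k for k, op, inp, out in rows if self._key(op, inp, out) != k]

    def why(self, output_id: str) -> list[tuple[str, list[str]]]:
        """Return the (operation, input ids) pairs recorded as producing `output_id`."""
        rows = self.db.execute(
            "SELECT operation, inputs FROM prov WHERE output = ?", (output_id,)
        ).fetchall()
        return [(op, json.loads(inp)) for op, inp in rows]
```

Deduplication here only covers exact repeats of the same operation over the same input records, which is the repetition pattern the abstract describes. The integrity check is also limited: someone who edits a row and recomputes its key goes undetected, so stronger tamper resistance would need, for example, a keyed hash or hashes anchored outside the database.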

ScholarlyCommons@Penn

Extract, Transform, and Load data from Legacy Systems to Azure Cloud

Author: Jephte Ioudom Foubi
Publication date: 20/05/2021

Internship report presented as a partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Knowledge Management and Business Intelligence.
In a world of continuously evolving technologies and increasingly competitive markets, organisations need to stay on guard to adopt cutting-edge technologies and tools that will help them outpace any competition that arises. Modern data platforms that incorporate cloud technologies help organisations thrive and get ahead of their competitors by providing solutions to capture and make optimal use of untapped data, scalable storage that adapts to ever-growing data volumes, and data processing and visualisation tools that improve the decision-making process. With many cloud providers on the market, from small players to major technology corporations, organisations have considerable flexibility to choose the cloud technology that best aligns with their use cases and their overall products and services strategy. This internship took place when one of Accenture’s significant clients in the financial industry decided to migrate from legacy systems to a cloud-based data infrastructure, namely Microsoft Azure. During this internship, the data lake, a core part of the MDP, was developed in order to better understand the types of challenges that can arise when migrating data from on-premises legacy systems to a cloud-based infrastructure. This work also provides the main recommendations and guidelines for performing a large-scale data migration.

Repositório da Universidade Nova de Lisboa