
    A federated content distribution system to build health data synchronization services

    In organizational environments, such as hospitals, data have to be processed, preserved, and shared with other organizations in a cost-efficient manner. Moreover, organizations have to meet different mandatory non-functional requirements imposed by the laws, protocols, and norms of each country. In this context, this paper presents a Federated Content Distribution System to build infrastructure-agnostic health data synchronization services. In this federation, each hospital manages local and federated services based on a pub/sub model. The local services manage users and contents (i.e., medical imagery) inside the hospital, whereas federated services enable different hospitals to cooperate by sharing resources and data. Data preparation schemes were implemented to add non-functional requirements to data. Moreover, data published in the content distribution system are automatically synchronized to all users subscribed to the catalog where the content was published. This work has been partially supported by the grant "CABAHLA-CM: Convergencia Big data-Hpc: de Los sensores a las Aplicaciones" (Ref: S2018/TCS-4423) of the Madrid Regional Government; by the Spanish Ministry of Science and Innovation project "New Data Intensive Computing Methods for High-End and Edge Computing Platforms (DECIDE)" (Ref: PID2019-107858GB-I00); and by project 41756 "Plataforma tecnológica para la gestión, aseguramiento, intercambio y preservación de grandes volúmenes de datos en salud y construcción de un repositorio nacional de servicios de análisis de datos de salud" of FORDECYT-PRONACES.
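
    As a rough illustration of the catalog-based pub/sub synchronization described above, the following sketch shows content published to a catalog being delivered to every subscriber of that catalog. The broker class, catalog names, and callbacks are hypothetical stand-ins, not the system's actual API.

```python
# Minimal sketch of catalog-based publish/subscribe synchronization,
# assuming a hypothetical in-process broker; the paper's federation spans
# multiple hospitals over the network.
from collections import defaultdict
from typing import Callable, Dict, List


class CatalogBroker:
    """Routes published content to every subscriber of a catalog."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, catalog: str, on_content: Callable[[dict], None]) -> None:
        # Each user registers a callback for the catalogs it follows.
        self._subscribers[catalog].append(on_content)

    def publish(self, catalog: str, content: dict) -> None:
        # Publishing (e.g., a medical image descriptor) triggers
        # synchronization to all subscribers of that catalog.
        for deliver in self._subscribers[catalog]:
            deliver(content)


if __name__ == "__main__":
    broker = CatalogBroker()
    broker.subscribe("radiology", lambda c: print("hospital-B synced:", c["id"]))
    broker.publish("radiology", {"id": "mri-001", "size_mb": 42})
```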

    A data preparation approach for cloud storage based on containerized parallel patterns

    In this paper, we present the design, implementation, and evaluation of an efficient data preparation and retrieval approach for cloud storage. The approach includes a deduplication subsystem that indexes the hash of each content to identify duplicated data. As a consequence, avoiding duplicated content reduces reprocessing time during uploads and other costs related to outsourced data management tasks. Our proposed data preparation scheme enables organizations to add properties such as security, reliability, and cost-efficiency to their contents before sending them to the cloud. It also creates recovery schemes for organizations to share preprocessed contents with partners and end-users. The approach also includes an engine that encapsulates preprocessing applications into virtual containers (VCs) to create parallel patterns that improve the efficiency of the data preparation and retrieval process. In a case study, real repositories of satellite images and organizational files were prepared to be migrated to the cloud by using processes such as compression, encryption, encoding for fault tolerance, and access control. The experimental evaluation revealed the feasibility of using a data preparation approach for organizations to mitigate risks that could still arise in the cloud. It also revealed the efficiency of the deduplication process in reducing data preparation tasks and the efficacy of parallel patterns in improving the end-user service experience. This research was supported by "Fondo Sectorial de Investigación para la Educación", SEP-CONACyT Mexico, through projects 281565 and 285276.
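
    A minimal sketch of the hash-indexed deduplication idea, assuming SHA-256 as the content fingerprint (the abstract does not name the hash function); the class and method names below are illustrative only.

```python
# Minimal sketch of hash-indexed deduplication before upload, assuming
# SHA-256 as the content fingerprint; not the paper's implementation.
import hashlib


class DedupIndex:
    """Skips re-preparation/upload of content whose hash was already seen."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # content hash -> stored object name

    def prepare_and_upload(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            # Duplicate: reuse the existing object, no reprocessing cost.
            return self._seen[digest]
        # Placeholder for compression/encryption/encoding and cloud upload.
        self._seen[digest] = name
        return name


if __name__ == "__main__":
    index = DedupIndex()
    print(index.prepare_and_upload("scan-a.tif", b"same bytes"))
    print(index.prepare_and_upload("scan-b.tif", b"same bytes"))  # deduplicated
```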

    A gearbox model for processing large volumes of data by using pipeline systems encapsulated into virtual containers

    Software pipelines enable organizations to chain applications for adding value to contents (e.g., confidentiality, reliability, and integrity) before either sharing them with partners or sending them to the cloud. However, the pipeline components add overhead when processing large volumes of data, which can become critical in real-world scenarios. This paper presents a gearbox model for processing large volumes of data by using pipeline systems encapsulated into virtual containers. In this model, the gears represent applications, whereas gearboxes represent software pipelines. This model was implemented as a collaborative system that automatically performs Gear up (by using parallel patterns) and/or Gear down (by using in-memory storage) until all gears produce uniform data processing velocities. This model reduces delays and bottlenecks produced by the heterogeneous performance of applications included in software pipelines. A new container tool was designed to encapsulate both the collaborative system and the software pipelines into a virtual container and deploy it on IT infrastructures. We conducted case studies to evaluate the performance of this model when processing medical images and PDF repositories. The incorporation of a capsule into a cloud storage service for pre-processing medical imagery was also studied. The experimental evaluation revealed the feasibility of applying the gearbox model to the deployment of software pipelines in real-world scenarios, as it can significantly improve the end-user service experience when pre-processing large-scale data in comparison with state-of-the-art solutions such as Sacbe and Parsl. This work has been partially supported by the Spanish Ministerio de Economia y Competitividad under project grant TIN2016-79637-P "Towards Unification of HPC and Big Data Paradigms".
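
    A minimal sketch of the gear-up/gear-down intuition, assuming thread workers and bounded in-memory queues as stand-ins for containerized gears; stage functions, worker counts, and queue sizes are illustrative, not the paper's implementation.

```python
# Minimal sketch of the "gear up / gear down" idea: a slower stage gets more
# worker threads (gear up) while bounded in-memory queues buffer bursts
# (gear down).
import queue
import threading
from typing import Callable


def run_stage(inbox: queue.Queue, outbox: queue.Queue,
              work: Callable, workers: int) -> None:
    """Start `workers` threads that move items from inbox to outbox."""
    def loop() -> None:
        while True:
            outbox.put(work(inbox.get()))

    for _ in range(workers):          # "gear up": replicate the slow gear
        threading.Thread(target=loop, daemon=True).start()


if __name__ == "__main__":
    src = queue.Queue(maxsize=8)      # bounded queues act as in-memory buffers
    mid = queue.Queue(maxsize=8)
    sink: queue.Queue = queue.Queue()
    run_stage(src, mid, lambda x: x * 2, workers=4)   # slow gear, geared up
    run_stage(mid, sink, lambda x: x + 1, workers=1)  # fast gear
    jobs = list(range(5))
    for job in jobs:
        src.put(job)
    print(sorted(sink.get() for _ in jobs))           # [1, 3, 5, 7, 9]
```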

    A Ring to Rule Them All - Revising OpenStack Internals to Operate Massively Distributed Clouds: The Discovery Initiative - Where Do We Are?

    The deployment of micro/nano data-centers in network points of presence offers an opportunity to deliver a more sustainable and efficient infrastructure for Cloud Computing. Among the different challenges we need to address to favor the adoption of such a model, the development of a system in charge of turning such a complex and diverse network of resources into a collection of abstracted computing facilities that are convenient to administer and use is critical. In this report, we introduce the premises of such a system. The novelty of our work is that instead of developing a system from scratch, we revised the OpenStack solution in order to operate such an infrastructure in a distributed manner leveraging P2P mechanisms. More precisely, we describe how we revised the Nova service by leveraging a distributed key/value store instead of the centralized SQL backend. We present experiments that validated the correct behavior of our prototype, while showing promising performance on several clusters composed of servers from the Grid’5000 testbed. We believe that such a strategy is promising and paves the way to a first large-scale and WAN-wide IaaS manager. The current trend for meeting the growing demand for utility computing is to build ever-larger data centers in a limited number of strategic locations. This approach undoubtedly satisfies current demand while keeping the management of these resources centralized, but it remains far from being able to deliver infrastructures that meet current and future constraints in terms of efficiency, jurisdiction, or sustainability. The goal of the DISCOVERY initiative is to design the LUC OS, a distributed resource-management system able to leverage any network node forming the Internet backbone in order to deliver a new generation of utility computing, one better suited to the geographic dispersion of users and their ever-growing demand. After recalling the objectives of the DISCOVERY initiative and explaining why federation-based approaches are not suited to operating a utility-computing infrastructure integrated into the network, we present the premises of our system. In particular, we explain why and how we chose to start by revisiting the design of the OpenStack solution. From our point of view, building on this solution is a sound strategy given the complexity of IaaS management systems and the velocity of open-source solutions.
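
    A minimal sketch of the underlying idea of backing instance records with a key/value store instead of a centralized SQL database; the dict-backed store and record layout below are hypothetical stand-ins, not the revised Nova code or a real distributed key/value client.

```python
# Minimal sketch: instance records keyed by UUID in a key/value store, so any
# site can register or look up an instance without a central SQL backend.
import json
import uuid


class KVStore:
    """Stand-in for a distributed key/value backend (e.g., ring-based)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: dict) -> None:
        self._data[key] = json.dumps(value)

    def get(self, key: str) -> dict:
        return json.loads(self._data[key])


def boot_instance(store: KVStore, name: str, site: str) -> str:
    """Register an instance record from any site."""
    instance_id = str(uuid.uuid4())
    store.put(f"instance/{instance_id}",
              {"name": name, "site": site, "status": "ACTIVE"})
    return instance_id


if __name__ == "__main__":
    store = KVStore()
    vm_id = boot_instance(store, "vm-demo", site="nantes")
    print(store.get(f"instance/{vm_id}"))
```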

    On the continuous processing of health data in edge-fog-cloud computing by using micro/nanoservice composition

    The edge, the fog, the cloud, and even the end-users' devices play a key role in the management of the health-sensitive content/data lifecycle. However, the creation and management of solutions including multiple applications executed by multiple users in multiple environments (the edge, the fog, and the cloud) to process multiple health repositories while, at the same time, fulfilling non-functional requirements (NFRs) represents a complex challenge for health care organizations. This paper presents the design, development, and implementation of an architectural model to create, on demand, edge-fog-cloud processing structures to continuously handle big health data and, at the same time, to execute services for fulfilling NFRs. In this model, constructive and modular blocks, implemented as microservices and nanoservices, are recursively interconnected to create edge-fog-cloud processing structures. This work was supported in part by the Council for Science and Technology of Mexico (CONACYT) through the Basic Scientific Research program under Grant 2016-01-285276, and in part by the project "CABAHLA-CM: Convergencia Big data-Hpc: de los sensores a las Aplicaciones" from the Madrid Regional Government under Grant S2018/TCS-4423.
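
    A minimal sketch of recursively chaining small processing blocks across edge, fog, and cloud stages; the block names and the composition helper are illustrative assumptions, not the paper's building blocks.

```python
# Minimal sketch of composing small processing blocks into a chained
# edge-fog-cloud structure.
from typing import Callable

Block = Callable[[dict], dict]


def chain(*blocks: Block) -> Block:
    """Interconnect blocks: the output of one feeds the next."""
    def composed(record: dict) -> dict:
        for block in blocks:
            record = block(record)
        return record
    return composed


def edge_filter(record: dict) -> dict:       # e.g., drop malformed readings
    record["filtered"] = True
    return record


def fog_anonymize(record: dict) -> dict:     # e.g., strip patient identifiers
    record.pop("patient_name", None)
    return record


def cloud_store(record: dict) -> dict:       # e.g., persist for analytics
    record["stored"] = True
    return record


if __name__ == "__main__":
    pipeline = chain(edge_filter, fog_anonymize, cloud_store)
    print(pipeline({"patient_name": "redacted", "hr": 72}))
```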

    On the efficient delivery and storage of IoT data in edge-fog-cloud environments

    This article belongs to the Special Issue "Internet of Things, Sensing and Cloud Computing". Cloud storage has become a keystone for organizations to manage large volumes of data produced by sensors at the edge as well as information produced by deep and machine learning applications. Nevertheless, the latency produced by geographically distributed systems deployed on any of the edge, the fog, or the cloud leads to delays that are observed by end-users in the form of high response times. In this paper, we present an efficient scheme for the management and storage of Internet of Things (IoT) data in edge-fog-cloud environments. In our proposal, entities called data containers are coupled, in a logical manner, with nano/microservices deployed on any of the edge, the fog, or the cloud. The data containers implement a hierarchical cache file system including storage levels such as in-memory, file system, and cloud services for transparently managing the input/output data operations produced by nano/microservices (e.g., a sensor hub collecting data from sensors at the edge or machine learning applications processing data at the edge). Data containers are interconnected through a secure and efficient content delivery network, which transparently and automatically performs the continuous delivery of data through the edge-fog-cloud. A prototype of our proposed scheme was implemented and evaluated in a case study based on the management of electrocardiogram sensor data. The obtained results reveal the suitability and efficiency of the proposed scheme. This research was funded by project 41756 "Plataforma tecnológica para la gestión, aseguramiento, intercambio y preservación de grandes volúmenes de datos en salud y construcción de un repositorio nacional de servicios de análisis de datos de salud" of PRONACES-CONACYT.
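
    A minimal sketch of a hierarchical read path with promotion on hit, showing only an in-memory level backed by a file-system level; the paper's data containers add a cloud-backed level and a content delivery network on top. All names and paths below are illustrative.

```python
# Minimal sketch of a two-tier read path (in-memory, then file system) with
# promotion to memory on a hit.
import os
import tempfile


class TieredCache:
    def __init__(self, root: str) -> None:
        self._memory: dict[str, bytes] = {}
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        self._memory[key] = data
        with open(os.path.join(self._root, key), "wb") as fh:
            fh.write(data)                      # write-through to the file level

    def get(self, key: str) -> bytes:
        if key in self._memory:                 # fastest level first
            return self._memory[key]
        with open(os.path.join(self._root, key), "rb") as fh:
            data = fh.read()
        self._memory[key] = data                # promote on hit
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cache = TieredCache(root)
        cache.put("ecg-0001", b"\x00\x01\x02")
        cache._memory.clear()                   # simulate a cold in-memory level
        print(cache.get("ecg-0001"))            # served from the file level
```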

    Kulla, a container-centric construction model for building infrastructure-agnostic distributed and parallel applications

    This paper presents the design, development, and implementation of Kulla, a virtual container-centric construction model that mixes loosely coupled structures with a parallel programming model for building infrastructure-agnostic distributed and parallel applications. In Kulla, applications, dependencies, and environment settings are mapped to construction units called Kulla-Blocks. A parallel programming model enables developers to couple those interoperable structures to create constructive structures named Kulla-Bricks. In these structures, continuous dataflow and parallel patterns can be created without modifying the code of applications. Methods such as Divide&Containerize (data parallelism), Pipe&Blocks (streaming), and Manager/Block (task parallelism) were developed to create Kulla-Bricks. Recursive combinations of Kulla instances can be grouped into deployment structures called Kulla-Boxes, which are encapsulated into VCs to create infrastructure-agnostic parallel and/or distributed applications. Deployment strategies were created for Kulla-Boxes to improve IT resource profitability. To show the feasibility and flexibility of this model, solutions combining real-world applications were implemented by using Kulla instances to compose parallel and/or distributed systems deployed on different IT infrastructures. An experimental evaluation based on use cases solving satellite and medical image processing problems revealed the efficiency of the Kulla model in comparison with traditional state-of-the-art solutions. This work has been partially supported by the EU project "ASPIDE: Exascale Programing Models for Extreme Data Processing" under grant 801091 and by the project "CABAHLA-CM: Convergencia Big data-Hpc: de los sensores a las Aplicaciones" (S2018/TCS-4423) from the Madrid Regional Government.
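
    A minimal sketch of the data-parallel pattern behind Divide&Containerize: a dataset is split into chunks processed by independent workers. Plain processes stand in for virtual containers here, and the worker function is an illustrative assumption, not one of Kulla's actual components.

```python
# Minimal sketch of data parallelism: divide the input, process chunks in
# parallel workers, then combine partial results.
from concurrent.futures import ProcessPoolExecutor


def process_chunk(chunk: list[int]) -> int:
    # Placeholder for an encapsulated application (e.g., image processing).
    return sum(x * x for x in chunk)


def divide(data: list[int], parts: int) -> list[list[int]]:
    return [data[i::parts] for i in range(parts)]


if __name__ == "__main__":
    data = list(range(1_000))
    with ProcessPoolExecutor(max_workers=4) as pool:
        partial_results = list(pool.map(process_chunk, divide(data, 4)))
    print(sum(partial_results))
```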

    Improving performance and capacity utilization in cloud storage for content delivery and sharing services

    Content delivery and sharing (CDS) is a popular and cost-effective cloud-based service for organizations to deliver/share contents to/with end-users, partners, and insider users. This type of service improves data availability and I/O performance by producing and distributing replicas of shared contents. However, such a technique increases storage/network resource utilization. This paper introduces a threefold methodology to improve the trade-off between I/O performance and capacity utilization of cloud storage for CDS services. This methodology includes: i) definition of a classification model for identifying types of users and contents by analyzing their consumption/demand and sharing patterns, ii) usage of the classification model for defining content availability and load balancing schemes, and iii) integration of a dynamic availability scheme into a cloud-based CDS system. Our method was implemented in a cloud-based CDS system. This work was partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under grant TIN2016-79637-P "Towards Unification of HPC and Big Data Paradigms".
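
    A minimal sketch of turning a demand-based classification into per-content replica counts, so popular content gets more replicas (availability/performance) and cold content fewer (capacity); the classes, thresholds, and replica counts are illustrative assumptions, not the paper's model.

```python
# Minimal sketch: classify content by observed demand, then map each class
# to a replica count used by the availability/load-balancing scheme.
def classify(downloads_per_day: float) -> str:
    if downloads_per_day >= 100:
        return "hot"
    if downloads_per_day >= 10:
        return "warm"
    return "cold"


REPLICAS = {"hot": 3, "warm": 2, "cold": 1}


def plan_replicas(catalog: dict[str, float]) -> dict[str, int]:
    return {name: REPLICAS[classify(rate)] for name, rate in catalog.items()}


if __name__ == "__main__":
    print(plan_replicas({"promo.mp4": 250.0, "report.pdf": 12.5, "archive.zip": 0.3}))
```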

    Population Dynamics and Ecophysiology of Fraser fir (Abies fraseri) in the High Elevation Forests in the Southern Appalachian Mountains

    Dominated by the endemic Fraser fir (Abies fraseri), the high-elevation forests of the Southern Appalachians are one of the most endangered ecosystems in the United States, and the future of these forests remains uncertain. Fraser fir is showing signs of decline in health and increased mortality throughout its range, possibly due to multiple environmental stresses. Using twenty years of forest monitoring data, this dissertation documents change in forest structure and species composition in high-elevation red spruce-Fraser fir forests in southern Appalachia and generates predictions of future forest change. Additionally, it quantifies physiological measures of carbon fixation, storage, and growth in adult Fraser fir in situ under multiple stresses, which had not been studied previously, and explores environmental constraints associated with climate, soil chemistry, and acidic deposition on physiological metrics. We find no evidence of previously hypothesized shifts in forest composition toward greater dominance of northern hardwood species. Using a stage-structured Bayesian hierarchical model to predict Fraser fir populations through 2050, we predict robust recovery of populations on Clingmans Dome and Mount LeConte for at least the next several decades, as well as continued decline for populations on a number of mountains, notably Mount Sterling at the lowest end of Fraser fir’s elevation range. We find that maximum photosynthetic rates are low throughout the high-elevation mountains of Great Smoky Mountains National Park, indicating trees are under considerable stress, but are highest in trees growing on the highest, steepest slopes. Trees from Clingmans Dome have significantly higher maximum photosynthetic rates and water use efficiency than trees on other mountains, which may indicate stress resistance in this population. Additionally, both photosynthetic water use efficiency and leaf architecture are affected by maximum July temperature, suggesting future climate change will impact the foliar physiology of Fraser fir. Measurements of nonstructural carbohydrate pools are consistent with those found in mature trees of other species, which suggests a capacity for resistance to future stress events, particularly at the highest elevations where photosynthetic rates are highest.