104 research outputs found

    Executing Large Scale Scientific Workflows in Public Clouds

    Get PDF
    Scientists in different fields, such as high-energy physics, earth science, and astronomy are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows in each of which hundreds or thousands of jobs exist with precedence constraints, the execution of such a workflow ensemble makes a great concern with cost even using elastic and pay-as-you-go cloud resources. In this thesis, we develop a set of methods to optimize the execution of large-scale scientific workflows in public clouds with both cost and deadline constraints with a two-step approach. Firstly, we present a set of methods to optimize the execution of scientific workflow in public clouds, with the Montage astronomical mosaic engine running on Amazon EC2 as an example. Secondly, we address three main challenges in realizing benefits of using public clouds when executing large-scale workflow ensembles: (1) execution coordination, (2) resource provisioning, and (3) data staging. To this end, we develop a new pulling-based workflow execution system with a profiling-based resource provisioning strategy. Our results show that our solution system can achieve 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. Besides, our evaluation using Montage workflow ensembles on around 1000-core Amazon EC2 clusters has demonstrated the efficacy of our resource provisioning strategy in terms of cost effectiveness within deadline

    Service-oriented models for audiovisual content storage

    No full text
    What are the important topics to understand if involved with storage services to hold digital audiovisual content? This report takes a look at how content is created and moves into and out of storage; the storage service value networks and architectures found now and expected in the future; what sort of data transfer is expected to and from an audiovisual archive; what transfer protocols to use; and a summary of security and interface issues

    Distributed Shared Memory based Live VM Migration

    Get PDF
    Cloud computing is the new trend in computing services and IT industry, this computing paradigm has numerous benefits to utilize IT infrastructure resources and reduce services cost. The key feature of cloud computing depends on mobility and scalability of the computing resources, by managing virtual machines. The virtualization decouples the software from the hardware and manages the software and hardware resources in an easy way without interruption of services. Live virtual machine migration is an essential tool for dynamic resource management in current data centers. Live virtual machine is defined as the process of moving a running virtual machine or application between different physical machines without disconnecting the client or application. Many techniques have been developed to achieve this goal based on several metrics (total migration time, downtime, size of data sent and application performance) that are used to measure the performance of live migration. These metrics measure the quality of the VM services that clients care about, because the main goal of clients is keeping the applications performance with minimum service interruption. The pre-copy live VM migration is done in four phases: preparation, iterative migration, stop and copy, and resume and commitment. During the preparation phase, the source and destination physical servers are selected, the resources in destination physical server are reserved, and the critical VM is selected to be migrated. The cloud manager responsibility is to make all of these decisions. VM state migration takes place and memory state is transferred to the target node during iterative migration phase. Meanwhile, the migrated VM continues to execute and dirties its memory. In the stop and copy phase, VM virtual CPU is stopped and then the processor and network states are transferred to the destination host. Service downtime results from stopping VM execution and moving the VM CPU and network states. Finally in the resume and commitment phase, the migrated VM is resumed running in the destination physical host, the remaining memory pages are pulled by destination machine from the source machine. The source machine resources are released and eliminated. In this thesis, pre-copy live VM migration using Distributed Shared Memory (DSM) computing model is proposed. The setup is built using two identical computation nodes to construct all the proposed environment services architecture namely the virtualization infrastructure (Xenserver6.2 hypervisor), the shared storage server (the network file system), and the DSM and High Performance Computing (HPC) cluster. The custom DSM framework is based on a low latency memory update named Grappa. Moreover, HPC cluster is used to parallelize the work load by using CPUs computation nodes. HPC cluster employs OPENMPI and MPI libraries to support parallelization and auto-parallelization. The DSM allows the cluster CPUs to access the same memory space pages resulting in less memory data updates, which reduces the amount of data transferred through the network. The thesis proposed model achieves a good enhancement of the live VM migration metrics. Downtime is reduced by 50 % in the idle workload of Windows VM and 66.6% in case of Ubuntu Linux idle workload. In general, the proposed model not only reduces the downtime and the total amount of data sent, but also does not degrade other metrics like the total migration time and the applications performance

    Helmholtz Portfolio Theme Large-Scale Data Management and Analysis (LSDMA)

    Get PDF
    The Helmholtz Association funded the "Large-Scale Data Management and Analysis" portfolio theme from 2012-2016. Four Helmholtz centres, six universities and another research institution in Germany joined to enable data-intensive science by optimising data life cycles in selected scientific communities. In our Data Life cycle Labs, data experts performed joint R&D together with scientific communities. The Data Services Integration Team focused on generic solutions applied by several communities

    Towards Design and Analysis For High-Performance and Reliable SSDs

    Get PDF
    NAND Flash-based Solid State Disks have many attractive technical merits, such as low power consumption, light weight, shock resistance, sustainability of hotter operation regimes, and extraordinarily high performance for random read access, which makes SSDs immensely popular and be widely employed in different types of environments including portable devices, personal computers, large data centers, and distributed data systems. However, current SSDs still suffer from several critical inherent limitations, such as the inability of in-place-update, asymmetric read and write performance, slow garbage collection processes, limited endurance, and degraded write performance with the adoption of MLC and TLC techniques. To alleviate these limitations, we propose optimizations from both specific outside applications layer and SSDs\u27 internal layer. Since SSDs are good compromise between the performance and price, so SSDs are widely deployed as second layer caches sitting between DRAMs and hard disks to boost the system performance. Due to the special properties of SSDs such as the internal garbage collection processes and limited lifetime, traditional cache devices like DRAM and SRAM based optimizations might not work consistently for SSD-based cache. Therefore, for the outside applications layer, our work focus on integrating the special properties of SSDs into the optimizations of SSD caches. Moreover, our work also involves the alleviation of the increased Flash write latency and ECC complexity due to the adoption of MLC and TLC technologies by analyzing the real work workloads

    Network bandwidth aware dynamic automated framework for Virtual Machine Live Migration in cloud environments

    Get PDF
    Live migration is a very important feature of virtualisation, a running VM can be seamlessly moved between different physical hosts. The source VM’s CPU state, storage, memory and network resources can be completely moved to a target host without disrupting the users or running applications. Live VM migration is an extremely powerful tool in many key scenarios such as load balancing, online maintenance, proactive fault tolerance and power management. There are four steps involved in the live VM migration, the setup stage, memory transfer stage, VM storage transfer stage and the network clean up stage. The most important part of live VM migration is transferring the main memory state of the VM from the source to the destination host which can consume a significant amount of network bandwidth in a short period of time. Modern cloud based data centres generate a significant amount of network traffic apart from VM live migration traffic. If VM migration occurs during a peak time, VM migration and user traffic will compete for network bandwidth, then the data centre’s network may not have enough resources to support both VM migration and demands of application users, which would create a bottleneck in the network. Therefore, this research presents a centralised, bandwidth aware, dynamic, and automated framework for live VM migration in Cloud environments. The proposed framework adopted a heuristic approach, and it provides guaranteed bandwidth for VM live migration by controlling user traffic on the network while scheduling live VM migration in an efficient manner. The framework consists with two main components, The Central Controller and the Local Controller. The Local Controller is responsible for collecting resources usage data from VMs and PMs however the Central Controller makes global management decisions. The Central Controller is based on four algorithms which are called a migration policy. The migration policy contains the following algorithms: the host overloaded detection, host underloaded detection, VM selection and VM placement algorithms which are proposed in this research. The proposed migration policy has been implemented in CloudSim and evaluated against two benchmark migration policies in CloudSim. Five evaluation metrics have been used in the simulation to evaluate the performance of the proposed migration policy. The results reveal that the proposed migration policy outperformed the two benchmark policies

    Improving Data Management and Data Movement Efficiency in Hybrid Storage Systems

    Get PDF
    University of Minnesota Ph.D. dissertation.July 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 116 pages.In the big data era, large volumes of data being continuously generated drive the emergence of high performance large capacity storage systems. To reduce the total cost of ownership, storage systems are built in a more composite way with many different types of emerging storage technologies/devices including Storage Class Memory (SCM), Solid State Drives (SSD), Shingle Magnetic Recording (SMR), Hard Disk Drives (HDD), and even across off-premise cloud storage. To make better utilization of each type of storage, industries have provided multi-tier storage through dynamically placing hot data in the faster tiers and cold data in the slower tiers. Data movement happens between devices on one single device and as well as between devices connected via various networks. Toward improving data management and data movement efficiency in such hybrid storage systems, this work makes the following contributions: To bridge the giant semantic gap between applications and modern storage systems, passing a piece of tiny and useful information (I/O access hints) from upper layers to the block storage layer may greatly improve application performance or ease data management in heterogeneous storage systems. We present and develop a generic and flexible framework, called HintStor, to execute and evaluate various I/O access hints on heterogeneous storage systems with minor modifications to the kernel and applications. The design of HintStor contains a new application/user level interface, a file system plugin and a block storage data manager. With HintStor, storage systems composed of various storage devices can perform pre-devised data placement, space reallocation and data migration polices assisted by the added access hints. Each storage device/technology has its own unique price-performance tradeoffs and idiosyncrasies with respect to workload characteristics they prefer to support. To explore the internal access patterns and thus efficiently place data on storage systems with fully connected (i.e., data can move from one device to any other device instead of moving tier by tier) differential pools (each pool consists of storage devices of a particular type), we propose a chunk-level storage-aware workload analyzer framework, simplified as ChewAnalyzer. With ChewAnalzyer, the storage manager can adequately distribute and move the data chunks across different storage pools. To reduce the duplicate content transferred between local storage devices and devices in remote data centers, an inline Network Redundancy Elimination (NRE) process with Content-Defined Chunking (CDC) policy can obtain a higher Redundancy Elimination (RE) ratio but may suffer from a considerably higher computational requirement than fixed-size chunking. We build an inline NRE appliance which incorporates an improved FPGA based scheme to speed up CDC processing. To efficiently utilize the hardware resources, the whole NRE process is handled by a Virtualized NRE (VNRE) controller. The uniqueness of this VNRE that we developed lies in its ability to exploit the redundancy patterns of different TCP flows and customize the chunking process to achieve a higher RE ratio

    Contribution à la convergence d'infrastructure entre le calcul haute performance et le traitement de données à large échelle

    Get PDF
    The amount of produced data, either in the scientific community or the commercialworld, is constantly growing. The field of Big Data has emerged to handle largeamounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used for the execution of computeintensive workloads. However, the HPC community is also facing an increasingneed to process large amounts of data derived from high definition sensors andlarge physics apparati. The convergence of the two fields -HPC and Big Data- iscurrently taking place. In fact, the HPC community already uses Big Data tools,which are not always integrated correctly, especially at the level of the file systemand the Resource and Job Management System (RJMS).In order to understand how we can leverage HPC clusters for Big Data usage, andwhat are the challenges for the HPC infrastructures, we have studied multipleaspects of the convergence: We initially provide a survey on the software provisioning methods, with a focus on data-intensive applications. We contribute a newRJMS collaboration technique called BeBiDa which is based on 50 lines of codewhereas similar solutions use at least 1000 times more. We evaluate this mechanism on real conditions and in simulated environment with our simulator Batsim.Furthermore, we provide extensions to Batsim to support I/O, and showcase thedevelopments of a generic file system model along with a Big Data applicationmodel. This allows us to complement BeBiDa real conditions experiments withsimulations while enabling us to study file system dimensioning and trade-offs.All the experiments and analysis of this work have been done with reproducibilityin mind. Based on this experience, we propose to integrate the developmentworkflow and data analysis in the reproducibility mindset, and give feedback onour experiences with a list of best practices.RésuméLa quantité de données produites, que ce soit dans la communauté scientifiqueou commerciale, est en croissance constante. Le domaine du Big Data a émergéface au traitement de grandes quantités de données sur les infrastructures informatiques distribuées. Les infrastructures de calcul haute performance (HPC) sont traditionnellement utilisées pour l’exécution de charges de travail intensives en calcul. Cependant, la communauté HPC fait également face à un nombre croissant debesoin de traitement de grandes quantités de données dérivées de capteurs hautedéfinition et de grands appareils physique. La convergence des deux domaines-HPC et Big Data- est en cours. En fait, la communauté HPC utilise déjà des outilsBig Data, qui ne sont pas toujours correctement intégrés, en particulier au niveaudu système de fichiers ainsi que du système de gestion des ressources (RJMS).Afin de comprendre comment nous pouvons tirer parti des clusters HPC pourl’utilisation du Big Data, et quels sont les défis pour les infrastructures HPC, nousavons étudié plusieurs aspects de la convergence: nous avons d’abord proposé uneétude sur les méthodes de provisionnement logiciel, en mettant l’accent sur lesapplications utilisant beaucoup de données. Nous contribuons a l’état de l’art avecune nouvelle technique de collaboration entre RJMS appelée BeBiDa basée sur 50lignes de code alors que des solutions similaires en utilisent au moins 1000 fois plus.Nous évaluons ce mécanisme en conditions réelles et en environnement simuléavec notre simulateur Batsim. En outre, nous fournissons des extensions à Batsimpour prendre en charge les entrées/sorties et présentons le développements d’unmodèle de système de fichiers générique accompagné d’un modèle d’applicationBig Data. Cela nous permet de compléter les expériences en conditions réellesde BeBiDa en simulation tout en étudiant le dimensionnement et les différentscompromis autours des systèmes de fichiers.Toutes les expériences et analyses de ce travail ont été effectuées avec la reproductibilité à l’esprit. Sur la base de cette expérience, nous proposons d’intégrerle flux de travail du développement et de l’analyse des données dans l’esprit dela reproductibilité, et de donner un retour sur nos expériences avec une liste debonnes pratiques
    • …
    corecore