3,451 research outputs found

    Leveraging OpenStack and Ceph for a Controlled-Access Data Cloud

    Full text link
    While traditional HPC has and continues to satisfy most workflows, a new generation of researchers has emerged looking for sophisticated, scalable, on-demand, and self-service control of compute infrastructure in a cloud-like environment. Many also seek safe harbors to operate on or store sensitive and/or controlled-access data in a high capacity environment. To cater to these modern users, the Minnesota Supercomputing Institute designed and deployed Stratus, a locally-hosted cloud environment powered by the OpenStack platform, and backed by Ceph storage. The subscription-based service complements existing HPC systems by satisfying the following unmet needs of our users: a) on-demand availability of compute resources, b) long-running jobs (i.e., >30> 30 days), c) container-based computing with Docker, and d) adequate security controls to comply with controlled-access data requirements. This document provides an in-depth look at the design of Stratus with respect to security and compliance with the NIH's controlled-access data policy. Emphasis is placed on lessons learned while integrating OpenStack and Ceph features into a so-called "walled garden", and how those technologies influenced the security design. Many features of Stratus, including tiered secure storage with the introduction of a controlled-access data "cache", fault-tolerant live-migrations, and fully integrated two-factor authentication, depend on recent OpenStack and Ceph features.Comment: 7 pages, 5 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

    Containerization in Cloud Computing: performance analysis of virtualization architectures

    Get PDF
    La crescente adozione del cloud è fortemente influenzata dall’emergere di tecnologie che mirano a migliorare i processi di sviluppo e deployment di applicazioni di livello enterprise. L’obiettivo di questa tesi è analizzare una di queste soluzioni, chiamata “containerization” e di valutare nel dettaglio come questa tecnologia possa essere adottata in infrastrutture cloud in alternativa a soluzioni complementari come le macchine virtuali. Fino ad oggi, il modello tradizionale “virtual machine” è stata la soluzione predominante nel mercato. L’importante differenza architetturale che i container offrono ha portato questa tecnologia ad una rapida adozione poichè migliora di molto la gestione delle risorse, la loro condivisione e garantisce significativi miglioramenti in termini di provisioning delle singole istanze. Nella tesi, verrà esaminata la “containerization” sia dal punto di vista infrastrutturale che applicativo. Per quanto riguarda il primo aspetto, verranno analizzate le performances confrontando LXD, Docker e KVM, come hypervisor dell’infrastruttura cloud OpenStack, mentre il secondo punto concerne lo sviluppo di applicazioni di livello enterprise che devono essere installate su un insieme di server distribuiti. In tal caso, abbiamo bisogno di servizi di alto livello, come l’orchestrazione. Pertanto, verranno confrontate le performances delle seguenti soluzioni: Kubernetes, Docker Swarm, Apache Mesos e Cattle

    Performance evaluation of data-centric workloads in serverless environments

    Get PDF
    Serverless computing is a cloud-based execution paradigm that allows provisioning resources on-demand, freeing developers from infrastructure management and operational concerns. It typically involves deploying workloads as stateless functions that take no resources when not in use, and is meant to scale transparently. To make serverless effective, providers impose limits on a per-function level, such as maximum duration, fixed amount of memory, and no persistent local storage. These constraints make it challenging for data-intensive workloads to take advantage of serverless because they lead to sharing significant amounts of data through remote storage. In this paper, we build a performance model for serverless workloads that considers how data is shared between functions, including the amount of data and the underlying technology that is being used. The model's accuracy is assessed by running a real workload in a cluster using Knative, a state-of-the-art serverless environment, showing a relative error of 5.52%. With the proposed model, we evaluate the performance of data-intensive workloads in serverless, analyzing parallelism, scalability, resource requirements, and scheduling policies. We also explore possible solutions for the data-sharing problem, like using local memory and storage. Our results show that the performance of data-intensive workloads in serverless can be up to 4.32= faster depending on how these are deployed.This work was partially supported by the Ministry of Economy of Spain under contract TIN2015-65316-P, the Ministry of Science under contract PID2019-107255GB-C21/AEI/10.13039/501100011033, and the Generalitat de Catalunya under contract 2014SGR1051.Peer ReviewedPostprint (author's final draft

    Workload characterization of JVM languages

    Get PDF
    Being developed with a single language in mind, namely Java, the Java Virtual Machine (JVM) nowadays is targeted by numerous programming languages. Automatic memory management, Just-In-Time (JIT) compilation, and adaptive optimizations provided by the JVM make it an attractive target for different language implementations. Even though being targeted by so many languages, the JVM has been tuned with respect to characteristics of Java programs only -- different heuristics for the garbage collector or compiler optimizations are focused more on Java programs. In this dissertation, we aim at contributing to the understanding of the workloads imposed on the JVM by both dynamically-typed and statically-typed JVM languages. We introduce a new set of dynamic metrics and an easy-to-use toolchain for collecting the latter. We apply our toolchain to applications written in six JVM languages -- Java, Scala, Clojure, Jython, JRuby, and JavaScript. We identify differences and commonalities between the examined languages and discuss their implications. Moreover, we have a close look at one of the most efficient compiler optimizations - method inlining. We present the decision tree of the HotSpot JVM's JIT compiler and analyze how well the JVM performs in inlining the workloads written in different JVM languages

    Self-adaptive Grid Resource Monitoring and discovery

    Get PDF
    The Grid provides a novel platform where the scientific and engineering communities can share data and computation across multiple administrative domains. There are several key services that must be offered by Grid middleware; one of them being the Grid Information Service( GIS). A GIS is a Grid middleware component which maintains information about hardware, software, services and people participating in a virtual organisation( VO). There is an inherent need in these systems for the delivery of reliable performance. This thesis describes a number of approaches which detail the development and application of a suite of benchmarks for the prediction of the process of resource discovery and monitoring on the Grid. A series of experimental studies of the characterisation of performance using benchmarking, are carried out. Several novel predictive algorithms are presented and evaluated in terms of their predictive error. Furthermore, predictive methods are developed which describe the behaviour of MDS2 for a variable number of user requests. The MDS is also extended to include job information from a local scheduler; this information is queried using requests of greatly varying complexity. The response of the MDS to these queries is then assessed in terms of several performance metrics. The benchmarking of the dynamic nature of information within MDS3 which is based on the Open Grid Services Architecture (OGSA), and also the successor to MDS2, is also carried out. The performance of both the pull and push query mechanisms is analysed. GridAdapt (Self-adaptive Grid Resource Monitoring) is a new system that is proposed, built upon the Globus MDS3 benchmarking. It offers self-adaptation, autonomy and admission control at the Index Service, whilst ensuring that the MIDS is not overloaded and can meet its quality-of-service,f or example,i n terms of its average response time for servicing synchronous queries and the total number of queries returned per unit time

    CATS: linearizability and partition tolerance in scalable and self-organizing key-value stores

    Get PDF
    Distributed key-value stores provide scalable, fault-tolerant, and self-organizing storage services, but fall short of guaranteeing linearizable consistency in partially synchronous, lossy, partitionable, and dynamic networks, when data is distributed and replicated automatically by the principle of consistent hashing. This paper introduces consistent quorums as a solution for achieving atomic consistency. We present the design and implementation of CATS, a distributed key-value store which uses consistent quorums to guarantee linearizability and partition tolerance in such adverse and dynamic network conditions. CATS is scalable, elastic, and self-organizing; key properties for modern cloud storage middleware. Our system shows that consistency can be achieved with practical performance and modest throughput overhead (5%) for read-intensive workloads

    DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

    Full text link
    The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at a scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both data sets and algorithmic components and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. The execution in DALiuGE is data-activated, where each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from less than ten tasks running on a laptop to tens of millions of concurrent tasks on the second fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph; and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. In a companion paper, we provide in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities.Comment: 31 pages, 12 figures, currently under review by Astronomy and Computin

    Workflow environments for advanced cyberinfrastructure platforms

    Get PDF
    Progress in science is deeply bound to the effective use of high-performance computing infrastructures and to the efficient extraction of knowledge from vast amounts of data. Such data comes from different sources that follow a cycle composed of pre-processing steps for data curation and preparation for subsequent computing steps, and later analysis and analytics steps applied to the results. However, scientific workflows are currently fragmented in multiple components, with different processes for computing and data management, and with gaps in the viewpoints of the user profiles involved. Our vision is that future workflow environments and tools for the development of scientific workflows should follow a holistic approach, where both data and computing are integrated in a single flow built on simple, high-level interfaces. The topics of research that we propose involve novel ways to express the workflows that integrate the different data and compute processes, dynamic runtimes to support the execution of the workflows in complex and heterogeneous computing infrastructures in an efficient way, both in terms of performance and energy. These infrastructures include highly distributed resources, from sensors and instruments, and devices in the edge, to High-Performance Computing and Cloud computing resources. This paper presents our vision to develop these workflow environments and also the steps we are currently following to achieve it.This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contract 2014-SGR-1051). Javier Conejero postdoctoral contract is co-financed by the Ministry of Economy and Competitiveness under Juan de la Cierva Formacion´ postdoctoral fellowship number FJCI-2015-24651. This work is supported by the H2020 mF2C project (730929) and the CLASS project (780622). The participation of Rosa M Badia in the BDEC2 meetings is supported by the EXDCI project (800957). The dislib library developments are partially funded under the project agreement between BSC and FUJITSU.Peer ReviewedPostprint (author's final draft
    corecore