4,919 research outputs found

    funcX: A Federated Function Serving Fabric for Science

    Full text link
    Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX---a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.0490

    Self-managing cloud-native applications : design, implementation and experience

    Get PDF
    Running applications in the cloud efficiently requires much more than deploying software in virtual machines. Cloud applications have to be continuously managed: (1) to adjust their resources to the incoming load and (2) to face transient failures replicating and restarting components to provide resiliency on unreliable infrastructure. Continuous management monitors application and infrastructural metrics to provide automated and responsive reactions to failures (health management) and changing environmental conditions (auto-scaling) minimizing human intervention. In the current practice, management functionalities are provided as infrastructural or third party services. In both cases they are external to the application deployment. We claim that this approach has intrinsic limits, namely that separating management functionalities from the application prevents them from naturally scaling with the application and requires additional management code and human intervention. Moreover, using infrastructure provider services for management functionalities results in vendor lock-in effectively preventing cloud applications to adapt and run on the most effective cloud for the job. In this paper we discuss the main characteristics of cloud native applications, propose a novel architecture that enables scalable and resilient self-managing applications in the cloud, and relate on our experience in porting a legacy application to the cloud applying cloud-native principles

    The Clarens Web Service Framework for Distributed Scientific Analysis in Grid Projects

    Get PDF
    Large scientific collaborations are moving towards service oriented architecutres for implementation and deployment of globally distributed systems. Clarens is a high performance, easy to deploy Web Service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the authors believe is important for building distributed systems based on Web Services that support scientific analysis

    The evolution of bits and bottlenecks in a scientific workflow trying to keep up with technology: Accelerating 4D image segmentation applied to nasa data

    Get PDF
    In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large scale earth science research. The Pacific Research Platform (PRP) and The Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to significantly decreased the earth science workflow's wall-clock time from 19.5 days to 53 minutes. The improvement in wall-clock time comes from the use of network appliances, improved image segmentation, deployment of a containerized workflow, and the increase in CI experience and training for the earth scientists. This paper presents a description of the evolving innovations used to improve the workflow, bottlenecks identified within each workflow version, and improvements made within each version of the workflow, over a three-year time period

    SGX-Aware Container Orchestration for Heterogeneous Clusters

    Full text link
    Containers are becoming the de facto standard to package and deploy applications and micro-services in the cloud. Several cloud providers (e.g., Amazon, Google, Microsoft) begin to offer native support on their infrastructure by integrating container orchestration tools within their cloud offering. At the same time, the security guarantees that containers offer to applications remain questionable. Customers still need to trust their cloud provider with respect to data and code integrity. The recent introduction by Intel of Software Guard Extensions (SGX) into the mass market offers an alternative to developers, who can now execute their code in a hardware-secured environment without trusting the cloud provider. This paper provides insights regarding the support of SGX inside Kubernetes, an industry-standard container orchestrator. We present our contributions across the whole stack supporting execution of SGX-enabled containers. We provide details regarding the architecture of the scheduler and its monitoring framework, the underlying operating system support and the required kernel driver extensions. We evaluate our complete implementation on a private cluster using the real-world Google Borg traces. Our experiments highlight the performance trade-offs that will be encountered when deploying SGX-enabled micro-services in the cloud.Comment: Presented in the 38th IEEE International Conference on Distributed Computing Systems (ICDCS 2018

    Distributed Handler Architecture

    Get PDF
    Thesis (PhD) - Indiana University, Computer Sciences, 2007Over the last couple of decades, distributed systems have been demonstrated an architectural evolvement based on models including client/server, multi-tier, distributed objects, messaging and peer-to-peer. One recent evolutionary step is Service Oriented Architecture (SOA), whose goal is to achieve loose-coupling among the interacting software applications for scalability and interoperability. The SOA model is engendered in Web Services, which provide software platforms to build applications as services and to create seamless and loosely-coupled interactions. Web Services utilize supportive functionalities such as security, reliability, monitoring, logging and so forth. These functionalities are typically provisioned as handlers, which incrementally add new capabilities to the services by building an execution chain. Even though handlers are very important to the service, the way of utilization is very crucial to attain the potential benefits. Every attempt to support a service with an additive functionality increases the chance of having an overwhelmingly crowded chain: this makes Web Service fat. Moreover, a handler may become a bottleneck because of having a comparably higher processing time. In this dissertation, we present Distributed Handler Architecture (DHArch) to provide an efficient, scalable and modular architecture to manage the execution of the handlers. The system distributes the handlers by utilizing a Message Oriented Middleware and orchestrates their execution in an efficient fashion. We also present an empirical evaluation of the system to demonstrate the suitability of this architecture to cope with the issues that exist in the conventional Web Service handler structures

    A gearbox model for processing large volumes of data by using pipeline systems encapsulated into virtual containers

    Get PDF
    Software pipelines enable organizations to chain applications for adding value to contents (e.g., confidentially, reliability, and integrity) before either sharing them with partners or sending them to the cloud. However, the pipeline components add overhead when processing large volumes of data, which can become critical in real-world scenarios. This paper presents a gearbox model for processing large volumes of data by using pipeline systems encapsulated into virtual containers. In this model, the gears represent applications, whereas gearboxes represent software pipelines. This model was implemented as a collaborative system that automatically performs Gear up (by using parallel patterns) and/or Gear down (by using in-memory storage) until all gears produce uniform data processing velocities. This model reduces delays and bottlenecks produced by the heterogeneous performance of applications included in software pipelines. The new container tool has been designed to encapsulate both the collaborative system and the software pipelines into a virtual container and deploy it on IT infrastructures. We conducted case studies to evaluate the performance of when processing medical images and PDF repositories. The incorporation of a capsule to a cloud storage service for pre-processing medical imagery was also studied. The experimental evaluation revealed the feasibility of applying the gearbox model to the deployment of software pipelines in real-world scenarios as it can significantly improve the end-user service experience when pre-processing large-scale data in comparison with state-of-the-art solutions such as Sacbe and Parsl.This work has been partially supported by the “Spanish Ministerio de Economia y Competitividad ” under the project grant TIN2016-79637-P “Towards Unification of HPC and Big Data paradigms”
    • …
    corecore