Search CORE

2,541 research outputs found

Survey and Analysis of Production Distributed Computing Infrastructures

Author: Jha Shantenu
Katz Daniel S.
Parashar Manish
Rana Omer
Weissman Jon
Publication venue
Publication date: 13/08/2012
Field of study

This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the ap- plications by representing the infrastructures

arXiv.org e-Print Archive

FigShare

funcX: A Federated Function Serving Fabric for Science

Author: Akkus I. E.
CMS
Forde J.
Fox G.
Hightower K.
Hindman B.
Malawski M.
Merkel D.
Spillner J.
Stubbs J.
Wang L.
Waterman D. G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/05/2020
Field of study

Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX---a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.0490

arXiv.org e-Print Archive

Crossref

Building Near-Real-Time Processing Pipelines with the Spark-MPI Platform

Author: Chaudhary Aashish
Cowan Matt
Hanwell Marcus
Jourdain Sebastien
Malitsky Nikolay
O'Leary Patrick
Van Dam Kerstin Kleese
Publication venue
Publication date: 12/05/2018
Field of study

Advances in detectors and computational technologies provide new opportunities for applied research and the fundamental sciences. Concurrently, dramatic increases in the three Vs (Volume, Velocity, and Variety) of experimental data and the scale of computational tasks produced the demand for new real-time processing systems at experimental facilities. Recently, this demand was addressed by the Spark-MPI approach connecting the Spark data-intensive platform with the MPI high-performance framework. In contrast with existing data management and analytics systems, Spark introduced a new middleware based on resilient distributed datasets (RDDs), which decoupled various data sources from high-level processing algorithms. The RDD middleware significantly advanced the scope of data-intensive applications, spreading from SQL queries to machine learning to graph processing. Spark-MPI further extended the Spark ecosystem with the MPI applications using the Process Management Interface. The paper explores this integrated platform within the context of online ptychographic and tomographic reconstruction pipelines.Comment: New York Scientific Data Summit, August 6-9, 201

arXiv.org e-Print Archive

Crossref

Scipedia

COEL: A Web-based Chemistry Simulation Framework

Author: Banda Peter
Blount Drew
Teuscher Christof
Publication venue
Publication date: 01/01/2014
Field of study

The chemical reaction network (CRN) is a widely used formalism to describe macroscopic behavior of chemical systems. Available tools for CRN modelling and simulation require local access, installation, and often involve local file storage, which is susceptible to loss, lacks searchable structure, and does not support concurrency. Furthermore, simulations are often single-threaded, and user interfaces are non-trivial to use. Therefore there are significant hurdles to conducting efficient and collaborative chemical research. In this paper, we introduce a new enterprise chemistry simulation framework, COEL, which addresses these issues. COEL is the first web-based framework of its kind. A visually pleasing and intuitive user interface, simulations that run on a large computational grid, reliable database storage, and transactional services make COEL ideal for collaborative research and education. COEL's most prominent features include ODE-based simulations of chemical reaction networks and multicompartment reaction networks, with rich options for user interactions with those networks. COEL provides DNA-strand displacement transformations and visualization (and is to our knowledge the first CRN framework to do so), GA optimization of rate constants, expression validation, an application-wide plotting engine, and SBML/Octave/Matlab export. We also present an overview of the underlying software and technologies employed and describe the main architectural decisions driving our development. COEL is available at http://coel-sim.org for selected research teams only. We plan to provide a part of COEL's functionality to the general public in the near future.Comment: 23 pages, 12 figures, 1 tabl

arXiv.org e-Print Archive

CiteSeerX

PDXScholar (Portland State University)

Open Repository and Bibliography - Luxembourg

Development of Grid e-Infrastructure in South-Eastern Europe

Author: A Balaž
Antun Balaž
B Novaković
Boro Jakimovski
C Ozturan
D Davidović
D Syrakov
Dušan Vudragović
Emanouil Atanassov
I Vidanović
I Vidanović
IE Stanković
Ioannis Liabotis
K Lagouvardos
Mihajlo Savić
MV Milovanović
Ognjen Prnjat
V Kotroni
Vladimir Slavnić
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/12/2011
Field of study

Over the period of 6 years and three phases, the SEE-GRID programme has established a strong regional human network in the area of distributed scientific computing and has set up a powerful regional Grid infrastructure. It attracted a number of user communities and applications from diverse fields from countries throughout the South-Eastern Europe. From the infrastructure point view, the first project phase has established a pilot Grid infrastructure with more than 20 resource centers in 11 countries. During the subsequent two phases of the project, the infrastructure has grown to currently 55 resource centers with more than 6600 CPUs and 750 TBs of disk storage, distributed in 16 participating countries. Inclusion of new resource centers to the existing infrastructure, as well as a support to new user communities, has demanded setup of regionally distributed core services, development of new monitoring and operational tools, and close collaboration of all partner institution in managing such a complex infrastructure. In this paper we give an overview of the development and current status of SEE-GRID regional infrastructure and describe its transition to the NGI-based Grid model in EGI, with the strong SEE regional collaboration.Comment: 22 pages, 12 figures, 4 table

arXiv.org e-Print Archive

Crossref

The Cloud Services Innovation Platform-Enabling Service-Based Environmental Modelling Using Infrastructure-As-A-Service Cloud Computing

Author: Ascough II James C.
Carlson J. R.
David O.
Green T. R.
Lloyd W.
Lyon J.
Rojas K. W.
Publication venue: UW Tacoma Digital Commons
Publication date: 01/01/2012
Field of study

Service oriented architectures allow modelling engines to be hosted over the Internet abstracting physical hardware configuration and software deployments from model users. Many existing environmental models are deployed as desktop applications running on user\u27s personal computers (PCs). Migration to service - based modelling centralizes the modelling functions to service hosts on the Internet . Users no longer require high-end PCs to run models and model updates encapsulating science advances can be disseminated more rapidly by hosting the modelling functions centrally via an Internet host instead of requiring software updates to user\u27s PCs . In this paper we present the Cloud Services Innovation Platform (CSIP), an Infrastructure -as -a -Service cloud application architecture , used to prototype development of distributed and scalable environmental modelling services. CSIP aims to provide modelling as a service to support both interactive (synchronous) and batch (asynchronous) modelling. CSIP enables c loud-based computing resources to be harnessed for both new and existing environmental models supporting the disaggregation of work into subtasks which execute in parallel using a scalable number of virtual machines. This paper presents CSIP \u27s implementation using the RUSLE2 model as a prototype model. RUSLE2 model service benchmarks are presented to demonstrate performance gains from using cloud resources. We also provide benchmarks for virtualization overhead observed using popular virtual machine hypervisors and demonstrate how application profile characteristics significantly impact performance when virtualized

University of Washington: UW Tacoma Digital Commons