131 research outputs found

    HEP-Frame: A software engineered framework to aid the development and efficient multicore execution of scientific code

    This communication presents an evolutionary software prototype of a user-centered Highly Efficient Pipelined Framework, HEP-Frame, to aid the development of sustainable parallel scientific code with a flexible pipeline structure. HEP-Frame is the result of a tight collaboration between computational scientists and software engineers: it aims to improve scientists' coding productivity while ensuring efficient parallel execution on a wide set of multicore systems, with both HPC and HTC techniques. The current prototype complies with the requirements of an actual scientific code, includes desirable sustainability features and supports, at compile time, additional plugin interfaces for other scientific fields. Porting and development productivity were assessed, and preliminary efficiency results are promising. This work was supported by FCT (Fundação para a Ciência e Tecnologia) within Project Scope (UID/CEC/00319/2013), by LIP (Laboratório de Instrumentação e Física Experimental de Partículas) and by Project Search-ON2 (NORTE-07-0162-FEDER-000086), co-funded by the North Portugal Regional Operational Programme (ON.2 - O Novo Norte), under the National Strategic Reference Framework, through the European Regional Development Fund.
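
    As a language-neutral illustration of the pipelined structure the abstract describes (HEP-Frame itself is a C++ framework, and none of the names below are its real API), the following Python sketch shows a pipeline of pluggable stages applied to each event, with a pool of workers providing parallel execution:

        # Minimal sketch of a pipelined event-processing framework; illustrative
        # only, not HEP-Frame's actual API. Each stage filters or transforms an
        # event; events are processed concurrently on a pool of workers.
        from concurrent.futures import ThreadPoolExecutor

        class Pipeline:
            def __init__(self, stages):
                self.stages = stages  # ordered callables: event -> event or None

            def process(self, event):
                for stage in self.stages:
                    event = stage(event)
                    if event is None:        # stage rejected the event
                        return None
                return event

            def run(self, events, workers=4):
                # A ProcessPoolExecutor would give true multicore in CPython;
                # a thread pool keeps the sketch short.
                with ThreadPoolExecutor(max_workers=workers) as pool:
                    results = list(pool.map(self.process, events))
                return [e for e in results if e is not None]

        # Hypothetical analysis stages
        def select_high_energy(event):
            return event if event["energy"] > 50.0 else None

        def compute_mass(event):
            event["mass"] = 0.5 * event["energy"]   # placeholder arithmetic
            return event

        if __name__ == "__main__":
            pipeline = Pipeline([select_high_energy, compute_mass])
            events = [{"energy": e} for e in (10.0, 60.0, 120.0)]
            print(pipeline.run(events))  # only the two high-energy events survive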

    Meta-scheduling Issues in Interoperable HPCs, Grids and Clouds

    Over recent years, interoperability among resources has emerged as one of the most challenging research topics. Yet the underlying complexity of the architectures (e.g., heterogeneity) and the goals that each computational paradigm, including HPC, grids and clouds, aims to achieve (e.g., flexibility) remain the same: to efficiently orchestrate resources in a distributed fashion by bridging the gap between local and remote participants. This is closely related to scheduling, one of the most important concerns in designing a cooperative resource management system, especially in large-scale settings such as grids and clouds. Within this context, meta-scheduling offers additional functionality for interoperable resource management because of its agility in handling sudden variations and dynamic situations in user demand. Accordingly, for inter-infrastructures, including InterCloud, a decentralised meta-scheduling scheme can overcome issues such as consolidated administrative management, bottlenecks and exposure of local information. In this work, we detail the fundamental issues in developing an effective interoperable meta-scheduler for e-infrastructures in general and InterCloud in particular. Finally, we describe a simulation and experimental configuration based on real grid workload traces to demonstrate the interoperable setting, and provide experimental results as part of a strategic plan for integrating future meta-schedulers.
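
    The decentralised scheme the abstract argues for can be pictured with a toy model (an assumption for illustration, not the paper's implementation) in which each site forwards an incoming job to whichever known peer currently reports the shortest queue, so no central coordinator is required:

        # Toy decentralised meta-scheduling model: sites exchange queue lengths
        # and forward jobs to the least-loaded participant. Illustrative only.
        import random

        class Site:
            def __init__(self, name):
                self.name = name
                self.queue = []
                self.peers = []          # other Site objects this site can see

            def load(self):
                return len(self.queue)

            def submit(self, job):
                # Consider this site plus its peers; place on the least loaded.
                target = min([self] + self.peers, key=Site.load)
                target.queue.append(job)
                return target.name

        if __name__ == "__main__":
            sites = [Site(f"site-{i}") for i in range(3)]
            for s in sites:
                s.peers = [p for p in sites if p is not s]
            for j in range(10):
                entry = random.choice(sites)   # jobs arrive at arbitrary sites
                placed = entry.submit(f"job-{j}")
                print(f"job-{j} entered at {entry.name}, placed on {placed}")
            print({s.name: s.load() for s in sites})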

    Workshop Report: Campus Bridging: Reducing Obstacles on the Path to Big Answers 2015

    For researchers whose experiments require large-scale cyberinfrastructure, significant challenges stand in the way of successful completion. These challenges are broad and go far beyond the simple issue that there are not enough large-scale resources available; these solvable issues range from a lack of documentation written for a non-technical audience to a need for greater consistency in system configuration, and in software configuration and availability, on the large-scale resources at national-tier supercomputing centers, among a number of other challenges. Campus Bridging is a relatively young discipline that aims to mitigate these issues for the academic end user, for whom the entire process can feel like a path composed entirely of obstacles. The solutions to these problems must by necessity include multiple approaches, focused not only on the end user but also on the system administrators responsible for supporting these resources, and on the systems themselves. These resources include not only those at the supercomputing centers but also those at the campus or departmental level, and even the personal computing devices researchers use to complete their work. This workshop report compiles the results of a half-day workshop held in conjunction with IEEE Cluster 2015 in Chicago, IL. NSF XSEDE

    DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge

    The Data Activated Liu Graph Engine (DALiuGE) is an execution framework for processing large astronomical datasets at the scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines, consisting of both data sets and algorithmic components, and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. Execution in DALiuGE is data-activated: each individual data item autonomously triggers its own processing. This decentralisation also makes the execution framework highly scalable and flexible, supporting pipeline sizes ranging from fewer than ten tasks running on a laptop to tens of millions of concurrent tasks on the second-fastest supercomputer in the world. DALiuGE has been used in production for reducing interferometry data sets from the Karl G. Jansky Very Large Array and the Mingantu Ultrawide Spectral Radioheliograph, and is being developed as the execution framework prototype for the Science Data Processor (SDP) consortium of the Square Kilometre Array (SKA) telescope. This paper presents a technical overview of DALiuGE and discusses case studies from the CHILES and MUSER projects that use DALiuGE to execute production pipelines. A companion paper provides an in-depth analysis of DALiuGE's scalability to very large numbers of tasks on two supercomputing facilities. Comment: 31 pages, 12 figures, currently under review by Astronomy and Computing
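
    The data-activated idea can be sketched as follows (hypothetical names, not DALiuGE's real API): a data "drop" notifies its consumer tasks when it becomes complete, and a task fires as soon as all of its inputs exist, so execution is driven by data availability rather than by a central scheduler:

        # Sketch of data-activated execution: completing a DataDrop triggers the
        # tasks that consume it. Illustrative only, not DALiuGE's actual API.
        class DataDrop:
            def __init__(self, name):
                self.name = name
                self.value = None
                self.consumers = []

            def complete(self, value):
                self.value = value
                for task in self.consumers:  # the data item triggers processing
                    task.input_ready(self)

        class TaskDrop:
            def __init__(self, func, inputs, output):
                self.func, self.output = func, output
                self.pending = {d.name for d in inputs}
                self.values = {}
                for d in inputs:
                    d.consumers.append(self)

            def input_ready(self, drop):
                self.pending.discard(drop.name)
                self.values[drop.name] = drop.value
                if not self.pending:     # all inputs present: run and publish
                    self.output.complete(self.func(self.values))

        if __name__ == "__main__":
            raw, cal, img = DataDrop("raw"), DataDrop("cal"), DataDrop("img")
            TaskDrop(lambda v: [x * 2 for x in v["raw"]], [raw], cal)
            TaskDrop(lambda v: sum(v["cal"]), [cal], img)
            raw.complete([1, 2, 3])      # completing the data starts the pipeline
            print(img.value)             # -> 12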

    Exploring Distributed HPC Scheduling in MATRIX

    Efficiently scheduling large numbers of jobs over large-scale distributed systems is critical for achieving high system utilization and throughput. Today's state-of-the-art job schedulers mostly follow a centralized master/slave architecture, which cannot scale efficiently even to petascale and is vulnerable to a single point of failure. These limitations are addressed by the distributed job management system MATRIX (MAny-Task computing execution fabRIc at eXascale), which adopts a work-stealing algorithm to balance load throughout the distributed system. MATRIX currently supports Many-Task Computing (MTC) workloads. This project aims to extend MATRIX to also support High Performance Computing (HPC) workloads: long-running jobs that require multiple nodes/cores. Supporting HPC on a framework built for MTC jobs is a challenge, since the framework is focused on efficiently scheduling sub-second jobs on available workers; the design for scheduling HPC jobs must be efficient enough not to hamper the handling of MTC tasks.
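
    Work stealing, the load-balancing technique MATRIX builds on, can be sketched in a few lines (an illustrative model, not MATRIX's actual protocol): each worker drains its own queue and, when idle, steals a task from the busiest peer. Extending such a scheme to HPC jobs adds the complication that a job must wait until enough workers are free at once.

        # Minimal work-stealing sketch: idle workers steal from the busiest
        # peer's tail. Illustrative only, not MATRIX's real implementation.
        import collections

        class Worker:
            def __init__(self, name, tasks):
                self.name = name
                self.deque = collections.deque(tasks)

            def next_task(self, peers):
                if self.deque:
                    return self.deque.popleft()      # local work first
                victim = max(peers, key=lambda w: len(w.deque))
                if victim.deque:
                    stolen = victim.deque.pop()      # steal from the tail
                    print(f"{self.name} stole {stolen} from {victim.name}")
                    return stolen
                return None

        if __name__ == "__main__":
            workers = [Worker("w0", [f"t{i}" for i in range(6)]),
                       Worker("w1", [])]
            done, progress = [], True
            while progress:
                progress = False
                for w in workers:
                    task = w.next_task([p for p in workers if p is not w])
                    if task:
                        done.append((w.name, task))
                        progress = True
            print(done)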

    Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing

    Scientific applications generally imply a variable and unpredictable computational workload that institutions must address by dynamically adjusting the allocation of resources to their different computational needs. Scientific applications may require high capacity, e.g. the concurrent use of computational resources for processing several independent jobs (High Throughput Computing or HTC), or high capability, by means of high-performance resources for solving complex problems (High Performance Computing or HPC). The computational resources required by this type of application usually carry a very high cost that may exceed the availability of the institution's resources, or the resources may not be well adapted to the scientific applications, especially in the case of infrastructures prepared for the execution of HPC applications. Indeed, it is possible that the different parts that compose an application require different types of computational resources. Nowadays, cloud service platforms have become an efficient solution to meet the needs of HTC applications, as they provide a wide range of computing resources accessible on demand. For this reason, the number of hybrid computational infrastructures has increased in recent years; these combine infrastructures hosted on cloud platforms with the computational resources hosted at the institutions themselves, named on-premise infrastructures. As scientific applications can be processed on different infrastructures, application delivery has become a key issue. Containers are probably the most popular technology for application delivery, as they ease reproducibility, traceability, versioning, isolation, and portability. The main objective of this thesis is to provide an architecture and a set of services to build hybrid, elastic processing infrastructures that fit the needs of different workloads. Hence, the thesis considers aspects such as elasticity and federation: vertical and horizontal elasticity are exploited by developing a proof of concept that provides vertical elasticity on top of an elastic cloud architecture for data analytics. Afterwards, an elastic cloud architecture comprising heterogeneous computational resources was implemented for medical imaging processing, using multiple processing queues for jobs with different requirements. The development of this architecture was framed in a collaboration with the company QUIBIM.
    In the last part of the thesis, the previous work evolved into the design and implementation of an elastic, multi-site and multi-tenant cloud architecture for medical image processing, within the framework of the European project PRIMAGE. This architecture uses distributed storage and integrates external services for authentication and authorization based on OpenID Connect (OIDC). The tool kube-authorizer was developed to provide access control to the resources of the processing infrastructure automatically, using the information obtained in the authentication process to create the corresponding policies and roles. Finally, another tool, hpc-connector, was developed to enable the integration of HPC processing infrastructures into cloud infrastructures without requiring modifications to either the cloud or the HPC infrastructure. It should be noted that, during this thesis, contributions were made to different open-source container and job management technologies by developing open-source tools and components, and by implementing recipes for the automated configuration of the designed architectures from a DevOps perspective. The results obtained support the feasibility of combining vertical and horizontal elasticity to implement deadline-based QoS policies, as well as the feasibility of the federated authentication model for combining public and on-premise clouds. López Huguet, S. (2021). Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172327
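
    The hpc-connector pattern described above (forwarding cloud jobs to an unmodified HPC batch system) can be sketched roughly as follows; the hostname, flow and polling logic are assumptions for illustration rather than the tool's actual code, though Slurm's sbatch and squeue commands do behave as shown:

        # Hedged sketch of a cloud-to-HPC gateway: submit a batch script over
        # SSH to a Slurm front end and poll until the job leaves the queue.
        # Hostname and flow are illustrative, not hpc-connector's real code.
        import subprocess, time

        HPC_LOGIN = "user@hpc-frontend.example.org"   # hypothetical login node

        def submit_to_hpc(script_path):
            # --parsable makes sbatch print just the job id.
            out = subprocess.run(
                ["ssh", HPC_LOGIN, "sbatch", "--parsable", script_path],
                capture_output=True, text=True, check=True)
            return out.stdout.strip()

        def job_finished(job_id):
            # squeue prints nothing for a job no longer in the queue.
            out = subprocess.run(
                ["ssh", HPC_LOGIN, "squeue", "-h", "-j", job_id],
                capture_output=True, text=True)
            return out.stdout.strip() == ""

        def run_job(script_path, poll_seconds=30):
            job_id = submit_to_hpc(script_path)
            while not job_finished(job_id):
                time.sleep(poll_seconds)   # the cloud side only polls over SSH
            return job_id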

    CompF2: Theoretical Calculations and Simulation Topical Group Report

    This report summarizes the work of the Computational Frontier topical group on theoretical calculations and simulation for Snowmass 2021. We discuss the challenges, potential solutions, and needs facing six diverse but related topical areas that span the subject of theoretical calculations and simulation in high energy physics (HEP): cosmic calculations, particle accelerator modeling, detector simulation, event generators, perturbative calculations, and lattice QCD (quantum chromodynamics). The challenges arise from the next generations of HEP experiments, which will include more complex instruments, produce larger data volumes, and perform more precise measurements. Calculations and simulations will need to keep up with these increased requirements. The other aspect of the challenge is the evolution of the computing landscape away from general-purpose computing on CPUs and toward special-purpose accelerators and coprocessors such as GPUs and FPGAs. These newer devices can provide substantial improvements for certain categories of algorithms, at the expense of more specialized programming and memory and data access patterns. Comment: Report of the Computational Frontier Topical Group on Theoretical Calculations and Simulation for Snowmass 2021

    Virtual Organization Clusters: Self-Provisioned Clouds on the Grid

    Virtual Organization Clusters (VOCs) provide a novel architecture for overlaying dedicated cluster systems on existing grid infrastructures. VOCs provide customized, homogeneous execution environments on a per-Virtual-Organization basis, without the cost of physical cluster construction or the overhead of per-job containers. Administrative access and overlay network capabilities are granted to Virtual Organizations (VOs) that choose to implement VOC technology, while the system remains completely transparent to end users and non-participating VOs. Unlike alternative systems that require explicit leases, VOCs are autonomically self-provisioned according to configurable usage policies. As a grid computing architecture, VOCs are designed to be technology agnostic and are implementable by any combination of software and services that follows the Virtual Organization Cluster Model. As demonstrated through simulation testing and evaluation of an implemented prototype, VOCs are a viable mechanism for increasing end-user job compatibility on grid sites. On existing production grids, where jobs are frequently submitted to a small subset of sites and thus experience high queuing delays relative to average job length, the grid-wide addition of VOCs does not adversely affect mean job sojourn time. By load-balancing jobs among grid sites, VOCs can reduce the total amount of queuing on a grid to a level sufficient to counteract the performance overhead introduced by virtualization.
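
    The self-provisioning behaviour can be illustrated with a small policy loop (an assumption-labelled sketch, not the actual VOC implementation): the virtual cluster is sized to the VO's pending jobs under a configurable cap, growing and shrinking without explicit leases:

        # Illustrative autonomic provisioning policy: size the virtual cluster
        # to pending demand, bounded by a configurable maximum. Not VOC code.
        from dataclasses import dataclass

        @dataclass
        class Policy:
            max_nodes: int = 16      # configurable usage policy
            jobs_per_node: int = 1

        def desired_nodes(pending_jobs, policy):
            want = -(-pending_jobs // policy.jobs_per_node)  # ceiling division
            return min(policy.max_nodes, max(want, 0))

        def reconcile(pending_jobs, running_nodes, policy):
            target = desired_nodes(pending_jobs, policy)
            if target > running_nodes:
                return f"boot {target - running_nodes} VM(s)"      # grow
            if target < running_nodes:
                return f"retire {running_nodes - target} VM(s)"    # shrink
            return "steady"

        if __name__ == "__main__":
            policy = Policy(max_nodes=8)
            for pending, running in [(0, 0), (5, 0), (5, 5), (20, 5), (0, 8)]:
                print((pending, running), "->", reconcile(pending, running, policy))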