1,613 research outputs found

    Optimizing performance of workflow executions under authorization control

    Get PDF
    “Business processes or workflows are often used to model enterprise or scientific applications. It has received considerable attention to automate workflow executions on computing resources. However, many workflow scenarios still involve human activities and consist of a mixture of human tasks and computing tasks. Human involvement introduces security and authorization concerns, requiring restrictions on who is allowed to perform which tasks at what time. Role- Based Access Control (RBAC) is a popular authorization mechanism. In RBAC, the authorization concepts such as roles and permissions are defined, and various authorization constraints are supported, including separation of duty, temporal constraints, etc. Under RBAC, users are assigned to certain roles, while the roles are associated with prescribed permissions. When we assess resource capacities, or evaluate the performance of workflow executions on supporting platforms, it is often assumed that when a task is allocated to a resource, the resource will accept the task and start the execution once a processor becomes available. However, when the authorization policies are taken into account,” this assumption may not be true and the situation becomes more complex. For example, when a task arrives, a valid and activated role has to be assigned to a task before the task can start execution. The deployed authorization constraints may delay the workflow execution due to the roles’ availability, or other restrictions on the role assignments, which will consequently have negative impact on application performance. When the authorization constraints are present to restrict the workflow executions, it entails new research issues that have not been studied yet in conventional workflow management. This thesis aims to investigate these new research issues. First, it is important to know whether a feasible authorization solution can be found to enable the executions of all tasks in a workflow, i.e., check the feasibility of the deployed authorization constraints. This thesis studies the issue of the feasibility checking and models the feasibility checking problem as a constraints satisfaction problem. Second, it is useful to know when the performance of workflow executions will not be affected by the given authorization constraints. This thesis proposes the methods to determine the time durations when the given authorization constraints do not have impact. Third, when the authorization constraints do have the performance impact, how can we quantitatively analyse and determine the impact? When there are multiple choices to assign the roles to the tasks, will different choices lead to the different performance impact? If so, can we find an optimal way to conduct the task-role assignments so that the performance impact is minimized? This thesis proposes the method to analyze the delay caused by the authorization constraints if the workflow arrives beyond the non-impact time duration calculated above. Through the analysis of the delay, we realize that the authorization method, i.e., the method to select the roles to assign to the tasks affects the length of the delay caused by the authorization constraints. Based on this finding, we propose an optimal authorization method, called the Global Authorization Aware (GAA) method. Fourth, a key reason why authorization constraints may have impact on performance is because the authorization control directs the tasks to some particular roles. Then how to determine the level of workload directed to each role given a set of authorization constraints? This thesis conducts the theoretical analysis about how the authorization constraints direct the workload to the roles, and proposes the methods to calculate the arriving rate of the requests directed to each role under the role, temporal and cardinality constraints. Finally, the amount of resources allocated to support each individual role may have impact on the execution performance of the workflows. Therefore, it is desired to develop the strategies to determine the adequate amount of resources when the authorization control is present in the system. This thesis presents the methods to allocate the appropriate quantity for resources, including both human resources and computing resources. Different features of human resources and computing resources are taken into account. For human resources, the objective is to maximize the performance subject to the budgets to hire the human resources, while for computing resources, the strategy aims to allocate adequate amount of computing resources to meet the QoS requirements

    Flock together with CReATIVE-B: A roadmap of global research data infrastructures supporting biodiversity and ecosystem science

    Get PDF
    Biodiversity research infrastructures are providing the integrated data sets and support for studying scenarios of biodiversity and ecosystem dynamics. The CReATIVE-B project - Coordination of Research e-Infrastructures Activities Toward an International Virtual Environment for Biodiversity – explored how cooperation and interoperability of large-scale Research Infrastructures across the globe could support the challenges of biodiversity and ecosystem research. A key outcome of the project is that the research infrastructures agreed to continue cooperation after the end of the project to advance scientific progress in understanding and predicting the complexity of natural systems. By working together in implementing the recommendations in this Roadmap, the data and capabilities of the cooperating research infrastructures are better served to address the grand challenges for biodiversity and ecosystem scientists

    3rd EGEE User Forum

    Get PDF
    We have organized this book in a sequence of chapters, each chapter associated with an application or technical theme introduced by an overview of the contents, and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the important number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum

    Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing

    Full text link
    Tesis por compendio[ES] Las aplicaciones científicas implican generalmente una carga computacional variable y no predecible a la que las instituciones deben hacer frente variando dinámicamente la asignación de recursos en función de las distintas necesidades computacionales. Las aplicaciones científicas pueden necesitar grandes requisitos. Por ejemplo, una gran cantidad de recursos computacionales para el procesado de numerosos trabajos independientes (High Throughput Computing o HTC) o recursos de alto rendimiento para la resolución de un problema individual (High Performance Computing o HPC). Los recursos computacionales necesarios en este tipo de aplicaciones suelen acarrear un coste muy alto que puede exceder la disponibilidad de los recursos de la institución o estos pueden no adaptarse correctamente a las necesidades de las aplicaciones científicas, especialmente en el caso de infraestructuras preparadas para la ejecución de aplicaciones de HPC. De hecho, es posible que las diferentes partes de una aplicación necesiten distintos tipos de recursos computacionales. Actualmente las plataformas de servicios en la nube se han convertido en una solución eficiente para satisfacer la demanda de las aplicaciones HTC, ya que proporcionan un abanico de recursos computacionales accesibles bajo demanda. Por esta razón, se ha producido un incremento en la cantidad de clouds híbridos, los cuales son una combinación de infraestructuras alojadas en servicios en la nube y en las propias instituciones (on-premise). Dado que las aplicaciones pueden ser procesadas en distintas infraestructuras, actualmente la portabilidad de las aplicaciones se ha convertido en un aspecto clave. Probablemente, las tecnologías de contenedores son la tecnología más popular para la entrega de aplicaciones gracias a que permiten reproducibilidad, trazabilidad, versionado, aislamiento y portabilidad. El objetivo de la tesis es proporcionar una arquitectura y una serie de servicios para proveer infraestructuras elásticas híbridas de procesamiento que puedan dar respuesta a las diferentes cargas de trabajo. Para ello, se ha considerado la utilización de elasticidad vertical y horizontal desarrollando una prueba de concepto para proporcionar elasticidad vertical y se ha diseñado una arquitectura cloud elástica de procesamiento de Análisis de Datos. Después, se ha trabajo en una arquitectura cloud de recursos heterogéneos de procesamiento de imágenes médicas que proporciona distintas colas de procesamiento para trabajos con diferentes requisitos. Esta arquitectura ha estado enmarcada en una colaboración con la empresa QUIBIM. En la última parte de la tesis, se ha evolucionado esta arquitectura para diseñar e implementar un cloud elástico, multi-site y multi-tenant para el procesamiento de imágenes médicas en el marco del proyecto europeo PRIMAGE. Esta arquitectura utiliza un almacenamiento distribuido integrando servicios externos para la autenticación y la autorización basados en OpenID Connect (OIDC). Para ello, se ha desarrollado la herramienta kube-authorizer que, de manera automatizada y a partir de la información obtenida en el proceso de autenticación, proporciona el control de acceso a los recursos de la infraestructura de procesamiento mediante la creación de las políticas y roles. Finalmente, se ha desarrollado otra herramienta, hpc-connector, que permite la integración de infraestructuras de procesamiento HPC en infraestructuras cloud sin necesitar realizar cambios en la infraestructura HPC ni en la arquitectura cloud. Cabe destacar que, durante la realización de esta tesis, se han utilizado distintas tecnologías de gestión de trabajos y de contenedores de código abierto, se han desarrollado herramientas y componentes de código abierto y se han implementado recetas para la configuración automatizada de las distintas arquitecturas diseñadas desde la perspectiva DevOps.[CA] Les aplicacions científiques impliquen generalment una càrrega computacional variable i no predictible a què les institucions han de fer front variant dinàmicament l'assignació de recursos en funció de les diferents necessitats computacionals. Les aplicacions científiques poden necessitar grans requisits. Per exemple, una gran quantitat de recursos computacionals per al processament de nombrosos treballs independents (High Throughput Computing o HTC) o recursos d'alt rendiment per a la resolució d'un problema individual (High Performance Computing o HPC). Els recursos computacionals necessaris en aquest tipus d'aplicacions solen comportar un cost molt elevat que pot excedir la disponibilitat dels recursos de la institució o aquests poden no adaptar-se correctament a les necessitats de les aplicacions científiques, especialment en el cas d'infraestructures preparades per a l'avaluació d'aplicacions d'HPC. De fet, és possible que les diferents parts d'una aplicació necessiten diferents tipus de recursos computacionals. Actualment les plataformes de servicis al núvol han esdevingut una solució eficient per satisfer la demanda de les aplicacions HTC, ja que proporcionen un ventall de recursos computacionals accessibles a demanda. Per aquest motiu, s'ha produït un increment de la quantitat de clouds híbrids, els quals són una combinació d'infraestructures allotjades a servicis en el núvol i a les mateixes institucions (on-premise). Donat que les aplicacions poden ser processades en diferents infraestructures, actualment la portabilitat de les aplicacions s'ha convertit en un aspecte clau. Probablement, les tecnologies de contenidors són la tecnologia més popular per a l'entrega d'aplicacions gràcies al fet que permeten reproductibilitat, traçabilitat, versionat, aïllament i portabilitat. L'objectiu de la tesi és proporcionar una arquitectura i una sèrie de servicis per proveir infraestructures elàstiques híbrides de processament que puguen donar resposta a les diferents càrregues de treball. Per a això, s'ha considerat la utilització d'elasticitat vertical i horitzontal desenvolupant una prova de concepte per proporcionar elasticitat vertical i s'ha dissenyat una arquitectura cloud elàstica de processament d'Anàlisi de Dades. Després, s'ha treballat en una arquitectura cloud de recursos heterogenis de processament d'imatges mèdiques que proporciona distintes cues de processament per a treballs amb diferents requisits. Aquesta arquitectura ha estat emmarcada en una col·laboració amb l'empresa QUIBIM. En l'última part de la tesi, s'ha evolucionat aquesta arquitectura per dissenyar i implementar un cloud elàstic, multi-site i multi-tenant per al processament d'imatges mèdiques en el marc del projecte europeu PRIMAGE. Aquesta arquitectura utilitza un emmagatzemament integrant servicis externs per a l'autenticació i autorització basats en OpenID Connect (OIDC). Per a això, s'ha desenvolupat la ferramenta kube-authorizer que, de manera automatitzada i a partir de la informació obtinguda en el procés d'autenticació, proporciona el control d'accés als recursos de la infraestructura de processament mitjançant la creació de les polítiques i rols. Finalment, s'ha desenvolupat una altra ferramenta, hpc-connector, que permet la integració d'infraestructures de processament HPC en infraestructures cloud sense necessitat de realitzar canvis en la infraestructura HPC ni en l'arquitectura cloud. Es pot destacar que, durant la realització d'aquesta tesi, s'han utilitzat diferents tecnologies de gestió de treballs i de contenidors de codi obert, s'han desenvolupat ferramentes i components de codi obert, i s'han implementat receptes per a la configuració automatitzada de les distintes arquitectures dissenyades des de la perspectiva DevOps.[EN] Scientific applications generally imply a variable and an unpredictable computational workload that institutions must address by dynamically adjusting the allocation of resources to their different computational needs. Scientific applications could require a high capacity, e.g. the concurrent usage of computational resources for processing several independent jobs (High Throughput Computing or HTC) or a high capability by means of using high-performance resources for solving complex problems (High Performance Computing or HPC). The computational resources required in this type of applications usually have a very high cost that may exceed the availability of the institution's resources or they are may not be successfully adapted to the scientific applications, especially in the case of infrastructures prepared for the execution of HPC applications. Indeed, it is possible that the different parts that compose an application require different type of computational resources. Nowadays, cloud service platforms have become an efficient solution to meet the need of HTC applications as they provide a wide range of computing resources accessible on demand. For this reason, the number of hybrid computational infrastructures has increased during the last years. The hybrid computation infrastructures are the combination of infrastructures hosted in cloud platforms and the computation resources hosted in the institutions, which are named on-premise infrastructures. As scientific applications can be processed on different infrastructures, the application delivery has become a key issue. Nowadays, containers are probably the most popular technology for application delivery as they ease reproducibility, traceability, versioning, isolation, and portability. The main objective of this thesis is to provide an architecture and a set of services to build up hybrid processing infrastructures that fit the need of different workloads. Hence, the thesis considered aspects such as elasticity and federation. The use of vertical and horizontal elasticity by developing a proof of concept to provide vertical elasticity on top of an elastic cloud architecture for data analytics. Afterwards, an elastic cloud architecture comprising heterogeneous computational resources has been implemented for medical imaging processing using multiple processing queues for jobs with different requirements. The development of this architecture has been framed in a collaboration with a company called QUIBIM. In the last part of the thesis, the previous work has been evolved to design and implement an elastic, multi-site and multi-tenant cloud architecture for medical image processing has been designed in the framework of a European project PRIMAGE. This architecture uses a storage integrating external services for the authentication and authorization based on OpenID Connect (OIDC). The tool kube-authorizer has been developed to provide access control to the resources of the processing infrastructure in an automatic way from the information obtained in the authentication process, by creating policies and roles. Finally, another tool, hpc-connector, has been developed to enable the integration of HPC processing infrastructures into cloud infrastructures without requiring modifications in both infrastructures, cloud and HPC. It should be noted that, during the realization of this thesis, different contributions to open source container and job management technologies have been performed by developing open source tools and components and configuration recipes for the automated configuration of the different architectures designed from the DevOps perspective. The results obtained support the feasibility of the vertical elasticity combined with the horizontal elasticity to implement QoS policies based on a deadline, as well as the feasibility of the federated authentication model to combine public and on-premise clouds.López Huguet, S. (2021). Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172327TESISCompendi

    Simulation of the performance of complex data-intensive workflows

    Get PDF
    PhD ThesisRecently, cloud computing has been used for analytical and data-intensive processes as it offers many attractive features, including resource pooling, on-demand capability and rapid elasticity. Scientific workflows use these features to tackle the problems of complex data-intensive applications. Data-intensive workflows are composed of many tasks that may involve large input data sets and produce large amounts of data as output, which typically runs in highly dynamic environments. However, the resources should be allocated dynamically depending on the demand changes of the work flow, as over-provisioning increases the cost and under-provisioning causes Service Level Agreement (SLA) violation and poor Quality of Service (QoS). Performance prediction of complex workflows is a necessary step prior to the deployment of the workflow. Performance analysis of complex data-intensive workflows is a challenging task due to the complexity of their structure, diversity of big data, and data dependencies, in addition to the required examination to the performance and challenges associated with running their workflows in the real cloud. In this thesis, a solution is explored to address these challenges, using a Next Generation Sequencing (NGS) workflow pipeline as a case study, which may require hundreds/ thousands of CPU hours to process a terabyte of data. We propose a methodology to model, simulate and predict runtime and the number of resources used by the complex data-intensive workflows. One contribution of our simulation methodology is that it provides an ability to extract the simulation parameters (e.g., MIPs and BW values) that are required for constructing a training set and a fairly accurate prediction of the run time for input for cluster sizes much larger than ones used in training of the prediction model. The proposed methodology permits the derivation of run time prediction based on historical data from the provenance fi les. We present the run time prediction of the complex workflow by considering different cases of its running in the cloud such as execution failure and library deployment time. In case of failure, the framework can apply the prediction only partially considering the successful parts of the pipeline, in the other case the framework can predict with or without considering the time to deploy libraries. To further improve the accuracy of prediction, we propose a simulation model that handles I/O contention

    Scalable And Secure Provenance Querying For Scientific Workflows And Its Application In Autism Study

    Get PDF
    In the era of big data, scientific workflows have become essential to automate scientific experiments and guarantee repeatability. As both data and workflow increase in their scale, requirements for having a data lineage management system commensurate with the complexity of the workflow also become necessary, calling for new scalable storage, query, and analytics infrastructure. This system that manages and preserves the derivation history and morphosis of data, known as provenance system, is essential for maintaining quality and trustworthiness of data products and ensuring reproducibility of scientific discoveries. With a flurry of research and increased adoption of scientific workflows in processing sensitive data, i.e., health and medication domain, securing information flow and instrumenting access privileges in the system have become a fundamental precursor to deploying large-scale scientific workflows. That has become more important now since today team of scientists around the world can collaborate on experiments using globally distributed sensitive data sources. Hence, it has become imperative to augment scientific workflow systems as well as the underlying provenance management systems with data security protocols. Provenance systems, void of data security protocol, are susceptible to vulnerability. In this dissertation research, we delineate how scientific workflows can improve therapeutic practices in autism spectrum disorders. The data-intensive computation inherent in these workflows and sensitive nature of the data, necessitate support for scalable, parallel and robust provenance queries and secured view of data. With that in perspective, we propose OPQLPigOPQL^{Pig}, a parallel, robust, reliable and scalable provenance query language and introduce the concept of access privilege inheritance in the provenance systems. We characterize desirable properties of role-based access control protocol in scientific workflows and demonstrate how the qualities are integrated into the workflow provenance systems as well. Finally, we describe how these concepts fit within the DATAVIEW workflow management system

    Native structure-based modeling and simulation of biomolecular systems per mouse click

    Get PDF
    Background Molecular dynamics (MD) simulations provide valuable insight into biomolecular systems at the atomic level. Notwithstanding the ever-increasing power of high performance computers current MD simulations face several challenges: the fastest atomic movements require time steps of a few femtoseconds which are small compared to biomolecular relevant timescales of milliseconds or even seconds for large conformational motions. At the same time, scalability to a large number of cores is limited mostly due to long-range interactions. An appealing alternative to atomic-level simulations is coarse-graining the resolution of the system or reducing the complexity of the Hamiltonian to improve sampling while decreasing computational costs. Native structure-based models, also called Gō-type models, are based on energy landscape theory and the principle of minimal frustration. They have been tremendously successful in explaining fundamental questions of, e.g., protein folding, RNA folding or protein function. At the same time, they are computationally sufficiently inexpensive to run complex simulations on smaller computing systems or even commodity hardware. Still, their setup and evaluation is quite complex even though sophisticated software packages support their realization. Results Here, we establish an efficient infrastructure for native structure-based models to support the community and enable high-throughput simulations on remote computing resources via GridBeans and UNICORE middleware. This infrastructure organizes the setup of such simulations resulting in increased comparability of simulation results. At the same time, complete workflows for advanced simulation protocols can be established and managed on remote resources by a graphical interface which increases reusability of protocols and additionally lowers the entry barrier into such simulations for, e.g., experimental scientists who want to compare their results against simulations. We demonstrate the power of this approach by illustrating it for protein folding simulations for a range of proteins. Conclusions We present software enhancing the entire workflow for native structure-based simulations including exception-handling and evaluations. Extending the capability and improving the accessibility of existing simulation packages the software goes beyond the state of the art in the domain of biomolecular simulations. Thus we expect that it will stimulate more individuals from the community to employ more confidently modeling in their research
    corecore