1,014 research outputs found

    Survey and Analysis of Production Distributed Computing Infrastructures

    This report has two objectives. First, we describe a set of the production distributed infrastructures currently available, so that the reader has a basic understanding of them. This includes explaining why each infrastructure was created and made available and how it has succeeded and failed. The set is not complete, but we believe it is representative. Second, we describe the infrastructures in terms of their use, which is a combination of how they were designed to be used and how users have found ways to use them. Applications are often designed and created with specific infrastructures in mind, with both an appreciation of the existing capabilities provided by those infrastructures and an anticipation of their future capabilities. Here, the infrastructures we discuss were often designed and created with specific applications in mind, or at least specific types of applications. The reader should understand how the interplay between the infrastructure providers and the users leads to such usages, which we call usage modalities. These usage modalities are really abstractions that exist between the infrastructures and the applications; they influence the infrastructures by representing the applications, and they influence the applications by representing the infrastructures.

    A survey of the European Open Science Cloud services for expanding the capacity and capabilities of multidisciplinary scientific applications

    Open Science is a paradigm in which scientific data, procedures, tools and results are shared transparently and reused by society. The European Open Science Cloud (EOSC) initiative is an effort in Europe to provide an open, trusted, virtual and federated computing environment to execute scientific applications and store, share and reuse research data across borders and scientific disciplines. Additionally, scientific services are becoming increasingly data-intensive, not only in terms of computationally intensive tasks but also in terms of storage resources. To meet those resource demands, computing paradigms such as High-Performance Computing (HPC) and Cloud Computing are applied to e-science applications. However, adapting applications and services to these paradigms is a challenging task, commonly requiring a deep knowledge of the underlying technologies, which often constitutes a general barrier to its uptake by scientists. In this context, EOSC-Synergy, a collaborative project involving more than 20 institutions from eight European countries pooling their knowledge and experience to enhance EOSC’s capabilities and capacities, aims to bring EOSC closer to the scientific communities. This article provides a summary analysis of the adaptations made in the ten thematic services of EOSC-Synergy to embrace this paradigm. These services are grouped into four categories: Earth Observation, Environment, Biomedicine, and Astrophysics. The analysis will lead to the identification of commonalities, best practices and common requirements, regardless of the thematic area of the service. Experience gained from the thematic services can be transferred to new services for the adoption of the EOSC ecosystem framework. 
The article makes several recommendations for the integration of thematic services in the EOSC ecosystem regarding Authentication and Authorization (mainly federated regional or thematic solutions based on EduGAIN), FAIR data and metadata preservation solutions (for both cataloguing and data preservation, such as EUDAT’s B2SHARE), cloud platform-agnostic resource management services (such as Infrastructure Manager) and workload management solutions. This work was supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 857647, EOSC-Synergy, European Open Science Cloud - Expanding Capacities by building Capabilities. Moreover, this work is partially funded by grant No 2015/24461-2, São Paulo Research Foundation (FAPESP). Francisco Brasileiro is a CNPq/Brazil researcher (grant 308027/2020-5). Peer reviewed. Article signed by 20 authors: Amanda Calatrava, Hernán Asorey, Jan Astalos, Alberto Azevedo, Francesco Benincasa, Ignacio Blanquer, Martin Bobak, Francisco Brasileiro, Laia Codó, Laura del Cano, Borja Esteban, Meritxell Ferret, Josef Handl, Tobias Kerzenmacher, Valentin Kozlov, Aleš Křenek, Ricardo Martins, Manuel Pavesio, Antonio Juan Rubio-Montero, Juan Sánchez-Ferrero. Postprint (published version)

    A survey of the European Open Science Cloud services for expanding the capacity and capabilities of multidisciplinary scientific applications

    Open Science is a paradigm in which scientific data, procedures, tools and results are shared transparently and reused by society as a whole. The initiative known as the European Open Science Cloud (EOSC) is an effort in Europe to provide an open, trusted, virtual and federated computing environment to execute scientific applications, and to store, share and re-use research data across borders and scientific disciplines. Additionally, scientific services are becoming increasingly data-intensive, not only in terms of computationally intensive tasks but also in terms of storage resources. Computing paradigms such as High Performance Computing (HPC) and Cloud Computing are applied to e-science applications to meet these demands. However, adapting applications and services to these paradigms is not a trivial task, commonly requiring a deep knowledge of the underlying technologies, which often constitutes a barrier for its uptake by scientists in general. In this context, EOSC-SYNERGY, a collaborative project involving more than 20 institutions from eight European countries pooling their knowledge and experience to enhance EOSC's capabilities and capacities, aims to bring EOSC closer to the scientific communities. This article provides a summary analysis of the adaptations made in the ten thematic services of EOSC-SYNERGY to embrace this paradigm. These services are grouped into four categories: Earth Observation, Environment, Biomedicine, and Astrophysics. The analysis will lead to the identification of commonalities, best practices and common requirements, regardless of the thematic area of the service. Experience gained from the thematic services could be transferred to new services for the adoption of the EOSC ecosystem framework.

    Development of a centralized log management system

    Logs are a crucial piece of any system and give helpful insight into what it is doing, as well as what happened in case of failure. Every process running on a system generates logs in some format. Generally, these logs are written to local storage. As systems evolved, the number of logs to analyze increased, and with it arose the need for a standardized log format that minimizes dependencies and makes analysis easier. ams is a company that develops and creates sensor solutions. With twenty-two design centers and three manufacturing locations, the company serves over eight thousand clients worldwide. One design center is located in Funchal and includes a team of application engineers who design and develop software applications for clients inside the company. The application engineers' development process involves several applications and programs, each with its own logging system. Log entries generated by different applications are kept in separate storage systems. A developer or administrator who wants to troubleshoot an issue that spans several applications has to visit each database system or location where the logs are stored, collect them, and correlate them across the several requests. This is a tiresome process, and if the environment is auto-scaled, troubleshooting such an issue becomes inconceivable. This project aimed to solve these problems by creating a Centralized Log Management System capable of handling logs from a variety of sources and of providing services that help developers and administrators better understand the different affected environments.
The deployed solution was developed using a set of open-source technologies, such as the Elastic Stack (Elasticsearch, Logstash and Kibana), Node.js, GraphQL and Cassandra. The present document describes the process and the decisions taken to arrive at the solution.

    Many-Task Computing and Blue Waters

    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters (BW) system, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects for middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware.

    Automated Data for DevSecOps Programs

    Excerpt from the Proceedings of the Nineteenth Annual Acquisition Research Symposium. Automation in DevSecOps (DSO) transforms the practice of building, deploying, and managing software-intensive programs. Although this automation supports continuous delivery and rapid builds, the persistent manual collection of information delays (by weeks) the release of program status metrics and the decisions they are intended to inform. Emerging DSO metrics (e.g., deployment rates, lead times) provide insight into how software development is progressing but fall short of replacing program control metrics for assessing progress (e.g., burn rates against spend targets, integration capability target dates, and schedule for the minimum viable capability release). By instrumenting the (potentially interacting) DSO pipelines and supporting environments, the continuous measurement of status, identification of emerging risks, and probabilistic projections are possible and practical. In this paper, we discuss our research on the information modeling, measurement, metrics, and indicators necessary to establish a continuous program control capability that can keep pace with DSO management needs. We discuss the importance of interactive visualization dashboards for addressing program information needs. We also identify and address the gaps and barriers in the current state of the practice. Finally, we recommend future research needs based on our initial findings. Approved for public release; distribution is unlimited.