Survey and Analysis of Production Distributed Computing Infrastructures
This report has two objectives. First, we describe a set of the production
distributed infrastructures currently available, so that the reader has a basic
understanding of them. This includes explaining why each infrastructure was
created and made available and how it has succeeded and failed. The set is not
complete, but we believe it is representative.
Second, we describe the infrastructures in terms of their use, which is a
combination of how they were designed to be used and how users have found ways
to use them. Applications are often designed and created with specific
infrastructures in mind, with both an appreciation of the existing capabilities
provided by those infrastructures and an anticipation of their future
capabilities. Here, the infrastructures we discuss were often designed and
created with specific applications in mind, or at least specific types of
applications. The reader should understand how the interplay between the
infrastructure providers and the users leads to such usages, which we call
usage modalities. These usage modalities are really abstractions that exist
between the infrastructures and the applications; they influence the
infrastructures by representing the applications, and they influence the
applications by representing the infrastructures.
A survey of the European Open Science Cloud services for expanding the capacity and capabilities of multidisciplinary scientific applications
Open Science is a paradigm in which scientific data, procedures, tools and results are shared transparently and reused by society. The European Open Science Cloud (EOSC) initiative is an effort in Europe to provide an open, trusted, virtual and federated computing environment to execute scientific applications and store, share and reuse research data across borders and scientific disciplines. Additionally, scientific services are becoming increasingly data-intensive, not only in terms of computationally intensive tasks but also in terms of storage resources. To meet those resource demands, computing paradigms such as High-Performance Computing (HPC) and Cloud Computing are applied to e-science applications. However, adapting applications and services to these paradigms is a challenging task, commonly requiring a deep knowledge of the underlying technologies, which often constitutes a general barrier to its uptake by scientists. In this context, EOSC-Synergy, a collaborative project involving more than 20 institutions from eight European countries pooling their knowledge and experience to enhance EOSC’s capabilities and capacities, aims to bring EOSC closer to the scientific communities. This article provides a summary analysis of the adaptations made in the ten thematic services of EOSC-Synergy to embrace this paradigm. These services are grouped into four categories: Earth Observation, Environment, Biomedicine, and Astrophysics. The analysis will lead to the identification of commonalities, best practices and common requirements, regardless of the thematic area of the service. Experience gained from the thematic services can be transferred to new services for the adoption of the EOSC ecosystem framework. 
The article made several recommendations for the integration of thematic services in the EOSC ecosystem regarding Authentication and Authorization (federated regional or thematic solutions, mainly based on EduGAIN), FAIR data and metadata preservation solutions (covering both cataloguing and data preservation, such as EUDAT's B2SHARE), cloud platform-agnostic resource management services (such as Infrastructure Manager) and workload management solutions. This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 857647, EOSC-Synergy, European Open Science Cloud - Expanding Capacities by building Capabilities. This work was also partially funded by grant No 2015/24461-2, São Paulo Research Foundation (FAPESP). Francisco Brasileiro is a CNPq/Brazil researcher (grant 308027/2020-5). Article signed by 20 authors: Amanda Calatrava, Hernán Asorey, Jan Astalos, Alberto Azevedo, Francesco Benincasa, Ignacio Blanquer, Martin Bobak, Francisco Brasileiro, Laia Codó, Laura del Cano, Borja Esteban, Meritxell Ferret, Josef Handl, Tobias Kerzenmacher, Valentin Kozlov, Aleš Křenek, Ricardo Martins, Manuel Pavesio, Antonio Juan Rubio-Montero, Juan Sánchez-Ferrero.
Development of a centralized log management system
Logs are a crucial piece of any system and give a helpful insight into what it is
doing as well as what happened in case of failure. Every process running on a system
generates logs in some format. Generally, these logs are written to local storage
resources. As systems evolved, the number of logs to analyze increased, and the
need arose for a standardized log format that minimizes dependencies and makes
the analysis process easier.
ams is a company that develops and creates sensor solutions. With twenty-two
design centers and three manufacturing locations, the company serves over eight
thousand clients worldwide. One design center, located in Funchal, includes a
team of application engineers who design and develop software applications for
internal clients. These engineers' development process involves several
applications and programs, each with its own logging system.
Log entries generated by different applications are kept in separate storage
systems. If a developer or administrator wants to troubleshoot an issue that
spans several applications, they have to visit each database system or location,
collect the logs, and correlate them across the several requests. This is a
tiresome process, and if the environment is auto-scaled, troubleshooting such an
issue becomes practically impossible.
This project aimed to solve these problems by creating a Centralized Log
Management System capable of handling logs from a variety of sources and of
providing services that help developers and administrators better understand
the different affected environments.
The deployed solution was developed using a set of different open-source
technologies, such as the Elastic Stack (Elasticsearch, Logstash and Kibana), Node.js,
GraphQL and Cassandra.
The present document describes the process and the decisions taken to achieve
the solution.
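The core idea of such a system is normalizing each application's log shape onto one uniform record before storing it centrally, so entries from different sources can be correlated. A minimal sketch in Python; the field names ("timestamp", "level", "service", "message") and the source-record keys are illustrative assumptions, not the schema actually used in the project:

```python
import json
from datetime import datetime, timezone

def normalize_log(raw: dict, service: str) -> dict:
    """Map a service-specific log record onto a uniform structure,
    so entries from different applications can be correlated."""
    return {
        # Fall back to ingestion time when the source record has no timestamp.
        "timestamp": raw.get("time")
        or raw.get("ts")
        or datetime.now(timezone.utc).isoformat(),
        "level": (raw.get("level") or raw.get("severity") or "INFO").upper(),
        "service": service,
        "message": raw.get("msg") or raw.get("message") or "",
    }

# Two applications logging in different shapes...
entry_a = normalize_log({"time": "2020-01-01T00:00:00Z", "msg": "started"}, "billing")
entry_b = normalize_log({"severity": "error", "message": "timeout"}, "inventory")

# ...become directly comparable JSON documents ready for a central store.
print(json.dumps(entry_a))
print(json.dumps(entry_b))
```

In a stack like the one described, this mapping role is typically played by Logstash filters rather than hand-written code; the sketch only shows the transformation they perform.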
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters system, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
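The task-graph structure described above, discrete tasks whose explicit input/output dependencies form the graph edges, can be sketched with Python's standard-library topological sorter. The task names and dependency structure below are invented for illustration:

```python
from graphlib import TopologicalSorter

# An MTC-style application as a graph: each task maps to the set of
# tasks whose outputs it consumes (its predecessors).
graph = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "aggregate": {"simulate_a", "simulate_b"},
}

ts = TopologicalSorter(graph)
ts.prepare()
order = []
while ts.is_active():
    ready = list(ts.get_ready())   # tasks whose inputs are all available
    order.extend(sorted(ready))    # a dispatcher could run these in parallel
    ts.done(*ready)

print(order)
```

Each `get_ready()` batch is a set of independent tasks; minimizing the dispatch overhead per batch element is exactly the engineering constraint the report attributes to MTC middleware.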
Automated Data for DevSecOps Programs
Excerpt from the Proceedings of the Nineteenth Annual Acquisition Research Symposium. Automation in DevSecOps (DSO) transforms the practice of building, deploying, and managing software-intensive programs. Although this automation supports continuous delivery and rapid builds, the persistent manual collection of information delays (by weeks) the release of program status metrics and the decisions they are intended to inform. Emerging DSO metrics (e.g., deployment rates, lead times) provide insight into how software development is progressing but fall short of replacing program control metrics for assessing progress (e.g., burn rates against spend targets, integration capability target dates, and the schedule for the minimum viable capability release). By instrumenting the (potentially interacting) DSO pipelines and supporting environments, the continuous measurement of status, identification of emerging risks, and probabilistic projections are possible and practical. In this paper, we discuss our research on the information modeling, measurement, metrics, and indicators necessary to establish a continuous program control capability that can keep pace with DSO management needs. We discuss the importance of interactive visualization dashboards for addressing program information needs. We also identify and address the gaps and barriers in the current state of the practice. Finally, we recommend future research needs based on our initial findings. Approved for public release; distribution is unlimited.
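The DSO metrics named above (deployment rates, lead times) can be computed directly from instrumented pipeline events. A minimal sketch; the event records, field names, and values are hypothetical and stand in for whatever the instrumented pipelines would emit:

```python
from datetime import datetime

# Hypothetical pipeline events: one record per deployment, with the
# commit and deployment timestamps the instrumentation would capture.
deployments = [
    {"committed": "2022-03-01T09:00", "deployed": "2022-03-02T09:00"},
    {"committed": "2022-03-03T12:00", "deployed": "2022-03-03T18:00"},
    {"committed": "2022-03-07T08:00", "deployed": "2022-03-09T08:00"},
]

def lead_time_hours(event: dict) -> float:
    """Lead time: elapsed hours from commit to deployment."""
    fmt = "%Y-%m-%dT%H:%M"
    delta = (datetime.strptime(event["deployed"], fmt)
             - datetime.strptime(event["committed"], fmt))
    return delta.total_seconds() / 3600

lead_times = [lead_time_hours(d) for d in deployments]
avg_lead = sum(lead_times) / len(lead_times)
print(f"deployments: {len(deployments)}, mean lead time: {avg_lead:.1f} h")
```

Feeding such continuously computed values into an interactive dashboard, rather than collecting them manually, is the shift the paper argues for.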