5 research outputs found
Recommended from our members
Running the Sloan Digital Sky Survey data archive server
The Sloan Digital Sky Survey (SDSS) Data Archive Server (DAS) provides public access to over 12Tb of data in 17 million files produced by the SDSS data reduction pipeline. Many tasks which seem trivial when serving smaller, less complex data sets present challenges when serving data of this volume and technical complexity. The included output files should be chosen to support as much science as possible from publicly released data, and only publicly released data. Users must have the resources needed to read and interpret the data correctly. Server administrators must generate new data releases at regular intervals, monitor usage, quickly recover from hardware failures, and monitor the data served by the DAS both for contents and corruption. We discuss these challenges, describe tools we use to administer and support the DAS, and discuss future development plans
Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows
International audienceWith their globally distributed datacenters, clouds now provide an opportunity to run complex large-scale applications on dynamically provisioned, networked and federated infrastructures. However, there is a lack of tools supporting data-intensive applications across geographically distributed sites. For instance, scientific workflows which handle many small files can easily saturate state-of-the-art distributed filesystems based on centralized metadata servers (e.g. HDFS, PVFS). In this paper, we explore several alternative design strategies to efficiently support the execution of existing workflow engines across multi-site clouds, by reducing the cost of metadata operations. These strategies leverage workflow semantics in a 2-level metadata partitioning hierarchy that combines distribution and replication. The system was validated on the Microsoft Azure cloud across 4 EU and US datacenters. The experiments were conducted on 128 nodes using synthetic benchmarks and real-life applications. We observe as much as 28% gain in execution time for a parallel, geo-distributed real-world application (Montage) and up to 50% for a metadata-intensive synthetic benchmark, compared to a baseline centralized configuration
Supercomputing Frontiers
This open access book constitutes the refereed proceedings of the 7th Asian Conference Supercomputing Conference, SCFA 2022, which took place in Singapore in March 2022. The 8 full papers presented in this book were carefully reviewed and selected from 21 submissions. They cover a range of topics including file systems, memory hierarchy, HPC cloud platform, container image configuration workflow, large-scale applications, and scheduling