ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. By 2025, the expected
demand is at least two orders of magnitude greater than currently available
capacity, and in some cases more. 2) The growth rate of data
produced by simulations is overwhelming the current ability of both facilities
and researchers to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources, the
experimental HEP program needs: a) an established long-term plan for access to
ASCR computational and data resources; b) the ability to map workflows onto HPC
resources; c) ASCR facilities that can accommodate workflows run by
collaborations with thousands of individual members; d) codes transitioned to
the next-generation HPC platforms that will be available at ASCR facilities;
and e) a workforce built up and trained to develop and use simulations and
analysis in support of HEP scientific research on
next-generation systems.
Comment: 77 pages, 13 figures; draft report, subject to further revision
Building a scientific workflow framework to enable real‐time machine learning and visualization
Nowadays, we have entered the era of big data. In high performance computing, large-scale simulations can generate huge amounts of data containing potentially critical information. However, these data are usually saved in intermediate files and are not visible until advanced data analytics techniques are applied after all simulation data have been read from persistent storage (e.g., local disks or a parallel file system). This approach leaves users waiting a long time for running simulations to finish without knowing the status of the job. In this paper, we build a new computational framework that couples scientific simulations with multi-step machine learning processes and in-situ data visualizations. We also design a new scalable simulation-time clustering algorithm that automatically detects fluid flow anomalies. This computational framework is built upon different software components and provides plug-in data analysis and visualization functions over complex scientific workflows. With this advanced framework, users can monitor ongoing extreme-scale turbulent flow simulations and receive real-time notifications of special patterns or anomalies.
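No code accompanies this abstract; the following minimal Python sketch only illustrates the general pattern it describes: clustering simulation output in-situ, as it streams, instead of post-processing intermediate files. The `simulation_steps` generator and the anomaly threshold are hypothetical, and scikit-learn's `MiniBatchKMeans` stands in for the authors' custom simulation-time clustering algorithm.

```python
# Sketch of in-situ anomaly detection over streaming simulation output.
# MiniBatchKMeans substitutes for the paper's custom clustering algorithm;
# simulation_steps() is a hypothetical stand-in for a real data source.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def simulation_steps(n_steps=100, n_cells=10_000, n_features=3):
    """Hypothetical source: yields one array of flow-field samples per step."""
    rng = np.random.default_rng(0)
    for _ in range(n_steps):
        yield rng.normal(size=(n_cells, n_features))

model = MiniBatchKMeans(n_clusters=8, batch_size=1024)
THRESHOLD = 5.0  # hypothetical distance cutoff for flagging anomalies

for step, field in enumerate(simulation_steps()):
    model.partial_fit(field)                 # update clusters in-situ, no intermediate files
    dists = model.transform(field).min(axis=1)  # distance to the nearest cluster centre
    anomalies = np.where(dists > THRESHOLD)[0]
    if anomalies.size:                       # real-time notification instead of post-hoc analysis
        print(f"step {step}: {anomalies.size} anomalous cells flagged")
```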
Chapter 4 Data System and Data Management in a Federation of HPC/Cloud Centers
Artificial Intelligence, Deep Learning, Machine Learning, Supercomputing
Potential of I/O aware workflows in climate and weather
The efficient, convenient, and robust execution of data-driven workflows and enhanced data
management are essential for productivity in scientific computing. In HPC, the concerns of storage
and computing are traditionally separated and optimised independently from each other and the
needs of the end-to-end user. However, in complex workflows this separation is
becoming problematic. These problems are particularly acute in climate and
weather workflows, which, as well as becoming increasingly complex and
exploiting deep storage hierarchies, can involve multiple data centres.
The key contributions of this paper are: 1) a sketch of a vision for an
integrated data-driven approach, with a discussion of the associated challenges
and implications, and 2) an architecture and roadmap consistent with this
vision that would allow a seamless integration into current climate and weather
workflows, since it utilises versions of existing tools (ESDM, Cylc, XIOS, and
DDN’s IME).
The vision proposed here is built on the belief that workflows composed of data-, computing-, and communication-intensive tasks should drive interfaces and hardware configurations to better support the programming models. When delivered, this work will increase the opportunity for smarter scheduling of compute by taking storage into account across heterogeneous storage systems. We illustrate the performance impact on an example workload using a model built on performance data measured with ESDM at DKRZ.
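As a rough sketch of what storage-aware scheduling could look like (this is not the ESDM, Cylc, XIOS, or IME API; the cost figures, tier names, and site names are all hypothetical), a scheduler might weigh where a task's inputs currently live in the storage hierarchy before placing it:

```python
# Hypothetical storage-aware placement: combine tier read cost with a
# cross-data-centre transfer penalty to find the cheapest way to feed a task.
from dataclasses import dataclass

# hypothetical per-GB read costs for tiers of a deep storage hierarchy
TIER_COST = {"burst_buffer": 0.1, "parallel_fs": 1.0, "tape": 25.0}

@dataclass
class Replica:
    site: str
    tier: str

def placement_cost(task_site: str, input_gb: float, replicas: list[Replica],
                   inter_site_penalty: float = 10.0) -> float:
    """Cheapest way to supply input_gb of data to a task running at task_site."""
    def cost(r: Replica) -> float:
        c = input_gb * TIER_COST[r.tier]
        if r.site != task_site:
            c += input_gb * inter_site_penalty  # cross-data-centre transfer
        return c
    return min(cost(r) for r in replicas)

# fetching from a remote burst buffer can beat reading local tape
replicas = [Replica("site_a", "tape"), Replica("site_b", "burst_buffer")]
for site in ("site_a", "site_b"):
    print(site, placement_cost(site, input_gb=500, replicas=replicas))
```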
Introducing distributed dynamic data-intensive (D3) science: Understanding applications and infrastructure
A common feature across many science and engineering applications is the
amount and diversity of data and computation that must be integrated to yield
insights. Data sets are growing larger and becoming distributed, and their
location, availability, and properties are often time-dependent. Collectively,
these characteristics give rise to dynamic distributed data-intensive
applications. While "static" data applications have received significant
attention, the characteristics, requirements, and software systems for
analyzing large volumes of dynamic, distributed data have received relatively
little attention. This paper surveys several representative dynamic
distributed data-intensive application scenarios, provides a common conceptual
framework to understand them, and examines the infrastructure used in support
of these applications.
Comment: 38 pages, 2 figures
StreamFlow: cross-breeding cloud with HPC
Workflows are among the most commonly used tools in a variety of execution
environments. Many of them target a specific environment; few of them make it
possible to execute an entire workflow in different environments, e.g.
Kubernetes and batch clusters. We present a novel approach to workflow
execution, called StreamFlow, that complements the workflow graph with the
declarative description of potentially complex execution environments, and that
makes it possible to execute workflows across multiple sites that do not share
a common data space. StreamFlow is then exemplified on a novel bioinformatics
pipeline for single-cell transcriptomic data analysis.
Comment: 30 pages - 2020 IEEE Transactions on Emerging Topics in Computing
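StreamFlow's actual configuration format is not reproduced in this abstract; the Python sketch below merely illustrates the core idea of pairing each workflow step with a declarative description of the environment it should run in. All names (deployments, connectors, paths, step names) are hypothetical.

```python
# Not the actual StreamFlow API: a hypothetical rendering of its core idea,
# i.e. binding workflow steps to declared execution environments so that
# steps can run on sites that do not share a common data space.
WORKFLOW_BINDINGS = {
    "align_reads":   {"deployment": "hpc-slurm", "connector": "ssh",
                      "workdir": "/scratch/project/wf"},
    "cluster_cells": {"deployment": "k8s-gpu", "connector": "kubernetes",
                      "workdir": "/tmp/wf"},
}

def dispatch(step: str, inputs: list[str]) -> None:
    """Sketch: stage the step's inputs to its declared site, then submit it there."""
    env = WORKFLOW_BINDINGS[step]
    print(f"staging {inputs} to {env['deployment']}:{env['workdir']}")
    print(f"submitting {step} via {env['connector']} connector")

dispatch("align_reads", ["sample_R1.fastq", "sample_R2.fastq"])
```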
Middleware for large scale in situ analytics workflows
The trend to exascale is causing researchers to rethink the entire computational science stack, as future generation machines will contain both diverse hardware environments and the runtimes that manage them. Additionally, science applications themselves are stepping away from the traditional bulk-synchronous model and moving towards a more dynamic and decoupled environment where analysis routines run in situ alongside large-scale simulations. This thesis presents CoApps, a middleware that allows in situ science analytics applications to operate in a location-flexible manner. Additionally, CoApps explores methods to extract information from, and issue management operations to, the lower-level runtimes that manage the diverse hardware expected on next-generation exascale machines. This work leverages experience with several extremely scalable applications in materials and fusion, and has been evaluated on machines ranging from local Linux clusters to the supercomputer Titan.
Ph.D. thesis
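The thesis text is not included here; the sketch below only illustrates the location-flexible placement pattern described above, with a hypothetical resource query standing in for whatever interface CoApps uses to talk to the node-level runtime.

```python
# Hedged sketch of location-flexible in situ analytics: the same analysis
# routine runs either on the simulation's own nodes (in situ) or on staging
# nodes (in transit), chosen from runtime resource information.
def free_cores_on_simulation_nodes() -> int:
    """Hypothetical query to the node-level runtime; not the CoApps API."""
    return 4

def run_analysis(data, location: str) -> None:
    print(f"running analysis on {len(data)} elements ({location})")

def place_and_run(data, cores_needed: int = 8) -> None:
    if free_cores_on_simulation_nodes() >= cores_needed:
        run_analysis(data, "in situ, colocated with the simulation")
    else:
        # fall back to staging nodes to avoid slowing the simulation
        run_analysis(data, "in transit, on staging nodes")

place_and_run(list(range(1_000_000)))
```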
Spark-DIY: A framework for interoperable Spark Operations with high performance Block-Based Data Models
This work was partially funded by the Spanish Ministry of Economy, Industry and Competitiveness under grant TIN2016-79637-P, "Towards Unification of HPC and Big Data Paradigms"; the Spanish Ministry of Education under the FPU15/00422 Training Program for Academic and Teaching Staff Grant; the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357; and by DOE under agreement No. DE-DC000122495, program manager Laura Biven.
Common challenges and requirements
Research infrastructures available for researchers in environmental and Earth science are diverse and highly distributed; dedicated research infrastructures exist for atmospheric science, marine science, solid Earth science, biodiversity research, and more. These infrastructures aggregate and curate key research datasets and provide consolidated data services for a target research community, but they also often overlap in scope and ambition, sharing data sources, sometimes even sites, using similar standards, and ultimately all contributing data that will be essential to addressing the societal challenges that face environmental research today. Thus, while their diversity poses a problem for open science and multidisciplinary research, their commonalities mean that they often face similar technical problems and consequently have common requirements when addressing the implementation of best practices in curation, cataloguing, identification and citation, and other related core topics for data science.
In this chapter, we review the requirements gathering performed in the context of the cluster of European environmental and Earth science research infrastructures participating in the ENVRI community, and survey the common challenges identified through that process.