117,479 research outputs found
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
Enhancing Energy Production with Exascale HPC Methods
High Performance Computing (HPC) resources have become the key actor for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion on the available parallelism and the use of special purpose
processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state-of-the-art in the required HPC exascale
simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results.The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of
Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and
from the Brazilian Ministry of Science, Technology and Innovation through Rede
Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the
Intel Corporation, which enabled us to obtain the presented experimental results in
uncertainty quantification in seismic imagingPostprint (author's final draft
AstroGrid-D: Grid Technology for Astronomical Science
We present status and results of AstroGrid-D, a joint effort of
astrophysicists and computer scientists to employ grid technology for
scientific applications. AstroGrid-D provides access to a network of
distributed machines with a set of commands as well as software interfaces. It
allows simple use of computer and storage facilities and to schedule or monitor
compute tasks and data management. It is based on the Globus Toolkit middleware
(GT4). Chapter 1 describes the context which led to the demand for advanced
software solutions in Astrophysics, and we state the goals of the project. We
then present characteristic astrophysical applications that have been
implemented on AstroGrid-D in chapter 2. We describe simulations of different
complexity, compute-intensive calculations running on multiple sites, and
advanced applications for specific scientific purposes, such as a connection to
robotic telescopes. We can show from these examples how grid execution improves
e.g. the scientific workflow. Chapter 3 explains the software tools and
services that we adapted or newly developed. Section 3.1 is focused on the
administrative aspects of the infrastructure, to manage users and monitor
activity. Section 3.2 characterises the central components of our architecture:
The AstroGrid-D information service to collect and store metadata, a file
management system, the data management system, and a job manager for automatic
submission of compute tasks. We summarise the successfully established
infrastructure in chapter 4, concluding with our future plans to establish
AstroGrid-D as a platform of modern e-Astronomy.Comment: 14 pages, 12 figures Subjects: data analysis, image processing,
robotic telescopes, simulations, grid. Accepted for publication in New
Astronom
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain and
understanding the implications of these aspects to middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware
- …