21,852 research outputs found
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in
biological data, demanding the development of large-scale bioinformatics
experiments. Because these experiments are computation- and data-intensive,
they require high-performance computing (HPC) techniques and can benefit from
specialized technologies such as Scientific Workflow Management Systems (SWfMS)
and databases. In this work, we present BioWorkbench, a framework for managing
and analyzing bioinformatics experiments. This framework automatically collects
provenance data, including both performance data from workflow execution and
data from the scientific domain of the workflow application. Provenance data
can be analyzed through a web application that abstracts a set of queries to
the provenance database, simplifying access to provenance information. We
evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree
assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a
RASopathy analysis workflow. We analyze each workflow from both computational
and scientific domain perspectives, by using queries to a provenance and
annotation database. Some of these queries are available as a pre-built feature
of the BioWorkbench web application. Through the provenance data, we show that
the framework is scalable and achieves high-performance, reducing up to 98% of
the case studies execution time. We also show how the application of machine
learning techniques can enrich the analysis process
Provenance Threat Modeling
Provenance systems are used to capture history metadata, applications include
ownership attribution and determining the quality of a particular data set.
Provenance systems are also used for debugging, process improvement,
understanding data proof of ownership, certification of validity, etc. The
provenance of data includes information about the processes and source data
that leads to the current representation. In this paper we study the security
risks provenance systems might be exposed to and recommend security solutions
to better protect the provenance information.Comment: 4 pages, 1 figure, conferenc
Architecture for Provenance Systems
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies
An Architecture for Integrated Intelligence in Urban Management using Cloud Computing
With the emergence of new methodologies and technologies it has now become
possible to manage large amounts of environmental sensing data and apply new
integrated computing models to acquire information intelligence. This paper
advocates the application of cloud capacity to support the information,
communication and decision making needs of a wide variety of stakeholders in
the complex business of the management of urban and regional development. The
complexity lies in the interactions and impacts embodied in the concept of the
urban-ecosystem at various governance levels. This highlights the need for more
effective integrated environmental management systems. This paper offers a
user-orientated approach based on requirements for an effective management of
the urban-ecosystem and the potential contributions that can be supported by
the cloud computing community. Furthermore, the commonality of the influence of
the drivers of change at the urban level offers the opportunity for the cloud
computing community to develop generic solutions that can serve the needs of
hundreds of cities from Europe and indeed globally.Comment: 6 pages, 3 figure
Enhancing Energy Production with Exascale HPC Methods
High Performance Computing (HPC) resources have become the key actor for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion on the available parallelism and the use of special purpose
processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state-of-the-art in the required HPC exascale
simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results.The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of
Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and
from the Brazilian Ministry of Science, Technology and Innovation through Rede
Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the
Intel Corporation, which enabled us to obtain the presented experimental results in
uncertainty quantification in seismic imagingPostprint (author's final draft
The lifecycle of provenance metadata and its associated challenges and opportunities
This chapter outlines some of the challenges and opportunities associated
with adopting provenance principles and standards in a variety of disciplines,
including data publication and reuse, and information sciences
- …