
    Towards Exascale Scientific Metadata Management

    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines capture only a limited form of metadata, providing provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually, in an ad hoc manner, by domain scientists. Systematic and transparent acquisition of rich metadata is a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet a ubiquitous, domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset, so that it survives metadata-oblivious processes and can be queried through established, standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling, and neuroscience, and then discuss research challenges and possible solutions.
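    As a minimal sketch of point (2), assuming an HDF5-based workflow (the abstract describes a vision, not a concrete API): metadata can be embedded inside a dataset as attributes, so the same standard interface serves both the data and its annotations. The file name, attribute keys, and values below are hypothetical; the code is Python with h5py.

        import h5py
        import numpy as np

        # Hypothetical simulation output; all names and values are illustrative.
        with h5py.File("plasma_run_042.h5", "w") as f:
            dset = f.create_dataset("electron_density", data=np.random.rand(128, 128))
            # Store metadata inside the dataset itself so it travels with the data.
            dset.attrs["code_version"] = "v2.3.1"
            dset.attrs["grid_resolution"] = [128, 128]
            dset.attrs["timestep_s"] = 0.01

        # Any HDF5-aware process can then query the annotations through the same
        # established data access interface used to read the data itself.
        with h5py.File("plasma_run_042.h5", "r") as f:
            print(dict(f["electron_density"].attrs))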

    trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R

    Research is an incremental, iterative process, with new results relying on and building upon previous ones. Scientists need to find, retrieve, understand, and verify results in order to confidently extend them, even when the results are their own. We present the trackr framework for organizing, automatically annotating, discovering, and retrieving results. We identify sources of automatically extractable metadata for computational results, and we define an extensible system for organizing, annotating, and searching for results based on these and other metadata. We present an open-source implementation of these concepts for plots, computational artifacts, and woven dynamic reports generated in the R statistical computing language.
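    trackr itself is an R package, and the sketch below does not use its actual API; it only illustrates, in Python, the kind of automatically extractable metadata the abstract refers to (content hash, timestamp, user, platform). All field names and the example path are hypothetical.

        import getpass
        import hashlib
        import platform
        from datetime import datetime, timezone

        def extract_metadata(path: str) -> dict:
            """Collect provenance metadata that needs no manual annotation."""
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            return {
                "sha256": digest,                        # content identity
                "recorded": datetime.now(timezone.utc).isoformat(),
                "user": getpass.getuser(),               # who produced the artifact
                "platform": platform.platform(),         # where it was produced
            }

        # Example: annotate a saved plot file for later search and retrieval.
        # print(extract_metadata("figures/model_fit.png"))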

    User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project

    This work discusses how the MPContribs framework in the Materials Project (MP) allows user-contributed data to be shown and analyzed alongside the core MP database. The Materials Project is a searchable database of electronic structure properties of over 65,000 bulk solid materials that is accessible through a web-based science gateway. We describe the motivation for enabling user contributions to the materials data and present the framework's features and challenges in the context of two real applications. These use cases illustrate how scientific collaborations can build applications with their own "user-contributed" data using MPContribs. The Nanoporous Materials Explorer application provides a unique search interface to a novel dataset of hundreds of thousands of materials, each with tables of user-contributed values related to material adsorption and density at varying temperature and pressure. The Unified Theoretical and Experimental X-ray Spectroscopy application demonstrates a full workflow for the association, dissemination, and combined analysis of experimental data from the Advanced Light Source with MP's theoretical core data, using MPContribs tools for data formatting, management, and exploration. The capabilities being developed for these collaborations are serving as the model for how new materials data can be incorporated into the Materials Project website with minimal staff overhead, while giving the user community powerful tools for data search and display.
    Comment: 12 pages, 5 figures, Proceedings of the 10th Gateway Computing Environments Workshop (2015), to be published in "Concurrency and Computation: Practice and Experience"
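    To make "tables of user-contributed values" concrete: a contribution of this kind might be keyed to a material identifier and carry rows at varying temperature and pressure. The sketch below is not the MPContribs schema; every field name and value is hypothetical (Python).

        # NOT the actual MPContribs schema; a hypothetical illustration of the
        # shape of per-material, user-contributed tabular data.
        contribution = {
            "material_id": "mp-1234",           # hypothetical MP identifier
            "project": "nanoporous_materials",  # hypothetical project slug
            "table": [
                {"temperature_K": 298, "pressure_bar": 1.0, "adsorption_mol_per_kg": 2.1},
                {"temperature_K": 298, "pressure_bar": 5.0, "adsorption_mol_per_kg": 5.4},
            ],
        }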

    Status and Future Perspectives for Lattice Gauge Theory Calculations to the Exascale and Beyond

    In this and a set of companion whitepapers, the USQCD Collaboration lays out a program of science and computing for lattice gauge theory. These whitepapers describe how calculations using lattice QCD (and other gauge theories) can aid the interpretation of ongoing and upcoming experiments in particle and nuclear physics, as well as inspire new ones.
    Comment: 44 pages. 1 of the USQCD whitepapers

    Comparative Analysis of Computationally Accelerated NGS Alignment

    The Smith-Waterman (S-W) algorithm is the basis of most current sequence alignment technology. Because it identifies similarities between sequences, it can support cancer detection and treatment by giving researchers potential targets for early diagnosis and personalized therapy. The growing number of DNA and RNA sequences available for analysis demands faster alignment than current iterations of the S-W algorithm provide. This project aimed to identify the most effective and efficient methods for accelerating the S-W algorithm by surveying recent advances in sequence alignment. Of the 22 articles considered, 17 had to be excluded from the study owing to a lack of standardized data reporting. Only one study, by Chen et al., contained enough information to compare both accuracy and alignment speed. When accuracy was excluded from the criteria, five studies contained enough information to rank their efficiency. The study by Rucci et al. was the fastest at 268.83 Giga Cell Updates Per Second (GCUPS), and the method by Pérez-Serrano et al. came close at 229.93 GCUPS while testing larger sequences. It was determined that reporting standards in this field are insufficient, and that the study by Chen et al. should set a benchmark for future reporting.
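    For context on the GCUPS metric: aligning sequences of lengths m and n fills an m-by-n dynamic-programming matrix, so m times n cell updates are performed, and GCUPS is simply billions of such updates per second. Below is a minimal, unaccelerated Python sketch of the S-W scoring recurrence; the example sequences and scoring parameters are hypothetical, not taken from the studies compared above.

        import time

        def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
            """Return the best local-alignment score between sequences a and b."""
            H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            best = 0
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                    # Local alignment: scores are floored at zero.
                    H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
                    best = max(best, H[i][j])
            return best

        a, b = "GGTTGACTA", "TGTTACGG"  # hypothetical example sequences
        t0 = time.perf_counter()
        score = smith_waterman(a, b)
        elapsed = time.perf_counter() - t0
        # GCUPS = (cells updated) / time / 1e9; accelerated implementations reach
        # hundreds of GCUPS, while this naive version achieves far less.
        print(score, len(a) * len(b) / elapsed / 1e9)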