3,059 research outputs found

Towards Exascale Scientific Metadata Management

Author: Blanas Spyros
Byna Surendra
Publication date: 29/03/2015

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.

arXiv.org e-Print Archive

eScholarship - University of California

RegenBase: a knowledge base of spinal cord injury biology for translational research.

Author: Abeyruwan Saminda W
Al-Ali Hassan
Bixby John L
Callahan Alison
Ferguson Adam R
Lemmon Vance P
Popovich Phillip G
Sakurai Kunie
Shah Nigam H
Visser Ubbo
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Spinal cord injury (SCI) research is a data-rich field that aims to identify the biological mechanisms resulting in loss of function and mobility after SCI, as well as develop therapies that promote recovery after injury. SCI experimental methods, data and domain knowledge are locked in the largely unstructured text of scientific publications, making large scale integration with existing bioinformatics resources and subsequent analysis infeasible. The lack of standard reporting for experiment variables and results also makes experiment replicability a significant challenge. To address these challenges, we have developed RegenBase, a knowledge base of SCI biology. RegenBase integrates curated literature-sourced facts and experimental details, raw assay data profiling the effect of compounds on enzyme activity and cell growth, and structured SCI domain knowledge in the form of the first ontology for SCI, using Semantic Web representation languages and frameworks. RegenBase uses consistent identifier schemes and data representations that enable automated linking among RegenBase statements and also to other biological databases and electronic resources. By querying RegenBase, we have identified novel biological hypotheses linking the effects of perturbagens to observed behavioral outcomes after SCI. RegenBase is publicly available for browsing, querying and download. Database URL: http://regenbase.org

PubMed Central

eScholarship - University of California

FAIR principles and the IEDB: short-term improvements and a long-term vision of OBO-foundry mediated machine-actionable interoperability.

Author: Mungall Christopher J
Overton James A
Peters Bjoern
Sette Alessandro
Vita Randi
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The Immune Epitope Database (IEDB), at www.iedb.org, has the mission to make published experimental data relating to the recognition of immune epitopes easily available to the scientific public. By presenting curated data in a searchable database, we have liberated it from the tables and figures of journal articles, making it more accessible and usable by immunologists. Recently, the principles of Findability, Accessibility, Interoperability and Reusability have been formulated as goals that data repositories should meet to enhance the usefulness of their data holdings. We here examine how the IEDB complies with these principles and identify broad areas of success, but also areas for improvement. We describe short-term improvements to the IEDB that are being implemented now, as well as a long-term vision of true 'machine-actionable interoperability', which we believe will require community agreement on standardization of knowledge representation that can be built on top of the shared use of ontologies.

eScholarship - University of California

The Revised TESS Input Catalog and Candidate Target List

Author: Barclay Thomas
Bean Jacob L.
Brasseur C.E.
Charbonneau David
Chittidi Jay
Collins Kevin
Fleming Scott W.
Ge Jian
Kane Stephen R.
Latham David W.
De Lee Nathan
Lissauer Jack J.
Mann Andrew W.
McLean Brian
Muirhead Philip S.
Mullally Susan
Narita Norio
Oelkers Ryan J.
Paegert Martin
Pepper Joshua
Plavchan Peter
Ricker George R.
Rojas-Ayala Bárbara
Rose Mark E.
Sasselov Dimitar
Seager S.
Sharma Sanjib
Shiao Bernie
Sozzetti Alessandro
Stassun Keivan G.
Stello Dennis
Tenenbaum Peter
Ting Eric B.
Torres Guillermo
Vanderspek Roland
Wallace Geoff
Winn Joshua N.
Publication venue: 'American Astronomical Society'
Publication date: 01/01/2019

We describe the catalogs assembled and the algorithms used to populate the revised TESS Input Catalog (TIC), based on the incorporation of the Gaia second data release. We also describe a revised ranking system for prioritizing stars for 2-minute cadence observations, and assemble a revised Candidate Target List (CTL) using that ranking. The TIC is available on the Mikulski Archive for Space Telescopes (MAST) server, and an enhanced CTL is available through the Filtergraph data visualization portal system at the URL http://filtergraph.vanderbilt.edu/tess_ctl. Comment: 30 pages, 16 figures, submitted to AAS Journals; provided to the community in advance of publication in conjunction with public release of the TIC/CTL on 28 May 201

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

OA@INAF - Istituto Nazionale di Astrofisica

Theory and Practice of Data Citation

Author: Silvello Gianmaria
Publication venue: 'Wiley'
Publication date: 24/06/2017

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle. Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Enhanced Search for Educational Resources - A Perspective and a Prototype from ccLearn

Author: Ahrash Bissell
Jane Park
Mike Linksvayer
Nathan Yergler
Publication venue: Creative Commons
Publication date: 07/07/2009

Users of search tools who seek educational materials on the Internet are typically presented with either a web-scale search (e.g., Google or Yahoo) or a specialized, site-specific tool. The specialized search tools often rely upon custom data fields, such as user-entered ratings, to provide additional value. As currently designed, these systems are generally too labor intensive to manage and scale up beyond a single site or set of resources. However, custom (or structured) data of some form is necessary if search outcomes for educational materials are to be improved. For example, design criteria and evaluative metrics are crucial attributes for educational resources, and these currently require human labeling and verification. Thus, one challenge is to design a search tool that capitalizes on available structured data (also called metadata) but is not crippled if the data are missing. This information should be amenable to repurposing by anyone, which means that it must be archived in a manner that can be discovered and leveraged easily. In this paper, we describe the extent to which DiscoverEd, a prototype developed by ccLearn, meets the design challenge of a scalable, enhanced search platform for educational resources. We then explore some of the key challenges regarding enhanced search for topic-specific Internet resources generally. We conclude by illustrating some possible future developments and third-party enhancements to the DiscoverEd prototype.

IssueLab