792 research outputs found

Sharing and Preserving Computational Analyses for Posterity with encapsulator

Author: Boose Emery
Crosas Merce
Ellison Aaron M.
Fong Elizabeth
Han Xueyuan
Lau Matthew K.
Lerner Barbara S.
Pasquier Thomas
Seltzer Margo
Publication date: 06/05/2018

Open data and open-source software may be part of the solution to science's "reproducibility crisis", but they are insufficient to guarantee reproducibility. Requiring minimal end-user expertise, encapsulator creates a "time capsule" with reproducible code in a self-contained computational environment. encapsulator provides end-users with a fully-featured desktop environment for reproducible research.
Comment: 11 pages, 6 figures

arXiv.org e-Print Archive

Explore Bristol Research

Towards Transparent, Reusable, and Customizable Data Science in Computational Notebooks

Author: Choi Frederick
Kim Hannah
Rahman Sajjadur
Zhang Dan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/03/2023

Data science workflows are human-centered processes involving on-demand programming and analysis. While programmable and interactive interfaces such as widgets embedded within computational notebooks are suitable for these workflows, they lack robust state management capabilities and do not support user-defined customization of the interactive components. The absence of such capabilities hinders workflow reusability and transparency while limiting the scope of exploration of the end-users. In response, we developed MAGNETON, a framework for authoring interactive widgets within computational notebooks that enables transparent, reusable, and customizable data science workflows. The framework enhances existing widgets to support fine-grained interaction history management, reusable states, and user-defined customizations. We conducted three case studies in a real-world knowledge graph construction and serving platform to evaluate the effectiveness of these widgets. Based on the observations, we discuss future implications of employing MAGNETON widgets for general-purpose data science workflows.
Comment: To appear at Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

arXiv.org e-Print Archive

TOLKIN – Tree of Life Knowledge and Information Network: Filling a Gap for Collaborative Research in Biological Systematics

Author: B Ludaescher
Christopher A. Dell
CS Parr
DE Soltis
DR Maddison
DS Carneiro-Torres
E Pennisi
ES Lander
Greg H. Traub
H-J Esser
HD Zhimin W
J Wieczorek
JC Venter
Jin Koh
M Gross
MA O'Leary
MB Jones
Nestor Santiago
Nico Cellinese
PH Pahlevani
RA Vos
Reed S. Beaman
Robert DeSalle
SD Kahn
T Oinn
TJ Vision
WK Michener
Publication venue: Public Library of Science
Publication date: 01/01/2012

The development of biological informatics infrastructure capable of supporting growing data management and analysis environments is an increasing need within the systematic biology community. Although significant progress has been made in recent years on developing new algorithms and tools for analyzing and visualizing large phylogenetic data and trees, implementation of these resources is often carried out by bioinformatics experts, using one-off scripts. Therefore, a gap exists in providing data management support for a large set of non-technical users. The TOLKIN project (Tree of Life Knowledge and Information Network) addresses this need by supporting capabilities to manage, integrate, and provide public access to molecular, morphological, and biocollections data and research outcomes through a collaborative web application. This data management framework allows aggregation and import of sequences along with the underlying documentation about their source, including vouchers, tissues, and DNA extraction. It combines features of LIMS and workflow environments by supporting management at the level of individual observations, sequences, and specimens, as well as assembly and versioning of data sets used in phylogenetic inference. As a web application, the system provides multi-user support that obviates current practices of sharing data sets as files or spreadsheets via email.

CiteSeerX

Public Library of Science (PLOS)

Crossref

PubMed Central

Research resources: curating the new eagle-i discovery system

Author: Brush Matthew
Corday Karen
Haendel Melissa
Johnson Tenille
Robinson David
Segerdell Erik
Shaffer Chris
Torniai Carlo
Vasilevsky Nicole
Wilson Melanie
Publication venue: Oxford University Press
Publication date: 17/05/2012

Development of biocuration processes and guidelines for new data types or projects is a challenging task. Each project finds its way toward defining annotation standards and ensuring data consistency with varying degrees of planning and different tools to support and/or report on consistency. Further, this process may be data type specific even within the context of a single project. This article describes our experiences with eagle-i, a 2-year pilot project to develop a federated network of data repositories in which unpublished, unshared or otherwise ‘invisible’ scientific resources could be inventoried and made accessible to the scientific community. During the course of eagle-i development, the main challenges we experienced related to the difficulty of collecting and curating data while the system and the data model were simultaneously built, and a deficiency and diversity of data management strategies in the laboratories from which the source data was obtained. We discuss our approach to biocuration and the importance of improving information management strategies to the research process, specifically with regard to the inventorying and usage of research resources. Finally, we highlight the commonalities and differences between eagle-i and similar efforts with the hope that our lessons learned will assist other biocuration endeavors.

Harvard University - DASH

PubMed Central

Task 51 - Cloud-Optimized Format Study

Author: Durbin Chris
Quinn Patrick
Shum Dana

The cloud infrastructure provides a number of capabilities that can dramatically improve access to and use of Earth Observation data. However, in many cases, data may need to be reorganized and/or reformatted in order to make them tractable to support cloud-native analysis/access patterns. The purpose of this study is to examine the pros and cons of different formats for storing data on the cloud. The evaluation will focus both on enabling high-performance data access and usage and on meeting the existing scientific data stewardship needs of EOSDIS.

NASA Technical Reports Server

Informatic system for a global tissue–fluid biorepository with a graph theory–oriented graphical user interface

Author: Bob Carter
Fred Hochberg
Nadia Atai
William E. Butler
Publication venue: Taylor & Francis Group
Publication date: 01/01/2014

The Richard Floor Biorepository supports collaborative studies of extracellular vesicles (EVs) found in human fluids and tissue specimens. The current emphasis is on biomarkers for central nervous system neoplasms but its structure may serve as a template for collaborative EV translational studies in other fields. The informatic system provides specimen inventory tracking with bar codes assigned to specimens, containers, and projects, is hosted on globalized cloud computing resources, and embeds a suite of shared documents, calendars, and video-conferencing features. Clinical data are recorded in relation to molecular EV attributes and may be tagged with terms drawn from a network of externally maintained ontologies, thus offering expansion of the system as the field matures. We fashioned the graphical user interface (GUI) around a web-based data visualization package. This system is now in an early stage of deployment, mainly focused on specimen tracking and clinical, laboratory, and imaging data capture in support of studies to optimize detection and analysis of brain tumour–specific mutations. It currently includes 4,392 specimens drawn from 611 subjects, the majority with brain tumours. As EV science evolves, we plan biorepository changes which may reflect multi-institutional collaborations, proteomic interfaces, additional biofluids, changes in operating procedures and kits for specimen handling, novel procedures for detection of tumour-specific EVs and for RNA extraction, and changes in the taxonomy of EVs. We have used an ontology-driven data model and web-based architecture with a graph theory–driven GUI to accommodate and stimulate the semantic web of EV science.

Directory of Open Access Journals

COINS: An Innovative Informatics and Neuroimaging Tool Suite Built for Large Heterogeneous Datasets

Author: Adam Scott
Dylan Wood
Jessica A Turner
Jody Roberts
Margaret King
Raul De la Garza
Runtang Wang
Susan Lane
Vince D Calhoun
William Courtney
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

The availability of well-characterized neuroimaging data with large numbers of subjects, especially for clinical populations, is critical to advancing our understanding of the healthy and diseased brain. Such data enables questions to be answered in a much more generalizable manner and also has the potential to yield solutions derived from novel methods that were conceived after the original studies’ implementation. Though there is currently growing interest in data sharing, the neuroimaging community has been struggling for years with how to best encourage sharing data across brain imaging studies. With the advent of studies that are much more consistent across sites (e.g., resting functional magnetic resonance imaging, diffusion tensor imaging, and structural imaging) the potential of pooling data across studies continues to gain momentum. At the Mind Research Network, we have developed the Collaborative Informatics and Neuroimaging Suite (COINS; http://coins.mrn.org) to provide researchers with an information system based on an open-source model that includes web-based tools to manage studies, subjects, imaging, clinical data, and other assessments. The system currently hosts data from nine institutions, over 300 studies, over 14,000 subjects, and over 19,000 MRI, MEG, and EEG scan sessions in addition to more than 180,000 clinical assessments. In this paper we provide a description of COINS with comparison to a valuable and popular system known as XNAT. Although COINS has many similarities to other electronic data management systems, we emphasize the differences that may matter to researchers in multi-site, multi-organizational data sharing environments, in particular intuitive ease of use and PHI security.

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

Towards Recovering Provenance with Experiment Explorer

Author: Abdulla G. M.
Asuncion H. U.
Carr C. W.
Davis D. B.
Publication venue: Lawrence Livermore National Laboratory
Publication date: 05/10/2012

UNT Digital Library