1,531 research outputs found

    Rice Galaxy: An open resource for plant science

    Background: Rice molecular genetics, breeding, genetic diversity, and allied research (such as rice-pathogen interaction) have adopted sequencing technologies and high-density genotyping platforms for genome variation analysis and gene discovery. Germplasm collections representing rice diversity, improved varieties, and elite breeding materials are accessible through rice gene banks for use in research and breeding, and many accessions have genome sequences and high-density genotype data available. Combining phenotypic and genotypic information on these accessions enables genome-wide association analysis, which is driving quantitative trait loci discovery and molecular marker development. Comparative sequence analyses across quantitative trait loci regions facilitate the discovery of novel alleles. Analyses involving DNA sequences and large genotyping matrices for thousands of samples, however, pose a challenge to non-computer-savvy rice researchers. Findings: The Rice Galaxy resource provides shared datasets that include high-density genotypes from the 3,000 Rice Genomes project and sequences with corresponding annotations from 9 published rice genomes. The Rice Galaxy web server and deployment installer include tools for designing single-nucleotide polymorphism assays, running genome-wide association analyses, assessing population diversity, diagnosing rice bacterial pathogens, and applying a suite of published genomic prediction methods. A prototype Rice Galaxy compliant with Open Access, Open Data, and Findable, Accessible, Interoperable, and Reproducible principles is also presented. Conclusions: Rice Galaxy is a freely available resource that empowers the plant research community to perform state-of-the-art analyses and utilize publicly available big datasets for both fundamental and applied science.
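    Because Rice Galaxy is a standard Galaxy deployment, its tools can also be driven programmatically through the Galaxy API rather than through the web interface. Below is a minimal sketch using the BioBlend client library; the server URL, API key, tool id, and input parameter names are hypothetical placeholders, and real tool ids would need to be looked up on the live server.

```python
# Sketch: driving a Rice Galaxy GWAS tool through the Galaxy API with the
# BioBlend client. The server URL, API key, tool id, and input parameter
# names are hypothetical placeholders, not the live server's actual values.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://rice-galaxy.example.org", key="YOUR_API_KEY")

# Work in a fresh history; upload a genotype matrix and a phenotype table.
history = gi.histories.create_history(name="rice-gwas-demo")
genotypes = gi.tools.upload_file("genotypes.vcf", history["id"])
phenotypes = gi.tools.upload_file("phenotypes.tsv", history["id"])

# Launch a (hypothetical) GWAS tool. Real tool ids can be discovered with
# gi.tools.get_tools() and inspected with gi.tools.show_tool(tool_id).
run = gi.tools.run_tool(
    history_id=history["id"],
    tool_id="rice_gwas",  # placeholder tool id
    tool_inputs={
        "genotype_input": {"src": "hda", "id": genotypes["outputs"][0]["id"]},
        "phenotype_input": {"src": "hda", "id": phenotypes["outputs"][0]["id"]},
    },
)
print("submitted job:", run["jobs"][0]["id"])
```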

    Reproducible big data science: A case study in continuous FAIRness.

    Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.
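    The identifier-per-artifact idea at the heart of this approach can be illustrated independently of the paper's specific tooling: derive an identifier from the content itself and record it with minimal metadata in a manifest, so any later copy of the bytes resolves to the same id. A minimal sketch of that pattern follows; the manifest layout and field names are illustrative assumptions, not the tools described in the paper.

```python
# Sketch: a content-derived identifier for a data file, recorded in a local
# JSON manifest. This illustrates the identifier-per-artifact idea only; it
# is not the specific tooling used in the paper.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def content_id(path: str) -> str:
    """Return a sha256-based identifier derived from the file's bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()

def register(path: str, manifest: str = "manifest.json") -> str:
    """Record an identifier entry for `path` in the manifest and return it."""
    mf = Path(manifest)
    entries = json.loads(mf.read_text()) if mf.exists() else []
    record = {
        "id": content_id(path),
        "filename": path,
        "registered": datetime.now(timezone.utc).isoformat(),
    }
    entries.append(record)
    mf.write_text(json.dumps(entries, indent=2))
    return record["id"]
```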

    The Research Object Suite of Ontologies: Sharing and Exchanging Research Data and Methods on the Open Web

    Research in life sciences is increasingly being conducted in a digital and online environment. In particular, life scientists have been pioneers in embracing new computational tools to conduct their investigations. To support the sharing of digital objects produced during such research investigations, specialized repositories such as DataVerse and FigShare have emerged in the last few years. Such repositories provide users with the means to share and publish datasets that were used or generated in research investigations. While these repositories have proven their usefulness, interpreting and reusing evidence for most research results is a challenging task. Additional contextual descriptions are needed to understand how those results were generated and/or the circumstances under which they were concluded. Because of this, scientists are calling for models that go beyond the publication of datasets to systematically capture the life cycle of scientific investigations and provide a single entry point to access the information about the hypothesis investigated, the datasets used, the experiments carried out, the results of the experiments, the people involved in the research, etc. In this paper we present the Research Object (RO) suite of ontologies, which provide a structured container to encapsulate research data and methods along with essential metadata descriptions. Research Objects are portable units that enable the sharing, preservation, interpretation and reuse of research investigation results. The ontologies we present have been designed in the light of requirements that we gathered from life scientists. They have been built upon existing popular vocabularies to facilitate interoperability. Furthermore, we have developed tools to support the creation and sharing of Research Objects, thereby promoting and facilitating their adoption.
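    As a rough illustration of what such a structured container looks like in practice, the sketch below builds a tiny RO description with rdflib, using the Wf4Ever RO and OAI-ORE vocabularies that the suite builds on. The resource URIs and metadata values are illustrative placeholders, not an RO from the paper.

```python
# Sketch: a minimal Research Object description in RDF, using the wf4ever RO
# and OAI-ORE vocabularies. All URIs and literals are placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

RO = Namespace("http://purl.org/wf4ever/ro#")
ORE = Namespace("http://www.openarchives.org/ore/terms/")

g = Graph()
g.bind("ro", RO)
g.bind("ore", ORE)
g.bind("dcterms", DCTERMS)

ro_uri = URIRef("http://example.org/ro/metabolite-study/")
dataset = URIRef(ro_uri + "data/metabolites.csv")
workflow = URIRef(ro_uri + "workflow/analysis.t2flow")

# The RO is an aggregation of its constituent resources...
g.add((ro_uri, RDF.type, RO.ResearchObject))
for resource in (dataset, workflow):
    g.add((ro_uri, ORE.aggregates, resource))
    g.add((resource, RDF.type, RO.Resource))

# ...plus essential metadata describing the investigation.
g.add((ro_uri, DCTERMS.title, Literal("Human metabolite variation study")))
g.add((ro_uri, DCTERMS.creator, Literal("Example Lab")))

print(g.serialize(format="turtle"))
```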

    Structuring research methods and data with the research object model: genomics workflows as a case study

    Background: One of the main challenges for biomedical research lies in the computer-assisted integrative study of large and increasingly complex combinations of data in order to understand molecular mechanisms. The preservation of the materials and methods of such computational experiments with clear annotations is essential for understanding an experiment, and this is increasingly recognized in the bioinformatics community. Our assumption is that offering means of digital, structured aggregation and annotation of the objects of an experiment will provide necessary metadata for a scientist to understand and recreate the results of an experiment. To support this we explored a model for the semantic description of a workflow-centric Research Object (RO), where an RO is defined as a resource that aggregates other resources, e.g., datasets, software, spreadsheets, text, etc. We applied this model to a case study in which we analysed human metabolite variation using workflows. Results: We present the application of the workflow-centric RO model for our bioinformatics case study. Three workflows were produced following recently defined Best Practices for workflow design. By modelling the experiment as an RO, we were able to automatically query the experiment and answer questions such as "which particular data was input to a particular workflow to test a particular hypothesis?" and "which particular conclusions were drawn from a particular workflow?". Conclusions: Applying a workflow-centric RO model to aggregate and annotate the resources used in a bioinformatics experiment allowed us to retrieve the conclusions of the experiment in the context of the driving hypothesis, the executed workflows and their input data. The RO model is an extendable reference model that can be used by other systems as well. Availability: The Research Object is available at http://www.myexperiment.org/packs/428. The Wf4Ever Research Object Model is available at http://wf4ever.github.io/r
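    The competency questions quoted above translate naturally into SPARQL queries over the RO's provenance annotations. Below is a hedged sketch using rdflib and the Wf4Ever wfprov vocabulary; the manifest file name and the exact graph layout are assumptions about one possible RO serialization, not the case study's actual files.

```python
# Sketch: answering "which data was input to a particular workflow?" by
# querying an RO's provenance graph with SPARQL. The manifest path and the
# exact graph layout are assumptions about one possible RO serialization.
from rdflib import Graph

g = Graph()
g.parse("ro/manifest.ttl", format="turtle")  # hypothetical RO manifest

query = """
PREFIX wfprov: <http://purl.org/wf4ever/wfprov#>
SELECT ?workflow ?input
WHERE {
    ?run a wfprov:WorkflowRun ;
         wfprov:describedByWorkflow ?workflow ;
         wfprov:usedInput ?input .
}
"""
for row in g.query(query):
    print(f"{row.workflow} consumed {row.input}")
```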

    CWLProv - Interoperable Retrospective Provenance capture and its challenges

    The automation of data analysis in the form of scientific workflows is a widely adopted practice in many fields of research nowadays. Computationally driven data-intensive experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP). However, there are still several challenges associated with the effective sharing, publication, understandability and reproducibility of such workflows, due to the incomplete capture of provenance and the dependence on particular technical (software) platforms. This paper presents CWLProv, an approach for retrospective provenance capture utilizing open source, community-driven standards, involving the application and customization of workflow-centric Research Objects (ROs, http://www.researchobject.org/). The ROs are produced as an output of a workflow enactment defined in the Common Workflow Language (CWL, http://www.commonwl.org/) using the CWL reference implementation and its data structures. The approach aggregates and annotates all the resources involved in the scientific investigation, including inputs, outputs, the workflow specification, command line tool specifications, and input parameter settings. The resources are linked within the RO to enable re-enactment of an analysis without depending on external resources. The workflow provenance profile is represented in the W3C recommended standards PROV-N (https://www.w3.org/TR/prov-n/) and PROV-JSON (https://www.w3.org/Submission/prov-json/) to capture retrospective provenance of the workflow enactment. The workflow-centric RO produced as an output of a CWL workflow enactment is expected to be interoperable, reusable, shareable and portable across different platforms. This paper describes the need and motivation for CWLProv (https://github.com/common-workflow-language/cwltool/tree/provenance) and the lessons learned in applying it to ROs using CWL in the bioinformatics domain.
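    Because the provenance profile is serialized as W3C PROV-JSON, a captured enactment can be inspected with nothing more than a JSON parser. A small sketch follows that lists the recorded activities and the entities they consumed; the file path is a placeholder for wherever a given CWLProv RO stores its primary provenance file, and the key names follow the PROV-JSON structure (top-level "activity", "entity", and "used" maps).

```python
# Sketch: inspecting a PROV-JSON provenance file with the standard library.
# The path below is a placeholder; key names follow the W3C PROV-JSON
# structure (top-level "activity", "entity", and "used" maps).
import json

with open("metadata/provenance/primary.cwlprov.json") as f:  # placeholder path
    doc = json.load(f)

# Each activity is one step (or the whole run) of the workflow enactment.
for act_id, attrs in doc.get("activity", {}).items():
    print("activity:", act_id, attrs.get("prov:label", ""))

# "used" records link activities to the entities (files, values) they consumed.
for usage in doc.get("used", {}).values():
    print(f'  {usage["prov:activity"]} used {usage["prov:entity"]}')
```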