Galaxy: A Decade of Realising CWFR Concepts
Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.
A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive
Abstract
Summary
Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the European recommended open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the possibility to remove human reads prior to submission. A Galaxy wrap of the tool allows users with little or no bioinformatic knowledge to do bulk sequencing read submissions. The tool is also packed in a Docker container to ease deployment.
Availability
CLI ENA upload tool is available at github.com/usegalaxy-eu/ena-upload-cli (DOI 10.5281/zenodo.4537621); Galaxy ENA upload tool at toolshed.g2.bx.psu.edu/view/iuc/ena_upload/382518f24d6d and https://github.com/galaxyproject/tools-iuc/tree/master/tools/ena_upload (development); and ENA upload Galaxy container at github.com/ELIXIR-Belgium/ena-upload-container (DOI 10.5281/zenodo.4730785).
The Arabidopsis condensin CAP‐D subunits arrange interphase chromatin
Condensins are best known for their role in shaping chromosomes. Other functions such as organizing interphase chromatin and transcriptional control have been reported in yeasts and animals, but little is known about their function in plants. To elucidate the specific composition of condensin complexes and the expression of CAP-D2 (condensin I) and CAP-D3 (condensin II), we performed biochemical analyses in Arabidopsis. The role of CAP-D3 in interphase chromatin organization and function was evaluated using cytogenetic and transcriptome analysis in cap-d3 T-DNA insertion mutants. CAP-D2 and CAP-D3 are highly expressed in mitotically active tissues. In silico and pull-down experiments indicate that both CAP-D proteins interact with the other condensin I and II subunits. In cap-d3 mutants, an association of heterochromatic sequences occurs, but the nuclear size and the general histone and DNA methylation patterns remain unchanged. Also, CAP-D3 influences the expression of genes affecting the response to water, chemicals, and stress. The expression and composition of the condensin complexes in Arabidopsis are similar to those in other higher eukaryotes. We propose a model for the CAP-D3 function during interphase in which CAP-D3 localizes in euchromatin loops to stiffen them and consequently separates centromeric regions and 45S rDNA repeats.
Enhancing RDM in Galaxy by integrating RO-Crate
We introduce how the Galaxy research environment (Jalili et al. 2020) integrates with RO-Crate as an implementation of Findable, Accessible, Interoperable, Reusable Digital Objects (FAIR Digital Objects / FDO) (Wilkinson et al. 2016, Schultes and Wittenburg 2018), and how using RO-Crate as an exchange mechanism for workflows and their execution history helps integrate Galaxy with the wider ecosystem of ELIXIR (Harrow et al. 2021) and the European Open Science Cloud (EOSC-Life) to enable FAIR and reproducible data analysis.
RO-Crate (Soiland-Reyes et al. 2022) is a generic packaging format containing datasets and their description using standards for FAIR Linked Data. The format is based on schema.org (Guha et al. 2016) annotations in JSON-LD, which allows for rich metadata representation. The RO-Crate effort aims to make best practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data to large data-intensive computational research environments.
The RO-Crate community brings together practitioners from very different backgrounds, with different motivations and use cases. Among the core target users are: researchers engaged with computation and data-intensive, workflow-driven analysis; digital repository managers and infrastructure providers; individual researchers looking for a straightforward tool or how-to guide to "FAIRify" their data; and data stewards supporting research projects in creating and curating datasets.
Given the wide applicability of RO-Crate and the lack of practical implementations of FDOs, ELIXIR (Harrow et al. 2021) co-opted this initiative as the project to define a common format for research data exchange and repository entries. Thus, during the last year it has been implemented in a wide range of services. WorkflowHub (Goble et al. 2021), a registry for describing, sharing and publishing scientific computational workflows, uses RO-Crates as an exchange format to improve reproducibility of computational workflows that follow the Workflow RO-Crate profile (Bacall et al. 2022). LifeMonitor (Leo et al. 2022), a service to support the sustainability of computational workflows being developed as part of the EOSC-Life project, uses RO-Crate as an exchange format for describing test suites associated with workflows. Tools have been developed to aid these use cases and to increase the general usability of RO-Crates by providing user-friendly programmatic libraries for consuming and producing RO-Crates (ro-crate-py, De Geest et al. 2022; ro-crate-ruby, Bacall and Whitwell 2022; ro-crate-js, Lynch et al. 2021).
The Galaxy project provides a research environment with data analysis and data management functionalities as a multi-user platform, aiming to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. As such, it stores not just analysis-related data but also the complete analytical workflow, including its metadata. The internal data model involves the history entity, including all steps performed in a specific analysis, and the workflow entity, defining the structure of an analytical pipeline. From the start, Galaxy has aimed to enable reproducible analyses by providing capabilities to export (and import) all the analysis history details and workflow data and metadata in a FAIR way. As such, it helps its users with daily research data management. The Galaxy community is continuously improving and adding features, and the integration of the FAIR Digital Object principles is a natural next step.
To support these FDOs, Galaxy leverages the RO-Crate Python client library (De Geest et al. 2022) and provides multiple entry points to import and export different research data objects representing its internal entities and associated metadata. These objects include: a workflow definition, which is used to share or publish the details of an analysis pipeline, including the graph of tools that need to be executed and metadata about the data types required; individual data files or a collection of datasets related to an analysis history; and a compressed archive of the entire analysis history, including the metadata associated with it, such as the tools used, their versions, the parameters chosen, workflow invocation related metadata, inputs, outputs, license, author, CWLProv description (Khan et al. 2019) of the workflow, contextual references in the form of Digital Object Identifiers (DOIs), 'EMBRACE Data And Methods' ontology (EDAM) terms (Ison et al. 2013), etc.
The adoption of RO-Crate by Galaxy allows a standardised exchange of FDOs with other platforms in the ELIXIR Tools ecosystem, such as WorkflowHub and LifeMonitor. Integrating RO-Crate deeply into Galaxy and offering import and export options of various Galaxy objects such as Research Objects allows for increased standardisation, improved Research Data Management (RDM) functionalities, a smoother user experience (UX) and improved interoperability with other systems. This integration in a platform used by biologists for data-intensive analysis facilitates the publication of workflows and workflow invocations for all skill levels and democratises the ability to perform Open Science.
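To make the packaging format concrete, here is a minimal sketch of the JSON-LD that an `ro-crate-metadata.json` file contains, written with the Python standard library only. It does not use ro-crate-py or Galaxy, and it is not Galaxy's export code. The helper name `build_crate_metadata`, the file names, the date and the licence are all hypothetical values chosen for illustration; only the overall layout (a metadata descriptor plus a root `Dataset` that lists its parts) follows the RO-Crate 1.1 convention.

```python
import json


def build_crate_metadata(name, description, date_published, license_url, files):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict.

    `files` maps a relative path to a (display name, media type) tuple.
    All argument values are supplied by the caller; nothing is read from disk.
    """
    file_entities = [
        {"@id": path, "@type": "File", "name": label, "encodingFormat": media_type}
        for path, (label, media_type) in files.items()
    ]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Metadata descriptor: says which entity is the root of the crate.
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root dataset: describes the crate and references its files.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": e["@id"]} for e in file_entities],
            },
            *file_entities,
        ],
    }


# Hypothetical example values, not taken from any real Galaxy export.
metadata = build_crate_metadata(
    name="Example analysis history",
    description="Hypothetical export of a workflow and one result file.",
    date_published="2023-01-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    files={
        "workflow.ga": ("Example workflow", "application/json"),
        "results/table.tsv": ("Result table", "text/tab-separated-values"),
    },
)
print(json.dumps(metadata, indent=2))
```

A real crate exported by a workflow system would carry far more context (tool versions, parameters, provenance), but the flat `@graph` of entities cross-referenced by `@id` stays the same.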
ro-crate-py 0.8.0: RO-Crate metadata generator/parser
What's Changed
Test to check the performance of adding a data entity by @simleo in https://github.com/ResearchObject/ro-crate-py/pull/134
Fix missing file by @simleo in https://github.com/ResearchObject/ro-crate-py/pull/136
Fix typos in notebooks by @kinow in https://github.com/ResearchObject/ro-crate-py/pull/144
Fix typo on README by @kinow in https://github.com/ResearchObject/ro-crate-py/pull/145
Add Autosubmit language by @kinow in https://github.com/ResearchObject/ro-crate-py/pull/143
Add methods for adding and updating JSON-LD directly (partials for WMS) by @kinow in https://github.com/ResearchObject/ro-crate-py/pull/149
Remove version from ComputerLanguage by @simleo in https://github.com/ResearchObject/ro-crate-py/pull/150
Add add_tree method by @simleo in https://github.com/ResearchObject/ro-crate-py/pull/151
Remove engine version default by @simleo in https://github.com/ResearchObject/ro-crate-py/pull/152
New Contributors
@kinow made their first contribution in https://github.com/ResearchObject/ro-crate-py/pull/144
Full Changelog: https://github.com/ResearchObject/ro-crate-py/compare/0.7.0...0.8.0
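The release notes mention methods for adding and updating JSON-LD directly but do not show their signatures. The sketch below is therefore library-independent: it illustrates the general idea of inserting or merging an entity in a JSON-LD `@graph` by its `@id`, using plain Python. The helper `upsert_entity` and the example entities are hypothetical and are not part of the ro-crate-py API.

```python
def upsert_entity(graph, entity):
    """Add `entity` to a JSON-LD @graph list, or merge it into the existing
    entry that has the same @id. Returns the entity stored in the graph."""
    if "@id" not in entity:
        raise ValueError("a JSON-LD entity needs an '@id' to be addressable")
    for existing in graph:
        if existing.get("@id") == entity["@id"]:
            # Same identifier: overwrite or extend the existing properties.
            existing.update(entity)
            return existing
    stored = dict(entity)  # copy so later changes to `entity` do not leak in
    graph.append(stored)
    return stored


# Hypothetical graph with a root dataset only.
graph = [{"@id": "./", "@type": "Dataset", "name": "Demo crate"}]

# First call adds a new entity, second call updates it in place.
upsert_entity(graph, {"@id": "#run-1", "@type": "CreateAction", "name": "First run"})
upsert_entity(graph, {"@id": "#run-1", "endTime": "2023-01-01T12:00:00Z"})
```

Keying on `@id` keeps the graph free of duplicate entities, which matters when a workflow management system writes partial metadata at several points during a run.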