28 research outputs found

    A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive

    Get PDF
    Abstract Summary Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the European recommended open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the possibility to remove human reads prior to submission. A Galaxy wrap of the tool allows users with little or no bioinformatic knowledge to do bulk sequencing read submissions. The tool is also packed in a Docker container to ease deployment. Availability CLI ENA upload tool is available at github.com/usegalaxy-eu/ena-upload-cli (DOI 10.5281/zenodo.4537621); Galaxy ENA upload tool at toolshed.g2.bx.psu.edu/view/iuc/ena_upload/382518f24d6d and https://github.com/galaxyproject/tools-iuc/tree/master/tools/ena_upload (development) and; ENA upload Galaxy container at github.com/ELIXIR-Belgium/ena-upload-container (DOI 10.5281/zenodo.4730785) </jats:sec

    Small RNA profiling of low biomass samples: identification and removal of contaminants

    Get PDF
    Background Sequencing-based analyses of low-biomass samples are known to be prone to misinterpretation due to the potential presence of contaminating molecules derived from laboratory reagents and environments. DNA contamination has been previously reported, yet contamination with RNA is usually considered to be very unlikely due to its inherent instability. Small RNAs (sRNAs) identified in tissues and bodily fluids, such as blood plasma, have implications for physiology and pathology, and therefore the potential to act as disease biomarkers. Thus, the possibility for RNA contaminants demands careful evaluation. Results Herein, we report on the presence of small RNA (sRNA) contaminants in widely used microRNA extraction kits and propose an approach for their depletion. We sequenced sRNAs extracted from human plasma samples and detected important levels of non-human (exogenous) sequences whose source could be traced to the microRNA extraction columns through a careful qPCR-based analysis of several laboratory reagents. Furthermore, we also detected the presence of artefactual sequences related to these contaminants in a range of published datasets, thereby arguing in particular for a re-evaluation of reports suggesting the presence of exogenous RNAs of microbial and dietary origin in blood plasma. To avoid artefacts in future experiments, we also devise several protocols for the removal of contaminant RNAs, define minimal amounts of starting material for artefact-free analyses, and confirm the reduction of contaminant levels for identification of bona fide sequences using ‘ultra-clean’ extraction kits. Conclusion This is the first report on the presence of RNA molecules as contaminants in RNA extraction kits. The described protocols should be applied in the future to avoid confounding sRNA studies. Keywords: RNA sequencing; Artefact removal; Exogenous RNA in human blood plasma; Contaminant RNA; Spin column

    The RNA workbench: Best practices for RNA and high-throughput sequencing bioinformatics in Galaxy

    Get PDF
    RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important but not yet standard to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework the workbench guarantees simple access, easy extension, flexible adaption to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools that are dedicated to different research areas of RNA biology including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keep the workbench up-to-date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis

    Community-Driven Data Analysis Training for Biology

    Get PDF
    The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org. We developed an infrastructure that facilitates data analysis training in life sciences. It is an interactive learning platform tuned for current types of data and research problems. Importantly, it provides a means for community-wide content creation and maintenance and, finally, enables trainers and trainees to use the tutorials in a variety of situations, such as those where reliable Internet access is unavailable

    Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling

    Get PDF
    Abstract Background DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq. Results Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases and correct for these protocol-specific biases when performing footprinting. Despite the differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases in conjunction with the sequence content of TFBSs impact the discrimination of footprint from the background, which leads to one method outperforming the other for some TFs. Finally, we address the depth required for reproducible identification of open chromatin regions and TF footprints. Conclusions We demonstrate that the impact of bias correction on footprinting performance is greater for DNase-seq than for ATAC-seq and that DNase-seq footprinting leads to better performance. It is possible to infer concordant footprints by using replicates, highlighting the importance of reproducibility assessment. The results presented here provide an overview of the advantages and limitations of footprinting analyses using ATAC-seq and DNase-seq

    A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive

    Get PDF
    Many aspects of the global response to the COVID-19 pandemic are enabled by the fast and open publication of SARS-CoV-2 genetic sequence data. The European Nucleotide Archive (ENA) is the European recommended open repository for genetic sequences. In this work, we present a tool for submitting raw sequencing reads of SARS-CoV-2 to ENA. The tool features a single-step submission process, a graphical user interface, tabular-formatted metadata and the possibility to remove human reads prior to submission. A Galaxy wrap of the tool allows users with little or no bioinformatic knowledge to do bulk sequencing read submissions. The tool is also packed in a Docker container to ease deployment

    Training data for 'From peaks to gene' tutorial (Galaxy Training Material)

    No full text
    <p>The data provided here are part of a Galaxy Training Network tutorial that analyzes peaks from a study published by Li et al., 2012 (DOI:10.1016/j.stem.2012.04.023) to identify target genes</p
    corecore