66 research outputs found

    The Galaxy platform for accessible, reproducible, and collaborative data analyses:2024 update

    Get PDF
    Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD), has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size, and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.</p

    The Galaxy platform for accessible, reproducible, and collaborative data analyses:2024 update

    Get PDF
    Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which enables complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD), has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating general purpose graphical processing units (GPGPU) access for cutting-edge methods, and licensed tool support. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size, and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying estimated CO2 emissions generated by each Galaxy job.</p

    Galaxy training: A powerful framework for teaching!

    Get PDF
    There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform

    FAIR data retrieval for sensitive clinical research data in Galaxy

    Get PDF
    Background: In clinical research, data have to be accessible and reproducible, but the generated data are becoming larger and analysis complex. Here we propose a platform for Findable, Accessible, Interoperable, and Reusable (FAIR) data access and creating reproducible findings. Standardized access to a major genomic repository, the European Genome-Phenome Archive (EGA), has been achieved with API services like PyEGA3. We aim to provide a FAIR data analysis service in Galaxy by retrieving genomic data from the EGA and provide a generalized “omics” platform for FAIR data analysis. Results: To demonstrate this, we implemented an end-to-end Galaxy workflow to replicate the findings from an RD-Connect synthetic dataset Beyond the 1 Million Genomes (synB1MG) available from the EGA. We developed the PyEGA3 connector within Galaxy to easily download multiple datasets from the EGA. We added the gene.iobio tool, a diagnostic environment for precision genomics, to Galaxy and demonstrate that it provides a more dynamic and interpretable view for trio analysis results. We developed a Galaxy trio analysis workflow to determine the pathogenic variants from the synB1MG trios using the GEMINI and gene.iobio tool. The complete workflow is available at WorkflowHub, and an associated tutorial was created in the Galaxy Training Network, which helps researchers unfamiliar with Galaxy to run the workflow. Conclusions: We showed the feasibility of reusing data from the EGA in Galaxy via PyEGA3 and validated the workflow by rediscovering spiked-in variants in synthetic data. Finally, we improved existing tools in Galaxy and created a workflow for trio analysis to demonstrate the value of FAIR genomics analysis in Galaxy.</p

    WeFaceNano:a user-friendly pipeline for complete ONT sequence assembly and detection of antibiotic resistance in multi-plasmid bacterial isolates

    Get PDF
    Background: Bacterial plasmids often carry antibiotic resistance genes and are a significant factor in the spread of antibiotic resistance. The ability to completely assemble plasmid sequences would facilitate the localization of antibiotic resistance genes, the identification of genes that promote plasmid transmission and the accurate tracking of plasmid mobility. However, the complete assembly of plasmid sequences using the currently most widely used sequencing platform (Illumina-based sequencing) is restricted due to the generation of short sequence lengths. The long-read Oxford Nanopore Technologies (ONT) sequencing platform overcomes this limitation. Still, the assembly of plasmid sequence data remains challenging due to software incompatibility with long-reads and the error rate generated using ONT sequencing. Bioinformatics pipelines have been developed for ONT-generated sequencing but require computational skills that frequently are beyond the abilities of scientific researchers. To overcome this challenge, the authors developed ‘WeFaceNano’, a user-friendly Web interFace for rapid assembly and analysis of plasmid DNA sequences generated using the ONT platform. WeFaceNano includes: a read statistics report; two assemblers (Miniasm and Flye); BLAST searching; the detection of antibiotic resistance- and replicon genes and several plasmid visualizations. A user-friendly interface displays the main features of WeFaceNano and gives access to the analysis tools. Results: Publicly available ONT sequence data of 21 plasmids were used to validate WeFaceNano, with plasmid assemblages and anti-microbial resistance gene detection being concordant with the published results. Interestingly, the “Flye” assembler with “meta” settings generated the most complete plasmids. Conclusions: WeFaceNano is a user-friendly open-source software pipeline suitable for accurate plasmid assembly and the detection of anti-microbial resistance genes in (clinical) samples where multiple plasmids can be present.</p

    IReport: A generalised Galaxy solution for integrated experimental reporting

    Get PDF
    Background: Galaxy offers a number of visualisation options with components, such as Trackster, Circster and Galaxy Charts, but currently lacks the ability to easily combine outputs from different tools into a single view or report. A number of tools produce HTML reports as output in order to combine the various output files from a single tool; however, this requires programming and knowledge of HTML, and the reports must be custom-made for each new tool.Findings: We have developed a generic and flexible reporting tool for Galaxy, iReport, that allows users to create interactive HTML reports directly from the Galaxy UI, with the ability to combine an arbitrary number of outputs from any number of different tools. Content can be organised into different tabs, and interactivity can be added to components. To demonstrate the capability of iReport we provide two publically available examples, the first is an iReport explaining about iReports, created for, and using content from the recent Galaxy Community Conference 2014. The second is a genetic report based on a trio analysis to determine candidate pathogenic variants which uses our previously developed Galaxy toolset for whole-genome NGS analysis, CGtag. These reports may be adapted for outputs from any sequencing platform and any results, such as omics data, non-high throughput results and clinical variables.Conclusions: iReport provides a secure, collaborative, and flexible web-based reporting system that is compatible with Galaxy (and non-Galaxy) generated content. We demonstrate its value with a real-life example of reporting genetic trio-analysis

    ASaiM: A Galaxy-based framework to analyze microbiota data

    Get PDF
    Background: New generations of sequencing platforms coupled to numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions out of microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an Open-Source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides an extensive collection of tools to assemble, extract, explore, and visualize microbiota information from raw metataxonomic, metagenomic, or metatranscriptomic sequences. To guide the analyses, several customizable workflows are included and are supported by tutorials and Galaxy interactive tours, which guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to thousands of datasets but also can be used on a normal PC. The associated source code is available under Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io). Conclusions: Based on the Galaxy framework, ASaiM offers a sophisticated environment with a variety of tools, workflows, documentation, and training to scientists working on complex microorganism communities. It makes analysis and exploration analyses of microbiota data easy, quick, transparent, reproducible, and shareable

    CGtag: Complete genomics toolkit and annotation in a cloud-based Galaxy

    Get PDF
    Background: Complete Genomics provides an open-source suite of command-line tools for the analysis of their CG-formatted mapped sequencing files. Determination of; for example, the functional impact of detected variants, requires annotation with various databases that often require command-line and/or programming experience; thus, limiting their use to the average research scientist. We have therefore implemented this CG toolkit, together with a number of annotation, visualisation and file manipulation tools in Galaxy called CGtag (Complete Genomics Toolkit and Annotation in a Cloud-based Galaxy).Findings: In order to provide research scientists with web-based, simple and accurate analytical and visualisation applications for the selection of candidate mutations from Complete Genomics data, we have implemented the open-source Complete Genomics tool set, CGATools, in Galaxy. In addition we implemented some of the most popular command-line annotation and visualisation tools to allow research scientists to select candidate pathological mutations (SNV, and indels). Furthermore, we have developed a cloud-based public Galaxy instance to host the CGtag toolkit and other associated modules.Conclusions: CGtag provides a user-friendly interface to all research scientists wishing to select candidate variants from CG or other next-generation sequencing platforms' data. By using a cloud-based infrastructure, we can also assure sufficient and on-demand computation and storage resources to handle the analysis tasks. The tools are freely available for use from an NBIC/CTMM-TraIT (The Netherlands Bioinformatics Center/Center for Translational Molecular Medicine) cloud-based Galaxy instance, or can be installed to a local (production) Galaxy via the NBIC Galaxy tool shed

    Development and evaluation of a culture-free microbiota profiling platform (MYcrobiota) for clinical diagnostics

    Get PDF
    Microbiota profiling has the potential to greatly impact on routine clinical diagnostics by detecting DNA derived from live, fastidious, and dead bacterial cells present within clinical samples. Such results could potentially be used to benefit patients by influencing antibiotic prescribing practices or to generate new classical-based diagnostic methods, e.g., culture or PCR. However, technical flaws in 16S rRNA gene next-generation sequencing (NGS) protocols, together with the requirement for access to bioinformatics, currently hinder the introduction of microbiota analysis into clinical diagnostics. Here, we report on the development and evaluation of an “end-to-end” microbiota profiling platform (MYcrobiota), which combines our previously validated micelle PCR/NGS (micPCR/NGS) methodology with an easy-to-use, dedicated bioinf

    ImmunoGlobulin galaxy (IGGalaxy) for simple determination and quantitation of immunoglobulin heavy chain rearrangements from NGS

    Get PDF
    Background: Sequence analysis of immunoglobulin heavy chain (IGH) gene rearrangements and frequency analysis is a powerful tool for studying the immune repertoire, immune responses and immune dysregulation in health and disease. The challenge is to provide user friendly, secure and reproducible analytical services that are available for both small and large laboratories which are determining VDJ repertoire using NGS technology. Results: In this study we describe ImmunoGlobulin Galaxy (IGGalaxy)- a convenient web based application for analyzing next-generation sequencing results and reporting IGH gene rearrangements for both repertoire and clonality studies. IGGalaxy has two analysis options one using the built in igBLAST algorithm and the second using output from IMGT; in either case repertoire summaries for the B-cell populations tested are available. IGGalaxy supports multi-sample and multi-replicate input analysis for both igBLAST and IMGT/HIGHV-QUEST. We demonstrate the technical validity of this platform using a standard dataset, S22, used for benchmarking the performance of antibody alignment utilities with a 99.9 % concordance with previous results. Re-analysis of NGS data from our samples of RAG-deficient patients demonstrated the validity and user friendliness of this tool. Conclusions: IGGalaxy provides clinical researchers with detailed insight into the repertoire of the B-cell population per individual sequenced and between control and pathogenic genomes. IGGalaxy was developed for 454 NGS results but is capable of analyzing alternative NGS data (e.g. Illumina, Ion Torrent). We demonstrate the use of a Galaxy virtual machine to determine the VDJ repertoire for reference data and from B-cells taken from immune deficient patients. IGGalaxy is available as a VM for download and use on a desktop PC or on a server
    • 

    corecore