69 research outputs found
Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology
The Galaxy Project offers the popular web browser-based platform Galaxy for running bioinformatics tools and constructing simple workflows. Here, we present a broad collection of additional Galaxy tools for large scale analysis of gene and protein sequences. The motivating research theme is the identification of specific genes of interest in a range of non-model organisms, and our central example is the identification and prediction of "effector" proteins produced by plant pathogens in order to manipulate their host plant. This functional annotation of a pathogen's predicted capacity for virulence is a key step in translating sequence data into potential applications in plant pathology. This collection includes novel tools, and widely-used third-party tools such as NCBI BLAST+ wrapped for use within Galaxy. Individual bioinformatics software tools are typically available separately as standalone packages, or in online browser-based form. The Galaxy framework enables the user to combine these and other tools to automate organism scale analyses as workflows, without demanding familiarity with command line tools and scripting. Workflows created using Galaxy can be saved and are reusable, so may be distributed within and between research groups, facilitating the construction of a set of standardised, reusable bioinformatic protocols. The Galaxy tools and workflows described in this manuscript are open source and freely available from the Galaxy Tool Shed (http://usegalaxy.org/toolshed or http://toolshed.g2.bx.psu.edu).
Alteration of the Route to Menaquinone towards Isochorismate-Derived Metabolites
Chorismate and isochorismate constitute branch-point intermediates in the biosynthesis of many aromatic metabolites in microorganisms and plants. To obtain unnatural compounds, we modified the route to menaquinone in Escherichia coli. We propose a model for the binding of isochorismate to the active site of MenD ((1R,2S,5S,6S)-2-succinyl-5-enolpyruvyl-6-hydroxycyclohex-3-ene-1-carboxylate (SEPHCHC) synthase) that explains the outcome of the native reaction with α-ketoglutarate. We have rationally designed variants of MenD for the conversion of several isochorismate analogues. The double-variant Asn117Arg–Leu478Thr preferentially converts (5S,6S)-5,6-dihydroxycyclohexa-1,3-diene-1-carboxylate (2,3-trans-CHD), the hydrolysis product of isochorismate, with a >70-fold higher ratio than that for the wild type. The single-variant Arg107Ile uses (5S,6S)-6-amino-5-hydroxycyclohexa-1,3-diene-1-carboxylate (2,3-trans-CHA) as substrate with >6-fold conversion compared to wild-type MenD. The novel compounds have been made accessible in vivo (up to 5.3 g L−1). Unexpectedly, as the identified residues such as Arg107 are highly conserved (>94 %), some of the designed variations can be found in wild-type SEPHCHC synthases from other bacteria (Arg107Lys, 0.3 %). This raises the question of the possible natural occurrence of as yet unexplored branches of the shikimate pathway.
Affiliation: Fries, Alexander Erich. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Ciencias de la Tierra y Ambientales de La Pampa. Universidad Nacional de La Pampa. Facultad de Ciencias Exactas y Naturales. Instituto de Ciencias de la Tierra y Ambientales de La Pampa; Argentina. Albert Ludwigs University of Freiburg; Germany
Affiliation: Mazzaferro, Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Ciencias de la Tierra y Ambientales de La Pampa. Universidad Nacional de La Pampa. Facultad de Ciencias Exactas y Naturales. Instituto de Ciencias de la Tierra y Ambientales de La Pampa; Argentina. Albert Ludwigs University of Freiburg; Germany
Affiliation: Grüning, Björn. Albert Ludwigs University of Freiburg; Germany
Affiliation: Bisel, Philippe. Albert Ludwigs University of Freiburg; Germany
Affiliation: Stibal, Karin. Albert Ludwigs University of Freiburg; Germany
Affiliation: Buchholz, Patrick C. F.. University of Stuttgart; Germany
Affiliation: Pleiss, Jürgen. Universität Stuttgart;
Affiliation: Sprenger, Georg A.. Universität Stuttgart;
Affiliation: Müller, Michael. Albert Ludwigs University of Freiburg; Germany
BioContainers: An open-source and community-driven framework for software standardization
Motivation: BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source frameworks Docker and rkt, which allow software to be installed and executed under an isolated and controlled environment. It also provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). Availability and Implementation: The software is freely available at github.com/BioContainers/.
The RNA workbench: Best practices for RNA and high-throughput sequencing bioinformatics in Galaxy
RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important but not yet standard to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaption to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools that are dedicated to different research areas of RNA biology including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up-to-date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis.
A proteomics sample metadata representation for multiomics integration and big data analysis
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update
Galaxy (homepage: https://galaxyproject.org, main public server: https://usegalaxy.org) is a web-based scientific analysis platform used by tens of thousands of scientists across the world to analyze large biomedical datasets such as those found in genomics, proteomics, metabolomics and imaging. Started in 2005, Galaxy continues to focus on three key challenges of data-driven biomedical science: making analyses accessible to all researchers, ensuring analyses are completely reproducible, and making it simple to communicate analyses so that they can be reused and extended. During the last two years, the Galaxy team and the open-source community around Galaxy have made substantial improvements to Galaxy's core framework, user interface, tools, and training materials. Framework and user interface improvements now enable Galaxy to be used for analyzing tens of thousands of datasets, and >5500 tools are now available from the Galaxy ToolShed. The Galaxy community has led an effort to create numerous high-quality tutorials focused on common types of genomic analyses. The Galaxy developer and user communities continue to grow and be integral to Galaxy's development. The number of Galaxy public servers, developers contributing to the Galaxy framework and its tools, and users of the main Galaxy server have all increased substantially
Community-driven ELIXIR activities in single-cell omics
Single-cell omics (SCO) has revolutionized the way and the level of resolution by which life science research is conducted, not only impacting our understanding of fundamental cell biology but also providing novel solutions in cutting-edge medical research. The rapid development of single-cell technologies has been accompanied by the active development of data analysis methods, resulting in a plethora of new analysis tools and strategies every year. Such a rapid development of SCO methods and tools poses several challenges in standardization, benchmarking, computational resources and training. These challenges are in line with the activities of ELIXIR, the European coordinated infrastructure for life science data. Here, we describe the current landscape of and the main challenges in SCO data, and propose the creation of the ELIXIR SCO Community, to coordinate the efforts in order to best serve SCO researchers in Europe and beyond. The Community will build on top of national experiences and pave the way towards integrated long-term solutions for SCO research.
Galaxy Training: A powerful framework for teaching!
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.
Community-Driven Data Analysis Training for Biology
The primary problem with the explosion of biomedical datasets is not the data, not computational resources, and not the required storage space, but the general lack of trained and skilled researchers to manipulate and analyze these data. Eliminating this problem requires development of comprehensive educational resources. Here we present a community-driven framework that enables modern, interactive teaching of data analytics in life sciences and facilitates the development of training materials. The key feature of our system is that it is not a static but a continuously improved collection of tutorials. By coupling tutorials with a web-based analysis framework, biomedical researchers can learn by performing computation themselves through a web browser without the need to install software or search for example datasets. Our ultimate goal is to expand the breadth of training materials to include fundamental statistical and data science topics and to precipitate a complete re-engineering of undergraduate and graduate curricula in life sciences. This project is accessible at https://training.galaxyproject.org. We developed an infrastructure that facilitates data analysis training in life sciences. It is an interactive learning platform tuned for current types of data and research problems. Importantly, it provides a means for community-wide content creation and maintenance and, finally, enables trainers and trainees to use the tutorials in a variety of situations, such as those where reliable Internet access is unavailable