46 research outputs found
Galaxy Training: A powerful framework for teaching!
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. To address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis, using the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond the life sciences to include climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. In recent years, we have focused our efforts on increasing support for this growing community of instructors. New features have been added to facilitate the use of the materials in a classroom setting and to simplify the contribution flow for new materials, and a set of train-the-trainer lessons has been added. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.
FAIRing Research Data to Live
Challenges: With the digitization of biological research, the quantity and throughput of data are increasing. Many research teams and laboratories are still using medium-term data management strategies, in the absence of global, sustainable solutions within their reach. With the implementation of an open science policy and publishers' incentives to make datasets available at the time of publication, real strategies need to be considered on a daily basis to anticipate data flows. The current development of tools and their centralization for management and analysis must be accompanied by the necessary consideration of their life cycle, in order to best parameterize and anticipate needs. Context: The regional deployment of IT resources for biology, supported by the Equipex 4DOMICs, access to the OMERO UniCA-EMBRC-Fr database, and the reflection on bioinformatics supported by Academy 4 of the Idex are all local factors contributing to the reflection and awareness of biology research teams at UniCA. This day is part of the local and national context of open science, with the deployment of data workshops such as DATAZUR, the Equipex mudis4LS, and the data management deployment strategies of Infrastructures en Biologie Santé, including France BioImaging with FBI.data. Objective: Raise awareness of, and build familiarity with, open science, data management policy and data protection. In this context, the actions of the AuBi platform are presented within the scope of the Clermont-Auvergne Mesocenter and the IFB (Institut Français de Bioinformatique).
Proposing a general-purpose workflow to identify protein-protein interactions between two tissues. The example of PPIs between the secretome and the surfaceome to understand the dialogue between tissues in mammals, and in cattle in particular
The prediction of protein-protein interactions is studied in different domains. In particular, knowledge of the interactions between the surfaceome and the secretome may strongly improve the understanding of inter-tissue crosstalk. This internship aimed to develop a Snakemake-based workflow named Talkmine to identify the molecular dialogue between two biological tissues that results from protein-protein interactions. The first objective was to identify and test open-source tools known to predict whether proteins belong to the secretome or the surfaceome. Publicly available tools were classified into three classes: signal peptide, subcellular location, or topology prediction. Secondly, we developed the workflow. In brief, the user provides a list of gene or protein identifiers. The g:Convert tool converts the identifiers to a common format (ENSP), and the Entrez-Direct tool then retrieves the sequences into a multi-FASTA file. The protein sequences are then submitted to the predictive tools. Finally, proteins assigned to the secretome and surfaceome classes are sent to the PSICQUIC tool, which determines the protein-protein interactions. The user has access to the list of these interactions as well as to the intermediate results. The Talkmine workflow was initially developed for Bos taurus, to determine the interactions between muscle and adipose tissue, and it can also be applied to other species and research purposes.
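The final step described above, keeping only interactions that link a secretome protein to a surfaceome protein, can be illustrated with a minimal Python sketch. It is not Talkmine's code: the identifiers, the class labels and the interaction set are hypothetical stand-ins for the output of the prediction tools and of an interaction query.

```python
from itertools import product

# Hypothetical predictor output: protein identifier -> predicted class.
predicted_class = {
    "P1": "secretome",
    "P2": "surfaceome",
    "P3": "surfaceome",
    "P4": "other",
}

# Hypothetical interaction records (unordered pairs), standing in for
# what an interaction query service would return.
known_interactions = {
    frozenset(("P1", "P2")),
    frozenset(("P2", "P3")),
    frozenset(("P1", "P4")),
}


def crosstalk_pairs(classes, interactions):
    """Return (secreted, surface) pairs that have a known interaction."""
    secreted = sorted(p for p, c in classes.items() if c == "secretome")
    surface = sorted(p for p, c in classes.items() if c == "surfaceome")
    return [
        (s, r)
        for s, r in product(secreted, surface)
        if frozenset((s, r)) in interactions
    ]


pairs = crosstalk_pairs(predicted_class, known_interactions)
print(pairs)
```

Storing each interaction as a frozenset makes the lookup independent of the order in which the two partners are reported.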
A Newly Opened Galaxy Platform at Clermont Auvergne University
The Mesocentre, part of Clermont Auvergne University (UCA), delivers scientific data computing services (HPC, VM, …) and short-term storage through a network of technology core facilities. These services are offered to assist multidisciplinary scientists in their computing projects. We currently host a compute farm of about 800 cores, 40 nodes for moderate memory usage (<256 GB) and an SMP supercomputer with 384 cores and 12 TB of memory, in addition to a scalable storage cluster managed with Ceph, offering at least 1 TB per user and 1.2 PB in total. Hosted by the Mesocentre, the AuBi (Auvergne Bioinformatics) platform is a member of the IFB (French Bioinformatics Institute, https://www.france-bioinformatique.fr/). The AuBi platform aims to share expertise and knowledge in large-scale data processing and analysis by supplying a complete computing environment, with hardware and software infrastructure, for UCA research laboratories. To meet this goal, a Galaxy server was installed to give non-bioinformatician biologists easier access to computing. As the AuBi platform is involved in various projects, roughly 50 tools related to genomics, metagenomics, transcriptomics and epigenetics were installed on the Galaxy instance, as well as in-house tools from a local toolshed. From an infrastructure point of view, our Galaxy server sits behind a reverse proxy server, galaxy.mesocentre.uca.fr. The server runs on a virtual machine with an extensible disk on a scalable storage cluster (RBD / Ceph). It is connected to the HPC facilities through an NFS server, so that Galaxy can benefit from the Mesocentre's computing and storage capabilities. Users authenticate through an LDAP directory shared between the Galaxy server and the cluster. In addition, the BioMAJ [1] framework was deployed to give both Galaxy and the cluster access to shared databanks. In the future, we plan to open access to users from the IFB community via an eduGAIN connection. Finally, we expect a significant contribution from UCA laboratories in migrating local tools to the IUC toolshed.
AuBi platform for biologists and bioinformaticians at UCA Mesocentre
The Mesocentre, part of Clermont Auvergne University (UCA), delivers scientific data computing services (HPC, VM, …) and short-term storage through a network of technology core facilities. These services are offered to assist multidisciplinary scientists in their computing projects. We currently host a compute farm of about 800 cores, 40 nodes for moderate memory usage (<256 GB) and an SMP supercomputer with 384 cores and 12 TB of memory, in addition to Ceph storage of at least 1 TB per user. Hosted by the Mesocentre, the Auvergne Bioinformatics (AuBi) platform is a member of the French Bioinformatics Institute (IFB, https://www.france-bioinformatique.fr/en/platforms/AUBI). The AuBi platform aims to share expertise and knowledge in large-scale data processing and analysis by supplying a complete computing environment, with hardware and software infrastructure, for 9 research laboratories. The AuBi platform is accordingly involved in various projects in genomics, metagenomics, transcriptomics, modeling and imaging, among others [1,2,3]. Furthermore, we support UCA laboratories and their associates in their efforts to maintain and enhance the scripts and pipelines used on our infrastructure. Another aspect of the AuBi platform's work is to facilitate access to computing for non-bioinformatician biologists through a Galaxy server, to be released in the coming weeks. We are also organizing training sessions to help our users, whether biologists or bioinformaticians, optimize their use of computing resources through the command-line interface and the Galaxy environment. References: 1. Amato P., Joly M., Besaury L., Oudart A., Taib N., Moné A., Deguillaume L., Delort A.M. and Debroas D. (2017). Active microorganisms thrive among extremely diverse communities in cloud water. PLoS ONE 12(8):e0182869. 2. Gasc C., Constantin A., Jaziri F. and Peyret P. (2017). OCaPPI-Db: an oligonucleotide probe database for pathogen identification through hybridization capture. Database (Oxford) 2017. 3. Parisot N., Peyretaillade E., Dugat-Bony E., Denonfoux J., Mahul A. and Peyret P. (2016). Probe Design Strategies for Oligonucleotide Microarrays. Methods Mol Biol 1368:67-82.
Auvergne Bioinformatics platform at UCA Mesocentre
The Mesocentre, part of Clermont Auvergne University (UCA), provides high-performance computing services for scientific data and short-term storage through a network of technology core facilities. These services are offered to assist multidisciplinary scientists in their computing projects. We currently host a compute farm of about 800 cores; 40 nodes for moderate memory usage (<256 GB); an SMP supercomputer with 384 cores and 12 TB of memory; GPU technology (8 GPUs of 5120 cores each); a cloud computing platform, based on OpenStack technology, with a total of 960 physical cores and 9 TB of memory; and Ceph storage with a capacity of at least 1 TB per user. Hosted by the Mesocentre, the Auvergne Bioinformatics (AuBi) platform is a member of the French Bioinformatics Institute (IFB, https://www.france-bioinformatique.fr/en/platforms/AUBI). The AuBi platform aims to share expertise and knowledge in large-scale data processing and analysis by providing a complete computing environment, with hardware and software infrastructure, for 9 research laboratories. The AuBi platform is accordingly involved in various projects in genomics, metagenomics, transcriptomics, modeling and imaging, among others [1,2,3]. In addition, we support UCA laboratories and their associates in their efforts to maintain and improve the scripts and pipelines used on our infrastructure, and we provide easy access to public databanks mirrored by BioMAJ [4]. Another aspect of the AuBi platform's work is to facilitate access to computing through Galaxy [5] and to provide an image metadata storage facility through an OMERO server [6]. We also organize training sessions to help our users, whether biologists or bioinformaticians, optimize their use of computing resources via the command-line interface or the Galaxy environment. References: [1] Pierre Amato, Ludovic Besaury, Muriel Joly, Benjamin Penaud, Laurent Deguillaume and Anne-Marie Delort. Metatranscriptomic exploration of microbial functioning in clouds. Scientific Reports 9: 4383, 2019. [2] François Balfourier, Sophie Bouchet, Sandra Robert, Romain De Oliveira, Hélène Rimbert, Jonathan Kitt, Frédéric Choulet, IWGS Consortium, BreedWheat Consortium and Etienne Paux. Worldwide phylogeography and history of wheat genetic diversity. Science Advances 5(5): eaav0536, 2019. [3] Caroline Pont, Thibault Leroy, Michael Seidel, Alessandro Tondelli, Wandrille Duchemin, David Armisen, Daniel Lang, Daniela Bustos-Korts, Nadia Goué, François Balfourier, Márta Molnár-Láng, Jacob Lagen, Benjamin Kilian, Hakan Özkan, Darren Waite, Sarah Dyer, Letellier Thomas, Michael Alaux, WHEALBI consortium, Joanne Russel, Beat Keller, Fred van Eeuwijk, Manuel Spannagl, Klaus Mayer, Robbie Waugh, Nils Stein, Luigi Cattivelli, Georg Haberer, Gilles Charmet and Jérôme Salse. Tracing the ancestry of modern bread wheats. Nature Genetics 51: 905-911, 2019. [4] Olivier Filangi, Yoann Beausse, Anthony Assi, Ludovic Legrand, Jean-Marc Larré, Véronique Martin, Olivier Collin, Christophe Caron, Hugues Leroy and David Allouche. BioMAJ: A flexible framework for databanks synchronization and processing. Bioinformatics 24(16): 1823-1825, 2008. [5] Enis Afgan, Dannon Baker, Bérénice Batut, Marius van den Beek, Dave Bouvier, Martin Cech, John Chilton, Dave Clements, Nate Coraor, Björn Grüning, Aysam Guerler, Jennifer Hillman-Jackson, Saskia Hiltemann, Vahid Jalili, Helena Rasche, Nicola Soranzo, Jeremy Goecks, James Taylor, Anton Nekrutenko and Daniel Blankenberg. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Research 46(W1): W537-W544, 2018. [6] http://www.openmicroscopy.org
Talkmine, a workflow for the prediction of the interactions between secretome and surfaceome in the dialogue between cellular types
Creation of an integrated molecular dynamics workflow on the Galaxy platform: Characterization of aquaporin pores
Galaxy is an international bioinformatics platform for biologists [1]. So far, the Galaxy team has adapted molecular dynamics tools that mainly prepare the prerequisites of a simulation or run the simulation itself. In our case, the simulation step was done, but the tools to finalize our analysis were missing. This is why tools have been developed and integrated into Galaxy. Integrating this succession of in-house tools as a Galaxy workflow is intended to help biologists, and it benefits from the high-performance computing facilities connected to the Galaxy web service. The tools developed here aim at studying the structure of 102 aquaporin trajectories using a molecular dynamics approach. This approach requires taking into account the molecular scale of the proteins (ångströms) and the time step (nanosecond). Each simulated trajectory covers 100 ns in total, to model the transport of a water molecule [2]. To optimize the computation time on a trajectory, each trajectory is divided into several sub-trajectories, and the pore diameter calculations are performed for each sub-trajectory. The resulting data are then compiled in a table before being visualized in graphical form. This workflow is designed to work on aquaporin trajectories. Aquaporins are transmembrane proteins that transport water. An aquaporin is a tetramer composed of four protomers. Each protomer has six transmembrane alpha helices connected by extramembrane loops, which together form a central pore. In addition, each protomer is hourglass-shaped and has two sites each consisting of three successive amino acids, asparagine-proline-alanine (NPA), and an aromatic arginine site (arR) [3]. The NPA sites form an electrostatic barrier preventing excess protons from entering the cell. The arR site is composed of 4 amino acids that form a constriction inside the pore of each protomer. This constriction prevents large particles from passing and also regulates the amount of water that can pass through the transmembrane space at any given time. Our workflow allows us to calculate the pore diameter at this constriction. Recent advances in pore diameter characterization of aquaporin complexes, from manipulation of molecular modeling files to visualization of results, will be presented here. References: [1] V. Jalili et al., "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update", Nucleic Acids Research, vol. 48, no. W1, pp. W395-W402, Jul. 2020, doi: 10.1093/nar/gkaa434. [2] R. O. Dror, R. M. Dirks, J. P. Grossman, H. Xu, and D. E. Shaw, "Biomolecular Simulation: A Computational Microscope for Molecular Biology", Annu. Rev. Biophys., vol. 41, no. 1, pp. 429-452, Jun. 2012, doi: 10.1146/annurev-biophys-042910-155245. [3] J.-S. Venisse et al., "Genome-Wide Identification, Structure Characterization, and Expression Pattern Profiling of the Aquaporin Gene Family in Betula pendula", IJMS, vol. 22, no. 14, p. 7269, Jul. 2021, doi: 10.3390/ijms22147269.
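The split-then-compile strategy described above can be illustrated with a short Python sketch. It is not the authors' tool: it assumes a per-frame pore diameter is already available (the values below are invented), and the function names and table columns are hypothetical.

```python
from statistics import mean


def split_frames(n_frames, n_chunks):
    """Return (start, stop) index pairs splitting range(n_frames) into contiguous chunks."""
    if not 1 <= n_chunks <= n_frames:
        raise ValueError("n_chunks must be between 1 and n_frames")
    base, extra = divmod(n_frames, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional frame.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def summarize(diameters, n_chunks):
    """Compile one table row per sub-trajectory with its mean pore diameter."""
    rows = []
    for chunk_id, (start, stop) in enumerate(split_frames(len(diameters), n_chunks)):
        rows.append({
            "chunk": chunk_id,
            "first_frame": start,
            "last_frame": stop - 1,
            "mean_diameter": mean(diameters[start:stop]),
        })
    return rows


# Hypothetical per-frame pore diameters, in ångströms.
diameters = [2.0, 2.2, 2.4, 1.8, 1.6, 2.0, 2.1]
table = summarize(diameters, 3)
for row in table:
    print(row)
```

Because each sub-trajectory is summarized independently, the chunks could be dispatched as separate jobs and their rows concatenated afterwards.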
MetaboCloud: A catalog of microservices hosted on a Cloud infrastructure and addressing issues linked to FAIR principles and open science
Metabolomics, the study of small molecules called metabolites, is a field generating massive and complex data that need to be processed and interpreted. However, this requires overcoming several challenges, such as data manipulation, where the heterogeneity of technologies makes it difficult to standardize methods and tools, and molecule annotation, which is still a major bottleneck. Bioinformatics therefore plays an important role in addressing these issues. However, it also brings its own set of questions, for example in terms of interoperability and reproducibility. In this context, new resources must be put in place to address the needs related to these questions. Consideration of the FAIR (Findable, Accessible, Interoperable, Reusable) principles and open science has led to the initiation of MetaboCloud, a collaborative project between teams from the "Exploration of Metabolism Platform" (PFEM), a member of the MetaboHUB infrastructure, the Auvergne Bioinformatics (AuBi) platform and the Mesocentre of Clermont Auvergne University (UCA). It aims to provide a set of bioinformatics tools, starting with metabolomics, in the form of microservices hosted on a Cloud infrastructure. It also intends to serve as a proof of concept and to create a recipe to share with the bioinformatics community, integrating best practices in terms of code and deployment, thus ensuring high service quality and easy maintainability. The MetaboCloud microservices infrastructure is based on (1) bioinformatics tools, (2) from-scratch API development if tools are not available, (3) an advanced CI/CD work environment managing the construction of a Docker image, (4) all brought together in an OpenStack cloud environment. A roadmap containing about ten microservices has been drawn up for this project. Three of them are currently open to the community for use. Two have been developed using the Java language and the Spring Boot web framework. One is based on the CDK tool, which offers several functionalities using structural information (InChI, MOL or SDF) as input data. Firstly, it can return chemical properties of a compound, such as masses, SMILES, InChIs, the logP and the formula. It can also convert the compound's structural information into another format such as InChI Key, InChI, MOL or SDF. Finally, it can depict a molecule, which means returning its PNG or SVG image. The other microservice is based on the InChI tool, which can, using the same type of input as CDK, generate the InChI and the InChI Key of a compound. The third microservice, derived from the Goslin tool, has been implemented in Python using the Falcon web framework. It can be used to transform a common lipid name into a standardized one. All microservice methods are described in the standardized OpenAPI format, which enables anyone to generate code to query them in a wide range of programming languages. Each microservice has its own Docker container. These microservices can be used in different ways. They can be integrated into an application, or they can be used on their own or combined with other microservices inside a script. Furthermore, web components, also developed within PFEM, are available for use as clients to query each of the microservices. Web components are a collection of functionalities that establish a standardized component model for the web, enabling the encapsulation and interoperability of individual HTML elements. These web components are available in an npm library. Developing bioinformatics tools in the form of microservices therefore offers a number of advantages, in particular from an interoperability point of view, as they can be queried from any programming language. It also addresses issues of reproducibility, since the versioning of a microservice is controlled by its containerization. Moreover, a web portal, referenced in bio.tools, has been created to make all developed applications accessible, together with their documentation and metadata, thus addressing the "Accessible" dimension of the FAIR principles.