46 research outputs found

Galaxy Training: A powerful framework for teaching!

Author: Bacon Wendi
Batut Bérénice
Blankenberg Daniel
Bretaudeau Anthony
Clements Dave
Coppens Frederik
Davis John
Doyle Maria A.
Droesbeke Bert
Fahrner Matthias
Fouilloux Anne Claire
Föll Melanie Christine
Galaxy Training Network
Gallardo-Alba Cristóbal
Gladman Simon
Goué Nadia
Griffin Timothy J.
Grüning Björn
Heyl Florian
Hiltemann Saskia
Hotz Hans-Rudolf
Jagtap Pratik D.
Larivière Delphine
Le Bras Yvan
Maier Wolfgang
Mehta Subina
Psomopoulos Fotis
Rasche Helena
Royaux Coline
Serrano-Solano Beatriz
Soranzo Nicola
Syme Anna
van Heusden Peter
Wollmann Thomas
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 03/06/2022
Field of study

There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap among many of the researchers tasked with analysing these big datasets. To address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis, using the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. New features have been added to facilitate the use of the materials in a classroom setting and to simplify the contribution flow for new materials, and a set of train-the-trainer lessons has been added. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.

HAL-CentraleSupelec

HAL - Normandie Université

Ghent University Academic Bibliography

HAL Clermont Université

INRIA a CCSD electronic archive server

Open Research Online

EUR Research Repository

HAL-IRD

HAL: Hyper Article en Ligne

HAL-Rennes 1

FAIRing Research Data to Live

Author: Goué Nadia
Publication venue: HAL CCSD
Publication date: 04/04/2024
Field of study

Challenges: With the digitization of biological research, the quantity and throughput of data are increasing. Many research teams and laboratories are still using medium-term data management strategies, in the absence of global, sustainable solutions within their reach. With the implementation of an open science policy and publishers' incentives to make datasets available at the time of publication, real strategies need to be considered on a daily basis to anticipate data flows. The current development of tools and their centralization for management and analysis must be accompanied by the necessary consideration of their life cycle, in order to best parameterize and anticipate needs.
Context: The regional deployment of IT resources for biology, supported by the Equipex 4DOMICs, as well as access to the OMERO UniCA-EMBRC-Fr database, and the reflection on bioinformatics supported by Academy 4 of the Idex, are all local factors contributing to the reflection and awareness of biology research teams at UniCA. This day is part of the local and national context of open science, with the deployment of data workshops such as DATAZUR, the Equipex mudis4LS and the data management deployment strategies of Infrastructures en Biologie Santé, including France BioImaging with FBI.data.
Objective: Raise awareness of, and build familiarity with, open science, data management policy and data protection.
In this context, the actions of the AuBi platform are presented within the scope of the Clermont-Auvergne Mesocenter and the IFB (Institut Français de Bioinformatique).

HAL Clermont Université

Proposing a general-purpose workflow to identify protein-protein interactions between two tissues. Example of PPIs between the secretome and the surfaceome to understand the dialogue between tissues in mammals, and in cattle in particular

Author: Bonnet Muriel
Connault Manon
Goué Nadia
Publication venue: HAL CCSD
Publication date: 17/06/2021
Field of study

The prediction of protein-protein interactions is studied in different domains. In particular, knowledge of the interactions between the surfaceome and the secretome may strongly improve the understanding of inter-tissue crosstalk. This internship aims to develop a Snakemake-based workflow named Talkmine for the identification of the molecular dialogue between two biological tissues resulting from protein-protein interactions. The first objective was to identify and test open-source tools known to predict whether proteins belong to the secretome or the surfaceome. Publicly available tools were classified into three classes: signal peptide, subcellular location, or topology prediction. Secondly, we developed the workflow. In brief, the user provides a list of gene or protein identifiers. The g:Convert tool converts the identifiers to a common format (ENSP), and the Entrez-Direct tool then generates a multi-FASTA file. The protein sequences are then submitted to the predictive tools. Finally, proteins assigned to the secretome and surfaceome classes are sent to the PSICQUIC tool, which determines the protein-protein interactions. The user has access to the list of these interactions as well as to the intermediate results. The Talkmine workflow was initially developed for Bos taurus, to determine the interactions between muscle and fat tissue, and it could also be applied to other species and research purposes.
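The last step described above, pairing proteins assigned to the secretome of one tissue with proteins assigned to the surfaceome of the other and keeping only pairs with a known interaction, can be sketched as follows. This is an illustrative sketch, not the Talkmine code: the function name, the protein identifiers, and the in-memory set of known interactions (standing in for a PSICQUIC query) are all hypothetical.

```python
from itertools import product


def candidate_dialogue(secretome_a, surfaceome_b, known_interactions):
    """Return secretome/surfaceome pairs that have a known interaction.

    known_interactions is a set of frozensets, each holding the two
    identifiers of an interacting pair (a stand-in for PSICQUIC results).
    """
    pairs = []
    # Sorting keeps the output order deterministic.
    for secreted, surface in product(sorted(secretome_a), sorted(surfaceome_b)):
        if frozenset((secreted, surface)) in known_interactions:
            pairs.append((secreted, surface))
    return pairs


# Hypothetical identifiers, for illustration only.
muscle_secretome = {"P_SEC1", "P_SEC2"}
fat_surfaceome = {"P_SURF1", "P_SURF2"}
interactions = {frozenset(("P_SEC1", "P_SURF2")), frozenset(("P_SEC2", "P_X"))}

print(candidate_dialogue(muscle_secretome, fat_surfaceome, interactions))
```

Storing each interaction as a frozenset makes the lookup independent of the order in which the two partners are listed.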

HAL Clermont Université

HAL Descartes

A Newly Opened Galaxy Platform at Clermont Auvergne University

Author: Goué Nadia
Grimbichler David
Mahul Antoine
Peyret Pierre
Publication venue: HAL CCSD
Publication date: 01/07/2019
Field of study

The Mesocentre, part of Clermont Auvergne University (UCA), delivers services in scientific data computing (HPC, VM, …) and short-term storage through a network of technology core facilities. These services are offered to assist multidisciplinary scientists in their computing projects. We currently host a computer farm with about 800 cores, 40 nodes for moderate memory usage (<256 GB), and an SMP supercomputer with 384 cores and 12 TB of memory, in addition to a scalable storage cluster managed with Ceph that provides at least 1 TB of capacity per user, with a total of 1.2 PB.
Hosted by the Mesocentre, the AuBi (Auvergne bioinformatics) platform is a member of IFB (French Bioinformatics Institute, https://www.france-bioinformatique.fr/). The AuBi platform aims to share expertise and knowledge in large-scale data processing and analysis by supplying a complete computing environment with hardware and software infrastructures for UCA research laboratories. To meet this goal, a Galaxy server was installed to facilitate computing access for non-bioinformatician biologists. As the AuBi platform is involved in various projects, roughly 50 tools related to genomics, metagenomics, transcriptomics and epigenetics were installed on the Galaxy instance, as well as homemade tools from a local toolshed.
From an IT infrastructure point of view, our Galaxy server sits behind a reverse proxy server, galaxy.mesocentre.uca.fr. A virtual machine, with an extensible disk on a scalable storage cluster (RBD / Ceph), runs the server. It is connected to the HPC facilities through an NFS server so that Galaxy can benefit from the Mesocentre's computing and storage capabilities. Users authenticate through an LDAP directory shared between the Galaxy server and the cluster. In addition, the BioMAJ framework was deployed to share databank access between Galaxy and the cluster.
In the future, we plan to extend access to users from the IFB community via an eduGAIN connection. Finally, we expect a significant contribution from UCA laboratories in migrating local tools to the IUC toolshed.

HAL Clermont Université

AuBi platform for biologists and bioinformaticians at UCA Mesocentre

Author: Goué Nadia
Grimbichler David
Mahul Antoine
Peyret Pierre
Publication venue: HAL CCSD
Publication date: 03/07/2018
Field of study

The Mesocentre, part of Clermont Auvergne University (UCA), delivers services in scientific data computing (HPC, VM, …) and short-term storage through a network of technology core facilities. These services are offered to assist multidisciplinary scientists in their computing projects. We currently host a computer farm with about 800 cores, 40 nodes for moderate memory usage (<256 GB), and an SMP supercomputer with 384 cores and 12 TB of memory, in addition to Ceph storage with at least 1 TB of capacity per user. Hosted by the Mesocentre, the Auvergne bioinformatics (AuBi) platform is a member of the French Bioinformatics Institute (IFB, https://www.france-bioinformatique.fr/en/platforms/AUBI). The AuBi platform aims to share expertise and knowledge in large-scale data processing and analysis by supplying a complete computing environment with hardware and software infrastructures for 9 research laboratories. The AuBi platform is thus involved in various projects in genomics, metagenomics, transcriptomics, modeling and imaging, among other fields [1,2,3]. Furthermore, we provide support to UCA laboratories and associates in their effort to maintain and enhance the scripts and pipelines they use on our infrastructure.
Another aspect of the AuBi platform's work is to facilitate computing access for non-bioinformatician biologists by way of a Galaxy server, to be released in the coming weeks. We are also organizing training sessions to help our users, whether biologists or bioinformaticians, optimize their use of computing resources through the command line interface and the Galaxy environment.
References
1. Amato P., Joly M., Besaury L., Oudart A., Taib N., Moné A., Deguillaume L., Delort A.M. and Debroas D. (2017). Active microorganisms thrive among extremely diverse communities in cloud water. PLoS ONE 12(8):e0182869.
2. Gasc C., Constantin A., Jaziri F. and Peyret P. (2017). OCaPPI-Db: an oligonucleotide probe database for pathogen identification through hybridization capture. Database (Oxford) 2017.
3. Parisot N., Peyretaillade E., Dugat-Bony E., Denonfoux J., Mahul A. and Peyret P. (2016). Probe Design Strategies for Oligonucleotide Microarrays. Methods Mol Biol 1368:67-82.

HAL Clermont Université

Auvergne Bioinformatics platform at UCA Mesocentre

Author: Goué Nadia
Grimbichler David
Mahul Antoine
Peyret Pierre
Publication venue: HAL CCSD
Publication date: 30/06/2020
Field of study

The Mesocentre, part of Clermont Auvergne University (UCA), provides high-performance computing services for scientific data and short-term storage through a network of technology core facilities. These services are offered to assist multidisciplinary scientists in their computing projects. We currently host a computer farm of about 800 cores; 40 nodes for moderate memory usage (<256 GB); an SMP supercomputer with 384 cores and 12 TB of memory; GPU technology (8 GPUs with 5,120 cores each); a cloud computing platform, based on OpenStack technology, with a total of 960 physical cores and 9 TB of memory; and Ceph storage with a capacity of at least 1 TB per user. Hosted by the Mesocentre, the Auvergne bioinformatics (AuBi) platform is a member of the French Bioinformatics Institute (IFB, https://www.france-bioinformatique.fr/en/platforms/AUBI). The AuBi platform aims to share expertise and knowledge in large-scale data processing and analysis by providing a complete computing environment with hardware and software infrastructures for 9 research laboratories. The AuBi platform is thus involved in various projects in genomics, metagenomics, transcriptomics, modeling and imaging, among other fields [1,2,3]. In addition, we support UCA laboratories and their associates in their efforts to maintain and improve the scripts and pipelines they use on our infrastructure, and we provide easy access to public databanks mirrored by BioMAJ [4].
Another aspect of the AuBi platform's work is to facilitate access to computing through Galaxy [5] and to provide an image metadata storage facility through an OMERO server [6]. We also organize training sessions to help our users, whether biologists or bioinformaticians, optimize their use of computing resources via the command line interface or the Galaxy environment.
References
[1] Pierre Amato, Ludovic Besaury, Muriel Joly, Benjamin Penaud, Laurent Deguillaume and Anne-Marie Delort. Metatranscriptomic exploration of microbial functioning in clouds. Scientific Reports, 9: 4383, 2019.
[2] François Balfourier, Sophie Bouchet, Sandra Robert, Romain De Oliveira, Hélène Rimbert, Jonathan Kitt, Frédéric Choulet, IWGS Consortium, BreedWheat Consortium and Etienne Paux. Worldwide phylogeography and history of wheat genetic diversity. Science Advances, 5(5): eaav0536, 2019.
[3] Caroline Pont, Thibault Leroy, Michael Seidel, Alessandro Tondelli, Wandrille Duchemin, David Armisen, Daniel Lang, Daniela Bustos-Korts, Nadia Goué, François Balfourier, Márta Molnár-Láng, Jacob Lage, Benjamin Kilian, Hakan Özkan, Darren Waite, Sarah Dyer, Thomas Letellier, Michael Alaux, WHEALBI consortium, Joanne Russell, Beat Keller, Fred van Eeuwijk, Manuel Spannagl, Klaus Mayer, Robbie Waugh, Nils Stein, Luigi Cattivelli, Georg Haberer, Gilles Charmet and Jérôme Salse. Tracing the ancestry of modern bread wheats. Nature Genetics, 51: 905-911, 2019.
[4] Olivier Filangi, Yoann Beausse, Anthony Assi, Ludovic Legrand, Jean-Marc Larré, Véronique Martin, Olivier Collin, Christophe Caron, Hugues Leroy and David Allouche. BioMAJ: A flexible framework for databanks synchronization and processing. Bioinformatics, 24(16): 1823-1825, 2008.
[5] Enis Afgan, Dannon Baker, Bérénice Batut, Marius van den Beek, Dave Bouvier, Martin Cech, John Chilton, Dave Clements, Nate Coraor, Björn Grüning, Aysam Guerler, Jennifer Hillman-Jackson, Saskia Hiltemann, Vahid Jalili, Helena Rasche, Nicola Soranzo, Jeremy Goecks, James Taylor, Anton Nekrutenko and Daniel Blankenberg. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Research, 46(W1): W537-W544, 2018.
[6] http://www.openmicroscopy.org

HAL Clermont Université

Talkmine, a workflow for the prediction of the interactions between secretome and surfaceome in the dialogue between cellular types

Author: Boby Céline
Bonnet Muriel
Connault Manon
Goué Nadia
Tournayre Jérémy
Publication venue: HAL CCSD
Publication date: 06/07/2021
Field of study

HAL Clermont Université

HAL Descartes

Hal-Diderot

Creation of an integrated molecular dynamics workflow on the Galaxy platform : Characterization of aquaporin pores

Author: Goué Nadia
Label Philippe
Petit Agnès-Elisabeth
Venisse Jean-Stéphane
Publication venue: HAL CCSD
Publication date: 05/07/2022
Field of study

Galaxy is an international bioinformatics platform for biologists [1]. So far, the Galaxy team has adapted molecular dynamics tools, which are mainly tools to create the prerequisites of a simulation or to run a simulation. In our case, the simulation step had been done, but the tools to finalize our analysis were missing. This is why tools have been developed and integrated into Galaxy. Integrating a succession of in-house tools in the form of a Galaxy workflow is intended to help biologists, and benefits from the high-performance computing facilities connected to the Galaxy web service.
The tools developed here aim at studying the structure of 102 aquaporin trajectories using a molecular dynamics approach. This approach requires taking into account the molecular scale (ångströms) of the proteins and the time step (nanosecond). In total, a simulated trajectory of 100 ns is used to model the transport of a water molecule [2]. To optimize the computational time on a trajectory, each trajectory is divided into several sub-trajectories, and the pore diameter calculations are performed for each sub-trajectory. The resulting data are then compiled in a table before being visualized in graphical form.
This workflow is designed to work on aquaporin trajectories. Aquaporins are transmembrane proteins that transport water. An aquaporin is a tetramer composed of four protomers. Each protomer has six transmembrane alpha helices connected by extramembrane loops that together form a central pore. In addition, each protomer is hourglass-shaped and has two sites consisting of three successive amino acids, asparagine-proline-alanine (NPA), and an aromatic/arginine site (arR) [3]. The NPA sites form an electrostatic barrier preventing excess protons from entering the cell. The arR site is composed of 4 amino acids that form a constriction inside the pore of each protomer. This constriction prevents large particles from passing and also regulates the amount of water that can pass through the transmembrane space at any given time. Our workflow allows us to calculate the pore diameter at this constriction. Recent advances in pore diameter characterization of aquaporin complexes, from manipulation of molecular modeling files to visualization of results, will be presented here.
References
[1] V. Jalili et al., "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update", Nucleic Acids Research, vol. 48, no. W1, pp. W395-W402, July 2020, doi: 10.1093/nar/gkaa434.
[2] R. O. Dror, R. M. Dirks, J. P. Grossman, H. Xu, and D. E. Shaw, "Biomolecular Simulation: A Computational Microscope for Molecular Biology", Annu. Rev. Biophys., vol. 41, no. 1, pp. 429-452, June 2012, doi: 10.1146/annurev-biophys-042910-155245.
[3] J.-S. Venisse et al., "Genome-Wide Identification, Structure Characterization, and Expression Pattern Profiling of the Aquaporin Gene Family in Betula pendula", IJMS, vol. 22, no. 14, p. 7269, July 2021, doi: 10.3390/ijms22147269.
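The split-then-compile strategy described in the abstract above (dividing a trajectory into sub-trajectories, computing pore diameters per sub-trajectory, and collecting the results in a table) can be illustrated with a minimal Python sketch. It is not the authors' Galaxy tooling: the function names are hypothetical, and each frame is represented by an already computed pore diameter, whereas the real workflow derives it from atomic coordinates.

```python
from statistics import mean


def split_into_subtrajectories(frames, chunk_size):
    """Split a list of frames into consecutive chunks of at most chunk_size."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def summarize(frames, chunk_size):
    """Compile one table row per sub-trajectory:
    (index, number of frames, mean diameter, minimum diameter)."""
    rows = []
    for index, chunk in enumerate(split_into_subtrajectories(frames, chunk_size)):
        rows.append((index, len(chunk), round(mean(chunk), 2), min(chunk)))
    return rows


# Hypothetical per-frame pore diameters in ångströms (not real data).
diameters = [2.0, 2.5, 3.0, 1.5, 2.5, 2.0, 4.0]

for row in summarize(diameters, chunk_size=3):
    print(row)
```

Because each chunk is summarized independently, the per-chunk work could be dispatched to separate compute jobs and the rows concatenated afterwards.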

HAL Clermont Université

MetaboCloud : A catalog of microservices hosted on a Cloud infrastructure and addressing issues linked to FAIR principles and open science

Author: Bellembois Thomas
Giacomoni Franck
Goué Nadia
Mahul Antoine
Paulhe Nils
Point Érina
Souc Faustine
Publication venue: HAL CCSD
Publication date: 25/06/2024
Field of study

Metabolomics, the study of small molecules called metabolites, is a field generating massive and complex data that needs to be processed and interpreted. However, this requires overcoming several challenges, such as data manipulation, where the heterogeneity of technologies makes it difficult to standardize methods and tools, as well as molecule annotation, which is still a major bottleneck. Bioinformatics therefore plays an important role in addressing these issues. However, it also brings its own set of questions, for example in terms of interoperability and reproducibility.
In this context, new resources must be put in place to address the needs related to these questions. Consideration of the FAIR (Findable, Accessible, Interoperable, Reusable) principles and open science has led to the initiation of MetaboCloud, a collaborative project between teams from the "Exploration of Metabolism Platform" (PFEM), a member of the MetaboHUB infrastructure, the Auvergne Bioinformatics (AuBi) platform and the Mesocentre of Clermont Auvergne University (UCA). It aims to provide a set of bioinformatics tools, in metabolomics as a start, in the form of microservices hosted on a cloud infrastructure. It also intends to serve as a proof of concept, and to create a recipe to share with the bioinformatics community, integrating best practices in terms of code and deployment, thus ensuring high service quality and easy maintainability. The MetaboCloud microservices infrastructure is based on (1) bioinformatics tools, (2) from-scratch API development if tools are not available, and (3) an advanced CI/CD work environment managing the construction of a Docker image, (4) all brought together in an OpenStack cloud environment. A roadmap containing about ten microservices has been drawn up for this project. Three of them are currently open to the community for use. Two have been developed using the Java language and the Spring Boot web framework. One is based on the CDK tool, which offers several functionalities using structural information (InChI, MOL or SDF) as input data. Firstly, it can return chemical properties of a compound, such as masses, SMILES, InChIs, the logP and the formula. It can also convert the compound's structural information into another format such as InChI Key, InChI, MOL or SDF. Finally, it can depict a molecule, which means returning its PNG or SVG image. The other microservice is based on the InChI tool, which can, using the same type of input as CDK, generate the InChI and the InChI Key of a compound. The third microservice, derived from the Goslin tool, has been implemented in Python using the Falcon web framework. It can be used to transform a common lipid name into a standardized one. All microservice methods are described in the standardized OpenAPI format, which enables anyone to generate code to query them in a wide range of programming languages. Each microservice has its own Docker container.
These microservices can be used in different ways. They can be integrated into an application, or they can be used on their own or combined with other microservices inside a script. Furthermore, web components (a collection of functionalities establishing a standardized component model for the web, enabling the encapsulation and interoperability of individual HTML elements), also developed within PFEM, are available for use as clients to query each of the microservices. These web components are available in an npm library.
The development of bioinformatics tools in the form of microservices therefore offers a number of advantages, in particular from an interoperability point of view, as they can be queried from any programming language. It also addresses issues of reproducibility, since the versioning of a microservice is controlled by its containerization. Moreover, a web portal, referenced in bio.tools, has been created to make all developed applications accessible, together with their documentation and metadata, thus addressing the "Accessible" dimension of the FAIR principles.
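As a rough illustration of how a script might address such a microservice over HTTP, the sketch below only builds the request URL, using the Python standard library. The host name, route layout, and parameter name are hypothetical placeholders: the abstract does not give the actual MetaboCloud endpoints, which would be taken from each service's OpenAPI description.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical host; the real MetaboCloud addresses are not stated in the abstract.
HOST = "metabocloud.example.org"


def build_query_url(service, operation, **params):
    """Build the URL of a GET request to one operation of a microservice.

    The /<service>/<operation> route layout is an assumption for illustration.
    """
    path = f"/{service}/{operation}"
    query = urlencode(sorted(params.items()))  # sorted for a stable URL
    return urlunsplit(("https", HOST, path, query, ""))


# Hypothetical call asking a lipid-name service to standardize a common name.
url = build_query_url("lipid-names", "standardize", name="PC 16:0/18:1")
print(url)
```

Any HTTP client, in any language, could then send the request, which is the interoperability argument the abstract makes for microservices.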

HAL Clermont Université