
Integration of Network Performance Monitoring Data at FTS3

Authors: Rama Ballesteros Rocío; Salichos Michail; Álvarez Alejandro
Publication date: 31/08/2013

Project Specification: The main goal of this project is to optimize the TCP buffer size in order to make file transfers with FTS3 more efficient. The library implemented for this purpose calculates the buffer size given a source and a destination, so whoever is transferring the files does not need to know how it is calculated. In this project, I wrote a library that simplifies access to PerfSONAR's information between two hosts, calculates the optimized TCP buffer size, and thereby makes file transfers more efficient. As part of my work, I also tested the library with tools such as GridFTP and Globus to check whether it actually improved transfer throughput.
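
The abstract does not state how the buffer size is derived. A common rule of thumb is the bandwidth-delay product (BDP): the link bandwidth multiplied by the round-trip time between the two hosts. The Python sketch below illustrates that rule only; the function name, its parameters, and the example figures are hypothetical and are not taken from the project's library or from PerfSONAR's API.

```python
import math


def bdp_buffer_bytes(bandwidth_bits_per_s: float, rtt_ms: float) -> int:
    """Return the bandwidth-delay product in bytes, rounded up.

    Assumption: bandwidth and round-trip time have already been obtained
    from some measurement source; this sketch performs no network queries.
    """
    if bandwidth_bits_per_s < 0 or rtt_ms < 0:
        raise ValueError("bandwidth and RTT must be non-negative")
    # Bits that can be in flight during one round trip.
    bits_in_flight = bandwidth_bits_per_s * rtt_ms / 1000
    # Convert bits to bytes and round up to a whole byte.
    return math.ceil(bits_in_flight / 8)


# Hypothetical example: a 1 Gbit/s path with a 50 ms round-trip time.
example_buffer = bdp_buffer_bytes(1_000_000_000, 50)
print(example_buffer)  # 6250000 bytes
```

A buffer smaller than this value may leave the path underused on a long, fast link, which is why the figure depends on the specific source and destination pair.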

ZENODO

Machine learning to infer the process of coevolution under the light of evolution

Author: RAMA BALLESTEROS Rocío
Publication venue: Université de Lausanne, Faculté de biologie et médecine
Publication date: 01/01/2022

Coevolution is an important component of evolutionary biology and describes the reciprocal changes that occur between biological entities as they depend on each other. It is one of the mechanisms driving biodiversity when interactions occur between organisms, and, at the molecular level, it can reveal information about the function and structure of a protein. These coordinated changes between sites along sequences tend to occur to improve or maintain functional and structural interactions, or because of evolutionary processes such as compensatory mutations and epistasis. Because of the high-throughput sequencing revolution, it is now possible to examine the available genomic databases, encompassing thousands of proteins, to detect coevolution and gain better insight from the genomic data. Nevertheless, current methods for inferring coevolution have limitations and require long computing times to analyze the data. In my thesis, I used machine learning techniques to infer coevolution in large databases easily and quickly. First, I investigated the limitations of the current methods for inferring coevolution by testing the effect of the level of divergence on their ability to detect pairs of coevolving sites, comparing their key properties and downsides. Second, I developed a machine learning model based on Convolutional Neural Networks (CNN) to detect the signature of coevolution while accounting for the evolutionary history of the sequences. Finally, I provided a user-friendly pipeline to run the model and infer coevolution in any given alignment with its phylogenetic tree. I simulated genomic data based on a large genomic database to train the CNN. I used the model trained on a subset of the bony vertebrate Selectome dataset to detect the signature of coevolution in 217 proteins.
Overall, my work provides a novel approach based on machine learning techniques to better detect and understand the signature of coevolution, opening the door to other machine learning approaches that take advantage of the large volume of genomic data now becoming available. -- Coevolution is an important component of the theory of evolution and describes the reciprocal changes that occur between biological entities when they depend on each other. It is one of the mechanisms driving biodiversity when organisms interact with one another. At a finer scale, at the molecular level, coevolution can reveal information about the function and structure of proteins. Thus, changes between different sites of a sequence that tend to occur in a coordinated manner are a trace of coevolution, improving or maintaining functional and structural interactions. Thanks to the high-throughput sequencing revolution, it is now possible to exploit the available genomic databases, covering thousands of genes and proteins, to detect the signatures of coevolution. However, current methods that infer coevolution have certain limitations and take too long to analyze the data. Consequently, in this thesis, I harness machine learning techniques using convolutional neural networks (CNN) to infer coevolution simply and quickly on large-scale genomic databases. First, I studied the limitations of current coevolution inference methods (plmDCA, CoMap and Coev) as a function of the level of divergence, comparing their respective advantages and drawbacks.
Second, I developed a machine learning model based on CNNs to detect the signature of coevolution within a protein by exploiting the signal left by evolution at sites that change in a coordinated manner. Finally, I developed an easy-to-use pipeline to run the model and detect coevolution from a given alignment and its phylogenetic tree. I simulated genomic data based on the bony vertebrate Selectome dataset and used them to train the CNN. After training the model, I tested it on the real bony vertebrate Selectome dataset, detecting 217 proteins with a signature of coevolution. Overall, my work provides a powerful approach based on machine learning techniques to better detect and understand the signature of coevolution, opening the door to other machine learning approaches applicable to genomic data in order to answer questions in evolutionary biology.
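
To make the idea of "coordinated changes between sites" concrete, the Python sketch below scores two alignment columns with mutual information, a classic and very simple covariation measure. It is not the thesis's CNN, nor plmDCA, CoMap or Coev; the function name and the toy columns are hypothetical. Unlike the approach described in the abstract, raw mutual information ignores the phylogenetic tree, so shared ancestry alone can inflate the score.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    Each string holds one residue per sequence, in the same sequence order.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)                 # residue counts at site A
    count_b = Counter(col_b)                 # residue counts at site B
    count_ab = Counter(zip(col_a, col_b))    # joint residue-pair counts
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: the first pair changes together, the second pair does not.
covarying = column_mutual_information("AAVV", "LLII")
independent = column_mutual_information("AAVV", "LILI")
print(covarying, independent)
```

In the toy example the covarying pair scores higher than the independent pair, which is the kind of signal a coevolution method looks for before correcting for evolutionary history.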

Serveur académique lausannois