
    Computational toolbox towards evolutionary domain mapping of membrane proteins

    Academic year 2012-2013. Membrane proteins account for about 20% to 30% of all proteins encoded in a typical genome. They play central roles in multiple cellular processes, mediating the interaction of the cell with its surroundings. Over 60% of all drug targets contain a membrane domain. The experimental difficulty of obtaining crystal structures severely limits our ability to understand membrane protein function. Computational evolutionary studies of proteins are crucial for the prediction of 3D structures. In this project, we construct a tool able to quantify the positive evolutionary selective pressure on each residue of membrane proteins through maximum likelihood phylogeny reconstruction. The conservation plot, combined with a structural homology model, is also a potent tool for predicting the residues that have essential roles in the structure and function of a membrane protein, and can be very useful in the design of validation experiments. Supervisors: Mireia Olivella and Alex Peràlvare

    Characterising the source of errors for metagenomic taxonomic classification

    Characterising microbial communities enables a better understanding of their complexity and their contribution to the environment. Metagenomics has been a rapidly expanding field since the revolution of next generation sequencing began, and it has a wide range of applications, including medicine, agriculture, forensics, archaeology and even domestic use [Sarkar et al., 2021, Holman et al., 2017, Khodakova et al., 2014, Santiago-Rodriguez et al., 2017, Vilanova et al., 2015]. Sequencing amplicon data, such as 16S rRNA, is now commonly used to characterise the microbiome in a variety of biological samples. However, correct taxonomic identification of these reads remains a challenge, and short reads are often identified, correctly or not, at ranks of the taxonomic tree other than species or subspecies level. Every metagenomic study is designed for specific needs, and it is often complicated to find a suitable bioinformatics pipeline and reference database. There is currently a lack of systematic benchmarking of in-house methods for metagenomics. The work presented in this thesis aims to establish an approach for the in silico validation of 16S rRNA metagenomic data. A method to generate realistic in silico metagenome data that resembles project-specific sequencing data is presented, including a new process to generate synthetic negative controls for amplicon data, which can be employed regularly to assess the appropriateness and optimisation of methods for specific metagenomic projects. To aid the benchmarking process, new metrics have been defined based on a measure of taxonomic distance. A k-mer based method with the lowest common ancestor approach was selected to investigate a range of factors that influence meta-taxonomic classification success. These include a comparison of databases quality-filtered at various levels, as well as a comparison of different taxonomic annotation methodologies.
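As an illustration of the k-mer and lowest common ancestor (LCA) idea mentioned above, the following is a minimal sketch and not the thesis's actual tool, which the abstract does not name. The toy taxonomy, the reference sequences and all function names are hypothetical. Taking the LCA of every matching k-mer is a deliberate simplification; published k-mer classifiers typically use more elaborate scoring.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the thesis.
PARENT = {
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for node in lineage(b):
        if node in ancestors:
            return node
    return "root"


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the LCA of all reference taxa that contain it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify(read, index, k):
    """Assign a read to the LCA of the taxa hit by its k-mers (simplified)."""
    hits = [index[kmer] for kmer in kmers(read, k) if kmer in index]
    if not hits:
        return "unclassified"
    result = hits[0]
    for taxon in hits[1:]:
        result = lca(result, taxon)
    return result


# Made-up reference sequences, for illustration only.
REFERENCES = {
    "E. coli": "ACGTACGTGA",
    "E. albertii": "ACGTACGTCC",
    "Salmonella enterica": "TTGACCGTAA",
}
INDEX = build_index(REFERENCES, k=5)
```

In this toy setting, a read whose k-mers are shared by both Escherichia references can only be resolved to genus level, while a read covering a species-specific k-mer is resolved to species, which mirrors the abstract's point that reads are often assigned at ranks above species.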
The experimental findings reveal the importance of having highly curated taxonomic annotations of the genetic sequences in the database, and show that a missing fraction of the tree of life can lead to misclassification of both related and unrelated organisms. In some cases, it is shown that longer reads can help to improve assignment, with mutations and sequencing errors having a relatively low negative impact. The marker gene 16S rRNA has well-defined conserved and variable regions, which help to distinguish species. Therefore, these regions were studied and also recalculated using information theory, to investigate which parts of the sequence are discriminative for metagenomic taxonomic identification. In addition, a method from linguistics, Term Frequency-Inverse Document Frequency (TF-IDF) coupled with multinomial naive Bayes, is shown to provide an understanding of genetic signatures and is applied to generate a new method for the taxonomic classification of metagenomic short reads. Biological samples were taken from the cattle respiratory tract, and DNA was extracted and sequenced to provide metagenomic data. Two sets of experiments were carried out: (i) to compare sampling and extraction methods and (ii) to characterise the microbial community observed in young cattle in the different lung lobes and nose. The data reveal that the composition of the microbial community observed is highly dependent on the sampling method.
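The abstract does not say which information-theoretic measure was used to recalculate the conserved and variable regions. One common choice, assumed here purely for illustration, is per-column Shannon entropy over a multiple sequence alignment: columns with entropy near zero are conserved, while higher values indicate variable positions. The function names and the toy alignment below are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_profile(aligned):
    """Per-position entropy for a list of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy([seq[i] for seq in aligned]) for i in range(length)]


# Made-up alignment: the first two columns are conserved, the last two vary.
ALIGNMENT = ["ACGT", "ACGA", "ACCC", "ACTG"]
PROFILE = entropy_profile(ALIGNMENT)
```

With a four-letter DNA alphabet the entropy of a column ranges from 0 bits (fully conserved) to 2 bits (all four bases equally frequent), so a sliding-window average of such a profile is one simple way to delimit candidate variable regions.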