29 research outputs found

    Simplifying gene trees for easier comprehension

    BACKGROUND: In the genomic age, gene trees may contain large amounts of data making them hard to read and understand. Therefore, an automated simplification is important. RESULTS: We present a simplification tool for gene trees called TreeSimplifier. Based on species tree information and HUGO gene names, it summarizes "monophyla". These monophyla correspond to subtrees of the gene tree where the evolution of a gene follows species phylogeny, and they are simplified to single leaves in the gene tree. Such a simplification may fail, for example, due to genes in the gene tree that are misplaced. In this way, misplaced genes can be identified. Optionally, our tool glosses over a limited degree of "paraphyly" in a further simplification step. In both simplification steps, species can be summarized into groups and treated as equivalent. In the present study we used our tool to derive a simplified tree of 397 leaves from a tree of 1138 leaves. Comparing the simplified tree to a "cartoon tree" created manually, we note that both agree to a high degree. CONCLUSION: Our automatic simplification tool for gene trees is fast, accurate, and effective. It yields results of similar quality as manual simplification. It should be valuable in phylogenetic studies of large protein families. The software is available at
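
The collapsing step described in this abstract can be illustrated with a minimal sketch. It is not TreeSimplifier's implementation: the `Node` class, the `simplify` function and the example gene and species names are hypothetical. It also assumes a much weaker criterion than the abstract's, treating a subtree as collapsible when all its leaves carry one gene name and no species occurs twice, without checking the topology against a species tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Gene-tree node; leaves carry a gene name and a species label."""
    gene: Optional[str] = None
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def _summarize(node: Node) -> Tuple[Set[str], List[str]]:
    """Collect the gene names and the species labels below a node."""
    if node.is_leaf():
        return {node.gene}, [node.species]
    genes: Set[str] = set()
    species: List[str] = []
    for child in node.children:
        child_genes, child_species = _summarize(child)
        genes |= child_genes
        species += child_species
    return genes, species


def simplify(node: Node) -> Node:
    """Collapse the largest subtrees that meet the simplified criterion."""
    if node.is_leaf():
        return node
    genes, species = _summarize(node)
    # Assumed criterion: one gene name, every species at most once.
    if len(genes) == 1 and len(species) == len(set(species)):
        return Node(gene=next(iter(genes)), species="+".join(sorted(species)))
    return Node(children=[simplify(child) for child in node.children])


def count_leaves(node: Node) -> int:
    return 1 if node.is_leaf() else sum(count_leaves(c) for c in node.children)


# Hypothetical example: the frog PAX6 leaf sits inside the PAX2 clade,
# so that clade cannot be collapsed and the misplaced gene stays visible.
tree = Node(children=[
    Node(children=[Node("PAX6", "human"), Node("PAX6", "mouse")]),
    Node(children=[
        Node("PAX2", "human"),
        Node(children=[Node("PAX2", "mouse"), Node("PAX6", "frog")]),
    ]),
])
simplified = simplify(tree)
```

In this toy tree the clean PAX6 clade becomes a single leaf, while the clade containing the misplaced leaf is left expanded, mirroring the abstract's remark that a failed simplification can point to misplaced genes.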

    IsoSVM – Distinguishing isoforms and paralogs on the protein level

    BACKGROUND: Recent progress in cDNA and EST sequencing is yielding a deluge of sequence data. Like database search results and proteome databases, this data gives rise to inferred protein sequences without ready access to the underlying genomic data. Analysis of this information (e.g. for EST clustering or phylogenetic reconstruction from proteome data) is hampered because it is not known if two protein sequences are isoforms (splice variants) or not (i.e. paralogs/orthologs). However, even without knowing the intron/exon structure, visual analysis of the pattern of similarity across the alignment of the two protein sequences is usually helpful since paralogs and orthologs feature substitutions with respect to each other, as opposed to isoforms, which do not. RESULTS: The IsoSVM tool introduces an automated approach to identifying isoforms on the protein level using a support vector machine (SVM) classifier. Based on three specific features used as input of the SVM classifier, it is possible to automatically identify isoforms with little effort and with an accuracy of more than 97%. We show that the SVM is superior to a radial basis function network and to a linear classifier. As an example application we use IsoSVM to estimate that a set of Xenopus laevis EST clusters consists of approximately 81% cases where sequences are each other's paralogs and 19% cases where sequences are each other's isoforms. The number of isoforms and paralogs in this allotetraploid species is of interest in the study of evolution. CONCLUSION: We developed an SVM classifier that can be used to distinguish isoforms from paralogs with high accuracy and without access to the genomic data. It can be used to analyze, for example, EST data and database search results. Our software is freely available on the Web, under the name IsoSVM
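
The visual cue mentioned in this abstract (paralogs and orthologs show substitutions relative to each other, isoforms do not) can be sketched as a simple column count over a pairwise alignment. This is only a heuristic illustration, not IsoSVM: the abstract does not name the three SVM features, and the function names, the zero-substitution threshold and the example sequences below are assumptions.

```python
from typing import Tuple


def alignment_columns(a: str, b: str) -> Tuple[int, int, int]:
    """Count identical, substituted and gapped columns of a pairwise alignment.

    Both strings must be the two rows of one alignment, with '-' for gaps.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = substitutions = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            identical += 1
        else:
            substitutions += 1
    return identical, substitutions, gaps


def looks_like_isoforms(a: str, b: str,
                        max_substitution_fraction: float = 0.0) -> bool:
    """Heuristic: isoforms differ by gaps, not by substitutions.

    The threshold is a hypothetical parameter, not a value from the paper.
    """
    identical, substitutions, _ = alignment_columns(a, b)
    aligned = identical + substitutions
    if aligned == 0:
        return False
    return substitutions / aligned <= max_substitution_fraction


# Hypothetical sequences: the first pair differs only by an inserted
# segment, the second pair by point substitutions.
isoform_pair = looks_like_isoforms("MKTAYIAK----QR", "MKTAYIAKGGLSQR")
paralog_pair = looks_like_isoforms("MKTAYIAKQR", "MKSAYLAKQR")
```

A trained classifier such as the SVM described in the abstract would replace the fixed threshold with a decision boundary learned from labelled sequence pairs.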

    Penicillium arizonense, a new, genome sequenced fungal species, reveals a high chemical diversity in secreted metabolites

    A new soil-borne species belonging to the Penicillium section Canescentia is described, Penicillium arizonense sp. nov. (type strain CBS 141311(T) = IBT 12289(T)). The genome was sequenced and assembled into 33.7 Mb containing 12,502 predicted genes. A phylogenetic assessment based on marker genes confirmed the grouping of P. arizonense within section Canescentia. Compared to related species, P. arizonense proved to encode a high number of proteins involved in carbohydrate metabolism, in particular hemicellulases. Mining the genome for genes involved in secondary metabolite biosynthesis resulted in the identification of 62 putative biosynthetic gene clusters. Extracts of P. arizonense were analysed for secondary metabolites, and austalides, pyripyropenes, tryptoquivalines, fumagillin, pseurotin A, curvulinic acid and xanthoepocin were detected. A comparative analysis against known pathways enabled the proposal of biosynthetic gene clusters in P. arizonense responsible for the synthesis of all detected compounds except curvulinic acid. The capacity to produce biomass-degrading enzymes and the high chemical diversity identified in secreted bioactive secondary metabolites offer a broad range of potential industrial applications for the new species P. arizonense. The description and availability of the genome sequence of P. arizonense further provide the basis for biotechnological exploitation of this species.

    Structure of Psb29/Thf1 and its association with the FtsH protease complex involved in photosystem II repair in cyanobacteria

    One strategy for enhancing photosynthesis in crop plants is to improve the ability to repair photosystem II (PSII) in response to irreversible damage by light. Despite the pivotal role of thylakoid-embedded FtsH protease complexes in the selective degradation of PSII subunits during repair, little is known about the factors involved in regulating FtsH expression. Here we show using the cyanobacterium Synechocystis sp. PCC 6803 that the Psb29 subunit, originally identified as a minor component of His-tagged PSII preparations, physically interacts with FtsH complexes in vivo and is required for normal accumulation of the FtsH2/FtsH3 hetero-oligomeric complex involved in PSII repair. We show using X-ray crystallography that Psb29 from Thermosynechococcus elongatus has a unique fold consisting of a helical bundle and an extended C-terminal helix and contains a highly conserved region that might be involved in binding to FtsH. A similar interaction is likely to occur in Arabidopsis chloroplasts between the Psb29 homologue, termed THF1, and the FTSH2/FTSH5 complex. The direct involvement of Psb29/THF1 in FtsH accumulation helps explain why THF1 is a target during the hypersensitive response in plants induced by pathogen infection. Downregulating FtsH function and the PSII repair cycle via THF1 would contribute to the productio

    Micropollutant bioremoval in wastewater treatment systems: from microbial population structure to function

    Dissertation presented to obtain the Ph.D. degree in Biology. The continuous release of micropollutants into receiving waters due to insufficient elimination from wastewater treatment plants (WWTP) raises global concerns regarding their potential risks to the environment and human health.(...

    In silico analysis of mitochondrial proteins

    The important role of mitochondria in the eukaryotic cell has long been appreciated, but their exact composition and the biological processes taking place in mitochondria are not yet fully understood. The two main factors that slow down the progress in this field are inefficient recognition and imprecise annotation of mitochondrial proteins. Therefore, we developed a new computational tool, YimLoc, which effectively predicts mitochondrial proteins from genomic sequences. This tool integrates the strengths of existing predictors and yields higher performance than any individual predictor. We applied YimLoc to ~60 fungal genomes in order to address the controversy about the localization of beta oxidation in these organisms. Our results show that in contrast to previous studies, most fungal groups do possess mitochondrial beta oxidation. This work also revealed the diversity of beta oxidation in fungi, which correlates with their utilization of fatty acids as energy and carbon sources. 
Further, we conducted an investigation of the key component of the mitochondrial beta oxidation pathway, the acyl-CoA dehydrogenase (ACAD). We combined subcellular localization prediction with subfamily classification and phylogenetic inference of ACAD enzymes from 250 species covering all three domains of life. Our study suggests that ACAD genes are an ancient family with innovative evolutionary strategies to generate a large enzyme toolset for utilizing most diverse fatty acids and amino acids. Finally, to enable the prediction of mitochondrial proteins from data beyond genome sequences, we designed the tool TESTLoc that uses expressed sequence tags (ESTs) as input. TESTLoc performs significantly better than known tools. In addition to providing two new tools for subcellular localization designed for different data, our studies demonstrate the power of combining subcellular localization prediction with other in silico analyses to gain insights into the function of mitochondrial proteins. Most importantly, this work proposes clear hypotheses that are easily testable, with great potential for advancing our knowledge of mitochondrial metabolism

    Systems analysis of minimal metabolic networks in prokaryotes

    PhD Thesis in Chemical and Biological Engineering. The complexity of living cells is staggering, as a result of billions of years of evolution through natural selection in constantly changing environments. Systems biology emerges as the preferred approach to the disentangling of this complexity by looking at living cells and their responses to environments in a holistic manner. Complete annotated sequences of genomes are now available for thousands of species of the simplest unicellular life forms known, the prokaryotes. Together with other large-scale datasets such as proteomes and phenotypic screenings and a careful analysis of the literature, genome annotations allow for the reconstruction of large constraint-based models of cellular metabolism. Here, genome-scale metabolic models (GSMs) of prokaryotes are used together with other disparate large-scale datasets and literature assessments to study and predict essential components in minimal metabolic networks. A conceptual clarification is presented in a review of systems biology perspectives on minimal and simpler cells. An assessment of the biomass compositions in 71 GSMs of prokaryotes was then performed, revealing heterogeneity that impacted predictions of reaction essentiality. The integration of 33 large-scale essentiality assays with other data and literature revealed universally and conditionally essential cofactors for prokaryotes. These were used to revise predictions of essential genes and in the prediction of one biosynthetic pathway in the GSM of M. tuberculosis. Additionally, a large-scale assessment of essentiality of different metabolic subsystems was performed with 15 comparable GSMs. The results were validated with 36 large-scale experimental assays of gene essentiality. The ancestry of metabolic genes and subsystems was estimated by blasting representative genomes of all the phyla in the prokaryotic tree of life. Ancestry was correlated with essentiality in general but not with non-essentiality. 
Finally, a method was devised to generate minimal viable metabolic networks based on a curated and diverse universe of prokaryotic metabolic reactions. Different growth media were tested and shown to generate different networks regarding size, cofactor requirements and maximum biomass production. The results of this work are expected to contribute to fundamental investigations of core and ancestral prokaryotic metabolism and the design of modularized and controllable chassis cells.
This work was funded by FCT, the Portuguese Foundation for Science and Technology, with the grant SFRH/BD/81626/201

    Genomics of Preaxostyla flagellates (Genomika bičíkovců skupiny Preaxostyla)

    Protists inhabiting oxygen-depleted environments have evolved various adaptations to thrive in their niches, including modified mitochondria adapted to anaerobiosis to various degrees. The most radically altered forms of these organelles (Mitochondria-Related Organelles, MROs) have completely lost their genomes and other defining features of canonical aerobic mitochondria. Anaerobic protists are often found as endobionts (parasites, mutualists, etc.) of larger organisms. The endobiotic lifestyle combined with anaerobiosis poses another source of evolutionary pressure forcing unique adaptations in the endobionts, including, for example, reduced biosynthetic capabilities or a modified set of surface proteins. Here we present new insights into the adaptations of the anaerobic protistan phylum Preaxostyla, especially with regard to the reductive evolution of mitochondria, which, uniquely among all known eukaryotes, led to a complete loss of the organelle in the oxymonad Monocercomonoides exilis. We have obtained an M. exilis genomic assembly of good quality and completeness, as well as genomic and transcriptomic data of varying quality and completeness from 9 other Preaxostyla species. Based on extensive, thorough gene searches and functional gene annotation on these datasets, as well as phylogenetic analyses and protein localization experiments, we conclude: 1) M. exilis has completely lost the mitochondrion. This loss was probably enabled by the replacement of...
Department of Parasitology, Faculty of Science