1,923 research outputs found

    tRNA functional signatures classify plastids as late-branching cyanobacteria.

    Background: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. Results: Using Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities both in substitution processes across sites and in compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. Conclusions: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities.
However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias, with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies.

    Use of artificial genomes in assessing methods for atypical gene detection

    Parametric methods for identifying laterally transferred genes exploit the directional mutational biases unique to each genome. Yet the development of new, more robust methods - as well as the evaluation and proper implementation of existing methods - relies on an arbitrary assessment of performance using real genomes, where the evolutionary histories of genes are not known. We have used the framework of a generalized hidden Markov model to create artificial genomes modeled after genuine genomes. To model a genome, "core" genes - those displaying patterns of mutational biases shared among large numbers of genes - are identified by a novel gene clustering approach based on the Akaike information criterion. Gene models derived from multiple "core" gene clusters are used to generate an artificial genome that models the properties of a genuine genome. Chimeric artificial genomes - representing those having experienced lateral gene transfer - were created by combining genes from multiple artificial genomes, and the performance of the parametric methods for identifying "atypical" genes was assessed directly. We found that a hidden Markov model that included multiple gene models, each trained on sets of genes representing the range of genotypic variability within a genome, could produce artificial genomes that mimicked the properties of genuine genomes. Moreover, different methods for detecting foreign genes performed differently - i.e., they had different sets of strengths and weaknesses - when identifying atypical genes within chimeric artificial genomes. © 2005 Azad and Lawrence

    Analysis of spounaviruses as a case study for the overdue reclassification of tailed phages

    Tailed bacteriophages are the most abundant and diverse viruses in the world, with genome sizes ranging from 10 kbp to over 500 kbp. Yet, due to historical reasons, all this diversity is confined to a single virus order, Caudovirales, composed of just four families: Myoviridae, Siphoviridae, Podoviridae, and the newly created Ackermannviridae family. In recent years, this morphology-based classification scheme has started to crumble under the constant flood of phage sequences, revealing that tailed phages are even more genetically diverse than once thought. This prompted us, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV), to consider overall reorganization of phage taxonomy. In this study, we used a wide range of complementary methods (including comparative genomics, core genome analysis, and marker gene phylogenetics) to show that the group of Bacillus phage SPO1-related viruses, previously classified into the Spounavirinae subfamily, is clearly distinct from other members of the family Myoviridae and that its diversity deserves the rank of an autonomous family. Thus, we removed this group from the Myoviridae family and created the family Herelleviridae, a new taxon of the same rank. In the process of the taxon evaluation, we explored the feasibility of different demarcation criteria and critically evaluated the usefulness of our methods for phage classification. The convergence of results, drawing a consistent and comprehensive picture of a new family with associated subfamilies regardless of method, demonstrates that the tools applied here are particularly useful in phage taxonomy. We are convinced that creation of this novel family is a crucial milestone toward much-needed reclassification in the Caudovirales order. Peer reviewed

    ComPhy: Prokaryotic Composite Distance Phylogenies Inferred from Whole-Genome Gene Sets

    doi:10.1186/1471-2105-10-S1-S5. With the increasing availability of whole genome sequences, it is becoming more and more important to use complete genome sequences for inferring species phylogenies. We developed a new tool, ComPhy ('Composite Distance Phylogeny'), based on a composite distance matrix calculated from the comparison of complete gene sets between genome pairs to produce a prokaryotic phylogeny. The composite distance between two genomes is defined by three components: Gene Dispersion Distance (GDD), Genome Breakpoint Distance (GBD) and Gene Content Distance (GCD). GDD quantifies the dispersion of orthologous genes along the genomic coordinates from one genome to another; GBD measures the shared breakpoints between two genomes; GCD measures the level of shared orthologs between two genomes. The phylogenetic tree is constructed from the composite distance matrix using a neighbor-joining method. We tested our method on 9 datasets from 398 completely sequenced prokaryotic genomes. We achieved above 90% agreement in quartet topologies between the tree created by our method and the tree from Bergey's taxonomy. In comparison to several other phylogenetic analysis methods, our method showed consistently better performance. ComPhy is a fast and robust tool for genome-wide inference of evolutionary relationships among genomes. This work was supported in part by NSF/ITR-IIS-0407204.

    Use of Liquid Array Diagnostics (LAD) as a tool for analysing the composition and function of the gut microbiota

    The microbial species residing in the human gut exercise vital functions for the host. They produce different metabolites that are crucial for human wellbeing. A variety of such molecules mediate signalling along the gut-brain axis, regulate host gene expression, develop and maintain intestinal and blood-brain barriers, and are involved in lipogenesis and gluconeogenesis, in addition to taking part in a wide range of other functions. A deviation in the intestinal flora composition is mechanistically linked to various health disorders, including inflammatory bowel disease (IBD), irritable bowel syndrome (IBS), type 2 diabetes, Parkinson’s and Alzheimer’s disease. Such a deviation, known as dysbiosis, represents an unbalanced composition where certain microbial groups are promoted at the expense of others. These species are considered promising biomarkers, valuable for disease diagnosis, monitoring and treatment. Of particular interest are those markers that can additionally unveil phenotypical characteristics, such as the overall level of short-chain fatty acids (SCFA) in human gut samples. The prospect of discovering additional markers is high, considering that the content of healthy human guts worldwide is not fully characterized. The field of gut microbiota is at a stage of switching focus to clinically relevant species, particularly to their rapid detection, as a means of offering simple diagnostic solutions with increased availability and accessibility. This allows biological findings to be put to practical clinical use, which is often not feasible with current species identification platforms. With the intention of filling this need, the main aim of this thesis was to develop a targeted approach for rapid gut microbiota testing based on the novel Liquid Array Diagnostics (LAD) technology. LAD is adapted to target 16S rRNA gene sites unique to specific microbial groups.
Requiring only commonplace qPCR instrumentation, it can detect up to 30 distinct microbial markers in a single-tube multiplex reaction within a working day. LAD’s utility in microbiome studies was validated by testing the prevalence and abundance of 15 microbial markers in 541 samples collected from mothers and their children, as reported in Paper I. Paper II, on the other hand, describes a comprehensive human gut prokaryotic genome collection, HumGut. It was built after screening thousands of human gut metagenome samples, collected from healthy people worldwide, for the presence of any high-quality publicly available prokaryote genome. The main rationale for creating it was to enable functional studies through LAD-based 16S targeting. It was demonstrated that HumGut, as a reference database, aids whole genome sequencing studies by significantly increasing the number of mapped sequencing reads, thus elevating the potential for an improved taxonomic classification. However, as it is, HumGut exhibits limited practical use for 16S rRNA gene targeted approaches like LAD. This is because most of the representative genomes either lack this gene, or the quality of their 16S sequences is compromised (addressed in Paper III). Nonetheless, LAD was exploited to infer a segment of human gut microbiota functionality by targeting the 16S rRNA gene. This was performed based on data retrieved from 16S rDNA sequencing and short-chain fatty acid (SCFA) measurements. LAD’s value in classifying samples with disturbed SCFA ratios (namely a high propionate-to-butyrate ratio), an indication of functional dysbiosis, is presented in Paper IV. Taken together, this thesis introduces two tools, LAD and HumGut, both pointing in the direction of simplified human gut functional analysis via gut microbial composition detection.
The microbial species living in the human gut perform vital functions for the host. They produce various metabolites that are crucial for human health. A number of these molecules take part in processes such as signal transduction along the gut-brain axis, regulation of gene expression, development and maintenance of the intestinal and blood-brain barriers, lipogenesis and gluconeogenesis, as well as a range of other functions. Deviations in the composition of the gut flora can be linked to many different diseases and disorders, including irritable bowel syndrome (IBS), inflammatory bowel disease (IBD), type 2 diabetes, and Parkinson's and Alzheimer's disease. Such deviations, known as dysbiosis, are characterised by certain microbial groups being promoted at the expense of others. These species have potential as biomarkers and can thus be valuable for disease diagnosis and treatment. Particularly promising are gut biomarkers that can be linked to phenotypic traits, such as short-chain fatty acids (SCFA). It is expected that even more such species will be identified in the future, since the microbiota composition of healthy guts has not been fully characterised globally. The microbiota field is now at a stage where the focus is shifting from exploratory studies to the identification of clinically relevant species. Methods that enable rapid detection will then become especially important, as they would provide simple diagnostic solutions available for practical clinical use, which is often not feasible with today's species identification platforms. The main aim of this thesis was to develop a targeted approach for rapid gut microbiota testing based on the new Liquid Array Diagnostics (LAD) principle. LAD is designed to identify sequences in the 16S rRNA gene that are unique to specific microbial markers. The method requires only a standard qPCR instrument and can detect up to 30 different microbial markers in a single test tube within a working day. LAD's utility in microbiome studies was validated by testing the prevalence of 15 microbial markers in 541 samples collected from mothers and their children, as reported in Paper I. Paper II describes the generation of a comprehensive prokaryotic genome collection for the human gut. It was built by screening thousands of metagenomes from gut samples collected from healthy people around the world. The metagenomes were screened for the presence of all publicly available prokaryotic genomes. Low-quality sequences were removed, while all other sequences were gathered into one large reference database, HumGut. The main purpose of creating this reference database was to enable LAD-based functional studies. HumGut was shown to be a useful tool for whole-genome sequencing studies by significantly increasing the number of mapped sequencing reads, which improves taxonomic classification. However, HumGut has limited utility for 16S rRNA-based methods such as LAD. This is because most genomes in the collection either lack this gene entirely or have 16S sequences of insufficient quality (addressed in Paper III). Despite the limitations related to the 16S rRNA gene in HumGut, LAD was used to develop a 16S rDNA-based test for measuring human gut microbiota functionality. This was done using data from 16S sequencing and measurements of short-chain fatty acids (SCFA). LAD's ability to classify samples with disturbed SCFA ratios (namely a high propionate-to-butyrate ratio), an indication of functional dysbiosis, is presented in Paper IV. Taken together, this thesis presents two tools, LAD and HumGut, both of which point toward simplified functional analysis of the human gut through detection of gut microbial composition.