82 research outputs found

    Automated simultaneous analysis phylogenetics (ASAP) : an enabling tool for phlyogenomics

    © 2008 Sarkar et al. This is an Open Access article distributed under the terms of the Creative Commons Attribution License 2.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The definitive version was published in BMC Bioinformatics 9 (2008): 103, doi:10.1186/1471-2105-9-103. The availability of sequences from whole genomes to reconstruct the tree of life has the potential to enable the development of phylogenomic hypotheses in ways that have not previously been possible. A significant bottleneck in the analysis of genomic-scale views of the tree of life is the time required for manual curation of genomic data into multi-gene phylogenetic matrices. To keep pace with the exponentially growing volume of molecular data in the genomic era, we have developed an automated technique, ASAP (Automated Simultaneous Analysis Phylogenetics), to assemble these multigene/multispecies matrices and to evaluate the significance of individual genes within the context of a given phylogenetic hypothesis. Applications of ASAP may enable scientists to re-evaluate species relationships and to develop new phylogenomic hypotheses based on genome-scale data. This work is funded in part by NSF DBI-0421604 to GC and RD. INS is supported in part by the Ellison Medical Foundation.
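The matrix-assembly step described in this abstract can be illustrated with a minimal sketch. This is not the ASAP implementation; it only assumes that each gene is already aligned, and it pads taxa that lack a gene with a missing-data character. The function name `concatenate_alignments`, the gene and species labels, and the `?` placeholder are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]], missing: str = "?"
) -> Tuple[Dict[str, str], Dict[str, Tuple[int, int]]]:
    """Join per-gene alignments into one multi-gene matrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Returns the concatenated matrix and the column range of each gene.
    """
    # Every taxon seen in any gene gets a row in the final matrix.
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    matrix = {t: "" for t in taxa}
    partitions = {}
    start = 0
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned")
        length = lengths.pop()
        for t in taxa:
            # Taxa without this gene are padded with the missing-data symbol.
            matrix[t] += aln.get(t, missing * length)
        partitions[gene] = (start, start + length)
        start += length
    return matrix, partitions


# Hypothetical toy input: two genes, three species.
toy_genes = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
    "geneB": {"sp1": "TT", "sp3": "TA"},
}
toy_matrix, toy_partitions = concatenate_alignments(toy_genes)
```

Recording the column range of each gene keeps the genes separable after concatenation, which is what allows a per-gene evaluation against a given tree.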

    Crystal Structure of Legionella DotD: Insights into the Relationship between Type IVB and Type II/III Secretion Systems

    The Dot/Icm type IVB secretion system (T4BSS) is a pivotal determinant of Legionella pneumophila pathogenesis. L. pneumophila translocate more than 100 effector proteins into host cytoplasm using Dot/Icm T4BSS, modulating host cellular functions to establish a replicative niche within host cells. The T4BSS core complex spanning the inner and outer membranes is thought to be made up of at least five proteins: DotC, DotD, DotF, DotG and DotH. DotH is the outer membrane protein; its targeting depends on lipoproteins DotC and DotD. However, the core complex structure and assembly mechanism are still unknown. Here, we report the crystal structure of DotD at 2.0 Å resolution. The structure of DotD is distinct from that of VirB7, the outer membrane lipoprotein of the type IVA secretion system. In contrast, the C-terminal domain of DotD is remarkably similar to the N-terminal subdomain of secretins, the integral outer membrane proteins that form substrate conduits for the type II and the type III secretion systems (T2SS and T3SS). A short β-segment in the otherwise disordered N-terminal region, located on the hydrophobic cleft of the C-terminal domain, is essential for outer membrane targeting of DotH and Dot/Icm T4BSS core complex formation. These findings uncover an intriguing link between T4BSS and T2SS/T3SS.

    Single-Nucleotide Polymorphism Genotyping Identifies a Locally Endemic Clone of Methicillin-Resistant Staphylococcus aureus

    We developed, tested, and applied a TaqMan real-time PCR assay for interrogation of three single-nucleotide polymorphisms that differentiate a clade (termed ‘t003-X’) within the radiation of methicillin-resistant Staphylococcus aureus (MRSA) ST225. The TaqMan assay achieved 98% typeability and results were fully concordant with DNA sequencing. By applying this assay to 305 ST225 isolates from an international collection, we demonstrate that clade t003-X has been endemic in a single acute-care hospital in Germany since at least 2006, where it has caused a substantial proportion of infections. The strain was also detected in another hospital located 16 kilometers away. Strikingly, however, clade t003-X was not found in 62 other hospitals throughout Germany nor among isolates from other countries, and, hence, displayed a very restricted geographical distribution. Consequently, our results show that SNP-typing may be useful to identify and track MRSA clones that are specific to individual healthcare institutions. In contrast, the spatial dissemination pattern observed here had not been resolved by other typing procedures, including multilocus sequence typing (MLST), spa typing, DNA macrorestriction, and multilocus variable-number tandem repeat analysis (MLVA).

    The Evolution of the Major Hepatitis C Genotypes Correlates with Clinical Response to Interferon Therapy

    Patients chronically infected with hepatitis C virus (HCV) require significantly different durations of therapy and achieve substantially different sustained virologic response rates to interferon-based therapies, depending on the HCV genotype with which they are infected. There currently exists no systematic framework that explains these genotype-specific response rates. Since humans are the only known natural hosts for HCV (a virus that is at least hundreds of years old), one possibility is that over the time frame of this relationship, HCV accumulated adaptive mutations that confer increasing resistance to the human immune system. Given that interferon therapy functions by triggering an immune response, we hypothesized that clinical response rates are a reflection of viral evolutionary adaptations to the immune system. We have performed the first phylogenetic analysis to include all available full-length HCV genomic sequences (n = 345). This resulted in a new cladogram of HCV. This tree establishes for the first time the relative evolutionary ages of the major HCV genotypes. The outcome data from prospective clinical trials that studied interferon and ribavirin therapy were then mapped onto this new tree. This mapping revealed a correlation between genotype-specific responses to therapy and respective genotype age. This correlation allows us to predict that genotypes 5 and 6, for which there currently are no published prospective trials, will likely have intermediate response rates, similar to genotype 3. Ancestral protein sequence reconstruction was also performed, which identified the HCV proteins E2 and NS5A as potential determinants of genotype-specific clinical outcome. Biochemical studies have independently identified these same two proteins as having genotype-specific abilities to inhibit the innate immune factor double-stranded RNA-dependent protein kinase (PKR). An evolutionary analysis of all available HCV genomes supports the hypothesis that immune selection was a significant driving force in the divergence of the major HCV genotypes and that viral factors that acquired the ability to inhibit the immune response may play a role in determining genotype-specific response rates to interferon therapy.

    Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data

    BACKGROUND: A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. METHODOLOGY/PRINCIPAL FINDINGS: We show that hierarchical clustering algorithms that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. CONCLUSIONS: Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually-created mapping of protein families. As demonstrated, it can also provide insights into erroneous and missed annotations.
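The difference between the bottom-up and top-down directions can be sketched on one-dimensional toy data. The code below is only an illustration under simplifying assumptions: the agglomerative routine uses single linkage, and the divisive routine splits at the widest gap, which is a simple stand-in and not the density-based TD algorithm or the ClustTree toolbox mentioned in the abstract. The function names and the data are hypothetical.

```python
from typing import List


def agglomerative(points: List[float], k: int) -> List[List[float]]:
    """Bottom-up: start from singletons and merge the two closest clusters."""
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > k:
        # In sorted 1-D data the closest pair under single linkage is adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def divisive(points: List[float], k: int) -> List[List[float]]:
    """Top-down: start from one cluster and split at the widest internal gap."""
    clusters = [sorted(points)]
    while len(clusters) < k:
        best = None  # (gap width, cluster index, split position)
        for ci, c in enumerate(clusters):
            for j in range(1, len(c)):
                gap = c[j] - c[j - 1]
                if best is None or gap > best[0]:
                    best = (gap, ci, j)
        if best is None:
            break  # only singletons remain, nothing left to split
        _, ci, j = best
        c = clusters[ci]
        clusters[ci:ci + 1] = [c[:j], c[j:]]
    return clusters


# Hypothetical toy data with three visible groups.
toy_points = [10, 12, 11, 50, 53, 90]
bottom_up = agglomerative(toy_points, 3)
top_down = divisive(toy_points, 3)
```

The agglomerative routine only ever looks at the closest pair of clusters, while the divisive routine inspects all of the data before its first cut; this is the local versus global distinction the abstract draws.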

    An Integrated Approach for Finding Overlooked Genes in Shigella

    Background: The completion of numerous genome sequences introduced an era of whole-genome study. However, many genes are missed during genome annotation, including small RNAs (sRNAs) and small open reading frames (sORFs). In order to improve genome annotation, we aimed to identify novel sRNAs and sORFs in Shigella, the principal etiologic agents of bacillary dysentery. Methodology/Principal Findings: Based on sequence conservation, we identified 64 sRNAs in Shigella that had been experimentally validated in other bacteria. We employed computer-based and tiling array-based methods to search for sRNAs, followed by RT-PCR and northern blots, to identify nine sRNAs in Shigella flexneri strain 301 (Sf301) and 256 regions containing possible sRNA genes. We found 29 candidate sORFs using bioinformatic prediction, array hybridization and RT-PCR verification. We experimentally validated 557 (57.9%) DOOR operon predictions in the chromosomes of Sf301 and 46 (76.7%) in the virulence plasmid. We found 40 additional co-expressed gene pairs that were not predicted by DOOR. Conclusions/Significance: We provide an updated and comprehensive annotation of the Shigella genome. Our study increased the expected numbers of sORFs and sRNAs, which will impact future functional genomics and proteomics studies. Our method can be used for large-scale reannotation of sRNAs and sORFs in any microbe with a known genome.

    A Differentiation-Based Phylogeny of Cancer Subtypes

    Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. In this paper, we introduce a novel computational algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data from leukemia, breast cancer and liposarcoma subtypes and then apply it to a broader group of sarcomas. The resulting ranking of tumor subtypes allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.

    Identification of the Pangenome and Its Components in 14 Distinct Aggregatibacter actinomycetemcomitans Strains by Comparative Genomic Analysis

    Aggregatibacter actinomycetemcomitans is genetically heterogeneous and comprises distinct clonal lineages that may have different virulence potentials. However, limited information on strain-to-strain genomic variation is available. The genome sequences of 11 A. actinomycetemcomitans strains (serotypes a-f) were generated de novo, annotated and combined with three previously sequenced genomes (serotypes a-c) for comparative genomic analysis. Two major groups were identified: serotypes a, d, e, and f, and serotypes b and c. A serotype e strain was found to be distinct from both groups. The size of the pangenome was 3,301 genes, which included 2,034 core genes and 1,267 flexible genes. The number of core genes is estimated to stabilize at 2,060, while the size of the pangenome is estimated to increase by 16 genes with every additional strain sequenced in the future. Within each strain 16.7-29.4% of the genome belonged to the flexible gene pool. Between any two strains 0.4-19.5% of the genomes were different. The genomic differences were occasionally greater for strains of the same serotype than for strains of different serotypes. Furthermore, 171 genomic islands were identified. Cumulatively, 777 strain-specific genes were found on these islands and represented 61% of the flexible gene pool. Substantial genomic differences were detected among A. actinomycetemcomitans strains. Genomic islands account for more than half of the flexible genes. The phenotype and virulence of A. actinomycetemcomitans may not be defined by any single strain. Moreover, the genomic variation within each clonal lineage of A. actinomycetemcomitans (as defined by serotype grouping) may be greater than between clonal lineages. The large genomic data set in this study will be useful to further examine the molecular basis of variable virulence among A. actinomycetemcomitans strains.
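The pangenome, core and flexible gene pools reported above relate to one another as simple set operations, which a minimal sketch can make concrete. It assumes genes have already been grouped into shared identifiers across strains (the hard part of a real analysis, not shown here); the strain names, gene identifiers and function names are hypothetical toy values, not data from the study.

```python
from typing import Dict, Set, Tuple


def pangenome_summary(
    strains: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pangenome, core, flexible) gene sets for the given strains."""
    gene_sets = list(strains.values())
    pangenome = set().union(*gene_sets)      # genes found in any strain
    core = set.intersection(*gene_sets)      # genes found in every strain
    flexible = pangenome - core              # everything that is not core
    return pangenome, core, flexible


def flexible_fraction(strains: Dict[str, Set[str]], strain: str) -> float:
    """Fraction of one strain's genes that fall outside the core set."""
    _, core, _ = pangenome_summary(strains)
    genes = strains[strain]
    return len(genes - core) / len(genes)


# Hypothetical toy data: three strains with partly shared gene content.
toy_strains = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g5"},
    "strainC": {"g1", "g2", "g3", "g6"},
}
toy_pan, toy_core, toy_flexible = pangenome_summary(toy_strains)
```

By construction the core and flexible sets partition the pangenome, which matches the reported totals: 2,034 core genes plus 1,267 flexible genes give the 3,301-gene pangenome.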

    Diversity of 16S-23S rDNA Internal Transcribed Spacer (ITS) Reveals Phylogenetic Relationships in Burkholderia pseudomallei and Its Near-Neighbors

    Length polymorphisms within the 16S-23S ribosomal DNA internal transcribed spacer (ITS) have been described as stable genetic markers for studying bacterial phylogenetics. In this study, we used these genetic markers to investigate phylogenetic relationships in Burkholderia pseudomallei and its near-relative species. B. pseudomallei is known as one of the most genetically recombined bacterial species. In silico analysis of multiple B. pseudomallei genomes revealed approximately four homologous rRNA operons and ITS length polymorphisms therein. We characterized ITS distribution in 1,191 B. pseudomallei strains using PCR followed by high-throughput capillary electrophoresis. Three major ITS types were identified, two of which were commonly found in most B. pseudomallei strains from the endemic areas, whereas the third one was significantly correlated with worldwide sporadic strains. Interestingly, mixtures of the two common ITS types were observed within the same strains, and at a greater incidence in Thailand than in Australia, suggesting that genetic recombination causes the ITS variation within species, with greater recombination frequency in Thailand. In addition, the B. mallei ITS type was common to B. pseudomallei, providing further support that B. mallei is a clone of B. pseudomallei. Other B. pseudomallei near-neighbors possessed unique and monomorphic ITS types. Our data shed light on evolutionary patterns of B. pseudomallei and its near-relative species.

    Culture Enriched Molecular Profiling of the Cystic Fibrosis Airway Microbiome

    The microbiome of the respiratory tract, including the nasopharyngeal and oropharyngeal microbiota, is a dynamic community of microorganisms that is highly diverse. The cystic fibrosis (CF) airway microbiome refers to the polymicrobial communities present in the lower airways of CF patients. It comprises chronic opportunistic pathogens (such as Pseudomonas aeruginosa) and a variety of organisms derived mostly from the normal microbiota of the upper respiratory tract. The complexity of these communities has been inferred primarily from culture-independent molecular profiling. As with most microbial communities, it is generally assumed that most of the organisms present are not readily cultured. Our culture collection, generated using more extensive cultivation approaches, reveals a more complex microbial community than that obtained by conventional CF culture methods. To directly evaluate the cultivability of the airway microbiome, we examined six samples in depth using culture-enriched molecular profiling, which combines culture-based methods with the molecular profiling methods of terminal restriction fragment length polymorphisms and 16S rRNA gene sequencing. We demonstrate that combining culture-dependent and culture-independent approaches yields greater sensitivity than either approach alone. Our techniques were able to cultivate 43 of the 48 families detected by deep sequencing; the five families recovered solely by culture-independent approaches were all present at very low abundance (<0.002% total reads). Of the molecular signatures detected by culture from the six patients, 46% were identified only in an anaerobic environment, suggesting that a large proportion of the cultured airway community is composed of obligate anaerobes. Most significantly, using 20 growth conditions per specimen, half of which included anaerobic cultivation and extended incubation times, we demonstrate that the majority of bacteria present can be cultured.