69 research outputs found

    Peptide markers of aminoacyl tRNA synthetases facilitate taxa counting in metagenomic data

    BACKGROUND: Taxa counting is a major problem faced by analysis of metagenomic data. The most popular method relies on analysis of 16S rRNA sequences, but some studies employ also protein based analyses. It would be advantageous to have a method that is applicable directly to short sequences, of the kind extracted from samples in modern metagenomic research. This is achieved by the technique proposed here. RESULTS: We employ specific peptides, deduced from aminoacyl tRNA synthetases, as markers for the occurrence of single genes in data. Sequences carrying these markers are aligned and compared with each other to provide a lower limit for taxa counts in metagenomic data. The method is compared with 16S rRNA searches on a set of known genomes. The taxa counting problem is analyzed mathematically and a heuristic algorithm is proposed. When applied to genomic contigs of a recent human gut microbiome study, the taxa counting method provides information on numbers of different species and strains. We then apply our method to short read data and demonstrate how it can be calibrated to cope with errors. Comparison to known databases leads to estimates of the percentage of novelties, and the type of phyla involved. CONCLUSIONS: A major advantage of our method is its simplicity: it relies on searching sequences for the occurrence of just 4000 specific peptides belonging to the S61 subgroup of aaRS enzymes. When compared to other methods, it provides additional insight into the taxonomic contents of metagenomic data. Furthermore, it can be directly applied to short read data, avoiding the need for genomic contig reconstruction, and taking into account short reads that are otherwise discarded as singletons. Hence it is very suitable for a fast analysis of next generation sequencing data.

    Relating tissue specialization to the differentiation of expression of singleton and duplicate mouse proteins

    BACKGROUND: Gene duplications have been hypothesized to be a major factor in enabling the evolution of tissue differentiation. Analyses of the expression profiles of duplicate genes in mammalian tissues have indicated that, with time, the expression patterns of duplicate genes diverge and become more tissue specific. We explored the relationship between duplication events, the time at which they took place, and both the expression breadth of the duplicated genes and the cumulative expression breadth of the gene family to which they belong. RESULTS: We show that only duplicates that arose through post-multicellularity duplication events show a tendency to become more specifically expressed, whereas such a tendency is not observed for duplicates that arose in a unicellular ancestor. Unlike the narrow expression profile of the duplicated genes, the overall expression of gene families tends to maintain a global expression pattern. CONCLUSION: The work presented here supports the view suggested by the subfunctionalization model, namely that expression divergence in different tissues, following gene duplication, promotes the retention of a gene in the genome of multicellular species. The global expression profile of the gene families suggests division of expression between family members, whose expression becomes specialized. Because specialization of expression is coupled with an increased rate of sequence divergence, it can facilitate the evolution of new, tissue-specific functions

    Emergence, development and diversification of the TGF-β signalling pathway within the animal kingdom

    BACKGROUND: The question of how genomic processes, such as gene duplication, give rise to co-ordinated organismal properties, such as emergence of new body plans, organs and lifestyles, is of importance in developmental and evolutionary biology. Herein, we focus on the diversification of the transforming growth factor-β (TGF-β) pathway, one of the fundamental and versatile metazoan signal transduction engines. RESULTS: After an investigation of 33 genomes, we show that the emergence of the TGF-β pathway coincided with appearance of the first known animal species. The primordial pathway repertoire consisted of four Smads and four receptors, similar to those observed in the extant genome of the early diverging tablet animal (Trichoplax adhaerens). We subsequently retrace duplications in ancestral genomes on the lineage leading to humans, as well as lineage-specific duplications, such as those which gave rise to novel Smads and receptors in teleost fishes. We conclude that the diversification of the TGF-β pathway can be parsimoniously explained according to the 2R model, with additional rounds of duplications in teleost fishes. Finally, we investigate duplications followed by accelerated evolution which gave rise to an atypical TGF-β pathway in free-living bacterial feeding nematodes of the genus Rhabditis. CONCLUSION: Our results challenge the view of well-conserved developmental pathways. The TGF-β signal transduction engine has expanded through gene duplication, continually adopting new functions, as animals grew in anatomical complexity, colonized new environments, and developed an active immune system.

    Metabolic-network-driven analysis of bacterial ecological strategies

    Bacterial ecological strategies revealed by metabolic network analysis show that ecological diversity correlates with metabolic flexibility, faster growth rate and intense co-habitation

    Genetic and metabolic analyses of Candidatus Liberibacter solanacearum infecting carrot

    Insect-vectored plant bacterial pathogens have gained attention in recent years due to crop-threatening outbreaks around the world. Candidatus Liberibacter spp. infect crops of different botanical families (Solanaceae, Rutaceae, and Apiaceae) and are vectored by psyllids. Five genetic haplotypes (A-E) have been described thus far for the species Ca. Liberibacter solanacearum (Lso). Haplotypes A and B infect solanaceous plants, whereas haplotypes C-E infect Apiaceae crops. To better understand the genetic basis that governs host specificity of Lso haplotypes, we sequenced the genome of haplotype D (LsoD). The LsoD genome size is 1.23 Mbp, with a GC content of 34.8% and 1167 predicted genes. Enzyme Commission (EC) numbers were assigned using the JGI software tool and 358 ECs were identified. ECs were mapped to metabolic pathways and compared with other sequenced Liberibacters. Phylogenetic analysis based on ECs and assigned metabolic pathways shows that LsoD groups together with Lso haplotypes A and B and is clearly different from Liberibacter species infecting citrus. Differences between LsoD and LsoA/B haplotypes were also found, hinting at host-specific enzymes. The LsoD genome was also scanned to identify putatively secreted proteins using the SignalP tool. Thirty-one putative genes were identified, most of them with unknown function. While some genes have homologs in other Lso haplotypes, some were unique to LsoD. By quantitative PCR we examined the expression of the putatively secreted proteins in the different hosts: the psyllid vector Bactericera trigonica, and carrot. Several genes with significantly higher expression levels in carrot compared with psyllid, and vice versa, were identified. These genes may have host-specific functions. Overall, our analyses reveal genetic and metabolic elements differentiating the carrot-infecting Lso from Lso haplotypes infecting potato/tomato. Research is underway to identify the function of these elements.

    Denoising inferred functional association networks obtained by gene fusion analysis.

    BACKGROUND: Gene fusion detection - also known as the 'Rosetta Stone' method - involves the identification of fused composite genes in a set of reference genomes, which indicates potential interactions between its un-fused counterpart genes in query genomes. The precision of this method typically improves with an ever-increasing number of reference genomes. RESULTS: In order to explore the usefulness and scope of this approach for protein interaction prediction and generate a high-quality, non-redundant set of interacting pairs of proteins across a wide taxonomic range, we have exhaustively performed gene fusion analysis for 184 genomes using an efficient variant of a previously developed protocol. By analyzing interaction graphs and applying a threshold that limits the maximum number of possible interactions within the largest graph components, we show that we can reduce the number of implausible interactions due to the detection of promiscuous domains. With this generally applicable approach, we generate a robust set of over 2 million distinct and testable interactions encompassing 696,894 proteins in 184 species or strains, most of which have never been the subject of high-throughput experimental proteomics. We investigate the cumulative effect of increasing numbers of genomes on the fidelity and quantity of predictions, and show that, for large numbers of genomes, predictions do not become saturated but continue to grow linearly, for the majority of the species. We also examine the percentage of component (and composite) proteins with relation to the number of genes and further validate the functional categories that are highly represented in this robust set of detected genome-wide interactions. CONCLUSION: We illustrate the phylogenetic and functional diversity of gene fusion events across genomes, and their usefulness for accurate prediction of protein interaction and function

    Relationship between the tissue-specificity of mouse gene expression and the evolutionary origin and function of the proteins

    BACKGROUND: The combination of complete genome sequence information with expression data enables us to characterize the relationship between a protein's evolutionary origin or functional category and its expression pattern. In this study, mouse proteins were assigned into functional and phyletic groups and the gene expression patterns of the different protein groupings were examined by microarray analysis in various mouse tissues. RESULTS: Our results suggest that the proteins that are universally distributed in all tissues are predominantly enzymes and transporters. In contrast, the tissue-specific set is dominated by regulatory proteins (signal transduction and transcription factors). An increased tendency to tissue-specificity is observed for metazoan-specific proteins. As the composition of the phyletic groups highly correlates with that of the functional groups, the data were tested in order to determine which of the two factors - function or phyletic age - is dominant in shaping the expression profile of a protein. The observed differences in expression patterns of genes between functional groups were found mainly to reflect their different phyletic origin. The connection between tissue specificity and phyletic age cannot be explained by the recent rate of evolution. Finally, although metazoan-specific proteins tend to be tissue-specific compared with phyletically conserved proteins present in all domains of life, many such 'universal' proteins are also tissue-specific. CONCLUSION: The minimal cellular transcriptome of the metazoan cell differs from that of the ancestral unicellular eukaryote: new functions were added (metazoan-specific proteins), whilst other functions became specialized and no longer took place in all cells (tissue-specific pre-metazoan proteins)

    Construction, visualisation, and clustering of transcription networks from microarray expression data.

    Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account. Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins. However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context. We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data. These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions. Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets. This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques. This work has been implemented in a freely available open-source application named BioLayout Express(3D)

    Genome Analysis of Haplotype D of Candidatus Liberibacter Solanacearum

    Candidatus Liberibacter solanacearum (Lso) haplotype D (LsoD) is a suspected bacterial pathogen, spread by the phloem-feeding psyllid Bactericera trigonica Hodkinson and found to infect carrot plants throughout the Mediterranean. Haplotype D is one of six haplotypes of Lso that each have specific and overlapping host preferences, disease symptoms, and psyllid vectors. Genotyping of rRNA genes has allowed for tracking the haplotype diversity of Lso and genome sequencing of several haplotypes has been performed to advance a comprehensive understanding of Lso diseases and of the phylogenetic relationships among the haplotypes. To further pursue that aim we have sequenced the genome of LsoD from its psyllid vector and report here its draft genome. Genome-based single nucleotide polymorphism analysis indicates LsoD is most closely related to the A haplotype. Genomic features and the metabolic potential of LsoD are assessed in relation to Lso haplotypes A, B, and C, as well as the facultative strain Liberibacter crescens. We identify genes unique to haplotype D as well as putative secreted effectors that may play a role in disease characteristics specific to this haplotype of Lso