88 research outputs found

    Next-Generation Transcriptome Assembly: Strategies and Performance Analysis

    Accurate and comprehensive transcriptome assemblies lay the foundation for a range of analyses, such as differential gene expression analysis, metabolic pathway reconstruction, novel gene discovery, or metabolic flux analysis. With the arrival of next-generation sequencing technologies, it has become possible to acquire whole-transcriptome data rapidly, even from non-model organisms. However, accurately assembling the transcriptome for any given sample remains extremely challenging, especially in species with a high prevalence of recent gene or genome duplications, those with alternative splicing of transcripts, or those whose genomes are not well studied. In this chapter, we provided a detailed overview of the strategies used for transcriptome assembly. We reviewed the different statistics available for measuring the quality of transcriptome assemblies, with emphasis on the types of errors each statistic does and does not detect. We also reviewed simulation protocols to computationally generate RNAseq data that present biologically realistic problems such as gene expression bias and alternative splicing. Using such simulated RNAseq data, we presented a comparison of the accuracy, strengths, and weaknesses of nine representative transcriptome assemblers, including de novo, genome-guided, and ensemble methods.

    Evolution of SET-Domain Protein Families in the Unicellular and Multicellular Ascomycota Fungi

    Background: The evolution of multicellularity is accompanied by the occurrence of differentiated tissues, of organismal developmental programs, and of mechanisms keeping the balance between proliferation and differentiation. Initially, the SET-domain proteins were associated exclusively with regulation of developmental genes in metazoa. However, the finding of SET-domain genes in the unicellular yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe suggested that SET-domain proteins regulate a much broader variety of biological programs. Intuitively, it is expected that the numbers, types, and biochemical specificity of SET-domain proteins of multicellular versus unicellular forms would reflect the differences in their biology. However, comparisons across the unicellular and multicellular domains of life are complicated by the lack of knowledge of the ancestral SET-domain genes. Even within the crown group, different biological systems might use the epigenetic 'code' differently, adapting it to organism-specific needs. Simplifying the model, we undertook a systematic phylogenetic analysis of one monophyletic fungal group (Ascomycetes) containing unicellular yeasts, Saccharomycotina (hemiascomycetes), and a filamentous fungal group, Pezizomycotina (euascomycetes). Results: Systematic analysis of the SET-domain genes across an entire eukaryotic phylum has outlined clear distinctions in the SET-domain gene collections in the unicellular and in the multicellular (filamentous) relatives; diversification of SET-domain gene families has increased further with the expansion and elaboration of multicellularity in animal and plant systems. We found several ascomycota-specific SET-domain gene groups; each was unique to either Saccharomycotina or Pezizomycotina fungi. Our analysis revealed that the numbers and types of SET-domain genes in the Saccharomycotina did not reflect the habitats, pathogenicity, mechanisms of sexuality, or the ability to undergo morphogenic transformations. However, novel genes have appeared for functions associated with the transition to multicellularity. Descendants of most of the SET-domain gene families found in the filamentous fungi could be traced in the genomes of extant animals and plants, albeit as more complex structural forms. Conclusion: SET-domain genes found in the filamentous species but absent from the unicellular sister group reflect two alternative evolutionary events: deletion from the yeast genomes or appearance of novel structures in filamentous fungal groups. There were no Ascomycota-specific SET-domain gene families (i.e., absent from animal and plant genomes); however, plants and animals share SET-domain gene subfamilies that do not exist in the fungi. Phylogenetic and gene-structure analyses defined several animal and plant SET-domain genes as sister groups while those of fungal origin were basal to them. Plants and animals also share SET-domain subfamilies that do not exist in fungi.

    Evolution of the Kdo2-lipid A biosynthesis in bacteria

    Background: Lipid A is the highly immunoreactive endotoxic center of lipopolysaccharide (LPS). It anchors the LPS into the outer membrane of most Gram-negative bacteria. Lipid A can be recognized by animal cells, triggers defense-related responses, and causes Gram-negative sepsis. The biosynthesis of Kdo2-lipid A, the LPS substructure, involves nine enzymatic steps. Results: In order to elucidate the evolutionary pathway of Kdo2-lipid A biosynthesis, we examined the distribution of genes encoding the nine enzymes across bacteria. We found that not all Gram-negative bacteria have all nine enzymes. Some Gram-negative bacteria have no genes encoding these enzymes and others have genes only for the first four enzymes (LpxA, LpxC, LpxD, and LpxB). Among the nine enzymes, five appeared to have arisen from three independent gene duplication events. Two of these events happened within the Proteobacteria lineage, followed by functional specialization of the duplicated genes and pathway optimization in these bacteria. Conclusions: The nine-enzyme pathway, which was established based on studies mainly in Escherichia coli K12, appears to be the most derived and optimized form. It is found only in E. coli and related Proteobacteria. Simpler and probably less efficient pathways are found in other bacterial groups, with Kdo2-lipid A variants as the likely end products. The Kdo2-lipid A biosynthetic pathway exemplifies extremely plastic evolution of bacterial genomes, especially those of Proteobacteria, and how these mainly pathogenic bacteria have adapted to their environment.

    Codon usage in twelve species of Drosophila

    Background: Codon usage bias (CUB), the uneven use of synonymous codons, is a ubiquitous observation in virtually all organisms examined. The pattern of codon usage is generally similar among closely related species, but differs significantly among distantly related organisms, e.g., bacteria, yeast, and Drosophila. Several explanations for CUB have been offered and some have been supported by observations and experiments, although a thorough understanding of the evolutionary forces (random drift, mutation bias, and selection) and their relative importance remains to be determined. The recently available complete genome DNA sequences of twelve phylogenetically defined species of Drosophila offer a hitherto unprecedented opportunity to examine these problems. We report here the patterns of codon usage in the twelve species and offer insights on possible evolutionary forces involved. Results: (1) Codon usage is quite stable across 11 of the 12 species: G- and especially C-ending codons are used most frequently, thus defining the preferred codons. (2) The only amino acid that changes in preferred codon is Serine, with six species of the melanogaster group favoring TCC while the other species, particularly subgenus Drosophila species, favor AGC. (3) D. willistoni is an exception to these generalizations in having a shifted codon usage for seven amino acids toward A/T in the wobble position. (4) Amino acids differ in their contribution to overall CUB, Leu having the greatest and Asp the least. (5) Among two-fold degenerate amino acids, A/G-ending amino acids have more selection on codon usage than T/C-ending amino acids. (6) Among the different chromosome arms or elements, genes on the non-recombining element F (dot chromosome) have the least CUB, while genes on the element A (X chromosome) have the most. (7) Introns indicate that mutation bias in all species is approximately 2:1, AT:GC, the opposite of codon usage bias. (8) There is also evidence for some overall regional bias in base composition that may influence codon usage. Conclusion: Overall, these results suggest that natural selection has acted on codon usage in the genus Drosophila, at least often enough to leave a footprint of selection in modern genomes. However, there is evidence in the data that random forces (drift and mutation) have also left patterns in the data, especially in genes under weak selection for codon usage, for example genes in regions of low recombination. The documentation of codon usage patterns in each of these twelve genomes also aids in ongoing annotation efforts.

    Assessing Multiple Sequence Alignments Using Visual Tools

    Bioinformatics and molecular evolutionary analyses most often start with comparing DNA or amino acid sequences by aligning them. Pairwise alignment, for example, is used to measure the similarities between a query sequence and each of those in a database in BLAST similarity search, the most used bioinformatics tool (Altschul et al., 1990; Camacho et al.

    7TMRmine: a Web server for hierarchical mining of 7TMR proteins

    Background: Seven-transmembrane region-containing receptors (7TMRs) play central roles in eukaryotic signal transduction. Due to their biomedical importance, thorough mining of 7TMRs from diverse genomes has been an active target of bioinformatics and pharmacogenomics research. The need for new and accurate 7TMR/GPCR prediction tools is paramount with the accelerated rate of acquisition of diverse sequence information. Currently available and often used protein classification methods (e.g., profile hidden Markov Models) are highly accurate for identifying membership among already known 7TMR subfamilies. However, these alignment-based methods are less effective for identifying remote similarities, e.g., identifying proteins from highly divergent or possibly new 7TMR families. In this regard, more sensitive (e.g., alignment-free) methods are needed to complement the existing protein classification methods. A better strategy would be to combine different classifiers, from more specific to more sensitive methods, to identify a broader spectrum of 7TMR protein candidates. Description: We developed a Web server, 7TMRmine, by integrating alignment-free and alignment-based classifiers specifically trained to identify candidate 7TMR proteins as well as transmembrane (TM) prediction methods. This new tool enables researchers to easily assess the distribution of GPCR functionality in diverse genomes or individual newly-discovered proteins. 7TMRmine is easily customized and facilitates exploratory analysis of diverse genomes. Users can integrate various alignment-based, alignment-free, and TM-prediction methods in any combination and in any hierarchical order. Sixteen classifiers (including two TM-prediction methods) are available on the 7TMRmine Web server. The 7TMRmine tool can be used not only for 7TMR mining but also for general TM-protein analysis. Users can submit protein sequences for analysis, or explore pre-analyzed results for multiple genomes. The server currently includes prediction results and the summary statistics for 68 genomes. Conclusion: 7TMRmine facilitates the discovery of 7TMR proteins. By combining prediction results from different classifiers in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. 7TMRmine can also be used as a general TM-protein classifier. Comparisons of TM and 7TMR protein distributions among 68 genomes revealed interesting differences in evolution of these protein families among major eukaryotic phyla.

    Molecular evolution of urea amidolyase and urea carboxylase in fungi

    Background: Urea amidolyase breaks down urea into ammonia and carbon dioxide in a two-step process, while another enzyme, urease, does this in a one-step process. Urea amidolyase has been found only in some fungal species among eukaryotes. It contains two major domains: the amidase and urea carboxylase domains. A shorter form of urea amidolyase is known as urea carboxylase and has no amidase domain. Eukaryotic urea carboxylase has been found only in several fungal species and green algae. In order to elucidate the evolutionary origin of urea amidolyase and urea carboxylase, we studied the distribution of urea amidolyase, urea carboxylase, as well as other proteins including urease, across kingdoms. Results: Among the 64 fungal species we examined, only those in two Ascomycota classes (Sordariomycetes and Saccharomycetes) had the urea amidolyase sequences. Urea carboxylase was found in many but not all of the species in the phylum Basidiomycota and in the subphylum Pezizomycotina (phylum Ascomycota). It was completely absent from the class Saccharomycetes (phylum Ascomycota; subphylum Saccharomycotina). Four Sordariomycetes species we examined had both the urea carboxylase and the urea amidolyase sequences. Phylogenetic analysis showed that these two enzymes appeared to have gone through independent evolution since their bacterial origin. The amidase domain and the urea carboxylase domain sequences from fungal urea amidolyases clustered strongly together with the amidase and urea carboxylase sequences, respectively, from a small number of beta- and gammaproteobacteria. On the other hand, fungal urea carboxylase proteins clustered together with another copy of urea carboxylases distributed broadly among bacteria. The urease proteins were found in all the fungal species examined except for those of the subphylum Saccharomycotina. Conclusions: We conclude that the urea amidolyase genes currently found only in fungi are the results of a horizontal gene transfer event from beta-, gamma-, or related species of proteobacteria. The event took place before the divergence of the subphyla Pezizomycotina and Saccharomycotina but after the divergence of the subphylum Taphrinomycotina. Urea carboxylase genes currently found in fungi and other limited organisms were also likely derived from another ancestral gene in bacteria. Our study presented another important example showing plastic and opportunistic genome evolution in bacteria and fungi and their evolutionary interplay.

    Carbon dioxide receptor genes and their expression profile in Diabrotica virgifera virgifera

    Background: Diabrotica virgifera virgifera, western corn rootworm, is one of the most devastating species in North America. D. v. virgifera neonates crawl through the soil to locate the roots on which they feed. Carbon dioxide (CO2) is one of the important volatile cues that attract D. v. virgifera larvae to roots. Results: In this study, we identified three putative D. v. virgifera gustatory receptor genes (Dvv_Gr1, Dvv_Gr2, and Dvv_Gr3). Phylogenetic analyses confirmed their orthologous relationships with known insect CO2 receptor genes from Drosophila, mosquitoes, and Tribolium. The phylogenetic reconstruction of insect CO2 receptor proteins and the gene expression profiles were analyzed. Quantitative analysis of gene expression indicated that the patterns of expression of these three candidate genes vary among larval tissues (i.e., head, integument, fat body, and midgut) and different developmental stages (i.e., egg, three larval stages, adult male and female). Conclusion: The Dvv_Gr2 gene exhibited highest expression in heads and neonates, suggesting its importance in allowing neonate larvae to orient to their host plant. Similar expression patterns across tissues and developmental stages for Dvv_Gr1 and Dvv_Gr3 suggest a potentially different role. Findings from this study will allow further exploration of the functional role of specific CO2 receptor proteins in D. v. virgifera.
