125 research outputs found

    DarkHorse: a method for genome-wide prediction of horizontal gene transfer

    A new approach to rapid, genome-wide identification and ranking of horizontal transfer candidate proteins is presented. The method is quantitative, reproducible, and computationally undemanding. It can be combined with genomic signature and/or phylogenetic tree-building procedures to improve accuracy and efficiency. The method is also useful for retrospective assessments of horizontal transfer prediction reliability, recognizing orthologous sequences that may have been previously overlooked or unavailable. These features are demonstrated in bacterial, archaeal, and eukaryotic examples.

    Genome-wide prediction and identification of cis-natural antisense transcripts in Arabidopsis thaliana

    BACKGROUND: Natural antisense transcripts (NATs) are a class of endogenous coding or non-protein-coding RNAs with sequence complementarity to other transcripts. Several lines of evidence have shown that cis- and trans-NATs may participate in a broad range of gene regulatory events. Genome-wide identification of cis-NATs in human, mouse and rice has revealed their widespread occurrence in eukaryotes. However, little is known about cis-NATs in the model plant Arabidopsis thaliana. RESULTS: We developed a new computational method to predict and identify cis-encoded NATs in Arabidopsis and found 1,340 potential NAT pairs. The expression of both sense and antisense transcripts of 957 NAT pairs was confirmed using Arabidopsis full-length cDNAs and public massively parallel signature sequencing (MPSS) data. Three known or putative Arabidopsis imprinted genes have cis-antisense transcripts. Sequences and the genomic arrangement of two Arabidopsis NAT pairs are conserved in rice. CONCLUSION: We combined information from full-length cDNAs and Arabidopsis genome annotation in our NAT prediction work and reported cis-NAT pairs that could not have been identified using either dataset alone. Analysis of MPSS data suggested that for most Arabidopsis cis-NAT pairs, one of the two transcripts is predominantly expressed in a tissue-specific manner.

    Alternative splicing of mouse transcription factors affects their DNA-binding domain architecture and is tissue specific

    BACKGROUND: Analyzing proteins in the context of all available genome and transcript sequence data has the potential to reveal functional properties not accessible through protein sequence analysis alone. To analyze the impact of alternative splicing on transcription factor (TF) protein structure, we constructed a comprehensive database of splice variants in the mouse transcriptome, called MouSDB3, containing 461 TF loci. RESULTS: Our analysis revealed that 62% of these loci in MouSDB3 have variant exons, compared to 29% of all loci. These variant TF loci contain a total of 324 alternative exons, of which 23% are in-frame. When excluded, 80% of in-frame alternative exons alter the domain architecture of the protein as computed by SMART (simple modular architecture research tool). Sixty-eight percent of these exons directly affect the coding regions of domains important for TF function. Seventy-five percent of the domains affected are DNA-binding domains. Tissue distribution analyses of variant mouse TFs reveal that they have more alternatively spliced forms in 14 of the 18 tissues analyzed when compared to all the loci in MouSDB3. Further, TF isoforms are homogeneous within a given single tissue and are heterogeneous across different tissues, indicating their tissue specificity. CONCLUSIONS: Our study provides quantitative evidence that alternative splicing preferentially adds or deletes domains important to the DNA-binding function of the TFs. Analyses described here reveal the presence of tissue-specific alternative splicing throughout the mouse transcriptome. Our findings provide significant biological insights into control of transcription and regulation of tissue-specific gene expression by alternative splicing via creation of tissue-specific TF isoforms.

    A computational investigation of kinetoplastid trans-splicing

    Trans-splicing is an unusual process in which two separate RNA strands are spliced together to yield a mature mRNA. We present a novel computational approach that has an overall accuracy of 82% and can predict 92% of known trans-splicing sites. We have applied our method to chromosomes 1 and 3 of Leishmania major, with high-confidence predictions for 85% and 88% of annotated genes, respectively. We suggest some extensions of our method to other systems.

    Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets

    BACKGROUND: A class of eukaryotic non-coding RNAs termed microRNAs (miRNAs) interacts with target mRNAs by sequence complementarity to regulate their expression. The low abundance of some miRNAs and their time- and tissue-specific expression patterns make experimental miRNA identification difficult. We present here a computational method for genome-wide prediction of Arabidopsis thaliana microRNAs and their target mRNAs. This method uses characteristic features of known plant miRNAs as criteria to search for miRNAs conserved between Arabidopsis and Oryza sativa. Extensive sequence complementarity between miRNAs and their target mRNAs is used to predict miRNA-regulated Arabidopsis transcripts. RESULTS: Our prediction covered 63% of known Arabidopsis miRNAs and identified 83 new miRNAs. Evidence for the expression of 25 predicted miRNAs came from northern blots, their presence in the Arabidopsis Small RNA Project database, and massively parallel signature sequencing (MPSS) data. Putative targets functionally conserved between Arabidopsis and O. sativa were identified for most newly identified miRNAs. Independent microarray data showed that the expression levels of some mRNA targets anti-correlated with the accumulation pattern of their corresponding regulatory miRNAs. The cleavage of three target mRNAs by miRNA binding was validated in 5' RACE experiments. CONCLUSIONS: We identified new plant miRNAs conserved between Arabidopsis and O. sativa and report a wide range of transcripts as potential miRNA targets. Because MPSS data are generated from polyadenylated RNA molecules, our results suggest that at least some miRNA precursors are polyadenylated at certain stages. The broad range of putative miRNA targets indicates that miRNAs participate in the regulation of a variety of biological processes.

    Special issue on data management, analysis, and mining for the life sciences

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/47870/1/778_2005_Article_165.pd

    A functional update of the Escherichia coli K-12 genome

    Author Posting. © 2001 Serres et al. The definitive version was published in Genome Biology 2 (2001): research0035.1–0035.7, doi:10.1186/gb-2001-2-9-research0035. Background: Since the genome of Escherichia coli K-12 was initially annotated in 1997, additional functional information based on biological characterization and functions of sequence-similar proteins has become available. On the basis of this new information, an updated version of the annotated chromosome has been generated. Results: The E. coli K-12 chromosome is currently represented by 4,401 genes encoding 116 RNAs and 4,285 proteins. The boundaries of the genes identified in the GenBank Accession U00096 were used. Some protein-coding sequences are compound and encode multimodular proteins. The coding sequences (CDSs) are represented by modules (protein elements of at least 100 amino acids with biological activity and independent evolutionary history). There are 4,616 identified modules in the 4,285 proteins. Of these, 48.9% have been characterized, 29.5% have an imputed function, 2.1% have a phenotype and 19.5% have no function assignment. Only 7% of the modules appear unique to E. coli, and this number is expected to be reduced as more genome data becomes available. The imputed functions were assigned on the basis of manual evaluation of functions predicted by BLAST and DARWIN analyses and by the MAGPIE genome annotation system. Conclusions: Much knowledge has been gained about functions encoded by the E. coli K-12 genome since the 1997 annotation was published. The data presented here should be useful for analysis of E. coli gene products as well as gene products encoded by other genomes. This work was supported by NIH grant RO1 RR07861, the NASA Astrobiology Institute grant NCC2-1054, grants from the Edward Mallinckrodt, Jr Foundation and the Sinsheimer Foundation, and NSF grants NSF DBI - 9984882 and NSF IIS - 9996304.