19 research outputs found

    A computational investigation of kinetoplastid trans-splicing

    Trans-splicing is an unusual process in which two separate RNA strands are spliced together to yield a mature mRNA. We present a novel computational approach that has an overall accuracy of 82% and can predict 92% of known trans-splicing sites. We have applied our method to chromosomes 1 and 3 of Leishmania major, with high-confidence predictions for 85% and 88% of annotated genes, respectively. We suggest some extensions of our method to other systems.

    A functional update of the Escherichia coli K-12 genome

    Author Posting. © 2001 Serres et al. The definitive version was published in Genome Biology 2 (2001): research0035.1–0035.7, doi:10.1186/gb-2001-2-9-research0035. Background: Since the genome of Escherichia coli K-12 was initially annotated in 1997, additional functional information based on biological characterization and functions of sequence-similar proteins has become available. On the basis of this new information, an updated version of the annotated chromosome has been generated. Results: The E. coli K-12 chromosome is currently represented by 4,401 genes encoding 116 RNAs and 4,285 proteins. The boundaries of the genes identified in the GenBank Accession U00096 were used. Some protein-coding sequences are compound and encode multimodular proteins. The coding sequences (CDSs) are represented by modules (protein elements of at least 100 amino acids with biological activity and independent evolutionary history). There are 4,616 identified modules in the 4,285 proteins. Of these, 48.9% have been characterized, 29.5% have an imputed function, 2.1% have a phenotype and 19.5% have no function assignment. Only 7% of the modules appear unique to E. coli, and this number is expected to be reduced as more genome data becomes available. The imputed functions were assigned on the basis of manual evaluation of functions predicted by BLAST and DARWIN analyses and by the MAGPIE genome annotation system. Conclusions: Much knowledge has been gained about functions encoded by the E. coli K-12 genome since the 1997 annotation was published. The data presented here should be useful for analysis of E. coli gene products as well as gene products encoded by other genomes. This work was supported by NIH grant RO1 RR07861, the NASA Astrobiology Institute grant NCC2-1054, grants from the Edward Mallinckrodt, Jr Foundation and the Sinsheimer Foundation, and NSF grants NSF DBI - 9984882 and NSF IIS - 9996304.

    Genetic algorithm learning as a robust approach to RNA editing site prediction

    BACKGROUND: RNA editing is one of several post-transcriptional modifications that may contribute to organismal complexity in the face of limited gene complement in a genome. One form, known as C → U editing, appears to exist in a wide range of organisms, but most instances of this form of RNA editing have been discovered serendipitously. With the large amount of genomic and transcriptomic data now available, a computational analysis could provide a more rapid means of identifying novel sites of C → U RNA editing. Previous efforts have had some success but also some limitations. We present a computational method for identifying C → U RNA editing sites in genomic sequences that is both robust and generalizable. We evaluate its potential use on the best data set available for these purposes: C → U editing sites in plant mitochondrial genomes. RESULTS: Our method is derived from a machine learning approach known as a genetic algorithm. REGAL (RNA Editing site prediction by Genetic Algorithm Learning) is 87% accurate when tested on three mitochondrial genomes, with an overall sensitivity of 82% and an overall specificity of 91%. REGAL's performance significantly improves on other ab initio approaches to predicting RNA editing sites in this data set. REGAL has sensitivity comparable to, and specificity higher than, approaches that rely on sequence homology, and it has the advantage that strong sequence conservation is not required for reliable prediction of edit sites. CONCLUSION: Our results suggest that ab initio methods can generate robust classifiers of putative edit sites, and we highlight the value of combinatorial approaches as embodied by genetic algorithms. We present REGAL as one approach with the potential to be generalized to other organisms exhibiting C → U RNA editing.

    A comprehensive platform for highly multiplexed mammalian functional genetic screens

    Background: Genome-wide screening in human and mouse cells using RNA interference and open reading frame over-expression libraries is rapidly becoming a viable experimental approach for many research labs. A variety of gene expression modulation libraries are commercially available; however, detailed and validated protocols, as well as the reagents necessary for deconvolving genome-scale gene screens using these libraries, are lacking. As a solution, we designed a comprehensive platform for highly multiplexed functional genetic screens in human, mouse and yeast cells using popular, commercially available gene modulation libraries. The Gene Modulation Array Platform (GMAP) is a single microarray-based detection solution for deconvolution of loss- and gain-of-function pooled screens. Results: Experiments with specially constructed lentiviral-based plasmid pools containing ~78,000 shRNAs demonstrated that the GMAP is capable of deconvolving genome-wide shRNA "dropout" screens. Further experiments with a larger, ~90,000 shRNA pool demonstrate that equivalent results are obtained from plasmid pools and from genomic DNA derived from lentivirus-infected cells. Parallel testing of large shRNA pools using GMAP and next-generation sequencing methods revealed that the two methods provide valid and complementary approaches to deconvolution of genome-wide shRNA screens. Additional experiments demonstrated that GMAP is equivalent to similar microarray-based products when used for deconvolution of open reading frame over-expression screens. Conclusion: Herein, we demonstrate four major applications for the GMAP resource, including deconvolution of pooled RNAi screens in cells with at least 90,000 distinct shRNAs. We also provide detailed methodologies for pooled shRNA screen readout using GMAP and compare next-generation sequencing to GMAP (i.e. microarray) based deconvolution methods.

    Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies

    Using a genome-scale, lentivirally delivered shRNA library, we performed massively parallel pooled shRNA screens in 216 cancer cell lines to identify genes that are required for cell proliferation and/or viability. Cell line dependencies on 11,000 genes were interrogated by 5 shRNAs per gene. The proliferation effect of each shRNA in each cell line was assessed by transducing a population of 11M cells with one shRNA-virus per cell and determining the relative enrichment or depletion of each of the 54,000 shRNAs after 16 population doublings using next-generation sequencing. All the cell lines were screened using standardized conditions to best assess differential genetic dependencies across cell lines. When combined with genomic characterization of these cell lines, this dataset facilitates the linkage of genetic dependencies with specific cellular contexts (e.g., gene mutations or cell lineage). To enable such comparisons, we developed and provided a bioinformatics tool to identify linear and nonlinear correlations between these features.

    The structure of genomes


    An organism-specific method to rank predicted coding regions in Trypanosoma brucei

    Genome annotation in differently evolved organisms presents challenges because the lack of sequence-based homology limits the ability to determine the function of putative coding regions. To provide an alternative to annotation by sequence homology, we developed a method that takes advantage of unusual trypanosomatid biology and skews in nucleotide composition between coding regions and upstream regions to rank putative open reading frames by their likelihood of coding. The method is 93% accurate when tested on known genes. We have applied our method to the full complement of open reading frames on Chromosome I of Trypanosoma brucei, and we can predict with high confidence that 226 putative coding regions are likely to be functional. Methods such as the one described here for discriminating true coding regions are critical for genome annotation when other sources of evidence for function are limited.

    Bioinformatics: A computing perspective

    Boston. xiv, 460 p.: fig.; 23 cm