25 research outputs found

    Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data

    Motivation: High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. Results: We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. Availability: The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter Contact: [email protected] or [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
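
    The filtering idea in this abstract can be illustrated with a minimal Python sketch. It is not the absfilter package's algorithm: the read representation (allele, start position, strand), the complexity threshold, the significance level, and the function names are all assumptions made for illustration. A site is flagged when few of its reads are unique, or when an allelic imbalance is significant with clonal reads included but not after collapsing them.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than observing k successes."""
    if n == 0:
        return 1.0
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return min(1.0, sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9)))


def site_summary(reads, min_complexity=0.5, alpha=0.05):
    """Summarize one heterozygous site.

    reads: list of (allele, start, strand) tuples, allele in {"ref", "alt"}.
    Reads sharing allele, start, and strand are treated as clonal copies.
    min_complexity and alpha are hypothetical thresholds.
    """
    if not reads:
        raise ValueError("site has no reads")
    unique = set(reads)
    complexity = len(unique) / len(reads)
    ref_all = sum(1 for allele, _, _ in reads if allele == "ref")
    ref_unique = sum(1 for allele, _, _ in unique if allele == "ref")
    p_all = binom_two_sided_p(ref_all, len(reads))
    p_unique = binom_two_sided_p(ref_unique, len(unique))
    # Flag low-complexity sites, and sites whose imbalance is significant
    # only while clonal reads are counted.
    flagged = complexity < min_complexity or (p_all < alpha <= p_unique)
    return {
        "complexity": complexity,
        "p_all": p_all,
        "p_unique": p_unique,
        "flagged": flagged,
    }


# Example: 18 copies of a single reference read dominate the site.
biased_site = [("ref", 100, "+")] * 18 + [
    ("ref", 105, "+"),
    ("alt", 98, "+"),
    ("alt", 110, "-"),
]
print(site_summary(biased_site))
```

    Collapsing duplicates before testing is one simple way to separate a genuine allelic imbalance from one produced by amplification; a production filter would also need to account for sequencing depth and for duplicates expected by chance.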

    Understanding Mechanisms of Translation and Transcription

    Genomics has recently entered the realm of Big Data, and the last decade has seen an explosion in genome sequencing and assembly. The age of Big Data has also become synonymous with deep learning, and various deep network architectures have been developed to tackle genome annotation problems. At the same time, new exciting techniques have emerged, which allow the sequencing of only the portions of the RNA being actively translated by the ribosomes (ribosome profiling), and sequencing the RNA from individual cells (scRNA-seq). This thesis takes advantage of recent advances in genomics, describing new methods and algorithms to improve the understanding of translation and genetic encoding biases, as well as algorithms to improve annotation at the genome and single-cell levels. Our algorithm to determine the rates of translation of codons using ribosome profiling data from yeast generated the first measurement of the differential rate of translation of all 61 codons in vivo. We developed several analytic approaches to demonstrate that prokaryotic coding regions have little specific depletion of Shine-Dalgarno motifs. We used highly conserved regions of the 16S rRNAs to develop an algorithm to fix erroneous 16S rRNA 3' end annotations in over twelve thousand prokaryotic organisms in the NCBI GenBank. In our foray into gene annotation, we evaluated various DNA k-mer embeddings, and developed DeepAnnotator, a deep learning architecture for genome annotation which achieved an F-score of 94%. We then turned to automatic annotation of cell phase in scRNA-seq data, describing Pre-Phaser, which established a general computational approach for precise cell phase assignment using k nearest neighbors. Finally, to pursue the goal of novel transcript and protein detection, we developed a statistical framework to identify all likely frameshift positions in a genome, as well as a frameshift simulator for the ribosome profiling data to verify our algorithm.

    Enrichment of rare codons at 5' ends of genes is a spandrel caused by evolutionary sequence turnover and does not improve translation

    Previously, Tuller et al. found that the first 30–50 codons of the genes of yeast and other eukaryotes are slightly enriched for rare codons. They argued that this slowed translation, and was adaptive because it queued ribosomes to prevent collisions. Today, the translational speeds of different codons are known, and indeed rare codons are translated slowly. We re-examined this 5’ slow translation ‘ramp.’ We confirm that 5’ regions are slightly enriched for rare codons; in addition, they are depleted for downstream Start codons (which are fast), with both effects contributing to slow 5’ translation. However, we also find that the 5’ (and 3’) ends of yeast genes are poorly conserved in evolution, suggesting that they are unstable and turn over relatively rapidly. When a new 5’ end forms de novo, it is likely to include codons that would otherwise be rare. Because evolution has had a relatively short time to select against these codons, 5’ ends are typically slightly enriched for rare, slow codons. Opposite to the expectation of Tuller et al., we show by direct experiment that genes with slowly translated codons at the 5’ end are expressed relatively poorly, and that substituting faster synonymous codons improves expression. Direct experiment shows that slow codons do not prevent downstream ribosome collisions. Further informatic studies suggest that for natural genes, slow 5’ ends are correlated with poor gene expression, opposite to the expectation of Tuller et al. Thus, we conclude that slow 5’ translation is a ‘spandrel’, a non-adaptive consequence of something else (in this case, the turnover of 5’ ends in evolution), and it does not improve translation.

    Re-annotation of 12,495 prokaryotic 16S rRNA 3' ends and analysis of Shine-Dalgarno and anti-Shine-Dalgarno sequences.

    We examined 20,648 prokaryotic unique taxids with respect to the annotation of the 3' end of the 16S rRNA, which contains the anti-Shine-Dalgarno sequence. We used the sequence of highly conserved helix 45 of the 16S rRNA as a guide. By this criterion, 8,153 annotated 3' ends correctly included the anti-Shine-Dalgarno sequence, but 12,495 were foreshortened or otherwise mis-annotated, missing part or all of the anti-Shine-Dalgarno sequence, which immediately follows helix 45. We re-annotated these, giving a total of 20,648 16S rRNA 3' ends. The vast majority indeed contained a consensus anti-Shine-Dalgarno sequence, embedded in a highly conserved 13-base "tail". However, 128 exceptional organisms had either a variant anti-Shine-Dalgarno, or no recognizable anti-Shine-Dalgarno, in their 16S rRNA(s). For organisms both with and without an anti-Shine-Dalgarno, we identified the Shine-Dalgarno motifs actually enriched in front of each organism's open reading frames. This showed to what extent the Shine-Dalgarno motifs correlated with anti-Shine-Dalgarno motifs. In general, organisms whose rRNAs lacked a perfect anti-Shine-Dalgarno motif also lacked a recognizable Shine-Dalgarno. For organisms whose 16S rRNAs contained a perfect anti-Shine-Dalgarno motif, a variety of results were obtained. We found one genus, Alteromonas, where several taxids apparently maintain two different types of 16S rRNA genes, with different, but conserved, anti-Shine-Dalgarno sequences. The fact that some organisms do not seem to have or use Shine-Dalgarno motifs supports the idea that prokaryotes have other robust mechanisms for recognizing start codons for translation.

    Measurement of average decoding rates of the 61 sense codons in vivo

    Most amino acids can be encoded by several synonymous codons, which are used at unequal frequencies. The significance of unequal codon usage remains unclear. One hypothesis is that frequent codons are translated relatively rapidly. However, there is little direct, in vivo, evidence regarding codon-specific translation rates. In this study, we generate high-coverage data using ribosome profiling in yeast, analyze it using a novel algorithm, and deduce events at the A- and P-sites of the ribosome. Different codons are decoded at different rates in the A-site. In general, frequent codons are decoded more quickly than rare codons, and AT-rich codons are decoded more quickly than GC-rich codons. At the P-site, proline is slow in forming peptide bonds. We also apply our algorithm to short footprints from a different conformation of the ribosome and find strong amino acid-specific (not codon-specific) effects that may reflect interactions with the exit tunnel of the ribosome.
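
    The notion of 61 sense codons used at unequal frequencies can be made concrete with a short Python sketch. It only enumerates the sense codons of the standard genetic code (64 triplets minus the three stop codons) and tabulates codon usage in one coding sequence; it does not reproduce the ribosome profiling algorithm described above, and the function name and example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# The three stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 64 triplets minus the stop codons leaves the 61 sense codons.
SENSE_CODONS = {
    "".join(triplet)
    for triplet in product("ACGT", repeat=3)
    if "".join(triplet) not in STOP_CODONS
}


def codon_usage(orf):
    """Return the relative frequency of each sense codon in an in-frame
    coding sequence. Stop codons and trailing partial codons are ignored."""
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in SENSE_CODONS)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Hypothetical toy ORF: Met, Ala, Ala, stop.
usage = codon_usage("ATGGCTGCTTAA")
print(len(SENSE_CODONS), usage)
```

    Usage frequencies of this kind are what "frequent" and "rare" refer to when decoding rates are compared across codons.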

    Prokaryotic coding regions have little if any specific depletion of Shine-Dalgarno motifs.

    The Shine-Dalgarno motif occurs in front of prokaryotic start codons, and is complementary to the 3' end of the 16S ribosomal RNA. Hybridization between the Shine-Dalgarno sequence and the anti-Shine-Dalgarno region of the 16S rRNA (CCUCCU) directs the ribosome to the start AUG of the mRNA for translation. Shine-Dalgarno-like motifs (AGGAGG in E. coli) are depleted from open reading frames of most prokaryotes. This may be because hybridization of the 16S rRNA at Shine-Dalgarno motifs inside genes would slow translation or induce internal initiation. However, we analyzed 128 species from diverse phyla where the 16S rRNA gene(s) lack the anti-Shine-Dalgarno sequence, and so the 16S rRNA is incapable of interacting with Shine-Dalgarno-like sequences. Despite this lack of an anti-Shine-Dalgarno, half of these species still displayed depletion of Shine-Dalgarno-like sequences when analyzed by previous methods. Depletion of the same G-rich sequences was seen by these methods even in eukaryotes, which do not use the Shine-Dalgarno mechanism. We suggest previous methods are partly detecting a non-specific depletion of G-rich sequences. Alternative informatics approaches show that most prokaryotes have only slight, if any, specific depletion of Shine-Dalgarno-like sequences from open reading frames. Together with recent evidence that ribosomes do not pause at ORF-internal Shine-Dalgarno motifs, these results suggest the presence of ORF-internal Shine-Dalgarno-like motifs may be inconsequential, perhaps because internal regions of prokaryotic mRNAs may be structurally "shielded" from translation initiation.
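
    Two points in this abstract lend themselves to a small Python sketch: the complementarity between AGGAGG and CCUCCU, and the idea of measuring motif depletion as an observed-to-expected ratio. The expectation used here assumes independent nucleotide frequencies, which is a deliberately naive background model chosen for illustration; it is neither the "previous methods" nor the "alternative informatics approaches" of the abstract, and all function names are hypothetical.

```python
from collections import Counter

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def observed_over_expected(seq, motif):
    """Ratio of observed motif count to the count expected if bases were
    drawn independently at the sequence's own base frequencies.
    Returns None when the expectation is zero or the sequence is too short."""
    windows = len(seq) - len(motif) + 1
    if windows <= 0:
        return None
    freqs = Counter(seq)
    expected = float(windows)
    for base in motif:
        expected *= freqs[base] / len(seq)
    if expected == 0:
        return None
    return count_overlapping(seq, motif) / expected


# The Shine-Dalgarno motif pairs with the anti-Shine-Dalgarno sequence.
print(reverse_complement_rna("AGGAGG"))  # CCUCCU
print(count_overlapping("AGGAGGAGG", "AGGAGG"))  # 2
```

    A ratio below 1 under this model cannot distinguish specific avoidance of Shine-Dalgarno-like motifs from a general scarcity of G-rich hexamers, which is the confound the abstract raises.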