2 research outputs found

    Data-driven modelling of protein synthesis: A sequence perspective

    No full text
    Recent advances in DNA sequencing, synthesis and genetic engineering have enabled the introduction of choice DNA sequences into living cells. This is an exciting prospect for the field of industrial biotechnology, which aims at using microorganisms to produce foods, beverages, pharmaceuticals, and fine and bulk chemicals in a sustainable fashion. Biotechnologists often achieve this by genetically engineering these microorganisms to introduce novel production pathways using genes found in other strains or species. However, detailed understanding of gene expression regulation remains elusive, especially at the level of translation; thus, when it comes to writing DNA to express proteins at user-specified levels, we are still miles away.
    Second-generation DNA sequencing technologies have made it easy and affordable to reconstruct the genomes of industrially relevant microbes, thus providing better reference sequences for genetic engineering. However, technological limitations allow only parts of a genome to be reconstructed unambiguously, so additional scaffolding steps are required to obtain genome-length reconstructions. We propose a method that improves genome scaffolding by integrating heterogeneous sources of information on genome contiguity. This method improves the quality of genome reconstructions at the cost of a limited number of additional errors.
    The ease and affordability of DNA sequencing has also led to the development of a number of biological assays that exploit sequencing, among them the ribosome profiling assay. This assay allows for unprecedented examination of the process of protein synthesis by recording positions of actively translating ribosomes across thousands of living cells. We employed these data to develop data-driven models of Saccharomyces cerevisiae protein synthesis. A relatively simple model was used to re-design genes for heterologous expression; a second, more complex model yielded insights into the process of translation. Our models suggest that protein synthesis is limited at the stage of initiation, and that codon translation rates are not determined by tRNA levels alone but appear to be sequence context-dependent.
    Finally, the combination of DNA synthesis and sequencing offers the possibility to perform high-throughput in vivo assays to study the effect of user-designed sequences. We used this approach to study translation initiation at Internal Ribosome Entry Sites (IRESs). We identified short sequence elements predictive of IRES activity in viruses and humans, and obtained insights into the effect of element sequence, multiplicity and position on IRES activity. We propose a high-level architecture of viral and cellular IRESs, and offer a mechanistic explanation for differences between IRES architectures of different virus types.
    Pattern Recognition and Bioinformatic

    Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity

    No full text
    Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However, due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in humans and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements.
    Pattern Recognition and Bioinformatic