
    A condition-specific codon optimization approach for improved heterologous gene expression in Saccharomyces cerevisiae

    All authors are with the Department of Chemical Engineering, The University of Texas at Austin, 200 E Dean Keeton St. Stop C0400, Austin, TX 78712, USA. Hal S. Alper is also with the Institute for Cellular and Molecular Biology, The University of Texas at Austin, 2500 Speedway Avenue, Austin, TX 78712, USA. Amanda M. Lanza, current address: Bristol-Myers Squibb, Biologics Development, 35 South Street, Hopkinton, MA 01748, USA.
    Background: Heterologous gene expression is an important tool for synthetic biology that enables metabolic engineering and the production of non-natural biologics in a variety of host organisms. The translational efficiency of heterologous genes can often be improved by optimizing synonymous codon usage to better match the host organism. However, traditional optimization approaches neglect many factors known to influence synonymous codon distributions.
    Results: Here we define an alternative approach for codon optimization that utilizes systems-level information and codon context for the condition under which heterologous genes are being expressed. Furthermore, we utilize a probabilistic algorithm to generate multiple variants of a given gene. We demonstrate improved translational efficiency using this condition-specific codon optimization approach with two heterologous genes, the fluorescent protein-encoding eGFP and the catechol 1,2-dioxygenase gene CatA, expressed in S. cerevisiae. For the latter case, optimization for stationary phase production resulted in nearly 2.9-fold improvements over commercial gene optimization algorithms.
    Conclusions: Codon optimization is now often a standard tool for protein expression, and while a variety of tools and approaches have been developed, they do not guarantee improved performance for all hosts or applications. Here, we suggest an alternative method for condition-specific codon optimization and demonstrate its utility in Saccharomyces cerevisiae as a proof of concept. However, this technique should be applicable to any organism for which gene expression data can be generated and is thus of potential interest for a variety of applications in metabolic and cellular engineering.
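
    The abstract above mentions a probabilistic algorithm that generates multiple variants of a gene. The following is only a minimal sketch of that general idea, not the authors' method: each amino acid is replaced by a synonymous codon drawn at random according to a weight table. The table `CODON_WEIGHTS` and the function names are hypothetical, and the weights are made-up illustrative numbers, not measured yeast data.

```python
import random

# Hypothetical, illustrative codon weights (NOT measured expression data).
# Only four amino acids are covered to keep the sketch short.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.4, "AAG": 0.6},
    "F": {"TTT": 0.5, "TTC": 0.5},
    "L": {"TTG": 0.5, "CTG": 0.1, "TTA": 0.4},
}


def sample_variant(protein, weights, rng):
    """Draw one synonymous coding sequence, codon by codon."""
    codons = []
    for aa in protein:
        options = weights[aa]
        # Pick one codon with probability proportional to its weight.
        codon = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        codons.append(codon)
    return "".join(codons)


def generate_variants(protein, weights, n, seed=0):
    """Generate n variants; a fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    return [sample_variant(protein, weights, rng) for _ in range(n)]


def translate(dna, weights):
    """Translate back using the same table, to check a variant is synonymous."""
    reverse = {codon: aa for aa, opts in weights.items() for codon in opts}
    return "".join(reverse[dna[i:i + 3]] for i in range(0, len(dna), 3))


variants = generate_variants("MKFL", CODON_WEIGHTS, n=5)
```

    Every variant encodes the same protein; only the codon choices differ. In a condition-specific setting, the weights would presumably be derived from expression data for the condition of interest.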

    Regulatory motif discovery using a population clustering evolutionary algorithm

    This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences.
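
    The abstract above refers to position frequency matrix (PFM) models of motifs. As a rough illustration of that model only (the evolutionary search and the clustering step are not shown), the sketch below builds a PFM from a few aligned sites and scans a sequence for the best-matching window. The function names, the pseudocount, and the example sites are assumptions made for illustration.

```python
def build_pfm(sites):
    """Count how often each base occurs at each position of aligned sites."""
    width = len(sites[0])
    pfm = [{base: 0 for base in "ACGT"} for _ in range(width)]
    for site in sites:
        for i, base in enumerate(site):
            pfm[i][base] += 1
    return pfm


def best_match(pfm, sequence, pseudocount=1.0):
    """Return the start index of the window that best fits the PFM."""
    n_sites = sum(pfm[0].values())
    width = len(pfm)

    def score(window):
        # Product of smoothed per-position base frequencies.
        s = 1.0
        for i, base in enumerate(window):
            s *= (pfm[i][base] + pseudocount) / (n_sites + 4 * pseudocount)
        return s

    scored = [(score(sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)[1]


pfm = build_pfm(["TATA", "TATA", "TACA"])
hit = best_match(pfm, "GGGTATAGG")
```

    In a motif-discovery setting, a candidate PFM of this kind would be what the search evaluates against the promoter sequences.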

    TITER: predicting translation initiation sites by deep learning.

    Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), the translation process may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data for systematically studying the general principles of translation initiation and for developing computational methods for TIS identification.
    Methods: We have developed a deep learning-based framework, named TITER, for accurately predicting TISs on a genome-wide scale based on QTI-seq data. TITER extracts the sequence features of translation initiation from the surrounding sequence contexts of TISs using a hybrid neural network and further integrates the prior preference of TIS codon composition into a unified prediction framework.
    Results: Extensive tests demonstrated that TITER can greatly outperform the state-of-the-art prediction methods in identifying TISs. In addition, TITER was able to identify important sequence signatures for individual types of TIS codons, including a Kozak-sequence-like motif for the AUG start codon. Furthermore, the TITER prediction score can be related to the strength of translation initiation in various biological scenarios, including the repressive effect of upstream open reading frames on gene expression and the mutational effects influencing translation initiation efficiency.
    Availability and implementation: TITER is available as open-source software and can be downloaded from https://github.com/zhangsaithu/titer.
    Contact: [email protected] or [email protected]
    Supplementary information: Supplementary data are available at Bioinformatics online.

    The Multifaceted Activity of the VirF Regulatory Protein in the Shigella Lifestyle

    Shigella is a highly adapted human pathogen, mainly found in the developing world and causing a severe enteric syndrome. The highly sophisticated infectious strategy of Shigella relies on its capacity to invade the intestinal epithelial barrier and cause its inflammatory destruction. The cellular pathogenesis and clinical presentation of shigellosis are the sum of the complex action of a large number of bacterial virulence factors mainly located on a large virulence plasmid (pINV). The expression of pINV genes is controlled by multiple environmental stimuli through a regulatory cascade involving proteins and sRNAs encoded by both the pINV and the chromosome. The primary regulator of the virulence phenotype is VirF, a DNA-binding protein belonging to the AraC family of transcriptional regulators. The virF gene, located on the pINV, is expressed only within the host, mainly in response to the temperature transition occurring when the bacterium passes from the outer environment to the intestinal milieu. VirF then acts as an anti-H-NS protein and directly activates the icsA and virB genes, triggering the full expression of the invasion program of Shigella. In this review we will focus on the structure of VirF, on its sophisticated regulation, and on its role as a major player in the path leading from the non-invasive to the invasive phenotype of Shigella. We will also address the involvement of VirF in mechanisms aimed at withstanding adverse conditions inside the host, indicating that this protein is emerging as a global regulator whose action is not limited to virulence systems. Finally, we will discuss recent observations that point to VirF as a potential novel antibacterial target for shigellosis.

    From parasite genomes to one healthy world: Are we having fun yet?

    In 1990, the Human Genome Sequencing Project was established. This laid the groundwork for the explosion of sequence data that has since followed. As a result of this effort, the first complete genome of an animal, Caenorhabditis elegans, was published in 1998. The sequence of Drosophila melanogaster was made available in March 2000, and in the following year working drafts of the human genome were generated, with the completed sequence (92%) being released in 2003. Recent advancements and next-generation technologies have made sequencing commonplace and have infiltrated every aspect of biological research, including parasitology. To date, sequencing of 32 apicomplexa and 24 nematode genomes is either in progress or near completion, and over 600k nematode EST and 200k apicomplexa EST submissions fill the databases. However, the winds have shifted and efforts are now refocusing on how best to store, mine and apply these data to problem solving. Herein we do not intend to summarize existing X-omics datasets or present new technological advances that promise future benefits. Rather, the information to follow condenses up-to-date applications of existing technologies to problem solving as it relates to parasite research. Advancements in non-parasite systems are also presented with the proviso that applications to parasite research are in the making.

    SEED: efficient clustering of next-generation sequences.

    Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem to study the population sizes of DNA/RNA molecules and to reduce the redundancies in NGS data. Currently, most sequence clustering algorithms are limited by their speed and scalability, and thus cannot handle data with tens of millions of reads.
    Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters that can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with a linear time and memory performance. When using SEED as a preprocessing tool on genome/transcriptome assembly data, it was able to reduce the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60-85% and 21-41%, respectively. In addition, the assemblies contained longer contigs than non-preprocessed data as indicated by 12-27% larger N50 values. Compared with other clustering tools, SEED showed the best performance in generating clusters of NGS data similar to true cluster results with a 2- to 10-fold better time performance. While most of SEED's utilities fall into the preprocessing area of NGS data, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
    Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/[email protected]
    Supplementary information: Supplementary data are available at Bioinformatics online.
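
    To make the hashing idea in the abstract above concrete, here is a much-simplified sketch of spaced-seed clustering. It is not SEED's block spaced seed method: it uses a single hypothetical mask, assumes equal-length reads, ignores overhanging residues, and treats the first read seen in a hash bucket as the cluster center. All names and the example mask are assumptions.

```python
from collections import defaultdict


def masked_key(read, mask):
    """Keep only the bases at positions where the mask has a '1'."""
    return "".join(base for base, keep in zip(read, mask) if keep == "1")


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, mask, max_mismatches=3):
    """Bucket reads by masked key, then verify candidates against centers."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[masked_key(read, mask)].append(read)

    clusters = []
    for bucket in buckets.values():
        centers = []  # pairs of (center read, list of member reads)
        for read in bucket:
            for center, members in centers:
                if hamming(center, read) <= max_mismatches:
                    members.append(read)
                    break
            else:
                # No center is close enough, so this read starts a new cluster.
                centers.append((read, [read]))
        clusters.extend(members for _, members in centers)
    return clusters


reads = ["ACGTACGT", "ACGAACGT", "TTTTTTTT"]
clusters = cluster_reads(reads, mask="11101110")
```

    A single mask only groups reads whose mismatches fall on the ignored positions; practical spaced-seed schemes typically combine several masks so that any allowed mismatch pattern is covered by at least one of them.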

    Genetic Algorithms for the Imitation of Genomic Styles in Protein Backtranslation

    Several technological applications require the translation of a protein into a nucleic acid that codes for it ("backtranslation"). The degeneracy of the genetic code makes this translation ambiguous; moreover, not every translation is equally viable. The common answer to this problem is the imitation of the codon usage of the target species. Here we discuss several other features of coding sequences ("coding statistics") that are relevant for the "genomic style" of different species. A genetic algorithm is then used to obtain backtranslations that mimic these styles, by minimizing the difference in the coding statistics. Possible improvements and applications are discussed.
    Comment: 17 pages, 13 figures. Submitted to Theor. Comp. Scienc
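
    The idea of a genetic algorithm that minimizes a difference in coding statistics can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's algorithm: the only statistic used is per-amino-acid codon frequency, the target frequencies in `TARGET` are invented numbers, and selection, crossover, and mutation settings are arbitrary choices.

```python
import random
from collections import Counter

# Synonymous codons for three amino acids (a subset of the standard code).
SYNONYMS = {"K": ["AAA", "AAG"], "F": ["TTT", "TTC"], "E": ["GAA", "GAG"]}
# Hypothetical target codon frequencies (illustrative, not real species data).
TARGET = {"AAA": 0.2, "AAG": 0.8, "TTT": 0.3, "TTC": 0.7, "GAA": 0.9, "GAG": 0.1}


def distance(codons, protein):
    """Sum of |observed - target| codon frequencies within each amino acid."""
    counts = Counter(codons)
    aa_counts = Counter(protein)
    return sum(abs(counts[c] / aa_counts[aa] - TARGET[c])
               for aa in aa_counts for c in SYNONYMS[aa])


def evolve(protein, pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Evolve codon lists for a protein of length >= 2 toward TARGET usage."""
    rng = random.Random(seed)
    population = [[rng.choice(SYNONYMS[aa]) for aa in protein]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: distance(ind, protein))
        parents = population[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(protein))
            child = a[:cut] + b[cut:]  # one-point crossover on codon boundaries
            # Mutation: resample a synonymous codon with small probability.
            child = [rng.choice(SYNONYMS[aa]) if rng.random() < mutation_rate else c
                     for aa, c in zip(protein, child)]
            children.append(child)
        population = parents + children
    return min(population, key=lambda ind: distance(ind, protein))


protein = "KKKKKFFFFFFFFFFEEEEEEEEEE"
best = evolve(protein)
```

    Because crossover and mutation only ever exchange synonymous codons, every individual remains a valid backtranslation of the protein; the fitness function alone decides which "style" the population drifts toward.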

    Synthetic biology—putting engineering into biology

    Synthetic biology is interpreted as the engineering-driven building of increasingly complex biological entities for novel applications. Encouraged by progress in the design of artificial gene networks, de novo DNA synthesis and protein engineering, we review the case for this emerging discipline. Key aspects of an engineering approach are purpose-orientation, deep insight into the underlying scientific principles, a hierarchy of abstraction including suitable interfaces between and within the levels of the hierarchy, standardization and the separation of design and fabrication. Synthetic biology investigates possibilities to implement these requirements into the process of engineering biological systems. This is illustrated on the DNA level by the implementation of engineering-inspired artificial operations such as toggle switching, oscillating or production of spatial patterns. On the protein level, the functionally self-contained domain structure of a number of proteins suggests possibilities for essentially Lego-like recombination which can be exploited for reprogramming DNA binding domain specificities or signaling pathways. Alternatively, computational design emerges to rationally reprogram enzyme function. Finally, the increasing facility of de novo DNA synthesis—synthetic biology’s system fabrication process—supplies the possibility to implement novel designs for ever more complex systems. Some of these elements have merged to realize the first tangible synthetic biology applications in the area of manufacturing of pharmaceutical compounds.