315 research outputs found
ExpressInHost : A codon tuning tool for the expression of recombinant proteins in host microorganisms
Funding Information: This work was performed as part of the Innovate UK project "Predictive optimisation of biocatalyst production for high-value chemical manufacturing" (Project Number TP101439). The current position of A.R. is funded by the German federal and state programme Professorinnenprogramms III for female scientists. Peer reviewed. Publisher PD
Modern Computing Techniques for Solving Genomic Problems
With the advent of high-throughput genomics, biological big data challenges scientists to handle, analyze, process and mine massive datasets. In this new interdisciplinary field, diverse theories, methods, tools and knowledge are used to solve a wide variety of problems. As an exploration, this dissertation project combines concepts and principles from multiple areas, including signal processing, information-coding theory, artificial intelligence and cloud computing, to solve the following problems in computational biology: (1) comparative gene structure detection, (2) DNA sequence annotation, (3) investigation of CpG islands (CGIs) for epigenetic studies. Briefly, in problem #1, sequences are transformed into signal series or binary codes. As in speech/voice recognition, similarity is calculated between two signal series, and the signals are subsequently stitched/matched into a temporal sequence. Because the operations are binary, all calculations/steps can be performed efficiently and accurately. Improving performance in terms of accuracy and specificity is the key for a comparative method. In problem #2, DNA sequences are encoded and transformed into numeric representations for deep learning methods. Encoding schemes greatly influence the performance of deep learning algorithms, so finding the best encoding scheme for a particular application of deep learning matters. Three applications (detection of protein-coding splicing sites, detection of lincRNA splicing sites and improvement of comparative gene structure identification) are used to show the computing power of deep neural networks. In problem #3, CpG sites are assigned a certain energy and a Gaussian filter is applied to detect CpG islands. By using the CpG box and a Markov model, we investigate the properties of CGIs and redefine the CGIs using the emerging epigenetic data.
In summary, these three problems and their solutions are not isolated; they are linked to modern techniques in such diverse areas as signal processing, information-coding theory, artificial intelligence and cloud computing. These novel methods are expected to improve the efficiency and accuracy of computational tools and bridge the gap between biology and scientific computing
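The Gaussian-filter idea mentioned for problem #3 can be illustrated with a minimal sketch. It assumes a unit "energy" per CpG site, a truncated normalized Gaussian kernel, and a fixed density threshold; the function names, `sigma`, `radius` and `threshold` values are all hypothetical choices for illustration and are not taken from the dissertation.

```python
import math

def cpg_indicator(seq):
    """Return 1.0 at each position where a C is immediately followed by a G."""
    seq = seq.upper()
    signal = [0.0] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            signal[i] = 1.0
    return signal

def gaussian_kernel(sigma, radius):
    """Truncated Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Convolve the signal with the kernel, treating out-of-range positions as 0."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + j - radius
            if 0 <= pos < len(signal):
                acc += w * signal[pos]
        out.append(acc)
    return out

def candidate_regions(density, threshold):
    """Return half-open (start, end) intervals where density >= threshold."""
    regions, start = [], None
    for i, d in enumerate(density):
        if d >= threshold and start is None:
            start = i
        elif d < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(density)))
    return regions

# Toy sequence (invented): a CpG-rich stretch flanked by AT repeats.
toy_seq = "AT" * 20 + "CG" * 20 + "AT" * 20
density = smooth(cpg_indicator(toy_seq), gaussian_kernel(sigma=5.0, radius=15))
print(candidate_regions(density, threshold=0.25))
```

Smoothing turns isolated CpG hits into a local density, so a threshold picks out contiguous CpG-rich stretches rather than single sites. Real CGI criteria typically also consider length and GC content, which this sketch omits.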
Computational analysis of CpG site DNA methylation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Epigenetics is the study of factors that can modify DNA and be passed to the next generation without changing the DNA sequence. DNA methylation, one category of epigenetic change, is the attachment of a methyl group (CH3) to DNA. It occurs mostly at sequences in which a C is followed by a G, known as CpG sites, through the addition of a methyl group to the cytosine residue. As science and technology progress, new data become available about an individual's DNA methylation profile under different conditions, and new features are discovered that may play a role in DNA methylation. The availability of new data on DNA methylation and other DNA features presents both a challenge to bioinformatics and an opportunity to discover new knowledge from existing data. In this research, multiple data series were used to assign CpG sites to methylation classes: (a) never-methylated CpG sites; (b) always-methylated CpG sites; (c) CpG sites methylated in cancer/disease samples and non-methylated in normal samples; and (d) CpG sites methylated in normal samples and non-methylated in cancer/disease samples. After these sites and their classes were identified, an analysis was carried out to find the features that best classify them. A matrix of features was generated using four applications in the EMBOSS software suite. The feature matrix was also generated using the gUse/WS-PGRADE portal workflow system. To do this, each of the four applications was grid-enabled and ported to the BOINC platform, and the gUse portal was connected to the BOINC project via 3G-bridge. Each node in the workflow created a portion of the matrix, and these portions were then combined to create the final matrix. This final feature matrix was used in a hill climbing workflow, in which the hill climbing node was a Java program ported to the BOINC platform.
A hill climbing search workflow was used to search for subsets of features that better classify the CpG sites, using five different measurements and three different classification methods: support vector machine, naïve Bayes and J48 decision tree. Using this approach, the hill climbing search found models that contain fewer than half the number of features and give better classification results. It has also been demonstrated that the gUse/WS-PGRADE workflow system provides a modular way of generating features, so a new feature generator application can be added without changing other parts. It is also shown that grid-enabled applications can speed up both feature generation and feature subset selection. The approach used in this research for distributed, workflow-based feature generation is not restricted to this study and can be applied in other studies that involve feature generation. The approach does require multiple binaries to generate portions of the features. The grid-enabled hill climbing search application can also be used in other contexts, as it only requires that the feature matrix follow the same format
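As a rough illustration of hill climbing over feature subsets, the sketch below toggles one feature at a time and keeps the best strictly improving neighbour. It is not the thesis implementation (which was a Java program running on BOINC and scored subsets by classifier performance); the function `hill_climb_features` and the toy scoring function are hypothetical stand-ins.

```python
def hill_climb_features(n_features, score, start=None, max_steps=1000):
    """Greedy hill climbing over feature subsets.

    `score` maps a frozenset of feature indices to a number (higher is better).
    Each step evaluates all subsets that differ by one feature and moves to the
    best one if it strictly improves the score; otherwise the search stops.
    """
    current = frozenset(start) if start is not None else frozenset()
    current_score = score(current)
    for _ in range(max_steps):
        best_neighbor, best_score = None, current_score
        for f in range(n_features):
            neighbor = current ^ {f}  # toggle feature f in or out
            s = score(neighbor)
            if s > best_score:
                best_neighbor, best_score = neighbor, s
        if best_neighbor is None:  # local optimum reached
            break
        current, current_score = best_neighbor, best_score
    return set(current), current_score

# Toy score (assumption): features 0, 2 and 5 are informative, and every
# selected feature carries a small cost, which favours compact subsets.
USEFUL = {0, 2, 5}

def toy_score(subset):
    return len(subset & USEFUL) - 0.1 * len(subset)

selected, best = hill_climb_features(8, toy_score)
print(sorted(selected), best)
```

In practice the score would be a cross-validated classification measure, and because hill climbing only finds local optima, restarts from several initial subsets are a common precaution.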
Gene Regulatory Compatibility in Bacteria: Consequences for Synthetic Biology and Evolution
Mechanistic understanding of gene regulation is crucial for rational engineering of new genetic systems through synthetic biology. Genetic engineering efforts in new organisms are often hampered by a lack of knowledge about how regulatory components function in new host contexts. This dissertation focuses on efforts to overcome these challenges through the development of generalizable experimental methods for studying the behavior of DNA regulatory sequences in diverse species at large-scale.
Chapter 2 describes experimental approaches for quantitatively assessing the functions of thousands of diverse natural regulatory sequences through a combination of metagenomic mining, high-throughput DNA synthesis and deep sequencing. By employing these methods in three distinct bacterial species, we revealed striking functional differences in gene regulatory capacity. We identified regulatory sequences with activity levels spanning several orders of magnitude, which will aid in efforts to engineer diverse bacterial species. We also demonstrate functional species-selective gene circuits with programmable host behaviors that may be useful for microbial community engineering. In Chapter 3 we provide evidence for the evolution of altered stringency in σ70-mediated transcriptional activation based on patterns of initiation and activity from promoters of diverse compositions. We show that the contrast in GC content between a regulatory element and the host genome dictates both the likelihood and the magnitude of expression. We also discuss the potential implications of this proposed mechanism on horizontal gene transfer.
The next two chapters focus on efforts aimed at extending the high-throughput methods described in earlier chapters to new organisms. Chapter 4 presents an in vitro approach for multiplexed gene expression profiling. Through the development and use of cell-free expression systems made from diverse bacteria, it was possible to rapidly acquire thousands of transcriptional measurements in small volume reactions, enabling functional comparisons of regulatory sequence function across multiple species. In Chapter 5 we characterize the restriction-modification system repertoires of several commensal bacterial species. We also describe ongoing efforts to develop methods for bypassing these systems in order to increase transformation efficiencies in species that are difficult or impossible to transform using current approaches
Targeting bifurcated methyltransferases towards user-defined DNA sequences
There would be great value in targeting methylation toward user-defined DNA sequences. Directing methylation toward single CpG sites within a genome would provide a means to examine the effects of single epigenetic alterations on cellular phenotype. The spread, erasure, or maintenance of such modifications could be examined in different cellular contexts and at different genomic loci. Further, as aberrant methylation patterns cause or are implicated in many disease states, targeted methylation might be used as a therapeutic.
Many groups have attempted to target methylation toward user-defined sites by fusing a methyltransferase enzyme to a sequence specific DNA binding domain. This strategy biases the methyltransferase toward specific DNA sequences, but the methyltransferase enzyme is active in the absence of the sequence specific DNA binding event. A better strategy would involve linking the DNA binding event of sequence specific proteins to the activity of the methyltransferase enzyme.
This thesis describes work on an assisted protein assembly strategy for targeting methylation to single CpG sites within a genome. This strategy uses naturally or unnaturally bifurcated methyltransferases fused to zinc fingers to effect reassembly over a desired site. The bifurcated methyltransferases are engineered to have reduced affinity for each other and/or for DNA, preventing unassisted enzymatic reassembly at non-targeted CpG sites. Zinc finger binding to sequences flanking an internal CpG site increases the local concentration of these assembly-deficient, bifurcated methyltransferases, enabling enzymatic reassembly and methylation only over the targeted CpG site.
In Chapter 2, we demonstrate the successful implementation of this strategy for two prokaryotic methyltransferases, M.HhaI and M.SssI. Further, we elucidate design parameters important for constructing active, targeted, bifurcated methyltransferases. In Chapter 3, we describe a novel directed-evolution strategy to quickly identify optimized zinc finger-fused bifurcated methyltransferases. Importantly, we also demonstrate that substitution of bifurcated methyltransferase fragments with new zinc fingers predictably targets methylation toward new zinc finger cognate sequences. Finally, in Chapter 4, we describe successful preliminary studies in human cell lines. We demonstrate the eukaryotic expression of both fragments, targeting specific sites in a mammalian expression vector and methyltransferase activity on chromosomal DNA.
Advisor: Professor Marc Ostermeier
Readers: Professor Sarah Woodson
Professor Jeffrey J. Gra
Synthetic DNA Technology As A Tool To Generate Vaccine Immunity In The Skin
Since DNA's ability to generate an immune response was first described over 25 years ago, much work has been done to realize DNA's full potential as a safe and potent vaccine candidate. Renewed research has focused on continually improving the potency of the platform, which has led to advancements in electroporation, DNA formulation, and novel synthesized sequence optimizations, allowing newer "synthetic" DNA vaccines (SDNA) to contend as a major vaccine platform. Further insights into factors that influence SDNA vaccine outcomes are critical to achieving full potential. Here, we designed novel SDNA-encoded skin-derived cytokines within the IL-36 family, to assess their impact on immunity against several viral targets. Zika virus challenge studies were also performed to assess whether observed adjuvant activity led to improved challenge outcome. The studies show that codelivery of optimized IL-36 gamma, with a non-protective dose of a Zika SDNA vaccine, can enhance immune responses, allowing for protection against challenge compared to nonadjuvanted mice. Another important area that is relatively understudied is skin delivery of SDNA vaccines. The skin is a major immune organ, and expanded applications for immunization might be possible with better understanding of its potential in the context of newer SDNA technology. To test the impact of skin vaccination on a relevant pathogen challenge, two consensus SDNA vaccines that encode a Leishmania antigen, PEPCK, were designed incorporating several genetic improvements including RNA and codon optimization and addition of a highly efficient IgE leader sequence. These were used to immunize mice intramuscularly or intradermally and analyze the resulting immunity. We observed that intradermal vaccination drove a greater number of antigen-specific skin-resident T cells compared to intramuscular vaccination, both at the vaccination site and at a distal site.
We further observed that mice immunized intradermally were better protected against parasite challenge and burden compared to intramuscularly immunized mice. My thesis supports the idea that the skin represents both a robust source of important immune modulators that can improve vaccination outcome and a unique site for SDNA immunization that gives rise to long lived resident immune cells which may play a crucial role in generating effective interventions against infectious agents and cancer
Human Promoter Prediction Using DNA Numerical Representation
With the emergence of genomic signal processing, numerical representation techniques for the DNA alphabet set {A, G, C, T} play a key role in applying digital signal processing and machine learning techniques to the processing and analysis of DNA sequences. The choice of the numerical representation of a DNA sequence affects how well the biological properties can be reflected in the numerical domain for the detection and identification of the characteristics of special regions of interest within the DNA sequence. This dissertation presents a comprehensive study of various DNA numerical and graphical representation methods and their applications in processing and analyzing long DNA sequences. Discussions of the relative merits and demerits of the various methods, experimental results and possible future developments are also included. Another focus of the research is promoter prediction in human (Homo sapiens) DNA sequences with a neural-network-based multi-classifier system using DNA numerical representation methods. Despite the recent development of several computational methods for human promoter prediction, there is a need for performance improvement. In particular, the high false positive rate of the feature-based approaches decreases the prediction reliability and leads to erroneous results in gene annotation. To improve the prediction accuracy and reliability, DigiPromPred, a numerical-representation-based promoter prediction system, is proposed to characterize DNA alphabets in different regions of a DNA sequence. The DigiPromPred system is found to predict promoters with a sensitivity of 90.8% while reducing the false prediction rate for non-promoter sequences with a specificity of 90.4%. The comparative study with state-of-the-art promoter prediction systems for human chromosome 22 shows that our proposed system maintains a good balance between prediction accuracy and reliability.
To reduce the architectural and computational complexity relative to the existing system, a simple feed-forward neural network classifier known as SDigiPromPred is proposed. The SDigiPromPred system is found to predict promoters with sensitivities of 87%, 87% and 99% while reducing the false prediction rate for non-promoter sequences with specificities of 92%, 94% and 99% for Human, Drosophila, and Arabidopsis sequences respectively, and it offers reconfigurable capability compared to the existing system
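For readers unfamiliar with the terms, the sketch below shows one well-known DNA numerical representation (the Voss binary indicator sequences) and how sensitivity and specificity are computed from binary predictions. The abstract does not state which representations DigiPromPred uses, so the choice of Voss encoding is an assumption, and the labels in the usage lines are invented for illustration.

```python
def voss_representation(seq):
    """Map a DNA string to four binary indicator sequences, one per base."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}

def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels, 1 = promoter.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Returns NaN for a ratio whose denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Invented example: 3 true promoters, 4 non-promoters.
encoded = voss_representation("TATAAAGC")
print(encoded["A"])
print(sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1]))
```

A high specificity corresponds to a low false positive rate on non-promoter sequences, which is the reliability concern the abstract raises for feature-based approaches.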
Optimising Gene Therapy for X-linked Retinitis Pigmentosa
Purpose
Mutations in RPGRORF15 cause 70 to 90 % of the monogenetic disease X-linked retinitis pigmentosa (XLRP), making this gene a high-yield target for causal treatment with gene therapy. Due to the purine-rich, repetitive nature of the terminal ORF15 exon, maintaining transgene sequence fidelity has proven to be a road-block in translational efforts. This thesis contributes to the optimisation of a gene therapy for RPGR-XLRP in two ways: firstly, it aims to investigate codon optimization and use of mutant AAV capsids as a means to overcome the inherent instability of RPGRORF15 and increase transgene expression. Secondly, analysis of pre-treatment characteristics in a cohort of 50 RPGR-XLRP patients will assist both future prospective observational and interventional trials by determining symmetry of disease, rate of progression and suitability of outcome measures as endpoints for clinical trials.
Methods
In the first part of the thesis, Western blot was used to quantify transgene expression in HEK293T cells transfected with codon optimised (co) or wild type (wt) RPGR plasmids, as well as to detect transgene expression in mice unilaterally injected with AAV2/8.coRPGR. Immunolabeling was used to show correct localisation of the codon optimised transgene to the photoreceptor cilium and to compare transduction efficiency between wild type and single mutant AAV8Y733F capsids. In the second part of the thesis, a retrospective, cross-sectional analysis of 50 patients extracted visual acuity, visual fields (I4e and III4e targets), foveal thickness and ERG data points (ISCEV standard protocol) alongside molecular genetic data. Symmetry and progression were assessed using linear regression and cross-sectional analysis, respectively. Kaplan-Meier curves were used to estimate cumulative "survival" of three important levels of visual function (full vision, reading ability, threshold to legal blindness) with age.
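The Kaplan-Meier (product-limit) estimate referred to above can be sketched as follows. This is a generic textbook implementation, not the analysis code used in the thesis, and the ages and event flags in the usage lines are invented, not patient data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observed time for each subject (e.g. age at last observation)
    events: 1 if the event occurred at that time (e.g. a level of visual
            function was lost), 0 if the subject was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve

# Invented example: ages in years, 1 = event observed, 0 = censored.
ages = [20, 30, 30, 40, 50]
flags = [1, 1, 0, 1, 0]
for age, s in kaplan_meier(ages, flags):
    print(age, round(s, 3))
```

Censored subjects leave the risk set without lowering the curve, which is what lets a cross-sectional cohort with varying follow-up contribute to the estimate.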
Results
HEK293T cells transfected with p.coRPGR showed an increase in protein expression (p < 0.005) and demonstrated a superior transgene stability compared to the wild type control. Three different mouse lines, C57BL/6J, C57BL/6J Rd9/Boc and Rpgr-/y, treated with AAV2/8.coRPGR showed a reliable, albeit variable, transgene expression and demonstrated co-localisation with RPGR interacting protein (RPGRIP) in the connecting cilium. Mutant capsid (AAV8Y733F) failed to show a significant increase in transduction of 661W cone-like photoreceptor cells (p = 0.058). In the retrospective analysis of clinical data from XLRP patients, 73 % of exonic mutations occurred in ORF15. Yet no clear genotype-phenotype relationship could be established between mutations located in these two parts of the RPGR gene, and patients with ORF15 mutations did not have a significantly different visual acuity (p = 0.9) or visual field (III4e; p = 0.6) than those with mutations in exons 1-14. Comparison of both eyes revealed a strong symmetry of degeneration in all outcome measures, with visual fields (I4e ρ = 0.99; III4e ρ = 0.96) and ERG (30 Hz flicker ρ = 0.95) exhibiting the highest symmetry. Disease progression eluded description by a simple function. Kaplan-Meier curve (KMC) analysis predicts the most severe decline in vision between the third and fourth decade of life.
Conclusions
Codon optimisation of RPGR significantly increased transgene levels in HEK293T cells compared to a wild type RPGR expression cassette. AAV2/8.coRPGR injected mouse eyes reliably expressed RPGR protein that correctly localised to the photoreceptor connecting cilium in mouse models of RPGR-XLRP.
High symmetry in all outcome measures confirms that the contralateral eye can be used as an internal control in an RPGR-XLRP gene therapy trial. The variability between patients makes an intra-individual control preferable to an inter-individual control. According to these findings, the most sensitive parameter to measure disease progression and treatment success in an interventional RPGR-XLRP trial seems to be kinetic visual field using the III4e target.
Overall, these two pillars of research contribute to the foundation enabling translation of RPGRORF15 gene therapy into a clinical trial
Bioinformatics
This book is divided into different research areas relevant to bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here
- …