71 research outputs found

    Novel Algorithms for LDD Motif Search

    Background: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report more spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than the (l,d)-motif model. This model is called the (l,d1,d2)-motif model (LDDMS), and the corresponding search problem is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. Results: We performed both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern-driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d1,d2) instances. Conclusions: We proposed the LDDMS model, which is more practically relevant. We also proposed three exact and efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS).
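
    For orientation, the classic (l,d)-motif model and the pattern-driven baseline mentioned in the abstract can be sketched as below. This is only an illustration under the usual textbook definition (an l-mer is a motif if every input sequence contains an l-mer within Hamming distance d of it); it is not LDDMS1, LDDMS2 or LDDMS3, and it does not implement the (l,d1,d2) generalisation, whose exact definition the abstract does not give. The function names and the toy sequences are assumptions made for the example.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d):
    """True if some l-mer of `sequence` is within Hamming distance d of `candidate`."""
    l = len(candidate)
    return any(
        hamming(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def ld_motifs(sequences, l, d, alphabet="ACGT"):
    """Pattern-driven search: test every one of the |alphabet|**l candidate l-mers.

    Exponential in l, so only usable for tiny instances; it serves as a baseline.
    """
    motifs = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d) for s in sequences):
            motifs.append(candidate)
    return motifs


# Toy input (hypothetical, not from the paper).
toy_sequences = ["ACGTAC", "TTACGA", "GACGTT"]
exact_motifs = ld_motifs(toy_sequences, l=3, d=0)
relaxed_motifs = ld_motifs(toy_sequences, l=3, d=1)
```

    With d=0 the search reduces to finding l-mers shared verbatim by all sequences; raising d admits more candidates, which is the source of the spurious motifs the abstract refers to.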

    Novel sampling techniques for reservoir history matching optimisation and uncertainty quantification in flow prediction

    Modern reservoir management has an increasing focus on accurately predicting the likely range of field recoveries. A variety of assisted history matching techniques has been developed across the research community concerned with this topic. These techniques are based on obtaining multiple models that closely reproduce the historical flow behaviour of a reservoir. The resulting set of history-matched models is then used to quantify uncertainty in predicting the future performance of the reservoir and to provide economic evaluations for different field development strategies. The key step in this workflow is to employ algorithms that sample the parameter space in an efficient but appropriate manner. The algorithm choice has an impact on how fast a model is obtained and how well the model fits the production data. The sampling techniques that have been developed to date include, among others, gradient-based methods, evolutionary algorithms, and the ensemble Kalman filter (EnKF). This thesis has investigated and further developed the following sampling and inference techniques: Particle Swarm Optimisation (PSO), Hamiltonian Monte Carlo, and Population Markov Chain Monte Carlo. The inspected techniques have the capability of navigating the parameter space and producing history-matched models that can be used to quantify the uncertainty in the forecasts in a faster and more reliable way. The analysis of these techniques, compared with the Neighbourhood Algorithm (NA), has shown how the different techniques affect the predicted recovery from petroleum systems and the benefits of the developed methods over the NA. The history matching problem is multi-objective in nature, with the production data possibly consisting of multiple types, coming from different wells, and collected at different times. Multiple objectives can be constructed from these data and explicitly optimised in the multi-objective scheme.
The thesis has extended the PSO to handle multi-objective history matching problems in which a number of possibly conflicting objectives must be satisfied simultaneously. The benefits and efficiency of the innovative multi-objective particle swarm scheme (MOPSO) are demonstrated for synthetic reservoirs. It is demonstrated that the MOPSO procedure can provide a substantial improvement in finding a diverse set of good fitting models with fewer very costly forward simulation runs than the standard single-objective case, depending on how the objectives are constructed. The thesis has also shown how to tackle a large number of unknown parameters through the coupling of high-performance global optimisation algorithms, such as PSO, with model reduction techniques such as kernel principal component analysis (PCA), for parameterising spatially correlated random fields. The results of the PSO-PCA coupling applied to a recent SPE benchmark history matching problem have demonstrated that the approach is indeed applicable to practical problems. A comparison of PSO with the EnKF data assimilation method has been carried out and has concluded that both methods obtained comparable results on the example case. This point reinforces the need for using a range of assisted history matching algorithms for more confidence in predictions.
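
    As a rough illustration of the sampling idea, a standard single-objective, global-best particle swarm can be sketched as follows. This is a generic textbook PSO, not the thesis's MOPSO or its PSO-PCA coupling; the toy quadratic `misfit` stands in for a costly reservoir forward simulation, and all names, bounds and coefficient values are assumptions chosen for the example.

```python
import random


def pso_minimise(objective, bounds, n_particles=20, n_iters=200,
                 inertia=0.7298, c_cognitive=1.49618, c_social=1.49618, seed=0):
    """Minimise `objective` over a box given as a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to personal best + pull to global best.
                vel[i][k] = (inertia * vel[i][k]
                             + c_cognitive * r1 * (pbest[i][k] - pos[i][k])
                             + c_social * r2 * (gbest[k] - pos[i][k]))
                lo, hi = bounds[k]
                # Move the particle and clip it back into the search box.
                pos[i][k] = min(max(pos[i][k] + vel[i][k], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def misfit(params):
    """Toy stand-in for a history-match misfit, with its minimum at (2, -1)."""
    return (params[0] - 2.0) ** 2 + (params[1] + 1.0) ** 2


best_params, best_misfit = pso_minimise(misfit, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

    In a real history-matching setting each call to the objective would be a forward simulation, so the number of particles and iterations is the dominant cost; a multi-objective variant would keep a set of non-dominated solutions rather than a single global best.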

    Computational and in vitro study of isolated domains from fungal polyketide synthases

    Diverse approaches have been explored to generate new polyketides by engineering polyketide synthases (PKS). Although it has been proven possible to produce new compounds by designed PKS, engineering strategies failed to make polyketides available via widely applicable rules and protocols. The aim of this work was the first rational engineering of an iterative highly-reducing polyketide synthase (HR-PKS). This approach was performed on the Squalestatin Tetraketide Synthase (SQTKS), which catalyses the biosynthesis of the tetraketide side chain of squalestatin-S1 53, which is a potent squalene synthase inhibitor and can be potentially used to treat serum cholesterol related diseases. Second, tenellin 62 was investigated, which is the product of the iterative Type I polyketide synthase non ribosomal peptide synthetase (PKS-NRPS) TENS. Using a combination of different in silico methods, structural models of the enoyl reductase (ER) domain of SQTKS were obtained and validated. With the generated protein models different rational engineering experiments in silico were performed, in which amino acids for the mutagenesis approach in vitro were identified. The subsequent in vitro experiments revealed that it was possible to rationally engineer the ER domain of SQTKS. In addition, the different integrated mutations showed different effects on the intrinsic programming of the ER domain. Further, the chemical selectivity and kinetic parameters of the tested di-, tri-, tetra- and heptaketide substrate were influenced in a specific way through the different mutated ER domains. In addition, the structural-biological foundations and analysis for the domain swaps between Pretenellin A Synthetase (TENS), Predesmethylbassianin A Synthetase (DMBS) and Premilitarinone C Synthetase (MILS) were investigated and validated. 
Through different in silico structural analyses it was possible to consider the effects of swaps on protein structure and to understand the effect of the swaps at the structural level. Additionally, the in silico analysis helped to clarify the influence of extrinsic and intrinsic programming factors

    Identification of SNP-containing regulatory motifs in the myelodysplastic syndromes model using SNP arrays and gene expression arrays

    Myelodysplastic syndromes have increased in frequency and incidence in the American population, but patient prognosis has not significantly improved over the last decade. Such improvements could be realized if biomarkers for accurate diagnosis and prognostic stratification were successfully identified. In this study, we propose a method that associates two state-of-the-art array technologies, the single nucleotide polymorphism (SNP) array and the gene expression array, with gene motifs considered transcription factor-binding sites (TFBS). We are particularly interested in SNP-containing motifs introduced by genetic variation and mutation as TFBS. The potential regulation by SNP-containing motifs takes effect only when certain mutations occur. These motifs can be identified from a group of co-expressed genes with copy number variation. Then, we used a sliding window to identify motif candidates near SNPs on gene sequences. The candidates were filtered by coarse thresholding and fine statistical testing. Using the regression-based LARS-EN algorithm and a level-wise sequence combination procedure, we identified 28 SNP-containing motifs as candidate TFBS. We confirmed 21 of the 28 motifs with ChIP-chip fragments in the TRANSFAC database. Another six motifs were validated by TRANSFAC via searching binding fragments on co-regulated genes. The identified motifs and their location genes can be considered potential biomarkers for myelodysplastic syndromes. Thus, our proposed method, a novel strategy for associating two data categories, is capable of integrating information from different sources to identify reliable candidate regulatory SNP-containing motifs introduced by genetic variation and mutation.
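
    The sliding-window step can be pictured with the small sketch below, which lists every fixed-width window of a sequence that covers a given SNP position. The abstract does not state the window width, the indexing convention, or how candidates are scored, so the function name, the 0-based index and the width used here are assumptions made purely for illustration.

```python
def windows_containing(sequence, snp_index, width):
    """Return (start, substring) for every window of `width` that covers `snp_index`.

    `snp_index` is a 0-based position in `sequence` (an assumed convention).
    """
    # The earliest window that still covers the SNP starts width-1 positions before it.
    start_min = max(0, snp_index - width + 1)
    # The latest window starts at the SNP itself, unless that would run off the end.
    start_max = min(snp_index, len(sequence) - width)
    return [(s, sequence[s:s + width]) for s in range(start_min, start_max + 1)]


# Hypothetical gene fragment with a SNP at index 3.
candidates = windows_containing("ACGTACGT", snp_index=3, width=3)
```

    Each returned substring is a motif candidate that would then go through the thresholding and statistical testing described in the abstract.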

    Circuit Optimisation using Device Layout Motifs

    Circuit designers face great challenges as CMOS devices continue to scale to nano dimensions, in particular, stochastic variability caused by the physical properties of transistors. Stochastic variability is an undesired and uncertain component caused by fundamental phenomena associated with device structure evolution, which cannot be avoided during the manufacturing process. In order to examine the problem of variability at atomic levels, the 'Motif' concept, defined as a set of repeating patterns of fundamental geometrical forms used as design units, is proposed to capture the presence of statistical variability and improve the device/circuit layout regularity. A set of 3D motifs with stochastic variability is investigated using technology computer-aided design (TCAD) simulations. The statistical motif compact model is used to bridge between device technology and circuit design. The statistical variability information is transferred into the motifs' compact model in order to facilitate variation-aware circuit designs. The uniform motif compact model extraction is performed by a novel two-step evolutionary algorithm. The proposed extraction method overcomes the drawbacks of conventional extraction methods, namely poor convergence without good initial conditions and the difficulty of simulating multi-objective optimisations. After uniform motif compact models are obtained, the statistical variability information is injected into these compact models to generate the final motif statistical variability model. The thesis also considers the influence of different choices of motif for each device on circuit performance and its statistical variability characteristics. A set of basic logic gates is constructed using different motif choices. Results show that circuit performance and variability mitigation can benefit from specific motif permutations.
A multi-stage optimisation methodology is introduced, in which the processes of optimisation are divided into several stages. Benchmark circuits show the efficacy of the proposed methods. The results presented in this thesis indicate that the proposed methods are able to provide circuit performance improvements and are able to create circuits that are more robust against variability

    Predicting multidomain protein structure and function via co-evolved amino acids and application to polyketide synthases

    Proteins are an important building block of life, and they are responsible for many processes in living organisms. Therefore, understanding their functions and working mechanisms is vitally important for answering many questions about diseases and is a basis for the development of novel drugs. The three-dimensional (3D) structures of proteins determine their functions; therefore, the determination of the 3D structures of proteins has been studied widely. Although many experimental techniques have been developed to determine the structures of proteins, they have limitations, especially for large protein complexes. Protein structure can help understand protein function, as can looking at conserved residues, but typically time-consuming mutagenesis experiments combined with protein function assays are needed. As an alternative to the experimental methods, researchers have been working on developing computational approaches. While it is relatively easy to predict structures when the structure of a homologous protein is known, as it can be used as a template, the prediction of protein structures in the absence of a template is more challenging. For template-free predictions, coevolved amino acid residue pairs, predicted from the alignment of the homologous sequences, provided promising improvements in the field. More recently, successful implementation of artificial neural networks, fed by the predicted coevolved residue pairs, improved the accuracy of the predicted structures further. Although there are promising developments in the coevolution-based approaches, especially for the structure prediction of small/medium-sized proteins, more developments are needed for predicting protein structure, particularly of large protein complexes. Here, we show that the prediction of distances between residue pairs, via deep neural networks fed by predictions of coevolved residue pairs, improves the accuracy of structure prediction in small/medium-sized proteins.
The prediction of residue pair distances, using a similar approach, in two interacting domains also allows us to predict how two domains on the same chain interact with each other. Further, we show that prediction of coevolved residue groups, via statistical coupling analysis, allows us to determine functional boundaries of domains and diverged amino acid patterns in the sub-types of the domains in a multi-domain protein complex, a polyketide synthase. We found that using predicted distances, in addition to the predicted residue pairs in contact, allows us to generate structures closer to the experimental structures, and to select them as the final models in a straightforward approach. Additionally, we reveal that the distances of the residue pairs on interacting domain pairs can be predicted accurately, leading to the successful prediction of the structural interface between two interacting proteins when the interface surface is large and the sequence alignment is comprehensive enough. Finally, we found that functional domain boundaries, which are consistent with the experimental studies, can be determined. Also, some coevolved residue groups have distinct amino acid patterns in different domain sub-types, including the positions that are already known as the fingerprint motifs of the different sub-types. These approaches can be applied to predict the structures of individual domains and to predict how two domains interact with each other, which can be used to predict the structure of multi-domain proteins. The work on polyketides here demonstrates how these developments might be applied, since identifying domain boundaries and residues important for substrate specificity should aid in the design of novel polyketide synthases and thus of novel polyketides. This in itself is an important development given the commercial and medicinal importance of polyketides, but also opens the way to similar analysis on other multidomain proteins.

    Computer-aided drug design and biological evaluation of novel anti-viral agents

    This thesis describes studies concerning the molecular modelling and biological evaluation of a set of novel antiviral agents for the helicase and polymerase proteins of Flaviviridae. Viruses in this family are enveloped, have positive-sense RNA and are responsible for a variety of life-threatening diseases. To date neither specific antiviral treatments exist nor are there any vaccines available for Flaviviridae infection. Thus there is an urgent need for new therapies. The ultimate aim of this project was to design a coordinated in silico and in vitro protocol for the design and evaluation of novel Flaviviridae inhibitors. That was achieved initially by establishing the three-dimensional structures of various Flaviviridae members by homology-based molecular modelling. Subsequently, a set of small compound libraries was designed using a de novo structure-based drug design approach. Those compounds were screened in silico with the aid of molecular docking and a set of scoring algorithms. The best candidates were chosen to be chemically synthesised (not part of this thesis). The genes of Hepatitis C and Dengue helicases as well as the Dengue NS3 domain helicase and protease were cloned in expression vectors and the proteins were produced and purified. A novel biological assay was then established for the Hepatitis C helicase in order to evaluate the potency of the designed inhibitors in vitro. An attempt was finally made to feed the biological activity data of those compounds back into the computer model, in order to improve the cooperation levels between the in silico and the in vitro parts of this research.

    Design and Synthesis of Small Molecules to Target Protein-Protein Interactions

    Protein-protein interactions (PPIs) play key regulatory roles in biological systems, and some of these are interesting drug targets. Consequently, it is important to develop generally applicable methods to identify small molecules that disrupt or disturb PPIs. One emerging approach is to use novel small molecule scaffolds to mimic protein-protein interfaces. To identify good mimics for PPI targets, a novel computational approach, Exploring Key Orientations (EKO), has been developed. This thesis is focused on the design, synthesis, EKO analysis and biological applications of two interface mimic scaffolds. The first mimic, an oligo-piperidine-piperidinone (OPP) scaffold, was designed and synthesized to target extended interface regions. Derivatives of this scaffold have been efficiently prepared by a divergent-convergent method. Conformational studies revealed that the OPP scaffold could exist in an extended helical conformation and that it was a good mimic of an ideal α-helix in the solid state. Further investigations by molecular modeling indicated this scaffold could be a multi-faceted mimic for several secondary structure motifs in solution. An interesting protein target called antithrombin was discovered with EKO database mining analysis for the OPP scaffold. In biological studies, derivatives of OPP were found to interfere with oligomerization of antithrombin in a side-chain and concentration dependent manner. As an orthogonal interface mimic, a new constrained cyclic peptide-organic hybrid was also explored to target compact PPI interface regions. An anthranilic acid was incorporated in the scaffold as a turn-inducing motif. Extensive conformational analyses by 1D and 2D NMR, CD, and molecular modeling were performed and the results showed that the cyclic peptide scaffold could mimic multiple turn structures.
Moreover, these new turn mimics were conformationally homogeneous in solution and their conformations had a strong and predictable correlation with side-chain stereochemistries. The scaffolds described in this thesis are suitable for targeting protein-protein interactions. Compared with traditional methods, the interface mimicry approach together with EKO analysis can significantly facilitate the discovery of small molecules for protein-protein interactions.
