Development of Computer-aided Concepts for the Optimization of Single-Molecules and their Integration for High-Throughput Screenings

Abstract

In synthetic biology, highly interdisciplinary approaches to the design and modelling of functional molecules using computer-assisted methods have become established in recent decades. These methods are used mainly where experimental approaches reach their limits: computer models can, for example, elucidate the temporal behaviour of nucleic acid polymers or proteins through single-molecule simulations and illustrate the functional relationships among amino acid residues or nucleotides. The knowledge gained from computer modelling can be fed back continuously to guide the further experimental process (screening) as well as the shape or function (rational design) of the molecule under consideration. Such human-driven optimization of biomolecules is often necessary because the substrates of interest for biocatalysts and enzymes are usually synthetic ("man-made materials", such as PET), and evolution has not had time to provide efficient biocatalysts for them. In the computer-aided design of single molecules, two fundamental paradigms dominate the field of synthetic biology. On the one hand, probabilistic experimental methods (e.g., evolutionary design processes such as directed evolution) are used in combination with High-Throughput Screening (HTS); on the other hand, rational, computer-aided single-molecule design methods are applied. For both paradigms, computer models and concepts were developed, evaluated and published. The first contribution in this thesis describes a computer-aided design approach for the Fusarium solani cutinase (FsC). The loss of enzyme activity during prolonged incubation with PET was investigated in detail at the molecular level. For this purpose, Molecular Dynamics (MD) simulations of the spatial structure of FsC together with a water-soluble degradation product of the synthetic substrate PET (ethylene glycol) were computed. The existing model was extended by combining it with reduced models.
This simulation study identified specific regions of FsC that interact very strongly with PET (ethylene glycol) and thus have a significant influence on the flexibility and structure of the enzyme. The subsequent original publication establishes a new method for selecting high-throughput assays for use in protein chemistry. The selection is made via a meta-optimization of the candidate assays. For this purpose, control reactions are carried out for each assay, and the distance between the control distributions is evaluated using classical statistical methods such as the Kolmogorov-Smirnov test. A performance score is then assigned to each assay. These control experiments are performed before the actual experiment (screening), and the assay with the highest performance is used for the subsequent screening. Applying this generic method can achieve high success rates, as we demonstrated experimentally using lipases and esterases as an example. In green chemistry, the above-mentioned processes can help to find enzymes for the degradation of synthetic materials more quickly, or to modify naturally occurring enzymes so that, after successful optimization, they convert synthetic substrates efficiently. At the same time, the experimental effort (consumption of materials) is kept to a minimum during practical implementation. Especially for large-scale screenings, considering or restricting the possible sequence space in advance can contribute significantly to maximizing the success rate of screenings and minimizing the total time they require. In addition to classical methods such as MD simulations in combination with reduced models, new graph-based methods for the representation and analysis of MD simulations were developed. For this purpose, simulations were converted into distance-dependent dynamic graphs.
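
The conversion of a simulation into a distance-dependent dynamic graph can be illustrated with a minimal sketch. It is not the implementation from the thesis: the function names, the toy coordinates and the cutoff value are assumptions chosen for illustration. Each frame becomes a set of edges between particles that lie within the cutoff distance, and the sequence of edge sets over time forms the dynamic graph.

```python
from itertools import combinations
from math import dist


def frame_to_edges(coords, cutoff):
    """Return all index pairs (i, j), i < j, whose distance is <= cutoff."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) <= cutoff
    }


def trajectory_to_dynamic_graph(frames, cutoff):
    """Map every frame of a trajectory to its edge set (one graph per time step)."""
    return [frame_to_edges(frame, cutoff) for frame in frames]


# Hypothetical toy trajectory: three particles, two frames (arbitrary units).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
]
graph = trajectory_to_dynamic_graph(frames, cutoff=1.2)
```

In this toy example the third particle moves into contact range in the second frame, so an edge appears that was absent in the first. Changes of this kind are what a motif-based analysis of the dynamic graph would track.
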
Based on this reduced representation, efficient analysis algorithms were developed and tested. In particular, network motifs were investigated to determine whether this type of semantics is better suited to describing molecular structures and interactions within MD simulations than spatial coordinates. This concept was evaluated for MD simulations of various molecules, such as water, synthetic pores, proteins, peptides and RNA structures. The results show that this novel form of semantics is an excellent way to describe (bio)molecular structures and their dynamics. Furthermore, an algorithm (StreAM-Tg) was developed for creating motif-based Markov models, especially for the analysis of single-molecule simulations of nucleic acids. This algorithm is used for the design of RNAs: the insights obtained from the analysis with StreAM-Tg (Markov models) can provide useful recommendations for the (re)design of functional RNA. In this context, a new method was developed to quantify the environment (i.e., water; the solvent context) and its influence on biomolecules in MD simulations. For this purpose, three-vertex motifs were used to describe the structure of the individual water molecules. Among its advantages, this method describes the structure and dynamics of water accurately. For example, we were able to reproduce the thermodynamic entropy of water in the liquid and vapor phases along the vapor-liquid equilibrium curve from the triple point to the critical point. Another major field covered in this thesis is the development of new computer-aided approaches to HTS for the design of functional RNA. To produce functional RNA (e.g., aptamers and riboswitches), an experimental, round-based HTS (such as SELEX) is typically used. By using Next Generation Sequencing (NGS) in combination with the SELEX process, this design process can be studied at the nucleotide and secondary structure levels for the first time.
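
The idea behind the motif-based Markov models mentioned above can be sketched as follows. This is not StreAM-Tg itself; it is a plain maximum-likelihood estimate of a first-order Markov chain, assuming each simulation frame has already been assigned a motif label. The labels "A" and "B" and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(states):
    """Estimate first-order transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    # Count every observed transition between consecutive frames.
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    # Normalize each row so that outgoing probabilities sum to one.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


# Hypothetical motif labels observed in six consecutive frames.
motif_sequence = ["A", "A", "B", "A", "B", "B"]
P = transition_matrix(motif_sequence)
```

Each row of `P` gives the estimated probability of moving from one motif to the next. States with high self-transition probability correspond to long-lived structural motifs.
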
A special feature of small RNA molecules compared to proteins is that the minimum free energy secondary structure (topology) can be determined directly from the nucleotide sequence with a high degree of certainty. By combining M. Zuker's algorithm, NGS and the SELEX method, it was possible to quantify the structural diversity of individual RNA molecules while taking the genetic context into account. This combination of methods allowed the prediction of the rounds in which the first ciprofloxacin riboswitch emerged. In this example, the diversity of each round was quantified with only a simple structural comparison (Levenshtein distance). To improve on this, a new representation of the RNA structure as a directed graph was modeled, which was then compared using a probabilistic subgraph isomorphism. Subsequently, the NGS dataset (ciprofloxacin riboswitch) was modeled as a dynamic graph and analyzed for the occurrence of defined seven-vertex motifs. For this purpose, motif-based semantics were integrated into HTS for RNA molecules for the first time. The identified motifs could be assigned to secondary structural elements that were identified experimentally in the ciprofloxacin aptamer R10k6. Finally, all the algorithms presented were integrated into an R library, published and made available to scientists from all over the world.
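
The Levenshtein-based quantification of per-round structural diversity can be illustrated with a short sketch. It assumes that predicted secondary structures are available as dot-bracket strings and that diversity is summarized as the mean pairwise edit distance within a round. Both are assumptions made for illustration, as are the function names and the example structures; the published R library may define these quantities differently.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def round_diversity(structures):
    """Mean pairwise Levenshtein distance of the structures in one round."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)


# Hypothetical dot-bracket structures from a single selection round.
example_round = ["((..))", "(....)", "......"]
diversity = round_diversity(example_round)
```

Computing this value for every selection round gives a diversity profile over the course of the experiment. A marked drop would be consistent with the pool converging on a small number of structures.
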
