
    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test the scoring power and the virtual screening power, respectively, of the proposed topological approaches. The results show that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
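As a rough illustration of the multicomponent idea, the sketch below computes 0-dimensional (H0) persistence barcodes of a Vietoris-Rips filtration separately for chosen element subsets of a structure. It is a minimal sketch under stated assumptions, not the authors' implementation: the `(element, (x, y, z))` atom format, the function names, and the use of raw Euclidean distance as the filtration value are hypothetical, and higher-dimensional homology, multi-level distance masking, electrostatic persistence, and the Wasserstein comparison of barcodes are not covered.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Finite H0 bar lengths of the Vietoris-Rips filtration of a point cloud.

    Assumption: an edge enters the filtration at the Euclidean distance between
    its endpoints. Every point is born at 0; whenever an edge joins two distinct
    connected components, one component dies at that edge length (Kruskal's
    algorithm with union-find). The one component that never dies (the infinite
    bar) is not included in the returned list.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def element_specific_h0(atoms, element_groups):
    """One H0 barcode per element subset (hypothetical input format).

    atoms: list of (element_symbol, (x, y, z)) tuples.
    element_groups: dict mapping a label to the set of element symbols to keep.
    """
    return {
        label: h0_deaths([xyz for elem, xyz in atoms if elem in keep])
        for label, keep in element_groups.items()
    }


# Hypothetical four-atom fragment, coordinates in angstroms.
atoms = [("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0)),
         ("N", (1.5, 1.4, 0.0)), ("O", (4.0, 0.0, 0.0))]
groups = {"C": {"C"}, "C+N": {"C", "N"}, "all": {"C", "N", "O"}}
for label, deaths in element_specific_h0(atoms, groups).items():
    print(label, [round(d, 3) for d in deaths])
```

Kruskal's algorithm fits here because the finite H0 death times of a Rips filtration are exactly the edge lengths of a minimum spanning tree, so each element subset yields its own barcode that downstream models could consume as features.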

    Multiscale Simulation and Analysis of Structured Ribonucleic Acids

    I present the results of three projects from my scientific work on native structure-based models (SBMs) for regulatory RNA. They comprise a new, openly accessible software implementation for generating and evaluating native SBMs; a study that employs a multiscale model to investigate cotranscriptional riboswitch folding; and advances toward a novel approach to RNA tertiary structure prediction.
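Native structure-based models generally reward contacts that are present in the native (folded) structure. The sketch below is a generic, hypothetical illustration of that idea rather than the software described here: it builds a native contact map for a coarse-grained chain using a plain distance cutoff and a minimum sequence separation (both values are illustrative assumptions), then scores any conformation with a 12-10 contact potential whose minimum, -epsilon, lies at each contact's native distance. Bonded terms, excluded volume, and contact-map refinements such as shadow maps are omitted.

```python
import math
from itertools import combinations


def native_contacts(coords, cutoff, min_seq_sep=4):
    """Return (i, j, r_native) for bead pairs in contact in the native structure.

    A pair counts as a contact if the beads are at least min_seq_sep apart in
    sequence and within `cutoff` in space (illustrative criteria only).
    """
    contacts = []
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_seq_sep:
            continue
        r = math.dist(coords[i], coords[j])
        if r <= cutoff:
            contacts.append((i, j, r))
    return contacts


def contact_energy(coords, contacts, epsilon=1.0):
    """Sum of 12-10 contact terms: epsilon * (5 x^12 - 6 x^10), x = r_native / r.

    Each term equals -epsilon when the pair sits at its native distance and
    approaches 0 as the pair is pulled apart.
    """
    total = 0.0
    for i, j, r_native in contacts:
        x = r_native / math.dist(coords[i], coords[j])
        total += epsilon * (5 * x**12 - 6 * x**10)
    return total


# Hypothetical six-bead hairpin-like chain (arbitrary length units).
native = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
contacts = native_contacts(native, cutoff=1.5)
print(len(contacts), contact_energy(native, contacts))  # 3 -3.0
```

Because every native contact contributes its full -epsilon only in the native geometry, the native structure is the energy minimum by construction, which is the defining property of this model family.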

    Engineering naturally occurring trans-acting non-coding RNAs to sense molecular signals

    Non-coding RNAs (ncRNAs) are versatile regulators in cellular networks. While most trans-acting ncRNAs possess well-defined mechanisms that can regulate transcription or translation, they generally lack the ability to directly sense cellular signals. In this work, we describe a set of design principles for fusing ncRNAs to RNA aptamers to engineer allosteric RNA fusion molecules that modulate the activity of ncRNAs in a ligand-inducible manner in Escherichia coli. We apply these principles to ncRNA regulators of translation (IS10 ncRNA) and transcription (pT181 ncRNA), and demonstrate that our design strategy exhibits high modularity between the aptamer ligand-sensing motif and the ncRNA target-recognition motif. This modularity allows us to reconfigure the two motifs to engineer orthogonally acting fusion molecules that respond to different ligands and regulate different targets in the same cell. Finally, we show that the same ncRNA fused with different sensing domains results in a sensory-level NOR gate that integrates multiple input signals to perform genetic logic. These ligand-sensing ncRNA regulators provide useful tools to modulate the activity of structurally related families of ncRNAs. Building upon the growing body of RNA synthetic biology, our ability to design aptamer–ncRNA fusion molecules offers new ways to engineer ligand-sensing regulatory circuits.
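The NOR behavior can be summarized as a truth table. The toy Boolean model below assumes, for illustration only, that each fusion molecule represses the shared target when its own ligand is bound; the function name and this activation logic are hypothetical simplifications of the molecular mechanism, not a description of the reported constructs.

```python
from itertools import product


def target_expressed(ligand_a: bool, ligand_b: bool) -> bool:
    """Toy Boolean model of a two-input sensory NOR gate (assumed logic).

    Assumption: fusion A (sensing domain A) and fusion B (sensing domain B)
    share the same ncRNA target-recognition motif, and each one represses the
    target only while its own ligand is bound.
    """
    fusion_a_active = ligand_a
    fusion_b_active = ligand_b
    repressed = fusion_a_active or fusion_b_active
    return not repressed  # expressed only when neither ligand is present


for a, b in product([False, True], repeat=2):
    print(f"ligand A={a!s:5} ligand B={b!s:5} -> target expressed: {target_expressed(a, b)}")
```

Only the row with both ligands absent yields expression, which is the NOR truth table; a real circuit would of course show graded rather than all-or-none responses.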

    FUNCTION-DRIVEN APPROACHES TO THE DESIGN OF OPTOGENETIC TOOLS

    Proteins play a wide variety of roles in biology despite being produced from a small set of common subunits; this commonality can be exploited to understand the dynamics by which proteins fold into structures and perform their manifold functions and, subsequently, to design new proteins for use both in research and as nanoscale machines in industry. While this design process has classically involved residue-level redesign of existing protein backbones and, more recently, the de novo design of backbones according to geometrical parameters, the increasing complexity of optogenetic photosystems, biosensors, and other mechanisms that make use of proteins with specific functions has established a need for a design protocol that can reconcile their various structural exigencies with the function-specific elements of as wide an array of proteins as possible, to make the best use of them. Requirement-driven design eschews specific structural templates in favor of general requirements dependent on the intended function of the design, and so can exploit the vastness of protein structural space in finding solutions to increasingly complex design problems. Here, we present three new advances in the requirement-driven design of proteins as diagnostic tools: a more general photosystem for the direct optogenetic control of protein-protein interactions, a series of algorithmic improvements to the leading implementation of requirement-driven design in the Rosetta macromolecular design software suite, and a new version of that algorithm capable of performing requirement-driven backbone design and residue-level backbone optimization simultaneously. These technologies collectively represent a significant improvement in our ability to control the activity of proteins with a wide variety of control schemes and, more generally, to produce functional proteins for arbitrary requirement sets.
    Doctor of Philosophy

    Efficient search and comparison algorithms for 3D protein binding site retrieval and structure alignment from large-scale databases

    Finding similar 3D structures is crucial for discovering potential structural, evolutionary, and functional relationships among proteins. As the number of known protein structures has dramatically increased, traditional methods can no longer provide the life science community with the informatics capability needed to conduct large-scale and complex analyses. A suite of high-throughput and accurate protein structure search and comparison methods is essential. To meet the needs of the community, we developed several bioinformatics methods for protein binding site comparison and global structure alignment. First, we developed an efficient protein binding site search that is based on extracting geometric features both locally and globally. The main idea of this work was to capture spatial relationships among landmarks of binding site surfaces and build a vocabulary of visual words to represent the characteristics of the surfaces. A vector model was then used to speed up the search for surfaces that share similar visual words with the query interface. Second, we developed an approach for accurate protein binding site comparison. Our algorithm provides an accurate binding site alignment by applying a two-level heuristic process that progressively refines alignment results from the coarse surface-point level to the accurate residue-atom level. This design allowed us to explore different combinations of pairs of corresponding residues, thus improving the alignment quality of the binding site surfaces. Finally, we introduced a parallel algorithm for global protein structure alignment. Specifically, to speed up the time-consuming alignment of protein 3D structures, we designed a parallel protein structure alignment framework that exploits the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, the framework is capable of parallelizing traditional structure alignment algorithms. Our findings can be applied in various research areas, such as prediction of protein inte…
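The visual-word search can be pictured as a text-retrieval problem. The sketch below assumes each binding-site surface has already been quantized into a list of integer visual-word ids (feature extraction and vocabulary construction are not shown) and uses TF-IDF weighting with cosine similarity plus an inverted index as the vector model; this weighting scheme and the hypothetical `VisualWordIndex` class are illustrative assumptions, not the method's documented details.

```python
import math
from collections import Counter, defaultdict


class VisualWordIndex:
    """Toy vector-space index over surfaces encoded as visual-word ids (hypothetical)."""

    def __init__(self, surfaces):
        # surfaces: dict mapping a surface id to its list of visual-word ids
        n_surfaces = len(surfaces)
        doc_freq = Counter()
        for words in surfaces.values():
            doc_freq.update(set(words))
        # Inverse document frequency: rarer visual words get larger weights.
        self.idf = {w: math.log(n_surfaces / df) for w, df in doc_freq.items()}
        self.vectors = {sid: self._weigh(words) for sid, words in surfaces.items()}
        # Inverted index: visual word -> ids of surfaces containing it.
        self.postings = defaultdict(set)
        for sid, words in surfaces.items():
            for w in set(words):
                self.postings[w].add(sid)

    def _weigh(self, words):
        """L2-normalized TF-IDF vector as a sparse dict; unseen words get weight 0."""
        tf = Counter(words)
        vec = {w: count * self.idf.get(w, 0.0) for w, count in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {w: v / norm for w, v in vec.items()} if norm > 0 else {}

    def search(self, query_words, top_k=5):
        """Return up to top_k (surface_id, cosine_score) pairs, best first."""
        query = self._weigh(query_words)
        # Only surfaces sharing at least one visual word with the query are scored.
        candidates = set().union(*(self.postings.get(w, set()) for w in query))
        scored = [
            (sum(q * self.vectors[sid].get(w, 0.0) for w, q in query.items()), sid)
            for sid in candidates
        ]
        scored.sort(reverse=True)
        return [(sid, score) for score, sid in scored[:top_k]]


index = VisualWordIndex({"siteA": [1, 2, 2, 3], "siteB": [3, 4, 5], "siteC": [6, 7]})
print(index.search([3, 4]))  # siteB ranks above siteA; siteC shares no words and is never scored
```

The inverted index restricts scoring to surfaces that share at least one visual word with the query, which is one common way a vector model avoids comparing the query against every surface in a large database.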

    Study of complex RNA function modulated by small molecules: the development of RNA directed small molecule library and probing the S-adenosyl methionine discrimination between on and off conformational states of the SAM-I riboswitch

    Until recently, RNA remained largely unexploited as a drug target; it is now drawing interest as a potential one. The methodology and drug libraries available for RNA targeting and screening are still at a rudimentary stage. The interactions that ligands make with RNA can be exploited for RNA-based drug development. The dissertation is composed of four chapters. The first chapter focuses on the structural features of RNA and previous attempts to target it. The second chapter focuses on the development of a small molecule library enriched with substructures derived from RNA-binding ligands. For this study, a fragment-based approach is used to accommodate the conformational flexibility of RNA. The library molecules are screened against suitable RNA targets using NMR. We identified at least 5 ligands, 2 of which are novel ligands binding to the 16S rRNA. The third chapter focuses on the role of small molecules in inducing conformational changes in an RNA genetic regulatory element, the S-adenosyl methionine (SAM) SAM-I riboswitch. We report mechanistic features of the SAM-I riboswitch that help explain the basis for its specificity and discrimination, as well as its gene regulation mechanism. To address the conformational dynamics of the Bacillus subtilis and Thermoanaerobacter tengcongensis SAM-I riboswitches in response to SAM binding, several conformer mimics were designed, synthesized, and characterized using NMR, equilibrium dialysis, and in-line probing. The study shows that, apart from the conserved residues of the binding pocket, residues downstream of the binding pocket are involved in detecting SAM and assist weak-affinity binding of SAM to the riboswitch. Our data highlight the capacity of a so-called antiterminator helix from the expression platform to assist the formation of a partial P1 helix of the aptamer domain. A stable P1 helix is involved in the recognition and tight binding of SAM.
    Our in vitro experiments suggest that, in the presence of SAM, the riboswitch could switch from an unbound conformation to a tightly SAM-bound structure through weakly binding intermediate structures. The future directions are included in the fourth chapter, along with the conclusions.