Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

Author: Cang Zixuan
Mu Lin
Wei Guowei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 27/08/2017

This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensemble of trees, and deep convolutional neural networks, to manifest their descriptive and predictive powers for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform the modern machine learning based methods in protein-ligand binding affinity predictions and ligand-decoy discrimination.
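As a rough illustration of the kind of topological invariant the abstract refers to, the sketch below computes the finite death times of 0-dimensional persistence bars (connected components) for a small point cloud. It is not the authors' multicomponent or electrostatic method; the function name `h0_death_times`, the toy coordinates, and the convention that an edge enters the filtration when the threshold equals the pairwise distance are all assumptions made for this example.

```python
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Finite death times of 0-dimensional persistence bars.

    Every point is born as its own component at threshold 0. Edges are
    added in order of increasing pairwise distance; each edge that joins
    two different components ends ("kills") one bar at that distance.
    One component survives forever and is not reported.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (hypothetical toy filtration).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths


# Three collinear "atoms" with made-up coordinates.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
deaths = h0_death_times(points)
print(deaths)
```

Restricting the input to atoms of selected element types before building the filtration would give a loose analogue of an element-aware description, though the paper's actual construction may differ.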

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

Author: Cang Zixuan
Wei Guo-Wei
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 31/03/2017

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts. Comment: 20 pages, 8 figures, 5 tables

arXiv.org e-Print Archive

Directory of Open Access Journals

Predicting variation of DNA shape preferences in protein-DNA interaction in cancer cells with a new biophysical model

Author: Batmanov Kirill
Wang Junbai
Publication venue: 'MDPI AG'
Publication date: 01/09/2017

DNA shape readout is an important mechanism of target site recognition by transcription factors, in addition to the sequence readout. Several models of transcription factor-DNA binding which consider DNA shape have been developed in recent years. We present a new biophysical model of protein-DNA interaction that considers DNA shape features and is based on the neighbour dinucleotide dependency model BayesPI2. The parameters of the new model are restricted to a subspace spanned by the 2-mer DNA shape features, which allows a biophysical interpretation of the new parameters as position-dependent preferences towards certain values of the features. Using the new model, we explore the variation of DNA shape preferences in several transcription factors across cancer cell lines and cellular conditions. We find evidence of DNA shape variations at FOXA1 binding sites in MCF7 cells after treatment with steroids. The new model is useful for elucidating finer details of transcription factor-DNA interaction. It may be used to improve the prediction of cancer mutation effects in the future.

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Characterization of Aptamer-Protein Complexes by X-ray Crystallography and Alternative Approaches

Author: Vincent J. B. Ruigrok
Mark Levisson
Johan Hekelaar
Hauke Smidt
Bauke W. Dijkstra
John van der Oost
Publication venue
Publication date: 01/01/2012

Aptamers are oligonucleotide ligands, either RNA or ssDNA, selected for high-affinity binding to molecular targets, such as small organic molecules, proteins or whole microorganisms. While reports of new aptamers are numerous, characterization of their specific interaction is often restricted to the affinity of binding (KD). Over the years, only a few crystal structures of aptamer-protein complexes have become available. Here we describe some relevant technical issues in the process of crystallizing aptamer-protein complexes and highlight some biochemical details on the molecular basis of selected aptamer-protein interactions. In addition, alternative experimental and computational approaches to studying aptamer-protein interactions are discussed.

Multidisciplinary Digital Publishing Institute

University of Groningen

Directory of Open Access Journals

Wageningen University & Research Publications

CiteSeerX

Crossref

Proceedings - University of Groningen

ARTS repository - University of Groningen

PubMed Central

University of Groningen Digital Archive

Dissertations of the University of Groningen

Understanding diversity of human innate immunity receptors: analysis of surface features of leucine-rich repeat domains in NLRs and TLRs.

Author: Godzik Adam
Istomin Andrei Y
Publication venue: eScholarship, University of California
Publication date: 01/09/2009

Background: The human innate immune system uses a system of extracellular Toll-like receptors (TLRs) and intracellular Nod-like receptors (NLRs) to match the appropriate level of immune response to the level of threat from the current environment. Almost all NLRs and TLRs have a domain consisting of multiple leucine-rich repeats (LRRs), which is believed to be involved in ligand binding. LRRs, found also in thousands of other proteins, form a well-defined "horseshoe"-shaped structural scaffold that can be used for a variety of functions, from binding specific ligands to performing a general structural role. The specific functional roles of LRR domains in NLRs and TLRs are thus defined by their detailed surface features. While experimental crystal structures of four human TLRs have been solved, no structure data are available for NLRs. Results: We report a quantitative, comparative analysis of the surface features of LRR domains in human NLRs and TLRs, using predicted three-dimensional structures for NLRs. Specifically, we calculated amino acid hydrophobicity, charge, and glycosylation distributions within LRR domain surfaces and assessed their similarity by clustering. Despite differences in structural and genomic organization, comparison of LRR surface features in NLRs and TLRs allowed us to hypothesize about their possible functional similarities. We find agreement between predicted surface similarities and similar functional roles in NLRs and TLRs with known agonists, and suggest possible binding partners for uncharacterized NLRs. Conclusion: Despite its low resolution, our approach permits comparison of molecular surface features in the absence of crystal structure data. Our results illustrate diversity of surface features of innate immunity receptors and provide hints for function of NLRs whose specific role in innate immunity is yet unknown.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Operator Sequence Alters Gene Expression Independently of Transcription Factor Occupancy in Bacteria

Author: Boedicker James Q.
Garcia Hernan G.
Gelles Jeff
Kondev Jane
Osborne Melissa
Phillips Rob
Sanchez Alvaro
Publication venue: 'Elsevier BV'
Publication date: 01/05/2012

A canonical quantitative view of transcriptional regulation holds that the only role of operator sequence is to set the probability of transcription factor binding, with operator occupancy determining the level of gene expression. In this work, we test this idea by characterizing repression in vivo and the binding of RNA polymerase in vitro in experiments where operators of various sequences were placed either upstream or downstream from the promoter in Escherichia coli. Surprisingly, we find that operators with a weaker binding affinity can yield higher repression levels than stronger operators. Repressor bound to upstream operators modulates promoter escape, and the magnitude of this modulation is not correlated with the repressor-operator binding affinity. This suggests that operator sequences may modulate transcription by altering the nature of the interaction of the bound transcription factor with the transcriptional machinery, implying a new layer of sequence dependence that must be confronted in the quantitative understanding of gene expression.
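The canonical occupancy picture that this abstract sets out to test can be written down in a few lines. The sketch below assumes simple single-site equilibrium binding parameterized by a dissociation constant; the function names, the units, and the definition of repression as unrepressed over repressed expression are illustrative assumptions, not the paper's model.

```python
def occupancy(repressor_conc, kd):
    """Equilibrium probability that the operator is bound by repressor.

    Assumes single-site binding: p = c / (c + Kd), with both arguments
    in the same (arbitrary) concentration units.
    """
    return repressor_conc / (repressor_conc + kd)


def repression(repressor_conc, kd):
    """Repression = unrepressed / repressed expression.

    Assumes expression is proportional to the fraction of time the
    operator is free, so repression = 1 / (1 - occupancy).
    """
    return 1.0 / (1.0 - occupancy(repressor_conc, kd))


# Hypothetical numbers: same repressor level, strong vs. weak operator.
strong = repression(3.0, 1.0)   # tighter binding (smaller Kd)
weak = repression(3.0, 10.0)    # weaker binding (larger Kd)
print(strong, weak)
```

In this picture a weaker operator always gives lower repression at a fixed repressor level, which is exactly the monotonic relationship the abstract reports being violated for upstream operators.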

DSpace@MIT

Elsevier - Publisher Connector

Directory of Open Access Journals

PubMed Central

Caltech Authors

Control of DNA minor groove width and Fis protein binding by the purine 2-amino group.

Author: Cascio Duilio
Di Felice Rosa
Ghane Tahereh
Hancock Stephen P
Johnson Reid C
Rohs Remo
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

The width of the DNA minor groove varies with sequence and can be a major determinant of DNA shape recognition by proteins. For example, the minor groove within the center of the Fis-DNA complex narrows to about half the mean minor groove width of canonical B-form DNA to fit onto the protein surface. G/C base pairs within this segment, which is not contacted by the Fis protein, reduce binding affinities up to 2000-fold relative to A/T-rich sequences. We show here through multiple X-ray structures and binding properties of Fis-DNA complexes containing base analogs that the 2-amino group on guanine is the primary molecular determinant controlling minor groove widths. Molecular dynamics simulations of free-DNA targets with canonical and modified bases further demonstrate that sequence-dependent narrowing of minor groove widths is modulated almost entirely by the presence of purine 2-amino groups. We also provide evidence that protein-mediated phosphate neutralization facilitates minor groove compression and is particularly important for binding to non-optimally shaped DNA duplexes.

CiteSeerX

PubMed Central

eScholarship - University of California