4,357 research outputs found
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.Comment: 20 pages, 8 figures, 5 table
Predicting Proteome-Early Drug Induced Cardiac Toxicity Relationships (Pro-EDICToRs) with Node Overlapping Parameters (NOPs) of a new class of Blood Mass-Spectra graphs
The 11th International Electronic Conference on Synthetic Organic Chemistry session Computational ChemistryBlood Serum Proteome-Mass Spectra (SP-MS) may allow detecting Proteome-Early Drug Induced Cardiac Toxicity Relationships (called here Pro-EDICToRs). However, due to the thousands of proteins in the SP identifying general Pro-EDICToRs patterns instead of a single protein marker may represents a more realistic alternative. In this sense, first we introduced a novel Cartesian 2D spectrum graph for SP-MS. Next, we introduced the graph node-overlapping parameters (nopk) to numerically characterize SP-MS using them as inputs to seek a Quantitative Proteome-Toxicity Relationship (QPTR) classifier for Pro-EDICToRs with accuracy higher than 80%. Principal Component Analysis (PCA) on the nopk values present in the QPTR model explains with one factor (F1) the 82.7% of variance. Next, these nopk values were used to construct by the first time a Pro-EDICToRs Complex Network having nodes (samples) linked by edges (similarity between two samples). We compared the topology of two sub-networks (cardiac toxicity and control samples); finding extreme relative differences for the re-linking (P) and Zagreb (M2) indices (9.5 and 54.2 % respectively) out of 11 parameters. We also compared subnetworks with well known ideal random networks including Barabasi-Albert, Kleinberg Small World, Erdos-Renyi, and Epsstein Power Law models. Finally, we proposed Partial Order (PO) schemes of the 115 samples based on LDA-probabilities, F1-scores and/or network node degrees. PCA-CN and LDA-PCA based POs with Tanimotoâs coefficients equal or higher than 0.75 are promising for the study of Pro-EDICToRs. These results shows that simple QPTRs models based on MS graph numerical parameters are an interesting tool for proteome researchThe authors thank projects funded by the Xunta de Galicia (PXIB20304PR and BTF20302PR) and the Ministerio de Sanidad y Consumo (PI061457). GonzĂĄlez-DĂaz H. acknowledges tenure track research position funded by the Program Isidro Parga Pondal, Xunta de Galici
Spherical harmonics coeffcients for ligand-based virtual screening of cyclooxygenase inhibitors
Background: Molecular descriptors are essential for many applications in computational chemistry, such as ligand-based similarity searching. Spherical harmonics have previously been suggested as comprehensive descriptors of molecular structure and properties. We investigate a spherical harmonics descriptor for shape-based virtual screening. Methodology/Principal Findings: We introduce and validate a partially rotation-invariant three-dimensional molecular shape descriptor based on the norm of spherical harmonics expansion coefficients. Using this molecular representation, we parameterize molecular surfaces, i.e., isosurfaces of spatial molecular property distributions. We validate the shape descriptor in a comprehensive retrospective virtual screening experiment. In a prospective study, we virtually screen a large compound library for cyclooxygenase inhibitors, using a self-organizing map as a pre-filter and the shape descriptor for candidate prioritization. Conclusions/Significance: 12 compounds were tested in vitro for direct enzyme inhibition and in a whole blood assay. Active compounds containing a triazole scaffold were identified as direct cyclooxygenase-1 inhibitors. This outcome corroborates the usefulness of spherical harmonics for representation of molecular shape in virtual screening of large compound collections. The combination of pharmacophore and shape-based filtering of screening candidates proved to be a straightforward approach to finding novel bioactive chemotypes with minimal experimental effort
Evolutionary Computation and QSAR Research
[Abstract] The successful high throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. The virtual molecular filtering and screening relies greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with a predicted toxic effect and poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: molecular structure codification into molecular descriptors, selection of relevant variables in the context of the analyzed activity, and search of the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, this review explains the basic of the genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods to build QSAR models, the current evolutionary feature selection methods and applications in QSAR and the future trend on the joint or multi-task feature selection methods.Instituto de Salud Carlos III, PIO52048Instituto de Salud Carlos III, RD07/0067/0005Ministerio de Industria, Comercio y Turismo; TSI-020110-2009-53)Galicia. ConsellerĂa de EconomĂa e Industria; 10SIN105004P
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and near 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination
TI2BioP â Topological Indices to BioPolymers. A Graphicalâ Numerical Approach for Bioinformatics
We developed a new graphicalânumerical method called TI2BioP (Topological Indices to BioPolymers) to estimate topological indices (TIs) from two-dimensional (2D) graphical approaches for the natural biopolymers DNA, RNA and proteins The methodology mainly turns long biopolymeric sequences into 2D artificial graphs such as Cartesian and four-color maps but also reads other 2D graphs from the thermodynamic folding of DNA/RNA strings inferred from other programs. The topology of such 2D graphs is either encoded by node or adjacency matrixes for the calculation of the spectral moments as TIs. These numerical indices were used to build up alignment-free models to the functional classification of biosequences and to calculate alignment-free distances for phylogenetic purposes. The performance of the method was evaluated in highly diverse gene/protein classes, which represents a challenge for current bioinformatics algorithms. TI2BioP generally outperformed classical bioinformatics algorithms in the functional classification of Bacteriocins, ribonucleases III (RNases III), genomic internal transcribed spacer II (ITS2) and adenylation domains (A-domains) of nonribosomal peptide synthetases (NRPS) allowing the detection of new members in these target gene/protein classes. TI2BioP classification performance was contrasted and supported by predictions with sensitive alignment-based algorithms and experimental outcomes, respectively. The new ITS2 sequence isolated from Petrakia sp. was used in our graphicalânumerical approach to estimate alignment-free distances for phylogenetic inferences. Despite TI2BioP having been developed for application in bioinformatics, it can be extended to predict interesting features of other biopolymers than DNA and protein sequences. TI2BioP version 2.0 is freely available from http://ti2biop.sourceforge.net/
- âŠ