Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with the Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensembles
of trees, and deep convolutional neural networks, to demonstrate their descriptive
and predictive power for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test the scoring power and the virtual screening power, respectively, of the
proposed topological approaches. The results show that the present approaches
outperform modern machine-learning-based methods in protein-ligand binding
affinity prediction and ligand-decoy discrimination.
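Persistent homology, the common starting point of the approaches above, tracks topological features of a point cloud (for example, atom coordinates) as a distance threshold grows. A minimal sketch, assuming a Vietoris-Rips filtration and only dimension 0 (connected components), follows. A union-find structure records a bar dying at distance r whenever two components merge at r. The function name `h0_barcode` and the toy atom list are hypothetical. Restricting the atoms to one element type only gestures at the idea of a component-specific description; this is not the authors' multicomponent, multi-level, or electrostatic construction.

```python
import itertools
import math


def h0_barcode(points):
    """Death times of the finite 0-dimensional bars of a Vietoris-Rips filtration.

    Every point is born at filtration value 0. When two components merge at
    distance r, one of them dies at r. The single bar that never dies is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Rips edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one bar dies at r
            parent[rj] = ri
            deaths.append(r)
    return deaths


# Hypothetical toy "molecule" as (element, x, y, z); not real data.
atoms = [("C", 0.0, 0.0, 0.0), ("C", 1.5, 0.0, 0.0),
         ("N", 0.0, 3.0, 0.0), ("O", 5.0, 0.0, 0.0)]
all_xyz = [xyz for _, *xyz in atoms]
carbon_xyz = [xyz for el, *xyz in atoms if el == "C"]
print(h0_barcode(all_xyz))
print(h0_barcode(carbon_xyz))
```

Computing barcodes on element-restricted subsets of atoms, rather than on all atoms at once, keeps chemically different atoms from being merged into one undifferentiated point cloud. That is the intuition behind retaining chemical information during topological simplification.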
Structural patterns in complex networks through spectral analysis
This work studies several structural properties of networks from a graph-spectral perspective. First, the subgraph centrality of nodes is defined and used to classify essential proteins in a proteomic map. This index then underpins a method for identifying superhomogeneous networks. The same method classifies non-homogeneous networks into three universal structural classes. We give examples of these classes from a range of real-world networks. Finally, a communicability function is studied and shown to provide an alternative way of defining communities in complex networks. With this approach, a community is defined unambiguously, and an algorithm for identifying communities is proposed and illustrated on a real-world network.
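Both spectral quantities named above have standard closed forms. The subgraph centrality of node i counts closed walks starting and ending at i, weighting walks of length k by 1/k!, so SC(i) = (e^A)_ii for adjacency matrix A. The communicability between nodes p and q is the off-diagonal entry G_pq = (e^A)_pq. The following pure-Python sketch computes both on toy graphs. The helper names (`expm`, `subgraph_centrality`, `communicability`) are our own. The truncated-series `expm` stands in for a library routine such as `scipy.linalg.expm`. The sketch does not reproduce the paper's network-classification or community-detection algorithm.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a, terms=30, squarings=6):
    """Approximate exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    m = [[x / 2.0 ** squarings for x in row] for row in a]  # M = A / 2^s
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes M^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # exp(M)^(2^s) = exp(A)
    return result


def subgraph_centrality(adj):
    """SC(i) = sum_k (A^k)_ii / k! = (e^A)_ii."""
    e = expm(adj)
    return [e[i][i] for i in range(len(adj))]


def communicability(adj):
    """G_pq = (e^A)_pq: walks between p and q, length-k walks weighted by 1/k!."""
    return expm(adj)


# Path graph 0 - 1 - 2: the middle node lies on more closed walks than the ends.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(subgraph_centrality(path))
print(communicability(path)[0][2])
```

For this path graph the eigenvalues of A are √2, 0, and −√2. The centre node therefore has SC = cosh(√2) and each end has SC = (1 + cosh(√2))/2, which is why the centre ranks higher.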
Analysis of heat kernel highlights the strongly modular and heat-preserving structure of proteins
In this paper, we study the structure and dynamical properties of protein
contact networks in comparison with other biological networks, together with
simulated archetypal models acting as probes. We consider both classical
topological descriptors, such as the modularity and statistics of the shortest
paths, and different interpretations in terms of diffusion provided by the
discrete heat kernel, which is derived from the normalized graph Laplacian.
A principal component analysis shows high discrimination among the network
types, whether the topological or the heat-kernel-based vector
characterizations are used. Furthermore, a canonical correlation analysis
demonstrates strong agreement between the two characterizations, thus providing
an important justification for the interpretability of the heat kernel.
Finally, and most importantly, the focused analysis of the heat kernel yields
insight into why proteins must satisfy specific structural design constraints
that the other networks considered need not obey. Notably, the heat trace decay
of an ensemble of proteins of varying size indicates subdiffusion, a peculiar
property of proteins.
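The heat kernel referred to above is H_t = exp(−tL), where L is a normalized graph Laplacian. We assume the symmetric form L = I − D^(−1/2) A D^(−1/2), with A the adjacency matrix and D the diagonal degree matrix. The heat trace is Tr(H_t) = Σ_i exp(−t λ_i) over the eigenvalues λ_i of L. A minimal pure-Python sketch follows, assuming a simple undirected graph with no isolated nodes. The helper names are hypothetical, and the code does not reproduce the paper's PCA/CCA pipeline or its analysis of the decay exponent.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a, terms=30, squarings=6):
    """Approximate exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    m = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, m)]  # M^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; assumes no self-loops and every degree > 0."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(1.0 if i == j else 0.0) - adj[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]


def heat_kernel(adj, t):
    """H_t = exp(-t L)."""
    lap = normalized_laplacian(adj)
    return expm([[-t * x for x in row] for row in lap])


def heat_trace(adj, t):
    """Tr(H_t) = sum_i exp(-t * lambda_i)."""
    h = heat_kernel(adj, t)
    return sum(h[i][i] for i in range(len(h)))


# Toy 6-node ring: the trace starts at n = 6 and decays toward the number of
# connected components (here 1) as t grows.
ring = [[1 if abs(i - j) in (1, 5) else 0 for j in range(6)] for i in range(6)]
for t in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(t, heat_trace(ring, t))
```

How fast Tr(H_t) decays with t reflects how quickly heat spreads over the graph. Comparing this decay across networks of different sizes is the kind of measurement behind the subdiffusion observation above.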
Characterizing Scales of Genetic Recombination and Antibiotic Resistance in Pathogenic Bacteria Using Topological Data Analysis
Pathogenic bacteria present a large disease burden on human health. Control
of these pathogens is hampered by rampant lateral gene transfer, whereby
pathogenic strains may acquire genes conferring resistance to common
antibiotics. Here we introduce tools from topological data analysis to
characterize the frequency and scale of lateral gene transfer in bacteria,
focusing on a set of pathogens of significant public health relevance. As a
case study, we examine the spread of antibiotic resistance in Staphylococcus
aureus. Finally, we consider the possible role of the human microbiome as a
reservoir for antibiotic resistance genes.
Comment: 12 pages, 6 figures. To appear in AMT 2014 Special Session on
Advanced Methods of Interactive Data Mining for Personalized Medicine