Quantitative toxicity prediction using topology based multi-task deep neural networks
The understanding of toxicity is of paramount importance to human health and
environmental protection. Quantitative toxicity analysis has become a new
standard in the field. This work introduces element specific persistent
homology (ESPH), an algebraic topology approach, for quantitative toxicity
prediction. ESPH retains crucial chemical information during the topological
abstraction of geometric complexity and provides a representation of small
molecules that cannot be obtained by any other method. To investigate the
representability and predictive power of ESPH for small molecules, ancillary
descriptors have also been developed based on physical models. Topological and
physical descriptors are paired with advanced machine learning algorithms, such
as deep neural network (DNN), random forest (RF) and gradient boosting decision
tree (GBDT), to facilitate their applications to quantitative toxicity
predictions. A topology based multi-task strategy is proposed to take
advantage of the availability of large data sets while dealing with small data
sets. Four benchmark toxicity data sets that involve quantitative measurements
are used to validate the proposed approaches. Extensive numerical studies
indicate that the proposed topological learning methods are able to outperform
the state-of-the-art methods in the literature for quantitative toxicity
analysis. Our online server for computing element-specific topological
descriptors (ESTDs) is available at http://weilab.math.msu.edu/TopTox/
Comment: arXiv admin note: substantial text overlap with arXiv:1703.1095
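As a rough illustration of the element-specific idea described in this abstract, the sketch below restricts a toy molecule to a chosen set of element types and computes only the 0-dimensional persistence of the resulting point cloud. It is not the authors' ESPH pipeline: the function names, the toy coordinates, and the restriction to dimension 0 (where death times equal minimum-spanning-tree edge lengths of a Vietoris-Rips filtration) are assumptions made for this example.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration on `points`.

    Every component is born at scale 0; a component dies each time an edge
    merges two previously separate components (Kruskal-style union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def element_specific_h0(atoms, elements):
    """Keep only atoms whose symbol is in `elements`, then compute H0 deaths."""
    selected = [xyz for symbol, xyz in atoms if symbol in elements]
    return h0_death_times(selected)


# Hypothetical toy molecule: (element symbol, (x, y, z)) pairs.
toy_atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.5, 0.0, 0.0)),
    ("O", (0.0, 1.2, 0.0)),
    ("H", (0.0, 0.0, -1.0)),
]

carbon_only = element_specific_h0(toy_atoms, {"C"})
carbon_oxygen = element_specific_h0(toy_atoms, {"C", "O"})
```

Different element selections give different barcodes for the same geometry, which is the sense in which chemical information is kept alongside the topology.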
Persistent Homology in Sparse Regression and its Application to Brain Morphometry
Sparse systems are usually parameterized by a tuning parameter that
determines the sparsity of the system. How to choose the right tuning parameter
is a fundamental and difficult problem in learning the sparse system. In this
paper, by treating the tuning parameter as an additional dimension,
persistent homological structures over the parameter space are introduced and
explored. The structures are then further exploited in speeding up the
computation using the proposed soft-thresholding technique. The topological
structures are further used as multivariate features in the tensor-based
morphometry (TBM) in characterizing white matter alterations in children who
have experienced severe early life stress and maltreatment. These analyses
reveal that stress-exposed children exhibit more diffuse anatomical
organization across the whole white matter region.
Comment: submitted to IEEE Transactions on Medical Imaging
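The idea of treating the tuning parameter as an extra dimension can be sketched on a toy example. The code below applies the standard soft-thresholding operator to the off-diagonal entries of a small, made-up correlation matrix and counts connected components (the 0th Betti number) of the surviving graph as the parameter grows. The matrix, the function names, and the use of a correlation graph are assumptions for illustration, not the paper's actual model or data.

```python
def soft_threshold(x, lam):
    """Standard soft-thresholding: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def betti0(weights, lam):
    """Number of connected components after soft-thresholding edge weights."""
    n = len(weights)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            # An edge survives only if its shrunken weight is nonzero.
            if soft_threshold(weights[i][j], lam) != 0.0:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1
    return components


# Hypothetical 4-node correlation matrix (symmetric, unit diagonal).
toy_corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.5, 0.1],
    [0.2, 0.5, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]

lambdas = [0.0, 0.3, 0.6, 0.8, 0.95]
betti0_curve = [betti0(toy_corr, lam) for lam in lambdas]
```

Because a larger parameter can only remove edges, the graphs are nested and the component count never decreases along the sweep. That monotone structure is what makes a persistence-style summary over the parameter space possible.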
Path homologies of deep feedforward networks
We provide a characterization of two types of directed homology for
fully-connected, feedforward neural network architectures. These exact
characterizations of the directed homology structure of a neural network
architecture are the first of their kind. We show that the directed flag
homology of deep networks reduces to computing the simplicial homology of the
underlying undirected graph, which is explicitly given by Euler characteristic
computations. We also show that the path homology of these networks is
non-trivial in higher dimensions and depends on the number and size of the
layers within the network. These results provide a foundation for investigating
homological differences between neural network architectures and their realized
structure as implied by their parameters.
Comment: To appear in the proceedings of IEEE ICMLA 201
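The Euler characteristic computation mentioned for the underlying undirected graph can be sketched as follows. Treating a fully-connected feedforward architecture as a graph (a 1-dimensional complex), the Euler characteristic is V - E = b0 - b1, so the first Betti number follows once the graph is known to be connected. The function name and the dictionary layout are assumptions for this example, and the sketch does not attempt the path homology computation discussed in the abstract.

```python
def feedforward_graph_betti(layer_sizes):
    """Betti numbers of the undirected graph of a fully-connected MLP.

    Assumes at least two layers, each with at least one node, and edges
    only between consecutive layers, so the graph is connected (b0 = 1).
    """
    if len(layer_sizes) < 2 or any(size < 1 for size in layer_sizes):
        raise ValueError("need at least two non-empty layers")

    vertices = sum(layer_sizes)
    edges = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    euler = vertices - edges          # chi = V - E for a graph
    betti0 = 1                        # connected under the assumptions above
    betti1 = betti0 - euler           # from chi = b0 - b1
    return {
        "vertices": vertices,
        "edges": edges,
        "euler_characteristic": euler,
        "betti0": betti0,
        "betti1": betti1,
    }


# A 3-4-2 architecture: 9 nodes, 3*4 + 4*2 = 20 edges.
example = feedforward_graph_betti([3, 4, 2])
```

For the 3-4-2 example this gives 12 independent cycles, and the count grows with both the number and the width of the layers.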
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
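To make the role of the Wasserstein distance concrete, here is a minimal brute-force sketch for comparing two small persistence diagrams. The abstract does not state the ground metric or the exponent, so the L-infinity ground metric and the default p = 1 are assumptions, as are the function names and the toy diagrams. The search over all permutations is only practical for a handful of points; real pipelines would use an assignment solver.

```python
from itertools import permutations


def diagonal_cost(point):
    """L-infinity distance from a (birth, death) point to the diagonal."""
    birth, death = point
    return (death - birth) / 2.0


def wasserstein_distance(diag_a, diag_b, p=1):
    """p-Wasserstein distance between two small persistence diagrams.

    Each diagram is padded with diagonal slots so that any point may be
    matched either to a point of the other diagram or to the diagonal.
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m  # n real + m diagonal slots on side A, and vice versa

    def cost(i, j):
        a_is_real, b_is_real = i < n, j < m
        if a_is_real and b_is_real:
            (b1, d1), (b2, d2) = diag_a[i], diag_b[j]
            return max(abs(b1 - b2), abs(d1 - d2))
        if a_is_real:
            return diagonal_cost(diag_a[i])
        if b_is_real:
            return diagonal_cost(diag_b[j])
        return 0.0  # diagonal matched to diagonal

    best = min(
        sum(cost(i, perm[i]) ** p for i in range(size))
        for perm in permutations(range(size))
    )
    return best ** (1.0 / p)


# Hypothetical diagrams given as lists of (birth, death) pairs.
diagram_a = [(0.0, 2.0), (1.0, 4.0)]
diagram_b = [(0.0, 2.5)]
distance_ab = wasserstein_distance(diagram_a, diagram_b)
```

In this toy case the cheapest matching pairs (0.0, 2.0) with (0.0, 2.5) and sends (1.0, 4.0) to the diagonal.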
Topological Data Analysis of Database Representations for Information Retrieval
Appropriately representing elements in a database so that queries may be
accurately matched is a central task in information retrieval. This recently
has been achieved by embedding the graphical structure of the database into a
manifold so that the hierarchy is preserved. Persistent homology provides a
rigorous characterization for the database topology in terms of both its
hierarchy and connectivity structure. We compute persistent homology on a
variety of datasets and show that some commonly used embeddings fail to
preserve the connectivity. Moreover, we show that embeddings which successfully
retain the database topology coincide in persistent homology. We introduce the
dilation-invariant bottleneck distance to capture this effect, which addresses
metric distortion on manifolds. We use it to show that distances between
topology-preserving embeddings of databases are small.
Comment: 15 pages, 7 figures