Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform modern machine-learning-based methods in protein-ligand binding
affinity prediction and ligand-decoy discrimination.
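As a rough illustration of the persistent homology idea this abstract builds on, the sketch below computes only the simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration over a small point cloud. It is not the multicomponent, multi-level, or electrostatic variant the authors propose; the function name `h0_persistence` and the sample coordinates in `atoms` are hypothetical and chosen only for the example.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death times of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0; a component dies at the length of
    the edge that first merges it into another component.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one of them dies
            deaths.append(length)
    return deaths  # n - 1 finite deaths; the last component never dies


# Hypothetical 2D coordinates standing in for atom positions.
atoms = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0)]
bars = h0_persistence(atoms)
print(bars)
```

Each returned value is the right endpoint of a bar [0, death) in the dimension-0 barcode. Higher-dimensional features (loops, cavities) need a full simplicial-complex computation, which this sketch does not attempt.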
Computational and Theoretical Issues of Multiparameter Persistent Homology for Data Analysis
The basic goal of topological data analysis is to apply topology-based descriptors
to understand and describe the shape of data. In this context, homology is one of
the most relevant topological descriptors, well-appreciated for its discrete nature,
computability and dimension independence. A further development is provided
by persistent homology, which allows tracking homological features along a one-parameter
increasing sequence of spaces. Multiparameter persistent homology, also
called multipersistent homology, is an extension of the theory of persistent homology
motivated by the need of analyzing data naturally described by several parameters,
such as vector-valued functions. Multipersistent homology presents several issues in
terms of feasibility of computations over real-sized data and theoretical challenges
in the evaluation of possible descriptors. The focus of this thesis is on the interplay
between persistent homology theory and discrete Morse theory. Discrete Morse
theory provides methods for reducing the computational cost of homology and persistent
homology by considering the discrete Morse complex generated by the discrete
Morse gradient in place of the original complex. The work of this thesis addresses
the problem of computing multipersistent homology, to make such a tool usable in real
application domains. This requires both computational optimizations towards the
applications to real-world data, and theoretical insights for finding and interpreting
suitable descriptors. Our computational contribution consists in proposing a new
Morse-inspired and fully discrete preprocessing algorithm. We show the feasibility
of our preprocessing over real datasets, and evaluate the impact of the proposed
algorithm as a preprocessing for computing multipersistent homology. A theoretical
contribution of this thesis consists in proposing a new notion of optimality for such
a preprocessing in the multiparameter context. We show that the proposed notion
generalizes an already known optimality notion from the one-parameter case. Under
this definition, we show that the algorithm we propose as a preprocessing is optimal
in low dimensional domains. In the last part of the thesis, we consider preliminary
applications of the proposed algorithm in the context of topology-based multivariate
visualization by tracking critical features generated by a discrete gradient field compatible
with the multiple scalar fields under study. We discuss (dis)similarities of such
critical features with the state-of-the-art techniques in topology-based multivariate
data visualization.
Connections Between Adaptive Control and Optimization in Machine Learning
This paper demonstrates many immediate connections between adaptive control
and optimization methods commonly employed in machine learning. Starting from
common output error formulations, similarities in update law modifications are
examined. Concepts in stability, performance, and learning, common to both
fields are then discussed. Building on the similarities in update laws and
common concepts, new intersections and opportunities for improved algorithm
analysis are provided. In particular, a specific problem related to higher
order learning is solved through insights obtained from these intersections. Comment: 18 pages
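To make the phrase "update law" concrete, the sketch below applies a textbook normalized-gradient update to the output error of a linear-in-parameters model, a form that appears both in adaptive control and as a normalized variant of stochastic gradient descent. It is an illustration only and is not taken from the paper: the function name `normalized_gradient_step`, the gain `gamma`, and the regressor sequence are all assumptions made for the example.

```python
def normalized_gradient_step(theta, phi, y, gamma=0.5):
    """One normalized-gradient update on the squared output error.

    theta : current parameter estimate (list of floats)
    phi   : regressor / feature vector for this sample
    y     : measured output for this sample
    gamma : adaptation gain (learning rate), assumed in (0, 2)
    """
    # Output error: predicted output minus measured output.
    error = sum(t * p for t, p in zip(theta, phi)) - y
    # The normalization keeps the effective step bounded for large regressors.
    norm = 1.0 + sum(p * p for p in phi)
    new_theta = [t - gamma * error * p / norm for t, p in zip(theta, phi)]
    return new_theta, error


# Hypothetical "true" parameters that generate the measurements.
true_theta = [2.0, -1.0]
# Regressors that span both parameter directions (persistently exciting).
regressors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

theta = [0.0, 0.0]
for step in range(200):
    phi = regressors[step % len(regressors)]
    y = sum(t * p for t, p in zip(true_theta, phi))
    theta, error = normalized_gradient_step(theta, phi, y)

print(theta)
```

Without the `norm` term this is plain gradient descent on the squared output error; the normalization is one common example of an update-law modification.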
The Challenge of Machine Learning in Space Weather Nowcasting and Forecasting
The numerous recent breakthroughs in machine learning (ML) make it imperative to
carefully ponder how the scientific community can benefit from a technology
that, although not necessarily new, is today living its golden age. This Grand
Challenge review paper is focused on the present and future role of machine
learning in space weather. The purpose is twofold. On the one hand, we will discuss
previous works that use ML for space weather forecasting, focusing in
particular on the few areas that have seen the most activity: the forecasting of
geomagnetic indices, of relativistic electrons at geosynchronous orbits, of
solar flare occurrence, of coronal mass ejection propagation time, and of
solar wind speed. On the other hand, this paper serves as a gentle introduction
to the field of machine learning tailored to the space weather community and as
a pointer to a number of open challenges that we believe the community should
undertake in the next decade. The recurring themes throughout the review are
the need to shift our forecasting paradigm to a probabilistic approach focused
on the reliable assessment of uncertainties, and the combination of
physics-based and machine learning approaches, known as gray-box. Comment: under review
Discrete Differential Geometry
This is the collection of extended abstracts for the 26 lectures and the open problems session at the second Oberwolfach workshop on Discrete Differential Geometry.