858 research outputs found
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.Comment: 20 pages, 8 figures, 5 table
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data increased exponentially in the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it allows to bridge the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows to represent
complex and possibly high dimensional through few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project
A semidiscrete version of the Citti-Petitot-Sarti model as a plausible model for anthropomorphic image reconstruction and pattern recognition
In his beautiful book [66], Jean Petitot proposes a sub-Riemannian model for
the primary visual cortex of mammals. This model is neurophysiologically
justified. Further developments of this theory lead to efficient algorithms for
image reconstruction, based upon the consideration of an associated
hypoelliptic diffusion. The sub-Riemannian model of Petitot and Citti-Sarti (or
certain of its improvements) is a left-invariant structure over the group
of rototranslations of the plane. Here, we propose a semi-discrete
version of this theory, leading to a left-invariant structure over the group
, restricting to a finite number of rotations. This apparently very
simple group is in fact quite atypical: it is maximally almost periodic, which
leads to much simpler harmonic analysis compared to Based upon this
semi-discrete model, we improve on previous image-reconstruction algorithms and
we develop a pattern-recognition theory that leads also to very efficient
algorithms in practice.Comment: 123 pages, revised versio
Synthesis and Optimization of Reversible Circuits - A Survey
Reversible logic circuits have been historically motivated by theoretical
research in low-power electronics as well as practical improvement of
bit-manipulation transforms in cryptography and computer graphics. Recently,
reversible circuits have attracted interest as components of quantum
algorithms, as well as in photonic and nano-computing technologies where some
switching devices offer no signal gain. Research in generating reversible logic
distinguishes between circuit synthesis, post-synthesis optimization, and
technology mapping. In this survey, we review algorithmic paradigms ---
search-based, cycle-based, transformation-based, and BDD-based --- as well as
specific algorithms for reversible synthesis, both exact and heuristic. We
conclude the survey by outlining key open challenges in synthesis of reversible
and quantum logic, as well as most common misconceptions.Comment: 34 pages, 15 figures, 2 table
Solving Algorithmic Problems in Finitely Presented Groups via Machine Learning
Machine learning and pattern recognition techniques have been successfully applied to algorithmic problems in free groups. In this dissertation, we seek to extend these techniques to finitely presented non-free groups, in particular to polycyclic and metabelian groups that are of interest to non-commutative cryptography.
As a prototypical example, we utilize supervised learning methods to construct classifiers that can solve the conjugacy decision problem, i.e., determine whether or not a pair of elements from a specified group are conjugate. The accuracies of classifiers created using decision trees, random forests, and N-tuple neural network models are evaluated for several non-free groups. The very high accuracy of these classifiers suggests an underlying mathematical relationship with respect to conjugacy in the tested groups.
In addition to testing these techniques on several well-known finitely presented groups, we introduce a new family of metabelian groups for which we analyze the computational complexity of the conjugacy search problem. We prove that for the family in general the time complexity of the conjugacy search problem is exponential, while for a subfamily the problem is polynomial. We also show that for some of these groups the conjugacy search problem is an instance of the discrete logarithm problem.
We also apply machine learning techniques to solving the conjugacy search problem. For each platform group we train a N-tuple regression network that can produce a candidate conjugator for a pair of conjugate elements. This candidate is then used as the initial state of a local search for a conjugator in the Cayley graph, in what we call regression-based conjugacy search (RBCS). RBCS can be applied to groups such as polycyclic groups for which other heuristic approaches, such as the length-based attack, are ineffective
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and near 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination
- …