Inferring spatial and signaling relationships between cells from single cell transcriptomic data.
Single-cell RNA sequencing (scRNA-seq) provides details for individual cells; however, crucial spatial information is often lost. We present SpaOTsc, a method relying on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. A spatial metric for individual cells in scRNA-seq data is first established based on a map connecting it with the spatial measurements. The cell-cell communications are then obtained by "optimally transporting" signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene-gene information flow to estimate the spatial regulations between genes across cells. Four datasets are employed for cross-validation of spatial gene expression prediction and comparison to known cell-cell communications. SpaOTsc has broader applications, both in integrating non-spatial single-cell measurements with spatial data, and directly in spatial single-cell transcriptomics data to reconstruct spatial cellular dynamics in tissues.
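As a rough illustration of the "optimally transporting" idea in the abstract above, the following minimal sketch computes an entropy-regularized optimal transport plan with Sinkhorn scaling. It is a generic textbook technique, not SpaOTsc's structured optimal transport. The function name `sinkhorn`, the toy sender and receiver positions, and the parameters `eps` and `n_iter` are all assumptions made for this example.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iter=200):
    """Entropy-regularized optimal transport between histograms a and b.

    Returns a coupling matrix P whose row sums approximate a and whose
    column sums approximate b. Generic sketch, not SpaOTsc itself.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K[i][j] = exp(-cost[i][j] / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternate scalings: first match the row sums, then the column sums.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical toy data: three "sender" cells and two "receiver" cells on a line.
senders = [0.0, 1.0, 2.0]
receivers = [0.5, 2.0]
a = [1 / 3, 1 / 3, 1 / 3]   # signal mass emitted by each sender
b = [0.5, 0.5]              # signal mass accepted by each receiver
cost = [[abs(s - r) for r in receivers] for s in senders]
P = sinkhorn(a, b, cost)
```

In this toy setting, `P[i][j]` can be read as the amount of signal sent from sender `i` to receiver `j`; nearby sender-receiver pairs receive most of the mass.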
TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions
Although deep learning approaches have had tremendous success in image, video
and audio processing, computer vision, and speech recognition, their
applications to three-dimensional (3D) biomolecular structural data sets have
been hindered by the entangled geometric complexity and biological complexity.
We introduce topology, i.e., element specific persistent homology (ESPH), to
untangle geometric complexity and biological complexity. ESPH represents 3D
complex geometry by one-dimensional (1D) topological invariants and retains
crucial biological information via a multichannel image representation. It is
able to reveal hidden structure-function relationships in biomolecules. We
further integrate ESPH and convolutional neural networks to construct a
multichannel topological neural network (TopologyNet) for the predictions of
protein-ligand binding affinities and protein stability changes upon mutation.
To overcome the limitations to deep learning arising from small and noisy
training sets, we present a multitask topological convolutional neural network
(MT-TCNN). We demonstrate that the present TopologyNet architectures outperform
other state-of-the-art methods in the predictions of protein-ligand binding
affinities, globular protein mutation impacts, and membrane protein mutation
impacts.
Comment: 20 pages, 8 figures, 5 tables
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
Supervised Gromov-Wasserstein Optimal Transport
We introduce the supervised Gromov-Wasserstein (sGW) optimal transport, an
extension of Gromov-Wasserstein that incorporates potential infinity patterns in
the cost tensor. sGW enables the enforcement of application-induced constraints
such as the preservation of pairwise distances by implementing the constraints
as an infinity pattern. A numerical solver is proposed for the sGW problem and
the effectiveness is demonstrated in various numerical experiments. The
high-order constraints in sGW are transferred to constraints on the coupling
matrix by solving a minimal vertex cover problem. The transformed problem is
solved by the Mirror-C descent iteration coupled with the supervised optimal
transport solver. In the numerical experiments, we first validate the proposed
framework by applying it to matching synthetic datasets and investigating the
impact of the model parameters. Additionally, we successfully apply sGW to real
single-cell RNA sequencing data. Through comparisons with other
Gromov-Wasserstein variants on real data, we demonstrate that sGW offers the
novel utility of controlling distance preservation, leading to the automatic
estimation of overlapping portions of datasets, which brings improved stability
and flexibility in data-driven applications.
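To make the notion of an infinity pattern in the cost tensor more concrete, the sketch below evaluates the standard squared-loss Gromov-Wasserstein objective for a given coupling and returns infinity when any coupled pair of points violates a distance-preservation threshold. This is one plausible reading of the abstract, not the authors' solver. The function name `gw_objective` and the threshold parameter `rho` are hypothetical, introduced only for illustration.

```python
import math

def gw_objective(D1, D2, P, rho=math.inf):
    """Squared-loss Gromov-Wasserstein objective of a coupling P.

    D1, D2: pairwise distance matrices of the two datasets.
    P:      coupling matrix (P[i][j] = mass moved from point i to point j).
    rho:    hypothetical threshold; if two coupled pairs have distances that
            differ by more than rho, the cost is treated as infinite.
    """
    n, m = len(D1), len(D2)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if P[i][j] == 0.0:
                continue
            for k in range(n):
                for l in range(m):
                    if P[k][l] == 0.0:
                        continue
                    gap = abs(D1[i][k] - D2[j][l])
                    if gap > rho:
                        return math.inf  # infinity pattern: forbidden pairing
                    total += gap ** 2 * P[i][j] * P[k][l]
    return total

# Toy data: two copies of three points on a line at positions 0, 1, 2.
D = [[abs(x - y) for y in (0, 1, 2)] for x in (0, 1, 2)]
identity = [[1 / 3 if i == j else 0.0 for j in range(3)] for i in range(3)]
shuffled = [[1 / 3, 0, 0], [0, 0, 1 / 3], [0, 1 / 3, 0]]  # maps 1 -> 2, 2 -> 1

exact = gw_objective(D, D, identity)              # distances fully preserved
distorted = gw_objective(D, D, shuffled)          # finite but positive cost
forbidden = gw_objective(D, D, shuffled, rho=0.5) # violates the threshold
```

Under these assumptions, the distance-preserving coupling has zero cost, while the distorted coupling becomes infeasible once the threshold is enforced.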
A topological approach for protein classification
A protein's function and dynamics are closely related to its sequence and
structure. However, predicting a protein's function and dynamics from its
sequence and structure is still a fundamental challenge in molecular biology.
Protein classification, which is typically done through measuring the
similarity between proteins based on protein sequence or physical
information, serves as a crucial step toward the understanding of protein
function and dynamics. Persistent homology is a new branch of algebraic
topology that has found its success in the topological data analysis in a
variety of disciplines, including molecular biology. The present work explores
the potential of using persistent homology as an independent tool for protein
classification. To this end, we propose a molecular topological fingerprint
based support vector machine (MTF-SVM) classifier. Specifically, we construct
machine learning feature vectors solely from protein topological fingerprints,
which are topological invariants generated during the filtration process. To
validate the present MTF-SVM approach, we consider four types of problems.
First, we study protein-drug binding by using the M2 channel protein of
influenza A virus. We achieve 96% accuracy in discriminating drug bound and
unbound M2 channels. Additionally, we examine the use of MTF-SVM for the
classification of hemoglobin molecules in their relaxed and taut forms and
obtain about 80% accuracy. The identification of all alpha, all beta, and
alpha-beta protein domains is carried out in our next study using 900 proteins.
We achieve an 85% success rate in this identification. Finally, we apply the
present technique to 55 classification tasks of protein superfamilies over 1357
samples. An average accuracy of 82% is attained. The present study establishes
computational topology as an independent and effective alternative for protein
classification.
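The idea of building feature vectors from topological invariants generated during a filtration can be sketched as follows. The example computes only the 0-dimensional barcode (connected components) of a Vietoris-Rips filtration with a union-find structure, then turns it into a fixed-length vector. It is a simplified illustration, not the MTF-SVM feature construction from the abstract above. The function names `h0_barcode` and `feature_vector`, the toy point cloud, and the thresholds are assumptions.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the edge length
    that merges it into another component; one component never dies.
    """
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Process all pairwise edges in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return deaths

def feature_vector(deaths, thresholds):
    """Number of components alive at each threshold (a Betti-0 curve)."""
    return [1 + sum(d > t for d in deaths) for t in thresholds]

# Hypothetical point cloud: a tight cluster of three points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
deaths = h0_barcode(points)
features = feature_vector(deaths, [0.5, 1.5, 10.0])
```

A vector such as `features` could then be passed to any standard classifier, for example a support vector machine; higher-dimensional invariants (loops, cavities) would need a dedicated persistent homology library.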
AVIDA: Alternating method for Visualizing and Integrating Data
High-dimensional multimodal data arises in many scientific fields. The
integration of multimodal data becomes challenging when there is no known
correspondence between the samples and the features of different datasets. To
tackle this challenge, we introduce AVIDA, a framework for simultaneously
performing data alignment and dimension reduction. In the numerical
experiments, Gromov-Wasserstein optimal transport and t-distributed stochastic
neighbor embedding are used as the alignment and dimension reduction modules
respectively. We show that AVIDA correctly aligns high-dimensional datasets
without common features with four synthesized datasets and two real multimodal
single-cell datasets. Compared to several existing methods, we demonstrate that
AVIDA better preserves structures of individual datasets, especially distinct
local structures in the joint low-dimensional visualization, while achieving
comparable alignment performance. Such a property is important in multimodal
single-cell data analysis as some biological processes are uniquely captured by
one of the datasets. In general applications, other methods can be used for the
alignment and dimension reduction modules.
Comment: To appear in Journal of Computational Science (Accepted, 2023)
Poisson-Boltzmann based machine learning (PBML) model for electrostatic analysis
Electrostatics is of paramount importance to chemistry, physics, biology, and
medicine. The Poisson-Boltzmann (PB) theory is a primary model for
electrostatic analysis. However, it is highly challenging to compute accurate
PB electrostatic solvation free energies for macromolecules due to the
nonlinearity, dielectric jumps, charge singularity, and geometric complexity
associated with the PB equation. The present work introduces a PB based machine
learning (PBML) model for biomolecular electrostatic analysis. Trained with the
second-order accurate MIBPB solver, the proposed PBML model is found to be more
accurate and faster than several eminent PB solvers in electrostatic analysis.
The proposed PBML model can provide highly accurate PB electrostatic solvation
free energy of new biomolecules or new conformations generated by molecular
dynamics with much reduced computational cost.