Path Similarity Analysis: a Method for Quantifying Macromolecular Pathways
Diverse classes of proteins function through large-scale conformational
changes; sophisticated enhanced sampling methods have been proposed to generate
these macromolecular transition paths. As such paths are curves in a
high-dimensional space, they have been difficult to compare quantitatively, a
prerequisite to, for instance, assess the quality of different sampling
algorithms. The Path Similarity Analysis (PSA) approach alleviates these
difficulties by utilizing the full information in 3N-dimensional trajectories
in configuration space. PSA employs the Hausdorff or Fréchet path
metrics, adopted from computational geometry, enabling us to quantify path
(dis)similarity, while the new concept of a Hausdorff-pair map permits the
extraction of atomic-scale determinants responsible for path differences.
Combined with clustering techniques, PSA facilitates the comparison of many
paths, including collections of transition ensembles. We use the closed-to-open
transition of the enzyme adenylate kinase (AdK), a commonly used testbed for
the assessment of enhanced sampling algorithms, to examine multiple microsecond
equilibrium molecular dynamics (MD) transitions of AdK in its substrate-free
form alongside transition ensembles from the MD-based dynamic importance
sampling (DIMS-MD) and targeted MD (TMD) methods, and a geometrical targeting
algorithm (FRODA). A Hausdorff pairs analysis of these ensembles revealed, for
instance, that differences in DIMS-MD and FRODA paths were mediated by a set of
conserved salt bridges whose charge-charge interactions are fully modeled in
DIMS-MD but not in FRODA. We also demonstrate how existing trajectory analysis
methods relying on pre-defined collective variables, such as native contacts or
geometric quantities, can be used synergistically with PSA, as well as the
application of PSA to more complex systems such as membrane transporter
proteins.
Comment: 9 figures, 3 tables in the main manuscript; supplementary information
includes 7 texts (S1 Text - S7 Text) and 11 figures (S1 Fig - S11 Fig) (also
available from journal site
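As a rough illustration of the two path metrics named in the abstract above, the following sketch computes the Hausdorff and discrete Fréchet distances between two paths. It assumes each conformation is a flat tuple of coordinates and uses plain Euclidean distance between conformations; the function names are hypothetical, and this is not the authors' PSA implementation.

```python
import math

def point_distance(p, q):
    """Euclidean distance between two conformations (flat coordinate tuples)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def directed_hausdorff(path_a, path_b):
    """Largest distance from a point on path_a to its nearest point on path_b."""
    return max(min(point_distance(p, q) for q in path_b) for p in path_a)

def hausdorff(path_a, path_b):
    """Symmetric Hausdorff distance; ignores the ordering of points along each path."""
    return max(directed_hausdorff(path_a, path_b),
               directed_hausdorff(path_b, path_a))

def discrete_frechet(path_a, path_b):
    """Discrete Frechet distance via dynamic programming; respects point ordering."""
    n, m = len(path_a), len(path_b)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = point_distance(path_a[i], path_b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]

# Two parallel toy "paths" in 2D, plus the second one traversed backwards.
path_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
path_b_reversed = list(reversed(path_b))
```

The Hausdorff distance is the same whether or not the second path is reversed, while the discrete Fréchet distance grows, because only the latter accounts for the direction of traversal.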
Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures
This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods which employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings; they also indicate that the CAST and Yin–Chen methods, proposed recently for the clustering of gene expression patterns, may prove effective for the clustering of 2D chemical structures.
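To make "fingerprint-based similarity" concrete, here is a minimal sketch of the Tanimoto coefficient, a common choice for comparing binary chemical fingerprints. The abstract does not state which coefficient the paper uses, so this is an assumption; the function name and the convention for two empty fingerprints are also illustrative.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical fingerprints: two shared bits out of four distinct bits overall.
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```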
Predicting cancer drug mechanisms of action using molecular network signatures
Molecular signatures are a powerful approach to characterize novel small molecules and derivatized small molecule libraries. While new experimental techniques are being developed in diverse model systems, informatics approaches lag behind these exciting advances. We propose an analysis pipeline for signature-based drug annotation. We develop an integrated strategy, utilizing supervised and unsupervised learning methodologies that are bridged by network-based statistics. Using this approach we can: 1, predict new examples of drug mechanisms that we trained our model upon; 2, identify “New” mechanisms of action that do not belong to drug categories that our model was trained upon; and 3, update our training sets with these “New” mechanisms and accurately predict entirely distinct examples from these new categories. Thus, not only does our strategy provide statistical generalization but it also offers biological generalization. Additionally, we show that our approach is applicable to diverse types of data, and that distinct biological mechanisms characterize its resolution of categories across different data types. As particular examples, we find that our predictive resolution of drug mechanisms from mRNA expression studies relies upon the analog measurement of a cell stress-related transcriptional rheostat along with a transcriptional representation of cell cycle state; whereas, in contrast, drug mechanism resolution from functional RNAi studies relies upon more dichotomous (e.g., either enhances or inhibits) association with cell death states. We believe that our approach can facilitate molecular signature-based drug mechanism understanding from different technology platforms and across diverse biological phenomena.
National Cancer Institute (U.S.) (NCI Integrative Cancer Biology Program grant U54-CA112967)
Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes
Complexes of physically interacting proteins constitute fundamental
functional units responsible for driving biological processes within cells. A
faithful reconstruction of the entire set of complexes is therefore essential
to understand the functional organization of cells. In this review, we discuss
the key contributions of computational methods developed to date
(approximately between 2003 and 2015) for identifying complexes from the
network of interacting proteins (PPI network). We evaluate in depth the
performance of these methods on PPI datasets from yeast, and highlight
challenges faced by these methods, in particular the detection of sparse and small
complexes or sub-complexes and the discerning of overlapping complexes. We describe methods
for integrating diverse information including expression profiles and 3D
structures of proteins with PPI networks to understand the dynamics of complex
formation, for instance, of time-based assembly of complex subunits and
formation of fuzzy complexes from intrinsically disordered proteins. Finally,
we discuss methods for identifying dysfunctional complexes in human diseases,
an application that is proving invaluable to understand disease mechanisms and
to discover novel therapeutic targets. We hope this review aptly commemorates a
decade of research on computational prediction of complexes and constitutes a
valuable reference for further advancements in this exciting area.
Comment: 1 Tabl
PANDORA: A Parallel Dendrogram Construction Algorithm for Single Linkage Clustering on GPU
This paper presents PANDORA, a novel parallel algorithm for efficiently
constructing dendrograms for single-linkage hierarchical clustering, including
HDBSCAN. Traditional dendrogram construction methods from a minimum spanning
tree (MST), such as agglomerative or divisive techniques, often fail to
parallelize efficiently, especially with the skewed dendrograms common in
real-world data.
PANDORA addresses these challenges through a unique recursive tree
contraction method, which simplifies the tree for initial dendrogram
construction and then progressively reconstructs the complete dendrogram. This
process makes PANDORA asymptotically work-optimal, independent of dendrogram
skewness. All steps in PANDORA are fully parallel and suitable for massively
threaded accelerators such as GPUs.
Our implementation is written in Kokkos, providing support for both CPUs and
multi-vendor GPUs (e.g., Nvidia, AMD). The multithreaded version of PANDORA is
2.2× faster than the current best multithreaded implementation, while
the GPU PANDORA implementation achieved a 6-20× speed-up on an AMD GPU and
a 10-37× speed-up on an Nvidia GPU over multithreaded PANDORA. These
advancements lead to up to a 6-fold speedup for HDBSCAN on GPUs over the
current best, which offloads only MST construction to GPUs and performs
multithreaded dendrogram construction.
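For context, the traditional agglomerative construction that the abstract contrasts against can be sketched as follows: sort the MST edges by weight and merge clusters with a union-find structure. This is the sequential baseline, not the recursive tree contraction the paper describes; the function name and the merge-record layout are assumptions made for illustration.

```python
def dendrogram_from_mst(num_points, mst_edges):
    """Build single-linkage merges from MST edges given as (u, v, weight).

    Returns a list of (cluster_a, cluster_b, weight, new_cluster_id), where
    ids below num_points are original points and larger ids are merged clusters.
    """
    parent = list(range(num_points))      # union-find forest over points
    cluster_id = list(range(num_points))  # current cluster label of each root

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    next_id = num_points
    # Processing edges in ascending weight order is inherently sequential,
    # which is the bottleneck the abstract refers to.
    for u, v, weight in sorted(mst_edges, key=lambda edge: edge[2]):
        root_u, root_v = find(u), find(v)
        # In a valid MST the endpoints always lie in different clusters.
        merges.append((cluster_id[root_u], cluster_id[root_v], weight, next_id))
        parent[root_u] = root_v
        cluster_id[root_v] = next_id
        next_id += 1
    return merges

# Hypothetical 4-point MST: two tight pairs joined by one long edge.
example_merges = dendrogram_from_mst(4, [(1, 2, 5.0), (0, 1, 1.0), (2, 3, 2.0)])
```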
Hierarchical Portfolio Optimization
The field of Portfolio Optimization has historically struggled because the
mathematical models available to it rest on assumptions one cannot
afford to make in the financial markets, making naive approaches all too enticing. In this project we have introduced the assumption that the different stocks in
the financial markets have a hierarchical structure and have allowed ourselves to be
inspired by it to build portfolios through a Machine Learning approach. We have
employed the Hierarchical Risk Parity algorithm and tested minor variations relating to the dissimilarity measure it makes use of. The tests were conducted with
historical daily closing price data from 2014 to 2020 for 440 stocks in the S&P 500
index. Results suggest most of the tested Hierarchical Risk Parity variants are robust and can compete with the Equal Weights Portfolio. We mainly encourage the
use of two dissimilarity measures: the standard one, a correlation-based metric, and
Dynamic Time Warping. The former is suggested to the pessimistic investor while
the latter to the hopeful yet conservative investor. To optimistic investors with
a high risk tolerance the recommendation would be to use the traditional Equal
Weights portfolio among the asset allocation methods considered in this project
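The standard correlation-based dissimilarity usually paired with Hierarchical Risk Parity is d = sqrt(0.5 · (1 − ρ)), where ρ is the Pearson correlation between two return series. The sketch below assumes that definition and uses made-up return values; the function names are illustrative, and the abstract does not spell out the exact formula used in the project.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long return series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Map correlation in [-1, 1] to a distance in [0, 1]."""
    rho = pearson(x, y)
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    return math.sqrt(0.5 * (1.0 - rho))

# Hypothetical daily returns for illustration only.
returns_a = [0.01, 0.02, -0.01, 0.03]
returns_same_direction = [2 * r for r in returns_a]
returns_opposite = [-r for r in returns_a]
```

Perfectly correlated series sit at distance 0 and perfectly anti-correlated series at distance 1, so stocks that move together end up in the same branch of the hierarchy.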