
    Similarity-based virtual screening using 2D fingerprints

    This paper summarises recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases related to the sizes of the molecules that are being sought. Group fusion involves combining the results of similarity searches based on multiple reference structures and a single similarity measure. We demonstrate the effectiveness of this approach to screening, and also describe an approximate form of group fusion, turbo similarity searching, that can be used when just a single reference structure is available.
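The two ideas in this abstract, Tanimoto similarity on 2D fingerprints and group fusion over several reference structures, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: fingerprints are represented as sets of "on" bit positions, the fusion rule is the maximum score over the references (other rules, such as summing ranks or scores, are possible), and the names `tanimoto` and `group_fusion_max` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Assumption: a 2D fingerprint is modelled as the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared bits / (bits in a + bits in b - shared bits)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def group_fusion_max(
    references: Sequence[Fingerprint],
    database: Dict[str, Fingerprint],
) -> List[Tuple[str, float]]:
    """Rank database molecules by their best Tanimoto score against any reference.

    Assumption: the MAX fusion rule is used; the abstract only says that the
    results of several searches are combined.
    """
    scored = [
        (name, max(tanimoto(ref, fp) for ref in references))
        for name, fp in database.items()
    ]
    # Highest fused score first; ties are broken by name for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    refs = [frozenset({1, 2, 3}), frozenset({7, 8})]
    db = {
        "a": frozenset({1, 2, 3, 4}),
        "b": frozenset({7, 8}),
        "c": frozenset({5, 6}),
    }
    for name, score in group_fusion_max(refs, db):
        print(name, round(score, 3))
```

With a single reference the same function reduces to an ordinary similarity search, which is the setting in which the abstract's turbo similarity searching is described as an approximation to group fusion.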

    Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing

    Motivation: Similarity searching and clustering of chemical compounds by structural similarities are important computational approaches for identifying drug-like small molecules. Most algorithms available for these tasks are limited by their speed and scalability, and cannot handle today's large compound databases with several million entries.

    Path Similarity Analysis: a Method for Quantifying Macromolecular Pathways

    Diverse classes of proteins function through large-scale conformational changes; sophisticated enhanced sampling methods have been proposed to generate these macromolecular transition paths. As such paths are curves in a high-dimensional space, they have been difficult to compare quantitatively, which is a prerequisite for, for instance, assessing the quality of different sampling algorithms. The Path Similarity Analysis (PSA) approach alleviates these difficulties by utilizing the full information in 3N-dimensional trajectories in configuration space. PSA employs the Hausdorff or Fréchet path metrics, adopted from computational geometry, enabling us to quantify path (dis)similarity, while the new concept of a Hausdorff-pair map permits the extraction of atomic-scale determinants responsible for path differences. Combined with clustering techniques, PSA facilitates the comparison of many paths, including collections of transition ensembles. We use the closed-to-open transition of the enzyme adenylate kinase (AdK), a commonly used testbed for the assessment of enhanced sampling algorithms, to examine multiple microsecond equilibrium molecular dynamics (MD) transitions of AdK in its substrate-free form alongside transition ensembles from the MD-based dynamic importance sampling (DIMS-MD) and targeted MD (TMD) methods, and a geometrical targeting algorithm (FRODA). A Hausdorff pairs analysis of these ensembles revealed, for instance, that differences in DIMS-MD and FRODA paths were mediated by a set of conserved salt bridges whose charge-charge interactions are fully modeled in DIMS-MD but not in FRODA. We also demonstrate how existing trajectory analysis methods relying on pre-defined collective variables, such as native contacts or geometric quantities, can be used synergistically with PSA, as well as the application of PSA to more complex systems such as membrane transporter proteins.
    Comment: 9 figures, 3 tables in the main manuscript; supplementary information includes 7 texts (S1 Text - S7 Text) and 11 figures (S1 Fig - S11 Fig) (also available from journal site)
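The Hausdorff and Fréchet path metrics named in this abstract can be sketched for discretized paths as follows. This is an illustration under stated assumptions, not the authors' implementation: each path is a sequence of frames, each frame is a flat coordinate vector of equal length (3N values for N atoms), the point-to-point distance is plain Euclidean distance rather than any structure-aligned measure, and the function names `hausdorff` and `discrete_frechet` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: a path is a sequence of frames, each a flat coordinate vector.
Path = Sequence[Sequence[float]]


def hausdorff(p: Path, q: Path) -> float:
    """Symmetric Hausdorff distance between two point sequences."""

    def directed(a: Path, b: Path) -> float:
        # Largest distance from a frame in `a` to its nearest frame in `b`.
        return max(min(math.dist(x, y) for y in b) for x in a)

    return max(directed(p, q), directed(q, p))


def discrete_frechet(p: Path, q: Path) -> float:
    """Discrete Frechet distance via dynamic programming over frame pairs."""
    n, m = len(p), len(q)
    c: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                c[i][j] = d
            elif i == 0:
                c[i][j] = max(c[0][j - 1], d)
            elif j == 0:
                c[i][j] = max(c[i - 1][0], d)
            else:
                # Best way to reach (i, j), then account for this pair's distance.
                c[i][j] = max(min(c[i - 1][j], c[i][j - 1], c[i - 1][j - 1]), d)
    return c[-1][-1]


if __name__ == "__main__":
    forward = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    shifted = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    backward = list(reversed(shifted))
    print(hausdorff(forward, shifted), discrete_frechet(forward, shifted))
    print(hausdorff(forward, backward), discrete_frechet(forward, backward))
```

The usage example highlights the difference between the two metrics: the Hausdorff distance ignores the order in which frames are visited, so reversing one path leaves it unchanged, whereas the discrete Fréchet distance respects ordering and grows when the paths run in opposite directions.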