30 research outputs found

    Robust Learning on Graphs

    No full text
    Despite the recent success of graph neural networks (GNNs) in modeling graph-structured data, their vulnerability to adversarial attacks has been revealed, and attacks on both node features and graph structure have been designed. Directly extending defense algorithms based on adversarial samples meets an immediate challenge, because computing the adversarial network is costly. Taking Graph Convolutional Networks (GCN) as a base model, we first present adversarial training via latent perturbation, which not only dispenses with generating adversarial networks but also attains improved robustness and accuracy by respecting the latent manifold of the data. Next, given a threat model based on GCN, we investigate robustness certificates for the task of graph classification. Specifically, we develop a method based on Lagrangian dualization to provide a lower bound on the classification margin. As a byproduct, we also derive an efficient attack algorithm that serves as an upper bound, measuring the tightness of the certificates. Furthermore, to tackle the nonconvexity of activation functions, we propose another method based on the convex envelope, resulting in tight approximation bounds. Lastly, due to the intrinsic properties of graphs, we consider providing robustness certificates under permutation invariance. To this end, we first develop a new metric for graphs, named the Orthogonal Gromov-Wasserstein (OGW) distance, which formulates a coupling between structured data based on optimal transport and optimizes over the orthogonal domain with tractable lower and upper bounds. In addition, we build its Fenchel biconjugate to facilitate convex optimization in the certificate problem. By combining the classification margin criteria with a tight convex envelope formulation, our methods construct the first robustness certificate for graph classification subject to a structural permutation-invariance constraint.

    An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs

    No full text
    Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus has been placed on structure-based virtual screening (SBVS) approaches. Because crystal structures are lacking, there is a great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date, ready-to-apply data sets for LBVS are fairly limited, and the direct use of benchmarking sets designed for SBVS could bias the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCR targets. More specifically, our method can (1) ensure the chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize the spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and the average AUC of the ROC curves as the metric. Our method greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCR benchmarking set, i.e., the GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue concerning the ratio of decoys per ligand and found that, within a range of 30 to 100, it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
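    The evaluation metric named in this abstract, the average AUC of the ROC curves, can be illustrated with a minimal sketch. It assumes each cross-validation fold yields one list of similarity scores for ligands and one for decoys; how the folds are built is not stated in the abstract, and the function names `roc_auc` and `mean_auc` are hypothetical.

```python
def roc_auc(ligand_scores, decoy_scores):
    """ROC AUC via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen ligand scores higher
    than a randomly chosen decoy; ties count as half a win.
    """
    wins = 0.0
    for lig in ligand_scores:
        for dec in decoy_scores:
            if lig > dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


def mean_auc(folds):
    """Average AUC over folds; each fold is (ligand_scores, decoy_scores)."""
    aucs = [roc_auc(lig, dec) for lig, dec in folds]
    return sum(aucs) / len(aucs)
```

    An AUC near 0.5 means ligands and decoys are ranked no better than chance, while an AUC near 1.0 for a trivial method can signal the kind of artificial enrichment the abstract describes.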

    The scoring bias in reverse docking and the score normalization strategy to improve success rate of target fishing

    No full text
    Target fishing often relies on reverse docking to identify potential target proteins of ligands from a protein database. The limitation of reverse docking is the accuracy of current scoring functions used to distinguish true targets from non-target proteins. Many contemporary scoring functions are designed for the virtual screening of small molecules without special optimization for reverse docking, so they are easily influenced by the properties of protein pockets, resulting in scoring bias toward proteins with certain properties. This bias causes many false positives in reverse docking, interfering with the identification of true targets. In this paper, we conducted a large-scale reverse docking (5000 molecules to 100 proteins) to study the scoring bias in reverse docking by DOCK, Glide, and AutoDock Vina. We found that there were indeed some frequent hits, namely interference proteins, in all three docking procedures. After analyzing the differences in pocket properties between these interference proteins and the others, we speculated that the interference proteins have a larger contact area with ligands (related to the size and shape of protein pockets; for all three docking programs) or higher hydrophobicity (for Glide), which could be the causes of the scoring bias. We then applied a score normalization method to eliminate this scoring bias, which was effective in making docking scores more balanced between different proteins in the reverse docking of the benchmark dataset. Later, the Astex Diverse Set was used to validate the effect of score normalization on actual cases of reverse docking, showing that the accuracy of target prediction increased significantly, by 21.5%, in the reverse docking by Glide after score normalization, though there was no obvious change in the reverse docking by DOCK and AutoDock Vina. Our results demonstrate the effectiveness of score normalization in eliminating the scoring bias and improving the accuracy of target prediction in reverse docking. Moreover, the properties of protein pockets that cause scoring bias toward certain proteins, identified here, can provide a theoretical basis for further optimizing the scoring functions of docking programs in future research.
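    The idea of score normalization can be sketched as follows. The abstract does not say which normalization the authors used, so this example assumes a per-protein z-score, in which each raw docking score is rescaled by the mean and standard deviation of all ligand scores on the same protein. The function names and the toy data are hypothetical, and lower (more negative) scores are assumed to be better.

```python
from statistics import mean, pstdev


def normalize_scores(scores):
    """Z-score each protein's docking scores across all ligands.

    scores: dict mapping protein -> dict mapping ligand -> raw score.
    This is an assumed normalization, not necessarily the paper's.
    """
    normalized = {}
    for protein, by_ligand in scores.items():
        values = list(by_ligand.values())
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            sigma = 1.0  # avoid division by zero when all scores are equal
        normalized[protein] = {
            ligand: (score - mu) / sigma for ligand, score in by_ligand.items()
        }
    return normalized


def rank_targets(scores, ligand):
    """Rank proteins for one ligand, best (lowest) score first."""
    candidates = [p for p in scores if ligand in scores[p]]
    return sorted(candidates, key=lambda p: scores[p][ligand])


# Toy data: "sticky" scores every ligand favorably (an interference protein),
# while "true_target" scores only ligand "a" unusually well.
raw = {
    "sticky": {"a": -10.0, "b": -10.5, "c": -9.5},
    "true_target": {"a": -8.0, "b": -5.0, "c": -5.0},
}
```

    With the raw scores, `rank_targets(raw, "a")` puts the interference protein first. After `normalize_scores(raw)`, the protein that scores ligand "a" unusually well relative to its own baseline moves to the top.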

    The probability density curves for the highly relevant protein pocket properties of three classes of proteins.

    No full text
    The distribution of (a) median contact area, (b) volume of three classes of protein pockets in the reverse docking by DOCK. The distribution of (c) median contact area, (d) phobic of three classes of protein pockets in the reverse docking by Glide. The distribution of (e) median contact area, (f) size of three classes of protein pockets in the reverse docking by AutoDock Vina.

    The target prediction results in the reverse docking by DOCK, Glide and AutoDock Vina.

    No full text
    The target prediction results in the reverse docking by DOCK, Glide and AutoDock Vina.

    Copper(0)/Selectfluor System-Promoted Oxidative Carbon–Carbon Bond Cleavage/Annulation of o‑Aryl Chalcones: An Unexpected Synthesis of 9,10-Phenanthraquinone Derivatives

    No full text
    A general and efficient protocol for the synthesis of 9,10-phenanthraquinone derivatives has been successfully developed involving a copper(0)/Selectfluor system-promoted oxidative carbon–carbon bond cleavage/annulation of o-aryl chalcones. A variety of substituted 9,10-phenanthraquinones were synthesized in moderate to good yields under mild reaction conditions.

    The relationship between median docking score and the highly relevant protein pocket properties of three classes of proteins.

    No full text
    The scatter diagram of median DOCK score with (a) median contact area, (b) volume of protein pockets. The scatter diagram of median Glide score with (c) median contact area, (d) phobic of protein pockets. The scatter diagram of median Vina score with (e) median contact area, (f) size of protein pockets. The fitting lines of scatter points are shown in the diagram.

    The reverse docking results of 5000 molecules.

    No full text
    The probability density curves of (a) DOCK score, (b) Glide score, (c) Vina score of 100 proteins (different colors represent different proteins). (d) The hit frequency, (e) the average rank of 100 proteins in 5000 reverse docking cases.