On QSAR-based cardiotoxicity modeling with the expressiveness-enhanced graph learning model and dual-threshold scheme


Introduction: Given the direct association with malignant ventricular arrhythmias, cardiotoxicity is a major concern in drug design. In the past decades, computational models based on the quantitative structure-activity relationship have been proposed to screen out cardiotoxic compounds and have shown promising results. The combination of molecular fingerprint and the machine learning model shows stable performance for a wide spectrum of problems; however, not long after the advent of the graph neural network (GNN) deep learning model and its variant (e.g., graph transformer), it has become the principal way of quantitative structure-activity relationship-based modeling for its high flexibility in feature extraction and decision rule generation. Despite all these progresses, the expressiveness (the ability of a program to identify non-isomorphic graph structures) of the GNN model is bounded by the WL isomorphism test, and a suitable thresholding scheme that relates directly to the sensitivity and credibility of a model is still an open question. Methods: In this research, we further improved the expressiveness of the GNN model by introducing the substructure-aware bias by the graph subgraph transformer network model. Moreover, to propose the most appropriate thresholding scheme, a comprehensive comparison of the thresholding schemes was conducted. Results: Based on these improvements, the best model attains performance with 90.4% precision, 90.4% recall, and 90.5% F1-score with a dual-threshold scheme (active: <1μM; non-active: >30μM). The improved pipeline (graph subgraph transformer network model and thresholding scheme) also shows its advantages in terms of the activity cliff problem and model interpretability

    Similar works