Computational discovery of ideal lead compounds is a critical process for
modern drug discovery. It comprises multiple stages: hit screening, molecular
property prediction, and molecule optimization. Current efforts are fragmented:
a separate model is built for each stage, and the models are then integrated
across stages. This is far from ideal, as clumsy integration of incompatible
models increases research overhead and may even reduce success
rates in drug discovery. Achieving compatibility requires establishing
inherent model consistency across lead discovery stages. To that end,
we propose an open deep graph learning (DGL) based pipeline, generative
adversarial feature subspace enhancement (GAFSE), which for the first time
unifies the modeling of these stages in one learning framework. GAFSE also offers
standardized modular design and streamlined interfaces for future expansions
and community support. GAFSE combines adversarial/generative learning, a graph
attention network, and a graph reconstruction network, and simultaneously
optimizes the classification/regression loss, the adversarial/generative loss,
and the reconstruction loss. Convergence analysis theoretically guarantees model
generalization performance. Exhaustive benchmarking demonstrates that the GAFSE
pipeline achieves excellent performance across almost all lead discovery
stages, while also providing valuable model interpretability. Hence, we believe
this tool will enhance the efficiency and productivity of drug discovery
researchers.

Comment: This article is used as a preliminary study for the application
for the Lee Kuan Yew Postdoctoral Fellowship (LKYPDF) 2023 in Singapore. All
rights reserved.
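
The simultaneous optimization of the three losses mentioned in the abstract can be pictured as a weighted sum of scalar terms. The sketch below is illustrative only and is not GAFSE's implementation: the function names (`bce`, `mse`, `joint_loss`), the weights `w_adv` and `w_rec`, the choice of binary cross-entropy and mean squared error, and all toy numbers are assumptions, since the abstract does not state how the terms are formed or combined.

```python
import math


def bce(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def mse(target, output):
    """Mean squared error between two equal-length sequences."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def joint_loss(task, adversarial, reconstruction, w_adv=1.0, w_rec=1.0):
    """Weighted sum of the three terms (the weights are hypothetical)."""
    return task + w_adv * adversarial + w_rec * reconstruction


# Toy values standing in for the outputs of real networks.
task = bce([1, 0, 1], [0.9, 0.2, 0.7])         # classification term
adversarial = bce([1, 1, 1], [0.6, 0.4, 0.8])  # assumed discriminator term
reconstruction = mse([1.0, 0.0, 1.0], [0.8, 0.1, 0.9])  # assumed graph term

total = joint_loss(task, adversarial, reconstruction, w_adv=0.5, w_rec=0.1)
print(f"joint loss: {total:.4f}")
```

In an actual deep learning framework the three terms would be differentiable tensors and a single backward pass through their sum would update all components at once; the plain floats here only show the bookkeeping.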