DNA-Encoded Library (DEL) technology has proven to be a powerful tool that utilizes
combinatorially constructed small molecules to facilitate highly efficient
screening assays. These selection experiments, involving multiple stages of
washing, elution, and identification of potent binders via unique DNA barcodes,
often generate complex data. This complexity can mask the
underlying signals, necessitating the application of computational tools such
as machine learning to uncover valuable insights. We introduce a compositional
deep probabilistic model of DEL data, DEL-Compose, which decomposes molecular
representations into their mono-synthon, di-synthon, and tri-synthon building
blocks and capitalizes on the inherent hierarchical structure of these
molecules by modeling latent reactions between embedded synthons. Additionally,
we investigate methods to improve the observation models for DEL count data,
such as integrating covariate factors to more effectively account for data
noise. Across two popular public benchmark datasets (CA-IX and HRP), our model
demonstrates strong performance compared to count baselines, enriches the
correct pharmacophores, and offers valuable insights via its intrinsically
interpretable structure, thereby providing a robust tool for the analysis of
DEL data.