Machine learning, and especially deep learning, has had an increasing impact on
molecule and materials design. In particular, growing access to abundant,
high-quality small-molecule data for generative modeling has produced promising
results in drug discovery. However, for many
important classes of materials such as catalysts, antioxidants, and
metal-organic frameworks, such large datasets are not available. Such families
of molecules with limited samples and structural similarities are especially
prevalent in industrial applications. As is well known, retraining and even
fine-tuning are challenging on such small datasets. Novel, practically
applicable molecules are most often derivatives of well-known molecules,
which suggests a route around data scarcity. To address this problem, we
introduce STRIDE, a molecule generation workflow that produces
novel molecules with an unconditional generative model guided by known
molecules, without any retraining. We generate molecules outside of the training
data from a highly specialized set of antioxidant molecules. Our generated
molecules have, on average, 21.7% lower synthetic accessibility scores, and the
ionization potential of generated molecules is also reduced by 5.9% via guiding