Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization
Molecular optimization is a fundamental goal in the chemical sciences and is
of central interest to drug and material design. In recent years, significant
progress has been made in solving challenging problems across various aspects
of computational molecular optimizations, emphasizing high validity, diversity,
and, most recently, synthesizability. Despite this progress, many papers report
results on trivial or self-designed tasks, bringing additional challenges to
directly assessing the performance of new methods. Moreover, the sample
efficiency of the optimization--the number of molecules evaluated by the
oracle--is rarely discussed, despite being an essential consideration for
realistic discovery applications.
To fill this gap, we have created an open-source benchmark for practical
molecular optimization, PMO, to facilitate the transparent and reproducible
evaluation of algorithmic advances in molecular optimization. This paper
thoroughly investigates the performance of 25 molecular design algorithms on 23
tasks with a particular focus on sample efficiency. Our results show that most
"state-of-the-art" methods fail to outperform their predecessors under a
limited oracle budget allowing 10K queries and that no existing algorithm can
efficiently solve certain molecular optimization problems in this setting. We
analyze the influence of the optimization algorithm choices, molecular assembly
strategies, and oracle landscapes on the optimization performance to inform
future algorithm development and benchmarking. PMO provides a standardized
experimental setup to comprehensively evaluate and compare new molecule
optimization methods with existing ones. All code can be found at
https://github.com/wenhao-gao/mol_opt
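The abstract above measures sample efficiency as the number of molecules evaluated by the oracle under a fixed budget (10K queries). As a minimal sketch of that bookkeeping, and not the PMO implementation, an oracle can be wrapped so that unique queries are counted and the best-so-far curve is recorded. The names `BudgetedOracle` and `toy_score` are hypothetical, and the choice to make repeated queries free is an assumption.

```python
from typing import Callable, Dict, List


class BudgetedOracle:
    """Wraps a scoring function and counts unique evaluations against a budget."""

    def __init__(self, score_fn: Callable[[str], float], budget: int = 10_000):
        self.score_fn = score_fn
        self.budget = budget
        # molecule -> score; dicts preserve insertion order, i.e. query order
        self.cache: Dict[str, float] = {}

    @property
    def calls(self) -> int:
        """Number of unique molecules evaluated so far."""
        return len(self.cache)

    def __call__(self, molecule: str) -> float:
        if molecule in self.cache:
            # Assumption: repeated queries do not consume budget.
            return self.cache[molecule]
        if self.calls >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        score = self.score_fn(molecule)
        self.cache[molecule] = score
        return score

    def best_so_far(self) -> List[float]:
        """Best score seen after each unique oracle call."""
        curve: List[float] = []
        best = float("-inf")
        for score in self.cache.values():
            best = max(best, score)
            curve.append(best)
        return curve


def toy_score(smiles: str) -> float:
    # Hypothetical stand-in objective: fraction of characters that are 'C'.
    return smiles.count("C") / len(smiles)
```

Comparing two optimizers by their `best_so_far()` curves at the same number of calls is one simple way to read "sample efficiency"; the benchmark itself may use a different summary metric.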
Sample-efficient Multi-objective Molecular Optimization with GFlowNets
Many crucial scientific problems involve designing novel molecules with
desired properties, which can be formulated as a black-box optimization problem
over the discrete chemical space. In practice, multiple conflicting objectives
and costly evaluations (e.g., wet-lab experiments) make the diversity of
candidates paramount. Computational methods have achieved initial success but
still struggle with considering diversity in both objective and search space.
To fill this gap, we propose a multi-objective Bayesian optimization (MOBO)
algorithm leveraging the hypernetwork-based GFlowNets (HN-GFN) as an
acquisition function optimizer, with the purpose of sampling a diverse batch of
candidate molecular graphs from an approximate Pareto front. Using a single
preference-conditioned hypernetwork, HN-GFN learns to explore various
trade-offs between objectives. We further propose a hindsight-like off-policy
strategy to share high-performing molecules among different preferences in
order to speed up learning for HN-GFN. We empirically illustrate that HN-GFN
has adequate capacity to generalize over preferences. Moreover, experiments in
various real-world MOBO settings demonstrate that our framework predominantly
outperforms existing methods in terms of candidate quality and sample
efficiency. The code is available at https://github.com/violet-sto/HN-GFN.
Comment: NeurIPS 202
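The abstract above refers to sampling candidates from an approximate Pareto front. As a small illustration of what that front is, and not of HN-GFN itself, the sketch below filters a set of objective vectors down to the non-dominated ones, assuming every objective is to be maximized. The function names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that no other point dominates (all objectives maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Each tuple holds two objective values for one candidate molecule.
candidates = [(1.0, 5.0), (3.0, 3.0), (5.0, 1.0), (2.0, 2.0)]
front = pareto_front(candidates)
```

Here `(2.0, 2.0)` is dropped because `(3.0, 3.0)` is better on both objectives, while the other three points each represent a different trade-off. The quadratic scan is fine for small batches; larger sets would call for a sort-based method.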
Machine Learning Guided Discovery and Design for Inertial Confinement Fusion
Inertial confinement fusion (ICF) experiments at the National Ignition Facility (NIF) and their corresponding computer simulations produce an immense amount of rich data. However, quantitatively interpreting that data remains a grand challenge. Design spaces are vast, data volumes are large, and the relationship between models and experiments may be uncertain.
We propose using machine learning to aid in the design and understanding of ICF implosions by integrating simulation and experimental data into a common framework. We begin by illustrating an early success of this data-driven design approach, which resulted in the discovery of a new class of high-performing ovoid-shaped implosion simulations. The ovoids achieve robust performance from the generation of zonal flows within the hotspot, revealing physics that had not previously been observed in ICF capsules.
The ovoid discovery also revealed deficiencies in common machine learning algorithms for modeling ICF data. To overcome these inadequacies, we developed a novel algorithm, deep jointly-informed neural networks (DJINN), which enables non-data scientists to quickly train neural networks on their own datasets. DJINN is routinely used for modeling ICF data and for a variety of other applications (uncertainty quantification; climate, nuclear, and atomic physics data). We demonstrate how DJINN is used to perform parameter inference tasks for NIF data, and how transfer learning with DJINN enables us to create predictive models of direct drive experiments at the Omega laser facility.
Much of this work focuses on scalar or modest-size vector data; however, many ICF diagnostics produce a variety of images, spectra, and sequential data. We end with a brief exploration of sequence-to-sequence models for emulating time-dependent multiphysics systems of varying complexity. This is a first step toward incorporating multimodal time-dependent data into our analyses to better constrain our predictive models.
Tuning a variational autoencoder for data accountability problem in the Mars Science Laboratory ground data system
The Mars Curiosity rover is frequently sending back engineering and science
data that goes through a pipeline of systems before reaching its final
destination at the mission operations center, making it prone to volume loss and
data corruption. A ground data system analysis (GDSA) team is charged with the
monitoring of this flow of information and the detection of anomalies in that
data in order to request a re-transmission when necessary. This work presents
Δ-MADS, a derivative-free optimization method applied for tuning the
architecture and hyperparameters of a variational autoencoder trained to detect
the data with missing patches in order to assist the GDSA team in their
mission.