Adaptive Informative Path Planning with Multimodal Sensing
Adaptive Informative Path Planning (AIPP) problems model an agent tasked with
obtaining information subject to resource constraints in unknown, partially
observable environments. Existing work on AIPP has focused on representing
observations about the world as a result of agent movement. We formulate the
more general setting where the agent may choose between different sensors at
the cost of some energy, in addition to traversing the environment to gather
information. We call this problem AIPPMS (MS for Multimodal Sensing). AIPPMS
requires reasoning jointly about the effects of sensing and movement in terms
of both energy expended and information gained. We frame AIPPMS as a Partially
Observable Markov Decision Process (POMDP) and solve it with online planning.
Our approach is based on the Partially Observable Monte Carlo Planning
framework with modifications to ensure constraint feasibility and a heuristic
rollout policy tailored for AIPPMS. We evaluate our method on two domains: a
simulated search-and-rescue scenario and a challenging extension to the classic
RockSample problem. We find that our approach outperforms a classic AIPP
algorithm that is modified for AIPPMS, as well as online planning using a
random rollout policy.
Comment: First two authors contributed equally; International Conference on Automated Planning and Scheduling (ICAPS) 2020
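The key modeling choice above is that sensing and movement draw on a shared energy budget, so the planner must prune infeasible actions and estimate leaf values with a budget-aware rollout. Below is a minimal Python sketch of that idea; the names (Action, feasible_actions, heuristic_rollout) and the greedy information-per-energy heuristic are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the AIPPMS action model: the agent chooses between
# movement and sensor actions, each with an energy cost, and a heuristic
# rollout policy stays within the remaining budget. All names here are
# illustrative, not the paper's code.
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str             # "move" or "sense"
    target: str           # neighbor cell or sensor name
    energy_cost: float
    info_estimate: float  # heuristic expected information gain

def feasible_actions(actions, remaining_energy):
    # Constraint feasibility: never consider an action the budget cannot afford.
    return [a for a in actions if a.energy_cost <= remaining_energy]

def heuristic_rollout(actions, remaining_energy, horizon=20, seed=0):
    # Mostly-greedy information-per-energy rollout used to estimate leaf
    # values, in the spirit of a POMCP rollout policy tailored to the problem.
    rng = random.Random(seed)
    total_info = 0.0
    for _ in range(horizon):
        candidates = feasible_actions(actions, remaining_energy)
        if not candidates:
            break
        if rng.random() < 0.8:
            a = max(candidates,
                    key=lambda a: a.info_estimate / max(a.energy_cost, 1e-6))
        else:
            a = rng.choice(candidates)  # occasional exploration
        total_info += a.info_estimate
        remaining_energy -= a.energy_cost
    return total_info

actions = [
    Action("move", "north", energy_cost=1.0, info_estimate=0.2),
    Action("sense", "cheap_camera", energy_cost=0.5, info_estimate=0.3),
    Action("sense", "expensive_lidar", energy_cost=3.0, info_estimate=1.5),
]
print(heuristic_rollout(actions, remaining_energy=10.0))
```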
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
We propose fine-tuning large language models for generation of stable
materials. While unorthodox, fine-tuning large language models on text-encoded
atomistic data is simple to implement yet reliable, with around 90% of sampled
structures obeying physical constraints on atom positions and charges. Using
energy above hull calculations from both learned ML potentials and
gold-standard DFT calculations, we show that our strongest model (fine-tuned
LLaMA-2 70B) can generate materials predicted to be metastable at about twice
the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text
prompting's inherent flexibility, our models can simultaneously be used for
unconditional generation of stable materials, infilling of partial structures,
and text-conditional generation. Finally, we show that language models' ability
to capture key symmetries of crystal structures improves with model scale,
suggesting that the biases of pretrained LLMs are surprisingly well-suited for
atomistic data.
Comment: ICLR 2024. Code available at: https://github.com/facebookresearch/crystal-ll
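The approach above hinges on serializing a crystal structure as plain text so a language model can be fine-tuned on it. Here is a minimal Python sketch of one such encoding; the exact string format is an assumption for illustration, not necessarily the format used in the paper.

```python
# Sketch of text-encoding atomistic data for LLM fine-tuning: serialize
# lattice parameters and fractional coordinates as a plain string. The
# format below is an illustrative assumption.
def encode_crystal(lattice_abc, lattice_angles, sites):
    """sites: list of (element, (x, y, z)) with fractional coordinates."""
    lines = [
        " ".join(f"{v:.2f}" for v in lattice_abc),     # a b c
        " ".join(f"{v:.1f}" for v in lattice_angles),  # alpha beta gamma
    ]
    for element, (x, y, z) in sites:
        lines.append(element)
        lines.append(f"{x:.3f} {y:.3f} {z:.3f}")
    return "\n".join(lines)

# Toy example loosely based on rock-salt NaCl.
text = encode_crystal(
    lattice_abc=(5.64, 5.64, 5.64),
    lattice_angles=(90.0, 90.0, 90.0),
    sites=[("Na", (0.0, 0.0, 0.0)), ("Cl", (0.5, 0.5, 0.5))],
)
print(text)  # this string becomes one fine-tuning example for the LLM
```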
Protein Design with Guided Discrete Diffusion
A popular approach to protein design is to combine a generative model with a
discriminative model for conditional sampling. The generative model samples
plausible sequences while the discriminative model guides a search for
sequences with high fitness. Given its broad success in conditional sampling,
classifier-guided diffusion modeling is a promising foundation for protein
design, leading many to develop guided diffusion models for structure with
inverse folding to recover sequences. In this work, we propose diffusioN
Optimized Sampling (NOS), a guidance method for discrete diffusion models that
follows gradients in the hidden states of the denoising network. NOS makes it
possible to perform design directly in sequence space, circumventing
significant limitations of structure-based methods, including scarce data and
challenging inverse design. Moreover, we use NOS to generalize LaMBO, a
Bayesian optimization procedure for sequence design that facilitates multiple
objectives and edit-based constraints. The resulting method, LaMBO-2, enables
discrete diffusions and stronger performance with limited edits through a novel
application of saliency maps. We apply LaMBO-2 to a real-world protein design
task, optimizing antibodies for higher expression yield and binding affinity to
several therapeutic targets under locality and developability constraints,
attaining a 99% expression rate and 40% binding rate in exploratory in vitro
experiments.
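The central mechanism in NOS, as described above, is to take gradient steps on the denoising network's hidden states so that a learned value head scores the sequence higher before token logits are decoded. The following Python sketch illustrates that loop with toy PyTorch modules; the architecture, step sizes, and update rule are illustrative assumptions rather than the paper's code.

```python
# Sketch of guidance via gradients in a denoiser's hidden states, the core
# idea behind NOS. Module names and the update loop are illustrative.
import torch
import torch.nn as nn

hidden_dim, vocab_size, seq_len = 64, 20, 8
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=4, batch_first=True),
    num_layers=2,
)
lm_head = nn.Linear(hidden_dim, vocab_size)  # hidden states -> token logits
value_head = nn.Linear(hidden_dim, 1)        # discriminator predicting fitness

def guided_denoise_step(x_emb, step_size=0.1, n_steps=5):
    # Run the denoiser once, then take gradient steps on the hidden states
    # to increase predicted fitness before decoding logits.
    h = encoder(x_emb).detach().requires_grad_(True)
    for _ in range(n_steps):
        fitness = value_head(h).mean()
        (grad,) = torch.autograd.grad(fitness, h)
        h = (h + step_size * grad).detach().requires_grad_(True)
    return lm_head(h)  # guided logits over the sequence vocabulary

x_emb = torch.randn(1, seq_len, hidden_dim)  # stand-in noisy sequence embedding
logits = guided_denoise_step(x_emb)
print(logits.shape)  # torch.Size([1, 8, 20])
```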
Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
Bayesian optimization (BayesOpt) is a gold standard for query-efficient
continuous optimization. However, its adoption for drug design has been
hindered by the discrete, high-dimensional nature of the decision variables. We
develop a new approach (LaMBO) which jointly trains a denoising autoencoder
with a discriminative multi-task Gaussian process head, allowing gradient-based
optimization of multi-objective acquisition functions in the latent space of
the autoencoder. These acquisition functions allow LaMBO to balance the
explore-exploit tradeoff over multiple design rounds, and to balance objective
tradeoffs by optimizing sequences at many different points on the Pareto
frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce
new tasks optimizing in silico and in vitro properties of
large-molecule fluorescent proteins. In our experiments LaMBO outperforms
genetic optimizers and does not require a large pretraining corpus,
demonstrating that BayesOpt is practical and effective for biological sequence
design.
Comment: ICML 2022. Code available at https://github.com/samuelstanton/lamb
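The core loop described above encodes sequences into an autoencoder's latent space, optimizes a multi-objective acquisition function there by gradient ascent, and decodes the result. The Python sketch below illustrates that pattern; the toy decoder and the simple differentiable surrogate standing in for the multi-task Gaussian process head are assumptions made for brevity, not the authors' implementation.

```python
# Sketch of LaMBO-style latent-space acquisition optimization. The GP head
# is replaced here by a simple differentiable surrogate for brevity.
import torch
import torch.nn as nn

latent_dim, vocab_size, seq_len = 16, 20, 10
decoder = nn.Linear(latent_dim, seq_len * vocab_size)  # latent -> token logits
surrogate = nn.Linear(latent_dim, 2)                   # predicts (mean, log_std)

def acquisition(z, beta=1.0):
    # Upper confidence bound: mean + beta * std, differentiable w.r.t. z.
    mean, log_std = surrogate(z).unbind(-1)
    return mean + beta * log_std.exp()

def optimize_in_latent_space(z0, steps=50, lr=0.05):
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = -acquisition(z).sum()  # ascend the acquisition value
        opt.zero_grad()
        loss.backward()
        opt.step()
    logits = decoder(z.detach()).view(-1, seq_len, vocab_size)
    return logits.argmax(-1)  # decoded candidate sequences

z0 = torch.randn(4, latent_dim)  # e.g., encodings of current best sequences
candidates = optimize_in_latent_space(z0)
print(candidates.shape)  # torch.Size([4, 10])
```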