Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design
The efficient exploration of chemical space to design molecules with intended
properties enables the accelerated discovery of drugs, materials, and
catalysts, and is one of the most important outstanding challenges in
chemistry. Encouraged by the recent surge in computer power and artificial
intelligence development, many algorithms have been developed to tackle this
problem. However, despite the emergence of many new approaches in recent years,
comparatively little progress has been made in developing realistic benchmarks
that reflect the complexity of molecular design for real-world applications. In
this work, we develop a set of practical benchmark tasks relying on physical
simulation of molecular systems mimicking real-life molecular design problems
for materials, drugs, and chemical reactions. Additionally, we demonstrate the
utility and ease of use of our new benchmark set by showing how to
compare the performance of several well-established families of algorithms.
Surprisingly, we find that model performance can strongly depend on the
benchmark domain. We believe that our benchmark suite will help move the field
towards more realistic molecular design benchmarks, and move the development of
inverse molecular design algorithms closer to designing molecules that solve
existing problems in both academia and industry.
Comment: 29+21 pages, 6+19 figures, 6+2 tables
Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
By combining metal nodes with organic linkers, we can potentially synthesize
millions of possible metal-organic frameworks (MOFs). At present, we have
libraries of over ten thousand synthesized materials and millions of in-silico
predicted materials. The fact that we have so many materials opens many
exciting avenues to tailor-make a material that is optimal for a given
application. However, from an experimental and computational point of view we
simply have too many materials to screen using brute-force techniques. In this
review, we show that having so many materials allows us to use big-data methods
as a powerful technique to study these materials and to discover complex
correlations. The first part of the review gives an introduction to the
principles of big-data science. We emphasize the importance of data collection,
methods to augment small data sets, and how to select appropriate training sets. An
important part of this review covers the different approaches that are used to
represent these materials in feature space. The review also includes a general
overview of the different ML techniques, but as most applications in porous
materials use supervised ML, our review is focused on the different approaches
for supervised ML. In particular, we review the different methods to optimize
the ML process and how to quantify the performance of the different methods. In
the second part, we review how the different approaches of ML have been applied
to porous materials. In particular, we discuss applications in the field of gas
storage and separation, the stability of these materials, their electronic
properties, and their synthesis. This range illustrates the large
variety of topics that can be studied with big-data science. Given the
increasing interest of the scientific community in ML, we expect this list to
rapidly expand in the coming years.
Comment: Editorial changes (typos fixed, minor adjustments to figures
- …