    SELFIES and the future of molecular string representations

    Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings). SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
    Comment: 34 pages, 15 figures; comments and suggestions for additional references are welcome.
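The abstract's claim that most combinations of SMILES symbols are invalid can be illustrated with a small experiment. The following is a minimal sketch, not a real SMILES parser: `looks_valid` is a hypothetical helper that checks only a few syntactic conditions (balanced branches, no empty branches, a paired ring-closure label) over an assumed toy alphabet of three atoms, parentheses, and the single ring label `1`. It ignores valence, aromaticity, bond symbols, and bracket atoms, so it is more lenient than a real toolkit would be.

```python
import random

ATOMS = "CNO"               # assumed toy atom set
ALPHABET = ATOMS + "()1"    # atoms, branch symbols, one ring-closure label


def looks_valid(s: str) -> bool:
    """Toy syntactic check for a SMILES-like string (not a real parser)."""
    if not s:
        return False
    depth = 0          # current branch nesting depth
    ring_open = False  # whether ring label "1" is currently open
    prev = ""
    for ch in s:
        if ch == "(":
            # A branch cannot start the string or directly follow "(".
            if prev in ("", "("):
                return False
            depth += 1
        elif ch == ")":
            # No unmatched close, and no empty branch "()".
            if depth == 0 or prev == "(":
                return False
            depth -= 1
        elif ch == "1":
            # In this toy grammar a ring label must directly follow an atom.
            if prev == "" or prev not in ATOMS:
                return False
            ring_open = not ring_open
        prev = ch
    return depth == 0 and not ring_open


def fraction_passing(n_samples: int = 10_000, length: int = 8, seed: int = 0) -> float:
    """Fraction of uniformly random strings that pass the toy check."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        candidate = "".join(rng.choice(ALPHABET) for _ in range(length))
        hits += looks_valid(candidate)
    return hits / n_samples


if __name__ == "__main__":
    print(looks_valid("C1CCCCC1"))   # ring opened and closed
    print(looks_valid("C(C"))        # unclosed branch
    print(fraction_passing())
```

Even under this lenient check, only a minority of random strings pass, because any string that does not begin with an atom is rejected outright. A robust representation such as SELFIES is designed so that every string of its symbols decodes to some molecule, which is the property the abstract calls 100% robustness.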

    SELFIES and the future of molecular string representations

    Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencIng Embedded Strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.

    Bench-stable N-heterocyclic carbene nickel precatalysts for C−C and C−N bond-forming reactions

    Herein, we introduce a new class of bench-stable N-heterocyclic carbene (NHC) nickel precatalysts for homogeneous nickel catalysis. The nickel(II) complexes are readily activated to Ni(0) in situ under mild conditions via a proposed Heck-type mechanism. The precatalysts are shown to facilitate carbonyl-ene, hydroalkenylation, and amination reactions.
    Funding: NIH Ruth L. Kirschstein National Research Service Award (no. F32GM120852)

    Alkyne-Alkene [2+2] cycloaddition based on visible light photocatalysis

    UV-activated alkyne-alkene [2+2] cycloaddition has served as an important tool to access cyclobutenes. Although it is broadly adopted, the limitations of UV light as an energy source prompted us to explore an alternative method. Here we report alkyne-alkene [2+2] cycloaddition based on visible light photocatalysis, allowing the synthesis of diverse cyclobutenes and 1,3-dienes via inter- and intramolecular reactions. Extensive mechanistic studies suggest that the localized spin densities at the sp² carbons of alkenes account for the productive sensitization of alkenes, despite the similar triplet levels of alkenes and alkynes. Moreover, the efficient formation of 1,3-dienes via tandem triplet activation of the resulting cyclobutenes is observed when intramolecular enyne cycloaddition is performed, which may serve as a complementary means to the Ru(II)-catalyzed enyne metathesis. In addition, the utility of the [2+2] cycloaddition has been demonstrated by several synthetic transformations, including the synthesis of various extended π-systems.