A basic electro-topological descriptor for the prediction of organic molecule geometries by simple machine learning

Abstract

This paper proposes a machine learning (ML) method to predict stable molecular geometries from their chemical composition. The method is useful for generating molecular conformations that can serve as initial geometries. This saves time in expensive quantum mechanical structure optimizations of large molecules. Conformations are found by predicting the local arrangement around each atom in the molecule, after training on a database of previously optimized small molecules. The method divides each molecule in the database into minimal building blocks of different types. The algorithm is then trained to predict bond lengths and angles for each type of building block, using an electro-topological fingerprint as the descriptor. A conformation is then generated by joining the predicted blocks. Our model gives promising predictions of optimized molecular geometries from only the chemical formula and connectivity. The method tends to reproduce interatomic distances within test blocks with RMSD under 0.05
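The abstract does not specify the fingerprint, the block definition, or the learner. The following is therefore only a minimal Python sketch of the kind of pipeline it describes, not the authors' implementation. It makes these assumptions:

- A building block is one central atom plus its bonded neighbours, typed by their elements.
- The per-atom descriptor is a toy E-state-like value that uses Pauling electronegativities in place of an intrinsic state.
- Each block type gets its own 1-nearest-neighbour regressor for bond lengths.

All names (`extract_blocks`, `BlockModel`, `rmsd`, `ELECTRONEGATIVITY`) are hypothetical. Angle prediction and the assembly of blocks into a 3D conformation are omitted.

```python
from collections import defaultdict, deque
from math import sqrt

# Pauling electronegativities stand in for the per-atom "intrinsic state"
# (an assumption for illustration; the paper's fingerprint is not given here).
ELECTRONEGATIVITY = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44}


def topological_distances(n_atoms, bonds):
    """Adjacency lists and all-pairs shortest paths (in bonds) via BFS."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for start in range(n_atoms):
        d = [None] * n_atoms  # None marks atoms in a different fragment
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] is None:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return adj, dist


def electrotopological_values(elements, bonds):
    """Toy E-state-like value per atom: its own electronegativity plus
    differences to every other atom, damped by (topological distance + 1)^2."""
    adj, dist = topological_distances(len(elements), bonds)
    values = []
    for i, ei in enumerate(elements):
        xi = ELECTRONEGATIVITY[ei]
        perturbation = sum(
            (xi - ELECTRONEGATIVITY[ej]) / (dist[i][j] + 1) ** 2
            for j, ej in enumerate(elements)
            if j != i and dist[i][j] is not None
        )
        values.append(xi + perturbation)
    return adj, values


def extract_blocks(elements, bonds):
    """Split a molecule into minimal blocks (central atom + bonded neighbours).
    Returns one (block_type, center, neighbours, descriptor) tuple per atom."""
    adj, values = electrotopological_values(elements, bonds)
    blocks = []
    for center in range(len(elements)):
        if not adj[center]:
            continue  # isolated atoms have no bonds to predict
        # Fixed neighbour order so descriptors and targets line up across blocks.
        nbrs = sorted(adj[center], key=lambda j: (elements[j], values[j]))
        block_type = (elements[center], tuple(elements[j] for j in nbrs))
        descriptor = [values[center]] + [values[j] for j in nbrs]
        blocks.append((block_type, center, nbrs, descriptor))
    return blocks


class BlockModel:
    """One 1-nearest-neighbour regressor per block type (a simple stand-in
    for whatever learner is trained on each block type)."""

    def __init__(self):
        self.samples = defaultdict(list)

    def fit(self, block_type, descriptor, bond_lengths):
        self.samples[block_type].append((descriptor, bond_lengths))

    def predict(self, block_type, descriptor):
        candidates = self.samples.get(block_type)
        if not candidates:
            raise KeyError(f"no training data for block type {block_type}")

        def sq_dist(sample):
            return sum((a - b) ** 2 for a, b in zip(sample[0], descriptor))

        return min(candidates, key=sq_dist)[1]


def rmsd(predicted, reference):
    """Root-mean-square deviation between two equal-length distance lists."""
    n = len(reference)
    return sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


if __name__ == "__main__":
    # Methane from connectivity alone: C (index 0) bonded to four H atoms.
    elements = ["C", "H", "H", "H", "H"]
    blocks = extract_blocks(elements, [(0, 1), (0, 2), (0, 3), (0, 4)])
    model = BlockModel()
    for block_type, _, nbrs, desc in blocks:
        # Illustrative C-H reference length in angstrom, not data from the paper.
        model.fit(block_type, desc, [1.09] * len(nbrs))
    predicted = model.predict(blocks[0][0], blocks[0][3])
    print(blocks[0][0], predicted, rmsd(predicted, [1.09] * 4))
```

Keying the model by block type keeps each regressor small and local. The descriptor then only has to distinguish chemical environments among blocks that already share the same central element and neighbour elements.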
