Energies of the HOMO and LUMO Orbitals for 111725 Organic Molecules Calculated by DFT B3LYP / 6-31G*

Abstract

HOMO and LUMO orbital energies for 111725 organic molecules, calculated at the B3LYP/6-31G*//PM6 or B3LYP/6-31G*//PM7 level of theory.

Related publication:

* Florbela Pereira, Kaixia Xiao, Diogo A. R. S. Latino, Chengcheng Wu, Qingyou Zhang and Joao Aires-de-Sousa:
  Machine Learning Methods to Predict Density Functional Theory B3LYP Energies of HOMO and LUMO Orbitals.
  J. Chem. Inf. Model. (2017)
  DOI: 10.1021/acs.jcim.6b00340 (http://dx.doi.org/10.1021/acs.jcim.6b00340)

This data set is publicly available at http://dx.doi.org/10.6084/m9.figshare.3384184.v1

Files
-----

frontier_orbitals_111725mols_sdf.tar.gz - 111725 molecules in the MDL SDFile format

frontier_orbitals_111725mols.xlsx - HOMO and LUMO orbital energies for 111725 neutral organic molecules

coordinates_111725mols_xyz.zip - atomic coordinates used for the DFT calculation of the 111725 molecules

PM7_frontier_orbitals.xlsx - HOMO and LUMO energies calculated by the PM7 semi-empirical method

Molecules
---------

To create the database, molecular structural motifs were retrieved from organic electronics studies and from collections of dyes, metabolites and electrophiles/nucleophiles [1-5]. The database was then populated by retrieving similar examples from the ZINC database [6] and the PubChem database [7], and by computationally combining motifs with lists of substituents using the ChemAxon Reactor software (JChem 15.4.6, 2015, ChemAxon, http://www.chemaxon.com). The structures were standardized with ChemAxon Standardizer (JChem 15.4.6, 2015, ChemAxon, http://www.chemaxon.com) and OpenBabel (Open Babel Package, version 2.3.1, http://openbabel.org) for neutralization and inclusion of all hydrogen atoms.
The molecular structures include the elements C, H, B, N, O, F, Si, P, S, Cl, Se, and Br.

Molecular geometries were relaxed by the PM6 or PM7 method using the MOPAC software [8], and orbital energies were calculated by the GAMESS program [9] with the B3LYP functional and the 6-31G* basis set. The DFT calculations used the geometries obtained with the PM6 or PM7 semi-empirical method.

Format
------

Each molecule is stored in its own file, ending in ".sdf". These are the starting structures, prior to geometry relaxation with the MOPAC program.

The format is the standard MDL SDFile generated with ChemAxon Standardizer and OpenBabel.

The atomic coordinates obtained with the PM6 and PM7 methods are stored in files ending in ".xyz", one for each molecule. Each file comprises a header line specifying the number of atoms n, a line with the id of the structure, and n lines containing the element and atomic coordinates, one atom per line.

Orbital energies are stored in the frontier_orbitals_111725mols.xlsx file. Separate sheets hold the main database and the data set used as the final test set in the related publication. PM7 values are stored in PM7_frontier_orbitals.xlsx in the same format.

Column Content of .xlsx files
------

1 Molecule ID (as it appears in the corresponding .sdf file name)

2 HOMO energy in eV

3 LUMO energy in eV

References
----------

[1] Po R, Bianchi G, Carbonera C, Pellegrino A: All that glisters is not gold: an analysis of the synthetic complexity of efficient polymer donors for polymer solar cells. Macromolecules 2015, 48:453-461.

[2] Hachmann J, Olivares-Amaya R, Atahan-Evrenk S, Amador-Bedolla C, Sanchez-Carrera RS, Gold-Parker A, Vogt L, Brockway AM, Aspuru-Guzik A: The Harvard Clean Energy Project: large-scale computational screening and design of organic photovoltaics on the world community grid.
J Phys Chem Lett 2011, 2:2241-2251.

[3] O’Boyle NM, Campbell CM, Hutchison GR: Computational design and selection of optimal organic photovoltaic materials. J Phys Chem C 2011, 115:16200-16210.

[4] Mayr H, Ofial AR: Kinetics of electrophile-nucleophile combinations: a general approach to polar organic reactivity. Pure Appl Chem 2005, 77:1807-1821.

[5] Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 2000, 28:27-30.

[6] Irwin JJ, Sterling T, Mysinger MM, Bolstad ES, Coleman RG: ZINC: a free tool to discover chemistry for biology. J Chem Inf Model 2012, 52:1757-1768.

[7] Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH: PubChem Substance and Compound databases. Nucleic Acids Res 2016, 44(D1):D1202-13.

[8] MOPAC2009 and MOPAC2012, James J. P. Stewart, Stewart Computational Chemistry, Colorado Springs, CO, USA, http://OpenMOPAC.net (2008-2012).

[9] Schmidt MW, Baldridge KK, Boatz JA, Elbert ST, Gordon MS, Jensen JH, Koseki S, Matsunaga N, Nguyen KA, Su S, Windus TL, Dupuis M, Montgomery JA: General atomic and molecular electronic structure system. J Comput Chem 1993, 14:1347-1363. GAMESS Version 1 May 2013 (R1).
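
The ".xyz" layout described under Format (atom count, structure id, then one line per atom) can be read with a few lines of Python. The following is a minimal sketch under stated assumptions, not a tool shipped with the data set: the names parse_xyz and XYZStructure are hypothetical, fields are assumed to be whitespace-separated, and the sample coordinates and id are illustrative values that do not come from the database.

```python
from typing import List, NamedTuple, Tuple


class XYZStructure(NamedTuple):
    """One molecule read from an .xyz file (hypothetical helper type)."""
    mol_id: str
    atoms: List[Tuple[str, float, float, float]]  # (element, x, y, z)


def parse_xyz(text: str) -> XYZStructure:
    """Parse the layout described above: n, id line, then n atom lines.

    Assumes whitespace-separated fields on each atom line.
    """
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])   # header line: number of atoms
    mol_id = lines[1].strip()            # second line: structure id
    atom_lines = lines[2:2 + n_atoms]
    if len(atom_lines) != n_atoms:
        raise ValueError(
            f"expected {n_atoms} atom lines, found {len(atom_lines)}"
        )
    atoms = []
    for line in atom_lines:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return XYZStructure(mol_id, atoms)


if __name__ == "__main__":
    # Illustrative input only; the id and coordinates are made up.
    sample = (
        "3\n"
        "example_0001\n"
        "O 0.000 0.000 0.117\n"
        "H 0.000 0.757 -0.469\n"
        "H 0.000 -0.757 -0.469\n"
    )
    structure = parse_xyz(sample)
    print(structure.mol_id, len(structure.atoms))
```

To process a real file, the same function could be called on the text returned by open(path).read(). Checking the atom count against the header is a cheap way to catch truncated files.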
