Rapid, non-destructive characterization of the molecular-level chemistry of
organic matter (OM) is experimentally challenging. Raman spectroscopy is one of
the most widely used techniques for non-destructive chemical characterization,
but it cannot currently identify individual molecular components in OM because
of the combined limitations of diffraction-limited spatial resolution and the
poor applicability of peak-fitting algorithms. Here, we develop
a genome-inspired collective molecular structure fingerprinting approach that
uses ab initio calculations and data mining techniques to extract molecular-level
chemistry from the Raman spectra of OM. We demonstrate this approach by
identifying representative molecular fingerprints in OM whose molecular
chemistry has so far been inaccessible to non-destructive characterization
techniques. Chemical properties such as aromatic cluster size
distribution and H/C ratio can now be quantified directly using the identified
molecular fingerprints. Our approach will enable non-destructive identification
of chemical signatures correlated with the preservation of biosignatures in OM,
accurate detection and quantification of environmental contamination, and
objective assessment of OM with respect to its chemical content.
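As a rough illustration of how fingerprint-based quantification could work, the sketch below treats a measured spectrum as a nonnegative linear combination of reference spectra and then derives a bulk H/C ratio and an aromatic cluster size (ring count) distribution from the fitted weights. This is not the authors' method. The three-molecule library, the Gaussian band positions and intensities, the use of non-negative least squares (`scipy.optimize.nnls`) in place of the paper's data-mining step, and the assumption that fitted weights are proportional to molar abundance (ignoring differences in Raman cross-section) are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

# HYPOTHETICAL fingerprint library. The formulas and ring counts are real for
# these PAHs, but the (Raman shift in cm^-1, relative intensity) bands are
# placeholders, not ab initio results.
LIBRARY = {
    "naphthalene": {"C": 10, "H": 8, "rings": 2, "bands": [(1380.0, 1.0), (1580.0, 0.6)]},
    "pyrene": {"C": 16, "H": 10, "rings": 4, "bands": [(1240.0, 0.5), (1405.0, 1.0), (1630.0, 0.7)]},
    "coronene": {"C": 24, "H": 12, "rings": 7, "bands": [(1350.0, 1.0), (1600.0, 0.9)]},
}


def reference_spectrum(bands, shift, width=15.0):
    """Sum of Gaussian bands evaluated on the Raman shift axis."""
    spectrum = np.zeros_like(shift)
    for center, height in bands:
        spectrum += height * np.exp(-0.5 * ((shift - center) / width) ** 2)
    return spectrum


def decompose(measured, shift, library=LIBRARY):
    """Fit measured ~= A @ w with w >= 0, one column of A per reference molecule."""
    names = list(library)
    A = np.column_stack([reference_spectrum(library[n]["bands"], shift) for n in names])
    weights, residual_norm = nnls(A, measured)
    return dict(zip(names, weights)), residual_norm


def bulk_properties(weights, library=LIBRARY):
    """H/C ratio and ring-count distribution, ASSUMING weights ~ molar abundance."""
    total = sum(weights.values())
    fractions = {n: w / total for n, w in weights.items()}
    # Abundance-weighted average numbers of C and H atoms per molecule.
    carbon = sum(f * library[n]["C"] for n, f in fractions.items())
    hydrogen = sum(f * library[n]["H"] for n, f in fractions.items())
    # Molar fraction of molecules grouped by aromatic ring count.
    clusters = {}
    for n, f in fractions.items():
        rings = library[n]["rings"]
        clusters[rings] = clusters.get(rings, 0.0) + f
    return hydrogen / carbon, clusters


if __name__ == "__main__":
    shift = np.linspace(1000.0, 1800.0, 801)
    # Synthetic, noise-free "measurement" built from known proportions.
    true = {"naphthalene": 0.2, "pyrene": 0.5, "coronene": 0.3}
    measured = sum(w * reference_spectrum(LIBRARY[n]["bands"], shift) for n, w in true.items())
    weights, residual = decompose(measured, shift)
    hc, clusters = bulk_properties(weights)
    print({n: round(float(w), 3) for n, w in weights.items()})
    print(f"H/C = {hc:.3f}", clusters)
```

With this noise-free synthetic input the fit recovers the 0.2/0.5/0.3 proportions, giving H/C = 10.2/17.2 ≈ 0.593. Real OM spectra add noise, baselines and band broadening, and a realistic library would hold many more computed spectra, so this toy decomposition only conveys the general idea of reading bulk properties off fitted fingerprint weights.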