The identification of metabolites from complex biological samples often
involves matching experimental mass spectrometry data to signatures of
compounds drawn from massive chemical databases. However, misidentifications
can arise because the size and complexity of chemical space yield databases
containing many compounds with nearly identical structures. Prior
knowledge of compounds that may be enzymatically consumed or produced by an
organism can help reduce misidentifications by restricting initial database
searching to compounds that are likely to be present in a biological system.
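To illustrate why restricting the search space helps, here is a minimal sketch that matches an observed neutral mass against a database within a parts-per-million (ppm) tolerance. The mini database, the `ORGANISM_COMPOUNDS` set, the 5 ppm tolerance, and the function names are all hypothetical. Real annotation uses far more evidence than mass alone. The point is only that isomers such as glucose, galactose, and myo-inositol cannot be told apart by mass, and a prior list of organism-relevant compounds shrinks the candidate set.

```python
# Hypothetical mini database: compound name -> monoisotopic neutral mass (Da).
# Glucose, galactose and myo-inositol are isomers (C6H12O6) with identical masses.
FULL_DB = {
    "glucose": 180.06339,
    "galactose": 180.06339,
    "myo-inositol": 180.06339,
    "citrate": 192.02700,
}

# Hypothetical set of compounds predicted (e.g. from an organism's proteome)
# to be enzymatically consumed or produced.
ORGANISM_COMPOUNDS = {"glucose", "citrate"}


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error of an observed mass relative to a theoretical one, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def match(observed: float, db: dict[str, float], tol_ppm: float = 5.0) -> list[str]:
    """Sorted names of all database entries within tol_ppm of the observed mass."""
    return sorted(
        name for name, mass in db.items() if abs(ppm_error(observed, mass)) <= tol_ppm
    )


# Restrict the search to compounds expected in the organism.
restricted_db = {name: m for name, m in FULL_DB.items() if name in ORGANISM_COMPOUNDS}

print(match(180.0634, FULL_DB))        # ['galactose', 'glucose', 'myo-inositol']
print(match(180.0634, restricted_db))  # ['glucose']
```

Against the full database the observed mass is ambiguous among three isomers; against the restricted set only one candidate remains.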
While databases such as UniProt allow for the identification of small molecules
that may be consumed or generated by enzymes encoded in an organism's genome,
no tool currently exists that both retrieves the SMILES strings of metabolites
associated with given protein identifiers and expands substructures containing
R groups into fully defined, biologically relevant chemical structures. Here we present
Proteome2Metabolome (P2M), a tool that performs these tasks by querying external
databases behind a simple command-line interface. Beyond mass spectrometry-based
applications, P2M can be used more generally to identify biologically relevant
chemical structures likely to be observed in a biological system.
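The abstract does not say how P2M chooses replacements for R groups. As a rough illustration of what "expanding" an R-containing structure can mean, the sketch below enumerates every combination of a small, hypothetical substituent list for each wildcard atom (`*` or `[*]`, the SMILES wildcard commonly used to write an R group). This is a naive string substitution, not P2M's method. It assumes each wildcard is a terminal atom bonded to the atom written before it, with no ring-closure digits, and it makes no attempt to keep only biologically relevant results. A cheminformatics toolkit such as RDKit would be the more robust choice in practice.

```python
import itertools
import re

# Matches a bare "*" or a bracketed "[*]" wildcard atom.
WILDCARD = re.compile(r"\[\*\]|\*")

# Hypothetical substituent library: SMILES fragments attached via their first atom.
SUBSTITUENTS = ("C", "CC", "CCC")


def expand_r_groups(smiles: str, substituents=SUBSTITUENTS) -> list[str]:
    """Return every SMILES obtained by filling each wildcard with a substituent.

    Naive string substitution: assumes each wildcard is a terminal atom bonded
    to the atom written before it (e.g. "OC(=O)*"), with no ring-closure
    digits, charges, or atom maps on the wildcard.
    """
    parts = WILDCARD.split(smiles)  # text between wildcards
    n_sites = len(parts) - 1
    results = []
    # One output per combination of substituents across all wildcard sites.
    for combo in itertools.product(substituents, repeat=n_sites):
        pieces = [parts[0]]
        for fragment, rest in zip(combo, parts[1:]):
            pieces.append(fragment)
            pieces.append(rest)
        results.append("".join(pieces))
    return results


# A generic carboxylic acid R-C(=O)OH expands to acetic, propanoic and butanoic acid.
print(expand_r_groups("OC(=O)*"))  # ['OC(=O)C', 'OC(=O)CC', 'OC(=O)CCC']
```

With two wildcards the number of outputs grows as the number of substituents squared, which is one reason a real tool would need some notion of biological relevance to prune the expansion.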