Metabolic pathfinding is the task of finding preferred metabolic pathways in
large metabolic reaction databases. Representing metabolism via networks
enables quick enumeration of paths between two compounds. Automated
pathfinding helps in working with ever-increasing databases of reactions and
in finding novel pathways for metabolic engineering. However, the number
of pathways between two compounds can reach 500,000 in some metabolic
models, and grows further with the size of the input database, which makes
it imperative that the most relevant ones are ranked highly. While
graph-theoretic representations of metabolic networks make pathway
enumeration fast and easy, they also introduce biochemically meaningless
shortcuts through pool or currency metabolites.
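The shortcut problem can be illustrated with a minimal sketch. The reaction list below is a hand-simplified fragment of upper glycolysis (cofactors other than ATP/ADP are omitted), and the function names `build_graph` and `shortest_path` are hypothetical helpers for this illustration, not part of any method described here. Each reaction contributes a directed edge from every substrate to every product, and a plain breadth-first search is used to find the shortest path.

```python
from collections import deque

# Simplified reactions as (substrates, products); an illustrative toy set only.
REACTIONS = [
    (["glucose", "ATP"], ["G6P", "ADP"]),   # hexokinase
    (["G6P"], ["F6P"]),                     # phosphoglucose isomerase
    (["F6P", "ATP"], ["FBP", "ADP"]),       # phosphofructokinase
    (["FBP"], ["GAP", "DHAP"]),             # aldolase
    (["GAP"], ["1,3-BPG"]),                 # GAPDH (NAD+/Pi omitted)
    (["1,3-BPG", "ADP"], ["3PG", "ATP"]),   # phosphoglycerate kinase
]

def build_graph(reactions, excluded=frozenset()):
    """Directed compound graph: one edge per substrate-product pair."""
    graph = {}
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                if s not in excluded and p not in excluded:
                    graph.setdefault(s, set()).add(p)
    return graph

def shortest_path(graph, source, target):
    """Breadth-first search returning one shortest path, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Naive graph: the search jumps through ADP, which is biochemically meaningless.
naive = shortest_path(build_graph(REACTIONS), "glucose", "3PG")

# Excluding currency metabolites by hand recovers the actual route.
curated = shortest_path(
    build_graph(REACTIONS, excluded={"ATP", "ADP"}), "glucose", "3PG"
)
```

In this toy network the naive search returns a two-step route through ADP, while the six-step glycolytic route appears only once ATP and ADP are removed, which is the kind of manual intervention that does not scale to large databases.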
In the past, strategies to circumvent such irrelevant pathways have included
weighting networks by node degree or manually curating the edges of the
metabolic network. The former wrongly penalizes some primary metabolites
central to metabolism, while the latter requires laborious manual effort.
The KEGG RPAIR database annotates reactions in terms of reactant pairs and
has been used for metabolic pathfinding. Here, I first compare several
centrality measures for identifying currency metabolites and find one that
performs better than degree centrality. I then describe a method that
augments KEGG RPAIR-based pathfinding with a chemical composition score, and
evaluate its ability to supplement and replace RPAIRs in pathfinding. The new algorithm is validated
against a set of 30 biochemical pathways in E. coli. Since this method uses
chemical composition as a fallback measure, it can be used in the absence of
explicit RPAIR information, thus allowing the identification of putative paths
not possible via methods using the RPAIR database alone.
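The fallback idea can be sketched as follows. The abstract does not define the composition score, so the measure below (a multiset Tanimoto similarity over element counts) is an assumption chosen purely for illustration, and `ANNOTATED_PAIRS`, `composition_similarity`, and `edge_score` are hypothetical names. The sketch assumes that an edge with an explicit reactant-pair annotation gets full score, and that any other edge is scored by how much of its elemental composition the two compounds share.

```python
import re
from collections import Counter

# Molecular formulas for a few compounds (neutral forms).
FORMULAS = {
    "glucose": "C6H12O6",
    "G6P": "C6H13O9P",
    "ATP": "C10H16N5O13P3",
    "ADP": "C10H15N5O10P2",
}

# Hypothetical set of substrate-product pairs that carry an explicit
# reactant-pair annotation; here only ATP -> ADP is assumed to be annotated.
ANNOTATED_PAIRS = {("ATP", "ADP")}

def parse_formula(formula):
    """Count atoms per element in a simple formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def composition_similarity(a, b):
    """Multiset Tanimoto similarity of element counts (an assumed score)."""
    ca, cb = parse_formula(FORMULAS[a]), parse_formula(FORMULAS[b])
    shared = sum((ca & cb).values())   # element-wise minimum
    total = sum((ca | cb).values())    # element-wise maximum
    return shared / total

def edge_score(a, b, annotated=ANNOTATED_PAIRS):
    """Use the annotation when present, else fall back to composition."""
    if (a, b) in annotated:
        return 1.0
    return composition_similarity(a, b)

main_edge = edge_score("glucose", "G6P")      # no annotation: fallback used
shortcut_edge = edge_score("glucose", "ADP")  # no annotation: fallback used
```

Under this assumed score the glucose to G6P edge rates higher than the glucose to ADP edge even though neither has an annotation, which is the behaviour a composition-based fallback is meant to provide; a real score would likely need to handle hydrogen counts, charge states, and generic formulas more carefully.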