A robust pathfinding algorithm using chemical composition

Abstract

Metabolic pathfinding is the task of finding preferred metabolic pathways in large metabolic reaction databases. Representing metabolism as a network enables quick enumeration of paths between two compounds. Automated pathfinding helps in working with ever-growing databases of reactions and in finding novel pathways for metabolic engineering. However, the number of pathways between two compounds can be as large as 500,000 in some metabolic models, and grows further with the size of the input database, which makes it imperative that the most relevant pathways are ranked highly. While graph-theoretic representations of metabolic networks bring speed and ease to the enumeration of pathways, they also create the challenge of biochemically insensible shortcuts through pool or currency metabolites. In the past, strategies to circumvent such irrelevant pathways have included weighting networks by the degree of nodes or manually curating the edges of the metabolic network. The former method wrongfully penalizes some primary metabolites central to metabolism, while the latter requires manual curation effort. The KEGG RPAIR database annotates reactions in terms of reactant pairs and has been used for metabolic pathfinding. Here, I first study several centrality measures for identifying currency metabolites and identify one that performs better than degree centrality. I then describe a method that augments KEGG RPAIR-based pathfinding with a chemical composition score, and evaluate its ability to complement or replace RPAIRs in pathfinding. The new algorithm is validated against a set of 30 biochemical pathways in E. coli. Since this method uses chemical composition as a fallback measure, it can be used in the absence of explicit RPAIR information, thus allowing the identification of putative paths not possible via methods using the RPAIR database alone.
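The shortcut problem described above can be illustrated with a minimal sketch. This is not the algorithm proposed in this work: it is a toy example, assuming a hand-written and simplified glycolysis reaction list, a hand-picked currency metabolite set, and hypothetical helper names (`build_graph`, `shortest_path`). It only shows why an unweighted compound graph yields a biochemically insensible route and how excluding currency metabolites changes the result.

```python
from collections import deque

# Simplified glycolysis as (substrates, products). Hand-written toy data,
# not an export from KEGG or any other database.
REACTIONS = [
    (["glucose", "ATP"], ["G6P", "ADP"]),      # hexokinase
    (["G6P"], ["F6P"]),                        # phosphoglucose isomerase
    (["F6P", "ATP"], ["FBP", "ADP"]),          # phosphofructokinase
    (["FBP"], ["DHAP", "G3P"]),                # aldolase
    (["G3P", "NAD+", "Pi"], ["BPG", "NADH"]),  # GAPDH
    (["BPG", "ADP"], ["3PG", "ATP"]),          # phosphoglycerate kinase
    (["3PG"], ["2PG"]),                        # phosphoglycerate mutase
    (["2PG"], ["PEP", "H2O"]),                 # enolase
    (["PEP", "ADP"], ["pyruvate", "ATP"]),     # pyruvate kinase
]

# Assumption: a manually chosen currency set, standing in for manual curation.
CURRENCY = {"ATP", "ADP", "NAD+", "NADH", "Pi", "H2O"}


def build_graph(reactions, excluded=frozenset()):
    """Directed compound graph: every substrate points to every product."""
    graph = {}
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                if s in excluded or p in excluded:
                    continue
                graph.setdefault(s, set()).add(p)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


naive = shortest_path(build_graph(REACTIONS), "glucose", "pyruvate")
curated = shortest_path(build_graph(REACTIONS, CURRENCY), "glucose", "pyruvate")
print(" -> ".join(naive))
print(" -> ".join(curated))
```

In this toy network the unfiltered search jumps from glucose to pyruvate through ADP, because ADP is a product of hexokinase and a substrate of pyruvate kinase. With the currency set excluded, the search follows the carbon backbone through the glycolytic intermediates instead.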
