Structural bioinformatics

Extending P450 site-of-metabolism models with region-resolution data


Motivation: Cytochrome P450s are a family of enzymes responsible for the metabolism of approximately 90% of FDA-approved drugs. Medicinal chemists often want to know which atoms of a molecule (its metabolized sites) are oxidized by Cytochrome P450s in order to modify their metabolism. Consequently, there are several methods that use literature-derived, atom-resolution data to train models that can predict a molecule's sites of metabolism. There is, however, much more data available at a lower resolution, where the exact site of metabolism is not known, but the region of the molecule that is oxidized is known. Until now, no site-of-metabolism models made use of region-resolution data.

Results: Here, we describe XenoSite-Region, the first reported method for training site-of-metabolism models with region-resolution data. Our approach uses the Expectation Maximization algorithm to train a site-of-metabolism model. Region-resolution metabolism data was simulated from a large site-of-metabolism dataset, containing 2000 molecules with 3400 metabolized and 30 000 unmetabolized sites and covering nine Cytochrome P450 isozymes. When training on the same molecules (but with only region-level information), we find that this approach yields models almost as accurate as models trained with atom-resolution data. Moreover, we find that atom-resolution trained models are more accurate when also trained with region-resolution data from additional molecules. Our approach, therefore, opens up a way to extend the applicable domain of site-of-metabolism models into larger regions of chemical space. This meets a critical need in drug development by tapping into underutilized data commonly available in most large drug companies.

Availability and implementation: The algorithm, data and a web server are available a
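The idea of training on region-level labels with Expectation Maximization can be illustrated with a small sketch. This is not the XenoSite-Region implementation: it swaps in a plain logistic regression as the atom-level model, assumes exactly one metabolized atom per metabolized region, and uses hypothetical names (`e_step`, `m_step`, `train_em`, `TOY_REGIONS`) and a made-up single "reactivity" feature. The E-step spreads each metabolized region's label over its atoms in proportion to the current predictions; the M-step refits the model to those soft targets.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, features):
    # Probability that an atom is a site of metabolism under the current model.
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def e_step(weights, regions):
    # Turn region-level labels into soft atom-level targets.
    # Assumption: a metabolized region contains exactly one metabolized atom.
    targets = []
    for atoms, metabolized in regions:
        if not metabolized:
            targets.extend((x, 0.0) for x in atoms)
            continue
        scores = [predict(weights, x) for x in atoms]
        total = sum(scores)
        if total == 0.0:
            # Degenerate case: fall back to a uniform split.
            targets.extend((x, 1.0 / len(atoms)) for x in atoms)
        else:
            targets.extend((x, s / total) for x, s in zip(atoms, scores))
    return targets

def m_step(weights, targets, lr=1.0, epochs=200):
    # Refit the logistic model to the soft targets by gradient descent
    # on the cross-entropy loss.
    weights = list(weights)
    n = len(targets)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, t in targets:
            err = predict(weights, x) - t
            for k, xk in enumerate(x):
                grad[k] += err * xk
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

def train_em(regions, n_features, iterations=20):
    # Alternate E- and M-steps, starting from an uninformative model.
    weights = [0.0] * n_features
    for _ in range(iterations):
        targets = e_step(weights, regions)
        weights = m_step(weights, targets)
    return weights

# Hypothetical toy data: each region is (atoms, metabolized?), and each atom
# is [bias, reactivity]. Only the region label is observed, never the atom.
TOY_REGIONS = [
    ([[1.0, 0.90], [1.0, 0.10], [1.0, 0.20]], True),
    ([[1.0, 0.10], [1.0, 0.80]], True),
    ([[1.0, 0.20], [1.0, 0.10], [1.0, 0.15]], False),
    ([[1.0, 0.05], [1.0, 0.25]], False),
    ([[1.0, 0.85], [1.0, 0.20]], True),
]

weights = train_em(TOY_REGIONS, n_features=2)
first_region_atoms = TOY_REGIONS[0][0]
ranked = sorted(range(len(first_region_atoms)),
                key=lambda i: predict(weights, first_region_atoms[i]),
                reverse=True)
print("atom ranking in first region:", ranked)
```

In this toy setting the model never sees which atom is oxidized, yet the unmetabolized regions supply enough negative evidence for it to rank the high-reactivity atoms first. A real model would use many chemical descriptors per atom and a more flexible learner.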

Last time updated on 12/04/2017

This paper was published in CiteSeerX.
