Building Quantitative Prediction Models for Tissue Residue of Two Explosives Compounds in Earthworms from Microarray Gene Expression Data
- Publication date
- Publisher
Abstract
Soil contamination near munitions plants and testing grounds is a serious environmental concern that can result in the formation of tissue chemical residue in exposed animals. Quantitative prediction of tissue residue still represents a challenging task despite long-term interest and pursuit, as tissue residue formation is the result of many dynamic processes including uptake, transformation, and assimilation. The availability of high-dimensional microarray gene expression data presents a new opportunity for computational predictive modeling of tissue residue from changes in expression profile. Here we analyzed a 240-sample data set with measurements of transcriptomic-wide gene expression and tissue residue of two chemicals, 2,4,6-trinitrotoluene (TNT) and 1,3,5-trinitro-1,3,5-triazacyclohexane (RDX), in the earthworm <i>Eisenia fetida</i>. We applied two different computational approaches, LASSO (Least Absolute Shrinkage and Selection Operator) and RF (Random Forest), to identify predictor genes and built predictive models. Each approach was tested alone and in combination with a prior variable selection procedure that involved the Wilcoxon rank-sum test and HOPACH (Hierarchical Ordered Partitioning And Collapsing Hybrid). Model evaluation results suggest that LASSO was the best performer of minimum complexity on the TNT data set, whereas the combined Wilcoxon-HOPACH-RF approach achieved the highest prediction accuracy on the RDX data set. Our models separately identified two small sets of ca. 30 predictor genes for RDX and TNT. We have demonstrated that both LASSO and RF are powerful tools for quantitative prediction of tissue residue. They also leave more unknown than explained, however, allowing room for improvement with other computational methods and extension to mixture contamination scenarios