Explicable Machine Learning for Predicting High-Efficiency Lignocellulose Pretreatment Solvents Based on Kamlet–Taft and Polarity Parameters
An intrinsic relationship model linking the Kamlet–Taft parameters and polarity values of
104 deep eutectic solvents (DESs) to pretreatment efficiency was developed by combining
density functional theory (DFT) and machine learning (ML). The aim was to screen DESs with
high lignocellulose pretreatment efficiency through the combined effect of hydrogen bond
acidity (α), hydrogen bond basicity (β), dipolarity/polarizability (π*), and the molecular
polarity index (MPI). Partial least-squares (PLS) regression and several other ML models
were used to predict cellulose retention and delignification. The XGBoost model gave the
best predictive performance, with R² values of 0.97 for cellulose retention and 0.91 for
delignification.
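The abstract reports R² without saying whether it was computed on a held-out test set or by cross-validation. As a reference point, here is a minimal sketch of the standard coefficient of determination, R² = 1 − SS_res/SS_tot, in plain Python. The function name r_squared and the values below are made up for illustration and are not the study's data.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation from the mean of the observations.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical cellulose retention values (%), observed vs. predicted.
observed = [80.0, 85.0, 90.0, 95.0]
predicted = [81.0, 84.0, 91.0, 94.0]
print(round(r_squared(observed, predicted), 3))  # 0.968
```

A model that always predicts the mean of the observations scores 0, and a model worse than that baseline scores below 0. Values of 0.97 and 0.91 therefore mean that most of the variance in each target is explained on whichever data split was used.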
Feature importance analysis and partial dependence analysis based on the XGBoost model were
used to interpret the contribution of each variable. Feature importance analysis showed that
the α, β, and π* of the DES and the MPI of the hydrogen bond donor (HBD) determined
pretreatment efficiency. Partial dependence analysis showed that the relationships between
these four parameters and pretreatment efficiency are nonlinear, with multiple extrema in
different intervals. The model identified the parameter ranges corresponding to high
pretreatment efficiency. Using the ranges of the four parameters given in this study, the
composition and ratio of a DES can be selected to retain at least 80% of the cellulose while
removing at least 50% of the lignin. Molecular simulations showed that these highly efficient
DESs typically contain many hydrogen bonds and highly polar groups.
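The interpretation and screening steps above can be illustrated with a model-agnostic sketch. A partial dependence curve for one descriptor is the model's average prediction over the dataset when that descriptor is fixed at each grid value. Screening then keeps only the DESs predicted to reach the 80% cellulose retention and 50% delignification targets from the abstract. The abstract does not list the actual parameter ranges, so this sketch screens on model predictions. The functions partial_dependence and screen, the linear toy models standing in for the fitted XGBoost models, the key names (such as mpi_hbd), and all descriptor values are hypothetical.

```python
from typing import Callable, Dict, List, Mapping, Sequence

# One DES described by the four descriptors named in the abstract (hypothetical keys).
Features = Dict[str, float]  # keys: "alpha", "beta", "pi_star", "mpi_hbd"
Model = Callable[[Features], float]


def partial_dependence(model: Model, data: Sequence[Features],
                       feature: str, grid: Sequence[float]) -> List[float]:
    """Average prediction over `data` with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in data]
        curve.append(sum(preds) / len(preds))
    return curve


def screen(candidates: Mapping[str, Features], retention: Model,
           delignification: Model, min_retention: float = 80.0,
           min_delignification: float = 50.0) -> List[str]:
    """Names of candidates predicted to meet both targets (values in %)."""
    return [name for name, x in candidates.items()
            if retention(x) >= min_retention
            and delignification(x) >= min_delignification]


# Hypothetical stand-ins for the fitted models: linear toys with made-up
# coefficients, so their PD curves are straight lines (a tree ensemble's
# generally would not be).
def toy_retention(x: Features) -> float:
    return 95.0 - 20.0 * x["alpha"]


def toy_delignification(x: Features) -> float:
    return 40.0 * x["beta"] + 30.0 * x["alpha"]


# Hypothetical descriptor values, not measured data.
candidates = {
    "DES_A": {"alpha": 0.6, "beta": 0.9, "pi_star": 1.0, "mpi_hbd": 10.0},
    "DES_B": {"alpha": 1.2, "beta": 0.9, "pi_star": 1.1, "mpi_hbd": 14.0},
    "DES_C": {"alpha": 0.2, "beta": 0.3, "pi_star": 0.9, "mpi_hbd": 8.0},
}

pd_beta = partial_dependence(toy_delignification, list(candidates.values()),
                             "beta", [0.0, 0.5, 1.0])
print([round(v, 1) for v in pd_beta])  # [20.0, 40.0, 60.0]
print(screen(candidates, toy_retention, toy_delignification))  # ['DES_A']
```

With a real tree ensemble in place of the toy functions, the same averaging would produce the nonlinear, multi-extremum curves the abstract describes. DES_B fails the retention target and DES_C fails the delignification target.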