The Study on the Partial Least Squares Regression, Principal Component Regression, and Neural Networks to Analyze the Near-Infrared Spectrum Data

Abstract

In recent years, spectroscopic instruments such as near-infrared analyzers have been used increasingly often to measure the content of chemical constituents in materials, and analyzing such data requires multivariate calibration techniques. This study uses three calibration models: principal component regression (PCR), partial least squares regression (PLSR), and neural networks (NN). It also applies the concept of Gnanadesikan (1997) to modify principal component analysis and partial least squares. The resulting additive principal components and linear-quadratic partial least squares are used to build calibration models for near-infrared spectral data on the crude protein content of brown rice grains. The results show that Gnanadesikan's (1997) expanded-matrix concept does improve the PCR model, although the PLSR model still performs better in comparison.

In addition, this thesis examines neural networks, one of many nonlinear calibration techniques, using the original 351 wavebands, the components from principal component analysis, and the scores from partial least squares in turn as input variables. The resulting NN models did not perform well, so the regression coefficients of the PCR and PLSR models were then used to replace the initial weights between the input and hidden layers of the NN. The NN whose initial input-to-hidden weights were the PLSR regression coefficients performed best and also learned faster, so using regression coefficients does improve the ability of the NN to build calibration models.

Near-infrared reflectance spectroscopy (NIRS) has been widely used for quantitative chemometric applications in recent years. In this work, calibration methods including principal component regression (PCR), partial least squares regression (PLSR), additive principal component regression (APCR), linear-quadratic partial least squares regression (LQ-PLSR), and neural networks (NN) were used in conjunction with near-infrared reflectance spectroscopy to determine the protein content of brown rice. Some calibration methods are insensitive to the effects of non-linearity. Such is the case with the model developed by Gnanadesikan (1977), which expands the X matrix with the squares of the variables. The projection of the additive principal components (APCs) and linear-quadratic partial least squares (LQ-PLS) components on a surface in the expanded space corresponds to that of the original X matrix in a quadratic space. The LQ-PLS regression preserves a linear internal relationship between the scores of the X and Y matrices. Based on the results, additive principal component regression performed better than principal component regression, and partial least squares regression performed best.
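The expansion of the X matrix with squared variables can be illustrated with a minimal Python sketch. This is not the thesis's SAS/IML code: the function names (`expand_with_squares`, `pcr_fit`, `rmse`), the synthetic data, and the choice of two components are all assumptions made for illustration. It fits ordinary PCR and PCR on the squared-expanded matrix (in the spirit of APCR) to a response that depends quadratically on a single latent variable.

```python
import numpy as np

def expand_with_squares(X, center):
    """Center X with the given column means, then append the squared columns."""
    Xc = X - center
    return np.hstack([Xc, Xc ** 2])

def pcr_fit(X, y, n_components):
    """Fit principal component regression and return a prediction function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_components].T
    scores = (X - x_mean) @ V
    # Regress the centered response on the retained scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V @ gamma
    return lambda X_new: (X_new - x_mean) @ beta + y_mean

def rmse(y_true, y_pred):
    """Root mean squared error of prediction."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical synthetic data: five collinear "bands" driven by one latent
# variable t, with a response that is quadratic in t.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
loadings = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
X = np.outer(t, loadings) + 0.01 * rng.standard_normal((200, 5))
y = t ** 2 + 0.01 * rng.standard_normal(200)

# Simple hold-out split; the training means are reused for the test set.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]
center = X_train.mean(axis=0)

pcr = pcr_fit(X_train, y_train, n_components=2)
pcr_rmse = rmse(y_test, pcr(X_test))

apcr = pcr_fit(expand_with_squares(X_train, center), y_train, n_components=2)
apcr_rmse = rmse(y_test, apcr(expand_with_squares(X_test, center)))

print(f"PCR RMSE:  {pcr_rmse:.3f}")
print(f"APCR RMSE: {apcr_rmse:.3f}")
```

On data of this kind the squared columns give the principal components access to the curvature that the linear scores cannot represent. This is why the expanded model is expected to predict better here, but the toy example says nothing about the brown rice data themselves.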
In addition, neural networks that took the scores of principal component analysis and partial least squares as inputs were compared. The neural network approach was found not to be effective. Hence the regression coefficients of PCR and PLSR were used as the initial weights between the input and hidden layers of the neural network model. The neural network model based on the regression coefficients of PLSR performed best, so it was the best choice for calibration modeling.

Chinese abstract
English abstract
Chapter 1 Introduction
Chapter 2 Literature Review
  I. Linear calibration methods
    (A) Principal component regression
    (B) Partial least squares
  II. Nonlinear calibration methods
    (A) Neural networks
      1. Principles of neural networks
      2. Back-propagation networks
        (1) Back-propagation algorithm
        (2) Parameter settings for back-propagation networks
  III. Validation of prediction models
    (A) Internal validation
    (B) External validation
Chapter 3 Case Studies
  I. Analysis of the experimental data
  II. Case study 1
    (A) Objectives and methods
    (B) Results and
  III. Case study 2
    (A) Objectives and methods
    (B) Results and discussion
  IV. Case study 3
    (A) Objectives and methods
    (B) Results and discussion
Chapter 4 General Discussion
References
Appendix SAS/IML PROGRAM
  A.1 PCR
  A.2 APCR
  A.3 PLSR
  A.4 LQ-PLSR
  A.5 NN-1
  A.6 NN-2
  A.7 NN-3
  A.8 PC-NN
  A.9 PLS-NN
