Near-infrared (NIR) spectrometry is a non-destructive and relatively inexpensive technology that enables automated control in domains such as the food industry and pharmaceutics. While the quality of the predictions obtained from NIR spectra is important, identifying the chemical components responsible for those predictions is equally essential, yet it is often neglected by traditional methods.
NIR spectra can generally be regarded as high-dimensional vectors with a high degree of redundancy between components. These properties cause numerical problems and make the models difficult to interpret, so a dimensionality reduction step is required. In addition, factors such as experimental conditions induce non-linearities in the relationship between the spectral variables and the parameter of interest, and the models traditionally used in this context ignore them.
The main goal of this work is therefore to propose a methodology that takes these non-linearities into account and leads to an easier interpretation in terms of wavelength bands. The methodology relies on three components: normalization of spectra and variables, dimensionality reduction, and non-linear modeling. In particular, the dimensionality issue is addressed with filters based on mutual information and with functional methods such as B-spline representation or variable clustering.
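As an illustration only, a mutual-information filter of the kind mentioned above can be sketched in plain Python. The equal-width histogram estimator, the bin count, the synthetic data, and all function names (`discretize`, `mutual_information`, `rank_wavelengths`) are assumptions made for this sketch; the work itself may rely on a different estimator and selection procedure.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins):
    """Map each value to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant inputs
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y, n_bins=8):
    """Plug-in histogram estimate of I(X; Y), in nats."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j)))
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def rank_wavelengths(spectra, target, n_bins=8):
    """Score each spectral variable by its estimated MI with the target.

    Returns (indices sorted by decreasing score, list of scores).
    """
    n_vars = len(spectra[0])
    scores = [
        mutual_information([row[k] for row in spectra], target, n_bins)
        for k in range(n_vars)
    ]
    order = sorted(range(n_vars), key=lambda k: scores[k], reverse=True)
    return order, scores


# Synthetic data (hypothetical): the target depends non-linearly on variable 2 only.
random.seed(0)
n_samples, n_vars = 400, 6
spectra = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_samples)]
target = [row[2] ** 2 + 0.1 * random.gauss(0, 1) for row in spectra]

order, scores = rank_wavelengths(spectra, target)
print("variables ranked by estimated MI:", order)
```

In this toy setting the quadratic dependence has almost no linear correlation with the target, which is the kind of relationship a mutual-information criterion is meant to capture. A histogram estimate is biased for small samples, so the scores are only meaningful relative to one another.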
A study of six databases shows that non-linear models generally outperform linear ones. In addition, the proposed methodology identifies a reduced number of wavelength ranges, most of which correspond to spectral regions that specialists consider meaningful. (FSA 3) -- UCL, 201