Multivariate Calibration Domain Adaptation with Unlabeled Data

Abstract

Multivariate calibration models the relationship between a substance's chemical profile and its spectrum (here, near-infrared) in order to predict the concentration of new samples with known spectra. However, these new samples are often measured under conditions that differ from the primary conditions: different instruments, instrument drift, and temperature all affect the measurement. Domain adaptation (DA) methods make the model insensitive to these differences so that it predicts accurately in the new domain (the secondary conditions). Individual DA methods fall under two fundamental processes. One augments the primary data with a few secondary-domain samples that have chemical reference values (labels); the other augments the primary data with secondary spectra only (unlabeled data). In this work, we compare two existing labeled DA methods and two existing unlabeled DA methods to two novel labeled methods and a novel unlabeled approach. Since DA methods require selection of hyperparameters, a model selection framework based on model diversity and prediction similarity (MDPS) is applied to the DA methods. Regardless of the DA method, the MDPS process is shown to select models more accurate than the first quartile of all models generated by the DA process in three near-infrared datasets.
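
The labeled augmentation process described above can be illustrated with a minimal sketch. It is not one of the methods compared in this work: it assumes plain ridge regression as the calibration model, and the function name `fit_augmented_ridge` and the parameters `weight` and `ridge` are hypothetical names chosen for illustration. The sketch stacks a few weighted, labeled secondary samples under the primary calibration data and solves the regularized least-squares problem.

```python
import numpy as np

def fit_augmented_ridge(X_primary, y_primary, X_secondary, y_secondary,
                        weight=1.0, ridge=1e-3):
    """Illustrative only: ridge regression on primary data augmented with
    a few weighted, labeled secondary-domain samples.

    X_primary   : (n_p, n_wavelengths) primary spectra
    y_primary   : (n_p,) primary reference values
    X_secondary : (n_s, n_wavelengths) secondary spectra (few samples)
    y_secondary : (n_s,) secondary reference values
    weight      : hypothetical hyperparameter scaling the secondary rows
    ridge       : hypothetical ridge penalty
    """
    # Stack the weighted secondary rows under the primary rows.
    X_aug = np.vstack([X_primary, weight * X_secondary])
    y_aug = np.concatenate([y_primary, weight * y_secondary])

    # Solve the ridge normal equations (X'X + ridge * I) b = X'y.
    n_features = X_aug.shape[1]
    gram = X_aug.T @ X_aug + ridge * np.eye(n_features)
    return np.linalg.solve(gram, X_aug.T @ y_aug)
```

In this sketch both `weight` and `ridge` are hyperparameters, so sweeping them produces a collection of candidate models; choosing among such candidates is the kind of model selection problem the abstract refers to.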
