Metabolomics data is typically scaled
to a common reference, such as
a constant volume of body fluid, a constant creatinine level, or a
constant area under the spectrum. Such scaling of the data, however,
may affect the selection of biomarkers and the biological interpretation
of results in unforeseen ways. Here, we studied how both the outcome
of hypothesis tests for differential metabolite concentration and
the screening for multivariate metabolite signatures are affected
by the choice of scale. To overcome this problem for metabolite signatures
and to establish a scale-invariant biomarker discovery algorithm,
we extended linear zero-sum regression to the logistic regression
framework and showed in two applications to <sup>1</sup>H NMR-based
metabolomics data how this approach overcomes the scaling problem.
Logistic zero-sum regression is available as an R package as well
as a high-performance computing implementation that can be downloaded
at https://github.com/rehbergT/zeroSum
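
The scale invariance behind the zero-sum constraint can be illustrated with a minimal sketch. It assumes that the features are log-transformed intensities and that scaling multiplies every feature of a sample by one sample-specific factor; neither assumption is stated above. The data, the coefficients, and the helper names `project_to_zero_sum` and `logistic_probability` are hypothetical and are not part of the zeroSum package. Rescaling a sample by a factor c shifts its linear predictor by log(c) times the sum of the coefficients, so the shift vanishes when the coefficients sum to zero.

```python
import numpy as np


def project_to_zero_sum(beta):
    """Shift the coefficients so that they sum to zero (illustrative helper)."""
    return beta - beta.mean()


def logistic_probability(log_x, beta, intercept=0.0):
    """Class probability of a logistic model on log-transformed features."""
    eta = intercept + log_x @ beta
    return 1.0 / (1.0 + np.exp(-eta))


# Hypothetical raw intensities: 4 samples, 6 metabolite features.
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=(4, 6))

# Assumed scaling model: one multiplicative factor per sample.
scale = np.array([0.5, 2.0, 10.0, 0.1])[:, None]

# Illustrative coefficients (not fitted), with and without the constraint.
beta_free = np.array([1.0, -0.5, 0.3, 0.8, -0.2, 0.4])
beta_zero_sum = project_to_zero_sum(beta_free)

# Predictions before and after rescaling, under the zero-sum constraint.
p_original = logistic_probability(np.log(x), beta_zero_sum)
p_rescaled = logistic_probability(np.log(x * scale), beta_zero_sum)

# The same comparison without the constraint.
p_free_original = logistic_probability(np.log(x), beta_free)
p_free_rescaled = logistic_probability(np.log(x * scale), beta_free)

print("zero-sum predictions unchanged:", np.allclose(p_original, p_rescaled))
print("unconstrained predictions unchanged:",
      np.allclose(p_free_original, p_free_rescaled))
```

The sketch only checks the invariance of the predictions for fixed coefficients; it does not fit a model or reproduce the regularized estimation used in the package.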