We investigate a novel non-parametric regression-based clustering algorithm
for longitudinal data analysis. Combining natural cubic splines with Gaussian
mixture models (GMM), the algorithm can produce smooth cluster means that
describe the underlying data well. However, the algorithm has two shortcomings:
high computational complexity in the parameter estimation procedure and a
numerically unstable variance estimator. Therefore, to further increase the
usability of the method, we incorporated approaches to reduce its computational
complexity and developed both a new, more stable variance estimator and a new
smoothing parameter estimation procedure. We show that
the developed algorithm, SMIXS, performs better than GMM on a synthetic dataset
in terms of clustering and regression performance. We formally prove the
computational speed-ups in the new framework and demonstrate their impact.
Finally, in a case study, we use SMIXS to cluster vertical atmospheric
measurements and identify distinct weather regimes.