Production deployments in complex systems require ML architectures to be
highly efficient and usable across multiple tasks. Particularly demanding are
classification problems in which data arrives in a streaming fashion and each
class is presented separately. Recent methods based on stochastic gradient
learning have been shown to struggle in such setups, or to carry limitations
such as memory buffers or restriction to specific domains, which preclude
their use in real-world scenarios. For this reason, we present a fully
differentiable architecture based on the Mixture of Experts model that enables
the training of high-performance classifiers when examples from each class are
presented separately.
We conducted extensive experiments demonstrating its applicability
in various domains and its ability to learn online in production environments. The
proposed technique achieves SOTA results without a memory buffer and clearly
outperforms the reference methods.
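To make the core building block concrete, below is a minimal sketch of a soft Mixture of Experts forward pass, the kind of fully differentiable gating the abstract refers to. All dimensions, weight names, and the use of NumPy are illustrative assumptions, not the paper's actual implementation: each expert is a linear classifier, and a softmax gate produces per-example mixing weights so gradients flow to every expert.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_experts, d_in, d_out = 4, 8, 3

# Each expert is a simple linear map; the gate is a linear layer
# followed by a softmax over the expert axis.
experts_W = rng.normal(size=(n_experts, d_in, d_out))
gate_W = rng.normal(size=(d_in, n_experts))

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x):
    """Soft mixture: every expert contributes, weighted by the gate.

    Because the gate is a softmax (not a hard argmax), the whole
    computation is differentiable end to end.
    """
    gates = softmax(x @ gate_W)                           # (batch, n_experts)
    expert_out = np.einsum('bi,eio->beo', x, experts_W)   # (batch, n_experts, d_out)
    return np.einsum('be,beo->bo', gates, expert_out)     # (batch, d_out)

x = rng.normal(size=(5, d_in))
logits = moe_forward(x)
print(logits.shape)  # (5, 3)
```

Because the mixture weights are produced per example, separate experts can specialize on separate classes as they arrive, which is one plausible reason such a design suits class-incremental streams without a replay buffer.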