Enhancing Representation Learning on High-Dimensional, Small-Size Tabular Data: A Divide and Conquer Method with Ensembled VAEs
Variational Autoencoders and their many variants have displayed impressive
ability to perform dimensionality reduction, often achieving state-of-the-art
performance. Many current methods, however, struggle to learn good
representations in High Dimensional, Low Sample Size (HDLSS) tasks, an
inherently challenging setting. We address this challenge by using an ensemble
of lightweight VAEs to learn posteriors over subsets of the feature space,
which are aggregated into a joint posterior in a novel divide-and-conquer
approach. Specifically, we present an alternative factorisation of the joint
posterior that induces a form of implicit data augmentation that yields greater
sample efficiency. Through a series of experiments on eight real-world
datasets, we show that our method learns better latent representations in HDLSS
settings, which leads to higher accuracy in a downstream classification task.
Furthermore, we verify that our approach has a positive effect on
disentanglement and achieves a lower estimated Total Correlation on learnt
representations. Finally, we show that our approach is robust to partial
features at inference, exhibiting little performance degradation even with most
features missing.
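
The abstract does not say how the per-subset posteriors are combined. As a minimal sketch, assume each lightweight VAE encoder outputs a diagonal Gaussian over a shared latent space, and assume the joint posterior is formed as a product of those Gaussians (a common aggregation choice, not necessarily the factorisation proposed in the paper). The helper names `split_features` and `product_of_gaussians` are hypothetical and used only for illustration.

```python
import math


def split_features(n_features, n_subsets):
    """Partition feature indices into contiguous, near-equal subsets.

    Assumption: a simple contiguous split; the paper may divide features differently.
    """
    base, extra = divmod(n_features, n_subsets)
    subsets, start = [], 0
    for i in range(n_subsets):
        size = base + (1 if i < extra else 0)
        subsets.append(list(range(start, start + size)))
        start += size
    return subsets


def product_of_gaussians(mus, logvars, include_prior=True):
    """Combine diagonal-Gaussian posteriors by multiplying their densities.

    mus, logvars: one latent vector per available expert (lists of floats).
    Precisions (inverse variances) add; the joint mean is the
    precision-weighted average of the expert means.
    Assumption: if include_prior is True, a standard normal prior N(0, I)
    is treated as one extra expert with mean 0 and precision 1.
    """
    latent_dim = len(mus[0])
    joint_mu, joint_logvar = [], []
    for d in range(latent_dim):
        precision = 1.0 if include_prior else 0.0
        weighted_mean = 0.0
        for mu, logvar in zip(mus, logvars):
            p = math.exp(-logvar[d])
            precision += p
            weighted_mean += p * mu[d]
        joint_mu.append(weighted_mean / precision)
        joint_logvar.append(-math.log(precision))
    return joint_mu, joint_logvar


# Hypothetical encoder outputs for three feature subsets, latent_dim = 2.
subsets = split_features(n_features=10, n_subsets=3)
mus = [[0.5, -1.0], [0.7, -0.8], [0.4, -1.2]]
logvars = [[0.0, 0.0], [-1.0, 0.5], [0.2, -0.3]]

full_mu, full_logvar = product_of_gaussians(mus, logvars)
# Partial features at inference: use only the experts whose subsets are observed.
part_mu, part_logvar = product_of_gaussians(mus[:1], logvars[:1])
```

Under these assumptions, a missing feature subset simply drops its expert from the product, so the joint posterior stays well defined and only becomes wider. This is one plausible reading of the robustness to partial features described above, not a confirmed description of the paper's mechanism.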