On convex conceptual regions in deep network representations
Current research on human-machine alignment aims to understand the
geometry of latent spaces and their correspondence to human representations.
Gärdenfors' conceptual spaces is a prominent framework for understanding
human representations. Convexity of object regions in conceptual spaces is
argued to promote generalizability, few-shot learning, and intersubject
alignment. Based on these insights, we investigate the notion of convexity of
concept regions in machine-learned latent spaces. We develop a set of tools for
measuring convexity in sampled data and evaluate emergent convexity in layered
representations of state-of-the-art deep networks. We show that convexity is
robust to basic re-parametrization, hence, meaningful as a quality of
machine-learned latent spaces. We find that approximate convexity is pervasive
in neural representations in multiple application domains, including models of
images, audio, human activity, text, and brain data. We measure convexity
separately for labels (i.e., targets for fine-tuning) and other concepts.
Generally, we observe that fine-tuning increases the convexity of label
regions, while for more general concepts, it depends on the alignment of the
concept with the fine-tuning objective. We find evidence that pre-training
convexity of class label regions predicts subsequent fine-tuning performance.
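
To make the idea of measuring convexity in sampled data concrete, the following is a minimal sketch of one possible sampling-based score. It is not the authors' tooling. The function name `euclidean_convexity_score`, its parameters, and the use of a 1-nearest-neighbour rule to assign interpolated points to a region are all assumptions made for illustration. The sketch draws pairs of points that share a label, walks along the straight segment between them, and reports the fraction of interior points whose nearest sample carries the same label.

```python
import numpy as np


def euclidean_convexity_score(X, y, n_pairs=200, n_steps=5, seed=0):
    """Hypothetical sampling-based convexity score in [0, 1].

    X: (n_samples, n_features) latent vectors; y: (n_samples,) concept labels.
    A score of 1.0 means every sampled interior point of every sampled
    same-label segment was closest to a sample with that same label.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels = np.unique(y)
    # Interior interpolation weights only; the endpoints are excluded.
    ts = np.linspace(0.0, 1.0, n_steps + 2)[1:-1]

    hits, total = 0, 0
    for _ in range(n_pairs):
        label = rng.choice(labels)
        idx = np.flatnonzero(y == label)
        if idx.size < 2:
            continue  # a pair needs two distinct samples of the concept
        i, j = rng.choice(idx, size=2, replace=False)
        for t in ts:
            point = (1.0 - t) * X[i] + t * X[j]
            # Assumption: region membership is decided by the nearest sample.
            nearest = np.argmin(np.linalg.norm(X - point, axis=1))
            hits += int(y[nearest] == label)
            total += 1
    return hits / total if total else float("nan")
```

Under these assumptions, two well-separated clusters with one label each would score 1.0, while a concept whose samples sit on either side of another concept would score lower, because points interpolated between the two sides land in the other concept's region.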