This paper introduces BWSNet, a model that can be trained from raw human
judgements obtained through a Best-Worst scaling (BWS) experiment. It maps
sound samples into an embedded space that represents the perception of a
studied attribute. To this end, we propose a set of cost functions and
constraints, interpreting trial-wise ordinal relations as distance comparisons
in a metric learning task. We tested our proposal on data from two BWS studies
investigating the perception of speech social attitudes and timbral qualities.
For both datasets, our results show that the structure of the latent space is
faithful to human judgements