The drivers of compositionality in artificial languages that emerge when two
(or more) agents play a non-visual referential game have been investigated
previously using approaches based on the REINFORCE algorithm and the (Neural)
Iterated Learning Model. Following the more recent introduction of the
\textit{Straight-Through Gumbel-Softmax} (ST-GS) approach, this paper
investigates to what extent the drivers of compositionality identified so far
in the field apply in the ST-GS context, and to what extent they translate
into (emergent) systematic generalisation abilities when agents play a visual
referential game. Compositionality and the generalisation abilities of the
emergent languages are assessed using topographic similarity and zero-shot
compositional tests. Firstly, we provide evidence that the test-train split
strategy significantly impacts the zero-shot compositional tests when dealing
with visual stimuli, whilst it does not when dealing with symbolic ones.
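Topographic similarity, one of the two measures named above, is commonly computed as the Spearman rank correlation between pairwise distances among meanings and pairwise distances among the corresponding messages. The dependency-free sketch below assumes Hamming distance over attribute vectors and Levenshtein distance over messages. These distance choices and all function names are illustrative assumptions, not necessarily the exact setup used in the paper.

```python
import itertools
import math
from typing import Hashable, List, Sequence


def hamming(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Number of attribute positions at which two meaning vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance between two messages (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return math.nan  # undefined when one distance list is constant
    return cov / math.sqrt(vx * vy)


def topographic_similarity(meanings, messages) -> float:
    """Correlate pairwise meaning distances with pairwise message distances."""
    pairs = list(itertools.combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [levenshtein(messages[i], messages[j]) for i, j in pairs]
    return spearman(d_meaning, d_message)
```

For instance, a perfectly compositional two-attribute language in which each attribute value has its own symbol, mapping the meanings `(0, 0), (0, 1), (1, 0), (1, 1)` to the messages `"ac", "ad", "bc", "bd"`, scores 1.0, because every message distance then equals the corresponding meaning distance.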
Secondly, empirical evidence shows that using the ST-GS approach with small
batch sizes and an overcomplete communication channel improves compositionality
in the emerging languages. Nevertheless, while the effect of the batch size
proves robust with symbolic stimuli, it is less clear-cut with visual stimuli.
Our results also show that not all overcomplete communication channels are
created equal. Indeed, while increasing the maximum sentence length is found to
further both compositionality and generalisation abilities, increasing the
vocabulary size is found to be detrimental.
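The ST-GS approach discussed above can be sketched for a single token as follows. In the forward pass, the speaker emits a discrete one-hot token obtained from the argmax of a Gumbel-perturbed, temperature-scaled softmax. In the backward pass, gradients are computed as if the continuous (soft) sample had been emitted instead. This dependency-free sketch makes the backward step explicit by computing the softmax Jacobian by hand; an autodiff framework would usually get the same effect with the idiom `hard + soft - stop_gradient(soft)`. The function names, the temperature `tau`, and the single-token setting are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng: random.Random) -> float:
    """Draw from Gumbel(0, 1) via -log(-log(U)), U ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))


def st_gumbel_softmax(logits: List[float], tau: float, rng: random.Random
                      ) -> Tuple[List[float], List[float], List[List[float]]]:
    """Forward pass of a straight-through Gumbel-Softmax token sample.

    Returns the hard one-hot token (what the listener receives), the soft
    relaxed sample, and the Jacobian d soft / d logits, which the
    straight-through estimator uses in place of the derivative of the hard
    token (zero almost everywhere).
    """
    n = len(logits)
    soft = softmax([(l + sample_gumbel(rng)) / tau for l in logits])
    k = max(range(n), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(n)]
    # d soft_i / d logit_j = soft_i * (delta_ij - soft_j) / tau
    jac = [[soft[i] * ((1.0 if i == j else 0.0) - soft[j]) / tau
            for j in range(n)] for i in range(n)]
    return hard, soft, jac


def straight_through_grad(jac: List[List[float]],
                          grad_hard: List[float]) -> List[float]:
    """Backward pass: treat dL/d(hard) as if it were dL/d(soft)."""
    n = len(jac)
    return [sum(grad_hard[i] * jac[i][j] for i in range(n)) for j in range(n)]
```

A speaker would apply this once per message position, up to the maximum sentence length, with logits over a vocabulary of the chosen size. Lowering `tau` brings the soft sample closer to the hard token, at the cost of higher-variance gradients.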
Finally, a lack of correlation between the language compositionality at
training time and the agents' generalisation abilities is observed in the
context of discriminative referential games with visual stimuli. This is
similar to previous observations in the field using the generative variant with
symbolic stimuli.

Comment: Accepted at the 4th NeurIPS Workshop on Emergent Communication (EmeCom @
NeurIPS 2020)