A longstanding debate in semiotics centers on the relationship between
linguistic signs and their corresponding semantics: is there an arbitrary
relationship between a word form and its meaning, or does some systematic
phenomenon pervade it? For instance, does the character bigram \textit{gl} have
any systematic relationship to the meaning of words like \textit{glisten},
\textit{gleam} and \textit{glow}? In this work, we offer a holistic
quantification of the systematicity of the sign using mutual information and
recurrent neural networks. We employ these in a data-driven and massively
multilingual approach to the question, examining 106 languages. We find a
statistically significant reduction in entropy when modeling a word form
conditioned on its semantic representation. Encouragingly, we also recover
well-attested English examples of systematic affixes. We conclude with a
meta-point: our approximate effect size (measured in bits) is quite
small---despite some amount of systematicity between form and meaning, an
arbitrary relationship and its resulting benefits dominate human language.

Comment: Accepted for publication at ACL 201