What makes some types of languages more probable than others? For instance,
we know that almost all spoken languages contain the vowel phoneme /i/; why
should that be? The field of linguistic typology seeks to answer these
questions and, thereby, divine the mechanisms that underlie human language. In
our work, we tackle the problem of vowel system typology, i.e., we propose a
generative probability model of which vowels a language contains. Unlike
previous work, we model the acoustic information directly -- the first
two formant values -- rather than modeling discrete sets of phonemic symbols
(IPA). We develop a novel generative probability model and report results based
on a corpus of 233 languages.

Comment: NAACL 201