As a contribution to personality detection in languages other than English,
we rely on distant supervision to create Personal-ITY, a novel corpus of
YouTube comments in Italian, where authors are labelled with personality
traits. The traits are derived from one of the mainstream personality theories
in psychology research, the Myers-Briggs Type Indicator (MBTI). Using personality prediction experiments,
we (i) study the task of personality prediction in itself on our corpus as well
as on TwiSty, a Twitter dataset also annotated with MBTI labels; (ii) carry out
an extensive, in-depth analysis of the features used by the classifier, and
examine them specifically in light of the original theory that we used to
create the corpus. We observe that no single model is best
at personality detection, and that while some traits are easier than others to
detect, and also to match back to theory, for other, less frequent traits the
picture is much more blurred.

Comment: 12 pages, accepted at PEOPLES 2020 (workshop at COLING 2020). arXiv
admin note: text overlap with arXiv:2011.0568