Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality
As a contribution to personality detection in languages other than English,
we rely on distant supervision to create Personal-ITY, a novel corpus of
YouTube comments in Italian, where authors are labelled with personality
traits. The traits are derived from one of the mainstream personality theories
in psychology research, named MBTI. Using personality prediction experiments,
we (i) study the task of personality prediction in itself on our corpus as well
as on TwiSty, a Twitter dataset also annotated with MBTI labels; (ii) carry out
an extensive, in-depth analysis of the features used by the classifier, and
view them specifically in the light of the original theory that we used to
create the corpus in the first place. We observe that no single model is best
at personality detection, and that while some traits are easier than others to
detect, and also to match back to theory, for other, less frequent traits the
picture is much more blurred.
Comment: 12 pages, Accepted at PEOPLES 2020 (workshop COLING 2020). arXiv admin note: text overlap with arXiv:2011.0568
Personal-ITY: A Novel YouTube-based Corpus for Personality Prediction in Italian
We present a novel corpus for personality prediction in Italian, containing a larger number of authors and a different genre compared to previously available resources. The corpus is built by exploiting distant supervision, assigning Myers-Briggs Type Indicator (MBTI) labels to YouTube comments, and can lend itself to a variety of experiments. We report on preliminary experiments on Personal-ITY, which can serve as a baseline for future work, showing that some types are easier to predict than others, and discussing the perks of cross-dataset prediction.