2 research outputs found

    What demographic attributes do our digital footprints reveal? A systematic review

    To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces left by individuals as they browse and interact with others online may reveal who they are and what their interests may be. In the present paper we report a systematic review that synthesises current evidence on predicting demographic attributes from online digital traces. Studies were included if they met the following criteria: (i) they reported findings where at least one demographic attribute was predicted/inferred from at least one form of digital footprint, (ii) the method of prediction was automated, and (iii) the traces were either visible (e.g. tweets) or non-visible (e.g. clickstreams). We identified 327 studies published up until October 2018. Across these articles, 14 demographic attributes were successfully inferred from digital traces; the most studied included gender, age, location, and political orientation. For each of the demographic attributes identified, we provide a database containing the platforms and digital traces examined, sample sizes, accuracy measures and the classification methods applied. Finally, we discuss the main research trends/findings, methodological approaches and recommend directions for future research.

    Author Profiling Using Support Vector Machines: Notebook for PAN at CLEF 2016

    The objective of this work is to identify the gender and age of the author of a set of tweets using Support Vector Machines. The work was carried out as a task for PAN 2016, which is part of the CLEF conference. Techniques such as tagging, stopword removal, stemming, and Bag-of-Words representation were used to create a 10-class model. The model was tuned by grid search using k-fold cross-validation. The model was tested for precision and recall on the corpora from PAN 2015 and PAN 2016, and the results are presented. We observed the peaking phenomenon as the number of features increased. In the future we plan to try term frequency-inverse document frequency (TF-IDF) weighting in order to improve our results.
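As a rough illustration of two of the steps named in the abstract above (stopword removal with a Bag-of-Words representation, and k-fold splitting for cross-validation), here is a minimal standard-library Python sketch. It is not the authors' implementation: the stopword list, the function names `bag_of_words` and `k_fold_indices`, and the sample tweets are all hypothetical, and tagging, stemming, the SVM itself, and the grid search are left out.

```python
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "is", "of", "and", "to"}


def bag_of_words(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, drop stopwords, and count tokens."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    return Counter(tokens)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up example inputs, not taken from the PAN corpora.
tweets = ["The model is a classifier", "Stemming and tagging of tweets"]
vectors = [bag_of_words(t) for t in tweets]
folds = list(k_fold_indices(5, 2))
```

In a grid search, each candidate parameter setting would be trained on every `train` split and scored on the matching `test` split, and the setting with the best average score would be kept.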