52 research outputs found
Overview of PAN 2020: Authorship Verification, Celebrity Profiling, Profiling Fake News Spreaders on Twitter, and Style Change Detection
[EN] We briefly report on the four shared tasks organized as part of the PAN 2020 evaluation lab on digital text forensics and authorship analysis. Each tasks is introduced, motivated, and the results obtained are presented. Altogether, the four tasks attracted 230 registrations, yielding 83 successful submissions. This, and the fact that we continue to invite the submissions of software rather than its run output using the TIRA experimentation platform, marks for a good start into the second decade of PAN evaluations labs.We thank Symanto for sponsoring the ex aequo award for the two best performing systems at the author profiling shared task of this year on Profiling fake news spreaders on Twitter. The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018Âż096212-B-C31). The work of Anastasia Giachanou is supported by the SNSF Early Postdoc Mobility grant under the project Early Fake News Detection on Social Media, Switzerland (P2TIP2_181441).Bevendorff, J.; Ghanem, BHH.; Giachanou, A.; Kestemont, M.; Manjavacas, E.; Markov, I.; Mayerl, M.... (2020). Overview of PAN 2020: Authorship Verification, Celebrity Profiling, Profiling Fake News Spreaders on Twitter, and Style Change Detection. Springer. 372-383. https://doi.org/10.1007/978-3-030-58219-7_25S372383Bevendorff, J., et al.: Shared tasks on authorship analysis at PAN 2020. In: Jose, J.M., et al. (eds.) ECIR 2020. LNCS, vol. 12036, pp. 508–516. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-45442-5_66Bevendorff, J., Stein, B., Hagen, M., Potthast, M.: Bias analysis and mitigation in the evaluation of authorship verification. In: 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 6301–6306 (2019)Bevendorff, J., Stein, B., Hagen, M., Potthast, M.: Generalizing unmasking for short texts. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pp. 654–659 (2019)Ghanem, B., Rosso, P., Rangel, F.: An emotional analysis of false information in social media and news articles. ACM Trans. Internet Technol. (TOIT) 20(2), 1–18 (2020)Giachanou, A., RĂssola, E.A., Ghanem, B., Crestani, F., Rosso, Paolo: The role of personality and linguistic patterns in discriminating between fake news spreaders and fact checkers. In: MĂ©tais, E., Meziane, F., Horacek, H., Cimiano, P. (eds.) NLDB 2020. LNCS, vol. 12089, pp. 181–192. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-51310-8_17Giachanou, A., Rosso, P., Crestani, F.: Leveraging emotional signals for credibility detection. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 877–880 (2019)Kestemont, M., Stamatatos, E., Manjavacas, E., Daelemans, W., Potthast, M., Stein, B.: Overview of the cross-domain authorship attribution task at PAN 2019. In: Working Notes Papers of the CLEF 2019 Evaluation Labs. CEUR Workshop Proceedings (2019)Kestemont, M., et al.: Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection. In: Working Notes Papers of the CLEF 2018 Evaluation Labs. CEUR Workshop Proceedings (2018)Peñas, A., Rodrigo, A.: A simple measure to assess non-response. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (2011)Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th author profiling task at PAN 2020: profiling fake news spreaders on Twitter. In: CLEF 2020 Labs and Workshops, Notebook Papers (2020)Rangel, F., Franco-Salvador, M., Rosso, P.: A low dimensionality representation for language variety identification. In: Gelbukh, A. (ed.) CICLing 2016. LNCS, vol. 9624, pp. 156–169. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75487-1_13Shu, K., Wang, S., Liu, H.: Understanding user profiles on social media for fake news detection. In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 430–435 (2018)Vo, N., Lee, K.: Learning from fact-checkers: analysis and generation of fact-checking language. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (2019)Noreen, E.W.: Computer-Intensive Methods for Testing Hypotheses: An Introduction. A Wiley-Interscience Publication, Hoboken (1989)Wiegmann, M., Potthast, M., Stein, B.: Overview of the celebrity profiling task at PAN 2020. In: CLEF 2020 Labs and Workshops, Notebook Papers (2020)Wiegmann, M., Stein, B., Potthast, M.: Celebrity profiling. In: 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019). Association for Computational Linguistics (2019)Wiegmann, M., Stein, B., Potthast, M.: Overview of the celebrity profiling task at PAN 2019. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)Zangerle, E., Mayerl, M., Specht, G., Potthast, M., Stein, B.: Overview of the style change detection task at PAN 2020. In: CLEF 2020 Labs and Workshops, Notebook Papers (2020)Zangerle, E., Tschuggnall, M., Specht, G., Potthast, M., Stein, B.: Overview of the style change detection task at PAN 2019. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019
Experimental Analysis of the Relevance of Features and Effects on Gender Classification Models for Social Media Author Profiling
[Abstract] Automatic user profiling from social networks has become a popular task due to its commercial applications
(targeted advertising, market studies...). Automatic profiling models infer demographic characteristics
of social network users from their generated content or interactions. Users’ demographic information is also
precious for more social worrying tasks such as automatic early detection of mental disorders. For this type
of users’ analysis tasks, it has been shown that the way how they use language is an important indicator which
contributes to the effectiveness of the models. Therefore, we also consider that for identifying aspects such as
gender, age or user’s origin, it is interesting to consider the use of the language both from psycho-linguistic
and semantic features. A good selection of features will be vital for the performance of retrieval, classification,
and decision-making software systems. In this paper, we will address gender classification as a part of the automatic
profiling task. We show an experimental analysis of the performance of existing gender classification
models based on external corpus and baselines for automatic profiling. We analyse in-depth the influence of
the linguistic features in the classification accuracy of the model. After that analysis, we have put together a
feature set for gender classification models in social networks with an accuracy performance above existing
baselines.This work was supported by projects RTI2018-093336-B-C21, RTI2018-093336-B-C22 (Ministerio de Ciencia e Innvovacion & ERDF) and the financial support supplied by the Conselleria de Educacion, Universidade e Formacion Profesional (accreditation 2019-2022 ED431G/01, ED431B 2019/03) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruna as a Research Center of the Galician University System.Xunta de Galicia; ED431G/01Xunta de Galicia; ED431B 2019/0
Perfilado automático de usuarios en corpus sociales sobre el movimiento Black Lives Matter
[Resumen]: Con el objetivo de proporcionar una herramienta útil basada en el perfilado automático de
autores, a lo largo de este trabajo se profundizará más en este campo en el contexto de las
redes sociales, analizando cuáles son los mejores algoritmos, las técnicas más utilizadas y las
caracterĂsticas más comĂşnes que se suelen emplear. Asimismo, se desarrollará, bajo la metodologĂa
Scrum, una aplicaciĂłn web en la que se permita a los usuarios perfilar los autores de
una colección de textos propia —haciendo uso de algoritmos seleccionados en la fase previa
de investigación— y mostrando los resultado del perfilado en un dashboard intuitivo y accesible.
Finalmente, para mostrar un caso de uso real de la aplicación, se empleará la colección de
referencia sobre el movimiento Black Lives Matter (#BLM) puesta a disposiciĂłn para este TFG
por los tutores del mismo, con el objetivo de analizar el perfil de los usuarios que participaron
en los debates en las redes sociales sobre este movimiento de gran relevancia social.[Abstract]: To provide a useful tool based on automatic author profiling, throughout this work, we
will research deeper into this field in the context of social networks, analyzing the best algorithms,
the most commonly used techniques and the typical features that are often employed.
Furthermore, under the Scrum methodology, we will develop a web application that allows
users to profile authors of their own text datasets —employing algorithms selected in the
previous research phase— and displaying the profiling results on an intuitive and accessible
dashboard. Finally, to illustrate a real use case of the application, the reference collection related
to the ”Black Lives Matter” movement (#BLM) will be utilized. This collection has been
made available for this Bachelor’s Thesis by its supervisors, with the aim of analyzing the
profiles of users who participated in social media discussions about this socially significant
movement.Traballo fin de grao (UDC.FIC). EnxeñarĂa Informática. Curso 2022/202
A Decade of Shared Tasks in Digital Text Forensics at PAN
[EN] Digital text forensics aims at examining the originality and
credibility of information in electronic documents and, in this regard, to extract and analyze information about the authors of these documents. The research field has been substantially developed during the last decade. PAN is a series of shared tasks that started in 2009 and significantly contributed to attract the attention of the research community in well-defined digital text forensics tasks. Several benchmark datasets have been developed to assess the state-of-the-art performance in a wide range of tasks. In this paper, we present the evolution of both the examined tasks and the developed datasets during the last decade. We also briefly introduce the upcoming PAN 2019 shared tasks.We are indebted to many colleagues and friends who contributed greatly to PAN's tasks: Maik Anderka, Shlomo Argamon, Alberto Barrón-Cedeño, Fabio Celli, Fabio Crestani, Walter Daelemans, Andreas Eiselt, Tim Gollub,
Parth Gupta, Matthias Hagen, Teresa Holfeld, Patrick Juola, Giacomo Inches, Mike
Kestemont, Moshe Koppel, Manuel Montes-y-GĂłmez, Aurelio Lopez-Lopez, Francisco
Rangel, Miguel Angel Sánchez-Pérez, Günther Specht, Michael Tschuggnall, and Ben
Verhoeven. Our special thanks go to PANÂżs sponsors throughout the years and not
least to the hundreds of participants.Potthast, M.; Rosso, P.; Stamatatos, E.; Stein, B. (2019). A Decade of Shared Tasks in Digital Text Forensics at PAN. Lecture Notes in Computer Science. 11438:291-300. https://doi.org/10.1007/978-3-030-15719-7_39S2913001143
- …