[Abstract] Automatic user profiling from social networks has become a popular task due to its commercial applications
(e.g., targeted advertising and market studies). Automatic profiling models infer demographic characteristics
of social network users from their generated content or interactions. Users’ demographic information is also
valuable for socially sensitive tasks, such as the automatic early detection of mental disorders. For this type
of user analysis, it has been shown that the way users employ language is an important indicator that
contributes to model effectiveness. We therefore consider that, to identify attributes such as
gender, age, or origin, it is worth modelling language use through both psycholinguistic
and semantic features. A good selection of features is vital to the performance of retrieval, classification,
and decision-making software systems. In this paper, we address gender classification as part of the automatic
profiling task. We present an experimental analysis of the performance of existing gender classification
models based on external corpora, as well as of baselines for automatic profiling. We analyse in depth the influence of
linguistic features on the classification accuracy of the model. Based on this analysis, we assemble a
feature set for gender classification in social networks that achieves accuracy above existing
baselines.

This work was supported by projects RTI2018-093336-B-C21 and RTI2018-093336-B-C22 (Ministerio de Ciencia e Innovación & ERDF) and by financial support from the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019-2022 ED431G/01, ED431B 2019/03) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System.