    Detecting Female and Male Language Features in Facebook Comments by Malaysian Millennial Users

    This study examines gendered language use in Facebook comments by Malaysian millennial users. Textual analysis was conducted on 260 Facebook comments collected from 11 Facebook social pages. In addition, sixty participants' reasons for identifying the gender of the writers of 14 Facebook comments were analyzed. The results showed that half of the participants correctly guessed the writers' gender. The Facebook comments used male language features more frequently than female ones. Male millennial users tended to use Sexual References, Insults/Profanities, Directive/Autonomy, Strong Assertion, and Rhetorical Questions, whereas female users tended to use Hedges, Polite and Emotionally Expressive Words, Interpersonal Orientation/Supportiveness, Questions, and Experience Sharing. From the participants' perspective, male writing is short, direct, rude, negative, and crude, while female writing is lengthy, tentative, polite, positive, emotional, and reflects concern for others. The non-gender-specific language features identified through textual analysis are Information Orientation, Apologies, Tag Questions, and Aligned Orientation, whereas the participants named different features (Questions, Rhetorical Questions, and Strong Assertion). The study also shows that Information Orientation, Self-Promotion, Sexual Reference, Opposed Orientation, Hedges, Apologies, and Tag Questions may be falling into disuse in Facebook comments by Malaysian millennials. The findings suggest that the language patterns of Malaysian millennials deviate from conventional norms, with some comments displaying cross-gender language patterns. This indicates a blurring of conventional gendered language norms in online interactions.

    A Semi-supervised approach for gender identification

    Paper presented at LREC 2016, the Tenth International Conference on Language Resources and Evaluation, held 23 to 28 May 2016 in Portorož, Slovenia. Most research studies on Author Profiling train their models on large quantities of correctly labeled data. However, this does not reflect forensic scenarios: in practical forensic linguistic investigations, the resources available to profile the author of a text are usually scarce. To reflect this, we implemented a Semi-Supervised Learning variant of the k-nearest neighbors (KNN) algorithm that uses small sets of labeled data and a larger amount of unlabeled data to classify the authors of texts by gender (man vs. woman). We describe the enriched KNN algorithm and show that the use of unlabeled instances improves the accuracy of our gender identification model. We also present a feature set that supports training on very few instances, reaching accuracies above 70% with only 113 training instances. We also show that the algorithm performs equally well on publicly available data. The presentation of this work was partially supported by the ICT PhD program of Universitat Pompeu Fabra through a travel grant.
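    The abstract does not specify how the enriched KNN algorithm uses unlabeled instances, so the following is only a minimal sketch of one common semi-supervised scheme, self-training with KNN, and not the authors' method. It assumes each text has already been turned into a numeric feature vector (the vectors below are hypothetical placeholders, not the paper's feature set) and that an unlabeled instance is adopted only when all of its k nearest labeled neighbors agree; the names `self_train_knn`, `knn_predict`, and the `threshold` parameter are illustrative assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(labeled, x, k):
    """Majority vote among the k labeled instances closest to x.

    `labeled` is a list of (vector, label) pairs. Returns the winning label
    and the fraction of neighbors that voted for it (used as a confidence).
    """
    neighbors = sorted(labeled, key=lambda item: euclidean(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbors)


def self_train_knn(labeled, unlabeled, k=3, threshold=1.0, max_rounds=10):
    """Hypothetical self-training loop: repeatedly pseudo-label the unlabeled
    instances whose KNN vote confidence reaches `threshold`, add them to the
    labeled set, and retry the rest until nothing new is confident."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = knn_predict(labeled, x, k)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # no instance is confident enough; stop expanding
        labeled.extend(confident)
        pool = remaining
    return labeled


# Toy 2-D vectors standing in for per-author feature vectors (made-up data).
seed = [((0, 0), "man"), ((0, 1), "man"), ((1, 0), "man"),
        ((10, 10), "woman"), ((10, 11), "woman"), ((11, 10), "woman")]
extra = [(0.5, 0.5), (10.5, 10.5)]
augmented = self_train_knn(seed, extra, k=3)
print(knn_predict(augmented, (1, 1), 3))
```

    Requiring unanimous neighbor agreement (`threshold=1.0`) keeps the sketch conservative, since wrongly pseudo-labeled instances would otherwise propagate errors when labeled data is as scarce as the forensic setting described above.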