
    Gender Prediction of Journalists from Writing Style

    Web-based Kurdish media have grown tangibly in the last few years. Many factors have contributed to this rapid growth, including easy access to the internet, the low price of electronic gadgets, and the pervasive use of social networking. The swift development of Kurdish web-based media poses new challenges that need to be addressed. For example, a newspaper article published online has properties such as the author's name, gender, age, and nationality, among others. Using computers to determine one or more of these properties when they are ambiguous is an important open research area. In this study, the journalist's gender in web-based Kurdish media is determined using computational linguistics and text mining techniques. A set of 75 web-based Kurdish articles, downloaded from four well-known web-based Kurdish newspapers, was used to train an artificial neural network model to determine the gender of journalists. From each article, 61 features that are distinctive in discriminating between genders were extracted. A Multi-Layer Perceptron (MLP) artificial neural network was used as the classifier, and the accuracy achieved was 76%.
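    As a rough illustration of the classification step, the sketch below trains a one-hidden-layer perceptron with backpropagation in plain Python. The abstract does not give the network's architecture or training settings, so the hidden-layer size, sigmoid activations, learning rate, epoch count, and the two-feature toy data (standing in for the 61 extracted features) are all assumptions.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer, sigmoid units, one output giving P(class 1).

    Architecture and hyperparameters are hypothetical, not the paper's.
    """

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then a single sigmoid output.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, xs, ys):
        """Mean binary cross-entropy over the dataset."""
        eps = 1e-12
        total = 0.0
        for x, t in zip(xs, ys):
            _, y = self.forward(x)
            total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
        return total / len(xs)

    def train_epoch(self, xs, ys, lr):
        """One step of full-batch gradient descent via backpropagation."""
        n = len(xs)
        gw1 = [[0.0] * len(row) for row in self.w1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [0.0] * len(self.w2)
        gb2 = 0.0
        for x, t in zip(xs, ys):
            h, y = self.forward(x)
            d_out = y - t  # gradient w.r.t. the output pre-activation
            gb2 += d_out
            for j, hj in enumerate(h):
                gw2[j] += d_out * hj
                d_h = d_out * self.w2[j] * hj * (1 - hj)
                gb1[j] += d_h
                for i, xi in enumerate(x):
                    gw1[j][i] += d_h * xi
        # Apply averaged gradients only after the full pass.
        self.b2 -= lr * gb2 / n
        for j in range(len(self.w2)):
            self.w2[j] -= lr * gw2[j] / n
            self.b1[j] -= lr * gb1[j] / n
            for i in range(len(self.w1[j])):
                self.w1[j][i] -= lr * gw1[j][i] / n

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


# Toy 2-feature data standing in for per-article feature vectors.
xs = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
ys = [0, 0, 1, 1]
model = TinyMLP(n_in=2, n_hidden=4)
before = model.loss(xs, ys)
for _ in range(3000):
    model.train_epoch(xs, ys, lr=0.5)
after = model.loss(xs, ys)
print(before, after)
print([model.predict(x) for x in xs])
```

    With real data, each row of xs would hold one article's feature vector and ys the journalist's gender label, and accuracy would be measured on articles held out from training.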

    On the Inference of Soft Biometrics from Typing Patterns Collected in a Multi-device Environment

    In this paper, we study the inference of gender, major/minor (computer science, non-computer science), typing style, age, and height from typing patterns collected from 117 individuals in a multi-device environment. Inference of the first three attributes was treated as a classification task, and inference of the remaining two as a regression task. For classification, we benchmark six classical machine learning (ML) and four deep learning (DL) classifiers; for regression, we evaluate three ML and four DL-based regressors. The experiments covered two text-entry configurations (free and fixed) and four device configurations (Desktop, Tablet, Phone, and Combined). The best configurations achieved accuracies of 96.15%, 93.02%, and 87.80% for typing style, gender, and major/minor, respectively, and mean absolute errors of 1.77 years for age and 2.65 inches for height. These results are promising given the variety of application scenarios listed in this work.

    Comment: The first two authors contributed equally. The code is available upon request. Please contact the last author.
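    The figures above are computed in two different ways: accuracy for the classification tasks (typing style, gender, major/minor) and mean absolute error (MAE) for the regression tasks (age, height). A minimal sketch of both metrics follows; the sample labels and predictions are made-up values for illustration, not data from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of classification predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the target's own unit (years, inches)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical example values, not from the paper.
genders_true = ["F", "M", "M", "F"]
genders_pred = ["F", "M", "F", "F"]
ages_true = [21, 34, 27]
ages_pred = [23, 33, 27]

print(accuracy(genders_true, genders_pred))       # 0.75
print(mean_absolute_error(ages_true, ages_pred))  # 1.0
```

    Because MAE stays in the target's unit, "1.77 years" can be read directly as the typical size of an age-prediction error.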

    Author Gender Metadata Augmentation of HathiTrust Digital Library

    Bibliographic metadata is essential for digital library resource description. Especially as the size and number of bibliographic entities grow, high-quality metadata enables richer forms of digital library access, search, and use. Metadata records can be enriched through automated techniques; for example, a digital humanities scholar might use the gender of a set of authors in their literature analysis. In this study, we set out to enrich the metadata description of a large-scale digital library, the HathiTrust (HT) digital library, specifically by determining the gender of authors in the public domain portion of the collection. The results are stored in a separate Solr index accessible through the HathiTrust Research Center services. This study, which resolved author gender in 78.9% of cases in the HT public domain corpus, suggests future research directions: capturing and representing the provenance of the contributing sources to enhance trust, and using machine learning to resolve the remaining names.
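    The abstract does not say how author names were matched, so the following is only a hypothetical sketch of metadata enrichment against some reference source of known author genders, together with the resolution rate that a figure like 78.9% summarizes. The reference dictionary, the name normalization rule, and the example names are all assumptions; unresolved entries (None) correspond to the "remaining names".

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial formatting differences match."""
    return " ".join(name.lower().split())


def resolve_genders(authors, reference):
    """Map each author to a gender from a (hypothetical) reference source, or None."""
    lookup = {normalize(k): v for k, v in reference.items()}
    return {a: lookup.get(normalize(a)) for a in authors}


def resolution_rate(resolved):
    """Share of authors for whom a gender value was found."""
    return sum(v is not None for v in resolved.values()) / len(resolved)


# Hypothetical reference source and author list.
reference = {"Austen, Jane": "female", "Dickens, Charles": "male"}
authors = ["Austen, Jane", "dickens,  charles", "Anonymous"]

resolved = resolve_genders(authors, reference)
print(resolved)
print(resolution_rate(resolved))
```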

    Continuous touchscreen biometrics: authentication and privacy concerns

    In the age of instant communication, smartphones have become an integral part of our daily lives, with a significant portion of the population using them for tasks such as messaging, banking, and even recording sensitive health information. However, this increasing reliance has also made smartphones a prime target for cybercriminals, who can use various tactics to gain access to sensitive data. It is therefore crucial that individuals and organisations prioritise the security of their smartphones to protect against the many threats around us. While there are dozens of methods to verify the identity of users before granting them access to a device, many of them fall short in usability or are exposed to potential vulnerabilities. In this thesis, we aim to advance the field of touchscreen biometrics, which promises to alleviate some of these recurring issues. This area of research uses touch interactions, such as gestures and finger movements, as a means of identifying or authenticating individuals. First, we provide a detailed explanation of the common procedure for evaluating touch-based authentication systems and examine the potential pitfalls and concerns that can arise during this process. The impact of these pitfalls is evaluated and quantified on a newly collected large-scale dataset. We also discuss the prevalence of these issues in the related literature and recommend best practices for developing continuous touch-based authentication systems. We then provide a comprehensive overview of the techniques commonly used for modelling touch-based authentication, including the features, classifiers, and aggregation methods employed in this field. We compare these approaches under controlled, fair conditions to determine the top-performing techniques. Based on our findings, we introduce methods that outperform the current state of the art.
    Finally, to conclude our advancements in touchscreen authentication technology, we explore the negative effects our work may have on an ordinary user of mobile websites and applications. In particular, we examine threats to the user's privacy, such as tracking users and revealing their personal information based on their behaviour on smartphones.
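    One of the aggregation ideas mentioned above can be sketched as follows: in continuous authentication, per-interaction classifier scores are combined over time before an accept/reject decision is made. The class below averages the scores of the most recent swipes in a sliding window; the window size, the threshold, and the score values are hypothetical, and the thesis may use other aggregation methods.

```python
from collections import deque


class SlidingWindowAuthenticator:
    """Accept or reject from the mean of the last `window` per-swipe scores."""

    def __init__(self, window=5, threshold=0.5):
        self.scores = deque(maxlen=window)  # oldest score drops out automatically
        self.threshold = threshold

    def update(self, score):
        # `score` is assumed to be a classifier's genuine-user probability
        # for a single swipe; returns True while the user is still accepted.
        self.scores.append(score)
        mean = sum(self.scores) / len(self.scores)
        return mean >= self.threshold


auth = SlidingWindowAuthenticator(window=3, threshold=0.5)
decisions = [auth.update(s) for s in [0.9, 0.8, 0.1, 0.2, 0.1]]
print(decisions)  # [True, True, True, False, False]
```

    Averaging trades reaction speed for stability: a single unusual swipe does not lock the user out, but a sustained run of low scores does.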

    What demographic attributes do our digital footprints reveal? A systematic review

    To what extent does our online activity reveal who we are? Recent research has demonstrated that the digital traces left by individuals as they browse and interact with others online may reveal who they are and what their interests may be. In the present paper we report a systematic review that synthesises current evidence on predicting demographic attributes from online digital traces. Studies were included if they met the following criteria: (i) they reported findings where at least one demographic attribute was predicted/inferred from at least one form of digital footprint, (ii) the method of prediction was automated, and (iii) the traces were either visible (e.g. tweets) or non-visible (e.g. clickstreams). We identified 327 studies published up until October 2018. Across these articles, 14 demographic attributes were successfully inferred from digital traces; the most studied included gender, age, location, and political orientation. For each of the demographic attributes identified, we provide a database containing the platforms and digital traces examined, sample sizes, accuracy measures and the classification methods applied. Finally, we discuss the main research trends/findings and methodological approaches, and recommend directions for future research.

    AUTHOR VERIFICATION OF ELECTRONIC MESSAGING SYSTEMS

    Messaging systems have become a hugely popular new paradigm for sending and delivering text messages; however, online messaging platforms have also become an ideal place for criminals because of their anonymity, ease of use, and low cost. The ability to verify the identity of individuals involved in criminal activity is therefore becoming increasingly important. Most research in this area has focused on traditional authorship problems involving single-domain datasets and large bodies of text, and few studies have explored multi-platform author verification as a possible solution to forensic and security problems. This research therefore investigated the ability to identify individuals on messaging systems, applying it to the modern messaging platforms of Email, Twitter, Facebook, and Text messages, and using different single-domain datasets for population-based and user-based verification approaches. Through a novel cross-domain technique using real scenarios, the domain incompatibilities of profiles drawn from different distributions were assessed on real-life corpora containing data from 50 authors who use each of the aforementioned platforms. The results show that, for a population-based approach, linguistic usage is likely to be similar across platforms on average. The best corpus result was a low equal error rate (EER) of 7.97% for Text messages, showing the usefulness of single-domain platforms where linguistic usage is likely to be similar, such as Text messages and Emails. For the user-based approach, there is very little evidence of a strong correlation in stylometry between platforms. It was shown that linguistic features on some individual platforms have features in common with other platforms, and that lexical features play a crucial role in the similarities across users' modern platforms.
    Therefore, this research shows that the ability to identify individuals on messaging platforms may provide a viable solution to forensic and security problems and may help against a range of criminal activities, such as sending spam texts, grooming children, and encouraging violence and terrorism.

    Royal Embassy of Saudi Arabia, London
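    The equal error rate (EER) quoted for the author-verification experiments is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The sketch below estimates it by sweeping a decision threshold over verification scores; the score lists are invented for illustration, and this simple threshold sweep is an approximation, not necessarily the procedure used in the thesis.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Higher scores mean "more likely the claimed author". At threshold t,
    FAR = share of impostor scores accepted (score >= t) and
    FRR = share of genuine scores rejected (score < t).
    Returns the mean of FAR and FRR where the two are closest.
    """
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


# Hypothetical verification scores, not from the thesis.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.2, 0.35, 0.5, 0.7]
print(equal_error_rate(genuine, impostor))  # 0.2
```

    A lower EER means the genuine and impostor score distributions overlap less, which is why 7.97% counts as a good result.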