17 research outputs found

    Probabilistic Relational Supervised Topic Modelling using Word Embeddings

    The increasing pace of change in languages affects many applications and algorithms for text processing. Researchers in Natural Language Processing (NLP) have been striving for more generalized solutions that can cope with continuous change. This is even more challenging when applied to short text emanating from social media. Furthermore, social media increasingly exert a major influence on both the development and the use of language. Our work is motivated by the need to develop NLP techniques that can cope with the short informal text used in social media and with the massive volume of textual data uploaded to it daily. In this paper, we describe a novel approach to Short Text Topic Modelling that uses word embeddings and takes into account the informality of words in social media text, with the aim of reducing noise in messy text. We present a new algorithm derived from Term Frequency-Inverse Document Frequency (TF-IDF), named Term Frequency-Inverse Context Term Frequency (TF-ICTF). TF-ICTF relies on a probabilistic relation between words and context with respect to time. Our experimental work shows promising results against other state-of-the-art methods.
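
    For orientation, the classic TF-IDF weighting that the abstract names as the starting point can be sketched as follows. This is only the standard baseline, not TF-ICTF itself: the abstract does not give the TF-ICTF formula, so none is attempted here. The function name `tf_idf` and the sample tweets are illustrative assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document (classic TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical short texts standing in for tweets.
tweets = [
    "new phone launch today".split(),
    "phone battery dies fast".split(),
    "launch party tonight".split(),
]
scores = tf_idf(tweets)
```

    In this toy corpus a term that occurs in only one tweet ("new") receives a higher weight than one shared between tweets ("phone"). With only a handful of words per document, the frequency estimates are coarse, which is consistent with the abstract's motivation for a context-based alternative.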

    Comprehensive Evaluations of Student Performance Estimation via Machine Learning

    Success in student learning is the primary aim of the educational system. Artificial intelligence utilizes data and machine learning to achieve excellence in student learning. In this paper, we exploit several machine learning techniques to estimate early student performance. Two main simulations are used for the evaluation. The first simulation applied seven Traditional Machine Learning Classifiers (TMLCs) to the House dataset: Gaussian Naïve Bayes (GNB), Support Vector Machine (SVM), Decision Tree (DT), Multi-Layer Perceptron (MLP), Random Forest (RF), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). The best result was achieved by the MLP classifier with a split of 80% training and 20% testing, giving an accuracy of 88.89%. A fusion of these seven classifiers was also applied, and its highest result was equal to that of the MLP. In the second simulation, a Convolutional Neural Network (CNN) was evaluated on five main datasets, namely House, Western Ontario University (WOU), Experience Application Programming Interface (XAPI), University of California-Irvine (UCI), and Analytics Vidhya (AV). The UCI dataset was subdivided into three datasets, namely UCI-Math, UCI-Por, and UCI-Fused, and the AV dataset has three targets: Math, Reading, and Writing. The best accuracies were 97.5%, 99.55%, 98.57%, 99.28%, 99.40%, 99.67%, 92.93%, 96.99%, and 96.84% for the House, WOU, XAPI, UCI-Math, UCI-Por, UCI-Fused, AV-Math, AV-Reading, and AV-Writing datasets, respectively, under the same evaluation protocol. The results demonstrate that the proposed CNN-based method surpasses all seven conventional methods and other state-of-the-art work.
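
    The abstract does not say how the seven classifiers were fused. One common option is a per-sample majority vote, sketched below as an assumption rather than as the authors' procedure. The helper names `majority_vote` and `accuracy`, the labels, and the predictions are all hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-classifier label lists by majority vote, sample by sample.

    Ties are broken in favour of the tied label predicted by the
    earliest classifier in the list.
    """
    fused = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        best = max(counts.values())
        fused.append(next(v for v in sample_votes if counts[v] == best))
    return fused

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical predictions from three classifiers on five test samples.
y_true = ["pass", "fail", "pass", "pass", "fail"]
preds = [
    ["pass", "fail", "fail", "pass", "fail"],
    ["pass", "pass", "pass", "pass", "fail"],
    ["fail", "fail", "pass", "fail", "fail"],
]
fused = majority_vote(preds)
```

    Calling `accuracy(y_true, fused)` and `accuracy(y_true, preds[0])` compares the fused output with a single classifier on the same held-out samples.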

    Gendered STEM Workforce in the United Kingdom: The Role of Gender Bias in Job Advertising

    Evidence submitted to the ‘Diversity in STEM’ Inquiry, Science and Technology Committee, House of Commons, UK Parliament

    Balancing Gender Bias in Job Advertisements with Text-Level Bias Mitigation

    Despite progress towards gender equality in the labor market over the past few decades, gender segregation in labor force composition and labor market outcomes persists. Evidence has shown that job advertisements may express gender preferences, which may selectively attract potential job candidates to apply for a given post and thus reinforce gendered labor force composition and outcomes. Removing gender-explicit words from job advertisements does not fully solve the problem, as certain implicit traits are more closely associated with men, such as ambitiousness, while others are more closely associated with women, such as considerateness. However, it is not always possible to find neutral alternatives for these traits, making it hard to search for candidates with desired characteristics without entailing gender discrimination. Existing algorithms mainly focus on detecting the presence of gender biases in job advertisements without providing a solution for how the text should be (re)worded. To address this problem, we propose an algorithm that evaluates gender bias in the input text and provides guidance on how the text should be debiased by offering alternative wording that is closely related to the original input. Our proposed method promises broad application in the human resources process, ranging from the development of job advertisements to algorithm-assisted screening of job applications.
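
    As a rough illustration of what "evaluating gender bias in the input text" can mean at its simplest, the sketch below counts words from two small gender-coded lists and reports their balance. It is not the algorithm proposed in the paper, which the abstract does not specify. The word lists, the scoring rule, and the name `bias_score` are all hypothetical.

```python
import re

# Hypothetical, illustrative word lists; not the authors' lexicon.
MASCULINE = {"ambitious", "competitive", "dominant"}
FEMININE = {"considerate", "supportive", "collaborative"}

def bias_score(text):
    """Return a score in [-1, 1].

    Positive means more masculine-coded words, negative means more
    feminine-coded words, and 0.0 means balanced or none found.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    m = sum(t in MASCULINE for t in tokens)
    f = sum(t in FEMININE for t in tokens)
    return 0.0 if m + f == 0 else (m - f) / (m + f)

ad = "We seek an ambitious, competitive and supportive engineer."
score = bias_score(ad)
```

    A word-count score like this can only flag an imbalance. Suggesting replacement wording that stays close to the original text, as the abstract describes, needs more than a fixed lexicon.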

    Natural Language Processing methods for short informal text

    The English language is changing faster than at any time before. Social media plays a great role in this change, as it has become an essential part of people's social lives. Thoughts, ideas, feelings, and even special moments are the main content of posts on Twitter and Facebook, the most popular social media platforms. In this work, we address the problem of language change and how it affects traditional Natural Language Processing (NLP) techniques in this specific domain. The domain is considered a challenge for many NLP methods, such as topic modelling, named entity recognition, and sentiment analysis. We produced novel NLP methods that target the informality of short text. Our first novel model is in topic modelling for short messy text. The proposed model was inspired by the relation between a word's frequency and the frequencies of its context words (the words surrounding the selected word) over time. This relation was translated into co-occurrence patterns and stored as word embeddings after being transformed into feature space. The features were generated from the frequencies of words and context words by our novel Term Frequency-Inverse Context Term Frequency (TF-ICTF) algorithm. TF-ICTF was derived from the standard Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, which did not perform well on short messy text. The proposed model is based on word probabilities and on co-occurrences between words within the short text. Therefore, we named our proposed approach Probabilistic Relational Supervised Topic Modelling. The second approach addresses non-standard entities in short text. We proposed a new model using word-pattern embeddings generated from streamed Twitter data. These patterns should include entities that are identified by state-of-the-art named entity recognition (NER) algorithms. We named our approach Probabilistic Named Entity Recognition (PNER). PNER was trained on the identified entities in the pattern embeddings to identify entities in non-standard formats. Lastly, our Probabilistic co-occurrence Relational Sentiment (PR_Sentiment) approach is proposed to classify the sentiment of tweets. We used sentiment patterns detected from short-text tweets. These patterns are structured by an n-gram technique: the n-grams are detected from sentiment-annotated tweets and labeled accordingly. The dataset used is a standard dataset with more than one million annotated tweets. Moreover, the PR_Sentiment model performs in near real time. The aim of our project is to address the informality and non-standardization of short text in social media and to produce novel NLP methods. These methods were designed as a novel approach towards generalising the processing of short messy text. Therefore, our methods have been tested and compared against several state-of-the-art approaches to show novelty.
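
    The idea of detecting n-grams in sentiment-annotated tweets and labeling them accordingly can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PR_Sentiment model: the abstract does not describe the scoring, so a simple vote over matched n-grams is assumed. The function names and the tiny training set are hypothetical.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_patterns(labelled_tweets, n=2):
    """Count, for every n-gram, how often it occurs under each label."""
    counts = defaultdict(Counter)
    for text, label in labelled_tweets:
        for gram in ngrams(text.lower().split(), n):
            counts[gram][label] += 1
    return counts

def classify(text, counts, n=2):
    """Sum the label counts of all known n-grams in the text.

    Returns the label with the most votes, or None if no n-gram matches.
    """
    votes = Counter()
    for gram in ngrams(text.lower().split(), n):
        votes.update(counts.get(gram, {}))
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical annotated tweets.
train = [
    ("i love this phone", "pos"),
    ("love this song", "pos"),
    ("i hate this phone", "neg"),
]
patterns = count_patterns(train)
label = classify("really love this", patterns)
```

    Lookup is a dictionary access per n-gram, so classification cost grows linearly with tweet length.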

    Probabilistic Named Entity Recognition for nonstandard format entities using cooccurrence word embeddings

    The use of short text has become widespread on social media such as Twitter and Facebook. Typically, users on social media platforms adopt nonstandard format terms when posting. This introduces challenges for Information Retrieval (IR) and Natural Language Processing (NLP), and standard or classical methods tend not to perform well in this domain. In this paper, we address one of the challenges in IR, namely Named Entity Recognition (NER). We introduce a novel probabilistic approach that targets entities occurring in an informal (nonstandard) format within short text. The Probabilistic Named Entity Recognition (PNER) model identifies these entities using cooccurrence patterns. These patterns were detected using the word cooccurrence embeddings of 278.6 million tweets. The results show an enhancement of 7% on two standard methods when used in combination with PNER. The testing dataset was created using the standard methods in addition to street names and places taken from the Open Street Map (OSM) database. © 2019 IEEE
