500 research outputs found

    A comparison of various machine learning algorithms and execution of flask deployment on essay grading

    Get PDF
    Students’ performance can be assessed based on grading the answers written by the students during their examination. Currently, students are assessed manually by the teachers. This is a cumbersome task due to an increase in the student-teacher ratio. Moreover, due to coronavirus disease (COVID-19) pandemic, most of the educational institutions have adopted online teaching and assessment. To measure the learning ability of a student, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating the essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance is an issue which creates the problem in predicting the essay score due to uneven distribution of essay scores in the training data. We handled this issue using random over sampling technique which generates even distribution of essay scores. Also, we built a web application using flask and deployed the machine learning models. Subsequently, all the models have been evaluated using accuracy, precision, recall, and F1-score. It is found that random forest algorithm outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%

    Development of an Automated Scoring Model Using SentenceTransformers for Discussion Forums in Online Learning Environments

    Get PDF
    Due to the limitations of public datasets, research on automatic essay scoring in Indonesian has been restrained and resulted in suboptimal accuracy. In general, the main goal of the essay scoring system is to improve execution time, which is usually done manually with human judgment. This study uses a discussion forum in online learning to generate an assessment between the responses and the lecturer\u27s rubric in the automated essay scoring. A SentenceTransformers pre-trained model that can construct the highest vector embedding was proposed to identify the semantic meaning between the responses and the lecturer\u27s rubric. The effectiveness of monolingual and multilingual models was compared. This research aims to determine the model\u27s effectiveness and the appropriate model for the Automated Essay Scoring (AES) used in paired sentence Natural Language Processing tasks. The distiluse-base-multilingual-cased-v1 model, which employed the Pearson correlation method, obtained the highest performance. Specifically, it obtained a correlation value of 0.63 and a mean absolute error (MAE) score of 0.70. It indicates that the overall prediction result is enhanced when compared to the earlier regression task research

    Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer Review

    Full text link
    Peer review is the method employed by the scientific community for evaluating research advancements. In the field of cybersecurity, the practice of double-blind peer review is the de-facto standard. This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences. Specifically, we investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models. To facilitate our study, we construct a comprehensive dataset by collecting thousands of papers from renowned computer science conferences and the arXiv preprint website. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers. Our experimental evaluation of review outcome prediction using the Doc2Vec-based approach performs significantly better than the ChatGPT and achieves an accuracy of over 90%. While analyzing the experimental results, we identify the potential advantages and limitations of the tested ML models. We explore areas within the paper-reviewing process that can benefit from automated support approaches, while also recognizing the irreplaceable role of human intellect in certain aspects that cannot be matched by state-of-the-art AI techniques

    Assessing text readability and quality with language models

    Get PDF
    Automatic readability assessment is considered as a challenging task in NLP due to its high degree of subjectivity. The majority prior work in assessing readability has focused on identifying the level of education necessary for comprehension without the consideration of text quality, i.e., how naturally the text flows from the perspective of a native speaker. Therefore, in this thesis, we aim to use language models, trained on well-written prose, to measure not only text readability in terms of comprehension but text quality. In this thesis, we developed two word-level metrics based on the concordance of article text with predictions made using language models to assess text readability and quality. We evaluate both metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by measuring the correlation between scores assigned by our metrics and human raters. According to the experimental results, our metrics are strongly correlated with text quality, which achieve 0.4-0.6 correlations on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other language models, including the bigram model, LSTM, and bidirectional LSTM, on the task of estimating text quality in a zero-shot setting, and GPT-2 perplexity-based measure is a reasonable indicator for text quality evaluation

    Assessing text readability and quality with language models

    Get PDF
    Automatic readability assessment is considered as a challenging task in NLP due to its high degree of subjectivity. The majority prior work in assessing readability has focused on identifying the level of education necessary for comprehension without the consideration of text quality, i.e., how naturally the text flows from the perspective of a native speaker. Therefore, in this thesis, we aim to use language models, trained on well-written prose, to measure not only text readability in terms of comprehension but text quality. In this thesis, we developed two word-level metrics based on the concordance of article text with predictions made using language models to assess text readability and quality. We evaluate both metrics on a set of corpora used for readability assessment or automated essay scoring (AES) by measuring the correlation between scores assigned by our metrics and human raters. According to the experimental results, our metrics are strongly correlated with text quality, which achieve 0.4-0.6 correlations on 7 out of 9 datasets. We demonstrate that GPT-2 surpasses other language models, including the bigram model, LSTM, and bidirectional LSTM, on the task of estimating text quality in a zero-shot setting, and GPT-2 perplexity-based measure is a reasonable indicator for text quality evaluation

    Automatic inference of causal reasoning chains from student essays

    Get PDF
    While there has been an increasing focus on higher-level thinking skills arising from the Common Core Standards, many high-school and middle-school students struggle to combine and integrate information from multiple sources when writing essays. Writing is an important learning skill, and there is increasing evidence that writing about a topic develops a deeper understanding in the student. However, grading essays is time consuming for teachers, resulting in an increasing focus on shallower forms of assessment that are easier to automate, such as multiple-choice tests. Existing essay grading software has attempted to ease this burden but relies on shallow lexico-syntactic features and is unable to understand the structure or validity of a student’s arguments or explanations. Without the ability to understand a student’s reasoning processes, it is impossible to write automated formative assessment systems to assist students with improving their thinking skills through essay writing. In order to understand the arguments put forth in an explanatory essay in the science domain, we need a method of representing the causal structure of a piece of explanatory text. Psychologists use a representation called a causal model to represent a student\u27s understanding of an explanatory text. This consists of a number of core concepts, and a set of causal relations linking them into one or more causal chains, forming a causal model. In this thesis I present a novel system for automatically constructing causal models from student scientific essays using Natural Language Processing (NLP) techniques. The problem was decomposed into 4 sub-problems - assigning essay concepts to words, detecting causal-relations between these concepts, resolving coreferences within each essay, and using the structure of the whole essay to reconstruct a causal model. Solutions to each of these sub-problems build upon the predictions from the solutions to earlier problems, forming a sequential pipeline of models. Designing a system in this way allows later models to correct for false positive predictions from downstream models. However, this also has the disadvantage that errors made in earlier models can propagate through the system, negatively impacting the upstream models, and limiting their accuracy. Producing robust solutions for the initial 2 sub problems, detecting concepts, and parsing causal relations between them, was critical in building a robust system. A number of sequence labeling models were trained to classify the concepts associated with each word, with the most effective approach being a bidirectional recurrent neural network (RNN), a deep learning model commonly applied to word labeling problems. This is because the RNN used pre-trained word embeddings to better generalize to rarer words, and was able to use information from both ends of each sentence to infer a word\u27s concept. The concepts predicted by this model were then used to develop causal relation parsing models for detecting causal connections between these concepts. A shift-reduce dependency parsing model was trained using the SEARN algorithm and out-performed a number of other approaches by better utilizing the structure of the problem and directly optimizing the error metric used. Two pre-trained coreference resolution systems were used to resolve coreferences within the essays. However a word tagging model trained to predict anaphors combined with a heuristic for determining the antecedent out-performed these two systems. Finally, a model was developed for parsing a causal model from an entire essay, utilizing the solutions to the three previous problems. A beam search algorithm was used to produce multiple parses for each sentence, which in turn were combined to generate multiple candidate causal models for each student essay. A reranking algorithm was then used to select the optimal causal model from all of the generated candidates. An important contribution of this work is that it represents a system for parsing a complete causal model of a scientific essay from a student\u27s written answer. Existing systems have been developed to parse individual causal relations, but no existing system attempts to parse a sequence of linked causal relations forming a causal model from an explanatory scientific essay. It is hoped that this work can lead to the development of more robust essay grading software and formative assessment tools, and can be extended to build solutions for extracting causality from text in other domains. In addition, I also present 2 novel approaches for optimizing the micro-F1 score within the design of two of the algorithms studied: the dependency parser and the reranking algorithm. The dependency parser uses a custom cost function to estimate the impact of parsing mistakes on the overall micro-F1 score, while the reranking algorithm allows the micro-F1 score to be optimized by tuning the beam search parameter to balance recall and precision
    • …
    corecore