Distinguishing Human-Generated and AI-Generated Academic Writing: A Machine Learning Benchmark Study

Abstract

The rapid adoption of large language models (LLMs) such as ChatGPT has raised critical questions about authorship, originality, and integrity in academic writing. AI-generated or AI-rephrased text can preserve the meaning and context of the original while changing the writing style, which makes it difficult to detect with the similarity checks used by conventional plagiarism tools. This study addresses this challenge by creating a domain-specific corpus of postgraduate-level academic texts. The corpus contains 22,520 samples, equally divided between human-written and AI-rephrased text. All samples were preprocessed and represented using two common techniques: TF-IDF and Word2Vec. The dataset was evaluated using well-known machine learning and deep learning models, including Logistic Regression, Support Vector Machines, Recurrent Neural Networks, and the transformer-based models BERT and T5. The results show that linear and sequential models provide low baseline performance, with accuracy between 50% and 54%. In contrast, BERT significantly outperforms the other models, achieving 83% precision along with high recall. Confusion matrix analysis further shows that traditional models tend to overpredict AI authorship, whereas BERT reliably distinguishes human-written from AI-generated text. These findings indicate that transformer-based models are more effective for authorship verification in academic settings, and they highlight the trade-offs among interpretability, computational cost, and predictive performance. Overall, this study offers recommendations for building credible, transparent, and domain-sensitive AI detectors for academia.
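
The abstract reports accuracy, precision, recall, and a confusion matrix analysis in which weaker models overpredict AI authorship. The following is a minimal sketch of how those quantities relate on a balanced two-class set. It is not the authors' evaluation code: the class name BinaryConfusion, the helper confusion_from_labels, the label strings "ai" and "human", and every count below are assumptions made for illustration, not figures from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryConfusion:
    """Confusion counts with 'AI-generated' treated as the positive class."""
    tp: int  # AI text predicted as AI
    fp: int  # human text predicted as AI
    fn: int  # AI text predicted as human
    tn: int  # human text predicted as human

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total if self.total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def predicted_ai_rate(self) -> float:
        """Share of all samples labelled AI; far above 0.5 on a balanced
        set indicates overprediction of AI authorship."""
        return (self.tp + self.fp) / self.total if self.total else 0.0


def confusion_from_labels(y_true, y_pred, positive="ai"):
    """Tally confusion counts from parallel sequences of labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return BinaryConfusion(tp=tp, fp=fp, fn=fn, tn=tn)


# Hypothetical counts (NOT from the study): 100 AI and 100 human samples,
# scored by a model that labels most inputs as AI.
overpredictor = BinaryConfusion(tp=90, fp=85, fn=10, tn=15)

if __name__ == "__main__":
    print(f"accuracy          = {overpredictor.accuracy():.3f}")
    print(f"precision         = {overpredictor.precision():.3f}")
    print(f"recall            = {overpredictor.recall():.3f}")
    print(f"predicted-AI rate = {overpredictor.predicted_ai_rate():.3f}")
```

With these made-up counts, recall on the AI class is high while accuracy stays close to chance, because most human-written samples are also labelled AI. This is why the confusion matrix, rather than accuracy or recall alone, is what exposes the overprediction pattern.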

VFAST - Virtual Foundation for Advancement of Science and Technology (Pakistan)

Last time updated on 16/05/2026
