The potential of artificial intelligence (AI)-based large language models
(LLMs) holds considerable promise in revolutionizing education, research, and
practice. However, distinguishing between human-written and AI-generated text
has become a significant task. This paper presents a comparative study,
introducing a novel dataset of human-written and LLM-generated texts in
different genres: essays, stories, poetry, and Python code. We employ several
machine learning models to classify the texts. Results demonstrate the efficacy
of these models in discerning between human and AI-generated text, despite the
dataset's limited sample size. However, the task becomes more challenging when
classifying GPT-generated text, particularly in story writing. The results
indicate that the models exhibit superior performance in binary classification
tasks, such as distinguishing human-generated text from a specific LLM,
compared to the more complex multiclass tasks that involve discerning among
human-generated and multiple LLMs. Our findings provide insightful implications
for AI text detection while our dataset paves the way for future research in
this evolving area