INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, vol. 12, no. 1, pp. 20-32, 2025 (ESCI)
This study explores the effectiveness of using ChatGPT, an Artificial
Intelligence (AI) language model, as an Automated Essay Scoring (AES)
tool for grading English as a Foreign Language (EFL) learners’ essays.
The corpus consists of 50 essays of various types, including analysis,
compare-and-contrast, descriptive, narrative, and opinion essays,
written by 10 EFL learners at the B2 level. Human raters and
ChatGPT (GPT-4o mini) scored the essays using the International
English Language Testing System (IELTS) Task 2 Writing band descriptors.
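The paper does not show how the essays were submitted to the model, so the following is a minimal sketch only, assuming the OpenAI Python SDK and the gpt-4o-mini model identifier; the score_essay helper, the prompt wording, and the temperature setting are illustrative assumptions, not the authors' procedure.

# Hypothetical scoring call; prompt wording and helper are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_essay(essay_text: str, band_descriptors: str) -> str:
    """Ask GPT-4o mini for an IELTS Task 2 band score for one essay."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": ("You are an IELTS examiner. Using the Task 2 "
                         "Writing band descriptors below, assign a band "
                         "score from 0 to 9 with a brief rationale.\n\n"
                         + band_descriptors)},
            {"role": "user", "content": essay_text},
        ],
        temperature=0,  # minimize run-to-run variation in scores
    )
    return response.choices[0].message.content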
Adopting a quantitative approach, the study employed Wilcoxon
signed-rank tests and Spearman correlation tests to compare the two
sets of scores, revealing a significant difference between the two
scoring methods, with human raters assigning higher scores than ChatGPT.
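As a rough illustration of this paired analysis, the sketch below runs both tests with SciPy; the score arrays are invented placeholders, not the study's data.

# Illustrative paired comparison; the scores below are placeholders.
from scipy.stats import wilcoxon, spearmanr

# One band score per essay from each rater (hypothetical values).
human_scores   = [7.0, 6.5, 7.5, 6.0, 7.0, 6.5, 8.0, 6.5, 7.0, 7.5]
chatgpt_scores = [6.5, 6.0, 7.0, 5.5, 6.5, 6.0, 7.0, 6.0, 6.5, 7.0]

# Wilcoxon signed-rank test: do the paired scores differ systematically?
w_stat, w_p = wilcoxon(human_scores, chatgpt_scores)
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.4f}")

# Spearman correlation: do the raters rank the essays similarly?
rho, s_p = spearmanr(human_scores, chatgpt_scores)
print(f"Spearman rho={rho:.2f}, p={s_p:.4f}")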
Significant differences of varying magnitude were also evident for
each of the essay types, suggesting that essay genre was not a
parameter affecting the agreement between human raters and ChatGPT.
Overall, the discussion concludes that while ChatGPT shows promise as
an AES tool, the observed disparities suggest it has not yet reached
sufficient proficiency for practical use. The study emphasizes the
need for improvements in AI language models to address the nuanced
nature of essay evaluation in EFL contexts.