INTERNATIONAL JOURNAL OF ASSESSMENT TOOLS IN EDUCATION, vol. 12, no. 1, pp. 20-32, 2025 (ESCI)
This study explores the effectiveness of using ChatGPT, an Artificial
Intelligence (AI) language model, as an Automated Essay Scoring (AES)
tool for grading English as a Foreign Language (EFL) learners’ essays.
The corpus consists of 50 essays of various types, including analysis,
compare-and-contrast, descriptive, narrative, and opinion essays,
written by 10 EFL learners at the B2 level. Human raters and
ChatGPT (GPT-4o mini) scored the essays using the International
English Language Testing System (IELTS) Task 2 Writing band descriptors.
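The paper does not show how the essays were submitted to the model, so the following is a minimal sketch only, assuming the OpenAI Python SDK and the gpt-4o-mini model identifier; the score_essay helper, the prompt wording, and the temperature setting are illustrative assumptions, not the authors' procedure.

# Hypothetical scoring call; prompt wording and helper are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_essay(essay_text: str, band_descriptors: str) -> str:
    """Ask GPT-4o mini for an IELTS Task 2 band score for one essay."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": ("You are an IELTS examiner. Using the Task 2 "
                         "Writing band descriptors below, assign a band "
                         "score from 0 to 9 with a brief rationale.\n\n"
                         + band_descriptors)},
            {"role": "user", "content": essay_text},
        ],
        temperature=0,  # minimize run-to-run variation in scores
    )
    return response.choices[0].message.content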
Adopting a quantitative approach, the study employed Wilcoxon
signed-rank tests and Spearman correlation tests to compare the two
sets of scores, revealing a significant difference between the two
scoring methods, with human raters assigning higher scores than ChatGPT.
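As a rough illustration of this paired analysis, the sketch below runs both tests with SciPy; the score arrays are invented placeholders, not the study's data.

# Illustrative paired comparison; the scores below are placeholders.
from scipy.stats import wilcoxon, spearmanr

# One band score per essay from each rater (hypothetical values).
human_scores   = [7.0, 6.5, 7.5, 6.0, 7.0, 6.5, 8.0, 6.5, 7.0, 7.5]
chatgpt_scores = [6.5, 6.0, 7.0, 5.5, 6.5, 6.0, 7.0, 6.0, 6.5, 7.0]

# Wilcoxon signed-rank test: do the paired scores differ systematically?
w_stat, w_p = wilcoxon(human_scores, chatgpt_scores)
print(f"Wilcoxon signed-rank: W={w_stat:.1f}, p={w_p:.4f}")

# Spearman correlation: do the raters rank the essays similarly?
rho, s_p = spearmanr(human_scores, chatgpt_scores)
print(f"Spearman rho={rho:.2f}, p={s_p:.4f}")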
Significant differences of varying magnitude were also evident for
each of the essay types, suggesting that essay genre was not a
parameter affecting the agreement between human raters and ChatGPT.
Overall, the discussion concludes that while ChatGPT shows promise as
an AES tool, the observed disparities suggest it has not yet reached
sufficient proficiency for practical use. The study emphasizes the
need for improvements in AI language models to address the nuanced
nature of essay evaluation in EFL contexts.