The purpose of this study is to evaluate the performance of Scholar GPT in answering technical questions in the field of oral and maxillofacial surgery and to compare the results with those of a previous study that assessed the performance of ChatGPT.
Materials and Methods
Scholar GPT was accessed via ChatGPT (www.chatgpt.com) on March 20, 2024. A total of 60 technical questions (15 each on impacted teeth, dental implants, temporomandibular joint disorders, and orthognathic surgery) from our previous study were used. Scholar GPT's responses were evaluated using a modified Global Quality Scale (GQS). The questions were randomized before scoring using an online randomizer (www.randomizer.org). A single researcher performed the evaluations at three different times, three weeks apart, with each evaluation preceded by a new randomization. In cases of score discrepancies, a fourth evaluation was conducted to determine the final score.
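As an illustrative sketch of the re-randomization step (the study itself used www.randomizer.org), the snippet below shows how a question list could be shuffled anew before each scoring round; the question labels and the loop over three rounds are placeholders, not the study's materials.

```python
# Minimal sketch of re-randomizing question order before each scoring round.
# The study used www.randomizer.org; this local equivalent is illustrative only.
import random

questions = [f"Q{i + 1}" for i in range(60)]  # placeholder labels for the 60 questions

for round_number in range(1, 4):              # three scoring rounds, three weeks apart
    rng = random.Random()                     # fresh generator for each round
    order = questions[:]                      # copy so the master list stays intact
    rng.shuffle(order)
    print(f"Round {round_number} order:", order[:5], "...")
```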
Results
Scholar GPT performed well across all technical questions, with a mean GQS score of 4.48 (SD = 0.93). By comparison, ChatGPT's mean GQS score in the previous study was 3.1 (SD = 1.492). The Wilcoxon signed-rank test indicated that Scholar GPT's scores were significantly higher than ChatGPT's (mean difference = 2.00, SE = 0.163, p < 0.001). The Kruskal-Wallis test showed no statistically significant differences among the topic groups (χ² = 0.799, df = 3, p = 0.850, ε² = 0.0135).
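For readers who wish to reproduce this type of comparison, the sketch below shows how paired GQS scores could be compared with a Wilcoxon signed-rank test and how scores across topic groups could be compared with a Kruskal-Wallis test using SciPy. All score arrays are hypothetical placeholders, not the study data.

```python
# Minimal sketch of the statistical comparison, assuming paired GQS scores (1-5)
# per question for both models; the values below are placeholders, not study data.
from scipy.stats import wilcoxon, kruskal

scholar_gpt_scores = [5, 4, 5, 3, 5, 4]  # placeholder values
chatgpt_scores     = [3, 2, 4, 3, 3, 2]  # placeholder values

# Paired comparison between the two models on the same questions
stat, p_paired = wilcoxon(scholar_gpt_scores, chatgpt_scores)
print(f"Wilcoxon signed-rank: W={stat}, p={p_paired:.4f}")

# Hypothetical Scholar GPT scores split by topic (15 questions per topic in the study)
impacted_teeth  = [5, 4, 5]  # placeholder values
dental_implants = [4, 5, 5]
tmj_disorders   = [5, 3, 4]
orthognathic    = [5, 5, 4]

# Comparison across the four topic groups
h_stat, p_groups = kruskal(impacted_teeth, dental_implants, tmj_disorders, orthognathic)
print(f"Kruskal-Wallis: H={h_stat:.3f}, p={p_groups:.4f}")
```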
Conclusion
Scholar GPT demonstrated generally high performance on technical questions in oral and maxillofacial surgery and produced more consistent, higher-quality responses than ChatGPT. The findings suggest that GPT models grounded in academic databases can provide more accurate and reliable information. Furthermore, developing a specialized GPT model for oral and maxillofacial surgery could yield higher-quality and more consistent artificial intelligence-generated information.