JOURNAL OF DENTAL SCIENCES, no. 1, pp. 1-2, 2025 (SCI-Expanded, Scopus)
Background/purpose: The use of artificial intelligence (AI)-powered chatbots in dental education is becoming increasingly widespread. Evaluating their performance and the reliability of their sources is essential to understanding their educational value. The aim of this study was to evaluate the performance of AI-powered chatbots in answering orthodontic questions from the Dental Specialty Exam (DUS) and to assess the accuracy and reliability of the information sources on which they rely.

Materials and methods: A total of 129 orthodontic questions from exams administered between 2012 and 2021 were categorized according to Bloom's taxonomy. Each question was individually entered into ChatGPT-5, Claude 3.7, and Copilot, and their performances were comparatively evaluated. The sources referenced by the chatbots while generating their answers were also assessed. The data were analyzed using Pearson's chi-squared test.

Results: ChatGPT-5, Claude 3.7, and Copilot achieved accuracy rates of 82.2%, 83.7%, and 85.3%, respectively. Copilot performed best on scenario-based questions (100%) but worst on visual analysis questions (33.3%). Citation analysis showed that ChatGPT-5 used reliable academic sources, whereas Claude 3.7 cited few and less credible references, and Copilot relied mainly on moderately reliable materials.

Conclusion: The chatbots exhibited strong text-based reasoning but limited visual interpretation skills. While ChatGPT-5 provided more reliable and well-referenced responses, the other models showed weaker citation practices. These findings underscore both the potential and the current limitations of AI-based systems in orthodontic education and clinical practice.
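For illustration only, the sketch below shows how a Pearson's chi-squared comparison of the three chatbots' overall accuracy could be run; the correct-answer counts (106, 108, 110) are assumptions reconstructed from the reported percentages of 129 questions, not the study's raw data.

```python
# Minimal sketch of a Pearson's chi-squared test comparing chatbot accuracy.
# NOTE: correct-answer counts are assumed, back-calculated from the reported
# accuracy rates (82.2%, 83.7%, 85.3% of 129 questions), not the actual data.
from scipy.stats import chi2_contingency

total_questions = 129
correct = {"ChatGPT-5": 106, "Claude 3.7": 108, "Copilot": 110}

# 3x2 contingency table: rows = chatbots, columns = (correct, incorrect).
table = [[c, total_questions - c] for c in correct.values()]

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```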