JOURNAL OF DENTAL SCIENCES, no. 1, pp. 1-2, 2025 (SCI-Expanded, Scopus)
Background/purpose: The use of artificial intelligence (AI)-powered chatbots in dental education is becoming increasingly widespread. Evaluating their performance and the reliability of their sources is essential to understanding their educational value. The aim of this study was to evaluate the performance of AI-powered chatbots in answering orthodontic questions from the Dental Specialty Exam (DUS) and to assess the accuracy and reliability of the information sources on which they rely.

Materials and methods: A total of 129 orthodontic questions from exams administered between 2012 and 2021 were categorized according to Bloom's taxonomy. Each question was individually entered into ChatGPT-5, Claude 3.7, and Copilot, and their performances were comparatively evaluated. The sources referenced by the chatbots while generating their answers were also assessed. The data were analyzed using Pearson's chi-squared test.

Results: ChatGPT-5, Claude 3.7, and Copilot achieved accuracy rates of 82.2%, 83.7%, and 85.3%, respectively. Copilot performed best on scenario-based questions (100%) but worst on visual analysis questions (33.3%). Citation analysis showed that ChatGPT-5 used reliable academic sources, whereas Claude 3.7 cited few and less credible references, and Copilot relied mainly on moderately reliable materials.

Conclusion: The chatbots exhibited strong text-based reasoning but limited visual interpretation skills. While ChatGPT-5 provided more reliable and well-referenced responses, the other models showed weaker citation practices. These findings underscore both the potential and the current limitations of AI-based systems in orthodontic education and clinical practice.
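For illustration only, the sketch below shows how a Pearson's chi-squared comparison of the three chatbots' overall accuracy could be run; the correct-answer counts (106, 108, 110) are assumptions reconstructed from the reported percentages of 129 questions, not the study's raw data.

```python
# Minimal sketch of a Pearson's chi-squared test comparing chatbot accuracy.
# NOTE: correct-answer counts are assumed, back-calculated from the reported
# accuracy rates (82.2%, 83.7%, 85.3% of 129 questions), not the actual data.
from scipy.stats import chi2_contingency

total_questions = 129
correct = {"ChatGPT-5": 106, "Claude 3.7": 108, "Copilot": 110}

# 3x2 contingency table: rows = chatbots, columns = (correct, incorrect).
table = [[c, total_questions - c] for c in correct.values()]

chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.3f}")
```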