Journal of Prosthetic Dentistry, 2025 (SCI-Expanded)
Statement of problem: Despite advances in artificial intelligence (AI), the quality, reliability, and understandability of health-related information provided by chatbots remain uncertain. Furthermore, studies on maxillofacial prosthesis (MP) information provided by AI chatbots are lacking.

Purpose: The purpose of this study was to assess and compare the reliability, quality, readability, and similarity of responses to MP-related questions generated by 4 different chatbots.

Material and methods: A total of 15 questions were prepared by a maxillofacial prosthodontist, and responses were obtained from 4 different chatbots (ChatGPT-3.5, Gemini 2.5 Flash, Copilot, and DeepSeek V3). Reliability scoring (adapted DISCERN), the Global Quality Scale (GQS), the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Reading Grade Level (FKRGL), and the Similarity Index (iThenticate) were used to evaluate the performance of the chatbots. Data were compared using the Kruskal-Wallis test, and differences between chatbots were determined with the Conover multiple comparison test with Benjamini-Hochberg correction (α=.05).

Results: No significant differences were found among the chatbots' DISCERN scores, except for one question on which ChatGPT showed significantly higher reliability than Gemini or Copilot (P=.03). No statistically significant differences were found among the AI tools in GQS values (P=.096), FRES values (P=.166), or FKRGL values (P=.247). The similarity rate of Gemini was significantly higher than that of the other AI chatbots (P=.03).

Conclusions: ChatGPT-3.5, Gemini 2.5 Flash, Copilot, and DeepSeek V3 provided good-quality responses. All chatbots' responses were difficult for nonprofessionals to read and understand. Low similarity rates were found for all chatbots except Gemini, indicating the originality of their information.
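
For reference, the two readability indices named in Material and methods are conventionally computed as follows; the abstract does not state the exact implementation used, so the standard definitions are given here:

```latex
\[
\mathrm{FRES} = 206.835
  - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right)
  - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]
\[
\mathrm{FKRGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right)
  + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right)
  - 15.59
\]
```

Higher FRES values indicate easier reading, whereas higher FKRGL values correspond to higher required school-grade levels.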
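
The statistical workflow described in Material and methods (a Kruskal-Wallis omnibus test followed by Conover post hoc comparisons with Benjamini-Hochberg correction) could be reproduced along the lines of the sketch below. The score vectors, library choices (scipy, pandas, scikit-posthocs), and variable names are illustrative assumptions, not the authors' actual analysis.

```python
# A minimal sketch of the comparison described above, assuming per-chatbot
# scores (e.g., GQS values for the 15 questions) are available as lists.
# All values below are hypothetical placeholders.
from scipy.stats import kruskal
import pandas as pd
import scikit_posthocs as sp

scores = {
    "ChatGPT-3.5":      [4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4],
    "Gemini 2.5 Flash": [4, 4, 4, 5, 4, 4, 4, 4, 5, 4, 4, 4, 4, 4, 5],
    "Copilot":          [4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 5, 4, 4, 4],
    "DeepSeek V3":      [5, 4, 4, 4, 4, 5, 4, 4, 4, 5, 4, 4, 4, 4, 4],
}

# Omnibus Kruskal-Wallis test across the 4 chatbots.
h_stat, p_value = kruskal(*scores.values())
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_value:.3f}")

# Conover post hoc comparisons with Benjamini-Hochberg correction,
# run only when the omnibus test is significant at alpha = .05.
if p_value < .05:
    long = pd.DataFrame(
        [(bot, s) for bot, vals in scores.items() for s in vals],
        columns=["chatbot", "score"],
    )
    pairwise = sp.posthoc_conover(
        long, val_col="score", group_col="chatbot", p_adjust="fdr_bh"
    )
    print(pairwise)
```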