Gazi Medical Journal, vol.35, no.2, pp.186-191, 2024 (ESCI)
Objective: Amid recent advances in artificial intelligence, ChatGPT by OpenAI has emerged as a versatile tool capable of performing a wide range of tasks; however, its application in medicine is challenged by the complexity of clinical questions and limitations in accuracy. This article compares ChatGPT's performance with that of orthopedic residents at Gazi University on a multiple-choice exam to assess its applicability and reliability in the field of orthopedics.
Methods: In this observational study conducted at Gazi University, 31 orthopedic residents were stratified by experience level and assessed with a 50-question multiple-choice test covering various orthopedic topics. ChatGPT 3.5 was given the same questions, and its responses were evaluated for both correctness and the reasoning behind the answers.
Results: The residents, whose experience ranged from 6 months to 5 years, scored between 23 and 40 out of 50, with a mean score of 30.81; scores varied with seniority. ChatGPT answered 25 of the 50 questions correctly and was consistent across different languages and administration times, but it also showed limitations, giving incorrect responses or stating that the correct answer was not among the choices for some questions.
Conclusion: While ChatGPT can accurately answer some theoretical questions, its effectiveness is limited in interpretive scenarios and in situations involving multiple variables, although its accuracy may improve with future updates.