Beyond human in neurosurgical exams: ChatGPT's success in the Turkish neurosurgical society proficiency board exams.


Şahin M. Ç., Sözer A., Kuzucu P., Turkmen T., Sahin M. B., Sozer E., ...

Computers in Biology and Medicine, vol. 169, p. 107807, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 169
  • Publication Date: 2024
  • DOI: 10.1016/j.compbiomed.2023.107807
  • Journal Name: Computers in Biology and Medicine
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Applied Science & Technology Source, BIOSIS, Biotechnology Research Abstracts, CINAHL, Compendex, Computer & Applied Sciences, EMBASE, INSPEC, Library, Information Science & Technology Abstracts (LISTA)
  • Pages: p. 107807
  • Keywords: Artificial intelligence, Board, ChatGPT, Education, Exam, Large language model, Machine learning
  • Affiliated with Gazi University: Yes

Abstract

Chat Generative Pre-Trained Transformer (ChatGPT) is a sophisticated natural language model that employs advanced deep learning techniques and is trained on extensive datasets to produce human-like conversational responses to user inputs. This study compares ChatGPT's success in the Turkish Neurosurgical Society Proficiency Board Exams (TNSPBE) with that of the actual candidates who took the exams. It also identifies the types of questions ChatGPT answered incorrectly, assesses the quality of its responses, and evaluates its performance by question difficulty. For ranking purposes in this study, the scores of all 260 candidates were recalculated according to the exams they took and the questions included in those exams. Across a total of 523 questions, the candidates' average score was 62.02 ± 0.61, whereas ChatGPT's score was 78.77. We conclude that, in addition to ChatGPT's higher response rate, its success correlated with increasing question clarity (Clarity 1.5, 2.0, 2.5, and 3.0) regardless of the difficulty level of the questions. Among the candidates, however, no such increase was seen in parallel with increasing clarity.