ChatGPT versus human authors: A comparative study of concept maps for clinical reasoning training with virtual patients


Szydlak R., Kiyak Y. S., Hege I., Górski S., Linglart L., Shchudrova T., et al.

Medical Teacher, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Publication Date: 2025
  • DOI: 10.1080/0142159X.2025.2583403
  • Journal: Medical Teacher
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CINAHL, Educational Research Abstracts (ERA), MEDLINE, Public Affairs Index
  • Keywords: Clinical reasoning, concept mapping, large language models, virtual patient
  • Gazi University Affiliated: Yes

Abstract

Purpose: This study investigates whether ChatGPT can generate clinically accurate and pedagogically valuable concept maps for clinical reasoning (CR) training. The aim is to assess its potential as a tool for supporting the creation of high-quality educational resources for CR training.

Materials and methods: We selected 10 diverse virtual patients (VPs) from the European iCoViP project. For each case, CR concept maps were generated by a custom ChatGPT model and compared with expert-created maps available in the CASUS VP system. The comparison encompassed structural metrics (number of concepts, connections, and graph density), clinical content quality (clinical expert evaluation of concept and connection validity), and pedagogical utility (medical educator assessment of clarity, abstraction, and progression). Statistical analysis included Student’s t-tests; inter-rater reliability was assessed with weighted Cohen’s kappa.

Results: ChatGPT-generated maps contained significantly more concepts and connections than expert maps, indicating higher structural complexity (p < 0.001), though graph density did not differ significantly. Clinician evaluations showed comparable clinical content quality across both groups, with no statistically significant differences in concept or connection ratings. The educational review revealed that while ChatGPT maps offered comprehensive information, they lacked abstraction, prioritization, and contextual alignment, occasionally exceeding the optimal cognitive load for learners.

Conclusions: ChatGPT can reliably generate concept maps that match expert-level clinical accuracy. However, limitations in educational clarity and usability underscore the need for expert refinement. With appropriate oversight, large language models (LLMs) such as ChatGPT can support efficient development of learning resources for CR education.
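For readers who want to see how the measurements named in the Methods can be operationalized, the minimal sketch below computes the structural metrics of a concept map and runs the two statistical tests mentioned in the abstract. When a map is modeled as a directed graph with N concepts and E connections, graph density is E / (N(N−1)). This is not the authors' code: the library choices (networkx, SciPy, scikit-learn), the toy concept maps, the sample values, and the quadratic kappa weighting are all illustrative assumptions.

```python
import networkx as nx
from scipy.stats import ttest_ind
from sklearn.metrics import cohen_kappa_score

def structural_metrics(edges):
    """Number of concepts, number of connections, and graph density of one
    concept map, modeled as a directed graph of (source, target) pairs."""
    g = nx.DiGraph()
    g.add_edges_from(edges)
    return g.number_of_nodes(), g.number_of_edges(), nx.density(g)

# Toy stand-ins (invented for illustration) for one ChatGPT-generated map
# and one expert-created map of the same VP case.
gpt_map = [("dyspnea", "heart failure"), ("edema", "heart failure"),
           ("heart failure", "echocardiography"), ("dyspnea", "pneumonia")]
expert_map = [("dyspnea", "heart failure"), ("heart failure", "echocardiography")]

print(structural_metrics(gpt_map))     # (5, 4, 0.20): density = 4 / (5 * 4)
print(structural_metrics(expert_map))  # (3, 2, 0.33): density = 2 / (3 * 2)

# Student's t-test comparing a per-map metric across the two groups
# (the density values here are fabricated placeholders, not study data).
gpt_densities = [0.20, 0.18, 0.22, 0.19]
exp_densities = [0.33, 0.30, 0.28, 0.31]
t_stat, p_value = ttest_ind(gpt_densities, exp_densities, equal_var=True)

# Weighted Cohen's kappa for agreement between two raters' ordinal quality
# ratings; quadratic weighting is an assumed choice, as the abstract does
# not specify the weighting scheme.
rater1 = [3, 4, 4, 2, 5]
rater2 = [3, 4, 3, 2, 5]
kappa = cohen_kappa_score(rater1, rater2, weights="quadratic")
```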