ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.


Kıyak Y. S., Coşkun Ö., Budakoğlu I. İ., Uluoğlu C.

European Journal of Clinical Pharmacology, vol.80, no.5, pp.729-735, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 80 Issue: 5
  • Publication Date: 2024
  • DOI Number: 10.1007/s00228-024-03649-x
  • Journal Name: European Journal of Clinical Pharmacology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, BIOSIS, CAB Abstracts, Chimica, CINAHL, EMBASE
  • Page Numbers: pp.729-735
  • Keywords: Artificial intelligence, Automatic item generation, ChatGPT, Medical education, Multiple-choice questions, Rational pharmacotherapy
  • Gazi University Affiliated: Yes

Abstract

Purpose

Artificial intelligence, specifically large language models such as ChatGPT, offers potential benefits for question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions with ChatGPT, in terms of item difficulty and discrimination levels.

Methods

This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship based on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following review by an expert panel, two of these questions were incorporated into a medical school exam without any changes. After the exam was administered, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options.
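
To make the generation step concrete, the sketch below shows one way to request such questions from a large language model via the OpenAI Python client. The prompt wording, model name, and client configuration are illustrative assumptions; the abstract does not reproduce the study's actual prompt.

    # Minimal sketch of prompting a large language model to generate
    # case-based MCQs. The prompt text and model name are illustrative
    # assumptions, not the prompt or model used in the study.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompt = (
        "Write 10 case-based multiple-choice questions on the rational "
        "pharmacotherapy of hypertension, following the WHO 6-Step Model. "
        "Each question should present a short patient vignette and five "
        "options (A-E) with a single best answer."
    )

    response = client.chat.completions.create(
        model="gpt-4",  # assumed model; the study reports using ChatGPT
        messages=[{"role": "user", "content": prompt}],
    )

    print(response.choices[0].message.content)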

Results

Both questions exhibited acceptable point-biserial correlations (0.41 and 0.39), above the 0.30 threshold. However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants), while the other question had none.
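
For readers who want to reproduce this kind of item analysis, the sketch below computes item difficulty, point-biserial discrimination, and option functionality, assuming a 0/1 item-score vector, a total-score vector, and the option each examinee chose. The data here are synthetic and all names are illustrative, not the study's data.

    # Minimal sketch of the item statistics reported above: difficulty,
    # point-biserial discrimination, and option functionality.
    # The data are synthetic; shapes and names are illustrative assumptions.
    import numpy as np
    from scipy.stats import pointbiserialr

    rng = np.random.default_rng(0)
    n = 99  # number of examinees, matching the study's sample size

    item = rng.integers(0, 2, size=n)           # 1 = correct, 0 = incorrect
    total = item + rng.normal(10, 2, size=n)    # total exam score per examinee
    chosen = rng.choice(list("ABCDE"), size=n)  # option selected by each examinee

    difficulty = item.mean()                         # proportion answering correctly
    discrimination, _ = pointbiserialr(item, total)  # point-biserial correlation

    # An option is "non-functional" if fewer than 5% of examinees chose it.
    freq = {opt: (chosen == opt).mean() for opt in "ABCDE"}
    non_functional = [opt for opt, p in freq.items() if p < 0.05]

    print(f"difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
    print("non-functional options:", non_functional)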

Conclusions

The findings showed that the questions can effectively differentiate between high- and low-performing students, which also points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items and gather data from diverse institutions and settings, thereby enhancing the external validity of these results.