Psychometric Equivalence of Ratings for Repeat Examinees on a Performance Assessment for Physician Licensure

Raymond, Mark; Swygert, Kimberly; Kahraman, Nilüfer

doi:10.1111/j.1745-3984.2012.00180.x

Raymond M. R., Swygert K. A., Kahraman N.

Journal of Educational Measurement, vol. 49, no. 4, pp. 339-361, 2012 (SSCI)

Publication Type: Article / Full Article
Volume: 49 Issue: 4
Publication Date: 2012
DOI: 10.1111/j.1745-3984.2012.00180.x
Journal Name: Journal of Educational Measurement
Indexes Covering the Journal: Social Sciences Citation Index (SSCI), Scopus
Pages: 339-361
Gazi University Affiliated: No

Abstract

Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical skills assessment required for physician licensure. Each examinee was rated in four skill domains: data gathering, communication-interpersonal skills, spoken English proficiency, and documentation proficiency. Conditional standard errors of measurement computed for single-take and multiple-take examinees indicated that ratings were of comparable precision for the two groups within each of the four skill domains; however, conditional errors were larger for low-scoring examinees regardless of retest status. In addition, multiple-take examinees exhibited less score consistency across the skill domains on their first attempt, but their scores became more consistent on their second attempt. Further, the median correlation between scores on the four clinical skill domains and three external measures was .15 for multiple-take examinees on their first attempt but increased to .27 on their second attempt, a value comparable to the median correlation of .26 for single-take examinees. The findings support the validity of inferences based on scores from the second attempt.