Examination of the Reliability of the Measurements Regarding the Written Expression Skills According to Different Test Theories


Creative Commons License

Yildirim Seheryeli M., Tan Ş.

JOURNAL OF MEASUREMENT AND EVALUATION IN EDUCATION AND PSYCHOLOGY-EPOD, vol. 10, pp. 327-347, 2019 (journal indexed in ESCI)

  • Publication Type: Article / Full Article
  • Volume: 10
  • Publication Date: 2019
  • DOI: 10.21031/epod.559470
  • Journal Name: JOURNAL OF MEASUREMENT AND EVALUATION IN EDUCATION AND PSYCHOLOGY-EPOD
  • Pages: 327-347

Abstract

The aim of the study is to examine reliability estimates for an analytical rubric of written expression skills based on Classical Test Theory (CTT), Generalizability Theory (GT) and Item Response Theory (IRT), which differ in their scope. In this descriptive study, the stories written by the 523 students in the study group were scored by seven raters. Under CTT, the Eta coefficient indicated no difference between the raters' scores (eta = .926), and Cronbach's alpha coefficients were above .88. Under GT, the G and Phi coefficients were above .97: the expected differentiation among students emerged, the difficulty levels of the criteria did not change from one student to another, and interrater consistency was excellent. Under IRT, parameters were estimated with Samejima's (1969) Graded Response Model, and item discrimination differed across raters. According to the b parameters, for all raters, individuals need ability levels of at least -2.35, -0.80 and 0.41 to have a .50 probability of being scored above categories 0, 1 and 2, respectively. Marginal reliability coefficients were quite high (around .93). The Fisher Z' statistic was calculated to test the significance of the differences between all reliability estimates. GT provided more detailed information than CTT in explaining sources of error variance and determining reliability, while IRT provided more detailed information than CTT in determining item-level error estimates and ability levels. For interrater reliability, there was a significant difference between the estimates obtained under CTT and GT (p < .05), whereas there was no significant difference between the estimates obtained under CTT and IRT (p > .05).
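Two computations named in the abstract can be illustrated with a short sketch. It is not the authors' code, and several inputs are assumptions. First, the Graded Response Model boundary curve P*(X > k | theta) equals .50 exactly when theta equals the threshold b_k, which is why the reported b values (-2.35, -0.80, 0.41) read as "ability needed for a .50 chance of scoring above category k". The discrimination a = 1.5 is hypothetical, since the abstract reports only that a varies by rater. The logistic form is used without the D = 1.7 scaling constant. Second, a Fisher z' comparison of two reliability coefficients is shown using the independent-samples formula. The coefficients and sample sizes in it are hypothetical, and the study's exact procedure, including how it handled coefficients estimated on the same students, is not described in the abstract.

```python
import math

# b thresholds reported in the abstract (for all raters): ability needed for a
# .50 probability of being scored above categories 0, 1 and 2.
B_THRESHOLDS = [-2.35, -0.80, 0.41]


def boundary_prob(theta, a, b):
    """GRM boundary curve P*(X > k | theta), logistic form without D = 1.7."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def category_probs(theta, a, thresholds):
    """Probability of each score category 0..K as differences of adjacent boundary curves."""
    cum = [1.0] + [boundary_prob(theta, a, b) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def fisher_z_diff(r1, n1, r2, n2):
    """z statistic for the difference of two coefficients after Fisher's z' = atanh(r).

    Assumes the two coefficients come from independent samples.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    a = 1.5  # hypothetical discrimination parameter
    for b in B_THRESHOLDS:
        # At theta == b the boundary probability is .50 regardless of a.
        print(f"theta={b:+.2f}  P*={boundary_prob(b, a, b):.2f}")
    probs = category_probs(0.0, a, B_THRESHOLDS)
    print("category probabilities at theta=0:", [round(p, 3) for p in probs])

    # Hypothetical coefficients and sample sizes, not values from the study.
    z = fisher_z_diff(0.90, 100, 0.85, 100)
    print(f"z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

Because the thresholds increase, the boundary curves are ordered and every category probability is positive. Applying the Fisher transform to reliability coefficients treats them like correlations, as the abstract describes. Tests designed for alpha coefficients (e.g. Feldt's) are an alternative not covered here.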