Journal of Measurement and Evaluation in Education and Psychology, vol. 15, no. 1, pp. 65-78, 2024 (ESCI)
This study examined how testlets composed of open-ended and multiple-choice items with similar content affect reliability. Two mathematics achievement tests, one with multiple-choice items and one with open-ended items, were administered to 128 eighth-grade students. Reliability was estimated in the Edu-G program within the framework of Generalizability Theory, and a decision study was also conducted. For the open-ended test, the fully crossed p×i×r design (p: person, i: item, r: rater) was used when the testlet effect was ignored, and the nested p×(i:t)×r design (t: testlet) was used when it was modeled. For the multiple-choice test, the crossed p×i design was used when the testlet effect was ignored, and the nested p×(i:t) design was used when it was modeled. In both tests, the estimated reliability coefficient was higher when the testlet effect was ignored. The reliability coefficient was higher for the open-ended test than for the multiple-choice test, and modeling the testlet effect changed the coefficient more in the open-ended test than in the multiple-choice test. The decision studies showed that increasing the number of items and increasing the number of testlets both improved reliability, with additional testlets contributing more. In the open-ended test, increasing the number of raters contributed less to reliability than increasing the number of items or testlets.
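The decision-study logic for the two multiple-choice designs can be sketched in a few lines of Python. This is a minimal illustration, not the Edu-G procedure used in the study. It assumes the standard relative generalizability coefficient, Eρ² = σ²(p) / (σ²(p) + σ²(δ)), with relative error σ²(pi,e)/n_i for the p×i design and σ²(pt)/n_t + σ²(pi:t,e)/(n_t·n_i) for the p×(i:t) design. The function names and the variance components below are hypothetical values chosen for illustration; they are not the study's estimates.

```python
def g_coef_crossed(var_p, var_pi_e, n_i):
    """Relative G coefficient for the crossed p x i design."""
    rel_error = var_pi_e / n_i
    return var_p / (var_p + rel_error)


def g_coef_nested(var_p, var_pt, var_pit_e, n_t, n_i_per_t):
    """Relative G coefficient for the nested p x (i:t) design,
    with n_i_per_t items inside each of n_t testlets."""
    rel_error = var_pt / n_t + var_pit_e / (n_t * n_i_per_t)
    return var_p / (var_p + rel_error)


# Hypothetical variance components (NOT taken from the study).
VAR_P = 0.04      # person (universe-score) variance
VAR_PT = 0.02     # person-by-testlet interaction
VAR_PIT_E = 0.16  # person-by-item-within-testlet interaction plus error

# D study: compare doubling testlets with doubling items per testlet.
baseline = g_coef_nested(VAR_P, VAR_PT, VAR_PIT_E, n_t=4, n_i_per_t=5)
more_testlets = g_coef_nested(VAR_P, VAR_PT, VAR_PIT_E, n_t=8, n_i_per_t=5)
more_items = g_coef_nested(VAR_P, VAR_PT, VAR_PIT_E, n_t=4, n_i_per_t=10)

for label, value in [("baseline (4 testlets x 5 items)", baseline),
                     ("8 testlets x 5 items", more_testlets),
                     ("4 testlets x 10 items", more_items)]:
    print(f"{label}: {value:.3f}")
```

Under these assumed components, both changes bring the test to 40 items, yet adding testlets raises the coefficient more. The reason is that n_t divides both error terms, whereas the number of items per testlet divides only the second. This matches the direction of the decision-study result reported in the abstract.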