Comparison of Dichotomous Item Response Theory Models in Terms of Test Information Function


Çelikten S., Önen E.

Uluslararası Eğitim Kongresi: Gelecek için Eğitim, Ankara, Turkey, 13-15 May 2015

  • Publication Type: Conference Paper / Abstract
  • City of Publication: Ankara
  • Country of Publication: Turkey
  • Gazi University Affiliated: Yes

Abstract

In educational settings, researchers and practitioners are interested in students' attributes such as achievement, intelligence, aptitude, ability, skills, attitudes, interests, and motivation. These attributes are defined as psychological constructs because they cannot be observed and measured directly; they are measured by observing behaviors that are accepted as indicators of the constructs (Lord & Novick, 1968; Embretson & Reise, 2000). Information about individuals' standing on these attributes is therefore obtained through psychological measurement tools such as tests, scales, and questionnaires. A test is a measuring tool that numerically describes, under standardized conditions, the degree to which individuals possess the construct of interest. Tests contain a set of items designed to measure the related constructs, and they serve many purposes in educational settings. Many decisions about students, such as admission and placement into programs, are made according to test results, so it is important to obtain valid and reliable measures (Haladyna, 2004). Regardless of the purpose of measurement, tests are required to have the psychometric properties of validity and reliability. For example, if a test intends to discriminate among examinees over a wide range of ability, it needs to be composed of items of medium difficulty. If, on the other hand, a test aims to identify areas of specific weakness for low-ability students, it needs to include a substantial number of items that are relatively easy for the group as a whole (Crocker & Algina, 1986). Thus, depending on the purpose of measurement, the tests to be used differ with respect to the ability levels they target, and it is important to know which test is more suitable for a given purpose. Item response theory (IRT) offers an important advantage here through its item and test information functions, which show how effective a test is at different ability levels in terms of the amount of information it provides. IRT is an effective framework for describing items and tests, selecting test items, and comparing tests, and preparing a suitable test design involves the use of item and test information functions. The item information function plays an important role in item evaluation and test development. Since a test is a composition of items, the test information at a given ability level is computed by summing the item information values at that level. As a result, the amount of information provided by the test is much higher than the amount provided by any single item, so a test estimates ability more precisely than a single item does (Hambleton, Swaminathan, & Rogers, 1991; Baker, 2001). It can thus be determined at which points on the theta scale the test provides the most information.
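
As background, the standard item and test information formulas under the three-parameter logistic (3PL) model can be sketched as follows (generic IRT notation, not estimates from this study); the 2PL and 1PL models compared later arise as special cases:

```latex
P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}},
\qquad
I_i(\theta) = a_i^{2}\,\frac{\bigl(P_i(\theta) - c_i\bigr)^{2}}{(1 - c_i)^{2}}
              \cdot \frac{1 - P_i(\theta)}{P_i(\theta)},
\qquad
I(\theta) = \sum_{i=1}^{n} I_i(\theta),
\qquad
\mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
```

Here a_i, b_i, and c_i are the discrimination, difficulty, and pseudo-guessing parameters of item i; the 2PL fixes c_i = 0, the 1PL additionally constrains all a_i to a common value, and the standard error of the ability estimate shrinks as the test information grows.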

Moreover, selecting the appropriate model for a given study is crucial in educational and psychological measurement for dealing with measurement error, since the model clarifies the relationships between test items and ability scores and thereby supports the best test design (Hambleton & Jones, 1993). It is therefore considered that comparing dichotomous IRT models across different ability levels in terms of item and test information functions would yield more information about the reliability of the measures. For this reason, this study aims to compare the dichotomously scored one-parameter, two-parameter, and three-parameter logistic item response theory models in terms of the test information function at three ability levels: low, middle, and high. The method of the study is survey research. Data were collected with a test developed by the researcher to measure students' achievement on the subject of "educational measurement and evaluation"; it was administered to students at the Faculty of Education of Gazi University in the spring term of the 2014-2015 academic year, and the obtained data include 264 participants' responses. Based on these data, responses for a sample of 1000 were simulated in RStudio using the ltm (Latent Trait Models under IRT) package, and the simulated data were analyzed in RStudio with the same package.
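
A minimal sketch of such a workflow with the ltm package is given below. It is not the study's actual script: the item parameters used to generate responses, the test length, and the low/middle/high ability ranges of (-3, -1), (-1, 1), and (1, 3) on the theta scale are all assumptions made for illustration.

```r
## Minimal sketch (not the study's script): simulate dichotomous responses
## for 1000 examinees from a hypothetical 3PL item bank, fit the 1PL, 2PL,
## and 3PL models with ltm, and compare test information across assumed
## low, middle, and high ability ranges.
library(ltm)

set.seed(2015)
n.persons <- 1000
n.items   <- 20                        # hypothetical test length
a.par <- runif(n.items, 0.6, 2.0)      # discrimination
b.par <- rnorm(n.items, 0, 1)          # difficulty
c.par <- runif(n.items, 0.10, 0.25)    # pseudo-guessing
theta <- rnorm(n.persons)              # examinee abilities

## 3PL probability of a correct response, then simulated 0/1 responses
P <- sapply(seq_len(n.items), function(j)
  c.par[j] + (1 - c.par[j]) * plogis(a.par[j] * (theta - b.par[j])))
resp <- as.data.frame((matrix(runif(n.persons * n.items), n.persons, n.items) < P) * 1)

## Fit the three dichotomous IRT models
fit.1pl <- rasch(resp)       # one-parameter logistic
fit.2pl <- ltm(resp ~ z1)    # two-parameter logistic
fit.3pl <- tpm(resp)         # three-parameter logistic

## Test information within the assumed low, middle, and high ability ranges
ranges <- list(low = c(-3, -1), middle = c(-1, 1), high = c(1, 3))
fits   <- list("1PL" = fit.1pl, "2PL" = fit.2pl, "3PL" = fit.3pl)
for (m in names(fits)) {
  for (r in names(ranges)) {
    cat("\n", m, "model,", r, "ability range:\n")
    print(information(fits[[m]], range = ranges[[r]]))
  }
}

## Whole-test information curve for the 3PL fit (items = 0 plots the test curve)
plot(fit.3pl, type = "IIC", items = 0)
```

The information() output reports the amount of test information in the requested theta range together with its share of the total information, which is the kind of summary underlying the comparisons reported below.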

The results show that the one- and two-parameter logistic models provide the highest information at the middle ability level and the lowest information at the high ability level. The three-parameter logistic model likewise provides the highest information at the middle ability level, although it provides the lowest information at the low ability level. Furthermore, the three-parameter model provides the highest information among these models in terms of total information (95.19%), and it explains 64.39 percent of the total information at the middle ability level. These findings show that the guessing parameter is an important factor for this achievement test. Therefore, the three-parameter logistic model is the most suitable model for this test, and the test is best used with participants at the middle ability level. For future research, it is recommended to compare the dichotomously scored models in terms of ability estimation at different ability levels.