Uluslararası Eğitim Kongresi (International Education Congress): Education for the Future, Ankara, Turkey, 13-15 May 2015
In educational settings, researchers and practitioners are interested in student attributes such as achievement, intelligence, aptitude, ability, skills, attitudes, interests, and motivation. These attributes are defined as psychological constructs because they cannot be observed and measured directly. Psychological constructs are measured by observing behaviors that are accepted as indicators of these constructs (Lord & Novick, 1968; Embretson & Reise, 2000). Information about individuals on the related attributes is therefore obtained through psychological measurement tools such as tests, scales, and questionnaires. A test is a measurement tool that numerically describes, under standardized conditions, the degree to which individuals possess the construct of interest. Tests contain a set of items to measure the related constructs, and they are used for many purposes in educational settings. Based on test results, many decisions are made about students, such as admission and placement into programs; it is therefore important to obtain valid and reliable measures (Haladyna, 2004). Regardless of the purpose of measurement, tests are required to have psychometric properties such as validity and reliability.
For example, if a test intends to discriminate among examinees over a wide range of ability, it needs to be composed of items of medium difficulty. On the other hand, if a test aims to identify areas of specific weakness for low-ability students, it needs to include a substantial number of items that are relatively easy for the group as a whole (Crocker & Algina, 1986). In other words, the tests to be used differ according to the purpose of measurement and the ability levels targeted, so it is important to know which test is more suitable for a given purpose. Item response theory (IRT) has an important advantage here: its item and test information functions clarify how effective a test is at different ability levels by quantifying the amount of information it provides at each level. IRT is an effective framework for describing items and tests, selecting test items, and comparing tests, and preparing a suitable test design involves the use of item and test information functions. The item information function plays an important role in item evaluation and test development. Since a test is a composition of items, the test information at a given ability level is computed by summing the item information at that level. As a result, the amount of information provided by the test is much higher than the amount provided by a single item; hence, a test estimates ability more precisely than a single item (Hambleton, Swaminathan, & Rogers, 1991; Baker, 2001). From the test information function, it can be determined at which points on the theta scale the test provides the most information. Moreover, selecting the appropriate model is crucial in educational and psychological measurement for dealing with measurement error, since the model clarifies the relationships among test items and ability scores needed to achieve the best test design (Hambleton & Jones, 1993).
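To make the comparison concrete, the three dichotomous logistic models and the information functions referred to above can be written in standard notation; these are textbook formulations (see Hambleton, Swaminathan, & Rogers, 1991), not results of the present study. Under the three-parameter logistic (3PL) model, the probability of a correct response to item $i$ at ability $\theta$ is

\[ P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}}, \]

where $b_i$ is the difficulty, $a_i$ the discrimination, and $c_i$ the pseudo-guessing parameter; the two-parameter (2PL) model sets $c_i = 0$, and the one-parameter (1PL) model additionally fixes $a_i$ at a common value. The item information, test information, and standard error of the ability estimate are then

\[ I_i(\theta) = a_i^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \left[ \frac{P_i(\theta) - c_i}{1 - c_i} \right]^2, \qquad I(\theta) = \sum_{i=1}^{n} I_i(\theta), \qquad SE(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}, \]

with $Q_i(\theta) = 1 - P_i(\theta)$; when $c_i = 0$, the item information reduces to $a_i^2 P_i(\theta) Q_i(\theta)$.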
Therefore, it is considered that a comparison of dichotomous IRT models at different ability levels in terms of item and test information functions would yield more information about the reliability of measures. For this reason, this study aims to compare the dichotomously scored one-parameter, two-parameter, and three-parameter logistic IRT models in terms of the test information function at three ability levels (low, middle, and high) separately. The method of this study is survey research. Data were collected using a test that aims to measure students' achievement on the subject of "educational measurement and evaluation". This test was developed by the researcher and administered to students in the Faculty of Education at Gazi University in the spring term of the 2014-2015 academic year. The obtained data include the responses of 264 participants. These data were then used to simulate responses for a sample size of 1,000 in RStudio with the R package ltm (Latent Trait Models under IRT), and the simulated data were analyzed with the same package.
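A minimal sketch of this workflow with the ltm package is given below; the item parameters, the data-generating step, and the cut-points for the low, middle, and high ability intervals are illustrative assumptions, not the values used in the study.

library(ltm)

set.seed(2015)
n.items   <- 20
n.persons <- 1000
a <- runif(n.items, 0.8, 2.0)    # discrimination parameters (assumed)
b <- rnorm(n.items, 0, 1)        # difficulty parameters (assumed)
g <- runif(n.items, 0.10, 0.25)  # pseudo-guessing parameters (assumed)
theta <- rnorm(n.persons)        # latent abilities

# 3PL response probabilities: P = g + (1 - g) / (1 + exp(-a * (theta - b)))
P <- sapply(seq_len(n.items), function(i)
  g[i] + (1 - g[i]) * plogis(a[i] * (theta - b[i])))
dat <- as.data.frame((matrix(runif(n.persons * n.items),
                             n.persons, n.items) < P) * 1)

# Fit the three dichotomous logistic models
fit.1pl <- rasch(dat, constraint = cbind(ncol(dat) + 1, 1))  # discrimination fixed at 1
fit.2pl <- ltm(dat ~ z1)
fit.3pl <- tpm(dat)

# Test information in the low, middle, and high ability intervals
for (fit in list(fit.1pl, fit.2pl, fit.3pl)) {
  print(information(fit, range = c(-4, -1)))  # low
  print(information(fit, range = c(-1,  1)))  # middle
  print(information(fit, range = c( 1,  4)))  # high
}

The information() call reports both the total test information and the share of it that falls in the given interval, which is the kind of percentage figure reported in the results below.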
The results show that the one- and two-parameter logistic models provide the highest information at the middle ability level and the lowest information at the high ability level. The three-parameter logistic model also provides the highest information at the middle ability level, although it provides the lowest information at the low ability level. In addition, the three-parameter model provides the highest information among these models in terms of total information (95.19%), with 64.39 percent of its total information at the middle ability level. These findings show that the guessing parameter is an important factor for this achievement test. Therefore, the three-parameter logistic model is the most suitable one for this test, and the test could be used for participants at the middle ability level. For future research, it is recommended to compare the dichotomously scored models in terms of ability estimation at different ability levels.