# Test and item information functions in dexter
In an enormously influential short paper, Embretson (1996) sums up the five most important differences between classical test theory (CTT) and item response theory (IRT). The first among them is that, in CTT, the standard error of measurement applies to all scores in a particular population, while in IRT the standard error of measurement differs across scores but generalizes across populations.
To see how this works, we first take a leisurely, informal look at some simple examples with the Rasch model; we then examine the information functions more formally, and we explain how they are implemented in dexter.
The Rasch model predicts that, given an item and its difficulty parameter, $\delta$, the probability of a correct response is a logistic function of ability, $\theta$, namely

$$P(\theta) = \frac{\exp(\theta - \delta)}{1 + \exp(\theta - \delta)}.$$

The information function for the same item happens to be $I(\theta) = P(\theta)\,(1 - P(\theta))$, so it is, again, a function of $\theta$. Below we show the item response function (IRF) for the item, i.e., the function $P(\theta)$, along with the corresponding item information function (IIF), shown in red. When $\theta = \delta$, $P(\theta) = 0.5$; this is also the point where the IIF peaks, and the maximum is of course equal to 0.25.
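For concreteness, a minimal standalone sketch (not dexter code; the difficulty `delta = 0` is an arbitrary choice) that draws such an IRF and IIF could look like this:

```r
# A single Rasch item with (arbitrary) difficulty delta = 0
delta <- 0
irf <- function(theta) plogis(theta - delta)          # P(correct | theta)
iif <- function(theta) irf(theta) * (1 - irf(theta))  # item information

curve(irf, from = -4, to = 4, ylim = c(0, 1),
      xlab = "Ability", ylab = "Probability / Information")
curve(iif, from = -4, to = 4, add = TRUE, col = "red")
abline(v = delta, lty = 2)  # the IIF peaks at theta = delta, with maximum 0.25
```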
Now, the test information function (TIF) is, simply, the sum of the IIFs over all items in the test. Under the Rasch model, this is a function of $\theta$ bounded between 0 and a quarter of the number of items in the test. The standard error of measurement (SEM) is inversely related to the TIF. More precisely,

$$\text{SEM}(\theta) = \frac{1}{\sqrt{\text{TIF}(\theta)}},$$

so it is again a function of $\theta$, and it appears that it cannot go lower than $2/\sqrt{m}$, with $m$ the number of items in the test.
This is all illustrated with three items on the plot below. In IRT, we thus have a clearer idea of how to plan a test and its precision. To make the error of measurement more even over the ability range, we should include items of various difficulties. If, on the contrary, we are interested in maximizing measurement precision near some important threshold where decisions are made, it makes sense to include more items with difficulties in that particular range. And the idea of adaptive testing is just around the corner.
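A standalone sketch of this arithmetic (three arbitrary difficulties, not dexter code):

```r
# Three Rasch items with (arbitrary) difficulties
delta <- c(-1, 0, 1.5)
theta <- seq(-4, 4, length.out = 201)

# Item information functions: one row per item, one column per theta value
iif <- sapply(theta, function(t) plogis(t - delta) * (1 - plogis(t - delta)))
tif <- colSums(iif)   # test information
sem <- 1 / sqrt(tif)  # standard error of measurement

# The TIF peaks where the item difficulties are concentrated; the SEM is
# lowest there and, with 3 items, can never drop below 2/sqrt(3)
plot(theta, tif, type = "l", xlab = "Ability", ylab = "Test information")
plot(theta, sem, type = "l", xlab = "Ability", ylab = "SEM")
```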
That was the easy introduction; now follows the more interesting part. Let us stage the actors:
First, there is the IRT model. Let $X_i$ denote the response to item $i$, with $X_i = j$ the event that the $j$-th response was chosen, corresponding to a score $a_{ij}$. In dexter, the basic IRT model is the NRM, which is a divide-by-total model and can be written as:

$$P_{ij}(\theta) = P(X_i = j \mid \theta) = \frac{F_{ij}(\theta)}{\sum_h F_{ih}(\theta)},$$

where $F_{ij}(\theta) = b_{ij} e^{a_{ij}\theta}$ is a positive, increasing function of student ability with derivatives:

$$
\frac{d F_{ij}(\theta)}{d \theta} = a_{ij} F_{ij}(\theta), \text{ and} \\
\frac{d}{d\theta} \ln P_{ij}(\theta) = a_{ij} - \sum_h a_{ih}P_{ih}(\theta)
$$
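To make the notation concrete, here is a small sketch (toy parameter values, not dexter code) that computes the category probabilities $P_{ij}(\theta)$ of one item from its vectors $a$ and $b$:

```r
# Category probabilities of one divide-by-total item: P_ij = F_ij / sum_h F_ih,
# with F_ij = b_ij * exp(a_ij * theta). Toy values; category 0 has a = 0, b = 1.
a <- c(0, 1, 2)       # category scores
b <- c(1, 1.2, 0.4)   # category parameters
p_cat <- function(theta) {
  F <- b * exp(a * theta)
  F / sum(F)          # divide by total
}
p_cat(0)       # probabilities of scoring 0, 1, 2 at theta = 0
sum(p_cat(0))  # the probabilities add up to 1
```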
The second actor is the log-likelihood function. It is useful to define $x_{ij}$ as a dummy-coded response, with $x_{ij} = 1$ if $X_i = j$ and zero otherwise. This allows us to write the log-likelihood with respect to ability as:

$$\ln L(\theta) = \sum_i \sum_j x_{ij} \ln P_{ij}(\theta),$$

with derivatives

$$
\frac{d \ln L(\theta)}{d\theta} = \sum_i \sum_j x_{ij}\left( a_{ij} - \sum_h a_{ih} P_{ih}(\theta) \right), \text{ and} \\
\frac{d^2 \ln L(\theta)}{d\theta^2} = -\sum_i \left( \sum_h a_{ih}^2 P_{ih}(\theta) - \left( \sum_h a_{ih} P_{ih}(\theta) \right)^2 \right)
$$
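A minimal numeric sketch of this log-likelihood (two toy items with hypothetical parameters, not dexter code) and its maximization:

```r
# Two toy divide-by-total items: category scores a and parameters b
items <- list(
  list(a = c(0, 1),    b = c(1, 0.8)),       # a dichotomous item
  list(a = c(0, 1, 2), b = c(1, 1.2, 0.4))   # a three-category item
)
x <- c(1, 2)  # observed response categories (scores 1 and 2)

loglik <- function(theta) {
  sum(mapply(function(it, j) {
    F <- it$b * exp(it$a * theta)
    log(F[j + 1] / sum(F))   # ln P_ij(theta); +1 because category 0 is element 1
  }, items, x))
}

# The maximum-likelihood estimate of ability maximizes the log-likelihood
optimize(loglik, c(-4, 4), maximum = TRUE)
```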
The item information function is what statisticians call the (observed) Fisher information of an item response variable about $\theta$ (see Ly et al. (2017)). It can be defined as:

$$I_i(\theta) = -\frac{d^2}{d\theta^2} \sum_j x_{ij} \ln P_{ij}(\theta) = \sum_j a_{ij}^2 P_{ij}(\theta) - \left( \sum_j a_{ij} P_{ij}(\theta) \right)^2.$$
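As a sanity check, the variance formula can be compared with a numerical second derivative of the log of a category probability (toy parameters again; note the result does not depend on which category was observed):

```r
# One toy item: information as the variance of the category scores under P_ij(theta)
a <- c(0, 1, 2); b <- c(1, 1.2, 0.4)
p <- function(theta) { F <- b * exp(a * theta); F / sum(F) }

info <- function(theta) sum(a^2 * p(theta)) - sum(a * p(theta))^2

# Numerical -d^2/dtheta^2 of ln P_ij(theta) by central differences
lnP <- function(theta, j) log(p(theta)[j + 1])
h <- 1e-4; theta0 <- 0.7; j <- 1
-(lnP(theta0 + h, j) - 2 * lnP(theta0, j) + lnP(theta0 - h, j)) / h^2
info(theta0)  # (nearly) the same value
```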
Finally, the test information function is simply the sum of the item information functions, i.e.,

$$I(\theta) = \sum_i I_i(\theta).$$

Since item information is just test information when the test consists of one item, we need only consider how to calculate test information. To do the calculations efficiently, we write

$$I(\theta) = \sum_i \left( \sum_j a_{ij}^2 P_{ij}(\theta) - \left( \sum_j a_{ij} P_{ij}(\theta) \right)^2 \right).$$

This inspires the following implementation, where `first` and `last` are vectors of indices. That is, the parameters of each item are ordered, with `first` indicating where (in the vectors $b$ and $a$) they start and `last` where they end.
```r
myIJ = function(b, a, first, last, theta)
{
  nI = length(first)                 # number of items
  I = matrix(0, nI, length(theta))   # item information: one row per item
  for (i in 1:nI)
  {
    # F_ij(theta) = b_ij * exp(a_ij * theta): rows are categories, columns are theta values
    Fij = b[first[i]:last[i]] * exp(outer(a[first[i]:last[i]], theta))
    # divide by total: category probabilities P_ij(theta)
    Pi = apply(Fij, 2, function(x) x / sum(x))
    M1 = Pi * a[first[i]:last[i]]    # a_ij * P_ij
    M2 = M1 * a[first[i]:last[i]]    # a_ij^2 * P_ij
    # variance of the item score: sum a^2 P - (sum a P)^2
    I[i,] = colSums(M2) - colSums(M1)^2
  }
  colSums(I)                         # test information: sum over items
}
```
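To see `myIJ` at work, here is a toy call with made-up Rasch-like parameters (not estimates from dexter): each item has category scores 0 and 1, with $b = 1$ for the zero category and $b = e^{-\delta}$ for the correct response, and `first`/`last` are constructed by hand.

```r
# Toy test of three Rasch items with difficulties -1, 0, 1.5;
# each item occupies two slots in a and b (category 0 and category 1)
delta <- c(-1, 0, 1.5)
a     <- rep(c(0, 1), 3)                   # category scores per item
b     <- as.vector(rbind(1, exp(-delta)))  # b = 1, exp(-delta) per item
first <- c(1, 3, 5)
last  <- c(2, 4, 6)

theta <- c(-2, 0, 2)
myIJ(b, a, first, last, theta)  # test information at three ability values
```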
To use `myIJ`, we run `fit_enorm` and take out the item parameters, $b$, the item scores, $a$, and the vectors `first` and `last` from the output. From now on, we will use the user-level function `information`, which does this for us.
Let us now calculate the information function with a real-data example. We first get the data from a test (no, we are not telling you which), and we estimate the item parameters using female respondents who took booklet 1652 (`geslacht` means gender in Dutch):
```r
db = open_project("/Rdatasets/670.db")
prms = fit_enorm(db, booklet_id=="1652" & geslacht=="f")
```
The next step is to calculate the information function and plot it.
```r
Inf_1652 = information(prms, booklet = "1652")
plot(Inf_1652, from=-4, to=4, main = "booklet 1652",
     ylab = "test information",
     xlab = "Ability")
## add points at 1/SE^2 of the maximum-likelihood estimator of ability
abls = ability_tables(prms)
points(abls$theta, 1/abls$se^2, pch=16, cex=0.5, col="blue")
```
The points on the curve correspond to $1/\text{SE}(\hat{\theta})^2$, where $\text{SE}(\hat{\theta})$ is the standard error of the maximum-likelihood estimate (MLE) of ability. This illustrates the relationship $\text{SE}(\hat{\theta}) = 1/\sqrt{I(\theta)}$ between the standard error of the MLE and test information given ability. The plot shows that the test is most suited for respondents whose true abilities lie around $\theta \approx 0.3$.
It is worth noting that `information` returns a function. Thus, to get the test information at $\theta = 0$, one would simply type `Inf_1652(0)`. The ability corresponding to maximum information can be found using `optimize(Inf_1652, c(-4,4), maximum = TRUE)`; in this case, 0.2750423.
In practice, item parameters are not known. Let us repeat the exercise but this time taking the uncertainty about the item parameters into account. To this effect, we first estimate the item parameters using a Bayesian method.
```r
prms = fit_enorm(db, booklet_id=="1652" & geslacht=="f",
                 method = "Bayes", nDraws = 515)
```
The number of draws, 515, is the number we need to produce a posterior sample of size 100 if we use a burn-in of 20 and thereafter keep every fifth sample to get rid of autocorrelation.
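A quick check of this arithmetic (the `seq()` below mirrors the thinning used in the loop that follows):

```r
# burn-in of 20, then every fifth draw: 20, 25, ..., 515 gives 100 retained draws
keep <- seq(from = 20, to = 15 + 5 * 100, by = 5)
length(keep)  # 100
max(keep)     # 515
```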
```r
plot(Inf_1652, from=-4, to=4, main = "booklet 1652",
     ylab = "test information",
     xlab = "Ability")
# one grey curve per retained draw of the item parameters
for (i in seq(from=20, to=15+5*100, by=5))
{
  Inf_ = information(prms, booklet = "1652", which.draw = i)
  plot(Inf_, from=-4, to=4, add=TRUE, col="grey")
}
# redraw the original information function on top
plot(Inf_1652, from=-4, to=4, add=TRUE)
```
The shape of the curve is well preserved, but note that, even with as many as 5753 respondents, the information function is clearly affected by the uncertainty in the item parameters.
# References
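Embretson, S. E. (1996). The new rules of measurement. Psychological Assessment, 8(4), 341–349.

Ly, A., Marsman, M., Verhagen, J., Grasman, R. P. P. P., & Wagenmakers, E.-J. (2017). A tutorial on Fisher information. Journal of Mathematical Psychology, 80, 40–55.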