A typical task in adaptive testing is to select, out of a precalibrated item pool, the most appropriate item to ask, given an interim estimate of ability, $\theta_0$. A popular approach is to select the item having the largest value of the item information function (IIF) at $\theta_0$.

When the IRT model is the Rasch model, the item response function (IRF) is $P(X=1|\theta,b)=\frac{1}{1+\exp(b-\theta)}$, where $b$ is the item difficulty, and the IIF can be computed as $P(X=1|\theta,b)[1-P(X=1|\theta,b)]$. For all items, this function reaches a maximum of 0.25, attained exactly where $\theta=b$ and $P(X=1|\theta,b)=0.5$. So picking the item with the maximum IIF is the same as picking the item for which a person of ability $\theta_0$ has a probability of 0.5 of producing the correct answer.
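As a minimal sketch of this selection rule, the Python snippet below evaluates the Rasch IRF and IIF and picks the most informative item from a pool. The function names, the pool of difficulties, and the value of the interim estimate are all made up for illustration.

```python
import math

def rasch_irf(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(b - theta))

def rasch_iif(theta, b):
    """Item information under the Rasch model: P(1 - P)."""
    p = rasch_irf(theta, b)
    return p * (1.0 - p)

def select_item(theta0, difficulties):
    """Index of the item with the largest information at theta0."""
    return max(range(len(difficulties)),
               key=lambda i: rasch_iif(theta0, difficulties[i]))

# Hypothetical item pool (difficulties) and interim ability estimate.
pool = [-1.5, -0.4, 0.1, 0.9, 2.0]
theta0 = 0.0

best = select_item(theta0, pool)
print("selected item:", best, "difficulty:", pool[best])
print("P(correct) for the selected item:", rasch_irf(theta0, pool[best]))
```

Because the Rasch IIF depends only on the distance between $b$ and $\theta$, the selected item is simply the one whose difficulty lies closest to $\theta_0$.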

If we use the two-parameter logistic (2PL) model instead, the IRF is $P(X=1|\theta,a,b)=\frac{1}{1+\exp[a(b-\theta)]}$, where the new parameter $a$ is called the discrimination parameter, and the IIF can be computed as $a^2P(X=1|\theta,a,b)[1-P(X=1|\theta,a,b)]$. Note that while the contribution of the product, $P(1-P)$, remains bounded between 0 and 0.25, the influence of $a^2$ can become quite large. For example, if $a=5$, $P(1-P)$ gets multiplied by 25, so the maximum information rises to 6.25, which is 25 times the maximum value under the Rasch model. Selecting such an informative item gives us the happy feeling that our error of measurement will decrease a lot, but what happens to the person's probability of giving the correct response?
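To make the question concrete, here is a small Python sketch of the 2PL IRF and IIF. The function names and the off-target item ($a=5$, $b=0.5$) are chosen purely for illustration.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(a * (b - theta)))

def iif_2pl(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

theta = 0.0

# A highly discriminating item located exactly at the person's ability.
info_on_target = iif_2pl(theta, a=5.0, b=0.0)

# The same discrimination, but the item is half a logit too difficult
# (illustrative values, not taken from the text).
p_off = irf_2pl(theta, a=5.0, b=0.5)
info_off = iif_2pl(theta, a=5.0, b=0.5)

print("information at theta = b:", info_on_target)
print("off-target item: P =", round(p_off, 4), "information =", round(info_off, 4))
```

With these illustrative values the off-target item still carries an information of about 1.75, far above the Rasch ceiling of 0.25, although the person's probability of answering it correctly is only about 0.076.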

In our game, we have one person and two items. The person's ability is 0, represented with a vertical gray line. One of the items is fixed to be a Rasch item with $b=0$; its IRF is shown as a solid black curve, and its IIF as a dotted black curve. The second item, shown in red, is 2PL, and you can control its two parameters with the sliders. Initially, $a=1$ and $b=1$.

The purpose of the game is to find particularly awkward situations where the more informative item is the most inappropriate in the sense of:

  • having a difficulty as remote from $\theta=0$ as possible;

  • having a probability of a correct response at $\theta=0$ as low as possible.

Can you find the sweet spots? I have just computed the answer to the second question but I am not telling (in fact, I can tell you the value of $P$: 0.0022, quite a bit off 0.5).
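If you would rather search numerically than drag sliders, a brute-force grid search along the following lines is one option. This is only a sketch: the slider ranges and step sizes below are assumptions (the actual limits of the sliders are not stated here), so its output need not match the value quoted above.

```python
import math

def irf_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(a * (b - theta)))

def iif_2pl(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = irf_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

THETA = 0.0
# Information of the fixed Rasch item (a = 1, b = 0) at the person's ability.
RASCH_INFO = iif_2pl(THETA, 1.0, 0.0)

# ASSUMED slider ranges: a in [0.1, 5.0], b in [-4.0, 4.0].
a_grid = [0.1 * i for i in range(1, 51)]
b_grid = [0.05 * j for j in range(-80, 81)]

# Keep only settings where the 2PL item is the more informative one.
candidates = [(a, b) for a in a_grid for b in b_grid
              if iif_2pl(THETA, a, b) > RASCH_INFO]

# First criterion: difficulty as remote from theta = 0 as possible.
most_remote = max(candidates, key=lambda ab: abs(ab[1]))
# Second criterion: probability of a correct response as low as possible.
lowest_p = min(candidates, key=lambda ab: irf_2pl(THETA, ab[0], ab[1]))

print("most remote difficulty (a, b):", most_remote)
print("lowest P (a, b):", lowest_p, "P =", irf_2pl(THETA, *lowest_p))
```

Every surviving candidate has $a>1$, since $a^2P(1-P)$ cannot exceed 0.25 otherwise; widening the assumed range of $a$ pushes the lowest attainable $P$ further down.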