[A muggle preface by Ivailo] As every fantasy lover knows, IRT people belong to two towers, or is it schools of magic. The little wizards and witches of one school learn to condition on their sufficient statistics before they can even fly or play quidditch, while at the other school they will integrate out anything thrown at them.

The two approaches have their subtle differences. To predict the score distribution conditional on ability, wizards at the first school apply the same powerful curse, elementary symmetric functions, that they use for all kinds of magic – from catching dragons over estimating parameters, up to item-total and item-rest regressions, to mention but a few. At the second school, computing the score distribution used to be a frustrating task until Lord and Wingersky (1984) came up with a seemingly unrelated jinx that did the trick.

In what follows, Timo explains how the two kinds of magic relate to each other.

Consider the Rasch model. Let XiX_{i} denote the binary-coded response to item ii and assume that: p(𝐱|t,𝐛)=∏ip(xi|t,bi)=tx+∏ibixi∏i(1+tbi) p(\mathbf{x}|t,\mathbf{b}) =\prod_i p(x_{i}|t,b_i) =\frac{t^{x_+} \prod_i b^{x_{i}}_i}{\prod_i (1 + t b_i)} where t>0t>0 is person ability, bib_i the difficulty of item ii, and x+x_+ the number of correct answers. Conditional on the person’s ability and the item difficulties, the test score distribution is: P(X+=s|t,𝐛)=βˆ‘π±:x+=sp(𝐱|t,𝐛)=tsΞ³s(𝐛)∏i(1+tbi), P(X_+=s|t, \mathbf{b}) = \sum_{\mathbf{x} : x_+ = s} p(\mathbf{x}|t,\mathbf{b}) =\frac{t^s \gamma_s(\boldsymbol{b})}{\prod_i (1+t b_i)}, where Ξ³s(𝐛)=βˆ‘π±:x+=s∏ibixi\gamma_s(\boldsymbol{b}) = \sum_{\mathbf{x} : x_+ = s} \prod_i b_i^{x_i} is the elementary symmetric function (esf) of order ss of the item parameters 𝐛=(b1,b2,…,bm)\mathbf{b}=(b_1,b_2,\dots, b_m).

Thus, the esf arise naturally when one works with Rasch models. They appear when we expand a linear factorization of a monic polynomial. More specifically, since βˆ‘sP(X+=s|ΞΈ)=1\sum_s P(X_+=s|\theta)=1, we find the identity:
∏i=1m(1+tbi)=βˆ‘s=0mΞ³s(𝐛)ts \prod^m_{i=1} (1+tb_i) =\sum^m_{s=0} \gamma_s(\boldsymbol{b}) t^s which is a generalization of Newton’s binomial theorem where 𝐛=𝟏\mathbf{b} = \mathbf{1} and Ξ³s(𝟏)=(ms)\gamma_s(\mathbf{1}) = {m \choose s}.

The notation can be adapted to also handle items with more than two score categories. Let yijy_{ij} denote the dummy-coded response to item ii; that is, yij=1y_{ij}=1 category j∈{0,…,mi}j \in \{0,\dots, m_i\} of item ii is picked and zero otherwise. Let aija_{ij} denote the number of points earned for a response in category jj such that ai0=0a_{i0}=0. The IRT model becomes p(𝐲|t,𝐛)=∏i∏j(taijbij)yijβˆ‘jtaijbij p(\mathbf{y}|t,\mathbf{b}) = \prod_i \frac{\prod_j \left( t^{a_{ij}}b_{ij}\right)^{y_{ij}}}{\sum_j t^{a_{ij}}b_{ij}}

This is a nominal response model with fixed integer item-category scores. We use the letter bb for the item parameters but note that we have included a dummy parameter bi0=1b_{i0}=1 for the lowest category. For a dichotomous item, mi=1m_i=1, so that j∈{0,1}j\in\{0,1\}, ai0=0a_{i0}=0, ai1=1a_{i1}=1, and bi1=bib_{i1}=b_i.

Using this notation, the test score distribution is: P(Y++=s|t,𝐛)=ty++Ξ³s(𝐛)∏i(βˆ‘jtaijbij), P(Y_{++}=s|t, \mathbf{b}) = \frac{t^{y_{++}} \gamma_s(\boldsymbol{b})}{\prod_i \left(\sum_j t^{a_{ij}}b_{ij}\right)}, where y++=βˆ‘iβˆ‘jaijyij=x+y_{++}=\sum_i \sum_j a_{ij} y_{ij} = x_+. and esf are defined as: Ξ³s(𝐛)=βˆ‘π²:y++=s∏i∏jbijyij \gamma_s(\mathbf{b}) = \sum_{\mathbf{y}:y_{++}=s} \prod_i \prod_j b^{y_{ij}}_{ij} It follows that: ∏i=1m(1+βˆ‘j=1mitaijbij)=βˆ‘s=0MtsΞ³s(𝐛) \prod^m_{i=1} \left(1+\sum^{m_i}_{j=1} t^{a_{ij}}b_{ij}\right) = \sum^{M}_{s=0} t^{s} \gamma_s(\boldsymbol{b}) where ss now runs over values of y++y_{++}; that is, from zero to the highest test score M=βˆ‘iaimiM=\sum_i a_{im_i}.

To illustrate, consider m=2m=2 items with three categores each; that is, mi=2m_i=2, and ai1=1a_{i1}=1. Writing Ξ³s\gamma_s as short-hand for Ξ³s(𝐛)\gamma_s(\mathbf{b}) the above identity says: (1+ta11b11+ta21b21)(1+ta11b11+ta21b21)=Ξ³0t0+Ξ³1t1+Ξ³2t2+t3Ξ³3+Ξ³4t2 \left(1+t^{a_{11}}b_{11} + t^{a_{21}}b_{21}\right) \left(1+t^{a_{11}}b_{11} + t^{a_{21}}b_{21}\right)= \gamma_0 t^0 + \gamma_1 t^1 + \gamma_2 t^2 + t^3 \gamma_3 + \gamma_4 t^2 where Ξ³0=1(by definition)Ξ³1=b11+b21Ξ³2=b22+b11b21+b12Ξ³3=b11b22+b12b21Ξ³4=b12b21\begin{align*} \gamma_0 &=1 \text{(by definition)}\\ \gamma_1 &=b_{11}+b_{21}\\ \gamma_2 &=b_{22}+b_{11}b_{21}+b_{12}\\ \gamma_3 &=b_{11}b_{22}+b_{12}b_{21} \\ \gamma_4 &=b_{12}b_{21} \end{align*} Looking at the subscripts, an easy trick is: Ξ³s\gamma_s is the sum of the products of item parameters over all possible ways to obtain a score ss. For example, a score of 2 can be obtained by a score of 2 on item 2, a score of 1 on both items or a score of 2 on item 1. Note that, for any score that is impossible, the corresponding esf is set to zero.

The esf are sums of products and calculating these is not trivial as rounding errors tend to accumulate. Fortunately, the esf satisfy the following recursive relation: Ξ³s(𝐛)=βˆ‘j=0miΞ³sβˆ’aij(𝐛(i))bij, \gamma_s(\mathbf{b}) = \sum^{m_i}_{j=0} \gamma_{s-a_{ij}}(\mathbf{b}^{(i)}) b_{ij}, where 𝐛(i)\mathbf{b}^{(i)} denotes the item parameters excluding the parameters that belong to item ii. This recursion is the basis for a numerically stable algorithm known as the sum (or summation)-algorithm. Proposed by Andersen (1972), this algorithm is implemented in dexter’s elsym function.

We illustrate the sum-algorithm with three Rasch items ordered in an arbitrary way. We begin the recursion with the first item: Ξ³0[1]=b10=1\gamma^{[1]}_0 = b_{10} = 1 and Ξ³1[1]=b11\gamma^{[1]}_1 = b_{11}, where the superscript indicates the number of items. Then, we add the second item: Ξ³0[2]=Ξ³0[1]b20=b10b20=1Ξ³1[2]=Ξ³1[1]b20+Ξ³0[1]b21=b11b20+b21b10=b11+b21Ξ³2[2]=Ξ³1[1]b21=b11b21\begin{align*} \gamma^{[2]}_0 &= \gamma^{[1]}_{0}b_{20} = b_{10}b_{20} = 1\\ \gamma^{[2]}_1 &= \gamma^{[1]}_{1}b_{20} + \gamma^{[1]}_{0}b_{21} = b_{11}b_{20} + b_{21}b_{10}=b_{11}+b_{21}\\ \gamma^{[2]}_2 &= \gamma^{[1]}_{1}b_{21} = b_{11}b_{21} \end{align*} Finally, we add the third item: Ξ³0[3]=Ξ³0[2]b30=b10b20b30=1Ξ³1[3]=Ξ³1[2]b30+Ξ³0[2]b31=b11b20b30+b21b10b30+b10b20b31=b11+b21+b31Ξ³2[3]=Ξ³2[2]b30+Ξ³1[2]b31=b11b21b30+b11b20b31+b21b10b31=b11b21+b11b31+b21b31Ξ³3[3]=Ξ³2[2]b31=b11b21b31\begin{align*} \gamma^{[3]}_0 &= \gamma^{[2]}_{0}b_{30} = b_{10}b_{20}b_{30} = 1\\ \gamma^{[3]}_1 &= \gamma^{[2]}_{1}b_{30} + \gamma^{[2]}_{0}b_{31} = b_{11}b_{20}b_{30} + b_{21}b_{10}b_{30} +b_{10}b_{20}b_{31} = b_{11}+b_{21}+b_{31}\\ \gamma^{[3]}_2 &= \gamma^{[2]}_{2}b_{30} + \gamma^{[2]}_{1}b_{31} = b_{11}b_{21}b_{30}+b_{11}b_{20}b_{31} + b_{21}b_{10}b_{31} =b_{11}b_{21} + b_{11}b_{31} + b_{21}b_{31}\\ \gamma^{[3]}_3 &= \gamma^{[2]}_{2}b_{31} = b_{11}b_{21}b_{31} \end{align*}

Now consider two ways to calculate the conditional test score distribution. First, using the identity in the previous section, we can write: p(X+=s|t)=p(Y++=s|t,𝐛)=ty++Ξ³s(𝐛)βˆ‘s=0MtsΞ³s(𝐛) p(X_+=s|t) = p(Y_{++}=s|t, \mathbf{b}) = \frac{t^{y_{++}} \gamma_s(\boldsymbol{b})}{\sum^M_{s=0} t^{s} \gamma_s(\mathbf{b})} This is the expression used by the p_score function in dexter.

Second, we can also write: P(Y++=s|t,𝐛)=βˆ‘π²:y+=s∏i∏jΟ€ijyij=Ξ³s(𝛑) P(Y_{++}=s|t, \mathbf{b}) = \sum_{\mathbf{y}:y_+ = s} \prod_i \prod_j \pi_{ij}^{y_{ij}} = \gamma_s(\boldsymbol{\pi}) where Ο€ij\pi_{ij} is short-hand for p(Yij=1|t)=p(Xi=j|t)p(Y_{ij}=1|t) = p(X_i=j|t). Written in this way, the test score distribution is an esf, albeit with argument 𝛑\boldsymbol{\pi}. The following R function uses this observation to calculate the score distribution:

LW = function(Pi, a)
{
  first = which(a==0)
  last = c(first[-1]-1,length(a))
  dexter:::elsym(Pi,a,first,last)
}

The elsym function is internal to dexter, hence the three colons. This function actually implements the Lord-Wingersky algorithm, extended to polytomous items. Compared to the first solution, it has the advantage that it works for any valid 𝛑=(Ο€10,Ο€11,…,Ο€m1)\boldsymbol{\pi}=(\pi_{10},\pi_{11},\dots, \pi_{m1}) and 𝐚\mathbf{a} and could be used with any IRT model, not just the NRM implemented in dexter.

References

Andersen, Erling B. 1972. β€œThe Numerical Solution of a Set of Conditional Estimation Equations.” Journal of the Royal Statistical Society: Series B (Methodological) 34 (1): 42–54.
Lord, Frederic M, and Marilyn S Wingersky. 1984. β€œComparison of IRT True-Score and Equipercentile Observed-Score Equatings.” Applied Psychological Measurement 8 (4): 453–61.