In a previous posting we discussed IRT true score
equating, and in the process we identified a few of its limitations, most notably the inability to use the method below the
chance performance level. This time around we’re going to talk about another IRT equating method that, while a bit more
complicated than true score equating, does not suffer from this limitation. This method is usually referred to as IRT observed
score equating.
Wait a minute. If we’re equating the observed
scores, how does IRT fit in?
Well, actually IRT observed score equating is a
bit of a misnomer. It should probably be called something like IRT theoretical observed score distribution equating. What
we do is use IRT to construct the expected observed score distribution for each of the forms to be equated using their IRT
item parameter values and a target ability distribution. We can then equate those two score distributions with a conventional
method such as equipercentile equating.
Huh?
Yeah, well, we said it was more complicated. Look, equipercentile equating is pretty straightforward and can be dealt
with another time. For now let’s just focus on constructing the observed score distributions. And let’s do it
with a simple case, say a form with just three items.
First,
let’s build a form. The table below shows the IRT statistics for the three items we’ve selected for our form.
Item | a | b | c |
1 | 1.21 | -0.53 | 0.05 |
2 | 0.93 | -0.27 | 0.15 |
3 | 0.67 | 0.55 | 0.22 |
Now, for a three-item test there are eight (2 x 2 x 2) possible item score
vectors, which are shown in the next table. If we pick a value for theta, we can plug that theta and the item statistics
into the macro for computing likelihoods and obtain the likelihood of each of the vectors.
Vector | Item 1 | Item 2 | Item 3 |
1 | 0 | 0 | 0 |
2 | 0 | 0 | 1 |
3 | 0 | 1 | 0 |
4 | 1 | 0 | 0 |
5 | 0 | 1 | 1 |
6 | 1 | 0 | 1 |
7 | 1 | 1 | 0 |
8 | 1 | 1 | 1 |
Assuming a theta value of 0.5, the likelihood of each of the
above item score vectors is as given in the next table. Note that for future use we have also included the number correct
score resulting from each item score vector.
Vector | Item 1 | Item 2 | Item 3 | Score | Likelihood |
1 | 0 | 0 | 0 | 0 | 0.007911 |
2 | 0 | 0 | 1 | 1 | 0.011811 |
3 | 0 | 1 | 0 | 1 | 0.032883 |
4 | 1 | 0 | 0 | 1 | 0.069875 |
5 | 0 | 1 | 1 | 2 | 0.049095 |
6 | 1 | 0 | 1 | 2 | 0.104327 |
7 | 1 | 1 | 0 | 2 | 0.290446 |
8 | 1 | 1 | 1 | 3 | 0.43365 |
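As a rough cross-check on the table above, here is a minimal Python sketch of the same enumeration. The macro for computing likelihoods is not reproduced in this posting, so the function `prob_3pl` is an assumed form of the 3PL model, and the scaling constant D = 1.702 is likewise an assumption, picked because it appears to reproduce the likelihoods shown above.

```python
from itertools import product
from math import exp

# Item parameters (a, b, c) for the three-item form above.
ITEMS = [(1.21, -0.53, 0.05), (0.93, -0.27, 0.15), (0.67, 0.55, 0.22)]

D = 1.702  # assumed scaling constant; the posting does not state it


def prob_3pl(a, b, c, theta):
    """Assumed 3PL probability of a correct response at theta."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def vector_likelihoods(items, theta):
    """Map every possible item score vector to its likelihood at theta."""
    probs = [prob_3pl(a, b, c, theta) for a, b, c in items]
    result = {}
    for vector in product((0, 1), repeat=len(items)):
        like = 1.0
        for u, p in zip(vector, probs):
            # Multiply by p for a correct response, by (1 - p) for a miss.
            like *= p if u == 1 else 1.0 - p
        result[vector] = like
    return result


if __name__ == "__main__":
    for vector, like in vector_likelihoods(ITEMS, 0.5).items():
        print(vector, sum(vector), round(like, 6))
```

The vectors come out in a different order than in the table, but each one can be matched by its pattern of 0s and 1s.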
Now, here’s an interesting point. Because the above table
includes all possible item score vectors, the likelihoods by necessity sum to 1.0. In effect, the likelihoods
we’ve computed are the expected relative frequencies of the different vectors among examinees at this theta, or put
another way, the expected frequency distribution of the vectors.
That just leaves us with one last step. We don’t
want the expected frequency distribution of the item score vectors, we want the expected frequency distribution of the number
correct scores. That’s where the Score column in the above table comes into play. To get what we want all we have to
do is add together the likelihoods for the vectors that produce the same number correct score. That produces the following
table.
Score | Freq |
0 | 0.007911 |
1 | 0.11457 |
2 | 0.443869 |
3 | 0.43365 |
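The collapsing step is just a grouped sum. A small Python sketch, using the likelihoods copied from the table of item score vectors (the function name `score_distribution` is made up for this illustration):

```python
from collections import defaultdict

# (item score vector, likelihood) pairs copied from the table above.
VECTOR_LIKELIHOODS = [
    ((0, 0, 0), 0.007911),
    ((0, 0, 1), 0.011811),
    ((0, 1, 0), 0.032883),
    ((1, 0, 0), 0.069875),
    ((0, 1, 1), 0.049095),
    ((1, 0, 1), 0.104327),
    ((1, 1, 0), 0.290446),
    ((1, 1, 1), 0.43365),
]


def score_distribution(vector_likelihoods):
    """Add up the likelihoods of all vectors with the same number correct."""
    totals = defaultdict(float)
    for vector, like in vector_likelihoods:
        totals[sum(vector)] += like  # number correct = count of 1s
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for score, freq in score_distribution(VECTOR_LIKELIHOODS).items():
        print(score, round(freq, 6))
```

Because the inputs are already rounded to six places, the sums can differ from the table in the last digit.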
This is just the expected distribution for a single theta, of
course. To obtain the distribution for a target ability distribution we would repeat this process for each value of theta,
multiply the frequencies for each theta by the relative frequency of that theta in the theta distribution, and add the
results together, score by score.
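Here is a minimal Python sketch of that weighting step. The three-point ability distribution (`THETAS`, `WEIGHTS`) is purely hypothetical, and the 3PL function with D = 1.702 is an assumption rather than something given in this posting.

```python
from itertools import product
from math import exp

# Item parameters (a, b, c) for the three-item form above.
ITEMS = [(1.21, -0.53, 0.05), (0.93, -0.27, 0.15), (0.67, 0.55, 0.22)]
D = 1.702  # assumed scaling constant

# Hypothetical target ability distribution, for illustration only.
THETAS = [-1.0, 0.0, 1.0]
WEIGHTS = [0.25, 0.50, 0.25]


def prob_3pl(a, b, c, theta):
    """Assumed 3PL probability of a correct response at theta."""
    return c + (1.0 - c) / (1.0 + exp(-D * a * (theta - b)))


def conditional_distribution(items, theta):
    """Number-correct distribution at one theta, by enumerating vectors."""
    probs = [prob_3pl(a, b, c, theta) for a, b, c in items]
    freq = [0.0] * (len(items) + 1)
    for vector in product((0, 1), repeat=len(items)):
        like = 1.0
        for u, p in zip(vector, probs):
            like *= p if u == 1 else 1.0 - p
        freq[sum(vector)] += like
    return freq


def marginal_distribution(items, thetas, weights):
    """Weight each theta's distribution by its relative frequency and sum."""
    total = sum(weights)
    marginal = [0.0] * (len(items) + 1)
    for theta, w in zip(thetas, weights):
        for score, f in enumerate(conditional_distribution(items, theta)):
            marginal[score] += (w / total) * f
    return marginal


if __name__ == "__main__":
    print(marginal_distribution(ITEMS, THETAS, WEIGHTS))
```

Passing a single theta of 0.5 with a weight of 1 should give back the single-theta distribution.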
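Below is an Excel macro that puts the single-theta computation together
into a single process. It reads the item statistics from a sheet named "Item Stats" and the item score vectors from a
sheet named "Vectors", uses a theta of 0.5, and writes the score distribution to a sheet named "Output".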
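Sub GetObservedScoreDistribution()
    ' Expected number correct score distribution at a single theta,
    ' found by adding up the likelihoods of the listed item score vectors.
    ' Prob3PL(a, b, c, theta) is not defined in this listing; it is assumed
    ' to return the 3PL probability of a correct response.
    Dim MySheet As Excel.Worksheet
    Dim a() As Single, b() As Single, c() As Single, t As Single
    Dim p As Single
    Dim NItems As Byte
    Dim u As Byte, Scr As Byte
    Dim L As Single
    Dim F() As Single
    Dim i As Long, j As Long

    ' Read a, b and c from columns 2 to 4, one item per row, header in row 1.
    Set MySheet = ActiveWorkbook.Worksheets("Item Stats")
    NItems = 1
    While Len(MySheet.Cells(NItems + 1, 1)) > 0
        NItems = NItems + 1
        ReDim Preserve a(NItems - 1)
        ReDim Preserve b(NItems - 1)
        ReDim Preserve c(NItems - 1)
        a(NItems - 1) = MySheet.Cells(NItems, 2)
        b(NItems - 1) = MySheet.Cells(NItems, 3)
        c(NItems - 1) = MySheet.Cells(NItems, 4)
    Wend
    Set MySheet = Nothing
    NItems = NItems - 1     ' NItems now holds the actual number of items
    ReDim F(NItems)         ' F(0) to F(NItems), one cell per score

    t = 0.5

    ' One row per item score vector; item responses start in column 2.
    Set MySheet = ActiveWorkbook.Worksheets("Vectors")
    i = 1
    While Len(MySheet.Cells(i + 1, 1)) > 0
        i = i + 1
        L = 1#
        Scr = 0
        For j = 1 To NItems
            u = MySheet.Cells(i, j + 1)
            Scr = Scr + u
            p = Prob3PL(a(j), b(j), c(j), t)
            If u = 1 Then
                L = L * p
            Else
                L = L * (1 - p)
            End If
        Next j
        ' Add this vector's likelihood to its number correct score.
        F(Scr) = F(Scr) + L
    Wend
    Set MySheet = Nothing

    ' Write score and expected frequency, starting in row 2.
    Set MySheet = ActiveWorkbook.Worksheets("Output")
    For i = 0 To NItems
        MySheet.Cells(i + 2, 1) = i
        MySheet.Cells(i + 2, 2) = F(i)
    Next i
    Set MySheet = Nothing

    ReDim a(0)
    ReDim b(0)
    ReDim c(0)
    ReDim F(0)
    MsgBox "Done"
End Sub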
And there you have it! Yes?
Um, not really. What happens if the test has 50 items? Aren’t there
2^50 (more than a quadrillion) possible item score vectors for a 50-item test? That would take an awfully big workbook.
If you want to pick nits, then yes it would. So, would it be enough if we
just said use short tests?
No, we didn’t think so. In that case, try
this one out.
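Sub ShortCut()
    ' Expected number correct score distribution at a single theta,
    ' built up one item at a time instead of listing every score vector.
    ' Prob3PL(a, b, c, theta) is not defined in this listing; it is assumed
    ' to return the 3PL probability of a correct response.
    Dim MySheet As Excel.Worksheet
    Dim a As Single, b As Single, c As Single, t As Single
    Dim p() As Single
    Dim F() As Single
    Dim NItems As Long      ' Long rather than Byte, which overflows past 255
    Dim i As Long, j As Long

    ' Compute the probability of a correct response to each item at theta t.
    Set MySheet = ActiveWorkbook.Worksheets("Item Stats")
    t = 0.5
    NItems = 1
    While Len(MySheet.Cells(NItems + 1, 1)) > 0
        NItems = NItems + 1
        ReDim Preserve p(NItems - 1)
        a = MySheet.Cells(NItems, 2)
        b = MySheet.Cells(NItems, 3)
        c = MySheet.Cells(NItems, 4)
        p(NItems - 1) = Prob3PL(a, b, c, t)
    Wend
    Set MySheet = Nothing
    NItems = NItems - 1     ' NItems now holds the actual number of items
    ReDim F(NItems)         ' F(0) to F(NItems), one cell per score

    ' Distribution after the first two items (the test needs at least two).
    F(0) = (1 - p(1)) * (1 - p(2))
    F(1) = p(1) * (1 - p(2)) + (1 - p(1)) * p(2)
    F(2) = p(1) * p(2)

    ' Fold in the remaining items one at a time. j runs downward so that
    ' F(j - 1) still holds its value from before item i was added.
    For i = 3 To NItems
        For j = i To 1 Step -1
            F(j) = F(j) * (1 - p(i)) + F(j - 1) * p(i)
        Next j
        F(0) = F(0) * (1 - p(i))
    Next i

    ' Write score and expected frequency, starting in row 2.
    Set MySheet = ActiveWorkbook.Worksheets("Output")
    For i = 0 To NItems
        MySheet.Cells(i + 2, 1) = i
        MySheet.Cells(i + 2, 2) = F(i)
    Next i
    Set MySheet = Nothing

    ReDim p(0)
    ReDim F(0)
    MsgBox "Done"
End Sub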
This macro works no matter how long the test is.
How does it work?
In this approach you don’t actually have to compute the likelihoods individually for the different vectors.
Instead, it takes advantage of the fact that at any point in the test the likelihood of a particular score is the likelihood
of the same score on one fewer items times the probability of missing the current item, plus the likelihood of a one-item-lower
score on one fewer items times the probability of getting the current item correct. That’s what the nested For loops over
i and j in the macro are doing.
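For readers who find a list easier to follow than a worksheet, here is a minimal Python sketch of the same idea. It is not a translation of the macro: it starts from zero items instead of two and builds a new list at each step instead of updating in place. The function name `score_distribution` is made up for this illustration. This kind of recursion is commonly associated with the names Lord and Wingersky.

```python
def score_distribution(probs):
    """Number-correct distribution at one theta.

    probs holds the probability of a correct response to each item.
    """
    freq = [1.0]  # with no items administered, a score of 0 is certain
    for p in probs:
        nxt = [0.0] * (len(freq) + 1)
        for score, f in enumerate(freq):
            nxt[score] += f * (1.0 - p)  # missed this item: score unchanged
            nxt[score + 1] += f * p      # got this item right: score + 1
        freq = nxt
    return freq


if __name__ == "__main__":
    # Two made-up probabilities, just to show the shape of the result.
    print(score_distribution([0.9, 0.8]))
    # A 50-item test needs 51 cells, not a row for every score vector.
    print(len(score_distribution([0.6] * 50)))
```

For a 50-item test the inner loop runs a little over a thousand times in total, which is why the length of the test stops being a problem.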
Uh, didn’t quite get that.
Okay. Look at it this way. If P1 is the probability of a correct
response to item 1, then L1 = P1 is the likelihood of a score of 1 after the first item while L0 = (1 - P1) is the
likelihood of a score of 0 after the first item. Then if P2 is the probability of a correct response to item 2, after
the second item the likelihood of a score of 0 becomes L0*(1 - P2), because to still have
a score of 0 the respondent has to miss the second item. Similarly, to maintain a score of 1 after two items someone who had
a score of 1 after one item would have to miss the second item, and to obtain a score of 1 after two items someone who had
missed the first item would have to get the second item correct. Thus, after two items the likelihood of a score of 1 is
L0*P2 + L1*(1 - P2), where the L0 and L1 on the right-hand side are the values from after the first item. Finally, a score
of 2 requires getting both items right, so its likelihood is L1*P2.
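Written generally, the same argument gives the recursion below. The notation is not from the original posting and is introduced only for this sketch: F_n(x) stands for the likelihood of a number correct score of x after n items, and P_n for the probability of a correct response to item n.

```latex
\begin{aligned}
F_1(0) &= 1 - P_1, \qquad F_1(1) = P_1, \\
F_n(x) &= F_{n-1}(x)\,(1 - P_n) + F_{n-1}(x - 1)\,P_n, \qquad 0 \le x \le n,
\end{aligned}
\qquad \text{with } F_{n-1}(-1) = F_{n-1}(n) = 0 .
```

Setting n = 2 and x = 1 gives F_2(1) = F_1(1)(1 - P_2) + F_1(0) P_2, which is the score-of-1 case worked through above.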
Got it now? Good.