
Tricks of the Trade

Tuesday, July 8, 2008

IRT Observed Score Equating

In a previous posting we discussed IRT true score equating, and in the process we identified a few of its limitations, most notably the inability to use the method below the chance performance level. This time around we’re going to talk about another IRT equating method that, while a bit more complicated than true score equating, does not suffer from this limitation. This method is usually referred to as IRT observed score equating.

 

Wait a minute. If we’re equating the observed scores, how does IRT fit in?

 

Well, actually IRT observed score equating is a bit of a misnomer. It should probably be called something like IRT theoretical observed score distribution equating. What we do is use IRT to construct the expected observed score distribution for each of the forms to be equated using their IRT item parameter values and a target ability distribution. We can then equate those two score distributions with a conventional method such as equipercentile equating.

 

Huh?

 

Yeah, well, we said it was more complicated. Look, equipercentile equating is pretty straightforward and can be dealt with another time. For now let’s just focus on constructing the observed score distributions. And let’s do it with a simple case, say a form with just three items.

 

First, let’s build a form. The table below shows the IRT statistics for the three items we’ve selected for our form.

 

Item    a       b       c
1       1.21    -0.53   0.05
2       0.93    -0.27   0.15
3       0.67     0.55   0.22

 

Now, for a three-item test there are eight possible item score vectors, which are shown in the next table. If we pick a value for theta, we can plug that theta and the item statistics into the macro for computing likelihoods and obtain the likelihood of each of the vectors.

 

Vector  Item 1  Item 2  Item 3
1       0       0       0
2       0       0       1
3       0       1       0
4       1       0       0
5       0       1       1
6       1       0       1
7       1       1       0
8       1       1       1

 

Assuming a theta value of 0.5, the likelihood of each of the above item score vectors is as given in the next table. Note that for future use we have also included the number correct score resulting from each item score vector.

 

Vector  Item 1  Item 2  Item 3  Score   L
1       0       0       0       0       0.007911
2       0       0       1       1       0.011811
3       0       1       0       1       0.032883
4       1       0       0       1       0.069875
5       0       1       1       2       0.049095
6       1       0       1       2       0.104327
7       1       1       0       2       0.290446
8       1       1       1       3       0.43365
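For readers who would rather check the table outside Excel, here is a minimal Python sketch of the same calculation. It is not the macro from the earlier posting: the function names are made up for illustration, and the scaling constant D = 1.702 is an assumption (the post does not show Prob3PL, but this value appears to reproduce the likelihoods above).

```python
import math
from itertools import product

D = 1.702  # ASSUMED scaling constant; the post does not state it

def prob_3pl(a, b, c, theta):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# (a, b, c) for the three items in the post
ITEMS = [(1.21, -0.53, 0.05), (0.93, -0.27, 0.15), (0.67, 0.55, 0.22)]

def vector_likelihoods(items, theta):
    """Map every possible item score vector to its likelihood at theta."""
    probs = [prob_3pl(a, b, c, theta) for a, b, c in items]
    result = {}
    for vector in product((0, 1), repeat=len(items)):
        like = 1.0
        for u, p in zip(vector, probs):
            # multiply by p for a correct response, by (1 - p) for a miss
            like *= p if u == 1 else 1.0 - p
        result[vector] = like
    return result

likelihoods = vector_likelihoods(ITEMS, theta=0.5)
for vector, like in likelihoods.items():
    print(vector, sum(vector), round(like, 6))
```

The vectors come out in a different order than in the table, since itertools.product enumerates them in binary counting order, but each vector's likelihood should agree with the table to within rounding.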

 

Now, here’s an interesting point. Because the above table includes all possible item score vectors, the likelihoods by necessity sum to 1.0, which means that in effect the likelihoods we’ve computed give the relative likelihoods, or put another way, the expected frequency distribution, of the different vectors.

 

That just leaves us with one last step. We don’t want the expected frequency distribution of the item score vectors; we want the expected frequency distribution of the number correct scores. That’s where the score column in the above table comes into play. To get what we want, all we have to do is add together the likelihoods for the vectors that produce the same number correct score. That produces the following table.

 

Score   Freq
0       0.007911
1       0.11457
2       0.443869
3       0.43365
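The collapsing step can be sketched in a few lines of Python. This is only an illustration: the likelihoods are copied from the rounded values in the earlier table, and the function name is hypothetical.

```python
from collections import defaultdict

# (item score vector, likelihood at theta = 0.5), copied from the post's table
VECTOR_LIKELIHOODS = [
    ((0, 0, 0), 0.007911),
    ((0, 0, 1), 0.011811),
    ((0, 1, 0), 0.032883),
    ((1, 0, 0), 0.069875),
    ((0, 1, 1), 0.049095),
    ((1, 0, 1), 0.104327),
    ((1, 1, 0), 0.290446),
    ((1, 1, 1), 0.43365),
]

def score_distribution(vector_likelihoods):
    """Add up the likelihoods of all vectors sharing a number correct score."""
    freq = defaultdict(float)
    for vector, like in vector_likelihoods:
        freq[sum(vector)] += like  # number correct = sum of the 0/1 item scores
    return dict(sorted(freq.items()))

dist = score_distribution(VECTOR_LIKELIHOODS)
for score, freq in dist.items():
    print(score, round(freq, 6))
```

Because the inputs here are already rounded to six places, the sums can differ from the table in the last digit.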

 

This is just the expected distribution for a single theta, of course. To obtain the distribution for a target ability distribution we would repeat this process for each value of theta, multiply the frequencies for each theta by the relative frequency of that theta in the theta distribution, and add them all together.
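As a rough Python sketch of that weighting step: the theta points and weights below are made up purely for illustration (they are not from the post), the function names are hypothetical, and D = 1.702 is again an assumed scaling constant.

```python
import math
from itertools import product

D = 1.702  # ASSUMED scaling constant; the post does not state it

def prob_3pl(a, b, c, theta):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

ITEMS = [(1.21, -0.53, 0.05), (0.93, -0.27, 0.15), (0.67, 0.55, 0.22)]

def conditional_distribution(items, theta):
    """Expected number correct distribution at one theta (brute force)."""
    probs = [prob_3pl(a, b, c, theta) for a, b, c in items]
    freq = [0.0] * (len(items) + 1)
    for vector in product((0, 1), repeat=len(items)):
        like = 1.0
        for u, p in zip(vector, probs):
            like *= p if u == 1 else 1.0 - p
        freq[sum(vector)] += like
    return freq

def marginal_distribution(items, thetas, weights):
    """Weight each theta's distribution by its relative frequency and add."""
    total = sum(weights)
    marginal = [0.0] * (len(items) + 1)
    for theta, weight in zip(thetas, weights):
        conditional = conditional_distribution(items, theta)
        for score, freq in enumerate(conditional):
            marginal[score] += (weight / total) * freq
    return marginal

# HYPOTHETICAL target ability distribution, for illustration only
THETAS = [-2.0, -1.0, 0.0, 1.0, 2.0]
WEIGHTS = [0.05, 0.25, 0.40, 0.25, 0.05]

marginal = marginal_distribution(ITEMS, THETAS, WEIGHTS)
for score, freq in enumerate(marginal):
    print(score, round(freq, 6))
```

Putting all of the weight on a single theta of 0.5 should give back the single-theta table above, which is a handy sanity check.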

 

Below is an Excel macro that puts it all together into a single process.

 

Sub GetObservedScoreDistribution()
' Brute-force version: sums the likelihood of every item score vector
' listed on the "Vectors" sheet into its number correct score.
' Prob3PL is assumed to be defined elsewhere in the workbook.
Dim MySheet As Excel.Worksheet
Dim a() As Single, b() As Single, c() As Single, t As Single
Dim p As Single
Dim NItems As Byte
Dim u As Byte, Scr As Byte
Dim L As Single
Dim F() As Single
Dim i As Long, j As Long

' Read a, b and c from columns 2-4, starting in row 2, until column 1 is empty
Set MySheet = ActiveWorkbook.Worksheets("Item Stats")
NItems = 1
While Len(MySheet.Cells(NItems + 1, 1)) > 0
    NItems = NItems + 1
    ReDim Preserve a(NItems - 1)
    ReDim Preserve b(NItems - 1)
    ReDim Preserve c(NItems - 1)
    a(NItems - 1) = MySheet.Cells(NItems, 2)
    b(NItems - 1) = MySheet.Cells(NItems, 3)
    c(NItems - 1) = MySheet.Cells(NItems, 4)
Wend
Set MySheet = Nothing
ReDim F(NItems)
NItems = NItems - 1

' Theta at which the distribution is computed
t = 0.5

' Each row of "Vectors" holds one item score vector in columns 2 onward
Set MySheet = ActiveWorkbook.Worksheets("Vectors")
i = 1
While Len(MySheet.Cells(i + 1, 1)) > 0
    i = i + 1
    L = 1#
    Scr = 0
    For j = 1 To NItems
        u = MySheet.Cells(i, j + 1)
        Scr = Scr + u
        p = Prob3PL(a(j), b(j), c(j), t)
        If u = 1 Then
            L = L * p
        Else
            L = L * (1 - p)
        End If
    Next j
    ' Add this vector's likelihood to its number correct score
    F(Scr) = F(Scr) + L
Wend
Set MySheet = Nothing

' Write score and expected frequency to the "Output" sheet
Set MySheet = ActiveWorkbook.Worksheets("Output")
For i = 0 To NItems
    MySheet.Cells(i + 2, 1) = i
    MySheet.Cells(i + 2, 2) = F(i)
Next i
Set MySheet = Nothing

ReDim a(0)
ReDim b(0)
ReDim c(0)
ReDim F(0)
MsgBox "Done"
End Sub

 

And there you have it! Yes?

 

Um, not really. What happens if the test has 50 items? Aren’t there 2^50 possible item score vectors (more than a quadrillion) for a 50-item test? That would take an awfully big workbook.

 

If you want to pick nits, then yes it would. So, would it be enough if we just said use short tests?

 

No, we didn’t think so. In that case, try this one out.

 

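Sub ShortCut()
' Recursive version: builds the score distribution one item at a time,
' so no list of item score vectors is needed.
' Prob3PL is assumed to be defined elsewhere in the workbook.
' As written, the macro expects at least two items.
Dim MySheet As Excel.Worksheet
Dim a As Single, b As Single, c As Single, t As Single
Dim p() As Single
Dim F() As Single
Dim NItems As Byte
Dim i As Long, j As Long

' Read the item parameters and store each item's probability at theta t
Set MySheet = ActiveWorkbook.Worksheets("Item Stats")
t = 0.5
NItems = 1
While Len(MySheet.Cells(NItems + 1, 1)) > 0
    NItems = NItems + 1
    ReDim Preserve p(NItems - 1)
    a = MySheet.Cells(NItems, 2)
    b = MySheet.Cells(NItems, 3)
    c = MySheet.Cells(NItems, 4)
    p(NItems - 1) = Prob3PL(a, b, c, t)
Wend
Set MySheet = Nothing
ReDim F(NItems)
NItems = NItems - 1

' Score distribution after the first two items
F(0) = (1 - p(1)) * (1 - p(2))
F(1) = p(1) * (1 - p(2)) + (1 - p(1)) * p(2)
F(2) = p(1) * p(2)

' Fold in the remaining items one at a time. j runs downward so that
' F(j - 1) still holds the value from before item i was added.
For i = 3 To NItems
    For j = i To 1 Step -1
        F(j) = F(j) * (1 - p(i)) + F(j - 1) * p(i)
    Next j
    F(0) = F(0) * (1 - p(i))
Next i

' Write score and expected frequency to the "Output" sheet
Set MySheet = ActiveWorkbook.Worksheets("Output")
For i = 0 To NItems
    MySheet.Cells(i + 2, 1) = i
    MySheet.Cells(i + 2, 2) = F(i)
Next i
Set MySheet = Nothing

ReDim p(0)
ReDim F(0)
MsgBox "Done"
End Sub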

 

This macro works no matter how long the test is.

 

How does it work?

 

In this approach we don’t actually have to compute the likelihoods individually for the different vectors. Instead, the macro takes advantage of the fact that at any point in the test the likelihood of a particular score is the likelihood of the same score on one fewer items times the probability of missing the current item, plus the likelihood of a one-item-lower score on one fewer items times the probability of getting the current item correct. That’s what the nested For loops near the end of the macro are doing.

 

Uh, didn’t quite get that.

 

Okay. Look at it this way. If P1 is the probability of a correct response to item 1, then L1 = P1 is the likelihood of a score of 1 after the first item while L0 = (1- P1) is the likelihood of a score of 0 after the first item. Then if P2 is the probability of a correct response to item 2 then after the second item L0 = L0*(1-P2) because to still have a score of 0 the respondent has to miss the second item. Similarly, to maintain a score of 1 after two items someone who had a score of 1 after one item would have to miss the second item and to obtain a score of 1 after two items someone who had missed the first item would have to get the second item correct. Thus, after two items L1 = L0*P2 + L1*(1-P2).
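The same recursion can be sketched in Python. This is a minimal illustration rather than a translation of the macro: it starts from zero items and builds a fresh list at each step instead of updating in place, the function names are hypothetical, and D = 1.702 is an assumed scaling constant.

```python
import math

D = 1.702  # ASSUMED scaling constant; the post does not state it

def prob_3pl(a, b, c, theta):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def score_distribution(probs):
    """Number correct distribution built up one item at a time."""
    freq = [1.0]  # with zero items, a score of 0 is certain
    for p in probs:
        new = [0.0] * (len(freq) + 1)
        for score, f in enumerate(freq):
            new[score] += f * (1.0 - p)   # miss the item: score unchanged
            new[score + 1] += f * p       # answer correctly: score goes up 1
        freq = new
    return freq

ITEMS = [(1.21, -0.53, 0.05), (0.93, -0.27, 0.15), (0.67, 0.55, 0.22)]
probs = [prob_3pl(a, b, c, 0.5) for a, b, c in ITEMS]
dist = score_distribution(probs)
for score, freq in enumerate(dist):
    print(score, round(freq, 6))

# A 50-item test is no problem: about 50 * 51 / 2 updates, not 2^50 vectors
long_test = score_distribution([0.6] * 50)
print(len(long_test), round(sum(long_test), 6))
```

For the three-item form at a theta of 0.5, the result should match the score frequency table from the brute-force approach to within rounding.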

 

Got it now? Good.


12:54 pm cdt

Monday, June 16, 2008

Option Curves

In the posting on posterior-based DIF it was pointed out that when accumulating posteriors over test taker samples to obtain ability distributions it is a simple matter to limit the summation to those cases in which the test taker belongs to a particular group. In DIF analysis we are usually interested in groups based on demographic variables such as gender or ethnicity. However, we can just as easily limit our summations to test taker subgroups defined by what response option they choose when responding to a test question.
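As a rough Python sketch of that grouped summation: the posteriors and choices below are toy values invented for illustration, the function name is hypothetical, and each test taker's posterior is assumed to be a list of weights over the theta nodes.

```python
def accumulate_by_option(posteriors, choices, options="ABCD"):
    """Sum the posterior weights at each theta node, separately per option."""
    n_nodes = len(posteriors[0])
    sums = {option: [0.0] * n_nodes for option in options}
    for posterior, choice in zip(posteriors, choices):
        for node, weight in enumerate(posterior):
            sums[choice][node] += weight
    return sums

# HYPOTHETICAL toy data: three test takers, posteriors over three theta nodes
POSTERIORS = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
CHOICES = ["A", "C", "C"]  # option each test taker selected on the item

sums = accumulate_by_option(POSTERIORS, CHOICES)
for option, column in sums.items():
    print(option, [round(value, 2) for value in column])
```

A response that is not one of the listed options (an omit, say) would raise a KeyError here, so real data would need a rule for those cases.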

In the following table the column headed “A” contains the sum of the posteriors for all test takers within a particular sample of 6,500 test takers who selected option A on a particular multiple-choice item. The next column contains similar information for the test takers who picked the second option.

Node    Theta   A        B        C        D        Total
1       -3.00   17.01    16.75    13.73    18.66    66.16
2       -2.50   49.56    47.79    40.61    53.24    191.19
3       -2.00   139.02   130.37   120.06   144.15   533.61
4       -1.50   234.09   218.29   237.82   232.95   923.16
5       -1.00   194.81   187.48   302.88   196.60   881.77
6       -0.50   116.35   103.68   376.48   124.65   721.16
7        0.00   65.70    64.33    561.95   68.63    760.62
8        0.50   26.58    28.74    776.73   31.91    863.96
9        1.00   7.24     9.25     841.04   8.84     866.37
10       1.50   0.62     1.24     484.19   1.24     487.28
11       2.00   0.02     0.06     156.53   0.11     156.72
12       2.50   0.00     0.00     37.60    0.01     37.60