Friday, September 7, 2012


Interpreting Item Analysis



Let's look at what we have and see what we can see.

  1. What do we see looking at this first one? [Potential Miskey]

       Upper    Low    Difference    D    Total    Difficulty

1. *A    1      4        -3        -.2       5          .17
    B    1      3
    C   10      5
    D    3      3
    O   <----means omit or no answer
    • #1, more high group students chose C than A, even though A is supposedly the correct answer
    • more low group students chose A than high group, so we get a negative discrimination
    • only 17% of the class got it right
    • most likely you just wrote the wrong answer key down 
--> this is an easy and very common mistake for you to make
      • better you find out now, before you hand the papers back, than when kids complain
      • OR WORSE, they don't complain, and teach themselves your miskey as the "correct" answer
    • so check it out and rescore that question on all the papers before handing them back
    • Makes it 10-5: Difference = 5; D = .33; Total = 15; difficulty = .50 (see the sketch below)
      --> nice item
OR:
    • you check and find that you didn't miskey it --> that is the answer you thought

      two possibilities:
      1. you made a slip of the tongue and taught them the wrong answer
        • anything you say in class can be taken down and used against you on an examination....
      2. more likely means even "good" students are being tricked by a common misconception -->
You're not supposed to have trick questions, so may want to dump it
--> give those who got it right their point, but total rest of the marks out of 24 instead of 25
If scores are high, or you want to make a point, might let it stand, and then teach to it --> sometimes if they get caught, will help them to remember better in future
such as:
    • very fine distinctions
    • crucial steps which are often overlooked
REVISE it for next time to weaken "C" 
-- alternatives are not supposed to draw more than the keyed answer
-- almost always an item flaw, rather than useful distinction
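
Here's a minimal sketch (Python) of the arithmetic behind these columns, assuming upper and lower groups of 15 students each (a class of 30), as in all the tables here:

    # per-alternative tallies for the upper and lower groups
    def item_stats(upper, lower, key, group_size=15):
        """Compute Difference, D, Total, and Difficulty for one item."""
        u = upper.get(key, 0)                    # upper-group students who got it right
        low = lower.get(key, 0)                  # lower-group students who got it right
        difference = u - low
        d_index = difference / group_size        # D: ranges from -1.0 to +1.0
        total = u + low                          # Total who got it right
        difficulty = total / (2 * group_size)    # p: proportion of the class correct
        return difference, d_index, total, difficulty

    # item 1 scored with the (mis)keyed answer A:
    upper = {"A": 1, "B": 1, "C": 10, "D": 3}
    lower = {"A": 4, "B": 3, "C": 5, "D": 3}
    print(item_stats(upper, lower, "A"))   # -3, -.20, 5, .17 (rounded)
    # rescored with C as the key:
    print(item_stats(upper, lower, "C"))   # 5, .33, 15, .50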

  2. What can we see with #2? [Can identify ambiguous items]

       Upper    Low   Difference   D    Total   Difficulty

2.  A    6      5
    B    1      2        
   *C    7      5        2        .13     12      .40
    D    1      3
    O
    • #2, about equal numbers of top students went for A and C.
      • either, students didn't know this material (in which case you can reteach it)
      • or the item was defective --->
    • look at their favorite alternative again, and see if you can find any reason they could be choosing it
    • often items that look perfectly straightforward to adults are ambiguous to students
    • if you NOW realize that A was a defensible answer, rescore before you hand it back to give everyone credit for either A or C (see the sketch below) -- avoids arguing with you in class
    • if it's clearly a wrong answer, then you now know which error most of your students are making to get the wrong answer
    • useful diagnostic information on their learning, your teaching
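
A hedged sketch of that rescoring step (the data layout and names are made up for illustration): give credit for either the key or the defensible alternative.

    ACCEPTABLE = {"A", "C"}   # keyed answer plus the defensible alternative

    def rescore_item(responses, item, acceptable=ACCEPTABLE):
        """responses: dict of student -> dict of item -> chosen alternative.
        Returns student -> 0/1 credit for this item."""
        return {s: int(answers.get(item) in acceptable)
                for s, answers in responses.items()}

    responses = {"pat": {"Q2": "A"}, "sam": {"Q2": "B"}}
    print(rescore_item(responses, "Q2"))   # {'pat': 1, 'sam': 0}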

  3. Responding about equally to all alternatives
      Upper    Low    Difference    D    Total    Difficulty

3.  A     4     3
    B     3     4
   *C     5     4       1          .06       9        .30         
    D     3     4
    O
    • item #3, students respond about equally to all alternatives
    • usually means they are guessing
Three possibilities:
      1. may be material you didn't actually get to yet
        • you designed test in advance (because I've convinced you to plan ahead) but didn't actually get everything covered before holidays....
        • or item on a common exam that you didn't stress in your class
      2. item so badly written students have no idea what you're asking
      3. item so difficult students just completely baffled
    • review the item:
      • if badly written (by another teacher) or on material your class hasn't taken, toss it out and rescore the exam out of the lower total (see the sketch after this list)
        • BUT give credit to those that got it, to a total of 100%

      • if seems well written, but too hard, then you know to (re)teach this material for rest of class....
      • maybe the few who got it are your top students, 
        • tough but valid item:
        • OK, if item tests valid objective
        • want to provide occasional challenging question for top students
        • but make sure you haven't defined "top 3 students" as "those able to figure out what the heck I'm talking about"
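
A small sketch of that "toss it out" rescoring, assuming raw scores already include the tossed item's point where earned (the data layout is illustrative):

    def rescore_out_of(raw_scores, old_total, dropped=1):
        """raw_scores: dict of student -> raw score out of old_total.
        Rescore out of the lower total, capping at 100%."""
        new_total = old_total - dropped
        return {s: min(score / new_total, 1.0)   # those who got it keep the point
                for s, score in raw_scores.items()}

    scores = {"pat": 25, "sam": 20}
    print(rescore_out_of(scores, 25))   # pat capped at 1.0, sam ~ .83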

  4. Alternatives aren't working

       Upper    Low   Difference    D   Total   Difficulty

4.  A     1      5
   *B    14      7       7        .47     21        .70
    C     0      2
    D     0      0
    O  
    • example #4 --> no one fell for D --> so it is not a plausible alternative
    • question is fine for this administration, but revise item for next time
    • toss alternative D, replace it with something more realistic
    • each distracter has to attract at least 5% of the students (see the sketch below)
      • class of 30, should get at least two students

    • or might accept a distracter that draws only one student if you positively can't think of a better fourth alternative -- otherwise, do not reuse the item
if two alternatives don't draw any students --> might consider redoing as true/false
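
A quick sketch of that 5% screen, using whole-class totals for each alternative (class of 30 assumed):

    def weak_distracters(counts, key, n_students=30, min_frac=0.05):
        """Return the distracters drawing fewer than 5% of students."""
        floor = min_frac * n_students   # class of 30 -> 1.5, i.e. at least two students
        return [alt for alt, n in counts.items()
                if alt != key and n < floor]

    counts = {"A": 6, "B": 21, "C": 2, "D": 0}   # item 4, upper + lower
    print(weak_distracters(counts, key="B"))     # ['D']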

  5. Distracter too attractive

       Upper   Low   Difference   D   Total   Difficulty  

5.  A    7     10
    B    1      2
    C    1      1
   *D    5      2       3        .20     7       .23
    O


    • sample #5 --> too many going for A
--> no ONE distracter should get more than the key

--> no one distracter should pull more than about half of the students

-- doesn't leave enough for the correct answer and five percent for each alternative
    • keep for this time
    • weaken it for next time

  6. Question not discriminating

      Upper   Low   Difference   D   Total   Difficulty


6. *A    7     7        0        .00     14      .47         
    B    3     2
    C    2     1
    D    3     5
    O
    • sample #6: low group gets it as often as high group
    • on norm-referenced tests, point is to rank students from best to worst
    • so individual test items should have good students get question right, poor students get it wrong
    • test overall decides who is a good or poor student on this particular topic
      • those who do well have more information, skills than those who do less well
      • so if on a particular question those with more skills and knowledge do NOT do better, something may be wrong with the question
    • question may be VALID, but off topic
      • E.G.: rest of test tests thinking skill, but this is a memorization question, so skilled and unskilled are equally likely to recall the answer
      • should have homogeneous test --> don't have a math item in with social studies
      • if wanted to get really fancy, should do separate item analysis for each cell of your blueprint...as long as you had six items per cell

    • question is VALID, on topic, but not RELIABLE
      • addresses the specified objective, but isn't a useful measure of individual differences
      • asking Grade 10s the capital of Canada is on topic, but since they will all get it right, it won't show individual differences -- gives you a low D

  7. Negative Discrimination

       Upper   Low   Difference   D   Total   Difficulty


7. *A    7    10      -3       -.20     17       .57 
    B    3     3
    C    2     1
    D    3     1
    O
    • D (discrimination) index is just upper group correct minus lower group correct, divided by the group size (15 here) -- e.g., item 7: (7 - 10)/15 = -.20
    • varies from +1.0 to -1.0
    • if all the top group got it right and all the lower group got it wrong, D = +1.0
    • if more of the bottom group get it right than the top group, you get a negative D index
    • if you have a negative D, it means that students with less skill and knowledge overall are getting it right more often than those the test says are better overall
    • in other words, the better you are, the more likely you are to get it wrong
WHAT COULD ACCOUNT FOR THAT?

Two possibilities:
    • usually means an ambiguous question
      • one that is confusing good students, while weak students are too weak to see the problem
      • look at question again, look at alternatives good students are going for, to see if you've missed something


OR:
    • or it might be off topic

      --> something weaker students are better at (like rote memorization) than good students

      --> not part of same set of skills as rest of test--> suggests design flaw with table of specifications perhaps
(If you end up with a whole bunch of negative D indices on the same test, you must actually have two distinct skills, because by definition the low group is the high group on that bunch of questions
--> end up treating them as two separate tests.)
    • if you have a large enough sample (like the provincial exams), toss the item and either don't count it or give everyone credit for it
    • with sample of 100 students or less, could just be random chance, so basically ignore it in terms of THIS administration
      • kids wrote it, give them mark they got
    • furthermore, if you keep dropping questions, may find that you're starting to develop serious holes in your blueprint coverage -- problem for sampling
      • but you want to track this stuff FOR NEXT TIME (see the sketch below)
    • if it's negative on administration after administration, consistently, likely not random chance, it's screwing up in some way
    • want to build your future tests out of those items with high positive D indices
    • the higher the average D indices on the test, the more RELIABLE the test as a whole will be 
    • revise items to increase D
-->if good students are selecting one particular wrong alternative, make it less attractive

-->or increase probability of their selecting right answer by making it more attractive
    • may have to include some items with negative Ds if those are the only items you have for that specification, and it's an important specification
      • what this means is that there are some skills/knowledge in this unit which are unrelated to rest of the skills/knowledge
        --> but may still be important

    • e.g., the statistics part of this course may be terrible for those students who are the best item writers, since writing tends to be associated with the opposite hemisphere of the brain from math, right... but it's still an important objective in this course
      • may lower reliability of test, but increases content validity
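
One way to do that tracking (a sketch; the history format is made up): flag items whose D comes out negative administration after administration, since that is unlikely to be random chance.

    def consistently_negative(d_history, min_runs=3):
        """d_history: dict of item id -> list of D values, one per administration."""
        return [item for item, ds in d_history.items()
                if len(ds) >= min_runs and all(d < 0 for d in ds)]

    history = {"Q7": [-0.20, -0.13, -0.07], "Q4": [0.47, 0.40, 0.53]}
    print(consistently_negative(history))   # ['Q7']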

  8. Too Easy

       Upper   Low   Difference   D   Total   Difficulty

8.  A     0     1
   *B    14    13       1       .06     27       .90
    C     0     1
    D     1     1
    O
    • too easy or too difficult won't discriminate well either
    • difficulty (p) (for proportion) varies from +1.0 (everybody got it right) to 0 (nobody)
REMEMBER: THE HIGHER THE DIFFICULTY INDEX, THE EASIER THE QUESTION
    • if the item is NOT miskeyed and there's no other glaring problem, it's too late to change it once administered --> everybody got it right, OK, give them the mark
TOO DIFFICULT = 30 to 35% (used to be rule in Branch, now not...)
    • if the item is too difficult, don't drop it, just because everybody missed it --> you must have thought it was an important objective or it wouldn't have been on there;
    • and unless literally EVERYONE missed it, what do you do with the students who got it right?
    • give them bonus marks?
    • cheat them of a mark they got?
furthermore, if you drop too many questions, lose content validity (specs)
--> if two or three got it right may just be random chance,
so why should they get a bonus mark

    • however, DO NOT REUSE questions with too high or low difficulty (p) values in future
if difficulty is over 85%, you're wasting space on a limited-item test
    • asking Grade 10s the Capital of Canada is probably waste of their time and yours --> unless this is a particularly vital objective
    • same applies to items which are too difficult --> no use asking Grade 3s to solve quadratic equation
    • but you may want to revise question to make it easier or harder rather than just toss it out cold
OR SOME EXCEPTIONS HERE: 

You may have consciously decided to develop a "Mastery" style test
--> these will often have very easy questions & expect everyone to get everything; you're trying to identify only those who are not ready to go on

--> in which case, don't use any question which has a difficulty level below 85% or whatever
Or you may want a test to identify the top people in class, the Reach for the Top team, and design a whole test of really tough questions
--> these will have low difficulty (p) values (i.e., very hard)

    • so depends a bit on what you intend to do with the test in question
    • this is what makes the difficulty index (proportion) so handy
    1. you create a bank of items over the years
--> using item analysis you get better questions all the time, until you have a whole bunch that work great

--> can then tailor-make a test for your class (see the sketch after this list)

if you want to create an easier test this year, you pick questions with higher difficulty (p) values;

if you want to make a challenging test for your gifted kids, choose items with low difficulty (p) values

--> for most applications will want to set difficulty level so that it gives you average marks, nice bell curve
      • government uses 62.5% --> four-alternative multiple choice, middle of the bell curve

    2. start tests with an easy question or two to give students a running start
    3. make sure that the difficulty levels are spread out over the examination blueprint
      • not all hard geography questions, easy history
        • unfair to kids who are better at geography, worse at history
        • turns class off geography if they equate it with tough questions
--> REMEMBER here that difficulty is different from complexity (Bloom)
      • so can have difficult recall knowledge question, easy synthesis
      • synthesis and evaluation items will tend to be harder than recall questions so if find higher levels are more difficult, OK, but try to balance cells as much as possible
      • certainly content cells should be roughly the same
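
Here's what that bank-driven tailoring might look like (a sketch; the bank format is illustrative): screen out items with extreme p values first, then pick the items closest to the target difficulty.

    def pick_items(bank, n, target_p=0.625, lo=0.30, hi=0.85):
        """bank: dict of item id -> difficulty p from past administrations."""
        usable = {i: p for i, p in bank.items() if lo <= p <= hi}
        # take the n items whose difficulty is closest to the target
        return sorted(usable, key=lambda i: abs(usable[i] - target_p))[:n]

    bank = {"Q1": 0.50, "Q4": 0.70, "Q5": 0.23, "Q8": 0.90, "Q9": 0.33}
    print(pick_items(bank, 2))   # ['Q4', 'Q1']

For a Mastery-style test you would flip the screen (keep only high-p items); for a reach-for-the-top test, keep only the low-p items.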

  9. OMIT

       Upper   Low   Difference   D   Total   Difficulty

9.  A    2     1
    B    3     4
   *C    7     3        4       .26     10       .33
    D    1     1
    O    2     4    
If near the end of the test:
    1. --> they didn't find it because it was on the next page
      -- format problem
OR
    2. --> your test is too long; 6 of them (20%) didn't get to it

OR, if middle of the test:
    3. --> it totally baffled them, because:
      • way too difficult for these guys
      • or, since 2 from the high group omitted it too, ambiguous wording
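
A rough sketch for spotting these omit patterns automatically (the data layout is hypothetical): heavy omits near the end of the test point at length or format problems; heavy omits mid-test point at a baffling item.

    def omit_flags(omit_counts, n_students=30, cutoff=0.10, tail=0.8):
        """omit_counts: list of omit counts, one per item, in test order."""
        n_items = len(omit_counts)
        flags = []
        for i, omits in enumerate(omit_counts):
            if omits / n_students > cutoff:
                where = "end of test" if i >= tail * n_items else "mid-test"
                flags.append((i + 1, omits, where))
        return flags

    print(omit_flags([0, 1, 0, 2, 0, 0, 0, 0, 6, 5]))
    # [(9, 6, 'end of test'), (10, 5, 'end of test')]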

  10. RELATIONSHIP BETWEEN D INDEX AND DIFFICULTY (p)

      Upper   Low   Difference   D   Total   Difficulty

10.  A     0    5
    *B    15    0        15      1.0    15       .50
     C     0    5
     D     0    5
     O
   ---------------------------------------------------
11.  A    3     2
    *B    8     7         1      .06     15      .50
     C    2     3
     D    2     3
     O
    • 10 is a perfect item --> each distracter gets at least 5
      discrimination index is +1.0
(ACTUALLY PERFECT ITEM WOULD HAVE DIFFICULTY OF 65% TO ALLOW FOR GUESSING)
    • high discrimination D indices require optimal levels of difficulty
    • but optimal levels of difficulty do not assure high levels of D
    • 11 has same difficulty level, different D
      • on four-alternative multiple choice, a student answering totally by chance will get 25%
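
One back-of-envelope way to see why high D requires optimal difficulty (my arithmetic, not from the original notes): with groups of 15, total correct is T = 30p, and the best case puts the whole upper group right first, so the largest possible D is (min(T, 15) - max(0, T - 15)) / 15, which works out to 2 * min(p, 1 - p).

    def max_d(p):
        """Largest D index an item with difficulty p can possibly reach."""
        return 2 * min(p, 1 - p)

    for p in (0.50, 0.625, 0.90):
        print(p, round(max_d(p), 2))
    # 0.5   1.0    <- item 10's perfect case
    # 0.625 0.75
    # 0.9   0.2    <- why the too-easy item 8 tops out far below D = 1.0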

