ITEM ANALYSIS
Evaluation is an indispensable part of education, and different types of tests are used for assessment and, consequently, evaluation. Tests play an important role in giving stakeholders in education feedback on various aspects, so the quality of tests has long been a central concern; the literature is accordingly full of comprehensive discussion of validity, reliability and the characteristics of quality assessment programs (Stephen & Polly, 2006), all aimed at improving that feedback. For this reason, item analysis is widely used to improve test quality through knowledge of item statistics. Item statistics are used not only to improve a test but also in item revision (Lange, 1967). Item analysis allows us to observe the characteristics of a particular item and can be used to ensure that questions are of an appropriate standard for inclusion in a test. A comprehensive knowledge of the factors that make a good test item enables us to create more effective tests, besides standardizing existing ones. Improving a test through item analysis can save a great deal of time and energy on the part of teachers and test developers. Typically, two values are computed in the analysis of a test: a difficulty level and a discrimination index.
One of the most important tasks confronting faculty members is the evaluation of student performance. Once designed, the evaluative procedure must be administered and then scored, interpreted and graded; after that, feedback must be presented to students. Doing these tasks demands a broad range of cognitive, technical and interpersonal resources on the part of faculty. But an even more critical task remains: investigating the quality of the evaluative procedure itself.
What constitutes a good exam item? Students seem to know, or at least believe they know, but are they correct when they claim that an item was too difficult, too tricky or unfair? According to Lewis Aiken, item analysis is a group of procedures for assessing the quality of exam items. The purpose of an item analysis is to improve the quality of an exam by identifying items that are candidates for retention, revision or removal. It can also clarify which concepts the examinees have and have not mastered.
PURPOSE:
1. Assemble or write a relatively large number of items of the type you want on the test.
2. Analyze the items carefully using item format analysis to make sure they are well-written and clear.
3. Pilot the items using a group of students similar to the group that will ultimately be taking the test.
4. Analyze the results statistically using item analysis techniques.
5. Select the most effective items and make a shorter, more effective revised version of the test.
TWO BROAD CATEGORIES:
Qualitative Item Analysis
Qualitative item analysis procedures include careful proofreading of the exam prior to its administration for typographical errors, for grammatical cues that might inadvertently tip off examinees to the correct answer, and for the appropriateness of the reading level of the material. They can also include small-group discussion of the quality of the exam and its items with examinees who have already taken the test, with departmental student assistants, or even with experts in the field. Some teachers use “think-aloud test administration,” in which examinees are asked to say aloud what they are thinking as they respond to each item on the exam. This can help teachers determine whether certain students misinterpreted particular items, and why they may have done so.
Quantitative Item Analysis
Specifically,
three numerical indicators are often derived during an item analysis: item
difficulty, item discrimination and distractor power.
1. Item Difficulty Index (p)
The item difficulty statistic is an appropriate choice for achievement or aptitude tests when the items are scored dichotomously (correct or incorrect). It can be derived for true-false, multiple-choice and matching items, and for essay items where the instructor can convert the range of possible point values into the categories “passing” and “failing.”
The item difficulty index, p, is computed as:
p = (number of test takers who answered the item correctly) / (total number of test takers who answered the item)
p can range from 0.00 (no examinees answered the item correctly) to 1.00 (all examinees answered the item correctly), or equivalently from 0% to 100%; the higher the value, the easier the item. p-values above 0.90 indicate very easy items, possibly testing a concept not worth testing. p-values below 0.20 indicate difficult items, which should be reviewed for possibly confusing language or for content that needs re-instruction. The optimum difficulty level is 0.50, giving maximum discrimination between high and low achievers.
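To make the computation concrete, here is a minimal Python sketch, assuming each examinee's response to the item has already been scored 1 (correct) or 0 (incorrect); the function name and the sample data are illustrative, not part of any standard package.

```python
# Minimal sketch: item difficulty index p for a dichotomously scored item.
# Each entry is one examinee's score on the item: 1 = correct, 0 = incorrect.

def difficulty_index(item_scores):
    """Return p, the proportion of examinees who answered the item correctly."""
    if not item_scores:
        raise ValueError("No responses recorded for this item.")
    return sum(item_scores) / len(item_scores)

# Hypothetical responses from ten examinees on one item:
item_scores = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
p = difficulty_index(item_scores)
print(f"p = {p:.2f}")  # p = 0.70, a moderately easy item
```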
A test item does not have a single, fixed p value. Not only may the p value vary with each class group that takes the test; the teacher may also gain insight by computing the item difficulty level for different subgroups within a class, such as those who did well on the exam overall and those who performed more poorly.
For example, suppose the difficulty level is 0.20, meaning that 20% of the examinees answered the item correctly. Does this mean that the item was challenging for everyone? Does it mean that the teacher failed in his or her attempt to teach the particular topic assessed by the item? Does it mean that the students failed to learn the material? Does it mean that the item was poorly written? To answer these questions, teachers must also rely on other item analysis procedures, both qualitative and quantitative.
2. Item Discrimination Index (D)
Item discrimination analysis deals with the fact that
often different test takers will answer a test item in different ways. It
addresses the validity of the items on a test, the extent to which the items
tap the attributes they were intended to assess.
It is the point-biserial relationship between students' performance on an individual item and their total test score. This value can range from -1.00 to +1.00; the higher the value, the more discriminating the item. A highly discriminating item is one on which students with high test scores answered correctly while students with low test scores answered incorrectly.
Teachers test because they want to find out whether students know the material, but what they actually learn is how students did on the exam they were given. The item discrimination index “tests the test,” in the hope of keeping the correlation between knowledge and exam performance as close as it can be in an admittedly imperfect system.
It is calculated as follows:
A. Divide the group of test takers into two groups (high scoring and low scoring).
B. Compute the item difficulty levels separately for the upper (p_upper) and lower (p_lower) scoring groups.
C. Compute D = p_upper - p_lower.
How can this
be interpreted?
Example: Suppose half of the examinees answered a particular item correctly, that all of the examinees who scored above the median on the exam answered the item correctly, and that all of the examinees who scored below the median answered it incorrectly. Then p_upper = 1.00 and p_lower = 0.00, so D = 1.00 and the item is a perfect positive discriminator. This suggests that the examinees who knew the material and were well prepared passed the item while the others failed it.
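The same computation can be sketched in Python; assume one record per examinee holding the total test score and a 1/0 score on the item of interest (the names and sample data below are illustrative only).

```python
# Minimal sketch: item discrimination index D = p_upper - p_lower.
# records: list of (total_test_score, item_correct) tuples, item_correct in {0, 1}.

def discrimination_index(records):
    """Split examinees into lower and upper halves by total score and compare item difficulty."""
    ranked = sorted(records, key=lambda r: r[0])   # order by total test score
    half = len(ranked) // 2
    lower = ranked[:half]                          # low-scoring group
    upper = ranked[len(ranked) - half:]            # high-scoring group
    p_lower = sum(item for _, item in lower) / len(lower)
    p_upper = sum(item for _, item in upper) / len(upper)
    return p_upper - p_lower

# Mirrors the worked example: everyone above the median answered correctly,
# everyone below the median answered incorrectly, so D comes out as 1.00.
records = [(95, 1), (88, 1), (82, 1), (60, 0), (55, 0), (48, 0)]
print(f"D = {discrimination_index(records):.2f}")  # D = 1.00
```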
Difficulty and discrimination are not independent. If all students in both the upper and lower groups either pass or fail an item, there is nothing in the data to indicate whether the item itself was good or not; for instance, if p_upper = p_lower = 1.00, then D = 1.00 - 1.00 = 0.00. The value of the item discrimination index is maximized when only half of the test takers overall answer an item correctly, and the ideal situation is one in which the half who passed the item are the students who did well on the exam overall. Nevertheless, there are many reasons to include at least some very easy or very difficult items. Very easy items can reflect the fact that some relatively straightforward concepts were taught well and mastered by all students, and the teacher may choose to include some very difficult items to challenge even the best-prepared students. The teacher should simply be aware that neither of these types of items functions well to make discriminations among those taking the test.
3. Item Distractor Analysis
This applies particularly to multiple-choice items. The incorrect alternatives are called distractors. Item distractor analysis examines the percentage of examinees who select each incorrect alternative, to determine whether the distractors are functioning as intended. On a well-designed multiple-choice item, those who know the material and are well prepared for the exam should select the correct alternative, while those who are not well prepared should guess, selecting almost randomly from among the available distractors. Such an item would be a very good discriminator and would very likely be a candidate for retention in future exams.
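A minimal tabulation of distractor choices might look like the Python sketch below; the responses and answer key are hypothetical, and real gradebook exports would need their own parsing.

```python
# Minimal sketch: distractor analysis for one multiple-choice item.
# responses holds the option letter each examinee chose; correct_key is the keyed answer.
from collections import Counter

responses = ["B", "B", "C", "B", "A", "D", "B", "C", "B", "A"]
correct_key = "B"

counts = Counter(responses)
total = len(responses)
for option in sorted(counts):
    share = 100 * counts[option] / total
    label = "correct" if option == correct_key else "distractor"
    print(f"{option} ({label}): {share:.0f}%")

# A distractor that almost no one chooses, or that is chosen mainly by
# high scorers, is not functioning as intended and is a candidate for revision.
```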
Distractor analysis can also provide useful diagnostic information in other situations. An item that was passed by more of those who did poorly on the exam overall than by those who were well prepared and knew the material is a candidate for removal from the exam.
Caution in Item Analysis
• Item analysis data are just a reflection of internal consistency and therefore should not be treated as item validity, which requires an external criterion (e.g., experts' opinions) to accurately judge the validity of test items.
• A low discrimination index does not by itself mean an item should be dropped from a test, because extremely difficult or easy items may have little ability to discriminate, yet such items can be included in a test to sample course content adequately. Similarly, an item may have low discrimination because of the multidimensionality of a test.
• Item analysis data are tentative, since they are influenced by factors such as the sample of students, the quality of instruction, and chance errors.
(“Test Item Analysis”, online, www.utexas.edu/academic/mec/research/.../itemanalysishandout.pdf, accessed 15-08-2010)
Conclusion:
Item difficulty and discrimination
analysis programs are often included in the software used in processing exams
answered on Scantron or other optically scannable forms. These analyses can
often be performed for students by personnel in the computer services office.
Item analysis can certainly help determine whether the items on an exam were good ones, and which items to retain, revise or replace.
References:
Zurawski, R. Making the Most of Exams: Procedures for Item Analysis.