Wednesday, September 5, 2012
Item Analysis and Item Discrimination
Submitted by: Song liping
Item analysis
Item analysis is a process which examines student responses to individual test items in order to assess the quality of those items and of the test as a whole. Item analysis is especially valuable in improving items which will be used again in later tests, but it can also be used to eliminate ambiguous or misleading items in a single test administration.
Item discrimination
Traditional test analysis considers the extent to which a single item distinguishes between able and less able candidates in a similar way to the test as a whole. Items which are not consistent with the other items in the way in which they distinguish between able and less able candidates (as measured by this test) are considered for deletion, amendment, or placement on a different test. Modern test analysis techniques consider other factors as well (these are discussed later).
For a test of many items, it is common practice to assume that the total score on the trial test is a reasonable estimate of achievement for that type of test. Criterion groups may be selected on the basis of total score (if that type of analysis is being done). When such an assumption is made, we expect candidates with high total scores to have high achievement and candidates with low total scores to have low achievement. The procedure investigates how each item distinguishes between candidates with the relevant knowledge and skills and those lacking them. Choosing items with an acceptable discrimination index will tend to produce a new version of the test with greater homogeneity.
[However, this process should not be taken too far, because a test measuring a more complex area will be made less relevant if only one type of item is retained.]
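As a rough illustration of the criterion-group procedure just described, the following Python sketch computes an upper/lower-group discrimination index for each item. The function name, the 0/1 response matrix, and the 27% group size are illustrative assumptions rather than anything specified above.

```python
import numpy as np

def discrimination_index(responses, group_fraction=0.27):
    """Upper/lower-group discrimination index for dichotomously scored items.

    responses      : 2-D array of 0/1 scores, shape (candidates, items).
    group_fraction : share of candidates placed in each criterion group
                     (0.27 is a common choice, not a requirement).
    Returns one value per item: proportion correct in the high-scoring
    group minus proportion correct in the low-scoring group.
    """
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)               # total score per candidate
    order = np.argsort(totals)                   # lowest totals first
    n_group = max(1, int(len(totals) * group_fraction))
    low_group = responses[order[:n_group]]       # least able candidates
    high_group = responses[order[-n_group:]]     # most able candidates
    return high_group.mean(axis=0) - low_group.mean(axis=0)

# Illustrative data: 8 candidates, 4 items (1 = correct, 0 = incorrect)
scores = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 1, 0],
          [1, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
print(discrimination_index(scores))   # one index per item, range -1 to +1
```

Items whose indices come out near zero or negative would be the ones considered for deletion, amendment, or placement on a different test.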
Difficulty and Discrimination Distributions
At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test, and can be used to identify items which are not performing well and which can perhaps be improved or discarded.
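A minimal sketch of how such a summary might be tabulated follows; the cut-off values used here for easy/medium/hard and good/fair/poor are assumptions for illustration, since the report's actual boundaries are not given above.

```python
def classify_items(difficulties, discriminations):
    """Tabulate difficulty and discrimination distributions for a test.

    difficulties    : proportion correct for each item (0.0 to 1.0).
    discriminations : discrimination index for each item.
    The boundary values below are illustrative, not standard.
    """
    summary = {"difficulty": {"easy": 0, "medium": 0, "hard": 0},
               "discrimination": {"good": 0, "fair": 0, "poor": 0}}
    for p in difficulties:
        if p >= 0.75:
            summary["difficulty"]["easy"] += 1
        elif p >= 0.25:
            summary["difficulty"]["medium"] += 1
        else:
            summary["difficulty"]["hard"] += 1
    for d in discriminations:
        if d >= 0.40:
            summary["discrimination"]["good"] += 1
        elif d >= 0.20:
            summary["discrimination"]["fair"] += 1
        else:
            summary["discrimination"]["poor"] += 1
    return summary

# Example: three items with their proportions correct and discrimination indices
print(classify_items([0.90, 0.55, 0.15], [0.45, 0.25, 0.05]))
```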
Test Statistics
Two statistics are provided
to evaluate the performance of the test as a whole.
Reliability Coefficient. The reliability of a test refers to the extent to which the test is likely to produce consistent scores. It is influenced by three characteristics of the test:
1. The intercorrelations
among the items -- the greater the relative number of positive
relationships, and the
stronger those relationships are, the greater the reliability. Item discrimination indices and the test's
reliability coefficient are related in this regard.
2. The length of the test
-- a test with more items will have a higher reliability, all other things being
equal.
3. The content of the test
-- generally, the more diverse the subject matter tested and the testing techniques
used, the lower the reliability.
Reliability coefficients
theoretically range in value from zero (no reliability) to 1.00 (perfect reliability). High reliability means that the questions of
a test tended to "pull together."
Students who answered a given question correctly were more likely to
answer other questions correctly. If a parallel
test were developed by using similar items, the relative scores of students
would show little change. Low reliability means that the questions tended to be
unrelated to each other in terms of who answered them correctly. The resulting
test scores reflect peculiarities of the items or the testing situation more
than students' knowledge of the subject matter. As with many statistics, it is
dangerous to interpret the magnitude of a reliability coefficient out of context.
High reliability should be demanded in situations in which a single test score
is used to make major decisions, such as professional licensure examinations. Because classroom examinations are typically
combined with other scores to determine grades, the standards for a single test
need not be as stringent.
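The passage above does not name a particular formula; one common estimate of this kind of internal consistency is Cronbach's alpha (equivalent to KR-20 for 0/1 scored items), sketched below with the same illustrative response matrix as before. Treat it as an example of an internal-consistency coefficient, not as the specific statistic any given item-analysis report computes.

```python
import numpy as np

def cronbach_alpha(responses):
    """Internal-consistency reliability estimate (Cronbach's alpha).

    For dichotomously scored (0/1) items this equals KR-20.
    responses : 2-D array of item scores, shape (candidates, items).
    """
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                               # number of items
    item_variances = responses.var(axis=0, ddof=1)       # variance of each item
    total_variance = responses.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative data: 8 candidates, 4 items (1 = correct, 0 = incorrect)
scores = [[1, 1, 1, 1],
          [1, 1, 1, 0],
          [1, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 1, 0],
          [1, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
print(round(cronbach_alpha(scores), 2))   # higher values indicate items that "pull together"
```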
Cautions in interpreting item analysis data
Item analysis identifies questionable items which, up until the trial stage, had met our criteria for relevant, reasonably valid, and fair items. Item analysis may not necessarily identify faulty questions which should not have been included in the trial test because those criteria were not met. Some users of item analysis seek to reject all items but those with the very highest discrimination values. While this apparently gives increased reliability, it may be gained at the expense of the validity of the final test. For example, a test of computation may have addition, subtraction, multiplication and division items. If items are progressively discarded through continued analysis, it is likely that only one of the operations will remain (probably the one with the most items). The resulting test will be an apparently more reliable test but, because only one of the four operations is tested, it is no longer representative of all four processes, and hence not valid for the purpose of assessing the four processes.
Items which do not perform as expected can be discarded or revised. Test constructors should be aware of the possibility of distortion in the balance of questions when there are not enough items to satisfy requirements in all cells of the specification grid. If the original specification represents the best sampling of content, skills, and item formats, in the judgment of those preparing and reviewing the test, then leaving some cells of the grid vacant will indicate a less than adequate test. To avoid this possibility, test constructors may prepare three or four times as many questions as they think they will need for each cell in the grid. Test constructors have to avoid the tendency to test what is easy to test, rather than what is important to test.