Wednesday, September 5, 2012

Item Analysis and Item Discrimination



Submitted by: Song liping

Item analysis
Item analysis is a process which examines student responses to individual test items in order to assess the quality of those items and of the test as a whole. Item analysis is especially valuable in improving items which will be used again in later tests, but it can also be used to eliminate ambiguous or misleading items in a single test administration.
Item discrimination
Traditional test analysis considers the extent to which a single item distinguishes between able and less able candidates in a similar way to the test as a whole. Items which are not consistent with the other items in how they distinguish between able and less able candidates (as measured by this test) are considered for deletion, amendment, or placement on a different test. Modern test analysis techniques consider other factors as well; these are discussed later.

For a test of many items, it is common practice to assume that the total score on the trial test is a reasonable estimate of achievement for that type of test. Criterion groups may be selected on the basis of total score (if that type of analysis is being done). Under this assumption, we expect candidates with high total scores to have high achievement and candidates with low total scores to have low achievement. The procedure investigates how each item distinguishes between candidates who have the relevant knowledge and skills and those who lack them. Choosing items with an acceptable discrimination index will tend to produce a new version of the test with greater homogeneity. However, this process should not be taken too far, because a test measuring a more complex area will be made less relevant if only one type of item is retained.
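To make the criterion-group procedure concrete, here is a minimal sketch of the classical upper-lower discrimination index (D): candidates are ranked by total score, the top and bottom groups are compared, and D is the difference in the proportion answering the item correctly. The function name, the 27% grouping fraction (a common convention, not stated in this post), and the sample data are all illustrative assumptions.

```python
# Sketch of the upper-lower discrimination index (D).
# item_scores: 0/1 responses to one item, one entry per candidate.
# total_scores: each candidate's total score on the whole test.

def discrimination_index(item_scores, total_scores, group_fraction=0.27):
    """D = (proportion correct in upper group) - (proportion correct in lower group)."""
    n = len(total_scores)
    k = max(1, int(n * group_fraction))
    # Rank candidates by total score; take the bottom and top groups.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data: an item answered correctly mainly by high scorers.
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [9, 8, 8, 7, 7, 5, 4, 4, 3, 2]
print(discrimination_index(item, totals))  # 1.0 here; near +1 marks a good item
```

An item answered correctly by low scorers but missed by high scorers would yield a negative D, which is the kind of inconsistency that flags an item for deletion or amendment.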
Difficulty and Discrimination Distributions
At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test, and can be used to identify items which are not performing well and which can perhaps be improved or discarded.
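A small sketch of how such a distribution listing might be produced is shown below. The cutoff values (0.30/0.80 for difficulty, 0.20/0.40 for discrimination) are common rules of thumb, assumed for illustration; the actual report's cutoffs are not stated in this post.

```python
# Illustrative banding of items by difficulty and discrimination.
# The cutoffs are assumed conventions, not values from the report above.

def difficulty_band(p):
    """p is the proportion of candidates answering the item correctly."""
    if p >= 0.80:
        return "easy"
    if p >= 0.30:
        return "medium"
    return "hard"

def discrimination_band(d):
    """d is a discrimination index such as D or a point-biserial."""
    if d >= 0.40:
        return "good"
    if d >= 0.20:
        return "fair"
    return "poor"

# (p, d) pairs for three hypothetical items.
for p, d in [(0.90, 0.15), (0.55, 0.45), (0.25, 0.30)]:
    print(f"p={p:.2f}, d={d:.2f} -> {difficulty_band(p)}, {discrimination_band(d)}")
```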

Test Statistics
Two statistics are provided to evaluate the performance of the test as a whole.
Reliability Coefficient. The reliability of a test refers to the extent to which the test is likely to produce consistent scores. It is influenced by three characteristics of the test:
1. The intercorrelations among the items -- the greater the relative number of positive relationships, and the stronger those relationships are, the greater the reliability. Item discrimination indices and the test's reliability coefficient are related in this regard.
2. The length of the test -- a test with more items will have a higher reliability, all other things being equal.
3. The content of the test -- generally, the more diverse the subject matter tested and the testing techniques used, the lower the reliability. 
Reliability coefficients theoretically range in value from zero (no reliability) to 1.00 (perfect reliability). High reliability means that the questions of a test tended to "pull together": students who answered a given question correctly were more likely to answer other questions correctly, and if a parallel test were developed by using similar items, the relative scores of students would show little change. Low reliability means that the questions tended to be unrelated to each other in terms of who answered them correctly; the resulting test scores reflect peculiarities of the items or the testing situation more than students' knowledge of the subject matter.

As with many statistics, it is dangerous to interpret the magnitude of a reliability coefficient out of context. High reliability should be demanded in situations in which a single test score is used to make major decisions, such as professional licensure examinations. Because classroom examinations are typically combined with other scores to determine grades, the standards for a single test need not be as stringent.
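This post does not name which reliability coefficient its report uses; a common choice for dichotomously scored items is KR-20, which is a special case of Cronbach's alpha. The sketch below, with hypothetical data, assumes that formula: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores).

```python
# Sketch of Cronbach's alpha (equivalent to KR-20 for 0/1 items).
# responses: one list of 0/1 item scores per student.

def cronbach_alpha(responses):
    n_items = len(responses[0])
    totals = [sum(student) for student in responses]

    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Sum the variance of each item (each column), then compare it
    # with the variance of the total scores.
    item_var_sum = sum(
        variance([student[j] for student in responses]) for j in range(n_items)
    )
    total_var = variance(totals)
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)

data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(data), 2))  # 0.8 for this tidy example
```

Note how the formula reflects the three characteristics listed above: more items raise k, and stronger positive intercorrelations raise the total-score variance relative to the sum of item variances, both pushing alpha higher.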

Cautions in interpreting item analysis data
Item analysis identifies questionable items which, up until the trial stage, had met our criteria for relevant, reasonably valid, and fair items. It may not necessarily identify faulty questions which should not have been included in the trial test because those criteria were not met.

Some users of item analysis seek to reject all items except those with the very highest discrimination values. While this apparently gives increased reliability, it may be gained at the expense of the validity of the final test. For example, a test of computation may have addition, subtraction, multiplication, and division items. If items are progressively discarded through continued analysis, it is likely that only one of the operations will remain (probably the one with the most items). The resulting test will be an apparently more reliable test but, because only one of the four operations is tested, it is no longer representative of all four processes, and hence not valid for the purpose of assessing them.

Items which do not perform as expected can be discarded or revised. Test constructors should be aware of the possibility of distortion in the balance of questions when there are not enough items to satisfy the requirements in all cells of the specification grid. If the original specification represents the best sampling of content, skills, and item formats, in the judgment of those preparing and reviewing the test, then leaving some cells of the grid vacant will indicate a less than adequate test. To avoid this possibility, test constructors may prepare three or four times as many questions as they think they will need for each cell in the grid. Test constructors must avoid the tendency to test what is easy to test, rather than what is important to test.
