ITEM ANALYSIS
Evaluation is an indispensable part of education, and different types of tests are used for assessment and, consequently, evaluation. Tests play an important role in giving stakeholders in education feedback on various aspects, so the quality of tests has long been a central concern; the literature is accordingly full of comprehensive discussion of validity, reliability and the characteristics of quality assessment programs (Stephen & Polly, 2006), all aimed at improving that feedback. For this reason, item analysis is widely used to improve test quality through knowledge of item statistics. Item statistics are used not only to improve a test but also in item revision (Lange, 1967). Item analysis allows us to observe the characteristics of a particular item and can be used to ensure that questions are of an appropriate standard for inclusion in a test. A comprehensive knowledge of the factors that make a good test item enables us to create more effective tests, besides standardizing existing ones. Improving a test through item analysis can save a great deal of time and energy on the part of teachers and test developers. Typically, two values are computed in the analysis of a test: a difficulty level and a discrimination index.
One of the most important tasks confronting faculty members is the evaluation of student performance. Once designed, the evaluative procedure must be administered and then scored, interpreted and graded; after that, feedback must be presented to students. Doing these tasks demands a broad range of cognitive, technical and interpersonal resources on the part of faculty. But an even more critical task remains: investigating the quality of the evaluative procedure itself.
What constitutes a good exam item? Students seem to know, or at least believe they know, but are they correct when they claim that an item was too difficult, too tricky or unfair? According to Lewis Aiken, item analysis is a group of procedures for assessing the quality of exam items. The purpose of an item analysis is to improve the quality of an exam by identifying items that are candidates for retention, revision or removal. It can also clarify which concepts the examinees have and have not mastered.
PURPOSE:
1. Assemble or write a relatively large number of items of the type you want on the test.
2. Analyze the items carefully using item format analysis to make sure they are well-written and clear.
3. Pilot the items using a group of students similar to the group that will ultimately be taking the test.
4. Analyze the results statistically using item analysis techniques.
5. Select the most effective items and make a shorter, more effective revised version of the test.
TWO BROAD CATEGORIES:
Qualitative Item Analysis
Qualitative item analysis procedures include careful proofreading of the exam prior to its administration for typographical errors, for grammatical cues that might inadvertently tip off examinees to the correct answer, and for the appropriateness of the reading level of the material. They can also include small-group discussion of the quality of the exam and its items with examinees who have already taken the test, with departmental student assistants, or even with experts in the field. Some teachers use “think-aloud test administration,” in which examinees are asked to say aloud what they are thinking as they respond to each item on the exam. This can help teachers determine whether certain students misinterpreted particular items, and why they may have done so.
Quantitative Item Analysis
Specifically,
three numerical indicators are often derived during an item analysis: item
difficulty, item discrimination and distractor power.
1. Item Difficulty Index (p)
The item difficulty statistic is an appropriate choice for achievement or aptitude tests when the items are scored dichotomously (correct or incorrect). It can be derived for true-false, multiple-choice and matching items, and for essay items where the instructor can convert the range of possible point values into the categories “passing” and “failing.”
The item difficulty index, p, is computed as:
p = (number of test takers who answered the item correctly) / (total number of test takers who answered the item)
p can range from 0.00 (no examinees answered the item correctly) to 1.00 (all examinees answered the item correctly), or equivalently from 0% to 100%; the higher the value, the easier the item. p-values above 0.90 indicate very easy items, possibly testing a concept not worth testing. p-values below 0.20 indicate difficult items, which should be reviewed for possibly confusing language or for content that needs re-instruction. The optimum difficulty level is 0.50, giving maximum discrimination between high and low achievers.
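To make the computation concrete, here is a minimal Python sketch, assuming each examinee's response to the item has already been scored 1 (correct) or 0 (incorrect); the function name and the sample data are illustrative, not part of any standard package.

```python
# Minimal sketch: item difficulty index p for a dichotomously scored item.
# Each entry is one examinee's score on the item: 1 = correct, 0 = incorrect.

def difficulty_index(item_scores):
    """Return p, the proportion of examinees who answered the item correctly."""
    if not item_scores:
        raise ValueError("No responses recorded for this item.")
    return sum(item_scores) / len(item_scores)

# Hypothetical responses from ten examinees on one item:
item_scores = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
p = difficulty_index(item_scores)
print(f"p = {p:.2f}")  # p = 0.70, a moderately easy item
```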
A test item does not have a single, fixed p value. Not only may the p value vary with each class group that takes the test; the teacher may also gain insight by computing the item difficulty level for different subgroups within a class, such as those who did well on the exam overall and those who performed more poorly.
For example, suppose the difficulty level is 0.20, meaning that 20% of the examinees answered the item correctly. Does this mean that the item was challenging for everyone? Does it mean that the teacher failed in his or her attempt to teach the particular topic assessed by the item? Does it mean that the students failed to learn the material? Does it mean that the item was poorly written? To answer these questions, teachers must also rely on other item analysis procedures, both qualitative and quantitative.
2. Item Discrimination Index (D)
Item discrimination analysis deals with the fact that
often different test takers will answer a test item in different ways. It
addresses the validity of the items on a test, the extent to which the items
tap the attributes they were intended to assess.
It is the point-biserial relationship between students' performance on an individual item and their total test score. This value can range from -1.00 to +1.00; the higher the value, the more discriminating the item. A highly discriminating item is one on which students with high test scores answered correctly while students with low test scores answered incorrectly.
Teachers test because they want to find out whether students know the material, but what they actually learn is how students did on the exam they were given. The item discrimination index “tests the test,” in the hope of keeping the correlation between knowledge and exam performance as close as it can be in an admittedly imperfect system.
It is calculated as follows:
A. Divide the group of test takers into two groups (high scoring and low scoring).
B. Compute the item difficulty levels separately for the upper (p_upper) and lower (p_lower) scoring groups.
C. Compute D = p_upper - p_lower.
How can this
be interpreted?
Example: Suppose half of the examinees answered a particular item correctly, that all of the examinees who scored above the median on the exam answered the item correctly, and that all of the examinees who scored below the median answered it incorrectly. Then p_upper = 1.00 and p_lower = 0.00, so D = 1.00 and the item is a perfect positive discriminator. This suggests that the examinees who knew the material and were well prepared passed the item while the others failed it.
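The same computation can be sketched in Python; assume one record per examinee holding the total test score and a 1/0 score on the item of interest (the names and sample data below are illustrative only).

```python
# Minimal sketch: item discrimination index D = p_upper - p_lower.
# records: list of (total_test_score, item_correct) tuples, item_correct in {0, 1}.

def discrimination_index(records):
    """Split examinees into lower and upper halves by total score and compare item difficulty."""
    ranked = sorted(records, key=lambda r: r[0])   # order by total test score
    half = len(ranked) // 2
    lower = ranked[:half]                          # low-scoring group
    upper = ranked[len(ranked) - half:]            # high-scoring group
    p_lower = sum(item for _, item in lower) / len(lower)
    p_upper = sum(item for _, item in upper) / len(upper)
    return p_upper - p_lower

# Mirrors the worked example: everyone above the median answered correctly,
# everyone below the median answered incorrectly, so D comes out as 1.00.
records = [(95, 1), (88, 1), (82, 1), (60, 0), (55, 0), (48, 0)]
print(f"D = {discrimination_index(records):.2f}")  # D = 1.00
```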
Difficulty and discrimination are not independent. If all students in both the upper and lower groups either pass or fail an item, there is nothing in the data to indicate whether the item itself was good or not; for instance, if p_upper = p_lower = 1.00, then D = 1.00 - 1.00 = 0.00. The value of the item discrimination index is maximized when only half of the test takers overall answer an item correctly, and the ideal situation is one in which the half who passed the item are the students who did well on the exam overall. Nevertheless, there are many reasons to include at least some very easy or very difficult items. Very easy items can reflect the fact that some relatively straightforward concepts were taught well and mastered by all students, and the teacher may choose to include some very difficult items to challenge even the best-prepared students. The teacher should simply be aware that neither of these types of items functions well to make discriminations among those taking the test.
3. Item Distractor Analysis
This applies particularly to multiple-choice items. The incorrect alternatives are called distractors. Item distractor analysis examines the percentage of examinees who select each incorrect alternative, to determine whether the distractors are functioning as intended. On a well-designed multiple-choice item, those who know the material and are well prepared for the exam should select the correct alternative, while those who are not well prepared should guess, selecting almost randomly from among the available distractors. Such an item would be a very good discriminator and would very likely be a candidate for retention in future exams.
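A minimal tabulation of distractor choices might look like the Python sketch below; the responses and answer key are hypothetical, and real gradebook exports would need their own parsing.

```python
# Minimal sketch: distractor analysis for one multiple-choice item.
# responses holds the option letter each examinee chose; correct_key is the keyed answer.
from collections import Counter

responses = ["B", "B", "C", "B", "A", "D", "B", "C", "B", "A"]
correct_key = "B"

counts = Counter(responses)
total = len(responses)
for option in sorted(counts):
    share = 100 * counts[option] / total
    label = "correct" if option == correct_key else "distractor"
    print(f"{option} ({label}): {share:.0f}%")

# A distractor that almost no one chooses, or that is chosen mainly by
# high scorers, is not functioning as intended and is a candidate for revision.
```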
Distractor analysis can also provide useful diagnostic information in other situations. An item that was passed by more of those who did poorly on the exam overall than by those who were well prepared and knew the material is a candidate for removal from the exam.
Caution in Item Analysis
• Item analysis data are just a reflection of internal consistency and therefore should not be treated as item validity, which requires an external criterion (e.g., experts' opinions) to accurately judge the validity of test items.
• A low discrimination index does not by itself mean an item should be dropped from a test, because extremely difficult or easy items may have little ability to discriminate, yet such items can be included in a test to sample course content adequately. Similarly, an item may have low discrimination because of the multidimensionality of a test.
• Item analysis data are tentative, since they are influenced by factors such as the sample of students, the quality of instruction, and chance errors.
(“Test Item Analysis”, online, www.utexas.edu/academic/mec/research/.../itemanalysishandout.pdf, accessed 15-08-2010)
Conclusion:
Item difficulty and discrimination
analysis programs are often included in the software used in processing exams
answered on Scantron or other optically scannable forms. These analyses can
often be performed for students by personnel in the computer services office.
Item analysis can certainly help determine whether the items on an exam were good ones, and which items to retain, revise or replace.
References:
Zurawski, R. Making the Most of Exams: Procedures for Item Analysis.