Tuesday, September 4, 2012

Multiple Choice - Test Item Analysis


Submitted by: Camille Ann Bolos

__________________________________________________________________



Multiple Choice Test Item Analysis
David Hamill and Paul Usala

·        What is Item Analysis?
o   “The term ‘Item Analysis’ refers to a loosely structured group of statistics that can be computed for each item in a test.” (Murphy & Davidshofer, 1991)

·        Why Analyze Test Items?
o   To Identify:
§  Potential mistakes in scoring program
§  Ambiguous items
§  Whether responses are distributed across all alternatives
§  Alternatives that don’t work
§  “Tricky” items
§  Problems with time limits
§  Potential organizational policy conflicts

·        Assessment System Parameters
o   Program Parameters
§  Assessments are used as part of a promotional assessment system
§  Each assessment is part of a comprehensive examination taxonomy
§  Structured method for test development
§  Continuous testing – No piloting of Multiple-Choice test items
§  Equate each new assessment with previous versions

·        Assessment Process
o   Develop Assessments
o   Administer Assessments to Candidates
o   Perform a “Key Clearance”
§  PB (point-biserial) correlations
§  High/low group statistics
§  Internal consistency correlations
o   Score Assessments
o   Equate with the current version
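
The slides don't say which equating method is used. As a minimal sketch, here is linear (mean-sigma) equating in Python, one common approach, using made-up score arrays for the two versions:

import numpy as np

def linear_equate(new_scores, old_scores):
    """Mean-sigma linear equating: rescale new-form scores so their mean
    and standard deviation match the old form's."""
    new_scores = np.asarray(new_scores, dtype=float)
    old_scores = np.asarray(old_scores, dtype=float)
    slope = old_scores.std(ddof=1) / new_scores.std(ddof=1)
    intercept = old_scores.mean() - slope * new_scores.mean()
    return slope * new_scores + intercept

# Made-up raw scores from the new version and the previous version.
new_form = [22, 25, 30, 28, 19, 27]
old_form = [24, 26, 31, 29, 21, 28, 25]
print(linear_equate(new_form, old_form))

The slope and intercept put new-form scores on the old form's scale, so a passing score set on the previous version carries over to the new one.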

·        KEY CLEARANCE OVERVIEW
1.      Review of PB Data
a.      Does a correct response relate to overall test performance?
2.      Examine Distractors and Item Difficulty
a.      How many candidates responded correctly?
b.     Which responses did high and low performers choose?
3.      Confirm Decisions with the KR-20 Internal Consistency Estimate of Reliability
a.      If the item is removed, will the reliability of the assessment increase?

KEY CLEARANCE
·        Item-Total Correlations (Point Biserial)
o   Purpose: to identify items that correlate with overall test performance
·        Distractors and Item Difficulty
o   Difficulty analysis: to identify the proportion of examinees who choose the correct response option (p-value)
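
Both statistics can be computed directly from a scored response matrix. A minimal sketch in Python, assuming a hypothetical matrix of 0/1 scores (rows = candidates, columns = items):

import numpy as np

# Hypothetical scored responses: 1 = correct, 0 = incorrect.
X = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])

totals = X.sum(axis=1)

for i in range(X.shape[1]):
    item = X[:, i]
    p = item.mean()  # item difficulty: proportion answering correctly
    # Corrected point-biserial: correlate the item with the total score
    # minus that item, so the item cannot inflate its own correlation.
    r_pb = np.corrcoef(item, totals - item)[0, 1]
    print(f"Item {i + 1}: p = {p:.2f}, point-biserial = {r_pb:.2f}")

A low or negative point-biserial flags an item where answering correctly is unrelated (or inversely related) to overall performance; such items go on to the distractor review below.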

·        Distractors and Item Difficulty
o   Method of extreme groups: categorizes test takers into groups and compares item responses
o   Split test takers into 3 groups based on scores:
§  Upper group (usually top 27%-35%)
§  Middle group
§  Lower group (usually bottom 27%-35%)
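
A sketch of the extreme-groups comparison for a single item, with made-up answer choices and total scores; the keyed answer and the 27% cutoff are assumptions for illustration:

import numpy as np
from collections import Counter

# Made-up data for one item: each candidate's answer (A-D) and total score.
answers = np.array(list("ABACDABAACBDAABA"))
totals = np.array([30, 12, 25, 14, 9, 28, 17, 26, 24, 8, 15, 11, 29, 27, 13, 23])

order = np.argsort(totals)
n = int(round(0.27 * len(totals)))  # 27% cutoff for the extreme groups

lower = answers[order[:n]]   # bottom 27% of scorers
upper = answers[order[-n:]]  # top 27% of scorers

print("Upper group choices:", Counter(upper))
print("Lower group choices:", Counter(lower))

key = "A"  # keyed (correct) answer: an assumption for this example
D = (upper == key).mean() - (lower == key).mean()
print(f"Discrimination index D = {D:.2f}")

On a good item the keyed answer dominates in the upper group while each distractor draws at least some lower-group responses; a distractor nobody picks, or one that attracts high scorers, flags the item for review.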

·        KR-20 Internal Consistency Reliability
o   Identifies items that would increase overall test reliability if dropped from the battery
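
A minimal sketch of KR-20 and the "reliability if item deleted" check, again with a hypothetical 0/1 response matrix:

import numpy as np

def kr20(X):
    """KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores),
    for a 0/1 scored response matrix (rows = candidates, columns = items)."""
    k = X.shape[1]
    p = X.mean(axis=0)  # per-item proportion correct
    var_total = X.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / var_total)

# Hypothetical scored responses (same layout as above).
X = np.array([
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
])

print(f"KR-20 (all items): {kr20(X):.3f}")

# "Reliability if item deleted": recompute KR-20 with each item dropped.
# An item whose removal raises KR-20 is a candidate for removal.
for i in range(X.shape[1]):
    print(f"  without item {i + 1}: {kr20(np.delete(X, i, axis=1)):.3f}")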

