Monday, August 6, 2012

Table of Specifications

A Table of Specifications is a blueprint for an objective selected response assessment. Its purpose is to coordinate the assessment questions with the time spent on any particular content area, the objectives of the unit being taught, and the level of critical thinking required by the objectives or state standards. A Table of Specifications is used to increase the validity and quality of objective type assessments. The teacher should know in advance specifically what is being assessed as well as the level of critical thinking required of the students. Tables of Specifications are created as part of the preparation for the unit, not as an afterthought the night before the test. Knowing what is contained in the assessment, and that the content matches the standards and benchmarks in level of critical thinking, will guide the learning experiences presented to students. Students appreciate knowing what is being assessed and what level of mastery is required.
Also, a table of specifications is a chart that breaks down the topics that will be on a test and the number of test questions or percentage of weight each section will have in the final test grade. This kind of chart is usually split into two charts, and each subtopic is numbered under the main topics being covered on the test. It is mainly used by teachers to break down their testing outline on a specific subject. Some teachers use the table as a teaching guideline by breaking it into subjects, the teacher's main points, how much time should be spent on each point, and what assignment or project can help the student learn the subject. For many teachers, a table of specifications is both part of the test building process and a product of it. The table provides teachers and their students with a visual approximation of the content that will be tested and the weight it is given on the test. Many education experts advise constructing a table of specifications early in lesson planning to ensure that the content of lessons and projects matches what will ultimately appear on the test. By letting students view a table of specifications, teachers give them a kind of rubric against which they will be graded, so students know in advance which sections or topics of their study will be tested.

Tables of Specifications are designed based on:
1. course objectives
2. topics covered in class
3. amount of time spent on those topics
4. textbook chapter topics
5. emphasis and space provided in the text

A Table of Specification could be designed in 3 simple steps:
1. identify the domain that is to be assessed
2. break the domain into levels (e.g. knowledge, comprehension, application …)
3. construct the table
Textbook provided assessments or teacher made assessments should be analyzed by a Table of Specifications.  Textbook assessments may stress areas of content the teacher does not address with the same importance the text does.  The assessment may not match the time and level of thinking required by the teacher.  A Table of Specifications can help prevent this.  It is possible the level of critical thinking required by a textbook test does not match that required by state standards.  Teachers should analyze assessments carefully and match those assessments with state standards and what was actually presented in class.

Here is the specific example of the Table of Specifications:

Table of Specifications: Insuring Accountability in Teacher-Made Test

By: Charles E. Notar, Dennis C. Zuelke, Janell D. Wilson, and Barbara D. Yunker 

A Table of Specifications identifies not only the content areas covered in class but also the performance objectives at each level of the cognitive domain of Bloom's Taxonomy. Teachers can be assured that they are measuring students' learning across a wide range of content and readings as well as cognitive processes requiring higher order thinking. The use of a Table insures that teachers include test items that tap different levels of cognitive complexity when measuring students' achievement. Kubiszyn & Borich (2003) suggested that teachers should use a Table so they won't forget the details.

Carey (1988) listed six major elements that should be attended to in developing a Table of Specifications for a comprehensive end of unit exam: (1) balance among the goals selected for the exam; (2) balance among the levels of learning; (3) the test format; (4) the total number of items; (5) the number of test items for each goal and level of learning; and (6) the enabling skills to be selected from each goal framework. A Table of Specifications incorporating these six elements will result in a "comprehensive posttest that represents each unit and is balanced by goals and levels of learning".

A Table of Specifications is developed before the test is written. In fact, it should be constructed before the actual teaching begins (Kubiszyn & Borich, 2003; Mehrens & Lehman, 1973; Ooster, 2003). Just as much time and effort is spent on developing a house blueprint, so too a Table of Specifications requires considerable time and effort to develop (Kubiszyn & Borich, 2003). Linn and Gronlund (2000) stated "While the process is time-consuming, the effort that goes into development of a table of specifications also makes it much easier to prepare the test once the plan is developed".

Heading provides the administrative data for the test and Table. All tables of specifications have a Table heading. The heading provides for the administrative requirements of the test and the information needed to construct the two-way table. The heading makes it easier for filing and retrieving.

The course title is exactly that: the title of the course as seen on the teachers' and students' schedules, e.g. American History I, English II. Grade level is the grade for which the course is intended in the local or state course of study. Test periods are the class periods for which the test has been developed for administration. Date of test is the date the teacher will administer the test.

The subject matter digest is a paragraph that provides the limits of the subject matter that will be covered in class. This insures that the class covers only required material as related to stated objectives and nothing else. This setting of parameters helps guide discussion and keeps lessons focused and on topic. Textbook title and date of publication along with unit numbers or pages being covered can also be part of the digest.

The teacher must determine what type of test will be developed in order to establish the amount of detail required in the Table. A main focus in teacher made assessments concerns students' cognitive abilities to understand and apply the concepts they have learned. There is less concern about the rapidity of a student's responses to questions than about the content of those responses. Accordingly, time limits on achievement tests are very generous, allowing all students enough time to consider each question and attempt to answer it. These tests are called power tests. Items on a power test have different levels of difficulty usually arranged in a hierarchy from knowledge level (easy) to increasing difficulty. A power test should be administered so that a very large percentage (90% is an acceptable minimum) of the pupils for whom it is designed will have ample time to attempt all of the items.

A speed test is one in which a student must, in a limited amount of time, answer a series of questions or perform a series of tasks of a uniformly low level of difficulty. The near-constant level of difficulty of the questions or tasks is such that, if the pupil had unlimited time, he or she could easily answer each question or perform each task successfully. The intent of a speed test is to measure the rapidity with which a pupil can do what is asked of him or her. Speed of performance frequently becomes important after students have mastered task basics as in using a keyboard, manipulatives, or phonics.

Tests are often a mixture of speed and power even when achievement level is the test's purpose. Such tests are called partially speeded tests. Teachers must check time limits carefully to be sure that all students will have the opportunity to address each test item adequately before the allotted time is up.

Once the purpose of the test as a power, speed or partially speeded test has been established, the teacher can decide the actual length of the test in minutes. The amount of time for the test is determined before test construction and is facilitated by using a Table of Specifications. Testing time, measured in minutes, is determined by a number of factors including: the number of objectives to be tested; coverage of objectives; objective complexity; number of conditions to be tested; and levels of acceptable performance. In addition, teachers must look at students' age and ability levels, class time available, types of test items, length and complexity of test items, and amount of computation required.

Carey (1988) pointed out that the time available for testing depended not only on the length of the class period but also on students' attention spans. Completion of the test should be possible within one class period and the students should finish before they become fatigued (a six year old will not be able to take a 40 minute, paper-pencil test). A Table of Specifications insures that teachers will address all of these important issues in constructing an end of unit exam.

To continue our analogy, the something new at the wedding of teacher made tests and accountability is the use of an assessment plan to determine test value. The assessment plan has been around for a number of years but has not been associated with the development of a Table of Specifications. An assessment plan considers how many points the test is worth, how the test fits into the semester grade point total and eventually determines the Grade Point Average. An assessment plan determines total number of points available in a marking period. Semester and final grades for the year come from the six (or nine) week assessment plans added together.

The first step in developing an assessment plan is to list the assessment activities to be used in the class. The second step is to determine how many of each activity will be used in each grading period. The third step is to assign points according to the worth of the activity. This is a value judgment, e.g. "homework is less important than a unit exam but more important than answering questions in class." The following is an example of a six week assessment plan.

An assessment plan should be formed before each grading period begins. In the example above, the points for testing and points for class work are evenly divided. This is the authors' point of view. Mehrens & Lehman (1973) suggested that the teacher determines the balance in the assessment plan. But, balance will not happen if there is inadequate planning. Adequate and extensive planning is required so that instructional objectives, the teaching strategy to be employed, the text material, and the evaluative procedures are all related in some meaningful fashion.

He also made suggestions for determining a base number of items to use per test. "Recall-level items require less time than application-level items, whatever the test format. Items that ask students to solve problems, analyze or synthesize information, or evaluate examples all require more time than do items that require students to remember a term, fact, definition, rule, or principle. Essay questions require more time than either selected-response or short-answer items".

Some rules of thumb exist for how long it takes most students to answer various types of questions:
* A true-false test item takes 15 seconds to answer unless the student is asked to provide the correct answer for false questions. Then the time increases to 30-45 seconds.
* A seven item matching exercise takes 60-90 seconds.
* A four response multiple choice test item that asks for an answer regarding a term, fact, definition, rule or principle (knowledge level item) takes 30 seconds. The same type of test item that is at the application level may take 60 seconds.
* Any test item format that requires solving a problem, analyzing, synthesizing information or evaluating examples adds 30-60 seconds to a question.
* Short-answer test items take 30-45 seconds.
* An essay test takes 60 seconds for each point to be compared and contrasted.
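
The rules of thumb above can be turned into a rough time estimate for a draft test. The following is only a minimal sketch: the dictionary keys, the function name `estimate_minutes`, and the mix of items in `draft` are hypothetical and chosen for illustration, while the seconds per item come from the list above.

```python
# Rule-of-thumb seconds per item as (low, high) ranges, taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
SECONDS_PER_ITEM = {
    "true_false": (15, 15),
    "true_false_with_correction": (30, 45),
    "matching_7_items": (60, 90),
    "multiple_choice_knowledge": (30, 30),
    "multiple_choice_application": (60, 60),
    "short_answer": (30, 45),
}


def estimate_minutes(item_counts):
    """Return (low, high) estimated testing time in minutes."""
    low = sum(SECONDS_PER_ITEM[kind][0] * n for kind, n in item_counts.items())
    high = sum(SECONDS_PER_ITEM[kind][1] * n for kind, n in item_counts.items())
    return low / 60, high / 60


# Hypothetical draft test: this mix of items is invented for illustration.
draft = {
    "true_false": 10,
    "multiple_choice_knowledge": 20,
    "multiple_choice_application": 10,
    "short_answer": 6,
}

low, high = estimate_minutes(draft)
print(f"Estimated time: {low:.1f} to {high:.1f} minutes")
```

An estimate like this only shows whether a draft is likely to fit the class period; students' age, ability levels, and attention spans still have to be weighed by the teacher.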

Fallback positions for determining how many questions should be on a test are how much time is available for testing and the level of performance required (test by conditions as well as action verb). In general, the more items on a test, the more valid and reliable the test will be. However, a test could be prohibitively long. On the other hand, a test with only one item per objective even if all items were answered correctly would provide insufficient evidence of proficiency. When all else fails look in the mirror to see who determines the number of test questions on a teacher made test.

Constraints are those variables that prevent testing in the manner that would be most appropriate for the level of instruction required to master the performance level indicated by the objective's action verb. Write the reason why you see a constraint; if there are no constraints, state NONE. Types of constraints are time, personnel, cost, equipment, facilities, realism, logistics, communications, and others.

The first heading in the body of the Table is called Learning Objectives. This heading has four subheadings: No; Level; Time; and Q/P/%. These subheadings, although distinct, are interrelated. No. represents the number designation of the objective. Either write the objective out in this space or put the number of the objective from an objective list in the space. If a list is used, it must be attached to the table.

The table itself is predicated on the writing of good performance objectives. A performance objective states the performance required or capability that is involved (action verb). The content is then specified through the behavior, situation, and special conditions components of the objective (condition{s}). When developing a Table you want to test all the objectives. You can only be sure students can perform the objectives which are tested. However, a constraint in doing that may be time. In that case you would want to do sampling of objectives.

You should sample among objectives only if it will solve a constraint problem. Document the sampling plan. Always test the most critical objectives. Test the less critical objectives in rotation randomly. Students are not informed of the objectives to be tested.

Sample among conditions as follows. If the action must be performed under each of two conditions, develop items for each condition. If the action may be performed under either of two conditions, test the more difficult condition if only one can be tested. If the action must be performed under three conditions, test the two most critical ones. If the action must be performed under a large number of conditions, test at least 30% of them, including the most critical ones.
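
The condition-sampling rules can be sketched as a small helper. This is an illustration under stated assumptions, not part of the authors' method: the function name `conditions_to_test` is hypothetical, the list is assumed to be ordered with the most critical (or most difficult) condition first, and "a large number" is assumed to mean more than three.

```python
def conditions_to_test(conditions, either_suffices=False):
    """Pick which conditions to test.

    `conditions` is assumed to be ordered from most to least critical
    (or most to least difficult). `either_suffices` marks the case where
    the action may be performed under either of two conditions.
    """
    n = len(conditions)
    if n <= 1:
        return list(conditions)
    if n == 2:
        # Either condition acceptable: test the more difficult one.
        # Both required: develop items for each.
        return conditions[:1] if either_suffices else list(conditions)
    if n == 3:
        # Test the two most critical conditions.
        return conditions[:2]
    # Assumed meaning of "a large number": more than three conditions.
    # Test at least 30% of them, rounded up with integer arithmetic.
    k = (3 * n + 9) // 10
    return conditions[:k]


# Hypothetical condition labels, for illustration only.
print(conditions_to_test(["night", "day"]))
print(conditions_to_test(["night", "day"], either_suffices=True))
print(conditions_to_test(["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9", "c10"]))
```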

Level is the domain level of the objective's action verb; that is, the action verb is assigned to a category in Bloom's taxonomy. For example, Objective 1 is application and Objective 2 is comprehension. There are a number of lists of action verbs by taxonomy level (e.g. Linn & Gronlund (2000), Appendix G). This assignment is done graphically so that you can look to the right of the assignment to see if there are any questions in levels beyond the assigned level. You can only test to the level taught; otherwise you will be setting your students up for failure. You also must test objectives at full performance if you are going to state that students are competent at the action verb level. At the level necessary, you can and should test the enabling skills for assurance that the students have the prerequisite skills to achieve full performance. The following example is from Table 1, partially reproduced here as Chart 1.

Objective 1 reads as follows "Identify architectural style in examples of 20th century revival style buildings around the world." There are no questions listed in the Table above application so we are not testing above the level taught. Under application there are five questions, therefore, Objective 1 is being tested at full performance. Under comprehension for Objective 1, there are six questions listed. These six questions test enabling skills required to obtain full performance. These questions may be such that examples of original styles of building architecture are presented and the student names them.

Bloom's Taxonomy's cognitive domain can be arranged in columns. Bloom's taxonomy is used because it provides the ability to develop a Table for a teacher made test in the cognitive, affective and psychomotor domains. The Tables used in this fastback as illustrations are all cognitive, however, the only difference between the cognitive and the affective and psychomotor is the interchange of the placement of the levels.

A Table ensures your test will include a variety of items at different levels of cognitive complexity. The cognitive domain is looked at as a set of steps. You must take the first step before you can attain the second, and so on. This mind set is very important when you look at congruency.

The example under LEVEL in Chart 1 illustrated an aspect of testing called CONGRUENCY. Congruency is teaching and testing at the same level. The level of the objective is matched with the placement of test items. Chart 2 is an example of congruency; testing what you are teaching using Objective 7 in Table 1.

The teacher is teaching Objective 7 at the application level. To state that a student can fully perform at the application level, the test must also assess at the application level. In the chart, if the teacher uses Test 1, Objective 7 has not been tested to the level of the objective, and you will not be able to state that a student who passed has mastered the objective. Test 2 is the reverse: you have set the students up for failure because you are testing at a mastery level you did not teach them to attain. Test 3 gives you a variety of ways to test for mastery at the objective's level, application, with Test 3 version b being used for Objective 7.

You would use Test 3 versions a, b, or c if you were testing prerequisite or enabling objectives. While testing for maximum performance of the objective's action verb, you may need to ask questions on the prerequisite and enabling objectives to insure that the student has these abilities; otherwise you will not know why the student failed at the full performance measure. The testing of prerequisite and enabling objectives is extremely important: it helps you be diagnostic and prescriptive in your test critique and determine whether you taught the objective with sufficient emphasis, depth, and breadth. An example of an enabling test question would be to give the value of π if the objective's full performance was to calculate the circumference of a circle given its radius.
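
The congruency idea can be expressed as a simple check on the levels a test covers. The sketch below is illustrative only: the function name `check_congruency` and its return strings are hypothetical, and the three test layouts mirror the Test 1, Test 2, and Test 3 version b cases described for Objective 7.

```python
# Cognitive levels of Bloom's taxonomy, lowest to highest.
BLOOM_LEVELS = [
    "knowledge", "comprehension", "application",
    "analysis", "synthesis", "evaluation",
]


def check_congruency(taught, tested):
    """Compare the level taught with the levels a test actually assesses."""
    rank = {level: i for i, level in enumerate(BLOOM_LEVELS)}
    taught_rank = rank[taught]
    tested_ranks = [rank[level] for level in tested]
    if max(tested_ranks) > taught_rank:
        return "tests above the level taught"
    if taught_rank not in tested_ranks:
        return "does not test the objective at full performance"
    return "congruent"


# Objective 7 is taught at the application level.
tests = {
    "Test 1": ["knowledge", "comprehension"],
    "Test 2": ["comprehension", "application", "synthesis"],
    "Test 3b": ["knowledge", "comprehension", "application"],
}
for name, levels in tests.items():
    print(name, "->", check_congruency("application", levels))
```

Questions below the taught level are allowed by this check because they correspond to enabling or prerequisite skills; only a missing full-performance level or a level above the one taught is flagged.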

To do the calculations for the TIME and Q/P/% columns of the table of specifications, the teacher must use the following formulas for each objective in the table.

FORMULA "A": time in class spent on objective (min) / total time for the instruction being examined (min) = % of instruction time

Example from Table 1 using Objective 1: total time for instruction is 600 minutes; time in class spent on Objective 1 is 95 minutes.

95/600 = 0.16 (rounded), or 16%

Then the instructor should look at the number of test items and their point weight per question and complete Formula "B."

FORMULA "B": point total of questions for objective / total points* on examination = % of examination value

(* Total points is the academic point value assigned to the examination.)

Example from Table 1 using Objective 1:
16/100 = 16%

Then the two percentages from Formula "A" and Formula "B" should be placed in Formula "C." If the outcome of Formula "C" is within the established parameters, the teacher may go on to the next objective until the process is complete for all objectives.

FORMULA "C": percent of instruction time = percent of examination value (within ±2 percent; if not, redo the test)

Example from Table 1 using Objective 1:
16 = 16

In Table 1, for example, Objective No. 1 had 95 minutes of instructional time spent on it. The total time of instruction covered by the test was 600 minutes. Using Formula "A," Objective No. 1 has 16% of the instructional time. Using Formula "B," 16% of the instructional time equates to 11 questions and 16 points. Formula "C" compares the two percentages. The percentages should be within the values established for content validity for an examination.
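
Formulas "A," "B," and "C" can be checked with a few lines of code. This is a minimal sketch using the Objective 1 figures from the text (95 of 600 minutes, 16 of 100 points); the function names are hypothetical and the 2 percent tolerance is the one stated in Formula "C."

```python
def percent_of_instruction_time(objective_minutes, total_minutes):
    """Formula A: share of instruction time spent on one objective."""
    return 100 * objective_minutes / total_minutes


def percent_of_exam_value(objective_points, total_points):
    """Formula B: share of the examination's points given to one objective."""
    return 100 * objective_points / total_points


def within_tolerance(time_pct, value_pct, tolerance=2.0):
    """Formula C: the two percentages should agree within the tolerance."""
    return abs(time_pct - value_pct) <= tolerance


# Objective 1 from Table 1: 95 of 600 minutes, 16 of 100 points.
time_pct = percent_of_instruction_time(95, 600)
value_pct = percent_of_exam_value(16, 100)

print(f"Instruction time: {time_pct:.1f}%")
print(f"Examination value: {value_pct:.1f}%")
print("Within tolerance:", within_tolerance(time_pct, value_pct))
```

Running the same check for every objective gives a quick way to see which rows of the Table need to be reworked before the test is written.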

TIME equals the time, expressed in minutes, spent in class and other learning activities on the objective. Mehrens & Lehman (1973) state that the major advantage of a teacher made test is that it can be tailor made to fit the teacher's unique or particular objectives. However, the teacher must insure that appropriate weight is given on the test to those particular objectives. The formulas for calculating time have already been presented. Remember that all these times are in minutes and are then converted to percent. These formulas and their answers determine the distribution of questions on the test and the point values assigned to those questions. Emphasis given during instruction must be used to assign weight in a test. Emphasis on an objective in class and in corresponding activities is a student's first and major clue to the relevance and value of what is being taught. You have been in a class where the teacher spends "X" amount of time on a subject and there is one question on the test covering that material and 14 on something that was covered by a paragraph in the text. The way the Table is constructed, time on objective, both direct and integrated, is used to establish relevance of material to the students and for test construction. Total time spent teaching all material is the baseline used to determine the weight given to the objective in the overall scheme of the Table. Mehrens & Lehman (1973) state there is no guarantee that a "match" between instructional objectives and test items will take place if a Table is used; it will only indicate the number or proportion of test items to be allotted to each of the instructional objectives specified.

The final distribution of items in the Table of Specifications should reflect the emphasis given during the instruction. This concept of relative weight impacts both the construction of the Table and student perception that the test is fair. Objectives considered more important by the teacher should be allotted more test items. Similarly, areas of content receiving more instruction time should be allotted more test items. Too often students say, "I studied the chart in the book that we spent two days on and then there was nothing on the test. And where did that essay on cause and effect come from." Relative weighting will alleviate these types of student comments.

Although the decisions involved in making the Table are somewhat arbitrary and the process is time consuming, the preparation of the Table of Specifications is one of the best means for ensuring that the test will measure a representative sample of instructionally related tasks.

The percentages are then used to determine the number of questions per objective and the value of points per objective.

Q/P/% is the number of questions (Q) and points (P) by percent (%) that represent the emphasis of instructional time based on relative weight. These numbers of questions and points are the benchmark for test development. In the example from Table 1, partially reproduced here as Chart 3, the Q/P/% of Objective 3 is in bold (6/9).

Linn & Gronlund (2000) provided the rationale behind the Q/P/% when they stated "We should like any assessment of achievement that we construct to produce results that represent both the content areas and the objectives we wish to measure, and the table of specifications aids in obtaining a sample of tasks that represents both. The percentages in the table indicate the relative degree of emphasis that each content area and each instructional objective is to be given in the test".

Linn & Gronlund (2000) further stated "this table indicates both the total number of test items and assessment tasks and the percentage allotted to each objective and each area of content. For classroom testing, using the number of items may be sufficient, but the percentages are useful in determining the amount of emphasis to give to each area".

Linn & Gronlund (2000) summed up
The second major heading in the Table body is ITEM TYPE. Item type is the type(s) of test item(s) used to test the student's ability to attain the objective. The Test Item Format Chart provides a visual representation of the levels of the cognitive domain that can be tested by the five basic test items used on teacher made tests. Wherever the complexity of the objective allows, use the simplest test item format.

Using Table 1, partially reproduced here as Chart 5, as an example, Objectives 1 and 3 are both full performance at the application level. However, they are being tested by different item types, but with the correct types of questions as prescribed by the chart. The use of the essay in Objective 3 may be to explain reasoning or a procedure required by the objective for full performance.

The third subheading in the Table body is Bloom's Taxonomy/Congruency. LEVELS of the domain tested and the total number of the types of questions in the level(s) tested are listed. This will assist in determining if testing is at multiple levels, only at the highest level, or at too high a level. The base Table of Specifications is set up for the cognitive domain. If testing the affective or psychomotor domain, the Table is the same, except the cognitive levels would be replaced by the levels of the affective or psychomotor domain.

The sums of the columns and rows should be equal. If they are not, then the addition is incorrect. The bottom right hand corner is where the column and row totals are found. The total number of questions for each level of the domain is summed across the objectives. Then the totals for all the levels of the domain are added. This total should equal the total number of questions which were determined to be on the test. Similarly, the values of each question for each objective are summed and the total of all points is added. This total should equal the set value of the examination.
NOTE: Common sense is important. Make point values whole numbers, no 1.5, etc. You will spend too much time grading. The questions per objective and point value are assigned based on percent of time taught including direct instruction and integrated instruction. Therefore one percent equals one question worth one point. However, if you use a question and it is worth two points look at that as two questions. If you have an essay question worth 5 points look at it as five questions. Also, when rounding up or down to get a full question or point, always round up for the higher level objectives. Number of questions per objective can go down but point value per objective is not changed.
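
The row and column totals described above can be checked mechanically. In this sketch the counts for Objective 1 (six comprehension and five application questions) come from the text; the rows for Objectives 2 and 3, the planned total of 24, and the names `cells` and `summarize` are hypothetical and used only for illustration.

```python
# Questions per Bloom level for each objective.
# Objective 1 is from the text; Objectives 2 and 3 are hypothetical rows.
cells = {
    1: {"comprehension": 6, "application": 5},
    2: {"knowledge": 4, "comprehension": 3},   # hypothetical
    3: {"application": 6},                     # hypothetical split by level
}
PLANNED_TOTAL_QUESTIONS = 24  # hypothetical planned number of questions


def summarize(cells):
    """Return (row_totals, column_totals) for a table of question counts."""
    row_totals = {obj: sum(levels.values()) for obj, levels in cells.items()}
    column_totals = {}
    for levels in cells.values():
        for level, count in levels.items():
            column_totals[level] = column_totals.get(level, 0) + count
    return row_totals, column_totals


row_totals, column_totals = summarize(cells)
grand_from_rows = sum(row_totals.values())
grand_from_columns = sum(column_totals.values())

print("Row totals:", row_totals)
print("Column totals:", column_totals)
# Both grand totals are computed from the same cells, so they always agree here;
# the useful check is against the planned number of questions.
print("Matches plan:", grand_from_rows == grand_from_columns == PLANNED_TOTAL_QUESTIONS)
```

The same structure works for point values: replace the question counts with points per cell and compare the grand total with the set value of the examination.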

Using Table 1, partially reproduced here as Chart 7, the objectives and points are:

Summarizing the objectives and their point totals in Chart 7 would look like this:
To check that your test is assessing as taught, look at the total row at the bottom of Table 1, partially reproduced here as Chart 8, and you will see whether the values are in line.
To keep with the wedding theme something needs to be borrowed. We have borrowed two things for this wedding. We are going to borrow from Carey (1988) some thoughts on how to make the Table provide a test that is both valid and reliable.

Carey (1988) stated "During the design of classroom tests, you need to be concerned with the validity and reliability of test scores. We have discussed content validity and how the Table will provide for it. Reliability is not normally associated with the Table. Reliability refers to the consistency or stability of scores obtained from a test. If the scores are unreliable, decisions or inferences based on them are dubious. Tests must be designed carefully to yield reliable and valid scores".
Carey (1988) continued that there are "five steps during the design stage you must take to achieve reliable test results: (1) select a representative sample of objectives from the goal framework; (2) select enough items to represent adequately the skills required in the objective; (3) select item formats that reduce the likelihood of guessing; (4) prescribe only the number of items students can complete in the time available; and (5) determine ways to maintain positive student attitudes toward testing. The subordinate skills in an instructional goal framework should be divided into prerequisite skills (skills students should have mastered before entering a unit of instruction) and enabling skills (skills that comprise the main focus of instruction for a unit)". The Table presented takes into account the five steps that will make a test reliable.

The second thing borrowed is Linn & Gronlund's (2000) idea to embed related non-test assessment procedures in an expanded Table of Specifications.

Reproducing the assessment plan shown earlier as Chart 8, we could add the class attendance, homework, class participation, and quiz points used during the instructional time that our test covered. In Table 1 (reproduced from page 3) with the added non-test points, we have added the categories and values in the heading of the Table and then emphasized, by underlining, the non-test learning activities and their relative points in the body of the Table.
Example
Assessment Plan: Determining Marking Period Point Values

Observation time on objective/task   30 x 5   = 150
Homework                              6 x 20  = 120
Class participation                  30 x 10  = 300

Quizzes
Open book                             3 x 10  = 30
Closed book                           2 x 25  = 50

Tests
Unit test                             3 x 100 = 300
Marking period exam                   1 x 200 = 200
Portfolio                             0 for marking period

Total points for marking period       1150
(Class work = 570; Tests = 580)
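
The totals in this example plan can be reproduced with a short calculation. The sketch below uses the counts and point values from the plan; the variable names are hypothetical, and grouping the quizzes with the tests is an assumption inferred from the stated split of 570 and 580 points.

```python
# (activity, group, how many per marking period, points each), from the plan above.
# Quizzes are grouped under "tests" because that reproduces the 570/580 split.
plan = [
    ("Observation time on objective/task", "class work", 30, 5),
    ("Homework", "class work", 6, 20),
    ("Class participation", "class work", 30, 10),
    ("Open book quiz", "tests", 3, 10),
    ("Closed book quiz", "tests", 2, 25),
    ("Unit test", "tests", 3, 100),
    ("Marking period exam", "tests", 1, 200),
]

totals = {}
for activity, group, count, points_each in plan:
    totals[group] = totals.get(group, 0) + count * points_each

total_points = sum(totals.values())
print("Class work:", totals["class work"])
print("Tests:", totals["tests"])
print("Total points for marking period:", total_points)
```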

Chart 2
Teaching   Application
Learning   Application
Test 1     Knowledge, Comprehension
Test 2     Comprehension, Application, Synthesis
Test 3     Version a. Knowledge,                Application
           Version b. Knowledge, Comprehension, Application
           Version c.            Comprehension, Application
           Version d.                           Application
Linn & Gronlund (2000) summed up Q/P/% when they stated "the final distribution of items in the table of specifications should reflect the emphasis given during the instruction. Objectives considered more important by the teacher should be allotted more test items. This applies not only to the items on the classroom test but also to performance assessment tasks. The weight given to the performance of such assessment tasks should reflect the importance of the objective. Similarly, areas of content receiving more instruction time should be allocated more test items and assessment tasks".
Objectives %   #   Level           Point Value
10%            1   Knowledge       12
21%            2   Comprehension   21
44%            4   Application     34
10%            1   Analysis        10
8%             1   Synthesis       6
7%             1   Evaluation      7
100%                               100

TABLE 1
Heading
Course Title: Art III
Grade level: 6, 7, 8, 9, 10, 11, 12 (Circle as appropriate)
Periods test is being used: 1 2 3 4 5 6 7 (Circle as appropriate)
Date of test: April 15, 2003
Subject matter digest: 19th and 20th Century Art. Includes artists
from around the world. Oils and watercolors as primary medium.
Identify major works, styles, and schools.
Type Test: Power, Speed, Partially Speeded (Circle One)
Test Time: 45 minutes
Test Value: 100 points
Base Number of Test Questions: 75
Constraints: Test time, quantity of art available for test items
                                         

Reference: Notar, C., Zuelke, D., Wilson, J., and Yunker, B. (2004). Table of Specifications: Insuring Accountability in Teacher-Made Test. Journal of Instructional Psychology, Vol. 31, Issue 2. Retrieved from http://www.freepatentsonline.com/article/Journal-Instructional-Psychology/119611686.html

Prepared by: Danielou P. Galla 
