Tables of Specifications are designed based on:
5. emphasis and space provided in the text
A Table of Specifications can be designed in three simple steps:
2. break the domain into levels (e.g., knowledge, comprehension, application ...)
3. construct the table
Textbook-provided assessments and teacher-made assessments alike should be analyzed with a Table of Specifications. Textbook assessments may stress areas of content that the teacher does not address with the same importance the text does, and they may not match the time and level of thinking the teacher requires. A Table of Specifications can help prevent this. It is also possible that the level of critical thinking required by a textbook test does not match that required by state standards. Teachers should analyze assessments carefully and match them with state standards and with what was actually presented in class.
Here is a specific example of the Table of Specifications:
Table of Specifications: Insuring Accountability in
Teacher-Made Test
By:
Charles E. Notar, Dennis C. Zuelke, Janell D. Wilson, and Barbara D. Yunker
A Table of Specifications identifies not only the content areas covered in class; it also identifies the performance objectives at each level of the cognitive domain of Bloom's Taxonomy. Teachers can be assured that they are measuring students' learning across a wide range of content and readings, as well as cognitive processes requiring higher-order thinking. The use of a Table ensures that teachers include test items that tap different levels of cognitive complexity when measuring students' achievement. Kubiszyn & Borich (2003) suggested that teachers should use a Table so they won't forget the details.
Carey (1988) listed six major elements that should be attended to in developing a Table of Specifications for a comprehensive end-of-unit exam: (1) balance among the goals selected for the exam; (2) balance among the levels of learning; (3) the test format; (4) the total number of items; (5) the number of test items for each goal and level of learning; and (6) the enabling skills to be selected from each goal framework. A Table of Specifications incorporating these six elements will result in a "comprehensive posttest that represents each unit and is balanced by goals and levels of learning".
A Table of Specifications is developed before the test is written. In fact, it should be constructed before the actual teaching begins (Kubiszyn & Borich, 2003; Mehrens & Lehman, 1973; Ooster, 2003). Just as much time and effort goes into developing a house blueprint, so too a Table of Specifications requires considerable time and effort to develop (Kubiszyn & Borich, 2003). Linn and Gronlund (2000) stated, "While the process is time-consuming, the effort that goes into development of a table of specifications also makes it much easier to prepare the test once the plan is developed".
The heading provides the administrative data for the test and the Table. All Tables of Specifications have a heading, which covers the administrative requirements of the test and the information needed to construct the two-way table. The heading also makes filing and retrieval easier.
The course title is exactly that: the title of the course as it appears on the teachers' and students' schedules, e.g., American History I or English II. Grade level is the grade for which the course is intended in the local or state course of study. Test periods are the time limits for which the test has been developed for administration. Date of test is the date the teacher will administer the test.
The subject matter digest is a paragraph that sets the limits of the subject matter that will be covered in class. This ensures that the class covers only required material related to the stated objectives and nothing else. Setting these parameters helps guide discussion and keeps lessons focused and on topic. The textbook title and date of publication, along with the unit numbers or pages being covered, can also be part of the digest.
The teacher must determine what type of test will be developed in order to establish the amount of detail required in the Table. A main focus in teacher-made assessments concerns students' cognitive abilities to understand and apply the concepts they have learned. There is less concern
about the rapidity of a student's responses to questions than about the content
of those responses. Accordingly, time limits on achievement tests are very
generous, allowing all students enough time to consider each question and
attempt to answer it. These tests are called power tests. Items on a power test have different levels of difficulty, usually arranged in a hierarchy from the knowledge level (easy) upward in increasing difficulty. A power test should be administered
so that a very large percentage (90% is an acceptable minimum) of the pupils
for whom it is designed will have ample time to attempt all of the items.
A speed test is one in which a student must, in a limited
amount of time, answer a series of questions or perform a series of tasks of a
uniformly low level of difficulty. The near-constant level of difficulty of the
questions or tasks is such that, if the pupil had unlimited time, he or she
could easily answer each question or perform each task successfully. The intent
of a speed test is to measure the rapidity with which a pupil can do what is
asked of him or her. Speed of performance frequently becomes important after
students have mastered task basics as in using a keyboard, manipulatives, or
phonics.
Tests are often a mixture of speed and power even when
achievement level is the test's purpose. Such tests are called partially
speeded tests. Teachers must check time limits carefully to be sure that all
students will have the opportunity to address each test item adequately before
the allotted time is up.
Once the purpose of the test as a power, speed or partially
speeded test has been established, the teacher can decide the actual length of
the test in minutes. The amount of time for the test is determined before test
construction and is facilitated by using a Table of Specifications. Testing
time, measured in minutes, is determined by a number of factors, including: the number of objectives to be tested; coverage of objectives; objective complexity;
number of conditions to be tested; and levels of acceptable performance. In
addition, teachers must look at students' age and ability levels, class time
available, types of test items, length and complexity of test items, and amount
of computation required.
Carey (1988) pointed out that the time available for testing
depended not only on the length of the class period but also on students'
attention spans. Completion of the test should be possible within one class period, and students should finish before they become fatigued (a six-year-old will not be able to take a 40-minute paper-and-pencil test). A Table of Specifications ensures that teachers will address all of these important issues in constructing an end-of-unit exam.
To continue our analogy, the "something new" at the wedding of teacher-made tests and accountability is the use of an assessment plan to
determine test value. The assessment plan has been around for a number of years
but has not been associated with the development of a Table of Specifications.
An assessment plan considers how many points the test is worth, how the test
fits into the semester grade point total and eventually determines the Grade
Point Average. An assessment plan determines total number of points available
in a marking period. Semester and final grades for the year come from the six
(or nine) week assessment plans added together.
The first step in developing an assessment plan is to list
the assessment activities to be used in the class. The second step is to
determine how many of each activity will be used in each grading period. The
third step is to assign points according to the worth of the activity. This is
a value judgment, e.g. "homework is less important than a unit exam but
more important than answering questions in class." The following is an example of a six-week assessment plan.
An assessment plan should be formed before each grading
period begins. In the example above, the points for testing and points for
class work are evenly divided; this is the authors' point of view. Mehrens & Lehman (1973) suggested that the teacher determines the balance in the assessment plan. But balance will not happen without adequate planning. Adequate and extensive planning is required so that the instructional objectives, the teaching strategy to be employed, the text material, and the evaluative procedures are all related in some meaningful fashion.
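The three steps above (list the activities, count how many of each will be used, assign points by worth) can be sketched in a few lines of Python. The activity names and counts below are illustrative, mirroring the homework, quiz, and exam figures from the article's own plan example:

```python
# Sketch of a six-week assessment plan: (activity, count per grading period,
# points each). Values here echo the article's example (homework at 20 points,
# closed-book quizzes at 25, a 200-point marking-period exam) but are
# illustrative, not prescriptive.
plan = [
    ("Homework", 6, 20),
    ("Quizzes (closed book)", 2, 25),
    ("Marking-period exam", 1, 200),
]

def plan_totals(plan):
    """Return per-activity subtotals and the grading-period point total."""
    subtotals = {name: count * points for name, count, points in plan}
    return subtotals, sum(subtotals.values())

subtotals, total = plan_totals(plan)
for name, pts in subtotals.items():
    print(f"{name}: {pts} points")
print(f"Total for the grading period: {total} points")
```

Semester and year grades would then come from adding these grading-period totals together, as the text describes.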
He also made suggestions for determining a base number of
items to use per test. "Recall-level items require less time than application-level
items, whatever the test format. Items that ask students to solve problems,
analyze or synthesize information, or evaluate examples all require more time
than do items that require students to remember a term, fact, definition, rule,
or principle. Essay questions require more time than either selected-response
or short-answer items".
Some rules of thumb exist for how long it takes most students
to answer various types of questions:
* A true-false test item takes 15 seconds to answer, unless the student is asked to supply the correct answer for false statements; then the time increases to 30-45 seconds.
* A seven-item matching exercise takes 60-90 seconds.
* A four-response multiple-choice test item that asks about a term, fact, definition, rule, or principle (a knowledge-level item) takes 30 seconds. The same type of test item at the application level may take 60 seconds.
* Any test-item format that requires solving a problem, analyzing or synthesizing information, or evaluating examples adds 30-60 seconds to a question.
* Short-answer test items take 30-45 seconds.
* An essay test takes 60 seconds for each point to be compared and contrasted.
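These rules of thumb can be turned into a rough estimate of total testing time. A minimal sketch in Python; the per-item seconds come from the list above (midpoints used where a range is given), while the item mix itself is hypothetical:

```python
# Seconds per item type, from the rules of thumb in the text.
# Midpoints are used where the text gives a range.
SECONDS_PER_ITEM = {
    "true_false": 15,                 # 30-45 s if the correct answer must be supplied
    "matching_7_items": 75,           # midpoint of 60-90 s per seven-item exercise
    "multiple_choice_knowledge": 30,
    "multiple_choice_application": 60,
    "short_answer": 38,               # midpoint of 30-45 s
    "essay_per_point": 60,            # 60 s per point compared/contrasted
}

def estimated_minutes(item_counts):
    """Sum the per-item time estimates and convert seconds to minutes."""
    total_seconds = sum(SECONDS_PER_ITEM[kind] * n
                        for kind, n in item_counts.items())
    return total_seconds / 60

# Hypothetical mix: 10 true-false, 20 knowledge-level multiple choice,
# 5 application-level multiple choice, and one essay graded on 5 points.
counts = {"true_false": 10, "multiple_choice_knowledge": 20,
          "multiple_choice_application": 5, "essay_per_point": 5}
print(f"Estimated test time: {estimated_minutes(counts)} minutes")
```

An estimate like this feeds directly into the next step: deciding whether the planned item count fits the class period.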
Fallback positions for determining how many questions should be on a test are how much time is available for testing and the level of performance required (test by conditions as well as by action verb). In general, the more items on a test, the more valid and reliable the test will be. However, a test could be prohibitively long. On the other hand, a test with only one item per objective, even if all items were answered correctly, would provide insufficient evidence of proficiency. When all else fails, look in the mirror to see who determines the number of test questions on a teacher-made test.
Constraints are those variables that prevent testing in the manner that would be most appropriate for the level of instruction required to master the performance level indicated by the objective's action verb. Write the reason for each constraint you see; if there are no constraints, state NONE. Types of constraints include time, personnel, cost, equipment, facilities, realism, logistics, and communications, among others.
The first heading in the body of the Table is called Learning
Objectives. This heading has four subheadings: No; Level; Time; and Q/P/%.
These subheadings, although distinct, are interrelated. No. is the number designation of the objective: either write the objective out in this space or enter its number from an objective list. If a list is used, it must be attached to the Table.
The table itself is predicated on the writing of good
performance objectives. A performance objective states the performance required
or capability that is involved (action verb). The content is then specified
through the behavior, situation, and special conditions components of the
objective (condition{s}). When developing a Table you want to test all of the objectives, since you can only be sure students can perform the objectives that are tested. However, time may be a constraint; in that case you would want to sample objectives.
You should sample among objectives only if it will solve a
constraint problem. Document the sampling plan. Always test the most critical
objectives. Test the less critical objectives in rotation randomly. Students
are not informed of the objectives to be tested.
Sample among conditions as follows. If the action must be performed under each of two conditions, develop items for each condition. If the action may be performed under either of two conditions and only one can be tested, test the more difficult condition. If the action must be performed under three conditions, test the two most critical ones. If the action must be performed under a large number of conditions, test at least 30% of them, including the most critical ones.
Level equals domain level of the action verb of the
objective. Level is assigning the objective's action verb to a category in
Bloom's taxonomy. For example, Objective 1 is application and Objective 2 is
comprehension. There are a number of lists of action verbs according to
taxonomy level (e.g. Linn & Gronlund (2000), Appendix G). This assignment
is done graphically so that you can look to the right of the assignment to see
if there are any questions in levels beyond the assigned level. You can only
test to the level taught. Otherwise you will be setting your students up for
failure. You also must test objectives at full performance if you are going to
state that students are competent at the action-verb level. At the necessary levels, you can and should test the enabling skills for assurance that students have the prerequisite skills to achieve full performance. The following example is from Table 1, partially reproduced here as Chart 1.
Objective 1 reads as follows "Identify architectural
style in examples of 20th century revival style buildings around the
world." There are no questions listed in the Table above application so we
are not testing above the level taught. Under application there are five
questions, therefore, Objective 1 is being tested at full performance. Under
comprehension for Objective 1, there are six questions listed. These six
questions test enabling skills required to obtain full performance. These
questions may be such that examples of original styles of building architecture
are presented and the student names them.
Bloom's Taxonomy's cognitive domain can be arranged in columns. Bloom's taxonomy is used because it allows a Table for a teacher-made test to be developed in the cognitive, affective, or psychomotor domain. The Tables used in this fastback as illustrations are all cognitive; the only difference for the affective and psychomotor domains is the interchange of the levels in those columns.
A Table ensures your test will include a variety of items at
different levels of cognitive complexity. The cognitive domain is looked at as
a set of steps. You must take the first step before you can attain the second,
and so on. This mind set is very important when you look at congruency.
The example under LEVEL in Chart 1 illustrated an aspect of
testing called CONGRUENCY. Congruency is teaching and testing at the same
level. The level of the objective is matched with the placement of test items.
Chart 2 is an example of congruency; testing what you are teaching using
Objective 7 in Table 1.
The teacher is teaching Objective 7 at the application level. To state that a student can fully perform at the application level, the test must assess at the application level. In the chart, if the teacher uses Test 1, Objective 7 has not been tested to the level of the objective, and you will not be able to state that a student who passed has mastered the objective. Test 2 is the reverse: you have set the students up for failure because you are testing at a mastery level you did not teach them to attain. Test 3 gives you a variety of ways to test for mastery at the objective's level, application, with Test 3 version b being used for Objective 7.
You would use Test 3 versions a, b, or c if you were testing prerequisite or enabling objectives. While testing for maximum performance on the objective's action verb, you may need to ask questions on the prerequisite and enabling objectives to ensure that the student has these abilities; otherwise you will not know why a student failed at the full-performance measure. Testing prerequisite and enabling objectives is extremely important: it helps you be diagnostic and prescriptive in your test critique and in determining whether you taught the objective with sufficient emphasis, depth, and breadth. An example of an enabling test question would be to ask for the value of π, if the objective's full performance was to calculate the circumference of a circle given its radius.
To do the calculations for the TIME and Q/P/% columns of the Table of Specifications, the teacher must use the following formulas for each objective in the Table.
FORMULA "A": time in class spent on objective (min) / total time for the instruction being examined (min) = % of instruction time
Example from Table 1, using Objective 1: total time for instruction is 600 minutes; time in class spent on Objective 1 is 95 minutes.
95/600 = .16, or 16%
THEN the instructor should look at the number of test items and their point weight per question and complete Formula "B."
FORMULA "B": point total of questions for objective / total points* on examination = % of examination value
Example from Table 1 using Objective 1:
16/100 = 16%
(* Total points is the academic point value assigned to the examination.)
THEN the two percentages from Formula "A" and Formula "B" should be placed in Formula "C." If the outcome of Formula "C" is within the established parameters, the teacher may go on to the next objective, until the process has been completed for all objectives.
FORMULA "C": percent of instruction time = percent of examination value (within ±2 percent; if not, redo the test)
Example from Table 1, using Objective 1:
16 = 16
Using Table 1 as an example, objective No. 1 had 95 minutes of instructional time spent on it, out of the 600 minutes of instruction covered by the test. By Formula "A," objective No. 1 accounts for 16% of the instructional time. By Formula "B," that 16% equates to 11 questions worth 16 points. Formula "C" compares the two percentages, which should fall within the values established for the examination's content validity.
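Putting the three formulas together, here is a minimal Python sketch using the Objective 1 numbers from the text (95 of 600 instructional minutes; 16 of 100 exam points):

```python
# Formulas "A," "B," and "C" from the text.

def percent_instruction_time(minutes_on_objective, total_minutes):
    """Formula A: share of instructional time, as a percent."""
    return 100 * minutes_on_objective / total_minutes

def percent_exam_value(points_for_objective, total_points):
    """Formula B: share of examination value, as a percent."""
    return 100 * points_for_objective / total_points

def is_congruent(pct_time, pct_value, tolerance=2.0):
    """Formula C: the two percentages should agree within +/- 2 percent."""
    return abs(pct_time - pct_value) <= tolerance

# Objective 1 from Table 1: 95 of 600 minutes, 16 of 100 points.
pct_time = percent_instruction_time(95, 600)   # about 15.8%, rounds to 16%
pct_value = percent_exam_value(16, 100)        # 16%
print(round(pct_time), round(pct_value), is_congruent(pct_time, pct_value))
```

If `is_congruent` came back False for any objective, the test (its item counts or point weights) would be redone, per Formula "C."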
TIME equals the time, expressed in minutes, spent in class and in other learning activities on the objective. Mehrens & Lehman (1973) state that the major advantage of teacher-made tests is that they can be tailor-made to fit the teacher's unique and/or particular objectives. However, the teacher must ensure that appropriate weight is given on the test to those particular objectives. The formulas for calculating time have already been presented; remember that all times are in minutes and are then converted to percents. These formulas and their answers determine the distribution of questions on the test and the point values assigned to them. The emphasis given during instruction must be used to assign weight in a test: emphasis on an objective in class and in the corresponding activities is a student's first and major clue to the relevance and value of what is being taught. You have been in a class where the teacher spent "X" amount of time on a subject, and there was one question on the test covering that material and 14 on something covered by a paragraph in the text. In the way the Table is constructed, time on objective, both direct and integrated, is used to establish the relevance of material to the students and for test construction. The total time spent teaching all material is the baseline used to determine the weight given to each objective in the overall scheme of the Table.
Mehrens & Lehman (1973) state there is no guarantee that a "match" between instructional objectives and test items will take place if a Table is used; it will only indicate the number or proportion of test items to be allotted to each of the instructional objectives specified.
The final distribution of items in the Table of
Specifications should reflect the emphasis given during the instruction. This
concept of relative weight impacts both the construction of the Table and
student perception that the test is fair. Objectives considered more important
by the teacher should be allotted more test items. Similarly, areas of content
receiving more instruction time should be allotted more test items. Too often students say, "I studied the chart in the book that we spent two days on, and then there was nothing on the test. And where did that essay on cause and effect come from?" Relative weighting will alleviate these types of student comments.
Although the decisions involved in making the Table are
somewhat arbitrary and the process is time consuming, the preparation of the
Table of Specifications is one of the best means for ensuring that the test
will measure a representative sample of instructionally related tasks.
The percentages are then used to determine the number of
questions per objective and the value of points per objective.
Q/P/% is the number of questions (Q) and points (P) by
percent (%) that represent the emphasis of instructional time based on relative
weight. These are the numbers of questions and points that are the benchmark for test development. In the example below from Table 1, partially reproduced here as Chart 3, the Q/P/% of Objective 3 is in bold (6/9).
Linn & Gronlund (2000) provided the rationale behind the
Q/P/% when they stated "We should like any assessment of achievement that
we construct to produce results that represent both the content areas and the
objectives we wish to measure, and the table of specifications aids in
obtaining a sample of tasks that represents both. The percentages in the table
indicate the relative degree of emphasis that each content area and each
instructional objective is to be given in the test".
Linn & Gronlund (2000) further stated "this table
indicates both the total number of test items and assessment tasks and the
percentage allotted to each objective and each area of content. For classroom
testing, using the number of items may be sufficient, but the percentages are
useful in determining the amount of emphasis to give to each area".
Linn & Gronlund (2000) summed up the value of the Table in similar terms.
The second major heading in the Table body is ITEM TYPE. Item type is the type(s) of test item(s) used to test the student's ability to attain the objective. The Test Item Format Chart below provides a visual representation of the levels of the cognitive domain that can be tested by the five basic test items used on teacher-made tests. Depending on complexity, use the simplest test-item format wherever possible.
Using Table 1, partially reproduced here as Chart 5, as an
example, Objectives 1 and 3 are both full performance at the application level.
However, they are being tested by different item types, but with the correct
types of questions as prescribed by the chart. The use of the essay in
Objective 3 may be to explain reasoning or a procedure required by the
objective for full performance.
The third major heading in the Table body is Bloom's Taxonomy/Congruency. It lists the LEVELS of the domain tested and the total number of each type of question at the level(s) tested. This will assist in determining whether testing is at multiple levels, only at the highest level, or at too high a level. The base Table of Specifications is set up for the cognitive domain. If the affective or psychomotor domain is being tested, the Table is the same, except that the cognitive levels are replaced by the levels of the affective or psychomotor domain.
The sums of the columns and rows should be equal; if they are not, the addition is incorrect. The bottom right-hand corner is where the column and row totals meet. The total number of questions for each level of the domain is summed per objective, and then the totals across all levels are added. This grand total should equal the total number of questions that were determined to be on the test. Similarly, the point values of each question for each objective are summed and all the points are added; this total should equal the set value of the examination.
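The totals check can be sketched in a few lines. The grid of question counts below is hypothetical (the Objective 1 row echoes the six comprehension and five application items discussed earlier; the other rows are invented for illustration):

```python
# One row per objective, one column per Bloom level
# (knowledge, comprehension, application, analysis, synthesis, evaluation).
questions = [
    [0, 6, 5, 0, 0, 0],   # Objective 1: 6 enabling items, 5 at full performance
    [4, 2, 0, 0, 0, 0],   # Objective 2 (hypothetical)
    [0, 3, 6, 0, 0, 0],   # Objective 3 (hypothetical)
]

def check_totals(grid, planned_total):
    """Sum each level's column, then check the grand total against the plan."""
    level_totals = [sum(col) for col in zip(*grid)]
    grand_total = sum(level_totals)
    return level_totals, grand_total == planned_total

levels, ok = check_totals(questions, planned_total=26)
print(levels, ok)
```

The same check would be run a second time over a grid of point values, comparing the grand total against the set value of the examination.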
NOTE: Common sense is important. Make point values whole numbers, not 1.5, etc.; otherwise you will spend too much time grading. The questions and point values per objective are assigned based on the percent of time taught, including direct instruction and integrated instruction. Therefore one percent equals one question worth one point. However, if you use a question worth two points, treat it as two questions; an essay question worth 5 points counts as five questions. Also, when rounding up or down to get a whole question or point, always round up for the higher-level objectives. The number of questions per objective can go down, but the point value per objective does not change.
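The note's "one percent = one question worth one point" rule, with its round-up-for-higher-levels convention, might be sketched as follows. The objectives, percentages, and level ranks here are hypothetical:

```python
# Sketch of the rounding rule: percent of instruction time maps directly to
# questions/points; round up for higher-level objectives, down otherwise.
import math

# (objective, percent of instruction time, Bloom level rank; 3 = application
# and above in this hypothetical ranking)
shares = [
    ("Objective 1", 15.8, 3),
    ("Objective 2", 9.5, 2),
    ("Objective 3", 8.7, 3),
]

def allocate_points(shares, high_rank=3):
    """Return points (= questions) per objective under the 1% = 1 point rule."""
    return {
        name: (math.ceil(pct) if rank >= high_rank else math.floor(pct))
        for name, pct, rank in shares
    }

print(allocate_points(shares))
```

A multi-point item would then be counted against this allocation as multiple questions, exactly as the note describes.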
Using Table 1, partially reproduced here as Chart 7, the
objectives and points are:
Summarizing the objectives and their point totals in Chart 7
would look like this:
To check that your test assesses as taught, look at the total row at the bottom of Table 1, partially reproduced here as Chart 8, and see whether the values are in line.
To keep with the wedding theme, something needs to be borrowed. We have borrowed two things for this wedding. We are going to borrow from Carey (1988) some thoughts on how to make the Table provide a test that is both valid and reliable.
Carey (1988) stated "During the design of classroom
tests, you need to be concerned with the validity and reliability of test
scores. We have discussed content validity and how the Table will provide for
it. Reliability is not normally associated with the Table. Reliability refers
to the consistency or stability of scores obtained from a test. If the scores
are unreliable, decisions or inferences based on them are dubious. Tests must
be designed carefully to yield reliable and valid scores".
Carey (1988) continued that there are "five steps during
the design stage you must take to achieve reliable test results: (1) select a
representative sample of objectives from the goal framework; (2) select enough
items to represent adequately the skills required in the objective; (3) select
item formats that reduce the likelihood of guessing; (4) prescribe only the
number of items students can complete in the time available; and (5) determine
ways to maintain positive student attitudes toward testing. The subordinate
skills in an instructional goal framework should be divided into prerequisite
skills (skills students should have mastered before entering a unit of
instruction) and enabling skills (skills that comprise the main focus of
instruction for a unit)". The Table presented takes into account the five
steps that will make a test reliable.
The second thing borrowed is Linn & Gronlund's (2000)
idea to embed related non-test assessment procedures in an expanded Table of
Specifications.
Reproducing the assessment plan shown earlier as Chart 8, we could add the class attendance, homework, class participation, and quiz points used during the instructional time that our test covered. In Table 1 (reproduced from page 3), with the added non-test points, we have added the categories and values in the heading of the Table and then emphasized the non-test learning activities and their relative points in the body of the Table by underlining.
Example Assessment Plan:
Observation time on
Homework: 6 x 20 = 120
Quizzes (closed book): 2 x 25 = 50
Tests: marking-period exam, 1 x 200 = 200
Teaching: Application
Learning: Application
Test 1: Knowledge, Comprehension
Test 2: Comprehension, Application, Synthesis
Test 3, version a: Knowledge, Application
Test 3, version b: Knowledge, Comprehension, Application
% of time   Objectives #   Level           Point Value
10%         1              Knowledge       12
21%         2              Comprehension   21
44%         4              Application     34
10%         1              Analysis        10
08%         1              Synthesis       06
07%         1              Evaluation      07
100%                                       100
TABLE 1
Heading
Course Title: Art III
Grade level: 6 7 8 9 10 11 12 (Circle as appropriate)
Periods test is being used: 1 2 3 4 5 6 7 (Circle as appropriate)
Date of test: April 15, 2003
Subject matter digest: 19th and 20th Century Art. Includes artists from around the world. Oils and watercolors as primary medium. Identify major works, styles, and schools.
Type Test: Power, Speed, Partially Speeded (Circle one)
Test Time: 45 minutes
Test Value: 100 points
Base Number of Test Questions: 75
Constraints: Test time, quantity of art available for test items
Reference: Notar, C. E., Zuelke, D. C., Wilson, J. D., & Yunker, B. D. (2004). Table of Specifications: Insuring Accountability in Teacher-Made Test. Journal of Instructional Psychology, Vol. 1, Issue 2. Retrieved from http://www.freepatentsonline.com/article/Journal-Instructional-Psychology/119611686.html
Prepared by: Danielou P. Galla