For Your Consideration #8
 

Written and designed by the staff of the Center for Teaching and Learning. Reproduce with permission only. 

Improving Multiple Choice Questions


November 1990

Although multiple choice questions are widely used in higher education, they have a mixed reputation among faculty because they seem inherently inferior to essay questions. Many teachers believe that multiple choice exams are only suitable for testing factual information, whereas essay exams test higher-order cognitive skills. Poorly written multiple choice items elicit complaints from students that the questions are "picky" or confusing. But multiple choice exams can test many of the same cognitive skills that essay tests do, and, if the teacher is willing to follow the special requirements for writing multiple choice questions, the tests will be reliable and valid measures of learning. In this article, we have summarized some of the best information available about multiple choice exams from the practices of effective teachers at UNC and from testing and measurement literature.

 

Validity and Reliability

The two most important characteristics of an achievement test are its content validity and reliability. A test's validity is determined by how well it samples the range of knowledge, skills, and abilities that students were supposed to acquire in the period covered by the exam. Test reliability depends upon grading consistency and discrimination between students of differing performance levels. Well-designed multiple choice tests are generally more valid and reliable than essay tests because (a) they sample material more broadly; (b) discrimination between performance levels is easier to determine; and (c) scoring consistency is virtually guaranteed.

The validity of multiple choice tests depends upon systematic selection of items with regard to both content and level of learning. Although most teachers try to select items that sample the range of content covered by an exam, they often fail to consider the level of the questions they select. Moreover, since it is easy to develop items that require only recognition or recall of information, they tend to rely heavily on this type of question. Multiple-choice tests in the teacher's manuals that accompany textbooks are often composed exclusively of recognition or recall items.
 
 

Cognitive Levels

Teachers are aware that there are different levels of learning, but most of us lack a convenient scheme for using the concept in course planning or testing. Psychologists have elaborate systems for classifying cognitive levels, but a simple three-level scheme is sufficient for most test planning purposes.

We will designate the three categories recall, application, and evaluation. At the lowest level, recall, students remember specific facts, terminology, principles, or theories--e.g., recollecting the equation for Boyle's Law or the characteristics of Impressionist painting. At the middle level, application, students use their knowledge to solve a problem or analyze a situation--e.g., solving a chemistry problem using the equation for Boyle's Law or applying ecological formulas to determine the outcome of a predator-prey relationship. The highest level, evaluation, requires students to derive hypotheses from data or exercise informed judgment--e.g., choosing an appropriate experiment to test a research hypothesis or determining which economic policy a country should pursue to reduce inflation. Since it is difficult to construct multiple choice questions at the evaluation level (also called complex application in this article), we have included an example at the end of this article.

Analyzing your course material in terms of these three categories will help you design exams that sample both the range of content and the various cognitive levels at which the students must operate. Performing this analysis is an essential step in designing multiple choice tests that have high validity and reliability. The test matrix, sometimes called a "table of specifications," is a convenient tool for accomplishing this task.
 
 

The Test Matrix

The test matrix is a simple two-dimensional table that reflects the importance of the concepts and the emphasis they were given in the course. At the initial stage of test planning, you can use the matrix to determine the proportion of questions that you need in each cell of the table. For example, Figure 1 below reflects an instructional unit in which the students spent 20% of their time and effort at the recall and application levels on Topic I (10% at each level). On the other hand, they spent 25% of their effort on the evaluation level of Topic II. This matrix has been simplified to illustrate the principle of proportional allocation; in practice, the table will be more complex. The greater the detail and precision of the matrix, the greater the validity of the exam.
 


Figure 1.
                    Cognitive Levels
             Recall   Application   Evaluation
Topic I       10%        10%            5%
Topic II       5%        10%           25%
Topic III      5%        10%           20%
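
The proportional allocation in Figure 1 can be turned into question counts with a little arithmetic. The sketch below is only an illustration, not a procedure taken from this article: the function name allocate_questions, the 40-question test length, and the rounding rule (largest remainder) are all assumptions.

```python
# Figure 1 percentages, keyed by topic and cognitive level.
WEIGHTS = {
    "Topic I":   {"Recall": 10, "Application": 10, "Evaluation": 5},
    "Topic II":  {"Recall": 5,  "Application": 10, "Evaluation": 25},
    "Topic III": {"Recall": 5,  "Application": 10, "Evaluation": 20},
}


def allocate_questions(weights, total_questions):
    """Hypothetical helper: split total_questions across the matrix cells
    in proportion to their weights, using largest-remainder rounding so
    the counts always add up to total_questions."""
    cells = [(topic, level, pct)
             for topic, levels in weights.items()
             for level, pct in levels.items()]
    total_pct = sum(pct for _, _, pct in cells)
    # Exact (possibly fractional) share of questions for each cell.
    exact = {(t, l): total_questions * pct / total_pct for t, l, pct in cells}
    # Start from the whole-number part of each share.
    counts = {cell: int(share) for cell, share in exact.items()}
    leftover = total_questions - sum(counts.values())
    # Give the remaining questions to the cells with the largest fractions.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for cell in by_remainder[:leftover]:
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    for (topic, level), n in allocate_questions(WEIGHTS, 40).items():
        print(f"{topic:10s} {level:12s} {n}")
```

On a 40-question test, for example, the 25% cell (Topic II, Evaluation) receives 10 questions and each 5% cell receives 2.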

As you develop questions for each section of the test, record the question numbers in the cells of the matrix. Figure 2 below shows the distribution of 33 questions on a test covering a unit in a psychology course.

In practice, you may have to leave some of the cells blank because of limitations in the number of questions you can ask in a test period. Obviously, these elements of the course should be less significant than those you do choose to test.
 


Figure 2.
Topics                                                          Recall    Application   Evaluation   Total
A. Identity crisis vs. role confusion; achievement motivation.  2, 9      4, 21, 33     16           18%
B. Adolescent sexual behavior; transition of puberty.           5, 8      1, 13, 26     11           18%
C. Social isolation and self-esteem; person perception.         14, 6     3, 20         25           15%
D. Egocentrism; adolescent idealism.                            7, 29     12, 31        10, 15, 27   21%
E. Law and maintenance of the social order.                     17        22            18           9%
F. Authoritarian bias; moral development.                       19        30            24           9%
G. Universal ethical principle orientation.                     28        23            32           9%
Total                                                           33%       40%           27%
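
The row and column percentages in Figure 2 follow directly from counting the question numbers in each cell. The sketch below repeats that tally; the cell contents are copied from the figure, while the single-letter topic keys and the function name tally are illustrative assumptions.

```python
# Question numbers per cell, copied from Figure 2 (topics abbreviated A-G).
MATRIX = {
    "A": {"Recall": [2, 9],  "Application": [4, 21, 33], "Evaluation": [16]},
    "B": {"Recall": [5, 8],  "Application": [1, 13, 26], "Evaluation": [11]},
    "C": {"Recall": [14, 6], "Application": [3, 20],     "Evaluation": [25]},
    "D": {"Recall": [7, 29], "Application": [12, 31],    "Evaluation": [10, 15, 27]},
    "E": {"Recall": [17],    "Application": [22],        "Evaluation": [18]},
    "F": {"Recall": [19],    "Application": [30],        "Evaluation": [24]},
    "G": {"Recall": [28],    "Application": [23],        "Evaluation": [32]},
}


def tally(matrix):
    """Hypothetical helper: return (total questions, share per topic,
    share per cognitive level), with shares as fractions of the total."""
    total = sum(len(qs) for row in matrix.values() for qs in row.values())
    row_share = {topic: sum(len(qs) for qs in row.values()) / total
                 for topic, row in matrix.items()}
    col_counts = {}
    for row in matrix.values():
        for level, qs in row.items():
            col_counts[level] = col_counts.get(level, 0) + len(qs)
    col_share = {level: n / total for level, n in col_counts.items()}
    return total, row_share, col_share


if __name__ == "__main__":
    total, row_share, col_share = tally(MATRIX)
    print("questions:", total)
    for topic, share in row_share.items():
        print(f"topic {topic}: {share:.0%}")
    for level, share in col_share.items():
        print(f"{level}: {share:.0%}")
```

The Application column holds 13 of the 33 questions, which works out to roughly 39%; the figure shows it as 40%, so that the three column totals add up to 100%.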
 
 

Guidelines for Writing Questions

Constructing good multiple choice items requires plenty of time for writing, review, and revision. If you write a few questions after class each day when the material is fresh in your mind, the exam is more likely to reflect your teaching emphases than if you wait to write them all later. Writing questions on three-by-five index cards or in a word-processing program will allow you to re-arrange, add, or discard questions easily.

The underlying principle in constructing good multiple choice questions is simple: the questions must be asked in a way that neither rewards "test wise" students nor penalizes students whose test-taking skills are less developed. The following guidelines will help you develop questions that measure learning rather than skill in taking tests.

 
 

Writing the Stem

The "stem" of a multiple-choice item poses a problem or states a question. The basic rule for stem-writing is that students should be able to understand the question without reading it several times and without having to read all the options.  

Writing the Responses

Multiple-choice questions usually have four or five options to make it difficult for students to guess the correct answer. The basic rules for writing responses are (a) students should be able to select the right response without having to sort out complexities that have nothing to do with knowing the correct answer and (b) they should not be able to guess the correct answer from the way the responses are written.  

General Issues

 

Analyzing the Responses

After the test is given, it is important to perform a test-item analysis to determine the effectiveness of the questions. Most machine-scored test printouts include statistics for each question regarding item difficulty, item discrimination, and frequency of response for each option. This kind of analysis gives you the information you need to improve the validity and reliability of your questions.

Item difficulty is denoted by the percentage of students who answered the question correctly. Since the chance of guessing correctly on a four-option item is 25%, you should rewrite any item that falls below 30%. Testing authorities suggest that you should strive for items that yield a wide range of difficulty levels, with an average difficulty of about 50%. Also, if you expect a question to be particularly difficult or easy, results that vary widely from your expectation merit investigation.
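
If your scoring printout does not report item difficulty, it is easy to compute by hand or in a few lines of code. The sketch below is a minimal illustration; the function names, the answer key, and the five students' responses are all made up for the example.

```python
def item_difficulty(responses, key):
    """Hypothetical helper: fraction of students answering each item correctly.

    responses: one list of chosen options per student, in item order.
    key: the correct option for each item, in the same order.
    """
    n_students = len(responses)
    return [sum(student[i] == correct for student in responses) / n_students
            for i, correct in enumerate(key)]


def flag_for_rewrite(difficulties, floor=0.30):
    """Return the 1-based numbers of items whose difficulty is below floor."""
    return [i + 1 for i, p in enumerate(difficulties) if p < floor]


if __name__ == "__main__":
    key = ["B", "D", "A"]            # made-up answer key for three items
    responses = [                    # made-up answers from five students
        ["B", "D", "C"],
        ["B", "A", "C"],
        ["B", "D", "A"],
        ["C", "D", "B"],
        ["B", "C", "D"],
    ]
    difficulties = item_difficulty(responses, key)
    print(difficulties)                    # [0.8, 0.6, 0.2]
    print(flag_for_rewrite(difficulties))  # [3]
```

In this made-up data, only one student in five answers item 3 correctly, so it falls below the 30% threshold and is flagged for rewriting.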

To derive the item discrimination index, the class scores are divided into upper and lower halves and then compared for performance on each question. For an item to be a good discriminator, most of the upper group should get it right and most of the lower group should miss it. A score of .50 on the index reflects a maximal level of discrimination, and test experts suggest that you should reject items below .30 on this index. If equal numbers of students in each half answer the question correctly, or if more students in the lower half than in the upper half answer it correctly (a negative discrimination), the item should not be counted in the exam.
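
The article does not give a formula for the discrimination index. One formula consistent with a maximum of .50 is the number of correct answers in the upper half minus the number in the lower half, divided by the total number of students compared. The sketch below assumes that formula; the function name and the sample scores are hypothetical, and with an odd class size the middle student is simply left out.

```python
def discrimination_index(scored):
    """Hypothetical helper implementing the assumed formula
    (upper-half correct - lower-half correct) / students compared.

    scored: one (total_test_score, answered_item_correctly) pair per student.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("need at least two students")
    upper, lower = ranked[:half], ranked[-half:]
    upper_correct = sum(1 for _, correct in upper if correct)
    lower_correct = sum(1 for _, correct in lower if correct)
    return (upper_correct - lower_correct) / (2 * half)


if __name__ == "__main__":
    # Made-up class of ten: total score, and whether this item was correct.
    scored = [(95, True), (90, True), (88, True), (85, True), (80, False),
              (70, True), (65, False), (60, True), (55, False), (50, False)]
    print(discrimination_index(scored))  # 0.2
```

Here four of the upper five students and two of the lower five answer correctly, giving an index of .20, which is below the .30 cutoff mentioned above.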

Finally, by examining the frequency of responses for the incorrect options under each question, you can determine if they are equally distracting. If no one chooses a particular option, you should rewrite it before using the question again.
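
Counting how often each incorrect option was chosen takes only a few lines. The sketch below is an illustration under assumed names and made-up data: it tallies one item's responses and lists the distractors that no student selected.

```python
from collections import Counter


def distractor_counts(choices, correct, options="ABCD"):
    """Hypothetical helper: how many students chose each incorrect option.

    choices: the option each student selected for one item.
    correct: the keyed answer for that item.
    """
    counts = Counter(choices)
    return {opt: counts.get(opt, 0) for opt in options if opt != correct}


def unused_distractors(choices, correct, options="ABCD"):
    """Return the incorrect options that no student selected."""
    report = distractor_counts(choices, correct, options)
    return [opt for opt, n in report.items() if n == 0]


if __name__ == "__main__":
    choices = list("BBABCBBACB")  # made-up responses from ten students
    print(distractor_counts(choices, correct="B"))   # {'A': 2, 'C': 2, 'D': 0}
    print(unused_distractors(choices, correct="B"))  # ['D']
```

In this made-up item, option D attracts no one and would be the one to rewrite before the question is used again.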
 
 

Developing Complex Application Questions

Devising multiple choice questions that measure higher-level cognitive skills will enable you to test such skills in large classes without spending enormous amounts of time grading. In an evaluation question, a situation is described in a short paragraph and then a problem is posed as the stem of the question. All the rules for writing multiple choice items described above also apply to writing evaluation questions, but students must use judgment and critical thinking to answer them correctly. In the example below (adapted from Welsh, 1978), students must understand the concepts of price inflation, aggregate private demand, and tight monetary policy. They must also be able to analyze the information presented and, based on projected effects, choose the most appropriate policy. This question requires critical thinking and the complex application of economic principles learned in the course.

"Because of rapidly rising national defense expenditures, it is anticipated that Country A will experience a price inflation unless measures are taken to restrict the growth of aggregate private demand. Specifically, the government is considering either (1) increasing personal income tax rates or (2) introducing a very tight monetary policy."

If the government of Country A wishes to minimize the adverse effect of its anti-inflationary policies on economic growth, which one of the following policies should it use?
 

Teachers have developed similar questions for analysis and interpretation of poetry, literature, historical documents, and various kinds of scientific data. If you would like assistance in writing this kind of multiple choice question, or if you would like to discuss testing in general, please call the Center for Teaching and Learning for an appointment.
 


Bibliography
