Multiple choice questions (MCQs) continue to dominate dental education, particularly in the early years of basic sciences and knowledge-based assessment.(1) A key criticism of MCQs is that they tend to reflect “a combination of what the student knows, partially knows, can guess, or is cunning enough to surmise from cues in the questions.”(2) Emerging assessment methods such as Extended Matching Questions (EMQs) may address some of the key flaws of MCQ examinations and improve the potential of item selection tests to assess application of knowledge beyond memorisation of discrete facts.(3, 4) This paper will examine the history and rationale of MCQ assessment and reference this to the ‘Criteria for Good Assessment’ forwarded by Norcini et al.(5) as a means of comparing MCQ and EMQ assessment strategies.
Historical development
Frederick J. Kelly wrote the first multiple choice question (MCQ) in 1914 in an attempt to improve standardisation and simplify marking compared with assessment methods such as short answer questions. It is interesting to note that after nearly a century of application, early examples of multiple choice questions bear significant resemblance to today’s models.(6) Common features of the MCQ include a stem, which poses the question or provides the scenario, and two or more options from which the examinee can select. There are two broad types of multiple choice format: ‘True/False’ and ‘One-Best-Answer’. True/False question formats are inherently problematic in that they purport that things can be 100% true or 100% false, a prospect that seldom exists, or requires significant contextual clarification to be justified. Case and Swanson warn that “to avoid ambiguity, we are pushed toward assessing recall of an isolated fact, something we are actively trying to avoid.”(7) Some advocate abandonment of the True/False question format on this basis.(3, 7) One-best-answer questions remain a widely used assessment tool (7) and will provide the reference for further exploration in this paper when referring to MCQs.
Rationale for the use of multiple choice questions in medical and dental education
The MCQ has become widely adopted over the past century in medical, dental and allied health education. Key benefits of MCQs include the broad range of knowledge that can be assessed in a short period of time and the improved reproducibility of marking compared with earlier favoured methods such as short answer, essay and oral examination formats.(8) In addition to these pragmatic benefits, the adoption of optical and computerised marking for MCQs enhances the inherent cost-effectiveness of MCQ administration. An academic climate in which faculty are often pressured to ‘process’ increasing numbers of students with limited resource allocation provides an important impetus for the continued use of MCQ formats for assessment of core knowledge.(9)
It is now recognised that MCQs are largely limited to assessment at the ‘knowledge’ level of Miller’s Pyramid of competency, and alternative methods have been forwarded to assess beyond this level. Among these is the Extended Matching Question (EMQ) format. In many ways, the EMQ bears similarities to the MCQ and can be considered part of this broad family.(10) Sue Case and David Swanson are primarily credited with the development of the EMQ.(10) Their work began in the 1980s,(11) but gained refinement and momentum primarily in the 1990s.(7) The main motivation for development of this test instrument was to link theoretical knowledge with diagnostic and management decisions; however, it has also shown its worth in examining the basic sciences.(10) An EMQ usually has four components: a theme (e.g. ulcers); a list of potential answers (typically 10-20); the lead-in for the question (e.g. select the most likely diagnosis for each patient); and the question in the form of a vignette providing key information (e.g. age, gender and presentation of the patient). Multiple questions and vignettes can be related to the same potential answer list, and answers may be correct for more than one question or none at all.(4)
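To illustrate the structure, a hypothetical item of the author’s own construction (not drawn from any published question bank) might read as follows. Theme: oral ulceration. Options: a list of 10-20 plausible diagnoses, such as aphthous ulceration, herpes simplex infection, squamous cell carcinoma and traumatic ulceration. Lead-in: for each patient described below, select the most likely diagnosis. Vignette: a 68-year-old male smoker presents with a painless, indurated ulcer of the lateral tongue that has persisted for eight weeks. Several further vignettes would then draw on the same option list, with some options serving as the answer to more than one vignette and others to none.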
Student and staff perspectives
Despite the presence of alternatives, MCQ assessment remains particularly prevalent in undergraduate dental education during the earlier years in which the basic sciences are being taught and tested.(1) Anecdotally, students and staff alike consider multiple-choice examinations to be easier for students than alternative test instruments.
Although the hunger for marks persists in MCQ examinations among dental students accustomed to the positive feedback of good grades, this testing tool is also perceived to offer a safety net against identification of incompetence, since it is often easy to form a reasonable guess even if the answer is not known.(2) It is the author’s first-hand experience that this may encourage complacency among poorer students, who may not have grasped the knowledge but have well-honed test-taking skills, and frustration among high achievers, whose knowledge may not be differentiated with this instrument. Such student perceptions reflect the broader possible shortcomings of construct and criterion validity associated with MCQs.
Efforts have been made to offset these potential shortcomings by accounting statistically for the effects of ‘guessing’ as part of multiple-choice examinations. Several theories and equations seek to offset this effect, many relying on varying penalties for incorrect answers. Although these strategies serve to alleviate measurement error, there is also an argument that they unfairly penalise risk-averse students. Furthermore, the complexity of correction strategies may introduce difficulties that detract from the main appeal of the MCQ as a simple test instrument.(12)
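The simplest and most widely known of these corrections, offered here as a general illustration rather than as the specific model examined in the cited work,(12) is classical formula scoring:

\[ S = R - \frac{W}{k - 1} \]

where S is the corrected score, R the number of correct responses, W the number of incorrect responses and k the number of options per item. Under purely random guessing, the expected gain from correct guesses is exactly offset by the penalty on wrong ones, so the expected score from guessing is zero; a student who omits items rather than guessing forgoes this gamble entirely, which is the basis of the claim that such corrections disadvantage the risk-averse.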
Since the key drawbacks of multiple-choice questions are the prospects of guessing and cueing, and the difficulty of writing items that do not create these artefacts,(3, 12) an ideal solution would be one that limited these features whilst maintaining the pragmatic benefits of the MCQ. Based on these desired properties, EMQs have been forwarded as an alternative to MCQs.(3)
Assessment of learning
The notions of validity and reliability are key considerations in determining the capacity of a test instrument to adequately assess learning. Fenderson et al.(3) state that a valid test should not only demonstrate attainment of content knowledge and course goals, but also “make meaningful distinctions among students at different levels of ability, leading to defensible pass/fail decisions.” A comparison study by this group in 1997 demonstrated that an EMQ assessment had greater capacity than an MCQ assessment to differentiate well-prepared from marginal students, based on the discrimination indices of each test.(3)
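In its simplest upper-lower group form, given here as the conventional definition rather than necessarily the exact statistic computed in that study, the discrimination index for an item is

\[ D = p_U - p_L \]

where p_U and p_L are the proportions of examinees in the highest- and lowest-scoring groups (commonly the top and bottom 27% by total score) who answer the item correctly. Values approaching 1 indicate an item that cleanly separates strong from weak candidates, while values near zero or negative flag items that fail to discriminate.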
The possible compromises in MCQ validity are not compensated for by the reliability of results obtained through this format. The notion of reliability extends beyond the ability to create reproducible results and depends also on an accurate assessment of the student’s knowledge.(3) Reliability in MCQs is highly dependent on the ‘distractors’ of a question operating effectively.(10) The higher reliability associated with EMQs compared with MCQs, as indicated by the higher coefficient alpha values demonstrated in comparison studies, is likely to reflect the greater number of distractors available in the EMQ format.(3) It has been demonstrated that 7-12 relevant distractors provide sufficient insulation against the effects of ‘guessing’ on the results of EMQs.(13) It is also interesting to note that EMQs with 20 answer choices have been shown to be as reliable and valid as uncued formats with several hundred choices.(3)
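Coefficient alpha (Cronbach’s alpha), the internal-consistency statistic referred to in these comparisons, is conventionally calculated as

\[ \alpha = \frac{n}{n-1}\left(1 - \frac{\sum_{i=1}^{n} \sigma_i^2}{\sigma_X^2}\right) \]

where n is the number of items, \sigma_i^2 the variance of scores on item i and \sigma_X^2 the variance of total test scores. A longer option list reduces the share of item variance attributable to lucky guessing, so responses track true knowledge more consistently across items, an effect consistent with the higher alpha values reported for EMQs.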
Beyond considerations of validity and reliability, modern assessment strategies should be evaluated in terms of their feasibility and acceptability to stakeholders in addition to their potential ‘educational effect’ (assessment AS learning) and ‘catalytic effect’ (assessment FOR learning).(5)
Feasibility and acceptability considerations
The aforementioned benefits of the MCQ in terms of feasibility are largely transferable to the EMQ format. It has been forwarded that MCQs are in fact more difficult to write than EMQs, because of the requirement to create plausible distractors that are unambiguously true or false.(3, 10) Furthermore, efficiency is enhanced using an EMQ format, since many questions can use the same answer set.(14) When one considers that optical and computer marking may be equally applicable to EMQs, familiarity with the MCQ format, reliance on existing MCQ pools and general resistance to change may remain the only practical deterrents to EMQ adoption from a feasibility and staff acceptance perspective.
Assessment as and for learning
MCQs are adopted primarily when the knowledge being assessed can be broken down into facts that the student should know. As a result, MCQ adoption tends to drive ‘rote learning’ and the memorisation of trivia.(9)
It has been posited that “multiple choice examinations often place undue emphasis on recall and stimulate students to learn in a like mode.”(8) Memorisation of discrete facts, or worse still, of question pools commonly available through past papers or ‘black market’ sources, is unlikely to create the integrated knowledge base that underpins the safe clinical practice of health professionals. EMQs are not free from such risks, but the diagnostic and management domains to which this tool is well suited, together with the increased number of potential answers, exert a certain protective effect.(3, 4) Furthermore, the challenge to link theoretical knowledge with clinical presentations, diagnosis and management concepts affords a unique capacity for EMQ preparation to develop important foundation skills in clinical reasoning.(3)
A further criticism of the MCQ lies in the limited feedback, and therefore ‘assessment for learning’, that it affords relative to alternative written test methods. Analysis of EMQ test items can provide additional information about the strengths and weaknesses of the student and the programme.(4) This is relevant not only at an undergraduate level, but also in continuing professional development (CPD) after graduation. MCQs are often used in this context; however, they are not usually mandatory, since documentation is not required as part of CPD requirements for dentists in Australia. Such assessment items therefore presumably have their major function in self-appraisal and remediation. The potential direct implications that such learning may have for patient treatment and outcomes demand feedback for learning that is best able to encourage mastery of content and integration of concepts, avoiding false confidence in the application of new knowledge. It is doubtful whether the MCQ meets this need, and further research is required to illuminate the potential for the EMQ and other alternatives to address these shortcomings.
Conclusion
In the almost 100 years since the MCQ’s inception, guidelines have been created to direct item writing and minimise the hints and cueing that can arise; however, the format remains subject to difficulties in accounting for guessing and chance.(7) A more recent iteration of the MCQ is the EMQ. This assessment tool provides greater potential to distinguish between good and poorly performing learners, is more favourable in terms of the test preparation that it encourages, and has greater capacity for feedback to both students and faculty.(3, 4) In addition, the EMQ format reduces the possible bias and need to correct for guessing, and is more efficient in its construction than the MCQ, whilst maintaining the MCQ’s key benefits of ease and reproducibility of marking.(3, 10) On this basis, educators may consider adoption of the EMQ in preference to the traditional MCQ format, especially where intermediate steps of reasoning are to be encouraged through concept linking.(3) Such a transition should be viewed as evolutionary rather than revolutionary, but is progressive nonetheless.
REFERENCES
1. Albino JE, Young SK, Neumann LM, Kramer GA, Andrieu SC, Henson L, et al. Assessing dental students’ competence: best practice recommendations in the performance assessment literature and investigation of current practices in predoctoral dental education. Journal of Dental Education. 2008;72(12):1405-35.
2. Newble D, Baxter A. A comparison of multiple-choice and free-response tests in examinations of clinical competence. Medical Education. 1979;13:263-8.
3. Fenderson B, Damjanov I, Robeson M, Veloski J, Rubin E. The virtues of extended matching and uncued tests as alternatives to multiple choice questions. Human Pathology. 1997;28(5):526-32.
4. Bhakta B, Tennant A, Horton M, Lawton G, Andrich D. Using item response theory to explore the psychometric properties of extended matching questions in undergraduate medical education. BMC Medical Education. 2005;5:9.
5. Norcini J, Anderson B, Bollela V, Burch V, Costa M, Duvivier R, et al. Criteria for good assessment: Consensus statement and recommendations from the Ottawa 2010 Conference. Medical Teacher. 2011;33:206-14.
6. Davidson C. How We Measure. In: Davidson C, editor. Now You See It: How the Brain Science of Attention Will Transform the Way We Live, Work, and Learn. New York: Penguin Books; 2011.
7. Case S, Swanson D. Constructing Written Test Questions For the Basic and Clinical Sciences. 2002 [cited 2012 30th September]. Available from: http://www.nbme.org/publications/item-writing-manual-download.html.
8. Wilson R, Case S. Extended Matching Questions: An alternative to Multiple-choice or Free-response Questions. Journal of Veterinary Medical Education. 1993;20(3).
9. DiBattista D, Kurzawa L. Examination of the Quality of Multiple-choice Items on Classroom Tests. The Canadian Journal for the Scholarship of Teaching and Learning. 2011;2(2):1-23.
10. Jolly B. Written examinations. In: Swanwick T, editor. Understanding Medical Education: Evidence, Theory and Practice. Malaysia: Wiley-Blackwell; 2010.
11. Case S. The development and evaluation of a new instrument to assess medical problem solving. Dissertation Abstracts International. 1983;44:1764.
12. Espinosa M, Gardeazabal J. Optimal correction for guessing in multiple-choice tests. Journal of Mathematical Psychology. 2010;54(5):415-25.
13. Zimmerman D, Williams R. New look at the influence of guessing on the reliability of multiple-choice tests. Applied Psychological Measurement. 2003;27:357-71.
14. Swanson D, Holtzman K, Clauser B, Sawhill A. Psychometric characteristics and response times for one-best-answer questions in relation to number and source of options. Academic Medicine. 2005;80(s):S93-6.