Assessment is sampling, and this has consequences

The structure of knowledge is a complex thing. Very rarely can it be codified as a simple set of statements. There are facts, yes, but also links, opinions, metaphors, context, images (both mental and physical) and much more to boot. Exam boards put enormous effort into trying to distinguish between those students who have ‘got it’ and those who have not. As a teacher setting a low-stakes assessment, you find yourself trying to second-guess the exam board.

So how does the exam board go about it? They cannot test everything that has been learned over the course of two years, so they sample. They ask a range of questions that taken together should give a reasonably accurate view of the overall attainment of the student.

The same is true when you set a low-stakes assessment such as an end-of-topic quiz. Even if you include a full list of facts that have been covered in the topic, you are still only sampling the much more complex structure of knowledge that is (you hope) in the student’s head.

With students, sampling carries a risk. A student who passes a test may well believe that s/he has mastered the subject, but all the test actually shows is that s/he has mastered those items that were sampled. This is why the exam board must go to great lengths to ensure that their questions remain secret until the last moment. What a lot of teachers don’t realise, I suspect, is that the same principle applies to you.

There is an enormous difference between a student who aces a test on the first attempt, and one who achieves the same score after having spent time reading and rote-learning the specific questions. The latter case is clearly not a valid sample and is likely to lead to gross over-confidence in the student.

How, then, does this square with the principle of formative assessment that everything should be routinely assessed? The principle is simple. Like the scientist sampling from the lake, we must take multiple samples from different locations. Rather than use a single question, we should use multiple questions that address the same concept, but in different ways. As far as is practical, the student never sees exactly the same question twice.
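
To make the idea concrete, here is a minimal sketch of sampling from a bank of question variants so that a student never sees the same phrasing twice. The bank structure, concept names and function name are all illustrative assumptions, not Yacapaca's actual schema.

```python
import random

# Hypothetical question bank: several variants per concept.
QUESTION_BANK = {
    "photosynthesis": [
        "Name the gas produced by photosynthesis.",
        "Which gas do plants release in daylight?",
        "What is the waste product of photosynthesis?",
    ],
}

def pick_question(concept, already_seen):
    """Sample one unseen variant of a concept; reuse only if all are exhausted."""
    variants = QUESTION_BANK[concept]
    unseen = [q for q in variants if q not in already_seen]
    choice = random.choice(unseen if unseen else variants)
    already_seen.add(choice)
    return choice

seen = set()
first = pick_question("photosynthesis", seen)
second = pick_question("photosynthesis", seen)
# Two sittings, two different phrasings of the same concept.
assert first != second
```

The point is that each sitting samples the concept afresh, so a pass tells you something about the concept, not about a memorised question.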

If you are an assessment author, you are probably feeling a little faint at this point. Are you seriously expected to suddenly start writing five or ten times more questions than you already have? That is indeed how exam boards build their question banks, but there are shortcuts you can take. Here are my favourite three:

  1. Use other people’s questions. Yes, only you can write questions that address exactly the way you taught the topic. But other authors are writing questions aimed at the same exam syllabus, and their understanding of the requirement may be just as valid as yours. Giving students access to their perspectives is a good thing, not a bad thing.
  2. Time is on your side. A student will forget a specific question after a week or so, enabling you to recycle it. If you are following an Ebbinghaus revision schedule for your students, and you should be, then you can judiciously re-present old questions.
  3. Let auto-randomisation of options do the job for you. This is the weakest of my three solutions, but as it is zero work to implement, I would still recommend it as part of the mix.
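
The second shortcut can be sketched in a few lines: an expanding-interval schedule inspired by the Ebbinghaus forgetting curve, which tells you when it is safe to re-present an old question. The specific intervals here are illustrative assumptions, not a schedule Yacapaca prescribes.

```python
from datetime import date, timedelta

# Expanding review intervals, loosely following the Ebbinghaus
# forgetting curve. Illustrative values only.
REVIEW_INTERVALS_DAYS = [1, 3, 7, 14, 30]

def review_dates(first_seen, intervals=REVIEW_INTERVALS_DAYS):
    """Dates on which a question can reasonably be re-presented."""
    return [first_seen + timedelta(days=d) for d in intervals]

dates = review_dates(date(2024, 1, 1))
# The question reappears after 1, 3, 7, 14 and 30 days.
```

Each re-presentation both refreshes the memory and re-samples it, so recycled questions do double duty.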

Up to this point I have been careful to talk in terms of principles that apply to any low-stakes assessment regime, but I shall switch now to describing specifically what we are doing with Yacapaca.

The core of our approach has to do with keywords and key concept analysis. In order to present students with multiple perspectives on the same concept, we first need to know which questions actually address that concept. We have done a huge amount of work using both crowdsourcing and computational linguistics to link each of our 172,000 questions(!) with the concept(s) it addresses.

Having done that, we have built an engine that delivers Computer-adaptive Testing (CAT) streams. These are questions that are selected algorithmically, on the fly, to match certain criteria. By linking these into a detailed profile of the concepts each student has covered, we are able to offer a much more advanced approach to low-stakes assessment. Contrast this with a standard quiz that contains a static set of questions selected by an author.
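
As a rough illustration of what "selected algorithmically, on the fly" can mean, here is a toy selector that targets the student's weakest concept and matches question difficulty to the current ability estimate. The data model and numbers are hypothetical; Yacapaca's real engine is considerably more sophisticated.

```python
# Toy sketch of on-the-fly question selection against a student profile.
questions = [
    {"id": 1, "concept": "fractions", "difficulty": 2},
    {"id": 2, "concept": "fractions", "difficulty": 4},
    {"id": 3, "concept": "decimals", "difficulty": 3},
    {"id": 4, "concept": "decimals", "difficulty": 1},
]

def next_question(profile, bank):
    """Pick the weakest covered concept, then the closest-difficulty unseen question."""
    # Weakest concept first: lowest estimated ability.
    concept = min(profile["ability"], key=profile["ability"].get)
    candidates = [q for q in bank
                  if q["concept"] == concept and q["id"] not in profile["seen"]]
    if not candidates:
        return None
    # Match question difficulty to the student's current ability estimate.
    target = profile["ability"][concept]
    chosen = min(candidates, key=lambda q: abs(q["difficulty"] - target))
    profile["seen"].add(chosen["id"])
    return chosen

profile = {"ability": {"fractions": 2, "decimals": 3}, "seen": set()}
q = next_question(profile, questions)
```

Because the selection depends on the profile, two students working through the same topic will see different question streams, which is exactly what distinguishes a CAT stream from a static quiz.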

We currently use CAT streams in two places:

  • Homework, in which the questions present all the relevant concepts within a given topic. Topics are extracted from the syllabus you selected for the student set.
  • Revision, in which all completed topics are covered, thus ensuring that previous material does not get forgotten.

More on the differences here.

Recently I have been looking at how to make all this more explicit in Yacapaca, for both teachers and students. The new Results page is part of this. The next step will be to make the students’ revision log (which already exists) visible, and to gamify it appropriately. Watch this space.

