In this post I argue that making GCSEs compulsory for all, which means enforced failure for some, is profoundly unethical. I also outline an alternative model for secondary school assessments.

Educational assessment is like a festival of abstract jargon, so please bear with me. As far as I can gather, when people sit an exam you can set grade boundaries in three main ways: criterion-referenced, norm-referenced and cohort-referenced testing.

An example of criterion-referenced testing is the driving test. There are a number of predefined criteria – don’t crash etc – and if you meet enough of them, you pass, and if you don’t, you fail. Another example is music grades. In the UK, music exams are graded from 1 to 8, with grade 8 being the level required for entry to music college. Some school-based assessments are also criterion-referenced, such as Key Stage 2 SATs and the International Baccalaureate (IB). It is worth noting that criterion-referencing does not rule out the ability to provide students with grades: while the driving test is a straight pass-fail, in music exams, SATs and the IB students are awarded grades to reflect varying degrees of success.

In norm-referenced testing, the grade is pre-determined by comparing your score with that of a ‘norm’ group who sat the test previously. The IQ test is a good example of a norm-referenced test. The whole point of an IQ test is to rank everyone who takes the test against the population at large, with 100 being the mid-point, or average. Cognitive Abilities Tests (CATs), which are often completed by students as they enter secondary school, work in very much the same way.

Cohort-referencing is very similar to norm-referencing – however, rather than being graded against a pre-existing ‘norm’ group, as with the IQ test, in cohort-referenced testing grades are divided up among the cohort taking the test. Job interviews and theatre auditions are examples of cohort-referenced testing. In selective settings such as these, where there are only so many vacancies available and you can only appoint out of the people who apply, only some can ‘make the cut’ – and so cohort-referenced testing seems entirely appropriate.
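
For readers who find code clearer than jargon, the three approaches described above can be sketched in a few lines of Python. This is only a toy illustration, not how any exam board actually works: the function names, the marks and the grade boundaries are all invented for the example.

```python
from bisect import bisect_right


def criterion_referenced(score, boundaries):
    """Grade from fixed, pre-published boundaries; other candidates are irrelevant.

    `boundaries` holds the minimum mark for grade 1, grade 2, ... in ascending order.
    A return value of 0 means no grade was reached.
    """
    return bisect_right(boundaries, score)


def norm_referenced(score, norm_scores):
    """Percentage of an earlier 'norm' group that scored below this candidate."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


def cohort_referenced(scores, places):
    """Mark the top `places` candidates in this sitting as successful.

    Ties at the cutoff all succeed, so slightly more than `places` may pass.
    """
    cutoff = sorted(scores, reverse=True)[places - 1]
    return [s >= cutoff for s in scores]


# Hypothetical marks, invented purely for illustration.
cohort = [55, 90, 72, 68]
improved = [s + 20 for s in cohort]  # everyone does 20 marks better
boundaries = [40, 60, 80]            # made-up minimum marks for grades 1 to 3

criterion_before = [criterion_referenced(s, boundaries) for s in cohort]
criterion_after = [criterion_referenced(s, boundaries) for s in improved]
cohort_before = cohort_referenced(cohort, places=2)
cohort_after = cohort_referenced(improved, places=2)
```

With these made-up numbers, every criterion-referenced grade rises or holds when the whole cohort improves by 20 marks, while the cohort-referenced outcome is exactly the same as before: the same two candidates make the cut and the same two do not.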

So, what kind of testing do we use at GCSE?

Ofqual claim that we don’t use norm-referenced assessment at GCSE. In a technical sense, this is true – we don’t use a purely norm-referenced approach, because “as the cohorts for different subjects vary, awarding grades using the same pre-determined set of percentages would make the same grade in different subjects have a very different meaning” (Ofqual, 2014, p5).

What we use instead is a kind of cooked version of normative assessment known as the ‘statistical approach’, which was described in 2012 by John Dunford, then chair of the Chartered Institute of Educational Assessors, as a “strange mix of criterion and norm-referencing”.

While it may be strange, I think it is clear that the way in which we grade GCSE exams is fundamentally norm-referenced, or more precisely cohort-referenced in nature. This can be seen clearly in the table to your right, which outlines how the old A*-G grading system maps on to the new 1-9 system (source: Joint Council for Qualifications).

Given the Ofqual definition of norm-referenced assessment as that in which “the proportion of each grade available to the cohort is pre-determined”, it is difficult to see how the system we have could reasonably be described as “not norm-referenced” (or cohort-referenced for any pedants out there).

So what? What’s wrong with norm-referenced testing?

Often, conversations about the new assessment system focus on the top end: how many top grades will be available, what will count as a pass, what will be the new equivalent of the C/D borderline and so on? For example, just this week a top civil servant suggested that only 2 students in the UK will get straight level 9s. The current Secretary of State for Education also announced a revision: whereas the DfE had previously said that grade 5 would be the equivalent of a ‘good pass’ (i.e. that required by colleges and employers), a grade 4 will now be a ‘standard pass’ and a grade 5 will be a ‘strong pass’.

Setting aside any Spinal Tap comparisons (let’s take this baby all the way up to 11), all of this focus on pass rates overlooks an important truth. Look at the bottom half of the table above: no matter how hard they work – and no matter what raw scores they achieve – our current system guarantees that 32.1% of students (16.7 + 8.1 + 4.1 + 2.0 + 1.2) will not meet the level required for a “standard pass”, and will therefore have to repeat their exams post-16.
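
To make the arithmetic concrete, here is a minimal sketch in Python. It sums the five shares quoted above, then shows what a fixed share implies under one big simplifying assumption: that the boundary is set by rank alone (the real ‘statistical approach’ is more involved than this). The cohort of 1,000 candidates, their marks and the function names are all hypothetical.

```python
from decimal import Decimal

# The five shares quoted in the text, kept as decimals to avoid float rounding.
shares_below_pass = [Decimal("16.7"), Decimal("8.1"), Decimal("4.1"),
                     Decimal("2.0"), Decimal("1.2")]
total_below = sum(shares_below_pass)  # 32.1


def boundary_for_fixed_share(raw_marks, share_below):
    """Lowest mark that clears the boundary when `share_below` percent must fall under it.

    Simplifying assumption: the boundary is set by rank alone.
    """
    ranked = sorted(raw_marks)
    n_below = int(share_below * len(ranked) / 100)
    return ranked[n_below]


def count_below(raw_marks, boundary):
    """Number of candidates whose mark is under the boundary."""
    return sum(1 for m in raw_marks if m < boundary)


# Hypothetical cohort of 1,000 candidates with distinct marks, invented for illustration.
cohort = list(range(1000))
improved = [m + 50 for m in cohort]  # every candidate gains 50 marks

boundary_before = boundary_for_fixed_share(cohort, total_below)
boundary_after = boundary_for_fixed_share(improved, total_below)
```

In this toy cohort the boundary simply moves up by 50 marks when everyone improves by 50 marks, and 321 of the 1,000 candidates still fall below it in both cases.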

Is everyone OK with this? Because it doesn’t have to be this way.

Why don’t we have a criterion-referenced system?

Back to Ofqual:

“When GCSEs were first being developed in the mid-1980s the Government’s intention was that criteria-related grades would be introduced as soon as practicable with candidates who reached the required standard being awarded those grades. Despite heroic efforts, it proved impossible in practice to meet that intention. So GCSEs have never been criterion referenced” (Ofqual, 2014, p5-6).

And apparently, that’s it – there have been some unspecified “heroic” efforts at criterion referencing, and now the idea has been laid to rest for good. (Ed – I would love to hear more about these heroic efforts – if any readers can shed some light, please comment below).

What’s wrong with norm-referenced testing?

Daisy Christodoulou has written in defence of norm-referencing here. Essentially, her argument echoes that of Ofqual – criterion-referencing is difficult to get right, and so maybe norm-referencing is the least bad option. It is certainly true that criterion-referencing is not without its problems. As Debra Kidd wrote recently:

“Fixing the results [through norm-referenced testing] protects children from a catastrophic drop in results when government ministers have fiddled with the exam system. It creates stability. The alternative is what we saw with KS2 SATs last year – a criteria based system – where the % of children meeting expected standard fell from 80% to 53%. A drop like that at GCSE would be disastrous… it seems like the fairest option in a flawed system.”

The way the debate is currently framed (i.e. within a system of compulsory testing), we have two choices – criterion-referenced and norm-referenced testing – and the latter is the least bad model.

Daisy concludes:

“Despite all the real technical flaws with criterion-referencing… there is still an element of hostility to norm-referencing amongst many educationalists. In my experience, I sense that many people think that norm-referencing is ‘ideological’ – that the only people who advocate it are those who want to force pupils to compete against each other. Nothing could be further from the truth.”

This is where we part ways. I do have a problem with norm-referenced assessment, but it isn’t ideological. It’s ethical. I have changed my mind on this. A few years ago, I accepted norm-referenced testing as the best way to grade students. My thinking was as follows: exams can vary in difficulty from one year to the next, but ability probably stays more or less constant from one cohort to the next; norm-referenced testing tells you how well someone performed within the context of their cohort, and that is preferable to a situation where everybody in one cohort gets good grades because the exam happened to be easier that year.

Since then, however, I have thought again. My problem is not to do with competition, but compulsion. In a compulsory system of norm (or cohort)-referenced testing, no matter how hard everyone works or how well they perform, a certain proportion of them are destined for failure before they’ve even started the course. As Tom Sherrington wrote in his amusing, if genuinely angry, blog post Nicky Morgan vs the Bell Curve:

“The thing is this: by definition there are only a limited number of places on the bell-curve that can be called ‘Good GCSEs’. You’ve decided to give a pejorative label (implicitly ‘Bad GCSEs’) to about 50% of all grades. Now, instead of Grades 1-4 at GCSE representing any sort of achievement, they’ve been killed stone dead. Nice work. That didn’t take you long.”

The fact that grade 4 has since been reclassified as a “standard pass” is entirely beside the point. My question is this: how can we have arrived at the point that it’s OK to force hundreds of thousands of young people to sit exams every year when we know in advance that by definition, 32.1% of them will have to fail? Is this really necessary?

What’s compulsion got to do, got to do with it?

What must it feel like to be forced to sit an exam you know – and your teacher knows – you are going to fail? At what point did we decide as a society that it’s a good idea to force young people to sit exams at all, regardless of their wishes? Is this really in their best interest?

There seems to be an idea floating around that some must fail in order for the passes to mean something, or else we have a vacuous culture where “all must have prizes”. But this is palpable nonsense. Do all children need to take a surfing exam at age 16 in order for some people to be considered good at surfing? Of course not. Do all people have to sit their driving test at age 18? No. Why then do we feel the need to apply this madness to subject learning? If gaining a pass in GCSE English and maths is to remain a point of entry for the vast majority of jobs and college courses, then people will take it when they are good and ready. For lots of reasons, not everybody is ready to sit high stakes exams in the May of their 16th year. Those who do not wish to should not be compelled to – just as some people choose not to learn to drive, or to surf.

You might say “Ah well, GCSEs are different because people need to be literate and numerate whereas you don’t necessarily need to drive or surf”. But this is a bad argument, because passing GCSE English and maths is not the same thing as being functionally literate and numerate. GCSE maths is way harder than the Functional Skills maths exam. If functional literacy and numeracy were the grand scheme here, it would be the Functional Skills exams that are compulsory. But they’re not. If anything, forcing 30% of people to fail at GCSE English and maths is counterproductive with regard to promoting functional literacy and numeracy; repeatedly branded failures throughout their school career (let’s not forget, GCSEs are not the only compulsory exam – they are simply the high stakes exams that come at the end of compulsory schooling), they develop a bad relationship with words and numbers and actively avoid trying to improve themselves in this regard.

I cannot for the life of me understand why we as a society have decided that forcing a young person to take an exam – regardless of their ability or affinity for that subject – is an acceptable thing to do. A common argument you hear is that students who are initially resistant to a subject are later grateful for having been ‘put through the ordeal’ – that it is ‘tough medicine’ but worth it in the end. But I only ever hear this from teachers. In my ten years as a teacher I have spoken to hundreds of young people about how they feel about exams and I have yet to encounter one who peddles the tough medicine line, whereas it is commonplace for year 11 students to say things like “I used to really love [insert subject], but after the exam I never want to hear of it again.”

It doesn’t have to be this way – does it?

I work part-time at an alternative education provider – the Self-Managed Learning College in Brighton, where students aged 9-16 are not compelled to do exams, or indeed to learn about anything that they don’t want to learn about. Many of them do exams – indeed a number do exceptionally well with very little “knowledge input” from tutors – while others choose not to do GCSEs at all. This is all fine. Some teachers might find this difficult to believe, but actually the world doesn’t grind to a halt when we stop forcing children to fail exams against their wishes. What generally happens is that they spend their time getting really good at something else, like art or music or computing or film-making or enterprise or cooking or animation…

Let’s consider for a moment what a non-compulsory, criterion-referenced secondary school system might look like. I think it is fair to say that the driving test method – a single test with a straight pass/fail – would not lend itself to school-based assessments. However, the fact that people take a driving test when they are good and ready – and that they can retake the test as many times as they wish – is worth bearing in mind. But why is it so difficult to imagine a world in which GCSE exams were graded in a similar way to piano grades?

Instead of sitting a single exam in which all students are subdivided into grades of relative success and failure, why can we not have a system where students take their Grade 1 maths exam – which could either carry a straight pass/fail or could be subdivided into fail/pass/merit/distinction – when they are good and ready? In this model, gaining a Grade 1 would be cause for celebration – as in, “YAY, I passed my Grade 1 maths exam, next I’ll have a go at Grade 2” – rather than “OH NO, I sat a maths exam and I only got a Grade 1, which must mean I’m stupid coz the government is making me retake it at college”.

Here’s a question: what would happen if, when teaching young people about stuff, we actually gave them a choice about whether they wanted to take the exam at all? The second half of this sentence is something of a heresy in some educational circles, but I’m going to say it anyway: it is quite possible to learn something without sitting assessments. When somebody learns the piano, they get to a point where their teacher (assuming they have one) says ‘I think you’re ready to take your Grade 1 exam’. It is then up to the student whether they take it or not. When I learned the piano as a child, I chose not to sit any of the formal exams. Despite this, I know that I progressed to around a grade 5, which was fine for me. Classical music bored me and I had no desire to go to music college, but I was good enough to play in bands – an ability I still enjoy almost 30 years after my piano lessons finished.

Also, and this is perhaps another heresy – sometimes, people stop learning stuff. In year 7, I had oboe lessons for a few months. It wasn’t for me – it made my lips ache and the way saliva drips out onto your shoes is really disgusting. And so I stopped – and this is fine as well. Actually, to be fair the school system is kind of OK with children stopping learning things, but only if every young person in the country goes through the same pruning process in unison – and even then, only at age 14 (where you can choose around five non-compulsory subjects, although in reality the options are extremely limited by schools seeking to maximise their league table position), age 16 (whittle it down to three) and age 18 (now pick just one, if you’re lucky).

All this standardisation and top-down control over what young people can learn and when is rather odd, isn’t it? Am I alone here?

According to Daisy Christodoulou and Ofqual, writing criterion-referenced exams is too difficult and so we no longer bother. Except for when we do, like with music grades and SATs and the IB, which seem to work just fine (except for when they are compulsory – see above). In fact, I suspect that many of the problems with criterion-referenced testing would disappear if we did away with compulsory testing, and let young people choose a) what they study, b) whether they want to be assessed, and c) at what level.

Is it really so difficult to imagine a school system where for each subject, GCSE exams are split into graded exams, in the same way that music grades work, and where students can choose whether to take the assessments? Can somebody please explain to me why this is not possible or desirable?