There are, of course, a host of huge problems with the reliability of the end-of-key-stage assessments (“SATs”) in English education. They are subject to teacher and headteacher policy manipulation (how much time a school or a teacher decides to spend on English and maths, say); to the statistical slop within the tests themselves (one look at the overlapping standard deviations would destroy your faith in the outcomes); to year-on-year variation in the quality and “hardness” of the tests (though the DfE will say that there is none); and to the setting and preparation a school makes for the taking of the tests themselves (relaxed classroom testing is completely different from silence in a hall).
So the reliability of the tests does not meet the necessary criteria, which must include:
- standardised test of a
- standardised curriculum taught in a
- standardised manner, and then sat under
- standardised test conditions when everyone has a
- standardised understanding of the stakes attending the assessment.
This very rarely happens, and so at each stage where standardisation fails, the reliability of the tests diminishes.
The same is true of book scrutiny, and one purpose of this post is to point you to a new blog post from Becky Allen, entitled The Book Scrutiny Monster. It is absolutely essential reading for all teachers, for all heads who use book scrutiny as a means of checking “quality of education”, and for any inspector. In fact, if you are expecting an Ofsted visit, I strongly suggest that you print it off and, along with the safeguarding information you present, give the inspector a copy, and do not let them through the security door until they have signed to say they have read it.
At the very least it will change the initial conversation, and put “book scrutiny” back in its proper place.
Becky Allen makes a range of points about book scrutiny, and I am not going to repeat them here. Daniel Muijs is the author of the research background to Ofsted’s most recent approach to book scrutiny, as contained within the new framework. That research is not concerned with the effectiveness of book scrutiny, but with whether Ofsted can be sure that an inspector can use it reliably. Book scrutiny is a highly non-standardised form of assessment: variation within schools is far higher than variation between schools, the language and descriptors used are vague and imprecise, and there is huge variation between subjects in the way that books are used. But there is at least an acknowledgement from Ofsted, which heads need to hear really clearly, that book scrutiny yields poor information on the actual quality of teaching and cannot reliably be used to measure progress over time.
Where book scrutiny is really useful is in the moderation of quality (and the comparative judgement approach that Daisy Christodoulou has been urging on us is perhaps the best way of doing this – so it hardly counts as book scrutiny!), in leadership’s assessment of the quality of written feedback (a small part of one factor that we know has some impact on children’s learning), in checking whether children are recording their work in writing, and in checking the quality of handwriting and other transcription issues. As a head, I did all of these, and (hopefully) the outcomes were welcomed and the feedback given to year-group pairs attended to. At our SIAMS inspection two years ago, there was an acknowledgement that in one year group there was simply not enough RE recorded. It was a fair cop, and it stopped RE from getting the same outstanding grading as all the other parts of the inspection.
But it is such a vague way of assessing progress generally that it creates more problems than it solves, and certainly raises more questions. A comparative moderating meeting between schools held in the spring of 2018 showed the variation in the way schools do it – holistically or mechanically – and made me realise that until you take appreciation and celebration, rather than scrutiny, as the benchmark of your teacherly attitude to writing, you will forever be left with more questions than answers, and new, unnecessary questions will emerge too.
So read Becky’s post, and read the stuff on pedagogical styles and her reservations, which I share, about having a single quality of education judgement. Her solution? Do an in-depth study of 20 schools:

> that all have reasonably similar demographic profiles and for each one carry out a month long inspection with at least 10 subject specialists on site. During that month, inspectors spend time with leadership and in their subject departments observing lessons, talking to teachers, testing students, and so on, to work towards an overall judgement of school quality. Now, I am not suggesting that this would be true school quality, but it would be a fuller picture of school provision than the status quo. Just before this month-long inspection begins, we send in a different set of inspectors to carry out a standard Ofsted inspection, without reporting the judgment. At the end of the project, we would be able to compare the ranking of the schools under the conventional and the extended inspection. If rankings of the 20 schools were considerably different, this would suggest that our inspection system is not fit-for-purpose.
This is probably the only way of giving any meaning at all to Ofsted judgments. Otherwise we may as well scrap the whole thing (my preferred solution!).