Load to Prediction - Variation Theory

In response to Christian Moore-Anderson's question on how Predictive Processing / Active Inference theory applies to aspects of Variation Theory

Jun 29, 2025

In response to my blog From Load To Prediction, which proposes that Predictive Processing / Active Inference theory should replace the information processing/memory model of human cognitive architecture, Christian Moore-Anderson asked:

"Following Spencer-Brown, constructivists and enactivists, a critical operation of the mind in bringing forth a world is in the act of drawing distinctions. Does Predictive Processing discuss how distinctions are made or how teachers can provoke students to make them?"

Short answer: Yes — drawing distinctions is fundamental to Predictive Processing and Active Inference. As for how teachers can provoke students to make them — that's something we're all figuring out together.

In Predictive Processing, we develop a generative model that consists of a hierarchy of interlinked hidden states that allows us to predict our sensory stream of signals. When we encounter sensory data that doesn’t match our prediction, a prediction error is generated. The prediction error signals a difference between our internal model and the external world. Our Bayesian inference system seeks to resolve that difference by adjusting a latent variable within the currently selected branch of the generative model or switching to a better-fitting adjacent branch. It then continues predicting with the updated model, receiving feedback from new prediction errors to test its fit. These rapid prediction-error-adjustment loops are fundamental to perception and cognition. The currently selected branch of the generative model constitutes our conscious experience of reality.
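The prediction-error-adjustment loop described above can be sketched in a few lines of Python. This is a toy illustration only, not a model of the brain: the single number standing in for a latent variable, the function names `update_belief` and `run_loop`, and the fixed `learning_rate` are all assumptions made for the example.

```python
def update_belief(belief, observation, learning_rate=0.3):
    """Nudge the belief toward the observation in proportion to the prediction error."""
    prediction_error = observation - belief  # mismatch between model and world
    new_belief = belief + learning_rate * prediction_error
    return new_belief, prediction_error


def run_loop(initial_belief, observations, learning_rate=0.3):
    """Predict, compare, adjust, repeat. Returns the final belief and the error sizes."""
    belief = initial_belief
    error_sizes = []
    for observation in observations:
        belief, error = update_belief(belief, observation, learning_rate)
        error_sizes.append(abs(error))
    return belief, error_sizes


if __name__ == "__main__":
    # The world keeps sending the same signal; the belief starts far from it.
    final_belief, error_sizes = run_loop(0.0, [10.0] * 20)
    print(final_belief, error_sizes[:5])
```

Each pass shrinks the prediction error a little, which is the sense in which the model "tests its fit" against new data. A real generative model would have many linked latent variables and would weight errors by their reliability; the sketch keeps only the loop itself.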

When we encounter prediction errors that cannot be resolved by parameter tweaking or switching branches, the system infers that its generative model structure may need to change. This is the moment of discernment — detecting a meaningful difference that the current model cannot account for. The system generates hypotheses about new model structures that could better explain the incoming data, evaluates their expected prediction error (free energy), and selects the structure with the best fit. If this new configuration proves useful, the brain flags it as salient, and processes begin to consolidate it into long-term structure.
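One way to picture the step from "tweak the parameters" to "change the structure" is as a comparison between candidate model structures. The sketch below is an assumption-laden stand-in: it uses the Bayesian Information Criterion (BIC) as a rough proxy for free energy, treats observations as single numbers, and compares only two hypothetical structures, "one category" and "two categories". None of these choices come from the Predictive Processing literature itself.

```python
import math


def gaussian_log_likelihood(xs):
    """Log-likelihood of xs under a Gaussian fitted to xs itself."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)  # floor avoids log(0)
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var) for x in xs)


def bic(log_likelihood, n_params, n_points):
    """Lower is better: rewards fit, penalises extra structure."""
    return n_params * math.log(n_points) - 2 * log_likelihood


def score_one_category(xs):
    return bic(gaussian_log_likelihood(xs), 2, len(xs))  # mean and variance


def score_two_categories(xs):
    """Try every split of the sorted data into two groups and keep the best fit."""
    xs = sorted(xs)
    n = len(xs)
    best = -math.inf
    for cut in range(2, n - 1):  # each group keeps at least two points
        left, right = xs[:cut], xs[cut:]
        ll = (gaussian_log_likelihood(left) + len(left) * math.log(len(left) / n)
              + gaussian_log_likelihood(right) + len(right) * math.log(len(right) / n))
        best = max(best, ll)
    return bic(best, 5, n)  # two means, two variances, one mixing weight


def preferred_structure(xs):
    if score_two_categories(xs) < score_one_category(xs):
        return "two categories"
    return "one category"


if __name__ == "__main__":
    familiar = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
    mixed = [0.8, 0.9, 1.0, 1.1, 1.2, 8.8, 8.9, 9.0, 9.1, 9.2]
    print(preferred_structure(familiar), "|", preferred_structure(mixed))
```

The penalty term matters: the richer structure only wins when the extra distinction explains the data much better, which mirrors the idea that a new structure is adopted only when the existing one cannot account for the errors.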

Crucially, these generative model updates are constrained: they can only build on what's already there. New structures must be composed from existing conceptual and inferential building blocks. This aligns with the constructivist principle of building on prior knowledge and with Kristopher Boulton’s Unstoppable Learning model, which emphasises ensuring that students have the foundational “atoms” to build new understanding.

To illustrate how Predictive Processing and Active Inference address the idea of drawing distinctions or discernment as critical to "bringing forth a world," I turn to Christian Moore-Anderson's excellent book The Difference Maker.

A child growing up in a monolingual Russian village has a generative model finely tuned to the sounds of Russian. When comprehending another speaker, they are constantly predicting the next word or sound and receiving prediction errors, which are corrected within the scope of their spoken language generative model. These are quickly resolved prediction errors that are inherent to perception and understanding. When a Finnish speaker arrives, the child tries to understand using their native language model but encounters repeated, unresolvable prediction errors. These errors signal a fundamental mismatch, prompting the brain to infer that this isn’t just unfamiliar Russian — it’s something categorically different. The child’s generative model restructures, creating a new higher-order distinction: there can be different types of verbal expression — languages. That means there can be variants of language, the mother tongue Russian being one, and the foreign language being another.

The new language is not a completely blank slate — there are noticeable similarities in patterns of sound. This is a new language model, but it still belongs to the category of a language generative model, inheriting many of the properties of the existing one. The differences will be learned (updated into the generative model) through experiencing the new language and noting patterns and contrasts. This includes passively listening in different contexts and inferring salient differences and similarities. Over time, a branch of the generative model begins to form for Finnish. This new branch develops within a new hidden state, which we might label “languages,” which now contains variants such as Russian, Finnish and others. What began as confusion gives way to comprehension, and as more experience is gained in the foreign language, the overarching language model becomes further refined in terms of its common features, statistical patterns, and variables or parameters. This is Predictive Processing in action: a brain that learns by noticing when the world doesn’t fit.

Classroom Practice

To return to the second part of the original question — “how can teachers provoke students to make them (discernments)?”

Predictive Processing and Active Inference don’t yet dictate radically new pedagogical practices — but they provide a compelling unifying theory for many existing good practices, particularly those grounded in variation theory and explicit instruction approaches, such as modelling, guided practice, and scaffolding.

These theories acknowledge that teachers cannot directly “write” into a student’s generative model. Instead, we must create structured experiences where students encounter differences that their current models cannot resolve. This is where the brain’s natural learning engine kicks in: through surprise, prediction error, and iterative refinement.

But humans have a remarkable advantage: we can imagine, simulate, and talk about situations. Teachers can design symbolic, verbal, and visual environments that simulate experience. We can carefully guide attention, introduce contrast, and walk through comparisons and analogies to provoke model updates — all without the student needing to live through the full range of real-world experiences.

We can:

Set up moments of surprise (targeted prediction error) where current models will fail.
Structure contrasts and examples/non-examples to support correct generalisation.
Use diagrams, language, and structured rehearsal to scaffold and strengthen the new model.
Use spaced retrieval and feedback to consolidate model updates into long-term generative structure.

Predictive Processing and Active Inference suggest that we should design instruction to carefully manage surprise. We want to provoke salient prediction errors precisely where students need to notice a difference and revise their model. At the same time, we must ensure stability and predictability around the features we want students to see as shared. Instructional and external noise — irrelevant variation — can cause prediction errors in the wrong place, leading to unhelpful model updates.

This is why variation theory aligns beautifully with Predictive Processing. It helps us control contrast and sameness so that students draw the right distinctions. Clear diagrams, well-sequenced examples, and carefully chosen prompts scaffold the student’s inference process. Good instruction isn’t just “delivering content” — it’s engineering the environment so the student’s brain does the learning it was designed to do.

But initial discernment isn’t enough. The brain consolidates model changes only when they prove salient and useful over time. If a new generative structure is not reactivated soon enough, it decays. Reuse, rehearsal, and retrieval are essential.

This is where the Ebbinghaus forgetting curve, retrieval practice, and spaced repetition align with the neurobiology: synaptic connections strengthen through repeated successful activation, and weak or rapidly corrected prediction errors don’t trigger strong updates. The implication for teachers? Once students have drawn a distinction, don’t move on too quickly. Structured rehearsal and practice must explore the variants and similarities, and must introduce the effects of different contexts and different task structures.

There also needs to be consolidation time, in which the model updates newly formed in the hippocampus and striatum are transferred to the long-term areas of the brain to begin embedding for long-term use. This consolidation happens during wakeful rest and sleep. We have to revisit and reuse the newly learnt model updates in spaced retrieval practice so the brain gets the signal that they are important and need embedding for the long term. As we perform this retrieval practice, we not only reactivate the updates to increase their salience; we can also refine our models through careful interleaving and use in differing contexts.

I will revisit this in a future blog, working title: “How Learning Lasts - Why spacing works, and consolidation takes time, rest, and reactivation. The brain updates more like a plant growing than a hard drive writing data.”
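To make the link between the forgetting curve and spaced retrieval concrete, here is a toy Python sketch. It assumes the common exponential form of the forgetting curve, retention = exp(-t / stability), and it assumes, purely for illustration, that every review doubles the stability. The names `retention` and `retention_at`, the `boost` factor, and the starting stability of one day are hypothetical choices, not measured values.

```python
import math


def retention(days_since_review, stability):
    """Exponential forgetting curve: the chance a memory is still retrievable."""
    return math.exp(-days_since_review / stability)


def retention_at(horizon, review_days, initial_stability=1.0, boost=2.0):
    """Retention on day `horizon`, given reviews on the listed days.

    Assumption: each review resets the clock and multiplies stability by `boost`.
    """
    stability = initial_stability
    last_review = 0
    for day in sorted(review_days):
        if day > horizon:
            break
        stability *= boost
        last_review = day
    return retention(horizon - last_review, stability)


if __name__ == "__main__":
    print("no reviews:", retention_at(30, []))
    print("spaced reviews:", retention_at(30, [1, 3, 7, 14]))
```

With no reviews the memory has all but vanished by day 30, while four spaced reviews leave a substantial share retrievable. The doubling rule is a simplification; it stands in for the idea that each successful reactivation tells the brain the update is worth keeping.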

These insights resonate with Christian Moore-Anderson’s book The Difference Maker, Ausubel’s meaningful learning theory as explained in Sarah Cottinghatt’s Ausubel’s Meaningful Learning in Action, Kristopher Boulton’s Unstoppable Learning, and, worth revisiting, Engelmann’s Direct Instruction Model. Each of these models is supported by a core Predictive Processing / Active Inference insight: learning happens through experience of surprise and similarity, through reuse and refinement, and through careful instructional design that aligns with how the brain naturally learns.

Self Reflection

For myself, I have found that the heuristic ideas of managing predictability, designing surprise, and thinking about how I can engineer and embed students’ generative model updates have improved my practice. Beyond this, the same core theory lets me understand and adapt to my neurodiverse and SEMH learners, helps me understand social relationships and emotions, helps me understand depression and trauma, and helps me manage dysregulation events with my students. It gives me a greater level of empathy and understanding of my students and of how different their lived versions of reality can be.

Let’s keep refining what this means for classrooms — and keep drawing distinctions ourselves, as teachers and thinkers. Please contribute your thoughts, ask questions, challenge, and move the debate on by responding in the comments.

Predictably, Correct
