Relevance, prior knowledge, and why applying underlying theory matters
Predictive Processing as cognitive architecture, not an interpretive lens
Recent discussion of the Buchin and Mulligan study has converged on a shared surface conclusion: the knowledge taught in A, B and C did not meaningfully support learning D. On that point, there is little disagreement. Where the discussion becomes more interesting is not in that diagnosis, but in what we think it explains, and what we take to be left unexplained.
My purpose in writing about this study was not to highlight a flawed design, nor to offer an alternative narrative gloss on its null result. It was to make a more general point: research on learning cannot be methodologically adequate unless it is grounded in a clear theory of the cognitive mechanisms that are assumed to be operating. In this case, the key issue is not simply whether ideas “interlock”, but how the cognitive system determines, in real time, what counts as relevant prior knowledge at all.
It has been said that prior knowledge supports learning only when ideas genuinely interlock. This sounds like common sense, and at one level it is. But “interlock” is not an explanation. It names an outcome without specifying a mechanism. Ideas do not become relevant to one another merely by virtue of their content or their position within a domain. They must be selected, activated, and weighted as useful during online inference. A purely structural account of content dependencies can tell us which concepts are logically related, but it cannot tell us how a learner’s cognitive system decides which of its many available representations will actually shape learning in the moment.
This is the explanatory gap that Predictive Processing addresses. It treats prior knowledge not as a static store that is either present or absent, but as a set of competing expectations whose influence depends on their capacity to reduce uncertainty in a given context. On this view, relevance is not an effect, a strategy, or a pedagogical add-on. It is a core feature of the predictive architecture itself.
Seen through this lens, the Buchin and Mulligan finding is neither paradoxical nor especially surprising. It does not show that domain-adjacent knowledge has no bearing on new learning in any absolute sense. Rather, it shows that the predictive system had access to other priors that were sufficient, or even superior, for making sense of the target material. Knowledge taught in A, B and C may have been usable in principle, but it was not privileged. The system simply did not need it.
This distinction matters. Saying that prior knowledge “has no bearing” on learning is too strong. A more precise claim is that it was not selected. From a Predictive Processing perspective, learners are prediction-error-minimising systems. They will recruit whatever expectations they already have that best minimise prediction error in the current task. Previously taught material may lose out to more general, more entrenched, or more precise priors drawn from elsewhere in the learner’s generative model.
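The selection dynamic described above can be sketched as a toy Bayesian competition between priors: two candidate expectations try to explain the same observation, and the one whose prediction fits best (relative to its precision) wins most of the posterior responsibility. Everything here is illustrative, assuming Gaussian priors and made-up numbers, and is not a model of the Buchin and Mulligan materials.

```python
import math

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a Gaussian; higher means smaller prediction error
    for this prior, given its precision (1 / sigma**2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Two competing priors the learner could recruit: a recently taught,
# low-precision expectation vs an entrenched, high-precision one.
# All parameters are invented for illustration.
priors = {
    "recently_taught": {"mu": 4.0, "sigma": 2.0, "weight": 0.5},
    "entrenched":      {"mu": 5.0, "sigma": 0.5, "weight": 0.5},
}

observation = 5.2  # the incoming "target material"

# Posterior responsibility of each prior is proportional to
# its weight times the likelihood it assigns to the observation.
log_scores = {
    name: math.log(p["weight"]) + gaussian_logpdf(observation, p["mu"], p["sigma"])
    for name, p in priors.items()
}
norm = math.log(sum(math.exp(s) for s in log_scores.values()))
responsibilities = {name: math.exp(s - norm) for name, s in log_scores.items()}

for name, r in sorted(responsibilities.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:.3f}")
```

Even though the recently taught prior is available, the entrenched, higher-precision prior absorbs most of the responsibility for explaining the observation: it was usable in principle, but not selected.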
Once this is recognised, a deeper methodological challenge comes into view. If learners actively and opportunistically recruit whatever priors work best, then experimentally isolating the contribution of specific prior knowledge becomes inherently difficult. You cannot assume that supplying particular information guarantees that it will be used. The system will route around it if alternative expectations already do the job. Many null results, from this perspective, are not failures of instruction but predictable consequences of how cognition is organised.
This is why the issue here is not merely methodological but theoretical. Research designs always rest on assumptions about cognitive architecture, whether those assumptions are explicit or not. In this case, the implicit assumption is that domain adjacency is a reasonable proxy for relevance, and that knowledge, once taught, is available for deployment unless something blocks it. Predictive Processing challenges that assumption at its root. It insists that relevance must be explained, not assumed.
Importantly, this is not a matter of relabelling common sense in more elaborate terms. Predictive Processing is not being used here as an interpretive flourish or an obscure lens. Its value lies in specifying the mechanisms that make our everyday intuitions about relevance reliable in the first place. When we say that ideas “interlock”, we are already presupposing a system that selects, weights, and suppresses competing representations. Predictive Processing provides an account of how such a system works.
More broadly, Predictive Processing offers a single organising principle for perception, action, attention, emotion and affect, and learning: the continual minimisation of prediction error through hierarchical inference. Within such a framework, concepts like prior knowledge, relevance, attention, and transfer are not separate problems to be handled piecemeal. They are different expressions of the same underlying architecture.
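The organising principle above can be made concrete with a deliberately minimal sketch: a single predictive unit holds an estimate and nudges it by precision-weighted prediction error. On this toy reading, what gets called perception, attention, or learning corresponds to different parameters of the same update rule. The function name and numbers are my own illustration, not a model from the literature.

```python
def predictive_update(belief, observation, precision, learning_rate=0.1):
    """Move the belief toward the observation in proportion to the
    precision-weighted prediction error (observation - belief)."""
    prediction_error = observation - belief
    return belief + learning_rate * precision * prediction_error

# Repeated exposure to the same evidence gradually revises the belief.
belief = 0.0
for obs in [1.0, 1.0, 1.0, 1.0]:
    belief = predictive_update(belief, obs, precision=1.0)

print(round(belief, 4))  # prints 0.3439
```

Lowering `precision` slows revision (the evidence is treated as unreliable), while raising it speeds revision: one knob, several familiar phenomena.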
By contrast, much of what currently sits under the banner of the “science of learning”, particularly cognitivist information-processing models such as Cognitive Load Theory, has developed as a patchwork of metaphors, effects, and partial mechanisms. These approaches have been genuinely useful in moving education away from ideology and towards empirical constraint. But they struggle to specify how knowledge is selected, how competing representations are resolved, or why apparently sensible manipulations so often fail to produce measurable effects. As a result, relevance is typically treated as a property of content rather than a property of the cognitive system.
I will continue to use this space to explore how the core principles of Predictive Processing and Active Inference can provide a theoretical foundation for learning-related research and practice. My interest is not in adding yet another lens to an already crowded field, but in arguing for a unifying account of cognition that is grounded in the physical architecture of the brain itself. Such an account allows null results to be theoretically informative rather than merely disappointing, and demands greater precision in how we define constructs like prior knowledge and relevance.
I think that education will, over time, move towards a more explicitly predictivist view of cognition, drawing on Predictive Processing, Active Inference, and related theories of inference-based control. Not because this shift is fashionable, but because it offers a level of coherence and explanatory depth that our current patchwork of models lacks. As Carl Hendrick has foreshadowed, “Been reading a lot about predictive processing and I think it’s almost certainly going to be the next frontier of learning science and instructional design”, though with the qualification offered by Becky Allen that “It’s going to be a messy revolution!” My aim here is not to declare that shift complete, but to begin doing the work it requires: clarifying concepts, examining assumptions, and showing how this way of thinking can inform both experimental design and classroom practice.
Consider this an open invitation. I will continue to offer explanations and illustrations that make predictive accounts of learning concrete and usable. I welcome suggestions and questions, and will do my best to address them. Those who are interested in where the science of learning might go next are very welcome to follow along.

