2026-04-26 · working manuscript, v2
Theory (epistemological): for any single observer, the existence of a distal stimulus can only be established through inference on proximal stimuli.
Claim. For any single observer \(O\), the existence of a distal stimulus \(D\) can only be established through inference on proximal stimuli received by \(O\).
Suppose, for contradiction, that observer \(O\) can establish the existence of some distal stimulus \(D\) without performing inference on any proximal stimulus.
By definition, a distal stimulus is an object or event external to the observer's sensory apparatus. The only channel by which information about \(D\) can reach \(O\) is through \(O\)'s sensory systems — that is, as proximal stimuli. (If information were to reach \(O\) by some other means, we would simply expand our definition of “sensory system” to include that channel.)
So either:
In both cases, \(O\) cannot establish the existence of \(D\) without inference on proximal stimuli.
Key distinction: this is an epistemological claim, not an ontological one. Distal stimuli may well exist in the world; the point is that an observer can only know about them through inference on proximal input. The traditional perceptual-psychology framing — posit a “real” distal world and ask how accurately the brain reconstructs it — quietly relies on a god's-eye comparison that no observer can actually perform.
The proof above establishes what we might call epistemic closure: the observer's world model is constructed entirely from inference on proximal stimuli. The question is: what kind of inference?
The idea that perception involves inference is not new. Helmholtz (1867) argued that perception consists of “unconscious inferences” — the brain draws conclusions about the distal world from ambiguous sensory data, using prior experience as a guide[1]. As he put it, sensory experiences are not direct representations of the world but rather learned signs that must be interpreted. Helmholtz's key claim was that many of these inferences are learned rather than innate[2]. This framework — perception as inverse inference — has been enormously productive and directly underlies the Likelihood Principle.
Our proof of epistemic closure can be understood as a formalization and radicalization of Helmholtz's insight. Helmholtz recognized that perception involves inference; we show that inference on proximal stimuli is the only epistemic channel available to any observer. From this stronger starting point, we can ask what kind of inferential architecture follows.
The Likelihood Principle states: given proximal input, the percept is the distal configuration most likely to have produced it. That is, the brain selects the percept that maximizes \(P(\mathrm{proximal} \mid \mathrm{distal})\). This is already an inferential framework, and it is compatible with the proof above. So why is it insufficient as a foundation?
The argument proceeds in three parts.
Part 1: The likelihood function is observer-relative. To evaluate \(P(\mathrm{proximal} \mid \mathrm{distal})\), the observer needs a model of how distal configurations produce proximal stimuli. But by epistemic closure, the observer has no direct access to distal configurations. The likelihood function must therefore be estimated from experience — that is, learned from prior proximal stimuli. It is a component of the inference machine, not an objective bridge to the external world. Different observers with different histories will construct different likelihood functions, and none of them can be checked against ground truth. Kido (2022) identifies precisely this gap in existing Bayesian perception frameworks, noting a disconnect between the process by which a posterior is derived from likelihood and prior, and the process by which the likelihood and prior themselves are learned from data[3].
Part 2: No convergence guarantee without external ground truth. The Likelihood Principle implicitly suggests convergence toward veridical perception: more sensory channels, more data, better percepts. But convergence requires that the likelihood function is well-calibrated, and calibration requires checking against the distal world. By epistemic closure, this check is impossible. The observer can only compare new proximal input against predicted proximal input — never against distal reality. This is sufficient for building useful models (the observer survives), but not for building true models (the world model matches the distal world). There will always be a structural disconnect between the world and the inference machine building the world model, and this disconnect is not a bug to be minimized but a permanent architectural feature of any observer system.
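A toy sketch of this point, under assumptions that are ours and not part of the argument above: proximal extent is taken to be object size divided by distance, the observer walks toward the object in fixed steps, and the names `retinal_size`, `predict_sequence`, `model_a`, and `model_b` are hypothetical. Two world models that disagree about every distal quantity can still produce identical predictions, so proximal prediction error alone cannot separate them.

```python
def retinal_size(object_size, distance):
    """Toy projection (assumed): proximal extent = distal size / distance."""
    return object_size / distance


def predict_sequence(model, n_steps):
    """Predicted proximal input after each forward step under a world model."""
    return [
        retinal_size(model["size"], model["distance"] - i * model["step"])
        for i in range(n_steps)
    ]


# Hypothetical distal world. The observer never reads these values directly;
# it only receives the proximal sequence computed from them.
world = {"size": 2.0, "distance": 10.0, "step": 1.0}
observed = predict_sequence(world, n_steps=5)

# Two candidate world models that disagree about every distal quantity:
# model_b scales all lengths (size, distance, step length) by a factor of 3.
model_a = {"size": 2.0, "distance": 10.0, "step": 1.0}
model_b = {"size": 6.0, "distance": 30.0, "step": 3.0}


def total_prediction_error(model):
    """Sum of absolute differences between predicted and observed proximal input."""
    predicted = predict_sequence(model, n_steps=len(observed))
    return sum(abs(p - o) for p, o in zip(predicted, observed))


error_a = total_prediction_error(model_a)
error_b = total_prediction_error(model_b)
print(error_a, error_b)
```

Both errors are zero (up to floating-point rounding) even though `model_b` takes the object to be three times larger and three times farther away. The example only illustrates the logical point that predictive success does not certify a match with the distal world; it is not a model of any real visual system.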
Hohwy (2013) arrives at a compatible conclusion from within the predictive processing tradition, arguing that the mind has a “fragile and indirect relation to the world” — we are deeply in tune with our environment yet structurally separated from it[8].
Part 3: Prediction error is the only available learning signal. Given epistemic closure and the action–perception loop (Axiom 2), the only feedback the observer ever receives is whether its predictions about future proximal stimuli were accurate. This makes prediction error the foundational learning signal. The likelihood function that the brain appears to compute is itself the product of accumulated prediction-error-driven learning. The Likelihood Principle describes the behavior of a well-trained prediction machine at a single time-slice; prediction-error minimization explains how the machine got there, what happens when it fails, and why it works at all.
This view finds substantial support in Friston's (2010) free-energy principle, which proposes that all biological systems minimize variational free energy — an information-theoretic bound on surprise — through continuous correction of their world models[4]. Under this framework, perception and action are two sides of the same optimization: the brain either updates its models to better predict sensory input (perception), or acts on the world to make sensory input conform to its predictions (active inference). The neural implementation of this principle has been modeled in detail. Rao and Ballard (1999) demonstrated that a hierarchical predictive coding network, where feedback connections carry predictions and feedforward connections carry prediction errors, reproduces known receptive-field properties in visual cortex[5]. Friston and Kiebel (2009) extended this to dynamical systems, showing how the brain could use hierarchical generative models to recognize and predict temporal sequences of sensory states[6].
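For readers who want the bound stated explicitly, one standard way of writing variational free energy is sketched here. The symbols are notation assumed for this illustration and do not appear elsewhere in the manuscript: \(s\) denotes sensory (proximal) input, \(\vartheta\) the hidden causes posited by the observer, \(q(\vartheta)\) the observer's approximate belief about those causes, and \(p(s, \vartheta)\) the observer's generative model.

\[
F(s, q) \;=\; \mathbb{E}_{q(\vartheta)}\big[\ln q(\vartheta) - \ln p(s, \vartheta)\big]
\;=\; D_{\mathrm{KL}}\big(q(\vartheta) \,\|\, p(\vartheta \mid s)\big) \;-\; \ln p(s)
\;\ge\; -\ln p(s).
\]

Because the Kullback-Leibler term is non-negative, \(F\) is an upper bound on surprise, \(-\ln p(s)\). Every quantity in the expression is defined relative to the observer's own model and its proximal input, which is consistent with epistemic closure.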
The claim is not that the Likelihood Principle is wrong. It is that the Likelihood Principle is a derived property of a more fundamental prediction-error-minimizing system, analogous to how Newtonian mechanics is recovered as a limiting case of general relativity. The Likelihood Principle describes what the system does when it is operating in a well-behaved regime: rich sensory data, well-trained priors, familiar stimuli. But it cannot account for its own foundations (where did the likelihood function come from?), and it breaks down at the edges: novel stimuli, degraded input, and developmental learning.
Clark (2016) draws the same distinction in Surfing Uncertainty, characterizing minds as “prediction machines — devices that have evolved to anticipate the incoming streams of sensory stimulation before they arrive”[7]. The “predictive processing” framework Clark describes replaces the traditional feed-forward picture with one in which top-down predictions and bottom-up prediction errors are in constant exchange across a cortical hierarchy. Crucially, Clark emphasizes that this is not merely a Bayesian gloss on standard perception — it is a fundamentally different account of what the brain is doing: generating predictions and using their failures as the primary signal for learning and adaptation.
A defender of the Likelihood Principle might respond: “We never claimed the likelihood function was objective. We only claim the brain acts as if it computes likelihoods.” But this concession is exactly the point. Saying the brain “acts as if” it computes likelihoods is an observation, not an explanation. Why does it act that way? Because it is a prediction-error-minimizing system that, over sufficient training, builds an internal model whose behavior approximates likelihood computation. The “acts as if” is a consequence of Axiom 3, not a competing account. As a reviewer of Hohwy's work noted, “evidence for Bayesian perceptual psychology is not in itself evidence for the [predictive processing] framework” — the two operate at different levels of description[9].
This distinction is not merely philosophical — the two framings make different empirical predictions:
Illusions, perceptual biases, and clinical conditions like agnosia are not failures of a likelihood computation on this account. They are expected behavior of a prediction machine with a particular training history. The predictive framework explains both veridical perception (when priors are well-calibrated to the environment) and systematic error (when they are not) under a single mechanism.
Consider the Müller-Lyer illusion as a test case. The classic “carpentered world” hypothesis (Segall et al., 1966) proposed that susceptibility to the illusion varied across cultures because of differences in environmental exposure to rectilinear architecture[10]. If true, this would support a strong version of the Likelihood Principle: the likelihood function is calibrated to environmental statistics, and different environments produce different calibrations. However, a recent comprehensive review by Amir and Firestone (2025) synthesizes evidence that the illusion appears across diverse animal species (birds, fish, reptiles, primates), persists with curved stimuli that lack any rectilinear features, and is experienced by congenitally blind individuals shortly after sight restoration, all of which challenge the cultural-calibration account[11]. Within the predictive framework, this pattern is expected: the illusion reflects deep structural properties of the prediction machinery (likely phylogenetically installed priors), not a culture-specific likelihood function. The degree of susceptibility may vary with experience, but the basic phenomenon is architectural.
Axiom 1 (Epistemic Closure). The observer's world model consists exclusively of inferences drawn from proximal stimuli.
Axiom 2 (Action–Perception Loop). The observer's actions produce new proximal stimuli that serve as feedback.
Axiom 3 (Update Rule). The inference model is updated based on the discrepancy between predicted and actual proximal stimuli (prediction error).
These axioms replace the original, underspecified set:
Axiom 1 (Epistemic Closure) is the proof's conclusion elevated to axiomatic status. Axiom 2 (Action–Perception Loop) makes explicit that the observer's own actions generate new proximal stimuli, creating a closed loop. Axiom 3 (Update Rule) introduces prediction error as the specific learning signal — implicit in the original text (“getting yelled at by your mother”) but never stated. The closed loop is what makes the model predictive rather than merely reactive: inference → action → new proximal stimuli → comparison with prediction → update.
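The closed loop can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the observer's world model is reduced to a single number `weight`, the hidden process `environment` is an arbitrary linear rule, and the update is a simple delta rule. It is not offered as the update rule the axioms require, only as one instance of the cycle they describe.

```python
import random


def environment(action):
    """Hypothetical hidden process. The observer only ever sees its return value."""
    return 3.0 * action


def run_loop(n_cycles=200, learning_rate=0.5, seed=0):
    rng = random.Random(seed)
    weight = 0.0  # the observer's entire world model in this toy
    errors = []
    for _ in range(n_cycles):
        action = rng.uniform(-1.0, 1.0)           # the observer acts
        predicted = weight * action                # predicted proximal stimulus
        actual = environment(action)               # new proximal stimulus (Axiom 2)
        error = actual - predicted                 # prediction error (Axiom 3)
        weight += learning_rate * error * action   # delta-rule update of the model
        errors.append(abs(error))
    return weight, errors


weight, errors = run_loop()
print(f"learned weight: {weight:.4f}")
print(f"mean |error|, first 10 cycles: {sum(errors[:10]) / 10:.4f}")
print(f"mean |error|, last 10 cycles:  {sum(errors[-10:]) / 10:.4f}")
```

The observer never reads the constant inside `environment`. It only compares predicted with actual proximal input, and the model is shaped entirely by that comparison.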
In this model, the observer reacts to proximal stimuli based on its internal inference machine. Whether the reaction was “good” or “bad” is again interpreted via proximal stimuli: in ancient times this probably meant death versus survival, but in modern times getting yelled at by your mother is a better comparison. After each inference–action cycle, the observer's inference machine is updated through the prediction-error signal so that it makes better inferences going forward.
This inference is what allows us to conclude that the barking we hear implies there is a dog around the corner. But getting into the semantics is what is really interesting.
Is all inference learned? The answer is clearly no.
Research on neonates (0–7 days from birth) demonstrates that the visual world of the newborn is already highly organized. Slater (1998) presents evidence that newborns respond to human faces not as arbitrary collections of stimulus elements but as faces specifically, and that they can form auditory-visual associations after only brief exposure — suggesting that innate capacities direct and facilitate early learning[12].
More broadly, Spelke and Kinzler (2007) have proposed that human cognition is founded on at least four innate core knowledge systems: (1) a system for representing objects and their mechanical interactions (governed by principles of cohesion, continuity, and contact), (2) a system for representing agents and their goal-directed actions, (3) a system for representing number and approximate magnitude, and (4) a system for representing the geometry of surrounding space. A fifth system, for representing social partners and group membership, may also be innate[13]. These systems are domain-specific, abstract, present across cultures, shared with nonhuman primates, and functional from the earliest months of life. They are not learned from scratch through proximal-stimulus exposure — they are present before significant learning has occurred.
This poses a challenge for our framework. If Axiom 3 (the Update Rule) states that the inference model is updated via prediction error, then where do these initial priors come from? They were not installed by prediction error within the observer's lifetime.
We adopt the position that innate priors are the product of evolutionary learning — prediction-error-driven optimization operating over phylogenetic time rather than ontogenetic time. Natural selection functions as an update rule: organisms whose innate priors produce poor predictions about the world (and therefore poor actions) are selected against. Organisms whose priors produce good predictions survive and reproduce, passing those priors to the next generation. Over evolutionary timescales, this process installs increasingly well-calibrated initial conditions into the inference machine.
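This framing unifies innate and learned inference under a single principle, prediction-error minimization, operating at two timescales: phylogenetic (natural selection across generations, which installs innate priors) and ontogenetic (prediction-error-driven learning within a lifetime, which refines them).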
We acknowledge that “evolutionary prediction error” is partly metaphorical — natural selection does not compute prediction errors in the same way a neural circuit does. But the functional logic is identical: variants that fail to predict their environment are eliminated, and the result over time is a system whose initial state is well-calibrated to environmental regularities. The analogy is precise enough to be productive, and we flag it as a framing device rather than a strict identity claim.
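In keeping with that caveat, the functional logic can be caricatured in a few lines. The sketch below is a deliberately crude selection loop, not a model of real evolution: each organism is reduced to one innate prior (a number), there is no within-lifetime learning, the environmental regularity `env_value` is an arbitrary constant, and the function name `evolve_priors` and all parameters are hypothetical.

```python
import random


def evolve_priors(env_value=5.0, generations=50, pop_size=100, seed=0):
    """Truncation selection on innate priors, with no individual learning."""
    rng = random.Random(seed)
    # Initial priors are scattered with no relation to the environment.
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank organisms by how badly their innate prior predicts the environment.
        population.sort(key=lambda prior: abs(prior - env_value))
        survivors = population[: pop_size // 2]
        # Each survivor leaves two offspring that inherit its prior with small mutation.
        population = [
            parent + rng.gauss(0.0, 0.1)
            for parent in survivors
            for _ in range(2)
        ]
    return population


final_population = evolve_priors()
mean_error = sum(abs(p - 5.0) for p in final_population) / len(final_population)
print(f"mean innate prediction error after selection: {mean_error:.3f}")
```

No individual in this loop ever updates its prior, yet the population's starting priors end up close to the environmental regularity. That is the sense in which selection plays the role of an update rule at the phylogenetic timescale.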
Spelke's core knowledge systems map neatly onto this account. The principles governing object representation (cohesion, continuity, contact) are predictions about the physical world that have been stable across mammalian evolutionary history. The agent-detection system reflects the survival value of rapidly identifying intentional actors. The approximate number system encodes magnitude comparisons useful for foraging and threat assessment. These are the priors that evolution has determined are too important — and too stable across environments — to leave to individual learning.
This leads to a striking implication: if innate priors are the product of a species' evolutionary history, then different species — or hypothetical alien observers — would have fundamentally different initial inference machines. Not merely different learned models of the world, but different starting axioms for perception itself. The core knowledge systems that Spelke identifies are not universal features of any possible observer; they are the specific priors that human evolutionary history has installed. An observer with a different evolutionary trajectory would begin with different core systems and would therefore construct a different world model from the same proximal stimuli.
This has implications beyond philosophy of mind. The specific structure of human innate priors may explain why human societies organize the way they do.
Consider what the prediction-error framework implies about motivation and reward. If the observer's fundamental drive is to minimize prediction error — to make the world model as accurate and comprehensive as possible — then anything that increases the observer's predictive power should be experienced as rewarding. Tools, technologies, and explanatory frameworks all serve this function: they extend the observer's ability to predict and control proximal stimuli. A hammer lets you predict the consequences of striking. A telescope extends your sensory range. A scientific theory lets you predict classes of phenomena you have never directly encountered.
This suggests that the human drive toward technology and understanding is not incidental but is a direct consequence of our inference architecture. We are organisms whose reward system is tuned to world-model expansion — to making more of the world predictable. Societies that develop technology, science, and complex social structures are not doing something extraneous to human cognition; they are doing exactly what a prediction-error-minimizing system would do when scaled up to populations that can share and accumulate knowledge.
If this account is correct, it generates an unsettling prediction about the trajectory of technological civilization. The drive toward technology stems from prediction-error minimization: we build tools and systems that make the world more predictable and controllable. But what happens when the world is already predictable and controllable, that is, when the basic problems of survival, comfort, and environmental management are solved by mechanized systems?
In a fully mechanized world, the observer's prediction errors are already near-minimal for the domains that historically drove technological development. The inference machine has less to do. But the reward system is still tuned to prediction-error reduction. If there are no significant prediction errors left to resolve through technology, the system faces a choice:
The second trajectory is the darker prediction: that post-technological societies may tend not toward further complexity but toward voluntary cognitive simplification. The prediction machine, having solved the hard problems, optimizes for low prediction error by shrinking the space of things it tries to predict. Comfort, routine, algorithmically curated environments, and the offloading of cognitive work to automated systems all serve this function. The result is a world that is maximally predictable — and maximally unstimulating.
This is speculative, but it follows logically from the framework: if the reward signal is prediction-error minimization, and if technology makes the world predictable, then the system's equilibrium state is not “more technology” but “less to predict.”
The Gestalt psychologists identified a set of principles governing perceptual organization: proximity, similarity, good continuation, closure, common fate, and the overarching Law of Prägnanz (the tendency toward the simplest, most regular interpretation). The traditional explanations for why these principles hold fall into two camps:
Our framework offers a third account that subsumes both: Gestalt principles are descriptions of the prediction machine's priors — regularities that have been installed by evolutionary learning (phylogenetic prediction-error minimization) and refined by individual experience (ontogenetic prediction-error minimization). They are phenomenological summaries of what the system has learned, not rules it follows.
This reinterpretation has empirical support. Brunswik and Kamiya (1953) were the first to propose that Gestalt principles reflect statistical regularities of the natural visual environment[14]. Elder and Goldberg (2002) tested this directly by measuring the co-occurrence statistics of contour elements in natural images and found that the Gestalt principles of proximity and good continuation correspond to genuine statistical regularities: elements that are nearby and smoothly aligned are in fact more likely to belong to the same physical contour[15]. In other words, the prediction machine's priors match the statistics of the world it evolved in — exactly what the evolutionary-learning account predicts.
From the predictive framework, this is not surprising. A prediction-error-minimizing system that has been exposed to natural scenes — either over a lifetime or over evolutionary time — will develop internal models that exploit whatever statistical regularities reduce prediction error most efficiently. Proximity grouping works because spatially nearby elements in natural scenes do tend to belong to the same object. Good continuation works because physical contours are smooth. Common fate works because parts of a single object do move together. These are not arbitrary perceptual preferences; they are predictions about the structure of the world that happen to be accurate for the environments in which our inference machinery evolved.
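The proximity claim can be illustrated on synthetic data. The sketch below is not Elder and Goldberg's method and uses no natural images: it assumes a made-up scene in which each "object" is a compact cluster of points, and the names `synthetic_scene` and `same_object_rate`, the cluster spread, and the distance threshold are all hypothetical choices. It simply measures how often two elements belong to the same object when they are near versus far.

```python
import math
import random


def synthetic_scene(n_objects=20, points_per_object=10, spread=0.02, seed=0):
    """Assumed toy scene: each object is a tight Gaussian cluster in the unit square."""
    rng = random.Random(seed)
    points = []
    for obj in range(n_objects):
        cx, cy = rng.random(), rng.random()
        for _ in range(points_per_object):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread), obj))
    return points


def same_object_rate(points, near_threshold=0.05):
    """Fraction of point pairs sharing an object, split into near and far pairs."""
    near_same = near_total = far_same = far_total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            xi, yi, oi = points[i]
            xj, yj, oj = points[j]
            same = int(oi == oj)
            if math.hypot(xi - xj, yi - yj) < near_threshold:
                near_total += 1
                near_same += same
            else:
                far_total += 1
                far_same += same
    return near_same / near_total, far_same / far_total


points = synthetic_scene()
near_rate, far_rate = same_object_rate(points)
print(f"P(same object | near): {near_rate:.2f}")
print(f"P(same object | far):  {far_rate:.2f}")
```

In a world built this way, a system that groups by proximity is exploiting a real regularity, because near pairs share an object far more often than distant pairs do. Whether natural scenes have this structure is an empirical question, which is what the image-statistics studies cited above address.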
The challenge to a purely learning-based account was that some Gestalt principles appear in neonates before significant visual experience. As discussed above, Slater (1998) demonstrated that newborns already show organized visual perception, and Spelke and Kinzler (2007) documented innate core knowledge systems for objects, agents, number, and geometry[12][13].
With the evolutionary-learning framework now in place, this is no longer a counterexample. Neonatal Gestalt responses are not evidence that these principles are “hardwired rules” independent of learning. They are evidence that phylogenetic learning — natural selection operating as a prediction-error update rule across generations — has installed these priors as the initial state of the inference machine. The principles of proximity, continuity, and common fate reflect statistical regularities of the physical world that have been stable across mammalian evolutionary history. They were too important, and too stable across environments, to leave to individual learning.
The ontogenetic learning that follows birth then refines these priors. An infant born with a rough prior for proximity grouping will, through prediction-error-driven experience, sharpen that prior to the specific spatial statistics of its environment. The phylogenetic prior gets the system into the right neighborhood; ontogenetic learning fine-tunes it.
This reinterpretation changes the explanatory direction. The traditional account says: “we perceive this way because of the Law of Prägnanz.” Our account says: “we perceive this way because our prediction machine — shaped by evolutionary and individual learning — has built priors that happen to match the statistical structure of natural environments. The Law of Prägnanz is a description of what those priors produce, not a cause of perception.”
The same logic applies to the Likelihood Principle: saying the brain selects the most likely interpretation is a description of a well-calibrated prediction machine in action. And it applies to Prägnanz: saying perception tends toward simplicity is a description of what a system optimized for prediction-error minimization does when its priors match the environment — simple interpretations are the ones that minimize prediction error because the world itself has regular structure.
All three — Gestalt principles, the Likelihood Principle, and Prägnanz — collapse into consequences of a single underlying mechanism: prediction-error minimization operating on priors installed by evolutionary and individual learning.
If Gestalt principles are priors shaped by evolutionary history, then a genuinely alien observer — one with a different evolutionary trajectory in a different environment — would exhibit different “Gestalt principles.” An organism evolved in a world dominated by fluid dynamics rather than rigid bodies might not group by proximity or common fate in the way we do, because the statistical regularities of its ancestral environment would be different. Its prediction machine would have been calibrated to different ecological statistics, and its perceptual organization would reflect those statistics.
This reinforces the species-specific-priors argument from Section 4.3: the way we carve the world into objects, surfaces, agents, and groups is not a universal feature of perception but a reflection of our evolutionary history. Gestalt principles are human Gestalts.
The argument now proceeds as follows:
What remains: