Sparse Neural Networks as a Computational Analogue of Controlled-to-Automatic Processing

2026-05-05

Abstract

The distinction between fluid and crystallized intelligence (the capacity to reason through novel problems versus the ability to deploy acquired knowledge) has been a cornerstone of differential psychology since Cattell[1]. A parallel distinction exists in cognitive psychology between controlled processing, which is slow, serial, and resource-intensive, and automatic processing, which is fast, parallel, and effortless[2][3]. In artificial intelligence, a strikingly analogous trajectory emerges during neural network training and subsequent pruning: a dense, highly plastic network is trained on a task and then reduced to a sparse, efficient subnetwork that preserves or even exceeds the original performance[4][5]. I argue that the dense-to-sparse progression in artificial neural networks provides a useful computational analogue for the controlled-to-automatic processing transition observed in human cognition.

1. Introduction

Understanding the computational basis of human intelligence remains one of the central challenges of cognitive science. Two largely independent research traditions have converged on a common observation: efficient performance on cognitive tasks is achieved not by deploying more resources, but by deploying fewer, better-organized ones. In differential psychology, this insight is captured by the finding that individuals with higher intelligence exhibit lower cortical activation during task performance, a pattern known as the neural efficiency hypothesis[6][7]. In deep learning, a similar principle is expressed by the lottery ticket hypothesis, which holds that large neural networks contain sparse subnetworks capable of matching or exceeding the full network's performance when trained in isolation[4].

I do not think these parallels are coincidental. I argue that the training-and-pruning pipeline of modern neural networks, in which a dense, overparameterized network is first trained and then reduced to a sparse, task-specific circuit, provides a computational analogue for the well-characterized transition from controlled to automatic processing in human cognition[2][3][9]. I also extend this analogy to the psychometric distinction between fluid and crystallized intelligence[1][10], proposing that these two broad ability factors correspond to different phases of the dense-to-sparse trajectory.

Section 2 reviews the theory of fluid and crystallized intelligence. Section 3 summarizes the controlled–automatic processing framework. Section 4 describes the relevant neural network pruning literature and the lottery ticket hypothesis. Section 5 presents the core analogy and compares it with developmental synaptic pruning and the neural efficiency hypothesis. Section 6 derives testable predictions. Section 7 discusses limitations and future directions.


2. Fluid and Crystallized Intelligence

The theory of fluid and crystallized intelligence, introduced by Cattell[1] and refined by Horn and Cattell[10][11], partitions general cognitive ability into two broad factors. Fluid intelligence (\(G_f\)) refers to the capacity to reason and solve novel problems independent of previously acquired knowledge. It is measured by tasks such as matrix reasoning, pattern recognition, and abstract series completion. Crystallized intelligence (\(G_c\)) refers to the breadth and depth of knowledge and skills acquired through experience and education. It is assessed through vocabulary tests, general knowledge, and domain-specific expertise.

These two factors display distinct developmental trajectories. Fluid intelligence peaks in early adulthood and declines gradually thereafter, while crystallized intelligence continues to increase through middle age and remains relatively stable into old age[11][12]. This dissociation has been attributed to the biological substrate underlying \(G_f\) — including processing speed, working memory capacity, and the integrity of prefrontal cortical circuits — being more sensitive to age-related neural decline than the distributed cortical representations supporting \(G_c\)[13].

Critically, the two forms of intelligence are not independent. Fluid intelligence is thought to facilitate the acquisition of new knowledge and skills, which, once consolidated, become part of the crystallized repertoire. As Cattell noted[1], \(G_c\) can be understood as the “historical product” of \(G_f\) applied to learning opportunities over time. This investment model implies a directional relationship: fluid processes build the crystallized store, and the crystallized store, once established, reduces the need for fluid engagement with familiar problems.


3. Controlled and Automatic Processing

The transition from effortful, deliberate reasoning to fast, effortless execution is one of the most robust findings in cognitive psychology. Schneider and Shiffrin[2][3] formalized this distinction in their dual-process theory of human information processing, identifying two qualitatively different modes of cognition.

Controlled processing is slow, serial, capacity-limited, and requires active attentional engagement. It is flexible — it can be adapted to new task demands on the fly — but it is resource-intensive, fatiguing, and bottlenecked by working memory. Controlled processing dominates during the early stages of learning, when the mapping between stimuli and responses is novel or inconsistent.

Automatic processing is fast, parallel, relatively effortless, and operates without conscious attentional control. Once developed through extensive practice with consistent stimulus–response mappings, automatic processes are difficult to suppress or modify. They consume minimal cognitive resources, freeing capacity for other operations[2][14].

Fitts and Posner[9] described three stages through which skill acquisition proceeds from controlled to automatic performance. In the cognitive stage, the learner must devote substantial attention to understanding the task structure and generating appropriate responses. In the associative stage, performance becomes more consistent, errors decrease, and the learner begins to refine strategies. In the autonomous stage, performance is fast, accurate, and largely independent of conscious control.

Neuroimaging studies have confirmed the neural correlates of this transition. Early learning activates a broad, distributed network including prefrontal and anterior cingulate cortices — regions associated with executive control and error monitoring[15][16]. As automaticity develops, activation in these control regions decreases, and task-relevant processing becomes localized to more specialized cortical areas[16][17]. In short, the brain achieves expertise by becoming sparser in its activation patterns.

[Figure 1 diagram: correspondence between cognitive stages and neural network stages]
Human Cognition: Controlled Processing (slow, effortful, flexible) → practice → Associative Stage (improving, consolidating) → mastery → Automatic Processing (fast, effortless, rigid)
Artificial Network: Dense Network, Training (overparameterized, plastic) → convergence → Task-Specific Pathways (weights converging) → pruning → Sparse Pruned Network (efficient, task-specific)
Each stage in the top row is linked to the stage below it by analogy.
Figure 1. Proposed correspondence between stages of human cognitive skill acquisition (top) and the training-and-pruning pipeline of artificial neural networks (bottom). Both trajectories proceed from a resource-intensive, flexible state to an efficient, specialized one.

4. Neural Network Pruning and the Lottery Ticket Hypothesis

Modern deep neural networks are typically overparameterized: they contain far more trainable weights than are strictly necessary to learn the target function. This overparameterization facilitates optimization — gradient descent can more easily find good solutions in high-dimensional parameter spaces — but it results in networks that are computationally expensive and memory-intensive at inference time.

Network pruning addresses this inefficiency by removing weights, neurons, or entire filters from a trained network while preserving accuracy[18][5][19]. Pruning techniques can reduce parameter counts by 90% or more without significant performance degradation, producing sparse networks that are faster and cheaper to deploy[5][20].
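As a minimal sketch of the simplest variant, magnitude-based weight pruning, the following assumes a flat list of already-trained weights. The function names and the example weight values are hypothetical and chosen only for illustration; real pruning methods operate on full networks and usually fine-tune after pruning.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask removing the `sparsity` fraction of smallest-|w| weights."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights, mask):
    """Element-wise product of weights and mask."""
    return [w * m for w, m in zip(weights, mask)]


# Hypothetical trained weights: two large ones carry the signal, the rest are near zero.
trained = [0.01, -2.0, 0.03, -0.02, 1.5, 0.00, 0.04, -0.01, 0.02, -0.03]
mask = magnitude_mask(trained, sparsity=0.8)
pruned = apply_mask(trained, mask)
print(f"kept {sum(mask)} of {len(trained)} weights")
```

In this toy case, removing 80% of the weights leaves the two large-magnitude connections untouched, which is the intuition behind pruning without significant loss of accuracy.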

4.1. The Lottery Ticket Hypothesis

Frankle and Carbin[4] made a surprising discovery that reframes the role of pruning. They demonstrated that dense, randomly initialized networks contain sparse subnetworks — termed winning tickets — that, when trained in isolation from their original initialization, can match or exceed the full network's test accuracy in a comparable number of training iterations. Formally:

Lottery Ticket Hypothesis. A randomly-initialized, dense neural network contains a subnetwork that is initialized such that — when trained in isolation — it can match the test accuracy of the original network after training for at most the same number of iterations.

The procedure for identifying winning tickets involves four steps: (1) randomly initialize a dense network \(f(x;\, \theta_0)\); (2) train it for \(j\) iterations, arriving at parameters \(\theta_j\); (3) prune \(p\)% of the smallest-magnitude weights, creating a binary mask \(m\); and (4) reset the surviving weights to their original values \(\theta_0\), yielding the winning ticket \(f(x;\, m \odot \theta_0)\).

Crucially, the initialization matters. When the surviving weights are randomly reinitialized rather than reset to \(\theta_0\), performance degrades substantially. This demonstrates that winning tickets succeed not merely because of their architecture (which connections survive) but because of a fortunate combination of architecture and initial weight values.
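The four-step procedure and the reinitialization control can be sketched on a toy problem. This is an illustration under stated assumptions, not the setup of [4]: the model is a six-weight linear regressor, the task (only the first two inputs matter), the helper names, and the hyperparameters are all hypothetical. Because this toy problem is convex, both the ticket and the randomly reinitialized control converge, so the sketch shows the mechanics of masking and resetting only, not the empirical degradation reported for deep networks.

```python
import random


def make_data(rng, n=64, dim=6):
    """Toy regression task: only the first two inputs matter (an assumption)."""
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    ys = [3.0 * x[0] - 2.0 * x[1] for x in xs]
    return xs, ys


def loss(theta, xs, ys):
    """Mean squared error of the linear model f(x; theta) = theta . x."""
    return sum((sum(t * xi for t, xi in zip(theta, x)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def train(theta, mask, xs, ys, iters=500, lr=0.1):
    """Full-batch gradient descent; masked-out weights are held at zero."""
    theta = [t * m for t, m in zip(theta, mask)]
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x, y in zip(xs, ys):
            err = sum(t * xi for t, xi in zip(theta, x)) - y
            for k, xk in enumerate(x):
                grad[k] += 2.0 * err * xk / len(xs)
        theta = [(t - lr * g) * m for t, g, m in zip(theta, grad, mask)]
    return theta


def prune_mask(theta, fraction):
    """Binary mask m that zeroes the `fraction` of smallest-magnitude weights."""
    n_prune = int(round(fraction * len(theta)))
    order = sorted(range(len(theta)), key=lambda k: abs(theta[k]))
    pruned = set(order[:n_prune])
    return [0 if k in pruned else 1 for k in range(len(theta))]


rng = random.Random(0)
xs, ys = make_data(rng)

# (1) Randomly initialize the dense network f(x; theta_0).
theta_0 = [rng.gauss(0.0, 0.1) for _ in range(6)]
dense_mask = [1] * 6

# (2) Train for j iterations to reach theta_j.
theta_j = train(theta_0, dense_mask, xs, ys)

# (3) Prune the smallest-magnitude weights, giving the mask m.
m = prune_mask(theta_j, fraction=4 / 6)

# (4) Reset survivors to theta_0 and train the ticket f(x; m * theta_0) in isolation.
ticket = train(theta_0, m, xs, ys)

# Control: same mask, but survivors are re-drawn at random instead of reset.
theta_rand = [rng.gauss(0.0, 0.1) for _ in range(6)]
control = train(theta_rand, m, xs, ys)

print("mask:", m)
print("dense loss:  ", loss(theta_j, xs, ys))
print("ticket loss: ", loss(ticket, xs, ys))
print("control loss:", loss(control, xs, ys))
```

The mask and the initial values are kept as separate objects on purpose: the mask records which connections survive, while the choice between theta_0 and theta_rand is the only difference between the ticket and the control.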

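Frankle and Carbin report winning tickets that are less than 10–20% of the size of the original network for several fully connected and convolutional architectures on MNIST and CIFAR-10. Above this size, the winning tickets they find learn faster than the original network (reaching minimum validation loss in fewer iterations) and, in their experiments, reach higher test accuracy and generalize better[4].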

[Figure 2 diagram: Dense Network → prune → Sparse Network]
Figure 2. Dense versus sparse network architectures. Left: a fully connected network with all neurons and connections active (overparameterized). Right: after pruning, inactive neurons (dashed circles) and connections (faint dashed lines) are removed, leaving only the task-critical subnetwork — the “winning ticket”[4]. The sparse network achieves comparable or superior performance with a fraction of the parameters.

5. Bridging the Analogy: From Artificial Pruning to Cognitive Efficiency

I think that the parallels between artificial neural network pruning and human cognitive development are structural, not merely metaphorical.

5.1. Developmental Synaptic Pruning

The human brain undergoes a process remarkably similar to the training-and-pruning pipeline of artificial networks. Huttenlocher[8] demonstrated that synaptic density in the human frontal cortex increases rapidly after birth, reaching a peak approximately 50% above adult levels by age 1–2 years. This initial overproduction of synapses is followed by a prolonged period of synaptic elimination — synaptic pruning — that continues through adolescence and into early adulthood[21][22]. The process is not random: activity-dependent mechanisms selectively eliminate synapses that are weakly activated while strengthening those that participate in frequently used circuits[23].

This biological trajectory closely parallels the artificial case. The developing brain begins in a dense, overconnected state (analogous to a randomly initialized, overparameterized network). Through experience-dependent activity (analogous to training), certain pathways are strengthened while others weaken. Finally, the weakened connections are eliminated (pruned), leaving a sparser, more efficient circuit[24][25].

5.2. The Neural Efficiency Hypothesis

Haier and colleagues[6] first observed that individuals who scored higher on intelligence tests exhibited lower cortical glucose metabolism during cognitive task performance, as measured by positron emission tomography (PET). This counterintuitive finding — that smarter brains work less hard — was termed the neural efficiency hypothesis. In a follow-up study, Haier et al.[26] demonstrated that practice on a novel task (the video game Tetris) led to decreased cortical activation over time, with the magnitude of the decrease correlating with performance improvement.

Neubauer and Fink[7] conducted a comprehensive review of 54 neuroimaging studies and confirmed that the neural efficiency effect is robust for tasks of low to moderate difficulty, though it reverses for very demanding tasks, where higher-ability individuals may recruit additional resources. This pattern is consistent with the observation that pruned neural networks perform well within their trained domain but may lack the capacity to generalize to substantially different tasks without retraining.

The neural efficiency hypothesis provides a direct neural correlate for the analogy proposed here. If expertise involves the progressive sparsification of cortical activation patterns — fewer neurons, firing more selectively — then the artificial pruning pipeline captures this process in a mathematically tractable form.

5.3. Mapping the Analogy

Table 1 summarizes the proposed correspondences between artificial neural network stages and human cognitive constructs.

Table 1. Proposed correspondences between neural network processing stages and human cognitive constructs.
Neural Network Stage | Cognitive Analogue | Characteristics
Dense, randomly initialized network | Controlled processing; novel task engagement (\(G_f\)) | Overparameterized, plastic, resource-intensive, slow convergence
Trained network with defined pathways | Associative stage; consolidation | Weights converged, task-specific pathways strengthened, some redundancy remains
Pruned sparse network (winning ticket) | Automatic processing; crystallized skill (\(G_c\)) | Efficient, fast inference, minimal redundancy, rigid and task-specific

6. Predictions and Testable Hypotheses

If the neural network pruning framework is applicable to the controlled-to-automatic processing transition, I would expect the following observable patterns in human cognition.

6.1. Prediction 1: Three Phases of Learning Efficiency

Learning a new skill should display three distinct phases that parallel the dense-to-sparse pipeline:

(a) An initial learning period characterized by slow performance, high error rates, high metabolic cost (broad cortical activation), and maximal plasticity. This corresponds to the training phase of a dense network, where gradients are large and many parameters are being updated simultaneously.

(b) An intermediate “new expert” period characterized by improved speed and accuracy, reduced but still elevated metabolic cost, and decreasing plasticity. This corresponds to a trained but unpruned network, where task-specific pathways have been identified but redundant connections remain.

(c) A mature “old expert” period characterized by fast, accurate, and metabolically efficient performance, with minimal plasticity and reduced adaptability to task variations. This corresponds to the pruned winning ticket.

These three phases map directly onto the cognitive, associative, and autonomous stages of Fitts and Posner[9], but the neural network framework adds a quantitative prediction: the efficiency gain (measured, for instance, as accuracy per unit of metabolic expenditure) should follow a trajectory resembling the test accuracy versus network sparsity curves reported by Frankle and Carbin[4].
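One way to make this prediction concrete, offered as a sketch under assumptions rather than an established measure: let \(A(t)\) be task accuracy after practice time \(t\) and \(E(t)\) the task-related metabolic expenditure (for example, a PET or fMRI activation measure). The symbols \(\eta\), \(A\), \(E\), and \(s\) are introduced here for illustration only.

```latex
\[
\eta(t) = \frac{A(t)}{E(t)}, \qquad s(t) = 1 - \frac{E(t)}{E(0)}
\]
```

Here \(\eta(t)\) is efficiency and \(s(t)\) is an activation-sparsity index that plays the role of the fraction of weights pruned. On this reading, the prediction is that \(A\) plotted against \(s\) should stay roughly flat, or rise slightly, over a wide range of \(s\) and fall only at extreme sparsity, as in the accuracy-versus-sparsity curves of [4].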

6.2. Prediction 2: Differential Intelligence Profiles

Individuals high in fluid intelligence (\(G_f\)) and those high in crystallized intelligence (\(G_c\)) should display different performance signatures across the three phases.

Individuals with high \(G_f\) should exhibit advantages during the initial learning period — faster convergence, fewer errors, more efficient exploration of the task space. In network terms, they may possess superior “initial weight configurations” or more effective optimization dynamics, enabling them to identify useful pathways more rapidly in a dense parameter space.

Individuals with high \(G_c\) should exhibit advantages during the expert periods — faster retrieval, lower metabolic cost, more efficient execution of well-practiced skills. In network terms, they have successfully pruned their task-relevant circuits, retaining only the most efficient subnetwork.

6.3. Prediction 3: The Role of Selective Inhibition

An important asymmetry between artificial networks and adult human brains warrants attention. Artificial networks begin training with no prior knowledge — their initial weights are random. Adult humans, by contrast, bring a vast repertoire of previously learned representations to every new task. This suggests that a key component of fluid intelligence may not be the ability to learn de novo, but rather the ability to selectively inhibit irrelevant prior knowledge — to, in effect, identify which existing connections should be suppressed when facing a novel problem.

In pruning terms, high-\(G_f\) individuals may excel at rapidly generating a task-appropriate mask \(m\) that suppresses interference from previously learned circuits. This prediction aligns with evidence that working memory capacity — a strong correlate of \(G_f\) — is closely tied to the ability to resist interference from irrelevant information[14].
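The idea of a task-appropriate mask applied to existing knowledge can be illustrated with a deliberately simple sketch. The weight values, the split into relevant and irrelevant connections, and the function name are hypothetical; the only point is that the response is computed from \(m \odot \theta\) rather than from \(\theta\) alone.

```python
def respond(weights, mask, stimulus):
    """Weighted sum of stimulus features, with masked connections suppressed."""
    return sum(w * m * s for w, m, s in zip(weights, mask, stimulus))


# Hypothetical previously learned weights. Connections 0 and 1 are relevant to
# the new task; connections 2 to 4 are habits carried over from older tasks.
prior = [1.0, 0.5, 0.8, -0.6, 0.9]
task_mask = [1, 1, 0, 0, 0]      # selective inhibition of irrelevant circuits
no_inhibition = [1, 1, 1, 1, 1]  # every prior connection contributes

stimulus = [1.0, 1.0, 1.0, 1.0, 1.0]
target = respond(prior, task_mask, stimulus)
unmasked = respond(prior, no_inhibition, stimulus)
interference = unmasked - target
print(f"masked response {target:.2f}, unmasked {unmasked:.2f}, "
      f"interference {interference:.2f}")
```

In this framing, individual differences in \(G_f\) would show up as differences in how quickly and accurately a mask like task_mask is found, not in the prior weights themselves.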

6.4. Prediction 4: Sparsity–Rigidity Trade-off

Winning tickets, while efficient, are identified with respect to the task on which the dense network was trained. Having eliminated the redundant connections that might have supported alternative solution pathways, a pruned network can be expected to adapt less readily to substantially different tasks without retraining.

This predicts an analogous trade-off in human expertise. Highly practiced skills should become both more efficient (lower metabolic cost, faster execution) and more rigid (resistant to modification, vulnerable to disruption by task variations). Expert performance should break down when task parameters change in ways that require re-engaging the pruned-away pathways. This is consistent with the well-documented phenomenon of the “Einstellung effect” — the tendency for experts to apply familiar solutions even when they are suboptimal for the current problem.
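The trade-off can be sketched in an idealized setting where each weight is directly the connection strength a task requires, so that learning reduces to moving each unmasked weight toward its target. This is a strong simplification, and the task vectors, mask, and function names are hypothetical.

```python
def fine_tune(theta, mask, targets, iters=200, lr=0.2):
    """Gradient descent on sum_k (theta_k - target_k)^2; pruned weights stay at zero."""
    for _ in range(iters):
        theta = [(t - lr * 2.0 * (t - y)) * m
                 for t, y, m in zip(theta, targets, mask)]
    return theta


def error(theta, targets):
    """Squared distance between current and required connection strengths."""
    return sum((t - y) ** 2 for t, y in zip(theta, targets))


task_a = [3.0, -2.0, 0.0, 0.0]  # hypothetical practiced task
task_b = [3.0, 0.0, -2.0, 0.0]  # variation that needs connection 2

dense_mask = [1, 1, 1, 1]
sparse_mask = [1, 1, 0, 0]      # connections 2 and 3 were pruned after task A

# The "expert" has mastered task A using only the surviving connections.
expert = fine_tune([0.0] * 4, sparse_mask, task_a)

# Both systems now practice task B, starting from the expert weights.
dense_b = fine_tune(list(expert), dense_mask, task_b)
sparse_b = fine_tune(list(expert), sparse_mask, task_b)

print("expert error on task A:", error(expert, task_a))
print("dense error on task B: ", error(dense_b, task_b))
print("sparse error on task B:", error(sparse_b, task_b))
```

The sparse system matches the dense one on the practiced task but retains a residual error on the variation, because the connection the new task requires is no longer available to be trained.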


7. Discussion

The framework proposed here offers a unified computational vocabulary for phenomena that have been studied in relative isolation: fluid and crystallized intelligence in psychometrics, controlled and automatic processing in cognitive psychology, neural efficiency in cognitive neuroscience, and synaptic pruning in developmental neurobiology. By mapping these phenomena onto the training-and-pruning pipeline of artificial neural networks, I gain not only a useful metaphor but also a source of quantitative predictions and formal tools.

Several limitations should be acknowledged. First, the analogy is structural, not mechanistic. Artificial neural networks learn by backpropagation; biological networks learn through Hebbian plasticity, neuromodulation, and a host of other processes that have no direct counterpart in standard deep learning. The claim is not that the brain implements magnitude-based weight pruning, but that the functional outcome — progressive sparsification in the service of efficiency — is shared.

Second, the lottery ticket hypothesis has primarily been demonstrated in feed-forward and convolutional architectures for supervised classification tasks. The human brain employs recurrent, feedback-rich connectivity and faces far more diverse and open-ended task demands. Extensions of the lottery ticket hypothesis to recurrent networks, reinforcement learning, and continual learning settings would strengthen the analogy considerably.

Third, the neural efficiency hypothesis is itself subject to important moderating variables. Neubauer and Fink[7] report that the inverse relationship between intelligence and cortical activation holds primarily for tasks of low to moderate difficulty and may reverse for highly demanding tasks. This suggests that the “pruned network” mode of operation has boundary conditions — under sufficient task demands, even efficient systems must recruit additional (perhaps previously pruned) resources.

Despite these caveats, the framework generates novel research directions. Computational experiments could directly test whether the relationship between network sparsity and performance recapitulates the relationship between cortical activation sparsity and cognitive efficiency. Neuroimaging studies could examine whether the development of automaticity is accompanied by structural changes (e.g., in white matter connectivity or synaptic density) that parallel the pruning masks identified in artificial networks. Individual-differences research could investigate whether \(G_f\) and \(G_c\) predict differential patterns of network-level reorganization during skill acquisition, as measured by dynamic functional connectivity[17].


References

  1. Cattell, R.B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54(1), 1–22. doi:10.1037/h0046743.
  2. Schneider, W. & Shiffrin, R.M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84(1), 1–66.
  3. Shiffrin, R.M. & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84(2), 127–190.
  4. Frankle, J. & Carbin, M. (2019). The lottery ticket hypothesis: Finding sparse, trainable neural networks. Proceedings of the International Conference on Learning Representations (ICLR). arXiv:1803.03635.
  5. Han, S., Pool, J., Tran, J., & Dally, W.J. (2015). Learning both weights and connections for efficient neural networks. Advances in Neural Information Processing Systems, 28, 1135–1143.
  6. Haier, R.J., Siegel, B.V., Nuechterlein, K.H., Hazlett, E., Wu, J.C., Paek, J., Browning, H.L., & Buchsbaum, M.S. (1988). Cortical glucose metabolic rate correlates of abstract reasoning and attention studied with positron emission tomography. Intelligence, 12(2), 199–217.
  7. Neubauer, A.C. & Fink, A. (2009). Intelligence and neural efficiency. Neuroscience & Biobehavioral Reviews, 33(7), 1004–1023.
  8. Huttenlocher, P.R. (1979). Synaptic density in human frontal cortex — developmental changes and effects of aging. Brain Research, 163(2), 195–205.
  9. Fitts, P.M. & Posner, M.I. (1967). Human Performance. Belmont, CA: Brooks/Cole.
  10. Horn, J.L. & Cattell, R.B. (1966). Refinement and test of the theory of fluid and crystallized general intelligences. Journal of Educational Psychology, 57(5), 253–270.
  11. Horn, J.L. & Cattell, R.B. (1967). Age differences in fluid and crystallized intelligence. Acta Psychologica, 26, 107–129.
  12. McGrew, K.S. (2009). CHC theory and the human cognitive abilities project: Standing on the shoulders of the giants of psychometric intelligence research. Intelligence, 37(1), 1–10.
  13. Deary, I.J., Penke, L., & Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201–211.
  14. Schneider, W. & Chein, J.M. (2003). Controlled & automatic processing: behavior, theory, and biological mechanisms. Cognitive Science, 27(3), 525–559.
  15. Petersen, S.E., van Mier, H., Fiez, J.A., & Raichle, M.E. (1998). The effects of practice on the functional anatomy of task performance. Proceedings of the National Academy of Sciences, 95(3), 853–860.
  16. Chein, J.M. & Schneider, W. (2005). Neuroimaging studies of practice-related change: fMRI and meta-analytic evidence of a domain-general control network for learning. Cognitive Brain Research, 25(3), 607–623.
  17. Bassett, D.S., Wymbs, N.F., Porter, M.A., Mucha, P.J., Carlson, J.M., & Grafton, S.T. (2011). Dynamic reconfiguration of human brain networks during learning. Proceedings of the National Academy of Sciences, 108(18), 7641–7646.
  18. LeCun, Y., Denker, J.S., & Solla, S.A. (1990). Optimal brain damage. Advances in Neural Information Processing Systems, 2, 598–605.
  19. Li, H., Kadav, A., Durdanovic, I., Samet, H., & Graf, H.P. (2017). Pruning filters for efficient convnets. Proceedings of the International Conference on Learning Representations (ICLR).
  20. Zhu, M. & Gupta, S. (2017). To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878.
  21. Huttenlocher, P.R. (1990). Morphometric study of human cerebral cortex development. Neuropsychologia, 28(6), 517–527.
  22. Feinberg, I. (1982). Schizophrenia: Caused by a fault in programmed synaptic elimination during adolescence? Journal of Psychiatric Research, 17(4), 319–334.
  23. Rakic, P., Bourgeois, J.-P., Eckenhoff, M.F., Zecevic, N., & Goldman-Rakic, P.S. (1986). Concurrent overproduction of synapses in diverse regions of the primate cerebral cortex. Science, 232(4747), 232–235.
  24. Bullmore, E.T. & Sporns, O. (2012). The economy of brain network organization. Nature Reviews Neuroscience, 13(5), 336–349.
  25. Sporns, O. & Zwi, J.D. (2004). The small world of the cerebral cortex. Neuroinformatics, 2(2), 145–162.
  26. Haier, R.J., Siegel, B.V., Tang, C., Abel, L., & Buchsbaum, M.S. (1992). Intelligence and changes in regional cerebral glucose metabolic rate following learning. Intelligence, 16(3–4), 415–426.
