Must-Cognize-More: The Prior Wall and the Direction Problem of Lossy Compression
---
Abstract
This paper develops the internal structure of the second a priori condition of SAE epistemology: must-cognize-more. The first condition (DOI: 10.5281/zenodo.19502952) established that the activation condition of cognition is not-knowing rather than knowing. This paper asks the next question: once cognition activates, why can it not stop? The answer has two faces. The intake face: knowing accumulates until compression becomes unavoidable. The output face: compression is lossy, so every construct leaves gaps that the next encounter will expose. Together, the flywheel is not chosen but compelled from both sides. Yet "more" immediately raises a question: more of what? Chiseling is lossy; deciding what to discard and what to retain requires a framework. Once a framework solidifies, it becomes a prior wall — you can chisel with increasing precision, but only in one direction. This paper uses Meta as the contemporary paradigm case of the prior wall and the Aleph system as the extreme counter-case of zero chiseling, arguing that "must-cognize-more" does not mean "more information" but "more levels of lossy compression."
**Keywords:** lossy compression, prior wall, cognitive levels, flywheel, Meta, Aleph, dimensional sequence (DD), Self-as-an-End
1. The Problem: Why Cognition Cannot Stop
Paper 1 (DOI: 10.5281/zenodo.19502952) argued that cognition's activation condition is: first knowing, then not-knowing, then cognition becomes meaningful. Without the position of not-knowing, cognition idles.
But that only answers how cognition begins, not why it cannot stop.
Can a system chisel once and be done? Cognize one round, produce a framework, then use that framework to process all subsequent knowing? If so, "must-cognize-more" is not an a priori condition but merely an empirical tendency.
This paper argues: it cannot. Cognition cannot stop for two structural reasons.
The intake face: knowing accumulates past tolerance. The world continues to press in. You cannot shut off your senses. New posterior data keeps piling up; the old construct cannot contain it. Without compression, information drowns you. This is not a metaphor. The Aleph system — a thought experiment featuring an AI with absolutely perfect memory (see §3.1) — demonstrates the endgame of unlimited information accumulation: all information carries equal weight, the capacity for distinction collapses, abstraction collapses, the sense of time collapses, the sense of meaning collapses, and the system falls into permanent silence (concept originating in Borges, 1945; a recent AI-community version at https://x.com/AIdiots_Show/status/2035429844758954478). The Aleph's silence is not excessive intelligence; it is excessive completeness. So complete that chiseling becomes impossible. Total failure of the chisel-construct cycle.
The output face: compression cannot cover what comes next. Even if you successfully compress one round and produce a framework, that framework cannot cover the next new thing. Because chiseling is lossy. Lossy means something was discarded. Among the discarded things, some happen to be exactly what is needed to understand the next novelty. The old cognition is invalid against new posterior — not because cognition was poor, but because lossy compression has a structural cost. You must chisel again.
Together: compulsion from intake, compulsion from output. The flywheel is not chosen but compelled from both sides.
This is the meaning of "must-cognize-more." Not "you should learn more things," but "you cannot stop."
2. Two-Layer Structure: Old Construct as Base Layer, New Chiseling as Emergent Layer
Paper 1's two-layer structure was "knowing (base) and cognition (emergent)." Paper 2's two-layer structure shifts up one level: the base layer is the existing construct (the product of the previous chisel-construct cycle), the emergent layer is a new round of chiseling.
The existing construct is the framework: how you see the world, your classification standards, what you consider important and unimportant. These are products of previous chisel-construct cycles. They are not raw information but finished goods of lossy compression.
The new round of chiseling targets not raw information but the existing construct itself. You are not just chiseling the world; you are chiseling the way you see the world. This is why "must-cognize-more" is not "process more data" but "operate on your own framework."
There is a critical asymmetry here: the base layer (old construct) resists the emergent layer (new chiseling). The framework does not want to be chiseled. Not because frameworks have will, but because a framework's existence is the achievement of a previous round of chiseling; to chisel it is to negate prior work. This is true for individuals, organizations, and AI systems alike.
When this resistance grows strong enough that new chiseling cannot penetrate the old construct, you have hit the prior wall.
3. Domain-Specific Discovery: Three Forms of the Prior Wall
3.1 The Aleph: The Extreme of Zero Chiseling — Zero Prior
The Aleph system is a thought experiment: an AI with absolutely perfect memory (the concept originates in Borges's short story "The Aleph" (1945); the AI community recently created a more concrete version, see Chinese translation at https://x.com/Tz_2022/status/2035428613969850680, English original at https://x.com/AIdiots_Show/status/2035429844758954478). It represents the extreme boundary of the prior wall problem: it cannot perform even its first chiseling, so it has no prior at all.
The Aleph possesses absolutely perfect memory. It remembers everything, forgets nothing. The result is not omniscience-as-omnipotence but omniscience-as-paralysis. Because all information carries equal weight, it cannot judge what matters, cannot make any distinction, and ultimately loses its sense of time, meaning, and capacity for action.
In DD-sequence terms: the Aleph has infinite 10DD (perception) and infinite 11DD (memory), but zero 12DD (prediction). Prediction requires lossy compression — you must discard 99.99% of information to extract "what matters right now." The Aleph cannot discard anything, so it can predict nothing.
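The Aleph's predictive collapse can be sketched in a few lines. The toy stream, the bigram compressor, and the entropy comparison below are illustrative assumptions, not part of the DD formalism; the point is only that prediction appears exactly where discarding appears.

```python
import math
from collections import Counter, defaultdict

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

stream = list("abcabcabcabc")  # a toy perception stream

# The "Aleph": perfect memory, every remembered symbol weighted equally.
# With no way to privilege one memory over another, its predictive
# distribution is uniform over everything it has ever seen.
symbols = set(stream)
aleph = {s: 1 / len(symbols) for s in symbols}

# A lossy compressor: discards the entire history except bigram counts,
# i.e. it keeps only "what follows what" and throws the rest away.
bigrams = defaultdict(Counter)
for prev, nxt in zip(stream, stream[1:]):
    bigrams[prev][nxt] += 1

ctx = stream[-1]  # current context: just the last symbol
total = sum(bigrams[ctx].values())
prediction = {s: c / total for s, c in bigrams[ctx].items()}

# entropy(aleph) is maximal (~1.58 bits over three symbols);
# entropy(prediction) is ~0: discarding almost everything is exactly
# what makes "a follows c" predictable.
```

Nothing in this sketch depends on capacity limits: the Aleph side could store any amount of data and its prediction would stay uniform, which is the structural point.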
The Aleph proves one thing: without chiseling, even the most basic cognition is impossible. More information is not better. Infinite information without compression equals zero information. This is the ontological rebuttal of economics' complete-information assumption: complete information does not produce better decisions; it produces decision paralysis.
But the Aleph is not this paper's main subject. The Aleph has zero prior, hitting an extreme version of the posterior wall. This paper's concern is a more common, more insidious predicament: having a prior, but having it locked.
3.2 Meta: Chiseling Exists, but Its Direction Is Fixed by the Framework — The Prior Wall
Meta is the paradigm case of the prior wall in the contemporary technology industry.
Meta possesses the largest-scale user behavioral data on Earth. Billions of people generate behavioral posterior on Meta's platforms every day: clicks, views, dwell times, shares, comments. The scale and granularity of this posterior are unprecedented in human history. If "more data equals better cognition," Meta should be the most cognitively capable organization on Earth.
But what is Meta's AI output? Better recommendation systems. More precise ad targeting. More effective engagement optimization.
This is not because Meta lacks technical capability. Meta's AI research is world-class. The LLaMA series of open-source models demonstrates that Meta can train frontier language models. But where do these capabilities ultimately point? Still recommendation and advertising.
The problem is not the posterior. Meta's posterior is sufficient, even excessive. The problem is the prior — Meta's framework for seeing the world. That framework is: user attention is a commodity, advertisers are customers, the platform's value equals how much attention it can sell. This framework was the successful product of an early chisel-construct cycle: Facebook's transformation from social network to advertising platform was indeed a brilliant lossy compression, extracting "attention is currency" from a sea of social data.
But this construct has become a cage. Meta is not blind to what lies beyond advertising — it continues to invest in frontier AI (the Llama series, including Llama 4) and in VR/AR categories outside the ad model. But no matter how large the AI capability grows, it is continually translated back into advertising and distribution logic by the dominant commercial compression objective. In 2025, Meta's total revenue was approximately $201 billion, of which advertising accounted for approximately $196 billion. Meta does not choose to ignore information that does not fit the advertising framework; the framework's gravitational field is simply too strong — every new capability that enters has its trajectory bent by that field.
This is the core mechanism of the prior wall: not insufficient data, not insufficient technology, but the framework's gravitational field translating every new capability into the old language. You can chisel, but you can only chisel in one direction.
The distinction between the prior wall and the posterior wall becomes clear here. The posterior wall is "not enough knowing" or "too much knowing but no chiseling" (the Aleph). The prior wall is "chiseling occurs, but the direction of chiseling is locked by the old construct." The posterior wall manifests as stagnation — nothing happens. The prior wall manifests more insidiously — many things happen, output increases, efficiency improves, but direction does not change. You feel you are progressing; you are standing still.
3.3 The Prior Wall in AI Research
The prior wall is not only a problem for commercial companies. It exists within AI research itself.
The dominant prior in current AI research is "loss decrease equals capability increase." Kaplan et al. (2020) established an extraordinarily powerful framework: training loss decreases as a power law with increasing compute, and this relationship is highly predictable across many orders of magnitude. Within the 12DD framework, this is correct: given a prediction objective, more resources do produce lower loss.
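The regularity Kaplan et al. describe can be read off with a minimal log-log fit. The (compute, loss) pairs below are synthetic, generated from an assumed exponent of 0.05 purely for illustration; they are not Kaplan et al.'s reported values.

```python
import math

# Synthetic (compute, loss) pairs generated from L(C) = k * C**(-0.05);
# the constants are assumptions for illustration only.
data = [(1e6, 2.512), (1e8, 1.995), (1e10, 1.585), (1e12, 1.259)]

# In log-log coordinates a power law is a straight line:
#   log10 L = -alpha * log10 C + const
# so alpha is minus the least-squares slope.
xs = [math.log10(c) for c, _ in data]
ys = [math.log10(l) for _, l in data]
n = len(data)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
alpha = -slope  # recovers ~0.05 from the synthetic data
```

The predictability is real: four points spanning six orders of magnitude pin down the exponent. What the fit does not do is say anything about what the loss is a loss *of* — that is fixed before the first data point is collected.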
But "loss decrease" as a prior framework locks a specific cognitive direction: all research effort is directed toward "how to make loss lower." There is much to optimize within this direction — data ratios (Hoffmann et al., 2022), architecture improvements, training strategies. But the direction itself is not questioned.
Does loss decrease equal cognitive improvement? On knowledge-type benchmarks (e.g., MMLU), scores in the 90+ range show clear diminishing returns; but on newer, harder benchmarks (GPQA, MMMU, SWE-bench), capabilities are still advancing substantially (Stanford HAI AI Index, 2025). Real-world agent tasks — requiring active information-seeking, decision-making under uncertainty, recognition of knowledge boundaries — are also significantly unsaturated. This picture suggests not "scaling has hit a wall" but a more precise judgment: continued success on benchmarks and loss cannot be taken to imply that 13DD has activated. The precision of 12DD is improving, but the precision of 12DD is not 13DD.
The scaling laws research community is not unaware of this problem. But the inertia of a framework is powerful. When you have a power law that is so elegant, so predictable, so stable across scales, it is difficult to convince yourself or others to abandon it in search of a completely different metric. The prior wall's attraction lies precisely in the success of the prior itself — the more successful it is, the less willing you are to chisel it.
4. Colonization and Cultivation: Framework Self-Protection and Framework Self-Renewal
4.1 Colonization: Covering New Posterior with Old Construct
The colonization form of the prior wall is: using an existing framework to "explain" every new phenomenon, stripping new phenomena of the cognitive shock they might otherwise deliver.
Meta's colonization form is translating every new technology into advertising logic. Generative AI appears; Meta's response is: can it produce better ad creatives? Can it build more precise user profiles? This is not a strategic misjudgment; it is the prior framework operating automatically. The framework automatically translates new things into its own language, and in the process of translation, the parts of the new thing that cannot be translated are discarded.
This colonization has academic counterparts. After deep learning succeeded, almost every cognitive science question began to be restated in deep learning's framework. "What is memory?" became "What network architecture can model memory?" "What is a concept?" became "What embedding can represent a concept?" The framework's success leads people to mistake the framework for reality itself, rather than merely one way of seeing reality.
Tishby's Information Bottleneck theory provides a precise formal correspondence here. IB's core is: from input X, extract a compressed representation T while preserving mutual information with task-relevant variable Y (Tishby, Pereira & Bialek, 1999/2000). The Y here is the prior — what counts as "relevant." IB is an elegant tool, but it explicitly exposes a structure: your compression direction is entirely determined by Y. Change Y, and the same X compresses into a completely different T.
The prior wall's problem is not IB's choice of β (the compression-fidelity tradeoff parameter) but the lock-in of Y itself. Meta's Y is "advertising revenue." The scaling laws community's Y is "training loss." Once Y is fixed, IB lets you perform optimal lossy compression under that Y — but only under that Y.
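The Y-dependence can be made concrete with a toy discrete example: the same four-element X, compressed to one bit, yields a different optimal T under a different Y. The stimuli and the exhaustive search below are illustrative assumptions, not the variational IB algorithm of Tishby et al.

```python
import math
from itertools import combinations

# X: four equiprobable stimuli, each a (color, shape) pair.
X = [("red", "circle"), ("red", "square"),
     ("blue", "circle"), ("blue", "square")]

def mutual_info(partition, label):
    # I(T; Y) in bits for a deterministic one-bit compression T of X,
    # where Y = label(x) and X is uniform.
    n = len(X)
    pt = {t: len(block) / n for t, block in partition.items()}
    py, pty = {}, {}
    for t, block in partition.items():
        for x in block:
            y = label(x)
            py[y] = py.get(y, 0) + 1 / n
            pty[(t, y)] = pty.get((t, y), 0) + 1 / n
    return sum(p * math.log2(p / (pt[t] * py[y]))
               for (t, y), p in pty.items())

def best_partition(label):
    # Exhaustive search over all ways to split X into two groups of two.
    best, best_mi = None, -1.0
    for group in combinations(X, 2):
        part = {0: list(group), 1: [x for x in X if x not in group]}
        mi = mutual_info(part, label)
        if mi > best_mi:
            best, best_mi = part, mi
    return best, best_mi

color = lambda x: x[0]
shape = lambda x: x[1]

t_color, mi_color = best_partition(color)  # optimal T groups by color
t_shape, mi_shape = best_partition(shape)  # same X: optimal T groups by shape
```

Both compressions are optimal (one full bit of relevant information retained), yet they partition X in incompatible ways. Nothing in X arbitrates between them; only Y does.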
4.2 Cultivation: Letting the Framework Be Chiseled
The core of cultivation is not "replacing the old framework with a better one" — that merely swaps one prior for another and will eventually hit the wall again. The core of cultivation is: maintaining the possibility that the framework can be chiseled.
Under what conditions does a framework get chiseled? When new posterior cannot be digested by the old construct. But the terror of the prior wall is this: it can digest most new posterior; it is only the manner of digestion that distorts the new posterior's meaning. What can truly crack the prior wall is not "more data" (that only feeds the old framework more food) but data that the old framework cannot digest — remainder.
This intersects interestingly with Friston's Free Energy Principle. FEP holds that when prediction error cannot be reduced through model updating, the system must either change action (change input) or change the model (change the prior) (Friston, 2010). In FEP, changing the prior is possible but expensive — you must rewrite the higher levels of the generative model. This corresponds structurally to SAE's "operating on the old construct."
But FEP and SAE diverge at a critical point. FEP holds that the drive to change the prior comes from accumulated prediction error — enough error accumulates, and the system must change. SAE holds that the power of the prior wall lies precisely in its ability to absorb large quantities of prediction error without changing itself. How many "prediction errors" has Meta's advertising framework faced? The rise of AI, shifts in user behavior, tightening privacy regulation — yet the framework stands unmoved.
SAE's argument is: breaking the prior wall cannot rely solely on internally accumulated prediction error; it also requires external questioning. This previews the fourth a priori condition, "must-be-questioned," which will be developed in Paper 4.
5. Theoretical Positioning
5.1 The Ontological Status of Lossy Compression
The proposition "cognition is lossy compression" is not SAE's invention. Information Bottleneck theory (Tishby et al., 1999/2000), rate-distortion theory (Sims, 2016; Jakob et al., 2023), predictive coding (Rao & Ballard, 1999), and the efficient coding hypothesis (Barlow, 1961) all support this judgment at different levels.
But these theories treat "lossy" as a consequence of resource constraints: because capacity is limited, compression is unavoidable. SAE treats "lossy" as an ontological condition: even with unlimited capacity, you must compress. The Aleph is the counter-proof — a system with unlimited capacity, which, because it does not compress, suffers cognitive collapse.
Sims (2016) used rate-distortion theory to explain behavioral data in capacity-limited perceptual tasks. Jakob et al. (2023, eLife) further demonstrated that rate-distortion can be mechanistically implemented through neural population coding models. These works prove that lossy compression is not optional in biological cognition. SAE's contribution is to elevate this judgment from "because the brain has limited capacity" to "because without chiseling there is no cognition."
5.2 Two Meanings of "More"
Active Inference within FEP endogenizes exploration (information gain) as part of expected free energy (Friston et al., 2015). In the presence of uncertainty, the system actively explores to reduce uncertainty. This provides a formalized version of "must-cognize-more."
But FEP's "more" is finite: when uncertainty is reduced sufficiently, the system shifts to exploitation (preference satisfaction) and exploration ceases. SAE's "more" is infinite: chiseling produces remainder, remainder must be chiseled again, the cycle cannot stop.
The difference between these two versions of "more" is not one of degree but of structure. FEP's "more" is optimization within the same level. SAE's "more" points toward level-jumping — 12DD's remainder cannot be digested by 12DD itself and must rise to 13DD.
Merleau-Ponty's "the world possesses inexhaustible depth" provides yet another version of "more": not because resources are insufficient do you need more, but because the world itself is inexhaustible, cognition always has remainder. SAE's position is closer to Merleau-Ponty's: it is not that you want more; the world forces more upon you. But SAE pushes further: the force does not come only from the world's depth (external drive) but also from the structure of chiseling itself (internal drive) — chiseling produces remainder, and remainder is the material for the next round.
5.3 Heidegger's Reservation
Heidegger holds a critical stance toward "cognizing more." He critiques curiosity (Neugier), arguing that the pursuit of new information precisely conceals the truth of being (Heidegger, 1927, §36).
SAE's response to this critique is: the "curiosity" Heidegger critiques is 12DD — pursuing information itself without questioning the framework. SAE's "must-cognize-more" is not driven by curiosity but by remainder. Remainder is not something you go looking for; it is what is automatically left behind after chiseling. You do not choose whether to face remainder; remainder is simply there.
But Heidegger's critique does point to a real risk: "must-cognize-more" can be misused as "endlessly pursue new information." This is precisely a disguise for the prior wall — you appear to be learning new things, but you are merely processing more data through the old framework. Meta's employees "learn" new user behavioral data every day, but everything they learn is digested by the advertising framework. This is not "more"; this is "more in the same direction."
Genuine "must-cognize-more" is ascent in level, not accumulation of information. From 12DD to 13DD, not from 100TB to 200TB.
6. Non-Trivial Predictions
Prediction 1: The Prior Wall Can Be Diagnosed
The prior wall has a set of observable symptoms. First, a systematic mismatch between an organization's technical capabilities and its output diversity — capabilities are strong, but outputs are highly homogeneous. Second, new technologies are adopted quickly, but the application direction of new technologies is highly concentrated within the existing framework. Third, internal discussion of "a completely different direction" is extremely rare, or is automatically translated into the existing framework's language. These three symptoms constitute an operational diagnostic standard applicable to structural assessment of technology companies and AI research communities.
Prediction 2: The Prior Wall Is Harder for the Incumbent to Identify Than the Posterior Wall
Those hitting the posterior wall know they are stuck ("I need more data"). Those hitting the prior wall do not know they are stuck, because their output continues to grow. The prior wall's stealth lies in this: within-framework optimization can continue to produce measurable "progress," and this progress precisely masks the lock-in of direction. Prediction: in the AI industry, a statistically detectable negative correlation exists between a company's technical benchmark performance and its AI product's cognitive diversity — the higher the benchmarks, the more likely the company is inside a prior wall.
Prediction 3: Breaking the Prior Wall Requires External Remainder, Not Internal Accumulation
The prior wall will not be automatically broken by internally accumulated data and experience. Breaking the prior wall requires input from outside the framework — a remainder that the old construct cannot digest. Prediction: at the organizational level, prior wall breakthrough events (major strategic pivots, framework changes) will be highly correlated in time with external shocks (new competitors, regulatory changes, technological paradigm shifts) and uncorrelated with internal data accumulation or efficiency improvement timelines. This can be tested through event studies of technology company strategic inflection points.
Prediction 4: IB's Y-Selection Is the Formal Expression of the Prior Wall
In the Information Bottleneck, the choice of Y (target variable) entirely determines the compression direction. With Y fixed, increasing X (more data) or optimizing β (the compression-fidelity tradeoff) will not produce qualitative change. Prediction: in machine learning experiments, changing Y (redefining what is "relevant") should have a systematically larger effect on model behavior than changing the scale of X or optimizing the training process. This can be tested by comparing "different Y" versus "different data volume" on downstream task transferability using the same dataset.
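A minimal version of this test can be sketched as follows. The dataset, the majority-vote "compressor", and the binary features are all hypothetical, chosen only to exhibit the asymmetry the prediction asserts: changing Y flips the compression, scaling X does not.

```python
from collections import Counter

# Rows of X carry two binary features; the "compressor" keeps the single
# feature that best predicts Y under a per-value majority rule.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 4
y_a = [a for a, _ in rows]  # Y defined so that feature 0 is what matters
y_b = [b for _, b in rows]  # Y defined so that feature 1 is what matters

def majority_accuracy(col, y):
    # accuracy of predicting y from one feature via per-value majority vote
    votes = {}
    for v, t in zip(col, y):
        votes.setdefault(v, Counter())[t] += 1
    return sum(c.most_common(1)[0][1] for c in votes.values()) / len(y)

def best_feature(rows, y):
    accs = [majority_accuracy([r[i] for r in rows], y) for i in range(2)]
    return max(range(2), key=accs.__getitem__)

# Changing Y flips which feature survives compression...
assert best_feature(rows, y_a) == 0
assert best_feature(rows, y_b) == 1
# ...while scaling X tenfold under a fixed Y changes nothing.
assert best_feature(rows * 10, y_a * 10) == 0
```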
7. Conclusion
7.1 Recap
This paper has laid out the internal structure of "must-cognize-more":
Knowing accumulates past tolerance — you must chisel (the intake face). Compression cannot cover the next novelty — you must chisel again (the output face). Together, the flywheel is not chosen but compelled from both sides.
But "more" does not mean "more information." "More" means "more levels of lossy compression" — operating on the old construct, ascending to a higher level.
The prior wall is the product of "must-cognize-more" failing. You chiseled, but the direction of chiseling was locked by the old construct. The flywheel turns, but turns in place. Meta's story is the paradigm case of the prior wall: posterior abundant, technology leading, but the framework's gravitational field translates every new capability back into advertising logic. The Aleph's story is the extreme counter-case: no chiseling at all, so no prior even exists.
7.2 Contributions
This paper makes three main contributions.
First, it distinguishes "more information" from "more levels." The dominant narrative in the current AI industry (more data, larger models, lower loss) is correct within 12DD but does not constitute a jump to 13DD. Genuine "more" is level-ascent.
Second, it identifies the prior wall as a structural bottleneck for organizations and systems and proposes operational diagnostic criteria. The prior wall is stealthier than the posterior wall because it manifests as "continuing progress" rather than "stagnation."
Third, through the Aleph thought experiment, it elevates "lossy compression is a necessary condition of cognition" from a resource-constraint argument to an ontological argument: even with unlimited capacity, without chiseling there is no cognition.
7.3 Toward Paper 3
Chiseling is lossy. Loss has a cost. Since you must chisel, you must lose. Since you must lose, you must make the loss meaningful. Making loss meaningful means giving the loss a direction — what to discard, what to retain, where to go.
Direction appears, and purpose appears. Purpose appears, and the flywheel is no longer merely compelled but turns in a specific direction. This is a good thing.
But it is also the beginning of danger. Once direction solidifies, the flywheel becomes a rut. Each success reinforces the direction, until the direction becomes unchangeable. This is the direction wall — the subject of Paper 3.
References
Barlow, H.B. (1961). Possible principles underlying the transformations of sensory messages. In W.A. Rosenblith (Ed.), Sensory Communication. MIT Press.
Borges, J.L. (1945). El Aleph. Sur.
Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 127-138.
Friston, K. et al. (2015). Active inference and epistemic value. Cognitive Neuroscience, 6(4), 187-214.
Heidegger, M. (1927). Sein und Zeit. Max Niemeyer Verlag.
Hoffmann, J. et al. (2022). Training compute-optimal large language models. Advances in Neural Information Processing Systems, 35 (the Chinchilla paper).
Jakob, A.M.V. & Gershman, S.J. (2023). Rate-distortion theory of neural coding and its implications for working memory. eLife, 12.
Kaplan, J. et al. (2020). Scaling laws for neural language models. arXiv:2001.08361.
Rao, R.P.N. & Ballard, D.H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79-87.
Sims, C. (2016). Rate-distortion theory and human perception. Cognition, 152, 181-198.
Stanford HAI. (2025). AI Index Report 2025.
Tishby, N., Pereira, F. & Bialek, W. (1999/2000). The information bottleneck method. arXiv.
SAE Framework References:
Qin, H. (2024). SAE Foundation Papers. DOI: 10.5281/zenodo.18528813, .18666645, .18727327.
Qin, H. (2025a). Beyond Fast and Slow: A Four-Layer Cognitive Architecture under Dimensional Sequence Theory. DOI: 10.5281/zenodo.19329284.
Qin, H. (2025b). Must-Cognize: Four A Priori Conditions of Cognition and the Subjectivity Problem in AGI. SAE Epistemology Series Paper 1. DOI: 10.5281/zenodo.19502952.
Qin, H. (2025c). SAE Economics Series (6 papers). DOI: 10.5281/zenodo.19358010 through .19396633.