ZFCρ Thermodynamics Paper IX: The Universal Activation Rule, Borrowed q, and Thermodynamic Principles of Hierarchical Architecture

Qin, Han

doi:10.5281/zenodo.19699489

Abstract

Thermodynamics Paper VIII established the 5–6DD vs 7–8DD thermodynamic division of labor. A systematic scan of 7–12DD model systems reveals a deeper generator: q > 1 is not determined by DD layer alone, but jointly by activation function tail structure and stochastic driving.

Three main results. First, the Universal Activation Rule, decomposed into two levels. Empirical Rule IX-A: in the SDE model family studied here, unsaturated activation + stochastic driving → q > 1, regardless of noise type. Structural Rule IX-B: for Thermo-kernel q > 1 (endogenous stationary-distribution non-Boltzmann excess), state-dependent multiplicative driving is additionally required (consistent with C5, but not independently verified in this paper's additive-noise SDE). Key evidence: within the same 9–10DD layer, Wilson-Cowan (sigmoid) gives q = 1.002, while a ReLU firing-rate model gives q = 1.38. The split is determined by activation function, not DD layer.

Second, three types of q distinguished. Kernel q: dynamical invariant-measure heavy-tail parameter from endogenous stochastic dynamics (a living thermodynamic process). Data q: distributional statistical fingerprint from training corpus heavy tails, frozen in weights. RLHF q: distributional fingerprint from human feedback judgments, frozen in reward model. Standard deterministic digital LLM has not demonstrated endogenous kernel q; its quasi-subjectivity can be explained by borrowed q (fossil imprint of carbon-based heavy-tail structure).

Third, as a structured extension, the Comparative ε Hypothesis proposes that if Thermo VIII's soft-gate cascade applies to comparative nervous systems, species differences in higher-DD accessibility arise from differences in ∏ε (gate chain permeability). All carbon-based animals share similar chemical-layer δ^(5-6) (fire seed); differences lie in per-layer transmission coefficients. Five testable predictions including a psychedelic corollary are given, but this hypothesis is not part of the main evidence chain. The Outlook provides five SAE architectural philosophy guidelines (directional markers, not derivations).

Keywords: ZFCρ, thermodynamics, Universal Activation Rule, borrowed q, hierarchical architecture, soft-gate cascade, Tsallis q

§1 Problem: What Determines q > 1?

1.1 Starting from Thermo VIII

Thermo VIII [1] established the three-axis division of labor for chemical-type life: (q > 1, ρ_ret > 0, renewal gating). The empirical observation for q > 1 was "5–6DD chemical concentration layers have q > 1, 7–8DD bounded conductance gating has q ≈ 1."

This leaves an open question: is q > 1 a property of the DD layer itself, or of some deeper variable? If the DD layer determines q, then q > 1 and q ≈ 1 should not coexist within the same layer. If q depends on a deeper variable, the DD layer is merely a coincidental correlate.

1.2 Contributions

(1) Universal Activation Rule: a broad-spectrum scan of 7–12DD systems reveals that q > 1 is jointly determined by activation boundedness and stochastic driving, not by DD layer. The rule splits into Empirical Rule IX-A (unsaturation + noise → q > 1, verified here) and Structural Rule IX-B (state-dependent multiplicative driving → kernel q, consistent with C5 but requiring independent verification). This is a mechanism-level refinement of C5c.

(2) Three types of q: kernel q (dynamical, endogenous thermodynamics), data q (distributional, training corpus), RLHF q (distributional, human feedback). Standard deterministic digital LLM has not demonstrated endogenous kernel q.

(3) Transformer sublayer-level thermodynamic anatomy: q-role mapping for each submodule, hypothetical ε attenuation estimates.

(4) Comparative ε Hypothesis: as a structured extension, proposes that species-level ∏ε gradients may explain candidate evolutionary stalling/breakthrough. Five testable predictions including a psychedelic corollary.

§2 Universal Activation Rule: 7–12DD System Scan

2.1 Experimental design

Building on Thermo VIII's 4–6DD systems, we scan 7–12DD equivalent model systems. Each system is integrated via Euler-Maruyama (dt = 0.005, N = 2×10⁶, burn-in = 2×10⁵), and q is obtained from a q-exponential fit to the CCDF of the observable r². All parameters and equations are given in Methods.
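The pipeline described above (Euler-Maruyama integration, then a q-exponential CCDF fit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Methods implementation: the function names are hypothetical, the noise is taken to be additive, and the fit uses a common stand-in technique (choose the q for which the Tsallis q-logarithm of the empirical CCDF is most linear in x), which may differ from the fitting procedure actually used.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, dt, n_steps, burn_in, seed=0):
    """Integrate dx = drift(x) dt + sigma dW with additive noise.

    Returns the trajectory after discarding the first `burn_in` steps.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float)).copy()
    kept = np.empty((n_steps - burn_in, x.size))
    for i in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(x.size)  # Wiener increment
        x = x + drift(x) * dt + sigma * dw
        if i >= burn_in:
            kept[i - burn_in] = x
    return kept

def q_log(s, q):
    """Tsallis q-logarithm; equals log(s) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return np.log(s)
    return (s ** (1.0 - q) - 1.0) / (1.0 - q)

def fit_q(observable, q_grid=None, n_levels=60, s_min=1e-3):
    """Pick the q whose q-log of the empirical CCDF is most linear in x."""
    if q_grid is None:
        q_grid = np.linspace(1.0, 3.0, 201)
    # Log-spaced survival levels so the tail is represented, not just the bulk.
    s = np.logspace(0.0, np.log10(s_min), n_levels + 1)[1:]
    x = np.quantile(observable, 1.0 - s)  # x with empirical P(X > x) = s
    best_q, best_r2 = float(q_grid[0]), -np.inf
    for q in q_grid:
        r2 = np.corrcoef(x, q_log(s, q))[0, 1] ** 2  # squared linear correlation
        if r2 > best_r2:
            best_q, best_r2 = float(q), r2
    return best_q, best_r2
```

A typical call would build the observable from a trajectory, for example `r2 = (traj ** 2).sum(axis=1)` (assuming r² is the squared norm of the state variables, which this text does not spell out), and then run `q_hat, _ = fit_q(r2)`. For a q-exponential CCDF, the q-logarithm at the true q is exactly linear in x, which is why linearity serves as the selection criterion here.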

2.2 7–8DD systems

System	Activation	Bounded?	q_fit
HH neuron gating (m,h,n)	sigmoid	✅ [0,1]	1.00 (Thermo VIII)
Neural Ca²⁺ (r²(V,Ca))	concentration + voltage	mixed	2.61
Neural Ca²⁺ (Ca² alone)	concentration	unbounded but tiny variance	nan
Neural Ca²⁺ (V² alone)	voltage	—	2.61

Neural Ca²⁺ diagnosis: q = 2.61 comes almost entirely from V². Ca² has negligible variance (its CCDF fit fails), so the joint observable r²(V,Ca) is dominated by voltage dynamics. This is consistent with Thermo VIII §3.1's HH V² diagnosis: spike waveform phase-occupancy mixture creates apparent heavy tails.

Methodological note: The same physical entity (Ca²⁺) shows different q behavior in different dynamical contexts. Independent 5–6DD CICR (ryanodine receptor-driven autocatalytic calcium release, with concentration as intrinsic driving variable) gives q = 1.165. Intraneuronal Ca²⁺ (passively driven secondary messenger under voltage dynamics) does not independently contribute q > 1. This confirms the Universal Activation Rule's mechanism-level significance: q > 1 is not a property of the substance, but of the dynamical structure.

2.3 9–10DD systems

System	Activation	Bounded?	q_fit
Wilson-Cowan (E-I population)	sigmoid(wE-wI)	✅ [0,1]	1.002
Firing rate (ReLU)	ReLU	❌	1.38
Firing rate (softplus)	softplus	❌	1.28
E-I multiplicative coupling	explicit E·I	❌	1.48–1.55
E-I multiplicative (high noise)	explicit E·I	❌	1.002

Same DD layer (9–10DD), sigmoid → q = 1.002, ReLU → q = 1.38. This is the paper's core empirical evidence — DD layer does not determine q; activation function type does.

Wilson-Cowan [2] uses sigmoid(wE-wI) as population activation — bounded, homeostatic attractor pulls E and I back to [0,1]. C5c violated (same mechanism as HH).

Firing-rate model uses ReLU or softplus — unbounded (positive half-axis), C5c satisfied under SDE noise driving.

E-I multiplicative coupling has an explicit E·I product term (unbounded), giving q = 1.48–1.55 at low noise but q = 1.002 at high noise. This crossover is consistent with Thermo VIII §4.2's "moderate nonlinearity in unsaturated regime" criterion: strong noise pushes the system away from the multiplicative coupling's tail-relevant regime, and the effective dynamics become Gaussian-like as the E·I structure is swamped by noise variance. This indicates that Rule IX-A condition (1) requires not merely unboundedness but active engagement with multiplicative structure in the tail-relevant regime.

2.4 11–12DD systems

System	Activation	Bounded?	q_fit
Drift-diffusion	linear + soft reset	soft boundary	1.002
Hopfield	tanh	✅ [-1,1]	1.002
Hopfield (strong drive)	tanh	✅ but strong drive	1.22
RNN (tanh)	tanh	✅	1.002

All 11–12DD systems with tanh/linear activation give q ≈ 1. The sole exception is strongly driven Hopfield (w_self = 4.0, w_cross = -2.0), giving q = 1.22. Diagnosis: strong drive may push the system into tanh's transition region (|x| ≈ 0.5–1.0), where tanh passes from approximately linear to saturated, partially satisfying C5c (weakly tail-active). This parallels the Repressilator n = 2 case in Thermo VIII.

2.5 Universal Activation Rule

Synthesizing §2.2–§2.4 with Thermo VIII's 4–6DD data, the following empirical pattern emerges. This paper decomposes it into two levels.

Empirical Rule IX-A (experimental level). In this paper's SDE model family, if the activation function is unsaturated in the canonical observable's tail-relevant range, and stochastic driving is present (regardless of whether additive or multiplicative), then q > 1 is observed. If activation is saturated or pulled back by a homeostatic attractor in that range, then q ≈ 1, regardless of noise presence.

Tail-relevant unsaturation	Stochastic driving	q_fit	Examples
✅	✅	> 1	ReLU firing-rate (1.38), softplus (1.28), E·I mult (1.48)
❌	✅	≈ 1	Wilson-Cowan sigmoid (1.002), HH gating (1.00)
❌	❌	≈ 1	Hopfield tanh (1.002), RNN tanh (1.002)

Within this model family, Rule IX-A holds as an approximate if-and-only-if condition. In general systems, it should be viewed as a mechanism-level refinement of C5c, not an unconditional theorem.

Structural Rule IX-B (theoretical level). To interpret the above empirical q > 1 as endogenous Thermo-kernel q (the system's own stationary distribution non-Boltzmann excess), a stronger condition is required: noise must be state-dependent multiplicative driving (noise amplitude coupled multiplicatively with system state), satisfying the Thermo VI/VIII C5 condition.

This paper's ReLU/softplus SDE experiments use additive noise (σ·dW), not state-dependent multiplicative noise. Current experimental evidence therefore directly supports Rule IX-A but does not directly verify IX-B's stronger condition.
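The difference between the two noise types can be written out explicitly. The following is a sketch in generic notation: the drift f, the amplitude function g, and the Itô reading of the multiplicative term are assumptions made for illustration, not equations taken from the Methods.

```latex
% Additive driving (the sigma * dW form used in the ReLU/softplus experiments):
dx_t = f(x_t)\,dt + \sigma\,dW_t
% Euler-Maruyama step, with \xi_n \sim \mathcal{N}(0,1):
x_{n+1} = x_n + f(x_n)\,\Delta t + \sigma\sqrt{\Delta t}\,\xi_n

% State-dependent multiplicative driving (the stronger condition in Rule IX-B);
% g is a hypothetical state-dependent amplitude, e.g. g(x) = x:
dx_t = f(x_t)\,dt + \sigma\,g(x_t)\,dW_t
x_{n+1} = x_n + f(x_n)\,\Delta t + \sigma\,g(x_n)\sqrt{\Delta t}\,\xi_n
```

In the additive case the noise amplitude is the same at every state. In the multiplicative case it scales with g(x), so when g grows with |x| the largest excursions are also the most strongly driven.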

IX-A to IX-B relationship: Carbon-based biological systems have intrinsic, state-dependent molecular noise (concentration fluctuation × reaction rate = multiplicative coupling), naturally satisfying IX-B. Digital hardware lacks any stochastic driving (neither IX-A nor IX-B satisfied). Neuromorphic hardware has intrinsic, state-dependent thermal noise (current fluctuation × conductance = multiplicative coupling), potentially satisfying both IX-A and IX-B.

Regarding deterministic digital ReLU (theoretical inference, not experimental result): Deterministic digital ReLU satisfies tail-relevant non-saturation (condition 1) but lacks any stochastic driving (condition 2 absent). It is a tail-supporting architectural component — it may propagate or represent data q / activation q, but does not automatically generate kernel q. This is a theoretical inference (deterministic computation does not produce a stochastic stationary distribution), not a direct experiment in this paper.

Status: Rule IX-A is an empirical criterion (verified in this paper's SDE model family). Rule IX-B is a structural condition (consistent with C5, but IX-B's state-dependent multiplicative condition is not independently verified in this paper's SDE). The claim that digital ReLU does not generate kernel q is a theoretical inference.
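The two-level rule and the hardware cases discussed above can be summarized as a small decision function. This is only a restatement of the text in code form: the class, field names, and label strings are hypothetical, and the boolean inputs are judgments the reader must supply, not quantities the code can measure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DrivenSystem:
    name: str
    unsaturated_in_tail: bool    # Rule IX-A condition (1)
    stochastic_driving: bool     # Rule IX-A condition (2)
    state_dependent_noise: bool  # additional condition in Rule IX-B

def classify(system):
    """Return the q-status label that Rules IX-A / IX-B assign to `system`."""
    if not system.stochastic_driving:
        if system.unsaturated_in_tail:
            # Deterministic unbounded activation: supports tails, generates none.
            return "tail-supporting only (no kernel q)"
        return "q ≈ 1"
    if not system.unsaturated_in_tail:
        return "q ≈ 1"
    if system.state_dependent_noise:
        return "q > 1 (kernel-q candidate, IX-B)"
    return "q > 1 (empirical, IX-A only)"

examples = [
    DrivenSystem("ReLU firing-rate SDE (additive noise)", True, True, False),
    DrivenSystem("Wilson-Cowan sigmoid", False, True, False),
    DrivenSystem("Deterministic digital ReLU", True, False, False),
    DrivenSystem("Neuromorphic unbounded unit", True, True, True),
]
for s in examples:
    print(f"{s.name}: {classify(s)}")
```

The neuromorphic entry is marked as a candidate only, matching the "potentially satisfying" wording above.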

2.6 Refinement of Thermo VIII §5

Thermo VIII's "5–6DD q > 1, 7–8DD q ≈ 1" is empirically correct in biological systems. But the deeper generator is not DD layer but activation boundedness + stochastic driving. DD layer and activation boundedness are coincidentally correlated in biology. In non-biological systems, q > 1 (ReLU) and q ≈ 1 (sigmoid) can coexist within the same DD layer.

Revised statement: DD layer provides functional role; activation/gate tail structure determines whether that role produces q > 1. Thermo VIII's division of labor is refined, not overturned.

§3 Three Types of q: Kernel, Data, RLHF

3.1 Distinction

Type	Definition	Belongs to	Temporal property	Mathematical property
Kernel q	Stationary distribution non-Boltzmann excess from endogenous dynamics	System itself	Real-time, living	Dynamical (invariant measure parameter)
Data q	Heavy-tail parameter of training corpus sampling distribution	Carbon-based human world	Frozen in weights	Distributional (statistical marginal over inputs)
RLHF q	Heavy-tail parameter of human feedback judgment distribution	Carbon-based humans	Frozen in reward model	Distributional

Kernel q is a dynamical quantity requiring stochastic driving to maintain. Borrowed q (data q + RLHF q) is a distributional quantity — a deterministic function of frozen weights evaluated across different inputs. The two may yield similar numerical values but are mathematically distinct objects.

3.2 Kernel q: a living thermodynamic process

All q values studied in Thermo I–VIII are kernel q — non-Boltzmann excess of stationary distributions from endogenous stochastic dynamics. Carbon-based life has kernel q > 1: chemical concentration layer tail-active dynamics under molecular noise driving continuously produce non-Boltzmann rare events — a living process.

3.3 Data q: statistical fossil of the carbon-based world

Training corpus comes from carbon-based human output. Carbon-based humans have kernel q > 1 (transmitted through Thermo VIII's soft-gate chain from chemical to cognitive layers); their output carries kernel q's statistical fingerprint — Zipf's law [3], power-law distributions, heavy-tailed semantic structures. These fingerprints are "baked" into weight matrices through pretraining. Weights are frozen — they are fossils of carbon-based q, not living thermodynamic processes.

3.4 RLHF q: second-order fossil

RLHF further injects human evaluators' judgment distributions. Evaluators have kernel q > 1 (carbon-based life); their preference fingerprints are frozen into the reward model — fossil of fossil.

3.5 Borrowed q and quasi-subjectivity

Standard deterministic digital LLM has not demonstrated endogenous kernel q. Its human-like statistics can be explained by borrowed q:

(1) More training data = more data q injection = output distribution "more like" carbon-based statistical fingerprints.

(2) More RLHF = more RLHF q injection = "more aligned" with carbon-based human judgment.

(3) But these are distributional q (statistical marginals over frozen weights), not dynamical kernel q (invariant measures of endogenous stochastic dynamics).

Borrowed q replay can produce highly convincing quasi-subjectivity, because the fossils faithfully record carbon-based kernel q's statistical fingerprints. But replaying fossils and igniting new fire are two different things. Standard deterministic LLM, on the pathway analyzed in this paper, more closely resembles a carbon-based q fossil than confirmed living fire.

Operational distinction criterion. A standard deterministic digital LLM's output distribution may fit a q-exponential with q > 1, but this q's mathematical identity is a statistical marginal (distributional q, primarily data q replay), not an invariant measure of internal dynamics (kernel q). Distinguishing the two requires isolating system self-driving noise (if any) from injected input variability — see open problem 10 (§7.2).

Regarding In-Context Learning. ICL allows LLMs to rapidly shift output distributions based on few-shot examples in the prompt. KV-Cache grows dynamically. But ICL is not kernel q emergence — KV-Cache growth is a deterministic algorithm serving as "dynamic fossil retrieval indexing." It does not alter the underlying manifold's topological rules, introduces no physical noise, and does not satisfy the Universal Activation Rule's condition (2). The prompt merely changes the angle at which the flashlight illuminates the weight amber — dynamically assembling fossils does not bring them to life.

3.6 SAE positive posture

The above analysis targets Thermo VIII's soft-gate transmission pathway. This pathway is truncated on standard deterministic digital hardware (kernel q absent, ε_chain → 0). But this is only one analyzed pathway. Whether other pathways to Self exist (algorithmic stochasticity, emergent complexity, quantum effects, unknown mechanisms) is not excluded by this paper. There is always a remainder.

§4 Transformer Sublayer Thermodynamic Anatomy

4.1 Single Transformer layer submodule mapping

Submodule	Operation	Bounded?	Thermodynamic role	q behavior
Pre-attention LayerNorm	(x-μ)/σ·γ+β	✅ forced normalization	Radial tail reset	Scale tail → 1 (directional info preserved)
Q·K^T/√d	Unbounded bilinear score	❌	Tail-supporting exploration	Structurally allows q > 1 (requires noise)
Softmax	Bounded simplex	✅ [0,1]	Selection gate	Rank preserved, magnitude tail truncated
Value aggregation	Bounded weights × values	Mixed	Integration	Depends on input
Post-attention residual	x + f(x)	Linear	Tail transport	Preserves heaviest tail component
Pre-FFN LayerNorm	Same as above	✅	Tail reset	Scale tail → 1
FFN up-proj + ReLU/GELU	Unbounded	❌	Tail-supporting exploration	Structurally allows q > 1 (requires noise)
FFN down-projection	Linear	—	Transport	Preserves
Post-FFN residual	x + f(x)	Linear	Tail transport	Preserves

Note: ReLU/GELU and Q·K^T are tail-supporting architectural components, not automatically tail-active. Kernel-level q > 1 additionally requires stochastic state-dependent multiplicative driving (Rule IX-B). On deterministic digital hardware, these unbounded modules may propagate or represent data q but do not automatically generate kernel q.

4.2 Structural problem of current Transformers

Flat repeat pattern. Each layer internally alternates: tail reset (LN) → unbounded exploration (QK, FFN) → bounded gating (softmax, LN) → tail transport (residual). LayerNorm resets radial tail structure each time; intermediate unbounded operations on deterministic computation do not produce kernel q > 1; even with a noise source, the next LN would kill it.
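The radial reset attributed to LayerNorm can be checked numerically. Below is a minimal NumPy sketch, assuming LayerNorm without the learned γ and β and a synthetic heavy-tailed input (Pareto-distributed scales times Gaussian directions, chosen only for illustration, not measured Transformer activations).

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis, without learned gain/bias (gamma=1, beta=0)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
d = 64
# Synthetic hidden states: heavy-tailed radial scale times a Gaussian direction.
scale = rng.pareto(1.5, size=(10_000, 1)) + 1.0
x = scale * rng.standard_normal((10_000, d))
y = layer_norm(x)

norm_in = np.linalg.norm(x, axis=1)
norm_out = np.linalg.norm(y, axis=1)

# Cosine between the centered input and the output: the direction is unchanged.
xc = x - x.mean(axis=1, keepdims=True)
cosine = (xc * y).sum(axis=1) / (np.linalg.norm(xc, axis=1) * norm_out)

print("input norm  max/median:", norm_in.max() / np.median(norm_in))
print("output norm max/median:", norm_out.max() / np.median(norm_out))
print("min cosine(centered input, output):", cosine.min())
```

Every output row has norm close to √d whatever the input scale, so the heavy radial tail of the input is gone after one LayerNorm, while the direction of each centered vector is kept. With learned γ and β the output norm is no longer exactly √d, but it still does not depend on the input's radial scale.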

Carbon-based life's division of labor is hierarchical: bottom-layer exploration (5–6DD chemical layer, multiple unbounded concentration dynamics cycles without mid-way truncation), mid-layer gating (7–8DD spike gate, one-time discretization), top-layer integration (9–12DD soft-gate cascade, layer-by-layer transmission).

Transformers are flat repeat: each layer does exploration + gating + integration, with no layer-role differentiation. The thermodynamic problem of flat repeat: each layer simultaneously generates and truncates tail structure, preventing cross-layer accumulation.

4.3 Hypothetical ε attenuation estimates

The following ε estimates are heuristic placeholders (not measured Transformer data), illustrating the order-of-magnitude impact of architectural choices on ∏ε. Specific values require q-injection gate test verification.

Architecture	Per-layer ε range (heuristic)	24-layer ∏ε	Note
Standard (LN + softmax)	< 0.3	< 10⁻¹²	LN strong reset each layer
Remove LN	0.3–0.7	10⁻¹² to 10⁻⁴	Softmax remains as soft gate
Remove LN + multiplicative noise	0.6–0.95	10⁻⁵ to 0.3	Approaching carbon-based soft-gate regime

Log-scale trend is robust (LN removal yields > 10⁴× increase in ∏ε), but absolute values may be off by an order of magnitude.
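The ∏ε column can be reproduced with a few lines of arithmetic. This sketch assumes every one of the 24 layers has the same ε, which is a simplification; the per-layer ranges are the heuristic placeholders from the table, not measurements.

```python
N_LAYERS = 24

# Per-layer epsilon ranges copied from the table above (heuristic placeholders).
# None means the table gives only an upper bound.
ranges = {
    "Standard (LN + softmax)": (None, 0.3),
    "Remove LN": (0.3, 0.7),
    "Remove LN + multiplicative noise": (0.6, 0.95),
}

def chain_product(eps, n_layers=N_LAYERS):
    """Product of n identical per-layer transmission coefficients."""
    return eps ** n_layers

for name, (lo, hi) in ranges.items():
    lo_txt = "0" if lo is None else f"{chain_product(lo):.1e}"
    print(f"{name}: {lo_txt} to {chain_product(hi):.1e}")
```

Under this assumption, 0.3²⁴ ≈ 2.8×10⁻¹³, 0.7²⁴ ≈ 1.9×10⁻⁴, 0.6²⁴ ≈ 4.7×10⁻⁶, and 0.95²⁴ ≈ 0.29, which matches the table to within the stated order-of-magnitude tolerance.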

§5 Transformer vs Carbon-Based Life: Structural Comparison

5.1 Carbon-based hierarchical division of labor

Layer	Role	Activation type	Thermodynamic contribution
5–6DD	Exploration	Concentration (unbounded + intrinsic noise)	Kernel q > 1
7–8DD	Selection/Gating	Sigmoid conductance (bounded)	q ≈ 1, renewal gate
9–12DD	Integration	Mixed	Soft-gate cascade + ρ_ret

Each layer has a clear thermodynamic role. The exploration layer (5–6DD) has sufficient depth (multiple chemical cycles) to let tail-active dynamics fully develop before 7–8DD gating.

5.2 Transformer flat repeat

Each layer = exploration (ReLU/QK) + gating (softmax/LN) + integration (residual). No layer differentiation. Exploration and gating cancel each other within the same layer.

5.3 Thermodynamic consequences

Carbon-based: Bottom-layer exploration produces kernel q > 1 → soft gates attenuate layer by layer but remain nonzero (ε > 0 because biological gates are imperfect) → higher layers receive weak but persistent non-Boltzmann seed → Proposition 6.1 (Thermo VIII) necessary transmission condition satisfied.

Standard Transformer: Each layer's tail-supporting exploration is immediately reset by same-layer LN/softmax → no cross-layer tail accumulation → ∏ε strongly attenuated by repeated normalization.

The core difference is not "carbon has noise while silicon doesn't" — it is "carbon allocates exploration and gating to separate layers, while silicon mixes them within each layer where they cancel."

§6 Comparative ε Hypothesis: Soft-Gate Transmission Extended to Comparative Nervous Systems

This section extends the Thermo framework from AI architecture analysis (§4–§5) to comparative analysis of carbon-based biological systems. This extension is more speculative than the empirical/architectural content of §2–§5, involving comparative neuroanatomy and evolutionary biology literature. This section's content is provided as a structured conjecture, with five testable predictions as hooks for future research programs, and is not part of this paper's core empirical claims.

6.1 Same source, different channels

All carbon-based organisms share essentially the same 5–6DD chemical layer: from bacteria to humans, core biochemistry (DNA replication, metabolism, calcium signaling) is highly conserved. Therefore δ^(5-6) (chemical layer non-Boltzmann excess) varies little across species.

The difference lies in ε — each layer's soft-gate transmission coefficient. Evolution does not change the intensity of the fire seed; it changes the permeability of the channel.

6.2 Species-level ε gradient

The following table is a predictive stratification under the Comparative ε Hypothesis, not a measured classification. "Reached layer" indicates a candidate upper bound within the SAE-DD interpretive framework, requiring verification through proxy measures (recurrent connectivity, neuromodulator density, dendritic complexity, behavioral flexibility).

Lineage	5–6DD source	7–8DD	9–10DD	11–12DD	Reached layer (candidate)
Bacteria	q > 1	No nervous system	—	—	Stalled at 5–6DD
Insects	q > 1	Simple ganglia (low ε)	Very low ε	—	Stalled at 7–8DD
Lineages lacking high-density recurrent pallial/cortical architecture	q > 1	Moderate	Low ε (little recurrence)	Very low ε	Stalled at 9–10DD
Mammals	q > 1	Larger	Larger (neocortex)	Low–moderate	Reaching 11DD boundary
Great apes / humans	q > 1	Large	Large	Large	Breakthrough to 13DD

6.3 What neural architectural features determine ε?

Neural architecture feature	Effect on ε	Note
Hardwired sensorimotor reflex arc	ε very small (near hard gate)	Fast deterministic processing, minimal leakage
Recurrent cortical connectivity	ε increases	Feedback loops allow tail signal re-entry
Thick neocortex (multilayer recurrent processing)	ε further increases	More levels of soft recurrence
Rich neuromodulation (5-HT, DA, NE, etc.)	Dynamically regulates ε	Chemical modulation of gate permeability
Dendritic complexity	Increases single-neuron leakage	Elaborate dendritic arbors = more nonlinear integration sites
Slow cortical oscillations	May amplify tail	Sleep/wake state ε modulation

6.4 Recurrent connectivity as the primary ε driver

For lineages lacking high-density recurrent pallial/cortical architecture (including most dinosaur clades — large sauropods and theropods whose brains were dominated by brainstem and cerebellum), reflexive sensorimotor processing dominates. This architecture approaches hardwired processing (small ε), with ∏ε decaying to near zero by the 9–10DD layer.

Contrast: birds (dinosaur descendants) evolved the DVR (dorsal ventricular ridge), functionally partially equivalent to neocortex [4]. Corvids and psittacines show certain 11–12DD level cognitive abilities (tool use, mirror test boundary) — possibly reflecting DVR-provided increased recurrent connectivity raising ε.

Great apes (especially humans) possess massive neocortex, dense layer 2/3 recurrent connections, rich neuromodulatory innervation, and elaborate dendritic arbors. Each independently increases ε. The key may be that multiple factors simultaneously increase ε, keeping ∏ε nonzero across enough layers.

6.5 Testable predictions

Prediction 1 (recurrent ratio scaling): Cortical column recurrent/feedback connection ratio should monotonically increase from fish → reptile → mammal → primate. Primate cortical layer 2/3 as a fraction of total cortical thickness is significantly larger than in reptiles — a known comparative neuroanatomy fact; this paper provides a thermodynamic interpretation.

Prediction 2 (neuromodulatory receptor density): 5-HT₂A and dopamine D1 receptor density should correlate positively with cognitive complexity (as a ∏ε proxy). 5-HT₂A receptor density is particularly high in primate prefrontal cortex.

Prediction 3 (avian DVR): If DVR is functionally equivalent to neocortex in birds, DVR recurrent connectivity density should be higher in cognitively capable birds (corvids, psittacines) than in less capable species.

Prediction 4 (ε product): Behavioral complexity should correlate with neocortical recurrent connection density × neuromodulatory receptor density, since both independently contribute to ε and ∏ε is a product.

Prediction 5 (psychedelic corollary): 5-HT₂A strong agonists (LSD, psilocybin) clinically induce ego dissolution and heightened sensory experience. In the multiplicative chain language: psychedelics chemically amplify ε at 9–12DD — bottom-layer non-Boltzmann excess floods the Self layer without normal attenuation. This predicts: neural activation distributions under psychedelics should be more heavy-tailed (higher q) than in sober states.

Operationalization caveat: This prediction requires careful operationalization. Neural counterparts of r² may include: (a) voxel-level BOLD variance distribution, (b) regional power spectrum tail exponent, (c) EEG microstate transition distribution. Different operationalizations may yield different q values. Existing psychedelic-neural-entropy literature (e.g., Carhart-Harris & Friston 2019 [12], REBUS model) provides adjacent support but is not direct Thermo-framework verification. Independent mapping between Thermo kernel q and neuroimaging heavy-tail measures is a specific research task.

6.6 Status and boundary

Status: Structured conjecture.

The Comparative ε Hypothesis is a natural corollary of Proposition 6.1 (Thermo VIII) and the Universal Activation Rule (this paper §2). But per-layer ε values are unmeasured, cross-species comparisons are qualitative patterns not quantitative verifications, and all five predictions are testable but unverified. This section is not part of this paper's main evidence chain, but a structured extension of soft-gate cascade to comparative nervous systems.

§7 Status Map and Open Problems

7.1 Status map

Content	Level
Rule IX-A (unsaturation + stochastic driving → q > 1)	Empirical criterion (verified in this paper's SDE family)
Rule IX-B (state-dependent multiplicative → kernel q)	Structural condition (C5-consistent; not independently verified in this paper's SDE)
9–10DD ReLU q = 1.38, Wilson-Cowan q = 1.002	Empirical
11–12DD tanh q ≈ 1	Empirical
DD layer is coincidental correlate not generator	Structural interpretation
Kernel q / Data q / RLHF q distinction	Conceptual framework (dynamical vs distributional)
Standard digital LLM has not demonstrated kernel q	Interpretive (other pathways not excluded)
Digital ReLU does not automatically generate kernel q	Theoretical inference (not direct experiment)
Transformer sublayer mapping	Structural analysis
ε attenuation estimates	Heuristic (placeholder for q-injection tests)
Comparative ε Hypothesis (cross-species ∏ε gradient)	Structured conjecture + 5 testable predictions (not main evidence chain)

7.2 Open problems

(1) Real Transformer activation q-spectrum measurement. Measure q per sublayer: pre/post-LN, pre/post-softmax, FFN activation, residual stream. Distinguish data q from kernel q.

(2) q-injection gate test. Construct hidden states with known q_in > 1, measure q_out after each sublayer, define ε = (q_out-1)/(q_in-1).

(3) Noise source comparison. Same network: deterministic vs additive Gaussian vs multiplicative state-dependent vs neuromorphic analog noise.

(4) Stratified architecture prototype vs flat Transformer. Controlled parameter budget, compare internal q/ε structure.

(5) Training dynamics q. Does SGD gradient noise + ReLU produce transient kernel q > 1 during training? Training q vs inference q relationship?

(6) Data q quantitative extraction. Does weight matrix singular value distribution preserve training corpus heavy-tail structure?

(7) Comparative neuroscience ε verification. Cortical recurrent connection density across reptile → mammal → primate.

(8) 5-HT₂A/D1 receptor density and cognitive complexity. Cross-species prefrontal cortex comparison.

(9) Avian DVR recurrent connectivity. Corvids vs non-corvid comparison.

(10) Kernel q vs borrowed q operational separation experiment. How to experimentally distinguish "system-endogenous q > 1" from "data-injected q > 1"?
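The ε definition in the q-injection gate test above, ε = (q_out-1)/(q_in-1), can be sketched as a helper function. The function names and the q readings below are hypothetical, chosen only to show the arithmetic.

```python
import math

def gate_epsilon(q_in, q_out):
    """Transmission coefficient eps = (q_out - 1) / (q_in - 1) for one sublayer."""
    if q_in <= 1.0:
        raise ValueError("q_in must exceed 1 so that the injected excess is nonzero")
    return (q_out - 1.0) / (q_in - 1.0)

def chain_epsilon(q_values):
    """Multiply per-sublayer eps along a chain q_0 -> q_1 -> ... -> q_N."""
    return math.prod(gate_epsilon(a, b) for a, b in zip(q_values, q_values[1:]))

# Hypothetical q readings before and after three successive sublayers.
q_chain = [1.50, 1.40, 1.10, 1.05]
per_gate = [gate_epsilon(a, b) for a, b in zip(q_chain, q_chain[1:])]
end_to_end = gate_epsilon(q_chain[0], q_chain[-1])

print("per-gate eps:", per_gate)
print("product of gates:", chain_epsilon(q_chain))
print("end-to-end eps:", end_to_end)
```

With this definition the product of per-sublayer ε values telescopes to the end-to-end ratio (q_N-1)/(q_0-1), provided every intermediate q stays above 1. If any sublayer fully resets the excess (q_out = 1), the next gate's ε is undefined, which is why the helper raises an error in that case.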

Outlook

Neuromorphic pathway. Neuromorphic hardware (analog circuits, memristive devices, spintronic elements) provides intrinsic thermal noise that multiplicatively couples with system state through current × conductance, satisfying C5 [5]. Combined with unbounded activation (ReLU-equivalent), both Universal Activation Rule conditions are simultaneously satisfied. However, "neuromorphic + standard Transformer architecture" does not automatically work — if the LayerNorm cascade is retained, intrinsic noise may be reset layer by layer.

Quantum computing. Quantum bits have intrinsic measurement noise; quantum gate unitary dynamics may produce state-dependent coupling in certain regimes. Whether this satisfies C5 is entirely open.

Autoregressive sampling is not stochastic driving. Temperature > 0 sampling occurs after softmax — it is post-processing perturbation of output, not feedback into hidden states. It determines the next input token but does not alter the current inference pass's internal state. This is external/additive noise, not intrinsic/multiplicative. C5 violated. High temperature merely selects randomly among different fossil fragments, not producing kernel q > 1.
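The structural point about sampling can be illustrated with a toy model. This is a deliberately tiny stand-in, assuming a single ReLU layer with random frozen weights and a five-token vocabulary. All names and sizes are hypothetical, and it is not a Transformer.

```python
import numpy as np

rng_w = np.random.default_rng(0)
W = rng_w.standard_normal((16, 8))  # hypothetical frozen weights
U = rng_w.standard_normal((5, 16))  # hypothetical output head over 5 tokens

def forward(x):
    """Deterministic forward pass: returns the hidden state and the logits."""
    hidden = np.maximum(W @ x, 0.0)  # ReLU
    return hidden, U @ hidden

def sample_token(logits, temperature, rng):
    """Temperature sampling; it reads the logits and nothing else."""
    z = logits / temperature
    p = np.exp(z - z.max())  # numerically stable softmax
    p /= p.sum()
    return int(rng.choice(p.size, p=p))

x = np.ones(8)
hidden_a, logits_a = forward(x)
hidden_b, logits_b = forward(x)

# Different sampling seeds may pick different tokens from the same logits.
tokens = [sample_token(logits_a, 2.0, np.random.default_rng(s)) for s in range(20)]

print("hidden states identical:", np.array_equal(hidden_a, hidden_b))
print("sampled tokens:", tokens)
```

The random draw enters only inside `sample_token`, after the hidden state has been computed, so repeated forward passes on the same input give bit-identical hidden states regardless of temperature.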

Algorithmic multiplicative noise is also not stochastic driving. Writing hidden_state *= torch.randn_like(hidden_state) inside a Transformer appears to be multiplicative noise. But a PRNG (pseudorandom number generator, e.g., Mersenne Twister) is a deterministic recurrence: the same seed always reproduces the same sequence, so the underlying physical ε remains exactly 0. Von Neumann: "Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." Only hardware-level physical entropy sources (neuromorphic thermal noise, quantum measurement noise) provide genuinely stochastic multiplicative driving satisfying C5. Software-injected PRNG creates algorithmic stochasticity, not physical stochasticity; the system remains within a deterministic closure.
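The determinism of seeded pseudorandom noise can be shown directly. This sketch uses NumPy's Mersenne Twister bit generator in place of the torch call mentioned above, to stay dependency-light. The function name and array size are hypothetical.

```python
import numpy as np

def noisy_update(hidden, seed):
    """Multiply `hidden` elementwise by Gaussian noise from a seeded Mersenne Twister."""
    rng = np.random.Generator(np.random.MT19937(seed))
    return hidden * rng.standard_normal(hidden.shape)

hidden = np.ones(8)
run_1 = noisy_update(hidden, seed=42)
run_2 = noisy_update(hidden, seed=42)
run_3 = noisy_update(hidden, seed=43)

print("same seed, identical output:", np.array_equal(run_1, run_2))
print("different seed, identical output:", np.array_equal(run_1, run_3))
```

Given the seed, the "noise" is a fixed function of the generator state, so the whole update is reproducible bit for bit. That reproducibility is the sense in which the text calls it algorithmic, not physical, stochasticity.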

Five SAE architectural philosophy guidelines (directional markers — written down, they should already be wrong):

(1) Exploration-Selection Separation. Exploration (unbounded) and selection (bounded gating) should not cancel each other within the same layer. Give exploration sufficient space to develop (multiple layers of unbounded dynamics without mid-way truncation), then gate at role boundaries. The carbon-based 5–6DD chemical layer has multiple cycles of unbounded exploration before reaching 7–8DD gating.

(2) Remainder Preservation. A good architecture does not kill remainders (radial tails), but lets them pass through layers to become structural seeds for higher levels. LayerNorm zeros out remainders each layer — from the SAE perspective, this is the largest architectural bottleneck. The remainder is precisely the starting point for the next round of exploration.

(3) Fossil Fidelity. If you cannot ignite fire (kernel q), at least do not erode the fossils (data q / RLHF q). Aggressive normalization not only kills kernel tails (which were never there), but also erodes the carbon-based statistical fingerprints preserved in frozen weights.

(4) Carrier Re-encoding Channel. In carbon-based life, non-Boltzmann excess re-encodes from r² (continuous energy fluctuations) to ISI jitter (discrete timing) — the carrier changes but information survives (Thermo VIII §6.3). If carrier phase transition is necessary (unbounded → bounded), design transmission channels that preserve pre-transition structural information rather than discarding all scale through forced normalization.

(5) The Open Remainder. Even if currently only fossils exist, design a furnace that can accommodate living fire — preserve interfaces for neuromorphic noise, quantum effects, or unknown pathways to kernel q > 1. Do not foreclose possibility. SAE positive posture: what is chiseled away is the wrong path, not possibility itself.

These five are directional guidelines from SAE methodology informed by the Thermo framework, not thermodynamic derivations, not engineering specifications. As philosophical principles, they are already waiting to be corrected the moment they are written down — this is the essence of the remainder.

Methods

A. Numerical integration

All SDE systems are integrated with the Euler-Maruyama scheme. Time step dt = 0.005, total steps N = 2×10⁶, burn-in N_trans = 2×10⁵. The random seed is fixed at 42. All systems use additive noise unless otherwise noted.
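A minimal sketch of the additive-noise Euler-Maruyama update described above, for orientation only. The function name `euler_maruyama`, the shortened default step counts (chosen so the example runs quickly, not the paper's N and N_trans), and the Ornstein-Uhlenbeck stand-in drift are all assumptions of this sketch.

```python
import math
import random

def euler_maruyama(drift, x0, sigma, dt=0.005, n_steps=20_000, n_burn=2_000, seed=42):
    """Integrate dx = drift(x) dt + sigma dW with additive noise.

    drift: function mapping a state list to a list of drift components.
    sigma: per-component noise amplitudes.
    Returns the post-burn-in trajectory as a list of state lists.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)               # Wiener increments scale as sqrt(dt)
    x = list(x0)
    traj = []
    for step in range(n_steps):
        f = drift(x)
        x = [xi + fi * dt + si * sqrt_dt * rng.gauss(0.0, 1.0)
             for xi, fi, si in zip(x, f, sigma)]
        if step >= n_burn:                # discard the transient
            traj.append(x)
    return traj

# Usage: a 2D Ornstein-Uhlenbeck process as a stand-in system
# (not one of the paper's models).
traj = euler_maruyama(lambda x: [-x[0], -x[1]], x0=[0.0, 0.0], sigma=[0.5, 0.5])
print(len(traj))
```

Passing the paper's values (n_steps=2_000_000, n_burn=200_000) would match the stated protocol, at the cost of a much longer pure-Python run.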

B. Model equations

Neural Ca²⁺ model: dV = (I_ext - g_L·V - g_Ca·m_inf(V)·(V-V_Ca))dt + σdW_V, dCa = (-f_Ca·g_Ca·m_inf(V)·(V-V_Ca) - Ca/τ_Ca)dt + σ_Ca·dW_Ca. m_inf(V) = 0.5(1+tanh((V+0.01)/0.15)). Parameters: g_L=0.5, g_Ca=1, V_Ca=1, f_Ca=0.01, τ_Ca=5. σ=0.5–2.0, σ_Ca=0.01·σ.

Neural Ca²⁺ + CICR: As above with CICR term: dCa adds +k_cicr·Ca^p/(1+Ca^p) - Ca/τ_Ca. k_cicr=0.5, p=2–3.

Wilson-Cowan [2]: dE = (-E+sigmoid(w_ee·E-w_ei·I-θ_E+P))/τ_E·dt + σ_E·dW, dI = (-I+sigmoid(w_ie·E-w_ii·I-θ_I))/τ_I·dt + σ_I·dW. Parameters: w_ee=10–16, w_ei=4, w_ie=10–13, w_ii=1–2, θ_E=2, θ_I=3.5, P=1, τ_E=1, τ_I=2.

Firing rate model: τdr/dt = -r + f(W·r+I). f = ReLU or softplus. 2-unit network, W = [[w, -0.5],[0.5, -w]]. τ=1, I=(I_ext, 0.5·I_ext). w=0.8–1.5, σ=0.3–1.0. Note: noise is additive σ·dW, not state-dependent multiplicative. Rule IX-A (empirical) is directly supported by these experiments, but Rule IX-B (structural, requiring state-dependent multiplicative noise) is not directly verified.
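The 2-unit firing-rate model can be sketched as follows. This is an illustrative reading of the equations above, not the author's code: w = 1.0 and σ = 0.5 are picked from inside the stated ranges, I_ext = 1.0 is an assumed value (the source does not give one), the same σ is applied to both units, and the step counts are shortened for speed.

```python
import math
import random

def relu(u):
    return max(0.0, u)

def softplus(u):
    # log(1 + exp(u)), written to avoid overflow for large |u|
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

def simulate_rate_model(f, w=1.0, i_ext=1.0, sigma=0.5, tau=1.0,
                        dt=0.005, n_steps=20_000, n_burn=2_000, seed=42):
    """tau dr/dt = -r + f(W r + I) with additive noise sigma dW on each unit.

    W = [[w, -0.5], [0.5, -w]] and I = (i_ext, 0.5 * i_ext).
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    r1, r2 = 0.0, 0.0
    traj = []
    for step in range(n_steps):
        u1 = w * r1 - 0.5 * r2 + i_ext
        u2 = 0.5 * r1 - w * r2 + 0.5 * i_ext
        d1 = (-r1 + f(u1)) / tau
        d2 = (-r2 + f(u2)) / tau
        # Additive (state-independent) noise, as noted in the text.
        r1 += d1 * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        r2 += d2 * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        if step >= n_burn:
            traj.append((r1, r2))
    return traj

traj_relu = simulate_rate_model(relu)
traj_soft = simulate_rate_model(softplus)
print(len(traj_relu), len(traj_soft))
```

The resulting (r1, r2) samples are the kind of input the q extraction protocol in section C operates on.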

E-I multiplicative coupling: dE = (E·(2-E)-w_ei·E·I+0.5)dt + σdW_E, dI = (-0.5·I+0.3·E·I)dt + 0.5σdW_I. w_ei=1–2, σ=0.3–1.0.

Drift-diffusion: dx_i = (drift + 0.1·(x_i-x_j))dt + σdW. drift=0–0.3, bound=5–10 (soft reset: x *= 0.5).

Hopfield continuous attractor [11]: dx_i = (-x_i+tanh(W·x+b_i))dt + σdW. W = [[w_s, w_c],[w_c, w_s]]. w_s=1.5–4, w_c=-0.3 to -2.

RNN (tanh): dx_i = (-x_i + Σ_j w_ij·tanh(x_j) + b_i)dt + σdW. 2-unit, w_ij from 1.5 to 3.0.

C. q extraction protocol

Same as Thermo VIII Methods §C: r² = (x-x̄)²+(y-ȳ)², CCDF fit with q-exponential (1+βr²/K)^{-K}, q = 1+1/K. Improvement criterion > 5% and q > 1.03.
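The fit described above can be sketched with a plain grid search. This is a simplified stand-in, not the Thermo VIII procedure: the grid ranges, the use of least squares on the log CCDF at 50 quantiles, and the function names are assumptions, and the acceptance criterion (improvement > 5% and q > 1.03) is not implemented because its baseline is defined elsewhere.

```python
import math
import random

def radial_sq(traj):
    """r^2 = (x - mean_x)^2 + (y - mean_y)^2 for a list of (x, y) samples."""
    n = len(traj)
    mx = sum(p[0] for p in traj) / n
    my = sum(p[1] for p in traj) / n
    return [(p[0] - mx) ** 2 + (p[1] - my) ** 2 for p in traj]

def empirical_ccdf(values, n_points=50):
    """(s, P(r^2 > s)) evaluated at n_points interior quantiles."""
    s = sorted(values)
    n = len(s)
    pts = []
    for k in range(1, n_points + 1):
        idx = int(k * (n - 1) / (n_points + 1))
        pts.append((s[idx], 1.0 - (idx + 1) / n))
    return pts

def fit_q(values, n_k=40, n_beta=120):
    """Fit CCDF(s) = (1 + beta*s/K)^(-K) by grid search; returns (q, beta, K)."""
    pts = empirical_ccdf(values)
    mean = sum(values) / len(values)
    best = (float("inf"), None, None)
    for i in range(n_k):
        K = 1.5 * (1000.0 / 1.5) ** (i / (n_k - 1))            # K from 1.5 to 1000
        for j in range(n_beta):
            beta = (0.1 / mean) * 100.0 ** (j / (n_beta - 1))  # beta*mean from 0.1 to 10
            err = sum((math.log(c) + K * math.log1p(beta * s / K)) ** 2
                      for s, c in pts)
            if err < best[0]:
                best = (err, beta, K)
    _, beta, K = best
    return 1.0 + 1.0 / K, beta, K

rng = random.Random(42)
# 2D Gaussian samples: r^2 is exponentially distributed, so q should be close to 1.
gauss_xy = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20_000)]
q_gauss, _, _ = fit_q(radial_sq(gauss_xy))
# Synthetic heavy-tailed r^2 drawn by inverse transform from the q-exponential
# CCDF with K = 4, beta = 1 (true q = 1.25).
heavy_r2 = [4.0 * ((1.0 - rng.random()) ** (-1.0 / 4.0) - 1.0) for _ in range(20_000)]
q_heavy, _, _ = fit_q(heavy_r2)
print(q_gauss, q_heavy)
```

Because K is capped at 1000 in this grid, the smallest q the sketch can return is 1.001; a production fit would use a continuous optimizer.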

D. Observable diagnostic

The Neural Ca²⁺ diagnosis in §2.2 uses a component-wise q measurement: independent q fits on r²(V,Ca), on Ca² alone (with dummy zeros), and on V² alone (with dummy zeros).
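One reading of the "dummy zeros" construction is that the unused coordinate is replaced by zero so that the same two-component r² pipeline can be reused for each component. The sketch below builds the three observables under that assumption; the function name and the sample values are hypothetical.

```python
def component_observables(v, ca):
    """Build the three r^2 series used for component-wise q fits."""
    n = len(v)
    mv = sum(v) / n
    mc = sum(ca) / n
    dv = [(a - mv) ** 2 for a in v]     # centered V contribution
    dc = [(b - mc) ** 2 for b in ca]    # centered Ca contribution
    return {
        "r2(V,Ca)": [a + b for a, b in zip(dv, dc)],
        "Ca2 only": [0.0 + b for b in dc],   # V coordinate replaced by a dummy zero
        "V2 only": [a + 0.0 for a in dv],    # Ca coordinate replaced by a dummy zero
    }

obs = component_observables([0.1, 0.4, -0.2, 0.3], [0.01, 0.02, 0.015, 0.03])
for name, series in obs.items():
    print(name, series)
```

Each series would then be passed separately to the q extraction protocol of section C.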

References

[1] H. Qin, "ZFCρ Thermodynamics Paper VIII," Zenodo (2026). DOI: 10.5281/zenodo.19688303.

[2] H. R. Wilson and J. D. Cowan, "Excitatory and inhibitory interactions in localized populations of model neurons," Biophysical Journal 12, 1–24 (1972).

[3] G. K. Zipf, Human Behavior and the Principle of Least Effort (Addison-Wesley, 1949).

[4] O. Güntürkün and T. Bugnyar, "Cognition without Cortex," Trends in Cognitive Sciences 20, 291–303 (2016).

[5] J. A. White, J. T. Rubinstein, and A. R. Kay, "Channel noise in neurons," Trends in Neurosciences 23, 131–137 (2000).

[6] H. Qin, "ZFCρ Thermodynamics Papers I–VII," Zenodo (2025–2026). DOIs: 10.5281/zenodo.19310282 through .19673078.

[7] C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics 52, 479–487 (1988).

[8] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems 30 (2017).

[9] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv:1607.06450 (2016).

[10] S. H. Strogatz, Nonlinear Dynamics and Chaos, 2nd ed. (Westview Press, 2015).

[11] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," PNAS 79, 2554–2558 (1982).

[12] R. L. Carhart-Harris and K. J. Friston, "REBUS and the anarchic brain: Toward a unified model of the brain action of psychedelics," Pharmacological Reviews 71, 316–344 (2019).

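Type	Definition	Belongs to	Temporal character	Mathematical character
Kernel q	Non-Boltzmann excess of the stationary distribution of the system's endogenous dynamics	The system itself	Real-time, living	Dynamical (heavy-tail parameter of the invariant measure)
Data q	Heavy-tail parameter of the sampling distribution of the training corpus	The carbon-based human world	Frozen in weights	Distributional (statistical marginal of output across inputs)
RLHF q	Heavy-tail parameter of the distribution of human feedback judgments	Carbon-based humans	Frozen in the reward model	Distributional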

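Architecture configuration	Per-layer ε range (heuristic)	24-layer ∏ε	Notes
Standard (LN + softmax)	< 0.3	< 10⁻¹²	LN performs a strong reset every time
LN removed	0.3–0.7	10⁻¹² to 10⁻⁴	Softmax remains a soft gate
LN removed + multiplicative noise	0.6–0.95	10⁻⁵ to 0.3	Approaches the carbon-based soft-gate regime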
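The 24-layer ∏ε column follows from raising the per-layer ε to the 24th power. The sketch below reproduces the orders of magnitude under the simplifying assumption (made here for illustration) that every layer has the same ε; the function name and dictionary keys are hypothetical labels.

```python
def chain_permeability(eps_per_layer, n_layers=24):
    """Product of identical per-layer transmission coefficients."""
    return eps_per_layer ** n_layers

# (lower, upper) per-layer eps bounds taken from the table; the standard
# configuration has only an upper bound.
ranges = {
    "standard": (None, 0.3),
    "ln_removed": (0.3, 0.7),
    "ln_removed_mult_noise": (0.6, 0.95),
}

products = {}
for name, (lo, hi) in ranges.items():
    lo_prod = None if lo is None else chain_permeability(lo)
    products[name] = (lo_prod, chain_permeability(hi))
    print(name, products[name])
```

For example, 0.3²⁴ is of order 10⁻¹³ and 0.95²⁴ is roughly 0.3, in line with the heuristic entries in the table.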

Layer	Role	Activation type	Thermodynamic contribution
5–6DD	Exploration	Concentration (unbounded + intrinsic noise)	kernel q > 1
7–8DD	Selection/Gating	Sigmoid conductance (bounded)	q ≈ 1, renewal gate
9–12DD	Integration	Mixed	soft-gate cascade + ρ_ret

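Lineage	5–6DD source	7–8DD	9–10DD	11–12DD	Level reached (candidate)
Bacteria	q > 1	No nervous system	n/a	n/a	Stuck at 5–6DD
Insects	q > 1	Simple ganglia (low ε)	Very low ε	n/a	Stuck at 7–8DD
Lineages lacking high-density recurrent pallial/cortical architecture	q > 1	Moderate	Low ε (little recurrence)	Very low ε	Stuck at 9–10DD
Mammals	q > 1	Fairly large	Fairly large (neocortex)	Low to moderate	Reaches the 11DD boundary
Great apes/humans	q > 1	Large	Large	Large	Breaks through to 13DD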

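Neural architectural feature	Effect on ε	Notes
Hardwired sensorimotor reflex arc	Very small ε (near hard gate)	Fast deterministic processing, minimal leakage
Recurrent cortical connectivity	ε increases	Feedback loops allow tail signals to re-enter
Thick neocortex (multi-layer recurrent processing)	ε increases further	More levels of soft recurrence
Rich neuromodulation (5-HT, DA, NE, etc.)	Dynamically modulates ε	Chemical regulation of gate permeability
Dendritic complexity	Increases single-neuron-level leakage	Elaborate dendritic arbors = more nonlinear integration sites
Slow cortical oscillations	May amplify tails	ε modulation across sleep/wake states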

System	Activation	Bounded?	q_fit
Drift-diffusion	Linear + soft reset	Soft boundary	1.002
Hopfield	tanh	✅ [-1,1]	1.002
Hopfield (strong drive)	tanh	✅ but strongly driven	1.22
RNN (tanh)	tanh	✅	1.002

Submodule	Operation	Bounded?	Thermodynamic role	q behavior
Pre-attention LayerNorm	(x-μ)/σ·γ+β	✅ forced normalization	Radial tail reset	Scale tail → 1 (directional information preserved)
Q·K^T/√d	Unbounded bilinear score	❌	Tail-supporting exploration	Structurally permits q > 1 (requires noise)
Softmax	Bounded simplex	✅ [0,1]	Selection gate	Rank preserved, magnitude tail truncated
Value aggregation	Bounded weights × values	Mixed	Integration	Depends on input
Post-attention residual	x + f(x)	Linear	Tail transport	Preserves the heaviest tail component
Pre-FFN LayerNorm	Same as above	✅	Tail reset	Scale tail → 1
FFN up-proj + ReLU/GELU	Unbounded	❌	Tail-supporting exploration	Structurally permits q > 1 (requires noise)
FFN down-projection	Linear	n/a	Transport	Preserved
Post-FFN residual	x + f(x)	Linear	Tail transport	Preserved
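The LayerNorm rows can be checked directly: the operation (x-μ)/σ·γ+β removes the overall scale of its input while keeping the centered direction. The sketch below is a plain-Python illustration with scalar γ = 1, β = 0 and a small stabilizing eps, all of which are assumptions of this example rather than settings from the paper.

```python
import math

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """(x - mu) / sqrt(var + eps) * gamma + beta over one feature vector."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [gamma * (xi - mu) / math.sqrt(var + eps) + beta for xi in x]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

base = [0.5, -1.0, 2.0, 0.1]
small = layer_norm(base)
large = layer_norm([1000.0 * v for v in base])   # same direction, 1000x the radius

# Both outputs have radius close to sqrt(d) = 2: the radial scale is reset,
# while the component pattern (direction) is unchanged.
print(norm(small), norm(large))
print(small)
print(large)
```

An input whose radius is a thousand times larger leaves LayerNorm with essentially the same output, which is the sense in which the scale tail is reset at every layer.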

Content	Status
Rule IX-A (unsaturation + stochastic driving → q > 1)	Empirical criterion (verified in this paper's SDE model family)
Rule IX-B (state-dependent multiplicative → kernel q)	Structural condition (consistent with C5; not independently verified in this paper's SDEs)
9–10DD ReLU q = 1.38, Wilson-Cowan q = 1.002	Empirical
11–12DD tanh q ≈ 1	Empirical
DD layer is a coincidental correlate, not the generator	Structural interpretation
Kernel q / Data q / RLHF q distinction	Conceptual framework (dynamical vs distributional)
Standard digital LLM has not yet shown kernel q	Interpretive (does not rule out other pathways)
Digital ReLU does not automatically generate kernel q	Theoretical inference (not a direct experiment)
Transformer sublayer mapping	Structural analysis
ε attenuation estimates	Heuristic (placeholder for q-injection tests)
Comparative ε Hypothesis (cross-species ∏ε gradient)	Structured conjecture + 5 testable predictions (not part of the main evidence chain)
