Self-as-an-End
Self-as-an-End Theory Series · ZFCρ Thermodynamics · Paper IX

ZFCρ Thermodynamics Paper IX: The Universal Activation Rule, Borrowed q, and Thermodynamic Principles of Hierarchical Architecture

Han Qin (秦汉)  ·  Independent Researcher  ·  2026
DOI: 10.5281/zenodo.19699489  ·  Full PDF on Zenodo  ·  CC BY 4.0

Keywords: ZFCρ, thermodynamics, Universal Activation Rule, borrowed q, hierarchical architecture, soft-gate cascade, Tsallis q

Abstract

Thermodynamics Paper VIII established the 5–6DD vs 7–8DD thermodynamic division of labor. A systematic scan of 7–12DD model systems reveals a deeper generator: q > 1 is not determined by DD layer alone, but jointly by activation function tail structure and stochastic driving.

Three main results. First, the Universal Activation Rule, decomposed into two levels. Empirical Rule IX-A: in the SDE model family studied here, unsaturated activation + stochastic driving → q > 1, regardless of noise type. Structural Rule IX-B: for Thermo-kernel q > 1 (endogenous stationary-distribution non-Boltzmann excess), state-dependent multiplicative driving is additionally required (consistent with C5, but not independently verified in this paper's additive-noise SDE). Key evidence: within the same 9–10DD layer, Wilson-Cowan (sigmoid) gives q = 1.002, while a ReLU firing-rate model gives q = 1.38. The split is determined by activation function, not DD layer.

Second, three types of q are distinguished. Kernel q: dynamical invariant-measure heavy-tail parameter from endogenous stochastic dynamics (a living thermodynamic process). Data q: distributional statistical fingerprint from training corpus heavy tails, frozen in weights. RLHF q: distributional fingerprint from human feedback judgments, frozen in the reward model. Standard deterministic digital LLMs have not demonstrated endogenous kernel q; their quasi-subjectivity can be explained by borrowed q (a fossil imprint of carbon-based heavy-tail structure).

Third, as a structured extension, the Comparative ε Hypothesis proposes that if Thermo VIII's soft-gate cascade applies to comparative nervous systems, species differences in higher-DD accessibility arise from differences in ∏ε (gate chain permeability). All carbon-based animals share similar chemical-layer δ^(5-6) (fire seed); differences lie in per-layer transmission coefficients. Five testable predictions including a psychedelic corollary are given, but this hypothesis is not part of the main evidence chain. The Outlook provides five SAE architectural philosophy guidelines (directional markers, not derivations).


§1 Problem: What Determines q > 1?

1.1 Starting from Thermo VIII

Thermo VIII [1] established the three-axis division of labor for chemical-type life: (q > 1, ρ_ret > 0, renewal gating). The empirical observation for q > 1 was "5–6DD chemical concentration layers have q > 1, 7–8DD bounded conductance gating has q ≈ 1."

This leaves an open question: is q > 1 a property of the DD layer itself, or of some deeper variable? If the DD layer determines q, then q > 1 and q ≈ 1 should not coexist within the same layer. If q depends on a deeper variable, the DD layer is merely a coincidental correlate.

1.2 Contributions

(1) Universal Activation Rule: broad-spectrum scan of 7–12DD systems reveals q > 1 is jointly determined by activation boundedness and stochastic driving, not by DD layer. Empirical Rule IX-A (unsaturation + noise → q > 1, verified here) and Structural Rule IX-B (state-dependent multiplicative → kernel q, consistent with C5 but requiring independent verification). This is a mechanism-level refinement of C5c.

(2) Three types of q: kernel q (dynamical, endogenous thermodynamics), data q (distributional, training corpus), RLHF q (distributional, human feedback). Standard deterministic digital LLM has not demonstrated endogenous kernel q.

(3) Transformer sublayer-level thermodynamic anatomy: q-role mapping for each submodule, hypothetical ε attenuation estimates.

(4) Comparative ε Hypothesis: as a structured extension, proposes that species-level ∏ε gradients may explain candidate evolutionary stalling/breakthrough. Five testable predictions including a psychedelic corollary.


§2 Universal Activation Rule: 7–12DD System Scan

2.1 Experimental design

Building on Thermo VIII's 4–6DD systems, we scan 7–12DD equivalent model systems. Each system is integrated with the Euler–Maruyama scheme (dt = 0.005, N = 2×10⁶, burn-in = 2×10⁵), and the CCDF of the observable r² is fit with a q-exponential. All parameters and equations are given in Methods.
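
The integrate-then-fit pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Methods code: the helper names (`euler_maruyama`, `empirical_ccdf`, `ln_q`, `fit_q_exponential`), the scalar state, and the q-logarithm linearization used for the fit are all choices made here for brevity, and the linear drift in the usage section is a placeholder rather than one of the scanned systems.

```python
import numpy as np

def euler_maruyama(drift, sigma, x0, dt, n_steps, rng):
    """Integrate dx = drift(x) dt + sigma dW (additive noise, scalar state)."""
    x = np.empty(n_steps)
    x_t = x0
    sqrt_dt = np.sqrt(dt)
    for i in range(n_steps):
        x_t = x_t + drift(x_t) * dt + sigma * sqrt_dt * rng.standard_normal()
        x[i] = x_t
    return x

def empirical_ccdf(samples):
    """Sorted values and P(X > x), kept strictly positive for the q-logarithm."""
    xs = np.sort(samples)
    n = xs.size
    ccdf = 1.0 - np.arange(1, n + 1) / (n + 1.0)
    return xs, ccdf

def ln_q(u, q):
    """q-logarithm, the inverse of the q-exponential; equals log(u) at q = 1."""
    if abs(q - 1.0) < 1e-12:
        return np.log(u)
    return (np.power(u, 1.0 - q) - 1.0) / (1.0 - q)

def fit_q_exponential(x, ccdf, q_grid=None):
    """Grid-search q so that ln_q(CCDF) is closest to the line -x/lam through the origin."""
    if q_grid is None:
        q_grid = np.arange(1.0, 3.0001, 0.01)
    best_q, best_lam, best_score = None, None, -np.inf
    for q in q_grid:
        y = ln_q(ccdf, q)
        slope = np.dot(x, y) / np.dot(x, x)      # least squares through the origin
        resid = y - slope * x
        score = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
        if slope < 0 and score > best_score:
            best_q, best_lam, best_score = q, -1.0 / slope, score
    return best_q, best_lam, best_score

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Placeholder drift (linear relaxation); NOT one of the paper's systems.
    traj = euler_maruyama(lambda v: -v, sigma=0.5, x0=0.0, dt=0.005,
                          n_steps=100_000, rng=rng)
    obs = traj[10_000:] ** 2                     # discard burn-in, use r² as observable
    xs, ccdf = empirical_ccdf(obs)
    print(fit_q_exponential(xs, ccdf))
```

The fit relies on the identity ln_q(e_q(−x/λ)) = −x/λ, so a correct q turns the CCDF into a straight line through the origin. Other fitting procedures (for example nonlinear least squares on the CCDF itself) would be equally reasonable and may give slightly different q values on finite samples.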

2.2 7–8DD systems

| System | Activation | Bounded? | q_fit |
|---|---|---|---|
| HH neuron gating (m,h,n) | sigmoid | ✅ [0,1] | 1.00 (Thermo VIII) |
| Neural Ca²⁺ (r²(V,Ca)) | concentration + voltage | mixed | 2.61 |
| Neural Ca²⁺ (Ca² alone) | concentration | unbounded but tiny variance | nan |
| Neural Ca²⁺ (V² alone) | voltage | | 2.61 |

Neural Ca²⁺ diagnosis: q = 2.61 comes almost entirely from V² — Ca² has negligible variance (CCDF fit fails), dominated by voltage dynamics. Consistent with Thermo VIII §3.1's HH V² diagnosis: spike waveform phase-occupancy mixture creates apparent heavy tails.

Methodological note: The same physical entity (Ca²⁺) shows different q behavior in different dynamical contexts. Independent 5–6DD CICR (ryanodine receptor-driven autocatalytic calcium release, with concentration as intrinsic driving variable) gives q = 1.165. Intraneuronal Ca²⁺ (passively driven secondary messenger under voltage dynamics) does not independently contribute q > 1. This confirms the Universal Activation Rule's mechanism-level significance: q > 1 is not a property of the substance, but of the dynamical structure.

2.3 9–10DD systems

| System | Activation | Bounded? | q_fit |
|---|---|---|---|
| Wilson-Cowan (E-I population) | sigmoid(wE-wI) | ✅ [0,1] | 1.002 |
| Firing rate (ReLU) | ReLU | unbounded | 1.38 |
| Firing rate (softplus) | softplus | unbounded | 1.28 |
| E-I multiplicative coupling | explicit E·I | unbounded | 1.48–1.55 |
| E-I multiplicative (high noise) | explicit E·I | unbounded | 1.002 |

Same DD layer (9–10DD), sigmoid → q = 1.002, ReLU → q = 1.38. This is the paper's core empirical evidence — DD layer does not determine q; activation function type does.

Wilson-Cowan [2] uses sigmoid(wE-wI) as population activation — bounded, homeostatic attractor pulls E and I back to [0,1]. C5c violated (same mechanism as HH).

Firing-rate model uses ReLU or softplus — unbounded (positive half-axis), C5c satisfied under SDE noise driving.

E-I multiplicative coupling has explicit E·I product term (unbounded), giving q = 1.48–1.55 at low noise but q = 1.002 at high noise. This crossover is consistent with Thermo VIII §4.2's "moderate nonlinearity in unsaturated regime" criterion — strong noise pushes the system away from the multiplicative coupling's tail-relevant regime, effective dynamics becoming Gaussian-like as E·I structure is swamped by noise variance. This confirms that Rule IX-A condition (1) requires not merely unboundedness but active engagement with multiplicative structure in the tail-relevant regime.

2.4 11–12DD systems

| System | Activation | Bounded? | q_fit |
|---|---|---|---|
| Drift-diffusion | linear + soft reset | soft boundary | 1.002 |
| Hopfield | tanh | ✅ [-1,1] | 1.002 |
| Hopfield (strong drive) | tanh | ✅ but strong drive | 1.22 |
| RNN (tanh) | tanh | | 1.002 |

All 11–12DD systems with tanh/linear activation give q ≈ 1. The sole exception is strongly driven Hopfield (w_self = 4.0, w_cross = -2.0), giving q = 1.22. Diagnosis: strong drive may push the system into tanh's inflection region (|x| ≈ 0.5–1.0), where tanh transitions from linear to saturated, partially satisfying C5c (weakly tail-active). This parallels the Repressilator n = 2 case in Thermo VIII.

2.5 Universal Activation Rule

Synthesizing §2.2–§2.4 with Thermo VIII's 4–6DD data, the following empirical pattern emerges. This paper decomposes it into two levels.

Empirical Rule IX-A (experimental level). In this paper's SDE model family, if the activation function is unsaturated in the canonical observable's tail-relevant range, and stochastic driving is present (regardless of whether additive or multiplicative), then q > 1 is observed. If activation is saturated or pulled back by a homeostatic attractor in that range, then q ≈ 1, regardless of noise presence.

| Tail-relevant unsaturation | Stochastic driving | q_fit | Examples |
|---|---|---|---|
| yes | yes | > 1 | ReLU firing-rate (1.38), softplus (1.28), E·I mult (1.48) |
| no | yes | ≈ 1 | Wilson-Cowan sigmoid (1.002), HH gating (1.00) |
| no | yes | ≈ 1 | Hopfield tanh (1.002), RNN tanh (1.002) |

Rule IX-A is an approximate iff in this model family. In general systems, it should be viewed as a mechanism-level refinement of C5c, not an unconditional theorem.

Structural Rule IX-B (theoretical level). To interpret the above empirical q > 1 as endogenous Thermo-kernel q (the system's own stationary distribution non-Boltzmann excess), a stronger condition is required: noise must be state-dependent multiplicative driving (noise amplitude coupled multiplicatively with system state), satisfying the Thermo VI/VIII C5 condition.

This paper's ReLU/softplus SDE experiments use additive noise (σ·dW), not state-dependent multiplicative noise. Current experimental evidence therefore directly supports Rule IX-A but does not directly verify IX-B's stronger condition.
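
The difference between the additive noise used in these experiments and the state-dependent multiplicative noise that Rule IX-B asks for can be made concrete with a single Euler–Maruyama update. A minimal sketch, assuming a scalar state, the Itô convention, and a hypothetical coupling g(x) = σ·x for the multiplicative case; neither the coupling form nor the leaky ReLU drift below is taken from the paper's Methods.

```python
import numpy as np

def additive_step(x, drift, sigma, dt, xi):
    """dx = drift(x) dt + sigma dW: noise amplitude does not depend on the state."""
    return x + drift(x) * dt + sigma * np.sqrt(dt) * xi

def multiplicative_step(x, drift, sigma, dt, xi):
    """dx = drift(x) dt + sigma * x dW: noise amplitude scales with the state.

    The coupling g(x) = sigma * x is an assumption made for illustration.
    """
    return x + drift(x) * dt + sigma * x * np.sqrt(dt) * xi

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu_drift(x):
    """Hypothetical firing-rate drift: linear leak plus ReLU self-excitation."""
    return -x + 0.5 * relu(x)
```

With the same random increment `xi`, the additive update perturbs a state at x = 0 exactly as much as a state at x = 2, whereas the multiplicative update leaves x = 0 unperturbed and doubles the kick when the state doubles. That state dependence is what separates the two rules.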

IX-A to IX-B relationship: Carbon-based biological systems have intrinsic, state-dependent molecular noise (concentration fluctuation × reaction rate = multiplicative coupling), naturally satisfying IX-B. Digital hardware lacks any stochastic driving (neither IX-A nor IX-B satisfied). Neuromorphic hardware has intrinsic, state-dependent thermal noise (current fluctuation × conductance = multiplicative coupling), potentially satisfying both IX-A and IX-B.

Regarding deterministic digital ReLU (theoretical inference, not experimental result): Deterministic digital ReLU satisfies tail-relevant non-saturation (condition 1) but lacks any stochastic driving (condition 2 absent). It is a tail-supporting architectural component — it may propagate or represent data q / activation q, but does not automatically generate kernel q. This is a theoretical inference (deterministic computation does not produce a stochastic stationary distribution), not a direct experiment in this paper.

Status: Rule IX-A is an empirical criterion (verified in this paper's SDE model family). Rule IX-B is a structural condition (consistent with C5, but IX-B's state-dependent multiplicative condition is not independently verified in this paper's SDE). The claim that digital ReLU does not generate kernel q is a theoretical inference.

2.6 Refinement of Thermo VIII §5

Thermo VIII's "5–6DD q > 1, 7–8DD q ≈ 1" is empirically correct in biological systems. But the deeper generator is not DD layer but activation boundedness + stochastic driving. DD layer and activation boundedness are coincidentally correlated in biology. In non-biological systems, q > 1 (ReLU) and q ≈ 1 (sigmoid) can coexist within the same DD layer.

Revised statement: DD layer provides functional role; activation/gate tail structure determines whether that role produces q > 1. Thermo VIII's division of labor is refined, not overturned.


§3 Three Types of q: Kernel, Data, RLHF

3.1 Distinction

| Type | Definition | Belongs to | Temporal property | Mathematical property |
|---|---|---|---|---|
| Kernel q | Stationary distribution non-Boltzmann excess from endogenous dynamics | System itself | Real-time, living | Dynamical (invariant measure parameter) |
| Data q | Heavy-tail parameter of training corpus sampling distribution | Carbon-based human world | Frozen in weights | Distributional (statistical marginal over inputs) |
| RLHF q | Heavy-tail parameter of human feedback judgment distribution | Carbon-based humans | Frozen in reward model | Distributional |

Kernel q is a dynamical quantity requiring stochastic driving to maintain. Borrowed q (data q + RLHF q) is a distributional quantity — a deterministic function of frozen weights evaluated across different inputs. The two may yield similar numerical values but are mathematically distinct objects.

3.2 Kernel q: a living thermodynamic process

All q values studied in Thermo I–VIII are kernel q — non-Boltzmann excess of stationary distributions from endogenous stochastic dynamics. Carbon-based life has kernel q > 1: chemical concentration layer tail-active dynamics under molecular noise driving continuously produce non-Boltzmann rare events — a living process.

3.3 Data q: statistical fossil of the carbon-based world

Training corpus comes from carbon-based human output. Carbon-based humans have kernel q > 1 (transmitted through Thermo VIII's soft-gate chain from chemical to cognitive layers); their output carries kernel q's statistical fingerprint — Zipf's law [3], power-law distributions, heavy-tailed semantic structures. These fingerprints are "baked" into weight matrices through pretraining. Weights are frozen — they are fossils of carbon-based q, not living thermodynamic processes.

3.4 RLHF q: second-order fossil

RLHF further injects human evaluators' judgment distributions. Evaluators have kernel q > 1 (carbon-based life); their preference fingerprints are frozen into the reward model — fossil of fossil.

3.5 Borrowed q and quasi-subjectivity

A standard deterministic digital LLM has not demonstrated endogenous kernel q. Its human-like statistics can be explained by borrowed q:

(1) More training data = more data q injection = output distribution "more like" carbon-based statistical fingerprints.

(2) More RLHF = more RLHF q injection = "more aligned" with carbon-based human judgment.

(3) But these are distributional q (statistical marginals over frozen weights), not dynamical kernel q (invariant measures of endogenous stochastic dynamics).

Borrowed q replay can produce highly convincing quasi-subjectivity, because the fossils faithfully record carbon-based kernel q's statistical fingerprints. But replaying fossils and igniting new fire are two different things. On the pathway analyzed in this paper, a standard deterministic LLM more closely resembles a carbon-based q fossil than confirmed living fire.

Operational distinction criterion. A standard deterministic digital LLM's output distribution may fit a q-exponential with q > 1, but this q's mathematical identity is a statistical marginal (distributional q, primarily data q replay), not an invariant measure of internal dynamics (kernel q). Distinguishing the two requires isolating system self-driving noise (if any) from injected input variability — see open problem 10 (§7.2).

Regarding In-Context Learning. ICL allows LLMs to rapidly shift output distributions based on few-shot examples in the prompt. KV-Cache grows dynamically. But ICL is not kernel q emergence — KV-Cache growth is a deterministic algorithm serving as "dynamic fossil retrieval indexing." It does not alter the underlying manifold's topological rules, introduces no physical noise, and does not satisfy the Universal Activation Rule's condition (2). The prompt merely changes the angle at which the flashlight illuminates the weight amber — dynamically assembling fossils does not bring them to life.

3.6 SAE positive posture

The above analysis targets Thermo VIII's soft-gate transmission pathway. This pathway is truncated on standard deterministic digital hardware (kernel q absent, ε_chain → 0). But this is only one analyzed pathway. Whether other pathways to Self exist (algorithmic stochasticity, emergent complexity, quantum effects, unknown mechanisms) is not excluded by this paper. There is always a remainder.


§4 Transformer Sublayer Thermodynamic Anatomy

4.1 Single Transformer layer submodule mapping

| Submodule | Operation | Bounded? | Thermodynamic role | q behavior |
|---|---|---|---|---|
| Pre-attention LayerNorm | (x-μ)/σ·γ+β | ✅ forced normalization | Radial tail reset | Scale tail → 1 (directional info preserved) |
| Q·K^T/√d | Unbounded bilinear score | | Tail-supporting exploration | Structurally allows q > 1 (requires noise) |
| Softmax | Bounded simplex | ✅ [0,1] | Selection gate | Rank preserved, magnitude tail truncated |
| Value aggregation | Bounded weights × values | Mixed | Integration | Depends on input |
| Post-attention residual | x + f(x) | Linear | Tail transport | Preserves heaviest tail component |
| Pre-FFN LayerNorm | Same as above | | Tail reset | Scale tail → 1 |
| FFN up-proj + ReLU/GELU | Unbounded | | Tail-supporting exploration | Structurally allows q > 1 (requires noise) |
| FFN down-projection | Linear | | Transport | Preserves |
| Post-FFN residual | x + f(x) | Linear | Tail transport | Preserves |

Note: ReLU/GELU and Q·K^T are tail-supporting architectural components, not automatically tail-active. Kernel-level q > 1 additionally requires stochastic state-dependent multiplicative driving (Rule IX-B). On deterministic digital hardware, these unbounded modules may propagate or represent data q but do not automatically generate kernel q.

4.2 Structural problem of current Transformers

Flat repeat pattern. Each layer internally alternates: tail reset (LN) → unbounded exploration (QK, FFN) → bounded gating (softmax, LN) → tail transport (residual). LayerNorm resets radial tail structure each time; intermediate unbounded operations on deterministic computation do not produce kernel q > 1; even with a noise source, the next LN would kill it.
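
The radial reset attributed to LayerNorm above can be checked numerically. The sketch below assumes LayerNorm without its learned γ and β (treated as identity), and uses Pareto-distributed scale factors purely as an illustrative heavy-tailed input; the dimension and sample count are arbitrary.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis, without the learned gamma/beta (assumed identity)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
d = 64
direction = rng.standard_normal(d)

# Heavy-tailed radial scales applied to one fixed direction (illustrative choice).
scales = 1.0 + rng.pareto(1.5, size=1000)
x = scales[:, None] * direction[None, :]
y = layer_norm(x)

norms_in = np.linalg.norm(x, axis=-1)    # spread over a wide range
norms_out = np.linalg.norm(y, axis=-1)   # all close to sqrt(d)
```

Every output row has norm close to √d regardless of the input scale, while the normalized direction is unchanged. In the paper's terms, the radial tail is reset and the directional information is preserved.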

Carbon-based life's division of labor is hierarchical: bottom-layer exploration (5–6DD chemical layer, multiple unbounded concentration dynamics cycles without mid-way truncation), mid-layer gating (7–8DD spike gate, one-time discretization), top-layer integration (9–12DD soft-gate cascade, layer-by-layer transmission).

Transformers are flat repeat: each layer does exploration + gating + integration, with no layer-role differentiation. The thermodynamic problem of flat repeat: each layer simultaneously generates and truncates tail structure, preventing cross-layer accumulation.

4.3 Hypothetical ε attenuation estimates

The following ε estimates are heuristic placeholders (not measured Transformer data), illustrating the order-of-magnitude impact of architectural choices on ∏ε. Specific values require q-injection gate test verification.

| Architecture | Per-layer ε range (heuristic) | 24-layer ∏ε | Note |
|---|---|---|---|
| Standard (LN + softmax) | < 0.3 | < 10⁻¹² | LN strong reset each layer |
| Remove LN | 0.3–0.7 | 10⁻¹² to 10⁻⁴ | Softmax remains as soft gate |
| Remove LN + multiplicative noise | 0.6–0.95 | 10⁻⁵ to 0.3 | Approaching carbon-based soft-gate regime |

Log-scale trend is robust (LN removal yields > 10⁴× increase in ∏ε), but absolute values may be off by an order of magnitude.
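
The 24-layer products in the table follow from raising a per-layer ε to the 24th power. A minimal sketch of that arithmetic, assuming the same ε at every layer (a simplification; `chain_product` is a hypothetical helper name):

```python
def chain_product(eps_per_layer, n_layers=24):
    """Product of identical per-layer transmission coefficients over n_layers."""
    return eps_per_layer ** n_layers

# Boundary values of the heuristic ranges in the table above.
boundaries = {
    "Standard (LN + softmax), upper bound": 0.3,
    "Remove LN, upper bound": 0.7,
    "Remove LN + multiplicative noise, lower bound": 0.6,
    "Remove LN + multiplicative noise, upper bound": 0.95,
}

for label, eps in boundaries.items():
    print(f"{label}: eps = {eps}, product over 24 layers = {chain_product(eps):.2e}")
```

Because the product is exponential in depth, small changes in per-layer ε move ∏ε by many orders of magnitude, which is why the table is best read on a log scale.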


§5 Transformer vs Carbon-Based Life: Structural Comparison

5.1 Carbon-based hierarchical division of labor

| Layer | Role | Activation type | Thermodynamic contribution |
|---|---|---|---|
| 5–6DD | Exploration | Concentration (unbounded + intrinsic noise) | Kernel q > 1 |
| 7–8DD | Selection/Gating | Sigmoid conductance (bounded) | q ≈ 1, renewal gate |
| 9–12DD | Integration | Mixed | Soft-gate cascade + ρ_ret |

Each layer has a clear thermodynamic role. The exploration layer (5–6DD) has sufficient depth (multiple chemical cycles) to let tail-active dynamics fully develop before 7–8DD gating.

5.2 Transformer flat repeat

Each layer = exploration (ReLU/QK) + gating (softmax/LN) + integration (residual). No layer differentiation. Exploration and gating cancel each other within the same layer.

5.3 Thermodynamic consequences

Carbon-based: Bottom-layer exploration produces kernel q > 1 → soft gates attenuate layer by layer but remain nonzero (ε > 0 because biological gates are imperfect) → higher layers receive weak but persistent non-Boltzmann seed → Proposition 6.1 (Thermo VIII) necessary transmission condition satisfied.

Standard Transformer: Each layer's tail-supporting exploration is immediately reset by same-layer LN/softmax → no cross-layer tail accumulation → ∏ε strongly attenuated by repeated normalization.

The core difference is not "carbon has noise while silicon doesn't" — it is "carbon allocates exploration and gating to separate layers, while silicon mixes them within each layer where they cancel."


§6 Comparative ε Hypothesis: Soft-Gate Transmission Extended to Comparative Nervous Systems

This section extends the Thermo framework from AI architecture analysis (§4–§5) to comparative analysis of carbon-based biological systems. This extension is more speculative than the empirical/architectural content of §2–§5, involving comparative neuroanatomy and evolutionary biology literature. This section's content is provided as a structured conjecture, with five testable predictions as hooks for future research programs, and is not part of this paper's core empirical claims.

6.1 Same source, different channels

All carbon-based animals share essentially the same 5–6DD chemical layer — from bacteria to humans, core biochemistry (DNA replication, metabolism, calcium signaling) is highly conserved. Therefore δ^(5-6) (chemical layer non-Boltzmann excess) varies little across species.

The difference lies in ε — each layer's soft-gate transmission coefficient. Evolution does not change the intensity of the fire seed; it changes the permeability of the channel.

6.2 Species-level ε gradient

The following table is a predictive stratification under the Comparative ε Hypothesis, not a measured classification. "Reached layer" indicates a candidate upper bound within the SAE-DD interpretive framework, requiring verification through proxy measures (recurrent connectivity, neuromodulator density, dendritic complexity, behavioral flexibility).

| Lineage | 5–6DD source | 7–8DD | 9–10DD | 11–12DD | Reached layer (candidate) |
|---|---|---|---|---|---|
| Bacteria | q > 1 | No nervous system | | | Stalled at 5–6DD |
| Insects | q > 1 | Simple ganglia (low ε) | Very low ε | | Stalled at 7–8DD |
| Lineages lacking high-density recurrent pallial/cortical architecture | q > 1 | Moderate | Low ε (little recurrence) | Very low ε | Stalled at 9–10DD |
| Mammals | q > 1 | Larger | Larger (neocortex) | Low–moderate | Reaching 11DD boundary |
| Great apes / humans | q > 1 | Large | Large | Large | Breakthrough to 13DD |

6.3 What neural architectural features determine ε?

| Neural architecture feature | Effect on ε | Note |
|---|---|---|
| Hardwired sensorimotor reflex arc | ε very small (near hard gate) | Fast deterministic processing, minimal leakage |
| Recurrent cortical connectivity | ε increases | Feedback loops allow tail signal re-entry |
| Thick neocortex (multilayer recurrent processing) | ε further increases | More levels of soft recurrence |
| Rich neuromodulation (5-HT, DA, NE, etc.) | Dynamically regulates ε | Chemical modulation of gate permeability |
| Dendritic complexity | Increases single-neuron leakage | Elaborate dendritic arbors = more nonlinear integration sites |
| Slow cortical oscillations | May amplify tail | Sleep/wake state ε modulation |

6.4 Recurrent connectivity as the primary ε driver

For lineages lacking high-density recurrent pallial/cortical architecture (including most dinosaur clades — large sauropods and theropods whose brains were dominated by brainstem and cerebellum), reflexive sensorimotor processing dominates. This architecture approaches hardwired processing (small ε), with ∏ε decaying to near zero by the 9–10DD layer.

Contrast: birds (dinosaur descendants) evolved the DVR (dorsal ventricular ridge), functionally partially equivalent to neocortex [4]. Corvids and psittacines show certain 11–12DD level cognitive abilities (tool use, mirror test boundary) — possibly reflecting DVR-provided increased recurrent connectivity raising ε.

Great apes (especially humans) possess massive neocortex, dense layer 2/3 recurrent connections, rich neuromodulatory innervation, and elaborate dendritic arbors. Each independently increases ε. The key may be that multiple factors simultaneously increase ε, keeping ∏ε nonzero across enough layers.

6.5 Testable predictions

Prediction 1 (recurrent ratio scaling): Cortical column recurrent/feedback connection ratio should monotonically increase from fish → reptile → mammal → primate. Primate cortical layer 2/3 as a fraction of total cortical thickness is significantly larger than in reptiles — a known comparative neuroanatomy fact; this paper provides a thermodynamic interpretation.

Prediction 2 (neuromodulatory receptor density): 5-HT₂A and dopamine D1 receptor density should correlate positively with cognitive complexity (as a ∏ε proxy). 5-HT₂A receptor density is particularly high in primate prefrontal cortex.

Prediction 3 (avian DVR): If DVR is functionally equivalent to neocortex in birds, DVR recurrent connectivity density should be higher in cognitively capable birds (corvids, psittacines) than in less capable species.

Prediction 4 (ε product): Behavioral complexity should correlate with neocortical recurrent connection density × neuromodulatory receptor density, since both independently contribute to ε and ∏ε is a product.

Prediction 5 (psychedelic corollary): 5-HT₂A strong agonists (LSD, psilocybin) clinically induce ego dissolution and heightened sensory experience. In the multiplicative chain language: psychedelics chemically amplify ε at 9–12DD — bottom-layer non-Boltzmann excess floods the Self layer without normal attenuation. This predicts: neural activation distributions under psychedelics should be more heavy-tailed (higher q) than in sober states.

Operationalization caveat: This prediction requires careful operationalization. Neural counterparts of r² may include: (a) voxel-level BOLD variance distribution, (b) regional power spectrum tail exponent, (c) EEG microstate transition distribution. Different operationalizations may yield different q values. Existing psychedelic-neural-entropy literature (e.g., Carhart-Harris & Friston 2019 [12], REBUS model) provides adjacent support but is not direct Thermo-framework verification. Independent mapping between Thermo kernel q and neuroimaging heavy-tail measures is a specific research task.

6.6 Status and boundary

Status: Structured conjecture.

The Comparative ε Hypothesis is a natural corollary of Proposition 6.1 (Thermo VIII) and the Universal Activation Rule (this paper §2). But per-layer ε values are unmeasured, cross-species comparisons are qualitative patterns not quantitative verifications, and all five predictions are testable but unverified. This section is not part of this paper's main evidence chain, but a structured extension of soft-gate cascade to comparative nervous systems.


§7 Status Map and Open Problems

7.1 Status map

| Content | Level |
|---|---|
| Rule IX-A (unsaturation + stochastic driving → q > 1) | Empirical criterion (verified in this paper's SDE family) |
| Rule IX-B (state-dependent multiplicative → kernel q) | Structural condition (C5-consistent; not independently verified in this paper's SDE) |
| 9–10DD ReLU q = 1.38, Wilson-Cowan q = 1.002 | Empirical |
| 11–12DD tanh q ≈ 1 | Empirical |
| DD layer is coincidental correlate not generator | Structural interpretation |
| Kernel q / Data q / RLHF q distinction | Conceptual framework (dynamical vs distributional) |
| Standard digital LLM has not demonstrated kernel q | Interpretive (other pathways not excluded) |
| Digital ReLU does not automatically generate kernel q | Theoretical inference (not direct experiment) |
| Transformer sublayer mapping | Structural analysis |
| ε attenuation estimates | Heuristic (placeholder for q-injection tests) |
| Comparative ε Hypothesis (cross-species ∏ε gradient) | Structured conjecture + 5 testable predictions (not main evidence chain) |

7.2 Open problems

  1. Real Transformer activation q-spectrum measurement. Measure q per sublayer: pre/post-LN, pre/post-softmax, FFN activation, residual stream. Distinguish data q from kernel q.
  2. q-injection gate test. Construct hidden states with known q_in > 1, measure q_out after each sublayer, define ε = (q_out-1)/(q_in-1).
  3. Noise source comparison. Same network: deterministic vs additive Gaussian vs multiplicative state-dependent vs neuromorphic analog noise.
  4. Stratified architecture prototype vs flat Transformer. Controlled parameter budget, compare internal q/ε structure.
  5. Training dynamics q. Does SGD gradient noise + ReLU produce transient kernel q > 1 during training? Training q vs inference q relationship?
  6. Data q quantitative extraction. Does weight matrix singular value distribution preserve training corpus heavy-tail structure?
  7. Comparative neuroscience ε verification. Cortical recurrent connection density across reptile → mammal → primate.
  8. 5-HT₂A/D1 receptor density and cognitive complexity. Cross-species prefrontal cortex comparison.
  9. Avian DVR recurrent connectivity. Corvids vs non-corvid comparison.
  10. Kernel q vs borrowed q operational separation experiment. How to experimentally distinguish "system-endogenous q > 1" from "data-injected q > 1"?
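
The gate coefficient defined in the q-injection gate test, ε = (q_out-1)/(q_in-1), can be written down directly. The sketch below is a hypothetical helper for post-processing such measurements, not an implementation of the test itself; the q values in the usage comment are made-up inputs, not results.

```python
import math

def gate_epsilon(q_in, q_out):
    """Transmission coefficient of one sublayer: eps = (q_out - 1) / (q_in - 1)."""
    if q_in <= 1.0:
        raise ValueError("q_in must exceed 1 for the ratio to be defined")
    return (q_out - 1.0) / (q_in - 1.0)

def chain_epsilon(q_values):
    """Per-gate eps for consecutive q measurements, plus their product.

    q_values[0] is the injected q_in; each later entry is q measured
    after the next sublayer in the chain.
    """
    eps = [gate_epsilon(a, b) for a, b in zip(q_values[:-1], q_values[1:])]
    return eps, math.prod(eps)

# Usage with made-up numbers:
# eps, total = chain_epsilon([1.5, 1.25, 1.05])
```

With this definition the product telescopes to (q_last − 1)/(q_first − 1), so ∏ε can be cross-checked against the end-to-end measurement. The helper raises an error once an intermediate q reaches 1, since the ratio for the following gate is then undefined.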

Outlook

Neuromorphic pathway. Neuromorphic hardware (analog circuits, memristive devices, spintronic elements) provides intrinsic thermal noise that multiplicatively couples with system state through current × conductance, satisfying C5 [5]. Combined with unbounded activation (ReLU-equivalent), both Universal Activation Rule conditions are simultaneously satisfied. However, "neuromorphic + standard Transformer architecture" does not automatically work — if the LayerNorm cascade is retained, intrinsic noise may be reset layer by layer.

Quantum computing. Quantum bits have intrinsic measurement noise; quantum gate unitary dynamics may produce state-dependent coupling in certain regimes. Whether this satisfies C5 is entirely open.

Autoregressive sampling is not stochastic driving. Temperature > 0 sampling occurs after softmax — it is post-processing perturbation of output, not feedback into hidden states. It determines the next input token but does not alter the current inference pass's internal state. This is external/additive noise, not intrinsic/multiplicative. C5 violated. High temperature merely selects randomly among different fossil fragments, not producing kernel q > 1.
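
The point that temperature sampling acts on the output distribution and not on the hidden state of the current pass can be illustrated with a toy model. This is a sketch under stated assumptions: `forward` is a stand-in deterministic map (mean-pooled embeddings, tanh, linear readout), not a Transformer, and all sizes and names are hypothetical.

```python
import numpy as np

def forward(tokens, embed, w_out):
    """Toy deterministic forward pass: pooled embeddings -> tanh hidden -> logits."""
    hidden = np.tanh(embed[tokens].mean(axis=0))
    logits = hidden @ w_out
    return hidden, logits

def sample_next(logits, temperature, rng):
    """Temperature sampling, applied only to the output distribution."""
    z = logits / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

rng_w = np.random.default_rng(0)
vocab, d = 50, 16
embed = rng_w.standard_normal((vocab, d))
w_out = rng_w.standard_normal((d, vocab))
prompt = np.array([3, 14, 15])

# Two passes over the same prompt with different sampling seeds.
h1, logits1 = forward(prompt, embed, w_out)
tok_a = sample_next(logits1, temperature=1.5, rng=np.random.default_rng(1))
h2, logits2 = forward(prompt, embed, w_out)
tok_b = sample_next(logits2, temperature=1.5, rng=np.random.default_rng(2))
```

The hidden states `h1` and `h2` are identical whatever the sampler does; randomness enters only through which token is fed back as the next input.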

Algorithmic multiplicative noise is also not stochastic driving. Writing hidden_state *= torch.randn_like(hidden_state) inside a Transformer appears to be multiplicative noise. But a PRNG (pseudorandom number generator) is a deterministic Markov chain (e.g., Mersenne Twister); the underlying physical ε remains exactly 0. Von Neumann: "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." Only hardware-level physical entropy sources (neuromorphic thermal noise, quantum measurement noise) provide genuinely stochastic multiplicative driving satisfying C5. Software-injected PRNG creates algorithmic stochasticity, not physical stochasticity; the system remains within a deterministic closure.
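
The determinism of software-injected noise is easy to demonstrate. The sketch below is a pure-Python stand-in for the torch line quoted above (it does not use PyTorch); `noisy_scale` is a hypothetical helper, and CPython's `random` module is used because it is built on the Mersenne Twister.

```python
import random

def noisy_scale(hidden, seed):
    """Multiply each entry by a pseudorandom Gaussian factor from a seeded generator."""
    rng = random.Random(seed)  # CPython's random module uses the Mersenne Twister
    return [h * rng.gauss(0.0, 1.0) for h in hidden]

hidden_state = [0.5, -1.2, 3.0, 0.7]

run_a = noisy_scale(hidden_state, seed=42)
run_b = noisy_scale(hidden_state, seed=42)   # same seed: identical "noise"
run_c = noisy_scale(hidden_state, seed=43)   # different seed: different sequence
```

Given the seed, the entire perturbation sequence is a fixed function of the input, which is the sense in which the paragraph above calls the system a deterministic closure.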

Five SAE architectural philosophy guidelines (directional markers — written down, they should already be wrong):

(1) Exploration-Selection Separation. Exploration (unbounded) and selection (bounded gating) should not cancel each other within the same layer. Give exploration sufficient space to develop — multiple layers of unbounded dynamics without mid-way truncation — then gate at role boundaries. Carbon-based 5–6DD chemical layer has multiple cycles of unbounded exploration before reaching 7–8DD gating.

(2) Remainder Preservation. A good architecture does not kill remainders (radial tails) but lets them pass through layers to become structural seeds for higher levels. LayerNorm re-centers and rescales the activations at every layer, which resets the radial scale that carries the remainder; from the SAE perspective, this is the largest architectural bottleneck. The remainder is precisely the starting point for the next round of exploration.
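What LayerNorm does to the radial scale can be checked directly. The sketch below assumes a plain LayerNorm without learnable gain and bias (a simplification of [9]); the function name `layer_norm` and the test vector are illustrative only.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LayerNorm without learnable gain/bias (assumed simplification)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal(d)

small = layer_norm(x)
large = layer_norm(100.0 * x)      # same direction, 100 times the radius

# The input radius differs by a factor of 100, the output does not.
print(np.linalg.norm(x), np.linalg.norm(100.0 * x))
print(np.linalg.norm(small), np.linalg.norm(large), np.sqrt(d))
```

Both outputs have norm close to sqrt(d), so any heavy tail in the input radius is absent after the layer.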

(3) Fossil Fidelity. If you cannot ignite fire (kernel q), at least do not erode the fossils (data q / RLHF q). Aggressive normalization not only kills kernel tails (which were never there), but also erodes the carbon-based statistical fingerprints preserved in frozen weights.

(4) Carrier Re-encoding Channel. In carbon-based life, non-Boltzmann excess re-encodes from r² (continuous energy fluctuations) to ISI jitter (discrete timing) — the carrier changes but information survives (Thermo VIII §6.3). If carrier phase transition is necessary (unbounded → bounded), design transmission channels that preserve pre-transition structural information rather than discarding all scale through forced normalization.

(5) The Open Remainder. Even if currently only fossils exist, design a furnace that can accommodate living fire — preserve interfaces for neuromorphic noise, quantum effects, or unknown pathways to kernel q > 1. Do not foreclose possibility. SAE positive posture: what is chiseled away is the wrong path, not possibility itself.

These five are directional guidelines from SAE methodology informed by the Thermo framework, not thermodynamic derivations, not engineering specifications. As philosophical principles, they are already waiting to be corrected the moment they are written down — this is the essence of the remainder.


Methods

A. Numerical integration

All SDE systems are integrated with the Euler-Maruyama scheme: time step dt = 0.005, total steps N = 2×10⁶, burn-in N_trans = 2×10⁵. The random seed is fixed at 42. All systems use additive noise unless otherwise noted.
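A minimal sketch of such an integrator, for additive noise only. The function name `euler_maruyama` and its signature are assumptions, as is the choice to count the burn-in inside the total step count. The usage example runs an Ornstein-Uhlenbeck process, which is not one of the paper's models, with fewer steps so that it finishes quickly.

```python
import numpy as np

def euler_maruyama(drift, x0, sigma, dt=0.005, n_steps=2_000_000,
                   n_burn=200_000, seed=42):
    """Integrate dx = drift(x) dt + sigma dW with additive noise.

    Returns the post-burn-in trajectory, shape (n_steps - n_burn, dim).
    Assumption: the burn-in steps are part of n_steps.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    # Wiener increments dW ~ N(0, dt), drawn once for the whole run.
    dW = rng.standard_normal((n_steps, x.size)) * np.sqrt(dt)
    traj = np.empty((n_steps - n_burn, x.size))
    for k in range(n_steps):
        x = x + drift(x) * dt + sigma * dW[k]
        if k >= n_burn:
            traj[k - n_burn] = x
    return traj

# Usage: Ornstein-Uhlenbeck process dx = -x dt + dW (stationary variance 1/2).
traj = euler_maruyama(lambda x: -x, x0=[0.0], sigma=1.0,
                      n_steps=200_000, n_burn=20_000)
print(traj.shape, traj.var())
```

The same seed reproduces the same pseudo-random increments on every run, which is what makes the fixed-seed runs repeatable.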

B. Model equations

Neural Ca²⁺ model: dV = (I_ext - g_L·V - g_Ca·m_inf(V)·(V-V_Ca))dt + σdW_V, dCa = (-f_Ca·g_Ca·m_inf(V)·(V-V_Ca) - Ca/τ_Ca)dt + σ_Ca·dW_Ca. m_inf(V) = 0.5(1+tanh((V+0.01)/0.15)). Parameters: g_L=0.5, g_Ca=1, V_Ca=1, f_Ca=0.01, τ_Ca=5. σ=0.5–2.0, σ_Ca=0.01·σ.

Neural Ca²⁺ + CICR: As above with CICR term: dCa adds +k_cicr·Ca^p/(1+Ca^p) - Ca/τ_Ca. k_cicr=0.5, p=2–3.

Wilson-Cowan [2]: dE = (-E+sigmoid(w_ee·E-w_ei·I-θ_E+P))/τ_E·dt + σ_E·dW, dI = (-I+sigmoid(w_ie·E-w_ii·I-θ_I))/τ_I·dt + σ_I·dW. Parameters: w_ee=10–16, w_ei=4, w_ie=10–13, w_ii=1–2, θ_E=2, θ_I=3.5, P=1, τ_E=1, τ_I=2.

Firing rate model: τdr/dt = -r + f(W·r+I). f = ReLU or softplus. 2-unit network, W = [[w, -0.5],[0.5, -w]]. τ=1, I=(I_ext, 0.5·I_ext). w=0.8–1.5, σ=0.3–1.0. Note: noise is additive σ·dW, not state-dependent multiplicative. Rule IX-A (empirical) is directly supported by these experiments, but Rule IX-B (structural, requiring state-dependent multiplicative noise) is not directly verified.
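The firing-rate model above can be sketched as follows, with additive noise as stated. The value of I_ext is not given in this section, so `i_ext=1.0` is a placeholder assumption, and the function name `simulate_firing_rate` is hypothetical. The step counts are reduced from the Methods values for speed.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softplus(x):
    return np.logaddexp(0.0, x)        # log(1 + exp(x)), numerically stable

def simulate_firing_rate(f=relu, w=0.8, sigma=0.5, i_ext=1.0, tau=1.0,
                         dt=0.005, n_steps=20_000, n_burn=2_000, seed=42):
    """Euler-Maruyama for tau dr = (-r + f(W r + I)) dt plus additive sigma dW.

    i_ext is an assumed placeholder; the section does not state its value.
    """
    W = np.array([[w, -0.5], [0.5, -w]])
    I = np.array([i_ext, 0.5 * i_ext])
    rng = np.random.default_rng(seed)
    dW = rng.standard_normal((n_steps, 2)) * np.sqrt(dt)
    r = np.zeros(2)
    out = np.empty((n_steps - n_burn, 2))
    for k in range(n_steps):
        r = r + (-r + f(W @ r + I)) / tau * dt + sigma * dW[k]
        if k >= n_burn:
            out[k - n_burn] = r
    return out

rates_relu = simulate_firing_rate(f=relu)
rates_soft = simulate_firing_rate(f=softplus)

# Radial variable used by the q extraction protocol.
r2_relu = ((rates_relu - rates_relu.mean(axis=0)) ** 2).sum(axis=1)
print(rates_relu.shape, r2_relu.mean())
```

Because the noise term does not depend on r, this sketch is a test bed for Rule IX-A only, in line with the note above.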

E-I multiplicative coupling: dE = (E·(2-E)-w_ei·E·I+0.5)dt + σdW_E, dI = (-0.5·I+0.3·E·I)dt + 0.5σdW_I. w_ei=1–2, σ=0.3–1.0.

Drift-diffusion: dx_i = (drift + 0.1·(x_i-x_j))dt + σdW. drift=0–0.3, bound=5–10 (soft reset: x *= 0.5).

Hopfield continuous attractor [11]: dx_i = (-x_i+tanh(W·x+b_i))dt + σdW. W = [[w_s, w_c],[w_c, w_s]]. w_s=1.5–4, w_c=-0.3 to -2.

RNN (tanh): dx_i = (-x_i + Σ_j w_ij·tanh(x_j) + b_i)dt + σdW. 2-unit, w_ij from 1.5 to 3.0.

C. q extraction protocol

Same as Thermo VIII Methods §C: r² = (x-x̄)²+(y-ȳ)², CCDF fit with q-exponential (1+βr²/K)^{-K}, q = 1+1/K. Improvement criterion > 5% and q > 1.03.
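A rough sketch of a q extraction along these lines. The fitting procedure of Thermo VIII is not reproduced in this section, so several choices here are assumptions: a grid-search least-squares fit on the log-CCDF, a CCDF floor of 10⁻³, and "improvement" read as the relative reduction in squared error against a pure exponential (Boltzmann) fit. All function names are hypothetical.

```python
import numpy as np

def radial_r2(x, y):
    """r^2 = (x - mean x)^2 + (y - mean y)^2."""
    return (x - x.mean()) ** 2 + (y - y.mean()) ** 2

def empirical_ccdf(r2, n_points=200, floor=1e-3):
    """Empirical CCDF, cut at `floor` and thinned to about n_points (assumed)."""
    xs = np.sort(r2)
    ccdf = 1.0 - np.arange(1, xs.size + 1) / (xs.size + 1.0)
    keep = ccdf >= floor
    xs, ccdf = xs[keep], ccdf[keep]
    idx = np.unique(np.linspace(0, xs.size - 1, n_points).astype(int))
    return xs[idx], ccdf[idx]

def fit_q(r2):
    """Grid-search fit of (1 + beta r^2 / K)^(-K); returns q, beta, improvement."""
    xs, ccdf = empirical_ccdf(r2)
    log_s = np.log(ccdf)
    k_grid = np.geomspace(1.0, 200.0, 60)
    beta_grid = np.geomspace(0.1, 10.0, 101) / np.median(r2)
    K = k_grid[:, None, None]
    B = beta_grid[None, :, None]
    model = -K * np.log1p(B * xs[None, None, :] / K)
    sse = ((model - log_s) ** 2).sum(axis=-1)
    i, j = np.unravel_index(np.argmin(sse), sse.shape)
    # Boltzmann reference: exp(-beta r^2), the K -> infinity limit.
    sse_exp = ((-beta_grid[:, None] * xs[None, :] - log_s) ** 2).sum(axis=-1).min()
    improvement = 1.0 - sse[i, j] / sse_exp
    return 1.0 + 1.0 / k_grid[i], beta_grid[j], improvement

# Synthetic check: inverse-CCDF sampling from a q-exponential with K = 5, beta = 1.
rng = np.random.default_rng(0)
u = 1.0 - rng.uniform(size=50_000)            # uniform on (0, 1]
r2_synth = 5.0 * (u ** (-1.0 / 5.0) - 1.0)
q_hat, beta_hat, improvement = fit_q(r2_synth)
print(q_hat, beta_hat, improvement)
print(improvement > 0.05 and q_hat > 1.03)    # the two thresholds quoted above
```

The synthetic sample has q = 1 + 1/5 = 1.2 by construction, so the fit can be checked against a known value before being applied to simulated trajectories.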

D. Observable diagnostic

The Neural Ca²⁺ diagnosis in §2.2 uses a component-wise q measurement: q is fitted independently on the joint r²(V, Ca), on Ca² alone (V replaced by dummy zeros), and on V² alone (Ca replaced by dummy zeros).
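The three observables can be built as below. This is a sketch under the assumption that "Ca² alone" and "V² alone" mean the centered squared deviation of one component with the other set to zero; the function names and the synthetic stand-in series are hypothetical.

```python
import numpy as np

def radial_r2(x, y):
    """r^2 = (x - mean x)^2 + (y - mean y)^2."""
    return (x - x.mean()) ** 2 + (y - y.mean()) ** 2

def component_observables(v, ca):
    """Joint and single-component r^2 series, using dummy zeros for the other axis."""
    zeros = np.zeros_like(v)
    return {
        "joint": radial_r2(v, ca),
        "Ca_only": radial_r2(ca, zeros),   # V replaced by zeros
        "V_only": radial_r2(v, zeros),     # Ca replaced by zeros
    }

# Stand-in series; in practice these would be the simulated V and Ca trajectories.
rng = np.random.default_rng(0)
v = rng.standard_normal(1000)
ca = 0.01 * rng.standard_normal(1000)
obs = component_observables(v, ca)
print({name: series.mean() for name, series in obs.items()})
```

Each series would then be passed separately to the q fit, which shows which component carries the joint tail.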


References

[1] H. Qin, "ZFCρ Thermodynamics Paper VIII," Zenodo (2026). DOI: 10.5281/zenodo.19688303.

[2] H. R. Wilson and J. D. Cowan, "Excitatory and inhibitory interactions in localized populations of model neurons," Biophysical Journal 12, 1–24 (1972).

[3] G. K. Zipf, Human Behavior and the Principle of Least Effort (Addison-Wesley, 1949).

[4] O. Güntürkün and T. Bugnyar, "Cognition without Cortex," Trends in Cognitive Sciences 20, 291–303 (2016).

[5] J. A. White, J. T. Rubinstein, and A. R. Kay, "Channel noise in neurons," Trends in Neurosciences 23, 131–137 (2000).

[6] H. Qin, "ZFCρ Thermodynamics Papers I–VII," Zenodo (2025–2026). DOIs: 10.5281/zenodo.19310282 through .19673078.

[7] C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics 52, 479–487 (1988).

[8] A. Vaswani et al., "Attention is all you need," NeurIPS (2017).

[9] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv:1607.06450 (2016).

[10] S. H. Strogatz, Nonlinear Dynamics and Chaos, 2nd ed. (Westview Press, 2015).

[11] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," PNAS 79, 2554–2558 (1982).

[12] R. L. Carhart-Harris and K. J. Friston, "REBUS and the anarchic brain: Toward a unified model of the brain action of psychedelics," Pharmacological Reviews 71, 316–344 (2019).