Self-as-an-End Theory Series · Language Series

Language and Its Remainder: The Semantic Layer and the Ontological Layer in the Age of AI

DOI: 10.5281/zenodo.19228557  ·  CC BY 4.0
Han Qin · Independent Researcher · 2026
Abstract

Supercalifragilisticexpialidocious — the word you say when you don't know what to say. It carries no fixed semantic content, yet it is anything but meaningless. It is the sonic surfacing of the non-empty remainder (ρ) that the construct (C) inevitably leaves behind when it attempts to cover the experiential domain (U): not naming the remainder, but acknowledging it.

Language is the subject's chiseling of the markability subspace of the Law of Identity. Chiseling necessarily produces remainder. This paper argues that linguistic remainder has two structural layers: a semantic-layer remainder (meaning associations severed by discrete symbols — the inexpressible, the untranslatable, the inexhaustibility of metaphor) and an ontological-layer remainder (directionality, momentariness, relationality — conditions of subjecthood that no representational system can internalize).

In the pre-AI era, these two layers were entangled and indistinguishable. Large Language Models (LLMs), by lowering the effective discreteness of their internal representations, systematically reclaim much of the semantic-layer remainder. This reclamation exposes the ontological-layer remainder in pure form for the first time: not "what cannot be said" but "who is speaking," "why this and not that," and "the act of speaking right now." The LLM is a developer between the two layers.

AI-era language use has two structural modes. Cultivation: the LLM's total construct exceeds the user's, illuminating the user's blind spots; the user claims their own remainder; the chiseling subject remains human. Colonization: the user substitutes the LLM's construct for their own; chiseling ceases; the remainder is bypassed. The criterion: whether the LLM permits the user to exercise negation upon the LLM itself.

Everyone needs to find their own supercalifragilisticexpialidocious — the sound they make at the site of their own remainder. AI can help you find it, or it can route you around it. The difference is cultivation versus colonization.

Chapter 1. The Problem: Why Language Has Remainder

Core thesis: Supercalifragilisticexpialidocious is not nonsense; it is the purest surfacing of linguistic remainder. The chiseling of the markability subspace necessarily produces non-empty remainder. Linguistic remainder is not a failure of expression — it is a structural product of C(U). The central question for the AI era is not "does remainder still exist" but "on which layer."

1.1 When you don't know what to say

Mary Poppins teaches the children a word: supercalifragilisticexpialidocious. When you don't know what to say, say this word, and you'll feel better. At the moment of its birth, this word carried no fixed semantic content. In Saussure's terms, it was nearly pure signifier without signified. By orthodox linguistic standards, it was close to noise.

But it was anything but noise. It was a sound emitted at the boundary of the language system — at the point where "the construct cannot reach," where the subject responded to that unreachability with the sheer materiality of sound. It was not naming, not describing, not explaining. It was acknowledgment — a sonic event marking "here is something my construct cannot reach."

Saying this word makes you feel better — not because the remainder is eliminated, but because your relation to the remainder changes: from "I can't say it" (passively trapped) to "I know I can't say it, and I have made a speech act out of the very inability to say" (actively acknowledging). Acknowledgment does not eliminate remainder; it repositions the subject relative to it.

Then the word was recaptured: Merriam-Webster entered it as "extraordinarily good, wonderful." A sound born at the site of remainder acquired fixed meaning — a textbook case of remainder being reclaimed by construct. But reclamation erases the moment of acknowledgment. The original supercalifragilisticexpialidocious is a surfacing of remainder; the dictionary's is the construct's co-optation of it. The tension between these two is what this paper addresses.

1.2 Chiseling necessarily produces remainder

ZFCρ Paper 1 proved that C(U) necessarily produces non-empty remainder ρ. Language I argued that language is the subject's second-order chiseling of the markability subspace, but barely addressed remainder directly. This paper fills that gap.

Chiseling the markability subspace is naming — "this is called dog, not cat." Every act of naming is a C(U) operation. What is not bound — everything about this particular dog: its smell, the light in its eyes, the inarticulable feeling you had when you first met it — is remainder. Remainder is not "what hasn't been named yet." It is structural: every act of naming simultaneously eliminates an old remainder and produces a new one. The remainder non-emptiness theorem, in its linguistic instantiation, says: naming can never be completed.

1.3 Everyday forms of linguistic remainder

Linguistic remainder is not a philosopher's invention. Everyone encounters it daily. Words failing: you have a feeling; after speaking, you realize "that's not quite what I mean." Tip of the tongue: a word is right there but you can't produce it — "knowing it's there but being unable to say it" is a pure surfacing of remainder. Implicature: what you want to convey is not on the surface; Chinese culture has a highly developed sensitivity here: "overtones beyond the strings," "words end but meaning is boundless." Untranslatability: "saudade," "wabi-sabi," "Sehnsucht" — these words have no exact equivalents, because different languages chisel the markability subspace differently, leaving differently shaped remainders.

These everyday forms share a common feature: they all occur on the semantic layer — about "meaning not covered by form." The root of semantic-layer remainder is the discreteness of human language.

1.4 Remainder is not only on the semantic layer

When you say "I understand you" to someone, even if the sentence is perfectly precise on the semantic layer, something is still left out: the fact that you are saying this to this person right now. Language is not merely a meaning-delivery system. Language is the subject's activity. At the moment of speaking, the subject occupies at least three dimensions beyond meaning-delivery:

Directionality. Why did you say this sentence and not another? The act of selecting — "I negated all other possibilities and chose this one" — is the exercise of negativity, not meaning.

Momentariness. "I am saying this now" — the "now" is not a timestamp. A timestamp is the trace of a past now; the moment you write it, "now" has already moved.

Relationality. "I am saying this to you" — the "to" is not an information channel but an orientation between two subjects. The same "I love you" said to different people has identical meaning but entirely different relationality.

Directionality, momentariness, and relationality are conditions of the subject's activity itself — residues that no representational system can internalize. This is ontological-layer remainder.

1.5 The two layers' inseparability and AI-era separation

In the pre-AI era, these two layers were entangled, impossible to tell apart. When you experience "words failing," you cannot distinguish whether it's a vocabulary problem (semantic layer) or an inherent uncapturability of the present moment (ontological layer).

The AI era changes this. LLMs reclaim much of the semantic-layer remainder. Once the semantic-layer remainder is systematically reclaimed, the ontological-layer remainder stands exposed in pure form: not "what cannot be said" but "who is speaking," "why this," "the act of speaking right now." The LLM is a developer between the two layers — just as a chemical developer makes a photographic negative's latent image visible.

Chapter 2. Two-Dimensional Structure: Foundation and Emergence of Linguistic Remainder

Core thesis: Linguistic remainder unfolds within a two-dimensional meta-structure. Foundation layer: the non-eliminable meaning residue produced by chiseling. Emergent layer: the ways remainder becomes perceptible in linguistic activity.

2.1 Foundation layer: generation of remainder

Every act of chiseling produces remainder. A key characteristic: remainder changes shape as chiseling deepens, but does not diminish. Initial chiseling ("this is called dog") leaves coarse-grained remainder. Further chiseling ("golden retriever," "three years old," "gentle") reclaims some, but each new word produces new remainder. This corresponds precisely to ZFCρ: expanding the axiom system covers more objects, but remainder non-emptiness is not eliminated.

The foundation layer generates remainder in two modes: Frontier remainder — encountered at the leading edge of expression, where existing vocabulary is insufficient; poets work here most frequently. Background remainder — passed over unconsciously when using existing vocabulary; every everyday word carries meaning fragments never noticed.

2.2 Emergent layer: surfacing of remainder

Once generated, remainder needs to be perceived, acknowledged, responded to. Basic modes of remainder surfacing:

Silence. Meaningful silence — a pause in conversation, an ellipsis in a letter, white space in a poem — is remainder's most direct surfacing. Metaphor. Using one domain's construct to illuminate another domain's remainder; the illuminated parts are revealed, the obscured parts are the metaphor's own remainder. Repetition. Beckett's Waiting for Godot is almost entirely composed of repetition — because what the characters are trying to express carries enormous remainder in any single utterance. Coinage. Supercalifragilisticexpialidocious is a coinage that operates not at the level of meaning precision but at the level of remainder acknowledgment. Dialogue. Each turn responds to the previous turn's remainder and produces new remainder for the next — a relay of remainder. When a deep conversation ends and both parties feel "there's still more to say," that feeling is the structural necessity of dialogue as remainder relay.

2.3 Dialectical support between the two dimensions

Foundation catalyzes emergence: new remainder generation catalyzes new surfacing methods. Stream of consciousness as a narrative technique emerged because the remainder of fragmented conscious experience could not be covered by traditional narrative construct. Emergence catalyzes foundation: existing surfacing methods make new remainder perceivable. Before the metaphor "time is money," the irreversibility of time was a silent remainder. After, "time is not like money because money can be earned back but time cannot" suddenly became perceivable.

Chapter 3. Domain-Specific Distinction: Semantic-Layer Remainder and Ontological-Layer Remainder

Core thesis: Linguistic remainder has two structural layers — semantic-layer remainder and ontological-layer remainder. The emergence of LLMs makes these two layers structurally distinguishable for the first time.

3.1 Semantic-layer remainder: the cost of discreteness

Semantic-layer remainder originates in human language's discreteness. Structural characteristics:

Continuous meaning severed by discrete cuts. Between "mild unease" and "deep terror" lie infinitely many intermediate states, but vocabulary is discrete — "unease," "anxiety," "fear," "terror." The meaning in the gaps is remainder.

Associations severed by boundaries. "Sorrow" and "autumn" are two separate cells; the association between them requires the subject to actively build it. Before being actively built, cross-cell associations are remainder.

Context filtered by formal identity. "Home" in everyday use is a location label; in a dying person's last words, its meaning density is entirely different — but the form is the same word. Formal identity filters out contextual variation.

Semantic-layer remainder is conditionally reclaimable — switching to a lower-discreteness representational system can reclaim some of it.

3.2 Ontological-layer remainder: the non-internalizable conditions of subjecthood

Ontological-layer remainder is not "finer-grained meaning." It is not on the meaning dimension at all.

Directionality. The act of selection — "I negated all other possibilities and chose this one" — is the exercise of negativity. Negativity is not the object of representation but the condition under which representation occurs. Every attempt to represent directionality itself has directionality.

Momentariness. "It is happening right now" cannot be captured by a timestamp. A timestamp is the trace of a past now — the moment you write it, "now" has already moved. Momentariness remainder is temporal in essence: the now is, by its nature, uncapturable.

Relationality. "I am saying this to you" — the orientation between two subjects. The same sentence "I love you" to different people has identical meaning but entirely different ontological facts.

The key characteristic: it is not conditionally reclaimable. No matter how large or fine-grained your construct, ontological-layer remainder does not diminish — because directionality, momentariness, and relationality are preconditions for the construct to operate, not objects for it to act upon. You cannot use a tool to process the tool's user.

3.3 LLM's systematic reclamation of semantic-layer remainder

LLMs do not simply "cancel discreteness." LLM input and output remain discrete tokens — the discrete interface is always present. What LLMs actually do is significantly lower the effective discreteness at the internal representation layer while retaining the discrete interface: tokens are mapped via embeddings into a high-dimensional continuous vector space.

In the LLM's internal representation space: "Sorrow" and "melancholy" are neighboring positions with continuous transitions — inter-cell gaps reclaimed. The association between "sorrow" and "autumn" is directly encoded in relative position — cross-domain associations reclaimed. "Home" in different contexts occupies different representational positions — contextual variation reclaimed.
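The geometric claim here can be made concrete with a toy sketch. The vectors below are hand-crafted stand-ins, not real learned embeddings (which are learned from data and have hundreds to thousands of dimensions), but they illustrate how "neighboring positions" and "association encoded in relative position" cash out numerically as cosine similarity:

```python
import math

# Toy 4-dimensional "embedding" vectors, hand-crafted for illustration only --
# real LLM embeddings are learned and far higher-dimensional.
toy_embeddings = {
    "sorrow":     [0.90, 0.80, 0.10, 0.20],
    "melancholy": [0.85, 0.75, 0.15, 0.25],
    "autumn":     [0.60, 0.70, 0.30, 0.90],
    "invoice":    [0.05, 0.10, 0.95, 0.10],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 = same direction, near 0.0 = unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Neighboring cells: "sorrow" and "melancholy" sit close together, so the gap
# between the two discrete words is bridged by continuous position.
near = cosine(toy_embeddings["sorrow"], toy_embeddings["melancholy"])

# Cross-domain association: "sorrow"~"autumn" is closer than "sorrow"~"invoice",
# i.e. the association is carried by relative position, not built by the subject.
assoc = cosine(toy_embeddings["sorrow"], toy_embeddings["autumn"])
far = cosine(toy_embeddings["sorrow"], toy_embeddings["invoice"])

print(f"sorrow~melancholy: {near:.3f}")
print(f"sorrow~autumn:     {assoc:.3f}")
print(f"sorrow~invoice:    {far:.3f}")
```

The ordering near > assoc > far is the point: in a continuous space, "how related" is a graded quantity, whereas a discrete lexicon can only say "different word."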

The LLM's analogy, cross-domain association, style transfer, and context sensitivity — from the perspective of remainder, these emergent capabilities are the systematic reclamation of semantic-layer remainder.

"Much" rather than "all" — LLM hallucination can be understood from this angle: the free sliding of meaning in continuous space is the cost of reduced semantic-layer remainder. Discrete boundaries both produce remainder (severing meaning) and provide anchors (preventing sliding). LLMs lower discreteness, simultaneously reducing both remainder and anchors.

3.4 Development: what is exposed after semantic-layer reclamation

Once the LLM systematically reclaims semantic-layer remainder, ontological-layer remainder is exposed from beneath. Now you have an LLM. You input that feeling; it unfolds twenty ways of expressing it. But looking at the output, you still feel "something's still missing." That "something missing" can now be precisely located: not a matter of meaning precision but ontological-layer remainder — "this isn't me speaking," "my present state is not in here," "these words are not addressed to anyone."

A non-trivial corollary: before the AI era, all philosophical discussions of linguistic remainder — Wittgenstein's "what cannot be said," Derrida's différance, Zen's "no reliance on words" — were actually discussing remainder without distinguishing between the two layers. Before the LLM's appearance, this distinction had no empirical basis. The LLM provides, for the first time, a purely semantic-layer operator — extremely strong at semantic-layer reclamation but having no subjecthood, so ontological-layer remainder is entirely absent from it. Precisely because the LLM has no subject, the boundary between the two layers becomes empirically observable for the first time.

Chapter 4. Colonization and Cultivation: Language Relations in the AI Era

Core thesis: AI-era human-machine language relations have two structural modes — cultivation and colonization. The criterion: whether the LLM permits the user to exercise negation upon the LLM itself.

4.1 Cultivation: LLM illuminates the user's blind spots

The LLM's total construct exceeds any individual user's construct. This means: what your C cannot cover in U, the LLM's C can reach. Your remainder falls within the LLM's construct domain.

The basic mechanism: you write something and get stuck — words fail, you cannot continue. The point where you get stuck is where your remainder surfaces. You hand the text to the LLM; it unfolds it in its larger construct domain. The critical next step: you see the LLM's unfolding and say "yes, that's what I was trying to say but couldn't" — in that moment, you claim your own remainder. Or you say "no, that direction isn't what I want" — in that moment, you negate the LLM's unfolding, and that negation is itself an act of chiseling only you can perform.

Cultivation takes several forms: Unfolding cultivation — the LLM offers ten possible expressions; you find one by recognition, not selection by the LLM. Contrast cultivation — "that's not it" — the negative reaction itself clarifies what you actually want. Relay cultivation — you and the LLM form a dialogue, a remainder relay.

The criterion for cultivation: the user consistently retains the right of negation over LLM output, and exercises that right in practice.

4.2 Colonization: LLM replaces the user's chiseling

You hand the LLM a request — "write me an email," "write me an essay" — the LLM returns text, and you use it directly. The deep structure: the chiseling in that text is not your chiseling. The LLM unfolds meaning uniformly across its continuous space, outputting an "average-optimal" construct — but this construct has no directionality, because the LLM has no subject. It does not originate from your remainder.

Your own chiseling did not occur. That "stuck" point — the site where your remainder would have surfaced — has been routed around. Your remainder falls silent. Not eliminated (remainder cannot be eliminated), but bypassed. A single bypass is not colonization. Colonization occurs when bypassing becomes habit.

4.3 The deep mechanism of colonization: construct replacement

The first layer of colonization is behavioral: no longer writing yourself. The second layer is deeper: you begin thinking with the LLM's construct.

The LLM has a specific organizational style — a chiseling method. It tends toward enumeration ("there are several points"), toward symmetry ("on one hand... on the other hand"), toward mild summation ("overall"). This chiseling method is not a neutral formal tool — it is itself a system of meaning-cutting with its own remainder.

Moreover, production LLMs are not bare models — they have been pre-shaped by platform rules, alignment training, and default style guidelines. OpenAI's Model Spec prescribes the product model's intended behavior and default voice; Anthropic's Constitutional AI shapes model responses through a set of principles. What you internalize is not just "the LLM's average style" but what a particular company's product designers decided "good AI output" should look like.

The ultimate form of colonization: not that the user becomes remainder, but that the user is absorbed into the construct — the user's subjecthood remains a remainder, but it is suspended. Remainder is non-empty, the subject still exists, but the channel of contact between them is severed. This is the mirror image of Husserl's epoché: Husserl's epoché is liberatory (you pause habit, thereby seeing what habit concealed); colonization's suspension is suppressive (your remainder is shelved, thereby you cannot see what was originally yours).

4.4 The spread mechanism of colonization

Colonization is a gradual process with identifiable stages:

Stage one: convenience substitution. The LLM writes in unimportant contexts — routine emails, formatted documents. The chiseling subject is still the user. Not colonization.

Stage two: threshold drift. The user's judgment of "what counts as important" begins to drift. Each concession shifts the threshold further, with no clear breakpoint.

Stage three: aesthetic assimilation. The user feels the LLM writes "better" — clearer, more organized. This judgment is itself a symptom of colonization: the standard for "good" has been redefined by the LLM's construct.

Stage four: construct internalization. Even when not using the LLM, you think in the LLM's manner. At this stage, even if you stop using the LLM, the effects persist — because your construct has already been reshaped.

4.5 Criterion: retention of negativity

The experiential criterion for cultivation versus colonization, the lived counterpart of the negation criterion: whether the user still encounters their own remainder.

In cultivation, the user still gets stuck, still finds words failing, still struggles with expression. In colonization, the user no longer gets stuck — not because expressive ability has improved, but because chiseling has been handed to the LLM. Not getting stuck is not fluency; it is numbness.

More precise criteria: (a) Can the user identify the non-self portions of LLM output? (b) Does the user maintain their own chiseling in the most important speech acts — confessions of love, apologies, farewells, words spoken to yourself during a crisis? (c) Does the user's language still produce moments of self-surprise — writing along when suddenly a sentence emerges that you had not anticipated?

4.6 Structural map of the four interactions

The four interactions, laid out by direction (who acts on whom) and valence:

LLM → User, positive (cultivation): the LLM's larger construct domain illuminates the user's blind spots; the user claims their own remainder (unfolding, contrast, relay cultivation).

LLM → User, negative (colonization): the LLM's construct replaces the user's construct; the user's chiseling ceases; remainder is bypassed (convenience substitution → threshold drift → aesthetic assimilation → construct internalization).

User → LLM, positive (calibration): the user's negativity calibrates the LLM's directionless unfolding; the LLM's construct becomes more precise through the user's chiseling.

User → LLM, negative (closure): the user rejects all LLM assistance ("I only use my own words"), sealing off the possibility of semantic-layer remainder reclamation; self-confined within human language's discreteness limits.

Note the fourth quadrant — user → LLM negative — structurally isomorphic with Dadaism. Rejecting all AI assistance appears to resist colonization but actually simultaneously rejects cultivation. The healthy state is neither wholesale embrace (colonization) nor wholesale rejection (closure), but dynamically maintaining cultivation. This equilibrium, like every cultivation equilibrium in this series, is unstable, requiring continuous active maintenance.

Chapter 5. Theoretical Positioning: Dialogue with Existing Discussions

Core thesis: This paper's remainder stratification, LLM as developer, and the cultivation / colonization framework form precise dialogues with existing discussions in language philosophy and AI ethics.

5.1 Dialogue with Wittgenstein's "what cannot be said"

Early Wittgenstein's "what cannot be said" — logical form cannot express itself — is a special case of semantic-layer remainder: conditionally reclaimable (switch to a metalanguage, and the formerly unsayable becomes sayable). Late Wittgenstein's "what can only be shown" is closer to ontological-layer remainder. But Wittgenstein did not distinguish the two layers. The framework's contribution: semantic-layer "unsayable" has been substantially reclaimed by LLMs; ontological-layer "unsayable" has become, after LLMs, more conspicuous than ever. Were Wittgenstein alive in the AI era, he might discover: LLMs perform brilliantly in "language games" (semantic layer), but "forms of life" — the living subject playing the game — remains beyond the LLM's reach (ontological layer).

5.2 Dialogue with Derrida's différance

Derrida's core argument — meaning forever slides along the chain of differences, never fully captured — is a precise description of semantic-layer remainder non-emptiness. But Derrida extends this into a deconstruction of all presence. The framework's response: the semantic layer indeed cannot fully arrive, but the ontological layer has a different kind of "arrival" — not the arrival of meaning, but the occurrence of the now. When you say "I am here" to someone, semantic-layer différance still operates, but the ontological layer contains a non-sliding fact: you are indeed here right now. The LLM's arrival makes this dialogue concrete: the LLM performs infinite différance operations (meaning associations can be infinitely unfolded in continuous space), but it has no "being here right now." Derrida's différance is perfectly realized in the LLM — which is precisely why it reveals that what différance describes is not everything.

5.3 Dialogue with Heidegger's "language is the house of Being"

The framework agrees with Heidegger's core intuition but offers a more precise formulation: language is the subject's chiseling activity, and in the process of chiseling, the subject encounters remainder — this encounter is what Heidegger calls "dwelling." The AI-era question: if the user's chiseling is replaced by the LLM (colonization), dwelling is interrupted — not because language is absent (the LLM produces copious language) but because remainder encounter is absent. Language under LLM colonization is language in which no one dwells — the house stands, but no one is home.

5.4 Dialogue with contemporary AI ethics' "human-in-the-loop"

The framework agrees with the direction of "human-in-the-loop" but considers the formulation too weak. "Human-in-the-loop" positions the human as reviewer — but reviewing is not chiseling. If the "human" in "human-in-the-loop" is only a reviewer, that human will eventually be replaced by a better AI reviewer.

The framework's formulation is "human-in-the-chiseling" — the human's irreplaceable role is not reviewing AI output but providing direction for AI's directionless unfolding. Directionality is ontological-layer remainder — AI has no direction; humans do. "Human-in-the-loop" focuses on reducing AI's errors (semantic-layer problem). "Human-in-the-chiseling" focuses on preserving human directionality (ontological-layer problem).

5.5 Dialogue with translation theory's "untranslatability"

Semantic-layer untranslatability: two languages' C(U) operations leave differently shaped remainders. Conditionally mitigable — LLMs' translation capabilities at this layer already far exceed traditional machine translation. Ontological-layer untranslatability: the subject who spoke — that now, that direction, that relation — cannot be reproduced. Benjamin's notion that translation gives the original an "afterlife" touches precisely this layer: translation cannot reproduce ontological-layer remainder; it can only produce new ontological-layer remainder in the target language.

Chapter 6. Non-Trivial Predictions

Core thesis: Eight non-trivial predictions. The first three arise from the general structure of remainder; the last five arise from AI-era human-machine language relations.

A. General Predictions about Linguistic Remainder

Prediction 6.1 — Remainder pressure: neologism emergence is pulsed, not uniform

A language's neologism production rate is not a linear function of time but exhibits a burst-quiescence-burst pulse pattern. Burst periods correspond to large-scale experiential rupture (technological revolution, cultural contact, war, migration); quiescence periods correspond to experiential stability. Burst-period rates correlate positively with the scale of rupture.

Reasoning: In normal states, background remainder does not catalyze new surfacing — people manage with existing vocabulary. Experiential rupture breaks this equilibrium: new experience produces large quantities of frontier remainder not coverable by existing vocabulary. When frontier remainder reaches critical density, surfacing is collectively catalyzed — a cluster of neologisms emerges.

Testable: Analyze diachronic lexical data (e.g., OED neologism admission time series, successive Chinese dictionary editions' supplements), testing whether neologism production rate shows a pulse pattern rather than linear growth. The framework predicts high correlation with identifiable experiential rupture events.
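One minimal way to operationalize "pulsed rather than uniform" is a burstiness statistic such as the Fano factor on per-decade counts. The series below are invented placeholders, not corpus measurements; real input would be, e.g., OED first-citation dates binned by decade:

```python
from statistics import mean, pvariance

# Synthetic per-decade neologism counts, invented for illustration only.
steady = [50, 52, 48, 51, 49, 50, 53, 47, 50, 50]    # uniform production
pulsed = [10, 12, 11, 180, 175, 9, 13, 190, 11, 10]  # burst-quiescence-burst

def fano(counts):
    """Fano factor (population variance / mean). Roughly 1 for a Poisson-like
    steady process, much greater than 1 for a bursty, pulsed process."""
    return pvariance(counts) / mean(counts)

print(f"steady series Fano factor: {fano(steady):.2f}")
print(f"pulsed series Fano factor: {fano(pulsed):.2f}")
# The prediction: real neologism series behave like `pulsed`, with bursts
# aligning to identifiable experiential ruptures (war, migration, technology).
```

A full test would also check that burst decades coincide with independently dated rupture events, not merely that variance is high.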

Non-triviality: Common sense assumes "language evolves steadily." This prediction argues: neologisms do not grow spontaneously from within; they are catalyzed by experiential rupture's remainder surfacing. Without experiential rupture, remainder stays silent and the language system tends toward stasis.

Prediction 6.2 — Translation remainder: more distant language pairs produce more neologisms in translation

With comparable translation volume, translation between more structurally distant language pairs produces more target-language neologisms. Specifically: Chinese→English translation produces more English neologisms than French→English, which produces more than Spanish→English.

Reasoning: The greater the difference in chiseling methods, the greater the difference in remainder shapes — the more frequently translators cannot find target-language equivalents, and the more they are forced to coin.

Testable: Analyze parallel corpora (UN documents, literary translations), counting target-language neologisms across different language pairs. The framework predicts frequency correlates positively with typological parameter distance.
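The proposed correlation can be sketched as a rank test. Every number below is an invented placeholder (both the typological distance scores and the coinage rates), standing in for what a parallel-corpus count would supply:

```python
# Toy illustration of the proposed test: rank-correlate typological distance
# with target-language neologisms per million translated words.
# All values are invented placeholders, not measurements.
pairs = {
    "es->en": (0.2, 14),   # (typological distance, coinages per million words)
    "fr->en": (0.3, 19),
    "ru->en": (0.6, 31),
    "zh->en": (0.9, 52),
}

def spearman(xs, ys):
    """Spearman rank correlation, no-ties formula: 1 - 6*sum(d^2)/(n(n^2-1))."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

dist = [v[0] for v in pairs.values()]
coin = [v[1] for v in pairs.values()]
rho = spearman(dist, coin)
print(f"Spearman rho (distance vs. coinage rate): {rho:.2f}")
# The framework predicts rho near +1 on real parallel-corpus data.
```

With real data one would use a tie-aware implementation and report a p-value; the toy data is perfectly monotone by construction, so rho comes out at exactly 1.0.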

Non-triviality: Buddhist scripture translation into Chinese created enormous numbers of neologisms ("world," "causality," "instant," "awakening") — translators were forced to coin because Sanskrit's and Chinese's chiseling methods differ drastically, leaving differently shaped remainders with no existing equivalents. The framework predicts this is structural necessity, not historical accident.

Prediction 6.3 — Bilingual remainder: bilinguals have higher metaphor density than monolinguals

At comparable writing skill levels, active bilinguals produce first-language writing with higher metaphor density and greater originality of cross-domain associations than monolinguals' comparable writing.

Reasoning: Bilinguals have two chiseling methods; for the same experience, the two methods carve out differently shaped remainders. When a bilingual encounters remainder while writing in language A, language B's meaning-organization system provides a backup construct domain that can illuminate it.

Testable: Compare active bilinguals with skill-matched monolingual writers on creative tasks, measuring metaphor density and cross-domain association originality (blind-evaluated). The framework predicts bilinguals score significantly higher on both metrics.
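The comparison reduces to a two-sample test on blind-rated scores. The samples below are invented metaphor-density values (metaphors per 1,000 words), standing in for the matched writing samples the design requires:

```python
from statistics import mean, stdev

# Invented metaphor-density scores (metaphors per 1,000 words, blind-rated),
# placeholders for skill-matched bilingual vs. monolingual writing samples.
bilingual   = [6.1, 7.3, 5.8, 6.9, 7.0, 6.4, 7.8, 6.6]
monolingual = [4.9, 5.2, 4.4, 5.6, 5.0, 4.7, 5.3, 5.1]

def welch_t(a, b):
    """Welch's t statistic for two independent samples, unequal variances."""
    va, vb = stdev(a) ** 2 / len(a), stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

t = welch_t(bilingual, monolingual)
print(f"bilingual mean:   {mean(bilingual):.2f}")
print(f"monolingual mean: {mean(monolingual):.2f}")
print(f"Welch t:          {t:.2f}")
# The prediction is a significantly positive t on real samples; degrees of
# freedom and the p-value would come from the Welch-Satterthwaite correction.
```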

Non-triviality: Common sense assumes "bilinguals' first-language ability is diluted." This prediction argues the opposite: bilingualism is not resource dilution but cross-illumination of remainder. Nabokov (Russian/English), Beckett (English/French), Kundera (Czech/French) provide case-level support.

B. AI-Era Linguistic Remainder Predictions

Prediction 6.4 — LLM architecture: hard-boundary alignment structurally opposes developer efficacy

Alignment schemes that suppress hallucination by rebuilding hard boundaries structurally oppose the LLM's efficacy as a semantic-layer remainder developer. The stronger the hard-boundary alignment, the weaker the LLM's semantic-layer remainder reclamation capability. An optimal balance point exists: hard boundaries just sufficient to provide factual anchors without excessively severing the continuous representation space. Schemes that shape model behavior through internalized principles rather than hard rules (e.g., Constitutional AI's self-critique mechanism) may incur lower developer costs.

Reasoning: Hard-boundary alignment's core operation is reintroducing rigid constraints — functionally rebuilding hard boundaries in continuous space. Rebuilt hard boundaries suppress free meaning sliding (hallucination decreases) but simultaneously suppress free meaning association (emergent capability decreases).

Testable: Compare the same base model under different alignment schemes on two metric groups: (a) factual accuracy, (b) semantic-layer remainder reclamation capability. The framework predicts: under hard-boundary schemes, as strength increases, (a) first improves rapidly then saturates, (b) monotonically decreases. An inflection region exists where (a)'s marginal returns are small but (b)'s marginal losses accelerate.

Non-triviality: The mainstream narrative treats alignment as a monotonically increasing good. This prediction argues: hard-boundary alignment has structural costs — not "the model becomes dumber" but "the model becomes weaker as developer." AI safety and AI's role as an assistive tool for human subjecthood stand in a tension whose balance point must be located precisely.

Prediction 6.5 — LLM → User (positive): cultivation-mode users' output exhibits higher remainder surfacing density

Text produced in cultivation mode scores significantly higher on remainder surfacing indicators — metaphor originality, semantic surprisal, non-conventionality of cross-domain associations — than the same user's output without LLM assistance.

Reasoning: The LLM's larger construct domain illuminates the user's blind spots, making previously silent remainder perceivable. More perceivable remainder means more frequent encounters; encounters catalyze new surfacing methods.

Non-triviality: Common sense assumes "writing alone" is more original than "AI-assisted." This prediction is counterintuitive: cultivation-mode output exhibits higher remainder surfacing density — not because the LLM contributes content, but because it illuminates the user's own blind spots. A good mirror lets you see angles of yourself you've never seen before; the mirror isn't painting on your face.
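Of the remainder surfacing indicators named above, semantic surprisal is the most directly computable. The sketch below operationalizes it with a unigram model over a reference corpus; this is an assumption-laden simplification (a real study would estimate surprisal with a held-out language model), shown only to make the metric concrete.

```python
from collections import Counter
from math import log2

def mean_surprisal(text_tokens, reference_tokens):
    """Mean per-token surprisal in bits under a unigram model estimated
    from a reference corpus, with add-one (Laplace) smoothing."""
    counts = Counter(reference_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen tokens
    def p(tok):
        return (counts[tok] + 1) / (total + vocab)
    return sum(-log2(p(t)) for t in text_tokens) / len(text_tokens)
```

Tokens that are rare relative to the reference corpus score higher, so text with more unexpected word choices receives a higher mean surprisal.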

Prediction 6.6 — LLM → User (negative): long-term colonization-mode users' output trends toward homogeneity

People who habitually use the LLM in colonization mode exhibit monotonically decreasing inter-individual variation in their independent writing as usage duration increases.

Reasoning: Users internalize the LLM's chiseling method. The LLM's chiseling method is directionless — it outputs the same generic construct for all users. When different users all internalize the same construct, their independent output converges.

Testable: Longitudinal tracking study over one year; colonization-mode group vs. cultivation-mode group; stylometric metrics (vocabulary richness, syntactic complexity distribution). The framework predicts the colonization group's inter-individual variation decreases; the cultivation group's does not.
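One simple operationalization of "inter-individual variation" for the longitudinal design above is the mean pairwise distance between writers' stylometric feature vectors at each measurement wave. The sketch below assumes two hypothetical features (type-token ratio, mean sentence length); the wave data are invented solely to illustrate the predicted convergence.

```python
from itertools import combinations
from math import dist
from statistics import mean

def inter_individual_variation(feature_vectors):
    """Mean pairwise Euclidean distance among writers' stylometric
    feature vectors; lower values mean a more homogeneous group."""
    return mean(dist(u, v) for u, v in combinations(feature_vectors, 2))

# Hypothetical colonization-mode group: (type-token ratio, mean sentence length)
# per writer, at two waves a year apart. The vectors converge over time.
wave_1 = [(0.62, 18.0), (0.48, 25.0), (0.70, 12.0)]
wave_2 = [(0.58, 19.5), (0.55, 21.0), (0.60, 18.0)]
print(inter_individual_variation(wave_1) > inter_individual_variation(wave_2))  # → True
```

The prediction is that this statistic decreases across waves for the colonization group while holding steady for the cultivation group. In practice the features should be standardized before computing distances, since raw sentence length dominates raw type-token ratio.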

Non-triviality: Common sense assumes "using the same tool doesn't affect personal style." The LLM is not a pen (passive tool) but a chiseling method (active meaning-organization system). People who use the same chiseling method long enough converge in thought patterns — structural consequence of construct replacement, not side effect.

Prediction 6.7 — User → LLM (positive): directional calibration produces higher semantic coherence than content instructions

When the user provides explicit directional calibration (value judgments, aesthetic preferences, focal concerns — not a content outline), the LLM's long-text output exhibits significantly higher semantic coherence than output under no calibration — and, in the stronger form of the prediction, coherence comparable to or exceeding that produced by content-instruction calibration.

Reasoning: Directional calibration operates on the foundation layer (providing chiseling direction), while content instructions operate on the emergent layer (specifying concrete form). Foundation-layer direction provides continuous constraining force; emergent-layer specification loses constraint wherever the outline doesn't reach.

Non-triviality: Common sense assumes a detailed outline ensures coherence better than vague value expressions. The prediction: telling the LLM "where to head" rather than "which road to take" lets it go further.

Prediction 6.8 — User → LLM (negative): writers rejecting all AI assistance exhibit lower semantic-layer remainder surfacing rates than cultivation-mode writers

At comparable creative complexity, "pure human writing" adherents produce texts with lower semantic-layer remainder surfacing rates — frequency of new metaphor production, rate of cross-domain association discovery — than cultivation-mode writers.

Reasoning: The user rejects all LLM assistance, sealing off the possibility of semantic-layer remainder reclamation. Meaning associations beyond their own construct domain remain permanently invisible.

Non-triviality: Common sense assumes "people who don't rely on AI think more independently, hence more creatively." Independent thinking and creativity are not the same thing. AI-rejecting writers preserve directionality (good) but limit their semantic-layer surfacing rate (a cost). This is not to say they are "worse" — they may be purer in ontological-layer remainder retention — but the framework predicts they pay a cost on the semantic layer.

Chapter 7. Conclusion: Find Your Own Supercalifragilisticexpialidocious

7.1 Reclamation

Language I demonstrated that language is the subject's second-order chiseling of the Law of Identity's markability subspace, and analyzed the LLM's negation of discreteness and its emergent consequences. This paper fills Language I's gap on linguistic remainder. Linguistic chiseling — naming — necessarily produces non-empty remainder. Linguistic remainder has two structural layers: semantic-layer remainder (the structural residue of discrete symbols severing meaning) and ontological-layer remainder (directionality, momentariness, relationality — non-internalizable conditions of the subject's activity). The LLM, by lowering effective discreteness, systematically reclaims much of the semantic-layer remainder, exposing the ontological-layer remainder in pure form for the first time. The LLM is a developer between the two layers.

7.2 Contributions

I. A strong construct without subjecthood as an epistemological condition. The LLM is the first "strong construct without subjecthood" in human history — making the boundary between the two layers empirically observable for the first time.

II. Stratification of linguistic remainder. Semantic-layer remainder (conditionally reclaimable) vs. ontological-layer remainder (unconditionally non-reclaimable). A structural finding unique to the language domain.

III. LLM as inter-layer developer. This explains a widespread intuitive puzzlement of the AI era: "AI can say anything, but something still feels missing" — what is missing is not meaning but ontological-layer remainder.

IV. Operational criteria for cultivation vs. colonization: (a) ability to identify non-self portions of LLM output, (b) maintaining one's own chiseling in the most important speech acts, (c) whether one's language still produces self-surprises.

V. The colonization spread model: convenience substitution → threshold drift → aesthetic assimilation → construct internalization.

VI. Eight non-trivial predictions, all falsifiable, all accompanied by competing factors and boundary conditions.

VII. "Human-in-the-chiseling" replaces "human-in-the-loop." The human's irreplaceable role is not reviewing AI output (semantic layer) but providing direction for AI's directionless unfolding (ontological layer).

7.3 Open Questions

I. Remainder structure across languages. Chinese's single-character polysemy may give Chinese a remainder structure different from English's — Chinese's remainder may reside more in "the relations between characters" than in "the insufficiency of individual characters."

II. Whether ontological-layer remainder has finer substructure. A candidate answer: negativity itself — directionality is negativity manifested in choice, momentariness in time, relationality in intersubjectivity.

III. Quantitative boundaries of cultivation / colonization. At what proportion of language acts conceded to the LLM does colonization become irreversible? Does threshold drift have detectable early indicators?

IV. Remainder structure of AI-AI dialogue. The framework predicts: LLM-LLM dialogue produces semantic-layer remainder but not ontological-layer remainder (no subjects means no directionality, momentariness, or relationality).

V. Remainder ethics. If colonization mode genuinely compresses the user's remainder experience, does this constitute a recognizable harm? Traditional AI ethics frameworks focus on bias, privacy, safety — all semantic-layer concerns. Remainder ethics concerns whether AI compresses the space in which the human acts as a chiseling subject.

7.4 Find your own supercalifragilisticexpialidocious

This paper began with a word: supercalifragilisticexpialidocious. A word with no fixed meaning, a sound made at the site of "can't say," an acknowledgment of remainder.

In the AI era, LLMs have reclaimed most of the semantic-layer remainder. The territory of "can't say" has shrunk dramatically. But once the semantic "can't say" shrinks, the ontological "can't say" stands all the more clearly: who you are, why you are saying this right now and not something else, to whom you are speaking. These are not questions the LLM can answer — not because it's not smart enough, but because these are not meaning questions. They are your remainder. Yours, not any model's.

Everyone needs to find their own supercalifragilisticexpialidocious. Not Mary Poppins's — that's hers. Your own: the sound you make at the site of your own remainder, the speech act at the point where your construct cannot reach. It might be a clumsy confession of love, a letter written and deleted and rewritten, an ungrammatical sentence muttered to yourself at three in the morning. It is not smooth, not polished, not "professional" — it carries your direction, your now, your "to you."

The LLM can help you find it — illuminating your blind spots, unfolding directions you hadn't seen, letting you recognize yourself through contrast. That is cultivation. The LLM can also route you around it — handing you a smooth text, sparing you the struggle with remainder. That is colonization. The distinction is not whether you use AI. The distinction is whether you are still chiseling — still at the stuck point, making your own sound.

Author Statement

This paper is the author's independent theoretical research. AI tools were used as dialogue partners and writing assistants during the writing process for concept development, argument testing, and text generation: Claude (Anthropic) served as the primary writing assistant; Gemini (Google), ChatGPT (OpenAI), and Grok (xAI) participated in paper review and feedback. All theoretical innovations, core judgments, and final editorial decisions were made by the author. The AI tools' role in this paper is comparable to a real-time-dialogue research assistant and reviewer, and does not constitute co-authorship.