Chisel and Construct: Essay I — Music and Song


This series does not discuss music theory. It discusses the rationale behind music theory.

Music theory tells you that a dominant seventh chord resolves to the tonic. The rationale tells you why the concept of "resolution" exists in every musical culture — regardless of whether that culture ever developed harmonic theory as an encoding system.

I do not discuss what a work "expresses." I discuss only one question: why can some music be entered repeatedly, with each entry yielding something new, while other music is effective only on first encounter?

I compress this difference into a four-step cycle: Arise (生), Settle (定), Unfold (展), Fix (固).

This is not music theory. It is not style. It is not cultural tradition. It is the basic operation by which the human cognitive system processes expectation across time. These four terms are taken directly from the Life Cycle Table in the Self-as-an-End (SAE) framework — selection, construction, being chiseled open, re-closure. Music moves us precisely because it replicates, in the auditory channel, the rhythm of life itself.

You do not need to understand music theory. You need only acknowledge one thing: you have experienced the sensation of an ending that weighs more than the beginning.

A preliminary note on scale: This cycle is scale-variable. It can operate at the level of a single musical phrase or across an entire work. It can nest (a large cycle containing multiple smaller cycles), run in parallel (multiple cycles operating simultaneously), or stagger (different channels at different stages). A work does not run through the cycle only once — it runs multiple cycles at multiple scales simultaneously, and the relationships between those cycles are themselves a source of remainder.

I. An Experience You Already Know

Set aside all theory for now.

Think of a piece of music you have listened to many times and would willingly listen to again. Any genre. Classical, pop, rap, folk, electronic — all qualify.

Now recall, roughly, your experience of hearing it for the first time:

Something at the opening caught you. Perhaps a melody, a rhythm, a quality of sound. You may not be able to name it, but your attention locked. Your brain began doing something: predicting. You began to expect "what comes next."

Then that expectation was confirmed. Repetition, development, or the arrival of a chorus. You felt: "I understand what this piece is doing." A sense of comfort. You entered its world.

Then, at some moment, your expectation was broken.

Perhaps an unanticipated modulation. A sudden shift in rhythm. A vocal treatment you did not foresee. Perhaps very subtle — you cannot articulate precisely what changed. But you felt it: a "wrongness," a "surprise" — yet not random surprise. It followed a logic you could sense but could not have computed in advance.

Then it returned. The melody came back, or some sense of closure appeared. But your experience had changed. The same melody now carried different weight.

At the end, you possessed something you did not possess at the beginning. But you cannot say what it is.

This is the complete experience of the chisel-construct cycle. Every person who has listened to music has undergone it. A three-year-old hearing a nursery rhyme undergoes it. It requires no music theory to explain, though music theory can be explained by it.

II. Arise, Settle, Unfold, Fix

Decompose the above experience into four steps.

Arise (生). Establish expectation. A segment of musical material appears; the nervous system begins forming a predictive model. This requires no conscious effort — it is automatic. As long as you are listening, your brain is predicting the next sound.

The opening four notes of Beethoven's Fifth Symphony are Arise. The opening verse of a BTS song is Arise. The 散板 (free-rhythm) introduction of the guqin piece "Flowing Water" is Arise. The first flow pattern Kendrick Lamar establishes in a track is Arise. The moment a DJ set's build-up begins is Arise.

Material differs. Encoding differs. The operation is the same: cause the listener's brain to form an expectation of "what comes next."

Settle (定). Confirm expectation; allow it to become a stable model. Repetition, development, reinforcement. The listener moves from "I am predicting" to "I am certain."

A chorus repeated twice is Settle. A fugue subject appearing in different voices is Settle. A rapper running the same flow through an entire verse is Settle. A motif elaborated across a guqin passage is Settle.

This step establishes comfort. But comfort is not the goal — it is preparation for the next step. You must first have a stable model before that model can be broken. Without Settle there is no Unfold, just as you cannot chisel a stone that has not yet taken shape.

Unfold (展). Break expectation. Under the condition of non-randomness, cause the listener's predictive model to fail.

This is the most critical step.

Beethoven fragments the four-note motif in the development section, displaces tonality, distorts rhythm. A K-pop bridge suddenly shifts meter, key, and energy density. Kendrick switches flow mid-verse, restructuring the rhythmic framework. Bach inverts the fugue subject, stretches it in augmentation, and overlaps its entries in stretto. The guqin piece "Flowing Water" erupts from 散板 into the dense overtone cascade of the "seventy-two rolling waves" (七十二滚拂): you thought you understood the temporal logic of this piece, and then it opens entirely.

The essence of Unfold: you thought you knew what the next sound would be, and you were wrong. But it was not random — it followed a logic you could feel but could not have calculated.

At the moment of breaking, the cognitive system exposes something: remainder. The portion your model cannot absorb. "I thought I understood this piece — I did not" — that gap is remainder.

Fix (固). Re-close while carrying the traces of chiseling.

The recapitulation returns. The final chorus returns. The guqin returns to its overtone coda. But this return is not simple repetition. The listener has been changed by Unfold, so the same melody now carries different psychological weight.

Fix is not a return to Settle. Fix is a new stable state that contains one additional layer — the trace left by the chisel. You have closed, but the content of closure is thicker.

This is the source of "the ending weighs more than the beginning."
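The four steps can be made concrete with a toy listener. The sketch below is only an illustration under loud assumptions, not a method taken from the SAE framework: the "predictive model" is a bigram counter with add-one smoothing, the seven-pitch vocabulary and the note strings are hypothetical, and surprisal in bits stands in for "expectation broken." The name `surprisal_trace` is invented for this example.

```python
import math
from collections import Counter, defaultdict
from statistics import fmean

def surprisal_trace(notes, alphabet_size=7):
    """Per-note surprisal in bits under an online bigram model.

    Assumption: the listener predicts each note from the previous one only,
    with add-one smoothing over a hypothetical seven-pitch vocabulary.
    """
    counts = defaultdict(Counter)
    trace = []
    for prev, nxt in zip(notes, notes[1:]):
        row = counts[prev]
        p = (row[nxt] + 1) / (sum(row.values()) + alphabet_size)
        trace.append(-math.log2(p))  # high value = expectation broken
        row[nxt] += 1                # the model learns after every note
    return trace

motif = "CDEG"
arise_settle = motif * 6   # Arise + Settle: establish and confirm the motif
unfold = "GEDC" * 2        # Unfold: same pitches reversed, a non-random break
fix = motif * 2            # Fix: the motif returns

piece = arise_settle + unfold + fix
trace = surprisal_trace(piece)

# trace[i] is the surprisal of piece[i + 1], hence the index shift by one.
a, u = len(arise_settle), len(unfold)
settle_end = fmean(trace[a - 5 : a - 1])       # last four notes of Settle
unfold_mean = fmean(trace[a - 1 : a + u - 1])  # all eight notes of Unfold
fix_mean = fmean(trace[-4:])                   # last four notes of Fix

# Control: the same ending reached by plain repetition, with no Unfold.
plain_mean = fmean(surprisal_trace(motif * 8)[-4:])

print(f"end of Settle {settle_end:.2f}  Unfold {unfold_mean:.2f}  "
      f"Fix {fix_mean:.2f}  plain repetition {plain_mean:.2f}")
```

In this toy run the Unfold passage costs the most bits, and the returning motif costs less than Unfold but more than the same motif reached by plain repetition. That gap is a crude numerical stand-in for the trace left by the chisel. Whether surprisal tracks felt "weight" is an assumption of the sketch, not something the essay establishes.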

III. Why Some Music Crosses Cycles

With this model, "why does some music still have something after a hundred listens" receives a structural answer:

Because its remainder is inexhaustible.

With each listen, your predictive model becomes more precise than the last. You grow increasingly familiar with the Arise and Settle portions. But in the Unfold portion — the place where your expectation was broken — there is always something that exceeds what you can fully absorb in that particular listening. You know something is there, but you have not fully taken it in. Next time you listen, it is still there, waiting.

Bach's Well-Tempered Clavier fugues are the extreme case of this structure. The polyphonic texture of multiple voices means that each listening allows you to track different lines, and the relationships between lines produce remainder that is practically infinite. You can listen five hundred times and notice something new each time. Not because you missed sounds — your ears received all the information — but because your cognitive model can process only a portion per pass. The rest is remainder, and it is always there.

Conversely, music whose information content reaches zero after a hundred listens does so because its remainder was fully absorbed by the third listen.

This is not a judgment of taste. It is a structural diagnosis.

IV. Two Modes of Degradation

Degradation One: Information entropy reaches zero — only Arise and Settle.

A song establishes a pattern, then repeats it indefinitely. No Unfold, no chisel. The listener's predictive model is fully fixed by the second pass; every subsequent pass is pure confirmation at zero information.

This is the structure of most viral short-form melodies. They are not "bad music" — structurally, they are incomplete chisel-construct cycles. They accomplish Arise and Settle but never reach Unfold. They are consumed rapidly: what feels irresistible today provokes indifference within a week, because remainder was exhausted on the first day.

Degradation Two: Pure chisel without anchor — only Unfold.

The opposite extreme. Certain radical experimental music breaks expectation throughout but never establishes anything. Every sound is a "surprise." But because the listener can never form a predictive model, there is no experience of "being broken." You cannot chisel a shapeless thing.

Remainder cannot be exposed because there is no construct to chisel. This is why such music typically has a narrow audience — not because listeners are "insufficiently sophisticated," but because structurally it lacks Arise and Settle, leaving Unfold without a reference frame.

Two modes of degradation: one never chisels, the other has nothing to chisel. Music that endures occupies the middle: it builds a sufficiently stable model, breaks it at a precise location, and the breaking produces remainder that cannot be absorbed in a single pass.
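The two degradations can be sketched with a toy listener that keeps learning across repeated passes. This is an illustration under stated assumptions rather than a measurement: the listener is a bigram counter with add-one smoothing, `ALPHABET` is a hypothetical seven-pitch vocabulary, the looped hook and the random sequences are invented, and `pass_curve` is a name made up for this example.

```python
import math
import random
from collections import Counter, defaultdict

ALPHABET = "CDEFGAB"  # hypothetical seven-pitch vocabulary

def pass_curve(passes):
    """Mean surprisal (bits per note) for each pass.

    One listener model persists across all passes, so familiarity accumulates.
    Assumption: bigram prediction with add-one smoothing.
    """
    counts = defaultdict(Counter)
    curve = []
    for notes in passes:
        bits = 0.0
        for prev, nxt in zip(notes, notes[1:]):
            row = counts[prev]
            p = (row[nxt] + 1) / (sum(row.values()) + len(ALPHABET))
            bits += -math.log2(p)
            row[nxt] += 1
        curve.append(bits / (len(notes) - 1))
    return curve

# Degradation One: a four-note hook looped, heard five times.
hook = "CDEG" * 8
loop_curve = pass_curve([hook] * 5)

# Degradation Two: every pass is fresh noise, so nothing is ever established.
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = ["".join(rng.choice(ALPHABET) for _ in range(32)) for _ in range(5)]
noise_curve = pass_curve(noise)

for n, (loop_bits, noise_bits) in enumerate(zip(loop_curve, noise_curve), 1):
    print(f"pass {n}: loop {loop_bits:.2f} bits/note, noise {noise_bits:.2f} bits/note")
```

For the loop, the per-pass figure falls with every hearing and heads toward zero (smoothing keeps it from reaching zero exactly). For the noise it stays high on every pass, so there is never a low baseline against which a break could register. The sketch deliberately omits the middle case, a stable pattern broken at one precise location.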

V. Music Theory Is Only Encoding

To this point, the argument has not rested on a single music-theory term; such terms have appeared only in examples. This is deliberate.

Because music theory — any music theory — is merely a specific culture's encoding system for the universal cognitive process of Arise-Settle-Unfold-Fix.

Western music encodes with tonal harmony. It uses "chord resolution" to implement Fix (dominant seventh → tonic is a form of closure), "tonal displacement" to implement Unfold (modulation breaks tonal expectation), and "motivic development" to implement Settle (repeated themes stabilize prediction).

Chinese traditional music encodes with pentatonic scales and 板式 (rhythmic-modal frameworks). 散板 (free rhythm) is itself an implementation of Arise — you cannot predict when the next beat will land, so your cognitive system is perpetually forming and revising expectation. "Tight playing, slow singing" (紧拉慢唱) runs two speed layers simultaneously, which is essentially a cross-level Unfold.

Indian classical music encodes with raga. Many ragas prescribe different pitch sets for ascending and descending phrases (aroha and avaroha). This embeds the Unfold mechanism: the predictive model you form during ascent fails during descent, forcing a model break.

West African drumming encodes with polyrhythm. Multiple independent rhythmic lines run simultaneously, their phase relationships constantly shifting — this is cognitively the same operation as Bach's multi-voice polyphony, only the encoding system is entirely different.

Encodings differ. What is encoded — Arise, Settle, Unfold, Fix — is the same.

This is why a person who truly understands Bach will not feel "lost" when hearing West African drumming. They will recognize the chisel-construct operations even without understanding the encoding system. Their aesthetic capacity — the precision with which they perceive the completeness of chisel-construct cycles — is cross-encoding and universal.

This is also the distinction between taste and aesthetic judgment. Taste is which encoding system you habitually inhabit. Aesthetic judgment is the precision with which you perceive the chisel-construct cycle itself. The former is subjective and varies from person to person. The latter has an objective standard. A person may listen exclusively to trap music yet possess excellent aesthetic judgment (high sensitivity to the precision of the chisel), or listen extensively to classical music yet have mediocre aesthetic judgment (enjoying only the comfort of Settle, never attending to what Unfold is doing).

Va. Dialogue with Two Predecessors

The above insights are not entirely new. The philosophy of music has, over the past hundred and seventy years, touched different parts of this elephant in scattered fashion. A brief dialogue with two predecessors will locate the present essay's position.

Hanslick (1854), On the Musically Beautiful. He is the founder of musical formalism. Core thesis: the meaning of music lies not in what emotions it "expresses" but in "tonally moving forms" (tönend bewegte Formen) themselves. Musical beauty is structural, not referential.

This resonates strongly with the present essay — we likewise refuse the interpretive route and do not say "Beethoven expresses the struggle against fate." But Hanslick stops at "form is meaning." He tells you beauty resides in structure but provides no criterion: what kind of structure crosses cycles? What kind degrades to zero information? He distinguishes "beauty in form" from "beauty in feeling" but does not distinguish "durable form" from "consumable form." Arise-Settle-Unfold-Fix picks up precisely where he stopped. Not merely that beauty resides in form, but that form endures because it completes the chisel-construct cycle and its remainder is inexhaustible.

Meyer (1956), Emotion and Meaning in Music. He did something very close to the present essay: he explained musical meaning through "expectation and the breaking of expectation." His core argument is that musical meaning arises at the moment when a listener's expectation is delayed or broken. This is nearly identical to Unfold.

But Meyer's framework lacks two things. First, he focuses on the single point of expectation-breaking, without treating "establish → confirm → break → re-close with trace" as a complete cycle. He can explain "why a particular moment surprises you" but cannot explain "why the ending weighs more than the beginning" — because he has no concept of Fix (re-closure carrying the trace of breaking). Second, his framework is psychological, stopping at the description of cognitive process without touching the ontological question of "whether remainder is exhaustible."

Placing Hanslick and Meyer together: Hanslick says beauty is in form but provides no criterion; Meyer says meaning is in breaking expectation but does not account for closure. The present essay's position falls precisely between the two: more criterion than Hanslick (which forms cross cycles), more closure than Meyer (how breaking leads to re-construction with trace), with both unified in a single four-step cycle.

This is not to say Hanslick and Meyer were wrong. Each grasped one leg of the elephant. The present essay attempts to assemble the whole animal.

VI. Isomorphic Comparison: Beethoven and BTS

Two traditions that could not be more different. One is early nineteenth-century Viennese symphonic music; the other is twenty-first-century Korean pop. But at the level of the chisel-construct cycle, they perform the same operation.

Beethoven, Symphony No. 5, first movement.

Arise: The four-note motif (short-short-short-long) is established within the opening eight bars. Extremely concise; anyone can memorize it in one hearing. Your brain immediately begins predicting — how will this motif develop?

Settle: The exposition spends approximately one hundred bars repeatedly confirming this motif. It appears at different pitches, in different instruments — always the same motif. Your predictive model grows increasingly stable.

Unfold: The development section begins. The four-note motif is fragmented. It appears in keys you did not expect; rhythm is stretched or compressed; instrumental groups tear against each other. Your predictive model fails completely — you recognize the four notes, but you do not know where they will go next. Remainder is massively exposed.

Fix: The recapitulation. The four-note motif returns to the home key. But what you hear is not the same motif as at the opening — having passed through the development's tearing, the same notes now carry different weight in your cognition. Closure, but the content of closure has changed.

BTS, "Spring Day."

Arise: The intro and first verse establish a melodic and emotional baseline through soft electronic textures and vocals. Your brain begins modeling.

Settle: The pre-chorus confirms the direction. You feel you know what the chorus will sound like.

Unfold: The bridge. Rhythm shifts. Arrangement shifts. Vocal treatment shifts. The best K-pop performs genuine structural breaking in the bridge — your predictive model fails, and you are uncertain how the song will end.

Fix: The final chorus returns. But you carry the bridge experience into your hearing, so the same melody weighs more than the first chorus did.

The four steps are identical. The encoding systems are entirely different (orchestra vs. electronic production + vocals; sonata form vs. verse-chorus structure), but the chisel-construct operations are isomorphic.

Why do both cross cycles? Because the Unfold in each produces remainder that cannot be exhausted in a single pass. The complex texture of Beethoven's development section still yields new relationships after fifty listens. The bridge of BTS's "Spring Day" still holds subtle qualities of timbre and emotional inflection after thirty listens. Their remainder is real, not decorative.

VII. Heteromorphic Equivalence: Bach Fugue and Guqin "Flowing Water"

If Beethoven and BTS represent "different traditions performing the same operation," then Bach's fugues and the guqin piece "Flowing Water" demonstrate a further level of proof: "entirely different encoding systems producing the same remainder effect."

Bach's Well-Tempered Clavier fugues: the encoding system is Western contrapuntal polyphony. The subject enters in different voices at different time intervals, then undergoes inversion, stretto, diminution, and augmentation. The Unfold of the entire work is achieved through rigorous, near-mathematical transformation: it breaks your expectation by presenting the same subject in relationships of pitch and time that you did not anticipate.

The guqin piece "Flowing Water": the encoding system is Chinese traditional 散板 structure combined with specialized fingering techniques for both hands. It has no counterpoint, no harmony, no fixed meter. Its Unfold relies on entirely different means: subtle variations of timbre (the same note can take on entirely different textures through left-hand ornaments such as 吟, 猱, 绰, 注), the unpredictability of silence (散板 means the temporal distance between notes is elastic), and the sudden density explosion of the "seventy-two rolling waves" passage.

The encoding systems share almost no common features. But the remainder effect is the same: something new after five hundred listens.

This is "heteromorphic equivalence": structures completely different, but the inexhaustibility of remainder is equivalent. Both works have crossed centuries, crossed the cultural boundary between East and West, for reasons that are, at the cognitive level, the same.

VIII. Counter-Example: When the Chisel-Construct Cycle Is Incomplete

The positive cases have been presented. Now a negative case is needed.

The 2024 collaboration "APT." by Rosé of BLACKPINK and Bruno Mars is a precise example. On release it went viral globally: the hook was extremely strong, the "APT APT APT" chant was memorable after a single listen, and the chorus was immediately catchy. But within a few months, almost no one was listening to it anymore.

What was missing? Returning to the four steps: its Arise is extremely strong (the chant hook captures you instantly), and its Settle is extremely strong (the entire song repeatedly confirms this hook). But Unfold is virtually absent — the bridge is merely a brief energy reduction before immediately returning to the same hook; the predictive model is not broken. The result: the cognitive model is fully fixed by the first chorus. From the second listen onward, you know precisely what the next second will bring. Information entropy reaches zero. No remainder awaits you.

This is not a matter of taste. You may enjoy this song and derive pleasure from those first few listens. But it cannot cross cycles because its chisel-construct cycle is incomplete. "Sounds fine at first, gets tiresome with repetition" — the structural explanation for this universal experience is: Arise and Settle were maximized, but Unfold was skipped, so remainder was exhausted by the third listen.

The creative process itself is consistent with this analysis. The song's core hook was born from an impromptu session: Rosé taught the production team a Korean drinking game, and they immediately began chanting "APT" over a drum beat. Producer Rogét Chahayed said they left the studio that day "not really knowing what they had." This illustrates the point precisely: Arise and Settle can be completed instantaneously through improvisation. But Unfold is not something a flash of inspiration can deliver. Chiseling requires stepping back from the construct you have just built, examining it repeatedly, and finding the precise location for breaking. Improvisation gives you construct; examination gives you chisel. This song completed the former and skipped the latter.

By contrast: why do the best BTS songs, Jay Chou's early works, and Kendrick Lamar's albums cross cycles when they are equally pop music? Because their Unfold is real. Their bridge passages, flow switches, and arrangement ruptures genuinely break the predictive model and produce remainder that cannot be absorbed in three listens.

Jay Chou deserves particular mention. The distinctiveness of his early works lies not merely in good melody (good melody means only that Arise and Settle are strong), but in a cross-system operation at the encoding level: embedding Chinese pentatonic scales within Western pop harmony. The collision of these two encoding systems itself produces a unique form of remainder. The cognitive system is simultaneously running two predictive models to process the same music, and the conflict and fusion between the two models generates residual information that is inexhaustible. This is not the empty label of "East-meets-West"; it is a testable structural claim.

IX. A Special Case: Religious Music

One category of music appears not to fit the Arise-Settle-Unfold-Fix model: Gregorian chant, Buddhist 梵呗 (fanbai, liturgical chanting), the Islamic call to prayer, Tibetan Buddhist chanting.

Their structure is "Arise-Settle-Settle-Settle-Settle..." extending indefinitely. Unfold is deliberately suppressed. Melody is extremely simple, rhythm extremely regular; there is no development section, no bridge, no moment that breaks expectation.

By the preceding logic, this should be the degraded form of "information entropy reaching zero." Yet in fact, good religious music can sustain a listener for extended periods without fatigue — and can even induce an altered state of consciousness.

This is not contradictory. And it can be explained by the same structural mechanism, without resort to intention or purpose.

The key lies in a concept that Essay III will discuss in detail: the phase transition from Settle to Unfold. When Settle is pushed past a certain cognitive saturation threshold, the predictive model does not continue to stabilize — it begins to disintegrate precisely because of over-stabilization. A cognitive system running for too long in a state where no new information can be captured undergoes a phase transition at the biological limit: repetition itself becomes a form of breaking. You no longer hear "yet another iteration of the same melody"; you begin to hear micro-structures within the melody that you never noticed — an overtone, a breath, a deviation between your own heartbeat and the rhythm.

This is fully isomorphic with Pina Bausch's repetition technique in cognitive mechanism: extreme Settle, past the threshold, reverses into Unfold. Remainder is not produced from new material but seeps out from the over-stabilization of old material.

Then what is the difference between religious music and a viral pop hook? Why, when both are "pure Settle," does one trigger phase transition while the other merely reaches information entropy zero?

The difference lies in temporal scale and cognitive environment. A viral hook's Settle lasts fifteen seconds to three minutes, played in an environment of highly dispersed attention — the cognitive system falls far short of reaching the saturation threshold before being interrupted by the next stimulus. Religious music's Settle lasts tens of minutes to hours, conducted in an environment deliberately purged of other stimuli (cathedral, temple, ritual space). It has sufficient time and cognitive space to push Settle past the threshold.

Same mechanism, different temporal parameters. No need for "purpose" or "intention" to explain — structure itself explains why one triggers phase transition and the other only degrades.

X. The Rationale Behind Music Theory

Return to the opening sentence.

Music theory is technique; the rationale is the dao. The West developed tonal harmony, counterpoint, and sonata form. China developed pentatonic scales, 板式 (rhythmic-modal frameworks), and 散板 structure. India developed the raga system. West Africa developed polyrhythm. Each is a sophisticated encoding system; each records and organizes sound differently.

But they encode the same thing.

A three-year-old child knows nothing of tonality, pentatonic scales, or raga, yet hearing a song she forms expectation, feels pleasure when expectation is confirmed, feels surprise when expectation is broken, and feels completion at the end. She is undergoing the complete chisel-construct cycle; she simply does not know its name.

The rationale behind music theory is this: all music theories encode the same cognitive process. Encoding systems vary in sophistication and granularity, but what is encoded — Arise, Settle, Unfold, Fix — is a basic operation of the human nervous system, invariant across cultures and across history.

Whether a piece of music can cross cycles depends not on which encoding system it employs, but on whether, within that system, it completes the chisel-construct cycle, and whether its remainder is real and inexhaustible.

This criterion applies to Beethoven and to K-pop, to Bach and to the guqin, to Kendrick Lamar and to Gregorian chant (the latter reaches remainder through a different path — extreme Settle triggering phase transition at the cognitive saturation point).

Essay II takes us into opera and Chinese opera — when the chisel-construct cycle is no longer confined to a single auditory channel but runs simultaneously across sound, body, and narrative, a new mechanism appears: cross-channel chisel. What Mei Lanfang's single glance and Maria Callas's single breath placement do is, structurally, the same operation.

Chisel and Construct: The Universal Structure of Temporal Arts. Essay I: Music and Song