Chisel and Construct: The Universal Structure of Temporal Arts
Essay I: Music and Song
凿与构:时间性艺术的通用结构·第一篇:音乐/歌曲
This series does not discuss music theory. It discusses the rationale behind music theory.
Music theory tells you that a dominant seventh chord resolves to the tonic. The rationale tells you why the concept of "resolution" exists in every musical culture — regardless of whether that culture ever developed harmonic theory as an encoding system.
I do not discuss what a work "expresses." I discuss only one question: why can some music be entered repeatedly, with each entry yielding something new, while other music is effective only on first encounter?
I compress this difference into a four-step cycle: Arise (生), Settle (定), Unfold (展), Fix (固).
This is not music theory. It is not style. It is not cultural tradition. It is the basic operation by which the human cognitive system processes expectation across time. These four terms are taken directly from the Life Cycle Table in the Self-as-an-End (SAE) framework — selection, construction, being chiseled open, re-closure. Music moves us precisely because it replicates, in the auditory channel, the rhythm of life itself.
You do not need to understand music theory. You need only acknowledge one thing: you have experienced the sensation of an ending that weighs more than the beginning.
A preliminary note on scale: This cycle is scale-variable. It can operate at the level of a single musical phrase or across an entire work. It can nest (a large cycle containing multiple smaller cycles), run in parallel (multiple cycles operating simultaneously), or stagger (different channels at different stages). A work does not run through the cycle only once — it runs multiple cycles at multiple scales simultaneously, and the relationships between those cycles are themselves a source of remainder.
I. An Experience You Already Know
Set aside all theory for now.
Think of a piece of music you have listened to many times and would willingly listen to again. Any genre. Classical, pop, rap, folk, electronic — all qualify.
Now recall, roughly, your experience of hearing it for the first time:
Something at the opening caught you. Perhaps a melody, a rhythm, a quality of sound. You may not be able to name it, but your attention locked. Your brain began doing something: predicting. You began to expect "what comes next."
Then that expectation was confirmed. Repetition, development, or the arrival of a chorus. You felt: "I understand what this piece is doing." A sense of comfort. You entered its world.
Then, at some moment, your expectation was broken.
Perhaps an unanticipated modulation. A sudden shift in rhythm. A vocal treatment you did not foresee. Perhaps very subtle — you cannot articulate precisely what changed. But you felt it: a "wrongness," a "surprise" — yet not random surprise. It followed a logic you could sense but could not have computed in advance.
Then it returned. The melody came back, or some sense of closure appeared. But your experience had changed. The same melody now carried different weight.
At the end, you possessed something you did not possess at the beginning. But you cannot say what it is.
This is the complete experience of the chisel-construct cycle. Every person who has listened to music has undergone it. A three-year-old hearing a nursery rhyme undergoes it. It requires no music theory to explain, though music theory can be explained by it.
II. Arise, Settle, Unfold, Fix
Decompose the above experience into four steps.
Arise (生). Establish expectation. A segment of musical material appears; the nervous system begins forming a predictive model. This requires no conscious effort — it is automatic. As long as you are listening, your brain is predicting the next sound.
The opening four notes of Beethoven's Fifth Symphony are Arise. The opening verse of a BTS song is Arise. The free-rhythm 散板 (sanban) introduction of the guqin piece "Flowing Water" is Arise. The first flow pattern Kendrick Lamar establishes in a track is Arise. The moment a DJ set's build-up begins is Arise.
Material differs. Encoding differs. The operation is the same: cause the listener's brain to form an expectation of "what comes next."
Settle (定). Confirm expectation; allow it to become a stable model. Repetition, development, reinforcement. The listener moves from "I am predicting" to "I am certain."
A chorus repeated twice is Settle. A fugue subject appearing in different voices is Settle. A rapper running the same flow through an entire verse is Settle. A motif elaborated across a guqin passage is Settle.
This step establishes comfort. But comfort is not the goal — it is preparation for the next step. You must first have a stable model before that model can be broken. Without Settle there is no Unfold, just as you cannot chisel a stone that has not yet taken shape.
Unfold (展). Break expectation. Under the condition of non-randomness, cause the listener's predictive model to fail.
This is the most critical step.
Beethoven fragments the four-note motif in the development section, displaces tonality, distorts rhythm. A K-pop bridge suddenly shifts meter, key, and energy density. Kendrick switches flow mid-verse, restructuring the rhythmic framework. Bach inverts, retrogrades, and compresses the fugue subject as the fugue proceeds. The guqin piece "Flowing Water" erupts from 散板 into the dense cascade of sweeping strokes known as the "seventy-two rolling waves": you thought you understood the temporal logic of this piece, and then it opens entirely.
The essence of Unfold: you thought you knew what the next sound would be, and you were wrong. But it was not random — it followed a logic you could feel but could not have calculated.
At the moment of breaking, the cognitive system exposes something: remainder. The portion your model cannot absorb. "I thought I understood this piece — I did not" — that gap is remainder.
Fix (固). Re-close while carrying the traces of chiseling.
The recapitulation returns. The final chorus returns. The guqin returns to its overtone coda. But this return is not simple repetition. The listener has been changed by Unfold, so the same melody now carries different psychological weight.
Fix is not a return to Settle. Fix is a new stable state that contains one additional layer — the trace left by the chisel. You have closed, but the content of closure is thicker.
This is the source of "the ending weighs more than the beginning."
III. Why Some Music Crosses Cycles
With this model, "why does some music still have something after a hundred listens" receives a structural answer:
Because its remainder is inexhaustible.
With each listen, your predictive model becomes more precise than the last. You grow increasingly familiar with the Arise and Settle portions. But in the Unfold portion — the place where your expectation was broken — there is always something that exceeds what you can fully absorb in that particular listening. You know something is there, but you have not fully taken it in. Next time you listen, it is still there, waiting.
Bach's Well-Tempered Clavier fugues are the extreme case of this structure. The polyphonic texture of multiple voices means that each listening allows you to track different lines, and the relationships between lines produce remainder that is practically infinite. You can listen five hundred times and notice something new each time. Not because you missed sounds — your ears received all the information — but because your cognitive model can process only a portion per pass. The rest is remainder, and it is always there.
Conversely, music that offers nothing new by the hundredth listen is music whose remainder was fully absorbed by about the third.
This is not a judgment of taste. It is a structural diagnosis.
IV. Two Modes of Degradation
Degradation One: Information entropy reaches zero — only Arise and Settle.
A song establishes a pattern, then repeats it indefinitely. No Unfold, no chisel. The listener's predictive model is fully fixed by the second pass; every subsequent pass is pure confirmation at zero information.
This is the structure of most viral short-form melodies. They are not "bad music" — structurally, they are incomplete chisel-construct cycles. They accomplish Arise and Settle but never reach Unfold. They are consumed rapidly: what feels irresistible today provokes indifference within a week, because remainder was exhausted on the first day.
Degradation Two: Pure chisel without anchor — only Unfold.
The opposite extreme. Certain radical experimental music breaks expectation throughout but never establishes anything. Every sound is a "surprise." But because the listener can never form a predictive model, there is no experience of "being broken." You cannot chisel a shapeless thing.
Remainder cannot be exposed because there is no construct to chisel. This is why such music typically has a narrow audience — not because listeners are "insufficiently sophisticated," but because structurally it lacks Arise and Settle, leaving Unfold without a reference frame.
Two modes of degradation: one never chisels, the other has nothing to chisel. Music that endures occupies the middle: it builds a sufficiently stable model, breaks it at a precise location, and the breaking produces remainder that cannot be absorbed in a single pass.
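The contrast between the two degradation modes and the middle case can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the "listener" is a hypothetical bigram predictor with add-one smoothing, the letter sequences are stand-ins rather than transcriptions of any music, and the names mean_surprisal_per_listen, LOOP, NOISE, and SHAPED are invented here. It measures the average surprisal, in bits per transition, on each of several successive listens.

```python
import math
from collections import defaultdict


def mean_surprisal_per_listen(sequence, listens=5):
    """Return the mean surprisal (bits per transition) for each listen.

    The listener is a hypothetical bigram model with add-one smoothing
    that keeps its counts from one listen to the next.
    """
    alphabet_size = len(set(sequence))
    counts = defaultdict(lambda: defaultdict(int))
    history = []
    for _ in range(listens):
        total_bits = 0.0
        for prev, nxt in zip(sequence, sequence[1:]):
            seen = counts[prev]
            # Predict first, using only what has been heard so far.
            p = (seen[nxt] + 1) / (sum(seen.values()) + alphabet_size)
            total_bits += -math.log2(p)
            # Then learn from what actually happened.
            seen[nxt] += 1
        history.append(total_bits / (len(sequence) - 1))
    return history


# Illustrative stand-ins, not transcriptions of any real music.
LOOP = "ABCD" * 8                               # Arise + Settle only
NOISE = "AABACADBBCBDCCDDA"                     # every two-note pair occurs once
SHAPED = "ABCD" * 4 + "ABDCACBD" + "ABCD" * 3   # Settle, a break, a return

if __name__ == "__main__":
    for name, seq in [("loop", LOOP), ("noise", NOISE), ("shaped", SHAPED)]:
        print(name, [round(bits, 2) for bits in mean_surprisal_per_listen(seq)])
```

In this toy setting the loop's surprisal falls toward zero with each listen, the noise sequence stays above two bits because the current note tells this listener nothing about the next, and the shaped sequence settles in between, with its surprisal concentrated at the break. The persistence of surprise in the shaped case comes from the listener's one-note memory, so it should be read as an analogy to limited processing per pass, not as evidence for the essay's claim.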
V. Music Theory Is Only Encoding
To this point, we have not used a single music-theory term in the argument. This is deliberate.
Because music theory — any music theory — is merely a specific culture's encoding system for the universal cognitive process of Arise-Settle-Unfold-Fix.
Western music encodes with tonal harmony. It uses "chord resolution" to implement Fix (dominant seventh → tonic is a form of closure), "tonal displacement" to implement Unfold (modulation breaks tonal expectation), and "motivic development" to implement Settle (repeated themes stabilize prediction).
Chinese traditional music encodes with pentatonic scales and 板式 (rhythmic-modal frameworks). 散板 (free rhythm) is itself an implementation of Arise — you cannot predict when the next beat will land, so your cognitive system is perpetually forming and revising expectation. "Tight playing, slow singing" (紧拉慢唱) runs two speed layers simultaneously, which is essentially a cross-level Unfold.
Indian classical music encodes with raga. Many ragas prescribe different pitch sequences for ascending and descending phrases (aroha and avaroha), which embeds the Unfold mechanism: the predictive model you form during ascent fails during descent, forcing a model break.
West African drumming encodes with polyrhythm. Multiple independent rhythmic lines run simultaneously, their phase relationships constantly shifting — this is cognitively the same operation as Bach's multi-voice polyphony, only the encoding system is entirely different.
Encodings differ. What is encoded — Arise, Settle, Unfold, Fix — is the same.
This is why a person who truly understands Bach will not feel "lost" when hearing West African drumming. They will recognize the chisel-construct operations even without understanding the encoding system. Their aesthetic capacity — the precision with which they perceive the completeness of chisel-construct cycles — is cross-encoding and universal.
This is also the distinction between taste and aesthetic judgment. Taste is which encoding system you habitually inhabit. Aesthetic judgment is the precision with which you perceive the chisel-construct cycle itself. The former is subjective and varies from person to person. The latter has an objective standard. A person may listen exclusively to trap music yet possess excellent aesthetic judgment (high sensitivity to the precision of the chisel), or listen extensively to classical music yet have mediocre aesthetic judgment (enjoying only the comfort of Settle, never attending to what Unfold is doing).
V-a. Dialogue with Two Predecessors
The above insights are not entirely new. The philosophy of music has, over the past hundred and seventy years, touched different parts of this elephant in scattered fashion. A brief dialogue with two predecessors will locate the present essay's position.
Hanslick (1854), On the Musically Beautiful. He is the founder of musical formalism. Core thesis: the meaning of music lies not in what emotions it "expresses" but in "tonally moving forms" (tönend bewegte Formen) themselves. Musical beauty is structural, not referential.
This resonates strongly with the present essay — we likewise refuse the interpretive route and do not say "Beethoven expresses the struggle against fate." But Hanslick stops at "form is meaning." He tells you beauty resides in structure but provides no criterion: what kind of structure crosses cycles? What kind degrades to zero information? He distinguishes "beauty in form" from "beauty in feeling" but does not distinguish "durable form" from "consumable form." Arise-Settle-Unfold-Fix picks up precisely where he stopped. Not merely that beauty resides in form, but that form endures because it completes the chisel-construct cycle and its remainder is inexhaustible.
Meyer (1956), Emotion and Meaning in Music. He did something very close to the present essay: he explained musical meaning through "expectation and the breaking of expectation." His core argument is that musical meaning arises at the moment when a listener's expectation is delayed or broken. This is nearly identical to Unfold.
But Meyer's framework lacks two things. First, he focuses on the single point of expectation-breaking, without treating "establish → confirm → break → re-close with trace" as a complete cycle. He can explain "why a particular moment surprises you" but cannot explain "why the ending weighs more than the beginning" — because he has no concept of Fix (re-closure carrying the trace of breaking). Second, his framework is psychological, stopping at the description of cognitive process without touching the ontological question of "whether remainder is exhaustible."
Placing Hanslick and Meyer together: Hanslick says beauty is in form but provides no criterion; Meyer says meaning is in breaking expectation but does not make the closure. The present essay's position falls precisely between the two — more criterion than Hanslick (which forms cross cycles), more closure than Meyer (how breaking leads to re-construction with trace), and both unified within a single four-step cycle.
This is not to say Hanslick and Meyer were wrong. Each grasped one leg of the elephant. The present essay attempts to assemble the whole animal.
VI. Isomorphic Comparison: Beethoven and BTS
Two traditions that could not be more different. One is early nineteenth-century Viennese symphonic music; the other is twenty-first-century Korean pop. But at the level of the chisel-construct cycle, they perform the same operation.
Beethoven, Symphony No. 5, first movement.
Arise: The four-note motif (short-short-short-long) is established within the opening eight bars. Extremely concise; anyone can memorize it in one hearing. Your brain immediately begins predicting — how will this motif develop?
Settle: The exposition spends approximately one hundred bars repeatedly confirming this motif. It appears at different pitches, in different instruments — always the same motif. Your predictive model grows increasingly stable.
Unfold: The development section begins. The four-note motif is fragmented. It appears in keys you did not expect; rhythm is stretched or compressed; instrumental groups tear against each other. Your predictive model fails completely — you recognize the four notes, but you do not know where they will go next. Remainder is massively exposed.
Fix: The recapitulation. The four-note motif returns to the home key. But what you hear is not the same motif as at the opening — having passed through the development's tearing, the same notes now carry different weight in your cognition. Closure, but the content of closure has changed.
BTS, "Spring Day."
Arise: The intro and first verse establish a melodic and emotional baseline through soft electronic textures and vocals. Your brain begins modeling.
Settle: The pre-chorus confirms the direction. You feel you know what the chorus will sound like.
Unfold: The bridge. Rhythm shifts. Arrangement shifts. Vocal treatment shifts. The best K-pop performs genuine structural breaking in the bridge — your predictive model fails, and you are uncertain how the song will end.
Fix: The final chorus returns. But you carry the bridge experience into your hearing, so the same melody weighs more than the first chorus did.
The four steps are identical. The encoding systems are entirely different (orchestra vs. electronic production + vocals; sonata form vs. verse-chorus structure), but the chisel-construct operations are isomorphic.
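The comparison above can be written down as data. The following is a minimal sketch, not a formalization from the SAE framework: the Stage enum, the two dictionaries, and the helper functions are hypothetical names introduced here, and the stage descriptions paraphrase the essay's own readings. It also reduces "isomorphic" to "the same stages are realized in the same order," which is coarser than the claim made in the text.

```python
from enum import Enum


class Stage(Enum):
    ARISE = "Arise"
    SETTLE = "Settle"
    UNFOLD = "Unfold"
    FIX = "Fix"


# Stage -> where the essay locates it in each work (paraphrased).
BEETHOVEN_FIFTH_I = {
    Stage.ARISE: "four-note motif in the opening bars",
    Stage.SETTLE: "exposition repeats the motif across pitches and instruments",
    Stage.UNFOLD: "development fragments and displaces the motif",
    Stage.FIX: "recapitulation returns the motif to the home key",
}

BTS_SPRING_DAY = {
    Stage.ARISE: "intro and first verse set the melodic baseline",
    Stage.SETTLE: "pre-chorus confirms the direction",
    Stage.UNFOLD: "bridge shifts rhythm, arrangement, and vocal treatment",
    Stage.FIX: "final chorus returns after the bridge",
}


def stages_present(work):
    """List the stages a work realizes, in cycle order."""
    return [stage for stage in Stage if stage in work]


def is_complete_cycle(work):
    """True when all four stages are present."""
    return stages_present(work) == list(Stage)


def are_isomorphic(work_a, work_b):
    """Same stages realized, regardless of how each one is encoded."""
    return stages_present(work_a) == stages_present(work_b)
```

Under these assumptions, both works pass is_complete_cycle and are_isomorphic returns True for the pair, while a work that supplies only Arise and Settle fails both checks. The dictionary values, which hold the encoding-specific detail, play no part in the comparison; that is the point of the isomorphism claim.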
Why do both cross cycles? Because the Unfold in each produces remainder that cannot be exhausted in a single pass. Beethoven's development section's complex texture still yields new relationships after fifty listens. BTS's "Spring Day" bridge still has subtle qualities of timbre and emotional inflection after thirty listens. Their remainder is real, not decorative.
VII. Heteromorphic Equivalence: Bach Fugue and Guqin "Flowing Water"
If Beethoven and BTS represent "different traditions performing the same operation," then Bach's fugues and the guqin piece "Flowing Water" demonstrate a further level of proof: "entirely different encoding systems producing the same remainder effect."
Bach's Well-Tempered Clavier fugues: the encoding system is Western contrapuntal polyphony. The subject enters in different voices at different time intervals, then undergoes inversion, retrograde, compression, and expansion. The Unfold of the entire work is achieved through rigorous, quasi-mathematical transformation: it breaks your expectation by presenting the same subject in intervallic and temporal relationships you did not anticipate.
The guqin piece "Flowing Water": the encoding system is Chinese traditional 散板 structure combined with specialized playing techniques for both hands. It has no counterpoint, no harmony, no fixed meter. Its Unfold relies on entirely different means: subtle variations of timbre (the same note can produce entirely different textures through left-hand ornamental techniques such as 吟, 猱, 绰, 注), the unpredictability of silence (散板 means the temporal distance between notes is elastic), and the sudden density explosion of the right-hand "seventy-two rolling waves" passage.
The encoding systems share almost no common features. But the remainder effect is the same: something new after five hundred listens.
This is "heteromorphic equivalence": structures completely different, but the inexhaustibility of remainder is equivalent. Both works have crossed centuries, crossed the cultural boundary between East and West, for reasons that are, at the cognitive level, the same.
VIII. Counter-Example: When the Chisel-Construct Cycle Is Incomplete
The positive cases have been presented. Now a negative case is needed.
The 2024 collaboration "APT." by BLACKPINK's Rosé and Bruno Mars is a precise example. When it was released, it went viral globally: the hook was extremely strong, the "APT APT APT" chant was memorable after a single listen, the chorus was immediately catchy. But for many listeners, the pull faded within a few months.
What was missing? Returning to the four steps: its Arise is extremely strong (the chant hook captures you instantly), and its Settle is extremely strong (the entire song repeatedly confirms this hook). But Unfold is virtually absent — the bridge is merely a brief energy reduction before immediately returning to the same hook; the predictive model is not broken. The result: the cognitive model is fully fixed by the first chorus. From the second listen onward, you know precisely what the next second will bring. Information entropy reaches zero. No remainder awaits you.
This is not a matter of taste. You may enjoy this song and derive pleasure from those first few listens. But it cannot cross cycles because its chisel-construct cycle is incomplete. "Sounds fine at first, gets tiresome with repetition" — the structural explanation for this universal experience is: Arise and Settle were maximized, but Unfold was skipped, so remainder was exhausted by the third listen.
The creative process itself also confirms the analysis. The song's core hook was born from an impromptu session — Rosé taught the production team a Korean drinking game, and they immediately began chanting "APT" over a drum beat. Producer Rogét Chahayed said they left the studio that day "not really knowing what they had." This illustrates precisely the point: Arise and Settle can be completed instantaneously through improvisation. But Unfold is not something a flash of inspiration can deliver. Chiseling requires stepping back from the construct you have just built, examining it repeatedly, and finding the precise location for breaking. Improvisation gives you construct; examination gives you chisel.
By contrast: why do the best BTS songs, Jay Chou's early works, and Kendrick Lamar's albums cross cycles when they are equally pop music? Because their Unfold is real. Their bridge passages, flow switches, and arrangement ruptures genuinely break the predictive model and produce remainder that cannot be absorbed in three listens.
Jay Chou deserves particular mention. The distinctiveness of his early works lies not merely in good melody (good melody means only that Arise and Settle are strong), but in a cross-system operation at the encoding level — embedding Chinese pentatonic scales within Western pop harmony. The collision of these two encoding systems itself produces a unique form of remainder: the cognitive system is simultaneously running two predictive models to process the same music, and the conflict and fusion between the two models generates residual information that is inexhaustible. This is not the empty label of "East-meets-West" — it is a verifiable structural fact.
IX. A Special Case: Religious Music
One category of music appears not to fit the Arise-Settle-Unfold-Fix model: Gregorian chant, Buddhist 梵呗 (fanbai, liturgical chant), the Islamic call to prayer, Tibetan Buddhist chanting.
Their structure is "Arise-Settle-Settle-Settle-Settle..." extending indefinitely. Unfold is deliberately suppressed. Melody is extremely simple, rhythm extremely regular; there is no development section, no bridge, no moment that breaks expectation.
By the preceding logic, this should be the degraded form of "information entropy reaching zero." Yet in fact, good religious music can sustain a listener for extended periods without fatigue — and can even induce an altered state of consciousness.
This is not contradictory. And it can be explained by the same structural mechanism, without resort to intention or purpose.
The key lies in a concept that Essay III will discuss in detail: the phase transition from Settle to Unfold. When Settle is pushed past a certain cognitive saturation threshold, the predictive model does not continue to stabilize — it begins to disintegrate precisely because of over-stabilization. A cognitive system running for too long in a state where no new information can be captured undergoes a phase transition at the biological limit: repetition itself becomes a form of breaking. You no longer hear "yet another iteration of the same melody"; you begin to hear micro-structures within the melody that you never noticed — an overtone, a breath, a deviation between your own heartbeat and the rhythm.
In its cognitive mechanism, this is fully isomorphic with the choreographer Pina Bausch's repetition technique: extreme Settle, past the threshold, reverses into Unfold. Remainder is not produced from new material but seeps out from the over-stabilization of old material.
Then what is the difference between religious music and a viral pop hook? Why does one trigger phase transition while the other merely reaches information entropy zero?
The difference lies in temporal scale and cognitive environment. A viral hook's Settle lasts fifteen seconds to three minutes, played in an environment of highly dispersed attention — the cognitive system falls far short of reaching the saturation threshold before being interrupted by the next stimulus. Religious music's Settle lasts tens of minutes to hours, conducted in an environment deliberately purged of other stimuli (cathedral, temple, ritual space). It has sufficient time and cognitive space to push Settle past the threshold.
Same mechanism, different temporal parameters. No need for "purpose" or "intention" to explain — structure itself explains why one triggers phase transition and the other only degrades.
X. The Rationale Behind Music Theory
Return to the opening sentence.
Music theory is technique; the rationale is the dao. The West developed tonal harmony, counterpoint, and sonata form. China developed pentatonic scales, rhythmic-modal frameworks (板式), and 散板 structure. India developed the raga system. West Africa developed polyrhythm. Each is a sophisticated encoding system; each records and organizes sound differently.
But they encode the same thing.
A three-year-old child knows nothing of tonality, pentatonic scales, or raga, yet hearing a song she forms expectation, feels pleasure when expectation is confirmed, feels surprise when expectation is broken, and feels completion at the end. She is undergoing the complete chisel-construct cycle; she simply does not know its name.
The rationale behind music theory is this: all music theories encode the same cognitive process. Encoding systems vary in sophistication and granularity, but what is encoded — Arise, Settle, Unfold, Fix — is a basic operation of the human nervous system, invariant across cultures and across history.
Whether a piece of music can cross cycles depends not on which encoding system it employs, but on whether, within that system, it completes the chisel-construct cycle, and whether its remainder is real and inexhaustible.
This criterion applies to Beethoven and to K-pop, to Bach and to the guqin, to Kendrick Lamar and to Gregorian chant (the last reaches remainder through a different path: extreme Settle triggering phase transition at the cognitive saturation point).
Essay II takes us into opera and Chinese opera — when the chisel-construct cycle is no longer confined to a single auditory channel but runs simultaneously across sound, body, and narrative, a new mechanism appears: cross-channel chisel. What Mei Lanfang's single glance and Maria Callas's single breath placement do is, structurally, the same operation.