Self-as-an-End
Temporal Arts Series · Essay II

Chisel and Construct: The Universal Structure of Temporal Arts
Essay II: Opera and Chinese Opera

DOI: 10.5281/zenodo.18989669  ·  CC BY 4.0
Han Qin · 2026

Essay I established the Arise-Settle-Unfold-Fix model within a single auditory channel. Music's chisel-construct cycle runs on one line — the ear receives sound, the brain builds a predictive model, the model is broken, and re-closure carries the trace.

But if you attend a performance of Peking opera, or a performance of Western opera, you notice something: you are processing several lines simultaneously. The singing is one line; bodily movement is another; stage visuals are another; if narrative is present, the story is yet another. Each line can independently run its own chisel-construct cycle.

This leads to the core question of this essay: when multiple chisel-construct cycles run simultaneously, what structurally new phenomenon occurs?

The answer is: cross-channel chisel.

I. What Is Cross-Channel Chisel

The chisel-construct cycle of pure music is single-channel. Unfold occurs within audition — melody breaks your melodic expectation, rhythm breaks your rhythmic expectation.

Opera and Chinese opera introduce a new mechanism: one channel is in Settle while another channel is simultaneously in Unfold.

Consider: you are listening to a vocal passage whose melodic line is entirely regular, its rhythmic-modal framework fully predictable — your auditory predictive model is stable; you are in Settle. But simultaneously, the bodily movement on stage does something you did not anticipate — a sudden turn, a hesitant gesture, an illogical pause. Your visual predictive model is broken.

Your audition is in Settle; your vision is in Unfold. A phase difference has appeared between two chisel-construct cycles.

This phase difference itself produces remainder, and of a kind that pure music cannot produce, because it exists in neither single channel; it exists between channels. Your cognitive system processes two lines at once, and their desynchronization leaves a residue: you sense something, but you do not know whether it belongs to what you heard or to what you saw.

This is cross-channel chisel. It is a structural resource specific to opera and Chinese opera that pure music does not possess.

II. Peking Opera: Modal Framework Provides Anchor, Body Produces Chisel

Peking opera's encoding system is highly formalized. Modal-rhythmic frameworks (xipi, erhuang, various introductory and free-rhythm modes) prescribe the rhythmic structure and melodic direction of vocal passages. Role types (sheng, dan, jing, chou) prescribe vocal quality and performance norms. A viewer familiar with Peking opera has an extremely stable predictive model at the vocal level — they can predict the landing pitch of the next syllable almost half a phrase in advance.

This appears to be pure Settle. If Peking opera consisted only of singing, it would be a form with extremely strong formalization and minimal space for chiseling.

But Peking opera has more than singing. It has body work (身段).

"Farewell My Concubine," Consort Yu's Sword Dance.

What is the singing doing? The nanbangzi modal framework is highly regular, the melodic line beautiful but predictable. Your audition is in Settle: stable, comfortable; you know how the next phrase will cadence.

What is the sword dance doing? Each arc of the blade — its trajectory, speed, angle — falls outside your visual prediction. You do not know what the next movement will be. The sword's motion is fluid, with an improvisatory quality (though in fact rigorously trained); its trajectory exceeds your visual model.

Two lines run simultaneously. Audition in Settle, vision in Unfold. Your cognitive system is pulled between two different states, and this pulling is itself remainder.

Moreover, this remainder has a special property: it cannot be fully absorbed in any single channel. You cannot obtain it from a recording alone (the recording lacks bodily movement), nor can you obtain it from a silent video of the sword dance alone (without the vocal anchor, the visual loses its reference frame). It exists only at the intersection of the two channels.

This is why the live experience of Peking opera and its audio recording are two entirely different experiences. The recording amputates cross-channel chisel, leaving only the single-channel vocal cycle.

"The Drunken Beauty," Yang Guifei's Body Work.

Another example. What Mei Lanfang does in this piece is subtler. The singing itself contains chiseling (certain non-standard treatments in the sipingdiao mode), but the core cross-channel chisel occurs in the counter-motion between body work and singing: when the vocal line moves toward "closing," the bodily movement moves toward "opening." Sound is converging; the body is diverging. The two channels transmit contradictory signals.

This contradiction is not error; it is design. It produces a form of remainder that is difficult to describe in language — you know you sensed something, but you cannot determine whether it was "heard" or "seen," because it belongs entirely to neither channel.

III. Western Opera: The Same Operation, Different Encoding

Set Peking opera aside and consider Western opera. The encoding system is entirely different — Western tonal harmony, orchestral instrumentation, Italian or German text, European stage tradition. But cross-channel chisel operates identically.

Wagner, Tristan und Isolde, Act II love duet.

This is the extreme case of cross-channel chisel in opera.

The two vocal parts — Tristan and Isolde — do this: one completes a phrase (Fix) while the other opens a new phrase (Arise). The melodic lines interweave, but their chisel-construct cycles are offset. When you follow Isolde's line, you feel closure approaching; simultaneously Tristan's line pulls you toward a new beginning. You are permanently in a half-completed state.

This is already cross-channel chisel within audition — a phase difference between two melodic lines. But Wagner layers another level: a disjunction between textual narrative and musical emotion. Tristan's sung text concerns night and death, but the music's trajectory is ascending, tending toward light. The narrative channel says one thing; the music channel says another.

Remainder is continuously produced between channels but never resolved. The entire second act is a vast Unfold without Fix. Closure is indefinitely deferred — until the Liebestod (Love-Death) of Act III, when all accumulated remainder resolves in the final harmonic cadence. That cadence carries such weight precisely because the preceding two-plus hours of cross-channel chisel accumulated an immeasurable quantity of remainder.

Puccini, Turandot, "Nessun dorma."

Far simpler than Wagner, but cross-channel chisel is still operating.

The melodic line is extremely clear, broad, predictable: this is Settle at the auditory level. Calaf is singing an "I will surely triumph" aria. But the orchestral texture beneath does something else: harmonic movement suggests uncertainty; orchestral color oscillates between brightness and shadow. The melody says "certainty"; the orchestra says "not certain."

Add another layer: if you understand the text, you know Calaf is wagering his life. The tension of textual narrative and the "triumphant feeling" of the melody create a gap. That gap is remainder produced by cross-channel chisel.

IV. Isomorphic Comparison: Consort Yu's Sword Dance and the Tristan Duet

The encoding systems of these two passages share almost no common features. One is Peking opera's nanbangzi mode plus body work; the other is German Romantic opera's chromatic harmony plus dual vocal parts. Language differs, musical system differs, performance tradition differs, audience cultural background differs.

But at the level of cross-channel chisel, they perform the same operation:

One channel provides an anchor (the vocal/one melodic line in Settle); another channel creates breaking atop that anchor (body movement/the other melodic line in Unfold). The phase difference between two chisel-construct cycles produces remainder that cannot be absorbed within any single channel.

The remainder in the sword dance comes from the phase difference between audition (Settle) and vision (Unfold). The remainder in the Tristan duet comes from the phase difference between two melodic lines, and between narrative and music. The channel combinations differ, but the operation "phase difference produces remainder" is the same.

Isomorphic: different traditions, same chisel-construct operation.

V. Heteromorphic Equivalence: Kunqu's "Dream in the Garden" and Mozart's Don Giovanni

If the sword dance and Tristan are "the same operation in different encodings," then Kunqu opera's The Peony Pavilion: Dream in the Garden and the final scene of Mozart's Don Giovanni are proof at another level: "entirely different cross-channel chisel methods producing the same remainder effect."

Kunqu, Dream in the Garden.

Kunqu is among the most highly formalized forms in Chinese opera. The qupai (曲牌, fixed-tune) system prescribes the prosody, melodic framework, and even tonal contour of every sung syllable. Body-work norms extend down to the angle of the fingers. Within this extremely formalized construct, the space for Unfold appears to be nearly zero.

Yet the "Garden Stroll" scene produces a unique form of cross-channel chisel precisely at the extreme of formalization: the lyrics say "spring is beautiful" (construct, confirmatory), but Kunqu's characteristic slow tempo and the shuimoqiang (水磨腔, water-polished singing) treatment stretch every syllable into an extended temporal experience. Time itself becomes a channel. At the textual level you receive confirmation (this is spring, this is beauty), but at the level of temporal experience you receive breaking: this "spring is beautiful" is stretched too long, so long that within the stretched time you begin to feel something else. The loneliness and desire beneath Du Liniang's perception of spring are not stated by the lyrics; they are exposed by the stretching of time.

Remainder exists not in the text, not in the melody, but in the phase difference between time and text.

Mozart, Don Giovanni, final scene.

An entirely different cross-channel chisel method. The stone statue (the Commendatore's ghost) comes to dine. What Mozart does is: the narrative channel is in construct (the ghost comes to judge Don Giovanni — this is a moral story's closure), but the music channel is in chisel.

The musical treatment exceeds the logic of the narrative. The harmony at the statue's appearance (D minor, the dark timbre of trombones) does not merely "accompany" the plot; it produces a terror that exceeds the narrative framework. The narrative says "the wicked man is punished"; the music says "there is a force here that your moral narrative cannot frame." At the narrative level you receive closure (the villain falls); at the musical level you feel opening — the harmonic darkness is not something "justice is served" can explain.

Remainder exists in the gap between narrative closure and musical opening. This is why this scene is not merely the conclusion of a moral story but one of the most unsettling scenes in the history of music.

The two works' cross-channel chisel methods are entirely different: Kunqu uses temporal stretching to expose what lies beneath the text; Mozart uses musical darkness to exceed the narrative frame. But the remainder effect is equivalent: you receive a residue that cannot be absorbed within any single channel. This residue is what has carried both works across the centuries.

Heteromorphic equivalence: different chiseling, same inexhaustibility.

V-a. Dialogue with Wagner's Gesamtkunstwerk

Cross-channel chisel has a natural interlocutor: Wagner's concept of Gesamtkunstwerk (total work of art).

In 1849, Wagner argued in The Artwork of the Future that since ancient Greece, music, poetry, drama, and dance had been severed from one another, and that opera's mission was to reunify them into a whole. All art forms should dissolve their boundaries and flow into a single river, serving a common purpose.

On the surface, this resembles the "multi-channel" discussion of this essay. But the underlying logic is precisely the opposite.

Wagner pursues the unity of channels — all channels saying the same thing, transmitting the same message, fusing into a seamless whole. The present analysis points toward the phase difference between channels — the power of cross-channel chisel lies precisely in the desynchronization of channels. The sword dance is powerful not because singing and body "fuse into one," but because singing is in Settle while the body is in Unfold. If all channels were perfectly synchronized, transmitting the same information, that would be "multi-channel Settle," not cross-channel chisel. Multi-channel Settle only thickens the construct without producing new remainder.

And there is an irony: Wagner's finest work violates his own theory. The second act of Tristan is great not because it achieves the Gesamtkunstwerk ideal of unity but because it departs from it: text speaks of death while music speaks of ascent; narrative closes while harmony opens. The contradiction between channels is the source of that scene's power. Wagner's theory says to fuse; Wagner's practice chisels.

This is, in fact, the consistent stance of the Self-as-an-End framework: not to pursue harmonious unity, but to produce genuine negation within the construct and observe what survives. Remainder is not produced in fusion; it is exposed in contradiction. The phase difference between channels is a form of contradiction — audition tells you one thing, vision tells you another, your cognitive system is torn between the two, and the tear is remainder.

A brief note on Brecht's Verfremdungseffekt (alienation effect). He stands opposite Wagner: Wagner wants the audience immersed in a unified illusion; Brecht wants to shatter the illusion and make the audience aware they are watching a performance. In the language of this essay, Brecht performs chisel at the meta-level — what he breaks is not the predictive model within any single channel, but the higher-order predictive model that "these channels should fuse into one." He exposes the channels themselves. This is another form of cross-channel chisel, only the object of chiseling is not the phase difference between channel contents, but the fact of "channel existence" itself.

Three positions are thus clear: Wagner unifies channels, Brecht exposes channels, and this essay attends to the phase difference between channels. All three address the problem of multiple channels, but in entirely different operational directions.

VI. Formalization and the Tension of Chisel: The Structural Difference Between Master and Artisan

Opera and Chinese opera share a feature that pure music does not: a high degree of formalization.

Peking opera's modal-rhythmic frameworks, role types, and body-work norms are all strictly codified. Western opera has the division between recitative and aria, voice-type casting, and orchestral convention. Kunqu's fixed-tune system prescribes nearly every note. Noh theater's formalization is even more extreme — masks, foot patterns, fan angles are all fixed.

Formalization is an extremely strong construct. It allows the audience's predictive model to be established before the performance even begins — you know how xipi yuanban's rhythm will proceed; you know the aria will end with a high note.

This appears to be the enemy of Unfold. The stronger the formalization, the smaller the space for chiseling. But the truth is precisely the reverse.

The stronger the formalization, the more powerful any micro-chisel within it becomes. Because the audience's predictive model is extremely precise, any minute deviation is immediately perceived. In a framework where anything might happen, a small deviation goes unnoticed. In a framework where every note is prescribed, a deviation of a semitone is an earthquake.

This is the structural difference between artisan and master.

The artisan completes the formalized closure. Arise-Settle-Fix, clean, complete, impeccable. The audience applauds because the technique is perfect.

The master produces a minimal but real deviation within that closure. Mei Lanfang's single glance — at the moment formalization prescribes that you should look in a certain direction, his gaze hesitates for a fraction of a second. Callas's single breath placement — where everyone else would breathe at the same point, she shifts the breath by half a beat, subtly altering the respiratory structure of a phrase.

These deviations are so small you are barely conscious of them. But your predictive model registers them. You do not know what happened, but you feel "this is different from others." That "difference" is the Unfold the master has produced within the formalized construct. It is small enough not to destroy the formalization, but real enough to produce remainder.

The artisan's performance: watch once and that suffices — because it perfectly confirmed your prediction, with no remainder. The master's performance: watch ten times and there is still something — because those micro-chiselings produced inexhaustible remainder.

This is a verifiable structural diagnosis, not a judgment of taste.

VII. Counter-Example: When "Unfold" Is Systematically Deleted

The preceding counter-example was at the individual level: the artisan, a technically perfect but unmoving performer. Now consider a larger-scale counter-example: what happens when a state's power apparatus systematically deletes Unfold?

The twentieth century offers two independent large-scale experiments.

Nazi Germany, 1933–1945. The 1938 "Degenerate Music" (Entartete Musik) exhibition branded Schoenberg's atonality, jazz, and all music deemed "dissonant, chaotic, intellectual" as racial contamination. The officially favored repertoire centered on the German tradition from Beethoven to Bruckner. Nazi requirements for music were expressed almost entirely in negatives: music must not be dissonant, must not be atonal, must not be twelve-tone, must not be "chaotic," must not be jazz-influenced. What remained? Only Arise-Settle-Fix. Chisel was equated with racial degeneration.

The Soviet Union, 1930s–1950s. Socialist Realism required music to exhibit "narodnost" (closeness to the people), which in practice meant conservative tonal harmony comprehensible to all. "Formalism" was the gravest charge. Shostakovich's opera Lady Macbeth of the Mtsensk District was denounced in a Pravda editorial in January 1936, shortly after Stalin attended a performance, and was withdrawn from the stage. The editorial attacked its dissonance; in the language of this essay, the opera was condemned because it performed genuine chiseling. Shostakovich never completed another full-scale opera or ballet. Chisel was equated with bourgeois corruption.

The two regimes' ideologies were diametrically opposed — one far-right racial nationalism, the other far-left communism. But in music policy they performed the same structural operation: delete Unfold, retain only Arise-Settle-Fix.

This is not coincidence. Authoritarian systems instinctively regard Unfold as a threat, because Unfold breaks the predictive model, and authoritarianism requires everything to be predictable and controllable. Chisel is uncontrollable — you do not know what remainder will expose. A dissonant chord might be merely a dissonant chord, or it might be a questioning of the existing order. Authoritarianism cannot tolerate that uncertainty, so its choice is: define chisel itself as a crime.

The result? Official art under both regimes was highly formalized, technically accomplished, and profoundly tedious. It completed Arise-Settle-Fix perfectly, but no one voluntarily returns to it. It cannot cross cycles because remainder was prevented at the institutional level.

And the irony: the best works under both regimes were precisely those that performed micro-chisel within the censorship framework. Shostakovich's symphonies superficially satisfied Socialist Realist requirements, but in orchestration and harmonic shadow harbored irony and subversion — things you cannot hear on the first listen but begin to sense on the tenth. They crossed cycles. The officially approved works did not.

This is the validation of the Arise-Settle-Unfold-Fix model at the institutional level: you can use state power to delete Unfold, but you cannot delete the human cognitive system's need for chisel. The result of deleting chisel is not better art — it is more boring art.

VIII. From Single Channel to Cross-Channel: The Progression from Essay I to Essay II

To review what has been established:

Essay I proved: within the single auditory channel, Arise-Settle-Unfold-Fix is the universal structure of all music. Enduring music = four steps complete + remainder real and inexhaustible.

Essay II (this essay) has proved: when multiple sensory channels participate simultaneously, the chisel-construct cycle can not only run independently within each channel but also run across channels. Cross-channel chisel produces remainder that exists in no single channel but only in the phase difference between channels. This is a structural resource specific to opera and Chinese opera.

Formalization is not the enemy of chisel but its finest foundation — the more precise the predictive model, the greater the effect of micro-deviation. The structural difference between master and artisan lies not in technique but in whether real Unfold has been produced within the formalized construct.

This raises the next question: if the chisel-construct cycle can run across channels, does it depend on a particular channel combination? If we remove the auditory channel entirely, leaving only bodily movement, can the chisel-construct cycle still hold?

In the next essay, we enter ballet and dance — the body as the primary channel of chisel. When music recedes to the background or vanishes entirely, where lies the difference between military drill and Pina Bausch?