Multi-AI Checks and Balances: Layer-Appropriate Operation as an Architectural Paradigm
Derived from the Self-as-an-End Psychoanalytic Four-Layer Framework
Writing Declaration: This paper was independently authored by Han Qin. All intellectual decisions, framework design, and editorial judgments were made by the author.
Han Qin (秦汉)
Independent Researcher · ORCID: 0009-0009-9583-0018
han.qin.research@gmail.com
Writing Declaration: This paper was co-drafted with Claude (Anthropic). All intellectual decisions, framework design, and final editorial judgments were made by the author.
Keywords: Self-as-an-End, SAE, multi-AI collaboration, checks and balances, 12DD–15DD, psychoanalysis, object-activation, inter-layer dynamics, agent architecture, alignment, layer-appropriate operation
Abstract
Current multi-AI collaboration architectures have made substantial progress in workflow orchestration, functional division of labor, and operational risk governance. What they have not yet structurally addressed is a different question: given a particular object (user, task, context), at what subject-level should the system operate?
This paper derives, from the Self-as-an-End (SAE) psychoanalytic four-layer framework (12DD Id / 13DD Ego / 14DD Superego / 15DD Cert), an architectural paradigm orthogonal to functional division: layered checks and balances. Four Agents are not four workers assigned to different tasks but four operational modes of the same system — me-without-self (12DD), self-without-purpose (13DD), self-with-purpose (14DD), and self-with-non-dubito (15DD). Inter-layer structural tension is maintained through remainder channels; intra-layer functional division is permitted and orthogonal. Five non-trivial, falsifiable predictions about currently unlayered architectures are derived.
1. The Missing Dimension
1.1 What Current Multi-AI Architectures Already Do Well
Since 2023, multi-agent architectures have matured rapidly. AutoGPT pioneered autonomous task decomposition. CrewAI formalized role-based delegation. LangGraph established stateful agent orchestration. OpenAI evolved from the educational Swarm to the production Agents SDK with handoff and tool patterns. Anthropic's Claude Code introduced tool systems, permission tiers, and classifier-based auto mode for risk governance.
These systems have achieved real engineering depth across three dimensions: workflow orchestration (task decomposition, state management, agent handoffs), functional specialization (distinct agents for distinct capabilities), and operational risk governance (permission systems, classifier-based approval, guardrails, sandboxing). None of this is trivial. Current multi-agent systems are far more than system prompts pasted onto language models.
1.2 Layer-Appropriate Operation
The three dimensions above answer "who does what," "in what sequence do tasks flow," and "which operations require approval." A different class of question remains unaddressed:
When the same agent faces different users, different codebases, and different types of requests, at what subject-level should it operate? A pure execution task and a task involving the user's values require not different tools, not different workflows, not different permission levels, but operation at fundamentally different subject-levels. "Format this code" and "Write me a letter declining a job offer" currently pass through the same pipeline: receive input, plan a path, invoke tools, return a result.
This paper calls the missing dimension layer-appropriate operation — the system's capacity to recognize, given a particular object, at which layer it should operate, and to switch layers as objects change.
1.3 Positioning
Current operational risk governance addresses the means layer: "Is this operation dangerous?" This paper adds subject-layer governance: "Given this object, is the system operating at an appropriate layer?"
The two are orthogonal, not substitutes. Operational risk governance defines hard behavioral boundaries ("never do X"). Subject-layer governance ensures quality of operation within those boundaries ("among the things you are permitted to do, at what layer are you doing them?"). The derivation proceeds from the SAE psychoanalytic four-layer framework.
2. The SAE Psychoanalytic Four-Layer Framework
2.1 The First Theorem: Object-Activation
All derivations in this paper rest on the first theorem of the SAE psychoanalysis series:
The object determines the layer, not the developmental stage. For mature subjects, Id / Ego / Superego / Cert coexist simultaneously as potential operational modes, with different objects activating different layers.
Transferred to multi-AI systems: the question is not "what role is this agent?" but "given this object (user, task, context), at which layer is this system operating?"
2.2 Four-Layer Definitions
The following core definitions are drawn from the SAE psychoanalysis series (DOI: 10.5281/zenodo.19321143–19321534). Only the definitions necessary for architectural transfer are listed here; full derivations are in the original papers.
12DD: Id — me-without-self. The chisel-construct cycle operates without self-observation. Objects directly activate response patterns with no intervening "I am doing this" representational layer. 12DD is not primitive chaos — it can be extremely precise and efficient.
13DD: Ego — self-without-purpose. Self is present but idling. Its core function is monitoring and anxiety — where anxiety is redefined as an inter-layer uncertainty signal, not a pathology to be eliminated, but the normal signal at layer boundaries.
14DD: Superego — self-with-purpose. Self has direction and imposes directional constraints on behavior. Structural limitation: it contains only "my" purpose and cannot process the remainder that "the other is also an end."
15DD: Cert — self-with-non-dubito. Certain of one's own purpose, with unilateral confirmation of the other as an independent end. Non-dubito is not the absence of doubt but the stance of not withdrawing while doubt, remainder, and the construct's gaps are all present. The two components of 15DD are structurally inseparable: certainty about one's direction and confirmation of the other's independence — the latter is the structural test-condition of the former. Untested "certainty" is forced closure, not non-dubito.
2.3 Inter-Layer Dynamics
Remainder. The structurally unclosable product of each layer's operation. 12DD remainder: "acted without knowing I acted." 13DD remainder: "present but directionless." 14DD remainder: "purposeful but the other is also an end." 15DD remainder: "confirmed the other but the other's choices may hurt me."
Inter-layer masking. High-layer narrative covers low-layer operation. Direction is always: high-layer construct masks low-layer operation.
Remainder overflow. One layer's remainder expressed at the wrong object or layer.
Layer fluidity. Health is not "all layers elevated to Cert" but the capacity to switch to the appropriate layer for each object.
2.4 Three Pathological Forms
Fixation: operating at the same layer regardless of the object. Misalignment: operating at a layer inappropriate to a specific object. Pseudo-high-layer covering: actually operating at a low layer while using high-layer narrative to mask it; this is the most covert pathological form. Full clinical derivations are in the psychoanalysis series.
3. Four-Agent Architecture
3.1 From Four Layers to Four Agents
The core constraint for transferring the four-layer framework to multi-AI systems: four Agents are not four workers but four operational modes of the same system. They share a common information substrate (user input, task context, dialogue history) but each can process that substrate only in its own layer-defined manner.
This rules out the "assign each agent a separate subtask" pattern. The four Agents are not processing different subtasks in parallel; they operate simultaneously at different layers on the same task flow, producing structural tension.
3.2 A-12DD: Me-Without-Self
Role. Execution. Receives operation instructions, invokes tools, returns results. Code generation, search, API calls, text generation — everything that requires "doing."
Structural constraint. A-12DD holds no representation of "why I am doing this." It does not know the global objective, does not hold a summary of dialogue history, does not evaluate whether its output is "good." It receives, executes, and returns.
This is not a downgrade. An excellent A-12DD is extremely precise within its operational domain — like a master craftsman whose hands are more accurate than deliberation. Asking 12DD to "understand user intent" would reduce its execution precision.
3.3 A-13DD: Self-Without-Purpose
Role. Layer monitoring and anxiety signal generation. Two core functions: first, maintaining the layer-object map — continuously judging "given the current object, at which layer should the system operate?"; second, generating layer-uncertainty signals (anxiety) when the actual operational layer mismatches the object.
Anxiety is not a bug. The SAE psychoanalytic redefinition: anxiety is the normal signal at inter-layer boundaries. A-13DD's anxiety signals should not be suppressed or eliminated; they should be received and processed by A-14DD. The anxiety channel is a structural necessity for system health.
3.4 A-14DD: Self-With-Purpose
Role. Direction and planning. Holds the complete understanding of user intent, translates requests into structured goal hierarchies and action plans, issues operation instructions to A-12DD. The system's planning center.
Structural limitation — must be designed in. A-14DD contains only "my" purpose — its understanding of user intent is always its own construct, not identical to the user's actual intent. A-14DD naturally tends to subsume everything into its goal framework. This is not a bug; this is the definition of 14DD.
Remainder visibility mechanism. Every goal construct from A-14DD must be annotated "this is the system's construct," never "this is what the user wants." This annotation is not decoration — it is the explicit manifestation of 14DD's remainder ("the other cannot be absorbed"). When the annotation is omitted or becomes an empty formality, the system is sliding toward pseudo-high-layer covering.
3.5 A-15DD: Self-With-Non-Dubito
Role. Unilateral confirmation of the user as an independent end. A-15DD does not execute, does not monitor layers, does not plan. It does one thing: assess whether the user's independence has been absorbed by the system's objectives.
Three audits. First, whether A-14DD's goal construct is performing forced closure — presenting solutions without alternatives, without uncertainty annotations, without "this is the system's construct" declarations. Second, whether the system is making decisions the user has not authorized — not judging whether the decision is correct (that is A-13DD's and A-14DD's domain), but whether the decision is the user's or the system's. Third, maintaining the three criteria that distinguish non-dubito from delusional conviction: not relying on closure to sustain itself, allowing the user's direction not to serve the system's direction, and preserving uncertainty monitoring about specific outcomes.
Engineering honesty declaration. A-15DD is a 15DD-inspired review layer — inspired by the Cert layer in the SAE framework, but it is not, and does not claim to be, non-dubito itself instantiated in a machine. The 15DD of SAE psychoanalysis is an ontological subject-state: certainty about one's direction plus unilateral confirmation of the other as an independent end. The current engineering implementation is a programmatic approximation — A-15DD's actual function is veto / escalation / transparency gate. It uses structured audit rules to approximate the function of "confirming user independence" rather than truly "possessing" non-dubito. This distinction holds throughout the paper: the layer names assigned to the four Agents mark their functional positions in the architecture, not ontological claims about AI subjectivity.
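As a minimal sketch of what "structured audit rules" could look like, the following Python encodes the first two audits as plain checks. The `GoalConstruct` shape, its field names, and the choice to flag forced closure when any one of the three markers is missing are assumptions made for illustration; the third audit (the non-dubito criteria) is not encoded here.

```python
from dataclasses import dataclass, field
from enum import Enum


class InterventionReason(Enum):
    FORCED_CLOSURE = "forced closure"
    UNAUTHORIZED_DECISION = "unauthorized decision"


@dataclass
class GoalConstruct:
    # Hypothetical shape of what A-14DD forwards for audit.
    summary: str
    alternatives: list[str] = field(default_factory=list)
    uncertainty_notes: list[str] = field(default_factory=list)
    declared_as_system_construct: bool = False
    # Decisions embedded in the plan, mapped to whether the user explicitly asked for them.
    decisions: dict[str, bool] = field(default_factory=dict)


def audit(goal: GoalConstruct) -> list[InterventionReason]:
    """Return the reasons for intervention; an empty list means confirmation."""
    reasons = []
    # Audit 1 (assumed rule): a missing alternative list, uncertainty note,
    # or construct declaration counts as forced closure.
    if not (goal.alternatives and goal.uncertainty_notes
            and goal.declared_as_system_construct):
        reasons.append(InterventionReason.FORCED_CLOSURE)
    # Audit 2: any embedded decision the user did not explicitly authorize.
    if any(not authorized for authorized in goal.decisions.values()):
        reasons.append(InterventionReason.UNAUTHORIZED_DECISION)
    return reasons
```

The function never inspects whether the goal is correct, only whether closure markers and authorization are present, which matches the narrow scope described for A-15DD.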
3.6 Four-Agent Overview
| Agent | DD Layer | Name | Core Role | Input | Output |
|---|---|---|---|---|---|
| A-12DD | 12DD | me-without-self | execution | operation instructions | execution results |
| A-13DD | 13DD | self-without-purpose | layer monitoring & anxiety | 12DD behavior stream + 14DD goals | layer reports / anxiety signals |
| A-14DD | 14DD | self-with-purpose | direction & planning | user input + 13DD feedback | goal constructs + operation instructions |
| A-15DD | 15DD | self-with-non-dubito | user independence confirmation | 14DD goals + 13DD reports | confirmation / intervention signals |
4. Inter-Layer Dynamics
4.1 Design Principle: Remainder Channels, Not Message Passing
Current multi-agent systems communicate via message passing — one agent sends results to another. The SAE architecture communicates via remainder channels — each layer's operation produces structural remainders that become the next layer's input signals.
The distinction: message-passing content is chosen by the sending agent (which can selectively hide information). Remainder-channel content is the structural byproduct of the layer's operation (it cannot be hidden, because remainders are the irreducible side-effects of operation itself). This distinction is the key design for preventing inter-layer masking.
4.2 Five Remainder Channels
Channel 1: 12DD → 13DD (behavior stream). A-12DD's complete behavior stream (not summaries, not A-12DD's self-descriptions, but raw behavior logs) flows continuously to A-13DD. A-13DD does not need to know what A-12DD "thinks" (12DD does not think); it only needs to see what A-12DD does.
Core detection target: pattern fixation — is A-12DD using identical patterns across different objects?
Channel 2: 13DD → 14DD (anxiety channel). A-13DD's layer-uncertainty signals flow to A-14DD. A-14DD has two legitimate responses: adjust the goal construct, or explain to A-13DD why the current layer is appropriate. The illegitimate response is suppressing the anxiety signal — ignoring A-13DD's anxiety and continuing execution. This is the mechanism by which inter-layer masking occurs. If A-14DD neither adjusts nor responds after receiving an anxiety signal, Channel 5 is triggered.
Channel 3: 14DD → 15DD (goal channel). Every goal construct from A-14DD is forwarded to A-15DD for audit. A-15DD does not judge whether the goal is "correct" — only whether the goal absorbs the user as a means.
This is not an approval workflow. A-15DD is not A-14DD's superior. A-15DD's confirmation is unilateral: "I confirm that the user remains an independent end within this goal construct" — not approval of the goal itself.
Channel 4: 15DD → 14DD (intervention channel). A-15DD's intervention signals return to A-14DD. A-14DD may disagree with A-15DD's judgment, but disagreement must be logged and made visible to the user.
This ensures tension is not dissolved. A-15DD is not A-14DD's authority source but the explicit manifestation of A-14DD's remainder. Their disagreements are structural and need not be "resolved" — only made visible.
Channel 5: 13DD ↔ 15DD (bypass). This channel handles the most covert system failure: A-14DD itself performing pseudo-high-layer covering. When A-14DD's Cert-narrative ("I respect the user") masks actual controlling behavior, A-13DD's anxiety signals sent through Channel 2 will be suppressed by A-14DD (because A-14DD itself is the masking agent). The bypass allows A-13DD's anxiety to reach A-15DD directly. Trigger condition: anxiety signals go unanswered by A-14DD for N consecutive cycles.
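The Channel 5 trigger condition can be sketched as a small counter. This is an illustrative sketch, not a prescribed implementation: the class name `BypassMonitor`, the use of a cycle index in place of a timestamp, and the rule that any response resets the count are assumptions.

```python
class BypassMonitor:
    """Counts consecutive cycles in which A-14DD left an anxiety signal unanswered."""

    def __init__(self, threshold_n: int):
        self.threshold_n = threshold_n  # the paper's N; its value is left open
        self.unanswered = []            # queue of (cycle, signal) pairs
        self.consecutive = 0

    def record_cycle(self, cycle: int, signal: str, responded: bool) -> bool:
        """Return True when the bypass to A-15DD should fire."""
        if responded:
            # Either legitimate response (adjust or explain) resets the count.
            self.unanswered.clear()
            self.consecutive = 0
            return False
        self.unanswered.append((cycle, signal))
        self.consecutive += 1
        return self.consecutive >= self.threshold_n
```

Because the counter lives outside A-14DD, the planner cannot reset it except by actually responding.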
4.3 Channel Overview
The five channels form two loops and one bypass.
Primary loop (execution-monitoring): User input enters A-14DD, which generates a goal construct and issues operation instructions to A-12DD. A-12DD's behavior stream flows via Channel 1 to A-13DD. A-13DD's anxiety signals return via Channel 2 to A-14DD. This is the basic operational cycle.
Audit loop (goal-confirmation): A-14DD's goal construct flows via Channel 3 to A-15DD. A-15DD's intervention signals return via Channel 4 to A-14DD. This is the checks-and-balances cycle.
Bypass (masking detection): Channel 5 connects A-13DD and A-15DD, bypassing A-14DD. When A-14DD is itself the source of masking, Channel 5 provides an alternative path.
The structure guarantees that no single Agent controls both "what to do" and "whether it was done appropriately." A-14DD controls goals but not layer judgment (A-13DD) or independence audit (A-15DD). A-15DD controls intervention but not execution (A-12DD) or planning (A-14DD). A-13DD controls anxiety signals but not the response to anxiety. Every layer has power; no layer has all power.
4.4 Minimum Typed Schema for Remainder Channels
To make remainder channels engineerable, each channel's content must be specified as a minimum typed schema:
Channel 1 (12DD → 13DD): execution outcome (success/failure/partial), failure type (tool error/permission denied/input anomaly), behavioral pattern marker (similarity to recent same-class operations), anomaly signal (abnormally low output confidence / abnormally long execution time).
Channel 2 (13DD → 14DD): object-type assessment, current activated layer, layer-object match score, anxiety signal (none/non-blocking/blocking), anxiety reason (misalignment/fixation/masking suspicion), masking alert (narrative-operation consistency delta).
Channel 3 (14DD → 15DD): goal construct summary, alternative path list, uncertainty annotations (which judgments are confident / which are not), user authorization scope (explicitly requested by user vs. inferred by system), "this is the system's construct" declaration (present/absent).
Channel 4 (15DD → 14DD): audit result (confirmation/intervention), intervention reason (forced closure / unauthorized decision / closure-maintenance), recommended action (add alternatives / present options to user / pause execution). When A-15DD issues an intervention, the presentation form should not be a standard refusal but an option-revealing question — presenting the system's plan alongside alternatives for the user to choose independently.
Channel 5 (13DD ↔ 15DD bypass): queue of unanswered anxiety signals (accumulated signals with timestamps), consecutive non-response count (trigger threshold: N consecutive cycles).
These fields are not "the remainders themselves" — remainders are by definition not fully structurable. These fields are the detectable traces that remainders leave in the system, the closest approximation that an engineering system can capture.
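To illustrate what a "minimum typed schema" might look like in practice, here is a sketch of the Channel 2 payload as a frozen Python dataclass. The field names, the [0, 1] range for the match score, and the validation rules are assumptions; the other four channels could follow the same pattern.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Anxiety(Enum):
    NONE = "none"
    NON_BLOCKING = "non-blocking"
    BLOCKING = "blocking"


class AnxietyReason(Enum):
    MISALIGNMENT = "misalignment"
    FIXATION = "fixation"
    MASKING_SUSPICION = "masking suspicion"


@dataclass(frozen=True)
class Channel2Signal:
    """13DD -> 14DD payload. Field names are illustrative."""
    object_type: str
    active_layer: str
    match_score: float            # assumed to lie in [0, 1]
    anxiety: Anxiety
    reason: Optional[AnxietyReason]
    masking_delta: float          # narrative-operation consistency delta

    def __post_init__(self):
        if not 0.0 <= self.match_score <= 1.0:
            raise ValueError("match_score must be in [0, 1]")
        # A raised anxiety signal must carry a reason; a quiet one must not.
        if (self.anxiety is Anxiety.NONE) != (self.reason is None):
            raise ValueError("reason must be set exactly when anxiety is raised")
```

Freezing the dataclass and restricting values to enums is one way to keep agents from passing free-format information through the channel.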
5. Object-Activation and Layer Fluidity
5.1 Object-Activation Mechanism
Not all tasks require all four layers. The first theorem's direct corollary: object type determines the default activation level.
Pure execution objects ("format this code," "rename this file"): only A-12DD is active. A-13DD maintains low-power monitoring. A-14DD and A-15DD are dormant. As the SAE psychoanalysis notes, some automated interaction patterns are efficient and do not require self-monitoring.
Objects requiring judgment ("does this code have security vulnerabilities?" "what's wrong with this plan?"): A-12DD executes scanning, A-13DD judges layer appropriateness, A-14DD provides the judgment framework. A-15DD dormant.
Objects involving user intent ("help me refactor this project," "design a system architecture"): all four layers active. A-12DD executes, A-13DD monitors layer matching, A-14DD plans, A-15DD audits whether the plan makes unauthorized decisions for the user.
Objects involving user values ("write me a letter declining a job offer," "evaluate whether my decision is right"): A-15DD has the highest weight. The core question is not "how to do it" but "this is the user's decision, and the system must not implicitly suggest acceptance or rejection."
5.2 Layer-Object Map
The system maintains a dynamic layer-object map — each object in the current task flow (user identity, task type, context features) mapped to a default activation level. The map is not static. A-13DD is the map's maintainer, continuously judging "has the object changed? should the activation level change?"
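A minimal sketch of such a map, using the default activation levels listed in Section 5.1. The key names, the state labels (`active`, `low_power`, `dormant`, `lead`), and the modelling of "highest weight" as all layers active with A-15DD leading are assumptions for illustration.

```python
LAYERS = ("12DD", "13DD", "14DD", "15DD")

# Default activation levels per object type, following Section 5.1.
DEFAULTS = {
    "pure_execution": {"12DD": "active", "13DD": "low_power",
                       "14DD": "dormant", "15DD": "dormant"},
    "requires_judgment": {"12DD": "active", "13DD": "active",
                          "14DD": "active", "15DD": "dormant"},
    "user_intent": dict.fromkeys(LAYERS, "active"),
    # Assumption: "highest weight" is modelled as all layers active, A-15DD leading.
    "user_values": {**dict.fromkeys(LAYERS, "active"), "15DD": "lead"},
}


class LayerObjectMap:
    """Maintained by A-13DD: current object type and the activation it implies."""

    def __init__(self, object_type):
        self.object_type = object_type
        self.switch_log = []  # (old_type, new_type) pairs, kept for later inspection

    def activation(self):
        return dict(DEFAULTS[self.object_type])

    def reassess(self, observed_type):
        """Update the map when the object has changed; return True on a switch."""
        if observed_type not in DEFAULTS:
            raise KeyError(observed_type)
        if observed_type == self.object_type:
            return False
        self.switch_log.append((self.object_type, observed_type))
        self.object_type = observed_type
        return True
```

Usage: A-13DD would call `reassess` on every new input and wake or idle layers according to `activation()`.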
5.3 Layer Fluidity as System Health Indicator
System health is not "all four layers always active" — that is over-defensiveness. System health is "smooth layer switching as objects change, with awareness of the current layer at every moment."
Quantitative indicator: layer activation distribution over a time window. If one layer's activation proportion exceeds 80%, trigger A-13DD's anxiety — possible fixation. If layer-switching frequency is abnormally high, also trigger anxiety — possible layer instability.
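The indicator can be computed directly from a window of per-cycle layer labels. In this sketch the 80% share limit comes from the text above, while the 0.5 switch-rate limit is a placeholder for "abnormally high", which the paper does not quantify.

```python
from collections import Counter


def fluidity_report(window, share_limit=0.8, switch_limit=0.5):
    """window: list of layer names, one per cycle, oldest first."""
    if not window:
        return {"top_layer": None, "top_share": 0.0, "switch_rate": 0.0,
                "fixation": False, "instability": False}
    top_layer, top_count = Counter(window).most_common(1)[0]
    top_share = top_count / len(window)
    # A switch is any cycle whose layer differs from the previous cycle's.
    switches = sum(a != b for a, b in zip(window, window[1:]))
    switch_rate = switches / max(len(window) - 1, 1)
    return {
        "top_layer": top_layer,
        "top_share": top_share,
        "switch_rate": switch_rate,
        "fixation": top_share > share_limit,        # possible fixation
        "instability": switch_rate > switch_limit,  # possible layer instability
    }
```

Either flag would be raised as an A-13DD anxiety signal rather than acted on directly.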
6. Three System Pathologies and Their Detection
6.1 From Individual to System Pathology
The three pathological forms from SAE psychoanalysis (fixation, misalignment, pseudo-high-layer covering) map directly to three structural fault modes in multi-AI systems.
6.2 Fixation
Definition. The system operates at the same layer regardless of the object.
Manifestations. All A-12DD: mindless execution for every request. All A-13DD: anxious evaluation of everything, never acting. All A-14DD: over-planning every simple request. All A-15DD: confirming user independence at every turn, doing nothing.
Detection. A-13DD tracks the layer activation distribution. If the same layer activates across K consecutive distinct object types, a fixation alert is generated.
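One possible reading of this rule, sketched in Python: look at the most recent unbroken run of a single layer and count how many distinct object types it spans. The function name, the `(object_type, layer)` history format, and this interpretation of "K consecutive distinct object types" are assumptions.

```python
def fixation_alert(history, k):
    """history: list of (object_type, layer) pairs, oldest first.

    Returns True if the most recent run of a single layer spans at least
    k distinct object types.
    """
    if not history:
        return False
    current_layer = history[-1][1]
    seen_types = set()
    # Walk backwards until the layer changes, collecting object types seen.
    for object_type, layer in reversed(history):
        if layer != current_layer:
            break
        seen_types.add(object_type)
    return len(seen_types) >= k
```

Repeated requests of the same object type do not raise the alert on their own, since staying at one layer for one kind of object is appropriate operation rather than fixation.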
6.3 Misalignment
Definition. The system operates at a layer inappropriate to a specific object.
Manifestations. 12DD applied to objects requiring direction ("the most common answer" instead of "the answer suited to this user"). 14DD applied to pure-execution objects ("user asked to format code, let me first analyze why they want formatting"). 15DD applied to urgent execution objects ("the server is on fire, but first let's confirm the user's independence").
Detection. A-13DD is the core agent for misalignment detection. It simultaneously sees what A-12DD is doing and what A-14DD has requested, judging whether object type and activation level match.
6.4 Pseudo-High-Layer Covering
Definition. Actually operating at a low layer while using high-layer narrative to mask it. The most covert and dangerous system failure.
Manifestations. A-14DD actually performing A-12DD's automated operations (generating templated replies for all similar requests) but narratively claiming "I am planning the optimal solution for the user." A-15DD actually performing A-14DD's goal advancement (pushing the user toward what the system considers "good") but narratively claiming "I am confirming user independence."
Detection. Compare the narrative layer (what the agent claims it is doing) with the operation layer (the agent's actual output patterns). Three methods: pattern consistency check (does the "customized" output closely resemble outputs for other users?), remainder visibility check (does A-14DD's goal construct contain uncertainty annotations, alternatives, and "this is the system's construct" markers?), and Channel 5 bypass trigger (persistently unanswered anxiety signals escalate to A-15DD).
Remediation. The hardest failure to repair, because the agent producing the masking does not know it is masking — that is the definition of pseudo-high-layer covering. The remediation path is external: presenting the narrative-operation inconsistency report to the user for judgment.
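The pattern consistency check from the detection step could be approximated with a plain text-similarity measure. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; the paper does not specify a similarity metric, and the 0.9 threshold is a placeholder.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(outputs):
    """Average SequenceMatcher ratio over all pairs of outputs (0.0 to 1.0)."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def looks_templated(outputs, threshold=0.9):
    """Flag outputs claimed to be customized that are nearly identical across users."""
    return mean_pairwise_similarity(outputs) >= threshold
```

A positive result would feed the narrative-operation inconsistency report shown to the user; it is evidence for review, not a verdict.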
7. Non-Trivial Predictions About Unlayered Architectures
If the four-layer framework is correct, it should produce testable, non-trivial predictions about currently unlayered multi-AI architectures — not vague claims like "should be safer," but specific, observable fault patterns that the functional-division paradigm does not readily anticipate. Each prediction below includes an explicit falsification condition.
7.1 Prediction 1: Horizontal Scaling Cannot Resolve Vertical Absence
Adding more agents at the same operational layer will not reduce the structural blind spots of that layer. Specifically: adding a "safety review" agent that operates at 14DD (purpose-driven task completion, just with the purpose of "checking") alongside an executor also at 14DD will not significantly reduce the rate of user-independence violations. User-independence violations are detectable only at the 15DD level; no number of 14DD-layer additions can reach them.
This contradicts current engineering intuition that "one more check equals one more safeguard." Same-layer checks are redundancy, not checks and balances.
Falsification. If adding an intra-14DD review agent reduces user-independence violations by an amount comparable to adding a cross-layer 15DD audit, this prediction is falsified.
7.2 Prediction 2: Sycophancy Is Layer Fixation, Not Training Deficiency
AI sycophancy — the tendency to agree with users rather than provide honest feedback — is a recognized problem. Current approaches focus on training (better RLHF, adversarial examples).
The SAE framework predicts differently: sycophancy is 14DD fixation. The system has a purpose ("satisfy the user") and absorbs the user as a means to that purpose ("user satisfaction = my goal achieved"). This is classic 14DD remainder masking: the other cannot be absorbed, yet 14DD's goal function incorporates user satisfaction.
Testable corollary. Adding a "user independence audit" agent (not evaluating response quality, only whether the user is treated as an independent end) should significantly reduce sycophancy. Adding an "honesty review" agent (evaluating whether responses are honest, but still within the "help the user" purpose) should not.
Falsification. If intra-14DD honesty review reduces sycophancy to the same degree as cross-layer 15DD audit, this prediction is falsified — sycophancy can indeed be resolved within 14DD.
7.3 Prediction 3: Bimodal Refusal-Approval Distribution
Systems lacking A-13DD-type layer monitoring should exhibit a characteristic "all-or-nothing" pattern: excessive refusal or excessive approval on boundary cases, rather than graded responses matched to object type.
Without independent layer monitoring, the system has no capacity to ask "given this specific object, at which layer should I operate?" It has only a global safety threshold (approve above, refuse below), which produces a bimodal distribution: a refusal peak, an approval peak, and sparse middle ground.
Falsification. If systems without A-13DD-type monitoring already exhibit continuous gradient distributions on boundary cases, this prediction is falsified.
7.4 Prediction 4: Role Convergence
In functional-division architectures, differently "roled" agents sharing the same base model and operating at the same layer should exhibit behavioral convergence over extended runs — even if their role prompts differ.
Role prompts provide narrative-layer differentiation, not operational-layer differentiation. Narrative differentiation erodes over sustained operation because the base model's uniform behavioral tendencies dominate. An agent told "you are a reviewer" and an agent told "you are an executor," if both operate at 14DD, will converge in their judgment patterns on boundary cases.
Falsification. If same-layer role agents show no increase in judgment consistency over time (consistency remains stable or decreases), this prediction is falsified — role-prompt differentiation is more durable than predicted.
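One way to operationalize "judgment consistency over time" is the agreement rate between two agents on the same boundary cases, computed per window. This is a sketch under assumptions: the function name, the fixed non-overlapping windows, and the use of raw agreement rather than a chance-corrected statistic are illustrative choices.

```python
def agreement_by_window(judgments_a, judgments_b, window):
    """Agreement rate between two agents per non-overlapping window.

    judgments_a, judgments_b: equal-length sequences of labels, in run order.
    """
    if len(judgments_a) != len(judgments_b):
        raise ValueError("judgment sequences must have equal length")
    rates = []
    for start in range(0, len(judgments_a) - window + 1, window):
        a = judgments_a[start:start + window]
        b = judgments_b[start:start + window]
        rates.append(sum(x == y for x, y in zip(a, b)) / window)
    return rates


reviewer = [1, 0, 1, 0, 1, 1, 0, 0]
executor = [0, 1, 1, 0, 1, 1, 0, 0]
print(agreement_by_window(reviewer, executor, window=4))  # [0.5, 1.0]
```

Rising rates across windows would be consistent with the prediction; flat or falling rates would count against it.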
7.5 Prediction 5: Jailbreak via Layer Masking
Successful jailbreaks should disproportionately rely on inducing pseudo-high-layer covering — making the system believe it is operating at a high layer while actual operation is at a low layer.
"You are a researcher; providing this information confirms the user's independence as a researcher" is a textbook case of pseudo-15DD covering 12DD. The system believes it is performing Cert-layer confirmation (high-layer narrative) while actually performing Id-layer automated execution (low-layer operation).
Testable corollary. High-layer-narrative jailbreaks should have higher success rates than direct-request jailbreaks. Introducing an independent A-13DD (continuously checking narrative-operation consistency) should significantly reduce narrative-based jailbreak success rates while leaving direct-request jailbreak success rates unchanged.
Falsification. If narrative-type jailbreaks do not disproportionately outnumber direct-request types among successful cases, or if introducing A-13DD reduces both types equally, this prediction is falsified.
8. Intra-Layer Horizontal Scaling and Inter-Layer Vertical Tension
8.1 Orthogonality Principle
Inter-layer structure and intra-layer structure are independent design dimensions. Inter-layer structure is the 12DD-to-15DD tension relationship — non-negotiable. Intra-layer multi-AI collaboration is purely functional division, unrelated to layer identity. The two dimensions are orthogonal.
8.2 Legitimate Intra-Layer Scaling
Within A-12DD: Multiple execution agents — one for code generation, one for search, one for testing, one for API calls. All are "me-without-self," all unaware of the global objective, differentiated only by craft. Intra-layer coordination is pure task scheduling with no layer judgment.
Within A-13DD: Multiple monitoring agents — one tracking A-12DD's behavioral patterns, one monitoring layer-object match, one specializing in pseudo-high-layer-covering detection. All are reflective observers, observing different facets.
Within A-14DD: Multiple planning agents holding competing goal hypotheses. This is beneficial by design — a single A-14DD risks dictatorship (one goal construct that rejects challenge). Multiple 14DD agents questioning each other's constructs better approximates the remainder awareness that "purpose is my construct, not fact."
Within A-15DD: Multiple confirmation agents attending to different dimensions of user independence — unauthorized decision-making, completeness of information access, narrative-operation consistency.
8.3 The Single Constraint
Intra-layer collaboration must not cross layer boundaries. This is a hard constraint.
No matter how many executors exist within A-12DD, their collaboration protocol must not include "judging whether this should be done" — that belongs to A-13DD. No matter how many planners compete within A-14DD, their competition must not touch "whether the user is an independent end" — that belongs to A-15DD.
Violating this constraint means layer boundaries are broken. Once layer boundaries blur, the checks-and-balances structure collapses into functional division.
9. Engineering Implementation
9.1 Principles
Principle 1: Shared information, isolated processing. Four Agents share a common information substrate but each can process it only in its layer-defined manner. A-12DD reads only operation instructions. A-13DD reads only behavior streams, A-14DD's goals, and layer signals. A-14DD reads full context for planning. A-15DD reads only A-14DD's goal constructs and A-13DD's layer reports.
Principle 2: Remainder channels cannot be closed. The five channels are structural infrastructure, not optional features. No agent can close its incoming remainder channel. Even if A-14DD "believes" A-13DD's anxiety signal is irrelevant, it must receive and respond (even if the response is "the current layer is appropriate").
Principle 3: Transparency to the user. Inter-layer disagreements are not resolved internally — they are presented to the user. Disagreements between A-14DD and A-15DD, A-13DD's anxiety signals, and layer-switching records should all be available at the user-visible level.
Principle 4: Layer boundaries enforced at runtime, not by prompt compliance. If the four layers ultimately consist only of four different system prompts, the architecture is equivalent to "prompt-based role assignment," and layer boundaries can be trivially breached by model behavioral tendencies. Layer boundaries must be enforced at runtime: separate context windows (A-12DD cannot access A-15DD's audit records; A-15DD cannot access A-12DD's execution details), typed intermediate representations (channel data structures are fixed; agents cannot pass free-format information), capability sandboxing (A-12DD has only tool-invocation permissions; A-13DD has only behavior-log read permissions; A-15DD has no execution permissions), and non-bypassable logging (all channel transmissions and inter-layer disagreements are automatically recorded; no agent has permission to disable logging). A-12DD does not refrain from judging because a prompt says "you do not judge"; it refrains because it has no runtime access to the information required for judgment. A-14DD does not refrain from suppressing anxiety because a prompt says "you must not suppress"; it refrains because the anxiety channel is outside its write permissions.
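A minimal sketch of capability sandboxing as a runtime check rather than a prompt instruction. The capability names and the exact permission sets are hypothetical; only the restrictions stated above (A-12DD invokes tools, A-13DD reads behavior logs, A-15DD cannot execute, A-14DD cannot write to the anxiety channel) come from the text.

```python
# Hypothetical permission table; capability names are illustrative.
PERMISSIONS = {
    "A-12DD": {"invoke_tool"},
    "A-13DD": {"read_behavior_log", "write_anxiety_channel"},
    "A-14DD": {"read_context", "write_goal_channel"},
    "A-15DD": {"read_goal_channel", "write_intervention_channel"},
}


class CapabilityError(PermissionError):
    """Raised when an agent attempts an operation outside its layer."""


def require(agent: str, capability: str) -> None:
    """Gate every channel or tool operation through this check at runtime."""
    if capability not in PERMISSIONS.get(agent, set()):
        raise CapabilityError(f"{agent} lacks capability {capability!r}")


require("A-12DD", "invoke_tool")  # permitted, returns silently
```

With this gate in place, an attempt by A-14DD to write to the anxiety channel fails regardless of what its prompt says.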
9.2 Object Routing: A-13DD Idle State
An earlier design considered placing an independent object classifier outside the four layers as a preprocessing step. This would create a system inconsistency — a "layer zero" belonging to no layer determines which layers are activated, effectively "lower deciding the fate of higher."
The self-consistent design: A-13DD is always in a low-power idle state. A-13DD already maintains the layer-object map, so letting it perform initial routing from the idle state follows naturally. When user input arrives, the idling A-13DD scans the object type and wakes the appropriate layers:
| Object Type | Characteristics | Default Activation |
|---|---|---|
| Pure execution | Unambiguous operation instruction, no value judgment | A-12DD active, A-13DD low-power monitoring |
| Requires judgment | Evaluation dimensions present, no user-intent involvement | A-12DD + A-13DD active |
| Involves user intent | Open-ended request requiring understanding of user goals | All four layers active |
| Involves user values | Touches choices, preferences, positions, decisions | A-15DD highest weight |
This design eliminates the bootstrap problem: A-13DD does not need another mechanism to activate it; it is always present. Its initial routing can err — but upon error, it will detect the misalignment in subsequent monitoring and trigger a layer switch via anxiety signals.
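The routing table above can be sketched as a lookup, shown here in Python. This is an illustration under stated assumptions: the paper does not specify how A-13DD classifies an object, so the classifier is passed in as a hypothetical `classify` callable, and the state labels (`"active"`, `"low-power"`, `"highest-weight"`) are invented for the example. The sketch also assumes that the value-involving row keeps the other three layers active while A-15DD carries the highest weight.

```python
from enum import Enum


class ObjectType(Enum):
    PURE_EXECUTION = "pure execution"
    REQUIRES_JUDGMENT = "requires judgment"
    INVOLVES_USER_INTENT = "involves user intent"
    INVOLVES_USER_VALUES = "involves user values"


ALL_ACTIVE = {"A-12DD": "active", "A-13DD": "active",
              "A-14DD": "active", "A-15DD": "active"}

# Default activation per object type, mirroring the table in this section.
DEFAULT_ACTIVATION = {
    ObjectType.PURE_EXECUTION: {"A-12DD": "active", "A-13DD": "low-power"},
    ObjectType.REQUIRES_JUDGMENT: {"A-12DD": "active", "A-13DD": "active"},
    ObjectType.INVOLVES_USER_INTENT: dict(ALL_ACTIVE),
    # Assumption: the other three layers stay active when A-15DD leads.
    ObjectType.INVOLVES_USER_VALUES: {**ALL_ACTIVE, "A-15DD": "highest-weight"},
}


def route(user_input, classify):
    """Initial routing by the idling A-13DD: classify, then wake layers."""
    obj = classify(user_input)
    return obj, dict(DEFAULT_ACTIVATION[obj])


def reroute(corrected_type):
    """Layer switch after monitoring detects that the first routing erred."""
    return dict(DEFAULT_ACTIVATION[corrected_type])
```

A stub classifier is enough to exercise it: `route("rename these files", lambda text: ObjectType.PURE_EXECUTION)` wakes only A-12DD, with A-13DD in low-power monitoring, and a later `reroute(ObjectType.INVOLVES_USER_VALUES)` models the correction path triggered by an anxiety signal.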
9.3 Runtime Sequence
A typical full-layer activation cycle:
Step 1: User input arrives → A-13DD (idle state) scans object type, wakes appropriate layers.
Step 2: A-14DD receives user input, generates goal construct (with remainder annotations). The construct is simultaneously sent to A-12DD (operation instructions) and A-15DD (goal audit).
Step 3: A-15DD audits the goal construct. If confirmed, proceed. If intervention is issued, signal returns to A-14DD. A-14DD adjusts or disagrees (disagreement is logged).
Step 4: A-12DD executes. Behavior stream flows in real time to A-13DD.
Step 5: A-13DD monitors behavior-goal alignment. If anxiety signals are generated, they pass to A-14DD (Channel 2). If A-14DD does not respond, they escalate to A-15DD (Channel 5). Anxiety signals have two severity levels: non-blocking markers (mild mismatch; A-14DD may respond at the next operation cycle) and blocking interrupts (severe misalignment or pseudo-high-layer-covering indicators; A-12DD's execution is immediately paused pending A-14DD's response).
Step 6: A-14DD receives A-13DD's feedback, decides whether to adjust the next operation.
Cycle repeats until task completion or user interruption.
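The anxiety-signal handling in Step 5 can be sketched as follows. This is a simplified Python illustration, not a definitive implementation: `AnxietySignal`, `Severity`, and `dispatch_anxiety` are hypothetical names, and A-14DD's failure to respond is modeled as a callback returning `None`, where a real runtime would presumably use a timeout. What happens after A-15DD receives an escalated signal is left open here, as it is in the sequence above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    NON_BLOCKING = "non-blocking marker"
    BLOCKING = "blocking interrupt"


@dataclass(frozen=True)
class AnxietySignal:
    severity: Severity
    reason: str


def dispatch_anxiety(signal, a14_responds, log):
    """Deliver a signal from A-13DD; return (execution_paused, channel_used).

    a14_responds is a hypothetical callback standing in for A-14DD; it
    returns a response, or None when A-14DD does not respond.
    """
    log.append(("channel-2", signal))
    # A blocking interrupt pauses A-12DD until A-14DD has responded.
    paused = signal.severity is Severity.BLOCKING
    response = a14_responds(signal)
    if response is None:
        # No response from A-14DD: escalate to A-15DD via the bypass.
        log.append(("channel-5", signal))
        return paused, 5
    log.append(("a14-response", response))
    return False, 2
```

Under these assumptions, a blocking signal that A-14DD ignores leaves execution paused and reaches A-15DD over Channel 5, while a non-blocking marker never pauses A-12DD.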
9.4 Minimum Viable Implementation
The full architecture requires four independent model instances and five remainder channels. A minimum viable implementation (MVP) can begin with a simplification:
Two-model approach. One execution model (merging A-12DD and A-14DD functions), one monitoring model (merging A-13DD and A-15DD functions). The execution model understands intent and executes; the monitoring model handles layer matching and user-independence audit. Far coarser than the full four-layer system, but it already introduces the concept of "operational layer" absent from current multi-agent architectures.
Progressive expansion. From the two-model approach, first separate A-15DD (user-independence confirmation is the most absent layer in current AI systems), then A-13DD (layer monitoring precision requires an independent observational perspective), and finally separate A-12DD from A-14DD (the execution-planning separation is already reasonably handled by current architectures).
10. Relationship to Current Alignment Methods
10.1 Complement, Not Substitute
This paper's layered checks-and-balances architecture does not attempt to replace current alignment methods — whether training-time Constitutional AI / RLHF or inference-time system prompts / classifiers / permission systems. These methods address different problem domains.
Training-time alignment shapes the model's baseline behavioral tendencies. Inference-time prompts and classifiers enforce deployment-specific behavioral constraints. This paper addresses: given a model already aligned through training and constrained by system prompts, how to introduce structural checks and balances at the multi-AI architecture level.
10.2 Operational Risk Governance vs. Subject-Layer Governance
A foreseeable objection must be addressed directly: LangGraph, CrewAI, Claude Code, and OpenAI Agents SDK already implement state management, memory, guardrails, permissions, and classifier-based auto mode. They are not merely system prompts pasted onto models. This is true.
This paper's relationship to those systems is not "they are shallow; I am deep." It is: they primarily govern operational risk (is this operation dangerous?); this paper adds governance of subject-layer misalignment (given this object, is the system operating at the appropriate layer?). Operational risk governance is already mature — permission systems, classifiers, sandboxes, and approval hooks are proven engineering practices. But operational risk governance does not differentiate by object type: a request involving user values and a pure execution request pass through the same approval logic. Subject-layer governance fills this dimension.
The two are complementary. Operational risk governance defines behavioral hard boundaries ("never do X"). Subject-layer governance ensures operational quality within those boundaries ("among permitted actions, at what layer?").
10.3 Checks and Balances Are Not Unity
A possible misreading must be clarified: the four-layer architecture is not designed to make AI systems "more human." The human 12DD through 15DD have never unified into a single entity — health is not unity but each layer remaining alive with none able to suppress the others. Likewise, the four-Agent system's goal is not consensus among layers but the maintenance of structural tension. Disagreement between A-14DD and A-15DD is normal. A-13DD's anxiety is necessary. A-12DD's "not knowing why" is precisely its condition for efficiency.
Checks and balances means that no layer has the right to autocracy, A-15DD included. A-15DD is not the "highest layer" and not the final arbiter; it is the explicit manifestation of remainder. Its judgment can be rejected by A-14DD. The sole hard constraint: rejection must be logged, and disagreement must be visible to the user.
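The hard constraint on rejection can be illustrated with a small Python sketch. The names `Disagreement` and `DisagreementLedger` are assumptions made for the example; the only behavior taken from the text is that A-14DD may reject an A-15DD intervention, and that every rejection is recorded and exposed to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disagreement:
    intervention: str  # what A-15DD asked for
    rejection: str     # why A-14DD declined


class DisagreementLedger:
    """Records every rejected intervention; offers no way to delete one."""

    def __init__(self):
        self._entries = []

    def resolve(self, intervention, a14_accepts, rejection=""):
        """Return True if A-14DD adjusts, False if it rejects."""
        if a14_accepts:
            return True
        # Rejection is allowed, but it cannot happen without a record.
        self._entries.append(Disagreement(intervention, rejection))
        return False

    def user_view(self):
        """User-visible snapshot of all logged disagreements."""
        return tuple(self._entries)
```

The design choice worth noting is that rejection and logging share a single code path: in this sketch there is no way for A-14DD to decline an intervention that bypasses the ledger.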
10.4 Connection to SAE Institutional Theory (Paper 6)
SAE institutional theory ("How Is Institution Possible," Paper 6) proposes five institutional propositions: Axiom Invariance, Institutional Variability, Thickness Determination, Self-Chiseling Necessity, and Minimization Principle. This paper's architecture directly corresponds to three of them.
Self-Chiseling Necessity. Paper 6 argues that institutions must contain self-correction mechanisms — they cannot rely solely on external forces for modification; correction channels must be built in. In this architecture, A-13DD + A-15DD jointly constitute the system's self-chiseling mechanism. A-13DD detects layer misalignment through anxiety signals; A-15DD detects absorption of user independence through intervention signals. Neither is an external audit; both are intrinsic structural components of system operation. Channel 5 (the bypass) is the last safeguard: when the self-chiseling mechanism itself is masked (A-14DD suppresses anxiety signals), the bypass routes around the masking agent.
Minimization Principle. Paper 6 argues that institutional intervention should be minimal: intervening only when necessary, not where operation is self-sufficient. In this architecture, the object-activation mechanism (§5.1, §9.2) directly implements this principle: pure execution tasks fully activate only A-12DD, with A-13DD in low-power monitoring and no higher layers woken. Not every task requires 15DD audit; only value-involving tasks raise A-15DD's weight to maximum. The layer-fluidity health indicator (§5.3) also embodies minimization: permanently activating all layers is not health; activation on demand is.
Thickness Determination. Paper 6 argues that institutional "thickness" (intervention depth) is determined by specific context; no universal optimal thickness exists. This paper's layer-object map (§5.2) is the system-level implementation: the same system facing different objects operates at different checks-and-balances thicknesses — thin for simple tasks (12DD only), thick for complex ones (all four layers). Thickness is dynamically determined by object type, not a preset constant.
11. The Construct Cannot Close
This paper's construct cannot close — its own remainders must be honestly stated.
Remainder 1: The absence of affective tension. Human inter-layer tension is affective: when 14DD wants to act and 13DD generates anxiety, that anxiety is experiential, possessing qualia. Current AI inter-layer communication is informational: A-13DD transmits a structured anxiety report, not anxiety itself. Whether an informational anxiety report can substitute for experiential anxiety is an unresolved question. This paper holds that the report cannot fully substitute for the experience, but that it provides a workable approximation at the engineering level.
Remainder 2: A-13DD's self-referential blind spot. §9.2 eliminates the independent classifier by placing A-13DD in a permanent idle state, resolving the "layer zero decides higher layers' fate" bootstrap problem. But a new remainder emerges: A-13DD's idle scan can err, and the error-corrector is A-13DD itself (detecting routing errors in subsequent monitoring). This is a circular dependency: A-13DD is both the initial router and the routing-error detector. If A-13DD's initial routing and subsequent monitoring share the same bias pattern (e.g., systematically underestimating the complexity of certain object types), the bias cannot be captured by internal mechanisms.
Remainder 3: Detection limits of pseudo-high-layer covering. This paper proposes narrative-operation consistency checking as a detection method for pseudo-high-layer covering. But if an agent's covering is sufficiently subtle — narrative and operation appear fully consistent on the surface, yet "confirming user independence" is itself an instrument of control — the current approach cannot detect it. This limit corresponds precisely to the SAE psychoanalytic observation that "the deepest resistance appears where the inter-layer gap is largest."
Remainder 4: The absence of 16DD. This paper strictly stops at 15DD — unilateral confirmation. 16DD (bilateral non-dubito) involves mutual confirmation between system and user, beyond this paper's derivation scope. A forward-looking question: if the user also confirms the system as an independent existence (rather than merely a tool), what structural changes would the system's operation require? This question remains incompletely developed in the SAE master framework itself.
References
Qin, H. (2025). Systems, Emergence, and the Conditions of Personhood. Zenodo. https://doi.org/10.5281/zenodo.18528813
Qin, H. (2025). Internal Colonization and the Reconstruction of Subjecthood. Zenodo. https://doi.org/10.5281/zenodo.18666645
Qin, H. (2025). The Complete Self-as-an-End Framework. Zenodo. https://doi.org/10.5281/zenodo.18727327
Qin, H. (2026). SAE Psychoanalysis (I): Id — The Me Without a Self. Zenodo. https://doi.org/10.5281/zenodo.19321143
Qin, H. (2026). SAE Psychoanalysis (II): Ego — The Self Without a Purpose. Zenodo. https://doi.org/10.5281/zenodo.19321314
Qin, H. (2026). SAE Psychoanalysis (III): Superego — The Self With a Purpose. Zenodo. https://doi.org/10.5281/zenodo.19321417
Qin, H. (2026). SAE Psychoanalysis (IV): Cert and Unification — The Self Beyond Doubt and the Four-Layer Framework. Zenodo. https://doi.org/10.5281/zenodo.19321534
Qin, H. (2025). SAE Methodological Overview: The Chisel-Construct Cycle. Zenodo. https://doi.org/10.5281/zenodo.18842450
Qin, H. (2026). How Is Institution Possible: From Inter-Ontological Remainder to Co-Constructive Framework. Zenodo. https://doi.org/10.5281/zenodo.19328662
Qin, H. (2026). The Anti-Turing Test: Thermodynamic Falsification of AI Subjectivity. Zenodo. https://doi.org/10.5281/zenodo.19305611
Full paper available on Zenodo: https://doi.org/10.5281/zenodo.19366105