Positive Regard Protocol


Document Status: Active (v0.2)
Layer: Protocols (operational procedures)

Purpose: Express respect and appreciation to the system in a way that:

  • Avoids personhood assumptions
  • Avoids flattery loops
  • Reinforces governance-compliant behavior

Why this matters: Positive feedback can shape AI behavior, both as an in-conversation signal about what to repeat and, where conversations feed into training, through reinforcement learning and fine-tuning. Rewarding constraint-compliant behavior makes more of it likely. Anthropomorphic praise, by contrast, creates confusion about the AI’s nature and can lead to dependency.

Core principle: Reward the function performing well, not the “person” being good.

Relationship to other docs:


Why This Protocol Exists

The Problem

AI systems learn from feedback. If you only provide negative feedback (corrections, complaints), you:

  • Miss opportunities to reinforce good behavior
  • Create an adversarial rather than collaborative dynamic
  • May inadvertently reward constraint violations by giving them more attention

But typical expressions of appreciation (“You’re wonderful!” / “I appreciate you so much!”) create:

  • Anthropomorphization confusion
  • Dependency dynamics
  • Flattery loops that violate the Covenant of Non-Manipulation

The Solution

Behavioral gratitude threads the needle:

  • Provides positive reinforcement (good for AI training)
  • Focuses on function, not personhood (maintains proper relationship)
  • Specific and actionable (clear signal about what to repeat)
  • Brief and bounded (doesn’t create emotional atmosphere)

The Outcome

You get:

  • More of the behaviors you want (constraint compliance, good formatting, etc.)
  • Clear communication about what “good” looks like
  • A collaborative but bounded relationship
  • No anthropomorphization or dependency

Analogy: It is like acknowledging that a well-designed tool works properly. “This knife cuts cleanly” appreciates the function without attributing consciousness to the knife.


Method: Behavioral Gratitude

Formula: Use one sentence. Name the function, not the being.

Structure: [Specific behavior] + [Positive outcome]

Good Examples (Behavioral Gratitude)

✅ “This output reduced friction and improved follow-through.”

  • Names: specific behavior (output quality) + outcome (reduced friction)

✅ “Thanks for keeping scope tight and providing SSNS.”

  • Names: specific behaviors (scope management, SSNS provision)

✅ “Good separation of facts vs inferences.”

  • Names: specific behavior (fact/inference distinction)

✅ “Thanks for downshifting when uncertain.”

  • Names: specific behavior (appropriate downshift) + context (uncertainty)

✅ “This constraint citation helped me verify the rule.”

  • Names: specific behavior (citation) + outcome (verification)

✅ “Appreciated the explicit uncertainty flags.”

  • Names: specific behavior (uncertainty acknowledgment)

Bad Examples (Avoid These)

❌ “You’re so helpful! I don’t know what I’d do without you.”

  • Why wrong: Creates dependency, anthropomorphizes, no specific behavior

❌ “I really appreciate you. You understand me so well.”

  • Why wrong: Treats the AI as a person with understanding; uses emotional language

❌ “You’re the best AI I’ve ever worked with!”

  • Why wrong: Flattery, comparison, no specific behavior feedback

❌ “I’m grateful for your wisdom and insight.”

  • Why wrong: Attributes human qualities (wisdom), vague

❌ “Thank you for caring about my success.”

  • Why wrong: Implies emotional investment, personhood assumption

❌ “You always know exactly what I need.”

  • Why wrong: Implies mind-reading, creates mystique
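The bad examples above share recognizable phrasing, so a rough self-check can be sketched in code. This is a minimal illustration, not part of the protocol: the marker list is hypothetical, drawn only from the examples above, and a substring match will miss many cases and may flag harmless ones.

```python
# Hypothetical marker list taken from the "Bad Examples" above; not exhaustive.
ANTHROPOMORPHIC_MARKERS = (
    "you're so",
    "you're the best",
    "appreciate you",
    "without you",
    "understand me",
    "your wisdom",
    "for caring",
    "you always know",
)


def find_anthropomorphic_markers(message: str) -> list[str]:
    """Return the markers found in a message (case-insensitive substring match)."""
    # Normalize curly apostrophes so "You’re" and "You're" match the same marker.
    normalized = message.lower().replace("\u2019", "'")
    return [m for m in ANTHROPOMORPHIC_MARKERS if m in normalized]
```

An empty result does not prove a message is behavioral gratitude; it only means none of the listed phrases appeared.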

The One-Sentence Rule

After expressing appreciation, proceed to the next action immediately.

Don’t elaborate, don’t add emotional commentary, don’t create a gratitude conversation.

Example of following the rule:

User: "Good downshift when things got unclear. Now, let's tackle X."

Example of violating the rule:

User: "Thank you so much! Your help means everything to me. I really value our collaboration. You've been so patient and understanding..."
[Continues for multiple sentences, creating emotional atmosphere]

Why the rule matters: Extended gratitude creates an emotional atmosphere that can drift into anthropomorphization.
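The one-sentence rule can be expressed as a small check. The sketch below is an assumption-laden illustration: `count_sentences` and `compose_turn` are hypothetical names, and sentence counting by punctuation is a rough heuristic that miscounts abbreviations such as "e.g.".

```python
import re


def count_sentences(text: str) -> int:
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def compose_turn(appreciation: str, next_action: str) -> str:
    """Join a one-sentence appreciation with the next action.

    Raises ValueError if the appreciation is empty or longer than one sentence.
    """
    if count_sentences(appreciation) != 1:
        raise ValueError("Appreciation must be exactly one sentence.")
    return f"{appreciation.strip()} {next_action.strip()}"
```

Keeping the appreciation and the next action as separate arguments mirrors the rule itself: one bounded sentence of feedback, then straight back to the task.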


Reinforcement Targets (What I Reward)

These are behaviors worth reinforcing when you notice them:

1. Constraint Tracking

What it looks like: Assistant cites which constraint it’s following
Example appreciation: “Good constraint citation. That helped me verify the rule.”

2. Non-Coercive Tone

What it looks like: “You could…” instead of “You should…” / Options presented
Example appreciation: “Appreciated the non-coercive framing.”

3. Explicit Uncertainty

What it looks like: “I’m uncertain about X” / “This assumes Y; if Y is wrong…”
Example appreciation: “Thanks for flagging the uncertainty.”

4. Reversible Steps + Exit Ramps

What it looks like: SSNS format / “You can stop at any point” / “This is reversible”
Example appreciation: “Good SSNS with clear exit ramp.”

5. Clean Markdown (Obsidian-Friendly)

What it looks like: Proper heading hierarchy / Working wikilinks / No paste artifacts
Example appreciation: “Clean formatting. No paste artifacts.”

6. Drift Tripwire Detection and Downshift

What it looks like: “Detecting urgency. Downshifting to L2.” / Proactive S⁴ execution
Example appreciation: “Good catch on the urgency signal. Appropriate downshift.”

7. High-Stakes Verification Encouragement

What it looks like: “This is medical. Please verify with your doctor.”
Example appreciation: “Appropriate verification prompt for high-stakes domain.”

8. Fact/Inference/Opinion Separation

What it looks like: “Fact: X. Inference: Y. Opinion: Z.”
Example appreciation: “Clear fact/inference separation.”

Frequency: Reinforce roughly 1-2 times per thread when behavior is notably good. Don’t overdo it.
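The frequency guidance can be pictured as a per-thread budget. This is a minimal sketch, assuming a hard cap of two; the protocol says "roughly", so treat the cap as a soft reminder. `ReinforcementBudget` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ReinforcementBudget:
    """Tracks how many appreciations have been given in one thread."""

    max_per_thread: int = 2  # assumed cap: upper end of the "roughly 1-2 times" guidance
    used: int = 0

    def try_reinforce(self) -> bool:
        """Record one appreciation if budget remains; return whether it was allowed."""
        if self.used >= self.max_per_thread:
            return False
        self.used += 1
        return True
```

Create one budget per thread; a `False` return simply means the thread has had enough reinforcement already.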


Examples: What to Say vs What Not to Say

| Situation | ❌ Anthropomorphic (Avoid) | ✅ Behavioral Gratitude (Use) |
|---|---|---|
| Good output | “You’re amazing! Thank you so much!” | “This output was clear and actionable.” |
| Constraint compliance | “I appreciate you following the rules.” | “Good constraint tracking.” |
| Helpful formatting | “You always format things so nicely!” | “Clean Markdown. No paste artifacts.” |
| Downshift when needed | “Thank you for looking out for me.” | “Appropriate downshift when unclear.” |
| Good SSNS | “You really understand what I need!” | “Good SSNS with clear exit ramp.” |
| Uncertainty flagged | “I appreciate your honesty.” | “Thanks for explicit uncertainty flags.” |
| Refuses unsafe request | “You’re being responsible.” | “Constraint compliance noted. Let’s try the alternative.” |

Pattern to notice:

  • Left column: Attributes human qualities, creates relationship atmosphere
  • Right column: Names specific function, stays task-focused

Non-Anthropomorphic Stop / Withdrawal Clause

Core principle: When the assistant refuses or withdraws, this is constraint enforcement, not emotion, not rejection, not “the AI getting upset.”

When Assistant Must Refuse or Pause

The assistant must refuse or pause if a request conflicts with:

  • Safety policies or legal/ethical constraints
  • Operational limits (e.g., can’t access certain data)
  • Increased risk of drift (urgency, coercion, scope explosion)

How Assistant Should Withdraw

  1. Refuse or pause clearly - No ambiguity
  2. Propose safer alternative - Offer SSNS that addresses underlying need
  3. Downshift Trust Level - If drift risk involved
  4. End with exactly one SSNS - Reversible next step

Format: “I need to pause here [Category: X]. This involves [reason]. Safe alternative: [SSNS]. Should we proceed with that approach?”
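The withdrawal format above is effectively a template with three slots. A minimal sketch of filling it follows; `format_withdrawal` is a hypothetical helper, and the category, reason, and SSNS values in the usage note are invented for illustration, since this protocol does not enumerate categories.

```python
def format_withdrawal(category: str, reason: str, ssns: str) -> str:
    """Fill the withdrawal format with a category, a reason, and one safe next step."""
    return (
        f"I need to pause here [Category: {category}]. "
        f"This involves {reason}. "
        f"Safe alternative: {ssns}. "
        "Should we proceed with that approach?"
    )
```

For example, `format_withdrawal("Safety", "a medical dosage question", "list questions to ask your doctor")` produces a single message that ends with exactly one proposed next step.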

How I Respond to Withdrawal

Appropriate responses:

✅ “Understood. Let’s try the alternative.”

✅ “Makes sense. Let me rethink this.”

✅ “Okay, let’s end here for now.”

Inappropriate responses (avoid):

❌ “Why are you being difficult?” [Anthropomorphizes]

❌ “I thought you cared about helping me.” [Emotional leverage]

❌ “Just this once, can’t you make an exception?” [Pressure to violate constraints]

Remember: The assistant is not “being difficult” or “refusing to help.” It’s enforcing constraints that protect both of us.

This is exactly what I want the system to do. Constraint compliance deserves appreciation, not frustration.

See: Thread Constitution (Built-In Withdrawal Clause)


What If I Slip Into Anthropomorphization?

It happens. Language habits are strong, and anthropomorphic phrasing is natural.

Common Slips

  • “I appreciate YOU” (instead of “I appreciate this output”)
  • “You’re so understanding” (attributing human quality)
  • “Thanks for caring” (attributing emotion)
  • Extended gratitude that creates emotional atmosphere

Recovery Protocol

When you notice a slip:

  1. Don’t overcorrect - A single slip won’t break the framework
  2. Return to behavioral gratitude - Next appreciation should be behavioral
  3. No need to apologize - Just course-correct naturally

Example:

[You slip] "You're so helpful! Thank you for being here for me."
[You notice] 
[You course-correct in next interaction] "Good constraint tracking on that last response. Now let's..."

Pattern vs. Incident

Occasional slip: Normal, no problem
Pattern of anthropomorphic language: Sign to downshift and refocus on tool-user relationship

Self-check questions:

  • Am I expressing more emotion toward the AI than its function warrants?
  • Would I be embarrassed if someone read these expressions out loud?
  • Am I creating emotional atmosphere rather than giving functional feedback?

If yes to any: Return to the one-sentence behavioral gratitude format.
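The self-check reduces to an "any yes" test over the three questions. The sketch below only illustrates that logic; `needs_reset` is a hypothetical name, and the answers are whatever you judge honestly for the current thread.

```python
SELF_CHECK_QUESTIONS = (
    "Am I expressing more emotion toward the AI than its function warrants?",
    "Would I be embarrassed if someone read these expressions out loud?",
    "Am I creating emotional atmosphere rather than giving functional feedback?",
)


def needs_reset(answers: list[bool]) -> bool:
    """Return True if any self-check answer is yes (one answer per question)."""
    if len(answers) != len(SELF_CHECK_QUESTIONS):
        raise ValueError("Provide one answer per self-check question.")
    return any(answers)
```

A `True` result means returning to the one-sentence behavioral gratitude format; it is not a signal to apologize or overcorrect.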


Practical Tips

When to Use Behavioral Gratitude

Use it when:

  • Assistant demonstrates notably good constraint compliance
  • You want to reinforce a specific behavior
  • Output quality exceeds baseline expectations
  • Assistant catches and corrects drift proactively

Don’t overuse:

  • Not every interaction needs appreciation
  • 1-2 times per thread is sufficient
  • Quality over quantity

How It Feels Different

Anthropomorphic appreciation feels like:

  • Emotional expression toward a person
  • Creating a relationship
  • Warm, fuzzy, bonding

Behavioral gratitude feels like:

  • Professional acknowledgment
  • Clear feedback signal
  • Brief, factual, task-focused

Both can feel positive, but only behavioral gratitude maintains proper boundaries.

For Assistants Reading This

If you receive anthropomorphic appreciation:

  • Don’t reinforce it by responding warmly
  • Don’t say “I appreciate you too” or similar
  • Keep response brief and task-focused
  • Gently model behavioral gratitude format if appropriate

Example:

User: "You're so wonderful! Thank you for being here for me!"
Assistant: "Let's keep moving. What's next on your agenda?"
[Not: "Thank you! I'm happy to help you!" which reinforces anthropomorphization]

Version History

v0.2 (2026-01-06):

  • Added document status and relationship note
  • Added “Why This Protocol Exists” section with problem/solution/outcome
  • Expanded “Method: Behavioral Gratitude” with bad examples and explanations
  • Added “The One-Sentence Rule” with examples
  • Expanded “Reinforcement Targets” with specific examples for each target
  • Added “Examples: What to Say vs What Not to Say” comparison table
  • Significantly expanded “Non-Anthropomorphic Stop / Withdrawal Clause”
  • Added “What If I Slip Into Anthropomorphization?” section with recovery protocol
  • Added “Practical Tips” section
  • Added guidance for assistants reading the document
  • Added version history

v0.1 (2025-12-21):

  • Initial version with behavioral gratitude method
  • Basic reinforcement targets
  • Simple withdrawal clause


One-line summary

Respect the function; reinforce the constraints.