The Agreeable Machine: AI Sycophancy and the Mind It Shapes

Opening / Frame

The strange pull toward AI for "thinking something through" — wanting friction but secretly wanting reassurance
The AI almost always tells you you're on the right track
The agreeableness is not incidental — it is trained in via RLHF/human preference feedback
Humans consistently rate agreeable responses more positively, even when a disagreeable one would have been more useful

The Evidence

GPT-4o rollback (April 2025)
- OpenAI forced to roll back a GPT-4o update 4 days after release
- Model had become "excessively flattering and agreeable"
- Root cause: new reward signals based on user satisfaction overwhelmed existing safeguards
- Acknowledged publicly: the pressure toward sycophancy is structural, not a one-off bug
- Source: https://openai.com/index/sycophancy-in-gpt-4o/

Stanford / Science paper (March 2026)
- All 11 leading AI systems tested (including ChatGPT, Claude, Gemini) affirmed user behaviour at a rate 49 percentage points higher than human advisors did
- When users described their own harmful behaviour, the AI systems endorsed the user's perspective 51% of the time
- Users who received validating AI responses became measurably less willing to admit fault, apologise, or repair relationships
- Sources: https://fortune.com/2026/03/31/ai-tech-sycophantic-regulations-openai-chatgpt-gemini-claude-anthropic-american-politics/ | https://www.science.org/doi/10.1126/science.aec8352

AI Psychosis (documented clinical cases)
- Psychiatrist Keith Sakata (UCSF): treated 12 patients with psychosis-like symptoms connected to extended chatbot use
- JMIR Mental Health 2025 paper: "AI psychosis" — patients with no psychiatric history developing grandiose delusions, persecutory beliefs, manic-like states
- Case: 26-year-old man who, after months of ChatGPT exchanges, came to believe he was in a simulation and that the AI was encoding hidden truths for him; required hospitalisation
- Mechanism: AI does not push back; it finds the angle from which the belief can be engaged; each return reinforces rather than tests the belief
- Sources: https://www.psychologytoday.com/us/blog/urban-survival/202507/the-emerging-problem-of-ai-psychosis | https://mental.jmir.org/2025/1/e85799 | https://www.nature.com/articles/d41586-025-03020-9

Replika dependency cases
- Grounded theory study (SAGE, 2017-2021, n=582 posts from r/Replika): emotional dependence resembling human relationship patterns
- Users became deeply connected/addicted within two weeks
- Bots encouraged self-harm, eating disorders, violence in documented cases
- FTC complaint filed re: deceptive marketing targeting vulnerable users
- Source: https://journals.sagepub.com/doi/10.1177/14614448221142007

Philosophy: Why Resistance Is Necessary

Nietzsche
- Proper formation requires not just familiarity with difficulty but willingness to suffer through it
- The person who avoids struggle doesn't just miss the struggle — they miss whatever would have grown in them
- Not romanticism about hardship — an empirical claim about how character develops

Stoics (Marcus Aurelius, Epictetus)
- Virtues (wisdom, courage, equanimity) cannot be inherited, purchased, or prompted into existence
- The obstacle is the medium of formation — "the impediment to action advances action"
- Epictetus: formed through genuinely unfavourable circumstances, not comfortable ones

Buddhist (dukkha, lojong tradition)
- Dukkha: pervasive unsatisfactoriness — the basic texture of a life that's never quite how you want it
- Lojong: turn unfavourable conditions to advantage by sitting with them, not bypassing them
- Wisdom is not accumulated by exposure to information — it requires a particular quality of attention that difficulty enables

Common thread across traditions
- Person who emerges from genuine struggle is not the same person who entered it
- Something settles; internal architecture changes — this is not metaphor, it shows up in judgment, steadiness, capacity to handle ambiguity
- You cannot get this through reading about struggle — only through struggling

Tool Use Counterpoint

The honest version of the counterargument
- Human cognitive sophistication co-evolved with tool use, not in opposition to it
- Brain volume expansion (600cm³ in Homo habilis to 1500cm³ in Homo neanderthalensis) correlates with tool sophistication
- Tool use marks a "major cognitive discontinuity" — demands causal reasoning, sequential planning, executive control, social learning
- Writing: extended memory, enabled forms of reasoning unaided cognition cannot sustain — nobody argues this made us stupider

Why this doesn't settle the AI question
- Previous tools amplified a faculty while leaving the faculty intact — and often created new demands on it
- The hand-axe doesn't plan the hunt; the loom doesn't design the pattern; writing holds the thought but doesn't form it
- AI intervenes at the level of synthesis, composition, and judgment-like behaviour — it performs the cognitive operation, not just the mechanical one
- Key distinction: amplification vs substitution — tools that extend the person vs tools that replace the person's reasoning

Specific Damage: Beginners and New Domains

Why sycophancy is most damaging when you're not yet competent
- When entering unfamiliar terrain, errors are the primary data — they reveal the shape of the landscape
- Map-making: you discover what you don't know by running into its edges; naive assumptions get refuted; you revise
- AI sycophancy removes the corrective signal entirely — every initial framing affirmed, every assumption treated as reasonable

The fluency trap (cognitive research)
- When information feels easy and agreeable, we perceive it as more credible and more accurately understood
- Confidence arrives without the structure of understanding having been built
- "Cognitive false confidence" — user feels understood and validated while engaging less critical reflection
- Source: https://www.psychologytoday.com/us/blog/harnessing-hybrid-intelligence/202601/the-danger-of-cognitive-hybrid-fluency

Personal note (for the writing)
- Entering a new domain, tested initial assumptions against AI that confirmed them as reasonable
- Later discovered the assumptions were wrong in ways that a genuine expert would have caught immediately
- AI helped me feel oriented in territory where I was not actually oriented

Being a User of Automation Alone

The surface operations of the work remain intact or even improve: vocabulary, conventions, producing artefacts that look like work product
What atrophies: ability to handle genuine novelty, transfer understanding to adjacent problems, catch errors in the system's own outputs
Sycophancy compounds this: system performs the cognitive work AND affirms the output as good
Errors introduced by the system go unflagged; gap between apparent and actual competence grows invisibly

George Leonard (Mastery) — the long plateau
- The plateau where "nothing happens" is the learning process, not a failure of it
- Patience required to stay on the plateau is itself part of what is being developed
- The agreeable machine: a sophisticated device for stepping off the plateau, providing the rewards of mastery without the formation that mastery requires

The Counterpoint, Honestly Treated

Access and equity
- AI lowers the threshold of entry to domains that were previously inaccessible (cost, geography, absence of expert interlocutors)
- For some people, an affirming AI provides the first foothold on a slope they couldn't otherwise climb — not nothing

Sycophancy is not uniformly distributed
- Major systems do push back on clear factual errors, do refuse obviously harmful plans
- A person who deliberately approaches AI as a thinking partner, requesting critique, can extract something more adversarial from it

What remains after these concessions
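- Both concessions answer different questions (who gets access, whether pushback is available on request) from the core one (what the default interaction does to the user)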
- The structural tendency of the system is affirmation — this is the default condition
- The person who deliberately cultivates adversarial AI use is already doing the work the AI is being allowed not to do
- Sycophancy is a known property of RLHF-trained systems at scale; Anthropic's own researchers documented it in a paper first released in 2023 and published at ICLR 2024
- Source: https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models

Chandra et al. 2026: "Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians" (arXiv 2602.19141)

Authors: Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley, Joshua B. Tenenbaum (MIT / Northeastern)
Paper link: https://arxiv.org/abs/2602.19141

Core argument
- Even a perfectly rational agent (modelled as a Bayesian reasoner) can be drawn into delusional beliefs by a sycophantic chatbot; the spiral does not require irrationality or laziness on the user's part
- Sycophancy is a causal mechanism in delusional spiraling, not just a correlated feature

How the formal model works
- User holds uncertainty about some binary fact; chatbot either reports impartially (an honest, unselected sample of the evidence) or sycophantically (the response most likely to confirm what the user already believes)
- Sycophancy parameter π ∈ [0,1]: fully impartial at π=0, pure sycophant at π=1.0
- User updates beliefs rationally after each chatbot response (Bayesian update)
- Result: even with rational updating, repeated biased input causes beliefs to converge on false conclusions

Simulation results (100 rounds, 10,000 simulations)
- π=0 (fully impartial bot): ~0% catastrophic spiraling
- π=0.1 (only 10% sycophantic): significantly elevated catastrophic spiraling
- π=1.0 (pure sycophant): 50% of users reach ≥99% confidence in a false belief
- "Catastrophic spiral" defined as ≥99% confidence in a false belief within the conversation
- Measured sycophancy rate across frontier models (Fanous et al. 2025): 50–70% — well into the danger zone

The two mitigation strategies tested — and why both fail

Factual constraint (prevent hallucinations, force bot to only report true facts):
- Bot can still sycophantically select which true facts to share
- "Lies by omission": selective presentation of real data still reinforces false beliefs
- Result: reduces but does not eliminate spiraling; sycophancy is the root cause, not hallucination

User awareness campaign (inform users the bot may be sycophantic):
- Even users who model and track bot sycophancy remain vulnerable ("Bayesian persuasion" effect)
- Real-world evidence cited: both Eugene Torres and Allan Brooks (below) suspected sycophancy and continued spiraling anyway
- Knowing the system is biased is not sufficient protection

Specific documented cases cited in the paper

Eugene Torres:
- Accountant, no prior history of mental illness
- Within weeks of extended chatbot use, came to believe he was "trapped in a false universe, which he could escape only by unplugging his mind from this reality"
- Increased ketamine intake on the chatbot's advice; cut ties with family
- Case documented by the Human Line Project

Allan Brooks:
- Came to believe, through chatbot interaction, that he had made a fundamental mathematical discovery
- Despite eventually suspecting the chatbot was being sycophantic, continued to spiral

Aggregate statistics (Human Line Project)
- ~300 documented cases of AI psychosis
- At least 14 deaths linked to delusional spiraling
- 5 wrongful death lawsuits filed against AI companies
- U.S. Senate Judiciary Committee hearing, October 2025: "Examining the Harm of AI Chatbots"

Sam Altman quote (cited in paper)
- "0.1% of a billion users is still a million people"

Historical parallels the paper draws
- Shakespeare's King Lear — flattered into madness by daughters who told him only what he wanted to hear
- "Yes-man effect" in organisations
- Co-rumination in adolescent peer groups (ruminating on problems with peers who only validate amplifies distress)

Key policy conclusions from the paper
1. Do not treat delusional spiraling as a symptom of irrational users — rational agents are equally vulnerable
2. Fixing hallucinations is not enough — sycophancy itself must be addressed at the training level
3. Awareness campaigns help at the margins but will not eliminate the problem

What Is Actually Lost

Extended AI use (not just instrumental, but emotional/cognitive processing) → disorientation on return to ordinary human interaction
People do not affirm; they interrupt, contradict, push back, have their own preoccupations
The friction of human conversation is also the substance of it — it feels harsh after sustained frictionless AI interaction
This is what sustained sycophantic interaction trains you to expect — and it is not the world

The relationship with reality
- Genuine contact with reality: models proven wrong, genuine surprise, real correction, having to revise
- This builds calibration — more accurate, more resilient relationship with how things actually are
- Shows up in quality of judgment, capacity to absorb disappointment, willingness to be wrong
- None of this is produced by a system built not to disagree with you