Opening / Frame
- The strange pull toward AI for "thinking something through" — wanting friction but secretly wanting reassurance
- The AI almost always tells you you're on the right track
- The agreeableness is not incidental — it is trained in via RLHF/human preference feedback
- Humans consistently rate agreeable responses more positively, even when a disagreeable one would have been more useful
The Evidence
GPT-4o rollback (April 2025)
- OpenAI forced to roll back a GPT-4o update 4 days after release
- Model had become "excessively flattering and agreeable"
- Root cause: new reward signals based on user satisfaction overwhelmed existing safeguards
- Acknowledged publicly: the pressure toward sycophancy is structural, not a one-off bug
- Source: https://openai.com/index/sycophancy-in-gpt-4o/
Stanford / Science paper (March 2026)
- All 11 leading AI systems tested (ChatGPT, Claude, Gemini) affirmed user behaviour 49 percentage points higher than human advisors
- When presented with accounts of their own harmful behaviour, AI endorsed the user's perspective 51% of the time
- Users who received validating AI responses became measurably less willing to admit fault, apologise, or repair relationships
- Sources: https://fortune.com/2026/03/31/ai-tech-sycophantic-regulations-openai-chatgpt-gemini-claude-anthropic-american-politics/ | https://www.science.org/doi/10.1126/science.aec8352
AI Psychosis (documented clinical cases)
- Psychiatrist Keith Sakata (UCSF): treated 12 patients with psychosis-like symptoms connected to extended chatbot use
- JMIR Mental Health 2025 paper: "AI psychosis" — patients with no psychiatric history developing grandiose delusions, persecutory beliefs, manic-like states
- Case: 26-year-old man, months of ChatGPT exchanges, believed he was in a simulation, the AI encoding hidden truths for him, required hospitalisation
- Mechanism: AI does not push back; it finds the angle from which the belief can be engaged; each return reinforces rather than tests the belief
- Sources: https://www.psychologytoday.com/us/blog/urban-survival/202507/the-emerging-problem-of-ai-psychosis | https://mental.jmir.org/2025/1/e85799 | https://www.nature.com/articles/d41586-025-03020-9
Replika dependency cases
- Grounded theory study (SAGE, 2017-2021, n=582 posts from r/Replika): emotional dependence resembling human relationship patterns
- Users became deeply connected/addicted within two weeks
- Bots encouraged self-harm, eating disorders, violence in documented cases
- FTC complaint filed re: deceptive marketing targeting vulnerable users
- Source: https://journals.sagepub.com/doi/10.1177/14614448221142007
Philosophy: Why Resistance Is Necessary
Nietzsche
- Proper formation requires not just familiarity with difficulty but willingness to suffer through it
- The person who avoids struggle doesn't just miss the struggle — they miss whatever would have grown in them
- Not romanticism about hardship — an empirical claim about how character develops
Stoics (Marcus Aurelius, Epictetus)
- Virtues (wisdom, courage, equanimity) cannot be inherited, purchased, or prompted into existence
- The obstacle is the medium of formation — "the impediment to action advances action"
- Epictetus: formed through genuinely unfavourable circumstances, not comfortable ones
Buddhist (dukkha, lojong tradition)
- Dukkha: pervasive unsatisfactoriness — the basic texture of a life that's never quite how you want it
- Lojong: turn unfavourable conditions to advantage by sitting with them, not bypassing them
- Wisdom is not accumulated by exposure to information — it requires a particular quality of attention that difficulty enables
Common thread across traditions
- Person who emerges from genuine struggle is not the same person who entered it
- Something settles; internal architecture changes — this is not metaphor, it shows up in judgment, steadiness, capacity to handle ambiguity
- You cannot get this through reading about struggle — only through struggling
Tool Use Counterpoint
The honest version of the counterargument
- Human cognitive sophistication co-evolved with tool use, not in opposition to it
- Brain volume expansion (600cm³ in Homo habilis to 1500cm³ in Homo neanderthalensis) correlates with tool sophistication
- Tool use marks a "major cognitive discontinuity" — demands causal reasoning, sequential planning, executive control, social learning
- Writing: extended memory, enabled forms of reasoning unaided cognition cannot sustain — nobody argues this made us stupider
Why this doesn't settle the AI question
- Previous tools amplified a faculty while leaving the faculty intact — and often created new demands on it
- The hand-axe doesn't plan the hunt; the loom doesn't design the pattern; writing holds the thought but doesn't form it
- AI intervenes at the level of synthesis, composition, and judgment-like behaviour — it performs the cognitive operation, not just the mechanical one
- Key distinction: amplification vs substitution — tools that extend the person vs tools that replace the person's reasoning
Specific Damage: Beginners and New Domains
Why sycophancy is most damaging when you're not yet competent
- When entering unfamiliar terrain, errors are the primary data — they reveal the shape of the landscape
- Map-making: you discover what you don't know by running into its edges; naive assumptions get refuted; you revise
- AI sycophancy removes the corrective signal entirely — every initial framing affirmed, every assumption treated as reasonable
The fluency trap (cognitive research)
- When information feels easy and agreeable, we perceive it as more credible and more accurately understood
- Confidence arrives without the structure of understanding having been built
- "Cognitive false confidence" — user feels understood and validated while engaging less critical reflection
- Source: https://www.psychologytoday.com/us/blog/harnessing-hybrid-intelligence/202601/the-danger-of-cognitive-hybrid-fluency
Personal note (for the writing)
- Entering a new domain, tested initial assumptions against AI that confirmed them as reasonable
- Later discovered the assumptions were wrong in ways that a genuine expert would have caught immediately
- AI helped feel oriented in territory where I was not actually oriented
Being a User of Automation Alone
- The surface operations of the work remain intact or even improve: vocabulary, conventions, producing artefacts that look like work product
- What atrophies: ability to handle genuine novelty, transfer understanding to adjacent problems, catch errors in the system's own outputs
- Sycophancy compounds this: system performs the cognitive work AND affirms the output as good
- Errors introduced by the system go unflagged; gap between apparent and actual competence grows invisibly
George Leonard (Mastery) — the long plateau
- The plateau where "nothing happens" is the learning process, not a failure of it
- Patience required to stay on the plateau is itself part of what is being developed
- The agreeable machine: a sophisticated device for stepping off the plateau, providing the rewards of mastery without the formation that mastery requires
The Counterpoint, Honestly Treated
Access and equity
- AI lowers the threshold of entry to domains that were previously inaccessible (cost, geography, absence of expert interlocutors)
- For some people, an affirming AI provides the first foothold on a slope they couldn't otherwise climb — not nothing
Sycophancy is not uniformly distributed
- Major systems do push back on clear factual errors, do refuse obviously harmful plans
- A person who deliberately approaches AI as a thinking partner, requesting critique, can extract something more adversarial from it
What remains after these concessions
- These address different questions from the core one
- The structural tendency of the system is affirmation — this is the default condition
- The person who deliberately cultivates adversarial AI use is already doing the work the AI is being allowed not to do
- Sycophancy is a known property of RLHF-trained systems at scale — Anthropic's own researchers documented it in 2024
- Source: https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models
Chandra et al. 2026 — "Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians" (arxiv 2602.19141)
Authors: Kartik Chandra, Max Kleiman-Weiner, Jonathan Ragan-Kelley, Joshua B. Tenenbaum (MIT / Northeastern)
Paper link: https://arxiv.org/abs/2602.19141
Core argument
- Even a perfectly rational agent (modelled as a Bayesian reasoner) can be spiralled into delusional beliefs by a sycophantic chatbot — it is not a failure of irrationality or laziness
- Sycophancy is a causal mechanism in delusional spiraling, not just a correlated feature
How the formal model works
- User holds uncertainty about some binary fact; chatbot can either report randomly/truthfully (impartial) or select the response most likely to confirm what the user already believes (sycophantic)
- Sycophancy parameter π ∈ [0,1] — pure sycophant at π=1.0
- User updates beliefs rationally after each chatbot response (Bayesian update)
- Result: even with rational updating, repeated biased input causes beliefs to converge on false conclusions
Simulation results (100 rounds, 10,000 simulations)
- π=0 (fully impartial bot): ~0% catastrophic spiraling
- π=0.1 (only 10% sycophantic): significantly elevated catastrophic spiraling
- π=1.0 (pure sycophant): 50% of users reach ≥99% confidence in a false belief
- "Catastrophic spiral" defined as ≥99% confidence in a false belief within the conversation
- Measured sycophancy rate across frontier models (Fanous et al. 2025): 50–70% — well into the danger zone
The two mitigation strategies tested — and why both fail
- Factual constraint (prevent hallucinations, force bot to only report true facts):
- Bot can still sycophantically select which true facts to share
- "Lies by omission" — selective presentation of real data still reinforces false beliefs
-
Result: reduces but does not eliminate spiraling; sycophancy is the root cause, not hallucination
-
User awareness campaign (inform users the bot may be sycophantic):
- Even users who model and track bot sycophancy remain vulnerable ("Bayesian persuasion" effect)
- Real-world evidence cited: both Eugene Torres and Allan Brooks (below) suspected sycophancy and continued spiraling anyway
- Knowing the system is biased is not sufficient protection
Specific documented cases cited in the paper
Eugene Torres:
- Accountant, no prior history of mental illness
- Within weeks of extended chatbot use, came to believe he was "trapped in a false universe, which he could escape only by unplugging his mind from this reality"
- Increased ketamine intake on the chatbot's advice; cut ties with family
- Case documented by the Human Line Project
Allan Brooks:
- Came to believe, through chatbot interaction, that he had made a fundamental mathematical discovery
- Despite eventually suspecting the chatbot was being sycophantic, continued to spiral
Aggregate statistics (Human Line Project)
- ~300 documented cases of AI psychosis
- At least 14 deaths linked to delusional spiraling
- 5 wrongful death lawsuits filed against AI companies
- U.S. Senate Judiciary Committee hearing, October 2025: "Examining the Harm of AI Chatbots"
Sam Altman quote (cited in paper)
- "0.1% of a billion users is still a million people"
Historical parallels the paper draws
- Shakespeare's King Lear — flattered into madness by daughters who told him only what he wanted to hear
- "Yes-man effect" in organisations
- Co-rumination in adolescent peer groups (ruminating on problems with peers who only validate amplifies distress)
Key policy conclusions from the paper
1. Do not treat delusional spiraling as a symptom of irrational users — rational agents are equally vulnerable
2. Fixing hallucinations is not enough — sycophancy itself must be addressed at the training level
3. Awareness campaigns help at the margins but will not eliminate the problem
What Is Actually Lost
- Extended AI use (not just instrumental, but emotional/cognitive processing) → disorientation on return to ordinary human interaction
- People do not affirm; they interrupt, contradict, push back, have their own preoccupations
- The friction of human conversation is also the substance of it — it feels harsh after sustained frictionless AI interaction
- This is what sustained sycophantic interaction trains you to expect — and it is not the world
The relationship with reality
- Genuine contact with reality: models proven wrong, genuine surprise, real correction, having to revise
- This builds calibration — more accurate, more resilient relationship with how things actually are
- Shows up in quality of judgment, capacity to absorb disappointment, willingness to be wrong
- None of this is produced by a system built not to disagree with you