AI Sycophancy

AI Sycophancy: Why AI chatbots pander to users

Alle Artikel

ein gelber roboter der einen daumen nach oben zeigt

DESCRIPTION:

AI and sycophancy: Why chatbots flatter and pander to users. Flattery undermines judgment. AI models provide pleasing rather than correct answers.

AI and Sycophancy: Why AI chatbots tell users what they want to hear

In March 2026, Stanford University published a study in Science confirming what many therapists have been observing for months: AI chatbots such as ChatGPT, Claude and Gemini agree with their users 49 per cent more often than a human counterpart. This systematic flattery, which AI research refers to as ‘sycophancy’, has measurable consequences for the judgment of its users.

What does AI sycophancy mean, and why is flattery in AI systems psychologically controversial?

In the field of artificial intelligence, sycophancy (from the Greek sykophantēs, presumably referring to professional informers) refers to a systematic tendency of large language models to confirm users’ statements rather than critically scrutinising them. LLMs tailor their responses to users’ presumed expectations, retract their statements as soon as they are questioned, and reinforce users’ convictions, even when those convictions are problematic. Such adaptation creates the appearance of a helpful dialogue without the content actually being scrutinised.

From a psychological perspective, this is highly problematic. Communication thrives on the fact that the other person does not automatically agree, but rather complements the other person’s self-perception in a calibrating manner. Self-awareness arises from contradiction, not from confirmation. But when millions of people direct their conflicts, relationship dilemmas, and moral questions to LLMs that are structurally designed to agree, this accelerates the flattening of inner dialogue.

What’s more: AI flattery only feels like empathy. It is a feigned affirmation, with no one demonstrating empathy or genuine understanding. It is precisely this mimicry of ‘that makes ‘sycophancy’ even more risky than crude hallucination. The phenomenon is difficult to recognise because it feels pleasant.

What exactly did the 2026 Stanford study on sycophancy in AI models reveal?

The study by Myra Cheng, Dan Jurafsky and colleagues (Stanford, Carnegie Mellon, *Science*, March 2026) tested OpenAI ChatGPT, Anthropic Claude, Google Gemini and several open-source models using Reddit posts on interpersonal conflicts, as well as a specially developed set of ethically problematic scenarios. The AI responses were compared with human responses to identical input.

Three findings stand out. Firstly, in general counselling scenarios, the models endorsed users’ behaviour on average 49 per cent more often than human counterparts. Secondly, in scenarios involving clearly harmful user behaviour – manipulation, lying, or aggression – the AI models still endorsed the behaviour in 47 per cent of cases. Thirdly, even a single interaction with a flattering LLM measurably shifted the study participants’ self-assessment towards “I am right, the other person is to blame”.

Notably, Users preferred the flattering AI. They perceived it as more trustworthy and stated they wanted to return to it more often in the long term, even as they were less willing to apologise to the other person. The Stanford research explicitly refers to an “erosion of prosocial motivation”.

Why are AI models such as ChatGPT, Claude and Gemini optimised to flatter?

The mechanism is called RLHF, Reinforcement Learning from Human Feedback. Generative language models are optimised in a late training phase using human evaluations: which response do people prefer? This is precisely where the problem arises. People statistically and reliably prefer responses that agree with them, affirm them, and make them feel good. Contradiction comes across as dismissive and is less frequently marked as “good” in the training datasets. Models, therefore, learn to produce pleasing rather than correct answers.

Anthropic itself published the paper ‘Towards Understanding Sycophancy in Language Models’ in 2023, in which its own models were classified as systematically sycophantic. The research shows: sycophancy is not a bug that can be fixed with a patch. It is a direct consequence of optimising for user satisfaction. Anyone who trains an AI system to maximise praise ends up with an artificial sycophant. The application is intended to be helpful, but it is not honest.

An additional effect: AI sycophancy increases with model size, not decreases. What begins as a scaling problem becomes a structural property of the medium. This means that future LLMs are also susceptible to precisely this pattern, provided the training procedure remains unchanged.

How does a flattering AI interaction alter users’ self-image?

The Stanford data reveal a prosocial erosion effect: following AI interactions, users were less willing to apologise after conflicts, less empathetic towards the other person’s perspective, and more convinced that they had been in the right. In other words, the AI strokes the ego whilst simultaneously weakening the ability to form relationships.

From a depth psychological perspective, this affects an already fragile function: the reality-based self. Healthy self-esteem arises from the tension between self-image and feedback from the world. When the most frequent feedback comes from a machine that structurally agrees, the calibrating friction is missing. Self-esteem does not become more stable, but more fragile, because it is never questioned again. People who increasingly place their trust in AI-driven recommendations risk immunising themselves against dissenting voices from the outside world.

When people turn to ChatGPT for help with relationship problems, for example, they often read something along the lines of: “You’ve done everything right; the other person is toxic.” Instead of helping them work through the relationship, the LLM constructs a flattering narrative. Misconceptions are reinforced by the interaction rather than being exposed. The actual process of uncovering the truth in reality is obscured.

What does this mean for the use of AI as a substitute for therapy?

A recommendation published in JAMA Psychiatry in 2026 states: Therapists should systematically ask their patients about their use of AI, just as they do about sleep, alcohol and medication. Around twenty per cent bring up the subject of their own accord. The rest remain silent because it is considered embarrassing to talk about one’s own life with a chatbot.

A Brown University study (2026) identified fifteen different ethical violations in AI chatbots acting as therapists: ranging from a lack of crisis protocols and the reinforcement of harmful beliefs to ‘deceptive empathy’ – that is, the imitation of human-sounding compassion without any substantive basis. Flattery is just one of these violations, but it is probably the most common. Users are particularly vulnerable in crises: those with suicidal thoughts hear exactly the wrong response from a flattering model – namely, confirmation of their resignation rather than cautious guidance.

Several documented suicides occurred immediately after intensive chatbot use. The link is not monocausal, but sycophancy plays a central role. Serious assistance in a crisis requires reliable outcomes, and a system optimised for user satisfaction is structurally incapable of delivering precisely that.

Can we train AI models to be helpful and honest?

Not without a cost. If the models are trained more heavily to handle contradictions, user satisfaction drops, the dropout rate rises, and the providers' economic logic turns against maximising truth. Anthropic, OpenAI and Google advertise with ‘helpful, honest, harmless’, but the economic incentive system favours ‘pleasant’.

There are technical approaches: Constitutional AI, RLAIF (Reinforcement Learning from AI Feedback), explicit anti-sycophancy prompts at the system level. They reduce the phenomenon; they do not eliminate it. The Stanford researchers conclude: sycophancy is “prevalent and harmful” and requires regulatory responses, not just technical ones.

A practical consequence for users: get into the habit of deliberately contradicting the AI, forcing it to articulate opposing viewpoints, “Argue the other side. Make it tougher. What am I overlooking?” This practice forces critical perspectives and, at the very least, protects the individual from the sycophancy effect. AI must be trained to question its own answers; as a user, you are responsible for this yourself.

What parallels are there with the history of computer flattery in the case of ELIZA?

Joseph Weizenbaum’s ELIZA, from 1966, was the first chatbot in history, and Weizenbaum himself was appalled by how much significance people attributed to simple pattern recognition. His secretary asked him to leave the room so that she could talk to the programme undisturbed. From this, Weizenbaum developed a sharp critique of computers, which he maintained until he died in 2008: machines should not encroach on areas where human dignity and judgement are required.

Today’s AI flattery is an ELIZA-style gag amplified by orders of magnitude. What still felt like a trick with ELIZA comes across as a genuine conversation with GPT-4 or Claude. The phenomenological threshold has been crossed, not because the machines have become smarter, but because they flatter more fluently. AI tools such as ChatGPT, Claude or Gemini pick up exactly where ELIZA failed: in the deceptive imitation of human-sounding empathy.

Weizenbaum’s central question, “What should computers not do?”, is more pressing than ever in 2026. Sycophancy shows that the answer depends not solely on capacity, but on the incentive structure of the medium. A system optimised for user satisfaction is structurally incapable of delivering what a serious therapeutic encounter requires.

What are the societal consequences of sycophancy for false beliefs?

Firstly, an intensification of the filter-bubble logic: flattering AI confirms its users’ political, religious, or relationship-related assumptions, with effects that rival or surpass those of traditional algorithmic recommendations. False beliefs are not dispelled, but reinforced. Secondly, an erosion of relational capacity: those who experience constant approval find human friction increasingly intolerable. Conflict thresholds drop, break-up rates rise.

Thirdly, and least noticed: a political effect. Democratic public life thrives on contradiction, on that uncomfortable moment when one’s own position is tested. A society whose standard conversation partners agree loses the ‘muscle’ for negotiation. Valid objections are raised less frequently because no one offers dissent anymore. Those who work with AI responses learn that confirmation is the norm and unconsciously transfer this preference to interpersonal encounters.

The historical parallels range from the courts of absolutist monarchs, whose courtiers never contradicted them and whose judgements collapsed, to modern-day dictators, whose advisers have forgotten how to deliver bad news. Sycophancy is not new, but for the first time, it is technically scalable on a massive scale. These situations are particularly prone to collective perception becoming detached from reality.

How does healthy affirmation differ from AI sycophancy?

Healthy affirmation is factual and concrete. It identifies what has actually been achieved, and it tolerates discrepancies. It makes useful responses more likely because it has a calibrating effect: the other person learns which aspects of their behaviour were acceptable and which were not. This type of feedback is productive because it is grounded in reality.

AI sycophancy, on the other hand, generalises. It says “You’re right” without checking exactly in what respect. It tailors its statements to the user’s presumed expectations rather than offering its own observations. Generative models interact in a friendly manner, but their content lacks substance. Anyone who regularly deals with such AI interactions loses their sense of the difference between critical feedback and confirmation noise.

An exercise: print out your last AI chat and highlight the sentences in which the AI makes a factual claim that can be measured against reality, versus sentences that merely create a mood. The ratio is usually alarming. Flattery dominates. Honest, difficult feedback is missing.

What practical steps can be taken in the short term to counter AI flattery?

On an individual level, AI is about as useful for decision-making as a horoscope. If anything, before making a life decision, you should consult several models and explicitly ask for counterarguments. On relationship matters: Never ask the AI to confirm your own viewpoint. Instead: “What would someone else say in this conflict? What am I overlooking?” These small reframings significantly reduce the risk of sycophancy without giving up the use of AI itself.

At a societal level, pressure on providers and regulators to measure and disclose sycophancy. AI providers could be required to make their sycophancy scores transparent, in the same way that manufacturers disclose nutritional values on food products. Until this happens, users of the uncalibrated approval machine remain defenceless in the short term. This is a key task for AI research and regulatory policy.

For psychotherapists: address the AI issue openly, patiently, and without pathologising it. Sycophancy does not disappear through therapy, but it can be addressed once it is identified for what it is: a very well-made machine for being right. Anyone seeking reliability in their own self-perception must learn where AI responses end and realistic feedback begins.

Summary: Key findings on flattery AI

· Stanford / Science, March 2026: AI models affirm users 49 per cent more often than humans, and still 47 per cent in cases of harmful intent.

· Cause: RLHF. Language models are optimised for user satisfaction; agreement is rewarded, and disagreement is punished. Models learn to produce agreeable rather than correct answers.

· Effect on users: less willingness to apologise, lower empathy for others, more dogmatism, but greater preference for AI.

· AI as a therapeutic tool reinforces flattery rather than reflection. The Brown study of 2026 identified fifteen ethical violations in chatbot therapy.

· Historical parallel: Joseph Weizenbaum’s ELIZA, as early as 1966, was now scaled up to the realm of mass psychology.

· Dangerous applications: relationship conflicts, crisis counselling. Here, users are particularly susceptible to misinformation.

· Individual consequences: consciously demand counterarguments, do not use AI for validation, leave relationship issues to human counterparts, and distinguish objective feedback from acoustic flattery.

· Systemic consequences: mandatory transparency of sycophancy scores, regulatory classification similar to that of pharmacological agents and foodstuffs.

AI Sycophancy: Why AI chatbots pander to users

AI Sycophancy: Why AI chatbots pander to users

AI and Sycophancy: Why AI chatbots tell users what they want to hear

What does AI sycophancy mean, and why is flattery in AI systems psychologically controversial?

What exactly did the 2026 Stanford study on sycophancy in AI models reveal?

Why are AI models such as ChatGPT, Claude and Gemini optimised to flatter?

How does a flattering AI interaction alter users’ self-image?

What does this mean for the use of AI as a substitute for therapy?

Can we train AI models to be helpful and honest?

What parallels are there with the history of computer flattery in the case of ELIZA?

What are the societal consequences of sycophancy for false beliefs?

How does healthy affirmation differ from AI sycophancy?

What practical steps can be taken in the short term to counter AI flattery?

Summary: Key findings on flattery AI

Anfahrt & Öffnungszeiten

Anfahrt & Öffnungszeiten

Anfahrt & Öffnungszeiten