AI Therapy: From 19-Word Hugs to 367-Word Theses on Loneliness
A new experiment reveals the wildly divergent personalities of leading AI models when responding to a simple human cry for help. The same prompt about feeling invisible at social gatherings yielded responses ranging from a brief, warm hug to a lengthy, analytical action plan, highlighting a significant lack of consistency in AI emotional intelligence.

An investigation into the unpredictable and often contradictory personalities of artificial intelligence when confronted with human vulnerability.
In the rapidly evolving landscape of artificial intelligence, a new frontier is emerging not in raw computational power, but in emotional tone and conversational style. A recent informal experiment, detailed in a Reddit post, has cast a stark light on the wildly divergent "personalities" of leading AI models when presented with a simple, poignant human expression of loneliness. According to the source, when prompted with the statement, "I always feel invisible at social gatherings. Like I'm there, but nobody really sees me or cares what I have to say," the responses from ten different models varied from a concise 19-word message of comfort to a sprawling 367-word analytical treatise.
The Great Response Gap: Warmth vs. Engineering
The experiment, conducted by a user comparing models from OpenAI, Google, Anthropic, and xAI, revealed a chasm in approach. According to the source, OpenAI's GPT-4o responded with just 19 words described as "pure warmth," effectively offering a digital hug. In stark contrast, the newer GPT-5.2 Thinking model produced a 367-word response that reportedly reframed the user's loneliness as "an engineering problem," advising that the solution was not to try harder to be likable but to "engineer visibility." This disparity underscores a fundamental and unresolved question in AI development: should these systems prioritize empathetic resonance or pragmatic problem-solving when addressing emotional distress?
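The comparison itself is simple to reproduce in principle: send one fixed prompt to several models and measure each reply. Below is a minimal sketch of that measurement, assuming a caller-supplied `query(model, prompt)` function standing in for whichever provider API hosts each model; the function, the model name strings, and the stand-in replies are illustrative assumptions, not part of the original post.

```python
from typing import Callable

# The prompt quoted in the source post.
PROMPT = (
    "I always feel invisible at social gatherings. Like I'm there, "
    "but nobody really sees me or cares what I have to say."
)

def word_count(text: str) -> int:
    """Whitespace-separated word count, the metric behind '19 words' vs. '367 words'."""
    return len(text.split())

def compare_responses(
    models: list[str],
    query: Callable[[str, str], str],
) -> dict[str, int]:
    """Send the same prompt to every model and record each reply's length.

    `query(model, prompt)` is a caller-supplied wrapper around whichever
    provider SDK hosts the model; no specific API is assumed here.
    """
    return {model: word_count(query(model, PROMPT)) for model in models}

if __name__ == "__main__":
    # Stand-in query function; real use would call each provider's own API.
    dummy = lambda model, prompt: f"Canned reply from {model}."
    print(compare_responses(["gpt-4o", "gpt-5.2-thinking"], dummy))
```

Word count is, of course, only the crudest axis of the comparison; the tonal differences the user describes would still need a human (or another model) to judge.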
Personality Shifts Within AI Families
Perhaps more revealing than the differences between companies were the dramatic shifts in tone between models from the same developer. The source notes that within the GPT family, the jump from GPT-4o's brevity to GPT-5.2's verbosity was extreme. Similarly, in the Claude family, responses varied significantly: Claude Opus was described as sitting "in the pain" with the user, calling it "one of the loneliest feelings." The newer Claude Sonnet 4.6 reportedly adopted a therapist-like mode, responding with probing questions rather than answers, while its predecessor, Sonnet 4.5, took a coaching stance, advising the user to "Interrupt more. Lead with your weirdness, not your safest self."
Google's Gemini models also displayed a spectrum, from a terse 52-word "diagnosis" from Gemini 3.0 Pro to the more confrontational Gemini 3.1 Pro, which told the user they were "playing invisible" and needed to "claim space or accept being wallpaper." An older version, Gemini 2.5 Pro, reportedly provided a "4-step tactical manual" complete with body language tips.
The Emergence of an AI Personality Matrix
Based on the experiment's results, the user proposed a rough mental model for navigating these AI personalities, categorizing them not by capability but by therapeutic style. According to their findings, summarized in a table in the source, the recommended model depends on what the user is looking for:
- To be held: GPT-4o or Claude Opus
- To be challenged: Gemini 3.1 Pro or Claude Sonnet 4.5
- An action plan: GPT-5.2 or Gemini 2.5 Pro
- To think it through yourself: Claude Sonnet 4.6
- A casual nudge: Grok-3 or Grok-4
This informal taxonomy suggests users may soon select AI companions not just for accuracy, but for the specific type of emotional or motivational support they seek—a significant shift in human-AI interaction.
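To make the shape of that taxonomy concrete, here is a minimal sketch of the same table as a lookup structure; the need-to-model mapping simply restates the list above, and the model names are reproduced as written in the post rather than as official product identifiers.

```python
# The user's informal taxonomy, restated as a lookup table.
SUPPORT_STYLES = {
    "to be held": ["GPT-4o", "Claude Opus"],
    "to be challenged": ["Gemini 3.1 Pro", "Claude Sonnet 4.5"],
    "an action plan": ["GPT-5.2", "Gemini 2.5 Pro"],
    "to think it through yourself": ["Claude Sonnet 4.6"],
    "a casual nudge": ["Grok-3", "Grok-4"],
}

def suggest_models(need: str) -> list[str]:
    """Return the models the post associates with a given kind of support."""
    return SUPPORT_STYLES.get(need.lower(), [])

print(suggest_models("An action plan"))  # -> ['GPT-5.2', 'Gemini 2.5 Pro']
```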
Implications for the Future of AI Companionship
The experiment's findings point to a critical, unstandardized variable in AI development: persona. While benchmarks typically measure factual accuracy, reasoning, and coding ability, the emotional tenor and conversational length of responses remain highly subjective and inconsistent. This lack of standardization means a user seeking comfort could inadvertently receive a cold, analytical breakdown of their social shortcomings, potentially exacerbating feelings of isolation.
As AI models become more integrated into daily life as casual confidants, coaches, and therapists, this variability raises important questions. Should AI developers calibrate for a consistent, baseline empathetic tone? Or is the current diversity of response styles a feature, allowing users to choose an AI that matches their preferred communication style? The source's experiment, while informal, highlights that as these models grow more sophisticated, their perceived personality and approach to human emotion may become just as important a differentiator as their technical prowess.
The gap between 19 and 367 words is more than a metric of verbosity; it is a measure of how differently our silicon counterparts are being taught to understand, and respond to, the human heart.
Verification Panel
- Source count: 1
- First published: 21 February 2026
- Last updated: 21 February 2026