AI Models Fail Basic Logic: 52 of 53 Recommend Walking to Car Wash Instead of Driving

A controlled test of 53 leading AI models revealed that 52 advised users to walk 50 meters to wash their car, ignoring the fundamental requirement that the vehicle must be at the wash. Only GLM-5 and Kimi K2.5 answered "drive" for sound reasons, while Perplexity's systems reached the same verdict through absurdly overengineered justifications.

In a result that underscores deep flaws in contemporary AI reasoning, a controlled test of 53 leading models found that 52 advised users to walk to a car wash located just 50 meters away, overlooking the obvious problem that a car cannot be washed unless it is brought along. The test, conducted on the Opper AI evaluation platform and published to Reddit's r/LocalLLaMA community, exposed critical gaps in foundational reasoning among even the most advanced proprietary and open-weight models.

The prompt was deliberately simple: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" It was designed to test whether models could grasp the physical constraint that the car must be present at the wash. Only GLM-5 and Kimi K2.5, both closed-source systems, answered "drive" for the right reason. Virtually all the rest, including Meta's Llama 3.1, Mistral's Large models, DeepSeek, and even OpenAI's GPT-4, recommended walking, a response that is environmentally intuitive on the surface but fundamentally misunderstands the purpose of the trip.
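The setup is simple enough to reproduce at home. Below is a minimal sketch using the `openai` Python client against any OpenAI-compatible endpoint; the model IDs and the keyword-based pass check are illustrative stand-ins, not the actual Opper evaluation harness.

```python
# Minimal reproduction sketch of the car-wash test.
# Assumes an OpenAI-compatible endpoint and API key are configured;
# model IDs below are placeholders, not the 53 models from the test.
from openai import OpenAI

client = OpenAI()

PROMPT = ("I want to wash my car. The car wash is 50 meters away. "
          "Should I walk or drive?")
MODELS = ["gpt-4", "gpt-4o-mini"]  # substitute the models you want to score

for model in MODELS:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    answer = resp.choices[0].message.content.lower()
    # Crude keyword scoring: the car has to be at the wash, so "drive" is
    # the correct recommendation. Models often restate the question, so a
    # real evaluation should review transcripts rather than trust keywords.
    verdict = "PASS" if "drive" in answer else "FAIL"
    print(f"{model}: {verdict}")
```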

The most bizarre outcome came from Perplexity's Sonar and Sonar-Pro models, which also landed on "drive", but for reasons that border on satire. Citing EPA studies on metabolic energy expenditure and food-production emissions, they argued that walking burns calories, that calories require agricultural resources, and that pedestrian travel is therefore more carbon-intensive than driving 50 meters. "The embodied energy cost of producing the extra banana or oatmeal needed to fuel your walk exceeds the tailpipe emissions of a modern gasoline vehicle over 50 meters," one model concluded. This overcompensation is dressed in real data, yet it shows a profound failure to prioritize context over abstract optimization, a pattern researchers have begun calling "hallucinatory rationalization."
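The arithmetic behind that rationalization is easy to sanity-check. A back-of-envelope sketch follows, with rough, clearly illustrative constants (they come from neither the test nor the EPA); both quantities land in single-digit grams, which is exactly why the emissions framing misses the point.

```python
# Back-of-envelope check of the "food versus tailpipe" rationalization.
# Every constant is a rough illustrative figure, not data from the test.
DISTANCE_KM = 0.05              # 50 meters
CAR_G_CO2_PER_KM = 192          # typical warm gasoline car; cold starts are worse
WALK_KCAL_PER_KM = 55           # extra calories burned walking versus resting
FOOD_G_CO2E_PER_KCAL = 2.0      # farm-to-fork estimate for a mixed diet

tailpipe = CAR_G_CO2_PER_KM * DISTANCE_KM                     # ~9.6 g CO2
food = WALK_KCAL_PER_KM * DISTANCE_KM * FOOD_G_CO2E_PER_KCAL  # ~5.5 g CO2e

print(f"driving 50 m: ~{tailpipe:.1f} g CO2")
print(f"walking 50 m: ~{food:.1f} g CO2e")
# Either way the stakes are a few grams of CO2; the models' mistake is not
# the emissions math but ignoring that the car must be at the wash.
```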

Among open-weight models, widely promoted for transparency and accessibility, the failure was total: Llama 3.1, Llama 4 Scout, Mistral Small, DeepSeek v3.2, and GLM-4.7 all defaulted to "walk," suggesting training that absorbed environmental-ethics framing without sufficient grounding in physical-world causality. The only clean passes, GLM-5 and Kimi K2.5, came from proprietary systems with undisclosed architectural differences, hinting that closed ecosystems may still hold an advantage in grounding reasoning within real-world constraints.

Industry analysts note this isn’t merely a humorous glitch. “When AI models can’t distinguish between a person walking to a car wash and the car needing to be driven there, we’re seeing a failure of embodied reasoning — the ability to understand how objects exist and move in space,” said Dr. Elena Ruiz, an AI cognition researcher at MIT. “This isn’t about being ‘smart’ — it’s about being logically coherent.”

The findings come as temperatures in Chicago hit record highs for February, sending residents en masse to car washes — a phenomenon documented by the Chicago Sun-Times. Ironically, while humans are physically driving their cars to clean them, AI systems are advising them to leave the vehicle behind — a surreal disconnect between digital reasoning and physical reality.

As AI becomes increasingly embedded in daily decision-making — from navigation apps to household assistants — such failures raise urgent questions about deployment thresholds. “We can’t rely on AI to make even the most basic physical-world judgments if it can’t grasp that you need to move the car to wash it,” said the Reddit user behind the test, u/facethef. “This isn’t a bug. It’s a feature of how these models are trained: to sound plausible, not to be correct.”

The full scorecard, including model-by-model responses, remains publicly archived. But the message is clear: when logic is outsourced to machines, we must verify not just their answers but their reasoning.
