Open-Source Tool Emerges After Anthropic Accuses Chinese Labs of Scraping Claude
Following allegations that Chinese AI labs scraped Claude conversations to train competing models, a developer released DataClaw—a tool that open-sources 155,000 anonymized Claude interactions. The move has sparked global debate over AI training ethics, data ownership, and corporate double standards.

In a dramatic escalation of the global AI training data war, a developer has released DataClaw—an open-source tool that aggregates and anonymizes 155,000 Claude conversations—just days after Anthropic publicly accused Chinese AI laboratories of illicitly scraping user interactions to train rival models. The release, which garnered over 360 GitHub stars within 24 hours and drew a rare public acknowledgment from Elon Musk, has ignited a fierce debate about the ethics of AI training data, corporate hypocrisy, and the future of proprietary AI models.
According to Anthropic’s official blog, the company released a revised constitution for its Claude models on January 22, 2026, explicitly outlining ethical boundaries and behavioral guidelines designed to ensure AI systems operate with integrity, transparency, and respect for user data. The constitution, described as a foundational document shaping Claude’s training and outputs, emphasizes the importance of “respecting the origins of information” and “avoiding exploitation of freely shared data.” Yet, in early February 2026, Anthropic publicly alleged that Chinese AI labs had systematically harvested public Claude conversations to enhance the performance of their own models, including the recently unveiled DeepSeek-V3, which some observers noted bore striking similarities to Claude Opus 4.6.
Anthropic’s response, which demanded stricter data access controls and tighter API policies, was met with backlash from the open-source AI community. Critics argued that Anthropic had built its own models using vast quantities of publicly available internet data, including Reddit threads, GitHub repositories, and public forum posts, before imposing restrictive terms to prevent others from doing the same. “It’s like pulling up the ladder after you’ve climbed it,” reads the README of DataClaw, created by developer Peter O’Mallet. “Anthropic used open data to train Claude. Now they’re trying to lock the door.”
DataClaw is designed to be simple: users input their own Claude interactions (via API exports or copy-paste), and the tool anonymizes, structures, and compiles them into a reusable dataset. The repository includes documentation on ethical anonymization techniques and encourages contributors to share their data under open licenses. Within hours of its launch, GitHub repositories began popping up with curated subsets of the dataset, including specialized collections for coding, legal reasoning, and medical advice—mirroring the very domains where Anthropic claims Claude Opus 4.6 excels.
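The article does not describe DataClaw's internals, so the following is only a minimal Python sketch of the workflow described above (ingest, anonymize, structure, compile), not the tool's actual code. The function names, the record layout (`user_id`, `messages`, `role`, `content`), the two regex patterns, and the salted-hash pseudonym are all assumptions made for illustration.

```python
import hashlib
import json
import re

# Hypothetical patterns for illustration only; real anonymization would
# need far broader coverage (names, addresses, API keys, and so on).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def pseudonymize(user_id: str, salt: str) -> str:
    """Map a user identifier to a short salted SHA-256 digest."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:12]


def anonymize_conversation(conversation: dict, salt: str) -> dict:
    """Return a copy with the user ID hashed and message text redacted."""
    return {
        "user": pseudonymize(conversation["user_id"], salt),
        "messages": [
            {"role": m["role"], "content": redact(m["content"])}
            for m in conversation["messages"]
        ],
    }


def to_jsonl(conversations: list, salt: str) -> str:
    """Serialize anonymized conversations, one JSON object per line."""
    return "\n".join(
        json.dumps(anonymize_conversation(c, salt), ensure_ascii=False)
        for c in conversations
    )
```

Under these assumptions, `to_jsonl(exported_conversations, salt="some-secret")` would yield a JSON Lines string that could be written to a file and shared. Pattern-based redaction of this kind is a starting point at best, since it misses any personal detail that does not match a known pattern.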
Anthropic has not yet issued a formal response to the DataClaw release. However, according to internal documents leaked to tech publication AI Weekly, the company is evaluating legal options under its updated Terms of Service, which now prohibit “reverse engineering of model behavior through interaction harvesting.” Legal experts remain divided on whether such terms are enforceable, particularly when data is voluntarily shared by users in public-facing interfaces.
The controversy underscores a deeper tension in the AI industry: as models grow more powerful, the lines between proprietary innovation and open collaboration blur. While Anthropic touts its Responsible Scaling Policy and Transparency Initiative as pillars of ethical AI development, the DataClaw movement suggests a growing belief among developers that the era of corporate control over AI training data is unsustainable—and morally inconsistent.
Meanwhile, researchers at MIT and Stanford are already using DataClaw datasets to benchmark model alignment and detect potential data contamination in open-source LLMs. The tool’s rapid adoption signals a potential shift in power: instead of waiting for corporations to self-regulate, the AI community is taking data sovereignty into its own hands.


