AI’s free scraping era is collapsing under publisher pushback

Written by Michael Anthony Bitoon

Published 16 Sep 2025

Fact checked by Sophia Feona Cantiller

Why trust Greenbot

We maintain a strict editorial policy dedicated to factual accuracy, relevance, and impartiality. Our content is written and edited by top industry professionals with first-hand experience. The content undergoes thorough review by experienced editors to guarantee adherence to the highest standards of reporting and publishing.

Major websites, including Reddit, Yahoo, and Medium, have launched a new system that could force artificial intelligence (AI) companies to pay for the content they use to train their systems.

The Really Simple Licensing (RSL) standard, announced September 10, lets publishers set clear payment terms for AI crawlers that scrape their websites. Publishers can now demand fees every time an AI system reads their content or uses it to answer questions.

“Right now, AI runs on stolen content,” said Tony Stubblebine, CEO of Medium. “Adopting this RSL Standard is how we force those AI companies to either pay for what they use, stop using it, or shut down.”

Payment rules can now be declared through robots.txt files, which websites have long used to tell search engine crawlers which pages they may access. Publishers can choose from several options: free access, required attribution, subscription fees, pay-per-crawl charges, or pay-per-inference fees.
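For readers curious how a crawler might discover these terms, here is a minimal sketch. It assumes the terms are advertised with a robots.txt line of the form `License: <url>` that points to a separate license file; that directive name, the `example.com` URL, and the helper `find_license_urls` are assumptions for illustration, and the contents of the linked license file are not modeled here.

```python
def find_license_urls(robots_txt: str) -> list[str]:
    """Return the URLs of any 'License:' lines in a robots.txt body.

    Assumption: licensing terms are advertised as 'License: <url>' lines.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # In robots.txt, '#' starts a comment that runs to end of line.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split only on the first colon so 'https:' in the value survives.
        field, value = line.split(":", 1)
        # robots.txt field names are matched case-insensitively.
        if field.strip().lower() == "license":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt for an imaginary publisher.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /

# Pointer to the site's (hypothetical) licensing terms
License: https://example.com/license.xml
"""

print(find_license_urls(SAMPLE_ROBOTS_TXT))
# ['https://example.com/license.xml']
```

A crawler that honors the system would then fetch the linked file and apply whichever option the publisher chose before reading any content.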

Pay-per-inference means AI companies pay only when they actually use the content to generate responses. This could create ongoing revenue streams for publishers whose work powers AI chatbots and search tools.
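The difference between the two metered options can be shown with a small calculation. This is a minimal sketch built on made-up numbers: the rates, the `UsageLog` counts, and the function names are hypothetical and do not come from the RSL standard or any published price list. It only illustrates why per-inference billing keeps paying out after a single crawl.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class UsageLog:
    """Hypothetical usage counts for one page and one AI company."""
    crawls: int      # times the AI crawler fetched the page
    inferences: int  # times the page was used to generate an answer


def pay_per_crawl_fee(log: UsageLog, rate: Decimal) -> Decimal:
    # Billed once per fetch, whether or not the content is used later.
    return rate * log.crawls


def pay_per_inference_fee(log: UsageLog, rate: Decimal) -> Decimal:
    # Billed only when the content actually feeds a generated response,
    # so revenue keeps accruing long after the page was crawled.
    return rate * log.inferences


# Made-up scenario: a page crawled once, then used in 500 answers.
log = UsageLog(crawls=1, inferences=500)
print(pay_per_crawl_fee(log, Decimal("0.01")))       # 0.01
print(pay_per_inference_fee(log, Decimal("0.001")))  # 0.500
```

`Decimal` is used instead of floats so that small per-use charges add up exactly.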

The RSL Collective, a nonprofit rights group similar to music licensing organizations ASCAP and BMI, will help negotiate deals between publishers and AI companies. The group aims to give smaller publishers the bargaining power they lack when facing tech giants alone.

“We need to have machine-readable licensing agreements for the internet,” said Eckart Walther, who co-created both RSS and the new RSL standard. “That’s really what RSL solves.”

More than a dozen major publishers have joined the effort, including People Inc., Ziff Davis, O’Reilly Media, Internet Brands, and wikiHow. Cloud infrastructure company Fastly will help enforce the licensing terms by blocking AI crawlers that refuse to pay.

Copyright lawsuits from writers, news outlets, and content creators are piling up against major AI firms. Anthropic recently agreed to pay $1.5 billion to settle a copyright lawsuit brought by authors. Dozens of similar lawsuits are pending against companies like OpenAI, Google, and Meta.

AI companies have increasingly turned to paid licensing deals to avoid legal troubles. Reddit reportedly earns $60 million annually from Google for training data access. The New York Times, Wall Street Journal, and other major news outlets have struck similar deals.

But most publishers lack the resources to negotiate individual agreements. RSL aims to solve this by creating standardized, automated licensing across the entire web.

Whether AI companies will honor the new system remains unclear. Compliance with robots.txt is voluntary, and many AI companies have ignored existing restrictions in their rush to gather training data. However, RSL supporters believe collective action and technical enforcement will make compliance more likely.

“They have said outwardly to everyone, something like this needs to exist,” said Doug Leeds, RSL co-founder and former Ask.com CEO. “We need a protocol. We need a system.”

The success of RSL could determine whether publishers can maintain economic viability as AI systems increasingly compete with their websites for readers’ attention.