Silicon Valley startups are offering engineers half-million-dollar salaries to build digital training grounds for artificial intelligence (AI), competing with tech giants who plan to spend billions on the same technology.
Mechanize, Prime Intellect, and Surge are racing against Anthropic, OpenAI, and Meta to develop reinforcement learning (RL) environments. These simulated workspaces teach AI agents through trial and error, rewarding correct actions and penalizing mistakes. Anthropic alone reportedly plans to invest over $1 billion in 2025.
The technology marks a shift from training AI on text to teaching it through practice. Agents learn to navigate websites, write code, and handle customer service by repeating tasks thousands of times in simulated browsers and software. Companies believe that this interactive training will create AI agents capable of performing real-world tasks.
“Few understand how large the opportunity around RL environments truly is,” Brendan Foody, CEO of Mercor, said in a TechCrunch interview.
Major labs keep their systems private while startups take different paths. Prime Intellect launched an open-source platform where developers worldwide can build and share environments for bounties. CEO Vincent Weisser aims to democratize access to technology that major companies closely guard.
The money has attracted talent and skeptics alike. Scale AI pivoted to environments after losing contracts with Google and OpenAI. Surge created a new internal division focused entirely on the technology.
Former Meta AI research lead Ross Taylor sees problems ahead. “I think people are underestimating how difficult it is to scale environments. Even the best publicly available RL environments typically don’t work without serious modification,” he told TechCrunch.
Andrej Karpathy, former Tesla AI lead, invested in the space but posted doubts on X in August: “I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically.”
OpenAI’s API engineering lead, Sherwin Wu, expressed sharper skepticism on a TechCrunch podcast. “I’m short on RL environment startups… It’s a very competitive space, and AI research is evolving so quickly that it’s hard to serve AI labs well,” he said.
The race reflects a larger question of whether open platforms, like Prime Intellect’s Environments Hub, will democratize advanced AI training or if major labs with billion-dollar budgets will maintain control.
Engineers choosing between a startup’s $500,000 salary and a tech giant’s resources face the same dilemma. Their decisions could shape whether AI agents graduate from chatbots to reliable digital workers.
The technology remains challenged. Critics point to fundamental challenges like reward hacking, where agents find unintended shortcuts. Supporters argue that interactive training will unlock capabilities that text-based learning cannot achieve.
September 2025 saw multiple platform launches and funding announcements. The momentum suggests both sides are betting big, even as experts debate whether reinforcement learning can deliver on its promises.