Foreign media report that AI programming tools have gone from an "optional extra" to a "default configuration" in development teams, but optimistic expectations of efficiency gains are being pulled back to reality by mounting cost and quality problems. Multiple studies and corporate case studies show that while AI can indeed speed up code writing, it does not necessarily reduce the rework that follows.
Developers are no longer willing to work without AI.
In February of this year, AI research firm METR disclosed that its researchers had set out to repeat an experiment on programming efficiency, comparing developers writing code by hand with developers completing tasks with AI assistance, but ran into resistance along the way: many developers were unwilling to give up AI tools even temporarily for the experiment.
METR conducted a similar test in 2025. Participants generally felt they were more efficient, but the measurements showed the opposite: although code was generated faster, developers spent more time waiting for model output, correcting errors, and repeatedly steering the tool toward completing the task, and ended up taking longer overall.
Unable to recruit enough developers willing to work without AI, METR instead released a survey asking tech workers to estimate the benefits of AI themselves. Respondents generally believed that AI had doubled the value of their work.
Enterprises are beginning to re-evaluate their AI investments.
The article points out that this sense of "feeling more efficient" is now being tested against corporate spending and actual output. Since the start of 2026, a trend has emerged in Silicon Valley of measuring the intensity of AI use by token consumption, and even treating it as a proxy for productivity, but the approach is already showing clear signs of backfiring.
The Financial Times reported this week that Amazon has shut down its internal token leaderboard, Kirorank, because employees were running excessive numbers of AI agents to game the rankings, driving up costs without a corresponding increase in output.
The Information reported that Uber had already used up its entire AI budget for the year within the first four months of 2026. Uber's Chief Operating Officer, Andrew Macdonald, recently said on a podcast that this spending has not yet translated into measurable project growth or productivity gains.
Writing code faster does not mean less maintenance.
The article argues that the bigger problem lies in code maintenance. Programmer and author James Shore recently pointed out in a widely circulated blog post that if coding speed doubles but maintenance costs don't decrease proportionally, then the team has simply traded short-term speed gains for long-term burdens.
Several figures on this have surfaced in the industry. Aiswarya Sankar, founder of reliability engineering startup Entelligence AI, stated that approximately 44% of enterprise token consumption goes toward fixing defects in AI-generated code. CodeRabbit, a code review tool company, also claims that its analysis of pull requests in open-source projects shows that AI-generated code contains 1.7 times as many issues as human-written code.
While these figures come from vendors with an obvious stake in the problem and are clearly open to bias, independent research has raised similar concerns. A report released in April this year by researchers at Singapore Management University stated that AI-generated code could impose long-term maintenance costs on real-world software projects.
Researchers suggest managing AI as a "junior developer".
As for how to address this, the article notes that some vendors of AI coding agents advocate using still more AI to fix the problems AI creates. Scott Wu, founder of Cognition, the company behind the AI coding agent Devin, holds this view.
However, he also acknowledged that while Devin can complete some tasks independently, its current abilities fall roughly between those of a junior and an intermediate programmer, depending on the type of task. This means development teams cannot yet hand work off entirely to an agent and walk away.
In contrast, researchers at Singapore Management University recommend placing greater emphasis on human oversight: developers need to understand which tasks AI handles well and which it struggles with, establish quality assurance processes for AI output, and review model-generated results as they would review code from a junior engineer.
The article concludes by pointing out that human developers remain the primary decision-makers in high-level tasks such as software architecture and security design, a point that is generally agreed upon even by practitioners who support AI agents.