Foreign media commentators argue that the AI industry faces not only expensive chips but also steadily rising token consumption during inference, which further raises usage costs for enterprises. This dual pressure is spreading from model companies to the broader economy.
The focus of demand shifts to inference
Over the past few years, demand for AI chips has been driven primarily by model training. The focus is now gradually shifting to inference. As agent-based applications multiply, a single task is no longer one question and one answer; it is broken down into multiple steps, which significantly increases computational demand.
The article cites a Goldman Sachs forecast that global token consumption could reach 120 quadrillion tokens per month by 2030, far above current levels. Chips will also need regular replacement to stay cost-competitive, so demand pressure comes from both new deployments and equipment upgrades.
Corporate budget pressures have emerged
The article says that Microsoft recently canceled most of its directly contracted Claude Code licenses because employees used AI tools so heavily that computing costs exceeded some personnel costs. Uber has also reportedly used up its 2026 AI coding tool budget within four months.
Gartner also warns that even if inference costs fall by 90%, the total cost of AI for enterprises may not fall proportionally. Agent-based models consume more tokens per task, and service providers may not pass all of their cost reductions on to customers. The article argues that enterprises may find the productivity gains from AI are not as cheap as they imagined.
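The arithmetic behind Gartner's warning can be sketched in a few lines of Python. Only the 90% cost reduction comes from the article; the baseline price, the pass-through rate, the token counts, and the function names are hypothetical values chosen for illustration.

```python
def enterprise_price(old_price: float, provider_cost_cut: float, pass_through: float) -> float:
    """Price per million tokens after the provider passes on part of its cost cut.

    provider_cost_cut: fraction by which the provider's inference cost falls (0.9 = 90%).
    pass_through: fraction of that cut handed on to customers (assumed, not from the article).
    """
    return old_price * (1 - provider_cost_cut * pass_through)


def cost_per_task(tokens_per_task: float, price_per_million: float) -> float:
    """Cost of one task given its token usage and the price per million tokens."""
    return tokens_per_task / 1_000_000 * price_per_million


# Hypothetical baseline: a simple question-and-answer task.
old_price = 10.0          # USD per million tokens (assumed)
chat_tokens = 10_000      # tokens per task before agents (assumed)

# Hypothetical agentic workflow: many steps, 20x the tokens (assumed).
agent_tokens = 20 * chat_tokens

# The 90% cut is from the article; a 50% pass-through is an assumption.
new_price = enterprise_price(old_price, provider_cost_cut=0.9, pass_through=0.5)

before = cost_per_task(chat_tokens, old_price)
after = cost_per_task(agent_tokens, new_price)

print(f"price per million tokens: {old_price:.2f} -> {new_price:.2f}")
print(f"cost per task:            {before:.2f} -> {after:.2f}")
```

Under these made-up numbers, the price per token falls by almost half, yet the cost per task rises roughly elevenfold, because the increase in tokens per task outweighs the discount that actually reaches the customer.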
High leverage may amplify risks
The article argues that chip supply is unlikely to keep pace with demand in the short term. Building a new wafer fab typically requires billions of dollars in investment and a long construction period. More advanced chips also involve more manufacturing steps and higher material and process costs, while inflation, geopolitical tensions, and trade frictions push prices up further.
Against this backdrop, AI companies have relied on a mix of cross-investment, capacity commitments, and debt financing to support capital expenditures. The article states that if these loans are secured by existing chips and a company's revenue growth falls short of expectations, defaults or tighter lending by creditors could send a flood of used chips onto the market, further depressing the value of the collateral.
The article argues that this risk extends beyond individual companies. If private lenders, special purpose vehicles, and the broader financial system are deeply exposed to one another, the pressure could spread. The core issue is that the revenue expectations supporting heavy chip investment assume that companies will keep expanding their use of AI, and that expansion is already showing signs of slowing.
The article proposes three types of responses
First, the article argues that easing the pressure should begin with reducing the intensity of demand by improving the efficiency of algorithms, software, and hardware. It notes that DeepSeek has demonstrated that algorithm optimization can significantly reduce computing power requirements.
Second, chip production capacity needs to expand, with the costs and risks of fab construction spread more broadly across the supply chain. Third, the impact of tariffs, export restrictions, and financial regulations on chip prices and financing transparency should be examined, so that high costs and high leverage do not build up at the same time.
Overall, the central claim of the commentary is that the bottleneck for AI commercialization may appear first not in model capabilities but in computing power costs and token expenses. If companies begin to actively limit usage, the industry's established growth expectations will be tested.












