Image source: Financial Times
In the past two years, the AI industry’s primary competitive focus has been on “training”—the race to build the most powerful large-scale models. The continuous evolution from GPT-4 to multimodal architectures has centered on pushing the limits of model capabilities.
However, at NVIDIA GTC 2026, Jensen Huang made it clear: the core arena for AI is shifting from Training to Inference.
This transformation reflects a new business dynamic: training is a largely one-time capital investment, while inference creates ongoing, usage-driven demand. As a result, AI is evolving from a technology-driven industry to a demand-driven one, and its economics shift from one-off capital expenditures (CapEx) toward recurring revenue.
The statement “data centers are Token factories” is not just marketing; it marks a new industrial paradigm. In the traditional internet era, data centers were cost centers that supported applications. In the AI era, this logic is fundamentally restructured: compute directly produces Tokens, and Tokens are sold as the product. This shift gives data centers, for the first time, the characteristics of production units.
A complete closed loop emerges: Compute investment → Inference computation → Token generation → Revenue realization
Within this framework, NVIDIA’s “AI Factory” concept redefines AI infrastructure using industrial principles. In other words, data centers have evolved from server clusters into something closer to “power plants” or “manufacturing facilities.”
The production function of the AI era can be expressed as:

Revenue = Tokens × Price
Cost = Compute Cost = Tokens × Cost per Token

Thus, profit simplifies to:

Profit = Revenue − Cost = Tokens × (Price − Cost per Token)
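This per-Token profit identity is simple enough to sketch directly. A minimal illustration follows; all figures here are hypothetical assumptions, not real market prices:

```python
# Hypothetical unit economics for a "Token factory".
# All numbers are illustrative assumptions, not actual prices.

def profit(tokens: float, price_per_token: float, cost_per_token: float) -> float:
    """Profit = Tokens x (Price - Cost per Token)."""
    return tokens * (price_per_token - cost_per_token)

# Example: 1 trillion tokens sold at $2 per million, produced at $0.50 per million.
tokens = 1e12
price = 2.0 / 1e6   # $ per token
cost = 0.5 / 1e6    # $ per token

print(profit(tokens, price, cost))  # roughly $1.5M of margin
```

The model makes the lever obvious: at fixed price, every cent shaved off cost per Token drops straight to profit, which is why efficiency dominates the rest of the article's argument.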
This model drives three key shifts:
The anticipated surge in inference demand stems from three structural changes:

1. From simple generation to complex reasoning: each invocation now incurs significantly higher computational costs.
2. From short inputs to long, rich contexts: AI is shifting from short text processing to much longer inputs and outputs, which dramatically increases computational requirements.
3. From human-driven to Agent-driven usage: AI Agents can chain many model invocations per task without a human in the loop. As a result, AI’s compute demand shifts from linear to exponential growth.
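The third shift is the most dramatic, and a toy model makes it concrete. The sketch below (all parameters hypothetical) compares one model call per human request against an agent that recursively fans out sub-calls:

```python
# Illustrative comparison of per-task model calls: a human issuing one
# request per task vs. an agent that decomposes each task recursively.
# `fanout` and `depth` are hypothetical parameters, not measured values.

def human_calls(tasks: int) -> int:
    # One invocation per task: demand grows linearly with tasks.
    return tasks

def agent_calls(tasks: int, fanout: int, depth: int) -> int:
    # An agent spawning `fanout` sub-calls per step, `depth` levels deep,
    # makes fanout^0 + fanout^1 + ... + fanout^depth calls per task.
    per_task = sum(fanout ** d for d in range(depth + 1))
    return tasks * per_task

print(human_calls(1000))        # 1000 calls
print(agent_calls(1000, 3, 3))  # 40000 calls for the same 1000 tasks
```

Per-task cost grows geometrically in agent depth, which is the sense in which agent adoption turns linear demand curves into exponential ones.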
At NVIDIA GTC 2026, NVIDIA also implicitly introduced a stratified AI service model, essentially tiered pricing for compute resources. The system mirrors the layered approach of cloud computing: different scenarios command different Token prices.
Ultimately, the decisive factor is who can produce Tokens at the lowest cost and sell them at the highest price.
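A tiered scheme of this kind can be sketched in a few lines. The tier names and prices below are invented for illustration, not NVIDIA’s or any vendor’s actual pricing:

```python
# Hypothetical tiered Token pricing, mirroring cloud-style service tiers.
# Tier names and dollar figures are illustrative assumptions only.

TIERS = {
    "batch":    0.2,  # $ per million tokens: offline, latency-tolerant jobs
    "standard": 1.0,  # interactive chat workloads
    "realtime": 4.0,  # low-latency, premium reasoning
}

def token_revenue(tier: str, millions_of_tokens: float) -> float:
    """Revenue for selling a volume of tokens at a given tier's price."""
    return TIERS[tier] * millions_of_tokens

print(token_revenue("batch", 500))     # same volume, low tier
print(token_revenue("realtime", 500))  # same volume, 20x the revenue
```

The same Token volume yields very different revenue depending on tier, which is why steering demand toward high-value scenarios matters as much as raw capacity.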
Jensen Huang projects that by 2027, the AI chip and infrastructure market could reach $1 trillion.
The core takeaway is that AI is becoming foundational infrastructure, comparable in role to power and telecommunications networks. This trend will drive three major changes:

1. Capital will flow from the application layer back to core infrastructure.
2. A new set of central players will emerge around chips, data centers, and compute supply.
3. AI is no longer just a software issue; it now involves hardware, energy, and physical infrastructure.
If Tokens are products, Agents are the “demand generators.” In the traditional internet, users created demand; in the AI era, Agents themselves generate demand, invoking models continuously on users’ behalf. This marks the first emergence of non-human demand entities in the AI economy, which means the scale of Agents sets the upper limit on inference demand. This is why AI competition is rapidly shifting toward building and operating Agents at scale.
While the “Token Factory” narrative is compelling, significant market concerns remain. If Token prices decline faster than production costs, profit margins will be squeezed. Many AI applications are still experimental, and their long-term demand is unproven. Together, these factors could undermine the stability of the Token economy.
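The margin-squeeze concern is easy to quantify with a toy sensitivity check. All prices below are hypothetical; the point is only the shape of the curve:

```python
# Illustrative sensitivity: margin per million tokens as the market price
# falls while production cost stays fixed. All numbers are hypothetical.

cost = 0.5  # $ per million tokens to produce
for price in (2.0, 1.0, 0.6, 0.5):
    margin = price - cost
    pct = margin / price * 100
    print(f"price ${price:.2f}/M -> margin ${margin:.2f}/M ({pct:.0f}%)")
```

A 4x price decline wipes out the margin entirely if costs do not fall in step, which is why the Token-factory thesis ultimately rests on continuous cost reduction.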
Abstracting the current trend reveals a key analogy: compute is the raw material, Tokens are the product, and Agents are the customers. This structure closely mirrors the industrial production systems of the Industrial Revolution, and it signals AI’s transition from a software industry to a compute-driven industrial system.
At NVIDIA GTC 2026, Jensen Huang’s “Token Factory” concept is not just a metaphor; it redefines the fundamental logic of the AI industry around the production and sale of Tokens.
With the rise of the Agent economy and surging inference demand, the AI infrastructure market is on track to reach a trillion-dollar scale.
If this trend continues, future business competition will be less about products or user numbers—and more about who can produce Tokens most efficiently.