Powering the AI Revolution: Building the Infrastructure That Makes Intelligence Possible
By Manja Thessin

[Figure 1 reproduces an Our World in Data chart: training computation (total petaFLOP, logarithmic axis) versus time, from Theseus and the Perceptron Mark I in the 1950s through AlexNet, the Transformer (2017), GPT-1, and GPT-4. Regression lines show growth of 1.5x/year between 1950–2010 and 4.3x/year between 2010–2025.]

FIGURE 1: This graph from Our World in Data illustrates the exponential growth of computational power used to train notable AI systems since 2010. The regression lines show a sharp rise in computation since 2010, driven by the success of deep learning methods that leverage neural networks and massive datasets. Computation is measured in total petaFLOP; one petaFLOP is 10¹⁵ floating-point operations.¹ Estimates are drawn from the AI literature and are accurate within a factor of 2, or a factor of 5 for recent models like GPT-4. Source: Our World in Data; data source: Epoch AI (2025). OurWorldinData.org/artificial-intelligence | CC BY

1. A floating-point operation (FLOP) is a single arithmetic computer operation on floating-point numbers, such as addition, subtraction, multiplication, or division.
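The growth rates annotated in Figure 1 can be translated into more intuitive quantities, such as a doubling time. The sketch below is a back-of-the-envelope check using the chart's 4.3x/year annotation; the constant and function names are mine, not from the source.

```python
import math

# Figure 1 annotates roughly 4.3x/year growth in training compute, 2010-2025.
ANNUAL_FACTOR = 4.3

def doubling_time_years(annual_factor: float) -> float:
    """Years for compute to double at a fixed annual growth factor."""
    return math.log(2) / math.log(annual_factor)

def total_growth(annual_factor: float, years: int) -> float:
    """Total multiplier after `years` of compound growth."""
    return annual_factor ** years

print(f"Doubling time: {doubling_time_years(ANNUAL_FACTOR):.2f} years")  # ~0.48 years, under 6 months
print(f"Growth 2010-2025: {total_growth(ANNUAL_FACTOR, 15):.2e}x")       # ~3.2e9, about three billion-fold
```

At 4.3x per year, training compute doubles roughly every six months, which is why capacity planning horizons have collapsed from years to quarters.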
AI is rapidly transforming the physical structure of data centers. Facilities are no longer designed around servers and processors; the movement of data and the delivery of energy are now the primary concerns. Chip performance is no longer the bottleneck. It is the network that connects those chips, the optical fiber systems that carry their traffic, and the power and cooling systems that keep everything running within the limits of physics. Above all, it is infrastructure that is strained far beyond early expectations.

HYPERGROWTH BY THE NUMBERS
Since 2010, AI compute demand has grown by a factor of roughly 4.3 every year (Figure 1). Growth at that rate leaves no room for casual capacity planning. Hyperscale campuses are being designed from the ground up as AI factories with industrial-scale footprints. Larger models, longer training cycles, and a relentless need for parallelism have become the defining characteristics of modern workloads. An organization that is not bracing to scale risks being left behind. Modern accelerators are outrunning our ability to power, cool, and connect them. The most publicized example is the Stargate project, projected at 1.2 gigawatts (GW) of load. That is one campus. Every major operator has facilities of similar ambition on the roadmap. The new constraints are painfully concrete: permits, transmission lines, optical fiber routes, transformer lead times, and thermal budgets.

A CATALYST FOR CONVERGENCE
AI has fused disciplines that the industry managed in separate silos for years. Compute, data movement, and energy must now be designed as an integrated system, and performance depends on balance across all three. On the compute side, density sets the tone. Racks packed with GPUs and custom accelerators are no longer outliers: a rack pulling 150 kilowatts (kW) is a standard deployment, with roadmaps pushing beyond 200 kW. That thermal load alters airflow, floor loading, service access, and the layout of power distribution.

On the data movement side, the traffic pattern has flipped. In legacy data centers, the focus was on north-south flows; today, east-west traffic dominates (Figure 2). When training a model across thousands of GPUs, the real bottleneck is not compute capacity but how efficiently those GPUs can exchange, synchronize, and update data with each other inside the cluster. Multi-terabit exchanges of model weights and gradients occur nonstop across the internal fabric. Latency, especially unpredictable, high-percentile ("tail") latency, directly impacts performance and utilization rates. At hyperscale, even a single millisecond of extra delay cascades through thousands of nodes. Meta and others report that 30–40 percent of AI system time can be lost purely to network wait states. AI clusters look less like conventional data centers and more like dense, ultra-connected fabrics where horizontal bandwidth and deterministic low latency matter most.

That is why optical fiber interconnects have become indispensable. Higher east-west capacity directly boosts utilization and model convergence. It also means network infrastructure faces new challenges: managing congestion, minimizing optical fiber losses, and ensuring reliability as optical fiber densities and switch radix continue to scale.

POWER AS STRATEGY
Energy is now a strategic asset for AI data centers, not just a utility cost. Major operators sign power purchase agreements for entire renewable farms, co-fund grid upgrades, and reserve transformer production years ahead. Securing clean, reliable power has become a competitive advantage. Data centers will consume approximately 536 terawatt hours (TWh) in 2025, roughly 2 percent of global electricity (Figure 3), and that number is expected to double to 1,065 TWh by 2030. AI will drive this surge: training models like GPT-4 requires more than 30 megawatts (MW).
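The projection of data center consumption doubling from 536 TWh to 1,065 TWh between 2025 and 2030 implies a steep compound growth rate. A quick sanity check, using only the article's numbers (the function name is mine):

```python
# Back-of-the-envelope check on the projection cited in the article:
# 536 TWh in 2025 roughly doubling to 1,065 TWh by 2030.
def implied_annual_growth(start_twh: float, end_twh: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a horizon."""
    return (end_twh / start_twh) ** (1 / years) - 1

rate = implied_annual_growth(536.0, 1065.0, 5)
print(f"Implied growth: {rate:.1%} per year")  # about 14.7% per year
```

Roughly 15 percent compound annual growth is far above historical electricity demand growth in most grids, which is why transformer lead times and transmission capacity have become gating constraints.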
ICT TODAY | January/February/March 2026