Member News

Neologic | Server Processors in the AI Era: Can They Go Greener?

The more power-efficient they get, the more the data center’s workload pulls them back to a more distant starting point.

“Just when I thought I was out, they pull me back in,” Michael Corleone (Al Pacino) says in “The Godfather Part III.” Much the same might be said of server processors: The more powerful and power-efficient they get, the more the data center’s workload pulls them back to a more distant starting point.

As data centers continue to expand in scale, complexity and connectivity, their power consumption increases as well. According to the International Energy Agency, data centers and data transmission networks are responsible for 1% of energy-related greenhouse gas emissions. Global data center electricity consumption in 2022 was an estimated 240 TWh to 340 TWh, or about 1% to 1.3% of global electricity consumption, excluding energy spent on cryptocurrency mining.1 According to some sources, data centers' share reaches 3%, topping industries such as aviation, shipping, and food and tobacco.

Despite great efforts to improve processors’ efficiency, the rapid growth of AI workloads has driven a substantial increase in energy consumption over the past decade, on the order of 20% to 40% annually. The combined electricity consumption of the Amazon, Microsoft, Google and Meta clouds more than doubled between 2017 and 2021, reaching about 72 TWh in 2021.1

The current major AI workloads in data centers are deep learning, machine learning, computer vision and streaming video, recommender systems, and natural-language processing, a recent addition. AI tasks are computing-power hogs, and large language models are especially demanding. Google’s PaLM language model is relatively efficient, yet its training required 2.5 billion petaFLOPs of computation; that is, its training was more than 5 million times more computation-intensive than that of AlexNet, the convolutional neural network introduced in 2012 for machine-vision tasks that heralded the AI era.2

According to informal sources, OpenAI’s GPT-2, introduced in 2019, was trained on 300 million tokens of text data and had 1.5 billion parameters. OpenAI’s GPT-3, the model behind the original ChatGPT, was trained on about 400 billion tokens of text data and had 175 billion parameters. The details of the most recent model, GPT-4, have not been publicly disclosed, but estimates of its size range from 400 billion to 1 trillion parameters, with a humongous training dataset of about 8 trillion text tokens.3 Put another way, the workload of training GPT-3 is about 150,000× that of GPT-2, and training GPT-4 requires about 50× to 120× more compute than GPT-3. OpenAI has also capped the number of messages that users can send to GPT-4 because inference puts a strain on compute resources.4
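
As a rough cross-check of those ratios, one can plug the parameter and token counts quoted above into the common rule of thumb that training compute scales as roughly 6 × parameters × tokens (the rule of thumb is an assumption here, not something the article states). A minimal sketch in Python:

```python
# Rough cross-check of the training-compute ratios quoted above.
# Assumption (not from the article): training compute ~ 6 * parameters * tokens,
# a common rule of thumb for dense transformer models.

def train_flops(params, tokens):
    """Approximate total training FLOPs for a dense model."""
    return 6 * params * tokens

gpt2 = train_flops(1.5e9, 300e6)        # 1.5B parameters, 300M tokens
gpt3 = train_flops(175e9, 400e9)        # 175B parameters, ~400B tokens
gpt4_low = train_flops(400e9, 8e12)     # lower-bound size estimate
gpt4_high = train_flops(1e12, 8e12)     # upper-bound size estimate

print(f"GPT-3 / GPT-2: ~{gpt3 / gpt2:,.0f}x")                                  # ~150,000x
print(f"GPT-4 / GPT-3: ~{gpt4_low / gpt3:.0f}x to ~{gpt4_high / gpt3:.0f}x")   # ~50x to ~120x
```

The ratios come out near 150,000× and 50× to 120×, consistent with the figures in the text, because the parameter counts and token counts compound multiplicatively.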

Most AI tasks’ workloads are dominated by arithmetic operations (typically matrix-matrix or matrix-vector multiplication), whether in training or inference (apart from data fetching). The total training compute of an AI model equals the product of the training time, the number of computing instances used, the peak FLOPS per instance and the utilization rate. Energy consumption therefore scales linearly with time (training or inference), the number of parallel computing instances (CPU, GPU, TPU, AI accelerator and the like), the computing power of an instance (e.g., FLOPS) and the utilization rate (i.e., the fraction of the time a GPU is running tasks while the model is trained).
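
As an illustration of that relationship, here is a minimal sketch of the bookkeeping described above; all variable names and numbers are hypothetical placeholders, not figures from the article:

```python
# Illustration of the relationship described above:
# total training compute = training time x number of instances x peak FLOPS x utilization,
# and energy follows the same linear dependencies.
# All figures below are hypothetical placeholders, not values from the article.

def training_compute(seconds, instances, peak_flops, utilization):
    """Total useful FLOPs delivered during training."""
    return seconds * instances * peak_flops * utilization

def training_energy_kwh(seconds, instances, avg_power_watts):
    """Energy drawn by the compute instances alone (excludes cooling, networking, etc.)."""
    return seconds * instances * avg_power_watts / 3.6e6  # joules -> kWh

seconds = 30 * 24 * 3600    # a 30-day training run (hypothetical)
instances = 1024            # GPUs running in parallel (hypothetical)
peak_flops = 300e12         # peak FLOPS per instance (hypothetical)
utilization = 0.4           # fraction of peak actually sustained (hypothetical)
avg_power = 400             # average watts per instance (hypothetical)

print(f"compute: {training_compute(seconds, instances, peak_flops, utilization):.2e} FLOPs")
print(f"energy:  {training_energy_kwh(seconds, instances, avg_power):,.0f} kWh")
```

Doubling any one factor (time, instance count, per-instance throughput or utilization) doubles the total compute, and the energy bill moves with it.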

Figure 1 illustrates the power breakdown of a typical GPU,5 in which the cores consume about 50% of the total power, and off-chip memory and memory controller consume the remaining 50% (the breakdown is similar for CPUs).

Figure 1: Typical power breakdown of a GPU: cores (50%), memory controller (20%) and DRAM (30%) (Source: Zhao et al., 20135)

Server processor power consumption therefore has an outsized effect on the data center as a whole. According to a report from infrastructure provider Vertiv, 1 W of power saved at the server processor translates into a total savings of 2.84 W across the data center.6 Figure 2 illustrates how a 1-W power savings at the processor propagates through the different components of the data center and leads to roughly 3× power savings in total. For example, 1 W saved at the processor leads to a 0.18-W savings in DC/DC power conversion, 0.31 W in AC/DC power conversion and so forth. Notably, 1.07 W of cooling power is saved for every watt of processor power saved.

Figure 2: Power savings accumulation through the data center units resulting from a 1-W power savings at the processor. From left to right, the bars plot the accumulated power consumption savings across the data center. (Source: Vertiv, 20236)

The current average rack density is about 10 kW per rack.7 Therefore, a savings of even a few percentage points at the server processor can translate into significant power savings at the rack level and positively affect both electricity usage and greenhouse gas emissions.
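
To see how these figures compound, here is a minimal sketch that combines the 2.84× cascade factor and the ~10-kW rack density quoted above; the processors’ share of rack power and the 5% savings level are assumptions chosen purely for illustration:

```python
# How processor-level savings compound through the data center, using the
# figures quoted above: a 2.84x cascade factor (Vertiv) and ~10 kW per rack.
# The processors' share of rack power and the 5% savings level are
# assumptions for illustration only.

CASCADE_FACTOR = 2.84        # total W saved per 1 W saved at the processor (article figure)
RACK_POWER_W = 10_000        # ~10 kW average rack density (article figure)
PROCESSOR_SHARE = 0.40       # assumed fraction of rack power drawn by processors
PROCESSOR_SAVINGS = 0.05     # assumed 5% processor-level power reduction

saved_at_processor = RACK_POWER_W * PROCESSOR_SHARE * PROCESSOR_SAVINGS   # 200 W
saved_facility_wide = saved_at_processor * CASCADE_FACTOR                 # ~568 W

print(f"Saved at processors:   {saved_at_processor:.0f} W per rack")
print(f"Saved facility-wide:  ~{saved_facility_wide:.0f} W per rack")
```

Under those assumptions, a modest 5% processor-level reduction frees up more than half a kilowatt per rack once power conversion, distribution and cooling losses are included.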

While the advancement of processors’ performance is power-limited, other factors can’t be ignored. Usually, we seek to maximize the metric shown in the following equation:

Performance ÷ (Cost × Area × Power)

where performance is the number of floating-point or integer operations per second and where die cost, area (mm², reticle-size-limited) and power (W) are self-explanatory. From an environmental perspective, the primary objective is to maximize the performance-to-power ratio. Improving performance and reducing power consumption, however, are usually conflicting requirements. A vicious circle works against maximizing this metric, as its variables are interrelated through their mutual dependence on transistor density (technology node), transistor count, memory size, clock frequency, driving voltage, number of cores and threads, wafer yield and more.
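
To make the trade-off concrete, here is a toy comparison of this figure of merit for two hypothetical designs; every number is invented for illustration:

```python
# Toy comparison of the figure of merit above, performance / (cost x area x power),
# for two hypothetical designs. All numbers are invented for illustration only.

def figure_of_merit(perf_tops, cost_usd, area_mm2, power_w):
    """Performance / (cost x area x power); higher is better."""
    return perf_tops / (cost_usd * area_mm2 * power_w)

# Design A: bigger, faster die. Design B: smaller, slower, more frugal die.
a = figure_of_merit(perf_tops=400, cost_usd=120, area_mm2=600, power_w=350)
b = figure_of_merit(perf_tops=250, cost_usd=60,  area_mm2=350, power_w=180)

print(f"Design A: {a:.2e}")
print(f"Design B: {b:.2e}")   # B scores higher despite lower raw performance
```

The smaller design wins on this metric even though it is slower in absolute terms, which is exactly the tension between raw performance and cost, area and power that the equation captures.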

We are already practicing power management techniques, such as voltage-frequency scaling and clock gating. Logic synthesis and physical design optimization can also come to the rescue. So what’s next?

Gate-all-around technology will increase the driving current per unit area while providing better channel control, which reduces static (leakage) power. Dropping the nominal core voltage (VDD) to 0.65 V would further save on dynamic power (compared with FinFET-based processors). Novel advanced logic approaches are emerging, such as Quasi-CMOS, in which a modified circuit topology enables a significant improvement in the performance-to-power ratio.8
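
To gauge the effect of the voltage drop mentioned above, recall that dynamic power scales roughly as CV²f, so at equal frequency and switched capacitance the ratio reduces to (V_new/V_old)². A minimal sketch, assuming a 0.75-V FinFET baseline (the baseline is an assumption; the 0.65-V target is from the text):

```python
# Rough effect of dropping VDD to 0.65 V on dynamic power.
# Dynamic power scales roughly as C * V^2 * f; at equal frequency and switched
# capacitance, the ratio reduces to (V_new / V_old)^2. The 0.75-V FinFET baseline
# is an assumption for illustration; the 0.65-V target is from the article.

V_FINFET = 0.75   # assumed nominal VDD of a FinFET-based processor
V_GAA = 0.65      # target nominal VDD quoted in the article

dynamic_power_ratio = (V_GAA / V_FINFET) ** 2
print(f"Dynamic power at 0.65 V: ~{dynamic_power_ratio:.0%} of the 0.75 V baseline "
      f"(~{1 - dynamic_power_ratio:.0%} savings)")
```

Under that assumed baseline, the quadratic dependence alone yields roughly a 25% reduction in dynamic power before any frequency or architectural changes are considered.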

Furthermore, the best possible performance-to-power ratio for a given application may not be attainable with a general-purpose processor but rather with a purpose-built architecture designed specifically to maximize that ratio. Full-custom ASIC processors are already deployed for video and recommender systems as well as for AI training and inference; this trend will only grow, squeezing out more performance per watt. In this context, application-specific instruction set processors (ASIPs), in which the instruction set architecture is optimally tailored to a specific application, are a viable path to additional environmental gains.

Packaging technologies like chiplets, in which a big processor is split into smaller dice to lower cost and improve yield, allow for improved power management and are already being employed. Chiplets also make it possible to extend the effective die area well beyond the reticle limit; designs with thousands of small cores have been demonstrated and, combined with intelligent energy management, can approach an optimal performance-to-power ratio.

While all of these directions fall within the realm of processor design, new algorithms that redefine AI models so they require less computing power and memory bandwidth should also be considered, as has already happened with convolutional neural networks (e.g., model compression).

References

1. Rozite et al. (July 11, 2023). “Data Centres and Data Transmission Networks.” International Energy Agency.
2. Roser, M. (Dec. 6, 2022). “The brief history of artificial intelligence: The world has changed fast – what might be next?” Our World in Data.
3. Bastian, M. (March 25, 2023). “GPT-4 has more than a trillion parameters.” The Decoder.
4. Hines, K. (July 19, 2023). “OpenAI Increases GPT-4 Message Cap To 50 For ChatGPT Plus Users.” Search Engine Journal.
5. Zhao et al. (Dec. 1, 2013). “Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface.” ACM Transactions on Architecture and Code Optimization, 10(4), pp. 1–25.
6. Vertiv. (2023). “Energy Logic: Calculating and Prioritizing Your Data Center IT Efficiency Actions.”
7. CoreSite. (2023). “Facing the Data Center Power Density Challenge.”
8. NeoLogic. “Powering the Next Generation Processors: A New VLSI Design Paradigm.”

Dr. Avi Messica, CEO & Co-Founder, Neologic | First published in EE Times Europe.

Neologic is a member of the EACCNY.