Machine Learning-Based GPU Energy Prediction for Workload Management in Datacenters

Published in 2025 IEEE 15th Annual Computing and Communication Workshop and Conference (CCWC), 2025


Abstract

Data centers (DCs) have become a vital component of the digital economy, accounting for approximately 1% of worldwide energy consumption, and reliance on cloud services and GPU hardware for high-performance computing (HPC) tasks continues to grow. Despite various strategies for improving energy efficiency, little research integrates real-world workload traces into the prediction of GPU power consumption, mainly because publicly available data is scarce. This research addresses the gap by using synthetic data generated to mimic the GPU intensity of real-world workloads and by developing GPU power prediction models for integration into the GPUCloudSim Plus simulator. We conducted a statistical analysis of the Alibaba and Helios workload traces and ran experiments with diverse GPU-intensive tasks to emulate real-world workloads. Using task-average metrics, we trained four machine learning (ML) models to predict GPU power consumption; the best-performing model, XGBoost, yielded an RMSE of 1.217. We propose this modeling design as a way to integrate a GPU power prediction model into the GPUCloudSim Plus simulator. Energy-aware simulations for GPU-intensive workloads could then be verified against workloads from real data centers, advancing the study of workload management.
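
As a rough illustration of the evaluation idea in the abstract (not the paper's actual pipeline), the sketch below collapses per-sample GPU telemetry into task-average metrics, fits a simple model, and reports RMSE. Everything here is an assumption made for illustration: the sample values, the single `gpu_utilization_percent` feature, and the one-variable least-squares line, which stands in for the four ML models (including XGBoost) that the paper trained.

```python
import math
from statistics import mean

# Hypothetical per-sample telemetry: (task_id, gpu_utilization_percent, power_watts).
# These values are invented for illustration; they are not from the paper.
samples = [
    ("task_a", 20.0, 80.0),
    ("task_a", 30.0, 100.0),
    ("task_b", 60.0, 150.0),
    ("task_b", 70.0, 170.0),
    ("task_c", 90.0, 210.0),
    ("task_c", 100.0, 230.0),
]


def task_averages(rows):
    """Collapse per-sample rows into one (mean utilization, mean power) pair per task."""
    grouped = {}
    for task_id, util, power in rows:
        grouped.setdefault(task_id, []).append((util, power))
    return {
        task_id: (mean(u for u, _ in vals), mean(p for _, p in vals))
        for task_id, vals in grouped.items()
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


averages = task_averages(samples)
utils = [u for u, _ in averages.values()]
powers = [p for _, p in averages.values()]

slope, intercept = fit_line(utils, powers)
predictions = [slope * u + intercept for u in utils]

# For brevity this scores the model on its own training tasks; a real
# evaluation would compute RMSE on held-out tasks instead.
error = rmse(powers, predictions)
print(f"RMSE: {error:.3f}")
```

RMSE is expressed in the same units as the predicted quantity, so a value is only meaningful alongside the scale of the power measurements it was computed on.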


Recommended citation: B. Ismalej, M. Smith, and X. Jiang. (2025). "Machine Learning-Based GPU Energy Prediction for Workload Management in Datacenters." IEEE Computing and Communication Workshop and Conference (CCWC).