Watt Counts: Towards Energy-Efficient Large Language Model Inference
This project addresses the rapidly growing energy consumption of Large Language Model (LLM) inference for small and medium-sized enterprises (SMEs). The goal is to develop a benchmark, a dataset, and a predictive model to optimize energy efficiency, cost, and result quality.
Description
While the high energy requirement for training Large Language Models (LLMs) is widely acknowledged, the energy footprint of their inference is often underestimated. With the increasing local deployment of LLMs for inference by small and medium-sized enterprises (SMEs), the overall energy consumption is growing rapidly, leading to negative ecological and economic consequences, including rising carbon emissions and higher energy prices. Our research shows that GPUs are the main energy consumers, accounting for up to 90% of total system power.
The central challenge motivating the project is the question: which combination of model, quantization, and inference framework offers the best trade-off between computing cost, energy consumption, and result quality for a specific use case and the available hardware?
The objective of the project is to support researchers, developers, and organizations in overcoming this challenge and thus contribute to increasing the energy efficiency of LLM inference.
The approach is divided into three main work packages (WPs):
- Development of a comprehensive LLM Inference Energy Benchmark (WP1); see the measurement sketch after this list.
- Creation of an open dataset on energy consumption and performance measurements (WP2).
- Development of a novel predictive model (WP3) that enables informed decisions for optimizing LLM inference.
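To illustrate the kind of measurement WP1 relies on, here is a minimal sketch of estimating per-request GPU energy. It assumes an NVIDIA GPU and the `pynvml` NVML bindings; the function name `measure_gpu_energy`, the 0.1 s sampling interval, and the `run_inference` placeholder are illustrative and not part of the project's benchmark.

```python
import time
import threading
import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)


def measure_gpu_energy(workload, gpu_index=0, interval_s=0.1):
    """Estimate GPU energy (joules) consumed while `workload()` runs,
    by sampling instantaneous power via NVML and integrating over time."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    samples = []          # list of (timestamp, watts)
    stop = threading.Event()

    def sampler():
        while not stop.is_set():
            watts = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
            samples.append((time.monotonic(), watts))
            time.sleep(interval_s)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    result = workload()   # e.g. one LLM inference request
    stop.set()
    thread.join()
    pynvml.nvmlShutdown()

    # Trapezoidal integration of sampled power over time gives energy in joules.
    energy_j = sum(
        (t1 - t0) * (p0 + p1) / 2.0
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )
    return result, energy_j


# Hypothetical usage: run_inference would call the model under test.
# result, joules = measure_gpu_energy(lambda: run_inference("prompt"))
```

Sampling and integrating power is a common approximation; on recent GPUs, NVML also exposes a cumulative energy counter that can reduce sampling error.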
Key data
Project lead
Co-project lead
Prof. Dr. Marta Patiño-Martínez (Polytechnic University of Madrid)
Project team
Mauricio Fadel Argerich (Polytechnic University of Madrid)
Project partners
Polytechnic University of Madrid
Project status
ongoing, started 01/2026
Institute/Centre
Institute of Computer Science (InIT)
Funding partner
Internal
Project budget
32'000 CHF