The ecological impact of AI: What's going on behind the scenes?

Length: 15 min

Published: June 9, 2025

Every time we use artificial intelligence, whether to generate or process text, images, or video, we run energy-intensive processes in the background. They consume electricity and also water, which the computing hardware and its cooling need both directly and indirectly.

To use AI more responsibly, we have to understand what goes on behind the scenes. Then we realise not only that AI is not an infallible source of truth, but also that it has an impact on the environment.

What happens behind a prompt

Our interaction with AI models depends heavily on resources. But how exactly does our messaging with these models consume electricity, and what's more, water? The answer lies in a few key phases.

Training the model

Before a model is ready to generate responses, it goes through a demanding training process.

Large language models (LLMs) learn through self-supervised learning. They are not trained on data with explicit labels but learn to predict the next word from the preceding context. This lets them make efficient use of large volumes of unstructured text.

The base architecture of these models is the transformer, which processes sequences efficiently and captures long-range dependencies in text. During training, the model works through texts, predicts the next words, and adjusts its internal parameters based on its errors. This process runs on large-scale data that includes texts from the internet, books, code, and other sources.

Two things are crucial for effective machine learning: good data and enough of it. With those, models learn to mimic human interaction by predicting which word might follow, based on the preceding text that fits in their working memory, known as the context window.
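To make the idea of self-supervised learning concrete, here is a minimal sketch of how plain, unlabelled text can be turned into training examples. It is an illustration only: real models work on tokens rather than whitespace-separated words, and the function name `next_word_pairs` and the `context_size` parameter are made up for this example.

```python
def next_word_pairs(text, context_size=3):
    """Turn raw text into (context, next_word) training pairs.

    No human labels are needed: the "label" for each example is simply
    the word that follows the context in the original text.
    """
    words = text.split()  # simplification: real models split text into tokens
    pairs = []
    for i in range(1, len(words)):
        # Keep at most `context_size` preceding words as the context.
        context = words[max(0, i - context_size):i]
        pairs.append((context, words[i]))
    return pairs


pairs = next_word_pairs("the model predicts the next word")
for context, target in pairs:
    print(context, "->", target)
```

Each printed line is one training example: the model sees the context on the left and is corrected based on how well it predicted the word on the right.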

Environmental impact

Training demands not only energy for computing but also water. Training the GPT-3 model in Microsoft's data centres, for example, is estimated to have evaporated roughly 700,000 litres of fresh water. That is comparable to the amount of water needed to produce roughly 320 Tesla electric cars.

This water consumption also comes from the need to keep servers in data centres at an optimal temperature. Cooling is often done with water systems, where water absorbs the heat the servers generate and then evaporates in cooling towers. It is an energy-intensive process, and it raises sustainability concerns as we use AI more and more.

Hosting the model

The size of a large language model is given by its number of parameters, which you can think of as the weights of the connections between neurons in a neural network. These parameters shape how well the model captures complex linguistic patterns and relationships. In general, the more parameters a model has, the better it tends to understand and generate natural language.

Small models, for example with a few billion parameters, can run locally on ordinary computers. They suit applications where security, response speed, and lower hardware requirements matter.

The largest and most powerful models, such as LLaMA 3.1 with 405 billion parameters, need specialised hardware and infrastructure. Running them requires servers with high VRAM capacity and powerful GPUs. Running LLaMA 3.1 at fp16 precision (16-bit floating-point weights), for example, requires roughly 972 GB of VRAM.
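A rough back-of-the-envelope check shows where a figure of this size can come from. The sketch below multiplies the parameter count by the bytes each weight occupies and then adds a flat overhead. The 20% overhead is an assumption chosen here because it reproduces the figure above; actual overhead depends on batch size, context length, and the serving software.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_gb(num_params, precision="fp16"):
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def vram_estimate_gb(num_params, precision="fp16", overhead=0.2):
    """Weights plus a flat overhead (assumed 20%) for runtime buffers."""
    return weights_gb(num_params, precision) * (1 + overhead)


llama_405b = 405e9
print(round(weights_gb(llama_405b)))        # weights only
print(round(vram_estimate_gb(llama_405b)))  # weights + assumed overhead
```

Under these assumptions the weights alone take about 810 GB, and the estimate with overhead lands at about 972 GB.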

These large-scale models are usually hosted in data centres that provide the computing power and infrastructure to run them. Cloud platforms such as Google Cloud Platform or Microsoft Azure offer services for deploying and managing these models, including performance optimisation and scalability.

Input processing

Once you enter a prompt, the request goes straight to a data centre, where dedicated hardware takes it over. Each step of input processing and output generation adds to the overall energy consumption.

Tokenisation. The text input is split into smaller units called tokens. This process needs computing power, but its energy consumption is relatively low compared to the other steps.
Conversion to embeddings. Each token becomes a numeric vector (embedding) that captures its semantic meaning. This step involves matrix operations and runs on specialised hardware, which adds to energy consumption.
Model processing (inference). The model architecture, typically based on transformers, processes the embeddings further. This step is the most energy intensive, especially for large models with billions of parameters.
Predicting the next token. The model calculates the probabilities of the possible next tokens and generates the output one token at a time. The energy cost of this step depends on the length of the generated text and the complexity of the model. Models with about 7 billion parameters consume roughly 3-4 joules per generated token.
Shaping the output with parameters. Parameters like "temperature" change the nature of the output (deterministic vs. random). They don't directly affect power consumption, but they can change the length and complexity of the text, which indirectly changes the energy cost.
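The last two steps can be sketched in a few lines. The first function shows the standard way temperature reshapes next-token probabilities (a softmax over scores divided by the temperature). The second converts a per-token energy figure into watt-hours for a whole response. The example scores and the 3.5 J midpoint of the 3-4 J range are assumptions for illustration, not measurements.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into next-token probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generation_energy_wh(num_tokens, joules_per_token=3.5):
    """Energy to generate `num_tokens`, in watt-hours (1 Wh = 3600 J)."""
    return num_tokens * joules_per_token / 3600


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.5)
base = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)
print([round(p, 2) for p in sharp])
print([round(p, 2) for p in base])
print([round(p, 2) for p in flat])
print(round(generation_energy_wh(500), 3), "Wh for 500 tokens")
```

A low temperature concentrates probability on the top candidate, while a high temperature spreads it out. The computation per token is the same either way, which is why temperature only affects energy indirectly, through the length of the text that ends up being generated.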

Complexity of the calculations

The larger and more complex the model, the more power it needs. Each step, from tokenisation to output generation, runs on dedicated hardware, often on a GPU like the NVIDIA H100.

How demanding are these calculations?

A single NVIDIA H100 (SXM) GPU can consume up to 700 watts. In data centres, these GPUs sit in racks that consume tens of kilowatts. To give you an idea, one hour at full load uses 700 Wh of energy. With the same 700 Wh you could run:

| Equipment / Activity | Consumption | Operating time with 700 Wh |
| --- | --- | --- |
| NVIDIA H100 GPU (SXM) | 700 W | 1 hour |
| LED bulb | 10 W | 70 hours (approx. 3 days) |
| Charging a smartphone | 15 Wh per charge | Approximately 46 charges |
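The comparison above is simple division: energy divided by power gives running time, and energy divided by the energy per charge gives the number of charges. A short sketch, with function names invented for this example:

```python
ENERGY_BUDGET_WH = 700  # one H100 (SXM) at full load for one hour


def hours_of_operation(power_watts, energy_wh=ENERGY_BUDGET_WH):
    """How long a device drawing `power_watts` runs on the energy budget."""
    return energy_wh / power_watts


def full_charges(wh_per_charge, energy_wh=ENERGY_BUDGET_WH):
    """How many complete charges the energy budget covers."""
    return int(energy_wh // wh_per_charge)


print(hours_of_operation(700), "hours for the GPU")
print(hours_of_operation(10), "hours for a 10 W LED bulb")
print(full_charges(15), "smartphone charges at 15 Wh each")
```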

Heat generation and cooling

Running an LLM generates a significant amount of heat. Data centres therefore use advanced cooling systems that consume additional energy as well as water. Cooling can account for up to 30-40% of a data centre's total power consumption. The average water consumption for cooling can reach 1.8 litres for each kWh of energy. Some data centres can therefore consume 11-19 million litres of water a day, which matches the consumption of a city of 30,000-50,000 inhabitants.
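To get a feel for these numbers, the sketch below applies the 1.8 litres per kWh figure to a single GPU running for a day and works out what the city comparison implies per inhabitant. It treats 1.8 L/kWh as a fixed rate, which is a simplification: real water use varies with climate, season, and cooling technology.

```python
LITRES_PER_KWH = 1.8  # cooling water per kWh, figure quoted above


def cooling_water_litres(power_watts, hours, litres_per_kwh=LITRES_PER_KWH):
    """Cooling water for a device drawing `power_watts` for `hours`."""
    energy_kwh = power_watts * hours / 1000
    return energy_kwh * litres_per_kwh


def litres_per_person(city_litres_per_day, inhabitants):
    """Daily water use per inhabitant implied by a city-wide total."""
    return city_litres_per_day / inhabitants


gpu_day = cooling_water_litres(700, 24)  # one H100 at full load for a day
print(round(gpu_day, 2), "litres per GPU per day")
print(round(litres_per_person(11e6, 30_000)), "litres per person per day")
print(round(litres_per_person(19e6, 50_000)), "litres per person per day")
```

Under these assumptions, one GPU at full load accounts for about 30 litres of cooling water a day, and the city comparison corresponds to roughly 370-380 litres per inhabitant per day.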

Water and energy consumption per prompt

A single prompt may consume little, but with billions of queries a day, the cumulative impact grows sharply. A University of California, Riverside study reports that processing 5 to 50 AI prompts consumes up to 0.5 litres of water (mostly for cooling).
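Broken down per prompt, the range quoted above works out as follows. This is plain division over the two ends of the range, not a figure taken from the study itself.

```python
def litres_per_prompt(total_litres=0.5, prompts=50):
    """Average water per prompt if `prompts` prompts use `total_litres`."""
    return total_litres / prompts


low = litres_per_prompt(prompts=50)  # 0.5 litres spread over 50 prompts
high = litres_per_prompt(prompts=5)  # 0.5 litres spread over 5 prompts
print(round(low * 1000), "ml per prompt at the low end")
print(round(high * 1000), "ml per prompt at the high end")
```

That is between about 10 ml and 100 ml per prompt, depending on which end of the range applies.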

Inference energy of different models for a short query

GPT-4o: ~0.421 Wh per query.
Claude 3.7 Sonnet: ~0.836 Wh per query.
GPT-4.1 nano: ~0.103 Wh per query.
DeepSeek-R1 and o3: ~23.82 Wh per query.
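The gap between these figures is easier to see as ratios. The sketch below expresses each value relative to GPT-4o. The dictionary only restates the numbers listed above, and the choice of GPT-4o as the baseline is arbitrary.

```python
# Energy per short query in Wh, as listed above.
ENERGY_WH_PER_QUERY = {
    "GPT-4.1 nano": 0.103,
    "GPT-4o": 0.421,
    "Claude 3.7 Sonnet": 0.836,
    "DeepSeek-R1 / o3": 23.82,
}


def relative_to(baseline, table=ENERGY_WH_PER_QUERY):
    """Express every model's energy per query as a multiple of the baseline."""
    base = table[baseline]
    return {name: wh / base for name, wh in table.items()}


ratios = relative_to("GPT-4o")
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}x")
```

On these numbers, a query to the reasoning models costs more than 50 times the energy of a GPT-4o query.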

As a rule, reasoning models are far more energy intensive than classical models. You can see it with DeepSeek, where the V3 version is roughly 8 times more efficient than R1, much like GPT-4.1 compared to the OpenAI o3 model.

[Figure: token usage of the reasoning model R1 compared with the classical models. R1 uses far more tokens.]

Estimated energy consumption and CO2 emissions per text query (indicative)

| Task type | Energy consumption / query | CO2e emissions / query | Comparison of consumption |
| --- | --- | --- | --- |
| Simple search | ~0.300 Wh | ~0.2 g | Power an LED bulb (10 W) for ~2 minutes |
| ChatGPT (e.g. GPT-4o) | ~0.421 Wh | ~0.3 g | Power an LED bulb (10 W) for ~2.5 minutes |
| More energy-intensive models (o3, DeepSeek-R1) | ~23.82 Wh | ~23 g | Charge a phone ~1.5 times (at 15 Wh per charge) |

Cumulative impact at 1 billion queries per day

Energy: roughly 430 MWh per day, which matches the consumption of about 14,000 households (assuming an average of 30 kWh per household per day).

CO2 emissions: roughly 300 tonnes per day.

Water consumption: up to 10 million litres a day.
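The energy and CO2 totals can be roughly reproduced by scaling the per-query values. The sketch below assumes that every one of the billion queries costs the same as a GPT-4o query (~0.421 Wh and ~0.3 g CO2e), which is a simplification made here for illustration; the real mix of models and query lengths will differ.

```python
def daily_totals(queries_per_day, wh_per_query, g_co2_per_query,
                 household_kwh_per_day=30):
    """Scale per-query figures to daily totals.

    Returns (energy in MWh, equivalent households, CO2e in tonnes).
    """
    energy_mwh = queries_per_day * wh_per_query / 1e6
    households = energy_mwh * 1000 / household_kwh_per_day
    co2_tonnes = queries_per_day * g_co2_per_query / 1e6
    return energy_mwh, households, co2_tonnes


energy, homes, co2 = daily_totals(1e9, 0.421, 0.3)
print(round(energy), "MWh per day")
print(round(homes), "households")
print(round(co2), "tonnes of CO2e per day")
```

Under these assumptions the result is about 421 MWh, roughly 14,000 households, and 300 tonnes of CO2e per day, in line with the rounded figures above.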

Search is getting more energy intensive

Integrating LLMs into the digital tools we commonly meet online is a growing trend. Most major search engines, including Google Search, now use AI models to summarise search results. This approach, though, combines the energy cost of traditional search with the extra computational cost of running an LLM.

Unlike alternatives such as DuckDuckGo or Ecosia, Google currently doesn't let you fully turn these AI features off, so the energy footprint grows even for a simple search.

Summary

The direct link between our query and real resource consumption shows that every interaction with AI carries an energy cost, and therefore an ecological one. The International Energy Agency estimates that global electricity consumption by data centres will more than double by 2030, to roughly 945 terawatt hours (TWh), about as much as all of Japan consumes today.

Given these demands, two questions arise: how do data centres actually work, and why do they consume so much energy and water? And what can we do to minimise this footprint?

Data centres sit at the heart of all these processes. In the next part, we look at what they are and what specific environmental impacts their operation carries. We tackle the second question in the final article of this trilogy.
