July 8, 2025
Small changes to AI LLMs could cut energy use by 90 percent, claims UNESCO report
A new report from UNESCO and University College London claims that relatively minor changes in the way large language models (LLMs) are built and used can reduce their energy consumption by up to 90 percent without compromising performance. The report highlights the growing energy demands of generative AI systems and calls for a shift in approach to make them more sustainable. According to UNESCO, the annual energy footprint of generative AI is already equivalent to that of a low-income country, and continues to rise.
UNESCO’s Assistant Director-General for Communication and Information, Tawfik Jelassi, said: “Generative AI’s annual energy footprint is already equivalent to that of a low-income country, and it is growing exponentially. To make AI more sustainable, we need a paradigm shift in how we use it, and we must educate consumers about what they can do to reduce their environmental impact.”
In 2021, all of UNESCO's member states adopted the UNESCO Recommendation on the Ethics of AI, which includes guidance on reducing environmental impact. The organisation is now encouraging governments and industry to invest in research that prioritises energy efficiency, as well as in efforts to improve public understanding of the environmental cost of AI.
The report includes findings from a team of computer scientists at UCL who ran experiments on a range of open-source large language models. They identified three main strategies for reducing energy use. The first is the use of smaller models designed for specific tasks. These models, the report says, can match the performance of larger general-purpose systems while using significantly less energy. One design approach discussed is the 'mixture of experts' system, in which only the specialist sub-models relevant to a given task are activated, while the rest remain idle.
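The routing idea behind a mixture-of-experts system can be illustrated with a minimal sketch. The expert functions and task names below are hypothetical placeholders, not taken from the report; the point is simply that only the expert matching the task does any work.

```python
# Illustrative mixture-of-experts routing (hypothetical experts and tasks).
# Only the expert needed for a given request is invoked; the others stay
# idle, which is where the claimed energy saving comes from.

def translation_expert(text: str) -> str:
    # Placeholder for a small, task-specific translation model.
    return f"[translated] {text}"

def summarisation_expert(text: str) -> str:
    # Placeholder for a small, task-specific summarisation model.
    return f"[summary] {text[:20]}"

EXPERTS = {
    "translate": translation_expert,
    "summarise": summarisation_expert,
}

def route(task: str, text: str) -> str:
    # The router selects exactly one specialist; no other expert runs.
    expert = EXPERTS[task]
    return expert(text)

print(route("translate", "hello"))
```

In production systems the router is itself a learned component that picks experts per token rather than per request, but the principle is the same: compute is spent only on the parameters that the current input actually needs.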
The second strategy involves shortening prompts and responses. According to the report, doing so can reduce energy use by more than 50 percent. The third technique is model compression, including methods such as quantisation, which can cut energy use by up to 44 percent while maintaining performance.
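Quantisation, the compression method the report names, stores model weights at lower numerical precision. The toy round-trip below is a sketch of simple symmetric 8-bit quantisation, not the report's method: floats are mapped to small integers and back, trading a little precision for much cheaper storage and arithmetic.

```python
# Sketch of symmetric int8 quantisation: map each float weight to a
# signed integer in [-127, 127] via a single scale factor, then recover
# an approximation by multiplying back. Values and weights are made up.

def quantise(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantise(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.98, 0.45, 0.03]     # example float32 weights
q, scale = quantise(weights)
restored = dequantise(q, scale)
# restored is close to weights, but each value now fits in one byte
# instead of four, cutting memory traffic and energy per operation.
```

Real deployments use per-channel scales, calibration data, and hardware int8 kernels, but the energy argument is the same: fewer bits moved and multiplied per weight.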
The report also points to the wider implications of this approach for access to AI in low-resource settings. Most of the infrastructure required for AI remains concentrated in high-income countries. According to the International Telecommunication Union, only 5 percent of Africa’s AI workforce currently has access to the computing power needed to build or use generative AI tools. Smaller and more efficient models are seen as one way to make AI more accessible in regions where energy and connectivity are limited.