Text-to-video GenAI could help drive OpenAI's energy consumption to the same level as India's

A new study has raised concerns about the energy consumption of text-to-video (T2V) models, warning that the technology could become one of the most power-intensive forms of artificial intelligence yet developed. The research coincides with reports from within OpenAI that the firm's energy consumption will match India's current energy use within eight years.

The paper, Video Killed the Energy Budget: Characterizing the Latency and Power Regimes of Open Text-to-Video Models, published on the open research platform arXiv, presents a detailed analysis of the power usage of advanced open-source video generation systems. Conducted by researchers Julien Delavande, Régis Pierrard and Sasha Luccioni, the study benchmarks several models and quantifies how video generation consumes energy.

The findings suggest that energy use rises linearly with the number of denoising steps and quadratically with both spatial and temporal resolution. In practical terms, higher-quality video, meaning longer clips at larger resolution, draws disproportionately more power.
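The reported scaling behaviour can be sketched as a simple cost model. The function below is an illustration of the shape of the relationship only, not the paper's fitted model: the constant and the exact exponents are assumptions, chosen to reflect linear growth in denoising steps and quadratic growth in spatial and temporal resolution.

```python
def estimated_energy(steps: int, height: int, width: int, frames: int,
                     base_cost: float = 1.0) -> float:
    """Illustrative energy estimate for one T2V generation.

    Assumptions (not the paper's measured constants):
    - energy grows linearly with the number of denoising steps
    - energy grows quadratically with spatial resolution (pixel count)
    - energy grows quadratically with temporal resolution (frame count)
    """
    spatial_tokens = height * width
    return base_cost * steps * (spatial_tokens ** 2) * (frames ** 2)


# Doubling the denoising steps doubles the estimate,
# while doubling the frame count quadruples it.
baseline = estimated_energy(steps=50, height=64, width=64, frames=16)
more_steps = estimated_energy(steps=100, height=64, width=64, frames=16)
more_frames = estimated_energy(steps=50, height=64, width=64, frames=32)
```

Under this toy model, `more_steps` is exactly twice `baseline` and `more_frames` is four times `baseline`, which is the disproportionate growth the study describes for longer, higher-resolution clips.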

The team tested six widely used open-source T2V models, including the WAN2.1-T2V system, under standard inference settings. Using both empirical measurements and an analytical compute-bound model, they established power curves that show how video generation scales as output quality increases.

The study reports that moving from shorter, low-resolution clips to longer, higher-resolution outputs drives a steep, superlinear rise in energy demand. The authors state that these measurements provide a benchmark for assessing the sustainability of generative video systems and call for energy reporting to become a standard part of model evaluation.

The research notes that applications of text-to-video generation include marketing, training and internal communications, which may be affected by the cost and environmental impact of running such models at scale.

Separate reporting has highlighted the scale of energy use anticipated from large commercial AI developers. OpenAI is reportedly planning a 1 gigawatt data centre in India as part of its wider “Stargate” infrastructure programme. The company is in the process of registering a local entity and is expected to announce further details during a forthcoming visit by chief executive Sam Altman.

One analysis based on statements from Altman suggests that within eight years, OpenAI's energy consumption could approach half of India's entire installed power generation capacity. While such projections rely on extrapolation and remain speculative, they illustrate the potential scale of demand created by large-scale generative AI deployments.

The authors of the arXiv paper state that their work is intended to guide future design choices for more efficient models. By identifying the specific parameters that drive power use, they argue, researchers and developers may be able to create systems that balance output quality with sustainability. They also highlight the importance of transparency, calling for model creators to publish energy and latency metrics alongside more familiar measures such as accuracy or output quality.

The study claims to be among the first to provide detailed measurements of energy usage for text-to-video models and outlines scaling behaviours that will inform future technical development. At the same time, reports of large-scale data centre projects by commercial AI firms indicate the rapid growth of infrastructure to support these systems.