OpenAI shared internal method to cut inference costs; could lower Cha…

OpenAI engineers have reportedly shared an optimisation technique internally that can cut artificial intelligence (AI) inference costs by more than half. Foreign media outlets, including the online publication Gigazine and The Information, reported on Tuesday that the technique was mentioned inside OpenAI in early June and has already been applied to some ChatGPT processing for guest users. Inference costs are the operating expenses incurred each time an AI model generates a response to a user's input.

According to the reports, applying the technique to ChatGPT guest users reduced the number of Nvidia GPUs required to about 200. Industry analyst Edward Zitron has estimated that OpenAI likely spent more than $5 billion on inference in the first half of 2025. The specific method was not disclosed, and the only confirmed application so far is part of the processing for ChatGPT guest users.