Cutting AI Costs: How IT Leaders Can Run LLMs Without Expensive GPUs

News 20 Feb 2025

When businesses adopt large language models (LLMs) such as ChatGPT, IT leaders face a significant challenge: balancing system performance with operational expense. Deploying LLMs typically calls for high-performance Nvidia H100 GPUs, which carry steep price tags and heavy power requirements.

New developments in AI show that performance can be maintained without large GPU expenditures, opening cost-reduction opportunities for organizations.

The Cost of Running AI: GPU vs. CPU vs. Cloud

Organizations ready to deploy AI on premises commonly choose Nvidia H100 GPUs, each with 80GB of memory and a 350W power draw. DeepSeek-R1, with 671 billion parameters, requires at least ten H100s, which amounts to roughly $250,000 in hardware alone.
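A back-of-the-envelope sketch of why roughly ten 80GB GPUs are needed. The FP8 weight precision and the ~10% memory overhead for activations and KV cache are assumptions for illustration, not figures from the article:

```python
import math

PARAMS_B = 671          # DeepSeek-R1 parameter count, in billions
BYTES_PER_PARAM = 1     # assumes FP8 weights (1 byte per parameter)
OVERHEAD = 1.10         # assumed ~10% extra for activations and KV cache
GPU_MEM_GB = 80         # H100 memory capacity

weights_gb = PARAMS_B * BYTES_PER_PARAM          # ~671 GB of weights
total_gb = weights_gb * OVERHEAD                 # ~738 GB including overhead
gpus_needed = math.ceil(total_gb / GPU_MEM_GB)   # -> 10 GPUs

print(f"{total_gb:.0f} GB total -> {gpus_needed} x H100")
```

At around $25,000 per card, ten H100s line up with the $250,000 hardware figure above.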

Businesses that prefer renting GPU capacity can turn to cloud providers such as Azure and Google Cloud, which offer scalable GPU options. Azure lists H100 instances at $27.17 per hour; committing to a three-year contract can bring annual costs down to about $23,000. Under a long-term agreement with Google Cloud, customers can run DeepSeek-R1 on Nvidia T4 GPUs for roughly $13,000 a year.
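As a rough sanity check, using only the figures quoted above (the 24/7 utilization is my assumption), running a single H100 instance around the clock at the on-demand rate dwarfs the reserved-contract figures:

```python
AZURE_H100_HOURLY = 27.17      # on-demand rate quoted above, USD/hour
HOURS_PER_YEAR = 24 * 365

on_demand_annual = AZURE_H100_HOURLY * HOURS_PER_YEAR   # ~$238,000/year
azure_reserved_annual = 23_000   # three-year contract figure from the article
gcp_t4_annual = 13_000           # Google Cloud long-term figure

print(f"On-demand:      ${on_demand_annual:>10,.0f}/yr")
print(f"Azure reserved: ${azure_reserved_annual:>10,}/yr")
print(f"GCP T4:         ${gcp_t4_annual:>10,}/yr")
```

The gap, roughly a factor of ten, is why long-term commitments dominate cloud AI budgeting.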

Skipping GPUs: A New Approach to AI Efficiency

AI inference does not have to run on GPUs; IT leaders can run it on CPUs instead. According to Matthew Carrigan, a machine learning engineer at Hugging Face, DeepSeek-R1 can run on a dual AMD Epyc server with 768GB of memory that costs about $6,000.

Businesses that prioritize budget over raw speed should consider CPUs. Such a setup processes between 6 and 8 tokens per second, well below GPU throughput.
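To put that throughput in budget terms, a quick sketch of what the $6,000 CPU build yields. The 24/7 utilization and one-year hardware amortization are my assumptions:

```python
TOKENS_PER_SEC = 7          # midpoint of the 6-8 tokens/s range above
HARDWARE_COST = 6_000       # dual-Epyc build quoted above, USD
SECONDS_PER_YEAR = 365 * 24 * 3600

tokens_per_year = TOKENS_PER_SEC * SECONDS_PER_YEAR          # ~220.8M tokens
cost_per_m_tokens = HARDWARE_COST / (tokens_per_year / 1e6)  # hardware only

print(f"{tokens_per_year/1e6:.0f}M tokens/yr, ~${cost_per_m_tokens:.2f} per 1M tokens")
```

Even ignoring electricity, that is a very low per-token cost, provided the workload can tolerate the latency.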

Memory Optimization: The Key to Affordable AI

California-based SambaNova Systems achieved a memory-infrastructure breakthrough with its SN40L Reconfigurable Dataflow Unit (RDU), which powers a three-tiered memory system. The technology lets DeepSeek-R1 run in a single rack instead of 40, dramatically cutting operational costs.

Saudi Telecom Company has partnered with SambaNova Systems to build the country's first sovereign AI cloud, demonstrating that economical AI solutions are already operational across industries.

The Future of AI Deployment: Smarter, Cheaper, and More Scalable

IT leaders can choose among cost-competitive CPU setups, memory-tiering approaches, and cloud alternatives to make AI deployment affordable. Innovative hardware choices and new chip designs are shifting the economics of AI, opening the door to far wider adoption.
