AWS Announces Significant Price Reductions for NVIDIA GPU EC2 Instances

Visak Krishnakumar
A Look at Jeff Barr’s Announcement on EC2 GPU Pricing

On June 5th, 2025, Jeff Barr, AWS's Chief Evangelist, shared a major pricing update in a LinkedIn post. The announcement marks a significant shift in how AWS approaches GPU compute costs, reducing prices by up to 45% across several high-performance GPU instance families. For enterprises scaling AI/ML, high-performance computing (HPC), or generative AI workloads, this update offers a timely opportunity to rethink both infrastructure design and cost optimization strategies.

The Pricing Change — Real Savings, Real Impact

To understand the scale of this update, let's examine the verified pricing reductions. The price reduction applies to On-Demand purchases beginning June 1 and to Savings Plan purchases effective after June 4.

Verified Price Reductions by Instance Type:

| Instance Type | NVIDIA GPUs | On-Demand Reduction | EC2 Instance Savings Plans | Compute Savings Plans |
|---------------|-------------|---------------------|----------------------------|-----------------------|
| P4d           | A100        | 33%                 | 31% (1yr) / 25% (3yr)      | 31% (1yr)             |
| P4de          | A100        | 33%                 | 31% (1yr) / 25% (3yr)      | 31% (1yr)             |
| P5            | H100        | 44%                 | 45% (3yr)                  | 44% (1yr) / 25% (3yr) |
| P5en          | H200        | 25%                 | 26% (3yr)                  | 25% (1yr)             |

Source: AWS Official Announcement, June 2025

To put these reductions in perspective, consider that the p5.48xlarge instance currently costs $3.8592 per hour. With the 44% reduction, this drops to approximately $2.16 per hour—a significant decrease for organizations running intensive AI workloads.

For a practical example: a large-scale AI training job requiring 1,000 hours on p5.48xlarge instances that previously cost $3,859 now costs approximately $2,160. That's $1,699 saved on a single training run—money that can fund additional experiments or AI initiatives.
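The arithmetic behind that example can be captured in a small helper. A minimal sketch, using the rates quoted above (the function name is illustrative):

```python
def training_job_savings(hourly_rate, hours, reduction_pct):
    """Return (old_cost, new_cost, savings) for one training job."""
    old_cost = hourly_rate * hours
    new_cost = old_cost * (1 - reduction_pct / 100)
    return old_cost, new_cost, old_cost - new_cost

# The 1,000-hour p5.48xlarge example, at the rates quoted in this article:
old, new, saved = training_job_savings(3.8592, 1000, 44)
print(f"before: ${old:,.0f}  after: ${new:,.0f}  saved: ${saved:,.0f}")
```

Plugging in other instance rates from the table above gives the same before/after comparison for any job length.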

What About the P6-B200?

The expansion of P6-B200 instances to Savings Plans represents a significant change in AWS's approach to its most advanced GPU offerings. These instances, powered by NVIDIA Blackwell B200 GPUs, deliver up to 2.5x the performance of H100s for large language model training and feature 192GB of HBM3e memory per GPU.

Since their May 15, 2025 launch, P6-B200 instances had been available only through EC2 Capacity Blocks, which require organizations to reserve entire blocks of capacity, often with thousands of dollars paid upfront, putting them within reach of only the largest enterprises. The Savings Plans expansion changes this approach completely.

Note: To optimize your cloud costs and compare instance pricing in detail, take a look at CloudOptimo's Cost Calculator.

Technical Advantages: P6-B200 instances perform better in scenarios where H100s reach memory limitations. Training models with 70B+ parameters, running inference on complex multimodal models, or processing large-scale distributed training workloads all benefit from the additional memory and compute capacity.
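A quick back-of-the-envelope check shows why memory matters here: at 16-bit precision, the weights of a 70B-parameter model alone exceed a single H100's 80 GB of HBM, while fitting in a B200's 192 GB. A rough sketch (weights only; real training also needs optimizer state and activations, so large jobs are sharded across GPUs regardless):

```python
def param_memory_gb(n_params, bytes_per_param=2):
    """Memory for model weights alone; 2 bytes/param = FP16/BF16."""
    return n_params * bytes_per_param / 1e9

H100_HBM_GB = 80   # H100 HBM capacity
B200_HBM_GB = 192  # B200 HBM3e capacity quoted above

weights = param_memory_gb(70e9)
print(f"70B model weights alone: {weights:.0f} GB")
print(f"fits one H100? {weights <= H100_HBM_GB}")
print(f"fits one B200? {weights <= B200_HBM_GB}")
```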

Savings Plans Amplify Cost Benefits

The new option to use Savings Plans with P6-B200 instances means customers can commit to 1- or 3-year usage in exchange for predictable, discounted rates, often up to 30% cheaper than On-Demand prices.

For example, if P6-B200 instances cost roughly $80/hr On-Demand, Savings Plans might lower that to about $56/hr, making these very large GPU clusters more financially accessible for continuous workloads.
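Whether a commitment like that pays off depends on how steadily you run the hardware. Under a simplified model (committed dollars are owed whether or not the instance runs; the rates are the illustrative figures above, not published AWS prices), the break-even utilization is just the ratio of the two rates:

```python
def breakeven_utilization(on_demand_rate, savings_plan_rate):
    """Fraction of hours you must actually run the instance for a
    Savings Plan commitment to beat pure On-Demand spend.
    Committed spend is owed whether or not the instance runs."""
    return savings_plan_rate / on_demand_rate

# Illustrative rates from the example above: ~$80/hr OD, ~$56/hr committed.
u = breakeven_utilization(80.0, 56.0)
print(f"break-even utilization: {u:.0%}")
```

Below that utilization, On-Demand is cheaper; above it, the commitment wins, which is why continuous workloads are the natural fit.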

Spot Instances Could Lower Costs Further

AWS Spot Instances often cost 60–70% less than On-Demand rates, and Spot prices generally track the On-Demand baseline. With these price cuts, Spot prices for P4 and P5 GPUs should fall as well, making fault-tolerant, interruptible workloads even more cost-efficient.
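A rough way to compare Spot against On-Demand is to fold interruption rework into an effective per-useful-hour rate. A sketch under assumed numbers (the discount and rework percentages here are illustrative, not AWS figures):

```python
def spot_effective_rate(on_demand_rate, spot_discount_pct, rework_overhead_pct):
    """Effective $/useful-hour on Spot, assuming work lost between
    checkpoints must be redone. rework_overhead_pct is the extra
    compute, as a % of the job, spent on restarts and replayed work."""
    spot_rate = on_demand_rate * (1 - spot_discount_pct / 100)
    return spot_rate * (1 + rework_overhead_pct / 100)

# Illustrative: 65% Spot discount on a $2.16/hr post-cut rate, 10% rework loss.
rate = spot_effective_rate(2.16, 65, 10)
print(f"effective rate: ${rate:.3f}/useful hour")
```

Even with a generous rework allowance, the effective rate stays well below On-Demand, which is the economic case for checkpointed, interruptible training.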

Regional Expansion: Improving Deployment Options

In addition to lowering prices, AWS has expanded GPU instance availability across more global regions. This allows organizations to reduce latency, comply with regional data regulations, and improve workload distribution.

Local Deployment for Performance and Compliance

Deploying workloads in regions closer to users can significantly reduce latency. For example, deploying in Tokyo or Jakarta can improve response times for users in Asia compared to hosting in the United States.

Additionally, regional deployment simplifies data governance. European organizations can now train models using P4d instances in London while maintaining compliance with GDPR and avoiding cross-border data transfers.

Support for Disaster Recovery and Load Balancing

Expanded availability also improves fault tolerance. If capacity is unavailable in one region, workloads can be moved to another without major changes. This enables stronger disaster recovery strategies and helps maintain development continuity during peak periods.

Regional Pricing Differences

AWS pricing varies by region. Deploying in a lower-cost region without strict latency requirements can reduce expenses by an additional 10–15%. These differences can contribute to a more efficient cost structure.

Why Does This Matter for AI Workloads?

Developing AI, particularly when it involves training large models, is known to be very costly. The cost of GPU compute is often the biggest barrier. By lowering these prices, AWS effectively makes AI research, development, and deployment more accessible.

For data scientists and ML engineers, cheaper GPU hours mean you can run more training experiments, try different model designs, and iterate faster. The impact is clear: better models delivered sooner.

For example, modern transformer-based language models benefit greatly from the H100 GPUs on the P5 and P6 families, which support advanced features like FP8 precision. These features speed up training and reduce costs even more. Lower prices for these instances allow more organizations to utilize this technology without overspending.

Real Cost Comparison: Before vs. After Price Reductions

Let’s examine a practical example to understand the financial impact of AWS’s GPU price cuts. Consider the p5.48xlarge instance in the US East (N. Virginia) region, which features NVIDIA H100 GPUs.

  • Before the price reduction:

    The On-Demand cost was approximately $6.52 per hour for the p5.48xlarge instance. For a training job running 100 hours on 6 nodes, the total cost would be:
    6.52 USD/hour × 100 hours × 6 nodes = $3,912

  • After the price reduction (approximately 45% cut):

    The hourly rate drops to about $3.59 per hour per node. Assuming the same workload on 6 nodes for 100 hours, the new cost is:
    3.59 USD/hour × 100 hours × 6 nodes = $2,154

  • With Savings Plans:

    By committing to a 1-year Savings Plan, the hourly rate could be reduced further to around $2.20 per hour. This would bring the cost down to:
    2.20 USD/hour × 100 hours × 6 nodes = $1,320

Summary of Savings:
Switching to the new pricing and Savings Plans could reduce the cost of this training job from $3,912 to $1,320 — a total saving of approximately $2,592 per run.
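The three scenarios above can be reproduced in a few lines, using the rates quoted in this section:

```python
HOURS, NODES = 100, 6  # the 6-node, 100-hour job from the example
rates = {
    "On-Demand (before cut)": 6.52,
    "On-Demand (after ~45% cut)": 3.59,
    "1-yr Savings Plan": 2.20,
}
costs = {name: rate * HOURS * NODES for name, rate in rates.items()}
for name, cost in costs.items():
    print(f"{name:28s} ${cost:>8,.0f}")
```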

These savings enable teams to run more frequent experiments, explore more complex models, or operate additional training clusters within the same budget.

Cost Efficiency in Practice: Monthly Savings and Planning Considerations

To understand the broader impact across ongoing projects, consider these simplified monthly cost comparisons:

| Use Case                  | Previous Monthly Cost (On-Demand) | New Monthly Cost (After Price Cut) | Estimated Savings |
|---------------------------|-----------------------------------|------------------------------------|-------------------|
| 24/7 LLM Training Cluster | $85,000                           | $59,500                            | $25,500           |
| 100-Hour Fine-Tuning Jobs | $12,500                           | $8,750                             | $3,750            |

What this means:

  • For teams running a large, always-on training cluster, monthly costs could decrease by over $25,000.
  • Even smaller, intermittent fine-tuning workloads benefit from substantial savings.

When you multiply these savings across multiple teams, projects, or regions, the total cost reduction becomes significant. This creates opportunities to invest in additional GPU resources, expand experimentation, or accelerate AI development.

Who Benefits Most?

  • Startups and Small Teams: Start experimenting with Spot Instances to test AI workloads cost-effectively. This allows rapid iteration with minimal upfront investment.
  • Large Enterprises: Schedule a detailed cost analysis of your current GPU usage. Explore migrating workloads to the newly discounted P4, P5, or P6-B200 instances and consider committing to Savings Plans for predictable long-term budgeting.
  • Research Institutions: Calculate potential savings to scale your experiments or increase training job frequency. Use the lower prices to explore more complex models or run additional trials within existing budgets.

Turning Cost Reductions into Practical Improvements

The recent price reductions for AWS GPU instances create new opportunities for efficiency, but realizing these benefits requires a structured implementation plan. Organizations can reduce costs and improve performance by aligning infrastructure changes with technical and business goals.

Step 1: Review Current Usage Patterns

Begin by analyzing current GPU usage. Many organizations continue to use P3 instances, which now cost more than newer P4 and P5 options. Migrating to newer instance families can lead to immediate savings of 15–25%.

Also, assess whether any workloads currently running on general-purpose instances would benefit from GPU acceleration. At the new price levels, GPUs may now be more cost-effective for development, testing, or model training.

Step 2: Plan and Execute Migration

Before migrating, test workloads on newer instance families to confirm compatibility with CUDA versions and memory requirements. For critical systems, consider using blue-green deployments to reduce the risk of downtime during transition.

It is also advisable to evaluate Savings Plans. A one-year plan based on baseline GPU usage can reduce costs further, especially for workloads that run continuously.

Step 3: Expand Capabilities Strategically

Once the migration is complete, review projects that were previously delayed or excluded due to infrastructure costs. Reduced GPU pricing may make advanced training workloads or larger model development more practical.

For high-performance use cases, such as training large language models, P6-B200 instances should also be considered.

Breaking Down Real-World Savings

To make these changes concrete, let's examine how they affect different types of workloads using verified pricing data.

Example Calculation - p5.48xlarge Training Workload:

  • Previous hourly rate: $3.8592
  • New rate after 44% reduction: ~$2.16
  • For a 500-hour training job: $1,929 vs $1,080 = $849 saved per job

Scaling Impact: Organizations running multiple training cycles can see substantial cumulative savings. A research team conducting 10 similar training jobs annually would save approximately $8,490—enough to fund additional experiments or infrastructure improvements.

Regional Expansion Benefits: AWS is also making at-scale On-Demand capacity available in new regions:

  • P4d instances: Asia Pacific (Seoul), Asia Pacific (Sydney), Canada (Central), and Europe (London)
  • P4de instances: US East (N. Virginia)
  • P5 instances: Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Jakarta), and South America (São Paulo)
  • P5en instances: Asia Pacific (Mumbai), Asia Pacific (Tokyo), and Asia Pacific (Jakarta)

This geographic expansion means organizations can optimize costs while keeping workloads closer to their user base or compliance requirements.

Competitive Landscape 

This pricing update reflects the increased competition for AI infrastructure services, where AWS faces growing pressure from other major cloud providers:

  • Google Cloud Position: Google's TPU v5p instances offer strong alternatives for specific workloads, particularly transformer training, often at 40-50% lower costs than comparable GPU instances. Their recent Vertex AI pricing updates and committed use discounts have pushed AWS to respond with competitive pricing.
  • Microsoft Azure Strategy: Azure's partnership with OpenAI and integrated AI services creates a comprehensive ecosystem. Their recent expansion of H100 and A100 capacity, combined with Azure Machine Learning credits, has attracted enterprise customers who want integrated AI workflows.
  • AWS Competitive Advantage: While competitors focus on performance-per-dollar metrics, AWS maintains strength in operational flexibility. The combination of Spot Instances, Savings Plans, and Reserved Instances creates pricing models that adapt to different workload patterns, especially valuable for organizations with both experimental and production workloads.

Market Development: These price cuts align with the AI infrastructure market's growth. Early adopters who tested expensive on-demand instances now need predictable, cost-effective solutions for production deployments. AWS's pricing response addresses this shift from experimentation to operational scale.

Key Considerations and Limitations

While the pricing changes provide clear benefits, they also introduce new factors that must be considered before making infrastructure changes.

  1. Regional Pricing Variation

    Not all regions received equal price reductions. For instance, US East regions may show a 44% decrease in P5 pricing, while Asia Pacific regions may see a 35–40% decrease. These differences can influence deployment strategy across regions.

  2. Capacity Constraints

    Demand for newer GPU instances remains high. P5 and P6-B200 instances may not always be available during peak usage periods. Organizations should prepare backup deployment plans with alternative instance types or secondary regions.

  3. Spot Instance Limitations

    Spot Instances offer significant cost savings, but they are interruptible. Interruption rates can range from 5–15%, depending on the region and time of day. To use Spot Instances effectively, applications must support checkpointing and automatic recovery.

  4. Savings Plan Commitment Requirements

    Savings Plans are only effective when usage is stable and predictable. If an organization’s GPU needs vary significantly, it is important to model usage scenarios before committing to a one- or three-year plan.

  5. Workload Suitability

    Newer instance types may not improve performance for all workloads. Memory-bound or bandwidth-limited applications may not benefit from switching to P4 or P5. Testing and benchmarking are recommended before migration.

  6. Migration Costs

    Migrating to new instance types may require code adjustments, software updates, and reconfiguration. These costs should be evaluated and included in any return-on-investment calculation.
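For the commitment question in particular (point 4), it helps to simulate a Savings Plan against a few months of variable usage before signing. A simplified sketch, with the hourly commitment modeled as committed hours and all figures illustrative:

```python
def plan_vs_on_demand(monthly_hours, od_rate, sp_rate, committed_hours):
    """Total cost under a Savings Plan commitment vs pure On-Demand,
    for a list of variable monthly usage hours. Committed hours are
    paid every month regardless; overflow hours spill to On-Demand."""
    on_demand = sum(h * od_rate for h in monthly_hours)
    plan = sum(committed_hours * sp_rate + max(0, h - committed_hours) * od_rate
               for h in monthly_hours)
    return on_demand, plan

# Illustrative: spiky usage, $3.59/hr OD vs $2.20/hr committed, 300h/month commit.
usage = [500, 100, 450, 50, 600, 200]
od, plan = plan_vs_on_demand(usage, 3.59, 2.20, 300)
print(f"On-Demand: ${od:,.0f}   with plan: ${plan:,.0f}")
```

Running this across optimistic and pessimistic usage scenarios shows quickly whether a given commitment level is safe or would leave committed hours idle.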

Your Next Steps: Turning Savings Into Strategy

With these changes now available, organizations should take specific actions to capitalize on the improved economics:

Immediate Actions:

  • Audit current GPU usage to identify instances running on older, more expensive families like P3 that could be migrated to P4 or P5 for immediate savings
  • Evaluate Savings Plan opportunities, particularly for the newly eligible P6-B200 instances, if your workloads require maximum performance
  • Reassess postponed AI projects that were previously cost-prohibitive but might now fit within budget constraints

Strategic Planning:

  • Expand experimentation capacity by reallocating savings toward additional model training and testing
  • Consider spot instance strategies for fault-tolerant workloads, which become even more cost-effective with lower baseline pricing
  • Review infrastructure roadmaps to incorporate higher-performance instances that were previously outside budget consideration

Long-term Positioning: Organizations should view these changes not just as cost-saving opportunities but as enablers of more ambitious AI strategies. The reduced barrier to advanced GPU computing creates space for innovation that was previously constrained by infrastructure economics.

The Window of Opportunity

AWS's official announcement of up to 45% price reductions for GPU-accelerated instances represents more than a pricing update—it's a fundamental shift in AI infrastructure accessibility. The combination of significant price reductions, expanded Savings Plan eligibility for P6-B200 instances, and increased regional availability creates a unique window for organizations to advance their AI capabilities while optimizing costs.

Tags
CloudOptimo, EC2, machine learning, Savings Plans, High Performance Computing, GPU Instances, deep learning, GPU Cost Savings