Gemini API Pricing Calculator

Estimate your monthly costs for Google's Gemini models. Account for input tokens, output tokens, and context length.

FAQ

Gemini API Pricing FAQs

Got questions? We've got answers. Here are the most common questions about Gemini API pricing.

What is the difference between Gemini Flash and Pro?

Flash is optimized for speed and cost-efficiency, making it ideal for high-volume tasks. Pro is a larger model designed for complex reasoning and handling massive context windows (up to 2M tokens).

Complete Guide to Gemini API Costs

Google's Gemini API offers a flexible pricing model based on token usage. The cost depends on the model you choose (e.g., the cost-effective Flash or the powerful Pro) and the length of your context window. This is particularly relevant for businesses using AI Voice Agents that process large amounts of audio data.

Key Takeaways

  • Gemini 3.0 & 2.5 Series bring new levels of multimodal reasoning and speed.
  • Flash-Lite Models offer extreme cost efficiency for high-volume, simple tasks.
  • Context Caching dramatically reduces input costs for repeated large prompts across most models.

Gemini API Pricing Overview

Model                     | Input Price ($/1M) | Output Price ($/1M) | Cached Input ($/1M)

Multimodal Models (Text, Image, Video, Audio)
Gemini 3.0 Pro            | $2.00              | $12.00              | $0.20
Gemini 3.0 Flash          | $0.50              | $3.00               | $0.05
Gemini 2.5 Pro            | $1.25              | $10.00              | $0.31
Gemini 2.5 Flash          | $0.30              | $2.50               | $0.075
Gemini 2.5 Flash-Lite     | $0.10              | $0.40               | $0.01
Gemini 2.0 Pro            | $1.25              | $5.00               | $0.31
Gemini 2.0 Flash          | $0.10              | $0.40               | $0.025
Gemini 2.0 Flash-Lite     | $0.075             | $0.30               | $0.019
Gemini 1.5 Pro            | $1.25              | $5.00               | $0.31
Gemini 1.5 Flash          | $0.075             | $0.30               | $0.018

Embedding Models
Text Embedding 004        | $0.025             | --                  | --
Multimodal Embedding 001  | $0.0002 / image    | --                  | --
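The arithmetic behind the table is simple: cost = (tokens ÷ 1,000,000) × price per 1M tokens, summed over input, output, and cached input. A minimal sketch, using rates from the table above (the price dictionary below is copied from this page for illustration, not an official rate card):

```python
# Estimate Gemini API cost for a billing period from token counts.
# Prices are $ per 1M tokens, taken from the table above.
PRICES = {
    # model: (input $/1M, output $/1M, cached input $/1M)
    "gemini-3.0-pro":   (2.00, 12.00, 0.20),
    "gemini-2.5-pro":   (1.25, 10.00, 0.31),
    "gemini-2.5-flash": (0.30, 2.50, 0.075),
}

def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated cost in dollars for the given token usage."""
    in_price, out_price, cache_price = PRICES[model]
    uncached = input_tokens - cached_tokens  # tokens billed at full input rate
    return (
        uncached / 1e6 * in_price
        + cached_tokens / 1e6 * cache_price
        + output_tokens / 1e6 * out_price
    )

# Example: 10M input tokens and 2M output tokens on Gemini 2.5 Flash.
print(f"${estimate_cost('gemini-2.5-flash', 10_000_000, 2_000_000):.2f}")
```

Plugging in the 2.5 Flash rates gives 10 × $0.30 + 2 × $2.50 = $8.00 for that workload, which matches the output.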

Gemini 3.0 Pro: Multimodal Mastery

The most capable multimodal model from Google to date. Designed for complex reasoning, advanced coding, and nuanced instruction following, it sets a new standard for AI performance.

Gemini 3.0 Flash: Extreme Speed

Engineered for extreme speed and efficiency at scale. It offers significantly lower latency and cost while maintaining high performance for high-volume multimodal tasks.

Gemini 2.5 Pro: Balanced Reasoning

A significant upgrade in reasoning capabilities, striking an optimal balance between performance and cost for production-grade applications requiring deep analysis.

Gemini 2.5 Flash: High-Frequency Performance

Optimized for high-frequency tasks where speed is critical, now with improved multimodal understanding capabilities compared to previous generations.

Gemini 2.5 Flash-Lite: Cost-Effective Scaling

The most cost-effective solution for simple, high-volume tasks like text classification, data extraction, and basic chatbots.

Gemini 2.0 Pro: Enterprise Reliability

A robust workhorse for enterprise applications, offering reliable performance and stability for a wide range of complex business workflows.

Gemini 2.0 Flash: Proven Speed

The previous generation's speed champion, remaining a highly viable and cost-efficient option for cost-sensitive, real-time applications.

Gemini 2.0 Flash-Lite: Lightweight Logic

An ultra-lightweight model designed for extreme efficiency in low-resource environments, perfect for simple logic and fast responses.

Gemini 1.5 Pro: The Powerhouse

Gemini 1.5 Pro offers a massive 2 million token context window, making it perfect for analyzing large documents, codebases, or long videos. It's a key component in process optimization tasks.

Gemini 1.5 Flash: Speed and Efficiency

Gemini 1.5 Flash is designed for high-frequency, low-latency tasks. It's incredibly affordable and fast, suitable for real-time applications.

Key Pricing Factors

  • Model Variety: Choose from a wide range of models (Lite to 3.0 Pro) to balance cost and capability perfectly for your specific use case.
  • Context Caching: Utilizing cached tokens for repetitive inputs can slash input costs by roughly 75-90% depending on the model, especially valuable for long-context applications.
  • Input vs. Output: Output tokens are significantly more expensive than input tokens (typically 4x-8x). Crafting prompts for concise answers is a key cost-saving strategy.
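To make the caching factor above concrete, here's a quick sketch of cached vs. uncached input cost for a large prompt that is resent many times. Prices come from the table above; note that real context caching may also incur a separate cache storage fee not shown in that table:

```python
# Cached vs. uncached input cost for a repeated 500K-token prompt
# sent 100 times. Prices ($/1M tokens) are from the table above.
INPUT_PRICE = 1.25    # Gemini 2.5 Pro, standard input
CACHED_PRICE = 0.31   # Gemini 2.5 Pro, cached input

tokens = 500_000 * 100                        # 50M input tokens total
uncached_cost = tokens / 1e6 * INPUT_PRICE    # full price every time
cached_cost = tokens / 1e6 * CACHED_PRICE     # cached price every time
savings = 1 - cached_cost / uncached_cost

print(f"${uncached_cost:.2f} -> ${cached_cost:.2f} ({savings:.0%} saved)")
```

For Gemini 2.5 Pro this works out to $62.50 uncached vs. $15.50 cached, about a 75% saving; the 3.0-series models in the table show an even steeper (roughly 90%) cached discount.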

How to Optimize Costs

To keep your Gemini API bills low, consider using Context Caching for repeated content, which offers a significant discount (75-90% cheaper for cached inputs, depending on the model). Also, use a Flash model for simpler tasks where the reasoning capabilities of Pro aren't strictly necessary.
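As a rough illustration of the "use Flash for simpler tasks" advice, the same workload can be priced on both tiers using the rates from the table above (figures are illustrative only):

```python
# Price one workload on Gemini 2.5 Pro vs. Gemini 2.5 Flash.
# Prices are $ per 1M tokens, from the table above.
WORKLOAD = {"input_tokens": 5_000_000, "output_tokens": 1_000_000}

def cost(in_price, out_price, input_tokens, output_tokens):
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

pro_cost = cost(1.25, 10.00, **WORKLOAD)    # Gemini 2.5 Pro rates
flash_cost = cost(0.30, 2.50, **WORKLOAD)   # Gemini 2.5 Flash rates

print(f"Pro: ${pro_cost:.2f}, Flash: ${flash_cost:.2f}, "
      f"ratio: {pro_cost / flash_cost:.1f}x")
```

At these rates the workload costs $16.25 on 2.5 Pro versus $4.00 on 2.5 Flash, roughly a 4x difference, so routing only the genuinely hard requests to Pro is one of the simplest levers for cutting the bill.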