Gemini API Pricing Calculator

Estimate your monthly costs for Google's Gemini models. Account for input tokens, output tokens, and context length.

FAQ

Gemini API Pricing FAQs

Got questions? We've got answers. Here are the most common questions about Gemini API pricing.

What is the difference between Gemini Flash and Pro?

Flash is optimized for speed and cost-efficiency, making it ideal for high-volume tasks. Pro is a larger model designed for complex reasoning and handling massive context windows (up to 2M tokens).

Complete Guide to Gemini API Costs

Google's Gemini API offers a flexible pricing model based on token usage. The cost depends on the model you choose (e.g., the cost-effective Flash or the powerful Pro) and the length of your context window. This is particularly relevant for businesses using AI Voice Agents that process large amounts of audio data.

Key Takeaways

  • Gemini 3.0 & 2.5 Series bring new levels of multimodal reasoning and speed.
  • Flash-Lite Models offer extreme cost efficiency for high-volume, simple tasks.
  • Context Caching dramatically reduces input costs for repeated large prompts across most models.

Gemini API Pricing Overview

Model                     | Input Price ($/1M) | Output Price ($/1M) | Cached Input ($/1M)

Multimodal Models (Text, Image, Video, Audio)
Gemini 3.0 Pro            | $2.00              | $12.00              | $0.20
Gemini 3.0 Flash          | $0.50              | $3.00               | $0.05
Gemini 2.5 Pro            | $1.25              | $10.00              | $0.31
Gemini 2.5 Flash          | $0.30              | $2.50               | $0.075
Gemini 2.5 Flash-Lite     | $0.10              | $0.40               | $0.01
Gemini 2.0 Pro            | $1.25              | $5.00               | $0.31
Gemini 2.0 Flash          | $0.10              | $0.40               | $0.025
Gemini 2.0 Flash-Lite     | $0.075             | $0.30               | $0.019
Gemini 1.5 Pro            | $1.25              | $5.00               | $0.31
Gemini 1.5 Flash          | $0.075             | $0.30               | $0.018

Embedding Models
Text Embedding 004        | $0.025             | --                  | --
Multimodal Embedding 001  | $0.0002 / image    | --                  | --
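The arithmetic behind the table is simple: cost = (tokens ÷ 1,000,000) × price per 1M tokens, summed over input, output, and cached input. A minimal sketch, using rates from the table above (the price dictionary below is copied from this page for illustration, not an official rate card):

```python
# Estimate Gemini API cost for a billing period from token counts.
# Prices are $ per 1M tokens, taken from the table above.
PRICES = {
    # model: (input $/1M, output $/1M, cached input $/1M)
    "gemini-3.0-pro":   (2.00, 12.00, 0.20),
    "gemini-2.5-pro":   (1.25, 10.00, 0.31),
    "gemini-2.5-flash": (0.30, 2.50, 0.075),
}

def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated cost in dollars for the given token usage."""
    in_price, out_price, cache_price = PRICES[model]
    uncached = input_tokens - cached_tokens  # tokens billed at full input rate
    return (
        uncached / 1e6 * in_price
        + cached_tokens / 1e6 * cache_price
        + output_tokens / 1e6 * out_price
    )

# Example: 10M input tokens and 2M output tokens on Gemini 2.5 Flash.
print(f"${estimate_cost('gemini-2.5-flash', 10_000_000, 2_000_000):.2f}")
```

Plugging in the 2.5 Flash rates gives 10 × $0.30 + 2 × $2.50 = $8.00 for that workload, which matches the output.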

Gemini 3.0 Pro: Multimodal Mastery

The most capable multimodal model from Google to date. Designed for complex reasoning, advanced coding, and nuanced instruction following, it sets a new standard for AI performance.

Gemini 3.0 Flash: Extreme Speed

Engineered for extreme speed and efficiency at scale. It offers significantly lower latency and cost while maintaining high performance for high-volume multimodal tasks.

Gemini 2.5 Pro: Balanced Reasoning

A significant upgrade in reasoning capabilities, striking an optimal balance between performance and cost for production-grade applications requiring deep analysis.

Gemini 2.5 Flash: High-Frequency Performance

Optimized for high-frequency tasks where speed is critical, now with improved multimodal understanding capabilities compared to previous generations.

Gemini 2.5 Flash-Lite: Cost-Effective Scaling

The most cost-effective solution for simple, high-volume tasks like text classification, data extraction, and basic chatbots.

Gemini 2.0 Pro: Enterprise Reliability

A robust workhorse for enterprise applications, offering reliable performance and stability for a wide range of complex business workflows.

Gemini 2.0 Flash: Proven Speed

The previous generation's speed champion, remaining a highly viable and cost-efficient option for cost-sensitive, real-time applications.

Gemini 2.0 Flash-Lite: Lightweight Logic

An ultra-lightweight model designed for extreme efficiency in low-resource environments, perfect for simple logic and fast responses.

Gemini 1.5 Pro: The Powerhouse

Gemini 1.5 Pro offers a massive 2 million token context window, making it perfect for analyzing large documents, codebases, or long videos. It's a key component in process optimization tasks.

Gemini 1.5 Flash: Speed and Efficiency

Gemini 1.5 Flash is designed for high-frequency, low-latency tasks. It's incredibly affordable and fast, suitable for real-time applications.

Key Pricing Factors

  • Model Variety: Choose from a wide range of models (Lite to 3.0 Pro) to balance cost and capability perfectly for your specific use case.
  • Context Caching: Utilizing cached tokens for repetitive inputs can slash input costs by roughly 75-90% depending on the model, especially valuable for long-context applications.
  • Input vs. Output: Output tokens are significantly more expensive than input tokens (typically 4x-8x). Crafting prompts for concise answers is a key cost-saving strategy.
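To make the caching factor above concrete, here's a quick sketch of cached vs. uncached input cost for a large prompt that is resent many times. Prices come from the table above; note that real context caching may also incur a separate cache storage fee not shown in that table:

```python
# Cached vs. uncached input cost for a repeated 500K-token prompt
# sent 100 times. Prices ($/1M tokens) are from the table above.
INPUT_PRICE = 1.25    # Gemini 2.5 Pro, standard input
CACHED_PRICE = 0.31   # Gemini 2.5 Pro, cached input

tokens = 500_000 * 100                        # 50M input tokens total
uncached_cost = tokens / 1e6 * INPUT_PRICE    # full price every time
cached_cost = tokens / 1e6 * CACHED_PRICE     # cached price every time
savings = 1 - cached_cost / uncached_cost

print(f"${uncached_cost:.2f} -> ${cached_cost:.2f} ({savings:.0%} saved)")
```

For Gemini 2.5 Pro this works out to $62.50 uncached vs. $15.50 cached, about a 75% saving; the 3.0-series models in the table show an even steeper (roughly 90%) cached discount.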

How to Optimize Costs

To keep your Gemini API bills low, consider using Context Caching for repeated content, which offers a significant discount (75-90% cheaper for cached inputs, depending on the model). Also, use a Flash model for simpler tasks where the reasoning capabilities of Pro aren't strictly necessary.
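As a rough illustration of the "use Flash for simpler tasks" advice, the same workload can be priced on both tiers using the rates from the table above (figures are illustrative only):

```python
# Price one workload on Gemini 2.5 Pro vs. Gemini 2.5 Flash.
# Prices are $ per 1M tokens, from the table above.
WORKLOAD = {"input_tokens": 5_000_000, "output_tokens": 1_000_000}

def cost(in_price, out_price, input_tokens, output_tokens):
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

pro_cost = cost(1.25, 10.00, **WORKLOAD)    # Gemini 2.5 Pro rates
flash_cost = cost(0.30, 2.50, **WORKLOAD)   # Gemini 2.5 Flash rates

print(f"Pro: ${pro_cost:.2f}, Flash: ${flash_cost:.2f}, "
      f"ratio: {pro_cost / flash_cost:.1f}x")
```

At these rates the workload costs $16.25 on 2.5 Pro versus $4.00 on 2.5 Flash, roughly a 4x difference, so routing only the genuinely hard requests to Pro is one of the simplest levers for cutting the bill.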