
Gemini 2.0 Flash and Flash-Lite models will be retired on June 1, 2026. To avoid service disruption, update to a newer model like gemini-2.5-flash-lite.

Rate limits and quotas

The provider-specific content on this page applies to the Vertex AI Gemini API.

Rate limits (commonly called quotas) regulate the number of requests you can make to the Gemini API within a given timeframe. These limits help ensure fair usage, protect against abuse, and maintain system performance for all users.

When you use Firebase AI Logic to send requests to Gemini and Imagen models, your project's rate limits depend on your chosen "Gemini API" provider. Firebase AI Logic also provides a way to set "per user" rate limits.

How rate limits (quotas) work

When you use the Vertex AI Gemini API with Gemini models, your requests are served as long as Vertex AI capacity is available, meaning there is no preset rate limit (quota). Instead, these models use dynamic shared quota (DSQ), which serves incoming requests by distributing available capacity among all customers using that specific model and region.

If capacity is exhausted, you'll get a 429 error with the message "Vertex AI is overloaded. Please try again later."
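Because DSQ capacity can be temporarily exhausted, a common client-side mitigation is to retry 429 responses with exponential backoff and jitter. The sketch below shows that pattern in a generic form. The helper names (`withRetryOn429`, `isRateLimited`, `backoffCapMs`) are hypothetical, and it assumes the thrown error exposes a numeric `status` field; adapt `isRateLimited` to however your SDK actually reports HTTP status codes.

```typescript
// Sketch only: retries an async call when it fails with HTTP 429, using
// exponential backoff with full jitter. Helper names are hypothetical, and the
// assumption that errors carry a numeric `status` field must be adapted to your SDK.

interface BackoffOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // upper bound of the wait before the first retry
  maxDelayMs: number; // upper bound of any single wait
}

const DEFAULT_BACKOFF: BackoffOptions = { maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 8000 };

// Upper bound of the wait before retry number `attempt` (0-based): doubles each time, capped.
function backoffCapMs(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
}

// Assumption: a rate-limited failure is an object whose `status` is 429.
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

async function withRetryOn429<T>(
  fn: () => Promise<T>,
  opts: BackoffOptions = DEFAULT_BACKOFF,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Rethrow anything that isn't a 429, or once the attempt budget is spent.
      if (!isRateLimited(err) || attempt + 1 >= opts.maxAttempts) throw err;
      // Full jitter: wait a random time in [0, cap) so clients don't retry in lockstep.
      await sleep(Math.random() * backoffCapMs(attempt, opts));
    }
  }
}
```

For example, you might wrap a generate-content call as `withRetryOn429(() => model.generateContent(prompt))`, where `model` is a generative model instance from your SDK. Keeping `maxAttempts` small matters here: when shared capacity is exhausted, aggressive retries from many clients only add load.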

Multimodal requests to Gemini models

Multimodal requests to Gemini models are subject to system rate limits, such as tokens per minute (TPM), for each input type: images, audio, video, and documents (like PDFs).

These limits are not adjustable. If you exceed the quota, then you'll get a 429 quota-exceeded error.

Request a rate limit (quota) increase

With DSQ, you don't need to submit a quota increase request (QIR) when your traffic grows, because limits are based on overall capacity rather than on your specific project's quota. If you want to help ensure high availability for your app and get predictable service levels for your production workloads, consider setting up provisioned throughput.

Set "per user" rate limits

To use Firebase AI Logic, your project needs both your chosen Gemini API provider and the Firebase AI Logic API enabled. The Firebase AI Logic API acts as a gateway between our client SDKs and your Gemini API provider, and it's enabled for you when you initially set up Firebase AI Logic in your Firebase project.

You can use the Firebase AI Logic API rate limit (quota) as a "per user" rate limit for the AI features in your app that rely on Firebase AI Logic. Set this limit high enough to accommodate a single user's reasonable use of your AI features, but low enough that no single user can overwhelm the limits of your Gemini API provider, which are shared by all your users.

Details about the "per user" rate limit

Here are some important details about the Firebase AI Logic API rate limit (quota) for requests per minute (RPM):

It counts "Generate content requests" per user, per region, per minute; it isn't tracked per model.
It applies to all your users. Currently, there isn't a way to set a rate limit for a specific user or group of users.*
It applies at the project level, to all apps and IP addresses that use that Firebase project.
It applies to every call made from any Firebase AI Logic SDK.
The default rate limit is 100 RPM per user.
Note that you still need to consider the limits for your Gemini API provider (see above), which take precedence over the Firebase AI Logic API rate limit.

* If you're using the Vertex AI Gemini API and your app directs users to different regions (for example, using Firebase Remote Config), then you could set a specific rate limit for users in a specific region.
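To make the counting rules concrete, here is a minimal client-side sketch that mirrors the "per user" quota as described: one bucket per (user, region) pair, not per model, with the documented default of 100 requests per minute. This is an assumption-laden illustration, not how Firebase enforces the quota. The real limit is applied server-side, and the `PerUserRateLimiter` class and its rolling 60-second window are hypothetical choices that may not match the server's exact windowing.

```typescript
// Sketch only: a hypothetical client-side rolling-window limiter keyed by
// (user, region), mirroring the documented "per user" quota (per user, per
// region, per minute, not per model). Firebase enforces the real quota
// server-side; this only lets an app hold back requests before hitting 429.

class PerUserRateLimiter {
  private readonly limitPerMinute: number;
  private readonly windowMs: number;
  // `${userId}|${region}` -> timestamps (ms) of requests still inside the window.
  private readonly history = new Map<string, number[]>();

  // Defaults to 100 requests per 60-second window, matching the documented default RPM.
  constructor(limitPerMinute = 100, windowMs = 60_000) {
    this.limitPerMinute = limitPerMinute;
    this.windowMs = windowMs;
  }

  // Returns true (and records the request) if it fits in the window; false otherwise.
  tryAcquire(userId: string, region: string, nowMs: number = Date.now()): boolean {
    // The model name is deliberately not part of the key: the quota isn't per model.
    const key = `${userId}|${region}`;
    // Drop timestamps that have slid out of the rolling window.
    const recent = (this.history.get(key) ?? []).filter((t) => nowMs - t < this.windowMs);
    const allowed = recent.length < this.limitPerMinute;
    if (allowed) recent.push(nowMs);
    this.history.set(key, recent);
    return allowed;
  }
}
```

For example, an app could call `limiter.tryAcquire(uid, region)` before each generate-content request and show a "try again shortly" message when it returns false. Because the server-side quota applies at the project level regardless, a check like this only smooths the user experience; it doesn't replace handling 429 errors.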

Adjust the "per user" rate limit

To adjust a rate limit (quota), you must have the serviceusage.quotas.update permission, which is included by default in the Owner and Editor roles.

Here's how to edit your rate limit (quota) or request an increase:

1. In the Google Cloud console, go to the page for the Firebase AI Logic API.
2. Click Manage.
3. Lower on the page, click the Quotas & System Limits tab.
4. Filter the table to show the quotas of interest, such as the capability (requests for generating content) and region.

   For example, to view the per-user quotas for generating content requests in any of the supported Asian regions, your filter would look similar to this: Generate content requests + Dimension:region:asia

   Note: To create a Dimension filter, use the filter tooling rather than copy-pasting the values from the example above. Also, the (default) quota row doesn't apply to Firebase AI Logic.
5. Select the checkbox to the left of each quota of interest.
6. At the end of the quota's row, click the three-dot menu (⋮), and then select Edit quota.
7. In the Quota changes form, do the following:
   1. Enter the increased quota in the New value field.

      This quota applies at the project level and is shared across all apps and IP addresses that use that Firebase project.
   2. Complete any additional fields in the form, and then click Done.
   3. Click Submit request.