Gemini 3.5 Flash
Gemini 3.5 Flash advances the Flash line with improved coding proficiency, parallel agentic execution, stronger core reasoning, tighter instruction following, and higher-quality reasoning traces in thinking mode.
import { streamText } from 'ai'
const result = streamText({ model: 'google/gemini-3.5-flash', prompt: 'Why is the sky blue?' })
Playground
Try out Gemini 3.5 Flash by Google. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
About Gemini 3.5 Flash
Gemini 3.5 Flash is Google's update to the Flash tier, building on Gemini 3 Flash with focused improvements for coding workflows and agentic execution. Coding proficiency and parallel agentic execution loops both improve over previous Flash versions, which makes Gemini 3.5 Flash a better fit for agents that issue concurrent tool calls or refactor code across multiple files in one pass.
Core reasoning, instruction following, and multi-turn coherence all see upgrades. For complex tasks the model produces higher-quality reasoning traces in thinking mode, which is useful when you need to audit the model's intermediate steps or train downstream systems on chain-of-thought data. Gemini 3.5 Flash defaults to the medium thinking level, balancing quality against faster, more cost-efficient generation, and exposes thinkingLevel and includeThoughts through providerOptions for finer control.
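The thinking controls described above can be sketched as a small helper that builds the providerOptions object. This is a minimal sketch under stated assumptions: the helper name buildThinkingOptions and its types are hypothetical, only the two levels named on this page ('medium' and 'high') are modeled, and defaulting includeThoughts to false is a choice of this sketch rather than documented behavior.

```typescript
// Hypothetical helper (not part of the AI SDK): builds the providerOptions
// shape described in the text. Only 'medium' (the stated default) and 'high'
// are modeled because those are the only levels this page names.
type ThinkingLevel = 'medium' | 'high'

interface ThinkingProviderOptions {
  google: {
    thinkingConfig: {
      thinkingLevel: ThinkingLevel
      includeThoughts: boolean
    }
  }
}

function buildThinkingOptions(
  thinkingLevel: ThinkingLevel = 'medium',
  includeThoughts = false, // assumption of this sketch, not a documented default
): ThinkingProviderOptions {
  return { google: { thinkingConfig: { thinkingLevel, includeThoughts } } }
}

// Request deeper reasoning and ask for the intermediate steps to be surfaced.
const thinkingOptions = buildThinkingOptions('high', true)

// Assumed usage with the AI SDK (left as a comment so the sketch has no dependencies):
// streamText({ model: 'google/gemini-3.5-flash', prompt, providerOptions: thinkingOptions })
```

Keeping the options in one typed helper makes it harder to misspell a nested key, which would otherwise go unnoticed in a loosely typed options object.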
Because Gemini 3.5 Flash sits at the intersection of agentic capability and Flash-tier throughput, it suits production traffic patterns from low-latency chat interfaces to high-volume code-transformation pipelines. Accessing Gemini 3.5 Flash through AI Gateway adds observability, automatic retries, and provider failover without requiring a Google Cloud account.
What To Consider When Choosing a Provider
- Configuration: Gemini 3.5 Flash defaults to the medium thinking level. Set thinkingLevel to 'high' via providerOptions.google.thinkingConfig when the task demands deeper reasoning, and enable includeThoughts to surface the model's intermediate steps. Parameters like temperature, topP, topK, and thinking_budget are not supported.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
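Because the sampling parameters listed above are not supported, code shared across several models may need to drop them before calling Gemini 3.5 Flash. The following is a minimal sketch, assuming settings are held in a plain object; the helper name stripUnsupportedParams is hypothetical, and this page does not say whether unsupported parameters are ignored or rejected, so the sketch simply removes them and reports what it removed.

```typescript
// Parameter names this page lists as unsupported by Gemini 3.5 Flash.
const UNSUPPORTED_PARAMS = ['temperature', 'topP', 'topK', 'thinking_budget'] as const

// Hypothetical helper: returns a copy of the settings without the unsupported
// keys, plus the names of the keys that were dropped (useful for logging).
function stripUnsupportedParams(settings: Record<string, unknown>): {
  cleaned: Record<string, unknown>
  dropped: string[]
} {
  const cleaned: Record<string, unknown> = {}
  const dropped: string[] = []
  for (const [key, value] of Object.entries(settings)) {
    if ((UNSUPPORTED_PARAMS as readonly string[]).includes(key)) {
      dropped.push(key)
    } else {
      cleaned[key] = value
    }
  }
  return { cleaned, dropped }
}

// Example: settings written for another model that relied on sampling controls.
const { cleaned, dropped } = stripUnsupportedParams({
  prompt: 'Refactor this module.',
  temperature: 0.2,
  topK: 40,
})
// `cleaned` keeps only `prompt`; `dropped` names the two removed keys.
```

Returning the dropped keys rather than discarding them silently lets a caller log a warning during migration.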
When to Use Gemini 3.5 Flash
Best For
- Agentic coding workflows: Parallel tool calls and multi-file refactors benefit from improved coding proficiency and parallel execution
- Multi-turn assistants: Instruction following and conversational coherence drive task completion across long sessions
- Auditable agent pipelines: Higher-quality reasoning traces in thinking mode aid debugging and downstream chain-of-thought data
- Cost-sensitive production traffic: The default medium thinking level balances quality with throughput
- Gemini 3 Flash migrations: Teams on the previous Flash version get the latest coding and agentic improvements without changing tier
Consider Alternatives When
- Deepest reasoning required: Latency and cost are secondary (consider google/gemini-3-pro-preview or google/gemini-3.1-pro-preview)
- Native image generation needed: Image output is part of the task (consider google/gemini-3-pro-image or google/gemini-3.1-flash-image-preview)
- Throughput and budget dominate: Reasoning depth is not required (consider google/gemini-3.1-flash-lite-preview)
- Sampling parameters required: Your code depends on temperature, topP, topK, or thinking_budget, which Gemini 3.5 Flash does not support
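The guidance in the two lists above can be sketched as a simple routing function. This is an illustration only: the Requirements fields and the suggestModel name are hypothetical, the order in which the checks run is an assumption, and where the list offers two alternatives the sketch arbitrarily picks one of them.

```typescript
// Hypothetical requirements shape; the field names are illustrative only.
interface Requirements {
  needsImageOutput?: boolean
  needsDeepestReasoning?: boolean
  throughputAndBudgetDominate?: boolean
}

// Maps requirements to one of the model slugs named on this page.
// The priority order of the checks is an assumption of this sketch.
function suggestModel(req: Requirements): string {
  if (req.needsImageOutput) return 'google/gemini-3.1-flash-image-preview'
  if (req.needsDeepestReasoning) return 'google/gemini-3.1-pro-preview'
  if (req.throughputAndBudgetDominate) return 'google/gemini-3.1-flash-lite-preview'
  // Default: coding agents, multi-turn assistants, and general Flash-tier traffic.
  return 'google/gemini-3.5-flash'
}

const modelForAgent = suggestModel({})
const modelForImages = suggestModel({ needsImageOutput: true })
```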
Conclusion
Gemini 3.5 Flash is the right Flash-tier choice for teams building coding agents, parallel tool-using workflows, and instruction-heavy assistants on AI Gateway. It carries the Flash speed and cost profile while adding the coding, agentic, and reasoning-trace improvements that production teams have been asking for in the Flash line.
Frequently Asked Questions
What's new in Gemini 3.5 Flash versus Gemini 3 Flash?
Gemini 3.5 Flash improves coding proficiency and supports more reliable parallel agentic execution loops. Core reasoning, instruction following, and multi-turn coherence are all stronger, and thinking-mode outputs include higher-quality reasoning traces.
How do I control how much Gemini 3.5 Flash thinks before responding?
Set thinkingLevel (for example 'high') and includeThoughts: true under providerOptions.google.thinkingConfig when using the AI SDK or the Chat Completions, Responses, or Messages APIs. Gemini 3.5 Flash defaults to the medium level.
Which sampling parameters does Gemini 3.5 Flash support?
Gemini 3.5 Flash does not support temperature, topP, topK, or thinking_budget. If your application depends on those parameters, evaluate a different model before migrating production traffic.
Is Gemini 3.5 Flash suitable for agentic coding tasks?
Yes. Improved coding proficiency and parallel agentic execution make Gemini 3.5 Flash well-suited for refactoring services, running concurrent tool calls, and multi-step code transformation workflows where reliability across steps matters.
Does Gemini 3.5 Flash support streaming?
Yes. Use streamText from the AI SDK, or the Chat Completions, Responses, or Messages APIs, with model: 'google/gemini-3.5-flash' for streaming responses.
Do I need a Google Cloud account to use Gemini 3.5 Flash on AI Gateway?
No. AI Gateway manages provider authentication. Connect using a Vercel API key or OIDC token and AI Gateway handles routing to the underlying provider.
How does Zero Data Retention work with Gemini 3.5 Flash through AI Gateway?
Zero Data Retention is available for this model on direct gateway requests (BYOK is not included) and is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
When should I use Gemini 3.5 Flash versus Gemini 3.1 Pro?
Choose Gemini 3.5 Flash when Flash-tier latency and cost matter and the task fits within the Flash quality envelope. Choose Gemini 3.1 Pro for the deepest reasoning, long agentic sessions, or finance and spreadsheet workloads that benefit from pro-tier capability.