Qwen 3.7 Max

Qwen 3.7 Max is Alibaba's flagship agent-tuned model in the Qwen 3.7 line, with a context window of 991K tokens and an emphasis on long-horizon tool use, multi-file coding, and office workflow automation.

Reasoning • Tool Use • Vision (Image) • File Input • Implicit Caching

index.ts

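import { streamText } from 'ai'

// Stream a completion from Qwen 3.7 Max through AI Gateway.
const result = streamText({
  model: 'alibaba/qwen3.7-max',
  prompt: 'Why is the sky blue?'
})

// Print text chunks as they arrive.
for await (const textPart of result.textStream) {
  process.stdout.write(textPart)
}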

Playground

Try out Qwen 3.7 Max by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
alibaba	991K	2.7s	95tps	$1.25/M Free	$3.75/M Free	$0.25/M Free	$1.56/M Free	05/21/2026

Legal: Terms • Privacy
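
The latency and throughput figures in the provider row can be turned into a rough response-time estimate. The sketch below assumes the Latency column is time-to-first-token and that throughput stays constant for the whole response; both numbers are live metrics and will drift. The function name is illustrative.

```typescript
// Back-of-envelope streaming time from the provider row above.
// Assumptions: "Latency" is time-to-first-token, and tokens then arrive
// at a constant rate equal to the listed throughput.
const LATENCY_SECONDS = 2.7
const THROUGHPUT_TPS = 95

function estimateResponseSeconds(outputTokens: number): number {
  return LATENCY_SECONDS + outputTokens / THROUGHPUT_TPS
}
```

Under these assumptions, a 950-token answer takes about 2.7 + 950 / 95 = 12.7 seconds to finish streaming.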

More models by Alibaba

Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
	2.8s	55tps	$0.40/M Free	$1.60/M Free	$0.08/M Free	$0.5/M Free	06/01/2026
	1.1s	109tps	$0.50/M	$3.00/M	$0.1/M	$0.63/M	04/02/2026
	0.7s	255tps	$0.10/M	$0.40/M	$0.0/M	$0.13/M	02/24/2026
131K	0.3s	307tps	$0.10/M	$0.30/M	$0.14/M	—	04/01/2025
262K	0.3s	82tps	$0.07/M	$0.46/M			04/01/2025
41K	0.3s	52tps	$0.12/M	$0.24/M			04/01/2025

About Qwen 3.7 Max

Qwen 3.7 Max is the Max-tier release in the Qwen 3.7 generation, succeeding Qwen3.6-Max-Preview in Alibaba's closed-weight API line. The model is served through the alibaba provider with a context window of 991K tokens, which suits full-repository ingestion, long agent traces, and multi-document analysis without segmentation.
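
Before sending a whole repository, it can help to check roughly whether it fits in the window. The sketch below is only an estimate: it reads "991K" as 991,000 tokens and uses a crude 4-characters-per-token heuristic rather than the actual Qwen tokenizer. All names and the reserved-output default are hypothetical.

```typescript
// Rough pre-flight check: does a set of files fit in the context window?
// Assumptions (not from the model docs): "991K" means 991,000 tokens, and
// ~4 characters per token stands in for the real tokenizer.
const CONTEXT_WINDOW_TOKENS = 991_000
const CHARS_PER_TOKEN = 4

interface SourceFile {
  path: string
  content: string
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Returns the estimated total and whether it fits after reserving room
// for the model's output.
function fitsInContext(files: SourceFile[], reservedForOutput = 32_000) {
  const estimatedTokens = files.reduce(
    (sum, file) => sum + estimateTokens(file.content),
    0
  )
  return {
    estimatedTokens,
    fits: estimatedTokens + reservedForOutput <= CONTEXT_WINDOW_TOKENS,
  }
}
```

For a real limit check, replace the heuristic with token counts reported by the API, since code and non-English text can tokenize very differently.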

Alibaba describes Qwen 3.7 Max as an agent foundation. The model targets coding agents that plan and act across many turns, office and productivity tasks that route work through multi-agent orchestration, and long-horizon autonomous execution where the model must maintain coherent reasoning across hundreds of sequential tool calls. Reported improvements over Qwen3.6-Max-Preview concentrate in frontend prototyping and complex multi-file engineering work.

Like other Max-tier entries, Qwen 3.7 Max supports tool calling and structured outputs, with extended-thinking mode available for high-difficulty reasoning, scientific computation, and expert-level queries. The thinking budget can be tuned per request to balance depth of reasoning against latency and token spend. Qwen 3.7 Max is text-only; for vision input, the sibling Qwen3.7-Plus is the multimodal entry in the 3.7 lineup.
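
To make the tool-calling flow concrete, here is a minimal sketch of one tool definition and a local dispatcher, written against the OpenAI-style Chat Completions `tools` shape. The `read_file` tool, its arguments, the in-memory repository, and the helper names are all hypothetical; only the model slug comes from this page.

```typescript
// Sketch of a tool definition plus a local dispatcher. The request and
// tool-call shapes follow the OpenAI-style Chat Completions format; the
// tool itself is made up for illustration.
type ToolCall = { id: string; function: { name: string; arguments: string } }

const tools = [
  {
    type: 'function',
    function: {
      name: 'read_file',
      description: 'Read a file from the working repository',
      parameters: {
        type: 'object',
        properties: { path: { type: 'string' } },
        required: ['path'],
      },
    },
  },
]

// Hypothetical in-memory "repository" so the sketch runs without file I/O.
const repo: Record<string, string> = { 'README.md': '# demo' }

const handlers: Record<string, (args: any) => string> = {
  read_file: ({ path }) => repo[path] ?? `not found: ${path}`,
}

// Turn one tool call emitted by the model into the tool message sent back.
function runToolCall(call: ToolCall) {
  const handler = handlers[call.function.name]
  const args = JSON.parse(call.function.arguments)
  const content = handler ? handler(args) : `unknown tool: ${call.function.name}`
  return { role: 'tool' as const, tool_call_id: call.id, content }
}

// Request body for the next turn of the agent loop.
function buildRequest(messages: object[]) {
  return { model: 'alibaba/qwen3.7-max', messages, tools }
}
```

An agent loop would repeat this cycle: send `buildRequest(messages)`, append each `runToolCall` result to `messages`, and stop when the model returns a final answer with no further tool calls.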

You can integrate Qwen 3.7 Max through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python.
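
For the Chat Completions route, a plain `fetch` call is enough. The sketch below assumes the OpenAI-compatible base URL `https://ai-gateway.vercel.sh/v1` and an `AI_GATEWAY_API_KEY` environment variable; verify both against the AI Gateway documentation before relying on them. The helper names are illustrative.

```typescript
// Minimal Chat Completions call through AI Gateway using fetch.
// Assumptions: the base URL and the AI_GATEWAY_API_KEY variable name.
const BASE_URL = 'https://ai-gateway.vercel.sh/v1'

// Builds the URL and fetch options without sending anything.
function buildChatRequest(prompt: string, apiKey: string) {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'alibaba/qwen3.7-max',
        messages: [{ role: 'user', content: prompt }],
      }),
    },
  }
}

// Sends the request and returns the first choice's text.
async function ask(prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(
    prompt,
    process.env.AI_GATEWAY_API_KEY ?? ''
  )
  const res = await fetch(url, init)
  if (!res.ok) throw new Error(`Gateway request failed: ${res.status}`)
  const data = await res.json()
  return data.choices[0].message.content
}
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test without an API key.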

What To Consider When Choosing a Provider

Configuration: Agent workflows that chain hundreds of tool calls produce high output-token volume. Use the AI Gateway cost dashboard to monitor per-session spend and tune the thinking budget before running production traffic at scale.
Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
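
To get a feel for per-session spend before checking the dashboard, the listed rates can be applied to token counts. This is a sketch under assumptions: it uses the prices shown in the provider table on this page ($1.25/M input, $3.75/M output, $0.25/M cache read), treats thinking tokens as ordinary output tokens, and ignores cache-write charges. The type and function names are hypothetical.

```typescript
// Rough per-session cost estimate from the rates listed on this page.
// Assumptions: thinking tokens bill as output tokens, cached input bills
// at the cache-read rate, and cache writes are ignored.
const PRICE_PER_MILLION_USD = { input: 1.25, output: 3.75, cacheRead: 0.25 }

interface SessionUsage {
  inputTokens: number // total input tokens, including cached ones
  cachedInputTokens: number // portion of input served from cache
  outputTokens: number // final answer plus any thinking tokens
}

function estimateSessionCostUSD(usage: SessionUsage): number {
  const uncachedInput = usage.inputTokens - usage.cachedInputTokens
  const total =
    uncachedInput * PRICE_PER_MILLION_USD.input +
    usage.cachedInputTokens * PRICE_PER_MILLION_USD.cacheRead +
    usage.outputTokens * PRICE_PER_MILLION_USD.output
  return total / 1_000_000
}
```

With one million uncached input tokens and one million output tokens, the estimate is 1.25 + 3.75 = 5.00 USD. Actual billing may differ, so treat the dashboard as the source of truth.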

When to Use Qwen 3.7 Max

Best For

Long-Horizon Coding Agents: Sustained tool-calling sessions across many turns with planning, retries, and dead-end recovery
Multi-File Software Engineering: Refactoring, diff editing, and frontend prototyping across a repository
Office Workflow Automation: Routing productivity tasks through multi-agent orchestration
Expert Reasoning Tasks: Scientific computation, mathematics, and structured analysis with extended-thinking mode
Repository Ingestion: Long-context workloads using the window of 991K tokens for full codebases and tool traces

Consider Alternatives When

Vision Or Multimodal Input: Qwen3.7-Plus is the multimodal entry in the 3.7 line when image inputs are needed
Latency-Sensitive Pipelines: A Plus or Flash-tier model serves users better when extended-thinking traces add unnecessary overhead
Strict Token Budgets: A smaller model is a closer fit when per-session spend on a flagship Max model isn't justified
Built-In Autonomous Search: Qwen3-Max-Thinking is a stronger fit when integrated search and code interpreter tools are required

Conclusion

Qwen 3.7 Max extends the Qwen Max tier with an agent-first design that targets long-horizon tool use, multi-file coding, and office workflow automation. Routing through AI Gateway gives you a single integration surface, provider failover, and consolidated billing while you build against the latest generation in the Max line.

Frequently Asked Questions

What is the relationship between Qwen 3.7 Max and Qwen3.6-Max-Preview?
Qwen 3.7 Max is the Max-tier flagship in the Qwen 3.7 generation, succeeding Qwen3.6-Max-Preview. Alibaba positions Qwen 3.7 Max as an agent foundation with improvements in long-horizon tool use, multi-file coding, and office workflow automation.
What is the context window for Qwen 3.7 Max?
The context window is 991K tokens. This supports full-repository ingestion, long agent traces with hundreds of sequential tool calls, and multi-document analysis without segmentation.
Can Qwen 3.7 Max accept image or file inputs?
Qwen 3.7 Max is text-only. For vision input within the 3.7 line, use Qwen3.7-Plus, which is the multimodal entry in the generation.
Does Qwen 3.7 Max support tool calling?
Yes. Qwen 3.7 Max supports structured tool calling and is tuned for agent workflows that chain many sequential tool invocations across long-horizon sessions.
How does extended-thinking mode work on Qwen 3.7 Max?
Extended-thinking mode generates an internal reasoning trace before producing the final answer, which improves accuracy on high-difficulty logical reasoning, scientific computation, and expert-level queries. The thinking budget is tunable per request to balance depth against latency and token spend.
How do I access Qwen 3.7 Max through AI Gateway?
Authenticate with an AI Gateway API key or OIDC token and reference `alibaba/qwen3.7-max` as the model. You can call Qwen 3.7 Max through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python.
Does Qwen 3.7 Max support zero data retention?
Zero Data Retention is not currently available for this model. Zero Data Retention is offered on a per-provider basis. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
Where can I see live latency and cost data for Qwen 3.7 Max?
This page shows live throughput, time-to-first-token, and pricing metrics for Qwen 3.7 Max measured across real AI Gateway traffic.
