On 19 established benchmarks, it demonstrates performance comparable to leading models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro.
We further enhance Qwen3-Max-Thinking with two key innovations:
- (1) adaptive tool-use capabilities that enable on-demand retrieval and code interpreter invocation, now available at chat.qwen.ai; and
- (2) advanced test-time scaling techniques that significantly boost reasoning performance, surpassing Gemini 3 Pro on key reasoning benchmarks.
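The quoted announcement does not say which test-time scaling techniques are used. As a generic illustration only, and not Qwen's method, one widely used form of test-time scaling is self-consistency: sample several answers to the same question and keep the majority answer. In the sketch below, `noisy_solver` is a hypothetical stand-in for one sampled model response, and the 60% accuracy figure is an assumption chosen for the demo.

```python
import random
from collections import Counter


def noisy_solver(rng: random.Random) -> int:
    """Hypothetical stand-in for one sampled model answer.

    Assumption for the demo: the correct answer (42) is returned 60% of
    the time, otherwise one of four wrong answers is picked uniformly.
    """
    if rng.random() < 0.6:
        return 42
    return rng.choice([40, 41, 43, 44])


def majority_vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def self_consistency(sample_fn, n_samples, seed=0):
    """Spend more compute at inference time: draw n_samples answers and vote."""
    rng = random.Random(seed)
    answers = [sample_fn(rng) for _ in range(n_samples)]
    return majority_vote(answers)


if __name__ == "__main__":
    # More samples make the vote more reliable than any single sample.
    for n in (1, 5, 25, 125):
        print(n, self_consistency(noisy_solver, n))
```

The design point is that accuracy is bought with extra inference compute rather than extra training: a single sample is wrong 40% of the time under the stated assumption, while the vote over many samples almost always lands on the most likely answer.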
20251116093417b1a⁝ What are the advanced test-time scaling techniques in LLMs?
Qwen3-Max is potentially strong, but its heavy censorship (especially on politically sensitive topics) and its still-closed nature are major drawbacks.