We further enhance Qwen3-Max-Thinking with two key innovations: (1) adaptive tool-use capabilities that enable on-demand retrieval and code interpreter invocation, now available at chat.qwen.ai; and (2) advanced test-time scaling techniques that significantly boost reasoning performance, surpassing Gemini 3 Pro on key reasoning benchmarks.
I wondered what is this is it about
Test-time scaling is the deliberate use of extra inference-time compute (e.g., longer reasoning traces, more sampled candidates, or more search/verification steps) to systematically improve task performance without changing model parameters or retraining.