# Long Context

Long-context capability is documented with both method notes and evaluation tables.
The upstream README ties longer context windows to NTK-aware interpolation, window attention, and LogN scaling for the smaller models, and to a larger rotary base for the 72B model.
## Technique summary
For Qwen-14B, the README describes extending the context length from 2K to over 8K with NTK-aware interpolation, window attention, and LogN scaling.
For Qwen-1.8B and Qwen-7B, the README describes extending the native 8K context to 32K. For Qwen-72B, it says the model adapts RoPE with a larger rotary base.
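The NTK-aware idea behind these extensions is to enlarge the RoPE base when the input exceeds the training length, so low-frequency rotary dimensions are interpolated rather than extrapolated. A minimal sketch of the generic dynamic-NTK base adjustment follows; the function name, default values, and exact scaling exponent are illustrative assumptions, not the upstream implementation.

```python
import math

def dynamic_ntk_base(seq_len, train_len=8192, base=10000.0, dim=128):
    """Hypothetical dynamic NTK-aware base scaling (illustrative, not Qwen's code).

    When the sequence is within the training length, the rotary base is
    unchanged; beyond it, the base grows so that the lowest rotary
    frequencies still complete at most one period over the input.
    """
    if seq_len <= train_len:
        return base
    alpha = seq_len / train_len
    # Common NTK-aware scaling: raise the base by alpha^(dim / (dim - 2)).
    return base * alpha ** (dim / (dim - 2))
```

With these assumed defaults, an 8K input keeps the base at 10000, while a 32K input roughly quadruples-plus it, which is what lets the table below show flat perplexity at 16K and 32K for the `dynamic_ntk` rows.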
## Long-context perplexity snapshot
| Model | 1024 | 2048 | 4096 | 8192 | 16384 | 32768 |
|---|---|---|---|---|---|---|
| Qwen-7B (original) | 4.23 | 3.78 | 39.35 | 469.81 | 2645.09 | - |
| + dynamic_ntk | 4.23 | 3.78 | 3.59 | 3.66 | 5.71 | - |
| Qwen-1.8B | 5.00 | 4.48 | 4.13 | 3.89 | 17.42 | 433.85 |
| Qwen-1.8B + dynamic_ntk + logn + window_attn | 5.00 | 4.48 | 4.14 | 3.93 | 3.82 | 3.83 |
| Qwen-7B | 4.23 | 3.81 | 3.52 | 3.31 | 7.27 | 181.49 |
| Qwen-7B + dynamic_ntk + logn + window_attn | 4.23 | 3.81 | 3.52 | 3.33 | 3.22 | 3.17 |
| Qwen-14B + dynamic_ntk + logn + window_attn | - | 3.46 | 3.29 | 3.18 | 3.42 | - |
| Qwen-72B | - | - | - | 2.83 | 2.73 | 2.72 |
Perplexity results on arXiv long-context evaluation from the upstream README.
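The `logn` component in the row labels refers to LogN attention scaling: the query is rescaled by a logarithmic factor once the sequence exceeds the training length, keeping attention entropy stable at longer contexts. A minimal sketch of the scale factor, under the assumption of an 8K training length (the function name is illustrative):

```python
import math

def logn_scale(seq_len, train_len=8192):
    """Illustrative LogN attention scale factor.

    Returns 1.0 within the training length; beyond it, returns
    log(seq_len) / log(train_len), i.e. log base train_len of seq_len,
    which grows slowly as the context lengthens.
    """
    if seq_len <= train_len:
        return 1.0
    return math.log(seq_len) / math.log(train_len)
```

At 32K with an 8K training length this gives a scale of about 1.15, a gentle correction consistent with the small perplexity differences between the plain and `+ logn` rows at short lengths.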
## L-Eval comparison
| Model | Input length | Average | Coursera | GSM | QuALITY | TOEFL | CodeU | SFiction |
|---|---|---|---|---|---|---|---|---|
| ChatGPT-3.5-16k | 16K | 60.73 | 63.51 | 84.00 | 61.38 | 78.43 | 12.22 | 64.84 |
| Qwen-72B-Chat | 32K | 62.30 | 58.13 | 76.00 | 77.22 | 86.24 | 6.66 | 69.53 |
Closed-ended L-Eval result block mirrored from the upstream README.
## Needle-in-a-haystack note
The upstream README also includes a qualitative needle-in-a-haystack result for Qwen-72B-Chat and states that it can retrieve information across positions within 32K inputs.
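A needle-in-a-haystack probe of the kind the README describes can be built by planting a target fact at a chosen relative depth inside long filler text and then asking the model to retrieve it. The sketch below shows only the prompt-construction half; all names are illustrative and no Qwen API is assumed.

```python
def build_haystack(needle, filler_sentence, total_chars, depth):
    """Build a haystack prompt with a needle at a relative depth.

    depth is in [0.0, 1.0]: 0.0 places the needle at the start of the
    filler, 1.0 at the end. Filler is repeated and truncated to
    total_chars, so the needle's position can be swept across the
    context window to test retrieval at every depth.
    """
    reps = total_chars // len(filler_sentence) + 1
    filler = (filler_sentence * reps)[:total_chars]
    pos = int(depth * len(filler))
    return filler[:pos] + " " + needle + " " + filler[pos:]
```

Sweeping `depth` from 0.0 to 1.0 while growing `total_chars` toward the 32K window reproduces the usual position-by-length retrieval grid from such qualitative evaluations.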
## Source anchors