# Benchmarks

Benchmark claims remain visible, but clearly historical.
Because the only source input is the upstream README, this site ties every benchmark claim to the original table rather than presenting it as a live leaderboard.
## How to read the numbers
The upstream README states that, for each compared model and benchmark, the table reports the better of the model's official result and its OpenCompass score; a short sketch of that selection rule follows the list below.
That makes these tables useful as product-surface evidence, but not as a substitute for up-to-date benchmark research. The reported benchmarks cover four areas:
- Natural language understanding
- Math and reasoning
- Code generation
- Chinese evaluation coverage
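
A minimal Python sketch of the "best of official and OpenCompass" rule, assuming per-benchmark score dictionaries for a single compared model. The helper name and all example values are illustrative, not figures from the README.

```python
# Sketch of the scoring convention described above: for each benchmark,
# keep the better of the official result and the OpenCompass result.

def best_of_sources(official: dict[str, float], opencompass: dict[str, float]) -> dict[str, float]:
    """Keep the higher score per benchmark across the two sources."""
    merged = {}
    for bench in sorted(official.keys() | opencompass.keys()):
        candidates = [s for s in (official.get(bench), opencompass.get(bench)) if s is not None]
        merged[bench] = max(candidates)
    return merged

# Hypothetical inputs, only to show that the rule takes the max per benchmark
# and keeps benchmarks that appear in just one source.
official_scores = {"MMLU": 46.0, "GSM8K": 15.9}
opencompass_scores = {"MMLU": 46.8, "GSM8K": 15.2, "BBH": 38.2}
print(best_of_sources(official_scores, opencompass_scores))
# -> {'BBH': 38.2, 'GSM8K': 15.9, 'MMLU': 46.8}
```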
## Representative performance table
| Model | MMLU | C-Eval | GSM8K | MATH | HumanEval | MBPP | BBH | CMMLU |
|---|---|---|---|---|---|---|---|---|
| LLaMA2-7B | 46.8 | 32.5 | 16.7 | 3.3 | 12.8 | 20.8 | 38.2 | 31.8 |
| InternLM-20B | 62.1 | 58.8 | 52.6 | 7.9 | 25.6 | 35.6 | 52.5 | 59.0 |
| Yi-34B | 76.3 | 81.8 | 67.9 | 15.9 | 26.2 | 38.2 | 66.4 | 82.6 |
| Qwen-1.8B | 45.3 | 56.1 | 32.3 | 2.3 | 15.2 | 14.2 | 22.3 | 52.1 |
| Qwen-7B | 58.2 | 63.5 | 51.7 | 11.6 | 29.9 | 31.6 | 45.0 | 62.2 |
| Qwen-14B | 66.3 | 72.1 | 61.3 | 24.8 | 32.3 | 40.8 | 53.4 | 71.0 |
| Qwen-72B | 77.4 | 83.3 | 78.9 | 35.2 | 35.4 | 52.2 | 67.7 | 83.6 |
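
For readers who want to work with these numbers programmatically, the following sketch copies the table into a Python dictionary and ranks models per benchmark. The scores are taken verbatim from the table above; the variable names and the ranking step are illustrative only.

```python
# Reported scores, copied from the representative performance table.
SCORES = {
    "LLaMA2-7B":    {"MMLU": 46.8, "C-Eval": 32.5, "GSM8K": 16.7, "MATH": 3.3,  "HumanEval": 12.8, "MBPP": 20.8, "BBH": 38.2, "CMMLU": 31.8},
    "InternLM-20B": {"MMLU": 62.1, "C-Eval": 58.8, "GSM8K": 52.6, "MATH": 7.9,  "HumanEval": 25.6, "MBPP": 35.6, "BBH": 52.5, "CMMLU": 59.0},
    "Yi-34B":       {"MMLU": 76.3, "C-Eval": 81.8, "GSM8K": 67.9, "MATH": 15.9, "HumanEval": 26.2, "MBPP": 38.2, "BBH": 66.4, "CMMLU": 82.6},
    "Qwen-1.8B":    {"MMLU": 45.3, "C-Eval": 56.1, "GSM8K": 32.3, "MATH": 2.3,  "HumanEval": 15.2, "MBPP": 14.2, "BBH": 22.3, "CMMLU": 52.1},
    "Qwen-7B":      {"MMLU": 58.2, "C-Eval": 63.5, "GSM8K": 51.7, "MATH": 11.6, "HumanEval": 29.9, "MBPP": 31.6, "BBH": 45.0, "CMMLU": 62.2},
    "Qwen-14B":     {"MMLU": 66.3, "C-Eval": 72.1, "GSM8K": 61.3, "MATH": 24.8, "HumanEval": 32.3, "MBPP": 40.8, "BBH": 53.4, "CMMLU": 71.0},
    "Qwen-72B":     {"MMLU": 77.4, "C-Eval": 83.3, "GSM8K": 78.9, "MATH": 35.2, "HumanEval": 35.4, "MBPP": 52.2, "BBH": 67.7, "CMMLU": 83.6},
}

# Print the top-scoring model per benchmark, as reported in the table.
for benchmark in next(iter(SCORES.values())):
    leader = max(SCORES, key=lambda model: SCORES[model][benchmark])
    print(f"{benchmark}: {leader} ({SCORES[leader][benchmark]})")
```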
## Freshness note
These scores come from the original Qwen README and technical memo, not from a live benchmark feed.
The site keeps them because they define the documented public surface for this historical model line.
## Source anchors