Hugging Face
Use the public Qwen organization when you want the standard open-source model-card and checkpoint flow.
Community-run docs surface, derived from public Qwen source materials; it is not presented as the primary upstream home.
Install
The original repo is documentation-heavy, so the install story is mostly about environment constraints, package version baselines, and the shortest path to a first chat call.
The upstream README calls out Python 3.8+, PyTorch 1.12+, Transformers 4.32+, and CUDA 11.4+ as the baseline environment.
Flash Attention is optional, but the README recommends it for supported fp16 or bf16 devices to improve efficiency and reduce memory usage.
Start with `pip install -r requirements.txt` if you want the simplest source-aligned local environment.
Treat flash-attention as an optimization layer, not a prerequisite, because the upstream README explicitly says the project still runs without it.
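The version baselines above can be sanity-checked before installing. A minimal sketch (the helper names here are illustrative, not part of the upstream repo; the authoritative pins live in the upstream requirements.txt):

```python
import sys

def version_tuple(v: str) -> tuple:
    # Convert "4.32.1" -> (4, 32, 1) for simple tuple comparison.
    # Good enough for the plain X.Y.Z baselines quoted in the README;
    # local suffixes like "+cu117" are ignored.
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def meets_baseline(installed: str, required: str) -> bool:
    # True when the installed version is at or above the README baseline.
    return version_tuple(installed) >= version_tuple(required)

# Python itself: the upstream README asks for 3.8+.
assert sys.version_info >= (3, 8), "Qwen's README requires Python 3.8+"

# For installed packages, compare against the README baselines, e.g.:
#   meets_baseline(torch.__version__, "1.12")
#   meets_baseline(transformers.__version__, "4.32")
```

This only checks minimums; it does not replace `pip install -r requirements.txt`, which resolves the exact dependency set.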
The official quickstart shows `AutoTokenizer` and `AutoModelForCausalLM` loading the chat model directly from the public model hub.
The upstream quickstart centers the local experience on a direct `model.chat()` flow.
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required: model.chat() is defined in the
# checkpoint's custom modeling code, not in Transformers itself.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",  # place weights across available devices automatically
    trust_remote_code=True
).eval()

# First turn: pass history=None; the returned history feeds follow-up turns.
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
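Under the hood, Qwen chat checkpoints use a ChatML-style prompt layout, with each turn wrapped in `<|im_start|>role ... <|im_end|>` markers. A rough sketch of how a query plus history might be flattened into that shape (the authoritative template ships with the checkpoint's remote code; this helper is only an illustration):

```python
def build_chatml_prompt(query, history=None, system="You are a helpful assistant."):
    # ChatML-style layout: a system turn, then alternating user/assistant
    # turns from history, then the new query and an open assistant turn.
    # This is a sketch of the shape, not the checkpoint's real template.
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for user_turn, assistant_turn in (history or []):
        parts.append(f"<|im_start|>user\n{user_turn}<|im_end|>")
        parts.append(f"<|im_start|>assistant\n{assistant_turn}<|im_end|>")
    parts.append(f"<|im_start|>user\n{query}<|im_end|>")
    parts.append("<|im_start|>assistant\n")  # model completes from here
    return "\n".join(parts)
```

In practice you never build this string yourself: `model.chat()` handles the formatting, which is why the quickstart stays at that level of abstraction.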
Mirrors the same model line on the China-friendly distribution hub used throughout the original docs.
The README also points to prebuilt Docker images for faster environment setup when you do not want to build from scratch.
Source anchors