Demos
Document the runtime surfaces without pretending the app hosts inference
The public web experience should point users toward Web UI, CLI, vLLM, and FastChat flows while keeping the marketing/docs server separate from model serving.
Recommended deployment path from upstream
Start with vLLM for fast inference
The upstream deployment section recommends vLLM first when you need serving-oriented throughput.
Layer FastChat for Web UI or an OpenAI-style API
FastChat then provides the orchestration layer: a controller, a model worker, the Gradio web server, and an OpenAI-compatible API server.
Use the simple demos if the stack above is too heavy
The README also keeps lighter Web UI, CLI, and API entry points for direct local demos.
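As a sketch of those lighter entry points, assuming the upstream repo's script names (`web_demo.py`, `cli_demo.py`, `openai_api.py`) and a local clone of the repository:

```shell
# Assumed upstream Qwen script names; run from a clone of the repo
# with its requirements installed.
python web_demo.py        # Gradio browser demo
python cli_demo.py        # streaming terminal demo
python openai_api.py      # OpenAI-style local API without FastChat
```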
vLLM + FastChat shell flow

Each serve command below is long-running; start each one in its own terminal (or background it) before moving to the next.

```shell
pip install "fschat[model_worker,webui]"
python -m fastchat.serve.controller
python -m fastchat.serve.vllm_worker --model-path $MODEL_PATH --trust-remote-code --dtype bfloat16
python -m fastchat.serve.gradio_web_server
python -m fastchat.serve.openai_api_server --host localhost --port 8000
```
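Once the OpenAI-compatible server is up, a request can be sanity-checked locally and then sent with curl. The model name here (`qwen-chat`) is an assumption; it must match whatever name the worker registered, which the running server reports at `GET /v1/models`.

```shell
# Hypothetical model name; check GET /v1/models on the running server.
MODEL="qwen-chat"
PAYLOAD='{"model": "'"$MODEL"'", "messages": [{"role": "user", "content": "Hello"}]}'
# Validate the JSON locally before sending.
echo "$PAYLOAD" | python3 -m json.tool
# With the stack above running:
# curl http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```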
Documented demo surfaces
Gradio-oriented
Web UI
The original repo exposes a `web_demo.py` path for a quick browser-based local demo.
Terminal-first
CLI demo
The CLI path focuses on streaming token output for local prompt-response testing.
Public studio
Hosted demo
The upstream README links to a public ModelScope studio demo for the 72B chat variant.
Complete documentation route map
The docs surface stays mirrored in a fixed order, with the current page highlighted inside the shared route map.
- Requirements, quickstart, and deployment-oriented install notes for the historical Qwen release line.
- The original Qwen model family with context windows, memory guidance, and public checkpoint entry points.
- Historical performance tables for the original Qwen release line, preserved with source attribution.
- Web UI, CLI demo, vLLM, FastChat, and the deployment touchpoints highlighted by the original README.
- OpenAI-compatible local API patterns, function calling, and managed API references for the original Qwen line.
- System prompt positioning, ReAct-style tooling, function calling, and code-interpreter benchmarks from the original README.
- Long-context techniques and evaluation blocks for the original Qwen release line.
- A public FAQ layer derived from the README-only source surface and the boundary conditions stated by the blueprint.
- Source-code licensing, model-license notes, and citation text mirrored from the original Qwen README.