Demos

Document the runtime surfaces without implying that the app itself hosts inference

The public web experience should point users toward Web UI, CLI, vLLM, and FastChat flows while keeping the marketing/docs server separate from model serving.


Recommended deployment path from upstream

  1. Start with vLLM for fast inference

    The upstream deployment section recommends vLLM first when you need serving-oriented throughput.

  2. Layer FastChat for Web UI or an OpenAI-style API

    FastChat becomes the orchestration layer for controller, worker, Gradio web server, and compatible API server.

  3. Use the simple demos if the stack above is too heavy

    The README also keeps lighter Web UI, CLI, and API entry points for direct local demos.

vLLM + FastChat shell flow

# Install FastChat with the model worker and Web UI extras
pip install "fschat[model_worker,webui]"
# Run each of the following long-lived processes in its own terminal:
python -m fastchat.serve.controller
python -m fastchat.serve.vllm_worker --model-path $MODEL_PATH --trust-remote-code --dtype bfloat16
python -m fastchat.serve.gradio_web_server
python -m fastchat.serve.openai_api_server --host localhost --port 8000
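Once the stack above is running, the last command exposes an OpenAI-compatible endpoint on localhost:8000. A minimal sketch of building a chat-completion request for it follows; the model id `qwen-chat` is an assumption, so query `GET /v1/models` on a running server for the id actually registered by the vLLM worker.

```python
import json

# Endpoint served by fastchat.serve.openai_api_server, assuming the
# shell flow above is running on this host.
URL = "http://localhost:8000/v1/chat/completions"

# "qwen-chat" is an assumed model id; check GET /v1/models for the
# name the vLLM worker actually registered.
payload = {
    "model": "qwen-chat",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")

# To send this against a live stack:
#   import urllib.request
#   req = urllib.request.Request(
#       URL, data=body, headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode("utf-8"))
```

The network call is left commented out so the payload construction can be inspected without a running server.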

Documented demo surfaces

Gradio-oriented

Web UI

The original repo exposes a `web_demo.py` path for a quick browser-based local demo.

Terminal-first

CLI demo

The CLI path focuses on streaming token output for local prompt-response testing.
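The actual CLI script lives upstream; as a rough illustration of the streaming pattern it describes, here is a self-contained sketch that flushes tokens to the terminal as they arrive. The hard-coded token list is a stand-in for a real model's incremental decode stream.

```python
import sys

def stream_response(tokens):
    """Print tokens as they arrive, flushing after each one.

    `tokens` stands in for a real incremental decode stream; a
    streaming CLI demo consumes model output the same way.
    """
    pieces = []
    for tok in tokens:
        sys.stdout.write(tok)
        sys.stdout.flush()  # show partial output immediately
        pieces.append(tok)
    sys.stdout.write("\n")
    return "".join(pieces)

# Stand-in token stream; a real run would iterate over model output.
stream_response(["Hello", ",", " how", " can", " I", " help", "?"])
```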

Public studio

Hosted demo

The upstream README links to a public ModelScope studio demo for the 72B chat variant.


Source anchors

Demos and Deployment Surfaces | Qwen Code