Web UI
The original repo exposes a `web_demo.py` path for a quick browser-based local demo.
This page is a community-run documentation surface derived from public Qwen source materials; it is not the primary upstream home for the project.
Demos
The public web experience should point users toward the Web UI, CLI, vLLM, and FastChat flows, while keeping the marketing/docs server separate from model serving.
The upstream deployment section recommends vLLM first when you need serving-oriented throughput.
FastChat acts as the orchestration layer, providing the controller, the model worker, the Gradio web server, and an OpenAI-compatible API server.
The README also keeps lighter Web UI, CLI, and API entry points for direct local demos.
# Install FastChat with the model-worker and web UI extras
pip install "fschat[model_worker,webui]"
# Start the controller that coordinates workers
python -m fastchat.serve.controller
# Launch a vLLM-backed model worker and register it with the controller
python -m fastchat.serve.vllm_worker --model-path $MODEL_PATH --trust-remote-code --dtype bfloat16
# Serve the Gradio browser UI
python -m fastchat.serve.gradio_web_server
# Expose an OpenAI-compatible API endpoint
python -m fastchat.serve.openai_api_server --host localhost --port 8000
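Once the OpenAI-compatible server from the commands above is listening on localhost:8000, any HTTP client can send chat-completions requests to it. A minimal standard-library sketch follows; the model name "qwen-72b-chat" is an assumption here, so substitute whatever name the worker registered with the controller.

```python
import json
import urllib.request

def build_chat_request(prompt, model="qwen-72b-chat",
                       base_url="http://localhost:8000/v1"):
    """Build a POST request for FastChat's OpenAI-compatible endpoint.

    The model name is an assumption -- use whichever name the vLLM
    worker registered with the controller.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Hello!")
# urllib.request.urlopen(req) would send it once the server is running
```

The same request shape works with the official openai client library by pointing its base URL at the local server.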
The CLI path focuses on streaming token output for local prompt-response testing.
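The streaming behavior of the CLI path can be sketched with a token generator: tokens are printed the moment they arrive rather than after the full response is assembled. The model and tokenizer wiring is omitted; fake_token_stream below is a stand-in for a model's incremental decode, not the actual CLI demo code.

```python
import sys
import time

def fake_token_stream(text, delay=0.0):
    """Stand-in for a model's incremental decode; yields word-level tokens."""
    for word in text.split():
        time.sleep(delay)  # simulate per-token generation latency
        yield word + " "

def stream_response(token_iter, out=sys.stdout):
    """Print tokens as they arrive and return the full response text."""
    pieces = []
    for tok in token_iter:
        out.write(tok)   # emit immediately for a streaming feel
        out.flush()
        pieces.append(tok)
    out.write("\n")
    return "".join(pieces)

reply = stream_response(fake_token_stream("Hello from the CLI demo"))
```

Swapping fake_token_stream for a real generation iterator keeps the same print loop.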
The upstream README links to a public ModelScope studio demo for the 72B chat variant.