Performance Evaluation
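This section reports the performance of quantized Qwen2 models (both GPTQ and AWQ schemes) on the following benchmarks:
- MMLU (accuracy)
- C-Eval (accuracy)
- IFEval (strict prompt-level accuracy)
All models are evaluated with greedy decoding.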
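Greedy decoding means that at every step the single highest-scoring next token is chosen, with no sampling, so results are deterministic for a given model and prompt. The evaluation code itself is not shown here; the following is only a minimal, library-free sketch of the idea. The names `greedy_decode` and `toy_scores` are hypothetical, and `toy_scores` is a stand-in for a real model's next-token scores.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Append the highest-scoring token at each step until EOS or the length limit."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Greedy choice: take the argmax, with no sampling and no temperature.
        best = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(best)
        if best == eos_id:
            break
    return tokens


def toy_scores(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: prefers (last token + 1) in a 5-token vocabulary."""
    target = (tokens[-1] + 1) % 5
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_decode(toy_scores, [0], eos_id=4))
```

With Hugging Face `transformers`, the same behavior is typically obtained by calling `generate` with `do_sample=False`.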
Model | Quantization | Average | MMLU | C-Eval | IFEval |
---|---|---|---|---|---|
Qwen2-72B-Instruct | BF16 | 81.3 | 82.3 | 83.8 | 77.6 |
Qwen2-72B-Instruct | GPTQ-Int8 | 80.7 | 81.3 | 83.4 | 77.5 |
Qwen2-72B-Instruct | GPTQ-Int4 | 81.2 | 80.8 | 83.9 | 78.9 |
Qwen2-72B-Instruct | AWQ | 80.4 | 80.5 | 83.9 | 76.9 |
Qwen2-7B-Instruct | BF16 | 66.9 | 70.5 | 77.2 | 53.1 |
Qwen2-7B-Instruct | GPTQ-Int8 | 66.2 | 69.1 | 76.7 | 52.9 |
Qwen2-7B-Instruct | GPTQ-Int4 | 64.1 | 67.8 | 75.2 | 49.4 |
Qwen2-7B-Instruct | AWQ | 64.1 | 67.4 | 73.6 | 51.4 |
Qwen2-1.5B-Instruct | BF16 | 48.4 | 52.4 | 63.8 | 29.0 |
Qwen2-1.5B-Instruct | GPTQ-Int8 | 48.1 | 53.0 | 62.5 | 28.8 |
Qwen2-1.5B-Instruct | GPTQ-Int4 | 45.0 | 50.7 | 57.4 | 27.0 |
Qwen2-1.5B-Instruct | AWQ | 46.5 | 51.6 | 58.1 | 29.9 |
Qwen2-0.5B-Instruct | BF16 | 34.4 | 37.9 | 45.2 | 20.0 |
Qwen2-0.5B-Instruct | GPTQ-Int8 | 32.6 | 35.6 | 43.9 | 18.1 |
Qwen2-0.5B-Instruct | GPTQ-Int4 | 29.7 | 33.0 | 39.2 | 16.8 |
Qwen2-0.5B-Instruct | AWQ | 31.1 | 34.4 | 42.1 | 16.7 |
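To read the table as a quantization cost, it can help to compare each quantized variant against its BF16 baseline. The sketch below does this for the Average column of Qwen2-7B-Instruct, using values copied from the table. The helper name `delta_vs_baseline` is hypothetical, and the same pattern should apply to any other model or column.

```python
# Average scores for Qwen2-7B-Instruct, copied from the table above.
averages = {
    "BF16": 66.9,
    "GPTQ-Int8": 66.2,
    "GPTQ-Int4": 64.1,
    "AWQ": 64.1,
}


def delta_vs_baseline(scores, baseline="BF16"):
    """Return each variant's score minus the baseline score, rounded to one decimal."""
    base = scores[baseline]
    return {
        name: round(value - base, 1)
        for name, value in scores.items()
        if name != baseline
    }


deltas = delta_vs_baseline(averages)
for name, delta in deltas.items():
    # A negative value means the quantized model scores below BF16.
    print(f"{name}: {delta:+.1f} points vs BF16")
```

These are differences between the reported, already rounded numbers, so they are accurate only to about one decimal place.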