
Performance Evaluation

This section reports evaluation results for the quantized Qwen2 models (covering the GPTQ and AWQ quantization schemes) on the following datasets:

  • MMLU (accuracy)
  • C-Eval (accuracy)
  • IFEval (strict prompt-level accuracy)
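The IFEval metric is easy to confuse with a per-instruction score, so here is a minimal sketch of the difference. Prompt-level accuracy counts a prompt as correct only when every verifiable instruction in it is followed. The sketch assumes the per-instruction pass/fail results have already been produced by a checker applied to the raw response (the "strict" setting); the function names and the example data are hypothetical and are not part of any official evaluation script.

```python
from typing import Dict, List


def prompt_level_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of prompts whose instructions were ALL followed."""
    if not results:
        raise ValueError("no prompts to score")
    passed = sum(1 for checks in results.values() if all(checks))
    return passed / len(results)


def instruction_level_accuracy(results: Dict[str, List[bool]]) -> float:
    """Fraction of individual instructions followed, pooled over all prompts."""
    checks = [c for per_prompt in results.values() for c in per_prompt]
    if not checks:
        raise ValueError("no instructions to score")
    return sum(checks) / len(checks)


# Hypothetical outcomes: prompt id -> one pass/fail flag per instruction.
example = {
    "prompt_1": [True, True],   # both instructions followed -> prompt passes
    "prompt_2": [True, False],  # one instruction missed     -> prompt fails
    "prompt_3": [True],
    "prompt_4": [False],
}

print(prompt_level_accuracy(example))       # 2 of 4 prompts pass
print(instruction_level_accuracy(example))  # 4 of 6 instructions pass
```

Because a single missed instruction fails the whole prompt, the prompt-level figure is never higher than the instruction-level figure for the same responses.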

All models are evaluated with greedy decoding.
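As an illustration of what greedy decoding means, the sketch below picks the highest-scoring token at every step, with no sampling, so repeated runs give the same output. The scoring function `toy_scores` is a hypothetical stand-in for a real language model, and `greedy_decode` is an illustrative helper, not the code used to produce the results in this section.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[List[int]], Sequence[float]],
    prompt: List[int],
    eos_id: int,
    max_new_tokens: int = 16,
) -> List[int]:
    """Extend `prompt` by repeatedly appending the arg-max token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        # Deterministic choice: index of the largest score (first one on ties).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens


def toy_scores(tokens: List[int]) -> List[float]:
    """Hypothetical 4-token 'model' that always favors (last token + 1) mod 4."""
    target = (tokens[-1] + 1) % 4
    return [1.0 if i == target else 0.0 for i in range(4)]


print(greedy_decode(toy_scores, prompt=[0], eos_id=3))
```

With Hugging Face Transformers, the equivalent setting is typically `model.generate(..., do_sample=False)` with a single beam; whether the evaluation here used exactly that call is not stated in this section.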

Model                 Quantization  Average  MMLU  C-Eval  IFEval
Qwen2-72B-Instruct    BF16          81.3     82.3  83.8    77.6
Qwen2-72B-Instruct    GPTQ-Int8     80.7     81.3  83.4    77.5
Qwen2-72B-Instruct    GPTQ-Int4     81.2     80.8  83.9    78.9
Qwen2-72B-Instruct    AWQ           80.4     80.5  83.9    76.9
Qwen2-7B-Instruct     BF16          66.9     70.5  77.2    53.1
Qwen2-7B-Instruct     GPTQ-Int8     66.2     69.1  76.7    52.9
Qwen2-7B-Instruct     GPTQ-Int4     64.1     67.8  75.2    49.4
Qwen2-7B-Instruct     AWQ           64.1     67.4  73.6    51.4
Qwen2-1.5B-Instruct   BF16          48.4     52.4  63.8    29.0
Qwen2-1.5B-Instruct   GPTQ-Int8     48.1     53.0  62.5    28.8
Qwen2-1.5B-Instruct   GPTQ-Int4     45.0     50.7  57.4    27.0
Qwen2-1.5B-Instruct   AWQ           46.5     51.6  58.1    29.9
Qwen2-0.5B-Instruct   BF16          34.4     37.9  45.2    20.0
Qwen2-0.5B-Instruct   GPTQ-Int8     32.6     35.6  43.9    18.1
Qwen2-0.5B-Instruct   GPTQ-Int4     29.7     33.0  39.2    16.8
Qwen2-0.5B-Instruct   AWQ           31.1     34.4  42.1    16.7
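One way to read the table is to look at how far each quantized variant falls below its BF16 baseline. The sketch below does this for the Average column only; the numbers are copied from the table, while the helper name `drop_vs_bf16` and the choice of a simple difference in points are assumptions made for illustration.

```python
from typing import Dict

# "Average" column copied from the table above.
AVERAGE: Dict[str, Dict[str, float]] = {
    "Qwen2-72B-Instruct": {"BF16": 81.3, "GPTQ-Int8": 80.7, "GPTQ-Int4": 81.2, "AWQ": 80.4},
    "Qwen2-7B-Instruct": {"BF16": 66.9, "GPTQ-Int8": 66.2, "GPTQ-Int4": 64.1, "AWQ": 64.1},
    "Qwen2-1.5B-Instruct": {"BF16": 48.4, "GPTQ-Int8": 48.1, "GPTQ-Int4": 45.0, "AWQ": 46.5},
    "Qwen2-0.5B-Instruct": {"BF16": 34.4, "GPTQ-Int8": 32.6, "GPTQ-Int4": 29.7, "AWQ": 31.1},
}


def drop_vs_bf16(scores: Dict[str, float]) -> Dict[str, float]:
    """Points lost relative to BF16 for each quantized variant (hypothetical helper)."""
    base = scores["BF16"]
    return {
        quant: round(base - score, 1)
        for quant, score in scores.items()
        if quant != "BF16"
    }


for model, scores in AVERAGE.items():
    print(model, drop_vs_bf16(scores))
```

On these figures, the GPTQ-Int4 gap to BF16 widens as the model shrinks (0.1 points at 72B, 4.7 points at 0.5B), while GPTQ-Int8 stays within 2 points of BF16 at every size.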