update example docs

Former-commit-id: 102cd42768d9eb2cf1219309a25b41e26149067e
2024-05-06 22:51:02 +08:00
parent 5c9da798b5
commit 50c71dd29f
33 changed files with 962 additions and 508 deletions
--- a/examples/README_zh.md
+++ b/examples/README_zh.md
@@ -1,50 +1,204 @@
 We provide a diverse set of example scripts for fine-tuning large language models.

+### LoRA Fine-Tuning on a Single GPU
+
+#### (Continuous) Pre-Training
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_pretrain.yaml
 ```
-examples/
-├── lora_single_gpu/
-│   ├── pretrain.sh: Continuous pre-training with LoRA
-│   ├── sft.sh: Supervised fine-tuning with LoRA
-│   ├── reward.sh: Reward model training with LoRA
-│   ├── ppo.sh: PPO training with LoRA
-│   ├── dpo.sh: DPO training with LoRA
-│   ├── orpo.sh: ORPO training with LoRA
-│   ├── sft_mllm.sh: Multimodal supervised fine-tuning with LoRA
-│   ├── prepare.sh: Save the preprocessed dataset
-│   └── predict.sh: Batch prediction with LoRA, computing BLEU and ROUGE scores
-├── qlora_single_gpu/
-│   ├── bitsandbytes.sh: Fine-tune 4/8-bit BNB models with QLoRA
-│   ├── gptq.sh: Fine-tune 4/8-bit GPTQ models with QLoRA
-│   ├── awq.sh: Fine-tune 4-bit AWQ models with QLoRA
-│   └── aqlm.sh: Fine-tune 2-bit AQLM models with QLoRA
-├── lora_multi_gpu/
-│   ├── single_node.sh: Single-node LoRA training with Accelerate
-│   ├── multi_node.sh: Multi-node LoRA training with Accelerate
-│   └── ds_zero3.sh: LoRA training with DeepSpeed ZeRO-3 (weight sharding)
-├── full_multi_gpu/
-│   ├── single_node.sh: Single-node full-parameter training with DeepSpeed
-│   ├── multi_node.sh: Multi-node full-parameter training with DeepSpeed
-│   └── predict.sh: Multi-GPU batch prediction with a fully fine-tuned model, computing BLEU and ROUGE scores
-├── merge_lora/
-│   ├── merge.sh: Merge LoRA weights into the pre-trained model
-│   └── quantize.sh: Quantize the fine-tuned model with AutoGPTQ
-├── inference/
-│   ├── cli_demo.sh: Launch a command-line inference interface for the LoRA model
-│   ├── api_demo.sh: Launch an OpenAI-style API for the LoRA model
-│   ├── web_demo.sh: Launch a browser-based inference interface for the LoRA model
-│   └── evaluate.sh: Evaluate the LoRA model on the MMLU/CMMLU/C-Eval datasets
-└── extras/
-    ├── galore/
-    │   └── sft.sh: Train the model with GaLore
-    ├── badam/
-    │   └── sft.sh: Train the model with BAdam
-    ├── loraplus/
-    │   └── sft.sh: Train the model with LoRA+
-    ├── mod/
-    │   └── sft.sh: Train the model with Mixture-of-Depths
-    ├── llama_pro/
-    │   ├── expand.sh: Expand the layers in the model
-    │   └── sft.sh: Train the expanded model
-    └── fsdp_qlora/
-        └── sft.sh: Fine-tune a quantized model with FSDP+QLoRA
+
+#### Supervised Fine-Tuning
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_sft.yaml
+```
+
+#### Reward Model Training
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_reward.yaml
+```
+
+#### PPO Training
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_ppo.yaml
+```
+
+#### DPO Training
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_dpo.yaml
+```
+
+#### ORPO Training
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_orpo.yaml
+```
+
+#### Multimodal Supervised Fine-Tuning
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llava1_5_lora_sft.yaml
+```
+
+#### Preprocessing the Dataset
+
+This is useful for large datasets: set `tokenized_path` in the config to load the preprocessed dataset.
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_preprocess.yaml
+```
+
+#### Evaluating on MMLU/CMMLU/C-Eval
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli eval examples/lora_single_gpu/llama3_lora_eval.yaml
+```
+
+#### Batch Prediction with BLEU and ROUGE Scores
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/lora_single_gpu/llama3_lora_predict.yaml
+```
+
+### QLoRA Fine-Tuning on a Single GPU
+
+#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes Quantization (Recommended)
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/qlora_single_gpu/llama3_lora_sft_bitsandbytes.yaml
+```
+
+#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/qlora_single_gpu/llama3_lora_sft_gptq.yaml
+```
+
+#### Supervised Fine-Tuning with 4-bit AWQ Quantization
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/qlora_single_gpu/llama3_lora_sft_awq.yaml
+```
+
+#### Supervised Fine-Tuning with 2-bit AQLM Quantization
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/qlora_single_gpu/llama3_lora_sft_aqlm.yaml
+```
+
+### LoRA Fine-Tuning on Multiple GPUs
+
+#### Single-Node Training with Accelerate
+
+```bash
+bash examples/lora_multi_gpu/single_node.sh
+```
+
+#### Multi-Node Training with Accelerate
+
+```bash
+bash examples/lora_multi_gpu/multi_node.sh
+```
+
+#### Distributing GPU Memory Evenly with DeepSpeed ZeRO-3
+
+```bash
+bash examples/lora_multi_gpu/ds_zero3.sh
+```
+
+### Full-Parameter Fine-Tuning on Multiple GPUs
+
+#### Single-Node Training with DeepSpeed
+
+```bash
+bash examples/full_multi_gpu/single_node.sh
+```
+
+#### Multi-Node Training with DeepSpeed
+
+```bash
+bash examples/full_multi_gpu/multi_node.sh
+```
+
+#### Batch Prediction with BLEU and ROUGE Scores
+
+```bash
+bash examples/full_multi_gpu/predict.sh
+```
+
+### Merging LoRA Adapters and Quantizing Models
+
+#### Merging LoRA Adapters
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
+```
+
+#### Quantizing the Model with AutoGPTQ
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
+```
+
+### Inference with LoRA Models
+
+#### Using the Command-Line Interface
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli chat examples/merge_lora/llama3_lora_sft.yaml
+```
+
+#### Using the Browser Interface
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli webchat examples/merge_lora/llama3_lora_sft.yaml
+```
+
+#### Launching an OpenAI-Style API
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli api examples/merge_lora/llama3_lora_sft.yaml
+```
+
+### Miscellaneous
+
+#### Full-Parameter Training with GaLore
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/extras/galore/llama3_full_sft.yaml
+```
+
+#### Full-Parameter Training with BAdam
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/extras/badam/llama3_full_sft.yaml
+```
+
+#### LoRA+ Fine-Tuning
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/extras/loraplus/llama3_lora_sft.yaml
+```
+
+#### Mixture-of-Depths Fine-Tuning
+
+```bash
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/extras/mod/llama3_full_sft.yaml
+```
+
+#### LLaMA-Pro Fine-Tuning
+
+```bash
+bash examples/extras/llama_pro/expand.sh
+CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/extras/llama_pro/llama3_freeze_sft.yaml
+```
+
+#### FSDP+QLoRA Fine-Tuning
+
+```bash
+bash examples/extras/fsdp_qlora/single_node.sh
 ```
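
Most of the added single-GPU commands in this diff share one shape: `CUDA_VISIBLE_DEVICES=<gpu> llamafactory-cli <subcommand> <config.yaml>`, where the subcommands that appear are `train`, `eval`, `export`, `chat`, `webchat`, and `api`. The sketch below illustrates that pattern with a small wrapper. The function name `run_example` and the `DRY_RUN` variable are hypothetical and introduced only for illustration; they are not part of LLaMA-Factory. By default the wrapper only prints the command, so it can be tried without the CLI installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of LLaMA-Factory): build the
# `llamafactory-cli <subcommand> <config>` invocation used in the README.
# DRY_RUN is an assumed variable for this sketch; it defaults to 1 (print only).
run_example() {
  local subcommand="$1"   # one of: train, eval, export, chat, webchat, api
  local config="$2"       # path to a YAML config under examples/
  local gpu="${3:-0}"     # GPU index exposed through CUDA_VISIBLE_DEVICES

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Print the command instead of executing it.
    echo "CUDA_VISIBLE_DEVICES=${gpu} llamafactory-cli ${subcommand} ${config}"
  else
    CUDA_VISIBLE_DEVICES="${gpu}" llamafactory-cli "${subcommand}" "${config}"
  fi
}

# Dry run of two steps listed in the README: LoRA SFT, then adapter merging.
run_example train examples/lora_single_gpu/llama3_lora_sft.yaml
run_example export examples/merge_lora/llama3_lora_sft.yaml
```

Setting `DRY_RUN=0` would execute the commands for real, which assumes `llamafactory-cli` is on the `PATH` and the referenced YAML files exist in the checkout.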