support DDP in webui

Former-commit-id: d059262ff8dc857f597d2657546ec625726a664a
Author: hiyouga
Date: 2024-05-28 19:24:22 +08:00
Parent: 9912b43fcc
Commit: 9138a7a5ba
19 changed files with 78 additions and 166 deletions


@@ -110,19 +110,20 @@ CUDA_VISIBLE_DEVICES=0 llamafactory-cli train examples/qlora_single_gpu/llama3_l
#### Supervised Fine-Tuning with Accelerate on Single Node
```bash
-bash examples/lora_multi_gpu/single_node.sh
+CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/lora_multi_gpu/llama3_lora_sft.yaml
```
#### Supervised Fine-Tuning with Accelerate on Multiple Nodes
```bash
-bash examples/lora_multi_gpu/multi_node.sh
+CUDA_VISIBLE_DEVICES=0,1,2,3 NNODES=2 RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/lora_multi_gpu/llama3_lora_sft.yaml
+CUDA_VISIBLE_DEVICES=0,1,2,3 NNODES=2 RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/lora_multi_gpu/llama3_lora_sft.yaml
```
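The `NNODES`, `RANK`, `MASTER_ADDR` and `MASTER_PORT` variables above correspond to the standard torchrun rendezvous settings. For reference only, a roughly equivalent manual launch for the rank-0 node might look like the sketch below, assuming the CLI delegates to torchrun and that `src/train.py` is the training entry point (both are assumptions, not taken from this diff).
```bash
# Sketch only: assumes the CLI wraps torchrun and that src/train.py is the entry script.
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun \
  --nnodes 2 --node_rank 0 --nproc_per_node 4 \
  --master_addr 192.168.0.1 --master_port 29500 \
  src/train.py examples/lora_multi_gpu/llama3_lora_sft.yaml
```
On the second node the same command would be repeated with `--node_rank 1`.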
#### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)
```bash
-bash examples/lora_multi_gpu/ds_zero3.sh
+CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/lora_multi_gpu/llama3_lora_sft_ds.yaml
```
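ZeRO-3 weight sharding requires the DeepSpeed library to be importable in the training environment; if it is not already installed, a typical setup step (an assumption about the environment, not part of this change) is:
```bash
# Assumption: DeepSpeed is not yet installed in this Python environment.
pip install deepspeed
```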
### LoRA Fine-Tuning on Multiple NPUs
@@ -130,7 +131,7 @@ bash examples/lora_multi_gpu/ds_zero3.sh
#### Supervised Fine-Tuning with DeepSpeed ZeRO-0
```bash
-bash examples/lora_multi_npu/ds_zero0.sh
+ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/lora_multi_npu/llama3_lora_sft_ds.yaml
```
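`ASCEND_RT_VISIBLE_DEVICES` is the Ascend NPU counterpart of `CUDA_VISIBLE_DEVICES`. Assuming the Ascend driver and toolkit are installed, the visible devices can be checked with `npu-smi` before launching (an environment assumption, not part of this diff):
```bash
# Assumption: the Huawei Ascend driver is installed and npu-smi is on PATH.
npu-smi info
```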
### Full-Parameter Fine-Tuning on Multiple GPUs
@@ -138,19 +139,20 @@ bash examples/lora_multi_npu/ds_zero0.sh
#### Supervised Fine-Tuning with Accelerate on Single Node
```bash
-bash examples/full_multi_gpu/single_node.sh
+CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/full_multi_gpu/llama3_full_sft.yaml
```
#### Supervised Fine-Tuning with Accelerate on Multiple Nodes
```bash
-bash examples/full_multi_gpu/multi_node.sh
+CUDA_VISIBLE_DEVICES=0,1,2,3 NNODES=2 RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/full_multi_gpu/llama3_full_sft.yaml
+CUDA_VISIBLE_DEVICES=0,1,2,3 NNODES=2 RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/full_multi_gpu/llama3_full_sft.yaml
```
#### Batch Predicting and Computing BLEU and ROUGE Scores
```bash
-bash examples/full_multi_gpu/predict.sh
+CUDA_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli train examples/full_multi_gpu/llama3_full_predict.yaml
```
### Merging LoRA Adapters and Quantization