support badam for all stages

Former-commit-id: 7a1380646119bfe6855f73dd90570defcea05281
Author: hiyouga
Date: 2024-04-16 17:44:48 +08:00
Parent: 42084e08ae
Commit: a4167fd925
9 changed files with 61 additions and 28 deletions


@@ -46,7 +46,7 @@ Choose your path:
 - **Various models**: LLaMA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
 - **Integrated methods**: (Continuous) pre-training, supervised fine-tuning, reward modeling, PPO, DPO and ORPO.
 - **Scalable resources**: 32-bit full-tuning, 16-bit freeze-tuning, 16-bit LoRA and 2/4/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8.
-- **Advanced algorithms**: GaLore, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning.
+- **Advanced algorithms**: GaLore, BAdam, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning.
 - **Practical tricks**: FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA.
 - **Experiment monitors**: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
 - **Faster inference**: OpenAI-style API, Gradio UI and CLI with vLLM worker.
@@ -68,14 +68,16 @@ Compared to ChatGLM's [P-Tuning](https://github.com/THUDM/ChatGLM2-6B/tree/main/
 ## Changelog
+[24/04/16] We supported **[BAdam](https://arxiv.org/abs/2404.02827)**. See `examples/extras/badam` for usage.
 [24/04/16] We supported **[unsloth](https://github.com/unslothai/unsloth)**'s long-sequence training (Llama-2-7B-56k within 24GB). It achieves **117%** speed and **50%** memory compared with FlashAttention-2, more benchmarks can be found in [this page](https://github.com/hiyouga/LLaMA-Factory/wiki/Performance-comparison).
 [24/03/31] We supported **[ORPO](https://arxiv.org/abs/2403.07691)**. See `examples/lora_single_gpu` for usage.
-[24/03/21] Our paper "[LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models](https://arxiv.org/abs/2403.13372)" is available at arXiv!
 <details><summary>Full Changelog</summary>
+[24/03/21] Our paper "[LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models](https://arxiv.org/abs/2403.13372)" is available at arXiv!
 [24/03/20] We supported **FSDP+QLoRA** that fine-tunes a 70B model on 2x24GB GPUs. See `examples/extras/fsdp_qlora` for usage.
 [24/03/13] We supported **[LoRA+](https://arxiv.org/abs/2402.12354)**. See `examples/extras/loraplus` for usage.
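
The new BAdam changelog entry above points to `examples/extras/badam` in the repository for the supported configuration. As a rough, self-contained sketch of the underlying idea from the linked paper (block-coordinate descent with Adam), not this repository's implementation, the snippet below keeps only one block of layers trainable at a time and rotates the active block on a fixed interval; the model, data, learning rate and 50-step switch interval are placeholder assumptions.

```python
import torch
from torch import nn

# Toy stand-ins: any model partitioned into parameter blocks would do.
model = nn.Sequential(*[nn.Linear(64, 64) for _ in range(8)])
blocks = [list(layer.parameters()) for layer in model]  # one block per layer
switch_interval = 50  # assumed block-switching frequency

active = 0
optimizer = None
for step in range(200):
    if step % switch_interval == 0:
        # Freeze everything, then unfreeze only the active block and give it
        # a fresh Adam state (block-coordinate descent, as in the BAdam paper).
        for p in model.parameters():
            p.requires_grad_(False)
        for p in blocks[active]:
            p.requires_grad_(True)
        optimizer = torch.optim.AdamW(blocks[active], lr=1e-4)
        active = (active + 1) % len(blocks)

    x = torch.randn(4, 64)            # placeholder batch
    loss = model(x).pow(2).mean()     # placeholder loss
    loss.backward()                   # gradients flow only into the active block
    optimizer.step()
    optimizer.zero_grad()
```

Because only the active block carries optimizer state at any time, the Adam moment buffers scale with the block size rather than the full model, which is why BAdam lands in the same memory row as LoRA and GaLore in the table below.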
@@ -278,16 +280,15 @@ huggingface-cli login
 \* *estimated*
-| Method | Bits | 7B | 13B | 30B | 70B | 8x7B |
-| ------ | ---- | ----- | ----- | ----- | ------ | ------ |
-| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 900GB |
-| Full | 16 | 60GB | 120GB | 300GB | 600GB | 400GB |
-| GaLore | 16 | 16GB | 32GB | 64GB | 160GB | 120GB |
-| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 160GB |
-| LoRA | 16 | 16GB | 32GB | 64GB | 160GB | 120GB |
-| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 60GB |
-| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 30GB |
-| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 18GB |
+| Method | Bits | 7B | 13B | 30B | 70B | 8x7B |
+| ----------------- | ---- | ----- | ----- | ----- | ------ | ------ |
+| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 900GB |
+| Full | 16 | 60GB | 120GB | 300GB | 600GB | 400GB |
+| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 160GB |
+| LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 120GB |
+| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 60GB |
+| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 30GB |
+| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 18GB |
 ## Getting Started