1. Update the version of the pre-built bitsandbytes library

2. Add the pre-built flash-attn library

Former-commit-id: 9b40eddf7aeb6b3bcf58374d43cbe44eb24f3849
Author: codemayq
Date: 2024-02-20 11:26:22 +08:00
Parent: 5ccf8fcd6b
Commit: eb2aa2c073
2 changed files with 6 additions and 2 deletions


@@ -267,6 +267,8 @@ If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you wi
```bash
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl
```

To enable Flash Attention on the Windows platform, you need to install the precompiled `flash-attn` library, which supports CUDA 12.1 to 12.2. Please download the corresponding version from [flash-attention](https://github.com/bdashore3/flash-attention/releases) based on your requirements.
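For example, after downloading a wheel that matches your Python, PyTorch, and CUDA versions, install it with pip. The wheel filename below is illustrative only; use the name of the file you actually downloaded from the releases page.

```bash
# Illustrative wheel name -- replace it with the file you downloaded
# (the name encodes the flash-attn, CUDA, PyTorch, and Python versions).
pip install flash_attn-2.5.6+cu122torch2.2cxx11abiFALSE-cp310-cp310-win_amd64.whl
```
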
### Use ModelScope Hub (optional)

If you have trouble downloading models and datasets from Hugging Face, you can use LLaMA-Factory together with ModelScope as described below.