[inference] fix stop token for object detection (#6624)

* fix stop token
* update minicpm data pipeline
* fix npu qlora examples

Former-commit-id: 844919fadaa8a61dfae47020971ea80730b2346f
2025-01-13 21:34:20 +08:00
parent 11c38b9173
commit 2a05941b14
15 changed files with 101 additions and 45 deletions
--- a/README_zh.md
+++ b/README_zh.md
@@ -404,19 +404,23 @@ pip install -e ".[torch,metrics]"

 <details><summary>For Windows users</summary>

+#### Install BitsAndBytes
+
 To enable quantized LoRA (QLoRA) on Windows, install a pre-built `bitsandbytes` library. It supports CUDA 11.1 to 12.2; choose the [release](https://github.com/jllllll/bitsandbytes-windows-webui/releases/tag/wheels) that matches your CUDA version.

 ```bash
 pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.2.post2-py3-none-win_amd64.whl
 ```

+#### Install Flash Attention-2
+
 To enable FlashAttention-2 on Windows, install a pre-built `flash-attn` library. It supports CUDA 12.1 to 12.2; download the version you need from [flash-attention](https://github.com/bdashore3/flash-attention/releases) and install it.

 </details>

 <details><summary>For Ascend NPU users</summary>

-To install LLaMA Factory on Ascend NPU devices, upgrade Python to 3.10 or later and specify the extra dependencies by installing with `pip install -e ".[torch-npu,metrics]"`. You also need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**; follow the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or run the following commands:
+To install LLaMA Factory on Ascend NPU devices, upgrade Python to 3.10 or later and specify the extra dependencies by installing with `pip install -e ".[torch-npu,metrics]"`. You also need to install the **[Ascend CANN Toolkit and Kernels](https://www.hiascend.com/developer/download/community/result?module=cann)**; follow the [installation tutorial](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC2alpha002/quickstart/quickstart/quickstart_18_0004.html) or run the following commands:

 ```bash
 # Replace the URL with the one matching your CANN version and device model
@@ -445,11 +449,15 @@ source /usr/local/Ascend/ascend-toolkit/set_env.sh

 Download a pre-built Docker image: [32GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/130.html) | [64GB](http://mirrors.cn-central-221.ovaijisuan.com/detail/131.html)

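-To use bitsandbytes-based nf4 QLoRA quantization on Ascend NPU, follow these 3 steps
-1. Manually compile bnb: follow the [installation docs](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) for the NPU version of bitsandbytes to compile and install bnb. Compilation requires cmake 3.22.1 or later and g++ 12.x or later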
-```
-# Install bitsandbytes from source
-# Clone the bitsandbytes repo; Ascend NPU is currently supported on the multi-backend-refactor branch
+#### Install BitsAndBytes
+
+To run bitsandbytes-based QLoRA quantized fine-tuning on Ascend NPU, follow these steps:
+
+1. Manually compile bitsandbytes: follow the [installation docs](https://huggingface.co/docs/bitsandbytes/installation?backend=Ascend+NPU&platform=Ascend+NPU) to install the NPU version of bitsandbytes. Compilation requires cmake 3.22.1 or later and g++ 12.x or later.
+
+```bash
+# Install bitsandbytes from source
+# Clone the bitsandbytes repo; Ascend NPU is currently supported on the multi-backend-refactor branch
 git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git
 cd bitsandbytes/

@@ -462,15 +470,18 @@ apt-get install -y build-essential cmake
 # Compile & install
 cmake -DCOMPUTE_BACKEND=npu -S .
 make
-pip install -e .
-```
-2. Install and use the main-branch version of transformers
+pip install .
 ```
+
+2. Install the main-branch version of transformers.
+
+```bash
 git clone -b main https://github.com/huggingface/transformers.git
 cd transformers
 pip install .
 ```
-3. Set the double_quantization parameter to false in the training arguments; see the [example](examples/train_qlora/llama3_lora_sft_otfq_npu.yaml)
+
+3. Set `double_quantization: false` in the training arguments; see the [example](examples/train_qlora/llama3_lora_sft_bnb_npu.yaml).

 </details>