Compare commits
1069 Commits
[Commit table (Author | SHA1 | Date) omitted: only bare SHA1 hashes survived extraction; the author, date, and message columns were lost.]
.dockerignore (new file, +13)

@@ -0,0 +1,13 @@
+.vscode
+.git
+.github
+.venv
+cache
+data
+docker
+saves
+hf_cache
+output
+.dockerignore
+.gitattributes
+.gitignore
.github/CONTRIBUTING.md (new file, vendored, +21)

@@ -0,0 +1,21 @@
+# Contributing to LLaMA Factory
+
+Everyone is welcome to contribute, and we value everybody's contribution. Code contributions are not the only way to help the community. Answering questions, helping others, and improving the documentation are also immensely valuable.
+
+It also helps us if you spread the word! Reference the library in blog posts about the awesome projects it made possible, shout out on Twitter every time it has helped you, or simply ⭐️ the repository to say thank you.
+
+However you choose to contribute, please be mindful and respect our [code of conduct](CODE_OF_CONDUCT.md).
+
+**This guide was heavily inspired by [transformers guide to contributing](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md).**
+
+## Ways to contribute
+
+There are several ways you can contribute to LLaMA Factory:
+
+* Fix outstanding issues with the existing code.
+* Submit issues related to bugs or desired new features.
+* Contribute to the examples or to the documentation.
+
+### Style guide
+
+LLaMA Factory follows the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html), check it for details.
.github/ISSUE_TEMPLATE/bug-report.yml (vendored, 38 changed lines)

@@ -1,18 +1,36 @@
 name: "\U0001F41B Bug / Help"
 description: Create a report to help us improve the LLaMA Factory
 body:
+  - type: markdown
+    attributes:
+      value: |
+        Issues included in **FAQs** or those with **insufficient** information may be closed without a response.
+        包含在**常见问题**内或提供信息**不完整**的 issues 可能不会被回复。
+
   - type: checkboxes
     id: reminder
     attributes:
       label: Reminder
       description: |
-        Please ensure you have read the README carefully and searched the existing issues.
-        请确保您已经认真阅读了 README 并且搜索过现有的 Issue。
+        Please ensure you have read the README carefully and searched the existing issues (including FAQs).
+        请确保您已经认真阅读了 README 并且搜索过现有的 issues(包括常见问题)。
 
       options:
         - label: I have read the README and searched the existing issues.
           required: true
+
+  - type: textarea
+    id: system-info
+    validations:
+      required: true
+    attributes:
+      label: System Info
+      description: |
+        Please share your system info with us. You can run the command **llamafactory-cli env** and copy-paste its output below.
+        请提供您的系统信息。您可以在命令行运行 **llamafactory-cli env** 并将其输出复制到该文本框中。
+
+      placeholder: llamafactory version, platform, python version, ...
+
   - type: textarea
     id: reproduction
     validations:
@@ -26,7 +44,9 @@ body:
         请合理使用 Markdown 标签来格式化您的文本。
 
       placeholder: |
-        python src/train_bash.py ...
+        ```bash
+        llamafactory-cli train ...
+        ```
 
   - type: textarea
     id: expected-behavior
@@ -38,18 +58,6 @@ body:
        Please provide a clear and concise description of what you would expect to happen.
        请提供您原本的目的,即这段代码的期望行为。
 
-  - type: textarea
-    id: system-info
-    validations:
-      required: false
-    attributes:
-      label: System Info
-      description: |
-        Please share your system info with us. You can run the command **transformers-cli env** and copy-paste its output below.
-        请提供您的系统信息。您可以在命令行运行 **transformers-cli env** 并将其输出复制到该文本框中。
-
-      placeholder: transformers version, platform, python version, ...
-
   - type: textarea
     id: others
    validations:
.github/PULL_REQUEST_TEMPLATE.md (new file, vendored, +8)

@@ -0,0 +1,8 @@
+# What does this PR do?
+
+Fixes # (issue)
+
+## Before submitting
+
+- [ ] Did you read the [contributor guideline](https://github.com/hiyouga/LLaMA-Factory/blob/main/.github/CONTRIBUTING.md)?
+- [ ] Did you write any new necessary tests?
.github/SECURITY.md (new file, vendored, +7)

@@ -0,0 +1,7 @@
+# Reporting Security Issues
+
+To report a security issue, please use the GitHub Security Advisory ["Report a Vulnerability"](https://github.com/hiyouga/LLaMA-Factory/security/advisories/new) tab.
+
+We will send a response indicating the next steps in handling your report. After the initial reply to your report, the security team will keep you informed of the progress towards a fix and full announcement, and may ask for additional information or guidance.
+
+Report security bugs in third-party modules to the person or team maintaining the module.
.github/workflows/label_issue.yml (new file, vendored, +30)

@@ -0,0 +1,30 @@
+name: label_issue
+
+on:
+  issues:
+    types:
+      - opened
+
+jobs:
+  label_issue:
+    runs-on: ubuntu-latest
+
+    permissions:
+      issues: write
+
+    steps:
+      - env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          ISSUE_URL: ${{ github.event.issue.html_url }}
+          ISSUE_TITLE: ${{ github.event.issue.title }}
+        run: |
+          LABEL=pending
+          NPU_KEYWORDS=(npu huawei ascend 华为 昇腾)
+          ISSUE_TITLE_LOWER=$(echo $ISSUE_TITLE | tr '[:upper:]' '[:lower:]')
+          for KEYWORD in ${NPU_KEYWORDS[@]}; do
+            if [[ $ISSUE_TITLE_LOWER == *$KEYWORD* ]] && [[ $ISSUE_TITLE_LOWER != *input* ]]; then
+              LABEL=pending,npu
+              break
+            fi
+          done
+          gh issue edit $ISSUE_URL --add-label $LABEL
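A note on the labeling heuristic above: the title is lowercased with `tr`, then matched against each keyword with a glob (`*$KEYWORD*`); titles containing `input` are skipped because the word `input` itself contains `npu` as a substring and would otherwise trigger the NPU label. A minimal Python sketch of the same logic, for illustration only (the workflow runs the bash version above):

```python
# Illustration only: a Python rendering of the bash labeling heuristic above.
NPU_KEYWORDS = ["npu", "huawei", "ascend", "华为", "昇腾"]

def labels_for(issue_title: str) -> str:
    """Return the labels the workflow would apply to a newly opened issue."""
    title = issue_title.lower()
    # Guard: "input" contains "npu" as a substring, so such titles are skipped.
    if "input" not in title and any(kw in title for kw in NPU_KEYWORDS):
        return "pending,npu"
    return "pending"

assert labels_for("Training hangs on Ascend NPU") == "pending,npu"
assert labels_for("Wrong input_ids shape") == "pending"
```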
.github/workflows/publish.yml (new file, vendored, +40)

@@ -0,0 +1,40 @@
+name: publish
+
+on:
+  release:
+    types:
+      - published
+
+jobs:
+  publish:
+    name: Upload release to PyPI
+
+    runs-on: ubuntu-latest
+
+    environment:
+      name: release
+      url: https://pypi.org/p/llamafactory
+
+    permissions:
+      id-token: write
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.8"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install build
+
+      - name: Build package
+        run: |
+          python -m build
+
+      - name: Publish package
+        uses: pypa/gh-action-pypi-publish@release/v1
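The `id-token: write` permission together with the `release` environment is the setup for PyPI trusted publishing: `pypa/gh-action-pypi-publish` authenticates with a short-lived OpenID Connect token instead of a stored API token, so no long-lived PyPI secret needs to live in the repository.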
.github/workflows/tests.yml (new file, vendored, +51)

@@ -0,0 +1,51 @@
+name: tests
+
+on:
+  push:
+    branches:
+      - main
+    paths:
+      - "**.py"
+      - "requirements.txt"
+      - ".github/workflows/*.yml"
+  pull_request:
+    branches:
+      - main
+    paths:
+      - "**.py"
+      - "requirements.txt"
+      - ".github/workflows/*.yml"
+
+jobs:
+  tests:
+    runs-on: ubuntu-latest
+
+    environment:
+      name: tests
+
+    env:
+      HF_TOKEN: ${{ secrets.HF_TOKEN }}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.8"
+          cache: "pip"
+          cache-dependency-path: "setup.py"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          python -m pip install ".[torch,dev]"
+
+      - name: Check quality
+        run: |
+          make style && make quality
+
+      - name: Test with pytest
+        run: |
+          make test
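The `HF_TOKEN` secret is exported into the job environment, presumably so the test suite can pull gated models and datasets from the Hugging Face Hub; `huggingface_hub` reads this environment variable automatically.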
.gitignore (vendored, 6 changed lines)

@@ -160,6 +160,8 @@ cython_debug/
 .idea/
 
 # custom .gitignore
-user.config
-saves/
 cache/
+config/
+saves/
+output/
+wandb/
CITATION.cff (new file, +44)

@@ -0,0 +1,44 @@
+cff-version: 1.2.0
+date-released: 2024-03
+message: "If you use this software, please cite it as below."
+authors:
+  - family-names: "Zheng"
+    given-names: "Yaowei"
+  - family-names: "Zhang"
+    given-names: "Richong"
+  - family-names: "Zhang"
+    given-names: "Junhao"
+  - family-names: "Ye"
+    given-names: "Yanhan"
+  - family-names: "Luo"
+    given-names: "Zheyan"
+  - family-names: "Feng"
+    given-names: "Zhangchi"
+  - family-names: "Ma"
+    given-names: "Yongqiang"
+title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
+url: "https://arxiv.org/abs/2403.13372"
+preferred-citation:
+  type: conference-paper
+  conference:
+    name: "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)"
+  authors:
+    - family-names: "Zheng"
+      given-names: "Yaowei"
+    - family-names: "Zhang"
+      given-names: "Richong"
+    - family-names: "Zhang"
+      given-names: "Junhao"
+    - family-names: "Ye"
+      given-names: "Yanhan"
+    - family-names: "Luo"
+      given-names: "Zheyan"
+    - family-names: "Feng"
+      given-names: "Zhangchi"
+    - family-names: "Ma"
+      given-names: "Yongqiang"
+  title: "LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models"
+  url: "https://arxiv.org/abs/2403.13372"
+  year: 2024
+  publisher: "Association for Computational Linguistics"
+  address: "Bangkok, Thailand"
MANIFEST.in (new file, +1)

@@ -0,0 +1 @@
+include LICENSE requirements.txt
Makefile (15 changed lines)

@@ -1,11 +1,14 @@
-.PHONY: quality style
+.PHONY: quality style test
 
-check_dirs := src tests
+check_dirs := scripts src tests
 
 quality:
-	black --check $(check_dirs)
-	ruff $(check_dirs)
+	ruff check $(check_dirs)
+	ruff format --check $(check_dirs)
 
 style:
-	black $(check_dirs)
-	ruff $(check_dirs) --fix
+	ruff check $(check_dirs) --fix
+	ruff format $(check_dirs)
+
+test:
+	CUDA_VISIBLE_DEVICES= pytest tests/
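In the new `test` target, `CUDA_VISIBLE_DEVICES=` sets the variable to an empty string for the `pytest` invocation, hiding all GPUs so the test suite runs on CPU only.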
README_zh.md (842 changed lines)

(File diff suppressed because it is too large.)
data/README.md (325 changed lines)

@@ -1,52 +1,76 @@
-If you are using a custom dataset, please provide your dataset definition in the following format in `dataset_info.json`.
+The [dataset_info.json](dataset_info.json) contains all available datasets. If you are using a custom dataset, please **make sure** to add a *dataset description* in `dataset_info.json` and specify `dataset: dataset_name` before training to use it.
+
+Currently we support datasets in **alpaca** and **sharegpt** format.
 
 ```json
 "dataset_name": {
   "hf_hub_url": "the name of the dataset repository on the Hugging Face hub. (if specified, ignore script_url and file_name)",
-  "ms_hub_url": "the name of the dataset repository on the ModelScope hub. (if specified, ignore script_url and file_name)",
+  "ms_hub_url": "the name of the dataset repository on the Model Scope hub. (if specified, ignore script_url and file_name)",
   "script_url": "the name of the directory containing a dataset loading script. (if specified, ignore file_name)",
-  "file_name": "the name of the dataset file in this directory. (required if above are not specified)",
-  "file_sha1": "the SHA-1 hash value of the dataset file. (optional, does not affect training)",
-  "subset": "the name of the subset. (optional, default: None)",
-  "folder": "the name of the folder of the dataset repository on the Hugging Face hub. (optional, default: None)",
-  "ranking": "whether the dataset is a preference dataset or not. (default: false)",
+  "file_name": "the name of the dataset folder or dataset file in this directory. (required if above are not specified)",
   "formatting": "the format of the dataset. (optional, default: alpaca, can be chosen from {alpaca, sharegpt})",
-  "columns": {
-    "prompt": "the column name in the dataset containing the prompts. (default: instruction, for alpaca)",
-    "query": "the column name in the dataset containing the queries. (default: input, for alpaca)",
-    "response": "the column name in the dataset containing the responses. (default: output, for alpaca)",
-    "history": "the column name in the dataset containing the histories. (default: None, for alpaca)",
-    "messages": "the column name in the dataset containing the messages. (default: conversations, for sharegpt)",
-    "role": "the key in the message represents the identity. (default: from, for sharegpt)",
-    "content": "the key in the message represents the content. (default: value, for sharegpt)",
-    "system": "the column name in the dataset containing the system prompts. (default: None, for both)"
+  "ranking": "whether the dataset is a preference dataset or not. (default: False)",
+  "subset": "the name of the subset. (optional, default: None)",
+  "split": "the name of dataset split to be used. (optional, default: train)",
+  "folder": "the name of the folder of the dataset repository on the Hugging Face hub. (optional, default: None)",
+  "num_samples": "the number of samples in the dataset to be used. (optional, default: None)",
+  "columns (optional)": {
+    "prompt": "the column name in the dataset containing the prompts. (default: instruction)",
+    "query": "the column name in the dataset containing the queries. (default: input)",
+    "response": "the column name in the dataset containing the responses. (default: output)",
+    "history": "the column name in the dataset containing the histories. (default: None)",
+    "messages": "the column name in the dataset containing the messages. (default: conversations)",
+    "system": "the column name in the dataset containing the system prompts. (default: None)",
+    "tools": "the column name in the dataset containing the tool description. (default: None)",
+    "images": "the column name in the dataset containing the image inputs. (default: None)",
+    "chosen": "the column name in the dataset containing the chosen answers. (default: None)",
+    "rejected": "the column name in the dataset containing the rejected answers. (default: None)",
+    "kto_tag": "the column name in the dataset containing the kto tags. (default: None)"
+  },
+  "tags (optional, used for the sharegpt format)": {
+    "role_tag": "the key in the message represents the identity. (default: from)",
+    "content_tag": "the key in the message represents the content. (default: value)",
+    "user_tag": "the value of the role_tag represents the user. (default: human)",
+    "assistant_tag": "the value of the role_tag represents the assistant. (default: gpt)",
+    "observation_tag": "the value of the role_tag represents the tool results. (default: observation)",
+    "function_tag": "the value of the role_tag represents the function call. (default: function_call)",
+    "system_tag": "the value of the role_tag represents the system prompt. (default: system, can override system column)"
   }
 }
 ```
 
-Given above, you can use the custom dataset via specifying `--dataset dataset_name`.
+## Alpaca Format
 
-Currently we support dataset in **alpaca** or **sharegpt** format, the dataset in alpaca format should follow the below format:
+### Supervised Fine-Tuning Dataset
+
+* [Example dataset](alpaca_en_demo.json)
+
+In supervised fine-tuning, the `instruction` column will be concatenated with the `input` column and used as the human prompt, then the human prompt would be `instruction\ninput`. The `output` column represents the model response.
+
+The `system` column will be used as the system prompt if specified.
+
+The `history` column is a list consisting of string tuples representing prompt-response pairs in the history messages. Note that the responses in the history **will also be learned by the model** in supervised fine-tuning.
 
 ```json
 [
   {
-    "instruction": "user instruction (required)",
-    "input": "user input (optional)",
+    "instruction": "human instruction (required)",
+    "input": "human input (optional)",
     "output": "model response (required)",
     "system": "system prompt (optional)",
     "history": [
-      ["user instruction in the first round (optional)", "model response in the first round (optional)"],
-      ["user instruction in the second round (optional)", "model response in the second round (optional)"]
+      ["human instruction in the first round (optional)", "model response in the first round (optional)"],
+      ["human instruction in the second round (optional)", "model response in the second round (optional)"]
     ]
   }
 ]
 ```
 
-Regarding the above dataset, the `columns` in `dataset_info.json` should be:
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
 
 ```json
 "dataset_name": {
+  "file_name": "data.json",
   "columns": {
     "prompt": "instruction",
     "query": "input",
@@ -57,26 +81,135 @@ Regarding the above dataset, the `columns` in `dataset_info.json` should be:
 }
 ```
 
-where the `prompt` and `response` columns should contain non-empty values, represent instruction and response respectively. The `query` column will be concatenated with the `prompt` column and used as input for the model.
+### Pre-training Dataset
 
-The `system` column will be used as the system prompt in the template. The `history` column is a list consisting string tuples representing query-response pairs in history. Note that the responses **in each round will be used for training**.
+- [Example dataset](c4_demo.json)
 
-For the pre-training datasets, only the `prompt` column will be used for training.
-
-For the preference datasets, the `response` column should be a string list whose length is 2, with the preferred answers appearing first, for example:
+In pre-training, only the `text` column will be used for model learning.
 
 ```json
-{
-  "instruction": "user instruction",
-  "input": "user input",
-  "output": [
-    "chosen answer",
-    "rejected answer"
-  ]
+[
+  {"text": "document"},
+  {"text": "document"}
+]
+```
+
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
+
+```json
+"dataset_name": {
+  "file_name": "data.json",
+  "columns": {
+    "prompt": "text"
+  }
 }
 ```
 
-The dataset in sharegpt format should follow the below format:
+### Preference Dataset
+
+Preference datasets are used for reward modeling, DPO training and ORPO training.
+
+It requires a better response in `chosen` column and a worse response in `rejected` column.
+
+```json
+[
+  {
+    "instruction": "human instruction (required)",
+    "input": "human input (optional)",
+    "chosen": "chosen answer (required)",
+    "rejected": "rejected answer (required)"
+  }
+]
+```
+
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
+
+```json
+"dataset_name": {
+  "file_name": "data.json",
+  "ranking": true,
+  "columns": {
+    "prompt": "instruction",
+    "query": "input",
+    "chosen": "chosen",
+    "rejected": "rejected"
+  }
+}
+```
+
+### KTO Dataset
+
+- [Example dataset](kto_en_demo.json)
+
+KTO datasets require a extra `kto_tag` column containing the boolean human feedback.
+
+```json
+[
+  {
+    "instruction": "human instruction (required)",
+    "input": "human input (optional)",
+    "output": "model response (required)",
+    "kto_tag": "human feedback [true/false] (required)"
+  }
+]
+```
+
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
+
+```json
+"dataset_name": {
+  "file_name": "data.json",
+  "columns": {
+    "prompt": "instruction",
+    "query": "input",
+    "response": "output",
+    "kto_tag": "kto_tag"
+  }
+}
+```
+
+### Multimodal Dataset
+
+- [Example dataset](mllm_demo.json)
+
+Multimodal datasets require a `images` column containing the paths to the input images. Currently we only support one image.
+
+```json
+[
+  {
+    "instruction": "human instruction (required)",
+    "input": "human input (optional)",
+    "output": "model response (required)",
+    "images": [
+      "image path (required)"
+    ]
+  }
+]
+```
+
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
+
+```json
+"dataset_name": {
+  "file_name": "data.json",
+  "columns": {
+    "prompt": "instruction",
+    "query": "input",
+    "response": "output",
+    "images": "images"
+  }
+}
+```
+
+## Sharegpt Format
+
+### Supervised Fine-Tuning Dataset
+
+- [Example dataset](glaive_toolcall_en_demo.json)
+
+Compared to the alpaca format, the sharegpt format allows the datasets have **more roles**, such as human, gpt, observation and function. They are presented in a list of objects in the `conversations` column.
+
+Note that the human and observation should appear in odd positions, while gpt and function should appear in even positions.
 
 ```json
 [
@@ -84,31 +217,135 @@ The dataset in sharegpt format should follow the below format:
     "conversations": [
       {
         "from": "human",
-        "value": "user instruction"
+        "value": "human instruction"
+      },
+      {
+        "from": "function_call",
+        "value": "tool arguments"
+      },
+      {
+        "from": "observation",
+        "value": "tool result"
       },
       {
         "from": "gpt",
         "value": "model response"
       }
     ],
-    "system": "system prompt (optional)"
+    "system": "system prompt (optional)",
+    "tools": "tool description (optional)"
   }
 ]
 ```
 
-Regarding the above dataset, the `columns` in `dataset_info.json` should be:
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
 
 ```json
 "dataset_name": {
+  "file_name": "data.json",
+  "formatting": "sharegpt",
   "columns": {
     "messages": "conversations",
-    "role": "from",
-    "content": "value",
-    "system": "system"
+    "system": "system",
+    "tools": "tools"
   }
 }
 ```
 
-where the `messages` column should be a list whose length is even, and follow the `u/a/u/a/u/a` order.
+### Preference Dataset
 
-Pre-training datasets and preference datasets are incompatible with the sharegpt format yet.
+- [Example dataset](dpo_en_demo.json)
+
+Preference datasets in sharegpt format also require a better message in `chosen` column and a worse message in `rejected` column.
+
+```json
+[
+  {
+    "conversations": [
+      {
+        "from": "human",
+        "value": "human instruction"
+      },
+      {
+        "from": "gpt",
+        "value": "model response"
+      },
+      {
+        "from": "human",
+        "value": "human instruction"
+      }
+    ],
+    "chosen": {
+      "from": "gpt",
+      "value": "chosen answer (required)"
+    },
+    "rejected": {
+      "from": "gpt",
+      "value": "rejected answer (required)"
+    }
+  }
+]
+```
+
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
+
+```json
+"dataset_name": {
+  "file_name": "data.json",
+  "formatting": "sharegpt",
+  "ranking": true,
+  "columns": {
+    "messages": "conversations",
+    "chosen": "chosen",
+    "rejected": "rejected"
+  }
+}
+```
+
+### OpenAI Format
+
+The openai format is simply a special case of the sharegpt format, where the first message may be a system prompt.
+
+```json
+[
+  {
+    "messages": [
+      {
+        "role": "system",
+        "content": "system prompt (optional)"
+      },
+      {
+        "role": "user",
+        "content": "human instruction"
+      },
+      {
+        "role": "assistant",
+        "content": "model response"
+      }
+    ]
+  }
+]
+```
+
+Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:
+
+```json
+"dataset_name": {
+  "file_name": "data.json",
+  "formatting": "sharegpt",
+  "columns": {
+    "messages": "messages"
+  },
+  "tags": {
+    "role_tag": "role",
+    "content_tag": "content",
+    "user_tag": "user",
+    "assistant_tag": "assistant",
+    "system_tag": "system"
+  }
+}
+```
+
+The KTO datasets and multimodal datasets in sharegpt format are similar to the alpaca format.
+
+Pre-training datasets are **incompatible** with the sharegpt format.
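To make the alpaca mapping above concrete, here is a short sketch of how a supervised fine-tuning record becomes a (prompt, response) pair under the `instruction\ninput` concatenation rule the README describes. The helper name is hypothetical, not part of LLaMA Factory's API:

```python
# Illustration only: how an alpaca-format SFT record maps to a training pair
# under the rules in data/README.md above. `to_prompt_response` is a
# hypothetical helper, not a LLaMA Factory function.
def to_prompt_response(record: dict) -> tuple:
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n" + record["input"]  # human prompt = instruction\ninput
    return prompt, record["output"]

record = {
    "instruction": "Translate the text to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
print(to_prompt_response(record))
# -> ('Translate the text to French.\nGood morning', 'Bonjour')
```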
|||||||
@@ -1,38 +1,61 @@
|
|||||||
如果您使用自定义数据集,请务必在 `dataset_info.json` 文件中按照以下格式提供数据集定义。
|
[dataset_info.json](dataset_info.json) 包含了所有可用的数据集。如果您希望使用自定义数据集,请**务必**在 `dataset_info.json` 文件中添加*数据集描述*,并通过修改 `dataset: 数据集名称` 配置来使用数据集。
|
||||||
|
|
||||||
|
目前我们支持 **alpaca** 格式和 **sharegpt** 格式的数据集。
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"数据集名称": {
|
"数据集名称": {
|
||||||
"hf_hub_url": "Hugging Face 的数据集仓库地址(若指定,则忽略 script_url 和 file_name)",
|
"hf_hub_url": "Hugging Face 的数据集仓库地址(若指定,则忽略 script_url 和 file_name)",
|
||||||
"ms_hub_url": "ModelScope 的数据集仓库地址(若指定,则忽略 script_url 和 file_name)",
|
"ms_hub_url": "ModelScope 的数据集仓库地址(若指定,则忽略 script_url 和 file_name)",
|
||||||
"script_url": "包含数据加载脚本的本地文件夹名称(若指定,则忽略 file_name)",
|
"script_url": "包含数据加载脚本的本地文件夹名称(若指定,则忽略 file_name)",
|
||||||
"file_name": "该目录下数据集文件的名称(若上述参数未指定,则此项必需)",
|
"file_name": "该目录下数据集文件夹或文件的名称(若上述参数未指定,则此项必需)",
|
||||||
"file_sha1": "数据集文件的 SHA-1 哈希值(可选,留空不影响训练)",
|
|
||||||
"subset": "数据集子集的名称(可选,默认:None)",
|
|
||||||
"folder": "Hugging Face 仓库的文件夹名称(可选,默认:None)",
|
|
||||||
"ranking": "是否为偏好数据集(可选,默认:False)",
|
|
||||||
"formatting": "数据集格式(可选,默认:alpaca,可以为 alpaca 或 sharegpt)",
|
"formatting": "数据集格式(可选,默认:alpaca,可以为 alpaca 或 sharegpt)",
|
||||||
"columns": {
|
"ranking": "是否为偏好数据集(可选,默认:False)",
|
||||||
"prompt": "数据集代表提示词的表头名称(默认:instruction,用于 alpaca 格式)",
|
"subset": "数据集子集的名称(可选,默认:None)",
|
||||||
"query": "数据集代表请求的表头名称(默认:input,用于 alpaca 格式)",
|
"split": "所使用的数据集切分(可选,默认:train)",
|
||||||
"response": "数据集代表回答的表头名称(默认:output,用于 alpaca 格式)",
|
"folder": "Hugging Face 仓库的文件夹名称(可选,默认:None)",
|
||||||
"history": "数据集代表历史对话的表头名称(默认:None,用于 alpaca 格式)",
|
"num_samples": "该数据集所使用的样本数量。(可选,默认:None)",
|
||||||
"messages": "数据集代表消息列表的表头名称(默认:conversations,用于 sharegpt 格式)",
|
"columns(可选)": {
|
||||||
"role": "消息中代表发送者身份的键名(默认:from,用于 sharegpt 格式)",
|
"prompt": "数据集代表提示词的表头名称(默认:instruction)",
|
||||||
"content": "消息中代表文本内容的键名(默认:value,用于 sharegpt 格式)",
|
"query": "数据集代表请求的表头名称(默认:input)",
|
||||||
"system": "数据集代表系统提示的表头名称(默认:None,用于两种格式)"
|
"response": "数据集代表回答的表头名称(默认:output)",
|
||||||
|
"history": "数据集代表历史对话的表头名称(默认:None)",
|
||||||
|
"messages": "数据集代表消息列表的表头名称(默认:conversations)",
|
||||||
|
"system": "数据集代表系统提示的表头名称(默认:None)",
|
||||||
|
"tools": "数据集代表工具描述的表头名称(默认:None)",
|
||||||
|
"images": "数据集代表图像输入的表头名称(默认:None)",
|
||||||
|
"chosen": "数据集代表更优回答的表头名称(默认:None)",
|
||||||
|
"rejected": "数据集代表更差回答的表头名称(默认:None)",
|
||||||
|
"kto_tag": "数据集代表 KTO 标签的表头名称(默认:None)"
|
||||||
|
},
|
||||||
|
"tags(可选,用于 sharegpt 格式)": {
|
||||||
|
"role_tag": "消息中代表发送者身份的键名(默认:from)",
|
||||||
|
"content_tag": "消息中代表文本内容的键名(默认:value)",
|
||||||
|
"user_tag": "消息中代表用户的 role_tag(默认:human)",
|
||||||
|
"assistant_tag": "消息中代表助手的 role_tag(默认:gpt)",
|
||||||
|
"observation_tag": "消息中代表工具返回结果的 role_tag(默认:observation)",
|
||||||
|
"function_tag": "消息中代表工具调用的 role_tag(默认:function_call)",
|
||||||
|
"system_tag": "消息中代表系统提示的 role_tag(默认:system,会覆盖 system column)"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
添加后可通过指定 `--dataset 数据集名称` 参数使用自定义数据集。
|
## Alpaca 格式
|
||||||
|
|
||||||
该项目目前支持两种格式的数据集:**alpaca** 和 **sharegpt**,其中 alpaca 格式的数据集按照以下方式组织:
|
### 指令监督微调数据集
|
||||||
|
|
||||||
|
- [样例数据集](alpaca_zh_demo.json)
|
||||||
|
|
||||||
|
在指令监督微调时,`instruction` 列对应的内容会与 `input` 列对应的内容拼接后作为人类指令,即人类指令为 `instruction\ninput`。而 `output` 列对应的内容为模型回答。
|
||||||
|
|
||||||
|
如果指定,`system` 列对应的内容将被作为系统提示词。
|
||||||
|
|
||||||
|
`history` 列是由多个字符串二元组构成的列表,分别代表历史消息中每轮对话的指令和回答。注意在指令监督微调时,历史消息中的回答内容**也会被用于模型学习**。
|
||||||
|
|
||||||
```json
|
```json
|
||||||
[
|
[
|
||||||
{
|
{
|
||||||
"instruction": "用户指令(必填)",
|
"instruction": "人类指令(必填)",
|
||||||
"input": "用户输入(选填)",
|
"input": "人类输入(选填)",
|
||||||
"output": "模型回答(必填)",
|
"output": "模型回答(必填)",
|
||||||
"system": "系统提示词(选填)",
|
"system": "系统提示词(选填)",
|
||||||
"history": [
|
"history": [
|
||||||
@@ -43,10 +66,11 @@
|
|||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
对于上述格式的数据,`dataset_info.json` 中的 `columns` 应为:
|
对于上述格式的数据,`dataset_info.json` 中的*数据集描述*应为:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"数据集名称": {
|
"数据集名称": {
|
||||||
|
"file_name": "data.json",
|
||||||
"columns": {
|
"columns": {
|
||||||
"prompt": "instruction",
|
"prompt": "instruction",
|
||||||
"query": "input",
|
"query": "input",
|
||||||
@@ -57,26 +81,135 @@
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
其中 `prompt` 和 `response` 列应当是非空的字符串,分别代表用户指令和模型回答。`query` 列的内容将会和 `prompt` 列拼接作为模型输入。
|
### 预训练数据集
|
||||||
|
|
||||||
`system` 为模板中的系统提示词。`history` 列是由多个字符串二元组构成的列表,分别代表历史消息中每轮的指令和回答。注意每轮的模型回答**均会被用于训练**。
|
- [样例数据集](c4_demo.json)
|
||||||
|
|
||||||
对于预训练数据集,仅 `prompt` 列中的内容会用于模型训练。
|
在预训练时,只有 `text` 列中的内容会用于模型学习。
|
||||||
|
|
||||||
对于偏好数据集,`response` 列应当是一个长度为 2 的字符串列表,排在前面的代表更优的回答,例如:
|
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
[
|
||||||
"instruction": "用户指令",
|
{"text": "document"},
|
||||||
"input": "用户输入",
|
{"text": "document"}
|
||||||
"output": [
|
]
|
||||||
"优质回答",
|
```
|
||||||
"劣质回答"
|
|
||||||
]
|
对于上述格式的数据,`dataset_info.json` 中的*数据集描述*应为:
|
||||||
|
|
||||||
|
```json
|
||||||
|
"数据集名称": {
|
||||||
|
"file_name": "data.json",
|
||||||
|
"columns": {
|
||||||
|
"prompt": "text"
|
||||||
|
}
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Preference Dataset

Preference datasets are used for reward model training, DPO training and ORPO training.

They require a better response in the `chosen` column and a worse response in the `rejected` column.

```json
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "chosen": "chosen answer (required)",
    "rejected": "rejected answer (required)"
  }
]
```

For data in this format, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "ranking": true,
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "chosen": "chosen",
    "rejected": "rejected"
  }
}
```
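For intuition, each preference row is one prompt paired with two candidate completions. A minimal sketch (the helper name is hypothetical, not project code) that turns rows into (prompt, better, worse) triples:

```python
def to_triples(rows: list[dict]) -> list[tuple[str, str, str]]:
    """Build (prompt, chosen, rejected) triples from alpaca-style preference rows."""
    triples = []
    for row in rows:
        prompt = row["instruction"]
        if row.get("input"):
            prompt += "\n" + row["input"]
        triples.append((prompt, row["chosen"], row["rejected"]))
    return triples

rows = [{"instruction": "Summarize", "input": "a long text", "chosen": "good summary", "rejected": "bad summary"}]
print(to_triples(rows))
```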
### KTO Dataset

- [Example dataset](kto_en_demo.json)

A KTO dataset additionally requires a `kto_tag` column containing boolean human feedback.

```json
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "kto_tag": "human feedback [true/false] (required)"
  }
]
```

For data in this format, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "kto_tag": "kto_tag"
  }
}
```
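KTO learns from unpaired thumbs-up/thumbs-down signals rather than chosen/rejected pairs; a small illustrative sketch of how the boolean tag partitions the examples:

```python
rows = [
    {"instruction": "q1", "output": "a1", "kto_tag": True},
    {"instruction": "q2", "output": "a2", "kto_tag": False},
]
desirable = [r for r in rows if r["kto_tag"]]         # responses to reinforce
undesirable = [r for r in rows if not r["kto_tag"]]   # responses to suppress
print(len(desirable), len(undesirable))  # 1 1
```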
### Multimodal Dataset

- [Example dataset](mllm_demo.json)

A multimodal dataset additionally requires an `images` column containing the paths of the input images. Currently only a single image input is supported.

```json
[
  {
    "instruction": "human instruction (required)",
    "input": "human input (optional)",
    "output": "model response (required)",
    "images": [
      "image path (required)"
    ]
  }
]
```

For data in this format, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "columns": {
    "prompt": "instruction",
    "query": "input",
    "response": "output",
    "images": "images"
  }
}
```
## Sharegpt Format

### Supervised Fine-Tuning Dataset

- [Example dataset](glaive_toolcall_zh_demo.json)

Compared to the alpaca format, the sharegpt format supports **more role types**, such as human, gpt, observation and function. They form a list of objects in the `conversations` column.

Note that human and observation must appear at odd positions, while gpt and function must appear at even positions; a validation sketch follows the dataset description below.

```json
[
@@ -84,31 +217,135 @@
    "conversations": [
      {
        "from": "human",
        "value": "human instruction"
      },
      {
        "from": "function_call",
        "value": "tool arguments"
      },
      {
        "from": "observation",
        "value": "tool result"
      },
      {
        "from": "gpt",
        "value": "model response"
      }
    ],
    "system": "system prompt (optional)",
    "tools": "tool description (optional)"
  }
]
```

For data in this format, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "system": "system",
    "tools": "tools"
  }
}
```
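The odd/even constraint above is easy to check mechanically; a small sketch (the helper is hypothetical, not project code) that validates a `conversations` list before training. "Odd positions" is one-based, i.e. indices 0, 2, ... in zero-based Python:

```python
def validate_sharegpt(conversations: list[dict]) -> bool:
    """human/observation must be at odd turns (index 0, 2, ...), gpt/function_call at even turns."""
    odd_roles = {"human", "observation"}
    even_roles = {"gpt", "function_call"}
    for i, message in enumerate(conversations):
        expected = odd_roles if i % 2 == 0 else even_roles
        if message["from"] not in expected:
            return False
    return True

good = [{"from": "human", "value": "hi"}, {"from": "gpt", "value": "hello"}]
bad = [{"from": "gpt", "value": "hello"}]
print(validate_sharegpt(good), validate_sharegpt(bad))  # True False
```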
### Preference Dataset

- [Example dataset](dpo_zh_demo.json)

Preference datasets in sharegpt format likewise require a better message in the `chosen` column and a worse message in the `rejected` column.

```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "human instruction"
      },
      {
        "from": "gpt",
        "value": "model response"
      },
      {
        "from": "human",
        "value": "human instruction"
      }
    ],
    "chosen": {
      "from": "gpt",
      "value": "chosen answer"
    },
    "rejected": {
      "from": "gpt",
      "value": "rejected answer"
    }
  }
]
```

For data in this format, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "formatting": "sharegpt",
  "ranking": true,
  "columns": {
    "messages": "conversations",
    "chosen": "chosen",
    "rejected": "rejected"
  }
}
```
### OpenAI Format

The OpenAI format is simply a special case of the sharegpt format, in which the first message may be a system prompt.

```json
[
  {
    "messages": [
      {
        "role": "system",
        "content": "system prompt (optional)"
      },
      {
        "role": "user",
        "content": "human instruction"
      },
      {
        "role": "assistant",
        "content": "model response"
      }
    ]
  }
]
```

For data in this format, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant",
    "system_tag": "system"
  }
}
```

KTO datasets and multimodal datasets in sharegpt format are organized similarly to their alpaca counterparts.

Pre-training datasets are **not** supported in sharegpt format.
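The `tags` mapping above is purely mechanical; a minimal sketch (illustrative, not project code) of how it renames OpenAI-style keys into the sharegpt from/value convention:

```python
TAGS = {"role_tag": "role", "content_tag": "content"}

def openai_to_sharegpt(messages: list[dict]) -> list[dict]:
    """Rename role/content keys to the sharegpt from/value convention."""
    return [
        {"from": m[TAGS["role_tag"]], "value": m[TAGS["content_tag"]]}
        for m in messages
    ]

msgs = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]
print(openai_to_sharegpt(msgs))
# [{'from': 'user', 'value': 'hi'}, {'from': 'assistant', 'value': 'hello'}]
```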
@@ -1,7 +1,11 @@
import json
import os

import datasets


_HF_ENDPOINT = os.getenv("HF_ENDPOINT", "https://huggingface.co")

_DESCRIPTION = "BELLE multiturn chat dataset."

_CITATION = """\
@@ -13,37 +17,25 @@ _CITATION = """\
}
"""

_HOMEPAGE = "{}/datasets/BelleGroup/multiturn_chat_0.8M".format(_HF_ENDPOINT)
_LICENSE = "gpl-3.0"
_URL = "{}/datasets/BelleGroup/multiturn_chat_0.8M/resolve/main/multiturn_chat_0.8M.json".format(_HF_ENDPOINT)


class BelleMultiturn(datasets.GeneratorBasedBuilder):
    VERSION = datasets.Version("0.0.0")

    def _info(self):
        features = datasets.Features(
            {"conversations": [{"from": datasets.Value("string"), "value": datasets.Value("string")}]}
        )
        return datasets.DatasetInfo(
            description=_DESCRIPTION, features=features, homepage=_HOMEPAGE, license=_LICENSE, citation=_CITATION
        )

    def _split_generators(self, dl_manager: datasets.DownloadManager):
        file_path = dl_manager.download(_URL)
        return [datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"filepath": file_path})]

    def _generate_examples(self, filepath: str):
        with open(filepath, "r", encoding="utf-8") as f:
@@ -55,7 +47,7 @@ class BelleMultiturn(datasets.GeneratorBasedBuilder):

            assist_idx = prompt.rfind("Assistant:")
            human_idx = prompt.rfind("Human:")
            query = prompt[human_idx + 6 : assist_idx].strip()
            prompt = prompt[:human_idx].strip()
            conversations.insert(0, {"from": "gpt", "value": response})
            conversations.insert(0, {"from": "human", "value": query})
@@ -64,8 +56,8 @@ class BelleMultiturn(datasets.GeneratorBasedBuilder):
                assist_idx = prompt.rfind("Assistant:")
                human_idx = prompt.rfind("Human:")
                if human_idx != -1:
                    old_query = prompt[human_idx + 6 : assist_idx].strip()
                    old_resp = prompt[assist_idx + 10 :].strip()
                    conversations.insert(0, {"from": "gpt", "value": old_resp})
                    conversations.insert(0, {"from": "human", "value": old_query})
                else:
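The `_HF_ENDPOINT` change lets these loader scripts honor a mirror of the Hugging Face Hub. A hedged sketch of the pattern (the mirror URL is only an example, and the variable must be set before the script builds its URLs):

```python
import os

# Point Hub downloads at a mirror; the loaders read HF_ENDPOINT once and
# fall back to the official endpoint when it is unset.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # example mirror URL

_HF_ENDPOINT = os.getenv("HF_ENDPOINT", "https://huggingface.co")
_URL = "{}/datasets/BelleGroup/multiturn_chat_0.8M/resolve/main/multiturn_chat_0.8M.json".format(_HF_ENDPOINT)
print(_URL)
```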
@@ -1,46 +0,0 @@
import json
import datasets
from typing import Any, Dict, List


_DESCRIPTION = "An example of dataset."
_CITATION = ""
_HOMEPAGE = ""
_LICENSE = ""
_URL = "examples.json"


class ExampleDataset(datasets.GeneratorBasedBuilder):

    VERSION = datasets.Version("0.0.0")

    def _info(self) -> datasets.DatasetInfo:
        features = datasets.Features({
            "instruction": datasets.Value("string"),
            "input": datasets.Value("string"),
            "output": datasets.Value("string"),
            "history": datasets.Sequence(datasets.Sequence(datasets.Value("string")))
        })
        return datasets.DatasetInfo(
            description=_DESCRIPTION,
            features=features,
            homepage=_HOMEPAGE,
            license=_LICENSE,
            citation=_CITATION
        )

    def _split_generators(self, dl_manager: datasets.DownloadManager) -> List[datasets.SplitGenerator]:
        file_path = dl_manager.download(_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={
                    "filepath": file_path
                }
            )
        ]

    def _generate_examples(self, filepath: str) -> Dict[int, Dict[str, Any]]:
        example_dataset = json.load(open(filepath, "r", encoding="utf-8"))
        for key, example in enumerate(example_dataset):
            yield key, example
@@ -1,62 +1,53 @@
import json
import os
from typing import List

import datasets


_HF_ENDPOINT = os.getenv("HF_ENDPOINT", "https://huggingface.co")

_DESCRIPTION = "Human preference data about helpfulness and harmlessness."
_CITATION = ""
_HOMEPAGE = "{}/datasets/Anthropic/hh-rlhf".format(_HF_ENDPOINT)
_LICENSE = "mit"
_URL = "{}/datasets/Anthropic/hh-rlhf/resolve/main/".format(_HF_ENDPOINT)
_URLS = {
    "train": [
        _URL + "harmless-base/train.jsonl.gz",
        _URL + "helpful-base/train.jsonl.gz",
        _URL + "helpful-online/train.jsonl.gz",
        _URL + "helpful-rejection-sampled/train.jsonl.gz",
    ],
    "test": [
        _URL + "harmless-base/test.jsonl.gz",
        _URL + "helpful-base/test.jsonl.gz",
        _URL + "helpful-online/test.jsonl.gz",
        _URL + "helpful-rejection-sampled/test.jsonl.gz",
    ],
}


class HhRlhfEn(datasets.GeneratorBasedBuilder):
    VERSION = datasets.Version("0.0.0")

    def _info(self) -> datasets.DatasetInfo:
        features = datasets.Features(
            {
                "instruction": datasets.Value("string"),
                "chosen": datasets.Value("string"),
                "rejected": datasets.Value("string"),
                "history": datasets.Sequence(datasets.Sequence(datasets.Value("string"))),
            }
        )
        return datasets.DatasetInfo(
            description=_DESCRIPTION, features=features, homepage=_HOMEPAGE, license=_LICENSE, citation=_CITATION
        )

    def _split_generators(self, dl_manager: datasets.DownloadManager):
        file_path = dl_manager.download_and_extract(_URLS)
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"filepaths": file_path["train"]}),
            datasets.SplitGenerator(name=datasets.Split.TEST, gen_kwargs={"filepaths": file_path["test"]}),
        ]

    def _generate_examples(self, filepaths: List[str]):
@@ -69,12 +60,12 @@ class HhRlhfEn(datasets.GeneratorBasedBuilder):
                rejected = data["rejected"]

                assist_idx = rejected.rfind("\n\nAssistant: ")
                r_reject = rejected[assist_idx + 13 :].strip()
                assist_idx = chosen.rfind("\n\nAssistant: ")
                r_accept = chosen[assist_idx + 13 :].strip()

                human_idx = chosen.rfind("\n\nHuman: ")
                query = chosen[human_idx + 9 : assist_idx].strip()
                prompt = chosen[:human_idx]
                history = []
@@ -82,16 +73,12 @@ class HhRlhfEn(datasets.GeneratorBasedBuilder):
                    assist_idx = prompt.rfind("\n\nAssistant: ")
                    human_idx = prompt.rfind("\n\nHuman: ")
                    if human_idx != -1:
                        old_query = prompt[human_idx + 9 : assist_idx].strip()
                        old_resp = prompt[assist_idx + 13 :].strip()
                        history.insert(0, (old_query, old_resp))
                    else:
                        break
                    prompt = prompt[:human_idx]

                yield key, {"instruction": query, "chosen": r_accept, "rejected": r_reject, "history": history}
                key += 1
|
|||||||
import json
|
import json
|
||||||
import datasets
|
import os
|
||||||
from typing import List
|
from typing import List
|
||||||
|
|
||||||
|
import datasets
|
||||||
|
|
||||||
|
|
||||||
|
_HF_ENDPOINT = os.getenv("HF_ENDPOINT", "https://huggingface.co")
|
||||||
|
|
||||||
_DESCRIPTION = "UltraChat: Large-scale, Informative, and Diverse Multi-round Dialogue Data."
|
_DESCRIPTION = "UltraChat: Large-scale, Informative, and Diverse Multi-round Dialogue Data."
|
||||||
|
|
||||||
@@ -16,37 +20,25 @@ _CITATION = """\
|
|||||||
}
|
}
|
||||||
"""
|
"""
|
||||||
|
|
||||||
_HOMEPAGE = "https://huggingface.co/datasets/stingning/ultrachat"
|
_HOMEPAGE = "{}/datasets/stingning/ultrachat".format(_HF_ENDPOINT)
|
||||||
_LICENSE = "cc-by-nc-4.0"
|
_LICENSE = "cc-by-nc-4.0"
|
||||||
_BASE_DATA_URL = "https://huggingface.co/datasets/stingning/ultrachat/resolve/main/train_{idx}.jsonl"
|
_BASE_DATA_URL = "{}/datasets/stingning/ultrachat/resolve/main/train_{{idx}}.jsonl".format(_HF_ENDPOINT)
|
||||||
|
|
||||||
|
|
||||||
class UltraChat(datasets.GeneratorBasedBuilder):
|
class UltraChat(datasets.GeneratorBasedBuilder):
|
||||||
|
|
||||||
VERSION = datasets.Version("0.0.0")
|
VERSION = datasets.Version("0.0.0")
|
||||||
|
|
||||||
def _info(self):
|
def _info(self):
|
||||||
features = datasets.Features({
|
features = datasets.Features(
|
||||||
"conversations": [{"from": datasets.Value("string"), "value": datasets.Value("string")}]
|
{"conversations": [{"from": datasets.Value("string"), "value": datasets.Value("string")}]}
|
||||||
})
|
)
|
||||||
return datasets.DatasetInfo(
|
return datasets.DatasetInfo(
|
||||||
description=_DESCRIPTION,
|
description=_DESCRIPTION, features=features, homepage=_HOMEPAGE, license=_LICENSE, citation=_CITATION
|
||||||
features=features,
|
|
||||||
homepage=_HOMEPAGE,
|
|
||||||
license=_LICENSE,
|
|
||||||
citation=_CITATION
|
|
||||||
)
|
)
|
||||||
|
|
||||||
def _split_generators(self, dl_manager: datasets.DownloadManager):
|
def _split_generators(self, dl_manager: datasets.DownloadManager):
|
||||||
file_paths = [dl_manager.download(_BASE_DATA_URL.format(idx=idx)) for idx in range(10)] # multiple shards
|
file_paths = [dl_manager.download(_BASE_DATA_URL.format(idx=idx)) for idx in range(10)] # multiple shards
|
||||||
return [
|
return [datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"filepaths": file_paths})]
|
||||||
datasets.SplitGenerator(
|
|
||||||
name=datasets.Split.TRAIN,
|
|
||||||
gen_kwargs={
|
|
||||||
"filepaths": file_paths
|
|
||||||
}
|
|
||||||
)
|
|
||||||
]
|
|
||||||
|
|
||||||
def _generate_examples(self, filepaths: List[str]):
|
def _generate_examples(self, filepaths: List[str]):
|
||||||
for filepath in filepaths:
|
for filepath in filepaths:
|
||||||
@@ -54,7 +46,7 @@ class UltraChat(datasets.GeneratorBasedBuilder):
|
|||||||
for row in f:
|
for row in f:
|
||||||
try:
|
try:
|
||||||
data = json.loads(row)
|
data = json.loads(row)
|
||||||
except:
|
except Exception:
|
||||||
continue
|
continue
|
||||||
key: int = data["id"]
|
key: int = data["id"]
|
||||||
content: List[str] = data["data"]
|
content: List[str] = data["data"]
|
||||||
@@ -62,8 +54,7 @@ class UltraChat(datasets.GeneratorBasedBuilder):
|
|||||||
content.pop(-1)
|
content.pop(-1)
|
||||||
if len(content) < 2:
|
if len(content) < 2:
|
||||||
continue
|
continue
|
||||||
conversations = [{
|
conversations = [
|
||||||
"from": "human" if i % 2 == 0 else "gpt",
|
{"from": "human" if i % 2 == 0 else "gpt", "value": content[i]} for i in range(len(content))
|
||||||
"value": content[i]
|
]
|
||||||
} for i in range(len(content))]
|
|
||||||
yield key, {"conversations": conversations}
|
yield key, {"conversations": conversations}
|
||||||
|
|||||||
data/wiki_demo.txt (new file, 30 lines)
File diff suppressed because one or more lines are too long
docker/docker-cuda/Dockerfile (new file, 59 lines)

# Use the NVIDIA official image with PyTorch 2.3.0
# https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-02.html
FROM nvcr.io/nvidia/pytorch:24.02-py3

# Define environments
ENV MAX_JOBS=4
ENV FLASH_ATTENTION_FORCE_BUILD=TRUE
ENV VLLM_WORKER_MULTIPROC_METHOD=spawn

# Define installation arguments
ARG INSTALL_BNB=false
ARG INSTALL_VLLM=false
ARG INSTALL_DEEPSPEED=false
ARG INSTALL_FLASHATTN=false
ARG PIP_INDEX=https://pypi.org/simple

# Set the working directory
WORKDIR /app

# Install the requirements
COPY requirements.txt /app
RUN pip config set global.index-url "$PIP_INDEX" && \
    pip config set global.extra-index-url "$PIP_INDEX" && \
    python -m pip install --upgrade pip && \
    python -m pip install -r requirements.txt

# Copy the rest of the application into the image
COPY . /app

# Install the LLaMA Factory
RUN EXTRA_PACKAGES="metrics"; \
    if [ "$INSTALL_BNB" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},bitsandbytes"; \
    fi; \
    if [ "$INSTALL_VLLM" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},vllm"; \
    fi; \
    if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
    fi; \
    pip install -e ".[$EXTRA_PACKAGES]"

# Rebuild flash attention
RUN pip uninstall -y transformer-engine flash-attn && \
    if [ "$INSTALL_FLASHATTN" == "true" ]; then \
        pip uninstall -y ninja && pip install ninja && \
        pip install --no-cache-dir flash-attn --no-build-isolation; \
    fi

# Set up volumes
VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]

# Expose port 7860 for the LLaMA Board
ENV GRADIO_SERVER_PORT 7860
EXPOSE 7860

# Expose port 8000 for the API service
ENV API_PORT 8000
EXPOSE 8000
docker/docker-cuda/docker-compose.yml (new file, 32 lines)

services:
  llamafactory:
    build:
      dockerfile: ./docker/docker-cuda/Dockerfile
      context: ../..
      args:
        INSTALL_BNB: false
        INSTALL_VLLM: false
        INSTALL_DEEPSPEED: false
        INSTALL_FLASHATTN: false
        PIP_INDEX: https://pypi.org/simple
    container_name: llamafactory
    volumes:
      - ../../hf_cache:/root/.cache/huggingface
      - ../../ms_cache:/root/.cache/modelscope
      - ../../data:/app/data
      - ../../output:/app/output
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    tty: true
    stdin_open: true
    command: bash
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: "all"
              capabilities: [gpu]
    restart: unless-stopped
docker/docker-npu/Dockerfile (new file, 45 lines)

# Use the Ubuntu 22.04 image with CANN 8.0.rc1
# More versions can be found at https://hub.docker.com/r/cosdt/cann/tags
# FROM cosdt/cann:8.0.rc1-910-ubuntu22.04
FROM cosdt/cann:8.0.rc1-910b-ubuntu22.04
# FROM cosdt/cann:8.0.rc1-910-openeuler22.03
# FROM cosdt/cann:8.0.rc1-910b-openeuler22.03

# Define environments
ENV DEBIAN_FRONTEND=noninteractive

# Define installation arguments
ARG INSTALL_DEEPSPEED=false
ARG PIP_INDEX=https://pypi.org/simple
ARG TORCH_INDEX=https://download.pytorch.org/whl/cpu

# Set the working directory
WORKDIR /app

# Install the requirements
COPY requirements.txt /app
RUN pip config set global.index-url "$PIP_INDEX" && \
    pip config set global.extra-index-url "$TORCH_INDEX" && \
    python -m pip install --upgrade pip && \
    python -m pip install -r requirements.txt

# Copy the rest of the application into the image
COPY . /app

# Install the LLaMA Factory
RUN EXTRA_PACKAGES="torch-npu,metrics"; \
    if [ "$INSTALL_DEEPSPEED" == "true" ]; then \
        EXTRA_PACKAGES="${EXTRA_PACKAGES},deepspeed"; \
    fi; \
    pip install -e ".[$EXTRA_PACKAGES]"

# Set up volumes
VOLUME [ "/root/.cache/huggingface", "/root/.cache/modelscope", "/app/data", "/app/output" ]

# Expose port 7860 for the LLaMA Board
ENV GRADIO_SERVER_PORT 7860
EXPOSE 7860

# Expose port 8000 for the API service
ENV API_PORT 8000
EXPOSE 8000
docker/docker-npu/docker-compose.yml (new file, 31 lines)

services:
  llamafactory:
    build:
      dockerfile: ./docker/docker-npu/Dockerfile
      context: ../..
      args:
        INSTALL_DEEPSPEED: false
        PIP_INDEX: https://pypi.org/simple
    container_name: llamafactory
    volumes:
      - ../../hf_cache:/root/.cache/huggingface
      - ../../ms_cache:/root/.cache/modelscope
      - ../../data:/app/data
      - ../../output:/app/output
      - /usr/local/dcmi:/usr/local/dcmi
      - /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
      - /usr/local/Ascend/driver:/usr/local/Ascend/driver
      - /etc/ascend_install.info:/etc/ascend_install.info
    ports:
      - "7860:7860"
      - "8000:8000"
    ipc: host
    tty: true
    stdin_open: true
    command: bash
    devices:
      - /dev/davinci0
      - /dev/davinci_manager
      - /dev/devmm_svm
      - /dev/hisi_hdc
    restart: unless-stopped
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import datasets
@@ -19,7 +20,7 @@ import pandas as pd

_CITATION = """\
@article{huang2023ceval,
  title={C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models},
  author={Huang, Yuzhen and Bai, Yuzhuo and Zhu, Zhihao and Zhang, Junlei and Zhang, Jinghan and Su, Tangjun and Liu, Junteng and Lv, Chuancheng and Zhang, Yikai and Lei, Jiayi and Fu, Yao and Sun, Maosong and He, Junxian},
  journal={arXiv preprint arXiv:2305.08322},
  year={2023}
@@ -133,25 +134,19 @@ class Ceval(datasets.GeneratorBasedBuilder):
            datasets.SplitGenerator(
                name=datasets.Split.TEST,
                gen_kwargs={
                    "filepath": os.path.join(data_dir, "test", f"{task_name}_test.csv"),
                },
            ),
            datasets.SplitGenerator(
                name=datasets.Split.VALIDATION,
                gen_kwargs={
                    "filepath": os.path.join(data_dir, "val", f"{task_name}_val.csv"),
                },
            ),
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={
                    "filepath": os.path.join(data_dir, "dev", f"{task_name}_dev.csv"),
                },
            ),
        ]
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import datasets
@@ -37,73 +38,73 @@ _LICENSE = "Creative Commons Attribution-NonCommercial-ShareAlike 4.0 Internatio
_URL = "cmmlu.zip"

task_list = [
    "agronomy",
    "anatomy",
    "ancient_chinese",
    "arts",
    "astronomy",
    "business_ethics",
    "chinese_civil_service_exam",
    "chinese_driving_rule",
    "chinese_food_culture",
    "chinese_foreign_policy",
    "chinese_history",
    "chinese_literature",
    "chinese_teacher_qualification",
    "clinical_knowledge",
    "college_actuarial_science",
    "college_education",
    "college_engineering_hydrology",
    "college_law",
    "college_mathematics",
    "college_medical_statistics",
    "college_medicine",
    "computer_science",
    "computer_security",
    "conceptual_physics",
    "construction_project_management",
    "economics",
    "education",
    "electrical_engineering",
    "elementary_chinese",
    "elementary_commonsense",
    "elementary_information_and_technology",
    "elementary_mathematics",
    "ethnology",
    "food_science",
    "genetics",
    "global_facts",
    "high_school_biology",
    "high_school_chemistry",
    "high_school_geography",
    "high_school_mathematics",
    "high_school_physics",
    "high_school_politics",
    "human_sexuality",
    "international_law",
    "journalism",
    "jurisprudence",
    "legal_and_moral_basis",
    "logical",
    "machine_learning",
    "management",
    "marketing",
    "marxist_theory",
    "modern_chinese",
    "nutrition",
    "philosophy",
    "professional_accounting",
    "professional_law",
    "professional_medicine",
    "professional_psychology",
    "public_relations",
    "security_study",
    "sociology",
    "sports_science",
    "traditional_chinese_medicine",
    "virology",
    "world_history",
    "world_religions",
]
@@ -11,6 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import datasets
@@ -136,31 +137,25 @@ class MMLU(datasets.GeneratorBasedBuilder):
            datasets.SplitGenerator(
                name=datasets.Split.TEST,
                gen_kwargs={
                    "filepath": os.path.join(data_dir, "data", "test", f"{task_name}_test.csv"),
                },
            ),
            datasets.SplitGenerator(
                name=datasets.Split.VALIDATION,
                gen_kwargs={
                    "filepath": os.path.join(data_dir, "data", "val", f"{task_name}_val.csv"),
                },
            ),
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={
                    "filepath": os.path.join(data_dir, "data", "dev", f"{task_name}_dev.csv"),
                },
            ),
        ]

    def _generate_examples(self, filepath):
        df = pd.read_csv(filepath, header=None)
        df.columns = ["question", "A", "B", "C", "D", "answer"]

        for i, instance in enumerate(df.to_dict(orient="records")):
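The `header=None` change in `_generate_examples` is a genuine correctness fix: the benchmark CSV files carry no header row, so without it pandas would consume the first question as column names. A minimal reproduction on toy data (not the benchmark files):

```python
import io

import pandas as pd

csv_text = "What is 2+2?,1,2,3,4,D\n"
with_header = pd.read_csv(io.StringIO(csv_text))                 # first row becomes the header
without_header = pd.read_csv(io.StringIO(csv_text), header=None)  # first row kept as data
print(len(with_header), len(without_header))  # 0 1
```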
examples/README.md (new file, 221 lines)

We provide diverse examples about fine-tuning LLMs.

Make sure to execute these commands in the `LLaMA-Factory` directory.

## Table of Contents

- [LoRA Fine-Tuning](#lora-fine-tuning)
- [QLoRA Fine-Tuning](#qlora-fine-tuning)
- [Full-Parameter Fine-Tuning](#full-parameter-fine-tuning)
- [Merging LoRA Adapters and Quantization](#merging-lora-adapters-and-quantization)
- [Inferring LoRA Fine-Tuned Models](#inferring-lora-fine-tuned-models)
- [Extras](#extras)

Use `CUDA_VISIBLE_DEVICES` (GPU) or `ASCEND_RT_VISIBLE_DEVICES` (NPU) to choose computing devices.

## Examples

### LoRA Fine-Tuning

#### (Continuous) Pre-Training

```bash
llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
```

#### Supervised Fine-Tuning

```bash
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```

#### Multimodal Supervised Fine-Tuning

```bash
llamafactory-cli train examples/train_lora/llava1_5_lora_sft.yaml
```

#### Reward Modeling

```bash
llamafactory-cli train examples/train_lora/llama3_lora_reward.yaml
```

#### PPO Training

```bash
llamafactory-cli train examples/train_lora/llama3_lora_ppo.yaml
```

#### DPO/ORPO/SimPO Training

```bash
llamafactory-cli train examples/train_lora/llama3_lora_dpo.yaml
```

#### KTO Training

```bash
llamafactory-cli train examples/train_lora/llama3_lora_kto.yaml
```

#### Preprocess Dataset

This is useful for large datasets; set `tokenized_path` in the config to load the preprocessed dataset.

```bash
llamafactory-cli train examples/train_lora/llama3_preprocess.yaml
```

#### Evaluating on MMLU/CMMLU/C-Eval Benchmarks

```bash
llamafactory-cli eval examples/train_lora/llama3_lora_eval.yaml
```

#### Batch Predicting and Computing BLEU and ROUGE Scores

```bash
llamafactory-cli train examples/train_lora/llama3_lora_predict.yaml
```

#### Supervised Fine-Tuning on Multiple Nodes

```bash
FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```

#### Supervised Fine-Tuning with DeepSpeed ZeRO-3 (Weight Sharding)

```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_lora/llama3_lora_sft_ds3.yaml
```

### QLoRA Fine-Tuning

#### Supervised Fine-Tuning with 4/8-bit Bitsandbytes/HQQ/EETQ Quantization (Recommended)

```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_otfq.yaml
```

#### Supervised Fine-Tuning with 4/8-bit GPTQ Quantization

```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_gptq.yaml
```

#### Supervised Fine-Tuning with 4-bit AWQ Quantization

```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_awq.yaml
```

#### Supervised Fine-Tuning with 2-bit AQLM Quantization

```bash
llamafactory-cli train examples/train_qlora/llama3_lora_sft_aqlm.yaml
```

### Full-Parameter Fine-Tuning

#### Supervised Fine-Tuning on Single Node

```bash
FORCE_TORCHRUN=1 llamafactory-cli train examples/train_full/llama3_full_sft_ds3.yaml
```

#### Supervised Fine-Tuning on Multiple Nodes

```bash
FORCE_TORCHRUN=1 NNODES=2 RANK=0 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft_ds3.yaml
FORCE_TORCHRUN=1 NNODES=2 RANK=1 MASTER_ADDR=192.168.0.1 MASTER_PORT=29500 llamafactory-cli train examples/train_full/llama3_full_sft_ds3.yaml
```

#### Batch Predicting and Computing BLEU and ROUGE Scores

```bash
llamafactory-cli train examples/train_full/llama3_full_predict.yaml
```

### Merging LoRA Adapters and Quantization

#### Merge LoRA Adapters

Note: DO NOT use a quantized model or the `quantization_bit` argument when merging LoRA adapters.

```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
```

#### Quantizing Model using AutoGPTQ

```bash
llamafactory-cli export examples/merge_lora/llama3_gptq.yaml
```

### Inferring LoRA Fine-Tuned Models

#### Use CLI

```bash
llamafactory-cli chat examples/inference/llama3_lora_sft.yaml
```

#### Use Web UI

```bash
llamafactory-cli webchat examples/inference/llama3_lora_sft.yaml
```

#### Launch OpenAI-style API

```bash
llamafactory-cli api examples/inference/llama3_lora_sft.yaml
```

### Extras

#### Full-Parameter Fine-Tuning using GaLore

```bash
llamafactory-cli train examples/extras/galore/llama3_full_sft.yaml
```

#### Full-Parameter Fine-Tuning using BAdam

```bash
llamafactory-cli train examples/extras/badam/llama3_full_sft.yaml
```

#### LoRA+ Fine-Tuning

```bash
llamafactory-cli train examples/extras/loraplus/llama3_lora_sft.yaml
```

#### PiSSA Fine-Tuning

```bash
llamafactory-cli train examples/extras/pissa/llama3_lora_sft.yaml
```

#### Mixture-of-Depths Fine-Tuning

```bash
llamafactory-cli train examples/extras/mod/llama3_full_sft.yaml
```

#### LLaMA-Pro Fine-Tuning

```bash
bash examples/extras/llama_pro/expand.sh
llamafactory-cli train examples/extras/llama_pro/llama3_freeze_sft.yaml
```

#### FSDP+QLoRA Fine-Tuning

```bash
bash examples/extras/fsdp_qlora/train.sh
```
examples/README_zh.md (new file, 221 lines)
Chinese translation of examples/README.md above; it mirrors that file section for section with identical commands.
examples/accelerate/fsdp_config.yaml (new file, 25 lines)

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_forward_prefetch: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: true # offload may affect training speed
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: fp16 # or bf16
num_machines: 1 # the number of nodes
num_processes: 2 # the number of GPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
examples/extras/badam/llama3_full_sft.yaml (new file, 41 lines)

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_mode: layer
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 2

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
examples/extras/badam/llama3_full_sft_ds3.yaml (new file, 42 lines)

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
use_badam: true
badam_mode: layer
badam_switch_mode: ascending
badam_switch_interval: 50
badam_verbose: 2
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
examples/extras/fsdp_qlora/llama3_lora_sft.yaml (new file, 40 lines)

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
quantization_bit: 4

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
6 examples/extras/fsdp_qlora/train.sh Normal file
@@ -0,0 +1,6 @@
#!/bin/bash
# DO NOT use GPTQ/AWQ model in FSDP+QLoRA

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train.py examples/extras/fsdp_qlora/llama3_lora_sft.yaml
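A hedged variant of the launch above (assuming the stock accelerate CLI): the process count can also be pinned on the command line with --num_processes instead of relying solely on fsdp_config.yaml:

#!/bin/bash
# hypothetical variant of train.sh: explicit process count on the command line
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    --num_processes 2 \
    src/train.py examples/extras/fsdp_qlora/llama3_lora_sft.yaml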
42 examples/extras/galore/llama3_full_sft.yaml Normal file
@@ -0,0 +1,42 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
use_galore: true
galore_layerwise: true
galore_target: mlp,self_attn
galore_rank: 128
galore_scale: 2.0

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
pure_bf16: true

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
6 examples/extras/llama_pro/expand.sh Normal file
@@ -0,0 +1,6 @@
#!/bin/bash

python scripts/llama_pro.py \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --output_dir models/llama3-8b-instruct-pro \
    --num_expand 8
41 examples/extras/llama_pro/llama3_freeze_sft.yaml Normal file
@@ -0,0 +1,41 @@
### model
model_name_or_path: models/llama3-8b-instruct-pro

### method
stage: sft
do_train: true
finetuning_type: freeze
freeze_trainable_layers: 8
freeze_trainable_modules: all
use_llama_pro: true

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b-instruct-pro/freeze/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
40 examples/extras/loraplus/llama3_lora_sft.yaml Normal file
@@ -0,0 +1,40 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
loraplus_lr_ratio: 16.0

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
40 examples/extras/mod/llama3_full_sft.yaml Normal file
@@ -0,0 +1,40 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
mixture_of_depths: convert

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b-mod/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
optim: paged_adamw_8bit
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
pure_bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
42 examples/extras/pissa/llama3_lora_sft.yaml Normal file
@@ -0,0 +1,42 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
pissa_init: true
pissa_iter: 16
pissa_convert: true

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
2 examples/inference/llama3.yaml Normal file
@@ -0,0 +1,2 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
4 examples/inference/llama3_lora_sft.yaml Normal file
@@ -0,0 +1,4 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora
4 examples/inference/llama3_vllm.yaml Normal file
@@ -0,0 +1,4 @@
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3
infer_backend: vllm
vllm_enforce_eager: true
3 examples/inference/llava1_5.yaml Normal file
@@ -0,0 +1,3 @@
model_name_or_path: llava-hf/llava-1.5-7b-hf
template: vicuna
visual_inputs: true
11 examples/merge_lora/llama3_gptq.yaml Normal file
@@ -0,0 +1,11 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
template: llama3

### export
export_dir: models/llama3_gptq
export_quantization_bit: 4
export_quantization_dataset: data/c4_demo.json
export_size: 2
export_device: cpu
export_legacy_format: false
13 examples/merge_lora/llama3_lora_sft.yaml Normal file
@@ -0,0 +1,13 @@
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft
template: llama3
finetuning_type: lora

### export
export_dir: models/llama3_lora_sft
export_size: 2
export_device: cpu
export_legacy_format: false
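A sketch of how such an export config is typically consumed (assumption: the llamafactory-cli export entry point; the exact command may differ by version):

llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml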
23 examples/train_full/llama3_full_predict.yaml Normal file
@@ -0,0 +1,23 @@
### model
model_name_or_path: saves/llama3-8b/full/sft

### method
stage: sft
do_predict: true
finetuning_type: full

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/full/predict
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
39 examples/train_full/llama3_full_sft_ds3.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/full/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
41 examples/train_lora/llama3_lora_dpo.yaml Normal file
@@ -0,0 +1,41 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid  # choices: [sigmoid (dpo), orpo, simpo]

### dataset
dataset: dpo_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/dpo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
18 examples/train_lora/llama3_lora_eval.yaml Normal file
@@ -0,0 +1,18 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft

### method
finetuning_type: lora

### dataset
task: mmlu_test  # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: fewshot
lang: en
n_shot: 5

### output
save_dir: saves/llama3-8b/lora/eval

### eval
batch_size: 4
40 examples/train_lora/llama3_lora_kto.yaml Normal file
@@ -0,0 +1,40 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: kto
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1

### dataset
dataset: kto_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/kto
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
39 examples/train_lora/llama3_lora_ppo.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
reward_model: saves/llama3-8b/lora/reward

### method
stage: ppo
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/ppo
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### generate
max_new_tokens: 512
top_k: 0
top_p: 0.9
25 examples/train_lora/llama3_lora_predict.yaml Normal file
@@ -0,0 +1,25 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: saves/llama3-8b/lora/sft

### method
stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 50
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/predict
overwrite_output_dir: true

### eval
per_device_eval_batch_size: 1
predict_with_generate: true
ddp_timeout: 180000000
38 examples/train_lora/llama3_lora_pretrain.yaml Normal file
@@ -0,0 +1,38 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: pt
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: c4_demo
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/pretrain
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
39 examples/train_lora/llama3_lora_reward.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: rm
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: dpo_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/reward
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
39 examples/train_lora/llama3_lora_sft.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
40 examples/train_lora/llama3_lora_sft_ds0.yaml Normal file
@@ -0,0 +1,40 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
deepspeed: examples/deepspeed/ds_z0_config.json

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
40 examples/train_lora/llama3_lora_sft_ds3.yaml Normal file
@@ -0,0 +1,40 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
21 examples/train_lora/llama3_preprocess.yaml Normal file
@@ -0,0 +1,21 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
tokenized_path: saves/llama3-8b/dataset/sft

### output
output_dir: saves/llama3-8b/lora/sft
overwrite_output_dir: true
40 examples/train_lora/llava1_5_lora_sft.yaml Normal file
@@ -0,0 +1,40 @@
### model
model_name_or_path: llava-hf/llava-1.5-7b-hf
visual_inputs: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: mllm_demo
template: vicuna
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llava1_5-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
39 examples/train_qlora/llama3_lora_sft_aqlm.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: ISTA-DASLab/Meta-Llama-3-8B-Instruct-AQLM-2Bit-1x16

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
39 examples/train_qlora/llama3_lora_sft_awq.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: TechxGenus/Meta-Llama-3-8B-Instruct-AWQ

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
39 examples/train_qlora/llama3_lora_sft_gptq.yaml Normal file
@@ -0,0 +1,39 @@
### model
model_name_or_path: TechxGenus/Meta-Llama-3-8B-Instruct-GPTQ

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
41 examples/train_qlora/llama3_lora_sft_otfq.yaml Normal file
@@ -0,0 +1,41 @@
### model
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
quantization_bit: 4
quantization_method: bitsandbytes  # choices: [bitsandbytes (4/8), hqq (2/3/4/5/6/8), eetq (8)]

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500
pyproject.toml
@@ -2,23 +2,19 @@
 requires = ["setuptools>=61.0"]
 build-backend = "setuptools.build_meta"
 
-[tool.black]
-line-length = 119
-target-version = ["py38"]
-
 [tool.ruff]
+target-version = "py38"
+line-length = 119
+indent-width = 4
+
+[tool.ruff.lint]
 ignore = ["C408", "C901", "E501", "E731", "E741", "W605"]
 select = ["C", "E", "F", "I", "W"]
-line-length = 119
 
-[tool.ruff.isort]
+[tool.ruff.lint.isort]
 lines-after-imports = 2
-known-first-party = ["llmtuner"]
-
-[isort]
-default_section = "FIRSTPARTY"
-known_first_party = "llmtuner"
-known_third_party = [
+known-first-party = ["llamafactory"]
+known-third-party = [
 "accelerate",
 "datasets",
 "gradio",
@@ -28,10 +24,10 @@ known_third_party = [
 "transformers",
 "trl"
 ]
-line_length = 119
-lines_after_imports = 2
-multi_line_output = 3
-include_trailing_comma = true
-force_grid_wrap = 0
-use_parentheses = true
-ensure_newline_before_comments = true
+
+[tool.ruff.format]
+quote-style = "double"
+indent-style = "space"
+docstring-code-format = true
+skip-magic-trailing-comma = false
+line-ending = "auto"
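The hunk above folds the old black and isort settings into ruff's lint and format tables. A minimal sketch of exercising both (assuming a ruff version recent enough to ship the format command):

ruff check .    # applies [tool.ruff.lint], including import sorting via the "I" rules
ruff format .   # replaces black, driven by [tool.ruff.format]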
requirements.txt
@@ -1,19 +1,21 @@
-torch>=1.13.1
-transformers>=4.36.2
-datasets>=2.14.3
-accelerate>=0.21.0
-peft>=0.7.0
-trl>=0.7.6
-gradio>=3.38.0,<4.0.0
+transformers>=4.41.2
+datasets>=2.16.0
+accelerate>=0.30.1
+peft>=0.11.1
+trl>=0.8.6
+gradio>=4.0.0
+pandas>=2.0.0
 scipy
 einops
 sentencepiece
+tiktoken
 protobuf
-jieba
-rouge-chinese
-nltk
 uvicorn
 pydantic
 fastapi
 sse-starlette
-matplotlib
+matplotlib>=3.7.0
+fire
+packaging
+pyyaml
+numpy<2.0.0
48 scripts/cal_flops.py Normal file
@@ -0,0 +1,48 @@
# coding=utf-8
# Copyright 2024 Microsoft Corporation and the LlamaFactory team.
#
# This code is inspired by Microsoft's DeepSpeed library.
# https://www.deepspeed.ai/tutorials/flops-profiler/
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import fire
import torch
from deepspeed.accelerator import get_accelerator  # type: ignore
from deepspeed.profiling.flops_profiler import get_model_profile  # type: ignore

from llamafactory.chat import ChatModel


def calculate_flops(
    model_name_or_path: str,
    batch_size: int = 1,
    seq_length: int = 256,
    flash_attn: str = "auto",
):
    r"""
    Calculates the flops of pre-trained models.
    Usage: python cal_flops.py --model_name_or_path path_to_model --batch_size 1 --seq_length 512
    """
    with get_accelerator().device(0):
        chat_model = ChatModel(dict(model_name_or_path=model_name_or_path, template="empty", flash_attn=flash_attn))
        fake_input = torch.ones((batch_size, seq_length), dtype=torch.long, device=chat_model.model.device)
        input_dict = {"input_ids": fake_input, "labels": fake_input.clone()}
        flops, macs, params = get_model_profile(chat_model.model, kwargs=input_dict, print_profile=True, detailed=True)
        print("FLOPs:", flops)
        print("MACs:", macs)
        print("Params:", params)


if __name__ == "__main__":
    fire.Fire(calculate_flops)
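As a rough cross-check on the profiler output (a back-of-the-envelope estimate, not part of the script): a dense decoder forward pass costs roughly 2 FLOPs per parameter per token, so with the defaults above:

# hypothetical numbers: a 7B-parameter model, batch_size=1, seq_length=256
params = 7e9
tokens = 1 * 256
print("~{:.2e} FLOPs per forward pass".format(2 * params * tokens))  # ~3.58e+12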
96 scripts/cal_lr.py Normal file
@@ -0,0 +1,96 @@
# coding=utf-8
# Copyright 2024 imoneoi and the LlamaFactory team.
#
# This code is inspired by imoneoi's OpenChat library.
# https://github.com/imoneoi/openchat/blob/3.6.0/ochat/training_deepspeed/train.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import math
from typing import Literal

import fire
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import DataCollatorForLanguageModeling, DataCollatorForSeq2Seq

from llamafactory.data import get_dataset
from llamafactory.extras.constants import IGNORE_INDEX
from llamafactory.hparams import get_train_args
from llamafactory.model import load_tokenizer


BASE_LR = 3e-4  # 1.5e-4 for 30B-70B models
BASE_BS = 4_000_000  # from llama paper


def calculate_lr(
    model_name_or_path: str,
    batch_size: int,  # total batch size, namely (batch size * gradient accumulation * world size)
    stage: Literal["pt", "sft"] = "sft",
    dataset: str = "alpaca_en",
    dataset_dir: str = "data",
    template: str = "default",
    cutoff_len: int = 1024,  # i.e. maximum input length during training
    is_mistral: bool = False,  # mistral models use a smaller learning rate
    packing: bool = False,
):
    r"""
    Calculates the optimal learning rate for 7B/13B models using LLaMA's hyper-parameters.
    Usage: python cal_lr.py --model_name_or_path path_to_model --dataset alpaca_en --cutoff_len 1024 --batch_size 16
    """
    model_args, data_args, training_args, _, _ = get_train_args(
        dict(
            stage=stage,
            model_name_or_path=model_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=cutoff_len,
            packing=packing,
            output_dir="dummy_dir",
            overwrite_cache=True,
            do_train=True,
        )
    )
    tokenizer_module = load_tokenizer(model_args)
    tokenizer = tokenizer_module["tokenizer"]
    trainset = get_dataset(model_args, data_args, training_args, stage, **tokenizer_module)["train_dataset"]
    if stage == "pt":
        data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    elif stage == "sft":
        data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX)
    else:
        raise NotImplementedError("Stage {} is not supported.".format(stage))

    dataloader = DataLoader(trainset, batch_size, shuffle=False, collate_fn=data_collator, pin_memory=True)
    valid_tokens, total_tokens = 0, 0
    for batch in tqdm(dataloader):
        valid_tokens += torch.sum(batch["labels"] != IGNORE_INDEX).item()
        total_tokens += torch.numel(batch["labels"])

    batch_max_len = cutoff_len * batch_size  # max tokens in a batch
    valid_ratio = valid_tokens / total_tokens
    batch_valid_len = batch_max_len * valid_ratio
    lr = BASE_LR * math.sqrt(batch_valid_len / BASE_BS)  # lr ~ sqrt(batch_size)
    lr = lr / 6.0 if is_mistral else lr
    print(
        "Optimal learning rate is {:.2e} for valid ratio {:.2f}% and effective batch size {:.2f}".format(
            lr, valid_ratio * 100, batch_valid_len
        )
    )


if __name__ == "__main__":
    fire.Fire(calculate_lr)
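A worked instance of the sqrt scaling rule implemented above (hypothetical numbers, not a real run): with cutoff_len=1024, batch_size=16 and a 60% valid-token ratio,

import math

BASE_LR, BASE_BS = 3e-4, 4_000_000
batch_valid_len = 1024 * 16 * 0.6            # = 9830.4 effective tokens per step
lr = BASE_LR * math.sqrt(batch_valid_len / BASE_BS)
print("{:.2e}".format(lr))                   # ~1.49e-05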
132 scripts/cal_ppl.py Normal file
@@ -0,0 +1,132 @@
# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from dataclasses import dataclass
from typing import Any, Dict, Literal, Optional, Sequence

import fire
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import DataCollatorForLanguageModeling, DataCollatorForSeq2Seq

from llamafactory.data import get_dataset
from llamafactory.extras.constants import IGNORE_INDEX
from llamafactory.hparams import get_train_args
from llamafactory.model import load_model, load_tokenizer


@dataclass
class PairwiseDataCollatorWithPadding(DataCollatorForSeq2Seq):
    r"""
    Data collator for pairwise data.
    """

    train_on_prompt: bool = False

    def __call__(self, features: Sequence[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        r"""
        Pads batched data to the longest sequence in the batch.

        We generate 2 * n examples where the first n examples represent chosen examples and
        the last n examples represent rejected examples.
        """
        chosen_features = []
        for feature in features:
            prompt_len, answer_len = len(feature["prompt_ids"]), len(feature["chosen_ids"])
            input_ids = feature["prompt_ids"] + feature["chosen_ids"]
            attention_mask = [1] * (prompt_len + answer_len)
            labels = input_ids if self.train_on_prompt else [IGNORE_INDEX] * prompt_len + feature["chosen_ids"]
            chosen_features.append({"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels})

        return super().__call__(chosen_features)


def cal_ppl(
    model_name_or_path: str,
    save_name: str,
    batch_size: int = 4,
    stage: Literal["pt", "sft", "rm"] = "sft",
    dataset: str = "alpaca_en",
    dataset_dir: str = "data",
    template: str = "default",
    cutoff_len: int = 1024,
    max_samples: Optional[int] = None,
    train_on_prompt: bool = False,
):
    r"""
    Calculates the perplexity of pre-trained models on the dataset.
    Usage: python cal_ppl.py --model_name_or_path path_to_model --save_name ppl.json
    """
    model_args, data_args, training_args, finetuning_args, _ = get_train_args(
        dict(
            stage=stage,
            model_name_or_path=model_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=cutoff_len,
            max_samples=max_samples,
            train_on_prompt=train_on_prompt,
            output_dir="dummy_dir",
            overwrite_cache=True,
            do_train=True,
        )
    )
    tokenizer_module = load_tokenizer(model_args)
    tokenizer = tokenizer_module["tokenizer"]
    trainset = get_dataset(model_args, data_args, training_args, stage, **tokenizer_module)["train_dataset"]
    model = load_model(tokenizer, model_args, finetuning_args, is_trainable=False)
    if stage == "pt":
        data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
    elif stage == "sft":
        data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX)
    elif stage == "rm":
        data_collator = PairwiseDataCollatorWithPadding(
            tokenizer=tokenizer, label_pad_token_id=IGNORE_INDEX, train_on_prompt=train_on_prompt
        )
    else:
        raise NotImplementedError("Stage {} is not supported.".format(stage))

    dataloader = DataLoader(trainset, batch_size, shuffle=False, collate_fn=data_collator, pin_memory=True)
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    total_ppl = 0
    perplexities = []
    batch: Dict[str, "torch.Tensor"]
    with torch.no_grad():
        for batch in tqdm(dataloader):
            batch = batch.to(model.device)
            outputs = model(**batch)
            shift_logits: "torch.Tensor" = outputs["logits"][..., :-1, :]
            shift_labels: "torch.Tensor" = batch["labels"][..., 1:]
            loss_mask = shift_labels != IGNORE_INDEX
            flatten_logits = shift_logits.contiguous().view(shift_labels.size(0) * shift_labels.size(1), -1)
            flatten_labels = shift_labels.contiguous().view(-1)
            token_logps: "torch.Tensor" = criterion(flatten_logits, flatten_labels)
            token_logps = token_logps.contiguous().view(shift_logits.size(0), -1)
            sentence_logps = (token_logps * loss_mask).sum(-1) / loss_mask.sum(-1)
            total_ppl += sentence_logps.exp().sum().item()
            perplexities.extend(sentence_logps.exp().tolist())

    with open(save_name, "w", encoding="utf-8") as f:
        json.dump(perplexities, f, indent=2)

    print("Average perplexity is {:.2f}".format(total_ppl / len(perplexities)))
    print("Perplexities have been saved at {}.".format(save_name))


if __name__ == "__main__":
    fire.Fire(cal_ppl)
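The per-example reduction above takes exp of the mean token negative log-likelihood over non-masked positions; a standalone sketch of that step with toy tensors (not real model output):

import torch

token_nll = torch.tensor([[2.0, 1.0, 3.0]])   # per-token cross-entropy losses
loss_mask = torch.tensor([[1.0, 1.0, 0.0]])   # 0 marks IGNORE_INDEX positions
sentence_ppl = ((token_nll * loss_mask).sum(-1) / loss_mask.sum(-1)).exp()
print(sentence_ppl)                            # exp(1.5) ≈ 4.48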
67 scripts/length_cdf.py Normal file
@@ -0,0 +1,67 @@
# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from collections import defaultdict

import fire
from tqdm import tqdm

from llamafactory.data import get_dataset
from llamafactory.hparams import get_train_args
from llamafactory.model import load_tokenizer


def length_cdf(
    model_name_or_path: str,
    dataset: str = "alpaca_en",
    dataset_dir: str = "data",
    template: str = "default",
    interval: int = 1000,
):
    r"""
    Calculates the distribution of the input lengths in the dataset.
    Usage: python length_cdf.py --model_name_or_path path_to_model --dataset alpaca_en --template default
    """
    model_args, data_args, training_args, _, _ = get_train_args(
        dict(
            stage="sft",
            model_name_or_path=model_name_or_path,
            dataset=dataset,
            dataset_dir=dataset_dir,
            template=template,
            cutoff_len=1_000_000,
            output_dir="dummy_dir",
            overwrite_cache=True,
            do_train=True,
        )
    )
    tokenizer_module = load_tokenizer(model_args)
    trainset = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)["train_dataset"]
    total_num = len(trainset)
    length_dict = defaultdict(int)
    for sample in tqdm(trainset["input_ids"]):
        length_dict[len(sample) // interval * interval] += 1

    length_tuples = list(length_dict.items())
    length_tuples.sort()
    count_accu, prob_accu = 0, 0
    for length, count in length_tuples:
        count_accu += count
        prob_accu += count / total_num * 100
        print("{:d} ({:.2f}%) samples have length < {}.".format(count_accu, prob_accu, length + interval))


if __name__ == "__main__":
    fire.Fire(length_cdf)
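The histogram key above floors each sample length to its interval; a tiny illustration with hypothetical lengths:

interval = 1000
for length in (375, 999, 1000, 2375):
    print(length, "->", length // interval * interval)
# 375 -> 0, 999 -> 0, 1000 -> 1000, 2375 -> 2000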
131 scripts/llama_pro.py Normal file
@@ -0,0 +1,131 @@
# coding=utf-8
# Copyright 2024 Tencent Inc. and the LlamaFactory team.
#
# This code is inspired by Tencent's LLaMA-Pro library.
# https://github.com/TencentARC/LLaMA-Pro/blob/main/scripts/block_expansion.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from collections import OrderedDict
from typing import TYPE_CHECKING, Optional

import fire
import torch
from safetensors.torch import save_file
from tqdm import tqdm
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from transformers.modeling_utils import (
    SAFE_WEIGHTS_INDEX_NAME,
    SAFE_WEIGHTS_NAME,
    WEIGHTS_INDEX_NAME,
    WEIGHTS_NAME,
    shard_checkpoint,
)


if TYPE_CHECKING:
    from transformers import PretrainedConfig, PreTrainedModel


def change_name(name: str, old_index: int, new_index: int) -> str:
    return name.replace(".{:d}.".format(old_index), ".{:d}.".format(new_index))


def block_expansion(
    model_name_or_path: str,
    output_dir: str,
    num_expand: int,
    shard_size: Optional[str] = "2GB",
    save_safetensors: Optional[bool] = False,
):
    r"""
    Performs block expansion for LLaMA, Mistral, Qwen1.5 or Yi models.
    Usage: python llama_pro.py --model_name_or_path meta-llama/Llama-2-7b-hf --output_dir llama2_pro --num_expand 8
    """
    config: "PretrainedConfig" = AutoConfig.from_pretrained(model_name_or_path)
    num_layers = getattr(config, "num_hidden_layers")
    setattr(config, "num_hidden_layers", num_layers + num_expand)
    config.save_pretrained(output_dir)

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    tokenizer.save_pretrained(output_dir)

    config: "PretrainedConfig" = AutoConfig.from_pretrained(model_name_or_path)  # load the original one
    if save_safetensors:
        setattr(config, "tie_word_embeddings", False)  # safetensors does not allow shared weights

    model: "PreTrainedModel" = AutoModelForCausalLM.from_pretrained(
        model_name_or_path,
        config=config,
        torch_dtype="auto",
        trust_remote_code=True,
        low_cpu_mem_usage=True,
    )
    state_dict = model.state_dict()

    if num_layers % num_expand != 0:
        raise ValueError("`num_layers` {} should be divisible by `num_expand` {}.".format(num_layers, num_expand))

    split = num_layers // num_expand
    layer_cnt = 0
    output_state_dict = OrderedDict()
    for i in range(num_layers):
        for key, value in state_dict.items():
            if ".{:d}.".format(i) in key:
                output_state_dict[change_name(key, i, layer_cnt)] = value

        print("Add layer {} copied from layer {}".format(layer_cnt, i))
        layer_cnt += 1
        if (i + 1) % split == 0:
            for key, value in state_dict.items():
                if ".{:d}.".format(i) in key:
                    if "down_proj" in key or "o_proj" in key:
                        output_state_dict[change_name(key, i, layer_cnt)] = torch.zeros_like(value)
                    else:
                        output_state_dict[change_name(key, i, layer_cnt)] = torch.clone(value)

            print("Add layer {} expanded from layer {}".format(layer_cnt, i))
            layer_cnt += 1

    for key, value in state_dict.items():
        if key not in output_state_dict:
            output_state_dict[key] = value

    weights_name = SAFE_WEIGHTS_NAME if save_safetensors else WEIGHTS_NAME
    shards, index = shard_checkpoint(output_state_dict, max_shard_size=shard_size, weights_name=weights_name)

    for shard_file, shard in tqdm(shards.items(), desc="Save weights"):
        if save_safetensors:
            save_file(shard, os.path.join(output_dir, shard_file), metadata={"format": "pt"})
        else:
            torch.save(shard, os.path.join(output_dir, shard_file))

    if index is None:
        print("Model weights saved in {}".format(os.path.join(output_dir, weights_name)))
    else:
        index_name = SAFE_WEIGHTS_INDEX_NAME if save_safetensors else WEIGHTS_INDEX_NAME
        with open(os.path.join(output_dir, index_name), "w", encoding="utf-8") as f:
            json.dump(index, f, indent=2, sort_keys=True)
        print("Model weights saved in {}".format(output_dir))

    print("- Fine-tune this model with:")
    print("model_name_or_path: {}".format(output_dir))
    print("finetuning_type: freeze")
    print("freeze_trainable_layers: {}".format(num_expand))
    print("use_llama_pro: true")


if __name__ == "__main__":
    fire.Fire(block_expansion)
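The expansion loop above inserts one zero-initialized copy after every `split` original layers; a sketch of the resulting layer order with assumed toy sizes (num_layers=8, num_expand=2, so split=4):

num_layers, num_expand = 8, 2
split = num_layers // num_expand                 # = 4
layout = []
for i in range(num_layers):
    layout.append("layer{}".format(i))
    if (i + 1) % split == 0:
        layout.append("copy_of_{}".format(i))    # o_proj/down_proj zeroed in the copy
print(layout)
# ['layer0', 'layer1', 'layer2', 'layer3', 'copy_of_3', 'layer4', ..., 'layer7', 'copy_of_7']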
scripts/llamafy_baichuan2.py
@@ -1,8 +1,17 @@
 # coding=utf-8
-# Converts the Baichuan2-7B model in the same format as LLaMA2-7B.
-# Usage: python llamafy_baichuan2.py --input_dir input --output_dir output --shard_size 10GB
-# Inspired by: https://huggingface.co/fireballoon/baichuan-llama-7b/blob/main/convert_baichuan_to_llama.py
-# Converted model: https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied
+# Copyright 2024 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 import json
 import os
@@ -76,7 +85,14 @@ def save_config(input_dir: str, output_dir: str):
     print("Model config saved in {}".format(os.path.join(output_dir, CONFIG_NAME)))
 
 
-def llamafy_baichuan2(input_dir: str, output_dir: str, shard_size: str, save_safetensors: Optional[bool] = False):
+def llamafy_baichuan2(
+    input_dir: str, output_dir: str, shard_size: Optional[str] = "2GB", save_safetensors: Optional[bool] = False
+):
+    r"""
+    Converts the Baichuan2-7B model in the same format as LLaMA2-7B.
+    Usage: python llamafy_baichuan2.py --input_dir input --output_dir output
+    Converted model: https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied
+    """
     try:
         os.makedirs(output_dir, exist_ok=False)
     except Exception as e:
||||||
scripts/llamafy_qwen.py
@@ -1,7 +1,17 @@
 # coding=utf-8
-# Converts the Qwen models in the same format as LLaMA2.
-# Usage: python llamafy_qwen.py --input_dir input --output_dir output --shard_size 10GB
-# Converted model: https://huggingface.co/hiyouga/Qwen-14B-Chat-LLaMAfied
+# Copyright 2024 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
 import json
 import os
@@ -128,7 +138,14 @@ def save_config(input_dir: str, output_dir: str, torch_dtype: str):
     print("Model config saved in {}".format(os.path.join(output_dir, CONFIG_NAME)))
 
 
-def llamafy_qwen(input_dir: str, output_dir: str, shard_size: str, save_safetensors: Optional[bool] = False):
+def llamafy_qwen(
+    input_dir: str, output_dir: str, shard_size: Optional[str] = "2GB", save_safetensors: Optional[bool] = False
+):
+    r"""
+    Converts the Qwen models in the same format as LLaMA2.
+    Usage: python llamafy_qwen.py --input_dir input --output_dir output
+    Converted model: https://huggingface.co/hiyouga/Qwen-14B-Chat-LLaMAfied
+    """
     try:
         os.makedirs(output_dir, exist_ok=False)
     except Exception as e:
scripts/loftq_init.py (new file, 89 lines)
@@ -0,0 +1,89 @@
# coding=utf-8
# Copyright 2024 HuggingFace Inc. and the LlamaFactory team.
#
# This code is based on HuggingFace's PEFT library.
# https://github.com/huggingface/peft/blob/v0.10.0/examples/loftq_finetuning/quantize_save_load.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import TYPE_CHECKING

import fire
from peft import LoftQConfig, LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer


if TYPE_CHECKING:
    from transformers import PreTrainedModel


def quantize_loftq(
    model_name_or_path: str,
    output_dir: str,
    loftq_bits: int = 4,
    loftq_iter: int = 4,
    lora_alpha: int = None,
    lora_rank: int = 16,
    lora_dropout: float = 0,
    lora_target: tuple = ("q_proj", "v_proj"),
    save_safetensors: bool = True,
):
    r"""
    Initializes LoRA weights with LoRA-fine-tuning-aware Quantization (LoftQ).
    Usage: python loftq_init.py --model_name_or_path path_to_model --output_dir output_dir
    """
    if isinstance(lora_target, str):
        lora_target = [name.strip() for name in lora_target.split(",")]

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype="auto")

    loftq_config = LoftQConfig(loftq_bits=loftq_bits, loftq_iter=loftq_iter)
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=True,
        r=lora_rank,
        lora_alpha=lora_alpha if lora_alpha is not None else lora_rank * 2,
        lora_dropout=lora_dropout,
        target_modules=lora_target,
        init_lora_weights="loftq",
        loftq_config=loftq_config,
    )

    # Init LoftQ model
    print("Initializing LoftQ weights, it may take several minutes, please wait patiently.")
    peft_model = get_peft_model(model, lora_config)
    loftq_dir = os.path.join(output_dir, "loftq_init")

    # Save LoftQ model
    setattr(peft_model.peft_config["default"], "base_model_name_or_path", output_dir)
    setattr(peft_model.peft_config["default"], "init_lora_weights", True)  # don't apply loftq again
    peft_model.save_pretrained(loftq_dir, safe_serialization=save_safetensors)
    print("Adapter weights saved in {}".format(loftq_dir))

    # Save base model
    base_model: "PreTrainedModel" = peft_model.unload()
    base_model.save_pretrained(output_dir, safe_serialization=save_safetensors)
    tokenizer.save_pretrained(output_dir)
    print("Model weights saved in {}".format(output_dir))

    print("- Fine-tune this model with:")
    print("model_name_or_path: {}".format(output_dir))
    print("adapter_name_or_path: {}".format(loftq_dir))
    print("finetuning_type: lora")
    print("quantization_bit: {}".format(loftq_bits))


if __name__ == "__main__":
    fire.Fire(quantize_loftq)
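Since the entry point is a plain function behind `fire`, it can also be driven from Python; a sketch with placeholder paths (the checkpoint name and output directory below are only examples):

from loftq_init import quantize_loftq  # assumes scripts/ is the working directory

quantize_loftq(
    model_name_or_path="meta-llama/Llama-2-7b-hf",  # hypothetical checkpoint
    output_dir="llama2-7b-loftq",                   # hypothetical output path
    loftq_bits=4,
    lora_rank=16,
)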
scripts/pissa_init.py (new file, 86 lines)
@@ -0,0 +1,86 @@
# coding=utf-8
# Copyright 2024 HuggingFace Inc. and the LlamaFactory team.
#
# This code is based on HuggingFace's PEFT library.
# https://github.com/huggingface/peft/blob/v0.11.0/examples/pissa_finetuning/preprocess.py
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from typing import TYPE_CHECKING

import fire
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer


if TYPE_CHECKING:
    from transformers import PreTrainedModel


def quantize_pissa(
    model_name_or_path: str,
    output_dir: str,
    pissa_iter: int = 4,
    lora_alpha: int = None,
    lora_rank: int = 16,
    lora_dropout: float = 0,
    lora_target: tuple = ("q_proj", "v_proj"),
    save_safetensors: bool = True,
):
    r"""
    Initializes LoRA weights with Principal Singular values and Singular vectors Adaptation (PiSSA).
    Usage: python pissa_init.py --model_name_or_path path_to_model --output_dir output_dir
    """
    if isinstance(lora_target, str):
        lora_target = [name.strip() for name in lora_target.split(",")]

    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype="auto")

    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=lora_rank,
        lora_alpha=lora_alpha if lora_alpha is not None else lora_rank * 2,
        lora_dropout=lora_dropout,
        target_modules=lora_target,
        init_lora_weights="pissa" if pissa_iter == -1 else "pissa_niter_{}".format(pissa_iter),
    )

    # Init PiSSA model
    peft_model = get_peft_model(model, lora_config)
    pissa_dir = os.path.join(output_dir, "pissa_init")

    # Save PiSSA model
    setattr(peft_model.peft_config["default"], "init_lora_weights", True)  # don't apply pissa again
    peft_model.save_pretrained(pissa_dir, safe_serialization=save_safetensors)
    print("Adapter weights saved in {}".format(pissa_dir))

    # Save base model
    base_model: "PreTrainedModel" = peft_model.unload()
    base_model.save_pretrained(output_dir, safe_serialization=save_safetensors)
    tokenizer.save_pretrained(output_dir)
    print("Model weights saved in {}".format(output_dir))

    print("- Fine-tune this model with:")
    print("model_name_or_path: {}".format(output_dir))
    print("adapter_name_or_path: {}".format(pissa_dir))
    print("finetuning_type: lora")
    print("pissa_init: false")
    print("pissa_convert: true")
    print("- and optionally with:")
    print("quantization_bit: 4")


if __name__ == "__main__":
    fire.Fire(quantize_pissa)
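Unlike LoftQ, PiSSA needs no quantization config: the `init_lora_weights="pissa_niter_{n}"` string tells PEFT to seed the adapter from an n-iteration randomized SVD of each target weight (`"pissa"` requests the exact SVD). A rough torch sketch of the underlying idea — this is an illustration, not PEFT's actual code path:

import torch

def pissa_split(weight: torch.Tensor, rank: int):
    # Factor the frozen weight into a principal (adapter) part and a residual:
    # W = B @ A + W_res, with B and A built from the top-`rank` singular triplets.
    w = weight.float()
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    sqrt_s = torch.sqrt(S[:rank])
    lora_B = U[:, :rank] * sqrt_s              # shape (out_features, rank)
    lora_A = sqrt_s.unsqueeze(1) * Vh[:rank]   # shape (rank, in_features)
    residual = w - lora_B @ lora_A             # replaces the base weight
    return lora_A, lora_B, residual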
scripts/test_toolcall.py (new file, 79 lines)
@@ -0,0 +1,79 @@
# coding=utf-8
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
import os
from typing import Sequence

from openai import OpenAI
from transformers.utils.versions import require_version


require_version("openai>=1.5.0", "To fix: pip install openai>=1.5.0")


def calculate_gpa(grades: Sequence[str], hours: Sequence[int]) -> float:
    grade_to_score = {"A": 4, "B": 3, "C": 2}
    total_score, total_hour = 0, 0
    for grade, hour in zip(grades, hours):
        total_score += grade_to_score[grade] * hour
        total_hour += hour
    return round(total_score / total_hour, 2)


def main():
    client = OpenAI(
        api_key="{}".format(os.environ.get("API_KEY", "0")),
        base_url="http://localhost:{}/v1".format(os.environ.get("API_PORT", 8000)),
    )
    tools = [
        {
            "type": "function",
            "function": {
                "name": "calculate_gpa",
                "description": "Calculate the Grade Point Average (GPA) based on grades and credit hours",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "grades": {"type": "array", "items": {"type": "string"}, "description": "The grades"},
                        "hours": {"type": "array", "items": {"type": "integer"}, "description": "The credit hours"},
                    },
                    "required": ["grades", "hours"],
                },
            },
        }
    ]
    tool_map = {"calculate_gpa": calculate_gpa}

    messages = []
    messages.append({"role": "user", "content": "My grades are A, A, B, and C. The credit hours are 3, 4, 3, and 2."})
    result = client.chat.completions.create(messages=messages, model="test", tools=tools)
    if result.choices[0].message.tool_calls is None:
        raise ValueError("Cannot retrieve function call from the response.")

    messages.append(result.choices[0].message)
    tool_call = result.choices[0].message.tool_calls[0].function
    print(tool_call)
    # Function(arguments='{"grades": ["A", "A", "B", "C"], "hours": [3, 4, 3, 2]}', name='calculate_gpa')
    name, arguments = tool_call.name, json.loads(tool_call.arguments)
    tool_result = tool_map[name](**arguments)
    messages.append({"role": "tool", "content": json.dumps({"gpa": tool_result}, ensure_ascii=False)})
    result = client.chat.completions.create(messages=messages, model="test", tools=tools)
    print(result.choices[0].message.content)
    # Based on the grades and credit hours you provided, your Grade Point Average (GPA) is 3.42.


if __name__ == "__main__":
    main()
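As a sanity check on the arithmetic this demo expects: (4·3 + 4·4 + 3·3 + 2·2) / (3 + 4 + 3 + 2) = 41 / 12 ≈ 3.42, so the local function agrees with the assistant's final answer:

# Credit-hour-weighted mean of the grade points, rounded to two decimals.
assert calculate_gpa(["A", "A", "B", "C"], [3, 4, 3, 2]) == 3.42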
setup.py
@@ -1,13 +1,28 @@
+# Copyright 2024 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import os
 import re
-from setuptools import setup, find_packages
+
+from setuptools import find_packages, setup
 
 
 def get_version():
-    with open(os.path.join("src", "llmtuner", "__init__.py"), "r", encoding="utf-8") as f:
+    with open(os.path.join("src", "llamafactory", "extras", "env.py"), "r", encoding="utf-8") as f:
         file_content = f.read()
-    pattern = r"{0}\W*=\W*\"([^\"]+)\"".format("__version__")
-    version, = re.findall(pattern, file_content)
+    pattern = r"{}\W*=\W*\"([^\"]+)\"".format("VERSION")
+    (version,) = re.findall(pattern, file_content)
     return version

@@ -18,10 +33,29 @@ def get_requires():
     return lines
 
 
-def main():
+extra_require = {
+    "torch": ["torch>=1.13.1"],
+    "torch-npu": ["torch==2.1.0", "torch-npu==2.1.0.post3", "decorator"],
+    "metrics": ["nltk", "jieba", "rouge-chinese"],
+    "deepspeed": ["deepspeed>=0.10.0"],
+    "bitsandbytes": ["bitsandbytes>=0.39.0"],
+    "hqq": ["hqq"],
+    "eetq": ["eetq"],
+    "gptq": ["optimum>=1.17.0", "auto-gptq>=0.5.0"],
+    "awq": ["autoawq"],
+    "aqlm": ["aqlm[gpu]>=1.1.0"],
+    "vllm": ["vllm>=0.4.3"],
+    "galore": ["galore-torch"],
+    "badam": ["badam>=1.2.1"],
+    "qwen": ["transformers_stream_generator"],
+    "modelscope": ["modelscope"],
+    "dev": ["ruff", "pytest"],
+}
+
+
+def main():
     setup(
-        name="llmtuner",
+        name="llamafactory",
         version=get_version(),
         author="hiyouga",
         author_email="hiyouga" "@" "buaa.edu.cn",

@@ -35,8 +69,10 @@ def main():
         packages=find_packages("src"),
         python_requires=">=3.8.0",
         install_requires=get_requires(),
+        extras_require=extra_require,
+        entry_points={"console_scripts": ["llamafactory-cli = llamafactory.cli:main"]},
         classifiers=[
-            "Development Status :: 3 - Alpha",
+            "Development Status :: 4 - Beta",
             "Intended Audience :: Developers",
             "Intended Audience :: Education",
             "Intended Audience :: Science/Research",

@@ -46,8 +82,9 @@ def main():
             "Programming Language :: Python :: 3.8",
             "Programming Language :: Python :: 3.9",
             "Programming Language :: Python :: 3.10",
+            "Programming Language :: Python :: 3.11",
             "Topic :: Scientific/Engineering :: Artificial Intelligence",
-        ]
+        ],
     )
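`get_version` now scrapes `VERSION` out of `src/llamafactory/extras/env.py` instead of reading `__version__` from the old package. A self-contained check of the regex, assuming the file contains a line shaped like the one below:

import re

file_content = 'VERSION = "0.8.0"\n'  # assumed shape of the line in extras/env.py
pattern = r"{}\W*=\W*\"([^\"]+)\"".format("VERSION")
(version,) = re.findall(pattern, file_content)
assert version == "0.8.0"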
src/api.py (new file, 33 lines)
@@ -0,0 +1,33 @@
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import uvicorn

from llamafactory.api.app import create_app
from llamafactory.chat import ChatModel


def main():
    chat_model = ChatModel()
    app = create_app(chat_model)
    api_host = os.environ.get("API_HOST", "0.0.0.0")
    api_port = int(os.environ.get("API_PORT", "8000"))
    print("Visit http://localhost:{}/docs for API document.".format(api_port))
    uvicorn.run(app, host=api_host, port=api_port)


if __name__ == "__main__":
    main()
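Because the bind address comes from the environment, a local-only run can be configured before the import; a sketch assuming the repository root is the working directory and that the model arguments `ChatModel` expects are supplied the usual way (e.g. via command-line flags):

import os
import sys

os.environ["API_HOST"] = "127.0.0.1"  # override the 0.0.0.0 default
os.environ["API_PORT"] = "8123"
sys.path.insert(0, "src")             # make src/api.py importable

from api import main

main()  # serves http://localhost:8123/docs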
src/api_demo.py (deleted)
@@ -1,16 +0,0 @@
-import os
-
-import uvicorn
-
-from llmtuner import ChatModel, create_app
-
-
-def main():
-    chat_model = ChatModel()
-    app = create_app(chat_model)
-    print("Visit http://localhost:{}/docs for API document.".format(os.environ.get("API_PORT", 8000)))
-    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("API_PORT", 8000)), workers=1)
-
-
-if __name__ == "__main__":
-    main()
src/cli_demo.py (deleted)
@@ -1,49 +0,0 @@
-from llmtuner import ChatModel
-from llmtuner.extras.misc import torch_gc
-
-
-try:
-    import platform
-
-    if platform.system() != "Windows":
-        import readline  # noqa: F401
-except ImportError:
-    print("Install `readline` for a better experience.")
-
-
-def main():
-    chat_model = ChatModel()
-    messages = []
-    print("Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.")
-
-    while True:
-        try:
-            query = input("\nUser: ")
-        except UnicodeDecodeError:
-            print("Detected decoding error at the inputs, please set the terminal encoding to utf-8.")
-            continue
-        except Exception:
-            raise
-
-        if query.strip() == "exit":
-            break
-
-        if query.strip() == "clear":
-            messages = []
-            torch_gc()
-            print("History has been removed.")
-            continue
-
-        messages.append({"role": "user", "content": query})
-        print("Assistant: ", end="", flush=True)
-
-        response = ""
-        for new_text in chat_model.stream_chat(messages):
-            print(new_text, end="", flush=True)
-            response += new_text
-        print()
-        messages.append({"role": "assistant", "content": response})
-
-
-if __name__ == "__main__":
-    main()
src/evaluate.py (deleted)
@@ -1,10 +0,0 @@
-from llmtuner import Evaluator
-
-
-def main():
-    evaluator = Evaluator()
-    evaluator.eval()
-
-
-if __name__ == "__main__":
-    main()
src/export_model.py (deleted)
@@ -1,9 +0,0 @@
-from llmtuner import export_model
-
-
-def main():
-    export_model()
-
-
-if __name__ == "__main__":
-    main()
src/llamafactory/__init__.py (new file, 41 lines)
@@ -0,0 +1,41 @@
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

r"""
Efficient fine-tuning of large language models.

Level:
    api, webui > chat, eval, train > data, model > hparams > extras

Dependency graph:
    main:
        transformers>=4.41.2
        datasets>=2.16.0
        accelerate>=0.30.1
        peft>=0.11.1
        trl>=0.8.6
    attention:
        transformers>=4.42.4 (gemma+fa2)
    longlora:
        transformers>=4.41.2,<=4.42.4
    packing:
        transformers>=4.41.2,<=4.42.4
    patcher:
        transformers==4.41.2 (chatglm)
"""

from .cli import VERSION


__version__ = VERSION
src/llamafactory/api/app.py (new file, 122 lines)
@@ -0,0 +1,122 @@
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
from contextlib import asynccontextmanager
from typing import Optional

from typing_extensions import Annotated

from ..chat import ChatModel
from ..extras.misc import torch_gc
from ..extras.packages import is_fastapi_available, is_starlette_available, is_uvicorn_available
from .chat import (
    create_chat_completion_response,
    create_score_evaluation_response,
    create_stream_chat_completion_response,
)
from .protocol import (
    ChatCompletionRequest,
    ChatCompletionResponse,
    ModelCard,
    ModelList,
    ScoreEvaluationRequest,
    ScoreEvaluationResponse,
)


if is_fastapi_available():
    from fastapi import Depends, FastAPI, HTTPException, status
    from fastapi.middleware.cors import CORSMiddleware
    from fastapi.security.http import HTTPAuthorizationCredentials, HTTPBearer


if is_starlette_available():
    from sse_starlette import EventSourceResponse


if is_uvicorn_available():
    import uvicorn


@asynccontextmanager
async def lifespan(app: "FastAPI"):  # collects GPU memory
    yield
    torch_gc()


def create_app(chat_model: "ChatModel") -> "FastAPI":
    app = FastAPI(lifespan=lifespan)
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_credentials=True,
        allow_methods=["*"],
        allow_headers=["*"],
    )
    api_key = os.environ.get("API_KEY")
    security = HTTPBearer(auto_error=False)

    async def verify_api_key(auth: Annotated[Optional[HTTPAuthorizationCredentials], Depends(security)]):
        if api_key and (auth is None or auth.credentials != api_key):
            raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key.")

    @app.get(
        "/v1/models",
        response_model=ModelList,
        status_code=status.HTTP_200_OK,
        dependencies=[Depends(verify_api_key)],
    )
    async def list_models():
        model_card = ModelCard(id="gpt-3.5-turbo")
        return ModelList(data=[model_card])

    @app.post(
        "/v1/chat/completions",
        response_model=ChatCompletionResponse,
        status_code=status.HTTP_200_OK,
        dependencies=[Depends(verify_api_key)],
    )
    async def create_chat_completion(request: ChatCompletionRequest):
        if not chat_model.engine.can_generate:
            raise HTTPException(status_code=status.HTTP_405_METHOD_NOT_ALLOWED, detail="Not allowed")

        if request.stream:
            generate = create_stream_chat_completion_response(request, chat_model)
            return EventSourceResponse(generate, media_type="text/event-stream")
        else:
            return await create_chat_completion_response(request, chat_model)

    @app.post(
        "/v1/score/evaluation",
        response_model=ScoreEvaluationResponse,
        status_code=status.HTTP_200_OK,
        dependencies=[Depends(verify_api_key)],
    )
    async def create_score_evaluation(request: ScoreEvaluationRequest):
        if chat_model.engine.can_generate:
            raise HTTPException(status_code=status.HTTP_405_METHOD_NOT_ALLOWED, detail="Not allowed")

        return await create_score_evaluation_response(request, chat_model)

    return app


def run_api() -> None:
    chat_model = ChatModel()
    app = create_app(chat_model)
    api_host = os.environ.get("API_HOST", "0.0.0.0")
    api_port = int(os.environ.get("API_PORT", "8000"))
    print("Visit http://localhost:{}/docs for API document.".format(api_port))
    uvicorn.run(app, host=api_host, port=api_port)
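The bearer-token guard only activates when `API_KEY` is set in the server's environment. A minimal client-side probe, assuming the server was launched with `API_KEY=secret` on the default port:

from openai import OpenAI

client = OpenAI(api_key="secret", base_url="http://localhost:8000/v1")
print(client.models.list())  # one ModelCard with id="gpt-3.5-turbo"

bad = OpenAI(api_key="wrong-key", base_url="http://localhost:8000/v1")
bad.models.list()  # raises openai.AuthenticationError (HTTP 401, "Invalid API key.")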
src/llamafactory/api/chat.py (new file, 237 lines)
@@ -0,0 +1,237 @@
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import base64
import io
import json
import os
import uuid
from typing import TYPE_CHECKING, AsyncGenerator, Dict, List, Optional, Tuple

from ..data import Role as DataRole
from ..extras.logging import get_logger
from ..extras.packages import is_fastapi_available, is_pillow_available, is_requests_available
from .common import dictify, jsonify
from .protocol import (
    ChatCompletionMessage,
    ChatCompletionResponse,
    ChatCompletionResponseChoice,
    ChatCompletionResponseUsage,
    ChatCompletionStreamResponse,
    ChatCompletionStreamResponseChoice,
    Finish,
    Function,
    FunctionCall,
    Role,
    ScoreEvaluationResponse,
)


if is_fastapi_available():
    from fastapi import HTTPException, status


if is_pillow_available():
    from PIL import Image


if is_requests_available():
    import requests


if TYPE_CHECKING:
    from numpy.typing import NDArray

    from ..chat import ChatModel
    from .protocol import ChatCompletionRequest, ScoreEvaluationRequest


logger = get_logger(__name__)
ROLE_MAPPING = {
    Role.USER: DataRole.USER.value,
    Role.ASSISTANT: DataRole.ASSISTANT.value,
    Role.SYSTEM: DataRole.SYSTEM.value,
    Role.FUNCTION: DataRole.FUNCTION.value,
    Role.TOOL: DataRole.OBSERVATION.value,
}


def _process_request(
    request: "ChatCompletionRequest",
) -> Tuple[List[Dict[str, str]], Optional[str], Optional[str], Optional["NDArray"]]:
    logger.info("==== request ====\n{}".format(json.dumps(dictify(request), indent=2, ensure_ascii=False)))

    if len(request.messages) == 0:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid length")

    if request.messages[0].role == Role.SYSTEM:
        system = request.messages.pop(0).content
    else:
        system = None

    if len(request.messages) % 2 == 0:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Only supports u/a/u/a/u...")

    input_messages = []
    image = None
    for i, message in enumerate(request.messages):
        if i % 2 == 0 and message.role not in [Role.USER, Role.TOOL]:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")
        elif i % 2 == 1 and message.role not in [Role.ASSISTANT, Role.FUNCTION]:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid role")

        if message.role == Role.ASSISTANT and isinstance(message.tool_calls, list) and len(message.tool_calls):
            tool_calls = [
                {"name": tool_call.function.name, "arguments": tool_call.function.arguments}
                for tool_call in message.tool_calls
            ]
            content = json.dumps(tool_calls, ensure_ascii=False)
            input_messages.append({"role": ROLE_MAPPING[Role.FUNCTION], "content": content})
        elif isinstance(message.content, list):
            for input_item in message.content:
                if input_item.type == "text":
                    input_messages.append({"role": ROLE_MAPPING[message.role], "content": input_item.text})
                else:
                    image_url = input_item.image_url.url
                    if image_url.startswith("data:image"):  # base64 image
                        image_data = base64.b64decode(image_url.split(",", maxsplit=1)[1])
                        image_path = io.BytesIO(image_data)
                    elif os.path.isfile(image_url):  # local file
                        image_path = open(image_url, "rb")
                    else:  # web uri
                        image_path = requests.get(image_url, stream=True).raw

                    image = Image.open(image_path).convert("RGB")
        else:
            input_messages.append({"role": ROLE_MAPPING[message.role], "content": message.content})

    tool_list = request.tools
    if isinstance(tool_list, list) and len(tool_list):
        try:
            tools = json.dumps([dictify(tool.function) for tool in tool_list], ensure_ascii=False)
        except json.JSONDecodeError:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid tools")
    else:
        tools = None

    return input_messages, system, tools, image


def _create_stream_chat_completion_chunk(
    completion_id: str,
    model: str,
    delta: "ChatCompletionMessage",
    index: Optional[int] = 0,
    finish_reason: Optional["Finish"] = None,
) -> str:
    choice_data = ChatCompletionStreamResponseChoice(index=index, delta=delta, finish_reason=finish_reason)
    chunk = ChatCompletionStreamResponse(id=completion_id, model=model, choices=[choice_data])
    return jsonify(chunk)


async def create_chat_completion_response(
    request: "ChatCompletionRequest", chat_model: "ChatModel"
) -> "ChatCompletionResponse":
    completion_id = "chatcmpl-{}".format(uuid.uuid4().hex)
    input_messages, system, tools, image = _process_request(request)
    responses = await chat_model.achat(
        input_messages,
        system,
        tools,
        image,
        do_sample=request.do_sample,
        temperature=request.temperature,
        top_p=request.top_p,
        max_new_tokens=request.max_tokens,
        num_return_sequences=request.n,
        stop=request.stop,
    )

    prompt_length, response_length = 0, 0
    choices = []
    for i, response in enumerate(responses):
        if tools:
            result = chat_model.engine.template.extract_tool(response.response_text)
        else:
            result = response.response_text

        if isinstance(result, list):
            tool_calls = []
            for tool in result:
                function = Function(name=tool[0], arguments=tool[1])
                tool_calls.append(FunctionCall(id="call_{}".format(uuid.uuid4().hex), function=function))

            response_message = ChatCompletionMessage(role=Role.ASSISTANT, tool_calls=tool_calls)
            finish_reason = Finish.TOOL
        else:
            response_message = ChatCompletionMessage(role=Role.ASSISTANT, content=result)
            finish_reason = Finish.STOP if response.finish_reason == "stop" else Finish.LENGTH

        choices.append(ChatCompletionResponseChoice(index=i, message=response_message, finish_reason=finish_reason))
        prompt_length = response.prompt_length
        response_length += response.response_length

    usage = ChatCompletionResponseUsage(
        prompt_tokens=prompt_length,
        completion_tokens=response_length,
        total_tokens=prompt_length + response_length,
    )

    return ChatCompletionResponse(id=completion_id, model=request.model, choices=choices, usage=usage)


async def create_stream_chat_completion_response(
    request: "ChatCompletionRequest", chat_model: "ChatModel"
) -> AsyncGenerator[str, None]:
    completion_id = "chatcmpl-{}".format(uuid.uuid4().hex)
    input_messages, system, tools, image = _process_request(request)
    if tools:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Cannot stream function calls.")

    if request.n > 1:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Cannot stream multiple responses.")

    yield _create_stream_chat_completion_chunk(
        completion_id=completion_id, model=request.model, delta=ChatCompletionMessage(role=Role.ASSISTANT, content="")
    )
    async for new_token in chat_model.astream_chat(
        input_messages,
        system,
        tools,
        image,
        do_sample=request.do_sample,
        temperature=request.temperature,
        top_p=request.top_p,
        max_new_tokens=request.max_tokens,
        stop=request.stop,
    ):
        if len(new_token) != 0:
            yield _create_stream_chat_completion_chunk(
                completion_id=completion_id, model=request.model, delta=ChatCompletionMessage(content=new_token)
            )

    yield _create_stream_chat_completion_chunk(
        completion_id=completion_id, model=request.model, delta=ChatCompletionMessage(), finish_reason=Finish.STOP
    )
    yield "[DONE]"


async def create_score_evaluation_response(
    request: "ScoreEvaluationRequest", chat_model: "ChatModel"
) -> "ScoreEvaluationResponse":
    if len(request.messages) == 0:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid request")

    scores = await chat_model.aget_scores(request.messages, max_length=request.max_length)
    return ScoreEvaluationResponse(model=request.model, scores=scores)
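For the multimodal branch in `_process_request`, a small sketch of building a `data:image` URL that the base64 path above will decode (Pillow assumed available; the 8×8 red square is a toy image):

import base64
import io

from PIL import Image

buffer = io.BytesIO()
Image.new("RGB", (8, 8), color="red").save(buffer, format="PNG")  # toy image
b64 = base64.b64encode(buffer.getvalue()).decode("ascii")
image_url = "data:image/png;base64,{}".format(b64)

# Mirror of the server-side decode above:
image_data = base64.b64decode(image_url.split(",", maxsplit=1)[1])
restored = Image.open(io.BytesIO(image_data)).convert("RGB")
assert restored.size == (8, 8)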
src/llamafactory/api/common.py (new file, 34 lines)
@@ -0,0 +1,34 @@
# Copyright 2024 the LlamaFactory team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import json
from typing import TYPE_CHECKING, Any, Dict


if TYPE_CHECKING:
    from pydantic import BaseModel


def dictify(data: "BaseModel") -> Dict[str, Any]:
    try:  # pydantic v2
        return data.model_dump(exclude_unset=True)
    except AttributeError:  # pydantic v1
        return data.dict(exclude_unset=True)


def jsonify(data: "BaseModel") -> str:
    try:  # pydantic v2
        return json.dumps(data.model_dump(exclude_unset=True), ensure_ascii=False)
    except AttributeError:  # pydantic v1
        return data.json(exclude_unset=True, ensure_ascii=False)
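The v2/v1 fallback hinges on `model_dump` existing only in pydantic v2; a toy demonstration of the `exclude_unset` behavior both helpers share (using `dictify`/`jsonify` as defined above):

from pydantic import BaseModel

class Point(BaseModel):
    x: int
    y: int = 0

point = Point(x=1)
print(dictify(point))  # {'x': 1} -- y is excluded because it was never set
print(jsonify(point))  # '{"x": 1}'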
src/llamafactory/api/protocol.py
@@ -1,6 +1,20 @@
+# Copyright 2024 the LlamaFactory team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import time
 from enum import Enum, unique
-from typing import List, Optional
+from typing import Any, Dict, List, Optional, Union
 
 from pydantic import BaseModel, Field
 from typing_extensions import Literal

@@ -39,15 +53,37 @@ class Function(BaseModel):
     arguments: str
 
 
+class FunctionDefinition(BaseModel):
+    name: str
+    description: str
+    parameters: Dict[str, Any]
+
+
+class FunctionAvailable(BaseModel):
+    type: Literal["function", "code_interpreter"] = "function"
+    function: Optional[FunctionDefinition] = None
+
+
 class FunctionCall(BaseModel):
-    id: Literal["call_default"] = "call_default"
+    id: str
     type: Literal["function"] = "function"
     function: Function
 
 
+class ImageURL(BaseModel):
+    url: str
+
+
+class MultimodalInputItem(BaseModel):
+    type: Literal["text", "image_url"]
+    text: Optional[str] = None
+    image_url: Optional[ImageURL] = None
+
+
 class ChatMessage(BaseModel):
     role: Role
-    content: str
+    content: Optional[Union[str, List[MultimodalInputItem]]] = None
+    tool_calls: Optional[List[FunctionCall]] = None
 
 
 class ChatCompletionMessage(BaseModel):

@@ -59,12 +95,13 @@ class ChatCompletionMessage(BaseModel):
 class ChatCompletionRequest(BaseModel):
     model: str
     messages: List[ChatMessage]
-    tools: Optional[list] = []
-    do_sample: bool = True
+    tools: Optional[List[FunctionAvailable]] = None
+    do_sample: Optional[bool] = None
     temperature: Optional[float] = None
     top_p: Optional[float] = None
     n: int = 1
     max_tokens: Optional[int] = None
+    stop: Optional[Union[str, List[str]]] = None
     stream: bool = False

@@ -74,7 +111,7 @@ class ChatCompletionResponseChoice(BaseModel):
     finish_reason: Finish
 
 
-class ChatCompletionResponseStreamChoice(BaseModel):
+class ChatCompletionStreamResponseChoice(BaseModel):
     index: int
     delta: ChatCompletionMessage
     finish_reason: Optional[Finish] = None

@@ -87,7 +124,7 @@ class ChatCompletionResponseUsage(BaseModel):
 
 
 class ChatCompletionResponse(BaseModel):
-    id: Literal["chatcmpl-default"] = "chatcmpl-default"
+    id: str
     object: Literal["chat.completion"] = "chat.completion"
     created: int = Field(default_factory=lambda: int(time.time()))
     model: str

@@ -96,11 +133,11 @@ class ChatCompletionResponse(BaseModel):
 
 
 class ChatCompletionStreamResponse(BaseModel):
-    id: Literal["chatcmpl-default"] = "chatcmpl-default"
+    id: str
    object: Literal["chat.completion.chunk"] = "chat.completion.chunk"
     created: int = Field(default_factory=lambda: int(time.time()))
     model: str
-    choices: List[ChatCompletionResponseStreamChoice]
+    choices: List[ChatCompletionStreamResponseChoice]
 
 
 class ScoreEvaluationRequest(BaseModel):

@@ -110,7 +147,7 @@ class ScoreEvaluationRequest(BaseModel):
 
 
 class ScoreEvaluationResponse(BaseModel):
-    id: Literal["scoreeval-default"] = "scoreeval-default"
+    id: str
     object: Literal["score.evaluation"] = "score.evaluation"
     model: str
     scores: List[float]
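With the typed `FunctionAvailable`/`MultimodalInputItem` models in place, a tools payload now validates before it ever reaches `_process_request`; an illustrative construction (field values are made up):

request = ChatCompletionRequest(
    model="test",
    messages=[ChatMessage(role=Role.USER, content="What is my GPA?")],
    tools=[
        FunctionAvailable(
            function=FunctionDefinition(
                name="calculate_gpa",
                description="Compute a GPA from grades and credit hours",
                parameters={"type": "object", "properties": {}},
            )
        )
    ],
    stop=["</s>"],  # now accepted as a str or a list of str
)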
Some files were not shown because too many files have changed in this diff.