support ORPO

Former-commit-id: f44a4c27e2461cdaa1b16865f597a31033c0e6d9
hiyouga
2024-03-31 18:29:50 +08:00
parent 526111a303
commit d764cd8736
22 changed files with 395 additions and 47 deletions

@@ -34,6 +34,8 @@ If you are using a custom dataset, please provide your dataset definition in the
With the definition above in place, you can use the custom dataset by specifying `--dataset dataset_name`.
----
Currently we support datasets in the **alpaca** and **sharegpt** formats. A dataset in alpaca format should follow the format below:
```json
@@ -84,6 +86,10 @@ For the preference datasets, the `response` column should be a string list whose
}
```
Remember to set `"ranking": true` for the preference datasets.
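
As a rough illustration of the two requirements stated here (a `"ranking": true` flag in the dataset definition, and a `response` column holding a list of strings), the following minimal Python sketch builds a hypothetical definition entry and checks a sample value. The entry name `dataset_name`, the sample strings, and the helper `is_string_list` are assumptions made for illustration only; they are not part of the project.

```python
import json


def is_string_list(value) -> bool:
    """Return True when `value` is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


# Hypothetical dataset definition entry: only the "ranking" key is taken
# from the text above; the entry name is a placeholder.
definition = {"dataset_name": {"ranking": True}}

# Hypothetical `response` value for one preference example.
sample_response = ["first candidate answer", "second candidate answer"]

# json.dumps renders Python's True as the JSON literal `true`.
print(json.dumps(definition, indent=2))
print(is_string_list(sample_response))
```

A check like this only validates the shape of the data; it says nothing about how the items in the list are interpreted during training.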
----
A dataset in sharegpt format should follow the format below:
```json

@@ -34,6 +34,8 @@
Once added, you can use the custom dataset by specifying the `--dataset dataset_name` argument.
----
This project currently supports datasets in two formats: **alpaca** and **sharegpt**. Datasets in alpaca format are organized as follows:
```json
@@ -84,6 +86,10 @@
}
```
Adding a preference dataset additionally requires specifying `"ranking": true`.
----
Datasets in sharegpt format are organized as follows:
```json