update template

Former-commit-id: a95f3a4d62de1073a78125401cf4289ec0523156
2023-08-22 19:46:09 +08:00
parent f55907dbea
commit 6310613699
6 changed files with 38 additions and 20 deletions
--- a/data/README.md
+++ b/data/README.md
@@ -17,14 +17,15 @@ If you are using a custom dataset, please provide your dataset definition in the

 where the `prompt` and `response` columns should contain non-empty values. The `query` column will be concatenated with the `prompt` column and used as input for the model. The `history` column should contain a list where each element is a string tuple representing a query-response pair.

-For Reward-Modeling(rm) dataset, the first n examples represent chosen examples and the last n examples represent rejected examples.
+For datasets used in reward modeling or DPO training, the `response` column should be a string list, with the preferred answers appearing first, for example:
+
 ```json
 {
-    "instruction": "Question?",
-    "input": "",
-    "output": [
-       "chosen answer",
-       "rejected answer"
-    ]
+  "instruction": "Question",
+  "input": "",
+  "output": [
+    "Chosen answer",
+    "Rejected answer"
+  ]
 }
 ```
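
As a minimal sketch of the ordering convention described in the added README text, the snippet below reads one record and splits it into a chosen and a rejected answer. It assumes the `response` column is stored under the `output` key, as in the example above, and that only the first two answers matter. The helper name `split_preference_pair` is hypothetical and not part of this repository.

```python
import json
from typing import Tuple


def split_preference_pair(example: dict) -> Tuple[str, str]:
    """Return (chosen, rejected) from one record.

    Assumption: the preferred answer comes first in the "output" list,
    as the README text in this diff describes.
    """
    output = example.get("output")
    if not isinstance(output, list) or len(output) < 2:
        raise ValueError("'output' must be a list with at least two answers")
    if not all(isinstance(answer, str) and answer for answer in output):
        raise ValueError("every answer in 'output' must be a non-empty string")
    # By convention, index 0 is the chosen answer and index 1 the rejected one.
    return output[0], output[1]


# The record from the README example, parsed from JSON.
record = json.loads(
    '{"instruction": "Question", "input": "", '
    '"output": ["Chosen answer", "Rejected answer"]}'
)
chosen, rejected = split_preference_pair(record)
```

A check like this could be run over a custom dataset before training, so that records whose `output` is a plain string (the supervised fine-tuning format) are caught early rather than silently misread.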