Former-commit-id: 2744dc9d2f9df4150a496b38e24ea96040a85bef
Usage:
  pretrain.sh
  sft.sh
  reward.sh
  ppo.sh
  dpo.sh
  predict.sh
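
The scripts listed above could be invoked along these lines. This is only a dry-run sketch: it assumes the scripts are run with bash from the directory that contains them and takes them in listing order, none of which the message states. The `stages` array and the `plan_stages` helper are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one command per script instead of executing it.
# Assumptions (not stated in the source): the scripts are run with bash
# from the current directory, in the order they are listed.
stages=(pretrain sft reward ppo dpo predict)

# Hypothetical helper: emit the command line for each listed script.
plan_stages() {
  local stage
  for stage in "${stages[@]}"; do
    echo "bash ${stage}.sh"
  done
}

plan_stages
```

Printing the commands first makes it easy to check the list before running anything; a single script can then be started by hand, for example `bash sft.sh`.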