karpathy
f9a7e0f111
update the CPU/MPS script to give reasonable results. The model can at least answer that Paris is the capital of France and knows that the sky is blue, for about 40 minutes of training on my macbook. Also fixed a bug that existed due to KVCache bfloat16 dtype assumption
2026-01-17 12:27:30 -08:00
Yury Kirpichev
77a46902e4
Fix WANDB_RUN parameter passing in runcpu.sh ( #407 )
...
- Add --run=$WANDB_RUN to base_train, mid_train, and chat_sft calls
- Ensures wandb logging works when WANDB_RUN environment variable is set
- Matches the behavior in speedrun.sh
Co-authored-by: svlandeg <svlandeg@github.com >
2026-01-16 18:59:44 -08:00
Andrej Karpathy
7312ec9898
fix buggy midtrain and update all kwargs to be idiomatic. that is, argparse uses dashes variables use underscores. the underscores are just a remnant of the previous Configurator object. This is the right way
2026-01-13 22:45:27 +00:00
Sofie Van Landeghem
a1ccb3dc0b
remove rust compilation as rustbpe is now installed from separate package ( #416 )
2026-01-08 06:18:37 -08:00
Andrej Karpathy
cf587acb1a
move eval bundle download to be lazy and inside the python code so that we can substantially simplify the run bash scripts
2025-11-01 16:04:38 +00:00
Tancrède Lepoint
d5cda11ab8
Export the base dir variable
2025-10-22 18:15:02 -04:00
Luke Stanley
901b075605
Fix GPU-less CPU use on Linux with specific Torch indexes
2025-10-21 23:14:16 +00:00
Andrej Karpathy
94ee507054
quick fix base eval due to fewshot requirement
2025-10-21 17:56:08 +00:00
karpathy
2e9669e03a
upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes ,e.g. changing max_iterations to num_iterations in sft script for consistency in naming
2025-10-20 10:15:17 -07:00