ros/nanochat
Mirror of https://github.com/karpathy/nanochat.git, synced 2026-01-30 04:22:02 +00:00
nanochat/scripts at commit 4ddc8037975f0c11e11038a27eaf81f070971dc8
Latest commit 061f83c152 by Andrej Karpathy (2026-01-08 02:16:50 +00:00): delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then
base_eval.py: bugfix (2025-12-26 19:02:12 +08:00)
base_loss.py: delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts (2026-01-04 19:14:23 +00:00)
base_train.py: delete grad_clip. appears to not be necessary at all. not only was it buggy because the clipping happened per gpu before grad synchronization, but it costs ~2% MFU, and it also doesn't even help. I tried deleting it a while ago and back then it did help. So I'm guessing that some hyperparameter tuning obviated the reason for it since then (2026-01-08 02:16:50 +00:00)
chat_cli.py: upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming (2025-10-20 10:15:17 -07:00)
chat_eval.py: fix typos (2025-11-14 11:20:25 +01:00)
chat_rl.py: Fix undefined variable in chat_rl after recent refactor (2026-01-07 09:08:57 -08:00)
chat_sft.py: delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts (2026-01-04 19:14:23 +00:00)
chat_web.py: ensure consistency of quotes within each statement (2025-11-03 21:52:02 +01:00)
mid_train.py: delete the configurator in favor of argparse and clean up a lot of kwarg details to make them more consistent across all scripts (2026-01-04 19:14:23 +00:00)
tok_eval.py: initial commit (2025-10-13 06:49:24 -07:00)
tok_train.py: nudge hyperparameters of the base script with the results of the sweeps and miniseries. vocab size down to 32K. D:N ratio from 20 to 8. add miniseries script (2026-01-07 22:11:59 +00:00)