Commit Graph

232 Commits

Author SHA1 Message Date
Andrej Karpathy
2874eda59a update to new os env var to get rid of deprecation warning 2025-12-28 03:32:46 +00:00
Andrej Karpathy
e1770a3061 remove spurious cast, gets compiled away anyway but it's confusing people 2025-12-27 23:07:48 +00:00
Andrej Karpathy
49389ecaa8 fix tf32 warning for deprecated api use 2025-12-27 22:03:06 +00:00
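The tf32 fix above tracks a PyTorch deprecation: the old `torch.backends.cuda.matmul.allow_tf32` flag now emits a warning in recent releases. A minimal sketch of the modern replacement call; whether this exact call is what the commit switched to is an assumption:

```python
import torch

# Deprecated flag that triggers the warning in recent PyTorch:
#   torch.backends.cuda.matmul.allow_tf32 = True
# Current equivalent: "high" permits TF32 for float32 matmuls on Ampere+ GPUs.
torch.set_float32_matmul_precision("high")
```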
DU Wenjie
ea4229851b bugfix 2025-12-26 19:02:12 +08:00
DU Wenjie
7840049189 bugfix keep same args style in scripts/base_eval.py 2025-12-26 17:29:08 +08:00
Andrej
bc51da8bac pad vocab size to 64 for DDP optimizers and efficiency 2025-12-23 09:13:31 -08:00
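The two vocab-padding commits (this merge and Matěj Kripner's original below) round the vocabulary size up to a multiple of 64 so embedding rows shard evenly across DDP ranks and matmul shapes stay GPU-friendly. A minimal sketch of the rounding; the function name is illustrative, not nanochat's actual code:

```python
def pad_vocab_size(vocab_size: int, multiple: int = 64) -> int:
    """Round vocab_size up to the nearest multiple, e.g. 50257 -> 50304."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

assert pad_vocab_size(50257) == 50304  # the classic GPT-2 padding case
assert pad_vocab_size(65536) == 65536  # already aligned, unchanged
```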
duwenjie
92c6654b95 bugfix save and load ckpt from model_tag dir 2025-12-21 15:07:04 +08:00
Barış Özmen
790f3be65c add rust batch encode as a faster option over encode 2025-12-18 19:17:59 +03:00
Matěj Kripner
d314e96aa2 formatting 2025-12-09 12:48:46 +01:00
Matěj Kripner
bbc57da7d5 slightly nicer error message 2025-12-09 12:46:48 +01:00
Matěj Kripner
f1bf69d562 feat: pad vocab size to 64 for DDP optimizers and efficiency 2025-12-09 12:38:18 +01:00
Andrej
d5759400f9 fixing two typos in comments 2025-12-08 20:03:08 -08:00
Andrej
e72c3299df fix random.seed() footgun bug for SpellingBee data generation 2025-12-08 19:58:45 -08:00
Andrej
7931e0903a rename checkpoint_dir to checkpoints_dir for consistency. 2025-12-08 18:32:12 -08:00
Andrej
849d95ae1f remove unnecessary check to make the logic in CausalSelfAttention.forward() clearer 2025-12-08 18:30:37 -08:00
Andrej
39cccc527f small bugfix: make mid_train script work even with a tiny number of iterations 2025-12-08 18:27:32 -08:00
Andrej
8b1cecaa95 Apply suggestion from @svlandeg for nicer looking comparison
Co-authored-by: Sofie Van Landeghem <svlandeg@users.noreply.github.com>
2025-12-08 18:27:06 -08:00
Andrej
58f3e84e01 clean up train/val loader in sft for consistency with mid/base 2025-12-08 18:23:57 -08:00
Andrej
1b2a675c88 Improve KV cache code readability 2025-12-08 18:19:05 -08:00
Andrej
d75e6ed711 Fix script comment to reference correct file 2025-12-08 18:16:42 -08:00
Andrej
72a7cf2bc4 Fix distributed Parquet dataloader resume for multi-epoch training 2025-12-08 18:15:02 -08:00
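For the Parquet-resume fix (merged here; sunyujun03's original is below), the underlying technique is carrying enough loader state to fast-forward past the resume point even when training has wrapped into a later epoch. A rough, illustrative sketch only; nanochat's actual loader state is an assumption:

```python
from itertools import islice

def rank_shards(files, rank, world_size, start_epoch=0, start_shard=0):
    """Yield (epoch, file) pairs for one rank, resumable mid-epoch.

    Illustrative only; a real Parquet loader would also track a row
    offset within the current shard.
    """
    epoch = start_epoch
    while True:
        shard = files[rank::world_size]                     # fixed slice per rank
        first = start_shard if epoch == start_epoch else 0  # skip only on resume
        for i in range(first, len(shard)):
            yield epoch, shard[i]
        epoch += 1  # keep counting epochs so multi-epoch resume stays consistent

# Resume rank 0 of 2 at epoch 1, shard index 1:
it = rank_shards(["a.parquet", "b.parquet", "c.parquet", "d.parquet"], 0, 2, 1, 1)
print(list(islice(it, 4)))
# [(1, 'c.parquet'), (2, 'a.parquet'), (2, 'c.parquet'), (3, 'a.parquet')]
```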
Andrej Karpathy
bffdb2ef91 group common code to make things neater in gpt logit computation 2025-12-09 02:01:05 +00:00
Andrej
cbf30c842c apply float32 cast before logits softcapping so the tanh is in fp32. torch compile fuses this correctly with no extra memory costs. 2025-12-08 14:17:43 -08:00
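This softcap commit (and spjosyula's original further down) reorders a cast: move to fp32 before the tanh so the non-linearity itself runs in full precision. A runnable before/after sketch; the softcap constant 15 is an assumption, not necessarily nanochat's value:

```python
import torch

softcap = 15.0
logits = torch.randn(4, 8, dtype=torch.bfloat16)

# Before: tanh computed in bfloat16, result cast to fp32 afterwards.
capped_old = (softcap * torch.tanh(logits / softcap)).float()

# After: cast first, so tanh runs in fp32; under torch.compile this
# fuses into one kernel, so there is no extra memory traffic.
capped_new = softcap * torch.tanh(logits.float() / softcap)
```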
Andrej Karpathy
90442de35f fix bug where any rank has to be able to create checkpoint_dir if saving optim 2025-12-08 20:45:19 +00:00
Andrej
2fd0440355 fix: missing val_bpb on resume 2025-12-08 12:35:08 -08:00
sunyujun03
01ea71be39 Fix distributed Parquet dataloader resume for multi-epoch training 2025-12-08 00:10:19 -06:00
KimYeongHyeon
a8847a0f83 Fix script comment to reference correct file 2025-12-02 10:46:20 +09:00
deepbuilder
06677c30e0 Refactor dimension validation for KV cache 2025-11-28 15:22:18 -05:00
deepbuilder
a770dcef2e Fix kv_cache indexing to explicitly include head dimension 2025-11-28 15:00:14 -05:00
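The pair of deepbuilder KV-cache commits above amount to spelling out every dimension at the cache write site. A minimal sketch, assuming a cache laid out as (batch, heads, sequence, head_dim); shapes and names are illustrative:

```python
import torch

B, H, T_max, D = 2, 4, 128, 16
cache_k = torch.zeros(B, H, T_max, D)

t, T = 10, 3                  # insert position and chunk length
k = torch.randn(B, H, T, D)   # new keys from the current forward pass

# Index all four dims explicitly, including head_dim, so the intent
# (and any accidental shape mismatch) is visible at the write site.
cache_k[:, :, t:t + T, :] = k
```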
spjosyula
16788eed3c fix(model): apply float32 cast before logits softcapping
This change ensures that the logits softcapping operation (tanh) is performed in float32 precision rather than bfloat16. Previously, the code cast to float32 after the tanh operation, which meant the non-linearity was computed with bfloat16 precision.
2025-11-23 20:12:09 +05:30
Sanzo00
53b3a4fb81 fix: missing val_bpb on resume 2025-11-22 11:04:20 +08:00
svlandeg
4bcc3bb698 clarify comment 2025-11-21 13:19:45 +01:00
Eric Silberstein
f37d45c21f remove unneeded iter() 2025-11-20 15:14:56 -05:00
Eric Silberstein
5c93a56be5 remove unnecessary check 2025-11-19 16:31:41 -05:00
Eric Silberstein
dddb95caac make mid_train script work even with a tiny number of iterations 2025-11-19 15:52:20 -05:00
Eric Silberstein
a4a0959c73 renamed find_largest_model() argument checkpoint_dir to checkpoints_dir for clarity 2025-11-19 15:33:36 -05:00
Eric Silberstein
024781f9df fixing two typos in comments 2025-11-19 15:12:53 -05:00
Eric Silberstein
97770700f2 change test/train split approach because random.seed(1) and random.seed(-1) do the same thing 2025-11-19 14:51:02 -05:00
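The footgun Eric's commit fixes is real CPython behavior: integer seeds are run through abs(), so random.seed(1) and random.seed(-1) initialize identical generator state, and a test/train split keyed on the seed's sign never actually splits. A runnable demonstration:

```python
import random

random.seed(1)
a = [random.random() for _ in range(3)]

random.seed(-1)
b = [random.random() for _ in range(3)]

# CPython takes abs() of integer seeds, so the two streams are identical.
assert a == b
```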
Andrej
4a87a0d19f Merge pull request #299 from samjabrahams/rotary_embedding_head_dim_comment_cleanup
Fix comment: rotary embeddings final dimension size
2025-11-17 13:29:21 -08:00
Sam Abrahams
11e68bf442 Fix comment: rotary embeddings final dimension size 2025-11-17 11:32:56 -05:00
Andrej Karpathy
bc1fca39f3 mqa -> gqa to reduce confusion 2025-11-15 15:43:37 +00:00
Andrej
f66a780f68 Fix torch.dtype mismatch when running engine inline test. 2025-11-14 07:28:29 -08:00
Andrej
4763ce612a Small fixes to typos 2025-11-14 07:25:59 -08:00
Sofie Van Landeghem
c6f5bd67db revert change of base to sft for quick inline test 2025-11-14 12:20:03 +01:00
svlandeg
a2fb3c83a6 fix typos 2025-11-14 11:20:25 +01:00
svlandeg
e5efb4b471 add test_engine.py to file structure 2025-11-14 11:13:42 +01:00
Andrej Karpathy
9a71d13688 typo oops 2025-11-13 16:08:30 +00:00
Andrej Karpathy
7b7fd0fe71 thank you Sofie for your help with nanochat 2025-11-13 16:07:54 +00:00
Andrej Karpathy
c6abcdfe3a big change: add pretraining resumption logic so that checkpoints can now be approximately resumed and training can continue. this is useful for very long runs when you don't want the anxiety of your run crashing for some reason. alternatively, it's a way to recover training in the event of loss spikes. i mean, this should have been there in v0 but it's ok. the resumption is approximate to control complexity and bloat, but it's possible we want to change that in the future. to use, set --save_every to a step interval to write checkpoints with, and then use --resume_from_step to resume optimization from a given step. only base model training (pretraining) supports this atm, but it's ok because midtraining is comparably quite a bit faster. 2025-11-13 15:34:40 +00:00
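A sketch of the approximate-resumption scheme this commit describes. The flag semantics (--save_every, --resume_from_step) come from the commit message; the helper functions are hypothetical stand-ins, not nanochat's actual code:

```python
def save_checkpoint(step: int) -> None:
    print(f"would write model/optimizer state for step {step}")  # stand-in

def load_checkpoint(step: int) -> int:
    print(f"would read model/optimizer state for step {step}")   # stand-in
    return step

def train(num_iterations: int, save_every: int = -1, resume_from_step: int = -1):
    start_step = 0
    if resume_from_step >= 0:
        # Approximate resume: restore weights/optimizer at a saved step and
        # continue; dataloader position is re-derived, not replayed exactly.
        start_step = load_checkpoint(resume_from_step)
    for step in range(start_step, num_iterations):
        pass  # one optimization step would run here
        if save_every > 0 and step > 0 and step % save_every == 0:
            save_checkpoint(step)

train(10, save_every=4, resume_from_step=4)
```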
Andrej Karpathy
91f09ccd0d minor fix comment in engine 2025-11-13 15:28:18 +00:00