05a051dbe9 fix tokenization bug, there should be no space before first letter. sigh
master
Andrej Karpathy
2025-10-24 15:06:06 +00:00
8892470f29 add the SpellingBee task so that nanochat can count r in strawberry etc. along the way we had to add a bunch of new functionality, e.g. extend the calculator to support the count function of python. possibly the current TaskMixture uses way too many synthetic examples of SpellingBee because the eval gives us exactly 100% performance on spelling. We can tune this later to reclaim some wall clock time here I think
Andrej Karpathy
2025-10-24 14:02:48 +00:00
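A minimal sketch of what "extend the calculator to support the count function of python" could look like. The function name, the regular expression, and the restriction to quoted string literals are assumptions made for illustration; the commit message does not describe the actual implementation.

```python
import re
from typing import Optional

# Hypothetical whitelist pattern: accepts only <quoted string>.count(<quoted string>),
# so nothing other than str.count on two literals is ever evaluated.
_COUNT_CALL = re.compile(
    r"""^(['"])(?P<text>[^'"]*)\1\.count\((['"])(?P<needle>[^'"]*)\3\)$"""
)


def eval_count_expression(expr: str) -> Optional[int]:
    """Evaluate expressions such as "strawberry".count("r").

    Returns None when the expression does not match the whitelisted shape.
    """
    match = _COUNT_CALL.match(expr.strip())
    if match is None:
        return None
    # Delegate to Python's own str.count on the two extracted literals.
    return match.group("text").count(match.group("needle"))


print(eval_count_expression('"strawberry".count("r")'))
```

Matching a narrow pattern instead of calling eval keeps a tool like this from executing arbitrary model-generated code.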
81597cd616 move the lr schedule args up in base_train so they are tunable in configurator
Andrej Karpathy
2025-10-24 13:27:31 +00:00
cc3636b01c allow the tokenizer visualize_tokenization to also print the exact token id. you can never be paranoid enough
Andrej Karpathy
2025-10-24 13:27:05 +00:00
5eeb2b6ef9 experiment: looking to 'hire' a nanochat repo czar to help the repo, mentioning in readme
Andrej Karpathy
2025-10-22 16:55:54 +00:00
2dda5c4c8d Merge branch 'ulanch-fix/ios-safari-input-overlap'
Andrej Karpathy
2025-10-22 16:26:35 +00:00
80b203ea59 also bump run1000.sh to new uv sync
Andrej Karpathy
2025-10-22 16:08:31 +00:00
917c858136 Updates lockfile with CPU package support without overwriting other architectures
Luke Stanley
2025-10-21 20:53:18 +00:00
db1d5b595d Git ignore eval_bundle
Luke Stanley
2025-10-21 20:39:31 +00:00
dd9387b362 Fix GPU-less CPU use on Linux with specific Torch indexes
Luke Stanley
2025-10-21 19:52:21 +00:00
32571664b1 Fix Torch crash caused by pinning on CPU
Luke Stanley
2025-10-21 19:43:38 +00:00
51e70f0d3c Merge branch 'lukestanley-fix-cpu-support-with-extras'
Andrej Karpathy
2025-10-22 16:11:15 +00:00
48387cd895 also bump run1000.sh to new uv sync
Andrej Karpathy
2025-10-22 16:08:31 +00:00
796f84527f fix(ui): prevent iOS Safari toolbar from covering input on initial load
ulanch
2025-10-21 17:34:40 -07:00
7a52f9bfbb Updates lockfile with CPU package support without overwriting other architectures
Luke Stanley
2025-10-21 20:53:18 +00:00
760af62e11 Git ignore eval_bundle
Luke Stanley
2025-10-21 20:39:31 +00:00
901b075605 Fix GPU-less CPU use on Linux with specific Torch indexes
Luke Stanley
2025-10-21 19:52:21 +00:00
defd1246aa Fix Torch crash caused by pinning on CPU
Luke Stanley
2025-10-21 19:43:38 +00:00
2e938530ce delete spurious torch.empty allocation in adamw
Andrej
2025-10-21 11:35:17 -07:00
a088b7a6ec use enable_gqa of pytorch sdpa, allows us to delete some code, didn't realize it's available
Andrej Karpathy
2025-10-21 18:07:33 +00:00
94ee507054 quick fix base eval due to fewshot requirement
Andrej Karpathy
2025-10-21 17:56:08 +00:00
33e8a27f91 Merge karpathy/cpu-mps-dev, adding the ability to run on CPU, on MPS, or on CUDA, with autodetect. Gnarly PR, nonzero chance I broke something.
Andrej
2025-10-21 10:26:04 -07:00
50bea28ef9 also add readme mention of the cpu mps changes
cpu-mps-dev
Andrej Karpathy
2025-10-21 17:24:48 +00:00
5bdc99abfb merge and resolve conflict
Andrej Karpathy
2025-10-21 17:19:10 +00:00
dfcb1c16f1 Merge branch 'master' into cpu-mps-dev
Andrej Karpathy
2025-10-21 17:15:53 +00:00
bb71c64579 fix silly issue in dataloader, this version is much faster and more portable to mps too
Andrej Karpathy
2025-10-21 17:12:50 +00:00
bb786c5560 i shouldn't have committed the lock file, i missed that. revert to the flagship build which is linux. sorry to pollute the repo history...
karpathy
2025-10-21 10:07:40 -07:00
c9ea7a91e2 Add customization instructions to README
Andrej
2025-10-21 08:57:10 -07:00
03cddd9878 actually let's not brick code on git pull. change error to warning
Andrej Karpathy
2025-10-21 15:13:25 +00:00
fe5aed940b add personality to nanochat. breaks previous code on git pull and requires download of a new file from s3, but there is a helpful error message so hopefully it's ok
Andrej Karpathy
2025-10-21 15:04:58 +00:00
2e9669e03a upgrading all other files to be able to use cpu/mps as well as cuda. various minor other changes, e.g. changing max_iterations to num_iterations in sft script for consistency in naming
karpathy
2025-10-20 10:15:17 -07:00
a09ac812ed toml changes for cpu only install
Andrej
2025-10-20 07:53:15 -07:00
0abb0fa2e3 add both sides of the source check
burtenshaw
2025-10-20 10:44:07 +02:00
c7ae920a77 add check for linux on cpu
burtenshaw
2025-10-20 06:51:52 +02:00
0f007889dd Add MIT License as a file to the project
Andrej
2025-10-19 17:22:19 -07:00
5a879f4947 export NANOCHAT_BASE_DIR so child processes get it too
Andrej
2025-10-19 17:07:56 -07:00
c1d2ed1c13 use orig_model in sampling, silly of me to miss this
Andrej Karpathy
2025-10-20 00:05:09 +00:00
2bc521a6de use orig_model in sampling, silly of me to miss this
Andrej Karpathy
2025-10-20 00:04:15 +00:00
9467d83cf2 fix memory leak bug in rust tokenizer ty @mitsuhiko
Andrej Karpathy
2025-10-19 23:54:31 +00:00
b1443dc98c export NANOCHAT_BASE_DIR so child processes get it too
Tancrède Lepoint
2025-10-19 14:05:40 -04:00
cf2baf9933 fix typo
Andrej
2025-10-17 08:35:41 -07:00
e4f9b9c64d revert to previous pyproject.toml
karpathy
2025-10-17 08:08:16 -07:00
e883b1d597 Merge pull request #99 from burtenshaw/cpu-mps-dev-ben
Andrej
2025-10-17 07:24:38 -07:00
23b6351c1c add groups and source selection
burtenshaw
2025-10-17 12:20:18 +02:00
ae02650afe update the midtraining script too
karpathy
2025-10-16 16:33:17 -07:00
df600b6ed5 many small tweaks. base, eval, core work now i think
karpathy
2025-10-16 15:46:18 -07:00
d6d86cbf4c update readme with a link to the CPU|MPS branch
Andrej Karpathy
2025-10-16 22:03:39 +00:00
ccfe7915ac mention the current d32 chat hosted on nanochat.karpathy.ai, as an example endpoint of the repo
Andrej Karpathy
2025-10-16 19:32:44 +00:00
786119d593 add autodetect of device and related stuff. getting weird warnings/errors still, so wip
karpathy
2025-10-16 10:26:19 -07:00
279b74312c adjust comment/guidance on device type
karpathy
2025-10-16 10:06:39 -07:00
306bc380ab add support for CPU and for MPS. I had to change a few cosmetic things. I also discovered I think a bit of a bug, where I was casting wte to bfloat16 in the wrong place (the model init) instead of in init_weights
karpathy
2025-10-16 10:04:43 -07:00
722da4f543 trying to add basic cpu support, will try mps too
Andrej Karpathy
2025-10-16 16:14:38 +00:00
4346536ab2 also allow regenerating assistant message by clicking it, and make sure to feed good seed to generate
Andrej Karpathy
2025-10-16 01:28:37 +00:00
2846999b8f allow user to click on their message to edit them. conversation after that point is wiped
Andrej Karpathy
2025-10-16 01:16:22 +00:00
92d52ecc92 add slash commands to webui
Andrej Karpathy
2025-10-16 01:09:53 +00:00
fae3aca951 add script to train a $1000 version of nanochat. currently it's a bit more like $800 and this would run in probably around 33 hours instead of the budget of 41 hours, so we might tune it later. i think it's ok for now
Andrej Karpathy
2025-10-15 20:32:22 +00:00
4c3590c499 fix subtle issue in token decoding in cases where multiple utf8 bytes need to be emitted into a single codepoint. examples are emoji or foreign languages. basically we have to accumulate token sequences/text and only emit when we get full codepoints
Andrej Karpathy
2025-10-15 20:29:54 +00:00
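A minimal sketch of the "accumulate and only emit full codepoints" idea from the commit above. It relies on Python's standard incremental UTF-8 decoder as a stand-in and is not the code from this commit; the function name and the byte chunks (imagined here as the raw bytes of successive tokens) are hypothetical.

```python
import codecs
from typing import Iterable, Iterator


def stream_text(byte_chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield text only once complete UTF-8 codepoints are available."""
    # The incremental decoder buffers a trailing partial multi-byte sequence
    # instead of raising or emitting a replacement character.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in byte_chunks:
        text = decoder.decode(chunk)
        if text:  # empty string means we are still mid-codepoint
            yield text
    # Flush at end of stream; a dangling partial sequence becomes U+FFFD.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# A 4-byte emoji split across two hypothetical token byte chunks.
chunks = [b"hi ", b"\xf0\x9f", b"\x98\x80", b"!"]
print(list(stream_text(chunks)))
```

Decoding each chunk on its own would fail or produce replacement characters for the two middle chunks; buffering until the codepoint is complete avoids that.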
03fa673b7d add basic logging to chat_web, which i think might be fun
Andrej Karpathy
2025-10-15 19:51:06 +00:00
52bfeea8bd add very basic abuse prevention limits to chat_web so it's ok to host endpoints
Andrej Karpathy
2025-10-15 19:42:54 +00:00
01fb290f53 allow multiple GPUs to do inference in a data parallel way
Andrej Karpathy
2025-10-15 19:12:19 +00:00
190d9515d0 don't evaluate the sampling evals during SFT, they are too slow. keep the multiple choice evals. delete unused imports
Andrej Karpathy
2025-10-15 16:42:23 +00:00
b8076dd367 fix bug in learning rate multiplier, it was ramping up instead of ramping down. see more in Issue #68. also add --dry_run option useful for experimentation
Andrej Karpathy
2025-10-15 16:35:04 +00:00
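A minimal sketch of a learning rate multiplier that ramps down rather than up, as the commit above describes. The linear shape, the function name, and the parameter names are assumptions for illustration; the commit message does not state the exact schedule.

```python
def lr_multiplier(step: int, num_iterations: int) -> float:
    """Hypothetical linear decay: 1.0 at step 0, 0.0 at the final step."""
    if num_iterations <= 0:
        raise ValueError("num_iterations must be positive")
    # Clamp so that out-of-range steps cannot push the multiplier outside [0, 1].
    progress = min(max(step, 0), num_iterations) / num_iterations
    # Returning `progress` alone would ramp up; the decaying form is 1 - progress.
    return 1.0 - progress


print([lr_multiplier(s, 100) for s in (0, 50, 100)])
```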
67aaca98f5 export NANOCHAT_BASE_DIR so child processes get it too
Andrej
2025-10-14 16:01:28 -07:00