Working through Stanford CS336 — Language Modeling from Scratch — in 33.6 days.

Documenting my progress below.

Timeline · Progress 3%
Tokenization & Data
Day 1 — Intro & Tokenization · Lecture Notes
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Transformer Architecture
Day 8
Day 9
Day 10
Day 11
Day 12
Day 13
Day 14
Training & Optimization
Day 15
Day 16
Day 17
Day 18
Day 19
Day 20
Day 21
Scaling & Evaluation
Day 22
Day 23
Day 24
Day 25
Day 26
Day 27
Day 28
Final Lap
Day 29
Day 30
Day 31
Day 32
Day 33
Day 34 — Sprint end (0.6 day)

Day 1 of 33.6
3.0 hours logged
🔥 1 day streak
1 note filed

Today · Lecture Notes

Intro & Tokenization

Tokenization · 3h

All Notes
Day 1 · Lecture Notes

Intro & Tokenization

Today I learned about Byte Pair Encoding (BPE), the algorithm used by most modern language models for tokenization.

Unicode handling: break text into bytes first, then merge.
Vocabulary size trade-offs: a larger vocabulary means shorter sequences but a larger embedding matrix.
Special tokens: [BOS], [EOS], [PAD], [UNK]

The basic algorithm:
1. Start with character-level…
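To make the "bytes first, then merging" idea concrete, here is a minimal sketch of byte-level BPE training. The note above is cut off, so this follows the commonly described procedure and may differ from the lecture's version. The names `train_bpe`, `merge_pair`, and `decode` are my own, the tie-breaking rule is an arbitrary choice made for determinism, and special tokens such as [BOS] or [PAD] are left out entirely.

```python
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`.

    Returns the tokenized text and the merge table {(left, right): new_id}.
    """
    ids = list(text.encode("utf-8"))  # base vocabulary: the 256 byte values
    merges = {}
    next_id = 256
    for _ in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        # Most frequent adjacent pair; ties go to the smaller pair (assumed rule).
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        if pair_counts[best] < 2:
            break  # nothing repeats, so further merges would not shorten the text
        ids = merge_pair(ids, best, next_id)
        merges[best] = next_id
        next_id += 1
    return ids, merges


def decode(ids, merges):
    """Expand token ids back to text by replaying merges in the order learned."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    ids, merges = train_bpe("aaabdaaabac", num_merges=3)
    print(ids)                  # token ids after three merges
    print(decode(ids, merges))  # round-trips to the original string
```

Each merge adds one entry to the vocabulary and shortens the sequence, which is the trade-off noted above: more merges mean fewer tokens per text but more rows in the embedding matrix.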

Tokenization · 3h