Chapter 5: Tokenization Explained from Scratch
Goal: Teach machines to read
Topics Covered:
- Why tokenization is required
- Character-level tokenization
- Word-level tokenization
- Subword tokenization
- Byte Pair Encoding (BPE), sketched in code below
- Trade-offs in vocabulary size
📌 Medium Post 5: Tokenization: Teaching Machines to Read
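To make the BPE topic concrete ahead of the full post, here is a minimal, illustrative sketch in plain Python. It is not the chapter's actual implementation; the helper names (`learn_bpe`, `get_pair_counts`, `merge_pair`) and the `</w>` end-of-word marker are assumptions of this example. The idea it shows is the core of BPE: start from character-level symbols and repeatedly merge the most frequent adjacent pair into a new subword unit.

```python
from collections import Counter

def get_pair_counts(tokenized):
    # Count adjacent symbol pairs across all words, weighted by word frequency.
    pairs = Counter()
    for symbols, freq in tokenized:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(tokenized, pair):
    # Replace every occurrence of `pair` with its concatenation.
    merged = pair[0] + pair[1]
    out = []
    for symbols, freq in tokenized:
        new_symbols, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                new_symbols.append(merged)
                i += 2
            else:
                new_symbols.append(symbols[i])
                i += 1
        out.append((new_symbols, freq))
    return out

def learn_bpe(word_freqs, num_merges):
    # Start from character-level symbols plus an end-of-word marker,
    # then repeatedly merge the most frequent adjacent pair.
    tokenized = [(list(word) + ["</w>"], freq) for word, freq in word_freqs.items()]
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(tokenized)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        tokenized = merge_pair(tokenized, best)
        merges.append(best)
    return merges

if __name__ == "__main__":
    corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(corpus, num_merges=10))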
Chapter 6: Vocabulary Design & Token IDs
Goal: Understand token spaces
Topics Covered:
- Vocabulary creation
- Special tokens (PAD, BOS, EOS)
- Unknown tokens
- Token frequency & pruning (see the code sketch below)
- Vocabulary size vs. model performance
📌 Medium Post 6: Designing a Vocabulary for LLMs
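As a preview of the vocabulary topics, here is a minimal sketch of building a token-to-ID mapping with special tokens, an unknown-token fallback, and frequency-based pruning. The token strings (`<pad>`, `<bos>`, `<eos>`, `<unk>`), the cutoff parameters, and the function names are assumptions of this example, not the chapter's actual design.

```python
from collections import Counter

# Special tokens, assumed here to occupy the lowest IDs in the vocabulary.
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]

def build_vocab(corpus_tokens, min_freq=2, max_size=None):
    # Keep tokens by descending frequency, pruning anything rarer than
    # `min_freq` or beyond the `max_size` budget; special tokens come first.
    counts = Counter(corpus_tokens)
    kept = [tok for tok, c in counts.most_common() if c >= min_freq]
    if max_size is not None:
        kept = kept[: max_size - len(SPECIAL_TOKENS)]
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + kept)}

def encode(tokens, vocab):
    # Map tokens to IDs, falling back to <unk> for out-of-vocabulary tokens,
    # and wrap the sequence in <bos> / <eos>.
    unk_id = vocab["<unk>"]
    return [vocab["<bos>"]] + [vocab.get(t, unk_id) for t in tokens] + [vocab["<eos>"]]

if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    vocab = build_vocab(corpus, min_freq=2)
    print(vocab)                                   # {'<pad>': 0, '<bos>': 1, ...}
    print(encode("the dog sat".split(), vocab))    # 'dog' (and the pruned 'sat') map to <unk>
```

Raising `min_freq` or lowering `max_size` prunes the tail of rare tokens, which is the vocabulary size vs. model performance trade-off this chapter discusses.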
