Foundations
What generative AI is, where it came from, and the text-handling primitives every model still depends on. Read these before anything else.
Generative AI didn't appear in 2022 with ChatGPT. It is the product of four decades of NLP, the 2017 transformer architecture, and the 2020 realization that model quality improves predictably with scale (scaling laws). These pages cover the prerequisites: what generative means, why it matters to engineers (not just researchers), and the text-preprocessing and vectorization steps that turn human language into something a model can manipulate.
Most engineers can skip the history and still call an API. But the mental model — tokens as the unit of work, embeddings as the geometry, attention as the routing — is what separates someone who uses an LLM from someone who debugs one.
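To make "attention as the routing" concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The 2-d vectors and the function names `softmax` and `attend` are illustrative assumptions; real models learn these vectors and run many attention heads over much larger dimensions.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Score each key by its (scaled) dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Made-up vectors for three tokens (assumption: purely illustrative).
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attend(query, keys, values)
print(weights)  # the first key matches the query best, so it gets the largest weight
print(output)
```

The routing intuition is in `weights`: the query pulls most of its output from the token whose key it matches best, and a little from the others.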
Key concepts
- Generative ≠ predictive — the model produces new outputs, not just classifications over known labels
- Tokens are the unit of work; everything is billed, attended to, and reasoned about per-token
- Embeddings are the geometry: nearby vectors mean nearby meaning
- The 2017 transformer paper rebuilt the field: nearly every modern large language model traces its lineage back to it
- Scale matters more than cleverness past a certain point — the bitter lesson, restated
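The "embeddings are the geometry" point above can be sketched with cosine similarity over a few hand-written vectors. The three-dimensional values below are made up for illustration (an assumption, not output from any real model), and `nearest` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (assumption: invented values, not from a trained model).
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def nearest(word):
    """Return the other word whose vector is closest by cosine similarity."""
    others = [w for w in toy_embeddings if w != word]
    return max(
        others,
        key=lambda w: cosine_similarity(toy_embeddings[word], toy_embeddings[w]),
    )

print(nearest("cat"))
print(cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"]))
print(cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"]))
```

With real embeddings the vectors have hundreds or thousands of dimensions, but the comparison is the same: nearby vectors are treated as nearby in meaning.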
Reference template
// Mental scaffolding before you read any model paper
1. What's the input? (text? image? audio?)
2. What's the unit? (token? pixel? frame? word piece?)
3. What's the objective? (next-token? masked? contrastive? denoising?)
4. What's the architecture? (encoder? decoder? both? diffusion?)
5. What's the scale? (params, data, compute — the cost shape)
6. What's it good at? (and what does it fail at out-of-distribution?)
Adapt to your problem; the structure is the load-bearing part.
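One way to keep the six questions honest is to record the answers as structured data. The sketch below is an assumption about how you might do that: `ModelProfile` and `unanswered` are hypothetical names, and the example is filled in for GPT-3 using publicly documented facts, with the last field left for your own notes.

```python
from dataclasses import dataclass, asdict

@dataclass
class ModelProfile:
    """One field per question in the template (hypothetical class name)."""
    input_modality: str
    unit: str
    objective: str
    architecture: str
    scale: str
    strengths_and_failures: str

gpt3_profile = ModelProfile(
    input_modality="text",
    unit="token (byte-pair-encoded word piece)",
    objective="next-token prediction",
    architecture="decoder-only transformer",
    scale="175B parameters",
    strengths_and_failures="fill in from the paper and your own evaluation",
)

def unanswered(profile):
    """List the questions that are still blank."""
    return [name for name, value in asdict(profile).items() if not value.strip()]

print(unanswered(gpt3_profile))
```

A blank field is a useful signal: if you cannot state the objective or the unit, you probably have not understood the paper yet.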
Common pitfalls
- Treating tokenization as a solved problem: tokens != words, and the splits matter for cost and behavior
- Confusing embeddings with hashes: embeddings carry meaning, hashes don't
- Assuming GPT-N is just a bigger GPT-(N-1): capability jumps are non-linear and sometimes qualitative
- Skipping the NLP history and missing that most 'new' ideas have older predecessors with different names
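The first pitfall is easy to see with a toy subword tokenizer. This is a greedy longest-match splitter over a tiny hand-written vocabulary, not BPE and not any real library's tokenizer; `TOY_VOCAB` and `toy_tokenize` are hypothetical names chosen for illustration.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of pieces.
TOY_VOCAB = {"token", "ization", "un", "believ", "able", "is", " "}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown characters become one-char tokens."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens

text = "tokenization is unbelievable"
words = text.split()
tokens = toy_tokenize(text)

print(len(words), words)
print(len(tokens), tokens)
```

Three words become eight tokens here. Real tokenizers split differently (many attach the leading space to the following piece), but the gap between word count and token count is the part that affects cost and context limits.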
Related topics
- What Is Generative AI?
The shift from discriminative to generative models: what changed between 2017's transformer paper and today's foundation-model era.
Concept Foundational
- Why Learn Generative AI
The engineer-shaped case for understanding generative models from first principles, not just calling APIs.
Concept Foundational
- The Emergence of NLP
From rule-based parsers to statistical methods to neural language models: the four decades that led to ChatGPT.
Concept Foundational
- Text Preprocessing Essentials
Tokenization, stemming, lemmatization, normalization. The unglamorous foundation under every text model.
Concept Foundational
- Vectorizing Language
From bag-of-words to word2vec to contextual embeddings. How text becomes math a model can manipulate.
Concept Foundational
- The Emergence of Generative AI
What changed in 2017 (attention), 2018 (GPT-1/BERT), 2020 (GPT-3 scale), and 2022 (ChatGPT, productization).
Concept Foundational
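As a preview of the preprocessing and vectorizing topics listed above, here is a minimal sketch of normalization followed by bag-of-words counting, using only the standard library. The function names and the two sample sentences are assumptions for illustration; the linked pages cover stemming, lemmatization, and learned embeddings, which this sketch does not attempt.

```python
import re
from collections import Counter

def normalize(text):
    """Lowercase, then replace anything that is not a-z, 0-9, or space."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower())

def bag_of_words(text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(normalize(text).split())

def to_vector(counts, vocabulary):
    """Lay the counts out in a fixed vocabulary order."""
    return [counts[word] for word in vocabulary]

docs = ["The cat sat.", "The cat saw the dog!"]
bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]

print(vocabulary)
print(vectors)
```

Bag-of-words throws away word order, which is the limitation that word2vec and contextual embeddings were later built to address.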