Foundations

What generative AI is, where it came from, and the text-handling primitives every model still depends on. Read these before anything else.


Generative AI didn't appear in 2022 with ChatGPT. It is the product of four decades of NLP, the 2017 transformer architecture, and the 2020 scaling-law realization that model quality improves predictably with parameters, data, and compute. These pages cover the prerequisites: what "generative" means, why it matters to engineers (not just researchers), and the text-preprocessing and vectorization steps that turn human language into something a model can manipulate.

Most engineers can skip the history and still call an API. But the mental model — tokens as the unit of work, embeddings as the geometry, attention as the routing — is what separates someone who uses an LLM from someone who debugs one.
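To make "attention as the routing" concrete, here is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The vectors are hand-picked toy values and the function names (`softmax`, `attend`) are illustrative assumptions, not any library's API; real models run this over learned, much higher-dimensional vectors in batched matrix form.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Each key is scored against the query; the scores become weights,
    and the output is the weighted average of the value vectors.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy 2-dimensional vectors, chosen by hand for illustration only.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

weights, output = attend(query, keys, values)
```

The key that points in the same direction as the query receives the largest weight, so its value dominates the output. That is the "routing": the query decides how much of each position's information flows through.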

Key concepts

Generative ≠ discriminative: the model produces new outputs, not just classifications over known labels
Tokens are the unit of work; everything is billed, attended to, and reasoned about per-token
Embeddings are the geometry: nearby vectors mean nearby meaning
The 2017 transformer paper rebuilt the field: nearly every modern large language model traces its lineage back to it
Scale matters more than cleverness past a certain point — the bitter lesson, restated
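
The "embeddings are the geometry" idea can be sketched with cosine similarity. The three-dimensional vectors below are invented for illustration (real embeddings are learned and have hundreds or thousands of dimensions), and `cosine_similarity` and `nearest` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-made "embeddings"; not produced by any real model.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.15],
    "invoice": [0.1, 0.2, 0.95],
}


def nearest(word, table):
    """Return the other word whose vector is most similar to `word`'s."""
    others = [w for w in table if w != word]
    return max(others, key=lambda w: cosine_similarity(table[word], table[w]))


closest_to_cat = nearest("cat", toy_embeddings)
```

With these toy numbers, "cat" sits far closer to "kitten" than to "invoice". Semantic search and retrieval build on the same comparison, applied to learned vectors.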

Reference template

// Mental scaffolding before you read any model paper
1. What's the input?         (text? image? audio?)
2. What's the unit?          (token? pixel? frame? word piece?)
3. What's the objective?     (next-token? masked? contrastive? denoising?)
4. What's the architecture?  (encoder? decoder? both? diffusion?)
5. What's the scale?         (params, data, compute — the cost shape)
6. What's it good at?        (and what does it fail at out-of-distribution?)

Adapt to your problem; the structure is the load-bearing part.
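
If you keep notes on several papers, the six questions above can be captured as a small record. This is one possible sketch, not a prescribed format: the `ModelNotes` class and its field names are hypothetical. The example entry uses publicly documented facts about GPT-3 and deliberately leaves the last question blank, to be filled in from the paper's own evaluation.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelNotes:
    """Hypothetical container mirroring the six template questions."""

    input_modality: str = ""          # 1. What's the input?
    unit: str = ""                    # 2. What's the unit?
    objective: str = ""               # 3. What's the objective?
    architecture: str = ""            # 4. What's the architecture?
    scale: str = ""                   # 5. What's the scale?
    strengths_and_failures: str = ""  # 6. What's it good at, and where does it fail?

    def unanswered(self):
        """List the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Example entry for GPT-3; question 6 is intentionally left open.
notes = ModelNotes(
    input_modality="text",
    unit="token",
    objective="next-token prediction",
    architecture="decoder-only transformer",
    scale="175B parameters",
)

remaining = notes.unanswered()
```

Calling `unanswered()` after a first read shows which questions the paper has not yet answered for you.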

Common pitfalls

Treating tokenization as a solved problem: tokens != words, and the splits matter for cost and behavior
Confusing embeddings with hashes — embeddings carry meaning; hashes don't
Assuming GPT-N is just bigger GPT-(N-1) — capability jumps are non-linear and sometimes qualitative
Skipping the NLP history and missing that most 'new' ideas have older predecessors with different names
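
The first pitfall is easy to see with a toy subword tokenizer. The sketch below uses greedy longest-match against a tiny invented vocabulary. It is a simplification for illustration: production tokenizers such as BPE or WordPiece learn their vocabularies from data and differ in detail, and `greedy_tokenize` and `TOY_VOCAB` are hypothetical names.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Characters not covered by the vocabulary fall back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append(text[i])  # no match: emit one character
            i += 1
    return tokens


# Invented vocabulary, for illustration only.
TOY_VOCAB = {"token", "ization", "un", "believ", "able", "is", " "}

text = "tokenization is unbelievable"
tokens = greedy_tokenize(text, TOY_VOCAB)
word_count = len(text.split())
token_count = len(tokens)
```

Here three words become eight tokens, and the split points fall inside words. Since usage is metered per token and the model only ever sees the pieces, a different vocabulary would change both the cost and what the model has to work with.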

Items (6)

What Is Generative AI?
The shift from discriminative to generative models — what changed between 2017's transformer paper and today's foundation-model era.

Concept Foundational
Why Learn Generative AI
The engineer-shaped case for understanding generative models from first principles, not just calling APIs.

Concept Foundational
The Emergence of NLP
From rule-based parsers to statistical methods to neural language models — the four decades that led to ChatGPT.

Concept Foundational
Text Preprocessing Essentials
Tokenization, stemming, lemmatization, normalization. The unglamorous foundation under every text model.

Concept Foundational
Vectorizing Language
From bag-of-words to word2vec to contextual embeddings. How text becomes math a model can manipulate.

Concept Foundational
The Emergence of Generative AI
What changed in 2017 (attention), 2018 (GPT-1/BERT), 2020 (GPT-3 scale), and 2022 (ChatGPT, productization).

Concept Foundational
