
Foundations

What generative AI is, where it came from, and the text-handling primitives every model still depends on. Read these before anything else.


Generative AI didn't appear in 2022 with ChatGPT. It is the product of decades of NLP research, a 2017 architectural breakthrough (the transformer), and the 2020 scaling-law results. These pages cover the prerequisites: what "generative" means, why it matters to engineers rather than researchers, and the text-preprocessing and vectorization steps that turn human language into something a model can manipulate.

Most engineers can skip the history and still call an API. But the mental model — tokens as the unit of work, embeddings as the geometry, attention as the routing — is what separates someone who uses an LLM from someone who debugs one.
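To make "attention as the routing" concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the transformer, for a single query in plain Python. The 2-d query, key, and value vectors are hypothetical, hand-picked numbers; real models learn these projections, use many heads, and work in hundreds of dimensions.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: list[float], keys: list[list[float]], values: list[list[float]]):
    """Scaled dot-product attention for one query vector.

    Each key is scored against the query (dot product / sqrt(d)), softmax
    turns the scores into weights, and the output is the weighted sum of
    the value vectors: the weights decide how much each position is
    "routed" into the result.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical vectors for three token positions.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]  # points toward the first key

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```

Because the query points toward the first key, most of the weight (and therefore most of the output) comes from the first position's value vector; change the query and the routing changes with it.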

Key concepts

  • Generative ≠ discriminative: the model produces new outputs, not just classifications over known labels
  • Tokens are the unit of work; everything is billed, attended to, and reasoned about per token
  • Embeddings are the geometry: nearby vectors mean similar meaning
  • The 2017 transformer paper ("Attention Is All You Need") rebuilt the field; most modern language models trace their lineage back to it
  • Scale matters more than cleverness past a certain point — the bitter lesson, restated
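A minimal sketch of "nearby vectors mean similar meaning", assuming hypothetical 3-dimensional embeddings written by hand (real embeddings come from a trained model and have hundreds or thousands of dimensions). Cosine similarity is one common closeness measure: it compares the direction of two vectors and ignores their length.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the two vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, hand-written so "cat" and "kitten" point the same way.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.05, 0.95],
}

for word in ("kitten", "invoice"):
    score = cosine_similarity(embeddings["cat"], embeddings[word])
    print(word, round(score, 3))
```

The related pair scores close to 1 and the unrelated pair scores much lower; semantic search and retrieval build on exactly this kind of comparison, just with learned vectors.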

Reference template

// Mental scaffolding before you read any model paper
1. What's the input?         (text? image? audio?)
2. What's the unit?          (token? pixel? frame? word piece?)
3. What's the objective?     (next-token? masked? contrastive? denoising?)
4. What's the architecture?  (encoder? decoder? both? diffusion?)
5. What's the scale?         (params, data, compute — the cost shape)
6. What's it good at?        (and what does it fail at out-of-distribution?)

Adapt to your problem; the structure is the load-bearing part.

Common pitfalls

  • Treating tokenization as a solved problem: tokens are not words, and how text is split matters for both cost and behavior
  • Confusing embeddings with hashes: embeddings carry meaning (similar inputs land close together); hashes don't
  • Assuming GPT-N is just a bigger GPT-(N-1): capability jumps can be non-linear and sometimes qualitative
  • Skipping the NLP history and missing that many 'new' ideas have older predecessors with different names
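To see why tokens are not words, here is a toy greedy longest-match subword tokenizer over a small hypothetical vocabulary. It is not any production tokenizer (real schemes such as BPE or WordPiece learn vocabularies of tens of thousands of pieces from data), but it shows how one word can become several tokens, and why token counts rather than word counts drive cost.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split a word into the longest vocabulary pieces, scanning left to right.

    Falls back to a single character when no longer piece matches,
    so any input can be tokenized.
    """
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest candidate first, shrinking until a match is found.
        end = len(word)
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        tokens.append(word[start:end])
        start = end
    return tokens


# Hypothetical vocabulary; a real one is learned from a corpus.
VOCAB = {"token", "ization", "un", "believ", "able", "the", "cat"}

text = "the cat tokenization unbelievable"
pieces = [tok for word in text.split() for tok in tokenize(word, VOCAB)]
print(pieces)
print(len(text.split()), "words ->", len(pieces), "tokens")
```

Four words become seven tokens here; common words survive whole while rarer ones fragment, which is why the same sentence can cost different amounts under different tokenizers.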


Items (6)

  • What Is Generative AI?

    The shift from discriminative to generative models — what changed between 2017's transformer paper and today's foundation-model era.

    Concept Foundational
  • Why Learn Generative AI

    The engineer-shaped case for understanding generative models from first principles, not just calling APIs.

    Concept Foundational
  • The Emergence of NLP

    From rule-based parsers to statistical methods to neural language models: the decades of work that led to ChatGPT.

    Concept Foundational
  • Text Preprocessing Essentials

    Tokenization, stemming, lemmatization, normalization. The unglamorous foundation under every text model.

    Concept Foundational
  • Vectorizing Language

    From bag-of-words to word2vec to contextual embeddings. How text becomes math a model can manipulate.

    Concept Foundational
  • The Emergence of Generative AI

    What changed in 2017 (the transformer's self-attention), 2018 (GPT-1/BERT), 2020 (GPT-3 scale), and 2022 (ChatGPT, productization).

    Concept Foundational
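The Text Preprocessing Essentials and Vectorizing Language items meet in the classic bag-of-words pipeline. A minimal sketch, assuming simple rules chosen for illustration: lowercase and strip punctuation (normalization), split on whitespace (tokenization), chop a few English suffixes (a crude, hypothetical stand-in for a real stemmer such as Porter's, not an implementation of it), then count terms against a shared vocabulary.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny suffix list


def preprocess(text: str) -> list[str]:
    # Normalization: lowercase, then replace anything that isn't a letter, digit, or space.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    stemmed = []
    for tok in text.split():  # tokenization: whitespace split
        # Crude stemming: strip the first matching suffix if at least 3 characters remain.
        for suf in SUFFIXES:
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    # Build a sorted shared vocabulary, then one count vector per document.
    processed = [preprocess(d) for d in docs]
    vocab = sorted({tok for doc in processed for tok in doc})
    vectors = []
    for doc in processed:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


docs = ["The model predicts tokens.", "Models predicted the next token!"]
vocab, vectors = bag_of_words(docs)
print(vocab)
for v in vectors:
    print(v)
```

Stemming collapses "models"/"model" and "predicts"/"predicted" onto shared columns, so the two sentences get nearly identical vectors. Word order is discarded entirely, which is the limitation word2vec and contextual embeddings were built to address.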