Generative AI Essentials
34 items across foundations, model architectures, the foundation-model lifecycle, application patterns, and forward-looking reflections. Each kind of writeup (Concept, Architecture, Application, Reflection) follows a consistent H2 template, so writeups are scannable across topics.
Foundations
6 items
What generative AI is, where it came from, and the text-handling primitives every model still depends on. Read these before anything else.
- What Is Generative AI?
The shift from discriminative to generative models — what changed between 2017's transformer paper and today's foundation-model era.
Concept Foundational
- Why Learn Generative AI
The engineer-shaped case for understanding generative models from first principles, not just calling APIs.
Concept Foundational
- The Emergence of NLP
From rule-based parsers to statistical methods to neural language models — the four decades that led to ChatGPT.
Concept Foundational
- Text Preprocessing Essentials
Tokenization, stemming, lemmatization, normalization. The unglamorous foundation under every text model.
Concept Foundational
- Vectorizing Language
From bag-of-words to word2vec to contextual embeddings. How text becomes math a model can manipulate.
Concept Foundational
- The Emergence of Generative AI
What changed in 2017 (attention), 2018 (GPT-1/BERT), 2020 (GPT-3 scale), and 2022 (ChatGPT, productization).
Concept Foundational
Architectures
8 items
The model architectures behind generative AI: RNN, LSTM, transformer, BERT, GPT, diffusion. Each writeup is a focused deep-dive on one design.
- Building Context with Neurons (RNNs)
Vanilla recurrent networks: sequential context, the gradient problem, why they fail past ~50 tokens.
Architecture Intermediate
- Reconstructing Context with Sequence Models (LSTM / GRU)
Gated memory cells. How LSTMs and GRUs extended the useful context window from tens to hundreds of tokens.
Architecture Intermediate
- Encoder-Decoder Framework
Sequence-to-sequence: an encoder compresses input to a fixed vector; a decoder generates output token-by-token. Translation's first real shot.
Architecture Intermediate
- Attention Is All You Need (Transformer)
The 2017 paper that rebuilt the field. Self-attention, positional encoding, parallel training, and why this killed RNNs for language.
Architecture Advanced
- Bidirectional Transformers (BERT)
Masked language modeling. How BERT became the encoder of choice for classification, retrieval, and ranking.
Architecture Advanced
- Generative Pretraining (GPT)
Causal language modeling at scale. The architectural choice that turned a language model into a general-purpose tool.
Architecture Advanced
- Diffusion Models
Iterative denoising as a generative process. The architecture under Stable Diffusion, DALL·E 2, and Sora.
Architecture Advanced
- Vision Models (CNN → ViT)
From convolutional layers to vision transformers. How images became sequences and joined the transformer party.
Architecture Advanced
Foundation Models
8 items
The lifecycle of a foundation model: pretraining, post-training, evaluation, and optimization for deployment. The model-as-a-system view, not the architecture view.
- What Are Foundation Models?
Large, broadly-pretrained models that serve as starting points for many downstream tasks. The reusable substrate of modern AI.
Concept Foundational
- How Do Models Learn?
Gradient descent, backpropagation, loss functions, and the optimization loop. The engine under every neural network.
Concept Foundational
- Pretraining Paradigms
Causal vs masked vs contrastive vs span-corruption. The objective you pick determines what the model is good at.
Concept Intermediate
- Post-Training, Fine-Tuning, and Adaptation
Supervised fine-tuning, RLHF, DPO, LoRA, prompt-tuning. How a pretrained model becomes a product.
Concept Intermediate
- Model Optimization for Deployment
Quantization, distillation, pruning, KV-cache reuse, speculative decoding. The serving-cost levers that decide unit economics.
Concept Advanced
- Large Language Models at Scale
Scaling laws, compute budgets, emergent capabilities, and the cost shape that determines who can train frontier models.
Concept Advanced
- Evaluating Large Language Models
Perplexity, MMLU, HumanEval, helpfulness ratings, holistic evals. Why all benchmarks are wrong and why you still need them.
Concept Intermediate
- Multimodal Models
Text + image + audio in one model. CLIP, Flamingo, Gemini, GPT-4o — how cross-modal alignment actually works.
Concept Advanced
Applications
8 items
What to build on top of foundation models: prompting, RAG, agents, and the modality-specific systems (text, image, audio, video).
- Prompt Engineering
Templates, role / system prompts, few-shot, chain-of-thought, and the prompt patterns that survive contact with production.
Application Foundational
- Retrieval-Augmented Generation (RAG)
Vector stores, chunking, hybrid retrieval, reranking, and the eval harness that tells you whether your RAG actually works.
Application Intermediate
- Autonomous AI Agents
Tool use, planning, memory, multi-step loops. What's hard about turning a language model into something that takes actions.
Application Advanced
- Text-to-Text Generation Systems
Summarization, translation, rewriting, structured extraction. The bread-and-butter applications and how they're served.
Application Intermediate
- Text-to-Image Generation Systems
From prompt to pixels: CLIP-guided diffusion, latent diffusion, ControlNet, the prompt-to-output pipeline at production scale.
Application Intermediate
- Text-to-Speech Generation Systems
Neural TTS, voice cloning, prosody, the streaming-audio pipeline. What real-time voice products are actually doing.
Application Intermediate
- Text-to-Video Generation Systems
Frame coherence, motion priors, and the compute shape that makes video generation orders of magnitude harder than image generation.
Application Advanced
- Audio and Music Generation
Raw-waveform vs spectrogram vs token-based audio models. How MusicLM, Suno, and Udio actually produce sound.
Application Intermediate
Future & Ethics
4 items
Forward-looking pieces: where the field is heading, what's getting harder, and the alignment / safety / hallucination problems that aren't going away.
- The Future of Generative AI
Where the field is heading in 2026: agents, reasoning, on-device, multimodality, and the compute wall everyone is staring at.
Reflection Foundational
- The Way Forward
What to learn next, in what order, and how to keep up when the field reinvents itself every six months.
Reflection Foundational
- AI Safety and Alignment
RLHF, constitutional AI, red-teaming, refusal training. The engineering practices behind not-shipping-something-harmful.
Reflection Intermediate
- Hallucinations and the Evaluation Problem
Why models confidently make things up, what causes it, what reduces it, and how to measure progress on a moving target.
Reflection Intermediate