Generative AI Essentials

34 items across foundations, model architectures, the foundation-model lifecycle, application patterns, and forward-looking reflections. Each kind of writeup (Concept, Architecture, Application, Reflection) follows a consistent H2 template so writeups are scannable across topics.

11 Foundational, 13 Intermediate, 10 Advanced, 5 topics

Foundations

6 items

What generative AI is, where it came from, and the text-handling primitives every model still depends on. Read these before anything else.

What Is Generative AI?
The shift from discriminative to generative models — what changed between 2017's transformer paper and today's foundation-model era.

Concept Foundational
Why Learn Generative AI
The engineer-shaped case for understanding generative models from first principles, not just calling APIs.

Concept Foundational
The Emergence of NLP
From rule-based parsers to statistical methods to neural language models — the four decades that led to ChatGPT.

Concept Foundational
Text Preprocessing Essentials
Tokenization, stemming, lemmatization, normalization. The unglamorous foundation under every text model.

Concept Foundational
Vectorizing Language
From bag-of-words to word2vec to contextual embeddings. How text becomes math a model can manipulate.

Concept Foundational
The Emergence of Generative AI
What changed in 2017 (the transformer), 2018 (GPT-1/BERT), 2020 (GPT-3 scale), and 2022 (ChatGPT, productization).

Concept Foundational

Architectures

8 items

The model architectures behind generative AI — RNN, LSTM, transformer, BERT, GPT, diffusion. Each writeup is a focused deep-dive on one design.

Building Context with Neurons (RNNs)
Vanilla recurrent networks: sequential context, the vanishing-gradient problem, and why they struggle past ~50 tokens.

Architecture Intermediate
Reconstructing Context with Sequence Models (LSTM / GRU)
Gated memory cells. How LSTMs and GRUs extended the useful context window from tens to hundreds of tokens.

Architecture Intermediate
Encoder-Decoder Framework
Sequence-to-sequence: an encoder compresses the input to a fixed-length vector; a decoder generates output token by token. Neural machine translation's first real shot.

Architecture Intermediate
Attention Is All You Need (Transformer)
The 2017 paper that rebuilt the field. Self-attention, positional encoding, parallel training, and why this killed RNNs for language.

Architecture Advanced
Bidirectional Transformers (BERT)
Masked language modeling. How BERT became the encoder of choice for classification, retrieval, and ranking.

Architecture Advanced
Generative Pretraining (GPT)
Causal language modeling at scale. The architectural choice that turned a language model into a general-purpose tool.

Architecture Advanced
Diffusion Models
Iterative denoising as a generative process. The architecture under Stable Diffusion, DALL·E 2, and Sora.

Architecture Advanced
Vision Models (CNN → ViT)
From convolutional layers to vision transformers. How images became sequences and joined the transformer party.

Architecture Advanced

Foundation Models

8 items

The lifecycle of a foundation model — pretraining, post-training, evaluation, optimization for deployment. The model-as-a-system view, not the architecture view.

What Are Foundation Models?
Large, broadly-pretrained models that serve as starting points for many downstream tasks. The reusable substrate of modern AI.

Concept Foundational
How Do Models Learn?
Gradient descent, backpropagation, loss functions, and the optimization loop. The engine under every neural network.

Concept Foundational
Pretraining Paradigms
Causal vs masked vs contrastive vs span-corruption. The objective you pick determines what the model is good at.

Concept Intermediate
Post-Training, Fine-Tuning, and Adaptation
Supervised fine-tuning, RLHF, DPO, LoRA, prompt-tuning. How a pretrained model becomes a product.

Concept Intermediate
Model Optimization for Deployment
Quantization, distillation, pruning, KV-cache reuse, speculative decoding. The serving-cost levers that decide unit economics.

Concept Advanced
Large Language Models at Scale
Scaling laws, compute budgets, emergent capabilities, and the cost shape that determines who can train frontier models.

Concept Advanced
Evaluating Large Language Models
Perplexity, MMLU, HumanEval, helpfulness ratings, holistic evals. Why all benchmarks are wrong and you still need them.

Concept Intermediate
Multimodal Models
Text + image + audio in one model. CLIP, Flamingo, Gemini, GPT-4o — how cross-modal alignment actually works.

Concept Advanced

Applications

8 items

What to build on top of foundation models — prompting, RAG, agents, and the modality-specific systems (text, image, audio, video).

Prompt Engineering
Templates, role / system prompts, few-shot, chain-of-thought, and the prompt patterns that survive contact with production.

Application Foundational
Retrieval-Augmented Generation (RAG)
Vector stores, chunking, hybrid retrieval, reranking, and the eval harness that tells you whether your RAG actually works.

Application Intermediate
Autonomous AI Agents
Tool use, planning, memory, multi-step loops. What's hard about turning a language model into something that takes actions.

Application Advanced
Text-to-Text Generation Systems
Summarization, translation, rewriting, structured extraction. The bread-and-butter applications and how they're served.

Application Intermediate
Text-to-Image Generation Systems
From prompt to pixels: CLIP-guided diffusion, latent diffusion, ControlNet, the prompt-to-output pipeline at production scale.

Application Intermediate
Text-to-Speech Generation Systems
Neural TTS, voice cloning, prosody, the streaming-audio pipeline. What real-time voice products are actually doing.

Application Intermediate
Text-to-Video Generation Systems
Frame coherence, motion priors, and the compute shape that makes video generation orders-of-magnitude harder than images.

Application Advanced
Audio and Music Generation
Raw-waveform vs spectrogram vs token-based audio models. How MusicLM, Suno, and Udio actually produce sound.

Application Intermediate

Future & Ethics

4 items

Forward-looking pieces — where the field is heading, what's getting harder, and the alignment / safety / hallucination problems that aren't going away.

The Future of Generative AI
Where the field is heading in 2026: agents, reasoning, on-device, multimodality, and the compute wall everyone is staring at.

Reflection Foundational
The Way Forward
What to learn next, in what order, and how to keep up when the field reinvents itself every six months.

Reflection Foundational
AI Safety and Alignment
RLHF, constitutional AI, red-teaming, refusal training. The engineering practices behind not-shipping-something-harmful.

Reflection Intermediate
Hallucinations and the Evaluation Problem
Why models confidently make things up, what causes it, what reduces it, and how to measure progress on a moving target.

Reflection Intermediate