Applications

What to build on top of foundation models — prompting, RAG, agents, and the modality-specific systems (text, image, audio, video).

8 items: 1 Foundational, 5 Intermediate, 2 Advanced

Applications are where most engineers will spend their time. The architecture choices are mostly already made; the question is how to compose a useful product out of a foundation model you didn't train. Prompt engineering is the smallest unit. RAG is what you reach for when the model doesn't know your domain. Agents are what you build when a single model call isn't enough.

The modality systems (text-to-image, text-to-speech, text-to-video, and audio/music generation) each have their own production gotchas. We cover them as application patterns, not as model architectures; those live in the Architectures topic.

Key concepts

Prompt → RAG → fine-tune → agents: a rough escalation ladder of complexity and cost
RAG isn't "adding context" — it's a retrieval system whose quality determines the model's perceived intelligence
Agents are loops with state and tools — the loop control is the hard part, not the tool calls
Streaming dominates perceived latency in user-facing products (time to first token matters more than total generation time)
Eval at the application layer is harder than at the model layer — the surface area is bigger
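The claim that loop control, not the tool call, is the hard part can be made concrete with a minimal agent loop. This is a sketch under stated assumptions: the model is a hypothetical stub (`fake_model`) that returns either a tool call or a final answer, and `Step`, `run_agent`, and the `calc` tool are illustrative names, not any framework's API. What it shows is the control logic: state carried in a transcript, a step budget, errors fed back to the model, and an explicit stop reason.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model decision: either call a tool or return a final answer."""
    tool: Optional[str] = None     # name of the tool to call, if any
    arg: str = ""                  # single string argument for the tool
    answer: Optional[str] = None   # final answer; when set, the loop stops


@dataclass
class AgentResult:
    answer: Optional[str]
    steps_used: int
    stopped_reason: str
    transcript: list = field(default_factory=list)


def run_agent(model: Callable[[list], Step], tools: dict, task: str,
              max_steps: int = 5) -> AgentResult:
    # State: the transcript the model sees on every turn.
    transcript = [("user", task)]
    for step_num in range(1, max_steps + 1):
        step = model(transcript)
        if step.answer is not None:
            # Explicit stop condition: the model produced a final answer.
            return AgentResult(step.answer, step_num, "answered", transcript)
        if step.tool not in tools:
            # Feed the error back instead of crashing; the model may recover.
            transcript.append(("error", f"unknown tool: {step.tool}"))
            continue
        observation = tools[step.tool](step.arg)
        transcript.append(("tool", f"{step.tool}({step.arg}) -> {observation}"))
    # Budget exhausted: stop rather than loop forever.
    return AgentResult(None, max_steps, "step_budget_exhausted", transcript)


# --- Hypothetical stand-ins for a real model and a real tool ---
def fake_model(transcript: list) -> Step:
    # Calls the calculator once, then answers with the last observation.
    last_role, last_text = transcript[-1]
    if last_role == "tool":
        return Step(answer=last_text.split("-> ")[-1])
    return Step(tool="calc", arg="6*7")


def calc(expr: str) -> str:
    # Toy calculator: only supports "a*b" with integers.
    a, b = expr.split("*")
    return str(int(a) * int(b))


if __name__ == "__main__":
    result = run_agent(fake_model, {"calc": calc}, "What is 6*7?")
    print(result.answer, result.steps_used, result.stopped_reason)
```

Running it prints `42 2 answered`. Feeding the unknown-tool error back into the transcript, instead of raising, gives the model a chance to recover; the step budget is what guarantees termination when it doesn't.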

Reference template

// Application-template H2 structure
## Use cases
## System overview
## Key components
## Implementation patterns
## Trade-offs
## Quality and evaluation
## Common pitfalls
## Related applications

Adapt the section contents to your problem; the structure is the load-bearing part.

Common pitfalls

Reaching for fine-tuning before exhausting prompting and RAG — fine-tuning is expensive and brittle
Building agents on weak loops — most "agent" failures are loop-control bugs, not model failures
Skipping eval pipelines — without them, every prompt change is a coin flip in production
Underestimating UX (streaming, partial output, and edit affordances often matter more than raw quality)
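The "coin flip" pitfall is what a regression gate prevents. Below is a minimal sketch, assuming a fixed case set, a deterministic substring scorer, and two hypothetical prompt versions (`prompt_v1`, `prompt_v2`) that stand in for real model calls; `run_eval` and `gate` are illustrative names. In practice the scorer might be exact match, a rubric, or an LLM judge.

```python
from typing import Callable

# Each case: (input, substring a correct output must contain).
CASES = [
    ("Summarize: The meeting moved to Friday.", "friday"),
    ("Summarize: Revenue grew 10% in Q3.", "10%"),
    ("Summarize: The launch is delayed to May.", "may"),
]


def contains_expected(output: str, expected: str) -> bool:
    # Cheap, deterministic scorer: case-insensitive substring match.
    return expected.lower() in output.lower()


def run_eval(generate: Callable[[str], str], cases=CASES) -> float:
    """Return the fraction of cases whose output passes the scorer."""
    passed = sum(contains_expected(generate(inp), exp) for inp, exp in cases)
    return passed / len(cases)


def gate(baseline: Callable[[str], str], candidate: Callable[[str], str],
         tolerance: float = 0.0) -> bool:
    """Accept the candidate only if it doesn't regress beyond `tolerance`."""
    return run_eval(candidate) >= run_eval(baseline) - tolerance


# --- Hypothetical stand-ins for two prompt versions ---
def prompt_v1(text: str) -> str:
    # Returns the body after "Summarize: " unchanged (keeps every fact).
    return text.split(": ", 1)[1]


def prompt_v2(text: str) -> str:
    # Over-aggressive "summary" that keeps only the first three words.
    return " ".join(text.split(": ", 1)[1].split()[:3])


if __name__ == "__main__":
    print(run_eval(prompt_v1), run_eval(prompt_v2), gate(prompt_v1, prompt_v2))
```

Here `gate(prompt_v1, prompt_v2)` returns `False`: the candidate passes one of three cases against the baseline's three, so the change is blocked before it reaches production.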

Items (8)

Prompt Engineering
Templates, role/system prompts, few-shot, chain-of-thought, and the prompt patterns that survive contact with production.
Level: Foundational

Retrieval-Augmented Generation (RAG)
Vector stores, chunking, hybrid retrieval, reranking, and the eval harness that tells you whether your RAG actually works.
Level: Intermediate

Autonomous AI Agents
Tool use, planning, memory, multi-step loops. What's hard about turning a language model into something that takes actions.
Level: Advanced

Text-to-Text Generation Systems
Summarization, translation, rewriting, structured extraction. The bread-and-butter applications and how they're served.
Level: Intermediate

Text-to-Image Generation Systems
From prompt to pixels: CLIP-guided diffusion, latent diffusion, ControlNet, and the prompt-to-output pipeline at production scale.
Level: Intermediate

Text-to-Speech Generation Systems
Neural TTS, voice cloning, prosody, and the streaming-audio pipeline. What real-time voice products are actually doing.
Level: Intermediate

Text-to-Video Generation Systems
Frame coherence, motion priors, and the compute shape that makes video generation orders of magnitude harder than image generation.
Level: Advanced

Audio and Music Generation
Raw-waveform vs spectrogram vs token-based audio models. How MusicLM, Suno, and Udio actually produce sound.
Level: Intermediate
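The RAG item lists chunking, hybrid retrieval, and reranking. Below is a minimal, dependency-free sketch of the first two, assuming crude stand-ins for the real components: a distinct-term overlap score in place of BM25, bag-of-words cosine similarity in place of dense embeddings, and reciprocal rank fusion (RRF, with the commonly used constant 60) to merge the two rankings. All function names are hypothetical; a production system would use a real lexical index, an embedding model with a vector store, and typically a reranker on the fused top-k.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Fixed-size word windows that overlap by `overlap` words (size > overlap)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def keyword_score(query: list[str], doc: list[str]) -> float:
    # Lexical signal (stand-in for BM25): number of distinct query terms present.
    return float(len(set(query) & set(doc)))


def cosine(query: list[str], doc: list[str]) -> float:
    # "Dense" signal (stand-in for embeddings): cosine over raw term counts.
    q, d = Counter(query), Counter(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def rank(scores: list[float]) -> list[int]:
    # Chunk indices sorted best-first (stable for ties).
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def hybrid_search(query: str, chunks: list[str], k: int = 3,
                  rrf_k: int = 60) -> list[str]:
    q = tokenize(query)
    docs = [tokenize(c) for c in chunks]
    rankings = [
        rank([keyword_score(q, d) for d in docs]),
        rank([cosine(q, d) for d in docs]),
    ]
    # Reciprocal rank fusion: each ranking contributes 1 / (rrf_k + position).
    fused = Counter()
    for ranking in rankings:
        for position, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (rrf_k + position)
    best = sorted(fused, key=lambda i: -fused[i])[:k]
    return [chunks[i] for i in best]
```

RRF merges by rank position rather than raw score, which is why it can combine a count-based lexical score and a cosine similarity without normalizing them onto a common scale.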
