Applications
What to build on top of foundation models — prompting, RAG, agents, and the modality-specific systems (text, image, audio, video).
Applications are where most engineers will spend their time. The architecture choices are mostly already made; the question is how to compose a useful product out of a foundation model you didn't train. Prompt engineering is the smallest unit. RAG is what you reach for when the model doesn't know your domain. Agents are what you build when one shot isn't enough.
The modality systems (text-to-image, text-to-speech, text-to-video, and audio) each have their own production gotchas. We cover them as application patterns, not as model architectures (those live in the Architectures topic).
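To make the RAG step from the overview concrete, here is a minimal sketch of "retrieve, then assemble a prompt". It assumes a toy bag-of-words cosine similarity in place of a real embedding model and vector store; `DOCS`, `retrieve`, and `build_prompt` are hypothetical names invented for illustration, and no model call is made.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (toy scorer)."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt; the wording is an assumption, not a standard."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical domain documents the base model would not know about.
DOCS = [
    "Refunds are processed within 5 business days of approval.",
    "Our office is closed on public holidays.",
    "Refunds need an order number and a reason.",
]

if __name__ == "__main__":
    question = "How long do refunds take?"
    print(build_prompt(question, retrieve(question, DOCS)))
```

Whatever `retrieve` returns is all the model gets to see, which is why retrieval quality tends to dominate answer quality in this kind of system.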
Key concepts
- Prompt → RAG → fine-tune → agents: a rough escalation ladder of complexity and cost
- RAG isn't "adding context" — it's a retrieval system whose quality determines the model's perceived intelligence
- Agents are loops with state and tools — the loop control is the hard part, not the tool calls
- Streaming dominates perceived latency in user-facing products: time to first token matters more than total generation time
- Eval at the application layer is harder than at the model layer — the surface area is bigger
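The "loops with state and tools" point above can be sketched as follows. This is an illustrative skeleton under stated assumptions, not a reference implementation: `AgentState`, `TOOLS`, `run_agent`, and `scripted_policy` are hypothetical names, and the policy function stands in for a real model call. Most of the code is loop control (a step budget, repeated-action detection, unknown-tool handling) rather than the tool call itself.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    goal: str
    # Each entry is (tool_name, argument, observation).
    history: list[tuple[str, str, str]] = field(default_factory=list)


# Hypothetical tool registry; a real system would expose search, code execution, etc.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

Policy = Callable[[AgentState], tuple[str, str]]


def run_agent(policy: Policy, goal: str, max_steps: int = 5) -> tuple[str, AgentState]:
    """Run the act/observe loop until the policy finishes or a guard trips."""
    state = AgentState(goal)
    for _ in range(max_steps):
        action, arg = policy(state)
        if action == "finish":
            return arg, state
        # Guard: stop if the policy repeats an action it has already tried.
        seen = {(tool, a) for tool, a, _ in state.history}
        if (action, arg) in seen:
            return "stopped: repeated action", state
        tool = TOOLS.get(action)
        # Guard: surface unknown tools as observations instead of crashing.
        observation = tool(arg) if tool else f"error: unknown tool {action!r}"
        state.history.append((action, arg, observation))
    return "stopped: step budget exhausted", state


def scripted_policy(state: AgentState) -> tuple[str, str]:
    """Stand-in for a model: call one tool, then finish with its result."""
    if not state.history:
        return "add", "2+3"
    return "finish", state.history[-1][2]


if __name__ == "__main__":
    answer, final_state = run_agent(scripted_policy, "What is 2+3?")
    print(answer, final_state.history)
```

Swapping `scripted_policy` for a policy that never emits `finish` exercises the guards, which is usually where real agent bugs show up.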
Reference template
// Application-template H2 structure
## Use cases
## System overview
## Key components
## Implementation patterns
## Trade-offs
## Quality and evaluation
## Common pitfalls
## Related applications
Adapt to your problem; the structure is the load-bearing part.
Common pitfalls
- Reaching for fine-tuning before exhausting prompting and RAG — fine-tuning is expensive and brittle
- Building agents on weak loops — most "agent" failures are loop-control bugs, not model failures
- Skipping eval pipelines — without them, every prompt change is a coin flip in production
- Underestimating UX: streaming, partial output, and edit affordances often matter more than raw quality
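For the eval-pipeline pitfall above, a minimal regression gate might look like the sketch below. It is an assumption-laden illustration: `fake_model` is a deterministic stand-in for a real model call, `CASES` is an invented three-item test set, and substring matching is the simplest possible grader, not a recommended one.

```python
from typing import Callable

# (question, substring expected somewhere in the output); hypothetical cases.
CASES: list[tuple[str, str]] = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),
]

ANSWERS = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}


def fake_model(prompt: str) -> str:
    """Stand-in model that only answers when the prompt uses a 'Q:' marker,
    mimicking how real models can be sensitive to prompt format."""
    if "Q:" not in prompt:
        return "I'm not sure what you are asking."
    question = prompt.split("Q:", 1)[1].strip()
    return ANSWERS.get(question, "unknown")


def pass_rate(generate: Callable[[str], str], template: str,
              cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = 0
    for question, expected in cases:
        output = generate(template.format(question=question))
        passed += expected.lower() in output.lower()
    return passed / len(cases)


def gate(baseline: float, candidate: float, tolerance: float = 0.0) -> bool:
    """Allow a prompt change only if it does not regress beyond the tolerance."""
    return candidate >= baseline - tolerance


if __name__ == "__main__":
    baseline = pass_rate(fake_model, "Q: {question}", CASES)
    candidate = pass_rate(fake_model, "Please answer: {question}", CASES)
    print(f"baseline={baseline:.2f} candidate={candidate:.2f} "
          f"ship={gate(baseline, candidate)}")
```

Running a check like this on every prompt change turns the "coin flip" into a measured comparison against a fixed baseline.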
Related topics
- Prompt Engineering
Templates, role / system prompts, few-shot, chain-of-thought, and the prompt patterns that survive contact with production.
Application Foundational
- Retrieval-Augmented Generation (RAG)
Vector stores, chunking, hybrid retrieval, reranking, and the eval harness that tells you whether your RAG actually works.
Application Intermediate
- Autonomous AI Agents
Tool use, planning, memory, multi-step loops. What's hard about turning a language model into something that takes actions.
Application Advanced
- Text-to-Text Generation Systems
Summarization, translation, rewriting, structured extraction. The bread-and-butter applications and how they're served.
Application Intermediate
- Text-to-Image Generation Systems
From prompt to pixels: CLIP-guided diffusion, latent diffusion, ControlNet, the prompt-to-output pipeline at production scale.
Application Intermediate
- Text-to-Speech Generation Systems
Neural TTS, voice cloning, prosody, the streaming-audio pipeline. What real-time voice products are actually doing.
Application Intermediate
- Text-to-Video Generation Systems
Frame coherence, motion priors, and the compute shape that makes video generation orders-of-magnitude harder than images.
Application Advanced
- Audio and Music Generation
Raw-waveform vs spectrogram vs token-based audio models. How MusicLM, Suno, and Udio actually produce sound.
Application Intermediate