AI runs work, not just writes code. How to build systems that compound context instead of losing it.
AI coding discourse is stuck between two extremes. On one side: 'AI will replace developers in 3 months.' On the other: 'AI-generated code is always garbage.' Both miss the practical middle where AI actually helps - not by writing code for you, but by running structured work for you.
This article fills that gap. For strategic AI preparation (contracts, tokens, documentation), see Design Systems in the AI Era. For tactical AI workflows (prompts, validation gates), see AI in Design Systems: What Actually Works. This article covers the orchestration layer - how to build AI tools that compound work instead of losing context.
Boris Cherny's workflow reveal showed what's possible with Claude Code. But that reveal was a destination, not a map. This is the map - practical patterns from building and using AI orchestration systems daily.
Long AI sessions degrade quality. This is not opinion - it is measurable. At around 30% context window usage, output quality starts to decline. Past 70%, outputs become rushed, minimal, and prone to shortcuts.
Context Window Usage vs. Output Quality

```
0-30%:  [████████████████████] Peak quality
30-50%: [████████████████    ] Good quality
50-70%: [████████████        ] Declining
70-85%: [████████            ] Rushed/minimal
85%+:   [████                ] Error-prone
```

I learned this the hard way. Multi-hour sessions that started brilliantly ended with sloppy implementations. The fix is not 'shorter sessions' - it is fresh context windows for each unit of work. Multi-agent architectures where each agent gets a clean slate.
The insight: treat AI context like memory. Long-running processes accumulate garbage. Fresh processes start clean. The orchestration layer manages this - spawning fresh agents for each task, passing only relevant context forward.
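The pattern can be sketched in a few lines. This is illustrative pseudocode in TypeScript, not a real API: `runAgent` is a hypothetical stand-in for whatever spawns a fresh AI session, and the point is the shape of the loop - one fresh agent per task, carrying forward only the context that task needs.

```typescript
// Hypothetical orchestrator: each task gets a fresh agent with only
// the context it needs, never the accumulated session history.
interface Task {
  id: string;
  prompt: string;
  // Only the facts this task needs - not the whole conversation.
  context: string[];
}

// Stand-in for spawning a fresh AI session (e.g. a new Claude Code run).
async function runAgent(prompt: string, context: string[]): Promise<string> {
  return `result of: ${prompt}`; // placeholder for real agent output
}

async function orchestrate(tasks: Task[]): Promise<string[]> {
  const results: string[] = [];
  for (const task of tasks) {
    // Fresh context per task: pass forward only what matters.
    const output = await runAgent(task.prompt, task.context);
    results.push(output);
  }
  return results;
}
```

The design choice that matters here is sequential execution with an explicit, minimal `context` array per task - garbage never accumulates because there is no long-lived session to accumulate it in.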
Stop thinking 'AI writes code for me.' Start thinking 'AI executes structured work for me.' The shift is subtle but fundamental. Code generation is a side effect of work execution, not the primary goal.
This reframes everything. Instead of prompting 'write a login component,' you define a task: 'Implement login form with email/password validation, submit handler, and error states. Verify: component renders, form validates, submit fires callback.'
The AI executes the task. Code is produced. But the task definition includes verification, so quality is built in. The orchestration layer chains these tasks, passes context between them, and commits each one atomically.
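In code terms, a task is a contract: an action plus machine-checkable done conditions. A minimal sketch (the names are illustrative, not a real framework):

```typescript
// A task pairs what to do with how to know it is done.
interface TaskDefinition {
  action: string;               // what to do
  verify: Array<() => boolean>; // done conditions, each must pass
}

function isComplete(task: TaskDefinition): boolean {
  // A task is done only when every verification check passes.
  return task.verify.every((check) => check());
}
```

The checks are what make chaining safe: the orchestration layer never passes work forward until `isComplete` returns true.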
Every Claude Code session loads your project's CLAUDE.md file as additional context. This is your project-level system prompt - persistent instructions that shape AI behavior across all interactions.
Most teams underuse this. They add 'this is a React project' and call it done. The opportunity is much larger.
# CLAUDE.md
## Project Overview
[One paragraph: what this is, who it is for, what success looks like]
## Development Commands
```bash
npm run dev # Start dev server
npm run build # Production build
npm run lint # ESLint
npm run typecheck # TypeScript check
npm run test # Run tests
```
## Architecture
[Key patterns, file organization, data flow]
- Content lives in /content/ as MDX
- Components in /components/ui/
- Data fetching in /lib/
## Key Patterns
[Conventions AI should follow]
- Use existing components before creating new ones
- All components need TypeScript props interfaces
- Tests go in __tests__ folders next to source
## Key Files
[Important files to reference]
| File | Purpose |
|------|---------|
| lib/site-data.ts | All structured content |
| source.config.ts | Content schemas |
| tailwind.config.ts | Design tokens |

The structure matters. Commands tell AI how to verify work. Architecture constrains where things go. Patterns prevent AI from inventing new conventions. Key files point to reference material.
Update CLAUDE.md as your project evolves. Stale instructions produce stale outputs. When you add a new pattern, document it. When you deprecate something, remove it. CLAUDE.md is living documentation for AI.
CLAUDE.md is project-level. Skills are workflow-level - reusable instruction sets that can be invoked on demand. Think of them as function definitions for AI behavior.
# /skill:create-component
## Purpose
Create a new React component following project conventions.
## Inputs Required
- Component name (PascalCase)
- Component type (ui | feature | layout)
- Props interface
## Process
1. Create component file in correct location
2. Add TypeScript props interface
3. Implement component with Tailwind classes
4. Add to component index export
5. Create basic Storybook story
## Output
- Component file created
- Exported from index
- Story added
- Ready for customization
## Verification
- File exists at correct path
- TypeScript compiles
- Story renders in Storybook

Invoke with /skill:create-component and the skill instructions shape the entire interaction. The AI knows what inputs it needs, what process to follow, and how to verify completion.
Skills compose into workflows. A /skill:feature-complete might chain /skill:create-component + /skill:add-tests + /skill:update-docs. Each skill handles its domain; the workflow skill orchestrates them.
The power is in versioning and sharing. Skills live in your repo. They evolve with your project. New team members get the same AI behavior from day one. No prompt engineering - just documented procedures.
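The function analogy holds up literally. Here is a sketch of skills as functions and a workflow as their composition - the skill names mirror the examples above, but the bodies are placeholders, not real implementations:

```typescript
// A skill takes named inputs and reports what it produced.
type Skill = (input: Record<string, string>) => string[];

// Individual skills each handle one domain (illustrative bodies only).
const createComponent: Skill = ({ name }) => [`created ${name}.tsx`];
const addTests: Skill = ({ name }) => [`added __tests__/${name}.test.tsx`];
const updateDocs: Skill = ({ name }) => [`documented ${name}`];

// A workflow skill chains them, collecting each step's output.
function featureComplete(input: Record<string, string>): string[] {
  return [createComponent, addTests, updateDocs].flatMap((s) => s(input));
}
```

A workflow skill adds no behavior of its own; it only sequences other skills, which keeps each skill independently testable and reusable.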
Every task needs a done condition. Without verification, AI completes tasks in whatever way seems reasonable - which may not be correct.
## Task: Add user authentication
### Action
Implement login form with:
- Email and password fields
- Client-side validation
- Submit handler calling /api/auth/login
- Error state display
- Loading state during submission
### Verify
- [ ] Component renders without errors
- [ ] Empty submission shows validation errors
- [ ] Valid submission calls API endpoint
- [ ] API error displays in form
- [ ] Loading state prevents double-submit
### Done
Form component exists, all verification checks pass, ready for integration.

The verification checklist is not optional decoration. It is the contract. AI execution must satisfy these conditions. If verification fails, the task is not complete.
Atomic commits make this traceable. Each task produces one commit. Git history becomes a log of AI-assisted work. When something breaks, git bisect finds the exact task. When you need context, the commit message explains what was intended.
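One way to enforce the one-task-one-commit rule is to gate the commit on verification. A sketch - `commit` here is a hypothetical callback standing in for a real `git commit`:

```typescript
// Outcome of executing a task, after running its verification checks.
interface TaskResult {
  task: string;
  verified: boolean;
}

// Commit only when verification passed; otherwise the task is not done
// and nothing enters git history.
function commitIfVerified(
  result: TaskResult,
  commit: (message: string) => void
): boolean {
  if (!result.verified) return false;
  commit(`task: ${result.task}`);
  return true;
}
```

Because unverified work never commits, every entry in git history represents a task that passed its contract - which is what makes `git bisect` over AI-assisted work meaningful.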
Honest failures make patterns credible. Here is what I tried that did not work.
Long-running sessions for complex features. The plan: let AI handle an entire feature over several hours. The result: declining quality, accumulating confusion, outputs that contradicted earlier decisions. The AI was not getting worse - it was drowning in context. Fresh agents per task fixed this.
Over-specified tasks. The plan: detailed 500-word task descriptions for every step. The result: AI followed instructions literally, missing obvious improvements. Rigid specifications killed judgment. Better: clear objectives with verification, letting AI make implementation decisions.
Under-specified tasks. The opposite failure: 'implement user auth' with no constraints. The AI made reasonable choices that did not fit the project. Added its own patterns. Chose libraries we did not want. Balance: specify constraints and verification, leave implementation flexible.
Clever meta-prompting. The plan: prompts that generated prompts that generated code. The result: complexity without benefit. Each layer added interpretation errors. Simple, direct instructions outperformed clever indirection.
Trying to maintain state across sessions. The plan: detailed handoff documents between sessions. The result: the handoff was never complete enough. Fresh context + task definitions work better than trying to transfer state. Context is cheap. Let each agent read what it needs.
Aggressive parallelization. The plan: run 10 AI tasks in parallel. The result: conflicts, merge nightmares, agents making incompatible decisions. Sequential execution with fresh contexts per task is slower but dramatically more reliable.
Start with this structure. Customize for your project.
# CLAUDE.md
## Project Overview
[What is this? Who uses it? One paragraph max.]
## Development Commands
```bash
npm run dev # Start dev server at localhost:3000
npm run build # Production build
npm run lint # ESLint check
npm run typecheck # TypeScript check
npm run test # Run test suite
```
## Architecture
[How is the code organized? 3-5 bullet points.]
- Source code in /src
- Components in /components with colocated styles
- API routes in /app/api
- Shared utilities in /lib
## Key Patterns
[What conventions should AI follow? Be specific.]
- Use existing UI components from /components/ui
- All props need TypeScript interfaces
- No inline styles - use Tailwind classes
- Tests in __tests__ folder next to source
## Key Files
| File | Purpose |
|------|---------|
| /lib/constants.ts | App-wide constants |
| /types/index.ts | Shared TypeScript types |
| /components/ui/index.ts | UI component exports |
## Environment Variables
[What is needed to run locally?]
```
DATABASE_URL # Postgres connection
NEXT_PUBLIC_URL # Base URL for client
```Organize skills in a known location. This structure works:
```
.claude/
  skills/
    create-component.md
    add-tests.md
    update-docs.md
    fix-bug.md
  workflows/
    feature-complete.md   # chains skills
    bug-fix-cycle.md      # chains skills
  hooks/
    pre-commit.md         # runs before commit
    post-task.md          # runs after each task
```

Each skill file follows the same structure: Purpose, Inputs, Process, Output, Verification. Consistency makes them predictable.
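That consistency is also checkable mechanically. A small sketch - a hypothetical helper, with section names taken from the five-part structure above - that reports which required sections a skill file is missing:

```typescript
// The five sections every skill file must contain.
const REQUIRED_SECTIONS = ["Purpose", "Inputs", "Process", "Output", "Verification"];

function missingSections(skillMarkdown: string): string[] {
  // A section counts as present if a "## Heading" line exists for it.
  return REQUIRED_SECTIONS.filter(
    (s) => !new RegExp(`^## ${s}`, "m").test(skillMarkdown)
  );
}
```

A check like this could run as a pre-commit hook over `.claude/skills/`, so a malformed skill never lands in the repo.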
Before marking any AI task complete, verify:
**Functional:**
- [ ] Feature works as specified
- [ ] Edge cases handled (null, empty, error states)
- [ ] No console errors or warnings
**Code quality:**
- [ ] TypeScript compiles without errors
- [ ] Lint passes
- [ ] No commented-out code
- [ ] No hardcoded values that should be configurable
**Integration:**
- [ ] Imports use correct paths
- [ ] No circular dependencies introduced
- [ ] Existing tests still pass
- [ ] New code has tests if appropriate
**Context:**
- [ ] Follows project patterns (check CLAUDE.md)
- [ ] Uses existing utilities instead of reimplementing
- [ ] Consistent naming with rest of codebase

Skip nothing. AI-generated code passes type-checking but can fail these checks. Verification is the difference between AI-assisted quality and AI-introduced bugs.
When starting AI work, choose the right context strategy:
```
Is this continuing previous work?
├─ YES: Does AI need to understand the full history?
│   ├─ YES: Summarize history in task context (don't use same session)
│   └─ NO: Start fresh, reference only what's needed
└─ NO: Start fresh (default)

Is this a large change?
├─ YES: Break into atomic tasks, one commit each
└─ NO: Single task, single commit

Does this need human judgment?
├─ YES: Define task, let AI propose, human reviews before commit
└─ NO: Define task with verification, AI can commit after passing

Does this touch multiple systems?
├─ YES: Sequential tasks, verify after each, no parallelism
└─ NO: Can parallelize if tasks are independent
```

The default is fresh context. The exception is when history genuinely matters - and even then, summarize rather than continue.
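The first branch of the decision tree can be encoded directly. A sketch, with the questions reduced to boolean inputs (the type and strategy strings are illustrative labels, not a real API):

```typescript
// The two questions from the first branch of the decision tree.
interface WorkContext {
  continuingPreviousWork: boolean;
  needsFullHistory: boolean;
}

function contextStrategy(w: WorkContext): string {
  if (!w.continuingPreviousWork) return "fresh"; // the default path
  if (w.needsFullHistory) return "fresh + summarized history";
  return "fresh + minimal references";
}
```

Note that every branch returns some flavor of "fresh" - continuing the same session is never an output, which is the whole point of the tree.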
AI orchestration is not about maximizing what AI does. It is about creating reliable patterns where AI work compounds. Fresh contexts. Atomic tasks. Explicit verification. These patterns scale. Clever prompting does not.
Want help implementing this? Let's talk about your system or workflow.