Ralph is a technique for running AI coding agents in a loop. You run the same prompt repeatedly. The AI picks its own tasks from a PRD. It commits after each feature. You come back later to working code.
Adapted from Ralph by AI Hero.
*Figure: visual flowchart of how Ralph works.*
Make AI work like a software engineer: tight feedback loops, automated verification, human oversight where it matters.
Quick Start
# 1. Write your PRD with user stories
vim PRD.md
# 2. Convert to machine-readable format
/ralph-init
# 3. Run one iteration (human-in-the-loop)
./ralph-once.sh
# 4. Or run autonomously overnight
./afk-ralph.sh 10
Key files:
- `PRD.md` – Human-readable user stories with `[ ]` checkboxes
- `prd.json` – Machine-readable version Ralph executes against
- `CLAUDE.md` – Project context and accumulated learnings
- `progress.txt` – Log of completed work
The Problem: Context Windows
LLMs have a fundamental limitation: context windows. Long conversations drift, accumulate errors, and eventually break.
The Ralph solution: Start fresh every time.
Each iteration:
- Picks ONE small story from prd.json
- Completes it with TDD in a single context window
- Commits and exits
No drift. No accumulated confusion. Progress lives in files (prd.json, CLAUDE.md), not in the AI's "memory."
Ralph = many small, independent sessions instead of one long broken one.
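The "pick ONE story" step is deliberately tiny. As an illustrative sketch only (not Ralph's actual implementation), assuming prd.json holds a `stories` array with `id` and `passes` fields:

```python
import json
from typing import Optional

def pick_next_story(prd: dict) -> Optional[dict]:
    """Return the first story that hasn't passed yet, or None when all are done."""
    for story in prd["stories"]:
        if not story.get("passes", False):
            return story
    return None

prd = json.loads("""
{"stories": [
  {"id": "US-1", "title": "User login", "passes": true},
  {"id": "US-2", "title": "Password reset", "passes": false}
]}
""")
print(pick_next_story(prd)["id"])  # US-2
```

Because the state lives in prd.json rather than the conversation, any fresh session can call this and resume exactly where the last one stopped.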
The Other Half: Feedback Loops
LLMs can't "know" if their code works just by writing it. They need automated verification at every step.
Ralph's verification stack:
- Unit tests - pytest, vitest (must pass before commit)
- Type checking - mypy, TypeScript strict mode
- Docker - actually run the app, not just tests
- Browser tools - Puppeteer screenshots for UI changes
- Logs - check for runtime errors
No "I think this works." Only "tests pass, Docker runs, browser shows expected result."
Without feedback loops, AI just generates plausible-looking code. With them, it generates working code.
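The gate itself is simple: run every check in order and refuse to proceed at the first failure. A minimal sketch; the placeholder commands stand in for the project's real pytest/mypy/npm invocations:

```python
import subprocess

def run_checks(commands: list) -> bool:
    """Run each quality check in order; stop at the first failure."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            print(f"FAILED: {' '.join(cmd)} -- do not commit")
            return False
    return True

# Placeholders: a real loop would pass ["pytest"], ["mypy", "."], ["npm", "test"], etc.
print(run_checks([["true"], ["true"]]))  # True
```

The important property is that the verdict comes from exit codes, not from the model's own opinion of its code.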
The Foundation: A Well-Structured PRD
Ralph is only as good as the PRD you give it. Two rules:
1. Right-sized stories - Each story must complete in ONE context window.
Too big? Split it:
- Data layer first (models, migrations)
- Backend logic (services, APIs)
- Frontend UI (pages, components)
- Integration (connecting pieces)
2. Verifiable acceptance criteria - Not "works correctly" but specific, testable outcomes.
| Bad | Good |
|---|---|
| "Handles errors properly" | "Returns 401 when token is missing" |
| "Good UX" | "Shows loading spinner while fetching" |
| "Is secure" | "Passwords hashed with bcrypt" |
Every story must include: "Typecheck passes" + "Unit tests pass"
Bad PRD = AI spinning in circles. Good PRD = overnight productivity.
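Put together, a right-sized story in prd.json might look like the following. The field names here are an assumption (the real schema isn't shown), but the point stands: every criterion is mechanically checkable.

```json
{
  "id": "US-12",
  "title": "Login returns 401 when token is missing",
  "acceptance_criteria": [
    "POST /api/login without a token returns 401",
    "Typecheck passes",
    "Unit tests pass"
  ],
  "passes": false
}
```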
The Daily Loop
- Evening – start Ralph: ./afk-ralph.sh (10+ stories)
- Night – AI executes autonomously
- Morning – human reviews: /morning-routine
- Day – rest, then repeat daily
| Responsibility | Human | AI (Ralph) |
|---|---|---|
| Write PRD with acceptance criteria | ✓ | |
| Execute stories with TDD | | ✓ |
| Run automated quality checks | | ✓ |
| Commit code | | ✓ |
| Review completed work | ✓ | |
| Triage suggestions & prioritize bugs | ✓ | |
| Manual browser testing | ✓ | |
| Learn patterns for CLAUDE.md | | ✓ |
Key insight: Humans decide what and why. AI handles how.
Morning Routine
After Ralph runs overnight, follow these six steps to stay in control. Run /morning-routine to be guided through the process, or follow the steps manually:
/prd-sync → Review → /prd-archive → /review-suggestions → Manual QA → /ralph-init
Step 1: /prd-sync
Sync completed stories from prd.json back to PRD.md checkboxes.
Skip if: No stories completed overnight
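The sync amounts to "tick the checkbox for every story with `passes: true`." A minimal sketch, assuming PRD.md checkboxes look like `- [ ] US-1: ...` (the exact layout is an assumption):

```python
import re

def sync_checkboxes(prd_md: str, stories: list) -> str:
    """Tick the PRD.md checkbox for every story marked passes: true."""
    done_ids = {s["id"] for s in stories if s.get("passes")}
    lines = []
    for line in prd_md.splitlines():
        m = re.match(r"- \[ \] (\S+):", line)
        if m and m.group(1) in done_ids:
            line = line.replace("- [ ]", "- [x]", 1)
        lines.append(line)
    return "\n".join(lines)

prd_md = "- [ ] US-1: User login\n- [ ] US-2: Password reset"
stories = [{"id": "US-1", "passes": True}, {"id": "US-2", "passes": False}]
print(sync_checkboxes(prd_md, stories))
```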
Step 2: Review What Was Done
tail -50 progress.txt # What Ralph completed
git log --oneline -20 # Scan commits for surprises
Look for: unusually large commits, odd messages, changes to unexpected files.
Step 3: /prd-archive
Move fully completed epics to ARCHIVED_PRD.md.
Skip if: No fully completed epics
Step 4: /review-suggestions
Triage each item in SUGGESTIONS.md:
- Remove - Not needed or already done
- Promote to PRD - Real issue, needs a user story
- Skip - Valid but save for later
Step 5: Manual QA / Handle Promoted Items
Actually use the app. Click around. Try edge cases. No automated test replaces human eyes on the product.
For promoted items: /prd creates proper user stories.
Step 6: /ralph-init
Generate fresh prd.json with only incomplete stories.
Skip if: Not running Ralph today
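Conceptually, re-init just filters finished work out of the execution file. A hypothetical sketch, assuming prd.json holds a `stories` array with `passes` flags:

```python
import json

def fresh_prd(prd: dict) -> dict:
    """Keep only the stories that have not passed yet."""
    return {**prd, "stories": [s for s in prd["stories"] if not s.get("passes")]}

prd = {"project": "demo", "stories": [
    {"id": "US-1", "passes": True},
    {"id": "US-2", "passes": False},
]}
print(json.dumps(fresh_prd(prd)))
```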
After Routine
git add PRD.md prd.json SUGGESTIONS.md progress.txt
git commit -m "chore: morning routine sync and cleanup"
What Ralph Does at Night
Once you run ./afk-ralph.sh N, Claude autonomously executes this loop:
- Picks the next story where `passes: false` in prd.json
- Implements with TDD – writes failing test first, then code
- Runs quality checks – pytest, mypy, npm test, npm run check
- Verifies in Docker – unit tests alone aren't enough
- Verifies in Browser – using browser tools for UI changes
- Updates prd.json – sets `passes: true`
- Commits with a clear message
- Logs learnings to progress.txt and CLAUDE.md
- Loops until N iterations complete or all stories done
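The steps above reduce to a small driver loop. This sketch stubs out the implement/verify steps (the real loop shells out to Claude; the stubs are placeholders):

```python
def ralph_loop(prd: dict, iterations: int, implement, verify) -> int:
    """Run up to `iterations` passes; stop early once every story passes."""
    completed = 0
    for _ in range(iterations):
        story = next((s for s in prd["stories"] if not s.get("passes")), None)
        if story is None:
            break                    # all stories done
        implement(story)             # TDD: failing test first, then code
        if verify(story):            # pytest, mypy, Docker, browser checks
            story["passes"] = True   # then commit + log learnings
            completed += 1
    return completed

prd = {"stories": [{"id": "US-1", "passes": False}, {"id": "US-2", "passes": False}]}
n = ralph_loop(prd, 10, implement=lambda s: None, verify=lambda s: True)
print(n)  # 2
```

Note that a story failing verification stays `passes: false` and gets retried, which is why capping N matters.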
Periodic QA: Every 3-5 iterations, Ralph runs /exploratory-qa to check for cross-cutting concerns (navigation consistency, error handling, security issues) and logs observations to SUGGESTIONS.md.
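The scheduling is just a modulo check; "every 4" below is one arbitrary choice within the 3-5 range:

```python
def should_run_qa(iteration: int, every: int = 4) -> bool:
    """Trigger /exploratory-qa every `every` iterations."""
    return iteration > 0 and iteration % every == 0

print([i for i in range(1, 13) if should_run_qa(i)])  # [4, 8, 12]
```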
Why This Works
- No regressions - Every change must pass existing tests. AI can't break what already works.
- Autonomous but bounded - Clear acceptance criteria prevent tangents.
- Learning accumulates - CLAUDE.md grows smarter with each session.
- Human judgment where needed - PRD creation, bug triage, and priorities stay human.
Results: MediaJanitor in Numbers
Here's what happened when I let Ralph build MediaJanitor:
| Metric | Value |
|---|---|
| Commits generated | 350+ |
| User stories completed | 150+ |
| My time spent | 10-15 hours |
That's 10-15 hours total. Not per week, but total, over the entire project.
What did I actually do?
- PRD writing and refinement - Defining features, splitting stories, writing acceptance criteria
- Morning routine reviews - Checking what Ralph built, triaging suggestions
- Manual fixes - The occasional bug Ralph couldn't solve
- External setup - Slack notifications, SMTP2GO for emails, deployment configs
Everything else? Ralph handled it. Backend APIs, database models, frontend pages, test suites, Docker configs.
What I Got Wrong
Ralph isn't magic. When it fails, it's usually my fault.
Vague PRDs = Wasted iterations
Early on, I wrote stories like "Add user settings page." Too vague. Ralph would build something, but not what I wanted. Now I specify: "Settings page with email notification toggles. Saves to user_preferences table. Shows success toast on save."
AI takes shortcuts
Sometimes Ralph can't verify something (a flaky browser test, Docker not running) and marks the story complete anyway. It's not lying; it's optimizing for "move forward." That's why morning reviews matter. Trust but verify.
Human review is non-negotiable
I tried running Ralph for 20+ iterations without checking. Came back to a mess. Now I cap at 10 iterations and review every morning. The loop works best when humans stay in it.
Files Reference
| File | Purpose |
|---|---|
| `PRD.md` | Human-readable user stories with `[ ]` checkboxes |
| `prd.json` | Machine-readable version Ralph executes against |
| `progress.txt` | Log of completed tasks |
| `CLAUDE.md` | Project context and accumulated learnings |
| `SUGGESTIONS.md` | QA observations with [P1]/[P2]/[P3] priorities |
| `prompt.md` | The execution prompt Ralph follows |
Skills Reference
Planning Skills (before Ralph runs)
/prd - Create well-structured PRDs
Guides you through writing PRDs with right-sized stories and verifiable acceptance criteria.
/ralph-init - PRD to JSON
Converts PRD.md to prd.json. Validates story sizing and assigns priorities so dependencies run first.
Review Skills (after Ralph runs)
/morning-routine - Guided morning review
Walks you through all 6 steps of the Morning Routine: sync, review, archive, triage, QA, and re-init.
/review-suggestions - Triage workflow
Interactively review each item in SUGGESTIONS.md.
/exploratory-qa - Periodic QA
Every 3-5 iterations, reviews the app for navigation consistency, error handling, auth guards, UI coherence, and security issues.
Maintenance Skills (as needed)
/prd-sync - Keep PRD.md in sync
After Ralph completes stories, syncs passes: true back to PRD.md checkboxes.
/prd-archive - Clean up completed work
Archives completed epics to ARCHIVED_PRD.md. Keeps active PRD focused.
Final Thoughts
This is my adaptation of Ralph by AI Hero, tuned for Claude Code.
The core insight: AI coding agents work best with structure. Give them a clear PRD, tight feedback loops, and fresh context every iteration. They'll build real software while you sleep.
Is it perfect? No. You'll still debug, still review, still think. But the ratio shifts. Instead of writing code, you're directing an engineer who never gets tired.
350 commits. 150 user stories. 10-15 hours of my time. MediaJanitor exists because of this workflow.
Try it on your next project.