
How I Use Ralph to Build a Full SaaS with Claude Code

Ralph is a technique for running AI coding agents in a loop. You run the same prompt repeatedly. The AI picks its own tasks from a PRD. It commits after each feature. You come back later to working code.

— AI Hero

[Figure: visual flowchart of how Ralph works]

Make AI work like a software engineer: tight feedback loops, automated verification, human oversight where it matters.


Quick Start

# 1. Write your PRD with user stories
vim PRD.md

# 2. Convert to machine-readable format
/ralph-init

# 3. Run one iteration (human-in-the-loop)
./ralph-once.sh

# 4. Or run autonomously overnight
./afk-ralph.sh 10

Key files:

  • PRD.md → Human-readable user stories with [ ] checkboxes
  • prd.json → Machine-readable version Ralph executes against
  • CLAUDE.md → Project context and accumulated learnings
  • progress.txt → Log of completed work
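The post never shows prd.json itself, so here is a guessed-at shape, consistent with the fields mentioned later (a passes flag per story, priorities so dependencies run first). Field names are my assumption, not the actual schema:

```json
{
  "stories": [
    {
      "id": "auth-1",
      "title": "Returns 401 when token is missing",
      "priority": 1,
      "acceptance": ["Typecheck passes", "Unit tests pass"],
      "passes": false
    }
  ]
}
```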

The Problem: Context Windows

LLMs have a fundamental limitation: context windows. Long conversations drift, accumulate errors, and eventually break.

The Ralph solution: Start fresh every time.

Each iteration:

  • Picks ONE small story from prd.json
  • Completes it with TDD in a single context window
  • Commits and exits

No drift. No accumulated confusion. Progress lives in files (prd.json, CLAUDE.md), not in the AI's "memory."

Ralph = many small, independent sessions instead of one long broken one.

The Other Half: Feedback Loops

LLMs can't "know" if their code works just by writing it. They need automated verification at every step.

Ralph's verification stack:

  • Unit tests - pytest, vitest (must pass before commit)
  • Type checking - mypy, TypeScript strict mode
  • Docker - actually run the app, not just tests
  • Browser tools - Puppeteer screenshots for UI changes
  • Logs - check for runtime errors

No "I think this works." Only "tests pass, Docker runs, browser shows expected result."

Without feedback loops, AI just generates plausible-looking code. With them, it generates working code.
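A minimal sketch of such a verification gate as a shell function. The function name and the command list are illustrative, not a script from this workflow; swap in your project's real tools:

```shell
#!/usr/bin/env bash
# Fail-fast quality gate: run each check in order; stop at the first failure.
run_checks() {
  local cmd
  for cmd in "$@"; do
    if ! eval "$cmd" > /dev/null 2>&1; then
      echo "FAIL: $cmd"
      return 1
    fi
  done
  echo "ALL CHECKS PASSED"
}

# Example wiring (commands are placeholders for your own stack):
#   run_checks "pytest -q" "mypy ." "npm test" "docker compose up -d --build"
```

Only when every check passes does the loop move on to committing.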

The Foundation: A Well-Structured PRD

Ralph is only as good as the PRD you give it. Two rules:

1. Right-sized stories - Each story must complete in ONE context window.

Too big? Split it:

  • Data layer first (models, migrations)
  • Backend logic (services, APIs)
  • Frontend UI (pages, components)
  • Integration (connecting pieces)

2. Verifiable acceptance criteria - Not "works correctly" but specific, testable outcomes.

  Bad                          Good
  "Handles errors properly"    "Returns 401 when token is missing"
  "Good UX"                    "Shows loading spinner while fetching"
  "Is secure"                  "Passwords hashed with bcrypt"

Every story must include: "Typecheck passes" + "Unit tests pass"

Bad PRD = AI spinning in circles. Good PRD = overnight productivity.
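As a concrete (invented) example, a too-big story like "User can manage media files" might split along those four layers into stories like:

```markdown
## Epic: Media file management

- [ ] Data layer: media_files table with name, size, last_watched columns
  - Migration applies cleanly; typecheck passes; unit tests pass
- [ ] Backend: GET /api/media returns the user's files as JSON
  - Returns 401 when token is missing; unit tests pass
- [ ] Frontend: media list page with loading spinner while fetching
  - Spinner visible during fetch; typecheck passes; unit tests pass
- [ ] Integration: delete button removes the file and refreshes the list
  - Row disappears without page reload; unit tests pass
```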


The Daily Loop

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│   EVENING              NIGHT                MORNING       DAY   │
│   ───────              ─────                ───────       ───   │
│                                                                 │
│   Start Ralph    →    AI executes    →    Human reviews  →  ↺   │
│   ./afk-ralph.sh      autonomously        /morning-routine      │
│                       (10+ stories)                             │
│                                                                 │
│   ←─────────────────── repeat daily ───────────────────────────│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
  Responsibility                          Human   AI (Ralph)
  Write PRD with acceptance criteria        ✓
  Execute stories with TDD                            ✓
  Run automated quality checks                        ✓
  Commit code                                         ✓
  Review completed work                     ✓
  Triage suggestions & prioritize bugs      ✓
  Manual browser testing                    ✓
  Learn patterns for CLAUDE.md                        ✓

Key insight: Humans decide what and why. AI handles how.


Morning Routine

After Ralph runs overnight, follow these 6 steps to stay in control. You can run /morning-routine to be guided through this process, or follow manually:

/prd-sync → Review → /prd-archive → /review-suggestions → Manual QA → /ralph-init

Step 1: /prd-sync

Sync completed stories from prd.json back to PRD.md checkboxes.

Skip if: No stories completed overnight
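Mechanically, syncing is just flipping checkboxes. A toy sketch, assuming PRD.md uses `- [ ] Story title` lines (`mark_done` is a hypothetical helper, not the actual skill; the sed match also assumes titles contain no regex metacharacters):

```shell
#!/usr/bin/env bash
# Flip a completed story's "- [ ]" checkbox to "- [x]" in PRD.md.
mark_done() {
  local title="$1" file="${2:-PRD.md}"
  # Rewrite via a temp file so the edit works with any POSIX sed.
  sed "s/^- \[ \] ${title}\$/- [x] ${title}/" "$file" > "${file}.tmp" \
    && mv "${file}.tmp" "$file"
}

# Usage: mark_done "Settings page with email notification toggles"
```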

Step 2: Review What Was Done

tail -50 progress.txt      # What Ralph completed
git log --oneline -20      # Scan commits for surprises

Look for: unusually large commits, odd messages, changes to unexpected files.

Step 3: /prd-archive

Move fully completed epics to ARCHIVED_PRD.md.

Skip if: No fully completed epics

Step 4: /review-suggestions

Triage each item in SUGGESTIONS.md:

  • Remove - Not needed or already done
  • Promote to PRD - Real issue, needs a user story
  • Skip - Valid but save for later

Step 5: Manual QA / Handle Promoted Items

Actually use the app. Click around. Try edge cases. No automated test replaces human eyes on the product.

For promoted items: /prd creates proper user stories.

Step 6: /ralph-init

Generate fresh prd.json with only incomplete stories.

Skip if: Not running Ralph today

After Routine

git add PRD.md prd.json SUGGESTIONS.md progress.txt
git commit -m "chore: morning routine sync and cleanup"

What Ralph Does at Night

Once you run ./afk-ralph.sh N, Claude autonomously executes this loop:

  1. Picks the next story where passes: false in prd.json
  2. Implements with TDD - writes failing test first, then code
  3. Runs quality checks - pytest, mypy, npm test, npm run check
  4. Verifies in Docker - unit tests alone aren't enough
  5. Verifies in Browser - using browser tools for UI changes
  6. Updates prd.json - sets passes: true
  7. Commits with a clear message
  8. Logs learnings to progress.txt and CLAUDE.md
  9. Loops until N iterations complete or all stories done

Periodic QA: Every 3-5 iterations, Ralph runs /exploratory-qa to check for cross-cutting concerns (navigation consistency, error handling, security issues) and logs observations to SUGGESTIONS.md.
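Put together, the overnight driver can be small. A sketch of what I assume ./afk-ralph.sh roughly looks like: the `claude -p` and `--dangerously-skip-permissions` flags are real Claude Code CLI options, but the script structure and the crude grep-based prd.json check are my reconstruction, not the actual file:

```shell
#!/usr/bin/env bash
# afk-ralph.sh (sketch): run up to N isolated agent iterations overnight.
set -euo pipefail

# Count stories not yet passing (crude grep; assumes pretty-printed JSON).
pending_stories() {
  grep -c '"passes": false' "$1" || true
}

afk_ralph() {
  local n="$1" prd="${2:-prd.json}" i
  for i in $(seq 1 "$n"); do
    if [ "$(pending_stories "$prd")" -eq 0 ]; then
      echo "all stories done after $((i - 1)) iterations"
      return 0
    fi
    # One fresh Claude Code session per iteration: read prompt.md, pick ONE
    # story, implement with TDD, verify, commit, exit. No carried-over context.
    claude -p "$(cat prompt.md)" --dangerously-skip-permissions
  done
  echo "completed $n iterations"
}

# Usage: afk_ralph 10
```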


Why This Works

  • No regressions - Every change must pass existing tests. AI can't break what already works.
  • Autonomous but bounded - Clear acceptance criteria prevent tangents.
  • Learning accumulates - CLAUDE.md grows smarter with each session.
  • Human judgment where needed - PRD creation, bug triage, and priorities stay human.

Results: MediaJanitor in Numbers

Here's what happened when I let Ralph build MediaJanitor:

  Metric                      Value
  Commits generated           350+
  User stories completed      150+
  My time spent               10-15 hours

That's 10-15 hours total. Not per week: total. Over the entire project.

What did I actually do?

  • PRD writing and refinement - Defining features, splitting stories, writing acceptance criteria
  • Morning routine reviews - Checking what Ralph built, triaging suggestions
  • Manual fixes - The occasional bug Ralph couldn't solve
  • External setup - Slack notifications, SMTP2GO for emails, deployment configs

Everything else? Ralph handled it. Backend APIs, database models, frontend pages, test suites, Docker configs.


What I Got Wrong

Ralph isn't magic. When it fails, it's usually my fault.

Vague PRDs = Wasted iterations

Early on, I wrote stories like "Add user settings page." Too vague. Ralph would build something, but not what I wanted. Now I specify: "Settings page with email notification toggles. Saves to user_preferences table. Shows success toast on save."

AI takes shortcuts

Sometimes Ralph can't verify something (browser test flaky, Docker not running) and marks the story complete anyway. It's not lying; it's optimizing for "move forward." That's why morning reviews matter. Trust but verify.

Human review is non-negotiable

I tried running Ralph for 20+ iterations without checking. Came back to a mess. Now I cap at 10 iterations and review every morning. The loop works best when humans stay in it.


Files Reference

  File              Purpose
  PRD.md            Human-readable user stories with [ ] checkboxes
  prd.json          Machine-readable version Ralph executes against
  progress.txt      Log of completed tasks
  CLAUDE.md         Project context and accumulated learnings
  SUGGESTIONS.md    QA observations with [P1]/[P2]/[P3] priorities
  prompt.md         The execution prompt Ralph follows

Skills Reference

Planning Skills (before Ralph runs)

/prd - Create well-structured PRDs

Guides you through writing PRDs with right-sized stories and verifiable acceptance criteria.

/ralph-init - PRD to JSON

Converts PRD.md to prd.json. Validates story sizing and assigns priorities so dependencies run first.
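The real skill does much more (sizing validation, priority assignment), but the core transformation is mechanical. A toy sketch of the checkbox-to-JSON step, assuming `- [ ]` story lines whose titles contain no quote characters (the function name and output shape are mine, not the skill's):

```shell
#!/usr/bin/env bash
# Emit one JSON object per unchecked story in PRD.md (toy version of /ralph-init;
# the real prd.json presumably wraps these in a stories array).
prd_to_json() {
  grep '^- \[ \]' "$1" \
    | sed 's/^- \[ \] //' \
    | awk '{ printf "{\"id\": %d, \"title\": \"%s\", \"passes\": false}\n", NR, $0 }'
}

# Usage: prd_to_json PRD.md
```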

Review Skills (after Ralph runs)

/morning-routine - Guided morning review

Walks you through all 6 steps of the Morning Routine: sync, review, archive, triage, QA, and re-init.

/review-suggestions - Triage workflow

Interactively review each item in SUGGESTIONS.md.

/exploratory-qa - Periodic QA

Every 3-5 iterations, reviews the app for navigation consistency, error handling, auth guards, UI coherence, and security issues.

Maintenance Skills (as needed)

/prd-sync - Keep PRD.md in sync

After Ralph completes stories, syncs passes: true back to PRD.md checkboxes.

/prd-archive - Clean up completed work

Archives completed epics to ARCHIVED_PRD.md. Keeps active PRD focused.


Final Thoughts

This is my adaptation of Ralph by AI Hero, tuned for Claude Code.

The core insight: AI coding agents work best with structure. Give them a clear PRD, tight feedback loops, and fresh context every iteration. They'll build real software while you sleep.

Is it perfect? No. You'll still debug, still review, still think. But the ratio shifts. Instead of writing code, you're directing an engineer who never gets tired.

350 commits. 150 user stories. 10-15 hours of my time. MediaJanitor exists because of this workflow.

Try it on your next project.