Ralph is a technique for running AI coding agents in a loop. You run the same prompt repeatedly. The AI picks its own tasks from a PRD. It commits after each feature. You come back later to working code.
Adapted from Ralph by AI Hero.
*Figure: visual flowchart of how Ralph works.*
Make AI work like a software engineer: tight feedback loops, automated verification, human oversight where it matters.
Quick Start
# 1. Write your PRD with user stories
vim PRD.md
# 2. Convert to machine-readable format
/ralph-init
# 3. Run one iteration (human-in-the-loop)
./ralph-once.sh
# 4. Or run autonomously overnight
./afk-ralph.sh 10
Key files:
- `PRD.md` – Human-readable user stories with `[ ]` checkboxes
- `prd.json` – Machine-readable version Ralph executes against
- `CLAUDE.md` – Project context and accumulated learnings
- `progress.txt` – Log of completed work
The Problem: Context Windows
LLMs have a fundamental limitation: context windows. Long conversations drift, accumulate errors, and eventually break.
The Ralph solution: Start fresh every time.
Each iteration:
- Picks ONE small story from prd.json
- Completes it with TDD in a single context window
- Commits and exits
No drift. No accumulated confusion. Progress lives in files (prd.json, CLAUDE.md), not in the AI's "memory."
Ralph = many small, independent sessions instead of one long broken one.
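The "pick ONE story" step is deliberately tiny. As an illustrative sketch only (not Ralph's actual implementation), assuming prd.json holds a `stories` array with `id` and `passes` fields:

```python
import json
from typing import Optional

def pick_next_story(prd: dict) -> Optional[dict]:
    """Return the first story that hasn't passed yet, or None when all are done."""
    for story in prd["stories"]:
        if not story.get("passes", False):
            return story
    return None

prd = json.loads("""
{"stories": [
  {"id": "US-1", "title": "User login", "passes": true},
  {"id": "US-2", "title": "Password reset", "passes": false}
]}
""")
print(pick_next_story(prd)["id"])  # US-2
```

Because the state lives in prd.json rather than the conversation, any fresh session can call this and resume exactly where the last one stopped.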
The Other Half: Feedback Loops
LLMs can't "know" if their code works just by writing it. They need automated verification at every step.
Ralph's verification stack:
- Unit tests - pytest, vitest (must pass before commit)
- Type checking - mypy, TypeScript strict mode
- Docker - actually run the app, not just tests
- Browser tools - Puppeteer screenshots for UI changes
- Logs - check for runtime errors
No "I think this works." Only "tests pass, Docker runs, browser shows expected result."
Without feedback loops, AI just generates plausible-looking code. With them, it generates working code.
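The gate itself is simple: run every check in order and refuse to proceed at the first failure. A minimal sketch; the placeholder commands stand in for the project's real pytest/mypy/npm invocations:

```python
import subprocess

def run_checks(commands: list) -> bool:
    """Run each quality check in order; stop at the first failure."""
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            print(f"FAILED: {' '.join(cmd)} -- do not commit")
            return False
    return True

# Placeholders: a real loop would pass ["pytest"], ["mypy", "."], ["npm", "test"], etc.
print(run_checks([["true"], ["true"]]))  # True
```

The important property is that the verdict comes from exit codes, not from the model's own opinion of its code.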
The Foundation: A Well-Structured PRD
Ralph is only as good as the PRD you give it. Two rules:
1. Right-sized stories - Each story must complete in ONE context window.
Too big? Split it:
- Data layer first (models, migrations)
- Backend logic (services, APIs)
- Frontend UI (pages, components)
- Integration (connecting pieces)
2. Verifiable acceptance criteria - Not "works correctly" but specific, testable outcomes.
| Bad | Good |
|---|---|
| "Handles errors properly" | "Returns 401 when token is missing" |
| "Good UX" | "Shows loading spinner while fetching" |
| "Is secure" | "Passwords hashed with bcrypt" |
Every story must include: "Typecheck passes" + "Unit tests pass"
Bad PRD = AI spinning in circles. Good PRD = overnight productivity.
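Put together, a right-sized story in prd.json might look like the following. The field names here are an assumption (the real schema isn't shown), but the point stands: every criterion is mechanically checkable.

```json
{
  "id": "US-12",
  "title": "Login returns 401 when token is missing",
  "acceptance_criteria": [
    "POST /api/login without a token returns 401",
    "Typecheck passes",
    "Unit tests pass"
  ],
  "passes": false
}
```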
The Daily Loop
- Evening – start Ralph: ./afk-ralph.sh (10+ stories)
- Night – AI executes autonomously
- Morning – human reviews: /morning-routine
- Day – rest, then repeat daily
| Responsibility | Human | AI (Ralph) |
|---|---|---|
| Write PRD with acceptance criteria | ✓ | |
| Execute stories with TDD | | ✓ |
| Run automated quality checks | | ✓ |
| Commit code | | ✓ |
| Review completed work | ✓ | |
| Triage suggestions & prioritize bugs | ✓ | |
| Manual browser testing | ✓ | |
| Learn patterns for CLAUDE.md | | ✓ |
Key insight: Humans decide what and why. AI handles how.
Morning Routine
After Ralph runs overnight, follow these six steps to stay in control. Run /morning-routine to be guided through the process, or follow the steps manually:
/prd-sync → Review → /prd-archive → /review-suggestions → Manual QA → /ralph-init
Step 1: /prd-sync
Sync completed stories from prd.json back to PRD.md checkboxes.
Skip if: No stories completed overnight
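The sync amounts to "tick the checkbox for every story with `passes: true`." A minimal sketch, assuming PRD.md checkboxes look like `- [ ] US-1: ...` (the exact layout is an assumption):

```python
import re

def sync_checkboxes(prd_md: str, stories: list) -> str:
    """Tick the PRD.md checkbox for every story marked passes: true."""
    done_ids = {s["id"] for s in stories if s.get("passes")}
    lines = []
    for line in prd_md.splitlines():
        m = re.match(r"- \[ \] (\S+):", line)
        if m and m.group(1) in done_ids:
            line = line.replace("- [ ]", "- [x]", 1)
        lines.append(line)
    return "\n".join(lines)

prd_md = "- [ ] US-1: User login\n- [ ] US-2: Password reset"
stories = [{"id": "US-1", "passes": True}, {"id": "US-2", "passes": False}]
print(sync_checkboxes(prd_md, stories))
```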
Step 2: Review What Was Done
tail -50 progress.txt # What Ralph completed
git log --oneline -20 # Scan commits for surprises
Look for: unusually large commits, odd messages, changes to unexpected files.
Step 3: /prd-archive
Move fully completed epics to ARCHIVED_PRD.md.
Skip if: No fully completed epics
Step 4: /review-suggestions
Triage each item in SUGGESTIONS.md:
- Remove - Not needed or already done
- Promote to PRD - Real issue, needs a user story
- Skip - Valid but save for later
Step 5: Manual QA / Handle Promoted Items
Actually use the app. Click around. Try edge cases. No automated test replaces human eyes on the product.
For promoted items: /prd creates proper user stories.
Step 6: /ralph-init
Generate fresh prd.json with only incomplete stories.
Skip if: Not running Ralph today
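Conceptually, re-init just filters finished work out of the execution file. A hypothetical sketch, assuming prd.json holds a `stories` array with `passes` flags:

```python
import json

def fresh_prd(prd: dict) -> dict:
    """Keep only the stories that have not passed yet."""
    return {**prd, "stories": [s for s in prd["stories"] if not s.get("passes")]}

prd = {"project": "demo", "stories": [
    {"id": "US-1", "passes": True},
    {"id": "US-2", "passes": False},
]}
print(json.dumps(fresh_prd(prd)))
```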
After Routine
git add PRD.md prd.json SUGGESTIONS.md progress.txt
git commit -m "chore: morning routine sync and cleanup"
What Ralph Does at Night
Once you run ./afk-ralph.sh N, Claude autonomously executes this loop:
- Picks the next story where `passes: false` in prd.json
- Implements with TDD – writes failing test first, then code
- Runs quality checks – pytest, mypy, npm test, npm run check
- Verifies in Docker – unit tests alone aren't enough
- Verifies in Browser – using browser tools for UI changes
- Updates prd.json – sets `passes: true`
- Commits with a clear message
- Logs learnings to progress.txt and CLAUDE.md
- Loops until N iterations complete or all stories done
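The steps above reduce to a small driver loop. This sketch stubs out the implement/verify steps (the real loop shells out to Claude; the stubs are placeholders):

```python
def ralph_loop(prd: dict, iterations: int, implement, verify) -> int:
    """Run up to `iterations` passes; stop early once every story passes."""
    completed = 0
    for _ in range(iterations):
        story = next((s for s in prd["stories"] if not s.get("passes")), None)
        if story is None:
            break                    # all stories done
        implement(story)             # TDD: failing test first, then code
        if verify(story):            # pytest, mypy, Docker, browser checks
            story["passes"] = True   # then commit + log learnings
            completed += 1
    return completed

prd = {"stories": [{"id": "US-1", "passes": False}, {"id": "US-2", "passes": False}]}
n = ralph_loop(prd, 10, implement=lambda s: None, verify=lambda s: True)
print(n)  # 2
```

Note that a story failing verification stays `passes: false` and gets retried, which is why capping N matters.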
Periodic QA: Every 3-5 iterations, Ralph runs /exploratory-qa to check for cross-cutting concerns (navigation consistency, error handling, security issues) and logs observations to SUGGESTIONS.md.
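The scheduling is just a modulo check; "every 4" below is one arbitrary choice within the 3-5 range:

```python
def should_run_qa(iteration: int, every: int = 4) -> bool:
    """Trigger /exploratory-qa every `every` iterations."""
    return iteration > 0 and iteration % every == 0

print([i for i in range(1, 13) if should_run_qa(i)])  # [4, 8, 12]
```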
Why This Works
- No regressions - Every change must pass existing tests. AI can't break what already works.
- Autonomous but bounded - Clear acceptance criteria prevent tangents.
- Learning accumulates - CLAUDE.md grows smarter with each session.
- Human judgment where needed - PRD creation, bug triage, and priorities stay human.
Results: MediaJanitor in Numbers
Here's what happened when I let Ralph build MediaJanitor:
| Metric | Value |
|---|---|
| Commits generated | 350+ |
| User stories completed | 150+ |
| My time spent | 10-15 hours |
That's 10-15 hours total. Not per week, but total, over the entire project.
What did I actually do?
- PRD writing and refinement - Defining features, splitting stories, writing acceptance criteria
- Morning routine reviews - Checking what Ralph built, triaging suggestions
- Manual fixes - The occasional bug Ralph couldn't solve
- External setup - Slack notifications, SMTP2GO for emails, deployment configs
Everything else? Ralph handled it. Backend APIs, database models, frontend pages, test suites, Docker configs.
What I Got Wrong
Ralph isn't magic. When it fails, it's usually my fault.
Vague PRDs = Wasted iterations
Early on, I wrote stories like "Add user settings page." Too vague. Ralph would build something, but not what I wanted. Now I specify: "Settings page with email notification toggles. Saves to user_preferences table. Shows success toast on save."
AI takes shortcuts
Sometimes Ralph can't verify something (a flaky browser test, Docker not running) and marks the story complete anyway. It's not lying; it's optimizing for "move forward." That's why morning reviews matter. Trust but verify.
Human review is non-negotiable
I tried running Ralph for 20+ iterations without checking. Came back to a mess. Now I cap at 10 iterations and review every morning. The loop works best when humans stay in it.
Files Reference
| File | Purpose |
|---|---|
| `PRD.md` | Human-readable user stories with `[ ]` checkboxes |
| `prd.json` | Machine-readable version Ralph executes against |
| `progress.txt` | Log of completed tasks |
| `CLAUDE.md` | Project context and accumulated learnings |
| `SUGGESTIONS.md` | QA observations with [P1]/[P2]/[P3] priorities |
| `prompt.md` | The execution prompt Ralph follows |
Skills Reference
Planning Skills (before Ralph runs)
/prd - Create well-structured PRDs
Guides you through writing PRDs with right-sized stories and verifiable acceptance criteria.
/ralph-init - PRD to JSON
Converts PRD.md to prd.json. Validates story sizing and assigns priorities so dependencies run first.
Review Skills (after Ralph runs)
/morning-routine - Guided morning review
Walks you through all 6 steps of the Morning Routine: sync, review, archive, triage, QA, and re-init.
/review-suggestions - Triage workflow
Interactively review each item in SUGGESTIONS.md.
/exploratory-qa - Periodic QA
Every 3-5 iterations, reviews the app for navigation consistency, error handling, auth guards, UI coherence, and security issues.
Maintenance Skills (as needed)
/prd-sync - Keep PRD.md in sync
After Ralph completes stories, syncs passes: true back to PRD.md checkboxes.
/prd-archive - Clean up completed work
Archives completed epics to ARCHIVED_PRD.md. Keeps active PRD focused.
Final Thoughts
This is my adaptation of Ralph by AI Hero, tuned for Claude Code.
The core insight: AI coding agents work best with structure. Give them a clear PRD, tight feedback loops, and fresh context every iteration. They'll build real software while you sleep.
Is it perfect? No. You'll still debug, still review, still think. But the ratio shifts. Instead of writing code, you're directing an engineer who never gets tired.
350 commits. 150 user stories. 10-15 hours of my time. MediaJanitor exists because of this workflow.
Try it on your next project.