A new piece from the SitePoint Team addresses this head-on with "Claude Code in Production: How to Keep Long Runs Stable." Published on April 14, 2026, within their AI and Programming sections, it sets out to demystify the intricacies of maintaining stable AI code behavior over time. The title alone tells you where it’s going: this isn’t about generating a snippet, it’s about making AI a reliable workhorse.

The Core Problem: Why AI Code Breaks Down
The central challenge, as the article outlines, isn't just about random errors; it's rooted in how models like Claude manage context. Long-running sessions inevitably encounter issues because, well, LLMs aren't built for infinite memory. The article plans to dissect how Claude Code specifically handles context and where those systems typically fail, which is crucial for anyone trying to push these models into real-world applications. If you're building with AI, understanding these limitations is your first line of defense against production woes.

Strategies for Stability
Rather than looking for a magic bullet (a point the article explicitly makes with "No Magic Flag Exists"), the authors propose a methodical, engineering-first approach. They promise to lay out a structured path to stability, starting with a "Checkpoint Workflow" to structure long runs. This suggests breaking down complex tasks into manageable, savable segments, which is a sensible strategy for any fragile system.

Beyond process, they'll also cover tangible artifacts and practices. There's a section dedicated to `CLAUDE.md`, implying a specific file or documentation approach critical for maintaining these long-lived sessions. And for the prompts themselves, the article will guide readers on "Prompt Hygiene," detailing how to craft instructions that can withstand extended contexts without degradation.

Finally, because things *will* go wrong, the piece will tackle "Failure Recovery." This isn't just about fixing bugs; it's about anticipating and mitigating the inevitable hiccups when an AI is left to its own devices for too long. A "Production Stability Checklist" and a sample production run are promised, aiming to pull these concepts together into a practical guide.

What's clear here is that the article isn't offering shortcuts. It's pushing for rigorous software engineering principles applied to the new frontier of AI code execution. And that’s a perspective we desperately need right now.

Using Claude Code for quick, isolated tasks feels almost like cheating. You ask it to spin up a utility function, fix a pesky type error, or scaffold a test, and it usually nails it, no fuss. But the moment your ambitions stretch beyond these well-defined boundaries (think sessions pushing past the 30-minute mark, touching two dozen files, or demanding sustained, multi-phase reasoning), you're dealing with an entirely different beast. What works for a sprint falls apart in a marathon, and the failure modes shift dramatically.

This isn't about arbitrary limits. My testing suggests these thresholds, around 30 minutes or 20 files, are where Claude Code's reliability profile fundamentally changes. For developers pushing this tool into production, especially for non-trivial, multi-file projects, you need a workflow engineered to address these new failure profiles. This isn't just about tweaking a prompt; it's about building a structured process around the AI's inherent limitations.

What happens when those longer sessions inevitably break down? You'll typically encounter three distinct issues. First, there's **context window exhaustion**. This is mechanical: every action Claude takes (a tool call, a file read, a command execution, or even just another turn in the conversation) consumes tokens from a finite budget. Eventually, that window fills up. Then there's **goal drift**. Over extended interactions, Claude Code can subtly lose its grip on the original task, as older instructions fade from its effective working memory. It's like asking someone to keep a complex mental checklist updated for hours without external reminders. And finally, the most damaging of all: **compounding errors**. A minor misinterpretation in an early phase can snowball into a significant architectural flaw by the time you're deep into the project. By then, correcting it becomes an expensive, time-consuming nightmare.

Understanding Claude's Ephemeral Memory
The Context Window Is a Leaky Bucket
Claude Code doesn't just process a single prompt and spit out a response. It operates in an agentic mode, meaning it performs a series of actions: reading files, running shell commands, writing to disk, and calling external tools. Each of these actions, along with the back-and-forth conversation, generates output that occupies space in its context window. It's a constant competition for memory between your instructions (the conversation context) and the accumulated state of its operations (working memory).

Here's the thing: that context window isn't infinitely elastic. When it starts approaching capacity, Claude Code initiates an auto-compaction process. It might summarize older parts of the conversation or even deprioritize them to make room for new information. While Anthropic doesn't openly document the exact internal mechanisms for this, and it likely changes between versions, the practical outcome is often silent information loss. A critical constraint you mentioned in message three could be reduced to a vague summary, or worse, dropped entirely by message forty. The model won't warn you that it's lost access to previous instructions. It simply moves forward with what's left in its effective context, which may no longer align with your core requirements.

This is why you cannot assume permanence. Verify against the current documentation before relying on any version-dependent features, especially those related to context management or `CLAUDE.md` resolution.
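To make the "finite budget" idea concrete, here is a minimal sketch of a budget tracker. Everything in it is an assumption for illustration: the `SessionEvent` shape, the four-characters-per-token estimate, and the window size and warning ratio you pass in. It does not reflect Claude Code's real tokenizer, limits, or compaction trigger.

```typescript
// Toy model of a context budget. Names and numbers are hypothetical.
type SessionEvent = {
  kind: "prompt" | "file_read" | "command" | "reply";
  text: string;
};

// Crude estimate: roughly four characters per token (an approximation only).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Returns cumulative estimated usage after each event, plus the index of the
// first event at which usage reaches windowTokens * warnRatio (-1 if never).
function trackBudget(
  events: SessionEvent[],
  windowTokens: number,
  warnRatio: number,
): { usage: number[]; warnAt: number } {
  const usage: number[] = [];
  let total = 0;
  let warnAt = -1;
  events.forEach((event, index) => {
    total += estimateTokens(event.text);
    usage.push(total);
    if (warnAt === -1 && total >= windowTokens * warnRatio) {
      warnAt = index;
    }
  });
  return { usage, warnAt };
}
```

The point of the sketch is the shape of the problem: file reads and command output usually dwarf your own prompts, so the budget is spent mostly on things you never typed.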
Recognizing the Warning Signs of Context Degradation
Catching context degradation early is crucial, before you've wasted hours on unusable output. There are observable patterns, distinct symptoms that signal Claude Code is losing its way:

* **Repeated file reads:** If Claude keeps opening files it's already read earlier in the session, it's a strong indicator. This often means the contents of those files have been compacted out of its working memory, and it's trying to refresh its understanding.
* **Contradicting earlier decisions:** You'll see an architectural choice made confidently in phase one get reversed or ignored in phase three, with no acknowledgment of the prior decision. The original reasoning has likely vanished from its active context.
* **Forgetting stated requirements:** Key task constraints you laid out in the initial prompt simply stop influencing Claude's behavior. It starts veering off course from the core mission.
* **Looping without progress:** Claude enters cycles of re-examining the same code, often narrating "Let me check that again" or similar phrases, but producing no new, meaningful output or advancement. It's stuck.

To illustrate, consider what repeated file reads look like in a console:

> Read file: src/middleware/auth.ts
> ...processing...
> Read file: src/routes/users.ts
> ...modifying users.ts...
> Read file: src/middleware/auth.ts
> ...processing...
> Read file: src/middleware/auth.ts
> ...processing...
> "Let me verify the auth middleware signature again..."
> Read file: src/middleware/auth.ts
Seeing `src/middleware/auth.ts` pop up three or more times in quick succession? That's your heuristic. It's not a mechanical certainty, but it's a strong signal that the session's context is degrading.
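If you keep a transcript of the session, the repeated-read heuristic is easy to automate. The sketch below assumes a log whose lines look like the console excerpt above (`Read file: <path>`, optionally prefixed with `>`); that format is an assumption, so adapt the pattern to whatever your transcript actually contains. For simplicity it counts reads across the whole transcript rather than within a time window.

```typescript
// Returns the paths that appear in "Read file: <path>" lines at least
// `threshold` times. The line format is assumed, not an official log format.
function findRepeatedReads(transcript: string, threshold: number = 3): string[] {
  const counts = new Map<string, number>();
  for (const line of transcript.split("\n")) {
    const match = /^>?\s*Read file:\s*(\S+)/.exec(line.trim());
    if (match) {
      const path = match[1];
      counts.set(path, (counts.get(path) ?? 0) + 1);
    }
  }
  const repeated: string[] = [];
  counts.forEach((count, path) => {
    if (count >= threshold) repeated.push(path);
  });
  return repeated;
}
```

Run against the excerpt above with the default threshold, this flags `src/middleware/auth.ts` and leaves `src/routes/users.ts` alone. Treat a hit as a prompt to look closer, not as proof of degradation.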
The Checkpoint Workflow: Engineering Stability for Long Runs
Given these challenges, a disciplined approach is essential. The solution lies in what I call the "checkpoint workflow." It's a repeatable production process built on concrete artifacts: carefully crafted `CLAUDE.md` configurations, prompt templates structured with phase gates, robust Git checkpointing, and clear recovery patterns. The overarching goal is to stabilize long-running sessions, enabling intermediate developers to scale their use of Claude Code to complex, multi-file tasks without constant AI debugging. This workflow involves eight core strategies:

1. **Decompose** your task into bounded phases, each with explicit entry and exit criteria, and clear deliverables.
2. **Configure** a `CLAUDE.md` file to establish project architecture, coding conventions, and non-negotiable constraints upfront.
3. **Front-load** critical constraints at the very top of every prompt. This maximizes their chances of surviving context compaction.
4. **Gate** each phase with a verification step, compelling Claude to restate its plan *before* it starts writing code.
5. **Commit** to Git at every single phase boundary. This ensures you always have a clean, stable rollback point.
6. **Monitor** actively for context degradation signals, like those repeated file reads or contradictory decisions.
7. **Restart** the session if Claude demonstrably loses track of more than one requirement or enters a persistent loop without making progress.
8. **Review** all generated diffs against your merge base and execute the full test suite *before* merging anything.

Break Work into Bounded Phases
The most effective strategy against context degradation is to constrain each phase so tightly that the context window never even approaches its limit before the phase concludes. From my testing, a sweet spot for a phase is roughly 10 to 15 minutes of wall-clock time. This is a heuristic, mind you; file-heavy tasks will likely demand even shorter phases to prevent those repeated-read symptoms.

This is the essence of the "phase-gate" pattern: breaking down large, unwieldy tasks into discrete, manageable chunks, each with clearly defined entry criteria, exit criteria, and specific deliverables. At the very start of a session, paste your entire phase plan directly into Claude Code. This plan needs to be precise enough that each phase could realistically be completed independently, almost as a standalone micro-project.

## Phase Plan: Add Authentication to Express API
### Phase 1: Install Dependencies and Configure JWT
- Entry: Clean main branch, no auth code exists
- Tasks: Install jsonwebtoken, bcryptjs, and any required @types packages (check whether each library ships its own type declarations before adding one). Create src/config/auth.ts with JWT secret loading from env vars (fail loudly if the env var is missing).
- Exit: Dependencies in package.json, auth config file created, app compiles cleanly.
- Commit message: "feat(auth): add JWT dependencies and auth config"
### Phase 2: Create User Model and Registration Endpoint
- Entry: Phase 1 committed
- Tasks: Create src/models/User.ts with email/password fields. Add POST /api/auth/register in src/routes/auth.ts. Hash passwords with bcryptjs before storage.
- Exit: Registration endpoint returns 201 with user object (no password). Existing routes unaffected.
- Commit message: "feat(auth): add User model and registration endpoint"
### Phase 3: Implement Login and Token Issuance
- Entry: Phase 2 committed
- Tasks: Add POST /api/auth/login. Validate credentials against stored hash. Return signed JWT with 24h expiry. (Note: for production systems, prefer shorter-lived access tokens paired with refresh token rotation and a revocation mechanism.)
- Exit: Login returns 200 with token for valid credentials, 401 for invalid. Token contains userId and email claims.
- Commit message: "feat(auth): implement login and JWT issuance"
### Phase 4: Add Auth Middleware and Protect Routes
- Entry: Phase 3 committed
- Tasks: Create src/middleware/requireAuth.ts that verifies JWT from Authorization header. Apply to all /api/users routes.
- Exit: Protected routes return 401 without valid token, 200 with valid token. Public routes unchanged.
- Commit message: "feat(auth): add auth middleware and protect user routes"
### Phase 5: Integration Tests
- Entry: Phase 4 committed
- Tasks: Write tests for register, login, protected access, and rejected access in tests/auth.test.ts.
- Exit: All tests pass. No existing tests broken.
- Commit message: "test(auth): add authentication integration tests"
Notice the level of detail here. "Add authentication" is merely a task description; it's not a contract. This phase plan, however, *is* a contract, one that Claude Code can execute against and that you, as the developer, can rigorously verify at each defined boundary. It's worth pointing out that these commit messages follow the Conventional Commits specification, which further streamlines integration into a typical development workflow.
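One way to hold a plan to that "contract" standard is to give it a shape and check it before pasting it into a session. The sketch below is illustrative only: the `Phase` interface and its field names are hypothetical, chosen to mirror the Entry/Tasks/Exit/Commit layout above, and the commit pattern is a simplified subset of Conventional Commits (a handful of common types), not the full specification.

```typescript
// Hypothetical shape for one phase of the plan.
interface Phase {
  name: string;
  entry: string;
  tasks: string[];
  exit: string;
  commitMessage: string;
}

// Simplified check: type, optional (scope), optional "!", then ": description".
const COMMIT_PATTERN =
  /^(feat|fix|docs|test|refactor|chore|perf|build|ci)(\([\w.-]+\))?!?: \S.*$/;

// Returns human-readable problems; an empty array means the plan passes
// these basic checks (it says nothing about whether the plan is a good one).
function validatePlan(phases: Phase[]): string[] {
  const problems: string[] = [];
  phases.forEach((phase, index) => {
    const label = `Phase ${index + 1} (${phase.name})`;
    if (phase.entry.trim() === "") problems.push(`${label}: missing entry criteria`);
    if (phase.tasks.length === 0) problems.push(`${label}: no tasks listed`);
    if (phase.exit.trim() === "") problems.push(`${label}: missing exit criteria`);
    if (!COMMIT_PATTERN.test(phase.commitMessage)) {
      problems.push(`${label}: commit message is not in Conventional Commits form`);
    }
  });
  return problems;
}
```

A phase with no exit criteria is the one to worry about most: without it, neither you nor Claude Code has a definition of done to verify against at the boundary.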
Your Safety Net: Git Checkpoints
This can't be stressed enough: every single phase boundary must culminate in a Git commit. Not just when you feel like things are "done," and certainly not only at the end of the entire session. Commit after *each* phase. First, set up your feature branch, as you normally would:

# Before starting, create a feature branch
git checkout -b feat/add-auth
# After completing Phase 1, instruct Claude Code:
Once Phase 1 wraps up, your instruction to Claude Code should explicitly trigger that commit. This creates a rock-solid rollback target, guaranteeing you a stable state if the next phase goes sideways.

Working with advanced AI coding assistants like Claude means you're not just writing prompts; you're essentially programming a collaborative partner. This isn't about throwing code at a black box and hoping for the best. It's about establishing clear guardrails, providing consistent context, and knowing when to reset. If you're serious about integrating these tools into your workflow, you need a strategy that goes beyond simple instruction sets.
Establishing Your AI's Persistent Memory: CLAUDE.md
Here's the thing: large language models operate within a context window, which is great for short bursts but terrible for long-term project memory. That's why `CLAUDE.md` is, without exaggeration, your most important production file when pairing with Claude Code. It's the mechanism for persistent memory that sticks around across sessions. Claude automatically reads it every time you kick off a session, making it foundational for code quality and production stability.

What Belongs in CLAUDE.md
Think of `CLAUDE.md` as a living documentation for your AI, distilled to its most potent form. It should contain only what's absolutely essential for Claude to operate effectively and consistently. This includes a concise project architecture overview (don't go exhaustive here, just the high-level stuff). Critically, it needs coding conventions you *actually* enforce, key file paths, and explicit anti-patterns you want to avoid. Consider this structure:

# CLAUDE.md
## Project Overview
Express API with PostgreSQL. TypeScript strict mode. Node 20.
Monolith structure, migrating toward modular services.
## Architecture
- src/routes/ — Route handlers, thin layer, delegate to services
- src/services/ — Business logic, one file per domain entity
- src/models/ — Sequelize models, no raw SQL outside migrations
- src/middleware/ — Express middleware, each file exports one function
- tests/ — Jest integration tests, mirror src/ structure
## Coding Conventions
- All functions must have explicit return types (no inferred returns)
- Use named exports, never default exports
- Error handling: throw AppError from src/utils/errors.ts, never raw Error
- Async routes must be wrapped with asyncHandler middleware (see src/middleware/asyncHandler.ts or a library such as `express-async-errors`)
- Database access only through model methods, never direct queries in routes
## Constraints
- NEVER modify src/config/database.ts — managed by platform team
- NEVER add dependencies without listing them in the commit message
- Always run `npm run lint && npm run test` before suggesting a commit
- Use existing patterns in adjacent files as templates for new code
## Current Task Context
<!-- DO NOT POPULATE — inject via session prompt only -->
Every single line here serves a distinct purpose. The architecture section prevents the AI from making structural assumptions or misplacing files. Explicit conventions, like requiring explicit return types or using named exports, prevent style drift and ensure the generated code aligns with your team's standards. Constraints act as hard boundaries, blocking dangerous modifications, like touching `src/config/database.ts`. The `Current Task Context` section is a clever placeholder, signaling that per-task instructions belong in the prompt, not here, using a comment to avoid stale, conflicting instructions.
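Because the file only helps while it keeps this shape, a small structural check can be worth running in CI or a pre-commit hook. The following is a sketch under stated assumptions: the required section names are taken from the sample file above, and the function names are hypothetical. It checks only that the sections exist and that `Current Task Context` contains nothing but comments.

```typescript
// Section headings from the sample CLAUDE.md above; adjust to your own layout.
const REQUIRED_SECTIONS = [
  "Project Overview",
  "Architecture",
  "Coding Conventions",
  "Constraints",
];

// Splits a markdown document into its "## " sections, keyed by heading text.
function splitSections(markdown: string): Map<string, string> {
  const sections = new Map<string, string>();
  let current: string | null = null;
  for (const line of markdown.split("\n")) {
    const heading = /^##\s+(.+?)\s*$/.exec(line);
    if (heading) {
      current = heading[1];
      sections.set(current, "");
    } else if (current !== null) {
      sections.set(current, (sections.get(current) ?? "") + line + "\n");
    }
  }
  return sections;
}

// Reports missing sections and a populated "Current Task Context" block.
function checkClaudeMd(markdown: string): string[] {
  const sections = splitSections(markdown);
  const problems: string[] = [];
  for (const name of REQUIRED_SECTIONS) {
    if (!sections.has(name)) problems.push(`missing section: ${name}`);
  }
  const taskContext = sections.get("Current Task Context");
  if (taskContext !== undefined) {
    // Strip HTML comments; anything left is per-task text that may go stale.
    const withoutComments = taskContext.replace(/<!--[\s\S]*?-->/g, "").trim();
    if (withoutComments !== "") {
      problems.push("Current Task Context should stay empty");
    }
  }
  return problems;
}
```

The check is deliberately shallow. It cannot tell whether a convention is one you actually enforce, only whether someone has quietly started using the file as a scratchpad for task instructions.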
What to Keep Out
Just as important as what goes in `CLAUDE.md` is what absolutely *doesn't*. Overloading this file sabotages its effectiveness. There are three clear culprits:

* **Verbose documentation dumps:** Copying an entire API reference or a sprawling design doc is counterproductive. It wastes precious context tokens on information Claude likely doesn't need, drowning out the truly important constraints. The goal is concise guidance, not an encyclopedia.
* **Generic style guides:** "Write clean code" or "Follow SOLID principles" might sound good, but for an AI, they're too abstract to be useful. These vague directives give Claude too much room to interpret, often leading to code that's technically "clean" but not aligned with your specific patterns. Be concrete: "All functions must have explicit return types," rather than "Ensure type safety."
* **Per-task information:** Requirements that change with every task or feature don't belong here. Embedding current instructions directly into `CLAUDE.md` means you'll be constantly editing it, risking stale information that conflicts with your fresh prompts. Keep task-specific details in the session prompt.

Monorepos and Layered Guidelines
For those working in monorepo environments, Claude Code offers a smart approach to `CLAUDE.md` resolution. It handles these files hierarchically. A `CLAUDE.md` at the root of your project applies across the entire codebase. Subdirectory `CLAUDE.md` files then add specific instructions for individual packages or modules. It’s a powerful way to manage complex guidelines without unnecessary duplication. (Just a heads up: always verify the precise merge behavior in the current Claude Code documentation, as how these rules layer might evolve between versions.) Think of it like this:

my-monorepo/
├── CLAUDE.md                  # Root: shared conventions
├── packages/
│   ├── api/
│   │   ├── CLAUDE.md          # API-specific: "Use Express patterns.
│   │   │                      # All routes must use asyncHandler.
│   │   │                      # Test with supertest."
│   │   └── src/
│   ├── web/
│   │   ├── CLAUDE.md          # Web-specific: "React 18 with TypeScript.
│   │   │                      # Use function components only.
│   │   │                      # State management via Zustand.
│   │   │                      # Never use CSS-in-JS, use CSS modules."
│   │   └── src/
│   └── shared/
│       ├── CLAUDE.md          # Shared: "This package has zero runtime
│       │                      # dependencies. All exports must be
│       │                      # tree-shakeable. No side effects."
│       └── src/
Your root `CLAUDE.md` lays down the universal laws – TypeScript strict mode, named exports, global error handling. Then, each package gets its own specific `CLAUDE.md` to define constraints unique to its tech stack and boundaries. This prevents, say, your API package's Express-specific conventions from bleeding into your web package's React context. It's about giving Claude just the right amount of information, precisely when it needs it.
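The layering idea can be sketched as a lookup: for a given file, which `CLAUDE.md` files sit in one of its ancestor directories, from the root down? The function below is purely an illustration of that concept under simple assumptions (POSIX-style paths relative to the repo root, with `""` standing for the root). It is not Claude Code's actual resolution logic, which you should confirm in the current documentation.

```typescript
// Given the directories that contain a CLAUDE.md and a target file path,
// returns the CLAUDE.md files in ancestor directories, ordered root-first.
function applicableGuidelines(dirsWithClaudeMd: string[], targetFile: string): string[] {
  const matches = dirsWithClaudeMd.filter(
    (dir) => dir === "" || targetFile.startsWith(dir + "/"),
  );
  // Fewer path segments means closer to the root.
  matches.sort((a, b) => depth(a) - depth(b));
  return matches.map((dir) => (dir === "" ? "CLAUDE.md" : `${dir}/CLAUDE.md`));
}

// Number of path segments in a directory; the root ("") has depth 0.
function depth(dir: string): number {
  return dir === "" ? 0 : dir.split("/").length;
}
```

With the tree above, a file under `packages/api/src/` picks up the root file and then `packages/api/CLAUDE.md`, and never sees the web or shared guidelines. Matching on `dir + "/"` rather than the bare prefix keeps a sibling such as `packages/apiary` from inheriting the API rules.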
Managing Iterations: When to Reset or Roll Back
Even with the best `CLAUDE.md` file, AI-driven development isn't always linear. You'll need clear checkpoints and a strategy for when things go off the rails. This isn't just about code; it's about managing your interaction with the AI.

Confirming Your Changes (and Preventing Regressions)
Before any major step, you need verification. This workflow isn't just about getting code; it's about preventing an unknown state. After a development phase, don't just blindly commit. First, run `npm run build` to ensure everything compiles correctly. Then, run `git status` to get a clear picture of what files have changed. You need to verify those changes against what you intended.

Phase 1 is complete. Before moving to Phase 2:
1. Run `npm run build` to verify compilation
2. Run `git status` so I can review what has changed
3. Wait for my approval before staging or committing
Only then should you selectively stage the files you *intended* to change. It's easy to accidentally include unintended files, which can cause subtle bugs or merge conflicts down the line. If you spot anything outside your intended scope, use `git restore --staged` on those paths to unstage them before committing.

Stage only the files listed in Phase 1's scope, then commit with message "feat(auth): add JWT dependencies and auth config" and show me the output of `git log --oneline -1` to confirm.
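Spotting out-of-scope changes by eye gets harder as the file count grows, so it can help to compare the changed paths against the phase's expected list. The sketch below parses the output of `git status --porcelain` (version 1 format: two status characters, a space, then the path, with renames written as `old -> new`). The function name and the allow-list are hypothetical, and the parser is a simplification: it does not handle quoted paths containing special characters, and untracked directories show up as a single `dir/` entry.

```typescript
// Returns changed paths that are not in the phase's allowed list.
function outOfScopeChanges(porcelain: string, allowedPaths: string[]): string[] {
  const allowed = new Set(allowedPaths);
  const unexpected: string[] = [];
  for (const line of porcelain.split("\n")) {
    if (line.length < 4) continue; // skip blank or malformed lines
    const rest = line.slice(3);
    // For renames, the destination path is what ends up in the commit.
    const arrow = rest.indexOf(" -> ");
    const path = arrow === -1 ? rest : rest.slice(arrow + 4);
    if (!allowed.has(path)) unexpected.push(path);
  }
  return unexpected;
}
```

For Phase 1, the allow-list would be something like `package.json`, `package-lock.json`, and `src/config/auth.ts`; a modified `src/config/database.ts` in the result is exactly the kind of surprise worth catching before the commit.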
Knowing When to Hit the Reset Button
Sometimes, a session with an AI goes sideways. It loses context, gets stuck in a loop, or simply misunderstands core requirements. In these situations, attempting to repair it often wastes more time than starting fresh. You need clear criteria for when to terminate a session:

1. **Context Loss:** If Claude has clearly lost track of more than one core requirement from your original phase plan, it's time to reset. Trying to re-explain everything mid-session rarely works well.
2. **Looping Behavior:** If Claude gets stuck on the same problem, attempting the same non-solution three or more times without progress, pull the plug. It's not going to suddenly have an epiphany.

When you do decide to restart, transferring context efficiently is paramount. Copy your phase plan into the new session, marking all completed phases as done. Crucially, add a concise, one-paragraph summary of the current state of your project.

Phases 1-3 are complete and committed. Starting fresh for Phase 4.
Current state: User model, registration, and login endpoints are working.
The JWT token includes userId and email claims with 24h expiry.
Branch: feat/add-auth, latest commit: a3f7b2d (replace with the actual output of `git log --oneline -1`).
Proceed with Phase 4 as defined in the plan.
This gives the new session a clean slate, providing all the necessary information without the accumulated "noise" that bogged down the previous session. It's a surgical approach to context management.
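The two restart criteria and the handoff message are mechanical enough to capture in a couple of helpers. This is a minimal sketch: the function and parameter names are hypothetical, the counts are ones you tally yourself while watching the session, and the template simply reproduces the handoff text above.

```typescript
// Restart when more than one requirement has been lost, or when the same
// failed approach has been attempted three or more times.
function shouldRestart(lostRequirements: number, repeatedAttempts: number): boolean {
  return lostRequirements > 1 || repeatedAttempts >= 3;
}

// Builds the handoff text for a fresh session. Pass the real branch name and
// the hash reported by `git log --oneline -1`; nothing here queries Git.
function buildHandoff(
  completedPhases: number,
  stateSummary: string,
  branch: string,
  latestCommit: string,
): string {
  const nextPhase = completedPhases + 1;
  const done =
    completedPhases === 1
      ? "Phase 1 is complete and committed."
      : `Phases 1-${completedPhases} are complete and committed.`;
  return [
    `${done} Starting fresh for Phase ${nextPhase}.`,
    `Current state: ${stateSummary}`,
    `Branch: ${branch}, latest commit: ${latestCommit}.`,
    `Proceed with Phase ${nextPhase} as defined in the plan.`,
  ].join("\n");
}
```

Keeping the summary to what is already committed matters more than the exact wording: the new session should inherit facts you can verify in the repository, not the previous session's half-finished reasoning.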