The Worst Technical Debate in Software
Every engineering team eventually has it. The legacy codebase is slowing them down. Features take three times longer than they should. Every bug fix introduces two new bugs. A developer describes the existing code as "unmaintainable." Someone suggests a rewrite.
The rewrite debate is dangerous because both sides are usually right about the symptoms and wrong about the cure.
Why Rewrites Fail
Joel Spolsky called it the single worst strategic mistake a software company can make. He wasn't wrong about the mechanism, even if the conclusion is too absolute.
Rewrites fail because:
You don't know what the existing code actually does. A messy codebase accumulates edge cases, business rules, and workarounds that aren't documented anywhere. The original developer is gone. Users depend on behavior that wasn't designed - it just emerged. A rewrite that doesn't replicate this behavior will introduce "bugs" that are actually missing features.
Rewrites take far longer than estimated, often 3–5x. Every rewrite we've inherited started with a 3-month estimate. None took less than 8 months. One took 26 months and was eventually abandoned - the startup ran out of runway while users were still stuck on the old system, unable to upgrade.
You're building the old product, not the better one. By the time a rewrite is complete, requirements have changed, competitors have shipped new features, and the rewritten version is already partially obsolete.
Why Incremental Refactoring Fails
The alternative - refactor incrementally, never rewrite - also fails in certain conditions.
Some codebases have a bad foundation that incremental changes can't fix. If the data model is fundamentally wrong, all the incremental refactoring in the world won't address the underlying architectural problem. You end up polishing code that still can't be built upon.
Open-ended refactoring creates debt in the process, not just the code. Teams that are "always refactoring" often have no clear endpoint. Six months in, the codebase is 20% better and 100% more complex (new patterns mixed with old patterns), and team morale is lower than if they'd made a clean decision.
Refactoring is invisible to stakeholders. "We're improving the codebase" is a hard thing to justify when the business wants new features. Rewrites, paradoxically, are sometimes easier to communicate: "We're building a new version that will let us ship features 3x faster."
The Framework: Four Questions
We ask four questions before making the rewrite/refactor call:
1. Is the problem the code or the design?
Bad code implementing a good design can be refactored. Bad code implementing a bad design almost always needs to be rewritten.
A "bad design" means: the data model doesn't reflect business reality, the architecture can't support current requirements (e.g., a single-server architecture that can't scale), or the technology choice is genuinely obsolete (not just unfashionable).
If the app was built on WordPress and it's now a complex multi-tenant SaaS with custom billing - that's a design problem. If the app is architecturally sound but the code is messy - that's a code problem.
2. What is the blast radius of the existing system's problems?
Does the existing codebase slow down one feature area or all feature areas? If the payments module is spaghetti but everything else is manageable, rewrite the payments module. Don't rewrite the whole app.
Score each major module: how much technical debt, how often does it cause issues, how central is it to new development? Rewrite only the modules that score highest on all three.
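The scoring step can be made concrete with a few lines of code. The sketch below is illustrative only: the 1-5 scale, the threshold, and the module names and scores are all assumptions, and "highest on all three" is read here as "at or above the threshold on every criterion", which is one reasonable interpretation among several.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleScore:
    name: str
    debt: int        # how much technical debt (1 = little, 5 = severe)
    incidents: int   # how often it causes issues (1 = rarely, 5 = constantly)
    centrality: int  # how central it is to new development (1 = peripheral, 5 = core)


def rewrite_candidates(modules, threshold=4):
    """Return names of modules scoring at or above the threshold on all three criteria."""
    return [
        m.name
        for m in modules
        if min(m.debt, m.incidents, m.centrality) >= threshold
    ]


# Hypothetical scores for illustration only.
modules = [
    ModuleScore("payments", debt=5, incidents=4, centrality=5),
    ModuleScore("reporting", debt=4, incidents=2, centrality=2),
    ModuleScore("auth", debt=2, incidents=1, centrality=5),
]

print(rewrite_candidates(modules))  # ['payments']
```

Taking the minimum of the three scores means a module with heavy debt that nobody touches (like the hypothetical "reporting" above) does not qualify, which matches the intent of rewriting only where all three pressures coincide.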
3. Do you have test coverage?
Refactoring without test coverage replaces known behavior with unknown behavior. If you don't have tests, write characterization tests first - tests that capture what the code does today, including the bugs and quirks. Then refactor.
A rewrite without test coverage of the existing system leaves you no way to verify that you've replicated all the behavior users depend on.
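As a rough illustration, a characterization test can look like the sketch below. `legacy_shipping_cost` is a hypothetical stand-in for undocumented legacy code, and its quirk (invalid weights ship for free) is invented for the example. The point is that the expected values are recorded from what the code does today, not from what a spec says it should do.

```python
import unittest


def legacy_shipping_cost(weight_kg, country):
    """Hypothetical stand-in for an undocumented legacy function."""
    if weight_kg <= 0:
        return 0.0  # quirk: invalid weights ship for free instead of raising
    base = 5.0 if country == "US" else 12.0
    return base + 1.5 * weight_kg


class ShippingCostCharacterization(unittest.TestCase):
    """Pins down what the code does today, not what it should do."""

    # Expected values were recorded by running the legacy code, not taken from a spec.
    RECORDED = [
        ((2, "US"), 8.0),
        ((2, "DE"), 15.0),
        ((0, "US"), 0.0),    # quirk preserved on purpose
        ((-1, "DE"), 0.0),   # quirk preserved on purpose
    ]

    def test_matches_recorded_behavior(self):
        for args, expected in self.RECORDED:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)
```

Saved to a file, this runs with `python -m unittest`. Once the refactored (or rewritten) function passes the same recorded cases, any quirk you decide to fix becomes a deliberate, reviewed change to the test rather than an accident.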
4. What is the runway?
Rewrites require time without visible user-facing progress. A startup with 6 months of runway cannot afford a rewrite - even if the rewrite is the right long-term call. A well-funded company with 3 years of runway and a mature user base might be able to absorb it.
If runway is short, refactor the most painful parts and defer the rest. Ship features that extend the runway, then reconsider.
The Strangler Fig Pattern
The best rewrites we've executed used the Strangler Fig pattern (named by Martin Fowler). The approach:
1. Build the new system alongside the old one
2. Redirect a small slice of functionality to the new system (e.g., new user registrations)
3. Gradually increase the scope of what the new system handles
4. When the new system handles everything, retire the old one
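The four steps above usually come down to a routing decision in front of both systems. The sketch below is one possible shape for it, not a description of any particular proxy or of our production setup: the `ROLLOUT` table, the route prefixes, and the percentage-based bucketing are all assumptions made for illustration.

```python
import hashlib

# Hypothetical rollout table: route prefix -> percentage of users sent to the new system.
ROLLOUT = {
    "/register": 100,  # fully migrated slice
    "/profile": 25,    # partially migrated
    # anything not listed stays on the old system
}


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in 0..99 so routing is consistent across requests."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_backend(path: str, user_id: str) -> str:
    """Return "new" or "old" for a request, based on the rollout table."""
    for prefix, percent in ROLLOUT.items():
        if path.startswith(prefix):
            return "new" if bucket(user_id) < percent else "old"
    return "old"


print(choose_backend("/register", "user-42"))  # new
print(choose_backend("/checkout", "user-42"))  # old
```

Raising a percentage widens the slice the new system handles, setting it back to 0 is the rollback, and the old system can be retired once every route sits at 100. Hashing the user ID rather than picking randomly keeps each user on the same backend between requests.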
This approach has several advantages:
- Users never experience a big-bang cutover
- You can validate the new system with real traffic before cutting over critical paths
- If the rewrite hits problems, you still have the old system running
- The team stays motivated - they're shipping to production throughout
We used this pattern for a healthcare platform migration - 340+ microservices moved from on-premises to AWS over 6 months, with zero downtime and no user-facing changes during the process.
Warning Signs That Your Refactor Has Become a Rewrite
- The "refactoring" PR touches 40+ files with no new features
- You're changing the data model and migrating existing data
- The branch has been open for more than 3 weeks
- Other feature work is blocked waiting for the refactor to land
- You've introduced a new framework alongside the old one
These are signs you've crossed the refactor threshold into rewrite territory - without the planning, scoping, or stakeholder alignment that a rewrite requires.
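If you want these checks to surface automatically, say in a weekly report, they can be encoded directly. The sketch below simply mirrors the list above. The `BranchStatus` fields are hypothetical and would need to be filled in from your own version control and project tracking data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BranchStatus:
    # All fields are hypothetical inputs gathered from your own tooling.
    files_changed: int
    adds_features: bool
    opened_on: date
    changes_data_model: bool
    blocks_other_work: bool
    adds_new_framework: bool


def rewrite_warning_signs(status: BranchStatus, today: date) -> list[str]:
    """Return the warning signs that apply to a 'refactoring' branch."""
    signs = []
    if status.files_changed >= 40 and not status.adds_features:
        signs.append("40+ files touched with no new features")
    if status.changes_data_model:
        signs.append("data model change with data migration")
    if (today - status.opened_on).days > 21:
        signs.append("branch open for more than 3 weeks")
    if status.blocks_other_work:
        signs.append("other feature work is blocked")
    if status.adds_new_framework:
        signs.append("new framework introduced alongside the old one")
    return signs


status = BranchStatus(
    files_changed=55,
    adds_features=False,
    opened_on=date(2024, 1, 1),
    changes_data_model=True,
    blocks_other_work=False,
    adds_new_framework=False,
)
print(len(rewrite_warning_signs(status, today=date(2024, 1, 30))))  # 3
```

The function returns the list of matching signs rather than a verdict. How many signs it takes to call something a rewrite is a judgment the team still has to make.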
Our Recommendation
Default to refactoring. Rewrites almost always cost more and take longer than their initial estimates. But if the answer to Question 1 is "the design is wrong" and the answer to Question 4 is "we have sufficient runway," a targeted rewrite of the affected modules - using the Strangler Fig pattern - is often the right call.
The most dangerous position: a team that's been "refactoring" for 12 months with no clear end in sight and no user-visible progress. Make a decision, commit to it, and move.