Refactor or start over?

The refactor vs rewrite question

“Why don’t we just scrap all of it and start over?  This legacy code blows.” – pretty much every developer at least once.

[Image: RefactorvRewrite.jpg.  Credit to bonkersworld.net]

Has this question ever come up from one of the engineers on your team?  Many times, it is brought up by one of your best engineers.  It will certainly be brought up by most of your younger engineers.  It’s a fair question.  Especially from a developer’s perspective.  Let’s look at why:

  • Legacy code is notoriously difficult to learn
    • It makes ramping up new developers difficult
  • Legacy code is hard to maintain
    • Very often the devs who wrote it originally are no longer there
    • It’s very rare that you find good comments in legacy code
  • Legacy code is often a mess
    • The comment thing again
    • Trying to wade through functions hundreds of lines long that don’t seem to have much of a point
    • It rarely follows the code standards the current teams follow
  • A lot of legacy code feels hopelessly outdated
    • This is where you find things like custom string types and pointers that aren’t thread safe
    • After the first forays into legacy code, many younger devs will come out of it as if they just walked in on their parents having sex –  baffled, disgusted and with a tinge of confused wonder.

Were all past devs really that bad?

Having been a developer for many years, I find it hard to miss the pervasive attitude that all developers not on the current team pretty much suck.  I contributed to that attitude when I was the one fixing ‘other people’s messes’.  It’s also really easy to point at old code and blame all of your current problems on it.  It’s certainly easier to point the finger at someone who is no longer there than to accept that you and your team are probably the ones at fault.  You and your team are certainly the ones that have to deal with it.

[Image: RefactorvRewriteDilbert]

Try to nip this behavior in the bud when you can.  Not all of the developers who came before you suck.  Chances are they were put in the same situations you are, where they had to sacrifice the right way of doing something to deliver on a deadline.  Or the technologies that you are so familiar with now were new to them at the time.  One of the best ways to nip this in the bud is to say something along the lines of: ‘One day, the code you’re writing now will be legacy code too!  Remember that some snotty-nosed kid will be making the same comments about you.’

Another reason to change this behavior is to eliminate a culture of finger pointing.  You want your devs to be taking responsibility for the whole code base.  They might not have made the messes that exist in the legacy code but they own it now.  Blaming someone else doesn’t fix these problems.

How to answer the refactor vs rewrite question

Once the team accepts that they own the mess, you will hear the question more often.  Why don’t we just scrap it all and start over?  This question is a trap.  Be very careful how you answer it.

Why not give in to the devs and start over?  There are normally a ton of business reasons why you can’t do this.  You have to get the next product out the door so you don’t have the runway to start over.  Engineers understand this but the ivory tower dev mind in each of us has a hard time accepting the business reason alone.  The question becomes, if you did have the runway to scrap everything and start over, would it be a good idea then?

The first time I managed a team of developers, I wanted to answer this question with more than just the business reasons.  I knew that scrapping all previous code seemed like a bad idea but I struggled to articulate why.  So I went on the Google and ended up consulting the seer of Stack Exchange, Joel Spolsky.  If you haven’t checked out JoelOnSoftware – https://www.joelonsoftware.com/ , start reading his blog now.  His posts are insightful and incredibly well written.

Joel had the best answer that I’ve heard to this question and he used Netscape as an example.  His answer: the biggest benefit of legacy code is that it is production tested.  In many cases, customers have spent thousands, if not millions, of hours pounding on that software.  All of those crazy little subroutines whose purpose you can’t figure out were built to fix a defect or to accommodate a workflow that you would never have thought of in a million years.  What you naively think looks like a legacy code mess is actually a fount of knowledge gathered from many user years of production testing.

Joel then goes on to tell the story of Netscape.  Netscape was the top web browser in the mid-nineties.  Netscape made the wrong choice.  They decided that with everything they had learned in their first versions of Netscape, they could start over and make a way better browser.  This was a disastrous decision.  They lost all of that production knowledge and eventually lost the browser wars because of it.  They started re-struggling with a bunch of problems that they had already solved in their previous versions and they never caught up to their competitors again.

The exceptions to the rule

It’s always ok to rewrite prototype code.  Prototype code doesn’t have any production knowledge behind it in most cases (at least it shouldn’t).  The purpose of prototype code is to research topics before they become production code.

In some cases it is also ok to rewrite modules of legacy code that can’t be optimized.  In most cases, these are prototypes or demos that had no business being in the production code base but somehow made it in anyway.  Be careful that this doesn’t become a convenient excuse though.  Devs will start calling all code prototype code if it means that they get to rewrite it.

The general rule of thumb for legacy code – always try to refactor before you rewrite.  If nothing else, this will force you and your team to try to understand what all of that production testing has to teach before you start tearing it down.
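
One way to make that understanding concrete is to pin down what the legacy code does today before changing any of it, a practice often called characterization testing.  The sketch below is only an illustration of the idea, not a prescribed process: legacy_format_name, refactored_format_name and the sample inputs are all hypothetical, made up for this example.  In a real code base the inputs would ideally come from production data.

```python
def legacy_format_name(raw):
    # Hypothetical legacy routine. Each odd-looking branch stands in for
    # an undocumented production fix of the kind described above.
    if raw is None:
        return ""
    name = raw.strip()
    if "," in name:
        # Some callers send "Last, First" instead of "First Last".
        last, first = name.split(",", 1)
        name = first.strip() + " " + last.strip()
    return " ".join(part.capitalize() for part in name.split())


def refactored_format_name(raw):
    # Hypothetical refactored version. It has to reproduce the legacy
    # behavior, quirks included, before it can replace the original.
    if raw is None:
        return ""
    parts = [p.strip() for p in raw.strip().split(",", 1)]
    ordered = " ".join(reversed(parts))
    return " ".join(word.capitalize() for word in ordered.split())


# Made-up sample inputs; real ones would be collected from actual usage.
SAMPLE_INPUTS = [None, "", "  ada lovelace ", "hopper, grace", "TURING,alan", "a,b,c"]


def characterize(func, inputs):
    # Record what the function does today, without judging whether it is right.
    return {repr(value): func(value) for value in inputs}


def find_regressions(old, new, inputs):
    # Map each input whose result changed to (legacy result, new result).
    expected = characterize(old, inputs)
    actual = characterize(new, inputs)
    return {
        key: (expected[key], actual[key])
        for key in expected
        if expected[key] != actual[key]
    }


if __name__ == "__main__":
    regressions = find_regressions(
        legacy_format_name, refactored_format_name, SAMPLE_INPUTS
    )
    print("regressions:", regressions)
```

If find_regressions comes back empty, the refactor has preserved everything the samples cover.  Anything it does report is a piece of production knowledge the team was about to throw away, which is exactly the conversation you want to have before anyone starts a rewrite.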