Clean Code

Sooner or later, every programmer encounters the term “clean code” in their career. And although most initially imagine it’s just “code aesthetics” that can be achieved with a formatter, in reality there is much more to it than that.

In this post we’ll examine what exactly it is, why it matters, and some of the fundamental ideas that can help us write clean code.

What is Clean Code?

If you google the term, you’ll most likely find references to the famous book “Clean Code” by Robert C. Martin, which is probably responsible for popularizing the term in the early 2000s. You can read the book if you want to take a deep dive into many of the ideas behind “clean code”, but here’s a simplified definition that I think captures the gist of it:

Clean Code is code that is easy to understand, maintain, and extend

Why does it matter?

For most programmers, the importance of writing clean code becomes self-evident soon after inheriting a legacy codebase that is a nightmare to work with. This realization comes from a simple fact: for all but the most trivial of projects, we read code much more than we write it. Because of this, it makes sense to spend additional time structuring our code to make it easier to understand by future maintainers (ourselves included!)

Writing clean code also matters because it can help reduce defect density: when code is simple and straightforward, it’s easier for our peers (and even ourselves) to spot mistakes in our code. This can mean a huge difference in quality and productivity for a project.

To better appreciate these ideas, let’s analyze a small function that violates several clean code principles and see how we can improve it.

What’s wrong with this function?

The following function determines if its argument is a valid IP address. Do you see anything wrong with it?

def isIPV4(txt):
    ints = txt.split('.')
    cnt = 0
    for i in range(0, len(ints)):
        try:
            v = int(ints[i])
            if v < 0 or v > 255:
                return False
        except Exception:
            return False
        cnt += 1
    if cnt == 4:
        return True
    return False

Idiomatic Code

The first thing to notice is that this function is not very Pythonic. For example, using camel case instead of snake case, or looping through ints by using an explicit index (a common idiom in C or C++ but not in Python).

Another issue is the use of non-standard abbreviations (e.g., cnt, txt), which forces readers to do unnecessary mental mappings. Let’s begin by fixing these minor issues and see the resulting code:

def is_ipv4(text):
    parts = text.split('.')
    count = 0
    for part in parts:
        try:
            v = int(part)
            if v < 0 or v > 255:
                return False
        except Exception:
            return False
        count += 1
    if count == 4:
        return True
    return False

Perhaps not much of a difference, but bear with me: the secret behind improving code in a reliable way (as Martin Fowler has described in his book Refactoring) lies in making many small, almost trivial transformations.

Separation of Concerns

One thing that makes code harder to read is mixing multiple concerns in a single block. This is a common mistake for novice programmers.

In our example, we can see that happening inside the for loop: it’s trying to parse each part into an integer, but it’s also tracking how many parts passed the octet validation. If we separate these two concerns, we can make the code easier to read:

def is_ipv4(text):
    # split into octets and verify that there are exactly four
    parts = text.split('.')
    if len(parts) != 4:
        return False

    # verify that each octet is valid
    for part in parts:
        try:
            v = int(part)
            if v < 0 or v > 255:
                return False
        except Exception:
            return False
    return False

We added some comments to organize our thoughts, but we should be able to get rid of them by writing code that is completely self-explanatory. In most cases, comments should be reserved to explain why we implemented something in a certain way, not what we did (redundant if we’ve picked good names) or how we did it (redundant if our logic is simple and straightforward.)

Expressive Code

Is it easy to immediately see what our function does? Sort of, but we can probably do better. After some examination, we can see that conceptually:

We’re splitting text into pieces,
Checking if there are four pieces, and
Checking if every piece is a valid octet.

Articulating the high-level logic of an algorithm is a very useful practice that can be distilled into the following principle:

Clean Code describes core logic at a single level of abstraction by suppressing detail

If we try to restructure our code to follow this principle, we get the following:

def is_ipv4(text):
    parts = text.split('.')
    if len(parts) != 4:
        return False
    return all(is_octet(part) for part in parts)

This code tells a shorter story and minimizes the details so we can focus on the big picture. This definitely is easier to grasp at a glance. We can even go one step further and simplify the code as follows:

def is_ipv4(text):
    parts = text.split('.')
    return len(parts) == 4 and all(is_octet(part) for part in parts)

This version reads almost like English prose and is quite easy to eyeball for logical errors. At this point, the implementation of is_octet should not really be that important (as long as it’s correct), but let’s add it for the sake of completeness:

def is_octet(part):
    return part.isdigit() and 0 <= int(part) <= 255

def is_ipv4(text):
    parts = text.split('.')
    return len(parts) == 4 and all(is_octet(part) for part in parts)

Notice how we avoided using a try/except block and used a more idiomatic way to test if a value lies in a range, thus simplifying the definition of is_octet substantially.

Some programmers might complain that this practice can pollute the namespace with many little helper functions. Although I find that the improved clarity more than offsets this potential drawback, the objection is easy to fix in Python (and other languages that support nested functions):

def is_ipv4(text):
    def is_octet(part):
        return part.isdigit() and 0 <= int(part) <= 255

    parts = text.split('.')
    return len(parts) == 4 and all(is_octet(part) for part in parts)

Conclusion

The resulting code is simpler and easier to read, but did we just make the code “prettier”? Is that all we gained by doing these refactorings? On the surface it may seem so, but let me state a couple of points on why I think this effort goes beyond making things “prettier.”

Readability improves programmer efficiency

In professional settings, we almost always read code way more than we write code. This is easy to see if you think about the times you’ve had to add a new feature to an existing codebase, and before you could do so you had to understand how the existing code worked.

The additional upfront cost we pay to make things simpler is dwarfed by the time we would otherwise have to spend making sense of tangled or poorly structured code.

Bugs are easier to spot in clean code

Whenever a piece of code is easy to grasp, it’s easier to spot mistakes in its logic. It’s also easier to see a mismatch between implementation and intent (e.g., when the function name says something but the implementation contradicts it.) In code that doesn’t adhere to clean code principles, this is much harder to do.

Programmer frustration decreases with clean code

Our motivation and ability to remain productive in a project are strongly correlated with the amount of frustration we face on a daily basis with the codebase (which is often not intrinsically difficult, it just happens to have been poorly structured.) Clean code practices can reduce this daily frustration, thus increasing our productivity and our happiness in the job.

Did we miss something?

Yes, there is one thing we didn’t mention throughout this exercise: as is explained in any good book on refactoring, having tests is a very important pre-condition to do refactorings with confidence.

In the example we considered, the transformations were more or less straightforward, but can we be sure that we didn’t alter the (presumably correct) workings of the original code?

Ex Machina