Wednesday, 13 August 2014

Guideline 9. Keep functions short and cohesive


A function should achieve its purpose by means of a cohesive structure. A function can be primarily structured in one of these ways:
  • A sequence of actions.
  • A condition.
  • A repetition of certain actions.
Do not combine these structures in a way which makes it hard to follow the code flow throughout the function, or that it makes it difficult to test.

A function should have a small cyclomatic complexity.


One function, one purpose

A function should have one specific purpose. It should exist for one single reason, and that is to achieve its purpose.

As a writer of a function body, you decide what strategy to use for the function to achieve its goal. It should be clear and simple. The reader of a function body should be able to follow your steps and understand what's going on.

A sequence, a conditional action, a loop

In one of the programming classes I took at the university, the teacher asked us how a function which accomplished a certain goal should be like. We hesitated, nobody spoke. Then the teacher said: "Come on, it isn't that hard. It must be either a sequence, a condition, or a loop. So which one is it?".

I don't remember which one it was. But that sentence stuck to my mind. A sequence, a conditional action or a loop. The structure of a function is always one of these.

But, you say, these control flow structures can be combined to form more sophisticated units. For example, a function may be a sequence of actions, with some of them being executed only under some conditions, and some other being loops. A certain degree of composition may be understandable and practical, but if you're not very strict soon you'll find yourself writing 50-line functions which lack any structure.

I've seen functions of more than 200 lines in production code, several times and in different projects. I've seen functions which should never have been written. Functions which never seemed to end their work, and which needed comments playing the role of section headers, as in: "// 4. Get data from the database", "// 4.1 Check database connection"...

Cyclomatic complexity

Software scientists have long attempted to measure the complexity of code. Cyclomatic complexity measures the number of linearly independent paths through a program's source code. I've never been proficient in actually calculating this metric - nowadays, automatic tools will do that job better than you. But in short it expresses that a function (a software element, in general) is harder to test the higher its cyclomatic complexity is.

If your function is a condition, you may need to test it in two different scenarios (paths): when the condition is true, and when it is false. If it contains four conditions, suddenly the scenarios become 2^4, that is, 16. If it is a combination of nested conditions with some loops spread out here and there, it will be a testing and maintenance nightmare. Some of these long and overly functions contain bugs ever since they begin to exist. Nobody has found them yet, because they are hidden in a path which requires five different conditions to be met. But they will appear, eventually.

"But if reality is complex, if requirements are complex... maybe all this complexity is necessary, and there's simply no way to escape from it - they asked me to contemplate all those different cases!", you say. You have your point. But at least remember that you should divide that compexity among different, simple software elements. The function is the smallest software element: you can't write unit tests for something smaller than a function, for example. Complex parts deserve to be in their own function, with clearly defined semantics and which can be tested independently.


[McCONNELL 2004] This book has an interesting section about complexity in software, 34.1: "Conquer Complexity" (pages 837-839), inside its valuable chapter 34: "Themes in Software Craftmanship".