Wednesday, 13 August 2014

Guideline 9. Keep functions short and cohesive

Guideline


A function should achieve its purpose by means of a cohesive structure. A function can be primarily structured in one of these ways:
  • A sequence of actions.
  • A condition.
  • A repetition of certain actions.
Do not combine these structures in a way that makes the code flow hard to follow or the function difficult to test.

A function should have a small cyclomatic complexity.

Discussion

One function, one purpose


A function should have one specific purpose. It should exist for one single reason, and that is to achieve its purpose.

As a writer of a function body, you decide what strategy to use for the function to achieve its goal. It should be clear and simple. The reader of a function body should be able to follow your steps and understand what's going on.

A sequence, a conditional action, a loop


In one of the programming classes I took at university, the teacher asked us what a function which accomplished a certain goal should look like. We hesitated; nobody spoke. Then the teacher said: "Come on, it isn't that hard. It must be either a sequence, a condition, or a loop. So which one is it?"

I don't remember which one it was. But that sentence stuck in my mind. A sequence, a conditional action or a loop. The structure of a function is always one of these.

But, you say, these control flow structures can be combined to form more sophisticated units. For example, a function may be a sequence of actions, with some of them being executed only under certain conditions, and others being loops. A certain degree of composition may be understandable and practical, but if you're not very strict, you'll soon find yourself writing 50-line functions which lack any structure.
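As a minimal sketch of this idea, the following code splits one task into functions that each have one primary structure. The task (averaging sensor readings), the valid range and all the function names are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// A condition: decides whether a single reading is usable.
// The valid range [0, 100] is an assumption made for this example.
bool isValidReading(double reading)
{
    return reading >= 0.0 && reading <= 100.0;
}

// A loop: counts the valid readings.
std::size_t countValidReadings(const std::vector<double>& readings)
{
    std::size_t count = 0;
    for (double reading : readings) {
        if (isValidReading(reading)) {
            ++count;
        }
    }
    return count;
}

// A loop: adds up the valid readings.
double sumOfValidReadings(const std::vector<double>& readings)
{
    double sum = 0.0;
    for (double reading : readings) {
        if (isValidReading(reading)) {
            sum += reading;
        }
    }
    return sum;
}

// A short sequence with one guard: combines the pieces above.
double averageOfValidReadings(const std::vector<double>& readings)
{
    const std::size_t count = countValidReadings(readings);
    if (count == 0) {
        return 0.0; // nothing to average
    }
    return sumOfValidReadings(readings) / static_cast<double>(count);
}
```

Each function here is short enough to read at a glance, and each one can be tested on its own.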

I've seen functions of more than 200 lines in production code, several times and in different projects. I've seen functions which should never have been written. Functions which never seemed to end their work, and which needed comments playing the role of section headers, as in: "// 4. Get data from the database", "// 4.1 Check database connection"...

Cyclomatic complexity

Software scientists have long attempted to measure the complexity of code. Cyclomatic complexity measures the number of linearly independent paths through a program's source code. I've never been proficient in actually calculating this metric - nowadays, automatic tools will do that job better than you. In short, the higher the cyclomatic complexity of a function (or of any software element), the harder it is to test.

If your function is a condition, you may need to test it in two different scenarios (paths): when the condition is true, and when it is false. If it contains four independent conditions, suddenly the scenarios become 2^4, that is, 16. If it is a combination of nested conditions with some loops spread out here and there, it will be a testing and maintenance nightmare. Some of these long and overly complex functions have contained bugs ever since they were written. Nobody has found them yet, because they are hidden in a path which requires five different conditions to be met. But they will appear, eventually.
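To make the counting concrete, here is a small sketch of a function with two independent conditions, which gives 2^2 = 4 paths to test. The function describeNumber is hypothetical and exists only for this illustration.

```cpp
#include <string>

// Two independent conditions, one after the other: 2 * 2 = 4 paths.
std::string describeNumber(int value)
{
    std::string description;

    // First condition: sign.
    if (value < 0) {
        description += "negative";
    } else {
        description += "non-negative";
    }

    // Second condition: parity. It does not depend on the first one.
    if (value % 2 == 0) {
        description += " even";
    } else {
        description += " odd";
    }

    return description;
}
```

Covering every path takes four inputs, for example -2, -3, 4 and 7. Note that the path count is not the same number as the cyclomatic complexity, which for this function would be 3 (two decisions plus one), but both grow with every condition you add.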

"But if reality is complex, if requirements are complex... maybe all this complexity is necessary, and there's simply no way to escape from it - they asked me to handle all those different cases!", you say. You have a point. But at least remember that you should divide that complexity among different, simple software elements. The function is the smallest software element: you can't write unit tests for something smaller than a function, for example. Complex parts deserve to be in their own function, with clearly defined semantics, so that they can be tested independently.

Bibliography


[McCONNELL 2004] This book has an interesting section about complexity in software, 34.1: "Conquer Complexity" (pages 837-839), inside its valuable chapter 34: "Themes in Software Craftsmanship".

Saturday, 28 June 2014

Guideline 8. Define variables as close as possible to where they are used

Guideline

Define variables as close as possible to their first use. Prefer variables with the most local scope possible.

Complementary guidelines:

Do not reuse a variable for two totally different things inside the same function body, just to "save some space". It kills readability and it is very error prone.

Do not use exactly the same name for two variables with inner block scopes inside the same function body. It may hurt readability and it is error prone.

Discussion


Context


In previous guidelines we have been discussing the innermost pieces of software construction in C++: function bodies. All the guidelines we have seen make sense within that scope. Later we'll see guidelines which refer to the structures to which functions belong, that is, the classes. Even later we may consider how a number of classes are organized in a software project.

Guideline #7 instructed you to initialize all variables at the point of their definition, i.e., variables should begin to exist with a known value, to prevent undefined behaviour. The present guideline is a further recommendation about when this (the variable definition and initialization) should happen.

But what is a variable definition, anyway? Is it the same as a variable declaration?

Variable definition and declaration


This great answer on the software Q&A site Stack Overflow clarifies the meanings of variable declaration and definition in C++. Quoting:

A declaration introduces an identifier and describes its type, be it a type, object, or function. A declaration is what the compiler needs to accept references to that identifier. These are declarations:
extern int bar;
extern int g(int, int);
double f(int, double); // extern can be omitted for function declarations
class foo; // no extern allowed for class declarations

A definition actually instantiates/implements this identifier. It's what the linker needs in order to link references to those entities. These are definitions corresponding to the above declarations:
int bar;
int g(int lhs, int rhs) {return lhs*rhs;}
double f(int i, double d) {return i+d;}
class foo {};
A definition can be used in the place of a declaration. (...)

I strongly recommend that you read the complete answer.

In short we can say that inside a function body, at run time, a variable begins to exist once the code execution reaches the point where it is defined - not the point where it is only declared.
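The following sketch illustrates the run-time side of this: a local object is created only when execution reaches its definition, and never if the enclosing block is skipped. The Tracer type and the traceDefinitions function are hypothetical helpers written for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: records the moment it is constructed.
struct Tracer {
    Tracer(std::vector<std::string>& log, const std::string& name)
    {
        log.push_back("construct " + name);
    }
};

// Returns the order in which things happened inside the function body.
std::vector<std::string> traceDefinitions(bool enterBlock)
{
    std::vector<std::string> log;
    log.push_back("start");

    Tracer outer(log, "outer"); // created here, after "start"

    if (enterBlock) {
        Tracer inner(log, "inner"); // created only if the block runs
        log.push_back("inside block");
    }

    log.push_back("end");
    return log;
}
```

Calling traceDefinitions(false) records "start", "construct outer" and "end". The object named inner never comes into existence in that call, because execution never reaches its definition.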

Keeping functions short and well structured benefits the readability of your code. A key factor in achieving this is limiting every concept to the exact scope where it belongs. Define, initialize and use each variable exactly where it is needed, not any earlier. By following this simple guideline you will write code which is more readable, contains fewer defects, and is easier to debug.

Refactoring a function body is one of the most common tasks in software engineering. What was before in one function could be separated into several functions in the future. If you limit the scope of each variable to exactly the needed one, you will make this process much easier.
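Here is a minimal sketch of what this looks like in practice. The function joinNames and its variable names are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins the given names into one string, separated by ", ".
std::string joinNames(const std::vector<std::string>& names)
{
    std::string result; // needed by the whole function, so defined here

    for (std::size_t index = 0; index < names.size(); ++index) {
        // Only the loop body needs this, so it is defined inside it.
        const bool isFirst = (index == 0);
        if (!isFirst) {
            result += ", ";
        }
        result += names[index];
    }

    return result;
}
```

Because index and isFirst live only inside the loop, the loop could later be moved into a function of its own without having to hunt for definitions at the top of the body.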

Complementary guidelines


If you have learnt the old C tradition of declaring all variables at the top of a function body, maybe it's time to consider abandoning it, in favour of a more compact and readable style.

If you have seen code in which a variable of a certain type is reused inside the function body for two or three different things, just to save some bytes, please forget about it. Use each variable for exactly one purpose - don't define variables which are never used, and don't use the same variable for several different things.

A local variable has only the scope of the braces which contain it. Because of this, a function may contain some inner block of code in which you define a variable, say, int i = 0;. Later in the same function you may have another block of code in which you are allowed to create another variable with the same type and name: int i = 0;. These are different variables, with totally disjoint scopes. This is perfectly correct. However, prefer not to do it. It is error prone and it hurts readability to have two different variables with the same name so close to each other. In fact, "i" is not a very expressive name - you should be able to come up with better names for both variables, names which say more about their meaning.
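The contrast can be sketched as follows. Both functions compile and return the same result; the TextStats type and the function names are hypothetical and serve only this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this example.
struct TextStats {
    std::size_t spaces;
    std::size_t digits;
};

// Legal C++, but the reader has to keep track of two unrelated
// variables which are both called i.
TextStats analyzeWithRepeatedName(const std::string& text)
{
    TextStats stats{0, 0};
    {
        std::size_t i = 0; // first i: counts spaces
        for (char c : text) {
            if (c == ' ') {
                ++i;
            }
        }
        stats.spaces = i;
    }
    {
        std::size_t i = 0; // a different i: counts digits
        for (char c : text) {
            if (c >= '0' && c <= '9') {
                ++i;
            }
        }
        stats.digits = i;
    }
    return stats;
}

// Preferred: each variable has a name which says what it means.
TextStats analyzeWithExpressiveNames(const std::string& text)
{
    std::size_t spaceCount = 0;
    std::size_t digitCount = 0;
    for (char c : text) {
        if (c == ' ') {
            ++spaceCount;
        } else if (c >= '0' && c <= '9') {
            ++digitCount;
        }
    }
    return TextStats{spaceCount, digitCount};
}
```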

Bibliography

Books

[McCONNELL 2004] This book discusses variable initialization in section 10.3: "Guidelines for initializing variables" (page 242). Scope is discussed in section 10.4: "Scope" (page 244). And the title of section 10.8 speaks for itself: "Using each variable for exactly one purpose" (page 255).

[MEYERS 1998-1] Scott Meyers advises us to "Postpone variable definitions as long as possible" in Item 32 of this book (page 135).

[SUTTER-ALEXANDRESCU 2004] The C++ Coding Standards suggested in this book include Item 18, "Declare variables as locally as possible" (page 35).


Friday, 23 May 2014

Guideline 7. Always initialize variables

Guideline


Variables should be initialized with a known and meaningful value as soon as they are created. A variable should never begin to exist without having a known value.

Discussion


Initialize all local variables


There are three things you need to learn how to design well: functions, classes and projects. Initializing all variables is an issue in all of them.

  • In a function body, you need to initialize all local variables.
  • In a class, you need to initialize all member variables (static or nonstatic).
  • In a project, you need to initialize all global variables.

You may have noticed that these guidelines have been focusing on the function scope. For now, I will keep this focus and discuss only local variables in this guideline. I expect to discuss the design of classes and projects in future guidelines. However, it's worth noting that all variables need to be initialized, without exception.

The most common sign I've found that someone is a novice C++ programmer is uninitialized variables. You look at a function body and you see the definitions of many uninitialized variables, usually grouped at the beginning of the function. Sometimes they are assigned to later, sometimes they aren't used at all, and sometimes their uninitialized value is used, causing undefined behaviour.

This is bad. Very bad. It hurts readability and it is error prone. The risk of using the value of an uninitialized variable is never worth taking. Don't do it. Always initialize variables.

Initialization should be simple. If you need to do complicated things like asking the user for a value, you should do that in a separate operation and, if that succeeds, assign its result to your variable. Don't use lengthy, complex or difficult to predict operations in the variable initialization itself.
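One possible way to follow this advice is sketched below: the variable starts with a simple, known value, and the complicated operation lives in a separate function whose result is assigned only on success. The names tryReadInt and readIntOrDefault are hypothetical, and reading from a std::istream stands in for "asking the user".

```cpp
#include <istream>
#include <sstream>

// Hypothetical helper: the complicated operation, kept separate.
// Returns true and stores the number in value only if reading succeeds.
bool tryReadInt(std::istream& input, int& value)
{
    int candidate = 0;
    if (input >> candidate) {
        value = candidate;
        return true;
    }
    return false;
}

// The variable begins to exist with a known value; the result of the
// separate operation is assigned to it only when that operation succeeds.
int readIntOrDefault(std::istream& input, int defaultValue)
{
    int result = defaultValue; // simple initialization
    int entered = 0;
    if (tryReadInt(input, entered)) {
        result = entered;
    }
    return result;
}
```

With a std::istringstream holding "42", readIntOrDefault returns 42; with one holding "abc", it returns the default that was passed in.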

Example 1

Don't

int main()
{
   int i;
   double d;
}

The variables i and d now have indeterminate values. Your program does not have undefined behaviour, though, because it doesn't read their values at any time. It is written in bad style, but it has a well defined behaviour: it runs without producing any external effect.

The main function is special in that if you don't write a return statement for it, it implicitly returns 0 when execution reaches the end of its body. Your program takes advantage of this and, after running without any external effect, it ends with a return value from main of 0.

Let's see one variation in which the value of uninitialized variables is read:

#include <iostream>
using std::cout;

int main()
{
   int i;
   double d;
   cout << "i: " << i << " ";
   cout << "d: " << d << " ";
}

Great. Now your program has undefined behaviour. Are you satisfied?

You'd rarely write code such as this. But in some cases the use of an uninitialized variable may not be so obvious. For example, you could write f(d); in main, with f being defined in some included header file. At first sight you wouldn't know if you're getting into trouble by passing a still uninitialized variable to f, because you don't know if f uses the value of that variable. However, if you don't like getting into trouble, you'll avoid passing uninitialized variables to functions - and the best way to do that is by not having uninitialized variables in the first place.

Do

#include <iostream>
using std::cout;

int main()
{
   int i = 0;
   double d = 0.0;
   cout << "i: " << i << " ";
   cout << "d: " << d << " ";
}

I have modified your program so that it initializes i with an integer value, and d with a floating point value. Now you have a correct C++ program which sends the following to the standard output:

i: 0 d: 0

Isn't it beautiful?

Bibliography

See the page Bibliography for details of the referenced materials.

[McCONNELL 2004] This book discusses the initialization of variables in Section 10.3, "Guidelines for initializing variables".

[SUTTER-ALEXANDRESCU 2004] This book includes this recommendation as its rule 19, "Always initialize variables" (page 36).