Wednesday, 13 August 2014

Guideline 9. Keep functions short and cohesive

Guideline


A function should achieve its purpose by means of a cohesive structure. A function can be primarily structured in one of these ways:
  • A sequence of actions.
  • A condition.
  • A repetition of certain actions.
Do not combine these structures in a way which makes it hard to follow the code flow throughout the function, or that it makes it difficult to test.

A function should have a small cyclomatic complexity.

Discussion

One function, one purpose


A function should have one specific purpose. It should exist for one single reason, and that is to achieve its purpose.

As a writer of a function body, you decide what strategy to use for the function to achieve its goal. It should be clear and simple. The reader of a function body should be able to follow your steps and understand what's going on.

A sequence, a conditional action, a loop


In one of the programming classes I took at the university, the teacher asked us how a function which accomplished a certain goal should be like. We hesitated, nobody spoke. Then the teacher said: "Come on, it isn't that hard. It must be either a sequence, a condition, or a loop. So which one is it?".

I don't remember which one it was. But that sentence stuck to my mind. A sequence, a conditional action or a loop. The structure of a function is always one of these.

But, you say, these control flow structures can be combined to form more sophisticated units. For example, a function may be a sequence of actions, with some of them being executed only under some conditions, and some other being loops. A certain degree of composition may be understandable and practical, but if you're not very strict soon you'll find yourself writing 50-line functions which lack any structure.

I've seen functions of more than 200 lines in production code, several times and in different projects. I've seen functions which should never have been written. Functions which never seemed to end their work, and which needed comments playing the role of section headers, as in: "// 4. Get data from the database", "// 4.1 Check database connection"...

Cyclomatic complexity

Software scientists have long attempted to measure the complexity of code. Cyclomatic complexity measures the number of linearly independent paths through a program's source code. I've never been proficient in actually calculating this metric - nowadays, automatic tools will do that job better than you. But in short it expresses that a function (a software element, in general) is harder to test the higher its cyclomatic complexity is.

If your function is a condition, you may need to test it in two different scenarios (paths): when the condition is true, and when it is false. If it contains four conditions, suddenly the scenarios become 2^4, that is, 16. If it is a combination of nested conditions with some loops spread out here and there, it will be a testing and maintenance nightmare. Some of these long and overly functions contain bugs ever since they begin to exist. Nobody has found them yet, because they are hidden in a path which requires five different conditions to be met. But they will appear, eventually.

"But if reality is complex, if requirements are complex... maybe all this complexity is necessary, and there's simply no way to escape from it - they asked me to contemplate all those different cases!", you say. You have your point. But at least remember that you should divide that compexity among different, simple software elements. The function is the smallest software element: you can't write unit tests for something smaller than a function, for example. Complex parts deserve to be in their own function, with clearly defined semantics and which can be tested independently.

Bibliography


[McCONNELL 2004] This book has an interesting section about complexity in software, 34.1: "Conquer Complexity" (pages 837-839), inside its valuable chapter 34: "Themes in Software Craftmanship".

Saturday, 28 June 2014

Guideline 8. Define variables as close as possible to where they are used

Guideline

Define variables as close as possible to their first use. Prefer variables with the most local scope as possible.

Complementary guidelines:

Do not reuse a variable for two totally different things inside the same function body, just to "save some space". It kills readability and it is very error prone.

Do not use exactly the same name for two variables with inner block scopes inside the same function body. It may hurt readability and it is error prone.

Discussion


Context


In previous guidelines we have been discussing the innermost pieces of software construction in C++: function bodies. All the guidelines we have seen make sense within that scope. Later we'll see guidelines which refer to the structures to which functions belong, that is, the classes. Even later we may consider how a number of classes are organized in a software project.

Guideline #7 instructed you to initialize all variables at the point of their definition, i.e., variables should begin to exist with a known value, to prevent undefined behaviour. The present guideline is a further recommendation about when should this (the variable definition and initialization) happen.

But what is a variable definition, anyway? Is it the same as a variable declaration?

Variable definition and declaration


This great answer on the software Q&A site Stack Overflow clarifies the meanings of variable declaration and definition in C++. Quoting:

A declaration introduces an identifier and describes its type, be it a type, object, or function. A declaration iswhat the compiler needs to accept references to that identifier. These are declarations:
extern int bar;
extern int g(int, int);
double f(int, double); // extern can be omitted for function declarations
class foo; // no extern allowed for class declarations

A definition actually instantiates/implements this identifier. It's what the linker needs in order to link references to those entities. These are definitions corresponding to the above declarations:
int bar;
int g(int lhs, int rhs) {return lhs*rhs;}
double f(int i, double d) {return i+d;}
class foo {};
A definition can be used in the place of a declaration. (...)

I strongly recommend that you read the complete answer.

In short we can say that inside a function body, at run time, a variable begins to exist once the code execution reaches the point where it is defined - not the point where it is only declared.

It benefits the readability of your code that you keep functions short and well structured. To achieve this, It is a key factor that every concept is limited to the exact scope where it belongs to. Define, initialize and use each variable exactly where it is needed, not any earlier. By following this simple guideline you will write code which is more readable, contains less defects, and is easier to debug.

Refactoring a function body is one of the most common tasks in software engineering. What was before in one function could be separated into several functions in the future. If you limit the scope of each variable to exactly the needed one, you will make this process much easier.

Complementary guidelines


If you have learnt the old C tradition of declaring all variables at the top of a funcion body, maybe it's time to consider abandoning it, in favour of a more compact and readable style.

If you have seen code in which a variable of a certain type is reused inside the function body for two or three different things, just to save some bytes, please forget about it. Use each variable exactly once - don't define variables which are never used, and don't use the same variable for several different things.

A local variable has only the scope of the braces which contain it. Because of this, a function may contain some inner block of code in which you define a variable, say, int i = 0;. Later in the same function you may have a later block of code in which you will be allowed to create another variable with the same type and name: int i = 0;. These are different variables, which totally disjoint scopes. This is perfectly correct. However, prefer not doing it. It is error prone and it hurts readability to have two different variables with the same name so close to each other. In fact, "i" is not a very expressive name - you should be able to come up with better names for both variables, names which say more about their meaning.

Bibliography

Books

[McCONNELL 2004] This book discusses variable initialization in section 10.3: "Guidelines for initializing variables" (page 242). Scope is discussed in section 10.4: "Scope" (page 244). And the title of section 10.8 speaks for itself: "Using each variable for exactly one purpose" (page 255).

[MEYERS 1998-1] Scott Meyers advises us to "Postpone variable definitions as long as possible" in Item 32 of this book (page 135).

[SUTTER-ALEXANDRESCU 2004] The C++ Coding Standards suggested in this book include Item 18, "Declare variables as locally as possible" (page 35).

Web references


Friday, 23 May 2014

Guideline 7. Always initialize variables

Guideline


Variables should be initialized with a known and meaningful value as soon as they are created. A variable should never begin to exist without having a known value.

Discussion


Initialize all local variables


There are three things you need to learn how to design well: functions, classes and projects. Initializing all variables is an issue in all of them.

  • In a function body, you need to initialize all local variables.
  • In a class, you need to initialize all member variables (static or nonstatic).
  • In a project, you need to initialize all global variables.

You may have noticed that these guidelines have been focusing in the function scope. By now, I will keep this focus and I will discuss only local variables in this guideline. I expect to discuss the design of classes and projects in future guidelines. However, it's worth noting that all variables need to be initialized, without exception.

The most common sign I've found that someone is a novice C++ programmer are uninitialized variables. You look at a function body and you see the definitions of many uninitialized variables, usually grouped at the beginning of the function. Sometimes they are assigned to later, sometimes they aren't used at all, and sometimes their uninitialized value is used, causing undefined behaviour.

This is bad. Very bad. It hurts readability and it is error prone. The risk of using the value of an uninitialized variable is never worth taking. Don't do it. Always initialize variables.

Initialization should be simple. If you need to do complicated things like asking the user for a value, you should do that in a separate operation and, if that succeeds, assign its result to your variable. Don't use lengthy, complex or difficult to predict operations in the variable initialization itself.

Example 1

Don't

int main()
{
   int i;
   double d;
}

The variables i and d have now unspecified values. Your program does not have undefined behaviour, though, because it doesn't read their values at any time. It is written in bad style, but it has a well defined behaviour: it runs without producing any external effect.

The main function is special in that if you don't write a return statement for it, it implicitly returns 0 when it returns. Your program takes advantage of this and, after running without any external effect, it ends with a return value from main of 0.

Let's see one variation in which the value of uninitialized variables is read:

#include <iostream>
using std::cout;

int main()
{
   int i;
   double d;
   cout << "i: " << i << " ";
   cout << "d: " << d << " ";
}

Great. Now your program has undefined behaviour. Are you satisfied?

You'd rarely write code such as this. But in some cases the use of an uninitialized variable may not be so obvious. For example, you could write f(d); in main, with f being defined in some included header file. At first sight you wouldn't know if you're getting into trouble by passing a still uninitialized variable to f, because you don't know if f uses the value of that variable. However, if you don't like getting into trouble, you'll avoid passing uninitialized variables to functions - and the best way to do that is by not having uninitialized variables in the first place.

Do

#include <iostream>
using std::cout;

int main()
{
   int i = 0;
   double d = 0.0;
   cout << "i: " << i << " ";
   cout << "d: " << d << " ";
}

I have modified your program so that it initializes i with an integer value, and d with a floating point value. Now you have a correct C++ program which sends the following to the standard output:

i: 0 d: 0

Isn't it beautiful?

Bibliography

See the page Bibliography for details of the referenced materials.

[McCONNELL 2004] This book discusses the initialization of variables in Section 10.3, "Guidelines for initializing variables".

[SUTTER-ALEXANDRESCU 2004] This book includes this recommendation as its rule 19, "Always initialize variables" (page 36).

Wednesday, 16 April 2014

Guideline 6. Ensure entry and exit conditions in loops

Guideline


Ensure entry and exit conditions in loops. Write code  that guarantees that loops will be entered and exited correctly. Be aware of the possibility of an infinite loop and prevent it.

Discussion


More about loops


Guideline #5 was focused in discussing how for loops should be. But it also contained some discussion about choosing a loop structure. Let's review it:

A for loop makes sense when you repeat an action for a known number of times, or for all the elements of a known set. It makes sense when you have a loop counter - one variable which is initialized in the initialization, evaluated in the condition, and incremented (or decremented) by a constant amount in the expression. (...)
If you don't know the number of times or the total set of elements for which you're going to do something, and what's essential in your looping activity is a certain condition which will at some point interrupt it, you'd better chose one of the other two looping structures: a while or a do ... while. It is easy to choose between them: the former checks the condition beforehand and thus may never perform the repeated action (if the condition happens to be false at the beginning), and the latter executes the repeated action at least once and then checks the condition to see if it must keep repeating it, or just finish.

After this discussion the guideline set the focus back on the for loop. This is because the use of a specific variable called the loop counter, possibly combined with other loop control variables, introduces extra complexity and is a frequent source of defects in for loops.

However, what if we don't use a for loop in the first place? Isn't there a guideline to follow? Can't things go wrong in a while or a do ... while loop? Don't worry: They can.

There are at least two questions you should always ask yourself while you're writing code for a loop - and ask yourself again after writing it. Here they are:
  1. Will the loop body be entered at least once?
  2. Will the loop be exited?
Let's discuss them separately.

Entering the loop


If your loop is a do .. while, the first question has an obvious answer: yes. It will be entered at least once (as long as the execution point reaches it, of course). That's why you chose that structure in the first place.

If it is a while or a for, you should ask yourself that first question. Review the loop condition thoroughly and think about the possibility of it being false at the very beginning of the loop. In that case, the loop body would never be executed.

Is that possibility perfectly correct in the context of your function, or is it something you should avoid? If the latter is true, then you should write the specific code to handle that.

Exiting the loop


The loop will be exited when the condition is evaluated to false. The condition is evaluated once for each iteration. Will it eventually become false? Are you totally sure that this will happen? Can you guarantee it?

There are at least two possible sources of infinite loops that you should try to avoid. You will see them in the examples below.

Examples


Example 1


This is an infinite loop:

for (unsigned char c = 0 ; c < 256; ++c) {
    std::cout << c;
}

The range of the type unsigned char is from 0 to 255. In C++ addition, unsigned integer types (and unsigned char is an integer type, by the way) wrap around when they exceed their upper numeric limit, so if the value of c is 255, ++c "increments" it to 0. From there it will go up again, if you keep incrementing it, until it reaches 255, from where it will wrap around again to 0, in an infinite loop which would be too long to write completely.

This behaviour of the C++ arithmetic is neither "right" or "wrong" - it's just how the language is specified. You, as a programmer, must be aware of it and write your code so that it does the right things.

Always keep in mind the numeric limits of the types you use. The comparison c < 256 above violates the guideline against implicit type conversions (see Guideline #2), because 256 is obviously not a value of type unsigned char. It will probably cause a compiler warning, such as "Constant out of range in comparison" (and you should fix all compiler warnings), but anyway, you should be aware of the semantics of the types you use in the first place, including their limits.

Following Guideline #5 you will avoid another source of infinite loops: to play with the value of the loop counter inside the loop body. Do not modify the loop counter in the loop body, and it will eventually reach the limit value of the loop condition so that it can become false.

Example 2


int x = 0;
int y = 0;
bool sucess = false;
while (!success) {
    success = do(x, y);
    ++x;
}

Whatever the function do does, and whatever success means, this code doesn't look very safe. You will keep trying forever until something succeeds and do returns true. Are you sure there isn't another terminating condition? If you are, go ahead, but it is not a good idea to have a loop in which termination is not evident.

This looks a bit better:

int x = 0;
int y = 0;
bool sucess = false;
int iterations = 0;
const int max_iterations = 10;
while (!success && iterations < max_iterations ) {
    success = do(x, y);
    ++x;
}

A maximum number of retries, a second boolean variable which will obviously change its value at some point, even a timer... Anything is better than just waiting for a complex operation (creating something, finding something, etc.) to succeed one day. In short, your design should guarantee the termination of the loop, not just suggest it.

But I don't want it to end


If you actually want an infinite loop, because its purpose is to actually run forever or until some external event interrupts it, you should stress it right from the start.

while (true)
{
   ...
}

Using this style you avoid the confusion between the chosen loops designed to last forever, and the more run of the mill loops which are meant to come to an end.

Bibliography

See the page Bibliography for details of the referenced materials.

[McCONNELL 2004] This book discusses loops in Chapter 16: "Loops" (pages 367-389).

[MEYER 1997] One day we'll talk about the design by contract, which is developed in this book like nowhere else I know. Meyer introduces the concepts of loop invariant and loop variant in Chapter 11: "Design by Contract: Building reliable software", section 11.12 "Loop invariants and variants". These concepts are thetheoretic background which allows you to actually prove that your loops will come to an end at their due time.

Monday, 10 March 2014

Welcome to The Deep Blue C++

Waves at Getaria/Gu├ęthary, Aquitaine, France


During my thirteen years as a software engineer, I have learnt quite a lot of things - and I keep learning every day. I learn from practice, from discussions with workmates and from reading good books. I especially learn from mistakes: code that doesn't do what I expect, code that doesn't do anything at all, code that does it but with some occasional crash or two... If you're a programmer, you know what I'm talking about.

The region of my brain which stores this kind of knowledge is starting to be a bit crowded, but it still needs free space to learn new things. And my memory doesn't grow. What if I forget some useful bit of knowledge which I should apply tomorrow?

What I've learnt has substantially changed the way I write code. In my first years as a programmer I tried to avoid mistakes, but I hadn't made enough of them yet to discover the best ways to prevent them. Now I know better. I know there are some things that bring you into trouble very quickly. Other things will silently wait for the worst moment - maybe months or years later, to come to the surface in the form of even more original problems. Disorganized code, uninitialized variables, pointer arithmetics, implicit casts, syntactic overloading, inheritance in which functions hide other functions, too complex expressions, unclear function or class semantics... the list of things which you'd better avoid in the first place has become rather long during the years.

I decided to start a blog where I'll organize and share my experiences and ideas, so that I won't forget them. As a plus, you'll be able to read them and give me your feedback. I will appreciate any comments you leave in this place, and I'm sure I can learn a lot from them!

There are already many C++ blogs, often written by people who know much more about software and C++ than I'll ever do. But I think something is gained from my addition – those great blogs are still there anyway (I link to a few of them in my blog list), and my contribution may add some practical aspects which I feel may be useful and accessible, especially for novice programmers.

We are all beginners, in a way - when we start new projects, we are beginners in that specific new project, and we have the chance to make it good from the start. By following some guidelines, I think we notably increase our chances to do so. There are many things that can be done well from the very first time, and which are not so easy to fix later. The first name of this blog came naturally after this reasoning: "C++ from the start". However, I soon realized I didn’t love it. I was looking for something more expressive.

Software programs have two kinds of audience. They are written for a computer to execute them, but also for people to read and hopefully understand them in the future. The best programs are those which work well in both ways – the machine does its job well, and humans can make that work evolve into something which suits new needs.

We write software to change the world, because programs are run to change something. Their lines will be spelled tomorrow with effects that we cannot fully foresee. We rise waves which will go a long way until they reach a far away shore - one which our eyes may never see.

Let’s take a dive. Welcome to The Deep Blue C++!

Sunday, 9 February 2014

Guideline 5. "for" loops should be simple and well-formed

Guideline

for loops should be well-formed. This means:
  1. They should have a single loop counter, which shall be of integral type or an iterator.
  2. The loop counter should only be modified in the increment expression of the loop.
  3. The loop counter should be incremented or decremented in a constant amount at each loop iteration.
  4. The loop counter should be accompanied in the loop condition, if anything, by boolean variables.
  5. The loop body should be a compound statement, delimited by brackets.
  6. The loop body should not contain labels which are the destination of goto statements.
  7. The loop body should contain at most one break statement.
  8. Use continue with care, at best in the beginning of the loop body, to exclude certain iterations from the action.

Discussion

To loop or not to loop, that is the question

Loops are an important part of functions in any structured programming language. Actually, code has one of the following three structures: a sequence of actions, which are performed in the order you write them; an alternative, in which the action or actions to perform depend on a certain condition; and a repetition, which is expressed in the form of a loop. Hence, you will easily know when you need to loop: when you need to perform something several times, rather than just once. When there is repetition, there is a loop.

A loop - but which one?

Once you have decided that you need to loop, the next question which arises is how - in other words, which loop structure you will use, among the ones supplied by the language.

C++ has three loop instructions: for, while and do ... while. To choose wisely among them, you need to know what their differences are. You surely know the for syntax:

for (initialization condition; expression) statement

The first thing you need to know is that statement should actually be a compound statement, that is, a block of code, comprised between brackets: { ... }, with n single statements inside it. And, yes, each of them in its own line. This is called the body of the loop. In contrast, the initialization, condition and expression together are called the header of the loop.

initialization, in turn, is a single statement, finished with its own semicolon (;) which I did not write separately because, technically, it's part of the initialization itself.

You may of course use the comma operator inside the initialization statement, to combine several actions, but frankly, I don't recommend it. Let's keep things as simple and readable as possible.

So what's going on here?

The initialization is performed. Then, the condition is evaluated. Then, two things can happen. If the condition is false, the code jumps right after the for statement block. If instead it is true, the statement (the loop body) is executed. After it, the incrementing expression is evaluated. And then, again, the condition is evaluated and there you have the loop - this sequence of actions is repeated, until the condition is evaluated to false.

I said incrementing expression. Now we've reached the final answer. A for loop makes sense when you repeat an action for a known number of times, or for all the elements of a known set. It makes sense when you have a loop counter - one variable which is initialized in the initialization, evaluated in the condition, and incremented (or decremented) by a constant amount in the expression. This variable should be of an integer type, or better said, it should not be of a floating point type. Iterators (variables used to iterate through standard containers - one day we'll talk about the Standard Template Library) are also accepted as loop counters.

If you don't know the number of times or the total set of elements for which you're going to do something, and what's essential in your looping activity is a certain condition which will at some point interrupt it, you'd better chose one of the other two looping structures: a while or a do ... while. It is easy to choose between them - the former checks the condition beforehand and thus may never perform the repeated action (if the condition happens to be false at the beginning), and the latter executes the repeated action at least once and then checks the condition to see if it must keep repeating it, or just finish.

A for loop - but how?

Now let's suppose you have already chosen a for loop. You want to do something n times, with n known, or for each element of a certain set which you need to iterate through.

for loop is a practical, readable (once you get used to it) and terse construct, but you need to use it well. Because of its uncommon syntax, using it in a too imaginative way is not a good idea.

All parts of the for loop should be short and readable. Variable names should be chosen to make it easy to understand.

It you follow all the parts of the guideline above, you'll avoid most of the error-prone deviations in for loops. Let's repeat them here:
  • They shall have a single loop counter, which shall be of integral type or an iterator. Using a floating point variable as a loop counter is against the good practices of a C++ programmer. Using more than one counter is overcomplicating things - a while loop should be preferred in such case.
  • The loop counter shall only be modified in the increment expression of the loop. Don't play with the loop counter inside the loop body, because if you do your code will be playing "loop pinball": the reader will have to perform complicated calculations in his or her head just to know what's going on, and the number of times the loop is executed will not be obvious anymore. This is a total no-no.
  • The loop counter shall be incremented or decremented in a constant amount at each loop iteration. Again, don't play strange games with your loop counter. It's a counter, that's all. I'm already letting you increment or decrement it in quantities other than 1 (you should seldom do it, anyway), but incrementing it in a value which is not constant is just too much.
  • The loop counter shall be accompanied in the loop condition, if anything, by boolean variables. Imagine there's a special condition which terminates the loop (for example, finding one element), apart from the loop counter reaching its limit. This is a natural sophistication and understandable of a loop - if this additional variable is boolean. If it is not, the for loop becomes too complicated.
  • The loop body shall be a compound statement, delimited by brackets. I told you before. Never use single statements, omitting the brackets, in structured keywords, i.e. in conditions or loops. Doing so is just asking for errors. You'll have more bugs in your software just because you decided to overlook this advice. Don't do it.
  • The loop body shall not contain labels which are the destination of goto statements. Well, I'm sure you don't use goto that much, but if one day the idea of jumping with a goto to a label which is inside a code block which has an inner scope - you'd better abandon the idea right away. If that inner code block happens to be a loop body, well, you shouldn't even have had the idea in the first place. It's a particularly bad idea. As a bad idea, it's hard to beat, actually. Although we'll have the opportunity to discuss even worse ideas when we talk about syntactic overloading, this is certainly close to the bottom.
  • The loop body should contain at most one break statement. This is a mechanism to interrupt the loop early if a certain special condition is met in the middle of the loop body. For example, you were searching for something and you found it. Since this keyword adds complexity to the code flow, it's best to only use it once, and it's certainly best to avoid it altogether.
  • Use continue with care, at best in the beginning of the loop body, to exclude certain iterations from the repeated action. Again, this rule is aimed not to overcomplicate the flow of your code.
All these rules can be summarized in the following sentence: "A loop shouln't be complicated". But if for a good reason it needs to be so, then it shouldn't be a for loop. When there is complexity regarding the exit condition, or the initialization is not that simple, or the increment expression needs to be something more than just incrementing or decrementing a variable... Then change your loop to a while or a do ... while, which are inherently more readable and less error prone in these more difficult cases.

Bibliography

See the page Bibliography for details of the referenced materials.

[McCONNELL 2004] This book discusses loops in Chapter 16: "Loops" (pages 367-389).

Sunday, 26 January 2014

Guideline 4. Avoid fall-through in nonempty cases of "switch" statements

Guideline

Each case label in a switch statement shall be followed by one of the following:

  • A set of one or more statements, each in its own line, terminated by a break instruction.
  • Another case label.
  • A default label.
We may call "empty case" a case label which is followed by either another case label or  default. Empty here does not mean, of course, that no action is executed if the code flows to that label; it is just that the action executed is shared between several labels.

A "nonempty" case is one which is followed directly by a set of one or more statements. The guideline states that these statements shall always be terminated by a break instruction. These statements shall not lead to a fall through another case or default label. This is a legal possibility, of course, but this is why I am writing this guideline in the first place - if it weren't legal, the compiler would take care of it!

Discussion

A switch statement is a controlled jump. Every time the code execution encounters the keyword switch, it will jump to a different place. The question is where.

The answer is:

  • The execution will jump to the line after the case which is equal to the value of the expression in the switch, if there is one;
  • If there isn't one such case, but there is a default case, the execution will jump to the line after the default.
  • If there is none of the above (that is, no case equal to the value of the expression in the switch, and no default label), the execution will jump to the line after the closing bracked of the switch block.
It is interesting to note that this behaviour does not depend on the order of the case and default labels. default could be the first label to appear inside the switch block, some cases could go after it, and the language rules above would still hold. If there is a default , the execution jumps to it when no other case expression is equal to the switch expression, no matter where the other cases are located.

The lines with the case and default keywords are not instructions by themselves. Instead, they are just labels. They mark spots where the code execution may jump, depending on the value of the switch expression.

Ok. So what could go wrong?

One of the few things that could go wrong is that in C++, there is no implicit goto between the end of each case block and the end of the switch. This is good because it allows several cases to lead to the same action. This is done by putting several empty cases in consecutive lines, followed by a nonempty case. But it has one drawback, too: It also allows nonempty cases to fall through and continue executing below any further case labels they encounter, until a break is found or the switch block ends.

"What's wrong with that?", you say. "After all, this is how the language was defined. Every programmer must know how this kind of code works and write their switch blocks accordingly. If they're not aware of the need to write break to jump to the end of the switch block, it's their fault. I won't do such basic mistakes - if I fall through a case label it will be because I wanted to do so! If other people make mistakes when using a simple switch, what will they do with pointers?"

You are right. You are totally entitled to fall through case labels. I have no argument against that. You write correct C++ code and there is nothing I can obect to that.

Or is it?

Please listen to me for a minute. I will make one or two points and then you'll decide what you do with your switches.

First of all: Yes, I admit that you are confident and experienced in C++, and that you write correct code in this language. You never, ever forget to write a break, so whenever you use fall-through it's because you willingly choose to do so. But the next time I see one of them, how will I know it was you who wrote it? How will I know that it wasn't some beginner programmer who was confused about the need to write break and thought that his code would jump right to the end of the switch after the case was "finished", and forgot to test that particular part of the code? How will I, a humble reader, tell one unintended fall-through from your very well informed ones? I'll tell you how: In no way. I'll have to try and guess it from the context, or just trust that the existing code is correct in the absence of evidence to the contrary.

One more thing: Have you ever forgotten to pick your keys? I have, and I know perfectly how my door works. I know I won't be able to open it without them. Nevertheless, a few times in my life I forgot to pick the keys and I got locked out.

I know perfectly how switch works, but even so I have forgotten to write a break statement in certain occasions. I found out shortly after, because I tested the code and it didn't do what I expected. But nevertheless I'd like to prevent such simple mistakes in the future.

In case I haven't convinced you yet, let me quote this StackOverflow answer, which is itself quoting a designer of the C# language, who arguments against fall-through between cases being built into the language in the first place:
This implicit fall-through behavior is often used to reduce the amount of code needed and often isn't an issue the first time that code is written. However, as code moves from the initial development phase into a maintenance phase, the code above can lead to subtle errors that are very hard to debug. These errors result from the very common mistake of the developer adding a case, yet forgetting to put a break at the end of the block.
In C#, the switch statement requires that explicit flow control occur at the end of a case, either a break, goto, return, or throw. If the developer desires fall-through semantics, it can be achieved by an explicit goto at the end of a case statement.
There is not one single truth, except maybe for the fact that you need to know the syntax of your language and use it correctly. But I am strongly inclined to treat C++ as if it was C# in this particular item: do not use  fall-through, and always use break at the end of a nonempty switch statement. This will make your C# and C++ switch statements look the same, and behave in the same way - a robust, less error prone way.

Example

Don't

If write a switch statement with fallthrough, such as:

switch (option)
{
  case 1:
  case 2:
  case 3:
    // do something
  case 4:
  case 5:
    // do something else
    break;
}

You'll leave future readers of your code intrigued as to whether you forgot to write a break after the "do something". It is good that you solve that doubt, just in case they don't succeed at inventing a time machine so that they can come to ask you. 

Do

You could at least add a comment stating that your fallthrough is intended:

switch (option)
{
  case 1:
  case 2:
  case 3:
    // do something
    // FALLTHROUGH
  case 4:
  case 5:
    // do something else
    break;
}

This violates my guideline, but at least makes it clear that you knew what you were doing and that you didn't just forget the break .

A better option is to not use fallthrough at all:

switch (option)
{
  case 1:
  case 2:
  case 3:
    // do something
    // do something else
    break;
  case 4:
  case 5:
    // do something else
    break;
}

Of course, "do something else" should better be a very simple instruction or a function call, because you're now repeating that instruction. If it was originally a set of eight instructions, it is not a good idea to just copy and paste them in the above case, because that introduces code duplication. If it is just a trivial line or a function call, it's ok.

While we're at it, it's not a bad idea to always put a default statement:

switch (option)
{
   case 1:
   case 2:
   case 3:
      // do something
      // do something else
      break;
   case 4:
   case 5:
      // do something else
      break;
   default:
      break;
}

When a new case 6 comes, the more clear, readable and well structured is your switch statement, the less likely you will be to make a mistake.

Given that this is the best way to write a switch statement, why not build it like this from the beginning? For example, you could start by writing:

switch (option)
{
   case 1:
   case 2:
   case 3:
   case 4:
   case 5:
   default:
}

Then consider which cases should be grouped, and divide them by adding the break statements:

switch (option)
{
   case 1:
   case 2:
   case 3:
      // action for 1, 2, 3
      break;
   case 4:
   case 5:
      // action for 4, 5
      break;
   default:
      break;
}

And in a third step, add the instructions before each break, so that your actually does something. If you get used to write switch blocks this way, you'll minimize your mistakes and, more importantly, you will generate readable and well structured code.

Bibliography

[McCONNELL 2004] This book discusses switch statements in Chapter 15: "Using conditionals", Section 15.2 "case statements" (pages 361-365).

Monday, 13 January 2014

Guideline 3. The condition in an "if" statement should be a simple boolean expression with no side effects

Guideline

  • The expression inside an if statement should be boolean.
  • It should be simple and readable.
  • It should have no side effects.
  • It should not include assignments or variable declarations.

    Discussion

    1. A boolean is a boolean

    In C++, every statement is an expression of a certain type. It may be an ordinary type, such as boolint or a user-defined type such as Customer), or a special type called void which basically means "no type".

    The language rules establish which types are fundamental in the language. You don't need to invent your own bool type, for example, because the language has one. So you won't use invented types such as BOOL or Bool, will you?

    The language rules also determine its syntax, of course. They say what types are expected at each part of an instruction. Thework of the people who make compilers is to follow these rules so that they create C++ programs which execute the code you write in the right way. Your part of the work is to know these rules so that after being compiled, your program does not only what you wrote (it will, if the compiler is correct), but also what you wanted to do when you were writing it.

    Different languages have different rules. The C language, for example, does not have a native boolean type. When that language was created, if was defined as taking an int as an argument, so that you would write code such as if (count). This worked, but was not natural.

    The natural argument of a conditional expression is a boolean expression. If the boolean expression is true, the code block whichs follow the if will be executed. Otherwise, it won't.

    In C++ the expression inside the pair of brackets will be understood as a boolean expression. Some people carry the habit from their C days and pass an as int an argument to the if. Other people even use a pointer, so that the if will check whether it is NULL and will evaluate to true in that case. Don't do it - it makes the code less readable and a step closer to being error prone.

    Guideline #2 advised you to avoid implicit type conversions. This is a very good place to start applying it. The if statement requires a bool, so why pass it anything else?

    2. And nothing else than a boolean

    A boolean expression, that's all. Don't overcomplicate things by putting several actions together. Don't use complex commands which may succeed or not, returning a boolean which indicates success, as conditions inside if statements. Don't write assignments or, worse, variable declarations in the condition, either. In short, keep it simple. Don't do anything in your code that you wouldn't do when writing an instructions list for someone to follow. If you do, you'll (or someone else will) soon be asking yourself (or themselves) where is the software failing, only to discover that you need to debug into each and every complex step which you one day decided to write together inside of an if condition.

    While we're at it, everything which has been said here also applies to the conditions inside loop instructions such as for, while and do ... while .

    This second part (keep it simple, no side effects) also applies to the expression which is evaluated in a switch instruction, altough in that case of course you're not using a boolean expression, but an integer or an enum.

    Example

    Don't

    if (!isConnected() || !saveToFile())
    showMessage("Error saving to file.");

    What have you done? You have omitted the square brackets in the code block which is executed if the condition is true! Please don't do that. I didn't mention that in the guideline, but I took it for granted. Put the square brackets in their place and then we'll continue the discussion.

    if (!isConnected() || !saveToFile())
    {
    showMessage("Error saving to file.");
    }

    Thank you.

    In this example you mixed two possible complex operations (one may be a query, isConnected; the other is a command) in the condition, so you won't easily know which one failed when you see the message. At least you used a boolean expression, which is good; but it is not a simple one, and it has side effects. Let's see how we can improve it.

    Do

    if (!isConnected())
    {
    showMessage("Cannot not save to file: Not connected.");
    }
    else
    {
    bool saveSucceeded = saveToFile();
    if (!saveSucceeded)
    {
    showMessage("Error saving to file.");
    }
    }

    There are a few alternatives, most of them better than the original code. In this case at least I separated the two possible causes of error, giving them different error messages (the error in saveToFile probably needs more detailed information, but that is beyond the scope of this example). It's easy to debug the code step by step (or imagine it being debugged) and figure out what's going on.

    I assumed that isConnected is simple query and left it inside the condition; had it been a complex operation, I'd isolate it in its own line just as I did with saveToFile.

    Which leads us to the most important point: a command such as saveToFile should be called in its own line, not buried inside the conditional expression of an if statement. It's better to separate the list of things that you need to do (check whether the object, whatever it is is connected, and then save to file) from a conditional action which you will perform if the result of one of these actions is not satisfactory.

    Bibliography

    This materials listed in this section discuss the issue described above in more detail. See the page Bibliography for details of the referenced materials.

    [McCONNELL 2004] This book discusses conditionals in Chapter 15. In particular, see Section 15.1 "if statements" (pages 355-360) and the checklist at the end of the chapter (page 365).

    Monday, 6 January 2014

    Guideline 2. Make all type conversions explicit

    Guideline

    Assignments, function calls and expressions in general should not contain implicit type conversions. All type conversions should be explicit.

    Type conversions should be isolated in their own line, which should consist only of the assignment of the result of the type conversion to a variable of the target type.

    A statement should not mix type conversion with other operations. For example, if you need to convert an object to one of another type to pass it as a function parameter, do not do it in the function call itself; instead, do it in a separate instruction in the previous line.

    Discussion

    Type conversion occurs when an entity of type A is, well, converted to an entity of type B. This may occur in one of these circumstances:
    • Using the object as a parameter in a function call, where the type of the parameter in the function is not the same as the type actually passed to it.
    • Using the object as an operand in an expression which requires an object of a different type.
    • Assigning the object to an object of a different type.
    In many cases this operation will be impossible. If a function takes a Customer, for example, you cannot call it with a totally unrelated type such as a Printer. In other cases the conversion may be possible but involve some risks. Finally, there may be cases in which it can be done safely.

    A type conversion may be implicit or explicit. An explicit type conversion is one which you can read in the code. An implicit type conversion is one which does not appear in the source code, but is done automatically when the software is executed.

    Making all type conversions explicit, as this guideline states, helps you to identify the potentially dangerous ones. Type conversions shouldn't be spread all over your code, because good design techniques should avoid them in most cases. In such specific cases where a type conversion is needed, it's better to make it shine out so that more coding, review and testing effort is dedicated to ensure its correctness.

    Example

    Don't

    int i = 0;
    double d = 10.0;

    (…)

    i = d;

    This intentionally simple example is enough to show my point. There are always several things in each piece of code that might potentially go wrong, and one of them in this case (converting a value from one type to another) is totally hidden in the assignment expression, so that an error in it might go unnoticed.

    Do

    int i = 0;
    double d = 10.0;

    (…)

    i = static_cast<int>(d);

    The type conversion is made explicit by means of a cast instruction. That is precisely the meaning of a cast: an explicit type conversion.

    C++ offers several cast instructions, namely static_castdynamic_castreinterpret_cast. and const_cast. They are the equivalent of a warning sign in a road. They tell the reader of the code that something might go wrong because of the type conversion. It won't go wrong, because you designed and tested it well - but the potential risk is there. Something's going on which shouldn't be hidden, for the sake of today's and tomorrow's correctness.

    If you get used to always respect the types of your objects, identify the occasional type conversions and make them explicit, you will prevent some common mistakes and avoid future problems.

    Bibliography

    This materials referenced in this section discuss the issue described above in more detail. The range they cover is much wider, though, and dives into more depth. See the page Bibliography for the detailed references.

    [MEYERS 1998-1] The new forms of cast are discussed in the Introduction, pages 10-11. Implicit type conversions are treated on pages 67, 83, 85 and 86.

    [MEYERS 1998-2] Implicit type conversions are central to the subject of Item 5: "Be wary of user-defined conversion functions", pages 24-31.

    [SUTTER 1999] Implicit type conversions appear on pages 19, 70, 164-165 and 191. See especially Item 39: "Automatic conversions".

    Wednesday, 1 January 2014

    Guideline 1. Do one thing at a time

    Guideline

    • Put each instruction in one separate line of code.
    • Each instruction shall do exactly one thing.

    Discussion

    This will enhance the readability of your code and make it easier to debug and possibly change in the future.

    It should not be possible for an instruction to have several causes of failure. Constructing objects, calling functions and combining several values into a nontrivial expression are operations relevant enough to deserve their own line of code.

    For example, if an instruction is a function call, its arguments should be trivial expressions. If they are not, their values should be obtained in independent instructions prior to the function call. If the purpose of that function call is to get some result, don’t do anything complicated with it once you get it. Instead, place the result in a variable and do subsequent operations with it in the following lines.

    If you write code in this way you will find out that it follows your train of thought quite closely. For this reason, a future reader (whether it is another programmer or you) will have a better chance of understanding it.

    Example

    Don't

    int i = 0, j = 1;

    (…)

    i = ++j;

    In the don't section you see that two variables are declared in the same line. After doing some omitted work which possibly changes their initial values, j is incremented, and the result of this increment is assigned to i.

    Do

    int i = 0; // Count of i
    int j = 1; // Count of j

    (…)

    ++j;
    i = j;


    This code puts the two variable declarations in separate lines, thus complying with the guideline and allowing to easy document their meaning - something which may not always be necessary, but which should always be considered.

    The change of value of j (its increment) is given a specific line of code, and after that, the assignment of the value of j to i is performed. The code becomes instantly clearer, just by doing one thing at a time.

    Bibliography

    See the Bibliography page for full book references.

    [McCONNELL 2004] This book discusses layout in Chapter 31: "Layout and Style". In particular, see Section 31.5 "Laying out individual statements" and its Subsection "Using Only One Statement Per Line" (page 758-761).