Saturday, 15 November 2014

Guideline 10. A function should have a single exit point at its end

Guideline

  •  Prefer a single exit point.
  •  In void functions, the single exit point should be the physical end of the function.
  •  In non-void functions, the single exit point should be a return statement at the last line of the function.
  •  The return statement should be simple and avoid implicit type conversions.
 Corollaries:
  • Avoid multiple return statements.
  •  Prefer not to throw exceptions.

 Discussion

Getting function bodies right is the basis of whatever coding best practices you learn in the future. By writing correct and maintainable function bodies, you’ll have the pieces to join in a solid greater work. This tenth guideline completes a set of recommendations related to writing function bodies.

A good story is one that ends well. The execution of a function body can end in two ways: returning, and throwing an exception. A function returns if the execution flow reaches a return statement, or if it reaches its closing brace. A function throws if an exception is generated (for example, with a throw statement) and not handled in the function itself. Let’s ignore exceptions for now (most of the time, ignoring them is not as bad an idea as it may seem) and focus on the return.

A function has a return type. Even without having discussed function signatures, you obviously need to know that when you’re writing a function. Let’s divide the return types into two categories: void (i.e. no type) and anything else.

In a void function, a return statement ends the function execution at that point and returns control to the calling code. It’s OK to write return statements, but in this case prefer not to use them and instead let the function exit at its end. You will minimize confusion for the human eyes that read your code (the machine that executes it is never confused).
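As a minimal sketch of the difference, here is the same void function written with an early return and with a single exit at its closing brace. The function names and the "add a name to a list" task are hypothetical, chosen only for illustration.

```cpp
#include <string>
#include <vector>

// Early-return version: the reader has to notice the exit in the middle.
void addNameEarlyReturn(std::vector<std::string>& names, const std::string& name)
{
    if (name.empty())
    {
        return;
    }
    names.push_back(name);
}

// Single-exit version: the function always ends at its closing brace.
void addName(std::vector<std::string>& names, const std::string& name)
{
    if (!name.empty())
    {
        names.push_back(name);
    }
}
```

Both versions behave the same; the second one simply has no exit hidden above the last line.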

In non-void functions, prefer a single exit point, in the form of a return statement on the last line of the function. Avoid overly complex expressions in return statements: prefer returning a constant, a local variable, or the result of a simple query function.
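A small sketch of what this can look like in practice. The sign function below is a hypothetical example: the first version has three exit points, the second collects its answer in a local variable and returns it once, on the last line.

```cpp
// Multiple exit points: three return statements.
int signMultipleReturns(int value)
{
    if (value < 0)
    {
        return -1;
    }
    if (value > 0)
    {
        return 1;
    }
    return 0;
}

// Single exit point: one simple return statement on the last line.
int sign(int value)
{
    int result = 0;
    if (value < 0)
    {
        result = -1;
    }
    else if (value > 0)
    {
        result = 1;
    }
    return result;
}
```

For example, sign(-5) gives -1, sign(0) gives 0 and sign(7) gives 1, exactly as the multiple-return version does.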

Also, avoid implicit type conversions: don’t end a function whose return type is int with “return 3.0;”. Your floating-point literal will be implicitly converted to an integer, but you want to avoid implicit type conversions in your code: you gain very little by using them, and you can sometimes lose a lot. An explicit type conversion in a return statement is not much better. If you really need one, prefer isolating it on its own line, prior to the return statement.
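The following sketch shows one way to isolate a conversion before the return statement. The function name and parameters are hypothetical, and the code assumes that count is never zero.

```cpp
// Discouraged form (shown only as a comment):
//     int truncatedAverage(double total, int count) { return total / count; }
// The double result would be silently truncated to int by the return.

// Preferred form: the conversion is explicit and sits on its own line.
// Assumption: the caller guarantees that count is not zero.
int truncatedAverage(double total, int count)
{
    const double average = total / count;
    const int result = static_cast<int>(average);  // truncates toward zero
    return result;
}
```

With this version, truncatedAverage(7.0, 2) yields 3, and the reader can see exactly where the fractional part is dropped.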

Bibliography

[McCONNELL 2004] Code Complete is flexible about multiple return statements, but with a preference for a single return: "Minimize the number of returns in each routine. It's harder to understand a routine when, reading it at the bottom, you're unaware of the possibility that it returned somewhere above. For that reason, use returns judiciously-only when they improve readability." This is the conclusion of Section 17.1, "Multiple Returns from a Routine" (page 393), in Chapter 17, "Unusual Control Structures".

[MEYER 1997] Bertrand Meyer's book discusses every aspect of an object-oriented programming language to decide how it should be if we were to design it. He actually designs one, of course - Eiffel. Eiffel introduced the concept of a special local variable called Result which always holds the return of the function call. He mentioned three disadvantages of the more common return instruction, used in C, C++/Java, Ada and Modula-2, among others. First, often the result must be obtained through intermediate computations, and a local variable needs to be created only for that purpose. Second, "the technique tends to promote multiple-exit modules, which are contrary to the principles of good program structuring". Third, the language still needs to define what happens if the return instruction is missing or the last instruction in a function is not a return. You can find this very interesting discussion in Section 7.10, page 211, inside Chapter 7 "The static structure: classes".


Wednesday, 13 August 2014

Guideline 9. Keep functions short and cohesive

Guideline


A function should achieve its purpose by means of a cohesive structure. A function can be primarily structured in one of these ways:
  • A sequence of actions.
  • A condition.
  • A repetition of certain actions.
Do not combine these structures in a way that makes the code flow hard to follow throughout the function, or that makes the function difficult to test.

A function should have a small cyclomatic complexity.

Discussion

One function, one purpose


A function should have one specific purpose. It should exist for one single reason, and that is to achieve its purpose.

As a writer of a function body, you decide what strategy to use for the function to achieve its goal. It should be clear and simple. The reader of a function body should be able to follow your steps and understand what's going on.

A sequence, a conditional action, a loop


In one of the programming classes I took at the university, the teacher asked us what a function which accomplished a certain goal should look like. We hesitated, nobody spoke. Then the teacher said: "Come on, it isn't that hard. It must be either a sequence, a condition, or a loop. So which one is it?".

I don't remember which one it was. But that sentence stuck in my mind. A sequence, a conditional action or a loop. The structure of a function is always one of these.

But, you say, these control flow structures can be combined to form more sophisticated units. For example, a function may be a sequence of actions, with some of them being executed only under some conditions, and others being loops. A certain degree of composition may be understandable and practical, but if you're not very strict, you'll soon find yourself writing 50-line functions which lack any structure.
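To make the idea concrete, here is a small sketch in which each function has one primary structure. The sensor-reading scenario, the valid range and all the names are hypothetical; the point is only that the loop delegates its decision to a separate condition function instead of growing nested logic of its own.

```cpp
#include <vector>

// A condition: decides whether one reading is inside the accepted range.
// The range 0..100 is an assumption made for this example.
bool isValidReading(double reading)
{
    const bool valid = (reading >= 0.0) && (reading <= 100.0);
    return valid;
}

// A loop: adds up the readings accepted by the condition above.
double sumValidReadings(const std::vector<double>& readings)
{
    double sum = 0.0;
    for (double reading : readings)
    {
        if (isValidReading(reading))
        {
            sum += reading;
        }
    }
    return sum;
}
```

Each function can be read and tested on its own: isValidReading with a handful of boundary values, sumValidReadings with a short list.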

I've seen functions of more than 200 lines in production code, several times and in different projects. I've seen functions which should never have been written. Functions which never seemed to end their work, and which needed comments playing the role of section headers, as in: "// 4. Get data from the database", "// 4.1 Check database connection"...

Cyclomatic complexity

Software scientists have long attempted to measure the complexity of code. Cyclomatic complexity measures the number of linearly independent paths through a program's source code. I've never been proficient in actually calculating this metric - nowadays, automatic tools will do that job better than you. But in short it expresses that a function (a software element, in general) is harder to test the higher its cyclomatic complexity is.

If your function is a condition, you may need to test it in two different scenarios (paths): when the condition is true, and when it is false. If it contains four independent conditions, the scenarios suddenly become 2^4, that is, 16. If it is a combination of nested conditions with some loops spread out here and there, it will be a testing and maintenance nightmare. Some of these long and overly complex functions have contained bugs ever since they began to exist. Nobody has found them yet, because they are hidden in a path which requires five different conditions to be met. But they will appear, eventually.
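A tiny sketch of how the paths multiply. The shipping rule and its amounts are hypothetical; what matters is that the function contains two independent conditions, so there are 2^2 = 4 paths through it.

```cpp
// Two independent conditions give four execution paths:
// (no, no), (yes, no), (no, yes) and (yes, yes).
int shippingCost(bool isExpress, bool isInternational)
{
    int cost = 5;  // hypothetical base cost
    if (isExpress)
    {
        cost += 10;
    }
    if (isInternational)
    {
        cost += 20;
    }
    return cost;
}
```

Covering every path already takes four test cases, with expected costs 5, 15, 25 and 35. Each additional independent condition would double that number.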

"But if reality is complex, if requirements are complex... maybe all this complexity is necessary, and there's simply no way to escape from it - they asked me to contemplate all those different cases!", you say. You have a point. But at least remember that you should divide that complexity among different, simple software elements. The function is the smallest software element: you can't write unit tests for something smaller than a function, for example. Complex parts deserve to be in their own function, with clearly defined semantics, so that they can be tested independently.

Bibliography


[McCONNELL 2004] This book has an interesting section about complexity in software, 34.1: "Conquer Complexity" (pages 837-839), inside its valuable chapter 34: "Themes in Software Craftsmanship".

Saturday, 28 June 2014

Guideline 8. Define variables as close as possible to where they are used

Guideline

Define variables as close as possible to their first use. Prefer variables with the most local scope possible.

Complementary guidelines:

Do not reuse a variable for two totally different things inside the same function body, just to "save some space". It kills readability and it is very error prone.

Do not use exactly the same name for two variables with inner block scopes inside the same function body. It may hurt readability and it is error prone.

Discussion


Context


In previous guidelines we have been discussing the innermost pieces of software construction in C++: function bodies. All the guidelines we have seen make sense within that scope. Later we'll see guidelines which refer to the structures to which functions belong, that is, the classes. Even later we may consider how a number of classes are organized in a software project.

Guideline #7 instructed you to initialize all variables at the point of their definition, i.e., variables should begin to exist with a known value, to prevent undefined behaviour. The present guideline is a further recommendation about when this (the variable definition and initialization) should happen.

But what is a variable definition, anyway? Is it the same as a variable declaration?

Variable definition and declaration


This great answer on the software Q&A site Stack Overflow clarifies the meanings of variable declaration and definition in C++. Quoting:

A declaration introduces an identifier and describes its type, be it a type, object, or function. A declaration is what the compiler needs to accept references to that identifier. These are declarations:
extern int bar;
extern int g(int, int);
double f(int, double); // extern can be omitted for function declarations
class foo; // no extern allowed for class declarations

A definition actually instantiates/implements this identifier. It's what the linker needs in order to link references to those entities. These are definitions corresponding to the above declarations:
int bar;
int g(int lhs, int rhs) {return lhs*rhs;}
double f(int i, double d) {return i+d;}
class foo {};
A definition can be used in the place of a declaration. (...)

I strongly recommend that you read the complete answer.

In short we can say that inside a function body, at run time, a variable begins to exist once the code execution reaches the point where it is defined - not the point where it is only declared.

Keeping functions short and well structured benefits the readability of your code. To achieve this, it is key that every concept is limited to the exact scope where it belongs. Define, initialize and use each variable exactly where it is needed, not any earlier. By following this simple guideline you will write code which is more readable, contains fewer defects, and is easier to debug.
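As a minimal sketch, the hypothetical function below defines each variable at the point where it is first needed. The variable length exists only inside the loop body, which is the only place that uses it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example: returns the length of the longest word in a list,
// or 0 if the list is empty.
std::size_t longestWordLength(const std::vector<std::string>& words)
{
    std::size_t longest = 0;
    for (const std::string& word : words)
    {
        // Defined and initialized right where it is used, with loop-body scope.
        const std::size_t length = word.size();
        if (length > longest)
        {
            longest = length;
        }
    }
    return longest;
}
```

If the loop body were later extracted into its own function, length would move with it without leaving anything behind.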

Refactoring a function body is one of the most common tasks in software engineering. What was before in one function could be separated into several functions in the future. If you limit the scope of each variable to exactly the needed one, you will make this process much easier.

Complementary guidelines


If you have learnt the old C tradition of declaring all variables at the top of a function body, maybe it's time to consider abandoning it, in favour of a more compact and readable style.

If you have seen code in which a variable of a certain type is reused inside the function body for two or three different things, just to save some bytes, please forget about it. Use each variable exactly once - don't define variables which are never used, and don't use the same variable for several different things.
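A short sketch of the difference, using a hypothetical rectangle example. The discouraged form appears only as a comment; the function gives each value its own variable with its own name.

```cpp
// Discouraged form (shown only as a comment): one variable, two meanings.
//     int temp = width * height;       // temp is the area here...
//     temp = 2 * (width + height);     // ...and the perimeter here.

// Hypothetical result type, introduced for this example.
struct RectangleMeasures
{
    int area;
    int perimeter;
};

// Each variable is used for exactly one purpose.
RectangleMeasures measureRectangle(int width, int height)
{
    const int area = width * height;
    const int perimeter = 2 * (width + height);
    const RectangleMeasures measures = {area, perimeter};
    return measures;
}
```

For a 3 by 4 rectangle this gives an area of 12 and a perimeter of 14, and neither name ever changes its meaning.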

A local variable has only the scope of the braces which contain it. Because of this, a function may contain some inner block of code in which you define a variable, say, int i = 0;. Later in the same function you may have another block of code in which you are allowed to create another variable with the same type and name: int i = 0;. These are different variables, with totally disjoint scopes. This is perfectly correct. However, prefer not doing it. It is error prone and it hurts readability to have two different variables with the same name so close to each other. In fact, "i" is not a very expressive name - you should be able to come up with better names for both variables, names which say more about their meaning.
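Here is a sketch of two blocks in the same function, each with its own index variable. Both could legally have been called i; the hypothetical names below are just one way to make their meaning visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: adds up the item counts of all boxes and all crates.
int totalItems(const std::vector<int>& boxes, const std::vector<int>& crates)
{
    int total = 0;
    for (std::size_t boxIndex = 0; boxIndex < boxes.size(); ++boxIndex)
    {
        total += boxes[boxIndex];
    }
    for (std::size_t crateIndex = 0; crateIndex < crates.size(); ++crateIndex)
    {
        total += crates[crateIndex];
    }
    return total;
}
```

With distinct names, copying a line from one loop into the other produces a compiler error instead of a silent bug.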

Bibliography

Books

[McCONNELL 2004] This book discusses variable initialization in section 10.3: "Guidelines for initializing variables" (page 242). Scope is discussed in section 10.4: "Scope" (page 244). And the title of section 10.8 speaks for itself: "Using each variable for exactly one purpose" (page 255).

[MEYERS 1998-1] Scott Meyers advises us to "Postpone variable definitions as long as possible" in Item 32 of this book (page 135).

[SUTTER-ALEXANDRESCU 2004] The C++ Coding Standards suggested in this book include Item 18, "Declare variables as locally as possible" (page 35).

Web references