Saturday, December 17, 2011

Avoiding Degenerative Coding Practices - My Coding Style Guide (C-Style)

I'm quite pedantic about many aspects about how code looks. After all, you're going to be looking at, and editing it quite a bit, so it does pay to spend some time making it more pleasant in the long run.

This particular guide applies to C-like languages. I have another one for Python-code which shares some similarities, but also some differences.

Lazy/Sloppy/Ugly Formatting to be Avoided
BAD - "Caterpillar Ifs"
I've mentioned these a bit in the past, but what exactly are they. Let's see an example:
if (some_condition) {
    ... do some thinking stuff ...
} else if (some_condition_2) {
    ... do some more stuff ...
} else if (...) {
    ... aren't your tiring of this crap already ...
} else {
    ... bleh...
}
The important thing here is that if statements are on the same lines as the previous blocks end. BAD BAD BAD!
1) Inserting a new condition + block in the middle is difficult. You need to either chop off the head or tail of the surrounding blocks, with "the guts" spilling out into "open space" until you finish adding in your new case. You will do this often.
2) The same goes for removing a condition+block in the middle
3) Rearranging the code is yet another combination of adding/removing conditions+blocks. But fundamentally, all of these are all made difficult as...
3) It's hard to see the where cases start and finish - they're all just part of one large hazy blur of action

How then should "good" code look? Here's an example :)

if (some_condition) {
    ... do some thinking stuff ...
}
else if (some_condition_2) {
    ... do some more stuff ...
}
else if (...) {
    ... this is easy to see isn't it ...
}
else {
    ... yay! ...
}
Much better now :)

BAD - Nested Blocks Without Braces for Scope Clarification
For example,
for (i = 0; i < maxCount; i++)
   if ((i % 3) != 0)
      ... do something funny ...
Or worse
if (i > 2)
   if (i < 4)
      ... do something funny ...
will become confusing very easily as soon as you try to add an else clause to the inner if...

Admittedly, sometimes when coding, you're quite desperate to get your code working fast. But will skimping on two keystrokes really help that much in the long run.? ;)

So, for your and my sanity, and also for nicer more "complete" looking code, remember to put braces around each "outer"-shell - i.e. each block that will itself have child blocks. So,
if (i > 0) {
   if (i < 100)
      ... do something great...
}
In recent times, I've actually seen a stricter version of this which says that ANY block should have braces, even the inner-most ones with just one line. Personally, I'm still a bit divided about this one; sometimes it looks like overkill, while other times it makes perfect sense. So, IMO this is one of those: whatever works best for the particular case.

BAD - No space between "if", "while", "for" (i.e. C's keywords) and first parenthesis
For example,
if(...somecondition1...)
   ...

while(...somecondition2...)
   ...

for(... blah; blah2; blah3...)
    ...

I suspect that this is yet another subtle attempt at skimping on keystrokes. Or perhaps there are some really confused people out there who are delusional and believe that these are function calls NOT the builtin keywords that they are.

BAD - Strange spaces (or the complete lack of spaces) either side of the parentheses everywhere
Somewhat related to the above are the cases where people have a habit of adding weird extra spaces all over the place. For example...

if( blah blah ) {

if(blah blah blah ){

if ( blah blah blah){

if(blah blah blah){

if( blah blah blah ){
...


BAD - Function definitions
Exhibit 1 - GNU Style A
static void 
my_function_name_here (int a, int b, float c)
{
  if (blah)
    {
       ... haha ..
    }
  else
    {
        ... gah ...
    }
}

Exhibit 2 - GNU Style B

static void 
my_function_name_here (int a, int b, float c)
  {
    if (blah)
      {
         ... haha ..
      }
    else
      {
          ... gah ...
      }
  }



Exhibit 3 - Lazy Java Style
void function(int x, int y, int z) {
    ... function body ....
}

Notes about the above:
- Indented braces? Seriously?! Wackos!
- Functions are top-level blocks. As such, block scoping indicators should each be on separate lines

Formatting that you SHOULD use
Now that we've gotten the worst out of the way, let's have a look at what you SHOULDMUST do.

GOOD - Pointer Declarations
The "star" for indicating pointers and pointer-related behaviour should be on the side of the variable not the type.

For example:
int *a, *b, c;
here a and b are both pointers to integers, but c is just a regular integer. This example illustrates why I've chosen this convention, as it means that you won't forget this anytime while coding like in
int* a, b, c;
where only a is a pointer.

Carrying this convention, when you need a pointer cast, do
casted_value = (PointerType *)old_value;
GOOD - Switch/Case Statements
Case statements should take the following form:
switch (value) {
    case CASE_1:
    {
        ... case body ...
    }
        break;

    case CASE_2:
    {
        ... case 2 body ...
    }
        /* continue; */
    case CASE_3:
        ... case 3 body ...
        break;
    
    case CASE_4:
    case CASE_5:
        ... handling for cases 4 and 5 ...
        break;

    default:
        ... default case ....
        break;
}
Rules:
- If it's more than one line of code, put a set of braces around it
- Use switch/case statements if a single value is check against two or more different cases requiring different processing in each case
- Indention used should be strictly as shown above
   - Opening brace on switch statement could be put on a new line but only at same level as the switch statement itself. If opting for this approach, use sparingly, and/or only if the rest of the code uses that style of brace placement too
- For flow-through cases where some processing occurs before a more general case occurs (e.g. 2 to 3), ALWAYS include a "continue;" comment between the two case blocks. This ensures that it is clear to a reader that this is an intentional flow-through, and not just a forgotten break.
    - However, if no extra processing should happen, the "4,5" version should be used.

Good - Indicate indicate array types in function signatures as such
If you have a variable length array, do the following:
int function(int var_array[], size_t len)
But if the length of the array is known (i.e. for a vector), then
float function(float vec[3])
or perhaps for a matrix...
void function(float mat[4][4])

Do not just use an ordinary pointer for these. Reserve those for truly single-value input/output channels:
void map(const int x, const int y, int *xo, int *yo)
Also note in this case that we mark purely input variables as "const".

Good - Enumerations - Style and Naming
If you need to define a set of named values that a property can take, use an enum instead of #define's
typedef enum eEnumName {
    /* v1 comment */
    ENUM_VALUE_0 = 0,
    /* v2 comment */ 
    ENUM_VALUE_1,
    /* v3 comment */ 
    ENUM_VALUE_2,
    /* v4 comment */ 
    ENUM_VALUE_3,

    /* (optionally, but for length-limited enums only) */
    NUM_ENUM_TYPES
} eEnumName;

Rules:
- Always name any enums define, and ensure that the name is prefixed by a lower-case 'e'.
- The last enum value in a list should have a hanging comma to facilitate further additions later on. In particular, this saves the new coder from having to add a comma to the last line there before adding their own, and also helps to keep the diff-noise down.
    - Yes, I'm now aware that this is apparently not valid C89/ANSI-C strict. However, EVERY compiler I've seen supports this without falling over itself unless you tell it to be pedantic.

Now, if you need to define a set of bitflags for an int/short, do the following:
typedef enum eSometypeFlags {
     /* f1 comment */
     SOMETYPE_FLAG_BLAH       = (1<<0),
     /* f2 comment */
     SOMETYPE_FLAG_BLAH2     = (1<<1),
     /* f3 comment */
     SOMETYPE_FLAG_BLAH3     = (1<<2),
} eSometypeFlags;
A similar story to with standard enums, except that here we have to define each value using bitshifts.
 - Do NOT use 1,2,4,8,etc. style or 0x01, 0x02, etc. style defines since these are harder keep track of.
 - Yes, technically these values are not mutually exclusive, and perhaps pedantically actually violates the lawyer-definition of what an enum is. However, from a practical perspective, this is a natural representation which complements most coder's mental models of what enums are.

GOOD - Structs
With structs, it's a good idea to keep in mind the alignment rules that apply for Blender's source code, especially in SDNA.

Anyways, here's an example of what they should look like in general:
typedef struct MyStruct {
    struct MyStruct *next, *prev;
    void *a_ptr;    /* comment */
    float x, y;        /* comment  */

    int flag;           /* comment */
    int value;        /* comment */
} MyStruct;
Key things worth pointing out here:
- Ideally there should be only one variable on a line, unless they are clearly related (next/prev or inlined array elements). This allows each to be commented by itself
- Section header comments can go above a chunk of related settings
- Related settings should ideally be kept together. This should also be shown by the use of line breaks to visually group these settings
- No visual padding of item names should take place to align them in most cases, unless there's like 1 odd man out (i.e. one float var in a sea of ints).

Special care should be taken for the naming of structs
- Structs which are "primary data structures" (i.e. those which hang around all the time, and keep track of some important application data) should be named normally: TitleCase with no spaces. For example, 'ArrayList'.
- Structs which are used as helper structures by certain functions/processes only, which only they deal with should be marked as transient. That is, their names should all be prefixed by a lower-case 't'. For example, 'tSortElem'.

GOOD - Variable Declarations
Variables should be defined at the start of blocks only. This is the only way that most C compilers support (IIRC it's GCC which is the one which complains here).

The scope of variables should be limited to the chunk of code that they're used in. So, for example, if you have a step within a function which uses a lot of variables which aren't used elsewhere, then define that step as an inner/anonymous block and declare its variables at the start of that block.

The main benefits of reducing variable scope is that it makes it easier to keep track of things, preventing situations where unknown results arise.


The ordering of variable declarations depends on several things (listed here in order of importance):
1) Custom types (i.e. structs, enums, etc.), which usually have longer names (but also often take up more space) should be defined before those for builtin types
2) Within the "type divides", lines of variable declarations should be ordered according to length, with longer lines occurring before shorter ones.


Within each line, variable types should not be mixed:
- Arrays and single items shouldn't be declared on the same lines
   - Arrays of different lengths (including multi-dim vs single-dim) are ok to have on the same line though
- Pointer and non-pointer types should not be mixed on the same line


Where it makes sense, variables should be initialised when created. There's no need to initialise a variable if it is going to be set to some value quite soon a few steps later.

GOOD - Comment formatting
Block comments should be favoured over C++ style "lazy singlelines". They should take one of the following forms:
Single line
/* This is a comment. Fix for xyz [#12345] weird bug. */

Multi-line
/* This is a comment
 *
 * Some blob of waffling text can go here explaining what is going on.
 * Quite a bit of waffling can take place, and it can go over the edge 
 * of lines and spawn some lists:
 *     - first item
 *     - second item
 *     - last item
 */
The use of C++ "lazy singlelines" should be reserved for temporary problem markers only. For example:
// TODO: fix this evil code
and should follow after any block comments have finished, but before the code in question occurs. Note the use of a space AFTER the comment marker...

For debugging purposes, it may be convenient to comment out a single line of code for testing. In this case, a "lazy singleline" should be used (let's be honest, it's far too clumsy to type in /* */ either end of such code). However, to tell the difference between commented out code, and code snippets as part of comments, there should be NO SPACE between comment marker and code being commented out
// some_example_code(foobar)
vs
//offending_function_call(foo, bar, bleh);
However, if more than 1 offending line is present, you should NOT go in and insert comments on each line (unless you're in Java, with no other sane choice, but then again, why go there!)
#if 0 // debugging why this won't work!
    first_line_of_offending_code();
    second_line += of_offending_code;
    ...
    last_line_of_offending_code();
#endif

Construct Usage
Looping
In general, unless trying to make an infinite loop, a for loop tends to be "safer" and less likely to mean that you accidentally shoot yourself in the foot. Reserve while loops for cases where you've got an infinite loop, since it's all too easy to get the termination condition wrong or forget to increment something which would trigger the termination condition to fail. CS professors might disagree on this at times, but for the vast majority of the coding public, for loops are the way I'd recommend them to go.

If using the "readline" idiom with the while loop, it's better to use the pedantic/fully qualified version instead of the standard NULL==0 assumption. So from experience,
while ((val = read_line_func(fp)) != NULL)
reads much nicer than
while ((val = read_line_func(fp))) 
Pointers
In general, try to stick to first-order pointers (a pointer to some data). If you must, go to second-order pointers (i.e. a pointer to a pointer). Anything beyond this is pure madness, and suggests a data structuring flaw. Further, anything higher is by and large unreadable for 95% of coders, and is a bit of a tax on the rest too (the level at which that goes depends on the coder's skill level, degree of sleep deprivation, and "Gross Domestic Happiness Quotient" ;)

Macros
Recognise what they are, and use them to their strengths, but don't go overboard to shoot yourself in the foot. In particular, side-effects (other than their main advertised purpose, unless these are going to be just internal abstracted details that the end user shouldn't have to care about) are where you'll mostly run into trouble. That, and trying to develop a 300-line complex function as macro with no syntactic highlighting and cryptic compiler errors when you make a typo, are the main reasons people have qualms against these.

Used properly, they can help bring structure to an otherwise difficult to comprehend algorithm, and/or help abstract away the finer points/syntactic-bludgery of certain logic structures. Thus, they can actually make code clearer in some cases.


For example, one of my favourite macros is the "ELEM" macro. It practically allows the implementation of the:
if x in {a, b}
idiom from other higher-level modern languages, but without the baggage of syntactic clutter (namely repeated checking of x against some value. That is, the ELEM macro usually expands to the ugly:
if ((x == a) || (x == b))
which arises quite often when performing NULL checks at the start of a function.

In short, my take on macros: use them!

Assorted Issues
Indention
Indention should be done with tabs only. 
- Tabs should be considered to be 4 spaces wide. Text editors should ideally be configured to do this (though other values can be used as long as it's for your own personal display). However, 4 spaces is what the modern tab is and should be.
- Up to 3 spaces may be used to align pieces of code with preceeding lines, if those lines were the result of a line split. However, these 3 spaces must only occur at the end of a string of tabs which line up the code with the preceeding lines' indention level.

Blank lines should not be truly blank. If they occur within a block of code, they should take on the indention level of the last piece of code that they follow.
- For most text editors, turn off the "Strip Trailing Spaces" option. It is IMPLEMENTED INCORRECTLY EVERYWHERE (i.e. in a lazy manner). This is because indention/whitespace on blank lines actually occurs before any code on that line has occurred yet, which means that it isn't actually "trailing".
- For Vi/Vim users, I'm afraid that there is no way to force it not to clobber indention. I've asked and been booed out the door by a bunch probably still living in the dark ages counting filesizes by bytes. AFAIK, there is an option which kindof delays the inevitable, keeping them around until a mode switch triggers a silent clobber. Until that changes, probably the only alternative is to actually hack their source code.

Parenthesis Usage - [Added 2Jan2012. Originally omitted, but has been found to be note worthy]
Within a condition, each statement to be tested within the condition should be surrounded by a parentheses. Although we don't want to really end up like Lisp, it's better that everyone can see at a glance exactly what the condition is built up from. In the end, if it saves someone a moment of thought going through the operator precedence stuff parsing the expression, then it's less likely that they're going to end up fatally misinterpreting the statement and either introducing bugs or wasting time debugging.

So,
if ((a == b) || (c != d))
NOT
if (a == b || c != d)

One case deserves special attention:
When testing if a bitflag is set, ALWAYS surround the bitflag masking part with parentheses (If just testing whether the flag is set, as in the first example below, it is fine to just use the single pair). That is,
if (flag & BIT_FLAG_1)
if ((flag & BIT_FLAG_1) == 0)
if ((flag & BIT_FLAG_1) && (...some other condition...))

NOT
if (flag & BIT_FLAG_1 && ...some other condition...)

Apart from the clarity considerations from the general case, there are two other considerations for this style.
- In the past, certain compilers would regularly bail out on the case with no explicit parentheses. So, it's better to enforce a consistent style that they'll be happy with
- The bad style makes it more difficult to pick up typos where a single ampersand was intended vs when it was actually meant to be a logical AND. Sure, in one case you'd usually see some UPPER_CASE stuff, but in the interests of clarity, explicit is better here.

7 comments:

  1. I see a lot of those bad formatting ways with new programmers...

    What do you think about putting the opening braces on the next line such as:

    if (some_condition)
    {
    ... do some thinking stuff ...
    }

    instead of:

    if (some_condition) {
    ... do some thinking stuff ...
    }

    I find it better looking and it allows every block of code to be aligned properly.

    ReplyDelete
  2. Apos:
    I've tried this a few times on and off, but at the end of the day, IMO it ends up just seeming a bit too verbose.

    Unlike with top level function/class blocks, these control statements aren't that important. Also, by putting them beside the condition, it links the statement-block with the condition.
    That is not an excuse for the catapillar-ifs though, since those clump all the blocks into a single blob.

    Having said that, it is my preferred format for when you need to split the condition over several lines. So:

    if ((condition_1) ||
    (condition_2))
    {
    ... do funky stuff in here ...
    }

    ReplyDelete
  3. Whats wrong with Exhibit 1 - GNU Style A?

    ReplyDelete
    Replies
    1. 1) Return type, etc. on a separate line from the rest of the function definition/prototype
      I'm a still a bit mixed on this, but tend towards preferring everything on one line.

      2) Halfway indented braces
      Argh! They're neither here nor there, and even makes things look really ugly.

      Delete
  4. I think the colour coding of your code-blocks doesn't give equal representation to the poorer coding practices. I get why you did it, but the background colour makes the code more unpleasant for me to read than the poor code it is trying to show.

    ReplyDelete
  5. Hi Aligorith.

    I am a novice programmer, so I have no definite preference about any particular coding style or practice. What I do think is important though is whether or not the other main Blender Developers agree on any suggested style and practice. This keeps things in sync...

    So do you know if the other main Devs agree on these styles and practices?

    ReplyDelete
    Replies
    1. Hi,

      Good question. The majority of these items are now adopted officially as part of the code style guide:
      http://wiki.blender.org/index.php/Dev:Doc/CodeStyle

      However, there are still a few areas where this here is a superset (i.e. it explicitly specifies what should be done in some situations which the official doc still leaves undefined)

      Delete