Monday, September 13, 2010

Matters of Code Indention - Tabs only please

The internet is a place where people can discuss all manner of things, in particular the tales of bitter rivalries: Ford vs Holden car enthusiasts, Canon vs Nikon photographers, JPEG vs RAW, Marmite vs Vegemite, AMD vs Intel, ATI vs NVIDIA, and emacs vs vi(m), to mention a few of the many I'm aware of.

Of particular relevance to the last one is the issue of code indention: how much, and of what type.

We should all (hopefully) agree that code needs to be indented to better show its structure and form, as well as to make it more readable. If you don't agree, please go and use Python for a few weeks, then come back with a newfound "appreciation" of what indented code should look like. Either that, or go back to writing assembler or machine code, or whatever stone-age contraption you use.

The question then becomes: how do we indent our code?

There are essentially three common whitespace characters that we can insert: newlines, spaces, and tabs (in alphabetical order). Newlines separate text into "lines", corresponding to the statements we want to indent; they do not indent anything themselves, so they are of no use to us here.

That leaves spaces and tabs. A space is a single character's width of whitespace, while a tab can span several. So let's step back and examine the first question: how much to indent?

A popular amount for "one indent step" is "four spaces worth" (4 sw). This offers a nice balance between being a clearly visible distance and not taking up too much horizontal space. Some people like 8 sw, but that starts to look quite excessive for all but the most trivial code. Other (IMO demented) people seem to like 2 sw, 3 sw, 5 sw, 7 sw, or some other prime number of sw! Personally, I feel that 4 sw looks best for most languages (Lisps might be an exception, but they're really in a league of their own anyway).

Now, how do we get indention of these sizes? There are two options:
1) We insert n spaces to get n sw; everybody sees n sw of indention, and changing it means changing all the code.
2) We insert one tab per indention step, and set the text editor (or whatever we use) to display each tab as n sw wide, without altering the underlying code.

One issue this raises is freedom of choice over how you (personally) want to see the code. When you use tabs, it doesn't matter that other people don't like the indention size you like (or vice versa): all that matters is that the underlying code is indented by x indention steps, since the visual representation can always be adjusted locally using editor settings. With space-based indention, on the other hand, you are stuck looking at whatever indention someone else chose. Most of the time this is not an issue, as many people choose sane values, but that is not always the case.
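Python's built-in `str.expandtabs()` gives a quick way to see option 2 in action: it renders each tab as whitespace up to the next tab stop. The sketch below is only an illustration (the `render` helper and the sample snippet are made up for this example). The stored text keeps one tab per indention step, and only the display width changes.

```python
# The stored source: one tab per indention step (hypothetical sample code).
SOURCE = "if ready:\n\tfor item in items:\n\t\tprocess(item)\n"


def render(source: str, tab_width: int) -> str:
    """Return how the source looks when tabs are displayed tab_width columns wide."""
    return source.expandtabs(tab_width)


if __name__ == "__main__":
    for width in (2, 4, 8):
        print(f"--- displayed with tab width {width} ---")
        print(render(SOURCE, width), end="")
    # The underlying text is untouched: still exactly one tab per step.
    print("tabs stored:", SOURCE.count("\t"))
```

Each reader picks their own width, yet the file on disk never changes, which is exactly the freedom of choice argued for above.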

There is another benefit to using tabs, with respect to "indention steps". IMO, indention plays an important role in the representation of code. Python takes a step in the right direction here by recognising the role indention plays in showing the scoping of chunks of code. However, I don't think it goes quite far enough in allowing users to add extra levels of indention where the code pattern being used calls for it (OpenGL-style APIs come to mind: statements between glBegin/glEnd are ideal candidates to be indented an extra level from the preceding code, to show more clearly that they happen within that block).
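To make the Python point concrete: an extra indention level that is not introduced by a block-opening statement is a syntax error. The small sketch below uses the built-in `compile()` to check this. The `compiles` helper and the `gl_begin`/`gl_vertex`/`gl_end`/`drawing` names are hypothetical; the snippets are only compiled, never executed, so those names need not exist.

```python
def compiles(source: str) -> bool:
    """Return True if the source compiles, False on an IndentationError."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        return False
    return True


# Indenting the begin/end body one extra level, as one might in C:
EXTRA_INDENT = (
    "gl_begin()\n"
    "\tgl_vertex(0, 0)\n"
    "\tgl_vertex(1, 0)\n"
    "gl_end()\n"
)

# The same extra level is accepted once a statement opens a block:
WITH_BLOCK = (
    "with drawing():\n"
    "\tgl_vertex(0, 0)\n"
    "\tgl_vertex(1, 0)\n"
)

if __name__ == "__main__":
    print(compiles(EXTRA_INDENT))  # False: "unexpected indent"
    print(compiles(WITH_BLOCK))    # True
```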

Because of this role, I think it's very important that we can easily count/quantify, line up and associate, and navigate our code in terms of "indention steps", rather than at the low level of how many characters are needed to make whitespace of the required width. You can consider this a form of abstraction, or simply using the appropriate tool for the job, but tabs give us a tool that makes the structure of our code easier to find.

Here is a closer look at each of the points mentioned above, showing how tabs handle them better than spaces:
1) Counting/quantifying the amount of indention we have:
- With tabs, it is easy either to inspect the indention visually and say "oh, that chunk is large enough to be 1 step, 2 steps, 3, etc." without having to guess (though this probably depends more on your choice of sw), or to run the cursor over the whitespace (the arrow keys also work, but are generally less comfortable), selecting it while counting how many times the selection "jumps" before reaching the start of the code.
- With spaces, it becomes much harder to quantify the amount of indention being used, as you have to count the spaces (easy to lose track of) and then divide that number to get the indention level. It's one more mental tax that distracts from the core task of writing and understanding code.

2) Lining up blocks of code, and thus associating their scopes:
If you cannot easily identify the amount of indention being used, how can you line up blocks of code to check whether they are within a certain scope, and hence whether they are part of a certain branch, etc.? When working with production code, you often need to find these things out as you try to work out how some foreign code works.

This also matters quite a bit when adding new code. With tab-based indention, you can be quite sloppy about selecting code for deletion, etc., as it becomes quite obvious when you've gone too far (as in point 1). With space-based indention, however, you really have to be careful: all too often you end up deleting one space too many or too few, leaving misaligned code that isn't immediately obvious (as the code above may have scrolled off screen in the meantime).

By the same token, with space-based indention, hopping to a particular indention level to start work is not always easy, as you end up one or two spaces off nearly EVERY single time you go to write code.

3) My points about code navigation have pretty much been covered by the above discussion already ;)
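The counting argument above can be sketched in a few lines of Python. This is only an illustration under assumptions: `indent_steps` is a hypothetical helper, and for space-based indention it must be told the step width (4 here), which is exactly the extra piece of knowledge tabs make unnecessary. A non-zero remainder flags the "one or two spaces off" misalignment described above.

```python
from typing import Tuple


def indent_steps(line: str, space_width: int = 4) -> Tuple[int, int]:
    """Return (indention steps, leftover spaces) for a line's leading whitespace.

    Each leading tab is one step. Leading spaces are divided by an assumed
    step width; a non-zero remainder means the line is misaligned.
    (Mixed tabs and spaces are simply added together in this sketch.)
    """
    leading = line[: len(line) - len(line.lstrip(" \t"))]
    tab_steps = leading.count("\t")
    space_steps, leftover = divmod(leading.count(" "), space_width)
    return tab_steps + space_steps, leftover


if __name__ == "__main__":
    print(indent_steps("\t\treturn x"))          # (2, 0): just count the tabs
    print(indent_steps(" " * 8 + "return x"))    # (2, 0): 8 spaces / 4
    print(indent_steps(" " * 7 + "return x"))    # (1, 3): one space short of 2 steps
```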

Now, there is one related indention issue I think we should also mention here, as it also seems quite contentious but is relevant to my argument: what to do about indention on blank lines.

Personally, I believe that there SHOULD BE WHITESPACE ON "BLANK" LINES, with the indention level being that of the scope the blank line sits in. This helps keep continuity between chunks of related statements. It makes inserting new lines of code at either end of existing code easier, as you can just start typing instead of having to reindent to the required level first. Also, navigating your code with the arrow keys, or bulk-selecting lines of code, WILL NOT result in the cursor jumping around (and seemingly disappearing on blank lines when it jumps back to column 0), which is actually quite visually distracting. (I was once flamed by some twits who clearly hadn't thought carefully about what they were saying, when they suggested that truly blank lines wouldn't leave the cursor jumping around... that's absolute bullshit.)
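One way to apply this convention mechanically is sketched below. The helper name `indent_blank_lines` and the rule it uses are assumptions for illustration: each blank line gets the indention of the next non-blank line (or none at the end of the file). An editor that understands the language's scoping rules could do better.

```python
from typing import List


def indent_blank_lines(lines: List[str]) -> List[str]:
    """Fill whitespace-only lines with the indention of the next non-blank line.

    Lines are assumed to have no line terminators. Lines after the last
    non-blank line get no indention.
    """
    result = list(lines)
    next_indent = ""  # leading whitespace of the nearest non-blank line below
    for i in range(len(result) - 1, -1, -1):  # walk from the bottom up
        line = result[i]
        if line.strip():
            next_indent = line[: len(line) - len(line.lstrip(" \t"))]
        else:
            result[i] = next_indent
    return result


if __name__ == "__main__":
    code = ["def f():", "\ta = 1", "", "\treturn a", "", "print(f())"]
    for line in indent_blank_lines(code):
        print(repr(line))
```

With this rule, the blank line inside `f` receives one tab, while the blank line before the top-level `print` stays empty.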

So, hopefully, if you buy this, please make sure the "Strip Trailing Spaces" option is disabled in your text editor now, if you haven't already. IMO, this option is implemented wrongly most of the time: it should only target whitespace that occurs after the code on a line. On "blank" lines there IS NO CODE, and the whitespace there is not "after" the code that would otherwise be there, but is rather the scoping indention that comes "before" where the code would normally appear. The same flamers who do not understand this are the ones who use/develop those text editors where this option CANNOT BE DISABLED, as it seems to be hard-coded into their warped psyche.
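The behaviour argued for here (strip whitespace that follows code, but leave the indention on "blank" lines alone) is easy to express. The sketch below is a hypothetical helper, not any particular editor's option:

```python
def strip_trailing_spaces(line: str) -> str:
    """Strip whitespace after code, but keep whitespace-only (scope indention) lines."""
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line terminator
    if not body.strip():
        return line  # "blank" line: its whitespace is indention, not trailing junk
    return body.rstrip(" \t") + ending


if __name__ == "__main__":
    for original in ["\tx = 1   \n", "\t\t\n", "y = 2\t"]:
        print(repr(original), "->", repr(strip_trailing_spaces(original)))
```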

Speaking of text editors, I have a few last thoughts regarding indention and text editors.

It seems to me that most of the people out there who bleat about using spaces for indention fall into one of the following categories:
1) users of very cruddy text editors, so shoddy that only space-based indention is usable/reliable - most IDE text editors fit this category
2) users of old Unix text editors with some questionable practices and stone-age interfaces - a bit more on this to follow
3) coders who have at some stage developed parsers, and in the process become depressingly lazy about parsing input. These are the same guys who develop languages where nearly every symbol is a single character.

The software implied by 1 & 2 is probably the result of code developed by guys who fall under 3. Hence, the users in 1 & 2 come to believe that the only way to avoid causing themselves pain is to use spaces.

Now, about the Unix text editors. Here I'm referring to the two main families I know of: emacs vs vi/vim. Of these two evils, I personally side with vi/vim over emacs (modality + single-key commands seem much more accessible than the arcane twister-ctrl-shift-alt-double-layer-sticky-commands needed to work with emacs, not to mention that the blasted thing is Lisp-y all over, in its keybindings and other internals). vi/vim also has some rather neat things built in, such as quick replacements done entirely from the keyboard with a few magic invocations, and it can be run from a terminal shell (which is especially useful for remote sessions, and really the main reason I learnt to use it... I would've stayed away otherwise). However, it cannot easily be made to fit entirely with some of the things I've mentioned here.

For serious coding, I consider text editors that cannot support tab-based indention to a reasonable degree, cannot have "Strip Trailing Spaces" disabled, and cannot show indention guides to be totally useless POS.

1 comment:

  1. When you said that Python lacks the ability to indent code where its logic calls for it, as with OpenGL's glBegin/glEnd, I think the issue always comes from the API you're using (and especially the binding) not implementing a context manager.

    I.e., for glBegin/glEnd, if your code looks like this:

        glBegin(GL_QUADS)
        for points in all_quads:
            for p in points: glVertex3fv(p)
        glEnd()

    I think you'd be better off implementing a context manager like this:

        @contextmanager  # from contextlib import contextmanager
        def myglBegin(shape):
            glBegin(shape)
            yield
            glEnd()

    And then use it like this:

        with myglBegin(GL_QUADS):
            for points in all_quads:
                for p in points: glVertex3fv(p)

    You get some niceties with it:

    - You never forget the "cleanup" operation
    - You get your indentation right

    Every time I think "Hey, I want to indent that stuff", I end up with a context manager.
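The commenter's context-manager suggestion can be sketched as a self-contained, runnable example using `contextlib.contextmanager`. To keep it runnable without an OpenGL context, `gl_begin`, `gl_end` and `gl_vertex` are hypothetical stand-ins that only record calls (with PyOpenGL one would call `glBegin`, `glEnd` and a `glVertex*` variant instead). The `try`/`finally` is a small refinement so the closing call also runs if the drawing code raises.

```python
from contextlib import contextmanager
from typing import Iterator, List, Sequence, Tuple

# Hypothetical stand-ins for glBegin/glEnd/glVertex*: they only record calls,
# so the sketch runs without an OpenGL context.
calls: List[str] = []


def gl_begin(shape: str) -> None:
    calls.append(f"begin {shape}")


def gl_end() -> None:
    calls.append("end")


def gl_vertex(x: float, y: float) -> None:
    calls.append(f"vertex {x} {y}")


@contextmanager
def gl_block(shape: str) -> Iterator[None]:
    """Pair gl_begin with gl_end; the with-body gets its own indention level."""
    gl_begin(shape)
    try:
        yield
    finally:
        gl_end()  # runs even if the body raises an exception


def draw_quads(all_quads: Sequence[Sequence[Tuple[float, float]]]) -> None:
    with gl_block("QUADS"):
        for points in all_quads:
            for p in points:
                gl_vertex(*p)


if __name__ == "__main__":
    draw_quads([[(0, 0), (1, 0), (1, 1), (0, 1)]])
    print(calls)
```

The `with` statement gives the begin/end body the extra indention level the post asks for, and Python accepts it because the block is opened by a real statement.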