Aligorith's Lair: FAQ/Rant: Learning your way around a new codebase

From time to time, we get emails and forum postings asking about how to get started developing Blender. These almost always follow a similar predictable pattern (no offense to any individuals whose posts actually resembled this almost to the letter):

1) Blender is cool. I want to write some cool new big feature (*A) or fix some bugs (*B).
2) I've experience in C/C++/Python/Java/OpenGL and/or have taken classes in C/C++/OpenGL/matrix math (*D)
3) I'm trying to understand Blender's source code. The source code is so large that I don't know where to start (*A+E) / the "learning curve is too steep" (*A+D+E) / I tried stepping through the code in a debugger and it's too complex/hard to follow (*D+F) / it's weird/confusing (*D+F+G)
4) HEELLLLPPP MEEEEEEEEE!!!!!!!!!!!!!!!!! PLZ!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Comments on each of these points:
*A - Oh no! Experience has shown that this is usually not a very good sign of things to come. Said "cool new big feature" usually either duplicates existing functionality, or simply means that the guy wants an exact replica of a feature from [insert-big-evil-corporate's-cash-cow-product-names-here], for better or for worse. The first case indicates a lack of Blender domain knowledge. The second indicates a lack of wider, systemic knowledge of developing software. That shows up both in estimating the complexity of the proposed undertaking (due to a lack of any real programming experience) and in being able to adapt/refine/reshape ideas so that they fit into the final product in a meaningful and useful way. People in both groups are probably well-meaning and very enthusiastic to get involved. Even so, there is likely to be a significant mismatch between their motivations, plan of attack and ambitions on one hand and their current skill level on the other, which will likely cause some angst.

*B - Cool! Welcome aboard! We welcome attempts to address our open bug reports anytime. (Just beware that some areas are "bug farms", e.g. bone roll. Once you become the new de facto maintainer for those, you'll probably soon start to understand why we have generally avoided touching them ;)

*C - Yes, there is actually a 3rd category here for #1. These are the people who want to work on small, well-defined features to get their feet wet. It should be noted, though, that these people usually just quietly get on with the job. They may still ask questions when they get stuck, but these tend to be very well-defined, specific technical questions that they ask the relevant people directly. In short, over time (if they hang around), these are the people who end up on the core team of developers.

--------------------------------------------------

What's likely to be the matter in each case?
*D - When people say they've got "experience" with certain techniques, especially experience gained from taking courses, there is unfortunately often a great gulf between the depth and competence they've gained that way and what's needed in real-world situations. I have been on both sides of the fence, sometimes simultaneously: I've taken one or two courses at various stages where I was already very experienced in parts of the material but not the rest, so I could see how much the two perspectives differed for the parts I knew. From that, I can say that such a gulf does exist.

Firstly, there are limits on how much material we can physically fit into a course (i.e. not a lot). Students can only cope with so many core concepts before they saturate and start forgetting things, or become too overwhelmed to take in any more. Even after allowing for that, we are often only able to give students a cursory skim across the treetops of the main issues/concepts they should be aware of (or else we'd not have enough time to cover and assess all of that material).

As such, there is still a wide gulf that students need to cross by themselves to reach the decent "base level" that competent professionals need. The problem, of course, is that not everyone realises this gap even exists. That is why we often find bands of fresh grads, all "whizzy-snappy-happy", parroting a certain set of buzzword-laden "pure" perspectives on things (e.g. OO design for everything, strict "ceremony"-style SCRUM, etc.).

There is one other aspect I think needs to be brought up. Namely, the issue that to be truly good at something, IMO you need to learn it twice, with a gap in between.
   - The first time you learn something, you're just getting exposed to the domain, and are often scrambling to just get on top of all the various concepts.
   - The gap occurs when other stuff comes along, and the stuff you'd been learning gets pushed to the back of your mind
   - The second time you learn something is when you actually need to start applying it. Often, at this point, instead of somebody else effectively defining a path for you to tread, you end up self-motivating to do this (or else whatever needs to be done can't be done). Regardless, the second time you learn the stuff, you'll tend to remember it for life in much greater detail. This time, instead of starting from scratch and trying to figure out the landscape as you're trying to build up experience+confidence, you're re-treading a path you've previously visited, and are able to actually take things in. Well, at least that's my current theory on this matter :)

*E - Secondly, even when people don't take courses, there is a big difference between "greenfield" projects, where they were mainly "programming in the small", and actual real-life codebases that do practical things. If we couple this with unclear or unwise motivations, then you've got the seeds of a pending storm of confusion.

*F - Closely related to that "programming in the small" mindset is the problem of legions of people who are effectively deluded into thinking the following. (Learning to program through courses tends to encourage that mindset, since courses focus on simple "toy" examples out of necessity and to keep things nice and focussed.)
     1) computers are machines of logic <=> code is just a series of logical statements and procedures, so...
     2) to understand a piece of code, we just need to follow the logic and reason about it, from "first principles", and starting from the start/entrypoint (aka the Main() method)
     3) stepping through code using a debugger, starting from the Main() method is a perfectly great/logical/sane way of learning how to work with a codebase

TBH, I am quite vehemently opposed to these viewpoints. IMO, that approach doesn't work in practice on anything that is big/complex enough to be of any use. It is especially completely and utterly useless when dealing with anything that has a GUI, but also increasingly anything that is event-based or declarative. Stop perpetuating these myths! It's NOT the way for any beginner to learn what is happening.

The approach to use instead is to search for where labels visible in the UI are defined/used in the code. From there, check any callbacks or function calls made nearby whose names suggest they are involved in getting that feature to do what you know it does. By following this approach, you should be able to quickly home in on the small subset of the code related to the feature you're interested in. Focus your painstaking flow-following and reasoning work on the things in this area ONLY. Before long, you should understand it well enough to identify the patterns in how it does things, and from there, your feature/fix shouldn't be too hard to manage.
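That label search is essentially what `grep -rn '"Roll"' path/to/source` does. Here is a minimal Python sketch of the same search. It assumes that UI labels appear as plain string literals in the source files. The function name `find_label` and the list of file extensions are illustrative assumptions, not part of any Blender tooling.

```python
import os
from typing import Iterable, List, Tuple

# Assumed set of source-file extensions to search; adjust for the codebase at hand.
DEFAULT_EXTENSIONS = (".c", ".cc", ".cpp", ".h", ".py")


def find_label(root: str, label: str,
               extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> List[Tuple[str, int, str]]:
    """Return (path, line number, stripped line) for every line under `root`
    that contains `label` as a double- or single-quoted string literal."""
    exts = tuple(extensions)
    # Match the quoted literal, not a bare substring, to cut down on noise.
    needles = ('"' + label + '"', "'" + label + "'")
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if any(n in line for n in needles):
                        hits.append((path, lineno, line.strip()))
    return hits
```

For example, `find_label("path/to/checkout", "Roll")` would list every line containing the literal `"Roll"` or `'Roll'`. Matching the quoted literal is a deliberate choice. A short label like Roll would otherwise also hit every comment and identifier that happens to contain the word.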

Sure, it may get tricky with some internationalised code (i.e. the actual strings may be defined in a separate file, away from the code). In that case you'll either want to track whatever identifiers are used to fetch these separate strings, or to simply find code where the names (functions, classes, etc.) seem to reflect concepts you've become familiar with as a user.
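For internationalised code, the search takes two hops: from the label to its identifier, then from the identifier to the code that uses it. The sketch below assumes a hypothetical catalog file with one `SOME_ID = "Displayed text"` entry per line. Real projects use their own formats (gettext `.po` files, for instance), so the parsing step would need adapting. `ids_for_label` and `uses_of_identifier` are illustrative names only.

```python
import os
import re
from typing import List, Tuple

# Hypothetical catalog format, one entry per line:  SOME_ID = "Displayed text"
ENTRY = re.compile(r'^\s*(\w+)\s*=\s*"(.*)"\s*$')


def ids_for_label(catalog_path: str, label: str) -> List[str]:
    """Step 1: find the identifier(s) whose display text is exactly `label`."""
    ids = []
    with open(catalog_path, encoding="utf-8") as f:
        for line in f:
            m = ENTRY.match(line)
            if m and m.group(2) == label:
                ids.append(m.group(1))
    return ids


def uses_of_identifier(root: str, ident: str) -> List[Tuple[str, int]]:
    """Step 2: find (path, line number) of every whole-word use of `ident` under `root`."""
    pattern = re.compile(r"\b" + re.escape(ident) + r"\b")
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if pattern.search(line):
                        hits.append((path, lineno))
    return hits
```

The word boundaries (`\b`) ensure that a hypothetical `STR_ROLL` does not also match `STR_ROLLING`. If the catalog file itself lives under `root`, it will show up in the step-2 results too.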

A useful tip for finding where such code lives: coders love to modularise their work. Thus, the folder structures used to store the source code often provide useful hints about how the modules are broken up. Most of the work I've done tends to revolve around stuff accessible from a GUI at some point. For that kind of work, start by finding the core parts of the GUI, i.e. where some important reference feature is located. (Again, you probably want to avoid the "main window" and focus on something within it instead.) If you do that, you'll find what you need eventually.
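To get that overview of how the modules are broken up, it helps to look at just the top couple of directory levels before opening any files. Here is a small sketch. The name `module_overview` and the default depth of 2 are arbitrary choices. Hidden directories such as `.git` are skipped, on the assumption that they don't hold source.

```python
import os
from typing import List


def module_overview(root: str, max_depth: int = 2) -> List[str]:
    """List directories under `root`, indented by depth, down to `max_depth` levels."""
    lines = []
    root = os.path.abspath(root)
    for dirpath, dirnames, _ in os.walk(root):
        # Assumption: hidden directories (e.g. .git) hold no source; skip them,
        # and sort the rest so the outline comes out in a stable order.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirnames[:] = []  # prune: don't descend below max_depth
        if depth == 0:
            continue  # don't list the root itself
        lines.append("  " * (depth - 1) + os.path.basename(dirpath) + "/")
    return lines
```

Running `for line in module_overview("path/to/checkout"): print(line)` prints an indented outline of directory names. That outline is often enough to guess which subtree owns the GUI feature you're looking for.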

*G - Complaints about "complexity" are usually only because you haven't yet figured out how the developers have structured things. Nowadays, though, you can usually expect to find at least some docs loosely introducing the key modules, and/or describing how any domain-specific object lifecycles work. Examples include how the various methods in your Android app classes feed into the OS's core object lifecycle management, how QWidget lifecycles work and how Qt's signal/slot mechanism interacts with them, or what "Operators" in Blender are and how they interact with the main event loop.

But, more often than not, your first project really shouldn't even be big enough that it requires you to start worrying about such things. If you find yourself trying to "understand how these subsystems work" (without having tried writing a feature yet), you've already gone too low level (aka you're trying to do a "from the main() method up" dance again). Most of the time, it really isn't bloody necessary to know these details!

Thursday, July 10, 2014