Monday, June 24, 2013

Depsgraph - The Granularity Problem

In today's post, we look into one of the problems affecting Blender's current Depsgraph: the granularity problem (also known as the “Why can't I use a bone as a driver for another bone's constraint influence?” problem). A related problem arising from this is the “phantom cycles” problem (i.e. the “Huh? It keeps telling me there's a cyclic dependency here, but these things aren't even related!?” problem).



Current Architecture

As it stands, Blender's current Depsgraph effectively only knows about two types of nodes: “Objects” and “ObData” (where ObData strictly refers to the datablock that “ob.data” points to). Although this is perhaps a slight oversimplification of the actual situation (i.e. Objects without dependencies are linked to a “Scene” node, and Armatures actually have separate sub-graphs they use for representing the dependencies between bones within the hierarchies of a single Armature), it is helpful to remember that our current depsgraph only has such a crude distinction for tagging and keeping track of dependencies.

From the original design doc for this (http://wiki.blender.org/index.php/Dev:Source/Data_System/Dependency_Graph/Implementation) it seems that we were supposed to have a few more types of nodes, and that we could keep track of the types of relationships being created in far greater detail (i.e. check out the “dag_rel_type” enums in the API listings at the end of that doc). However, somewhere between that proposal and the final implementation, things somehow got merged down to the point where we are now left with a “blunt instrument” approach.

Perhaps one of the contributing factors here though was the need at the time to get things done fast (in the lead up to Orange) or with minimally invasive changes to the codebase (as I recall seeing in one of these docs at one stage, but which I can't seem to find offhand again), which led to the decision to essentially only use the depsgraph to tag and sort the hardcoded evaluation functions (i.e. “object_handle_update()” and things like “where_is_object_time()”, etc.). However, from what we've seen, IMO we're well and truly past the point where such “minimally invasive” tweaks in the code are able to cut it anymore.

To understand this problem more, let's look a bit more in depth at two of the places where this problem usually crops up: 1) armature evaluation, and 2) constraints on different objects referring to different transform aspects.

Armature Evaluation – How it Currently Works

The following diagram basically summarises how the current armature evaluation system works, especially in relation to the depsgraph.


Let's walk through this a bit.

Datablocks in Depsgraph
On the left, we have the chain of datablocks being evaluated. This is basically the depsgraph's view of things, and happens inside object_handle_update(): firstly the object-level datablock is evaluated, and then the object-data level datablock is evaluated. Strictly speaking, the set of bone data containing animation/constraints/etc. is actually stored on the Object datablock instead of the Armature datablock (i.e. those bones are actually “Pose Channels” which are part of the “Pose Armature”, ob.pose, whereas ob.data is the “Rest Pose Armature”). But when it comes to evaluation time, “ObData” for armatures really means that the “Pose Armature” now needs to be evaluated; this split is what distinguishes pose bones being transformed from the armature object being moved around.

ObData Step and Drivers
Now, let's take a closer look at the “ObData” step here. This consists of two steps: 1) evaluate drivers, 2) evaluate pose. In this diagram, I've fudged things a bit by including the drivers block here (though not entirely). That's because, since the Pose Armature is really stored on the Object datablock, the drivers for it (i.e. any drivers you may have on constraints, or on bone locations) actually live in the AnimData block at Object level (Note: if you're a bit confused by this, there'll be a diagram in one of the later posts which clarifies this a bit). Hence, in reality they get executed along with any other drivers on that Armature Object whenever the Armature Object is told to update. This occurs before any bones in the pose ever get updated; also, object-level updates and obdata-level updates don't necessarily go hand-in-hand either! If this is the first time you've heard about all this, you'll understandably be quite overwhelmed; it is a lot to take in at first, but it goes a long way towards explaining many of the weird artifacts you may have encountered when playing around with these. In any case, it's clear that there are some problems here, many of which stem from the strict Ob/ObData-only ID-block units allowed by the current Depsgraph and executed in a fixed order, and that we really can't keep going on like this.
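To make that fixed ordering concrete, here is a minimal sketch of what the per-object dispatch described above effectively boils down to for an armature. This is a sketch only: the names (handle_update(), eval_pose() and friends) are invented stand-ins, not Blender's actual code; only object_handle_update() itself is a real function, and this is not its real body.

    #include <stdio.h>

    /* Simplified stand-ins for the real datablocks; for illustration only. */
    typedef struct Object {
        const char *name;
        int recalc_ob;    /* object-level change tagged (transform, animdata + drivers) */
        int recalc_data;  /* obdata-level change tagged (for armatures: the whole pose) */
    } Object;

    static void eval_object_animdata(Object *ob)
    {
        /* Object-level AnimData: action + ALL drivers, including the ones on
         * pose bones and their constraints (since ob.pose lives on the Object). */
        printf("%s: evaluate AnimData (actions + drivers)\n", ob->name);
    }

    static void eval_object_transform(Object *ob)
    {
        printf("%s: evaluate object transform\n", ob->name);
    }

    static void eval_pose(Object *ob)
    {
        printf("%s: evaluate pose (the whole rig, as one ObData block)\n", ob->name);
    }

    /* Rough equivalent of the hardcoded flow inside object_handle_update():
     * the Object step always runs before the ObData step, and these two blocks
     * are the only units of granularity the depsgraph can tag and schedule. */
    static void handle_update(Object *ob)
    {
        if (ob->recalc_ob) {
            eval_object_animdata(ob);
            eval_object_transform(ob);
        }
        if (ob->recalc_data) {
            eval_pose(ob);
        }
    }

    int main(void)
    {
        Object rig = {"ArmatureObject", 1, 1};
        handle_update(&rig);
        return 0;
    }

Note how, in this picture, drivers ride along with the Object step while the pose is a single opaque ObData step – which helps explain some of the weird update artifacts mentioned above.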

Pose Evaluate
Here's where things take another turn into left field: the evaluate-pose step then deals with a mini depsgraph that is only used to model the relationships between bones within that Pose Armature. Except that this mini-depsgraph only really shows up when building the depsgraph, where it goes and sorts the bones, and then hides away again, basically never to be seen or heard from again. In other words, IIRC, all bones in the armature get recomputed every time the armature changes. That is, regardless of whether you only slightly tweaked the tip segment on the pinky of a character's left arm, with no IK solvers active anywhere (and definitely no Full-Body IK running), or whether you were rotating the hips while all the limbs were pinned to specific IK targets, the full rig is always fully recomputed. Rig running slow? Don't say I didn't warn you :P

(A little disclaimer at this point: I haven't really fiddled around with much of the internal detail of the armature evaluation process, apart from extending it a bit. It was already in this state when I got here, but since it has mostly worked well enough for many cases, I've by and large left it alone.)

Ok, back to the evaluation process (where_is_pose()). Now that we're inside the evaluate-pose step, we begin a three-phase evaluation process (some of these phases have multiple steps; a rough code sketch follows the list):
  • Initialise – Compute inverse matrix (for object), clear flags on all bones, compute “IK Trees”
  • Evaluate – Iterate over the “IK Trees”, and perform solving on them.
  • Cleanup – Free “IK Trees”, compute Deform Matrices (for BBone deforms, and also standard armature vgroup deforms)
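In (heavily simplified) code, the shape of this three-phase process is roughly as follows. This is a sketch only: the types and helper functions here are invented stand-ins, not the real internals of where_is_pose().

    #include <stddef.h>

    /* Invented, simplified types -- not Blender's real structs. */
    typedef struct IKTree { struct IKTree *next; /* root bone + chain array */ } IKTree;
    typedef struct Pose   { IKTree *iktrees;     /* the temporary IK-tree forest */ } Pose;

    /* Stub phases, standing in for the real work described in the list above. */
    static void compute_object_inverse(Pose *pose)  { (void)pose; /* object inverse matrix */ }
    static void clear_bone_flags(Pose *pose)        { (void)pose; }
    static void build_ik_trees(Pose *pose)          { (void)pose; /* fills pose->iktrees */ }
    static void solve_ik_tree(IKTree *tree)         { (void)tree; /* IK solver, or a plain per-bone solve */ }
    static void free_ik_trees(Pose *pose)           { pose->iktrees = NULL; }
    static void compute_deform_matrices(Pose *pose) { (void)pose; /* deform mats for the Armature modifier */ }

    /* The overall Initialise -> Evaluate -> Cleanup shape of pose evaluation. */
    void evaluate_pose_sketch(Pose *pose)
    {
        /* 1) Initialise */
        compute_object_inverse(pose);
        clear_bone_flags(pose);
        build_ik_trees(pose);

        /* 2) Evaluate: walk the IK-tree forest (parents always sorted before children) */
        for (IKTree *tree = pose->iktrees; tree != NULL; tree = tree->next) {
            solve_ik_tree(tree);
        }

        /* 3) Cleanup */
        free_ik_trees(pose);
        compute_deform_matrices(pose);
    }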

One thing I'm not sure many of you know is that IK Constraints aren't actually handled within the constraint stacks, but rather as something that goes “over the top” of all the constraints and bones. That is, they're handled right out in the main loop over all bones per armature. How does this work?
  1. Well, before evaluating the bones, we loop over them finding the IK Constraints (and Spline IK Constraints too), backtracking up the chains to find the “root” bones (which become the “roots” of the IK Trees which appear in the main IK Trees forest/list that we iterate over), in the process hoovering up every bone in our path and adding it to a little array of bones that are affected by that tree (Note: this is why we're actually limited to having only 255 bones in IK chains! The IK Trees currently only let you have that many bones in a chain...), and then growing these trees if we find another IK Constraint whose influence overlaps. This IK-Tree building exercise happens every time we evaluate an armature (a rough sketch of this step follows the list).
  2. When we loop over these IK Trees to evaluate them, we either send these to the IK solver, or just directly solve that particular bone. Note here that the bones have already been sorted so that parent bones always occur before child bones – this property holds for the IK Trees too, and is the handiwork of the Pose-level mini depsgraph. In any case, each of these solvers still goes through evaluating the pose of each bone (where_is_pose_bone()) – including parenting and constraints – at some stage. It's just that the IK solvers also muddle with the bones a bit more (including using their own copies of those structures where necessary for easier manipulation).
  3. Finally, when all the bones have been solved (results are stored on the bones in the pose_matrix), the temporary IK Trees are freed, and the deform matrices used by the Armature modifier (i.e. “chan_mat” – which is incidentally also used by constraints earlier as “pure animation-results”, since that's what it contains during armature evaluation, but is later replaced for long-lasting storage) are computed from the pose_matrix.
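To illustrate just the tree-building part of step 1, here's a hedged sketch of how a single chain gets gathered: starting from the bone that owns the IK constraint, walk up the parents towards the chain root, recording each bone into a fixed-size array (hence the 255-bone cap). The types and names are simplified inventions; the real code also merges overlapping chains into a single tree and, as noted, rebuilds this whole forest on every armature evaluation.

    #include <stddef.h>

    #define MAX_IK_CHAIN_LEN 255  /* the fixed array size behind the 255-bone limit */

    /* Simplified stand-ins, not Blender's real structs. */
    typedef struct Bone {
        struct Bone *parent;
        int chain_len;            /* chain length set on the IK constraint (0 = all the way to the root) */
    } Bone;

    typedef struct IKTree {
        struct IKTree *next;      /* sibling tree in the per-pose forest */
        Bone *bones[MAX_IK_CHAIN_LEN];
        int   totbone;
    } IKTree;

    /* Backtrack from the IK-constraint owner up towards the chain root,
     * hoovering up every bone along the way into the tree's bone array. */
    void gather_ik_chain(IKTree *tree, Bone *ik_owner)
    {
        int remaining = (ik_owner->chain_len > 0) ? ik_owner->chain_len : MAX_IK_CHAIN_LEN;

        for (Bone *bone = ik_owner; bone && remaining > 0; bone = bone->parent, remaining--) {
            if (tree->totbone >= MAX_IK_CHAIN_LEN) {
                break;            /* chain longer than the array: extra bones simply don't fit */
            }
            tree->bones[tree->totbone++] = bone;
        }
    }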

Constraint Stack Evaluation
Following the diagram, let's now briefly talk about constraint evaluation. As many may have found when using constraints, they aren't truly “Constraints” in the traditional sense of the word, as they are only really respected at the moment that they're applied (in a top-to-bottom manner). In other words, they are perhaps more “Transform Modifiers” than “Constraints”. So how does this work?

Well, as per the “constraint stack” box (i.e. “solve_constraints()”), we simply visit each constraint in turn and evaluate it. Evaluating each constraint is once again a 3-step process:
  1. Setup: Grab the matrices of targets (which should have already been evaluated – either within the armature eval loop thanks to Pose-depsgraph sorting, or earlier on as whole objects thanks to the main depsgraph's ob-ob/obdata relationship sorting). Then, ensure that everything is in the correct spaces (i.e. space-conversions are applied to both the target matrices and the “owner matrix” – the latter encapsulated in the bConstraintEvalOb structure that's created before the constraint stack evaluation occurs)
  2. Evaluate: The constraint's evaluation callback is executed, to basically fiddle with the bConstraintEvalOb “owner matrix”, taking into account any targets, however it wants
  3. Cleanup: The newly computed results are converted back to world space (i.e. the standard working reference, to keep things simple), blended with the previous result (according to the influence sliders/values), and stored again.

Once the constraint stack has been evaluated, we once again perform a “cleanup” step (freeing the bConstraintEvalOb structure, after copying the temp result it held into either 1) the “pose_matrix”, if dealing with bones, or 2) the “obmat”, when dealing with objects).
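In sketch form, the constraint-stack loop described above looks roughly like this. The structs and helpers are simplified stand-ins (the context struct plays the role of the bConstraintEvalOb “owner matrix” holder mentioned above), and the actual space-conversion and blending maths is elided; this is not the real solve_constraints() code.

    #include <string.h>

    /* Simplified stand-ins for illustration -- not Blender's real structs/API. */
    typedef float Mat4[4][4];

    typedef struct ConstraintEvalContext {
        Mat4 matrix;   /* the "owner matrix": the current working result, kept in world space */
    } ConstraintEvalContext;

    typedef struct Constraint {
        struct Constraint *next;
        float influence;   /* 0..1 blend factor from the influence slider */
        void (*get_target_matrices)(struct Constraint *con, Mat4 targets[], int max_targets);
        void (*evaluate)(struct Constraint *con, ConstraintEvalContext *cob, Mat4 targets[]);
    } Constraint;

    /* Space conversion and blending stubs (the real maths is elided here). */
    static void to_constraint_space(Constraint *con, ConstraintEvalContext *cob, Mat4 targets[])
    { (void)con; (void)cob; (void)targets; }
    static void to_world_space(Constraint *con, ConstraintEvalContext *cob)
    { (void)con; (void)cob; }
    static void blend_with_previous(ConstraintEvalContext *cob, Mat4 prev, float influence)
    { (void)cob; (void)prev; (void)influence; }

    /* The top-to-bottom "transform modifier" loop: each constraint just edits the
     * owner matrix in turn; earlier results are never revisited afterwards. */
    void solve_constraint_stack(Constraint *first, ConstraintEvalContext *cob)
    {
        for (Constraint *con = first; con; con = con->next) {
            Mat4 targets[4];   /* arbitrary small cap, purely for the sketch */
            Mat4 prev;

            /* 1) Setup: grab (already evaluated) target matrices, convert spaces,
             *    and remember the pre-constraint result for influence blending. */
            con->get_target_matrices(con, targets, 4);
            to_constraint_space(con, cob, targets);
            memcpy(prev, cob->matrix, sizeof(Mat4));

            /* 2) Evaluate: the constraint fiddles with the owner matrix however it wants. */
            con->evaluate(con, cob, targets);

            /* 3) Cleanup: back to world space, blend by influence, store the result. */
            to_world_space(con, cob);
            blend_with_previous(cob, prev, con->influence);
        }
    }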

A critical limitation that needs to be pointed out here is that, when dealing with bone constraints, if one or more bones modify an object/obdata that another bone in the same armature relies on, then we cannot get proper updates, since that object cannot currently be re-evaluated in between the different bones which have relationships with it (due to the ID/Bone separation). More info on this later.

Key Observations

By studying this evaluation process in depth, we can make several observations about how it is structured.

The 3-Step Lifecycle of Evaluation Steps
All evaluation processes can be essentially broken up into 3 steps (a minimal code sketch of this lifecycle follows the list):
  1. Initialise – Pre-evaluation setup: allocate memory for temporary structures that need to be used during the evaluation process (or perhaps compute the groups of things that need to be evaluated).
  2. Evaluate – Execute an operation, or (more commonly) loop over a collection of sub-entities and perform some evaluation operation on each of those
  3. Cleanup – Merge results and/or free the temp structures created
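Purely as a sketch of this idea (an invented mini-interface, not a proposed depsgraph API), every evaluation step, whether composite or terminal, can be seen as exposing the same three callbacks:

    #include <stdio.h>

    /* Invented mini-interface for illustration: every evaluation step gets the
     * same three-phase lifecycle, with any phase allowed to be a no-op. */
    typedef struct EvalStep {
        const char *name;
        void (*init)(struct EvalStep *step);      /* 1) allocate temps, build work lists     */
        void (*evaluate)(struct EvalStep *step);  /* 2) do the work / loop over sub-entities */
        void (*cleanup)(struct EvalStep *step);   /* 3) merge results, free temp structures  */
    } EvalStep;

    static void run_step(EvalStep *step)
    {
        if (step->init)     step->init(step);
        if (step->evaluate) step->evaluate(step);
        if (step->cleanup)  step->cleanup(step);
    }

    /* A "terminal" step has no sub-data, so init/cleanup can simply be NULL;
     * a "composite" step's evaluate would itself walk a sub-graph of further steps. */
    static void do_object_transform(EvalStep *step)
    {
        printf("evaluate: %s\n", step->name);
    }

    int main(void)
    {
        EvalStep ob_transform = {"Object Transform", NULL, do_object_transform, NULL};
        run_step(&ob_transform);
        return 0;
    }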

This gives rise to the following situation:


where your traditional evaluation steps (currently defined as functions in Blender's code base) are all of the 3-tiered boxes. If evaluating those entities involves evaluating some sub-data, then those are effectively “composite” steps, whereas those which don't involve any other sub-data are “terminal” (Note: For anyone familiar with language/compiler design, this is similar to the whole terminal/non-terminal distinction there, where non-terminals are statements and expression structures, whereas terminals are specific literals/keywords).

Also note that anywhere we have a “composite” execution step, all of the nodes contained within that composite's evaluation step form a sub-graph, and wherever we have a sub-graph, there are once again obvious opportunities for parallelism (as well as some other neat things which we'll get onto in a sec).

However, you'll also notice that there are a number of “atomic” steps, or operations which are already very well defined/compact. These are basically irreducible little nuggets of code that we just execute like black boxes, and are, in effect, “the lowest rung” in our graph structure. Another important property is that they are “simple cases”. I refer to these as “atomic” because, as long as each one runs after the atoms it depends on and before the atoms which depend on it, we have quite a bit of freedom in how they get scheduled.



In the diagram above, you'll see that all of those red-marked steps are all “atomic operations”. As such, they can be used as follows...

Granularity Explosion
Currently, the problem we've been having with the Ob/ObData blocks is that they are simply too large. Hence, when evaluating them, if there are any interdependencies between parts of them, we cannot resolve those: one whole block has to be scheduled entirely before or entirely after the other.

However, if we have a way of chopping these up into smaller pieces, then we'll have an opportunity to interleave the evaluation steps for “foreign entities” into the evaluation process for our block, as per this figure:




where we can see that if we split monolithic block “A” up into a number of smaller “atomic” evaluation steps, other operations from other entities (“B”, “C”, “D”, etc.) can be interleaved in with those steps. In fact, we can interleave in any number of such atomic operations!

Interleaving Lifecycle Events: An Example
Ok, so let's refer back to our earlier example. Take a look at the pink/light red labels for each of the atomic operations.

Following the principles outlined, as long as A1 and A3 are the first and last steps performed here (relative to all the steps in B, C, D, E), we should be fine.
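As a toy demonstration of that scheduling freedom (entirely invented operations and dependencies, just to make the rule concrete): given the dependency edges, any topological ordering is acceptable, so B..E are free to slot in anywhere between A1 and A3.

    #include <stdio.h>

    /* Toy scheduler demonstrating the ordering rule above. The operations and
     * dependencies are invented; the point is only that any order which respects
     * the edges is valid, so B..E can be interleaved freely between A1 and A3. */
    enum { A1, A2, A3, B, C, D, E, NUM_OPS };

    static const char *op_name[NUM_OPS] = {
        "A1 (A: init)", "A2 (A: eval)", "A3 (A: cleanup)", "B", "C", "D", "E",
    };

    /* deps[i][j] != 0 means: operation i depends on operation j. */
    static const int deps[NUM_OPS][NUM_OPS] = {
        [A2] = {[A1] = 1},
        [A3] = {[A2] = 1, [B] = 1, [C] = 1, [D] = 1, [E] = 1},
        [B]  = {[A1] = 1},
        [C]  = {[A2] = 1},
        [D]  = {[B] = 1},
        [E]  = {[C] = 1},
    };

    int main(void)
    {
        int done[NUM_OPS] = {0};
        int num_done = 0;

        /* Naive topological execution: keep running any not-yet-done operation
         * whose dependencies have all completed. (A real scheduler would also
         * detect cycles, and could dispatch independent operations in parallel.) */
        while (num_done < NUM_OPS) {
            for (int i = 0; i < NUM_OPS; i++) {
                int ready = !done[i];
                for (int j = 0; ready && j < NUM_OPS; j++) {
                    if (deps[i][j] && !done[j]) {
                        ready = 0;
                    }
                }
                if (ready) {
                    printf("run %s\n", op_name[i]);
                    done[i] = 1;
                    num_done++;
                }
            }
        }
        return 0;
    }

With this particular (naive) loop the output happens to be A1, A2, B, C, D, E, A3 – but any other order that respects the edges would be just as correct, which is exactly the freedom that exploding “A” into atomic steps buys us.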

Other Issues

Efficiency Concerns
It's not always necessary to explode all evaluation steps. If there are no phantom-cycle situations occurring (where interleaving “foreign” steps into composite steps is necessary), then for efficiency's sake we can probably still fall back on a version of these composite-evaluation-step nodes which simply schedules the whole node to be evaluated in one go (i.e. a single “do_node()” on the node which calls all three steps internally, instead of scheduling up and running separate “#-pre”, “#-eval”, “#-post” nodes – which incurs higher memory and scheduling-time costs).
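A hedged sketch of what that fallback might look like (invented names, not a real scheduler): the same composite node can either be pushed as one opaque unit, or split into its three phases only when something needs to be interleaved.

    #include <stdio.h>

    /* Invented sketch of the fallback described above -- not a real scheduler. */
    typedef struct CompositeNode {
        const char *name;
        int needs_explosion;   /* set when a "foreign" step must run between our phases */
        void (*pre)(struct CompositeNode *node);
        void (*eval)(struct CompositeNode *node);
        void (*post)(struct CompositeNode *node);
    } CompositeNode;

    /* Cheap path: one scheduled node which simply runs all three phases internally. */
    static void do_node(CompositeNode *node)
    {
        node->pre(node);
        node->eval(node);
        node->post(node);
    }

    static void schedule_node(CompositeNode *node)
    {
        if (!node->needs_explosion) {
            /* No phantom cycle involved: one node, less memory and scheduling overhead. */
            do_node(node);
        }
        else {
            /* "#-pre", "#-eval", "#-post" become separately schedulable pieces; in a real
             * scheduler, other operations could now be slotted in between these calls. */
            node->pre(node);
            node->eval(node);
            node->post(node);
        }
    }

    static void pose_pre(CompositeNode *n)  { printf("%s: init\n", n->name); }
    static void pose_eval(CompositeNode *n) { printf("%s: eval\n", n->name); }
    static void pose_post(CompositeNode *n) { printf("%s: cleanup\n", n->name); }

    int main(void)
    {
        CompositeNode pose = {"Pose", 0, pose_pre, pose_eval, pose_post};
        schedule_node(&pose);
        return 0;
    }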

Mechanics of Computation
More likely than not, what will happen with most of these composite nodes is that the depsgraph proper will store them somehow (maybe grouping them), but then for evaluation we pull out the exact queue of chained-up operations and run those...
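For example (again just a sketch with invented types): the graph could keep composite nodes purely for bookkeeping/grouping, and at evaluation time flatten everything down to a plain queue of atomic operations which then gets executed (or handed off to a scheduler).

    #include <stdio.h>

    /* Invented sketch: composite nodes are kept for grouping/bookkeeping, but at
     * evaluation time we flatten them into one linear queue of atomic operations. */
    #define MAX_QUEUE 64

    typedef struct OpNode {
        const char *name;
        void (*exec)(struct OpNode *op);   /* NULL for pure composite/grouping nodes  */
        struct OpNode *children;           /* sub-operations, if this is a composite  */
        struct OpNode *next;               /* next sibling within the same (sub)graph */
    } OpNode;

    typedef struct OpQueue {
        OpNode *ops[MAX_QUEUE];
        int     count;
    } OpQueue;

    /* Depth-first flatten: composites contribute their children (in order), and the
     * leaf "atomic" operations become the actual entries in the execution queue. */
    static void flatten(OpNode *node, OpQueue *queue)
    {
        for (OpNode *op = node; op; op = op->next) {
            if (op->children) {
                flatten(op->children, queue);
            }
            else if (queue->count < MAX_QUEUE) {
                queue->ops[queue->count++] = op;
            }
        }
    }

    static void run_queue(OpQueue *queue)
    {
        for (int i = 0; i < queue->count; i++) {
            queue->ops[i]->exec(queue->ops[i]);
        }
    }

    static void run_op(OpNode *op) { printf("run %s\n", op->name); }

    int main(void)
    {
        /* A single composite "A" whose children are three atomic steps. */
        OpNode a3 = {"A3 (cleanup)", run_op, NULL, NULL};
        OpNode a2 = {"A2 (eval)",    run_op, NULL, &a3};
        OpNode a1 = {"A1 (init)",    run_op, NULL, &a2};
        OpNode composite_a = {"A (composite)", NULL, &a1, NULL};

        OpQueue queue = {{NULL}, 0};
        flatten(&composite_a, &queue);
        run_queue(&queue);
        return 0;
    }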
