Monday, June 24, 2013

Depsgraph - Know thy enemy

As promised in my weekly report yesterday, this is the first in what will be a series of posts documenting various thoughts and snippets of ideas during the development process for a new Dependency Graph engine for Blender.

As well as serving as rolling documentation of the ongoing work and of how various design "quirks" (as they'll inevitably be viewed in retrospect) came about, these posts aim to serve two other important roles. From the perspective of the wider community, they provide greater visibility for this project and its progress, which may otherwise be difficult to grasp (since it is not really a flashy front-end tool that users can directly hold in their hands). More importantly, though, writing these posts helps (or rather, forces) me to really flesh out areas of the design which I might otherwise forget or gleefully gloss over once a workable solution comes along. It has been said by many notable scientists, engineers, and academics that the act of writing down your thoughts helps clarify and refine them: being forced to express ideas concisely resolves ambiguities along the way, moving them from nebulous figments of the imagination to something more concrete.

So, welcome along for the ride, as the great Blender Depsgraph project gets underway at last.
(Note: some of these posts have been sitting around for a while now – with delays and other things getting in the way – so dates mentioned are somewhat inaccurate).

A bit of background

Perhaps the first questions you'll be wondering about are: "What exactly is a Dependency Graph?" and "What is wrong with the current one?"

Basically, a dependency graph (DG) is a data structure used to keep track of relationships between different entities within a system, allowing us to figure out what needs to be updated, and in what order, when a change is made (e.g. when the current frame changes, or when we edit some property). Most commonly (as in Blender currently), a particular form of dependency graph, the "Directed Acyclic Graph" (DAG), is used for this purpose. The main thing to note about DAGs is that they cannot contain dependency cycles at all (hence the "acyclic" part of the name).
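
To make this a bit more concrete, here is a minimal sketch (in C++, with purely illustrative node names; nothing here is actual Blender code) of how a DAG drives updates. Evaluating the graph is essentially a topological sort: a node runs only once everything it depends on has run.

    // Minimal DAG update sketch: evaluate nodes in topological order
    // (Kahn's algorithm), so each node runs after its dependencies.
    #include <cstdio>
    #include <map>
    #include <queue>
    #include <string>
    #include <vector>

    int main() {
        // "B depends on A" is stored as an edge A -> B (illustrative names).
        std::map<std::string, std::vector<std::string>> out_edges = {
            {"Time",     {"Armature"}},   // animation is driven by the frame
            {"Armature", {"Mesh"}},       // mesh is deformed by the armature
            {"Mesh",     {"Modifiers"}},  // modifier stack runs on the mesh
        };

        // Count incoming edges for every node...
        std::map<std::string, int> in_degree = {
            {"Time", 0}, {"Armature", 0}, {"Mesh", 0}, {"Modifiers", 0}};
        for (const auto &entry : out_edges)
            for (const auto &target : entry.second)
                in_degree[target]++;

        // ...then repeatedly "evaluate" any node whose dependencies are done.
        std::queue<std::string> ready;
        for (const auto &entry : in_degree)
            if (entry.second == 0) ready.push(entry.first);

        while (!ready.empty()) {
            std::string node = ready.front();
            ready.pop();
            std::printf("evaluate: %s\n", node.c_str());
            for (const auto &target : out_edges[node])
                if (--in_degree[target] == 0) ready.push(target);
        }
        // If a cycle existed, the affected nodes would never reach an
        // in-degree of 0 -- which is exactly why a DAG must stay acyclic.
        return 0;
    }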

Blender's current DAG implementation (http://wiki.blender.org/index.php/Dev:Source/Data_System/Dependency_Graph/Implementation) has a number of well-known limitations, some of which are becoming increasingly severe problems nowadays (for example, drivers not working on all of the datablocks they can be added to). Lists of these issues can be found via the following pages:

For a more detailed overview, Brecht has written up a good user-level discussion of these issues on the wiki, covering them in a bit more depth with a few nice pictures: http://wiki.blender.org/index.php/Dev:Source/Data_System/Dependency_Graph/Overview

In addition to these links, I've put together a list of the various notes and pages that developers (including Ton, Brecht, Campbell, Sergey, and myself) have written up about these issues. This can be found on the main page for this project: http://wiki.blender.org/index.php/User:Aligorith/GSoC2013_Depsgraph

A literature search

Over the past few days, I've been going through as much of the scientific literature and publicly available articles discussing dependency and scene graph systems as I could find, to get a good feel for the design space. Of particular interest to me (based on the kinds of problems we've been having, but also with regard to the goals I outlined earlier) was finding out exactly what successful systems represented with their nodes, how those nodes were scheduled for evaluation, and what these systems did about evaluating/representing/resolving cyclic dependency situations.

As is usual when searching for things, it takes a few attempts to start finding relevant info. In the case of dependency graphs (and especially those used for 3D content creation and animation), though, it turns out to be quite tricky to find much of true relevance and worth when it comes to the actual implementation details (or, at the very least, the design decisions and relative tradeoffs involved) of production systems (vs pure toys/academic exercises) used for animation. Then again, there's almost nothing publicly available about the implementation of animation systems for 3D content creation packages either ;)

Typical searches of the obvious keywords reveal the usual candidates (i.e. generic overviews and stubs of limited utility for what we're interested in knowing):
  • the barebones Wikipedia entry, which simply states what a Dependency Graph is in semi-staid mathematical textbook rigour (e.g. "let G be the graph containing … where ... etc."), with very little in the way of useful guidance on how these actually get constructed, or on practical techniques for dealing with "cyclic dependency" situations,
  • a link to the Maya User Manual describing the features of their Dependency Graph implementation as presented via the "Hypergraph",
  • some links to our very own wiki and release notes (notably to the docs for the current implementation we use now from way back in 2005),
  • a smattering of tools for computing and evaluating the dependencies between various pieces of code in various programming languages (NOTE: dependency graphs are used for several purposes here, notably as the core tech used by build systems to figure out what to rebuild and in what order, as well as for figuring out code optimisations).

It is only with a bit more prodding and careful sifting that you finally begin to unearth some enlightening gems (of which there are still relatively few, but at least they are there once you start really digging). For example, there's an interesting post on Gamasutra (http://www.gamasutra.com/view/feature/131221/dependency_graphs_in_games.php?print=1) about creating a dependency graph for a game engine, and the worth of using a dependency graph separate from the existing scene graph(s) in use for other purposes. Speaking of scene graphs: despite not actually solving the kinds of problems we're trying to deal with here, papers and theses written about them do provide some interesting insights into optimisation and parallelisation techniques.

Scene Graphs vs Dependency Graphs

Perhaps one of the first issues you'll encounter when starting to search around this topic is that there's generally a lot of talk about so-called “scene graphs” and relatively little about “dependency graphs”. You'll also note that university graphics courses generally introduce the concept of scene graphs (notably their hierarchical, tree-like nature, and the way they can be used to represent transform hierarchies between objects, geometry, and scene elements such as lights and cameras), but almost never dependency graphs. So what is a scene graph, and are we just barking up the wrong tree looking at dependency graphs (which seem to only be related to compilers, programming languages, and build systems) when almost everyone else dealing with graphics only ever talks about scene graphs?

As it turns out, these are two different and complementary data structures (i.e. both can occur in the same program, instead of this being an “either-or” decision) which serve two very different roles. While there are some other uses of these terms out there (where authors have tended to be quite loose with their terminology), it's best that we stick with more confined definitions here to reduce confusion about the types of things we're referring to. IMO, the differences between the two only really start to make sense after you've examined (and understood) the underlying motivations behind the use of each (and if not, the Gamasutra article will help clear things up a bit). Namely:
  • Scene Graph = A data structure for representing and partitioning world space in a hierarchical manner. Basically, these are wrappers around OpenGL (or DirectX) which allow you to play with scene data in an object-oriented way to add special effects and/or optimise the way things eventually get drawn. While there is perhaps a bit of overlap with spatial acceleration structures such as BVHs, kd-trees, and octrees, those structures are typically tuned for high-performance queries, while scene graphs are really more about reducing the number of costly “state changes” that need to be performed when telling the GPU graphics pipeline what/how to draw your scene.
  • Dependency Graph = A data structure for keeping track of relationships, and for figuring out what to update, and in what order, for everything to be correctly updated as efficiently as possible. That is, it is used to represent more complex interdependencies between entities than a scene graph can support (i.e. custom modification logic in the form of “drivers”, “constraints”, or “modifiers” which take multiple entities into account, instead of just a direct parent or chain of parents). Thus, while you might have a scene graph in your program for efficiently rendering the scene, you could also have a dependency graph to ensure that, when you start adding more complicated animated effects into the mix, all the necessary parts of the scene graph get updated as needed.

It is only after understanding these fundamental differences (remember, focus on what the main aim of each of these things is; this is hard to grasp initially, in much the same way that students usually struggle to come to terms with design patterns when first introduced to them) that it finally becomes clear why scene graphs (such as OpenSceneGraph or OpenSG) are in fact unable to solve the problems that we currently have: they're tailored to a very different problem. So, really, if anyone was ready to jump on that “me too” bandwagon of “Why don't you just bolt <xyz> famous generic toolkit in here?”, that really doesn't work/apply here.
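
To make the contrast concrete, here's a rough structural sketch (these types are invented for illustration and don't come from any real toolkit):

    #include <functional>
    #include <string>
    #include <vector>

    // Scene graph: a strict parent -> children hierarchy, traversed to
    // propagate transforms and batch GPU state when drawing.
    struct SceneNode {
        std::string name;
        float local_transform[16];
        std::vector<SceneNode *> children;  // a tree: one parent each
    };

    // Dependency graph: edges may connect any two entities (or properties),
    // e.g. "this driver reads Lamp.energy and writes Cube.location".
    struct DepsNode {
        std::string name;
        std::function<void()> evaluate;      // the update logic to run
        std::vector<DepsNode *> depends_on;  // arbitrary DAG, not a tree
    };

    int main() { return 0; }  // declarations only; evaluation order would
                              // come from a topological sort, as sketched
                              // earlier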

Insights from Scene Graph Implementations

Nevertheless, there are things that we can learn from scene graphs, since they need to deal with similar sorts of issues at some point.

Of particular relevance are the discussions about what to do about data duplication (or, more generally, the problem of multiple users of some piece of data), especially when issues of thread-safety come into play. (NOTE: this is essential if we're talking about multi-threading the pipeline to take advantage of the parallel-centric design of modern hardware; non-threadsafe programs are famously buggy, with problems ranging from minor niggles such as unpredictable performance, through random, hard-to-reproduce lockups that only occur on particular machines or at particular times, to more dangerous problems such as data corruption.)

Dirk Reiners' thesis on OpenSG (www.eg.org/EG/DL/dissonline/doc/reiners.pdf) offers a good discussion of these issues, in particular about when to duplicate (or not duplicate) mesh data (a significant concern, given the sheer size and complexity of this data in practice).

Gamasutra Article on Depsgraphs for Game Engines

Link: http://www.gamasutra.com/view/feature/131221/dependency_graphs_in_games.php?print=1
Author: Jelle van der Beek

Summary:
This article provides a nice practical introduction to implementing dependency graph systems for graphics purposes. It is targeted at game developers in particular, but it also mentions a number of points which are relevant for developers of 3D content creation packages (i.e. us!).

It covers why dependency graphs are needed even when you already have a scene graph in your application (note: it explicitly highlights the differences between them). Then, it breaks down the various types of dependency situations you can have between different nodes/entities (I've listed these below, with a quick summary of each; one of them, the “data processor” – which is effectively a proxy/wrapper type of deal – I've omitted since it doesn't seem so relevant). There is also a brief discussion (including C++ pseudo-code snippets) of how updates for “changes” pass through the dependency graph (i.e. that age-old problem of pushing vs pulling data), though it seems that our current pull-based methods work well enough (for once).

Common Dependency Situations:
  1. Simple – B (and C, ...) depends on A – e.g. the result of parenting or constraints,
  2. Implicit – B depends on A without an explicit link – e.g. B is an animated object, and A is the timesource
  3. Mutually-Dependent Cyclic Dependencies – e.g. everything relies on everything else; there's no obvious way to “break” the cycles without external assistance (i.e. some way of allowing users to provide scenario/domain-specific input to control where it's acceptable for cycles to not be strictly correct, thus “breaking” the cycle)
  4. “Solvable” Cyclic Dependencies (or, as Brecht says, “phantom cycles”) – e.g. A and B look like they depend on each other (A ↔ B) at entity level, but the property-level relationships between them don't actually have such problems; these can be solved by partitioning the entities into suitably-sized atoms which don't have any cycles between them. More on this later.
  5. Multiple Dependencies – e.g. A depends on B in many different ways

Resolving Cyclic Dependencies – Solvable Cycles:
For “Solvable” cycles, the key advice given is to “Define Your Atoms”. In other words, “split objects up into smaller objects”.
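
To see what this buys us, consider a hypothetical “phantom cycle” (the objects and properties here are invented for illustration): viewed at object level, A and B depend on each other; viewed at property level, nothing ever depends on itself.

    #include <cstdio>
    #include <initializer_list>

    int main() {
        // Property-level relationships (read "X -> Y" as "X drives Y"):
        //   A.rotation -> B.location   (B is pinned relative to A's rotation)
        //   B.rotation -> A.location   (A is pinned relative to B's rotation)
        //
        // Collapsed to object granularity these become A <-> B: a cycle.
        // At property granularity no node can reach itself, so a plain
        // topological order evaluates everything exactly once:
        for (const char *node :
             {"A.rotation", "B.rotation", "A.location", "B.location"})
            std::printf("evaluate: %s\n", node);
        return 0;
    }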

As it turns out, this advice is actually pretty widely-accepted OO/data design wisdom, going by names such as the “Single Responsibility Principle” and “Separation of Concerns”. As an undergraduate a few years ago, I'll admit that when I first heard about this in lectures, the concept only 'made sense' in that vague, “yeah, objects are really about defining blobs of related information, so maybe there are times when some clusters of things aren't really that related” fuzziness that you get when just staring at a list of definitions.

However, it's only really been in the years since, especially after encountering a complex codebase involving no less than 3 different threading subsystems all somehow interacting with each other (and, more alarmingly, one of those threading subsystems being used to interface with an X-ray tube), that I've come to appreciate the true importance of this principle. Basically, when you've got a multithreaded application, you need to ensure that threads don't try to read/write the same resource at once, or else you get “non-deterministic behaviour” (or, in plain English, “random shit silently happens”)! There are basically two ways of dealing with this: 1) using one of the various species of “locks” to mask off the data so that only one thread can operate on it at a time, or 2) making copies of the data so that each thread can work on its own copy (but only if the threads don't really need to collaborate). In both cases, it's in your interest to minimise the size of your structures (i.e. the number of properties contained in an object/entity).

For example, threading “locks” are expensive to use (i.e. they impose a hefty performance penalty) as they need to be acquired and released, but more importantly, because the data you've locked cannot be used by anything else in any other thread. So, if you've got a large structure that ends up inside a lock (a so-called “critical section” within a method needing locks), all the properties in that structure are now only usable within that method, and not in the other threads you may have running. The problem here is that if there are in fact several different sub-groups of properties within that structure, and the processing of these sub-groups could actually safely happen in parallel, then you've effectively killed off the possibility of running them in parallel, since those properties are no longer accessible.
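
As a toy illustration of the granularity problem (the types are invented, and much simplified compared to real scene data): with one lock over one big struct, two logically independent updates serialise; with the struct split into atoms, they never contend.

    #include <mutex>

    // Coarse: one lock guards everything, so a thread animating the
    // transform blocks another thread evaluating shape keys (and vice
    // versa), even though the two never touch the same data.
    struct ObjectCoarse {
        std::mutex lock;
        float transform[16];
        float shape_key_weights[256];
    };

    // Fine: independent atoms with their own locks; threads working on
    // different concerns proceed in parallel.
    struct Transform {
        std::mutex lock;
        float matrix[16];
    };
    struct ShapeKeys {
        std::mutex lock;
        float weights[256];
    };
    struct ObjectFine {
        Transform transform;
        ShapeKeys shape_keys;
    };

    void update_transform(ObjectFine &ob) {
        std::lock_guard<std::mutex> guard(ob.transform.lock);
        // ... write ob.transform.matrix; the shape keys remain usable by
        // other threads for the whole duration ...
    }

    int main() {
        ObjectFine ob;
        update_transform(ob);
        return 0;
    }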

Similarly, if we're just copying data around instead, smaller structures mean less data to copy, less time spent performing said copying, and a lower memory overhead for the extra copies. Note that concerns about the cost of storing double, triple, etc. copies of heavy pieces of data are the main sticking point for people when considering the option of simply making copies of everything to get true data separation (and sanity) for thread safety. However, by reducing individual elements to tightly-knit atomic components which are composed together, we gain the benefit of not only copying smaller chunks, but also of being able to share component chunks which don't change between larger aggregate entities – opening up some avenues for optimisation if/when it comes down to that.
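
In sketch form (hypothetical types again), that sharing might look like this: each evaluation copy owns its own small, frequently-changing state, but shares the heavy, unchanged mesh data via reference counting.

    #include <memory>
    #include <vector>

    struct MeshData {                // heavy, and rarely modified
        std::vector<float> vertices;
    };

    struct EvaluatedObject {
        float transform[16];                   // small: duplicated per copy
        std::shared_ptr<const MeshData> mesh;  // big: shared between copies
    };

    int main() {
        EvaluatedObject a = {};
        a.mesh = std::make_shared<MeshData>();

        // "Copying" the object for another thread just bumps a refcount;
        // no megabytes of vertex data get duplicated unless someone
        // actually needs to modify the mesh itself.
        EvaluatedObject b = a;
        (void)b;
        return 0;
    }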

Resolving Cyclic Dependencies – Non-Obvious Situations:
Not all cycles can be resolved this way. In those cases, the only solution is to isolate them and cycle through them (i.e. evaluate several times) until either: 1) the amount of change being propagated between them becomes negligible (aka “the system converges on a solution”), or 2) we've reached the maximum number of iterations that we're willing to allow (aka a “threshold” kicks in to limit potentially performance-impairing operations – since not all cyclic situations will actually converge, and it may not be easy or even possible to check whether one will!).
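
In sketch form, with a made-up two-variable system standing in for a real cyclic cluster of nodes, that strategy is just a capped fixed-point loop:

    #include <cmath>
    #include <cstdio>

    int main() {
        const int max_iterations = 10;  // the user-tunable threshold
        const double epsilon = 1e-6;    // "changes have become negligible"

        double a = 0.0, b = 1.0;
        for (int i = 0; i < max_iterations; i++) {
            const double prev_a = a, prev_b = b;
            a = 0.5 * b;                // A's update reads B...
            b = 0.25 * a + 0.25;        // ...and B's reads A: a true cycle
            if (std::fabs(a - prev_a) + std::fabs(b - prev_b) < epsilon) {
                std::printf("converged after %d iterations\n", i + 1);
                break;                  // case 1: the system settled down
            }
        }                               // case 2: the iteration cap kicked in
        std::printf("a=%f b=%f\n", a, b);
        return 0;
    }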

Key Points:
  • Dependency Graphs != Scene Graphs.
  • Dependency graphs are mainly about ensuring that updates go out to all places that need to be updated in response to a change. At the same time, they should also try to minimise the amount of recomputation that needs to take place (if possible).
  • “Cyclic dependencies are best handled case by case” – that is, extra domain knowledge (i.e. an opportunity here for users to say which properties are important and which aren't) is important for devising specific and appropriate fixes for cyclic situations. It's still best to avoid cycles, but when they happen or are inevitable, it's better to have good ways of dealing with them.
  • Avoid having “phantom cycles” by simply breaking down the objects you're dealing with into smaller chunks, until finally those chunks are “atomic” enough that you don't have cycles between them. This also makes sense from a thread-safety standpoint.
  • For unavoidable cycles, simply solve by allowing these to be evaluated repeatedly (limited by a “max iterations” threshold that the user can specify)

Useful Info:
  • It may be necessary to get down to “vector level” for granularity of dependency representations. We need to draw the line somewhere, but this should give a good balance.
  • Pulling is generally better than pushing; at the very least, it means we don't risk recomputing some things more than once (see the sketch after this list).
  • Using trees to represent dependencies is bad.
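
A minimal pull-model sketch (an invented example, not how any particular package implements it): consumers ask for a value, which is recomputed only if its dirty flag is set, so a value with many downstream users is still computed at most once per change.

    #include <cstdio>

    struct LazyValue {
        bool dirty;
        double cached;
        double (*compute)();

        double get() {          // "pull": recompute on demand...
            if (dirty) {
                cached = compute();
                dirty = false;  // ...then reuse until invalidated again
            }
            return cached;
        }
    };

    static double expensive() {
        std::puts("recomputing...");
        return 42.0;
    }

    int main() {
        LazyValue v = {true, 0.0, expensive};
        v.get();  // prints "recomputing..." exactly once
        v.get();  // served from the cache; in a push model, every upstream
                  // change could have triggered a redundant recompute
        return 0;
    }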

DreamWorks Paper on LibEE (Multithreaded Dependency Graph for Character Animation)

Paper: LibEE: A Multithreaded Dependency Graph for Character Animation
Authors: Martin Watt, Lawrence D. Cutler, Alex Powell, Brendan Duncan, Michael Hutchinson, Kevin Ochs

Summary:

In short, they describe the new dependency graph and evaluation engine that they developed over 3 years for use in their in-house animation software. The core goal was proper and efficient multithreading, so that software used for character animation (i.e. complex rigs, with many deformers, effects, etc.) performs better (i.e. operates closer to realtime speeds). They also mention many tradeoffs/assumptions which apply in this environment.

Key Points and Techniques:
As per the paper, their main contributions (and by extension, important aspects of creating multi-threaded, or at least multi-threading-ready, architectures) are:
  1. Utilise “graph based” concurrency opportunities as well – that is, nodes in your scene will likely form little dependency sub-graphs (e.g. the separate limbs of a character) which have very little interaction with each other, and can hence be computed in parallel (as there's no conflict in doing so). This doesn't preclude also multi-threading those 'embarrassingly parallel' situations inside performance-hungry operations such as subsurf, etc.
  2. Once you've got your subgraphs of nodes, you feed them into a scheduler engine for efficient distribution across the cores. In their case, they took Intel's open-sourced libraries for this purpose: TBB (Threading Building Blocks – low-level threading primitives) and CnC (Concurrent Collections – “task dependency management”, which “ensures that chains of nodes run on the same core”).
  3. Provide visualisation tools for technical users to visualise/debug inefficiencies in their setups, opening up the possibility of quickly identifying bottlenecks and then rearranging things so that they lead to parallelisable graphs.

Another important contribution is the idea of having “static rules” for how to perform updates. When I originally read this paper, I ended up thinking that this meant they were just doing the RECALC_OB and RECALC_OBDATA tagging that we currently use, just perhaps at a finer-grained scale. However, after rereading the paper following Martin's comments on that earlier post, I suspect that what they really mean is that they figure out ahead of time the exact set of nodes which needs to be queued up (and in what order) when you, say, modify the “lip.frown” parameter on your rig – so the relevant drivers and deformers get queued for execution directly, instead of going through a more generic “graph traversal” to figure out what needs to be done.
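
My reading of that, in sketch form (the parameter and node names here are invented; the paper gives no code): build, once, a table mapping each animatable parameter to the exact ordered list of nodes to run when it changes, so per-edit updates skip graph traversal entirely.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        // Built once, when the rig/graph is constructed:
        std::map<std::string, std::vector<std::string>> update_rules = {
            {"lip.frown", {"driver:jaw_follow", "deform:face_lattice",
                           "deform:skin"}},
            {"arm.ik_target", {"solver:arm_ik", "deform:skin"}},
        };

        // At edit time, no graph traversal: just run the precomputed,
        // already-ordered schedule for the parameter that changed.
        for (const std::string &node : update_rules["lip.frown"])
            std::printf("run: %s\n", node.c_str());
        return 0;
    }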


Additional Points to Consider:
  • Nodes in the DG are “standalone compute units”, which take in some inputs and produce some outputs. What exactly constitutes a node/compute unit is the crux of ensuring the design can work efficiently when multithreaded.
  • Under Section 5 (Graph Evaluation Mechanism), they mention that the application has to specifically request “output results”. This could potentially simplify the “where are things stored” issue, if we can say: “Here is where I'd like the result to be dumped when you're done” (i.e. providing a little mailbox/slot for the results to go into).
  • Graph evaluation is a three-step process – tag dirty nodes, add the nodes which will also need to be recomputed to a “task list”, then execute this task list to perform the evaluation (parallelism opportunities are taken into account here); see the sketch after this list.
  • For cases where there's some kind of “switch” node (e.g. between low and high resolutions of a mesh), they perform an additional step to prune the unused branch(es) before actually evaluating.
  • Some way of managing the risk that code is not threadsafe: they had a specific protocol of labelling code with various levels of thread-safety warnings, restricting how those nodes may/may not be combined in the execution process.
  • “No graph editing” assumption – from re-reading Jean-Luc's original design for our current system, it appears that we currently apply this assumption too. Basically, we currently just chuck out the whole graph and recompute it if stuff gets added/changed in the scene, which results in the set of dependencies that need to be represented changing. While this takes a bit longer to rebuild, it is also much, much simpler to maintain – we only need to care about how to build a proper and valid depsgraph from the scene, instead of also figuring out how to keep an existing graph consistent while modifying it in place (and/or providing support for actually doing that in-place modification!). So, while I was originally skeptical about this assumption, I think we can live with it.
  • They claim to have “encountered numerous challenges” implementing this system, especially when it came to ensuring that the thing actually ran multi-threaded. As I'm personally not too up on multithreaded coding and debugging myself, I think it's wise to say that for this GSoC project at least, we'll just strive to create a “multi-thread ready” design (much like how they sold TVs as “HD-Ready” vs “HD” ;) This basically means that architectural decisions will be made to keep it able to be multithreaded (e.g. atomic datatypes, etc.), though we probably won't embark on the final stage of actually turning on multi-threading support (i.e. we'll implement a basic/dumb scheduler that initially won't try to schedule tasks in a multithreaded way).
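
To pull the evaluation points above together, here's a condensed, single-threaded sketch of that three-step scheme (types and names invented; the real system orders the task list topologically and groups independent tasks for parallel execution):

    #include <cstdio>
    #include <string>
    #include <vector>

    struct Node {
        std::string name;
        bool dirty;
        std::vector<Node *> users;  // downstream nodes reading our output
    };

    // Step 1: tag the changed node and everything downstream as dirty.
    void tag_dirty(Node &node) {
        if (node.dirty) return;     // already tagged: don't re-walk
        node.dirty = true;
        for (Node *user : node.users) tag_dirty(*user);
    }

    // Step 2: gather the dirty nodes into a task list (this naive walk is
    // only correctly ordered for simple chains like the one below).
    void collect(Node &node, std::vector<Node *> &tasks) {
        if (!node.dirty) return;
        node.dirty = false;
        tasks.push_back(&node);
        for (Node *user : node.users) collect(*user, tasks);
    }

    int main() {
        Node skin{"deform:skin", false, {}};
        Node rig{"solver:rig", false, {&skin}};

        tag_dirty(rig);                 // step 1: dirty tagging
        std::vector<Node *> tasks;
        collect(rig, tasks);            // step 2: build the task list
        for (Node *task : tasks)        // step 3: execute the task list
            std::printf("execute: %s\n", task->name.c_str());
        return 0;
    }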



3 comments:

  1. nice read! (already! still reading tbh...)
    these guys came to my mind also:

    http://fabricengine.com/2012/10/creation-platform-101/
    http://fabricengine.com/2012/10/test/
    http://fabricengine.com/2012/08/using-node-slicing-to-manage-large-scenes/

  2. Interesting articles on the Fabricengine, I was unaware of what had happened to SoftImage! It looks like they use a similar dependency graph (DAG) as Blender is using, and unfortunately the article is light on details. Looks like Blender is very competitive already!

  3. You know, reading through this reminds me strongly of the Google V8 Javascript engine, which basically has to parse entities and dependencies and then compile those into machine code in realtime, optimizing paths as they get used. Javascript as used in browsers also deals with cyclic dependencies on a regular basis, so perhaps there is something that can be learned there? On the flipside, I don't see much talk about the design when digging around.
