Tuesday, June 11, 2013

Risk Scores - Alternative Blender Security Approach

Over on the blender-developers mailing list these past few days, there has been a lively discussion about the state of security in Blender. Of particular concern is the potential for malevolently-minded individuals to create malware-laden .blend files which then get distributed far and wide to unsuspecting users. Particularly troubling is that many virus scanners would probably not be able to detect these problems as things currently stand.


Perhaps I'm taking a much too simplistic view of these issues, but it seems that while we may never be able to achieve full bulletproof fortress status, there are a number of feasible steps which can certainly help to identify a large number of these types of attacks, allowing users to make more informed decisions about their own safety and security. The following proposal is just a random idea I had yesterday afternoon for an alternative solution that (at least given the facts I'm aware of now) sounds entirely feasible.


The Main Avenues of Attack

From the discussions I've seen, the main risks we're seeking to alleviate here are:
  1. Scripts of any kind, but particularly those which get “auto-loaded” (i.e. loaded on startup, or when the file is loaded, without any need for intervention on the part of the user) within the files.
  2. Drivers allowing users to specify a “scripted expression” (i.e. a one-line entry point for entering arbitrary Python code). These drivers get evaluated as soon as the scene/data they live on evaluates its drivers, which is often at least once at the time of loading.
  3. Addons, or scripts deployed alongside the Blender binary in a specially marked directory.
In other words, we're looking at the potential to embed arbitrary Python code, which can be used to wreak havoc. Now, while there are many other ways we can extend Blender by writing Python code, the way things work, these all fall under the “Scripts” category of stuff that usually needs to be “auto-loaded” or at least manually run by users.

There were also mentions of the Interactive Console editor type being a potential avenue of attack, if attackers somehow managed to trick users into typing some bad code in there. But, AFAIK, there isn't really any way of disguising this thing much, or of preloading content, that doesn't involve hooking up some auto-load scripts (see 1). So, we're probably ok here.

Of course, there are still the buffer-overflow risks inherent in the datafile design itself, but that's a lower-level problem that requires a slightly different set of mitigation approaches (e.g. a .blend file binary verifier or so).

Black and White, or 50 shades of grey?

Currently, all discussion about approaches to safeguarding users against malicious files assumes that we simply provide them with a binary distinction between “This file contains scripts and/or drivers” and “There are no dangerous features here at all”.

One of the major threads of discussion has centered around the problem that if we simply always show such prompts to users (either modally in their face or non-modally as a badge in the info header, for example – the actual mechanics of how we do this don't matter that much, since the result is inevitably the same), then eventually they'll just get conditioned to always click “Allow” (i.e. the antithesis of what we want). This is because so much of the useful and necessary functionality that makes these files work is provided by scripts and drivers (as they say, sometimes users will jump through hoops of fire just to see the dancing bunny). Thus, just blindly saying “there be dragons within, enter at your own risk” ends up being merely a usability hassle/bug.

As has been argued, in files you download off the net there may well be a “trojan horse” scenario: there's a piece of cool new/amazing candy (i.e. a demonstration of a new compositing noodle setup, or a driver setup for an intricate Edwardian grandfather clock mechanism) to act as the lure (which users will automatically assume is the main cause for the warning), while hidden on line 700 of one of the scripts (it doesn't matter how many there are, though the more, the worse of course) there is secretly a malicious line (or whole hunk of code) designed to cause you grief.

An oft-proposed solution to this, then, is to say: “simple, let's just ensure that such files can be certified as coming from a reputable/trustworthy source”. Here we get into the whole “trusted computing” problem, and you start having to deal with things such as key signing and verification, which eventually leads to the need for some kind of semi-centralised verification authority/mechanism to keep track of all this. There are a whole host of problems here (that some in the discussion have only really started to touch on), but by the time you get really deep into this, it soon becomes a matter of, “When did just trying to make a piece of art on a Friday evening and distributing it to a few friends become a visit to the taxman to file yet another set of IR3's and NOPA's?” Put another way, you start having to diversify your efforts away from your core goals (i.e. providing 3D content creation software).

The diagram below illustrates the divide, as it often seems, between the various “forces” and tradeoffs involved in these matters:

IMO, we're heading towards overkill here, adding patches upon patches for solving the wrong problem. If you refer back to the diagram above, take note of how these sorts of opposing forces on a design are not always mutually exclusive, if you can allow some suitable tradeoffs. Again, we're talking about a spectrum here, not two either-ors.

Also, it's important to stop and think for a moment about the domain we're working in. What is it that we're building? What is it used for? What's the core goal of this thing?

...

Taking these considerations into account, let's look at our initial approach to the problem: that is, we provide users with a binary distinction as to whether there are known attack vectors (i.e. scripts and driver expressions) somewhere within the file. Never mind that not all of these are created equal. Never mind that some of these may be completely harmless while another one beside them contains some evil hidden deep within it. Never mind that we're building a 3D content creation package which allows some user flexibility to create extensions for improving usability or performing custom tasks, and NOT a general-purpose Python environment for building any manner of system admin + network hosting + CS experiments + creative introspection playground with a 3D engine backend bolted on for kicks. As far as Joe User is concerned, “there be dragon in blendfile”.

A Proposal – Scoring "Risk Potential"

Therefore, my proposal is that we should instead aim to profile the kinds of things present in a file, and provide users with a more nuanced risk assessment based on what we find.

To keep this easily understandable by users, there should be a way of aggregating these risk assessments to provide an overall score for the risk posed by the file (a rough sketch of how such an aggregation might work follows the list below). For example, instead of simply saying whether the file uses potentially dangerous features or not, we can now say:
  • 0% = “Clean”, if absolutely none of these things were detected in the file.
  • 10% = “Potentially dangerous”, as a baseline for the existence of scripts and driver expressions in the file. There's no way we can be absolutely sure that any file containing these things is indeed clean (without having actually run them and potentially run into strife), so we don't lie to users that there isn't such potential. However, we keep this baseline score relatively low, so that if these turn out to be good-natured UI scripts, for example, users won't become overly alarmed.
  • >10% = “Suspicious constructs found”, if we find some dodgy-looking constructs (see later) within those scripts and drivers. We will tag and assign scores to these based on their known potential for being used to wreak havoc. Naturally, some of these are more likely to be problematic than others, and scripts making heavy use of them are even more likely to be cause for concern. Thus, higher total risk scores should result from the use and concentration of highly dangerous things. Of course, there's potential for false positives (i.e. some coder has been trying to be a bit more “clever”, and written some “clever code” – there's quite a lot written about the problems with this relative to good software engineering principles), so our use of relative scores allows the detection and informed, careful checking of the offending code to verify the true risk posed.
  • 100% = “Outright dangerous constructs detected”, if we find stuff that's just so blatantly dangerous and wrong (i.e. perhaps the work of amateur crackers out for a laugh) that it should of course be suitably flagged.
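
To make the aggregation idea above a little more concrete, here is a very rough Python sketch of how per-construct findings might be rolled up into a single percentage. The construct names, weights, and combining formula are purely illustrative guesses on my part, not a worked-out design:

    # Illustrative weights for detected constructs (0.0 - 1.0); values are guesses
    CONSTRUCT_WEIGHTS = {
        "eval_exec": 0.9,        # direct eval()/exec() usage
        "dunder_chain": 0.7,     # chained __class__/__subclasses__ style introspection
        "import_function": 0.5,  # __import__('...') instead of an import statement
        "encoded_blob": 0.6,     # rot13/base64 style obfuscation
    }

    BASELINE_RISK = 0.10  # latent risk as soon as any script/driver exists at all

    def aggregate_risk(findings):
        """Combine individual findings (keys of CONSTRUCT_WEIGHTS) into one 0-100% score."""
        if findings is None:
            return 0.0  # no scripts or drivers in the file at all: "Clean"
        # Treat each weight as an independent chance of trouble and combine them,
        # so many medium findings push the score up without capping too early.
        clean_prob = 1.0 - BASELINE_RISK
        for tag in findings:
            clean_prob *= 1.0 - CONSTRUCT_WEIGHTS.get(tag, 0.2)
        return round((1.0 - clean_prob) * 100.0, 1)

    print(aggregate_risk(None))                           # 0.0  -> "Clean"
    print(aggregate_risk([]))                             # 10.0 -> baseline
    print(aggregate_risk(["dunder_chain", "eval_exec"]))  # 97.3 -> near the top of the scale

One nice property of combining scores this way is that a pile of mildly suspicious findings pushes the total up steadily, while a single outright dangerous construct dominates it.
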
This system of “risk scoring” naturally lends itself to being represented as a little percentage-bar widget with colour coding for the level of risk posed – something that could easily be integrated both into the main interface when such files are loaded and into the file browser when browsing for files. A proposed mockup for this is shown below:

Mockup showing what these widgets could look like (though this is only a very rough draft, and the actual content/copywriting would probably need to be somewhat different in practice). 

The detail panels shown below the last two are floating panels displayed when the badges are clicked on, to provide more details about the nature of the threats found. The "View Full Report" button would be used to generate a textblock showing an exact list of the things detected, and where exactly to find them. Obviously, these detail panels should still be kept relatively brief, while offering enough extra context to ease a quick decision if the user already has some hint of what might be going on in the file.

In addition to notifying users about risks, there are a few additional points we can investigate further.
  • The first is allowing customisation of the risk profiles that users are willing to accept as being allowed to run without explicit confirmation (though a warning still gets shown). So for instance, some users may feel that they're perfectly happy letting any scripts/drivers run if they don't contain any known or detected naughty stuff (even on the off-chance that we missed some class of vulnerability), and that we should let them do so if they choose. Another example of this is being able to explicitly modify which of the detection patterns to use. As always though, we must tread carefully here to avoid letting errant scripts simply disable all the protections unbeknownst to the user.
  • Related to this is a hash-based integrity check system for “whitelisting” certain scripts that a user has checked out, where our risk detection system identified some potential risks due to some quirks of how the script was written (or required as part of its functionality). Such whitelisting must be done on a per-computer/user basis (i.e. this whitelist is stored separately from the actual files), and is designed for use in studio scenarios where in-house scripts can get manually whitelisted whenever they change (see the sketch after this list). Studio IT procedures can then be applied to say that staff are only allowed to whitelist in-house assets, or similar procedures as appropriate.
  • Some mechanism/system where users can get files with suspicious entities tagged “professionally vetted” (i.e. involving any combination of formal code review and/or testing on secured virtual machines/environments). Again, this may be out of the scope of things that we “officially provide”, and may instead end up being facilitated by some pop-up service providers who do this. Users could also do this manually themselves (provided they have the skills, expertise, and suitable tech at their disposal) after being alerted to these risks. Anyway, the ability to know that there's something there that needs further investigation before letting it run amok is basically our goal here.
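
As a rough illustration of the whitelisting idea from the second point above, the following sketch keys the whitelist on a SHA-256 digest of the exact script text and stores it per-user, away from the .blend files themselves. The config path and helper names are hypothetical, not anything Blender actually provides:

    import hashlib
    import json
    import os

    # Hypothetical per-user location; a real implementation would use Blender's config dir
    WHITELIST_PATH = os.path.expanduser("~/.config/blender/script_whitelist.json")

    def script_digest(source_text):
        """Stable fingerprint of a script's exact contents."""
        return hashlib.sha256(source_text.encode("utf-8")).hexdigest()

    def load_whitelist():
        try:
            with open(WHITELIST_PATH, "r", encoding="utf-8") as f:
                return set(json.load(f))
        except (OSError, ValueError):
            return set()

    def is_whitelisted(source_text):
        return script_digest(source_text) in load_whitelist()

    def whitelist_script(source_text):
        """Only to be called after the user has manually vetted the script."""
        entries = load_whitelist()
        entries.add(script_digest(source_text))
        os.makedirs(os.path.dirname(WHITELIST_PATH), exist_ok=True)
        with open(WHITELIST_PATH, "w", encoding="utf-8") as f:
            json.dump(sorted(entries), f, indent=2)

Because the whitelist is keyed on the exact text, any edit to an in-house script invalidates its entry, so it has to be re-approved after each change – which is exactly the studio workflow described above.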

Heuristics for Risk Scoring

So, how exactly would we go about figuring out the risk scores for the attack vectors in our files?

Well, although Python experts have generally concluded that effectively sandboxing Python environments (especially when embedded in other programs) is nearly impossible, or at the very least extremely difficult, due to the wide variety of hacks and workarounds available, it must be stressed that they were talking about the general case. Most of the time, they're talking about building foolproof fortresses that can be chucked up on a webserver to let any Jack and Jill submit arbitrary pieces of Python code and have them be executed (e.g. Google's App Engine). There's absolutely no room for any custom vetting or whatnot there, as running the scripts that people upload is THE whole point of building these systems, so not letting bits of them run is not exactly an option; otherwise there's nothing left to do.

From what I've gathered, a non-exclusive list of the main methods that people have discussed for hacking around Python sandboxes and detection mechanisms is as follows:
  • Creating aliases for risky things so that their names won't be detected – either on an individual basis or by batch remapping them. For example, “file” => “rotate” by doing something like “rotate = file”, and then calling “rotate(...)” everywhere file would have been used.
  • Use of the import function instead of the import statement – e.g. “__import__('blah')”
  • Use of Python's introspection mechanisms to hack up into the internals to access prohibited stuff, and either call methods through that or by directly injecting bytecode into the executable portions of the interpreter's state.
  • Obfuscation using encoding hacks – e.g. a “#encoding: rot13” line at the start, or perhaps an inline “eval('3Q@#QUFNvdfyutr237$%^$'.decode('base-64'))”
  • Direct usage of dangerous constructs like “eval()” and “exec()”
  • Module reloading to reinstate things the sandbox may have disabled – e.g. “reload(sys)” is something I've personally used a few times to force a symbol for setting the default encoding codec to utf-8.
  • Obfuscation by verbiage – i.e. breaking up dangerous constructs (like the one below) into many single-assignment operations, storing the intermediate steps in many temporary variables (with some potential misdirection to stash these things on dead-end paths), and then "shoving" or "interleaving" a whole bunch of other unrelated operations in between (to confuse any human reader, but not a sufficiently sophisticated flow analysis algorithm!)
There may be even worse atrocities out there. However, a key observation to make here is that once you eliminate the easy cases, you need to start doing increasingly “clever” tricks (i.e. combining various types of obfuscation and introspection) to achieve similar results. I don't know about you, but these things look awfully conspicuous and eye-catching, especially if they were to appear in the middle of the relatively benign things you normally see in legitimate scripts for 3D content creation using Blender's bpy module. For example, this snippet from Campbell (from earlier in the discussion):
os = next(iter(i for i in (1).__class__.__mro__[-1].__subclasses__() if i.__name__ == '_ZipDecrypter'))._UpdateKeys.__globals__["so"[::-1]]
does its business by diving deep up the referencing chain into the internals of classes, chaining together several of these “__blah__” members. As an example of how we can use this tidbit of information, it stands to reason that looking out for code that chains these together in this way (i.e. one or more of them, or particular combinations) is a good starting point for identifying “potentially unwanted behaviour”.
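
As a very rough illustration (and certainly not a complete checker), the following sketch uses Python's ast module to flag direct calls to a few blacklisted builtins and chains of “__blah__” attribute access like the one in the snippet above. The blacklist and the chain-depth threshold are arbitrary choices of mine, and chains threaded through subscripts or calls would need extra handling:

    import ast

    DANGEROUS_CALLS = {"eval", "exec", "__import__", "compile"}

    def find_suspicious(source_text):
        """Return (line, reason) pairs for constructs worth flagging."""
        findings = []
        tree = ast.parse(source_text)
        for node in ast.walk(tree):
            # Direct calls to blacklisted builtins, e.g. eval(...)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in DANGEROUS_CALLS:
                    findings.append((node.lineno, "call to %s()" % node.func.id))
            # Chained "__blah__" attribute access, e.g. x.__class__.__mro__
            if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
                depth, inner = 1, node.value
                while isinstance(inner, ast.Attribute):
                    if inner.attr.startswith("__"):
                        depth += 1
                    inner = inner.value
                if depth >= 2:
                    findings.append((node.lineno, "%d chained dunder attributes" % depth))
        return findings

    snippet = "os = (1).__class__.__mro__[-1].__subclasses__()"
    for lineno, reason in find_suspicious(snippet):
        print("line %d: %s" % (lineno, reason))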

This is because we're not looking at the general case of a general-purpose programming language that can be used for anything under the sun you can code up. We're looking specifically at a flexible programming language embedded in a 3D content creation package. Thus, there are things that you'd do which are completely normal for such an application (and which are typified by the examples you'll find in the official code distribution). That's our “baseline” for what “good and normal” code should and needs to be. We have standards aimed at making and keeping this code easy to read – key software engineering principles. Anything outside of this brief, like this sort of exotic internals hacking, really starts to look like suspicious stuff.

For example, take the encoding hacks. It stands to reason that we really shouldn't trust great chunks of encrypted hashcode-garbage. It's not easily human readable, so someone obviously has something to hide here. Red flag goes up. Written in a foreign language requiring strange glyphs? Similar story – we probably want to be careful with that since we have no way of knowing what it's really about (i.e. is it a book of spells and curses, or just a fairy tale? Who knows if you can't understand it!).

Once we accept the principle that we're really just trying to do identification and not foolproof blocking (note: even if we return a false negative, the 10% latent risk that there are attack vectors present in the file covers us a bit, as will any other suspicious things present), other things also become feasible again. For example, checking identifiers against a wordlist of good and bad names (i.e. explicitly flag “eval(...)” and “__import__(...)”, but allow “sin()”, for example, in driver expressions) gets us a long way. On the topic of driver expressions, we can also treat anything that isn't a math function, numeric builtin, and/or contained in the variables list as potentially dangerous too!
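
As a rough sketch of what such a driver-expression check might look like (the allowed-name list and the way variables are passed in are just illustrative assumptions on my part), we can parse the one-line expression and reject any name that isn't a math function, a numeric builtin, or one of the driver's own variables:

    import ast
    import math

    # Names a driver expression may reference, besides its own variables
    ALLOWED_NAMES = {n for n in dir(math) if not n.startswith("_")} | {"abs", "min", "max", "round"}

    def driver_expression_is_safe(expr, driver_variables):
        """Very conservative check: reject anything but whitelisted names."""
        allowed = ALLOWED_NAMES | set(driver_variables)
        try:
            tree = ast.parse(expr, mode="eval")
        except SyntaxError:
            return False
        for node in ast.walk(tree):
            if isinstance(node, ast.Name) and node.id not in allowed:
                return False
            if isinstance(node, ast.Attribute):
                # A one-line driver expression has no business doing attribute access
                return False
        return True

    print(driver_expression_is_safe("sin(var) * 2.0", ["var"]))             # True
    print(driver_expression_is_safe("__import__('os').remove(p)", ["p"]))   # False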

As you can see, all of a sudden, a lot of things become a lot more feasible, while we still accept that we aren't perfect and that some things may slip through the cracks (though never completely!).

Conclusions / Addendum
  1. While knowing with absolute certainty whether a script is malicious may not be possible, I believe that it is still well within feasibility limits to at least identify and tag swathes of code which resemble the techniques used to perform malicious tricks. To do so, we simply need to understand that the majority of the fancier techniques needed to perform such hacks look almost nothing like run-of-the-mill scripts for legitimate 3D content creation. Simpler malicious techniques can also be identified using easy-to-apply detection methods (i.e. searching for blacklisted keywords).
  2. Even if our detection techniques result in false positives (i.e. legitimate code is caught), that is fine, as this is not an offline processing engine, but rather part of a user-facing tool. Thus, it's better to keep the human in the loop and ask for human input on anything that looks suspicious (again, we have this benefit here, which not everyone in the general case does; human power is still often greater than blind machine power). To do this, we tag anything we even remotely suspect is dodgy, and we tell the user how suspicious we are about it (or the severity of the consequences if the dodgy stuff was truly being used for dangerous deeds), as well as where to find it to check (or have it checked out).
  3. It is still possible that our technique will eventually miss some dangerous techniques which it wasn't designed to detect. Still, in this case, users are at least advised of the potential for hidden dangers lurking in the attack vector sources, and told that we couldn't find anything we were looking for - though it needs to be stressed that we can't detect what we don't know we have to detect. At the very least, users can open the file without being attacked - giving them a chance to verify whether the things we deemed clean are truly clean.
  4. The detection mechanisms I'm suggesting here would use a combination of raw text processing (to avoid being blindsided by whole-file encryption, for example), AST inspection, and perhaps some static analysis (to defeat obfuscation by verbiage). Basically anything but actually running the damned things, either in a real or a sandboxed evaluation environment.

4 comments:

  1. I thought of suggesting this on the Blender developers list – is script sandboxing not feasible?

  2. Davis:
    From what I understand, all these various types of hacks practically make sandboxing scripts impossible. You can either stop them running outright, or let them do whatever they want.

  3. Great proposal. The only issue is what happens when it fails to detect a critical backdoor. A script is somehow able to execute fully customized code and the riskometer says it's 100% safe. In that case, falsely claiming the script to be safe is worse than not evaluating its risk level at all. :s

  4. True, there's always the danger that we miss something. However, a key mitigating factor is that as long as scripts exist in a file, we never say it is 100% safe - there's always 'latent risk' (the size of which we could debate; 10% was just a first guess at it). Just like now, users ultimately aren't quite as safe as they think once they get lulled into a false sense of security. Then again, with modern antivirus, we also have the problem of not quite catching all zero-days :)
