If you stumbled across this post looking for information about how to identify File Browser windows (i.e. "explorer.exe" instances) and File Dialogs (i.e. Open / Save As dialogs), here is what you need to know:
- GetClassName(hwnd, className, nameBufLen) in C, or className = win32gui.GetClassName(hwnd) in Python (with PyWin32 installed), gives you the window class name: a string for identifying the type of window you're dealing with.
- CabinetWClass = The identifier for File Explorer windows. Inside one of these windows:
  - "Rebar" is the navigation toolbar containing the back/forward buttons, breadcrumbs, and search box
  - "Control Host" is the shortcuts pane where favourites, libraries, and the folder hierarchy are shown
  - "Shell Folder View" is the pane where all your files and folders are shown
- #32770 = The identifier for File Dialogs (such as Open/Save As dialogs)
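As a rough illustration of the summary above, here is a minimal sketch that maps a class name to a coarse label. The two class names come from the list; the function names and the labels "file_explorer", "dialog", and "other" are made up for this example. One caveat: #32770 is the standard Windows dialog box class in general, so other dialogs will also land in the "dialog" bucket, not just Open/Save As.

```python
try:
    import win32gui  # part of the pywin32 package; only available on Windows
except ImportError:
    win32gui = None

# Class names taken from the summary above
FILE_EXPLORER_CLASS = "CabinetWClass"
DIALOG_CLASS = "#32770"


def classify_window_class(class_name):
    """Map a window class name to a coarse label (labels are hypothetical)."""
    if class_name == FILE_EXPLORER_CLASS:
        return "file_explorer"
    if class_name == DIALOG_CLASS:
        return "dialog"
    return "other"


def classify_foreground_window():
    """Classify the window that currently has focus (requires pywin32)."""
    hwnd = win32gui.GetForegroundWindow()
    if not hwnd:
        # No foreground window right now (e.g. during a focus change)
        return "other"
    return classify_window_class(win32gui.GetClassName(hwnd))
```

On a Windows machine with pywin32 installed, calling classify_foreground_window() while a File Explorer window has focus should give "file_explorer".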
Over the past few days, I've been doing quite a bit of digging into how various bits of the core Windows UI code work, as an initial feasibility/reconnaissance survey for a project I'm currently working on. Usually I prefer to give these things a wide berth, sticking to cross-platform toolkits whenever possible. But due to the platform-specific nature of the project deployment, I'm going out on a bit of a limb here and diving headfirst into getting to grips with Windows desktop dev, writing what can be described as an amalgam of "academic spyware" with "interface replacement capabilities". It's been quite an eye-opening journey so far.
For starters, my first reaction to some of the things I saw can best be described as visceral disgust. In a way, it's hard to blame Microsoft entirely, since the core APIs and structures here are all about two decades old. On a side note, Microsoft's commitment to maintaining backwards compatibility for legacy software and APIs is somewhat impressive (if at times a bit annoying, since it is the cause of many of the much-maligned quirks in their software that people love poking fun at), and it is not something I had fully appreciated until I read about these kinds of tradeoffs.
Having said that, it's easy to see why many aspiring C/C++ developers in the mid-90s and early 2000s may well have been turned away from programming, either in those languages or in any language at all. Put simply, seeing pages of stuff with __ALLCAPS_NAMES_WITH_RANDOM___UNDERSCORES, frequent references to "HWND hWnd", 0x000008's, and talk of several layers of interacting ISomethingAccessObject types mixed together inside some flexible buffer format or other is not exactly a friendly introduction to the matter. Even to a rather experienced C developer, and even though beneath the skin it's still just the plain old stuff we do every day in some form or other, this stuff looks very scary when it's presented this way.
Besides the aesthetic issues, the more concerning issue is how exactly we go about hacking extensions into the system. And it is very much hacky hacky hacky... you know things are fishy when everyone talks about "hooking" APIs, "injecting" code and handlers into places they shouldn't really be, and generally "intercepting events". These seem to be necessary evils: they give extension developers the flexibility and freedom to build some of the very extensions I rely on to keep Windows in a usable state. Still, knowing what I now know about malware (a year on from dealing with an outbreak), it's decidedly scary how easily a determined and creative malware author could cause some really serious damage by hooking into things.
It is perhaps for this reason that I believe it is absolutely essential that such plugins be open sourced (as should be the case for most software in general, but here it is especially important). That may be so that security-conscious users can inspect the code, verify that it doesn't have any obvious hidden nasties, and compile working builds from it themselves to be safe. Or it may simply be a safeguard for the wider user community, should the developer lose interest in his/her project down the track and an update break the old code.
Anyways, here are some useful resources:
- UI Spy = This (now discontinued) utility from Microsoft is an invaluable gem for digging around the UI widget hierarchy of Windows GUIs, and for finding out what parameters are available for playing around with. Thanks to my former HCI Lab colleague (and now Sydney Googler) Joey Scarr for introducing me to this tool and sending me a copy.
- pywin32 and pyhook - These two packages for Python are invaluable resources for getting access to things in Windows. While pywin32 (accessed as win32gui) provides wrappers for common core Windows APIs (which are useful for querying open windows, and stuff like that), pyhook is useful for listening in on the events (clicks and keypresses) that users make. (NOTE: the link to pyhook here is NOT to the official binaries on SourceForge. If you're on a 64-bit system, the binaries provided on SF are outdated and only work with the 32-bit version of Python, so the installer will fail when you try to install the SF version.)
- Suzanne Tak's PhD thesis from 2011 - "Understanding and supporting window switching" - covers some useful techniques for monitoring window usage using Python (and the libraries above). See in particular pages 41-43 for the discussion on "PyLogger".
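To give a feel for the kind of event listening pyhook allows, here is a minimal sketch, assuming the package imports as pyHook and that pythoncom (from pywin32) is available to run the message loop. The names record_event, start_logging, and event_log are made up for this example, and this is not the "PyLogger" from the thesis.

```python
try:
    import pyHook      # Windows only
    import pythoncom   # ships with pywin32
except ImportError:
    pyHook = pythoncom = None

# In-memory log of (event type, title of the window that received it)
event_log = []


def record_event(event):
    """Handler called by pyHook for each hooked keyboard/mouse event."""
    event_log.append((event.MessageName, event.WindowName))
    # Returning True passes the event on to the rest of the system;
    # returning False would swallow it.
    return True


def start_logging():
    """Install global hooks and block in the Windows message loop."""
    manager = pyHook.HookManager()
    manager.KeyDown = record_event
    manager.MouseAllButtonsDown = record_event
    manager.HookKeyboard()
    manager.HookMouse()
    pythoncom.PumpMessages()  # runs until the process receives WM_QUIT
```

Note that start_logging() does not return while the hooks are active, so a real logger would need to write event_log out to disk from within the handler or from another thread.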
Some Methods and Quirks
- win32gui.EnumWindows(callbackFn, windowList) is a great way of checking what "windows" currently exist. It calls callbackFn(hwnd, windowList) once for each top-level window. You'll want to filter these in callbackFn() first by calling win32gui.IsWindowVisible() to ensure you only get "visible windows" in the traditional sense. Be warned though that even then, you'll likely encounter many strange things when doing this (see next point).
- It turns out many different things are "windows" (now the full extent of the product's name becomes even clearer than before):
1) The Desktop (aka "Program Manager" / "Progman" - this brings back fond memories of playing around with Win3.1 in Dad's office) is the bottom-most "window"
2) Any custom start-menu replacement you may have running will also have introduced some windows of its own. For example, Classic Shell creates a "ClassicShell.CStartButton" window which is always the topmost window. Since EnumWindows goes from top-most to bottom-most window in terms of Z-order, this may cause issues when you're trying to identify the active window ;)
3) The notifications area also seems to have its own window, though I'm not sure whether that's just 7+ Taskbar Tweaker's handiwork (as I've enabled scrollwheel over clock = volume shortcut)
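Putting the two points above together, here is a minimal sketch of enumerating visible windows while skipping the shell "windows" just mentioned. The function name, the gui parameter (there only so the function can be tried against a stand-in object on non-Windows machines), and the choice of which classes to skip are all made up for this example. The skip list only contains the two class names mentioned above, so you'll likely need to extend it for your own setup.

```python
try:
    import win32gui  # part of the pywin32 package; only available on Windows
except ImportError:
    win32gui = None

# Shell-related class names mentioned above; extend as needed
SHELL_CLASSES = frozenset(["Progman", "ClassicShell.CStartButton"])


def list_visible_windows(gui=None, skip_classes=SHELL_CLASSES):
    """Return (hwnd, class name, title) for each visible top-level window."""
    gui = gui or win32gui
    found = []

    def on_window(hwnd, results):
        # EnumWindows calls this once per top-level window
        if gui.IsWindowVisible(hwnd):
            class_name = gui.GetClassName(hwnd)
            if class_name not in skip_classes:
                results.append((hwnd, class_name, gui.GetWindowText(hwnd)))
        return True  # keep enumerating

    gui.EnumWindows(on_window, found)
    return found
```

On Windows with pywin32 installed, list_visible_windows() with no arguments uses win32gui directly. The first entry in the result is then a better guess at the "real" topmost window than the first raw EnumWindows hit.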