Yesterday, I ended up wasting quite a bit of time trying to debug various deadlock/freeze bugs that showed up intermittently on one of my test machines but not on any other machine I was using. In these cases, the main window would occasionally freeze (and just show a blank grey screen) while loading.
While trying to reproduce these problems by going through and clamping down on any places where threading issues might have arisen, I happened to notice a far worse bug. All of the widgets on the main window (scrollbars, tabs, sliders) were non-responsive, with clicks in certain places causing non-recoverable glitches. Yet, at the same time, there were a few widgets (toolbar buttons, menu items) which still worked! To make matters even weirder, all the controls on a secondary window worked flawlessly throughout.
After hours of fruitlessly retracing the things I had believed were problematic, commenting out code added just before I noticed the bug, and checking out different revisions, I finally narrowed it down to a particular commit that was the source of the problems. The culprit, I eventually found, was in the least expected place!
Custom Widgets and Deadlocks...
I had written a custom widget using gtk.DrawingArea as a base. This custom widget was included as part of a layout that went on a "Notebook" tab thingy (*). The Notebook was being used to emulate the "CardLayout" from Java Swing (with no tabs shown, and no border either); the tab that the custom widget went on was a secondary tab that got shown "later" in response to some other actions taking place elsewhere. The layout manager hosting the Notebook had a show_all() call when everything was in place.
(*) TBH, I can't understand why they chose such a strange name for this, though then again, there's always confusion about what to call the widget which hosts several tabs, the thingies which let you select a tab, and whether a tab includes the content area too. For clarity, I usually refer to them as TabHost, TabLabel/TabPane, and Tab.
As it turns out, if you call set_size_request() on the custom widget, then this will result in a deadlock situation whereby all the other widgets which happen to be on the same screen layout as that widget will end up not taking any events, thus resulting in everything not responding. I haven't been able to find anything else about this problem, but for now, I've just commented out this call to get things working fine, albeit the widget draws a bit too large or small at times.
Crash by Assertion
Another one of the problems I've been running into is this cryptic error:
Gdk:ERROR:gdkregion-generic.c:1114:miUnionNonO: assertion failed: (r->x1 < r->x2)
Apparently this is quite infamous in a few other GTK apps, in particular some of the File-A-Bug systems. Although I haven't been able to pinpoint exactly when this occurs, it seems to happen a lot when I have some thread continuously updating some UI widgets, and then I put some other windows (e.g. the console associated with the app) partially covering it for a few seconds.
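Since the crash seems tied to a thread continuously updating widgets, one pattern worth trying is to keep the worker thread away from the widgets entirely and have it post messages that the main loop applies. The following is only a sketch under assumptions: it uses nothing but the standard library (with Python 3 module names), `FakeLabel`, `worker`, `drain` and `run_demo` are hypothetical names, and whether this actually avoids the assertion above is untested. In a PyGTK app, `drain` would presumably be scheduled on the main loop with something like `gobject.timeout_add()` or `gobject.idle_add()`, which keep calling a callback for as long as it returns True.

```python
import queue
import threading


class FakeLabel:
    """Hypothetical stand-in for a real widget such as a label."""

    def __init__(self):
        self.text = ""
        self.owner_thread = threading.current_thread()

    def set_text(self, text):
        # Refuse access from any thread other than the one that created us.
        if threading.current_thread() is not self.owner_thread:
            raise RuntimeError("widget touched from a non-UI thread")
        self.text = text


def worker(updates, count):
    """Background job: never touches widgets, only posts messages."""
    for i in range(count):
        updates.put("step %d" % i)
    updates.put(None)  # sentinel: no more updates will follow


def drain(updates, label):
    """Apply pending updates on the UI thread.

    Returns True if it should be called again later, False once the
    sentinel has been seen.
    """
    while True:
        try:
            item = updates.get_nowait()
        except queue.Empty:
            return True
        if item is None:
            return False
        label.set_text(item)


def run_demo(count=5):
    label = FakeLabel()
    updates = queue.Queue()
    thread = threading.Thread(target=worker, args=(updates, count))
    thread.start()
    thread.join()
    # A real main loop would call drain() periodically instead of looping here.
    while drain(updates, label):
        pass
    return label.text
```

The point of the design is that every widget call happens on one thread, so redraws triggered by covering and uncovering the window never race with an update coming from elsewhere.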
I've said it many times, but software released for people to use should NEVER result in "crash by assertion". FFMPEG used to (don't know about now, since I haven't needed to do another FFMPEG Codec+Container dance to find a working combo for a while now) result in "crash by assertion" whereby some obscure MSVC "Assertion Failure" error msgbox would pop up when trying to use certain codec/container combos.
I understand that some people (especially academics who are really really into pre/post-conditions and formal proofs of everything) consider assertions beneficial. Indeed, they probably do have uses, but only during the development phase on the particular software developer's computer. For libraries used by other applications, this means that assertions should NOT cause problems and/or crashes, even if the situation is impossibly convoluted. However, for some strange reason, I've found that assertions are never really isolated to "debug" builds only, as some manuals and specs would have you believe.
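To illustrate the difference, here is a minimal sketch of the two styles. It is not GDK's actual code: the one-dimensional "span" and both function names are hypothetical, chosen only to mirror the `x1 < x2` condition from the error message above.

```python
def union_span_asserting(a, b):
    """Asserting style: a degenerate input takes the whole caller down."""
    for x1, x2 in (a, b):
        assert x1 < x2, "degenerate span"
    return (min(a[0], b[0]), max(a[1], b[1]))


def union_span_tolerant(a, b):
    """Tolerant style: ignore degenerate spans instead of aborting.

    Returns None when neither input has any extent.
    """
    valid = [(x1, x2) for x1, x2 in (a, b) if x1 < x2]
    if not valid:
        return None
    return (min(s[0] for s in valid), max(s[1] for s in valid))
```

In Python the `assert` statements are stripped when the interpreter runs with `-O`, which is the language's version of "debug builds only"; the tolerant variant behaves the same either way.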
Another bug I've been fighting with in this toolkit is the application not shutting down properly. Namely, the console doesn't go away, even though nothing appears in it and there is zero CPU activity after gtk.main_quit() and all other shutdown activities have been performed.
Somehow, I managed to get this to work eventually on the lab octocore I was using for a while, but this didn't hold for my own machine or the test machines this stuff needs to work on. Gah!
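One common cause of this symptom in threaded Python programs, which may or may not be what is happening here, is a non-daemon thread that is still alive: the interpreter waits for it even after the main loop has returned. The sketch below uses only the standard library; `Updater` and `lingering_threads` are hypothetical names. It shows a worker that can be asked to stop, plus a helper for listing whatever is still keeping the process alive.

```python
import threading


class Updater(threading.Thread):
    """Hypothetical background updater that can be asked to stop."""

    def __init__(self, interval=0.01):
        super().__init__()
        self._stop_requested = threading.Event()
        self._interval = interval
        self.ticks = 0

    def run(self):
        # Event.wait() doubles as an interruptible sleep: it returns True
        # as soon as stop() sets the flag, ending the loop promptly.
        while not self._stop_requested.wait(self._interval):
            self.ticks += 1

    def stop(self, timeout=1.0):
        """Request a stop and wait; returns True if the thread has exited."""
        self._stop_requested.set()
        self.join(timeout)
        return not self.is_alive()


def lingering_threads():
    """Names of non-daemon threads (other than main) that are still alive."""
    main = threading.main_thread()
    return [t.name for t in threading.enumerate()
            if t is not main and not t.daemon]
```

Calling `stop()` on each worker right after the main loop returns, and then printing `lingering_threads()`, should at least show whether a leftover thread is what keeps the console open.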
I've recently been dealing with quite a few really insane, non-deterministic, and difficult-to-pinpoint concurrency bugs. If anyone knows of any good static analysers for Python code which may be able to plot these things out graphically (and preferably highlight the problem spots), then I'd be interested in hearing about them!