Hi, I'm a bug!

By Serdar Yegulalp | 2015/10/07 18:26

One of the ongoing projects at this stage of MeTal is refactoring a bunch of modules that all used to be in one file -- breaking them out into separate files for easier maintenance (and perhaps even a very modest performance boost, but I wasn't banking on that). When this broke something horribly, it made for an object lesson in how to pay attention to the consequences of moving things around.

When MeTal receives a request, it's handled by the routes.py file. The functions there import whichever library in the ui namespace the request is destined for, then hands it off to a function from there. The ui functions are all wrapped in a @transaction, so if anything fails the commits to the database are rolled cleanly back.

I screwed up during the refactoring, and ended up with a sequence of events where, on a save, a transaction-wrapped event called another transaction-wrapped event. End result was a vital operation was never committed, and so the save itself never actually completed. Irony: the client got a 500 error, but in the context of a success message -- so my frontend didn't know how to properly interpret the results, either. It reported a successful save, even when no such thing had in fact happened. I lost an hour's worth of work because of this.

The offending event handled the deletion of files associated with page previews, and it needed exception handling to avoid squawking whenever it tried to delete a file that didn't exist in the first place. I spent a good twenty minutes messing with the exception handling, only to realize I was not calling the actual page-preview-delete event from the context of another function, but its user-interface handler ... hence the blowup.

After much "argh" and "yargh" and hair-pulling, I eventually tracked down the offending sequence of events, and reworked things so that only one transactional event was called during that sequence. It wasn't hard; in fact, it helped further reinforce separation of concern in my code, and made it easier to call the preview-delete function elsewhere without shooting myself in the foot.

Things are fine now. At least, until the next refactor. And yes, I know, a full array of unit tests would most likely have caught this problem; if you want to hassle me about that, take a number and get in line.

Tags: bug refactoring

comments powered by Disqus