Time bombs

By Serdar Yegulalp | 2016/07/11 08:00


One of the truly demoralizing experiences of working on a project like MeTal is realizing that an approach you've taken has serious performance implications further down the road. In this case, it's my current logic for creating template mappings.

When you create a template, it has one or more mappings that describe how files will be built from a page using that template. For instance, a mapping like '/pages/{}/'.format(page.basename) will write a file to /pages/my-page/ for a page with the basename my-page. The resulting path is stored for each page in a fileinfo table.

Now the bad news. Whenever we create or revise a mapping, we have to iterate through every single page in the system, build its fileinfos, and make sure none of them collide. As far as I can tell, there's no way to guarantee that you aren't going to get a fileinfo collision unless you do this every time a mapping is created or changed.
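As a rough sketch of what that full scan amounts to: the Page class, the find_collisions function, and the idea of storing mappings as format strings with named fields are all assumptions made for illustration, not MeTal's actual schema or code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical stand-in for a page record; not MeTal's real model.
    id: int
    basename: str
    primary_category: str


def find_collisions(pages, mappings):
    """Return {path: [(page_id, mapping), ...]} for paths claimed more than once."""
    claims = defaultdict(list)
    for page in pages:
        for mapping in mappings:
            # Assumption: a mapping is a format string with named fields.
            path = mapping.format(page=page)
            claims[path].append((page.id, mapping))
    return {path: owners for path, owners in claims.items() if len(owners) > 1}


pages = [Page(1, "my-page", "news"), Page(2, "other-page", "news")]
mappings = ["/pages/{page.basename}/"]
print(find_collisions(pages, mappings))  # {} (no path is claimed twice)
```

The work is proportional to pages times mappings, which is exactly why this hurts once a blog has thousands of pages.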

This isn't so bad if you're working on a blog with a small number of pages, or if you just work with a canned set of templates. But what if you're working with a blog that has thousands of pages, as I'm about to?

Consider these two mappings:

'/{}/{}'.format(page.primary_category,page.basename)

'/{}/{}'.format(page.basename,page.primary_category)

Now imagine we have two pages:

Page 1: Primary category = Me, basename = You

Page 2: Primary category = You, basename = Me

Publish both of those pages using both of those mappings, and each page's path under one mapping is the other page's path under the other mapping: /Me/You and /You/Me each get claimed twice. And the only way to tell whether or not that will happen is to build fileinfos for every single page from those mappings.
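Here is that example worked through in a few lines. The Page namedtuple is a hypothetical stand-in for a real page object; the two format expressions are the ones given above.

```python
from collections import namedtuple

# Hypothetical stand-in for a page object with just the two fields we need.
Page = namedtuple("Page", ["primary_category", "basename"])

mappings = [
    lambda page: '/{}/{}'.format(page.primary_category, page.basename),
    lambda page: '/{}/{}'.format(page.basename, page.primary_category),
]
pages = [Page("Me", "You"), Page("You", "Me")]

# Build every path: each page through each mapping.
paths = [mapping(page) for page in pages for mapping in mappings]
print(paths)            # ['/Me/You', '/You/Me', '/You/Me', '/Me/You']
print(len(set(paths)))  # 2
```

Four fileinfos, but only two distinct paths. Neither mapping is wrong on its own, and neither page is wrong on its own; the collision only shows up when the data and the mappings meet.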

One thing I've already done to ameliorate all this is to skip rebuilding the mappings when only the template itself, and not its mapping, has changed. That way you can set up the mapping once, then make as many changes as you like to the actual content of the template without triggering a rebuild. Smart move.

My next avenue of amelioration is to make those operations queueable, so they can be processed a little at a time instead of timing out. (An important consideration on shared hosting.)
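A minimal sketch of the batching idea, under some loud assumptions: an in-memory deque stands in for whatever persistent job queue would really be used, build_path is a hypothetical callable that turns a page ID into its output path, and the seen dict stands in for the fileinfo table.

```python
from collections import deque


def enqueue_batches(page_ids, batch_size):
    """Split the page IDs into fixed-size batches, one queue job per batch."""
    queue = deque()
    for start in range(0, len(page_ids), batch_size):
        queue.append(page_ids[start:start + batch_size])
    return queue


def process_next(queue, build_path, seen):
    """Handle one batch and return the collisions found in it.

    seen maps path -> ID of the first page that claimed it, and must
    persist between calls so collisions across batches are caught.
    """
    if not queue:
        return []
    collisions = []
    for page_id in queue.popleft():
        path = build_path(page_id)
        owner = seen.setdefault(path, page_id)
        if owner != page_id:
            collisions.append((path, owner, page_id))
    return collisions


queue = enqueue_batches([1, 2, 3, 4, 5], batch_size=2)
seen = {}
build_path = lambda page_id: '/p/{}'.format(page_id % 3)  # contrived, to force clashes
while queue:
    print(process_next(queue, build_path, seen))
```

Each call does a bounded amount of work, so one request never has to walk the whole site. The catch is that a collision may not surface until a later batch.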

Another possibility is to defer the creation of a fileinfo for a page as long as possible -- for instance, until the page object itself is accessed in some form. This seems like a good idea, but it has the bad side effect of pushing any potential fileinfo collisions further down the road, and possibly into the lap of the user rather than the designer. In other words, if there's a template that has some subtle problem in this vein, you won't know about it until you try to publish a page that has the offending data. And if you're not a designer, you're screwed.

Now, some of this is clearly the designer's responsibility. To that end, what I might do is provide them with tools that allow all fileinfos for a given mapping to be checked on demand, as a way to determine if a given mapping holds up.
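Such a tool could be little more than a dry run: build the paths a mapping would produce, compare them against what is already stored, and write nothing. In this sketch the check_mapping function, the callable form of the mapping, and the dict-based pages are all hypothetical.

```python
def check_mapping(mapping, pages, existing_paths):
    """Dry run: report collisions a mapping would cause, storing nothing.

    mapping        -- callable taking a page and returning a path (assumed form)
    pages          -- iterable of page objects
    existing_paths -- set of paths already present in the fileinfo table
    """
    seen = {}
    problems = []
    for page in pages:
        path = mapping(page)
        if path in existing_paths:
            problems.append((path, page, "already in fileinfo table"))
        elif path in seen:
            problems.append((path, page, "duplicates an earlier page"))
        else:
            seen[path] = page
    return problems


pages = [{"basename": "about"}, {"basename": "contact"}, {"basename": "about"}]
existing = {"/pages/contact/"}
mapping = lambda page: "/pages/{}/".format(page["basename"])

for path, page, reason in check_mapping(mapping, pages, existing):
    print(path, reason)
```

Because nothing is written, the designer can run it as often as they like while tweaking a mapping.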

But for now, this seems like the best -- or rather, least worst -- approach.

I also plan to make sure the canned templates have little to no chance of having basename collisions.

Yet another possibility, although one that seems impossible to implement effectively, is to have each mapping "fuzzed": checked against a contrived set of data to make sure it doesn't collide with other mappings. This feels so hopelessly out of my league that I'm not even sure I could pull it off with my programming chops.
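For what it's worth, a crude version of the idea might look like the sketch below. It assumes mappings are callables that only read primary_category and basename, and it builds contrived pages from every combination of a tiny pool of values. FakePage and fuzz_mappings are hypothetical names. Passing this check proves nothing in general; it can only catch collisions that show up within the contrived pool.

```python
from collections import namedtuple
from itertools import product

# Hypothetical contrived page with only the fields the mappings read.
FakePage = namedtuple("FakePage", ["primary_category", "basename"])


def fuzz_mappings(mappings, values=("a", "b")):
    """Return (path, first_claim, second_claim) for the first collision, else None."""
    contrived = [FakePage(cat, base) for cat, base in product(values, repeat=2)]
    seen = {}
    for page in contrived:
        for index, mapping in enumerate(mappings):
            path = mapping(page)
            if path in seen:
                return path, seen[path], (page, index)
            seen[path] = (page, index)
    return None


risky = [
    lambda page: '/{}/{}'.format(page.primary_category, page.basename),
    lambda page: '/{}/{}'.format(page.basename, page.primary_category),
]
safer = [
    lambda page: '/by-category/{}/{}'.format(page.primary_category, page.basename),
    lambda page: '/by-name/{}/{}'.format(page.basename, page.primary_category),
]
print(fuzz_mappings(risky) is not None)  # True
print(fuzz_mappings(safer))              # None
```

The hard part, and the reason this remains only a sketch, is choosing contrived data that is actually representative of what real pages will contain.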

What I hate most about problems like this is that they have no universally satisfactory solution. Each solution comes at the cost of performance or safety in another area. But not thinking about them is, well, unthinkable.


Tags: bugs hard problems performance templates

