Earlier this week I got MeTal running on my web host by way of FastCGI -- actually WSGI with a FastCGI adapter. It runs much faster, of course, compared to its original CGI deployment, but that in turn exposed me to some bugs that revolve around what happens when multiple people contend for the database. In short, it creates an error cascade that I had to unravel.
Whenever someone hits "publish," all the pages for a given job are pushed to the queue and handled in batches of fifty. During these batches, the database is locked to keep inconsistencies from creeping in -- but it also means that during a batch run, there are long stretches of time when nobody else can make changes to the database.
This gets to be a problem if you have many people all trying to publish changes at once, all with permissions high enough that they can run an instance of the queue.
One of the short-term workarounds I created was simple enough: a timeout value for queue batch runs. When a queue has processed fifty items or has been running for more than five seconds, it relinquishes the lock and lets someone else in.
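Here's a minimal sketch of that kind of bounded batch run. It is not MeTal's actual code: `run_batch`, `publish`, and `db_lock` are hypothetical names, and a plain `threading.Lock` stands in for the database lock. Only the fifty-item and five-second limits come from the description above.

```python
import threading
import time
from collections import deque

BATCH_SIZE = 50     # items per batch run (from the post)
RUN_TIMEOUT = 5.0   # seconds per batch run (from the post)

# Hypothetical stand-in for the database lock.
db_lock = threading.Lock()


def run_batch(queue, publish, batch_size=BATCH_SIZE, run_timeout=RUN_TIMEOUT):
    """Process queued items until the batch is full or time runs out.

    Returns the number of items handled in this run.
    """
    processed = 0
    with db_lock:  # the lock is held only for the length of this run
        started = time.monotonic()
        while queue and processed < batch_size:
            if time.monotonic() - started > run_timeout:
                break  # out of time: stop and let someone else in
            publish(queue.popleft())
            processed += 1
    return processed
```

Calling `run_batch(deque(pages), publish_page)` in a loop until the queue is empty would work through a job one bounded run at a time. Note that the clock is only checked between items, so a single slow page can still push a run past the limit.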
This isn't as much of a problem when:
Ideally, if you have a lot of people working at once, you should go with the last two scenarios -- give actual publish permissions to a small number of people who can push jobs as needed, or leave the whole thing to the publishing queue exclusively.
Another amelioration: the maximum timeout for the queue run iterations is 5 seconds. The maximum timeout for a database lock is 10 seconds. This way, there's a nice margin of time to wait for database locks to lift before reporting a lock error.
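To make the margin concrete, here's a rough sketch of the waiting side, again with a `threading.Lock` standing in for the database lock. `write_with_lock` and `LockError` are made-up names for illustration; only the 5-second and 10-second figures come from the setup described above.

```python
import threading
import time

RUN_TIMEOUT = 5.0    # longest a queue run should hold the lock (from the post)
LOCK_TIMEOUT = 10.0  # longest anyone waits for the lock (from the post)


class LockError(Exception):
    """Hypothetical error raised when the lock never comes free."""


def write_with_lock(lock, action, lock_timeout=LOCK_TIMEOUT):
    """Run action() under the lock, waiting at most lock_timeout seconds."""
    # Lock.acquire() returns False if the timeout expires first.
    if not lock.acquire(timeout=lock_timeout):
        raise LockError("database still locked after %.1f s" % lock_timeout)
    try:
        return action()
    finally:
        lock.release()
```

Because a queue run is supposed to let go after about `RUN_TIMEOUT` seconds, a writer willing to wait `LOCK_TIMEOUT` seconds should normally get in before it has to report a lock error.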
Yet another problem is what happens if we need to kill and restart the FastCGI process, for instance to apply changes to the application. I've worked out a way to do it that ought to behave even when we have multiple instances side-by-side, but we'll see.
Finally, I'll be adding options at some point to limit the ways multiple instances of the queue can be run. I don't have the specifics yet, but the idea will be to ensure only one queue runner is active anywhere at any time, or at the very least to ensure the queue runner is launched from within the editor only when it won't create problems anywhere else.