Sometimes performance isn't what you think it is

By Serdar Yegulalp | 2016/12/03 08:00

Sorry I haven't been posting much. Real life has kept me a lot busier than I anticipated, but work is continuing on this project. Just very slowly. Recently, though, I had an experience worth sharing.

One of the things I recently added to the internal development branch of Mercury is a quick way to back up and restore the SQLite database associated with a given install of Mercury. It's nothing complicated -- it just makes a copy of the database file, and then copies it back when needed. The problem was, the copy operation in either direction was staggeringly slow. It took almost a minute for the server to return a response!

At first I thought this was a documented problem with the shutil.copyfileobj utility in Python. If you don't explicitly set a buffer size for the copy operation, it can take a long time to complete, as the default buffer size is quite small by design. But even with a buffer size of -1 (e.g., "use all the memory you can"), it was still taking forever. What was wrong?

Then I realized something: I had never explicitly closed the connections to the database before attempting the file operations. I'd assumed that the context managers for the database handler were doing that, but I had never fully tested this assumption.

I added a db.close() operation before the backup routine, and BAM! The whole thing ran in a fraction of a second. In other words, it wasn't a performance issue at all, but a resource contention issue, the possibility of which I had arrogantly ignored.

I'll be posting another update in December with, I hope, some more meat on its bones.

(Updated: Turns out the "fraction of a second" save time was an artifact of disk caching. The actual save time is more like 2-3 seconds. That said, clearing the lock was definitely the source of the majority of slowness.)

(Updated yet again, 2016/12/23: The 2-3 seconds wasn't in fact the save time! It was the time required to spin up an entirely new instance of the Python interpreter. When I performed the save on a "pre-warmed" copy of the interpreter, it saved in a fraction of a second.)

Tags: Python SQLite database performance real life

comments powered by Disqus