Last weekend I asked myself: “How fast is zeitgeist, and can we make it even faster?” That turned out to be too general a question: zeitgeist has various places where performance matters, so I decided to start with some very basic FindEvents queries.
To get a first impression of how fast some commonly used queries are, I wrote a small benchmarking tool that collects timings and can also produce some nice plots.
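The timing part of such a tool can be sketched in a few lines of Python. This is only a minimal illustration, not the actual benchmarking tool: `benchmark` and `sample_query` are hypothetical names, and the placeholder workload stands in for a real FindEvents call.

```python
import time

def benchmark(func, repeat=5):
    """Run func() `repeat` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to background noise than the mean.
    return min(timings)

def sample_query():
    # Placeholder workload (assumption); a real run would issue a FindEvents query here.
    return sum(range(10000))

best = benchmark(sample_query)
print("best of 5: %.3f ms" % (best * 1000))
```

Taking the best of several runs is a design choice; for plots it may be just as reasonable to keep all timings and show the spread.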
The first plot gave me an overview: the speed of these queries varies from a few milliseconds to over half a second. As you can see, the slower queries all have a red border around their bar, which means that we are not using our SQL indices for them. So the first step of this optimization story was to change the queries so that they use the index they should.
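Whether a query uses an index can be checked directly in the database. The sketch below assumes an SQLite backend and uses a made-up, simplified `event` table with a hypothetical `event_actor` index; it is not zeitgeist's real schema, and the two queries are only examples of an index-friendly and an index-hostile condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for illustration only, not zeitgeist's real one.
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, timestamp INTEGER, actor TEXT)")
conn.execute("CREATE INDEX event_actor ON event (actor)")

def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

# Plain comparison on the indexed column: SQLite can search the index.
plan_indexed = query_plan("SELECT * FROM event WHERE actor = ?", ("firefox",))
# Wrapping the column in an expression hides it from the index: full table scan.
plan_scan = query_plan("SELECT * FROM event WHERE lower(actor) = ?", ("firefox",))

print(plan_indexed)
print(plan_scan)
```

A plan that mentions the index corresponds to a bar without a red border in the plot; a plain scan corresponds to one of the slow, red-bordered bars.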
And voilà, since yesterday these queries are several times faster, as you can see in this plot. The yellow series shows the same data as the first plot, and the additional series in cyan shows how fast the same queries are after this first optimization step. Pretty impressive.
But we can do even better! Until now I have looked exclusively at the class of queries where the timerange argument is “TimeRange.always()”, which is already optimized. So my next question was: “What happens if we do not query over the whole period of time, but only over a random interval?” To understand the next plot you have to know that all events in my sample activity log (which contains 50000 events) have a timestamp greater than 0 and lower than 50000, so ‘TimeRange.always()’ and the interval ‘(1, 60000)’ return the same result. The plot is a bit harder to read: each pair of a yellow and a cyan bar describes the same kind of query, using the same codebase. The only difference is that the yellow bars use a concrete time interval, whereas the cyan ones use the already optimized ‘TimeRange.always()’, and remember, both return the same results. As you can see, ‘TimeRange.always()’ is up to three times faster!
But I already have a fix. Take a look at this one: the yellow and purple bars are the same as in the last plot, and the cyan series shows the upcoming optimization, which will hopefully land in zeitgeist soonish. Querying on random time intervals will then be roughly as fast as querying over the complete time period.
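The claim that both kinds of query return the same events is easy to reproduce on a toy data set. The following sketch assumes an SQLite backend and a made-up, simplified `event` table filled with 50000 synthetic rows whose timestamps lie between 1 and 49999; the table, column names and actors are assumptions for illustration and say nothing about how the actual fix works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified event table; the real schema is more involved.
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, timestamp INTEGER, actor TEXT)")
actors = ["firefox", "gedit", "nautilus"]
# 50000 synthetic events, all with 0 < timestamp < 50000.
rows = [(1 + i % 49999, actors[i % 3]) for i in range(50000)]
conn.executemany("INSERT INTO event (timestamp, actor) VALUES (?, ?)", rows)

# Query over all time: no timestamp condition at all.
always = conn.execute(
    "SELECT id FROM event WHERE actor = ? ORDER BY id", ("gedit",)
).fetchall()

# Query over the interval (1, 60000), which happens to cover every event.
interval = conn.execute(
    "SELECT id FROM event WHERE actor = ? AND timestamp >= ? AND timestamp <= ? ORDER BY id",
    ("gedit", 1, 60000),
).fetchall()

print(len(always), always == interval)
```

The extra timestamp conditions do not change the result here, but they do give the database more work to plan and evaluate, which is where a difference in speed between the two variants can come from.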