On July 4th we wrote a post about our new software robot called Ocebot (a combination of the words Ocelot and Bot). And today we'd like to give you some insight into what we discovered as we pore over the past 10 days of data since that post.
Before we get into the data though, let's just run through the kinds of things Ocebot has been doing (a rough sketch of this kind of polling loop follows the list).
- Querying the API around once a minute for 10 days straight
- Making proxy only and VPN requests
- Making queries it already knows the answer to
- Making malformed queries to see how the API responds
- Forcing the server to record detailed server-side analytics when answering Ocebot queries
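Here's that sketch: a simple Python loop that queries once a minute, mixes proxy-only and VPN checks against an address it already knows the answer to, throws in a malformed query, and times each response. The endpoint URL, query parameters and test addresses are illustrative stand-ins rather than Ocebot's actual code.

```python
"""A minimal Ocebot-style monitoring loop (illustrative only)."""
import time
import requests

API_URL = "https://proxycheck.io/v2/"   # query endpoint (placeholder path)
KNOWN_PROXY = "198.51.100.7"            # an address we already know the answer for
MALFORMED = "not-an-ip-address"         # deliberately invalid input

def timed_query(address, **params):
    """Send one query and return (status code, body, elapsed milliseconds)."""
    start = time.perf_counter()
    response = requests.get(API_URL + address, params=params, timeout=10)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return response.status_code, response.json(), elapsed_ms

while True:
    # Proxy-only and VPN-inclusive checks against a known answer.
    print(timed_query(KNOWN_PROXY, vpn=0))
    print(timed_query(KNOWN_PROXY, vpn=1))

    # A malformed query to see how the API responds to bad input.
    try:
        print(timed_query(MALFORMED))
    except ValueError:
        pass  # a non-JSON error response is still a useful data point

    time.sleep(60)  # roughly one query cycle per minute
```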
So from this we've gleaned a few things. Firstly, the response time of the API is excellent on a positive detection, with most queries being answered in under 50ms including network overhead. For negative detections (meaning every single level of check is performed) the average time is 250ms, again including network overhead but without TLS turned on.
The second thing we found is that the response times are very consistent: our averages aren't changing throughout the day, and we're not seeing much difference between our nodes in the time they take to answer a query. That's a good thing, as slow nodes would create inconsistency for our customers.
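As an illustration of the kind of per-node comparison this implies, here's a small sketch assuming Ocebot simply records an elapsed time for each node it queries. The node names and figures below are made up.

```python
# Compare per-node response times; a node whose mean drifts well above the
# others would indicate the slow-node inconsistency we're checking for.
from statistics import mean, pstdev

samples = {
    "node-a": [46.1, 48.3, 44.9, 47.2],   # elapsed milliseconds per query
    "node-b": [45.7, 49.0, 46.4, 45.1],
}

for node, timings in samples.items():
    print(f"{node}: mean {mean(timings):.1f}ms, stdev {pstdev(timings):.1f}ms")
```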
The third thing we found was some edge cases in our code that could create a high-latency response due to the logging of errors. We're only talking in the millisecond range here, but when we're trying to give responses as fast as possible every millisecond counts.
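One common way to keep error logging out of the hot path is to hand log records to a background thread. The sketch below uses Python's standard logging queue helpers; it isn't necessarily how we fixed these particular edge cases, just an example of the idea.

```python
# The request-handling code only enqueues the record, while a listener
# thread does the slower formatting and file I/O elsewhere.
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.Queue(-1)

logger = logging.getLogger("api")
logger.addHandler(QueueHandler(log_queue))
logger.propagate = False

listener = QueueListener(log_queue, logging.FileHandler("errors.log"))
listener.start()

logger.error("malformed query received")  # returns quickly; writing happens off-path
```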
The fourth thing we found was some optimisations to our cluster database syncing system. Through the server-side analytics we were able to discover high CPU usage caused by the encryption of data that is to be synced to our other nodes in the cluster. Essentially, before we send any data to another node in the cluster through our persistent machine-to-machine data tunnel, we encrypt it with AES256.
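Here's a minimal sketch of that encrypt-before-sync step. AES256 is the only detail given above, so the GCM mode, the Python library and the payload shape below are assumptions made purely for illustration.

```python
# Encrypt a single sync payload with AES-256-GCM before it goes down the tunnel.
import os
import json
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(256)   # in practice a shared cluster key, not generated per run
aead = AESGCM(key)

update = {"table": "query_stats", "row": 1234, "delta": {"queries": 57}}
plaintext = json.dumps(update).encode()

nonce = os.urandom(12)           # must be unique per message
ciphertext = aead.encrypt(nonce, plaintext, None)

# The receiving node decrypts with the same key and nonce.
assert json.loads(aead.decrypt(nonce, ciphertext, None)) == update
```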
This can be CPU intensive if the data being transferred is always changing and thus requires lots of database updates to other nodes. By looking at the Ocebot data we could see there were a lot of things being synced that didn't need to be: lots of high-activity data alterations that are only really important to the machine handling your API query and are not needed by the other nodes in the cluster.
And so what we've done is move some of this data to a local cache on the node handling the request, whenever the data isn't ever going to be needed by another node.
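In code terms the change looks something like the sketch below. The key names and the prefix convention are hypothetical; the point is simply that node-local data never enters the sync queue at all.

```python
# Writes whose keys are only meaningful to the node answering the query stay
# in a local cache instead of being queued for encrypted cluster syncing.
LOCAL_ONLY_PREFIXES = ("ratelimit:", "inflight:")   # per-node bookkeeping (hypothetical)

local_cache = {}      # never leaves this node
sync_queue = []       # destined for the other cluster nodes

def write(key, value):
    if key.startswith(LOCAL_ONLY_PREFIXES):
        local_cache[key] = value          # no encryption, no tunnel, no CPU cost
    else:
        sync_queue.append((key, value))   # replicated to the rest of the cluster

write("ratelimit:203.0.113.9", 42)        # stays local
write("detections:203.0.113.9", "proxy")  # will be synced
```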
The other thing we've done concerns data that does need to be shared with other nodes, but not immediately. For this we've added some granularity to how frequently certain pieces of data are synced so we can benefit from update coalescing: combining multiple smaller database updates into one larger database update that is transferred to other nodes less frequently.
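A minimal sketch of that coalescing idea, with an illustrative flush interval and made-up key names, might look like this:

```python
# Repeat writes to the same key within a flush window collapse into one entry,
# and the whole batch ships as a single larger update on an interval rather
# than one encrypted message per write.
import time
import threading

FLUSH_INTERVAL = 30          # seconds; less urgent data can use a longer window
pending = {}                 # key -> latest value, so repeat writes coalesce
lock = threading.Lock()

def queue_update(key, value):
    with lock:
        pending[key] = value          # overwrites any earlier write to the same key

def flush_loop(send_batch):
    while True:
        time.sleep(FLUSH_INTERVAL)
        with lock:
            batch = dict(pending)
            pending.clear()
        if batch:
            send_batch(batch)         # one encrypt + one transfer for many writes

queue_update("stats:queries_today", 1051)
queue_update("stats:queries_today", 1052)   # coalesces with the previous write

threading.Thread(target=flush_loop, args=(print,), daemon=True).start()
time.sleep(FLUSH_INTERVAL + 1)              # keep the sketch alive long enough to see one flush
```

The trade-off is simple: a longer flush window means fewer encrypted transfers and less CPU, at the cost of other nodes seeing slightly staler values for that data.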
By doing it this way we've been able to significantly reduce the CPU usage of our cluster syncing system and thus (hypothetically) increase our nodes' API response throughput in the future when we're closer to full node utilisation.
Our experiments with Ocebot are ongoing, and already we've discovered some incredibly useful information that has directly improved proxycheck.io. Over the next few weeks we will be enhancing Ocebot so it can perform tests on our new Inference Engine, not to judge accuracy but to gauge performance and make sure it's getting faster at making determinations.
Thanks for reading and have a great day!