Inference Engine and API Improvements

Inference Engine Improvements

Over the past few days we've been working on improvements to our Inference Engine, specifically making it work across more processing threads simultaneously and efficiently while decreasing its rate of false positives.

We recently upgraded our Prometheus server node from a single 6-core, 12-thread Xeon to dual 8-core, 16-thread Xeon processors, giving us a total of 32 threads and a giant 50MB of L3 cache. These new processors also run at much higher frequencies, with a base clock of 3.6GHz and a maximum turbo clock of 4GHz.

With this new hardware we've seen a dramatic 181% uplift in performance for our multithreaded workloads. Every part of proxycheck.io is multithreaded; it's one of the ways we're able to deliver such high performance for the millions of queries we process daily.

But some of the post-processing we do, such as the Inference Engine's discovery of new proxies, did suffer performance hangups under certain scenarios. Tuning software to take advantage of 32 processing threads is not an easy task, and it's a big jump from the 12 threads we were using previously.

We've had to tune the software not just to take advantage of the extra threads but also to understand NUMA (Non-Uniform Memory Access), so that each thread works on data held in the RAM connected directly to the processor that thread is running on. We've now completed the rewrites necessary for our new hardware and we're seeing dramatic increases in performance on this node.
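The NUMA-aware scheduling described above can be sketched with Python's standard library on Linux: pin a worker process to the CPUs of a single socket so that, under the kernel's default first-touch allocation policy, the memory it works on is allocated from that socket's local RAM. The node-to-CPU split below is a hypothetical illustration (real mappings come from the system topology), not our actual code.

```python
import os

# Hypothetical dual-socket topology: assume each NUMA node owns half of
# the logical CPUs available to us (real mappings come from /sys or numactl).
cpus = sorted(os.sched_getaffinity(0))       # CPUs this process may run on
half = max(1, len(cpus) // 2)
node0_cpus = set(cpus[:half])                # "node 0" CPUs
node1_cpus = set(cpus[half:]) or node0_cpus  # "node 1" (fallback on small machines)

def pin_worker_to_node(node_cpus):
    """Restrict this process to one NUMA node's CPUs.

    After pinning, memory the worker first touches is allocated from that
    node's local RAM, avoiding slower cross-socket memory accesses.
    """
    os.sched_setaffinity(0, node_cpus)
    return os.sched_getaffinity(0)

pinned = pin_worker_to_node(node0_cpus)
print(sorted(pinned))
```

In a real deployment each worker in the pool would be pinned to the node whose RAM holds the data partition it processes, rather than letting the scheduler migrate threads across sockets.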

Due to this performance uplift we've been able to spend more time training our Inference Engine, and with the extra computation time available, each determination our Inference Engine makes is more thorough than ever before, which lends itself to increased accuracy.

We're now seeing a 0.02% reduction in false positives in our training models, which equates to 2,000 fewer false positives per 10 million determinations.
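As a quick sanity check, applying a 0.02% rate reduction to 10 million determinations does come out to 2,000:

```python
# 0.02% expressed as a fraction, applied to 10 million determinations
rate_reduction = 0.02 / 100
determinations = 10_000_000
fewer_false_positives = round(rate_reduction * determinations)
print(fewer_false_positives)  # → 2000
```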

API Improvements

Over the past several days we've also been hard at work on the API itself, not only to tune it for our new multi-processor node but also to fix some bugs, including one involving ASNs and special characters. When you receive an ASN result we tell you the provider's name, and sometimes these names contain special characters; for example, Telefónica has a special ó in its name. This caused an encoding error during our JSON encoding, which resulted in null results.

We've corrected this by encoding all provider names before the JSON encoding is performed. We believe very few results were affected by this issue, which was caught by our own Ocebot during its automatic API probing.
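We don't publish the API's internals, but this class of bug is easy to reproduce in Python: a provider name that arrives as bytes in a legacy encoding such as Latin-1 is not valid UTF-8, so feeding it straight into a JSON encoder fails, whereas normalising every name to proper Unicode text first encodes cleanly. The function name here is illustrative, not our actual code.

```python
import json

def encode_provider_name(raw: bytes) -> str:
    """Normalise a provider name to Unicode text before JSON encoding.

    Tries UTF-8 first, then falls back to Latin-1, which maps every
    byte value to a character and therefore never fails.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

# "Telefónica" stored as Latin-1 bytes: the ó is the single byte 0xF3,
# which is invalid as UTF-8 and would trip up a naive decode step.
raw_name = "Telefónica".encode("latin-1")

record = {"provider": encode_provider_name(raw_name)}
print(json.dumps(record))  # the ó is escaped as \u00f3 in the JSON output
```

Decoding defensively at the boundary, before serialisation, means one bad byte sequence degrades to a best-effort name instead of nulling out the whole result.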

That's all we have to share with you today. We hope you all had a great weekend.

