Post-Processing Inference Improvements

Today we've released a significant update to our post-processing inference engine which should greatly reduce the time it takes to detect bad addresses. The post-processing inference engine is the part of our service that stores addresses we don't think are proxies in a temporary cache so that more data can be accumulated and more processing power can be applied to determine whether those addresses are in fact bad.

The new changes today allow the engine to be much more aggressive about when it starts making determinations. Prior to today we would convert IP addresses into one-way hashes (similar to how we hash passwords) and then only make a determination on whether those hashes were proxies or not after we had gathered 8 hours' worth of data.
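To illustrate the one-way hashing described above, here is a minimal sketch in Python. The actual algorithm and key are not disclosed in this post, so the use of HMAC-SHA256 and the salt value are assumptions for illustration only; the point is simply that the digest cannot be reversed back into the original address.

```python
import hashlib
import hmac

# Hypothetical secret salt; the real hashing scheme is not published.
SECRET_SALT = b"example-salt"

def hash_ip(ip: str) -> str:
    """One-way hash of an IP address using HMAC-SHA256.

    Like a password hash, the digest identifies the address for later
    lookups but the original IP cannot be recovered from it.
    """
    return hmac.new(SECRET_SALT, ip.encode(), hashlib.sha256).hexdigest()

digest = hash_ip("203.0.113.7")
print(len(digest))  # → 64
```

The same input always produces the same digest, so accumulated observations can be keyed by the hash without ever storing the raw address.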

This meant that if an IP was on a rampage, such as registering on lots of websites or brute-forcing passwords, it's likely we would not detect it as a bad address until its 8th hour of operation. For some IPs you really do need this amount of time to get a full picture of their intent, as we do not have a complete overview of the entire internet; we only see a small slice.

But for other addresses it becomes obvious in mere seconds that they are bad. That is why we've re-engineered our post-processing engine to be much more aggressive about when it begins to make determinations. It now constantly re-evaluates addresses based on the actions it sees them perform in near real-time, allowing us to go from first sight to determination in seconds instead of hours.

Already we're seeing an uplift in detection rates for the worst addresses attacking our customers and honeypots. As our customer base continues to grow, the speed at which attacks are correlated will increase, resulting in even faster determinations. Essentially, the more attacks our customers and honeypots receive, the faster we can identify the bad addresses behind them and share our determinations through our API.

In addition to these changes we've also updated our machine learning model, and the inference engine can now hold hashed addresses for a much longer period when it deems that warranted, allowing more data to accumulate. This should increase the overall detection rate compared to our previous model, as it can now gather much more evidence.
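One way the longer retention described above could work is an adaptive time-to-live: hashes whose scores are clearly good or clearly bad are held for the base window, while ambiguous ones near the decision threshold are kept longer so more evidence can accumulate. The windows and margin below are illustrative assumptions, not the service's actual policy.

```python
# Hypothetical retention windows; the real policy is not published.
BASE_TTL = 8 * 3600       # 8 hours, the original evaluation window
EXTENDED_TTL = 24 * 3600  # longer hold for ambiguous addresses

def retention_seconds(score: float, threshold: float = 10.0) -> int:
    """Keep hashes with scores near the threshold for an extended
    period, since they need more evidence before a determination."""
    if abs(score - threshold) < 2.0:
        return EXTENDED_TTL
    return BASE_TTL

print(retention_seconds(9.5))   # → 86400 (ambiguous: held longer)
print(retention_seconds(1.0))   # → 28800 (clearly benign: base window)
```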

Thanks for reading this update and we hope everyone is having a great week.