Reducing stagnant data

When you operate a data-driven service such as proxycheck.io, you eventually have to decide at what point data has become old and irrelevant.

For us, that means deciding how long an IP address should be considered bad before we remove it from our database. In the past, we would cache an address for 90 days from the last time we saw it operating as a proxy or compromised server.

The problem is that addresses are often repurposed, and bad services running on compromised servers are cleaned up constantly. This means holding IP data for as long as we have been is not always best practice.

So we're extending the duties of our inference engine: as well as discovering new IP addresses acting as proxy servers, it will now go through our old data and verify that the addresses there are still bad.

This means we now hold IPs for a minimum of 15 days, down from 90 days. From the moment we first add an IP to our database, the inference engine reassesses it every day, and it gradually discards addresses once it has a 100% confidence rating that they're safe again.
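The retention rule above can be sketched in Python as follows. This is a hypothetical illustration, not proxycheck.io's actual code: the `IPRecord` type, the `safe_confidence` score (assumed to be a 0-100 value produced by the daily inference pass) and the `should_discard` and `daily_sweep` helpers are invented for the example, and it assumes the 15-day minimum is counted from when the address was added to the database.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Minimum hold period, taken from the 15-day figure above.
MIN_HOLD = timedelta(days=15)


@dataclass
class IPRecord:
    address: str
    added_on: date          # assumption: the hold is counted from this date
    safe_confidence: float  # hypothetical daily score from the inference engine, 0-100


def should_discard(record: IPRecord, today: date) -> bool:
    """Return True if the record may be dropped under this sketched policy."""
    if today - record.added_on < MIN_HOLD:
        return False  # still inside the minimum hold window
    # Only drop an address when the engine is fully confident it is safe
    # (assumes the score is capped at 100).
    return record.safe_confidence >= 100.0


def daily_sweep(records: list[IPRecord], today: date) -> tuple[list[IPRecord], list[IPRecord]]:
    """One daily assessment pass: split records into (kept, discarded)."""
    kept: list[IPRecord] = []
    discarded: list[IPRecord] = []
    for record in records:
        if should_discard(record, today):
            discarded.append(record)
        else:
            kept.append(record)
    return kept, discarded
```

Under this sketch, an address added on 1 January with a full safe score is still listed on 15 January (14 days in) and is dropped on the 16 January pass, while any address scored below 100% is kept regardless of age.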

We believe this will reduce false positives, allowing more of your legitimate users to access your services without being unduly blocked just because they were given a previously abused IP address. This change has been active on our development platform for a while, and after seeing positive and accurate results there, we've enabled the system on our live data.

Thanks for reading about this change, and as always, have a great day.
