Data improvements between March and April

Today we want to detail a few significant data changes we've made over the current and previous month and how they affect the data we serve to you through our API.

Firstly, we've seen a large uptick in residential proxies being used around the internet to scrape websites and perform exploits. Residential proxy networks have outgrown the onion network (TOR) because their users are paid to participate, which differs from the free model that TOR uses.

On TOR, anyone with an internet connection can launch what is called an exit node, and others can use it to proxy their internet traffic for free. We've detected TOR exit nodes right from the beginning. But now that money is involved, these residential proxy networks are growing exponentially. We've managed to find flaws in a few of them, which we've used to list their networks on our API.

In March we added the networks of two of the largest ones. Many customers had emailed us about the fact that we were failing to detect a lot of these networks, so we became aggressive in our pursuit of their network nodes.

To put the scale of these networks in context, in one case we were able to list 15,000 of 17,500 nodes on our API. That alone is four times the size of TOR. And while this has pleased our customers who asked for better indexing of these networks, it has come at a cost: false positives.

The false positive rate has increased because these networks rely on users sharing their home and mobile connections, and the addresses on those connections are often dynamic. A single subscriber may change IP address upwards of 20 times per day in some circumstances, which means that unless we're constantly evicting addresses from our database we're going to have false positives.

To work around this problem we've begun evicting addresses at a much faster rate than we otherwise would, sometimes after as little as 10 minutes depending on how dynamic we believe the addresses are. Addresses we don't see multiple times within these proxy networks are evicted from our data within an hour.
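A minimal sketch of how an eviction policy like this could look. The `Listing` fields, the `highly_dynamic` flag, and the 24-hour fallback are hypothetical names and values chosen for illustration; only the 10-minute and one-hour figures come from the description above.

```python
from dataclasses import dataclass

TEN_MINUTES = 10 * 60
ONE_HOUR = 60 * 60
DEFAULT_TTL = 24 * 60 * 60  # assumption: fallback lifetime, not stated in the post


@dataclass
class Listing:
    address: str
    last_seen: float       # Unix timestamp of the most recent sighting
    sightings: int         # times the address was seen inside a proxy network
    highly_dynamic: bool   # hypothetical flag for fast-rotating address space


def ttl_seconds(listing: Listing) -> int:
    """Pick how long a listing may live after its last sighting."""
    if listing.highly_dynamic:
        return TEN_MINUTES
    if listing.sightings < 2:
        # Seen only once: drop within the hour.
        return ONE_HOUR
    return DEFAULT_TTL


def evict_expired(listings: dict[str, Listing], now: float) -> list[str]:
    """Remove expired listings in place and return the evicted addresses."""
    expired = [
        address
        for address, listing in listings.items()
        if now - listing.last_seen >= ttl_seconds(listing)
    ]
    for address in expired:
        del listings[address]
    return expired
```

Running `evict_expired` on a short timer keeps stale dynamic addresses from lingering, at the cost of re-listing addresses that are still active the next time they are seen.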

We have made significant progress in evicting dynamic addresses in general. For example, 90% of mobile (5G/4G/3G etc.) proxies are removed from our data within 10 minutes. We've also categorised hundreds of address ranges we know to be shared via carrier-grade NAT (CG-NAT). The impact on the users of those networks would be too great for us to list these addresses, even if a proxy is inhabiting an address within one of those ranges.
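The CG-NAT exception can be sketched as a simple range check before listing. The function names and the range list here are illustrative assumptions: `100.64.0.0/10` is the shared address space reserved by RFC 6598, while `203.0.113.0/24` is a documentation range standing in for a carrier's public CG-NAT pool.

```python
import ipaddress

# Assumption: a small hand-maintained list. A real list would hold the
# hundreds of carrier ranges mentioned above.
SHARED_RANGES = [
    ipaddress.ip_network("100.64.0.0/10"),   # RFC 6598 shared address space
    ipaddress.ip_network("203.0.113.0/24"),  # placeholder for a carrier pool
]


def is_shared_address(address: str) -> bool:
    """Return True if the address falls inside a known shared range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in SHARED_RANGES)


def should_list(address: str, proxy_detected: bool) -> bool:
    """List a detected proxy only when it is not on a shared range."""
    return proxy_detected and not is_shared_address(address)
```

With this check, `should_list("100.64.1.1", True)` returns `False`, because many unrelated subscribers may sit behind that one address.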

We've also greatly expanded our VPN detection. We increased the number of hosting providers in our database by 19.4% between March 1st and April 8th, our biggest increase in such a short period. This was largely driven by investigating hundreds of address ranges and by our customers supplying us with suspicious addresses and providers through our contact form (and by the way, we read and reply to every message sent to us).

Finally, we've also been working heavily on our disposable email detection. We identified several issues in our internal systems that collect and store disposable addresses and by fixing those we've been able to vastly increase the number of domains we're adding to our database. We've also built some custom tools to obtain disposable domains from many of the most popular services automatically.
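For readers who keep their own blocklist, a check against a set of disposable domains might look like the sketch below. The domain set and function names are hypothetical, and matching parent domains so that subdomains are also caught is a design choice made for this example, not a description of our internal tooling.

```python
# Assumption: placeholder entries using the reserved .example TLD.
DISPOSABLE_DOMAINS = {"tempmail.example", "throwaway.example"}


def email_domain(address: str) -> str:
    """Extract the normalised domain part of an email address."""
    _, separator, domain = address.rpartition("@")
    if not separator or not domain.strip():
        raise ValueError(f"not an email address: {address!r}")
    return domain.strip().lower().rstrip(".")


def is_disposable(address: str, known: set[str] = DISPOSABLE_DOMAINS) -> bool:
    """Match the domain or any parent domain, excluding the bare TLD."""
    labels = email_domain(address).split(".")
    return any(
        ".".join(labels[start:]) in known
        for start in range(len(labels) - 1)
    )
```

Here `is_disposable("user@mx.tempmail.example")` returns `True` because the parent domain `tempmail.example` is in the set.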

So that's the data update for you. We have also made a few updates to the Dashboard over the same time frame. You will now receive Country and Continent suggestions when creating rules that use those condition types, making it easier to target locations without having to guess how we present their names in our API. This feature is driven by our new resources feature found here.

We hope you found this post interesting and we would like to thank all of our customers who have taken the time to write to us about emerging threats, new proxy networks, suspicious addresses and temporary email domains. We very much appreciate your effort to make our data better and more thorough.

Thanks for reading and have a lovely weekend!

