Backend changes and some growing pains

Image description

Today we wanted to talk about some changes we've been making behind the scenes to help us scale to meet the needs of our growing customer base. You may have noticed over the past week that the customer dashboard has not been working as reliably as it should, with changes made within your account not synchronising across all our infrastructure in a timely manner.

We've also had issues where new users signed up only for their accounts not to be activated by our internal systems, leaving them unable to log in and use their accounts. Both of these issues stem from the same problem: unreliable synchronisation of data between our nodes for our most high-traffic database operations.

This includes things like signing up, changing any setting in your account and creating or modifying user-generated content (custom lists, custom rules, CORS domain entries and so on). We put these operations in a fast-track lane so that the Dashboard always feels snappy, even when your dashboard requests move from one server to another in our cluster during a single session.

But as we've grown we've seen more and more traffic targeting the dashboard, and this once super-fast lane began to slow to a crawl, culminating this week when changes and additions to customer data outpaced our servers' ability to synchronise those changes in real time.

To solve this problem we looked at the way we've been handling requests to and from the database and identified many pain points. The biggest one was the number of database transactions generated by a single dashboard web request; this includes both loading the dashboard and creating or modifying content within it, and then sharing those changes with other server nodes.

By restructuring our customer data and coalescing the gathering and saving of that data into single database operations, we've been able to reduce traffic between nodes by a factor of 7 on average when a user accesses their Dashboard, and by a factor of 3 to 5 when they make changes, depending on what those changes are.
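
To illustrate the pattern (this is a minimal sketch using SQLite purely for demonstration, not our actual database engine or schema), the difference is essentially batching every change from a request into one transaction instead of issuing one transaction per change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (account TEXT, key TEXT, value TEXT)")

def save_settings_chatty(account, changes):
    # Before: one transaction per changed setting, so a single dashboard
    # request fans out into many small writes that each need replicating.
    for key, value in changes.items():
        with conn:
            conn.execute(
                "INSERT INTO settings (account, key, value) VALUES (?, ?, ?)",
                (account, key, value),
            )

def save_settings_coalesced(account, changes):
    # After: every change from the request is written in one transaction,
    # so only a single unit of work needs synchronising between nodes.
    with conn:
        conn.executemany(
            "INSERT INTO settings (account, key, value) VALUES (?, ?, ?)",
            [(account, k, v) for k, v in changes.items()],
        )

save_settings_coalesced("acct_123", {"theme": "dark", "cors_domain": "example.com"})
```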

As a result of these changes, the dashboard and user signups are now being handled in real-time again. In addition to these changes for our most accessed customer data, we've also been working on the slower database synchronisation we use for big data. This includes things like customer positive detection logs.

One thing we noticed here is that a lot of this data is rarely accessed, but its sheer volume was significantly delaying our ability to bring up new nodes (due to the need to synchronise all of this data), and the ledger of our database that maintains a listing of what has been synchronised and which nodes are missing data was getting very large and becoming a burden for our server nodes to handle.

To solve this problem we've begun to compress all of this high-impact user data, and as a result we've been able to reclaim 80% of the disk space it utilised. This has also had the side effect of making the data much faster to access: even though we use high-end solid-state drives on all our servers, storage is still the slowest component in a server compared to the CPU and RAM. So by decreasing the size of the data we're loading from our disks, we can load it into memory much faster.
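
As a rough illustration of the idea (using Python's zlib here purely for demonstration, not our actual compression scheme), compressing a large blob of log-style data before it hits the disk both shrinks the stored size and reduces how much has to be read back later:

```python
import json, zlib

# Hypothetical detection-log entries; real records will differ.
log = [{"ip": "203.0.113.7", "detection": "proxy", "time": 1700000000 + i}
       for i in range(10000)]

raw = json.dumps(log).encode()
packed = zlib.compress(raw, level=9)

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes "
      f"({100 - 100 * len(packed) // len(raw)}% saved)")

# Reading the data back only costs a decompress once the (smaller) blob
# has been loaded from disk.
restored = json.loads(zlib.decompress(packed))
assert restored == log
```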

In addition to the compression, we've also altered how our database blocks work. Prior to this week, all data in the database was stored in 8MiB blocks. This made things mathematically simple, but as user data has grown the number of blocks has become unmanageable. Because of this, we've now moved to an adaptive block size of between 8MiB and 64MiB. Since servers need an entire block before they can access the data inside, we choose the block size for a specific customer's data based on their data volume, with smaller blocks able to transition into larger ones as their data grows.
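
As a purely illustrative sketch (the tiers and thresholds below are assumptions, not our actual values), picking a block size based on a customer's data volume might look something like this:

```python
MIB = 1024 * 1024
BLOCK_SIZES = [8 * MIB, 16 * MIB, 32 * MIB, 64 * MIB]  # assumed tiers

def block_size_for(customer_bytes: int) -> int:
    # Pick the smallest block tier that keeps the customer's data in a
    # manageable number of blocks; growth moves them to a larger tier.
    for size in BLOCK_SIZES:
        if customer_bytes <= size * 128:  # illustrative threshold only
            return size
    return BLOCK_SIZES[-1]

print(block_size_for(200 * MIB) // MIB, "MiB")         # small account -> 8 MiB
print(block_size_for(20 * 1024 * MIB) // MIB, "MiB")   # large account -> 64 MiB
```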

So this is the update for today. We're hoping there won't be too many teething problems, but to be honest with you the database schema updates that apply to the Dashboard are rather major and were introduced faster than we would have liked due to the serious performance degradation we were seeing. What this means is there may be bugs, and we ask for your patience and diligence in reporting any you find.

Thanks for reading and have a great week!


New European Node!

Image description

Today, we have introduced a new high-end server into our European service cluster called Titan. It is named after Saturn's largest moon, which boasts a thick atmosphere and a landscape of liquid hydrocarbon lakes, making it quite unique in our solar system. This server is based on AMD's EPYC Milan-X platform and is currently the most powerful server in our cluster.

By adding this new server, we're increasing our European capacity by 20%. The main reason we've done this is that we wanted more buffer to withstand momentary increases in API usage, which sometimes happens when our customers come under targeted attacks.

This new server also further increases our provider diversity, as it's not hosted by any of the datacenter partners we've used for our prior nodes, and it's deployed in a different country to our other nodes as well (we are now using three different countries for our European nodes).

As we mentioned when introducing our Rigel node two months ago, we will also be refreshing a server later this month: our North American node called Lunar. We intend to keep the node name the same, but it will be transitioning to a different provider and more capable hardware based on AMD's EPYC Genoa platform, which Rigel already uses and which we've been very happy with.

So that's the update for today, thanks for reading and have a wonderful weekend.


Disposable Mail Detection Improvements

Image description

Today, we would like to share a big improvement we have made to our API's disposable mail-checking feature. We launched this feature a few years ago and it lets you check whether an email address belongs to a disposable or temporary mail service so that you can deny service to uncontactable users.
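
As a quick example of how a check like this can be wired into a signup flow, here is a minimal sketch in Python; the endpoint shape and the response field names are assumptions for illustration, so please consult the API documentation for the exact format:

```python
import requests

API_KEY = "your_api_key"  # placeholder
email = "someone@tempmailexample.com"

# Endpoint shape and field names here are assumptions based on the public
# v2 API; check the API documentation for the exact response format.
resp = requests.get(f"https://proxycheck.io/v2/{email}", params={"key": API_KEY})
result = resp.json().get(email, {})

if result.get("disposable") == "yes":
    print("Reject: disposable / temporary mail address")
else:
    print("Accept: address does not appear to be disposable")
```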

The biggest roadblock we've encountered since launching this feature has been the collection of disposable domain names from the hundreds of sites that offer temporary mail services. The reason is their increasing use of anti-bot technology, which made it hard for us to collect their domains in an automated fashion and often required us to collect them by hand. That meant there were long periods where certain temporary mail services were underrepresented in our data.

That, however, has changed over the past several days since we deployed a new gathering system which utilises AI agents to browse these sites and get past their captchas and other anti-bot screens in order to collect the domains. More than half of the sites we profile (and there are a few hundred) employ some kind of anti-bot screening or captcha system.

But with our new AI agent-driven gatherer we've been able to add hundreds to thousands of new domains to our database every day since deployment, making our disposable mail system much more effective.

So that's the update for today. We hope you'll try the disposable mail detection, and feel free to let us know what you think. If you've found a disposable mail site that is giving you results we don't have, send us a message and we'll build a profile for it.


API Version Support Changes

Image description

Today we've introduced a change to the API version selector found within customer dashboards. This change now clearly identifies which versions of our API are supported and which ones are no longer supported.

We've made this change for two reasons: firstly, so that we can stop supporting older versions of the API, which will free up more resources for us to put into newer versions; and secondly, so customers can have greater control over when to upgrade their API version to gain access to new features and upgrades.

To expand on that second point: previously, we only issued new versions of the API when we made a change to the JSON output format that we believed could break our customers' implementations. But now we will be issuing new API versions whenever we make any kind of substantial code change (anything beyond maintenance or bug fixes), even if it doesn't alter the output format.

The reason for us wanting to release more API versions like this is to insulate our customers from API mistakes and errors as we introduce both new features and changes.

By giving our customers greater control over when to upgrade, even for minor new features and changes, they can be protected against bugs that we may have missed during development and testing. We felt the need for this when we introduced our new location engine last month, which did have some problems at release that required us to temporarily roll back the update and implement rapid fixes before redeployment.

We're very fortunate to have understanding and patient customers who made us aware of these location data regressions and allowed us time to fix the problems. While reflecting on their feedback we realised that bundling the new location data update into our previous release (which introduced device estimates) was not the right thing to do and we should have instead issued a new major version of the API alongside this feature.

Image description

So above on the left is the new API version dropdown selector and on the right is what it looks like if you choose an unsupported version of the API. Below is what you'll see if you're already running on an unsupported version of the API.

Image description

We wanted to make sure these notices couldn't be easily missed, hence the big red warnings. Going forward, our support timeline for API versions will be a minimum of three years from the point of release. Currently, we're supporting every version since May 2022, because that was when a large shift happened in the codebase with the introduction of Custom Lists. We will likely support versions beyond this date range depending on how time-intensive it is to port changes back.

So that's the update for today, as always thanks for reading and have a great week!


New North American Node!

Image description

A couple of days ago, we introduced a new high-end server into our North American service cluster called Rigel. It is named after the blue supergiant star in the constellation of Orion. This new server is based on the AMD 4th generation EPYC Genoa platform, making it one of the fastest systems in our fleet as of today.

By adding this new server we're increasing our North American capacity by 33%, as we've seen traffic there increase significantly over the past 12 months and we like to have a large buffer so we can take some servers offline for maintenance should it be needed.

This new server also further increases our provider diversity as we're utilising a new datacenter which is operated by a different provider to our other North American server nodes.

In addition to Rigel, we expect that around April we'll be refreshing several of the servers we picked up two years ago, which will help raise overall performance, lower latency and give us more room to grow.

That's the update for today. We should have another update next week about the service itself rather than our infrastructure, so stay tuned for that.

Thanks for reading and have a wonderful weekend.


Welcome to 2025 and our first big update of the year!

Image description

Happy New Year to everyone as we start 2025 with a big update. We won't be listing everything we did last year in a round-up, because you can scroll down to read everything that happened in our blog posts below, and we certainly recommend you take a look at those!

What we want to talk about today is location data and our new location engine that was deployed today on the latest version of the API (November 2024). Over the past several years we've used a ping-based triangulation system to figure out where IP addresses are actually physically based.

It's a simple model: you have lots of servers all around the world, you have them ping the same address and, based on the latency, you can triangulate with reasonable confidence where an address is physically located. This model has worked well for us up until now, but both customers and ourselves have noticed some location drift, especially over the past year.
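
As a simplified sketch of the underlying maths, light in fibre covers roughly 200 km per millisecond, so half of the round-trip time gives an upper bound on how far an address can be from each probe server, and intersecting those bounds narrows down its location:

```python
# Rough upper bound on distance from round-trip time: light in fibre covers
# roughly 200 km per millisecond (about 2/3 of c), and only half of the RTT
# is the outbound leg.
FIBRE_KM_PER_MS = 200.0

def max_distance_km(rtt_ms: float) -> float:
    return (rtt_ms / 2.0) * FIBRE_KM_PER_MS

# Hypothetical RTTs from three probe servers to the same address.
probes = {"London": 8.0, "Frankfurt": 14.0, "Madrid": 31.0}
for city, rtt in probes.items():
    print(f"{city}: target is within ~{max_distance_km(rtt):.0f} km")
# Intersecting these circles around each probe narrows down the region
# in which the address is physically located.
```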

To address these issues, we investigated the causes:

Firstly, we relied too much on one specific VPS (Virtual Private Server) provider for most of our test servers. This created problems where their fibre links to certain locations artificially decreased the latency for addresses tested through them. In effect they sometimes provided us with an extremely fast highway to drive on, which skewed the latency compared to the wider internet and, when mixed with results from our other VPS providers, produced wrong results that we couldn't account for.

Secondly, we didn't have enough servers in general. Doing triangulation properly, and increasing accuracy down to the city or postcode level, requires more servers, with several in most cities.

Thirdly, anycast addresses, where a single IP address is announced worldwide from multiple locations, caused our limited number of servers to disagree about where an address is located, and as a result our software had to make a sometimes wrong determination based on conflicting test results.

And finally, pinging doesn't always work and isn't always the best approach. Sometimes you need to simply read third-party announcements where an ISP specifically states where an address is. Sometimes you need to perform a traceroute towards an address so you're checking the physical location of all the intermediary hops along the way. Basically, metadata is important.

So to solve these issues we've vastly expanded our network of VPSs. We're now using more VPS providers, so we're not overly influenced by the network characteristics of a single provider, and we spin VPSs up and down as needed to increase our network size while keeping costs in check. We've also switched to performing traceroutes, gathering information from all the addresses along a route to a specific address we're interested in, and we're looking at ISP-provided metadata.
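
As a simplified sketch of how these signals can be combined (this is illustrative only, not our actual decision logic), a simple precedence order might look like this:

```python
def locate(address, isp_metadata=None, traceroute_hops=None, ping_estimate=None):
    """Combine location signals with a simple precedence order.

    Illustrative sketch only: an ISP-published location wins, then the
    location of the last resolvable traceroute hop, then the ping-based
    triangulation estimate.
    """
    if isp_metadata:
        return {"source": "isp-metadata", "location": isp_metadata}
    if traceroute_hops:
        return {"source": "traceroute", "location": traceroute_hops[-1]}
    if ping_estimate:
        return {"source": "ping-triangulation", "location": ping_estimate}
    return {"source": "unknown", "location": None}

print(locate("198.51.100.24",
             traceroute_hops=["London", "Paris", "Lyon"],
             ping_estimate="France"))
```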

The result of all this work is that country detection specifically (which is what most of our customers care about when it comes to location information) is now incredibly accurate once again. When compared with market leaders whose main or only product is location data, we're very competitive for both IPv4 and IPv6 location data.

Another improvement provided by the new location engine is fewer blank spots where we had no location data at all about an address, which under our prior ping approach happened with addresses that simply did not reply. The traceroute system combined with ISP metadata takes care of this and provides accurate location information for these previously unknown addresses.

So that's the update for today. Remember this is available only on our latest API version dated November 2024. If you've already set your API version to this in our dashboard or you have it set to the "Latest Version" then you already have the new location engine. If you would like to compare past and current results you can alter your API version to our previous release.

Thank you for reading, and welcome once again to 2025!


Introducing Device Estimates

Image description

Today we're introducing a new feature called device estimates, which presents you with the estimated device count for specific addresses and their subnets. These estimates are based on actual data derived from customers' usage of our API when supplying the &asn=1 flag with requests.

By using these new data fields within the API (shown below) you'll be able to decide whether to allow an IP to interact with your service based on how many devices are estimated to be active behind it, allowing you to make a better risk assessment.

Image description
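
As a minimal sketch of how this data could be used in practice (the &asn=1 flag is described above, but the exact field names below are assumptions, so check the API documentation for the real keys):

```python
import requests

API_KEY = "your_api_key"  # placeholder
ip = "203.0.113.7"

# The &asn=1 flag comes from the post; the names of the device-estimate
# fields below are assumptions, so consult the API docs for the real keys.
resp = requests.get(f"https://proxycheck.io/v2/{ip}",
                    params={"key": API_KEY, "asn": 1})
entry = resp.json().get(ip, {})
devices = entry.get("devices", {})

# Example policy: treat an address with an unusually high estimated device
# count behind it as higher risk.
if int(devices.get("address", 0)) > 100:
    print("High device count behind this IP, apply extra checks")
else:
    print("Device count looks normal")
```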

One thing we were very keen to maintain with this feature is user privacy. This is why we do not detail the exact devices being used behind an IP address; in fact, we don't perform any kind of device fingerprinting. All of this data is gathered anonymously and our estimates are based on number theory rather than specific device tracking. This means we can maintain accurate device estimates without impinging on user privacy.

The new device estimate feature is available now in the API; we've issued a new version dated the 19th of November 2024. This data is also exposed within the Custom Rules feature, which means you can now build rules against device counts for both singular addresses and their subnets. We've also added device estimates to our individual threat pages.

In addition to presenting this data in the API, we're also using it internally to help us discover previously unknown VPN services and proxy servers; we'll do another blog post on the results of this in the future.

That's the update for today. We hope you'll take advantage of the new feature, and thanks for reading!


Refreshing our status page

Image description

Today we've launched a new version of our status page designed to convey more relevant data to you and to make the status page itself more resilient and accessible in emergencies.

So first of all, the page has a brand-new address. Previously our status page was at proxycheck.io/status, which meant it could potentially become inaccessible if our entire website were down. It has now been changed to status.proxycheck.io which, as a sub-domain, can be operated independently of our normal service cluster.

Image description

The second big change is that we now show status history. The image above illustrates the new pill-style history graph showcasing the past 3 days of our API's status in increments of one hour. Each pill can show multiple colours at once, with the size of each colour indicating the service status and how long that status lasted. When hovering your cursor over a pill you'll see an interface similar to the one on the right below, featuring the current status, latency and any specific service messages.

Image description

On the left above you can also see smaller status panels for specific server nodes. If you view the new status page you'll notice that the most important statuses are at the top, shown larger and with more visible history, and as we get down to less important things like individual service nodes we display them more densely.
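
As a purely hypothetical sketch of how an hourly pill's colour proportions (described above) could be derived from per-minute status samples:

```python
from collections import Counter

# One hour of per-minute status samples for a service (hypothetical data).
samples = ["up"] * 52 + ["degraded"] * 5 + ["down"] * 3

# Each hourly pill is split by the share of time spent in each status,
# which determines how much of the pill each colour occupies.
counts = Counter(samples)
proportions = {status: n / len(samples) for status, n in counts.items()}
print(proportions)  # e.g. {'up': 0.866..., 'degraded': 0.083..., 'down': 0.05}
```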

You may also notice that some services not relevant to customers have been replaced on the new status page with more appropriate services such as email services and the Custom List downloader service.

One last thing to mention about the design is that all the displayed dates and times are localised to you as and when you view the page, making it much simpler to determine when events occurred without needing to convert from unfamiliar timezones.

Before we decided to make our own status page (absolutely everything about this feature is custom) we looked at many commercial and open-source solutions, and although many of them could accomplish what we needed, none of them fit the design aesthetic of our website or displayed the exact information we needed in the way we wanted it shown.

That's why we chose to design this ourselves. The flexibility that building things yourself affords cannot be overlooked, and that extends even to small things like making sure the hover tooltip stays on screen when you get near a browser window edge, which was something we found even some commercial status products didn't offer.

So that's the update for today. We hope you enjoy the new status page and will bookmark it for your convenience, and as always have a wonderful week!


Improvements to the API test console and Custom List storage increase

Image description

Today we've made two changes to the service. The first helps developers get started with our API faster by expanding the test console found in our API documentation so that it actually generates a URL for you to query based on the flags you enable. Below is a screenshot showing the new interface, the purple section being completely new.

Image description

We've also removed the submit button that used to accompany the test console because it was redundant; instead, we now dynamically update both the output example and the new URL generator section as you toggle flags on and off or change the type of request being tested from one of the supplied dropdowns.
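
As a small illustration of what the new URL generator produces (this sketch simply rebuilds the same kind of query string from toggled flags; the flag names are examples):

```python
from urllib.parse import urlencode

def build_test_url(address, flags):
    # Illustrative only: recreates the kind of URL the updated test console
    # generates as you toggle flags on and off.
    base = f"https://proxycheck.io/v2/{address}"
    enabled = {name: 1 for name, on in flags.items() if on}
    return f"{base}?{urlencode(enabled)}" if enabled else base

print(build_test_url("203.0.113.7", {"vpn": True, "asn": True, "risk": False}))
# https://proxycheck.io/v2/203.0.113.7?vpn=1&asn=1
```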

The second change we've made today is we've increased the storage available for Custom Lists from 4MB to 8MB as illustrated below.

Image description

We've made this change because users are making use of larger and larger lists and we want to facilitate this usage. Some users had resorted to breaking up large lists into multiple lists, which just seemed inefficient. We ran some tests to determine the performance impact on the API and didn't see any degradation. We may increase individual list sizes again in the future, but right now we felt 8MB struck the right balance.

So those are the updates for today. We hope you're having a wonderful week, and thanks for reading!


Operator Data Expansion

Image description

Since we introduced operator data to the API in December 2021 we've often been asked by customers to broaden the types of operators that we support and generally expand on the feature. To deliver on those requests we showed last year how we had been adding decentralised VPN operators and then a month later we integrated operator data into the positive detection log within customer dashboards.

Today we're improving operator data again by building operator profiles for scraping services. We've been monitoring many of these services since last year and we feel now is the right time to create distinct operator cards and expose operator data within our API for these organisations.

Image description

Above is one such card for Oxylabs, which is one of the largest operators in the scraping and residential proxy selling space. You'll also find cards for their many competitors of all sizes.

Broadening the kinds of operators we list doesn't stop here; we will start to include datacenter hosts, residential proxy sellers, click-farming services and more in future updates. We are committed to expanding our operator data with rich cards like the one above and detailed, easy-to-parse data exposed through our API.
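
As a minimal sketch of reading operator data from an API response (the field names inside the operator object are assumptions for illustration, so consult the API documentation for the exact structure):

```python
import requests

API_KEY = "your_api_key"  # placeholder
ip = "198.51.100.24"      # hypothetical address run by a scraping service

# The field names inside "operator" are assumptions for illustration;
# consult the API documentation for the exact structure.
resp = requests.get(f"https://proxycheck.io/v2/{ip}", params={"key": API_KEY})
entry = resp.json().get(ip, {})
operator = entry.get("operator")

if operator:
    print(f"Operated by {operator.get('name')} ({operator.get('url')})")
else:
    print("No operator profile for this address")
```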

That's all for today, thanks for reading – we hope you have a great week!

