Custom Rules enhanced with Dividers

Since we introduced Custom Rules in 2019 it has remained one of our most popular features. As customers have become more familiar with it and we've expanded its feature set, we're now seeing some customers with upwards of 100 custom rules in their account.

Last year we improved the interface for these power users by introducing the ability to hide deactivated rules and to search for rules based not only on their name but also on their content, which includes both condition and output values.

Today we're adding another power user feature: dividers. This feature allows you to add dividers between and above rules so that you can visually separate rules that have different use cases. You can add as many dividers as you like, and we let you both name them and set the color of each one individually. Below is an example of how the feature looks when you've added a few dividers.

Image description

We wanted to make dividers very easy to use, so you can simply click on the name of a divider to change it and drag dividers around to move them, just as you can with rules. We also didn't want them to look visually cluttered, so you only see the divider control buttons when you mouse over a divider, like below.

Image description

And finally, we wanted you to be able to customise your dividers not just by name but with any color and level of transparency that you want. To that end, we've added a real-time full spectrum color picker which you'll see if you click on the Color button as shown below.

Image description

So that's the update for today. It's live in everyone's Dashboard right now and we hope you have a lovely weekend.


CIDR ranges added to the API and better 3rd party vendor support

Today we've added a new feature to our API: CIDR ranges. This change lets you see the range an address belongs to when you check an IPv4 or IPv6 address on the API. It has been an often-requested feature, one that provides greater insight into network route sizes and helps in potentially blocking undesirable peers from accessing your services.

If you've used our custom list feature you may have seen the sample text we populate these lists with which contains entries such as:

123.45.214.0/24 #My Home VPN Providers IP Range

2001:4860:4860::ffff/64 #My Work VPN Providers IP Range

These are CIDR ranges, which specify a contiguous block of addresses. For example, the first range above ends in /24, which means it contains 256 addresses, from 123.45.214.0 to 123.45.214.255.
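The arithmetic above can be checked with Python's standard `ipaddress` module. This is only a sketch for illustration and is not part of our API. Note that the IPv6 sample has host bits set (the trailing `ffff`), so it is parsed here with `strict=False`, which normalises it to the enclosing /64 network.

```python
import ipaddress

# The IPv4 sample range from the custom list text above.
home_range = ipaddress.ip_network("123.45.214.0/24")

print(home_range.num_addresses)      # 256
print(home_range.network_address)    # 123.45.214.0
print(home_range.broadcast_address)  # 123.45.214.255

# The IPv6 sample is written with host bits set, so strict=False is used
# to reduce it to the /64 network that contains it.
work_range = ipaddress.ip_network("2001:4860:4860::ffff/64", strict=False)

print(work_range)                            # 2001:4860:4860::/64
print(work_range.num_addresses == 2 ** 64)   # True
```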

Displaying these ranges in our API results and threat pages helps users quickly add a specific range of addresses to a custom list. This can be useful if you want to stop a specific user, who keeps changing IP address within their internet service provider's DHCP range, from accessing your services.
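To illustrate why a range catches a user who hops between addresses, here is a minimal local sketch that parses list text in the same `range #comment` shape as the sample above and checks whether an address falls inside any entry. The function names are hypothetical and this is not how our service evaluates custom lists internally.

```python
import ipaddress

SAMPLE_LIST = """\
123.45.214.0/24 #My Home VPN Providers IP Range
2001:4860:4860::ffff/64 #My Work VPN Providers IP Range
"""

def parse_custom_list(text):
    """Turn 'range #comment' lines into (network, comment) pairs."""
    entries = []
    for line in text.splitlines():
        entry, _, comment = line.partition("#")
        entry = entry.strip()
        if not entry:
            continue  # skip blank and comment-only lines
        # strict=False tolerates entries written with host bits set.
        network = ipaddress.ip_network(entry, strict=False)
        entries.append((network, comment.strip()))
    return entries

def find_match(address, entries):
    """Return the first (network, comment) containing address, else None."""
    ip = ipaddress.ip_address(address)
    for network, comment in entries:
        if ip.version == network.version and ip in network:
            return network, comment
    return None

entries = parse_custom_list(SAMPLE_LIST)
print(find_match("123.45.214.77", entries))  # inside the /24
print(find_match("123.45.215.1", entries))   # outside every range
```

Any address the ISP hands out from within the listed range matches the same entry, so one line covers every address that user can be assigned from it.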

We think this change will be well received and it is live as of today through a new API version dated January 22nd 2024. If you're already set to use the latest version of the API (which is the default in our customer dashboard) you will have this feature available in your API results as of this post.


In addition to this news, we've also made a change to the emails that you receive when your plan is nearing expiration. Before today we would send you an email which explained you could visit the dashboard on our website to start a new plan if desired.

As of last week these emails are vendor-aware, so if you've purchased a paid plan through a 3rd-party vendor you will now be recommended to revisit that vendor to renew your plan, with an appropriate link provided. These vendors sometimes offer discounted or bundled pricing, so it can be a good deal to renew through them. Plans bought through 3rd-party vendors in this way also help to fund more end-user software that integrates our API, for the benefit of all customers.

Thanks for reading and have a wonderful week!


Our 2023 Retrospective

At the end of each year, we like to look back and discuss some of the significant changes that happened to our service and this year we focused heavily on our physical infrastructure.

We started early on April 14th by introducing QUASAR and PULSAR, which are our first South Asian server nodes.

These servers were very important because until this time all the traffic that originated in Asia was being served by our European infrastructure which introduced higher-than-desired latency. We had trialled many different servers from many different hosting providers and datacenters until we settled on these and over the past 8 months they've worked diligently.

We followed this up just four days later on April 18th with the first major refresh of our American infrastructure. We swapped out our LETO node for LUNAR. This increased performance and set a new benchmark for our servers in the North American region going forward.

Then eight days after that, on April 26th, we introduced both SATURN and JUPITER as new North American nodes. This time we didn't replace any current nodes for that region as we had long-term leases on our other older servers, so we kept CRONUS, METIS and NYX running until between July and September. All of those older servers are now retired.

These three new North American nodes increased our performance so much that we reduced our footprint from four servers to three while more than quadrupling our per-second request capacity.

And with that final hardware update, we were now running the latest and fastest hardware in all regions and that gave us the confidence to increase our query limits from 125 requests per second to 200 requests per second per node and per customer in all regions.

Of course, other changes happened in support of our physical architecture. We redesigned the way our servers share and correlate database updates, which made features like our new stats graph with per-minute resolution possible, and we made blog posts about both of those things. We were also able to lower our average query latency, which gave us an extra buffer to introduce more data to the API like currencies and more detailed and accurate location data.

As we close out the year, the main thing that happened this year that makes me personally happy is our South Asian server nodes. It's no secret if you have followed the blog for the past several years that we have been trying to get servers in Asia that had the network connectivity we needed, the processing power we required and a price that made sense. So being able to finally reach that goal with hardware that will last us many years was a great achievement.

I also want to give a shout-out to the power user improvements we made this year: not only the new high-resolution stats graph already mentioned above but also making Custom Rules and Custom Lists fully searchable. It's such a simple concept but it works so well and saves so much time, especially for our heaviest users who have a lot of content in their dashboards.

Thank you to everyone who uses our services for a wonderful 2023, we're looking forward to what 2024 brings!


Operators added to the Positive Detection Log

In December 2021 we added a feature to our threat pages and API called operator data which lets you view specific information about VPN operators including their name, website and certain policies of their service.

This has been a great value add for our users especially as all that data is exposed through our API directly. However, there has been one area where we've not exploited this data until today which has been the customer Dashboard.

Previously you would only see entries in your positive detection log within the Dashboard like below.

Image description

As you can see, the entries are quite generic. A VPN was detected but you don't know which operator it's from, and you may want to know that, especially if you see a pattern of abuse from a specific operator's services.

And so from today, you'll now receive a more detailed response like below when we know who is operating a VPN server.

Image description

We've made these tags clickable. Clicking one opens the operator's website in a new tab or window of your browser, making it easier for you to research them.

We also tied these new tags directly into our pre-existing operator database, so they're always up to date. Not just the names and URLs are drawn from it but also the color coding, which matches each VPN operator's brand colors for easier visual recognition.

So that's the update for today. Thanks for reading and have a wonderful month!


Improving support for decentralized VPN networks

As the commercial VPN market reaches maturity and the majority of VPN providers have a well-understood and traceable infrastructure we're starting to see novel approaches to building and maintaining VPN server fleets that thwart traditional detection methods.

One of these approaches is known as a decentralized VPN, or dVPN for short. In this model the VPN company doesn't own and operate the VPN servers it sells access to. Instead, it acts as a broker between consumers seeking to use a VPN and "node operators" who make their internet connections available for rent.

For the vast majority of these dVPN services their decentralized infrastructure can still be discovered and added to our database like any other VPN service but some of them have made it more difficult. One such service we're focusing on today is MysteriumVPN which has a complex broker system utilising tokenized addresses to mask node operators.

To be more specific, in MysteriumVPN's case, you cannot glean the IP addresses of their VPN nodes until you pay some cryptocurrency called Myst to one of Mysterium's brokers who then connects you to a single node operator. Essentially this means you have to pay every individual node operator a small amount of cryptocurrency to be given their node's IP address by the Mysterium broker.

This unique approach has meant that for some time now Mysterium nodes have gone undetected and their abuse on the internet has reached critical levels. Everything from bypassing streaming site geoblocking to scraping website content and performing fraudulent transactions with stolen payment information has been facilitated by these nodes.

Because of that, we've taken a special interest in dVPNs and throughout July we've been developing new tools to better handle them. Today we want to share that work, and specifically what we've done about Mysterium, as it is the largest in the space.

Image description

Above is what the new Mysterium operator card looks like. You'll find addresses from this VPN service presented via our API with a heightened risk score beginning at 73%. This elevated risk score reflects the danger we perceive these addresses as posing. Not only is Mysterium a fully anonymous service, but the difficulty of discovering its addresses, the lack of detection by services like our own and the fact that most of the addresses are hosted in residential ranges have made it a magnet for criminals.

At present, we're indexing a few thousand nodes per day and expect to have 95% of the nodes offered by the top 10 dVPN providers detected by the end of this month. We would also like to take this opportunity to thank the customers who, over the previous several months, provided IP addresses that they were certain belonged to proxy or VPN networks. We were able to match many of these to dVPN operators and thus expand our detection capability.

Thanks for reading and have a wonderful weekend!


Improvements to the Dashboard for Power Users

Today we've introduced several new features to the Dashboard to make it easier to manage your Custom Rules and Custom Lists, especially if you have a great many of them as some of our customers do.

Firstly we've reduced the space between entries so more can be viewed at once in a vertical stack. This is a minor change but beneficial all the same. You can see how this looks in the screenshot below.

Image description

Secondly, we've added a new button to the top of your rules and lists which will insert a new entry at the top of your stack instead of the bottom. You can now add new entries without scrolling to the bottom of your stack and then dragging the new entry back to the top.

Thirdly, we've added a new button which will hide any disabled rule or list that you have. We know many of our users like to create situational rules or lists and leave them disabled until they're needed. As a result, you may have many of them cluttering up your rule or list interface. This feature was requested by a customer only yesterday and we were happy to include it in today's power user update.

The fourth and final feature is a filter field which lets you search for your rules and lists based not only on their names but also their content. So if you've created a rule that targets a specific country or a list that contains a specific address, you will no longer need to closely examine every rule or list in your dashboard to find it. You can simply search for the piece of information you know is within it. Below is an example of us filtering for a specific rule based on the ISP Vodafone that was included in the rule.
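The idea behind the filter and the hide-disabled toggle can be sketched in a few lines. The real Dashboard does this in client-side JavaScript, so the Python below is purely illustrative, and the `Rule` fields and the condition and output strings are hypothetical stand-ins for how a rule might be represented.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    conditions: list = field(default_factory=list)  # hypothetical text form
    outputs: list = field(default_factory=list)     # hypothetical text form
    enabled: bool = True

def filter_rules(rules, query, hide_disabled=False):
    """Return rules whose name, conditions or outputs contain the query."""
    needle = query.casefold()
    matches = []
    for rule in rules:
        if hide_disabled and not rule.enabled:
            continue
        searchable = [rule.name, *rule.conditions, *rule.outputs]
        if any(needle in text.casefold() for text in searchable):
            matches.append(rule)
    return matches

rules = [
    Rule("Block UK mobile", ["isp = Vodafone"], ["block = yes"]),
    Rule("Allow office", ["ip = 203.0.113.7"], ["proxy = no"], enabled=False),
]

print([r.name for r in filter_rules(rules, "vodafone")])
print([r.name for r in filter_rules(rules, "203.0.113", hide_disabled=True)])
```

Searching the content as well as the name is what lets the "vodafone" query find a rule whose name never mentions the ISP.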

Image description

All of today's new power user features utilise client-side JavaScript exclusively, which means not only do they work live without page refreshes but they're incredibly fast and smooth, with animations where appropriate to convey, for example, that something has been hidden and not deleted.

We hope you enjoy today's update, thanks for reading and have a great week!


Performance Regression Discovered & Fixed

This is a brief post to let you know that between the 6th and 15th of June, there was an intermittent performance regression affecting our North American service area.

This was caused by a regression in our code that significantly increased database queue times when the API was under high load conditions. We were first made aware of the problem on the 8th of June by a customer but when we investigated, the high load conditions had already ceased making it difficult to replicate and diagnose the cause of the problem.

Today though we were able to view the performance regression live as it occurred and that enabled us to properly diagnose and resolve the code issue. As of this post, everything is solved and the API is once again answering all queries with consistently low latency. We would like to thank the customers who messaged us about the problem. We would not have discovered it so quickly without your assistance :)

Thanks for reading and have a wonderful week!


Location data improvements

Today we have released a large update to our location data which not only improves the accuracy and precision of all our location data but adds data for IP addresses that lacked any previously.

This includes region, city and postcode data. We've been working on this for a while, and anyone using our development branch may have noticed that, until this morning, some addresses were giving different and more complete location data compared to our stable releases.

Image description

We wanted to make the new data available to as many of our users as possible so we have gone back and ported the new code to every version of the v2 API starting from August 2020. If you're on a release before that you will need to change your selected API version from within the customer dashboard.

With this update, we've also rewritten how the metadata for IP addresses is both stored and accessed, and in doing so we were able to eliminate one extra database read operation, which should improve API performance, if only slightly.

So that's the update we have for you today, thanks for reading and have a wonderful week!


Re-architecting Software The Right Way

In computing, there is a strong resistance to complete rewrites and there are many great reasons for that including cost, time, the potential for new bugs and regressions in functionality.

Instead, the software industry prefers to do what we call refactoring where you take existing code and improve it by making many small changes over a long period so that each change can be more easily implemented and the results of those changes measured in a controlled manner. In short, it's fast, it's cheap, it gets results and it lowers the potential for problems.

But now and then it may be necessary to completely rearchitect a system, and there can be many reasons: you need more performance, you need the code to scale better to more computing resources, new hardware has arrived that runs the old code poorly or not at all, or new libraries, operating systems or execution environments are incompatible with your old code.

Any one of the above conditions can precipitate a rewrite.

Image description

This is exactly where we found ourselves this year with some of our backend software written since 2017. This software was designed for a specific operating system (Windows Server), which meant we had leveraged some Microsoft-specific operating system functions that made our code incompatible with Linux.

We had also designed this software around certain hardware that we had access to at the time which meant low core count processors and slow storage using hard disk drives. This resulted in a lot of our backend systems being conservative in how they used the hardware, meaning mostly single-threaded operations and serial data access.

Our current servers average 23.5 CPU threads, whereas when most of the code we're discussing was created our biggest and best server had only 8 CPU threads. When it comes to storage, we used to use HDDs that could manage only 800 IOPS; now we're dealing with NVMe SSDs that can handle 1.2 million IOPS.

Because we wanted our software to be processor-independent and operating system agnostic whilst taking better advantage of our latest (and future) hardware, some rewrites were necessary. For sure we could refactor some of our old code, and in many smaller cases that is what we did, but for the biggest systems rewrites were the right way to go.

So how do you re-architect your software the right way?

Firstly you need to do a full code review for the system you're going to rewrite. This includes reading all of the code and understanding and documenting all the tasks the code performs and why it does them. This is paramount because otherwise, you will forget to include functionality in the new rewrite that the previous iteration contained.

Secondly, you want to identify all the deficient parts of the code that you want to improve upon. This could be simply messy or unmanageable code or it could be just unproductive code that performs poorly or doesn't meet your goals for now or the future (such as tying you to a particular operating system).

Thirdly, you want to plan how you intend to improve the code to meet the goals you have for the new program. For us that mostly entailed making things multithreaded, making better use of our storage system's I/O capabilities and not using any Windows-specific features or functions that won't be available on Linux. And of course, the culmination of all this work is added scalability and flexibility.

Fourth and finally you write and test the code. We did a lot of testing during development to test our many hypotheses and this informed our design process. As we learned the capability of certain approaches to problems the decisions we were making changed along the way.

So let us go over some of these.

1: About two years ago Microsoft ended support for WinCache, which was an in-memory data store for PHP. We made extensive use of this and so we had to build a replacement. Thus two years ago we wrote what we call ramcache. It performs the same role and re-implements all of the WinCache functions. We were also able to extend the functionality. WinCache had an 85MB memory limit, for example, whereas our ramcache has no such limit. We also made it operating system agnostic, so it will run on anything including Linux.

2: Webhooks. We use a lot of webhooks, mainly for payment-related events such as sending you an email when a payment is declined or coming up but we also use webhooks for what we consider time-critical events such as when you make a change in your Dashboard that must be propagated to all cluster nodes very quickly.

3: Our database synchronisation system. At one time the processor usage caused by synchronising our databases was as high as 70%. This did scale back as a percentage when we upgraded to systems with faster processors but it was still very high and we saw the usage steadily increasing as our customers generated more data per minute. In this case, we developed a new process called Dispatcher to handle this traffic and we dramatically reduced processor usage to just 0.1%.

4: Cluster management, node health monitoring & node deployment. Before our rewrite, this heavily relied on Microsoft-specific operating system features, especially the node health monitoring and node deployment features. We've now rewritten all of these to also be operating system agnostic and processor independent.
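To give a feel for the kind of component described in item 1, here is a minimal sketch of an in-memory key-value store with optional expiry. It is an assumption-laden illustration only: our ramcache is not written in Python, the class and method names here are hypothetical, and a real replacement would also need to share data between processes and handle concurrent access, which this sketch does not.

```python
import time

class RamCache:
    """Tiny in-memory key-value store with optional time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._items = {}      # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Store value; ttl is a lifetime in seconds, None means no expiry."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the stored value, or default if missing or expired."""
        item = self._items.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # lazily evict expired entries
            return default
        return value

    def delete(self, key):
        """Remove key; return True if it was present."""
        return self._items.pop(key, None) is not None

cache = RamCache()
cache.set("session:42", {"plan": "paid"}, ttl=60)
print(cache.get("session:42"))
```

Because the store is just a dictionary in process memory, its size is bounded only by available RAM rather than by a fixed cap.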

So let's look at some net results. Previously our webhooks (where one server specifically sends out a small update to one or more servers in the cluster) took an average of 6 seconds for a full cluster-wide update. The new code which is multithreaded when it comes to network usage has reduced this time to just 0.3 seconds. That's a 20x performance improvement.
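The difference between sending updates one node at a time and sending them concurrently can be sketched as follows. This is a simplified illustration, not our webhook code: `send_update` is a hypothetical stand-in that sleeps instead of making a network call, and the node names are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NODES = ["node-1", "node-2", "node-3", "node-4", "node-5", "node-6"]

def send_update(node, payload, delay=0.05):
    """Stand-in for a network call to one node; sleeps instead of sending."""
    time.sleep(delay)
    return node, "ok"

def broadcast_serial(nodes, payload):
    """Contact each node in turn; total time grows with the node count."""
    return [send_update(node, payload) for node in nodes]

def broadcast_parallel(nodes, payload):
    """Contact all nodes at once; total time is roughly the slowest call."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return list(pool.map(lambda node: send_update(node, payload), nodes))

def timed(func, *args):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

_, serial_seconds = timed(broadcast_serial, NODES, {"setting": "changed"})
_, parallel_seconds = timed(broadcast_parallel, NODES, {"setting": "changed"})
print(f"serial: {serial_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Threads suit this kind of work because each one spends nearly all of its time waiting on the network rather than using the processor.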

Image description

When it comes to Dispatcher, this was a ginormous change for us as everything that keeps our servers in sync with one another utilised our previous system. The old system was so encompassing it didn't even have a name because it wasn't thought of as one specific object. Its code was interspersed with so many other functions and features that it was almost omnipresent throughout our code.

This has all changed with Dispatcher, which provides a standardised interface for reading and writing to our databases. It also provides a framework for our cluster nodes to supply data in the most passive (and thus least resource-intensive) way possible. Nodes package up their updates, and a single master node for each geographical region is temporarily selected to act as the collector, processor and distributor of database updates.

You can think of Dispatcher a lot like a train network. Each node operates its own train that is constantly going around the track to all the other nodes and picking up data. Master nodes pick up data from non-Masters, they process the data and then carefully decide where in the database that data should be inserted. It is then repackaged by the Master and presented to any trains that come by from non-Masters which pick up those updates.

Image description

Every few minutes the nodes hold a vote and the node with the most free resources and best uptime is chosen to act as the master for that geographical region. We stop conflicts that could arise from multiple master nodes distributing updates simultaneously by having clearly defined containers, merge conflict resolution and a database that is built around a structured 1-minute timetable.
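The selection criterion can be sketched as a simple ranking. The exact way Dispatcher weighs free resources against uptime isn't described here, so this sketch assumes free resources are compared first, with uptime and then node name as tie-breakers, and it skips the voting protocol itself. All names and fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeStatus:
    name: str
    region: str
    free_resources: float  # assumed scale: 0.0 (busy) to 1.0 (idle)
    uptime_hours: float

def elect_master(nodes, region):
    """Pick the master for a region, or None if the region has no nodes.

    Assumed ordering: most free resources, then longest uptime, then
    name, so that every node computing the result reaches the same answer.
    """
    candidates = [node for node in nodes if node.region == region]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda node: (node.free_resources, node.uptime_hours, node.name),
    )

cluster = [
    NodeStatus("node-a", "eu", free_resources=0.40, uptime_hours=900),
    NodeStatus("node-b", "eu", free_resources=0.75, uptime_hours=300),
    NodeStatus("node-c", "us", free_resources=0.60, uptime_hours=1200),
    NodeStatus("node-d", "us", free_resources=0.60, uptime_hours=1500),
]

print(elect_master(cluster, "eu").name)
print(elect_master(cluster, "us").name)
```

A deterministic tie-breaker matters in this kind of scheme because two nodes reporting identical figures must not both conclude that they won.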

Each minute in the real world is accounted for in the database with a master node attached to it for that specific minute and geographical region. That node and only that node can perform maintenance and alterations unless the nodes agree to remove it from that minute and assign another node, thus allowing another node to become the master over that minute. Any node can read from a minute but only masters can perform alterations and writes.
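The rule that only the assigned master may write to a given minute can be sketched like this. It is a toy in-memory model under our own assumptions, not Dispatcher's storage format: the class, its method names and the record shape are all hypothetical, and reassignment here is a plain call rather than an agreement between nodes.

```python
from datetime import datetime, timezone

class MinuteTable:
    """Toy per-minute store where only the assigned master may write."""

    def __init__(self):
        self._masters = {}  # (region, minute) -> master node name
        self._records = {}  # (region, minute) -> list of records

    @staticmethod
    def minute_of(moment):
        """Truncate a datetime to the start of its minute."""
        return moment.replace(second=0, microsecond=0)

    def assign(self, region, moment, node):
        """Make node the master of this region's minute (or replace it)."""
        self._masters[(region, self.minute_of(moment))] = node

    def write(self, region, moment, node, record):
        """Append a record; refused unless node is that minute's master."""
        key = (region, self.minute_of(moment))
        if self._masters.get(key) != node:
            raise PermissionError(f"{node} is not master of {key}")
        self._records.setdefault(key, []).append(record)

    def read(self, region, moment):
        """Any node may read a minute; a copy is returned."""
        key = (region, self.minute_of(moment))
        return list(self._records.get(key, []))

table = MinuteTable()
moment = datetime(2023, 5, 20, 12, 30, 45, tzinfo=timezone.utc)
table.assign("eu", moment, "node-a")
table.write("eu", moment, "node-a", {"queries": 10})
print(table.read("eu", moment))
```

Keying every write by region and minute means two masters can only collide if they are assigned the same slot, which is exactly what the assignment step prevents.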

What all this results in is a high-performance database synchronisation system that scales to however many server nodes we have and most importantly with the least amount of processor and storage burden.

While performing all these code rewrites and refactors we also investigated new technologies and tuned our execution environments for the code we author. To that end, we upgraded to the latest PHP v8.2.6 across our entire service including all our webpages. During this upgrade process, we also enabled the JIT compiler that has been available since PHP v8, as we saw massive improvements in page load times across the site with no regressions.

So that's the update for today, we hope you enjoyed this look at what we've been up to and the illustrations.

Thanks for reading and have a wonderful week.


Weekend Topic: Why we use a monolithic architecture

For this blog post, we would like to go over our architecture design for proxycheck.io and explain some of the decisions we've made along the way in building the service. To start with, what exactly is a monolithic architecture and what is the alternative approach?

Monolithic in software pretty much means running all your services on one or more beefy servers, as opposed to breaking out your services into what are commonly referred to as microservices and distributing them across many smaller servers, or even running them on what is known as "serverless" or "edge" computing infrastructure. The idea behind the microservices approach is that you remove a lot of overhead like managing an operating system; instead you only manage the specific application that you've developed.

The other benefit is you can scale up microservices horizontally meaning if you need more resources for an application you can simply spin up another copy of the microservice on another system and load balance between them.

This approach, however, does have some caveats. As the number of microservices you have increases, the volume of network activity between all the services in your infrastructure increases too. After all, each service needs to obtain, process and share data with the rest of your infrastructure, and the more servers you have sharing that burden the more there is to keep synchronised.

Database traffic is often overlooked when people turn to these services but it can become substantial to the point that you cannot expand horizontally anymore because there aren't enough resources to keep all your services synchronised. In addition to this complexity, there is also a creeping increase in cost from all this overhead which can overshadow the initial costs you thought you would incur for the resources you're using to serve customers.

Some good examples of services that moved from microservices to a monolithic architecture would be Dropbox or even Prime Video, which recently shared an interesting blog post about how they reduced their costs by 90% when making that move. And yes, that is Amazon's Prime Video, who were using Amazon's AWS services to operate their microservices.

To quote Amazon's Prime Video:

"Moving our service to a monolith reduced our infrastructure cost by over 90%. It also increased our scaling capabilities. Today, we’re able to handle thousands of streams and we still have capacity to scale the service even further."

So not only did it save them money but it also increased their ability to scale and helped them to support more users with fewer servers.

We have used a monolithic architecture since the very beginning. Although we identified the benefits of microservices, and specifically the use of AWS's EC2 and Azure clouds to scale rapidly, we also identified many drawbacks. Performance for these services on an individual level is not high; that is to say, individual requests have poor performance.

To put it another way, this microservices approach is akin to flying 2,000 hot air balloons instead of having 2 jumbo jets. Sure you can have double the amount of people across those hot air balloons but the time it takes to get to their destination will be much longer.

And that was and continues to be the crux of the microservices model that has kept us not only on our monolithic trajectory but our bare metal one too. When we rent servers we are the only tenant and we get to pick the hardware. We often pick the fastest hardware available and we have been replacing our older servers with new ones that offer 3x to 4x their performance.

Meanwhile, if you look at the past 5 years of cloud and "serverless" computing such as EC2, the performance has remained pretty much the same, driven by the service provider's desire to maximise the number of customers per unit of compute resource available.

To us, speed matters. If you compare for instance our customer dashboard to companies that use cloud providers and microservices, you'll find ours loads instantly and populates with data in the blink of an eye, while some of even the largest companies like OVHcloud have you sit for upwards of 10 seconds for their customer dashboards to populate with information.

Now we don't think that microservices have no use at all. There are certainly workloads that benefit from this approach, especially data processing that needs a lot of workers and doesn't need instantaneous results, and any workload that can be accelerated by dedicated fixed-function silicon, for example video transcoding, network encryption/decryption and packet routing. All of these tasks make sense for the horizontal growth approach that serverless/microservices can provide.

But for anything customer-facing where speed and latency are paramount, we just don't see the same benefits. Users get frustrated waiting for things to load, the performance of the service isn't great overall, the costs can spiral out of control and the overhead with regards to data synchronisation can be crippling.

We hope this was interesting, we wanted to go a bit more in-depth about this topic due to our recent infrastructure posts which spurred some customers to message us and ask about why we don't use cloud providers and instead continue to use bare metal.

Thanks for reading and have a wonderful weekend.

