First new node of the year: Nyx


Today we've added a new cluster node called Nyx to serve the North American market alongside Cronus and Metis. If you read our earlier blog post here, you'll know we recently began our infrastructure expansion outside of Europe by adding servers in North America.

The aim has been to lower customer access latency by reducing the distance between our customers' servers and our own, while also maximising service availability in the event of network disruption or hardware failure.

Over the past several months we've seen our daily API request volume increase substantially, and the North American region in particular has seen a huge increase in just the past 30 days. This has been driven by an influx of new customers, and by existing customers deploying our service to their pre-existing servers in North America now that we deliver the lower access latency they required.

Due to this increase in North American requests we needed more capacity, and that's why we've added Nyx today. We're still looking for great infrastructure in Asia, which will be the next region we expand into; we're seeing heavy traffic from Japan, South Korea, Singapore and Malaysia that we want to serve better, and we hope to do so later this year.

We have been asked whether we're still using bare-metal servers or whether we've moved some of the cluster to cloud instances. We can confirm that we're still using bare-metal servers. We continue to believe the highest possible security remains with bare metal, so that's what we use for all cluster nodes. In fact, any of our servers that hold any kind of customer data or metadata are bare metal and wholly operated by us directly, which is to say we don't use managed services from a third party.

Remember to follow us on Twitter for all the latest updates; we don't tweet often, but when we do it's always worth the wait!

Thanks for reading and have a great day.


Custom Rule Library


Today we've updated the customer dashboard to add a custom rule library, which as of this writing contains 18 pre-made rules covering the most common scenarios we see rules being used for. As time goes on we'll add more rules to the library; we've made it fully extensible, so new rules can easily be created and exported to the library by our staff.

Below is a screenshot showing the library interface. As you can see it's very simple, with categories along the left, descriptions in the middle and buttons along the right to add specific rules to your account.

[Screenshot: the rule library interface]

The part that makes this new feature powerful is that the rules are editable by you. Simply add one of our rules from the library and you can then modify it just as if you had created it yourself, allowing you to get started quickly by using the library as a templating tool.

This is a feature from our roadmap that we've been planning since the custom rules feature was first unveiled to you in mid-2019, and we're very pleased to introduce it today.

We've prototyped many different approaches to creating an easier-to-use rule system, including walk-through guides and even a questionnaire-based wizard, but ultimately we settled on this library system because it's the easiest to use and teaches you how to use our advanced rule interface through working examples.

Thanks for reading and welcome to 2021!


Our 2020 Retrospective


At the end of each year we like to look back and discuss some of the significant things that happened, and this year certainly had a few.

Firstly we need to discuss the elephant in the room. The pandemic has been a constant presence since very early in the year and it shaped some of our decision making. We added four new nodes this year due to increased customer demand, and part of that increased demand was due to the pandemic changing people's online behaviour.

That is to say, as people spent more of their time online, the services they use turned to us for protection to a greater degree. We saw record numbers of customers increase their plan sizes, record numbers of new signups and record levels of conversions from free to paid plans.

Our service more than doubled in size this year by both user signups and daily usage metrics, which is significant for a service in its fourth year. This growth is somewhat bittersweet because the pandemic, which has harmed so many, is partly why we've grown so rapidly.

We had to disable our live chat support for most of this year, due in part to the high volume of chats it was generating and in part to our support staff being less available because of the pandemic. This is a feature we do intend to bring back in 2021.

When we look past these aspects of 2020, we did launch a lot of new features and enhancements to existing ones. One of the biggest new features we launched was Cross-Origin Resource Sharing (CORS), which we recently updated with enhanced functionality.

This feature's popularity took us a bit by surprise. We knew customers wanted it, as they had often mentioned client-based implementations to us even if they didn't know how to articulate it exactly. Since we launched CORS, many thousands of websites have added our service through this method, which again has been a surprise. It is by far the most popular feature we've added since the service started, with custom rules a close second.

This year also saw us add extra information to the API, including Continents and Regions. These were added not just to improve the usefulness of the API in your apps but also to assist customers who wanted to make more targeted custom rules within their accounts.

Another popular feature has been burst tokens. We added these to help customers handle those momentary spikes of traffic that can happen from an attack, going viral or launching a new product, where purchasing a higher paid plan doesn't quite make sense yet. We've had a lot of feedback about this feature and customers really seem to love the peace of mind it brings them; we've solved what we're calling query anxiety.

One of the more recent developments has been our regional expansion. Adding servers to North America had been on our roadmap for over a year, but we always had a lot of concerns about data security. When you use our service you entrust us with your data, and we didn't like the idea of that sitting on servers in the United States, specifically EU customer data on US servers. To combat this "issue" we devised an internal data access and permission system that keeps your data accessible only to the regions you're actually using.

We've always stored customer data at rest in an encrypted container format within our block-based database system, but previously every one of our servers held the keys to decrypt any of that data. With the advent of our regional nodes this has changed: each region can only decrypt specific account data (Stats, CORS, White/Black lists, Custom Rules etc.) after the customer tied to that account performs an action that necessitates accessing their data. These keys also expire regularly to revoke access when it makes sense.
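
To make the idea concrete, here's a minimal sketch in Python of the access pattern described above: a region only receives a short-lived grant to decrypt a specific account's data category when a customer action requires it, and that grant expires on its own. This is purely illustrative and not our production code; the 15-minute lifetime is an assumption, not our actual expiry policy.

import time

GRANT_TTL = 15 * 60  # assumed lifetime, for illustration only
grants = {}          # (region, account, category) -> expiry timestamp

def grant_key(region: str, account: str, category: str) -> None:
    # Called when a customer action (e.g. viewing Stats) requires this
    # region to decrypt that category of the account's data.
    grants[(region, account, category)] = time.time() + GRANT_TTL

def can_decrypt(region: str, account: str, category: str) -> bool:
    # A region may only decrypt data it holds an unexpired grant for.
    expiry = grants.get((region, account, category))
    return expiry is not None and expiry > time.time()

grant_key("north-america", "account-42", "Stats")
print(can_decrypt("north-america", "account-42", "Stats"))       # True
print(can_decrypt("europe-west", "account-42", "Custom Rules"))  # False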

Looking towards the future, it is our intention to add nodes in Asia next year, which will lower access latency to our API for that region just as we did for North America. We have also traditionally made price alterations in January, but we're not doing that this time; we feel the pricing we have right now is great, and so no price rises are currently necessary.

We know this year has been marred by disappointment due to the pandemic but we will return to normality. It's our sincerest wish that all of you are safe and healthy as we enter into the new year.

Thanks for reading!


Welcome Cronus & Metis


We're sure that if you've been following our blog this year you'll be surprised to see another new node announcement quite this soon, let alone two at once. But it's true: we're adding two new nodes today, called Cronus & Metis, and they're quite special because they're the first nodes we're activating outside of Europe.

This year has seen us increase our capacity quite a lot to meet the growing demands of our customers, and while we were intending to add new servers early next year, we've pushed up our timetable because we're seeing increased request volumes from outside Europe.

Specifically, a quarter to a third of our traffic (depending on the time of day) now originates from the United States and Canada. Having these requests traverse the Atlantic Ocean to our servers in Europe has meant our North American customers face higher than acceptable latency, so today we've added two new server nodes in Canada, right on the border with the United States.

Cronus was the Greek god of time, so it's aptly named: its only job will be to serve the North American market with the aim of reducing access latency to our API. Metis is the personification of prudence, or in more common language, cautiousness, and we're being cautious with our North American rollout by adding two servers for load balancing.

In addition to the new nodes, we've spruced up the status page a bit, breaking out the regions where our server nodes are available. At present that is Western Europe, Eastern Europe and now North America. It is our intention to add servers in the Oceania and Asia regions to serve those areas in the same way, and we will likely add such server nodes next year.

Like all our other servers, these new ones are part of our unified cluster architecture, so while all North American traffic will go to Cronus & Metis, it will seamlessly fail over to our European servers if there are any problems. Your data is synchronised between all nodes and protected from downtime without you needing to do anything.

So that's what we have for you today and we hope you enjoy this one last present before Christmas.

Happy holidays everyone!


CORS take two


Today we've released some major updates to the CORS (Cross-Origin Resource Sharing) feature found within your dashboard and we're excited to tell you about them.

Firstly, we've made some under-the-hood changes to how your origins are stored on our servers and processed by our v2 API endpoint. These should reduce retrieval time from our database and lower the latency incurred when answering a CORS-based request, especially for those of you with a large number of origins on your account.

Secondly, we've improved the import/export experience within the Dashboard. The exported CORS files are now easier to parse and edit: UUIDs have been scrapped from the process, so only domains are present within the exported files.

Thirdly, we've added wildcard support, which means that if you have a lot of subdomains you no longer need to enter them all manually; you can instead use a star to indicate that all subdomains and the main domain should be allowed to use CORS for your account (for example, login.site.com can become *.site.com).
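
To illustrate the matching rule described above, here's a minimal sketch in Python (an illustration of the behaviour, not our implementation): a wildcard entry covers the main domain and every subdomain, while a plain entry must match exactly.

def origin_allowed(host: str, stored: str) -> bool:
    # Wildcard entries cover the main domain and all of its subdomains;
    # exact entries must match exactly.
    if stored.startswith("*."):
        base = stored[2:]
        return host == base or host.endswith("." + base)
    return host == stored

print(origin_allowed("login.site.com", "*.site.com"))  # True
print(origin_allowed("site.com", "*.site.com"))        # True
print(origin_allowed("evil-site.com", "*.site.com"))   # False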


Fourth, we've finally added a Dashboard API endpoint (currently in beta but accessible to all customers) which allows you to list, add, set, remove and clear your origins. Crucially, it also allows large batch changes for both adding and removing origins, which supports usage at scale. You can view all the documentation for this here.

So those are the four changes to CORS. We know you'll find them useful, especially the API and the wildcard support, both of which have been frequently requested by customers. One last thing: if you intend to use the CORS API, please report any issues you come across and we'll work to remedy them quickly.

Thanks for reading and have a great weekend!


Introducing Burst Tokens!


Today we've launched a major new feature called Burst Tokens which allow our customers to make even greater use of their plans without needing to lift a finger.

For a long time we've had customers coming to us with a simple problem. Most of the time their usage fits within their plan size but sometimes they have bursts of activity which go beyond their plan size. This is a problem because it doesn't make economic sense to increase your plan size just for those one or two days a month when you need a few more queries.

This scenario plays out fairly often especially with websites that receive unexpected viral traffic and game servers which are often targeted by DDoS attacks from disgruntled players.

So to solve this problem for our customers we've introduced Burst Tokens, which while active allow you to exceed your daily allowance by five times until the next daily reset. Best of all, tokens are redeemed on your behalf automatically when you go over your daily allowance.

[Screenshot: burst token availability shown in the dashboard for a Pro plan]

You'll receive tokens on the 1st of every month, and only a single token can be consumed each day, with your plan dictating how many tokens you're granted. Free customers on our 1,000 daily query plan are given one token to use each month, while our Starter plans get 3, our Pro plans (as illustrated above) get 5, our Business plans get 6 and our Enterprise plans get 7.

As we said above, you can go over your daily allowance by 5 times when a token is consumed. So if you're on our first paid tier, which is our 10,000 daily query plan, and you happen to go over that, a token will be automatically redeemed and your daily allowance for the remainder of that day will be 50,000 queries.
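
As a quick worked example of that arithmetic (a sketch only, not our billing code), the effective allowance is simply the plan's daily allowance multiplied by five while a redeemed token is active:

BURST_MULTIPLIER = 5

# Monthly token grants per plan, as listed above.
TOKENS_PER_MONTH = {"Free": 1, "Starter": 3, "Pro": 5, "Business": 6, "Enterprise": 7}

def effective_allowance(daily_allowance: int, token_active: bool) -> int:
    # While a burst token is consumed, the allowance is 5x for the rest of that day.
    return daily_allowance * BURST_MULTIPLIER if token_active else daily_allowance

print(TOKENS_PER_MONTH["Pro"])                          # 5 tokens granted per month
print(effective_allowance(10_000, token_active=True))   # 50000
print(effective_allowance(1_000, token_active=False))   # 1000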

At this point you're probably wondering if this is a new paid feature, and actually it's not. All past, present and future customers with an account will have access to the new token feature and, in fact, by the time you're reading this you'll be able to see your available tokens in your customer dashboard. We've also updated our usage dashboard API endpoint with burst token availability.

And that's all there is to it: a free upgrade on us to help supercharge the plan you already have. Don't worry, we'll still send you the normal usage emails when you go over your plan's daily allowance, but they'll now also detail whether a burst token was used so you know if it's time to upgrade your plan or if it's just a spike in usage that your tokens can handle.

With the launch of this feature we have released a new version of our API v2 dated November 17th. If you already have your version set to use the latest stable API version you will be using this version automatically; otherwise you can select it within the customer dashboard. We're not expecting any implementation breakages, but some of the status code messages have changed wording to indicate whether a burst token has been consumed.

If you have any questions about the new feature, as always, contact us; we love to hear from you.

Thanks for reading, stay safe and have a wonderful day!


Welcome Aura


It's hard to believe only 9 months have passed since we introduced our Eos server node and yet we're already introducing another new server node to our cluster.

This year has been filled with difficulties as the world continues to grapple with the COVID-19 pandemic. As a result, more people than ever before have turned to the internet for communication with loved ones, entertainment, education and work.

As our company helps individuals and businesses protect their infrastructure, we too have seen demand for our services grow. In fact, we broke every record we held this year. Monthly, weekly and daily signup records were easily broken multiple times, as were our daily query volume records. We saw record levels of user activity on the website, and general enquiries about the service from potential customers increased enormously.

This is why it's so important to continually invest in our infrastructure. The previous blog post explained how we added multiple high-end servers for post-processing inference so that our proxy detection can continue to be the best available. Today we continue that focus by adding a new high-performance server node to our cluster.

Aura is the Titan goddess of the breeze and the fresh, cool air of early morning. It is also now our most powerful server node, featuring a high-performance AMD Zen 2 processor. This is the beginning of a new platform for us: this single server is the equivalent of three of our 1st-generation server nodes in raw compute power, giving us enormous room to grow.

It is our intention to replace all of our 1st- and 2nd-generation infrastructure with nodes of this capability and to keep the cluster at around 10 servers or fewer spread around the globe, offering redundancy against not only individual system failure but also geographic problems such as international fibre optic cable damage. We already make use of multiple datacenters spread across Europe and will expand on this as we add more systems to the cluster.

At the moment Aura is in the final stages of provisioning where we perform rigorous tests to make sure it's up to our standards. So far it's looking good and we're expecting Aura to answer its first customer queries starting tomorrow.

Thanks for reading, stay safe and have a great day!


Post-Processing Inference Infrastructure Update


Today we would like to share some updates regarding our machine learning infrastructure geared towards post-processing. This is where you send us an IP address to be checked and, after we give you an immediate answer, we put it into a large pool of addresses to be examined later, where time is no longer an issue.

In February 2019 we made a blog post about a new server we introduced called STYX, which was designed to handle all post-processing inference and free up resources on our core cluster so it could spend more time answering queries instead of processing data.

[Diagram: the three cluster nodes of early 2019 feeding data into STYX]

You can see above a graphic we shared within that post illustrating how our (at the time) three cluster nodes would feed data into STYX to be processed by its many processing cores.

Since then the volume of addresses we process every day has increased enormously. To keep up with this growth we've increased our cluster from 3 to 5 servers, replaced our weakest servers with stronger ones and gone to extreme lengths of code optimisation, all of which has accommodated our growth without spending obscene amounts of money on cloud providers.

But coming back to STYX, we did hit a problem there. No amount of code optimisation can get around the fact that there are simply too many addresses to process on one system. We put in some stop-gap measures by creating a ratio system where only half of all addresses were tested, then a third, a quarter and finally only a fifth. Had we continued in this manner, eventually only a tenth of all addresses would have been processed by the post-processing engine on STYX.
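
For illustration, here's one way a 1-in-N ratio like that could be applied deterministically. This is a sketch only; the post doesn't describe exactly how STYX sampled addresses, and the example IPs below are placeholders.

import hashlib

def selected_for_processing(address: str, ratio: int) -> bool:
    # Keep roughly 1 in `ratio` addresses, chosen by hashing so the
    # same address always receives the same decision.
    digest = hashlib.sha256(address.encode()).digest()
    return digest[0] % ratio == 0

# At a 1/5 ratio only about 20% of incoming addresses reach post-processing.
incoming = ["203.0.113.7", "198.51.100.23", "192.0.2.55"]
to_process = [ip for ip in incoming if selected_for_processing(ip, ratio=5)]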

And so that brings us to today's post, where we have invested in an entirely new range of infrastructure dedicated to inference. It consists of various servers with various core counts; some of the largest servers we've acquired for this feature dual 18-core Xeons. In fact, our inference infrastructure is now several times more powerful than the cluster that answers customer queries.

STYX is still with us, but it has been repurposed as a job scheduler. It now monitors all of the inference infrastructure, hands out jobs as needed and retrieves the results. We created a fun little visualiser for ourselves to see what STYX sees as it hands out work, which we thought would be interesting to show below.

[Screenshot: the STYX job scheduler visualiser]

So what is the net benefit of all this work? Well, the main thing is that we can once again fully examine every single address we receive from customers within our post-processing inference engine, and we can easily add more servers to the inference infrastructure as needed in the future, which is something we will need to do as the service becomes ever more popular.

One of the quickest ways to see the results of our new infrastructure is to check out the threats page. This is where we post only the addresses our post-processing inference engine found to be proxies, displaying a random assortment of the most recent few hundred. It wasn't so long ago that all the entries on that page showed as last being seen 8 to 12 hours ago, but with the new engine steaming through data we're discovering more proxies per hour than we used to discover per day.

This is why you'll see that a lot of the addresses on there were last seen just an hour ago or less. Being able to obtain knowledge of proxies like this that are "undiscovered" on the wider web (ones we've discovered that aren't yet posted publicly on message boards, blogs and websites) is important to us, as these proxies are perhaps the most dangerous and are most likely being abused by the individuals who set them up in the first place (often on hacked remote servers and IoT devices).

In addition to broadening our infrastructure, we also rewrote the way we synchronise information within our cluster. We found that with so much data being updated per second there were some bottlenecks, which we were able to completely solve several days ago.

Some of this was caused by the immense volume of data changes the new infrastructure can produce at once, and some of it came to light when internet problems affecting one of our cluster nodes meant it had more data to synchronise than usual once it came back online. During that process we noticed it wasn't able to reach parity with the other nodes even after several hours, simply because of how much data was changing while it synchronised.

So that's what we wanted to share with you today: bigger and better infrastructure that leads to tangible improvements in proxy detection.

Thanks for reading and have a great week!


Building a better Detective


One of the challenges a service like ours faces is the existence of anonymising services that specifically go out of their way to obscure their infrastructure. And so while it's easy to detect most addresses and their suppliers, there's always a small percentage that slips by.

Which is why this past month we created a list of these difficult-to-index suppliers and went about building tools tailored specifically to scan and verify the addresses they offer. Traditionally, when we want to scan a provider, they offer a webpage of addresses or hostnames, which makes it easy to scan them and correlate what we find across our honeypot network and other scraped websites; this is part of our collect-and-verify strategy.

But some of these, let's say, hardened providers mask their addresses behind signup pages, paid memberships or other means. For instance, it's becoming very common for VPN providers to only show you their server addresses once you've signed up and paid for service, and with there being hundreds of VPN suppliers, paying for all those subscriptions isn't really commercially viable.

Even the free providers are becoming more shrewd, inserting randomly generated addresses within their legitimate address pools to thwart page scraping. Some sites only show you addresses once you verify you're not a bot by solving a captcha, or require a JavaScript engine to decode the addresses before they're rendered on the webpage.

All of these are things we worked to solve this month with what we're calling our Detective. It's a new module within our custom scraping engine which allows for a lot more thought during collection and processing. The results have been quite promising, with our list of detected proxies and virtual private networks steadily increasing since it went live.

Some of its features include:

  1. Web and non-web collection for anonymising services that only offer an application for accessing their network of servers.
  2. JavaScript engine for solving any kind of proof-of-browser anti-bot measures during address collection.
  3. Captcha solving support using image recognition, with a fallback to human-based solving.
  4. Bad/fake/generated address discarding through time-based observation and frequency of appearance.
  5. Pattern recognition for indexing VPN providers' infrastructure based on a few hand-entered sample hostnames (see the sketch after this list).
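
As a rough illustration of that last item, here's a toy sketch in Python (not the Detective's actual code) of how hostname templates could be generalised from a few hand-entered samples and then expanded into candidates to probe. The hostnames below are placeholders.

import re

def generalise(hostnames):
    # Replace digit runs with a placeholder to derive hostname templates.
    return sorted({re.sub(r"\d+", "{n}", h) for h in hostnames})

samples = ["us1.vpn.example.com", "us2.vpn.example.com", "de7.vpn.example.com"]
templates = generalise(samples)
# ['de{n}.vpn.example.com', 'us{n}.vpn.example.com']

# Expand each template into candidate hostnames to resolve and scan.
candidates = [t.replace("{n}", str(i)) for t in templates for i in range(1, 11)]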

There are some very well-known providers, constantly being abused, that have employed one or more of the above tactics to make it difficult for services like ours to get a full picture of their infrastructure, but the new system we've devised has been able to break through all of these approaches.

As always, if you've come across an address, range or service provider we don't yet detect, please contact us; we really do investigate every lead sent to us by customers.

Thanks for reading and have a great week!


API Version Selector added to the Dashboard


After we updated our API yesterday, it became apparent that some customers had come to depend on our type responses only being present for positive detections, which is understandable because we hadn't used this field for clean addresses before.

Because of this we've decided to bring forward the launch of the API version selector, which we mentioned in yesterday's post, to today. This means you can now choose which version of our API you want to use (by the dates of major revisions), and when you select a version you'll get a neat explanation of what changed compared to the previous dated version, as in the screenshot below.

[Screenshot: the API version selector with its change notes]

We are also providing a way for you to select which version gets used by adding &ver=date to the end of your queries, for example &ver=17-June-2020. When you do that you'll get a version response back from the API like so:

{
    "version": "17-June-2020",
    "status": "ok",
    "98.75.2.4": {
        "proxy": "no"
    }
}

This lets you know the version you requested was in fact the version you received. The version indicator will not be present if you're using the latest version of the API via the selection box in the dashboard (the default selection) or if you've not provided the &ver= flag with your query.
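
For example, pinning a query to the 17-June-2020 revision might look like the Python sketch below. The base URL and the key parameter here are placeholders we've assumed for illustration, not our real endpoint; check the API documentation for the exact query format.

import requests

BASE_URL = "https://api.example.com/v2"  # placeholder endpoint, not the real one
params = {"key": "your-api-key", "ver": "17-June-2020"}  # key parameter assumed

response = requests.get(f"{BASE_URL}/98.75.2.4", params=params)
data = response.json()

# The returned "version" field confirms which dated revision answered the query.
assert data["version"] == "17-June-2020"
print(data["98.75.2.4"]["proxy"])  # e.g. "no"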

We hope this will help customers to plan for future changes to the API so they can upgrade their implementations when they're able to do so instead of on our release schedule.

Thanks for reading and have a great week!

