COVID-19 and proxycheck.io

Hello everyone

As you are all likely aware by now, the world is currently gripped by a global pandemic caused by an infectious disease known as COVID-19. As of right now many countries are in lockdown and many more are in the process of shutting down all non-essential travel to slow its spread.

At proxycheck.io we operate in one such country, which is currently in lockdown with all non-essential travel no longer permitted. Please don't worry about us though, we're doing perfectly fine working from home.

Over the past few weeks you may have noticed our live support chat has been unavailable and we've only been accepting support requests sent to us via email. This is directly due to the disease as our live support staff have been told to stay home for the safety of themselves and their families.

At the same time, with so many people around the world staying home because of the disease, the volume of queries we're handling has increased quite significantly. Our daily peak traffic hasn't changed too much but the surrounding low periods have risen to meet our peaks. We have more than enough capacity for this extra traffic and so the service has remained completely stable.

However, this increased traffic has led to an extra burden on our reduced support presence, as many of our customers have been upgrading their plans to get access to more daily queries and these plan alterations are currently done manually by our staff. In fact, we've seen more customers upgrade their plans in the past two weeks than in the previous several months combined.

And so that's where we are today. The service is handling its extra traffic fine, we're still continuing to work on everything and support is still available via email like normal, although replies may be a little more delayed than usual. The live support chat isn't currently available but feel free to use it when you see it accessible again.

Looking to the near future, we hope this disease will be under control soon. It hurts us deeply to see so many suffering. Please do listen to your country's officials and heed all their advice, just like we're doing here at proxycheck.io.

Thanks for reading and stay safe!

v1 API has reached end of life

Today we're officially ending all support for our v1 API. We first announced we were doing this way back on March 17th 2018 and we've been showing a notice within customer dashboards since that time if you've made any recent v1 API calls.

Now to be clear, we're not removing the v1 API endpoint but we are no longer guaranteeing that it will remain functional and available. In addition to that all of the new features we released over the past 9 months have only been accessible through our v2 API endpoint and this will obviously continue to be the case as we roll out further new features.

As an example some highly requested features such as Custom Rules and CORS (Cross-Origin Resource Sharing) have only been accessible through our v2 API endpoint since they launched.

And so if you're one of the 0.36% still calling our v1 API endpoint, now is the time to switch. To convey the urgency we're changing the wording and colour of our dashboard notice to make it more prominent to users. An example of this notice has been included below.

Image description

We know these kinds of changes can be stressful when they're made without fair warning. This is why we spent the previous two years giving customers a lot of notice, and we're quite happy to see most customers transitioned before today: 99.64% of you, in fact. This is no doubt due to the many wonderful developers who've updated their integrations over the prior 24 months to utilise our current v2 API.

At present we do not have any plans to make another change of this type. We feel the v2 API format is very robust and extensible, allowing us to add new features without jeopardising backwards compatibility. So in short, update with confidence: we won't be doing another API format change any time soon.

Thanks for reading and have a great week.

A world of Caches

Probably the biggest obstacle to overcome when operating a popular API such as ours is having the hardware infrastructure and software architecture to handle the millions of requests per hour that our customers generate.

Careful selection of our hardware combined with our extensive software optimisations has resulted in our ability to operate one of the internet’s most utilised APIs without turning to hyper-scale cloud hosting providers. That's important as it has allowed us to remain one of the most affordable APIs in our space.

Today we wanted to talk about one very important part of our software architecture which is caching. We use caching not just for local data that is accessed often by our own servers but also at our content delivery network (CDN) to deliver repeatable responses.

During February we took an extensive look at all of our different levels of caching to see if more optimisations were possible, and we found that they were. We’ve also created a new feature we’re calling Adaptive Cache that by the time you read this will be enabled across all our customer accounts for your benefit.

So before we get into the new feature let’s just quickly detail the three key areas where we employ caching today.

Server-side Code Caching

When our own code is first interpreted and executed the result is stored on our servers in memory as opcode and then those stored instructions are re-run directly instead of going through our high-level code interpreter.

This results in massive savings in both computation and time. In fact, if we didn’t do this a single request to our servers would take between 1.4 and 2 seconds instead of the 1ms to 7ms requests take currently.

Server-side Data Caching

Whenever you make a request to our servers and we have to access data from a database or compute new information from data held in a database we cache all of it. We cache any data we requested from our database and the computed answers you received.

This also dramatically increases performance as database operations are much slower than accessing things stored in memory and similarly it’s much faster to retrieve a computed answer from memory than it is to compute it again from the raw elements. This is one way we’re able to do real-time inference so quickly.

Remote CDN Caching

Whenever a request is made to our service our CDN stores a copy of our response, and if the exact same request is made to us (same API Key, IP Address being checked, Flags etc) then the CDN simply re-serves that prior stored response, but only if both requests were made in the same 10 second window.
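As a rough illustration of the behaviour described above, the sketch below keys a cache on the API key, the IP being checked and the flags, and re-serves a stored response only inside a fixed window. This is not the CDN's or our cluster's actual implementation: the class name `WindowCache`, its methods and the `origin` callback are all hypothetical, and it assumes the window simply starts when a response is stored.

```python
from typing import Callable, Dict, Mapping, Tuple

CacheKey = Tuple[str, str, Tuple[Tuple[str, str], ...]]


class WindowCache:
    """Toy edge cache: re-serves a stored response for identical requests
    that arrive within `window_seconds` of the response being stored."""

    def __init__(self, window_seconds: float = 10.0) -> None:
        self.window_seconds = window_seconds
        self._store: Dict[CacheKey, Tuple[float, str]] = {}

    @staticmethod
    def make_key(api_key: str, ip: str, flags: Mapping[str, str]) -> CacheKey:
        # Including the API key keeps cached responses unique to each customer.
        # Sorting the flags makes the key independent of the order they were sent in.
        return (api_key, ip, tuple(sorted(flags.items())))

    def fetch(self, api_key: str, ip: str, flags: Mapping[str, str],
              origin: Callable[[], str], now: float) -> Tuple[str, bool]:
        """Return (response, served_from_cache)."""
        key = self.make_key(api_key, ip, flags)
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.window_seconds:
            return cached[1], True
        response = origin()  # not cached or too old: ask the cluster
        self._store[key] = (now, response)
        return response, False


cache = WindowCache()
answer = lambda: '{"status": "ok"}'  # stand-in for a real cluster response
first = cache.fetch("demo-key", "98.75.2.4", {"asn": "1"}, answer, now=0.0)
repeat = cache.fetch("demo-key", "98.75.2.4", {"asn": "1"}, answer, now=4.0)
late = cache.fetch("demo-key", "98.75.2.4", {"asn": "1"}, answer, now=15.0)
print(first[1], repeat[1], late[1])  # False True False
```

Only the second request is answered from the cache: the first had nothing stored yet and the third arrived after the 10 second window had passed.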

This is one of the most important levels of caching for our service when it comes to maximising the efficiency of our infrastructure because as you’ll see below we receive a lot of duplicate queries, mostly from customers not using client-side request caching.

So those are the three main ways in which we utilise caching: Code, Data and Content. The main way we determine if our caching is working well is by monitoring cache hit rates, which simply means how often the cache contains what we asked it for.

Ideally you want the cache hit rate to be as high as possible, and now we would like to share some numbers. Firstly, code caching. This should be the type of caching with the highest hit rates because our code doesn’t change very often. At most a few source files are altered daily.
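To make the idea of a hit rate concrete, here is a minimal sketch that uses Python's standard `functools.lru_cache` as a stand-in for an in-memory data cache and reads the hit and miss counters it keeps. The function `expensive_answer` is made up for the example and has nothing to do with how our servers actually compute results.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def expensive_answer(ip: str) -> int:
    # Stand-in for a database read followed by some computation.
    return sum(int(part) for part in ip.split("."))


def hit_rate(hits: int, misses: int) -> float:
    """Percentage of cache lookups that found what was asked for."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


for ip in ["98.75.2.4", "98.75.2.4", "10.0.0.1", "98.75.2.4"]:
    expensive_answer(ip)

info = expensive_answer.cache_info()
print(f"{hit_rate(info.hits, info.misses):.2f}%")  # 2 hits, 2 misses -> 50.00%
```

The same formula applies to any cache that counts hits and misses: 9,966 hits out of 10,000 lookups, for example, is a 99.66% hit rate.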

Image description

And as expected we have a cache hit rate of 99.66%. The 0.34% of misses are from seldom accessed code files that execute only once every few hours or days.

Image description

For data our hit rate is also quite high at 31.88% as seen above. This is mostly due to us having servers with enormous pools of memory dedicated to caching. In fact, all our servers now have at minimum 32GB of memory and we usually dedicate around a third of that to data caching (this is tweaked per-server to make the most of the hardware present at each node, for example one of our nodes has 256GB of memory shared across two processors and a larger cache is more appropriate there).

Image description

Finally, and perhaps most surprising to our readers, there is our CDN cache hit rate. At 52.15% it’s extremely high. This means that for every two requests we receive, roughly one of them is requesting data we already provided very recently (within the past 10 or 60 seconds depending on certain factors).

The reason we say this is extremely high is that an API like ours receives so many unique requests (literally millions every hour) that it’s odd for so many of them to be duplicates, especially when our CDN cache is unique to each customer, meaning one customer will never receive a cached result generated by another customer’s request.

So what causes this? It happens to be the case that many of our customers are calling the API multiple times with the exact same request due to a lack of client-side caching. The common scenario is a visitor comes to your website and you check their IP. They load a different page on your website and you check their IP again because the first result was not saved locally and cannot be used for the second page load. This generates two identical requests to our API, the first answered directly by our cluster and the second served from our CDN only.

Now the good news is that the CDN we’re using (Cloudflare) is built to take this kind of traffic, and since they have datacenters all over the world, getting a repeated answer from them is usually going to be faster than getting it from our cluster. The other benefit is that it saves you queries: we do not count anything served only from our CDN cache as a query, so they’re essentially free.

And so that brings us to today’s new feature, which we’re calling Adaptive Cache. Prior to today we only cached requests made by registered users for a maximum of 10 seconds at our CDN. But with our new Adaptive Cache feature we’re now going to adjust the caching per-customer dynamically based on how many requests you’re making per second and how many of those requests are repeatable. This will save you queries and thus money, and help us more efficiently utilise our cluster by answering more unique queries and spending less time handing out duplicate responses.

Essentially, if you make a lot of repeatable requests but some of them are spread out too far from each other to fit within the 10 second CDN cache window, we’ll simply increase the window size so your cache hit rate becomes higher. But don’t worry, we’ll only adjust it between 10 seconds and 60 seconds.
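We haven't described how the window is actually chosen, so the following is only one possible heuristic, shown to make the idea tangible. It assumes we have measured the gaps (in seconds) between identical requests from one customer, then picks the smallest window that would have caught most of the repeats, never going below 10 or above 60 seconds. The function name, the `coverage` parameter and the 90% default are all assumptions made for this sketch.

```python
import math
from typing import Sequence

MIN_WINDOW = 10  # seconds, the lower bound mentioned above
MAX_WINDOW = 60  # seconds, the upper bound mentioned above


def adaptive_window(repeat_gaps: Sequence[float], coverage: float = 0.9) -> int:
    """Smallest whole-second window in [10, 60] that would have served
    `coverage` of the repeats that any allowed window could have caught."""
    catchable = sorted(gap for gap in repeat_gaps if gap <= MAX_WINDOW)
    if not catchable:
        return MIN_WINDOW  # nothing to gain from a longer window
    index = max(0, math.ceil(coverage * len(catchable)) - 1)
    needed = math.ceil(catchable[index])
    return max(MIN_WINDOW, min(MAX_WINDOW, needed))


print(adaptive_window([2, 3, 4]))           # 10: repeats already fit the default
print(adaptive_window([3, 8, 22, 27, 41]))  # 41: a longer window catches more
print(adaptive_window([120, 300]))          # 10: repeats are too far apart to help
```

Choosing the smallest window that reaches the target keeps the caching time short, which fits the goal of account changes showing up in responses quickly.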

It’s completely transparent to users and the system will always try to minimise the caching time so that changes you make in your account (whitelist/blacklist changes or rule changes for example) will be more immediately reflected by our API responses.

And so that brings us to the end of what is a very long article on caching. If you want to optimise your own client that uses our API, we highly recommend adding some kind of local caching. Even 30 to 60 seconds can save a considerable number of queries and make your application or service feel more responsive for your own users.
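For anyone wondering what such local caching might look like, here is a minimal sketch. It wraps whatever function you already use to query the API (represented here by the made-up `fake_lookup`) and reuses an answer for the same IP for up to 60 seconds. The class and function names are illustrative only and are not part of any official client library.

```python
import time
from typing import Callable, Dict, Tuple


class CachedChecker:
    """Wraps any IP lookup function with a small time-limited local cache."""

    def __init__(self, lookup: Callable[[str], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def check(self, ip: str) -> dict:
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # reuse the earlier answer, no API query made
        result = self._lookup(ip)  # one real query to the API
        self._entries[ip] = (now, result)
        return result


api_calls = []


def fake_lookup(ip: str) -> dict:
    # Stand-in for a real HTTP request to the API.
    api_calls.append(ip)
    return {"status": "ok", ip: {"proxy": "no"}}


checker = CachedChecker(fake_lookup, ttl_seconds=60.0)
checker.check("98.75.2.4")  # visitor loads the first page
checker.check("98.75.2.4")  # same visitor loads a second page
print(len(api_calls))       # 1
```

This sketch never evicts old entries, so a long-running service would want to prune expired ones or cap the size of the dictionary.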

Thanks for reading and we hope everyone is having a great week!

Introducing Cross-Origin Resource Sharing!

Today we're introducing a new feature that we have been asked for frequently since our service began: the ability to query the API through client-side JavaScript in a web browser.

On the surface this feature may seem quite simple, just allowing the API to be queried through a web browser. But securing the system so that your API Key is never put in jeopardy, while maintaining the integrity of our service and keeping it easy to use, required some thought and engineering effort.

The way it works is simple. You go into your dashboard and click on the new CORS button. There you'll receive a new public API key intended to be used in client-side only implementations of our API. Below that you'll find a field where you can enter the origin addresses for all your client-side requests to our API.

Image description

Client-side implementations use the same endpoint as our server-side API and just make use of your new public key. This lets our API know you're making a client-side request, which will lock the API to checking only the requester's IP Address. It also tests the origin (domain name) of the request against the ones you entered into your dashboard.

All queries made this way will accrue against your private API Key automatically and appear in your dashboard the same way that server-side requests do. In fact, you can make both server-side and client-side requests to the API at the same time, allowing you the flexibility to use the right implementation in the right place.
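The two checks described above can be pictured with a small sketch. This is not our server code: the function names, the shape of `allowed_origins` and the sample key are invented for the example, and it assumes the browser's Origin header is what gets compared against the dashboard entries.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit


def origin_host(origin: str) -> str:
    """Reduce an Origin header such as 'https://Example.com' to 'example.com'."""
    host = urlsplit(origin).hostname
    return host if host else origin.strip().lower()


def ip_to_check(public_key: str, origin_header: str, requester_ip: str,
                allowed_origins: Dict[str, Set[str]]) -> Optional[str]:
    """Return the only IP a client-side request may check, or None to refuse."""
    allowed = allowed_origins.get(public_key)
    if allowed is None:
        return None  # not a known public key
    if origin_host(origin_header) not in allowed:
        return None  # origin was not entered in the dashboard
    return requester_ip  # locked to the requester's own address


allowed_origins = {"public-demo-key": {"example.com", "shop.example.com"}}
print(ip_to_check("public-demo-key", "https://example.com", "203.0.113.7", allowed_origins))
print(ip_to_check("public-demo-key", "https://attacker.example", "203.0.113.7", allowed_origins))
```

The first call prints the requester's own address and the second prints `None`. Note that the function never accepts an IP chosen by the caller, which is the point of locking client-side requests to the requester's address.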

Since you're making requests to the same endpoint as server-side requests you get access to all the same features. You can use all our query flags like normal and gain access to all the same data such as location data, service provider information and more. It even supports your custom rules automatically.

So what does change when using the client-side request method? The main thing is a downgrade in security. If you choose to block a website visitor using only JavaScript it's possible for your visitor to disable JavaScript or modify the script on the page to circumvent the block.

And so if unwavering security is something you require then the server-side implementation is still the way to go and is still our recommended way to use our API. But if you have a website that doesn't make it easy to integrate a server-side call to our API or you lack the expertise to perform such an implementation then our client-side option may be appropriate.

To make it as easy as possible to utilise the client-side method we've written some simple JavaScript examples for both blocking a visitor and redirecting a visitor to another webpage. You'll find both examples within your dashboard under the new CORS tab, an example of which is shown below.

Image description

The last thing we wanted to discuss was origin limits. Several years ago we added an FAQ to our pricing page containing a question about website use limits which we've quoted below.

Do I need to purchase a plan for each individual website I want to protect?

No, you can simply purchase one plan and then use your API Key for every website you own. This applies to both our free and paid plan account holders.

We know how frustrating it is when you sign up for a service and they apply arbitrary limits. No one wants to sign up for multiple accounts and we've never wanted to push complex multi-key management or licensing on our customers. And that is why our new CORS feature has no origin limits. Simply add as many origins to your dashboard as you need.

If you visit your Dashboard right now you'll find the new CORS feature is live and ready to be used. We do consider this feature beta so you may come across some minor bugs and we welcome you to report those to us using the contact us page on our website.

Thanks for reading, we hope you'll find the new client-side way to query our API useful and have a great weekend!

Hello from Eos

Today we've added a brand new server to our cluster and like our other servers this one is also named after a Greek Titan, specifically the goddess of dawn who rose each and every morning. Today our new server Eos will be rising to fill a great need within our cluster for more compute resources.

The reason is that over the past week we've received an unprecedented increase in very large customers, which has put an increased load on the cluster. The performance has remained stable and fast but we always want to have a large buffer of extra capacity so that in the event one or more server nodes have an issue the service is never disrupted.

And so Eos provides us with that extra buffer space. In fact, it's now our second fastest server in the cluster and later this year we will be transitioning some of our weaker nodes to stronger ones. Our intention is to double CPU performance and memory capacity on three of the five nodes before the end of this year.

A side benefit to adding Eos to the cluster is that our per-second API request limits have risen automatically by 25%, so instead of being able to make 500 queries to the cluster per second you can now make 625. This should help those of you who see very high burst activity during attacks on your services and infrastructure.
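The arithmetic behind those figures is easy to check. The second half of the sketch goes one step further and assumes that the limit scales evenly with the number of nodes and that the five nodes mentioned above already include Eos. Neither assumption is stated outright in this post, so treat that part as a plausible reading rather than a description of how the limit is enforced.

```python
from fractions import Fraction

old_limit = 500  # queries per second before Eos
new_limit = 625  # queries per second after Eos

increase = Fraction(new_limit - old_limit, old_limit)
print(f"{float(increase):.0%}")  # 25%

# Assumption: the limit is an equal share per node and Eos is the fifth node.
nodes_after = 5
nodes_before = nodes_after - 1
per_node = old_limit // nodes_before
print(per_node, per_node * nodes_after)  # 125 625
```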

We hope everyone had a great January just like we did and thank you for reading!

Starter Plan Pricing Change

Today we've increased the price of two of our starter plans and we would like to show you the changes and explain why we've made them. But before we do that please note these pricing changes only apply to newly started plans which means if you're already subscribed to either of the plans we've increased prices on you will remain on the previous plan pricing.

Image description

In the above image you can see we've increased the price of our first starter plan by $1 and our second starter plan by 50 cents. The reason we've increased these plan prices is because while these starter plans are by far the most popular plans we offer they do not generate a lot of revenue for the business.

We have to acquire 50 customers at $1.99 each to bring in the same revenue as a single customer at $99.99 but those 50 customers take up a lot more of our human resources due to the high support needs they generate. So to help offset these support costs we have decided to increase these plan prices.

Another solution we considered was reducing the starter plan size from 10,000 to 5,000 daily queries but we found most of the users purchasing the $1.99 plan still fell under 5,000 daily queries and so it wouldn't solve the problem of our high support costs vs revenue generated by those users. In short, they would be unlikely to upgrade their plan to access more queries.

Price increases are not something we ever want to do as they raise the barrier to using our service, but in this case our business needs dictated that this change be made. And of course these new prices only apply when you begin a new plan. Users already subscribed will not be subject to the increased pricing while their plan remains active.

Thank you for reading and we hope everyone is having a great start to the new year.

One more thing...

We thought the previous post would be our last update for the year but there is one more thing. Today we've added support for continent data throughout our service, which means that when you use &asn=1 you'll now receive a response like this:

{
    "status": "ok",
    "98.75.2.4": {
        "asn": "AS6389",
        "provider": "AT&T Corp.",
        "continent": "North America",
        "country": "United States",
        "latitude": "37.751",
        "longitude": "-97.822",
        "isocode": "US",
        "proxy": "no"
    }
}

The new field is obviously the one labelled continent. We've also added support for this within our custom rules feature as a new condition and output variable. This will allow easier rules that target all the countries within a continent without needing to break out the atlas and enter each country name manually.
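If you'd rather act on the new field in your own code than in a custom rule, a short sketch like the one below shows one way to read it. It parses the sample response shown above and picks out the addresses located in a given continent. The helper names are made up for the example, and it assumes every key other than "status" maps to the details for one IP.

```python
import json
from typing import Dict, List

SAMPLE_RESPONSE = """
{
    "status": "ok",
    "98.75.2.4": {
        "asn": "AS6389",
        "provider": "AT&T Corp.",
        "continent": "North America",
        "country": "United States",
        "latitude": "37.751",
        "longitude": "-97.822",
        "isocode": "US",
        "proxy": "no"
    }
}
"""


def ip_records(payload: dict) -> Dict[str, dict]:
    """Keep only the per-IP objects, skipping plain fields such as "status"."""
    return {key: value for key, value in payload.items() if isinstance(value, dict)}


def in_continent(payload: dict, continent: str) -> List[str]:
    """List the checked IPs whose continent field matches."""
    return [ip for ip, record in ip_records(payload).items()
            if record.get("continent") == continent]


payload = json.loads(SAMPLE_RESPONSE)
print(in_continent(payload, "North America"))  # ['98.75.2.4']
print(in_continent(payload, "Europe"))         # []
```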

This was a requested feature by several customers and we were more than happy to oblige :)

Improving our Raven

As the year comes to a close we thought it would be interesting to share with you the story of Raven which is our internal codename for our current inference engine. Currently Raven is at v1.32 and runs all day on every one of our servers and it consists of three parts.

  1. The training engine which generally takes just under a month to create a new model which is then loaded on all our servers.
  2. The real-time Raven client which runs on your queries live within only a few milliseconds.
  3. The post-processing Raven client which runs on STYX, our server dedicated to inference.

This is what Raven looks like when we're training a new model. Fair warning, it's not very visually interesting.

Image description

You can see in the above screenshot we started this model in early December and it is yet to complete. We're expecting it to finish around Christmas time. When developing the engine we had to overcome quite a few obstacles. Some of those we overcame by throwing more physical resources at the problem (cores, memory, storage). Others we had to solve with better software.

The first iteration of the Raven client was single threaded and we focused on acquiring servers with very high single thread performance (high IPC and clock speeds). We knew the industry wasn't moving in this direction and was building processors with higher core counts rather than increasing instructions per clock.

So after a lot of redevelopment the Raven clients that run on each of our cluster nodes were re-engineered to use multiple threads. We even added support for NUMA (non-uniform memory access), allowing us to efficiently make use of multiple processors in a single system. Our PROMETHEUS node for example is a dual-socket Xeon system and it's where we primarily train the Raven engine once a month. It also runs the Raven real-time client since this server also acts as one of our main cluster nodes which answers customer queries.

Another issue we've had to overcome is the engine's determination throughput. Almost every day we deal with more queries than the day previous as the proxycheck service becomes ever more popular. In fact, it's not uncommon to break our single highest query day records several days in a row. So with all this constantly growing traffic the engine, and especially the post-processing engine which is specifically designed to be more thorough, needs constant adjustments and refinements to be able to process every address we receive on the same hardware we've allocated to it.

Some of these changes we've shared with you previously such as our bucket system of pre-computing and reusing data about similar addresses so that the engine doesn't have to start from nothing when forming a decision about an address it hasn't seen before.
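To give a feel for what reusing data about similar addresses can mean, here is a deliberately simple sketch. How Raven really defines a bucket isn't covered in this post, so grouping IPv4 addresses by their enclosing /24 network is purely an illustrative assumption, as are the function names and the made-up values stored per bucket.

```python
import ipaddress
from typing import Dict, Optional


def bucket_for(ip: str, prefix: int = 24) -> str:
    """Name of the bucket (here: the enclosing network) an IPv4 address falls in."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


# Pre-computed, reusable data per bucket (the values are invented).
bucket_data: Dict[str, dict] = {
    "98.75.2.0/24": {"seen_addresses": 12, "flagged_addresses": 0},
}


def starting_point(ip: str) -> Optional[dict]:
    """Data to start from for an address that has never been seen before."""
    return bucket_data.get(bucket_for(ip))


print(bucket_for("98.75.2.4"))        # 98.75.2.0/24
print(starting_point("98.75.2.200"))  # {'seen_addresses': 12, 'flagged_addresses': 0}
print(starting_point("203.0.113.9"))  # None
```

An address never seen before still gets a head start whenever one of its neighbours has already filled in the bucket.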

Other changes to the way the engine thinks and weighs decisions have been made over time as we've learned what matters most when determining if an IP is bad or not. We have been relying a lot more on attack history as a way for the engine to make faster decisions, as data that can be read from a database is a lot quicker to use than forming a decision by weighting lots of abstract data points. This is especially true when a single address could have more than a thousand neighbours with varying levels of weighable behaviour.

We've also made a lot of structural changes to the plugins we've developed that our engine uses for evidence gathering. We've made them more efficient with process recycling, shared memory pooling for their gathered data, socket reuse, remote socket use through an internal mesh network between our clustered servers, and other resource use optimisations. We have in fact developed an application we call Commander which can dynamically spin up extra resources as the load on our cluster becomes higher, allowing us to expend more resources for evidence gathering when necessary.

To speak a little bit more about our evidence gathering, we do often probe addresses that the engine wants more information about. That means we will look for open services running from those addresses including proxy servers, mail servers and web servers. We will allow it to run scans to determine if an address has vulnerable services exposed to the internet. These scans help to provide concrete evidence of bad addresses and you'll find a lot of this data funnelled into our Compromised Server type responses. Other plugins for the engine load the website pages you include in your tags to us so we can categorise the page and assign it an access risk level.

Because we're constantly adjusting our behind the scenes software stack like Raven and its associated plugins (the Commander spoken about above is less than a year old for instance), it's not always evident that we're working on things. More visible features like the custom rules feature get a lot more show time on the site and our blog here, but rest assured we're always improving things behind the scenes.

Over the past several months the service has been at its most stable, with the fewest node dropouts, the fewest connectivity problems and our fastest ever response times. We've also been able to maintain a real-time processing ratio for our post-processing inference engine through constant code improvements without adding more hardware, while still increasing the accuracy and thoroughness of the engine.

Looking past Raven we also greatly improved cluster communication. Many of you have noticed that the stats within your dashboard and at our dashboard API endpoints are updating significantly faster (at or under 60 seconds most of the time). This is not a coincidence. Significant engineering effort went into redesigning that whole system for your benefit so we could make sure our stats continue to synchronise quickly no matter how much traffic you generate.

We are really looking forward to 2020 where we'll be continuing to refine our service in every way. Thank you for reading this look at Raven and Merry Christmas!

