API Performance Improvements

Today we have rolled out new versions of our v1 and v2 APIs with a focus on reducing query processing time, and the improvement is dramatic.

Prior to today it was common for a full query, meaning one with all flags enabled (VPN, ASN, Inference etc.), to take between 7ms and 14ms to process. Today we've been able to reduce that to between 1ms and 3ms on average.
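
To put those numbers in context, a full query is simply one that requests every optional data point at once. As a rough, illustrative sketch (not official client code), timing such a query from Python might look like the following; the vpn, asn and risk flag names are taken from this blog, while the key value is a placeholder and any other flags are omitted.

import time
import requests

API_KEY = "your_api_key_here"  # placeholder, substitute your own key

def timed_full_query(ip: str):
    """Send a query with the optional flags mentioned in these posts enabled and time it."""
    params = {"key": API_KEY, "vpn": 1, "asn": 1, "risk": 1}
    start = time.perf_counter()
    response = requests.get(f"https://proxycheck.io/v2/{ip}", params=params, timeout=5)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return response.json(), elapsed_ms

data, ms = timed_full_query("8.8.8.8")
# The measured time includes network overhead; the API also reports its own
# internal processing time in the "query time" field of the response.
print(data.get("query time"), f"round trip: {ms:.1f}ms")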

The way we've been able to accomplish this is by back-porting some of the changes we created for our as-yet-unannounced v3 API to v1 and v2. The changes are mostly structural, dealing with how our code is processed: how it's compiled and cached between queries, and how it's executed again for subsequent queries.

Although we've always reused processes across queries (the time it takes to set up a new process is simply too long), we're now doing so more efficiently, with more data being retained by these processes so they don't need to reload as much information into memory between queries.

We've also altered our code caching system to use a tiered storage approach. Code files are now loaded from disk into memory by one of our processes, and the opcache retrieves the code from that memory-based file cache before compiling it and storing the resulting opcodes in memory. This gives more consistent performance because the opcache no longer needs to reload code files or check their modified dates on our physical disks; they are instead held in memory by our file-based caching process.

This change is important because the opcache frequently checks code files to determine whether they need to be re-compiled and cached again; keeping the files in system memory therefore keeps performance consistently high over long periods of time.
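
As a rough analogy for the tiered approach (this is illustrative Python, not our actual PHP/opcache implementation), source text is served from a memory-resident file cache and the compile step only ever reads from that in-memory copy, so the hot path never touches the disk or checks modification times:

# Illustrative two-tier cache, analogous to the setup described above.
file_cache: dict = {}    # path -> source text held in memory (tier 1)
opcode_cache: dict = {}  # path -> compiled code object (tier 2)

def load_source(path: str) -> str:
    """Tier 1: read the file from disk once, then serve it from memory."""
    if path not in file_cache:
        with open(path, "r") as handle:
            file_cache[path] = handle.read()
    return file_cache[path]

def get_compiled(path: str):
    """Tier 2: compile from the in-memory copy; no disk reads or mtime checks here."""
    if path not in opcode_cache:
        opcode_cache[path] = compile(load_source(path), path, "exec")
    return opcode_cache[path]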

We're also making more efficient use of the compiled versions of our code by removing comments and other non-essential text from the code files prior to compilation, which makes the compiled versions smaller and faster to run. Finally, within the code itself we've reduced the number of database calls, which cause processor context switching, so we waste fewer CPU cycles gathering and sorting information and spend more time delivering it. This ties into the change mentioned above, where our reused processes keep more data in memory for each query to make use of.

So what is the net benefit of having the API respond this quickly? Going from ~7ms to ~3ms may not seem like a lot of time to save, but put simply, it allows us to handle roughly 2.3 times more queries per second on each cluster node (7 ÷ 3 ≈ 2.3).

It also means we can answer your specific queries faster, so you're not waiting as long for an answer from us, which allows you to use the API in more time-sensitive deployments. Some of our customers are already so physically close to our infrastructure that they receive an answer (including network overhead) in under 20ms, so we're getting to the point where every single millisecond we save counts.

Thanks for reading and have a great day!


DDoS Attack Post-Mortem

Yesterday between 11:12 PM and 11:59 PM GMT+1 we were faced with a DDoS attack of substantial size. The attack was so large it produced five times the API query volume we normally receive.

During the attack our API performance did degrade but the cluster did not go down and continued to answer legitimate customer queries. Our average query answer time increased from 12ms to 1,250ms (1.25 seconds) as the graph below illustrates:

[Image: graph of average API query answer times during the attack]

We found that 25% of our customer traffic was subject to this increased latency, while the other 75% continued to receive low response times of around 15ms to 30ms. The reason is that the attack came in frequent but short bursts, likely a coordinated attempt to create the greatest possible impact on our service quality.

We don't know who is behind the attack at this time as we have received no communication from those responsible. We're not yet planning any changes to our infrastructure as a result of this attack but we are monitoring our service closely and may introduce more nodes to the cluster if we feel it's warranted.

Thanks for reading and we hope you're having a great week.


April Newsletter

Today we sent out the first newsletter of this year to users who have the "New features and improvements" email toggle enabled within their dashboard. This has been our most widely distributed newsletter so far, with several hundred users opting in to receive it.

If you didn't receive the newsletter but would like to read it you can do so here on our website.

We've made quite a lot of changes since November 2018, when we sent our last newsletter. We only publish two per year, so you can expect the next one around October or November this year.

Thanks for reading and have a great weekend!


New cluster node and other infrastructure changes

Much like the last time we added a new node, we've been looking for the right server for a while: one that offers very high IPC (instructions per clock) but also has plenty of threads for handling a high volume of simultaneous connections. We believe we've found both of those qualities in our new node, which we're calling RHEA.

We now have four nodes in the cluster, spread across three countries, and the two servers that share a country are hundreds of miles apart. We also have two nodes that aren't in the cluster day-to-day but can be inserted within a few seconds in the event multiple meteor strikes hit all the datacenters we use.

But we haven't only been working to add a new cluster node. Over the past month we've been optimising our post-processing inference engine, which runs exclusively on STYX, the dedicated server we added in February. We've been able to significantly improve its performance, which means the server is scaling to meet our increasing daily demands.

In addition to these changes we've been improving our infrastructure more broadly. We've invested in an entirely new load balancing method: we set up our own nameservers and can now perform anycast DNS-based geographic routing, which means that in the future, when we enable this capability, we will be able to direct you to the cluster node physically closest to you.

Although these are all under-the-hood changes, they have taken a significant amount of capital and development time to achieve. They're not very glamorous, but they are important for the longevity of the proxycheck.io service. And all of it was delivered without a single second of downtime.

Thanks for reading and we hope everyone is having a great week.


Updated Contact Us Page

Today we've updated our contact us page with a new web form to make it faster and easier to get support. Best of all, when you're logged into your account while using the new contact form, we automatically receive information about your recent usage of our service so that we can assist you more quickly.

[Image: screenshot of the new contact form]

You may notice we've also included subject and priority drop-downs. These will help us triage support queries so that customers with the greatest need are assisted first.

But don't worry, if you prefer to send us an email yourself from your own email client you can still do that. We're also still supporting our Live Chat and iMessage contact methods.

We hope you'll check out the new improvement and perhaps give it a try the next time you need to contact us for anything. Thanks for reading and we hope everyone is having a fantastic week!


Threat Page Enhancements

Today we've updated the threats page to include both our threat assessment risk score system and new navigation buttons for quickly traversing backwards and forwards from the currently displayed IP address, which should make the threats page much more efficient to use.

Below is a screenshot showing both changes: the new navigation buttons can be found at the top left of the map, while the new risk score is displayed under the detection determination.

[Image: threats page showing the new navigation buttons and risk score]

We hope you like these improvements, and remember that at the bottom right of each page you'll find our "last update" feature, which lets you quickly see all the changes to the current page you're viewing.

Thanks for reading and we hope everyone has a great weekend!


New Risk Assessment Score

Today we've introduced a new feature to the v2 API endpoint which allows you to get a risk score for an IP address. This draws on the immense volume of attack data we have, combined with our knowledge of active proxy servers and virtual private networks.

{
    "status": "ok",
    "node": "PROMETHEUS",
    "140.143.90.193": {
        "asn": "AS45090",
        "provider": "Shenzhen Tencent Computer Systems Company Limited",
        "country": "China",
        "city": "Beijing",
        "latitude": 39.9288,
        "longitude": 116.3889,
        "isocode": "CN",
        "proxy": "yes",
        "type": "Compromised Server",
        "risk": 100,
        "last seen human": "56 minutes, 33 seconds ago",
        "last seen unix": "1551868636"
    },
    "query time": "0.006s"
}

Above is an example query; just below the proxy and type fields you can see the new risk score. It ranges from 0 to 100 and is a percentage value: anything below 33% is considered low risk, between 34% and 66% is considered high risk, and between 67% and 100% is considered dangerous.

We've added this score so that you can glean more information about an IP, specifically how dangerous it is, on top of the proxy yes/no determination. Scores will generally be between 66% and 100% for positive detections, depending on how much bad activity we're witnessing from that IP, while negative detections will generally score below 10%.

To access this new feature you must supply a new flag with your queries: &risk=1. Please remember not to rely on this score alone to make all your determinations; we recommend you provision your software around the proxy: yes/no result, although you may want to fine-tune your blocking based on how risky the IP has been determined to be.
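
As a purely illustrative sketch of layering the risk score on top of the proxy yes/no determination (the endpoint and field names follow the example response above, while the 34% threshold and the key placeholder are assumptions made for this sketch, not recommendations):

import requests

API_KEY = "your_api_key_here"  # placeholder, substitute your own key

def check_ip(ip: str) -> dict:
    """Query the v2 API with the new &risk=1 flag enabled and return the IP's data."""
    response = requests.get(
        f"https://proxycheck.io/v2/{ip}",
        params={"key": API_KEY, "risk": 1},
        timeout=5,
    )
    response.raise_for_status()
    return response.json().get(ip, {})

def should_block(ip: str) -> bool:
    """Primary decision comes from proxy: yes/no; the risk score only fine-tunes it."""
    data = check_ip(ip)
    is_proxy = data.get("proxy") == "yes"
    risk = int(data.get("risk", 0))
    # Assumed policy for illustration: block detected proxies unless their
    # risk score falls in the low-risk band (below 34%).
    return is_proxy and risk >= 34

print(should_block("140.143.90.193"))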

Thanks for reading!


Warrant Canary Updated

Today we have updated our warrant canary. It was meant to be updated on the 1st of January 2019, but due to some internal changes made last year we did not receive the reminder to update it until now; apologies for the delay.

As always, the warrant canary can be found here: https://proxycheck.io/canary.txt and our public key used to verify the canary (still the same one from 2016) is available here: https://proxycheck.io/pubkey.txt

Thanks and sorry for the delay!


Improved Positive Detection Log

Today we've pushed out an update to the dashboard which allows you to expand the entries listed in your positive detection log to get detailed information including the network owner, hostname, port number, proxy type and even attack history from across our network. Below is a screenshot showing an expanded entry.

[Image: an expanded entry in the positive detection log]

This new feature is live for all customers as of this post. We know that providing more information within the positive detection log has been a frequently requested feature, and we're happy to be able to deliver it today.


New server dedicated to Inference added!

Today we'd like to share with you a new server we've added to our family called STYX. It joins HELIOS, PROMETHEUS and the rest of our cluster, but it has a dedicated job: inference.

Specifically, post-processing inference. You see, we've run into a bit of a problem: the volume of incoming traffic we're receiving is now so vast that processing all of the undetected traffic has become a huge burden for our main cluster nodes. They have to run the API, do live inference, host the website, and coalesce, process and synchronise statistics, and while doing all that they also have to go through a literal mountain of addresses and figure out which ones are running proxies as part of our post-processing inference engine.

Below is an illustration showing our current system where each node handles its own incoming addresses and then simply updates the other servers about any new proxies it discovers amongst that data.

[Image: diagram of the current system, with each node processing its own incoming addresses]

As you can see, 50% of working time is spent on the API, as it should be, but 25% is spent on our post-processing inference engine. Recently this has meant we can only process 1/20th of the undetected address data we're receiving; if we find an IP that isn't already in our database, there is only a 1-in-20 chance it will even get processed by our post-processing inference engine.

To fix this we've tried a lot of different things, from precomputing as much data as possible and storing it on disk to reusing inference data for related IPs (for example, if two IPs are in the same subnet, a lot of the prior computational work doesn't need to be done again). But none of it has been enough, because the volume of addresses being received is simply too high.

In addition to this we have a privacy commitment to our customers to only hold undetected IP information for a maximum of one hour. So we're up against the clock every time we receive an IP that needs to be examined by our inference engine.
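
To make that one-hour window concrete, here is a minimal, purely illustrative sketch (not our actual implementation) of a holding buffer that discards undetected addresses once they exceed the retention limit, whether or not they've been processed:

import time
from collections import deque

MAX_AGE_SECONDS = 3600  # privacy commitment: hold undetected IPs for at most one hour

class HoldingBuffer:
    """Illustrative buffer that expires unprocessed addresses after one hour."""

    def __init__(self):
        self._queue = deque()  # (arrival_timestamp, ip) pairs, oldest first

    def add(self, ip: str) -> None:
        self._queue.append((time.time(), ip))

    def take_batch(self, size: int) -> list:
        """Hand a batch to the inference engine, discarding anything past the limit."""
        now = time.time()
        while self._queue and now - self._queue[0][0] > MAX_AGE_SECONDS:
            self._queue.popleft()  # expired before it could be processed
        return [self._queue.popleft()[1] for _ in range(min(size, len(self._queue)))]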

So what is the solution? Well, we've decided to invest in a new dedicated server with a lot of high-performing processor cores and a lot of memory to deal specifically with this problem. We've ported our Inference Commander and Inference Engine software to this new standalone server, where they can spend 100% of their time working on inference.

Below is an illustration showing how our three main nodes now have their addresses downloaded by our new server, which we're calling STYX, before processing on its immense compute resources.

[Image: diagram of the new system, with STYX downloading addresses from the three main nodes for processing]

Already we've been able to move from processing only 1/20th of the addresses we're sent per day to processing 1/7th, and by carefully examining where the bottlenecks are and solving them we're confident we can increase that further until we're able to process every single address we're sent. We can also run this new server at 100% without worrying about other tasks suffering, as it doesn't host our website or API; its sole purpose is inference.

The other benefit of this new server is that it frees up the main nodes to handle more customer queries; we've already seen improvements in query answer times during peak hours, which directly translates into being able to handle more queries per second.

Thanks for reading and we hope everyone is having a great week!


The following is an edit to this post made on the 7th of Feb 2019.

As of this update our new server is now processing 100% of all the undetected addresses coming in through our post-processing inference engine software, a big jump from the 1/7th we originally quoted when this blog post was made. Over the past several days we have been tweaking and gradually increasing the volume of queries, and today we hit a more than sustainable processing threshold, allowing us to process all incoming data. We're very happy with this and so we thought an update was in order :)

