New homepage, footer changes and dropping Google Ads

Today we've launched a brand new homepage. Its goal is to draw in more users by showcasing our customer dashboard, which we feel is our biggest differentiator in this space and one of our greatest assets.

Remaking the face of your website, the home page everyone sees when they visit for the first time, is a daunting task. We've been quite conservative with our changes over the past two years, but today we've taken a big step and we're very happy with how it turned out.

If you're very perceptive you may also have noticed that we've cleaned up our footer navigation across the site by removing some redundant links and visual separators. A more obvious change is our removal of Google Ads.

We've removed all ads across the site because they simply weren't performing well enough to warrant carrying them. Ad-blocker usage is extremely high among the software developers our product is made for, which results in very low ad views compared to our page views.

So from now on we will not be displaying any ads on the site, whether from Google or any other ad network. Instead we'll be subsisting purely on the revenue from selling paid plans.

We hope you like these changes and please do check out the new homepage!


Invoice history added to the Dashboard

This has been an often requested feature: the ability to view and print out past and current invoices. Today we've added it to the customer dashboard under the Paid Options tab, and this is what it looks like:

(Screenshot: the new invoice history section in the customer dashboard)

We will show your most recent 100 invoices here; since that can become quite a long list, we've also added a hide button. To keep the page loading quickly, the invoice log is loaded in after the page itself has loaded, so the dashboard won't be slowed down at all by this new feature.
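For anyone curious how this kind of deferred loading works, here's a minimal sketch of the idea: a small JSON endpoint that the dashboard requests after the page has rendered. The route, framework (Flask) and field names below are hypothetical stand-ins for illustration, not our actual dashboard code.

```python
# Minimal sketch of a deferred invoice endpoint (illustrative only;
# the route, framework and field names here are hypothetical).
from flask import Flask, jsonify

app = Flask(__name__)

def load_recent_invoices(account_id, limit=100):
    # Stand-in for a database lookup returning the newest invoices first.
    return [{"id": f"{account_id}-{i}", "date": "2018-10-01", "amount_usd": 9.99}
            for i in range(limit)]

@app.route("/dashboard/invoices/<account_id>")
def invoice_log(account_id):
    # Fetched by the dashboard after the initial page render, so the main
    # page load is never blocked by the invoice query.
    return jsonify(invoices=load_recent_invoices(account_id))
```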

That's it for this update; we hope you enjoy the new addition!


What happened on October 19th?

If you visited your Dashboard yesterday you may have seen a notice at the top explaining that we had a very bad server failure on our HELIOS node, which caused many stats-related issues. Today we will explain this very unusual failure and what we learned from it.

So to begin with, HELIOS has been our longest-serving node. We have had that server for many years and it has had some hardware failures in the past, including two failed hard disks. Yesterday's failure was the most difficult type to deal with from a programmer's perspective: bad memory. To fix it we replaced the motherboard, CPU and memory, so effectively HELIOS is now a new server.

When writing any software you are building on a foundation of truths, and what is held in the computer's memory is something you have to trust, as that's where all your software actually lives. It's very difficult to program a system to self-diagnose a memory issue when the self-diagnosis tool itself will likely be affected by the same memory problems.

And that is exactly what happened here. Our system is designed to remove malfunctioning nodes from the cluster, but in this case HELIOS's bad memory was causing it to re-assert itself. It even tried to remove other nodes from our cluster, thinking they were malfunctioning, because its own verification systems were so broken that it was interpreting their valid health responses as invalid.

The reason this affected our stats processing is that, to keep our cluster database coherent and to stop conflicts caused by multiple nodes processing the same data at the same time, we use an election process: every so often the nodes hold a vote and one healthy node is selected to process all of the statistics for a given time period. Due to HELIOS's memory issues this voting process did not work as intended.
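To make the election idea a bit more concrete, here's a heavily simplified sketch (not our actual distributed election code): every voter applies the same deterministic rule to the set of nodes it believes are healthy, so under normal conditions they all agree on one stats processor. The problem on the 19th was that HELIOS's corrupted memory made it misjudge both its own health and everyone else's.

```python
# Heavily simplified illustration of the stats election described above.
def elect_stats_processor(nodes):
    """Pick one healthy node to process all statistics for the next period."""
    healthy = [n for n in nodes if n["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy node available to process stats")
    # Deterministic choice so every voter arrives at the same answer.
    return min(healthy, key=lambda n: n["name"])

nodes = [
    {"name": "HELIOS", "healthy": False},      # should be excluded...
    {"name": "ZEUS", "healthy": True},
    {"name": "PROMETHEUS", "healthy": True},
]
print(elect_stats_processor(nodes)["name"])     # -> "PROMETHEUS"
# With bad memory, HELIOS reported itself healthy (and its peers unhealthy),
# which is how the vote broke down.
```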

What we learned from this is that we needed a better way to completely lock malfunctioning nodes out of the cluster, and that we needed more points of reference for nodes to self-diagnose issues, preferably breaking themselves completely when they discover problems that need human intervention instead of continuing to harm the cluster by remaining within it.

Today we think we've accomplished both of these goals. Firstly, we've set up a lot of references in our health checks for self-diagnosis that weren't there before. This isn't a foolproof solution, but if any of the references are corrupted the node's built-in self-management system shouldn't be allowed to start arguing with the cluster and voting other nodes offline; at the very least, if it still has the capability to perform votes, it should neuter itself before attempting to vote on other nodes' health status.
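As a rough sketch of that rule (illustrative only, with stand-in check names rather than our real health checks): a node has to pass every independent reference check before it's allowed to cast votes about anyone else.

```python
# Rough sketch of the "verify yourself before voting on others" rule;
# the reference checks named here are illustrative stand-ins.
def self_checks_pass(checks):
    """checks: mapping of reference-check name -> bool result."""
    return all(checks.values())

def may_vote_on_peers(checks):
    # A node failing any self check neuters itself: it abstains from voting
    # on other nodes' health instead of arguing with the rest of the cluster.
    return self_checks_pass(checks)

# A node with corrupted memory will typically fail at least one independent
# reference check, so it never gets to vote other nodes offline.
helios = {"clock_sane": True, "constant_checksums_ok": False, "disk_reads_consistent": True}
print(may_vote_on_peers(helios))  # False -> abstain
```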

Secondly, we've broadened our nodes' ability to lock out bad nodes by revoking the tokens needed to be a part of the cluster group. This means good servers that reach a consensus can revoke the "passwords" a malfunctioning node needs to access the cluster.
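Conceptually it works a bit like the sketch below (hypothetical names and data structures, not our real cluster code): once a majority of the other nodes agree a peer is misbehaving, its cluster token is revoked and it simply can't authenticate any more.

```python
# Conceptual sketch of consensus-based token revocation (hypothetical names).
def revoke_if_consensus(cluster_tokens, votes_against, total_voters, node):
    """Remove a node's cluster 'password' once a majority agree it is bad."""
    if votes_against.get(node, 0) > total_voters // 2:
        cluster_tokens.pop(node, None)   # the node can no longer join the group
        return True
    return False

tokens = {"HELIOS": "tok-a", "ZEUS": "tok-b", "PROMETHEUS": "tok-c"}
votes = {"HELIOS": 2}                    # two of three voters flagged HELIOS
print(revoke_if_consensus(tokens, votes, total_voters=3, node="HELIOS"))  # True
print("HELIOS" in tokens)                # False -> locked out of the cluster
```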

A third change we've made is having known-good nodes act faster when they are removed from the cluster while still functional, by allowing them to initiate a confidence vote amongst the other nodes. This can happen just a few seconds after a node is removed from the cluster, if that node believes it's working correctly. To reduce false positives caused by malfunctioning nodes, only nodes with perfect health scores over the past 3 minutes are allowed to vote in these decisions.
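In simplified form the eligibility rule looks something like this (an illustration of the rule, not the real implementation): only peers with a spotless health record over the last 3 minutes get a say in whether the removed node should be readmitted.

```python
import time

ELIGIBILITY_WINDOW = 180  # seconds: only a perfect health record over the
                          # past 3 minutes earns the right to vote

def eligible_voters(nodes, now):
    return [n for n in nodes
            if now - n["last_health_blip"] >= ELIGIBILITY_WINDOW]

def confidence_vote(nodes, votes_in_favour, now):
    """True if a majority of eligible voters back readmitting the node."""
    voters = eligible_voters(nodes, now)
    yes = sum(1 for n in voters if n["name"] in votes_in_favour)
    return bool(voters) and yes > len(voters) // 2

now = time.time()
peers = [
    {"name": "ZEUS", "last_health_blip": now - 600},
    {"name": "PROMETHEUS", "last_health_blip": now - 400},
    {"name": "HELIOS", "last_health_blip": now - 30},   # too recent, cannot vote
]
print(confidence_vote(peers, {"ZEUS", "PROMETHEUS"}, now))  # True -> readmit
```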

We should also mention that although we only have three nodes listed in the cluster, there are in fact five. Two of them do not accept queries and are not front-facing; instead they work behind the scenes to manage cluster health, settle vote disputes and step in under another node's name if there is a serious enough issue to warrant that.

We are of course disappointed that this failure occurred. Many of you contacted support yesterday via live chat to express your concerns, and we're very sorry that this happened. We're especially sorry to those of you who received overage notices due to the invalid query amounts that accumulated on your accounts, and we hope you can accept our sincere apology. Our hope is that with these changes something like this will never happen again.

Thanks for reading and we hope everyone has a great weekend.


Minor stats issue yesterday evening through to this morning

Just a quick notice: yesterday evening we renewed some of our internal security certificates, and although we set the new certificates to be applied to all three of our server nodes, they were in fact only applied to our PROMETHEUS node.

Due to this, customer stats, including how many queries you've made and your positive detections, were not being updated within your dashboard. The good news is that none of these stats were lost; they just weren't being processed. We have now corrected the certificate issue, and all of your stats from the affected time period will be reflected accurately within your dashboard.

We're sorry for the inconvenience this caused.


Survey Results and other Statistics

Hello everyone! In mid-September we asked you to fill out a survey, with a link to it included in the customer dashboard. We're pleased to say many of you did, and we would like to share the results with you. We're also going to share some updated performance stats at the bottom of the post.

So, in the survey we asked you the following questions.

1. Has proxycheck.io helped your property stave off proxies and VPNs?

100% of respondents selected "Yes, it often works", which is a great result. The other choices were "It sometimes works" and "It never works", so we're very happy that the service is working well for everyone who took the survey.

2. How well do you consider the proxy detection?

  • 50% selected 10
  • 25% selected 9
  • 12.5% selected 8
  • 12.5% selected 7

We're happy that we did not score any 5s or below here, but clearly we can do better. 25% of our respondents voted 7 or 8, and that's definitely lower than where we want to be. Still, we are happy that 50% felt the proxy detection was perfect and 25% felt it was near-perfect.

3. How well do you consider the VPN detection?

  • 62.5% selected 10
  • 25% selected 9
  • 12.5% selected 8

This surprised us, as we feel we're stronger on proxy detection than VPN detection, but regardless we're very happy to see everyone vote 8 or higher for the quality of our VPN detection. We are of course still highly focused on improving all our detection types.

4. How do you feel about the plan pricing?

  • 87.5% selected 1 which means "Very Affordable".
  • 12.5% selected 10 which means "Very Expensive".

We do tend to agree with the 87.5% who said our pricing was very affordable. We didn't have anyone select between 2 and 9 in this question, and perhaps some who voted were confused by the 10 and 1 being switched around in this question compared to the others. In any event, we don't intend to increase our prices this year, so we're glad that the overwhelming majority felt the prices were very affordable.

5. How easy have you found the proxycheck API to use?

  • 62.5% selected 10
  • 12.5% selected 9
  • 12.5% selected 8
  • 12.5% selected 7

We're glad to see that the majority feels the API is very easy to use. We can certainly make it easier through better documentation and by providing more sample code. We're actively looking to partner with third-party developers to get more examples, functions and libraries made for all manner of coding languages.

6. How easy have you found the proxycheck customer dashboard to use?

  • 87.5% selected 10
  • 12.5% selected 9

We're really happy here that so many felt the customer dashboard was easy to use. We have invested a lot of time in making it look great and stay usable. We've also listened to a lot of customer feedback to bring many features to the dashboard, such as two-factor authentication, country data in the stats, searchable detection logs and more.

7. How have you found the proxycheck.io support? (Live Chat, Email etc)

  • 87.5% selected 10
  • 12.5% selected 9

Here again we saw some great responses, with universal praise for our support. We're working to increase the hours we're available on support chat and to answer emails faster than ever. In fact, 90% of all the support emails we receive are answered within 30 minutes.

We have also been able to help many different customers through our live chat system with all manner of requests: things like free trials, extending paid plans when customers are having temporary financial trouble, upgrading and downgrading plans with prorated differences, and generally solving our customers' issues in a convenient and fast way. We believe our high customer service score reflects our ability to get things done in a timely fashion.

8. Extra Feedback

In addition to the questions above we also asked customers to provide us with any extra feedback they wanted to write. Many of you wrote messages simply stating your love for the service, its good monetary value and the level of support you've received. We're very grateful for these messages.

Some of you also took the time to write about features you would like to see added and issues you found around our website and API. We're happy to say that we added all the features that were requested and we fixed all of the issues raised within 24 hours of receiving each message.

For example, we added searching and filtering to the positive detection log under the stats tab within your dashboard; this was a direct result of feedback in the survey. We also fixed some UI oddities, like the placement of certain navigation buttons; these changes were made as a result of another customer's survey answers.

Finally, we fixed many minor issues around the site that caused console errors in web browsers: mostly JavaScript errors arising from the reuse of scripts from other pages, but also some due to insecure content (fonts loaded over HTTP) within secure pages. Nothing that broke functionality, but these did cause page errors and were important to fix.

We're very thankful to everyone that took part in the survey, and especially to those who spent a lot of their time filling out very detailed answers for the extra feedback box. All of the information you provided was invaluable, and we acted upon it very quickly.

Apart from the survey results, we also wanted to share with you an update to our performance metrics. Back in May 2018 we showed you a graph detailing the breakdown of our query answer times (including network overhead through our CDN partner CloudFlare) as a percentage.

Today we're updating this graph to show the work we've been able to accomplish since then through optimising our code and prioritising the checks that take the most time.

(Graph: breakdown of query answer times as a percentage, May 2018 versus today)

We're now answering 32.78% of all queries in under 25ms, whereas in May 2018 that was only 23.07%. If you look at the graph as a whole, you can see we've maintained the 50ms, 75ms and 100ms leads with our new code while moving queries that were taking around 225ms and higher into the lower-latency positions.

The big takeaway here is that 75.11% of all queries are now answered at or under 75ms. This is a big difference from our original code, where only 9.69% of queries were answered at or under 75ms, and even a sizeable improvement over our May 2018 code, where 51.18% were.

We're really happy with these improvements, which make it possible to use our API in more latency-sensitive deployments. We've also been able to accomplish them while the volume of queries we handle has grown by several hundred million per day.

We're still optimising and looking for more ways to improve latency, but we feel there is a night-and-day difference between where we were a year ago and now. The improvement has been so vast that we've relaxed our per-request IP limit from 1,000 to 10,000 addresses, and we're fully comfortable doing that given how performant the API has become over the past several months.
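If you want to take advantage of the higher limit, the sketch below shows the general shape of a batched check: many addresses submitted in one POST rather than thousands of individual requests. The exact endpoint and parameter names should be taken from our API documentation; the ones used here are assumptions for the example.

```python
import requests

def check_batch(addresses, api_key):
    """Sketch of checking many addresses in a single request.

    The endpoint and parameter names are assumptions for illustration;
    consult the API documentation for the exact format.
    """
    resp = requests.post(
        "https://proxycheck.io/v2/",
        params={"key": api_key, "vpn": 1},
        data={"ips": ",".join(addresses)},   # now up to 10,000 addresses per request
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```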

So that's it for this update. Thank you again to everyone who took part in the survey, and we hope you all had a great weekend like we did after seeing these results.


Temporary degraded syncing

Due to a failing disk which reduced database performance to only a few kilobytes per second, our cluster found it difficult to sync important data between nodes from midday yesterday until early this morning. We detected and corrected this behaviour by taking the affected node offline and installing a replacement disk drive.

At no time were queries affected, as all the data they access is held in memory for performance reasons. However, new account creation and account changes, such as adding or removing whitelist/blacklist entries, were significantly delayed due to the slow syncing caused by the degraded disk.

We apologise to all customers who were inconvenienced by this problem. We should have taken the affected server offline sooner, and we would have, except it wasn't obvious to us at the time why the server was not syncing at the speed we expected.

Thank you.


Dashboard Improvements: Log Filtering!

Today we'd like to share with you an update to the dashboard's positive detection log, as we've now added a powerful new filtering feature.

This change is a direct result of your survey feedback, so if you have ideas you would like to see implemented, please contact us or fill out our survey here; we really do take your ideas into consideration.

Below is a screenshot showing us filtering by country, but you can also filter by time, date, address, node and tag. We've also provided a dropdown menu to quickly filter results by a specific detection type (All, Proxies, VPNs or Blacklisted results).

(Screenshot: filtering the positive detection log by country)

If you're very perceptive you may also notice that we've moved the "View Older Entries" and "View Newer Entries" buttons around. This was also done based on feedback provided through our ongoing customer survey: someone made a very good point that the button placement was unintuitive and went against normal user interface conventions for traversing content, so we've switched them around.

That's all the updates we have for you today. We hope you'll all like the new filter feature; we're sure it will be a well-used addition.


Inference Engine Improvements

Over the past month we've been working diligently on our post-processing inference engine. This is the machine learning system which does the heavy lifting on an IP after the real-time inference engine has attempted to determine whether it is a proxy but hasn't made a positive detection.

Our main goal with these changes has been to dramatically reduce system resource usage whilst also gaining higher accuracy and better performance. We worked on the problem in three separate stages over the past month.

  1. Improve detection accuracy
  2. Increase performance
  3. Decrease resource usage

We achieved increased accuracy on the 14th of September. Since we implemented those changes we've seen more than a doubling in the detection rate with no increase in false positives. We achieved this by allowing the engine to spend more time per IP making its determinations, by increasing our use of pre-computed data (which we implemented on the real-time inference engine some months ago), and by improving our methods based on what we learned from examining old data, so we can lead the engine towards better outcomes.

We increased performance by giving the engine the ability to create more simultaneous processes with which to process data. This had a detrimental effect on overall system resource usage, because one of the ways we increased accuracy was by allowing the engine to spend more time processing an IP than ever before; in fact we increased that time by 3x, which directly correlates to how long the process running the engine must stay open and keep consuming resources.

So whereas before our inference engine was using around 30-40% CPU on ZEUS and HELIOS and around 10% on PROMETHEUS (our strongest node), we now found both ZEUS and HELIOS at 90-100% CPU usage and PROMETHEUS at around 20-30%. This is obviously not good.

At first we tried to tune the engine using different configuration settings, placing limits on thread creation and so forth. But this only created issues where the engines running on all three nodes weren't able to clear incoming IP traffic fast enough and were falling behind.

So we decided on another approach: we would scrap our old engine scheduler and create a new one, which we're calling the Inference Engine Controller (I know, a very unique name), that balances and spawns the processes for our engine to use. We've never re-spawned processes per IP, as that would be highly inefficient; instead we usually have one process per 1,000 addresses.
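To give a flavour of what the controller does, here's a simplified sketch (not the controller itself): the backlog of addresses is split into fixed-size batches and each batch is handed to a worker process drawn from a bounded pool, rather than letting process creation grow unchecked.

```python
from multiprocessing import Pool

BATCH_SIZE = 1000  # roughly one worker process per 1,000 addresses

def run_inference(batch):
    # Stand-in for the post-processing inference work done on each address.
    return [(ip, "processed") for ip in batch]

def process_backlog(addresses, max_workers=4):
    batches = [addresses[i:i + BATCH_SIZE]
               for i in range(0, len(addresses), BATCH_SIZE)]
    # A bounded pool keeps CPU usage predictable while the batches are balanced
    # across the available worker processes.
    with Pool(processes=max_workers) as pool:
        return pool.map(run_inference, batches)

if __name__ == "__main__":
    backlog = [f"198.51.100.{i % 256}" for i in range(2500)]
    results = process_backlog(backlog)
    print(len(results), "batches processed")   # -> 3 batches processed
```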

With the new controller we can place a certain number of IPs into buckets together, grouping addresses based on their subnet and ASN relationships. This dramatically speeds up inference time for closely matched addresses, as much of the inference work no longer has to be thrown away because an IP has no relationship of any kind with the previous one that was just checked.

With us now dealing with hundreds of millions of checks per day, there is a huge volume of similar addresses waiting to be processed, sometimes differing by just one octet. In that kind of situation, 99% of the inferred work only has to be computed once and can be used for both addresses, resulting in a near-instant determination for the second address.
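A minimal sketch of the grouping idea is below (illustrative only; the real bucketing also considers ASN relationships, not just the subnet): addresses sharing a prefix land in the same bucket, so the expensive network-level work is computed once per bucket and reused for every member.

```python
import ipaddress
from collections import defaultdict

def bucket_by_subnet(addresses, prefix=24):
    """Group IPv4 addresses by a shared /prefix so neighbouring addresses
    are processed together and network-level inference can be reused."""
    buckets = defaultdict(list)
    for ip in addresses:
        network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        buckets[network].append(ip)
    return buckets

# Addresses differing only in the last octet land in the same bucket.
print(bucket_by_subnet(["203.0.113.5", "203.0.113.9", "198.51.100.7"]))
```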

So let's get to the results of all this work. Today we're seeing CPU utilisation of around 7-8% on ZEUS and HELIOS and around 1-2% on PROMETHEUS, all while processing 10x more addresses with much higher accuracy. Again, these changes are all for our post-processing inference engine, so there isn't a performance improvement to the API, at least not directly, although the lower overall CPU usage may help the API be snappier and more consistent.

We're still working on improving the inference engine, and we hope to take some of what we've learned here and apply it to the real-time version in the future. We think the bucket-type system we've devised could be utilised by the real-time engine if queries per second to the API reach a certain threshold, so that the availability of similar addresses is high enough to make it beneficial.

Another avenue we're looking at is storing inference data as a type of array in memory, so that if an IP is similar to one processed very recently, the computational work used for that prior determination can be re-used by the real-time inference engine in that moment. More testing is needed, however, to evaluate the latency impact of accessing an "inference map", even one held in fast system memory.

What we're describing above is decidedly different from the pre-computed data we currently store on disk for our real-time inference engine, where only the final determinations are stored and not the inferred network data that led to those decisions. That approach is only really possible with IPv4 addresses and some (but nowhere near all) IPv6 addresses. By holding the network determinations from every decisive stage in memory, inference about similar but different addresses can be performed without recomputing all of the work, which should in theory result in some fantastic speed improvements.
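As a rough sketch of the idea being evaluated (hypothetical, since this hasn't shipped): the intermediate network determinations are cached in memory keyed by prefix, so a closely related address checked moments later skips most of the work.

```python
import ipaddress

# Hypothetical sketch of an in-memory "inference map": intermediate network
# determinations cached by prefix so similar addresses can reuse them.
inference_map = {}

def infer_network(prefix):
    # Stand-in for the expensive network-level inference work.
    return {"looks_like_hosting": False, "provider_risk": 0.2}

def check_address(ip):
    prefix = ipaddress.ip_network(f"{ip}/24", strict=False)
    if prefix not in inference_map:
        inference_map[prefix] = infer_network(prefix)   # computed once
    network_view = inference_map[prefix]                # reused by neighbours
    # Per-address work still happens, but on top of the cached network view.
    return {"ip": ip, **network_view}

print(check_address("203.0.113.5"))   # computes the /24 view
print(check_address("203.0.113.9"))   # reuses it near-instantly
```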

So that's all we have for you today. We've been quite busy over the past month working on this, and we're really happy to finally share it with you.


Take a survey and share your thoughts!

Although we often receive emails from customers asking for features, telling us about bugs or offering other feedback, we thought it would be a good idea to create a survey and ask our customers exactly what they think.

To take part, simply click here; no account is needed and we're not collecting email addresses. Just make your selections and hit submit.

At the very bottom of the short survey we've included an optional feedback text field so you can write anything you want. Thank you to everyone who takes part in the survey; it means a lot to us. We'll also be linking to the survey in our customer dashboard for a short while.


New threats page!

Today we've put live our new threats page, which gives detailed information about specific IP addresses. It's similar to our web interface page but with a more eye-pleasing and detailed presentation.

At the moment only the IP-specific pages are live under the threats tab; in fact, if you visit the tab it will take you to your own IP address's report by default. We intend to add a live threat page there showing recent bad addresses that are attacking our customers' infrastructure.

We're hoping the new threat pages will make the service more useful to members of the general public looking for specific IP information, as they can be indexed easily by search engines. We've also added links to the threat page for specific addresses shown within your dashboard's positive detection log.

Page lookups work similarly to the web interface in that queries made to them count against your querying address, or your API key if you're logged in. We've done this to hinder web scraping; put simply, the page shares the same query allowance as your account this way.

We'll update you again once the main threats page is live; we're still working on that one.

