Improved dashboard query stats

One of the things our customers have asked for is a way to view yesterday's query stats in their dashboard. This is a useful metric because it helps potential customers estimate how many daily queries they will need to purchase if they're going over the 1,000 free queries we provide.

So today we've gone ahead and spruced up the stats view on the dashboard. You can now page through the last 30 days of queries on a day-by-day basis, and you can download your stats as a text file or export them using our JSON Export button. As with our positive detection log, you can query the JSON Export API using just your API Key, so it can be integrated into your own control panels.
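Once exported, the daily counts can feed the estimate described above. Here is a minimal sketch, assuming the export is a JSON object that maps dates to query counts. That shape, the function name and the sample numbers are made up for illustration, and the real export format may differ.

```python
import json

FREE_DAILY_QUERIES = 1000  # the free daily allowance mentioned in this post


def peak_daily_queries(export_json: str) -> int:
    """Return the busiest day's query count from an exported stats document."""
    # Assumed shape: {"YYYY-MM-DD": count, ...}; adjust to the real export.
    daily_counts = json.loads(export_json)
    return max(int(count) for count in daily_counts.values())


# Made-up sample data, purely for illustration.
sample = '{"day-1": 640, "day-2": 1320, "day-3": 915}'
peak = peak_daily_queries(sample)
needs_paid_plan = peak > FREE_DAILY_QUERIES
```

Sizing a plan on the busiest day rather than the average avoids running out of queries on peak days.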

Here is what the new stats section looks like:

As always, these new features are instantly accessible to all users, whether you're on our free or paid plans. We've also enhanced the subscription UI: if you're subscribed (monthly or yearly) you will now see how much you will be charged on your next billing date. Prior to today we only showed you the next billing date, and of course that's not useful if you forget what your plan costs.

We hope you all like these changes, as they are all a direct result of your feedback. We really appreciate how helpful everyone has been with reporting the issues they find and requesting new features like the enhanced query stats we added today.

Thanks for reading and have a great day!

Customer Q&A

Since the service has been going quite a while now, I thought it'd be a good time to answer some of the most frequently asked questions we get from customers. Firstly, we welcome these questions, so please feel free to keep sending them to us via our live chat or email (support@proxycheck.io).

So let's get into the top questions we receive.

Question: Why is the free service limit so large (1,000 queries per day), and why are the paid plans so affordable?

Answer: We know that there are two types of proxy detection APIs out there: the free kind, where absolutely every query you make is free up to a usually unexplained "reasonable" amount, and the paid kind, where the cost is extremely high, especially for new developers who are simply nurturing an idea into existence and thus don't have the capital needed to purchase expensive subscriptions.

So our free plan is quite generous because we're fighting for market share from the other free players in this category. Our main free competitors have a lot of mindshare because they're free, but the service level offered isn't that great. Some of them did not offer TLS querying or still don't, they are ambiguous about how many queries you can make per day before being cut off, and none of the ones we've found offer cluster redundancy to free customers. There are also some questions about how accurate free services are when the creators are not being incentivised monetarily to keep their APIs up to date with the latest detection methods.

So these are areas we identified where we could offer something better to grow our market share and pull some customers away from the free services, perhaps converting them to paid customers in the process. But even if they don't convert, simply mentioning us online (as developers often do) is more than enough payback for the free service they enjoy.

As for why the paid plans are so affordable, we don't have much overhead as we run proxycheck.io very lean. We've worked hard to write it in a way that scales on mid-range server hardware. If you look at the companies that offer paid-only proxy detection, their plans are often several times more costly than ours, but they have enormous overheads with lots of employees. We simply don't. We also know that proxy detection is mostly a niche service; it's not as popular as, for example, geolocation services.

And the other side of that coin is we believe all developers should have access to the best proxy detection service for the lowest prices. Right now our starting plan offers unlimited concurrent querying and 10,000 daily queries for only $1.99 a month or $1.59 a month if paid annually. Some of the lower-priced paid-only providers start charging at $8.99 a month and that's just too much for most people's websites.

But rest assured, we have a lot of paying customers and the service is profitable. We intend to maintain our current pricing while adding lots of new features and further improving our detection methods.

Question: Why do you only accept Debit/Credit cards and not PayPal? (I want to pay with PayPal)

Answer: This question has come up more often than any other and, to be quite honest, PayPal is not an enjoyable company to work with; it's more like a necessary evil. Part of the reason we're able to offer prices as low as $1.99/$1.59 (annually billed pricing) is that we don't use PayPal.

They take quite a large slice of each transaction. Not only do they take a percentage, they take a set fee as well, and it would simply eat into our revenues too much. We would have to increase our lowest priced plan by an extra dollar.

PayPal also has a nasty habit of closing accounts on a whim, something we can't be dealing with when our business relies on the automatic recurring payments made by our customers.

Naturally we understand the reason people like to use PayPal as a customer: it keeps your Debit/Credit card information safe as the merchant doesn't get it. But this is partly why we partnered with Stripe instead. Their card processing fees are much lower than PayPal's and we still do not get your card information; only Stripe has it, and they're fully PCI compliant like PayPal is.

Unfortunately we cannot offer PayPal at this time. Although it has resulted in quite a few lost sales, I think you can understand why, from our perspective, we're reluctant to enter into any partnership with them.

Question: Can I make an app that includes your API and if so can I sell that app to others?

Answer: You sure can. We welcome you to make all manner of software which includes our API, and if it's really good we'll even feature it on our examples webpage, so feel free to shoot us an email when you've made a great app!

Question: How are you detecting/gathering the proxy addresses to be blocked?

Answer: We use a lot of different methods, but there are two main ones. The first is scanning websites all over the internet, all day every day, to find new IP Addresses which are acting as proxy servers. The second is collecting IP Addresses by testing them using our own inference engine.

The inference engine is a type of machine learning system whereby we give it some goals, a working set of data and a very rough guideline of how to apply evidence-based reasoning to sorting that information, in this case sorting good IP Addresses from bad ones. Over the past few weeks it has quickly become a major source of proxy server addresses for us, and during our testing we've found 90% of the addresses our inference engine finds are unknown to other major proxy checking services we've tried.

On top of these major ways we get around 0.25% of our data from third party sources, who we either pay for their data or who make it available for free without commercial license stipulations. It's important to realise we cannot have total detection, so we look to third parties when they have reasonable pricing or licensing and offer unique address data that we don't have. Thanks to our inference engine we've seen our database almost double in size, and so our reliance on third parties has fallen drastically as a result.

I should also mention we operate 20 honeypots situated in different address ranges around the world and from these we feed our inference engine data about attacks being performed on our honeypots. We monitor for VNC, RDP, FTP, Telnet attacks and also Email spam, Website signup/comment spam and more. These have been a great source of data for us.

Question: Why are you using CloudFlare?

Answer: CloudFlare is a CDN (Content Distribution Network) and they have servers all over the world, as do we. They can offer us great connectivity to our customers anywhere on the globe, even in areas we don't have servers.

The other benefit is our cluster was designed from the very beginning to work in tandem with CloudFlare, and that is why we're able to offer triple server redundancy without requiring developers to do anything special with how they query our API. It just works, it's incredibly fast and we've designed it into our cluster from day one. Whether we have a single server or 2,000 servers in our cluster it will continue to just work.

Question: What's your opinion of x service that denigrated your service?

Answer: Whenever you enter a competitive market you're going to get some disparaging comments from the other established players. We've actually been offering proxy blocking services (for free) since 2009, but it was only in 2016 that we put up the proxycheck.io domain and decided to turn it into a proper business. So what I'm saying is we have a lot of experience, we're not new to this, and frankly our service is already one of the best, if not the best, offering developers incredible flexibility in pricing and features.

Question: Your biggest plan isn't big enough for me or I'm worried I will outgrow your service

Answer: Currently our biggest plan is 2.56 Million daily queries for $29.99 a month or $287.90 a year (20% discount for annual pricing). However, this is just our biggest set plan. If you need twice, three times or four times this amount of queries (or more) simply let us know; we're offering very competitive pricing.

For example for $39.99 a month you can have 5.12 Million daily queries. We're not suddenly charging extortionate pricing for custom plans so feel free to shoot us an email and we can discuss your needs.
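For anyone who wants to check the annual figures, the arithmetic can be sketched as follows. It assumes the same 20% annual discount applies to every plan, which matches the $1.59 and $287.90 figures quoted in this Q&A. The function names are just for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

ANNUAL_DISCOUNT = Decimal("0.20")  # assumed to apply to every plan
CENT = Decimal("0.01")


def annual_price(monthly: str) -> Decimal:
    """Twelve months at the monthly price, less the annual discount."""
    full_year = Decimal(monthly) * 12
    discounted = full_year * (1 - ANNUAL_DISCOUNT)
    return discounted.quantize(CENT, rounding=ROUND_HALF_UP)


def effective_monthly(monthly: str) -> Decimal:
    """What an annual subscription works out to per month."""
    return (annual_price(monthly) / 12).quantize(CENT, rounding=ROUND_HALF_UP)


biggest_plan_yearly = annual_price("29.99")       # 359.88 * 0.80
starter_plan_monthly = effective_monthly("1.99")  # 19.10 / 12
```

Using Decimal rather than floats keeps the cent rounding exact.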

Question: Can I really cancel my plan at any time?

Answer: You sure can. From your dashboard you'll see a new Cancellation button if you're currently subscribed to any plan (monthly or yearly). The best part is if you cancel before your current plan ends you don't lose what you've already paid for. So you can purchase a one month plan, cancel it after a few minutes and still enjoy an entire month's worth of your paid plan.

We've done this because we know some are hesitant about automatically renewing subscriptions. You don't want to be caught out with a payment you forgot was coming, and so we fully support your ability to cancel a subscription without losing anything you've paid for.

And that's all folks!

We hope you've found this little Q&A useful. All of these questions were put to us by customers, often many times. If you have any other questions please feel free to email us at support@proxycheck.io. We aim to answer all emails within 12 hours.

Improving node efficiency

Recently we've been focusing a lot of effort on improving the performance of our API. We've reduced overall query access time, improved network peering to lower our network overhead, added new query caching software and reformatted how our data is stored.

But over the past few days we've been focusing on the CPU usage of our nodes. With the inference engine running constantly and our API having to answer millions of queries per day, we found that the CPU usage on our nodes was getting quite high. Here is an image depicting an average 60 seconds on one of our nodes, HELIOS.

As you can see from the graph above the CPU usage is quite consistently high around 55-60%.

So to figure out what was causing this consistently high CPU usage we looked at our performance counters and also the data from Ocebot. What we found was that this high CPU usage isn't being caused by API queries directly. Our level of caching and code efficiency is very high there, and even several hundred thousand queries a minute were not causing these kinds of high load scenarios.

Instead we found it to be caused by the inference engine (about 10-20% load) and our database syncing system (25-30%). So combining these it's easy to get around 55% usage all the time.

To fix it we've rewritten some core parts of our syncing system. We had already refactored this system last month so that some of our data that changes very often enters a local cache to be synced at timed intervals. This coalescing of database updates allows for higher efficiency, because data that changes very often (hundreds or even thousands of times per minute) is synced only once instead of hundreds or thousands of times.

But what we found is that, as our customer base has continued to double every few weeks, the amount of data we need to cache before syncing has increased too. So what we're doing now is staging all cluster database updates in local node caches.

As for the inference engine, we have manually gone in and altered some of the algorithm to remove some learned behaviour which got results but in an unoptimised way. Artificial learning still has a way to go, or at least our implementation does. This has also resulted in lowered CPU usage.
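To illustrate the coalescing idea, here is a minimal sketch of a local cache that absorbs frequent counter updates and passes them on as a single batch. It is a simplified stand-in rather than our actual syncing code. The class name, the key format and the flush callback are all assumptions made for the example.

```python
from typing import Callable, Dict


class CoalescingCache:
    """Collects counter updates locally and syncs them as one batch."""

    def __init__(self, flush: Callable[[Dict[str, int]], None]) -> None:
        self._pending: Dict[str, int] = {}
        self._flush = flush  # hypothetical hook that writes to the cluster
        self.writes_received = 0
        self.syncs_sent = 0

    def increment(self, key: str, amount: int = 1) -> None:
        # Repeated updates to the same key collapse into one pending value.
        self.writes_received += 1
        self._pending[key] = self._pending.get(key, 0) + amount

    def sync(self) -> None:
        # Intended to be called at timed intervals; sends nothing when idle.
        if not self._pending:
            return
        batch, self._pending = self._pending, {}
        self._flush(batch)
        self.syncs_sent += 1


synced_batches = []
cache = CoalescingCache(flush=synced_batches.append)
for _ in range(1000):
    cache.increment("queries:customer-a")
cache.sync()
```

In this run a thousand local updates reach the flush callback as a single batch with one summed value, which is where the efficiency gain comes from.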

So here is the result of our work:

Now we're seeing much lower average CPU usage, from 55% to around 7% with peaks to 10-15%. We're still optimising for CPU usage but we think we've hit all the major CPU issues with this update and we're now looking at other aspects of the service for improvement. The good news is by doing this kind of work we can put off purchasing another node for our cluster which leaves more money to pay for development and partner services instead of the servers that run our infrastructure.

Thanks for reading and have a great day!

Improved ASN data and lower response times

Earlier today we made a post about our ASN data source having some network issues, causing us to have incomplete ASN data. We have since switched data sources for ASN information, which has resulted in two benefits.

  1. Queries that ask for ASN data are now being answered in 100-200ms instead of 400-600ms.
  2. We now have ASN data for IPv6 addresses.

Previously only IPv4 was supported for ASN lookups and those took quite a while (relatively speaking) to be answered. We're now using a much better partner for this information, which allows us to store more ASN data on our servers themselves, resulting in faster and more complete lookups.

Thanks!

Minor Node Issue with HELIOS

Around 10 hours ago an intermittent syncing issue began with our HELIOS node, whereby it wasn't syncing some of its data with the other nodes in our cluster. Stats data and new user registrations specifically were not being synced by this node. This morning we discovered the issue, and it has been corrected by the time you're reading this post.

At no point did any customer data become lost and your query stats should show correctly as of right now. Also at no time was the API giving bad or incomplete data as the syncing of that information was working correctly at all times.

We're very sorry that this occurred and we're investigating why HELIOS was not removed from the cluster permanently. Our initial findings seem to indicate it synced up completely a few times and then re-entered the cluster, only to fall out of sync almost immediately and then be removed again after a long delay. We will be adjusting our cluster architecture to be more resilient to these kinds of intermittent faults in the future.

In an unrelated event we're having some latency issues with our ASN data supplier's network, which has resulted in diminished ASN information and also higher latency. Expect 4-5 second replies for queries that contain the ASN flag and for the information to be incomplete (lacking country information sometimes). We expect to have this working correctly again soon.
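One common way to guard against a node that repeatedly drops out and rejoins is to require several consecutive healthy sync checks before it is readmitted. The sketch below shows that general technique only. It is not a description of the change we will make, and the class name and the threshold of three passes are assumptions.

```python
class NodeHealth:
    """Tracks whether a node may serve, with a streak required to rejoin."""

    def __init__(self, required_passes: int = 3) -> None:
        self.required_passes = required_passes  # assumed threshold
        self.in_cluster = True
        self._streak = 0

    def record(self, in_sync: bool) -> bool:
        """Record one sync check and return whether the node is in the cluster."""
        if not in_sync:
            # Any failure removes the node and resets its progress.
            self.in_cluster = False
            self._streak = 0
        elif not self.in_cluster:
            # A removed node must pass several checks in a row to rejoin.
            self._streak += 1
            if self._streak >= self.required_passes:
                self.in_cluster = True
                self._streak = 0
        return self.in_cluster


helios = NodeHealth()
checks = [True, False, True, True, False, True, True, True]
history = [helios.record(result) for result in checks]
```

With this rule a node that syncs briefly and then falls out again stays out of rotation until it has been stable for a while.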

Thank you for your patience and have a great day.

Real time Inference Engine

As we mentioned in our previous blog post about Honeybot, our machine learning inference engine has become so fast at making determinations about IP Addresses that it exhausted our backlog of negative detection data, and this subsequently slowed down its self iteration considerably.

We've now reached a point where the algorithm we have is able to consistently make an accurate assessment of an IP Address in under 80ms and so we've decided to add the inference engine to our main detection API for real time assessment.

What this means is when you perform a query on our API we now have our inference engine examine that IP Address at the same time as our other checks are being performed. Our hope is that we can provide a more accurate real time detection system instead of only fortifying our data after a query is made.
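Running the checks side by side can be sketched like this. The two check functions are placeholders standing in for our database lookup and the inference engine, and the addresses come from the reserved documentation ranges rather than real proxy data.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in data for illustration only (documentation-range addresses).
KNOWN_PROXIES = {"192.0.2.10"}
INFERRED_PROXIES = {"198.51.100.7"}


def database_lookup(ip: str) -> bool:
    """Placeholder for a lookup against already known proxy addresses."""
    return ip in KNOWN_PROXIES


def inference_check(ip: str) -> bool:
    """Placeholder for a real time inference engine verdict."""
    return ip in INFERRED_PROXIES


def is_proxy(ip: str) -> bool:
    """Run every check at the same time and flag the IP if any check does."""
    checks = (database_lookup, inference_check)
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        futures = [pool.submit(check, ip) for check in checks]
        return any(future.result() for future in futures)
```

Because the checks run concurrently, the response time is governed by the slowest check rather than the sum of them all.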

Our inference engine is still doing exhaustive testing on IP Addresses that have negative results to find proxies we weren't aware of, and our system still performs checks on the surrounding subnet when it feels confident there are other bad addresses in that neighbourhood. All these checks are still being done after your query, in addition to the more targeted checks we're now doing in real time.

As of this post the new real time inference engine is live on our API and being served from every node in our cluster. One thing you should expect is slightly higher latency. Previously our average response (after network overhead is removed) was 26ms; with real time inference that average has increased to 75ms.
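As a rough illustration of what checking the surrounding subnet involves, the sketch below lists the neighbours of an address using Python's standard ipaddress module. The /24 range size is an assumption chosen for the example, not a statement of the range we actually scan.

```python
import ipaddress
from typing import List


def neighbours(ip: str, prefix: int = 24) -> List[str]:
    """List the other usable addresses in the subnet containing ip."""
    # strict=False lets us pass a host address instead of a network address.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return [str(host) for host in network.hosts() if str(host) != ip]


# Documentation-range address used purely as an example.
candidates = neighbours("192.0.2.10")
```

Each address in the resulting list could then be queued for the slower, exhaustive checks that run after the original query has been answered.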

We feel this is a good trade off because we're continually working to reduce latency while also introducing more thorough checking, so we're confident we can get back down below 30ms soon and we will use those extra response time savings to introduce more types of checks.

Thanks for reading and have a great day!

