The adventures of Ocebot continue

Earlier this month we told you about our new API testing bot called Ocebot, which performs queries on our API and records the results so we can examine the areas of the API that need further optimisation.

I'm pleased to say that since our last update, where an average response on a negative detection took 250ms (including network overhead), we've now got that down to less than half, with average negative detections taking just 119ms (including network overhead).

Previously our average response time for a negative detection without network overhead was 78ms. Negative results used to be much faster than this, but as our data set has grown tenfold over the past year, so has the time needed to access it. With the help of Ocebot we were able to reduce our data access times to 43ms earlier this week, and then further down to 22ms just today, by optimising our functions and the way we access our database of information. These figures exclude network overhead.

Going from 78ms to 22ms was accomplished by tuning our functions, rewriting the ones that weren't performant and multithreading more parts of our checking pipeline. Getting the best performance out of our multitasking system is a priority for us, as we know there is still more we can do here.
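To illustrate the idea of multithreading a checking pipeline, here is a minimal sketch in Python. The check functions and their names are hypothetical placeholders, not our actual code; the point is simply that independent checks can run concurrently, so total latency approaches that of the slowest single check rather than the sum of all of them.

```python
# Hypothetical sketch: run independent pipeline checks concurrently.
# The check functions below are illustrative stubs, not proxycheck.io's real code.
from concurrent.futures import ThreadPoolExecutor

def check_known_proxies(ip):
    return False   # stub: would look the IP up in the proxy database

def check_tor_exit_nodes(ip):
    return False   # stub: would look the IP up in a TOR exit node list

def check_blacklists(ip):
    return False   # stub: would consult any customer blacklists

def run_checks(ip):
    checks = [check_known_proxies, check_tor_exit_nodes, check_blacklists]
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        # Each check runs in its own thread; results come back in order.
        results = list(pool.map(lambda check: check(ip), checks))
    return any(results)

print(run_checks("1.2.3.4"))
```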

The final thing we did was alter our network routing. We're now doing smart routing to our server nodes, which has significantly reduced the latency you'll experience when interacting with our services. We already use a CDN (Content Delivery Network), but now we're optimising the routes taken by your bits once they hit our CDN partner so that they touch our servers as quickly as possible.

Essentially we've created a wider on-ramp so that customer traffic can get to us faster by using better intermediary networks. This is the major reason behind the average response time going from 250ms to 119ms, but our work in reducing API processing time is helping here too.

We hope you're enjoying these updates. Keeping the API as fast as possible is important because we're gaining more data per day than we ever have previously. Data sources like our inference engine and honeypots now provide more unique and useful data than our manual scraping efforts, which has increased our database size considerably. Investing in making all that data as quickly accessible as possible is paramount to our service.

Thanks for reading and have a great day!


Introducing Honeybot, the proxycheck.io honeypot

Since we built our inference engine we've been having it examine addresses from our negative detections to find proxies that we missed. We're still doing that, but we've hit a snag: the engine is now so fast at making determinations that we no longer have a backlog of negative detections to work through.

Even with millions of daily queries, the majority of which are negative detections, the engine is simply so fast that it gets through them very quickly. As a result our learning system has started to slow down; it's not iterating on itself as often as it once was because its efficiency has resulted in a lack of data to be processed. Essentially we've bottlenecked its learning capability by not providing enough data.

So we've decided to expand our sources of IP data to further feed the machine learning algorithm behind our inference engine. To accomplish this we have begun renting 20 VPSes around the world which will act as honeypots. Thankfully, VPS servers with low specifications, which are perfect for this role, are very cheap. In fact all 20 of the VPSes we're now renting cost the same as our new ATLAS node, which is great value.

The way it works is simple: we've created a Linux distribution we're calling Honeybot (just a casual name) which contains various emulated services left wide open to the internet. Think web servers with admin login forms, SSH, FTP, email, Telnet, RDP and VNC servers and so on. We're currently set up to emulate (with accurate handshakes) more than 120 services.
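As a rough illustration of what one of these emulated services looks like, here is a minimal sketch in Python of a fake SSH listener that presents a plausible banner and records connecting addresses. It's a hypothetical example written for this post, not part of the actual Honeybot distribution.

```python
# Hypothetical sketch of a single emulated service: a fake SSH banner listener
# that logs every connecting IP for later processing. Not Honeybot's real code.
import socket
from datetime import datetime, timezone

def run_fake_ssh(host="0.0.0.0", port=2222, logfile="connections.log"):
    # Port 2222 is used here so the sketch runs without root privileges.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(64)
    while True:
        client, (ip, _) = server.accept()
        try:
            # Present a believable handshake so scanners think the service is real.
            client.sendall(b"SSH-2.0-OpenSSH_7.4\r\n")
            # Record who connected and when, to be picked up for processing later.
            with open(logfile, "a") as log:
                log.write(f"{datetime.now(timezone.utc).isoformat()} {ip} ssh\n")
        finally:
            client.close()

if __name__ == "__main__":
    run_fake_ssh()
```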

All of the honeypots also run a mini version of our cluster database software so that our main nodes can retrieve data from these honeypot servers and process it with our inference engine.

To be clear, we're not simply adding every IP Address that touches these servers to our proxy database. Some IP Addresses, such as those specifically trying to gain access to VNC, FTP or SSH by brute-forcing login credentials, will be added to our proxy database straight away. But ones that are not so obvious will be processed by our production inference engine, which is the one that has learning turned off.

A mirror set of all the IP Addresses touching our Honeypots will then be tested on our learning inference engine, even the ones we know are being used for attacks. The hope here is that we can find common characteristics amongst these addresses that the inference engine will use to better detect proxies itself in the future.
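To make that flow concrete, here is a hypothetical sketch of the triage in Python. The function names and the set of obvious brute-force behaviours are placeholders for illustration only, not our production code.

```python
# Hypothetical sketch of the honeypot triage described above; all names are placeholders.

OBVIOUS_BRUTE_FORCE = {"ssh", "ftp", "vnc"}      # services where brute-forcing was observed

def add_to_proxy_database(ip):
    print(f"{ip} added to the proxy database")   # stub for illustration

def production_engine_says_proxy(ip):
    return False                                 # stub: non-learning engine's verdict

def feed_learning_engine(ip):
    print(f"{ip} queued for the learning engine")  # stub for illustration

def triage(ip, brute_forced_service=None):
    if brute_forced_service in OBVIOUS_BRUTE_FORCE:
        # Obvious attackers are listed straight away.
        add_to_proxy_database(ip)
    elif production_engine_says_proxy(ip):
        # Less obvious traffic is judged by the production (non-learning) engine.
        add_to_proxy_database(ip)
    # Every address, including known attackers, also feeds the learning engine.
    feed_learning_engine(ip)

triage("198.51.100.9", brute_forced_service="ssh")
```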

We enabled our new honeypots running the Honeybot distro this morning after much testing yesterday, and already we're seeing huge volumes of attack traffic. It's quite surprising just how quickly we began to see hundreds of connections per server across all manner of services.

The data gleaned from these attacks is already filtering down into our main cluster database and being served by our nodes to customers. Looking at the results so far, we think this is going to be a great opportunity to widen our data's field of view and further enhance our inference engine.

Thanks for reading and we hope you all had a great weekend.


Dashboard Exporter

Last month we added a new feature to the dashboard which lets you view your recent positive detections as determined by our API. Here is a screenshot of this feature:

Since we added it we've had some customers ask us for a more convenient export feature. At first we allowed you to download your complete recent detections to a text file, but this is mostly useful for human reading and not easily parsed by computers.

So today we've added a new button, as seen in the screenshot above, called JSON Export. When you click it you'll open the most recent 100 entries in a new tab of your web browser, and you'll notice the URL contains your API Key and a limit variable.

This is what the URL structure looks like:

https://proxycheck.io/dashboard/export/detections/?json=1&limit=100&key=111111-222222-333333-444444

If you don't supply your API Key we'll check your browser for a cookie and session token like we do for log downloads and in-dashboard page browsing.

The point of allowing you to specify your API Key in the request URL is so that you can create software on your side which automatically parses your recent positive detections. For example, perhaps you don't want to set up logging on your side for positive detections and would rather have an overview from us. It's a no-fuss, turn-key solution which will allow you to integrate your positive detections into any control panels you may have on your side.

And we've provided the limit variable so you can specify how many recent entries you want to view. If you set this to 0 or remove it entirely, we will send you your entire positive detection log.
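As an example, here is a minimal sketch in Python of fetching the export programmatically using the URL structure above. The API Key shown is the placeholder from that example URL, and the exact shape of the returned JSON is an assumption, so inspect a real response before relying on any particular field.

```python
# Minimal sketch of pulling recent positive detections from the export endpoint.
# The key below is the placeholder from the example URL; substitute your own.
import requests

EXPORT_URL = "https://proxycheck.io/dashboard/export/detections/"

def fetch_recent_detections(api_key, limit=100):
    # limit=0 (or omitting it) would return the entire positive detection log.
    params = {"json": 1, "limit": limit, "key": api_key}
    response = requests.get(EXPORT_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    detections = fetch_recent_detections("111111-222222-333333-444444", limit=25)
    print(f"received {len(detections)} entries")
```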

You may notice that the URL for this feature starts in a new /export/ directory. It is our intention to expand the kinds of data you can export to include whitelists and blacklists (and to allow adding entries to these through the API too). We'll also be adding account controls to specify whether these sorts of things can be queried with your API Key alone or whether you only want them accessible from within the dashboard itself.

The new export feature does not count towards your normal API queries, so you are free to query it as much as you need to. If you're querying it very frequently, please use the limit variable to receive only the most recent entries that are relevant to you.

And of course this feature is accessible to all our customers whether you're on our free or paid plans.

Have a great day!


proxycheck.io history, where did we come from?

Today I thought, it's such a nice day, why not reflect on our history and tell the story of how proxycheck.io came to be? I'm often asked how I came up with the service and how it started. Well, it all started back in 2009 as a side project.

Now you're probably thinking: hold on a second, proxycheck.io started in 2016, so how does the story start in 2009? Well, back then I (the owner of proxycheck.io) was operating a chat room, similar to an IRC channel, and I hosted this room on a shared network with hundreds of other chat rooms.

And one day we started being attacked by automated bots that sent random gibberish into all our chat rooms. At first the attackers were using TOR (The Onion Router) to mask their IP Addresses and to get around all the banning we were doing. Then once we found a way to block TOR they started using SOCKS proxies.

So in 2009 I set out to solve this problem and I built a piece of software called Proxy Blocker. It even had a sweet logo:

That link above is to the original thread where I posted the first client software for Proxy Blocker way back in November 2009. At the time the client software would download a list of proxies from my web server each day and then when a user entered a chat room using one of those IP Addresses it would kick them out of the room and ban their IP Address for 24 hours.

Over time it gained a lot of complexity and popularity with the chat channels on the network, picking up features such as cross-channel ban sharing, automatically logging people in once verified as not being proxies, redirecting users to different channels if they weren't proxies and many others.

But the main change, which occurred some time in 2010, was that I switched it from downloading a list of proxy IP Addresses from my server to querying the server directly for each IP encountered. Essentially I had built the first version of proxycheck.io: an API that checked whether an IP Address was operating as a proxy server or not.

From then until 2016 the Proxy Blocker API worked great, and I used it for many other projects, from protecting my forum and other websites to protecting game servers. I also gave the URL for the API out to other developers to use in their own coding projects. But I always had this thought in the back of my head: what if I turned it into a proper service?

I actually nudged a friend of mine, who had helped with Proxy Blocker a few times over the years, to make such a service. I kept telling him it would be a great thing for developers and that he could probably charge for queries to pay for servers and development. He hummed and hawed about it and the service was never made.

So after trying to convince him to do it, I'd actually convinced myself that I should make it instead. Now to be clear, this was 2016 and there were various other proxy checking / blocking APIs available, so I was coming into it last. But I had a lot of experience, having already built Proxy Blocker over the previous 7 years. I had a great head start, and I felt that with my unique perspective from protecting many different kinds of services and hundreds of chat channels I still had a great product developers would want.

And that is how proxycheck.io was born. I bought the domain in 2016, started coding and within a few days I had the API up and answering queries. About 6 months later I started offering paid plans and a customer dashboard. So far things are going very well; the service is profitable, which means all our bills are paid and my time spent coding the service is partially being paid back.

We like to think we're a little bit ahead of the competing services in this space because we're offering things like our cluster architecture, the whitelist, blacklist and query tagging for free to all customers. These are the kinds of features developers want but which take time and knowledge to set up correctly. By having them situated at the API level we remove a lot of complexity for our customers and make our service more attractive, especially to developers who want to get proxies blocked fast without spending a long time on the implementation.

We're loving the response to the service so far. It has been just over a year since we started proxycheck.io, but we've gained a lot of customers and are already handling millions of daily queries. If I have one regret, it's that I didn't start the service earlier!

We hope this blog post was interesting, if you have any questions please feel free to contact us!


Yearly Subscriptions

One of the things our customers said to us when we switched to monthly subscriptions is that some of them just do not want a monthly subscription. They don't want money coming out of their bank account each month, they don't want it on their statements. They'd rather pay for an entire year up front so they don't need to think about it again for a year.

Which is a completely valid perspective; we can understand that. Here at proxycheck.io we have to pay for things like domain names, hosting, password managers and virtual private networks, and we too prefer to pay for a year up front for the same reasons. It also helps that when you pay for a year up front you usually save some money.

So in the interest of choice we've broadened our subscriptions: we now offer both monthly and yearly plans. Essentially we offer a yearly version of every monthly plan with the same query volumes, but if you enter into a yearly subscription you save 20% over holding the equivalent monthly subscription for 12 months.

So you can try the service for a month with the query amount you need, and once you're sure you like the service and it meets your needs you can choose to pay for an entire year up front and save 20%. But if you like the flexibility of paying for the service month-to-month, you can continue to do that too.

We've updated our pricing page to reflect the new plan options, and you can subscribe to a yearly plan from your dashboard right now. We hope you all like the new changes; they are a direct result of your feedback.


New subscription pricing

Since we changed from yearly to monthly subscriptions we've had a lot of feedback from customers who purchased our prior yearly plans. They felt that the new subscriptions were not providing the same value, and they were concerned that when their paid yearly plans ran out they would not be able to afford a monthly subscription with the query amounts they needed.

It was only the holders of our very highest paid tiers that received better value than previously, and they are in the minority; most of our sales were around $120 or below (for an entire year).

So we've listened to your feedback, looked at the numbers and decided that we can lower the monthly prices and create a more linear payment approach that makes sense for smaller developers. Here is our new pricing:

Previously our lowest subscription plan started at $5 per month for 10,000 queries. We now have two plans lower than that: 10K for $1.99 and 20K for $3.99. (Those of you who already subscribed to our monthly plans have been automatically transferred over to the most affordable paid subscription with the same daily query limits you paid for, and you have been refunded the difference.)

Similarly, our most popular middle plan of 80,000 daily queries used to cost $120 a year, but when it became a monthly subscription it became $20 per month, which is $240 a year. With our new prices it becomes just $7.99 a month, which is $95.88 a year: less than half its previous monthly cost and still lower than our previous yearly pricing.

We hope the new pricing will help to ease any fears that the service has become too expensive. Your feedback is invaluable to us; without it we probably would have kept the higher pricing for much longer, and that wouldn't have been a good thing. It's not our intention to shut smaller developers out of our service, we want everyone to be able to protect their service no matter their size.

The new pricing also makes a lot more sense for people on our free tier; it's much easier to accept a jump from FREE to $1.99 than it is to $5. We're not, after all, Netflix or Spotify, and charging $5 for the smallest paid subscription just didn't feel right.

Many people who need just a few more queries than 1,000 are only protecting a hobby, be it a discussion forum, an online computer game or the login forms on their blog. They shouldn't be burdened with paying $5 for something that doesn't make them any money, so we feel the $1.99 price, less than the cost of a coffee a month, is more than enough to satisfy those kinds of needs while still being enough for us to pay for our servers.

Thanks for reading, and as always please feel free to write to us at [email protected], just like many of you already did, which resulted in the lowered pricing we've announced today.


Ocebot update

On July 4th we wrote a post about our new software robot called Ocebot (a combination of the words Ocelot and Bot). Today we'd like to give you some insight into what we discovered as we pore over the past 10 days of data since that post.

Before we get into the data, though, let's just run through the kinds of things Ocebot has been doing (a minimal sketch of this loop follows the list).

  1. Querying the API around once a minute for 10 days straight
  2. Making proxy only and VPN requests
  3. Making queries it already knows the answer to
  4. Making malformed queries to see how the API responds
  5. Forcing the server to take detailed server-side analytics when answering Ocebot queries
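To give a flavour of what such a loop looks like, here is a hypothetical sketch in Python. The endpoint path, query flags and expected answers are assumptions made for illustration; our real tooling is more involved and also triggers the server-side analytics mentioned above.

```python
# Hypothetical sketch of an Ocebot-style test loop. The endpoint path, flags and
# expected answers below are illustrative assumptions, not our production tooling.
import time
import requests

API = "https://proxycheck.io/v1/"           # assumed endpoint format for this sketch

KNOWN_ANSWERS = {
    "8.8.8.8": "no",                        # an address we expect to test negative
}

def timed_query(path, **params):
    start = time.monotonic()
    response = requests.get(API + path, params=params, timeout=10)
    elapsed_ms = (time.monotonic() - start) * 1000
    return response, elapsed_ms

def run_once():
    for ip, expected in KNOWN_ANSWERS.items():
        response, elapsed_ms = timed_query(ip, vpn=1)   # proxy + VPN style request
        print(ip, "expected", expected, "took", round(elapsed_ms, 1), "ms")
    # A deliberately malformed query to record how the API responds to bad input.
    timed_query("not-an-ip-address")

while True:
    run_once()
    time.sleep(60)                          # roughly one pass per minute
```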

So from this we've gleaned a few things. Firstly, the response time of the API is excellent on a positive detection, with most queries being answered under 50ms including network overhead. For negative detections (meaning every single level of check is performed) the average time is 250ms; again this is with network overhead, but without TLS turned on.

The second thing we found is that the response times are very consistent. Our averages aren't changing throughout the day, and we're not seeing much difference between our nodes in the time they take to answer a query, which is a good thing as slow nodes would create inconsistency for our customers.

The third thing we found was some edge cases in our code that could create a high-latency response due to the logging of errors. We're talking in the millisecond range here, but when we're trying to give responses as fast as possible every millisecond counts.

The fourth thing we found was some possible optimisations to our cluster database syncing system. Through the server-side analytics we discovered high CPU usage caused by the encryption of data to be synced to the other nodes in the cluster. Essentially, before we send any data to another node in the cluster through our persistent machine-to-machine data tunnel, we encrypt it with AES-256.

This can be CPU intensive if the data being transferred is always changing and thus requires lots of database updates on other nodes. By looking at the Ocebot data we could see there were a lot of things being synced that didn't need to be: lots of high-activity data alterations that are only really important to the machine handling your API query and are not needed by the other nodes in the cluster.

So what we've done is move some of this data to a local cache on the node handling the request, when that data isn't ever going to be needed by another node.

The other thing we've done concerns data that does need to be shared with other nodes, but not immediately. We've added some granularity to how frequently certain pieces of data are synced so we can benefit from update coalescing, meaning combining multiple smaller database updates into one larger database update that is transferred to other nodes less frequently.
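As a rough illustration of update coalescing, here is a simplified sketch in Python. The class and the flush transport are hypothetical; in reality the batch would be encrypted and sent over the machine-to-machine tunnel described above.

```python
# Simplified, hypothetical sketch of update coalescing: buffer small updates and
# flush them to the other nodes as one larger, less frequent batch.
import threading

def send_to_other_nodes(batch):
    print(f"syncing {len(batch)} coalesced updates")   # stub: real code encrypts and sends

class CoalescingSync:
    def __init__(self, flush_interval=5.0):
        self.pending = {}                 # latest value per key wins
        self.lock = threading.Lock()
        self.flush_interval = flush_interval

    def record_update(self, key, value):
        with self.lock:
            self.pending[key] = value     # overwrite locally instead of syncing immediately

    def flush(self):
        with self.lock:
            batch, self.pending = self.pending, {}
        if batch:
            send_to_other_nodes(batch)
        # Schedule the next flush so updates go out on a fixed cadence.
        timer = threading.Timer(self.flush_interval, self.flush)
        timer.daemon = True
        timer.start()

syncer = CoalescingSync()
syncer.flush()                            # start the flush cycle
syncer.record_update("queries:today", 12345)
```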

By doing it this way we've been able to significantly reduce the CPU usage of our cluster syncing system and thus, hypothetically, increase the nodes' API response throughput in the future when we're closer to full node utilisation.

Our experiments with Ocebot are ongoing, and already we've discovered some incredibly useful information that has directly improved proxycheck.io. Over the next few weeks we will be enhancing Ocebot so it can perform tests on our new Inference Engine, not to judge accuracy but to gauge performance and to make sure it's getting faster at making determinations.

Thanks for reading and have a great day!


Introducing the proxycheck.io inference engine

Prior to today, proxycheck.io's data was scraped from many websites across the globe, the kind that list proxies for sale or for free use. But we've been working on introducing our own inference engine for some time now.

Put simply, this is a type of machine learning where our service gathers information about an IP Address and then, through those evidence-based facts, draws likely conclusions about whether that IP is operating as a proxy server.

At this time we're only putting the positive detections made by the inference engine into our data when it has a confidence level of 100%. In human terms this is the equivalent of an investigator catching a perpetrator in the act of a crime, not making a judgement call or flipping a coin.

We're doing it this way because accuracy is our number one priority; if we're not confident that an IP Address is operating as a proxy server, it's pointless to say it is in our API responses.

The other caveat here is that figuring out if an IP Address is operating as a proxy server or not takes time. The inference engine will get faster over time but to get the kind of extremely accurate detections we care about we have to do the processing after your queries are made.

What this means is that whenever you perform a query on our API that results in a negative detection, that IP Address is placed in a queue to be processed by the inference engine, and if it's determined to be a proxy server it will enter our data. In testing we believe we can accurately process each IP Address within around 5 minutes of the first negative result.
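Here is a hypothetical sketch in Python of that post-query flow. The engine call and data-store functions are stubs for illustration, not proxycheck.io's actual code.

```python
# Hypothetical sketch of the post-query flow: negative detections are queued and
# analysed later, with only 100%-confidence verdicts entering the proxy data.
import queue
import threading

inference_queue = queue.Queue()

def engine_is_certain_proxy(ip):
    return False                               # stub: the real engine gathers evidence

def add_to_proxy_data(ip):
    print(f"{ip} added to proxy data")         # stub for illustration

def handle_api_result(ip, is_proxy):
    if not is_proxy:
        # The customer already has their answer; deeper analysis happens afterwards.
        inference_queue.put(ip)

def inference_worker():
    while True:
        ip = inference_queue.get()
        if engine_is_certain_proxy(ip):
            add_to_proxy_data(ip)
        inference_queue.task_done()

threading.Thread(target=inference_worker, daemon=True).start()
handle_api_result("203.0.113.7", is_proxy=False)
```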

Now obviously, having the IP processed after you've already told us about it and after you've already received a negative result from us isn't that useful to you. But as we're seeing millions of queries a day, and proxy servers are used all over the internet for comment spam, automated signups on forums and click fraud, it means we have been given a giant window from which we can analyse the IP Addresses that matter most.

We could, for example, scan the entire internet address space and detect thousands of proxy servers out of the 4 billion possibilities on IPv4 alone, before we even think about IPv6. But that would be incredibly wasteful of our resources and abusive to the internet at large. By only scanning the addresses that are performing tasks on your services (the same ones proxy servers are used for), we're targeting and training our engine on the data that matters.

During our testing we supplied the engine with 100,000 negative detections from our own API from the past day and we found 0.4% of those addresses to be operating as proxy servers. That's around 400 proxy servers that we previously had no knowledge of that are now detected by our API for the next 90 days minimum.

We're absolutely thrilled by the results, and as our service grows with more developers using the API, the inference engine will become a major source of proxy data for us. At the moment we have two versions: a static, non-learning version which is in production with total confidence from us and zero false positives.

And then we also have a development version which works from the same data as the production version but with learning enabled; results from the development version are not saved into our production database. So over time our inference engine's detection rate will rise from the current 0.4% as it becomes more intelligent through iterative machine learning.

Thanks for reading, we hope you enjoyed this post. If you're watching your API responses, look out for proxy: Yes and type: Inference Engine!


Live Chat and Bug Fixes

Live Chat!

As of a few days ago we're now featuring a live support chat feature on all our webpages. The reason we've done this is so that you can get instant support without needing to use Skype or iMessage.

The best part of our live chat is that it's manned by our developers; we're not outsourcing the support chat. This means you can receive not just pre-sales information from the chat but also account- and payment-level support. We can handle any query through the live chat that previously you would have needed to use our Skype, iMessage or email support for.

But of course the new live chat is optional; we're still offering email, Skype and iMessage support and that's not changing.

Bug Fixing

The other bit of news we wanted to discuss is our Dashboard. A few weeks ago we altered the way the Dashboard is handled server side to make it more secure, but this had some unintended negative effects which didn't show up in our testing. They were mostly just niggly bugs, for example:

  1. Setting/Changing your Password or API Key logged you out of the dashboard after performing the changes
  2. No email was sent if you changed your password (but one was sent if a password was set)
  3. Some errors were not handled correctly causing blank pages

So yesterday we did a full audit of the Dashboard code and tested every feature within it. We found numerous minor issues, mostly visual bugs after certain requests were made. We went to work on all of these problems and solved all of the ones listed above.

We also finally added an account recovery feature which enables you to generate a new password for your account in a secure way in the event you lose access due to a lost password. This has been a planned feature since the moment we added password security to accounts, but we have been working mostly on the API and other new features like account stats, blacklist/whitelist support and so forth.

As of two months ago we have a proper priority-based development ledger which maintains a list of all the features and bugs we still have to implement or fix. The ledger prioritises bugs, and as of this post we have cleared all the bugs it had listed. If you come across any bugs please shoot us an email or a live chat message and we'll get right on them!

For a little insight into what we're working on next: it's our email notices. At the moment they are quite inconsistent in their layout and wording, and we intend to unify all of our emails in appearance.

Thanks for reading and have a great day!


New server ATLAS added to our server cluster

Today we've added a new node to our cluster: a dedicated server we've named ATLAS. This new server is already serving your queries and is viewable on our service status page.

It is our aim for proxycheck.io to always be accessible, which means our goal is to never have any downtime. We're mitigating the risk of downtime by renting servers not only in different data centers but also with different hosting companies and in different countries, and next year we aim to have servers on entirely different continents.

Currently we have PROMETHEUS in the United Kingdom, HELIOS in Germany and now ATLAS in France. The next time we discuss nodes we hope to have a server operational in North America.

As always, our cluster operates transparently to users. You do not need to specify which node your traffic goes to; your queries are routed automatically by us, and our cluster is used to answer all queries, not just paid queries but free ones too.

With the volume of queries we're receiving per day reaching into the millions, we decided to add a third server sooner rather than later. Not because we're maxing out the servers we already had (we were not close to that point due to our efficient API backend), but because we felt it was important to add more redundancy to the cluster as our customer base grows.

We hope you're finding our blog posts interesting; it's an enjoyable way to tell our story and inform people about what we're up to. The service is constantly being worked on behind the scenes, and although you may notice some visual changes to the site here or there, it's mostly the things you don't see which are being worked on most of all.

Thanks for reading and have a great day!

