Inference Engine and API Improvements

Inference Engine Improvements

Over the past few days we've been working on improvements to our Inference Engine, specifically making it scale across more processing threads simultaneously and efficiently while decreasing its rate of false positives.

We recently upgraded our Prometheus server node from a single 6-core, 12-thread Xeon to dual 8-core, 16-thread Xeon processors, giving us a total of 32 threads and a giant 50MB of L3 cache. These processors also run at much higher frequencies, with a base clock of 3.6GHz and a maximum turbo clock of 4GHz.

With this new hardware we've seen a dramatic 181% uplift in performance for our multithreaded workloads. Every part of proxycheck.io is multithreaded; it's one of the ways we're able to deliver such high performance for the millions of queries we process daily.

But some of the post-processing we do, such as the Inference Engine's discovery of new proxies, did suffer performance hangups under certain scenarios. Tuning software to take advantage of 32 processing threads is not an easy task, and it's a big jump from the 12 threads we were using previously.

We've had to tune the software not just to take advantage of the extra threads but also to understand NUMA (non-uniform memory access), so that our threads work on data in the RAM connected directly to the processor each thread is running on. We've now completed the rewrites necessary for our new hardware and we're seeing dramatic increases in performance on this node.
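To illustrate the idea, here's a minimal sketch (not our production code) of making workers NUMA-aware on Linux. The CPU-to-node mapping below is a hypothetical one for a dual-socket, 32-thread box; on a real machine it would come from the OS topology (e.g. `lscpu`).

```python
import os

# Hypothetical topology: logical CPUs 0-15 on NUMA node 0, 16-31 on node 1.
NODE_CPUS = {0: set(range(0, 16)), 1: set(range(16, 32))}

def node_for_chunk(chunk_index, num_nodes=2):
    """Assign work chunks to NUMA nodes round-robin, so each worker mostly
    touches data held in its own node's local RAM."""
    return chunk_index % num_nodes

def pin_worker_to_node(node):
    """Restrict the calling process/thread to one node's CPUs (Linux only).
    Under the kernel's first-touch policy, memory the worker then allocates
    is placed on that node, keeping memory accesses local."""
    os.sched_setaffinity(0, NODE_CPUS[node])
```

The key point is that threads pinned to one socket's cores avoid paying the latency penalty of reaching across the interconnect to the other socket's RAM.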

Due to this performance uplift we've been able to spend more time training our Inference Engine, and with the extra computation time available, each determination our Inference Engine makes is more thorough than ever before, which lends itself to increased accuracy.

We're now seeing a 0.02% reduction in false positives in our training models, which equates to 2,000 fewer false positives per 10 million determinations.

API Improvements

Over the past several days we've also been hard at work on the API itself, not only tuning it for our new multi-processor node but also fixing some bugs, including one with ASNs and special characters. When you receive an ASN result we tell you the provider name, and sometimes these providers have special characters in their name; for example, Telefónica has an ó in its name. This caused an encoding error during our JSON encoding which resulted in null results.

We've corrected this by encoding all provider names before the JSON encoding is performed. We believe very few results were affected by this issue which was caught by our own Ocebot during its automatic API probing.
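We haven't published our server-side code, but the failure mode can be illustrated in Python. A provider name stored as Latin-1 bytes isn't valid UTF-8, and many JSON encoders return null or throw when handed such bytes; normalising the name to proper Unicode before encoding fixes it. (The Latin-1 storage encoding here is an assumption for illustration.)

```python
import json

# Provider name as it might sit in a database using a legacy encoding.
raw = "Telefónica".encode("latin-1")  # b'Telef\xf3nica'

# Treating those bytes as UTF-8 fails: 0xF3 is not a valid UTF-8 sequence here.
try:
    bad = raw.decode("utf-8")
except UnicodeDecodeError:
    bad = None  # many JSON encoders would emit null at this point

# The fix: decode with the correct charset first, then JSON-encode.
good = raw.decode("latin-1")
encoded = json.dumps({"provider": good}, ensure_ascii=False)
```

With the name normalised to Unicode up front, the JSON encoder produces `{"provider": "Telefónica"}` instead of a null result.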

That's all we have to share with you today. We hope you all had a great weekend.


Degraded cluster performance

As of right now two of our three node hosts are having peering issues. We're monitoring the situation closely. All of our services are online and available but you may find some queries taking longer than usual to be answered.

EDIT:// Stats within your Dashboards are delayed at the moment while we work on cluster maintenance. Thank you for your patience.

EDIT2:// We have re-enabled stat collection. We believe some stats created by our ATLAS node were not all reported to the cluster accurately, due to its network issues combined with stats accumulating in our database update coalescing cache. Because this cache became full, new stats couldn't enter it, and old stats couldn't be reported accurately due to packet loss. This has resulted in some stats being lost, which means your query volumes for yesterday and the first two hours of today are lower than they should be (in your favour, as you effectively receive some queries for free).

We apologise for the disruption to our stats feature. We're working to make sure this doesn't happen again by building in a secondary disk-based file cache to hold stats that cannot be committed in a timely fashion for later processing.

Thank you.


New dashboard API

Since we added the whitelist and blacklist features to the dashboard in May earlier this year, we've wanted to add a powerful JSON API for them, enabling you to fully manage your whitelist and blacklist with your own software in an automated way.

It has also become one of the most frequently requested features, and today we're happy to deliver it to you. If you just want to get integrating, head on over to our API Documentation page and click on the new Dashboard API Information tab.

Using the new API you can list, add, remove, truncate and clear your whitelist and blacklists using standard JSON, GET and POST requests. This enables you to completely manage your whitelist and blacklists without needing to log in to your account, though of course you still can.
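As a hedged sketch of what an integration might look like, here's a small Python helper. The exact routes and parameters live on the API Documentation page under the Dashboard API Information tab; the paths below are illustrative assumptions, not confirmed endpoints.

```python
from urllib import parse, request

API_BASE = "https://proxycheck.io/dashboard"  # illustrative base path

def list_url(api_key, list_name="whitelist"):
    """Build the GET URL that would return a custom list's entries."""
    return f"{API_BASE}/{list_name}/list/?key={parse.quote(api_key)}"

def add_request(api_key, entries, list_name="whitelist"):
    """Build a POST request adding entries (one per line) to a list.
    In real use you would send it with request.urlopen(...)."""
    url = f"{API_BASE}/{list_name}/add/?key={parse.quote(api_key)}"
    return request.Request(url, data="\n".join(entries).encode(), method="POST")
```

Because the API speaks plain JSON over GET and POST, the same pattern works from any HTTP client, cron job, or deployment script without a browser session.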

We consider this feature beta but stable; there are no outstanding bugs as of right now, but if you see any, please feel free to message us. We've spruced up our contact us page as well.

Thanks for reading and have a great day!


Improving our VPN detection

Over the past few weeks we've been adding many new VPN services to our detection system, and we now detect 250 individual service providers. The main way we've gone about this is by detecting data centers and service providers that offer commercial hosting, which enables us to detect all the middle-men VPN providers that rent their servers.

We've also received a lot of leads from our users. It has been made clear to us that VPN detection is very important to our customers. We're incredibly thankful to those of you who have been sending us IP Addresses we missed and service providers we don't yet detect.

While adding many of these VPN services to our database we did find some bugs with our API's VPN database lookups. Specifically, when a VPN provider had a great many addresses, their IPs weren't being detected consistently due to an exhausted key buffer. This was caused by a very simple code error on our part, which we have since resolved. From our data we believe 5% of all our VPN entries were affected by this bug.

Being a well-rounded service means accurately detecting VPN services and not just proxy servers. This is of paramount importance to us, and we are laser-focused on adding as many VPN services as possible to our database.

Thank you for reading this post, and please feel free to contact us via Skype, web chat or email to provide us with the IP Addresses or names of VPN services we're not currently detecting. All of this data is incredibly valuable to us.


Dashboard Speedup

Today we've been spending some time going through and auditing code all over the site. Mainly it's been CSS-related, but we've also been looking at ways to improve page load times by reducing image sizes, moving scripts around that cause slower page painting and so forth. One of the areas we looked at was the Dashboard. Sometimes on initial load it can take some time before it shows any content beyond the navigation bar; this was due to our use of JavaScript both to load the initial content you see and to handle switching between the separate tabs within the Dashboard.

So today we've gone through and altered how this code works, so the content you initially see when you load the Dashboard (or the Paid Plans tab shown by default when you subscribe to a paid plan) now loads instantly.

This has a dramatic effect on how the page feels: it loads practically instantaneously now. No one actually complained or requested this change, but it's something we noticed while auditing the website as we do from time to time.

The customer dashboard is currently one of the best parts of our service; it's our second most visited page after the home page, so it's important to us that it's well built. We feel the code behind the scenes is well written and will be easy to maintain going forward, so we're not planning a rewrite of the Dashboard any time soon, just maintenance and new features like our recent ASN support and improved stats exporting.

Thanks for reading and have a great day!


New Code Examples

Today we've expanded our code examples page with new code snippets for Node.js, Python and Ruby. We've also added a curl command-line example. These are all very simple examples to help get developers started, and we now include your API key in all the examples (including the C# one) to get you up and running faster. All of these examples were submitted to us by users of our API, and we're very grateful for your contributions. If there are other languages you'd like included, or you'd like to expand upon any of the examples we've provided, please submit them to us through our support email address [email protected]; we're more than happy to feature your work on our examples page for other users to benefit from.
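In the same spirit as the snippets on the examples page, here's a minimal Python example. The `/v1/` path and `key` parameter shown are illustrative assumptions; check the examples page for the exact, up-to-date request format.

```python
import json
from urllib import request

def build_url(ip, api_key=None):
    """Build the query URL for a single IP check. The path and parameter
    names here are illustrative; see the examples page for the real format."""
    url = f"https://proxycheck.io/v1/{ip}"
    if api_key:
        url += f"?key={api_key}"
    return url

def check_ip(ip, api_key=None):
    """Fetch and parse the JSON verdict for a single IP address."""
    with request.urlopen(build_url(ip, api_key)) as resp:
        return json.load(resp)
```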

Thanks for reading and have a great day!


New Service Status Page

When we first added the Status Page it was a great resource to quickly view if any of our server nodes were offline and their load conditions. But as our service has evolved we've added so many new things that the status page wasn't serving our needs fully.

So today we've introduced a brand new status page written from the ground up just for us. Like all of our services it runs on our cluster, so the chances of it being unable to load are very low.

As you can see, it has a new interface that lists each of our features separately. And if you hover your mouse over the tables you'll receive a bit more detail about what each service is or what it's for.

We are now able to list our backend systems so we ourselves can get a quick look into issues. And our new status page doesn't just display whether a service is functioning or not; it can also display intermittent issues, high load conditions or anything else that would disrupt service.

And finally, we can list our individual Honeypots. Allowing us to see how our IP vacuum cleaners are doing. We hope you like these changes, we feel they were necessary and it does give you a more complete picture of our infrastructure. It also allows us to display more servers in less space which will come in handy next year.

Thanks.


Should I be using HTTPS to query the proxycheck.io API?

It's a question we get quite often: what is the benefit of using transport security for API queries? We've offered it since the day we launched, but it's not completely obvious why you'd need it, so we're going to explain it.


Firstly, HTTPS stands for Hypertext Transfer Protocol Secure. When it's in use, an encryption algorithm secures your connection to the server you're communicating with; in our case, that's your application server talking to ours. All the information our two servers transmit and receive while communicating is cryptographically secure, meaning third parties cannot determine what you're sending us and vice versa.

So why would you need this advanced security for what seems on the surface quite basic API calls?

Well, the main reason is that it stops your visitors from being tracked by third parties. In the current political climate we have world powers trying to undermine individual personal security at every opportunity. So when you send us the IP Address used by one of your visitors, there is potential for a government agency or other organisation to record that interaction; that is, if you're not using our HTTPS API endpoint.

We could call this kind of information collection metadata, because although they don't know what the user was doing on your website, they know the user visited it, and they can link that visit with the user's IP Address within a larger database to track that user and build an overall profile of who that person is and what they do online.

That, we feel, is the main reason you would want to use the HTTPS endpoint. The second reason is your own account security. If you're making an API request to our service as a signed-up customer, you have to supply your API key with every request. If your communications with our server are being intercepted, it is possible for a third party to grab your API key and begin making queries against your account's query allowance.

And that could cost you money or exhaust queries you've already paid for. The only real drawback to using the HTTPS endpoint is the added time it takes to set up the encrypted connection. There are more handshakes, and our encryption certificate has to be validated, which increases the time it takes for your query to be answered.

It is for this reason we offer both HTTP and HTTPS endpoints for our API. We're giving developers the choice. We hope this post has been helpful in explaining why we offer HTTPS to all our customers and have done so from the very beginning. Privacy can only be maintained when we all do our part to strengthen it.


Payment Processing Fixed

Some time between the 17th and 20th of October, our payment system stopped loading within the Dashboard. This was caused by a compatibility issue between our CDN partner's JavaScript compression technology and our payment widget.

We managed to fix this problem this morning and we are once again able to take new payments. This bug may have also affected customers trying to cancel a paid plan; if you were trying to do that, you can now do so. This issue did not affect customers who were already subscribed; your payments during this time would have been processed normally.

Thank you to the users who made us aware of this issue this morning, we are very sorry it happened and have taken steps to make sure this doesn't occur again in the future.


Enforcing Terms of Service

An issue that almost all service providers will come up against when they offer any kind of free plan is users finding ways to maximise their use of these free resources well beyond the limits imposed by the service provider. For us this manifests itself in two ways.

1. Users signing up for a registered account multiple times under different email addresses

2. Unregistered users performing queries from a large pool of IP Addresses and/or Proxy Servers/VPN's

For the most part we don't mind if a registered user has two accounts. Perhaps you need one for production and one for your development environment. We do not police users that have two accounts, especially if you're under the 1,000-query limit on both accounts.

However we have found recently multiple users signing up for 10 to 15 accounts and load balancing their queries. This essentially means they have a 10,000 to 15,000 daily query limit for free when it should only be 1,000.

Another problem is unregistered users performing queries from proxy servers. So instead of making 100 queries per day from a single IP Address, they are making thousands of queries per day across hundreds of IP Addresses. In fact we've seen some users utilising more than 1,000 proxy servers in a single day to load balance their queries.

If you're a user that does this, the thing to keep in mind is that our API is not an unlimited resource, so we cannot let this abuse of our API and disregard for our Terms of Service continue.

To this end over the past week we have been contacting registered users that have more than two accounts to let them know we've disabled all but one of their accounts. In these situations we always leave the account that has performed the most queries active while the rest are disabled.

We're also tackling the unregistered user abuse issue by now checking that the IP you're using to contact the API isn't a proxy server. If it is, the query will go unanswered. To be clear, this only affects unregistered users, we do not perform this check if you have registered for an account and are using your API Key to make your queries. We're also not blocking VPN services from making unregistered queries.

So if you wish to contact the API through a proxy server you can still do so, you'll just need to signup for an account and supply your API Key with your queries.
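The gatekeeping rules above can be summarised in a few lines. This is our own sketch of the logic as described, not the actual API code, and `lookup` is a hypothetical helper standing in for the proxy database check.

```python
def should_answer(query_ip, api_key, lookup):
    """Decide whether to answer a query under the new rules.
    lookup(ip) is a hypothetical helper returning "proxy", "vpn" or None."""
    if api_key:
        return True                     # keyed (registered) queries are never filtered
    return lookup(query_ip) != "proxy"  # VPNs and clean IPs are still answered
```

The asymmetry is deliberate: supplying an API key identifies the account, so the per-IP proxy check is only needed for anonymous callers.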

We hope that these changes will not disappoint too many of you. If you're a registered customer and adhering to our Terms of Service you will not notice any change to your service. If you're an unregistered user you may find your queries take a few milliseconds longer while we verify you're not accessing our API from a Proxy Server.

Thanks for reading and have a great day. If you have any questions please feel free to contact us.

