Introducing IP Multichecking to our API

One of the most requested features from our customers over the past year has been the ability to check multiple IP Addresses within a single query. This feature has many benefits, including fewer TLS handshakes, reduced resource usage from maintaining multiple webserver connections, and decreased API latency through resource reuse.

To put it simply, it's a lot faster to perform one query with a 100 IP payload than it is to perform 100 queries with one IP each. We've tuned the new API for this multi-payload scenario, and the performance improvement is dramatic, as our benchmarks below show.

Before we get to those, a note: we consider this feature experimental, so the enhanced API is only accessible through /v1b/, with the b standing for beta. We support the submission of IP Addresses via both GET and POST. You should use POST; the GET input is there mainly for testing in your browser.

We also want to be clear that this is not simply an abstraction endpoint that calls our existing API internally (and individually); we have gone through the API and rewritten every part of it to handle multiple checks. This differs from how our web interface page has functioned (we will be transitioning that page to our new API soon).

So let's get to the benchmarks. We performed each test multiple times with different addresses and averaged the results; there was not much deviation between runs. All tests used 100 IP Addresses and TLS encryption.

IP Addresses that are NOT already in our data set with real-time Inference Engine turned ON

  • v1 (current) API with 100 queries each with 1 IP Address: "query time": "65.35s"
  • v1b (beta) API with 1 query containing 100 IP Addresses: "query time": "36.184s"

This is an impressive reduction, but watch what happens when we disable our real-time Inference Engine.

IP Addresses that are NOT already in our data set with real-time Inference Engine turned OFF

  • v1 (current) API with 100 queries each with 1 IP Address: "query time": "45.221s"
  • v1b (beta) API with 1 query containing 100 IP Addresses: "query time": "6.133s"

Now we're seeing a much larger decrease in query time. To be clear, our cached Inference Engine data is still being processed here, so all past determinations made by the real-time and post-processing Inference Engine are still being utilised, but actual live determinations have been turned off.

Finally, let's take a look at positive detections. This is where the IP Addresses being tested (all 100) are already present in our data set but not within caches. So it's still searching all of our data, but it's finding matches throughout the data set, as opposed to never finding a positive detection like the tests above.

IP Addresses that ARE already in our data set with real-time Inference Engine turned ON or OFF

  • v1 (current) API with 100 queries each with 1 IP Address: "query time": "22.372s"
  • v1b (beta) API with 1 query containing 100 IP Addresses: "query time": "0.639s"

Here again we're seeing a huge decrease in query time. This is where removing multiple TLS handshakes and HTTP connections from the query overhead, together with our in-memory resource reuse, really comes into play.

So how do you start using the new API? We've made it really simple: when performing a query to the v1b API with multiple IP Addresses, place multicheck in the IP field and then provide your IP Addresses in a POST field called ips, with each IP separated by a comma. If you want to use a GET request instead, replace the singular IP with your multiple IP Addresses, also separated by commas. Below we've provided two examples.

GET request example
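Here's a minimal sketch of the GET form in Python (using the requests library); the comma-separated address list replaces the single IP in the URL, and we're assuming the key is passed as a query parameter as in the v1 API:

```python
import requests

# Comma-separated addresses replace the single IP in the URL path.
ips = ["1.1.1.1", "2.2.2.2", "3.3.3.3"]
url = "https://proxycheck.io/v1b/" + ",".join(ips)

response = requests.get(url, params={"key": "your-api-key"})
print(response.json())
```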

POST request example
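And a sketch of the recommended POST form, with multicheck in the IP field and the addresses in a POST field named ips, exactly as described above (key placement is again an assumption):

```python
import requests

ips = ["1.1.1.1", "2.2.2.2", "3.3.3.3"]

# "multicheck" takes the place of the single IP; addresses go in the
# POST field named "ips", comma-separated.
response = requests.post(
    "https://proxycheck.io/v1b/multicheck",
    params={"key": "your-api-key"},
    data={"ips": ",".join(ips)},
)
print(response.json())
```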

You can still use your normal flags with these requests (for example ASN, VPN, Time and Node), and we've introduced a new flag just for v1b called INF. As you can probably guess, it controls our real-time Inference Engine so that you can perform multiple checks faster: provide &inf=0 in your request to disable the engine, as by default it's turned on. A sketch of combining it with the other flags follows below.
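As an illustration, here's the POST sketch from above with flags added as query parameters (the flag names come from this post; everything else is assumed as before):

```python
import requests

# asn/vpn flags enrich the result; inf=0 disables the real-time
# Inference Engine for faster bulk checks.
response = requests.post(
    "https://proxycheck.io/v1b/multicheck",
    params={"key": "your-api-key", "vpn": 1, "asn": 1, "inf": 0},
    data={"ips": "1.1.1.1,2.2.2.2"},
)
print(response.json())
```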

We're limiting multi-checking to 100 IP Addresses per query right now, but we do intend to raise that limit after the feature comes out of beta. We hope you'll all give it a good try and send us your feedback, which you're welcome to do at [email protected]

One last thing we wanted to mention about the new v1b endpoint: it still supports singular IP checks, and the JSON result format for a single IP check is exactly the same as it has always been. You will see the new multi-check format only when performing multi-checks.

And finally, since this is our new API, we're now working on it full time. It has some enhancements that our older API didn't, including better IPv6 support for VPN detection (backported to /v1/ today). We've also moved around where certain checks are performed, so you can now blacklist Google and Cloudflare IP Addresses, ranges and ASNs from your dashboard and have those blacklists adhered to, whereas before they weren't.

Thank you for reading, we hope you're all having a great week and we look forward to hearing your feedback about this new feature.


Realtime Inference Engine Improvements

In our previous update we shared with you the accuracy improvements we've made to our post-processing Inference Engine. This is the part of our service that searches for active proxies within our negative detections and from our Honeypots positioned around the world.

Today we've enhanced our real-time Inference checks, which are performed at the same time as your queries. Prior to today only 1/3rd of our Inference Engine's capability was utilised for real-time checks, due to the time it takes for determinations to be made.

We've now enabled 2/3rds of our Inference Engine's checking capability for real-time queries. Based on our testing, this means 95% of all Inference Engine based determinations on your queries will now occur at the point you perform your query.

That means you're much more likely to receive a complete result the first time you check an IP Address, as opposed to us only detecting an IP as a Proxy Server after you've performed your query and already received a negative detection result.

Prior to today about 65% of our Inference Engine's positive detections were performed in real-time, so this is a rather large increase in real-time detection rates. The final 5% will still be detected in post-processing with the entire Inference Engine enabled, but we're hoping to improve performance further here too, so we can offer it in real-time at a later date.

While we have been able to tune the real-time Inference Engine considerably to allow for 2/3rds enablement, there is a small latency increase of roughly 70ms per query. The thing to keep in mind is that this increase only occurs for what would otherwise be negative detections: if an IP Address has already been run through the Inference Engine, or is otherwise present in our dataset as a Proxy or VPN Server, you won't incur the extra latency. Think of it as an accuracy tradeoff.

As always we're working to improve the performance of the API so we can answer your queries faster and support more queries per second. Improving API performance directly benefits our real-time detection rate, as our detection accuracy correlates with how much CPU time we can spend on each determination.

We hope you found this post interesting, thanks for reading!


Inference Engine and API Improvements

Inference Engine Improvements

Over the past few days we've been working on improvements to our Inference Engine, specifically making it work across more processing threads simultaneously and efficiently while decreasing its rate of false positives.

We recently upgraded our Prometheus server node from a 6-core, 12-thread Xeon to dual 8-core, 16-thread Xeon processors, giving us a total of 32 threads and a giant 50MB of L3 cache. These processors also run at much higher frequencies, with a base clock of 3.6GHz and a maximum turbo clock of 4GHz.

With this new hardware we've seen a dramatic 181% uplift in performance for our multithreaded workloads. Every part of proxycheck.io is multithreaded; it's one of the ways we're able to deliver such high performance for the millions of queries we process daily.

But some of our post-processing, such as the Inference Engine's work to discover new proxies, did have performance hangups under certain scenarios. Tuning software to take advantage of 32 processing threads is not an easy task, and it's a big jump from the 12 threads we were using previously.

We've had to tune the software not just to take advantage of the extra threads, but also to understand NUMA (non-uniform memory access), so that our threads work on data in the RAM attached directly to the processor each thread runs on. We've now completed the rewrites necessary for our new hardware, and we're seeing dramatic performance increases on this node.
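To illustrate the general technique (a hypothetical sketch, not our actual code): on Linux you can pin worker processes to one socket's cores, so that under the kernel's first-touch policy each worker's memory is allocated from that socket's local RAM. The core layout below is made up for the example, and os.sched_setaffinity is Linux-only:

```python
import os
from multiprocessing import Process

# Hypothetical topology: cores 0-15 on NUMA node 0, cores 16-31 on node 1.
NUMA_NODES = {0: set(range(0, 16)), 1: set(range(16, 32))}

def worker(node: int) -> None:
    # Pin this process to one socket's cores; memory it allocates from
    # here on comes from that socket's locally attached RAM.
    os.sched_setaffinity(0, NUMA_NODES[node])
    total = sum(i * i for i in range(10_000_000))  # placeholder workload
    print(f"node {node} done: {total}")

if __name__ == "__main__":
    procs = [Process(target=worker, args=(node,)) for node in NUMA_NODES]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```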

Due to this performance uplift we've been able to spend more time training our Inference Engine, and with the extra computation time available, each determination it makes is more thorough than ever before, which lends itself to increased accuracy.

We're now seeing a 0.02% reduction in false positives in our training models, which equates to 2,000 fewer false positives per 10 million determinations.

API Improvements

Over the past several days we've also been hard at work on the API itself, not only tuning it for our new multi-processor node but also fixing some bugs, including one involving ASNs and special characters. When you receive an ASN result we tell you the provider name, and sometimes these names contain special characters; Telefónica, for example, has an ó in its name. This caused an encoding error during our JSON encoding, which resulted in null results.

We've corrected this by encoding all provider names before the JSON encoding is performed. We believe very few results were affected by this issue, which was caught by our own Ocebot during its automatic API probing.
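To illustrate the class of bug (a simplified sketch, not our production code): if a provider name arrives as raw bytes in a legacy encoding, a naive decode fails before the JSON encoder ever sees it, so the name gets normalised first:

```python
import json

def safe_provider_name(raw: bytes) -> str:
    # Try UTF-8 first; fall back to Latin-1 so a byte like 0xF3
    # ("ó" in Latin-1, invalid as a lone UTF-8 byte) still decodes.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

name = safe_provider_name(b"Telef\xf3nica")  # Latin-1 encoded "Telefónica"
print(json.dumps({"provider": name}, ensure_ascii=False))
```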

That's all we have to share with you today. We hope you all had a great weekend.


Degraded cluster performance

As of right now two of our three node hosts are having peering issues. We're monitoring the situation closely. All of our services are online and available but you may find some queries taking longer than usual to be answered.

EDIT:// Stats within your Dashboards are delayed at the moment while we work on cluster maintenance. Thank you for your patience.

EDIT2:// We have re-enabled stat collection. We believe some stats created by our ATLAS node were not all reported to the cluster accurately, due to its network issues combined with stats accumulating in our database update coalescing cache. Because this cache became full, new stats couldn't enter it, and old stats couldn't be reported accurately due to packet loss, so some stats were lost. This means your query volumes for yesterday and the first two hours of today are lower than they should be (in your favour, as you essentially receive more queries for free).

We apologise for the disruption to our stats feature. We're working to make sure this doesn't happen again by building in secondary disk-based file caching to hold stats that cannot be committed in a timely fashion for later processing.
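For the curious, the general shape of that fix looks something like this (a simplified sketch with hypothetical names, not our production code): stats accumulate in memory, and anything that can't be committed is journalled to disk for later replay rather than dropped:

```python
import json
from collections import Counter

class StatsBuffer:
    def __init__(self, journal_path: str, max_entries: int = 10_000):
        self.counts = Counter()
        self.journal_path = journal_path
        self.max_entries = max_entries

    def record(self, key: str, n: int = 1) -> None:
        if len(self.counts) >= self.max_entries:
            self._spill()              # never refuse new stats; spill old ones
        self.counts[key] += n

    def flush(self, commit_to_db) -> None:
        try:
            commit_to_db(dict(self.counts))
            self.counts.clear()
        except Exception:
            self._spill()              # DB unreachable: persist for replay

    def _spill(self) -> None:
        # Append the pending counters to a disk journal for later processing.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(self.counts) + "\n")
        self.counts.clear()
```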

Thank you.


New dashboard API

Since we added the whitelist and blacklist features to the dashboard in May of this year, we've wanted to add a powerful JSON API for them, enabling you to fully manage your whitelist and blacklist with your own software in an automated way.

It has also become one of the most frequently requested features, and today we're happy to deliver it. If you just want to get integrating, head on over to our API Documentation page and click on the new Dashboard API Information tab.

Using the new API you can list, add, remove, truncate and clear your whitelist and blacklist using standard JSON over GET and POST requests. This enables you to completely manage your whitelist and blacklist without needing to log in to your account, though of course you still can.
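As a quick taste, a pattern like the following would drive the list/add/remove actions from Python. The paths and field names here are placeholders, not the real ones, so please take the actual scheme from the Dashboard API Information tab:

```python
from typing import Optional
import requests

API_KEY = "your-api-key"
BASE = "https://proxycheck.io/dashboard"   # hypothetical base path

def whitelist(action: str, entries: Optional[str] = None) -> dict:
    # Read-style actions via GET, mutating actions via POST with the
    # entries in a form field; both URL and field names are assumptions.
    url = f"{BASE}/whitelist/{action}/"
    if entries is None:
        return requests.get(url, params={"key": API_KEY}).json()
    return requests.post(url, params={"key": API_KEY},
                         data={"data": entries}).json()

print(whitelist("list"))                # fetch current whitelist entries
print(whitelist("add", "8.8.8.8"))      # add an address
print(whitelist("remove", "8.8.8.8"))   # remove it again
```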

We consider this feature beta but stable; there are no outstanding bugs right now, but if you spot any please feel free to message us. We've spruced up our contact page as well.

Thanks for reading and have a great day!


Improving our VPN detection

Over the past few weeks we've been adding many new VPN services to our detection system, and we're now at 250 individual service providers that we detect. The main way we've been going about this is by detecting Data Centers and Service Providers that offer commercial hosting, which enables us to detect all the middle-men VPN providers that rent their servers.

We've also received a lot of leads from our users. It has been made clear to us that VPN detection is very important to our customers. We're incredibly thankful to those of you who have been sending us IP Addresses we missed and service providers we don't yet detect.

While adding many of these VPN services to our database we did find some bugs with our API's VPN database lookups. Specifically, when a VPN provider had a great many addresses, their IPs weren't being detected consistently due to an exhausted key buffer. This was caused by a very simple code error on our part, which we have since resolved. From our data, we believe 5% of all our VPN entries were affected by this bug.

Being a well-rounded service means accurately detecting VPN services and not just Proxy Servers. This is of paramount importance to us, and we are laser focused on adding as many VPN services as possible to our database.

Thank you for reading this post, and please feel free to contact us via Skype, web chat or email to provide us with the IP Addresses or names of VPN services we're not currently detecting; all of this data is incredibly valuable to us.


Dashboard Speedup

Today we've been spending some time going through and auditing code all over the site, mainly CSS, but we've also been looking at ways to improve page load times by reducing image sizes, moving scripts that cause slower page painting, and so forth. One of the areas we looked at was the Dashboard. Sometimes on initial load it could take some time before showing any content beyond the navigation bar. This was due to our use of JavaScript to load the initial content you see, as well as to handle switching between the Dashboard's tabs.

So today we've gone through and altered how this code works, so the content you initially see when you load the Dashboard (or subscribe to a paid plan, where it shows you the Paid Plans tab by default) now loads instantly.

This has a dramatic effect on how the page feels; it loads practically instantaneously now. No one actually complained or requested this change, but it's something we noticed while auditing the website, as we do from time to time.

One of the best parts of our service is the customer dashboard. It's our second most visited page after the home page, so it's important to us that it's well built. We feel the code behind the scenes is written incredibly well and will be easy to maintain going forward, so we're not planning a rewrite of the Dashboard any time soon, just maintenance and new features like our recent ASN support and improved stats exporting.

Thanks for reading and have a great day!


New Code Examples

Today we've expanded our code examples page with new code snippets for Node.js, Python and Ruby. We've also added a curl command-line example. These are all very simple examples to help developers get started, and we now include your API key in all of them (including the C# one) to get you up and running faster.

All of these examples were submitted to us by users of our API, and we're very grateful for your contributions. If there are other languages you'd like included, or you'd like to expand upon any of the examples we've provided, please submit them through our support email address [email protected]; we are more than happy to feature your work on our examples page for other users to benefit from.
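For a flavour of what's on the page, a query in Python can be as short as this (a minimal sketch; the v1 URL format with the key and flags as query parameters is assumed):

```python
import requests

# Check a single IP Address against the v1 API with VPN detection enabled.
r = requests.get("https://proxycheck.io/v1/8.8.8.8",
                 params={"key": "your-api-key", "vpn": 1})
print(r.json())
```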

Thanks for reading and have a great day!


New Service Status Page

When we first added the Status Page it was a great resource for quickly seeing whether any of our server nodes were offline and what their load conditions were. But as our service has evolved we've added so many new things that the status page was no longer serving our needs fully.

So today we've introduced a brand new status page, written from the ground up just for us. Like all of our services it runs on our cluster, so the chances of it being unable to load are very low.

As you can see it has a new interface that lists each of our features separately. And if you hover your mouse over the tables you'll see a bit more detail about what each service is or what it's for.

We are now able to list our backend systems, so we ourselves can get a quick look into issues. And our new status page doesn't just display whether a service is functioning or not; it can also display intermittent issues, high load conditions or anything else that would disrupt service.

And finally, we can list our individual Honeypots, allowing us to see how our IP vacuum cleaners are doing. We hope you like these changes; we feel they were necessary, and they give you a more complete picture of our infrastructure. The new layout also allows us to display more servers in less space, which will come in handy next year.

Thanks.


Should I be using HTTPS to query the proxycheck.io API?

It's a question we get quite often: what is the benefit of using transport security for API queries? We've offered it since the day we launched, but it's not completely obvious why you'd need it, so we're going to explain.


Firstly, HTTPS stands for Hypertext Transfer Protocol Secure. When in use, an encryption algorithm secures your connection to the server you're communicating with; in our case, that means your application server communicating with our application server. All the information our two servers transmit and receive is cryptographically secured, meaning third parties cannot determine what you're sending us and vice versa.

So why would you need this advanced security for what seem, on the surface, quite basic API calls?

Well, the main reason is that it stops your visitors from being tracked by third parties. In the current political climate we have world powers trying to undermine individual personal security at every opportunity. So when you send the IP Address of one of your visitors to proxycheck.io, there is potential for a government agency or other organisation to record that interaction, that is, unless you've used our HTTPS API endpoint.

We could call this kind of information collection metadata: although they don't know what the user was doing on your website, they know the user visited it, and they can link that visit with the user's IP Address in a larger database to track that user and build an overall profile of who that person is and what they do online.

That, we feel, is the main reason you would want to use the HTTPS endpoint. The second reason is your own account security. If you're making an API request to our service as a signed-up customer, you have to supply your API key with every request. If your communications with our server are being intercepted, it is possible for a third party to grab your API key and begin making queries against your account's query allowance.

And that could cost you money or exhaust queries you've already paid for. The only real drawback to using the HTTPS endpoint is the added time it takes to set up the encrypted connection: there are more handshakes, and third parties may need to be consulted about the validity of our encryption certificate, which increases the time it takes for your query to be answered.
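One practical way to soften that cost is to reuse a single HTTPS connection across queries rather than reconnecting each time. For example, in Python with the requests library's keep-alive sessions (URL format assumed from our earlier posts):

```python
import requests

# A Session keeps the underlying connection open, so the TLS handshake
# happens once rather than once per query.
session = requests.Session()
for ip in ["1.1.1.1", "2.2.2.2", "3.3.3.3"]:
    r = session.get(f"https://proxycheck.io/v1/{ip}",
                    params={"key": "your-api-key"})
    print(ip, r.json())
```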

It is for this reason we offer both HTTP and HTTPS endpoints for our API: we're giving developers the choice. We hope this post has been helpful in explaining why we've offered HTTPS to all our customers from the very beginning. Privacy can only be maintained when we all do our part to strengthen it.

