Setting up Nginx caching for API use

posted on 2013-07-20

Most HTTP basic auth secured APIs on the internet limit the number of requests you are allowed to make. Nginx can help by doing two things: caching the responses and removing the need for authentication on the client side.

First, install nginx and add the basic proxy configuration: the cache path (where to store the data) and a location that proxies to the upstream URL:

proxy_cache_path  /var/cache/nginx/yourapi levels=1 keys_zone=yourapi:10m;

server {

    root /srv/http;
    index index.html index.htm;

    server_name localhost;

    location / {}

    location /yourapi {
        proxy_cache yourapi;
        proxy_pass https://webservices.example.com/example/api;
        proxy_set_header Authorization "Basic SECRET_BASE64_ENCODED_STRING";
    }

}

The first line sets up a cache location where responses will be stored, with a single directory level of depth (levels=1). Note that the 10m in keys_zone is not a time: it is the size of the shared memory zone that holds the cache keys, roughly 10 megabytes. None of this is that important yet.
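
If you do want more control over the on-disk cache, proxy_cache_path also accepts an inactive parameter (how long an entry may go unrequested before it is removed) and a max_size parameter. A sketch with made-up values:

proxy_cache_path  /var/cache/nginx/yourapi levels=1 keys_zone=yourapi:10m inactive=60m max_size=1g;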

The important part is the basic auth header you set on the request. To acquire the secret base64-encoded string, you can simply use Chrome's developer tools or Firebug: perform an API request and inspect the Authorization header of the outgoing request in the Network tab.
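
If you know the username and password, you can also generate the string yourself. A minimal sketch in Python, assuming the hypothetical credentials apiuser and apipassword:

import base64

# Hypothetical credentials; replace with your real API username and password.
credentials = "apiuser:apipassword"
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")

# Prints the value to use in the Authorization header.
print("Basic " + token)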

If you need to add more headers to your request, you can just add extra proxy_set_header lines.
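
For example, the location block could also forward an Accept header; a sketch (the extra header values here are made up, adjust them to whatever your upstream expects):

    location /yourapi {
        proxy_cache yourapi;
        proxy_pass https://webservices.example.com/example/api;
        proxy_set_header Authorization "Basic SECRET_BASE64_ENCODED_STRING";
        # Hypothetical extra headers for the upstream request.
        proxy_set_header Accept "application/json";
        proxy_set_header X-Custom-Header "some-value";
    }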

With this configuration Nginx will proxy requests for http://yourserver/yourapi to https://webservices.example.com/example/api. You should now be able to work with the API without having to add your password.
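
A quick sketch of a client in Python, assuming your Nginx host is yourserver and the API returns JSON:

import json
import urllib.request

# Hypothetical proxy URL; replace yourserver with your Nginx host.
response = urllib.request.urlopen("http://yourserver/yourapi")

# Assumes the upstream API responds with JSON.
data = json.loads(response.read().decode("utf-8"))
print(data)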

But we are not there yet: we still have no control over how long responses will be cached. This is because, as documented, the cache validity is based on the response headers of the upstream API, and it's very likely that your API will say not to cache at all, or only for a short time.

To take full control of the cache timeout Nginx uses, we add two lines:

proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
proxy_cache_valid any 30s;

This will ignore all of the cache-related headers from upstream and keep the cache valid for 30 seconds. This means that as you bombard Nginx, a request will only go out to upstream every 30 seconds. But there is another problem: it won't be just one request, because multiple upstream requests can be fired when multiple clients hit the same stale cache entry at the same time. We are still missing two pieces of the puzzle. First, only one request should go upstream at any time:

proxy_cache_lock on;

and second, clients that arrive while the cache is being updated should be served the stale entry instead of having to wait for upstream:

proxy_cache_use_stale updating;
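
If you also want stale responses served when upstream errors out or times out, the directive accepts more conditions; a variant sketch:

proxy_cache_use_stale error timeout updating;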

After adding all these extra lines, you can bombard Nginx and it will only send a single request upstream every 30 seconds, while still responding quickly with stale data. The last piece of the puzzle is a header that asks clients of your proxy API not to cache the response themselves:

add_header Cache-Control "no-cache, must-revalidate, max-age=0";
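
While testing, it can be useful to see whether a response was served from the cache. Nginx exposes this in the $upstream_cache_status variable, which you can surface as an extra response header; a small sketch:

add_header X-Cache-Status $upstream_cache_status;

The header will carry values such as MISS, HIT, UPDATING or STALE depending on how the response was served.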

That's it, all bases are covered and you can now start programming multiple clients for the proxy API without going over your API usage limit or having to add the password to each of your clients.

The complete configuration you end up with is:

proxy_cache_path  /var/cache/nginx/yourapi levels=1 keys_zone=yourapi:10m;

server {

    root /srv/http/yourapi;
    index index.html index.htm;

    server_name localhost;

    location / {
    }

    location /yourapi {
        add_header Cache-Control "no-cache, must-revalidate, max-age=0";

        proxy_cache yourapi;
        proxy_cache_use_stale updating;
        proxy_cache_lock on;
        proxy_cache_valid any 30s;
        proxy_ignore_headers X-Accel-Expires Expires Cache-Control;

        proxy_pass https://webservices.example.com/example/api;
        proxy_set_header Authorization "Basic VGhhbmsgeW91IGZvciByZWFkaW5nIGJuZWlqdC5ubAo=";
    }

}
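
To convince yourself the cache works, a quick test sketch in Python (the proxy URL is a placeholder): it requests the proxy in a loop and prints the response time, which should stay low except when the 30 second window expires and a fresh upstream request is triggered:

import time
import urllib.request

# Hypothetical proxy URL; replace with the address of your Nginx server.
URL = "http://yourserver/yourapi"

for _ in range(10):
    start = time.time()
    urllib.request.urlopen(URL).read()
    print("request took %.3f seconds" % (time.time() - start))
    time.sleep(5)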