HTTP/2 Server Push and Cache-Digest

Imagine you're at a library looking for sources for your university homework. You see a book about the topic you're interested in and start reading. After a few pages, there's a reference to an earlier book that sounds interesting. So you go back to the bookshelves, find the other book, and continue reading. Soon you come upon another intriguing reference. This keeps happening, and you end up needing a handful of round-trips to the bookshelves to finish your research.

Now imagine you come back the next day to research a different topic. This time, as soon as you grab a book, a friendly librarian approaches you and hands you a stack of other books they think might interest you. You start reading, and this time, all the references you need are already on your desk. This saves so much time that you can go home early that day!

That's (more or less) what HTTP/2 Server Push does. When you open a website, your browser first requests an HTML file from the server. That file usually contains references to other files (stylesheets, scripts, images, etc.). Without Server Push, the browser needs several round-trips to the server before the website is entirely loaded.

If the server supports Server Push, it will send those extra files right along with the initial HTML page. When your browser parses the HTML and realizes it needs a certain file, that file is already there!

Cache Awareness

Server Push is a great way to make the first page load more efficient, but what happens when the same user opens the page again?

At that point, most files will already be in the browser cache. If the server pushed everything again, it would cause unnecessary traffic. (Imagine the librarian handing you another copy of a book you already have!)

Ideally, the server should push only those files that aren't in the browser cache or that have changed since being cached. To achieve that, we need to compare the browser's cache contents to the files on the server. This is addressed in Cache Digests for HTTP/2 (draft 01). Unfortunately, that's just a draft and has never been standardized. As a workaround, you can use a (non-standard) Cache-Digest header.

Cache-Digest is a custom HTTP request header that contains a summary of your cache contents. What does that summary look like? It could be a mapping of file names to file hashes, like this:

Cache-Digest: /script.js=7c945a00df91db1004abe6821518e9b04d43f814, /style.css=8f2456093274e34b4c975d39f80b5dde99d06531

Some frameworks, e.g. Angular, add the hash of a file's contents to the file name. You get files like main-es5.18254b677f51d2bd0eaa.js. In that case, it's enough to specify just the file names.
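As a rough sketch, a header value in either of these two forms could be assembled from a map of paths to content hashes. The function name buildCacheDigest and its namesOnly option are made up for illustration, and the "path=hash" format is the ad-hoc one from this article, not part of any standard.

```javascript
// Hypothetical helper: turns a Map of path -> content hash into the
// ad-hoc "path=hash, path=hash" value used in this article.
// With namesOnly set, only the paths are listed (for hashed file names).
function buildCacheDigest(entries, { namesOnly = false } = {}) {
  return [...entries]
    .map(([path, hash]) => (namesOnly ? path : `${path}=${hash}`))
    .join(", ");
}

const cached = new Map([
  ["/script.js", "7c945a00df91db1004abe6821518e9b04d43f814"],
  ["/style.css", "8f2456093274e34b4c975d39f80b5dde99d06531"],
]);

// Full form: every path together with its content hash
console.log(buildCacheDigest(cached));
// Short form: paths only, for file names that already contain a hash
console.log(buildCacheDigest(cached, { namesOnly: true }));
```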

The more files you have, the larger that header value becomes. That could be a problem for large sites with lots of extra files - the additional cost of sending the header might outweigh any benefits you get from Server Push. A good strategy for those sites is to use a Bloom filter or a Golomb-coded set. Those are compact, probabilistic data structures that summarize a large set of hashes in far less space than listing every hash (a Bloom filter even has a fixed size, no matter how many entries you add). The caveat: they can report false positives - which means the server might end up not pushing a file because it thinks that file is already cached, when in reality it isn't. In that case the browser simply requests the file the normal way, so the compromise might be worth it in exchange for a smaller header size.
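To make the idea more concrete, here is a minimal Bloom filter sketch. It is not the h2o implementation and not taken from the draft: the FNV-1a hash, the double-hashing scheme, the default sizes and the base64 encoding are all assumptions picked to keep the example short.

```javascript
// FNV-1a, a simple 32-bit string hash. The seed lets us derive a second hash.
function fnv1a(str, seed = 0) {
  let h = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class BloomFilter {
  constructor(bits = 256, hashes = 3) {
    this.bits = bits;
    this.hashes = hashes;
    this.bytes = new Uint8Array(Math.ceil(bits / 8));
  }

  // Derive the bit positions for a key via double hashing.
  positions(key) {
    const h1 = fnv1a(key);
    const h2 = (fnv1a(key, 0x9e3779b9) | 1) >>> 0; // odd step size
    const result = [];
    for (let i = 0; i < this.hashes; i++) {
      result.push((h1 + i * h2) % this.bits);
    }
    return result;
  }

  add(key) {
    for (const p of this.positions(key)) {
      this.bytes[p >> 3] |= 1 << (p & 7);
    }
  }

  // true means "probably cached" (false positives are possible),
  // false means "definitely not cached".
  mightContain(key) {
    return this.positions(key).every(p => (this.bytes[p >> 3] & (1 << (p & 7))) !== 0);
  }

  // The encoded size depends only on the number of bits, not on the entries.
  toHeaderValue() {
    return btoa(String.fromCharCode(...this.bytes));
  }
}

const filter = new BloomFilter();
filter.add("/script.js");
filter.add("/style.css");
console.log(filter.mightContain("/script.js")); // true: added keys are always found
console.log(filter.toHeaderValue().length);     // 44 characters for 256 bits
```

The more entries you add to a filter of a given size, the more bits are set and the more likely a false positive becomes, so the number of bits has to be chosen with the expected number of cached files in mind.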

There's an open-source implementation of the Golomb-Coded Set strategy by the developers of the h2o web server.

How do you actually set the Cache-Digest header? You can't set it from your normal JavaScript files, because those will only run when the browser has already started loading your page. You want to add that header before that happens!

A good way to achieve that is to use a Service Worker (see: Service Workers: an Introduction). The Service Worker can make sure all the necessary files are cached, and it can intercept requests and add the Cache-Digest header. Here's a simple example:

self.addEventListener('fetch', evt => {
      const req = evt.request;
      // Only GET requests are handled here; everything else falls through to the browser
      if (req.method !== "GET") {
          return;
      }
      evt.respondWith(caches.open("v1").then(cache => {
          if (req.mode === "navigate") {
              // "navigate" means this is a page load request, so we add the Cache-Digest header
              return generateCacheDigests(cache).then(digest => requestWithDigests(req, digest));
          }
          // Otherwise, just run the request normally
          return requestWithoutDigests(req);
      }));
});

I've left out the boring implementation details here. What's important is that the header only needs to be added on GET requests with mode = "navigate" (i.e. page loads).
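For the curious, the three helpers used in the example could look roughly like this. This is only a sketch under a few assumptions: the digest is a plain comma-separated list of paths (which only makes sense when file names contain a content hash), and withDigestHeader is a made-up helper that isn't part of the example above.

```javascript
// Summarizes the cache as a sorted, comma-separated list of paths.
// Assumption: file names contain a content hash, so names alone are enough.
async function generateCacheDigests(cache) {
  const requests = await cache.keys();
  return requests.map(r => new URL(r.url).pathname).sort().join(", ");
}

// Hypothetical helper: copies a request and adds the Cache-Digest header.
// Request headers can't be changed in place, so we build a new Request.
function withDigestHeader(req, digest) {
  const headers = new Headers(req.headers);
  headers.set("Cache-Digest", digest);
  return new Request(req, { headers });
}

function requestWithDigests(req, digest) {
  return fetch(withDigestHeader(req, digest));
}

function requestWithoutDigests(req) {
  return fetch(req);
}
```

Note that copying a request with mode "navigate" this way yields a request with mode "same-origin", since scripts aren't allowed to construct navigation requests themselves.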

Since you're using a Service Worker, you can also explicitly control which files should be cached and for how long. Let's say you update your site regularly, so you set the cache expiry to one month. When a user first opens your page, any additional assets will be sent with a Cache-Control header that looks something like this:

Cache-Control: max-age=2629800

(2629800 seconds is one month).

Let's say that same user loads your page again after a bit more than one month. But this time, you've been on vacation and your site hasn't changed. Normally the user's browser would still request those assets again, because it can't be sure that they're valid anymore. It would send a conditional request with a header like this:

If-Modified-Since: Fri, 26 Nov 2021 07:28:00 GMT

The server would check if the file has changed since then, see that it hasn't, and respond with the following:

304 Not Modified
Last-Modified: Fri, 12 Nov 2021 09:45:00 GMT

That's already pretty good because the actual file content won't be transmitted. But it's still one extra round-trip for each additional asset. With a Cache-Digest, we can do better: upon the initial request, the server compares the file hashes in the header with the hashes of its current files. For every file whose hash still matches, the server pushes the above 304 Not Modified response. In other words, the server tells the client that the file hasn't changed, before the client even realizes it needs that file at all!
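On the server side, that comparison could be sketched as follows. The function names parseCacheDigest and planPushes, the short example hashes and the "path=hash" header format are assumptions for illustration; how the resulting plan is turned into actual pushes depends entirely on your server.

```javascript
// Parses the ad-hoc "path=hash, path=hash" header value into a Map.
function parseCacheDigest(header) {
  const digest = new Map();
  for (const part of (header || "").split(",")) {
    const entry = part.trim();
    const eq = entry.lastIndexOf("=");
    if (eq > 0) {
      digest.set(entry.slice(0, eq), entry.slice(eq + 1));
    }
  }
  return digest;
}

// For each asset the page needs, decide what the server should push:
// 304 if the client's cached hash still matches, 200 (the full file) otherwise.
function planPushes(header, currentHashes) {
  const digest = parseCacheDigest(header);
  return Object.entries(currentHashes).map(([path, hash]) => ({
    path,
    status: digest.get(path) === hash ? 304 : 200,
  }));
}

const plan = planPushes(
  "/script.js=aaa111, /style.css=bbb222",
  { "/script.js": "aaa111", "/style.css": "ccc333", "/logo.svg": "ddd444" }
);
console.log(plan);
// /script.js is unchanged (304), /style.css has changed (200),
// and /logo.svg isn't in the client's cache at all (200)
```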

That's great but nobody uses it

There's a great discussion about HTTP/2 Server Push in the Chromium developer Google group. Tl;dr: Less than 0.05% of sites use it at all - and of those that do, many push assets the browser doesn't actually need. Server Push can make sites more efficient, but it has a far smaller impact than other HTTP/2 features. It's difficult to implement and easy to mess up.

If you want to add Server Push to your site anyway, there are several options. nginx and h2o can both push assets based on a Link header with rel=preload in the upstream response. This lets you control Server Push from an upstream application by simply setting a header value.
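As a small sketch, an upstream Node.js application could build such a header value like this. The helper name buildPreloadLink and the asset list are made up for illustration, and whether the web server in front of your application actually turns the header into a push depends on its configuration.

```javascript
// Hypothetical helper: builds a Link header value that marks assets for preloading.
function buildPreloadLink(assets) {
  return assets
    .map(({ path, as }) => `<${path}>; rel=preload; as=${as}`)
    .join(", ");
}

const link = buildPreloadLink([
  { path: "/style.css", as: "style" },
  { path: "/image1.jpg", as: "image" },
]);
console.log(link);
// </style.css>; rel=preload; as=style, </image1.jpg>; rel=preload; as=image

// In a Node.js request handler, the value could then be sent with:
//   res.setHeader("Link", link);
```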

See the instructions for nginx and instructions for h2o. You can test your setup using nghttp2.

In the output, the asterisk means that a resource was pushed by the server.

$ nghttp -ans https://example.com/demo.html
id  responseEnd requestStart  process code size request path
 13    +84.25ms       +136us  84.11ms  200  492 /demo.html
  2    +84.33ms     +84.09ms    246us  200  266 /style.css
  4   +261.94ms     +84.12ms 177.83ms  200  40K /image2.jpg
  6   +685.95ms *   +84.12ms 601.82ms  200 173K /image1.jpg


If you made it here, thanks for reading! You now know about another obscure web feature that almost nobody uses. Happy holidays!