The Impossibility of Perfectly Caching HTTP Range Requests

Posted on 2021-06-04

I was just bitten by range request caching in NGINX. The default behaviour upon receiving a range request is to upgrade that to a full request (effectively stripping the Range header) and then return the requested subset to the client. This can certainly be suboptimal in some cases such as when the client tries to download the small table of contents at the end of a large video file and ends up waiting while NGINX downloads the entire video before answering the request.

This issue can be mitigated by setting proxy_cache_max_range_offset. Which keeps the range header—but doesn’t cache the response—if the start offset of the requested range is greater than the specified value.

This is a pretty bad user experience as there is a huge delay before starting the video, or we are disabling caching. It got me thinking, what better strategies are there?

Range Caching Techniques

I looked at a number of proxies and found common patterns for caching range requests. I separated into 3 primary dimensions that can be used to summarize the basic behaviour of most proxies.

Rounding

Some caches round the requested ranges. Rounding can improve cache hit rate and simplify implementation at the cost of using more origin bandwidth. Additionally, if there is a significant difference between the requested range and start of the range after rounding the client will experience additional latency as the cache downloads “unnecessary” data from the origin before receiving the content relevant to the client.

As an extreme case some proxies simply strip the Range header from the origin request, effectively rounding to the entire file. This uses a lot of unnecessary bandwidth (if the rest of the file is not later needed) and can add massive delay when requesting a range far into the file. For example if the client requests data 10GiB into the file they will not receive any content until the proxy has finished downloading the first 10GiB from the origin. (This is the surprising NGINX behaviour described above.)

Note

In theory, it is possible to avoid the extra latency by generating a Range header with the content that the client wants first. (ex: for a server rounding to 10MiB chunks and a request Range: bytes=12345678-20000000, the proxy could request Range: bytes=12345678-20971519,10485760-12345677 from the origin) However this is unlikely to work in practice as many servers will ignore or reject requests like this. Such a rejection is legal and even somewhat suggested by the RFC.

“A client SHOULD NOT request multiple ranges that are inherently less efficient to process and transfer than a single range that encompasses the same data.” While requesting ranges in this way may be more efficient overall it is almost certainly less efficient for the origin.
“[A server] MAY ignore or reject a Range header field that consists of […] many small ranges that are not listed in ascending order, since [it is an indication] of either a broken client or a deliberate denial-of-service attack”. I guess we aren’t requesting “many” ranges out of order but that term is not precisely specified and some real-word proxies ignore all out-of-order requests.

Although we are not fully against the RFC because it does say that a client may suggest ranges out-of-order if “there is a specific need to request a later part earlier”. Which in this case applies because we need to request the data the client requested first to provide the lowest latency.

However, even if out-of-order requests are supported most systems are designed for streaming responses in order, so this may have unacceptable overheads anyways.

Match Mode

There are a couple of techniques used to match the request to the cached content.

None

Ranges are not cached. The range is passed onto the origin server without the response being stored in the cache.

Some caches have mixed behaviour where even though they do not cache partial responses they can respond to range requests if the full resource is in the cache.

Exact

The requested range is stored in the cache key. This means that a cached response will only be used for future requests with the exact same range. This works excellently when clients are frequently requesting the same ranges, and especially well when the same ranges are always used to download a file. For example if serving video files where the player always buffers 1MiB chunks then it can be effectively optimal. However, if clients request ranges that overlap, redundant data will be fetched from the origin and stored in the cache.

For example if a request for Range: bytes=0-99 is made it will be passed to the origin and cached. If another request comes in for Range: bytes=0-99 the cache will be used. However, if a request is then made for Range: bytes=0-49 it will miss the cache and be passed to the origin. Afterwards bytes 0-49 will be stored twice in the cache, consuming valuable space.

Full

All data will only be stored in the cache once, and the request will be fulfilled from the cache if the requested data is available. If only some of the requested content is in the cache the rest will be fetched according to the Fill Mode.

For example if a request for Range: bytes=0-99 is made it will be passed to the origin and cached. If a request is then made for Range: bytes=0-49 is made it will be fulfilled from the cache.

Fill Mode

There are two main ways that caches fetch content from the origin. The simplest is “full fill” whereupon a cache miss for any content the full requested range is requested from the origin. An alternative approach is “partial fill” where only the missing data is requested. Partial fill can be more efficient but has hard to manage consistency issues if the resource is changing, if the response does not match the already cached data the proxy will need to make another request. Furthermore, nothing guarantees that the second request matches the first! In the extreme case of the origin returning a random ETag you would never be able to build a consistent response. The only way to guarantee a consistent response is to fetch the entire requested range in a single request. In practice range requests from quickly changing files are rare, but I’m sure that you could get a PhD analyzing the performance of different fallback behaviours.

If the origin server was determined it could ensure that the fallback was not necessary by keeping around old versions of the file even after a new version is available. This way the old content can be served for partial requests based on the If-Range or If-Match headers, allowing partial updates until the previously cached content becomes stale.

What are CDNs Doing?

I read the docs of some CDNs to see how they handle Range requests. The landscape is quite varied, if anyone can test the behaviour of various CDNs I would be happy to update this list.

Here is a table of what I could identify. Notes with sources are below.

Proxy	Rounding	Matching	Fill
Amazon CloudFront	Yes	Full	Full
Cloudflare	Entire File	-	-
Fastly	Fixed Chunks	Full	Partial
Google Cloud CDN	Fixed Chunks	Full	Partial
NGINX default	Entire File	-	-
NGINX http_proxy_module	Fixed Chunks	Full	Partial
NGINX proxy_pass Range	None	Exact	-
NGINX proxy_no_cache	None	None	-

Amazon CloudFront

The docs say that they may perform rounding, the exact nature is not described.

How CloudFront Processes Partial Requests for an Object (Range GETs) - Amazon CloudFront

Cloudflare

I couldn’t find any docs for how Cloudflare handles Range requests! However, based on my own testing they appear to use the default NGINX behaviour of “rounding” to the entire file.

Fastly

If the partial fill fails they appear to return an error on ETag mismatch based on the “Flapping origin entity” error response.

Segmented Caching | Fastly Help Guides

Google Cloud CDN

Google Cloud CDN has some unique behaviour. Notably it will never cache the first Range request for a given cache key. However, if the server supports Range requests along with a strong ETag it will then cache future requests. Fill is done via multiple requests of 2MiB-16B each.

I couldn’t find anything concrete about how partial fill mismatches are handled. This page seems to imply that a full fill is done on ETag mismatch.

Caching overview | Cloud CDN | Google Cloud

NGINX

NGINX can be configured a number of ways. The http_slice_module module can be used for fixed-size chunk rounding. proxy_pass Range and adding $http_range to the cache key can be used to perform exact matching. Alternatively the cache can be disabled for range requests.

Smart and Efficient Byte-Range Caching with NGINX

What is the Best Strategy?

There is no perfect strategy, especially when you consider implementation complexity. However, there are definitely some options that are better than others in the vast majority of cases.

Rounding

“Rounding” to the entire file is behaviour that performs very poorly in common scenarios, but rounding to a fixed chunk size is a reasonable tradeoff. It can make implementation simpler and avoids issues such as needing to request a couple of bytes between cached ranges. Just be sure that the chunk size is small enough that the extra first-byte latency isn’t significant.

Matching

Full matching is clearly superior as it avoids unnecessary origin fetch and avoids wasting cache capacity on duplicate data.

However, if you are rounding to fixed boundaries and slicing the content Exact and Full become equivalent, so rounding can be an effective way to simplify the matching implementation.

Fill

In general partial fill is superior however in edge cases the behaviour can be suboptimal with extra requests leading to a full fill anyways. I think a perfect solution would use a set of heuristics (or configuration) to select the best behaviour for various circumstances. For example if only a couple of bytes are already cached it probably makes sense to perform a full fill to avoid the risk of validation failure. However, if you have 900MiB/1Gib cached it is likely worth trying the partial fill as even if the resource has changed the overhead of an extra request will be minimal compared to the transfer. Other heuristics include estimating churn rate per file or directory to guess how likely a match is.

Another option is sending one request with If-Match and one request with If-None-Match in parallel. Most of the time exactly one response will return content. In the rare case that neither return you can resend a full fill and in the rare case that both return content you can abort one.