Using Varnish on the live streaming delivery platform

When building the live platform, we focused our attention on the delivery stack. We searched for software capable of delivering small files (HLS/HDS manifests) as well as large files (Full-HD video fragments) efficiently, with the following requirements:

  • Highly configurable,

  • Able to handle 10Gbps of traffic with bursts of up to 2,000 connections per second,

  • An architecture that allows us to write our own modules or plug-ins,

  • And most importantly for our use case, support for cache locking (also known as request coalescing).

With this list in mind, we decided to use Varnish, a high-performance HTTP accelerator with its own configuration language called VCL. Servers running Varnish instances are placed in front of the back end servers and directly serve requests coming in from video players.

Caching behavior

Because of the nature of live streaming, every viewer is synced to the same point of the video, which results in the same video fragment being requested at the same time by everybody.

The cache locking system included in Varnish ensures that when a requested file is not in cache, a single request is made to the back end regardless of the number of pending requests for that specific file. This greatly reduces the bandwidth used between the back end and the caching layer.
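To give an illustrative order of magnitude (the numbers are made up): if 5,000 players request the same 2 MB fragment before it is cached, the back end serves it once (about 2 MB) instead of 5,000 times (about 10 GB).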

We also found that the same video fragment could be cached more than once under different keys. The problem was caused by the many query-string parameters used to manipulate the HLS manifest, so we introduced a whitelist of the parameters used to build the cache key.

We started by implementing it directly in VCL but ended up with a giant, unreadable army of IF statements. We switched to the open-source querystring VMOD to keep things under control.

import querystring;

sub vcl_recv {
    ...
    # Keep only the whitelisted query-string parameters in the URL
    set req.url = querystring.filter_except(req.url, ...);
    return (hash);
}

sub vcl_hash {
    # Sort the remaining parameters so equivalent URLs produce the same hash
    hash_data(querystring.sort(req.url));
    return (lookup);
}
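
With both in place, requests that differ only in parameter order or in extra, non-whitelisted parameters resolve to the same cache object. For example, with a hypothetical whitelist of auth and start, these two URLs end up under a single key:

/live/fragment_42.ts?start=10&auth=abc&player=web
/live/fragment_42.ts?auth=abc&start=10
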
Streaming mode

Varnish offers a streaming mode that allows the response body to be sent to the client while it is still being fetched from the back end. The lack of distinction between back end and front end in Varnish 3 resulted in an undesired side effect: the content had to be read and sent in full to the first client before it could be sent to all other clients. If that client was slow to read, it could introduce a delay long enough for all the other clients to drain their play buffers.

With the refactoring of the back end architecture in Varnish 4, this problem is history, and we re-enabled the option as follows:

sub vcl_backend_response {
    ...
    # Stream everything except error responses
    if (beresp.status != 404 && beresp.status != 403) {
        set beresp.do_stream = true;
    }
    ...
}

Varnish tuning

Since a live stream can be scheduled to start at a fixed hour, we face situations where a server sees its number of requests jump from zero to a very high value in a very short space of time.

The first thing we did was to take a look at the thread configuration options (a sample varnishd invocation follows the list):

  • thread_pool_add_delay: We cannot afford requests being queued because of worker thread creation time, so we set this value to 0.

  • thread_pool_min: We warm up the Varnish instance by setting this value to 1500, to handle request spikes when a server comes back into production.

  • thread_pools: Sorry, but you will not find the magic value here :) After discussions with the Varnish Software techs, we set this value to 2.
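
As a rough sketch of how these values are applied (the listen address and VCL path below are illustrative; only the -p parameters reflect the tuning above), they can be passed to varnishd at startup:

# Illustrative startup line; the -p values match the tuning described above
varnishd -a :80 -f /etc/varnish/live.vcl \
    -p thread_pools=2 \
    -p thread_pool_min=1500 \
    -p thread_pool_add_delay=0

They can also be adjusted at runtime with varnishadm param.set.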

Monitoring

We monitor each Varnish instance by checking critical metrics from varnishstat's output (a sample command is sketched after this list):

  • MAIN.threads_limited: This value must always be 0 (or not even appear in your output). A threads_limited greater than 0 means that Varnish could not create new threads to serve requests.

  • MAIN.cache_hit and MAIN.cache_miss: We calculate the cache hit ratio to determine that the server is doing its job. If the hit ratio falls below the warning threshold, the server is automatically removed from the production pool.

  • MAIN.thread_queue_len: The number of requests waiting to be handled by a worker. A positive value triggers an alert, as it implies that fragments may not be delivered in time.

  • MAIN.backend_fail, MAIN.sess_fail and MAIN.sess_dropped: for obvious reasons!
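
As a sketch, such a check can pull all of these counters in one shot, with the hit ratio computed as cache_hit / (cache_hit + cache_miss):

# One-shot output (-1) restricted to the fields we alert on
varnishstat -1 -f MAIN.threads_limited -f MAIN.thread_queue_len \
    -f MAIN.cache_hit -f MAIN.cache_miss \
    -f MAIN.backend_fail -f MAIN.sess_fail -f MAIN.sess_dropped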

In addition, we have a set of aberration checks on our production graphs using Graphite functions. Those checks do not kick a server out of production, but send an event to the on-duty team.
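For instance, a check of this kind can be written as a Graphite target using the standard holtWintersAberration function; the metric path below is hypothetical:

# Hypothetical metric path; returns the deviation outside the Holt-Winters confidence bands
holtWintersAberration(keepLastValue(varnish.pop-*.cache_hit_ratio))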

What's next?

As live streaming at Dailymotion is used more and more around the world, we have launched new points of presence (PoPs) to enhance our QoS. Rack space is at a premium in the new PoPs, so to deliver the same bandwidth we are working on a Varnish instance able to serve content at a throughput of up to 20Gbps. The current version stops serving requests at around 14Gbps; in fact, it crashes. We are looking at ways of fixing this, including testing the latest version, analyzing the source code to optimize it for our use case, running two Varnish daemons on the same server, and so forth.

We have also observed that as VOD delivery moves to TLS, live delivery has to follow the same path. Varnish itself does not handle TLS, but Varnish Software has released Hitch, a scalable TLS proxy. We will benchmark it against other well-known proxies (such as Nginx and HAProxy) to see which one best fits our needs.