
Opinionated view of a 3rd generation serving framework

Or how to get 1 second render times on complex web pages …

In the previous post, we saw how internet serving architectures can be classified into different generations and what each generation implies. We also observed that, from the 3rd generation onward, concurrency becomes central to building a performant serving system.

For complex web pages, like YouTube, the question then arises of how to organize the code so that it stays easy to write while still retaining the performance benefits. In this post we look at one such idea, which has proven to work well in practice.

Problem Definition

What do we mean by 1 second render time? It means that once the user types in the URL, the web page is visible and ready to interact with within 1 second. There is a lot of content on the web today, like this blog post or this book or this Google developer documentation, that talks about browser page rendering from a front-end perspective.

But I’ve rarely seen folks talk about what it means from a backend web-server perspective: how to generate and organize the content in the first place.

Let’s start by quantifying the problem.

Modern browsers support the Navigation Timing API in JavaScript, which captures and exposes the performance of a given page load. The excellent Navigation Timing Bookmarklet project provides a bookmarklet that you can drag into your bookmarks toolbar; once a page has loaded, click it to visualize the various stages at a high level.

Here, for example, is what one of the page loads on this blog looks like:

[Figure: Navigation Timing Bookmarklet breakdown of a page load on this blog]

As one can see, the whole page load can be broken into three distinct phases: Connect, Request, and Render.

Most of the documentation, like the references above, concentrates on how to organize the HTML content so as to reduce the time spent in the Render phase. In this post, we will take a stab at the Request phase; in other words, how to structure the backend code so that the combination \( Connect + Request + Render \) is 1 second or less.

The timings above are quite generous, as this blog uses a 5th generation architecture, which means we are mostly serving a static page, and the page was loaded from within the US.

The Connect phase is mostly driven by network latency and DNS lookup time. Data from AT&T and Verizon puts the round trip at 30 to 50 ms within the continental US and about 100 ms across the Atlantic. So the rule of thumb is to assume 250 to 300 ms for connection setup, allowing for international traffic and the SSL handshake.

Similarly, assuming we do all the “right” things for the Render phase, including caching and edge servers for the various assets, we can assume 300 to 400 ms (covering JavaScript startup and DOM rendering) for the browser to start rendering the page and make it visible to the user after fetching the content.

Setting aside around 100 ms for transferring the content (latency and bandwidth), to be perfectly safe, we are looking at a Request time of 250 to 300 ms to generate the web page.
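As a rough check of that figure, the Request budget is whatever remains of the 1 second target once the other phases are subtracted. Treating the roughly 100 ms set aside for transfer as its own term \( T_{Transfer} \), and taking the midpoints of the ranges above:

\[
T_{Request} = 1000 - T_{Connect} - T_{Render} - T_{Transfer} \approx 1000 - 275 - 350 - 100 = 275 \text{ ms}
\]

The pessimistic end of each range (300, 400, and 100 ms) leaves 200 ms, and the optimistic end (250, 300, and 100 ms) leaves 350 ms, so 250 to 300 ms sits in the middle of what the estimates allow.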

Approach

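Assuming a 3rd generation or later architecture, rendering the HTML content for the web page again consists of three phases: Parse (the incoming request), Fetch (the resources the page depends on), and Generate (the HTML itself).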

Going back to the YouTube example, if we consider it as being made of three main sections, namely the header, the left-nav, and the central stream, we can imagine three services, one for each section. The problem, though, is that HTML has no concept of these sections or components. So we have an impedance mismatch between how we would like to think about the page and how the HTML document wants it to be described.

We first need to address this problem.

One of the most elegant solutions to this problem is the concept of a Widget from the Yesod Haskell web framework. We can think of a widget as an interface that exposes the various parts of the web page, so that a higher-level orchestrator can stitch them together correctly. For example, in Go, it could look something like:

type Widget interface {
    // header information
    getCss()           []string  // CSS file links
    getJsSrc()         []string  // JS script source links
    getHeaderContent() []string  // other header content like inline CSS, script, meta-tags, etc.
    getTitle()         string    // (optional) only if this widget is providing the title 

    // body information
    getBody()          string    // body content
    getBodyScripts()   []string  // any scripts that need to be added at the end of the body
}

or in Java as

public interface Widget {
    // header information
    public List<String> getCss();            // CSS file links
    public List<String> getJsSrc();          // JS script source links 
    public List<String> getHeaderContent();  // other header content like inline CSS, script,
                                             //    meta-tags, etc.
    public String       getTitle();          // (optional) only if this widget is providing
                                             //    the title
    // body information
    public String       getBody();           // body content
    public List<String> getBodyScripts();    // any scripts that need to be added at the end of
                                             //    the body
}

This allows the main driver of the framework, after parsing the request, to identify the list of widgets that make up the current page and then call the appropriate methods during the generation phase to stitch the HTML together in the right order.

It also allows the framework to do optimizations like an early flush, sending the head of the document as soon as all the header information is available, to further improve the perceived performance of the web page.

While this solves the impedance mismatch problem, it does not address the performance of generating the HTML. If we budget around 50 ms for the \( Parse + Generate \) phases (not unreasonable, as they mostly do in-memory computation on local data), we are left with around 200 ms for fetching the resources.

Suppose we have to fetch, say, 5 resources to render the page (I know we said three above, but bear with me; the number grows once we consider that each widget may itself be made of additional widgets, and yes, it’s turtles all the way down). With a budget of 200 ms and a sequential fetch of each resource, the mean latency of each resource cannot exceed 40 ms ( \(200 \div 5\) ).

On the other hand, if we fetch these resources concurrently, the budget rises to 200 ms for each resource! It’s far easier to build resources that return data in 200 ms than ones that must do it in 40 or 50 ms. In addition, with concurrent fetching, the latency budget per resource does not drop significantly as the number of resources grows, whereas in the sequential case it shrinks drastically with each additional resource.

So, can we have our cake and eat it too? That is, can we solve both the performance problem and the impedance mismatch problem? Turns out we can! To do so, let’s extend the Widget specification with two pairs of methods: one pair to get the list of resource URLs to fetch for rendering the header content and to receive the fetched data, and another pair to do the same for the body, like so:

type Widget interface {
    // definitions as above ...

    getHeaderResourceUrls() []string          // list of URLs to fetch for rendering the header
    setHeaderDataReceived(data *ResourceData) // data fetched for header resource URLs

    getBodyResourceUrls()   []string          // list of URLs to fetch for rendering the body
    setBodyDataReceived(data *ResourceData)   // data fetched for body resource URLs
}

or in Java as

public interface Widget {
    // definitions as above ...

    List<String> getHeaderResourceUrls();          // list of URLs to fetch for rendering the header
    void setHeaderDataReceived(ResourceData data); // data fetched for header resource URLs

    List<String> getBodyResourceUrls();            // list of URLs to fetch for rendering the body
    void setBodyDataReceived(ResourceData data);   // data fetched for body resource URLs
}

Here ResourceData represents the results of the concurrent fetch. The widgets now only declare which resources need to be fetched. The orchestrator can collect this information by querying each widget, deduplicate the URLs, and then make one concurrent fetch of all these resources. Once the data is available, the appropriate set method is called so that each widget can update its local state and render the appropriate content for each of the generation phases.
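A minimal Go sketch of that collect, deduplicate, fetch, and dispatch step follows. It assumes ResourceData is a map from URL to response body and that fetching is done by a hypothetical `Fetcher` function; `collectURLs`, `fetchAll`, `dispatch`, and `streamWidget` are illustrative names, not part of any framework. Only the body phase is shown; the header phase would use the header get/set pair the same way.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceData is a hypothetical shape for the fetched results: URL -> body.
type ResourceData map[string]string

// Fetcher is a hypothetical stand-in for a real HTTP or RPC client call.
type Fetcher func(url string) string

// BodyWidget is the body-phase subset of the extended Widget interface.
type BodyWidget interface {
	getBodyResourceUrls() []string
	setBodyDataReceived(data *ResourceData)
}

// collectURLs asks every widget for its URLs and deduplicates them,
// keeping first-seen order.
func collectURLs(widgets []BodyWidget) []string {
	seen := make(map[string]bool)
	var urls []string
	for _, w := range widgets {
		for _, u := range w.getBodyResourceUrls() {
			if !seen[u] {
				seen[u] = true
				urls = append(urls, u)
			}
		}
	}
	return urls
}

// fetchAll fetches every URL in its own goroutine and waits for all of them.
func fetchAll(urls []string, fetch Fetcher) ResourceData {
	data := make(ResourceData, len(urls))
	var mu sync.Mutex // guards data; Go maps are not safe for concurrent writes
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			body := fetch(u)
			mu.Lock()
			data[u] = body
			mu.Unlock()
		}(u)
	}
	wg.Wait()
	return data
}

// dispatch hands the shared results to every widget via its set method.
func dispatch(widgets []BodyWidget, data ResourceData) {
	for _, w := range widgets {
		w.setBodyDataReceived(&data)
	}
}

// streamWidget is a hypothetical widget that simply records what it received.
type streamWidget struct {
	urls []string
	got  ResourceData
}

func (s *streamWidget) getBodyResourceUrls() []string       { return s.urls }
func (s *streamWidget) setBodyDataReceived(d *ResourceData) { s.got = *d }

func main() {
	nav := &streamWidget{urls: []string{"/api/user", "/api/subscriptions"}}
	stream := &streamWidget{urls: []string{"/api/user", "/api/feed"}} // "/api/user" is shared
	widgets := []BodyWidget{nav, stream}

	urls := collectURLs(widgets)
	data := fetchAll(urls, func(u string) string { return "payload of " + u })
	dispatch(widgets, data)

	fmt.Println(len(urls), "unique fetches") // 3 unique fetches
	fmt.Println(stream.got["/api/feed"])      // payload of /api/feed
}
```

Each widget sees the shared result map and picks out what it asked for; since widgets only read it after `fetchAll` returns, no further locking is needed at that point.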

Why two get/set combinations? In practice, we either need some data to render personalized content in the header section (like the title), or we first need a setup call to get some metadata before making the actual request during the body phase. Splitting the fetch in two, with a budget of 50 ms for the header resources and 150 ms for the body resources, still leaves adequate budgets for a quick header flush, followed by a more generous body content generation.

Finally, the pseudo-code for the orchestrator itself becomes:

    build list of widgets given current request
    collect all resource requests for HTML head content
    fetch resources concurrently
    render the HTML head content with early flush
    collect all resource requests for HTML body content
    fetch resources concurrently
    render the HTML body content

The nice thing is that this micro-framework idea can be implemented in almost any language, including PHP (for example, using the curl_multi_* functions for the concurrent fetch). It also gives the rest of the system generous latency budgets while still keeping the overall server-side render time of the HTML under control.

The render phases can mostly be synchronous, as they are compute bound and work on local data. This leads to what, in computer science, we call high cohesion but low coupling between the widgets. Each widget or component author can work fairly independently of others, which also leads to better maintenance and easier evolution of the web page as requirements change.

Observations

While algorithmic complexity (and Big-O notation) is useful to describe and reason about the runtime characteristics of code at small scale, performance in today’s systems is mostly governed by I/O latency. As computers have evolved, memory and I/O access speeds have not kept up with processor speeds, so most CPUs spend a lot of time idle, waiting for data. This requires us to think differently about how to attain “global optima” of performance by factoring in the I/O cost. As shown above, doing so does not have to mean sacrificing code maintainability or structure. Instead, we can use separation of concerns to solve for each using the appropriate tools, while still leaving most of the code synchronous and easy to reason about.