Peter Leonov's Blog

Notes on HTTP headers management in nginx

14 May 2010

This short article is a by-product of writing the request headers management code.

This post is also available on the nginx wiki at HeadersManagement.

Intro

Now that the Nginx.HeadersIn and Nginx.HeadersOut classes of ngx_http_js_module are almost fully implemented, we can talk a little about the headers_in structure.

The HTTP headers in nginx are split into two parts: the input request headers (the headers_in structure) and the output headers (the headers_out structure). There is no separate response entity; all data is stored in the same single request structure (ngx_http_request_t). The actual response is constructed from the request data and the headers_out structure fields.

Everything in nginx is highly optimized. There is no memory overhead from copying strings, no memory leaks and no alloc/free burden since memory is managed with pools, no CPU cycles wasted comparing the same strings again and again; everything is cached in a sane way, and complicated things are pre-calculated at the configuration stage. The input (and output) headers are no exception, and all these optimizations are the root of both their complexity and their beauty.

The flexibility of HTTP headers

Let's talk about HTTP headers a little. We have all seen lots of headers, and we know that HTTP headers are a very flexible data format. A client may send just one simple header, or many headers with the same name, each on its own line, or even one big header split across several lines. It's a bit of a mess, and nginx, in its turn, tries to tame it.

nginx takes care of known, frequently used headers (the list of known headers is ngx_http_headers_in in ngx_http_request.c). It parses them and stores each one in a handy place: a direct pointer in headers_in. If a known header may have more than one value (Cookie or Cache-Control, for example), nginx can handle it with an array. And for a header that is known to have a numeric value (Content-Length, for example), nginx parses the text and stores the number directly in the headers_in struct. Every header, known or not, is also stored in a simple list within the headers_in structure, so nothing is lost.
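To make the numeric case concrete, here is a minimal standalone sketch of the kind of parsing done for Content-Length. As far as I can tell, nginx does this with its own ngx_atoof() helper; parse_length() below is a hypothetical stand-in, not nginx code, and it uses int64_t instead of off_t to stay self-contained. Like content_length_n, it reports -1 for an empty or malformed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ngx_atoof(): parse a non-negative decimal
 * number from a header value that is NOT NUL-terminated (nginx strings
 * carry an explicit length). Returns -1 on empty input, on any
 * non-digit character, or on overflow, mirroring the "-1 means unset"
 * convention of headers_in.content_length_n.
 */
static int64_t parse_length(const unsigned char *data, size_t len)
{
    int64_t value = 0;

    if (len == 0) {
        return -1;
    }

    for (size_t i = 0; i < len; i++) {
        int digit;

        if (data[i] < '0' || data[i] > '9') {
            return -1;                      /* not a plain number */
        }

        digit = data[i] - '0';

        /* value * 10 + digit must not exceed INT64_MAX */
        if (value > (INT64_MAX - digit) / 10) {
            return -1;
        }

        value = value * 10 + digit;
    }

    return value;
}
```

The length argument matters: the value "42 trailing" parsed with len = 2 yields 42, exactly as a length-bounded nginx string would.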

Get the header

That said, there are at least three ways to get a header value. As we already know, every input header value can be obtained by a brute-force lookup in the headers_in.headers list (of type ngx_list_t). A known header can be found with the help of a simple pointer in the headers_in structure (NULL if the header does not exist). And for known numeric headers there is an even easier way to get the value: a special field in headers_in (content_length_n is a good example).

This is fine if you know at compile time which header you are going to read. But how do we get a header by its name at run time, when the name is just a string? For this kind of situation there is a smart hash of headers: headers_in_hash in the ngx_http_core_module main configuration. If the header is known to nginx, its name is in this hash and we can find the header value relatively fast. If the header isn't hashed, we have to run through the full headers list and compare every header name with ours. This isn't very fast, but it isn't slow either. Unfortunately, there is no way to get the offset of the numeric (already parsed) representation of a header by its name, so we have to parse the text value every time it's needed. That's reasonable, since at run time we don't know the data type of the parsed representation of an arbitrary header.

To sum up: we can get the full list of input headers and run through it to enumerate them, access a known header directly through its own field in the headers_in structure, and even get the already parsed value if the header holds a number, a time, etc.

Some examples follow.

Brute force search for one header with the specified name

static ngx_table_elt_t *
search_headers_in(ngx_http_request_t *r, u_char *name, size_t len) {
    ngx_list_part_t            *part;
    ngx_table_elt_t            *h;
    ngx_uint_t                  i;
    
    /*
    Get the first part of the list. There is usually only one part.
    */
    part = &r->headers_in.headers.part;
    h = part->elts;
    
    /*
    Headers list array may consist of more than one part,
    so loop through all of it
    */
    for (i = 0; /* void */ ; i++) {
        if (i >= part->nelts) {
            if (part->next == NULL) {
                /* The last part, search is done. */
                break;
            }
            
            part = part->next;
            h = part->elts;
            i = 0;
        }
        
        /*
        Compare the lengths first, then the names case-insensitively.
        ngx_strncasecmp() is bounded by len, so neither string has to
        be NUL-terminated.
        */
        if (len != h[i].key.len || ngx_strncasecmp(name, h[i].key.data, len) != 0) {
            /* This header doesn't match. */
            continue;
        }
        
        /*
        Ta-da, we got one!
        Note that we stop the search at the first matching header,
        while more than one header may match.
        */
        return &h[i];
    }
    
    /*
    No header was found.
    */
    return NULL;
}
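The function above stops at the first match, but a client may legitimately send several headers with the same name. Here is a standalone sketch that collects all of them. The header_t and part_t types are hypothetical, trimmed-down models of ngx_table_elt_t and ngx_list_part_t (an array chunk plus a next pointer), so the example compiles without the nginx sources; the loop has the same shape as a walk over headers_in.headers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified model of one header (cf. ngx_table_elt_t). */
typedef struct {
    const char *key;
    size_t      key_len;
    const char *value;
} header_t;

/* Hypothetical model of a list part (cf. ngx_list_part_t). */
typedef struct part_s part_t;
struct part_s {
    header_t *elts;
    size_t    nelts;
    part_t   *next;
};

/* Case-insensitive comparison of exactly len bytes. */
static int equal_nocase(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i])) {
            return 0;
        }
    }
    return 1;
}

/*
 * Walk every part of the list and store up to max matching headers in
 * out. Returns the total number of matches, which may exceed max.
 */
static size_t find_all_headers(part_t *part, const char *name, size_t len,
                               header_t **out, size_t max)
{
    size_t found = 0;

    for ( /* void */ ; part != NULL; part = part->next) {
        for (size_t i = 0; i < part->nelts; i++) {
            header_t *h = &part->elts[i];

            if (h->key_len != len || !equal_nocase(h->key, name, len)) {
                continue;               /* different name */
            }

            if (found < max) {
                out[found] = h;
            }
            found++;
        }
    }

    return found;
}
```

Inside nginx the same loop would start at &r->headers_in.headers.part and could use ngx_strncasecmp() for the comparison.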

Quick search with hash

ngx_table_elt_t *
search_hashed_headers_in(ngx_http_request_t *r, u_char *name, size_t len) {
    ngx_http_core_main_conf_t  *cmcf;
    ngx_http_header_t          *hh;
    u_char                     *lowcase_key;
    ngx_uint_t                  i, hash;

    /*
    Header names are case-insensitive, so they have been hashed by their
    lowercased names.
    */
    lowcase_key = ngx_palloc(r->pool, len);
    if (lowcase_key == NULL) {
        return NULL;
    }
    
    /*
    Calculate a hash of lowercased header name
    */
    hash = 0;
    for (i = 0; i < len; i++) {
        lowcase_key[i] = ngx_tolower(name[i]);
        hash = ngx_hash(hash, lowcase_key[i]);
    }
    
    /*
    The layout of hashed headers is stored in the ngx_http_core_module main config.
    All the hashes, their offsets and handlers are pre-calculated
    at configuration time in ngx_http_init_headers_in_hash() (ngx_http.c:432)
    with data from ngx_http_headers_in (ngx_http_request.c:80); the line
    numbers refer to the nginx sources current when this was written.
    */
    cmcf = ngx_http_get_module_main_conf(r, ngx_http_core_module);
    
    /*
    Find the current header description (ngx_http_header_t) by its hash
    */
    hh = ngx_hash_find(&cmcf->headers_in_hash, hash, lowcase_key, len);
    
    if (hh == NULL) {
        /*
        The header is unknown to nginx: it has no entry in the hash.
        */
        return NULL;
    }
    
    if (hh->offset == 0) {
        /*
        The header is hashed but has no dedicated field in headers_in.
        */
        return NULL;
    }
    
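    /*
    The header is known, so its field in the r->headers_in struct
    (hh->offset tells which one) holds a pointer to the header's list
    element, or NULL if the client didn't send it.
    Caveat: this cast is only valid for single-value fields. In nginx
    versions of this era, multi-value headers such as Cookie are kept
    in an ngx_array_t field, not in an ngx_table_elt_t pointer.
    */

    return *((ngx_table_elt_t **) ((char *) &r->headers_in + hh->offset));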
}
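The hashing step in search_hashed_headers_in() is easy to try outside nginx. The sketch below reproduces the recurrence of nginx's ngx_hash() macro (key * 31 + c) over the lowercased name; header_name_hash() and HEADER_HASH are hypothetical names, and the bucket lookup done by ngx_hash_find() is deliberately left out. Because the name is lowercased first, "Host", "HOST" and "host" all produce the same key.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Same recurrence as nginx's ngx_hash(key, c) macro: key * 31 + c. */
#define HEADER_HASH(key, c)  ((uintptr_t) (key) * 31 + (c))

/*
 * Lowercase name into lowcase (which must hold len bytes; it is not
 * NUL-terminated) and return the hash of the lowercased bytes.
 * Unsigned arithmetic simply wraps around on overflow.
 */
static uintptr_t header_name_hash(const char *name, size_t len, char *lowcase)
{
    uintptr_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        lowcase[i] = (char) tolower((unsigned char) name[i]);
        hash = HEADER_HASH(hash, (unsigned char) lowcase[i]);
    }

    return hash;
}
```

Worked through by hand for "host": 'h' (104) gives 104, 'o' gives 104 * 31 + 111 = 3335, 's' gives 3335 * 31 + 115 = 103500, and 't' gives 103500 * 31 + 116 = 3208616.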

Blazing fast header access with a structure field

ngx_table_elt_t *
get_host_from_headers_in(ngx_http_request_t *r) {
    /*
    Returns NULL if there is no such header.
    */
    return r->headers_in.host;
}

Blazing crazy fast header access with a pre-parsed value

off_t
get_content_length_n_from_headers_in(ngx_http_request_t *r) {
    /*
    Returns -1 if the Content-Length wasn't set.
    */
    return r->headers_in.content_length_n;
}

These examples illustrate how much faster cached and pre-parsed header access can be, even compared to a search with a pre-hashed key.

How does hashed search work?

At the configuration stage nginx creates a hash (ngx_hash_t) of the known HTTP headers mentioned above. In each pair the key is the header name and the value is an nginx header handler structure (ngx_http_header_t, a pretty smart structure). In this structure we find the header name, its handler for the header parsing stage (for internal use) and, most interestingly, the offset of the header's field in the headers_in struct. The handler uses this offset to fill the appropriate field of the request structure when the header is added. At the parsing stage nginx calculates a hash of the lowercased header name (HTTP header names are case-insensitive), appends the key/value pair to the plain list of headers (headers_in.headers), and looks up the header handler by this hash in the core module's main configuration hash (headers_in_hash). If a handler is found, nginx invokes it. Pretty simple once you know how it's made ;)
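Here is a toy model of that mechanism: a configure-time table maps a lowercase header name to the offset of a pointer field, every incoming header is appended to the plain list, and known ones also get their field filled through the offset. All types and names here (headers_in_t, known_header_t, add_header()) are hypothetical simplifications of ngx_http_headers_in_t and ngx_http_header_t. nginx uses the precomputed hash and a per-header handler instead of the linear scan, and the real handlers can do more, such as validating the value.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for nginx types. */
typedef struct {
    const char *key;
    const char *value;
} header_t;

typedef struct {
    header_t *list[8];      /* every header in arrival order (cf. headers_in.headers) */
    size_t    nlist;
    header_t *host;         /* direct pointers for "known" headers */
    header_t *user_agent;
} headers_in_t;

/* Configure-time table: lowercase name -> offset of its pointer field. */
typedef struct {
    const char *name;
    size_t      offset;
} known_header_t;

static const known_header_t known_headers[] = {
    { "host",       offsetof(headers_in_t, host) },
    { "user-agent", offsetof(headers_in_t, user_agent) },
};

/* Compare an already-lowercase name with a name of arbitrary case. */
static int name_eq_nocase(const char *lower, const char *s)
{
    for ( ; *lower != '\0' && *s != '\0'; lower++, s++) {
        if (*lower != (char) tolower((unsigned char) *s)) {
            return 0;
        }
    }
    return *lower == *s;    /* both strings must end together */
}

static int add_header(headers_in_t *hin, header_t *h)
{
    if (hin->nlist == sizeof(hin->list) / sizeof(hin->list[0])) {
        return -1;          /* the toy list is full */
    }

    /* 1. Every header goes into the plain list, known or not. */
    hin->list[hin->nlist++] = h;

    /* 2. A known header also gets a direct pointer at its offset. */
    for (size_t i = 0; i < sizeof(known_headers) / sizeof(known_headers[0]); i++) {
        if (name_eq_nocase(known_headers[i].name, h->key)) {
            header_t **field = (header_t **) ((char *) hin + known_headers[i].offset);
            *field = h;
            break;
        }
    }

    return 0;
}
```

The offset trick is what lets a single generic routine fill many different struct fields without a switch over header names.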

What about output headers?

If you've read this post, you already know almost everything about headers_out. The only difference is that headers_out has no hash for finding an output header by its name at run time.

How can I set a header?

Since nginx may store a header value in several places, you have to be careful when setting a header. Every known header needs to be set in its own particular way. A numeric header, for instance, has to be set in three places: the plain key/value pair in the list, the pointer in the headers_in (or headers_out) struct, and the actual numeric value in its special field. Each of these mirrors one of the ways to read the header value.

For example, here is how to set Content-Length in headers_out:

ngx_int_t
set_content_length_n_in_headers_out(ngx_http_request_t *r, ngx_str_t *length, off_t length_n) {
    ngx_table_elt_t   *h;
    
    h = r->headers_out.content_length;
    if (h == NULL) {
        /*
        The header is not present at all. We have to allocate it...
        */
        h = ngx_list_push(&r->headers_out.headers);
        if (h == NULL) {
            return NGX_ERROR;
        }
        
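        /*
        ... set up the header key ...
        (Since nginx 1.23.0 ngx_table_elt_t also has a "next" field linking
        same-named headers; a freshly pushed header should set h->next = NULL.)
        */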
        h->key.data = (u_char *) "Content-Length";
        h->key.len = sizeof("Content-Length") - 1;
        
        /*
        ... and then set the headers_out field to tell others
        that the header is already set.
        */
        r->headers_out.content_length = h;
    }
    
    /*
    Now we have the header and can set its value.
    Do not forget to allocate length->data in memory that
    survives until the request ends.
    The best place to store the data is the request pool (r->pool),
    of course.
    */
    h->value = *length;
    
    /*
    This trick tells ngx_http_header_filter_module to include the header
    in the actual response. Otherwise the header is ignored and the client
    never sees it. Currently any non-zero value will do.
    */
    h->hash = 1;
    
    /*
    And do not forget to set up the numeric field.
    */
    r->headers_out.content_length_n = length_n;
    
    return NGX_OK;
}
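The h->hash trick also explains how a header is removed: set its hash back to zero and the header filter skips it. A minimal standalone model of that filter loop follows; out_header_t and serialize_headers() are hypothetical, and the real ngx_http_header_filter_module also writes the status line and several headers from dedicated headers_out fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified output header (cf. ngx_table_elt_t). */
typedef struct {
    unsigned long  hash;    /* 0 means "skip this header" */
    const char    *key;
    const char    *value;
} out_header_t;

/*
 * Write every header whose hash is non-zero as "Key: Value\r\n" into
 * buf (always NUL-terminated on success). Returns the number of bytes
 * written, or -1 if buf is too small.
 */
static int serialize_headers(const out_header_t *h, size_t n,
                             char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0) {
        return -1;
    }
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w;

        if (h[i].hash == 0) {
            continue;               /* deleted, or never marked as live */
        }

        w = snprintf(buf + used, size - used, "%s: %s\r\n",
                     h[i].key, h[i].value);
        if (w < 0 || (size_t) w >= size - used) {
            return -1;              /* buffer too small */
        }
        used += (size_t) w;
    }

    return (int) used;
}
```

When removing a known header, the direct pointer should be cleared as well, or other code may still find the header through headers_out. nginx's own ngx_http_clear_content_length() macro follows this pattern: it zeroes the hash, sets r->headers_out.content_length to NULL and resets content_length_n to -1.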

Unknown (custom) headers can simply be pushed onto the list (headers_out.headers):

ngx_int_t
set_custom_header_in_headers_out(ngx_http_request_t *r, ngx_str_t *key, ngx_str_t *value) {
    ngx_table_elt_t   *h;
    
    /*
    All we have to do is allocate the header...
    */
    h = ngx_list_push(&r->headers_out.headers);
    if (h == NULL) {
        return NGX_ERROR;
    }
    
    /*
    ... set up the header key ...
    */
    h->key = *key;
    
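    /*
    ... and the value. As above, key->data and value->data must
    stay valid until the request ends (e.g. allocate them from r->pool).
    */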
    h->value = *value;
    
    /*
    Mark the header as not deleted.
    */
    h->hash = 1;
    
    return NGX_OK;
}

Coming soon

In the next post (whenever it gets posted) we will look at the real header management code from ngx_http_js_module. It is much more fun to see working code than cold examples.

Tags:
  • backend
  • nginx