Optimizing HTTP header manipulation, part 2

Tagged:  •    •  

In the previous post I told you how a couple of primitives (which means, modifications to the base classes) helped speeding up HTTP processing in Swazoo by a factor of 6.

Today I'll remove another part of it by modifying Swazoo itself. To remove Seaside, I used the simple "hello world" site that Swazoo serves if you start it with

gst-load --start=swazoodemo Swazoo

This time I gathered the profile with ab, to whose joys I was introduced by Nicolas Petton. From the output of callgrind_annotate --inclusive=yes, we see a very different profile from last time. Since the character-by-character escaping, which is a major bottleneck of Seaside, is not happening here, Stream operations are a bit far from the top.

But what is the top? Well...

14,421,593  HashedColl.st:HashedCollection>>do: []
13,589,556  CharArray.st:CharacterArray>>asUppercase []
10,843,632  Character.st:Character>>asUppercase []

Yes. 11% of the time is spent converting strings to uppercase. And another 8% is spent iterating through a dictionary. Neither thing seems very cunning; though in defense of the Swazoo authors, these 11+8=19% represented a meager 3% before my optimizations on streams. Anyway, let's see if we can kick them out.

Of course, the answer is yes.

First, let's consider the iteration. The hashed collection in question is a dictionary of HTTP headers. This dictionary is used for both parsing and producing HTTP (i.e. RFC821) headers. It is accessed many times and it is relatively small, so the authors of Swazoo decided to use iteration instead of hashing in some cases, effectively using the dictionary class only as a convenience to store key-value pairs.

Second, the uppercasing. This is done obviously because HTTP headers are case-insensitive, so it's not something we can completely eliminate---but we can almost eliminate it.

The two parts are very much intertwined, it turns out. The Swazoo authors did know about the cost of uppercasing, so they use iteration to look for fields by class (e.g. "give me a ContentTypeField") whenever possible, and hashed access to look for fields by name whenever adding them. The rationale here was obviously to avoid string comparisons.

The solution was as follows:

  1. move the uppercasing to the caller. I first added an abort in case the string was not already uppercase, and fixed the fallout in the testsuite (I don't like test-driven development, but when refactoring it is extremely important to have a good one!). As I said Swazoo tries to look up fields by class, and it also tries to uppercase them as soon as possible, so this already removed almost all the cost of #asUppercase.
  1. store mixed-class keys into the HashedCollection. Whenever you have a header field which has a specific class, I use the class name as the key directly. If I don't, I store the uppercased field name. This is done with polymorphism: I ask the header field for its key, and the choice is done with two implementations in GenericHeaderField (return the name) and SpecificHeaderField (return the class).

The latter technique allows to remove the iteration, because it removes most string comparisons. The additional cost is one dictionary access for some operations (such as accessing the headers by field name) that are rare anyway, usually occurring once or twice per connection.

The result? Here are the new entries for those three methods in the profile

2,013,663  HashedColl.st:HashedCollection>>do: []
1,697,044  CharArray.st:CharacterArray>>asUppercase []
1,640,412  Character.st:Character>>asUppercase []

so more than 90% of the calls have been removed. The profile is now quite flat, with nothing taking more than 5% of the execution time. On my machine (which I am currently running at 800 MHz to save battery), Swazoo happily sustains 1000 hello-world connections per second, and does not drop any connection even with 200 concurrent connections, with only a handful of hiccups:

Percentage of the requests served within a certain time (ms)
 50%     50
 66%     51
 75%     54
 80%     54
 90%     55
 95%     56
 98%     58
 99%     60
100%   9049 (longest request)

While there are still some interesting optimizations that the profile suggests, I am quite fine with this.

User login