At SOASTA, we’re building tools and services to help our customers understand and improve the performance of their websites. Our mPulse product utilizes real user monitoring to capture data about page-load performance.
For browser-side data collection, mPulse uses Boomerang, which beacons every single page-load experience back to our real time analytics engine. Boomerang utilizes NavigationTiming when possible to relay accurate performance metrics about the page load, such as the timings of DNS, TCP, SSL and the HTTP response.
Boomerang can also collect ResourceTiming data, which exposes detailed timings for every resource the page fetches. The challenge with ResourceTiming is that it offers a lot of data if you want to beacon it all back to a server. For each resource, there’s data on:
- Initiating element (e.g. IMG)
- Start time
- Plus 11 other timestamps
Here’s an example of the performance.getEntriesByType('resource') output for a single resource:
Run through JSON.stringify(), that’s 469 bytes for this one resource. Multiply that by each resource on your page, and you can quickly see that gathering and beaconing all of this data back to a server will take a lot of bandwidth and storage if you’re tracking this for every single visitor to your site. The HTTP Archive tells us that the average page is composed of 99 HTTP resources, with an average URL length of 85 bytes. So as a rough estimate, you could expect around 45 KB of ResourceTiming data per page load.
We wanted to find a way to compress this data before we JSON serialize it and beacon it back to our server. Philip Tellis, the author of Boomerang, and I have come up with several compression techniques that can reduce the above data to about 15% of its original size.
Let’s start out with a single resource, as you get back from performance.getEntriesByType('resource'):
Step 1: Drop some attributes
We don’t need:
- entryType will always be "resource"
- duration can always be calculated as responseEnd - startTime
- fetchStart will always equal startTime (with no redirects) or redirectEnd (with redirects)
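As a sketch, dropping those attributes might look like this (the function name is illustrative, not Boomerang’s actual code):

```javascript
// entryType is always "resource", duration is responseEnd - startTime,
// and fetchStart is startTime (no redirect) or redirectEnd (redirect),
// so all three can be dropped and recomputed on the server.
function dropAttributes(entry) {
  var copy = {};
  for (var key in entry) {
    if (key !== "entryType" && key !== "duration" && key !== "fetchStart") {
      copy[key] = entry[key];
    }
  }
  return copy;
}
```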
Step 2: Change into a fixed-size array
Since we know all of the attributes ahead of time, we can change the object into a fixed-size array. We’ll create a new object where each key is the URL and its value is a fixed-size array. We’ll take care of duplicate URLs later:
With our data, each resource’s attributes become a fixed-size array of values, keyed by its URL.
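Here’s a sketch of the conversion; the attribute order below is an assumption of this illustration (the decoder just has to use the same one):

```javascript
// Fixed attribute order: 12 slots per resource.
var ATTRS = [
  "initiatorType", "startTime", "redirectStart", "redirectEnd",
  "domainLookupStart", "domainLookupEnd", "connectStart",
  "secureConnectionStart", "connectEnd", "requestStart",
  "responseStart", "responseEnd"
];

// Convert one entry into { url: [fixed-size array of values] },
// filling missing attributes with 0.
function toFixedArray(entry) {
  var result = {};
  result[entry.name] = ATTRS.map(function (attr) {
    return entry[attr] !== undefined ? entry[attr] : 0;
  });
  return result;
}
```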
Step 3: Drop microsecond timings
For our purposes, we don’t need sub-millisecond accuracy, so we can round all timings to the nearest millisecond.
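A sketch of the rounding pass (illustrative naming):

```javascript
// Round every numeric value to the nearest millisecond; non-numeric
// slots (like the initiatorType string) pass through unchanged.
function roundTimings(values) {
  return values.map(function (v) {
    return typeof v === "number" ? Math.round(v) : v;
  });
}
```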
Step 4: Trie
We can now use an optimized Trie to compress the URLs. A Trie is a tree structure in which keys that share a common prefix are stored only once. Mark Holland and Mike McCall discussed this technique at Velocity this year. It works especially well when multiple resources share a common URL prefix.
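Here’s a minimal sketch of building such a compressed trie. The "$" marker for stored values and the insert-then-compress two-pass approach are assumptions of this illustration, not the exact implementation discussed at Velocity:

```javascript
// Insert a URL into a character-level trie; "$" marks a stored value.
function trieInsert(trie, key, value) {
  var node = trie;
  for (var i = 0; i < key.length; i++) {
    var c = key.charAt(i);
    node[c] = node[c] || {};
    node = node[c];
  }
  node.$ = value;
}

// Merge chains of single-child nodes so shared URL prefixes collapse
// into one key, producing the optimized (radix) trie.
function trieCompress(node) {
  var out = {};
  for (var k in node) {
    var child = node[k];
    if (k === "$") {
      out.$ = child;
      continue;
    }
    while (true) {
      var keys = Object.keys(child);
      if (keys.length !== 1) {
        break;
      }
      if (keys[0] === "$") {
        child = child.$;  // lone leaf: store the value directly
        break;
      }
      k += keys[0];
      child = child[keys[0]];
    }
    out[k] = (typeof child === "object") ? trieCompress(child) : child;
  }
  return out;
}

// Usage: two URLs sharing the prefix "http://a.com/" collapse into
// { "http://a.com/": { "1.jpg": "A", "2.jpg": "B" } }
var trie = {};
trieInsert(trie, "http://a.com/1.jpg", "A");
trieInsert(trie, "http://a.com/2.jpg", "B");
var compressed = trieCompress(trie);
```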
Step 5: Offset from startTime
If we offset all of the other timestamps from startTime (which they should always be at least as large as), they need fewer characters.
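A sketch, assuming a timings-only array whose first element is startTime:

```javascript
// Keep startTime absolute; express every other timestamp as an offset
// from it. Zeros stay zero so "missing" timestamps are preserved.
function offsetTimings(values) {
  var start = values[0];
  return [start].concat(values.slice(1).map(function (t) {
    return t > 0 ? t - start : 0;
  }));
}
```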
Step 6: Reverse the timestamps and drop any trailing 0s
The only two required timestamps in ResourceTiming are startTime and responseEnd. Other timestamps may be zero, either because the resource is cross-origin, or because the phase took no time relative to startTime, such as domainLookupStart when DNS was already resolved. If we re-order the timestamps so that, after startTime, they appear in reverse order, the "zero" timestamps are more likely to end up at the end of the array.
Once we have all of the zero timestamps towards the end of the array, we can drop any repeating trailing zeros. When reading later, missing array values can be interpreted as zero.
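A sketch of both transformations together (illustrative naming):

```javascript
// startTime stays first; the remaining timestamps are reversed so
// responseEnd comes next and likely-zero values drift to the end,
// then trailing zeros are dropped. A reader treats missing slots as 0.
function reverseAndTrim(values) {
  var out = [values[0]].concat(values.slice(1).reverse());
  while (out.length > 1 && out[out.length - 1] === 0) {
    out.pop();
  }
  return out;
}
```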
Step 7: Convert initiatorType into a lookup
Using a numeric lookup instead of a string will save some bytes for initiatorType.
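A sketch, with a hypothetical lookup table (the exact values are assumptions; the encoder and decoder just need to agree on them):

```javascript
// Hypothetical initiatorType -> numeric code mapping.
var INITIATOR_TYPES = {
  other: 0, img: 1, link: 2, script: 3, css: 4,
  xmlhttprequest: 5, iframe: 6
};

// Unknown types fall back to "other" (0).
function encodeInitiatorType(type) {
  return INITIATOR_TYPES[type] !== undefined ? INITIATOR_TYPES[type] : 0;
}
```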
Step 8: Use Base36 for numbers
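JavaScript can do this conversion natively: Number.prototype.toString(radix) encodes, and parseInt(s, 36) decodes. Encoding zero as an empty string is an assumption of this sketch:

```javascript
// Base36 uses 0-9 and a-z, so millisecond values need fewer characters
// (e.g. 1000 becomes "rs"). Zero becomes "" to save one more byte.
function toBase36(n) {
  return n === 0 ? "" : n.toString(36);
}

function fromBase36(s) {
  return s === "" ? 0 : parseInt(s, 36);
}
```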
Step 9: Compact the array into a string
Representing each record as a single comma-separated string, rather than a JSON array, saves a few bytes during serialization. We’ll designate the first byte as the initiatorType.
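A sketch of the final per-resource string; the comma separator and the Base36 type code in the first byte are assumptions of this illustration:

```javascript
// Compact one resource's data into a string: first character is the
// initiatorType code in Base36, followed by comma-separated Base36
// timings (zeros become empty strings).
function compactEntry(typeCode, timings) {
  return typeCode.toString(36) + timings.map(function (t) {
    return t === 0 ? "" : t.toString(36);
  }).join(",");
}
```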
Step 10: Multiple hits
Finally, if there are multiple hits to the same resource, the keys (URLs) in the Trie will conflict with each other.
Let’s fix this by concatenating multiple hits to the same URL with a special character such as a pipe (|).
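A sketch of the merge (illustrative naming):

```javascript
// When a URL repeats, join its compacted strings with "|" so the trie
// keeps a single key per URL.
function mergeHits(existing, next) {
  return existing ? existing + "|" + next : next;
}
```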
Step 11: Gzip or MsgPack
Applying gzip compression or MsgPack can give additional savings during transport and storage.
Overall, the above techniques compress raw
JSON.stringify(performance.getEntriesByType('resource')) to about 15% of its original size. Taking a few sample pages:
Search engine home page:
- Raw: 1,000 bytes
- Compressed: 172 bytes
Questions and answers page:
- Raw: 5,453 bytes
- Compressed: 789 bytes
News home page:
- Raw: 32,480 bytes
- Compressed: 4,949 bytes
These compression techniques have been added to the latest version of Boomerang. I’ve also released a small library that handles both compression and decompression of the optimized result: resourcetiming-compression.js.
About the Author
Nic is a software developer building high-performance websites, apps, and open-source tools as a Director of Engineering at SOASTA, where he works on mPulse and Boomerang. He is also a Microsoft MVP for IE.