art with code

2010-05-21

Parsing tarballs with JavaScript

Update: Check out this augmented version that streams gzipped tarballs.

Here's a small piece of JavaScript to parse tarballs and my custom JSON packfiles. There are four demos as well: loading files from a tar, streaming images from a tar, loading files from a JSON packfile and streaming files from a JSON packfile.

The part that converts images to data URLs is a bit slower than it could be, as it has to strip the high bytes off the characters. The upcoming JS File and Blob APIs for binary data handling should help there. Though if you have less than a hundred kB of images, I don't think you'll even notice the delay. Even half a meg of stuff unpacks in a fraction of a second on my slow laptop (Pentium M 1.7GHz). If you do need speed, you can convert the images to data URIs beforehand.
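
As a rough illustration of that conversion step, here is a minimal sketch. It assumes the file arrived as a "binary string" from an XHR with overrideMimeType('text/plain; charset=x-user-defined'), where bytes 0x80-0xFF show up as characters U+F780-U+F7FF. The function name binaryStringToDataURL is made up for this example and is not necessarily what the script itself uses.

```javascript
// Hypothetical helper, not taken from the original script.
// Converts a binary string into a base64 data URL.
function binaryStringToDataURL(binaryString, mimeType) {
  var chars = new Array(binaryString.length);
  for (var i = 0; i < binaryString.length; i++) {
    // Keep only the low byte of each character code (strips the 0xF7 high byte).
    chars[i] = String.fromCharCode(binaryString.charCodeAt(i) & 0xFF);
  }
  var cleaned = chars.join('');
  // btoa exists in browsers and recent Node versions; fall back to Buffer otherwise.
  var base64 = (typeof btoa === 'function')
    ? btoa(cleaned)
    : Buffer.from(cleaned, 'latin1').toString('base64');
  return 'data:' + mimeType + ';base64,' + base64;
}
```

The per-character loop is the slow part mentioned above: every byte goes through charCodeAt and fromCharCode before the base64 encoding can even start.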

Quickly estimating, it'd take something like fifteen seconds to load up a hundred megs of models and textures on my laptop, maybe around 5 s on a decent computer. Doing the initial archive parsing pass would take maybe a second for a hundred meg archive. If that's too slow for you, I want your internet connection. If the hundred meg tarball is split 1:4 geometry:textures, where the geometry takes 20 bytes per tri and the textures are 10x compressed JPEGs, it'd have 1 Mtri geometry and 240 Mpx textures.

The script doesn't handle gzip or any other compression; use gzip-encoding on the server for that. The tar file format is pretty simple: it's based on 512-byte blocks, and each file begins with a 512-byte header, followed by the file data padded up to a multiple of 512 bytes. The numbers are represented as octal ASCII (though there is a GNU tar extension that uses binary ints for handling files bigger than 8 GB, which my script doesn't support).
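
To make the layout concrete, here is a minimal sketch of a header walk over a tarball held in a binary string. It relies on the standard ustar field positions (name at byte 0, 100 bytes long; size at byte 124, 12 bytes of octal ASCII; type flag at byte 156). The function names readField and parseTar are made up for this example, and like the script it ignores the GNU binary-size extension.

```javascript
// Read a NUL-terminated ASCII field out of the binary string.
function readField(data, offset, length) {
  var field = data.substring(offset, offset + length);
  var end = field.indexOf('\0');
  return end === -1 ? field : field.substring(0, end);
}

// Hypothetical parser: returns one record per archive entry.
function parseTar(data) {
  var files = [];
  var offset = 0;
  while (offset + 512 <= data.length) {
    var name = readField(data, offset, 100);
    if (name === '') break; // a zero-filled block marks the end of the archive
    // Sizes are stored as octal ASCII digits.
    var size = parseInt(readField(data, offset + 124, 12), 8) || 0;
    var dataOffset = offset + 512; // file data starts right after the header
    files.push({
      filename: name,
      type: data.charAt(offset + 156),
      offset: dataOffset,
      length: size
    });
    // Skip the data, rounded up to the next 512-byte boundary.
    offset = dataOffset + Math.ceil(size / 512) * 512;
  }
  return files;
}
```

The contents of an entry f are then data.substring(f.offset, f.offset + f.length).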

My JSON packfile format consists of a one-line JSON header array of {filename : string, offset : bytes, length : bytes} followed by a newline and the concatenated file contents. Easy to create and parse.
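
Here is a minimal sketch of creating and parsing such a packfile. One assumption to flag: the offsets are taken to be relative to the first byte after the header's newline, which the description above doesn't spell out. The function names createPackfile and parsePackfile are made up for this example, and file contents are treated as binary strings with one character per byte.

```javascript
// Hypothetical writer: files is an array of {filename, data}.
function createPackfile(files) {
  var offset = 0;
  var header = files.map(function (f) {
    var entry = {filename: f.filename, offset: offset, length: f.data.length};
    offset += f.data.length;
    return entry;
  });
  // JSON.stringify emits no raw newlines, so the header stays on one line.
  var body = files.map(function (f) { return f.data; }).join('');
  return JSON.stringify(header) + '\n' + body;
}

// Hypothetical reader: returns an array of {filename, data}.
function parsePackfile(data) {
  var newline = data.indexOf('\n');
  if (newline === -1) throw new Error('packfile header line is incomplete');
  var header = JSON.parse(data.substring(0, newline));
  var base = newline + 1; // assumed origin for the offsets
  return header.map(function (entry) {
    var start = base + entry.offset;
    return {
      filename: entry.filename,
      data: data.substring(start, start + entry.length)
    };
  });
}
```

Because the whole index sits on the first line, a reader knows where every file lives as soon as that line has arrived.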

Edit: Added streaming using xhr.readyState == 3 checks. It might cause some stuttering on the page when dataURLing the images, though it should be quite efficient otherwise. Optimizations welcome :)
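
A minimal sketch of how the readyState == 3 streaming can work for tarballs: each time more data arrives, continue from the last parsed position and hand over every entry whose data has fully downloaded. The names createTarStream and streamTar, and the URL passed to streamTar, are made up for this example; the actual script may be organised differently.

```javascript
// Hypothetical incremental parser. feed() takes all text received so far
// and calls onFile once for each entry that has completely arrived.
function createTarStream(onFile) {
  var offset = 0;   // position of the next unread header
  var done = false; // set once the end-of-archive block is seen
  return function feed(data) {
    while (!done && offset + 512 <= data.length) {
      var name = data.substring(offset, offset + 100).split('\0')[0];
      if (name === '') { done = true; break; }
      var sizeField = data.substring(offset + 124, offset + 136).split('\0')[0];
      var size = parseInt(sizeField, 8) || 0;
      var start = offset + 512;
      if (start + size > data.length) break; // body not fully downloaded yet
      onFile({filename: name, data: data.substring(start, start + size)});
      offset = start + Math.ceil(size / 512) * 512;
    }
  };
}

// Browser-side wiring (uses XMLHttpRequest, so it only runs in a browser).
function streamTar(url, onFile) {
  var feed = createTarStream(onFile);
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.overrideMimeType('text/plain; charset=x-user-defined');
  xhr.onreadystatechange = function () {
    // readyState 3 = partial data available, 4 = download finished.
    if (xhr.readyState === 3 || xhr.readyState === 4) feed(xhr.responseText);
  };
  xhr.send();
}
```

Usage would be something like streamTar('images.tar', function (file) { ... }). Since the parser remembers its offset, repeated readyState 3 events only do work for newly arrived entries.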

3 comments:

gero3 said...

Seems like this would be useful for O3D.

Ilmari Heikkinen said...

Yeah, O3D's original plugin archive loader used tar.gz. If you either skip the compression or do it in HTTP, you could load O3D tarballs.

...I wonder if there's a JavaScript gunzip implementation.

Anonymous said...

Browsers will handle gz compression.

Also there's this now: https://github.com/bigeasy/node-tar

It shouldn't be too hard to make it work on modern browsers, with something like node-browserify.
