art with code

2009-04-26

Some O3D vs. GL canvas performance analysis

To continue: of the approaches I've seen so far, O3D probably has the best one for flexible rendering of complex scenes in the browser. To explain why, a little bit of background:

The bottlenecks on the JavaScript side are GC, JS -> C++ API call overhead (I timed it at around 1.3 ms per 1000 calls here), and 4x4 matrix multiplication (~20-50x slower than in C, and you do a lot of it).

O3D works a bit like editable display lists: you have a transform graph of objects that are separated into draw lists, which the renderer turns into native API calls. All of that happens on the C++ side, so you don't have any JS -> C++ overhead for the drawing calls.

Suppose you want to animate a thousand meshes and don't want to push the matrix math to your vertex shader. To draw a single mesh, you need to bind the mesh's VBOs and textures, then set up the shader uniforms (transform, samplers, lights) and call the draw command. That means a dozen API calls per mesh, so a thousand meshes would need 12000 API calls, which would take roughly 15 ms in call overhead alone.
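As a rough sketch of where the dozen calls come from, here is one plausible per-mesh sequence run against a counting stub. The stub context, the exact mix of calls, and the string arguments are all assumptions made for illustration; only the 1.3 ms per 1000 calls figure comes from the measurement above.

```javascript
// Hypothetical stand-in for a GL-style context: every method only counts the call.
function makeCountingContext() {
  const ctx = { calls: 0 };
  const names = ['bindBuffer', 'vertexAttribPointer', 'bindTexture',
                 'uniformMatrix4fv', 'uniform1i', 'uniform3fv', 'drawElements'];
  for (const name of names) {
    ctx[name] = function () { ctx.calls++; };
  }
  return ctx;
}

// One plausible per-mesh sequence (an assumption, not a fixed recipe): 12 calls.
function drawMesh(gl, mesh) {
  gl.bindBuffer('ARRAY_BUFFER', mesh.positions);
  gl.vertexAttribPointer(0, 3, 'FLOAT', false, 0, 0);
  gl.bindBuffer('ARRAY_BUFFER', mesh.normals);
  gl.vertexAttribPointer(1, 3, 'FLOAT', false, 0, 0);
  gl.bindBuffer('ARRAY_BUFFER', mesh.texCoords);
  gl.vertexAttribPointer(2, 2, 'FLOAT', false, 0, 0);
  gl.bindBuffer('ELEMENT_ARRAY_BUFFER', mesh.indices);
  gl.bindTexture('TEXTURE_2D', mesh.texture);
  gl.uniformMatrix4fv('transform', false, mesh.matrix);
  gl.uniform1i('sampler', 0);
  gl.uniform3fv('lightPos', [0, 10, 0]);
  gl.drawElements('TRIANGLES', mesh.indexCount, 'UNSIGNED_SHORT', 0);
}

const MS_PER_CALL = 1.3 / 1000; // measured JS -> C++ call overhead from the text

function estimateCallOverhead(meshCount) {
  const gl = makeCountingContext();
  const mesh = { positions: 1, normals: 2, texCoords: 3, indices: 4,
                 texture: 5, matrix: new Array(16).fill(0), indexCount: 36 };
  for (let i = 0; i < meshCount; i++) {
    drawMesh(gl, mesh);
  }
  return { calls: gl.calls, overheadMs: gl.calls * MS_PER_CALL };
}

const estimate = estimateCallOverhead(1000);
console.log(estimate.calls + ' calls, ~' + estimate.overheadMs.toFixed(1) + ' ms');
```

With these assumptions the estimate comes out at 12000 calls and a little over 15 ms per frame, before any actual rendering work.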

And if you multiply the matrices for each mesh in JavaScript, you end up creating at least a thousand matrices' worth of garbage every frame (~= 7.7 MB/s), triggering a Firefox GC run every 3 seconds (IIRC the max alloc until GC is 24 megs). And as a thousand matrix multiplications take 4 ms here, you end up with a total of 19 ms of JS overhead per frame and a framerate glitch every three seconds courtesy of GC.
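The garbage figures can be reproduced with a bit of arithmetic. This is a sketch under assumptions that are not stated in the post: 60 frames per second, and 128 bytes per matrix (16 doubles, ignoring object headers). The 24 MB threshold is the recalled Firefox figure from the text, not a verified number.

```javascript
const BYTES_PER_MATRIX = 16 * 8;             // 4x4 doubles, headers ignored (assumption)
const MATRICES_PER_FRAME = 1000;             // one temporary matrix per mesh
const FRAMES_PER_SECOND = 60;                // assumed frame rate
const GC_THRESHOLD_BYTES = 24 * 1000 * 1000; // "24 megs", as recalled in the text

function garbageBudget() {
  const bytesPerSecond =
      BYTES_PER_MATRIX * MATRICES_PER_FRAME * FRAMES_PER_SECOND;
  return {
    megabytesPerSecond: bytesPerSecond / 1e6,
    secondsBetweenGcRuns: GC_THRESHOLD_BYTES / bytesPerSecond
  };
}

// Per-frame JS overhead: call overhead plus matrix math, both figures from the text.
const CALL_OVERHEAD_MS = 15;
const MATRIX_MATH_MS = 4;
const jsOverheadMs = CALL_OVERHEAD_MS + MATRIX_MATH_MS;
const frameBudgetMs = 1000 / FRAMES_PER_SECOND;

const budget = garbageBudget();
console.log(budget.megabytesPerSecond + ' MB/s of garbage');
console.log('GC run every ' + budget.secondsBetweenGcRuns + ' s');
console.log(jsOverheadMs + ' ms JS overhead vs ' + frameBudgetMs.toFixed(1) + ' ms frame budget');
```

Under these assumptions the numbers line up with the ones above: 7.68 MB/s of garbage and a GC run a bit over every three seconds. At an assumed 60 fps the 19 ms of JS overhead is already more than the whole frame budget.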

The matrix math overhead is livable, but the JS -> C++ overhead and the GC pauses (on Firefox and WebKit) sink it. On O3D's embedded V8 JS engine, the GC pauses are less of an issue, as it uses a generational collector and the temporary matrices should be taken care of by the fast young generation collections (take this with a grain of salt, I haven't timed it). O3D still takes a hit from having to do the matrix math in JS, but it's not too bad compared to the API call overhead and GC pauses.

What to optimize?


The best solution for the API call overhead would be to minimize JS -> C++ call overhead in the JS engine, maybe by generating direct native calls from the JIT.

Making an immediate-mode API that works on the concept of editable draw lists that are executed in C++ would get rid of the API call overhead as well: record the calls into a draw list and run through that in C++. Even a system to draw a mesh with a single API call would drop the number of API calls to about a tenth. I imagine it'd be something like drawObject({buffers: [], textures: [], program: shader, uniforms: {foo4f: [1,2,3,4], bar1i: 2}}). The draw list approach is probably easier to implement and more flexible, though.
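To make the draw list idea a bit more concrete, here is a minimal sketch in plain JavaScript. Everything in it is hypothetical: DrawList, setUniform, executeDrawList and the counting backend are made-up names, and the execute loop only stands in for what would be a C++ renderer entered through a single JS -> C++ call per frame.

```javascript
// Hypothetical draw list: JS records commands as plain data, nothing is drawn yet.
class DrawList {
  constructor() {
    this.commands = [];
  }

  // Mirrors the drawObject({...}) shape imagined in the text; returns the
  // command's index so it can be edited later.
  drawObject(desc) {
    this.commands.push(desc);
    return this.commands.length - 1;
  }

  // "Editable": update a uniform on a recorded command without re-recording it.
  setUniform(index, name, value) {
    this.commands[index].uniforms[name] = value;
  }
}

// Stand-in for the native renderer. In a real system this loop would live in
// C++, so the per-command work would not cross the JS -> C++ boundary.
function executeDrawList(list, backend) {
  for (const cmd of list.commands) {
    backend.draw(cmd);
  }
  return list.commands.length;
}

// Counting backend used only for this illustration.
const backend = {
  drawn: 0,
  draw(cmd) { this.drawn++; }
};

const list = new DrawList();
for (let i = 0; i < 1000; i++) {
  list.drawObject({
    buffers: [], textures: [], program: 'shader',
    uniforms: { foo4f: [1, 2, 3, 4], bar1i: 2 }
  });
}

// Per-frame animation would edit the recorded commands, then execute once.
list.setUniform(0, 'bar1i', 3);
const executed = executeDrawList(list, backend);
console.log(executed + ' commands executed from one execute call');
```

The point of the design is that the list is built once and only edited per frame, so the per-frame cost on the JS side is a handful of edits plus one execute call instead of a dozen calls per mesh.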

GC pauses really need to be fixed in browser JS engines.

The matrix math slowdown isn't too bad but it's still nasty. I've heard some talk of adding native Vector and Matrix types to JS and maybe something like Mono.SIMD.
