
Refactoring an Asset Pipeline (Part 2)


In the last post, we covered the current state of our asset pipeline.  Parts of it were slow and inefficient, and they had become a bottleneck in our asset generation process.  This was hurting the artists' iteration time and leading to build times that were unacceptable.  So we drew up a plan to give it a major overhaul.  Below is an overview of what we hoped to accomplish.

First Steps

We had used distributed build systems to speed up our code builds for quite some time, so we wanted to take advantage of distributing our asset building as well.  We had this going in a small fashion already, where each platform could be built independently.  That only helped a little, however, as many parts of the process were identical between platforms.  So we wanted the identical parts to happen only once, with the results then shared by the platforms.  With such a massive restructuring, we also needed to be able to test that we were getting the same results as when we started, so that we could verify everything was still working.  But before we could start breaking our build up, we had to make sure that our assets came out the same every time.

You see, in order to satisfy the requirements of the various platforms, the data had to be organized in certain ways.  Sometimes this led to areas containing data that was there only as padding.  If this data was not properly initialized, then it could be random (containing whatever was in memory at the time it was written), so different runs with the same input could produce different results.  With that being the case, it would be impossible to know whether our changes were causing the output to differ or whether it was just the random data.  So the first step was to ensure binary repeatability: simply that the output was always the same given the same inputs.  Once that was done, we would have a baseline against which to compare our results and make sure everything was coming out how we expected.
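To make that concrete, here is a minimal sketch (the struct and function names are made up for illustration) of the kind of fix involved: padding bytes are explicitly zeroed before serialization, and a simple hash of the output lets two builds of the same source be compared byte-for-byte.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical asset header; the 2 bytes of padding after streamCount are
// what would otherwise pick up whatever garbage was in memory.
struct MeshHeader {
    uint32_t vertexCount;
    uint16_t streamCount;
    uint32_t indexCount;
};

std::vector<uint8_t> SerializeHeader(const MeshHeader& src) {
    MeshHeader clean;
    std::memset(&clean, 0, sizeof(clean));   // padding is now 0, not random
    clean.vertexCount = src.vertexCount;     // copy fields one by one so the
    clean.streamCount = src.streamCount;     // source's padding never leaks in
    clean.indexCount  = src.indexCount;

    std::vector<uint8_t> out(sizeof(clean));
    std::memcpy(out.data(), &clean, sizeof(clean));
    return out;
}

// FNV-1a hash of the output: run the build twice on the same inputs and
// compare hashes to verify the result is binary-repeatable.
uint64_t HashBytes(const std::vector<uint8_t>& bytes) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (uint8_t b : bytes) {
        h ^= b;
        h *= 0x100000001b3ULL;
    }
    return h;
}
```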

Similar to how we had split the materials out from the model/scene information before, we also planned to break up the model and scene information and handle them separately.  That way, when the designers moved the layout of a scene, they wouldn't have to go through a full model build.

Distributed for the Win!

Since the goal was to distribute the build as much as possible, we had to push any inter-asset dependencies as late in the process as we could.  That would allow the majority of the building to occur in parallel, since an asset wouldn't have to wait on something else to finish before it could proceed.  In our system, shaders and textures weren't dependent on anything, but materials depended on both of them.  In turn, meshes depended on materials, models on meshes, and the scene layout on the models.  However, the only time those dependencies came into play was during some optimization steps that we did.  One example was stripping information from a model's vertices that wasn't used by any of its assigned materials: if a model didn't make use of normal mapping at all, then there was no need for it to carry tangent or binormal/bitangent information.
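As an illustration of that kind of dependent optimization, here is a hedged sketch (all of the types here are invented for the example) of dropping the tangent basis from a mesh when none of its assigned materials use normal mapping:

```cpp
#include <set>
#include <vector>

enum class VertexAttribute { Position, Normal, Tangent, Bitangent, UV0 };

struct Material {
    bool usesNormalMapping = false;
};

struct Mesh {
    std::set<VertexAttribute> attributes;
    std::vector<Material> materials;
};

void StripUnusedAttributes(Mesh& mesh) {
    bool anyNormalMapping = false;
    for (const Material& m : mesh.materials)
        anyNormalMapping = anyNormalMapping || m.usesNormalMapping;

    // No assigned material needs a tangent basis, so drop it from the
    // vertex data entirely.
    if (!anyNormalMapping) {
        mesh.attributes.erase(VertexAttribute::Tangent);
        mesh.attributes.erase(VertexAttribute::Bitangent);
    }
}
```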

So we decided to split the asset building into two separate and distinct steps.  We were influenced by the design of modern compilers and by our existing code builds, where the compiling happens distributed and the linking happens locally.  Using this as a guide, we designed compilers and linkers for our assets with the same behavior.  The compiling step would have no external dependencies other than the source data, meaning it could be done completely in parallel.  Once all of the compiles were done, the asset would be linked, where optimization and other platform-specific processing could occur.  The final step would be to pack the various assets together in a load-efficient manner.
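A rough sketch of what that split might look like as interfaces; the names and signatures here are illustrative, not the actual ones from our pipeline:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Platform-agnostic output of a compiler.
struct IntermediateAsset {
    std::string sourcePath;
    std::vector<uint8_t> data;
};

class IAssetCompiler {
public:
    virtual ~IAssetCompiler() = default;
    // May only read the source file; no inter-asset dependencies allowed,
    // so any number of these can run in parallel on the build farm.
    virtual IntermediateAsset Compile(const std::string& sourcePath) = 0;
};

class IAssetLinker {
public:
    virtual ~IAssetLinker() = default;
    // Receives the compiled asset plus anything it depends on, and is free
    // to do platform-specific optimization before emitting the final binary.
    virtual std::vector<uint8_t> Link(const IntermediateAsset& asset,
                                      const std::vector<IntermediateAsset>& dependencies,
                                      const std::string& targetPlatform) = 0;
};
```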

Don’t Repeat Yourself

As mentioned earlier, a lot of the processing was the same across the various platforms we were developing for.  Since the compiling step could have no dependencies and would output all the information needed for the asset linking (which would then do the platform-specific work), the compiling only had to be done once, regardless of the destination platform.  Just as code compilers output an object file, our asset compilers would output intermediate, platform-agnostic data that could be used to build for any platform.
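Continuing the sketch from the previous section (and reusing its hypothetical IAssetCompiler, IAssetLinker, and IntermediateAsset types), the driver for a single asset would then look roughly like this: compile once, link once per target platform.

```cpp
#include <cstdint>
#include <string>
#include <vector>

void BuildAsset(IAssetCompiler& compiler,
                IAssetLinker& linker,
                const std::string& sourcePath,
                const std::vector<IntermediateAsset>& dependencies,
                const std::vector<std::string>& platforms) {
    // The expensive, platform-agnostic work happens exactly once...
    IntermediateAsset intermediate = compiler.Compile(sourcePath);

    // ...and only the platform-specific linking is repeated per target.
    for (const std::string& platform : platforms) {
        std::vector<uint8_t> finalAsset =
            linker.Link(intermediate, dependencies, platform);
        // (writing finalAsset out to disk omitted from this sketch)
        (void)finalAsset;
    }
}
```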

One thought I had while designing this system was that, since the compilers were outputting to an intermediate file, it didn't really matter what the source format was as long as the result was the same.  Because of this, we could tailor each compiler to the input file type instead of having a one-to-one relationship with the asset type!  That would also allow us to support multiple input types easily.  This was fortuitous, since we were considering replacing COLLADA with FBX, which had a little more support in our 3rd party tools.  Once this pipeline refactoring was complete, we could build a compiler that read in FBX files and have the pipeline handle both formats at once, making the transition much smoother and less likely to interrupt the art and design schedules.
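In other words, the compiler lookup could be keyed on the source file type rather than the asset type.  A hypothetical sketch, again building on the IAssetCompiler interface above (the concrete compilers here are assumed, not real code from the project):

```cpp
#include <map>
#include <memory>
#include <string>

// Assumed concrete compilers deriving from the IAssetCompiler sketched
// earlier; the COLLADA and FBX compilers emit the same intermediate data.
std::unique_ptr<IAssetCompiler> MakeColladaCompiler();
std::unique_ptr<IAssetCompiler> MakeFbxCompiler();
std::unique_ptr<IAssetCompiler> MakeTextureCompiler();

std::map<std::string, std::unique_ptr<IAssetCompiler>> RegisterCompilers() {
    std::map<std::string, std::unique_ptr<IAssetCompiler>> byExtension;
    byExtension[".dae"] = MakeColladaCompiler();  // existing COLLADA path
    byExtension[".fbx"] = MakeFbxCompiler();      // planned FBX path, same output
    byExtension[".tga"] = MakeTextureCompiler();
    return byExtension;
}
```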

So our final design had a compiler for each input type and a linker for each platform and asset type, with intermediate files passing between them.  We could also add additional compilers for other intermediate steps in our pipeline.  One example: we used Valve's distance field texture algorithm for certain effects, where the input was a texture and the output was a different texture containing the distance values.

Packing Files

One of the decisions I made in the original design was to store each asset in two files: a header file and a data file.  This was because one of the problems people had run into before, and that we wanted to avoid, was reading data into one area just to copy it to its final destination.  So I thought that splitting the files up to avoid that situation was a good idea, and it worked in most cases.  The main point where it failed, however, was when the header and data files got out of sync with each other.  This was rare, but it happened often enough that it was something else I wanted to fix.  So the final results would instead be output to a single file containing a basic table of contents followed by the header and data information.
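A minimal sketch of that kind of single-file layout, with a small table of contents up front followed by the header and data blocks (the field names are illustrative, not our actual format):

```cpp
#include <cstdint>

// All offsets are relative to the start of the file.
struct AssetTableOfContents {
    uint32_t magic;         // identifies the file format
    uint32_t version;       // lets the loader reject stale files
    uint32_t headerOffset;  // where the header block begins
    uint32_t headerSize;
    uint32_t dataOffset;    // where the bulk data begins
    uint32_t dataSize;
};
// File layout: [AssetTableOfContents][header block][data block]
// Keeping both blocks in one file means the header and data can never get
// out of sync the way the old two-file scheme allowed.
```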

Alongside this file would be dependency information, which could be used in a final packing step so that we could group multiple asset types together.  This was very useful as a load-time optimization: we could read in a scene quickly with one read and then deserialize it in memory.  However, we would still support loose files for the times when you just wanted to change one asset and avoid the final packing step.  This support could easily be removed in shipping builds as well.
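A hedged sketch of how that loose-file fallback might work at load time (the function names are made up): try the packed group first, and fall back to the individual asset file during iteration.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Illustrative only: read a whole file into memory in a single read.
std::vector<uint8_t> ReadWholeFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return {};
    std::vector<uint8_t> bytes(static_cast<size_t>(file.tellg()));
    file.seekg(0);
    file.read(reinterpret_cast<char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    return bytes;
}

std::vector<uint8_t> LoadAsset(const std::string& packedPath,
                               const std::string& loosePath) {
    // Prefer the packed group (one read for the whole scene)...
    std::vector<uint8_t> bytes = ReadWholeFile(packedPath);
    if (!bytes.empty()) return bytes;
    // ...but fall back to the loose file so a single changed asset can be
    // picked up without re-running the packing step.  In shipping builds
    // this fallback would simply be compiled out.
    return ReadWholeFile(loosePath);
}
```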

Laying This to Rest

So this was a basic overview of the plans that I and several others drew up to optimize our asset pipelines.  I was really looking forward to implementing this and watching our artists and designers become more productive and have an easier time getting the game to look and behave how they wanted.  I think our first designs and implementations were good for what we could do at the time, but I believe this final product would have been truly stellar.  Since I won't get the chance to actually implement it, writing about what we hoped to accomplish helps scratch the itch I've had since realizing that none of this would actually come about.  Hopefully it might even spark a few ideas in other people on ways they can optimize their workflow and reduce iteration time.  Thanks for taking the time to read this!