Frame-based game profiling

2014-09-05

To make Tharsis multi-threaded, I need to be able to balance the overhead of individual processes (called Systems in many other ECS frameworks) by moving them between threads. To do this, I need to quickly measure the time spent running each process on a frame-by-frame basis.

When working on ICE, I wrote a simple frame-based profiler that recorded exact time when entering and exiting important zones of code, such as rendering or GUI. I wanted to update this profiler for Tharsis, but it turned out to be easier to rewrite it due to heavy use of globals (not friendly to threaded code) and occasional slow allocations that distorted the output.

I also intended to write a blog post about it... but it got a bit too long so I divided it into three shorter posts. This post is an introduction to what frame-based profiling even is, or rather, what I mean by it. It's likely there's a more common term than frame-based profiling, but I didn't come across it yet.

Other parts of this series:

Main-stream profilers are not always helful (for games)

Profilers such as perf, CodeXL and even Callgrind+KCacheGrind are great for finding the slowest parts of your code in average case. With a good profiler you can identify bottlenecks on source line or instruction level, count cache misses, branch mispredictions and so on. This is awesome to improve the average performance of your game... and pretty useless when you get that two-second heisenlag that only happens once per hour.

With a conventional profiler, irregular spikes in overhead disappear as they are averaged over time. To track down such spikes, we need a profiler capable of keeping track of individual frames.

Frame-based profiler

A frame-based profiler records overhead for each frame separately. An external tool can't really know what 'frame' means (especially if 'frame' is decoupled from rendering), so we have to instrument our code manually.

Once we know the duration of each frame, we can determine when any unexpected overhead happens. If we know how long various parts (zones) of each frame take, we can look at zones of slowest frames to find offending code.

Zones

A zone is a section of code to measure overhead of. Zones may contain child zones (e.g. a "rendering" zone may contain "batch" sub-zones). In D or C++ we can implement this with a RAII type that records the precise time in its constructor and destructor. This can be pretty easy to use:

auto profiler = new Profiler(new ubyte[1024 * 1024 * 16]);
while(!done)
{
    Zone frameZone = Zone(profiler, "frame");

    {
        Zone renderZone = Zone(profiler, "render");

        // ... rendering code ..

        // renderZone end
    }
    {
        Zone collisionZone = Zone(profiler, "collision");

        // ... collision code ..

        // collisionZone end
    }
    {
        Zone aiZone = Zone(profiler, "AI");

        // ... AI code ..

        // aiZone end
    }

    // frameZone end
}

Note

To get precise time, use a high resolution clock such as std.datetime.Clock.currStdTime in D (also see C (POSIX), C++11)

Notes

Unlike a conventional profiler, a frame-based profiler can't measure each function, line or instruction.

Frame-based profiling doesn't replace a profiler; however, it can detect issues such a profiler could not. Once you've localized the issue, you may even isolate it and use e.g. a sampling profiler to optimize it.

Note that keeping track of profiling data can be pretty memory intensive. The next post is about how I minimized the overhead of the frame profiler I wrote for Tharsis.