Frame-based profiling with D ranges

In previous posts I wrote about frame-based profiling and how to do it with reasonable memory usage. Tharsis.prof is the library (in D) I wrote based on these ideas. It provides minimalist RAII-based API for recording overhead of zones. However, the more interesting part of Tharsis-prof is the range-based API used to process profiling data.

Ranges in D

A lot has been written about D ranges, but for our needs it should be enough to say that ranges are a couple of compile-time concepts describing sequences.

For example, a random number generator that generates numbers in a sequence can be used as an input range, a single-linked list can be iterated with an forward range range that in an input range that can save its current position into a copy, a doubly-linked list can use a bidirectional range (which is a forward range) and so on.

D libraries use compile-time predicates to determine that something is a range. For example, isInputRange!T checks that type T has E front(), void popFront() and bool empty() members where E is the element type. These range primitives provide one-directional iteration or generation functionality of an input range. Any type that defines these primitives is considered a range. Since this is checked at compile-time, there is no need to use virtual functions, allowing the compiler to inline the range methods (assuming a compiler with a decent optimizer, such as GDC or LDC).

D standard library and many third-party libraries are designed to work with ranges. For example, filter from std.algorithm takes any input range and returns a new range type that will only iterate over filtered results from the input range. map and reduce from the same module do the same thing their namesakes from other languages do, std.range.cycle takes a forward range and wraps it in an infinite circular range, etc.

Tharsis-prof ranges

Tharsis-prof uses range types to wrap profile data (which is a raw byte buffer, see previous post) in a type-safe API. D standard library (especially std.algorithm) can then be used to analyze this data.

EventRange

EventRange is a simple forward range that wraps raw profile data. Elements of this range are of type Event; a simple struct that specifies event type (such as ZoneStart or ZoneEnd), time when the event occured and any event-specific data. EventRange is not not very useful by itself; it serves as a low-level base for other ranges. It does, however, provide a lightweight type-safe API to read profile data. Implementation of EventRange is very simple; it just needs to remember its position in the profile data buffer (updated in popFront()) and read the bytes at this position to produce an Event (in front()). Events produced by EventRange are ordered by time.

Usage is pretty straigforward:

// "profiler" is a Profiler we used with some Zones to get some profiling data.
// Profiler profiler;

import std.stdio;
// Create an EventRange from profile data with UFCS syntax.
auto events = profiler.profileData.eventRange;
// Foreach over range calls popFront()/front()/empty() internally
foreach(event; events)
{
    writeln(event);
}

import std.algorithm;
// Get a range of only the events with start time between 1000 and 5000 (hectonanoseconds)
// (std.algorithm.filter)
//
// This line doesn't filter anything or allocate memory; filtering only happens once
// "filtered" is iterated over (but if we did want to do the filtering right now, e.g.
// to get an array of filtered results, we'd suffix this with ".array")
auto filtered = events.filter!(e => e.startTime > 1000 && e.startTime < 5000);

// Print the IDs of events between 10000 and 50000 hectonanoseconds
// (std.algorithm.map, although admittedly we could just print event.id for each event)
foreach(id; filtered.map!(e => e.id))
{
    writeln(id);
}

// Count the number of events between 1000 and 5000 (std.algorithm.count)
writeln(filtered.count);

ZoneRange

ZoneRange is a forward range that wraps an EventRange, or rather any forward range of Event. Like EventRange, it’s a lightweight wrapper that generates elements on-the-fly; it generates ZoneData structs which specify the time and duration of a Zone, zone info string and ID of its parent zone. Unlike EventRange, ZoneRange does need some extra memory to store the stack of parents of the current zone. ZoneData generated by a ZoneRange are sorted by their end time (the ZoneRange must read both zone start and end events to construct ZoneData).

ZoneRange allows us to get some more interesting data from a profiling run, such as all zones in the longest/slowest frame (which is a likely place to find what causes any lag):

// Profiler profiler;

import std.algorithm;
// zoneRange() internally builds an EventRange and wraps it in a ZoneRange.
auto zones = profiler.profileData.zoneRange;
// filter! produces a range of only the zones with the string "frame" as info,
// which is the info string we passed to Zone instances used to wrap the entire frame.
auto frames = zones.filter!(z => z.info == "frame");

import std.container;
// std.container.Array constructor builds an RAII array containing zones from frames.
// We need an array as we need a random access range to sort the zones.
auto frameArray = Array!ZoneData(frames);
frameArray[].sort!((a, b) => a.duration > b.duration);

import std.stdio;
// Print durations of 4 longest frames.
foreach(frame; frameArray[0 .. 4])
{
    writeln(frame.duration);
}

// Print details about all zones in the worst frame.
auto worst = frameArray[0];
foreach(zone; zones.filter!(z => z.startTime >= worst.startTime && z.endTime <= worst.endTime))
{
    writefln("%s: %s hnsecs from %s to %s",
             zone.info, zone.duration, zone.startTime, zone.endTime);
}

accumulatedZoneRange

accumulatedZoneRange is a function that returns a range of a Voldemort type that can not be named directly. In fact, the type will differ based on input parameters. accumulateZoneRange takes one or more forward ranges of ZoneData and uses functions (accumulate and match) passed as compile-time parameters to merge matching zones and accumulate data about them. match, by default, considers zones with the same info to be a match, which will merge them if they share a parent or their parents have been merged. One use of this is to get total or average time spent in each zone between frames, similarly to a conventional profiler, but depending on the accumulate function different data can be calculated. For example, the maximum duration of any matching Zone (to get not just the longest frame but the longest time for every zone in all frames), the number of times a zone was entered, and so on.

Element type of range returned by accumulateZoneRange depends on accumulate, and consists of a ZoneData and the return value of accumulate, whatever it may be. For example, you could use accumulate with a struct return type to accumulate multiple values simultaneously.

The match function could be changed, for example, to merge zones or keep them separate based on their info, or to merge similarly named zones (e.g. numbered draw calls).

Unlike ZoneRange, accumulateZoneRange (the function itself, not returned type) needs a decent chunk of memory (enough to copy all zones from the input zone ranges). This must be provided by the user. See example (sorry for the gray text, the highlighter apparently doesn’t know @nogc yet):

// Accumulate data into this struct.
struct ZoneInfo
{
    ulong minDuration;
    ulong maxDuration;
    // Needed to calculate average duration.
    size_t instanceCount;

    // We also need the total duration to calculate average, but that is accumulated
    // by default in zoneData.
}

// Gets min, max, total duration as well as the number of times the zone was entered.
ZoneInfo accumulate(ZoneInfo* aPtr, ref const ZoneData z) pure nothrow @nogc
{
    if(aPtr is null) { return ZoneInfo(z.duration, z.duration, 1); }

    return ZoneInfo(min(aPtr.minDuration, z.duration),
                    max(aPtr.maxDuration, z.duration),
                    aPtr.instanceCount + 1);
}

auto zones      = profiler.profileData.zoneRange;

const zoneCount = zones.walkLength;
alias Data = AccumulatedZoneData!accumulate;
// We could also do 'new Data[zoneCount]' with the GC, or a safer malloc() wrapper
auto accumStorage = (cast(Data*)malloc(zoneCount * Data.sizeof))[0 .. zoneCount];
scope(exit) { free(accumStorage.ptr); }

auto accumulated = accumulatedZoneRange!accumulate(accumStorage, zones.save);

// Write out the results.
foreach(zone; accumulated) with(zone.accumulated)
{
    import std.stdio;
    writefln("id: %s, min: %s, max: %s, avg: %s, total: %s, count: %s",
             zone.id, minDuration, maxDuration,
             zone.duration / cast(double)instanceCount, zone.duration, instanceCount);
}

Conclusion

For now, Tharsis-prof can do what I need it to do: profile my code over short durations of time and allow analyzing results of such short profiling runs in real time. There’s still a lot that could be done to make Tharsis-prof a decent profiler. A tree structure of zones would make analysis of profile data easier in some cases; memory usage can be decreased even further (e.g. by using a lightweight compression algorithm); and of course, a way to visualize profile data (preferably in real time) would be very useful.

I expect to revisit Tharsis-prof eventually, but for now I need to put it in use for Tharsis itself.