ClientServicesPerformance

ClientServices invoked from an application running outside the client are handled from the ThereClient’s poll loop, which appears to run at the simulation step rate of 30 ticks per second. If you fire sequential requests at the client, you can therefore expect about 30 services to be processed per second, one per tick.

This page describes some experiments in this area with different services.

Benchmark Setup

All of these tests were performed against the V2.06 client, on a machine with the following characteristics:

Windows 2000 Professional, SP4
Single AMD Athlon XP 2400+ processor (2.06GHz clock rate)
1GB RAM
Radeon 9600 Pro video card

On this machine, the OS reports 100% CPU utilisation whenever the ThereClient window has focus. When the ThereClient loses focus, utilisation falls, to as low as 25% or so depending on scene complexity. I will refer to these as foreground and background operation for the client.

I use the PerformanceGraph and the flipHud facility to detect whether the ThereClient is able to keep up its normal 30fps rendering rate under load.

When reporting service rates, numbers are rounded to one decimal place. When reporting speedups (rate ratios), numbers are rounded to two decimal places.
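The rounding convention can be written down as a minimal sketch. The function names are my own for illustration, and the figures used are the 30 and 300 responses per second quoted elsewhere on this page rather than new measurements.

```python
def format_rate(responses: int, seconds: float) -> str:
    """Service rate in responses per second, rounded to one decimal place."""
    return f"{responses / seconds:.1f}"


def format_speedup(rate: float, baseline_rate: float) -> str:
    """Speedup (ratio of two rates), rounded to two decimal places."""
    return f"{rate / baseline_rate:.2f}"


# 300 responses counted over a 10 second run
print(format_rate(300, 10.0))        # 30.0
# 300 responses/s against a 30 responses/s single-thread baseline
print(format_speedup(300.0, 30.0))   # 10.00
```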

Benchmark Axes

I can think of the following performance axes given the above client hardware configuration:

Client running foreground or background
Type of operation being requested:
- fetch root (“/”) page
- pilot info
- forcefield edprop via ScriptHooks
- fetch “/ihost/” page
- fetch thob list (currently done from “/ihost/” page)
- fetch full details page for single thob
- fetch thob list from ScriptHooks “/thobs” path
Number of concurrent requester threads
Number of objects in locality:
- Essentially none, for example in a spot a couple of hours boat ride from the North Pole.
- Moderate: in the Oasis
- High: Karuna Plaza
Priority class of ThereClient: Normal or Below Normal; the latter allows an external application to get more of the available CPU time.
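
To give a feel for how quickly these axes multiply, here is a rough count of the combinations. The labels are just shorthand for the items listed above, and the range of 1 to 15 requester threads is an assumption on my part (it is roughly where requests start to fail), not a fixed property of the client.

```python
from itertools import product

focus_states = ["foreground", "background"]
operations = [
    "root page",
    "pilot info",
    "forcefield edprop",
    "/ihost/ page",
    "thob list via /ihost/",
    "single thob details",
    "thob list via /thobs",
]
thread_counts = range(1, 16)  # assumption: 1 to 15 requester threads
localities = ["essentially none", "moderate", "high"]
priority_classes = ["Normal", "Below Normal"]

# Every combination of one value per axis
combinations = list(
    product(focus_states, operations, thread_counts, localities, priority_classes)
)
print(len(combinations))  # 1260
```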

Obviously we can’t test all of these possibilities. However, I have written a benchmark application that automates the collection of data for six different ClientServices, starting with 1 requester thread and adding threads until the ThereClient stops replying, to reduce the number of possibilities that need to be tested separately.
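
The inner measurement of such a benchmark might look something like the sketch below. This is not the actual benchmark application: `measure_rate` and `request_fn` are hypothetical names, and `request_fn` stands in for whatever call issues one ClientServices request and blocks until the reply arrives (or raises an exception on failure).

```python
import threading
import time
from typing import Callable, Tuple


def measure_rate(
    request_fn: Callable[[], None], threads: int, duration: float
) -> Tuple[float, int]:
    """Run `threads` requesters back to back for `duration` seconds.

    Returns (responses per second rounded to one decimal place, failed requests).
    """
    successes = [0] * threads   # one slot per thread, so no lock is needed
    failures = [0] * threads
    deadline = time.monotonic() + duration

    def worker(index: int) -> None:
        while time.monotonic() < deadline:
            try:
                request_fn()    # hypothetical: one blocking request to the client
                successes[index] += 1
            except Exception:
                failures[index] += 1

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    start = time.monotonic()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.monotonic() - start
    return round(sum(successes) / elapsed, 1), sum(failures)
```

A driver would call this with 1, 2, 3, ... threads for each service and stop raising the thread count once failures appear.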

Raw Data

Raw data for the initial benchmark runs can be found at RawData20040502.

Summary Conclusions

Whatever you do, you are likely to be able to get the basic 30 responses per second from the current client.
Whatever you do with the current client, you can’t get more than 300 responses per second, which is to say a speedup factor of 10.0.
You never get super-linear speedups when you add requester threads (not a surprise).
It is never beneficial to have more than 10 requester threads.
More than 14 or 15 requester threads can result in failed requests.
Long responses cause drastically more problems than short responses.
It is better to use several concurrent ScriptHook accesses than a single request for the full detail page followed by data extraction from that.

Detailed Conclusions

It looks like the client is servicing requests at the top of its main loop, which we can assume happens at the same basic rate as the simulation engine and display pipeline, which is to say at 30 times per second. Thus, the aim is to have the client process as many requests per main loop iteration as possible.

It seems to be impossible to get the client to service more than 10 requests per main loop iteration, no matter what kind of request they are. This may be the result of having an internal queue that is 10 slots long, although it is not clear why having more request threads than this does not immediately cause trouble: you need to push things up to 14 or 15 threads to get to that stage. It may be that because of overlapping between the ThereClient and the benchmark application, 14 threads (say) are in fact only forming a queue of a maximum of 10 requests at the ThereClient end.
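
The observations so far fit a very simple model, sketched here. It is only a description of the measured behaviour under the assumptions of 30 main loop iterations per second and at most 10 requests serviced per iteration; it says nothing about how the client is actually implemented, and it ignores the failures seen at 14 or 15 threads.

```python
def modelled_rate(threads: int, ticks_per_second: int = 30, max_per_tick: int = 10) -> int:
    """Best-case responses per second if each requester thread gets one
    request serviced per main loop iteration, up to a per-iteration cap."""
    return ticks_per_second * min(threads, max_per_tick)


print(modelled_rate(1))    # 30
print(modelled_rate(10))   # 300
print(modelled_rate(14))   # 300
```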

In practice, it seems that limiting the number of requester threads to 10 is both optimal in terms of result rate and cautious in case there is an overlap effect we need to be worrying about.
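
One way an external application could respect that limit is to push all of its requests through a fixed pool of 10 worker threads. This is a minimal sketch using the standard library; `fetch_all` and `request_fn` are hypothetical names, with `request_fn` standing in for a function that performs one ClientServices request for a given item and returns the result.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_REQUESTERS = 10  # the limit suggested by the measurements on this page


def fetch_all(request_fn, items):
    """Apply request_fn to every item using at most MAX_REQUESTERS
    concurrent threads; results come back in the same order as items."""
    with ThreadPoolExecutor(max_workers=MAX_REQUESTERS) as pool:
        return list(pool.map(request_fn, items))
```

The pool keeps no more than 10 requests outstanding at once however many items are queued, so the caller does not need to manage the threads directly.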

If the ThereClient is either operating in the background, or at a lower priority class than the benchmark code (or both), we see essentially linear speedups as we add requester threads. As we raise the number of threads, the CPU utilisation rises to 100%. At this point, if the ThereClient is in a lower priority class, then the frame rate will start to drop from its normal 30fps. If the ThereClient is in the normal priority class, the impact is a drastic drop in response rate, probably caused by the Windows scheduler getting confused. This is particularly visible with long responses: short responses (perhaps those that can be transferred in a single chunk) don’t cause these problems.