Paper
Niki C. Thornock. Using Set Sampling in Level Three Cache Studies.
Master's thesis, Brigham Young University, 1999.
Abstract
In single processor systems, one or two cache levels are sufficient to
reduce the performance gap between the processor and main memory. With
the increasing popularity of multiprocessor systems, this level of caching
is becoming inadequate; adding a third, very large cache (level 3 or L3)
seems a likely candidate for reducing the performance gap. Simulation,
especially trace-driven simulation, is a frequently used method of testing
new cache configurations. Creating a simulator is fairly straightforward,
but it is difficult to obtain the long, accurate traces necessary for
simulating extremely large L3 cache systems used in current and future
multiprocessor systems.
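The simulator really is the easy part. The sketch below is a minimal trace-driven simulator in Python: a set-associative cache with LRU replacement that reads a trace of byte addresses and counts misses. The class and function names, the LRU policy, and the address-to-set mapping (block number modulo number of sets) are illustrative assumptions, not details taken from the thesis.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache with LRU replacement (illustrative sketch)."""

    def __init__(self, num_sets: int, assoc: int, block_size: int) -> None:
        self.num_sets = num_sets
        self.assoc = assoc
        self.block_size = block_size
        # One OrderedDict per set: keys are tags, ordered least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.refs = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Simulate one reference to byte address addr; return True on a hit."""
        block = addr // self.block_size
        lines = self.sets[block % self.num_sets]  # set index = block mod num_sets
        tag = block // self.num_sets
        self.refs += 1
        if tag in lines:
            lines.move_to_end(tag)  # hit: mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.assoc:
            lines.popitem(last=False)  # set full: evict least recently used tag
        lines[tag] = None
        return False

    def miss_rate(self) -> float:
        return self.misses / self.refs if self.refs else 0.0


def simulate(trace, num_sets: int, assoc: int, block_size: int) -> float:
    """Run a whole trace (an iterable of byte addresses) and return the miss rate."""
    cache = SetAssociativeCache(num_sets, assoc, block_size)
    for addr in trace:
        cache.access(addr)
    return cache.miss_rate()
```

For example, simulate([0, 64, 0, 128, 0], num_sets=1, assoc=2, block_size=64) returns 0.6: three misses in five references. The code stays this small as the simulated cache grows; what grows is the length of trace needed to exercise all of its sets.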
We discuss some of the difficulties present in trace collection and
trace-driven simulation. We then describe our multiprocessor tracing
technique and verify that it accurately collects long traces. We
investigate time sampling and two types of set sampling, and conclude that
the second set sampling technique achieves the most accurate results. The
miss rate for the second set sampling method is calculated as the number
of misses to sampled sets divided by the total number of references
multiplied by the sample size (the fraction of sets sampled). We found
that the sampling accuracy depends on the
workload: if the workload warms up the cache, the sampling technique is
accurate for all cache configurations. If the workload does not warm up
the cache, the sampling technique is only accurate for highly associative
caches. Of the sample sizes we tested, 10% was the most accurate.
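A minimal Python sketch of the scaled estimator described above follows. It simulates only references that map to sampled sets, counts every reference in the trace, and divides the sampled misses by the total reference count times the fraction of sets actually sampled. Choosing sets at random, the LRU policy, and the function name are assumptions for illustration; the abstract does not say how sets were chosen. For comparison, the function also returns misses divided by references to sampled sets only, a common alternative whose relation to the thesis's first set sampling technique is not stated here.

```python
import random
from collections import OrderedDict


def set_sampled_miss_rate(trace, num_sets, assoc, block_size, sample_fraction, seed=0):
    """Estimate a cache's miss rate by simulating only a random subset of its sets.

    Assumes 0 < sample_fraction <= 1. Returns a pair:
      scaled      = misses to sampled sets / (total references * fraction of sets sampled)
      per_sampled = misses to sampled sets / references to sampled sets
    """
    rng = random.Random(seed)
    k = max(1, round(num_sets * sample_fraction))  # number of sets to simulate
    sampled = set(rng.sample(range(num_sets), k))  # assumption: sets chosen at random
    frac = k / num_sets                            # fraction of sets actually sampled
    lru = {s: OrderedDict() for s in sampled}      # per-set LRU order of tags
    total_refs = sampled_refs = sampled_misses = 0
    for addr in trace:
        total_refs += 1                            # every reference is counted
        block = addr // block_size
        idx = block % num_sets
        if idx not in sampled:
            continue                               # unsampled sets are not simulated
        sampled_refs += 1
        tag = block // num_sets
        lines = lru[idx]
        if tag in lines:
            lines.move_to_end(tag)                 # hit: update LRU order
        else:
            sampled_misses += 1
            if len(lines) >= assoc:
                lines.popitem(last=False)          # evict least recently used tag
            lines[tag] = None
    scaled = sampled_misses / (total_refs * frac) if total_refs else 0.0
    per_sampled = sampled_misses / sampled_refs if sampled_refs else 0.0
    return scaled, per_sampled
```

With num_sets=1024 and sample_fraction=0.10, the sketch simulates 102 sets and scales by 102/1024 rather than exactly 0.10. It starts every sampled set empty and counts cold-start misses, so it does not correct for the warm-up effect described above.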
Our chosen sampling method reduces required disk space, enables
simulations to run faster, and effectively enlarges the trace buffer of
our hardware monitor, decreasing trace distortion.
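The disk-space and buffer savings come from filtering. If only references to sampled sets are stored and references spread evenly across sets, a fixed-size buffer spans roughly 1/f times as many references for a sampling fraction f, or about ten times as many at 10%. The hypothetical Python sketch below assumes references can be filtered by set index before they are written. The function and parameter names are illustrative, since the abstract does not describe the hardware monitor's interface. The sketch also counts all observed references, which the scaled miss-rate calculation needs.

```python
def filter_trace(trace, sampled_sets, num_sets, block_size, buffer_capacity):
    """Store only references that map to sampled sets until the buffer is full.

    Returns (stored, observed): the buffered addresses and the number of
    references seen up to the moment the buffer filled (or the trace ended).
    """
    stored = []
    observed = 0
    for addr in trace:
        if len(stored) >= buffer_capacity:
            break                                  # hypothetical buffer is full
        observed += 1                              # count every reference seen
        set_index = (addr // block_size) % num_sets
        if set_index in sampled_sets:
            stored.append(addr)                    # only sampled-set references use buffer space
    return stored, observed
```

With 100 consecutive 64-byte blocks, 10 sets, only set 0 sampled, and room for 5 entries, the buffer fills after 41 observed references, about eight times its capacity. For larger buffers the ratio approaches 10.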