| T. Blackwell. Ph.D. Thesis, Harvard University, 1998.
This thesis presents and analyzes a simple principle for building systems: that there
should be a random component in all arbitrary decisions. If no randomness is used, system
performance can vary widely and unpredictably due to small changes in the system
workload or configuration. This makes measurements hard to reproduce and less
meaningful as predictors of performance that could be expected in similar situations.
To measure the sensitivity of non-randomized systems to slight configuration changes,
we measured the variation in performance measures of TCP/IP and workstation memory
systems as a result of "small" configuration perturbations. By "small" we mean within the
range over which things may change unintentionally due to other modifications being
evaluated, or within the range of accuracy that an independent researcher could reasonably
achieve.
For TCP/IP, changes of a few percent in link propagation delays and other parameters
caused order of magnitude shifts in bandwidth allocation between competing connections.
For memory systems, changes in the essentially arbitrary order in which functions were
arranged in memory caused changes in runtime of tens of percent for single benchmarks,
and of a few percent when averaged across a suite of benchmarks. In both applications the
measured variability is larger than performance increases often reported for new improved
designs, suggesting that many published measurements of the benefits of new schemes
may be erroneous or at least irreproducible.
To make TCP/IP and memory systems measurable enough to make benchmark results
meaningful and convincing, randomness must be added. Methods for adding randomness
to conventional program linkers, to linkers which try to optimize memory system
performance by avoiding cache conflicts, and to TCP/IP are presented and analyzed. In all
of the systems, various amounts of randomness can be added in many different places. We
show how to choose reasonable amounts of randomness based on measuring configuration
sensitivity, and propose specific recipies for randomizing TCP/IP and memory systems.
Substantial reductions in the configuration sensitivity are demonstrated, making
measurements much more robust and meaningful. The accuracy of the results increases
with the number of runs and thus is limited only by the available computing resources.
When the overall performance of a system is strongly influenced by the worst case
behavior, reducing the sensitivity of the system can also make it perform better. Using
average waiting time as a metric, TCP/IP performance is shown to improve significantly
when randomization is added to the sending host's congestion window calculations.
Although the improvements are less than those achieved by previously proposed schemes
using randomized packet discard algorithms inside the network, the proposed
modifications can be implemented entirely in the sending host and so can be deployed
more easily.
|