[e2e] Queue size of routers

Dennis Ferguson dennis at juniper.net
Tue Jan 21 12:07:07 PST 2003


> > I believe (based solely on a single long-ago observation that an M/M/1/K
> > queuing model seemed to predict the measured behaviour of core routers
> > with short buffers pretty accurately) that the speculation in your third
> > sentence above may in fact be true.
> This sounds intriguing. Does anyone have a pointer to this work?

The observation I'm speaking of was buried in an internal report for a
large network operator which studied what to spend money on to improve the
network they had at the time.  This was a relatively long time ago, when
"large" backbone networks were constructed from T3 circuits.  I left my
copy of the report with the company almost 8 years ago.

Since this is very old news, however, I'll live dangerously and describe
the observation.  One of the goals of traffic engineering was to make sure
you had sufficient bandwidth in your network, with traffic sufficiently well
distributed over the topology, to keep the average packet loss rates on
every circuit in the network below the trigger level where customers might
notice the problem and want to complain about it.  Let's set the trigger
level at 1% loss, though the actual number you pick turns out not to matter
so much.  Since the average circuit loads turn out to be fairly predictable
in a large backbone network the practical requirement becomes one of
determining the average circuit load which results in a 1% (or whatever)
loss rate, since this is the average load level at which the circuit is
"full" for purposes of minimizing customer complaints, and is hence the
load level you seek to avoid by engineering your traffic.

The relationship between average circuit load and average packet loss
rate for highly-aggregated traffic is hence interesting to know for
backbone traffic engineering purposes, and to optimize to ensure you
are getting the most from investments in bandwidth.  In particular, to
the extent that the relationship between these two macro-level time-averaged
quantities for a circuit is dependent on output buffer size provided
by the router, "optimizing" the length of the router's output queue
translates into finding the minimum queue length which allows you to load
circuits fairly close to 100% of the actual circuit bandwidth, on average,
while keeping the packet loss rate below 1% (or whatever), on average.
Anything shorter than this wastes bandwidth you are paying for, while
anything much longer potentially increases queuing delay excursions for
little identifiable benefit.
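That minimum can be read off the classic M/M/1/K drop-probability
formula, loss = (1-rho) * rho^K / (1 - rho^(K+1)).  Here is a minimal
sketch of the search (the function names and the 90%-load/1%-loss
example figures are my own illustration, not numbers from the report):

```python
def mm1k_loss(rho, K):
    """Steady-state drop probability of an M/M/1/K queue: the fraction
    of (Poisson) arrivals finding all K system slots occupied."""
    if rho == 1.0:
        return 1.0 / (K + 1)
    return (1.0 - rho) * rho ** K / (1.0 - rho ** (K + 1))

def min_queue_length(rho, target_loss):
    """Smallest K (in packets) keeping average loss below the target
    at offered load rho -- the 'minimum queue length' described above."""
    K = 1
    while mm1k_loss(rho, K) > target_loss:
        K += 1
    return K

# e.g. loading a circuit to 90% of capacity while keeping average
# loss under 1% needs, per this model, only a few dozen packets:
print(min_queue_length(rho=0.90, target_loss=0.01))  # -> 23
```

Note that nothing in the computation depends on RTTs or
bandwidth*delay products; the answer is a fixed count of packets
determined by the load and the loss target alone.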

Now at the time some people closer to the voice part of the company
I worked for produced an analysis of exactly the relationship above
using classic queuing theory, assuming Poisson-distributed packet
arrivals, and used this to demonstrate that routers with quite modest
amounts of output buffering (from my point of view) were sufficient for
our purposes.  I didn't think that could possibly be correct, based on
my understanding of conventional wisdom at the time, and so set out to
prove they were idiots.  We were collecting 5-minute-average statistics
for packet and byte rates, and queuing drops, for all circuits in the
network, and when we looked at the (entirely default) configurations
of our routers we found every box with a trunk T3 circuit had
one of 4 output queue configurations: 0.5(!), 1.5, 4 and 9 milliseconds
at T3 at the measured average packet size.  We took 2 weeks of data for
every circuit for which a router had dropped packets in the period and
used that to draw four graphs of average drop rate versus average load,
each one combining data from all routers with the same buffer configuration.

The results of the exercise were that routers in the network with
the same output buffer configurations, including those on long-delay
international circuits, produced indistinguishable loss/load curves
and that all 4 configurations of routers matched M/M/1/K queuing
predictions my not-so-idiotic co-workers used with almost uncanny
accuracy given the coarseness of the assumptions we'd made to produce
comparable results (if I recall correctly, the data in the report itself showed
the theoretical curves slightly underestimating the measured packet drop
rates, which I tried to make something of, but I realized afterward that
most of that difference was the result of a mistake I made in computing
the productive data-carrying capacity of a T3 circuit).  In particular,
and if I remember correctly, what we found was that while the routers
with 0.5 ms of output buffering were pretty sad and could only get their
circuits to about 60% load before beginning to drop packets at excessive
rates, the routers with just 9 ms of output buffering could get their
circuits to over 90% of full load on average before this happened, and
were hence just about good enough.  And, of course, since backbone
circuits are now typically 2 orders of magnitude larger in bandwidth than
they were then, the same classical queuing theory, if it applies, predicts
that relatively shorter output buffers should be sufficient now even
though the relative amount of output buffering actually supplied in
current routers has changed by several orders of magnitude in the opposite
direction.

Note that the network in question was one of the largest backbones in
existence at the time, and even though the data was crude it was still
data obtained at a scale of aggregation inaccessible to contemporaneous
researchers who might have been interested.  I really wanted to publish this
data in particular since the measurements seemed to flatly contradict the
expectations of the effect of scale expressed by some of the people who
were finding long-tailed distributions all over at much smaller levels
of aggregation, and I thought that reconciling these (or, at least,
recognizing that there might be an inconsistency in need of reconciliation)
might help improve our theoretical understanding of how scale manifests
itself in these networks.  Unfortunately, documenting the behaviour of
commercial networks for public consumption is often problematic,
particularly when the most interesting data (like this) are often
obtained by making measurements of defects which you might not want
to admit you have (or, worse, might be artifacts of defects you don't
know you have but which someone else might be able to deduce from the
data), so it doesn't happen all that often.

And, to be clear on the implications of this for the current conversation,
if the macro-level behaviour of core circuits in large-scale networks really
does conform to that expected using the assumption of Poisson-distributed
packet arrivals then you might just as well ignore everything that has been
said in this conversation concerning anything related to the micro-level
behaviour of TCP flow control, bandwidth*delay products and RTTs experienced
by flows as irrelevant at these scales because the best conclusion would be
that all of that is somehow being washed out of the problem by the law of
large numbers anyway.  You could instead just find what you want the
old-fashioned way by opening a textbook, finding the right equation (having
no dependence on any of the above) and shoving numbers through it until you
found a queue length which gives you an attractively full link at relatively
modest packet loss rates, at which point you'll probably find that just about
everyone on the planet has their output queues on big circuits set way too
long to be doing anything useful.

Of course it might also be that I'm out to lunch, that this unpublished
and unreproduced observation means something else entirely, or nothing at
all, and the guys who talk about TCP flow control dynamics as if it matters
at this scale are on to something, but to determine this with any certainty
we really need data from big networks which drop a lot of packets out of
their router queues on core circuits, and I've got a feeling that is
something which we aren't going to get a lot of in a hurry.

I hence think the correct answer is that we don't really know how to choose
the size of router queues on core circuits since the behaviour of large
Internet traffic aggregations is not very well understood.

Dennis Ferguson
