[e2e] TCP in outer space

Sat Apr 14 19:25:13 PDT 2001

    > From: Cannara <cannara at attglobal.net>

    > then we hear principles like: "thou shalt not cross layer bounds". This
    > despite whole trains of design and discussion history on using router
    > code, that assumes something about a particular transport, to delete
    > frames and thus trick far end transports into backing off, just so the
    > near router code can hope to reduce its filling queues -- how many
    > layers are involved here?

First, you're reading the "layer boundaries" rule far too simplistically.
See, for example, RFC-817, "Modularity and Efficiency in Protocol
Implementation".

Second, in this particular instance layers are being perfectly respected. The
current service model (i.e. the abstract model presented to higher-level
clients) of the Internet layer is quite simple in how it deals with
congestion, and how it signals congestion to its clients: it silently drops
packets. That's the only signal the higher layers currently get.

Higher layers which are potentially sending a lot of traffic MUST (I don't
have the appropriate RFC number at hand, but it is some sort of standard) be
able to detect packet losses and deal with the case in which those losses are
due to congestion, and back off their sending rate. But that all happens at
the higher layer.
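
To make that division of labour concrete, here is a rough sketch of a sender that treats a loss as its only congestion signal and backs off. It is an illustration only: the function names, the additive-increase/multiplicative-decrease constants, and the toy "capacity" model are all assumptions made up for this example, not taken from any RFC or real TCP implementation.

```python
# Hypothetical sketch: names and constants are illustrative assumptions,
# not taken from any RFC or real TCP implementation.

def aimd_step(window, loss_detected, increase=1.0, decrease=0.5, floor=1.0):
    """Return the next sending window (in packets) after one round trip."""
    if loss_detected:
        # A drop is the only congestion signal the lower layer provides,
        # so treat it as congestion and back off multiplicatively.
        return max(floor, window * decrease)
    # No loss seen: probe for more capacity additively.
    return window + increase


def simulate(capacity, rounds, start=1.0):
    """Run one sender against a path that silently drops above `capacity`."""
    window = start
    history = []
    for _ in range(rounds):
        history.append(window)
        # The network sends no explicit message; it just drops the excess.
        loss = window > capacity
        window = aimd_step(window, loss)
    return history


if __name__ == "__main__":
    print(simulate(capacity=4, rounds=8))
```

Note that the network side of this toy model is a single comparison; all the detection and rate adjustment lives in the sender, which is the point being made above.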

You may not *like* that architecture, but it's one we've evolved to, and one
that works pretty well (but by no means Really Well) *in practise*, as part of
*an overall system architecture*.

    > As for comments on "conservative design", I'm always in favor of that,
    > yet we open a vast network to partial control, which actually generates
    > control inputs of the wrong polarity on occasion, and we assume it's ok
    > ... I'd say luck, rather than engineering conservatism, is in force
    > here

First, the world is full of large networks with the characteristic that they
have less-than-optimal control systems - and far more severe penalties when
those control systems malfunction. Try looking at the road network in most
major industrialized countries. Somehow we survive.

The Internet's congestion control mechanisms actually work pretty well, thank
you - in part because of clever design which gives them the characteristic
that if people try and get more than their "fair share", they almost always
actually get less.

Which is not to say they couldn't be improved further - I think most people
can see potential further improvements - but the situation is nowhere near
as dire as your comments above would indicate.

Further, if I'm hearing you correctly, you seem to feel that authoritarian
rate limiting is the only answer - and therefore by definition anything else
(such as our current discard-based mechanism) is wrong. Please don't be
surprised if many (most?) of the rest of us don't agree with you.

Second, the Internet is a work in progress, and there are large parts of it
which bear a strong resemblance to Swiss cheese. Congestion control is, to my
mind, rather far along, compared to, say, routing and addressing.

(Those who've been at meetings with me will no doubt have amusing
recollections of the clouds of steam that come out of my ears when people
doing long-range planning speak of tweaking the current system, as opposed to
"throwing it with great force" [as Dorothy Parker said] in the trash; lock,
stock and barrel. :-)

So, again, your complaints about the awful state of congestion control are
likely to fall on fairly deaf ears.

    > The fact is that network processors are now available that have (per
    > chip) >500,000,000 RISC cycles available per second .. (corresponding
    > to about 2M packets/sec).

Still nowhere near fast enough to deploy in the core of the network -
although we can do some traffic shaping at the edges. But this is something
that's already being worked on in the IETF.
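
A back-of-the-envelope check of the per-packet budget can be sketched as follows. The 500M cycles/sec and 2M packets/sec figures are the ones quoted above; the 10 Gb/s link rate and 40-byte minimum packet size are assumptions picked purely for illustration (and framing overhead is ignored), so the result is only a rough order of magnitude.

```python
# Hypothetical sketch: the link rate and packet size passed in below are
# illustrative assumptions, not figures from the discussion above.

def packets_per_second(link_bits_per_sec, packet_bytes):
    """Worst-case packet rate on a link, ignoring framing overhead."""
    return link_bits_per_sec / (packet_bytes * 8)


def cycles_per_packet(cycles_per_sec, pkts_per_sec):
    """Processor cycles available to handle each packet."""
    return cycles_per_sec / pkts_per_sec


if __name__ == "__main__":
    # Figures quoted above: 500M cycles/sec at about 2M packets/sec.
    print(cycles_per_packet(500_000_000, 2_000_000))

    # Assumed core link: 10 Gb/s carrying 40-byte packets.
    core_pps = packets_per_second(10e9, 40)
    print(core_pps, cycles_per_packet(500_000_000, core_pps))
```

Under those assumed numbers the per-packet budget on a single core link falls well below the "<200 cycles" forwarding cost quoted below, before any extra control logic is added.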

    > Since basic RFC1812 routing takes <200 such cycles per packet

No high-performance router that I know of uses software forwarding on a
general purpose processor. Everyone's gone to specialized ASICs.

    > router/switch code can do far more these days than ever imagined when
    > the decision to offload performance and capacity decisions from
    > 'gateways' (routers) was made years ago.

I'm a bit confused by this version of history. It doesn't at all correspond
to the one I recall living through.

The decisions on where to put various functions were *not* primarily driven by
efficiency/switch-processing-capacity. Those decisions were principally
driven by things like i) the end-end principle (the version in "End-to-End
Arguments in System Design", not the *other* end-end principle, which
seems to be about global names, not tweaking packets, etc), ii) robustness,
iii) the ability to work over a fairly wide range of technologies, with
varying underlying capabilities, etc, etc.

For example, if you have a network technology where *you can't tell if the
network is congested*, and *the hardware silently drops packets when
congestion occurs*, so that when a shared link congests *the switches cannot
detect it* (and yes, we had some of those), then building a congestion
control system that *depends* on the switches noticing, and signalling,
congestion is clearly not going to work.

(Not that we really had a lot of problems with congestion in the early days,
otherwise the poor TCP congestion control would have bit us sooner than it
did - but I digress.)

Your guess about switch processing power is exactly back to front, as a
matter of fact. When the architecture was being set, in the late 70's, it was
often taking about 10K instructions to switch a packet, mostly because of OS
overhead. When I got it down to around 1K instructions (and in a higher-level
language to boot), that was a big step. The most common network technology,
ARPANet, had all sorts of weird back-pressure stuff which meant you had to
keep *all sorts* of fancy meters, queues, etc. Putting in the better output
queue management needed for switch-based congestion control would have been a
cost so incremental it would have been in the noise.

The reason we didn't do it was i) because we didn't have anything like the
understanding of, or focus on, congestion control that we do now, and ii) to
the extent we *did* focus on it, we preferred (for architectural reasons) to
do it in the hosts. We did throw in a signal (ICMP SQ) for the routers to notify
the hosts of congestion. However, at that point we still didn't know whether
closed-loop congestion control was even feasible! (Some thought traffic
levels would vary much faster than the RTT, making any closed-loop congestion
control infeasible.)

    > So, for example, rather than simply using the hardware RED capability
    > now available to drop packets, use it to generate a more intelligent
    > control statement to the sender.

The IETF technical community has fairly thoroughly explored the topic of
congestion control and how to do it over the years. Some things have been
tried and discarded, others have been tried and adopted, and still others
are being experimented with now.
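
For readers who haven't looked at the RED mechanism mentioned in the quoted text, its core decision can be sketched roughly as below. This is a deliberately simplified illustration, not any router's actual implementation: the class name, the default thresholds and the averaging weight are assumptions chosen for the example, and refinements such as spacing drops out by packet count or handling idle periods are omitted.

```python
import random

# Hypothetical sketch of a RED-style decision; names and default values are
# illustrative assumptions, and several refinements of real RED are omitted.

class RedQueue:
    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002):
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # smoothing weight for the moving average
        self.avg = 0.0          # smoothed queue length, in packets

    def update_average(self, queue_len):
        """Fold the instantaneous queue length into the moving average."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self):
        """Probability rises linearly from 0 to max_p between thresholds."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len, rng=random.random):
        """Decide, for one arriving packet, whether to act on congestion."""
        self.update_average(queue_len)
        return rng() < self.drop_probability()
```

Whether the router then drops the packet, or does something else with that same decision, is exactly the question that has been explored over the years.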

Flat assertions that "I know it needs to be done this way" are therefore
likely to have very little impact - and, indeed, likely to merely annoy
people.

Particularly when it's clear, as above, that the person making the statements
has a very poor understanding of what has been done, and why.

	Noel