[e2e] Can feedback be generated more fast in ECN?

Wed Feb 21 09:09:30 PST 2001

> From: "Eric A. Hall" <ehall at ehsco.com>

> > If the forward congestion is so bad that ECN can do no good, then the
> > sender will have stopped sending because it won't be receiving ACK's for
> > the packets that the forward congestion is dropping.
>
> SQ can tell a sender to stop sending faster.

Within at most 0.5 RTT, that is equally true of ECN.
Note that the router sending an SQ can be at the far end, and so
SQ can be arbitrarily close to exactly as slow as ECN.
In other words, that 0.5 RTT advantage for SQ assumes the congestion
point is at most halfway to the destination.
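A rough timing model of the argument above. It assumes a symmetric path whose delay is proportional to distance, no queueing, and a receiver that ACKs at once; these assumptions are for illustration only. Put the congested router a fraction `f` of the way to the destination and measure both delays from when the sender transmits the affected packet. The SQ then reaches the sender after `f * RTT`, and the ECN echo arrives after one full RTT. SQ's lead is therefore `(1 - f) * RTT`: 0.5 RTT at the halfway point, shrinking to nothing as the router approaches the far end.

```python
def notification_delays(rtt: float, f: float) -> tuple[float, float]:
    """Return (sq_delay, ecn_delay) for a packet that meets congestion.

    Both delays are measured from the moment the sender transmits the packet.
    f is the congested router's position as a fraction of the one-way path
    (0.0 = next to the sender, 1.0 = next to the receiver).
    Assumes a symmetric path, delay proportional to distance, no queueing,
    and a receiver that acknowledges immediately.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    one_way = rtt / 2
    # SQ: the packet travels f of the way out, then the router's SQ travels back.
    sq_delay = 2 * f * one_way
    # ECN: the marked packet continues to the receiver, and the echo rides
    # the ACK all the way back, so the sender hears after one full RTT.
    ecn_delay = rtt
    return sq_delay, ecn_delay


for f in (0.1, 0.5, 0.9, 1.0):
    sq, ecn = notification_delays(100.0, f)
    print(f"router at {f:.0%} of path: SQ lead = {ecn - sq:.0f} ms of a 100 ms RTT")
```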

>                                              We already agree that early
> notification is good, that's the intended goal of ECN as well. What we
> don't agree with is the degree of severity of congestion failure, and that
> early notification is *always* good. You seem to be saying that it's good
> when it's convenient, I'm saying it's always good, especially when the
> links have collapsed.

That's wrong on several counts:
  - if there is any congestion failure, then there is nothing that either
     ECN or SQ can say to the sender that the sender does not already know
  - early notification does sound good, and that is the whole point of 
     both ECN and SQ
  - I'm not saying that congestion notification is good only when convenient,
     but instead that redundant congestion notification is unneeded and
     a waste of bandwidth and CPU cycles.

Your point is that SQ is better than ECN when both are redundant and useless.

Are you aware of how TCP deals with "incremental build-up" congestion
today?  I have the distinct impression that you are not.
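A minimal sketch of how TCP handles gradual build-up, following the additive-increase/multiplicative-decrease rule of RFC 2581 congestion avoidance. The window grows by about one segment per RTT and is halved when a drop signals congestion; an ECN mark triggers the same reduction without the drop. The sketch works in whole segments, one step per RTT, and ignores slow start and timeouts. The loss schedule in the example is made up.

```python
def aimd(rounds: int, losses: set[int], cwnd: float = 10.0) -> list[float]:
    """Simulate TCP congestion avoidance one RTT at a time.

    cwnd is in segments. In a round listed in `losses` the sender sees
    congestion (a drop or an ECN mark) and halves its window; otherwise it
    adds one segment per RTT. Slow start and timeouts are deliberately ignored.
    """
    history = []
    for r in range(rounds):
        if r in losses:
            # Multiplicative decrease; RFC 2581 floors ssthresh at two segments.
            cwnd = max(cwnd / 2, 2.0)
        else:
            cwnd += 1.0  # additive increase: one segment per RTT
        history.append(cwnd)
    return history


print(aimd(8, losses={4}))
```

Starting from 10 segments with one congestion signal in round 4, the window climbs to 14, drops to 7, and then climbs again one segment per RTT.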

> ...
> There are lots of reasons to stop senders when things have gone horribly
> wrong. DOS attacks are one of them, telling the sender to stop quickly and
> reminding them whenever they try to crank up again is good for the entire
> network, is it not?

Again, if things have gone horribly wrong, no matter what the cause,
then there is nothing that SQ or ECN have to say that the data sender
does not already know, thanks to the very losses that silence ECN.

Are you aware of what happens today when "they try to crank up again"?
It sounds as if you are not familiar with slow start.
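What happens when a sender "tries to crank up again" after a retransmission timeout can be sketched the same way. The sketch again works in whole segments per RTT and follows RFC 2581. After the timeout, ssthresh drops to half of what was in flight and the window collapses to one segment. Slow start then doubles the window each RTT up to ssthresh, and congestion avoidance adds one segment per RTT beyond that. Clamping the window exactly at ssthresh is a simplification.

```python
def restart_after_timeout(flight_size: int, rounds: int) -> list[int]:
    """Return the congestion window (in segments) for each RTT after a timeout.

    flight_size is the number of segments outstanding when the timer fired.
    """
    ssthresh = max(flight_size // 2, 2)   # half of what was in flight, at least 2
    cwnd = 1                              # loss window: one segment
    history = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: doubles per RTT
        else:
            cwnd += 1                       # congestion avoidance: +1 per RTT
        history.append(cwnd)
    return history


print(restart_after_timeout(flight_size=40, rounds=7))
```

With 40 segments in flight at the timeout, the window runs 1, 2, 4, 8, 16, 20, 21, 22 over the following RTTs. The restarted sender probes the network gently instead of bursting.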

>                     Don't you think people would like this feature if it
> helps to seriously constrain DOS attacks?

Exactly how does SQ deal with DOS attacks better than ECN?  How is
SQ anything but yet another denial-of-service problem waiting to
be exploited by the bad guys?  You can't just wave a wand and solve
the authentication issues.

> What about an oversubscribed exchange point, or when a backhoe has
> redirected your routes over slower links? I can point you to a 100%
> utilized network any day of the week, some of which are poorly planned,
> some of which are accidental, all of them are detrimental.
> Don't those also qualify as being good scenarios for telling people to
> slow down, if not to stop altogether?

And all of which have absolutely no use for either ECN or SQ.  In all of
those cases, people have already been told "to slow down, if not stop
altogether" by the lack of ACKs, and there is nothing either ECN or SQ
have to say that has not already been known.  Once you start dropping
packets, both ECN and SQ have nothing to say that is not a waste of
bandwidth.  Fortunately ECN doesn't waste any bandwidth saying what is
already known, but SQ does waste bandwidth, and so makes the problem,
whatever it is, worse.

>                                       The senders should slow down rather
> than stop when it is non-fatal, and this is a good usage scenario for SQ
> codes above zero, since they are also examples of where ECN fails to
> notify the sender quickly.

That seems to be a claim that ECN does not work when there are no packet
losses.  Are you aware of how ECN works?
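The "no packet losses" point can be made concrete with a minimal sketch of the router side of ECN. It uses the two-bit ECN field defined in RFC 3168, which occupies the low two bits of the old IPv4 TOS byte; the earlier experimental RFC 2481 laid those bits out differently. When the queue is past a threshold, an ECN-capable packet is marked Congestion Experienced and forwarded rather than dropped. Only packets that are not ECN-capable are dropped. The function name and the simple queue-length threshold are hypothetical.

```python
NOT_ECT = 0b00   # sender is not ECN-capable
ECT_1 = 0b01     # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10     # ECN-capable transport, codepoint ECT(0)
CE = 0b11        # Congestion Experienced


def enqueue_decision(tos: int, queue_len: int, mark_threshold: int) -> tuple[str, int]:
    """Decide what a (hypothetical) congested router does with one IPv4 packet.

    Returns ("forward" | "drop", new_tos). The ECN field is the low two
    bits of the TOS byte; the upper six bits (DSCP) are left untouched.
    """
    if queue_len < mark_threshold:
        return "forward", tos           # no congestion: leave the packet alone
    ecn = tos & 0b11
    if ecn == NOT_ECT:
        return "drop", tos              # legacy packet: loss is the only signal
    return "forward", tos | CE          # ECN-capable: set CE and deliver it
```

The receiver then sets the ECE flag on its ACKs, and the sender reduces its window as it would for a drop. The congestion signal arrives without the packet being lost.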

> And of course not everything is TCP.

And of course there is absolutely no hope of SQ doing anything for anything
except for protocols that already have bidirectional flows and kernel
state machinery and that could use ECN, at least among those of us who
are aware of the problems of matching incoming ICMP errors to non-TCP
streams.

Do you know that existing ICMP errors do nothing for applications that
use sendto() instead of connected sockets?  An application that wants to
talk to several peers using UDP will generally use sendto() instead of
connected sockets, or call connect() again before each (burst of) UDP
packets.  When an ICMP error arrives for a previously sent packet, there
is nothing for the ICMP code to match against in current systems.  You
might hypothesize that the system could record the last 1000
(addr,port,addr,port) 4-tuples sent by each UDP socket, but then what and
how do you tell the application?  You can't cause the next send() or
sendto() to fail because it might be for a different destination than for
the ICMP packet you received 10 ms (or 10 hours) ago.
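The hypothetical table described above might look like the following sketch. The class and method names are invented for illustration, and no current stack is implied to provide anything like it. An ICMP error quotes the offending IP header plus the first 8 bytes of the datagram, which is enough to recover the UDP ports. The lookup, however, yields only a yes/no answer, and there is still no channel for delivering that answer to the application.

```python
from collections import OrderedDict

FourTuple = tuple[str, int, str, int]   # (src addr, src port, dst addr, dst port)


class RecentSends:
    """Hypothetical per-socket record of the last `capacity` UDP 4-tuples sent."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[FourTuple, None]" = OrderedDict()

    def record(self, four_tuple: FourTuple) -> None:
        # Add the tuple, or move an existing entry to the "most recent" end.
        self._seen[four_tuple] = None
        self._seen.move_to_end(four_tuple)
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)   # forget the oldest destination

    def matches_icmp(self, quoted: FourTuple) -> bool:
        # The 4-tuple is recovered from the header quoted inside the ICMP error.
        return quoted in self._seen
```

Even when `matches_icmp` returns True, the socket has nothing useful to do with the answer. Failing the next `sendto()` would be wrong if that call is addressed to a different peer.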

And then there is the hopelessness of expecting a bazillion blast-and-forget
UDP applications to change to pay attention to those sendto() errors they
won't be getting because there is no way in the APIs to say "this isn't
an error, but you really ought to start a timer so you can slow down."

You can't just wave your wand to create timers and state machines in
applications, even if there were a way to tell UDP applications to invoke
them.

> But the key point is that SQ code 0 works for saturated links, while there
> are 255 more codes to use for other scenarios. SQ is capable of solving
> both problems.

But your SQ code 0 *didn't* work for saturated links, which is why
we have what we have in TCP today.

Why do you keep ignoring the fact that we already know that simple
SQ simply does not work?

> I am raising the point that the biggest congestion problems that we have
> are from failure, not from incremental build-up.

That is simply and obviously false.

>                                                  ECN acts like there's
> never any failure, or that failure doesn't matter since it can't do
> anything about it.

That is simply and obviously false.  ECN doesn't deal with catastrophic
failure because other mechanisms already handle catastrophes better than
any active signal such as SQ can hope.  SQ makes "failure" worse.  When
your backhoe failure happens, you would have a router send a burst of
10,000 Source Quenches in a second or less.  (Only 10,000, because I'm
assuming your magic wand creates the per-flow state machinery in each
router that limits the number of SQ/second/flow, and never mind the
terrible problems of router per-flow state in other contexts.)
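The per-flow limiter assumed in that paragraph could be sketched as a table keyed by flow that allows at most one SQ per flow per interval. The names, the 4-tuple flow key, and the one-per-interval policy are all hypothetical. The router must keep an entry for every flow it has recently quenched, so the table grows with the number of flows. Even with the limit in place, the aggregate SQ rate still scales with that number.

```python
FlowKey = tuple[str, int, str, int]   # (src addr, src port, dst addr, dst port)


class SourceQuenchLimiter:
    """Hypothetical per-flow limiter: at most one SQ per flow per `interval` seconds."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        # Per-flow state the router must keep; entries are never expired here.
        self._last_sent: dict[FlowKey, float] = {}

    def should_send(self, flow: FlowKey, now: float) -> bool:
        last = self._last_sent.get(flow)
        if last is not None and now - last < self.interval:
            return False              # this flow was quenched too recently
        self._last_sent[flow] = now
        return True

    def flows_tracked(self) -> int:
        return len(self._last_sent)
```

If 10,000 flows each lose packets twice within the same second, this limiter still emits 10,000 SQs, and it holds 10,000 entries afterwards.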

A big router moves 1,000,000,000,000 bits (1 Tbit) or close to 1,000,000,000
packets per second.  Given the number of streams through a Tbit/sec router,
how can you talk about it generating SQ's when one of its OC-192's is
cut by a backhoe?
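A back-of-the-envelope check using the figures above. Dividing 10^12 bits/s by 10^9 packets/s implies an average packet of 1,000 bits (125 bytes). The OC-192 figure used here is the SONET line rate, so the real payload rate is somewhat lower. Under those assumptions, a single OC-192 carries on the order of ten million packets per second. Each of those packets belongs to some flow that an SQ-generating router would have to consider.

```python
ROUTER_BITS_PER_SEC = 1_000_000_000_000   # 1 Tbit/s, from the text
ROUTER_PACKETS_PER_SEC = 1_000_000_000    # "close to" 1e9 packets/s, from the text
OC192_LINE_RATE_BPS = 9_953_280_000       # SONET OC-192 line rate (payload is a bit lower)

# Average packet size implied by the router figures.
avg_packet_bits = ROUTER_BITS_PER_SEC / ROUTER_PACKETS_PER_SEC
# Packets per second on one OC-192 at that average size.
oc192_packets_per_sec = OC192_LINE_RATE_BPS / avg_packet_bits
# Fraction of the router's capacity that one OC-192 represents.
share_of_router = OC192_LINE_RATE_BPS / ROUTER_BITS_PER_SEC

print(f"average packet: {avg_packet_bits:.0f} bits ({avg_packet_bits / 8:.0f} bytes)")
print(f"one OC-192: about {oc192_packets_per_sec / 1e6:.1f} million packets/s")
print(f"share of router capacity: {share_of_router:.1%}")
```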

>                    SQ can deal with both of these scenarios.

No matter how many times you repeat that wish, it remains a wish.

Vernon Schryver    vjs at rhyolite.com