[e2e] Is a non-TCP solution dead?

Cannara cannara at attglobal.net
Mon Apr 21 12:30:25 PDT 2003


Man, you're a dogged reader, Mark!  Most of this was discussed on the list a
couple of years ago, so no item should really be a surprise to any old timers
here.  Specific things for TCP were even listed then, like Ack disambiguation
under loss, real ECN, etc.  Anyway, for the newer folks, looking to improve
our networks & transports, let's examine what you raise...

"Use the new stuff that has been developed!  If you don't then why
are you complaining?" -- I only report what I see that other folks have paid
to have figured out, so it's not me who "decides" what to use, it's folks like
uSoft, HP, Sun, yadda, yadda.  And, I know this is a shock to some, but not
all shipped TCPs work the same way, nor do they all implement all the
"latest" RFCs (or implement them the same way).  Adding insult to injury,
when someone is told by a high-priced consultant, or finds out on their own,
what needs fixing, like increasing a window, the vendor too often provides
unclear docs, or even no parameter at all, for making the improvement.  As I
hope people will get from this
thread, my most critical opinion is of how the Internet protocols, their
releases and support, have been managed (read unmanaged).  This forces all of
the rest of the world to continually debug what they have and how it works
with what others have.  This despite the fact that Internet protocols were
developed at our own, read taxpayer, expense.

On "If you do [complain] and they do not fix the problem then write it all
down and show everyone where the problems still are." -- I like the "still
are" wishful belief, but, as I said, the TCP problems we see today have been
written down often before, even on this list.  Just to recap some baddies: a)
highly nonlinear slowdown in relation to packet loss (~1% => ~50% slower); b)
an inability to distinguish error loss from "traffic-shaping" loss at
congested nodes; c) Delayed Ack inefficiencies; d) blind use of Slow Start; e)
susceptibility to non-TCP flows; and more.  The last, of course, is simply due
to the desperation applied to TCP in the '80s as "The Way" to prevent network
collapse.  Now that TCP flows are waning in relation to others, the emperor's
clothes are indeed diaphanous.
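
To put a rough number on item (a) for newer folks: the sketch below is not
from any of the networks mentioned in this thread.  It assumes the well-known
Mathis et al. approximation for steady-state TCP throughput, rate ~
(MSS/RTT) * C/sqrt(p), with C = sqrt(3/2).  The MSS, RTT and loss values are
made up for illustration.  Under that model, throughput falls with the
inverse square root of the loss rate, so quadrupling loss halves it.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate steady-state TCP throughput in bits/s.

    Assumption: the Mathis et al. model, (MSS/RTT) * C / sqrt(p), with
    C = sqrt(3/2).  This is a model, not a measurement.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

# Illustrative values only: 1460-byte MSS, 70 ms RTT.
for p in (0.0001, 0.001, 0.01, 0.05):
    mbps = mathis_throughput_bps(1460, 0.070, p) / 1e6
    print(f"loss {p:7.2%}  ~{mbps:8.2f} Mbit/s")
```

The model says nothing about item (b): it treats every loss as a congestion
signal, whatever actually caused it.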

On real implementations not working: "That is, you could change the
implementation in ways that do not change the algorithms or the protocol specs
and achieve better performance.  (I've done it.)"  Ok, and maybe I can do a
valve job on your car, but you can't.  Someone fixing any TCP/IP stack in situ
is not evidence of a well-managed protocol, which is what I say in the first
paragraph.  Unfortunately, just as the Internet was never designed to be
secure, it was also never designed to be managed.  Management is part of any
engineered product.  It's critical to the established telco systems, on which,
interestingly, Internet growth has depended.  But management has not been
"interesting" to the IETF and some others in the Internet community.  And,
even if you do
change an implementation and "do not change the algorithms", your TCP will
still be fooled by various real situations and clearly demonstrate its lack of
ken in handling data transport over arbitrary paths. This is the future
wireless is bringing.

On a couple of details: "But, with enough of a window delayed ACKs do not hurt
much at all during steady-state TCP." -- the NT example discussed before shows
why this is false.  When several windows must be sent, for a sizeable block
transfer, and the last window sent has an odd number of packets, the Ack Timer
delays the final Ack, thus the processing of the next block (FTP, SMB...).  If
the path is fast, but RTT is significant, then the Ack Timer gains great
significance.  One could imagine a smarter TCP that, wanting to keep the
imagined capacity savings from Delayed Ack, simply notes if the Ack Timer
popped in the prior block, and if so, assumes it need not delay the end of the
next block the application wants to receive.  But these are bandaids, kludging
the transport, rather than addressing network congestion.  Delayed Ack is of
little value these days: Acking every segment is not a waste of resources,
and Delayed Ack is certainly itself a waste of end-node time on long paths.
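
A crude way to see the size of the effect: the sketch below is a toy model,
not the NT trace.  It assumes the sender pushes one window per RTT on a fast
path (serialization time ignored), the receiver Acks every second segment at
once, and a lone trailing segment waits out an Ack Timer of 200 ms.  The
200 ms figure, the block size and the RTT are assumptions for illustration;
the 6-packet window is the NT limit mentioned elsewhere in this thread.

```python
import math

def block_time_s(block_bytes, mss, window_segs, rtt_s, ack_timer_s=0.2):
    """Rough time to move one application block, plus the number of stalls.

    Toy model (assumption): one window per RTT, and any window holding an
    odd number of segments leaves one segment waiting for the delayed-ACK
    timer before the transfer can move on.
    """
    segs = math.ceil(block_bytes / mss)
    rounds = math.ceil(segs / window_segs)
    last_window = segs - (rounds - 1) * window_segs
    # Full windows stall only if the window size itself is odd.
    stalls = (rounds - 1) * (window_segs % 2) + (last_window % 2)
    return rounds * rtt_s + stalls * ack_timer_s, stalls

# Illustrative: 64 KiB block, 1460-byte MSS, 6-segment window, 30 ms RTT.
total, stalls = block_time_s(65536, 1460, 6, 0.030)
print(f"{stalls} stall(s), about {total * 1000:.0f} ms per block")
```

With these made-up numbers the last window holds 3 segments, so a single
timer pop adds nearly as much time as all the round trips put together.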

On: "Note that I am not claiming that TCP does not have problems or even
that CC belongs in the transport in some ideal networking stack." -- exactly.

On: "TCP definitely has its problems.  And, one can make the case for CC
being in the network layer (or transport or application)." -- exactly, on
Network Layer, that is, because that's where the "network" congestion is!
That's where reliable old telco switches, Sonet ADMs, etc. do their flow
control, statistics gathering, auto link switchover, path sharing, yadda,
yadda.  A datagram network layer riding on this kind of infrastructure works
great, as long as the feed nodes (routers/bridges) know what they're doing
with their packet handling -- i.e., this is the Network Layer to us.  More
needs doing at the packet network layer (IP).

On: "But, I'd like to see a coherent argument made on the topic rather than
bits and pieces of assertion based on sketchy NDA encumbered data in email to
a mailing list." -- the examples I've used here are indeed a few from real
companies' networks, but as I said, you can see some at NetPredict.com, as
sanitized, but no less valid, examples.  As far as coherent recommendations,
I've always said the problem stems from a specific choice to modify a
transport (TCP) so that network congestion won't cause collapse.  This, by
itself, is an engineering flub.  It put us on the wrong path, away from doing
congestion control in the network, which is The Network Layer.  It was a clear
choice made to avoid doing the right things at the network layer.

What to do now?  Well, some folks are quietly working on things, perhaps
timidly, because they don't enjoy the heat a critic of TCP/IP gets here and
some other places -- you know, The Emperor's New Clothes deal, or The Church
wants the Earth to be in the center.  That's how bureaucracies work.  It
doesn't bother me, especially when TCP problems provide income!  Ok, sorry,
that's cold.  The network layer has always been the future for congestion
management, because segments of any path are already dealing with flow control
and management issues, so have valuable info to help a packet network layer,
even allowing any transports above to get some useful help.  This is not to be
construed as being "TCP Friendly", which itself is a bureaucratic
misdefinition.

Alex

Mark Allman wrote:
> 
> Alex-
> 
> I have now caught up on this entire thread.  It was painful.
> 
> I do not understand what sort of magic bullet you're looking for.
> You claim that TCP performance is poor.  But, it *has* been
> improved.  And, then you turn around and say that the improvements
> are only in RFCs and not in the real world and so cite results from
> TCPs that do not support the improvements (like fast retransmit!).
> Use the new stuff that has been developed!  If you don't then why
> are you complaining?  If you do and they do not fix the problem then
> write it all down and show everyone where the problems still are.
> 
> > 1) Well, we know its max can be set wrong by default -- NT was
> > shipped with a 6-pkt window limit.  Microsoft upped that a bit a
> > while later.  The user, however, did not have access to the
> > adjustment.  So, talk about RFCs is often irrelevant.
> 
> If talk about RFCs is irrelevant then so is talk of "protocol
> problems".  If the host operating system (from MS or whoever) is
> using a small window then performance may be sub-optimal.  And, it
> may happen for a number of reasons (can't fill the pipe, loss
> recovery is brittle, delayed ACK packet counts, etc.).  But, this is
> not a problem with TCP.  This is a problem with the implementation
> of TCP.  That is, you could change the implementation in ways that
> do not change the algorithms or the protocol specs and achieve
> better performance.  (I've done it.)
> 
> > 2) A lost pkt will put the receiver into triple Acking,
> > eventually, to try to replace it, if the receiver is of that
> > vintage.  If the sender doesn't understand that, as this NT TCP
> > apparently did not, or if one repeated Ack is lost, then the
> > sender goes into long timeout, perhaps back to slow start.
> 
> Again, we know better, but cannot force all implementations to know
> better.  If you use a larger window or limited transmit your
> performance will increase.
> 
> > Add in Alok's RFC quote: "Many TCP's acknowledge only every Kth
> > segment out of a group of segments arriving within a short time
> > interval;..." and you can see that very long delays can occur when
> > an Ack is lost.  One question we can legitimately ask today is:
> > Why worry about delaying Acks?  Ack every segment received.  This
> > would also allow new feedback info to be included with an Ack, as
> > discussed in the archives, and so let the sender know some useful
> > things more quickly.
> 
> Or, to flip this around.... when things are running smoothly, why
> waste the network resources to send an ACK for every segment?  I am
> certainly an advocate of getting new information to senders as
> quickly as possible.  For instance, note that RFC2581 says packets
> that arrive out-of-order should be ACKed immediately (to help out
> with loss recovery).  That seems fine to me.  But, with enough of a
> window delayed ACKs do not hurt much at all during steady-state TCP.
> 
> (And, I don't think anyone has noted that delayed ACKs do hurt in
> growing the cwnd -- which happens on a per-ACK basis rather than on
> a per-ACKed-byte basis.  RFC3465 is a start at correcting this.)
> 
> Note that I am not claiming that TCP does not have problems or even
> that CC belongs in the transport in some ideal networking stack.
> TCP definitely has its problems.  And, one can make the case for CC
> being in the network layer (or transport or application).  But, I'd
> like to see a coherent argument made on the topic rather than bits
> and pieces of assertion based on sketchy NDA encumbered data in
> email to a mailing list.  And, further, I'd like to see results that
> show these problems that you have found.  If they're as easy to show
> as you say then forget the pile of NDA data and generate some new
> unencumbered data -- it should be easy, right?  We all love to see
> reports like that.  As Jonathan noted, there are plenty of people
> who would add energy to solving the problems.  But, until these
> problems can be nailed down and shown concretely to the community we
> don't have much of a chance at solving them based on what Dave
> correctly identified as an extended rant.  It sure would be nice
> (since you have all of these "problems" in your pocket) if you were
> part of the solution.
> 
> allman
> 
> --
> Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/
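
On the quoted parenthetical about cwnd growing per Ack rather than per Acked
byte, a minimal sketch of the difference during slow start may help.  It is
not taken from either message.  It assumes a receiver that Acks every second
segment, counts cwnd in whole segments, ignores loss, and caps the
byte-counting increase at 2 segments per Ack (the limit L that RFC3465
allows).  The function name and defaults are hypothetical.

```python
def slow_start_rtts(target_segs, segs_per_ack=2, byte_counting=False,
                    init_cwnd=2, limit=2):
    """Count round trips for cwnd (in segments) to reach target_segs.

    Assumptions: no loss, one window sent per RTT, and the receiver sends
    one ACK per segs_per_ack segments (delayed ACKs).
    """
    cwnd = init_cwnd
    rtts = 0
    while cwnd < target_segs:
        acks = max(1, cwnd // segs_per_ack)
        if byte_counting:
            # Grow by what each ACK covers, capped at `limit` segments.
            cwnd += acks * min(segs_per_ack, limit)
        else:
            # Classic behaviour: one segment per ACK, whatever it covers.
            cwnd += acks
        rtts += 1
    return rtts

print("per-ACK counting :", slow_start_rtts(64), "RTTs")
print("byte counting    :", slow_start_rtts(64, byte_counting=True), "RTTs")
```

In this toy setup, counting per Ack takes about twice as many round trips to
reach the same window as counting Acked bytes.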



