[e2e] Reacting to corruption based loss

Tue Jun 28 23:35:52 PDT 2005

On the error rates issue, mobile is an extreme case, always subject to
difficult conditions in the physical space, so symbol definitions & error
correction are paramount.  However, most corporate traffic isn't over mobile
links, but dedicated lines between routers, or radio/optical bridges, etc. 
Here, the reality of hardware failures raises its head and we see long-lasting
error rates that are quite small and even content dependent.  This is where
TCP's ignorance of what's going on and its machete approach to slowdown are
inappropriate and costly to the enterprise.  

As an example of the latter, a major telecom company, whose services many of
us are using this instant, called a few years back, asking for help
determining why just some of its offices were getting extremely poor
performance downloading files, like customer site maps, from company servers,
while other sites had great performance.  The maps were a few MB and loaded
via SMB/Samba over TCP/IP to staff PCs.  The head network engineer was so
desperate, he even put a PC in his car and drove all over Florida checking
sites.  This was actually good.  But, best of all, he had access to the
company's Distributed Sniffers(r) at many offices and HQ.  A few traces told
the story:  a) some routed paths from some offices were losing 0.4% of pkts,
while others lost none; b) the lossy paths experienced 20-30% longer
file-download times.  By simple triangulation, we decided that he should check
the T3 interface on Cisco box X for errors.  Sure enough, about 0.4% error
rates were being tallied.  The phone-line folks fixed the problem and voila,
all sites crossing that path were back to speed!

Now, if you were a network manager for a major corporation, would you rush to
fix a physical problem that generated less than 1% errors, if your boss &
users were complaining about mysterious slowdowns many times larger?  0.4%
wasn't even enough to trigger an alert on their management consoles.  You'd
certainly be looking for bigger fish.  Well, TCP's algorithms create a bigger
fish -- talk about Henny Penny.  :]

The files were transferred in many 34kB SMB blocks, which required something
like 23 server pkts per.  The NT servers had a send window of about 6 pkts
(uSoft later increased that to about 12).  All interfaces were 100Mb/s, except
the T3 and a couple of T1s, depending on path.  RTT was about 70 ms for all
paths.
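
As a rough sanity check on those numbers, here is a back-of-envelope sketch
in Python of the window-limited ceiling.  It assumes "34kB" means 34,000
bytes spread evenly over 23 pkts, and that the sender can have at most one
window in flight per round trip.  The function and constant names are mine,
for illustration only.

```python
# Rough window-limited throughput ceiling for the transfer described above.
BLOCK_BYTES = 34_000      # assumption: "34kB" read as 34,000 bytes
PKTS_PER_BLOCK = 23       # from the text ("something like 23")
WINDOW_PKTS = 6           # from the text ("about 6 pkts")
RTT_S = 0.070             # from the text ("about 70 ms")


def window_limited_bps(window_pkts, pkt_bytes, rtt_s):
    """At most one window of data can be delivered per round trip."""
    return window_pkts * pkt_bytes * 8 / rtt_s


pkt_bytes = BLOCK_BYTES / PKTS_PER_BLOCK
ceiling_bps = window_limited_bps(WINDOW_PKTS, pkt_bytes, RTT_S)
print(f"~{pkt_bytes:.0f} B/pkt, ceiling ~{ceiling_bps / 1e6:.2f} Mb/s")
```

With these assumptions the ceiling works out to roughly 1 Mb/s per
connection, far below the T3 or the 100Mb/s LANs, and doubling the window to
12 pkts only doubles it.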

Thankfully, the Sniffer traces also showed exactly what the TCPs at both ends
were doing, despite Fast Retransmit, SACK, etc.: a) the typical, default
timeouts were knocking the heck out of throughput; b) the fact that transfers
required many blocks of odd numbers of pkts meant the Ack Timer at the
receiver was expiring on every block, waiting (~100 ms) for the magical
even-numbered last pkt in the block, which never came.  These defaults could
have been changed to gain some performance back, but not much.  The basic idea
that TCP should assume congestion = loss was the Achilles' heel.  Even the
silly "ack alternate pkts" concept could have been largely automatically
eliminated, if the receiver TCP actually learned that it would always get an
odd number.
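
To put a number on the Ack Timer effect, a crude Python sketch of per-block
time follows.  It is a toy model, not what the traces measured: it assumes
each block goes out stop-and-wait in windows of 6 pkts, one RTT per window,
and that an odd pkt count costs one full ~100 ms delayed-ack wait.  The
function name is made up for this note.

```python
import math


def block_time_s(pkts, window_pkts, rtt_s, ack_delay_s):
    """Crude per-block time: one RTT per window of packets, plus one
    delayed-ACK stall when the block ends on an odd (unpaired) packet."""
    rounds = math.ceil(pkts / window_pkts)
    stall = ack_delay_s if pkts % 2 == 1 else 0.0
    return rounds * rtt_s + stall


odd = block_time_s(23, 6, 0.070, 0.100)    # block sizes as in the text
even = block_time_s(24, 6, 0.070, 0.100)   # hypothetical even-sized block
print(f"23 pkts: {odd * 1000:.0f} ms, 24 pkts: {even * 1000:.0f} ms")
```

Under that model a 23-pkt block takes about 380 ms against 280 ms for a
24-pkt one.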

In any case, the network management and CIO at this corporation learned just
how important a good transport protocol is, and how careful they need to be
with TCP.

Another example, not for TCP but just for cute hardware-failure tricks,
involved another WAN-connected corporation, whose St. Louis office suddenly
couldn't see all the printers & servers they normally did, especially in the
larger of their offices around the US.  After some Sniffering in LA, it became
clear that certain sizes & types of pkts just weren't getting all the way to
the St. Louis office LAN.  Packets larger than 128B and containing a broadcast
destination weren't getting there.  Everything else was.  Since the servers &
printers announced themselves with Netware SAP b'casts, any site with few
services sent pkts <128B, while those with more services sent large ones. 
What could possibly select the >128B b'casts for demolition?  After the usual,
follow-the-pointing-finger phone discussions with the techs for the WAN
provider and the St. Louis phone company, they reported clean local line-test
results.  Yet, the problem persisted.  Fortunately, the WAN tech was
experienced and knew anything can happen in hardware, so overnight he ran some
special tests into the local phone circuit.  The next am, the St. Louis phone
company suddenly provided a new hardware path to the T1 office drop.  Voila!

I could go on, but the idea is that assumptions about "low" error rates are
completely protocol (hardware through transport) dependent.  We consultants
love all these problems, because we get paid well to be suspicious of
everything and look at every possibility.  I love TCP/IP, because it generates
far more business per node than AppleTalk, Vines, Netware, or even DECnet ever
did.  Even more than Token Ring!  And, that's saying something.  :]

Alex

Detlef Bosau wrote:
> 
> Wesley Eddy wrote:
> >
> > On Tue, Jun 28, 2005 at 02:12:40AM +0200, Detlef Bosau wrote:
> > >
> > > I see the point. But one question remains (admittedly, I did not yet
> > > read the paper, therefore I apologize if you have given the answer
> > > there).
> >
> > There is much better information in the paper than this email provides,
> > but I'll try to answer anyways :).
> >
> 
> Thanks a lot.
> 
> Just one question in advance. What is the rationale behind the error
> rates used in your simulation?
> 
> From first glance at your thesis and your papers, I see packet
> corruption rates varying from about 0.001 to 0.1. Is this correct?
> In that case I do not understand your choice. Even 0.1 is far too low
> for mobile networks _without_ RLP. And 0.001 seems to me much too high
> for wirebound networks and for mobile networks _with_ RLP. But I'm sure
> that Alex will provide additional information here.
> 
> Corruption rates in mobile networks vary over extremely broad ranges. From
> what I've read so far, some papers use BER (_Block_ Error Rate, because
> radio blocks are the entity used by RLP) values of 1 %, 2 %, 3 %, 4 %, 5 %;
> others use 5 %, 10 %, 15 %.
> 
> So let us take 5 % for the moment. This appears to me a reasonable radio
> block corruption rate which can be found in many papers, so hopefully it
> is sometimes met even in reality. (Perhaps someone experienced with
> mobile networks can provide details here.)
> 
> So, if you consider a mobile network _without_ RLP, and if you consider
> for example IP packets of 500 byte = 4000 bits = ca. 23 radio blocks,
> assuming the 171 bit/block I mentioned yesterday as an example, then the
> probability for a packet to remain intact in the presence of a block
> corruption rate of 0.05 is
> (1-0.05)^23 = 0.31. In other words: Your IP packet corruption rate is
> about seventy percent. So, for mobile links, I would have expected
> packet corruption rates of about 0.5, 0.6, 0.7, 0.8, 0.9 to meet
> realistic values.
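
The arithmetic in the paragraph above can be checked with a short Python
sketch.  It assumes block errors are independent and that a single corrupted
radio block spoils the whole packet.  The 500-byte packet, 171 bit/block and
5 % figures are Detlef's; the function name is only illustrative.

```python
def packet_survival(block_error_rate, blocks_per_packet):
    """Probability that every radio block of a packet arrives intact,
    assuming independent block errors and no link-layer recovery."""
    return (1.0 - block_error_rate) ** blocks_per_packet


PACKET_BITS = 500 * 8     # 500-byte IP packet, from the quoted text
BLOCK_BITS = 171          # bits per radio block, from the quoted text
blocks = round(PACKET_BITS / BLOCK_BITS)   # "ca. 23", as in the quoted text
p_ok = packet_survival(0.05, blocks)
print(f"{blocks} blocks: intact {p_ok:.2f}, corrupted {1 - p_ok:.2f}")
```

This gives 23 blocks, about 0.31 intact and about 0.69 corrupted, matching
the figures quoted.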
> 
> That is why I did not give pure e2e approaches in the "mobile access net
> scenario" further consideration, because with packet corruption rates
> 0.7 or 0.8 congestion control does not matter. It's simply not the
> problem. The problem is the unacceptable rate of packet retransmissions,
> which is not only annoying for the user, because it takes large numbers
> of transmissions and large amounts of time to have a packet eventually
> delivered. It is annoying for the rest of the world as well, because
> even wirebound network links with small corruption rates would be
> occupied by retransmissions.
> 
> This was exactly the moment when I abandoned the idea of using pure
> e2e recovery for TCP including mobile channels.
> 
> In fact, I think, in practical mobile networks the NOs have never given
> it a thought. From what I see, there is no mobile network without RLP;
> even good ol' GSM has a reliable character / byte (?) stream used for
> various purposes.
> However, it took a long time to see this point. Personally, I have
> thought about pure e2e solutions for lossy channels for a really long
> time.
> 
> I can honestly say that I have thrown away about 3 years of work because
> it suddenly became clear to me that I've done work for the waste basket.
> 
> Please correct me if I'm wrong there. But I'm totally convinced that in
> really lossy networks, e.g. mobile wireless links, congestion control
> and loss differentiation simply _miss_ the problem.
> 
> In mobile networks, I think it is inevitable to make use of the RLP
> mechanisms, which typically decrease packet corruption rates to 10^-3
> or 10^-9 (sic!), whatever you prefer. (However, I've never seen a
> _reliable_ packet transfer there, and I think it's to avoid starvation
> problems caused by "everlasting packets" when there is no possibility to
> restrict the number of sending attempts by a finite limit.)
> 
> >
> > > How do you achieve _fairness_, when beta may vary?
> >
> > The paper splits this into two questions, so that it makes more sense:
> >
> > 1) Are a bunch of competing CETEN flows "fair" to each other?
> > and
> > 2) Are CETEN flows "friendly" to competing "legacy" TCP flows?
> >
> > Define fair to mean "equal sharing of resources", and define friendly (in
> > a way that's a bit different from what TFRC uses) to mean "doesn't reduce
> > the throughput of a normal competing TCP flow any more than another normal
> > TCP flow would."  In other words, by fairness, we mean to say that the
> > enhanced TCP only gains performance improvements from utilizing unused
> > link capacity, not by stealing from competing flows.
> 
> Hm ;-)
> 
> Of course, there are lots of situations, where a Sender cannot exploit
> the capacity of the link. In real situations you will often meet the
> situation that a backbone's bandwidth by far exceeds the bandwidth used
> in access links.
> 
> However, in quite a number of situations, links are fully occupied by
> actual flows. In other words: There is no unused link capacity. And in
> fact, in those situations adding a new flow to the network _of_ _course_
> means to take away resources from existing flows and use them for the
> new one. I remember a slide set from a talk given by Len Kleinrock,
> where he pointed out the directive "Keep the line full!". In fact, this
> is the very basis for affordable network communication. And this holds
> true not only for packet switched networks but for any kind of networks,
> consider e.g. the telephone system.
> 
> 
> > The answers given to these questions in the paper are:
> >
> > 1) Yes, in fact, at high error rates, CETEN flows are more fair to each
> > other than normal TCP flows are.  Under CETEN, each flow has its own
> > floating point value for beta that's computed from observations of its
> > own TCP behavior and some hints on error rates observed by routers; so
> > it's safe to say that few flows have the same beta, although for most
> > long-lived flows the beta values should be fairly closely grouped.  The
> > paper has experimental (simulation-based) evidence that acceptable
> > fairness can be achieved even if beta isn't totally uniformly
> > distributed.
> 
> O.k.
> 
> One remark from my own ns2-experience: Even in simple dumbbell
> scenarios, with only a few flows, it takes some time for a number of
> flows to reach fairness.
> To my understanding, this was due to the fact that often flows are
> poorly interleaved and therefore "implicit congestion notification"
> (ICN, AKA "dropped packets" ;-)) did not reach all senders at the same
> time and sometimes not all senders received the same number of ICN.
> 
> I'm not quite sure how exactly this matches real networks, because I
> always consider the possibility of simulation artefacts etc.
> 
> Thus, I place value on the theoretical basis here. Although I play around
> with the ns2 a lot (and even my PTE stuff is implemented with the ns2),
> and of course _one_ ns2 simulation may yield a counterexample to disprove
> an approach, I personally would not rely too much on simulation results
> alone.
> 
> For the traditional AIMD scheme, it's quite easy to see that AIMD
> sequences starting from different initial values will end up in the
> same sawtooth as long as alpha and beta are equally chosen for all
> competing flows.
> 
> In case of different values for beta, even two competing AIMD sequences
> starting from a fair share at the starting point (e.g. both from zero)
> would become unfair after the first congestion event and would never
> reach fairness again. If I had a pencil and a piece of paper here, I
> would make a little sketch on that matter and the situation would become
> clear within three lines or so ;-) The periods between the congestion
> events will become longer and longer, and the flow with the lesser beta
> will disappear in the long run.
> 
> Consider the starting value (0,0), beta 1 = 1/2, beta 2 = 1/3.
> Congestion event: Sum of the two values is 1. Alpha arbitrary but equally
> chosen for both flows.
> 
> Then the sequence is:
> Start: (0,0)
> Congestion at (1/2, 1/2)
> after congestion handling: (1/4,1/6)
> Congestion at (0.542, 0.458)
> after congestion handling (0.271, 0.153)
> 
> etc...
> 
> I think, my objection is obvious: Different values for beta will never
> reach a fair share.
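
The sequence above can be reproduced with a small Python sketch.  It assumes
the idealized model used in the example: both flows increase by the same
amount until their rates sum to the capacity (1), then each flow's rate is
multiplied by its own beta.  The function name and its parameters are
illustrative, not taken from any simulator.

```python
def aimd_shares(betas, events, capacity=1.0):
    """Return the per-flow rates at each congestion event.

    Model (an assumption): all flows grow by the same amount until their
    rates sum to `capacity`, then flow i is multiplied by betas[i].
    """
    rates = [0.0] * len(betas)
    history = []
    for _ in range(events):
        grow = (capacity - sum(rates)) / len(rates)     # equal additive increase
        rates = [r + grow for r in rates]               # rates at congestion
        history.append(tuple(rates))
        rates = [b * r for b, r in zip(betas, rates)]   # multiplicative decrease
    return history


for i, (x1, x2) in enumerate(aimd_shares([1 / 2, 1 / 3], 6), start=1):
    print(f"event {i}: ({x1:.3f}, {x2:.3f})")
```

The first two congestion points come out as (0.500, 0.500) and
(0.542, 0.458), and with these betas the iteration settles near (4/7, 3/7),
so the split stays unequal.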
> 
> --
> Detlef Bosau
> Galileistrasse 30
> 70565 Stuttgart
> Mail: detlef.bosau at web.de
> Web: http://www.detlef-bosau.de
> Mobile: +49 172 681 9937