[e2e] Is a non-TCP solution dead?

Tue Apr 1 09:17:24 PST 2003

Reiner, I disagree with a couple of items in a few paragraphs, below...

Alex

Reiner Ludwig wrote:
> 
[clip]
> Having worked on this subject for a couple of years, we have learned that
> TCP performs *great* most of the time (yes, rare corner cases exist) *if*
> the wireless subnetwork is designed right. On this subject, I can only
> recommend reading draft-ietf-pilc-link-design-13.txt, which is a
> soon-to-be-BCP-RFC. RFC3366 provides additional detail on the subject.
...In over 10 years of network consulting with folks owning real networks, the
largest share of the transport problems I have seen came down to how TCP
'works'.  If the network-layer amelioration mechanisms (slow start, the
inability to distinguish error loss from congestion loss...) were removed from
TCP's behavior, those problems would have been absent or not nearly as severe.
This was illustrated most recently by a 30% slowdown at a major corporate WAN,
caused simply by a 0.3% loss rate on a single telco link.  The net admin felt
0.3% shouldn't be an issue.  After all, it's not even bank interest, yet!  He
had no idea how naive TCP's design is.  To suggest that this is a rare corner
case begs the question, since the algorithms in this "standard" TCP, deployed
at thousands of companies, do exactly the wrong thing.
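
A rough way to see why a small loss rate can matter is the widely cited
steady-state approximation from Mathis et al., throughput ~ (MSS/RTT) *
C/sqrt(p).  The sketch below is only illustrative: the MSS, the RTT, and the
constant C = sqrt(3/2) are assumptions, not measurements from the network
described above, and the function name is made up for this example.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate ceiling on steady-state TCP throughput, in bits/s.

    Uses the simple (MSS/RTT) * C/sqrt(p) model; it ignores timeouts,
    receiver windows, and link capacity, so treat it as a rough bound.
    """
    if not 0.0 < loss_rate < 1.0:
        raise ValueError("loss_rate must be strictly between 0 and 1")
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)

# Assumed values: 1460-byte MSS, 50 ms round-trip time.
for p in (0.0001, 0.001, 0.003, 0.01):
    ceiling = mathis_throughput_bps(1460, 0.05, p)
    print(f"loss={p:.2%}  ceiling ~ {ceiling / 1e6:.2f} Mbit/s")
```

Because the ceiling scales with 1/sqrt(p), quadrupling the loss rate halves
it, regardless of how much capacity the link actually has.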

> 
[clip]
> Exactly! If you follow draft-ietf-pilc-link-design-13.txt and RFC3366, the
> use of persistent link layer ARQ will translate transmission errors on the
> wireless link into congestion, i.e., queueing delays at the wireless link.
...Error means loss.  How does loss become "queueing delay" when lost items
can no longer be in any queue?
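
For readers following the exchange, here is a toy model of the mechanism the
quoted text appears to describe: with persistent link-layer ARQ a corrupted
frame is resent rather than dropped, so nothing is lost end to end, but every
packet behind it waits longer.  All parameters and names are invented for
illustration; this is not taken from the PILC draft or RFC3366.

```python
import random

def simulate_persistent_arq(n_packets, arrival_gap, tx_time,
                            frame_error_rate, seed=1):
    """Return the per-packet delay (seconds) across one FIFO link.

    Each packet is retransmitted until it gets through, so every packet
    is eventually delivered; errors show up only as extra delay.
    """
    if not 0.0 <= frame_error_rate < 1.0:
        raise ValueError("frame_error_rate must be in [0, 1)")
    rng = random.Random(seed)
    link_free_at = 0.0
    delays = []
    for i in range(n_packets):
        arrival = i * arrival_gap
        # The packet waits in the queue while earlier packets occupy the link.
        start = max(arrival, link_free_at)
        attempts = 1
        while rng.random() < frame_error_rate:
            attempts += 1  # corrupted frame: send it again
        link_free_at = start + attempts * tx_time
        delays.append(link_free_at - arrival)
    return delays

clean = simulate_persistent_arq(1000, 0.02, 0.01, 0.0)
lossy = simulate_persistent_arq(1000, 0.02, 0.01, 0.3)
print("delivered:", len(clean), len(lossy))
print("max delay clean/lossy:", max(clean), max(lossy))
```

In this model both runs deliver all 1000 packets; the run with frame errors
differs only in its delay distribution.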

> 
> For the case of most *wide-area* wireless links, the packet transmission
> delays across those links often dominate the e2e RTT. Thus, the queueing
> delays caused by transmission errors may often cause large and sudden RTT
> spikes on the order of the e2e RTT. But as you say, TCP's congestion
> control loop is mostly doing fine here. A number of publications confirm
> that. An open issue, though, is the spurious timeouts that the mentioned
> RTT spikes can cause, but that is being addressed in the IETF (TSV WG) with
> the Eifel response algorithm.
...Your algorithm indeed helps; I simply disagree that "TCP's congestion
control loop is mostly doing fine here", because its intent of "congestion
control" is misplaced.  Adding more algorithmic processing at the wrong layer
is suboptimal.
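
As a rough illustration of the spurious-timeout issue in the quoted text: the
Eifel detection step (RFC 3522), which a response algorithm would build on,
compares the echoed TCP timestamp in the first acceptable ACK against the
timestamp of the retransmission.  The sketch below is a simplification under
stated assumptions: the class and method names are hypothetical, and
timestamp wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EifelDetector:
    # Timestamp value carried by the first retransmission after a timeout.
    retransmit_ts: Optional[int] = None

    def on_timeout_retransmit(self, tsval: int) -> None:
        """Record the timestamp of the first retransmitted segment."""
        if self.retransmit_ts is None:
            self.retransmit_ts = tsval

    def on_acceptable_ack(self, tsecr: int) -> bool:
        """Return True if the preceding timeout looks spurious.

        An echoed timestamp older than the retransmission means the ACK
        was triggered by the original segment, which therefore was only
        delayed and not lost.
        """
        if self.retransmit_ts is None:
            return False
        spurious = tsecr < self.retransmit_ts
        self.retransmit_ts = None
        return spurious

detector = EifelDetector()
detector.on_timeout_retransmit(tsval=1000)
print(detector.on_acceptable_ack(tsecr=990))
```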

> 
> >[...] whether TCP does OK depends a lot on buffering and
> >queue management at the wireless hop.
> 
> Exactly! And large queues, as suggested by someone before, are certainly not
> the answer. Instead, the queue size (or the AQM thresholds) should be
> dynamically adapted as the capacity of the wireless link changes as
> per-mobile-host bit rates are switched up and down (on the timescales of an
> e2e RTT). This approach leads to high link utilization, high e2e
> throughput, and low e2e delays. We have recently presented a paper on that
> subject:
...Queues are automatically adjusted as part of the normal operation of any
packet-switching device.  This is provided in all the major network-processor
chips used by vendors.  Queue size = the packets the device is obligated to
send, until a queue is overrun or a drop decision is made based on priority.
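
One common rule of thumb for the kind of adaptation suggested in the quoted
text is to tie the queue limit to the bandwidth-delay product and recompute
it whenever the link's bit rate changes.  This is an assumption made for
illustration, not necessarily the scheme in the paper cited below; the
function name, the packet size, and the example rates are all hypothetical.

```python
import math

def queue_limit_packets(link_rate_bps, rtt_s, packet_bytes=1500,
                        min_packets=2):
    """Queue limit (in packets) sized to one bandwidth-delay product."""
    if link_rate_bps <= 0 or rtt_s <= 0 or packet_bytes <= 0:
        raise ValueError("rate, RTT, and packet size must be positive")
    bdp_bytes = link_rate_bps * rtt_s / 8
    return max(min_packets, math.ceil(bdp_bytes / packet_bytes))

# Recompute the limit as the (assumed) per-host bit rate is switched.
for rate_bps in (64_000, 128_000, 384_000):
    limit = queue_limit_packets(rate_bps, rtt_s=0.5)
    print(f"{rate_bps} bit/s -> limit {limit} packets")
```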

Alex

>    Mats Sågfors, Reiner Ludwig, Michael Meyer and Janne Peisa, "Queue
>    Management for TCP Traffic over 3G Links", IEEE WCNC 2003.
> 
> ///Reiner