[e2e] Reacting to corruption based loss

Tue Jun 7 12:19:50 PDT 2005

I agree with many of these comments, except that the first lines, ending with
"So any key difference should relate to the first impact," are overstated a
bit.  When drops due to bit errors occur, they often have nothing to do with
loading.  So their net effect depends entirely on the protocols involved, both
those at the faulty link's ends and those at the far ends.  As things stand
today, interfaces interior to the systems comprising links may or may not
correct such physical errors, but when they do not, the ends that use TCP
suffer greatly, simply because TCP, perhaps even with ECN, knows no response
other than to slow down.  In other words, a lightly loaded network path with
one bad interior Tx interface can drop 1% of its bits/pkts and force the
running TCP connection to slow down by more than 10%, while leaving UDP, VPNs,
etc. largely unaffected.  This is one important place where the TCP/IP
implementations we now have fall down on the job.
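
To put a rough number on that slowdown, one can borrow the well-known
steady-state approximation of Mathis et al. (1997), which assumes every loss is
read as a congestion signal: rate ~ (MSS/RTT) * sqrt(3/2) / sqrt(p). This model
is not part of the post above, and the link speed, MSS, and RTT below are
hypothetical values chosen only for illustration; the model also ignores
timeouts and receiver-window limits.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (Mathis et al. 1997).

    Assumes every loss halves the congestion window, i.e. TCP treats a
    corruption drop exactly like a congestion drop. Undefined for p == 0.
    """
    if loss_rate <= 0:
        raise ValueError("model is undefined for zero loss")
    c = math.sqrt(1.5)  # ~1.22, periodic-loss constant without delayed ACKs
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


# Hypothetical lightly loaded path: 100 Mb/s, 1460-byte MSS, 50 ms RTT,
# and 1% of packets lost to corruption rather than to queue overflow.
link_bps = 100e6
mss, rtt, p = 1460, 0.05, 0.01

tcp_bps = min(link_bps, mathis_throughput_bps(mss, rtt, p))
# A fixed-rate sender (e.g. UDP) simply loses 1% of what it sends.
udp_goodput_bps = link_bps * (1 - p)

print(f"TCP (model): {tcp_bps / 1e6:.2f} Mb/s")
print(f"fixed-rate UDP goodput: {udp_goodput_bps / 1e6:.2f} Mb/s")
```

Under these assumed numbers the model puts TCP at roughly 2.9 Mb/s while the
fixed-rate sender keeps about 99 Mb/s of goodput, which is the asymmetry the
post describes.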

Alex

"David P. Reed" wrote:
> 
> There are two effects that corruption causes.
> 
> First, it lowers the end-to-end error-free capacity of the channel.
> 
> Second, it causes congestion because it lowers the potential end-to-end
> error-free rate of the channel.
> 
> Clearly, the response of lowering the input rate is one possible way to
> deal with the second phenomenon.   However, this second effect is
> indistinguishable from bottleneck congestion or overload congestion.
> So, given that we have an effective way to deal with transient overload,
> why would "corruption" need a new interface between the layers?
> 
> So any key difference should relate to the first impact.
> 
> It is well known that there are good reasons to create codes that cross
> packet boundaries.   So-called erasure codes or digital fountain
> techniques provide the ability on an end-to-end basis to deal with data
> losses that are packet centric.    If errors are "bursty" in time,
> spreading any particular end-to-end bit across several packets (or even
> across several paths with independent failures) is a good end-to-end
> response to corruption.
> 
> So the utility of separating corruption losses from overload losses is to
> be able to code better.    Suppose a packet's header is salvageable but its
> data is not (perhaps putting an error-correcting code on the header, rather
> than a checksum, would help here!).   Would decoding such a packet at the
> endpoint help improve the effective end-to-end capability?   Absolutely,
> if there are priors that give you a reasonable error model.
> 
> But the real question here is about coding a stream across a network
> with packet corruption.   It probably is better to look at the
> end-to-end perspective, which includes such things as latency (spreading
> a bit across successive packets adds latency when decoded at the
> receiver) and control-loop latency (how fast can the endpoints change
> coding of a stream to spread across more packets and more paths,
> compared to a more local, rapid, link-level response).
> 
> The observation that 802.11 automatically lowers its rate based on link
> quality points out the issue here - such a local tactic improves all
> end-to-end paths in one fell swoop, whereas there is the possibility
> that end-to-end responses will be too slow, or else drive each other
> into mutual instability if link quality changes faster than the
> end-to-end control loop can resolve.
> 
> I'd argue that intuitions of most protocol designers are weak here,
> because the state of the system as a whole is not best managed either at
> the link level or at the end-to-end "session" level - but at the whole
> network level.   RED and ECN are decentralized "network level" control
> strategies - which end up providing a control plane that is implicit
> among all those who share a common bottleneck link.   Similarly, coding
> strategies that can deal with "corruption" require a "network level"
> implicit control, not an intuitive fix focused on the TCP state machine.
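
The quoted message mentions erasure codes and digital fountains without
specifics. As a much simpler stand-in (a single XOR parity packet per group,
not a fountain code), the sketch below shows how interleaving spreads data so
that a short burst of corruption losses costs at most one packet per group,
which the parity can then rebuild. The function names, packet sizes, and
interleaving depth are all hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_group(packets: List[bytes]) -> bytes:
    """One parity packet: the XOR of every data packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) using the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one loss per group")
    out = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with all surviving packets equals the lost one.
        out[missing[0]] = reduce(xor_bytes, present, parity)
    return out


def interleave(packets: List[bytes], depth: int) -> List[List[bytes]]:
    """Put packet i in group i % depth, so a burst of up to `depth`
    consecutive losses removes at most one packet from each group."""
    return [packets[g::depth] for g in range(depth)]


# Demo with 8 hypothetical 4-byte packets and interleaving depth 2.
data = [bytes([i]) * 4 for i in range(8)]
groups = interleave(data, depth=2)  # packets [0,2,4,6] and [1,3,5,7]
parities = [encode_group(g) for g in groups]

# A burst wipes out consecutive packets 2 and 3: one loss in each group.
received = [list(g) for g in groups]
received[0][1] = None  # packet 2
received[1][1] = None  # packet 3
repaired = [recover(r, p) for r, p in zip(received, parities)]
assert repaired == groups
```

The latency cost raised in the message is visible here too: a receiver cannot
repair a loss until the rest of its group and the parity have arrived, so a
deeper interleave or a larger group tolerates longer bursts at the price of a
longer wait before decoding.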
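
The 802.11 point in the quoted message is about timescale: a link-level rate
controller reacts within a few frames, long before an end-to-end loop spanning
one or more RTTs can. The 802.11 standard leaves rate adaptation to
implementers, so the sketch below is only a simplified rule in the spirit of
Auto Rate Fallback (ARF), an early published scheme; the thresholds (2
failures down, 10 successes up), the starting rate, and the class name are
assumptions, and ARF's probe-failure rule is omitted.

```python
from typing import List


class AutoRateFallback:
    """Simplified ARF-style rate adaptation (hypothetical thresholds)."""

    def __init__(self, rates_mbps: List[float], down_after: int = 2,
                 up_after: int = 10) -> None:
        self.rates = sorted(rates_mbps)
        self.idx = len(self.rates) - 1  # assumption: start at the top rate
        self.down_after = down_after
        self.up_after = up_after
        self.fails = 0
        self.successes = 0

    @property
    def rate(self) -> float:
        return self.rates[self.idx]

    def on_tx_result(self, acked: bool) -> float:
        """Update after one frame; return the rate for the next frame."""
        if acked:
            self.successes += 1
            self.fails = 0
            if self.successes >= self.up_after and self.idx < len(self.rates) - 1:
                self.idx += 1  # sustained success: try the next higher rate
                self.successes = 0
        else:
            self.fails += 1
            self.successes = 0
            if self.fails >= self.down_after and self.idx > 0:
                self.idx -= 1  # repeated failure: fall back one rate
                self.fails = 0
        return self.rate


# 802.11b rates; four lost frames, then ten acknowledged ones.
arf = AutoRateFallback([1, 2, 5.5, 11])
trace = [arf.on_tx_result(ok) for ok in [False] * 4 + [True] * 10]
print(trace)
```

Every flow crossing this link benefits from the fallback at once, without any
endpoint changing its behavior, which is the "one fell swoop" effect the
message describes.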