[e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption

Tue Jul 5 05:58:38 PDT 2005

alok wrote:
> Hi,
> 
> What I want to know is:
> 
> (a)
> If the retransmission/ARQ is entirely offloaded to the end transmitter 
> and receiver (say my PC and your PC if we are doing a peer to peer),
> 
> versus
> 
> (b)
> each transmitter and receiver pair on intermediate hop does the same,

This is the million-dollar question.

O.k. First of all, the standard reference on this matter is:
Saltzer, Reed, Clark: End-To-End Arguments in System Design. ACM 
Transactions on Computer Systems 2(4), Nov. 1984, pp. 277-288.

(Hopefully the reference details are correct. At least the title and 
authors are, I think.)

However, "commonly accepted general truths" are similar to the Bible. 
Each believer tells you that the Bible is true. Ask two believers and 
you will hear three truths ;-)

Basically, your question hits _exactly_ the point of local recovery in 
mobile wireless networks (you guessed correctly, it's me again ;-)), 
however I'm not sure whether the resulting design decision is the same.

Some helpful criteria might be:

1.: User perspective: What is the _goodput_ from the user's point of view?
2.: Fairness perspective: Does a user unduly waste network resources?
3.: System perspective: Where could error recovery be done most cheaply?

Let's start with 3.: Where could error recovery be done most cheaply?

Let me give two propositions, and please correct me here because I do 
not know the precise values. But as far as I recall, on backbone 
routers / switching systems we have

- about 30000 (3e4) active TCP flows per 100 Mbit/s of capacity,
- about 100 ns of available processing time per packet on a router. (A 
colleague told me about this some years ago. I personally think this is 
rather dated. I would expect 10 ns or even 1 ns.)
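
To see where per-packet budgets of this order come from, here is a rough 
sketch. It simply divides the packet size by the link rate for a 
saturated link. The link rates and packet sizes are my own illustrative 
assumptions, not measured values, and framing overhead is ignored.

```python
def per_packet_budget_ns(link_rate_bps: float, packet_bytes: int) -> float:
    """Time between back-to-back packets on a saturated link, in ns."""
    return packet_bytes * 8 / link_rate_bps * 1e9


# Assumed example links: (rate in bit/s, packet size in bytes).
examples = [(100e6, 1500), (1e9, 1500), (10e9, 64)]

for rate_bps, size in examples:
    budget = per_packet_budget_ns(rate_bps, size)
    print(f"{rate_bps / 1e6:8.0f} Mbit/s, {size:4d} B -> {budget:10.1f} ns per packet")
```

Under these assumptions, small packets on a 10 Gbit/s link leave a budget 
of a few tens of nanoseconds, which is the same order of magnitude as the 
figures above.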

For these reasons, the general wisdom is to put complexity on the end 
systems if possible.

This is perhaps a problem for CETEN, where a correct implementation 
requires a floating-point calculation for each IP datagram. (Perhaps 
one can improve the algorithm in this respect.)

To illustrate the importance of this matter, please consider the IPv6 
header: Because there was no compelling reason for spending this 
processing effort, the header checksum was _left_ _out_!

For 3G networks, my position is that the gateways between the Internet 
and the mobile network are typically quite large computer systems, each 
one serving a few hundred flows. In this case, the effort is acceptable.

In satellite networks: I don't know. In particular, the state that ARQ 
has to keep in high-bandwidth systems may turn out to be unacceptably 
large.
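
A rough way to estimate that state: a link-level ARQ sender has to keep 
every unacknowledged byte, so its buffer must hold at least one 
bandwidth-delay product. The sketch below assumes a round trip time of 
0.5 s for a geostationary hop and a few example link rates. Both are my 
own assumptions, not figures from a real system.

```python
def arq_buffer_bytes(link_rate_bps: float, rtt_s: float) -> float:
    """Minimum retransmission buffer: one bandwidth-delay product, in bytes."""
    return link_rate_bps * rtt_s / 8


GEO_RTT_S = 0.5  # assumed round trip time over a geostationary satellite hop

for rate_bps in (10e6, 100e6, 1e9):  # assumed link rates in bit/s
    mbytes = arq_buffer_bytes(rate_bps, GEO_RTT_S) / 1e6
    print(f"{rate_bps / 1e6:6.0f} Mbit/s -> about {mbytes:.3f} MB of buffered data")
```

At 1 Gbit/s this already amounts to 62.5 MB of buffered data, not 
counting any per-flow bookkeeping.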

2.: Fairness:

If ARQ is placed on the end systems, the whole network path "enjoys" 
every necessary retransmission. Particularly when a packet must be sent 
100 times or more to be successfully received at least once, it may 
increase network performance to place ARQ on intermediate systems.
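
To put rough numbers on this, here is a small sketch under a deliberately 
simple model: the path has a number of hops, only the last hop is lossy, 
and acknowledgements are free and never lost. The hop count and the 
success probability are assumptions chosen for illustration.

```python
def link_tx_end_to_end(hops: int, p_success: float) -> float:
    """Expected link transmissions when only the end systems retransmit.

    Every attempt crosses all `hops` links; the number of attempts until
    the lossy last hop succeeds is geometric with mean 1 / p_success.
    """
    return hops / p_success


def link_tx_local(hops: int, p_success: float) -> float:
    """Expected link transmissions when the lossy hop recovers locally.

    The clean hops are crossed once; only the lossy hop repeats.
    """
    return (hops - 1) + 1.0 / p_success


hops, p = 10, 0.01  # assumed: 10 hops, 1 success in 100 tries on the last hop
print(f"end-to-end ARQ: {link_tx_end_to_end(hops, p):.0f} link transmissions")
print(f"local ARQ:      {link_tx_local(hops, p):.0f} link transmissions")
```

In this model the difference grows with the number of clean hops in front 
of the lossy one. If the lossy hop is the only hop, both variants cost 
the same.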

Once again on 3G networks: Typically, 3G networks are only used as an 
access link. So the major part of the path typically resides in the 
wired Internet. Therefore, it makes sense not to bother the Internet 
with retransmissions. Moreover, ARQ in 3G networks is done at the radio 
block level, which is more efficient than ARQ at the packet level.

However, in satellite networks, I can imagine that the bottleneck is 
really the satellite link itself. In that case, it would make only a 
minor difference whether ARQ is placed on the intermediate systems (IS) 
or on the end systems (ES).

1.: User perspective:

How long does it take for a packet to be delivered?

Again: On a 3G network, most of the transmission time is spent on the 
Internet if a _RAW_ channel _WITHOUT_ ARQ/RLP is used.
Let's assume a latency of 50 ms and 100 transmissions; then a user will 
see an STT latency of 5 s for a packet.

If the same packet could be successfully delivered via RLP, and STT 
were increased by 100 ms for that reason, STT would be 150 ms. This is 
far less than 5 s, and this is preferable to the user.
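
The comparison above as a small calculation. The 50 ms, the 100 
transmissions and the 100 ms of RLP delay are the figures from the 
example. The model itself is a simplification of mine: each end-to-end 
attempt costs one full path latency, and local recovery adds a fixed 
delay.

```python
def latency_end_to_end_ms(path_latency_ms: float, transmissions: int) -> float:
    """Latency when every transmission attempt crosses the whole path."""
    return path_latency_ms * transmissions


def latency_local_ms(path_latency_ms: float, local_recovery_ms: float) -> float:
    """Latency when the lossy link recovers locally (e.g. via RLP)."""
    return path_latency_ms + local_recovery_ms


raw = latency_end_to_end_ms(50, 100)  # raw channel, end-to-end recovery
rlp = latency_local_ms(50, 100)       # local recovery on the radio link
print(f"raw channel: {raw:.0f} ms, with local recovery: {rlp:.0f} ms")
```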

Satellite networks: Here the major time is spent on the satellite link.

In summary, I'm not quite sure, but I can imagine that in satellite 
networks error recovery is left to the end systems. I think the error 
recovery effort for IS can turn out to be unduly high, with not that 
much benefit for fairness and the user.

Basically, high costs (3.) are an argument for (a), utilization and good 
user performance (2., 1.) are an argument for (b).

It is a tradeoff.

> 
> How is (a) different from (b) in terms of effective utilization? 
> Obviously it is true if an end point A is talking to B and C :

This is mainly covered by 2. Fairness.

Of course, the useful utilization of a link decreases if it is filled 
up with nothing but retransmissions.

I think the consideration can turn out quite differently, depending on 
the actual scenario: E.g. a satellite mobile phone could be attached to 
the Internet. Or a satellite link could be used for Internet backbone 
connections, perhaps weather-dependent, in combination with a fibre link.

As you see, I cannot offer a real "answer" here. My intention is to draw 
attention to the question.

I've got the impression that there are typically strong objections 
against doing local recovery in the TCP community. Although RLP has 
been in practical use in mobile networks for about a decade now, I 
frequently see the position that TCP should be run on raw e2e networks 
without any local recovery support.

Perhaps, this impression is wrong. However: I think the decision is not 
easy to make.

DB

-- 
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937