[e2e] Some thoughts on WLAN etc., was: Re: RES: Why Buffering?

Detlef Bosau detlef.bosau at web.de
Sun Jul 5 05:44:56 PDT 2009


Lachlan Andrew wrote:
> Greetings Detlef,
>
> 2009/7/5 Detlef Bosau <detlef.bosau at web.de>:
>   
>> Lachlan Andrew wrote:
>>     
>>> "a period of time over which all packets are lost, which extends for
>>> more than a few average RTTs plus a few hundred milliseconds".
>>>       
>> I totally agree with you here in the area of fixed networks, actually we use
>> hello packets and the like in protocols like OSPF. But what about outliers
>> in the RTT on wireless networks, like my 80 ms example?
>>     
>
> That is why I said "plus a few hundred milliseconds".

Now, how large is "a few"?

Not to be misunderstood: there are indeed networks where a link state can 
be determined.
E.g.:
- Ethernet, Normal Link Pulse,
- ISDN, ATM, where we have a continuous bit flow,
- HSDPA, where we have a continuous symbol flow on the pilot channel in 
downlink direction and responses from the mobile stations in uplink 
direction.

In all these networks, we have continuous or short-period periodic traffic 
on the link, and this traffic is reflected by responses within a quite well 
known period of time. In addition, the hello-response behaviour does not 
seem to depend on any specific traffic. In Ethernet or ATM, a link, or a 
link outage respectively, is detected even when no traffic from upper 
layers exists.

In some sense, this even holds true for HSDPA, if we define an HSDPA 
link to be "down" when the base station no longer receives CQI 
indications.

I'm not quite sure (to be honest: I don't really know) whether similar 
mechanisms are available e.g. for ad hoc networks, particularly as we 
well know of hidden terminal / hidden station problems, where stations 
in a wireless network do not even see each other.


     
>   You're right
> that outliers are common in wireless, which is why protocols to run
> over wireless need to be able to handle such things.
>
>   
Exactly.

So, we come to an important turn in the discussion. It's not only the 
question whether we can detect a link outage.
The question is: How do we deal with a link outage?

In wireline networks, link outages are supposed to be quite rare. 
(Nevertheless, the consequences may be painful.)
In contrast to that, link outages are extremely common in MANETs. 
Actually, we have to ask what the terms "link", "link outage" and 
"disconnection" should even mean in MANETs.

For example, think of TCP. How does TCP deal with a link outage?


Now, if this were a German mailing list and I came from Cologne, I would 
write: "Es is wie es is und et kütt wie et kütt." ("It is as it is, and 
what comes, comes.")
More internationally spoken: "Don't worry, be happy."

If the path is finally broken, the TCP flow is broken as well.

If there is an alternative path and the routing is adjusted by some 
mechanism, the TCP flow will continue.

Of course, there may be packet loss. So, TCP will do packet retransmissions.
Of course, the path capacity may change. So, TCP will reassess the path 
capacity, either by slow start or by one or several 3-DUPACK / fast 
retransmit, fast recovery cycles.
Of course, the throughput may change. That's the least problem of all, 
because it's automatically fixed by the ACK clocking mechanism.
Of course, the RTT may change. So, the timers have to converge to a new 
expectation.
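
For the timer part, here is a minimal sketch of the standard RTO update, 
i.e. the Jacobson/Karels smoothing as specified in RFC 2988. The RTT 
samples are invented, and the RFC's one second minimum is left out so 
that the effect of a single outlier, like the 80 ms example above, 
remains visible:

ALPHA, BETA, K = 0.125, 0.25, 4         # constants from RFC 2988

def rto_updates(samples):
    """Feed RTT samples (seconds) and yield the RTO after each one."""
    srtt = rttvar = None
    for rtt in samples:
        if srtt is None:                # first measurement initialises the state
            srtt, rttvar = rtt, rtt / 2
        else:
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
            srtt = (1 - ALPHA) * srtt + ALPHA * rtt
        yield srtt + K * rttvar         # RTO = SRTT + 4 * RTTVAR

# a run of ~10 ms RTTs followed by a single 80 ms outlier
samples = [0.010, 0.011, 0.010, 0.080, 0.012]
for rtt, rto in zip(samples, rto_updates(samples)):
    print("sample %5.1f ms  ->  RTO %6.1f ms" % (rtt * 1000, rto * 1000))

With these invented samples, the single 80 ms outlier roughly quadruples 
the RTO, and it takes several further samples until the timer has 
converged to a new expectation.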

There will be some rumbling, more or less, but afterwards, TCP will keep 
on going.

Either way, there is no smart guy to tell TCP "there is a short time 
disconnection". Hence, there is no explicit mechanism in TCP to deal 
with short time disconnections, because the TCP mechanisms as they are 
work fine - even when short time disconnections and path changes occur. 
There is no need for some "short time disconnection handling".

Of course, this raises the question whether TCP as is can be suitable 
for MANETs, because we may well question whether e.g. the RTO 
estimation and the CWND assessment algorithms in TCP will hold in the 
presence of volatile paths with volatile characteristics.

TCP is supposed to work with a connectionless packet transport mechanism 
with "reasonably quasistationary characteristics" and a packet loss 
ratio we can reasonably live with.

Or for the people in Cologne: "Es is wie es is und et kütt wie et kütt."

>> Was there a "short time disconnection" then?
>> Certainly not, because the system was busy to deliver the packet all the
>> time.
>>     
>
> From the higher layer's point of view, it doesn't matter much whether
> the underlying system was working hard or not... 

Correct. From the higher layer's point of view, the questions are:
- is the packet acknowledged at all?
- is the round trip time "quasistationary" (=> Edge's paper)?
- is the packet order maintained, or should we adapt the dupack threshold 
(see the sketch below)?
- more TCP specific: is the MSS size appropriate, or should it be changed?
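
Regarding the dupack threshold, a toy sketch (not modelled on any real 
stack) of why the reordering depth matters: a packet that is overtaken 
by more than dupthresh later packets is indistinguishable from a lost 
packet at the sender.

def fast_retransmit_fires(acks, dupthresh=3):
    """Return True if the duplicate-ACK rule would trigger a retransmission."""
    last_ack, dupes = None, 0
    for ack in acks:
        if ack == last_ack:
            dupes += 1
            if dupes >= dupthresh:
                return True             # fast retransmit triggered
        else:
            last_ack, dupes = ack, 0
    return False

# Toy ACK stream: each ACK carries the highest in-order segment so far.
# Segments arrive as 1, 2, 3, 5, 6, 7, 4 - segment 4 is delayed, not lost.
acks = [1, 2, 3, 3, 3, 3, 7]
print(fast_retransmit_fires(acks, dupthresh=3))   # True: spurious retransmission
print(fast_retransmit_fires(acks, dupthresh=4))   # False: larger threshold tolerates it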
>  If the outlier were
> more extreme, then I'd happily call it a short term disconnection, and
> say that the higher layers need to be able to handle it.
>
>   

Question: Should we _actively_ _handle_ it (e.g. Freeze-TCP?) or should 
we build protocols sufficiently robust that they can implicitly cope 
with short time disconnections?
>> So the problem is not a "short time disconnection", the problem is that
>> timeouts don't work
>>     
>
> Timeouts are part of the problem.  Another problem is reestablishing
> the ACK clock after the disconnection.
>
>   

Hm. Where is the problem with the ACK clock?

If anything, the problem could be (and I'm not quite sure about WLAN here) 
that a TCP downlink may use more than one path in parallel. Hence, there may 
be three packets delivered along three different paths - and a sender in 
the wireline network sees three ACKs and hence sends three packets....

However, in the normal "single path scenario", I don't see a severe 
problem. Or am I missing something?
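
To make the single path case concrete, a toy sketch of ACK clocking with 
a fixed window (purely illustrative, the numbers are invented): every 
ACK event releases exactly as many new segments as it drains from the 
pipe, so a burst of three ACKs, e.g. delivered over three parallel 
paths, releases a burst of three new segments, but no more.

def ack_clocked_bursts(acked_per_event, cwnd=10):
    """For each ACK event, return the list of new segments it clocks out."""
    in_flight, next_seg, bursts = cwnd, cwnd, []
    for newly_acked in acked_per_event:
        in_flight -= newly_acked        # returning ACKs drain the pipe ...
        burst = []
        while in_flight < cwnd:         # ... and the window refills it
            burst.append(next_seg)
            next_seg += 1
            in_flight += 1
        bursts.append(burst)
    return bursts

# One segment acknowledged per event keeps the sending smooth; an event
# acknowledging three segments at once produces a three-segment burst.
print(ack_clocked_bursts([1, 1, 3, 1]))   # [[10], [11], [12, 13, 14], [15]]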

>> Actually, e.g. in TCP, we don't deal with "short time disconnections"
>>     
>
> There may not be an explicit mechanism to deal with them.  I think
> that the earlier comment that they are more important than random
> losses is saying that we *should* perhaps deal with them (somehow), or
> at least include them in our models.
>   

I'm actually not convinced that short time disconnections are more 
important than random losses.

If this were the attitude of the reviewers who rejected my papers, I 
would suspect they were trying to tease me.

Of course, I could redefine any random loss to be a short time 
disconnection - hence there wouldn't be any random loss at all.

However, this would be some nasty kind of hair splitting.
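
For what it's worth, here is a minimal sketch of the two-state 
Gilbert-Elliott model (which Lachlan mentions further down) next to an 
IID model with the same mean loss rate. The transition probabilities are 
invented; the only point is that the two-state model produces correlated 
loss bursts - which is exactly the difference between "random loss" and 
something that starts to look like a short time disconnection.

import random

def gilbert_elliott(n, p_gb=0.01, p_bg=0.2, loss_good=0.0, loss_bad=0.5):
    """Yield n loss indicators (1 = lost) from a two-state Markov chain."""
    bad = False
    for _ in range(n):
        if bad:
            bad = random.random() >= p_bg   # stay bad with probability 1 - p_bg
        else:
            bad = random.random() < p_gb    # enter the bad state with probability p_gb
        yield int(random.random() < (loss_bad if bad else loss_good))

def longest_burst(losses):
    longest = run = 0
    for lost in losses:
        run = run + 1 if lost else 0
        longest = max(longest, run)
    return longest

random.seed(1)
ge = list(gilbert_elliott(100000))
rate = sum(ge) / float(len(ge))
iid = [int(random.random() < rate) for _ in range(100000)]   # same mean loss rate
print("GE : loss rate %.3f, longest burst %d" % (rate, longest_burst(ge)))
print("IID: loss rate %.3f, longest burst %d" % (sum(iid) / float(len(iid)),
                                                 longest_burst(iid)))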

I think the most important lesson from my experience last week is 
perhaps that we must not suppose one wireless problem to be more 
important than the others.

Of course, this mainly calls into question the opportunistic scheduling 
work which assumes that there is only Rayleigh fading - a useful, well 
behaved, periodic and predictable Rayleigh fading for evenly moving 
mobiles - and no other disturbance on the wireless channel.

Of course, many students earn their "hats" (doctorates) that way, but the 
more I think about it, the less I believe that this really reflects reality.

Detlef
>   
>> So, the basic strategy of "upper layers" to deal with short time
>> disconnections, or latencies more than average, is simply not to deal with
>> them - but to ignore them.
>>
>> What about a path change? Do we talk about a "short time disconnection" in
>> TCP, when a link on the path fails and the flow is redirected then? We
>> typically don't worry.
>>     
>
> Those delays are typically short enough that TCP handles them OK.  If
> we were looking at deploying TCP in an environment with common slow
> redirections, then we should certainly check that it handles those
> short time disconnections.
>
>   
>> To me, the problem is not the existence  - or non existence - of short time
>> disconnections at all but the question why we should _explicitly_ deal with
>> a phenomenon where no one worries about?
>>     
>
> The protocol needn't necessarily deal with them explicitly, but we
> should explicitly make sure that it handles them OK.
>
>   
>> Isn't it sufficient to describe the corruption probability?
>>     
>
> No, because that ignores the temporal correlation.  You say that the
> Gilbert-Elliot model isn't good enough, but an IID model is orders of
> magnitude worse.
>
> Cheers,
> Lachlan
>
>   


-- 
Detlef Bosau		Galileistraße 30	70565 Stuttgart
phone: +49 711 5208031	mobile: +49 172 6819937	skype: detlef.bosau	
ICQ: 566129673		http://detlef.bosau@web.de			



