[e2e] end2end-interest Digest, Vol 19, Issue 5

S. Keshav keshav at uwaterloo.ca
Wed Sep 7 15:44:19 PDT 2005

> Q: What exactly causes delay spikes / unduly often spurious timeouts in
> mobile wireless networks?

As your mail indicates, delay spikes are due to link-level retransmissions
(by what is called the 'Radio Link Protocol', or RLP), which attempt to hide
link losses from higher layers. They can _also_ be caused by opportunistic
scheduling, used, for example, in EvDO, where a mobile with a good channel
gets all the resources, causing delays for other mobiles.

A possible (and perhaps only?) way in which a TCP connection gets spurious
timeouts is as follows:

The channel is in a 'good' state wrt a particular mobile, so that the mobile
gets all its packets through with low RTT, and it sets its RTO small. Now,
if the channel goes into a bad state, the RTT jumps above that small RTO, so
the retransmission timer fires for a packet that is only delayed, not lost:
the mobile experiences a 'spurious' timeout.
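
To make the scenario concrete, here is a rough sketch in Python. It assumes
the usual smoothed-RTT/variance RTO estimator (the constants are the
standard 1/8, 1/4 and 4); the RTT samples, the size of the delay spike, and
the names rto_after and is_spurious are made up for illustration and are not
measurements from any real link.

```python
# Sketch of the standard TCP RTO estimator (SRTT + 4 * RTTVAR).
# All RTT values below are hypothetical, chosen only to show the effect.
ALPHA, BETA, K = 1 / 8, 1 / 4, 4  # smoothing gains and variance multiplier


def rto_after(rtt_samples, min_rto=0.0):
    """Return the RTO in seconds after a non-empty list of RTT samples."""
    srtt = rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement initializes both estimators.
            srtt, rttvar = r, r / 2
        else:
            # Update the variance first, using the old SRTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    return max(min_rto, srtt + K * rttvar)


def is_spurious(good_rtts, spike_rtt, min_rto=0.0):
    """True if a delayed (not lost) packet would outlive the current RTO."""
    return spike_rtt > rto_after(good_rtts, min_rto)


good = [0.100] * 50   # assumed 'good' state: steady 100 ms RTT
spike = 0.800         # assumed 'bad' state: one packet delayed to 800 ms

print("RTO after good period:", rto_after(good))
print("spurious with no RTO floor:", is_spurious(good, spike))
print("spurious with a 1 s RTO floor:", is_spurious(good, spike, min_rto=1.0))
```

With a steady RTT the variance term decays toward zero, so the RTO sits just
above the RTT and the spike overruns it. A floor on the RTO (the min_rto
parameter, an assumption here) hides the same spike.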

While plausible, this scenario may not actually play out in real life (and
in your simulations) for one or more of the following reasons:

1. The mobile may not move, so it does not change from a good to bad area,
staying all the time in a good or bad area.

2. The connection may be too short, so that during the lifetime of a
connection, the mobile does not change channel state.

3. The connection may be long, but the initial RTO may be so high that
during the connection lifetime the RTO is never short enough to be a problem,
even when the mobile has a 'bad' channel state.

4. The variations in the channel state may be so rapid that, from the
perspective of a connection, all it sees is the average channel state, so
the RTO is roughly correct.

5. The RTO may be wrong, but the timeout granularity may be coarse enough
(it used to be 500ms granularity) that even the shortest timeout is not
short enough to cause a spurious timeout.

6. Channel conditions may not differ so much in the 'good' and 'bad' states.
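
Caveat 5 can be sketched in a few lines of Python. This assumes a timer that
only fires on whole ticks, with the computed RTO rounded up to the next
tick; the function name coarse_rto, the 120 ms RTO, and the 400 ms delay
spike are hypothetical values chosen for illustration.

```python
import math


def coarse_rto(fine_rto, tick=0.5):
    """Round an RTO (seconds) up to a whole number of timer ticks, minimum one."""
    ticks = max(1, math.ceil(fine_rto / tick))
    return ticks * tick


fine_rto = 0.120   # assumed RTO computed during the 'good' state
spike = 0.400      # assumed RTT seen when the channel turns 'bad'

# With a fine-grained timer the spike exceeds the RTO: a spurious timeout.
print("fine timer spurious:", spike > fine_rto)

# With 500 ms ticks the effective timeout is at least one tick, so the
# same spike no longer triggers a timeout.
print("coarse timer spurious:", spike > coarse_rto(fine_rto))
```

The same spike that overruns the fine-grained RTO stays below the coarse
one, which is why an implementation with a slow timer may never show the
problem.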

So there are many reasons why, even with the known variations in channel
delay on a cell phone link, we may not see spurious timeouts. It would be
nice if a cell phone operator reading this list were to actually verify (or
contradict) this using real data traces.

Opportunistic scheduling will, I think, tend to exacerbate the differences
between 'good' and 'bad' channel state (caveat #6 above). However, the other
factors listed above may still prevent spurious timeouts.

I should add that I am not an expert in cell phone links, so my analysis
above is purely seat-of-the-pants, but it looks reasonable, at least to me.


