[e2e] Spurious Timeouts, Fact or Fake?

Detlef Bosau detlef.bosau at web.de
Wed Aug 3 07:08:54 PDT 2011


During the recent past, this list has seen quite a few posts 
regarding TCP RTT measurement.

Now, first of all, I was interested in how often RTT measurements should 
be made and how they can be made. A particular concern is Karn's algorithm, 
because to my understanding, its consequence is that RTT measurements 
obtained by a single RTT timer can be taken only when a sender has no 
retransmitted packets outstanding.

Perhaps, I'm wrong here.
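To make my (possibly mistaken) reading of Karn's rule concrete, here is a 
small Python sketch; the Segment record and its field names are 
illustrative assumptions, not taken from any real stack:

```python
from collections import namedtuple

# Illustrative segment record; the field names are assumptions.
Segment = namedtuple("Segment", ["first_send_time", "retransmitted"])

def rtt_sample_on_ack(segment, ack_time):
    """Karn's rule: take no RTT sample from a segment that was ever
    retransmitted, because the ACK cannot be attributed unambiguously
    to the original transmission or to the retransmission."""
    if segment.retransmitted:
        return None
    return ack_time - segment.first_send_time
```

So with a single RTT timer, sampling indeed stalls for as long as 
retransmitted segments are outstanding.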

However, from what I've read so far, it is not yet completely clear how 
often RTT measurements should be made. The alternatives discussed so 
far are:
- once per round trip,
- once per packet.

The latter appears appealing to me, particularly when implemented 
with timestamps (RFC 1323), which overcome the problem discussed by 
Karn & Partridge of packets being sent more than 
once. However, some literature indicates problems with the SRTT 
estimator when timestamps are in use.
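For illustration, per-packet sampling with timestamps can be sketched like 
this (the tsval/tsecr fields mirror the RFC 1323 option names; everything 
else, including the helper functions, is invented for this toy example):

```python
# Toy sketch of per-packet RTT sampling with RFC 1323-style timestamps:
# the sender puts a TSval into each segment, the receiver echoes it back
# as TSecr, and the sender reads RTT = now - TSecr off every ACK, even
# for retransmissions. This is not a real TCP stack.

def make_segment(seq, now):
    return {"seq": seq, "tsval": now}

def make_ack(segment, receiver_now):
    # The receiver echoes the timestamp of the segment it acknowledges.
    return {"ack": segment["seq"] + 1, "tsecr": segment["tsval"]}

def rtt_from_ack(ack, sender_now):
    return sender_now - ack["tsecr"]

seg = make_segment(seq=100, now=10.00)   # sent at t = 10.00 s
ack = make_ack(seg, receiver_now=10.08)  # received and acknowledged
rtt = rtt_from_ack(ack, sender_now=10.25)  # yields a 0.25 s sample
```

Because the echoed timestamp identifies the transmission that triggered the 
ACK, the retransmission ambiguity that Karn & Partridge address disappears.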

Now, the whole discussion is somewhat confusing to me.

Spurious timeouts are confusing to me, because spurious timeouts 
(i.e. a packet that is successfully delivered, but whose ACK does not 
reach the sender in time) are basically expected by Edge's paper 
and the literature based upon it. However, there are papers around 
which put the mere existence of spurious timeouts in question, e.g.
@techreport{vacirca2005spurious,
  author      = "Francesco Vacirca and Thomas Ziegler and Eduard Hasenleithner",
  title       = "{TCP Spurious Timeout Estimation in an Operational GPRS/UMTS Network}",
  institution = "Forschungszentrum Telekommunikation Wien",
  number      = "FTW-TR-2005-008",
  month       = "May",
  year        = "2005"
}
while others give detailed recommendations on how to deal with spurious 
timeouts in practical implementations, e.g.
http://tools.ietf.org/search/draft-allman-rto-backoff-02

However, to me the problem seems closely coupled to the underlying 
question of whether or not we can estimate the expectation and variance 
of the RTT in a TCP session. Edge requires the underlying stochastic 
process to be weakly stationary, i.e. to have a constant mean and an 
autocovariance that depends only on the lag. In other words: in a TCP 
session, once it has started and run for some settling time, the 
observed RTTs should be, at least roughly, identically distributed.

This distribution should be subject to only very slow and very rare 
change, if at all.

And according to RFC 2988, we can obtain SRTT and RTTVAR from RTT samples 
using the well-known EWMA estimators for this purpose.
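For reference, the RFC 2988 estimators can be sketched as follows; the 
constants (alpha = 1/8, beta = 1/4, K = 4, the 1 s minimum, the 3 s 
initial RTO) are the RFC's recommended values, while the clock 
granularity G and the class itself are just illustrative choices:

```python
# Minimal sketch of the RFC 2988 RTO computation (not a real TCP stack).
class RtoEstimator:
    ALPHA = 1.0 / 8   # gain for SRTT
    BETA = 1.0 / 4    # gain for RTTVAR
    K = 4
    G = 0.1           # clock granularity in seconds (assumed)
    RTO_MIN = 1.0     # RFC 2988 recommends a 1 second lower bound

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0  # initial RTO before any measurement

    def sample(self, r):
        """Feed one RTT measurement r (seconds), return the new RTO."""
        if self.srtt is None:
            # First measurement: SRTT <- R, RTTVAR <- R/2.
            self.srtt = r
            self.rttvar = r / 2.0
        else:
            # EWMA updates; note RTTVAR is updated *before* SRTT,
            # using the old SRTT value.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.RTO_MIN,
                       self.srtt + max(self.G, self.K * self.rttvar))
        return self.rto
```

Note how, for a 200 ms RTT with modest variance, the 1 second minimum 
dominates the computed RTO; whether that clamping masks or causes the 
observed timeout behaviour is part of what puzzles me.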


So, my questions are:

1.: How often should RTT measurements (RTTM) be made?
2.: Is it reasonable to assume "weakly stationary" RTTs as done by Edge?
3.: Are the EWMA filters from RFC 2988 satisfactory? In particular, are 
they sufficiently generic to yield reasonable results for an arbitrary 
TCP session?

One could summarize these into the question: Do we obtain the RTO in a 
reasonable way? And when we talk about spurious timeouts, are we talking 
about spurious timeouts - or are we talking about shortcomings of the 
SRTT and RTTVAR estimators here?

I'm somewhat confused here at the moment. And I would appreciate any 
enlightenment ;-)

Detlef


-- 
------------------------------------------------------------------
Detlef Bosau
Galileistraße 30	
70565 Stuttgart                            Tel.:   +49 711 5208031
                                            mobile: +49 172 6819937
                                            skype:     detlef.bosau
                                            ICQ:          566129673
detlef.bosau at web.de                     http://www.detlef-bosau.de
------------------------------------------------------------------



