[e2e] Agility of RTO Estimates, stability, vulnerabilities

Sun Jul 24 02:08:55 PDT 2005

I'm somewhat puzzled that apparently hardly anyone is interested in this 
topic. Perhaps it's a stupid one. If so, please explain to me why.

Perhaps I did not pose my questions clearly enough. I will give it 
another try.

Q1: What are the semantics of RTO? Is it correct to see RTO as a 
confidence interval for the RTT?

If so, I'm particularly confused by quite a few papers concerning 
"spurious timeouts". Sometimes I got the impression that spurious 
timeouts are some "strange phenomenon" which was "detected" by chance 
or by accident. If the RTO were a confidence bound, a certain fraction 
of spurious timeouts would simply be expected. I don't know.

In addition, years ago a professor told me the formula RTO=RTT+2VAR was 
found by "probing", i.e. by experiments.

O.k., he is the professor, not me. So he must be right there ;-)

So once again: Is RTO commonly seen as a confidence interval for RTT or 
not?

Craig wrote:

>> I believe the immediate issue is not the "RTO model" but rather the
>> question of what RTO estimator you use.  In the late 1980s there was
>> a crisis of confidence in RTO estimators -- a problem we dealt with by

What's the meaning of confidence here?

I use "confidence interval" in its mathematical sense. An interval I is 
a p confidence interval for a stochastic variable X if an instance x of 
X is in the interval I with probability p.

To my understanding, it's important for competing TCP flows to use 
similar or equal confidence intervals here, otherwise we would hardly 
achieve fairness.

So, we have basically two issues here.

The first one is the robustness issue. How robust are RTO/RTT/.. estimates?

I don't want to discuss this here, because this is basically not a 
TCP-related question. It is the question whether it is at least 
_possible_ to estimate an RTT, and this is a requirement for the network 
itself and its structure. To my knowledge, there are quite a few papers 
around dealing with "self similarity". I'm not quite sure, but if e2e 
latencies were in fact "self similar" (I use "" here because the term 
"self similar" is sometimes used without a satisfactory mathematical 
definition), we could stop the discussion here. In that case, there 
would be hardly any chance to have acceptable RTT estimates. (I'm no 
expert here, but estimators often converge due to the SLLN or similar 
theorems, and these assume the samples to be "i.i.d.": _independent_ 
and identically distributed. In a self similar series of stochastic 
variables I _strongly_ doubt their independence.)

So, at least _one_ assumption about the network is inevitable in order 
to use sender-initiated, timeout-based retransmission: Convergent 
estimators for the timeout must exist.

Unfortunately, a priori we do not know about possible limitations of 
RTT; in particular, there is no general upper limit. So it is somewhat 
cumbersome to derive a 1-alpha confidence interval directly from the 
sample here. In fact, it is a common approach in statistics to derive 
confidence intervals from estimates for the expectation and variance of 
a stochastic variable. Often there is some implicit assumption about the 
distribution function of this variable, e.g. Gaussian.
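
To make this approach concrete, here is a minimal sketch, assuming 
(purely for illustration) that RTT samples are i.i.d. and roughly 
Gaussian. It derives a one-sided 1-alpha upper bound from the sample 
mean and standard deviation. The function name and the sample values are 
hypothetical, and the plug-in estimate ignores the estimation error in 
mean and standard deviation.

```python
from statistics import NormalDist, mean, stdev

def upper_confidence_bound(samples, alpha=0.05):
    """One-sided (1 - alpha) upper bound for a single observation.

    ASSUMPTION: samples are i.i.d. Gaussian. This is not a claim about
    real RTTs; it is exactly the implicit assumption under discussion.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    return mean(samples) + z * stdev(samples)

# Hypothetical RTT samples in seconds.
rtt_samples = [0.100, 0.110, 0.095, 0.120, 0.105, 0.098, 0.115]
bound = upper_confidence_bound(rtt_samples, alpha=0.05)
print(f"95% upper bound: {bound:.3f} s")
```

A smaller alpha gives a larger bound, i.e. fewer spurious timeouts at 
the price of slower loss detection.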

So, if we use RTT and VAR (as we do in TCP), we implicitly assume that 
estimators for RTT and VAR _exist_.

But in principle, these estimators are not defined by TCP, they are 
_assumed_ by TCP. Basically, we _assume_ the existence of RTT/VAR/RTO 
estimators here and then we use them. And hopefully, we use appropriate 
ones for the packet switched network in use.
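
For reference, one concrete estimator of this kind is the smoothing 
scheme written down in RFC 2988. The following is only a sketch of that 
scheme, not a statement about what any particular TCP stack does; the 
class and parameter names are made up. Note that RTTVAR there is a 
smoothed mean deviation rather than a variance, and the multiplier is 4, 
so it is not literally the "RTT+2VAR" formula mentioned above.

```python
class RtoEstimator:
    """SRTT/RTTVAR smoothing in the style of RFC 2988 (illustrative sketch)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4          # multiplier for the variation term

    def __init__(self, min_rto=1.0, granularity=0.0):
        self.srtt = None                # no measurement seen yet
        self.rttvar = None
        self.min_rto = min_rto          # lower bound on the RTO, seconds
        self.granularity = granularity  # clock granularity G, seconds

    def update(self, r):
        """Feed one RTT measurement r (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initialises both estimates.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        return self.rto()

    def rto(self):
        """Current RTO; only valid after at least one update()."""
        raw = self.srtt + max(self.granularity, self.K * self.rttvar)
        return max(self.min_rto, raw)


est = RtoEstimator(min_rto=0.0)  # lower bound disabled to expose the raw value
for sample in (0.100, 0.100, 0.300, 0.100):  # hypothetical samples, seconds
    print(f"rtt={sample:.3f}  rto={est.update(sample):.3f}")
```

Nothing in this recursion states which confidence level the resulting 
RTO corresponds to; that depends entirely on the latency distribution 
the network actually delivers.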

So once again, and very briefly:

The RTO used in TCP is a confidence interval for RTT.
TCP _assumes_ (if only implicitly) the existence of a reasonable RTO 
estimator.

Is this correct?

O.k.

Then the next steps are:
-Identification of a generic estimator, if possible.
-Identification and elimination of vulnerabilities.

>> developing Karn's algorithm (to deal with retransmission ambiguity) and
>> improving the RSRE estimation algorithm with Van Jacobson's replacement.

O.k. Let's ignore the retransmission ambiguity for the moment.
(An easy way to overcome this would be to mark each TCP datagram sent 
with a unique identifier, which is reflected by the corresponding ACK. 
In particular, if a TCP datagram is sent more than once, it would be 
given a different identifier each time it is sent. AFAIK this is the 
rationale behind the "sequence number" in ICMP.)
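
The identifier idea can be sketched as follows. This is a hypothetical 
illustration only: the class, its method names and the notion of an 
"echoed id" are assumptions made for the example and are not part of 
TCP as specified. (The TCP timestamps option achieves a similar effect 
by having the receiver echo a sender timestamp.)

```python
import itertools

class TransmissionLog:
    """Tags every transmission, including retransmissions, with a fresh id."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._sent = {}  # transmission id -> (sequence number, send time)

    def on_send(self, seq, now):
        """Record a (re)transmission of segment seq and return its unique id."""
        tx_id = next(self._next_id)
        self._sent[tx_id] = (seq, now)
        return tx_id

    def on_ack(self, echoed_id, now):
        """Return an unambiguous RTT sample, or None if the id is unknown."""
        entry = self._sent.pop(echoed_id, None)
        if entry is None:
            return None
        _seq, sent_at = entry
        return now - sent_at


log = TransmissionLog()
first = log.on_send(seq=1000, now=0.0)  # original transmission
retx = log.on_send(seq=1000, now=1.0)   # retransmission of the same data
sample = log.on_ack(retx, now=1.2)      # the ACK names the copy it answers
print(f"RTT sample for the retransmission: {sample:.2f} s")
```

Because the ACK names the exact copy it answers, the sample can be used 
even for retransmitted data, which is what Karn's algorithm has to 
discard.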

Q2: What are other vulnerabilities and implicit assumptions?

-Are there assumptions concerning the latency distribution?
-Are there assumptions concerning the latency _stability_? What about 
latency oscillations?

In other words: What is the system model behind the RTT estimators used 
in TCP?

What are the _requirements_ for TCP to work properly? Can we make 
implicit assumptions explicit?

Which requirements must be met by a network so that TCP can work without 
problems?

Is this question stupid? If not: Is there existing work on this issue? 
If so, I would appreciate any hint.

Detlef Bosau

-- 
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937