From detlef.bosau at web.de  Mon Jan  1 04:48:19 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 01 Jan 2007 13:48:19 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
Message-ID: <45990313.1050003@web.de>

Fred Baker wrote:
>
>
> I wonder where you got the notion that a typical session had a 10 ms 
> RTT. In a LAN environment where the servers are in the same
It's just a number. What matters is the magnitude, not the exact number. 
In your example below you have RTTs of about 
20 ms,
100 ms,
200 ms.

If we again consider one outstanding segment of 12000 bit, we get 
approximately the following rates:
600 kbps
120 kbps
60 kbps
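
The rates above are simply the segment size divided by the RTT. A minimal sketch in Python, assuming exactly one 12000-bit segment in flight per round trip and ignoring header overhead and serialization delay (the function name is only illustrative):

```python
def stop_and_wait_rate_kbps(segment_bits, rtt_ms):
    """Throughput in kbit/s when one segment is outstanding per RTT."""
    # bits per millisecond is numerically the same as kbit per second
    return segment_bits / rtt_ms


for rtt_ms in (20, 100, 200):
    rate = stop_and_wait_rate_kbps(12000, rtt_ms)
    print(f"{rtt_ms:>3} ms RTT -> {rate:.0f} kbps")
```

With a 10 ms RTT the same function gives the 1.2 Mbps figure from the start of the thread.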

It's not a question of whether this is optimal. The question is: Does 
this happen in a relevant number of cases?
Particularly in downloads from mobile devices, which quite often do not 
offer larger bandwidth.

So to your question why TCP should tune itself to only one outstanding 
segment: The reason could be the limited bandwidth the node, e.g. a 
mobile node, can handle.


Detlef


From detlef.bosau at web.de  Mon Jan  1 04:49:41 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 01 Jan 2007 13:49:41 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu>
Message-ID: <45990365.4080708@web.de>

Medhi, Deep wrote:
> See 
>
> John Heidemann, Katia Obraczka, and Joe Touch. "Modeling the Performance of HTTP Over Several Transport Protocols." ACM/IEEE Transactions on Networking, vol. 5, pp. 616-630, October, 1997. 
>
> This covers maximum usable window size for different transmission media.
>
> 	-- Deep
>   
Unfortunately, I don't have an ACM account. Is it possible to send me a 
copy? Perhaps Joe?

Thanks a lot!


From detlef.bosau at web.de  Mon Jan  1 08:36:27 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 01 Jan 2007 17:36:27 +0100
Subject: [e2e] Thanks a lot for the copies! Re: Are we doing sliding window
 in the Internet?
In-Reply-To: <45990365.4080708@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu>
	<45990365.4080708@web.de>
Message-ID: <4599388B.2050904@web.de>

I just received two copies of the paper. Many thanks to all!


Detlef

Detlef Bosau wrote:
> Medhi, Deep wrote:
>> See
>> John Heidemann, Katia Obraczka, and Joe Touch. "Modeling the 
>> Performance of HTTP Over Several Transport Protocols." ACM/IEEE 
>> Transactions on Networking, vol. 5, pp. 616-630, October, 1997.
>> This covers maximum usable window size for different transmission media.
>>
>>     -- Deep
>>   
> Unfortunately, I don't have an ACM account. Is it possible to send me 
> a copy? Perhaps Joe?
>
> Thanks a lot!
>
>


From pingali at ISI.EDU  Tue Jan  2 10:31:29 2007
From: pingali at ISI.EDU (Venkata Pingali)
Date: Tue, 02 Jan 2007 10:31:29 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
Message-ID: <459AA501.8050901@isi.edu>


A few months back we collected some
per-connection data in both client
and server modes. We thought you might
be interested in the preliminary results.

We collected data in two modes/configurations.
In the client mode we configured Apache to
be a web proxy  and in the server mode we
configured Apache to serve an actual website.
The basic results, which should be considered
only as indicative of reality, are
as follows:

Server end (i.e., the end that has a large
amount of data to transfer):

     - Most connections are short (90% < 1sec)
     - MaxCwnd is < 5KB in > 80% of cases
     - MaxRTT is distributed almost uniformly
       in the 0-400ms range.

Client end (i.e., the end receiving data):

     - ~ 90% of connections see MaxCwnd < 5KB
     - < 1% connections see MaxCwnd > 10KB
     - 90% of connections have MaxRTT < 100ms

There are some problems with the data:

     - limited scenarios (web based)
     - small sample sizes (21K for server, 150K
       for client)
     - the website has non-standard distribution
       of file types and sizes

You can find the various graphs here:
http://www.isi.edu/aln/e2e.ppt

Venkata Pingali
http://www.isi.edu/aln


Fred Baker wrote:
> yes and no.
> 
> A large percentage of sessions are very short - count the bytes in this 
> email and consider how many TCP segments are required to carry it, for 
> example, or look through your web cache to see the sizes of objects it 
> stores. We are doing the sliding window algorithm, but it cuts very 
> short when the TCP session abruptly closes.
> 
> For longer exchanges - p2p and many others - yes, we indeed do sliding 
> window.
> 
> I don't see any reason to believe that TCPs tune themselves to have 
> exactly RTT/MSS segments outstanding. That would be the optimal number 
> to have outstanding, but generally they will have the smallest of { the 
> offered window, the sender's maximum window, and the used window at 
> which they start dropping traffic }. If they never see loss, they can 
> keep an incredibly large amount of data outstanding regardless of the 
> values of RTT and MSS.
> 
> I wonder where you got the notion that a typical session had a 10 ms 
> RTT. In a LAN environment where the servers are in the same building, 
> that is probably the case. But consider these rather more typical 
> examples: across my VPN to a machine at work, across the US to MIT, and 
> across the Atlantic to you:
> 
> [stealth-10-32-244-218:~] fred% traceroute irp-view7
> traceroute to irp-view7.cisco.com (171.70.65.144), 64 hops max, 40 byte 
> packets
> 1  fred-vpn (10.32.244.217)  1.486 ms  1.047 ms  1.034 ms
> 2  n003-000-000-000.static.ge.com (3.7.12.1)  22.360 ms  20.962 ms  
> 22.194 ms
> 3  10.34.251.137 (10.34.251.137)  23.559 ms  22.586 ms  22.236 ms
> 4  sjc20-a5-gw2 (10.34.250.78)  21.465 ms  22.544 ms  20.748 ms
> 5  sjc20-sbb5-gw1 (128.107.180.105)  22.294 ms  22.351 ms  22.803 ms
> 6  sjc20-rbb-gw5 (128.107.180.22)  21.583 ms  22.517 ms  24.190 ms
> 7  sjc12-rbb-gw4 (128.107.180.2)  22.115 ms  23.143 ms  21.478 ms
> 8  sjc5-sbb4-gw1 (171.71.241.253)  26.550 ms  23.122 ms  21.569 ms
> 9  sjc12-dc5-gw2 (171.71.241.66)  22.115 ms  22.435 ms  22.185 ms
> 10  sjc5-dc3-gw2 (171.71.243.46)  22.031 ms  21.846 ms  22.185 ms
> 11  irp-view7 (171.70.65.144)  22.760 ms  22.912 ms  21.941 ms
> 
> [stealth-10-32-244-218:~] fred% traceroute www.mit.edu
> traceroute to www.mit.edu (18.7.22.83), 64 hops max, 40 byte packets
> 1  fred-vpn (10.32.244.217)  1.468 ms  1.108 ms  1.083 ms
> 2  172.16.16.1 (172.16.16.1)  11.994 ms  10.351 ms  10.858 ms
> 3  cbshost-68-111-47-251.sbcox.net (68.111.47.251)  9.238 ms  19.517 ms  
> 9.857 ms
> 4  12.125.98.101 (12.125.98.101)  11.849 ms  11.913 ms  12.086 ms
> 5  gbr1-p100.la2ca.ip.att.net (12.123.28.130)  12.348 ms  11.736 ms  
> 12.891 ms
> 6  tbr2-p013502.la2ca.ip.att.net (12.122.11.145)  15.071 ms  13.462 ms  
> 13.453 ms
> 7  12.127.3.221 (12.127.3.221)  12.643 ms  13.761 ms  14.345 ms
> 8  br1-a3110s9.attga.ip.att.net (192.205.33.230)  13.842 ms  12.414 ms  
> 12.647 ms
> 9  ae-32-54.ebr2.losangeles1.level3.net (4.68.102.126)  16.651 ms 
> ae-32-56.ebr2.losangeles1.level3.net (4.68.102.190)  20.154 ms *
> 10  * * *
> 11  ae-2.ebr1.sanjose1.level3.net (4.69.132.9)  28.222 ms  24.319 ms 
> ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2)  35.417 ms
> 12  ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2)  25.640 ms  22.567 ms *
> 13  ae-3.ebr1.denver1.level3.net (4.69.132.58)  52.275 ms  60.821 ms  
> 54.384 ms
> 14  ae-3.ebr1.chicago1.level3.net (4.69.132.62)  68.285 ms 
> ae-1-100.ebr2.denver1.level3.net (4.69.132.38)  59.113 ms  68.779 ms
> 15  * * *
> 16  * ae-7-7.car1.boston1.level3.net (4.69.132.241)  94.977 ms *
> 17  ae-7-7.car1.boston1.level3.net (4.69.132.241)  95.821 ms 
> ae-11-11.car2.boston1.level3.net (4.69.132.246)  93.856 ms 
> ae-7-7.car1.boston1.level3.net (4.69.132.241)  96.735 ms
> 18  ae-11-11.car2.boston1.level3.net (4.69.132.246)  91.093 ms  92.125 
> ms 4.79.2.2 (4.79.2.2)  95.802 ms
> 19  4.79.2.2 (4.79.2.2)  93.945 ms  95.336 ms  97.301 ms
> 20  w92-rtr-1-backbone.mit.edu (18.168.0.25)  98.246 ms www.mit.edu 
> (18.7.22.83)  93.657 ms w92-rtr-1-backbone.mit.edu (18.168.0.25)  92.610 ms
> 
> [stealth-10-32-244-218:~] fred% traceroute web.de
> traceroute to web.de (217.72.195.42), 64 hops max, 40 byte packets
> 1  fred-vpn (10.32.244.217)  1.482 ms  1.078 ms  1.093 ms
> 2  172.16.16.1 (172.16.16.1)  12.131 ms  9.318 ms  8.140 ms
> 3  cbshost-68-111-47-251.sbcox.net (68.111.47.251)  10.790 ms  9.051 ms  
> 10.564 ms
> 4  12.125.98.101 (12.125.98.101)  13.580 ms  21.643 ms  12.206 ms
> 5  gbr2-p100.la2ca.ip.att.net (12.123.28.134)  12.446 ms  12.914 ms  
> 12.006 ms
> 6  tbr2-p013602.la2ca.ip.att.net (12.122.11.149)  13.463 ms  12.711 ms  
> 12.187 ms
> 7  12.127.3.213 (12.127.3.213)  185.324 ms  11.845 ms  12.189 ms
> 8  192.205.33.226 (192.205.33.226)  12.008 ms  11.665 ms  25.390 ms
> 9  ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65)  13.695 ms 
> ae-1-51.bbr1.losangeles1.level3.net (4.68.102.1)  11.645 ms 
> ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65)  12.517 ms
> 10  ae-1-0.bbr1.frankfurt1.level3.net (212.187.128.30)  171.886 ms 
> as-2-0.bbr2.frankfurt1.level3.net (4.68.128.169)  167.640 ms  168.895 ms
> 11  ge-10-0.ipcolo1.frankfurt1.level3.net (4.68.118.9)  170.336 ms 
> ge-11-1.ipcolo1.frankfurt1.level3.net (4.68.118.105)  174.211 ms 
> ge-10-1.ipcolo1.frankfurt1.level3.net (4.68.118.73)  169.730 ms
> 12  gw-megaspace.frankfurt.eu.level3.net (212.162.44.158)  169.276 ms  
> 170.110 ms  168.099 ms
> 13  te-2-3.gw-backbone-d.bs.ka.schlund.net (212.227.120.17)  171.412 ms  
> 171.820 ms  170.265 ms
> 14  a0kac2.gw-distwe-a.bs.ka.schlund.net (212.227.121.218)  175.416 ms  
> 173.653 ms  174.007 ms
> 15  ha-42.web.de (217.72.195.42)  174.908 ms  174.921 ms  175.821 ms
> 
> 
> On Dec 31, 2006, at 11:15 AM, Detlef Bosau wrote:
> 
>> Happy New Year, Miss Sophy My Dear!
>>
>> (Although this sketch is in English, it is hardly known outside 
>> Germany to my knowledge.)
>>
>> I wonder whether we're really doing sliding window in TCP connections 
>> all the time or whether a number of connections have congestion 
>> windows of only one segment, i.e. behave like stop'n wait in reality.
>>
>> When I assume an  Ethernet like MTU, i.e. 1500 byte = 12000 bit, and 
>> 10 ms RTT the throughput is roughly 12000 bit / 10 ms = 1.2 Mbps.
>>
>> From this I would expect that in quite a few cases a TCP connection 
>> will have a congestion window of 1 MSS or even less.
>>
>> In addition, some weeks ago I read a paper, I don't remember where, 
>> that we should reconsider and perhaps resize our MTUs to larger values 
>> for networks with large bandwidth. The rationale was simply as 
>> follows: The MTU size is always a tradeoff between overhead and 
>> jitter. From Ethernet we know that we can accept a maximum packet 
>> duration of 12000 bit / (10 Mbps) = 1.2 ms and the resulting jitter. 
>> For Gigabit Ethernet, a maximum packet duration of 1.2 ms would 
>> result in an MTU size of 1.2 Mbit = 150 kbyte.
>>
>> If so, we would see "stop'n wait like" connections much more 
>> frequently than today.
>>
>> Is this view correct?
>>
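
The MTU argument quoted above can be checked with a few lines of arithmetic. This is only a sketch, assuming the packet duration is pure serialization time (size divided by link rate) and using hypothetical helper names; by this arithmetic a 1.2 ms packet on a 1 Gbps link works out to roughly 150 kbyte.

```python
def packet_duration_us(mtu_bytes, link_bps):
    """Serialization time of one MTU-sized packet, in microseconds."""
    return mtu_bytes * 8 * 1_000_000 // link_bps


def mtu_bytes_for_duration(duration_us, link_bps):
    """Largest packet (in bytes) whose serialization time is duration_us."""
    return duration_us * link_bps // (8 * 1_000_000)


ethernet_bps = 10_000_000       # classic 10 Mbps Ethernet
gigabit_bps = 1_000_000_000     # Gigabit Ethernet

duration = packet_duration_us(1500, ethernet_bps)
print("1500 byte at 10 Mbps:", duration, "us")
print("same duration at 1 Gbps:",
      mtu_bytes_for_duration(duration, gigabit_bps), "byte")
```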


From detlef.bosau at web.de  Tue Jan  2 11:52:03 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Tue, 02 Jan 2007 20:52:03 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459AA501.8050901@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu>
Message-ID: <459AB7E3.7010705@web.de>

Venkata Pingali wrote:
>
>
> Server end (i.e, end that has large
> amount of data to transfer):
>
>     - Most connections are short (90% < 1sec)

Do you have any knowledge of the number of "rounds" the TCP connection 
has seen?  A couple of years ago I saw some similar result (don't know the 
source at the moment) where 90 % of connections consist of not more than 
20 packets.

Now, consider the initial slow start. IIRC we start with 2 MSS (?), then 
we have:

Round    CWND
    1              2
    2              4
    3              8
                            total of 14 packets up to now
    4             16
                            total of 30 packets up to now,

thus many flows will finish before the end of the fourth round which 
would correspond to a CWND of about 6 kByte, 1500 byte MSS assumed.

In short: Quite a few connections are finished before the end of 
the first slow start period.

Does this match your observations?
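
The slow-start schedule above can be reproduced with a short sketch. It assumes an idealized sender: an initial window of 2 MSS, cwnd doubling every round trip, no loss, no delayed ACKs, and a receiver window that never limits the sender. The function names are illustrative only. Under these assumptions the running total after round 4 comes to 30 packets, so a 20-packet flow ends during the fourth round.

```python
def doubling_schedule(initial_cwnd=2, rounds=4):
    """Return (round, cwnd, cumulative packets) for idealized slow start."""
    schedule, cwnd, total = [], initial_cwnd, 0
    for rnd in range(1, rounds + 1):
        total += cwnd                   # packets sent during this round
        schedule.append((rnd, cwnd, total))
        cwnd *= 2                       # +1 MSS per ACK, one ACK per segment
    return schedule


def rounds_needed(packets, initial_cwnd=2):
    """Number of round trips needed to send `packets` segments."""
    cwnd, total, rnd = initial_cwnd, 0, 0
    while total < packets:
        rnd += 1
        total += cwnd
        cwnd *= 2
    return rnd


for rnd, cwnd, total in doubling_schedule():
    print(f"round {rnd}: cwnd={cwnd:2d} MSS, {total:2d} packets so far")
print("rounds for a 20-packet flow:", rounds_needed(20))
```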

>     - MaxCwnd is < 5KB in > 80% of cases
>     - MaxRTT is distributed almost uniformly
>       in the 0-400ms range.
>
> Client end (i.e., the end receiving data):
>
>     - ~ 90% of connections see MaxCwnd < 5KB
>     - < 1% connections see MaxCwnd > 10KB
>     - 90% of connections have MaxRTT < 100ms
>

Oh, I love it :-)

Last year I had a long argument with someone who told me about the 
benefits of window scaling :-) He talked about extremely large CWNDs of 
several dozen or hundreds of MByte :-)

O.k., that's a different story because we are talking about greedy 
sources then. However, if that colleague was the only one to activate 
window scaling while surfing from the US and A to good ol' Europe, and 
Cisco et al. had buried hundreds of megabytes of useless queue memory in 
their hardware *blush*, this guy perhaps filled the queues for the first 
time ever, following the good old paradigm "Keep the queue full", and 
that way of course outperformed his competitors hopelessly ;-)

> There are some problems with the data:
>
>     - limited scenarios (web based)
>     - small sample sizes (21K for server, 150K
>       for client)
>     - the website has non-standard distribution
>       of file types and sizes
>

At least it exists. And reality is often more convincing than standards. 
Particularly in cases where both disagree.


> You can find the various graphs here:
> http://www.isi.edu/aln/e2e.ppt

Just a question: Is it possible to export those slides to a commonly 
readable format like PDF? I don't have any M$ products in use here and 
when I open PowerPoint slides with OpenOffice the results are sometimes 
interesting, sometimes surprising, sometimes hopeless, but nearly always 
quite different from what you wrote :-)

Regards

Detlef


From pingali at ISI.EDU  Tue Jan  2 12:29:55 2007
From: pingali at ISI.EDU (Venkata Pingali)
Date: Tue, 02 Jan 2007 12:29:55 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459AB7E3.7010705@web.de>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
Message-ID: <459AC0C3.30103@isi.edu>

Detlef Bosau wrote:
> Venkata Pingali wrote:
>>
>>
>> Server end (i.e, end that has large
>> amount of data to transfer):
>>
>>     - Most connections are short (90% < 1sec)
> 
> Do you have any knowledge of the number of "rounds" the TCP connection 
> has seen?  A couple of years ago I saw some similar result (don't know the 
> source at the moment) where 90 % of connections consist of not more than 
> 20 packets.


Our sample shows that 94% of connections
have < 20 packets - when observed from the
server end.

     Number of Packets       Percentile of Connections

      3                         4%
      4                         55%
      5                         69%
      10                        87%
      20                        94%


I have included the new graph and generated pdfs.

http://www.isi.edu/aln/e2e.pdf
http://www.isi.edu/aln/e2e.ppt

> 
> Now, consider the initial slow start. IIRC we start with 2 MSS (?), then 
> we have:
> 
> Round    CWND
>    1              2
>    2              4
>    3              8
>                            total of 14 packets up to now
>    4             16
>                            total of 30 packets up to now,
> 
> thus many flows will finish before the end of the fourth round which 
> would correspond to a CWND of about 6 kByte, 1500 byte MSS assumed.
> 
> In short: Quite a few connections are finished before the end of 
> the first slow start period.
> 
> Does this match your observations?

Yes.

About 90-95% finished before slow start
completed - often within the first two round
trips. About 3-4% of connections lasted for a
long time (several secs - minutes). But there
is an interesting category of connections that
last beyond the slow start but not for very
long. These connections, it turns out, carry a large
chunk of the data (40+%) and most of the time in
these connections is spent in slow start.


> 
>>     - MaxCwnd is < 5KB in > 80% of cases
>>     - MaxRTT is distributed almost uniformly
>>       in the 0-400ms range.
>>
>> Client end (i.e., the end receiving data):
>>
>>     - ~ 90% of connections see MaxCwnd < 5KB
>>     - < 1% connections see MaxCwnd > 10KB
>>     - 90% of connections have MaxRTT < 100ms
>>
> 
> Oh, I love it :-)
> 
> Last year I had a long argument with someone who told me about the 
> benefits of window scaling :-) He talked about extremely large CWNDs by 
> several dozens or hundreds of MByte :-)

Don't know if it is correct to extrapolate from the
sample that we have, but the MaxCwnd graph seems to
plateau as the connection length increases (bytes
or packets).


> 
> O.k., that's a different story because we are talking about greedy 
> sources then. However, if that colleague was the only one to activate 
> window scaling while surfing from the US and A to good ol' Europe, and 
> Cisco et al. had buried hundreds of megabytes of useless queue memory in 
> their hardware *blush*, this guy perhaps filled the queues for the first 
> time ever, following the good old paradigm "Keep the queue full", and 
> that way of course outperformed his competitors hopelessly ;-)
> 
>> There are some problems with the data:
>>
>>     - limited scenarios (web based)
>>     - small sample sizes (21K for server, 150K
>>       for client)
>>     - the website has non-standard distribution
>>       of file types and sizes
>>
> 
> At least it exists. And reality is often more convincing than standards. 
> Particularly in cases where both disagree.
> 
> 
>> You can find the various graphs here:
>> http://www.isi.edu/aln/e2e.ppt
> 
> Just a question: Is it possible to export those slides to a commonly 
> readable format like PDF? I don't have any M$ products in use here and 
> when I open PowerPoint slides with OpenOffice the results are sometimes 
> interesting, sometimes surprising, sometimes hopeless, but nearly always 
> quite different from what you wrote :-)
> 
> Regards
> 
> Detlef
> 


From touch at ISI.EDU  Tue Jan  2 16:14:50 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 02 Jan 2007 16:14:50 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459AB7E3.7010705@web.de>
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>
Message-ID: <459AF57A.5080304@isi.edu>


Detlef Bosau wrote:
> Venkata Pingali wrote:
>>
>>
>> Server end (i.e, end that has large
>> amount of data to transfer):
>>
>>     - Most connections are short (90% < 1sec)
> 
> Do you have any knowledge of the number of "rounds" the TCP connection
> has seen?  A couple of years ago I saw some similar result (don't know the
> source at the moment) where 90 % of connections consist of not more than
> 20 packets.
> 
> Now, consider the initial slowstart, IIRC we start with 2 MSS (?) then
> we have:

I don't know if the current code starts with 2 MSS; it could start with 4.

> Round    CWND
>    1              2
>    2              4
>    3              8
>                            total of 14 packets up to now
>    4             16
>                            total of 30 packets up to now,

It doesn't double each RTT; it goes up by 50%. Remember, the window
grows by one MSS each ACK during the initial phase, but there is one ACK
for each two MSS's.

I.e., the sequence should be:

round	CWND
1	2	(assuming it starts with 2)
2	3
3	4
4	6
5	9
6	13

This assumes that congestion avoidance hasn't kicked in, at which point
the growth would be 1 MSS per round (RTT).

FYI, Internet MSS's are usually in the 500-byte range. A 5KB
file would take 10 packets and be over by the 4th round.
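
The delayed-ACK sequence above can be reproduced with a small sketch. It assumes one ACK per two full segments, one MSS of growth per ACK, no loss, and that an odd trailing segment does not trigger an extra ACK within the round (which is what the listed sequence implies). The function names are illustrative only.

```python
def delayed_ack_windows(initial_cwnd=2, rounds=6):
    """cwnd (in MSS) at the start of each round, one ACK per two segments."""
    windows, cwnd = [], initial_cwnd
    for _ in range(rounds):
        windows.append(cwnd)
        cwnd += cwnd // 2       # cwnd // 2 ACKs arrive, each adds one MSS
    return windows


def rounds_to_send(file_bytes, mss_bytes, initial_cwnd=2):
    """Round trips needed to send a file under the same growth rule."""
    packets = -(-file_bytes // mss_bytes)    # ceiling division
    cwnd, sent, rnd = initial_cwnd, 0, 0
    while sent < packets:
        rnd += 1
        sent += cwnd
        cwnd += cwnd // 2
    return rnd


print("cwnd per round:", delayed_ack_windows())
print("rounds for 5000 bytes at 500-byte MSS:", rounds_to_send(5000, 500))
```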

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 250 bytes
Desc: OpenPGP digital signature
Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20070102/73b105f8/signature.bin

From touch at ISI.EDU  Tue Jan  2 18:55:05 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 02 Jan 2007 18:55:05 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
References: <45980C60.9020405@web.de>	
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>	
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
Message-ID: <459B1B09.40301@isi.edu>


Lachlan Andrew wrote:
> Greetings,
> 
> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
>>
>> Detlef Bosau wrote:
>> > Round    CWND
>> >    1              2
>> >    2              4
>> >    3              8
>>
>> It doesn't double each RTT; it goes up by 50%. Remember, the window
>> grows by one MSS each ACK during the initial phase, but there is one ACK
>> for each two MSS's.
> 
> If you have ABC (as recent Linux senders do by default), or don't use

ABC is EXPERIMENTAL.

> delayed ACKs (as Linux receivers don't when the window is small),

Delayed ACKs are strongly encouraged.

Both good reasons to fix these bugs in Linux.

> Detlef was right that it doubles each RTT.

Right - noncompliant or nonstandard implementations can do various other
things.

Joe


From ian.mcdonald at jandi.co.nz  Tue Jan  2 19:58:09 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Wed, 3 Jan 2007 16:58:09 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B1B09.40301@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
Message-ID: <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com>

> > If you have ABC (as recent Linux senders do by default), or don't use
>
> ABC is EXPERIMENTAL.
>
And ABC is now off by default on even later kernels because, given how
it interacted with the rest of the code base, the congestion window
basically didn't grow.

Can't comment on the delayed ACKs as I don't know that part of the code so well.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From touch at ISI.EDU  Tue Jan  2 20:11:05 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 02 Jan 2007 20:11:05 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com>
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu>	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	<459B1B09.40301@isi.edu>
	<5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com>
Message-ID: <459B2CD9.3030509@isi.edu>

Ian McDonald wrote:
>> > If you have ABC (as recent Linux senders do by default), or don't use
>>
>> ABC is EXPERIMENTAL.
>>
> And ABC is now off by default on even later kernels as basically the
> congestion window didn't grow with how the whole code base interacted.

That's not how "experimental" is intended by the IETF, i.e., it's not a
patch to other bugs.

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From ian.mcdonald at jandi.co.nz  Tue Jan  2 20:25:38 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Wed, 3 Jan 2007 17:25:38 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B2CD9.3030509@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com>
	<459B2CD9.3030509@isi.edu>
Message-ID: <5640c7e00701022025i1bf18875p2a6d77c374c0c12f@mail.gmail.com>

On 1/3/07, Joe Touch <touch at isi.edu> wrote:
> Ian McDonald wrote:
> >> > If you have ABC (as recent Linux senders do by default), or don't use
> >>
> >> ABC is EXPERIMENTAL.
> >>
> > And ABC is now off by default on even later kernels as basically the
> > congestion window didn't grow with how the whole code base interacted.
>
> That's not how "experimental" is intended by the IETF, i.e., it's not a
> patch to other bugs.
>
I understand that since I'm working on an experimental protocol myself.

I'm only the messenger here. As I understand it (and I could be wrong),
Linux deals fairly well with the cases that ABC is trying to solve. To
get ABC into the kernel by default, some of the other code would have
to be changed, and nobody has done that yet. If someone does that and
can convince others, it can go back in.

-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From touch at ISI.EDU  Tue Jan  2 20:38:05 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 02 Jan 2007 20:38:05 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701022025i1bf18875p2a6d77c374c0c12f@mail.gmail.com>
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu>	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	<459B1B09.40301@isi.edu>	<5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com>	<459B2CD9.3030509@isi.edu>
	<5640c7e00701022025i1bf18875p2a6d77c374c0c12f@mail.gmail.com>
Message-ID: <459B332D.4040302@isi.edu>


Ian McDonald wrote:
> On 1/3/07, Joe Touch <touch at isi.edu> wrote:
>> Ian McDonald wrote:
>> >> > If you have ABC (as recent Linux senders do by default), or don't
>> use
>> >>
>> >> ABC is EXPERIMENTAL.
>> >>
>> > And ABC is now off by default on even later kernels as basically the
>> > congestion window didn't grow with how the whole code base interacted.
>>
>> That's not how "experimental" is intended by the IETF, i.e., it's not a
>> patch to other bugs.
>>
> I understand that since I'm working on an experimental protocol myself.
> 
> I'm only the messenger here. As I understand it (and I could be wrong)
> Linux deals with the cases fairly well that ABC is trying to solve. To
> get ABC into the kernel by default some of the other code would have
> to be changed and nobody has done that yet. If someone does that and
> can convince others it can go back in..

ABC should NOT be "ON" by default.

As to whether it should be in the kernel at all, or how it interacts
with the code base, that's an implementation issue. I appreciate the
complexities, but the decision of whether to use it or not should be
made solely on whether it is recommended for widescale deployment or not.

Thanks for the update; it's worrisome that Linux's defaults are that
ephemeral, though.


----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Tue Jan  2 22:07:48 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 02 Jan 2007 22:07:48 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
References: <45980C60.9020405@web.de>	
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>	
	<459AF57A.5080304@isi.edu>	
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
Message-ID: <459B4834.1050304@isi.edu>


Lachlan Andrew wrote:
> Greetings,
> 
> This is probably not related to the original thread (on what happens
> in real networks, as distinct from what *should* happen), but the word
> "bug" bugged me...
> 
> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
...
>> > delayed ACKs (as Linux receivers don't when the window is small),
>>
>> Delayed ACKs are strongly encouraged.
>> Both good reasons to fix these bugs in Linux.
> 
> I don't follow the logic of that at all.

Please review RFC2581.

>  Linux deliberately suppresses
> delayed ACKs when it guesses that the sender is in slow start, which
> seems generally correct, judging by the earlier posts in this thread.

Whether or not it's interpreted as correct by this email list, it is NOT
what the IETF currently recommends.

> In that phase, they harm performance, by making slow-start even slower
> than it was intended to be.  Increasing the initial speed of slow
> starts helps short flows at no long term cost to ongoing long flows.
> When the window is large, Linux does use delayed ACKs, for the reasons
> given in the RFCs.  Since this is fully standards compliant, I don't
> see how it can be called a bug.
> 
> The fact that something is "encouraged" doesn't *of itself* seem a
> good reason to do it, if there are clear reasons not to.  That isn't
> to say that there may not indeed be good reasons to change Linux's
> behaviour; I'd be interested to hear them.

I'd be more interested to know that there had been *controlled*
experiments to validate that this behavior was safe and did not impact
the current behavior of TCP congestion control as per RFC2581. At that
point, I'd be interested to have that information taken to the IETF with
a proposal to change the recommended behavior, and have it vetted by
that community.

The idea that this should be tried in the large "until there are good
reasons not to" is NOT how such experiments should be performed.

> (On a related note, this year's PFLDnet
> <http://wil.cs.caltech.edu/pfldnet2007> has a panel session on the
> implications of network stack implementors Linux and Microsoft setting
> new de-facto flow control standards.  This seems analogous to what the
> BSD Reno release did, implementing improvements well before Reno made
> it into the RFCs.  The difference is that now a global infrastructure
> rides on it...)

The improvements in Reno were MORE conservative than TCP as specified,
not less. Being more conservative is always compliant.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From detlef.bosau at web.de  Wed Jan  3 03:13:10 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 03 Jan 2007 12:13:10 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
References: <45980C60.9020405@web.de>	
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>	
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
Message-ID: <459B8FC6.1040208@web.de>

Lachlan Andrew wrote:
> Greetings,
>
> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
>>
>> Detlef Bosau wrote:
>> > Round    CWND
>> >    1              2
>> >    2              4
>> >    3              8
>>
>> It doesn't double each RTT; it goes up by 50%. Remember, the window
>> grows by one MSS each ACK during the initial phase, but there is one ACK
>> for each two MSS's.
>
> If you have ABC (as recent Linux senders do by default), or don't use
> delayed ACKs (as Linux receivers don't when the window is small),
> Detlef was right that it doubles each RTT.
>
> $0.02
> Lachlan
>

Just before I'm to end my life on "Yellow Mama" .......  ;-)

I admit that I often forget to mention all my assumptions. And even 
more, I don't have all the RFCs in mind, particularly not RFC 3390, 
which Joe has in mind when he talks of an initial window of 4 MSS.

When I do NS2 simulations, I mostly turn off delayed ACKs for my 
purposes at the moment.

 From the congavoid paper, I understand that the intention was to double 
CWND each round if the sender is in slow start state and to increase it 
by 1 MSS each round when the sender is in congestion avoidance state.

 From my understanding it is not necessary for the AIMD scheme to work 
that this doubling/increasing happens every or every other round.
Of course, it affects the convergence time.

I'm talking too much. Please forgive me if I fail to mention all my 
assumptions ...

Detlef


From touch at ISI.EDU  Wed Jan  3 08:20:27 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 08:20:27 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B8FC6.1040208@web.de>
References: <45980C60.9020405@web.de>		<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>		<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>		<459AF57A.5080304@isi.edu>	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B8FC6.1040208@web.de>
Message-ID: <459BD7CB.3080300@isi.edu>


Detlef Bosau wrote:
...
> Just before I'm to end my life on "Yellow Mama" .......  ;-)
> 
> I admit that I often forget to mention all my assumptions. And even
> more, I don't have all the RFCs in mind, particularly not RFC 3390,
> which Joe has in mind when he talks of an initial window of 4 MSS.
> 
> When I do NS2 simulations, I mostly turn off delayed ACKs for my
> purposes at the moment.
> 
> From the congavoid paper, I understand that the intention was to double
> CWND each round if the sender is in slow start state and to increase it
> by 1 MSS each round when the sender is in congestion avoidance state.

The original intention was to double it, but since delayed ACKs that
hasn't been the case. The current AI is 1.5x in slowstart, and has been
for quite a long time.
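
To make the two growth rates concrete, here is a minimal sketch of an
idealized slow start (no loss, a full window sent every round). The
function name and its parameters are made up for illustration; the
byte-counting branch only approximates ABC with a limit of two segments
per ACK and is not taken from any real stack.

```python
def slow_start_rounds(initial_cwnd, rounds, segments_per_ack=1,
                      byte_counting=False):
    """Return cwnd (in segments) at the start of each slow-start round.

    Idealized model: no loss, every round sends a full window, and the
    receiver sends one ACK per `segments_per_ack` segments. A leftover
    odd segment is assumed to be ACKed by the delayed-ACK timer.
    """
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds - 1):
        acks = -(-cwnd // segments_per_ack)  # ceiling division
        if byte_counting:
            # ABC-style: grow by the amount of data ACKed this round.
            cwnd += cwnd
        else:
            # Classic rule: one segment of growth per ACK received.
            cwnd += acks
        history.append(cwnd)
    return history


every_segment = slow_start_rounds(2, 6, segments_per_ack=1)
delayed = slow_start_rounds(2, 6, segments_per_ack=2)
delayed_abc = slow_start_rounds(2, 6, segments_per_ack=2, byte_counting=True)
print(every_segment)
print(delayed)
print(delayed_abc)
```

In this model, ACKing every segment doubles the window each round, ACKing
every second segment settles at roughly 1.5x per round, and byte counting
restores the doubling even with delayed ACKs.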

> From my understanding it is not necessary for the AIMD scheme to work
> that this doubling/increasing happens every or every other round.
> Of course, it affects the convergence time.

It also affects fairness when different connections use different
factors, either for AI or MD.
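
The fairness effect can be illustrated with a toy model. This is only a
sketch under a strong assumption: losses are synchronized, so every flow
backs off in the same round whenever the total exceeds the bottleneck
capacity. The function name, the capacity and the round count are
arbitrary choices for illustration.

```python
def aimd_shares(increase, decrease, capacity=100.0, rounds=2000):
    """Average bandwidth share of flows under synchronized AIMD.

    Each round every flow adds its own additive increase. When the sum
    of windows exceeds `capacity`, every flow multiplies its window by
    its own decrease factor. Returns each flow's share of the total.
    """
    windows = [1.0] * len(increase)
    totals = [0.0] * len(increase)
    for _ in range(rounds):
        windows = [w + a for w, a in zip(windows, increase)]
        if sum(windows) > capacity:
            windows = [w * b for w, b in zip(windows, decrease)]
        for i, w in enumerate(windows):
            totals[i] += w
    grand = sum(totals)
    return [t / grand for t in totals]


equal = aimd_shares([1.0, 1.0], [0.5, 0.5])
unequal_ai = aimd_shares([1.0, 2.0], [0.5, 0.5])
unequal_md = aimd_shares([1.0, 1.0], [0.5, 0.7])
print(equal, unequal_ai, unequal_md)
```

With identical factors the two flows split the link evenly. Doubling one
flow's additive increase, or giving it a gentler multiplicative decrease,
shifts the long-run share toward that flow.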

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From lachlan.andrew at gmail.com  Tue Jan  2 17:49:53 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Tue, 2 Jan 2007 17:49:53 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459AF57A.5080304@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
Message-ID: <aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>

Greetings,

On 02/01/07, Joe Touch <touch at isi.edu> wrote:
>
> Detlef Bosau wrote:
> > Round    CWND
> >    1              2
> >    2              4
> >    3              8
>
> It doesn't double each RTT; it goes up by 50%. Remember, the window
> grows by one MSS each ACK during the initial phase, but there is one ACK
> for each two MSS's.

If you have ABC (as recent Linux senders do by default), or don't use
delayed ACKs (as Linux receivers don't when the window is small),
Detlef was right that it doubles each RTT.

$0.02
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From lachlan.andrew at gmail.com  Tue Jan  2 21:15:27 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Tue, 2 Jan 2007 21:15:27 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B1B09.40301@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
Message-ID: <aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>

Greetings,

This is probably not related to the original thread (on what happens
in real networks, as distinct from what *should* happen), but the word
"bug" bugged me...

On 02/01/07, Joe Touch <touch at isi.edu> wrote:
>
> ABC is EXPERIMENTAL.

Fair enough.  I've just noticed that the default in 2.6.18 has been
changed to "off", possibly as a result of their experiments :)

> > delayed ACKs (as Linux receivers don't when the window is small),
>
> Delayed ACKs are strongly encouraged.
> Both good reasons to fix these bugs in Linux.

I don't follow the logic of that at all.  Linux deliberately suppresses
delayed ACKs when it guesses that the sender is in slow start, which
seems generally correct, judging by the earlier posts in this thread.
In that phase, they harm performance, by making slow-start even slower
than it was intended to be.  Increasing the initial speed of slow
starts helps short flows at no long term cost to ongoing long flows.
When the window is large, Linux does use delayed ACKs, for the reasons
given in the RFCs.  Since this is fully standards compliant, I don't
see how it can be called a bug.

The fact that something is "encouraged" doesn't *of itself* seem a
good reason to do it, if there are clear reasons not to.  That isn't
to say that there may not indeed be good reasons to change Linux's
behaviour; I'd be interested to hear them.

(On a related note, this year's PFLDnet
<http://wil.cs.caltech.edu/pfldnet2007> has a panel session on the
implications of network stack implementors Linux and Microsoft setting
new de-facto flow control standards.  This seems analogous to what the
BSD Reno release did, implementing improvements well before Reno made
it into the RFCs.  The difference is that now a global infrastructure
rides on it...)

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From touch at ISI.EDU  Wed Jan  3 11:04:51 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 11:04:51 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701031038p6b55d894yce085ed766225d5@mail.gmail.com>
References: <45980C60.9020405@web.de>	
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>	
	<459AF57A.5080304@isi.edu>	
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	
	<459B1B09.40301@isi.edu>	
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	
	<459B4834.1050304@isi.edu>
	<aa7d2c6d0701031038p6b55d894yce085ed766225d5@mail.gmail.com>
Message-ID: <459BFE53.5070605@isi.edu>


Lachlan Andrew wrote:
> Greetings Joe,
> 
> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
>> The improvements in Reno were MORE conservative than TCP as specified,
>> not less. Being more conservative is always compliant.
> 
> Correct me if I'm wrong again, but I thought that RFC 1122 mandated
> following Jacobson'88, which specifies that packet
> loss, as indicated by timeout, should result in setting the CWND to
> its initial small value.  I also thought that Reno retransmits before
> timeout (less conservative) and consequently only halves the window
> (less conservative).
> 
> If the changes made transmission slower, why were they adopted?  If
> they made it faster, perhaps I'm misinterpreting "conservative".

Reno came out roughly about the same time as RFC1122; when I say "as
specified", I mean as _specified_ at the time, which was just RFC793 (in
this regard, not including Nagle).

It's worth considering that the Internet of 1990 wasn't what it is today
either. Such experiments had much more limited impact on the
international, commercial, and public community at that time.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From Anil.Agarwal at viasat.com  Wed Jan  3 13:14:20 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Wed, 3 Jan 2007 16:14:20 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu>	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	<459B1B09.40301@isi.edu><aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com>

Joe et al.,
 
To add to this discussion, I just did a few quick tests with a Linux 2.6.18 TCP stack over an (emulated) satellite link.
Here are my observations, based on analyzing the packet trace -
 
1. The sender starts with an initial cwnd of 3 segments, 1448 bytes each (1448 = 1500 - 40 bytes TCP/IPv4 hdr - 12 bytes TCP timestamp option).
2. The receiver acks every segment for the first 32 kbytes of received data; subsequently, it acks every other segment (delayed ack).
3. The sender increases cwnd by 1 segment for every ack (ABC is not used).
 
The cwnd values are as follows -
  Round cwnd
   1        3 segments
   2        6
   3        10     - for some reason, the sender does not increase cwnd by 6 in this round
   4        16     - the 32 kbyte threshold is crossed in this round, so the cwnd increase rate halves
 
These are close to the values described by Detlef.
 
A 50 kbyte transfer finishes in 5 RTTs (including one for the SYN exchange).
 
A quick test on a Sun Solaris 5.8 machine shows the 50 kbyte transfer take 7 RTTs, which is consistent with an implementation that always uses delayed acks.
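
The observations above can be turned into a rough round-trip count. The
sketch below is a simplified model, not a reproduction of the trace: it
assumes no loss, a sender that always fills its window, "50 kbyte" meaning
50,000 bytes, and a delayed-ACK timer that costs no extra time. The
function name and parameters are hypothetical.

```python
import math

MSS = 1500 - 40 - 12  # 1448 payload bytes, as in the trace above


def rtts_for_transfer(total_bytes, initial_cwnd=3, quickack_bytes=32 * 1024):
    """Count round trips (including the SYN exchange) for one transfer.

    The receiver ACKs every segment until `quickack_bytes` have arrived
    and every second segment afterwards. The sender adds one segment to
    cwnd per ACK. quickack_bytes=0 models a receiver that always delays.
    """
    segments_left = math.ceil(total_bytes / MSS)
    cwnd = initial_cwnd
    received = 0      # bytes delivered so far
    pending = 0       # segments waiting for a delayed ACK
    rtts = 1          # the SYN / SYN-ACK exchange
    windows = []
    while segments_left > 0:
        burst = min(cwnd, segments_left)
        windows.append(cwnd)
        acks = 0
        for _ in range(burst):
            received += MSS
            if received <= quickack_bytes:
                acks += 1
            else:
                pending += 1
                if pending == 2:
                    acks += 1
                    pending = 0
        if pending:   # delayed-ACK timer covers a leftover segment
            acks += 1
            pending = 0
        cwnd += acks
        segments_left -= burst
        rtts += 1
    return rtts, windows


quick = rtts_for_transfer(50_000)
always_delayed = rtts_for_transfer(50_000, quickack_bytes=0)
print(quick, always_delayed)
```

Under these assumptions the quick-ACK receiver finishes in 5 round trips,
the same count as measured, although the model's windows (3, 6, 12, 24)
grow faster than the 3, 6, 10, 16 seen in the trace. A receiver that always
delays ACKs needs more round trips; the exact number depends on the initial
window, which is not stated for the Solaris test.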
 
Questions: 
1. Is this what the Linux TCP stack implementors intended? Is this documented somewhere?
2. Does this violate any IETF TCP principle, in letter or spirit? It seems to have an (unfair) advantage over TCP implementations that always perform delayed ack.
 
Anil
 
------------
Anil Agarwal
ViaSat Inc.
Germantown, MD
 

________________________________

From: end2end-interest-bounces at postel.org on behalf of Joe Touch
Sent: Wed 1/3/2007 1:07 AM
To: l.andrew at ieee.org
Cc: end2end-interest at postel.org
Subject: Re: [e2e] Are we doing sliding window in the Internet?


Lachlan Andrew wrote:
> Greetings,
>
> This is probably not related to the original thread (on what happens
> in real networks, as distinct from what *should* happen), but the word
> "bug" bugged me...
>
> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
...
>> > delayed ACKs (as Linux receivers don't when the window is small),
>>
>> Delayed ACKs are strongly encouraged.
>> Both good reasons to fix these bugs in Linux.
>
> I don't follow the logic of that at all.

Please review RFC2581.

>  Linux deliberately suppresses
> delayed ACKs when it guesses that the sender is in slow start, which
> seems generally correct, judging by the earlier posts in this thread.

Whether or not it's interpreted as correct by this email list, it is NOT
what the IETF currently recommends.

> In that phase, they harm performance, by making slow-start even slower
> than it was intended to be.  Increasing the initial speed of slow
> starts helps short flows at no long term cost to ongoing long flows.
> When the window is large, Linux does use delayed ACKs, for the reasons
> given in the RFCs.  Since this is fully standards compliant, I don't
> see how it can be called a bug.
>
> The fact that something is "encouraged" doesn't *of itself* seem a
> good reason to do it, if there are clear reasons not to.  That isn't
> to say that there may not indeed be good reasons to change Linux's
> behaviour; I'd be interested to hear them.

I'd be more interested to know that there had been *controlled*
experiments to validate that this behavior was safe and did not impact
the current behavior of TCP congestion control as per RFC2581. At that
point, I'd be interested to have that information taken to the IETF with
a proposal to change the recommended behavior, and have it vetted by
that community.

The idea that this should be tried in the large "until there are good
reasons not to" is NOT how such experiments should be performed.

> (On a related note, this year's PFLDnet
> <http://wil.cs.caltech.edu/pfldnet2007> has a panel session on the
> implications of network stack implementors Linux and Microsoft setting
> new de-facto flow control standards.  This seems analogous to what the
> BSD Reno release did, implementing improvements well before Reno made
> it into the RFCs.  The difference is that now a global infrastructure
> rides on it...)

The improvements in Reno were MORE conservative than TCP as specified,
not less. Being more conservative is always compliant.

Joe

--
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment



From ian.mcdonald at jandi.co.nz  Wed Jan  3 13:15:43 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Thu, 4 Jan 2007 10:15:43 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
Message-ID: <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>

On 1/3/07, Lachlan Andrew <lachlan.andrew at gmail.com> wrote:
> Greetings,
>
> This is probably not related to the original thread (on what happens
> in real networks, as distinct from what *should* happen), but the word
> "bug" bugged me...
>
> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
> >
> > ABC is EXPERIMENTAL.
>
> Fair enough.  I've just noticed that the default in 2.6.18 has been
> changed to "off", possibly as a result of their experiments :)
>
Yes - see http://www.google.com/custom?domains=www.spinics.net&q=%22high+latency+with+tcp+connections%22&sa=Search&sitesearch=www.spinics.net&client=pub-3422782820843221&forid=1&ie=ISO-8859-1&oe=ISO-8859-1&cof=GALT%3A%23003324%3BGL%3A1%3BDIV%3A%2373B59C%3BVLC%3AFF6600%3BAH%3Acenter%3BBGC%3AC5DBCF%3BLBGC%3A66CC99%3BALC%3A330033%3BLC%3A330033%3BT%3A000000%3BGFNT%3A333300%3BGIMP%3A333300%3BFORID%3A1%3B&hl=en--

The thread is messy though so here is probably the most relevant part:

Main message is from Dave Miller replying to Stephen Hemminger

> On Fri, 1 Sep 2006 01:46:35 +0400
> Alexey Kuznetsov <kuznet at ms2.inr.ac.ru> wrote:
>
> > > Expecting any performance with one byte write's is silly.
> >
> > I am not sure why you are so confident about status of ABC.
> > I missed the discussions, when it was implemented. Apparently,
> > it was noticed that ABC in its pure form does not make sense
> > with snd_cwnd counted in packets and there were some reasons,
> > why it still was not adapted.
>
> I implemented it but don't think ABC is the correct thing to be doing
> in all cases.
>
> If you read the RFC3465, the problem it is trying to address is that of
> small packets causing growth of congestion window beyond the capacity
> of the link.
>
> It makes a number of assumptions that may not be true for Linux:
>   * ABC doesn't take into account that congestion window validation
>     (RFC2861) already prevents most of the problem of inflated growth.
>   * ABC assumes that the "true" capacity of the link is limited by
>     byte count not packet count.

It seems to me that the things gained by ABC are twofold:

1) protection against ACK division
2) a way to take delayed ACKs into account for cwnd growth

Both of which can be obtained by simply validating the ACK
against the retransmit queue, returning number of true
packets ACK'd.

I would even go so far as to suggest that we should drop ACKs which do
not fall on packetization boundaries.  Perhaps only when not in LOSS
state, but I doubt that this matters in practice.

Cases where mid-packet ACK is valid are truly marginal ones involving
repacketization wrt. MSS/MTU changes, and these would self-correct
eventually.
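
A minimal sketch of the idea of validating an ACK against the retransmit
queue follows. It is not Linux code: the queue is modelled as a plain list
of (start, end) sequence ranges, sequence-number wraparound is ignored, and
the function name is invented for illustration.

```python
def packets_acked(queue, snd_una, ack_seq):
    """Count whole packets covered by a cumulative ACK.

    `queue` lists (start_seq, end_seq) for unacknowledged packets in
    order, with end_seq exclusive. Returns (full_packets, on_boundary):
    how many packets the ACK covers entirely, and whether the ACK lands
    exactly on a packet edge.
    """
    if ack_seq <= snd_una:
        return 0, True        # duplicate or old ACK: nothing new
    full = 0
    on_boundary = False
    for start, end in queue:
        if end <= ack_seq:
            full += 1
            on_boundary = (end == ack_seq)
        else:
            break
    return full, on_boundary


queue = [(0, 1000), (1000, 2000), (2000, 3000)]
print(packets_acked(queue, 0, 2000))   # delayed ACK covering two packets
print(packets_acked(queue, 0, 500))    # ACK division: mid-packet ACK
```

Growing cwnd by the returned packet count credits a delayed ACK with both
packets it covers and gives a divided (mid-packet) ACK no credit at all.
The second return value is what a "drop ACKs off packetization boundaries"
policy would test.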

I agree that ABC has some problems.  Solution is good, implementation
is just horrible :-)

From ian.mcdonald at jandi.co.nz  Wed Jan  3 13:46:18 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Thu, 4 Jan 2007 10:46:18 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>
	<aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>
Message-ID: <5640c7e00701031346r14fa0d88u1b370cc08631a799@mail.gmail.com>

> > I would even go so far as to suggest that we should drop ACKs which do
> > not fall on packetization boundaries.
>
> Interesting suggestion.  Would TSO be a problem?  You'd have to make
> sure that the card never got "creative" and put the boundaries where
> we don't expect.
>
I don't know as I'm not an expert here - just cross posting the
discussions. You can always email Dave Miller who made the suggestion.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From weddy at grc.nasa.gov  Wed Jan  3 13:48:11 2007
From: weddy at grc.nasa.gov (Wesley Eddy)
Date: Wed, 3 Jan 2007 16:48:11 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B4834.1050304@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu>
Message-ID: <20070103214811.GA27322@grc.nasa.gov>

On Tue, Jan 02, 2007 at 10:07:48PM -0800, Joe Touch wrote:
> 
> 
> Lachlan Andrew wrote:
> > Greetings,
> > 
> > This is probably not related to the original thread (on what happens
> > in real networks, as distinct from what *should* happen), but the word
> > "bug" bugged me...
> > 
> > On 02/01/07, Joe Touch <touch at isi.edu> wrote:
> ...
> >> > delayed ACKs (as Linux receivers don't when the window is small),
> >>
> >> Delayed ACKs are strongly encouraged.
> >> Both good reasons to fix these bugs in Linux.
> > 
> > I don't follow the logic of that at all.
> 
> Please review RFC2581.
> 


The exact wording in RFC 2581 says that ACKs should be sent "at least" for
every 2 packets, which allows for an ACK to be sent for every packet, as
Linux does when it assumes the other side is in slow start.  I believe the
Linux behavior is perfectly allowable under the letter of RFC 2581.  I do
not consider this behavior buggy whatsoever.

One separate thing to note with regards to ABC is that the RFC2581bis
document in TCPM right now RECOMMENDS to increase CWND by the number of
bytes ACKed during slow-start - i.e. ABC is RECOMMENDED by that document
intended as an update to RFC 2581.

-- 
Wesley M. Eddy
Verizon Federal Network Systems

From touch at ISI.EDU  Wed Jan  3 14:08:32 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 14:08:32 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070103214811.GA27322@grc.nasa.gov>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
Message-ID: <459C2960.7030407@isi.edu>


Wesley Eddy wrote:
> On Tue, Jan 02, 2007 at 10:07:48PM -0800, Joe Touch wrote:
>>
>> Lachlan Andrew wrote:
>>> Greetings,
>>>
>>> This is probably not related to the original thread (on what happens
>>> in real networks, as distinct from what *should* happen), but the word
>>> "bug" bugged me...
>>>
>>> On 02/01/07, Joe Touch <touch at isi.edu> wrote:
>> ...
>>>>> delayed ACKs (as Linux receivers don't when the window is small),
>>>> Delayed ACKs are strongly encouraged.
>>>> Both good reasons to fix these bugs in Linux.
>>> I don't follow the logic of that at all.
>> Please review RFC2581.
> 
> The exact wording in RFC 2581 says that ACKs should be sent "at least" for
> every 2 packets, which allows for an ACK to be sent for every packet, as
> Linux does when it assumes the other side is in slow start.  I believe the
> Linux behavior is perfectly allowable under the letter of RFC 2581.  I do
> not consider this behavior buggy whatsoever.

The exact wording from 2581:

   The delayed ACK algorithm specified in [Bra89] SHOULD be used by a
   TCP receiver.  When used, a TCP receiver MUST NOT excessively delay
   acknowledgments.  Specifically, an ACK SHOULD be generated for at
   least every second full-sized segment, and MUST be generated within
   500 ms of the arrival of the first unacknowledged packet.

The first sentence regards the use of delayed ACKs, which Bra89 defines as:
            A host that is receiving a stream of TCP data segments can
            increase efficiency in both the Internet and the hosts by
            sending fewer than one ACK (acknowledgment) segment per data
            segment received; this is known as a "delayed ACK" [TCP:5].

I.e., "delayed ACK" *means* sending fewer than one ACK per received
segment.

The second sentence from 2581 says not to delay ACKs excessively just for
the sake of delaying; the subsequent sentences refer to situations that
arise due to holding back on ACKs.

The paragraph in its entirety means that
	- when there are no losses or substantial delays, TCP SHOULD
	ACK *exactly* every other packet

	- when there are losses or delays, more ACKs can be sent to
	avoid withholding feedback

Granted, 'every two' is a SHOULD not a MUST, but that's the only place
for Linux's behavior to be considered compliant. I don't see sufficient
reason in "well, it makes *us* go faster" to warrant overriding SHOULD.
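
The receiver-side rule in the quoted paragraph can be sketched as a small
state machine. This is an illustration only: it assumes in-order,
full-sized segments and no loss, it does not model the immediate ACKs
required for out-of-order data, and the class and method names are
invented.

```python
MAX_ACK_DELAY = 0.5  # seconds; the 500 ms bound quoted from RFC 2581


class DelayedAckReceiver:
    """Decide when to ACK: every second segment, or on the delay bound."""

    def __init__(self):
        self.unacked = 0           # full-sized segments not yet ACKed
        self.first_arrival = None  # arrival time of the oldest of them

    def on_segment(self, now):
        """Return True if an ACK should be sent for this arrival."""
        if self.unacked == 0:
            self.first_arrival = now
        self.unacked += 1
        if self.unacked >= 2:
            self._reset()
            return True
        return False

    def on_timer(self, now):
        """Return True if the delay bound forces an ACK at time `now`."""
        if self.unacked and now - self.first_arrival >= MAX_ACK_DELAY:
            self._reset()
            return True
        return False

    def _reset(self):
        self.unacked = 0
        self.first_arrival = None


receiver = DelayedAckReceiver()
decisions = [receiver.on_segment(0.0), receiver.on_segment(0.01)]
print(decisions)
```

A receiver that ACKs every segment, as Linux is described as doing early in
a connection, corresponds to changing the threshold from 2 to 1. That is
the choice the SHOULD-versus-MUST argument is about.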

> One separate thing to note with regards to ABC is that the RFC2581bis
> document in TCPM right now RECOMMENDS to increase CWND by the number of
> bytes ACKed during slow-start - i.e. ABC is RECOMMENDED by that document
> intended as an update to RFC 2581.

*When* that doc comes out, then the status of ABC may need to be
updated. Until then, widespread default use of ABC is not appropriate.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Wed Jan  3 14:37:24 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 14:37:24 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701031424u275a4eb3k89e0eb51e9ff2a67@mail.gmail.com>
References: <45980C60.9020405@web.de>	
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>	
	<459AF57A.5080304@isi.edu>	
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	
	<459B1B09.40301@isi.edu>	
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	
	<459B4834.1050304@isi.edu>	
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com>
	<aa7d2c6d0701031424u275a4eb3k89e0eb51e9ff2a67@mail.gmail.com>
Message-ID: <459C3024.5000903@isi.edu>


Lachlan Andrew wrote:
...
> As an aside, I thought of a nice hack which I think is within the
> letter of the standards, but well outside the spirit.
> 1. First packet, send a MSS
> 2. After the first ACK, send 2MSS worth of 1-byte packets
> 3. 1 RTT later, receive 1MSS worth of ACKs (ack'ing every second packet)
> 4. Without ABC, we now have a CWND of 500-1500 packets.
> 
> Could someone tell me if this is within the letter of the standards?

RFC1122, Sec 4.2.2.2:

            An application program is logically required to set the PUSH
            flag in a SEND call whenever it needs to force delivery of
            the data to avoid a communication deadlock.  However, a TCP
            SHOULD send a maximum-sized segment whenever possible, to
            improve performance (see Section 4.2.3.4).

Given the penchant for trampling SHOULDs, however, I wouldn't be
surprised to see someone implement the above and claim it to be compliant.
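
For a sense of scale, the quoted hack can be worked through numerically.
The sketch assumes cwnd is kept in bytes, that without byte counting each
ACK of new data adds one full MSS, and that byte counting adds only the
bytes an ACK covers (capped at a small multiple of MSS, in the style of
ABC). The function name and the `limit` parameter are illustrative.

```python
def cwnd_after_tiny_segments(mss, byte_counting, limit=2):
    """cwnd in bytes after the quoted hack, starting from 2 * MSS.

    The sender emits 2 * MSS one-byte segments and the receiver ACKs
    every second one, so each ACK covers two bytes of new data.
    """
    cwnd = 2 * mss
    segments = 2 * mss        # one byte each
    acks = segments // 2      # delayed ACKs: one per two segments
    bytes_per_ack = 2
    for _ in range(acks):
        if byte_counting:
            cwnd += min(bytes_per_ack, limit * mss)
        else:
            cwnd += mss
    return cwnd


for mss in (536, 1460):
    plain = cwnd_after_tiny_segments(mss, byte_counting=False)
    counted = cwnd_after_tiny_segments(mss, byte_counting=True)
    print(mss, plain // mss, counted // mss)
```

Without byte counting the window ends up at several hundred to well over a
thousand MSS after a single round trip, in line with the "500-1500 packets"
estimate. With byte counting it reaches only 4 MSS, the same as ordinary
slow start.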

Joe

--
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Wed Jan  3 14:46:15 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 14:46:15 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>
References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu>	
	<459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>	
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	
	<459B1B09.40301@isi.edu>	
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>	
	<459C2960.7030407@isi.edu>
	<aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>
Message-ID: <459C3237.4000709@isi.edu>


Lachlan Andrew wrote:
> Greetings,
> 
> On 03/01/07, Joe Touch <touch at isi.edu> wrote:
>> I.e., "delayed ACK" *means* sending fewer than one ACK per received
>> segment.
> 
> It obviously doesn't mean that *every* packet should be ACK'd less
> than once (i.e., zero times).  It means that *some* packets should not
> be ACK'd, just as Linux does once the transmission is underway.
> 
>> I don't see sufficient
>> reason in "well, it makes *us* go faster" to warrant overriding SHOULD.
> 
> Agreed!!  Selfishness should be discouraged.
> 
> The point is that if *everyone* used QuickACKs, short transfers would
> be faster, with almost no harm done to long flows.

If you believe that's true, please present some verification. An
implementation based on an assertion is insufficient.

> (It is a better
> approximation to "shortest job first", which is well known to minimise
> the average delay for a given utilisation.)  It is well known that
> slow start is too slow for modern bandwidth-delay products (although
> it was fine when it was proposed).  

Agreed.

> To me, that *is* a good reason to
> override a SHOULD.

Thought experiments are *lousy* reasons to override SHOULDs. The desire
for something better than what we currently have is an equally lousy
reason by itself. If you have evidence, please make the case and get the
community to agree and deploy this everywhere.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From faber at ISI.EDU  Wed Jan  3 14:59:36 2007
From: faber at ISI.EDU (Ted Faber)
Date: Wed, 3 Jan 2007 14:59:36 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C2960.7030407@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>
Message-ID: <20070103225935.GA11407@hut.isi.edu>

On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote:
> Granted, 'every two' is a SHOULD not a MUST, but that's the only place
> for Linux's behavior to be considered compliant. I don't see sufficient
> reason in "well, it makes *us* go faster" to warrant overriding SHOULD.

A TCP implementation that acknowledges every packet (and otherwise
implements all MUSTs in the relevant RFCs) is a (conditionally)
compliant implementation as defined by RFC1122.  I really don't see any
ambiguity there. (OK, RFC1122 could say that all conditionally and
unconditionally compliant implementations are compliant, which it
doesn't, so strictly speaking I should remove the parens around
"conditionally" above: "anal-retentive" is hyphenated.)

"Buggy," unlike "(un)?conditionally compliant," is not well defined, but
I don't think that the majority of implementors would agree that a
conditionally compliant TCP implementation is per se a buggy one.

It's a good way to argue about text rather than the design decision,
though.

-- 
Ted Faber
http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From touch at ISI.EDU  Wed Jan  3 15:51:07 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 15:51:07 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070103225935.GA11407@hut.isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
Message-ID: <459C416B.7040702@isi.edu>


Ted Faber wrote:
> On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote:
>> Granted, 'every two' is a SHOULD not a MUST, but that's the only place
>> for Linux's behavior to be considered compliant. I don't see sufficient
>> reason in "well, it makes *us* go faster" to warrant overriding SHOULD.
> 
> A TCP implementation that acknowledges every packet (and otherwise
> implements all MUSTs in the relevant RFCs) is a (conditionally)
> compliant implementation as defined by RFC1122.  I really don't see any
> ambiguity there. (OK, RFC1122 could say that all conditionally and
> unconditionally compliant implementations are compliant, which it
> doesn't, so strictly speaking I should remove the parens around
> "conditionally" above: "anal-retentive" is hyphenated.)

Conditional compliance should come with a statement of the conditions.
Absent that, it's just buggy.

Reasonable conditions do not include "it makes *us* go faster"; they
include things like "this implementation is to be deployed in a limited
environment that is overwhelmingly satellite-oriented" - e.g., if
DirectPC were to use a variant for proxy traffic to its home routers
that overrode SHOULDs for those reasons, that'd be non-buggy.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From L.Wood at surrey.ac.uk  Wed Jan  3 16:26:34 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 04 Jan 2007 00:26:34 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C416B.7040702@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu>
Message-ID: <200701040027.AAA13758@cisco.com>

At Wednesday 03/01/2007 15:51 -0800, Joe Touch wrote:
>Ted Faber wrote:
>> On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote:
>>> Granted, 'every two' is a SHOULD not a MUST, but that's the only place
>>> for Linux's behavior to be considered compliant. I don't see sufficient
>>> reason in "well, it makes *us* go faster" to warrant overriding SHOULD.
>> 
>> A TCP implementation that acknowledges every packet (and otherwise
>> implements all MUSTs in the relevant RFCs) is a (conditionally)
>> compliant implementation as defined by RFC1122.  I really don't see any
>> ambiguity there. (OK, RFC1122 could say that all conditionally and
>> unconditionally compliant implementations are compliant, which it
>> doesn't, so strictly speaking I should remove the parens around
>> "conditionally" above: "anal-retentive" is hyphenated.)
>
>Conditional compliance should come with a statement of the conditions.
>Absent that, it's just buggy.
>
>Reasonable conditions do not include "it makes *us* go faster"; they
>include things like "this implementation is to be deployed in a limited
>environment that is overwhelmingly satellite-oriented" - e.g., if
>DirectPC were to use a variant for proxy traffic to its home routers
>that overrode SHOULDs for those reasons, that'd be non-buggy.

So, if we're DirecPC, overriding SHOULDs can make us go faster.

Do these semantic wranglings actually have a point?

L. 

From L.Wood at surrey.ac.uk  Wed Jan  3 16:28:12 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 04 Jan 2007 00:28:12 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C3237.4000709@isi.edu>
References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>
	<aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>
	<459C3237.4000709@isi.edu>
Message-ID: <200701040028.AAA13798@cisco.com>

At Wednesday 03/01/2007 14:46 -0800, Joe Touch wrote:
>> 
>> The point is that if *everyone* used QuickACKs, short transfers would
>> be faster, with almost no harm done to long flows.
>
>If you believe that's true, please present some verification. An
>implementation based on an assertion is insufficient.

And yet everyone is expected to implement based on the simple MUST and SHOULD assertions in RFCs, given without explanation.

Which is, as you say, insufficient.

L.

From touch at ISI.EDU  Wed Jan  3 16:36:01 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 16:36:01 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040028.AAA13798@cisco.com>
References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>
	<aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>
	<459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com>
Message-ID: <459C4BF1.6060004@isi.edu>


Lloyd Wood wrote:
> At Wednesday 03/01/2007 14:46 -0800, Joe Touch wrote:
>>> The point is that if *everyone* used QuickACKs, short transfers would
>>> be faster, with almost no harm done to long flows.
>> If you believe that's true, please present some verification. An
>> implementation based on an assertion is insufficient.
> 
> And yet everyone is expected to implement based on the simple MUST
> and SHOULD assertions in RFCs, given without explanation.
> 
> Which is, as you say, insufficient.

It should be insufficient to get those words into an RFC without
evidence that they are appropriate. RFCs are neither the sole nor
necessarily the appropriate place for that information; they can and
should cite published work that validates their claims. Whether we
should trust the IETF to do that is independent of whether we should
ignore them solely for the benefit of individual performance.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From touch at ISI.EDU  Wed Jan  3 16:44:03 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 16:44:03 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040027.AAA13758@cisco.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
Message-ID: <459C4DD3.3010106@isi.edu>


Lloyd Wood wrote:
> At Wednesday 03/01/2007 15:51 -0800, Joe Touch wrote:
...
>> Reasonable conditions do not include "it makes *us* go faster"; they
>> include things like "this implementation is to be deployed in a limited
>> environment that is overwhelmingly satellite-oriented" - e.g., if
>> DirectPC were to use a variant for proxy traffic to its home routers
>> that overrode SHOULDs for those reasons, that'd be non-buggy.
> 
> So, if we're DirecPC, overriding SHOULDs can make us go faster.

Yes, but they would not impact others, i.e., their impact would be local
to DirectPC's infrastructure.

> Do these semantic wranglings actually have a point?

The question is "under what conditions is it permissible to override a
SHOULD". I would hope that would be clarified in an update to 2119, but
don't know what the state of that doc is...

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From L.Wood at surrey.ac.uk  Wed Jan  3 17:57:05 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 04 Jan 2007 01:57:05 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C4BF1.6060004@isi.edu>
References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>
	<aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>
	<459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com>
	<459C4BF1.6060004@isi.edu>
Message-ID: <200701040157.BAA18111@cisco.com>

At Wednesday 03/01/2007 16:36 -0800, Joe Touch wrote:


>Lloyd Wood wrote:
>> At Wednesday 03/01/2007 14:46 -0800, Joe Touch wrote:
>>>> The point is that if *everyone* used QuickACKs, short transfers would
>>>> be faster, with almost no harm done to long flows.
>>> If you believe that's true, please present some verification. An
>>> implementation based on an assertion is insufficient.
>> 
>> And yet everyone is expected to implement based on the simple MUST
>> and SHOULD assertions in RFCs, given without explanation.
>> 
>> Which is, as you say, insufficient.
>
>It should be insufficient to get those words into an RFC without
>evidence that they are appropriate. RFCs are neither the sole nor
>necessarily the appropriate place for that information; they can and
>should cite published work that validates their claims. 

Such citations would be informational rather than normative, and therefore optional.

Informational references tend to get left out of RFCs.


>Whether we
>should trust the IETF to do that is independent of whether we should
>ignore them solely for the benefit of individual performance.
>
>Joe
>
>-- 
>----------------------------------------
>Joe Touch
>Sr. Network Engineer, USAF TSAT Space Segment
>

From Anil.Agarwal at viasat.com  Wed Jan  3 19:59:35 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Wed, 3 Jan 2007 22:59:35 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com><459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de><459AF57A.5080304@isi.edu><aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com><459B1B09.40301@isi.edu><aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com><459B4834.1050304@isi.edu>
	<20070103214811.GA27322@grc.nasa.gov><459C2960.7030407@isi.edu>
	<20070103225935.GA11407@hut.isi.edu><459C416B.7040702@isi.edu>
	<200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>

 
Joe Touch wrote :

>> Do these semantic wranglings actually have a point?

> The question is "under what conditions is it permissible to override a
> SHOULD". I would hope that would be clarified in an update to 2119, but
> don't know what the state of that doc is...

1. The technical issue in question is QuickAck, where delayed acks are not used for the first R / 2 bytes of received data, where R seems to be the receive socket buffer size.
2. QuickAck is enabled in Linux, by default. There is no procedure to disable it, except temporarily, for an application via a system call.
3. Linux supports many other "non-standard" TCP features, but most/all of them seem to be disabled by default.
4. There does not seem to be a whole lot of technical documentation on the feature, except for the Linux man page. It is not clear how this feature gets turned on and off during the life of a connection.  There is no RFC on the subject.
5. It seems to violate a "SHOULD" statement in the RFCs. 
6. Its objective is certainly not nefarious. It improves performance for individual short data transfers. Perhaps the SHOULD needs to be changed with some qualifications. But that requires an open discussion.
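
The behaviour in item 1 and the short-transfer benefit in item 6 can be sketched with a toy model. This is only an illustration under stated assumptions, not the Linux implementation and not a measurement. The function names are hypothetical; the 1500-byte segment and the 87380-byte receive buffer are assumed example values; the sender is assumed to stay in slow start, growing its window by one segment per ACK from an initial window of 2; there is no loss, and the waiting time of the delayed-ACK timer is ignored.

```python
def acks_for_burst(burst_segments, bytes_before, mss, rcvbuf, quickack):
    """Count the ACKs a receiver returns for one burst of full-sized segments.

    With quickack enabled, every segment is acknowledged immediately while the
    total received so far is within the first rcvbuf / 2 bytes (item 1 above).
    Otherwise the receiver acknowledges every second segment (delayed ACK).
    """
    acks = 0
    pending = 0                      # segments received but not yet acknowledged
    received = bytes_before
    for _ in range(burst_segments):
        received += mss
        pending += 1
        in_quick_phase = quickack and received <= rcvbuf // 2
        if in_quick_phase or pending == 2:
            acks += 1
            pending = 0
    if pending:
        acks += 1                    # the delayed-ACK timer eventually fires
    return acks


def rounds_to_deliver(total_segments, mss=1500, rcvbuf=87380,
                      quickack=True, initial_cwnd=2):
    """Count round trips needed to deliver total_segments in slow start."""
    cwnd = initial_cwnd
    delivered = 0
    rounds = 0
    while delivered < total_segments:
        burst = min(cwnd, total_segments - delivered)
        acks = acks_for_burst(burst, delivered * mss, mss, rcvbuf, quickack)
        delivered += burst
        cwnd += acks                 # slow start: one segment per ACK received
        rounds += 1
    return rounds


if __name__ == "__main__":
    for mode in (True, False):
        print("quickack" if mode else "delayed ", rounds_to_deliver(100, quickack=mode))
```

Under these assumptions a 100-segment transfer needs 6 round trips with quick ACKs and 8 with plain delayed ACKs. The model says nothing about the effect on competing flows or on ACK load in the reverse direction, which is the part that would need real evidence.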
 
It is perhaps understandable that SHOULDs and even MUSTs can be violated in controlled experimental environments (e.g., simulations).
It is perhaps understandable that SHOULDs may be violated in controlled, isolated environments (e.g., satellite networks).
It may be unavoidable that a SHOULD or MUST is violated by a "hacker" and used over the Internet.
But under what circumstances should a SHOULD be violated and let loose over the Internet as part of a widely used OS?
 
One would like to think that the last category should require some care and a rigorous process. Is this process not documented or well understood? Surely, it cannot be - implement, deploy, publish paper and write RFC :). What role should the IETF play in this process? Advisory only?
 
Anil
-----
Anil Agarwal
ViaSat Inc.
 

From L.Wood at surrey.ac.uk  Wed Jan  3 20:29:19 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 04 Jan 2007 04:29:19 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <200701040429.EAA24974@cisco.com>

This issue is minor compared to the widespread changes Microsoft made to its TCP stack by adopting Compound TCP in Vista.
http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx

and the IETF didn't have any say in that either. Standards bodies don't ship code.

At Wednesday 03/01/2007 22:59 -0500, Agarwal, Anil wrote:
> 
>Joe Touch wrote :
>
>>> Do these semantic wranglings actually have a point?
>
>> The question is "under what conditions is it permissible to override a
>> SHOULD". I would hope that would be clarified in an update to 2119, but
>> don't know what the state of that doc is...
>
>1. The technical issue in question is QuickAck, where delayed acks are not used for the first R / 2 bytes of received data, where R seems to be the receive socket buffer size
>2. QuickAck is enabled in Linux, by default. There is no procedure to disable it, except temporarily, for an application via a system call.
>3. Linux supports many other "non-standard" TCP features, but most/all of them seem to be disabled by default.
>4. There does not seem to be a whole lot of technical documentation on the feature, except for the Linux man page. It is not clear how this feature gets turned on and off during the life of a connection.  There is no RFC on the subject.
>5. It seems to violate a "SHOULD" statement in the RFCs. 
>6. Its objective is certainly not nefarious. It improves performance for individual short data transfers. Perhaps the SHOULD needs to be changed with some qualifications. But that requires an open discussion.
> 
>It is perhaps understandable that SHOULDs and even MUSTs can be violated in controlled experimental environments (e.g., simulations).
>It is perhaps understandable that SHOULDs may be violated in controlled, isolated environments (e.g., satellite networks).
>It may be unavoidable that a SHOULD or MUST is violated by a "hacker" and used over the Internet.
>But under what circumstances should a SHOULD be violated and let loose over the Internet as part of a widely used OS?
> 
>One would like to think that the last category should require some care and a rigorous process. Is this process not documented or well understood? Surely, it cannot be - implement, deploy, publish paper and write RFC :). What role should the IETF play in this process? Advisory only?
> 
>Anil
>-----
>Anil Agarwal
>ViaSat Inc.
> 

From touch at ISI.EDU  Wed Jan  3 21:14:06 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 21:14:06 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040157.BAA18111@cisco.com>
References: <45980C60.9020405@web.de>
	<459AA501.8050901@isi.edu>	<459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	<459B1B09.40301@isi.edu>	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	<459B4834.1050304@isi.edu>
	<20070103214811.GA27322@grc.nasa.gov>	<459C2960.7030407@isi.edu>	<aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>	<459C3237.4000709@isi.edu>
	<200701040028.AAA13798@cisco.com>	<459C4BF1.6060004@isi.edu>
	<200701040157.BAA18111@cisco.com>
Message-ID: <459C8D1E.5080404@isi.edu>


Lloyd Wood wrote:
> At Wednesday 03/01/2007 16:36 -0800, Joe Touch wrote:
...
>> It should be insufficient to get those words into an RFC without
>> evidence that they are appropriate. RFCs are neither the sole nor
>> necessarily the appropriate place for that information; they can and
>> should cite published work that validates their claims. 
> 
> Such citations would be informational rather than normative, and therefore optional.

Although there is a distinction between required citations of protocols
(normative) and other references, I don't agree that it's appropriate to
consider all informative references optional. They're informative only
in the sense that they don't cite protocol standards; they're required
if they are needed to understand motivation.

> Informational references tend to get left out of RFCs.

I hope we all avoid making that mistake, or allowing others to do so.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From touch at ISI.EDU  Wed Jan  3 21:15:52 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 21:15:52 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040429.EAA24974@cisco.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<200701040429.EAA24974@cisco.com>
Message-ID: <459C8D88.5020603@isi.edu>


Lloyd Wood wrote:
> This issue is minor compared to the widespread changes Microsoft made to its TCP stack by adopting Compound TCP in Vista.
> http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx
> 
> and the IETF didn't have any say in that either. Standards bodies don't ship code.

And two bugs don't make a right ;-)

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From touch at ISI.EDU  Wed Jan  3 21:21:04 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 03 Jan 2007 21:21:04 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com><459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de><459AF57A.5080304@isi.edu><aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com><459B1B09.40301@isi.edu><aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com><459B4834.1050304@isi.edu>
	<20070103214811.GA27322@grc.nasa.gov><459C2960.7030407@isi.edu>
	<20070103225935.GA11407@hut.isi.edu><459C416B.7040702@isi.edu>
	<200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <459C8EC0.3050708@isi.edu>


Agarwal, Anil wrote:
...
> 1. The technical issue in question is QuickAck, where delayed acks are
> not used for the first R / 2 bytes of received data, where R seems to be
> the receive socket buffer size
> 2. QuickAck is enabled in Linux, by default. There is no procedure to
> disable it, except temporarily, for an application via a system call.
> 3. Linux supports many other "non-standard" TCP features, but most/all
> of them seem to be disabled by default.
> 4. There does not seem to be a whole lot of technical documentation on
> the feature, except for the Linux man page. It is not clear how this
> feature gets turned on and off during the life of a connection.  There
> is no RFC on the subject.
> 5. It seems to violate a "SHOULD" statement in the RFCs.
> 6. Its objective is certainly not nefarious. It improves performance
> for individual short data transfers. Perhaps the SHOULD needs to be
> changed with some qualifications. But that requires an open discussion.

Nefarious motives are not the issue. The SHOULD currently stands, and it
is Linux's default that should be changed first.

...
> But under what circumstances should a SHOULD be violated and let loose
> over the Internet as part of a widely used OS?
>  
> One would like to think that the last category should require some care
> and a rigorous process. Is this process not documented or well
> understood? Surely, it cannot be - implement, deploy, publish paper and
> write RFC :). 

How about "implement, *test*, publish a paper or bring the results to
the IETF, and publish an RFC"? (i.e., basically, "of course it can be")

And don't call me Shirley ;-) (with apologies in advance to those not
familiar with the movie "Airplane")

> What role should the IETF play in this process? Advisory only?

The IETF plays the role of standards body. Linux (and Microsoft)
*should* play the role of test first, deploy later.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From ian.mcdonald at jandi.co.nz  Wed Jan  3 23:54:14 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Thu, 4 Jan 2007 20:54:14 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>

> One would like to think that the last category should require some care and
> a rigorous process. Is this process not documented or well understood?
> Surely, it cannot be - implement, deploy, publish paper and write RFC :).
> What role should the IETF play in this process? Advisory only?
>
You'll find that Linux is probably the most RFC compliant
implementation of TCP. However Linux isn't perfect and the developers
do as they want.

I think the bigger issue is that there are academics in one corner and
implementors in another and usually they are not the same people and
often don't even talk to each other. Linux is a meritocracy so if
people from this list were to go over to the netdev mailing list and
make a reasonable argument then it will get listened to.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From ian.mcdonald at jandi.co.nz  Wed Jan  3 23:55:23 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Thu, 4 Jan 2007 20:55:23 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C8EC0.3050708@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<459C8EC0.3050708@isi.edu>
Message-ID: <5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com>

On 1/4/07, Joe Touch <touch at isi.edu> wrote:
> Nefarious motives are not the issue. The SHOULD currently stands, and it
> is Linux's default that should be changed first.

If you think Linux has a problem here post it to
netdev at vger.kernel.org and say what is wrong and why. Even better if
it comes with patches.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From detlef.bosau at web.de  Thu Jan  4 06:24:07 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 04 Jan 2007 15:24:07 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	<459B4834.1050304@isi.edu>
	<20070103214811.GA27322@grc.nasa.gov>	<459C2960.7030407@isi.edu>
	<20070103225935.GA11407@hut.isi.edu>	<459C416B.7040702@isi.edu>
	<200701040027.AAA13758@cisco.com>	<459C4DD3.3010106@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>
Message-ID: <459D0E07.7040004@web.de>

Ian McDonald wrote:
>>
> You'll find that Linux is probably the most RFC compliant
> implementation of TCP. However Linux isn't perfect and the developers
> do as they want.
>
> I think the bigger issue is that there are academics in one corner and
> implementors in another and usually they are not the same people and
> often don't even talk to each other.


No.

I basically disagree.

Sounds similar to a paper last year which I criticized and the answer 
was: "You can publish results yourself!"

Correctness is not proven by acclamation. And if some implementation is 
buggy or not standard compliant this is not healed by a large number of 
implementors who do something wrong.

Last year, I had a look at some networking code in the BSD kernel, and 
much of it reminded me of code I've seen in NS2. And there were 
comments with names. With authors. From that I gather that many 
of the "academics" have done a great deal of implementation work, 
particularly in the field of TCP.

In addition, computer science is an engineering discipline. And in 
engineering, you _first_ do research, _then_ you test your protocols, 
_then_ you write the standards if the tests yield convincing results, and 
further implementations are to follow the standards.

Period.

The other way round is some kind of trial and error.

I think, we all remember the well known fortune cookie "If builders 
built buildings like programmers write programs, any woodpecker that 
came along would destroy human civilization." That directly applies here.

I personally find it difficult to always keep the "state of the art", i.e. 
the current standards of TCP, in mind, but this is my problem and I have to 
deal with it. However, TCP is not a meritocratic or implementocratic or 
commerciocratic election where the winner is M$ today and Linux 
tomorrow and afterwards it's Novell, and then I once again see one of 
these funny "TCP probing" papers where some guys propose a sophisticated 
test suite to find out which standards the implementations follow, if any.

I strongly believe in sound scientific work and standards which are 
based on that. And from that, implementations are simply to follow the 
standards - no ifs and buts.

We have learned this in every field of engineering except computer 
science. However, computer science needs to catch up with the other 
disciplines here if it is to reach maturity. And I say this from my 
own experience in professional life, because other engineers often 
ridicule CS or do not even take it seriously, for exactly this reason.

Detlef

> Linux is a meritocracy so if
> people from this list were to go over to the netdev mailing list and
> make a reasonable argument then it will get listened to.
>
> Ian


From touch at ISI.EDU  Thu Jan  4 06:39:11 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 06:39:11 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459B4834.1050304@isi.edu>
	<20070103214811.GA27322@grc.nasa.gov>	<459C2960.7030407@isi.edu>
	<20070103225935.GA11407@hut.isi.edu>	<459C416B.7040702@isi.edu>
	<200701040027.AAA13758@cisco.com>	<459C4DD3.3010106@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>	<459C8EC0.3050708@isi.edu>
	<5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com>
Message-ID: <459D118F.8070309@isi.edu>


Ian McDonald wrote:
> On 1/4/07, Joe Touch <touch at isi.edu> wrote:
>> Nefarious motives are not the issue. The SHOULD currently stands, and it
>> is Linux's default that should be changed first.
> 
> If you think Linux has a problem here post it to
> netdev at vger.kernel.org and say what is wrong and why. Even better if
> it comes with patches.

That's a convenient way to ensure that the problem doesn't get fixed.
Participating in the IETF is not a full-time job, and going around to
every OS's specific discussion venue to make the case to fix a bug - or
demanding that we fix it - confuses this body with a free, evangelical
repair service, which it is not.

I've made the case that this is a problem here, on this list. We can
take that discussion to the TSVWG mailing list if desired.

The next step in the IETF process - given others agree this is a bug and
it does not get fixed by the *Linux community* (no, we're not all part
of that) - would be to add this to an update to RFC 2525. If others
decide that this should be a change to all TCPs, then the next step
would be to propose it as a change in an I-D.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment



From touch at ISI.EDU  Thu Jan  4 07:09:01 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 07:09:01 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>	
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>	
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>	
	<459C4DD3.3010106@isi.edu>	
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>
Message-ID: <459D188D.8060204@isi.edu>


Ian McDonald wrote:
>> One would like to think that the last category should require some
>> care and
>> a rigorous process. Is this process not documented or well understood?
>> Surely, it cannot be - implement, deploy, publish paper and write RFC :).
>> What role should the IETF play in this process? Advisory only?
>>
> You'll find that Linux is probably the most RFC compliant
> implementation of TCP. 

Should we include the time when Linux defaulted T/TCP to "on" in that?
Or the default-ON of ABC? I.e., there are certainly points when versions
of Linux were clearly not RFC-compliant in more significant ways; which
version are you referring to?

And *WE* won't find that. If you want to look for evidence of that fact,
then please do. But unfounded assertions do not make it so, nor does
throwing the gauntlet at the rest of the world saying, "if you think
this is wrong, PROVE it".

> However Linux isn't perfect and the developers
> do as they want.

That's clearly true. The good news is that Linux ends up with some of
the earliest versions of new protocols. The bad news is that Linux
sometimes enables things as default that were never intended as such.

> I think the bigger issue is that there are academics in one corner and
> implementors in another and usually they are not the same people and
> often don't even talk to each other.

If I'm the academic in this discussion, note that I have a number of
patches that fixed bugs in FreeBSD. Just because I don't work on Linux
doesn't render me an academic.

However, you're right - we're not all in the same corner. I'm in the
IETF corner, as are developers from other OS's, and right now it seems
like you're representing the Linux community in their corner demanding
that we all come over there for a chat (see below).

> Linux is a meritocracy so if
> people from this list were to go over to the netdev mailing list and
> make a reasonable argument then it will get listened to.

That's the disconnect here. *THE* place for this sort of discussion is
the IETF, which this list is a peripheral (IRTF) party to. Perhaps the
discussion should occur on TSVWG, or even TCPM. But expecting us to take
this to the Linux community is a disconnect on how standards bodies work.

Again, we don't all work on Linux. Linux cannot demand that of the
world. The Linux community needs to participate in the bodies of
standards it uses, and expect that of its developers.

I know of no standards body that sends emissaries to developer
communities (at best, they send emissaries to other standards bodies).
The converse is the way things work; Linux is implementing IETF
protocols, and has an *obligation* to participate in the IETF, where
other communities participate.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From perfgeek at mac.com  Thu Jan  4 07:13:26 2007
From: perfgeek at mac.com (rick jones)
Date: Thu, 4 Jan 2007 07:13:26 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701031346r14fa0d88u1b370cc08631a799@mail.gmail.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>
	<aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>
	<5640c7e00701031346r14fa0d88u1b370cc08631a799@mail.gmail.com>
Message-ID: <4e17c6bbe1c216e4f25ced41852dab5f@mac.com>

> I don't know as I'm not an expert here - just cross posting the
> discussions. You can always email Dave Miller who made the suggestion.

Direct email to David Miller generally (well, if my experience can be 
generalized, perhaps I'm just too far back in the peanut gallery) 
results in a "send it to the <insert mailing list> list" response.  In 
this case that would be netdev at vger.kernel.org.

rick jones


From Anil.Agarwal at viasat.com  Thu Jan  4 07:20:18 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Thu, 4 Jan 2007 10:20:18 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<200701040429.EAA24974@cisco.com>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A356B@VGAEXCH01.hq.corp.viasat.com>

Lloyd Wood wrote:

> This issue is minor compared to the widespread changes to their TCP stack
> Microsoft made with adopting Compound TCP in Vista.
> http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx
>
> and the IETF didn't have any say in that either. Standards bodies don't ship
> code.

Yikes !!
From the above URL -
"CTCP is enabled by default in computers running Windows Server "Longhorn" ..."
 
Whatever happened to the idea of vendors and IETF conducting trial tests over the Internet for a period of time and writing RFCs before widespread deployment of a new protocol feature?

Anil
-----
Anil Agarwal
ViaSat Inc.



From lachlan.andrew at gmail.com  Wed Jan  3 10:38:58 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 3 Jan 2007 10:38:58 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B4834.1050304@isi.edu>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu>
Message-ID: <aa7d2c6d0701031038p6b55d894yce085ed766225d5@mail.gmail.com>

Greetings Joe,

On 02/01/07, Joe Touch <touch at isi.edu> wrote:
> The improvements in Reno were MORE conservative than TCP as specified,
> not less. Being more conservative is always compliant.

Correct me if I'm wrong again, but I thought that RFC 1122 mandated
following Jacobson'88, which specifies that packet
loss, as indicated by timeout, should result in setting the CWND to
its initial small value.  I also thought that Reno retransmits before
timeout (less conservative) and consequently only halves the window
(less conservative).

If the changes made transmission slower, why were they adopted?  If
they made it faster, perhaps I'm misinterpreting "conservative".
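
To make the comparison concrete, here is a minimal Python sketch of the
two loss responses as I describe them above: a timeout restarts from the
initial small window, while a Reno-style fast retransmit only halves the
window. The function names, the unit (segments), the initial window of
one segment and the floor of two segments on ssthresh are assumptions
for illustration, not taken from any particular stack, and the temporary
window inflation during fast recovery is left out.

```python
INITIAL_CWND = 1  # segments; assumed initial window for this sketch


def after_timeout(cwnd):
    """Timeout response: remember half the window, restart from the initial one."""
    ssthresh = max(cwnd // 2, 2)
    return INITIAL_CWND, ssthresh


def after_fast_retransmit(cwnd):
    """Reno-style response: halve the window and continue from there."""
    ssthresh = max(cwnd // 2, 2)
    return ssthresh, ssthresh


# Same loss event, window of 32 segments: (new cwnd, new ssthresh)
print(after_timeout(32), after_fast_retransmit(32))
```

In this model both responses are triggered by the same loss, and the
Reno-style one always leaves the sender with the larger window.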

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From lachlan.andrew at gmail.com  Wed Jan  3 13:40:46 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 3 Jan 2007 13:40:46 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>
Message-ID: <aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>

Greetings Ian,

On 03/01/07, Ian McDonald <ian.mcdonald at jandi.co.nz> wrote:
> On 1/3/07, Lachlan Andrew <lachlan.andrew at gmail.com> wrote:
> > the default in 2.6.18 has been
> > changed to "off", possibly as a result of their experiments :)
> >
> Yes - see http://www.google.com/custom?domains=www.spinics.net&q=%22high+latency+with+tcp+connections%22&sa=Search&sitesearch=www.spinics.net&client=pub-3422782820843221&forid=1&ie=ISO-8859-1&oe=ISO-8859-1&cof=GALT%3A%23003324%3BGL%3A1%3BDIV%3A%2373B59C%3BVLC%3AFF6600%3BAH%3Acenter%3BBGC%3AC5DBCF%3BLBGC%3A66CC99%3BALC%3A330033%3BLC%3A330033%3BT%3A000000%3BGFNT%3A333300%3BGIMP%3A333300%3BFORID%3A1%3B&hl=en--

Thanks for that explanation.

> I would even go so far as to suggest that we should drop ACKs which do
> not fall on packetization boundaries.

Interesting suggestion.  Would TSO be a problem?  You'd have to make
sure that the card never got "creative" and put the boundaries where
we don't expect.
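
A rough Python sketch of what such a sender-side check might look like.
It assumes the sender records the sequence number at which each
transmitted segment ends; the class and method names are hypothetical,
and sequence-number wraparound and cleanup of old entries are ignored.

```python
class BoundaryTracker:
    """Remembers where each sent segment ends, to vet incoming ACK numbers."""

    def __init__(self):
        self.boundaries = set()

    def on_send(self, seq, length):
        # An ACK for this segment should name the byte just past its end.
        self.boundaries.add(seq + length)

    def ack_is_acceptable(self, ack):
        # Accept only ACKs that land exactly on a recorded segment boundary.
        return ack in self.boundaries


tracker = BoundaryTracker()
tracker.on_send(seq=1000, length=1460)
print(tracker.ack_is_acceptable(2460), tracker.ack_is_acceptable(2000))
```

If the card re-segments the data under TSO, the boundaries recorded here
would no longer match what went on the wire, which is exactly the worry.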

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From lachlan.andrew at gmail.com  Wed Jan  3 14:24:54 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 3 Jan 2007 14:24:54 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <aa7d2c6d0701031424u275a4eb3k89e0eb51e9ff2a67@mail.gmail.com>

Greetings Anil,

On 03/01/07, Agarwal, Anil <Anil.Agarwal at viasat.com> wrote:
> I just did a few quick tests with a Linux 2.6.18
> TCP stack over an (emulated) satellite link.
>
> A 50 kbyte transfer finishes in 5 RTTs (including one for the SYN exchange).
> A Sun Solaris 5.8 machine shows the 50 kbyte transfer taking 7 RTTs.
>
> 1. Is this what the Linux TCP stack implementors intended? Is this
> documented somewhere?

I can't speak for them, but I would think that speeding up slow start
was their aim, yes.  Google "quickack", or look at  man 7 tcp on a
Linux system.
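
The effect of the ACK policy on slow start can be sketched with an
idealized model in Python. Everything here is an assumption for
illustration: an MSS of 1460 bytes, an initial window of 3 segments, one
segment of window growth per ACK, no loss, no receiver-window limit, and
a delayed-ACK receiver that still ACKs a trailing odd segment within the
same round trip. It is not a reproduction of the measurement quoted
above, and the SYN exchange is not counted.

```python
import math


def data_round_trips(total_bytes, mss=1460, initial_cwnd=3, segs_per_ack=1):
    """Round trips needed to send total_bytes in idealized slow start."""
    remaining = math.ceil(total_bytes / mss)  # segments still to send
    cwnd = initial_cwnd                       # window in segments
    rounds = 0
    while remaining > 0:
        sent = min(cwnd, remaining)
        remaining -= sent
        # Each ACK opens the window by one segment.
        cwnd += math.ceil(sent / segs_per_ack)
        rounds += 1
    return rounds


print(data_round_trips(50_000, segs_per_ack=1))  # ACK every segment
print(data_round_trips(50_000, segs_per_ack=2))  # ACK every second segment
```

Under these assumptions the per-segment ACK case needs fewer round trips
than the delayed-ACK case, which shows the direction of the effect
rather than the exact numbers from the test.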

> 2. Does this violate any IETF TCP principle, in letter or spirit? It seems
> to have an (unfair) advantage over TCP implementations that always perform
> delayed ack.

I personally think it is within the spirit of TCP.   TCP is already
internally unfair (look at "RTT unfairness", or "jumbo-frame
unfairness", which can give speed disparities much greater than 7:5).
The original aim of TCP was a roughly-fair mechanism to achieve good
effective data rates while avoiding congestion collapse.  Speeding up
slow start is an important part of improving the effective data rate.

If absolute equality of rates had been the aim, wouldn't the
algorithms have been specified independently of the MSS, and wouldn't
steps have been taken to avoid RTT-unfairness when it was discovered?

As an aside, I thought of a nice hack which I think is within the
letter of the standards, but well outside the spirit.
1. First packet, send a MSS
2. After the first ACK, send 2MSS worth of 1-byte packets
3. 1 RTT later, receive 1MSS worth of ACKs (ack'ing every second packet)
4. Without ABC, we now have a CWND of 500-1500 packets.

Could someone tell me if this is within the letter of the standards?
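
The arithmetic behind the four steps above can be sketched in Python. It
assumes a sender that, without ABC, adds one full MSS to CWND for every
ACK that acknowledges new data, and models byte counting as growth by
the bytes acknowledged, capped at one MSS per ACK (a simplification).
The function name and the MSS of 1460 bytes are illustrative choices.

```python
def grow_cwnd(cwnd_bytes, mss, num_acks, bytes_per_ack, byte_counting):
    """Return cwnd (in bytes) after num_acks ACKs arrive during slow start."""
    for _ in range(num_acks):
        if byte_counting:
            # ABC-style: grow by the bytes actually acknowledged, capped at one MSS.
            cwnd_bytes += min(bytes_per_ack, mss)
        else:
            # Per-ACK counting: every ACK is worth a full MSS, however little it covers.
            cwnd_bytes += mss
    return cwnd_bytes


MSS = 1460                # bytes; assumed value
cwnd = 2 * MSS            # window after the first ACK (step 2)
tiny_packets = 2 * MSS    # 2 MSS worth of 1-byte packets
acks = tiny_packets // 2  # receiver ACKs every second packet (step 3)

without_abc = grow_cwnd(cwnd, MSS, acks, bytes_per_ack=2, byte_counting=False)
with_abc = grow_cwnd(cwnd, MSS, acks, bytes_per_ack=2, byte_counting=True)
print(without_abc // MSS, with_abc // MSS)
```

With an MSS of 1460 bytes this model ends the round at 1462 segments
without byte counting and 4 segments with it; a smaller MSS lands nearer
the low end of the range in step 4.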

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From lachlan.andrew at gmail.com  Wed Jan  3 14:37:46 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 3 Jan 2007 14:37:46 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C2960.7030407@isi.edu>
References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>
Message-ID: <aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>

Greetings,

On 03/01/07, Joe Touch <touch at isi.edu> wrote:
> I.e., "delayed ACK" *means* sending fewer than one ACK per received
> segment.

It obviously doesn't mean that *every* packet should be ACK'd less
than once (i.e., zero times).  It means that *some* packets should not
be ACK'd, just as Linux does once the transmission is underway.

> I don't see sufficient
> reason in "well, it makes *us* go faster" to warrant overriding SHOULD.

Agreed!!  Selfishness should be discouraged.

The point is that if *everyone* used QuickACKs, short transfers would
be faster, with almost no harm done to long flows.  (It is a better
approximation to "shortest job first", which is well known to minimise
the average delay for a given utilisation.)  It is well known that
slow start is too slow for modern bandwidth-delay products (although
it was fine when it was proposed).  To me, that *is* a good reason to
override a SHOULD.
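
The "shortest job first" point can be illustrated with a toy Python
example: one server, jobs served one at a time, all present at the
start. The job sizes are made up, and real flows share a link rather
than queue strictly, so this only sketches the intuition.

```python
from itertools import accumulate


def mean_completion_time(job_sizes):
    """Average finish time when jobs run back to back in the given order."""
    finish_times = list(accumulate(job_sizes))
    return sum(finish_times) / len(finish_times)


jobs = [10, 1, 2]                            # arbitrary sizes: one long, two short
fifo = mean_completion_time(jobs)            # serve in arrival order
sjf = mean_completion_time(sorted(jobs))     # serve the shortest first
print(fifo, sjf)
```

The total work is the same in both orders; only the short jobs stop
waiting behind the long one, so the average drops.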

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From L.Wood at surrey.ac.uk  Thu Jan  4 08:24:34 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 04 Jan 2007 16:24:34 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D118F.8070309@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<459C8EC0.3050708@isi.edu>
	<5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com>
	<459D118F.8070309@isi.edu>
Message-ID: <200701041625.QAA20711@cisco.com>

At Thursday 04/01/2007 06:39 -0800, Joe Touch wrote:

>The next step in the IETF process - given others agree this is a bug and
>it does not get fixed by the *Linux community* (no, we're not all part
>of that) 

obviously, since ABC and other TCP specifications in RFCs are quite specific to BSD stacks.

L.

From faber at ISI.EDU  Thu Jan  4 08:26:04 2007
From: faber at ISI.EDU (Ted Faber)
Date: Thu, 4 Jan 2007 08:26:04 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C416B.7040702@isi.edu>
References: <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu>
Message-ID: <20070104162604.GA85755@hut.isi.edu>

On Wed, Jan 03, 2007 at 03:51:07PM -0800, Joe Touch wrote:
> 
> 
> Ted Faber wrote:
> > On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote:
> >> Granted, 'every two' is a SHOULD not a MUST, but that's the only place
> >> for Linux's behavior to be considered compliant. I don't see sufficient
> >> reason in "well, it makes *us* go faster" to warrant overriding SHOULD.
> > 
> > A TCP implementation that acknowledges every packet (and otherwise
> > implements all MUSTs in the relevant RFCs) is a (conditionally)
> > compliant implementation as defined by RFC1122.  I really don't see any
> > ambiguity there. (OK, RFC1122 could say that all conditionally and
> > unconditionally compliant implementations are compliant, which it
> > doesn't, so strictly speaking I should remove the parens around
> > "conditionally" above: "anal-retentive" is hyphenated.)
> 
> Conditional compliance should come with a statement of the conditions.
> Absent that, it's just buggy.

Now who's not reading 1122?  The terms are defined there and there's
no indication of a "signing statement" requirement for conditionally
compliant implementations.  It's just a phrase that means "did all the
MUSTs and omitted one or more of the SHOULDs."  It's precise, unlike the
"buggy" word we can't agree on.

You may disagree with omitting delayed ACKs, but the RFCs allow it.
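
The RFC 1122 terms amount to a three-way classification, which can be
written down as a small Python sketch. The function name and the
counting interface are hypothetical; the three categories follow the
RFC's definitions as I read them.

```python
def classify(musts_met, musts_total, shoulds_met, shoulds_total):
    """Classify an implementation by how many MUSTs and SHOULDs it satisfies."""
    if musts_met < musts_total:
        # Missing any MUST rules out compliance altogether.
        return "not compliant"
    if shoulds_met < shoulds_total:
        # All MUSTs, but at least one SHOULD omitted.
        return "conditionally compliant"
    return "unconditionally compliant"


# An implementation that does every MUST but skips one SHOULD (say, delayed ACKs):
print(classify(musts_met=40, musts_total=40, shoulds_met=19, shoulds_total=20))
```

Nothing in this classification takes a justification as input, which is
the point: the category depends only on what was implemented.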

-- 
Ted Faber
http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From touch at ISI.EDU  Thu Jan  4 08:57:45 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 08:57:45 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070104162604.GA85755@hut.isi.edu>
References: <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu>
Message-ID: <459D3209.5090602@isi.edu>


Ted Faber wrote:
> On Wed, Jan 03, 2007 at 03:51:07PM -0800, Joe Touch wrote:
...
>> Conditional compliance should come with a statement of the conditions.
>> Absent that, it's just buggy.
> 
> Now who's not reading 1122?  The terms are defined there and there's
> no indication of a "signing statement" requirement for conditionally
> compliant implementations.  It's just a phrase that means "did all the
> MUSTs and omitted one or more of the SHOULDs."  It's precise, unlike the
> "buggy" word we can't agree on.

See below...

> You may disagree with omitting delayed ACKs, but the RFCs allow it.

RFC1122 also states:

         *    "SHOULD"

              This word or the adjective "RECOMMENDED" means that there
              may exist valid reasons in particular circumstances to
              ignore this item, but the full implications should be
              understood and the case carefully weighed before choosing
              a different course.

I.e., if you negate a SHOULD you ought to demonstrate you understand the
implications and have weighed the case. That's clearly stated in
RFC1122. It may not be reiterated where "conditionally compliant" is
defined, but it comes along when a SHOULD is negated.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From faber at ISI.EDU  Thu Jan  4 10:16:33 2007
From: faber at ISI.EDU (Ted Faber)
Date: Thu, 4 Jan 2007 10:16:33 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D3209.5090602@isi.edu>
References: <aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu>
	<459D3209.5090602@isi.edu>
Message-ID: <20070104181633.GC85755@hut.isi.edu>

On Thu, Jan 04, 2007 at 08:57:45AM -0800, Joe Touch wrote:
> 
> 
> Ted Faber wrote:
> > On Wed, Jan 03, 2007 at 03:51:07PM -0800, Joe Touch wrote:
> ...
> >> Conditional compliance should come with a statement of the conditions.
> >> Absent that, it's just buggy.
> > 
> > Now who's not reading 1122?  The terms are defined there and there's
> > no indication of a "signing statement" requirement for conditionally
> > compliant implementations.  It's just a phrase that means "did all the
> > MUSTs and omitted one or more of the SHOULDs."  It's precise, unlike the
> > "buggy" word we can't agree on.
> 
> See below...
> 
> > You may disagree with omitting delayed ACKs, but the RFCs allow it.
> 
> RFC1122 also states:
> 
>          *    "SHOULD"
> 
>               This word or the adjective "RECOMMENDED" means that there
>               may exist valid reasons in particular circumstances to
>               ignore this item, but the full implications should be
>               understood and the case carefully weighed before choosing
>               a different course.
> 
> I.e., if you negate a SHOULD you ought to demonstrate you understand the
> implications and have weighed the case. That's clearly stated in
> RFC1122. 

If we're going to be picky (and why stop now?) no *demonstration* is
required.  It says that implementors *should* think seriously about
their choice when they violate a SHOULD, not that they have to explain
their thinking to you (or me, or anyone else).

I understand that there's no objective way to make sure that thinking
has been done, but there's no requirement to present it either.  To whom
would you require such a presentation, anyway?

And, of course, there's a "should" in the definition of SHOULD.
Regardless of whether any thinking at all has happened, one can ignore a
SHOULD and be within the letter of the RFC "law."

FWIW, I don't think SHOULDs should be thrown aside lightly, either.  But
they're spots where the IETF consensus admits that designers and
implementors can make a different decision without catastrophic
interoperability problems.

For my money "bug" is much more derisive than even "wrong design"
because it implies (to me) a level of obliviousness that doesn't seem
present here.  Bugs are accidents; this seems like a conscious choice.
I understand it's a choice you disagree with, but IMHO it's a choice
that violates no RFC.

I think you're much better off debating the content of the design
decision than whether it violates some unenforceable boundary.

-- 
Ted Faber
http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From touch at ISI.EDU  Thu Jan  4 10:40:16 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 10:40:16 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070104181633.GC85755@hut.isi.edu>
References: <aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu>
	<459D3209.5090602@isi.edu> <20070104181633.GC85755@hut.isi.edu>
Message-ID: <459D4A10.30200@isi.edu>


Ted Faber wrote:
> On Thu, Jan 04, 2007 at 08:57:45AM -0800, Joe Touch wrote:
>> RFC1122 also states:
>>
>>          *    "SHOULD"
>>
>>               This word or the adjective "RECOMMENDED" means that there
>>               may exist valid reasons in particular circumstances to
>>               ignore this item, but the full implications should be
>>               understood and the case carefully weighed before choosing
>>               a different course.
...
> FWIW, I don't think SHOULDs should be thrown aside lightly, either.  But
> they're spots where the IETF consensus admits that designers and
> implementors can make a different decision without catastrophic
> interoperability problems.

That's not what's implied above, IMO, e.g., by using the terms "full"
and "carefully". Let's take a few of the SHOULDs in 1122 and
consider whether we can negate them without catastrophe:

- ARP would discard the first packet sent to each unresolved IP address
(Nagle saw this problem in 1986:
http://www-mice.cs.ucl.ac.uk/multimedia/misc/tcp_ip/8604.mm.www/0126.html)

- ICMP redirects could be used for arbitrary off-path diversion (3.2.2.2)

- packets could be forwarded to a gateway indefinitely in the absence of
positive information that it is available

> For my money "bug" is much more derisive than even "wrong design"
> because it implies (to me) a level of obliviousness that doesn't seem
> present here.  Bugs are accidents; this seems like a conscious choice.

Bugs can be conscious choices too; they are just incorrect ones.

> I understand it's a choice you disagree with, but IMHO it's a choice
> that violates no RFC.

If 'violates no RFC' means obeying only the MUSTs, then we agree. If it
means obeying all MUSTs and negating SHOULDs only in particular
circumstances, then we disagree.

> I think you're much better off debating the content of the design
> decision than whether it violates some unenforceable boundary.

I've already pointed out that it is likely to be unfair w.r.t. TCPs that
ACK every second packet all the time (excepting timeouts). Others seem
intent on finding ways to make their preferred OS behave better so long
as it's within the 'letter of the RFCs'; it is in that spirit that we
need to be clear on the conditions where SHOULDs are OK to skip.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From ian.mcdonald at jandi.co.nz  Thu Jan  4 11:25:18 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Fri, 5 Jan 2007 08:25:18 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D188D.8060204@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu>
	<20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu>
	<200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>
	<459D188D.8060204@isi.edu>
Message-ID: <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com>

On 1/5/07, Joe Touch <touch at isi.edu> wrote:
>

I'm sorry for the way I said things. I wasn't trying to start a
mini-flame war but I have a habit of saying things in a way that
causes misunderstanding at times.
>
> Ian McDonald wrote:
> >> One would like to think that the last category should require some
> >> care and
> >> a rigorous process. Is this process not documented or well understood?
> >> Surely, it cannot be - implement, deploy, publish paper and write RFC :).
> >> What role should the IETF play in this process? Advisory only?
> >>
> > You'll find that Linux is probably the most RFC compliant
> > implementation of TCP.
>
> Should we include the time when Linux defaulted T/TCP to "on" in that?
> Or the default-ON of ABC? I.e., there are certainly points when versions
> of Linux were clearly not RFC-compliant in more significant ways; which
> version are you referring to?
>

What I was meaning is that Linux at present seems to be attracting
people to check code against RFCs and implement experimental RFCs.
This is probably because Linux is "fashionable" at the moment.

I can certainly add to the list of problems as well - e.g. a broken BIC
as the default, and a DCCP implementation that is broken against the RFCs.

> And *WE* won't find that. If you want to look for evidence of that fact,
> then please do. But unfounded assertions do not make it so, nor does
> throwing the gauntlet at the rest of the world saying, "if you think
> this is wrong, PROVE it".
>
> > However Linux isn't perfect and the developers
> > do as they want.
>
> That's clearly true. The good news is that Linux ends up with some of
> the earliest versions of new protocols. The bad news is that Linux
> sometimes enables things as default that were never intended as such.
>
I think the development community for Linux is significantly different
in makeup from how the BSD community was. This has its positives as
well as some negatives. Linux developers are very much in the mold of
"let's try this out and see what happens".

> > I think the bigger issue is that there are academics in one corner and
> > implementors in another and usually they are not the same people and
> > often don't even talk to each other.
>
> If I'm the academic in this discussion, note that I have a number of
> patches that fixed bugs in FreeBSD. Just because I don't work on Linux
> doesn't render me an academic.
>
> However, you're right - we're not all in the same corner. I'm in the
> IETF corner, as are developers from other OS's, and right now it seems
> like you're representing the Linux community in their corner demanding
> that we all come over there for a chat (see below).
>
I'm not saying you need to chat. I'm saying report bugs to the
relevant place (see also below).

> > Linux is a meritocracy so if
> > people from this list were to go over to the netdev mailing list and
> > make a reasonable argument then it will get listened to.
>
> That's the disconnect here. *THE* place for this sort of discussion is
> the IETF, which this list is a peripheral (IRTF) party to. Perhaps the
> discussion should occur on TSVWG, or even TCPM. But expecting us to take
> this to the Linux community is a disconnect on how standards bodies work.
>
But surely if you say Linux is broken and then you don't inform the
relevant developers, how will it get fixed? It's nice to moan about
a broken TCP implementation, but if you only talk about that within your
own community it doesn't get fixed.

I'm referring specifically to the situations where people are saying
Linux is not following the RFCs. The rest of the discussion quite
rightly does belong here.

> Again, we don't all work on Linux. Linux cannot demand that of the
> world. The Linux community needs to participate in the bodies of
> standards it uses, and expect that of its developers.
>
> I know of no standards body that sends emissaries to developer
> communities (at best, they send emissaries to other standards bodies).
> The converse is the way things work; Linux is implementing IETF
> protocols, and has an *obligation* to participate in the IETF, where
> other communities participate.
>
What I am trying to do is help bridge some of the gaps. I see the
disconnect between the two communities and want to help remove some of
that distance. The reason I didn't do this directly myself in this
case is that I didn't properly understand the issue myself - other
times I do.

I encourage people to post comments to relevant Linux people if they
are concerned. I know from personal experience that it has helped
immensely. There are a few RFC authors now corresponding with Linux
developers and that has helped the code base in TCP and DCCP.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From touch at ISI.EDU  Thu Jan  4 11:35:16 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 11:35:16 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu>	
	<20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu>	 <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>	
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>	
	<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>	
	<459D188D.8060204@isi.edu>
	<5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com>
Message-ID: <459D56F4.5090206@isi.edu>


Ian McDonald wrote:
> On 1/5/07, Joe Touch <touch at isi.edu> wrote:
>>
> 
> I'm sorry for the way I said things. I wasn't trying to start a
> mini-flame war but I have a habit of saying things in a way that
> causes misunderstanding at times.
...
>> > You'll find that Linux is probably the most RFC compliant
>> > implementation of TCP.
>>
>> Should we include the time when Linux defaulted T/TCP to "on" in that?
>> Or the default-ON of ABC? I.e., there are certainly points when versions
>> of Linux were clearly not RFC-compliant in more significant ways; which
>> version are you referring to?
> 
> What I was meaning is that Linux at present seems to be attracting
> people to check code against RFCs and implement experimental RFCs.
> This is probably because Linux is "fashionable" at the moment.

I think it's also a property of Linux, as you note below. One of its
major benefits is that many devices/features/protocols are probably
implemented and available; that's also one of its detriments at times,
though.

>...Linux developers are very much in the mold of
> "let's try this out and see what happens".

Agreed.

>> > Linux is a meritocracy so if
>> > people from this list were to go over to the netdev mailing list and
>> > make a reasonable argument then it will get listened to.
>>
>> That's the disconnect here. *THE* place for this sort of discussion is
>> the IETF, which this list is a peripheral (IRTF) party to. Perhaps the
>> discussion should occur on TSVWG, or even TCPM. But expecting us to take
>> this to the Linux community is a disconnect on how standards bodies work.
>>
> But surely if you say Linux is broken and then you don't inform the
> relevant developers then how will it get fixed? It's nice to moan about
> a broken TCP implementation but if you talk about that within your own
> community it doesn't get fixed.

We're not talking about that in "our own community" on this list; this
(IRTF) list, as with IETF lists, is for all communities to come together
to discuss such issues.

> I'm referring specifically to the situations where people are saying
> Linux is not following the RFCs. The rest of the discussion quite
> rightly does belong here.

I agree that the discussion of how to fix this in Linux belongs on a
Linux list, but we're all hoping they track this and other IETF lists
and take that information there.

> I encourage people to post comments to relevant Linux people if they
> are concerned.

It'd be great for those on this list who either use Linux or who are
interested to participate on the Linux lists too, but I'm sincerely
hoping the Linux folk aren't waiting around for us to post this issue to
their lists to address it.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From ian.mcdonald at jandi.co.nz  Thu Jan  4 11:42:03 2007
From: ian.mcdonald at jandi.co.nz (Ian McDonald)
Date: Fri, 5 Jan 2007 08:42:03 +1300
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D56F4.5090206@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu>
	<200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>
	<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>
	<459D188D.8060204@isi.edu>
	<5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com>
	<459D56F4.5090206@isi.edu>
Message-ID: <5640c7e00701041142p5ea8092chd7a18f1c6c11d002@mail.gmail.com>

> I agree that the discussion of how to fix this in Linux belongs on a
> Linux list, but we're all hoping they track this and other IETF lists
> and take that information there.
>
I think this is a false hope in many cases unfortunately.

> > I encourage people to post comments to relevant Linux people if they
> > are concerned.
>
> It'd be great for those on this list who either use Linux or who are
> interested to participate on the Linux lists too, but I'm sincerely
> hoping the Linux folk aren't waiting around for us to post this issue to
> their lists to address it.
>
If they don't read this list (as most don't, I believe) then they
won't know about it. I think it is easier to crosspost or separately
post to netdev at vger.kernel.org if people believe there are issues with
Linux.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group

From touch at ISI.EDU  Thu Jan  4 11:46:01 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 11:46:01 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701041142p5ea8092chd7a18f1c6c11d002@mail.gmail.com>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	
	<20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu>	 <200701040027.AAA13758@cisco.com>
	<459C4DD3.3010106@isi.edu>	
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>	
	<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>	
	<459D188D.8060204@isi.edu>	
	<5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com>	
	<459D56F4.5090206@isi.edu>
	<5640c7e00701041142p5ea8092chd7a18f1c6c11d002@mail.gmail.com>
Message-ID: <459D5979.9050009@isi.edu>


Ian McDonald wrote:
>> I agree that the discussion of how to fix this in Linux belongs on a
>> Linux list, but we're all hoping they track this and other IETF lists
>> and take that information there.
>>
> I think this is a false hope in many cases unfortunately.

Agreed.

>> > I encourage people to post comments to relevant Linux people if they
>> > are concerned.
>>
>> It'd be great for those on this list who either use Linux or who are
>> interested to participate on the Linux lists too, but I'm sincerely
>> hoping the Linux folk aren't waiting around for us to post this issue to
>> their lists to address it.
>>
> If they don't read this list (as most don't, I believe) then they
> won't know about it. I think it is easier to crosspost or separately
> post to netdev at vger.kernel.org if people believe there are issues with
> Linux.

Easier for whom?

*That* is the disconnect.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Thu Jan  4 11:57:35 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 04 Jan 2007 11:57:35 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D56F4.5090206@isi.edu>
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>		<20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>		<20070103225935.GA11407@hut.isi.edu>	<459C416B.7040702@isi.edu>	
	<200701040027.AAA13758@cisco.com>	<459C4DD3.3010106@isi.edu>		<0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com>		<5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com>		<459D188D.8060204@isi.edu>	<5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com>
	<459D56F4.5090206@isi.edu>
Message-ID: <459D5C2F.7010409@isi.edu>

Finally, let me say that I agree with Ian that the best way to fix this
issue now is to post to the Linux lists, which I will proceed to do.

I sincerely hope that Linux users on this list will track this and other
IETF lists for such issues, and bring concerns to the Linux group
themselves, rather than expecting "other" list members to do so.

We *each* fix our own systems (and we're not all Linux users), and this
is (one of) the common place(s) we figure that all out.

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From faber at ISI.EDU  Thu Jan  4 13:17:41 2007
From: faber at ISI.EDU (Ted Faber)
Date: Thu, 4 Jan 2007 13:17:41 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D4A10.30200@isi.edu>
References: <aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu>
	<459D3209.5090602@isi.edu> <20070104181633.GC85755@hut.isi.edu>
	<459D4A10.30200@isi.edu>
Message-ID: <20070104211741.GD85755@hut.isi.edu>

On Thu, Jan 04, 2007 at 10:40:16AM -0800, Joe Touch wrote:
> That's not what's implied above, IMO, e.g., by using the terms "full"
> and "carefully". Let's consider a few of the SHOULDs in 1122 and
> consider whether we can negate them without catastrophe:

Your examples don't convince me.  I understand them fine, but I don't
agree that they're catastrophic interoperability problems.  Furthermore
I can think of situations in which a rational implementor would choose
to go against the quoted SHOULDs.

> > I think you're much better off debating the content of the design
> > decision than whether it violates some unenforceable boundary.
> 
> I've already pointed out that it is likely to be unfair w.r.t. TCPs that
> ACK every second packet all the time (excepting timeouts). Others seem
> intent on finding ways to make their preferred OS behave better so long
> as it's within the 'letter of the RFCs'; it is in that spirit that we
> need to be clear on the conditions where SHOULDs are OK to skip.

"Likely" doesn't seem sufficient for those who disagree with you.

But I don't have much to say about this particular choice, except that I
think it's in the letter of the law.  I expect that the performance
change is in the noise most of the time, but I'm not excited enough to
either argue about it or to go out and collect data.

-- 
Ted Faber
http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From perfgeek at mac.com  Thu Jan  4 19:30:44 2007
From: perfgeek at mac.com (rick jones)
Date: Thu, 4 Jan 2007 19:30:44 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>
	<aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>
Message-ID: <26e64446b0f56dae4d44408c5ee436e6@mac.com>

earlier someone wrote:

 > I would even go so far as to suggest that we should drop ACKs which do not
 > fall on packetization boundaries.

that suggests one is tracking segmentation boundaries, in which case 
wouldn't one be using conservation of packets heuristics rather than 
conservation of bytes heuristics - packet counting rather than byte 
counting?

[istr that was part of the issue when Linux tried to implement the 
byte-counting ABC RFC in their packet counting stack...]

Then someone else wrote:

> Interesting suggestion.  Would TSO be a problem?  You'd have to make
> sure that the card never got "creative" and put the boundaries where
> we don't expect.

Indeed.

rickjones
there is no rest for the wicked, yet the virtuous have no pillows


From detlef.bosau at web.de  Fri Jan  5 02:48:20 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 05 Jan 2007 11:48:20 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
Message-ID: <459E2CF4.6030701@web.de>

Hi.

When I asked whether we did sliding window in the Internet, I basically 
had a quite simple scenario in mind, and I would like a comment 
on this one.


So, I write it down once again, perhaps making my question clearer.

The parameters are examples, so please don't kill me if they are not 
"that typical".


Basic scenario:

Sender------(some Internet path) -----Router---(link)--------Receiver

The router may be replaced by a splitter, see below.


The basic question is whether the use of a splitter may shorten the RTT 
seen by the sender to such a degree that the appropriate rate cannot be 
achieved by a sliding window protocol even if CWND were set to 1 MSS, 
so that the sender must be stalled from time to time to keep the rate 
low enough.

Is this possible, or am I missing something?

Now to the scenario in detail:


Case 1: Router.


Sender-----------------------------------------Router-------------------------Receiver
               10 Mbps, 100 ms                    300 bps, 10 ms

Basically, the link behind the router has a "slow dial-in modem bandwidth" 
here.

Imagine a 12000 bit packet travelling from Sender to Receiver.
What's the RTT then? Let's have a look at the one-way delays first:
Sender-Router: 1.2 ms serialization latency + 100 ms transport latency =
101.2 ms
Router-Receiver: 40 s serialization latency + 10 ms transport latency =
40.01 s
=================
Sender-Receiver: 40.1112 s.

If there is one packet in transit in each direction, i.e. the line is
full in both directions, the RTT is roughly 80 s and we would have
CWND/RTT = 2*12000 bit / 80 s = 300 bit/s, and everything is fine.
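
The Case 1 arithmetic can be checked with a few lines of Python. This is only a sketch: it reads the link rates as 10 Mbit/s and 300 bit/s (which is what the 1.2 ms and 40 s serialization figures imply), so the second hop comes to about 40.01 s and the one-way total to about 40.11 s. All names are illustrative.

```python
# Sketch of the Case 1 delay arithmetic; names and structure are assumptions.
PACKET_BITS = 12_000

def one_way_delay(bits, rate_bps, latency_s):
    """Serialization delay plus propagation latency for a single link."""
    return bits / rate_bps + latency_s

sender_router = one_way_delay(PACKET_BITS, 10e6, 0.100)   # 10 Mbit/s, 100 ms
router_receiver = one_way_delay(PACKET_BITS, 300, 0.010)  # 300 bit/s, 10 ms
one_way = sender_router + router_receiver

# One full-size packet in transit in each direction, as assumed in the text.
rtt = 2 * one_way
rate_bps = 2 * PACKET_BITS / rtt   # CWND / RTT

print(f"one way {one_way:.4f} s, RTT {rtt:.4f} s, rate {rate_bps:.1f} bit/s")
```

The resulting rate stays just under the 300 bit/s of the modem link, so in this case the window alone keeps the sender at the bottleneck rate.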

Now let's replace the router by a splitter:


Case 2: Splitter.

Sender-----------------------------------------Splitter-------------------------Receiver
               10 Mbps, 100 ms                      300 bps, 10 ms     (bandwidth, latency)

If the splitter is doing "dumb spoofing", i.e. all packets are
acknowledged immediately as they are received, the sender would see a
round trip time of about 200 ms. So even in the stop-and-wait case, i.e.
CWND = 1*12000 bit, the throughput sender/splitter is
12000 bit / 200 ms = 60 bit / ms = 60 kbit/s, which is obviously too
fast for the 300 bps modem line to carry.
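
As a rough check of the splitter case, the following sketch computes the stop-and-wait rate the sender reaches against the spoofed 200 ms RTT and compares it with the 300 bit/s modem link. The variable names and the "backlog growth" figure are illustrative additions, not part of the original scenario.

```python
# Sketch of the Case 2 arithmetic; names are assumptions for illustration.
PACKET_BITS = 12_000
SPOOFED_RTT_S = 0.200      # sender <-> splitter, about 2 * 100 ms
MODEM_RATE_BPS = 300

# Stop-and-wait: one 12000 bit packet per spoofed round trip.
sender_rate_bps = PACKET_BITS / SPOOFED_RTT_S
overload_factor = sender_rate_bps / MODEM_RATE_BPS
# Rate at which the splitter's queue would grow if nothing stalled the sender.
backlog_growth_bps = sender_rate_bps - MODEM_RATE_BPS

print(f"sender: {sender_rate_bps / 1000:.0f} kbit/s, "
      f"{overload_factor:.0f}x the modem rate")
```

Even with a window of a single packet, the sender offers about 200 times what the modem link can carry, so the window cannot be the mechanism that matches the rates here.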

So, what should the splitter do?

1. stall the sender periodically using zero window packets?
2. don't care, doesn't matter?
3. ??

(let's ignore my own stupid ideas on this one for the moment ;-))


Detlef


From detlef.bosau at web.de  Fri Jan  5 03:09:18 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 05 Jan 2007 12:09:18 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <26e64446b0f56dae4d44408c5ee436e6@mac.com>
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu>	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>	<459B1B09.40301@isi.edu>	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>	<5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com>	<aa7d2c6d0701031340o6862565enfdf460a229dc95d4@mail.gmail.com>
	<26e64446b0f56dae4d44408c5ee436e6@mac.com>
Message-ID: <459E31DE.5030602@web.de>

rick jones wrote:
> earlier someone wrote:
>
> > I would even go so far as to suggest that we should drop ACKs which do not
> > fall on packetization boundaries.
>
> that suggests one is tracking segmentation boundaries, in which case 
> wouldn't one be using conservation of packets heuristics rather than 
> conservation of bytes heuristics - packet counting rather than byte 
> counting?
>
To my understanding, we do so anyway.

AFAIK we use a scoreboard in Reno to track acknowledged _bytes_, and we 
calculate windows in _bytes_, except of course in NS2 and, according to 
a rumour I didn't check, in Linux.


From Anil.Agarwal at viasat.com  Fri Jan  5 05:25:01 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Fri, 5 Jan 2007 08:25:01 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3575@VGAEXCH01.hq.corp.viasat.com>

Detlef wrote:
 
> The basic question is whether the use of a splitter may shorten the RTT
> seen by the sender to that degree, that the appropriate rate cannot be
> achieved by a sliding window protocol even if CWND were set to 1 MSS,
> the sender must hence be stalled from time to time to have the rate slow
> enough.

Yes

Here is a more practical example -

Sender
-------------------------TCP-Splitter---------------------Receiver
    100 Mbps, 10 us (LAN)      1 Mbps, 300 ms (geo-satellite)

A cwnd of 1 segment of 1500 bytes will achieve roughly 
1500 * 8 / (20 + 120) Mbps
i.e., 85 Mbps, on the LAN segment,
which is much higher than the satellite link rate.
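
The 85 Mbps figure follows from one segment per LAN round trip, where the round trip is taken as twice the 10 us latency plus the 120 us it takes to serialize 1500 bytes at 100 Mbps. A small sketch, with illustrative names:

```python
# Sketch of the LAN-side rate for cwnd = 1 segment; names are assumptions.
SEGMENT_BITS = 1500 * 8
LAN_RATE_BPS = 100e6
LAN_LATENCY_S = 10e-6

serialization_s = SEGMENT_BITS / LAN_RATE_BPS      # 120 us
lan_rtt_s = 2 * LAN_LATENCY_S + serialization_s    # 20 us + 120 us
lan_rate_bps = SEGMENT_BITS / lan_rtt_s            # one segment per round trip

print(f"{lan_rate_bps / 1e6:.1f} Mbps")
```

This neglects the serialization time of the ACK and any processing delay, so it is an upper-end estimate.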

> So, what should the splitter do?
> 1. stall the sender periodically using zero windo packets?
> 2. don?t care, doesn?t matter?
> 3. ??

Since the network can support a maximum of 1 Mbps, 
on average, the sender should send 1 segment every 
1500 * 8 / 1000000 seconds
i.e., every 12 ms.
 
So, stalling the sender using zero window Ack packets is an 
appropriate solution, which does not require any changes to the 
sender TCP stack. The cwnd value may be 1 segment or larger, 
it does not matter.
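
To put a number on how much stalling this implies, the sketch below compares the 12 ms target interval with the roughly 140 us the sender needs per segment on the LAN side (20 us round-trip latency plus 120 us serialization). The "stalled fraction" is an illustrative derived quantity, not a figure from the thread.

```python
# Sketch: target send interval and implied stall fraction; names are assumptions.
SEGMENT_BITS = 1500 * 8
SAT_RATE_BPS = 1_000_000
LAN_RTT_S = 140e-6        # time per segment on the LAN with cwnd = 1

send_interval_s = SEGMENT_BITS / SAT_RATE_BPS     # one segment every 12 ms
active_fraction = LAN_RTT_S / send_interval_s     # share of time spent sending
stalled_fraction = 1 - active_fraction            # share of time held at zero window

print(f"interval {send_interval_s * 1e3:.0f} ms, "
      f"stalled {stalled_fraction:.1%} of the time")
```

Under these assumptions the sender has to be held back for almost 99% of the time, whatever its cwnd.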
 
Anil
-----------
Anil Agarwal
ViaSat Inc.
 

From detlef.bosau at web.de  Fri Jan  5 06:13:56 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 05 Jan 2007 15:13:56 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3575@VGAEXCH01.hq.corp.viasat.com>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3575@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <459E5D23.2040902@web.de>

Agarwal, Anil wrote:
> Detlef wrote:
>  
> > The basic question is whether the use of a splitter may shorten the RTT
> > seen by the sender to that degree, that the appropriate rate cannot be
> > achieved by a sliding window protocol even if CWND were set to 1 MSS,
> > the sender must hence be stalled from time to time to have the rate slow
> > enough.
>
> Yes
>

Great :-)

> Here is a more practical example -
>
> Sender
> -------------------------TCP-Splitter---------------------Receiver
>     100 Mbps, 10 us (LAN)      1 Mbps, 300 ms (geo-satellite)
> A cwnd of 1 segment of 1500 bytes will achieve roughly
> 1500 * 8 / (20 + 120) Mbps
> i.e., 85 Mbps, on the LAN segment,
> which is much higher than the satellite link rate.
>

The very interesting thing is that this behaviour is not restricted to a 
typical dialin modem bandwidth.
And the RTT from sender to splitter can even be in a range of some ms 
and we will still have the same behaviour.
> > So, what should the splitter do?
> > 1. stall the sender periodically using zero window packets?
> > 2. don't care, doesn't matter?
> > 3. ??
>
> Since, the network can support a maximum of 1 Mbps,
> on average, the sender should send 1 segment every
> 1500 * 8 / 1000000 seconds
> i.e, every 12 ms.
>  
Yes.

Question: How is this achieved using actual splitters?

> So, stalling the sender using zero window Ack packets is an
> appropriate solution, which does not require any changes to the
> sender TCP stack. The cwnd value may be 1 segment or larger,
> it does not matter.

I wonder if splitters actually stall.

I personally think, stalling is an extremely bad solution as a stalled 
sender must wake up somehow or must be woken up somehow.
It is woken up by window updates, which unfortunately are sent 
unreliably as they typically do not carry any data bytes.
If it is not woken up, it wakes up by itself after some timeout and 
sends probing packets.

In my own simulations I did not yet implement window updates and do only 
zero window probing where I use the actual retransmission timeout for 
zero window probing as well.

The throughput decrease is, to put it kindly, disastrous. Depending on the 
parameters I choose, the flow actually uses only 25 % or less of the 
available bandwidth.
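
A toy model (not the simulation described above) shows why probing alone can waste most of the slow link. It assumes the splitter buffer is refilled instantly at each probe, that probes arrive at a fixed interval without backoff, and that no window updates are sent. All parameter values are hypothetical.

```python
# Toy model of link utilization with zero window probing only.
# Assumptions: fixed probe interval, instant refill, no window updates.
def utilization(buffer_segments, drain_ms_per_segment, probe_interval_ms):
    """Fraction of time the slow link is busy between two probes."""
    drain_time_ms = buffer_segments * drain_ms_per_segment
    # If the buffer empties before the next probe, the link idles until then.
    return min(1.0, drain_time_ms / probe_interval_ms)

# 16 buffered segments, 12 ms per segment on the slow link, probe every 1 s.
slow_probe = utilization(16, 12, 1000)
# Same buffer, but a probe every 100 ms never lets the buffer run dry.
fast_probe = utilization(16, 12, 100)

print(f"{slow_probe:.1%} vs {fast_probe:.1%}")
```

In this model the utilization drops as soon as the probe interval exceeds the time the buffer takes to drain, which is the regime a retransmission-timeout-sized probe interval tends to fall into.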

I have yet to add window updates. The problem with window updates, 
however, is how to model the loss of window updates. This is a typical "paper 
tuning parameter (abbrev.: PTP)". If you choose this rate too low, the 
paper will be rejected because it's not relevant. If you choose it too 
high, no one believes your results (however, no one will call you a 
liar, it will be written in a more politically correct way), and somewhere 
in the middle you will find something between "weak accept" and "weak 
reject" :-)

O.k., but let's wait for the "strong reject" comments now. I'm eager to 
know what the rest of the world is thinking about this problem.

In addition, I would appreciate any hint to actual papers on zero window 
probing.  (Of course there is a way to do it without zero window probing 
but I would like to see whether this is really needed or whether it's 
irrelevant.)


Detlef


From misha at eecs.cwru.edu  Fri Jan  5 06:51:26 2007
From: misha at eecs.cwru.edu (Michael Rabinovich)
Date: Fri, 5 Jan 2007 09:51:26 -0500
Subject: [e2e] Announcement: a new network measurement platform
Message-ID: <43A64099-3E62-4089-B8A6-5FE6B569D76A@eecs.cwru.edu>

We are pleased to announce the availability of DipZoom P2P network  
measurement infrastructure.   Unlike existing approaches that face a  
difficult challenge of building a measurement platform with  
sufficiently diverse measurements and measuring hosts, DipZoom offers  
a matchmaking service instead,  bringing together experimenters in  
need of measurements with external measurement providers.

Salient features of DipZoom are:

1. DipZoom is an open system.  Anyone can perform measurement  
experiments autonomously.  We seeded the system with over a hundred  
measurement points (MPs) on PlanetLab nodes.  Several residential  
measurement points are also available.

2. DipZoom is an extensible system.  While its current standard  
distribution offers wget, ping, traceroute, and nslookup  
measurements, anyone can add new measurements as plug-ins, and  
recruit participants to install these plugins on their MPs.

3. DipZoom offers a coherent view over the entire collection of  
measurement points, which are all accessible from any local computer  
with DipZoom installed.  The only restriction is that, in the peer-to- 
peer spirit, in order to run a DipZoom client, the computer must also  
offer measurements by becoming a DipZoom measurement point.

4. DipZoom offers both navigational and programmatic access to the  
entire platform.  For navigational access, there is a graphical  
DipZoom client that allows the user to browse available MPs, select  
the MPs according to a number of characteristics (platform, location,  
autonomous system), and  obtain measurements from those MPs.  For  
programmatic access, DipZoom provides APIs to script and run complex  
globally distributed measurement experiments from a local computer.   
The APIs are implemented by a Java class library and can be called  
from any Java application.  As a test of the usability of DipZoom  
APIs, students in the Fall'07 undergraduate networking class were  
able to perform a complex measurement experiment (investigating the  
quality of Akamai's server selection) in a matter  of days.

5. Utmost care is paid to security, including the rate limiting of  
requests to both any given measurement point and to any given  
measurement target.

DipZoom runs on Windows, Linux, and Mac OS platforms, and can be  
freely downloaded from http://dipzoom.case.edu/ .  The site also  
includes further details on the system and links to the mailing list  
and people involved.  Please send your comments to any of us.

We hope you will find DipZoom useful and fun.

Regards,
Misha Rabinovich.


From touch at ISI.EDU  Fri Jan  5 08:38:58 2007
From: touch at ISI.EDU (Joe Touch)
Date: Fri, 05 Jan 2007 08:38:58 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <459E2CF4.6030701@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de>
Message-ID: <459E7F22.2030907@isi.edu>


Detlef Bosau wrote:
> Hi.
> 
> When I asked whether we did sliding window in the Internet, I basically
> had a quite simple scenario in mind and basically I would like a comment
> on this one.
...
> The basic question is whether the use of a splitter may shorten the RTT
> seen by the sender to that degree, that the appropriate rate cannot be
> achieved by a sliding window protocol even if CWND were set to 1 MSS,
> the sender must hence be stalled from time to time to have the rate slow
> enough.

The window doesn't by itself determine rate; it's ACK clocking that
does. In high BW*delay product nets, the same stalling happens - you
send data, get an ACK, send more, get ACKs of that, etc - and the data
keeps bunching up at the source.

I.e., ACK clocking works only when the data-ACK loop experiences a
bottleneck. When it doesn't, things bunch up, and TCP doesn't 'match
rates' at all.

FWIW, the same thing happens when the receiver application doesn't drain
the incoming data fast enough. The receive buffers fill up, and the
sender is stalled. The same thing is happening here.
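
The stalling described here can be illustrated with a small tick-based simulation. It is only a sketch with made-up parameters: a receiver-side buffer of 16 segments, a fast sender that can emit up to 7 segments per 1 ms tick, and a slow consumer that removes one segment every 12 ticks. The advertised window is simply the free buffer space.

```python
# Sketch: a fast sender throttled only by the receiver's advertised window.
# All parameters are hypothetical and chosen for readability.
def simulate(ticks=120, buffer_segments=16, sender_burst=7, drain_every=12):
    """Return the number of segments the sender emits in each 1 ms tick."""
    queued = 0
    sent_per_tick = []
    for t in range(ticks):
        # The slow side removes one segment every `drain_every` ticks.
        if t > 0 and t % drain_every == 0 and queued > 0:
            queued -= 1
        window = buffer_segments - queued   # advertised window, in segments
        sent = min(sender_burst, window)    # a zero window stalls the sender
        queued += sent
        sent_per_tick.append(sent)
    return sent_per_tick

trace = simulate()
stalled_ticks = sum(1 for s in trace if s == 0)
print(trace[:14], "stalled ticks:", stalled_ticks)
```

The sender fills the buffer in an initial burst and is then released one segment at a time as buffer space frees up, so its long-run rate is set by the consumer rather than by its own window or ACK clock.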

...
> So, what should the splitter do?
> 
> 1. stall the sender periodically using zero window packets?
> 2. don't care, doesn't matter?
> 3. ??

Splitters are bad for other reasons, but as you said, let's ignore them
for this discussion.

It seems like the dominant effect is exactly what you expect - the
endpoint (the splitter, really) isn't experiencing the bottleneck, but
its "application" (the receiver on the modem) is too slow. So you get
bursty 'scheduling' of the sender based on availability of buffers at
the (IMO, real, or at least effective) receiver.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From detlef.bosau at web.de  Fri Jan  5 09:45:06 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 05 Jan 2007 18:45:06 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <459E7F22.2030907@isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
Message-ID: <459E8EA2.4010000@web.de>

Joe Touch wrote:
>   
>> The basic question is whether the use of a splitter may shorten the RTT
>> seen by the sender to that degree, that the appropriate rate cannot be
>> achieved by a sliding window protocol even if CWND were set to 1 MSS,
>> the sender must hence be stalled from time to time to have the rate slow
>> enough.
>>     
>
> The window doesn't by itself determine rate; it's ACK clocking that
> does. 

I'm totally with you.

In the scenario above, the splitter ACKs packets "too fast" when it 
does dumb spoofing.
In other words: Without splitting, the serialization delay of each link 
ensures that the sender is paced correctly via ACK clocking.
When a splitter is used, the ACK pacing mechanism can be undermined.
> In high BW*delay product nets, the same stalling happens - you
> send data, get an ACK, send more, get ACKs of that, etc - and the data
> keeps bunching up at the source.
>
> I.e., ACK clocking works only when the data-ACK loop experiences a
> bottleneck. When it doesn't, things bunch up, and TCP doesn't 'match
> rates' at all.
>   

This was a little bit too fast for me....

Shouldn't the ACKs be clocked by the TCP data packets, at least in 
symmetric paths? Thus, the ACK clocking should reflect the TCP rate 
which is achieved downstream?


> FWIW, the same thing happens when the receiver application doesn't drain
> the incoming data fast enough. The receive buffers fill up, and the
> sender is stalled. The same thing is happening here.
>
>   

Yes, absolutely. When a splitter is in use, the sending socket (directed 
to the final receiver) doesn't drain its incoming data fast enough.

It's an interesting question whether data of short-term flows can be 
buffered entirely at the splitter and then sent to the receiver at a 
rate the link can handle.
It's also interesting what happens to the final CLOSE ACK here, which is 
typically not spoofed in splitters to ensure proper ACK semantics.

> Splitters are bad for other reasons, but as you said, let's ignore them
> for this discussion..
>
>   

I just see that they are in use. And so I think one should weigh up the 
pros and cons here.
In the particular case of wide area mobile networks, I personally think 
splitters can be helpful because of the extremely irregular delivery 
times of datagrams.
I had great difficulty seeing a reason for this and found Thierry 
Klein's paper "Improved TCP Performance in Wireless IP Networks through 
Enhanced Opportunistic Scheduling Algorithms" (Globecom 2004) extremely 
interesting.

Perhaps the scheduling-induced variations in packet delivery times are 
the most distinguishing mark of mobile wide area networks compared to 
other network technologies. (I would be glad to get comments on this claim!)

> It seems like the dominant effect is exactly what you expect - the
> endpoint (the splitter, really) isn't experiencing the bottleneck, but
> its "application" (the receiver on the modem) is too slow. So you get
> bursty 'scheduling' of the sender based on availability of buffers at
> the (IMO, real, or at least effective) receiver.
>   

It's just interesting to see whether this is important / relevant / 
annoying.

Detlef


From touch at ISI.EDU  Fri Jan  5 09:48:06 2007
From: touch at ISI.EDU (Joe Touch)
Date: Fri, 05 Jan 2007 09:48:06 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <459E8EA2.4010000@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de>
Message-ID: <459E8F56.9070101@isi.edu>


Detlef Bosau wrote:
> Joe Touch wrote:
...
>> In high BW*delay product nets, the same stalling happens - you
>> send data, get an ACK, send more, get ACKs of that, etc - and the data
>> keeps bunching up at the source.
>>
>> I.e., ACK clocking works only when the data-ACK loop experiences a
>> bottleneck. When it doesn't, things bunch up, and TCP doesn't 'match
>> rates' at all.
> 
> This was a little bit too fast for me....
> 
> Shouldn't the ACKs be clocked by the TCP data packets, at least in
> symmetric paths? Thus, the ACK clocking should reflect the TCP rate
> which is achieved downstream?

It does - 'downstream' is really the splitter, i.e., the thing
generating the ACKs. Since the path to the splitter and back has no
bottleneck, there's no ACK pacing going on.

>> FWIW, the same thing happens when the receiver application doesn't drain
>> the incoming data fast enough. The receive buffers fill up, and the
>> sender is stalled. The same thing is happening here.
> 
> Yes, absolutely. When a splitter is in use, the sending socket (directed
> to the final receiver) doesn't drain its incoming data fast enough.
> 
> It's an interesting question whether data of short-term flows can be
> buffered entirely at the splitter and then sent to the receiver at a
> rate the link can handle.

Sure it can; that's what a true proxy does.

> It's also interesting what happens to the final CLOSE ACK here, which is
> typically not spoofed in splitters to ensure proper ACK semantics.

I don't understand "proper ACK semantics". The splitter destroys those.
The semantics that may be kept are at the connection level
(open/closed), but the semantics of data ACKs are irrevocably destroyed.

>> Splitters are bad for other reasons, but as you said, let's ignore them
>> for this discussion..
> 
> I just see that they are in use. And so I think one should weigh up the
> pros and cons here.
> In the particular case of wide area mobile networks, I personally think
> splitters can be helpful because of the extremely irregular delivery
> times of datagrams.
> I had great difficulty seeing a reason for this and found Thierry
> Klein's paper "Improved TCP Performance in Wireless IP Networks through
> Enhanced Opportunistic Scheduling Algorithms" (Globecom 2004) extremely
> interesting.
> 
> Perhaps the scheduling-induced variations in packet delivery times are
> the most distinguishing mark of mobile wide area networks compared to
> other network technologies. (I would be glad to get comments on this
> claim!)

Variations in delivery times can be handled via PEPs that don't spoof
ACKs, e.g., ones that pace the data and/or ACK paths, but don't actively
participate in the communication.

Joe
-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From detlef.bosau at web.de  Fri Jan  5 12:45:44 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 05 Jan 2007 21:45:44 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <459E8F56.9070101@isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
Message-ID: <459EB8F8.4060304@web.de>

Joe Touch wrote:
>> Shouldn't the ACKs be clocked by the TCP data packets, at least in
>> symmetric paths? Thus, the ACK clocking should reflect the TCP rate
>> which is achieved downstream?
>>     
>
> It does - 'downstream' is really the splitter, i.e., the thing
> generating the ACKs. Since the path to the splitter and back has no
> bottleneck, there's no ACK pacing going on.
>
>   

O.k. That's the general problem with splitting and ACK pacing. When you 
do dumb spoofing, the sender is not correctly paced by the ACKs. In case 
of an (imminent) buffer overrun at the splitter, the sender is throttled 
by TCP flow control.
>   
>> It's also interesting what happens to the final CLOSE ACK here, which is
>> typically not spoofed in splitters to ensure proper ACK semantics.
>>     
>
> I don't understand "proper ACK semantics". The splitter destroys those.
> The semantics that may be kept are at the connection level
> (open/closed), but the semantics of data ACKs are irrevocably destroyed.
>
>   

I am thinking of the semantics at the connection level, which I think are 
sufficient in many cases. In fact, I think the main problem is that a 
splitter introduces a single point of failure / hard state into the 
path: If a router fails, the flow may continue along an alternate path. 
If a splitter fails, the flow is dead because we cannot recover from the 
lost state.

However, we should look carefully at the technology in use: Particularly 
in mobile wireless networks, I'm not totally convinced (perhaps somebody 
can comment on this one?) that there are no single points of failure in 
the path, e.g. an SGSN in GPRS. In that case, the state is "hard" anyway 
and "making it harder", e.g. by putting a PEP at the SGSN, does not 
really worsen the situation.


>> Perhaps the scheduling-induced variations in packet delivery times are
>> the most distinguishing mark of mobile wide area networks compared to
>> other network technologies. (I would be glad to get comments on this
>> claim!)
>>     
>
> Variations in delivery times can be handled via PEPs that don't spoof
> ACKs, e.g., ones that pace the data and/or ACK paths, but don't actively
> participate in the communication.
>
>   

Really? I agree with you for the Remote Socket Architecture 
(Schlager/Wolisz) because that architecture actually does not split the 
connection but places the PEP mechanism at the application/socket interface.

Otherwise the problem is: When the bandwidth sender - splitter is, e.g., 
the average bandwidth / rate splitter-sender but far less than the 
maximum rate splitter / sender, then a simple router perhaps would hardly 
store any data and thus hardly equalize the rate / delivery times.
Thierry describes delay spikes of several seconds. If we think about 
UMTS, we can imagine a wireless link where nothing happens for up to 
several seconds - thus no data at all is clocked out from the sender - and 
then we have about 2 Mbps throughput for a short time - which is perhaps 
much more than the actual Internet path can carry. In such a scenario we 
want to have the router / splitter / PEP / whateverbox buffer the data 
and equalize the rate variations. Can this be achieved by pure pacing in 
one direction or the other?
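
To make the scenario above concrete, here is a minimal sketch of a box that buffers data arriving at a steady rate and forwards it over a link whose capacity varies per interval. All numbers are illustrative assumptions (a steady 500 kbit arriving per one-second interval, three stalled intervals, then 2 Mbit of capacity per interval), and `simulate_buffer` is a hypothetical helper, not something taken from a real PEP. It does not answer whether pure pacing could achieve the same; it only shows how much has to be stored for the burst to be usable.

```python
def simulate_buffer(arrival_bits, capacities_bits):
    """Buffer steady arrivals and drain them over a link of varying capacity.

    arrival_bits:    bits arriving from the Internet side in every interval
    capacities_bits: bits the wireless link can carry in each interval
    Returns (peak backlog in bits, list of bits delivered per interval).
    """
    backlog = 0
    peak = 0
    delivered = []
    for capacity in capacities_bits:
        backlog += arrival_bits          # data keeps arriving during a stall
        sent = min(backlog, capacity)    # the link takes what it can carry
        backlog -= sent
        delivered.append(sent)
        peak = max(peak, backlog)        # backlog left at end of interval
    return peak, delivered


if __name__ == "__main__":
    # Three stalled intervals, then two intervals with 2 Mbit of capacity.
    peak, delivered = simulate_buffer(500_000, [0, 0, 0, 2_000_000, 2_000_000])
    print("peak backlog (bits):", peak)
    print("delivered per interval (bits):", delivered)
```

With these assumed numbers the box has to hold 1.5 Mbit at the end of the stall, and the first good interval can then be filled completely. A box that stores nothing could deliver at most the 500 kbit that arrive during that interval.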

Detlef


From lynne at telemuse.net  Fri Jan  5 14:19:20 2007
From: lynne at telemuse.net (Lynne Jolitz)
Date: Fri, 5 Jan 2007 14:19:20 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459D5C2F.7010409@isi.edu>
Message-ID: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>

Good luck, but realize that "linux" is not a monolithic group like BSD was. There are many variations on the theme - some very responsive (and well-backed) and others running more hand-to-mouth. Once the genie is out of the bottle, expect a long relaxation time wrt issues in implementation.

Joe is right in his annoyance at the lack of testing and communication given the widespread deployment of linux and windows. It is irresponsible to put out a poorly thought-out networking change that could potentially and unwittingly cause severe congestion and unfairness. 

But it has been clear for at least a decade that the slower implement / test / prove / deploy cycle no longer is acceptable - not just because it is too slow or costly but because any delay in the release of any code, worthy or not, makes the releaser look like he's bogarting on the rest of the open source community. 

The days where IETF RFCs and tested releases were done by many of the same people are long gone. If it's important enough, perhaps it's time to take on the responsibility for correctness of operating systems and networking implementations within an accredited organization and certify such. 

But if it's not worth the time and effort for the academic side to take on this charge, the marketplace will have to serve instead. 

Lynne Jolitz

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net


> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Joe Touch
> Sent: Thursday, January 04, 2007 11:58 AM
> To: Joe Touch
> Cc: Ted Faber; l.andrew at ieee.org; Lloyd Wood;
> end2end-interest at postel.org
> Subject: Re: [e2e] Are we doing sliding window in the Internet?
> 
> 
> Finally, let me say that I agree with Ian that the best way to fix this
> issue now is to post to the Linux lists, which I will proceed to do.
> 
> I sincerely hope that Linux users on this list will track this and other
> IETF lists for such issues, and bring concerns to the Linux group
> themselves, rather than expecting "other" list members to do so.
> 
> We *each* fix our own systems (and we're not all Linux users), and this
> is (one of) the common place(s) we figure that all out.
> 
> -- 
> ----------------------------------------
> Joe Touch
> Sr. Network Engineer, USAF TSAT Space Segment
> 
> 


From gds at best.com  Sat Jan  6 13:38:14 2007
From: gds at best.com (Greg Skinner)
Date: Sat, 6 Jan 2007 21:38:14 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>;
	from lynne@telemuse.net on Fri, Jan 05, 2007 at 02:19:20PM -0800
References: <459D5C2F.7010409@isi.edu>
	<005501c73117$8c0cb3c0$6e8944c6@telemuse.net>
Message-ID: <20070106213814.A82315@gds.best.vwh.net>

On Fri, Jan 05, 2007 at 02:19:20PM -0800, Lynne Jolitz wrote:
> The days where IETF RFCs and tested releases were done by many of
> the same people are long gone. If it's important enough, perhaps it's
> time to take on the responsibility for correctness of operating
> systems and networking implementations within an accredited
> organization and certify such.   

Doesn't this just push the problem onto the accredited organization?
What would make the Linux communities more likely to interact with it?
Either they have their own accreditation/certification, or it's not an
issue WRT development/deployment.

--gregbo

From lynne at telemuse.net  Sat Jan  6 15:06:34 2007
From: lynne at telemuse.net (Lynne Jolitz)
Date: Sat, 6 Jan 2007 15:06:34 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070106213814.A82315@gds.best.vwh.net>
Message-ID: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net>

Yes, Greg. You're right. Buy-in is difficult to achieve and maintain, especially in open source. As I also went on to say in that same email you quote:
"But if it's not worth the time and effort for the academic side to take on this charge, the marketplace will have to serve instead."

People are very good at finding reasons to justify inaction on their part, and it is frustrating to even try for something better. That takes vision and risk.

If one were to set up such an arrangement with an eye towards the long-term, wouldn't it be wise to find an approach that would bring in parties and allow them to all benefit from an accord? Isn't it in the best interests of OS and networking developers, academics, and scientists to make sure things work well?

But that would require people to reach out to others, put skin in the game, and take a risk. It requires trust and mutual respect. It's much easier to complain and expect someone else to do the work. And it's much easier to ignore complaints because there is too much work to do already.

And that's why the marketplace is the default. It's not the best solution, but it is a solution.

Lynne Jolitz.

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net


> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Greg Skinner
> Sent: Saturday, January 06, 2007 1:38 PM
> To: Lynne Jolitz
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] Are we doing sliding window in the Internet?
> 
> 
> On Fri, Jan 05, 2007 at 02:19:20PM -0800, Lynne Jolitz wrote:
> > The days where IETF RFCs and tested releases were done by many of
> > the same people are long gone. If it's important enough, perhaps it's
> > time to take on the responsibility for correctness of operating
> > systems and networking implementations within an accredited
> > organization and certify such.   
> 
> Doesn't this just push the problem onto the accredited organization?
> What would make the Linux communities more likely to interact with it?
> Either they have their own accreditation/certification, or it's not an
> issue WRT development/deployment.
> 
> --gregbo
> 


From detlef.bosau at web.de  Sun Jan  7 04:33:44 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Sun, 07 Jan 2007 13:33:44 +0100
Subject: [e2e] Borat Science. Was: Re: Are we doing sliding window in the
 Internet?
In-Reply-To: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net>
References: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net>
Message-ID: <45A0E8A8.5080903@web.de>


Once again: NO!

<Bad Flame>

First of all: What is Linux?

A job application of mine once was rejected, one question was: Do you 
know Linux?

When it came to Unix, I mentioned several flavours of Linux-clones I'm 
familiar with - I forgot about Linux. Therefore, in the eyes of that 
employer,  I was an idiot.

Excuse me, I have used Linux at home since 1993; that's not that long, but 
it's perhaps longer than many kids in some human resources
department have used computers at all.

</Bad Flame>

O.k. For an even worse flame on this issue, please refer to RMS's 
well-known talk on Linux and Free Software.

I just remember that one day, even at some university, a department 
chair allegedly said that Linux was more realistic than NS2.
Here in Germany, we have a "joke". Roughly translated: At night, it's 
colder than outside.
That's the same scientific level.

What is "realistic"? What is _reality_?

I can only talk about standards and whether software is compliant with 
them or not.

The problem with Linux is that it is "positioned" on the market as a 
competitor for M$ products and that there are growing commercial 
interests behind it - and no adequate commercial responsibility and 
accountability at the same time. So, Linux lost its virginity when 
taken as a scientific research system, and it never achieved maturity 
when it comes to commercial accountability.

I use Linux because it is free and it is sufficient for my purposes. But I 
don't accept this "Linux religion" which appears to be continuously 
spreading.

The very point is a different one.

As I said some days ago, scientific research starts with a problem 
statement, then we investigate whether solutions exist or whether 
the problem can be solved at all, and evaluate solutions and approaches. 
Perhaps we consider new ones if they are better than existing ones or 
even the first ones to exist.

That is, to my understanding, proper science.

And it absolutely doesn't matter whether we run TCP/IP on Linux, M$ 
systems, AIX, HP-UX, SunOS or even the KA9Q stack.

So, what are we talking about here?

Should we do advertising for certain operating systems?

Or should we talk about end to end issues in distributed systems?

Here in Germany standards sometimes are respected the same way as an act 
of parliament. E.g. we have something called "Technischer 
Ueberwachungsverein", roughly: Technical Supervisory Association. If you 
own a car, you have to present it to this association every two years 
in order to make sure that your car complies with the technical 
regulations here in Germany. And if you don't do so, you are not allowed 
to use this car in public road traffic; otherwise it would be a 
criminal offense. And it absolutely doesn't matter in this context if 
you use a Volkswagen or a BMW. (Thanks to Professor Schrempp 
Mercedes-Benz does not exist any longer. There is some nostalgic 
trademark which reminds us of these cars.) So, even if you have a "star" on 
your hood this won't help you if the test badge is missing.

So, we do not experiment with different brakes, steering wheels etc. in 
public road traffic and count the victims of deadly accidents 
afterwards.

Instead we _first_ define standards, _then_ we make sure that cars used 
in Germany comply with these. Otherwise these cars must not be used. Period.

I once talked to a colleague who told me how this is handled in some 
country where he spent his vacation. IIRC they had an extremely 
scientific way of brake testing there: The experiment. Roughly speaking: 
Put a child against a wall, tell the driver to brake in time before the 
wall - and if the child is still alive afterwards the brake may have 
worked sufficiently well.

Sometimes, this approach is called "Borat Science".


Lynne Jolitz wrote:
> Yes, Greg. You're right. Buy-in is difficult to achieve and maintain, especially in open source. As I also went on to say in that same email you quote:
> "But if it's not worth the time and effort for the academic side to take on this charge, the marketplace will have to serve instead."
>
> People are very good at finding reasons to justify inaction on their part, and it is frustrating to even try for something better. That takes vision and risk.
>   

Excuse me, but what exactly do you call "inaction" here? I always see a 
vivid discussion here. Many papers are published - much more than I can 
read. Problems are identified and solved. Where is "inaction" here? In 
addition: When will the first M$-guy come to this discussion and 
claim that the academic community has to fix what they don't get handled 
in Redmond? Are you perhaps mixing up the task of industrial / commercial 
implementation with proper academic research?
> If one were to set up such an arrangement with an eye towards the long-term, wouldn't it be wise to find an approach that would bring in parties and allow them to all benefit from an accord? Isn't it in the best interests of OS and networking 

Of course! That's, to my understanding, the purpose of the IETF. _That's_ 
the venue.
> developers, academics, and scientists to make sure things work well?
>
> But that would require people to reach out to others, put skin in the game, and take a risk. It requires trust and mutual respect. It's much easier to complain and expect someone else to do the work. And it's much easier to ignore complaints because there is too much work to do already.
>   

Excuse me, I have no one to do my work. I'm a single unemployed male and 
I have to do _all_ of my work on my own. And perhaps, some day this will be 
recognized. If not? Bad luck.

So, _please_ don't tell me anything about risks before you know what 
you're talking about.

I try to take part in the academic discussion _without_ any help or 
assistance. When I try to publish a paper, I don't even know who will 
pay the possible conference fees. That's all my own risk. Perhaps, some 
time this will pay off. For the moment, it doesn't. However, there is no 
opportunity for me to get a job, so I try to do some scientific work.

_Without_ any help from the IETF or any others.

Perhaps, this requires doing some homework. When something does not 
work, you will even have to spend a night on it or a weekend.

But please don't talk about taking a risk here.

> And that's why the marketplace is the default. It's not the best solution, but it is a solution.
>
>   

The marketplace has thrown me out.

I'm a single male, unemployed for 3 years now, aged 43. For the 
marketplace, I'm no longer a human being. I graduated in 1992, so for our
employment centre and our human resources departments I'm regarded as 
an "unskilled worker".

So, I take a risk, i.e. that Joe throws me out of this list when I say 
this, but it is my honest opinion: Please leave me alone with this McKinsey 
attitude!


From simon at limmat.switch.ch  Sun Jan  7 05:24:20 2007
From: simon at limmat.switch.ch (Simon Leinen)
Date: Sun, 07 Jan 2007 14:24:20 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459AF57A.5080304@isi.edu> (Joe Touch's message of "Tue, 02 Jan
	2007 16:14:50 -0800")
References: <45980C60.9020405@web.de>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
Message-ID: <aaejq7j7gb.fsf@limmat.switch.ch>

Joe Touch writes:
> FYI,Internet MSS's are usually in the 500-byte range in general. A
> 5KB file would take 10 packets and be over by the 4th round.

Um, the Internet MSS is usually 1460 bytes, except where it is hacked
to between 1300 and 1400 bytes to avoid issues with broken Path MTU
Detection in the presence of links with an MTU slightly smaller than
1500 (mostly ADSL links).

Packets around 500 bytes have become quite rare on the Internet today.
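
As a worked check of the numbers in this exchange, the sketch below derives the MSS from the MTU and counts slow-start rounds for a small transfer. The assumptions are mine, not the posters': IPv4 and TCP headers of 20 bytes each without options, "5KB" taken as 5000 bytes, an initial window of one segment that doubles every round, no losses and no delayed ACKs. The function names are hypothetical.

```python
import math


def mss_for_mtu(mtu, ip_header=20, tcp_header=20):
    """MSS that fits into one packet of the given MTU (headers without options)."""
    return mtu - ip_header - tcp_header


def slow_start_rounds(file_bytes, mss, initial_window=1):
    """Segments needed for the file, and round trips to send them in slow start."""
    segments = math.ceil(file_bytes / mss)
    sent, window, rounds = 0, initial_window, 0
    while sent < segments:
        sent += window   # one window's worth of segments per round
        window *= 2      # idealized doubling, one ACK per segment
        rounds += 1
    return segments, rounds


if __name__ == "__main__":
    print("MSS for a 1500-byte MTU:", mss_for_mtu(1500))
    print("5000 bytes, MSS 500:  (segments, rounds) =", slow_start_rounds(5000, 500))
    print("5000 bytes, MSS 1460: (segments, rounds) =", slow_start_rounds(5000, 1460))
```

Under these assumptions a 500-byte MSS gives 10 segments sent over 4 rounds (1 + 2 + 4 + 8), while a 1460-byte MSS needs 4 segments and 3 rounds.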
-- 
Simon.

From touch at ISI.EDU  Sun Jan  7 08:28:02 2007
From: touch at ISI.EDU (Joe Touch)
Date: Sun, 07 Jan 2007 08:28:02 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <aaejq7j7gb.fsf@limmat.switch.ch>
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu>
	<459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu>
	<aaejq7j7gb.fsf@limmat.switch.ch>
Message-ID: <45A11F92.3000102@isi.edu>


Simon Leinen wrote:
> Joe Touch writes:
>> FYI,Internet MSS's are usually in the 500-byte range in general. A
>> 5KB file would take 10 packets and be over by the 4th round.
> 
> Um, the Internet MSS is usually 1460 bytes, except where it is hacked
> to between 1300 and 1400 bytes to avoid issues with broken Path MTU
> Detection in the presence of links with an MTU slightly smaller than
> 1500 (mostly ADSL links).
> 
> Packets around 500 bytes have become quite rare on the Internet today.

http://netweb.usc.edu/~rsinha/pkt-sizes/
http://tracer.csl.sony.co.jp/mawi/samplepoint-C/2005/200510250900.html

These show that 'better connected' sites see larger packet sizes (shown
in the USC traces), but that smaller packets are still used, and that the
average size depends on the protocol (CSL traces).

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From Anil.Agarwal at viasat.com  Sun Jan  7 10:39:47 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Sun, 7 Jan 2007 13:39:47 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu><459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu><aaejq7j7gb.fsf@limmat.switch.ch>
	<45A11F92.3000102@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A357E@VGAEXCH01.hq.corp.viasat.com>

 
Joe Touch wrote -
>>> FYI,Internet MSS's are usually in the 500-byte range in general. A
>>> 5KB file would take 10 packets and be over by the 4th round.
>>
>> Um, the Internet MSS is usually 1460 bytes, except where it is hacked
>> to between 1300 and 1400 bytes to avoid issues with broken Path MTU
>> Detection in the presence of links with an MTU slightly smaller than
>> 1500 (mostly ADSL links).
>>
>> Packets around 500 bytes have become quite rare on the Internet today.

> http://netweb.usc.edu/~rsinha/pkt-sizes/
> http://tracer.csl.sony.co.jp/mawi/samplepoint-C/2005/200510250900.html

> These show that 'better connected' sites see larger packet sizes (shown
> in the USC traces), but that smaller packets are still used, and that the
> average size depends on the protocol (CSL traces).

Even though smaller packet sizes are observed on the net,
depending on protocol and application, that does not imply
that the MSS or path MTU is small. Some applications simply send small
amounts of data at a time (telnet, HTTP GETs, etc.).
I suspect the MSS is on the order of 1300-1460 bytes,
even in these traces.
 
Regards,
Anil
 

From avg at kotovnik.com  Sun Jan  7 13:50:11 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Sun, 7 Jan 2007 13:50:11 -0800 (PST)
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net>
Message-ID: <Pine.LNX.4.44.0701071321520.19847-100000@gato.kotovnik.com>

On Sat, 6 Jan 2007, Lynne Jolitz wrote:

> But that would require people to reach out to others, put skin in the
> game, and take a risk. It requires trust and mutual respect. It's much
> easier to complain and expect someone else to do the work. And it's much
> easier to ignore complaints because there is too much work to do
> already.

> And that's why the marketplace is the default. It's not the best
> solution, but it is a solution.

Lynne - I think you meant "commercial" vs "community", not "market" 
vs "collective".  The oft-repeated notion that there's anything superior 
to market is complete nonsense.

Market is *any* kind of voluntary exchange and cooperation.  That
includes contributing resources and labor in order to gain social status,
reputation, or sense of belonging to a community.  Not all goods are
material, and not all exchanges in a marketplace are intermediated with
money (or can be priced).  There's really no boundary between "for-profit"
and "non-profit"  activities, and in the real-life commerce every activity
includes both - one gains not only profit, but also reputation,
recognition and such.

Everything else (i.e. tax-funded projects, work required by law, etc) is
fundamentally involuntary and cannot exist without threats of violence
towards non-cooperators or simply those who disagree.

This reduction to fundamentals not only shows that the market is the best
solution; it clearly shows that it is the only possible ethical solution.

Sorry for the off-topic.

--vadim


From lynne at telemuse.net  Sun Jan  7 14:29:28 2007
From: lynne at telemuse.net (Lynne Jolitz)
Date: Sun, 7 Jan 2007 14:29:28 -0800
Subject: [e2e] Are we doing sliding window in theInternet?
In-Reply-To: <45A0E8A8.5080903@web.de>
Message-ID: <000801c732ab$4b48b100$6e8944c6@telemuse.net>

I think this rant illustrates my point to Greg perfectly as to the pitfalls of getting buy-in in open source and working in a respectful and considerate manner. :-)
Lynne Jolitz.
----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net


> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Detlef Bosau
> Sent: Sunday, January 07, 2007 4:34 AM
> To: end2end-interest at postel.org
> Cc: Lynne Jolitz; frank.duerr; Daniel Minder
> Subject: [e2e] Borat Science. Was: Re: Are we doing sliding window in
> theInternet?
> 
> 
> 
> Once again: NO!
> 
> <Bad Flame>
> 
> First of all: What is Linux?
> 
> A job application of mine once was rejected, one question was: Do you 
> know Linux?
> 
> When it came to Unix, I mentioned several flavours of Linux-clones I'm 
> familiar with - I forgot about Linux. Therefore, in the eyes of that 
> employer, I was an idiot.
> 
> Excuse me, I have used Linux at home since 1993; that's not that long, but 
> it's perhaps longer than many kids in some human resources
> department have used computers at all.
> 
> </Bad Flame>
> 
> O.k. For an even worse flame on this issue, please refer to RMS's 
> well-known talk on Linux and Free Software.
> 
> I just remember that one day, even at some university, a 
> department chair had allegedly said that Linux was more realistic than NS2.
> Here in Germany, we have a "joke". Roughly translated: At night, it's 
> colder than outside.
> That's the same scientific level.
> 
> What is "realistic"? What is _reality_?
> 
> I only can talk about standards and whether a software is compliant to 
> them or not.
> 
> The problem with Linux is that it is "positioned" on the market as a 
> competitor for M$ products and that there are growing commercial 
> interests behind it - and no adequate commercial responsibility and 
> accountability at the same time. So, Linux lost its virginity when it is 
> taken as a scientific research system and it never achieved maturity 
> when it comes to commercial accountability.
> 
> I use Linux because it is free and it is sufficient for my purposes. But I 
> don't accept this "Linux religion" which appears to be continuously 
> spreading.
> 
> The very point is a different one.
> 
> As I said some days ago, scientific research starts with a problem 
> statement; then we investigate whether solutions exist or whether 
> the problem can be solved at all, and evaluate solutions and approaches. 
> Perhaps we consider new ones if they are better than existing ones or 
> even the first ones to exist.
> 
> That is, to my understanding, proper science.
> 
> And it absolutely doesn't matter whether we run TCP/IP on Linux, M$ 
> systems, AIX, HP-UX, SunOS or even the KA9Q stack.
> 
> So, what are we talking about here?
> 
> Should we do advertising for certain operating systems?
> 
> Or should we talk about end to end issues in distributed systems?
> 
> Here in Germany standards sometimes are respected the same way as an act 
> of parliament. E.g. we have something called "Technischer 
> Ueberwachungsverein", roughly: Technical Supervisory Association. If you 
> own a car, you have to present it to this association every two years 
> in order to make sure that your car complies with the technical 
> regulations here in Germany. And if you don't do so, you are not allowed 
> to use this car in public road traffic; otherwise it would be a 
> criminal offense. And it absolutely doesn't matter in this context if 
> you use a Volkswagen or a BMW. (Thanks to Professor Schrempp 
> Mercedes-Benz does not exist any longer. There is some nostalgic trade 
> mark which reminds us of these cars.) So, even if you have a "star" on 
> your hood this won't help you if the test badge is missing.
> 
> So, we do not experiment with different brakes, steering wheels etc. in 
> public road traffic and count the victims of deadly accidents 
> afterwards.
> 
> Instead we _first_ define standards, _then_ we make sure that cars used 
> in Germany comply with these. Otherwise these cars must not be used. Period.
> 
> I once talked to a colleague who told me how this is handled in some 
> country where he spent his vacation. IIRC they had an extremely 
> scientific way for brake testing there: The experiment. Roughly speaking: 
> Put a child against a wall, tell the driver to brake in time before the 
> wall - and if the child is still alive afterwards the brake may have 
> worked sufficiently well.
> 
> Sometimes, this approach is called "Borat Science".
> 
> 
> Lynne Jolitz wrote:
> > Yes, Greg. You're right. Buy-in is difficult to achieve and 
> > maintain, especially in open source. As I also went on to say in 
> > that same email you quote:
> > "But if it's not worth the time and effort for the academic 
> > side to take on this charge, the marketplace will have to serve instead."
> >
> > People are very good at finding reasons to justify inaction on 
> > their part, and it is frustrating to even try for something 
> > better. That takes vision and risk.
> >   
> 
> Excuse me, but what exactly do you call "inaction" here? I always see a 
> vivid discussion here. Many papers are published - much more than I can 
> read. Problems are identified and solved. Where is "inaction" here? In 
> addition: When will the first M$ guy come to this discussion and 
> claim that the academic community has to fix what they don't get handled 
> in Redmond? Do you happen to mix up the task of industrial / commercial 
> implementation and proper academic research?
> > If one were to set up such an arrangement with any eye towards 
> > the long-term, wouldn't it be wise to find an approach that would 
> > bring in parties and allow them to all benefit from an accord? 
> > Isn't it in the best interests of OS and networking 
> 
> Of course! That's, to my understanding, the purpose of the IETF. _That's_ 
> the venue.
> > developers, academics, and scientists to make sure things work well?
> >
> > But that would require people to reach out to others, put skin 
> > in the game, and take a risk. It requires trust and mutual 
> > respect. It's much easier to complain and expect someone else to 
> > do the work. And it's much easier to ignore complaints because 
> > there is too much work to do already.
> >   
> 
> Excuse me, I have no one to do my work. I'm a single unemployed male and 
> I have to do _all_ of my work on my own. And perhaps, some day this will be 
> recognized. If not? Bad luck.
> 
> So, _please_ don't tell me anything about risks before you know what 
> you're talking about.
> 
> I try to take part in the academic discussion _without_ any help or 
> assistance. When I try to publish a paper, I don't even know who will 
> pay the possible conference fees. That's all my own risk. Perhaps, some 
> time this will pay. For the moment, it doesn't. However, there is no 
> opportunity for me to get a job, so I try to do some scientific work.
> 
> _Without_ any help by the IETF or any others.
> 
> Perhaps, this requires doing some homework. When something does not 
> work, you will even have to spend a night on it or a weekend.
> 
> But please don't talk about taking a risk here.
> 
> > And that's why the marketplace is the default. It's not the 
> > best solution, but it is a solution.
> >
> >   
> 
> The marketplace has thrown me out.
> 
> I'm a single male, unemployed for 3 years now, aged 43. For the 
> marketplace, I'm no longer a human being. I graduated in 1992, so by our
> employment centre and our human resources departments I'm regarded as 
> an "unskilled worker".
> 
> So, I take a risk, i.e. that Joe throws me out of this list when I say 
> this, but it is my honest opinion: Please leave me alone with this McKinsey 
> attitude!
> 
> 
> 
> 


From lynne at telemuse.net  Sun Jan  7 15:34:05 2007
From: lynne at telemuse.net (Lynne Jolitz)
Date: Sun, 7 Jan 2007 15:34:05 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701071321520.19847-100000@gato.kotovnik.com>
Message-ID: <000c01c732b4$5244ef60$6e8944c6@telemuse.net>

My comments were in the context of harnessing e2e expertise to make sure that experimental networking changes made in releases considered carefully congestion and fairness. If that cannot be achieved, then the marketplace will prevail, with unpredictable consequences to network performance and reliability.
I'm afraid a discussion of general economic paradigms is off topic. :-)
Lynne Jolitz.
----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net

> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Vadim Antonov
> Sent: Sunday, January 07, 2007 1:50 PM
> To: Lynne Jolitz
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] Are we doing sliding window in the Internet?
> 
> 
> On Sat, 6 Jan 2007, Lynne Jolitz wrote:
> 
> > But that would require people to reach out to others, put skin in the
> > game, and take a risk. It requires trust and mutual respect. It's much
> > easier to complain and expect someone else to do the work. And it's much
> > easier to ignore complaints because there is too much work to do
> > already.
> 
> > And that's why the marketplace is the default. It's not the best
> > solution, but it is a solution.
> 
> Lynne - I think you meant "commercial" vs "community", not "market" 
> vs "collective".  The oft-repeated notion that there's anything superior 
> to market is complete nonsense.
> 
> Market is *any* kind of voluntary exchange and cooperation.  That
> includes contributing resources and labor in order to gain social status,
> reputation, or sense of belonging to a community.  Not all goods are
> material, and not all exchanges in a marketplace are intermediated with
> money (or can be priced).  There's really no boundary between "for-profit"
> and "non-profit"  activities, and in the real-life commerce every activity
> includes both - one gains not only profit, but also reputation,
> recognition and such.
> 
> Everything else (i.e. tax-funded projects, work required by law, etc) is
> fundamentally involuntary and cannot exist without threats of violence
> towards non-cooperators or simply those who disagree.
> 
> This reduction to fundamentals not only shows that the market is the best
> solution; it clearly shows that it is the only possible ethical solution.
> 
> Sorry for the off-topic.
> 
> --vadim
> 
> 


From detlef.bosau at web.de  Sun Jan  7 15:47:02 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 08 Jan 2007 00:47:02 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701071321520.19847-100000@gato.kotovnik.com>
References: <Pine.LNX.4.44.0701071321520.19847-100000@gato.kotovnik.com>
Message-ID: <45A18676.3070408@web.de>

Vadim Antonov wrote:
>
> This reduction to fundamentals not only shows that the market is the best
> solution; it clearly shows that it is the only possible ethical solution.
>
>   

Oh yeah.

Meritocracy as the only ethical form of government. Social darwinism as 
the only acceptable basis for a modern society.

Didn't we see in Europe during the last decade that this does _not_ work?
> Sorry for the off-topic.
>
>   
I think many will agree here when I propose to return to end to end 
topics. In fact, I should not answer this post and I apologize for 
doing so. However, in my personal situation, which I wrote about, comments 
like yours hurt bitterly. Please don't see this as pure criticism. I did 
not think the way I do all my life. It's simply my personal experience 
of life which makes me reconsider some of my opinions. And thus I 
simply share my actual point of view that "market" is neither the only 
solution for all kinds of problems nor the best solution. However, it's 
quite often a very inhuman solution.

Detlef


From L.Wood at surrey.ac.uk  Sun Jan  7 16:12:24 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Mon, 08 Jan 2007 00:12:24 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701071321520.19847-100000@gato.kotovnik.com
 >
References: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net>
	<Pine.LNX.4.44.0701071321520.19847-100000@gato.kotovnik.com>
Message-ID: <200701080013.AAA16294@cisco.com>

At Sunday 07/01/2007 13:50 -0800, Vadim Antonov wrote:
>The oft-repeated notion that there's anything superior 
>to market is complete nonsense.

human society is the thing that enables markets to exist; markets are established and made to be free and fair by law and regulation. Without the threat of punishment for misdeeds, there would be no free or fair markets. And there are many things that need to be done that markets do not and cannot address - and when markets are applied to them, they fail miserably.

get a clue.

L.

<http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood at surrey.ac.uk> 

From avg at kotovnik.com  Sun Jan  7 20:35:29 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Sun, 7 Jan 2007 20:35:29 -0800 (PST)
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <000c01c732b4$5244ef60$6e8944c6@telemuse.net>
Message-ID: <Pine.LNX.4.44.0701072015120.20178-100000@gato.kotovnik.com>

On Sun, 7 Jan 2007, Lynne Jolitz wrote:

> My comments were in the context of harnessing e2e expertise to make sure
> that experimental networking changes made in releases considered
> carefully congestion and fairness. If that cannot be achieved, then the
> marketplace will prevail, with unpredictable consequences to network
> performance and reliability.

I guess in the end the network will be designed properly - i.e. resistant
to any kind of behavior from the end hosts (including malicious). It is
not that hard to achieve.  The best-effort delivery with no fairness
enforcement by the network itself is asking for trouble, and I'm surprised 
that it still persists.

If the network is enforcing fairness, there is nothing a misbehaving host
(or millions of misbehaving hosts) could do to degrade performance as seen
by other users (except as a part of coordinated DDoS attack on a 
specific target).

How hard is it to turn the Fair Queueing knob to "on" on the gateways?
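
As an illustration only: the allocation that fair queueing approximates on a
single link is the max-min fair share. The sketch below is not a queueing
discipline and not any vendor's implementation; the function name and the
example numbers are made up for this note. It only shows why a host that
demands far more than its share cannot take capacity away from modest users.

```python
def max_min_fair(capacity, demands):
    """Return the max-min fair allocation of `capacity` among `demands`.

    Flows asking for less than an equal share get what they ask for;
    the leftover is split equally among the remaining (greedy) flows.
    """
    alloc = [0.0] * len(demands)
    # Visit flows from the smallest demand to the largest.
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    remaining = float(capacity)
    while active:
        share = remaining / len(active)
        i = active[0]
        if demands[i] <= share:
            # This flow is satisfied in full and leaves the competition.
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.pop(0)
        else:
            # Every remaining flow wants more than an equal share.
            for j in active:
                alloc[j] = share
            break
    return alloc


if __name__ == "__main__":
    # One modest flow, one moderate flow, one misbehaving flow.
    print(max_min_fair(10, [2, 8, 100]))
```

With a capacity of 10 and demands of 2, 8 and 100, the modest flow keeps its
2 units and the other two are capped at 4 each, regardless of how much the
third one asks for.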

The mass deployment of supposedly poorly behaving stacks is either a
problem (in which case ISPs and equipment vendors will do the homework
needed to protect their networks - or leave the ground to smarter
competitors), or a non-issue (in which case nothing changes).  In both
cases, there's no problem in the long term, with "long" being closer to days
than years.

So, why exactly should we care?

--vadim


From detlef.bosau at web.de  Mon Jan  8 02:18:29 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 08 Jan 2007 11:18:29 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701072015120.20178-100000@gato.kotovnik.com>
References: <Pine.LNX.4.44.0701072015120.20178-100000@gato.kotovnik.com>
Message-ID: <45A21A75.7080506@web.de>

Vadim Antonov wrote:
>
>
> I guess in the end the network will be designed properly - i.e. resistant
> to any kind of behavior from the end hosts (including malicious). It is
>   

And when will this be?

When will this trial and error phase come to an end? Apparently, you 
have all the time in the world.

> not that hard to achieve.  The best-effort delivery with no fairness
> enforcement by the network itself is asking for trouble, and I'm surprised 
> that it still persists.
>
>   

You probably want to read the congavoid paper or RFC 2581.

You will learn from that, that fairness enforcement _does_ exist.
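
A toy sketch of the argument, under strong assumptions that are not taken
from RFC 2581: two flows share one bottleneck, have the same RTT, and both
see every congestion signal at the same time. With additive increase and
multiplicative decrease the difference between their windows shrinks at every
decrease. The function name and the parameters are purely illustrative.

```python
def aimd_two_flows(w1, w2, capacity, rounds, incr=1.0, decr=0.5):
    """Idealized AIMD: both flows see every congestion signal at once."""
    for _ in range(rounds):
        if w1 + w2 > capacity:
            # Multiplicative decrease shrinks the gap between the windows.
            w1, w2 = w1 * decr, w2 * decr
        else:
            # Additive increase leaves the gap unchanged.
            w1, w2 = w1 + incr, w2 + incr
    return w1, w2


if __name__ == "__main__":
    a, b = aimd_two_flows(1.0, 80.0, capacity=100.0, rounds=500)
    print(f"windows after 500 rounds: {a:.2f} and {b:.2f}")
```

Starting from very unequal windows (1 and 80), both flows end up with almost
the same window. The sketch says nothing about flows with different RTTs or
different TCP flavours.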


> If the network is enforcing fairness, there is nothing a misbehaving host
> (or millions of misbehaving hosts) could do to degrade performance as seen
> by other users (except as a part of coordinated DDoS attack on a 
>   
                                     
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I appreciate that you name even one problem yourself.
> specific target).
>
> How hard it is to turn the Fair Queueing knob to "on" on the gateways?
>
>
> So, why exactly should we care?
>
>   
I don't know why "we" should care. But I'll frankly tell you what you 
should care for.
First of all, you most probably want to care for a good text book on 
networking because what you write on this topic simply makes my hair 
stand on end.
The second is a piece of personal advice I already tried to give you yesterday. 
Perhaps you should reconsider your opinions from time to time based upon 
your personal life experience. Times are changing and so are opinions. 
It is always a good idea to put some kind of low pass filter on opinions 
and to avoid both extreme positions and simple answers to complex 
questions.

Finally, and I really think so, we should keep politics out of this list.

When we had birthday parties in our family or similar occasions, I was 
always given strong advice by my father concerning topics of discussion:
"NO sports, NO politics, NO religion."
You can talk about anything but these.

Believe me: My father was perfectly right.

And for this list: You're welcome to contribute to the discussion of end 
to end issues.

I apologize for posting on this issue again. Please, Lynne, Vadim, let 
us return to the subject of this list again. Not only for the benefit of 
ourselves but for the benefit of all the other readers. Particularly 
the thread on sliding window is an interesting one and I learned a lot 
from it. Perhaps, others find it interesting as well, at least there are 
far too many contributions for a boring thread. It would be a pity if 
people would leave the thread or even the list because of continuous off 
topic posts on politics and similar issues.

Thanks.

Detlef


From sisalem at fokus.fraunhofer.de  Mon Jan  8 06:41:39 2007
From: sisalem at fokus.fraunhofer.de (sisalem@fokus.fraunhofer.de)
Date: Mon, 8 Jan 2007 15:41:39 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A21A75.7080506@web.de>
References: <Pine.LNX.4.44.0701072015120.20178-100000@gato.kotovnik.com>
	<45A21A75.7080506@web.de>
Message-ID: <532648004.20070108154139@mail.iptel.org>

Hello,
>> not that hard to achieve.  The best-effort delivery with no fairness
>> enforcement by the network itself is asking for trouble, and I'm surprised 
>> that it still persists.
>>
> You probably want to read the congavoid paper or RFC 2581.

> You will learn from that, that fairness enforcement _does_ exist.
Just a short remark: I would assume that the definition of fairness
here is that two TCP connections with the same RTT and packet size
would receive the same bandwidth share.
Hence, fairness enforcement is only partially done. Two
TCP sessions with different congestion avoidance schemes (e.g., one
with SACK and another one with Reno) will not achieve the same
bandwidth share under the same RTT conditions (whether this is to be
considered unfair, though, is another issue which has more to do with
philosophy). And a UDP flow is not interested in fairness at all
either.

Regarding the input about enforcing fairness in the network: I think
that the painful experience with ATM and ABR has already taught us that
network-based fairness enforcement schemes are theoretically great but
too complex to be of practical use.

cheers


-- 
Best regards,
 Dorgham                            mailto:sisalem at iptel.org


From detlef.bosau at web.de  Sun Jan  7 16:05:44 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 08 Jan 2007 01:05:44 +0100
Subject: [e2e] Hiccups and scheduling in mobile networks
In-Reply-To: <459EB8F8.4060304@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>	<459E2CF4.6030701@web.de>
	<459E7F22.2030907@isi.edu>	<459E8EA2.4010000@web.de>
	<459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de>
Message-ID: <45A18AD8.8090903@web.de>


Joe Touch wrote:
>> Variations in delivery times can be handled via PEPs that don't spoof
>> ACKs, e.g., ones that pace the data and/or ACK paths, but don't actively
>> participate in the communication.
>>
>>   
>

And my humble comment was:

> Really? I agree with you for the Remote Socket Architecture 
> (Schlager/Wolisz) because that architecture actually does not split 
> the connection but places the PEP mechanism at the application/socket 
> interface.
>
> Otherwise the problem is: When the bandwidth sender - splitter is, 
> e.g., the average bandwidth / rate splitter-sender but far less than 
> the maximum rate splitter / sender, then a simple router perhaps would 
> hardly store any data and thus hardly equalize the rate / delivery times.
> Thierry describes delay spikes of several seconds. If we think about 
> UMTS, we can imagine a wireless link where nothing happens for up to 
> several seconds - thus even no data is clocked out from the sender - 
> and then we have about 2 Mbps throughput for a short time - which is 
> perhaps much more than the actual Internet path can carry. In such a 
> scenario we want to have the router / splitter / PEP / whateverbox 
> buffer the data and equalize the rate variations. Can this be achieved 
> by pure pacing in the one or other direction?
>
> Detlef
>
>
>

O.k. So, I see: Splitting is unsellable :-) So the question is whether 
we really need it.

So, this weekend I spent some time adding hiccups to a quite complex 
network scenario:

Sender-----(internet)--------BS----(mobile net)-------Receiver

And the mobile net suffers from hiccups :-)

What I would like to know (and AFAIK Andreas dealt with questions like 
these, therefore I put him on the cc: list) is "how bad" these hiccups may 
become. As I said before, Thierry Klein published a paper at Globecom 
2004 on this issue.  There, he observed delay spikes of up to two seconds.

For the moment, I simply model the wireless link as a link with a 
constant high bandwidth (e.g. 10 Mbps) which reflects its _physical_ rate
and I add hiccup times to the serialization delay (i.e. txtime in NS2). 
These are drawn from a two point distribution: Either the hiccup time is 
zero or it is 1 second. The probabilities are chosen such that a 
given average throughput is achieved.
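
A rough sketch of how the hiccup probability could be derived, assuming the
hiccup is added independently to every packet's serialization delay. With
packet size S (bits), physical rate R, hiccup length H and hiccup probability
p, the mean time per packet is S/R + p*H, so a target average throughput T
requires p = (S/T - S/R) / H. This is plain Python, not NS2 code; the
function names, the 12000 bit packet and the 384 kbps target are illustrative
assumptions only.

```python
import random


def hiccup_probability(packet_bits, link_rate_bps, target_rate_bps, hiccup_s=1.0):
    """Probability of a hiccup per packet that yields the target average rate."""
    base = packet_bits / link_rate_bps            # serialization delay, no hiccup
    p = (packet_bits / target_rate_bps - base) / hiccup_s
    if not 0.0 <= p <= 1.0:
        raise ValueError("target rate not reachable with this hiccup length")
    return p


def simulate_throughput(packet_bits, link_rate_bps, p, n_packets,
                        hiccup_s=1.0, seed=None):
    """Send n_packets back to back and return the measured average rate in bps."""
    rng = random.Random(seed)
    base = packet_bits / link_rate_bps
    total_time = 0.0
    for _ in range(n_packets):
        # Two point distribution: the extra delay is either 0 or hiccup_s.
        extra = hiccup_s if rng.random() < p else 0.0
        total_time += base + extra
    return n_packets * packet_bits / total_time


if __name__ == "__main__":
    p = hiccup_probability(12000, 10e6, 384e3)
    rate = simulate_throughput(12000, 10e6, p, n_packets=200000, seed=1)
    print(f"hiccup probability {p:.5f}, measured rate {rate / 1e3:.1f} kbps")
```

For these example values roughly three packets in a hundred would be hit by a
one second hiccup. Whether independent per-packet hiccups are a fair picture
of a real scheduler is exactly the open question.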

Of course that's extremely simplified. However: Is this reasonable as a 
first approach? I would appreciate any comment on this one.

I would like to study different pacing techniques in this scenario, 
_intentionally_ without splitting.

AFAIK, there is a variety of scheduling algorithms available for 
networks like GPRS or UMTS. So, the question is whether we have a, if 
extremely rough, "worst case model" to get a feeling for what TCP has to 
cope with. The idea of my model above is to insert constant, say 1 
second, delay spikes randomly into the flow, just in a way that I can 
estimate the average throughput on the link.

Is this completely weird? Or does it sound reasonable?

Thanks

Detlef


From braden at ISI.EDU  Mon Jan  8 11:28:21 2007
From: braden at ISI.EDU (Bob Braden)
Date: Mon, 08 Jan 2007 11:28:21 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040157.BAA18111@cisco.com>
References: <459C4BF1.6060004@isi.edu> <45980C60.9020405@web.de>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu>
	<aa7d2c6d0701031437ub03c83amf1df2a731b39ded7@mail.gmail.com>
	<459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com>
	<459C4BF1.6060004@isi.edu>
Message-ID: <5.1.0.14.2.20070108112451.00ac3988@boreas.isi.edu>


Lloyd Wood wrote:


>Such citations would be informational rather than normative, and therefore 
>optional.
>
>Informational references tend to get left out of RFCs.
>


Indeed.  By the standards of academia, IETF protocol publications, even the
best of them, often suggest willful ignorance of earlier related work.  And
even within academic Computer Science, the level of reinvention is sometimes
deplorable.

Bob Braden


From braden at ISI.EDU  Mon Jan  8 11:33:45 2007
From: braden at ISI.EDU (Bob Braden)
Date: Mon, 08 Jan 2007 11:33:45 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C4DD3.3010106@isi.edu>
References: <200701040027.AAA13758@cisco.com>
	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>
	<459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de>
	<459AF57A.5080304@isi.edu>
	<aa7d2c6d0701021749g505f40ecq188e715882d4bc17@mail.gmail.com>
	<459B1B09.40301@isi.edu>
	<aa7d2c6d0701022115s310953a9uf7283711baa520b8@mail.gmail.com>
	<459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov>
	<459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu>
	<459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com>
Message-ID: <5.1.0.14.2.20070108113020.03285388@boreas.isi.edu>


>
>
>The question is "under what conditions is it permissible to override a
>SHOULD". I would hope that would be clarified in an update to 2119, but
>don't know what the state of that doc is...
>
>Joe

In its original usage in RFC 1122-1123, SHOULD was applied where we could
imagine relatively unusual or extreme conditions where the MUST might not
apply.  But the intent was that anyone who overrode a SHOULD ought to
be able to present a credible argument to his/her peers to justify this
deviation.  In other words, you had better have a "DAMNED GOOD" (technical
term) reason for it.

Bob Braden


>----------------------------------------
>Joe Touch
>Sr. Network Engineer, USAF TSAT Space Segment
>


From detlef.bosau at web.de  Mon Jan  8 14:04:30 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 08 Jan 2007 23:04:30 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <532648004.20070108154139@mail.iptel.org>
References: <Pine.LNX.4.44.0701072015120.20178-100000@gato.kotovnik.com>
	<45A21A75.7080506@web.de> <532648004.20070108154139@mail.iptel.org>
Message-ID: <45A2BFEE.8010401@web.de>

sisalem at fokus.fraunhofer.de wrote:
>> You will learn from that, that fairness enforcement _does_ exist.
>>     
> just a short remark: I would assume that the definition of fairness
> here is that two TCP connections with the same RTT and packet size
> would receive the same bandwidth share.
> Hence, fairness enforcement is only partially done. 
I agree with you here. However, one could understand Vadim as saying 
that we do no fairness consideration at all. And that's simply wrong.


> Two
> TCP sessions with different congestion avoidance schemes (e.g., one
> with SACK and another one with Reno) will not achieve the same
> bandwidth share under the same RTT conditions (whether this is to be
> considered unfair though is another issue which has more to do with
> philosophy). And a UDP flow is not interested in fairness at all as
> well.
>
>   

 From my point of view, you mix up at least three issues here.

1. It's a basic decision to make whether the Internet is built 
hierarchically or heterarchically. A heterarchical design is of course much 
more robust than a hierarchical one. A hierarchical design is only robust 
against the failure of up to n (n to be defined) nodes. A heterarchical 
design will still work when there is even _one_ path left between two 
nodes which want to communicate. I don't know the discussions of the 
early 70s, because I was a schoolboy then, but I can imagine that 
robustness was a major issue then.

If the Internet were designed hierarchically, you could provide admission 
control and QoS assignments etc. and then you would have fairness matching
any criteria you desire.

In a heterarchical design, it's much more difficult, if possible at all, to 
enforce arbitrary fairness schemes.


2. That TCP/Tahoe will run unfair against TCP/Reno is no structural 
problem. It goes without argument that TCP flavours are basically fair 
if all parties use the same one. In my opinion, this is one decisive 
argument not to play around with an arbitrary number of TCP flavours and 
see what happens but to carefully consider which flavour is deployed and 
which consequences this will have.

3. It's always a concern whether protocols are responsive or not, or whether 
they are even TCP friendly. In this context please allow the question: What 
is a "UDP flow"? If you use UDP, the task of fairness / congestion 
control / TCP friendliness / responsiveness is passed to the application.
> regarding the input about enforcing fairness in the network. I think
> that the painful experience ATM and ABR taught us already, that
> network based fairness enforcement schemes are theoretically great but
> practically too complex to be of practical use
>
>   
That's perhaps even one more reason to use a heterarchical scheme.

And as I stated quite some time ago: When I consider all the arguments 
why the Internet is supposed not to work, I'm always surprised that it 
works quite fine :-)

Detlef


From avg at kotovnik.com  Mon Jan  8 19:07:54 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Mon, 8 Jan 2007 19:07:54 -0800 (PST)
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A21A75.7080506@web.de>
Message-ID: <Pine.LNX.4.44.0701081601240.24593-100000@gato.kotovnik.com>


On Mon, 8 Jan 2007, Detlef Bosau wrote:

> When will this trial and error phase come to an end? Apparently, you 
> have all time of the world.

No, I have some experience in networking since when 2400bps was "high
speed", and remember what computers with no integrated circuits in them
looked like.  And because of that I know that a lot of things considered
hard or impossible to do in a few years aren't.
 
> You probably want to read the congavoid paper or RFC 2581.
> 
> You will learn from that, that fairness enforcement _does_ exist.

Ah, you don't understand what I said - namely that fairness enforcement should
be done in the network, and not by the software in end hosts.
 
Besides, TCP is not fair.  For example, long-RTT flows always lose to 
short-RTT flows in non-stationary (i.e. real-life) scenarios.
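
For a rough feel of the RTT bias, here is a sketch using the well-known
steady-state square-root approximation (Mathis et al.), rate ~
(MSS/RTT) * sqrt(3/2) / sqrt(p). Note the assumption: this is a stationary
model with periodic loss, so it only illustrates the direction of the effect
and does not cover the non-stationary case mentioned above. The function name
and the example values are hypothetical.

```python
from math import sqrt


def tcp_rate_bps(mss_bytes, rtt_s, loss_prob):
    """Square-root approximation of steady-state TCP throughput in bit/s."""
    return (mss_bytes * 8 / rtt_s) * sqrt(1.5 / loss_prob)


if __name__ == "__main__":
    # Same MSS and same loss probability, RTTs of 20 ms and 200 ms.
    short = tcp_rate_bps(1500, 0.020, 0.01)
    long_ = tcp_rate_bps(1500, 0.200, 0.01)
    print(f"short-RTT flow gets {short / long_:.1f} times the long-RTT rate")
```

Under this model two flows seeing the same loss rate share bandwidth in
inverse proportion to their RTTs.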

> First of all, you most probably want to care for a good text book on 
> networking because what you write on this topic simply makes my hair 
> stands on end.

Ah, looks like you haven't been around for a long time; and I did projects
in fields other than networking in the recent years.  Or you'd be more
inclined to listen to what I say. After all, I'm the guy who built
backbones in all 24 time zones (including the first commercial T-3
backbone, and the first backbone which did CIDR), wrote networking code
for BSD kernel when I worked at BSD Inc. (though, admittedly, not the TCP
stack - hacking TCP stack is my present occupation <grin>), and invented
the only practical method for doing packet routing at speeds over 10Gbps.

> The second is a personal advice I tried to give you already yesterday.

I'm not an unemployed engineer. In fact, I made my first million many
years ago. My personal life is quite satisfactory.  Why exactly should I
listen to your personal advice?

As for politics, well. Anyone who's doing any engineering should have some 
grasp of basic economics - because the success of engineering projects 
often depends not on technical merits of the design but rather on its 
economics.  An engineer who's oblivious to the economic and sociological 
ramifications of his decisions is, let me put it mildly, incompetent.

In the topic at hand the issue of which part of the overall system 
(network or end-host software) performs the fairness enforcement is 
neutral from the technical point of view.  Technically, it'll work either 
way.

It is not neutral from the point of view of an economist - having a shared 
resource with no admission control creates the tragedy of the commons. Meaning 
that it creates incentives for people to cheat and overexploit the shared 
resource, until it becomes useless (this, incidentally, is the problem 
with socialism in general).

Therefore the appeal to developers to be conscientious in the way they
design network stacks and applications is not going to work.  On the other
hand, long-haul ISPs have pretty good reason to protect the value of their
resources - i.e. the networks.  So far, they do not perceive
overexploitation as a problem.  That will change as end-users en masse
start to exchange huge video files - and, consequently, start to
use software which does cheat - it does make a lot of difference for them.
Any P2P software which opens multiple TCP sessions for simultaneous
download essentially overrides the rough fairness of the cooperative
congestion control.

The end-point based congestion control and fairness enforcement, while
quite widely deployed, were a bad architectural decision - economically.  
People who made that decision didn't pay much attention to the economics -
they were doing research, not doing business. (To their credit back then
even getting data through reliably, without congestion collapses, was a
big deal; and this was a workable approach. Things like FQ and RED were 
invented much later - and back then doing fancy packet processing in 
backbone gateways was out of the question. Heck, people still think doing 
longest-prefix search with patricia tries is a good idea, though we're 
no longer in the "horror, we're running out of 16Mb RAM on the darn 
backbone router" era.)

Well, the reality is starting to catch up - the name of the game in the
ISP business is no longer "grab as much ground as you can and damn the
cost" but, rather, "drive the costs down".  The profit margins are getting
slim, and packet transport is no longer a novelty, but simply another
commodity.  It is no longer feasible just to throw bandwidth at the
problem; there's not going to be another mad rush to lay fiber anytime
soon.

--vadim


From avg at kotovnik.com  Mon Jan  8 20:08:18 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Mon, 8 Jan 2007 20:08:18 -0800 (PST)
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <532648004.20070108154139@mail.iptel.org>
Message-ID: <Pine.LNX.4.44.0701082002160.24593-100000@gato.kotovnik.com>

On Mon, 8 Jan 2007 sisalem at fokus.fraunhofer.de wrote:

> regarding the input about enforcing fairness in the network. I think
> that the painful experience ATM and ABR taught us already, that
> network based fairness enforcement schemes are theoretically great but
> practically too complex to be of practical use

The ATM problems are/were due to its fundamental dependency on
virtual circuits (and inability to route them at high rates), and having
the whole bandwidth reservation boondoggle as a design requirement.

FQ does not require either.

--vadim


From tim at ivisit.com  Mon Jan  8 20:52:43 2007
From: tim at ivisit.com (Tim Dorcey)
Date: Mon, 08 Jan 2007 20:52:43 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701081601240.24593-100000@gato.kotovnik.com>
Message-ID: <012f01c733aa$0067ad80$0300a8c0@int.eyematic.com>

> Any P2P software which opens multiple TCP sessions for simultaneous
> download essentially overrides the rough fairness of the cooperative
> congestion control.

I wonder how much BitTorrent performance is due to this effect?  Might it do
almost as well if a receiver opened up multiple TCP sessions to the best
single source?

I get the point that accessing multiple sources simultaneously deals with
asymmetry in upload/download speeds.  But, something makes me think this
washes out in the aggregate if enough torrents are running.  I am ignorant
on actual network technology though.  Is the asymmetric upload/download speed
common with consumer broadband a function of the last mile link technology?
Or, something else?

Tim


From dga+e2e at cs.cmu.edu  Mon Jan  8 21:35:11 2007
From: dga+e2e at cs.cmu.edu (Dave Andersen)
Date: Tue, 09 Jan 2007 00:35:11 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701081601240.24593-100000@gato.kotovnik.com>
References: <Pine.LNX.4.44.0701081601240.24593-100000@gato.kotovnik.com>
Message-ID: <45A3298F.7070502@cs.cmu.edu>

Vadim Antonov wrote:
> On Mon, 8 Jan 2007, Detlef Bosau wrote:
> 
>> First of all, you most probably want to care for a good text book on 
>> networking because what you write on this topic simply makes my hair 
>> stands on end.

And in return, might I kindly suggest:

http://www.amazon.com/Emily-Posts-Etiquette-16th-Peggy/dp/0062700782

> It is not neutral from the point of view of an economist - having shared 
> resource with no admission control creates the tragedy of commons. Meaning 
> that it creates incentives for people to cheat and overexploit the shared 
> resource, until it becomes useless (this, incidentally, is the problem 
> with socialism in general).

Though in the case of TCP, it takes a certain amount of effort to cheat.
 Absent an easy to use mechanism in a popular OS, most people aren't
going to do it.  If you will, there's a certain cost to cheating (be
that the cost of tweaking your stack, writing a new protocol, or
installing some "accelerator" program that does it for you).

> Therefore the appeal to developers to be conscientious in the way they
> design network stacks and applications is not going to work.  On the other
> hand, long-haul ISPs have pretty good reason to protect the value of their
> resources - i.e. the networks.  So far, they do not perceive
> overexploitation as a problem.  That will change as end-users en mass
> start to exchange huge video files - and, consequently, are starting to
> use software which does cheat - it does make a lot of difference for them.
> Any P2P software which opens multiple TCP sessions for simultaneous
> download essentially overrides the rough fairness of the cooperative
> congestion control.
> 
> The end-point based congestion control and fairness enforcement, while
> quite widely deployed, were a bad architectural decision - economically.  
> People who made that decision didn't pay much attention to the economics -
> they were doing research, not doing business. (To their credit back then

But if you're making an economic argument, you have to consider all of
the costs.  There is a cost to enforcement in the network, in hardware
and complexity.  There is a cost to billing by usage, both in actual
costs and in customer satisfaction.

There most likely exists a point at which the costs of enforcement or
the costs of accounting are lower than the costs imposed by cheating
users.  But in an environment where capacity is still increasing
exponentially and where clueful network operators and programmers are
not getting any cheaper, it's not clear to me when we'll reach that
point.  Some people may argue we already have;  I don't think that we're
there _for the majority of uses_.  It may well be that there are
applications that want to pay more for better service today (voip,
remote open heart surgery), but it's not clear yet that the economic
benefit to ISPs for satisfying that class of apps is worth the costs.

(Particularly when most of the voip people can usually be satisfied by
simply doing prioritization at the edge.)

It's very hard to quantify the costs of things like "complexity", "more
code", and "users prefer flat-rate billing", but they do exist.

> Well, the reality is starting to catch up - the name of the game in the
> ISP business is no longer "grab as much ground as you can and damn the
> cost" but, rather, "drive the costs down".  The profit margins are getting
> slim, and the packet transport is no longer novelty, but simply another
> commodity.  It is no longer feasible just to throw bandwidth at the
> problem; there's not going to be another mad rush to lay fiber anytime
> soon.

The nice thing about today's environment is that the fiber is already in
the ground.  Adding more capacity is doable by "only" upgrading the
transceivers, adding more wavelengths, upgrading to faster multimillion
dollar routers, etc. :)

I suspect we're saying the same thing from different perspectives, but
have possibly different opinions about where we are on the cost curve.

  -Dave


From avg at kotovnik.com  Mon Jan  8 21:44:27 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Mon, 8 Jan 2007 21:44:27 -0800 (PST)
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A315F4.90500@cs.cmu.edu>
Message-ID: <Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com>

On Mon, 8 Jan 2007, Dave Andersen wrote:

> http://www.amazon.com/Emily-Posts-Etiquette-16th-Peggy/dp/0062700782

Oh, I'm never the first to use ad hominem, but I also won't let anyone 
try that on me without getting a taste of their own medicine.
 
> Though in the case of TCP, it takes a certain amount of effort to cheat.
>  Absent an easy to use mechanism in a popular OS, most people aren't
> going to do it.

Cheating TCP is very simple - it is sufficient to open several TCP
sessions. All software written specifically to download large files does
that.
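
A rough sketch of the arithmetic behind this, assuming an idealized
bottleneck where every TCP flow ends up with an equal share and every
honest user runs exactly one flow (real TCP only approximates this).
The function name is invented for the illustration:

```python
def cheater_share(honest_users: int, cheater_flows: int) -> float:
    """Fraction of a bottleneck taken by one host that opens several flows.

    Assumes each flow gets an equal share of the link and that every
    honest user contributes exactly one flow.
    """
    total_flows = honest_users + cheater_flows
    return cheater_flows / total_flows


if __name__ == "__main__":
    # Nine honest users plus one host opening k parallel sessions.
    for k in (1, 4, 16):
        print(f"{k:2d} flows -> {cheater_share(9, k):.3f} of the link")
```

With one flow the tenth host gets a tenth of the link; with more
parallel sessions its share grows at the expense of everyone else,
without touching the TCP stack at all.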

> But if you're making an economic argument, you have to consider all of
> the costs.  There is a cost to enforcement in the network, in hardware
> and complexity.  There is a cost to billing by usage, both in actual
> costs and in customer satisfaction.

Actually, I didn't talk of usage-based billing. Customers tend to dislike 
it (people like to have predictable expenses), and switch to flat-rate 
plans whenever they can afford them.

What is really needed is fairness enforcement, not usage accounting. In a
fair network you pay for the ability to have a guaranteed use of some
fraction of network capacity, plus use of proportionally allocated unused
capacity.  Ideally, the fee should be proportional to the guaranteed
fraction. It does not have to be ideal, just somewhat effective.
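
One possible reading of "guaranteed fraction plus proportionally
allocated unused capacity" can be sketched as below. This is only an
illustration under stated assumptions: the function name, the
dictionary layout, and the rule that spare capacity is split in
proportion to the guaranteed fractions of users who still want more
are all choices made for this sketch, not a description of any
deployed scheme. Guaranteed fractions are assumed to be positive.

```python
def allocate(capacity, guaranteed, demand):
    """Split `capacity` among users.

    guaranteed: dict user -> guaranteed fraction of capacity (sum <= 1, each > 0)
    demand:     dict user -> offered load, in the same units as capacity
    """
    # Step 1: everyone gets what they ask for, up to their guaranteed amount.
    alloc = {u: min(demand[u], guaranteed[u] * capacity) for u in guaranteed}
    spare = capacity - sum(alloc.values())
    active = {u for u in guaranteed if demand[u] - alloc[u] > 1e-12}

    # Step 2: hand out spare capacity in proportion to the guaranteed
    # fractions of the users that are still unsatisfied, and repeat
    # until nothing is left or nobody wants more.
    while spare > 1e-12 and active:
        total_weight = sum(guaranteed[u] for u in active)
        granted = 0.0
        for u in list(active):
            offer = spare * guaranteed[u] / total_weight
            take = min(offer, demand[u] - alloc[u])
            alloc[u] += take
            granted += take
            if demand[u] - alloc[u] <= 1e-12:
                active.discard(u)
        spare -= granted
    return alloc


if __name__ == "__main__":
    shares = allocate(
        100.0,
        guaranteed={"a": 0.5, "b": 0.3, "c": 0.2},
        demand={"a": 10.0, "b": 100.0, "c": 100.0},
    )
    print(shares)
```

In the example, user "a" uses only part of its guarantee, and the
capacity it leaves idle is divided between "b" and "c" in the ratio of
their guaranteed fractions.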
 
> There most likely exists a point at which the costs of enforcement or
> the costs of accounting are lower than the costs imposed by cheating
> users.  But in an environment where capacity is still increasing
> exponentially and where clueful network operators and programmers are
> not getting any cheaper, it's not clear to me when we'll reach that
> point.  

Mmm... demand is expanding faster than capacity. Right now the choke 
point is distribution networks, but that is slowly (in the US) being fixed.
Currently DSL providers in the US have something like 1:30 oversubscription, 
and P2P has the capacity to soak all of that. In the past year the DSL 
service in major population centers got noticeably slower during peak 
times, and the customer dissatisfaction will eventually force ISPs to 
decrease the oversubscription.

The backbone capacity has hard physical limits - getting smaller 
dispersion in the fiber or reducing size of WDM frequency bands can go
only so far; the remaining option (just put more fibers) is generally
limited by what's already in the ground - with no prospect of another 
dot-com style financial insanity on the horizon.

Besides, "lay more fiber" is not exponential, it's linear in bandwidth to 
cost ratio.

> (Particularly when most of the voip people can usually be satisfied by
> simply doing prioritization at the edge.)

Yep. That's because right now backbones are faster than edge - given the
present duty cycle.  The duty cycle is changing from 2-3% to 20-30% as
video over Internet becomes popular.  This will shift (or is already 
shifting) the bottleneck back to the backbones - to the place where it was 
10-15 years ago.
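
To put rough numbers on the duty-cycle argument: if access capacity is
sold at some multiple of the shared capacity behind it, the average
offered load relative to that shared capacity is simply the
oversubscription ratio times the duty cycle. The sketch below reuses
the 1:30 figure quoted above for DSL purely as an illustrative ratio;
applying it to this case is an assumption, and the function name is
made up.

```python
def offered_load_ratio(oversubscription: float, duty_cycle: float) -> float:
    """Average offered load divided by the shared upstream capacity.

    oversubscription: sold access capacity / shared capacity (e.g. 30 for 1:30)
    duty_cycle:       fraction of time an access line is actually busy
    """
    return oversubscription * duty_cycle


if __name__ == "__main__":
    for duty in (0.02, 0.03, 0.20, 0.30):
        ratio = offered_load_ratio(30, duty)
        state = "fits" if ratio <= 1.0 else "overloaded"
        print(f"duty cycle {duty:4.0%}: load/capacity = {ratio:.1f} ({state})")
```

Under these assumptions a 2-3% duty cycle keeps the shared capacity
below saturation, while 20-30% asks for several times what is there.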
 
> It's very hard to quantify the costs of things like "complexity", "more
> code", and "users prefer flat-rate billing", but they do exist.

The funny part is that most routers can do FQ out of the box. Just enabling 
that will reduce the misbehaving stack/application problem to the point of 
insignificance.

A better design would track FQ weights on a per-prefix basis (and sum them 
when routes are aggregated) to improve fairness on larger scales.
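
A minimal sketch of the "sum the weights when routes are aggregated"
idea, using Python's standard ipaddress module. The prefixes are
documentation ranges, the weights are invented, and the helper names
are hypothetical; how a real router would store or exchange such
weights is not specified here. All prefixes are assumed to be of the
same address family.

```python
import ipaddress


def aggregate_weight(weights, aggregate):
    """Sum the FQ weights of all more-specific prefixes covered by `aggregate`."""
    agg = ipaddress.ip_network(aggregate)
    return sum(
        w for prefix, w in weights.items()
        if ipaddress.ip_network(prefix).subnet_of(agg)
    )


def link_shares(weights):
    """Fraction of a congested link each prefix would get under weighted FQ."""
    total = sum(weights.values())
    return {prefix: w / total for prefix, w in weights.items()}


if __name__ == "__main__":
    per_prefix = {
        "192.0.2.0/25": 2,      # hypothetical weights
        "192.0.2.128/25": 1,
        "198.51.100.0/24": 4,
    }
    # Upstream of the aggregation point only 192.0.2.0/24 is visible,
    # carrying the summed weight of its two halves.
    print(aggregate_weight(per_prefix, "192.0.2.0/24"))
    print(link_shares({"192.0.2.0/24": 3, "198.51.100.0/24": 4}))
```

Because the weight follows the prefix rather than the flow, opening
more TCP sessions from inside a prefix does not enlarge that prefix's
share of the link.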

> The nice thing about today's environment is that the fiber is already in
> the ground.  Adding more capacity is doable by "only" upgrading the
> transcievers, adding more wavelengths, upgrading to faster multimillion
> dollar routers, etc. :)

Unfortunately, it is not that simple. You cannot pack information more
densely than the Shannon limit for a given level of noise, you cannot
increase S/N by
pumping more power into fibers without causing non-linearity and things
like Raman scattering.  So the way to expand is to put more equipment in
parallel and reduce leg distances. It means the expensive things like
building more amplifier stations in the middle of nowhere, and beefing up
CO space, power, and cooling.  The high-speed stuff is hot, and power
budget quickly gets to megawatt range.
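
The Shannon part of this argument can be made concrete with the
standard capacity formula C = B * log2(1 + S/N). The sketch below uses
a normalized bandwidth of 1 Hz and arbitrary S/N values chosen only
for illustration; it says nothing about fiber non-linearities, which
make matters worse than the formula alone suggests.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon capacity in bit/s for a channel of given bandwidth and linear S/N."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Each doubling of signal power (at fixed noise) buys roughly one
    # extra bit/s per Hz once S/N is large: capacity grows only
    # logarithmically with power.
    for snr in (1000, 2000, 4000, 8000):
        print(f"S/N = {snr:5d}: {shannon_capacity(1.0, snr):.3f} bit/s/Hz")
```

So even ignoring non-linear effects, pumping in twice the power yields
a fixed additive gain rather than twice the capacity.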
 
All the while prices on residential access are getting down to a few dozen
dollars per Mbps of downlink capacity.  The market is not growing very fast in
financial terms. So it is either cost-cutting or out-of-business.

There's a huge disparity between capacity of PCs to source/sink traffic
(the modern desktop CPUs can easily run 200-300Mbps of TCP traffic with a
suitable NIC) and the capacity of the network.  This creates, well, an
interesting situation - the demand is potentially huge.

> I suspect we're saying the same thing from different perspectives, but
> have possibly different opinions about where we are on the cost curve.

Yep. But at least it is helpful to think about economics rather than go 
wishing that the world was perfect and everybody did the Right Thing:)

--vadim


From detlef.bosau at web.de  Mon Jan  8 23:50:42 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Tue, 09 Jan 2007 08:50:42 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701081601240.24593-100000@gato.kotovnik.com>
References: <Pine.LNX.4.44.0701081601240.24593-100000@gato.kotovnik.com>
Message-ID: <45A34952.5090809@web.de>

Vadim Antonov wrote:
>> nt _does_ exist.
>>     
>
> Ah, you don't understand I said - namely that fairness enforcement should
> be done in the network, and not by the software in end hosts.
>  
> Besides, TCP is not fair.  For example, long-RTT flows always lose to 
> short-RTT flows in non-stationary (i.e. real-life) scenarios.
>
>   

Please read my comments on Dorgham Sisalem's post yesterday.

That long-RTT flows lose to short-RTT flows results from the probing 
scheme used in TCP: When a flow increases its window one segment per 
round, a short RTT flow increases faster than a long RTT flow. However, 
we must use some probing scheme and this probing scheme should be 
adaptive to the path. And of course we have to take into account a 
flow's RTT for a probing scheme because we must take into account how 
fast reactions of a network on probing will be visible.
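
The effect of "one segment per round" on flows with different RTTs can
be seen in a toy fluid model. The sketch below is a deliberate
simplification: two flows share one bottleneck, each adds one segment
to its window per RTT, and both halve their windows whenever the
combined rate exceeds the link capacity (synchronized loss, no slow
start, no timeouts). All parameter values and the function name are
assumptions chosen for the illustration.

```python
def simulate(rtt_short=0.02, rtt_long=0.2, capacity=1000.0,
             duration=50.0, dt=0.001):
    """Return average throughput (segments/s) of the short- and long-RTT flow."""
    rtts = [rtt_short, rtt_long]
    cwnd = [1.0, 1.0]          # congestion windows, in segments
    sent = [0.0, 0.0]          # segments delivered so far
    for _ in range(int(duration / dt)):
        rates = [cwnd[i] / rtts[i] for i in range(2)]
        if sum(rates) > capacity:
            # Bottleneck overloaded: both flows see loss and halve.
            cwnd = [max(1.0, w / 2.0) for w in cwnd]
            rates = [cwnd[i] / rtts[i] for i in range(2)]
        for i in range(2):
            sent[i] += rates[i] * dt
            # Additive increase: one segment per round trip.
            cwnd[i] += dt / rtts[i]
    return sent[0] / duration, sent[1] / duration


if __name__ == "__main__":
    short, long_ = simulate()
    print(f"short-RTT flow: {short:.1f} seg/s, long-RTT flow: {long_:.1f} seg/s")
```

In this model the short-RTT flow probes faster and also converts a
given window into more throughput, so it ends up with most of the
bottleneck.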

WRT leaving fairness to the network: I have no first hand experience 
with ABR. But I think, Dorgham told us about the experiences here yesterday.

To make a long story short: I think it's already said in the Twelve 
Basic Network Truths but I think it's generic:
There are always arbitrarily many simple and wrong solutions to complex 
problems :-)

BTW: Just a pointer to literature: For a network based congestion 
control approach (hopefully I understood this work correctly) you should 
read the PhD thesis by Srinivasan Keshav. IIRC this is some really 
interesting work on resource allocation in a complex network.

Detlef


From Arnaud.Legout at sophia.inria.fr  Tue Jan  9 01:07:07 2007
From: Arnaud.Legout at sophia.inria.fr (Arnaud Legout)
Date: Tue, 09 Jan 2007 10:07:07 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <012f01c733aa$0067ad80$0300a8c0@int.eyematic.com>
References: <012f01c733aa$0067ad80$0300a8c0@int.eyematic.com>
Message-ID: <45A35B3B.3020702@sophia.inria.fr>

Hello,

Tim Dorcey wrote:
>
> I wonder how much BitTorrent performance is due to his effect?  Might it do
> almost as well if a receiver opened up multiple TCP sessions to the best
> single source?
>
> I get the point that accessing multiple sources simultaneously deals with
> asymmetry in upload/download speeds.  But, something makes me think this
> washes out in the aggregate if enough torrents are running.  I am ignorant
> on actual network technology though.  Is the asymetric upload/download speed
> common with consumer broadband a function of the last mile link technology?
> Or, something else?

This is not the point with BT. It is not a web client doing parallel
downloads from open servers. It is a P2P protocol that manages to enforce
cooperation among selfish peers. This is the main reason for its
efficiency. If you want to get data you have to give data. The faster
you give, the faster you receive, thus a strong sharing incentive.

If you are interested, you can get papers on experimental evaluation 
of BT from my web page.

Concerning parallel download, you can read this insightful paper:
P. Rodriguez, E. W. Biersack, "Dynamic Parallel-Access to Replicated 
Content in the Internet". In IEEE/ACM Transactions on Networking, August 
2002 (also in IEEE Infocom 2000).
http://www.research.microsoft.com/~pablo/papers/paraload_ton.pdf

In particular, the authors evaluate parallel download from a single 
source or from multiple sources.
The major conclusion is that with dynamic parallel download you don't 
have to know which server is the best.
The best server can even change over time; this is transparent and 
still optimal with dynamic parallel download.


Regards,
Arnaud.

-- 
Arnaud Legout, Ph.D.

INRIA Sophia Antipolis - Planète  Phone : 00.33.4.92.38.78.15
2004 route des lucioles - BP 93   Fax   : 00.33.4.92.38.79.78
06902 Sophia Antipolis CEDEX      E-mail: arnaud.legout at sophia.inria.fr
FRANCE                            Web   : http://www-sop.inria.fr/planete/Arnaud.Legout/index.html


From touch at ISI.EDU  Tue Jan  9 06:59:29 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 09 Jan 2007 06:59:29 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com>
References: <Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com>
Message-ID: <45A3ADD1.9070709@isi.edu>


Vadim Antonov wrote:
> On Mon, 8 Jan 2007, Dave Andersen wrote:
> 
>> http://www.amazon.com/Emily-Posts-Etiquette-16th-Peggy/dp/0062700782
> 
> Oh, I'm never the first to use ad hominem, but I also won't let anyone to 
> try that on me without getting taste of their own medicine.

All,

Please folks, let's keep personal attacks out of this, or at least off
the list. There's enough passion about the technical material to offend,
anyway ;-)

As to medicine, anyone using ad hominems - whether initiated OR in
response - will jeopardize their unmoderated list posting privileges.

Joe (as list admin)

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From L.Wood at surrey.ac.uk  Tue Jan  9 07:38:24 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Tue, 09 Jan 2007 15:38:24 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com
 >
References: <45A315F4.90500@cs.cmu.edu>
	<Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com>
Message-ID: <200701091538.PAA00410@cisco.com>

At Monday 08/01/2007 21:44 -0800, Vadim Antonov wrote:

>What is really needed is fairness enforcement, not usage accounting. 

Vadim, didn't you just send me a lot of emails advocating free markets sans all enforcements, which just skew said free markets unfairly?

Surely usage accounting leads to cost-based accounting and a valid free market, and enforcement is not required?

(snorts.)

L.

>Yep. But at least it is helpful to think about economics rather than go 
>wishing that the world was perfect and everybody did the Right Thing:)
>
>--vadim

From touch at ISI.EDU  Tue Jan  9 09:02:25 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 09 Jan 2007 09:02:25 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>
References: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>
Message-ID: <45A3CAA1.9010505@isi.edu>


Lynne Jolitz wrote:
> But if it's not worth the time and effort for the academic side to
> take on this charge, the marketplace will have to serve instead.

It's not whether academics want to spend the time and effort. Many are
already giving it for projects they prefer (e.g., FreeBSD in my case);
others have none to give (note the dearth of academics on the IESG,
which requires letters of 80% support).

I.e., the effort of volunteers is subject to its own market as well.

However, the primary tension seems to be that:
	- standards bodies rely on emissaries from
	development communities

	- development communities rely on volunteers

This may appear to suggest that the two communities are competing for
volunteers, but that's not the case. We all *must* come together to work
on standards; the same is not true for particular OS's.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From jg at laptop.org  Tue Jan  9 10:35:20 2007
From: jg at laptop.org (Jim Gettys)
Date: Tue, 09 Jan 2007 13:35:20 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A3CAA1.9010505@isi.edu>
References: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>
	<45A3CAA1.9010505@isi.edu>
Message-ID: <1168367720.4840.92.camel@localhost>

There is a fundamental divide that has to be overcome.

With a few exceptions (Ted Ts'o comes to mind), there have been few Linux
people who have also been exposed to actively participating in the IETF.

The culture has been that there are IETF (and other) specifications that
the Linux community read and implement.  And, as you note, they are
(often) volunteers, though these days, a large fraction of the key
developers are full time employees of various companies.

If this community wants to bridge this divide, I'd recommend some active
outreach.  Having worked in both communities, it is remarkable how few
faces are in common.

One opportunity is next week at Linux Conf Australia (in Sydney).  A
year ago, Van Jacobson gave the best talk I've attended in more than a
decade in New Zealand at LCA (it was the best talk of the conference,
and given twice as a result), and caused quite a bit of a stir and
ferment among the Linux networking people.  This kind of cross
fertilization is healthy for both communities, I believe.

Now I'll throw some stones at some of the academic research I've seen
done on Linux.

One of the fundamental tenets of Linux development is its continual
nature.  I've seen some very good academic work end up being entirely
ignored since, by the time the work was done, the work (which was based
on what had become a several-year-stale version of Linux) was hopeless
to integrate into Linux.

If you *really* want research that can be taken advantage of by Linux,
you have to understand Linux's development model, and be willing to pay
the price to keep up with ongoing development, and figure out how to get
from where Linux is, to where it should be in an incremental fashion. 

Particularly since the Linux 2.6 series started, "big bang" integrations
of large changes into the system never occur; it is always stepwise
evolution, and you have to work in this fashion, as part of the
development community.
                                   Regards,
                                        - Jim


On Tue, 2007-01-09 at 09:02 -0800, Joe Touch wrote:
> 
> Lynne Jolitz wrote:
> > But if it's not worth the time and effort for the academic side to
> > take on this charge, the marketplace will have to serve instead.
> 
> It's not whether academics want to spend the time and effort. Many are
> already giving it for projects they prefer (e.g., FreeBSD in my case);
> others have none to give (note the dearth of academics on the IESG,
> which requires letters of 80% support).
> 
> I.e., the effort of volunteers is subject to its own market as well.
> 
> However, the primary tension seems to be that:
> 	- standards bodies rely on emissaries from
> 	development communities
> 
> 	- development communities rely on volunteers
> 
> This may appear to suggest that the two communities are competing for
> volunteers, but that's not the case. We all *must* come together to work
> on standards; the same is not true for particular OS's.
> 
> Joe
> 
-- 
Jim Gettys
One Laptop Per Child


From dga+ at cs.cmu.edu  Tue Jan  9 10:55:40 2007
From: dga+ at cs.cmu.edu (David Andersen)
Date: Tue, 9 Jan 2007 13:55:40 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com>
References: <Pine.LNX.4.44.0701082042480.24593-100000@gato.kotovnik.com>
Message-ID: <9AF36FB5-3460-4895-A0C0-E755DDB200FC@cs.cmu.edu>

[Wow, sorry, this is long.  And I think I'm on e2e with the wrong  
address, so this might get held up for moderation.]

On Jan 9, 2007, at 12:44 AM, Vadim Antonov wrote:

> On Mon, 8 Jan 2007, Dave Andersen wrote:
>
>
>> Though in the case of TCP, it takes a certain amount of effort to  
>> cheat.
>>  Absent an easy to use mechanism in a popular OS, most people aren't
>> going to do it.
>
> Cheating TCP is very simple - it is sufficient to open several TCP
> sessions. All software written specifically to download large files  
> does
> that.

- If everyone does it, is it cheating?
- If it's only a small constant factor, is it cheating?  (It's  
certainly still TCP-friendly, though TCP-fair is a more stringent  
definition.)
- If instead of running p2p software, I just download 10 programs in  
parallel instead of 1 program in ten-parallel, is it cheating?

I ask not to poke at your argument, but more to expose a fuzziness in  
the very definition of end-to-end fairness.  The meaning of "flow A  
and flow B interact fairly" is well-defined.  The meaning of  
"application A and application B interact fairly" is less clear.  By  
the time you get to the host level, it's out the window (what if the  
host is a proxy server for 1000s of clients?).

Combine this with the difficulty of determining the direction of  
value flow for Internet packets, and I think you've got an incredibly  
difficult problem.  It may well be that what we have today is the  
best solution:  An over-engineered core and endpoints that are  
limited by the capacity of the access link they purchase and by the  
limited demand that most people have.*

(* -- A recent thread on Nanog is interesting in this regard:  The  
Skype people are starting a real-time TV/video/whatever p2p streaming  
service.  If it becomes as popular as they hope / as Skype has, it's  
quite possible that the demand will go through the roof.  I don't  
pretend to know if they're right, of course.)

>> But if you're making an economic argument, you have to consider  
>> all of
>> the costs.  There is a cost to enforcement in the network, in  
>> hardware
>> and complexity.  There is a cost to billing by usage, both in actual
>> costs and in customer satisfaction.
>
> Actually, I didn't talk of usage-based billing. Customers tend to  
> dislike
> it (people like to have predictable expenses), and switch to flat-rate
> plans whenever they can afford them.

I know.  I was giving that as an example of a cost of enforcement.   
There's also a cost to doing fairness in the network.

>> There most likely exists a point at which the costs of enforcement or
>> the costs of accounting are lower than the costs imposed by cheating
>> users.  But in an environment where capacity is still increasing
>> exponentially and where clueful network operators and programmers are
>> not getting any cheaper, it's not clear to me when we'll reach that
>> point.
>
> Mmm... demand is expanding faster than capacity. Right now the choke
> point is distribution networks, but that is slowly (in US) being  
> fixed.
> Currently DSL providers in US have something like 1:30  
> oversubscription,
> and P2P has the capacity to soak all of that. In the recent year  
> the DSL
> service in major population centers got noticeably slower during peak
> times, and the customer dissatisfaction will eventually force ISPs to
> decrease the oversubscription.

Eh, there are still multiple factors involved.  We're still in a  
phase where DSL is still gaining because it's replacing the still- 
very-present dialup:

http://www.pewinternet.org/PPF/r/184/report_display.asp

(The Pew Internet & American Life project claims that from March 2005  
- 2006, 75% of broadband growth came from "current users switching  
from dial-up to broadband."  The total # of homes with broadband  
access grew by 40% during this period.)

The portion of the growth that is fueled by an increasing # of  
customers or by customers moving to more expensive service pays for  
itself.  (Or better...)

That leaves the other portion - demand growth - that has to be  
balanced with the growth in capacity per dollar.  I suspect you're  
right that capacity per dollar is growing more slowly than total  
unfunded demand, but it's not quite as bad as it sounds.

>> It's very hard to quantify the costs of things like "complexity",  
>> "more
>> code", and "users prefer flat-rate billing", but they do exist.
>
> The funny part is that most routers can do FQ out of box. Just  
> enabling
> that will reduce the misbehaving stack/application problem to the  
> point of
> insignificance.
>
> A better design would track FQ weights on per-prefix basis (and sum  
> them
> when routes are aggregated) to improve fairness on larger scales.

Certain vendors' routers can do *everything* out of the box.  But  
they don't necessarily do everything well, or stably, or at full line- 
speed, or in a way that a network operator is comfortable with or can  
get to behave properly.  Consider RED.

>> The nice thing about today's environment is that the fiber is already in
>> the ground.  Adding more capacity is doable by "only" upgrading the
>> transceivers, adding more wavelengths, upgrading to faster multimillion
>> dollar routers, etc. :)
>
> Unfortunately, it is not that simple. You cannot pack information  
> denser
> than Shannon limit for a given level of noise, you cannot increase  
> S/N by
> pumping more power into fibers without causing non-linearity and  
> things
> like Raman scattering.  So the way to expand is to put more  
> equipment in
> parallel and reduce leg distances. It means the expensive things like
> building more amplifier stations in the middle of nowhere, and  
> beefing up
> CO space, power, and cooling.  The high-speed stuff is hot, and power
> budget quickly gets to megawatt range.

1)  There's still remaining capacity in existing fibers:

I'm not an expert in this area, but getting somewhere around
150 Tbit/sec out of a fiber (aggregate, WDM) should be doable assuming
the technology keeps up.  (Grossly simplified and underestimated from a
quick read of
http://www.nature.com/nature/journal/v411/n6841/full/4111027a0.html
and a few other papers.  A very conservative read might say a lower
bound would be in the 45 Tbit/sec range.)

That's about an order of magnitude more than the best research  
results today:
   http://www.ntt.co.jp/news/news06e/0609/060929a.html  (Sept 2006)

which is itself about 15x better than what's in use today.
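
As a rough sanity check, the ratios above can be chained together. The
sketch below is only an illustration: the 150 Tbit/sec ceiling, the
factor of 10 and the factor of 15 are the approximate figures quoted in
this message, and the function and constant names are made up for the
example.

```python
# Rough per-fiber capacity ladder implied by the ratios quoted above.
# All figures are order-of-magnitude estimates, not measurements.
THEORETICAL_TBPS = 150.0  # "somewhere around 150 Tbit/sec" (assumed ceiling)
RESEARCH_RATIO = 10.0     # ceiling is "about an order of magnitude" above research
DEPLOYED_RATIO = 15.0     # research is "about 15x better" than what is in use


def capacity_ladder(theoretical_tbps=THEORETICAL_TBPS):
    """Return estimated research and deployed capacities in Tbit/sec."""
    research = theoretical_tbps / RESEARCH_RATIO
    deployed = research / DEPLOYED_RATIO
    return {
        "theoretical": theoretical_tbps,
        "research": research,
        "deployed": deployed,
        "headroom": theoretical_tbps / deployed,  # ceiling vs. deployed
    }


if __name__ == "__main__":
    for name, value in capacity_ladder().items():
        print(f"{name:12s} {value:8.1f}")
```

Under those assumptions, deployed systems come out at roughly
1 Tbit/sec per fiber, leaving on the order of 100x headroom before the
estimated ceiling.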

2)  A large chunk of the cost of laying fiber is the cost of  
physically installing it.  Hence, dark fiber.  (Wikipedia claims  
without attribution that the physical process "accounts for more than  
60% of the cost of developing fiber networks."  I don't know the  
truth of this.)  Yes, we've been increasing capacity by using up the  
dark fiber that was put in during the dot-com craze, but there's  
still some left.

> There's a huge disparity between capacity of PCs to source/sink traffic
> (the modern desktop CPUs can easily run 200-300Mbps of TCP traffic with a
> suitable NIC) and the capacity of the network.  This creates, well, an
> interesting situation - the demand is potentially huge.

Sure.  But there's also a big difference between how much people  
_can_ source/sink and how much they _want_ to.

I, like probably 98% of this list, have a *lot* of capacity at my  
fingertips, and I (ab)use it to the fullest.  Over the last day, when  
I was using the network _a lot_ (streaming mp3s constantly and one  
movie from my remote storage server), I used about 200 Kbit/sec on  
average.  That's a far cry from even my access link capacity, much  
less my NIC's capacity.
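
To put that average in perspective, here is a back-of-the-envelope
conversion from a sustained rate to daily volume and link utilization.
The 200 Kbit/sec figure is from the paragraph above; the 6 Mbit/sec
access link and 100 Mbit/sec NIC are hypothetical values chosen only
for illustration.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_gbytes(avg_kbit_per_sec, seconds=SECONDS_PER_DAY):
    """Convert a sustained average rate into gigabytes (10^9 bytes) moved."""
    bits = avg_kbit_per_sec * 1_000 * seconds
    return bits / 8 / 1e9


def utilization(avg_kbit_per_sec, link_mbit_per_sec):
    """Fraction of a link's capacity consumed by the average rate."""
    return avg_kbit_per_sec / (link_mbit_per_sec * 1_000)


if __name__ == "__main__":
    avg = 200  # Kbit/sec, from the message above
    print(f"daily volume: {daily_volume_gbytes(avg):.2f} GB")
    # Hypothetical link speeds, not stated in the message:
    print(f"access link (6 Mbit/sec): {utilization(avg, 6):.1%}")
    print(f"NIC (100 Mbit/sec):       {utilization(avg, 100):.1%}")
```

Even a day of heavy use at that average amounts to only a couple of
gigabytes, a small fraction of what either assumed link could carry.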

>> I suspect we're saying the same thing from different perspectives,  
>> but
>> have possibly different opinions about where we are on the cost  
>> curve.
>
> Yep. But at least it is helpful to think about economics rather  
> than go
> wishing that the world was perfect and everybody did the Right Thing:)

Of course. :)  I just think that the economics are a bit more subtle  
than just saying "people can cheat, so they will."

   -Dave



From touch at ISI.EDU  Tue Jan  9 12:31:34 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 09 Jan 2007 12:31:34 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A357E@VGAEXCH01.hq.corp.viasat.com>
References: <45980C60.9020405@web.de>	<2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com>	<459AA501.8050901@isi.edu><459AB7E3.7010705@web.de>	<459AF57A.5080304@isi.edu><aaejq7j7gb.fsf@limmat.switch.ch>
	<45A11F92.3000102@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A357E@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <45A3FBA6.3070904@isi.edu>

Agarwal, Anil wrote:
>
> Joe Touch wrote -
>>>> FYI,Internet MSS's are usually in the 500-byte range in general. A
>>>> 5KB file would take 10 packets and be over by the 4th round.
>>>
>>> Um, the Internet MSS is usually 1460 bytes, except where it is hacked
>>> to between 1300 and 1400 bytes to avoid issues with broken Path MTU
>>> Detection in the presence of links with an MTU slightly smaller than
>>> 1500 (mostly ADSL links).
>>>
>>> Packets around 500 bytes have become quite rare on the Internet today.
>
>> http://netweb.usc.edu/~rsinha/pkt-sizes/
>> http://tracer.csl.sony.co.jp/mawi/samplepoint-C/2005/200510250900.html
>
>> 'better connected' sites show larger packet sizes (shown in the USC
>> traces), but that smaller packets are still used, and that the average
>> size depends on the protocol (CSL traces).
> Even though smaller packet sizes are observed on the net,
> depending on protocol and application, that does not imply
> that the MSS or path MTU is small. Some applications simply send small
> amounts of data, at a time (telnet, http GETs, etc).
> I suspect, MSS is of the order of 1300-1460 bytes,
> even in these traces.

If that's the case, and such MSSs are indeed predominant throughout the
Internet, not just in well-connected universities talking to the world,
then that raises the question of why the IETF is bothering with updates
to path MTU discovery to avoid black-holing.

One possibility is that black-holing is prevalent, and that sites
accessible only with smaller MTUs whose ICMP 'too big' error messages
are not received are being ignored from these traces.

Anyway, it's probably appropriate to consider both 500 and 1500-byte
MTUs in these calculations. The real question is how much a connection
is sped up by using a larger arithmetic increase factor (2x vs 1.5x),
and how much that matters depends on the size of the transfer, the BW,
the RTT, and the server load. Packet size is part of that equation, but
ultimately not all that critical anyway.
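
A minimal sketch of that calculation, assuming an idealized loss-free
slow start with an initial window of one segment and a window that
grows by a fixed factor each round trip (BW, RTT and server load are
ignored). The function name and parameters are illustrative, not taken
from any TCP implementation.

```python
import math


def slow_start_rounds(file_bytes, mss_bytes, growth=2.0, initial_window=1):
    """Count round trips needed to send a file during idealized slow start.

    Assumes no loss, no receiver-window limit, and a congestion window
    that is multiplied by `growth` after every round.
    """
    packets_left = math.ceil(file_bytes / mss_bytes)
    window = float(initial_window)
    rounds = 0
    while packets_left > 0:
        # Send a whole number of segments, at least one per round.
        packets_left -= max(1, math.floor(window))
        window *= growth
        rounds += 1
    return rounds


if __name__ == "__main__":
    for mss in (500, 1460):
        for growth in (2.0, 1.5):
            rounds = slow_start_rounds(5000, mss, growth)
            print(f"MSS={mss:4d} growth={growth}x rounds={rounds}")
```

With a 5000-byte file, 500-byte segments need 4 rounds at 2x and 5
rounds at 1.5x, while 1460-byte segments need 3 rounds either way,
which is consistent with the growth factor mattering only for some
combinations of transfer and packet size.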

Joe


-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment



From L.Wood at surrey.ac.uk  Tue Jan  9 15:30:02 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Tue, 09 Jan 2007 23:30:02 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168367720.4840.92.camel@localhost>
References: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>
	<45A3CAA1.9010505@isi.edu> <1168367720.4840.92.camel@localhost>
Message-ID: <200701092330.XAA12986@cisco.com>

At Tuesday 09/01/2007 13:35 -0500, Jim Gettys wrote:

>One of the fundamental tenets of Linux development is its continual
>nature.  I've seen some very good academic work end up being entirely
>ignored since, by the time the work was done, the work (which was based
>on what had become a several year stale version of Linux), was hopeless
>to integrate into Linux.


That's no different from simulation work done with an ns network simulator snapshot that is hopeless to integrate into the current version of ns, and so gets ignored.

Academics are rewarded by writing papers. They are not rewarded by staying current with the current codebase of the linux kernel/ns.

L.


>If you *really* want research that can be taken advantage of by Linux,
>you have to understand Linux's development model, and be willing to pay
>the price to keep up with ongoing development, and figure out how to get
>from where Linux is, to where it should be in an incremental fashion. 
>
>Particularly since the Linux 2.6 series started, "big bang" integrations
>of large changes into the system never occur; it is always stepwise
>evolution, and you have to work in this fashion, as part of the
>development community.
>                                   Regards,
>                                        - Jim

From lynne at telemuse.net  Tue Jan  9 15:46:54 2007
From: lynne at telemuse.net (Lynne Jolitz)
Date: Tue, 9 Jan 2007 15:46:54 -0800
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168367720.4840.92.camel@localhost>
Message-ID: <003801c73448$713c10c0$6e8944c6@telemuse.net>

Jim,
Perfectly correct. The Linux model is very different from the older BSD model of major changes and revisions every few years, and it follows more the product "renovation" cycle that Ray Lane of KPCB espouses. Bridging the gap by carefully following incremental work is the price to be paid by academics to ensure the continuity of their work in Linux.

Many people on the academic side still use forms of BSD, and perhaps prefer the old way of doing things. I use BSD myself. However, Linux is clearly the market leader and cooperating with how they handle their development model is a key consideration for promulgating new work in networking and operating systems.

I'm pleased to hear how eager the Linux conference attendees were to hear an academic "star" and take them seriously. And you are right - perhaps it is time for more networking and OS "stars" to reach out to them through talks.

Lynne Jolitz.

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net


> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Jim Gettys
> Sent: Tuesday, January 09, 2007 10:35 AM
> To: Joe Touch
> Cc: Lynne Jolitz; end2end-interest list
> Subject: Re: [e2e] Are we doing sliding window in the Internet?
> 
> 
> There is a fundamental divide that has to be overcome.
> 
> With a few exceptions (Ted Ts'o comes to mind), there have been few Linux
> people who also have been exposed to actively participating in the IETF.
> 
> The culture has been that there are IETF (and other specifications) that
> the Linux community read and implement.  And, as you note, they are
> (often) volunteers, though these days, a large fraction of the key
> developers are full time employees of various companies.
> 
> If this community wants to bridge this divide, I'd recommend some active
> outreach.  Having worked in both communities, it is remarkable how few
> faces are in common.
> 
> One opportunity is next week at Linux Conf Australia (in Sydney).  A
> year ago, Van Jacobson gave the best talk I've attended in more than a
> decade in New Zealand at LCA (it was the best talk of the conference,
> and given twice as a result), and caused quite a bit of a stir and
> ferment among the Linux networking people.  This kind of cross
> fertilization is healthy for both communities, I believe.
> 
> Now I'll throw some stones at some of the academic research I've seen
> done on Linux.
> 
> One of the fundamental tenets of Linux development is its continual
> nature.  I've seen some very good academic work end up being entirely
> ignored since, by the time the work was done, the work (which was based
> on what had become a several year stale version of Linux), was hopeless
> to integrate into Linux.
> 
> If you *really* want research that can be taken advantage of by Linux,
> you have to understand Linux's development model, and be willing to pay
> the price to keep up with ongoing development, and figure out how to get
> from where Linux is, to where it should be in an incremental fashion. 
> 
> Particularly since the Linux 2.6 series started, "big bang" integrations
> of large changes into the system never occur; it is always stepwise
> evolution, and you have to work in this fashion, as part of the
> development community.
>                                    Regards,
>                                         - Jim
> 
> 
> 
> 
> 
> On Tue, 2007-01-09 at 09:02 -0800, Joe Touch wrote:
> > 
> > Lynne Jolitz wrote:
> > > But if it's not worth the time and effort for the academic side to
> > > take on this charge, the marketplace will have to serve instead.
> > 
> > It's not whether academics want to spend the time and effort. Many are
> > already giving it for projects they prefer (e.g., FreeBSD in my case);
> > others have none to give (note the dearth of academics on the IESG,
> > which requires letters of 80% support).
> > 
> > I.e., the effort of volunteers is subject to its own market as well.
> > 
> > However, the primary tension seems to be that:
> > 	- standards bodies rely on emissaries from
> > 	development communities
> > 
> > 	- development communities rely on volunteers
> > 
> > This may appear to suggest that the two communities are competing for
> > volunteers, but that's not the case. We all *must* come together to work
> > on standards; the same is not true for particular OS's.
> > 
> > Joe
> > 
> -- 
> Jim Gettys
> One Laptop Per Child
> 
> 
> 


From detlef.bosau at web.de  Wed Jan 10 03:46:56 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 10 Jan 2007 12:46:56 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <003801c73448$713c10c0$6e8944c6@telemuse.net>
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>
Message-ID: <45A4D230.5020105@web.de>

Lynne Jolitz wrote:
> Jim,
> Perfectly correct. The Linux model is very different from the older BSD model of major changes and revisions every few years, and it follows more the product "renovation" cycle that Ray Lane of KPCB espouses. Bridging the gap by carefully following incremental work is the price to be paid by academics to ensure the continuity of their work in Linux.
>
> Many people on the academic side still use forms of BSD, and perhaps prefer the old way of doing things. I use BSD myself. However, Linux is clearly the market leader and cooperating with how they handle their development model is a key consideration for promulgating new work in networking and operating systems.
>   

That's simply not the point.

I think Lloyd Wood made the point precisely:

"Academics are rewarded by writing papers. They are not rewarded by staying current with the current codebase of the linux kernel/ns."

And this holds for BSD, the OMNET simulator and for all other software that exists.

This is no bad excuse for academics not doing Linux development. It's
simply the fact that research is focused on detecting and solving
problems. This is totally different from development and marketing. It's
the difference between proving algebraic rules for dealing with natural
numbers and developing and selling a new desktop calculator.

Research is fundamental by its nature and thus has to be independent
of simulators and operating systems. There are many fields of research
where implementations do not even exist yet - nevertheless they are
necessary.

I think the basic dispute here is simply a misconception of the very 
difference between research and development.

Detlef


From jg at laptop.org  Wed Jan 10 05:14:35 2007
From: jg at laptop.org (Jim Gettys)
Date: Wed, 10 Jan 2007 08:14:35 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A4D230.5020105@web.de>
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>
	<45A4D230.5020105@web.de>
Message-ID: <1168434876.4840.256.camel@localhost>

On Wed, 2007-01-10 at 12:46 +0100, Detlef Bosau wrote:

> This is no bad excuse for academics not doing Linux development. It's
> simply the fact that research is focused on detecting and solving
> problems. This is totally different from development and marketing. It's
> the difference between proving algebraic rules for dealing with natural
> numbers and developing and selling a new desktop calculator.
>
> Research is fundamental by its nature and thus has to be independent
> of simulators and operating systems. There are many fields of research
> where implementations do not even exist yet - nevertheless they are
> necessary.
> 
> I think the basic dispute here is simply a misconception of the very 
> difference between research and development.

I have to fundamentally disagree when it comes to systems research.

If you are doing research into *systems*, an academic exercise using a
marginal system can only be justified if you are trying a *fundamental*
change to that system, and *must* start from scratch.  Most systems
research does not fall into that category.

Doing such work outside the context of a current system invalidates the
results as you cannot inter compare the results you get with any sort of
"control".  This is the basis of doing experimental science.  Giving me
results that some "improvement" helps Linux 2.4.24, when current Linux
is 2.6.19, or whatever, essentially invalidates the result, due to the
extensive changes between versions.

Much of why Van's research was able to be taken seriously by the Linux
community and has had impact was precisely in that he had done the work
on a recent version of Linux (independent of whether the code was ever
to become available or not), and so the variables were very precisely
controlled to those of his TCP implementation.  He had real credibility
as a result.
                             - Jim


-- 
Jim Gettys
One Laptop Per Child


From Jon.Crowcroft at cl.cam.ac.uk  Wed Jan 10 06:01:04 2007
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Wed, 10 Jan 2007 14:01:04 +0000
Subject: [e2e] s/Re:  Are we doing sliding window in the Internet?/systems
Message-ID: <E1H4e1Z-0002Ll-00@mta1.cl.cam.ac.uk>

In missive <1168434876.4840.256.camel at localhost>, Jim Gettys typed:

 >>Doing such work outside the context of a current system invalidates the
 >>results as you cannot inter compare the results you get with any sort of
 >>"control".  This is the basis of doing experimental science.  Giving me
 >>results that some "improvement" helps Linux 2.4.24, when current Linux
 >>is 2.6.19, or whatever, essentially invalidates the result, due to the
 >>extensive changes between versions.
 
aside from also accidentally being useful as well:-)

the idea of science is a bit like the idea of open source - so it isn't
surprising that computer systems science flourishes in an open source manner- if
other people can look at your experimental equipment, as well as your data
and can affordably re-run your experiment in the same, similar or other
circumstances, the validity of the work, and the rate of expansion of human
knowledge, are both enhanced

when people do clinical drug trial papers, they are required by many medical
journal publishers to place the data in escrow so that 3 independent reviewers can
check the data is being analyzed right - patents mean that the drugs themselves
are publicly checkable - often published funding requires the results are published
(suitably anonymised) completely...as with the genome project

 >>One Laptop Per Child
 
why not one iphone per person ?:-)

by the way, do you take the laptops back when they reach 18 (or 16, or 21, or
whatever the gnu age of majority is :-)

 cheers

   jon


From detlef.bosau at web.de  Wed Jan 10 06:16:24 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 10 Jan 2007 15:16:24 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168434876.4840.256.camel@localhost>
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>	
	<45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost>
Message-ID: <45A4F538.1040905@web.de>

Jim Gettys wrote:
>
>
> I have to fundamentally disagree when it comes to systems research.
>
> If you are doing research into *systems*, an academic exercise using a
> marginal system can only be justified if you are trying a *fundamental*
> change to that system, and *must* start from scratch.  Most systems
> research does not fall into that category.
>
>   

What do you mean by "research into systems"? The term "system" is
extremely general.

> Doing such work outside the context of a current system invalidates the
> results as you cannot inter compare the results you get with any sort of
> "control".  This is the basis of doing experimental science.  Giving me
>   
Why does research "outside the context" of a current system invalidate 
results? Could you perhaps provide a concrete example for this?

> results that some "improvement" helps Linux 2.4.24, when current Linux
> is 2.6.19, or whatever, essentially invalidates the result, due to the
> extensive changes between versions.
>
> Much of why Van's research was able to be taken seriously by the Linux
> community and has had impact was precisely in that he had done the work
> on a recent version of Linux (independent of whether the code was ever
>   

One prominent example of Van's research is the congavoid paper. Linux
did not yet exist when this work was done.
Does that invalidate this work?


From jg at laptop.org  Wed Jan 10 07:18:17 2007
From: jg at laptop.org (Jim Gettys)
Date: Wed, 10 Jan 2007 10:18:17 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A4F538.1040905@web.de>
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>
	<45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost>
	<45A4F538.1040905@web.de>
Message-ID: <1168442297.4840.321.camel@localhost>

On Wed, 2007-01-10 at 15:16 +0100, Detlef Bosau wrote:
> Jim Gettys wrote:
> >
> >
> > I have to fundamentally disagree when it comes to systems research.
> >
> > If you are doing research into *systems*, an academic exercise using a
> > marginal system can only be justified if you are trying a *fundamental*
> > change to that system, and *must* start from scratch.  Most systems
> > research does not fall into that category.
> >
> >   
> 
> What do you mean by "research into systems"? The term "system" is
> extremely general.

If you go look at Van's LCA presentation referenced, you'll see it is
rethinking TCP's implementation in a real system.  That is systems
research.  Maybe I should have said research in implementations and
algorithms.

Simulation of protocols does not fit what I'm talking about here.

> 
> > Doing such work outside the context of a current system invalidates the
> > results as you cannot inter compare the results you get with any sort of
> > "control".  This is the basis of doing experimental science.  Giving me
> >   
> Why does research "outside the context" of a current system invalidate 
> results? Could you perhaps provide a concrete example for this?

I saw what looked to be very nice results showing better caching
behavior in memory systems, done on an obsolete version of Linux a
couple years ago.  But since that version was so out of date, the data
showing improvement over baseline had to be taken with a large *block*
of salt, since so much had been done in the base operating system in the
meanwhile.  The data had become an apples-and-oranges comparison. If I
can remember enough to dig up the paper, I'll send a pointer.

Without a control, experimental science becomes hand-waving anecdotes
(which typifies research in many fields, unfortunately).


> 
> > results that some "improvement" helps Linux 2.4.24, when current Linux
> > is 2.6.19, or whatever, essentially invalidates the result, due to the
> > extensive changes between versions.
> >
> > Much of why Van's research was able to be taken seriously by the Linux
> > community and has had impact was precisely in that he had done the work
> > on a recent version of Linux (independent of whether the code was ever
> >   
> 
> One prominent example of Van's research is the congavoid paper. Linux
> did not yet exist when this work was done.
> Does that invalidate this work?
> 

I still have scars on my back from the internet collapse in the mid
'80's. Things were so bad we were at times reduced to Federal Express
between Cambridge and Palo Alto.

The *proof* that made people take the congestion avoidance work
seriously that I remember was the application of Van and Mike Karels's
patches to 4.2BSD that made the internet (and the individual machines)
work again.  

Those patches preceded the paper, if I'm not mistaken about the history.
The proof was in the implementation of the algorithms in a widely used
system of that era. Had it been done in the Twenex implementation, while
it might have been noticed, its impact would have taken far longer and
could even conceivably have been ignored.
                             Regards,
                                       - Jim

-- 
Jim Gettys
One Laptop Per Child


From jg at laptop.org  Wed Jan 10 07:36:31 2007
From: jg at laptop.org (Jim Gettys)
Date: Wed, 10 Jan 2007 10:36:31 -0500
Subject: [e2e] s/Re: Are we doing sliding window in the	Internet?/systems
In-Reply-To: <E1H4e1Z-0002Ll-00@mta1.cl.cam.ac.uk>
References: <E1H4e1Z-0002Ll-00@mta1.cl.cam.ac.uk>
Message-ID: <1168443391.4840.330.camel@localhost>

On Wed, 2007-01-10 at 14:01 +0000, Jon Crowcroft wrote:

> 
> by the way, do you take the laptops back when they reach 18 (or 16, or 21, or
> whatever the gnu age of majority is :-)
> 

Nope.  The going in premise of the project is that the computers are
owned by the individual kids; this is for many reasons, including they
get taken care of much better if they are individual property rather
than communally shared property.  And the children are part of a family.
Learning does not stop at age 16, or 18, or 21 (with the exception of
certain individuals I've known ;-)).
                               - Jim

-- 
Jim Gettys
One Laptop Per Child


From ddc at csail.mit.edu  Wed Jan 10 08:08:29 2007
From: ddc at csail.mit.edu (David Clark)
Date: Wed, 10 Jan 2007 11:08:29 -0500
Subject: [e2e] Opportunity to get involved in the NSF FIND research program
Message-ID: <45A50F7D.9010602@csail.mit.edu>

Folks,
     Many of you may know that NSF has announced a focus area for
research funding called Future Internet Design, or FIND. The idea behind
FIND is to bring together interested researchers to discuss options for
a future Internet, and to develop integrated proposals for such a network.
     NSF understands that there is lots of interesting, relevant work
that has been funded from sources other than NSF, and there may be folks
who would like to come to the meetings and participate in the process,
on a BYOF (Bring Your Own Funding) basis. You might have funding from a
different NSF program, from another funding agency, or from your
company. Perhaps you are from a different country with its own funding
mechanisms.
     However you are funded, if you are interested in being part of the
intellectual effort, please read the attached announcement, which is an
invitation to send in an informal white paper describing what you are up
to.
     If you can conceive of other ways to build bridges between this FIND
program and other research efforts, please send me a message directly.
We are open to other ideas.

David Clark (for the FIND Planning Committee)

----


    CALL FOR RESEARCH COLLABORATION ON FUTURE INTERNET ARCHITECTURES
                IN PARTNERSHIP WITH THE US NSF FIND PROGRAM

BACKGROUND

Much energy has recently crystallized within the international network 
research community for developing fresh perspectives on how to architect 
a single, coherent, global data network. The Internet's unquestionable 
success at embodying one such architecture has also led over the decades 
of its operation to unquestionable difficulties with regard to support 
for some types of functionality and sound operation.

As a reflection of this growing community interest, the U.S. National 
Science Foundation has announced a focus area for networking research 
called FIND, or Future Internet Design. The agenda of this focus area is 
to invite the research community to take a long-range perspective, and 
to consider what our global network should be 10 or 15 years from now, and how
to build a network that meets the future requirements. (For further 
information on the FIND program, see NSF solicitation 07-507, available 
at http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf07507.) The 
research funded by FIND aims to contribute to the emergence of one or 
more integrated visions of a future network.

A vital part of this effort concerns fostering collaboration and 
consensus-building among researchers working on future global network 
architecture. To this end, NSF has created a FIND Planning Committee, 
which is working with NSF to organize a series of meetings among FIND 
grant recipients structured around activities to identify and refine 
overarching concepts for a network of the future.

A BROADER COMMUNITY

NSF recognizes that its efforts at funding research to contribute to a
future global network exist within a broader set of efforts with
similar goals supported by other agencies, industry, and nations.
Accordingly, NSF seeks researchers external to the FIND program
itself, but who share a like-minded vision, to participate in the
collaboration and consensus-building. NSF particularly welcomes
international collaboration: any vision of a future global network will
greatly benefit from global participation.

To this end, external researchers interested in such participation are 
invited to submit short white papers describing themselves and their 
work. Based on evaluation of these white papers, a select number of 
researchers will be invited to join the FIND meetings and other events, 
as overall meeting sizes and logistics permit.

EXPECTATIONS AND EVALUATION CRITERIA

Since the efficacy of FIND meetings is in part a function of their size 
and coherence, the evaluation of the white papers will focus on certain 
criteria that are listed below, along with expectations regarding what 
external participation entails. Naturally, interested parties should 
take these considerations into account as they write their white papers, 
and include information in their papers sufficient to allow the FIND 
program to evaluate the aptness of their participation.

- In a few sentences, please describe your research and its intended
impact. When possible, include as an attachment (or a URL) a longer 
description, which if you wish can be something prepared for another 
purpose (e.g. your original funding proposal or a publication). It will 
help to limit the supporting material to 15 pages or fewer.

- Please summarize in the white paper the ways you see your research as
being compatible with the objectives of FIND (the URL for the FIND 
solicitation is included above). Research that accords with the FIND 
program will generally be based on a long-term vision of future 
networking, rather than addressing specific near-term problems, and 
framed in terms of how it might contribute to an overall architecture 
for a future network.

- The FIND meetings have been organized for the benefit of researchers
who have already been funded and are actively pursuing their research. 
Research described in white papers should already be funded. Please 
describe the means you have available to cover your FIND-related 
research: the source of funds, their duration, and (roughly) the 
supported level of effort. Unfortunately, NSF lacks additional funds to 
financially support your participation in the meetings, so you must be 
prepared to cover those costs as well. If you are planning to submit a 
FIND research proposal to the current NeTS solicitation, you should not 
submit a white paper here based on that research. Successful FIND grant 
recipients will automatically be invited to join the FIND community.

- As one of the goals of FIND is to develop an active community of
researchers who over time work increasingly together towards coherent, 
overall architectural visions, we aim for external participants to 
likewise become significantly engaged. To this end, you should 
anticipate (and have resources for) participating in FIND project 
meetings in an active, sustained fashion.

* Your research must not be encumbered by intellectual property 
restrictions that prevent you from fully discussing your work and its 
results with the other participants.

Please try to limit your white paper to 2 pages. Your white paper (and 
the supporting research description) will be read by members of the 
research community, so do not submit anything that you would not reveal 
to your peers. (White papers are not viewed as formal submissions to NSF.)

TIMING AND SUBMISSION

You may submit a white paper at any time during the FIND program. Before 
each scheduled FIND PI meeting, the papers on hand will be reviewed. 
Meetings are anticipated to occur approximately three times a year, in 
March, July/August and November. The next FIND meeting is scheduled for 
March 5/6, 2007, and priority in consideration for that meeting will be 
given to white papers that are received by Friday, January 19th, 2007.

Send your white paper to Darleen Fisher <dlfisher at nsf.gov> and Allison 
Mankin <amankin at nsf.gov> for coordination.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: FIND-external-invite-7.pdf
Type: application/pdf
Size: 98448 bytes
Desc: not available
Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20070110/09f5e6a4/FIND-external-invite-7-0001.pdf

From craig at aland.bbn.com  Wed Jan 10 08:20:36 2007
From: craig at aland.bbn.com (Craig Partridge)
Date: Wed, 10 Jan 2007 11:20:36 -0500
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: Your message of "Wed, 10 Jan 2007 10:18:17 EST."
	<1168442297.4840.321.camel@localhost> 
Message-ID: <20070110162036.16BFA64@aland.bbn.com>


In message <1168442297.4840.321.camel at localhost>, Jim Gettys writes:

>Those patches preceded the paper, if I'm not mistaken of the history.
>The proof was in the implementation of the algorithms in widely used
>system of that era. Had it been done in the Twenex implementation, while
>it might have been noticed, its impact would have taken far longer and
>could even conceivably have been ignored.

Just building on Jim's recollection.

The patches preceded the paper.  But the patches were vigorously tested.
Van actually did his work incrementally.  First he worked on trying to
improve congestion response and then round-trip time estimation.
There is a small set of emails from him reporting progress and asking
questions on the E2E and TCP-IP lists.  He also gave talks, with various
graphs showing the behavior of existing TCP implementations and his
implementation with various changes, and got feedback.  (You can see many
of these talks in the old IETF proceedings at www.ietf.org.  A small
tragedy: the Moffett Field talk, which caused everyone to sit up and
take notice, isn't on-line.)  He distributed his patches to
a small number of beta-testers before releasing them widely.

There was a lot of testing and carefully staged progress.  One fond
memory I have of that time is the Winter USENIX (I think in 1988 in
San Diego) and finding Van during a break.  He was sitting with a thick
stack of graphs showing the performance of round-trip time estimation
algorithms on real data over problematic Internet paths and sorting
out which algorithms worked well.

Craig

From detlef.bosau at web.de  Wed Jan 10 10:56:36 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 10 Jan 2007 19:56:36 +0100
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168442297.4840.321.camel@localhost>
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>	
	<45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost>	
	<45A4F538.1040905@web.de> <1168442297.4840.321.camel@localhost>
Message-ID: <45A536E4.2030300@web.de>

Jim Gettys wrote:
>> What do you mean by "research into systems"? The term "system" is extremely 
>> general.
>>     
>
> If you go look at Van's LCA presentation referenced, you'll see it is
>   

Could you give me a pointer please? Unfortunately, I don't know this talk.

> rethinking TCP's implementation in a real system.  That is systems
> research.  Maybe I should have said research in implementations and
> algorithms.
>
>   

As I said, I don't know the talk yet. However, rethinking TCP's 
implementation in a real system should be done independently of a 
concrete operating system.

Of course, one should consider the difficulties encountered in real 
systems. But then, we should abstract from concrete systems and look for 
general principles for avoiding difficulties and learning from our 
experiences in the past.

> Simulation of protocols does not fit what I'm talking about here.
>
>   
What are the alternatives? You can build testbeds and you can trace real 
traffic.

At least we should exploit these before deploying premature protocols.

> Without a control, experimental science becomes hand-waving anecdotes
> (which typifies research in many fields, unfortunately).
>
>   

There is no argument about this.

The disagreement is, first: what is experimental science? To me, 
engineering is not purely experimental but should always rely on sound 
theoretical work and then include proper experiments.

Second: Can experimental deployment replace solid research? I don't 
think so.


>>
>> One prominent example of Van's research is the congavoid paper. Linux 
>> did not yet exist when this work was done.
>> Does that invalidate this work?
>>
>>     
>
> I still have scars on my back from the internet collapse in the mid
> '80's. Things were so bad we were at times reduced to Federal Express
> between Cambridge and Palo Alto.
>   

So, there was an opportunity to learn.

To put it more drastically: we all know about the Titanic disaster, and 
about the Tacoma bridge disaster.
Do these invalidate academic research in what is called, in English, 
naval engineering and civil engineering?
I don't think so. I think proper research has prevented a number of 
disasters like these.

And it was proper research when we learnt from the Tacoma bridge 
disaster and eventually, after decades of research, the Akashi-Kaikyo 
bridge could be completed.
> The *proof* that made people take the congestion avoidance work
> seriously that I remember was the application of Van and Mike Karels'
> patches to 4.2BSD that made the internet (and the individual machines)
> work again.  
>   

First: How many nodes did the Internet have at that time?
Second: How many operating systems and TCP/IP implementations were 
around at that time?

To the best of my knowledge, the "Internet" was an experimental test bed 
at that time. We must not compare that situation to the current one.

> Those patches preceded the paper, if I'm not mistaken of the history.
> The proof was in the implementation of the algorithms in widely used
>   

I don't think that this is a "proof". I think the congavoid paper has a 
very sound theoretical foundation.

What was experienced practically was the problem and the relevance of 
congestion control.

The rest is proper work.

I still think of a remark by a computer science professor who even 
told me that the timeouts could only be determined by experiments.
And even these timeouts are based on sound conceptual work in Van's paper.

> system of that era. Had it been done in the Twenex implementation, while
> it might have been noticed, its impact would have taken far longer and
> could even conceivably have been ignored.
>   

If the congestion collapses in the eighties were as bad as you say and 
if there was a solution, this surely would not have been ignored.

Detlef


From jg at laptop.org  Wed Jan 10 11:28:43 2007
From: jg at laptop.org (Jim Gettys)
Date: Wed, 10 Jan 2007 19:28:43 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A536E4.2030300@web.de>
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>
	<45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost>
	<45A4F538.1040905@web.de> <1168442297.4840.321.camel@localhost>
	<45A536E4.2030300@web.de>
Message-ID: <1168457323.4840.418.camel@localhost>

On Wed, 2007-01-10 at 19:56 +0100, Detlef Bosau wrote:
> Jim Gettys wrote:
> >> What do you mean by "research into systems"? The term "system" is extremely 
> >> general.
> >>     
> >
> > If you go look at Van's LCA presentation referenced, you'll see it is
> >   
> 
> Could you give me a pointer please? Unfortunately, I don't know this talk.

There is a link in the following article, as I posted before.

http://lwn.net/Articles/169961/
> 
> > rethinking TCP's implementation in a real system.  That is systems
> > research.  Maybe I should have said research in implementations and
> > algorithms.
> >
> >   
> 
> As I said, I don't know the talk yet. However, rethinking TCP's 
> implementation in a real system should be done independently of a 
> concrete operating system.
> 
> Of course, one should consider the difficulties encountered in real 
> systems. But then, we should abstract from concrete systems and look for 
> general principles for avoiding difficulties and learning from our 
> experiences in the past.
> 

I think you will see that by analyzing and solving the real problems in
Linux he came up with principles that are (potentially) transferable to
many systems.

Doing it independently of a real system would not have proved the points
he proved in the work Van reported on at LCA last year.

> > Simulation of protocols does not fit what I'm talking about here.
> >
> >   
> What are the alternatives? You can build testbeds and you can trace real 
> traffic.
> 
> At least we should exploit these before deploying premature protocols.

Certainly, but they are at most "doing your homework"; they cannot
substitute for deployment or testing at scale on a real network.

> 
> > Without a control, experimental science becomes hand-waving anecdotes
> > (which typifies research in many fields, unfortunately).
> >
> >   
> 
> There is no argument about this.
> 
> The disagreement is, first: what is experimental science? To me, 
> engineering is not purely experimental but should always rely on sound 
> theoretical work and then include proper experiments.

In a real science, theory and experiment go hand in hand; you can't know
what problems are worth trying to apply or develop theory for without
experience and experiments, and you can't validate any theory without
experiment.

> 
> I don't think that this is a "proof". I think the congavoid paper has a 
> very sound theoretical foundation.

Yes, and the motivation and theory were worked out in reaction to the real
world experience and analysis of the network failing.

If theory had been understood in the first place in advance of the
Internet's congestion collapse, Van would never have worked on the
problem; presumably one would try to avoid what one foresees.

> 
> What was experienced practically was the problem and the relevance of 
> congestion control.
> 
> The rest is proper work.
> 
> I still think of a remark by a computer science professor who even 
> told me that the timeouts could only be determined by experiments.
> And even these timeouts are based on sound conceptual work in Van's paper.

You seem to think that theory exists in a vacuum from experience and
experiment.  It doesn't.  The theory was worked out in reaction to the
situation at hand.

> 
> > system of that era. Had it been done in the Twenex implementation, while
> > it might have been noticed, its impact would have taken far longer and
> > could even conceivably have been ignored.
> >   
> 
> If the congestion collapses in the eighties were as bad as you say and 
> if there was a solution, this surely would not have been ignored.

For parts of the internet, it really was that bad, and it would
*certainly* have taken much longer before the work was validated and
deployed, had it been done on a small minority system or as a research
prototype model.
                         - Jim


-- 
Jim Gettys
One Laptop Per Child


From james.Ramming at darpa.mil  Tue Jan  9 15:50:01 2007
From: james.Ramming at darpa.mil (Ramming, J. Christopher)
Date: Tue, 9 Jan 2007 18:50:01 -0500
Subject: [e2e] Assurable Global Networking (RFI & Workshop)
Message-ID: <ABBE6B6E61B21A4CBDA5D7AC5C06A1A7035010EF@sde2k3-mb2.darpa.mil>

REQUEST FOR INFORMATION - Assurable Global Networking

Response deadline: January 31, 2007
Workshop for respondents: February 22, 2007

Defense Advanced Research Projects Agency's (DARPA) Strategic Technology
Office (STO) is requesting information on research ideas and approaches
that could provide the foundation for next-generation Assurable Global
Networks (AGNs).

For more information please visit:
http://www.darpa.mil/sto/solicitations/AGN/index.html


From touch at ISI.EDU  Wed Jan 10 13:39:47 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 10 Jan 2007 13:39:47 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <459EB8F8.4060304@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de>
Message-ID: <45A55D23.6080505@isi.edu>


Detlef Bosau wrote:
...
>>> It's interesting what happens to the final CLOSE ACK here, which is
>>> typically not spoofed in splitters to ensure proper ACK semantics.
>>>     
>> I don't understand "proper ACK semantics". The splitter destroys those.
>> The semantics that may be kept are at the connection level
>> (open/closed), but the semantics of data ACKs are irrevocably destroyed.
> 
> I think of the semantics at the connection level. Which I think to be
> sufficient in many cases.

The result is that you think you started/ended a connection correctly,
but that the wrong data got there?

As to PEPs...

> Otherwise the problem is: When the bandwidth sender - splitter is, e.g.,
> the average bandwidth / rate splitter-sender but far less than the
> maximum rate splitter / sender then a simple router perhaps would hardly
> store any data and thus hardly equalize the rate / delivery times.
> Thierry describes delay spikes of several seconds. If we think about
> UMTS, we can imagine a wireless link where nothing happens for up to
> several seconds - thus even no data is clocked out from the sender - and
> then we have about 2 Mbps throughput for a short time - which is perhaps
> much more than the actual Internet path can carry. In such a scenario we
> want to have the router / splitter / PEP / whateverbox buffer the data
> and equalize the rate variations. Can this be achieved by pure pacing in
> the one or other direction?

Pacing is a simpler version of what you're asking ACK clocking to do; if
ACK clocking works, pacing definitely should.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From gds at best.com  Wed Jan 10 15:07:36 2007
From: gds at best.com (Greg Skinner)
Date: Wed, 10 Jan 2007 23:07:36 +0000
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168457323.4840.418.camel@localhost>;
	from jg@laptop.org on Wed, Jan 10, 2007 at 07:28:43PM +0000
References: <003801c73448$713c10c0$6e8944c6@telemuse.net>
	<45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost>
	<45A4F538.1040905@web.de> <1168442297.4840.321.camel@localhost>
	<45A536E4.2030300@web.de> <1168457323.4840.418.camel@localhost>
Message-ID: <20070110230736.A10734@gds.best.vwh.net>

On Wed, Jan 10, 2007 at 07:28:43PM +0000, Jim Gettys wrote:
> On Wed, 2007-01-10 at 19:56 +0100, Detlef Bosau wrote:
> > I don't think that this is a "proof". I think the congavoid paper has a 
> > very sound theoretical foundation.
> 
> Yes, and the motivation and theory were worked out in reaction to the real
> world experience and analysis of the network failing.
> 
> If theory had been understood in the first place in advance of the
> Internet's congestion collapse, Van would never have worked on the
> problem; presumably one would try to avoid what one foresees.

Depending on what you mean by "theory", one could argue that the basis
of the congavoid paper is in control theory, which was well understood
in the 1980s.  OTOH, its application to the Internet and TCP/IP
implementations of that time was not well understood.

> > If the congestion collapses in the eighties were as bad as you say and 
> > if there was a solution, this surely would not have been ignored.
> 
> For parts of the internet, it really was that bad, and it would
> *certainly* have taken much longer before the work was validated and
> deployed, had it been done on a small minority system or as a research
> prototype model.

For Detlef's benefit, there are archives of the tcp-ip mailing list
where the early discussions on congestion avoidance in the emerging
Internet were held.  Most people involved in this discussion today
will read the emails of the past and recognize the problem that was
being discussed based on what has been studied and published.  Go to
http://securitydigest.org/tcp-ip/#archives, follow the July 1986 link,
and start with the subject "TCP retransmission efficiency". Follow the
discussions from there.  You'll eventually get to VJ's results.

Jim does note correctly that:

> Had [congavoid changes] been done in the Twenex implementation,
> while it might have been noticed, its impact would have taken far
> longer and could even conceivably have been ignored.

The benefit of making the changes on a widely used platform was that
congestion was considerably reduced, validating the research.

IMO, it would have been great if more control theory could have been
applied to early Internet design.  Fortunately, VJ kept plugging
enough that he was able to push his ideas through, providing the
bedrock for the R&D in network performance that came afterward.

--gregbo


From detlef.bosau at web.de  Wed Jan 10 15:29:47 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 11 Jan 2007 00:29:47 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A55D23.6080505@isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
Message-ID: <45A576EB.206@web.de>

Joe Touch wrote:
>> I think of the semantics at the connection level. Which I think to be
>> sufficient in many cases.
>>     
>
> The result is that you think you started/ended a connection correctly,
> but that the wrong data got there?
>   

Well, it's just how I understand the semantics of a "CLOSE ACK". When a 
receiver issues a CLOSE ACK, we know that all data has reached the 
receiving socket. What we do not know is whether the data has reached 
the application. To my understanding that's one reason why we use 
acknowledgements at the application level when it is necessary to know 
whether an application has received all data.

So, to my understanding, a PEP which keeps the semantics at the 
connection level keeps all the semantics provided by TCP itself.
Acknowledgements at the application level are beyond the scope of TCP.


> As to PEPs...
>
>   
>> Otherwise the problem is: When the bandwidth sender - splitter is, e.g.,
>> the average bandwidth / rate splitter-sender but far less than the
>> maximum rate splitter / sender then a simple router perhaps would hardly
>> store any data and thus hardly equalize the rate / delivery times.
>> Thierry describes delay spikes of several seconds. If we think about
>> UMTS, we can imagine a wireless link where nothing happens for up to
>> several seconds - thus even no data is clocked out from the sender - and
>> then we have about 2 Mbps throughput for a short time - which is perhaps
>> much more than the actual Internet path can carry. In such a scenario we
>> want to have the router / splitter / PEP / whateverbox buffer the data
>> and equalize the rate variations. Can this be achieved by pure pacing in
>> the one or other direction?
>>     
>
> Pacing is a simpler version of what you're asking ACK clocking to do; if
> ACK clocking works, pacing definitely should.
>   

The problem I mean is very similar to problems like ACK compression or 
the problem described in an RFC draft by Craig:

http://tools.ietf.org/html/draft-partridge-e2e-ackspacing-00

Craig addresses the problem that during slow start, bursts may grow so 
large that buffer queues on the path may be overloaded.
A similar problem may happen when a mobile network has intermittent 
delay spikes and phases with high throughput. In phases with high 
throughput a mobile might receive a data burst, and thus a corresponding 
data burst is clocked out at the sender, which may overrun queues on the 
path.

Craig proposed to overcome this problem by appropriate ACK spacing, i.e. 
by deliberately putting short time gaps between ACK datagrams.
The problem is also addressed in a paper "Paced TCP for High 
Delay-Bandwidth Networks" by Joanna Kulik, Robert Coulter, Dennis 
Rockwell and Craig Partridge.
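
The spacing idea can be sketched in a few lines. This is only an
illustration under my own assumptions, not the algorithm from Craig's
draft or from the Paced TCP paper: the function name space_acks and the
fixed min_gap_ms parameter are hypothetical, and times are integer
milliseconds.

```python
def space_acks(arrival_ms, min_gap_ms):
    """Return release times so consecutive ACKs are at least min_gap_ms apart.

    arrival_ms: non-decreasing ACK arrival times at the spacing box (ms).
    min_gap_ms: assumed fixed minimum gap between forwarded ACKs (ms).
    """
    release = []
    next_free = None  # earliest time the next ACK may be forwarded
    for t in arrival_ms:
        # Forward immediately unless the previous ACK left too recently.
        r = t if next_free is None else max(t, next_free)
        release.append(r)
        next_free = r + min_gap_ms
    return release


# A compressed burst of three ACKs followed by an isolated one.
spaced = space_acks([0, 1, 2, 500], 10)
```

With ACKs arriving at 0, 1, 2 and 500 ms and a 10 ms gap, the first
three are released at 0, 10 and 20 ms, and the last one passes through
unchanged at 500 ms. A real scheme would have to choose the gap from the
path rate rather than use a constant.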

The one interesting question for me (perhaps not for the community, 
depending on the answer ;-)) is: Do we already have a pacing / spacing 
scheme which provides appropriate ACK spacing for mobile networks?

And of course this question very much depends on whether the problem of 
intermittent bursts in mobile networks is relevant. That's why I wrote 
the post on hiccups in mobile networks some days ago. I have looked for 
literature in this area quite intensely but found it extremely hard to 
get useful information here. I already referred to the Globecom 04 paper 
by Thierry Klein but I did not find really useful additional material on 
this issue. Scheduling algorithms in particular quite often seem to be 
company confidential, so it is extremely hard to get information there.

Moreover, I'm not quite sure whether ACK spacing is already in use here 
(sic!) because one consequence of doing ACK spacing in mobile networks 
is that the sender is confronted with a large delay bandwidth product. 
I know from the literature about mobile networks that large delay 
bandwidth products are often claimed for them - however, no 
one could explain to me where the claimed path capacity should come 
from. It's surely not the wireless channel, which typically can hardly 
hold a single IP packet. I don't think it's likely that the ARQ buffers 
provide much memory capacity either, because a "sliding window scheme" for 
ARQ and RLP would require mobile receivers to keep a number of 
incomplete IP packets, and therefore a certain amount of storage capacity, 
for a questionable benefit, because in mobile networks the wireless path 
can keep only very few RLP frames on the fly.

In short: Perhaps we may find some kbytes of memory on L2 here. Perhaps 
layer 2 may keep an average of one or two IP packets. That absolutely 
does not explain why mobile networks are frequently claimed to 
have bandwidth delay products so large that they would be a problem 
for TCP.

So, I'm just eager to know what mobile network operators are doing here.

If mobile networks really exhibit such large delay bandwidth products, 
and if we have intermittent bursts and delay spikes here, then we are 
not talking about a few kbytes but about up to several hundred kbytes and 
more, depending on how bursty the traffic is, and we have the same issues 
here as we have in satellite networks and other networks with an 
extremely large delay bandwidth product.
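
To make the orders of magnitude concrete, here is a rough calculation.
It is a sketch under stated assumptions, not a measurement: the 2 Mbps
rate is the UMTS burst figure used in this thread, the RTTs of 1 and 2
seconds are the illustrative values asked about in this post, and the
12000 bit segment size is the one assumed earlier in the sliding window
thread. The function names are hypothetical.

```python
import math


def bdp_bytes(rate_bps, rtt_s):
    """Delay bandwidth product in bytes: rate times round-trip time."""
    return rate_bps * rtt_s / 8


def segments_in_flight(rate_bps, rtt_s, segment_bits=12000):
    """Segments needed in flight to fill the path (rounded up)."""
    return math.ceil(rate_bps * rtt_s / segment_bits)


# Assumed burst rate of 2 Mbps with RTTs of 1 s and 2 s.
bdp_1s = bdp_bytes(2_000_000, 1)
bdp_2s = bdp_bytes(2_000_000, 2)
segs_1s = segments_in_flight(2_000_000, 1)
segs_2s = segments_in_flight(2_000_000, 2)
```

Under these assumptions the product is 250000 bytes (167 segments) for
a 1 second RTT and 500000 bytes (334 segments) for 2 seconds, which is
the "several hundred kbytes" range and far more than one or two IP
packets held on layer 2.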

So my question is of course a state of the art question. And I spent a 
huge amount of time on literature research on this issue, but as I said, 
it's extremely hard to find reliable research papers here. Most of the 
information I found is either extremely vague or it is written in PhD 
theses which are written in close cooperation with network operators and 
where I find claimed problems - but when it comes to details, this is 
"corporate confidential", which is definitely not my understanding of 
proper research.

I know that this post expresses very strong criticism of 
many papers which present "results" from "practical experiences with 
GPRS" etc.
But after having read dozens of papers of this kind for years, my 
conclusion is that many of the authors present snapshots of 
non-repeatable experiments here and do not really know what they have 
measured. The more material of this kind I read, the less I'm convinced 
that the material is good.

So, it's my personal opinion, and if this is wrong I'm willing to accept 
criticism here, that when it comes to mobile networks we have quite a 
few statements of belief but hardly any reliable material.

And what I find extremely annoying here is the permanent excuse "we 
cannot say anything about the wireless channel". I have owned a cell phone 
myself for more than a decade now and use it frequently. And in fact, 
mobile NOs know their channels so well that they can offer phone 
service. So the knowledge on mobile channels may be incomplete - but 
there is more than nothing. In addition, there is a bunch of work on 
adaptive channel coding. Now, you cannot adapt a coding scheme when you 
don't know what channel properties your coding scheme shall be adapted 
to. So obviously, there _are_ channel models.
And they are used in practice. And there _are_ Radio Link Protocols and 
there _are_ MAC and scheduling schemes.

But when I asked even research engineers in well known companies which 
build mobile phones why e.g. GPRS accepts delivery times for a packet of 
up to 10 minutes, no one was able or willing to explain this to me. Now, 
why is it in the standards when there is no explanation for it and no 
necessity to accept it?

I was involved in an academic research project which dealt with 
adaptation of multimedia streams to varying channel conditions in mobile 
networks. And even there I didn't get reliable material from our 
industrial partners on which conditions I should adapt to. The inevitable 
consequence was that the research ended in a pure disaster. I wasted 
years of my life on this one. So, when I write this post, you see me in 
fact in an angry and bitter condition.

Nearly seven years ago, a professor asked me what the 
characteristics of mobile networks are. After seven years I still do not know.
And when I tried to talk to colleagues from mobile phone manufacturers 
the only remark was: "Oh, I see, you're used to wirebound networks".

I have seen a number of PhD theses dealing with hiccups. But I have not 
yet seen any reliable material on whether there _are_ hiccups.
Of course we can do research that way: "Let's assume hiccups." O.k. But 
which assumptions are reasonable here? And what are the resulting delay 
bandwidth products? 10 kByte? 100 kByte? 10 MByte? And which RTTs are we 
going to see if we use sufficient buffering? 1 second? 2 seconds?
Or - according to the ETSI standard for GPRS - a quarter of an hour?

During the last seven years of my life, for the last three of which I 
have been unemployed, I always wanted to understand only one thing:
"What are the consequences of mobility and mobile networks for TCP and 
upper layers?"

And after seven years, to the best of my knowledge, I say: We have a lot 
of creeds - but hardly any reliable knowledge.

Detlef
> Joe
>
>   


From touch at ISI.EDU  Wed Jan 10 15:57:46 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 10 Jan 2007 15:57:46 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A576EB.206@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de>
Message-ID: <45A57D7A.6030505@isi.edu>


Detlef Bosau wrote:
> Joe Touch wrote:
>>> I think of the semantics at the connection level. Which I think to be
>>> sufficient in many cases.
>>>     
>>
>> The result is that you think you started/ended a connection correctly,
>> but that the wrong data got there?
>>   
> 
> Well, it's just how I understand the semantics of a "CLOSE ACK". When a
> receiver issues a CLOSE ACK, we know that all data has reached the
> receiving socket.

We should know that. But when we have intermediaries spoofing ACKs, all
we know is that the two endpoints agree that they have closed. What
happened to the data itself is not known.

Case in point - if the intermediary ACKs data and continues to buffer
it, and the window wraps, and then the intermediary goes down, the
endpoints think the data reached the buffer correctly but it really did not.

> What we do not know is whether the data has reached
> the application.

TCP is a reliable transport protocol; it is not a reliable application
protocol. Actions outside of TCP are not ensured by TCP.

> To my understanding that's one reason why we use
> acknowledgements at the application level when it is necessary to know
> whether an application has received all data.

Agreed, but we do know some other things. As a *receiver*, when we issue
a CLOSE, we keep reading until there is no more data. If we do so, AND
we receive a "no more data", then we *know* all the data has been
received correctly.

I.e., the semantics of who knows what are receiver-driven, not sender.
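
The receiver-side rule can be sketched as follows. This is a minimal
illustration, not a statement about any particular stack: it uses
socket.socketpair() as a local stand-in for a TCP connection (an
assumption made so the example is self-contained), and the names
read_until_eof and demo are hypothetical.

```python
import socket


def read_until_eof(sock, bufsize=4096):
    """Keep reading until the peer signals 'no more data' (empty read)."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # b"" means the sending side has closed its half
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    # Stand-in for a connection: a connected pair of stream sockets.
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"hello world")
        sender.shutdown(socket.SHUT_WR)  # sender: "no more data"
        return read_until_eof(receiver)
    finally:
        sender.close()
        receiver.close()


received = demo()
```

The point of the sketch is only that the receiver's knowledge comes from
its own reads ending in an end-of-stream indication; what the sender can
conclude from ACKs is a separate matter.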

> So, to my understanding a PEP which keeps the semantics at the
> connection level keeps all semantics which is provided by TCP itself.
> Acknowledgements at the application level are beyond the scope of TCP.

See above; PEPs that spoof ACKs can result in different data streams
being 'correctly' processed without either side knowing so.

Joe

> 
> 
>> As to PEPs...
>>
>>  
>>> Otherwise the problem is: When the bandwidth sender - splitter is, e.g.,
>>> the average bandwidth / rate splitter-sender but far less than the
>>> maximum rate splitter / sender then a simple router perhaps would hardly
>>> store any data and thus hardly equalize the rate / delivery times.
>>> Thierry describes delay spikes of several seconds. If we think about
>>> UMTS, we can imagine a wireless link where nothing happens for up to
>>> several seconds - thus even no data is clocked out from the sender - and
>>> then we have about 2 Mbps throughput for a short time - which is perhaps
>>> much more than the actual Internet path can carry. In such a scenario we
>>> want to have the router / splitter / PEP / whateverbox buffer the data
>>> and equalize the rate variations. Can this be achieved by pure pacing in
>>> the one or other direction?
>>>     
>>
>> Pacing is a simpler version of what you're asking ACK clocking to do; if
>> ACK clocking works, pacing definitely should.
>>   
> 
> The problem I mean is very similar to problems like ACK compression or
> the problem described in an RFC draft by Craig:
> 
> http://tools.ietf.org/html/draft-partridge-e2e-ackspacing-00
> 
> Craig addresses the problem that during slow start, bursts may grow so
> large that buffer queues on the path may be overloaded.
> A similar problem may happen when a mobile network has intermittent
> delay spikes and phases with high throughput. In phases with high
> throughput a mobile might receive a data burst, and thus a corresponding
> data burst is clocked out at the sender, which may overrun queues on the
> path.
> 
> Craig proposed to overcome this problem by appropriate ACK spacing, i.e.
> by deliberately putting short time gaps between ACK datagrams.
> The problem is also addressed in a paper "Paced TCP for High
> Delay-Bandwidth Networks" by Joanna Kulik, Robert Coulter, Dennis
> Rockwell and Craig Partridge.
> 
> The one interesting question for me (perhaps not for the community,
> depending on the answer ;-)) is: Do we already have a pacing / spacing
> scheme which provides appropriate ACK spacing for mobile networks?
> 
> And of course this question very much depends on whether the problem of
> intermittent bursts in mobile networks is relevant. That's why I wrote
> the post on hiccups in mobile networks some days ago. I have looked for
> literature in this area quite intensely but found it extremely hard to
> get useful information here. I already referred to the Globecom 04 paper
> by Thierry Klein but I did not find really useful additional material on
> this issue. Scheduling algorithms in particular seem to be company
> confidential quite often so it is extremely hard to get information there.
> 
> Moreover, I'm not quite sure whether ACK spacing is already in use here
> (sic!) because one consequence of doing ACK spacing in mobile networks
> is that the sender is confronted with a large delay bandwidth product.
> From the literature about mobile networks I know that large delay
> bandwidth products are often claimed for mobile networks - however no
> one could explain to me where the claimed path capacity should come
> from. It's surely not the wireless channel, which typically hardly holds
> a single IP packet. I don't think it's likely that the ARQ buffers
> provide much memory capacity because a "sliding window scheme" for
> ARQ and RLP would require mobile receivers to keep a number of
> incomplete IP packets and therefore a certain amount of storage capacity
> for a questionable benefit because in mobile networks the wireless path
> can keep only very few RLP frames in flight.
> 
> In short: Perhaps we may find some kbytes of memory on L2 here. Perhaps
> layer 2 may keep an average of one or two IP packets. That absolutely
> does not explain why mobile networks are frequently claimed to have
> bandwidth delay products so large that they would be a problem
> for TCP.
> 
> So, I'm just eager to know what mobile network operators are doing here.
> 
> If mobile networks really exhibit such large delay bandwidth products,
> and if we have intermittent bursts and delay spikes here, then we are
> not talking about some kbytes but about up to several hundred kbytes and
> more, depending on how bursty the traffic is, and we have the same
> issues here as we have in satellite networks and other networks with an
> extremely large delay bandwidth product.
> 
> So my question is of course a state of the art question. And I spent a
> huge amount of time on literature research on this issue but as I said
> it's extremely hard to find reliable research papers here. Most of the
> information I found is either extremely vague or it is written in PhD
> theses which are written in close cooperation with network operators and
> where I find claimed problems - but when it comes to details, this is
> "corporate confidential", which is definitely not my understanding of
> proper research.
> 
> I know that this post here exhibits a very strong criticism against
> many papers which present "results" from "practical experiences with
> GPRS" etc.
> But after having read dozens of papers of this kind for years, my
> conclusion is that many of the authors present snapshots of
> non-repeatable experiments here and do not really know what they have
> measured. The more material of this kind I read, the less I'm convinced
> that the material is good.
> 
> So, it's my personal opinion, and if this is wrong I'm willing to accept
> criticism here, that when it comes to mobile networks we have quite a
> few statements of belief but hardly any reliable material.
> 
> And what I find extremely annoying here is the permanent excuse "we
> cannot say anything about the wireless channel". I have owned a cell
> phone myself for more than a decade now and use it frequently. And in
> fact, mobile NOs know their channels so well that they can offer phone
> service. So the knowledge on mobile channels may be incomplete - but
> there is more than nothing. In addition, there is a bunch of work on
> adaptive channel coding. Now, you cannot adapt a coding scheme when you
> don't know what channel properties your coding scheme shall be adapted
> to. So obviously, there _are_ channel models.
> And they are practically used. And there _are_ Radio Link Protocols and
> there _are_ MAC- and scheduling schemes.
> 
> But when I asked even research engineers in well known companies which
> build mobile phones why e.g. GPRS accepts delivery times for a packet of
> up to 10 minutes, no one was able or willing to explain this to me. Now,
> why is it in the standards when there is no explanation for this and no
> necessity to accept this?
> 
> I was involved in an academic research project which dealt with
> adaptation of multimedia streams to varying channel conditions in mobile
> networks. And even there I didn't get reliable material from our
> industrial partners on which conditions I should adapt to. The inevitable
> consequence was that the research ended up in a pure disaster. I wasted
> years of my life on this one. So, when I write this post, you see me in
> fact in an angry and bitter condition.
> 
> Nearly seven years ago, a professor asked me what the characteristics
> of mobile networks are. After seven years I still do not know.
> And when I tried to talk to colleagues from mobile phone manufacturers
> the only remark was: "Oh, I see, you're used to wirebound networks".
> 
> I have seen a number of PhD theses dealing with hiccups. But I have not
> yet seen any reliable material on whether there _are_ hiccups.
> Of course we can do research that way: "Let's assume hiccups." O.k. But
> which assumptions are reasonable here? And what are the resulting delay
> bandwidth products? 10 kByte? 100 kByte? 10 MByte? And which RTTs are we
> going to see if we use sufficient buffering? 1 second? 2 seconds?
> Or - according to the ETSI standard for GPRS - a quarter of an hour?
> 
> During the last seven years of my life, of which I have been unemployed
> for the last three, I always wanted to understand only one thing:
> "What are the consequences of mobility and mobile networks for TCP and
> upper layers?"
> 
> And after seven years, to the best of my knowledge, I say: We have a lot
> of creeds - but hardly any reliable knowledge.
> 
> Detlef
>> Joe
>>
>>   
> 
> 

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From L.Wood at surrey.ac.uk  Wed Jan 10 16:25:24 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 11 Jan 2007 00:25:24 +0000
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A576EB.206@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de>
Message-ID: <200701110025.AAA04307@cisco.com>

At Thursday 11/01/2007 00:29 +0100, you wrote:

>Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK,

...the ACK starts off CLOSE to the receiver and then goes FURTHER AWAY, ending up a FAR ACK. That's the semantics.

L.

channelling crowcroft. 


From faber at ISI.EDU  Wed Jan 10 17:35:08 2007
From: faber at ISI.EDU (Ted Faber)
Date: Wed, 10 Jan 2007 17:35:08 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <200701110025.AAA04307@cisco.com>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de> <200701110025.AAA04307@cisco.com>
Message-ID: <20070111013508.GA1402@hut.isi.edu>

On Thu, Jan 11, 2007 at 12:25:24AM +0000, Lloyd Wood wrote:
> At Thursday 11/01/2007 00:29 +0100, you wrote:
> 
> >Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK,
> 
> ...the ACK starts off CLOSE to the receiver and then goes FURTHER
> AWAY, ending up a FAR ACK. That's the semantics.

And if you collect 5 of them, it's a FIN ACK.  Which may be where we
were *trying* to go. :-)

-- 
Ted Faber
http://www.isi.edu/~faber           PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From Anil.Agarwal at viasat.com  Wed Jan 10 20:08:06 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Wed, 10 Jan 2007 23:08:06 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
References: <45A57D7A.6030505@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>

Joe Touch wrote -
>>
>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a
>> receiver issues a CLOSE ACK, we know that all data has reached the
>> receiving socket.

> We should know that. But when we have intermediaries spoofing ACKs, all
> we know is that the two endpoints agree that they have closed. The data
> itself is not known.

> Case in point - if the intermediary ACKs data and continues to buffer
> it, and the window wraps, and then the intermediary goes down, the
> endpoints think the data reached the buffer correctly but it really did not.

Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes of sender 
data without delivering any to the receive end-point, then goes down, and 
the end-points continue the connection using the wrapped
sequence number, which in this case match up just right, so that the intervening
2^32 bytes disappear down a black hole, without the sender or receiver
being any wiser?
 
Cheers,
Anil
------------------
Anil Agarwal
ViaSat Inc.
 
 

From detlef.bosau at web.de  Thu Jan 11 02:07:25 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 11 Jan 2007 11:07:25 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <20070111013508.GA1402@hut.isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de> <200701110025.AAA04307@cisco.com>
	<20070111013508.GA1402@hut.isi.edu>
Message-ID: <45A60C5D.7000300@web.de>

Ted Faber wrote:
> On Thu, Jan 11, 2007 at 12:25:24AM +0000, Lloyd Wood wrote:
>   
>> At Thursday 11/01/2007 00:29 +0100, you wrote:
>>
>>     
>>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK,
>>>       
>> ...the ACK starts off CLOSE to the receiver and then goes FURTHER
>> AWAY, ending up a FAR ACK. That's the semantics.
>>     
>
> And if you collect 5 of them, it's a FIN ACK.  Which may be where we
> were *trying* to go. :-)
>
>   

I apologize if this is a stupid question, but from the RFC I understood
that the receiving socket sends an acknowledgement when all data was
received. That's CLOSE ACK.

Is there an explicit acknowledgement which tells the sender that all
data has been delivered to the _application_? Can this even be achieved
in finite time? An application may crash or hang!

To my understanding the FIN/FIN-ACK/... exchange is to shut down a TCP
connection, knowing about the two-army problem and the fact that it
cannot be solved in finite time?

I apologize if this is a misconception...

Detlef


From detlef.bosau at web.de  Thu Jan 11 03:43:04 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 11 Jan 2007 12:43:04 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A57D7A.6030505@isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de> <45A57D7A.6030505@isi.edu>
Message-ID: <45A622C8.1080707@web.de>

Joe Touch wrote:
>>>
>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a
>> receiver issues a CLOSE ACK, we know that all data has reached the
>> receiving socket.
>>     
>
> We should know that. But when we have intermediaries spoofing ACKs, all
> we know is that the two endpoints agree that they have closed. The data
> itself is not known.
>
> Case in point - if the intermediary ACKs data and continues to buffer
> it, and the window wraps, and then the intermediary goes down, the
> endpoints think the data reached the buffer correctly but it really did not.
>
>   

Of course. But, assuming we can overcome the window wrap problem, to my
understanding spoofing boxes must not spoof CLOSE ACK, so that the
sender is not notified that all data has reached the final receiver
until this happens.

Of course, we don't know anything about intermediate states and of course
we run into a problem if a spoofing box fails.

>> What we do not know is whether the data has reached
>> the application.
>>     
>
> TCP is a reliable transport protocol; it is not a reliable application
> protocol. Actions outside of TCP are not ensured by TCP.
>
>   
Fine :-)

Then one could even argue with an end to end argument: When a transport
protocol cannot assure that sent data has been read successfully by the
receiving application, we do need an acknowledgement scheme at the
application level anyway.

Please don't misunderstand me. I don't want to be careless about the
problem.

All I want to say is that there may be situations, e.g. extremely large 
delay bandwidth products where one perhaps really wants to have an 
alternative to AIMD probing to have an acceptable startup behaviour, 
where proxies / splitters / spoofing boxes should be considered very 
seriously.

I don't remember the paper but I think Sally Floyd once wrote about a
satellite connection where it takes 20 minutes or so for a flow to
achieve acceptable throughput due to an extremely large delay bandwidth
product. So, when we _have_ acknowledgements at application level and we
can reduce fate sharing problems to an acceptable level and some proxy
could help us to significantly accelerate the start up phase here, I
think we should at least consider this as one way to go.
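
To put rough numbers on the startup concern, here is a minimal back-of-the-envelope sketch. It assumes an idealized sender whose window doubles every RTT during slow start and grows by one segment per RTT afterwards, and it ignores losses during startup, delayed ACKs and receiver window limits. The path parameters (100 Mbit/s, 600 ms RTT, 1500-byte segments) and all function names are hypothetical, chosen only for illustration; they are not taken from the paper mentioned above.

```python
def bdp_bytes(rate_bps, rtt_ms):
    """Bandwidth-delay product in bytes (rate in bit/s, RTT in milliseconds)."""
    return rate_bps * rtt_ms // 8000


def segments_needed(bdp, segment_bytes):
    """Window size in segments needed to fill the path (rounded up)."""
    return -(-bdp // segment_bytes)


def slow_start_rounds(target_segments, initial_window=1):
    """RTTs needed for a window that doubles every RTT to reach the target."""
    rounds = 0
    window = initial_window
    while window < target_segments:
        window *= 2
        rounds += 1
    return rounds


def additive_increase_rounds(target_segments):
    """RTTs needed to climb from half the target back to it at +1 segment per RTT."""
    return target_segments - target_segments // 2


# Hypothetical long fat path (assumed values, for illustration only).
RATE_BPS = 100_000_000
RTT_MS = 600
SEGMENT_BYTES = 1500

window = segments_needed(bdp_bytes(RATE_BPS, RTT_MS), SEGMENT_BYTES)
print("window to fill the path:", window, "segments")
print("slow start rounds:", slow_start_rounds(window))
print("seconds to recover from one halving:",
      additive_increase_rounds(window) * RTT_MS / 1000)
```

Under these assumptions the doubling phase itself is short; it is the linear climb after a single window halving that takes on the order of tens of minutes, which is the kind of behaviour that makes alternatives to plain AIMD probing attractive on such paths.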

>> To my understanding that's one reason why we use
>> acknowledgements on application level when it is necessary to know
>> whether an application has received all data.
>>     
>
> Agreed, but we do know some other things. As a *receiver*, when we issue
> a CLOSE, we keep reading until there is no more data. If we do so, AND
> we receive a "no more data", then we *know* all the data has been
> received correctly.
>   

O.k., so we can detect an error: The sender sent a CLOSE and there is
trailing data afterwards. In that case (I don't know what the RFCs say
here) we can issue an error message, e.g. a RST. So, let's take the
sender's view then: How long shall a sender wait for a possible error
message like that? Doesn't this lead to the problem that a missing NAK
is not equivalent to an ACK?

> I.e., the semantics of who knows what are receiver-driven, not sender.
>
>   

However, in a reliable connection the sender wants to know when all data 
has been completely delivered.

>> So, to my understanding a PEP which keeps the semantics at the
>> connection level keeps all semantics which is provided by TCP itself.
>> Acknowledgements at the application level are beyond the scope of TCP.
>>     
>
> See above; PEPs that spoof ACKs can result in different data streams
> being 'correctly' processed without either side knowing so.
>
> Joe
>
>   


From lars.eggert at nokia.com  Thu Jan 11 07:43:16 2007
From: lars.eggert at nokia.com (lars.eggert@nokia.com)
Date: Thu, 11 Jan 2007 17:43:16 +0200
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040429.EAA24974@cisco.com>
Message-ID: <213394AB6954BE4682E8B2A49103468348536D@esebe108.NOE.Nokia.com>

Hi,

sorry for jumping in late - big pile of unread mail over the holidays.

> This issue is minor compared to the widespread changes to 
> their TCP stack Microsoft made with adopting Compound TCP in Vista.
> http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx

The IETF has approached MS over this issue, and apparently C-TCP will
not be enabled by default on end-user Vista versions. The decision for
the server versions has not been made AFAIK.

Recent Linux versions, however, apparently enable BIC or CUBIC TCP by
default, which raises concerns.

In any case, these examples all illustrate that there seems to be
considerable interest in deploying "faster" TCP variants on the
Internet, or TCP variants that are "more optimal" across certain paths.
Many of these schemes can be significantly more aggressive than any
current congestion control standard.

The concern is that while many of these proposals look interesting, few
if any have been validated to the point where they can be recommended
for wide-spread deployment. (And let's be clear, stuff that gets shipped
in the most common stacks out there _is_ seeing wide-spread deployment,
especially if enabled by default.) Many modifications haven't even been
_documented_ to the point where their impact can be analyzed, even ones
that are shipping.

We're on a slippery slope here. Yes, TCP is less than efficient in many
network scenarios that are becoming increasingly more common, and
modifications can have a positive impact. But they can also have a large
negative impact on the careful equilibrium that the VJ mechanisms have
maintained for the last 15 years. Congestion control is arguably one of
the pillars of the Internet, and changes need to be thought through and
validated carefully, both by the proposers and the community at large.

The good news is that we do have research results and some limited
operational experience that looks promising. We need more of it. Before
wide-spread deployment, we need wide-spread experimentation. The recent
draft-floyd-tsvwg-cc-alt lists a number of important points that such
experiments need to discuss.

Some IETF transport folks have been discussing how to make progress in
this space. A first step seems to be that proposed modifications need to
be sufficiently documented by the proposers in a public forum, such that
the community can review them. Informational RFCs are a convenient form.
The community could then elect to further discuss and analyze promising
proposals, developing them towards specification for Experimental use. A
Standards Track effort would eventually follow.

We're planning to further discuss these issues and a proposed way
forward at the ICCRG and PFLDnet meetings in February and would welcome
participation from researchers, developers and other interested parties.

Lars
-- 
NEW EMAIL: lars.eggert at nokia.com
NEW MOBILE: +358 50 48 24461
NEW JABBER: lars.eggert at googlemail.com  

From david.borman at windriver.com  Thu Jan 11 09:11:43 2007
From: david.borman at windriver.com (David Borman)
Date: Thu, 11 Jan 2007 11:11:43 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45A60C5D.7000300@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de> <200701110025.AAA04307@cisco.com>
	<20070111013508.GA1402@hut.isi.edu> <45A60C5D.7000300@web.de>
Message-ID: <456C4B17-4E6C-4700-A0C8-AA3BA29A5441@windriver.com>

Hi Detlef,

TCP does not provide any mechanism to tell that all the data has been  
delivered to the application.  TCP can only tell you that all the  
data has been received by the TCP at the remote end.  The application  
itself on both ends of the connection will need to determine whether  
or not all the relevant data has been received by the application.   
That is really the only place where it can take place reliably.  For  
example, even if TCP could let the other side know that the  
application has read all the data, that doesn't mean that the  
application has actually processed the data, or gotten it to stable  
storage, or whatever else it is doing with the data.

TCP provides a reliable byte stream, it is up to the application to  
decide how to use the data that is transferred over that stream.
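
As an illustration of the point above, here is a minimal sketch of an application-level acknowledgement layered on a byte stream. The framing (a 4-byte length prefix) and the two-byte "OK" reply are hypothetical conventions invented for this example, not part of TCP or of any protocol discussed in this thread, and a local socket pair stands in for a real connection.

```python
import socket
import threading


def recv_exact(conn, n):
    """Read exactly n bytes from the stream or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf.extend(chunk)
    return bytes(buf)


def receiver(conn, processed):
    """Read one length-prefixed message, 'process' it, then confirm to the sender."""
    with conn:
        length = int.from_bytes(recv_exact(conn, 4), "big")
        payload = recv_exact(conn, length)
        processed.append(payload)   # stand-in for real processing / stable storage
        conn.sendall(b"OK")         # application-level acknowledgement


def send_with_app_ack(conn, payload):
    """Send one message and wait until the receiving *application* confirms it."""
    conn.sendall(len(payload).to_bytes(4, "big") + payload)
    return recv_exact(conn, 2) == b"OK"


def demo():
    a, b = socket.socketpair()
    processed = []
    worker = threading.Thread(target=receiver, args=(b, processed))
    worker.start()
    with a:
        confirmed = send_with_app_ack(a, b"hello")
    worker.join()
    return confirmed, processed
```

The transport-level ACK only says the bytes reached the remote stack; the `b"OK"` reply is sent only after the receiving application has actually handled them, which is the knowledge the sender was asking for.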

			-David Borman

On Jan 11, 2007, at 4:07 AM, Detlef Bosau wrote:

> I apologize if this is a stupid question, but from the RFC I
> understood that the receiving socket sends an acknowledgement when
> all data was received. That's CLOSE ACK.
>
> Is there an explicit acknowledgement which tells the sender that
> all data has been delivered to the _application_? Can this even be
> achieved in finite time? An application may crash or hang!
>
> To my understanding the FIN/FIN-ACK/... exchange is to shut down a TCP
> connection, knowing about the two-army problem and the fact that
> it cannot be solved in finite time?
>
> I apologize if this is a misconception...
>
> Detlef
>


From touch at ISI.EDU  Thu Jan 11 10:10:11 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 11 Jan 2007 10:10:11 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A622C8.1080707@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu>
	<459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu>
	<459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu>
	<459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu>
	<45A576EB.206@web.de> <45A57D7A.6030505@isi.edu>
	<45A622C8.1080707@web.de>
Message-ID: <45A67D83.200@isi.edu>


Detlef Bosau wrote:
> Joe Touch wrote:
...
>> Case in point - if the intermediary ACKs data and continues to buffer
>> it, and the window wraps, and then the intermediary goes down, the
>> endpoints think the data reached the buffer correctly but it really
>> did not.
> 
> Of course. But, assuming we can overcome the window wrap problem, to my
> understanding spoofing boxes must not spoof CLOSE ACK, so that the
> sender is not notified that all data has reached the final receiver
> until this happens.
> 
> Of course, we don't know anything about intermediate states and of course
> we run into a problem if a spoofing box fails.

Right - that's the other case where problems occur.

>>> What we do not know is whether the data has reached
>>> the application.
>>>     
>>
>> TCP is a reliable transport protocol; it is not a reliable application
>> protocol. Actions outside of TCP are not ensured by TCP.
>>
>>   
> Fine :-)
> 
> Then one could even argue with an end to end argument: When a transport
> protocol cannot assure that sent data has been read successfully by the
> receiving application, we do need an acknowledgement scheme at the
> application level anyway.

The E2E argument applies to the ends in question. In this case, the
transport protocol is the endpoint, not the application.

...
> All I want to say is that there may be situations, e.g. extremely large
> delay bandwidth products where one perhaps really wants to have an
> alternative to AIMD probing to have an acceptable startup behaviour,

Agreed.

> where proxies / splitters / spoofing boxes should be considered very
> seriously.

I do NOT agree with that conclusion. If you want to change AIMD, proceed
as Lars suggested in a separate post - within TCP or within a link- or
network-consistent PEP, or within an application-visible proxy.

You may need to read the transport packets to implement such a PEP, but
you should not (MUST NOT, actually) need to spoof transport packets to
accomplish the result.

...
>>> To my understanding that's one reason why we use
>>> acknowledgements on application level when it is necessary to know
>>> whether an application has received all data.
>>
>> Agreed, but we do know some other things. As a *receiver*, when we issue
>> a CLOSE, we keep reading until there is no more data. If we do so, AND
>> we receive a "no more data", then we *know* all the data has been
>> received correctly.
> 
> O.k., so we can detect an error: The sender sent a CLOSE and there is
> trailing data afterwards. In that case (I don't know what the RFCs say
> here) we can issue an error message, e.g. a RST.

There are two cases:
	- the sender still has data to send already buffered
	in the socket
		TCP won't ACK the received FIN in that case

	- the sender has emptied its buffer and ACK'd the FIN
		TCP won't accept further SEND calls (socket writes)
		from the sending application in that case
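
The receiver-side rule discussed in this exchange (keep reading until the stack reports end-of-stream) can be sketched with the ordinary sockets API. This is only an illustration over a loopback TCP connection, assuming the usual behaviour that a half-close sends the FIN after any data already written; the function names are made up for the example.

```python
import socket


def read_until_eof(conn):
    """Read until recv() returns b'', i.e. the peer's FIN has been delivered."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo_half_close():
    # Loopback listener on an ephemeral port (assumed to be available locally).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    sender = socket.create_connection(listener.getsockname())
    receiver, _ = listener.accept()
    listener.close()

    sender.sendall(b"all the data")
    sender.shutdown(socket.SHUT_WR)   # half-close: FIN follows the written data

    data = read_until_eof(receiver)   # EOF implies every earlier byte arrived
    receiver.close()
    sender.close()
    return data
```

Seeing end-of-stream tells the receiving application that it holds the complete byte stream; it tells the sending application nothing about whether those bytes were ever read.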

> So, let's take the
> sender's view then: How long shall a sender wait for a possible error
> message like that? Doesn't this lead to the problem that a missing NAK
> is not equivalent to an ACK?

The sender can wait all it wants. All it will ever know is that the
receiving TCP has correctly received the data; it needs a separate
signal from the application to know it has actually been read.
Otherwise, both TCPs could close with data in the receive buffers and if
their corresponding applications die, the data is just lost.

>> I.e., the semantics of who knows what are receiver-driven, not sender.
> 
> However, in a reliable connection the sender wants to know when all data
> has been completely delivered.

Not necessarily. The sender wants to know that IF the data is received,
it is received correctly, and that IF the receiver thinks it has all the
data, then it will be correct.

There's nothing in TCP semantics that ensures that the sender knows
anything other than that the receiving TCP has accepted all the data
correctly, though. All that knowledge stops at the TCP layer.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Thu Jan 11 10:13:13 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 11 Jan 2007 10:13:13 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <45A67E39.4000308@isi.edu>


Agarwal, Anil wrote:
> Joe Touch wrote -
>>>
>>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a
>>> receiver issues a CLOSE ACK, we know that all data has reached the
>>> receiving socket.
> 
>> We should know that. But when we have intermediaries spoofing ACKs, all
>> we know is that the two endpoints agree that they have closed. The data
>> itself is not known.
> 
>> Case in point - if the intermediary ACKs data and continues to buffer
>> it, and the window wraps, and then the intermediary goes down, the
>> endpoints think the data reached the buffer correctly but it really
> did not.
> 
> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes
> of sender
> data without delivering any to the receive end-point, then goes down, and
> the end-points continue the connection using the wrapped
> sequence number, which in this case match up just right, so that the
> intervening
> 2^32 bytes disappear down a black hole, without the sender or receiver
> being any wiser?

Yes. The system can wrap without things matching up _exactly_, depending
on how big the CWND is, though, so this isn't as absurdly specific as it
appears at first glance.
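
A minimal sketch of why the match need not be exact. It uses the receive-window acceptance test from RFC 793 (RCV.NXT <= SEG.SEQ < RCV.NXT + RCV.WND, modulo 2^32) as a stand-in for whatever window is in effect; the concrete numbers and function names are hypothetical.

```python
SEQ_MOD = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def seq_add(seq, nbytes):
    """Advance a sequence number by nbytes, wrapping modulo 2^32."""
    return (seq + nbytes) % SEQ_MOD


def in_window(seg_seq, rcv_nxt, rcv_wnd):
    """True if seg_seq lies in [rcv_nxt, rcv_nxt + rcv_wnd) modulo 2^32."""
    return (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd


# The receiver never saw the spoof-ACKed data and still expects sequence 1000.
rcv_nxt, rcv_wnd = 1000, 65535

# The sender has had 2^32 bytes ACKed by the splitter: an exact wrap.
exact_wrap = seq_add(1000, 2 ** 32)

# The sender has advanced 2^32 + 500 bytes: not exact, but still inside the window.
near_wrap = seq_add(1000, 2 ** 32 + 500)

# Half a sequence space away: clearly outside the window.
far_off = seq_add(1000, 2 ** 31)

print(in_window(exact_wrap, rcv_nxt, rcv_wnd))
print(in_window(near_wrap, rcv_nxt, rcv_wnd))
print(in_window(far_off, rcv_nxt, rcv_wnd))
```

Any wrapped sequence number that lands inside the current window passes the check, so the larger the window, the less exact the coincidence has to be.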

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Thu Jan 11 10:14:21 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 11 Jan 2007 10:14:21 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <45A67E7D.4010609@isi.edu>

PS - this could also happen within a single CWND, e.g., if the network
path temporarily shifts around the TCP-splitter. It doesn't require an
entire window wrap to occur.

Joe


Agarwal, Anil wrote:
> Joe Touch wrote -
>>>
>>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a
>>> receiver issues a CLOSE ACK, we know that all data has reached the
>>> receiving socket.
> 
>> We should know that. But when we have intermediaries spoofing ACKs, all
>> we know is that the two endpoints agree that they have closed. The data
>> itself is not known.
> 
>> Case in point - if the intermediary ACKs data and continues to buffer
>> it, and the window wraps, and then the intermediary goes down, the
>> endpoints think the data reached the buffer correctly but it really
> did not.
> 
> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes
> of sender
> data without delivering any to the receive end-point, then goes down, and
> the end-points continue the connection using the wrapped
> sequence number, which in this case match up just right, so that the
> intervening
> 2^32 bytes disappear down a black hole, without the sender or receiver
> being any wiser?
>  
> Cheers,
> Anil
> ------------------
> Anil Agarwal
> ViaSat Inc.
>  
>  
>  
>  

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Thu Jan 11 16:05:11 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 11 Jan 2007 16:05:11 -0800
Subject: [e2e] FYI - update to list policy and operation
Message-ID: <45A6D0B7.5040207@isi.edu>

Hi, all,

Just a quick note that the list is under new management, at least
officially. I'm still running it day-to-day; that hasn't changed (sorry
;-) Some of this info is new, and some may be repeated...

Karen Sollins and Craig Partridge are now chairs of the IRTF E2E RG,
which is the owner of this list.
http://www.irtf.org/charter?gtype=rg&group=end2end

I have taken over Bob Braden's role as primary POC for queries about the
mailing list and requests to post. (PS - I encourage you to thank Bob
either personally or on this list; running things, both at the IRTF and
on this list, is often an otherwise thankless job, and he deserves the
primary credit for this list's prosperity.)

The list posting policy has been updated to explain more clearly how
CFPs are gated. To be posted, a call must:
    * have a primary focus on E2E issues
    * focus on research discussion
    * be open to all participants (space permitting)

See also the following:
http://www.postel.org/e2e.htm
http://www.postel.org/mailman/listinfo/end2end-interest

This is the spirit in which the recent DARPA and NSF posts have been
encouraged, and we further encourage all organizations - national,
industrial, academic, and other - to post relevant *open* calls for
papers and participation to this list.

Posts of calls for papers/participation which have non-space
restrictions, proposals for funding, and previously prohibited items
(job advertisements, job solicitations, and book announcements) are not
permitted under the current policy.

I hope this information is useful. Please feel free to contact me
directly if you have any questions, either about this list in general or
about a specific post.

Thanks,

Joe (as list admin)

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From michael.welzl at uibk.ac.at  Fri Jan 12 10:15:40 2007
From: michael.welzl at uibk.ac.at (Michael Welzl)
Date: Fri, 12 Jan 2007 19:15:40 +0100
Subject: [e2e] ICCRG meeting agenda (12/13 Feb @ ISI)
Message-ID: <006e01c73675$ab2e72d0$0200a8c0@fun>

Dear all,

As Lars mentioned in his previous message to the list, ICCRG will have
a meeting which is co-located with Pfldnet 2007, and would welcome
your participation.

The agenda can be found at the bottom of this email. It seems to be
quite stable now, but if any (minor) changes need to be made, note
that they will be announced in the ICCRG list only. Additionally, the
most up-to-date version of the agenda is always available at:
http://www1.tools.ietf.org/group/irtf/trac/wiki/Agenda
(as you can see, Monday afternoon is dedicated to the
"how-to-cope-with-all-these-different-TCP's-out-there" issue)

and logistics details can be found at:
http://www1.tools.ietf.org/group/irtf/trac/wiki/Logistics

Please send an email to Alba Regalado-Palacios ( alba at isi.edu )
if you'd like to participate so that we can do a head count.

I hope to see you there!

Cheers,
Michael

==============================================

Monday 12. 2. 2007:
-------------------
08:30 - 09:00   Light breakfast
09:00 - 09:15   Welcome and agenda bashing
09:15 - 09:30   Michael Welzl: The current state of ICCRG
09:30 - 10:00   Keshav: What is congestion and what is congestion control
10:00 - 10:45   Jeremy Mineweaser: Congestion control in the Global
                Information Grid (GIG)
10:45 - 11:00   Break
11:00 - 11:45   Tom Phelan: DCCP, TFRC and Open Problems
                in Congestion Control for Media Applications
11:45 - 12:15   Lachlan Andrew: Rate control with packet corruption
12:15 - 13:45   Lunch
13:45 - 15:15   Lars Eggert: The role of the IETF/IAB/ICCRG for
                already deployed non-standard TCP CC
15:15 - 15:30   Break
15:30 - 18:00   Discussion: What should the ICCRG be doing?


Tuesday 13. 2. 2007:
--------------------
08:30 - 09:00   Light breakfast
09:00 - 09:45   K. K. Ramakrishnan: LT-TCP: Loss Tolerant TCP
09:45 - 10:30   Ted Faber and Eric Coe: Congestion Control with Explicit
                Feedback (XCP implementation experiences (Eric), and
                potential for incremental deployment (Ted))
10:30 - 10:45   Break
10:45 - 11:30   Doan B. Hoang: FICC-DiffServ: using CC as a QoS element
11:30 - 12:15   Bob Briscoe: Flow Rate Fairness: Dismantling a Religion
12:15           Open discussion: Next steps: meetings, docs, etc

From gorinsky at arl.wustl.edu  Fri Jan 12 14:54:49 2007
From: gorinsky at arl.wustl.edu (Sergey Gorinsky)
Date: Fri, 12 Jan 2007 16:54:49 -0600 (CST)
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
	Internet?)
In-Reply-To: <Pine.LNX.4.44.0701072015120.20178-100000@gato.kotovnik.com>
Message-ID: <Pine.LNX.4.44.0701120954110.8510-100000@dom.arl.wustl.edu>


  Vadim,

> How hard it is to turn the Fair Queueing knob to "on" on the gateways?

  To put my 2 kopecks in... First, since an application can masquerade as 
multiple flows, fairness enforcement with FQ is not effective. To lend 
itself to meaningful enforcement, fairness should be defined not in terms 
of flows or even hosts/processes generating them. Instead, fairness 
should be linked to humans behind the communications but this requires 
a very different network architecture.  

  Second, packet-by-packet FQ and end-to-end TCP strive to approximate 
instantaneous PS (Processor Sharing) which is not a good fit for any 
natural application. Multimedia streams need a minimal rate, not a fair 
share. Elastic applications are not well served by PS either because 
average message delay is much larger than under SRPT (Shortest Remaining 
Processing Time), in agreement with Internet experiences where deviations 
from short-term fair sharing improve overall efficiency.

  While minimizing the average message delay, SRPT also might starve large 
messages. However, one can have it both ways: the rich can get richer 
without making the poor poorer. ViFi (Virtual Finish Time First), which 
schedules messages preemptively in the order of their finish times under 
PS, is close to SRPT (and much better than PS) with respect to the average 
message delay and guarantees that no message is delivered later than 
under PS. You can read more on ViFi in:

  S. Gorinsky and N. S. V. Rao, "Dedicated Channels as an Optimal 
Network Support for Effective Transfer of Massive Data", Proceedings of 
High-Speed Networking (HSN 2006), April 2006.

The paper and respective simulation suite are available at:

http://www.arl.wustl.edu/~gorinsky/pdf/HSN_2006_dedicated_fairness.pdf
http://www.arl.wustl.edu/~gorinsky/ViFi/

  In the context of web servers, ViFi was independently proposed under
the name FSP (Fair Sojourn Protocol):

   E. J. Friedman and S. G. Henderson, "Fairness and Efficiency in 
Web Server Protocols", Proceedings of ACM SIGMETRICS 2003, June 2003, 
available through:
 
http://portal.acm.org/ft_gateway.cfm?id=781056&type=pdf&coll=GUIDE&dl=ACM&CFID=8911549&CFTOKEN=92127204
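
As a rough illustration of the delay comparison above, here is a minimal
sketch for a single batch of messages. It assumes that all messages arrive
at t = 0 on a link of unit capacity; under that assumption SRPT and ViFi
pick the same order (smallest message first), so the sketch only shows the
gap to PS, not the starvation difference between SRPT and ViFi. The function
names are hypothetical and are not taken from the ViFi simulation suite.

```python
# Sketch: mean delay under PS versus ViFi for one batch of messages.
# Assumption: all messages arrive at t = 0 on a link of unit capacity.


def ps_finish_times(sizes):
    """Finish times under ideal Processor Sharing (equal split among active messages)."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    now, served, active = 0.0, 0.0, len(sizes)
    for i in order:
        # Every active message has received `served` units so far; message i
        # needs (sizes[i] - served) more, each message being served at rate 1/active.
        now += (sizes[i] - served) * active
        served = sizes[i]
        finish[i] = now
        active -= 1
    return finish


def serial_finish_times(sizes, order):
    """Finish times when messages are sent one at a time in the given order."""
    finish = [0.0] * len(sizes)
    now = 0.0
    for i in order:
        now += sizes[i]
        finish[i] = now
    return finish


def vifi_finish_times(sizes):
    """ViFi as described above: serve messages in the order of their PS finish times."""
    ps = ps_finish_times(sizes)
    order = sorted(range(len(sizes)), key=lambda i: ps[i])
    return serial_finish_times(sizes, order)


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    sizes = [3, 1, 2]
    ps, vifi = ps_finish_times(sizes), vifi_finish_times(sizes)
    print("PS  :", ps, "mean", mean(ps))
    print("ViFi:", vifi, "mean", mean(vifi))
```

In this batch no message finishes later under ViFi than under PS, and the
mean delay is lower, which is the property claimed for ViFi above.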

  Thank you,

  Sergey


From detlef.bosau at web.de  Fri Jan 12 16:09:07 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Sat, 13 Jan 2007 01:09:07 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A67E7D.4010609@isi.edu>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu>
Message-ID: <45A82323.30405@web.de>

Joe Touch wrote:
> PS - this could also happen within a single CWND, e.g., if the network
> path temporarily shifts around the TCP-splitter. It doesn't require an
> entire window wrap to occur.
>
> Joe
>
>   

Two remarks.

First.

The only scenarios where I see a justification / necessity for 
splitting or spoofing are scenarios where the TCP flow must pass the 
split box / spoofing box / PEP anyway. These are scenarios without path 
redundancy or path transparency. Hence, in these scenarios the path 
cannot temporarily shift around the splitter because no alternative path 
exists. If we want redundancy in those scenarios, we have to consider 
hot-standby nodes for splitters which keep flow state and any other data 
which is "hard" and cannot be recovered synchronously with the backed-up 
system.

To avoid misunderstanding: I don't want to impose restrictions for the 
benefit of a splitter. I think in scenarios where an alternative path 
bypassing the splitter exists, a splitter must not be used. In my opinion, 
splitters are to be used with maximum care and only in exceptional cases 
where any known alternative is worse than a splitter.

Second.

To my understanding we can avoid wrap-around problems by keeping the 
receiver window sufficiently small. And of course, a splitter does flow 
control by itself. I'm not convinced that it is necessary to have 4 
GByte of unacknowledged data in flight in all networks. And _if_ we 
need windows of this size we can reconsider the length of our sequence 
numbers.  But at least for terrestrial networks 4 GByte of data in 
transit seems extremely large to me.
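
A back-of-the-envelope check of this point, as a sketch: the rate a path
needs in order to hold a given amount of data in flight is window / RTT.
The RTT values below are illustrative assumptions, not measurements, and
the function name is hypothetical.

```python
# Sketch: link rate needed to keep a full 2^32-byte sequence space in flight.
# The RTT values are illustrative assumptions, not measurements.

SEQ_SPACE_BYTES = 2 ** 32  # size of the 32-bit TCP sequence space


def rate_to_fill_bps(window_bytes, rtt_seconds):
    """Rate in bit/s at which window_bytes are in transit for a given RTT."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    for rtt in (0.02, 0.1, 0.6):  # assumed RTTs, from short to very long paths
        gbps = rate_to_fill_bps(SEQ_SPACE_BYTES, rtt) / 1e9
        print(f"RTT {rtt * 1000:5.0f} ms -> {gbps:8.1f} Gbit/s")
```

With an assumed RTT of 100 ms, for example, 4 GByte in flight corresponds
to a rate of several hundred Gbit/s on a single connection.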

WRT mobile networks: I'm still looking for material about delivery times 
and their distributions. It's absolutely not necessary to have accurate 
quantitative data here. But it would be helpful to know whether we have 
to cope with delay spikes of up to 1 second, up to 10 seconds or up to 
10 minutes and whether these happen once a minute, once a day or once a 
year. When we encounter maximum delay spikes of 1 second not more than 
once a decade, the best idea is to simply ignore these. If a (wireless) 
link offers 1 Gbps throughput and is blocked every other minute, the 
situation might be somewhat different.

Particularly, and perhaps Anil could help me there, I want to get an 
idea of what is already done by network operators and where research is 
necessary. As I said before, I presume that quite a lot of things happen 
which are not publicly documented. First, this can lead to duplicate 
research. Second, it can lead to "strange" problems where protocols and 
applications do not work for unknown reasons, and the real problem is 
that there is some strange middlebox in use which runs one of these neat 
"company confidential" or "non disclosure" algorithms. Obscure 
middleboxes can render our whole work completely worthless because they 
can cause problems no one can solve.

Particularly, I eventually want to understand the problems TCP and other 
protocols encounter on mobile links - and afterwards I can take a 
position on how these can be solved. As I said before, much of the 
literature in this context appears quite obscure to me. It simply makes 
no sense to talk, e.g., about a splitter and its benefits in a mobile 
network when it is still unclear whether a splitter is even necessary.

Detlef
> Agarwal, Anil wrote:
>> Joe Touch wrote -
>>>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a
>>>> receiver issues a CLOSE ACK, we know that all data has reached the
>>>> receiving socket.
>>>
>>> We should know that. But when we have intermediaries spoofing ACKs, all
>>> we know is that the two endpoints agree that they have closed. The data
>>> itself is not known.
>>>
>>> Case in point - if the intermediary ACKs data and continues to buffer
>>> it, and the window wraps, and then the intermediary goes down, the
>>> endpoints think the data reached the buffer correctly but it really
>>> did not.
>>
>> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes
>> of sender data without delivering any to the receive end-point, then goes
>> down, and the end-points continue the connection using the wrapped
>> sequence number, which in this case matches up just right, so that the
>> intervening 2^32 bytes disappear down a black hole, without the sender or
>> receiver being any wiser?
>>
>> Cheers,
>> Anil
>> ------------------
>> Anil Agarwal
>> ViaSat Inc.


From jtw at ISI.EDU  Fri Jan 12 16:15:07 2007
From: jtw at ISI.EDU (John Wroclawski)
Date: Fri, 12 Jan 2007 16:15:07 -0800
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
 Internet?)
In-Reply-To: <Pine.LNX.4.44.0701120954110.8510-100000@dom.arl.wustl.edu>
References: <Pine.LNX.4.44.0701120954110.8510-100000@dom.arl.wustl.edu>
Message-ID: <p06240834c1cdd44e310e@[128.9.168.145]>

At 4:54 PM -0600 1/12/07, Sergey Gorinsky wrote:
>   Vadim,
>
>>  How hard it is to turn the Fair Queueing knob to "on" on the gateways?
>
>   To put my 2 kopecks in... First, since an application can masquerade as
>multiple flows, fairness enforcement with FQ is not effective. To lend
>itself to meaningful enforcement, fairness should be defined not in terms
>of flows or even hosts/processes generating them. Instead, fairness
>should be linked to humans behind the communications but this requires
>a very different network architecture.

Along these lines folks might want to read Bob Briscoe's Internet-Draft 
"Flow Rate Fairness: Dismantling a Religion", Bob Briscoe (BT), 
IETF Internet-Draft <draft-briscoe-tsvarea-fair-00.pdf>, which can be found 
in many formats at 
http://www.cs.ucl.ac.uk/staff/bbriscoe/pubs.html#rateFairDis

john

From avg at kotovnik.com  Fri Jan 12 17:38:58 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Fri, 12 Jan 2007 17:38:58 -0800 (PST)
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
 Internet?)
In-Reply-To: <Pine.LNX.4.44.0701120954110.8510-100000@dom.arl.wustl.edu>
Message-ID: <Pine.LNX.4.44.0701121705270.18433-100000@gato.kotovnik.com>

On Fri, 12 Jan 2007, Sergey Gorinsky wrote:

> 
>   Vadim,
> 
> > How hard it is to turn the Fair Queueing knob to "on" on the gateways?
> 
>   To put my 2 kopecks in... First, since an application can masquerade as 
> multiple flows, fairness enforcement with FQ is not effective.

Doing FQ on src/dst address pairs (not on addresses+ports) will be a lot 
better than the per-flow fairness of TCP in any case.
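
To make the difference concrete, here is a small sketch of how the choice of
queue key changes the outcome. It assumes an idealized equal-weight FQ in
which every queue is permanently backlogged, so each queue gets 1/N of the
link; the Packet type and the function names are hypothetical and only
serve the illustration.

```python
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int


def key_addr_ports(p):
    """Per-flow key: addresses plus ports."""
    return (p.src, p.dst, p.sport, p.dport)


def key_addr_pair(p):
    """Per-endpoint-pair key: addresses only."""
    return (p.src, p.dst)


def share_per_source(packets, key):
    """Link share per source, assuming equal-weight FQ with all queues backlogged."""
    queues = {key(p): p.src for p in packets}  # one FQ queue per distinct key
    per_src = Counter(queues.values())         # number of queues each source owns
    return {src: n / len(queues) for src, n in per_src.items()}


if __name__ == "__main__":
    # Host A opens four parallel connections to S, host B opens one.
    pkts = [Packet("A", "S", 5000 + i, 80) for i in range(4)]
    pkts.append(Packet("B", "S", 6000, 80))
    print(share_per_source(pkts, key_addr_ports))  # A gets 4 of 5 queues
    print(share_per_source(pkts, key_addr_pair))   # A and B get one queue each
```

Keying on address pairs removes the advantage of opening parallel sessions
from one host; it does nothing, of course, against a sender that controls
many addresses.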

> To lend 
> itself to meaningful enforcement, fairness should be defined not in terms 
> of flows or even hosts/processes generating them. Instead, fairness 
> should be linked to humans behind the communications but this requires 
> a very different network architecture.  

That is pretty much what I'm saying. Fairness is an economic, not a
technical, concept.  Basically, I'd venture to guess that the share of 
network capacity allocated should be roughly proportional to the payments.  
That means the routing system should be augmented with some way to announce
weights for the fairness enforcement.
 
>  Second, packet-by-packet FQ and end-to-end TCP strive to approximate 
> instantaneous PS (Processor Sharing) which is not a good fit for any 
> natural application. Multimedia streams need a minimal rate, not a fair 
> share.

This is a common misconception. Multimedia streams are either
pre-recorded, lag-insensitive content, in which case they are, basically,
file transfers (that accounts for 99% of the "streams", incidentally); or
real-time content, which is quite elastic in its bandwidth requirements -
especially video (audio bandwidth is not an issue nowadays, anyway). You
can reduce frame rate, reduce color & luminosity bit depth, reduce
horizontal & vertical resolution, or just increase compression - for a TV
quality stream that yields two orders of magnitude of acceptable degradation
bandwidth-wise.  This is more than you can typically get from the TCP
congestion control, and more than the common bandwidth oversubscription 
ratio.

What can be done for real-time streams is doing deadline scheduling on the 
output queues - and tossing away packets which are past deadline.  That'd 
require accurate timing (in ms resolution) on gateways, but it's quite 
doable.
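
A minimal sketch of what such an output queue could look like, assuming
each packet carries an absolute deadline in milliseconds and the gateway
has a millisecond clock. The class and method names are hypothetical; this
is an illustration of the idea, not a description of any existing gateway.

```python
import heapq


class DeadlineQueue:
    """Output queue that serves the earliest deadline first and drops late packets."""

    def __init__(self):
        self._heap = []    # entries: (deadline_ms, arrival_seq, packet)
        self._seq = 0      # tie-breaker so equal deadlines stay in FIFO order
        self.dropped = 0   # count of packets discarded for missing their deadline

    def enqueue(self, packet, deadline_ms):
        heapq.heappush(self._heap, (deadline_ms, self._seq, packet))
        self._seq += 1

    def dequeue(self, now_ms):
        """Return the next packet still within its deadline, or None if none is left."""
        while self._heap:
            deadline_ms, _, packet = heapq.heappop(self._heap)
            if deadline_ms >= now_ms:
                return packet
            self.dropped += 1  # past its deadline: discard instead of sending
        return None


if __name__ == "__main__":
    q = DeadlineQueue()
    q.enqueue("a", 50)
    q.enqueue("b", 10)
    q.enqueue("c", 30)
    print(q.dequeue(now_ms=20), q.dropped)  # "b" is already late and is dropped
```

The heap keeps both enqueue and dequeue at O(log n); whether that is cheap
enough at line rate is a separate question.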

Instead of the bandwidth reservation nonsense (which screws up dynamic
routing) we'd be much better served by changing the resolution of the
TTL field from seconds to milliseconds. Or simply by
adding a bit which changes the meaning of the TTL field from hops/seconds to
milliseconds (255 ms one way should be enough, I guess, at least for this
planet :) - it will also be backwards compatible with the existing
gateways.

> Elastic applications are not well served by PS either because 
> average message delay is much larger than under SRPT (Shortest Remaining 
> Processing Time), in agreement with Internet experiences where deviations 
> from short-term fair sharing improves overall efficiency.

Yep. But you still need to enforce fairness between *users*.  So it must
be some combination of FQ for end-points and deadline packet order and
drop scheduling.
 
Thanks for the references!

I'm not insisting on FQ being the best way to do things - it's just that 
it is already implemented and addresses most obvious problems: short 
sessions, parallel session cheats, point-origin flooding, etc - including 
overly-aggressive or poorly tested TCP stacks, which was the point of the 
original discussion.
 
--vadim


From avg at kotovnik.com  Fri Jan 12 17:57:46 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Fri, 12 Jan 2007 17:57:46 -0800 (PST)
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
 Internet?)
In-Reply-To: <p06240834c1cdd44e310e@[128.9.168.145]>
Message-ID: <Pine.LNX.4.44.0701121747490.18433-100000@gato.kotovnik.com>

On Fri, 12 Jan 2007, John Wroclawski wrote:

> Along these lines folks might want to read Bob Briscoe's internet 
> draft "Flow Rate Fairness: Dismantling a Religion", Bob Briscoe (BT), 
> IETF Internet-Draft <draft-briscoe-tsvarea-fair-00.pdf>, can be found 
> in many formats at 
> http://www.cs.ucl.ac.uk/staff/bbriscoe/pubs.html#rateFairDis

Pretty much my point all along.

I have one issue with the paper, though -- they advocate fairness based on
"the costs, not benefits".  That shows that they didn't really think
about economics. Real-life enterprises (such as ISPs) maximise profit,
so they are interested in giving the most profitable customers a bigger share
and penalizing less profitable customers - thus creating an incentive to pay 
more for better performance.

The cost-based allocation does not work economically, as it encourages 
incurring higher costs in order to obtain higher benefits.

--vadim


From Jon.Crowcroft at cl.cam.ac.uk  Sat Jan 13 02:43:27 2007
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Sat, 13 Jan 2007 10:43:27 +0000
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
	Internet?)
In-Reply-To: Message from Vadim Antonov <avg@kotovnik.com> of "Fri,
	12 Jan 2007 17:57:46 PST."
	<Pine.LNX.4.44.0701121747490.18433-100000@gato.kotovnik.com> 
Message-ID: <E1H5gMG-0007Co-00@mta1.cl.cam.ac.uk>

yes indeed...

the use of congestion avoidance is to avoid congestion at shared points in the
net, which _should_ be an unusual occurrence (by definition)

but: most nets should be designed for the expected traffic load
with some design headroom for variance. when i get x million DSL users, I _know_
what the rate is that they are limited to at their access line. I can build a
core for that, and i can peer or connect to upstream tiers with capacity in line
with that. i can also engineer for p2p traffic.

the congestion experienced often by academics and researchers and people working
for techy geeky companies is because they don't have a
designed network and don't pay proper prices for impedance matched networks - we
have a 10Gbps access line to the internet in cambridge - many departments have
gigE attachments to the net - TCP encourages (doesn't avoid) congestion:) in this
scenario. but it isn't the scenario experienced by the great unwashed public
Internet users.

bob briscoe's note is good reading, but it talks to the situation where people
have way faster access lines than their mean access use. this isn't the dominant
situation in the commercial net where DSL (and cable ) broadband access can be
co-designed/dimensioned with the core net provisioning.

in this case, customers get what they pay for (If i want 20Mbps DSL, I can get it
easily, but i pay more than i do for 8Mbps or for 384kbps), all in line with the 
ISP maximising profit subject to competition with Other ISPs controlling them
meeting the users' utility/satisfaction - this is all discussed in Kelly's work
quite nicely

actually, if TCP used ECN right (again, other bob briscoe work) it would just be
achieving the same as this argument (kelly et al) but just on a shorter timescale
if ECN pricing was done right (see re-feedback etc etc) but if you strive for
"fairness" in the current world, you are striving for an illusion

In missive <Pine.LNX.4.44.0701121747490.18433-100000 at gato.kotovnik.com>, Vadim Antonov typed:

 >>On Fri, 12 Jan 2007, John Wroclawski wrote:
 >>
 >>> Along these lines folks might want to read Bob Briscoe's internet 
 >>> draft "Flow Rate Fairness: Dismantling a Religion", Bob Briscoe (BT), 
 >>> IETF Internet-Draft <draft-briscoe-tsvarea-fair-00.pdf>, can be found 
 >>> in many formats at 
 >>> http://www.cs.ucl.ac.uk/staff/bbriscoe/pubs.html#rateFairDis
 >>
 >>Pretty much my point all along.
 >>
 >>I have one issue with the paper, though -- they advocate fairness based on
 >>"the costs, not benefits".  That shows that they didn't really think
 >>about economics. Real-life enterprises (such as ISPs) maximise profit,
 >>so they are interested in giving the most profitable customers a bigger share
 >>and penalizing less profitable customers - thus creating an incentive to pay 
 >>more for better performance.
 >>
 >>The cost-based allocation does not work economically, as it encourages 
 >>incurring higher costs in order to obtain higher benefits.
 >>
 >>--vadim
 >>

 cheers

   jon


From avg at kotovnik.com  Sat Jan 13 16:47:04 2007
From: avg at kotovnik.com (Vadim Antonov)
Date: Sat, 13 Jan 2007 16:47:04 -0800 (PST)
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
 Internet?)
In-Reply-To: <B833AEA9-3F5E-408A-999B-2FACD83458E4@mac.com>
Message-ID: <Pine.LNX.4.44.0701131630110.22375-100000@gato.kotovnik.com>


On Sat, 13 Jan 2007, Dado Colussi wrote:

> I'm not sure I understand your point. Bob's paper describes  
> mechanisms to create an economic regulatory system where individuals  
> are required to pay extra for their unsocial behavior of causing  
> congestion. It is only a part of the economic landscape ISPs and  
> other entities operate and it is akin to the cost of causing  
> greenhouse gas emissions in real world. I don't see why ISPs couldn't  
> adjust congestion prices per customer in order to drive their system  
> to a resource allocation that would maximize profit in times of  
> congestion?
> 
> Dado


Dado - ISPs are not interested in reducing the amount of traffic; quite 
the opposite. It is their product, and like any producer they are interested 
in increasing volume - if you remember Econ 101, in the long term the 
profitability of all kinds of businesses tends to converge to the same 
norm. (Business segments with higher-than-average ROI attract more 
investment - and competition, thus reducing profitability; 
underperforming segments lose capital and consequently have less 
competitive pressure, thus allowing an increase in profitability).

In the established markets, where the initial period of rapid growth (on 
the S-curve) is over, the only sustainable way to make more money and 
increase value of business shares is to increase volume.

So it makes no sense for ISPs whatsoever to penalize users for causing
congestion (thus reducing the demand). Instead, they want to encourage
users to pay more for bigger share of the network resources - the
congestion is their friend, if they can differentiate service (who would
pay for premium service when regular service is quite good?)

Also, a congested network is a network operating at full capacity - 
meaning that there is no overinvestment.  If a provider has an underloaded 
network, it basically means that its business people made a mistake and 
overinvested (driving ROI - and share prices - lower).

--vadim


From Anil.Agarwal at viasat.com  Sat Jan 13 17:17:09 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Sat, 13 Jan 2007 20:17:09 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E39.4000308@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com>

Joe Touch wrote -
>> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes
>> of sender data without delivering any to the receive end-point, then goes
>> down, and the end-points continue the connection using the wrapped
>> sequence number, which in this case matches up just right, so that the
>> intervening 2^32 bytes disappear down a black hole, without the sender or
>> receiver being any wiser?
>>
> Yes. The system can wrap without things matching up _exactly_, depending
> on how big the CWND is, though, so this isn't as absurdly specific as it
> appears at first glance.

I think this scenario will occur if 
  the TCP-splitters buffer x bytes of undelivered data,
  the sender cwnd is y,
  x < 2^32 and x+y >= 2^32,
  the splitters go down and 
  packets flow between the sender and receiver over an alternate path.
 
In this case, the receiver rcv.nxt value is within the 
sender [snd.nxt , snd.nxt + cwnd) range and hence 
the receiver will acknowledge sequence number rcv.nxt and 
accept data beyond it; 
the sender will gladly accept the acknowledgement and 
continue sending data.
 
Now, we know 
  y < 2^30,
a constraint required and imposed by TCP (RFC 1323).
 
We can claim that a (good) TCP splitter ensures that
  x < 2^31
 
Detlef - this is not as easy as it might first appear, especially
since data can get buffered at the sending or 
receiving TCP-splitter in a two-splitter case,
but it can be (is) done.
 
Hence, x + y < 2^32 and the above scenario will not occur.
 
The above also requires that TCP-splitters use
the same ISS (Initial Send Sequence number) with the receiver
as the one used by the sender. 
A good TCP-splitter should (does).
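
The argument above can be checked with a few lines of modular arithmetic.
This is only a sketch: it assumes the splitter has ACKed everything the
sender has sent, so that the receiver's rcv.nxt is exactly x bytes behind
the sender's snd.nxt, and it tests membership in the half-open range
[snd.nxt, snd.nxt + cwnd) modulo 2^32. The function names are hypothetical.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap modulo 2^32


def in_window(seq, start, length):
    """True if seq lies in [start, start + length) modulo 2^32."""
    return (seq - start) % MOD < length


def silent_gap_possible(x, y, snd_nxt=0):
    """x: bytes ACKed by the splitter but not delivered; y: sender cwnd.

    Assumes the receiver still expects rcv_nxt, x bytes behind snd_nxt.
    """
    rcv_nxt = (snd_nxt - x) % MOD
    return in_window(rcv_nxt, snd_nxt, y)


if __name__ == "__main__":
    # x + y exceeds 2^32: rcv.nxt wraps into the sender's window.
    print(silent_gap_possible(2 ** 32 - 1000, 2000))
    # Largest values allowed by x < 2^31 and y < 2^30: no overlap.
    print(silent_gap_possible(2 ** 31 - 1, 2 ** 30 - 1))
```

The result does not depend on the actual value of snd.nxt, only on x and y,
which matches the argument in the text.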
 
Anil
-----------
Anil Agarwal
ViaSat Inc.
 
 

From Anil.Agarwal at viasat.com  Sat Jan 13 17:33:35 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Sat, 13 Jan 2007 20:33:35 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E39.4000308@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3594@VGAEXCH01.hq.corp.viasat.com>

This is a re-send of the previous email with some
additional info towards the end.
 
Joe Touch wrote -
>> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes
>> of sender data without delivering any to the receive end-point, then goes
>> down, and the end-points continue the connection using the wrapped
>> sequence number, which in this case matches up just right, so that the
>> intervening 2^32 bytes disappear down a black hole, without the sender or
>> receiver being any wiser?
>>
> Yes. The system can wrap without things matching up _exactly_, depending
> on how big the CWND is, though, so this isn't as absurdly specific as it
> appears at first glance.

I think this scenario will occur if 
  the TCP-splitters buffer x bytes of undelivered data,
  the sender cwnd is y,
  x < 2^32 and x+y >= 2^32,
  the splitters go down and 
  packets flow between the sender and receiver over an alternate path.
 
In this case, the receiver rcv.nxt value is within the 
sender [snd.nxt , snd.nxt + cwnd) range and hence 
the receiver will acknowledge sequence number rcv.nxt and 
accept data beyond it; 
the sender will gladly accept the acknowledgement and 
continue sending data.
 
Now, we know 
  y < 2^30,
a constraint required and imposed by TCP (RFC 1323).
 
We can claim that a (good) TCP splitter ensures that
  x < 2^31
 
Detlef - this is not as easy as it might first appear, especially
since data can get buffered at the sending or 
receiving TCP-splitter in a two-splitter case,
but it can be (is) done.
 
Hence, x + y < 2^32 and the above scenario will not occur.
 
The above also requires that TCP-splitters use
the same ISS (Initial Send Sequence number) with the receiver
as the one used by the sender. 
A good TCP-splitter should (does).
 
Also, with a good TCP-splitter, x does not reach 2^31 -1 
simply because of a slow receiver or network and 
a fast sender.
x is generally a function of the bandwidth*RTT value and 
the number of TCP connections using the bottleneck
and a good TCP-splitter flow-controls the sender to enforce x.
 
As an aside, I would love to see the day when CWND < 2^30
becomes too limiting and we need to raise the limit :) 
 
Anil
-----------
Anil Agarwal
ViaSat Inc.
 
 

From Anil.Agarwal at viasat.com  Sat Jan 13 17:45:38 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Sat, 13 Jan 2007 20:45:38 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3595@VGAEXCH01.hq.corp.viasat.com>

 
Joe Touch wrote:
> PS - this could also happen within a single CWND, e.g., if the network
> path temporarily shifts around the TCP-splitter. It doesn't require an
> entire window wrap to occur.

Joe - I am not able to think of a scenario similar to what you
describe above where a network with TCP-splitters causes
undetected loss of data or delivery of incorrect data.
 
I would appreciate it if you could describe what you are thinking in 
some more detail.
 
Thanks,
Anil
----------
Anil Agarwal
ViaSat Inc.

From detlef.bosau at web.de  Sun Jan 14 04:00:29 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Sun, 14 Jan 2007 13:00:29 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E39.4000308@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <45AA1B5D.2040805@web.de>

Agarwal, Anil wrote:
>
>
> I think, this scenario will occur if
>   the TCP-splitters buffer x bytes of undelivered data,
>   the sender cwnd is y,
>   x < 2^32 and x+y >= 2^32,
>   the splitters go down and
>   packets flow between the sender and receiver over an alternate path.

If a splitter goes down, all data which is acknowledged by the splitter 
but not yet delivered to the receiver is unrecoverably lost.
In addition, sequence numbers are flow specific, so when a splitter goes 
down and the flow takes an alternate path the ack-numbers received by 
the sender are completely undefined as they stem from a different TCP flow.

So, Joe is right here when he says that end to end semantics on the 
connection level are destroyed and we cannot recover from a failure of 
the splitter.

However, I don't know whether there are hot-standby architectures 
available or at least possible in some cases where a backup can replace 
a failed split box. But such an architecture would at least require a 
one-to-one copy of any flow-specific state data to be available at the 
split box and each of its backup systems as well.
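
As a rough illustration of the wrap condition quoted at the top of this
message (x bytes buffered in the splitters, sender cwnd y), the following
Python sketch checks when x + y reaches the 32-bit sequence space. The
function names are made up for this example; it models only the arithmetic,
not any real splitter.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap modulo 2^32


def wrap_possible(x: int, y: int) -> bool:
    """Quoted condition: x < 2^32 and x + y >= 2^32 (hypothetical helper)."""
    return x < SEQ_SPACE and x + y >= SEQ_SPACE


def seq_of(iss: int, offset: int) -> int:
    """Sequence number of the byte at stream offset `offset`, given initial number `iss`."""
    return (iss + offset) % SEQ_SPACE
```

Two bytes that lie exactly 2^32 apart in the stream get the same value from
seq_of(), which is why the receiver cannot tell them apart once x + y spans
the whole sequence space.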

>  
> Detlef - this is not as easy as it might first appear, especially
> since data can get buffered at the sending or
> receiving TCP-splitter in a two-splitter case,
> but it can be (is) done.
Are there papers available on this?

>  
> Hence, x + y < 2^32 and the above scenario will not occur.
>  
> The above also requires that TCP-splitters use
> the same ISS (Initial Sequence Number) with the receiver
> as the one used by the sender.
> A good TCP-splitter should (does).

This is one issue I referred to above.

Particularly on this one: I admittedly have no idea how sequence numbers 
are frozen. It would be sufficient to freeze sequence numbers with respect 
to a certain address quadruple - however this is somewhat arduous. So I can 
imagine (perhaps someone can tell me) that a TCP sender simply freezes 
every used sequence number for some period of time and does not consider 
the address quadruple. In that case, I think exact spoofing of
sequence numbers can be difficult?

Detlef


From Jon.Crowcroft at cl.cam.ac.uk  Sun Jan 14 06:20:00 2007
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Sun, 14 Jan 2007 14:20:00 +0000
Subject: [e2e] any source unicast
Message-ID: <E1H66Dw-0003UH-00@mta1.cl.cam.ac.uk>

On Any Source Unicast...

So i just caved in and bought a digital flatscreen TV - 

it is quite nice in terms of multimedia, 
BUT i now see why the pressure is on to drive the world to digital television - 

nothing to do with display technology  
(any how, purists will tell you that CRTs are still,
just about, the crispest pound for pound).

What Digital TV does is lock you in to a (hammer style cheap)
telecom circuit nightmare reality of the 1970s. 

With an analogue transmission&receiver, 
you basically live in an any-to-any world, 
and all the devices switch everything through ether-like
(like multicast too:).

in the digital world, you can plug in what is, au fond, 
an analogue signal into the first hop, 
but once it's tuned and decoded that flow, 
it won't deliver you any other flow off the "air". 
This means that to watch channel X while recording channel Y 
(a perfectly legal thing to do in the UK), 
you have to have 2 digital receivers
(once the legacy analogue channels go away) - 

and worse, in our house there are 6 people, 
3 or 4 TVs, 3 or 4 VCRs, 4 or 5 DVD capable devices 
(if i include games consoles)  - 
interconnecting all of these locks up virtual circuit resources, 
and ties you down to a real limited set of scenarios 
without some massive switch -

they didn't even learn the ATM lesson of having 
virtual paths to put virtual channels in. 

But worse is to come: Once you connect up some HD devices, 
you find HDMI supports a thing (not completely standardised yet) 
called HDCP (high definition copy protection) - 
this means that you cannot put in anything in the path 
(e.g. a T-connector) 
to copy something you are viewing;
_everything_ has to be point-to-point authorised ...

and coz it aint all standard yet, 
some things don't interwork - 

[interestingly, amusingly, some display devices let you disable HDCP - 
a bit like the DVD players that turn off region control:) - 
basically, in a free market, someone is gonna work around this godawful stupidity, 
then everyone will eventually follow.]

Why is this relevant end-to-end? 
well this ought to be obvious, but if not, 
let me spell it out ...

A lot of Next Gen Internet ideas are being cast about right now. 
Some of them have the flavour of circuits. This is a foul and bitter flavour, 
and should be resisted at all costs, 
since it reduces the net value of an interconnect, 
increases its costs, and massively reduces its flexibility 
(and indeed, reduces everything to the lowest common denominator technology, 
and locks you in there til kingdom come).

In wireless network research, 
a lot of folks seem (thankfully) to be going the _opposite_ direction, 
with a more open, many-to-many, multiradio, mesh/community/adhoc/dtn 
thousand flowers (flow pun intended:) flourishing  - indeed
network coding (just for 1 example) means you _have_ to allow in-net copy!

but the whole "triple play" by telecos to pull 
TV, Telephone and Internet into one box, 
seems to be more and more predicated on a fundamental 
misdirection of the world.

this is not about QoS - this is about lockin.

happy new year

jon
p.s. for the less mad: see ->
--------------------------------------------------------------------
               The Second International Workshop on
    Mobility in the Evolving Internet Architecture (MobiArch 2007)
                   Kyoto, Japan, August 27, 2007
        (to be held with ACM SIGCOMM 2007, August 27-31, 2007)
       http://user.informatik.uni-goettingen.de/~mobiarch/2007
--------------------------------------------------------------------

With the recent development of technologies in wireless access and
mobile devices, user, terminal, and network mobility has become an
indispensable component of today's Internet vision, and it is likely to
continue in the near future, while affecting the whole architectural
design of the future Internet. Yet, issues like efficient mobility
management and optimization, locator-identifier split, multihoming,
security, and related operational/deployment concerns are still in their
early stages of development. Moreover, the Internet architecture, its
end-to-end principles, and business models will require rethinking due
to the massive penetration of mobility into the Internet.

MobiArch'07 welcomes submissions, from both researchers and
practitioners, in exploration of recent advances in architectures,
protocols, and experiences with emerging technologies on wireless and
mobility over the Internet, with an emphasis on wireless infrastructures
and mobility patterns for mobility support, new mobility protocols,
service discovery, routing and location management, mobile network
performance evaluation and modeling, multi-homing, security,
architectural impacts and deployment considerations.

Topics of Interest:
==================
Topics of MobiArch'07 cover all aspects of architectural issues and
system support for wireless and mobility in the Internet, including but
not limited to:

- Impacts of new wireless technologies/services and mobility patterns on
the Internet architecture
- Architectures and protocols for mobility support in the Internet,
ranging from approaches in link, network, transport to
session/application layers and cross-layer design
- Location management, positioning and data management systems for
wireless and mobility
- Routing and addressing, including locator/identifier split issues and
their impacts to the Internet architecture
- IP multihoming including flow distribution and load sharing for
wireless and mobility
- Performance evaluation, experimentation and modeling of mobility in
the Internet
- Accounting, access control, security and privacy issues and impacts to
Internet architecture
- Economic, scalability and deployment issues of mobility infrastructure
design
- Mechanisms and issues with connecting developing regions into the Internet

Following the success of MobiArch'06, the MobiArch'07 workshop will be a
single-track one-day workshop. Early stages, position papers, systems
and measurement papers will be particularly welcome. The proceedings
will be published by the ACM and ACM digital library.

Submissions:
===========
Submissions must be made to MobiArch'07 EDAS entry:
http://edas.info/5238, following the guidelines in MobiArch'07 webpage:
http://user.informatik.uni-goettingen.de/~mobiarch/2007

Important Dates:
===============
Paper registration:  March 20, 2007
Submission Deadline: March 27, 2007
Acceptance Notification: May 15, 2007
Camera-ready version due: June 12, 2007
Workshop: August 27, 2007
SIGCOMM Main Conference: August 27-31, 2007

PROGRAM CO-CHAIRS
=================
   Xiaoming Fu, University of Goettingen (Germany)
   Katherine Guo, Bell Labs (USA)
   Sue Moon, KAIST (Korea)
   Ryuji Wakikawa, Keio University (Japan)

PUBLICITY CHAIR
===============
Jon Crowcroft, U. Cambridge (UK)

Please consult the Program Co-Chairs
(mobiarch at informatik.uni-goettingen.de) if you are uncertain whether
your paper falls within the scope of the workshop.


From randy at psg.com  Sun Jan 14 10:33:31 2007
From: randy at psg.com (Randy Bush)
Date: Sun, 14 Jan 2007 08:33:31 -1000
Subject: [e2e] any source unicast
References: <E1H66Dw-0003UH-00@mta1.cl.cam.ac.uk>
Message-ID: <17834.30587.611439.898650@roam.psg.com>

> [interestingly, amusingly, some display devices let you disable
> HDCP - a bit like the DVD players that turn off region control:) -
> basically, in a free market, someone is gonna work around this
> godawful stupidity, then everyone will eventually follow.]

the lawyers will follow.  and the us congress will follow the
industry lobbyists.  and the other govts will follow the us
congress.

> Why is this relevant end-to-end?  well this ought to be obvious,
> but if not, let me spell it out ...

bingo!

> but the whole "triple play" by telecos to pull TV, Telephone and
> Internet into one box, seems to be more and more predicated on a
> fundamental misdirection of the world.
> 
> this is not about QoS - this is about lockin.

they do not see this as misdirection, quite the opposite.  they
see it as cleaning up the layer 8/9 disaster created by the
free-for-all open connectivity of the pre-circuit internet.

randy


From Jon.Crowcroft at cl.cam.ac.uk  Sun Jan 14 23:44:35 2007
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Mon, 15 Jan 2007 07:44:35 +0000
Subject: [e2e] any source unicast
In-Reply-To: Message from Randy Bush <randy@psg.com> of "Sun,
	14 Jan 2007 08:33:31 -1000." <17834.30587.611439.898650@roam.psg.com> 
Message-ID: <E1H6MWK-0007N6-00@mta1.cl.cam.ac.uk>

In missive <17834.30587.611439.898650 at roam.psg.com>, Randy Bush typed:

 >>> this is not about QoS - this is about lockin.
 
 >>they do not see this as misdirection, quite the opposite.  they
 >>see it as cleaning up the layer 8/9 disaster created by the
 >>free-for-all open connectivity of the pre-circuit internet.

yes, so that's what you get when you let lawyers re-design your net:

1/ a system that is exponentially less efficient than the copy net we now have.

2/ a system of intellectual property protection that maximises revenue for a
small (oligopoly/cartel) number of players and actually reduces the overall
profitability of the business of content

3/ a system that invents concepts of ownership that were made up recently and
didn't exist for millennia when copying was actually valuable

4/ a system that didn't take into account that:
i) the internet is efficient
ii) lack of copy protection in the net is a no-op - 
just as with any aspect of security, 
copy protection, if and only if you want it, is an end-to-end matter -
preventing copying in a copy technology is an oxymoron

iii) there is actually virtually no evidence that the fact of a 
near zero-cost copy system is actually harming content profits - 
most of the papers that look at scientific evidence on music, film, games 
and other content profits
and internet or other based piracy (and I've read a lot) 
are at most, equivocal, and many show the obvious, 
that free copies are free advertising, and boost legit sales, 
provided the low copy cost is _passed on to the consumer_ - 
iv) the vandalism of this circuit technology is that it prevents the passing of
this efficiency on to the commercial consumer and prevents the non commercial
consumer from free copies of things which were created for public good.


the CTO of Time Warner gave a talk here a couple of summers back where he was
challenged about copy technologies (audio cassette, vhs, etc) - 
he said that people in the film business  were NEVER against lower copy cost
as they were in the content business, not in the pressing plastic and postal
business - he looked forward to using video bittorrent so that 
film directors and publishers could take blockbuster AND netflix out of the
equation. 

This technology flies in the face of that position, and I bet people like him
will be very very annoyed when it sinks in where the digital tv 
lawyer-led lunacy is leading


 cheers

   jon

p.s. end-to-end copy protection rather than hop-by-hop - think about it:
you know it makes cents.

From detlef.bosau at web.de  Mon Jan 15 07:25:30 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 15 Jan 2007 16:25:30 +0100
Subject: [e2e] How often does congestion control react upon loss?
Message-ID: <45AB9CEA.8020109@web.de>

I apologize if this is a beginner's question.

During simulations with large buffers I often have several "drop bursts" 
when a buffer overruns because the RTT is quite large (due to the large 
buffers) and therefore it takes some time for a receiver to detect a drop.

Now I wonder (and I'm currently reading RFC 3517 to get this question 
answered but perhaps someone can help me here to understand this) 
whether it is possible to do several congestion actions in a round. To 
my understanding it is by all means possible (and makes sense) to get 
several sequence numbers in the scoreboard marked "lost" within a round. 
Of course, when a sequence number is marked lost, the sender has to 
retransmit the appropriate segment.

However, is the congestion window halved each time a sequence number is 
recognized as "lost"? Or is there a limit, e.g. the congestion window is 
halved only once a round?

Thanks.

Detlef


From francesco at net.infocom.uniroma1.it  Tue Jan 16 06:50:58 2007
From: francesco at net.infocom.uniroma1.it (Francesco Vacirca)
Date: Tue, 16 Jan 2007 15:50:58 +0100
Subject: [e2e] How often does congestion control react upon loss?
In-Reply-To: <45AB9CEA.8020109@web.de>
References: <45AB9CEA.8020109@web.de>
Message-ID: <45ACE652.5090903@net.infocom.uniroma1.it>

Detlef,

When NewReno TCP and SACK TCP are in the Fast Recovery phase (after the 
Fast Retransmit) the congestion window is not halved if a further packet 
loss is detected.

The procedure used by NewReno is as follows:
After the Fast Retransmit retransmission, TCP enters the 
Fast Recovery phase.
This phase lasts until the ACK for the highest packet transmitted at the 
beginning of this phase (stored in the "recover" variable) is received.
During this phase, for each additional duplicate ACK, the congestion
window is incremented by one packet, to reflect the departure from the 
network of an additional segment. A new segment is transmitted if it is 
allowed by the congestion window and the receiver's advertised window. 
When an ACK is received that acknowledges new packets, but with a sequence 
number lower than "recover" (partial ACK), the first unacknowledged 
packet is retransmitted and the congestion window is deflated by the number 
of packets acknowledged by the partial ACK. The window deflation attempts 
to keep the congestion window at the level of the number of outstanding 
packets when the "Fast Recovery" phase ends. When an ACK acknowledging a
packet greater than or equal to "recover" is received, the congestion window 
is set to the value of ssthresh and TCP exits from the "Fast Recovery" 
phase.
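
The procedure above can be sketched roughly in Python as follows. This is
only a toy model under simplifying assumptions: the window is counted in
whole packets, normal window growth outside Fast Recovery is left out, and
the class and attribute names (NewRenoSender, halvings, retransmitted) are
invented for the example. Real stacks differ in the details.

```python
class NewRenoSender:
    """Toy NewReno bookkeeping, counted in whole packets rather than bytes."""

    def __init__(self, cwnd: int, highest_sent: int):
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.highest_sent = highest_sent  # highest packet number sent so far
        self.una = 0                      # first unacknowledged packet
        self.dupacks = 0
        self.recover = None               # not None only during Fast Recovery
        self.halvings = 0                 # how often the window was halved
        self.retransmitted = []           # packet numbers that were resent

    def in_fast_recovery(self) -> bool:
        return self.recover is not None

    def on_dupack(self) -> None:
        if self.in_fast_recovery():
            self.cwnd += 1                # one more packet has left the network
            return
        self.dupacks += 1
        if self.dupacks == 3:             # Fast Retransmit
            self.ssthresh = max(self.cwnd // 2, 2)
            self.cwnd = self.ssthresh + 3  # inflate by the three duplicate ACKs
            self.recover = self.highest_sent
            self.halvings += 1
            self.retransmitted.append(self.una)

    def on_ack(self, acked: int) -> None:
        """Cumulative ACK covering every packet up to and including `acked`."""
        newly_acked = acked + 1 - self.una
        if newly_acked <= 0:
            return                        # old ACK, nothing new
        self.una = acked + 1
        self.dupacks = 0
        if not self.in_fast_recovery():
            return                        # normal window growth is omitted here
        if acked >= self.recover:         # full ACK: leave Fast Recovery
            self.cwnd = self.ssthresh
            self.recover = None
        else:                             # partial ACK: further loss, no new halving
            self.retransmitted.append(self.una)
            self.cwnd = max(self.cwnd - newly_acked, 1)
```

In this model a second loss inside the same window only triggers a
retransmission on the partial ACK; the halvings counter stays at one until
the sender has left Fast Recovery.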

Francesco


Detlef Bosau wrote:
> I apologize if this is a beginner's question.
> 
> During simulations with large buffers I often have several "drop bursts" 
> when a buffer overruns because the RTT is quite large (due to the large 
> buffers) and therefore it takes some time for a receiver to detect a drop.
> 
> Now I wonder (and I'm currently reading RFC 3517 to get this question 
> answered but perhaps someone can help me here to understand this) 
> whether it is possible to do several congestion actions in a round. To 
> my understanding it is by all means possible (and makes sense) to get 
> several sequence numbers in the scoreboard marked "lost" within a round. 
> Of course, when a sequence number is marked lost, the sender has to 
> retransmit the appropriate segment.
> 
> However, is the congestion window halved each time a sequence number is 
> recognized as "lost"? Or is there a limit, e.g. the congestion window is 
> halved only once a round?


> Thanks.
> 
> Detlef
> 
> 

From touch at ISI.EDU  Tue Jan 16 09:05:40 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 16 Jan 2007 09:05:40 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45A82323.30405@web.de>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de>
Message-ID: <45AD05E4.5040200@isi.edu>


Detlef Bosau wrote:
> Joe Touch wrote:
>> PS - this could also happen within a single CWND, e.g., if the network
>> path temporarily shifts around the TCP-splitter. It doesn't require an
>> entire window wrap to occur.
>>
>> Joe
>>
>>   
> 
> Two remarks.
> 
> First.
> 
> The only scenarios where I see a justification / necessity for doing
> splitting or spoofing are scenarios where the TCP flow must pass the
> split box / spoofing box / PEP anyway. These are scenarios without path
> redundancy or path transparency.

Why are you so confident about the path, when you cannot control whether
there is a PEP/spoofing box in it?

...
> To be not misunderstood: I don't want to make restrictions for the
> benefit of a splitter. I think in scenarios where an alternative path to
> a splitter exist, a splitter must not be used.

Either the use of splitters is under your control or it is not.

If it is, then there are a number of reasons to remove them, alternate
paths are just one.

If it is not, then you cannot make assumptions about the path.

> In my opinion splitters
> are to be used with maximum care and only in exceptional cases where any
> known alternative is worse than a splitter.

It would be interesting if you could explain a sample case. IMO,
splitters just lie - they lie about being an endpoint they are not.
Either you are lying to yourself (you own the endpoint you're lying to)
or you're lying to others. The first is silly - just install a true
application proxy - and the second is YOU making a decision for ME about
what's more important. If I don't want to talk to a true proxy, you have
no business tricking me into thinking I'm not.

> Second.
> 
> To my understanding we can avoid wrap around problems by having the
> receiver window sufficiently small....

As I said, there are other cases where the splitter comes/goes, either
because it is unreliable or due to multipath, that can cause silent data
errors too.

You can't know whether that will happen; all you DO know is that you'll
mess up the data to the receiver. If you own the receiver, that's your
decision. If not, then you're silently breaking TCP semantics.

That's not worth any alternative.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From touch at ISI.EDU  Tue Jan 16 09:47:19 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 16 Jan 2007 09:47:19 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3595@VGAEXCH01.hq.corp.viasat.com>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A3595@VGAEXCH01.hq.corp.viasat.com>
Message-ID: <45AD0FA7.9060106@isi.edu>


Agarwal, Anil wrote:
>  
> Joe Touch wrote:
>> PS - this could also happen within a single CWND, e.g., if the network
>> path temporarily shifts around the TCP-splitter. It doesn't require an
>> entire window wrap to occur.
> 
> Joe - I am not able to think of a scenario similar to what you
> describe above where a network with TCP-splitters causes
> undetected loss of data or delivery of incorrect data.
>  
> I would appreciate it if you could describe what you are thinking in
> some more detail.

The scenario I was thinking of is when the splitter ACKs the data, and
the source moves the window forward. In the meantime, the splitter has
not yet sent the data to the receiver, and goes down.

I'm not sure what would happen when the receiver has a hole in its
window and the sender lacks the data to resend. This may cause a lockup,
though it wouldn't cause silent loss/corruption.
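
A minimal Python sketch of that scenario, assuming a toy model with whole
segments instead of bytes and no timers; the classes and the run() helper
are invented for illustration and do not describe any particular splitter:

```python
class Sender:
    def __init__(self):
        self.retransmit_buffer = {}   # segment number -> payload kept until ACKed

    def send(self, seq, payload):
        self.retransmit_buffer[seq] = payload
        return seq, payload

    def on_ack(self, seq):
        self.retransmit_buffer.pop(seq, None)  # ACKed data is discarded


class Splitter:
    def __init__(self):
        self.pending = {}             # ACKed towards the sender, not yet delivered

    def receive(self, seq, payload):
        self.pending[seq] = payload
        return seq                    # ACK goes out before the receiver has the data

    def forward_one(self, receiver):
        seq = min(self.pending)
        receiver.delivered[seq] = self.pending.pop(seq)

    def crash(self):
        self.pending.clear()          # all splitter state is gone


class Receiver:
    def __init__(self):
        self.delivered = {}


def run(total=5, forwarded=2):
    """Return the segments that nobody holds after the splitter goes down."""
    sender, splitter, receiver = Sender(), Splitter(), Receiver()
    for seq in range(total):
        ack = splitter.receive(*sender.send(seq, f"segment-{seq}"))
        sender.on_ack(ack)            # the source moves its window forward
    for _ in range(forwarded):        # assumes forwarded <= total
        splitter.forward_one(receiver)
    splitter.crash()
    return [s for s in range(total)
            if s not in receiver.delivered and s not in sender.retransmit_buffer]
```

The list returned by run() is the hole: segments the receiver is still
waiting for and the sender has already dropped from its retransmit buffer.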

There's also the way in which different MSS's could cause similar
hiccups in congestion control.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From detlef.bosau at web.de  Wed Jan 17 11:21:28 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 17 Jan 2007 20:21:28 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45AD05E4.5040200@isi.edu>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
Message-ID: <45AE7738.6070701@web.de>

Joe Touch wrote:
>>
>> First.
>>
>> The only scenarios where I see a justification / necessity for doing
>> splitting or spoofing are scenarios where the TCP flow must pass the
>> split box / spoofing box / PEP anyway. These are scenarios without path
>> redundancy or path transparency.
>>     
>
> Why are you so confident about the path, when you cannot control whether
> there is a PEP/spoofing box in it?
>
>   

Honestly, I don't understand the question.

I wrote: "The only scenarios where I see a justification / necessity for
doing splitting or spoofing are scenarios where the TCP flow must pass the
split box / spoofing box / PEP anyway."

In other words: I restrict the use of split boxes to scenarios where
there is no other path. Either the flow passes the box - or the flow
passes away.

This _is_ a strong restriction.

I don't want to advocate split boxes etc., which are hard state by
nature, as an optimal solution for any problem. I'm totally with you
that nearly any alternative to a split box is better than a split box.
I only want to concede that there may be situations where the use of a
splitter should be considered.

Practically speaking: If the word "splitter" appears in the abstract of a
paper submission, please don't reject it immediately. Please read at
least the introduction ;-)


> ...
>   
>> To be not misunderstood: I don't want to make restrictions for the
>> benefit of a splitter. I think in scenarios where an alternative path to
>> a splitter exist, a splitter must not be used.
>>     
>
> Either the use of splitters is under your control or it is not.
>
>   

 From my assumptions / restrictions it clearly _is_. And if you feel 
more comfortable that way, we can perfectly well integrate some kind of 
option or switch in a mobile network's UNI where the user has the choice 
whether a splitter shall be allowed or shall be forbidden. So the use of 
a splitter must not be transparent but explicitly granted / requested 
by a user. We have similar options for transcoders / WWW proxies in 
mobile networks here in Germany. IIRC, E-plus offers optional 
transcoders / application level PEP.
> If it is, then there are a number of reasons to remove them, alternate
> paths are just one.
>
> If it is not, then you cannot make assumptions about the path.
>
>   
Hm. Admittedly, I think we're talking somewhat at cross-purposes here.

I perfectly understand why you are strongly opposed to splitters 
and the reasons are compelling. However, when in a particular situation 
a splitter is the only yet known possibility e.g. to achieve acceptable 
throughput for a flow within a settling time of 10 seconds instead of 10 
minutes or more, then we should consider giving the user the option to 
allow splitting.
>> In my opinion splitters
>> are to be used with maximum care and only in exceptional cases where any
>> known alternative is worse than a splitter.
>>     
>
> It would be interesting if you could explain a sample case. 

IIRC, Mark Allman has published some interesting work where he used 
splitters for satellite / deep space networks.
To my understanding the major concern was the extremely long time TCP 
needs to fill the line here.

I have not dealt much with TCP and extremely large line capacities 
so far. However, I am doing so now. It's precisely your opposition to 
splitting that made me reconsider my paper on Path Tail Emulation and 
redesign it so that it relies only on pacing / spacing and does not assume 
/ use splitting or spoofing.

I'm not sure whether anyone is interested in the results. If so, I would be 
glad to discuss this.

> IMO,
> splitters just lie - they lie about being an endpoint they are not.
>   

When I was a child, my mother occasionally sang a song, I don't know 
where she got it from or if anybody knows it, "It's a sin to tell a lie".
And I don't know (I never saw the text in a written form) whether this 
is a statement or a question. (According to the WWW, it's a statement.)

> Either you are lying to yourself (you own the endpoint you're lying to)
> or you're lying to others. The first is silly - just install a true
> application proxy - and the second is YOU making a decision for ME about
> what's more important. If I don't want to talk to a true proxy, you have
> no business tricking me into thinking I'm not.
>
>   

As I said: We can agree that splitters shall not be used transparently / 
without permission by the user.

Detlef


From touch at ISI.EDU  Wed Jan 17 11:29:04 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 17 Jan 2007 11:29:04 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45AE7738.6070701@web.de>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de>
Message-ID: <45AE7900.70400@isi.edu>


Detlef Bosau wrote:
> Joe Touch wrote:
>>>
>>> First.
>>>
>>> The only scenarios where I see a justification / necessity for doing
>>> splitting or spoofing are scenarios where the TCP flow must pass the
>>> split box / spoofing box / PEP anyway. These are scenarios without path
>>> redundancy or path transparency.    
>>
>> Why are you so confident about the path, when you cannot control whether
>> there is a PEP/spoofing box in it?
>>
> 
> Honestly, I don't understand the question.
> 
> I wrote: "The only scenarios where I see a justification / necessity for
> doing splitting or spoofing are scenarios where the TCP flow must pass the
> split box / spoofing box / PEP anyway."
> 
> In other words: I restrict the use of split boxes to scenarios where
> there is no other path. Either the flow passes the box - or the flow
> passes away.

I do not agree that you have control over this restriction.

...
> Practically speaking: If the word "splitter" appears in the abstract of a
> paper submission, please don't reject it immediately. Please read at
> least the introduction ;-)

A key aspect of such a review is whether the assumptions are realistic.
I do not consider "control over path", as above, a realistic assumption
for splitters you do not control.

>>> To be not misunderstood: I don't want to make restrictions for the
>>> benefit of a splitter. I think in scenarios where an alternative path to
>>> a splitter exist, a splitter must not be used.
>>>     
>>
>> Either the use of splitters is under your control or it is not.
>>
>>   
> 
> From my assumptions / restrictions it clearly _is_. And if you feel more
> comfortable that way we perfectly can integrate some kind of option or
> switch in a mobile network's UNI where the user has the choice whether a
> splitter shall be allowed or shall be forbidden. 

I don't believe this is useful. People who deploy splitters that are
intended to be found, simply, do not - they deploy proxies. The whole
point of a splitter is to be transparent - either for backward
compatibility with devices that aren't capable of working with a proxy,
or to deliberately hide their presence.

> So the use of a
> splitter must not be transparent but explicitly granted / requested by
> a user. We have similar options for transcoders / WWW proxies in mobile
> networks here in Germany. IIRC, E-plus offers optional transcoders /
> application level PEP.

As noted above, I don't believe this is a viable case for TCP splitters.

...
> I perfectly understand why you are strongly opposed against splitters
> and the reasons are compelling. However, when in a particular situation
> a splitter is the only yet known possibility e.g. to achieve acceptable
> throughput for a flow within a settling time of 10 seconds instead of 10
> minutes or more, then we should consider giving the user the option to
> allow splitting.

It would be useful to show such a case. I do not believe there is a case
where a splitter works where a proxy would not - or would not be more
appropriate.

>>> In my opinion splitters
>>> are to be used with maximum care and only in exceptional cases where any
>>> known alternative is worse than a splitter.
>>
>> It would be interesting if you could explain a sample case. 
> 
> IIRC, Mark Allman has published some interesting work where he used
> splitters for  satellite  / deep space networks.
> To my understanding the major concern was the extremely large time TCP
> needs to fill the line here.

In that case the splitter is either isomorphic to a proxy, or it is
spoofing the sender into violating current TCP congestion profiles. It'd
be useful for Mark to comment on this to clarify.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From detlef.bosau at web.de  Wed Jan 17 12:50:12 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 17 Jan 2007 21:50:12 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45AE7900.70400@isi.edu>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
Message-ID: <45AE8C04.1000900@web.de>

Joe Touch wrote:
>>
>> In other words: I restrict the use of split boxes to scenarios where
>> there is no other path. Either the flow passes the box - or the flow
>> passes away.
>>     
>
> I do not agree that you have control over this restriction.
>   

When a network operator places a splitting box into a base station for a 
mobile or an earth station for a satellite, why shouldn't he have
control over that? At least the network operator who does the technical 
design and implementation should have control over that.

> ...
>   
>> Practically speaking: If the word "splitter" appears in the abstract of a
>> paper submission, please don't reject it immediately. Please read at
>> least the introduction ;-)
>>     
>
> A key aspect of such a review is whether the assumptions are realistic.
> I do not consider "control over path", as above, a realistic assumption
> for splitters you do not control.
>
>   

Absolutely.

However, if we consider a mobile in a mobile wireless network, the
network's infrastructure is completely under the control of the network
operator.

Detlef


From touch at ISI.EDU  Wed Jan 17 12:51:55 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 17 Jan 2007 12:51:55 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45AE8C04.1000900@web.de>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
	<45AE8C04.1000900@web.de>
Message-ID: <45AE8C6B.7040200@isi.edu>


Detlef Bosau wrote:
> Joe Touch wrote:
>>>
>>> In other words: I restrict the use of split boxes to scenarios where
>>> there is no other path. Either the flow passes the box - or the flow
>>> passes away.
>>>     
>>
>> I do not agree that you have control over this restriction.
> 
> When a network operator places a splitting box into a base station for a
> mobile or an earth station for a satellite, why shouldn't he have
> control over that? At least the network operator who does the technical
> design and implementation should have control over that.

Sure; if you're the exclusive path to the rest of the net, that's true.
But you still haven't explained how such a splitter would help better
than a non-spoofing PEP or a proxy, or why you need a splitter instead
of those alternatives.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From david.borman at windriver.com  Thu Jan 18 08:46:58 2007
From: david.borman at windriver.com (David Borman)
Date: Thu, 18 Jan 2007 10:46:58 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45AE7900.70400@isi.edu>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
Message-ID: <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>

There are real-world scenarios where the insertion of a splitter into  
a TCP path does make a lot of sense.  The cases I am familiar with  
all are necessitated by a severe mismatch in MTU, buffering and
performance; the splitter is in the only path by which the packets
can travel, and it is sitting at the crossover between the two  
disparate paths.  In the specific case that I dealt with, the  
splitter's main purpose was to change the TCP MSS option, send larger  
window sizes, and buffer/repackage data.

Getting the splitter to operate well takes some work.  It has to  
maintain state for the connections in both directions.  Besides  
acking and buffering data in both directions, and possibly  
repackaging data between the two sides, it also has to make sure that  
it synchronizes control events between the two halves so that neither  
endpoint gets into a state of believing that the connection has  
completed successfully when it hasn't.  And there will still be  
failure modes that you wouldn't get with a straight TCP connection,  
but most of them are when the connection doesn't complete successfully.

But in general, deploying a splitter where there is a possibility  
that packets can take an alternate route around the splitter, or  
where you do not have some degree of control over one side of the  
network seems like a bad idea to me.  A splitter should not be a  
general purpose device, it should be tied to the unique  
bandwidth*delay mismatch of the problem that is being addressed.

A TCP splitter that is *not* a NAT box operates at the TCP layer, and  
should not require any changes to the content of the TCP data stream,  
whereas an application level proxy often requires that the proxy has  
knowledge of the particular application, and may have to modify the  
data stream.

			-David Borman


From touch at ISI.EDU  Thu Jan 18 10:00:01 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 18 Jan 2007 10:00:01 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
Message-ID: <45AFB5A1.9030407@isi.edu>


David Borman wrote:
> There are real-world scenarios where the insertion of a splitter into a
> TCP path does make a lot of sense.  The cases I am familiar with all are
> necessitated by a severe mismatch in MTU, buffering and performance, 

Taking each individually:

Mismatched MTU - sounds like a PMTU issue, otherwise you're
hyper-optimizing IP overheads in ways that the Internet protocols are
not designed to support. If you have a broken PMTU situation, using a
splitter to 'patch' the situation is fixing one broken system with
another, IMO.

Buffering problems could as easily be solved by non-splitter PEPs that
buffer and retransmit, acting like a two-port router. The same is true
for many performance problems.

I don't agree that either makes sense, although I appreciate the desire
for the first case where there are no alternatives, but only as a patch.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From david.borman at windriver.com  Thu Jan 18 13:20:48 2007
From: david.borman at windriver.com (David Borman)
Date: Thu, 18 Jan 2007 15:20:48 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45AFB5A1.9030407@isi.edu>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
Message-ID: <B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>

Hi Joe,

A little more detail, see below.

On Jan 18, 2007, at 12:00 PM, Joe Touch wrote:

>
>
> David Borman wrote:
>> There are real-world scenarios where the insertion of a splitter  
>> into a
>> TCP path does make a lot of sense.  The cases I am familiar with  
>> all are
>> necessitated by a severe mismatch in MTU, buffering and performance,
>
> Taking each individually:
>
> Mismatched MTU - sounds like a PMTU issue, otherwise you're
> hyper-optimizing IP overheads in ways that the Internet protocols are
> not designed to support. If you have a broken PMTU situation, using a
> splitter to 'patch' the situation is fixing one broken system with
> another, IMO.

It's not a PMTU issue, PMTU finds the smallest MTU along the path.   
I'm talking about a large MTU mismatch, such as a standard ethernet  
on one side with 1500 byte packets, and an interface with a 64K MTU  
on the other side (HIPPI, FibreChannel, etc).  The goal is to be able  
to use the large packets between the splitter and the host on the 64K  
MTU network, and ethernet-sized packets out to the other endpoint.
With PMTU and without the intervention of the splitter, packets will  
be limited to 1500 bytes along the whole path.
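
A minimal sketch of the point above, assuming a hypothetical path made of two 64K-MTU hops followed by two Ethernet hops (the hop counts and the value 65535 standing in for "64K" are illustrative assumptions, not figures from this thread). Path MTU discovery settles on the smallest link MTU, so end to end the large MTU is never used, while a connection split at the boundary can use a different packet size on each half.

```python
def path_mtu(link_mtus):
    """Path MTU discovery converges on the smallest link MTU along the path."""
    return min(link_mtus)


# Hypothetical link MTUs on each side of the splitter (assumed values).
large_mtu_side = [65535, 65535]   # e.g. a HIPPI-like network, "64K" MTU
ethernet_side = [1500, 1500]      # standard Ethernet

# Without a splitter: one TCP connection over the whole path.
end_to_end = path_mtu(large_mtu_side + ethernet_side)

# With a splitter at the boundary: each half is limited only by its own links.
split_halves = (path_mtu(large_mtu_side), path_mtu(ethernet_side))

print(end_to_end)     # 1500
print(split_halves)   # (65535, 1500)
```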

> Buffering problems could as easily be solved by non-splitter PEPs that
> buffer and retransmit, acting like a two-port router. The same is true
> for many performance problems.

In this scenario, the 1500 byte host may be only offering a window  
of, say 16K.  The splitter offers a window to the 64K host of  
something like 512K.  This allows the 64K MTU host to send multiple  
64K sized packets, which the splitter then sends out as ethernet size  
packets to the remote host.  In other words, for a 16K vs. 512K  
scenario, for each window of data transferred between the 64K host  
and the splitter, there are 32 windows of data transferred out to the  
remote hosts.

Conversely, as 1500 byte packets arrive from the remote host, they  
are acked and accumulated into larger packets that are then  
transferred over the 64K MTU network in larger packets.
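
The arithmetic behind the 16K vs. 512K example can be checked with a few lines. The window sizes come from the text above; the 1460-byte Ethernet MSS (1500-byte MTU minus 40 bytes of IP and TCP headers) and the 64K payload per large packet are assumptions made only for this illustration.

```python
import math

REMOTE_WINDOW = 16 * 1024      # window offered by the 1500-byte host (from the text)
SPLITTER_WINDOW = 512 * 1024   # window the splitter offers to the 64K host (from the text)
LARGE_PAYLOAD = 64 * 1024      # assumed payload of one large packet
ETHERNET_MSS = 1460            # assumed: 1500-byte MTU minus 40 header bytes

# Remote-side windows needed to drain one splitter-side window.
windows_ratio = SPLITTER_WINDOW // REMOTE_WINDOW

# Ethernet-sized packets the splitter emits for each large packet it receives.
packets_per_large_packet = math.ceil(LARGE_PAYLOAD / ETHERNET_MSS)

print(windows_ratio)             # 32
print(packets_per_large_packet)  # 45
```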


> I don't agree that either makes sense, although I appreciate the  
> desire
> for the first case where there are no alternatives, but only as a
> patch.

Again, I don't think a splitter is a good general solution, but there  
are specific cases where it can do what needs to be done within the  
constraints of the system.

			-David Borman

>
> Joe
>
> -- 
> ----------------------------------------
> Joe Touch
> Sr. Network Engineer, USAF TSAT Space Segment
>


From touch at ISI.EDU  Thu Jan 18 14:09:08 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 18 Jan 2007 14:09:08 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
Message-ID: <45AFF004.3060709@isi.edu>


David Borman wrote:
> Hi Joe,
> 
> A little more detail, see below.
> 
> On Jan 18, 2007, at 12:00 PM, Joe Touch wrote:
> 
>>
>>
>> David Borman wrote:
>>> There are real-world scenarios where the insertion of a splitter into a
>>> TCP path does make a lot of sense.  The cases I am familiar with all are
>>> necessitated by a severe mismatch in MTU, buffering and performance,
>>
>> Taking each individually:
>>
>> Mismatched MTU - sounds like a PMTU issue, otherwise you're
>> hyper-optimizing IP overheads in ways that the Internet protocols are
>> not designed to support. If you have a broken PMTU situation, using a
>> splitter to 'patch' the situation is fixing one broken system with
>> another, IMO.
> 
> It's not a PMTU issue, PMTU finds the smallest MTU along the path.  I'm
> talking about a large MTU mismatch, such as a standard ethernet on one
> side with 1500 byte packets, and an interface with a 64K MTU on the
> other side (HIPPI, FibreChannel, etc).  The goal is to be able to use
> the large packets between the splitter and the host on the 64K MTU
> network, and ethernet-sized packets out to the other endpoint.  With PMTU
> and without the intervention of the splitter, packets will be limited to
> 1500 bytes along the whole path.

That's in the margins of 'hyperoptimization' I noted above, IMO. I'm not
clear what the utility of having the larger MTU is there, vs., e.g.,
frame bursting, except that it offloads data coalescing to an outboard
processor. If that's the goal, then this amounts to an outboard 'network
coprocessor'.

>> Buffering problems could as easily be solved by non-splitter PEPs that
>> buffer and retransmit, acting like a two-port router. The same is true
>> for many performance problems.
> 
> In this scenario, the 1500 byte host may be only offering a window of,
> say 16K.  The splitter offers a window to the 64K host of something like
> 512K.  This allows the 64K MTU host to send multiple 64K sized packets,
> which the splitter then sends out as ethernet size packets to the remote
> host.  In other words, for a 16K vs. 512K scenario, for each window of
> data transferred between the 64K host and the splitter, there are 32
> windows of data transferred out to the remote hosts.
> 
> Conversely, as 1500 byte packets arrive from the remote host, they are
> acked and accumulated into larger packets that are then transferred over
> the 64K MTU network in larger packets.
> 
> 
>> I don't agree that either makes sense, although I appreciate the desire
>> for the first case where there are no alternatives, but only as a patch.
> 
> Again, I don't think a splitter is a good general solution, but there
> are specific cases where it can do what needs to be done within the
> constraints of the system.

The above both look like outboard coprocessors. If that's the goal, then
you're really extending the boundary of what the endhost is, and that's
reasonable. Most other uses - to silently help someone who doesn't know
you're there - are the problem, IMO.

Joe


----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From perfgeek at mac.com  Fri Jan 19 08:52:53 2007
From: perfgeek at mac.com (rick jones)
Date: Fri, 19 Jan 2007 08:52:53 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
Message-ID: <8b38e92efb05d97f0587240a04505367@mac.com>

> In this scenario, the 1500 byte host may be only offering a window of, 
> say 16K.  The splitter offers a window to the 64K host of something 
> like 512K.  This allows the 64K MTU host to send multiple 64K sized 
> packets, which the splitter then sends out as ethernet size packets to 
> the remote host.  In other words, for a 16K vs. 512K scenario, for 
> each window of data transferred between the 64K host and the splitter, 
> there are 32 windows of data transferred out to the remote hosts.
>
> Conversely, as 1500 byte packets arrive from the remote host, they are 
> acked and accumulated into larger packets that are then transferred 
> over the 64K MTU network in larger packets.

Apart from calling it a splitter, superficially at least that resembles 
what some 10G NICs can do today, albeit with some explicit 
knowledge/assistance by the stack.  Large send has the stack(host) 
giving the NIC(splitter) a large "segment" which the NIC(splitter) 
resegments for the link.  Those flow across the ethernet to the other 
NIC(splitter) which if it has Large Receive Offload enabled will 
"upsegment" the ethernet-sized traffic and give larger segments to the 
receiving stack(host).
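
As a rough sketch of the resegment / "upsegment" idea described above: the functions below only split and rejoin a byte buffer. They are hypothetical helpers, not a NIC or driver API, and real large send / large receive offload also has to rewrite TCP/IP headers, sequence numbers and checksums, which is left out here. The 1460-byte MSS is an assumption.

```python
def resegment(payload, mss):
    """Split one large send into MSS-sized pieces, as large send offload does."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments):
    """Merge consecutive in-order pieces back into one large buffer."""
    return b"".join(segments)


ETHERNET_MSS = 1460             # assumed: 1500-byte MTU minus 40 header bytes
large_send = bytes(64 * 1024)   # one 64K "segment" handed down by the stack

on_the_wire = resegment(large_send, ETHERNET_MSS)
delivered = coalesce(on_the_wire)

print(len(on_the_wire))         # 45 pieces; the last one is shorter than the MSS
print(delivered == large_send)  # True
```

Note that this is stateless per buffer, which is the distinction drawn later in the thread: a splitter additionally keeps per-connection state and changes the advertised window and MSS.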

rick jones
there is no rest for the wicked, yet the virtuous have no pillows


From touch at ISI.EDU  Fri Jan 19 09:17:41 2007
From: touch at ISI.EDU (Joe Touch)
Date: Fri, 19 Jan 2007 09:17:41 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <8b38e92efb05d97f0587240a04505367@mac.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
Message-ID: <45B0FD35.7050708@isi.edu>


rick jones wrote:
>> In this scenario, the 1500 byte host may be only offering a window of,
>> say 16K.  The splitter offers a window to the 64K host of something
>> like 512K.  This allows the 64K MTU host to send multiple 64K sized
>> packets, which the splitter then sends out as ethernet size packets to
>> the remote host.  In other words, for a 16K vs. 512K scenario, for
>> each window of data transferred between the 64K host and the splitter,
>> there are 32 windows of data transferred out to the remote hosts.
>>
>> Conversely, as 1500 byte packets arrive from the remote host, they are
>> acked and accumulated into larger packets that are then transferred
>> over the 64K MTU network in larger packets.
> 
> Apart from calling it a splitter, superficially at least that resembles
> what some 10G NICs can do today, albeit with some explicit
> knowledge/assistance by the stack.  Large send has the stack(host)
> giving the NIC(splitter) a large "segment" which the NIC(splitter)
> resegments for the link.  Those flow across the ethernet to the other
> NIC(splitter) which if it has Large Receive Offload enabled will
> "upsegment" the ethernet-sized traffic and give larger segments to the
> receiving stack(host).

Right - this looks like a cooperative outboard processor, which makes a
lot of sense in some environments when both the outboard processor and
host are managed/controlled by the same entity, but still makes very
little sense (to me) when that's not the case.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From david.borman at windriver.com  Fri Jan 19 11:03:33 2007
From: david.borman at windriver.com (David Borman)
Date: Fri, 19 Jan 2007 13:03:33 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45B0FD35.7050708@isi.edu>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
Message-ID: <C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>

No, it's more than just Large Send Offload or Large Receive Offload.   
That's done on a per-packet basis, without needing to keep much, if  
any state.  In the scenario I'm citing the splitter is also changing  
the window and the MSS option.  The remote host offers a (relatively)  
small window, the splitter offers a much bigger (512K) window to the  
host on the 64K MTU network (in addition to rewriting the MSS  
option).  With the small delay*bandwidth to the remote host, the
splitter has no trouble keeping the pipe full using standard ethernet
packets.  But if those packets went all the way to the 64K host
across the large delay*bandwidth 64K MTU network, there'd be a lot of
idle time waiting for window updates, and you get much lower  
throughput from end-to-end.
			-David Borman
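
To make the "idle time waiting for window updates" concrete: a sender limited by the receive window can move at most one window per round trip. The sketch below assumes a hypothetical 100 ms RTT across the large delay*bandwidth network and reuses the 16K and 512K windows mentioned earlier in the thread; none of these are measured values.

```python
def window_limited_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_seconds


RTT = 0.1                      # hypothetical 100 ms round trip time
SMALL_WINDOW = 16 * 1024       # window offered by the remote host (example value)
LARGE_WINDOW = 512 * 1024      # window offered by the splitter (example value)

small = window_limited_throughput_bps(SMALL_WINDOW, RTT)
large = window_limited_throughput_bps(LARGE_WINDOW, RTT)

print(f"{small / 1e6:.2f} Mbit/s with the 16K window")
print(f"{large / 1e6:.2f} Mbit/s with the 512K window")
print(f"ratio: {large / small:.0f}x")
```

Whatever the actual RTT is, the ratio between the two bounds equals the ratio between the windows.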

On Jan 19, 2007, at 11:17 AM, Joe Touch wrote:

>
>
> rick jones wrote:
>>> In this scenario, the 1500 byte host may be only offering a  
>>> window of,
>>> say 16K.  The splitter offers a window to the 64K host of something
>>> like 512K.  This allows the 64K MTU host to send multiple 64K sized
>>> packets, which the splitter then sends out as ethernet size  
>>> packets to
>>> the remote host.  In other words, for a 16K vs. 512K scenario, for
>>> each window of data transferred between the 64K host and the  
>>> splitter,
>>> there are 32 windows of data transferred out to the remote hosts.
>>>
>>> Conversely, as 1500 byte packets arrive from the remote host,  
>>> they are
>>> acked and accumulated into larger packets that are then transferred
>>> over the 64K MTU network in larger packets.
>>
>> Apart from calling it a splitter, superficially at least that  
>> resembles
>> what some 10G NICs can do today, albeit with some explicit
>> knowledge/assistance by the stack.  Large send has the stack(host)
>> giving the NIC(splitter) a large "segment" which the NIC(splitter)
>> resegments for the link.  Those flow across the ethernet to the other
>> NIC(splitter) which if it has Large Receive Offload enabled will
>> "upsegment" the ethernet-sized traffic and give larger segments to  
>> the
>> receiving stack(host).
>
> Right - this looks like a cooperative outboard processor, which  
> makes a
> lot of sense in some environments when both the outboard processor and
> host are managed/controlled by the same entity, but still makes very
> little sense (to me) when that's not the case.
>
> Joe
>
> -- 
> ----------------------------------------
> Joe Touch
> Sr. Network Engineer, USAF TSAT Space Segment
>


From perfgeek at mac.com  Fri Jan 19 18:57:43 2007
From: perfgeek at mac.com (rick jones)
Date: Fri, 19 Jan 2007 18:57:43 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
Message-ID: <5e0bbd63ccb9905200d8c14e373a851e@mac.com>


On Jan 19, 2007, at 11:03 AM, David Borman wrote:

> No, it's more than just Large Send Offload or Large Receive Offload.  
> That's done on a per-packet basis, without needing to keep much, if 
> any state.  In the scenario I'm citing the splitter is also changing 
> the window and the MSS option.

Then I guess TOE may be closer, but still not quite there.

rick jones
there is no rest for the wicked, yet the virtuous have no pillows


From touch at ISI.EDU  Sun Jan 21 10:01:49 2007
From: touch at ISI.EDU (Joe Touch)
Date: Sun, 21 Jan 2007 10:01:49 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
Message-ID: <45B3AA8D.7070203@isi.edu>

This device offloads processing on behalf of the endpoint. Such devices
can offload on a per-socketbuffer basis; this isn't much different,
except that it sends IP packets to the offloader.

Buffering serves two purposes: help a source with an insufficient buffer
for the BW*delay product, and help compensate for a receiver that can't
keep pace with a bunch of little packets.

The former helps only if you have an endpoint that can send relatively
fast, but has poor buffering.

The latter helps only if you have a receiver that can't keep pace with
lots of small packets.

Both point to broken implementations, and trade correctness for
performance. That's not how the rest of TCP is optimized. IMO, this
argues for a different transport protocol, not for these splitters.

Joe

David Borman wrote:
> No, it's more than just Large Send Offload or Large Receive Offload. 
> That's done on a per-packet basis, without needing to keep much, if any
> state.  In the scenario I'm citing the splitter is also changing the
> window and the MSS option.  The remote host offers a (relatively) small
> window, the splitter offers a much bigger (512K) window to the host on
> the 64K MTU network (in addition to rewriting the MSS option).  With the
> small delay*bandwidth to the remote host, the splitter has no trouble
> keeping the pipe full using standard ethernet packets.  But if those
> packets went all the way to the 64K host across the large
> delay*bandwidth 64K MTU network, there'd be a lot of idle time waiting
> for window updates, and you get much lower throughput from end-to-end.
>             -David Borman
> 
> On Jan 19, 2007, at 11:17 AM, Joe Touch wrote:
> 
>>
>>
>> rick jones wrote:
>>>> In this scenario, the 1500 byte host may be only offering a window of,
>>>> say 16K.  The splitter offers a window to the 64K host of something
>>>> like 512K.  This allows the 64K MTU host to send multiple 64K sized
>>>> packets, which the splitter then sends out as ethernet size packets to
>>>> the remote host.  In other words, for a 16K vs. 512K scenario, for
>>>> each window of data transferred between the 64K host and the splitter,
>>>> there are 32 windows of data transferred out to the remote hosts.
>>>>
>>>> Conversely, as 1500 byte packets arrive from the remote host, they are
>>>> acked and accumulated into larger packets that are then transferred
>>>> over the 64K MTU network in larger packets.
>>>
>>> Apart from calling it a splitter, superficially at least that resembles
>>> what some 10G NICs can do today, albeit with some explicit
>>> knowledge/assistance by the stack.  Large send has the stack(host)
>>> giving the NIC(splitter) a large "segment" which the NIC(splitter)
>>> resegments for the link.  Those flow across the ethernet to the other
>>> NIC(splitter) which if it has Large Receive Offload enabled will
>>> "upsegment" the ethernet-sized traffic and give larger segments to the
>>> receiving stack(host).
>>
>> Right - this looks like a cooperative outboard processor, which makes a
>> lot of sense in some environments when both the outboard processor and
>> host are managed/controlled by the same entity, but still makes very
>> little sense (to me) when that's not the case.
>>
>> Joe
>>
>> ------------------------------------------
>> Joe Touch
>> Sr. Network Engineer, USAF TSAT Space Segment
>>

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From detlef.bosau at web.de  Sun Jan 21 15:02:37 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 22 Jan 2007 00:02:37 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de> <45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de> <45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
Message-ID: <45B3F10D.9010501@web.de>

David Borman wrote:
> There are real-world scenarios where the insertion of a splitter into 
> a TCP path does make a lot of sense.  The cases I am familiar with all 
> are necessitated by a severe mismatch in MTU, buffering and 
> performance, the splitter is in the only path by which the packets can 
> travel, and it is sitting at the crossover between the two disparate 
> paths.  In the specific case that I dealt with, the splitter's main 
> purpose was to change the TCP MSS option, send larger window sizes, 
> and buffer/repackage data.

In addition to the MTU issue you mention let me point to
Joseph Ishac, Mark Allman. "On the Performance of TCP Spoofing in
Satellite Networks." IEEE Milcom. October 2001.
http://www.icir.org/mallman/papers/milcom01.pdf

The issue here is the extremely large round trip time in satellite 
networks which causes TCP to need a quite long time to achieve 
sufficient throughput.

In fact, the RTTs described by Mark Allman aren't even that bad. I'm
still working on the issue of opportunistic scheduling in mobile networks.
I just found a technical report on that issue:
R. Srinivasan and J. S. Baras. "TCP Performance in Wireless Systems with
Opportunistic Scheduling." Technical Report TR 2002-48, 2002.
Advisor: John Baras.
http://techreports.isr.umd.edu/reports/2002/TR_2002-48.pdf

As far as I see, this is really excellent work. In the example at the
beginning of the paper, opportunistic scheduling introduces a delay
jitter of up to 1 second into the flow.

I currently simulate networks with a physical bandwidth of 10 Mbps and
an average throughput of 100 kbps at the link layer due to
retransmissions, and accept a delay jitter of up to 1 second. When I
equalize the delay spikes by buffering to an extent that TCP fully
exploits the average throughput at the link layer, the round trip time as
perceived by the sender can reach up to 10 seconds. The question is
simply in which kind of application a 10-second RTT will be accepted.
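
For a sense of scale, the amount of data a sender must keep outstanding to sustain a given throughput is throughput times RTT. The sketch below only plugs in the 100 kbps and 1 s / 10 s figures quoted above; it is plain arithmetic and says nothing about where in the path that data is actually queued.

```python
def bytes_outstanding(throughput_bps, rtt_seconds):
    """Data in flight needed to sustain the throughput at the given RTT."""
    return throughput_bps * rtt_seconds // 8


LINK_THROUGHPUT_BPS = 100_000   # average link-layer throughput from the text

at_1s = bytes_outstanding(LINK_THROUGHPUT_BPS, 1)
at_10s = bytes_outstanding(LINK_THROUGHPUT_BPS, 10)

print(at_1s)    # 12500 bytes at a 1 second RTT
print(at_10s)   # 125000 bytes at a 10 second RTT
```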

Particularly for mobile networks with no splitting or spoofing, I clearly
expect a quite strict alternative: a network will exhibit either
acceptable round trip times or acceptable throughput. Of course delay
spikes themselves will be annoying in interactive applications. But one
second of delay may be acceptable for a user, whereas the same user would
simply abandon an application / connection with a 10-second round trip time.

Detlef


From david.borman at windriver.com  Mon Jan 22 08:02:26 2007
From: david.borman at windriver.com (David Borman)
Date: Mon, 22 Jan 2007 10:02:26 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45B3AA8D.7070203@isi.edu>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
Message-ID: <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>

Joe,

You keep missing the point.  The delay*bandwidth between the end  
hosts is sufficiently large that it can not be driven at full speed  
from end-to-end given the window advertised by the host on the  
ethernet side of things.  Even if that host advertised a sufficiently  
large window, the inefficiencies of small packets on the 64K MTU side  
of the network will keep the network from being driven at full speed,  
not to mention the cost of ramping up slowstart using 1.5K byte  
packets vs. 64K byte packets.
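
To put rough numbers on the window argument, here is a small sketch. A
sender can have at most one window of data in flight per round trip, so
throughput is capped at window / RTT, and filling the pipe needs a
window of at least bandwidth * RTT. The 16K window is the figure used
earlier in this thread; the RTT and link rate are assumptions picked
only for illustration.

```python
def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_s

def required_window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Window needed to keep the path full (the delay*bandwidth product)."""
    return bandwidth_bps * rtt_s / 8

WINDOW_BYTES = 16 * 1024   # window offered by the ethernet host (from the thread)
RTT_S = 0.05               # assumed end-to-end round trip time
LINK_BPS = 1e9             # assumed end-to-end bottleneck rate

cap = window_limited_throughput_bps(WINDOW_BYTES, RTT_S)
need = required_window_bytes(LINK_BPS, RTT_S)
print(f"throughput cap with 16K window: {cap / 1e6:.2f} Mbit/s")
print(f"window needed to fill the path: {need / 1024:.0f} KiB")
```

With these assumed values the 16K window caps the connection far below
the link rate, which is the gap the splitter is meant to close.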

The splitter in this case is sitting between the two networks,  
transparently connecting what it has effectively turned into two TCP  
connections, providing the necessary resources to allow TCP to run  
optimally on each half of the path, without the end nodes needing to  
have explicit knowledge of the splitter.

TCP does not always work well in all scenarios, but there is a lot of  
value in being able to use TCP instead of designing a new transport  
and internet layer.  In a scenario like this, the splitter allows TCP  
to be used in an environment where it otherwise wouldn't work very well.

And sure, it'd be best if the splitter wasn't needed, and the  
connection could be run at full speed from end-to-end, but sometimes  
you have to deal with realities and not just theory.  Focusing on  
details and calling the implementation broken, and then ignoring the  
underlying issues, doesn't resolve anything.

			-David Borman

On Jan 21, 2007, at 12:01 PM, Joe Touch wrote:

> This device offloads processing on behalf of the endpoint. Such  
> devices
> can offload on a per-socketbuffer basis; this isn't much different,
> except that it sends IP packets to the offloader.
>
> Buffering serves two purposes: help a source with an insufficient  
> buffer
> for the BW*delay product, and help compensate for a receiver that  
> can't
> keep pace with a bunch of little packets.
>
> The former helps only if you have an endpoint that can send relatively
> fast, but has poor buffering.
>
> The latter helps only if you have a receiver that can't keep pace with
> lots of small packets.
>
> Both point to broken implementations, and trade correctness for
> performance. That's not how the rest of TCP is optimized. IMO, this
> argues for a different transport protocol, not for these splitters.
>
> Joe
>
> David Borman wrote:
>> No, it's more than just Large Send Offload or Large Receive Offload.
>> That's done on a per-packet basis, without needing to keep much,  
>> if any
>> state.  In the scenario I'm citing the splitter is also changing the
>> window and the MSS option.  The remote host offers a (relatively)  
>> small
>> window, the splitter offers a much bigger (512K) window to the  
>> host on
>> the 64K MTU network (in addition to rewriting the MSS option).   
>> With the
>> small delay*bandwith to the remote host, the splitter has no trouble
>> keeping the pipe full using standard ethernet packets.  But if those
>> packets went all the way to the 64K host across the large
>> delay*bandwidth 64KMTU network, there'd be a lot of idle time waiting
>> for window updates, and you get much lower throughput from end-to- 
>> end.
>>             -David Borman
>>
>> On Jan 19, 2007, at 11:17 AM, Joe Touch wrote:
>>
>>>
>>>
>>> rick jones wrote:
>>>>> In this scenario, the 1500 byte host may be only offering a  
>>>>> window of,
>>>>> say 16K.  The splitter offers a window to the 64K host of  
>>>>> something
>>>>> like 512K.  This allows the 64K MTU host to send multiple 64K  
>>>>> sized
>>>>> packets, which the splitter then sends out as ethernet size  
>>>>> packets to
>>>>> the remote host.  In other words, for a 16K vs. 512K scenario, for
>>>>> each window of data transferred between the 64K host and the  
>>>>> splitter,
>>>>> there are 32 windows of data transferred out to the remote hosts.
>>>>>
>>>>> Conversely, as 1500 byte packets arrive from the remote host,  
>>>>> they are
>>>>> acked and accumulated into larger packets that are then  
>>>>> transferred
>>>>> over the 64K MTU network in larger packets.
>>>>
>>>> Apart from calling it a splitter, superficially at least that  
>>>> resembles
>>>> what some 10G NICs can do today, albeit with some explicit
>>>> knowledge/assistance by the stack.  Large send has the stack(host)
>>>> giving the NIC(splitter) a large "segment" which the NIC(splitter)
>>>> resegments for the link.  Those flow across the ethernet to the  
>>>> other
>>>> NIC(splitter) which if it has Large Receive Offload enabled will
>>>> "upsegment" the ethernet-sized traffic and give larger segments  
>>>> to the
>>>> receiving stack(host).
>>>
>>> Right - this looks like a cooperative outboard processor, which  
>>> makes a
>>> lot of sense in some environments when both the outboard  
>>> processor and
>>> host are managed/controlled by the same entity, but still makes very
>>> little sense (to me) when that's not the case.
>>>
>>> Joe
>>>
>>> ------------------------------------------
>>> Joe Touch
>>> Sr. Network Engineer, USAF TSAT Space Segment
>>>
>
> -- 
> ----------------------------------------
> Joe Touch
> Sr. Network Engineer, USAF TSAT Space Segment
>


From touch at ISI.EDU  Mon Jan 22 09:09:22 2007
From: touch at ISI.EDU (Joe Touch)
Date: Mon, 22 Jan 2007 09:09:22 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
Message-ID: <45B4EFC2.3020408@isi.edu>


David Borman wrote:
> Joe,
> 
> You keep missing the point.  The delay*bandwidth between the end hosts
> is sufficiently large that it can not be driven at full speed from
> end-to-end given the window advertised by the host on the ethernet side
> of things.  Even if that host advertised a sufficiently large window,
> the inefficiencies of small packets on the 64K MTU side of the network
> will keep the network from being driven at full speed, not to mention
> the cost of ramping up slowstart using 1.5K byte packets vs. 64K byte
> packets.

This is a contradiction: clearly the splitter needs to keep up with
receiving small packets at rate or it can't sustain emitting the large
packets at full speed. If the splitter can do this, then the destination
can. The fact that it doesn't means this is (by definition) a patch to a
broken system.

Using splitters to patch broken systems is understandable, but it's
still preferable (IMO) to make the splitter visible and run it as a true
proxy, terminating the TCP on both ends properly.

> The splitter in this case is sitting between the two networks,
> transparently connecting what it has effectively turned into two TCP
> connections,

That's the point that's missed, IMO - this isn't 'effectively' two TCP
connections; it provides the benefit of two TCP connections without
actually terminating the connections, which means this isn't
'effectively' two, but 'one TCP connection with the performance and
semantics of two'. The former is understandable, but the latter is the
problem.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From david.borman at windriver.com  Mon Jan 22 13:33:33 2007
From: david.borman at windriver.com (David Borman)
Date: Mon, 22 Jan 2007 15:33:33 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45B4EFC2.3020408@isi.edu>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
	<45B4EFC2.3020408@isi.edu>
Message-ID: <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>


On Jan 22, 2007, at 11:09 AM, Joe Touch wrote:

>
>
> David Borman wrote:
>> Joe,
>>
>> You keep missing the point.  The delay*bandwidth between the end  
>> hosts
>> is sufficiently large that it can not be driven at full speed from
>> end-to-end given the window advertised by the host on the ethernet  
>> side
>> of things.  Even if that host advertised a sufficiently large window,
>> the inefficiencies of small packets on the 64K MTU side of the  
>> network
>> will keep the network from being driven at full speed, not to mention
>> the cost of ramping up slowstart using 1.5K byte packets vs. 64K byte
>> packets.
>
> This is a contradiction: clearly the splitter needs to keep up with
> receiving small packets at rate or it can't sustain emitting the large
> packets at full speed. If the splitter can do this, then the  
> destination
> can. The fact that it doesn't means this is (by definition) a patch  
> to a
> broken system.

Ah, you are assuming that both the ethernet side and the 64K MTU side  
of the path operate equally efficiently using small packets.  That is  
not the case.  The splitter isn't able to keep the pipe full over the  
64K network using 1500 byte packets, but it can using larger packets,  
so the further remote host is even less able to keep it full using  
1500 byte packets.  It's like using dump trucks to haul individual  
wheelbarrow size loads; it'll work, but you won't be very efficient  
and the transfer will take a lot longer.
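
One way to picture the dump truck point is a simple cost model in which
every packet pays a fixed handling cost plus its serialization time.
This is only a sketch: the line rate and the per-packet cost below are
made-up values, not properties of any real 64K MTU network.

```python
def effective_throughput_bps(packet_bytes: int,
                             line_rate_bps: float,
                             per_packet_cost_s: float) -> float:
    """Throughput when each packet costs a fixed overhead plus wire time."""
    bits = packet_bytes * 8
    return bits / (per_packet_cost_s + bits / line_rate_bps)

# All numbers are illustrative assumptions, not measurements.
LINE_RATE_BPS = 10e9        # raw capacity of the hypothetical 64K MTU network
PER_PACKET_COST_S = 50e-6   # fixed per-packet handling cost on that network

small = effective_throughput_bps(1500, LINE_RATE_BPS, PER_PACKET_COST_S)
large = effective_throughput_bps(64 * 1024, LINE_RATE_BPS, PER_PACKET_COST_S)
print(f"1500-byte packets: {small / 1e6:.0f} Mbit/s")
print(f"64K-byte packets:  {large / 1e6:.0f} Mbit/s")
```

With these assumed values the 1500 byte packets reach only a few
percent of the line rate, while the 64K byte packets reach about half
of it, because the fixed cost is amortized over far more payload.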

So, I disagree with your contention that the system is broken.  It's  
different and heterogeneous, but that doesn't make it broken.

			-David Borman


From touch at ISI.EDU  Mon Jan 22 14:05:58 2007
From: touch at ISI.EDU (Joe Touch)
Date: Mon, 22 Jan 2007 14:05:58 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
	<45B4EFC2.3020408@isi.edu>
	<2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>
Message-ID: <45B53546.4080507@isi.edu>


David Borman wrote:
>> This is a contradiction: clearly the splitter needs to keep up with
>> receiving small packets at rate or it can't sustain emitting the large
>> packets at full speed. If the splitter can do this, then the destination
>> can. The fact that it doesn't means this is (by definition) a patch to a
>> broken system.
> 
> Ah, you are assuming that both the ethernet side and the 64K MTU side of
> the path operate equally efficiently using small packets.

source ---------------> splitter ----------------> dest
           1500byte                64K byte


You're claiming that the splitter is required to keep the 64Kbyte side
running at full rate. That means the 1500-byte side has to handle
packets roughly 40x faster. Otherwise, the 64K byte side is not running
at high-rate.
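
The "roughly 40x" figure is just the ratio of the two packet sizes at a
fixed byte rate. A quick check, with the 1 Gbit/s rate as an arbitrary
assumption (the ratio does not depend on it):

```python
def packets_per_second(rate_bps: float, packet_bytes: int) -> float:
    """Packets needed per second to carry rate_bps with the given packet size."""
    return rate_bps / (packet_bytes * 8)

RATE_BPS = 1e9  # assumed byte rate; cancels out of the ratio

small_pps = packets_per_second(RATE_BPS, 1500)
large_pps = packets_per_second(RATE_BPS, 64 * 1024)
print(f"1500-byte side: {small_pps:.0f} packets/s")
print(f"64K-byte side:  {large_pps:.0f} packets/s")
print(f"ratio:          {small_pps / large_pps:.1f}x")
```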

So here's what we have:
	- dest can handle 64K but not 1500
	- source must handle 1500 at high rate
	- splitter must receive 1500 at high rate

Now you're claiming that there's a link (source-splitter) that's
efficient enough for small packets. If that's the case, why would we
ever want the kind of link that's being used splitter-dest?

Again, this argues that something is seriously broken.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From david.borman at windriver.com  Mon Jan 22 14:38:14 2007
From: david.borman at windriver.com (David Borman)
Date: Mon, 22 Jan 2007 16:38:14 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45B53546.4080507@isi.edu>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
	<45B4EFC2.3020408@isi.edu>
	<2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>
	<45B53546.4080507@isi.edu>
Message-ID: <EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>


On Jan 22, 2007, at 4:05 PM, Joe Touch wrote:

>
>
> David Borman wrote:
>>> This is a contradiction: clearly the splitter needs to keep up with
>>> receiving small packets at rate or it can't sustain emitting the  
>>> large
>>> packets at full speed. If the splitter can do this, then the  
>>> destination
>>> can. The fact that it doesn't means this is (by definition) a  
>>> patch to a
>>> broken system.
>>
>> Ah, you are assuming that both the ethernet side and the 64K MTU  
>> side of
>> the path operate equally efficiently using small packets.
>
> source ---------------> splitter ----------------> dest
>            1500byte                64K byte
>
>
> You're claiming that the splitter is required to keep the 64Kbyte side
> running at full rate. That means the 1500-byte side has to handle
> packets roughly 40x faster. Otherwise, the 64K byte side is not  
> running
> at high-rate.
>
> So here's what we have:
> 	- dest can handle 64K but not 1500
> 	- source must handle 1500 at high rate
> 	- splitter must receive 1500 at high rate
>
> Now you're claiming that there's a link (source-splitter) that's
> efficient enough for small packets. If that's the case, why would we
> ever want the kind of link that's being used splitter-dest?

If all you're ever going to do is talk through the splitter to remote  
ethernet hosts, then yes, it'd be preferable to bring ethernet  
directly to the host instead of using the 64K MTU network.  But you  
don't always get what you want.  For various reasons it might not be  
possible to bring ethernet directly to the hosts on the 64K network.   
And while the 64K MTU network may not be as efficient with 1500 byte  
packets as an ethernet network, replacing it with an ethernet network  
might be slower internally than the 64K MTU network.  So the trade  
off is a faster 64K network that works well with large packets but  
not ethernet sized packets, vs. a slower ethernet network that works  
better with ethernet sized packets, but doesn't have the overall  
capacity of the 64K network.

>
> Again, this argues that something is seriously broken.

Sometimes there isn't an optimal solution and you have to make hard  
choices.  Just because it isn't the one you want doesn't mean things  
are *broken* when you then try to mitigate the effects of those choices.

			-David Borman


From touch at ISI.EDU  Mon Jan 22 14:49:48 2007
From: touch at ISI.EDU (Joe Touch)
Date: Mon, 22 Jan 2007 14:49:48 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
	<45B4EFC2.3020408@isi.edu>
	<2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>
	<45B53546.4080507@isi.edu>
	<EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>
Message-ID: <45B53F8C.409@isi.edu>


David Borman wrote:
> 
> On Jan 22, 2007, at 4:05 PM, Joe Touch wrote:
...
>>> Ah, you are assuming that both the ethernet side and the 64K MTU side of
>>> the path operate equally efficiently using small packets.
>>
>> source ---------------> splitter ----------------> dest
>>            1500byte                64K byte
>>
>>
>> You're claiming that the splitter is required to keep the 64Kbyte side
>> running at full rate. That means the 1500-byte side has to handle
>> packets roughly 40x faster. Otherwise, the 64K byte side is not running
>> at high-rate.
>>
>> So here's what we have:
>>     - dest can handle 64K but not 1500
>>     - source must handle 1500 at high rate
>>     - splitter must receive 1500 at high rate
>>
>> Now you're claiming that there's a link (source-splitter) that's
>> efficient enough for small packets. If that's the case, why would we
>> ever want the kind of link that's being used splitter-dest?
> 
> If all you're ever going to do is talk through the splitter to remote
> ethernet hosts, then yes, it'd be preferable to bring ethernet directly
> to the host instead of using the 64K MTU network.  But you don't always
> get what you want.  For various reasons it might not be possible to
> bring ethernet directly to the hosts on the 64K network.

In that case you're making a case for an outboard ethernet adapter,
e.g., like the USB dongles.

> And while the
> 64K MTU network may not be as efficient with 1500 byte packets as an
> ethernet network, replacing it with an ethernet network might be slower
> internally than the 64K MTU network.

That's the part that's confusing. In order to warrant the splitter, the
ethernet side must keep up. But then you're saying here that the
ethernet is slower.

Either:
ethernet keeps up
	which you need to assume to warrant a splitter,
	but which begs the question of why you have less capable
	net to the right

ethernet doesn't keep up
	in which case aggregation doesn't help

> So the trade off is a faster 64K
> network that works well with large packets but not ethernet sized
> packets, vs. a slower ethernet network that works better with ethernet
> sized packets, but doesn't have the overall capacity of the 64K network.

In the latter case, you're not keeping up on the source-splitter side.
Which means you don't need the splitter either to aggregate or to buffer.

>> Again, this argues that something is seriously broken.
> 
> Sometimes there isn't an optimal solution and you have to make hard
> choices.  Just because it isn't the one you want doesn't mean things are
> *broken* when you then try to mitigate the effects of those choices.

It's still very unclear what effects this is mitigating.
-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From david.borman at windriver.com  Mon Jan 22 17:48:46 2007
From: david.borman at windriver.com (David Borman)
Date: Mon, 22 Jan 2007 19:48:46 -0600
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
	window thread ; -))
In-Reply-To: <45B53F8C.409@isi.edu>
References: <45A57D7A.6030505@isi.edu>
	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>
	<45A67E7D.4010609@isi.edu> <45A82323.30405@web.de>
	<45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de>
	<45AE7900.70400@isi.edu>
	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>
	<45AFB5A1.9030407@isi.edu>
	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>
	<8b38e92efb05d97f0587240a04505367@mac.com>
	<45B0FD35.7050708@isi.edu>
	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>
	<45B3AA8D.7070203@isi.edu>
	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>
	<45B4EFC2.3020408@isi.edu>
	<2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>
	<45B53546.4080507@isi.edu>
	<EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>
	<45B53F8C.409@isi.edu>
Message-ID: <2404896E-2714-4BBF-97C1-B1C952280F90@windriver.com>


On Jan 22, 2007, at 4:49 PM, Joe Touch wrote:

>
>
> David Borman wrote:
>>
>> On Jan 22, 2007, at 4:05 PM, Joe Touch wrote:
> ...
>>>> Ah, you are assuming that both the ethernet side and the 64K MTU  
>>>> side of
>>>> the path operate equally efficiently using small packets.
>>>
>>> source ---------------> splitter ----------------> dest
>>>            1500byte                64K byte
>>>
>>>
>>> You're claiming that the splitter is required to keep the 64Kbyte  
>>> side
>>> running at full rate. That means the 1500-byte side has to handle
>>> packets roughly 40x faster. Otherwise, the 64K byte side is not  
>>> running
>>> at high-rate.
>>>
>>> So here's what we have:
>>>     - dest can handle 64K but not 1500
>>>     - source must handle 1500 at high rate
>>>     - splitter must receive 1500 at high rate
>>>
>>> Now you're claiming that there's a link (source-splitter) that's
>>> efficient enough for small packets. If that's the case, why would we
>>> ever want the kind of link that's being used splitter-dest?
>>
>> If all you're ever going to do is talk through the splitter to remote
>> ethernet hosts, then yes, it'd be preferable to bring ethernet  
>> directly
>> to the host instead of using the 64K MTU network.  But you don't  
>> always
>> get what you want.  For various reasons it might not be possible to
>> bring ethernet directly to the hosts on the 64K network.
>
> In that case you're making a case for an outboard ethernet adapter,
> e.g., like the USB dongles.

If you can't bring in ethernet, you aren't going to have USB...

>
>> And while the
>> 64K MTU network may not be as efficient with 1500 byte packets as an
>> ethernet network, replacing it with an ethernet network might be  
>> slower
>> internally than the 64K MTU network.
>
> That's the part that's confusing. In order to warrant the splitter,  
> the
> ethernet side must keep up. But then you're saying here that the
> ethernet is slower.
>
> Either:
> ethernet keeps up
> 	which you need to assume to warrant a splitter,
> 	but which begs the question of why you have less capable
> 	net to the right
>
> ethernet doesn't keep up
> 	in which case aggregation doesn't help

Why is this not clear?  The overall capacity of the 64K network  
exceeds the capacity of the ethernet network.  So for large packets,  
the 64K network is better than the ethernet network.  But with the  
smaller ethernet sized packets, the 64K network is unable to make use  
of that capacity.  That's the scenario.


>
>> So the trade off is a faster 64K
>> network that works well with large packets but not ethernet sized
>> packets, vs. a slower ethernet network that works better with  
>> ethernet
>> sized packets, but doesn't have the overall capacity of the 64K  
>> network.
>
> In the latter case, you're not keeping up on the source-splitter side.
> Which means you don't need the splitter either to aggregate or to  
> buffer.

Huh?  I'm saying that if you could replace the 64K network with an
ethernet network, you'd improve the end-to-end performance from
hosts on that network to remote hosts without the need for the
splitter, but the cost of doing that is lower performance between
hosts that used to be on the 64K network.

>
>>> Again, this argues that something is seriously broken.
>>
>> Sometimes there isn't an optimal solution and you have to make hard
>> choices.  Just because it isn't the one you want doesn't mean  
>> things are
>> *broken* when you then try to mitigate the effects of those choices.
>
> It's still very unclear what effects this is mitigating.

I'm sorry that you don't understand it.  I've tried to be clear in my  
description of the scenario.

The throughput across the 64K network with 1500 byte packets is worse  
than ethernet.  The throughput across the 64K network with larger  
packets exceeds the throughput over ethernet.  And even without that  
issue, the typical window at the remote host across ethernet is not  
large enough for the delay*bandwidth end-to-end.  That's the  
scenario.  The splitter mitigates those issues without needing to add  
a new transport.
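
A sketch of the window part of that scenario, for data flowing from the
64K host to the remote ethernet host. Without the splitter the sender
sees the remote host's window over the whole round trip; with it, each
half is limited by its own window over its own round trip, and the
slower half sets the pace. The 16K and 512K windows are the figures
used earlier in this thread; both RTT values are assumptions, and link
capacity and congestion control are deliberately ignored.

```python
def window_limit_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput cap when at most one window is in flight per round trip."""
    return window_bytes * 8 / rtt_s

REMOTE_WINDOW = 16 * 1024      # offered by the remote ethernet host (from the thread)
SPLITTER_WINDOW = 512 * 1024   # offered by the splitter to the 64K host (from the thread)
RTT_ETHERNET_S = 0.002         # assumed: splitter <-> remote host
RTT_64K_NET_S = 0.050          # assumed: 64K host <-> splitter

# One connection end to end: the small window spans the whole path.
end_to_end = window_limit_bps(REMOTE_WINDOW, RTT_ETHERNET_S + RTT_64K_NET_S)

# Split at the splitter: each half has its own window and RTT.
split = min(window_limit_bps(SPLITTER_WINDOW, RTT_64K_NET_S),
            window_limit_bps(REMOTE_WINDOW, RTT_ETHERNET_S))

print(f"end-to-end cap: {end_to_end / 1e6:.1f} Mbit/s")
print(f"split cap:      {split / 1e6:.1f} Mbit/s")
```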

			-David Borman


From dpreed at reed.com  Tue Jan 23 09:13:39 2007
From: dpreed at reed.com (David P. Reed)
Date: Tue, 23 Jan 2007 12:13:39 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de>	<45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de>	<45AE7900.70400@isi.edu>	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>	<45AFB5A1.9030407@isi.edu>	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>	<8b38e92efb05d97f0587240a04505367@mac.com>	<45B0FD35.7050708@isi.edu>	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>	<45B3AA8D.7070203@isi.edu>	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>	<45B4EFC2.3020408@isi.edu>	<2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>	<45B53546.4080507@isi.edu>
	<EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>
Message-ID: <45B64243.8000109@reed.com>

This is a very strange debate.   One can (of course) develop an 
idiosyncratic protocol that works in just this case better than any 
other protocol.   The situation is not "broken" - just highly specific, 
the kind of thing that one encounters as a result of historical 
accidents, and most of the Internet infrastructure is full of historical 
accidents.

So are we accomplishing anything with this discussion?

I assert that all concerned are quite intelligent people.  So if the 
debate is just to measure your intellectual manhood against each other, 
perhaps a contest like "American Idol" would be a better place than here?

David Borman wrote:
>
> On Jan 22, 2007, at 4:05 PM, Joe Touch wrote:
>
>>
>>
>> David Borman wrote:
>>>> This is a contradiction: clearly the splitter needs to keep up with
>>>> receiving small packets at rate or it can't sustain emitting the large
>>>> packets at full speed. If the splitter can do this, then the 
>>>> destination
>>>> can. The fact that it doesn't means this is (by definition) a patch 
>>>> to a
>>>> broken system.
>>>
>>> Ah, you are assuming that both the ethernet side and the 64K MTU 
>>> side of
>>> the path operate equally efficiently using small packets.
>>
>> source ---------------> splitter ----------------> dest
>>            1500byte                64K byte
>>
>>
>> You're claiming that the splitter is required to keep the 64Kbyte side
>> running at full rate. That means the 1500-byte side has to handle
>> packets roughly 40x faster. Otherwise, the 64K byte side is not running
>> at high-rate.
>>
>> So here's what we have:
>>     - dest can handle 64K but not 1500
>>     - source must handle 1500 at high rate
>>     - splitter must receive 1500 at high rate
>>
>> Now you're claiming that there's a link (source-splitter) that's
>> efficient enough for small packets. If that's the case, why would we
>> ever want the kind of link that's being used splitter-dest?
>
> If all you're ever going to do is talk through the splitter to remote 
> ethernet hosts, then yes, it'd be preferable to bring ethernet 
> directly to the host instead of using the 64K MTU network.  But you 
> don't always get what you want.  For various reasons it might not be 
> possible to bring ethernet directly to the hosts on the 64K network.  
> And while the 64K MTU network may not be as efficient with 1500 byte 
> packets as an ethernet network, replacing it with an ethernet network 
> might be slower internally than the 64K MTU network.  So the trade off 
> is a faster 64K network that works well with large packets but not 
> ethernet sized packets, vs. a slower ethernet network that works 
> better with ethernet sized packets, but doesn't have the overall 
> capacity of the 64K network.
>
>>
>> Again, this argues that something is seriously broken.
>
> Sometimes there isn't an optimal solution and you have to make hard 
> choices.  Just because it isn't the one you want doesn't mean things 
> are *broken* when you then try to mitigate the effects of those choices.
>
>             -David Borman
>
>
>

From touch at ISI.EDU  Tue Jan 23 10:36:35 2007
From: touch at ISI.EDU (Joe Touch)
Date: Tue, 23 Jan 2007 10:36:35 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding
 window thread ; -))
In-Reply-To: <45B64243.8000109@reed.com>
References: <45A57D7A.6030505@isi.edu>	<0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>	<45A67E7D.4010609@isi.edu>
	<45A82323.30405@web.de>	<45AD05E4.5040200@isi.edu>
	<45AE7738.6070701@web.de>	<45AE7900.70400@isi.edu>	<77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com>	<45AFB5A1.9030407@isi.edu>	<B29726AD-D1CD-4429-A756-E8352CF443F6@windriver.com>	<8b38e92efb05d97f0587240a04505367@mac.com>	<45B0FD35.7050708@isi.edu>	<C6E91AAA-B1C9-4CF9-B4C2-5E8F9AF444B0@windriver.com>	<45B3AA8D.7070203@isi.edu>	<0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com>	<45B4EFC2.3020408@isi.edu>	<2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com>	<45B53546.4080507@isi.edu>
	<EA8C645D-4AD1-4A55-90B4-C51DAC9E79E0@windriver.com>
	<45B64243.8000109@reed.com>
Message-ID: <45B655B3.1050903@isi.edu>

FWIW...

David P. Reed wrote:
> This is a very strange debate.   One can (of course) develop an
> idiosyncratic protocol that works in just this case better than any
> other protocol.   The situation is not "broken" - just highly specific,
> the kind of thing that one encounters as a result of historical
> accidents, and most of the Internet infrastructure is full of historical
> accidents.

The key question IMO is whether this is a useful component of the
architecture or whether it is support for legacy systems. The latter
need not be something we propagate.

> So are we accomplishing anything with this discussion?

I thought we were deciding whether this accident was useful in general
or just for legacy. D. Borman and I have taken the rest of the
discussion off-line, though.

Joe

-- 
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment


From rbriscoe at jungle.bt.co.uk  Wed Jan 24 11:12:40 2007
From: rbriscoe at jungle.bt.co.uk (Bob Briscoe)
Date: Wed, 24 Jan 2007 19:12:40 +0000
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
 Internet?)
In-Reply-To: <Pine.LNX.4.44.0701131630110.22375-100000@gato.kotovnik.com
 >
References: <B833AEA9-3F5E-408A-999B-2FACD83458E4@mac.com>
Message-ID: <5.2.1.1.2.20070124100306.01875a30@pop3.jungle.bt.co.uk>

Vadim,

At 00:47 14/01/2007, Vadim Antonov wrote:

>Dado - ISPs are not interested in reducing amount of traffic; quite
>opposite. It is their product, and as any producer they are interested in
>increasing volume - if you remember Econ 101, in the long term the
>profitability of all kinds of businesses tends to converge to the same
>norm. (Business segments with higher-than-average ROI attract more
>investments - and competition, thus reducing profitability;
>underperforming segments lose capital and consequently have less
>competitive pressure, thus allowing increase in profitability).
>
>In the established markets, where the initial period of rapid growth (on
>the S-curve) is over, the only sustainable way to make more money and
>increase value of business shares is to increase volume.

Agreed (assuming by 'established' you mean highly competitive / commoditised)

>So it makes no sense for ISPs whatsoever to penalize users for causing
>congestion (thus reducing the demand). Instead, they want to encourage
>users to pay more for bigger share of the network resources - the
>congestion is their friend, if they can differentiate service (who would
>pay for premium service when regular service is quite good?)

[First, a caveat: I'm going to talk in terms of charging for congestion, as 
that's how your conversation started. However, limiting a customer's 
congestion is probably much more acceptable than charging for it, and I say 
below why the two are equivalent.]

Only on a superficial first look does it /seem/ to make no sense for ISPs 
to penalize users for causing congestion.

Econ 101 also says that a business doesn't want to supply a customer if the 
cost of supply is higher than that which the customer is willing to pay. 
Those customers willing to pay for congestion are saying "if you supplied 
more capacity I'd pay for it". Those unwilling to pay for congestion are 
saying "ok, you've hit my limit, I don't actually want more capacity so 
much that I'd be willing to pay as much as it will cost you to provide it".

The key to this is to understand that congestion charges /complement/ 
capacity subscription charges - it certainly wouldn't make sense to /only/ 
charge for congestion whilst not charging subscriptions. The idea isn't 
that an ISP adds congestion charges on top of subscriptions. Increasing one 
should reduce the other, so that overall the user pays the same. It's just 
a question of tying a proportion of the charge to the user's traffic behaviour.

In fact, if you'd attended the Econ 103 class ;) you would have been able 
to predict what the usage proportion will tend to in a competitive market. 
Let's say an ISP's costs are 60% capacity-related and 40% operational costs 
(faults, customer service, marketing, billing and so on). We're only 
concerned here with the 60%. It turns out that an ISP's most competitive 
strategy will be to get proportion p of its capacity-related revenues (the 
60%) from usage using this simple formula:

         p = 1/e

where e is the elasticity of scale. e measures how the cost of capacity 
flattens out as the ISP buys more (aka. economy of scale).

This formula comes from Hal Varian & Jeffrey MacKie-Mason's seminal 1995 
paper "Pricing Congestible Network Resources". It comes from optimising 
what an ISP would do in a scenario where ISPs all compete by charging the 
same in total, but varying the proportions due to usage vs. subscription. 
If you want to argue that pricing congestion makes no sense in networks, 
that's the paper to argue against. No-one has successfully done that, so 
good luck.

One (unpublished) study found the cost of optical capacity (interface cards 
and links) rose with capacity by about a square-root law, implying e=2. If 
everything were optical and the market was perfectly competitive, this 
would imply a successful ISP would aim to tie 30% of its revenue to usage 
(if capacity related costs are 60%, half should be usage).
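
As a quick check of that arithmetic, here is a minimal Python sketch. The 
function name usage_revenue_share is hypothetical; the inputs (60% 
capacity-related costs, e = 2) are just the illustrative figures used above, 
not measured data.

```python
def usage_revenue_share(capacity_cost_fraction, elasticity_of_scale):
    """Fraction of total revenue tied to usage, applying p = 1/e
    to the capacity-related part of the revenue only."""
    p = 1.0 / elasticity_of_scale
    return capacity_cost_fraction * p


# Illustrative figures from the text: 60% capacity-related costs, e = 2.
share = usage_revenue_share(0.6, 2.0)
print(f"{share:.0%}")  # 30%
```

With e closer to 1 (weak economies of scale) the usage-related share would 
approach the full 60%.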

So what's the intuition behind all this? I think you will agree ISPs will 
probably want to limit the amount of congestion one user can cause. 
Otherwise that customer reduces the value of the ISP's business for all the 
other customers. If the ISP doesn't limit each user's ability to cause 
congestion, customers will switch to another ISP that does. To be 
competitive, an ISP does well to aim for p=1/e.

Returning to my initial caveat: Why is limiting congestion equivalent to 
charging for it?

Many people prefer a 'fixed price contract' to 'pay as you go'. Basically 
the ISP would be saying "For your $10/month, you're getting up to X 
capacity and up to Y congestion." It's not actually saying X capacity costs 
$7 and Y congestion costs $3. But you would be able to infer the internal 
prices the ISP is using for X and Y if the same ISP also sells X capacity 
and 2Y congestion for say $13/month.
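
The inference can be written out as two linear equations. The sketch below 
assumes (my assumption, for illustration only) that the ISP prices both 
packages linearly, with the same capacity X in each; the function name is 
hypothetical.

```python
def infer_internal_prices(price_a, congestion_units_a, price_b, congestion_units_b):
    """Solve  capacity_price + n * congestion_price = package_price
    for two packages that include the same capacity X but different
    multiples n of the congestion allowance Y."""
    congestion_price = (price_b - price_a) / (congestion_units_b - congestion_units_a)
    capacity_price = price_a - congestion_units_a * congestion_price
    return capacity_price, congestion_price


# $10/month for X capacity + 1Y congestion, $13/month for X capacity + 2Y.
capacity_price, congestion_price = infer_internal_prices(10.0, 1, 13.0, 2)
print(capacity_price, congestion_price)  # 7.0 3.0
```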

Many people think congestion charging is all theoretical clap trap, but 
it's actually what is happening all around us already. In practice, 
though, an ISP can't measure how much congestion each user causes in 
other networks. So instead we see various attempts to limit congestion 
using other more convenient levers:
- Volume caps are one crude proxy for congestion.
- DPI against p2p is a really crude attempt to limit congestion.
- TCP congestion control is the nearest we have to a perfect example of 
congestion charging. Except it's a voluntary reduction in rate /as if/ the 
TCP algorithm were being charged for congestion. But it's certainly not 
perfect (my fairness-religion I-D that John W mentioned explains why TCP is 
myopic in time and myopic across flows).

BTW, I've posted a more convenient version of the paper in CCR On-line that 
prints in 10pp (not 32pp of I-D format bloat). I've also updated it to say 
specifically what's wrong with the fairness in TCP, TFRC, WFQ and XCP as 
examples:
<http://www.sigcomm.org/ccr/drupal/?q=node/166>


>Also, congested network is the network operating at full capacity -
>meaning that there is no overinvestment.  If a provider has underloaded
>network it, basically, means that its business people made a mistake and
>overinvested (driving ROI - and share prices - lower).

Congestion is the ratio of excess load to offered load (another way of 
saying loss rate). Loosely, you can think of congestion charges as the 
part of the charge that pays for the capacity needed to serve the traffic 
that isn't being served (what Econ 101 calls 'marginal cost of capacity'). 
Subscriptions recover past investment in capacity. Together they cover the 
average cost of capacity. In fact economists usually calculate elasticity 
of scale from

         1/e = marginal cost / average cost,

which is why 1/e = p, the proportion of congestion charge to total charge.
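
To see the ratio at work, the sketch below checks numerically that a 
square-root cost law gives marginal cost / average cost of about 0.5, i.e. 
e = 2. The cost function and its constant k are assumptions chosen to match 
the square-root example earlier, not data from any study.

```python
import math


def cost(capacity, k=1.0):
    """Assumed square-root cost law: cost grows as k * sqrt(capacity)."""
    return k * math.sqrt(capacity)


def marginal_over_average(capacity, step=1e-6):
    """Ratio of marginal cost (central-difference derivative) to average cost."""
    marginal = (cost(capacity + step) - cost(capacity - step)) / (2 * step)
    average = cost(capacity) / capacity
    return marginal / average


for capacity in (1.0, 10.0, 100.0):
    # The ratio stays near 0.5 at every scale, so 1/e = 0.5 and e = 2.
    print(capacity, round(marginal_over_average(capacity), 6))
```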


In summary, it makes absolute sense for ISPs to limit congestion, which is 
equivalent to setting aside part of the monthly charge as if they were 
charging for congestion.

Cheers


Bob


____________________________________________________________________________
Bob Briscoe, <bob.briscoe at bt.com>      Networks Research Centre, BT Research
B54/77 Adastral Park,Martlesham Heath,Ipswich,IP5 3RE,UK.    +44 1473 645196 


From dpreed at reed.com  Wed Jan 24 12:21:18 2007
From: dpreed at reed.com (David P. Reed)
Date: Wed, 24 Jan 2007 15:21:18 -0500
Subject: [e2e] why fair sharing? ( Are we doing sliding window in the
 Internet?)
In-Reply-To: <5.2.1.1.2.20070124100306.01875a30@pop3.jungle.bt.co.uk>
References: <B833AEA9-3F5E-408A-999B-2FACD83458E4@mac.com>
	<5.2.1.1.2.20070124100306.01875a30@pop3.jungle.bt.co.uk>
Message-ID: <45B7BFBE.7060505@reed.com>

Bob - nice analysis, but beware of simple models being viewed as complete.

The end user values more than just transport, which is all that is 
modeled in this notion of congestion.

"Choices" or "options" also matter to users - whether it is the 
perception that there are "500 channels" a la the US cable system vs. 
the British broadcasting model of a couple of gov't channels and a few 
more gov't granted monopolies called private channels - users will pay 
for choices that they may or may not exercise.

This provides a value to "switching" functions in networks.   The 
freedom to channel surf, or the freedom to assemble a web page from many 
sources, with a small switching latency matters.   But congestion 
directly blocks the ability to switch - it kills option value, and if 
option value is a large part of customer value, then congestion means 
that greedy users who don't value choice can kill value for other users.

The other point is that network infrastructure is at scale a dynamically 
priced thing.    If you study the other literature on "real options" 
(besides that which applies to R&D and network switching options) you 
will find that options or contingent value analysis is crucial to 
pricing such infrastructures as refineries, power plants, cable plants, 
etc. when faced with variable costs such as tooling, plant construction 
costs (think semiconductor fabs and Moore's Law estimates of demand 
opportunity).

So equilibrium economic models are helpful, but in fact contingent and 
dynamic economic models are far more important than easy analyses like 
these would imply.


From ihsanqazi at gmail.com  Wed Jan 24 17:04:26 2007
From: ihsanqazi at gmail.com (Ihsan Qazi)
Date: Wed, 24 Jan 2007 20:04:26 -0500
Subject: [e2e] TCP and bi-directional traffic
Message-ID: <f39ad110701241704u755e6800xe491e9c2cbd9ada3@mail.gmail.com>

Hi everyone,

I have a question on which I would like to get some comments.

To what extent do the current analytic models of TCP accurately capture the
(real) behaviour of TCP? Does there exist a body of work which analytically
characterizes TCP latency and throughput, taking into consideration the
effects of bi-directional traffic (factors like ACK compression, reduced
forward path capacity due to the presence of ACKs etc.) on TCP flows? I am
aware of some observational studies and some work related to mitigating
the effects of ACK compression and asymmetric links (e.g. prioritizing ACKs,
applying backpressure, connection-level bandwidth allocation schemes etc.)
but my question pertains to analytical work.

Thanks in advance.

Ihsan

-- 
Ihsan Ayyub Qazi
PhD Student, Department of Computer Science
6803 Sennott Square, University of Pittsburgh
Pittsburgh, PA 15260
WWW: http://www.cs.pitt.edu/~ihsan

From detlef.bosau at web.de  Mon Jan 29 12:48:47 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 29 Jan 2007 21:48:47 +0100
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
 indicator for congestion?
Message-ID: <45BE5DAF.5040701@web.de>

My apologies for this question, perhaps it's simple:

In TCP, lost / dropped packets are recognised as a congestion indicator.
We don't do so with missing ACKs.

Consider the following net:


     (downstream:)  T T T T T T T T T
Sender                                                           Receiver
     (upstream: )      AAAAAAAAAA     


Then the flow occupies the cumulated capacity of T(CP packets) and A(CK 
packets).

If CWND grows too large (by probing) and the available path capacity is 
exceeded, packet drop occurs.
If a TCP packet is dropped, this is recognized as a congestion 
indication. Shouldn't a dropped ACK packet be seen as a congestion 
indication as well?

Perhaps this question is a bit stupid, but I don't see the clue here at 
the moment. Perhaps someone could help me, please?

Thanks!

Detlef


From baruch at ev-en.org  Tue Jan 30 00:01:35 2007
From: baruch at ev-en.org (Baruch Even)
Date: Tue, 30 Jan 2007 10:01:35 +0200
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <45BE5DAF.5040701@web.de>
References: <45BE5DAF.5040701@web.de>
Message-ID: <20070130080135.GP22455@galon.ev-en.org>

* Detlef Bosau <detlef.bosau at web.de> [070129 23:29]:
> My apologies for this question, perhaps it's simple:
> 
> In TCP, lost / dropped packets are recognised as congestion indicator.
> We don't do so with missing ACKs.
> 
> Consider the following net:
> 
> 
>     (downstream:)  T T T T T T T T T
> Sender                                                           Receiver
>     (upstream: )      AAAAAAAAAA     
> 
> 
> Then the flow occupies the cumulated capacity of T(CP packets) and A(CK packets).
> 
> If CWND grows too large (by probing) and the available path capacity is exceeded, packet drop occurs.
> If a TCP packet is dropped, this is recognized as congestion indication. Shouldn't a dropped ACK packet be seen as 
> congestion indication as well?

How would you go about detecting that an ACK was lost?

TCP packet loss is detected by receiving repeated ACKs with the same
sequence number or by ACKs carrying SACK information. ACKs are not
necessarily sent for each TCP packet: delayed ACKs are in use all around the
net, and they usually acknowledge two or more packets. Linux sometimes
takes its time and was seen to ack 7 packets per ack.
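
As a rough illustration of the point, here is a minimal Python sketch of
duplicate-ACK loss detection as seen by the sender. It is a simplified model,
not any real stack's code: the function name is hypothetical, SACK is
ignored, and the threshold of three duplicates is the conventional
fast-retransmit value.

```python
DUPACK_THRESHOLD = 3  # conventional fast-retransmit threshold


def detect_loss(ack_numbers):
    """Return the ACK numbers at which the sender would infer a lost data segment."""
    inferred = []
    last_ack = None
    dup_count = 0
    for ack in ack_numbers:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUPACK_THRESHOLD:
                inferred.append(ack)
        else:
            # A new cumulative ACK resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return inferred


# Repeated ACKs for 2000 signal a missing data segment.
print(detect_loss([1000, 2000, 2000, 2000, 2000, 6000]))  # [2000]
# ACKs that jump ahead (delayed, coalesced, or lost ACKs) give no signal at all.
print(detect_loss([1000, 3000, 6000]))  # []
```

The second call is the interesting one: the sender has nothing to count when
an ACK simply never arrives.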

And then there is a (more important) question of why you would consider
a lost ACK to be a congestion event at all. A congestion event means
that we are pushing too much data through the link and we should slow
down, but the ACK packets normally carry no payload so the only
congestion signal should be in the direction that the payload is
flowing. Rarely, the protocol has bidirectional data transfers (the
lovely days of bimodem compared to zmodem!) and then congestion is
detected in each direction independently.

There are cases of asymmetric links that might cause trouble, but that
will only serve to slow down the payload direction as well since packets
are released to the network only when acks come back, so a lost ack will
already slow down the rate of the payload, just not by cutting the cwnd
to half.

Baruch

From baruch at ev-en.org  Tue Jan 30 01:58:17 2007
From: baruch at ev-en.org (Baruch Even)
Date: Tue, 30 Jan 2007 11:58:17 +0200
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <79EF774E-E9E6-48C0-9568-254A397CCD07@cisco.com>
References: <45BE5DAF.5040701@web.de> <20070130080135.GP22455@galon.ev-en.org>
	<79EF774E-E9E6-48C0-9568-254A397CCD07@cisco.com>
Message-ID: <20070130095817.GQ22455@galon.ev-en.org>

* Fred Baker <fred at cisco.com> [070130 10:52]:
> 
> On Jan 30, 2007, at 12:01 AM, Baruch Even wrote:
> 
> >There are cases of asymmetric links that might cause trouble, but that
> >will only serve to slow down the payload direction as well since packets
> >are released to the network only when acks come back, so a lost ack will
> >already slow down the rate of the payload, just not by cutting the cwnd
> >to half.
> 
> actually, one can argue that it speeds the payload up, or that it causes it to burst. If I have octets 10000..20000 
> outstanding, receive an ack for 10000-11999, and drop one for 12000-13999, and now receive an ack indicating that my 
> peer has received "through 15999", that looks to me like an ack for 12000-15999, and I should send a burst of that size.

I don't know about other OSes, but in Linux that's not the case. Linux will
send up to 3 packets IIRC. So if we get an ack for more than 3 packets
we will only send 3 packets and "lose" the extra credit for now.
Of course, the next packet that acks two packets will cause three packets
to be sent as well, so it does slow us down some and also causes larger
micro-bursts.
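
A toy model of that behaviour, in Python. This is only a sketch of the
description above, not the Linux implementation: the function name is
hypothetical, the cap of 3 is the "IIRC" figure, and I am assuming the
unused credit carries over to later ACKs.

```python
def simulate_burst_cap(acked_per_ack, max_burst=3):
    """For each incoming ACK, release at most max_burst segments.

    acked_per_ack: number of segments newly acknowledged by each ACK.
    Returns (segments sent per ACK, credit still unused at the end).
    """
    credit = 0
    sent_per_ack = []
    for newly_acked in acked_per_ack:
        credit += newly_acked
        sent = min(credit, max_burst)
        credit -= sent
        sent_per_ack.append(sent)
    return sent_per_ack, credit


# An ACK covering 7 segments, then one covering 2: three segments go out each time.
print(simulate_burst_cap([7, 2]))  # ([3, 3], 3)
```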

Baruch

From lachlan.andrew at gmail.com  Mon Jan 29 14:18:09 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Mon, 29 Jan 2007 14:18:09 -0800
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <45BE5DAF.5040701@web.de>
References: <45BE5DAF.5040701@web.de>
Message-ID: <aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>

Greetings Detlef,

On 29/01/07, Detlef Bosau <detlef.bosau at web.de> wrote:
>
> In TCP, lost / dropped packets are recognised as congestion indicator.
> We don't do so with missing ACKs.
>
> If a TCP packet is dropped, this is recognized as congestion
> indication. Shouldn't a dropped ACK packet be seen as congestion
> indication as well?

Because ACKs are cumulative, we don't know that separate ACKs were
sent for each packet.

For example, high-end NICs typically have "interrupt coalescence",
which delivers a large bunch of packets simultaneously to reduce CPU
overhead.  A single "fat ACK" is sent which cumulatively acknowledges
all of these packets.  This happens even when the receiver is not
congested.
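
A small Python sketch of the ambiguity (hypothetical function name, made-up
sequence numbers): a receiver that ACKs every packet but loses two ACKs on
the way back looks, to the sender, exactly like a receiver that sent one
"fat ACK".

```python
def acks_seen_by_sender(acks_sent, lost_indices=()):
    """Return the cumulative ACK numbers that actually reach the sender."""
    return [ack for i, ack in enumerate(acks_sent) if i not in lost_indices]


# Case 1: one ACK per packet, but the ACKs for 2000 and 3000 are dropped.
per_packet_with_loss = acks_seen_by_sender([1000, 2000, 3000, 4000], lost_indices={1, 2})
# Case 2: the receiver coalesces and sends a single fat ACK; nothing is lost.
coalesced_no_loss = acks_seen_by_sender([1000, 4000])

print(per_packet_with_loss == coalesced_no_loss)  # True
```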


Another factor is that ACKs are typically small compared with data
packets.  The total network throughput is much greater if we throttle
only the sources contributing most to a given link's congestion,
namely those sending full data packets over the link.

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603


From fred at cisco.com  Mon Jan 29 14:38:04 2007
From: fred at cisco.com (Fred Baker)
Date: Mon, 29 Jan 2007 14:38:04 -0800
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <45BE5DAF.5040701@web.de>
References: <45BE5DAF.5040701@web.de>
Message-ID: <BBC31B69-1C7A-410C-9B05-CA6E44DDF4B2@cisco.com>

missing acks are indeed an indicator of something, but it may not be  
forward path congestion.

In asymmetric circuits, for example, it is often an indicator of  
reverse path congestion. eg, if I have 100 KBPS up and 1000 KBPS  
down, I might use up the 100 KBPS before I use up the 1000 KBPS. Some  
research I read a few years back suggested that in such cases it  
might be interesting to use last-in-first-out queuing on the slower  
speed path, with a view to letting the later-and-more-inclusive ack  
get through first and eat the bypassed ones later, just to keep the  
forward path going. One of the criticisms of FAST TCP is that it is  
susceptible to reverse path congestion.

and in any event, I can think of many networks in which loss is an  
indicator of nothing more than loss. Just say "radio"...



From fred at cisco.com  Tue Jan 30 00:42:44 2007
From: fred at cisco.com (Fred Baker)
Date: Tue, 30 Jan 2007 00:42:44 -0800
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <20070130080135.GP22455@galon.ev-en.org>
References: <45BE5DAF.5040701@web.de> <20070130080135.GP22455@galon.ev-en.org>
Message-ID: <79EF774E-E9E6-48C0-9568-254A397CCD07@cisco.com>


On Jan 30, 2007, at 12:01 AM, Baruch Even wrote:

> There are cases of asymmetric links that might cause trouble, but that
> will only serve to slow down the payload direction as well since  
> packets
> are released to the network only when acks come back, so a lost ack  
> will
> already slow down the rate of the payload, just not by cutting the  
> cwnd
> to half.

actually, one can argue that it speeds the payload up, or that it  
causes it to burst. If I have octets 10000..20000 outstanding,  
receive an ack for 10000-11999, and drop one for 12000-13999, and now  
receive an ack indicating that my peer has received "through 15999",  
that looks to me like an ack for 12000-15999, and I should send a  
burst of that size.

From Jon.Crowcroft at cl.cam.ac.uk  Wed Jan 31 12:23:51 2007
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Wed, 31 Jan 2007 20:23:51 +0000
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: Message from "Lachlan Andrew" <lachlan.andrew@gmail.com> of "Mon,
	29 Jan 2007 14:18:09 PST."
	<aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com> 
Message-ID: <E1HCLzq-0004iZ-00@mta1.cl.cam.ac.uk>

it's clear we should devise a scheme for disguising data packets as acks
as they'd 
1/ advance the congestion window and so on
2/ get higher priority than data packets

otoh, how do we do this - compression, perhaps? how well would VJ's compressed
tcp/ip headers scale over multiple hops? interesting to think about state
recovery (a la nat state recovery) too...

also, what would happen if this was typical behaviour? virtual circuit IP?
MPLS on IP? who knows?

 cheers

   jon


From rewaskar at email.unc.edu  Wed Jan 31 13:02:41 2007
From: rewaskar at email.unc.edu (Sushant Rewaskar)
Date: Wed, 31 Jan 2007 16:02:41 -0500
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>
References: <45BE5DAF.5040701@web.de>
	<aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>
Message-ID: <002c01c7457b$2580f210$7a850298@cs.unc.edu>

Hi,
I agree with Lachlan. In TCP there is no way to know when an ack is lost as
it carries no "sequence number" of its own. (So in fact not only is it not
done, it cannot easily be done in the current set-up.) 

To get a better understanding of these issues you may want to read the
string of papers and RFCs on the Datagram Congestion Control Protocol (DCCP)
(http://www.read.cs.ucla.edu/dccp/ )  


Take care,
Sushant Rewaskar
-----------------------------
UNC Chapel Hill
www.cs.unc.edu/~rewaskar 
 



From L.Wood at surrey.ac.uk  Wed Jan 31 14:41:00 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Wed, 31 Jan 2007 22:41:00 +0000
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
 indicator for congestion?
In-Reply-To: <002c01c7457b$2580f210$7a850298@cs.unc.edu>
References: <45BE5DAF.5040701@web.de>
	<aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>
	<002c01c7457b$2580f210$7a850298@cs.unc.edu>
Message-ID: <200701312241.WAA09104@cisco.com>

At Wednesday 31/01/2007 16:02 -0500, Sushant Rewaskar wrote:
>Hi,
>I agree with Lachlan. In TCP there is no way to know when an ack is lost as
>it carries no "sequence number" of its own. 

It can - timestamps are used for disambiguation, and they disambiguate the acks. They can act as unique sequence numbers.

(In fact, you wouldn't naively issue a timestamp, and expect the other end to copy and reflect it in an ack, as that's open to a variety of DoS attacks. The sender would have a table of timestamp times, with unique keys for each timestamp, and the sender would send out and look for the key in the timestamp option field. To get a better understanding of these issues you may want to read RFC1323.)

It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for.

Stupid question: why is a missing ack presumed to automatically be due to congestion, rather than link errors along the path?

L.



From L.Wood at surrey.ac.uk  Wed Jan 31 14:43:03 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Wed, 31 Jan 2007 22:43:03 +0000
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
 indicator for congestion?
In-Reply-To: <E1HCLzq-0004iZ-00@mta1.cl.cam.ac.uk>
References: <E1HCLzq-0004iZ-00@mta1.cl.cam.ac.uk>
Message-ID: <200701312243.WAA09233@cisco.com>

At Wednesday 31/01/2007 20:23 +0000, Jon Crowcroft wrote:
>it's clear we should devise a scheme for disguising data packets as acks

which is what piggybacking acks on data packets already does.

(ns one-way tcp doesn't simulate this. Try Fulltcp.)


>as they'd
>1/ advance the congestion window and so on
>2/ get higher priority than data packets
>
>otoh, how do we do this - compression, perhaps? how well would VJ's compressed
>tcp/ip headers scale over multiple hops? interesting to think about state
>recovery (a la nat state recovery) too...
>
>also, what would happen if this was typical behaviour? virtual circuit IP?
>MPLS on IP? who knows?


who cares?


From acaro at bbn.com  Wed Jan 31 14:55:52 2007
From: acaro at bbn.com (Armando L. Caro, Jr.)
Date: Wed, 31 Jan 2007 17:55:52 -0500
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
 indicator for congestion?
In-Reply-To: <BBC31B69-1C7A-410C-9B05-CA6E44DDF4B2@cisco.com>
References: <45BE5DAF.5040701@web.de>
	<BBC31B69-1C7A-410C-9B05-CA6E44DDF4B2@cisco.com>
Message-ID: <45C11E78.4030203@bbn.com>

Fred Baker wrote:
> and in any event, I can think of many networks in which loss is an
> indicator of nothing more than loss. Just say "radio"...

That might not always be true. For simplicity, let's assume a single
wireless link in the end-to-end path. If that link does L2
retransmissions, loss on the radio channel will build up a queue at L2.
Now if the endpoints are seeing loss at L4, then that means the loss was
so bad that multiple L2 retransmissions were unsuccessful... which
implies a larger queue. Thus, the sender should back off, just as it
would if it experienced a loss on a wired network.
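
The queue argument can be put into rough numbers. The sketch below is only an illustration under assumptions that are not in the post: each L2 transmission attempt fails independently with probability p, the link gives up after max_tries attempts, and the function and parameter names are hypothetical.

```python
def l2_stats(p_attempt_loss: float, max_tries: int) -> tuple[float, float]:
    """Return (expected L2 attempts per frame, probability the frame is lost at L4).

    Assumes independent attempt failures and a fixed retry limit (both are
    simplifying assumptions, not properties of any particular radio link).
    """
    if not 0.0 <= p_attempt_loss < 1.0:
        raise ValueError("p_attempt_loss must be in [0, 1)")
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    # The frame is visible as a loss at L4 only if every attempt fails.
    residual_loss = p_attempt_loss ** max_tries
    # Attempt k happens only if the previous k-1 attempts failed:
    # sum_{k=1..n} p^(k-1) = (1 - p^n) / (1 - p)
    expected_attempts = (1.0 - residual_loss) / (1.0 - p_attempt_loss)
    return expected_attempts, residual_loss


def l2_utilisation(arrival_fps: float, attempts_per_s: float,
                   p_attempt_loss: float, max_tries: int) -> float:
    """Offered load on the link; a value above 1.0 means the L2 queue grows."""
    expected_attempts, _ = l2_stats(p_attempt_loss, max_tries)
    return arrival_fps * expected_attempts / attempts_per_s


if __name__ == "__main__":
    print(l2_stats(0.5, 4))
    print(l2_utilisation(600, 1000, 0.0, 4))
    print(l2_utilisation(600, 1000, 0.5, 4))
```

Under these assumptions, the same arrival rate that leaves the link lightly loaded on a clean channel overloads it once retransmissions multiply the work per frame, and that is also the regime in which residual loss becomes visible at L4.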

-- 
Armando


From L.Wood at surrey.ac.uk  Wed Jan 31 17:40:56 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 01 Feb 2007 01:40:56 +0000
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
 indicator for congestion?
In-Reply-To: <aa7d2c6d0701311634v1d8ac01cnfc19723f6902c00a@mail.gmail.com>
References: <45BE5DAF.5040701@web.de>
	<aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>
	<002c01c7457b$2580f210$7a850298@cs.unc.edu>
	<200701312241.WAA09104@cisco.com>
	<aa7d2c6d0701311634v1d8ac01cnfc19723f6902c00a@mail.gmail.com>
Message-ID: <200702010141.BAA15760@cisco.com>

At Wednesday 31/01/2007 16:34 -0800, Lachlan Andrew wrote:
>Greetings Lloyd,
>
>On 31/01/07, Lloyd Wood <L.Wood at surrey.ac.uk> wrote:
>>It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for.
>
>No, that was my point.  We can't distinguish between ACKs which are
>lost and those which are never sent in the first place.

Yes, we can. If a SACK block is present, it tells you which datagrams were and weren't received.

If a datagram was received, an ack was sent (modulo the delack mechanism), and the datagram will not be called out in the SACK block.

If the datagram wasn't received, this will be reflected in the SACK block.
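
As a rough illustration of reading this information off an ack: per RFC 2018, SACK blocks describe non-contiguous data the receiver does hold, so the missing ranges are the gaps between the cumulative ack and those blocks. The sketch below assumes half-open byte ranges, ignores sequence-number wraparound, and uses hypothetical names; it is not the API of any real TCP stack.

```python
def sack_holes(cum_ack: int,
               sack_blocks: list[tuple[int, int]],
               highest_sent: int) -> list[tuple[int, int]]:
    """Return half-open byte ranges [start, end) not yet reported as received.

    cum_ack      -- next byte expected by the receiver (cumulative ack)
    sack_blocks  -- (left_edge, right_edge) ranges the receiver holds
    highest_sent -- one past the last byte the sender has transmitted
    """
    holes = []
    cursor = cum_ack
    for left, right in sorted(sack_blocks):
        if right <= cursor:
            continue  # block is already covered by the cumulative ack
        if left > cursor:
            holes.append((cursor, left))  # gap before this block
        cursor = max(cursor, right)
    if cursor < highest_sent:
        # Tail range: may simply still be in flight rather than lost.
        holes.append((cursor, highest_sent))
    return holes


if __name__ == "__main__":
    print(sack_holes(1000, [(3000, 4000), (2000, 2500)], 5000))
```

Note that this tells the sender which data segments arrived; whether an individual ack was generated for each of them is a separate question that the SACK option by itself does not answer.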


>Also, having a unique identifier (like a timestamp) isn't the same as
>having sequence numbers which can say "We're (not) consecutive".  The
>latter can detect loss but the former can't.

If you have timestamps on every ack and packet, what's the difference?


>Cheers,
>Lachlan
>
>-- 
>Lachlan Andrew  Dept of Computer Science, Caltech
>1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
>Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From michael.welzl at uibk.ac.at  Wed Jan 31 23:14:15 2007
From: michael.welzl at uibk.ac.at (Michael Welzl)
Date: 01 Feb 2007 08:14:15 +0100
Subject: [e2e] Stupid Question: Why are missing ACKs not considered
	as	indicator for congestion?
In-Reply-To: <200702010141.BAA15760@cisco.com>
References: <45BE5DAF.5040701@web.de>
	<aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>
	<002c01c7457b$2580f210$7a850298@cs.unc.edu>
	<200701312241.WAA09104@cisco.com>
	<aa7d2c6d0701311634v1d8ac01cnfc19723f6902c00a@mail.gmail.com>
	<200702010141.BAA15760@cisco.com>
Message-ID: <1170314055.4775.12.camel@lap10-c703.uibk.ac.at>

> >On 31/01/07, Lloyd Wood <L.Wood at surrey.ac.uk> wrote:
> >>It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for.
> >
> >No, that was my point.  We can't distinguish between ACKs which are
> >lost and those which are never sent in the first place.
> 
> Yes, we can. If a SACK block is present, it tells you which datagrams were and weren't received.
> 
> If a datagram was received, an ack was sent (modulo the delack mechanism), and the datagram will not be called out in the SACK block.
> 
> If the datagram wasn't received, this will be reflected in the SACK block.
> 
> 
> >Also, having a unique identifier (like a timestamp) isn't the same as
> >having sequence numbers which can say "We're (not) consecutive".  The
> >latter can detect loss but the former can't.
> 
> If you have timestamps on every ack and packet, what's the difference?

I think that these methods of ACK loss detection are interesting
ideas, and there might be a way to intelligently combine them
with what's already in
http://www.icir.org/floyd/papers/draft-floyd-tcpm-ackcc-00d.txt

Cheers,
Michael


From lachlan.andrew at gmail.com  Wed Jan 31 16:34:51 2007
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 31 Jan 2007 16:34:51 -0800
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <200701312241.WAA09104@cisco.com>
References: <45BE5DAF.5040701@web.de>
	<aa7d2c6d0701291418xf8f715eu447b669ae977160b@mail.gmail.com>
	<002c01c7457b$2580f210$7a850298@cs.unc.edu>
	<200701312241.WAA09104@cisco.com>
Message-ID: <aa7d2c6d0701311634v1d8ac01cnfc19723f6902c00a@mail.gmail.com>

Greetings Lloyd,

On 31/01/07, Lloyd Wood <L.Wood at surrey.ac.uk> wrote:
> It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for.

No, that was my point.  We can't distinguish between ACKs which are
lost and those which are never sent in the first place.
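
A toy model of that ambiguity, as a sketch only: segments are numbered 1..n, the receiver emits a cumulative ACK after every ack_every in-order segments, and lost_acks names the ACKs dropped in the network. All names here are hypothetical.

```python
def acks_seen_by_sender(n_segments: int, ack_every: int,
                        lost_acks: frozenset = frozenset()) -> list[int]:
    """Cumulative ACK numbers that actually reach the sender."""
    # ACKs the receiver generates: one after every `ack_every` segments.
    generated = [seg for seg in range(1, n_segments + 1) if seg % ack_every == 0]
    # ACKs that survive the return path.
    return [ack for ack in generated if ack not in lost_acks]


if __name__ == "__main__":
    # Receiver ACKs every segment, but four ACKs are lost on the way back.
    per_segment_with_loss = acks_seen_by_sender(6, 1, frozenset({1, 2, 4, 5}))
    # Receiver coalesces: one "fat ACK" per three segments, nothing lost.
    coalesced_no_loss = acks_seen_by_sender(6, 3)
    print(per_segment_with_loss, coalesced_no_loss)
```

In this model both runs hand the sender the same ACK stream, so the cumulative ACK numbers alone cannot say which of the two happened.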

Also, having a unique identifier (like a timestamp) isn't the same as
having sequence numbers which can say "We're (not) consecutive".  The
latter can detect loss but the former can't.
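
To make the distinction concrete, here is a minimal sketch (hypothetical names, no wraparound handling): consecutive sequence numbers let the observer enumerate what is missing, whereas identifiers that are merely unique give no such gaps to inspect.

```python
def missing_by_sequence(received: list[int]) -> list[int]:
    """With consecutive sequence numbers, gaps identify the missing items."""
    if not received:
        return []
    seen = set(received)
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


if __name__ == "__main__":
    # Sequence numbers 3 and 4 are visibly absent.
    print(missing_by_sequence([1, 2, 5]))

    # Unique but non-consecutive identifiers (e.g. timestamp values):
    # two different sending histories produce the same observation.
    observed = [100, 130, 170]
    history_a = [100, 130, 170]        # nothing else was ever sent
    history_b = [100, 115, 130, 170]   # 115 was sent and lost
    print([t for t in history_a if t in observed] ==
          [t for t in history_b if t in observed])
```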

Cheers,
Lachlan

-- 
Lachlan Andrew  Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820    Fax: +1 (626) 568-3603

From fred at cisco.com  Wed Jan 31 23:32:41 2007
From: fred at cisco.com (Fred Baker)
Date: Wed, 31 Jan 2007 23:32:41 -0800
Subject: [e2e] Stupid Question: Why are missing ACKs not considered as
	indicator for congestion?
In-Reply-To: <45C11E78.4030203@bbn.com>
References: <45BE5DAF.5040701@web.de>
	<BBC31B69-1C7A-410C-9B05-CA6E44DDF4B2@cisco.com>
	<45C11E78.4030203@bbn.com>
Message-ID: <4D40438F-B763-4A16-83B5-6563A7357935@cisco.com>

Yes, there are cases in which loss means congestion in a radio circuit.
My point is that there are cases in which it means nothing of the kind.

On Jan 31, 2007, at 2:55 PM, Armando L. Caro, Jr. wrote:

> Fred Baker wrote:
>> and in any event, I can think of many networks in which loss is an
>> indicator of nothing more than loss. Just say "radio"...
>
> That might not always be true. For simplicity, let's assume a single
> wireless link in the end-to-end path. If that link does L2
> retransmissions, loss on the radio channel will build up a queue at L2.
> Now if the endpoints are seeing loss at L4, then that means the loss was
> so bad that multiple L2 retransmissions were unsuccessful... which
> implies a larger queue. Thus, the sender should back off, just as it
> would if it experienced a loss on a wired network.
>
> -- 
> Armando