From detlef.bosau at web.de Mon Jan 1 04:48:19 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 01 Jan 2007 13:48:19 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> Message-ID: <45990313.1050003@web.de> Fred Baker wrote: > > > I wonder where you got the notion that a typical session had a 10 ms > RTT. In a LAN environment where the servers are in the same It's just a number; what matters is the order of magnitude, not the exact value. In your examples below you have RTTs of about 20 ms, 100 ms and 200 ms. If we again consider one outstanding segment of 12000 bit, we get the following rates (approximately): 600 kbps, 120 kbps and 60 kbps. The question is not whether this is optimal. The question is: does this happen in a relevant number of cases? Particularly in downloads from mobile devices, which quite often do not offer higher bandwidth. So, to your question why TCP should tune itself to only one outstanding segment: the reason could be the limited bandwidth the node, e.g. a mobile node, can handle. Detlef From detlef.bosau at web.de Mon Jan 1 04:49:41 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 01 Jan 2007 13:49:41 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu> Message-ID: <45990365.4080708@web.de> Medhi, Deep wrote: > See > > John Heidemann, Katia Obraczka, and Joe Touch. "Modeling the Performance of HTTP Over Several Transport Protocols." ACM/IEEE Transactions on Networking, vol. 5, pp. 616-630, October, 1997. > > This covers maximum usable window size for different transmission media. > > -- Deep > Unfortunately, I don't have an ACM account. Is it possible to send me a copy? Perhaps Joe? Thanks a lot!
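Detlef's rates follow from a simple stop-and-wait model: with exactly one segment in flight, one segment is delivered per round trip, so throughput is segment size divided by RTT. The Python sketch below is a minimal illustration under stated assumptions (a 1500-byte, i.e. 12000-bit, segment; header overhead, serialization delay and loss ignored). The helper names are hypothetical. The second helper expresses the window Fred describes as optimal: how many segments must be outstanding to keep a link of a given rate busy, i.e. the bandwidth-delay product in whole segments.

```python
import math

SEGMENT_BITS = 1500 * 8  # assumption: Ethernet-sized segment, 12000 bit


def one_segment_rate_bps(rtt_s: float, segment_bits: int = SEGMENT_BITS) -> float:
    """Throughput with exactly one segment outstanding (stop-and-wait).

    One segment is delivered per round trip, so the rate is segment size
    divided by RTT. Header overhead and serialization delay are ignored.
    """
    return segment_bits / rtt_s


def segments_to_fill_pipe(link_bps: float, rtt_s: float,
                          segment_bits: int = SEGMENT_BITS) -> int:
    """Segments that must be in flight to keep a link of rate link_bps busy:
    the bandwidth-delay product rounded up to whole segments."""
    return math.ceil(link_bps * rtt_s / segment_bits)


if __name__ == "__main__":
    # RTTs used in this thread: 10 ms (Detlef), 20/100/200 ms (Fred's traceroutes).
    for rtt_ms in (10, 20, 100, 200):
        rate = one_segment_rate_bps(rtt_ms / 1000)
        print(f"RTT {rtt_ms:>3} ms: one segment in flight -> {rate / 1000:.0f} kbps")
```

Running it prints roughly 1200, 600, 120 and 60 kbps for RTTs of 10, 20, 100 and 200 ms. Conversely, `segments_to_fill_pipe(1_000_000, 0.2)` returns 17: under these assumptions a 1 Mbps path with a 200 ms RTT needs about 17 full-sized segments outstanding before a single flow can use it fully.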
From detlef.bosau at web.de Mon Jan 1 08:36:27 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 01 Jan 2007 17:36:27 +0100 Subject: [e2e] Thanks a lot for the copies! Re: Are we doing sliding window in the Internet? In-Reply-To: <45990365.4080708@web.de> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu> <45990365.4080708@web.de> Message-ID: <4599388B.2050904@web.de> I just received two copies of the paper. Many thanks to all! Detlef Detlef Bosau wrote: > Medhi, Deep wrote: >> See >> John Heidemann, Katia Obraczka, and Joe Touch. "Modeling the >> Performance of HTTP Over Several Transport Protocols." ACM/IEEE >> Transactions on Networking, vol. 5, pp. 616-630, October, 1997. >> This covers maximum usable window size for different transmission media. >> >> -- Deep >> > Unfortunately, I don't have an ACM account. Is it possible to send me > a copy? Perhaps Joe? > > Thanks a lot! > > From pingali at ISI.EDU Tue Jan 2 10:31:29 2007 From: pingali at ISI.EDU (Venkata Pingali) Date: Tue, 02 Jan 2007 10:31:29 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> Message-ID: <459AA501.8050901@isi.edu> A few months back we collected some per-connection data in both client and server modes. We thought you might be interested in the preliminary results. We collected data in two modes/configurations. In the client mode we configured Apache to be a web proxy and in the server mode we configured Apache to serve an actual website. The basic results, which must be only considered as being indicative/hints of the reality, are as follows: Server end (i.e, end that has large amount of data to transfer): - Most connections are short (90% < 1sec) - MaxCwnd is < 5KB in > 80% of cases - MaxRTT is distributed almost uniformly in the 0-400ms range.
Client end (i.e., the end receiving data): - ~ 90% of connections see MaxCwnd < 5KB - < 1% connections see MaxCwnd > 10KB - 90% of connections have MaxRTT < 100ms There are some problems with the data: - limited scenarios (web based) - small sample sizes (21K for server, 150K for client) - the website has non-standard distribution of file types and sizes You can find the various graphs here: http://www.isi.edu/aln/e2e.ppt Venkata Pingali http://www.isi.edu/aln Fred Baker wrote: > yes and no. > > A large percentage of sessions are very short - count the bytes in this > email and consider how many TCP segments are required to carry it, for > example, or look through your web cache to see the sizes of objects it > stores. We are doing the sliding window algorithm, but it cuts very > short when the TCP session abruptly closes. > > For longer exchanges - p2p and many others - yes, we indeed do sliding > window. > > I don't see any reason to believe that TCPs tune themselves to have > exactly RTT/MSS segments outstanding. That would be the optimal number > to have outstanding, but generally they will have the smallest of { the > offered window, the sender's maximum window, and the used window at > which they start dropping traffic }. If they never see loss, they can > keep an incredibly large amount of data outstanding regardless of the > values of RTT and MSS. > > I wonder where you got the notion that a typical session had a 10 ms > RTT. In a LAN environment where the servers are in the same building, > that is probably the case.
But consider these rather more typical > examples: across my VPN to a machine at work, across the US to MIT, and > across the Atlantic to you: > > [stealth-10-32-244-218:~] fred% traceroute irp-view7 > traceroute to irp-view7.cisco.com (171.70.65.144), 64 hops max, 40 byte > packets > 1 fred-vpn (10.32.244.217) 1.486 ms 1.047 ms 1.034 ms > 2 n003-000-000-000.static.ge.com (3.7.12.1) 22.360 ms 20.962 ms > 22.194 ms > 3 10.34.251.137 (10.34.251.137) 23.559 ms 22.586 ms 22.236 ms > 4 sjc20-a5-gw2 (10.34.250.78) 21.465 ms 22.544 ms 20.748 ms > 5 sjc20-sbb5-gw1 (128.107.180.105) 22.294 ms 22.351 ms 22.803 ms > 6 sjc20-rbb-gw5 (128.107.180.22) 21.583 ms 22.517 ms 24.190 ms > 7 sjc12-rbb-gw4 (128.107.180.2) 22.115 ms 23.143 ms 21.478 ms > 8 sjc5-sbb4-gw1 (171.71.241.253) 26.550 ms 23.122 ms 21.569 ms > 9 sjc12-dc5-gw2 (171.71.241.66) 22.115 ms 22.435 ms 22.185 ms > 10 sjc5-dc3-gw2 (171.71.243.46) 22.031 ms 21.846 ms 22.185 ms > 11 irp-view7 (171.70.65.144) 22.760 ms 22.912 ms 21.941 ms > > [stealth-10-32-244-218:~] fred% traceroute www.mit.edu > traceroute to www.mit.edu (18.7.22.83), 64 hops max, 40 byte packets > 1 fred-vpn (10.32.244.217) 1.468 ms 1.108 ms 1.083 ms > 2 172.16.16.1 (172.16.16.1) 11.994 ms 10.351 ms 10.858 ms > 3 cbshost-68-111-47-251.sbcox.net (68.111.47.251) 9.238 ms 19.517 ms > 9.857 ms > 4 12.125.98.101 (12.125.98.101) 11.849 ms 11.913 ms 12.086 ms > 5 gbr1-p100.la2ca.ip.att.net (12.123.28.130) 12.348 ms 11.736 ms > 12.891 ms > 6 tbr2-p013502.la2ca.ip.att.net (12.122.11.145) 15.071 ms 13.462 ms > 13.453 ms > 7 12.127.3.221 (12.127.3.221) 12.643 ms 13.761 ms 14.345 ms > 8 br1-a3110s9.attga.ip.att.net (192.205.33.230) 13.842 ms 12.414 ms > 12.647 ms > 9 ae-32-54.ebr2.losangeles1.level3.net (4.68.102.126) 16.651 ms > ae-32-56.ebr2.losangeles1.level3.net (4.68.102.190) 20.154 ms * > 10 * * * > 11 ae-2.ebr1.sanjose1.level3.net (4.69.132.9) 28.222 ms 24.319 ms > ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2) 35.417 ms > 12 ae-1-100.ebr2.sanjose1.level3.net 
(4.69.132.2) 25.640 ms 22.567 ms * > 13 ae-3.ebr1.denver1.level3.net (4.69.132.58) 52.275 ms 60.821 ms > 54.384 ms > 14 ae-3.ebr1.chicago1.level3.net (4.69.132.62) 68.285 ms > ae-1-100.ebr2.denver1.level3.net (4.69.132.38) 59.113 ms 68.779 ms > 15 * * * > 16 * ae-7-7.car1.boston1.level3.net (4.69.132.241) 94.977 ms * > 17 ae-7-7.car1.boston1.level3.net (4.69.132.241) 95.821 ms > ae-11-11.car2.boston1.level3.net (4.69.132.246) 93.856 ms > ae-7-7.car1.boston1.level3.net (4.69.132.241) 96.735 ms > 18 ae-11-11.car2.boston1.level3.net (4.69.132.246) 91.093 ms 92.125 > ms 4.79.2.2 (4.79.2.2) 95.802 ms > 19 4.79.2.2 (4.79.2.2) 93.945 ms 95.336 ms 97.301 ms > 20 w92-rtr-1-backbone.mit.edu (18.168.0.25) 98.246 ms www.mit.edu > (18.7.22.83) 93.657 ms w92-rtr-1-backbone.mit.edu (18.168.0.25) 92.610 ms > > [stealth-10-32-244-218:~] fred% traceroute web.de > traceroute to web.de (217.72.195.42), 64 hops max, 40 byte packets > 1 fred-vpn (10.32.244.217) 1.482 ms 1.078 ms 1.093 ms > 2 172.16.16.1 (172.16.16.1) 12.131 ms 9.318 ms 8.140 ms > 3 cbshost-68-111-47-251.sbcox.net (68.111.47.251) 10.790 ms 9.051 ms > 10.564 ms > 4 12.125.98.101 (12.125.98.101) 13.580 ms 21.643 ms 12.206 ms > 5 gbr2-p100.la2ca.ip.att.net (12.123.28.134) 12.446 ms 12.914 ms > 12.006 ms > 6 tbr2-p013602.la2ca.ip.att.net (12.122.11.149) 13.463 ms 12.711 ms > 12.187 ms > 7 12.127.3.213 (12.127.3.213) 185.324 ms 11.845 ms 12.189 ms > 8 192.205.33.226 (192.205.33.226) 12.008 ms 11.665 ms 25.390 ms > 9 ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65) 13.695 ms > ae-1-51.bbr1.losangeles1.level3.net (4.68.102.1) 11.645 ms > ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65) 12.517 ms > 10 ae-1-0.bbr1.frankfurt1.level3.net (212.187.128.30) 171.886 ms > as-2-0.bbr2.frankfurt1.level3.net (4.68.128.169) 167.640 ms 168.895 ms > 11 ge-10-0.ipcolo1.frankfurt1.level3.net (4.68.118.9) 170.336 ms > ge-11-1.ipcolo1.frankfurt1.level3.net (4.68.118.105) 174.211 ms > ge-10-1.ipcolo1.frankfurt1.level3.net (4.68.118.73) 169.730 
ms > 12 gw-megaspace.frankfurt.eu.level3.net (212.162.44.158) 169.276 ms > 170.110 ms 168.099 ms > 13 te-2-3.gw-backbone-d.bs.ka.schlund.net (212.227.120.17) 171.412 ms > 171.820 ms 170.265 ms > 14 a0kac2.gw-distwe-a.bs.ka.schlund.net (212.227.121.218) 175.416 ms > 173.653 ms 174.007 ms > 15 ha-42.web.de (217.72.195.42) 174.908 ms 174.921 ms 175.821 ms > > > On Dec 31, 2006, at 11:15 AM, Detlef Bosau wrote: > >> Happy New Year, Miss Sophy My Dear! >> >> (Although this sketch is in English, it is hardly known outside >> Germany, to my knowledge.) >> >> I wonder whether we're really doing sliding window in TCP connections >> all the time or whether a number of connections have congestion >> windows of only one segment, i.e. behave like stop'n wait in reality. >> >> When I assume an Ethernet-like MTU, i.e. 1500 byte = 12000 bit, and >> 10 ms RTT, the throughput is roughly 12000 bit / 10 ms = 1.2 Mbps. >> >> From this I would expect that in quite a few cases a TCP connection >> will have a congestion window of 1 MSS or even less. >> >> In addition, some weeks ago I read a paper (I don't remember where) >> arguing that we should reconsider and perhaps resize our MTUs to larger values >> for networks with large bandwidth. The rationale was simply as >> follows: the MTU size is always a tradeoff between overhead and >> jitter. From Ethernet we know that we can accept a maximum packet >> duration of 12000 bit / (10 Mbps) = 1.2 ms and the resulting jitter. >> For Gigabit Ethernet >> a maximum packet duration of 1.2 ms would result in an MTU size of 150 >> kbyte. >> >> If so, we would see "stop'n wait like" connections much more >> frequently than today. >> >> Is this view correct? >> From detlef.bosau at web.de Tue Jan 2 11:52:03 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 02 Jan 2007 20:52:03 +0100 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459AA501.8050901@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> Message-ID: <459AB7E3.7010705@web.de> Venkata Pingali wrote: > > > Server end (i.e, end that has large > amount of data to transfer): > > - Most connections are short (90% < 1sec) Do you have any knowledge of the number of "rounds" the TCP connection has seen? A couple of years ago I saw some similar result (don't know the source at the moment) where 90 % of connections consist of not more than 20 packets. Now, consider the initial slowstart, IIRC we start with 2 MSS (?) then we have:

Round  CWND
1      2
2      4
3      8     (total of 14 packets up to now)
4      16    (total of 30 packets up to now)

thus many flows will finish before the end of the fourth round, which would correspond to a CWND of about 6 kByte, 1500 byte MSS assumed. In short: quite a few connections are finished before the end of the first slow start period. Does this match your observations? > - MaxCwnd is < 5KB in > 80% of cases > - MaxRTT is distributed almost uniformly > in the 0-400ms range. > > Client end (i.e., the end receiving data): > > - ~ 90% of connections see MaxCwnd < 5KB > - < 1% connections see MaxCwnd > 10KB > - 90% of connections have MaxRTT < 100ms > Oh, I love it :-) Last year I had a long argument with someone who told me about the benefits of window scaling :-) He talked about extremely large CWNDs by several dozens or hundreds of MByte :-) O.k., that's a different story because we are talking about greedy sources then. However, if that colleague was the only one to activate window scaling while surfing from the US and A to good ol' Europe and Cisco et al.
had buried hundreds of megabytes of useless queue memory in their hardware *blush*, this guy perhaps filled the queues for the first time ever, following the good old paradigm "Keep the queue full", and that way of course hopelessly outperformed his competitors ;-) > There are some problems with the data: > > - limited scenarios (web based) > - small sample sizes (21K for server, 150K > for client) > - the website has non-standard distribution > of file types and sizes > At least it exists. And reality is often more convincing than standards, particularly in cases where the two disagree. > You can find the various graphs here: > http://www.isi.edu/aln/e2e.ppt Just a question: is it possible to export those slides to a commonly readable format like PDF? I don't have any M$ products in use here, and when I open PowerPoint slides with OpenOffice the results are sometimes interesting, sometimes surprising, sometimes hopeless, but nearly always quite different from what you wrote :-) Regards Detlef From pingali at ISI.EDU Tue Jan 2 12:29:55 2007 From: pingali at ISI.EDU (Venkata Pingali) Date: Tue, 02 Jan 2007 12:29:55 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459AB7E3.7010705@web.de> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> Message-ID: <459AC0C3.30103@isi.edu> Detlef Bosau wrote: > Venkata Pingali wrote: >> >> >> Server end (i.e, end that has large >> amount of data to transfer): >> >> - Most connections are short (90% < 1sec) > > Do you have any knowledge of the number of "rounds" the TCP connection > has seen? A couple of years ago I saw some similar result (don't know the > source at the moment) where 90 % of connections consist of not more than > 20 packets. Our sample shows that 94% of connections have < 20 packets - when observed from the server end.
Number of Packets   Percentile of Connections
3                   4%
4                   55%
5                   69%
10                  87%
20                  94%

I have included the new graph and generated PDFs. http://www.isi.edu/aln/e2e.pdf http://www.isi.edu/aln/e2e.ppt > > Now, consider the initial slowstart, IIRC we start with 2 MSS (?) then > we have: > > Round CWND > 1 2 > 2 4 > 3 8 > total of 14 packets up to now > 4 16 > total of 30 packets up to now, > > thus many flows will finish before the end of the fourth round, which > would correspond to a CWND of about 6 kByte, 1500 byte MSS assumed. > > In short: quite a few connections are finished before the end of > the first slow start period. > > Does this match your observations? Yes. About 90-95% finished before slow start completed - often within the first two round trips. About 3-4% of connections lasted for a long time (several secs - minutes). But there is an interesting category of connections that last beyond the slow start but not for very long. These connections, it turns out, carry a large chunk of the data (40+%), and most of the time in these connections is spent in slow start. > >> - MaxCwnd is < 5KB in > 80% of cases >> - MaxRTT is distributed almost uniformly >> in the 0-400ms range. >> >> Client end (i.e., the end receiving data): >> >> - ~ 90% of connections see MaxCwnd < 5KB >> - < 1% connections see MaxCwnd > 10KB >> - 90% of connections have MaxRTT < 100ms >> > > Oh, I love it :-) > > Last year I had a long argument with someone who told me about the > benefits of window scaling :-) He talked about extremely large CWNDs by > several dozens or hundreds of MByte :-) I don't know if it is correct to extrapolate from the sample that we have, but the MaxCwnd graph seems to plateau as the connection length increases (bytes or packets). > > O.k., that's a different story because we are talking about greedy > sources then. However, if that colleague was the only one to activate > window scaling while surfing from the US and A to good ol' Europe and > Cisco et al.
had buried hundreds of megabytes of useless queue memory in > their hardware *blush*, this guy perhaps filled the queues for the first time > ever, following the good old paradigm "Keep the queue full", and that > way of course hopelessly outperformed his competitors ;-) > >> There are some problems with the data: >> >> - limited scenarios (web based) >> - small sample sizes (21K for server, 150K >> for client) >> - the website has non-standard distribution >> of file types and sizes >> > > At least it exists. And reality is often more convincing than standards, > particularly in cases where the two disagree. > > >> You can find the various graphs here: >> http://www.isi.edu/aln/e2e.ppt > > Just a question: is it possible to export those slides to a commonly > readable format like PDF? I don't have any M$ products in use here, and > when I open PowerPoint slides with OpenOffice the results are sometimes > interesting, sometimes surprising, sometimes hopeless, but nearly always > quite different from what you wrote :-) > > Regards > > Detlef > From touch at ISI.EDU Tue Jan 2 16:14:50 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 02 Jan 2007 16:14:50 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459AB7E3.7010705@web.de> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> Message-ID: <459AF57A.5080304@isi.edu> Detlef Bosau wrote: > Venkata Pingali wrote: >> >> >> Server end (i.e, end that has large >> amount of data to transfer): >> >> - Most connections are short (90% < 1sec) > > Do you have any knowledge of the number of "rounds" the TCP connection > has seen? A couple of years ago I saw some similar result (don't know the > source at the moment) where 90 % of connections consist of not more than > 20 packets. > > Now, consider the initial slowstart, IIRC we start with 2 MSS (?)
then > we have: I don't know if the current code starts with 2 MSS; it could start with 4. > Round CWND > 1 2 > 2 4 > 3 8 > total of 14 packets up to now > 4 16 > total of 30 packets up to now, It doesn't double each RTT; it goes up by 50%. Remember, the window grows by one MSS each ACK during the initial phase, but there is one ACK for each two MSS's. I.e., the sequence should be:

round  CWND
1      2 (assuming it starts with 2)
2      3
3      4
4      6
5      9
6      13

This assumes that congestion avoidance hasn't kicked in, at which point the growth would be 1 MSS per round (RTT). FYI, Internet MSS's are usually in the 500-byte range. A 5KB file would take 10 packets and be over by the 4th round. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Tue Jan 2 18:55:05 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 02 Jan 2007 18:55:05 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> Message-ID: <459B1B09.40301@isi.edu> Lachlan Andrew wrote: > Greetings, > > On 02/01/07, Joe Touch wrote: >> >> Detlef Bosau wrote: >> > Round CWND >> > 1 2 >> > 2 4 >> > 3 8 >> >> It doesn't double each RTT; it goes up by 50%. Remember, the window >> grows by one MSS each ACK during the initial phase, but there is one ACK >> for each two MSS's. > > If you have ABC (as recent Linux senders do by default), or don't use ABC is EXPERIMENTAL. > delayed ACKs (as Linux receivers don't when the window is small), Delayed ACKs are strongly encouraged.
Both good reasons to fix these bugs in Linux. > Detlef was right that it doubles each RTT. Right - noncompliant or nonstandard implementations can do various other things. Joe From ian.mcdonald at jandi.co.nz Tue Jan 2 19:58:09 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Wed, 3 Jan 2007 16:58:09 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459B1B09.40301@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> Message-ID: <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com> > > If you have ABC (as recent Linux senders do by default), or don't use > > ABC is EXPERIMENTAL. > And ABC is now off by default on even later kernels as basically the congestion window didn't grow with how the whole code base interacted. Can't comment on the delayed acks as don't know that part of the code so well. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From touch at ISI.EDU Tue Jan 2 20:11:05 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 02 Jan 2007 20:11:05 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com> Message-ID: <459B2CD9.3030509@isi.edu> Ian McDonald wrote: >> > If you have ABC (as recent Linux senders do by default), or don't use >> >> ABC is EXPERIMENTAL. >> > And ABC is now off by default on even later kernels as basically the > congestion window didn't grow with how the whole code base interacted. That's not how "experimental" is intended by the IETF, i.e., it's not a patch to other bugs. -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From ian.mcdonald at jandi.co.nz Tue Jan 2 20:25:38 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Wed, 3 Jan 2007 17:25:38 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459B2CD9.3030509@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com> <459B2CD9.3030509@isi.edu> Message-ID: <5640c7e00701022025i1bf18875p2a6d77c374c0c12f@mail.gmail.com> On 1/3/07, Joe Touch wrote: > Ian McDonald wrote: > >> > If you have ABC (as recent Linux senders do by default), or don't use > >> > >> ABC is EXPERIMENTAL.
> >> > > And ABC is now off by default on even later kernels as basically the > > congestion window didn't grow with how the whole code base interacted. > > That's not how "experimental" is intended by the IETF, i.e., it's not a > patch to other bugs. > I understand that since I'm working on an experimental protocol myself. I'm only the messenger here. As I understand it (and I could be wrong) Linux deals with the cases fairly well that ABC is trying to solve. To get ABC into the kernel by default some of the other code would have to be changed and nobody has done that yet. If someone does that and can convince others it can go back in.. -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From touch at ISI.EDU Tue Jan 2 20:38:05 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 02 Jan 2007 20:38:05 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <5640c7e00701022025i1bf18875p2a6d77c374c0c12f@mail.gmail.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701021958w60fdd86cg8c94055dd495671f@mail.gmail.com> <459B2CD9.3030509@isi.edu> <5640c7e00701022025i1bf18875p2a6d77c374c0c12f@mail.gmail.com> Message-ID: <459B332D.4040302@isi.edu> Ian McDonald wrote: > On 1/3/07, Joe Touch wrote: >> Ian McDonald wrote: >> >> > If you have ABC (as recent Linux senders do by default), or don't >> use >> >> >> >> ABC is EXPERIMENTAL. >> >> >> > And ABC is now off by default on even later kernels as basically the >> > congestion window didn't grow with how the whole code base interacted. >> >> That's not how "experimental" is intended by the IETF, i.e., it's not a >> patch to other bugs. >> > I understand that since I'm working on an experimental protocol myself. > > I'm only the messenger here. 
As I understand it (and I could be wrong) > Linux deals with the cases fairly well that ABC is trying to solve. To > get ABC into the kernel by default some of the other code would have > to be changed and nobody has done that yet. If someone does that and > can convince others it can go back in.. ABC should NOT be "ON" by default. As to whether it should be in the kernel at all, or how it interacts with the code base, that's an implementation issue. I appreciate the complexities, but the decision of whether to use it or not should be made solely on whether it is recommended for widescale deployment or not. Thanks for the update; it's worrisome that Linux's defaults are that ephemeral, though. ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Tue Jan 2 22:07:48 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 02 Jan 2007 22:07:48 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> Message-ID: <459B4834.1050304@isi.edu> Lachlan Andrew wrote: > Greetings, > > This is probably not related to the original thread (on what happens > in real networks, as distinct from what *should* happen), but the word > "bug" bugged me... > > On 02/01/07, Joe Touch wrote: ... >> > delayed ACKs (as Linux receivers don't when the window is small), >> >> Delayed ACKs are strongly encouraged. >> Both good reasons to fix these bugs in Linux. > > I don't follow the logic of that at all. Please review RFC2581.
> Linux deliberately suppresses > delayed ACKs when it guesses that the sender is in slow start, which > seems generally correct, judging by the earlier posts in this thread. Whether it's interpreted as correct by this email list, it is NOT what the IETF currently recommends. > In that phase, they harm performance, by making slow-start even slower > than it was intended to be. Increasing the initial speed of slow > starts helps short flows at no long term cost to ongoing long flows. > When the window is large, Linux does use delayed ACKs, for the reasons > given in the RFCs. Since this is fully standards compliant, I don't > see how it can be called a bug. > > The fact that something is "encouraged" doesn't *of itself* seem a > good reason to do it, if there are clear reasons not to. That isn't > to say that there may not indeed be good reasons to change Linux's > behaviour; I'd be interested to hear them. I'd be more interested to know that there had been *controlled* experiments to validate that this behavior was safe and did not impact the current behavior of TCP congestion control as per RFC2581. At that point, I'd be interested to have that information taken to the IETF with a proposal to change the recommended behavior, and have it vetted by that community. The idea that this should be tried in the large "until there are good reasons not to" is NOT how such experiments should be performed. > (On a related note, this year's PFLDnet > has a panel session on the > implications of network stack implementors Linux and Microsoft setting > new de-facto flow control standards. This seems analogous to what the > BSD Reno release did, implementing improvements well before Reno made > it into the RFCs. The difference is that now a global infrastructure > rides on it...) The improvements in Reno were MORE conservative than TCP as specified, not less. Being more conservative is always compliant. Joe -- ---------------------------------------- Joe Touch Sr.
Network Engineer, USAF TSAT Space Segment From detlef.bosau at web.de Wed Jan 3 03:13:10 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 03 Jan 2007 12:13:10 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> Message-ID: <459B8FC6.1040208@web.de> Lachlan Andrew wrote: > Greetings, > > On 02/01/07, Joe Touch wrote: >> >> Detlef Bosau wrote: >> > Round CWND >> > 1 2 >> > 2 4 >> > 3 8 >> >> It doesn't double each RTT; it goes up by 50%. Remember, the window >> grows by one MSS each ACK during the initial phase, but there is one ACK >> for each two MSS's. > > If you have ABC (as recent Linux senders do by default), or don't use > delayed ACKs (as Linux receivers don't when the window is small), > Detlef was right that it doubles each RTT. > > $0.02 > Lachlan > Just before I'm to end my life on "Yellow Mama" ....... ;-) I admit that I often forget to mention all my assumptions. And even more, I don't have all the RFCs in mind, particularly not RFC 3390, which Joe has in mind when he talks of an initial window of 4 MSS. When I do NS2 simulations, I mostly turn off delayed ACKs for my purposes at the moment. From the congavoid paper, I understand that the intention was to double CWND each round if the sender is in slow start state and to increase it by 1 MSS each round when the sender is in congestion avoidance state. From my understanding it is not necessary for the AIMD scheme to work that this doubling/increasing happens every or every other round.
Of course, it affects the convergence time. I'm talking too much. Please forgive me if I fail to mention all my assumptions ... Detlef From touch at ISI.EDU Wed Jan 3 08:20:27 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 08:20:27 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459B8FC6.1040208@web.de> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B8FC6.1040208@web.de> Message-ID: <459BD7CB.3080300@isi.edu> Detlef Bosau wrote: ... > Just before I'm to end my life on "Yellow Mama" ....... ;-) > > I admit that I often forget to mention all my assumptions. And even > more, I don't have all the RFCs in mind, particularly not rfc 3390, > which Joe has in mind when he talks of an initial window of 4 MSS. > > When I do NS2 simulations, I mostly turn off delayed ACKs for my > purposes at the moment. > > From the congavoid paper, I understand that the intention was to double > CWND each round if the sender is in slow start state and to increase it > by 1 MSS each round when the sender is in congestion avoidance state. The original intention was to double it, but since delayed ACKs that hasn't been the case. The current AI is 1.5x in slowstart, and has been for quite a long time. > From my understanding it is not necessary for the AIMD scheme to work > that this doubling/increasing happens every or every other round. > Of course, it affects the convergence time. It also affects fairness when different connections use different factors, either for AI or MD. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From lachlan.andrew at gmail.com Tue Jan 2 17:49:53 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Tue, 2 Jan 2007 17:49:53 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459AF57A.5080304@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> Message-ID: Greetings, On 02/01/07, Joe Touch wrote: > > Detlef Bosau wrote:
> > Round CWND
> > 1 2
> > 2 4
> > 3 8
> > It doesn't double each RTT; it goes up by 50%. Remember, the window > grows by one MSS each ACK during the initial phase, but there is one ACK > for each two MSS's. If you have ABC (as recent Linux senders do by default), or don't use delayed ACKs (as Linux receivers don't when the window is small), Detlef was right that it doubles each RTT. $0.02 Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From lachlan.andrew at gmail.com Tue Jan 2 21:15:27 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Tue, 2 Jan 2007 21:15:27 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459B1B09.40301@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> Message-ID: Greetings, This is probably not related to the original thread (on what happens in real networks, as distinct from what *should* happen), but the word "bug" bugged me... On 02/01/07, Joe Touch wrote: > > ABC is EXPERIMENTAL. Fair enough.
I've just noticed that the default in 2.6.18 has been changed to "off", possibly as a result of their experiments :) > > delayed ACKs (as Linux receivers don't when the window is small), > > Delayed ACKs are strongly encouraged. > Both good reasons to fix these bugs in Linux. I don't follow the logic of that at all. Linux deliberately suppresses delayed ACKs when it guesses that the sender is in slow start, which seems generally correct, judging by the earlier posts in this thread. In that phase, they harm performance, by making slow-start even slower than it was intended to be. Increasing the initial speed of slow starts helps short flows at no long term cost to ongoing long flows. When the window is large, Linux does use delayed ACKs, for the reasons given in the RFCs. Since this is fully standards compliant, I don't see how it can be called a bug. The fact that something is "encouraged" doesn't *of itself* seem a good reason to do it, if there are clear reasons not to. That isn't to say that there may not indeed be good reasons to change Linux's behaviour; I'd be interested to hear them. (On a related note, this year's PFLDnet has a panel session on the implications of network stack implementors Linux and Microsoft setting new de-facto flow control standards. This seems analogous to what the BSD Reno release did, implementing improvements well before Reno made it into the RFCs. The difference is that now a global infrastructure rides on it...) Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From touch at ISI.EDU Wed Jan 3 11:04:51 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 11:04:51 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> Message-ID: <459BFE53.5070605@isi.edu> Lachlan Andrew wrote: > Greetings Joe, > > On 02/01/07, Joe Touch wrote: >> The improvements in Reno were MORE conservative than TCP as specified, >> not less. Being more conservative is always compliant. > > Correct me if I'm wrong again, but I thought that RFC 1122 mandated > following Jacobson'88, which specifies that packet > loss, as indicated by timeout, should result in setting the CWND to > its initial small value. I also thought that Reno retransmits before > timeout (less conservative) and consequently only halves the window > (less conservative). > > If the changes made transmission slower, why were they adopted? If > they made it faster, perhaps I'm misinterpreting "conservative". Reno came out at roughly the same time as RFC1122; when I say "as specified", I mean as _specified_ at the time, which was just RFC793 (in this regard, not including Nagle). It's worth considering that the Internet of 1990 wasn't what it is today either. Such experiments had much more limited impact on the international, commercial, and public community at that time. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From Anil.Agarwal at viasat.com Wed Jan 3 13:14:20 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Wed, 3 Jan 2007 16:14:20 -0500 Subject: [e2e] Are we doing sliding window in the Internet?
References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com> Joe et al., To add to this discussion, I just did a few quick tests with a Linux 2.6.18 TCP stack over an (emulated) satellite link. Here are my observations, based on analyzing the packet trace:
1. The sender starts with an initial cwnd of 3 segments, 1448 bytes each (1448 = 1500 - 40 bytes TCP/IPv4 hdr - 12 bytes TCP timestamp option).
2. The receiver acks every segment for the first 32 kbytes of received data; subsequently, it acks every other segment (delayed ack).
3. The sender increases cwnd by 1 segment for every ack (ABC is not used).
The cwnd values are as follows:
Round  cwnd
1      3 segments
2      6
3      10 (for some reason, the sender does not increase cwnd by 6 in this round)
4      16 (the 32 kbyte threshold is crossed in this round, so the cwnd increase rate halves)
These are close to the values described by Detlef. A 50 kbyte transfer finishes in 5 RTTs (including one for the SYN exchange). A quick test on a Sun Solaris 5.8 machine shows the 50 kbyte transfer taking 7 RTTs, which is consistent with an implementation that always uses delayed acks. Questions:
1. Is this what the Linux TCP stack implementors intended? Is this documented somewhere?
2. Does this violate any IETF TCP principle, in letter or spirit? It seems to have an (unfair) advantage over TCP implementations that always perform delayed ack.
Anil ------------ Anil Agarwal ViaSat Inc. Germantown, MD ________________________________ From: end2end-interest-bounces at postel.org on behalf of Joe Touch Sent: Wed 1/3/2007 1:07 AM To: l.andrew at ieee.org Cc: end2end-interest at postel.org Subject: Re: [e2e] Are we doing sliding window in the Internet?
Lachlan Andrew wrote: > Greetings, > > This is probably not related to the original thread (on what happens > in real networks, as distinct from what *should* happen), but the word > "bug" bugged me... > > On 02/01/07, Joe Touch wrote: ... >> > delayed ACKs (as Linux receivers don't when the window is small), >> >> Delayed ACKs are strongly encouraged. >> Both good reasons to fix these bugs in Linux. > > I don't follow the logic of that at all. Please review RFC2581. > Linux deliberately suppresses > delayed ACKs when it guesses that the sender is in slow start, which > seems generally correct, judging by the earlier posts in this thread. Whether it's interpreted as correct by this email list, it is NOT what the IETF currently recommends. > In that phase, they harm performance, by making slow-start even slower > than it was intended to be. Increasing the initial speed of slow > starts helps short flows at no long term cost to ongoing long flows. > When the window is large, Linux does use delayed ACKs, for the reasons > given in the RFCs. Since this is fully standards compliant, I don't > see how it can be called a bug. > > The fact that something is "encouraged" doesn't *of itself* seem a > good reason to do it, if there are clear reasons not to. That isn't > to say that there may not indeed be good reasons to change Linux's > behaviour; I'd be interested to hear them. I'd be more interested to know that there had been *controlled* experiments to validate that this behavior was safe and did not impact the current behavior of TCP congestion control as per RFC2581. At that point, I'd be interested to have that information taken to the IETF with a proposal to change the recommended behavior, and have it vetted by that community. The idea that this should be tried in the large "until there are good reasons not to" is NOT how such experiments should be performed.
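The effect of suppressing delayed ACKs on a short transfer can be estimated with a toy model. The Python sketch below is hypothetical: `rounds_to_deliver` and its parameters are illustrative names, and it assumes a 50 kbyte object of 50 * 1024 bytes, an MSS of 1448 bytes and an initial window of 3 segments (the values in Anil's Linux 2.6.18 trace above), a quick-ACK phase covering the first 32 kbytes, no loss and no receive-window limit. It counts data round trips only, so one RTT must be added for the SYN exchange.

```python
import math


def rounds_to_deliver(total_segments, initial_cwnd, quickack_segments=0):
    """Count data round trips needed to deliver `total_segments` segments.

    Simplified model (assumptions, not measured behaviour):
    - the receiver ACKs each of the first `quickack_segments` segments
      individually, then ACKs every second segment (delayed ACK);
    - an odd segment left at the end of a round is ACKed by the
      delayed-ACK timer before the next round starts;
    - the sender adds one segment to cwnd per ACK (no ABC);
    - no loss, no receive-window limit.
    """
    cwnd = initial_cwnd
    delivered = 0
    rounds = 0
    while delivered < total_segments:
        rounds += 1
        burst = min(cwnd, total_segments - delivered)
        acks = 0
        unacked = 0  # segments held back by the delayed-ACK rule
        for seq in range(delivered + 1, delivered + burst + 1):
            if seq <= quickack_segments:
                acks += 1          # quick ACK: one ACK per segment
            else:
                unacked += 1
                if unacked == 2:   # delayed ACK: one ACK per two segments
                    acks += 1
                    unacked = 0
        if unacked:
            acks += 1              # timer-driven ACK for the odd segment
        delivered += burst
        cwnd += acks
    return rounds


if __name__ == "__main__":
    mss = 1448
    segments = math.ceil(50 * 1024 / mss)  # 36 segments for a 50 kbyte object
    quick = 32 * 1024 // mss               # 22 segments in the first 32 kbytes
    print(rounds_to_deliver(segments, 3, quickack_segments=quick))  # 4
    print(rounds_to_deliver(segments, 3))                           # 5
```

In this model the receiver that ACKs every segment for the first 32 kbytes finishes in 4 data rounds (5 RTTs with the handshake) and the always-delayed-ACK receiver in 5. It is cruder than a real stack: it reaches a cwnd of 12 in the third round where Anil's trace showed 10, and it makes no attempt to reproduce the Solaris measurement.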
> (On a related note, this year's PFLDnet > has a panel session on the > implications of network stack implementors Linux and Microsoft setting > new de-facto flow control standards. This seems analogous to what the > BSD Reno release did, implementing improvements well before Reno made > it into the RFCs. The difference is that now a global infrastructure > rides on it...) The improvements in Reno were MORE conservative than TCP as specified, not less. Being more conservative is always compliant. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From ian.mcdonald at jandi.co.nz Wed Jan 3 13:15:43 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Thu, 4 Jan 2007 10:15:43 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> Message-ID: <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> On 1/3/07, Lachlan Andrew wrote: > Greetings, > > This is probably not related to the original thread (on what happens > in real networks, as distinct from what *should* happen), but the word > "bug" bugged me... > > On 02/01/07, Joe Touch wrote: > > > > ABC is EXPERIMENTAL. > > Fair enough.
I've just noticed that the default in 2.6.18 has been > changed to "off", possibly as a result of their experiments :) > Yes - see http://www.google.com/custom?domains=www.spinics.net&q=%22high+latency+with+tcp+connections%22&sa=Search&sitesearch=www.spinics.net&client=pub-3422782820843221&forid=1&ie=ISO-8859-1&oe=ISO-8859-1&cof=GALT%3A%23003324%3BGL%3A1%3BDIV%3A%2373B59C%3BVLC%3AFF6600%3BAH%3Acenter%3BBGC%3AC5DBCF%3BLBGC%3A66CC99%3BALC%3A330033%3BLC%3A330033%3BT%3A000000%3BGFNT%3A333300%3BGIMP%3A333300%3BFORID%3A1%3B&hl=en-- The thread is messy though, so here is probably the most relevant part. Main message is from Dave Miller replying to Stephen Hemminger: > On Fri, 1 Sep 2006 01:46:35 +0400 > Alexey Kuznetsov wrote: > > > > Expecting any performance with one-byte writes is silly. > > > > I am not sure why you are so confident about status of ABC. > > I missed the discussions, when it was implemented. Apparently, > > it was noticed that ABC in its pure form does not make sense > > with snd_cwnd counted in packets and there were some reasons, > > why it still was not adapted. > > I implemented it but don't think ABC is the correct thing to be doing > in all cases. > > If you read the RFC3465, the problem it is trying to address is that of > small packets causing growth of congestion window beyond the capacity > of the link. > > It makes a number of assumptions that may not be true for Linux: > * ABC doesn't take into account congestion window validation RFC2861 > already prevents most of the problem of inflated growth. > * ABC assumes that the "true" capacity of the link is limited by > byte count not packet count. It seems to me that the things gained by ABC are twofold:
1) protection against ACK division
2) a way to take delayed ACKs into account for cwnd growth
Both of which can be obtained by simply validating the ACK against the retransmit queue, returning number of true packets ACK'd.
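The ACK-crediting rules under discussion can be compared with a short Python sketch. It is an editorial illustration, not Linux code: `cwnd_gain`, its rule names and the 1448-byte MSS are assumptions, cwnd is tracked in segments, and the "validated" rule is only one possible reading of the suggestion to validate each ACK against the retransmit queue and count the whole packets it acknowledges.

```python
import bisect


def cwnd_gain(ack_numbers, segment_ends, rule, mss=1448, abc_limit=2):
    """Slow-start cwnd increase (in segments) credited for a series of ACKs.

    ack_numbers:  cumulative ACK values (byte offsets) in arrival order.
    segment_ends: end offset of each segment in the retransmit queue, sorted.
    rule:
      "per_ack"   - classic packet counting: +1 segment per new ACK;
      "abc"       - byte counting in the style of RFC 3465: bytes newly
                    acked, capped at abc_limit * mss per ACK;
      "validated" - hypothetical reading of the retransmit-queue idea:
                    count only whole segments the ACK newly covers.
    """
    gain = 0.0
    last_ack = 0
    for ack in ack_numbers:
        if ack <= last_ack:
            continue  # duplicate or old ACK: no slow-start credit
        if rule == "per_ack":
            gain += 1
        elif rule == "abc":
            gain += min(ack - last_ack, abc_limit * mss) / mss
        elif rule == "validated":
            # bisect_right counts segment ends <= ack, i.e. fully acked segments.
            before = bisect.bisect_right(segment_ends, last_ack)
            after = bisect.bisect_right(segment_ends, ack)
            gain += after - before
        else:
            raise ValueError(f"unknown rule: {rule}")
        last_ack = ack
    return gain


if __name__ == "__main__":
    division = [362, 724, 1086, 1448]  # one 1448-byte segment ACKed in four pieces
    delayed = [2896]                   # one ACK covering two full-sized segments
    tiny_ends = list(range(1, 11))     # ten 1-byte segments ...
    tiny_acks = [2, 4, 6, 8, 10]       # ... ACKed every second segment
    for rule in ("per_ack", "abc", "validated"):
        print(rule,
              cwnd_gain(division, [1448], rule),
              cwnd_gain(delayed, [1448, 2896], rule),
              round(cwnd_gain(tiny_acks, tiny_ends, rule), 4))
```

With these inputs, ACK division earns 4 segments under per-ACK counting but 1 under either byte counting or validation; a delayed ACK covering two full-sized segments earns 1, 2 and 2 respectively; and ten 1-byte segments earn 5 and 10 under the packet-based rules but well under one segment under byte counting, which is the byte-versus-packet trade-off raised in the quoted message.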
I would even go so far as to suggest that we should drop ACKs which do not fall on packetization boundaries. Perhaps only when not in LOSS state, but I doubt that this matters in practice. Cases where mid-packet ACK is valid are truly marginal ones involving repacketization wrt. MSS/MTU changes, and these would self-correct eventually. I agree that ABC has some problems. Solution is good, implementation is just horrible :-) From ian.mcdonald at jandi.co.nz Wed Jan 3 13:46:18 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Thu, 4 Jan 2007 10:46:18 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> Message-ID: <5640c7e00701031346r14fa0d88u1b370cc08631a799@mail.gmail.com> > > I would even go so far as to suggest that we should drop ACKs which do > > not fall on packetization boundaries. > > Interesting suggestion. Would TSO be a problem? You'd have to make > sure that the card never got "creative" and put the boundaries where > we don't expect. > I don't know, as I'm not an expert here - just cross-posting the discussions. You can always email Dave Miller, who made the suggestion. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From weddy at grc.nasa.gov Wed Jan 3 13:48:11 2007 From: weddy at grc.nasa.gov (Wesley Eddy) Date: Wed, 3 Jan 2007 16:48:11 -0500 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459B4834.1050304@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> Message-ID: <20070103214811.GA27322@grc.nasa.gov> On Tue, Jan 02, 2007 at 10:07:48PM -0800, Joe Touch wrote: > > > Lachlan Andrew wrote: > > Greetings, > > > > This is probably not related to the original thread (on what happens > > in real networks, as distinct from what *should* happen), but the word > > "bug" bugged me... > > > > On 02/01/07, Joe Touch wrote: > ... > >> > delayed ACKs (as Linux receivers don't when the window is small), > >> > >> Delayed ACKs are strongly encouraged. > >> Both good reasons to fix these bugs in Linux. > > > > I don't follow the logic of that at all. > > Please review RFC2581. > The exact wording in RFC 2581 says that ACKs should be sent "at least" for every 2 packets, which allows for an ACK to be sent for every packet, as Linux does when it assumes the other side is in slow start. I believe the Linux behavior is perfectly allowable under the letter of RFC 2581. I do not consider this behavior buggy whatsoever. One separate thing to note with regards to ABC is that the RFC2581bis document in TCPM right now RECOMMENDS to increase CWND by the number of bytes ACKed during slow-start - i.e. ABC is RECOMMENDED by that document intended as an update to RFC 2581. -- Wesley M. Eddy Verizon Federal Network Systems From touch at ISI.EDU Wed Jan 3 14:08:32 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 14:08:32 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070103214811.GA27322@grc.nasa.gov> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> Message-ID: <459C2960.7030407@isi.edu> Wesley Eddy wrote: > On Tue, Jan 02, 2007 at 10:07:48PM -0800, Joe Touch wrote: >> >> Lachlan Andrew wrote: >>> Greetings, >>> >>> This is probably not related to the original thread (on what happens >>> in real networks, as distinct from what *should* happen), but the word >>> "bug" bugged me... >>> >>> On 02/01/07, Joe Touch wrote: >> ... >>>>> delayed ACKs (as Linux receivers don't when the window is small), >>>> Delayed ACKs are strongly encouraged. >>>> Both good reasons to fix these bugs in Linux. >>> I don't follow the logic of that at all. >> Please review RFC2581. > > The exact wording in RFC 2581 says that ACKs should be sent "at least" for > every 2 packets, which allows for an ACK to be sent for every packet, as > Linux does when it assumes the other side is in slow start. I believe the > Linux behavior is perfectly allowable under the letter of RFC 2581. I do > not consider this behavior buggy whatsoever. The exact wording from 2581: The delayed ACK algorithm specified in [Bra89] SHOULD be used by a TCP receiver. When used, a TCP receiver MUST NOT excessively delay acknowledgments. Specifically, an ACK SHOULD be generated for at least every second full-sized segment, and MUST be generated within 500 ms of the arrival of the first unacknowledged packet. The first sentence regards the use of delayed ACKs, which Bra89 defines as: A host that is receiving a stream of TCP data segments can increase efficiency in both the Internet and the hosts by sending fewer than one ACK (acknowledgment) segment per data segment received; this is known as a "delayed ACK" [TCP:5]. 
I.e., "delayed ACK" *means* sending fewer than one ACK per received segment. The second sentence from 2581 says not to excessively delay ACKs just to delay them; the subsequent sentences refer to situations that arise due to holding back on ACKs. The paragraph in its entirety means that
- when there are no losses or substantial delays, TCP SHOULD ACK *exactly* every other packet
- when there are losses or delays, more ACKs can be sent to avoid withholding feedback
Granted, 'every two' is a SHOULD not a MUST, but that's the only place for Linux's behavior to be considered compliant. I don't see sufficient reason in "well, it makes *us* go faster" to warrant overriding SHOULD. > One separate thing to note with regards to ABC is that the RFC2581bis > document in TCPM right now RECOMMENDS to increase CWND by the number of > bytes ACKed during slow-start - i.e. ABC is RECOMMENDED by that document > intended as an update to RFC 2581. *When* that doc comes out, then the status of ABC may need to be updated. Until then, widespread default use of ABC is not appropriate. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Wed Jan 3 14:37:24 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 14:37:24 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com> Message-ID: <459C3024.5000903@isi.edu> Lachlan Andrew wrote: ... > As an aside, I thought of a nice hack which I think is within the > letter of the standards, but well outside the spirit.
> 1. First packet, send a MSS
> 2. After the first ACK, send 2MSS worth of 1-byte packets
> 3. 1 RTT later, receive 1MSS worth of ACKs (ack'ing every second packet)
> 4. Without ABC, we now have a CWND of 500-1500 packets.
> > Could someone tell me if this is within the letter of the standards? RFC1122, Sec 4.2.2.2: An application program is logically required to set the PUSH flag in a SEND call whenever it needs to force delivery of the data to avoid a communication deadlock. However, a TCP SHOULD send a maximum-sized segment whenever possible, to improve performance (see Section 4.2.3.4). Given the penchant for trampling SHOULDs, however, I wouldn't be surprised to see someone implement the above and claim it to be compliant. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Wed Jan 3 14:46:15 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 14:46:15 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> Message-ID: <459C3237.4000709@isi.edu> Lachlan Andrew wrote: > Greetings, > > On 03/01/07, Joe Touch wrote: >> I.e., "delayed ACK" *means* sending fewer than one ACK per received >> segment. > > It obviously doesn't mean that *every* packet should be ACK'd less > than once (i.e., zero times). It means that *some* packets should not > be ACK'd, just as Linux does once the transmission is underway. > >> I don't see sufficient >> reason in "well, it makes *us* go faster" to warrant overriding SHOULD. > > Agreed!! Selfishness should be discouraged. > > The point is that if *everyone* used QuickACKs, short transfers would > be faster, with almost no harm done to long flows. If you believe that's true, please present some verification. An implementation based on an assertion is insufficient. > (It is a better > approximation to "shortest job first", which is well known to minimise > the average delay for a given utilisation.) It is well known that > slow start is too slow for modern bandwidth-delay products (although > it was fine when it was proposed). Agreed. > To me, that *is* a good reason to > override a SHOULD. Thought experiments are *lousy* reasons to override SHOULDs. The desire for something better than what we currently have is an equally lousy reason by itself. If you have evidence, please make the case and get the community to agree and deploy this everywhere. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From faber at ISI.EDU Wed Jan 3 14:59:36 2007 From: faber at ISI.EDU (Ted Faber) Date: Wed, 3 Jan 2007 14:59:36 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459C2960.7030407@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> Message-ID: <20070103225935.GA11407@hut.isi.edu> On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote: > Granted, 'every two' is a SHOULD not a MUST, but that's the only place > for Linux's behavior to be considered compliant. I don't see sufficient > reason in "well, it makes *us* go faster" to warrant overriding SHOULD. A TCP implementation that acknowledges every packet (and otherwise implements all MUSTs in the relevant RFCs) is a (conditionally) compliant implementation as defined by RFC1122. I really don't see any ambiguity there. (OK, RFC1122 could say that all conditionally and unconditionally compliant implementations are compliant, which it doesn't, so strictly speaking I should remove the parens around "conditionally" above: "anal-retentive" is hyphenated.) "Buggy," unlike "(un)?conditionally compliant," is not well defined, but I don't think that the majority of implementors would agree that a conditionally compliant TCP implementation is per se a buggy one. It's a good way to argue about text rather than the design decision, though. -- Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG From touch at ISI.EDU Wed Jan 3 15:51:07 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 15:51:07 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <20070103225935.GA11407@hut.isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> Message-ID: <459C416B.7040702@isi.edu> Ted Faber wrote: > On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote: >> Granted, 'every two' is a SHOULD not a MUST, but that's the only place >> for Linux's behavior to be considered compliant. I don't see sufficient >> reason in "well, it makes *us* go faster" to warrant overriding SHOULD. > > A TCP implementation that acknowledges every packet (and otherwise > implements all MUSTs in the relevant RFCs) is a (conditionally) > compliant implementation as defined by RFC1122. I really don't see any > ambiguity there. (OK, RFC1122 could say that all conditionally and > unconditionally compliant implementations are compliant, which it > doesn't, so strictly speaking I should remove the parens around > "conditionally" above: "anal-retentive" is hyphenated.) Conditional compliance should come with a statement of the conditions. Absent that, it's just buggy. Reasonable conditions do not include "it makes *us* go faster"; they include things like "this implementation is to be deployed in a limited environment that is overwhelmingly satellite-oriented" - e.g., if DirectPC were to use a variant for proxy traffic to its home routers that overrode SHOULDs for those reasons, that'd be non-buggy.
Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From L.Wood at surrey.ac.uk Wed Jan 3 16:26:34 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Thu, 04 Jan 2007 00:26:34 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459C416B.7040702@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> Message-ID: <200701040027.AAA13758@cisco.com> At Wednesday 03/01/2007 15:51 -0800, Joe Touch wrote: >Ted Faber wrote: >> On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote: >>> Granted, 'every two' is a SHOULD not a MUST, but that's the only place >>> for Linux's behavior to be considered compliant. I don't see sufficient >>> reason in "well, it makes *us* go faster" to warrant overriding SHOULD. >> >> A TCP implementation that acknowledges every packet (and otherwise >> implements all MUSTs in the relevant RFCs) is a (conditionally) >> compliant implementation as defined by RFC1122. I really don't see any >> ambiguity there. (OK, RFC1122 could say that all conditionally and >> unconditionally compliant implementations are compliant, which it >> doesn't, so strictly speaking I should remove the parens around >> "conditionally" above: "anal-retentive" is hyphenated.) > >Conditional compliance should come with a statement of the conditions. >Absent that, it's just buggy.
> >Reasonable conditions do not include "it makes *us* go faster"; they >include things like "this implementation is to be deployed in a limited >environment that is overwhelmingly satellite-oriented" - e.g., if >DirectPC were to use a variant for proxy traffic to its home routers >that overrode SHOULDs for those reasons, that'd be non-buggy. So, if we're DirecPC, overriding SHOULDs can make us go faster. Do these semantic wranglings actually have a point? L. From L.Wood at surrey.ac.uk Wed Jan 3 16:28:12 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Thu, 04 Jan 2007 00:28:12 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459C3237.4000709@isi.edu> References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <459C3237.4000709@isi.edu> Message-ID: <200701040028.AAA13798@cisco.com> At Wednesday 03/01/2007 14:46 -0800, Joe Touch wrote: >> >> The point is that if *everyone* used QuickACKs, short transfers would >> be faster, with almost no harm done to long flows. > >If you believe that's true, please present some verification. An >implementation based on an assertion is insufficient. And yet everyone is expected to implement based on the simple MUST and SHOULD assertions in RFCs, given without explanation. Which is, as you say, insufficient. L. From touch at ISI.EDU Wed Jan 3 16:36:01 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 16:36:01 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040028.AAA13798@cisco.com> References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com> Message-ID: <459C4BF1.6060004@isi.edu> Lloyd Wood wrote: > At Wednesday 03/01/2007 14:46 -0800, Joe Touch wrote: >>> The point is that if *everyone* used QuickACKs, short transfers would >>> be faster, with almost no harm done to long flows. >> If you believe that's true, please present some verification. An >> implementation based on an assertion is insufficient. > > And yet everyone is expected to implement based on the simple MUST > and SHOULD assertions in RFCs, given without explanation. > > Which is, as you say, insufficient. It should be insufficient to get those words into an RFC without evidence that they are appropriate. RFCs are neither the sole nor necessarily the appropriate place for that information; they can and should cite published work that validates their claims. Whether we should trust the IETF to do that is independent of whether we should ignore them solely for the benefit of individual performance. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Wed Jan 3 16:44:03 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 16:44:03 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040027.AAA13758@cisco.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> Message-ID: <459C4DD3.3010106@isi.edu> Lloyd Wood wrote: > At Wednesday 03/01/2007 15:51 -0800, Joe Touch wrote: ... >> Reasonable conditions do not include "it makes *us* go faster"; the >> include things like "this implementation is to be deployed in a limited >> environment that is overwhelmingly satellite-oriented" - e.g., if >> DirectPC were to use a variant for proxy traffic to its home routers >> that overrode SHOULDs for those reasons, that'd be non-buggy. > > So, if we're DirecPC, overriding SHOULDs can make us go faster. Yes, but they would not impact others, i.e., their impact would be local to DirectPC's infrastructure. > Do these semantic wranglings actually have a point? The question is "under what conditions is it permissible to override a SHOULD". I would hope that would be clarified in an update to 2119, but don't know what the state of that doc is... Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From L.Wood at surrey.ac.uk Wed Jan 3 17:57:05 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Thu, 04 Jan 2007 01:57:05 +0000 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C4BF1.6060004@isi.edu> References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com> <459C4BF1.6060004@isi.edu> Message-ID: <200701040157.BAA18111@cisco.com> At Wednesday 03/01/2007 16:36 -0800, Joe Touch wrote: >Lloyd Wood wrote: >> At Wednesday 03/01/2007 14:46 -0800, Joe Touch wrote: >>>> The point is that if *everyone* used QuickACKs, short transfers would >>>> be faster, with almost no harm done to long flows. >>> If you believe that's true, please present some verification. An >>> implementation based on an assertion is insufficient. >> >> And yet everyone is expected to implement based on the simple MUST >> and SHOULD assertions in RFCs, given without explanation. >> >> Which is, as you say, insufficient. > >It should be insufficient to get those words into an RFC without >evidence that they are appropriate. RFCs are neither the sole nor >necessarily the appropriate place for that information; they can and >should cite published work that validates their claims. Such citations would be informational rather than normative, and therefore optional. Informational references tend to get left out of RFCs. >Whether we >should trust the IETF to do that is independent of whether we should >ignore them solely for the benefit of individual performance. > >Joe > >-- >---------------------------------------- >Joe Touch >Sr. Network Engineer, USAF TSAT Space Segment From Anil.Agarwal at viasat.com Wed Jan 3 19:59:35 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Wed, 3 Jan 2007 22:59:35 -0500 Subject: [e2e] Are we doing sliding window in the Internet? References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com><459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de><459AF57A.5080304@isi.edu><459B1B09.40301@isi.edu><459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov><459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu><459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> Joe Touch wrote : >> Do these semantic wranglings actually have a point? > The question is "under what conditions is it permissible to override a > SHOULD". I would hope that would be clarified in an update to 2119, but > don't know what the state of that doc is... 1. The technical issue in question is QuickAck, where delayed acks are not used for the first R / 2 bytes of received data, where R seems to be the receive socket buffer size. 2. QuickAck is enabled in Linux, by default. There is no procedure to disable it, except temporarily, for an application via a system call. 3. Linux supports many other "non-standard" TCP features, but most/all of them seem to be disabled by default. 4. There does not seem to be a whole lot of technical documentation on the feature, except for the Linux man page. It is not clear how this feature gets turned on and off during the life of a connection. There is no RFC on the subject. 5. It seems to violate a "SHOULD" statement in the RFCs. 6. Its objective is certainly not nefarious. It improves performance for individual short data transfers. Perhaps the SHOULD needs to be changed with some qualifications. But that requires an open discussion.
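As a rough illustration of point 1 above, the receiver-side ACK policy can be modelled in a few lines of Python. This is a minimal sketch under stated assumptions, not Linux's actual code. The function `count_acks` and its parameters are hypothetical. The quick-ACK threshold follows Anil's reading: half the receive buffer, counted in bytes. A leftover odd segment is assumed to be flushed by the delayed-ACK timer.

```python
def count_acks(total_bytes, mss, rcvbuf, quickack=True):
    """Return how many ACKs a receiver sends for one bulk transfer.

    Hypothetical model (an assumption, not the Linux implementation):
    with quickack, every segment that ends within the first rcvbuf/2 bytes
    is ACKed immediately; after that (or always, without quickack) one ACK
    covers every two segments, as the RFC 1122 "every two" SHOULD suggests.
    """
    quick_limit = rcvbuf // 2 if quickack else 0
    received = 0
    pending = 0  # segments received but not yet acknowledged
    acks = 0
    while received < total_bytes:
        received += min(mss, total_bytes - received)
        if received <= quick_limit:
            acks += 1          # quick ACK: acknowledge this segment at once
        else:
            pending += 1
            if pending == 2:   # delayed ACK: one ACK per two segments
                acks += 1
                pending = 0
    if pending:
        acks += 1              # assumed delayed-ACK timer flush of an odd segment
    return acks


print(count_acks(10 * 1460, 1460, 20 * 1460))                  # all quick ACKs
print(count_acks(10 * 1460, 1460, 20 * 1460, quickack=False))  # delayed ACKs only
```

In this model a 10-segment transfer draws 10 ACKs when the quick-ACK region covers it, and 5 without quick ACKs. With classic per-ACK window growth, each extra ACK lets a slow-starting sender open its window sooner. That is the plausible source of the speed-up for short transfers.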
It is perhaps understandable that SHOULDs and even MUSTs can be violated in controlled experimental environments (e.g., simulations). It is perhaps understandable that SHOULDs may be violated in controlled, isolated environments (e.g., satellite networks). It may be unavoidable that a SHOULD or MUST is violated by a "hacker" and used over the Internet. But under what circumstances should a SHOULD be violated and let loose over the Internet as part of a widely used OS? One would like to think that the last category should require some care and a rigorous process. Is this process not documented or well understood? Surely, it cannot be - implement, deploy, publish paper and write RFC :). What role should the IETF play in this process? Advisory only? Anil ----- Anil Agarwal ViaSat Inc. From L.Wood at surrey.ac.uk Wed Jan 3 20:29:19 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Thu, 04 Jan 2007 04:29:19 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> Message-ID: <200701040429.EAA24974@cisco.com> This issue is minor compared to the widespread changes to their TCP stack Microsoft made with adopting Compound TCP in Vista. http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx and the IETF didn't have any say in that either.
Standards bodies don't ship code. At Wednesday 03/01/2007 22:59 -0500, Agarwal, Anil wrote: > >Joe Touch wrote : > >>> Do these semantic wranglings actually have a point? > >> The question is "under what conditions is it permissible to override a >> SHOULD". I would hope that would be clarified in an update to 2119, but >> don't know what the state of that doc is... > >1. The technical issue in question is QuickAck, where delayed acks are not used for the first R / 2 bytes of received data, where R seems to be the receive socket buffer size >2. QuickAck is enabled in Linux, by default. There is no procedure to disable it, except temporarily, for an application via a system call. >3. Linux supports many other "non-standard" TCP features, but most/all of them seem to be disabled by default. >4. There does not seem to be a whole lot of technical documentation on the feature, except for the Linux man page. It is not clear how this feature gets turned on and off during the life of a connection. There is no RFC on the subject. >5. It seems to violate a "SHOULD" statement in the RFCs. >6. It's objective is certainly not nefarious. It improves performance for individual short data transfers. Perhaps the SHOULD needs to be changed with some qualifications. But that requires an open discussion. > >It is perhaps understandable that SHOULDs and even MUSTs can be violated in controlled experimental environments (e.g., simulations). >It is perhaps understandable that SHOULDs may be violated in controlled , isolated environments (e.g., satellite networks). >It may be unavoidable that a SHOULD or MUST is violated by a "hacker" and used over over the Internet. >But under what circumstances should a SHOULD be violated and let loose over the Internet as part of a widely used OS? > >One would like to think that the last category should require some care and a rigorous process. Is this process not documented or well understood? 
Surely, it cannot be - implement, deploy, publish paper and write RFC :). What role should the IETF play in this process? Advisory only? > >Anil >----- >Anil Agarwal >ViaSat Inc. > From touch at ISI.EDU Wed Jan 3 21:14:06 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 21:14:06 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <200701040157.BAA18111@cisco.com> References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com> <459C4BF1.6060004@isi.edu> <200701040157.BAA18111@cisco.com> Message-ID: <459C8D1E.5080404@isi.edu> Lloyd Wood wrote: > At Wednesday 03/01/2007 16:36 -0800, Joe Touch wrote: ... >> It should be insufficient to get those words into an RFC without >> evidence that they are appropriate. RFCs are neither the sole nor >> necessarily the appropriate place for that information; they can and >> should cite published work that validates their claims. > > Such citations would be informational rather than normative, and therefore optional. Although there is a distinction between required citations of protocols (normative) and other references, I don't agree that it's appropriate to consider all informative references optional. They're informative only in the sense that they don't cite protocol standards; they're required if they are needed to understand motivation. > Informational references tend to get left out of RFCs. I hope we all avoid making that mistake, or allowing others to do so. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From touch at ISI.EDU Wed Jan 3 21:15:52 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 21:15:52 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <200701040429.EAA24974@cisco.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <200701040429.EAA24974@cisco.com> Message-ID: <459C8D88.5020603@isi.edu> Lloyd Wood wrote: > This issue is minor compared to the widespread changes to their TCP stack Microsoft made with adopting Compound TCP in Vista. > http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx > > and the IETF didn't have any say in that either. Standards bodies don't ship code. And two bugs don't make a right ;-) Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Wed Jan 3 21:21:04 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 03 Jan 2007 21:21:04 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com><459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de><459AF57A.5080304@isi.edu><459B1B09.40301@isi.edu><459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov><459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu><459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> Message-ID: <459C8EC0.3050708@isi.edu> Agarwal, Anil wrote: ... > 1. The technical issue in question is QuickAck, where delayed acks are > not used for the first R / 2 bytes of received data, where R seems to be > the receive socket buffer size > 2. QuickAck is enabled in Linux, by default. There is no procedure to > disable it, except temporarily, for an application via a system call. > 3. Linux supports many other "non-standard" TCP features, but most/all > of them seem to be disabled by default. > 4. There does not seem to be a whole lot of technical documentation on > the feature, except for the Linux man page. It is not clear how this > feature gets turned on and off during the life of a connection. There > is no RFC on the subject. > 5. It seems to violate a "SHOULD" statement in the RFCs. > 6. It's objective is certainly not nefarious. It improves performance > for individual short data transfers. Perhaps the SHOULD needs to be > changed with some qualifications. But that requires an open discussion. Nefarious motives are not the issue. The SHOULD currently stands, and it is Linux's default that should be changed first. ... > But under what circumstances should a SHOULD be violated and let loose > over the Internet as part of a widely used OS? > > One would like to think that the last category should require some care > and a rigorous process. Is this process not documented or well > understood? 
Surely, it cannot be - implement, deploy, publish paper and > write RFC :). How about "implement, *test*, publish a paper or bring the results to the IETF, and publish an RFC"? (i.e., basically, "of course it can be") And don't call me Shirley ;-) (with apologies in advance to those not familiar with the movie "Airplane") > What role should the IETF play in this process? Advisory only? The IETF plays the role of standards body. Linux (and Microsoft) *should* play the role of test first, deploy later. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From ian.mcdonald at jandi.co.nz Wed Jan 3 23:54:14 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Thu, 4 Jan 2007 20:54:14 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> Message-ID: <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> > One would like to think that the last category should require some care and > a rigorous process. Is this process not documented or well understood? > Surely, it cannot be - implement, deploy, publish paper and write RFC :). > What role should the IETF play in this process? Advisory only? > You'll find that Linux is probably the most RFC compliant implementation of TCP.
However Linux isn't perfect and the developers do as they want. I think the bigger issue is that there are academics in one corner and implementors in another and usually they are not the same people and often don't even talk to each other. Linux is a meritocracy so if people from this list were to go over to the netdev mailing list and make a reasonable argument then it will get listened to. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From ian.mcdonald at jandi.co.nz Wed Jan 3 23:55:23 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Thu, 4 Jan 2007 20:55:23 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459C8EC0.3050708@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <459C8EC0.3050708@isi.edu> Message-ID: <5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com> On 1/4/07, Joe Touch wrote: > Nefarious motives are not the issue. The SHOULD currently stands, and it > is Linux's default that should be changed first. If you think Linux has a problem here post it to netdev at vger.kernel.org and say what is wrong and why. Even better if it comes with patches. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From detlef.bosau at web.de Thu Jan 4 06:24:07 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 04 Jan 2007 15:24:07 +0100 Subject: [e2e] Are we doing sliding window in the Internet? 
In-Reply-To: <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> Message-ID: <459D0E07.7040004@web.de> Ian McDonald wrote: >> > You'll find that Linux is probably the most RFC compliant > implementation of TCP. However Linux isn't perfect and the developers > do as they want. > > I think the bigger issue is that there are academics in one corner and > implementors in another and usually they are not the same people and > often don't even talk to each other. No. I basically disagree. It sounds similar to a paper last year which I criticized, and the answer was: "You can publish results yourself!" Correctness is not proven by acclamation. And if some implementation is buggy or not standards compliant, this is not healed by a large number of implementors who do something wrong. Last year, I had a look at some networking code in the BSD kernel, and much of it reminded me of code I've seen in NS2. And there were comments with names. With authors. From that I guess that many of the "academics" have done a great deal of implementation work, particularly in the field of TCP. In addition, computer science is an engineering discipline. And in engineering, you _first_ do research, _then_ you test your protocols, _then_ you write the standards if the tests yield convincing results, and further implementations are to follow the standards. Period. The other way round is some kind of trial and error.
I think we all remember the well-known fortune cookie "If builders built buildings like programmers write programs, any woodpecker that came along would destroy human civilization." That directly applies here. I personally find it difficult to always keep the "state of the art", i.e. the current standards of TCP, in mind, but this is my problem and I have to deal with it. However, TCP is not a meritocratic or implementocratic or commerciocratic election where the winner is M$ today, Linux tomorrow and Novell the day after, and then I once again see one of these funny "TCP probing" papers where some guys propose a sophisticated test suite to find out which standards the implementations follow, if any. I strongly believe in sound scientific work and standards which are based on that. And from that, implementations are simply to follow the standards - no ifs and buts. We have learned this in every other field of engineering but computer science. However, it's necessary for computer science to achieve the maturity to catch up with other disciplines here. And I say this from my own experience in professional life, because other engineers often ridicule CS or do not even take it seriously - for exactly this reason. Detlef > Linux is a meritocracy so if > people from this list were to go over to the netdev mailing list and > make a reasonable argument then it will get listened to. > > Ian From touch at ISI.EDU Thu Jan 4 06:39:11 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 06:39:11 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <459C8EC0.3050708@isi.edu> <5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com> Message-ID: <459D118F.8070309@isi.edu> Ian McDonald wrote: > On 1/4/07, Joe Touch wrote: >> Nefarious motives are not the issue. The SHOULD currently stands, and it >> is Linux's default that should be changed first. > > If you think Linux has a problem here post it to > netdev at vger.kernel.org and say what is wrong and why. Even better if > it comes with patches. That's a convenient way to ensure that the problem doesn't get fixed. Participating in the IETF is not a full-time job, and going around to every OS's specific discussion venue to make the case to fix a bug - or demanding that we fix it - confuses this body with a free, evangelical repair service, which it is not. I've made the case that this is a problem here, on this list. We can take that discussion to the TSVWG mailing list if desired. The next step in the IETF process - given others agree this is a bug and it does not get fixed by the *Linux community* (no, we're not all part of that) - would be to add this to an update to RFC 2525. If others decide that this should be a change to all TCPs, then the next step would be to propose it as a change in an I-D. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From touch at ISI.EDU Thu Jan 4 07:09:01 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 07:09:01 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> Message-ID: <459D188D.8060204@isi.edu> Ian McDonald wrote: >> One would like to think that the last category should require some >> care and >> a rigorous process. Is this process not documented or well understood? >> Surely, it cannot be - implement, deploy, publish paper and write RFC :). >> What role should the IETF play in this process? Advisory only? >> > You'll find that Linux is probably the most RFC compliant > implementation of TCP. Should we include the time when Linux defaulted T/TCP to "on" in that? Or the default-ON of ABC? I.e., there are certainly points when versions of Linux were clearly not RFC-compliant in more significant ways; which version are you referring to? And *WE* won't find that. If you want to look for evidence of that fact, then please do. But unfounded assertions do not make it so, nor does throwing the gauntlet at the rest of the world saying, "if you think this is wrong, PROVE it". > However Linux isn't perfect and the developers > do as they want. That's clearly true. The good news is that Linux ends up with some of the earliest versions of new protocols.
The bad news is that Linux sometimes enables things as default that were never intended as such. > I think the bigger issue is that there are academics in one corner and > implementors in another and usually they are not the same people and > often don't even talk to each other. If I'm the academic in this discussion, note that I have a number of patches that fixed bugs in FreeBSD. Just because I don't work on Linux doesn't render me an academic. However, you're right - we're not all in the same corner. I'm in the IETF corner, as are developers from other OS's, and right now it seems like you're representing the Linux community in their corner demanding that we all come over there for a chat (see below). > Linux is a meritocracy so if > people from this list were to go over to the netdev mailing list and > make a reasonable argument then it will get listened to. That's the disconnect here. *THE* place for this sort of discussion is the IETF, which this list is a peripheral (IRTF) party to. Perhaps the discussion should occur on TSVWG, or even TCPM. But expecting us to take this to the Linux community is a disconnect on how standards bodies work. Again, we don't all work on Linux. Linux cannot demand that of the world. The Linux community needs to participate in the bodies of standards it uses, and expect that of its developers. I know of no standards body that sends emissaries to developer communities (at best, they send emissaries to other standards bodies). The converse is the way things work; Linux is implementing IETF protocols, and has an *obligation* to participate in the IETF, where other communities participate. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From perfgeek at mac.com Thu Jan 4 07:13:26 2007 From: perfgeek at mac.com (rick jones) Date: Thu, 4 Jan 2007 07:13:26 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <5640c7e00701031346r14fa0d88u1b370cc08631a799@mail.gmail.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> <5640c7e00701031346r14fa0d88u1b370cc08631a799@mail.gmail.com> Message-ID: <4e17c6bbe1c216e4f25ced41852dab5f@mac.com> > I don't know as I'm not an expert here - just cross posting the > discussions. You can always email Dave Miller who made the suggestion. Direct email to David Miller generally (well, if my experience can be generalized, perhaps I'm just too far back in the peanut gallery) results in a "send it to the list" response. In this case that would be netdev at vger.kernel.org. rick jones From Anil.Agarwal at viasat.com Thu Jan 4 07:20:18 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Thu, 4 Jan 2007 10:20:18 -0500 Subject: [e2e] Are we doing sliding window in the Internet?
References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <200701040429.EAA24974@cisco.com> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A356B@VGAEXCH01.hq.corp.viasat.com> Lloyd Wood wrote: > This issue is minor compared to the widespread changes to their TCP stack > Microsoft made with adopting Compound TCP in Vista. > http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx > > and the IETF didn't have any say in that either. Standards bodies don't ship > code. Yikes!! From the above URL - "CTCP is enabled by default in computers running Windows Server "Longhorn" ..." Whatever happened to the idea of vendors and IETF conducting trial tests over the Internet for a period of time and writing RFCs before widespread deployment of a new protocol feature? Anil ----- Anil Agarwal ViaSat Inc. From lachlan.andrew at gmail.com Wed Jan 3 10:38:58 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Wed, 3 Jan 2007 10:38:58 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459B4834.1050304@isi.edu> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> Message-ID: Greetings Joe, On 02/01/07, Joe Touch wrote: > The improvements in Reno were MORE conservative than TCP as specified, > not less.
Being more conservative is always compliant. Correct me if I'm wrong again, but I thought that RFC 1122 mandated following Jacobson'88, which specifies that packet loss, as indicated by timeout, should result in setting the CWND to its initial small value. I also thought that Reno retransmits before timeout (less conservative) and consequently only halves the window (less conservative). If the changes made transmission slower, why were they adopted? If they made it faster, perhaps I'm misinterpreting "conservative". Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From lachlan.andrew at gmail.com Wed Jan 3 13:40:46 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Wed, 3 Jan 2007 13:40:46 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> Message-ID: Greetings Ian, On 03/01/07, Ian McDonald wrote: > On 1/3/07, Lachlan Andrew wrote: > > the default in 2.6.18 has been > > changed to "off", possibly as a result of their experiments :) > > > Yes - see http://www.google.com/custom?domains=www.spinics.net&q=%22high+latency+with+tcp+connections%22&sa=Search&sitesearch=www.spinics.net&client=pub-3422782820843221&forid=1&ie=ISO-8859-1&oe=ISO-8859-1&cof=GALT%3A%23003324%3BGL%3A1%3BDIV%3A%2373B59C%3BVLC%3AFF6600%3BAH%3Acenter%3BBGC%3AC5DBCF%3BLBGC%3A66CC99%3BALC%3A330033%3BLC%3A330033%3BT%3A000000%3BGFNT%3A333300%3BGIMP%3A333300%3BFORID%3A1%3B&hl=en -- Thanks for that explanation.
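Lachlan's question to Joe above contrasts two reactions to loss: a retransmission timeout, after which the window drops back to its small initial value, and Reno's duplicate-ACK path, which retransmits before the timer fires and keeps about half the window. For reference, here is a minimal sketch of both reactions as RFC 2581 later wrote them down, in bytes. The MSS and flight size are example values, and this is not code from any particular stack.

```python
def ssthresh_after_loss(flight_size: int, mss: int) -> int:
    """Both loss signals set ssthresh = max(FlightSize / 2, 2 * SMSS)."""
    return max(flight_size // 2, 2 * mss)


def on_retransmit_timeout(flight_size: int, mss: int) -> tuple[int, int]:
    """Timeout: cwnd falls to the loss window of one segment, then slow start."""
    return mss, ssthresh_after_loss(flight_size, mss)


def on_three_dupacks_reno(flight_size: int, mss: int) -> tuple[int, int]:
    """Reno fast retransmit / fast recovery: retransmit without waiting for the
    timer, then set cwnd to ssthresh plus the 3 segments that have left the network."""
    ssthresh = ssthresh_after_loss(flight_size, mss)
    return ssthresh + 3 * mss, ssthresh


if __name__ == "__main__":
    mss = 1460
    flight = 20 * mss  # example: 20 full-sized segments outstanding
    print("timeout:   cwnd=%d ssthresh=%d" % on_retransmit_timeout(flight, mss))
    print("3 dupacks: cwnd=%d ssthresh=%d" % on_three_dupacks_reno(flight, mss))
```

With 20 segments in flight, the timeout path restarts from one segment (1460 bytes) while the duplicate-ACK path continues from 18980 bytes; both agree on an ssthresh of 14600 bytes.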
> I would even go so far as to suggest that we should drop ACKs which do > not fall on packetization boundaries. Interesting suggestion. Would TSO be a problem? You'd have to make sure that the card never got "creative" and put the boundaries where we don't expect. Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From lachlan.andrew at gmail.com Wed Jan 3 14:24:54 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Wed, 3 Jan 2007 14:24:54 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3564@VGAEXCH01.hq.corp.viasat.com> Message-ID: Greetings Anil, On 03/01/07, Agarwal, Anil wrote: > I just did a few quick tests with a Linux 2.6.18 > TCP stack over an (emulated) satellite link. > > A 50 kbyte transfer finishes in 5 RTTs (including one for the SYN exchange). > a Sun Solaris 5.8 machine shows the 50 kbyte transfer take 7 RTTs. > > 1. Is this what the Linux TCP stack implementors intended? Is this > documented somewhere? I can't speak for them, but I would think that speeding up slow start was their aim, yes. Google "quickack", or look at man 7 tcp on a Linux system. > 2. Does this violate any IETF TCP principle, in letter or spirit? It seems > to have an (unfair) advantage over TCP implementations that always perform > delayed ack. I personally think it is within the spirit of TCP. TCP is already internally unfair (look at "RTT unfairness", or "jumbo-frame unfairness", which can give speed disparities much greater than 7:5).
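The 5-RTT versus 7-RTT difference Anil reports can be explored with a toy, loss-free slow-start model. Each RTT the sender transmits min(cwnd, remaining) segments, the receiver returns one ACK per `segments_per_ack` segments, and without ABC the sender grows cwnd by one segment per ACK. The parameters below are assumptions, not Anil's configuration: a 1460-byte MSS, an initial window of 3 segments for the ACK-every-segment case (the RFC 3390 value for that MSS), 2 segments for the delayed-ACK case, no receive-window limit, and a delayed ACK for an odd trailing segment that still arrives within the same RTT.

```python
import math


def data_rounds(total_segments: int, initial_cwnd: int, segments_per_ack: int) -> int:
    """RTTs of loss-free slow start needed to send total_segments (no ABC)."""
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < total_segments:
        burst = min(cwnd, total_segments - sent)    # send what the window allows
        sent += burst
        rounds += 1
        cwnd += math.ceil(burst / segments_per_ack)  # +1 segment per ACK received
    return rounds


if __name__ == "__main__":
    segments = math.ceil(50_000 / 1460)  # 50 kbytes at a 1460-byte MSS -> 35 segments
    # Add one RTT for the SYN exchange in both cases.
    print("ACK every segment, IW=3:", 1 + data_rounds(segments, 3, 1))  # 5
    print("delayed ACK,       IW=2:", 1 + data_rounds(segments, 2, 2))  # 7
```

Under these assumptions the model gives 5 and 7 RTTs. That is consistent with, though not proof of, the reading that ACKing every segment during slow start accounts for the difference.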
The original aim of TCP was a roughly-fair mechanism to achieve good effective data rates while avoiding congestion collapse. Speeding up slow start is an important part of improving the effective data rate. If absolute equality of rates had been the aim, wouldn't the algorithms have been specified independently of the MSS, and wouldn't steps have been taken to avoid RTT-unfairness when it was discovered? As an aside, I thought of a nice hack which I think is within the letter of the standards, but well outside the spirit. 1. First packet, send a MSS 2. After the first ACK, send 2MSS worth of 1-byte packets 3. 1 RTT later, receive 1MSS worth of ACKs (ack'ing every second packet) 4. Without ABC, we now have a CWND of 500-1500 packets. Could someone tell me if this is within the letter of the standards? Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From lachlan.andrew at gmail.com Wed Jan 3 14:37:46 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Wed, 3 Jan 2007 14:37:46 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459C2960.7030407@isi.edu> References: <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> Message-ID: Greetings, On 03/01/07, Joe Touch wrote: > I.e., "delayed ACK" *means* sending fewer than one ACK per received > segment. It obviously doesn't mean that *every* packet should be ACK'd less than once (i.e., zero times). It means that *some* packets should not be ACK'd, just as Linux does once the transmission is underway. > I don't see sufficient > reason in "well, it makes *us* go faster" to warrant overriding SHOULD. Agreed!! Selfishness should be discouraged. 
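The arithmetic behind steps 1-4 of Lachlan's "hack" above can be checked with a short sketch. It assumes a byte-counted cwnd, a 1460-byte MSS, and a receiver that ACKs every second 1-byte segment, so each ACK covers 2 bytes. It compares classic per-ACK counting (+1 MSS per ACK) with Appropriate Byte Counting from RFC 3465 (+min(bytes acked, L * MSS) per ACK). The function name is hypothetical.

```python
from typing import Optional


def cwnd_after_acks(cwnd: int, n_acks: int, bytes_per_ack: int, mss: int,
                    abc_limit: Optional[int] = None) -> int:
    """Grow a byte-counted cwnd through n_acks slow-start ACKs.

    abc_limit=None: classic counting, +1 MSS per ACK however few bytes it covers.
    abc_limit=L:    Appropriate Byte Counting, +min(bytes acked, L * MSS) per ACK.
    """
    for _ in range(n_acks):
        if abc_limit is None:
            cwnd += mss
        else:
            cwnd += min(bytes_per_ack, abc_limit * mss)
    return cwnd


if __name__ == "__main__":
    mss = 1460
    start = 2 * mss        # after step 2: a 2-MSS window spent on 2920 one-byte segments
    acks = (2 * mss) // 2  # step 3: one ACK per two segments -> 1460 ACKs of 2 bytes each
    print(cwnd_after_acks(start, acks, 2, mss) // mss)               # 1462 (no ABC)
    print(cwnd_after_acks(start, acks, 2, mss, abc_limit=2) // mss)  # 4 (ABC, L=2)
```

Without ABC the window jumps to about 1462 MSS, the top of the 500-1500 range Lachlan quotes. With byte counting it reaches only 4 MSS. Because each ACK covers just 2 bytes, the L * MSS cap never binds, so the choice of L does not matter in this case.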
The point is that if *everyone* used QuickACKs, short transfers would be faster, with almost no harm done to long flows. (It is a better approximation to "shortest job first", which is well known to minimise the average delay for a given utilisation.) It is well known that slow start is too slow for modern bandwidth-delay products (although it was fine when it was proposed). To me, that *is* a good reason to override a SHOULD. Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From L.Wood at surrey.ac.uk Thu Jan 4 08:24:34 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Thu, 04 Jan 2007 16:24:34 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D118F.8070309@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <459C8EC0.3050708@isi.edu> <5640c7e00701032355g3332edb5ma4897ed996618239@mail.gmail.com> <459D118F.8070309@isi.edu> Message-ID: <200701041625.QAA20711@cisco.com> At Thursday 04/01/2007 06:39 -0800, Joe Touch wrote: >The next step in the IETF process - given others agree this is a bug and >it does not get fixed by the *Linux community* (no, we're not all part >of that) obviously, since ABC and other TCP specifications in RFCs are quite specific to BSD stacks. L. From faber at ISI.EDU Thu Jan 4 08:26:04 2007 From: faber at ISI.EDU (Ted Faber) Date: Thu, 4 Jan 2007 08:26:04 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <459C416B.7040702@isi.edu> References: <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> Message-ID: <20070104162604.GA85755@hut.isi.edu> On Wed, Jan 03, 2007 at 03:51:07PM -0800, Joe Touch wrote: > > > Ted Faber wrote: > > On Wed, Jan 03, 2007 at 02:08:32PM -0800, Joe Touch wrote: > >> Granted, 'every two' is a SHOULD not a MUST, but that's the only place > >> for Linux's behavior to be considered compliant. I don't see sufficient > >> reason in "well, it makes *us* go faster" to warrant overriding SHOULD. > > > > A TCP implementation that acknowledges every packet (and otherwise > > implements all MUSTs in the relevant RFCs) is a (conditionally) > > compliant implementation as defined by RFC1122. I really don't see any > > ambiguity there. (OK, RFC1122 could say that all conditionally and > > unconditionally compliant implementations are compliant, which it > > doesn't, so strictly speaking I should remove the parens around > > "conditionally" above: "anal-retentive" is hyphenated.) > > Conditional compliance should come with a statement of the conditions. > Absent that, it's just buggy. Now who's not reading 1122? The terms are defined there and there's no indication of a "signing statement" requirement for conditionally compliant implementations. It's just a phrase that means "did all the MUSTs and omitted one or more of the SHOULDs." It's precise, unlike the "buggy" word we can't agree on. You may disagree with omitting delayed ACKs, but the RFCs allow it. -- Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG
From touch at ISI.EDU Thu Jan 4 08:57:45 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 08:57:45 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <20070104162604.GA85755@hut.isi.edu> References: <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu> Message-ID: <459D3209.5090602@isi.edu> Ted Faber wrote: > On Wed, Jan 03, 2007 at 03:51:07PM -0800, Joe Touch wrote: ... >> Conditional compliance should come with a statement of the conditions. >> Absent that, it's just buggy. > > Now who's not reading 1122? The terms are defined there and there's > no indication of a "signing statement" requirement for conditionally > compliant implementations. It's just a phrase that means "did all the > MUSTs and omitted one or more of the SHOULDs." It's precise, unlike the > "buggy" word we can't agree on. See below... > You may disagree with omitting delayed ACKs, but the RFCs allow it. RFC1122 also states: * "SHOULD" This word or the adjective "RECOMMENDED" means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications should be understood and the case carefully weighed before choosing a different course. I.e., if you negate a SHOULD you ought to demonstrate you understand the implications and have weighed the case. That's clearly stated in RFC1122. It may not be reiterated where "conditionally compliant" is defined, but it comes along when a SHOULD is negated. Joe -- ---------------------------------------- Joe Touch Sr.
Network Engineer, USAF TSAT Space Segment From faber at ISI.EDU Thu Jan 4 10:16:33 2007 From: faber at ISI.EDU (Ted Faber) Date: Thu, 4 Jan 2007 10:16:33 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D3209.5090602@isi.edu> References: <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu> <459D3209.5090602@isi.edu> Message-ID: <20070104181633.GC85755@hut.isi.edu> On Thu, Jan 04, 2007 at 08:57:45AM -0800, Joe Touch wrote: > > > Ted Faber wrote: > > On Wed, Jan 03, 2007 at 03:51:07PM -0800, Joe Touch wrote: > ... > >> Conditional compliance should come with a statement of the conditions. > >> Absent that, it's just buggy. > > > > Now who's not reading 1122? The terms are defined there and there's > > no indication of a "signing statement" requirement for conditionally > > compliant implementations. It's just a phrase that means "did all the > > MUSTs and omitted one or more of the SHOULDs." It's precise, unlike the > > "buggy" word we can't agree on. > > See below... > > > You may disagree with omitting delayed ACKs, but the RFCs allow it. > > RFC1122 also states: > > * "SHOULD" > > This word or the adjective "RECOMMENDED" means that there > may exist valid reasons in particular circumstances to > ignore this item, but the full implications should be > understood and the case carefully weighed before choosing > a different course. > > I.e., if you negate a SHOULD you ought to demonstrate you understand the > implications and have weighed the case. That's clearly stated in > RFC1122.
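The compliance levels Ted and Joe are debating follow RFC 1122's definitions. An implementation that misses any MUST is not compliant. One that meets every MUST and every SHOULD is unconditionally compliant. One that meets every MUST but not every SHOULD is conditionally compliant. A minimal sketch of that classification is below. The requirement labels in the example are hypothetical paraphrases of two delayed-ACK items, and MUST NOT / SHOULD NOT items are assumed to be phrased as "satisfied or not". The sketch captures only the letter of the definitions. Whether "the full implications" of skipping a SHOULD were weighed, which is Joe's point, cannot be checked this way.

```python
from enum import Enum


class Compliance(Enum):
    UNCONDITIONAL = "unconditionally compliant"
    CONDITIONAL = "conditionally compliant"
    NOT_COMPLIANT = "not compliant"


def rfc1122_compliance(requirements: dict[str, tuple[str, bool]]) -> Compliance:
    """Classify from {requirement: (level, satisfied)}, level "MUST" or "SHOULD".
    MAY items, or any other level, do not affect the result."""
    def all_met(level: str) -> bool:
        return all(ok for lvl, ok in requirements.values() if lvl == level)

    if not all_met("MUST"):
        return Compliance.NOT_COMPLIANT
    if all_met("SHOULD"):
        return Compliance.UNCONDITIONAL
    return Compliance.CONDITIONAL


if __name__ == "__main__":
    # Hypothetical checklist for a stack that ACKs every segment.
    stack = {
        "delay before ACKing is under 0.5 s": ("MUST", True),
        "implements delayed ACK": ("SHOULD", False),
    }
    print(rfc1122_compliance(stack).value)  # conditionally compliant
```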
If we're going to be picky (and why stop now?) no *demonstration* is required. It says that implementors *should* think seriously about their choice when they violate a SHOULD, not that they have to explain their thinking to you (or me, or anyone else). I understand that there's no objective way to make sure that thinking has been done, but there's no requirement to present it either. To whom would you require such a presentation, anyway? And, of course, there's a "should" in the definition of SHOULD. Regardless of whether any thinking at all has happened, one can ignore a SHOULD and be within the letter of the RFC "law." FWIW, I don't think SHOULDs should be thrown aside lightly, either. But they're spots where the IETF consensus admits that designers and implementors can make a different decision without catastrophic interoperability problems. For my money "bug" is much more derisive than even "wrong design" because it implies (to me) a level of obliviousness that doesn't seem present here. Bugs are accidents; this seems like a conscious choice. I understand it's a choice you disagree with, but IMHO it's a choice that violates no RFC. I think you're much better off debating the content of the design decision than whether it violates some unenforceable boundary. -- Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG From touch at ISI.EDU Thu Jan 4 10:40:16 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 10:40:16 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <20070104181633.GC85755@hut.isi.edu> References: <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu> <459D3209.5090602@isi.edu> <20070104181633.GC85755@hut.isi.edu> Message-ID: <459D4A10.30200@isi.edu> Ted Faber wrote: > On Thu, Jan 04, 2007 at 08:57:45AM -0800, Joe Touch wrote: >> RFC1122 also states: >> >> * "SHOULD" >> >> This word or the adjective "RECOMMENDED" means that there >> may exist valid reasons in particular circumstances to >> ignore this item, but the full implications should be >> understood and the case carefully weighed before choosing >> a different course. ... > FWIW, I don't think SHOULDs should be thrown aside lightly, either. But > they're spots where the IETF consensus admits that designers and > implementors can make a different decision without catastrophic > interoperability problems. That's not what's implied above, IMO, e.g., by using the terms "full" and "carefully". Let's consider a few of the SHOULDs in 1122 and consider whether we can negate them without catastrophe: - ARP would discard the first packet sent to each unresolved IP address (Nagle saw this problem in 1986: http://www-mice.cs.ucl.ac.uk/multimedia/misc/tcp_ip/8604.mm.www/0126.html) - ICMP redirects could be used for arbitrary off-path diversion (3.2.2.2) - packets could be forwarded to a gateway indefinitely in the absence of positive information it is available > For my money "bug" is much more derisive than even "wrong design" > because it implies (to me) a level of obliviousness that doesn't seem > present here. Bugs are accidents; this seems like a conscious choice. Bugs can be conscious choices too; they are just incorrect ones. > I understand it's a choice you disagree with, but IMHO it's a choice > that violates no RFC. If 'violates' means obeys only MUSTs, then we agree.
If 'violates' means obeys all MUSTs and negates SHOULDs only in particular circumstances, then we disagree. > I think you're much better off debating the content of the design > decision than whether it violates some unenforceable boundary. I've already pointed out that it is likely to be unfair w.r.t. TCPs that ACK every second packet all the time (excepting timeouts). Others seem intent on finding ways to make their preferred OS behave better so long as it's within the 'letter of the RFCs'; it is in that spirit that we need to be clear on the conditions where SHOULDs are OK to skip. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From ian.mcdonald at jandi.co.nz Thu Jan 4 11:25:18 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Fri, 5 Jan 2007 08:25:18 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D188D.8060204@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> <459D188D.8060204@isi.edu> Message-ID: <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com> On 1/5/07, Joe Touch wrote: > I'm sorry for the way I said things. I wasn't trying to start a mini-flame war but I have a habit of saying things in a way that causes misunderstanding at times.
> > Ian McDonald wrote: > >> One would like to think that the last category should require some > >> care and > >> a rigorous process. Is this process not documented or well understood? > >> Surely, it cannot be - implement, deploy, publish paper and write RFC :). > >> What role should the IETF play in this process? Advisory only? > >> > > You'll find that Linux is probably the most RFC compliant > > implementation of TCP. > > Should we include the time when Linux defaulted T/TCP to "on" in that? > Or the default-ON of ABC? I.e., there are certainly points when versions > of Linux were clearly not RFC-compliant in more significant ways; which > version are you referring to? > What I was meaning is that Linux at present seems to be attracting people to check code against RFCs and implement experimental RFCs. This is probably because Linux is "fashionable" at the moment. I can certainly add to the list of problems as well - e.g. a broken BIC was made the default, and the DCCP implementation does not conform to the RFCs. > And *WE* won't find that. If you want to look for evidence of that fact, > then please do. But unfounded assertions do not make it so, nor does > throwing the gauntlet at the rest of the world saying, "if you think > this is wrong, PROVE it". > > > However Linux isn't perfect and the developers > > do as they want. > > That's clearly true. The good news is that Linux ends up with some of > the earliest versions of new protocols. The bad news is that Linux > sometimes enables things as default that were never intended as such. > I think the development community for Linux is significantly different in makeup from the BSD community. This has its positives as well as some negatives. Linux developers are very much in the mold of "let's try this out and see what happens". > > I think the bigger issue is that there are academics in one corner and > > implementors in another and usually they are not the same people and > > often don't even talk to each other.
> > If I'm the academic in this discussion, note that I have a number of > patches that fixed bugs in FreeBSD. Just because I don't work on Linux > doesn't render me an academic. > > However, you're right - we're not all in the same corner. I'm in the > IETF corner, as are developers from other OS's, and right now it seems > like you're representing the Linux community in their corner demanding > that we all come over there for a chat (see below). > I'm not saying you need to chat. I'm saying report bugs to the relevant place (see also below). > > Linux is a meritocracy so if > > people from this list were to go over to the netdev mailing list and > > make a reasonable argument then it will get listened to. > > That's the disconnect here. *THE* place for this sort of discussion is > the IETF, which this list is a peripheral (IRTF) party to. Perhaps the > discussion should occur on TSVWG, or even TCPM. But expecting us to take > this to the Linux community is a disconnect on how standards bodies work. > But surely if you say Linux is broken and then you don't inform the relevant developers then how will it get fixed? It's nice to moan about a broken TCP implementation but if you talk about that within your own community it doesn't get fixed. I'm referring specifically to the situations where people are saying Linux is not following the RFCs. The rest of the discussion quite rightly does belong here. > Again, we don't all work on Linux. Linux cannot demand that of the > world. The Linux community needs to participate in the bodies of > standards it uses, and expect that of its developers. > > I know of no standards body that sends emissaries to developer > communities (at best, they send emissaries to other standards bodies). > The converse is the way things work; Linux is implementing IETF > protocols, and has an *obligation* to participate in the IETF, where > other communities participate. > What I am trying to do is help bridge some of the gaps.
I see the disconnect between the two communities and want to help remove some of that distance. I didn't do this myself in this case because I didn't understand the issue properly; other times I do. I encourage people to post comments to relevant Linux people if they are concerned. I know from personal experience that it has helped immensely. There are a few RFC authors now corresponding with Linux developers and that has helped the code base in TCP and DCCP. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From touch at ISI.EDU Thu Jan 4 11:35:16 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 11:35:16 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> <459D188D.8060204@isi.edu> <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com> Message-ID: <459D56F4.5090206@isi.edu> Ian McDonald wrote: > On 1/5/07, Joe Touch wrote: >> > > I'm sorry for the way I said things. I wasn't trying to start a > mini-flame war but I have a habit of saying things in a way that > causes misunderstanding at times. ... >> > You'll find that Linux is probably the most RFC compliant >> > implementation of TCP. >> >> Should we include the time when Linux defaulted T/TCP to "on" in that? >> Or the default-ON of ABC? I.e., there are certainly points when versions >> of Linux were clearly not RFC-compliant in more significant ways; which >> version are you referring to?
> > What I was meaning is that Linux at present seems to be attracting > people to check code against RFCs and implement experimental RFCs. > This is probably because Linux is "fashionable" at the moment. I think it's also a property of Linux, as you note below. One of its major benefits is that many devices/features/protocols are probably implemented and available; that's also one of its detriments at times, though. >...Linux developers are very much in the mold of > "let's try this out and see what happens". Agreed. >> > Linux is a meritocracy so if >> > people from this list were to go over to the netdev mailing list and >> > make a reasonable argument then it will get listened to. >> >> That's the disconnect here. *THE* place for this sort of discussion is >> the IETF, which this list is a peripheral (IRTF) party to. Perhaps the >> discussion should occur on TSVWG, or even TCPM. But expecting us to take >> this to the Linux community is a disconnect on how standards bodies work. >> > But surely if you say Linux is broken and then you don't inform the > relevant developers then how will it get fixed? It's nice to moan about > a broken TCP implementation but if you talk about that within your own > community it doesn't get fixed. We're not talking about that in "our own community" on this list; this (IRTF) list, as with IETF lists, is for all communities to come together to discuss such issues. > I'm referring specifically to the situations where people are saying > Linux is not following the RFCs. The rest of the discussion quite > rightly does belong here. I agree that the discussion of how to fix this in Linux belongs on a Linux list, but we're all hoping they track this and other IETF lists and take that information there. > I encourage people to post comments to relevant Linux people if they > are concerned.
It'd be great for those on this list who either use Linux or who are interested to participate on the Linux lists too, but I'm sincerely hoping the Linux folk aren't waiting around for us to post this issue to their lists to address it. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From ian.mcdonald at jandi.co.nz Thu Jan 4 11:42:03 2007 From: ian.mcdonald at jandi.co.nz (Ian McDonald) Date: Fri, 5 Jan 2007 08:42:03 +1300 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D56F4.5090206@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> <459D188D.8060204@isi.edu> <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com> <459D56F4.5090206@isi.edu> Message-ID: <5640c7e00701041142p5ea8092chd7a18f1c6c11d002@mail.gmail.com> > I agree that the discussion of how to fix this in Linux belongs on a > Linux list, but we're all hoping they track this and other IETF lists > and take that information there. > I think this is a false hope in many cases unfortunately. > > I encourage people to post comments to relevant Linux people if they > > are concerned. > > It'd be great for those on this list who either use Linux or who are > interested to participate on the Linux lists too, but I'm sincerely > hoping the Linux folk aren't waiting around for us to post this issue to > their lists to address it.
> If they don't read this list (as most don't, I believe) then they won't know about it. I think it is easier to crosspost or separately post to netdev at vger.kernel.org if people believe there are issues with Linux. Ian -- Web: http://wand.net.nz/~iam4 Blog: http://imcdnzl.blogspot.com WAND Network Research Group From touch at ISI.EDU Thu Jan 4 11:46:01 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 11:46:01 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <5640c7e00701041142p5ea8092chd7a18f1c6c11d002@mail.gmail.com> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> <459D188D.8060204@isi.edu> <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com> <459D56F4.5090206@isi.edu> <5640c7e00701041142p5ea8092chd7a18f1c6c11d002@mail.gmail.com> Message-ID: <459D5979.9050009@isi.edu> Ian McDonald wrote: >> I agree that the discussion of how to fix this in Linux belongs on a >> Linux list, but we're all hoping they track this and other IETF lists >> and take that information there. >> > I think this is a false hope in many cases unfortunately. Agreed. >> > I encourage people to post comments to relevant Linux people if they >> > are concerned. >> >> It'd be great for those on this list who either use Linux or who are >> interested to participate on the Linux lists too, but I'm sincerely >> hoping the Linux folk aren't waiting around for us to post this issue to >> their lists to address it. >> > If they don't read this list (as most don't, I believe) then they > won't know about it. I think it is easier to crosspost or separately > post to netdev at vger.kernel.org if people believe there are issues with > Linux. Easier for whom? *That* is the disconnect.
Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Thu Jan 4 11:57:35 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jan 2007 11:57:35 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D56F4.5090206@isi.edu> References: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> <459C4DD3.3010106@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3568@VGAEXCH01.hq.corp.viasat.com> <5640c7e00701032354l1ba3feccn8bdda21e9df37cd4@mail.gmail.com> <459D188D.8060204@isi.edu> <5640c7e00701041125t594c62a3xd4a0f01aac60146d@mail.gmail.com> <459D56F4.5090206@isi.edu> Message-ID: <459D5C2F.7010409@isi.edu> Finally, let me say that I agree with Ian that the best way to fix this issue now is to post to the Linux lists, which I will proceed to do. I sincerely hope that Linux users on this list will track this and other IETF lists for such issues, and bring concerns to the Linux group themselves, rather than expecting "other" list members to do so. We *each* fix our own systems (and we're not all Linux users), and this is (one of) the common place(s) we figure that all out. -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From faber at ISI.EDU Thu Jan 4 13:17:41 2007 From: faber at ISI.EDU (Ted Faber) Date: Thu, 4 Jan 2007 13:17:41 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D4A10.30200@isi.edu> References: <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <20070104162604.GA85755@hut.isi.edu> <459D3209.5090602@isi.edu> <20070104181633.GC85755@hut.isi.edu> <459D4A10.30200@isi.edu> Message-ID: <20070104211741.GD85755@hut.isi.edu> On Thu, Jan 04, 2007 at 10:40:16AM -0800, Joe Touch wrote: > That's not what's implied above, IMO, e.g., by using the terms "full" > and "carefully". Let's consider a few of the SHOULDs in 1122 and > consider whether we can negate them without catastrophe: Your examples don't convince me. I understand them fine, but I don't agree that they're catastrophic interoperability problems. Furthermore I can think of situations in which a rational implementor would choose to go against the quoted SHOULDs. > > I think you're much better off debating the content of the design > > decision than whether it violates some unenforceable boundary. > > I've already pointed out that it is likely to be unfair w.r.t. TCPs that > ACK every second packet all the time (excepting timeouts). Others seem > intent on finding ways to make their preferred OS behave better so long > as it's within the 'letter of the RFCs'; it is in that spirit that we > need to be clear on the conditions where SHOULDs are OK to skip. "Likely" doesn't seem sufficient for those who disagree with you. But I don't have much to say about this particular choice, except that I think it's in the letter of the law.
I expect that the performance change is in the noise most of the time, but I'm not excited enough to either argue about it or to go out and collect data. -- Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG From perfgeek at mac.com Thu Jan 4 19:30:44 2007 From: perfgeek at mac.com (rick jones) Date: Thu, 4 Jan 2007 19:30:44 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> Message-ID: <26e64446b0f56dae4d44408c5ee436e6@mac.com> earlier someone wrote: > I would even go so far as to suggest that we should drop ACKs which do not > fall on packetization boundaries. that suggests one is tracking segmentation boundaries, in which case wouldn't one be using conservation of packets heuristics rather than conservation of bytes heuristics - packet counting rather than byte counting? [istr that was part of the issue when Linux tried to implement the byte-counting ABC RFC in their packet counting stack...] Then someone else wrote: > Interesting suggestion. Would TSO be a problem? You'd have to make > sure that the card never got "creative" and put the boundaries where > we don't expect. Indeed.
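To make the packet-counting versus byte-counting distinction rick raises concrete, here is a toy Python model of slow start only; it is a sketch, not any stack's implementation. Assumptions, all hypothetical: no loss and no ssthresh, a 1460-byte MSS, and a receiver that ACKs every second segment, with delayed-ACK timing ignored. The byte-counting branch applies the RFC 3465 slow-start rule, cwnd += min(N, L*SMSS) with L = 2; the packet-counting branch adds one MSS per ACK regardless of how many bytes that ACK covers.

```python
MSS = 1460  # bytes per segment; illustrative value only


def slow_start(rounds, byte_counting, abc_limit=2):
    """Return cwnd, in segments, at the start of each round trip.

    Toy model: no loss, no ssthresh, and the receiver ACKs every second
    segment (an odd trailing segment gets its own ACK).
    """
    cwnd = MSS
    history = [cwnd // MSS]
    for _ in range(rounds):
        unacked = cwnd // MSS              # segments sent in this round trip
        while unacked > 0:
            covered = min(2, unacked)      # segments covered by this ACK
            unacked -= covered
            if byte_counting:
                # RFC 3465 slow start: cwnd += min(N, L * SMSS)
                cwnd += min(covered * MSS, abc_limit * MSS)
            else:
                # Packet counting: one MSS per ACK, whatever it covers
                cwnd += MSS
        history.append(cwnd // MSS)
    return history


print("packet counting:", slow_start(5, byte_counting=False))
print("byte counting:  ", slow_start(5, byte_counting=True))
```

Under these assumptions the two variants print [1, 2, 3, 5, 8, 12] and [1, 2, 4, 8, 16, 32] segments: with one ACK per two segments, packet counting grows cwnd by roughly 1.5x per round trip, while byte counting keeps the 2x growth that per-segment ACKs would give.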
rickjones there is no rest for the wicked, yet the virtuous have no pillows From detlef.bosau at web.de Fri Jan 5 02:48:20 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 05 Jan 2007 11:48:20 +0100 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> Message-ID: <459E2CF4.6030701@web.de> Hi. When I asked whether we did sliding window in the Internet, I basically had a quite simple scenario in mind, and I would like a comment on this one. So I write it down once again, perhaps making my question clearer. The parameters are examples, so please don't kill me if they aren't "that typical". Basic scenario: Sender------(some Internet path) -----Router---(link)--------Receiver The router may be replaced by a splitter, see below. The basic question is whether the use of a splitter may shorten the RTT seen by the sender to that degree, that the appropriate rate cannot be achieved by a sliding window protocol even if CWND were set to 1 MSS, the sender must hence be stalled from time to time to have the rate slow enough. Is this possible, or am I missing something? Now to the scenario in detail: Case 1: Router. Sender -----------------------------------------Router-------------------------Receiver 10 Mbps, 100 ms 300 bps, 10 ms Basically the link behind the router has a "slow dial-in modem bandwidth" here. Imagine a 12000 bit packet traveling from Sender to Receiver. What's the RTT then? Let's have a look: Sender-Router: 1.2 ms serialization latency + 100 ms transport latency = 101.2 ms Router-Receiver: 40 s serialization latency + 10 ms transport latency = 40.01 s ================= Sender-Receiver: 40.1112 s. If there is one packet in transit in each direction, i.e.
the line is full in both directions, we would roughly have CWND/RTT = 2*12000 bit /80 s = 300 bit/s and everything is fine. Now let's replace the router by a splitter: Case 2: Splitter. Sender -----------------------------------------Splitter-------------------------Receiver 10 Mbps, 100 ms 300 bps, 10 ms (Bandwidth, latency) If the Splitter is doing "dumb spoofing", i.e. all packets are acknowledged immediately as they are received, the sender would see a round trip time of about 200 ms. So even in the stop-and-wait case, i.e. CWND = 1*12000 bit, the throughput sender/splitter is 12000 bit / 200 ms = 60 bit / ms = 60 kbit/s, which is obviously too fast for the 300 bps modem line to carry. So, what should the splitter do? 1. stall the sender periodically using zero window packets? 2. don't care, doesn't matter? 3. ?? (let's ignore my own stupid ideas on this one for the moment ;-)) Detlef From detlef.bosau at web.de Fri Jan 5 03:09:18 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 05 Jan 2007 12:09:18 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <26e64446b0f56dae4d44408c5ee436e6@mac.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <5640c7e00701031315u70a8d89ckabf726487ca3e5f7@mail.gmail.com> <26e64446b0f56dae4d44408c5ee436e6@mac.com> Message-ID: <459E31DE.5030602@web.de> rick jones wrote: > earlier someone wrote: > > > I would even go so far as to suggest that we should drop ACKs which > do not > > fall on packetization boundaries. > > that suggests one is tracking segmentation boundaries, in which case > wouldn't one be using conservation of packets heuristics rather than > conservation of bytes heuristics - packet counting rather than byte > counting? > To my understanding, we do so anyway.
AFAIK we use a scoreboard in Reno to track acknowledged _bytes_, and we calculate windows in _bytes_, except of course in NS2 and, following some rumour (i.e. I didn't check it), in Linux. From Anil.Agarwal at viasat.com Fri Jan 5 05:25:01 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Fri, 5 Jan 2007 08:25:01 -0500 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3575@VGAEXCH01.hq.corp.viasat.com> Detlef wrote: > The basic question is whether the use of a splitter may shorten the RTT > seen by the sender to that degree, that the appropriate rate cannot be > achieved by a sliding window protocol even if CWND were set to 1 MSS, > the sender must hence be stalled from time to time to have the rate slow > enough. Yes Here is a more practical example - Sender -------------------------TCP-Splitter---------------------Receiver 100 Mbps, 10 us (LAN) 1 Mbps, 300 ms (geo-satellite) A cwnd of 1 segment of 1500 bytes will achieve roughly 1500 * 8 bit / (20 + 120) us, i.e., 85 Mbps, on the LAN segment (20 us round-trip propagation plus 120 us serialization), which is much higher than the satellite link rate. > So, what should the splitter do? > 1. stall the sender periodically using zero window packets? > 2. don't care, doesn't matter? > 3. ?? Since the network can support a maximum of 1 Mbps, on average, the sender should send 1 segment every 1500 * 8 / 1000000 seconds, i.e., every 12 ms. So, stalling the sender using zero window Ack packets is an appropriate solution, which does not require any changes to the sender TCP stack. The cwnd value may be 1 segment or larger; it does not matter. Anil ----------- Anil Agarwal ViaSat Inc.
From detlef.bosau at web.de Fri Jan 5 06:13:56 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 05 Jan 2007 15:13:56 +0100 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3575@VGAEXCH01.hq.corp.viasat.com> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3575@VGAEXCH01.hq.corp.viasat.com> Message-ID: <459E5D23.2040902@web.de> Agarwal, Anil wrote: > Detlef wrote: > > > The basic question is whether the use of a splitter may shorten the RTT > > seen by the sender to that degree, that the appropriate rate cannot be > > achieved by a sliding window protocol even if CWND were set to 1 MSS, > > the sender must hence be stalled from time to time to have the rate slow > > enough. > > Yes > Great :-) > Here is a more practical example - > > Sender > -------------------------TCP-Splitter---------------------Receiver > 100 Mbps, 10 us (LAN) 1 Mbps, 300 ms (geo-satellite) > A cwnd of 1 segment of 1500 bytes will achieve roughly > 1500 * 8 / (20 + 120) Mbps > i.e., 85 Mbps, on the LAN segment, > which is much higher than the satellite link rate. > The very interesting thing is that this behaviour is not restricted to a typical dial-in modem bandwidth. And the RTT from sender to splitter can even be in the range of a few ms and we will still have the same behaviour. > > So, what should the splitter do? > > 1. stall the sender periodically using zero window packets? > > 2. don't care, doesn't matter? > > 3. ?? > > Since, the network can support a maximum of 1 Mbps, > on average, the sender should send 1 segment every > 1500 * 8 / 1000000 seconds > i.e, every 12 ms. > Yes. Question: How is this achieved using actual splitters?
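The per-hop figures in Anil's example, and Detlef's "dumb spoofing" case, can be checked with a short Python sketch. It assumes exactly one segment in flight (stop-and-wait) and takes the RTT of a hop as twice its one-way propagation delay plus the serialization time of that segment, ignoring ACK size and queueing; the function names are made up for illustration.

```python
def stop_and_wait_rate(segment_bits, link_bps, one_way_delay_s):
    """Throughput (bit/s) with exactly one segment in flight on one hop.

    RTT = 2 * one-way propagation delay + serialization of the segment;
    ACK serialization and queueing are ignored.
    """
    rtt = 2 * one_way_delay_s + segment_bits / link_bps
    return segment_bits / rtt


def pacing_interval(segment_bits, bottleneck_bps):
    """Seconds between segments needed to match a bottleneck rate."""
    return segment_bits / bottleneck_bps


SEG = 1500 * 8  # one 1500-byte segment, i.e. 12000 bit

# Anil's LAN hop to the splitter: 100 Mbit/s, 10 us one way.
lan = stop_and_wait_rate(SEG, 100e6, 10e-6)
print(f"LAN hop, cwnd = 1 segment: {lan / 1e6:.1f} Mbit/s")

# Pacing that would match the 1 Mbit/s satellite hop.
print(f"pacing interval: {pacing_interval(SEG, 1e6) * 1e3:.1f} ms")

# Detlef's case 2: 10 Mbit/s, 100 ms one way to a spoofing splitter.
wan = stop_and_wait_rate(SEG, 10e6, 0.1)
print(f"sender-splitter, cwnd = 1 segment: {wan / 1e3:.1f} kbit/s (modem: 0.3 kbit/s)")
```

Under these assumptions it prints about 85.7 Mbit/s, 12.0 ms and 59.6 kbit/s. The last figure is Detlef's 60 kbit/s with the 1.2 ms serialization time included in the RTT; either way the sender-to-splitter hop outruns the 300 bit/s modem line by a factor of about 200.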
> So, stalling the sender using zero window Ack packets is an > appropriate solution, which does not require any changes to the > sender TCP stack. The cwnd value may be 1 segment or larger, > it does not matter. I wonder whether splitters actually stall. Personally, I think stalling is an extremely bad solution, as a stalled sender must wake up somehow or be woken up somehow. It is woken up by window updates, which unfortunately are sent unreliably as they typically do not carry any data bytes. If it is not woken up, it wakes up by itself after some timeout and sends probing packets. In my own simulations I have not yet implemented window updates and do only zero window probing, where I use the current retransmission timeout for zero window probing as well. The throughput decrease is, to put it kindly, disastrous. Depending on the parameters I choose, the flow uses only 25 % or less of the available bandwidth. I have yet to add window updates. The problem with window updates, however, is modeling their loss. This is a typical "paper tuning parameter (abbrev.: PTP)": if you choose this rate too low, the paper will be rejected because it's not relevant. If you choose it too high, no one believes your results (however, no one will call you a liar; it will be phrased more politically correctly), and somewhere in the middle you will find something between "weak accept" and "weak reject" :-) O.k., but let's wait for the "strong reject" comments now. I'm eager to know what the rest of the world is thinking about this problem. In addition, I would appreciate any pointers to papers on zero window probing. (Of course there is a way to do it without zero window probing, but I would like to see whether this is really needed or whether it's irrelevant.)
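Detlef's report that relying on zero window probes costs most of the bandwidth can be illustrated with a back-of-the-envelope renewal model in Python. Everything in it is an assumption for illustration, not a description of any real splitter or of Detlef's simulator: the splitter reopens its window only after its buffer (B bits) has fully drained onto the slow link at rate R; the sender then refills the buffer instantly; the window update is lost with probability p; and after a loss the sender resumes only when its probe fires T seconds after the stall. Persist-timer backoff and the refill time over the fast hop are ignored.

```python
def stalled_utilization(buffer_bits, drain_bps, probe_timeout_s, p_update_lost):
    """Long-run fraction of time the slow link is busy (toy renewal model).

    Each cycle: the splitter's buffer drains at drain_bps, then the window
    reopens. With probability p_update_lost the window update is lost and
    the sender only resumes when its probe fires probe_timeout_s after the
    stall. Refilling the buffer is assumed to take no time.
    """
    drain_time = buffer_bits / drain_bps
    # Idle gap after a lost update. If the probe fires before the buffer
    # has drained, the model optimistically assumes no idle time, since
    # persist-timer backoff is not modeled.
    idle_if_lost = max(0.0, probe_timeout_s - drain_time)
    expected_idle = p_update_lost * idle_if_lost
    return drain_time / (drain_time + expected_idle)


B = 64 * 1024 * 8  # hypothetical 64 KiB splitter buffer, in bits
R = 1e6            # 1 Mbit/s slow link, as in Anil's example
T = 1.0            # hypothetical 1 s probe timeout

for p in (0.0, 0.1, 0.5, 1.0):
    print(f"p(update lost) = {p:.1f}: link busy {stalled_utilization(B, R, T, p):.0%}")
```

With these made-up numbers the slow link is busy about 52% of the time when every window update is lost and 100% when none is, so in this model the loss rate of window updates largely determines the outcome, which is why it is such a sensitive "paper tuning parameter".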
Detlef From misha at eecs.cwru.edu Fri Jan 5 06:51:26 2007 From: misha at eecs.cwru.edu (Michael Rabinovich) Date: Fri, 5 Jan 2007 09:51:26 -0500 Subject: [e2e] Announcement: a new network measurement platform Message-ID: <43A64099-3E62-4089-B8A6-5FE6B569D76A@eecs.cwru.edu> We are pleased to announce the availability of the DipZoom P2P network measurement infrastructure. Unlike existing approaches, which face the difficult challenge of building a measurement platform with sufficiently diverse measurements and measuring hosts, DipZoom offers a matchmaking service instead, bringing together experimenters in need of measurements with external measurement providers. Salient features of DipZoom are: 1. DipZoom is an open system. Anyone can perform measurement experiments autonomously. We seeded the system with over a hundred measurement points (MPs) on PlanetLab nodes. Several residential measurement points are also available. 2. DipZoom is an extensible system. While its current standard distribution offers wget, ping, traceroute, and nslookup measurements, anyone can add new measurements as plug-ins and recruit participants to install these plug-ins on their MPs. 3. DipZoom offers a coherent view over the entire collection of measurement points, which are all accessible from any local computer with DipZoom installed. The only restriction is that, in the peer-to-peer spirit, in order to run a DipZoom client, the computer must also offer measurements by becoming a DipZoom measurement point. 4. DipZoom offers both navigational and programmatic access to the entire platform. For navigational access, there is a graphical DipZoom client that allows the user to browse available MPs, select MPs according to a number of characteristics (platform, location, autonomous system), and obtain measurements from those MPs. For programmatic access, DipZoom provides APIs to script and run complex globally distributed measurement experiments from a local computer.
The APIs are implemented by a Java class library and can be called from any Java application. As a test of the usability of the DipZoom APIs, students in the Fall'07 undergraduate networking class were able to perform a complex measurement experiment (investigating the quality of Akamai's server selection) in a matter of days. 5. Utmost care is paid to security, including rate limiting of requests to any given measurement point and to any given measurement target. DipZoom runs on Windows, Linux, and Mac OS platforms, and can be freely downloaded from http://dipzoom.case.edu/ . The site also includes further details on the system and links to the mailing list and the people involved. Please send your comments to any of us. We hope you will find DipZoom useful and fun. Regards, Misha Rabinovich. From touch at ISI.EDU Fri Jan 5 08:38:58 2007 From: touch at ISI.EDU (Joe Touch) Date: Fri, 05 Jan 2007 08:38:58 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <459E2CF4.6030701@web.de> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> Message-ID: <459E7F22.2030907@isi.edu> Detlef Bosau wrote: > Hi. > > When I asked whether we did sliding window in the Internet, I basically > had a quite simple scenario in mind and basically I would like a comment > on this one. ... > The basic question is whether the use of a splitter may shorten the RTT > seen by the sender to that degree, that the appropriate rate cannot be > achieved by a sliding window protocol even if CWND were set to 1 MSS, > the sender must hence be stalled from time to time to have the rate slow > enough. The window doesn't by itself determine rate; it's ACK clocking that does. In high BW*delay product nets, the same stalling happens - you send data, get an ACK, send more, get ACKs of that, etc - and the data keeps bunching up at the source.
I.e., ACK clocking works only when the data-ACK loop experiences a bottleneck. When it doesn't, things bunch up, and TCP doesn't 'match rates' at all. FWIW, the same thing happens when the receiver application doesn't drain the incoming data fast enough. The receive buffers fill up, and the sender is stalled. The same thing is happening here. ... > So, what should the splitter do? > > 1. stall the sender periodically using zero window packets? > 2. don't care, doesn't matter? > 3. ?? Splitters are bad for other reasons, but as you said, let's ignore them for this discussion. It seems like the dominant effect is exactly what you expect - the endpoint (the splitter, really) isn't experiencing the bottleneck, but its "application" (the receiver on the modem) is too slow. So you get bursty 'scheduling' of the sender based on availability of buffers at the (IMO, real, or at least effective) receiver. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From detlef.bosau at web.de Fri Jan 5 09:45:06 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 05 Jan 2007 18:45:06 +0100 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <459E7F22.2030907@isi.edu> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> Message-ID: <459E8EA2.4010000@web.de> Joe Touch wrote: > >> The basic question is whether the use of a splitter may shorten the RTT >> seen by the sender to that degree, that the appropriate rate cannot be >> achieved by a sliding window protocol even if CWND were set to 1 MSS, >> the sender must hence be stalled from time to time to have the rate slow >> enough. >> > > The window doesn't by itself determine rate; it's ACK clocking that > does. I'm totally with you. In the scenario above, the splitter ACKs packets "too fast" when it does dumb spoofing. In other words: without splitting, the serialization delay of each link ensures that the sender is paced correctly via ACK clocking. When a splitter is used, the ACK pacing mechanism can be undermined. > In high BW*delay product nets, the same stalling happens - you > send data, get an ACK, send more, get ACKs of that, etc - and the data > keeps bunching up at the source. > > I.e., ACK clocking works only when the data-ACK loop experiences a > bottleneck. When it doesn't, things bunch up, and TCP doesn't 'match > rates' at all. > This was a little bit too fast for me... Shouldn't the ACKs be clocked by the TCP data packets, at least in symmetric paths? Thus, the ACK clocking should reflect the TCP rate which is achieved downstream? > FWIW, the same thing happens when the receiver application doesn't drain > the incoming data fast enough. The receive buffers fill up, and the > sender is stalled. The same thing is happening here. > > Yes, absolutely. When a splitter is in use, the sending socket (directed to the final receiver) doesn't drain its incoming data fast enough.
It's an interesting question whether data of short-term flows can be buffered entirely at the splitter and then sent to the receiver at a rate the link can handle. It's interesting what happens to the final CLOSE ACK here, which is typically not spoofed by splitters to ensure proper ACK semantics. > Splitters are bad for other reasons, but as you said, let's ignore them > for this discussion. > > I just see that they are in use. And so I think one should weigh up the pros and cons here. In the particular case of wide area mobile networks, I personally think splitters can be helpful because of the extremely irregular delivery times of datagrams. I had great difficulty seeing a reason for this and found Thierry Klein's paper "Improved TCP Performance in Wireless IP Networks through Enhanced Opportunistic Scheduling Algorithms" (Globecom 2004) extremely interesting. Perhaps the scheduling-caused variations in packet delivery times are the most distinguishing mark of mobile wide area networks compared to other network technologies. (I would be glad to get comments on this claim!) > It seems like the dominant effect is exactly what you expect - the > endpoint (the splitter, really) isn't experiencing the bottleneck, but > its "application" (the receiver on the modem) is too slow. So you get > bursty 'scheduling' of the sender based on availability of buffers at > the (IMO, real, or at least effective) receiver. > It's just interesting to see whether this is important / relevant / annoying. Detlef From touch at ISI.EDU Fri Jan 5 09:48:06 2007 From: touch at ISI.EDU (Joe Touch) Date: Fri, 05 Jan 2007 09:48:06 -0800 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <459E8EA2.4010000@web.de> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> Message-ID: <459E8F56.9070101@isi.edu> Detlef Bosau wrote: > Joe Touch wrote: ... >> In high BW*delay product nets, the same stalling happens - you >> send data, get an ACK, send more, get ACKs of that, etc - and the data >> keeps bunching up at the source. >> >> I.e., ACK clocking works only when the data-ACK loop experiences a >> bottleneck. When it doesn't, things bunch up, and TCP doesn't 'match >> rates' at all. > > This was a little bit too fast for me.... > > Shouldn't the ACKs be clocked by the TCP data packets, at least in > symmetric paths? Thus, the ACK clocking should reflect the TCP rate > which is achieved downstream? It does - 'downstream' is really the splitter, i.e., the thing generating the ACKs. Since the path to the splitter and back has no bottleneck, there's no ACK pacing going on. >> FWIW, the same thing happens when the receiver application doesn't drain >> the incoming data fast enough. The receive buffers fill up, and the >> sender is stalled. The same thing is happening here. > > Yes, absolutely. When a splitter is in use, the sending socket (directed > to the final receiver) doesn't drain its incoming data fast enough. > > It's an interesting question whether data of short term flows can be > buffered entirely at the splitter and then sent to the receiver with a > rate the link can handle. Sure it can; that's what a true proxy does. > It's interesting what happens to the final CLOSE ACK here which is > typically not spoofed in splitters to ensure proper ACK semantics. I don't understand "proper ACK semantics". The splitter destroys those. The semantics that may be kept are at the connection level (open/closed), but the semantics of data ACKs are irrevocably destroyed.
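Joe's point that ACK clocking paces the sender only when the data path to whoever generates the ACKs contains a bottleneck can be checked with a toy store-and-forward model in Python. Assumptions (illustrative, not from the thread): segments leave the sender back to back, each hop is a FIFO link serializing one segment at a time, propagation delay is ignored since it only adds a constant offset, and the ACK generator (receiver or spoofing splitter) ACKs every segment on arrival over an uncongested return path. The link rates are taken from Anil's example; the function name is hypothetical.

```python
def ack_spacing(segment_bits, hop_rates_bps, n_segments=5):
    """Gaps (s) between successive ACKs, assuming one ACK per segment,
    sent the moment the segment finishes crossing the last hop."""
    times = [0.0] * n_segments  # all segments ready at the sender at t = 0
    for rate in hop_rates_bps:
        service = segment_bits / rate  # serialization time on this hop
        link_free = 0.0
        out = []
        for arrival in times:
            start = max(arrival, link_free)  # wait while the link is busy
            link_free = start + service
            out.append(link_free)
        times = out
    return [later - earlier for earlier, later in zip(times, times[1:])]


SEG = 1500 * 8
# End to end: LAN hop, then the 1 Mbit/s satellite hop; the receiver ACKs.
print([round(g * 1e3, 2) for g in ack_spacing(SEG, [100e6, 1e6])])
# Spoofing splitter: ACKs generated right after the LAN hop.
print([round(g * 1e3, 2) for g in ack_spacing(SEG, [100e6])])
```

Under these assumptions the ACKs come back 12 ms apart when the receiver behind the 1 Mbit/s hop generates them, which clocks a window-limited sender down to about 1 Mbit/s, but only 0.12 ms apart when a spoofing splitter ACKs right after the LAN hop, so the ACK stream carries no information about the satellite rate.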
>> Splitters are bad for other reasons, but as you said, let's ignore them >> for this discussion. > > I just see that they are in use. And so I think one should weigh up the > pros and cons here. > In the particular case of wide area mobile networks, I personally think > splitters can be helpful because of the extremely irregular delivery > times of datagrams. > I had great difficulties to see a reason for this and found Thierry > Klein's paper "Improved TCP Performance in Wireless IP Networks through > Enhanced Opportunistic Scheduling Algorithms" (Globecom 2004) extremely > interesting. > > Perhaps, the scheduling caused variations in packet delivery times are > the most distinguishing mark for mobile wide area networks compared to > other network technologies. (I would be glad to get comments on this > claim!) Variations in delivery times can be handled via PEPs that don't spoof ACKs, e.g., ones that pace the data and/or ACK paths, but don't actively participate in the communication. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From detlef.bosau at web.de Fri Jan 5 12:45:44 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 05 Jan 2007 21:45:44 +0100 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <459E8F56.9070101@isi.edu> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> Message-ID: <459EB8F8.4060304@web.de> Joe Touch wrote: >> Shouldn't the ACKs be clocked by the TCP data packets, at least in >> symmetric paths?
Thus, the ACK clocking should reflect the TCP rate >> which is achieved downstream? >> > > It does - 'downstream' is really the splitter, i.e., the thing > generating the ACKs. Since the path to the splitter and back has no > bottleneck, there's no ACK pacing going on. > > O.k. That's the general problem with splitting and ACK pacing. When you do dumb spoofing, the sender is not correctly paced by the ACKs. In case of an (imminent) buffer overrun at the splitter, the sender is throttled by TCP flow control. > >> It's interesting what happens to the final CLOSE ACK here which is >> typically not spoofed in splitters to ensure proper ACK semantics. >> > > I don't understand "proper ACK semantics". The splitter destroys those. > The semantics that may be kept are at the connection level > (open/closed), but the semantics of data ACKs are irrevocably destroyed. > > I am thinking of the semantics at the connection level, which I think are sufficient in many cases. In fact, I think the main problem is that a splitter introduces a single point of failure / hard state into the path: if a router fails, the flow may continue along an alternate path. If a splitter fails, the flow is dead because we cannot recover from the lost state. However, we should look carefully at the technology in use: particularly in mobile wireless networks, I'm not totally convinced (perhaps somebody can comment on this one?) that there are no single points of failure in the path, e.g. an SGSN in GPRS. In that case, the state is "hard" anyway, and "making it harder", e.g. by putting a PEP at the SGSN, does not really worsen the situation. >> Perhaps, the scheduling caused variations in packet delivery times are >> the most distinguishing mark for mobile wide area networks compared to >> other network technologies. (I would be glad to get comments on this >> claim!)
>> > > Variations in delivery times can be handled via PEPs that don't spoof > ACKs, e.g., ones that pace the data and/or ACK paths, but don't actively > participate in the communication. > > Really? I agree with you for the Remote Socket Architecture (Schlager/Wolisz) because that architecture actually does not split the connection but places the PEP mechanism at the application/socket interface. Otherwise the problem is this: when the bandwidth sender-splitter equals, e.g., the average rate splitter-receiver but is far less than the maximum rate splitter-receiver, then a simple router would hardly store any data and thus hardly equalize the rates / delivery times. Thierry describes delay spikes of several seconds. If we think about UMTS, we can imagine a wireless link where nothing happens for up to several seconds - so no data at all is clocked out of the sender - and then we have about 2 Mbps throughput for a short time - which is perhaps much more than the Internet path can carry. In such a scenario we want the router / splitter / PEP / whateverbox to buffer the data and equalize the rate variations. Can this be achieved by pure pacing in one direction or the other? Detlef From lynne at telemuse.net Fri Jan 5 14:19:20 2007 From: lynne at telemuse.net (Lynne Jolitz) Date: Fri, 5 Jan 2007 14:19:20 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459D5C2F.7010409@isi.edu> Message-ID: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net> Good luck, but realize that "linux" is not a monolithic group like BSD was. There are many variations on the theme - some very responsive (and well-backed) and others running more hand-to-mouth. Once the genie is out of the bottle, expect a long relaxation time wrt issues in implementation. Joe is right in his annoyance at the lack of testing and communication given the widespread deployment of Linux and Windows.
It is irresponsible to put out a poorly thought-out networking change that could potentially and unwittingly cause severe congestion and unfairness. But it has been clear for at least a decade that the slower implement / test / prove / deploy cycle is no longer acceptable - not just because it is too slow or costly but because any delay in the release of any code, worthy or not, makes the releaser look like he's bogarting on the rest of the open source community. The days where IETF RFCs and tested releases were done by many of the same people are long gone. If it's important enough, perhaps it's time to take on the responsibility for correctness of operating systems and networking implementations within an accredited organization and certify such. But if it's not worth the time and effort for the academic side to take on this charge, the marketplace will have to serve instead. Lynne Jolitz ---- We use SpamQuiz. If your ISP didn't make the grade try http://lynne.telemuse.net > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Joe Touch > Sent: Thursday, January 04, 2007 11:58 AM > To: Joe Touch > Cc: Ted Faber; l.andrew at ieee.org; Lloyd Wood; > end2end-interest at postel.org > Subject: Re: [e2e] Are we doing sliding window in the Internet? > > > Finally, let me say that I agree with Ian that the best way to fix this > issue now is to post to the Linux lists, which I will proceed to do. > > I sincerely hope that Linux users on this list will track this and other > IETF lists for such issues, and bring concerns to the Linux group > themselves, rather than expecting "other" list members to do so. > > We *each* fix our own systems (and we're not all Linux users), and this > is (one of) the common place(s) we figure that all out. > > -- > ---------------------------------------- > Joe Touch > Sr.
Network Engineer, USAF TSAT Space Segment > > From gds at best.com Sat Jan 6 13:38:14 2007 From: gds at best.com (Greg Skinner) Date: Sat, 6 Jan 2007 21:38:14 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net>; from lynne@telemuse.net on Fri, Jan 05, 2007 at 02:19:20PM -0800 References: <459D5C2F.7010409@isi.edu> <005501c73117$8c0cb3c0$6e8944c6@telemuse.net> Message-ID: <20070106213814.A82315@gds.best.vwh.net> On Fri, Jan 05, 2007 at 02:19:20PM -0800, Lynne Jolitz wrote: > The days where IETF RFCs and tested releases were done by many of > the same people are long gone. If it's important enough, perhaps it's > time to take on the responsibility for correctness of operating > systems and networking implementations within an accredited > organization and certify such. Doesn't this just push the problem onto the accredited organization? What would make the Linux communities more likely to interact with it? Either they have their own accreditation/certification, or it's not an issue WRT development/deployment. --gregbo From lynne at telemuse.net Sat Jan 6 15:06:34 2007 From: lynne at telemuse.net (Lynne Jolitz) Date: Sat, 6 Jan 2007 15:06:34 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <20070106213814.A82315@gds.best.vwh.net> Message-ID: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net> Yes, Greg. You're right. Buy-in is difficult to achieve and maintain, especially in open source. As I also went on to say in that same email you quote: "But if it's not worth the time and effort for the academic side to take on this charge, the marketplace will have to serve instead." People are very good at finding reasons to justify inaction on their part, and it is frustrating to even try for something better. That takes vision and risk. 
If one were to set up such an arrangement with an eye towards the long term, wouldn't it be wise to find an approach that would bring in parties and allow them all to benefit from an accord? Isn't it in the best interests of OS and networking developers, academics, and scientists to make sure things work well? But that would require people to reach out to others, put skin in the game, and take a risk. It requires trust and mutual respect. It's much easier to complain and expect someone else to do the work. And it's much easier to ignore complaints because there is too much work to do already. And that's why the marketplace is the default. It's not the best solution, but it is a solution. Lynne Jolitz. > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Greg Skinner > Sent: Saturday, January 06, 2007 1:38 PM > To: Lynne Jolitz > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Are we doing sliding window in the Internet? > > > On Fri, Jan 05, 2007 at 02:19:20PM -0800, Lynne Jolitz wrote: > > The days where IETF RFCs and tested releases were done by many of > > the same people are long gone. If it's important enough, perhaps it's > > time to take on the responsibility for correctness of operating > > systems and networking implementations within an accredited > > organization and certify such. > > Doesn't this just push the problem onto the accredited organization? > What would make the Linux communities more likely to interact with it? > Either they have their own accreditation/certification, or it's not an > issue WRT development/deployment. > > --gregbo > From detlef.bosau at web.de Sun Jan 7 04:33:44 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 07 Jan 2007 13:33:44 +0100 Subject: [e2e] Borat Science. Was: Re: Are we doing sliding window in the Internet?
In-Reply-To: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net> References: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net> Message-ID: <45A0E8A8.5080903@web.de> Once again: NO! First of all: What is Linux? A job application of mine was once rejected; one question was: Do you know Linux? When it came to Unix, I mentioned several flavours of Linux clones I'm familiar with, but I forgot to mention Linux itself. Therefore, in the eyes of that employer, I was an idiot. Excuse me, I have used Linux at home since 1993. That's not that long, but it's perhaps longer than many kids in some human resources department have used computers at all. O.k. For an even worse flame on this issue, please refer to RMS's well-known talk on Linux and Free Software. I just remember that one day, at some university, a department's chair allegedly said that Linux was more realistic than NS2. Here in Germany, we have a "joke". Roughly translated: At night, it's colder than outside. That's the same scientific level. What is "realistic"? What is _reality_? I can only talk about standards and whether a piece of software is compliant with them or not. The problem with Linux is that it is "positioned" on the market as a competitor to M$ products and that there are growing commercial interests behind it - and no adequate commercial responsibility and accountability at the same time. So Linux lost its virginity when it was taken up as a scientific research system, and it never achieved maturity when it comes to commercial accountability. I use Linux because it is free and sufficient for my purposes. But I don't accept this "Linux religion", which appears to be continuously spreading. The very point is a different one. As I said some days ago, scientific research starts with a problem statement; then we investigate whether solutions exist or whether the problem can be solved at all, and we evaluate solutions and approaches. Perhaps we consider new ones if they are better than existing ones, or if they are the first ones to exist.
That is, to my understanding, proper science. And it absolutely doesn't matter whether we run TCP/IP on Linux, M$ systems, AIX, HP-UX, SunOS or even the KA9Q stack. So, what are we talking about here? Should we do advertising for certain operating systems? Or should we talk about end to end issues in distributed systems? Here in Germany, standards are sometimes respected the same way as an act of parliament. E.g. we have something called "Technischer Ueberwachungsverein", roughly: Technical Supervisory Association. If you own a car, you have to present it to this association every two years in order to make sure that your car complies with the technical regulations here in Germany. And if you don't do so, you are not allowed to use this car in public road traffic; doing so would be a criminal offense. And it absolutely doesn't matter in this context whether you drive a Volkswagen or a BMW. (Thanks to Professor Schrempp, Mercedes-Benz does not exist any longer. There is some nostalgic trademark which reminds us of these cars.) So even if you have a "star" on your hood, this won't help you if the test badge is missing. So we do not experiment with different brakes, steering wheels etc. in public road traffic and count the victims of fatal accidents afterwards. Instead we _first_ define standards, _then_ we make sure that cars used in Germany comply with them. Otherwise these cars must not be used. Period. I once talked to a colleague who told me how this is handled in some country where he spent his vacation. IIRC they had an extremely scientific way of brake testing there: The experiment. Roughly speaking: Put a child against a wall, tell the driver to brake in time before the wall - and if the child is still alive afterwards, the brakes may have worked sufficiently well. Sometimes, this approach is called "Borat Science". Lynne Jolitz wrote: > Yes, Greg. You're right. Buy-in is difficult to achieve and maintain, especially in open source.
As I also went on to say in that same email you quote: > "But if it's not worth the time and effort for the academic side to take on this charge, the marketplace will have to serve instead." > > People are very good at finding reasons to justify inaction on their part, and it is frustrating to even try for something better. That takes vision and risk. > Excuse me, but what exactly do you call "inaction" here? I always see a vivid discussion here. Many papers are published - many more than I can read. Problems are identified and solved. Where is "inaction" here? In addition: When will the first M$ guy join this discussion and claim that the academic community has to fix what they can't get handled in Redmond? Do you happen to mix up the task of industrial / commercial implementation with proper academic research? > If one were to set up such an arrangement with any eye towards the long-term, wouldn't it be wise to find an approach that would bring in parties and allow them to all benefit from an accord? Isn't it in the best interests of OS and networking Of course! That's, to my understanding, the purpose of the IETF. _That's_ the venue. > developers, academics, and scientists to make sure things work well? > > But that would require people to reach out to others, put skin in the game, and take a risk. It requires trust and mutual respect. It's much easier to complain and expect someone else to do the work. And it's much easier to ignore complaints because there is too much work to do already. > Excuse me, I have no one to do my work. I'm a single unemployed male and I have to do _all_ of my work on my own. And perhaps some day this will be recognized. If not? Bad luck. So, _please_ don't tell me anything about risks before you know what you're talking about. I try to take part in the academic discussion _without_ any help or assistance. When I try to publish a paper, I don't even know who will pay possible conference fees. That's all my own risk.
Perhaps some day this will pay off. For the moment, it doesn't. However, there is no opportunity for me to get a job, so I try to do some scientific work. _Without_ any help from the IETF or anyone else. Perhaps this requires doing some homework. When something does not work, you may even have to spend a night or a weekend on it. But please don't talk about taking a risk here. > And that's why the marketplace is the default. It's not the best solution, but it is a solution. > > The marketplace has thrown me out. I'm a single male, unemployed for 3 years now, aged 43. For the marketplace, I'm no longer a human being. I graduated in 1992, so for our employment centre and our human resources departments I'm regarded as an "unskilled worker". So I take a risk, i.e. that Joe throws me off this list when I say this, but this is my honest opinion: Please leave me alone with this McKinsey attitude! From simon at limmat.switch.ch Sun Jan 7 05:24:20 2007 From: simon at limmat.switch.ch (Simon Leinen) Date: Sun, 07 Jan 2007 14:24:20 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459AF57A.5080304@isi.edu> (Joe Touch's message of "Tue, 02 Jan 2007 16:14:50 -0800") References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> Message-ID: Joe Touch writes: > FYI,Internet MSS's are usually in the 500-byte range in general. A > 5KB file would take 10 packets and be over by the 4th round. Um, the Internet MSS is usually 1460 bytes, except where it is hacked to between 1300 and 1400 bytes to avoid issues with broken Path MTU Detection in the presence of links with an MTU slightly smaller than 1500 (mostly ADSL links). Packets around 500 bytes have become quite rare on the Internet today. -- Simon.
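The arithmetic behind Joe's and Simon's figures can be checked with a small sketch. It assumes an IPv4 path with 20-byte IP and 20-byte TCP headers and no options (so MSS = MTU - 40), and an idealized slow start that begins at one segment and doubles the window every round trip, with no loss and no receiver-window limit. The function names are illustrative only, not taken from any real stack.

```python
import math


def mss_from_mtu(mtu, ip_header=20, tcp_header=20):
    """MSS for a given MTU, assuming IPv4 and TCP headers without options."""
    return mtu - ip_header - tcp_header


def slow_start_rounds(file_bytes, mss, initial_window=1):
    """Round trips needed to send file_bytes under idealized slow start.

    Assumes no loss, no receiver-window limit, and a congestion window
    (in segments) that doubles after every round trip.
    """
    segments = math.ceil(file_bytes / mss)
    window, sent, rounds = initial_window, 0, 0
    while sent < segments:
        sent += window      # a full window goes out this round
        window *= 2         # slow start doubles cwnd per RTT
        rounds += 1
    return rounds


print(mss_from_mtu(1500))                 # 1460
print(slow_start_rounds(5000, 500))       # 10 segments -> 4 rounds
print(slow_start_rounds(5000, 1460))      # 4 segments -> 3 rounds
```

Under these assumptions the 5 KB transfer is over in the 4th round with 500-byte segments, as Joe says, and in the 3rd round with the 1460-byte MSS Simon cites; either way such a short transfer ends while still in slow start.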
From touch at ISI.EDU Sun Jan 7 08:28:02 2007 From: touch at ISI.EDU (Joe Touch) Date: Sun, 07 Jan 2007 08:28:02 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> Message-ID: <45A11F92.3000102@isi.edu> Simon Leinen wrote: > Joe Touch writes: >> FYI,Internet MSS's are usually in the 500-byte range in general. A >> 5KB file would take 10 packets and be over by the 4th round. > > Um, the Internet MSS is usually 1460 bytes, except where it is hacked > to between 1300 and 1400 bytes to avoid issues with broken Path MTU > Detection in the presence of links with an MTU slightly smaller than > 1500 (mostly ADSL links). > > Packets around 500 bytes have become quite rare on the Internet today. http://netweb.usc.edu/~rsinha/pkt-sizes/ http://tracer.csl.sony.co.jp/mawi/samplepoint-C/2005/200510250900.html 'Better connected' sites show larger packet sizes (shown in the USC traces), but smaller packets are still used, and the average size depends on the protocol (CSL traces). Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From Anil.Agarwal at viasat.com Sun Jan 7 10:39:47 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Sun, 7 Jan 2007 13:39:47 -0500 Subject: [e2e] Are we doing sliding window in the Internet?
References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu><459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <45A11F92.3000102@isi.edu> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A357E@VGAEXCH01.hq.corp.viasat.com> Joe Touch wrote - >>> FYI,Internet MSS's are usually in the 500-byte range in general. A >>> 5KB file would take 10 packets and be over by the 4th round. >> >> Um, the Internet MSS is usually 1460 bytes, except where it is hacked >> to between 1300 and 1400 bytes to avoid issues with broken Path MTU >> Detection in the presence of links with an MTU slightly smaller than >> 1500 (mostly ADSL links). >> >> Packets around 500 bytes have become quite rare on the Internet today. > http://netweb.usc.edu/~rsinha/pkt-sizes/ > http://tracer.csl.sony.co.jp/mawi/samplepoint-C/2005/200510250900.html > 'better connected' sites show larger packet sizes (show in the USC > traces), but that smaller packets are still used, and that the average > size depends on the protocol (CSL traces). Even though smaller packet sizes are observed on the net, depending on protocol and application, that does not imply that the MSS or path MTU is small. Some applications simply send small amounts of data at a time (telnet, HTTP GETs, etc.). I suspect the MSS is on the order of 1300-1460 bytes, even in these traces. Regards, Anil From avg at kotovnik.com Sun Jan 7 13:50:11 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Sun, 7 Jan 2007 13:50:11 -0800 (PST) Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net> Message-ID: On Sat, 6 Jan 2007, Lynne Jolitz wrote: > But that would require people to reach out to others, put skin in the > game, and take a risk.
It requires trust and mutual respect. It's much > easier to complain and expect someone else to do the work. And it's much > easier to ignore complaints because there is too much work to do > already. > And that's why the marketplace is the default. It's not the best > solution, but it is a solution. Lynne - I think you meant "commercial" vs "community", not "market" vs "collective". The oft-repeated notion that there's anything superior to market is complete nonsense. Market is *any* kind of voluntary exchange and cooperation. That includes contributing resources and labor in order to gain social status, reputation, or a sense of belonging to a community. Not all goods are material, and not all exchanges in a marketplace are intermediated with money (or can be priced). There's really no boundary between "for-profit" and "non-profit" activities, and in real-life commerce every activity includes both - one gains not only profit, but also reputation, recognition and such. Everything else (e.g. tax-funded projects, work required by law, etc.) is fundamentally involuntary and cannot exist without threats of violence towards non-cooperators or simply those who disagree. This reduction to fundamentals not only shows that the market is the best solution; it clearly shows that it is the only possible ethical solution. Sorry for the off-topic. --vadim From lynne at telemuse.net Sun Jan 7 14:29:28 2007 From: lynne at telemuse.net (Lynne Jolitz) Date: Sun, 7 Jan 2007 14:29:28 -0800 Subject: [e2e] Are we doing sliding window in theInternet? In-Reply-To: <45A0E8A8.5080903@web.de> Message-ID: <000801c732ab$4b48b100$6e8944c6@telemuse.net> I think this rant illustrates my point to Greg perfectly as to the pitfalls of getting buy-in in open source and working in a respectful and considerate manner. :-) Lynne Jolitz. > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Detlef Bosau > Sent: Sunday, January 07, 2007 4:34 AM > To: end2end-interest at postel.org > Cc: Lynne Jolitz; frank.duerr; Daniel Minder > Subject: [e2e] Borat Science. Was: Re: Are we doing sliding window in > theInternet? > > > > Once again: NO! > > > > First of all: What is Linux? > > A job application of mine once was rejected, one question was: Do you > know Linux? > > When it came to Unix, I mentioned several flavours of Linux-clones, I'm > familiar with - I forgot about Linux. Therefore, in the eyes of that > employer, I was an idiot. > > Excuse me, I use Linux in my home since 1993, that's not that long but > it's perhaps longer that many kids in some human ressources > department use computers at all. > > > > O.k. For a even more bad flame on this issue, please refer to RMS' well > known talk on Linux and Free Software. > > I just remember that one day even in some university allegedly a > department's chair had said that Linux were more realistic than the NS2. > Here in Germany, we have a "joke". Roughly translated: At night, it's > colder than outside. > That's the same scientific level. > > What is "realistic"? What is _reality_? > > I only can talk about standards and whether a software is compliant to > them or not. > > The problem with Linux is, that it is "positioned" on the market as a > competitor for M$ products and that their are growing commercial > interests behind it - and no adequate commercial responsibility and > accountability at the same time. So, Linux lost its virginity when it is > taken as a scientific research system and it never achieved maturity > when it comes to a commercial accountability. > > I use Linux because it is free, it is sufficient for my purposes.
But I > don't accept this "Linux religion" which appears to be continously > spreading. > > The very point is a different one. > > As I said some days ago, scientific research starts with a problem > statement, than we investigate whether there exist solutions or whether > the problem can be solved at all and evaluate solutions and approaches. > Perhaps we consider new ones if they are better than existing ones or > even the first ones to exist. > > That is, to my understanding, proper science. > > And it absolutely doesn't matter whether wie run TCP/IP on Linux, M$ > systems, AIX, HP-UX, SunOS or even the KA9Q stack. > > So, what are we talking about here? > > Should we do advertising for certain operating systems? > > Or should we talk about end to end issues in distributed systems? > > Here in Germany standards sometimes are respected the same way as an act > of parliament. E.g. we have something called "Technischer > Ueberwachungsverein", roughly: Technical Supervisory Association. If you > own a car, you have to persent this to this association every two years > in order to make sure that your car complies to the technical > regulations here in Germany. And if you don't do so, you are not allowed > to use this car in the public road traffic otherwise it would be a > criminal offense. And it absolutely doesn't matter in this context if > you use a Volkswagen or a BMW. (Thanks to Professor Schrempp > Mercedes-Benz does not exist any longer. There is some nostalgic trade > mark which remembers us at these cars.) So, even you have a "star" at > your hood this won't help you if the test badge is missing. > > So, we do not experiment with different brake, steering wheels etc. in > the public road traffic and count the victims of deadly accidents > afterwards. > > Instead we _first_ define standards, _then_ we make sure that cars used > in Germany comply to these. Otherwise these cars must not be used. Period.
> > I once talked to a colleague who told me how this is handled in some > country where he spent his vacation. IIRC they had an extemely > scientific way for brake testing there: The experiment. Roughly spoken: > Put a child against a wall, tell the driver to brake timely before the > wall - and if the child is still alive afterwards the brake may have > worked sufficiently fine. > > Sometimes, this approach is called "Borat Science". > > > Lynne Jolitz wrote: > > Yes, Greg. You're right. Buy-in is difficult to achieve and > maintain, especially in open source. As I also went on to say in > that same email you quote: > > "But if it's not worth the time and effort for the academic > side to take on this charge, the marketplace will have to serve instead." > > > > People are very good at finding reasons to justify inaction on > their part, and it is frustrating to even try for something > better. That takes vision and risk. > > > > Excuse me, but what exactly do you call "inaction" here? I always see a > vivid discussion here. Many papers are published - much more than I can > read. Problems are identified and solved. Where is "inaction" here? In > addition: When will the first M$-guy come to this discussion and will > claim that the academic community has to fix what they don't get handled > in Redmond? Do you happen to mix up the task of industrial / commercial > implementation and proper academic research? > > If one were to set up such an arrangement with any eye towards > the long-term, wouldn't it be wise to find an approach that would > bring in parties and allow them to all benefit from an accord? > Isn't it in the best interests of OS and networking > > Of course! That's to my understandig the purpose of the IETF. _That's_ > the venue. > > developers, academics, and scientists to make sure things work well? > > > > But that would require people to reach out to others, put skin > in the game, and take a risk. It requires trust and mutual > respect.
It's much easier to complain and expect someone else to > do the work. And it's much easier to ignore complaints because > there is too much work to do already. > > > > Excuse me, I have no one to do my work. I'm a single unemployed male and > I have to do _any_ of my work on my own. And perhaps, some day this is > reckognized. If not? Bad luck. > > So, _please_ don't tell me anything about risks before yo know what > you're talking about. > > I try to take part in the academic discussion _without_ any help or > assistance. When I try to publish a paper, I even don't know who will > pay to possbible conference fees. That's all my own risk. Perhaps, some > time this will pay. For the moment, it doesn't. Howver, there is no > opportunity for me to get a job, so I try to do some scientific work. > > _Without_ any help by the IETF or any others. > > Perhaps, this requires to do some homework. When something does not > work, you will even have to spend a night on it or a weekend. > > But please don't talk about taking a risk here. > > > And that's why the marketplace is the default. It's not the > best solution, but it is a solution. > > > > > > The marketplace has thrown me out. > > I'm a single male, unemployed for 3 years now, aged 43. For the > marketplace, I'm not longer a human being. I graduated in 1992, so for our > employment centre and our human ressources departments I'm regarded as > an "unskilled worker". > > So, I take a risk, i.e. that Joe throws me out of this list when I say > this, but is my honest opinion: Please leave me alone with this McKinsey > attitude! > > > > From lynne at telemuse.net Sun Jan 7 15:34:05 2007 From: lynne at telemuse.net (Lynne Jolitz) Date: Sun, 7 Jan 2007 15:34:05 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: Message-ID: <000c01c732b4$5244ef60$6e8944c6@telemuse.net> My comments were in the context of harnessing e2e expertise to make sure that experimental networking changes made in releases carefully considered congestion and fairness. If that cannot be achieved, then the marketplace will prevail, with unpredictable consequences to network performance and reliability. I'm afraid a discussion of general economic paradigms is off topic. :-) Lynne Jolitz. > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Vadim Antonov > Sent: Sunday, January 07, 2007 1:50 PM > To: Lynne Jolitz > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Are we doing sliding window in the Internet? > > > On Sat, 6 Jan 2007, Lynne Jolitz wrote: > > > But that would require people to reach out to others, put skin in the > > game, and take a risk. It requires trust and mutual respect. It's much > > easier to complain and expect someone else to do the work. And it's much > > easier to ignore complaints because there is too much work to do > > already. > > > And that's why the marketplace is the default. It's not the best > > solution, but it is a solution. > > Lynne - I think you meant "commercial" vs "community", not "market" > vs "collective". The oft-repeated notion that there's anything superior > to market is complete nonsense. > > Market is *any* kind of voluntary exchange and cooperation. That > includes contributing resources and labor in order to gain social status, > reputation, or sense of belonging to a community. Not all goods are > material, and not all exchanges in a marketplace are intermediated with > money (or can be priced).
There's really no boundary between "for-profit" > and "non-profit" activities, and in the real-life commerce every activity > includes both - one gains not only profit, but also reputation, > recognition and such. > > Everything else (i.e. tax-funded projects, work required by law, etc) is > fundamentally involuntary and cannot exist without threats of violence > towards non-cooperators or simply those who disagree. > > This reduction to fundamentals not only shows that the market is the best > solution; it clearly shows that it is the only possible ethical solution. > > Sorry for the off-topic. > > --vadim > > From detlef.bosau at web.de Sun Jan 7 15:47:02 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 08 Jan 2007 00:47:02 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: Message-ID: <45A18676.3070408@web.de> Vadim Antonov wrote: > > This reduction to fundamentals not only shows that the market is the best > solution; it clearly shows that it is the only possible ethical solution. > > Oh yeah. Meritocracy as the only ethical form of government. Social Darwinism as the only acceptable basis for a modern society. Didn't we see in Europe during the last decade that this does _not_ work? > Sorry for the off-topic. > > I think many here will agree when I propose returning to end to end topics. In fact, I should not reply to this post, and I apologize for doing so. However, given my personal situation, which I wrote about, comments like yours hurt bitterly. Please don't see this as pure criticism. I have not thought this way all my life. It's simply my personal experience of life which makes me reconsider some of my opinions. And thus I simply share my current point of view that "market" is neither the only solution for all kinds of problems nor the best solution. However, it's quite often a very inhuman solution.
Detlef From L.Wood at surrey.ac.uk Sun Jan 7 16:12:24 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Mon, 08 Jan 2007 00:12:24 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <000f01c731e7$4fb7cb00$6e8944c6@telemuse.net> Message-ID: <200701080013.AAA16294@cisco.com> At Sunday 07/01/2007 13:50 -0800, Vadim Antonov wrote: >The oft-repeated notion that there's anything superior >to market is complete nonsense. human society is the thing that enables markets to exist; markets are established and made to be free and fair by law and regulation. Without the threat of punishment for misdeeds, there would be no free or fair markets. And there are many things that need to be done that markets do not and cannot address - and when markets are applied to them, they fail miserably. get a clue. L. From avg at kotovnik.com Sun Jan 7 20:35:29 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Sun, 7 Jan 2007 20:35:29 -0800 (PST) Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <000c01c732b4$5244ef60$6e8944c6@telemuse.net> Message-ID: On Sun, 7 Jan 2007, Lynne Jolitz wrote: > My comments were in the context of harnessing e2e expertise to make sure > that experimental networking changes made in releases considered > carefully congestion and fairness. If that cannot be achieved, then the > marketplace will prevail, with unpredictable consequences to network > performance and reliability. I guess in the end the network will be designed properly - i.e. resistant to any kind of behavior from the end hosts (including malicious). It is not that hard to achieve. Best-effort delivery with no fairness enforcement by the network itself is asking for trouble, and I'm surprised that it still persists.
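The network-side fairness enforcement Vadim argues for here (he goes on to mention turning on Fair Queueing at the gateways) can be illustrated with a small scheduler sketch. He does not name a specific variant, so this uses Deficit Round Robin (Shreedhar and Varghese), one well-known low-cost approximation of fair queueing, chosen purely for illustration. It models a single output link with all packets already queued, ignores drops and arrival timing, and the function and flow names are hypothetical. It shows the property being claimed: a flow that queues far more packets than another still gets only its byte share of the link.

```python
from collections import deque


def drr_schedule(queues, quantum, max_bytes):
    """Deficit Round Robin over per-flow FIFO queues at one output link.

    queues: dict mapping a flow id to a list of packet sizes in bytes,
            in arrival order (all packets assumed already queued).
    quantum: bytes of credit a backlogged flow earns per round.
    max_bytes: stop once the link has sent this much (models the link
               capacity available over some interval).
    Returns a dict mapping each flow id to the bytes it got to send.
    """
    pending = {flow: deque(pkts) for flow, pkts in queues.items()}
    deficit = {flow: 0 for flow in queues}
    sent = {flow: 0 for flow in queues}
    total = 0
    active = deque(flow for flow, q in pending.items() if q)

    while active and total < max_bytes:
        flow = active.popleft()
        deficit[flow] += quantum
        q = pending[flow]
        # Send head-of-line packets while this flow still has credit.
        while q and q[0] <= deficit[flow] and total < max_bytes:
            size = q.popleft()
            deficit[flow] -= size
            sent[flow] += size
            total += size
        if q:
            active.append(flow)   # still backlogged: back of the round
        else:
            deficit[flow] = 0     # idle flows do not bank credit
    return sent


# A "misbehaving" flow that queued 100 packets vs. a polite one with 20.
shares = drr_schedule(
    {"aggressive": [1000] * 100, "polite": [1000] * 20},
    quantum=1000,
    max_bytes=20000,
)
print(shares)  # {'aggressive': 10000, 'polite': 10000}
```

Because credit is counted in bytes rather than packets, a flow sending 1500-byte packets and one sending 500-byte packets with a 1500-byte quantum would also split the link evenly in bytes; the price is per-flow queue state at the router.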
If the network is enforcing fairness, there is nothing a misbehaving host (or millions of misbehaving hosts) could do to degrade performance as seen by other users (except as a part of coordinated DDoS attack on a specific target). How hard is it to turn the Fair Queueing knob to "on" on the gateways? The mass deployment of supposedly poorly behaving stacks is either a problem (in which case ISPs and equipment vendors will do the homework needed to protect their networks - or leave the ground to smarter competitors), or a non-issue (in which case nothing changes). In both cases, there's no problem in the long term. And "long" here is closer to days than years. So, why exactly should we care? --vadim From detlef.bosau at web.de Mon Jan 8 02:18:29 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 08 Jan 2007 11:18:29 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: Message-ID: <45A21A75.7080506@web.de> Vadim Antonov wrote: > > > I guess in the end the network will be designed properly - i.e. resistant > to any kind of behavior from the end hosts (including malicious). It is > And when will this be? When will this trial and error phase come to an end? Apparently, you have all the time in the world. > not that hard to achieve. The best-effort delivery with no fairness > enforcement by the network itself is asking for trouble, and I'm suprised > that it still persists. > > You probably want to read the congavoid paper or RFC 2581. You will learn from them that fairness enforcement _does_ exist. > If the network is enforcing fairness, there is nothing a misbehaving host > (or millions of misbehaving hosts) could do to degrade performance as seen > by other users (except as a part of coordinated DDoS attack on a > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I appreciate that you name even one problem yourself. > specific target). > > How hard it is to turn the Fair Queueing knob to "on" on the gateways?
> > > So, why exactly should we care? > > I don't know why "we" should care. But I'll frankly tell you what you should care about. First of all, you most probably want to get a good textbook on networking, because what you write on this topic simply makes my hair stand on end. The second is a piece of personal advice I already tried to give you yesterday. Perhaps you should reconsider your opinions from time to time based upon your personal life experience. Times are changing and so are opinions. It is always a good idea to put some kind of low-pass filter on opinions and to avoid both extreme positions and simple answers to complex questions. Finally, and I really mean it, we should keep politics out of this list. When we had birthday parties in our family or similar occasions, my father always gave me strong advice concerning topics of discussion: "NO sports, NO politics, NO religion." You can talk about anything but these. Believe me: My father was perfectly right. And for this list: You're welcome to contribute to the discussion of end to end issues. I apologize for posting on this issue again. Please, Lynne, Vadim, let us return to the subject of this list. Not only for our own benefit but for the benefit of all the other readers. In particular, the thread on sliding window is an interesting one and I learned a lot from it. Perhaps others find it interesting as well; at least there are far too many contributions for a boring thread. It would be a pity if people left the thread or even the list because of continuous off-topic posts on politics and similar issues. Thanks. Detlef From sisalem at fokus.fraunhofer.de Mon Jan 8 06:41:39 2007 From: sisalem at fokus.fraunhofer.de (sisalem@fokus.fraunhofer.de) Date: Mon, 8 Jan 2007 15:41:39 +0100 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A21A75.7080506@web.de> References: <45A21A75.7080506@web.de> Message-ID: <532648004.20070108154139@mail.iptel.org> Hello, >> not that hard to achieve. The best-effort delivery with no fairness >> enforcement by the network itself is asking for trouble, and I'm surprised >> that it still persists. >> > You probably want to read the congavoid paper or RFC 2581. > You will learn from that, that fairness enforcement _does_ exist. Just a short remark: I would assume that the definition of fairness here is that two TCP connections with the same RTT and packet size would receive the same bandwidth share. Hence, fairness enforcement is only partially done. Two TCP sessions with different congestion avoidance schemes (e.g., one with SACK and another one with Reno) will not achieve the same bandwidth share under the same RTT conditions (whether this is to be considered unfair, though, is another issue which has more to do with philosophy). And a UDP flow is not interested in fairness at all. Regarding the input about enforcing fairness in the network: I think that the painful experience with ATM and ABR has already taught us that network-based fairness enforcement schemes are theoretically great but too complex to be of practical use. cheers >> If the network is enforcing fairness, there is nothing a misbehaving host >> (or millions of misbehaving hosts) could do to degrade performance as seen >> by other users (except as a part of coordinated DDoS attack on a >> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > I appreciate that you name even one problem yourself. >> specific target). >> >> How hard is it to turn the Fair Queueing knob to "on" on the gateways? >> >> >> So, why exactly should we care? >> >> > I don't know why "we" should care. But I will frankly tell you what you > should care for. > First of all, you most probably want to care for a good textbook on > networking, because what you write on this topic simply makes my hair > stand on end.
> The second is a piece of personal advice I already tried to give you yesterday. > Perhaps you should reconsider your opinions from time to time based upon > your personal life experience. Times are changing and so are opinions. > It is always a good idea to put some kind of low-pass filter on opinions > and to avoid both extreme positions and simple answers to complex > questions. > Finally, and I really think so, we should keep politics out of this list. > When we had birthday parties in our family or similar occasions, I was > always given strong advice by my father concerning topics of discussion: > "NO sports, NO politics, NO religion." > You can talk about anything but these. > Believe me: My father was perfectly right. > And for this list: You're welcome to contribute to the discussion of > end-to-end issues. > I apologize for posting on this issue again. Please, Lynne, Vadim, let > us return to the subject of this list again. Not only for our own benefit > but for the benefit of all the other readers. Particularly > the thread on sliding window is an interesting one and I learned a lot > from it. Perhaps others find it interesting as well; at least there are > far too many contributions for a boring thread. It would be a pity if > people left the thread or even the list because of continuous > off-topic posts on politics and similar issues. > Thanks.
> Detlef -- Best regards, Dorgham mailto:sisalem at iptel.org From detlef.bosau at web.de Sun Jan 7 16:05:44 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 08 Jan 2007 01:05:44 +0100 Subject: [e2e] Hiccups and scheduling in mobile networks In-Reply-To: <459EB8F8.4060304@web.de> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> Message-ID: <45A18AD8.8090903@web.de> Joe Touch wrote: >> Variations in delivery times can be handled via PEPs that don't spoof >> ACKs, e.g., ones that pace the data and/or ACK paths, but don't actively >> participate in the communication. >> >> > And my humble comment was: > Really? I agree with you for the Remote Socket Architecture > (Schlager/Wolisz) because that architecture actually does not split > the connection but places the PEP mechanism at the application/socket > interface. > > Otherwise the problem is: When the bandwidth sender - splitter is, > e.g., the average bandwidth / rate splitter-sender but far less than > the maximum rate splitter / sender, then a simple router perhaps would > hardly store any data and thus hardly equalize the rate / delivery times. > Thierry describes delay spikes of several seconds. If we think about > UMTS, we can imagine a wireless link where nothing happens for up to > several seconds - thus no data at all is clocked out from the sender - > and then we have about 2 Mbps throughput for a short time - which is > perhaps much more than the actual Internet path can carry. In such a > scenario we want to have the router / splitter / PEP / whateverbox > buffer the data and equalize the rate variations. Can this be achieved > by pure pacing in the one or other direction? > > Detlef > > > O.k. So, I see: Splitting is unsellable :-) So the question is whether we really need it.
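The two-point hiccup model Detlef describes in the next paragraph works as follows. The link has a fixed physical rate, and each packet's txtime gets an extra stall that is either 0 s or 1 s, with the stall probability tuned so that a target average throughput is reached. That tuning reduces to one line of arithmetic: the mean time per packet, S/C + p*H, must equal S/T, so p = (S/T - S/C)/H. The Python sketch below is not ns-2 code. It assumes independent per-packet draws, 1500-byte (12000-bit) packets, a 10 Mbit/s physical rate and a hypothetical 384 kbit/s target, and it checks the formula with a quick Monte Carlo run.

```python
import random


def hiccup_probability(pkt_bits, link_bps, target_bps, hiccup_s):
    """Per-packet hiccup probability p so that mean throughput equals target_bps.

    Mean time per packet is pkt_bits/link_bps + p*hiccup_s, which must equal
    pkt_bits/target_bps, hence p = (pkt_bits/target_bps - pkt_bits/link_bps) / hiccup_s.
    Assumes hiccups are drawn independently for each packet.
    """
    p = (pkt_bits / target_bps - pkt_bits / link_bps) / hiccup_s
    if not 0.0 <= p <= 1.0:
        raise ValueError("target throughput not reachable with this hiccup length")
    return p


def simulate_throughput(n_pkts, pkt_bits, link_bps, hiccup_s, p, seed=1):
    """Monte Carlo check: send n_pkts back to back; each packet's txtime is
    pkt_bits/link_bps plus, with probability p, a stall of hiccup_s seconds."""
    rng = random.Random(seed)
    total_time = 0.0
    for _ in range(n_pkts):
        txtime = pkt_bits / link_bps
        if rng.random() < p:  # two-point distribution: 0 s or hiccup_s
            txtime += hiccup_s
        total_time += txtime
    return n_pkts * pkt_bits / total_time


if __name__ == "__main__":
    # Assumed example values: 1500-byte packets, 10 Mbit/s physical rate,
    # hypothetical 384 kbit/s average throughput, 1 s hiccups.
    S, C, T, H = 12000, 10e6, 384e3, 1.0
    p = hiccup_probability(S, C, T, H)
    print(f"hiccup probability per packet: {p:.5f}")
    print(f"simulated throughput: {simulate_throughput(200_000, S, C, H, p):.0f} bit/s")
```

With these assumed numbers p is about 0.03, so roughly one packet in 33 is held back by a 1 s stall. Because p scales with 1/H, longer spikes at the same average throughput simply become rarer. That makes it one way to vary burstiness while keeping the mean fixed.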
So, this weekend I spent some time adding hiccups to a quite complex network scenario: Sender-----(internet)--------BS----(mobile net)-------Receiver And the mobile net suffers from hiccups :-) What I would like to know (and AFAIK Andreas dealt with questions like these, therefore I put him on the cc: list) is "how bad" these hiccups may become. As I said before, Thierry Klein published a paper at Globecom 2004 on this issue. There, he observed delay spikes of up to two seconds. For the moment, I simply model the wireless link as a link with a constant high bandwidth (e.g. 10 Mbps), which reflects its _physical_ rate, and I add hiccup times to the serialization delay (i.e. txtime in NS2). These are drawn from a two-point distribution: Either the hiccup time is zero or it is 1 second. The probabilities are chosen such that a given average throughput is achieved. Of course that's extremely simplified. However: Is this reasonable as a first approach? I would appreciate any comment on this one. I would like to study different pacing techniques in this scenario, _intentionally_ without splitting. AFAIK, there is a variety of scheduling algorithms available for networks like GPRS or UMTS. So, the question is whether we have a, if extremely rough, "worst-case model" to get a feeling for what TCP has to cope with. The idea of my model above is to insert constant, say 1 second, delay spikes randomly into the flow, just in a way that I can estimate the average throughput on the link. Is this completely weird? Or does it sound reasonable? Thanks Detlef From braden at ISI.EDU Mon Jan 8 11:28:21 2007 From: braden at ISI.EDU (Bob Braden) Date: Mon, 08 Jan 2007 11:28:21 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040157.BAA18111@cisco.com> References: <459C4BF1.6060004@isi.edu> <45980C60.9020405@web.de> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <459C3237.4000709@isi.edu> <200701040028.AAA13798@cisco.com> <459C4BF1.6060004@isi.edu> Message-ID: <5.1.0.14.2.20070108112451.00ac3988@boreas.isi.edu> Lloyd Wood wrote: >Such citations would be informational rather than normative, and therefore >optional. > >Informational references tend to get left out of RFCs. > Indeed. By the standards of academia, IETF protocol publications, even the best of them, often suggest willful ignorance of earlier related work. And even within academic Computer Science, the level of reinvention is sometimes deplorable. Bob Braden From braden at ISI.EDU Mon Jan 8 11:33:45 2007 From: braden at ISI.EDU (Bob Braden) Date: Mon, 08 Jan 2007 11:33:45 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <459C4DD3.3010106@isi.edu> References: <200701040027.AAA13758@cisco.com> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <459B1B09.40301@isi.edu> <459B4834.1050304@isi.edu> <20070103214811.GA27322@grc.nasa.gov> <459C2960.7030407@isi.edu> <20070103225935.GA11407@hut.isi.edu> <459C416B.7040702@isi.edu> <200701040027.AAA13758@cisco.com> Message-ID: <5.1.0.14.2.20070108113020.03285388@boreas.isi.edu> > > >The question is "under what conditions is it permissible to override a >SHOULD". I would hope that would be clarified in an update to 2119, but >don't know what the state of that doc is... > >Joe In its original usage in RFC 1122-1123, SHOULD was applied where we could imagine relatively unusual or extreme conditions where the MUST might not apply. 
But the intent was that anyone who overrode a SHOULD ought to be able to present a credible argument to his/her peers to justify this deviation. In other words, you had better have a "DAMNED GOOD" (technical term) reason for it. Bob Braden >---------------------------------------- >Joe Touch >Sr. Network Engineer, USAF TSAT Space Segment > From detlef.bosau at web.de Mon Jan 8 14:04:30 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 08 Jan 2007 23:04:30 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <532648004.20070108154139@mail.iptel.org> References: <45A21A75.7080506@web.de> <532648004.20070108154139@mail.iptel.org> Message-ID: <45A2BFEE.8010401@web.de> sisalem at fokus.fraunhofer.de wrote: >> You will learn from that, that fairness enforcement _does_ exist. >> > Just a short remark: I would assume that the definition of fairness > here is that two TCP connections with the same RTT and packet size > would receive the same bandwidth share. > Hence, fairness enforcement is only partially done. I agree with you here. However, one could understand Vadim as saying that we have no fairness considerations at all. And that's simply wrong. > Two > TCP sessions with different congestion avoidance schemes (e.g., one > with SACK and another one with Reno) will not achieve the same > bandwidth share under the same RTT conditions (whether this is to be > considered unfair, though, is another issue which has more to do with > philosophy). And a UDP flow is not interested in fairness at all. > > From my point of view, you mix up at least three issues here. 1. It's a basic decision whether the Internet is built hierarchically or heterarchically. A heterarchical design is of course much more robust than a hierarchical one. A hierarchical design is only robust against the failure of up to n (n to be defined) nodes.
A heterarchical design will still work as long as there is even _one_ path left between two nodes which want to communicate. I don't know the discussions of the early 70s, because I was a schoolboy then, but I can imagine that robustness was a major issue then. If the Internet were designed hierarchically, you could provide admission control and QoS assignments etc., and then you could have fairness matching any criteria you desire. In a heterarchical design, it's much more difficult, if possible at all, to enforce arbitrary fairness schemes. 2. That TCP/Tahoe runs unfairly against TCP/Reno is not a structural problem. It goes without argument that TCP flavours are basically fair if all parties use the same one. In my opinion, this is one decisive argument not to play around with an arbitrary number of TCP flavours and see what happens, but to carefully consider which flavour is deployed and which consequences this will have. 3. It's always a concern whether protocols are responsive or not, or whether they are even TCP-friendly. In this context please allow the question: What is a "UDP flow"? If you use UDP, the task of fairness / congestion control / TCP-friendliness / responsiveness is passed to the application. > Regarding the input about enforcing fairness in the network: I think > that the painful experience with ATM and ABR has already taught us that > network-based fairness enforcement schemes are theoretically great but > too complex to be of practical use. > > That's perhaps even one more reason to use a heterarchical scheme. And as I stated quite some time ago: When I consider all the arguments why the Internet is supposed not to work, I'm always surprised that it works quite well :-) Detlef From avg at kotovnik.com Mon Jan 8 19:07:54 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Mon, 8 Jan 2007 19:07:54 -0800 (PST) Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <45A21A75.7080506@web.de> Message-ID: On Mon, 8 Jan 2007, Detlef Bosau wrote: > When will this trial-and-error phase come to an end? Apparently, you > have all the time in the world. No, I have had experience in networking since the days when 2400bps was "high speed", and I remember what computers with no integrated circuits in them looked like. And because of that I know that a lot of things considered hard or impossible to do in a few years aren't. > You probably want to read the congavoid paper or RFC 2581. > > You will learn from that, that fairness enforcement _does_ exist. Ah, you don't understand what I said - namely that fairness enforcement should be done in the network, and not by the software in end hosts. Besides, TCP is not fair. For example, long-RTT flows always lose to short-RTT flows in non-stationary (i.e. real-life) scenarios. > First of all, you most probably want to care for a good textbook on > networking, because what you write on this topic simply makes my hair > stand on end. Ah, it looks like you haven't been around for long, and I have done projects in fields other than networking in recent years. Otherwise you'd be more inclined to listen to what I say. After all, I'm the guy who built backbones in all 24 time zones (including the first commercial T-3 backbone, and the first backbone which did CIDR), wrote networking code for the BSD kernel when I worked at BSD Inc. (though, admittedly, not the TCP stack - hacking TCP stacks is my present occupation), and invented the only practical method for doing packet routing at speeds over 10Gbps. > The second is a piece of personal advice I already tried to give you yesterday. I'm not an unemployed engineer. In fact, I made my first million many years ago. My personal life is quite satisfactory. Why exactly should I listen to your personal advice? As for politics, well.
Anyone who's doing any engineering should have some grasp of basic economics - because the success of engineering projects often depends not on the technical merits of the design but rather on its economics. An engineer who's oblivious to the economic and sociological ramifications of his decisions is, let me put it mildly, incompetent. In the topic at hand the issue of which part of the overall system (network or end-host software) performs the fairness enforcement is neutral from the technical point of view. Technically, it'll work either way. It is not neutral from the point of view of an economist - having a shared resource with no admission control creates the tragedy of the commons. Meaning that it creates incentives for people to cheat and overexploit the shared resource, until it becomes useless (this, incidentally, is the problem with socialism in general). Therefore the appeal to developers to be conscientious in the way they design network stacks and applications is not going to work. On the other hand, long-haul ISPs have pretty good reason to protect the value of their resources - i.e. the networks. So far, they do not perceive overexploitation as a problem. That will change as end-users en masse start to exchange huge video files - and, consequently, start to use software which does cheat - it does make a lot of difference for them. Any P2P software which opens multiple TCP sessions for simultaneous download essentially overrides the rough fairness of the cooperative congestion control. The end-point based congestion control and fairness enforcement, while quite widely deployed, were a bad architectural decision - economically. People who made that decision didn't pay much attention to the economics - they were doing research, not doing business. (To their credit back then even getting data through reliably, without congestion collapses, was a big deal; and this was a workable approach.
Things like FQ and RED were invented much later - and back then doing fancy packet processing in backbone gateways was out of the question. Heck, people still think doing longest-prefix search with Patricia tries is a good idea, though we're no longer in the "horror, we're running out of 16Mb RAM on the darn backbone router" era.) Well, the reality is starting to catch up - the name of the game in the ISP business is no longer "grab as much ground as you can and damn the cost" but, rather, "drive the costs down". The profit margins are getting slim, and the packet transport is no longer a novelty, but simply another commodity. It is no longer feasible just to throw bandwidth at the problem; there's not going to be another mad rush to lay fiber anytime soon. --vadim From avg at kotovnik.com Mon Jan 8 20:08:18 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Mon, 8 Jan 2007 20:08:18 -0800 (PST) Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <532648004.20070108154139@mail.iptel.org> Message-ID: On Mon, 8 Jan 2007 sisalem at fokus.fraunhofer.de wrote: > Regarding the input about enforcing fairness in the network: I think > that the painful experience with ATM and ABR has already taught us that > network-based fairness enforcement schemes are theoretically great but > too complex to be of practical use. The ATM problems are/were due to its fundamental dependency on virtual circuits (and the inability to route them at high rates), and having the whole bandwidth reservation boondoggle as a design requirement. FQ does not require either. --vadim From tim at ivisit.com Mon Jan 8 20:52:43 2007 From: tim at ivisit.com (Tim Dorcey) Date: Mon, 08 Jan 2007 20:52:43 -0800 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: Message-ID: <012f01c733aa$0067ad80$0300a8c0@int.eyematic.com> > Any P2P software which opens multiple TCP sessions for simultaneous > download essentially overrides the rough fairness of the cooperative > congestion control. I wonder how much of BitTorrent's performance is due to this effect. Might it do almost as well if a receiver opened up multiple TCP sessions to the best single source? I get the point that accessing multiple sources simultaneously deals with asymmetry in upload/download speeds. But something makes me think this washes out in the aggregate if enough torrents are running. I am ignorant about actual network technology, though. Is the asymmetric upload/download speed common with consumer broadband a function of the last-mile link technology? Or something else? Tim From dga+e2e at cs.cmu.edu Mon Jan 8 21:35:11 2007 From: dga+e2e at cs.cmu.edu (Dave Andersen) Date: Tue, 09 Jan 2007 00:35:11 -0500 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: Message-ID: <45A3298F.7070502@cs.cmu.edu> Vadim Antonov wrote: > On Mon, 8 Jan 2007, Detlef Bosau wrote: > >> First of all, you most probably want to care for a good textbook on >> networking, because what you write on this topic simply makes my hair >> stand on end. And in return, might I kindly suggest: http://www.amazon.com/Emily-Posts-Etiquette-16th-Peggy/dp/0062700782 > It is not neutral from the point of view of an economist - having a shared > resource with no admission control creates the tragedy of the commons. Meaning > that it creates incentives for people to cheat and overexploit the shared > resource, until it becomes useless (this, incidentally, is the problem > with socialism in general). Though in the case of TCP, it takes a certain amount of effort to cheat. Absent an easy-to-use mechanism in a popular OS, most people aren't going to do it.
If you will, there's a certain cost to cheating (be that the cost of tweaking your stack, writing a new protocol, or installing some "accelerator" program that does it for you). > Therefore the appeal to developers to be conscientious in the way they > design network stacks and applications is not going to work. On the other > hand, long-haul ISPs have pretty good reason to protect the value of their > resources - i.e. the networks. So far, they do not perceive > overexploitation as a problem. That will change as end-users en masse > start to exchange huge video files - and, consequently, start to > use software which does cheat - it does make a lot of difference for them. > Any P2P software which opens multiple TCP sessions for simultaneous > download essentially overrides the rough fairness of the cooperative > congestion control. > > The end-point based congestion control and fairness enforcement, while > quite widely deployed, were a bad architectural decision - economically. > People who made that decision didn't pay much attention to the economics - > they were doing research, not doing business. (To their credit back then But if you're making an economic argument, you have to consider all of the costs. There is a cost to enforcement in the network, in hardware and complexity. There is a cost to billing by usage, both in actual costs and in customer satisfaction. There most likely exists a point at which the costs of enforcement or the costs of accounting are lower than the costs imposed by cheating users. But in an environment where capacity is still increasing exponentially and where clueful network operators and programmers are not getting any cheaper, it's not clear to me when we'll reach that point. Some people may argue we already have; I don't think that we're there _for the majority of uses_.
It may well be that there are applications that want to pay more for better service today (VoIP, remote open-heart surgery), but it's not clear yet that the economic benefit to ISPs for satisfying that class of apps is worth the costs. (Particularly when most of the VoIP people can usually be satisfied by simply doing prioritization at the edge.) It's very hard to quantify the costs of things like "complexity", "more code", and "users prefer flat-rate billing", but they do exist. > Well, the reality is starting to catch up - the name of the game in the > ISP business is no longer "grab as much ground as you can and damn the > cost" but, rather, "drive the costs down". The profit margins are getting > slim, and the packet transport is no longer a novelty, but simply another > commodity. It is no longer feasible just to throw bandwidth at the > problem; there's not going to be another mad rush to lay fiber anytime > soon. The nice thing about today's environment is that the fiber is already in the ground. Adding more capacity is doable by "only" upgrading the transceivers, adding more wavelengths, upgrading to faster multimillion-dollar routers, etc. :) I suspect we're saying the same thing from different perspectives, but have possibly different opinions about where we are on the cost curve. -Dave From avg at kotovnik.com Mon Jan 8 21:44:27 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Mon, 8 Jan 2007 21:44:27 -0800 (PST) Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45A315F4.90500@cs.cmu.edu> Message-ID: On Mon, 8 Jan 2007, Dave Andersen wrote: > http://www.amazon.com/Emily-Posts-Etiquette-16th-Peggy/dp/0062700782 Oh, I'm never the first to use ad hominem, but I also won't let anyone try that on me without getting a taste of their own medicine. > Though in the case of TCP, it takes a certain amount of effort to cheat. > Absent an easy-to-use mechanism in a popular OS, most people aren't > going to do it.
Cheating TCP is very simple - it is sufficient to open several TCP sessions. All software written specifically to download large files does that. > But if you're making an economic argument, you have to consider all of > the costs. There is a cost to enforcement in the network, in hardware > and complexity. There is a cost to billing by usage, both in actual > costs and in customer satisfaction. Actually, I didn't talk about usage-based billing. Customers tend to dislike it (people like to have predictable expenses), and switch to flat-rate plans whenever they can afford them. What is really needed is fairness enforcement, not usage accounting. In a fair network you pay for the ability to have a guaranteed use of some fraction of network capacity, plus use of proportionally allocated unused capacity. Ideally, the fee should be proportional to the guaranteed fraction. It does not have to be ideal, just somewhat effective. > There most likely exists a point at which the costs of enforcement or > the costs of accounting are lower than the costs imposed by cheating > users. But in an environment where capacity is still increasing > exponentially and where clueful network operators and programmers are > not getting any cheaper, it's not clear to me when we'll reach that > point. Mmm... demand is expanding faster than capacity. Right now the choke point is the distribution networks, but that is slowly being fixed (in the US). Currently DSL providers in the US have something like 1:30 oversubscription, and P2P has the capacity to soak up all of that. In the past year the DSL service in major population centers got noticeably slower during peak times, and customer dissatisfaction will eventually force ISPs to decrease the oversubscription.
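Two of Vadim's claims can be put side by side. First, opening several TCP sessions is enough to beat per-flow fairness. Second, fairness enforced in the network (fair queueing) removes that incentive. The Python sketch below compares a max-min fair ("water-filling") split of one bottleneck when connections are counted individually and when they are grouped by source host. It rests on stated assumptions: a single bottleneck, greedy senders, and fair queueing idealized as exact max-min sharing. The host names and numbers are hypothetical, and this is not how any particular router implements FQ. Keying the same function by prefix instead of host would follow the spirit of Vadim's per-prefix suggestion, although his proposal uses weights, which this sketch omits.

```python
from collections import defaultdict


def max_min_share(capacity, demands):
    """Water-filling max-min fair allocation of one bottleneck.

    demands maps an entity (flow, host, prefix, ...) to the rate it wants;
    use float("inf") for a greedy sender."""
    alloc = {}
    remaining = dict(demands)
    cap = float(capacity)
    while remaining:
        fair = cap / len(remaining)
        # entities that want no more than the fair share are fully satisfied
        satisfied = {k: d for k, d in remaining.items() if d <= fair}
        if not satisfied:
            # everyone left is limited by this bottleneck: split the rest equally
            for k in remaining:
                alloc[k] = fair
            break
        for k, d in satisfied.items():
            alloc[k] = d
            cap -= d
            del remaining[k]
    return alloc


def per_flow_fairness(capacity, flows):
    """Every TCP connection counts separately (roughly what cooperative
    congestion control aims for); returns the total rate per host."""
    alloc = max_min_share(capacity, {i: d for i, (_, d) in enumerate(flows)})
    by_host = defaultdict(float)
    for i, (host, _) in enumerate(flows):
        by_host[host] += alloc[i]
    return dict(by_host)


def per_host_fairness(capacity, flows):
    """Idealized fair queueing keyed by source host: extra connections
    from the same host do not buy a bigger share."""
    demand = defaultdict(float)
    for host, d in flows:
        demand[host] += d
    return max_min_share(capacity, dict(demand))


if __name__ == "__main__":
    greedy = float("inf")
    # hypothetical bottleneck of 100 Mbit/s: one P2P host with 10 connections,
    # four other hosts with one connection each
    flows = [("p2p", greedy)] * 10 + [(f"host{i}", greedy) for i in range(4)]
    print(per_flow_fairness(100.0, flows))
    print(per_host_fairness(100.0, flows))
```

Holding 10 of the 14 connections, the P2P host gets about 71 of the 100 Mbit/s under per-flow fairness. Under per-host fair queueing it gets 20, the same as everyone else.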
The backbone capacity has hard physical limits - getting smaller dispersion in the fiber or reducing the size of WDM frequency bands can only go so far; the remaining option (just putting in more fibers) is generally limited by what's already in the ground - with no prospect of another dot-com-style financial insanity on the horizon. Besides, "lay more fiber" is not exponential; it's linear in the bandwidth-to-cost ratio. > (Particularly when most of the VoIP people can usually be satisfied by > simply doing prioritization at the edge.) Yep. That's because right now backbones are faster than the edge - given the present duty cycle. The duty cycle is changing from 2-3% to 20-30% as video over the Internet becomes popular. This will shift (or is already shifting) the bottleneck back to the backbones - to the place where it was 10-15 years ago. > It's very hard to quantify the costs of things like "complexity", "more > code", and "users prefer flat-rate billing", but they do exist. The funny part is that most routers can do FQ out of the box. Just enabling that will reduce the misbehaving stack/application problem to the point of insignificance. A better design would track FQ weights on a per-prefix basis (and sum them when routes are aggregated) to improve fairness on larger scales. > The nice thing about today's environment is that the fiber is already in > the ground. Adding more capacity is doable by "only" upgrading the > transceivers, adding more wavelengths, upgrading to faster multimillion-dollar > routers, etc. :) Unfortunately, it is not that simple. You cannot pack information denser than the Shannon limit for a given level of noise, and you cannot increase S/N by pumping more power into fibers without causing non-linearity and things like Raman scattering. So the way to expand is to put more equipment in parallel and reduce leg distances. It means expensive things like building more amplifier stations in the middle of nowhere, and beefing up CO space, power, and cooling.
The high-speed stuff is hot, and the power budget quickly gets into the megawatt range. All the while prices for residential access are getting down to a few dozen dollars per Mbps of downlink capacity. The market is not growing very fast in financial terms. So it is either cost-cutting or out-of-business. There's a huge disparity between the capacity of PCs to source/sink traffic (modern desktop CPUs can easily run 200-300Mbps of TCP traffic with a suitable NIC) and the capacity of the network. This creates, well, an interesting situation - the demand is potentially huge. > I suspect we're saying the same thing from different perspectives, but > have possibly different opinions about where we are on the cost curve. Yep. But at least it is helpful to think about economics rather than go wishing that the world was perfect and everybody did the Right Thing :) --vadim From detlef.bosau at web.de Mon Jan 8 23:50:42 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 09 Jan 2007 08:50:42 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: Message-ID: <45A34952.5090809@web.de> Vadim Antonov wrote: >> nt _does_ exist. >> > > Ah, you don't understand what I said - namely that fairness enforcement should > be done in the network, and not by the software in end hosts. > > Besides, TCP is not fair. For example, long-RTT flows always lose to > short-RTT flows in non-stationary (i.e. real-life) scenarios. > > Please read my comments on Dorgham Sisalem's post yesterday. That long-RTT flows lose to short-RTT flows results from the probing scheme used in TCP: When a flow increases its window by one segment per round, a short-RTT flow increases faster than a long-RTT flow. However, we must use some probing scheme, and this probing scheme should be adaptive to the path. And of course we have to take a flow's RTT into account for a probing scheme, because we must take into account how fast the network's reactions to probing become visible.
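Detlef's explanation is that a window growing by one segment per round grows faster in wall-clock time when rounds are shorter. A toy fluid model can check this. The Python sketch below is an assumption-laden illustration, not ns-2 and not any real TCP stack. Two greedy flows share one bottleneck, RTTs are fixed with no queueing delay, and losses are synchronized: every flow halves its window whenever the summed rate exceeds capacity. Under those assumptions a flow's window gains T/RTT segments between losses and its rate is window/RTT, so the throughput ratio approaches (RTT_long/RTT_short)^2. With independent losses at equal loss rates, the square-root approximation of Mathis et al., throughput ≈ (MSS/RTT)·sqrt(3/(2p)), gives a dependence closer to 1/RTT instead.

```python
def aimd_shares(rtts, capacity_pps, duration=100.0, dt=0.001):
    """Toy fluid model of TCP congestion avoidance over one shared bottleneck.

    Each flow adds one segment to its window per RTT (dw/dt = 1/RTT) and sends
    at w/RTT packets per second. Whenever the summed rate exceeds the
    bottleneck capacity, every flow halves its window (synchronized loss).
    Returns each flow's fraction of all packets sent."""
    w = [1.0] * len(rtts)
    sent = [0.0] * len(rtts)
    for _ in range(int(duration / dt)):
        rates = [wi / r for wi, r in zip(w, rtts)]
        if sum(rates) > capacity_pps:
            # multiplicative decrease, never below one segment
            w = [max(1.0, wi / 2) for wi in w]
            rates = [wi / r for wi, r in zip(w, rtts)]
        for i, r in enumerate(rtts):
            sent[i] += rates[i] * dt
            w[i] += dt / r  # additive increase: +1 segment per RTT
    total = sum(sent)
    return [s / total for s in sent]


if __name__ == "__main__":
    # hypothetical scenario: 20 ms vs 100 ms RTT over a 1000 packet/s bottleneck
    short, long_ = aimd_shares([0.02, 0.1], 1000.0)
    print(f"short-RTT share {short:.3f}, long-RTT share {long_:.3f}")
```

For 20 ms against 100 ms, this sketch should give the short-RTT flow the overwhelming majority of the bottleneck, roughly the 25:1 ratio predicted above once the start-up transient has passed. With equal RTTs the two shares come out identical.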
WRT leaving fairness to the network: I have no first-hand experience with ABR. But I think Dorgham told us about the experiences here yesterday. To make a long story short: I think it's already said in the Twelve Networking Truths (RFC 1925), but I think it's generic: There are always arbitrarily many simple and wrong solutions to complex problems :-) BTW: Just a pointer to literature: For a network-based congestion control approach (hopefully I understood this work correctly) you should read the PhD thesis by Srinivasan Keshav. IIRC this is some really interesting work on resource allocation in a complex network. Detlef From Arnaud.Legout at sophia.inria.fr Tue Jan 9 01:07:07 2007 From: Arnaud.Legout at sophia.inria.fr (Arnaud Legout) Date: Tue, 09 Jan 2007 10:07:07 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <012f01c733aa$0067ad80$0300a8c0@int.eyematic.com> References: <012f01c733aa$0067ad80$0300a8c0@int.eyematic.com> Message-ID: <45A35B3B.3020702@sophia.inria.fr> Hello, Tim Dorcey wrote: > > I wonder how much of BitTorrent's performance is due to this effect. Might it do > almost as well if a receiver opened up multiple TCP sessions to the best > single source? > > I get the point that accessing multiple sources simultaneously deals with > asymmetry in upload/download speeds. But something makes me think this > washes out in the aggregate if enough torrents are running. I am ignorant > about actual network technology, though. Is the asymmetric upload/download speed > common with consumer broadband a function of the last-mile link technology? > Or something else? This is not the point with BT. It is not a web client doing parallel downloads from open servers. It is a P2P protocol that manages to enforce cooperation among selfish peers. This is the main reason for its efficiency. If you want to get data you have to give data. The faster you give, the faster you receive, thus a strong sharing incentive.
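Arnaud's point that BitTorrent's efficiency comes from reciprocation ("the faster you give, the faster you receive") rests on each peer's choking decision. The Python sketch below illustrates only that reciprocation idea. It is hypothetical and not taken from any BitTorrent client or specification. A peer uploads to the interested peers that have recently sent it the most data, plus one randomly chosen "optimistic" slot so that newcomers can get started. Several details are assumptions: rates are taken to be measured elsewhere, and the number of regular slots (`n_regular`), how often the decision is re-evaluated, and how seeds behave are all left open. Real clients differ in these details.

```python
import random


def choose_unchoked(download_rates, interested, n_regular=3, rng=None):
    """Simplified tit-for-tat choke decision for one peer (hypothetical sketch).

    download_rates: peer id -> rate at which that peer recently sent us data
    interested:     peers that currently want data from us
    Returns the set of peers we upload to until the next re-evaluation."""
    rng = rng or random.Random()
    candidates = [p for p in download_rates if p in interested]
    # reciprocation: prefer the peers that have recently given us the most
    candidates.sort(key=lambda p: download_rates[p], reverse=True)
    unchoked = set(candidates[:n_regular])
    # optimistic unchoke: one random extra peer, so newcomers that have
    # nothing to offer yet still get a chance to start trading
    others = [p for p in candidates if p not in unchoked]
    if others:
        unchoked.add(rng.choice(others))
    return unchoked


if __name__ == "__main__":
    rates = {"a": 50.0, "b": 10.0, "c": 40.0, "d": 0.0, "e": 30.0}
    # "e" is not interested, so it is never unchoked even though it gives a lot
    print(choose_unchoked(rates, interested={"a", "b", "c", "d"}, n_regular=2,
                          rng=random.Random(0)))
```

A peer that uploads nothing can therefore only ever be served through the optimistic slot, and that is where the sharing incentive comes from.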
If you are interested in you can get papers on experimental evaluation of BT from my web page. Concerning parallel download, you can read this insightful paper: P. Rodriguez, W. Ernst Biersack., "Dynamic Parallel-Access to Replicated Content in the Internet". In IEEE/Transactions on Networking, August 2002 (Also in IEEE/Infocom 2000)/ /http://www.research.microsoft.com/~pablo/papers/paraload_ton.pdf/ /In particular the authors evaluate parallel download from a single source or from multiple sources. The major conclusion is that with dynamic parallel download you don't have to know who is the best server. This best server can even change with time, this is transparent and still optimal with dynamic parallel download. Regards, Arnaud. -- Arnaud Legout, Ph.D. INRIA Sophia Antipolis - Plan?te Phone : 00.33.4.92.38.78.15 2004 route des lucioles - BP 93 Fax : 00.33.4.92.38.79.78 06902 Sophia Antipolis CEDEX E-mail: arnaud.legout at sophia.inria.fr FRANCE Web : http://www-sop.inria.fr/planete/Arnaud.Legout/index.html From touch at ISI.EDU Tue Jan 9 06:59:29 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 09 Jan 2007 06:59:29 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: Message-ID: <45A3ADD1.9070709@isi.edu> Vadim Antonov wrote: > On Mon, 8 Jan 2007, Dave Andersen wrote: > >> http://www.amazon.com/Emily-Posts-Etiquette-16th-Peggy/dp/0062700782 > > Oh, I'm never the first to use ad hominem, but I also won't let anyone to > try that on me without getting taste of their own medicine. All, Please folks, let's keep personal attacks out of this, or at least off the list. There's enough passion about the technical material to offend, anyway ;-) As to medicine, anyone using ad hominems - whether initiated OR in response - will jeopardize their unmoderated list posting privileges. Joe (as list admin) -- ---------------------------------------- Joe Touch Sr. 
Network Engineer, USAF TSAT Space Segment From L.Wood at surrey.ac.uk Tue Jan 9 07:38:24 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Tue, 09 Jan 2007 15:38:24 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: <45A315F4.90500@cs.cmu.edu> Message-ID: <200701091538.PAA00410@cisco.com> At Monday 08/01/2007 21:44 -0800, Vadim Antonov wrote: >What is really needed is fairness enforcement, not usage accounting. Vadim, didn't you just send me a lot of emails advocating free markets sans all enforcements, which just skew said free markets unfairly? Surely usage accounting leads to cost-based accounting and a valid free market, and enforcement is not required? (snorts.) L. >Yep. But at least it is helpful to think about economics rather than go >wishing that the world was perfect and everybody did the Right Thing:) > >--vadim From touch at ISI.EDU Tue Jan 9 09:02:25 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 09 Jan 2007 09:02:25 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net> References: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net> Message-ID: <45A3CAA1.9010505@isi.edu> Lynne Jolitz wrote: > But if it's not worth the time and effort for the academic side to > take on this charge, the marketplace will have to serve instead. It's not whether academics want to spend the time and effort. Many are already giving it for projects they prefer (e.g., FreeBSD in my case); others have none to give (note the dearth of academics on the IESG, which requires letters of 80% support). I.e., the effort of volunteers is subject to its own market as well.
However, the primary tension seems to be that: - standards bodies rely on emissaries from development communities - development communities rely on volunteers This may appear to suggest that the two communities are competing for volunteers, but that's not the case. We all *must* come together to work on standards; the same is not true for particular OS's. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From jg at laptop.org Tue Jan 9 10:35:20 2007 From: jg at laptop.org (Jim Gettys) Date: Tue, 09 Jan 2007 13:35:20 -0500 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45A3CAA1.9010505@isi.edu> References: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net> <45A3CAA1.9010505@isi.edu> Message-ID: <1168367720.4840.92.camel@localhost> There is a fundamental divide that has to be overcome. With a few exceptions (Ted Ts'o comes to mind), there have been few Linux people who also have been exposed to actively participating in the IETF. The culture has been that there are IETF (and other) specifications that the Linux community reads and implements. And, as you note, they are (often) volunteers, though these days, a large fraction of the key developers are full time employees of various companies. If this community wants to bridge this divide, I'd recommend some active outreach. Having worked in both communities, I find it remarkable how few faces are in common. One opportunity is next week at Linux Conf Australia (in Sydney).
A year ago, Van Jacobson gave the best talk I've attended in more than a decade in New Zealand at LCA (it was the best talk of the conference, and given twice as a result), and caused quite a bit of a stir and ferment among the Linux networking people. This kind of cross-fertilization is healthy for both communities, I believe. Now I'll throw some stones at some of the academic research I've seen done on Linux. One of the fundamental tenets of Linux development is its continual nature. I've seen some very good academic work end up being entirely ignored since, by the time the work was done, the work (which was based on what had become a several-year-stale version of Linux) was hopeless to integrate into Linux. If you *really* want research that can be taken advantage of by Linux, you have to understand Linux's development model, and be willing to pay the price to keep up with ongoing development, and figure out how to get from where Linux is, to where it should be in an incremental fashion. Particularly since the Linux 2.6 series started, "big bang" integrations of large changes into the system never occur; it is always stepwise evolution, and you have to work in this fashion, as part of the development community. Regards, - Jim On Tue, 2007-01-09 at 09:02 -0800, Joe Touch wrote: > > Lynne Jolitz wrote: > > But if it's not worth the time and effort for the academic side to > > take on this charge, the marketplace will have to serve instead. > > It's not whether academics want to spend the time and effort. Many are > already giving it for projects they prefer (e.g., FreeBSD in my case); > others have none to give (note the dearth of academics on the IESG, > which requires letters of 80% support). > > I.e., the effort of volunteers is subject to its own market as well.
> > However, the primary tension seems to be that: > - standards bodies rely on emissaries from > development communities > > - development communities rely on volunteers > > This may appear to suggest that the two communities are competing for > volunteers, but that's not the case. We all *must* come together to work > on standards; the same is not true for particular OS's. > > Joe > -- Jim Gettys One Laptop Per Child From dga+ at cs.cmu.edu Tue Jan 9 10:55:40 2007 From: dga+ at cs.cmu.edu (David Andersen) Date: Tue, 9 Jan 2007 13:55:40 -0500 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: References: Message-ID: <9AF36FB5-3460-4895-A0C0-E755DDB200FC@cs.cmu.edu> [Wow, sorry, this is long. And I think I'm on e2e with the wrong address, so this might get held up for moderation.] On Jan 9, 2007, at 12:44 AM, Vadim Antonov wrote: > On Mon, 8 Jan 2007, Dave Andersen wrote: > > >> Though in the case of TCP, it takes a certain amount of effort to >> cheat. >> Absent an easy-to-use mechanism in a popular OS, most people aren't >> going to do it. > > Cheating TCP is very simple - it is sufficient to open several TCP > sessions. All software written specifically to download large files > does > that. - If everyone does it, is it cheating? - If it's only a small constant factor, is it cheating? (It's certainly still TCP-friendly, though TCP-fair is a more stringent definition.) - If instead of running p2p software, I just download 10 programs in parallel instead of 1 program over ten parallel connections, is it cheating? I ask not to poke at your argument, but more to expose a fuzziness in the very definition of end-to-end fairness. The meaning of "flow A and flow B interact fairly" is well-defined. The meaning of "application A and application B interact fairly" is less clear. By the time you get to the host level, it's out the window (what if the host is a proxy server for 1000s of clients?).
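The difference between flow-level and host-level fairness, and why opening several TCP sessions pays off, can be made concrete with a toy calculation. The Python sketch below is an idealization, not a model of any particular router or TCP stack: it assumes every flow is long-lived and greedy and that the bottleneck gives each flow an equal share (roughly what per-flow fair queueing enforces, and what TCP flows with similar RTTs approximate), and contrasts that with a hypothetical per-host scheduler. Host names and the 10 Mbit/s capacity are made up.

```python
from collections import Counter


def per_flow_shares(flows, capacity):
    """Split capacity equally among flows, then total it per host.

    flows: list of host ids, one entry per active flow (all flows are
    assumed to be long-lived and greedy, so each gets an equal share).
    """
    share = capacity / len(flows)
    totals = Counter()
    for host in flows:
        totals[host] += share
    return dict(totals)


def per_host_shares(flows, capacity):
    """Split capacity equally among hosts, however many flows each opens."""
    hosts = list(dict.fromkeys(flows))  # unique hosts, first-seen order
    return {host: capacity / len(hosts) for host in hosts}


# Hypothetical 10 Mbit/s bottleneck: one host with 1 flow, one with 9.
flows = ["alice"] + ["bob"] * 9
print(per_flow_shares(flows, 10.0))  # {'alice': 1.0, 'bob': 9.0}
print(per_host_shares(flows, 10.0))  # {'alice': 5.0, 'bob': 5.0}
```

Under per-flow sharing the host with nine connections gets 90% of the link. Per-host sharing evens that out, but it would also squeeze a proxy serving thousands of clients into a single host's share, which is the ambiguity raised above.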
Combine this with the difficulty of determining the direction of value flow for Internet packets, and I think you've got an incredibly difficult problem. It may well be that what we have today is the best solution: An over-engineered core and endpoints that are limited by the capacity of the access link they purchase and by the limited demand that most people have.* (* -- A recent thread on NANOG is interesting in this regard: The Skype people are starting a real-time TV/video/whatever p2p streaming service. If it becomes as popular as they hope / as Skype has, it's quite possible that the demand will go through the roof. I don't pretend to know if they're right, of course.) >> But if you're making an economic argument, you have to consider >> all of >> the costs. There is a cost to enforcement in the network, in >> hardware >> and complexity. There is a cost to billing by usage, both in actual >> costs and in customer satisfaction. > > Actually, I didn't talk of usage-based billing. Customers tend to > dislike > it (people like to have predictable expenses), and switch to flat-rate > plans whenever they can afford them. I know. I was giving that as an example of a cost of enforcement. There's also a cost to doing fairness in the network. >> There most likely exists a point at which the costs of enforcement or >> the costs of accounting are lower than the costs imposed by cheating >> users. But in an environment where capacity is still increasing >> exponentially and where clueful network operators and programmers are >> not getting any cheaper, it's not clear to me when we'll reach that >> point. > > Mmm... demand is expanding faster than capacity. Right now the choke > point is distribution networks, but that is slowly (in US) being > fixed. > Currently DSL providers in US have something like 1:30 > oversubscription, > and P2P has the capacity to soak all of that. In the past year > the DSL
In the recent year > the DSL > service in major population centers got noticeably slower during peak > times, and the customer dissatisfaction will eventually force ISPs to > decrease the oversubscription. Eh, there are still multiple factors involved. We're still in a phase where DSL is still gaining because it's replacing the still- very-present dialup: http://www.pewinternet.org/PPF/r/184/report_display.asp (The Pew Internet & American Life project claims that from March 2005 - 2006, 75% of broadband growth came from "current users switching from dial-up to broadband." The total # of homes with broadband access grew by 40% during this period.) The portion of the growth that is fueled by an increasing # of customers or by customers moving to more expensive service pays for itself. (Or better...) That leaves the other portion - demand growth - that has to be balanced with the growth in capacity per dollar. I suspect you're right that capacity per dollar is growing more slowly than total unfunded demand, but it's not quite as bad as it sounds. >> It's very hard to quantify the costs of things like "complexity", >> "more >> code", and "users prefer flat-rate billing", but they do exist. > > The funny part is that most routers can do FQ out of box. Just > enabling > that will reduce the misbehaving stack/application problem to the > point of > insignificance. > > A better design would track FQ weights on per-prefix basis (and sum > them > when routes are aggregated) to improve fairness on larger scales. Certain vendor's routers can do *everything* out of the box. But they don't necessarily do everything well, or stably, or at full line- speed, or in a way that a network operator is comfortable with or can get to behave properly. Consider RED. >> The nice thing about today's environment is that the fiber is >> already in >> the ground. 
Adding more capacity is doable by "only" upgrading the >> transceivers, adding more wavelengths, upgrading to faster >> multimillion >> dollar routers, etc. :) > > Unfortunately, it is not that simple. You cannot pack information > denser > than the Shannon limit for a given level of noise, you cannot increase > S/N by > pumping more power into fibers without causing non-linearity and > things > like Raman scattering. So the way to expand is to put more > equipment in > parallel and reduce leg distances. It means the expensive things like > building more amplifier stations in the middle of nowhere, and > beefing up > CO space, power, and cooling. The high-speed stuff is hot, and power > budget quickly gets to megawatt range. 1) There's still remaining capacity in existing fibers: I'm not an expert in this area, but getting somewhere around 150 Tbit/sec out of a fiber (aggregate, WDM) should be doable assuming the technology keeps up. (Grossly simplified and underestimated from a quick read of http://www.nature.com/nature/journal/v411/n6841/full/4111027a0.html and a few other papers. A very conservative read might say a lower bound would be in the 45Tbit/sec range). That's about an order of magnitude more than the best research results today: http://www.ntt.co.jp/news/news06e/0609/060929a.html (Sept 2006) which is itself about 15x better than what's in use today. 2) A large chunk of the cost of laying fiber is the cost of physically installing it. Hence, dark fiber. (Wikipedia claims without attribution that the physical process "accounts for more than 60% of the cost of developing fiber networks." I don't know the truth of this.) Yes, we've been increasing capacity by using up the dark fiber that was put in during the dot-com craze, but there's still some left. > There's a huge disparity between capacity of PCs to source/sink > traffic > (the modern desktop CPUs can easily run 200-300Mbps of TCP traffic > with a > suitable NIC) and the capacity of the network.
This creates, well, an > interesting situation - the demand is potentially huge. Sure. But there's also a big difference between how much people _can_ source/sink and how much they _want_ to. I, like probably 98% of this list, have a *lot* of capacity at my fingertips, and I (ab)use it to the fullest. Over the last day, when I was using the network _a lot_ (streaming mp3s constantly and one movie from my remote storage server), I used about 200 Kbit/sec on average. That's a far cry from even my access link capacity, much less my NIC's capacity. >> I suspect we're saying the same thing from different perspectives, >> but >> have possibly different opinions about where we are on the cost >> curve. > > Yep. But at least it is helpful to think about economics rather > than go > wishing that the world was perfect and everybody did the Right Thing:) Of course. :) I just think that the economics are a bit more subtle than just saying "people can cheat, so they will." -Dave From touch at ISI.EDU Tue Jan 9 12:31:34 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 09 Jan 2007 12:31:34 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A357E@VGAEXCH01.hq.corp.viasat.com> References: <45980C60.9020405@web.de> <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> <459AA501.8050901@isi.edu> <459AB7E3.7010705@web.de> <459AF57A.5080304@isi.edu> <45A11F92.3000102@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A357E@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45A3FBA6.3070904@isi.edu> Agarwal, Anil wrote: > > Joe Touch wrote - >>>> FYI, Internet MSS's are usually in the 500-byte range in general.
A >>>> 5KB file would take 10 packets and be over by the 4th round. >>> >>> Um, the Internet MSS is usually 1460 bytes, except where it is hacked >>> to between 1300 and 1400 bytes to avoid issues with broken Path MTU >>> Detection in the presence of links with an MTU slightly smaller than >>> 1500 (mostly ADSL links). >>> >>> Packets around 500 bytes have become quite rare on the Internet today. > >> http://netweb.usc.edu/~rsinha/pkt-sizes/ >> http://tracer.csl.sony.co.jp/mawi/samplepoint-C/2005/200510250900.html > >> 'better connected' sites show larger packet sizes (shown in the USC >> traces), but that smaller packets are still used, and that the average >> size depends on the protocol (CSL traces). > Even though smaller packet sizes are observed on the net, > depending on protocol and application, that does not imply > that the MSS or path MTU is small. Some applications simply send small > amounts of data, at a time (telnet, http GETs, etc). > I suspect, MSS is of the order of 1300-1460 bytes, > even in these traces. If that's the case, and such MSSs are indeed predominant throughout the Internet, not just in well-connected universities talking to the world, then that raises the question of why the IETF is bothering with updates to path MTU discovery to avoid black-holing. One possibility is that black-holing is prevalent, and that sites reachable only with smaller MTUs, whose ICMP 'too big' error messages are not received, are missing from these traces. Anyway, it's probably appropriate to consider both 500 and 1500-byte MTUs in these calculations. The real question is how much a connection is sped up by using a larger arithmetic increase factor (2x vs 1.5x), and how much that matters depends on the size of the transfer, the BW, the RTT, and the server load. Packet size is part of that equation, but ultimately not all that critical anyway. Joe -- ---------------------------------------- Joe Touch Sr.
Network Engineer, USAF TSAT Space Segment From L.Wood at surrey.ac.uk Tue Jan 9 15:30:02 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Tue, 09 Jan 2007 23:30:02 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <1168367720.4840.92.camel@localhost> References: <005501c73117$8c0cb3c0$6e8944c6@telemuse.net> <45A3CAA1.9010505@isi.edu> <1168367720.4840.92.camel@localhost> Message-ID: <200701092330.XAA12986@cisco.com> At Tuesday 09/01/2007 13:35 -0500, Jim Gettys wrote: >One of the fundamental tenets of Linux development is its continual >nature. I've seen some very good academic work end up being entirely >ignored since, by the time the work was done, the work (which was based >on what had become a several-year-stale version of Linux) was hopeless to >integrate into Linux. That's no different from simulation work done with an ns network simulator snapshot that is hopeless to integrate into the current version of ns, and so gets ignored. Academics are rewarded by writing papers. They are not rewarded by staying current with the current codebase of the linux kernel/ns. L. >If you *really* want research that can be taken advantage of by Linux, >you have to understand Linux's development model, and be willing to pay >the price to keep up with ongoing development, and figure out how to get >from where Linux is, to where it should be in an incremental fashion.
> >Particularly since the Linux 2.6 series started, "big bang" integrations >of large changes into the system never occur; it is always stepwise >evolution, and you have to work in this fashion, as part of the >development community. > Regards, > - Jim From lynne at telemuse.net Tue Jan 9 15:46:54 2007 From: lynne at telemuse.net (Lynne Jolitz) Date: Tue, 9 Jan 2007 15:46:54 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <1168367720.4840.92.camel@localhost> Message-ID: <003801c73448$713c10c0$6e8944c6@telemuse.net> Jim, Perfectly correct. The Linux model is very different from the older BSD model of major changes and revisions every few years, and it follows more the product "renovation" cycle that Ray Lane of KPCB espouses. Bridging the gap by carefully following incremental work is the price to be paid by academics to ensure the continuity of their work in Linux. Many people on the academic side still use forms of BSD, and perhaps prefer the old way of doing things. I use BSD myself. However, Linux is clearly the market leader and cooperating with how they handle their development model is a key consideration for promulgating new work in networking and operating systems. I'm pleased to hear how eager the Linux conference attendees were to hear an academic "star" and take them seriously. And you are right - perhaps it is time for more networking and OS "stars" to reach out to them through talks. Lynne Jolitz. ---- We use SpamQuiz. If your ISP didn't make the grade try http://lynne.telemuse.net > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Jim Gettys > Sent: Tuesday, January 09, 2007 10:35 AM > To: Joe Touch > Cc: Lynne Jolitz; end2end-interest list > Subject: Re: [e2e] Are we doing sliding window in the Internet? > > > There is a fundamental divide that has to be overcome. 
> > With a few exceptions (Ted Ts'o comes to mind), there have been few Linux > people who also have been exposed to actively participating in the IETF. > > The culture has been that there are IETF (and other) specifications that > the Linux community reads and implements. And, as you note, they are > (often) volunteers, though these days, a large fraction of the key > developers are full time employees of various companies. > > If this community wants to bridge this divide, I'd recommend some active > outreach. Having worked in both communities, I find it remarkable how few > faces are in common. > > One opportunity is next week at Linux Conf Australia (in Sydney). A > year ago, Van Jacobson gave the best talk I've attended in more than a > decade in New Zealand at LCA (it was the best talk of the conference, > and given twice as a result), and caused quite a bit of a stir and > ferment among the Linux networking people. This kind of > cross-fertilization is healthy for both communities, I believe. > > Now I'll throw some stones at some of the academic research I've seen > done on Linux. > > One of the fundamental tenets of Linux development is its continual > nature. I've seen some very good academic work end up being entirely > ignored since, by the time the work was done, the work (which was based > on what had become a several-year-stale version of Linux) was hopeless > to integrate into Linux. > > If you *really* want research that can be taken advantage of by Linux, > you have to understand Linux's development model, and be willing to pay > the price to keep up with ongoing development, and figure out how to get > from where Linux is, to where it should be in an incremental fashion. > > Particularly since the Linux 2.6 series started, "big bang" integrations > of large changes into the system never occur; it is always stepwise > evolution, and you have to work in this fashion, as part of the > development community.
> Regards, > - Jim > > > > > > On Tue, 2007-01-09 at 09:02 -0800, Joe Touch wrote: > > > > Lynne Jolitz wrote: > > > But if it's not worth the time and effort for the academic side to > > > take on this charge, the marketplace will have to serve instead. > > > > It's not whether academics want to spend the time and effort. Many are > > already giving it for projects they prefer (e.g., FreeBSD in my case); > > others have none to give (note the dearth of academics on the IESG, > > which requires letters of 80% support). > > > > I.e., the effort of volunteers is subject to its own market as well. > > > > However, the primary tension seems to be that: > > - standards bodies rely on emissaries from > > development communities > > > > - development communities rely on volunteers > > > > This may appear to suggest that the two communities are competing for > > volunteers, but that's not the case. We all *must* come together to work > > on standards; the same is not true for particular OS's. > > > > Joe > > > -- > Jim Gettys > One Laptop Per Child > > > From detlef.bosau at web.de Wed Jan 10 03:46:56 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 10 Jan 2007 12:46:56 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <003801c73448$713c10c0$6e8944c6@telemuse.net> References: <003801c73448$713c10c0$6e8944c6@telemuse.net> Message-ID: <45A4D230.5020105@web.de> Lynne Jolitz wrote: > Jim, > Perfectly correct. The Linux model is very different from the older BSD model of major changes and revisions every few years, and it follows more the product "renovation" cycle that Ray Lane of KPCB espouses. Bridging the gap by carefully following incremental work is the price to be paid by academics to ensure the continuity of their work in Linux. > > Many people on the academic side still use forms of BSD, and perhaps prefer the old way of doing things. I use BSD myself. 
However, Linux is clearly the market leader and cooperating with how they handle their development model is a key consideration for promulgating new work in networking and operating systems. > That's simply not the point. I think Lloyd Wood made the point precisely: "Academics are rewarded by writing papers. They are not rewarded by staying current with the current codebase of the linux kernel/ns." And this holds for BSD, the OMNeT++ simulator and for all other software that exists. This is not a lame excuse for academics not doing Linux development. It's simply the fact that research is focused on detecting and solving problems. This is totally different from development and marketing. It's the difference between proving algebraic rules for dealing with natural numbers and developing and selling a new desktop calculator. Research is fundamental by its nature and thus has to be independent of simulators and operating systems. There are many fields of research where implementations do not even exist yet - nevertheless they are necessary. I think the basic dispute here is simply a misconception of the very difference between research and development. Detlef From jg at laptop.org Wed Jan 10 05:14:35 2007 From: jg at laptop.org (Jim Gettys) Date: Wed, 10 Jan 2007 08:14:35 -0500 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45A4D230.5020105@web.de> References: <003801c73448$713c10c0$6e8944c6@telemuse.net> <45A4D230.5020105@web.de> Message-ID: <1168434876.4840.256.camel@localhost> On Wed, 2007-01-10 at 12:46 +0100, Detlef Bosau wrote: > This is not a lame excuse for academics not doing Linux development. It's > simply the fact that research is focused on detecting and solving > problems. This is totally different from development and marketing. It's > the difference between proving algebraic rules for dealing with natural > numbers and developing and selling a new desktop calculator.
> > Research is fundamental by its nature and thus has to be independent > of simulators and operating systems. There are many fields of research > where implementations do not even exist yet - nevertheless they are > necessary. > > I think the basic dispute here is simply a misconception of the very > difference between research and development. I have to fundamentally disagree when it comes to systems research. If you are doing research into *systems*, an academic exercise using a marginal system can only be justified if you are trying a *fundamental* change to that system, and *must* start from scratch. Most systems research does not fall into that category. Doing such work outside the context of a current system invalidates the results as you cannot intercompare the results you get with any sort of "control". This is the basis of doing experimental science. Giving me results that some "improvement" helps Linux 2.4.24, when current Linux is 2.6.19, or whatever, essentially invalidates the result, due to the extensive changes between versions. Much of why Van's research was able to be taken seriously by the Linux community and has had impact was precisely in that he had done the work on a recent version of Linux (independent of whether the code was ever to become available or not), and so the variables were very precisely controlled to those of his TCP implementation. He had real credibility as a result. - Jim -- Jim Gettys One Laptop Per Child From Jon.Crowcroft at cl.cam.ac.uk Wed Jan 10 06:01:04 2007 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Wed, 10 Jan 2007 14:01:04 +0000 Subject: [e2e] s/Re: Are we doing sliding window in the Internet?/systems Message-ID: In missive <1168434876.4840.256.camel at localhost>, Jim Gettys typed: >>Doing such work outside the context of a current system invalidates the >>results as you cannot intercompare the results you get with any sort of >>"control". This is the basis of doing experimental science. Giving me
Giving me >>results that some "improvement" helps Linux 2.4.24, when current Linux >>is 2.6.19, or whatever, essentially invalidates the result, due to the >>extensive changes between versions. aside from also accidentally being useful as well:-) the idea of science is a bit like the idea of open source - so it isn't surprising that computer systems science flourishes in an open source manner- if other people can look at your experimental equipment, as well as your data and can affordably re-run your experiment in the same, similar or other circumstances, the validity of the work, and the rate of expansion of human knowledge, are both enhanced when people do clinical drug trial papers, they are required by many medical journal publishers to place the data in escrow so that 3 independant reviewers can check the data is being analyzed right - patents mean that the drugs themselves are publically checkable - often published funding requires the results are published (suitably anonymised) completely...as with the genome project >>One Laptop Per Child why not one iphone per person ?:-) by the way, do you take the laptops back when they rich 18 (or 16, or 21, or whatever the gnu age of majority is :-) cheers jon From detlef.bosau at web.de Wed Jan 10 06:16:24 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 10 Jan 2007 15:16:24 +0100 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <1168434876.4840.256.camel@localhost> References: <003801c73448$713c10c0$6e8944c6@telemuse.net> <45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost> Message-ID: <45A4F538.1040905@web.de> Jim Gettys wrote: > > > I have to fundamentally disagree when it comes to systems research. > > If you are doing research into *systems*, an academic exercise using a > marginal system can only be justified if you are trying a *fundamental* > change to that system, and *must* start from scratch. Most systems > research does not fall into that category. 
> > > > What do you mean by "research into systems"? The term "system" is extremely general. > Doing such work outside the context of a current system invalidates the > results as you cannot intercompare the results you get with any sort of > "control". This is the basis of doing experimental science. Giving me > Why does research "outside the context" of a current system invalidate results? Could you perhaps provide a concrete example for this? > results that some "improvement" helps Linux 2.4.24, when current Linux > is 2.6.19, or whatever, essentially invalidates the result, due to the > extensive changes between versions. > > Much of why Van's research was able to be taken seriously by the Linux > community and has had impact was precisely in that he had done the work > on a recent version of Linux (independent of whether the code was ever > One prominent example of Van's research is the congavoid paper. Linux did not yet exist when this work was done. Does that invalidate this work? From jg at laptop.org Wed Jan 10 07:18:17 2007 From: jg at laptop.org (Jim Gettys) Date: Wed, 10 Jan 2007 10:18:17 -0500 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45A4F538.1040905@web.de> References: <003801c73448$713c10c0$6e8944c6@telemuse.net> <45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost> <45A4F538.1040905@web.de> Message-ID: <1168442297.4840.321.camel@localhost> On Wed, 2007-01-10 at 15:16 +0100, Detlef Bosau wrote: > Jim Gettys wrote: > > > > > > I have to fundamentally disagree when it comes to systems research. > > > > If you are doing research into *systems*, an academic exercise using a > > marginal system can only be justified if you are trying a *fundamental* > > change to that system, and *must* start from scratch. Most systems > > research does not fall into that category.
If you go look at Van's LCA presentation referenced, you'll see it is rethinking TCP's implementation in a real system. That is systems research. Maybe I should have said research in implementations and algorithms. Simulation of protocols does not fit what I'm talking about here. > > > Doing such work outside the context of a current system invalidates the > > results as you cannot inter compare the results you get with any sort of > > "control". This is the basis of doing experimental science. Giving me > > > Why does research "outside the context" of a current system invalidate > results? Could you perhaps provide a concrete example for this? I saw what looked to be very nice results showing better caching behavior in memory systems, done on an obsolete version of Linux a couple years ago. But since that version was so out of date, the data showing improvement over baseline had to be taken with a large *block* of salt, since so much had been done in the base operating system in the meanwhile. The data had become an apples-and-oranges comparison. If I can remember enough to dig up the paper, I'll send a pointer. Without a control, experimental science becomes hand-waving anecdotes (which typifies research in many fields, unfortunately). > > > results that some "improvement" helps Linux 2.4.24, when current Linux > > is 2.6.19, or whatever, essentially invalidates the result, due to the > > extensive changes between versions. > > > > Much of why Van's research was able to be taken seriously by the Linux > > community and has had impact was precisely in that he had done the work > > on a recent version of Linux (independent of whether the code was ever > > > > One prominent example of Van's research is the congavoid paper. Linux > did not yet exist when this work was done. > Does that invalidate this work? > I still have scars on my back from the internet collapse in the mid '80's.
Things were so bad we were at times reduced to Federal Express between Cambridge and Palo Alto. The *proof* that made people take the congestion avoidance work seriously that I remember was the application of Van and Mike Karels' patches to 4.2BSD that made the internet (and the individual machines) work again. Those patches preceded the paper, if I'm not mistaken about the history. The proof was in the implementation of the algorithms in a widely used system of that era. Had it been done in the Twenex implementation, while it might have been noticed, its impact would have taken far longer and could even conceivably have been ignored. Regards, - Jim -- Jim Gettys One Laptop Per Child From jg at laptop.org Wed Jan 10 07:36:31 2007 From: jg at laptop.org (Jim Gettys) Date: Wed, 10 Jan 2007 10:36:31 -0500 Subject: [e2e] s/Re: Are we doing sliding window in the Internet?/systems In-Reply-To: References: Message-ID: <1168443391.4840.330.camel@localhost> On Wed, 2007-01-10 at 14:01 +0000, Jon Crowcroft wrote: > > by the way, do you take the laptops back when they reach 18 (or 16, or 21, or > whatever the gnu age of majority is :-) > Nope. The going-in premise of the project is that the computers are owned by the individual kids; this is for many reasons, including they get taken care of much better if they are individual property rather than communally shared property. And the children are part of a family. Learning does not stop at age 16, or 18, or 21 (with the exception of certain individuals I've known ;-)). - Jim -- Jim Gettys One Laptop Per Child From ddc at csail.mit.edu Wed Jan 10 08:08:29 2007 From: ddc at csail.mit.edu (David Clark) Date: Wed, 10 Jan 2007 11:08:29 -0500 Subject: [e2e] Opportunity to get involved in the NSF FIND research program Message-ID: <45A50F7D.9010602@csail.mit.edu> Folks, Many of you may know that NSF has announced a focus area for research funding called Future Internet Design, or FIND.
The idea behind FIND is to bring together interested researchers to discuss options for a future Internet, and to develop integrated proposals for such a network. NSF understands that there is lots of interesting, relevant work that has been funded from sources other than NSF, and there may be folks who would like to come to the meetings and participate in the process, on a BYOF (Bring Your Own Funding) basis. You might have funding from a different NSF program, from another funding agency, or from your company. Perhaps you are from a different country with its own funding mechanisms. However you are funded, if you are interested in being part of the intellectual effort, please read the attached announcement, which is an invitation to send in an informal white paper describing what you are up to. If you can conceive of other ways to build bridges between this FIND program and other research efforts, please send me a message directly. We are open to other ideas. David Clark (for the FIND Planning Committee) ---- CALL FOR RESEARCH COLLABORATION ON FUTURE INTERNET ARCHITECTURES IN PARTNERSHIP WITH THE US NSF FIND PROGRAM BACKGROUND Much energy has recently crystallized within the international network research community for developing fresh perspectives on how to architect a single, coherent, global data network. The Internet's unquestionable success at embodying one such architecture has also led over the decades of its operation to unquestionable difficulties with regard to support for some types of functionality and sound operation. As a reflection of this growing community interest, the U.S. National Science Foundation has announced a focus area for networking research called FIND, or Future Internet Design. The agenda of this focus area is to invite the research community to take a long-range perspective, and to consider what our global network of 10 or 15 years from now should be, and how to build a network that meets the future requirements.
(For further information on the FIND program, see NSF solicitation 07-507, available at http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf07507.) The research funded by FIND aims to contribute to the emergence of one or more integrated visions of a future network. A vital part of this effort concerns fostering collaboration and consensus-building among researchers working on future global network architecture. To this end, NSF has created a FIND Planning Committee, which is working with NSF to organize a series of meetings among FIND grant recipients structured around activities to identify and refine overarching concepts for a network of the future. A BROADER COMMUNITY NSF recognizes that its efforts at funding research to contribute to a future global network exist within a broader set of efforts with similar goals supported by other agencies, industry, and nations. Accordingly, NSF seeks researchers external to the FIND program itself, but who share a like-minded vision, to participate in the collaboration and consensus-building. NSF particularly welcomes international collaboration; any vision of a future global network will greatly benefit from global participation. To this end, external researchers interested in such participation are invited to submit short white papers describing themselves and their work. Based on evaluation of these white papers, a select number of researchers will be invited to join the FIND meetings and other events, as overall meeting sizes and logistics permit. EXPECTATIONS AND EVALUATION CRITERIA Since the efficacy of FIND meetings is in part a function of their size and coherence, the evaluation of the white papers will focus on certain criteria that are listed below, along with expectations regarding what external participation entails.
Naturally, interested parties should take these considerations into account as they write their white papers, and include information in their papers sufficient to allow the FIND program to evaluate the aptness of their participation.
- In a few sentences, please describe your research and its intended impact. When possible, include as an attachment (or a URL) a longer description, which if you wish can be something prepared for another purpose (e.g. your original funding proposal or a publication). It will help to limit the supporting material to 15 pages or fewer.
- Please summarize in the white paper the ways you see your research as being compatible with the objectives of FIND (the URL for the FIND solicitation is included above). Research that accords with the FIND program will generally be based on a long-term vision of future networking, rather than addressing specific near-term problems, and framed in terms of how it might contribute to an overall architecture for a future network.
- The FIND meetings have been organized for the benefit of researchers who have already been funded and are actively pursuing their research. Research described in white papers should already be funded. Please describe the means you have available to cover your FIND-related research: the source of funds, their duration, and (roughly) the supported level of effort. Unfortunately, NSF lacks additional funds to financially support your participation in the meetings, so you must be prepared to cover those costs as well. If you are planning to submit a FIND research proposal to the current NeTS solicitation, you should not submit a white paper here based on that research. Successful FIND grant recipients will automatically be invited to join the FIND community.
- As one of the goals of FIND is to develop an active community of researchers who over time work increasingly together towards coherent, overall architectural visions, we aim for external participants to likewise become significantly engaged. To this end, you should anticipate (and have resources for) participating in FIND project meetings in an active, sustained fashion.
- Your research must not be encumbered by intellectual property restrictions that prevent you from fully discussing your work and its results with the other participants.
Please try to limit your white paper to 2 pages. Your white paper (and the supporting research description) will be read by members of the research community, so do not submit anything that you would not reveal to your peers. (White papers are not viewed as formal submissions to NSF.) TIMING AND SUBMISSION You may submit a white paper at any time during the FIND program. Before each scheduled FIND PI meeting, the papers on hand will be reviewed. Meetings are anticipated to occur approximately three times a year, in March, July/August and November. The next FIND meeting is scheduled for March 5/6, 2007, and priority in consideration for that meeting will be given to white papers that are received by Friday, January 19th, 2007. Send your white paper to Darleen Fisher and Allison Mankin for coordination. -------------- next part -------------- A non-text attachment was scrubbed... Name: FIND-external-invite-7.pdf Type: application/pdf Size: 98448 bytes Desc: not available Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20070110/09f5e6a4/FIND-external-invite-7-0001.pdf From craig at aland.bbn.com Wed Jan 10 08:20:36 2007 From: craig at aland.bbn.com (Craig Partridge) Date: Wed, 10 Jan 2007 11:20:36 -0500 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: Your message of "Wed, 10 Jan 2007 10:18:17 EST."
<1168442297.4840.321.camel@localhost> Message-ID: <20070110162036.16BFA64@aland.bbn.com> In message <1168442297.4840.321.camel at localhost>, Jim Gettys writes: >Those patches preceded the paper, if I'm not mistaken about the history. >The proof was in the implementation of the algorithms in a widely used >system of that era. Had it been done in the Twenex implementation, while >it might have been noticed, its impact would have taken far longer and >could even conceivably have been ignored. Just building on Jim's recollection. The patches preceded the paper. But the patches were vigorously tested. Van actually did his work incrementally. First he worked on trying to improve congestion response and then round-trip time estimation. There is a small set of emails from him reporting progress and asking questions on the E2E and TCP-IP lists. He also gave talks, with various graphs showing the behavior of existing TCP implementations and his implementation with various changes and got feedback. (You can see many of these talks if you go look at the old IETF proceedings at www.ietf.org -- a small tragedy -- the Moffett Field talk, which caused everyone to sit up and notice, isn't on-line). He distributed his patches to a small number of beta-testers before releasing them widely. There was a lot of testing and carefully staged progress. One fond memory I have of that time is the Winter USENIX (I think in 1988 in San Diego) and finding Van during a break. He was sitting with a thick stack of graphs showing the performance of round-trip time estimation algorithms on real data over problematic Internet paths and sorting out which algorithms worked well. Craig From detlef.bosau at web.de Wed Jan 10 10:56:36 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 10 Jan 2007 19:56:36 +0100 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168442297.4840.321.camel@localhost> References: <003801c73448$713c10c0$6e8944c6@telemuse.net> <45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost> <45A4F538.1040905@web.de> <1168442297.4840.321.camel@localhost> Message-ID: <45A536E4.2030300@web.de> Jim Gettys wrote: >> What do you mean by "research into systems"? The term "system" is extremely >> general. >> > > If you go look at Van's LCA presentation referenced, you'll see it is > Could you give me a pointer please? Unfortunately, I don't know this talk. > rethinking TCP's implementation in a real system. That is systems > research. Maybe I should have said research in implementations and > algorithms. > > As I said, I don't know the talk yet. However, rethinking TCP's implementation in a real system should be done independently of a concrete operating system. Of course, one should consider the difficulties encountered in real systems. But then, we should abstract from concrete systems and look for general principles for how we can avoid difficulties and learn from our experiences in the past. > Simulation of protocols does not fit what I'm talking about here. > > What are the alternatives? You can build testbeds and you can trace real traffic. At least we should exploit these before deploying premature protocols. > Without a control, experimental science becomes hand-waving anecdotes > (which typifies research in many fields, unfortunately). > > There is no argument about this. The dissent is, first, what is experimental science? To me, engineering is not purely experimental but should always rely on sound theoretical work and then include proper experiments. Second: Can experimental deployment replace solid research? I don't think so. >> >> One prominent example of Van's research is the congavoid paper. Linux >> did not yet exist when this work was done. >> Does that invalidate this work? >> >> > > I still have scars on my back from the internet collapse in the mid > '80's.
Things were so bad we were at times reduced to Federal Express > between Cambridge and Palo Alto. > So, there was an opportunity to learn from it. To put it more drastically: We all know about the Titanic disaster. And about the Tacoma bridge disaster. Do these invalidate academic research in, as it is called in English, naval engineering and civil engineering? I don't think so. I think proper research prevented a number of disasters like these. And it was proper research when we learnt from the Tacoma bridge disaster and eventually, after decades of research, the Akashi-Kaikyo bridge could be completed. > The *proof* that made people take the congestion avoidance work > seriously that I remember was the application of Van and Mike Karels' > patches to 4.2BSD that made the internet (and the individual machines) > work again. > First: How many nodes did the Internet have at that time? Second: How many operating systems and implementations for TCP/IP support were around at that time? To the best of my knowledge, the "Internet" was an experimental test bed at that time. We must not compare this situation to the current one. > Those patches preceded the paper, if I'm not mistaken about the history. > The proof was in the implementation of the algorithms in a widely used > I don't think that this is a "proof". I think the congavoid paper has a very sound theoretical foundation. What was experienced practically was the problem and the relevance of congestion control. The rest is proper work. I still remember a remark by some computer science professor who even told me that the timeouts could only be determined by experiments. And even these timeouts are based on sound conceptual work in Van's paper. > system of that era. Had it been done in the Twenex implementation, while > it might have been noticed, its impact would have taken far longer and > could even conceivably have been ignored.
> If the congestion collapses in the eighties were as bad as you say and if there was a solution, this surely would not have been ignored. Detlef From jg at laptop.org Wed Jan 10 11:28:43 2007 From: jg at laptop.org (Jim Gettys) Date: Wed, 10 Jan 2007 19:28:43 +0000 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45A536E4.2030300@web.de> References: <003801c73448$713c10c0$6e8944c6@telemuse.net> <45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost> <45A4F538.1040905@web.de> <1168442297.4840.321.camel@localhost> <45A536E4.2030300@web.de> Message-ID: <1168457323.4840.418.camel@localhost> On Wed, 2007-01-10 at 19:56 +0100, Detlef Bosau wrote: > Jim Gettys wrote: > >> What do you mean by "research into systems"? The term "system" is extremely > >> general. > >> > > > > If you go look at Van's LCA presentation referenced, you'll see it is > > > > Could you give me a pointer please? Unfortunately, I don't know this talk. There is a link in the following article, as I posted before. http://lwn.net/Articles/169961/ > > > rethinking TCP's implementation in a real system. That is systems > > research. Maybe I should have said research in implementations and > > algorithms. > > > > > > As I said, I don't know the talk yet. However, rethinking TCP's > implementation in a real system should be done independently of a > concrete operating system. > > Of course, one should consider the difficulties encountered in real > systems. But then, we should abstract from concrete systems and look for > general principles for how we can avoid difficulties and learn from our > experiences in the past. > I think you will see that by analyzing and solving the real problems in Linux he came up with principles that are (potentially) transferable to many systems. Doing it independently of a real system would not have proved the points he proved in that work Van reported on at LCA last year. > > Simulation of protocols does not fit what I'm talking about here.
> > > > > What are the alternatives? You can build testbeds and you can trace real > traffic. > > At least we should exploit these before deploying premature protocols. Certainly; but they are at most "doing your homework"; they cannot substitute for deployment or testing at scale on a real network. > > > Without a control, experimental science becomes hand-waving anecdotes > > (which typifies research in many fields, unfortunately). > > > > > > There is no argument about this. > > The dissent is, first, what is experimental science? To me, engineering > is not purely experimental but should always rely on sound theoretical > work and then include proper experiments. In a real science, theory and experiment go hand in hand; you can't know what problems are worth trying to apply or develop theory for without experience and experiments, and you can't validate any theory without experiment. > > I don't think that this is a "proof". I think the congavoid paper has a > very sound theoretical foundation. Yes, and the motivation and theory were worked out in reaction to the real world experience and analysis of the network failing. If theory had been understood in the first place in advance of the Internet's congestion collapse, Van would never have worked on the problem; presumably one would try to avoid what one foresees. > > What was experienced practically was the problem and the relevance of > congestion control. > > The rest is proper work. > > I still remember a remark by some computer science professor who even > told me that the timeouts could only be determined by experiments. > And even these timeouts are based on sound conceptual work in Van's paper. You seem to think that theory exists in a vacuum, apart from experience and experiment. It doesn't. The theory was worked out in reaction to the situation at hand. > > > system of that era.
Had it been done in the Twenex implementation, while > > it might have been noticed, its impact would have taken far longer and > > could even conceivably have been ignored. > > > > If the congestion collapses in the eighties were as bad as you say and > if there was a solution, this surely would not have been ignored. For parts of the internet, it really was that bad, and it would *certainly* have taken much longer before the work was validated and deployed, had it been done on a small minority system or as a research prototype model. - Jim -- Jim Gettys One Laptop Per Child From james.Ramming at darpa.mil Tue Jan 9 15:50:01 2007 From: james.Ramming at darpa.mil (Ramming, J. Christopher) Date: Tue, 9 Jan 2007 18:50:01 -0500 Subject: [e2e] Assurable Global Networking (RFI & Workshop) Message-ID: REQUEST FOR INFORMATION - Assurable Global Networking Response deadline: January 31, 2007 Workshop for respondents: February 22, 2007 Defense Advanced Research Projects Agency's (DARPA) Strategic Technology Office (STO) is requesting information on research ideas and approaches that could provide the foundation for next-generation Assurable Global Networks (AGNs). For more information please visit: http://www.darpa.mil/sto/solicitations/AGN/index.html From touch at ISI.EDU Wed Jan 10 13:39:47 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 10 Jan 2007 13:39:47 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <459EB8F8.4060304@web.de> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> Message-ID: <45A55D23.6080505@isi.edu> Detlef Bosau wrote: ... >>> It's interesting what happens to the final CLOSE ACK here, which is >>> typically not spoofed in splitters to ensure proper ACK semantics. >>> >> I don't understand "proper ACK semantics". The splitter destroys those.
> >> The semantics that may be kept are at the connection level >> (open/closed), but the semantics of data ACKs are irrevocably destroyed. > > I am thinking of the semantics at the connection level, which I think are > sufficient in many cases. The result is that you think you started/ended a connection correctly, but that the wrong data got there? As to PEPs... > Otherwise the problem is: When the bandwidth sender - splitter is, e.g., > the average bandwidth / rate splitter-sender but far less than the > maximum rate splitter / sender than a simple router perhaps would hardly > store any data and thus hardly equalize the rate / delivery times. > Thierry describes delay spikes of several seconds. If we think about > UMTS, we can imagine a wireless link where nothing happens for up to > several seconds - thus even no data is clocked out from the sender - and > then we have about 2 Mbps throughput for a short time - which is perhaps > much more than the actual Internet path can carry. In such a scenario we > want to have the router / splitter / PEP / whateverbox buffer the data > and equalize the rate variations. Can this be achieved by pure pacing in > the one or other direction? Pacing is a simpler version of what you're asking ACK clocking to do; if ACK clocking works, pacing definitely should. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20070110/d6265b10/signature.bin From gds at best.com Wed Jan 10 15:07:36 2007 From: gds at best.com (Greg Skinner) Date: Wed, 10 Jan 2007 23:07:36 +0000 Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <1168457323.4840.418.camel@localhost>; from jg@laptop.org on Wed, Jan 10, 2007 at 07:28:43PM +0000 References: <003801c73448$713c10c0$6e8944c6@telemuse.net> <45A4D230.5020105@web.de> <1168434876.4840.256.camel@localhost> <45A4F538.1040905@web.de> <1168442297.4840.321.camel@localhost> <45A536E4.2030300@web.de> <1168457323.4840.418.camel@localhost> Message-ID: <20070110230736.A10734@gds.best.vwh.net> On Wed, Jan 10, 2007 at 07:28:43PM +0000, Jim Gettys wrote: > On Wed, 2007-01-10 at 19:56 +0100, Detlef Bosau wrote: > > I don't think that this is a "proof". I think the congavoid paper has a > > very sound theoretical foundation. > > Yes, and the motivation and theory were worked out in reaction to the real > world experience and analysis of the network failing. > > If theory had been understood in the first place in advance of the > Internet's congestion collapse, Van would never have worked on the > problem; presumably one would try to avoid what one foresees. Depending on what you mean by "theory", one could argue that the basis of the congavoid paper is in control theory, which was well understood in the 1980s. OTOH, its application to the Internet and TCP/IP implementations of that time was not well understood. > > If the congestion collapses in the eighties were as bad as you say and > > if there was a solution, this surely would not have been ignored. > > For parts of the internet, it really was that bad, and it would > *certainly* have taken much longer before the work was validated and > deployed, had it been done on a small minority system or as a research > prototype model. For Detlef's benefit, there are archives of the tcp-ip mailing list where the early discussions on congestion avoidance in the emerging Internet were held. Most people involved in this discussion today will read the emails of the past and recognize the problem that was being discussed based on what has been studied and published.
Go to http://securitydigest.org/tcp-ip/#archives, follow the July 1986 link, and start with the subject "TCP retransmission efficiency". Follow the discussions from there. You'll eventually get to VJ's results. Jim does note correctly that: > Had [congavoid changes] been done in the Twenex implementation, > while it might have been noticed, its impact would have taken far > longer and could even conceivably have been ignored. The benefit of making the changes on a widely used platform was that congestion was considerably reduced, validating the research. IMO, it would have been great if more control theory could have been applied to early Internet design. Fortunately, VJ kept plugging enough that he was able to push his ideas through, providing the bedrock for the R&D in network performance that came afterward. --gregbo From detlef.bosau at web.de Wed Jan 10 15:29:47 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 11 Jan 2007 00:29:47 +0100 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45A55D23.6080505@isi.edu> References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> Message-ID: <45A576EB.206@web.de> Joe Touch wrote: >> I am thinking of the semantics at the connection level, which I think are >> sufficient in many cases. >> > > The result is that you think you started/ended a connection correctly, > but that the wrong data got there? > Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK, we know that all data has reached the receiving socket. What we do not know is whether the data has reached the application. To my understanding, that's one reason why we use acknowledgements at the application level when it is necessary to know whether an application has received all data.
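The distinction drawn above (a transport-level acknowledgement only says the data reached the peer's socket, while an application-level acknowledgement says the application actually consumed it) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: `socket.socketpair()` stands in for a TCP connection, and the 4-byte length prefix, the UTF-8 decode used as "processing", and the `b"ACK"`/`b"NAK"` tokens are hypothetical choices, not part of any protocol discussed in this thread.

```python
import socket


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket (recv may return short reads)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def send_with_app_ack(payload: bytes) -> bool:
    """Send payload, then wait for a (hypothetical) application-level ACK.

    socketpair() stands in for a TCP connection. Everything runs in one
    thread, so payloads must be small enough to fit in the socket buffer.
    """
    sender, receiver = socket.socketpair()
    try:
        # sendall() returning only means the bytes were handed to the stack;
        # it says nothing about whether the peer application has used them.
        sender.sendall(len(payload).to_bytes(4, "big") + payload)

        # Receiving application: read one length-prefixed message and
        # "process" it; here, decoding as UTF-8 is a stand-in for real work.
        length = int.from_bytes(recv_exact(receiver, 4), "big")
        data = recv_exact(receiver, length)
        try:
            data.decode("utf-8")
            verdict = b"ACK"
        except UnicodeDecodeError:
            verdict = b"NAK"
        receiver.sendall(verdict)

        # Only now does the sender learn whether the application consumed it.
        return recv_exact(sender, 3) == b"ACK"
    finally:
        sender.close()
        receiver.close()


print(send_with_app_ack(b"hello, end-to-end"))  # prints True
```

A payload the receiving application cannot process, such as `b"\xff\xfe"`, is delivered to the socket just as successfully but yields `False`. That is exactly the information a connection-level ACK cannot carry.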
So, to my understanding, a PEP which keeps the semantics at the connection level keeps all the semantics that TCP itself provides. Acknowledgements at the application level are beyond the scope of TCP. > As to PEPs... > > >> Otherwise the problem is: When the bandwidth sender - splitter is, e.g., >> the average bandwidth / rate splitter-sender but far less than the >> maximum rate splitter / sender than a simple router perhaps would hardly >> store any data and thus hardly equalize the rate / delivery times. >> Thierry describes delay spikes of several seconds. If we think about >> UMTS, we can imagine a wireless link where nothing happens for up to >> several seconds - thus even no data is clocked out from the sender - and >> then we have about 2 Mbps throughput for a short time - which is perhaps >> much more than the actual Internet path can carry. In such a scenario we >> want to have the router / splitter / PEP / whateverbox buffer the data >> and equalize the rate variations. Can this be achieved by pure pacing in >> the one or other direction? >> > > Pacing is a simpler version of what you're asking ACK clocking to do; if > ACK clocking works, pacing definitely should. > The problem I mean is very similar to problems like ACK compression or the problem described in an Internet draft by Craig: http://tools.ietf.org/html/draft-partridge-e2e-ackspacing-00 Craig addresses the problem that during slow start, bursts may grow so large that buffer queues on the path may be overloaded. A similar problem may happen when a mobile network has intermittent delay spikes and phases with high throughput. In phases with high throughput, a mobile might receive a data burst, and thus a corresponding data burst is clocked out at the sender, which may overrun queues on the path. Craig proposed to overcome this problem by appropriate ACK spacing, i.e. by intentionally putting short time gaps between ACK datagrams.
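The ACK spacing idea described above can be illustrated with a small Python sketch. This is not the algorithm from Craig's draft. It simply assumes a spacer that holds back ACKs arriving in a burst, so that consecutive ACKs leave at least a fixed gap apart. The gap is taken as the time to transmit one data segment at an assumed bottleneck rate. The function names, the 1500-byte segment size and the 2 Mbps rate (the UMTS burst rate mentioned in the quoted text) are illustrative assumptions.

```python
def ack_gap_ms(segment_bytes: int, bottleneck_bps: int) -> float:
    """Time to transmit one data segment at the assumed bottleneck rate, in ms."""
    return segment_bytes * 8 * 1000 / bottleneck_bps


def space_acks(arrivals_ms, min_gap_ms):
    """Release times for ACKs so consecutive ACKs are at least min_gap_ms apart.

    An ACK is never released before it arrives. ACKs that arrive bunched
    together (e.g. after a delay spike on a wireless link) are spread out,
    so the data segments they clock out at the sender are spread out too.
    """
    releases = []
    next_free = float("-inf")
    for t in sorted(arrivals_ms):
        release = max(t, next_free)
        releases.append(release)
        next_free = release + min_gap_ms
    return releases


gap = ack_gap_ms(1500, 2_000_000)   # 6.0 ms per 1500-byte segment at 2 Mbps
burst = [100, 101, 102, 103]        # four ACKs arriving within 3 ms
print(space_acks(burst, gap))       # [100, 106.0, 112.0, 118.0]
```

The trade-off is visible in the output: the last ACK is held back by 15 ms, so smoothing the sender's bursts costs some added delay. Choosing the gap from a bottleneck rate that the spacer has to know or estimate is the assumption that carries the most weight here.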
The problem is also addressed in a paper "Paced TCP for High Delay-Bandwidth Networks" by Joanna Kulik, Robert Coulter, Dennis Rockwell and Craig Partridge. The one interesting question for me (perhaps not for the community, depending on the answer ;-)) is: Do we already have a pacing / spacing scheme which provides appropriate ACK spacing for mobile networks? And of course this question very much depends on whether the problem of intermittent bursts in mobile networks is relevant. That's why I wrote the post on hiccups in mobile networks some days ago. I have looked for literature in this area quite intensely but found it extremely hard to get useful information here. I already referred to the Globecom 04 paper by Thierry Klein, but I did not find really useful additional material on this issue. In particular, scheduling algorithms seem to be company confidential quite often, so it is extremely hard to get information there. Moreover, I'm not quite sure whether ACK spacing is already in use here (sic!) because one consequence of doing ACK spacing in mobile networks is that the sender is confronted with a large delay bandwidth product. From the literature about mobile networks I know that large delay bandwidth products are often claimed for mobile networks - however, no one could explain to me where the claimed path capacity should come from. It's surely not the wireless channel, which typically holds hardly a single IP packet in flight. I don't think it's likely that the ARQ buffers provide much memory capacity, because a "sliding window scheme" for ARQ and RLP would require mobile receivers to keep a number of incomplete IP packets, and therefore a certain amount of storage capacity, for a questionable benefit, because in mobile networks the wireless path can keep only very few RLP frames in flight. In short: Perhaps we may find some kbytes of memory on L2 here. Perhaps layer 2 may keep an average of one or two IP packets.
That does not at all explain why mobile networks are frequently claimed to have bandwidth-delay products so large that they would be a problem for TCP.

So, I'm just eager to know what mobile network operators are doing here.

If mobile networks really exhibit such large delay-bandwidth products, and if we have intermittent bursts and delay spikes here, we are not talking about some kbytes but about up to several hundred kbytes and more, depending on how bursty the traffic is, and we have the same issues here as we have in satellite networks and other networks with an extremely large delay-bandwidth product.

So my question is of course a state-of-the-art question. And I spent a huge amount of time on literature research on this issue, but as I said, it's extremely hard to find reliable research papers here. Most of the information I found is either extremely vague or it is in PhD theses written in close cooperation with network operators, where I find claimed problems - but when it comes to details, this is "corporate confidential", which is definitely not my understanding of proper research.

I know that this post here exhibits a very strong criticism of many papers which present "results" from "practical experiences with GPRS" etc. But after having read dozens of papers of this kind for years, my conclusion is that many of the authors present snapshots of non-repeatable experiments and do not really know what they have measured. The more material of this kind I read, the less I'm convinced that the material is good.

So, it's my personal opinion, and if this is wrong I'm willing to accept criticism here, that when it comes to mobile networks we have quite a few statements of belief but hardly any reliable material.

And what I find extremely annoying here is the permanent excuse "we cannot say anything about the wireless channel". I have owned a cell phone myself for more than a decade now and use it frequently.
And in fact, mobile network operators know their channels well enough that they can offer phone service. So the knowledge of mobile channels may be incomplete - but there is more than nothing. In addition, there is a bunch of work on adaptive channel coding. Now, you cannot adapt a coding scheme when you don't know what channel properties your coding scheme shall be adapted to. So obviously, there _are_ channel models. And they are practically used. And there _are_ Radio Link Protocols and there _are_ MAC and scheduling schemes.

But when I asked even research engineers in well-known companies which build mobile phones why, e.g., GPRS accepts delivery times for a packet of up to 10 minutes, no one was able or willing to explain this to me. Now, why is it in the standards, when there is no explanation for this or no necessity to accept this?

I was involved in an academic research project which dealt with adaptation of multimedia streams to varying channel conditions in mobile networks. And even there I didn't get reliable material from our industrial partners on which conditions I should adapt to. The inevitable consequence was that the research ended up in a pure disaster. I wasted years of my life on this one. So, when I write this post, you see me in fact in an angry and bitter condition.

Nearly seven years ago, a professor asked me what the characteristics of mobile networks are. After seven years I still do not know. And when I tried to talk to colleagues from mobile phone manufacturers, the only remark was: "Oh, I see, you're used to wirebound networks".

I have seen a number of PhD theses dealing with hiccups. But I have not yet seen any reliable material on whether there _are_ hiccups. Of course we can do research that way: "Let's assume hiccups." O.k. But which assumptions are reasonable here? And what are the resulting delay-bandwidth products? 10 kByte? 100 kByte? 10 MByte? And which RTTs are we going to see if we use sufficient buffering? 1 second? 2 seconds?
Or - according to the ETSI standard for GPRS - a quarter of an hour?

During the last seven years of my life, the last three of which I have been unemployed, I always wanted to understand only one thing: "What are the consequences of mobility and mobile networks for TCP and upper layers?"

And after seven years, to the best of my knowledge, I say: We have a lot of creeds - but hardly any reliable knowledge.

Detlef
> Joe
>
>

From touch at ISI.EDU Wed Jan 10 15:57:46 2007
From: touch at ISI.EDU (Joe Touch)
Date: Wed, 10 Jan 2007 15:57:46 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ;-))
In-Reply-To: <45A576EB.206@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de>
Message-ID: <45A57D7A.6030505@isi.edu>

Detlef Bosau wrote:
> Joe Touch wrote:
>>> I think of the semantics at the connection level. Which I think to be sufficient in many cases.
>>>
>>
>> The result is that you think you started/ended a connection correctly, but that the wrong data got there?
>>
>
> Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK, we know that all data has reached the receiving socket.

We should know that. But when we have intermediaries spoofing ACKs, all we know is that the two endpoints agree that they have closed. The data itself is not known.

Case in point - if the intermediary ACKs data and continues to buffer it, and the window wraps, and then the intermediary goes down, the endpoints think the data reached the buffer correctly but it really did not.

> What we do not know is whether the data has reached the application.

TCP is a reliable transport protocol; it is not a reliable application protocol. Actions outside of TCP are not ensured by TCP.
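
The receiver-driven view of CLOSE, and the application-level acknowledgement discussed in this thread, can be sketched with ordinary stream sockets. This is a minimal illustration under stated assumptions: `socket.socketpair()` stands in for a TCP connection (it is a local stream socket, but `shutdown(SHUT_WR)` gives the same end-of-stream behaviour), and the `OK <byte count>` reply and the helper names are hypothetical, not anything TCP itself provides.

```python
import socket
import threading


def receive_until_eof(sock: socket.socket) -> bytes:
    """Receiver: keep reading until recv() returns b"" (the peer closed its
    sending direction), then confirm at the application level."""
    data = bytearray()
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # end of stream: everything sent before the close arrived
            break
        data += chunk
    # Hypothetical application-level ACK: report how many bytes were read.
    sock.sendall(b"OK %d" % len(data))
    sock.shutdown(socket.SHUT_WR)
    return bytes(data)


def send_with_app_ack(sock: socket.socket, payload: bytes) -> bool:
    """Sender: write, half-close, then wait for the peer application's reply.
    A clean close alone would only say that the remote transport got the data."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)  # half-close; on TCP this sends a FIN
    reply = bytearray()
    while True:
        chunk = sock.recv(64)
        if not chunk:
            break
        reply += chunk
    return bytes(reply) == b"OK %d" % len(payload)


def demo(payload: bytes):
    """Run both ends over a local socket pair; receiver runs in a thread."""
    a, b = socket.socketpair()
    received = []
    t = threading.Thread(target=lambda: received.append(receive_until_eof(b)))
    t.start()
    ok = send_with_app_ack(a, payload)
    t.join()
    a.close()
    b.close()
    return ok, received[0]


if __name__ == "__main__":
    ok, data = demo(b"hello" * 1000)
    print(ok)  # prints True
```

Even a successful `OK` only says that the peer application read that many bytes; it says nothing about whether the data was processed or reached stable storage.
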

> To my understanding that's one reason why we use acknowledgements on application level when it is necessary to know whether an application has received all data.

Agreed, but we do know some other things. As a *receiver*, when we issue a CLOSE, we keep reading until there is no more data. If we do so, AND we receive a "no more data", then we *know* all the data has been received correctly.

I.e., the semantics of who knows what are receiver-driven, not sender.

> So, to my understanding a PEP which keeps the semantics at the connection level keeps all semantics which are provided by TCP itself. Acknowledgements at the application level are beyond the scope of TCP.

See above; PEPs that spoof ACKs can result in different data streams being 'correctly' processed without either side knowing so.

Joe

>
>
>> As to PEPs...
>>
>>
>>> Otherwise the problem is: When the bandwidth sender - splitter is, e.g., the average bandwidth / rate splitter-sender but far less than the maximum rate splitter / sender, then a simple router perhaps would hardly store any data and thus hardly equalize the rate / delivery times. Thierry describes delay spikes of several seconds. If we think about UMTS, we can imagine a wireless link where nothing happens for up to several seconds - thus no data at all is clocked out from the sender - and then we have about 2 Mbps throughput for a short time - which is perhaps much more than the actual Internet path can carry. In such a scenario we want to have the router / splitter / PEP / whateverbox buffer the data and equalize the rate variations. Can this be achieved by pure pacing in the one or other direction?
>>>
>>
>> Pacing is a simpler version of what you're asking ACK clocking to do; if ACK clocking works, pacing definitely should.
>>
>
> The problem I mean is very similar to problems like ACK compression or the problem described in an RFC draft by Craig:
>
> http://tools.ietf.org/html/draft-partridge-e2e-ackspacing-00
>
> Craig addresses the problem that during slow start, bursts may grow so large that buffer queues on the path may be overloaded. A similar problem may happen when a mobile network has intermittent delay spikes and phases with high throughput. In phases with high throughput a mobile might receive a data burst, and thus a corresponding data burst is clocked out at the sender which may overrun queues on the path.
>
> Craig proposed to overcome this problem by appropriate ACK spacing, i.e. by intentionally putting short time gaps between ACK datagrams. The problem is also addressed in a paper "Paced TCP for High Delay-Bandwidth Networks" by Joanna Kulik, Robert Coulter, Dennis Rockwell and Craig Partridge.
>
> The one interesting question for me (perhaps not for the community, depending on the answer ;-)) is: Do we already have a pacing / spacing scheme which provides appropriate ACK spacing for mobile networks?
>
> And of course this question very much depends on whether the problem of intermittent bursts in mobile networks is relevant. That's why I wrote the post on hiccups in mobile networks some days ago. I have looked for literature in this area quite intensely but found it extremely hard to get useful information here. I already referred to the Globecom 04 paper by Thierry Klein, but I did not find really useful additional material on this issue. Scheduling algorithms in particular seem to be company confidential quite often, so it is extremely hard to get information there.
>
> Moreover, I'm not quite sure whether ACK spacing is already in use here (sic!) because one consequence of doing ACK spacing in mobile networks is that the sender is confronted with a large delay-bandwidth product.
> From the literature about mobile networks I know that large delay-bandwidth products are often claimed for mobile networks - however, no one could explain to me where the claimed path capacity should come from. It's surely not the wireless channel, which typically can hardly hold even a single IP packet. I don't think it's likely that the ARQ buffers provide much memory capacity, because a "sliding window scheme" for ARQ and RLP would require mobile receivers to keep a number of incomplete IP packets, and therefore a certain amount of storage capacity, for a questionable benefit, because in mobile networks the wireless path can keep only very few RLP frames in flight.
>
> In short: Perhaps we may find some kbytes of memory on L2 here. Perhaps layer 2 may keep an average of one or two IP packets. That does not at all explain why mobile networks are frequently claimed to have bandwidth-delay products so large that they would be a problem for TCP.
>
> So, I'm just eager to know what mobile network operators are doing here.
>
> If mobile networks really exhibit such large delay-bandwidth products, and if we have intermittent bursts and delay spikes here, we are not talking about some kbytes but about up to several hundred kbytes and more, depending on how bursty the traffic is, and we have the same issues here as we have in satellite networks and other networks with an extremely large delay-bandwidth product.
>
> So my question is of course a state-of-the-art question. And I spent a huge amount of time on literature research on this issue, but as I said, it's extremely hard to find reliable research papers here. Most of the information I found is either extremely vague or it is in PhD theses written in close cooperation with network operators, where I find claimed problems - but when it comes to details, this is "corporate confidential", which is definitely not my understanding of proper research.
>
> I know that this post here exhibits a very strong criticism of many papers which present "results" from "practical experiences with GPRS" etc. But after having read dozens of papers of this kind for years, my conclusion is that many of the authors present snapshots of non-repeatable experiments and do not really know what they have measured. The more material of this kind I read, the less I'm convinced that the material is good.
>
> So, it's my personal opinion, and if this is wrong I'm willing to accept criticism here, that when it comes to mobile networks we have quite a few statements of belief but hardly any reliable material.
>
> And what I find extremely annoying here is the permanent excuse "we cannot say anything about the wireless channel". I have owned a cell phone myself for more than a decade now and use it frequently. And in fact, mobile network operators know their channels well enough that they can offer phone service. So the knowledge of mobile channels may be incomplete - but there is more than nothing. In addition, there is a bunch of work on adaptive channel coding. Now, you cannot adapt a coding scheme when you don't know what channel properties your coding scheme shall be adapted to. So obviously, there _are_ channel models. And they are practically used. And there _are_ Radio Link Protocols and there _are_ MAC and scheduling schemes.
>
> But when I asked even research engineers in well-known companies which build mobile phones why, e.g., GPRS accepts delivery times for a packet of up to 10 minutes, no one was able or willing to explain this to me. Now, why is it in the standards, when there is no explanation for this or no necessity to accept this?
>
> I was involved in an academic research project which dealt with adaptation of multimedia streams to varying channel conditions in mobile networks.
> And even there I didn't get reliable material from our industrial partners on which conditions I should adapt to. The inevitable consequence was that the research ended up in a pure disaster. I wasted years of my life on this one. So, when I write this post, you see me in fact in an angry and bitter condition.
>
> Nearly seven years ago, a professor asked me what the characteristics of mobile networks are. After seven years I still do not know. And when I tried to talk to colleagues from mobile phone manufacturers, the only remark was: "Oh, I see, you're used to wirebound networks".
>
> I have seen a number of PhD theses dealing with hiccups. But I have not yet seen any reliable material on whether there _are_ hiccups. Of course we can do research that way: "Let's assume hiccups." O.k. But which assumptions are reasonable here? And what are the resulting delay-bandwidth products? 10 kByte? 100 kByte? 10 MByte? And which RTTs are we going to see if we use sufficient buffering? 1 second? 2 seconds? Or - according to the ETSI standard for GPRS - a quarter of an hour?
>
> During the last seven years of my life, the last three of which I have been unemployed, I always wanted to understand only one thing: "What are the consequences of mobility and mobile networks for TCP and upper layers?"
>
> And after seven years, to the best of my knowledge, I say: We have a lot of creeds - but hardly any reliable knowledge.
>
> Detlef
>> Joe
>>
>>
>
>

--
----------------------------------------
Joe Touch
Sr. Network Engineer, USAF TSAT Space Segment

From L.Wood at surrey.ac.uk Wed Jan 10 16:25:24 2007
From: L.Wood at surrey.ac.uk (Lloyd Wood)
Date: Thu, 11 Jan 2007 00:25:24 +0000
Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ;-))
In-Reply-To: <45A576EB.206@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de>
Message-ID: <200701110025.AAA04307@cisco.com>

At Thursday 11/01/2007 00:29 +0100, you wrote:

>Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK,

...the ACK starts off CLOSE to the receiver and then goes FURTHER AWAY, ending up a FAR ACK. That's the semantics.

L.

channelling crowcroft.

From faber at ISI.EDU Wed Jan 10 17:35:08 2007
From: faber at ISI.EDU (Ted Faber)
Date: Wed, 10 Jan 2007 17:35:08 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ;-))
In-Reply-To: <200701110025.AAA04307@cisco.com>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de> <200701110025.AAA04307@cisco.com>
Message-ID: <20070111013508.GA1402@hut.isi.edu>

On Thu, Jan 11, 2007 at 12:25:24AM +0000, Lloyd Wood wrote:
> At Thursday 11/01/2007 00:29 +0100, you wrote:
>
> >Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK,
>
> ...the ACK starts off CLOSE to the receiver and then goes FURTHER AWAY, ending up a FAR ACK. That's the semantics.

And if you collect 5 of them, it's a FIN ACK. Which may be where we were *trying* to go. :-)

--
Ted Faber
http://www.isi.edu/~faber
PGP: http://www.isi.edu/~faber/pubkeys.asc
Unexpected attachment on this mail? See http://www.isi.edu/~faber/FAQ.html#SIG

From Anil.Agarwal at viasat.com Wed Jan 10 20:08:06 2007
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Wed, 10 Jan 2007 23:08:06 -0500
Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ;-))
References: <45A57D7A.6030505@isi.edu>
Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com>

Joe Touch wrote -
>>
>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK, we know that all data has reached the receiving socket.

> We should know that. But when we have intermediaries spoofing ACKs, all we know is that the two endpoints agree that they have closed. The data itself is not known.

> Case in point - if the intermediary ACKs data and continues to buffer it, and the window wraps, and then the intermediary goes down, the endpoints think the data reached the buffer correctly but it really did not.

Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes of sender data without delivering any to the receive end-point, then goes down, and the end-points continue the connection using the wrapped sequence number, which in this case matches up just right, so that the intervening 2^32 bytes disappear down a black hole, without the sender or receiver being any wiser?

Cheers,
Anil
------------------
Anil Agarwal
ViaSat Inc.

From detlef.bosau at web.de Thu Jan 11 02:07:25 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 11 Jan 2007 11:07:25 +0100
Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ;-))
In-Reply-To: <20070111013508.GA1402@hut.isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de> <200701110025.AAA04307@cisco.com> <20070111013508.GA1402@hut.isi.edu>
Message-ID: <45A60C5D.7000300@web.de>

Ted Faber wrote:
> On Thu, Jan 11, 2007 at 12:25:24AM +0000, Lloyd Wood wrote:
>
>> At Thursday 11/01/2007 00:29 +0100, you wrote:
>>
>>
>>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK,
>>>
>> ...the ACK starts off CLOSE to the receiver and then goes FURTHER AWAY, ending up a FAR ACK. That's the semantics.
>>
>
> And if you collect 5 of them, it's a FIN ACK. Which may be where we were *trying* to go. :-)
>
>
I apologize if this is a stupid question, but from the RFC I understood that the receiving socket sends an acknowledgement when all data was received. That's CLOSE ACK.

Is there an explicit acknowledgement which tells the sender that all data has been delivered to the _application_? Can this even be achieved in finite time? An application may crash or hang!

To my understanding the FIN/FIN ACK/..... is to shut down a TCP connection knowing about the two-army problem and the fact that this cannot be solved in finite time?

I apologize if this is a misconception...

Detlef

From detlef.bosau at web.de Thu Jan 11 03:43:04 2007
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 11 Jan 2007 12:43:04 +0100
Subject: [e2e] A simple scenario.
  (Basically the reason for the sliding window thread ;-))
In-Reply-To: <45A57D7A.6030505@isi.edu>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de> <45A57D7A.6030505@isi.edu>
Message-ID: <45A622C8.1080707@web.de>

Joe Touch wrote:
>>>
>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a receiver issues a CLOSE ACK, we know that all data has reached the receiving socket.
>>
>
> We should know that. But when we have intermediaries spoofing ACKs, all we know is that the two endpoints agree that they have closed. The data itself is not known.
>
> Case in point - if the intermediary ACKs data and continues to buffer it, and the window wraps, and then the intermediary goes down, the endpoints think the data reached the buffer correctly but it really did not.
>
>
Of course. But, assuming we can overcome the window wrap problem, to my understanding spoofing boxes must not spoof the CLOSE ACK, so that the sender is not notified that all data has reached the final receiver until this actually happens.

Of course, we don't know anything about intermediate states, and of course we run into a problem if a spoofing box fails.

>> What we do not know is whether the data has reached the application.
>>
>
> TCP is a reliable transport protocol; it is not a reliable application protocol. Actions outside of TCP are not ensured by TCP.
>
>
Fine :-)

Then one could even argue with an end-to-end argument: When a transport protocol cannot assure that sent data has been read successfully by the receiving application, we do need an acknowledgement scheme at the application level anyway.

Please don't misunderstand me. I don't want to be careless about the problem. All I want to say is that there may be situations, e.g.
extremely large delay-bandwidth products, where one perhaps really wants to have an alternative to AIMD probing to get acceptable startup behaviour, where proxies / splitters / spoofing boxes should be considered very seriously. I don't remember the paper, but I think Sally Floyd once wrote about a satellite connection where it takes 20 minutes or so for a flow to achieve acceptable throughput due to an extremely large delay-bandwidth product.

So, when we _have_ acknowledgements at the application level, and we can reduce fate-sharing problems to an acceptable level, and some proxy could help us to significantly accelerate the startup phase here, I think we should at least consider this as one way to go.

>> To my understanding that's one reason why we use acknowledgements on application level when it is necessary to know whether an application has received all data.
>>
>
> Agreed, but we do know some other things. As a *receiver*, when we issue a CLOSE, we keep reading until there is no more data. If we do so, AND we receive a "no more data", then we *know* all the data has been received correctly.
>
O.k., so we can detect an error: The sender sent a CLOSE and there is trailing data afterwards. In that case (I don't know what the RFCs say here) we can issue an error message, e.g. a RST. So, let's take the sender's view then: How long shall a sender wait for a possible error message like that? Doesn't this lead to the problem that a missing NAK is not equivalent to an ACK?

> I.e., the semantics of who knows what are receiver-driven, not sender.
>
>
However, in a reliable connection the sender wants to know when all data has been completely delivered.

>> So, to my understanding a PEP which keeps the semantics at the connection level keeps all semantics which are provided by TCP itself. Acknowledgements at the application level are beyond the scope of TCP.
>>
>
> See above; PEPs that spoof ACKs can result in different data streams being 'correctly' processed without either side knowing so.
>
> Joe
>
>

From lars.eggert at nokia.com Thu Jan 11 07:43:16 2007
From: lars.eggert at nokia.com (lars.eggert@nokia.com)
Date: Thu, 11 Jan 2007 17:43:16 +0200
Subject: [e2e] Are we doing sliding window in the Internet?
In-Reply-To: <200701040429.EAA24974@cisco.com>
Message-ID: <213394AB6954BE4682E8B2A49103468348536D@esebe108.NOE.Nokia.com>

Hi,

sorry for jumping in late - big pile of unread mail over the holidays.

> This issue is minor compared to the widespread changes to their TCP stack Microsoft made with adopting Compound TCP in Vista.
> http://www.microsoft.com/technet/community/columns/cableguy/cg1105.mspx

The IETF has approached MS over this issue, and apparently C-TCP will not be enabled by default on end-user Vista versions. The decision for the server versions has not been made AFAIK. Recent Linux versions, however, apparently enable BIC or CUBIC TCP by default, which raises concerns.

In any case, these examples all illustrate that there seems to be considerable interest in deploying "faster" TCP variants on the Internet, or TCP variants that are "more optimal" across certain paths. Many of these schemes can be significantly more aggressive than any current congestion control standard.

The concern is that while many of these proposals look interesting, few if any have been validated to the point where they can be recommended for widespread deployment. (And let's be clear, stuff that gets shipped in the most common stacks out there _is_ seeing widespread deployment, especially if enabled by default.) Many modifications haven't even been _documented_ to the point of allowing an analysis of their impact, even ones that are shipping.

We're on a slippery slope here. Yes, TCP is less than efficient in many network scenarios that are becoming increasingly common, and modifications can have a positive impact.
But they can also have a large negative impact on the careful equilibrium that the VJ mechanisms have maintained for the last 15 years. Congestion control is arguably one of the pillars of the Internet, and changes need to be thought through and validated carefully, both by the proposers and the community at large.

The good news is that we do have research results and some limited operational experience that looks promising. We need more of it. Before widespread deployment, we need widespread experimentation. The recent draft-floyd-tsvwg-cc-alt lists a number of important points that such experiments need to discuss.

Some IETF transport folks have been discussing how to make progress in this space. A first step seems to be that proposed modifications need to be sufficiently documented by the proposers in a public forum, such that the community can review them. Informational RFCs are a convenient form. The community could then elect to further discuss and analyze promising proposals, developing them towards specification for Experimental use. A Standards Track effort would eventually follow.

We're planning to further discuss these issues and a proposed way forward at the ICCRG and PFLDnet meetings in February and would welcome participation from researchers, developers and other interested parties.

Lars
--
NEW EMAIL: lars.eggert at nokia.com
NEW MOBILE: +358 50 48 24461
NEW JABBER: lars.eggert at googlemail.com

From david.borman at windriver.com Thu Jan 11 09:11:43 2007
From: david.borman at windriver.com (David Borman)
Date: Thu, 11 Jan 2007 11:11:43 -0600
Subject: [e2e] A simple scenario.
  (Basically the reason for the sliding window thread ;-))
In-Reply-To: <45A60C5D.7000300@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de> <200701110025.AAA04307@cisco.com> <20070111013508.GA1402@hut.isi.edu> <45A60C5D.7000300@web.de>
Message-ID: <456C4B17-4E6C-4700-A0C8-AA3BA29A5441@windriver.com>

Hi Detlef,

TCP does not provide any mechanism to tell that all the data has been delivered to the application. TCP can only tell you that all the data has been received by the TCP at the remote end. The application itself on both ends of the connection will need to determine whether or not all the relevant data has been received by the application. That is really the only place where it can take place reliably. For example, even if TCP could let the other side know that the application has read all the data, that doesn't mean that the application has actually processed the data, or gotten it to stable storage, or whatever else it is doing with the data. TCP provides a reliable byte stream; it is up to the application to decide how to use the data that is transferred over that stream.

-David Borman

On Jan 11, 2007, at 4:07 AM, Detlef Bosau wrote:

> I apologize if this is a stupid question, but from the RFC I understood that the receiving socket sends an acknowledgement when all data was received. That's CLOSE ACK.
>
> Is there an explicit acknowledgement which tells the sender that all data has been delivered to the _application_? Can this even be achieved in finite time? An application may crash or hang!
>
> To my understanding the FIN/FIN ACK/..... is to shut down a TCP connection knowing about the two-army problem and the fact that this cannot be solved in finite time?
>
> I apologize if this is a misconception...
>
> Detlef
>

From touch at ISI.EDU Thu Jan 11 10:10:11 2007
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 11 Jan 2007 10:10:11 -0800
Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ;-))
In-Reply-To: <45A622C8.1080707@web.de>
References: <032EC4F75A527A4FA58C5B1B5DECFBB301F24A11@KC-MSX1.kc.umkc.edu> <459E2CF4.6030701@web.de> <459E7F22.2030907@isi.edu> <459E8EA2.4010000@web.de> <459E8F56.9070101@isi.edu> <459EB8F8.4060304@web.de> <45A55D23.6080505@isi.edu> <45A576EB.206@web.de> <45A57D7A.6030505@isi.edu> <45A622C8.1080707@web.de>
Message-ID: <45A67D83.200@isi.edu>

Detlef Bosau wrote:
> Joe Touch wrote:
...
>> Case in point - if the intermediary ACKs data and continues to buffer it, and the window wraps, and then the intermediary goes down, the endpoints think the data reached the buffer correctly but it really did not.
>
> Of course. But, assuming we can overcome the window wrap problem, to my understanding spoofing boxes must not spoof the CLOSE ACK, so that the sender is not notified that all data has reached the final receiver until this actually happens.
>
> Of course, we don't know anything about intermediate states, and of course we run into a problem if a spoofing box fails.

Right - that's the other case where problems occur.

>>> What we do not know is whether the data has reached the application.
>>>
>>
>> TCP is a reliable transport protocol; it is not a reliable application protocol. Actions outside of TCP are not ensured by TCP.
>>
>>
> Fine :-)
>
> Then one could even argue with an end-to-end argument: When a transport protocol cannot assure that sent data has been read successfully by the receiving application, we do need an acknowledgement scheme at the application level anyway.

The E2E argument applies to the ends in question. In this case, the transport protocol is the endpoint, not the application.

...
> All I want to say is that there may be situations, e.g.
extremely large > delay bandwidth products where one perhaps really wants to have an > alternative to AIMD probing to have an acceptable startup behaviour, Agreed. > where proxies / splitters / spoofing boxes should be considered very > seriously. I do NOT agree with that conclusion. If you want to change AIMD, proceed as Lars suggested in a separate post - within TCP or within a link- or network-consistent PEP, or within an application-visible proxy. You may need to read the transport packets to implement such a PEP, but you should not (MUST NOT, actually) need to spoof transport packets to accomplish the result. ... >>> To my understanding that's one reason why we use >>> acknowledgements on application level when it is necessary to know >>> whether an application has received all data. >> >> Agreed, but we do know some other things. As a *receiver*, when we issue >> a CLOSE, we keep reading until there is no more data. If we do so, AND >> we receive a "no more data", then we *know* all the data has been >> received correctly. > > O.k., so we can detect an error: The sender sent a CLOSE and there is > trailing data afterwards. In that case (I don't know what the RFCs say > here) we can issue an error message, e.g. a RST. There are two cases: - the sender still has data to send already buffered in the socket: TCP won't ACK the received FIN in that case - the sender has emptied its buffer and ACK'd the FIN: TCP won't accept further SEND calls (socket writes) from the sending application in that case > So, let's take the > sender's view then: How long shall a sender wait for a possible error > message like that? Doesn't this lead to the problem that a missing NAK > is not equivalent to an ACK? The sender can wait all it wants. All it will ever know is that the receiving TCP has correctly received the data; it needs a separate signal from the application to know it has actually been read.
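The distinction Borman and Joe draw (TCP can confirm delivery to the peer TCP; only the application can confirm that the data was read) can be illustrated with a minimal Python sketch over a loopback TCP connection. It assumes a hypothetical application protocol: the sender half-closes after writing (the FIN that the thread calls CLOSE), the receiver reads until end-of-stream, stores the data, and then writes back an `APP-ACK <length>` message. The function names and the message format are illustrative only; they are not part of TCP or of any protocol mentioned in the thread.

```python
import socket
import threading


def sender(sock, payload):
    """Write payload, half-close, then wait for an application-level ACK."""
    sock.sendall(payload)
    # Half-close: send our FIN ("no more data") but keep the read side open.
    sock.shutdown(socket.SHUT_WR)
    # TCP ACKs only show that the peer *TCP* holds the bytes; only a reply
    # written by the peer application shows that it has read them.
    reply = b""
    while True:
        chunk = sock.recv(64)
        if not chunk:  # peer closed: no more reply bytes will come
            break
        reply += chunk
    return reply == b"APP-ACK " + str(len(payload)).encode()


def receiver(sock, sink):
    """Read until end-of-stream, 'process' the data, then confirm it."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the sender's FIN arrived: all data received
            break
        chunks.append(chunk)
    data = b"".join(chunks)
    sink.append(data)  # stand-in for real processing or a write to stable storage
    sock.sendall(b"APP-ACK " + str(len(data)).encode())


def demo(payload):
    """Run sender and receiver over a loopback TCP connection."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    sink = []

    def serve():
        conn, _ = listener.accept()
        with conn:  # closing conn ends the reply stream seen by sender()
            receiver(conn, sink)

    server = threading.Thread(target=serve)
    server.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        confirmed = sender(client, payload)
    server.join()
    listener.close()
    return confirmed, sink[0]
```

`demo(b"hello" * 1000)` returns `True` together with the received bytes. If the receiving process died before writing its reply, `sender` would see end-of-stream without a confirmation and return `False`, even though every byte had been acknowledged by the receiving TCP.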
Otherwise, both TCPs could close with data in the receive buffers and if their corresponding applications die, the data is just lost. >> I.e., the semantics of who knows what are receiver-driven, not sender. > > However, in a reliable connection the sender wants to know when all data > has been completely delivered. Not necessarily. The sender wants to know that IF the data is received, it is received correctly, and that IF the receiver thinks it has all the data, then it will be correct. There's nothing in TCP semantics that ensures that the sender knows anything other than that the receiving TCP has accepted all the data correctly, though. All that knowledge stops at the TCP layer. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20070111/9a5838ee/signature.bin From touch at ISI.EDU Thu Jan 11 10:13:13 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 11 Jan 2007 10:13:13 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45A67E39.4000308@isi.edu> Agarwal, Anil wrote: > Joe Touch wrote - >>> >>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a >>> receiver issues a CLOSE ACK, we know that all data has reached the >>> receiving socket. > >> We should know that. But when we have intermediates spoofing ACKs, all >> we know is that the two endpoints agree that they have closed. The data >> itself is not known.
> >> Case in point - if the intermediary ACKs data and continues to buffer >> it, and the window wraps, and then the intermediary goes down, the >> endpoints think the data reached the buffer correctly but it really > did not. > > Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes > of sender > data without delivering any to the receive end-point, then goes down, and > the end-points continue the connection using the wrapped > sequence number, which in this case match up just right, so that the > intervening > 2^32 bytes disappear down a black hole, without the sender or receiver > being any wiser? Yes. The system can wrap without things matching up _exactly_, depending on how big the CWND is, though, so this isn't as absurdly specific as it appears at first glance. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Thu Jan 11 10:14:21 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 11 Jan 2007 10:14:21 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45A67E7D.4010609@isi.edu> PS - this could also happen within a single CWND, e.g., if the network path temporarily shifts around the TCP-splitter. It doesn't require an entire window wrap to occur. Joe Agarwal, Anil wrote: > Joe Touch wrote - >>> >>> Well, it's just how I understand the semantics of a "CLOSE ACK".
When a >>> receiver issues a CLOSE ACK, we know that all data has reached the >>> receiving socket. > >> We should know that. But when we have intermediates spoofing ACKs, all >> we know is that the two endpoints agree that they have closed. The data >> itself is not known. > >> Case in point - if the intermediary ACKs data and continues to buffer >> it, and the window wraps, and then the intermediary goes down, the >> endpoints think the data reached the buffer correctly but it really > did not. > > Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes > of sender > data without delivering any to the receive end-point, then goes down, and > the end-points continue the connection using the wrapped > sequence number, which in this case match up just right, so that the > intervening > 2^32 bytes disappear down a black hole, without the sender or receiver > being any wiser? > > Cheers, > Anil > ------------------ > Anil Agarwal > ViaSat Inc. > > > > -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Thu Jan 11 16:05:11 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 11 Jan 2007 16:05:11 -0800 Subject: [e2e] FYI - update to list policy and operation Message-ID: <45A6D0B7.5040207@isi.edu> Hi, all, Just a quick note that the list is under new management, at least officially. I'm still running it day-to-day; that hasn't changed (sorry ;-) Some of this info is new, and some may be repeated... Karen Sollins and Craig Partridge are now chairs of the IRTF E2E WG, which is the owner of this list.
http://www.irtf.org/charter?gtype=rg&group=end2end I have taken over Bob Braden's role as primary POC for queries about the mailing list and requests to post. (PS - I encourage you to thank Bob either personally or on this list; running things, both at the IRTF and on this list, is often an otherwise thankless job, and he deserves the primary credit for this list's prosperity.) The list posting policy has been updated to explain more clearly how CFPs are gated. CFPs must: * have a primary focus on E2E issues * focus on research discussion * be open to all participants (space permitting) See also the following: http://www.postel.org/e2e.htm http://www.postel.org/mailman/listinfo/end2end-interest This is the spirit in which the recent DARPA and NSF posts have been encouraged, and we further encourage all organizations - national, industrial, academic, and other - to post relevant *open* calls for papers and participation to this list. Posts of calls for papers/participation which have non-space restrictions, proposals for funding, and previously prohibited items (job advertisements, job solicitations, and book announcements) are not permitted under the current policy. I hope this information is useful. Please feel free to contact me directly if you have any questions, either about this list in general or about a specific post. Thanks, Joe (as list admin) -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From michael.welzl at uibk.ac.at Fri Jan 12 10:15:40 2007 From: michael.welzl at uibk.ac.at (Michael Welzl) Date: Fri, 12 Jan 2007 19:15:40 +0100 Subject: [e2e] ICCRG meeting agenda (12/13 Feb @ ISI) Message-ID: <006e01c73675$ab2e72d0$0200a8c0@fun> Dear all, As Lars mentioned in his previous message to the list, ICCRG will have a meeting which is co-located with Pfldnet 2007, and would welcome your participation. The agenda can be found at the bottom of this email. 
It seems to be quite stable now, but if any (minor) changes need to be made, note that they will be announced on the ICCRG list only. Additionally, the most up-to-date version of the agenda is always available at: http://www1.tools.ietf.org/group/irtf/trac/wiki/Agenda (as you can see, Monday afternoon is dedicated to the "how-to-cope-with-all-these-different-TCP's-out-there" issue) and logistics details can be found at: http://www1.tools.ietf.org/group/irtf/trac/wiki/Logistics Please send an email to Alba Regalado-Palacios (alba at isi.edu) if you'd like to participate so that we can do a head count. I hope to see you there! Cheers, Michael
==============================================
Monday 12. 2. 2007:
-------------------
08:30 - 09:00 Light breakfast
09:00 - 09:15 Welcome and agenda bashing
09:15 - 09:30 Michael Welzl: The current state of ICCRG
09:30 - 10:00 Keshav: What is congestion and what is congestion control
10:00 - 10:45 Jeremy Mineweaser: Congestion control in the Global Information Grid (GIG)
10:45 - 11:00 Break
11:00 - 11:45 Tom Phelan: DCCP, TFRC and Open Problems in Congestion Control for Media Applications
11:45 - 12:15 Lachlan Andrew: Rate control with packet corruption
12:15 - 13:45 Lunch
13:45 - 15:15 Lars Eggert: The role of the IETF/IAB/ICCRG for already deployed non-standard TCP CC
15:15 - 15:30 Break
15:30 - 18:00 Discussion: What should the ICCRG be doing?

Tuesday 13. 2. 2007:
--------------------
08:30 - 09:00 Light breakfast
09:00 - 09:45 K. K. Ramakrishnan: LT-TCP: Loss Tolerant TCP
09:45 - 10:30 Ted Faber and Eric Coe: Congestion Control with Explicit Feedback (XCP implementation experiences (Eric), and potential for incremental deployment (Ted))
10:30 - 10:45 Break
10:45 - 11:30 Doan B. Hoang: FICC-DiffServ: using CC as a QoS element
11:30 - 12:15 Bob Briscoe: Flow Rate Fairness: Dismantling a Religion
12:15 Open discussion: Next steps: meetings, docs, etc
From gorinsky at arl.wustl.edu Fri Jan 12 14:54:49 2007 From: gorinsky at arl.wustl.edu (Sergey Gorinsky) Date: Fri, 12 Jan 2007 16:54:49 -0600 (CST) Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?) In-Reply-To: Message-ID: Vadim, > How hard it is to turn the Fair Queueing knob to "on" on the gateways? To put my 2 kopecks in... First, since an application can masquerade as multiple flows, fairness enforcement with FQ is not effective. To lend itself to meaningful enforcement, fairness should be defined not in terms of flows or even hosts/processes generating them. Instead, fairness should be linked to humans behind the communications but this requires a very different network architecture.
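Sergey's point that an application can masquerade as multiple flows, and Vadim's reply further down that FQ keyed on source/destination address pairs would do better than per-flow fairness, can be illustrated with a small Deficit Round Robin (DRR) scheduler, used here as a stand-in for Fair Queueing. This is a sketch under simplifying assumptions: every flow is permanently backlogged with fixed-size packets, and the hosts, ports and `classify` functions are hypothetical.

```python
from collections import defaultdict


def drr_shares(flows, classify, link_bytes, quantum=1500):
    """Deficit Round Robin over queues chosen by `classify`.

    flows: list of (src, dst, sport, dport, pkt_size), each always backlogged
        (queues never empty, so DRR's usual deficit reset is not needed).
    classify: maps a flow tuple to the key of the queue it is placed in.
    Returns the fraction of transmitted bytes that went to each source host.
    """
    queues = defaultdict(list)
    for flow in flows:
        queues[classify(flow)].append(flow)
    deficit = {key: 0 for key in queues}
    turn = {key: 0 for key in queues}  # round-robin position inside each queue
    served = defaultdict(int)
    total = 0
    while total < link_bytes:
        for key, members in queues.items():
            deficit[key] += quantum
            # Send packets while the head packet fits in the accumulated deficit.
            while members[turn[key]][4] <= deficit[key]:
                flow = members[turn[key]]
                deficit[key] -= flow[4]
                served[flow[0]] += flow[4]
                total += flow[4]
                turn[key] = (turn[key] + 1) % len(members)
    return {src: nbytes / total for src, nbytes in served.items()}


def per_flow(flow):
    return flow[:4]  # (src, dst, sport, dport): every connection gets its own queue


def per_host_pair(flow):
    return flow[:2]  # (src, dst): all connections between two hosts share a queue


# Host A opens three parallel connections to server S; host B opens one.
FLOWS = [("A", "S", 5001, 80, 1500), ("A", "S", 5002, 80, 1500),
         ("A", "S", 5003, 80, 1500), ("B", "S", 6001, 80, 1500)]
```

`drr_shares(FLOWS, per_flow, 1_500_000)` gives A three quarters of the link, while `drr_shares(FLOWS, per_host_pair, 1_500_000)` splits it evenly between A and B. Keying on host pairs still leaves the deeper problem Sergey raises: a user with several source addresses, or talking to several destinations, still gets several queues.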
Second, packet-by-packet FQ and end-to-end TCP strive to approximate instantaneous PS (Processor Sharing), which is not a good fit for any natural application. Multimedia streams need a minimal rate, not a fair share. Elastic applications are not well served by PS either because average message delay is much larger than under SRPT (Shortest Remaining Processing Time), in agreement with Internet experiences where deviations from short-term fair sharing improve overall efficiency. While minimizing the average message delay, SRPT also might starve large messages. However, one can have it both ways: the rich can get richer without making the poor poorer. ViFi (Virtual Finish Time First), which schedules messages preemptively in the order of their finish times under PS, is close to SRPT (and much better than PS) with respect to the average message delay and guarantees that no message is delivered later than under PS. You can read more on ViFi in: S. Gorinsky and N. S. V. Rao, "Dedicated Channels as an Optimal Network Support for Effective Transfer of Massive Data", Proceedings of High-Speed Networking (HSN 2006), April 2006. The paper and respective simulation suite are available at: http://www.arl.wustl.edu/~gorinsky/pdf/HSN_2006_dedicated_fairness.pdf http://www.arl.wustl.edu/~gorinsky/ViFi/ In the context of web servers, ViFi was independently proposed under the name FSP (Fair Sojourn Protocol): E. J. Friedman and S. G. Henderson, "Fairness and Efficiency in Web Server Protocols", Proceedings of ACM SIGMETRICS 2003, June 2003, available through: http://portal.acm.org/ft_gateway.cfm?id=781056&type=pdf&coll=GUIDE&dl=ACM&CFID=8911549&CFTOKEN=92127204 Thank you, Sergey From detlef.bosau at web.de Fri Jan 12 16:09:07 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 13 Jan 2007 01:09:07 +0100 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45A67E7D.4010609@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> Message-ID: <45A82323.30405@web.de> Joe Touch wrote: > PS - this could also happen within a single CWND, e.g., if the network > path temporarily shifts around the TCP-splitter. It doesn't require an > entire window wrap to occur. > > Joe > > Two remarks. First: The only scenarios where I see a justification / necessity for doing splitting or spoofing are scenarios where the TCP flow must pass the split box / spoofing box / PEP anyway. These are scenarios without path redundancy or path transparency. Hence, in these scenarios the path cannot temporarily shift around the splitter because no alternative path exists. If we want redundancy in those scenarios, we have to consider hot-standby nodes for splitters which keep the flow state and any other "hard" data, which cannot be recovered, synchronized with the backed-up system. To avoid being misunderstood: I don't want to make restrictions for the benefit of a splitter. I think in scenarios where an alternative path to a splitter exists, a splitter must not be used. In my opinion splitters are to be used with maximum care and only in exceptional cases where any known alternative is worse than a splitter. Second: To my understanding we can avoid wraparound problems by keeping the receiver window sufficiently small. And of course, a splitter does flow control by itself. I'm not convinced that it is necessary to have 4 GByte of unacknowledged data in flight in all networks. And _if_ we need windows of this size we can reconsider the length of our sequence numbers. But at least for terrestrial networks 4 GByte of data in transit seems extremely large to me. WRT mobile networks: I'm still looking for material about delivery times and their distributions.
It's absolutely not necessary to have accurate quantitative data here. But it would be helpful to know whether we have to cope with delay spikes of up to 1 second, up to 10 seconds or up to 10 minutes and whether these happen once a minute, once a day or once a year. When we encounter maximum delay spikes of 1 second not more than once a decade, the best idea is to simply ignore these. If a (wireless) link offers 1 Gbps throughput and is blocked every other minute, the situation might be somewhat different. In particular, and perhaps Anil could help me there, I want to get an idea of what is already done by network operators (NOs) and where research is necessary. As I said before, I presume that quite a lot happens that is not publicly documented. First, this can lead to duplicate research. Second, it can lead to "strange" problems where protocols and applications do not work for unknown reasons, and the real problem is that there is some strange middlebox in use which runs one of these neat "company confidential" or "non-disclosure" algorithms. Obscure middleboxes can render our whole work completely worthless because they can cause problems no one can solve. In particular, I ultimately want to understand the problems TCP and other protocols encounter on mobile links - and afterwards I can take a position on how these can be solved. As I said before, much of the literature in this context appears quite obscure to me. It simply makes no sense to, e.g., talk about a splitter and its benefits in a mobile network when it is yet unclear whether a splitter is even necessary. Detlef > Agarwal, Anil wrote: > >> Joe Touch wrote - >> >>>> Well, it's just how I understand the semantics of a "CLOSE ACK". When a >>>> receiver issues a CLOSE ACK, we know that all data has reached the >>>> receiving socket. >>>> >>> We should know that. But when we have intermediates spoofing ACKs, all >>> we know is that the two endpoints agree that they have closed. The data >>> itself is not known.
>>> >>> Case in point - if the intermediary ACKs data and continues to buffer >>> it, and the window wraps, and then the intermediary goes down, the >>> endpoints think the data reached the buffer correctly but it really >>> >> did not. >> >> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes >> of sender >> data without delivering any to the receive end-point, then goes down, and >> the end-points continue the connection using the wrapped >> sequence number, which in this case match up just right, so that the >> intervening >> 2^32 bytes disappear down a black hole, without the sender or receiver >> being any wiser? >> >> Cheers, >> Anil >> ------------------ >> Anil Agarwal >> ViaSat Inc. >> >> >> >> >> > > From jtw at ISI.EDU Fri Jan 12 16:15:07 2007 From: jtw at ISI.EDU (John Wroclawski) Date: Fri, 12 Jan 2007 16:15:07 -0800 Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?) In-Reply-To: References: Message-ID: At 4:54 PM -0600 1/12/07, Sergey Gorinsky wrote: > Vadim, > >> How hard it is to turn the Fair Queueing knob to "on" on the gateways? > > To put my 2 kopecks in... First, since an application can masquerade as >multiple flows, fairness enforcement with FQ is not effective. To lend >itself to meaningful enforcement, fairness should be defined not in terms >of flows or even hosts/processes generating them. Instead, fairness >should be linked to humans behind the communications but this requires >a very different network architecture. Along these lines, folks might want to read Bob Briscoe's internet draft "Flow Rate Fairness: Dismantling a Religion", Bob Briscoe (BT), IETF Internet-Draft , can be found in many formats at http://www.cs.ucl.ac.uk/staff/bbriscoe/pubs.html#rateFairDis john From avg at kotovnik.com Fri Jan 12 17:38:58 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Fri, 12 Jan 2007 17:38:58 -0800 (PST) Subject: [e2e] why fair sharing? 
( Are we doing sliding window in the Internet?)
In-Reply-To: Message-ID: On Fri, 12 Jan 2007, Sergey Gorinsky wrote: > > Vadim, > > > How hard it is to turn the Fair Queueing knob to "on" on the gateways? > > To put my 2 kopecks in... First, since an application can masquerade as > multiple flows, fairness enforcement with FQ is not effective. Doing FQ on src/dst address pairs (rather than on address+port flows) would be a lot better than per-flow fairness of TCP in any case. > To lend > itself to meaningful enforcement, fairness should be defined not in terms > of flows or even hosts/processes generating them. Instead, fairness > should be linked to humans behind the communications but this requires > a very different network architecture. That is pretty much what I'm saying. Fairness is an economic, not a technical, concept. Basically, I'd venture to guess that the share of network capacity allocated should be roughly proportional to the payments. This means that the routing system should be augmented with some way to announce weights for the fairness enforcement. > Second, packet-by-packet FQ and end-to-end TCP strive to approximate > instantaneous PS (Processor Sharing) which is not a good fit for any > natural application. Multimedia streams need a minimal rate, not a fair > share. This is a common misconception. The multimedia streams are either pre-recorded, lag-insensitive content, in which case they are, basically, file transfers (that accounts for 99% of the "streams", incidentally); or real-time content, which is quite elastic in bandwidth requirements - especially video (audio bandwidth is not an issue nowadays, anyway). You can reduce frame rate, reduce color & luminosity bit depth, reduce horizontal & vertical resolution, or just increase compression - for a TV-quality stream that yields two orders of magnitude of acceptable degradation bandwidth-wise. This is more than you can typically get from TCP congestion control, and more than the common bandwidth oversubscription ratio is.
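Sergey's comparison of PS, SRPT and ViFi earlier in the thread can be checked on a toy case. The Python sketch below assumes a single unit-rate link and messages that all arrive at time 0. In that special case SRPT never preempts and reduces to shortest-first, and ViFi's order (by finish time under PS) coincides with it; with staggered arrivals the three schedules differ, and the cited HSN 2006 paper covers that general case. The function names are illustrative.

```python
def ps_finish_times(sizes):
    """Finish times under processor sharing (unit-rate link, all arrive at t=0)."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    t = 0.0
    served = 0.0  # service each still-active message has received so far
    for rank, i in enumerate(order):
        active = len(sizes) - rank  # messages sharing the link until i completes
        t += (sizes[i] - served) * active
        served = sizes[i]
        finish[i] = t
    return finish


def serial_finish_times(sizes, order):
    """Finish times when messages are sent one at a time, to completion, in `order`."""
    finish = [0.0] * len(sizes)
    t = 0.0
    for i in order:
        t += sizes[i]
        finish[i] = t
    return finish


def srpt_finish_times(sizes):
    # All messages present at t=0: SRPT reduces to shortest-first.
    return serial_finish_times(sizes, sorted(range(len(sizes)), key=lambda i: sizes[i]))


def vifi_finish_times(sizes):
    # ViFi: serve messages in order of their (virtual) finish times under PS.
    ps = ps_finish_times(sizes)
    return serial_finish_times(sizes, sorted(range(len(sizes)), key=lambda i: ps[i]))


def mean(xs):
    return sum(xs) / len(xs)
```

For sizes `[3, 1, 2]`, PS finishes the messages at `[6, 3, 5]` (mean 14/3), while ViFi finishes them at `[6, 1, 3]` (mean 10/3): no message finishes later than under PS, and the largest message finishes at the same time as it would under PS.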
What can be done for real-time streams is doing deadline scheduling on the output queues - and tossing away packets which are past their deadline. That'd require accurate timing (in ms resolution) on gateways, but it's quite doable. Instead of the bandwidth reservation nonsense (which screws up dynamic routing) we'd be much better served by changing the resolution of the TTL field from seconds to milliseconds. Or simply by adding a bit which changes the meaning of the TTL field from hops/seconds to milliseconds (255 ms one way should be enough, I guess, at least for this planet :) - it will also be backwards compatible with the existing gateways. > Elastic applications are not well served by PS either because > average message delay is much larger than under SRPT (Shortest Remaining > Processing Time), in agreement with Internet experiences where deviations > from short-term fair sharing improve overall efficiency. Yep. But you still need to enforce fairness between *users*. So it must be some combination of FQ for end-points and deadline packet order and drop scheduling. Thanks for the references! I'm not insisting on FQ being the best way to do things - it's just that it is already implemented and addresses the most obvious problems: short sessions, parallel session cheats, point-origin flooding, etc - including overly-aggressive or poorly tested TCP stacks, which was the point of the original discussion. --vadim From avg at kotovnik.com Fri Jan 12 17:57:46 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Fri, 12 Jan 2007 17:57:46 -0800 (PST) Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?)
In-Reply-To: Message-ID: On Fri, 12 Jan 2007, John Wroclawski wrote: > Along these lines folks might want to read Bob Briscoe's internet > draft "Flow Rate Fairness: Dismantling a Religion", Bob Briscoe (BT), > IETF Internet-Draft , can be found > in many formats at > http://www.cs.ucl.ac.uk/staff/bbriscoe/pubs.html#rateFairDis Pretty much my point all along. I have one issue with the paper, though -- they advocate fairness based on "the costs, not benefits". That shows that they didn't really think about economics. The real-life enterprises (such as ISPs) maximise profit, so they are interested in giving most profitable customers a bigger share and penalize less profitable customers - thus creating incentive to pay more for the better performance. The cost-based allocation does not work economically, as it encourages incurring higher costs in order to obtain higher benefits. --vadim From Jon.Crowcroft at cl.cam.ac.uk Sat Jan 13 02:43:27 2007 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sat, 13 Jan 2007 10:43:27 +0000 Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?) In-Reply-To: Message from Vadim Antonov of "Fri, 12 Jan 2007 17:57:46 PST." Message-ID: yes indeed... the use of congestion avoidance is to avoid congestion at shared points in the net, which _should_ be an unusual occurrence (by definition) but: most nets should be designed for the expected traffic load with some design headroom for variance. when i get x million DSL users, I _know_ what the rate is that they are limited to at their access line. I can build a core for that, and i can peer or connect to upstream tiers with capacity in line with that. i can also engineer for p2p traffic.
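Jon's dimensioning argument can be turned into back-of-the-envelope arithmetic. The sketch assumes the core is sized from the number of users, their access-line rate, an oversubscription ratio (aggregate access capacity divided by core capacity, the ratio Vadim also mentions) and a safety margin for variance. All figures in the example are hypothetical, not numbers from the thread.

```python
def core_capacity_bps(users, access_rate_bps, oversubscription, headroom=1.25):
    """Core capacity for `users` access lines capped at `access_rate_bps`.

    oversubscription: aggregate access capacity divided by core capacity,
        e.g. 20 means roughly 1/20 of the access capacity is busy at once.
    headroom: design margin on top of the expected peak (1.25 = +25 %).
    """
    if oversubscription < 1 or headroom < 1:
        raise ValueError("oversubscription and headroom must be >= 1")
    expected_peak = users * access_rate_bps / oversubscription
    return expected_peak * headroom


# Hypothetical example: one million 8 Mbit/s DSL lines, 20:1 oversubscription.
needed = core_capacity_bps(1_000_000, 8e6, 20)
print(f"{needed / 1e9:.0f} Gbit/s")  # prints "500 Gbit/s"
```

The point of the exercise is Jon's: when the access rate caps each user, the peak the core must carry is known in advance, so congestion there is a provisioning choice rather than something end-to-end congestion control has to discover.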
the congestion experienced often by academics and researchers and people working for techy geeky companies is because they don't have a designed network and don't pay proper prices for impedance-matched networks - we have a 10Gbps access line to the internet in cambridge - many departments have gigE attachments to the net - TCP encourages (doesn't avoid) congestion:) in this scenario. but it isn't the scenario experienced by the great unwashed public Internet users. bob briscoe's note is good reading, but it talks to the situation where people have way faster access lines than their mean access use. this isn't the dominant situation in the commercial net where DSL (and cable) broadband access can be co-designed/dimensioned with the core net provisioning. in this case, customers get what they pay for (If i want 20Mbps DSL, I can get it easily, but i pay more than i do for 8Mbps or for 384kbps), all in line with the ISP maximising profit subject to competition with other ISPs, which keeps them meeting the users' utility/satisfaction - this is all discussed in Kelly's work quite nicely actually. if TCP used ECN right (again, other bob briscoe work) it would just be achieving the same as this argument (kelly et al) but just on a shorter timescale if ECN pricing was done right (see re-feedback etc etc) but if you strive for "fairness" in the current world, you are striving for an illusion. In missive , Vadim Antonov typed: >>On Fri, 12 Jan 2007, John Wroclawski wrote: >> >>> Along these lines folks might want to read Bob Briscoe's internet >>> draft "Flow Rate Fairness: Dismantling a Religion", Bob Briscoe (BT), >>> IETF Internet-Draft , can be found >>> in many formats at >>> http://www.cs.ucl.ac.uk/staff/bbriscoe/pubs.html#rateFairDis >> >>Pretty much my point all along. >> >>I have one issue with the paper, though -- they advocate fairness based on >>"the costs, not benefits". That shows that they didn't really think >>about economics.
>>The real-life enterprises (such as ISPs) maximise profit, >>so they are interested in giving most profitable customers a bigger share >>and penalize less profitable customers - thus creating incentive to pay >>more for the better performance. >> >>The cost-based allocation does not work economically, as it encourages >>incurring higher costs in order to obtain higher benefits. >> >>--vadim >> cheers jon From avg at kotovnik.com Sat Jan 13 16:47:04 2007 From: avg at kotovnik.com (Vadim Antonov) Date: Sat, 13 Jan 2007 16:47:04 -0800 (PST) Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?) In-Reply-To: Message-ID: On Sat, 13 Jan 2007, Dado Colussi wrote: > I'm not sure I understand your point. Bob's paper describes > mechanisms to create an economic regulatory system where individuals > are required to pay extra for their unsocial behavior of causing > congestion. It is only a part of the economic landscape ISPs and > other entities operate and it is akin to the cost of causing > greenhouse gas emissions in real world. I don't see why ISPs couldn't > adjust congestion prices per customer in order to drive their system > to a resource allocation that would maximize profit in times of > congestion? > > Dado Dado - ISPs are not interested in reducing the amount of traffic; quite the opposite. It is their product, and like any producer they are interested in increasing volume - if you remember Econ 101, in the long term the profitability of all kinds of businesses tends to converge to the same norm. (Business segments with higher-than-average ROI attract more investments - and competition, thus reducing profitability; underperforming segments lose capital and consequently have less competitive pressure, thus allowing an increase in profitability). In the established markets, where the initial period of rapid growth (on the S-curve) is over, the only sustainable way to make more money and increase the value of business shares is to increase volume.
So it makes no sense for ISPs whatsoever to penalize users for causing congestion (thus reducing the demand). Instead, they want to encourage users to pay more for a bigger share of the network resources - congestion is their friend, if they can differentiate service (who would pay for premium service when regular service is quite good?) Also, a congested network is a network operating at full capacity - meaning that there is no overinvestment. If a provider has an underloaded network, it basically means that its business people made a mistake and overinvested (driving ROI - and share prices - lower). --vadim From Anil.Agarwal at viasat.com Sat Jan 13 17:17:09 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Sat, 13 Jan 2007 20:17:09 -0500 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E39.4000308@isi.edu> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com> Joe Touch wrote - >> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes >> of sender >> data without delivering any to the receive end-point, then goes down, and >> the end-points continue the connection using the wrapped >> sequence number, which in this case match up just right, so that the >> intervening >> 2^32 bytes disappear down a black hole, without the sender or receiver >> being any wiser? >> > Yes. The system can wrap without things matching up _exactly_, depending > on how big the CWND is, though, so this isn't as absurdly specific as it > appears at first glance. I think this scenario will occur if the TCP-splitters buffer x bytes of undelivered data, the sender cwnd is y, x < 2^32 and x+y >= 2^32, the splitters go down and packets flow between the sender and receiver over an alternate path.
In this case, the receiver's rcv.nxt value is within the sender's [snd.nxt, snd.nxt + cwnd) range, and hence the receiver will acknowledge sequence number rcv.nxt and accept data beyond it; the sender will gladly accept the acknowledgement and continue sending data. Now, we know y < 2^30, a constraint required and imposed by TCP (RFC 1323). We can claim that a (good) TCP splitter ensures that x < 2^31. (Detlef - this is not as easy as it might first appear, especially since data can get buffered at the sending or receiving TCP-splitter in a two-splitter case, but it can be (and is) done.) Hence, x + y < 2^32 and the above scenario will not occur. The above also requires that TCP-splitters use the same ISS (Initial Sequence Number) with the receiver as the one used by the sender. A good TCP-splitter should (and does). Anil ----------- Anil Agarwal ViaSat Inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20070113/e143068c/attachment.html From Anil.Agarwal at viasat.com Sat Jan 13 17:33:35 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Sat, 13 Jan 2007 20:33:35 -0500 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E39.4000308@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3594@VGAEXCH01.hq.corp.viasat.com> This is a re-send of the previous email with some additional info towards the end.
Joe Touch wrote - >> Are you describing a scenario where a TCP-Splitter buffers up 2^32 bytes >> of sender >> data without delivering any to the receive end-point, then goes down, and >> the end-points continue the connection using the wrapped >> sequence number, which in this case match up just right, so that the >> intervening >> 2^32 bytes disappear down a black hole, without the sender or receiver >> being any wiser? >> > Yes. The system can wrap without things matching up _exactly_, depending > on how big the CWND is, though, so this isn't as absurdly specific as it > appears at first glance. I think, this scenario will occur if the TCP-splitters buffer x bytes of undelivered data, the sender cwnd is y, x < 2^32 and x+y >= 2^32, the splitters go down and packets flow between the sender and receiver over an alternate path. In this case, the receiver's rcv.nxt value is within the sender's [snd.nxt, snd.nxt + cwnd) range, and hence the receiver will acknowledge sequence number rcv.nxt and accept data beyond it; the sender will gladly accept the acknowledgement and continue sending data. Now, we know y < 2^30, a constraint required and imposed by TCP (RFC 1323). We can claim that a (good) TCP splitter ensures that x < 2^31. Detlef - this is not as easy as it might first appear, especially since data can get buffered at the sending or receiving TCP-splitter in a two-splitter case, but it can be (is) done. Hence, x + y < 2^32 and the above scenario will not occur. The above also requires that TCP-splitters use the same ISS (Initial Sequence Number) with the receiver as the one used by the sender. A good TCP-splitter should (does). Also, with a good TCP-splitter, x does not reach 2^31 - 1 simply because of a slow receiver or network and a fast sender. x is generally a function of the bandwidth*RTT value and the number of TCP connections using the bottleneck, and a good TCP-splitter flow-controls the sender to enforce this bound on x.
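Anil's argument is modular arithmetic on the 32-bit sequence space. Below is a minimal sketch of the window test exactly as stated in this message, assuming that after the splitter disappears the receiver's rcv.nxt trails the sender's snd.nxt by the x undelivered bytes (mod 2^32) and that y is the sender's cwnd. It models only that range check, not a full RFC 793 segment-acceptance test, and the function names are hypothetical.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits and wrap modulo 2^32


def in_window(seq: int, start: int, length: int) -> bool:
    """True if seq lies in the half-open range [start, start + length) mod 2^32."""
    return (seq - start) % SEQ_SPACE < length


def wrap_collision(snd_nxt: int, x: int, y: int) -> bool:
    """Hypothetical model of the splitter-failure scenario.

    Assumption: the splitter held x bytes the receiver never got, so the
    receiver's rcv.nxt trails snd.nxt by x bytes. Returns True if rcv.nxt
    nevertheless lands inside the sender's [snd.nxt, snd.nxt + y) range,
    i.e. the two ends would silently resynchronise across the wrap.
    """
    rcv_nxt = (snd_nxt - x) % SEQ_SPACE
    return in_window(rcv_nxt, snd_nxt, y)


# Within Anil's bounds (x < 2^31 from the splitter, y < 2^30 per RFC 1323)
print(wrap_collision(snd_nxt=12345, x=2**31 - 1, y=2**30 - 1))  # False
# An unbounded splitter buffer close to 2^32 bytes does collide
print(wrap_collision(snd_nxt=12345, x=2**32 - 10, y=100))       # True
```

Because rcv.nxt sits 2^32 - x bytes "ahead" of snd.nxt in modular terms, keeping x below 2^31 keeps that offset above 2^31, which no window smaller than 2^30 can reach.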
As an aside, I would love to see the day when CWND < 2^30 becomes too limiting and we need to raise the limit :) Anil ----------- Anil Agarwal ViaSat Inc. From Anil.Agarwal at viasat.com Sat Jan 13 17:45:38 2007 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Sat, 13 Jan 2007 20:45:38 -0500 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3595@VGAEXCH01.hq.corp.viasat.com> Joe Touch wrote: > PS - this could also happen within a single CWND, e.g., if the network > path temporarily shifts around the TCP-splitter. It doesn't require an > entire window wrap to occur. Joe - I am not able to think of a scenario similar to what you describe above where a network with TCP-splitters causes undetected loss of data or delivery of incorrect data. I will appreciate if you can describe what you are thinking in some more detail. Thanks, Anil ---------- Anil Agarwal ViaSat Inc. From detlef.bosau at web.de Sun Jan 14 04:00:29 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 14 Jan 2007 13:00:29 +0100 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E39.4000308@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3593@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45AA1B5D.2040805@web.de> Agarwal, Anil wrote: > > > I think, this scenario will occur if > the TCP-splitters buffer x bytes of undelivered data, > the sender cwnd is y, > x < 2^32 and x+y >= 2^32, > the splitters go down and > packets flow between the sender and receiver over an alternate path. If a splitter goes down, all data which has been acknowledged by the splitter but not yet delivered to the receiver is unrecoverably lost. In addition, sequence numbers are flow-specific, so when a splitter goes down and the flow takes an alternate path, the ack numbers received by the sender are completely undefined as they stem from a different TCP flow. So, Joe is right here when he says that end-to-end semantics on the connection level are destroyed and we cannot recover from a failure of the splitter. However, I don't know whether there are hot-standby architectures available, or at least possible in some cases, where a backup can replace a failed split box. But such an architecture would at least require a one-to-one copy of any flow-specific state data to be available at the split box and at each of its backup systems as well. > > Detlef - this is not as easy as it might first appear, especially > since data can get buffered at the sending or > receiving TCP-splitter in a two-splitter case, > but it can be (is) done. Are there papers available on this? > > Hence, x + y < 2^32 and the above scenario will not occur. > > The above also requires that TCP-splitters use > the same ISS (Initial Sequence Number) with the receiver > as the one used by the sender. > A good TCP-splitter should (does). This is one issue I referred to above.
Particularly on this one: I admittedly have no idea how sequence numbers are frozen. It would be sufficient to freeze sequence numbers with respect to a certain address quadruple - however this is somewhat arduous. So I can imagine (perhaps someone can tell me) that a TCP sender simply freezes every used sequence number for some period of time and does not consider the address quadruple. In that case, I think exact spoofing of sequence numbers can be difficult? Detlef From Jon.Crowcroft at cl.cam.ac.uk Sun Jan 14 06:20:00 2007 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sun, 14 Jan 2007 14:20:00 +0000 Subject: [e2e] any source unicast Message-ID: On Any Source Unicast... So i just caved in and bought a digital flatscreen TV - it is quite nice in terms of multimedia, BUT i now see why the pressure is on to drive the world to digital television - nothing to do with display technology (anyhow, purists will tell you that CRTs are still, just about, the crispest pound for pound). What Digital TV does is lock you in to a (hammer style cheap) telecom circuit nightmare reality of the 1970s. With an analogue transmission&receiver, you basically live in an any-to-any world, and all the devices switch everything through ether-like (like multicast too:). in the digital world, you can plug in what is, au fond, an analogue signal into the first hop, but once it's tuned and decoded that flow, it won't deliver you any other flow off the "air".
This means that to watch channel X while recording channel Y (a perfectly legal thing to do in the UK), you have to have 2 digital receivers (once the legacy analogue channels go away) - and worse, in our house there are 6 people, 3 or 4 TVs, 3 or 4 VCRs, 4 or 5 DVD capable devices (if i include games consoles) - interconnecting all of these locks up virtual circuit resources, and ties you down to a really limited set of scenarios without some massive switch - they didn't even learn the ATM lesson of having virtual paths to put virtual channels in. But worse is to come: Once you connect up some HD devices, you find HDMI supports a thing (not completely standardised yet) called HDCP (High-bandwidth Digital Content Protection) - this means that you cannot put in anything in the path (e.g. a T-connector) to copy something you are viewing; _everything_ has to be point-to-point authorised ... and coz it aint all standard yet, some things don't interwork - [interestingly, amusingly, some display devices let you disable HDCP - a bit like the DVD players that turn off region control:) - basically, in a free market, someone is gonna work around this godawful stupidity, then everyone will eventually follow.] Why is this relevant end-to-end? well this ought to be obvious, but if not, let me spell it out ... A lot of Next Gen Internet ideas are being cast about right now. Some of them have the flavour of circuits. This is a foul and bitter flavour, and should be resisted at all costs, since it reduces the net value of an interconnect, increases its costs, and massively reduces its flexibility (and indeed, reduces everything to the lowest common denominator technology, and locks you in there til kingdom come).
In wireless network research, a lot of folks seem (thankfully) to be going the _opposite_ direction, with a more open, many-to-many, multiradio, mesh/community/adhoc/dtn thousand flowers (flow pun intended:) flourishing - indeed network coding, (just for 1 example) means you _have_ to allow in-net copy!). but the whole "triple play" by telecos to pull TV, Telephone and Internet into one box, seems to be more and more predicated on a fundamental misdirection of the world. this is not about QoS - this is about lockin. happy new year jon p.s. for the less mad: see -> -------------------------------------------------------------------- The Second International Workshop on Mobility in the Evolving Internet Architecture (MobiArch 2007) Kyoto, Japan, August 27, 2007 (to be held with ACM SIGCOMM 2007, August 27-31, 2007 http://user.informatik.uni-goettingen.de/~mobiarch/2007 -------------------------------------------------------------------- With the recent development of technologies in wireless access and mobile devices, user, terminal, and network mobility has become an indispensable component of today's Internet vision, and it is likely to continue in the near future, while affecting the whole architectural design of the future Internet. Yet, issues like efficient mobility management and optimization, locator-identifier split, multihoming, security, and related operational/deployment concerns are still in their early stages of development. Moreover, the Internet architecture, its end-to-end principles, and business models will require rethinking due to the massive penetration of mobility into the Internet. 
MobiArch'07 welcomes submissions, from both researchers and practitioners, in exploration of recent advances in architectures, protocols, and experiences with emerging technologies on wireless and mobility over the Internet, with an emphasis on wireless infrastructures and mobility patterns for mobility support, new mobility protocols, service discovery, routing and location management, mobile network performance evaluation and modeling, multi-homing, security, architectural impacts and deployment considerations. Topics of Interest: ================== Topics of MobiArch'07 cover all aspects of architectural issues and system support for wireless and mobility in the Internet, including but not limited to: - Impacts of new wireless technologies/services and mobility patterns on the Internet architecture - Architectures and protocols for mobility support in the Internet, ranging from approaches in link, network, transport to session/application layers and cross-layer design - Location management, positioning and data management systems for wireless and mobility - Routing and addressing, including locator/identifier split issues and their impacts to the Internet architecture - IP multihoming including flow distribution and load sharing for wireless and mobility - Performance evaluation, experimentation and modeling of mobility in the Internet - Accounting, access control, security and privacy issues and impacts to Internet architecture - Economic, scalability and deployment issues of mobility infrastructure design - Mechanisms and issues with connecting developing regions into the Internet Following the success of MobiArch'06, the MobiArch'07 workshop will be a single-track one-day workshop. Early stages, position papers, systems and measurement papers will be particularly welcome. The proceedings will be published by the ACM and ACM digital library.
Submissions: =========== Submissions must be made to MobiArch'07 EDAS entry: http://edas.info/5238, following the guidelines in MobiArch'07 webpage: http://user.informatik.uni-goettingen.de/~mobiarch/2007 Important Dates: =============== Paper registration: March 20, 2007 Submission Deadline: March 27, 2007 Acceptance Notification: May 15, 2007 Camera-ready version due: June 12, 2007 Workshop: August 27, 2007 SIGCOMM Main Conference: August 27-31, 2007 PROGRAM CO-CHAIRS ================= Xiaoming Fu, University of Goettingen (Germany) Katherine Guo, Bell Labs (USA) Sue Moon, KAIST (Korea) Ryuji Wakikawa, Keio University (Japan) PUBLICITY CHAIR =============== Jon Crowcroft, U. Cambridge (UK) Please consult the Program Co-Chairs (mobiarch at informatik.uni-goettingen.de) if you are uncertain whether your paper falls within the scope of the workshop. From randy at psg.com Sun Jan 14 10:33:31 2007 From: randy at psg.com (Randy Bush) Date: Sun, 14 Jan 2007 08:33:31 -1000 Subject: [e2e] any source unicast References: Message-ID: <17834.30587.611439.898650@roam.psg.com> > [interestingly, amusingly, some display devices let you disable > HDCP - a bit like the DVD players that turn off region control:) - > basically, in a free market, someone is gonna work around this > godawful stupidity, then everyone will eventually follow.] the lawyers will follow. and the us congress will follow the industry lobbyists. and the other govts will follow the us congress. > Why is this relevant end-to-end? well this ought to be obvious, > but if not, let me spell it out ... bingo! > but the whole "triple play" by telecos to pull TV, Telephone and > Internet into one box, seems to be more and more predicated on a > fundamental misdirection of the world. > > this is not about QoS - this is about lockin. they do not see this as misdirection, quite the opposite. they see it as cleaning up the layer 8/9 disaster created by the free-for-all open connectivity of the pre-circuit internet.
randy From Jon.Crowcroft at cl.cam.ac.uk Sun Jan 14 23:44:35 2007 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Mon, 15 Jan 2007 07:44:35 +0000 Subject: [e2e] any source unicast In-Reply-To: Message from Randy Bush of "Sun, 14 Jan 2007 08:33:31 -1000." <17834.30587.611439.898650@roam.psg.com> Message-ID: In missive <17834.30587.611439.898650 at roam.psg.com>, Randy Bush typed: >>> this is not about QoS - this is about lockin. >>they do not see this as misdirection, quite the opposite. they >>see it as cleaning up the layer 8/9 disaster created by the >>free-for-all open connectivity of the pre-circuit internet. yes, so that's what you get when you let lawyers re-design your net: 1/ a system that is exponentially less efficient than the copy net we now have. 2/ a system of intellectual property protection that maximises revenue for a small (oligopoly/cartel) number of players and actually reduces the overall profitability of the business of content 3/ a system that invents concepts of ownership that were made up recently and didn't exist for millennia when copying was actually valuable 4/ a system that didn't take into account that: i) the internet is efficient ii) lack of copy protection in the net is a no-op - just as with any aspect of security, copy protection, if and only if you want it, is an end-to-end matter - preventing copying in a copy technology is an oxymoron iii) there is actually virtually no evidence that the fact of a near zero-cost copy system is actually harming content profits - most of the papers that look at scientific evidence on music, film, games and other content profits and internet or other based piracy (and I've read a lot) are, at most, equivocal, and many show the obvious, that free copies are free advertising, and boost legit sales, provided the low copy cost is _passed on to the consumer_ - iv) the vandalism of this circuit technology is that it prevents the passing of this efficiency on to the commercial consumer and
prevents the non-commercial consumer from free copies of things which were created for public good. the CTO of Time Warner gave a talk here a couple of summers back where he was challenged about copy technologies (audio cassette, vhs, etc) - he said that people in the film business were NEVER against lower copy cost as they were in the content business, not in the pressing plastic and postal business - he looked forward to using video bittorrent so that film directors and publishers could take Blockbuster AND Netflix out of the equation. This technology flies in the face of that position, and I bet people like him will be very very annoyed when it sinks in where the digital tv lawyer-led lunacy is leading. cheers jon p.s. end-to-end copy protection rather than hop-by-hop - think about it: you know it makes cents. From detlef.bosau at web.de Mon Jan 15 07:25:30 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 15 Jan 2007 16:25:30 +0100 Subject: [e2e] How often does congestion control react upon loss? Message-ID: <45AB9CEA.8020109@web.de> I apologize if this is a beginner's question. During simulations with large buffers I often have several "drop bursts" when a buffer overruns because the RTT is quite large (due to the large buffers) and therefore it takes some time for a receiver to detect a drop. Now I wonder (and I'm currently reading RFC 3517 to get this question answered but perhaps someone can help me here to understand this) whether it is possible to do several congestion actions in a round. To my understanding it is by all means possible (and makes sense) to get several sequence numbers in the scoreboard marked "lost" within a round. Of course, when a sequence number is marked lost, the sender has to retransmit the appropriate segment. However, is the congestion window halved each time a sequence number is recognized as "lost"? Or is there a limit, e.g. the congestion window is halved only once a round? Thanks.
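In NewReno, as Francesco's reply describes, the window is cut once per loss event: further losses from the same window of data are repaired through partial ACKs during Fast Recovery without another halving. The toy model below illustrates that bookkeeping. It is a sketch under simplifying assumptions (whole segments, cumulative ACKs given as "next expected segment" numbers, no SACK, no timeouts, no window growth outside recovery), loosely following RFC 3782, including its rule of adding back one segment after deflating on a partial ACK; the class and method names are hypothetical.

```python
class NewRenoRecovery:
    """Hypothetical toy model of NewReno Fast Retransmit / Fast Recovery.

    Everything is counted in whole segments; ACK values are cumulative
    "next expected segment" numbers. No SACK, no retransmission timeout,
    and no congestion avoidance outside recovery.
    """

    def __init__(self, cwnd, highest_sent):
        self.cwnd = cwnd
        self.ssthresh = None
        self.highest_sent = highest_sent  # highest segment number sent so far
        self.snd_una = 0                  # oldest unacknowledged segment
        self.in_recovery = False
        self.recover = None
        self.reductions = 0               # how often the window was halved
        self.retransmitted = []

    def on_three_dupacks(self):
        if self.in_recovery:
            return  # further loss signal in the same window: no second halving
        self.ssthresh = max(self.cwnd // 2, 2)
        self.recover = self.highest_sent          # end of window at loss time
        self.retransmitted.append(self.snd_una)   # Fast Retransmit
        self.cwnd = self.ssthresh + 3             # inflate for the 3 dupacks
        self.in_recovery = True
        self.reductions += 1

    def on_dupack(self):
        if self.in_recovery:
            self.cwnd += 1  # one more segment has left the network

    def on_new_ack(self, ack):
        newly_acked = ack - self.snd_una
        self.snd_una = ack
        if not self.in_recovery:
            return
        if ack > self.recover:
            # Full ACK: all data outstanding at loss time is acknowledged
            self.cwnd = self.ssthresh
            self.in_recovery = False
        else:
            # Partial ACK: another segment of the same window was lost
            self.retransmitted.append(ack)
            self.cwnd = self.cwnd - newly_acked + 1  # deflate, add back one


s = NewRenoRecovery(cwnd=20, highest_sent=19)  # segments 0..19 in flight
s.on_new_ack(3)          # 0..2 arrive; segments 3 and 7 are lost
s.on_three_dupacks()     # halve once, retransmit segment 3
for _ in range(12):
    s.on_dupack()        # segments 4-6 and 8-19 keep arriving
s.on_new_ack(7)          # partial ACK: retransmit 7, no new halving
s.on_three_dupacks()     # ignored while still in recovery
s.on_new_ack(20)         # full ACK: leave Fast Recovery
print(s.retransmitted, s.reductions, s.cwnd)  # [3, 7] 1 10
```

Two segments from the same window are retransmitted, yet the window is halved only once and ends at ssthresh.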
Detlef From francesco at net.infocom.uniroma1.it Tue Jan 16 06:50:58 2007 From: francesco at net.infocom.uniroma1.it (Francesco Vacirca) Date: Tue, 16 Jan 2007 15:50:58 +0100 Subject: [e2e] How often does congestion control react upon loss? In-Reply-To: <45AB9CEA.8020109@web.de> References: <45AB9CEA.8020109@web.de> Message-ID: <45ACE652.5090903@net.infocom.uniroma1.it> Detlef, When NewReno TCP and SACK TCP are in the Fast Recovery phase (after the Fast Retransmit), the congestion window is not halved again if a further packet loss is detected. The procedure used by NewReno is as follows: After the Fast Retransmit retransmission, the TCP sender enters the Fast Recovery phase. This phase lasts until the ACK for the highest packet transmitted at the beginning of this phase (stored in the "recover" variable) is received. During this phase, for each additional duplicate ACK, the congestion window is incremented by one packet, to reflect the departure from the network of an additional segment. A new segment is transmitted if it is allowed by the congestion window and the receiver's advertised window. When an ACK is received acknowledging new packets, but with a sequence number lower than "recover" (partial ACK), the first unacknowledged packet is retransmitted and the congestion window is deflated by the amount of data acknowledged by the partial ACK. The window deflation attempts to keep the congestion window at the level of the number of outstanding packets when the "Fast Recovery" phase ends. When an ACK acknowledging a packet greater than or equal to "recover" is received, the congestion window is set to the value of ssthresh and TCP exits from the "Fast Recovery" phase. Francesco Detlef Bosau wrote: > I apologize if this is a beginner's question. > > During simulations with large buffers I often have several "drop bursts" > when a buffer overruns because the RTT is quite large (due to the large > buffers) and therefore it takes some time for a receiver to detect a drop.
> > Now I wonder (and I'm currently reading RFC 3517 to get this question > answered but perhaps someone can help me here to understand this) > whether it is possible to do several congestion actions in a round. To > my understanding it is by all means possible (and makes sense) to get > several sequence numbers in the scoreboard marked "lost" within a round. > Of course, when a sequence number is marked lost, the sender has to > retransmit the appropriate segment. > > However, is the congestion window halved each time a sequence number is > recognized as "lost"? Or is there a limit, e.g. the congestion window is > halved only once a round? > Thanks. > > Detlef > > From touch at ISI.EDU Tue Jan 16 09:05:40 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 16 Jan 2007 09:05:40 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45A82323.30405@web.de> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> Message-ID: <45AD05E4.5040200@isi.edu> Detlef Bosau wrote: > Joe Touch wrote: >> PS - this could also happen within a single CWND, e.g., if the network >> path temporarily shifts around the TCP-splitter. It doesn't require an >> entire window wrap to occur. >> >> Joe >> >> > > Two remarks. > > First. > > The only scenarios where I see a justification / necessity for doing > splitting or spoofing are scenarios where the TCP flow must pass the > split box / spoofing box / PEP anyway. These are scenarios without path > redundancy or path transparency. Why are you so confident about the path, when you cannot control whether there is a PEP/spoofing box in it? ... > To avoid being misunderstood: I don't want to make restrictions for the > benefit of a splitter. I think in scenarios where an alternative path to > a splitter exists, a splitter must not be used.
Either the use of splitters is under your control or it is not. If it is, then there are a number of reasons to remove them, alternate paths are just one. If it is not, then you cannot make assumptions about the path. > In my opinion splitters > are to be used with maximum care and only in exceptional cases where any > known alternative is worse than a splitter. It would be interesting if you could explain a sample case. IMO, splitters just lie - they lie about being an endpoint they are not. Either you are lying to yourself (you own the endpoint you're lying to) or you're lying to others. The first is silly - just install a true application proxy - and the second is YOU making a decision for ME about what's more important. If I don't want to talk to a true proxy, you have no business tricking me into thinking I'm not. > Second. > > To my understanding we can avoid wrap around problems by having the > receiver window sufficiently small.... As I said, there are other cases where the splitter comes/goes, either because it is unreliable or due to multipath, that can cause silent data errors too. You can't know whether that will happen; all you DO know is that you'll mess up the data to the receiver. If you own the receiver, that's your decision. If not, then you're silently breaking TCP semantics. That's not worth any alternative. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From touch at ISI.EDU Tue Jan 16 09:47:19 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 16 Jan 2007 09:47:19 -0800 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3595@VGAEXCH01.hq.corp.viasat.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3595@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45AD0FA7.9060106@isi.edu> Agarwal, Anil wrote: > > Joe Touch wrote: >> PS - this could also happen within a single CWND, e.g., if the network >> path temporarily shifts around the TCP-splitter. It doesn't require an >> entire window wrap to occur. > > Joe - I am not able to think of a scenario similar to what you > describe above where a network with TCP-splitters causes > undetected loss of data or delivery of incorrect data. > > I will appreciate if you can describe what you are thinking in > some more detail. The scenario I was thinking of is when the splitter ACKs the data, and the source moves the window forward. In the meantime, the splitter has not yet sent the data to the receiver, and goes down. I'm not sure what would happen when the receiver has a hole in its window and the sender lacks the data to resend. This may cause a lockup, though it wouldn't cause silent loss/corruption. There's also the way in which different MSS's could cause similar hiccups in congestion control. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From detlef.bosau at web.de Wed Jan 17 11:21:28 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 17 Jan 2007 20:21:28 +0100 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45AD05E4.5040200@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> Message-ID: <45AE7738.6070701@web.de> Joe Touch wrote: >> >> First. >> >> The only scenarios where I see a justification / necessity for doing >> splitting or spoofing are scenarios where the TCP flow must pass the >> split box / spoofing box / PEP anyway. These are scenarios without path >> redundancy or path transparency. >> > > Why are you so confident about the path, when you cannot control whether > there is a PEP/spoofing box in it? > > Honestly, I don't understand the question. I wrote: "The only scenarios where I see a justification / necessity for doing splitting or spoofing are scenarios where the TCP flow must pass the split box / spoofing box / PEP anyway." In other words: I restrict the use of split boxes to scenarios where there is no other path. Either the flow passes the box - or the flow passes away. This _is_ a strong restriction. I don't want to advocate split boxes etc., which are hard state by nature, as an optimal solution for any problem. I'm totally with you that nearly any alternative to a split box is better than a split box. I only want to concede that there may be situations where the use of a splitter should be considered. Practically speaking: If the word "splitter" appears in the abstract of a paper submission, please don't reject it immediately. Please read at least the introduction ;-) > ... > >> To avoid being misunderstood: I don't want to make restrictions for the >> benefit of a splitter. I think in scenarios where an alternative path to >> a splitter exists, a splitter must not be used. >> > > Either the use of splitters is under your control or it is not. > > From my assumptions / restrictions it clearly _is_.
And if you feel more comfortable that way, we can perfectly well integrate some kind of option or switch in a mobile network's UNI where the user has the choice whether a splitter shall be allowed or shall be forbidden. So the use of a splitter must not be transparent but explicitly granted / requested by a user. We have similar options for transcoders / WWW proxies in mobile networks here in Germany. IIRC, E-plus offers optional transcoders / application level PEP. > If it is, then there are a number of reasons to remove them, alternate > paths are just one. > > If it is not, then you cannot make assumptions about the path. > > Hm. Admittedly, I think we're talking somewhat at cross-purposes here. I perfectly understand why you are strongly opposed to splitters and the reasons are compelling. However, when in a particular situation a splitter is the only yet known possibility e.g. to achieve acceptable throughput for a flow within a settling time of 10 seconds instead of 10 minutes or more, then we should consider giving the user the option to allow splitting. >> In my opinion splitters >> are to be used with maximum care and only in exceptional cases where any >> known alternative is worse than a splitter. >> > > It would be interesting if you could explain a sample case. IIRC, Mark Allman has published some interesting work where he used splitters for satellite / deep space networks. To my understanding the major concern was the extremely large time TCP needs to fill the line here. I have not dealt much with TCP over extremely large line capacities yet; however, I am doing so now. It's just your opposition to splitting which made me reconsider my paper on Path Tail Emulation and redesign it so that it relies only on pacing / spacing and does not assume / use splitting or spoofing. I'm not sure whether anyone is interested in the results. If so, I would be glad to discuss this. > IMO, > splitters just lie - they lie about being an endpoint they are not.
> When I was a child, my mother occasionally sang a song (I don't know where she got it from or whether anybody knows it): "It's a sin to tell a lie". And I don't know (I never saw the text in a written form) whether this is a statement or a question. (According to the WWW, it's a statement.) > Either you are lying to yourself (you own the endpoint you're lying to) > or you're lying to others. The first is silly - just install a true > application proxy - and the second is YOU making a decision for ME about > what's more important. If I don't want to talk to a true proxy, you have > no business tricking me into thinking I'm not. > > As I said: We can agree that splitters shall not be used transparently / without permission by the user. Detlef From touch at ISI.EDU Wed Jan 17 11:29:04 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 17 Jan 2007 11:29:04 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45AE7738.6070701@web.de> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> Message-ID: <45AE7900.70400@isi.edu> Detlef Bosau wrote: > Joe Touch wrote: >>> >>> First. >>> >>> The only scenarios where I see a justification / necessity for doing >>> splitting or spoofing are scenarios where the TCP flow must pass the >>> split box / spoofing box / PEP anyway. These are scenarios without path >>> redundancy or path transparency. >> >> Why are you so confident about the path, when you cannot control whether >> there is a PEP/spoofing box in it? >> > > Honestly, I don't understand the question. > > I wrote: "The only scenarios where I see a justification / necessity for > doing > > splitting or spoofing are scenarios where the TCP flow must pass the > split box / spoofing box / PEP anyway.
" > > In other words: I restrict the use of split boxes to scenarios where > there is no other path. Either the flow passes the box - or the flow > passes away. I do not agree that you have control over this restriction. ... > Practically speaking: If the word "splitter" appears in the abstract of a > paper submission, please don't reject it immediately. Please read at > least the introduction ;-) A key aspect of such a review is whether the assumptions are realistic. I do not consider "control over path", as above, a realistic assumption for splitters you do not control. >>> To avoid being misunderstood: I don't want to make restrictions for the >>> benefit of a splitter. I think in scenarios where an alternative path to >>> a splitter exists, a splitter must not be used. >>> >> >> Either the use of splitters is under your control or it is not. >> >> > > From my assumptions / restrictions it clearly _is_. And if you feel more > comfortable that way, we can perfectly well integrate some kind of option or > switch in a mobile network's UNI where the user has the choice whether a > splitter shall be allowed or shall be forbidden. I don't believe this is useful. People who deploy splitters that are intended to be found, simply, do not - they deploy proxies. The whole point of a splitter is to be transparent - either for backward compatibility with devices that aren't capable of working with a proxy, or to deliberately hide their presence. > So the use of a > splitter must not be transparent but explicitly granted / requested by > a user. We have similar options for transcoders / WWW proxies in mobile > networks here in Germany. IIRC, E-plus offers optional transcoders / > application level PEP. As noted above, I don't believe this is a viable case for TCP splitters. ... > I perfectly understand why you are strongly opposed to splitters > and the reasons are compelling. However, when in a particular situation > a splitter is the only yet known possibility e.g.
to achieve acceptable > throughput for a flow within a settling time of 10 seconds instead of 10 > minutes or more, then we should consider giving the user the option to > allow splitting. It would be useful to show such a case. I do not believe there is a case where a splitter works where a proxy would not - or would not be more appropriate. >>> In my opinion splitters >>> are to be used with maximum care and only in exceptional cases where any >>> known alternative is worse than a splitter. >> >> It would be interesting if you could explain a sample case. > > IIRC, Mark Allman has published some interesting work where he used > splitters for satellite / deep space networks. > To my understanding the major concern was the extremely long time TCP > needs to fill the line here. In that case the splitter is either isomorphic to a proxy, or it is spoofing the sender into violating current TCP congestion profiles. It'd be useful for Mark to comment on this to clarify. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From detlef.bosau at web.de Wed Jan 17 12:50:12 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 17 Jan 2007 21:50:12 +0100 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45AE7900.70400@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> Message-ID: <45AE8C04.1000900@web.de> Joe Touch wrote: >> >> In other words: I restrict the use of split boxes to scenarios where >> there is no other path. Either the flow passes the box - or the flow >> passes away. >> > > I do not agree that you have control over this restriction. > When a network operator places a splitting box into a base station for a mobile or an earth station for a satellite, why shouldn't he have control over that? At least the network operator who does the technical design and implementation should have control over that. > ... > >> Practically spoken: If the word "splitter" appears in the abstract of a >> paper submission, please don't reject it immediately. Please read at >> least the introduction ;-) >> > > A key aspect of such a review is whether the assumptions are realistic. > I do not consider "control over path", as above, a realistic assumption > for splitters you do not control. > > Absolutely. However, if we consider a mobile in a mobile wireless network, the network's infrastructure is completely under the control of the network operator. Detlef From touch at ISI.EDU Wed Jan 17 12:51:55 2007 From: touch at ISI.EDU (Joe Touch) Date: Wed, 17 Jan 2007 12:51:55 -0800 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45AE8C04.1000900@web.de> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <45AE8C04.1000900@web.de> Message-ID: <45AE8C6B.7040200@isi.edu> Detlef Bosau wrote: > Joe Touch wrote: >>> >>> In other words: I restrict the use of split boxes to scenarios where >>> there is no other path. Either the flow passes the box - or the flow >>> passes away. >>> >> >> I do not agree that you have control over this restriction. > > When a network operator places a splitting box into a base station for a > mobile or an earth station for a satellite, why shouldn't he have > control over that? At least the network operator who does the technical > design and implementation should have control over that. Sure; if you're the exclusive path to the rest of the net, that's true. But you still haven't explained how such a splitter would help better than a non-spoofing PEP or a proxy, or why you need a splitter instead of those alternatives. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From david.borman at windriver.com Thu Jan 18 08:46:58 2007 From: david.borman at windriver.com (David Borman) Date: Thu, 18 Jan 2007 10:46:58 -0600 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45AE7900.70400@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> Message-ID: <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> There are real-world scenarios where the insertion of a splitter into a TCP path does make a lot of sense. The cases I am familiar with all are necessitated by a severe mismatch in MTU, buffering and performance, the splitter is in the only path by which the packets can travel, and it is sitting at the crossover between the two disparate paths. In the specific case that I dealt with, the splitter's main purpose was to change the TCP MSS option, send larger window sizes, and buffer/repackage data. Getting the splitter to operate well takes some work. It has to maintain state for the connections in both directions. Besides acking and buffering data in both directions, and possibly repackaging data between the two sides, it also has to make sure that it synchronizes control events between the two halves so that neither endpoint gets into a state of believing that the connection has completed successfully when it hasn't. And there will still be failure modes that you wouldn't get with a straight TCP connection, but most of them are when the connection doesn't complete successfully. But in general, deploying a splitter where there is a possibility that packets can take an alternate route around the splitter, or where you do not have some degree of control over one side of the network seems like a bad idea to me. A splitter should not be a general purpose device, it should be tied to the unique bandwidth*delay mismatch of the problem that is being addressed. 
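One of the jobs David Borman lists for his splitter is changing the TCP MSS option. The Python sketch below illustrates only that piece: walking the option bytes of a SYN and replacing the 16-bit MSS value (option kind 2, length 4). It is a minimal illustration under stated assumptions, not his implementation; the function name `rewrite_mss` and the example values are hypothetical, and a real splitter would also have to update the TCP checksum and keep the per-connection state he describes.

```python
import struct

MSS_KIND = 2     # TCP option kind for Maximum Segment Size
MSS_LEN = 4      # kind byte + length byte + 16-bit MSS value
EOL, NOP = 0, 1  # end-of-option-list and no-operation kinds (single byte, no length)


def rewrite_mss(options: bytes, new_mss: int) -> bytes:
    """Return a copy of a TCP options field with any MSS value replaced.

    Hypothetical helper: the TCP checksum is NOT updated here.
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == EOL:
            break
        if kind == NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            raise ValueError("truncated option")
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            raise ValueError("malformed option length")
        if kind == MSS_KIND and length == MSS_LEN:
            # MSS is carried in network byte order right after kind and length.
            struct.pack_into("!H", out, i + 2, new_mss)
        i += length
    return bytes(out)


syn_options = bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2])  # MSS 1460, NOP, NOP, SACK-permitted
rewritten = rewrite_mss(syn_options, 65000)          # 65000 is an illustrative value only
print(struct.unpack("!H", rewritten[2:4])[0])        # 65000
```

All other options pass through byte-for-byte; only the two MSS value bytes change, which is what lets each half of the split connection use the segment size that suits its own link.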
A TCP splitter that is *not* a NAT box operates at the TCP layer, and should not require any changes to the content of the TCP data stream, whereas an application level proxy often requires that the proxy has knowledge of the particular application, and may have to modify the data stream. -David Borman From touch at ISI.EDU Thu Jan 18 10:00:01 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 18 Jan 2007 10:00:01 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> Message-ID: <45AFB5A1.9030407@isi.edu> David Borman wrote: > There are real-world scenarios where the insertion of a splitter into a > TCP path does make a lot of sense. The cases I am familiar with all are > necessitated by a severe mismatch in MTU, buffering and performance, Taking each individually: Mismatched MTU - sounds like a PMTU issue, otherwise you're hyper-optimizing IP overheads in ways that the Internet protocols are not designed to support. If you have a broken PMTU situation, using a splitter to 'patch' the situation is fixing one broken system with another, IMO. Buffering problems could as easily be solved by non-splitter PEPs that buffer and retransmit, acting like a two-port router. The same is true for many performance problems. I don't agree that either makes sense, although I appreciate the desire for the first case where there are no alternatives, but only as a patch. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From david.borman at windriver.com Thu Jan 18 13:20:48 2007 From: david.borman at windriver.com (David Borman) Date: Thu, 18 Jan 2007 15:20:48 -0600 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45AFB5A1.9030407@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> Message-ID: Hi Joe, A little more detail, see below. On Jan 18, 2007, at 12:00 PM, Joe Touch wrote: > > > David Borman wrote: >> There are real-world scenarios where the insertion of a splitter >> into a >> TCP path does make a lot of sense. The cases I am familiar with >> all are >> necessitated by a severe mismatch in MTU, buffering and performance, > > Taking each individually: > > Mismatched MTU - sounds like a PMTU issue, otherwise you're > hyper-optimizing IP overheads in ways that the Internet protocols are > not designed to support. If you have a broken PMTU situation, using a > splitter to 'patch' the situation is fixing one broken system with > another, IMO. It's not a PMTU issue, PMTU finds the smallest MTU along the path. I'm talking about a large MTU mismatch, such as a standard ethernet on one side with 1500 byte packets, and an interface with a 64K MTU on the other side (HIPPI, FibreChannel, etc). The goal is to be able to use the large packets between the splitter and the host on the 64K MTU network, and ethernet sized packets out to the other endpoint.
With PMTU and without the intervention of the splitter, packets will be limited to 1500 bytes along the whole path. > Buffering problems could as easily be solved by non-splitter PEPs that > buffer and retransmit, acting like a two-port router. The same is true > for many performance problems. In this scenario, the 1500 byte host may be only offering a window of, say 16K. The splitter offers a window to the 64K host of something like 512K. This allows the 64K MTU host to send multiple 64K sized packets, which the splitter then sends out as ethernet size packets to the remote host. In other words, for a 16K vs. 512K scenario, for each window of data transferred between the 64K host and the splitter, there are 32 windows of data transferred out to the remote hosts. Conversely, as 1500 byte packets arrive from the remote host, they are acked and accumulated into larger packets that are then transferred over the 64K MTU network in larger packets. > I don't agree that either makes sense, although I appreciate the > desire > for the first case where there are no alternatives, but only as a > patch. Again, I don't think a splitter is a good general solution, but there are specific cases where it can do what needs to be done within the constraints of the system. -David Borman > > Joe > > -- > ---------------------------------------- > Joe Touch > Sr. Network Engineer, USAF TSAT Space Segment > From touch at ISI.EDU Thu Jan 18 14:09:08 2007 From: touch at ISI.EDU (Joe Touch) Date: Thu, 18 Jan 2007 14:09:08 -0800 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> Message-ID: <45AFF004.3060709@isi.edu> David Borman wrote: > Hi Joe, > > A little more detail, see below. > > On Jan 18, 2007, at 12:00 PM, Joe Touch wrote: > >> >> >> David Borman wrote: >>> There are real-world scenarios where the insertion of a splitter into a >>> TCP path does make a lot of sense. The cases I am familiar with all are >>> necessitated by a severe mismatch in MTU, buffering and performance, >> >> Taking each individually: >> >> Mismatched MTU - sounds like a PMTU issue, otherwise you're >> hyper-optimizing IP overheads in ways that the Internet protocols are >> not designed to support. If you have a broken PMTU situation, using a >> splitter to 'patch' the situation is fixing one broken system with >> another, IMO. > > It's not a PMTU issue, PMTU finds the smallest MTU along the path. I'm > talking about a large MTU mismatch, such as a standard ethernet on one > side with 1500 byte packets, and an interface with a 64K MTU on the > other side (HIPPI, FibreChannel, etc). The goal is to be able to use > the large packets between the splitter and the host on the 64K MTU > network, and ethernet sized packets out to the other endpoint. With PMTU > and without the intervention of the splitter, packets will be limited to > 1500 bytes along the whole path. That's in the margins of 'hyperoptimization' I noted above, IMO. I'm not clear what the utility of having the larger MTU is there, vs., e.g., frame bursting, except that it offloads data coalescing to an outboard processor. If that's the goal, then this amounts to an outboard 'network coprocessor'.
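The repackaging Borman describes (64K-sized packets from the large-MTU host cut into Ethernet-sized segments, and small segments from the remote host accumulated into large ones) can be sketched as two byte-stream operations. This Python sketch assumes an in-order byte stream and an Ethernet MSS of 1460 bytes (a 1500-byte MTU minus 20-byte IPv4 and TCP headers without options). The function names are hypothetical, and acknowledgement, retransmission, and the policy for when to flush a partially filled buffer are all left out.

```python
ETHERNET_MSS = 1460  # 1500-byte MTU minus 20-byte IPv4 and 20-byte TCP headers


def resegment(payload: bytes, mss: int = ETHERNET_MSS) -> list[bytes]:
    """Cut one large segment's payload into chunks of at most `mss` bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def coalesce(segments: list[bytes], max_payload: int) -> list[bytes]:
    """Accumulate in-order small payloads into payloads of at most `max_payload` bytes.

    The trailing partial payload is returned as-is; a real splitter would
    have to decide when to send it rather than waiting for more data.
    """
    return resegment(b"".join(segments), max_payload)


big = bytes(range(256)) * 256            # 65536 bytes standing in for one large packet
small = resegment(big)
print(len(small), len(small[-1]))        # 45 1296
print(coalesce(small, 65536) == [big])   # True
```

One 64 KB payload becomes 44 full 1460-byte segments plus a 1296-byte remainder, which is the per-packet work the splitter absorbs on behalf of the large-MTU host.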
>> Buffering problems could as easily be solved by non-splitter PEPs that >> buffer and retransmit, acting like a two-port router. The same is true >> for many performance problems. > > In this scenario, the 1500 byte host may be only offering a window of, > say 16K. The splitter offers a window to the 64K host of something like > 512K. This allows the 64K MTU host to send multiple 64K sized packets, > which the splitter then sends out as ethernet size packets to the remote > host. In other words, for a 16K vs. 512K scenario, for each window of > data transferred between the 64K host and the splitter, there are 32 > windows of data transferred out to the remote hosts. > > Conversely, as 1500 byte packets arrive from the remote host, they are > acked and accumulated into larger packets that are then transferred over > the 64K MTU network in larger packets. > > >> I don't agree that either makes sense, although I appreciate the desire >> for the first case where there are no alternatives, but only as a patch. > > Again, I don't think a splitter is a good general solution, but there > are specific cases where it can do what needs to be done within the > constraints of the system. The above both look like outboard coprocessors. If that's the goal, then you're really extending the boundary of what the endhost is, and that's reasonable. Most other uses - to silently help someone who doesn't know you're there - are the problem, IMO. Joe ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From perfgeek at mac.com Fri Jan 19 08:52:53 2007 From: perfgeek at mac.com (rick jones) Date: Fri, 19 Jan 2007 08:52:53 -0800 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> Message-ID: <8b38e92efb05d97f0587240a04505367@mac.com> > In this scenario, the 1500 byte host may be only offering a window of, > say 16K. The splitter offers a window to the 64K host of something > like 512K. This allows the 64K MTU host to send multiple 64K sized > packets, which the splitter then sends out as ethernet size packets to > the remote host. In other words, for a 16K vs. 512K scenario, for > each window of data transferred between the 64K host and the splitter, > there are 32 windows of data transferred out to the remote hosts. > > Conversely, as 1500 byte packets arrive from the remote host, they are > acked and accumulated into larger packets that are then transferred > over the 64K MTU network in larger packets. Apart from calling it a splitter, superficially at least that resembles what some 10G NICs can do today, albeit with some explicit knowledge/assistance by the stack. Large send has the stack(host) giving the NIC(splitter) a large "segment" which the NIC(splitter) resegments for the link. Those flow across the ethernet to the other NIC(splitter) which if it has Large Receive Offload enabled will "upsegment" the ethernet-sized traffic and give larger segments to the receiving stack(host). rick jones there is no rest for the wicked, yet the virtuous have no pillows From touch at ISI.EDU Fri Jan 19 09:17:41 2007 From: touch at ISI.EDU (Joe Touch) Date: Fri, 19 Jan 2007 09:17:41 -0800 Subject: [e2e] A simple scenario. 
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <8b38e92efb05d97f0587240a04505367@mac.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> Message-ID: <45B0FD35.7050708@isi.edu> rick jones wrote: >> In this scenario, the 1500 byte host may be only offering a window of, >> say 16K. The splitter offers a window to the 64K host of something >> like 512K. This allows the 64K MTU host to send multiple 64K sized >> packets, which the splitter then sends out as ethernet size packets to >> the remote host. In other words, for a 16K vs. 512K scenario, for >> each window of data transferred between the 64K host and the splitter, >> there are 32 windows of data transferred out to the remote hosts. >> >> Conversely, as 1500 byte packets arrive from the remote host, they are >> acked and accumulated into larger packets that are then transferred >> over the 64K MTU network in larger packets. > > Apart from calling it a splitter, superficially at least that resembles > what some 10G NICs can do today, albeit with some explicit > knowledge/assistance by the stack. Large send has the stack(host) > giving the NIC(splitter) a large "segment" which the NIC(splitter) > resegments for the link. Those flow across the ethernet to the other > NIC(splitter) which if it has Large Receive Offload enabled will > "upsegment" the ethernet-sized traffic and give larger segments to the > receiving stack(host). Right - this looks like a cooperative outboard processor, which makes a lot of sense in some environments when both the outboard processor and host are managed/controlled by the same entity, but still makes very little sense (to me) when that's not the case. 
Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From david.borman at windriver.com Fri Jan 19 11:03:33 2007 From: david.borman at windriver.com (David Borman) Date: Fri, 19 Jan 2007 13:03:33 -0600 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45B0FD35.7050708@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> Message-ID: No, it's more than just Large Send Offload or Large Receive Offload. That's done on a per-packet basis, without needing to keep much, if any state. In the scenario I'm citing the splitter is also changing the window and the MSS option. The remote host offers a (relatively) small window, the splitter offers a much bigger (512K) window to the host on the 64K MTU network (in addition to rewriting the MSS option). With the small delay*bandwith to the remote host, the splitter has no trouble keeping the pipe full using standard ethernet packets. But if those packets went all the way to the 64K host across the large delay*bandwidth 64KMTU network, there'd be a lot of idle time waiting for window updates, and you get much lower throughput from end-to-end.
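Borman's argument is that a small window stretched over the long, fast leg leaves the sender idle, while the splitter lets each leg run with its own window. A rough model, assuming a window-based sender moves at most one window per RTT, is throughput ≈ min(link rate, 8 × window / RTT). The Python sketch compares one end-to-end loop with two split loops. Apart from the 16K and 512K windows taken from the thread, every number in it (RTTs, link rates) is hypothetical, and slow start, loss, and the splitter's own buffer limit are ignored.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    window_bytes: int  # receive window advertised to the sender on this leg
    rtt_s: float       # round-trip time of this leg in seconds
    link_bps: float    # bottleneck link rate of this leg in bits per second


def window_limited_bps(window_bytes: int, rtt_s: float, link_bps: float) -> float:
    """At most one window can be in flight per round trip."""
    return min(link_bps, 8 * window_bytes / rtt_s)


def end_to_end_bps(legs: list[Leg], window_bytes: int) -> float:
    """One TCP loop across all legs: a single window covers the summed RTT."""
    rtt = sum(leg.rtt_s for leg in legs)
    rate = min(leg.link_bps for leg in legs)
    return window_limited_bps(window_bytes, rtt, rate)


def split_bps(legs: list[Leg]) -> float:
    """One TCP loop per leg: the slowest leg bounds the end-to-end rate."""
    return min(window_limited_bps(leg.window_bytes, leg.rtt_s, leg.link_bps) for leg in legs)


# Hypothetical numbers: a short Ethernet leg and a long large-MTU leg.
legs = [
    Leg(window_bytes=16 * 1024, rtt_s=0.001, link_bps=100e6),   # remote host side
    Leg(window_bytes=512 * 1024, rtt_s=0.020, link_bps=800e6),  # 64K MTU side
]
print(f"end to end: {end_to_end_bps(legs, 16 * 1024) / 1e6:.1f} Mbit/s")
print(f"split:      {split_bps(legs) / 1e6:.1f} Mbit/s")
```

With these made-up figures the single loop is held to about 6.2 Mbit/s by the 16K window spread over 21 ms, while the split version reaches the 100 Mbit/s limit of the Ethernet leg.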
-David Borman On Jan 19, 2007, at 11:17 AM, Joe Touch wrote: > > > rick jones wrote: >>> In this scenario, the 1500 byte host may be only offering a >>> window of, >>> say 16K. The splitter offers a window to the 64K host of something >>> like 512K. This allows the 64K MTU host to send multiple 64K sized >>> packets, which the splitter then sends out as ethernet size >>> packets to >>> the remote host. In other words, for a 16K vs. 512K scenario, for >>> each window of data transferred between the 64K host and the >>> splitter, >>> there are 32 windows of data transferred out to the remote hosts. >>> >>> Conversely, as 1500 byte packets arrive from the remote host, >>> they are >>> acked and accumulated into larger packets that are then transferred >>> over the 64K MTU network in larger packets. >> >> Apart from calling it a splitter, superficially at least that >> resembles >> what some 10G NICs can do today, albeit with some explicit >> knowledge/assistance by the stack. Large send has the stack(host) >> giving the NIC(splitter) a large "segment" which the NIC(splitter) >> resegments for the link. Those flow across the ethernet to the other >> NIC(splitter) which if it has Large Receive Offload enabled will >> "upsegment" the ethernet-sized traffic and give larger segments to >> the >> receiving stack(host). > > Right - this looks like a cooperative outboard processor, which > makes a > lot of sense in some environments when both the outboard processor and > host are managed/controlled by the same entity, but still makes very > little sense (to me) when that's not the case. > > Joe > > -- > ---------------------------------------- > Joe Touch > Sr. Network Engineer, USAF TSAT Space Segment > From perfgeek at mac.com Fri Jan 19 18:57:43 2007 From: perfgeek at mac.com (rick jones) Date: Fri, 19 Jan 2007 18:57:43 -0800 Subject: [e2e] A simple scenario. 
(Basically the reason for the sliding window thread ; -)) In-Reply-To: References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> Message-ID: <5e0bbd63ccb9905200d8c14e373a851e@mac.com> On Jan 19, 2007, at 11:03 AM, David Borman wrote: > No, it's more than just Large Send Offload or Large Receive Offload. > That's done on a per-packet basis, without needing to keep much, if > any state. In the scenario I'm citing the splitter is also changing > the window and the MSS option. Then I guess TOE may be closer, but still not quite there. rick jones there is no rest for the wicked, yet the virtuous have no pillows From touch at ISI.EDU Sun Jan 21 10:01:49 2007 From: touch at ISI.EDU (Joe Touch) Date: Sun, 21 Jan 2007 10:01:49 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> Message-ID: <45B3AA8D.7070203@isi.edu> This device offloads processing on behalf of the endpoint. Such devices can offload on a per-socketbuffer basis; this isn't much different, except that it sends IP packets to the offloader. Buffering serves two purposes: help a source with an insufficient buffer for the BW*delay product, and help compensate for a receiver that can't keep pace with a bunch of little packets. 
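Joe's first case, a source whose buffer is smaller than the bandwidth*delay product, can be checked with a small calculation. This Python sketch uses integer arithmetic (bits per second and milliseconds) to avoid floating-point rounding; the function names and the 10 Mbit/s, 100 ms, and 64 KiB / 128 KiB figures are hypothetical examples, not values from the thread.

```python
def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bandwidth*delay product in bytes, rounded up."""
    # ceil(bits/s * ms / (8 bits/byte * 1000 ms/s)) using integer floor division
    return -(-link_bps * rtt_ms // 8000)


def buffer_shortfall(send_buffer_bytes: int, link_bps: int, rtt_ms: int) -> int:
    """Bytes by which a sender's buffer falls short of keeping the path full."""
    return max(0, bdp_bytes(link_bps, rtt_ms) - send_buffer_bytes)


print(bdp_bytes(10_000_000, 100))                      # 125000
print(buffer_shortfall(64 * 1024, 10_000_000, 100))    # 59464
print(buffer_shortfall(128 * 1024, 10_000_000, 100))   # 0
```

A nonzero shortfall means the sender, not the network, caps the rate, which is the under-buffered endpoint Joe has in mind when he says this case points to a broken implementation.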
The former helps only if you have an endpoint that can send relatively fast, but has poor buffering. The latter helps only if you have a receiver that can't keep pace with lots of small packets. Both point to broken implementations, and trade correctness for performance. That's not how the rest of TCP is optimized. IMO, this argues for a different transport protocol, not for these splitters. Joe David Borman wrote: > No, it's more than just Large Send Offload or Large Receive Offload. > That's done on a per-packet basis, without needing to keep much, if any > state. In the scenario I'm citing the splitter is also changing the > window and the MSS option. The remote host offers a (relatively) small > window, the splitter offers a much bigger (512K) window to the host on > the 64K MTU network (in addition to rewriting the MSS option). With the > small delay*bandwith to the remote host, the splitter has no trouble > keeping the pipe full using standard ethernet packets. But if those > packets went all the way to the 64K host across the large > delay*bandwidth 64KMTU network, there'd be a lot of idle time waiting > for window updates, and you get much lower throughput from end-to-end. > -David Borman > > On Jan 19, 2007, at 11:17 AM, Joe Touch wrote: > >> >> >> rick jones wrote: >>>> In this scenario, the 1500 byte host may be only offering a window of, >>>> say 16K. The splitter offers a window to the 64K host of something >>>> like 512K. This allows the 64K MTU host to send multiple 64K sized >>>> packets, which the splitter then sends out as ethernet size packets to >>>> the remote host. In other words, for a 16K vs. 512K scenario, for >>>> each window of data transferred between the 64K host and the splitter, >>>> there are 32 windows of data transferred out to the remote hosts. 
>>>> >>>> Conversely, as 1500 byte packets arrive from the remote host, they are >>>> acked and accumulated into larger packets that are then transferred >>>> over the 64K MTU network in larger packets. >>> >>> Apart from calling it a splitter, superficially at least that resembles >>> what some 10G NICs can do today, albeit with some explicit >>> knowledge/assistance by the stack. Large send has the stack(host) >>> giving the NIC(splitter) a large "segment" which the NIC(splitter) >>> resegments for the link. Those flow across the ethernet to the other >>> NIC(splitter) which if it has Large Receive Offload enabled will >>> "upsegment" the ethernet-sized traffic and give larger segments to the >>> receiving stack(host). >> >> Right - this looks like a cooperative outboard processor, which makes a >> lot of sense in some environments when both the outboard processor and >> host are managed/controlled by the same entity, but still makes very >> little sense (to me) when that's not the case. >> >> Joe >> >> ------------------------------------------ >> Joe Touch >> Sr. Network Engineer, USAF TSAT Space Segment >> -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From detlef.bosau at web.de Sun Jan 21 15:02:37 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 22 Jan 2007 00:02:37 +0100 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> Message-ID: <45B3F10D.9010501@web.de> David Borman wrote: > There are real-world scenarios where the insertion of a splitter into > a TCP path does make a lot of sense. The cases I am familiar with all > are necessitated by a severe mismatch in MTU, buffering and > performance, the splitter is in the only path by which the packets can > travel, and it is sitting at the crossover between the two disparate > paths. In the specific case that I dealt with, the splitter's main > purpose was to change the TCP MSS option, send larger window sizes, > and buffer/repackage data. In addition to the MTU issue you mention, let me point to Joseph Ishac, Mark Allman. "On the Performance of TCP Spoofing in Satellite Networks". IEEE Milcom, October 2001. http://www.icir.org/mallman/papers/milcom01.pdf The issue here is the extremely large round trip time in satellite networks, which causes TCP to need quite a long time to achieve sufficient throughput. In fact, the RTTs described by Mark Allman aren't even that bad. I'm still working on the issue of opportunistic scheduling in mobile networks. I just found a technical report on that issue: R. Srinivasan and J. S. Baras, "TCP Performance in Wireless Systems with Opportunistic Scheduling", Technical Report TR 2002-48, 2002 (advisor: John Baras). http://techreports.isr.umd.edu/reports/2002/TR_2002-48.pdf As far as I can see, this is really excellent work. In the example at the beginning of the paper, opportunistic scheduling introduces a delay jitter of up to 1 second into the flow.
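Detlef's satellite point, that a long RTT makes TCP take a long time to reach useful throughput, follows from slow start growing the window once per round trip. The Python sketch below assumes idealized slow start (congestion window doubling every RTT, no losses, no delayed ACKs, receiver window not limiting) and an initial window of one segment. The 600 ms figure is a rough geostationary round trip, and the 1 MB target window and 65000-byte MSS are arbitrary examples. Because growth is counted in segments, the same function also shows why a larger MSS ramps up in fewer rounds.

```python
def slow_start_rounds(target_bytes: int, mss: int, initial_segments: int = 1) -> int:
    """Round trips for idealized slow start (cwnd doubles each RTT) to reach target_bytes."""
    if mss <= 0 or initial_segments <= 0:
        raise ValueError("mss and initial_segments must be positive")
    cwnd = initial_segments * mss
    rounds = 0
    while cwnd < target_bytes:
        cwnd *= 2
        rounds += 1
    return rounds


def ramp_time_s(target_bytes: int, mss: int, rtt_s: float, initial_segments: int = 1) -> float:
    """Seconds spent in slow start before the window reaches target_bytes."""
    return slow_start_rounds(target_bytes, mss, initial_segments) * rtt_s


target = 1_000_000  # hypothetical window needed to fill the path, in bytes
print(slow_start_rounds(target, 1460))        # 10 round trips
print(ramp_time_s(target, 1460, rtt_s=0.6))   # about 6 s over a geostationary hop
print(ramp_time_s(target, 1460, rtt_s=0.02))  # about 0.2 s on a 20 ms path
print(slow_start_rounds(target, 65000))       # 4 round trips with a larger MSS
```

The number of rounds is the same on both paths; only the RTT that multiplies it differs, which is why long-delay links feel the ramp-up so strongly.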
I currently simulate networks with a physical bandwidth of 10 Mbps and an average throughput of 100 kbps at the link layer due to retransmissions, and accept a delay jitter of up to 1 second. When I equalize the delay spikes by buffering to the extent that TCP fully exploits the average throughput at the link layer, the round trip time as perceived by the sender can reach up to 10 seconds. It's simply a question of which kinds of application will accept an RTT of 10 seconds. Particularly for mobile networks with no splitting or spoofing, I clearly expect the quite strict alternative that a network will exhibit either acceptable round trip times or acceptable throughput. Of course delay spikes themselves will be annoying in interactive applications. But one second of delay may be acceptable for a user, whereas the same user would simply abandon an application / connection with a 10 second round trip time. Detlef From david.borman at windriver.com Mon Jan 22 08:02:26 2007 From: david.borman at windriver.com (David Borman) Date: Mon, 22 Jan 2007 10:02:26 -0600 Subject: [e2e] A simple scenario.
Even if that host advertised a sufficiently large window, the inefficiencies of small packets on the 64K MTU side of the network will keep the network from being driven at full speed, not to mention the cost of ramping up slowstart using 1.5K byte packets vs. 64K byte packets. The splitter in this case is sitting between the two networks, transparently connecting what it has effectively turned into two TCP connections, providing the necessary resources to allow TCP to run optimally on each half of the path, without the end nodes needing to have explicit knowledge of the splitter. TCP does not always work well in all scenarios, but there is a lot of value in being able to use TCP instead of designing a new transport and internet layer. In a scenario like this, the splitter allows TCP to be used in an environment where it otherwise wouldn't work very well. And sure, it'd be best if the splitter wasn't needed, and the connection could be run at full speed from end-to-end, but sometimes you have to deal with realities and not just theory. Focusing on details and calling the implementation broken, and then ignoring the underlying issues, doesn't resolve anything. -David Borman On Jan 21, 2007, at 12:01 PM, Joe Touch wrote: > This device offloads processing on behalf of the endpoint. Such > devices > can offload on a per-socketbuffer basis; this isn't much different, > except that it sends IP packets to the offloader. > > Buffering serves two purposes: help a source with an insufficient > buffer > for the BW*delay product, and help compensate for a receiver that > can't > keep pace with a bunch of little packets. > > The former helps only if you have an endpoint that can send relatively > fast, but has poor buffering. > > The latter helps only if you have a receiver that can't keep pace with > lots of small packets. > > Both point to broken implementations, and trade correctness for > performance. That's not how the rest of TCP is optimized. 
IMO, this > argues for a different transport protocol, not for these splitters. > > Joe > > David Borman wrote: >> No, it's more than just Large Send Offload or Large Receive Offload. >> That's done on a per-packet basis, without needing to keep much, >> if any >> state. In the scenario I'm citing the splitter is also changing the >> window and the MSS option. The remote host offers a (relatively) >> small >> window, the splitter offers a much bigger (512K) window to the >> host on >> the 64K MTU network (in addition to rewriting the MSS option). >> With the >> small delay*bandwidth to the remote host, the splitter has no trouble >> keeping the pipe full using standard ethernet packets. But if those >> packets went all the way to the 64K host across the large >> delay*bandwidth 64K MTU network, there'd be a lot of idle time waiting >> for window updates, and you get much lower throughput from end-to- >> end. >> -David Borman >> >> On Jan 19, 2007, at 11:17 AM, Joe Touch wrote: >> >>> >>> >>> rick jones wrote: >>>>> In this scenario, the 1500 byte host may be only offering a >>>>> window of, >>>>> say 16K. The splitter offers a window to the 64K host of >>>>> something >>>>> like 512K. This allows the 64K MTU host to send multiple 64K >>>>> sized >>>>> packets, which the splitter then sends out as ethernet size >>>>> packets to >>>>> the remote host. In other words, for a 16K vs. 512K scenario, for >>>>> each window of data transferred between the 64K host and the >>>>> splitter, >>>>> there are 32 windows of data transferred out to the remote hosts. >>>>> >>>>> Conversely, as 1500 byte packets arrive from the remote host, >>>>> they are >>>>> acked and accumulated into larger packets that are then >>>>> transferred >>>>> over the 64K MTU network in larger packets. >>>> >>>> Apart from calling it a splitter, superficially at least that >>>> resembles >>>> what some 10G NICs can do today, albeit with some explicit >>>> knowledge/assistance by the stack.
Large send has the stack(host) >>>> giving the NIC(splitter) a large "segment" which the NIC(splitter) >>>> resegments for the link. Those flow across the ethernet to the >>>> other >>>> NIC(splitter) which if it has Large Receive Offload enabled will >>>> "upsegment" the ethernet-sized traffic and give larger segments >>>> to the >>>> receiving stack(host). >>> >>> Right - this looks like a cooperative outboard processor, which >>> makes a >>> lot of sense in some environments when both the outboard >>> processor and >>> host are managed/controlled by the same entity, but still makes very >>> little sense (to me) when that's not the case. >>> >>> Joe >>> >>> ------------------------------------------ >>> Joe Touch >>> Sr. Network Engineer, USAF TSAT Space Segment >>> > > -- > ---------------------------------------- > Joe Touch > Sr. Network Engineer, USAF TSAT Space Segment > From touch at ISI.EDU Mon Jan 22 09:09:22 2007 From: touch at ISI.EDU (Joe Touch) Date: Mon, 22 Jan 2007 09:09:22 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> Message-ID: <45B4EFC2.3020408@isi.edu> David Borman wrote: > Joe, > > You keep missing the point. The delay*bandwidth between the end hosts > is sufficiently large that it can not be driven at full speed from > end-to-end given the window advertised by the host on the ethernet side > of things. 
Even if that host advertised a sufficiently large window, > the inefficiencies of small packets on the 64K MTU side of the network > will keep the network from being driven at full speed, not to mention > the cost of ramping up slowstart using 1.5K byte packets vs. 64K byte > packets. This is a contradiction: clearly the splitter needs to keep up with receiving small packets at rate or it can't sustain emitting the large packets at full speed. If the splitter can do this, then the destination can. The fact that it doesn't means this is (by definition) a patch to a broken system. Using splitters to patch broken systems is understandable, but it's still preferable (IMO) to make the splitter visible and run it as a true proxy, terminating the TCP on both ends properly. > The splitter in this case is sitting between the two networks, > transparently connecting what it has effectively turned into two TCP > connections, That's the point that's missed, IMO - this isn't 'effectively' two TCP connections; it provides the benefit of two TCP connections without actually terminating the connections, which means this isn't 'effectively' two, but 'one TCP connection with the performance and semantics of two'. The former is understandable, but the latter is the problem. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20070122/d2d27035/signature.bin From david.borman at windriver.com Mon Jan 22 13:33:33 2007 From: david.borman at windriver.com (David Borman) Date: Mon, 22 Jan 2007 15:33:33 -0600 Subject: [e2e] A simple scenario.
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45B4EFC2.3020408@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> Message-ID: <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> On Jan 22, 2007, at 11:09 AM, Joe Touch wrote: > > > David Borman wrote: >> Joe, >> >> You keep missing the point. The delay*bandwidth between the end >> hosts >> is sufficiently large that it can not be driven at full speed from >> end-to-end given the window advertised by the host on the ethernet >> side >> of things. Even if that host advertised a sufficiently large window, >> the inefficiencies of small packets on the 64K MTU side of the >> network >> will keep the network from being driven at full speed, not to mention >> the cost of ramping up slowstart using 1.5K byte packets vs. 64K byte >> packets. > > This is a contradiction: clearly the splitter needs to keep up with > receiving small packets at rate or it can't sustain emitting the large > packets at full speed. If the splitter can do this, then the > destination > can. The fact that it doesn't means this is (by definition) a patch > to a > broken system. Ah, you are assuming that both the ethernet side and the 64K MTU side of the path operate equally efficiently using small packets. That is not the case. The splitter isn't able to keep the pipe full over the 64K network using 1500 byte packets, but it can using larger packets, so the further remote host is even less able to keep it full using 1500 byte packets. 
It's like using dump trucks to haul individual wheelbarrow size loads; it'll work, but you won't be very efficient and the transfer will take a lot longer. So, I disagree with your contention that the system is broken. It's different and heterogeneous, but that doesn't make it broken. -David Borman From touch at ISI.EDU Mon Jan 22 14:05:58 2007 From: touch at ISI.EDU (Joe Touch) Date: Mon, 22 Jan 2007 14:05:58 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> Message-ID: <45B53546.4080507@isi.edu> David Borman wrote: >> This is a contradiction: clearly the splitter needs to keep up with >> receiving small packets at rate or it can't sustain emitting the large >> packets at full speed. If the splitter can do this, then the destination >> can. The fact that it doesn't means this is (by definition) a patch to a >> broken system. > > Ah, you are assuming that both the ethernet side and the 64K MTU side of > the path operate equally efficiently using small packets. source ---------------> splitter ----------------> dest 1500byte 64K byte You're claiming that the splitter is required to keep the 64Kbyte side running at full rate. That means the 1500-byte side has to handle packets roughly 40x faster. Otherwise, the 64K byte side is not running at high-rate. 
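Joe's "roughly 40x" is the ratio of packet sizes: at a fixed byte rate the packet rate scales inversely with packet size, and 65536/1500 is about 43.7. The sketch below also adds a fixed per-packet cost to illustrate the effect behind Borman's dump-truck analogy: a network with high raw capacity but a significant per-packet overhead delivers far less throughput when it carries small packets. The 1 Gbps rate and the 10 microsecond per-packet cost are hypothetical values chosen only for illustration; the thread gives no numbers for the 64K MTU network.

```python
def packets_per_second(rate_bps: float, packet_bytes: int) -> float:
    """Packets per second needed to carry rate_bps in packets of packet_bytes."""
    return rate_bps / (packet_bytes * 8)


def effective_throughput_bps(link_bps: float, packet_bytes: int,
                             per_packet_cost_s: float) -> float:
    """Throughput when each packet costs its serialization time plus a fixed overhead.

    per_packet_cost_s is a hypothetical stand-in for per-packet processing
    (interrupts, header handling, media access) on the large-MTU network.
    """
    bits = packet_bytes * 8
    return bits / (bits / link_bps + per_packet_cost_s)


if __name__ == "__main__":
    rate = 1e9                      # assumed 1 Gbps link
    small, large = 1500, 65536
    ratio = packets_per_second(rate, small) / packets_per_second(rate, large)
    print(f"packet-rate ratio: {ratio:.1f}x")
    for size in (small, large):
        tput = effective_throughput_bps(rate, size, per_packet_cost_s=10e-6)
        print(f"{size:6d}-byte packets: {tput / 1e6:.0f} Mbps")
```

With any nonzero per-packet cost, the fraction of raw capacity that is usable grows with packet size, which is the sense in which both posters agree the 64K side cannot be kept full with 1500-byte packets; they disagree about whether compensating for that with a splitter means something is broken.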
So here's what we have: - dest can handle 64K but not 1500 - source must handle 1500 at high rate - splitter must receive 1500 at high rate Now you're claiming that there's a link (source-splitter) that's efficient enough for small packets. If that's the case, why would we ever want the kind of link that's being used splitter-dest? Again, this argues that something is seriously broken. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment From david.borman at windriver.com Mon Jan 22 14:38:14 2007 From: david.borman at windriver.com (David Borman) Date: Mon, 22 Jan 2007 16:38:14 -0600 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45B53546.4080507@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> <45B53546.4080507@isi.edu> Message-ID: On Jan 22, 2007, at 4:05 PM, Joe Touch wrote: > > > David Borman wrote: >>> This is a contradiction: clearly the splitter needs to keep up with >>> receiving small packets at rate or it can't sustain emitting the >>> large >>> packets at full speed. If the splitter can do this, then the >>> destination >>> can.
The fact that it doesn't means this is (by definition) a >>> patch to a >>> broken system. >> >> Ah, you are assuming that both the ethernet side and the 64K MTU >> side of >> the path operate equally efficiently using small packets. > > source ---------------> splitter ----------------> dest > 1500byte 64K byte > > > You're claiming that the splitter is required to keep the 64Kbyte side > running at full rate. That means the 1500-byte side has to handle > packets roughly 40x faster. Otherwise, the 64K byte side is not > running > at high-rate. > > So here's what we have: > - dest can handle 64K but not 1500 > - source must handle 1500 at high rate > - splitter must receive 1500 at high rate > > Now you're claiming that there's a link (source-splitter) that's > efficient enough for small packets. If that's the case, why would we > ever want the kind of link that's being used splitter-dest? If all you're ever going to do is talk through the splitter to remote ethernet hosts, then yes, it'd be preferable to bring ethernet directly to the host instead of using the 64K MTU network. But you don't always get what you want. For various reasons it might not be possible to bring ethernet directly to the hosts on the 64K network. And while the 64K MTU network may not be as efficient with 1500 byte packets as an ethernet network, replacing it with an ethernet network might be slower internally than the 64K MTU network. So the trade off is a faster 64K network that works well with large packets but not ethernet sized packets, vs. a slower ethernet network that works better with ethernet sized packets, but doesn't have the overall capacity of the 64K network. > > Again, this argues that something is seriously broken. Sometimes there isn't an optimal solution and you have to make hard choices. Just because it isn't the one you want doesn't mean things are *broken* when you then try to mitigate the effects of those choices. 
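The window arithmetic in this scenario follows from the basic bound on a window-limited TCP sender: it can move at most one window per round trip, so throughput is capped at window/RTT, and keeping a path full needs a window of at least bandwidth*RTT. With the 16K and 512K windows quoted earlier in the thread, each large window corresponds to 512/16 = 32 small ones, and the throughput ceiling differs by the same factor. The 50 ms RTT and the 1 Gbps rate below are assumed values for illustration only; the thread gives neither.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput when the sender is window-limited."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to keep a path full."""
    return rate_bps * rtt_s / 8


if __name__ == "__main__":
    small, large = 16 * 1024, 512 * 1024     # windows quoted in the thread
    rtt = 0.050                              # assumed end-to-end RTT
    print("small windows per large window:", large // small)
    for w in (small, large):
        mbps = window_limited_throughput_bps(w, rtt) / 1e6
        print(f"{w // 1024:4d} KB window -> at most {mbps:.1f} Mbps")
    print(f"window to fill 1 Gbps: {window_needed_bytes(1e9, rtt) / 1024:.0f} KB")
```

The bound ignores loss and slow start, so real transfers do no better than these figures; it only shows why a 16K window on a large delay*bandwidth path leaves the pipe mostly idle.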
-David Borman From touch at ISI.EDU Mon Jan 22 14:49:48 2007 From: touch at ISI.EDU (Joe Touch) Date: Mon, 22 Jan 2007 14:49:48 -0800 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> <45B53546.4080507@isi.edu> Message-ID: <45B53F8C.409@isi.edu> David Borman wrote: > > On Jan 22, 2007, at 4:05 PM, Joe Touch wrote: ... >>> Ah, you are assuming that both the ethernet side and the 64K MTU side of >>> the path operate equally efficiently using small packets. >> >> source ---------------> splitter ----------------> dest >> 1500byte 64K byte >> >> >> You're claiming that the splitter is required to keep the 64Kbyte side >> running at full rate. That means the 1500-byte side has to handle >> packets roughly 40x faster. Otherwise, the 64K byte side is not running >> at high-rate. >> >> So here's what we have: >> - dest can handle 64K but not 1500 >> - source must handle 1500 at high rate >> - splitter must receive 1500 at high rate >> >> Now you're claiming that there's a link (source-splitter) that's >> efficient enough for small packets. If that's the case, why would we >> ever want the kind of link that's being used splitter-dest? > > If all you're ever going to do is talk through the splitter to remote > ethernet hosts, then yes, it'd be preferable to bring ethernet directly > to the host instead of using the 64K MTU network. But you don't always > get what you want. 
For various reasons it might not be possible to > bring ethernet directly to the hosts on the 64K network. In that case you're making a case for an outboard ethernet adapter, e.g., like the USB dongles. > And while the > 64K MTU network may not be as efficient with 1500 byte packets as an > ethernet network, replacing it with an ethernet network might be slower > internally than the 64K MTU network. That's the part that's confusing. In order to warrant the splitter, the ethernet side must keep up. But then you're saying here that the ethernet is slower. Either: ethernet keeps up which you need to assume to warrant a splitter, but which begs the question of why you have less capable net to the right ethernet doesn't keep up in which case aggregation doesn't help > So the trade off is a faster 64K > network that works well with large packets but not ethernet sized > packets, vs. a slower ethernet network that works better with ethernet > sized packets, but doesn't have the overall capacity of the 64K network. In the latter case, you're not keeping up on the source-splitter side. Which means you don't need the splitter either to aggregate or to buffer. >> Again, this argues that something is seriously broken. > > Sometimes there isn't an optimal solution and you have to make hard > choices. Just because it isn't the one you want doesn't mean things are > *broken* when you then try to mitigate the effects of those choices. It's still very unclear what effects this is mitigating. -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From david.borman at windriver.com Mon Jan 22 17:48:46 2007 From: david.borman at windriver.com (David Borman) Date: Mon, 22 Jan 2007 19:48:46 -0600 Subject: [e2e] A simple scenario. (Basically the reason for the sliding window thread ; -)) In-Reply-To: <45B53F8C.409@isi.edu> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> <45B53546.4080507@isi.edu> <45B53F8C.409@isi.edu> Message-ID: <2404896E-2714-4BBF-97C1-B1C952280F90@windriver.com> On Jan 22, 2007, at 4:49 PM, Joe Touch wrote: > > > David Borman wrote: >> >> On Jan 22, 2007, at 4:05 PM, Joe Touch wrote: > ... >>>> Ah, you are assuming that both the ethernet side and the 64K MTU >>>> side of >>>> the path operate equally efficiently using small packets. >>> >>> source ---------------> splitter ----------------> dest >>> 1500byte 64K byte >>> >>> >>> You're claiming that the splitter is required to keep the 64Kbyte >>> side >>> running at full rate. That means the 1500-byte side has to handle >>> packets roughly 40x faster. Otherwise, the 64K byte side is not >>> running >>> at high-rate.
>>> >>> So here's what we have: >>> - dest can handle 64K but not 1500 >>> - source must handle 1500 at high rate >>> - splitter must receive 1500 at high rate >>> >>> Now you're claiming that there's a link (source-splitter) that's >>> efficient enough for small packets. If that's the case, why would we >>> ever want the kind of link that's being used splitter-dest? >> >> If all you're ever going to do is talk through the splitter to remote >> ethernet hosts, then yes, it'd be preferable to bring ethernet >> directly >> to the host instead of using the 64K MTU network. But you don't >> always >> get what you want. For various reasons it might not be possible to >> bring ethernet directly to the hosts on the 64K network. > > In that case you're making a case for an outboard ethernet adapter, > e.g., like the USB dongles. If you can't bring in ethernet, you aren't going to have USB... > >> And while the >> 64K MTU network may not be as efficient with 1500 byte packets as an >> ethernet network, replacing it with an ethernet network might be >> slower >> internally than the 64K MTU network. > > That's the part that's confusing. In order to warrant the splitter, > the > ethernet side must keep up. But then you're saying here that the > ethernet is slower. > > Either: > ethernet keeps up > which you need to assume to warrant a splitter, > but which begs the question of why you have less capable > net to the right > > ethernet doesn't keep up > in which case aggregation doesn't help Why is this not clear? The overall capacity of the 64K network exceeds the capacity of the ethernet network. So for large packets, the 64K network is better than the ethernet network. But with the smaller ethernet sized packets, the 64K network is unable to make use of that capacity. That's the scenario. > >> So the trade off is a faster 64K >> network that works well with large packets but not ethernet sized >> packets, vs. 
a slower ethernet network that works better with >> ethernet >> sized packets, but doesn't have the overall capacity of the 64K >> network. > > In the latter case, you're not keeping up on the source-splitter side. > Which means you don't need the splitter either to aggregate or to > buffer. Huh? I'm saying that even if you could replace the 64K network with an ethernet network, you'd improve the end-to-end performance from hosts on that network to remote hosts without the need for the splitter, but the cost of doing that is lower performance between hosts that used to be on the 64K network. > >>> Again, this argues that something is seriously broken. >> >> Sometimes there isn't an optimal solution and you have to make hard >> choices. Just because it isn't the one you want doesn't mean >> things are >> *broken* when you then try to mitigate the effects of those choices. > > It's still very unclear what effects this is mitigating. I'm sorry that you don't understand it. I've tried to be clear in my description of the scenario. The throughput across the 64K network with 1500 byte packets is worse than ethernet. The throughput across the 64K network with larger packets exceeds the throughput over ethernet. And even without that issue, the typical window at the remote host across ethernet is not large enough for the delay*bandwidth end-to-end. That's the scenario. The splitter mitigates those issues without needing to add a new transport. -David Borman From dpreed at reed.com Tue Jan 23 09:13:39 2007 From: dpreed at reed.com (David P. Reed) Date: Tue, 23 Jan 2007 12:13:39 -0500 Subject: [e2e] A simple scenario. 
(Basically the reason for the sliding window thread ; -)) In-Reply-To: References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> <45B53546.4080507@isi.edu> Message-ID: <45B64243.8000109@reed.com> This is a very strange debate. One can (of course) develop an idiosyncratic protocol that works in just this case better than any other protocol. The situation is not "broken" - just highly specific, the kind of thing that one encounters as a result of historical accidents, and most of the Internet infrastructure is full of historical accidents. So are we accomplishing anything with this discussion? I assert that all concerned are quite intelligent people. So if the debate is just to measure your intellectual manhood against each other, perhaps a contest like "American Idol" would be a better place than here? David Borman wrote: > > On Jan 22, 2007, at 4:05 PM, Joe Touch wrote: > >> >> >> David Borman wrote: >>>> This is a contradiction: clearly the splitter needs to keep up with >>>> receiving small packets at rate or it can't sustain emitting the large >>>> packets at full speed. If the splitter can do this, then the >>>> destination >>>> can. The fact that it doesn't means this is (by definition) a patch >>>> to a >>>> broken system. >>> >>> Ah, you are assuming that both the ethernet side and the 64K MTU >>> side of >>> the path operate equally efficiently using small packets. 
>> >> source ---------------> splitter ----------------> dest >> 1500byte 64K byte >> >> >> You're claiming that the splitter is required to keep the 64Kbyte side >> running at full rate. That means the 1500-byte side has to handle >> packets roughly 40x faster. Otherwise, the 64K byte side is not running >> at high-rate. >> >> So here's what we have: >> - dest can handle 64K but not 1500 >> - source must handle 1500 at high rate >> - splitter must receive 1500 at high rate >> >> Now you're claiming that there's a link (source-splitter) that's >> efficient enough for small packets. If that's the case, why would we >> ever want the kind of link that's being used splitter-dest? > > If all you're ever going to do is talk through the splitter to remote > ethernet hosts, then yes, it'd be preferable to bring ethernet > directly to the host instead of using the 64K MTU network. But you > don't always get what you want. For various reasons it might not be > possible to bring ethernet directly to the hosts on the 64K network. > And while the 64K MTU network may not be as efficient with 1500 byte > packets as an ethernet network, replacing it with an ethernet network > might be slower internally than the 64K MTU network. So the trade off > is a faster 64K network that works well with large packets but not > ethernet sized packets, vs. a slower ethernet network that works > better with ethernet sized packets, but doesn't have the overall > capacity of the 64K network. > >> >> Again, this argues that something is seriously broken. > > Sometimes there isn't an optimal solution and you have to make hard > choices. Just because it isn't the one you want doesn't mean things > are *broken* when you then try to mitigate the effects of those choices. > > -David Borman > > > From touch at ISI.EDU Tue Jan 23 10:36:35 2007 From: touch at ISI.EDU (Joe Touch) Date: Tue, 23 Jan 2007 10:36:35 -0800 Subject: [e2e] A simple scenario. 
(Basically the reason for the sliding window thread ; -)) In-Reply-To: <45B64243.8000109@reed.com> References: <45A57D7A.6030505@isi.edu> <0B0A20D0B3ECD742AA2514C8DDA3B0650A358D@VGAEXCH01.hq.corp.viasat.com> <45A67E7D.4010609@isi.edu> <45A82323.30405@web.de> <45AD05E4.5040200@isi.edu> <45AE7738.6070701@web.de> <45AE7900.70400@isi.edu> <77574BB0-D22C-42F0-A86B-CFFA160B6CEA@windriver.com> <45AFB5A1.9030407@isi.edu> <8b38e92efb05d97f0587240a04505367@mac.com> <45B0FD35.7050708@isi.edu> <45B3AA8D.7070203@isi.edu> <0C098A44-5CAE-4C96-8181-E7EFF7DEC501@windriver.com> <45B4EFC2.3020408@isi.edu> <2A01BBB8-D84F-4B84-B81E-7ED440F6F519@windriver.com> <45B53546.4080507@isi.edu> <45B64243.8000109@reed.com> Message-ID: <45B655B3.1050903@isi.edu> FWIW... David P. Reed wrote: > This is a very strange debate. One can (of course) develop an > idiosyncratic protocol that works in just this case better than any > other protocol. The situation is not "broken" - just highly specific, > the kind of thing that one encounters as a result of historical > accidents, and most of the Internet infrastructure is full of historical > accidents. The key question IMO is whether this is a useful component of the architecture or whether it is support for legacy systems. The latter need not be something we propagate. > So are we accomplishing anything with this discussion? I thought we were deciding whether this accident was useful in general or just for legacy. D. Borman and I have taken the rest of the discussion off-line, though. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment
From rbriscoe at jungle.bt.co.uk Wed Jan 24 11:12:40 2007 From: rbriscoe at jungle.bt.co.uk (Bob Briscoe) Date: Wed, 24 Jan 2007 19:12:40 +0000 Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?) In-Reply-To: References: Message-ID: <5.2.1.1.2.20070124100306.01875a30@pop3.jungle.bt.co.uk> Vadim, At 00:47 14/01/2007, Vadim Antonov wrote: >Dado - ISPs are not interested in reducing amount of traffic; quite >opposite. It is their product, and as any producer they are interested in >increasing volume - if you remember Econ 101, in the long term the >profitability of all kinds of businesses tends to converge to the same >norm. (Business segments with higher-than-average ROI attract more >investments - and competition, thus reducing profitability; >underperforming segments lose capital and consequently have less >competitive pressure, thus allowing increase in profitability). > >In the established markets, where the initial period of rapid growth (on >the S-curve) is over, the only sustainable way to make more money and >increase value of business shares is to increase volume. Agreed (assuming by 'established' you mean highly competitive / commoditised) >So it makes no sense for ISPs whatsoever to penalize users for causing >congestion (thus reducing the demand). Instead, they want to encourage >users to pay more for bigger share of the network resources - the >congestion is their friend, if they can differentiate service (who would >pay for premium service when regular service is quite good?) [First, a caveat: I'm going to talk in terms of charging for congestion, as that's how your conversation started.
However, limiting a customer's congestion is probably much more acceptable than charging for it, and I say below why the two are equivalent.] It only /seems/ to make no sense for ISPs to penalize users for causing congestion on a superficial first look. Econ 101 also says that a business doesn't want to supply a customer if the cost of supply is higher than that which the customer is willing to pay. Those customers willing to pay for congestion are saying "if you supplied more capacity I'd pay for it". Those unwilling to pay for congestion are saying "ok, you've hit my limit, I don't actually want more capacity so much that I'd be willing to pay as much as it will cost you to provide it". The key to this is to understand that congestion charges /complement/ capacity subscription charges - it certainly wouldn't make sense to /only/ charge for congestion whilst not charging subscriptions. The idea isn't that an ISP adds congestion charges on top of subscriptions. Increasing one should reduce the other, so that overall the user pays the same. It's just a question of tying a proportion of the charge to the user's traffic behaviour. In fact, if you'd attended the Econ 103 class ;) you would have been able to predict what the usage proportion will tend to in a competitive market. Let's say an ISP's costs are 60% capacity-related and 40% operational costs (faults, customer service, marketing, billing and so on). We're only concerned here with the 60%. It turns out that an ISP's most competitive strategy will be to get proportion p of its capacity-related revenues (the 60%) from usage using this simple formula: p = 1/e where e is the elasticity of scale. e measures how the cost of capacity flattens out as the ISP buys more (aka. economy of scale). This formula comes from Hal Varian & Jeffrey MacKie-Mason's seminal 1995 paper "Pricing Congestible Network Resources". 
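The rule p = 1/e can be checked numerically. Taking the elasticity of scale as average cost divided by marginal cost (equivalently 1/e = marginal cost / average cost, as stated later in this message), a capacity cost that grows as the square root of capacity gives e = 2, so half of the capacity-related revenue, or 30% of total revenue when capacity-related costs are 60%, would be tied to usage. The cost function and its constant are illustrative assumptions; this is a sketch of the arithmetic only, not of the Varian/MacKie-Mason model.

```python
import math
from typing import Callable, Tuple


def elasticity_of_scale(cost: Callable[[float], float], capacity: float,
                        h: float = 1e-4) -> float:
    """e = average cost / marginal cost, marginal cost by central difference."""
    average = cost(capacity) / capacity
    marginal = (cost(capacity + h) - cost(capacity - h)) / (2 * h)
    return average / marginal


def usage_share(e: float, capacity_cost_fraction: float) -> Tuple[float, float]:
    """Return (p, p * capacity_cost_fraction).

    p is the usage-based share of capacity-related revenue; the second value
    is the usage-based share of total revenue.
    """
    p = 1 / e
    return p, p * capacity_cost_fraction


def sqrt_cost(capacity: float) -> float:
    """Assumed square-root cost law (constant 10.0 is arbitrary)."""
    return 10.0 * math.sqrt(capacity)


if __name__ == "__main__":
    e = elasticity_of_scale(sqrt_cost, capacity=100.0)
    p, total = usage_share(e, capacity_cost_fraction=0.6)
    print(f"e = {e:.3f}, p = {p:.2f}, usage share of total revenue = {total:.0%}")
```

For any power-law cost C(K) = a*K^b with 0 < b < 1, the same calculation gives e = 1/b independent of K, so the square-root case is just b = 1/2.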
It comes from optimising what an ISP would do in a scenario where ISPs all compete by charging the same in total, but varying the proportions due to usage vs. subscription. If you want to argue that pricing congestion makes no sense in networks, that's the paper to argue against. No-one has successfully done that, so good luck. One (unpublished) study found the cost of optical capacity (interface cards and links) rose with capacity by about a square-root law, implying e=2. If everything were optical and the market was perfectly competitive, this would imply a successful ISP would aim to tie 30% of its revenue to usage (if capacity related costs are 60%, half should be usage). So what's the intuition behind all this? I think you will agree ISPs will probably want to limit the amount of congestion one user can cause. Otherwise that customer reduces the value of the ISP's business for all the other customers. If the ISP doesn't limit each user's ability to cause congestion, customers will switch to another ISP that does. To be competitive, an ISP does well to aim for p=1/e. Returning to my initial caveat: Why is limiting congestion equivalent to charging for it? Many people prefer a 'fixed price contract' to 'pay as you go'. Basically the ISP would be saying "For your $10/month, you're getting up to X capacity and up to Y congestion." It's not actually saying X capacity costs $7 and Y congestion costs $3. But you would be able to infer the internal prices the ISP is using for X and Y if the same ISP also sells X capacity and 2Y congestion for say $13/month. Many people think congestion charging is all theoretical clap trap. However, it's actually what is happening all around us already. However, in practice, an ISP can't measure how much congestion each user causes in other networks. So instead we see various attempts to limit congestion using other more convenient levers: - Volume caps are one crude proxy for congestion. 
- DPI against p2p is a really crude attempt to limit congestion. - TCP congestion control is the nearest we have to a perfect example of congestion charging. Except it's a voluntary reduction in rate /as if/ the TCP algorithm were being charged for congestion. But it's certainly not perfect (my fairness-religion I-D that John W mentioned explains why TCP is myopic in time and myopic across flows). BTW, I've posted a more convenient version of the paper in CCR On-line that prints in 10pp (not 32pp of I-D format bloat). I've also updated it to say specifically what's wrong with the fairness in TCP, TFRC, WFQ and XCP as examples: >Also, congested network is the network operating at full capacity - >meaning that there is no overinvestment. If a provider has underloaded >network it, basically, means that its business people made a mistake and >overinvested (driving ROI - and share prices - lower). Congestion is excess load over offered load (another way of saying loss rate). Loosely, you can think of congestion charges as the part of the charge that pays for the capacity needed to serve the traffic that isn't being served (what Econ 101 calls 'marginal cost of capacity'). Subscriptions recover past investment in capacity. Together they cover the average cost of capacity. In fact economists usually calculate elasticity of scale from 1/e = marginal cost / average cost, which is why 1/e = p, the proportion of congestion charge to total charge. In summary, it makes absolute sense for ISPs to limit congestion, which is equivalent to setting aside part of the monthly charge as if they are charging for congestion. Cheers Bob ____________________________________________________________________________ Bob Briscoe, Networks Research Centre, BT Research B54/77 Adastral Park,Martlesham Heath,Ipswich,IP5 3RE,UK. +44 1473 645196 From dpreed at reed.com Wed Jan 24 12:21:18 2007 From: dpreed at reed.com (David P.
Reed) Date: Wed, 24 Jan 2007 15:21:18 -0500 Subject: [e2e] why fair sharing? ( Are we doing sliding window in the Internet?) In-Reply-To: <5.2.1.1.2.20070124100306.01875a30@pop3.jungle.bt.co.uk> References: <5.2.1.1.2.20070124100306.01875a30@pop3.jungle.bt.co.uk> Message-ID: <45B7BFBE.7060505@reed.com> Bob - nice analysis, but beware of simple models being viewed as complete. The end user values more than just transport, which is all that is modeled in this notion of congestion. "Choices" or "options" also matter to users - whether it is the perception that there are "500 channels" a la the US cable system vs. the British broadcasting model of a couple of gov't channels and a few more gov't granted monopolies called private channels - users will pay for choices that they may or may not exercise. This provides a value to "switching" functions in networks. The freedom to channel surf, or the freedom to assemble a web page from many sources, with a small switching latency matters. But congestion directly blocks the ability to switch - it kills option value, and if option value is a large part of customer value, then congestion means that greedy users who don't value choice can kill value for other users. The other point is that network infrastructure is at scale a dynamically priced thing. If you study the other literature on "real options" (besides that which applies to R&D and network switching options) you will find that options or contingent value analysis is crucial to pricing such infrastructures as refineries, power plants, cable plants, etc. when faced with variable costs such as tooling, plant construction costs (think semiconductor fabs and Moore's Law estimates of demand opportunity). So equilibrium economic models are helpful, but in fact contingent and dynamic economic models are far more important than easy analyses like these would imply.
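The arithmetic in Bob Briscoe's message can be checked with a short sketch. It is an illustration under his stated assumptions, not code from the thread: capacity cost is taken to grow as the square root of capacity (the constant `k` and the evaluation point are arbitrary placeholders), elasticity of scale uses his definition 1/e = marginal cost / average cost, and the $10/$13 bundle inference assumes the price is additive in X capacity and Y congestion.

```python
import math


def capacity_cost(capacity: float, k: float = 1.0) -> float:
    """Hypothetical cost curve: cost grows as the square root of capacity."""
    return k * math.sqrt(capacity)


def elasticity_of_scale(cost, capacity: float, h: float = 1e-3) -> float:
    """Return e, where 1/e = marginal cost / average cost (so e = AC / MC).

    The marginal cost is estimated with a central finite difference.
    """
    marginal = (cost(capacity + h) - cost(capacity - h)) / (2 * h)
    average = cost(capacity) / capacity
    return average / marginal


e = elasticity_of_scale(capacity_cost, capacity=100.0)
p = 1 / e                        # usage share of capacity-related revenue
capacity_related = 0.60          # Bob's example: 60% of costs are capacity-related
usage_share_of_revenue = p * capacity_related

# Bundle prices from the message, assuming the price is additive in X and Y:
#   X + Y  = $10/month
#   X + 2Y = $13/month
price_y = 13 - 10                # the extra Y accounts for the difference
price_x = 10 - price_y

print(f"e = {e:.3f}, p = {p:.3f}, usage share of total revenue = {usage_share_of_revenue:.0%}")
print(f"implied prices: X = ${price_x}, Y = ${price_y}")
```

For a square-root cost curve the ratio AC / MC is exactly 2 at any capacity, which is why the "half of the 60%" figure (30% of total revenue) does not depend on the placeholder values chosen here.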
Bob Briscoe wrote: > Vadim, > > At 00:47 14/01/2007, Vadim Antonov wrote: > >> Dado - ISPs are not interested in reducing amount of traffic; quite >> opposite. It is their product, and as any producer they are >> interested in >> increasing volume - if you remember Econ 101, in the long term the >> profitability of all kinds of businesses tends to converge to the same >> norm. (Business segments with higher-than-average ROI attract more >> invenstments - and competition, thus reducing profitability; >> underperforming segments lose capital and consequently have less >> competitive pressure, thus allowing increase in profitability). >> >> In the established markets, where the initial period of rapid growth (on >> the S-curve) is over, the only sustainable way to make more money and >> increase value of business shares is to increase volume. > > Agreed (assuming by 'established' you mean highly competitive / > commoditised) > >> So it makes no sense for ISPs whatsoever to penalize users for causing >> congestion (thus reducing the demand). Instead, they want to encourage >> users to pay more for bigger share of the network resources - the >> congestion is their friend, if they can differentiate service (who would >> pay for premium service when regular service is quite good?) > > [First, a caveat: I'm going to talk in terms of charging for > congestion, as that's how your conversation started. However, limiting > a customer's congestion is probably much more acceptable than charging > for it, and I say below why the two are equivalent.] > > It only /seems/ to make no sense for ISPs to penalize users for > causing congestion on a superficial first look. > > Econ 101 also says that a business doesn't want to supply a customer > if the cost of supply is higher than that which the customer is > willing to pay. Those customers willing to pay for congestion are > saying "if you supplied more capacity I'd pay for it". 
Those unwilling > to pay for congestion are saying "ok, you've hit my limit, I don't > actually want more capacity so much that I'd be willing to pay as much > as it will cost you to provide it". > > The key to this is to understand that congestion charges /complement/ > capacity subscription charges - it certainly wouldn't make sense to > /only/ charge for congestion whilst not charging subscriptions. The > idea isn't that an ISP adds congestion charges on top of > subscriptions. Increasing one should reduce the other, so that overall > the user pays the same. It's just a question of tying a proportion of > the charge to the user's traffic behaviour. > > In fact, if you'd attended the Econ 103 class ;) you would have been > able to predict what the usage proportion will tend to in a > competitive market. Let's say an ISP's costs are 60% capacity-related > and 40% operational costs (faults, customer service, marketing, > billing and so on). We're only concerned here with the 60%. It turns > out that an ISP's most competitive strategy will be to get proportion > p of its capacity-related revenues (the 60%) from usage using this > simple formula: > > p = 1/e > > where e is the elasticity of scale. e measures how the cost of > capacity flattens out as the ISP buys more (aka. economy of scale). > > This formula comes from Hal Varian & Jeffrey MacKie-Mason's seminal > 1995 paper "Pricing Congestible Network Resources". It comes from > optimising what an ISP would do in a scenario where ISPs all compete > by charging the same in total, but varying the proportions due to > usage vs. subscription. If you want to argue that pricing congestion > makes no sense in networks, that's the paper to argue against. No-one > has successfully done that, so good luck. > > One (unpublished) study found the cost of optical capacity (interface > cards and links) rose with capacity by about a square-root law, > implying e=2. 
If everything were optical and the market was perfectly > competitive, this would imply a successful ISP would aim to tie 30% of > its revenue to usage (if capacity related costs are 60%, half should > be usage). > > So what's the intuition behind all this? I think you will agree ISPs > will probably want to limit the amount of congestion one user can > cause. Otherwise that customer reduces the value of the ISP's business > for all the other customers. If the ISP doesn't limit each user's > ability to cause congestion, customers will switch to another ISP that > does. To be competitive, an ISP does well to aim for p=1/e. > > Returning to my initial caveat: Why is limiting congestion equivalent > to charging for it? > > Many people prefer a 'fixed price contract' to 'pay as you go'. > Basically the ISP would be saying "For your $10/month, you're getting > up to X capacity and up to Y congestion." It's not actually saying X > capacity costs $7 and Y congestion costs $3. But you would be able to > infer the internal prices the ISP is using for X and Y if the same ISP > also sells X capacity and 2Y congestion for say $13/month. > > Many people think congestion charging is all theoretical clap trap. > However, it's actually what is happening all around us already. > However, in practice, an ISP can't measure how much congestion each > user causes in other networks. So instead we see various attempts to > limit congestion using other more convenient levers: > - Volume caps are one crude proxy for congestion. > - DPI against p2p is a really crude attempt to limit congestion. > - TCP congestion control is the nearest we have to a perfect example > of congestion charging. Except it's a voluntary reduction in rate /as > if/ the TCP algorithm were being charged for congestion. But it's > certainly not perfect (my fairness-religion I-D that John W mentioned > explains why TCP is myopic in time and myopic across flows). 
> > BTW, I've posted a more convenient version of the paper in CCR On-line > that prints in 10pp (not 32pp of I-D format bloat). I've also updated > it to say specifically what's wrong with the fairness in TCP, TFRC, > WFQ and XCP as examples: > > > >> Also, congested network is the network operating at full capacity - >> meaning that there is no overinvestment. If a provider has underloaded >> network it, basically, means that its business people made a mistake and >> overinvested (driving ROI - and share prices - lower). > > Congestion is excess load over offered load (another way of saying > loss rate). Loosely, you can think of congestion charges as the part > of the charge that pays for the capacity needed to serve the traffic > that isn't being served (what Econ 101 calls 'marginal cost of > capacity'). Subscriptions recover past investment in capacity. > Together they cover the average cost of capacity. In fact economists > usually calcualte elasticity of scale from > > 1/e = marginal cost / average cost, > > which is why 1/e = p, the proportion of congestion charge to total > charge. > > > In summary, it makes absolute sense for ISPs to limit congestion, > which is equivalent to setting aside part of the monthly charge as if > they are charging for congestion. > > Cheers > > > Bob > > > ____________________________________________________________________________ > > Bob Briscoe, Networks Research Centre, BT > Research > B54/77 Adastral Park,Martlesham Heath,Ipswich,IP5 3RE,UK. +44 1473 > 645196 > > > From ihsanqazi at gmail.com Wed Jan 24 17:04:26 2007 From: ihsanqazi at gmail.com (Ihsan Qazi) Date: Wed, 24 Jan 2007 20:04:26 -0500 Subject: [e2e] TCP and bi-directional traffic Message-ID: Hi everyone, I have a question on which I would like to get some comments. To what extent the current analytic models of TCP accurately capture the (real) behaviour of TCP? 
Does there exist a body of work which analytically characterizes the TCP latency and throughput taking into consideration the effects of bi-directional traffic (factors like ACK Compression, reduced forward path capacity due to the presence of ACKs etc) on TCP flows? I am aware of some observational studies and some work related to mitigating the effects of ACK Compression and asymmetric links (e.g. prioritizing ACKs, applying backpressure, connection-level bandwidth allocation schemes etc) but my question pertains to analytical work. Thanks in advance. Ihsan -- Ihsan Ayyub Qazi PhD Student, Department of Computer Science 6803 Sennott Square, University of Pittsburgh Pittsburgh, PA 15260 WWW: http://www.cs.pitt.edu/~ihsan From detlef.bosau at web.de Mon Jan 29 12:48:47 2007 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 29 Jan 2007 21:48:47 +0100 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? Message-ID: <45BE5DAF.5040701@web.de> My apologies for this question, perhaps it's simple: In TCP, lost / dropped packets are recognised as congestion indicator. We don't do so with missing ACKs. Consider the following net: (downstream:) T T T T T T T T T Sender Receiver (upstream: ) AAAAAAAAAA Then the flow occupies the cumulated capacity of T(CP packets) and A(CK packets). If CWND grows too large (by probing) and the available path capacity is exceeded, packet drop occurs. If a TCP packet is dropped, this is recognized as a congestion indication. Shouldn't a dropped ACK packet be seen as a congestion indication as well? Perhaps, this question is a bit stupid, but I don't see the clue here at the moment. Perhaps, someone could help me please? Thanks!
Detlef From baruch at ev-en.org Tue Jan 30 00:01:35 2007 From: baruch at ev-en.org (Baruch Even) Date: Tue, 30 Jan 2007 10:01:35 +0200 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <45BE5DAF.5040701@web.de> References: <45BE5DAF.5040701@web.de> Message-ID: <20070130080135.GP22455@galon.ev-en.org> * Detlef Bosau [070129 23:29]: > My apologies for this question, perhaps it's simple: > > In TCP, lost / dropped packets are recognised as congestion indicator. > We don't do so with missing ACKs. > > Consider the following net: > > > (downstream:) T T T T T T T T T > Sender Receiver > (upstream: ) AAAAAAAAAA > > > Then the flow occupies the cumulated capacity of T(CP packets) and A(CK packets). > > If CWND grows too large (by probing) and the available path capacity is exceeded, packet drop occurs. > If a TCP packet is dropped, this is reckognized as congestion indication. Shouldn't be a dropped ACK packet seen as > congestion indication as well? How would you go about detecting that an ACK was lost? TCP packet loss is detected by receiving repeated ACKs carrying the same acknowledgment number or by packets with SACK information. ACKs might not be sent for each TCP packet: delayed ACKs can be, and are, sent all around the net, and they usually acknowledge two or more packets. Linux sometimes takes its time and has been seen to acknowledge 7 packets with a single ACK. And then there is a (more important) question of why you would consider a lost ACK to be a congestion event at all. A congestion event means that we are pushing too much data through the link and we should slow down, but the ACK packets normally carry no payload, so the only congestion signal should be in the direction that the payload is flowing. Rarely does the protocol have bidirectional data transfers (the lovely days of bimodem compared to zmodem!), and then congestion is detected in each direction independently.
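Baruch's point can be illustrated with a toy model. It is written for illustration only and is not code from any TCP stack: the function name and its parameters are hypothetical, the delayed-ACK timer is ignored, and ACK values count segments rather than bytes. Because ACKs are cumulative and may be delayed, a receiver that ACKs every segment over a lossy reverse path can hand the sender exactly the same ACK stream as a receiver that delays its ACKs over a clean path.

```python
from typing import List, Set


def acks_seen_by_sender(num_segments: int, ack_every: int, lost_acks: Set[int]) -> List[int]:
    """Cumulative ACK values (in segments) that actually reach the sender.

    Simplified model: segments 1..num_segments arrive in order; the receiver
    emits one cumulative ACK after every `ack_every` segments (a delayed-ACK
    policy with no timer); the k-th ACK is dropped on the reverse path if k is
    in `lost_acks`.
    """
    seen = []
    ack_count = 0
    for seg in range(1, num_segments + 1):
        if seg % ack_every == 0:
            ack_count += 1
            if ack_count not in lost_acks:
                seen.append(seg)  # "everything through segment `seg` received"
    return seen


# Receiver ACKs every segment, but the 1st, 3rd and 5th ACKs are lost upstream.
lossy = acks_seen_by_sender(6, ack_every=1, lost_acks={1, 3, 5})
# Receiver delays ACKs (one per two segments) and nothing is lost.
delayed = acks_seen_by_sender(6, ack_every=2, lost_acks=set())

print(lossy, delayed)
```

Both runs produce the same observations at the sender, so a rule that treated "fewer ACKs than segments" as congestion would either fire in both cases (a false alarm in the delayed-ACK case) or in neither.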
There are cases of asymmetric links that might cause trouble, but that will only serve to slow down the payload direction as well, since packets are released to the network only when acks come back, so a lost ack will already slow down the rate of the payload, just not by cutting the cwnd in half. Baruch From baruch at ev-en.org Tue Jan 30 01:58:17 2007 From: baruch at ev-en.org (Baruch Even) Date: Tue, 30 Jan 2007 11:58:17 +0200 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <79EF774E-E9E6-48C0-9568-254A397CCD07@cisco.com> References: <45BE5DAF.5040701@web.de> <20070130080135.GP22455@galon.ev-en.org> <79EF774E-E9E6-48C0-9568-254A397CCD07@cisco.com> Message-ID: <20070130095817.GQ22455@galon.ev-en.org> * Fred Baker [070130 10:52]: > > On Jan 30, 2007, at 12:01 AM, Baruch Even wrote: > > >There are cases of asymmetric links that might cause trouble, but that > >will only serve to slow down the payload direction as well since packets > >are released to the network only when acks come back, so a lost ack will > >already slow down the rate of the payload, just not by cutting the cwnd > >to half. > > actually, one can argue that it speed the payload up, or that it causes it to burst. If I have octets 10000..20000 > outstanding, receive an ack for 10000-11999, and drop one for 12000-13999, and now receive an ack indicating that my > peer has received "through 15999", that looks to me like an ack for 12000-15999, and I should send a burst of that size. I don't know about other OSes, but in Linux that's not the case. Linux will send up to 3 packets IIRC. So if we get an ack for more than 3 packets we will only send 3 packets and "lose" the extra credit for now. Of course, the next packet that acks two packets will cause three packets to be sent as well, so it does slow us down some and also causes larger micro-bursts.
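Fred's burst arithmetic and the capped behaviour Baruch describes can be sketched as follows. This is a simplified model, not the Linux implementation: the 1000-byte MSS is an assumption (the thread gives no segment size), the cap of 3 segments comes from Baruch's "up to 3 packets IIRC", the helper name is hypothetical, and congestion-window growth is ignored.

```python
from typing import Optional, Tuple


def segments_released(prev_ack: int, new_ack: int, mss: int,
                      burst_cap: Optional[int] = None) -> Tuple[int, int]:
    """Return (bytes newly acknowledged, segments sent in response).

    `prev_ack` and `new_ack` are TCP acknowledgment numbers, i.e. the next byte
    the receiver expects. Without a cap, every freed MSS-sized slot is refilled
    at once (a burst); with `burst_cap`, at most that many segments go out and
    the remaining credit is left for later ACKs.
    """
    newly_acked = new_ack - prev_ack
    freed = newly_acked // mss
    to_send = freed if burst_cap is None else min(freed, burst_cap)
    return newly_acked, to_send


# Fred's example: bytes through 11999 already acked (ACK number 12000); the ACK
# for 12000-13999 is lost; the next ACK says "received through 15999" (16000).
MSS = 1000  # assumed segment size, not given in the thread
print(segments_released(12000, 16000, MSS))               # (4000, 4): uncapped burst
print(segments_released(12000, 16000, MSS, burst_cap=3))  # (4000, 3): capped
```

The lost ACK turns two 2000-byte releases into one 4000-byte release; the cap trades that burst for a short-term slowdown, which matches the effect Baruch reports.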
Baruch From lachlan.andrew at gmail.com Mon Jan 29 14:18:09 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Mon, 29 Jan 2007 14:18:09 -0800 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <45BE5DAF.5040701@web.de> References: <45BE5DAF.5040701@web.de> Message-ID: Greetings Detlef, On 29/01/07, Detlef Bosau wrote: > > In TCP, lost / dropped packets are recognised as congestion indicator. > We don't do so with missing ACKs. > > If a TCP packet is dropped, this is reckognized as congestion > indication. Shouldn't be a dropped ACK packet seen as congestion > indication as well? Because ACKs are cumulative, we don't know that separate ACKs were sent for each packet. For example, high-end NICs typically have "interrupt coalescence", which delivers a large bunch of packets simultaneously to reduce CPU overhead. A single "fat ACK" is sent which cumulatively acknowledges all of these packets. This happens even when the receiver is not congested. Another factor is that ACKs are typically small compared with data packets. The total network throughput is much greater if we throttle only the sources contributing most to a given link's congestion, namely those sending full data packets over the link. Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From fred at cisco.com Mon Jan 29 14:38:04 2007 From: fred at cisco.com (Fred Baker) Date: Mon, 29 Jan 2007 14:38:04 -0800 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <45BE5DAF.5040701@web.de> References: <45BE5DAF.5040701@web.de> Message-ID: missing acks are indeed an indicator of something, but it may not be forward path congestion. In asymmetric circuits, for example, it is often an indicator of reverse path congestion.
e.g., if I have 100 KBPS up and 1000 KBPS down, I might use up the 100 KBPS before I use up the 1000 KBPS. Some research I read a few years back suggested that in such cases it might be interesting to use last-in-first-out queuing on the slower speed path, with a view to letting the later-and-more-inclusive ack get through first and eat the bypassed ones later, just to keep the forward path going. One of the criticisms of FAST TCP is that it is susceptible to reverse path congestion. and in any event, I can think of many networks in which loss is an indicator of nothing more than loss. Just say "radio"... On Jan 29, 2007, at 12:48 PM, Detlef Bosau wrote: > My apologies for this question, perhaps it's simple: > > In TCP, lost / dropped packets are recognised as congestion indicator. > We don't do so with missing ACKs. > > Consider the following net: > > > (downstream:) T T T T T T T T T > Sender > Receiver > (upstream: ) AAAAAAAAAA > > > Then the flow occupies the cumulated capacity of T(CP packets) and A > (CK packets). > > If CWND grows too large (by probing) and the available path > capacity is exceeded, packet drop occurs. > If a TCP packet is dropped, this is reckognized as congestion > indication. Shouldn't be a dropped ACK packet seen as congestion > indication as well? > > Perhaps, this question is a bit stupid, but I don't see the clue > here at the moment. Perhaps, someone could help me please? > > Thanks! > > Detlef > > From fred at cisco.com Tue Jan 30 00:42:44 2007 From: fred at cisco.com (Fred Baker) Date: Tue, 30 Jan 2007 00:42:44 -0800 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion?
In-Reply-To: <20070130080135.GP22455@galon.ev-en.org> References: <45BE5DAF.5040701@web.de> <20070130080135.GP22455@galon.ev-en.org> Message-ID: <79EF774E-E9E6-48C0-9568-254A397CCD07@cisco.com> On Jan 30, 2007, at 12:01 AM, Baruch Even wrote: > There are cases of asymmetric links that might cause trouble, but that > will only serve to slow down the payload direction as well since > packets > are released to the network only when acks come back, so a lost ack > will > already slow down the rate of the payload, just not by cutting the > cwnd > to half. actually, one can argue that it speeds the payload up, or that it causes it to burst. If I have octets 10000..20000 outstanding, receive an ack for 10000-11999, and drop one for 12000-13999, and now receive an ack indicating that my peer has received "through 15999", that looks to me like an ack for 12000-15999, and I should send a burst of that size. From Jon.Crowcroft at cl.cam.ac.uk Wed Jan 31 12:23:51 2007 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Wed, 31 Jan 2007 20:23:51 +0000 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: Message from "Lachlan Andrew" of "Mon, 29 Jan 2007 14:18:09 PST." Message-ID: it's clear we should devise a scheme for disguising data packets as acks as they'd 1/ advance the congestion window and so on 2/ get higher priority than data packets otoh, how do we do this - compression, perhaps? how well would VJ's compressed tcp/ip headers scale over multiple hops? interesting to think about state recovery (a la nat state recovery) too... also, what would happen if this was typical behaviour? virtual circuit IP? MPLS on IP? who knows? In missive , "Lachlan Andrew" typed: >>Greetings Detlef, >> >>On 29/01/07, Detlef Bosau wrote: >>> >>> In TCP, lost / dropped packets are recognised as congestion indicator. >>> We don't do so with missing ACKs.
>>> >>> If a TCP packet is dropped, this is reckognized as congestion >>> indication. Shouldn't be a dropped ACK packet seen as congestion >>> indication as well? >> >>Because ACKs are cumulative, we don't know that separate ACKs were >>sent for each packet. >> >>For example, high-end NICs typically have "interrupt coalescence", >>which delivers a large bunch of packets simultaneously to reduce CPU >>overhead. A single "fat ACK" is sent which cumulatively acknowledges >>all of these packets. This happens even when the receiver is not >>congested. >> >> >>Another factor is that ACKs are typically small compared with data >>packets. The total network throughput is much greater if we throttle >>only the sources contributing most to a given link's congestion, >>namely those sending full data packets over the link. >> >>Cheers, >>Lachlan >> >>-- >>Lachlan Andrew Dept of Computer Science, Caltech >>1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA >>Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 >> cheers jon From rewaskar at email.unc.edu Wed Jan 31 13:02:41 2007 From: rewaskar at email.unc.edu (Sushant Rewaskar) Date: Wed, 31 Jan 2007 16:02:41 -0500 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: References: <45BE5DAF.5040701@web.de> Message-ID: <002c01c7457b$2580f210$7a850298@cs.unc.edu> Hi, I agree with Lachlan. In TCP there is no way to know when an ack is lost, as it carries no "sequence number" of its own. (So in fact, not only is it not done, it cannot easily be done in the current set-up.)
To get a better understanding of these issues you may want to read the string of papers and RFCs on the Datagram Congestion Control Protocol (DCCP) (http://www.read.cs.ucla.edu/dccp/ ) Take care, Sushant Rewaskar ----------------------------- UNC Chapel Hill www.cs.unc.edu/~rewaskar -----Original Message----- From: end2end-interest-bounces at postel.org [mailto:end2end-interest-bounces at postel.org] On Behalf Of Lachlan Andrew Sent: Monday, January 29, 2007 5:18 PM To: Detlef Bosau Cc: end2end-interest at postel.org Subject: Re: [e2e] Stupid Question: Why are missing ACKs not considered asindicator for congestion? Greetings Detlef, On 29/01/07, Detlef Bosau wrote: > > In TCP, lost / dropped packets are recognised as congestion indicator. > We don't do so with missing ACKs. > > If a TCP packet is dropped, this is reckognized as congestion > indication. Shouldn't be a dropped ACK packet seen as congestion > indication as well? Because ACKs are cumulative, we don't know that separate ACKs were sent for each packet. For example, high-end NICs typically have "interrupt coalescence", which delivers a large bunch of packets simultaneously to reduce CPU overhead. A single "fat ACK" is sent which cumulatively acknowledges all of these packets. This happens even when the receiver is not congested. Another factor is that ACKs are typically small compared with data packets. The total network throughput is much greater if we throttle only the sources contributing most to a given link's congestion, namely those sending full data packets over the link. Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From L.Wood at surrey.ac.uk Wed Jan 31 14:41:00 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Wed, 31 Jan 2007 22:41:00 +0000 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion?
In-Reply-To: <002c01c7457b$2580f210$7a850298@cs.unc.edu> References: <45BE5DAF.5040701@web.de> <002c01c7457b$2580f210$7a850298@cs.unc.edu> Message-ID: <200701312241.WAA09104@cisco.com> At Wednesday 31/01/2007 16:02 -0500, Sushant Rewaskar wrote: >Hi, >I agree with Lachlan. In TCP there is no way to know when an ack is lost as >it carries no "sequence number" of its own. It can - timestamps are used for disambiguation, and they disambiguate the acks. They can act as unique sequence numbers. (In fact, you wouldn't naively issue a timestamp, and expect the other end to copy and reflect it in an ack, as that's open to a variety of DoS attacks. The sender would have a table of timestamp times, with unique keys for each timestamp, and the sender would send out and look for the key in the timestamp option field. To get a better understanding of these issues you may want to read RFC1323.) It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for. Stupid question: why is a missing ack presumed to automatically be due to congestion, rather than link errors along the path? L. >(so in fact not only it is not >done but it cannot be easily done in the current set-up). > >To get a better understanding of these issues you may want to read the >string of papers and RFC on Datagram Congestion Control Protocol (DCCP) >(http://www.read.cs.ucla.edu/dccp/ ) > > >Take care, >Sushant Rewaskar >----------------------------- >UNC Chapel Hill >www.cs.unc.edu/~rewaskar > > >-----Original Message----- >From: end2end-interest-bounces at postel.org >[mailto:end2end-interest-bounces at postel.org] On Behalf Of Lachlan Andrew >Sent: Monday, January 29, 2007 5:18 PM >To: Detlef Bosau >Cc: end2end-interest at postel.org >Subject: Re: [e2e] Stupid Question: Why are missing ACKs not considered >asindicator for congestion? 
> >Greetings Detlef, > >On 29/01/07, Detlef Bosau wrote: >> >> In TCP, lost / dropped packets are recognised as congestion indicator. >> We don't do so with missing ACKs. >> >> If a TCP packet is dropped, this is reckognized as congestion >> indication. Shouldn't be a dropped ACK packet seen as congestion >> indication as well? > >Because ACKs are cumulative, we don't know that separate ACKs were >sent for each packet. > >For example, high-end NICs typically have "interrupt coalescence", >which delivers a large bunch of packets simultaneously to reduce CPU >overhead. A single "fat ACK" is sent which cumulatively acknowledges >all of these packets. This happens even when the receiver is not >congested. > > >Another factor is that ACKs are typically small compared with data >packets. The total network throughput is much greater if we throttle >only the sources contributing most to a given link's congestion, >namely those sending full data packets over the link. > >Cheers, >Lachlan > >-- >Lachlan Andrew Dept of Computer Science, Caltech >1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA >Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From L.Wood at surrey.ac.uk Wed Jan 31 14:43:03 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Wed, 31 Jan 2007 22:43:03 +0000 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: References: Message-ID: <200701312243.WAA09233@cisco.com> At Wednesday 31/01/2007 20:23 +0000, Jon Crowcroft wrote: >its clear we should devise a schmee for disguising data packets as acks which is what piggybacking acks on data packets already does. (ns one-way tcp doesn't simulate this. Try Fulltcp.) >a they'd >1/ advance the congestion window and so on >2/ get highrer priority than data packets > >otoh, how do we do this - compression, perhaps? how well would VJ's compressed >tcp./ip headers scale over multiple hops?
interesting to think about state >recovery (a la nat state recovery) too... > >also, what would happen if this was typical behaviour? virtual circuit IP? >MPLS on IP? who knows? who cares? >In missive , "Lachlan Andrew" >typed: > > >>Greetings Detlef, > >> > >>On 29/01/07, Detlef Bosau wrote: > >>> > >>> In TCP, lost / dropped packets are recognised as congestion indicator. > >>> We don't do so with missing ACKs. > >>> > >>> If a TCP packet is dropped, this is recognized as congestion > >>> indication. Shouldn't a dropped ACK packet be seen as congestion > >>> indication as well? > >> > >>Because ACKs are cumulative, we don't know that separate ACKs were > >>sent for each packet. > >> > >>For example, high-end NICs typically have "interrupt coalescence", > >>which delivers a large bunch of packets simultaneously to reduce CPU > >>overhead. A single "fat ACK" is sent which cumulatively acknowledges > >>all of these packets. This happens even when the receiver is not > >>congested. > >> > >> > >>Another factor is that ACKs are typically small compared with data > >>packets. The total network throughput is much greater if we throttle > >>only the sources contributing most to a given link's congestion, > >>namely those sending full data packets over the link. > >> > >>Cheers, > >>Lachlan > >> > >>-- > >>Lachlan Andrew Dept of Computer Science, Caltech > >>1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA > >>Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 > >> > > cheers > > jon From acaro at bbn.com Wed Jan 31 14:55:52 2007 From: acaro at bbn.com (Armando L. Caro, Jr.) Date: Wed, 31 Jan 2007 17:55:52 -0500 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: References: <45BE5DAF.5040701@web.de> Message-ID: <45C11E78.4030203@bbn.com> Fred Baker wrote: > and in any event, I can think of many networks in which loss is an > indicator of nothing more than loss. Just say "radio"...
That might not always be true. For simplicity, let's assume a single wireless link in the end-to-end path. If that link does L2 retransmissions, loss on the radio channel will build up a queue at L2. Now if the endpoints are seeing loss at L4, then that means the loss was so bad that multiple L2 retransmissions were unsuccessful... which implies a larger queue. Thus, the sender should back off, just as it would if it experienced a loss on a wired network. -- Armando From L.Wood at surrey.ac.uk Wed Jan 31 17:40:56 2007 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Thu, 01 Feb 2007 01:40:56 +0000 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: References: <45BE5DAF.5040701@web.de> <002c01c7457b$2580f210$7a850298@cs.unc.edu> <200701312241.WAA09104@cisco.com> Message-ID: <200702010141.BAA15760@cisco.com> At Wednesday 31/01/2007 16:34 -0800, Lachlan Andrew wrote: >Greetings Lloyd, > >On 31/01/07, Lloyd Wood wrote: >>It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for. > >No, that was my point. We can't distinguish between ACKs which are >lost and those which are never sent in the first place. Yes, we can. If a SACK block is present, it tells you which datagrams were and weren't received. If a datagram was received, an ack was sent (modulo the delack mechanism), and the datagram will not be called out in the SACK block. If the datagram wasn't received, this will be reflected in the SACK block. >Also, having a unique identifier (like a timestamp) isn't the same as >having sequence numbers which can say "We're (not) consecutive". The >latter can detect loss but the former can't. If you have timestamps on every ack and packet, what's the difference? 
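A minimal sketch of the inference Lloyd describes, and of the limit Lachlan points out, in Python. It assumes a simplified sender that tags every segment with a hypothetical opaque key (standing in for the timestamp-key table Lloyd proposes earlier in the thread), a receiver that echoes the key of the segment that triggered each ACK, and a SACK field reduced to a plain set of out-of-order segment numbers. The names `Ack` and `unaccounted_segments` are illustrative only, not taken from any TCP stack. The function finds segments that were provably received but never individually acknowledged; each of those is either a lost ACK or an ACK that was never sent, and the sketch cannot tell which without assuming the receiver's ACK policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ack:
    cum_ack: int                   # every segment numbered below this was received
    echoed_key: int                # hypothetical opaque key copied from the triggering segment
    sack: frozenset = frozenset()  # simplified SACK: segments >= cum_ack received out of order


def unaccounted_segments(sent_keys: dict, acks: list) -> set:
    """Return segments the receiver provably got whose own ACK never arrived.

    Each result is a lost ACK *or* an ACK the receiver never sent (delayed
    ACK, interrupt coalescence); this function cannot tell which.
    """
    key_to_seg = {key: seg for seg, key in sent_keys.items()}
    received = set()
    individually_acked = set()
    for ack in acks:
        received.update(seg for seg in sent_keys if seg < ack.cum_ack)
        received.update(ack.sack)
        if ack.echoed_key in key_to_seg:  # ignore keys the sender never issued
            individually_acked.add(key_to_seg[ack.echoed_key])
    return received - individually_acked


# Six segments, each tagged with an illustrative key.
sent = {seg: 1000 + seg for seg in range(6)}

# Receiver ACKs every segment; the ACK for segment 2 is dropped on the
# return path, and data segment 4 is dropped on the forward path.
every_segment = [
    Ack(cum_ack=1, echoed_key=1000),
    Ack(cum_ack=2, echoed_key=1001),
    Ack(cum_ack=4, echoed_key=1003),
    Ack(cum_ack=4, echoed_key=1005, sack=frozenset({5})),
]
print(unaccounted_segments(sent, every_segment))  # {2}

# Same data, but the receiver delays ACKs and only ACKs every second segment.
delayed = [
    Ack(cum_ack=2, echoed_key=1001),
    Ack(cum_ack=4, echoed_key=1003),
    Ack(cum_ack=4, echoed_key=1005, sack=frozenset({5})),
]
print(unaccounted_segments(sent, delayed))  # {0, 2}
```

With per-segment ACKing the single unaccounted segment is exactly the dropped ACK. Under delayed ACKs, segments 0 and 2 show up as unaccounted even though no ACK was lost, which is the ambiguity Lachlan raises; the sender only gets a loss signal out of this if it knows (or assumes) how often the receiver ACKs.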
>Cheers, >Lachlan > >-- >Lachlan Andrew Dept of Computer Science, Caltech >1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA >Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From michael.welzl at uibk.ac.at Wed Jan 31 23:14:15 2007 From: michael.welzl at uibk.ac.at (Michael Welzl) Date: 01 Feb 2007 08:14:15 +0100 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <200702010141.BAA15760@cisco.com> References: <45BE5DAF.5040701@web.de> <002c01c7457b$2580f210$7a850298@cs.unc.edu> <200701312241.WAA09104@cisco.com> <200702010141.BAA15760@cisco.com> Message-ID: <1170314055.4775.12.camel@lap10-c703.uibk.ac.at> > >On 31/01/07, Lloyd Wood wrote: > >>It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for. > > > >No, that was my point. We can't distinguish between ACKs which are > >lost and those which are never sent in the first place. > > Yes, we can. If a SACK block is present, it tells you which datagrams were and weren't received. > > If a datagram was received, an ack was sent (modulo the delack mechanism), and the datagram will not be called out in the SACK block. > > If the datagram wasn't received, this will be reflected in the SACK block. > > > >Also, having a unique identifier (like a timestamp) isn't the same as > >having sequence numbers which can say "We're (not) consecutive". The > >latter can detect loss but the former can't. > > If you have timestamps on every ack and packet, what's the difference? 
I think that these methods of ACK loss detection are interesting ideas, and there might be a way to intelligently combine them with what's already in http://www.icir.org/floyd/papers/draft-floyd-tcpm-ackcc-00d.txt Cheers, Michael From lachlan.andrew at gmail.com Wed Jan 31 16:34:51 2007 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Wed, 31 Jan 2007 16:34:51 -0800 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <200701312241.WAA09104@cisco.com> References: <45BE5DAF.5040701@web.de> <002c01c7457b$2580f210$7a850298@cs.unc.edu> <200701312241.WAA09104@cisco.com> Message-ID: Greetings Lloyd, On 31/01/07, Lloyd Wood wrote: > It's possible for the sender to infer that an ack has been lost, based on subsequent receiver behaviour in sending a cumulative ack including packets received that the sender didn't get individual acks for. No, that was my point. We can't distinguish between ACKs which are lost and those which are never sent in the first place. Also, having a unique identifier (like a timestamp) isn't the same as having sequence numbers which can say "We're (not) consecutive". The latter can detect loss but the former can't. Cheers, Lachlan -- Lachlan Andrew Dept of Computer Science, Caltech 1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603 From fred at cisco.com Wed Jan 31 23:32:41 2007 From: fred at cisco.com (Fred Baker) Date: Wed, 31 Jan 2007 23:32:41 -0800 Subject: [e2e] Stupid Question: Why are missing ACKs not considered as indicator for congestion? In-Reply-To: <45C11E78.4030203@bbn.com> References: <45BE5DAF.5040701@web.de> <45C11E78.4030203@bbn.com> Message-ID: <4D40438F-B763-4A16-83B5-6563A7357935@cisco.com> yes, there are cases in which it means congestion in a radio circuit. My point is that there are cases in which it means nothing of the kind. On Jan 31, 2007, at 2:55 PM, Armando L. Caro, Jr. 
wrote: > Fred Baker wrote: >> and in any event, I can think of many networks in which loss is an >> indicator of nothing more than loss. Just say "radio"... > > That might not always be true. For simplicity, let's assume a single > wireless link in the end-to-end path. If that link does L2 > retransmissions, loss on the radio channel will build up a queue at > L2. > Now if the endpoints are seeing loss at L4, then that means the > loss was > so bad that multiple L2 retransmissions were unsuccessful... which > implies a larger queue. Thus, the sender should back off, just as it > would if it experienced a loss on a wired network. > > -- > Armando
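A back-of-the-envelope model of Armando's argument, in Python. It assumes (this is not stated in the thread) that each L2 attempt fails independently with probability `p` and that the link gives up after `max_attempts` tries. Under those assumptions the loss TCP sees is p^max_attempts, and each packet occupies the link for (1 - p^max_attempts)/(1 - p) attempts on average, so the capacity left for new packets shrinks as L4-visible loss rises. Whether that shrinkage actually builds a queue depends on the offered load, which is where Fred's objection comes in. The link rate and load figures below are illustrative, not measurements.

```python
def residual_loss(p: float, max_attempts: int) -> float:
    """Probability that every L2 attempt fails, i.e. the loss seen at L4."""
    return p ** max_attempts


def expected_attempts(p: float, max_attempts: int) -> float:
    """Mean L2 transmissions per packet.

    Attempt k (1-based) happens only if the first k-1 failed, so the mean is
    sum_{k=0}^{max_attempts-1} p**k = (1 - p**max_attempts) / (1 - p).
    """
    return sum(p ** k for k in range(max_attempts))


def effective_capacity(link_rate_bps: float, p: float, max_attempts: int) -> float:
    """Rate available to new packets once retransmissions take their share of airtime."""
    return link_rate_bps / expected_attempts(p, max_attempts)


def queue_grows(offered_bps: float, link_rate_bps: float, p: float, max_attempts: int) -> bool:
    """True if the offered load exceeds the capacity left after L2 retries."""
    return offered_bps > effective_capacity(link_rate_bps, p, max_attempts)


if __name__ == "__main__":
    link = 2_000_000     # illustrative 2 Mbit/s radio link
    offered = 1_500_000  # illustrative offered load
    for p in (0.01, 0.1, 0.3, 0.5):
        print(f"p={p:<4}  L4 loss={residual_loss(p, 4):.2e}  "
              f"capacity={effective_capacity(link, p, 4) / 1e6:.2f} Mbit/s  "
              f"queue grows: {queue_grows(offered, link, p, 4)}")
```

With four attempts and a 1.5 Mbit/s load, the queue starts growing at p = 0.3, where TCP would see just under 1% loss, which fits Armando's reading. Drop the load to 1 Mbit/s and even p = 0.5 (6.25% L4 loss) leaves headroom, so the same residual loss is, in Fred's words, nothing more than loss.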