From anoop at brocade.com Fri May 7 14:24:28 2010
From: anoop at brocade.com (Anoop Ghanwani)
Date: Fri, 7 May 2010 14:24:28 -0700
Subject: [e2e] TCP implementations in various OS's
Message-ID: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>

Is there a website that lists out TCP implementation
details (such as default window scale, whether or not
SACK is implemented/enabled, etc.) by OS?

Right now I'm mainly interested in the Window Scale
option, but I'm sure there'll be a time when I'm
interested in some other parameter.

Thanks,
Anoop

From jasleen at cs.unc.edu Mon May 10 08:52:09 2010
From: jasleen at cs.unc.edu (Jasleen Kaur)
Date: Mon, 10 May 2010 11:52:09 -0400
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>
Message-ID: <4BE82BA9.9050409@cs.unc.edu>

Anoop,

Although it doesn't contain details about the window-scale option, our
ICNP'07 paper lists some other parameters:

S. Rewaskar, J. Kaur, and F.D. Smith, "A Performance Study of Loss
Detection/Recovery in Real-world TCP Implementations", in Proceedings of
the IEEE International Conference on Network Protocols (ICNP'07),
Beijing, China, Oct 2007.
http://www.cs.unc.edu/~jasleen/papers/icnp07.pdf

Thanks,
Jasleen

Anoop Ghanwani wrote:
> Is there a website that lists out TCP implementation
> details (such as default window scale, whether or not
> SACK is implemented/enabled, etc.) by OS?
>
> Right now I'm mainly interested in the Window Scale
> option, but I'm sure there'll be a time when I'm
> interested in some other parameter.
>
> Thanks,
> Anoop
>
>

From hagen at jauu.net Mon May 10 15:17:12 2010
From: hagen at jauu.net (Hagen Paul Pfeifer)
Date: Tue, 11 May 2010 00:17:12 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>
Message-ID: <20100510221712.GD24715@nuttenaction>

* Anoop Ghanwani | 2010-05-07 14:24:28 [-0700]:

>Is there a website that lists out TCP implementation
>details (such as default window scale, whether or not
>SACK is implemented/enabled, etc.) by OS?
>
>Right now I'm mainly interested in the Window Scale
>option, but I'm sure there'll be a time when I'm
>interested in some other parameter.

You can reverse engineer nmap's OS database [0]; the results caught in
T1 are of particular interest. That way you get a nearly complete picture
of the window scale option used in the wild.

Cheers, Hagen

TIP: use perl, read in paragraph mode, split each paragraph into an array
at newline boundaries and use a regex to filter the relevant info.

[0] /usr/share/nmap/nmap-os-db

-- 
Hagen Paul Pfeifer || http://jauu.net/
Telephone: +49 174 5455209 || Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
Always in motion, the future is.

From detlef.bosau at web.de Tue May 11 09:33:40 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Tue, 11 May 2010 18:33:40 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com>
Message-ID: <4BE986E4.6070105@web.de>

Anoop Ghanwani wrote:
> Is there a website that lists out TCP implementation
> details (such as default window scale, whether or not
> SACK is implemented/enabled, etc.) by OS?
>
> Right now I'm mainly interested in the Window Scale
> option, but I'm sure there'll be a time when I'm
> interested in some other parameter.
>
> Thanks,
> Anoop
>
>

I'm curious why some people are interested in a big variety of TCP
implementations and parameter settings.

Wouldn't it improve compatibility and interoperability if we choose
algorithms and parameters according to the recommendations given in the
RFC?

-- 
Detlef Bosau
Galileistraße 30
70565 Stuttgart
phone: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de

From lachlan.andrew at gmail.com Tue May 11 18:22:53 2010
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Wed, 12 May 2010 11:22:53 +1000
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BE986E4.6070105@web.de>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de>
Message-ID:

On 12 May 2010 02:33, Detlef Bosau wrote:
>
> I'm curious why some people are interested in a big variety of TCP
> implementations and parameter settings.
>
> Wouldn't it improve compatibility and interoperability if we choose
> algorithms and parameters according to the recommendations given in the RFC?

Greetings Detlef,

If the RFCs were perfect and everyone used the recommended parameters,
then that would give the best system, and be a "Nash equilibrium".
However, if many people already use non-recommended parameters, then
compatibility isn't necessarily maximised by using the recommended
values.

The RFCs' SHOULDs are typically not MUSTs for a reason. It is
interesting to know which options people choose for many reasons:
1. so that we can "optimise for the common case".
2. many people say "we can't turn on feature X because it breaks
middle boxes", and it is interesting to know which systems actually
work despite having feature X turned on.
3. If an option must be supported by both ends, the current rate of
deployment affects the incremental benefit to one user of turning it
on.

Also, not everyone studying the Internet is trying to design it; there
is value in just understanding how it currently behaves.

$0.02,
Lachlan

-- 
Lachlan Andrew
Centre for Advanced Internet Architectures (CAIA)
Swinburne University of Technology, Melbourne, Australia
Ph +61 3 9214 4837

From hagen at jauu.net Wed May 12 00:28:50 2010
From: hagen at jauu.net (Hagen Paul Pfeifer)
Date: Wed, 12 May 2010 09:28:50 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To:
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de>
Message-ID: <20100512072850.GA10097@nuttenaction>

* Lachlan Andrew | 2010-05-12 11:22:53 [+1000]:

>On 12 May 2010 02:33, Detlef Bosau wrote:
>>
>> I'm curious why some people are interested in a big variety of TCP
>> implementations and parameter settings.
>>
>> Wouldn't it improve compatibility and interoperability if we choose
>> algorithms and parameters according to the recommendations given in the RFC?
>
>Greetings Detlef,
>
>If the RFCs were perfect and everyone used the recommended parameters,
>then that would give the best system, and be a "Nash equilibrium".
>However, if many people already use non-recommended parameters, then
>compatibility isn't necessarily maximised by using the recommended
>values.

I threat the query from Anoop pragmatically, currently he is only interested
in Window scaling. Window scaling underlies no MUST/SHOULD/COULD/WHATEVER
semantic. Sensor networks for example can disable window scaling where on the
other hand satellite/internet2 networks require the full range of scaling.

Window scaling is an option where the user can and should use the value that
fits into his particular environment. This generates a lot of varieties - but
with good cause.
;-)

Cheers, Hagen

-- 
Hagen Paul Pfeifer || http://jauu.net/
Telephone: +49 174 5455209 || Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22
Always in motion, the future is.

From detlef.bosau at web.de Wed May 12 02:30:10 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 12 May 2010 11:30:10 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To:
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de>
Message-ID: <4BEA7522.9060405@web.de>

Lachlan Andrew wrote:
> On 12 May 2010 02:33, Detlef Bosau wrote:
>
>> I'm curious why some people are interested in a big variety of TCP
>> implementations and parameter settings.
>>
>> Wouldn't it improve compatibility and interoperability if we choose
>> algorithms and parameters according to the recommendations given in the RFC?
>>
>
> Greetings Detlef,
>
> If the RFCs were perfect and everyone used the recommended parameters,
> then that would give the best system, and be a "Nash equilibrium".
> However, if many people already use non-recommended parameters, then
> compatibility isn't necessarily maximised by using the recommended
> values.
>

Correct. However, compatibility is not maximised by the use of
arbitrarily chosen parameters either.

I think this reflects the typical CS guy's attitude to standardization.
Drastically put, CS guys believe in applied chaos theory while e.g.
civil engineers take standards for divine commandments. (We all know the
fortune cookie concerning programmers, woodpeckers and builders.)

> The RFCs' SHOULDs are typically not MUSTs for a reason. It is
>

SHOULD is equivalent to "MUST by divine commandment", with the only
exception that in some extremely rare circumstances (Jesus returns,
Jesus II is born by a male virgin, the Pope becomes Lutheran....) there
may be some excuses to violate the recommendation.

There must be extremely compelling reasons to violate a SHOULD.
Personally, I'm convinced that any engineering discipline (and CS _is_
an engineering discipline) requires strict standards, otherwise it
simply wouldn't work.

One anecdote about a former president of the Federal Republic of Germany
is about an electric razor, which was a gift to our president from the
US ambassador - and which broke the first time it was used, simply
because our president failed to adjust the power supply. (We used
220 V / 50 Hz AC at that time, the US used 110 V / 60 Hz in the sixties?)

So, when a German president grows a beard, it may be a consequence of a
lack of standardization.

> interesting to know which options people choose for many reasons:
> 1. so that we can "optimise for the common case".
>

That's always a good idea for a general purpose protocol.

> 2. many people say "we can't turn on feature X because it breaks
> middle boxes",

Which generally should be used with great care. Most of all, it is up to
the middlebox not to break the protocol and it is not up to the protocol
to leave alone the middlebox.

> and it is interesting to know which systems actually
> work despite having feature X turned on.
>

O.k., it is always interesting why people live without brain %-)
(Claims that those do appear on a quite regular basis ;-))

I totally agree with you in a lab. I totally disagree in the field.
Although I well know that the deployment of the Internet was achieved by
quite some "experiments in the field".

However, the requirement is stated quite frequently that CS should
become an "adult science" some day and that CS guys should behave like
grown up engineers who serve the customer and the field and not like
toddlers in a playground.

> 3. If an option must be supported by both ends, the current rate of
> deployment affects the incremental benefit to one user of turning it
> on.
>

Absolutely. And this holds particularly true for the window scaling
parameter which was part of the original question.
(And the subject of a somewhat strange discussion I once had with a
colleague....)

And although I expect M$ to furiously contradict here, I think window
scaling should be discouraged for terrestrial TCP sessions. The case
where 65536 bytes are not sufficient here is extremely rare - in all
other cases the one guy sitting in Berkeley maintaining a TCP session to
a node in Hamburg will eventually exploit the queue memory of
intermediate nodes - and will cause severe grief for all competing users.

> Also, not everyone studying the Internet is trying to design it; there
>

However, those who use the Internet should not suffer from those who
study it.

> is value in just understanding how it currently behaves.
>
>

I'm totally with you here, however we must accept that the Internet is
actually used. By customers and users, some of whom even spend money on
it.

The days when the Internet consisted of about 40 nodes situated at
some locations in the US are gone. And they will not come back.

Detlef

-- 
Detlef Bosau
Galileistraße 30
70565 Stuttgart
phone: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de

From detlef.bosau at web.de Wed May 12 02:42:56 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 12 May 2010 11:42:56 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <20100512072850.GA10097@nuttenaction>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de> <20100512072850.GA10097@nuttenaction>
Message-ID: <4BEA7820.5000901@web.de>

Hagen Paul Pfeifer wrote:
> I threat the query from Anoop pragmatically,

Oh, the poor query ;-)

> currently he is only interested
> in Window scaling. Window scaling underlies no MUST/SHOULD/COULD/WHATEVER
> semantic. Sensor networks for example can disable window scaling where on the
> other hand satellite/internet2 networks require the full range of scaling.
>
>

See my remark above. On satellite links, window scaling may well be
appropriate.

However, on terrestrial links TCP sessions doing window scaling are
simply misbehaved in the vast majority of cases. (Even if _all_ sessions
did so, because it is not our goal to support hardware vendors but to
keep the queues small.)

For a general purpose protocol, we assume route transparency. So the
problem is to know whether window scaling actually makes sense or not.

This may be a question of whether a one size fits all approach in TCP
really makes sense or if we should make a difference between enzygotic
twins like Arnold Schwarzenegger and Danny DeVito.

> Window scaling is an option where the user can and should use the value that
> fits into his particular environment. This generates a lot of varieties - but
> with good cause. ;-)
>

The _user_?

Beware of users, my friend!

There is nothing as terrible as OSI layer 8. (credits to H. Oldach.)

-- 
Detlef Bosau
Galileistraße 30
70565 Stuttgart
phone: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de

From sthaug at nethelp.no Wed May 12 03:55:22 2010
From: sthaug at nethelp.no (sthaug@nethelp.no)
Date: Wed, 12 May 2010 12:55:22 +0200 (CEST)
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEA7820.5000901@web.de>
References: <20100512072850.GA10097@nuttenaction> <4BEA7820.5000901@web.de>
Message-ID: <20100512.125522.74733250.sthaug@nethelp.no>

> See my remark above. On satellite links, window scaling may well be
> appropriate.
>
> However, on terrestrial links TCP sessions doing window scaling are
> simply misbehaved in the vast majority of cases. (Even if _all_ sessions
> did so, because it is not our goal to support hardware vendors but to
> keep the queues small.)

Why on earth do you say "simply misbehaved in the vast majority of cases"?
If I have a TCP connection with 35 ms RTT, this connection will be limited
to 15 Mbps unless I do window scaling.

Also, *your* goal may be to keep queues small. That is not necessarily
the goal of all operators in all situations.

Steinar Haug, Nethelp consulting, sthaug at nethelp.no

From hagen at jauu.net Wed May 12 04:00:42 2010
From: hagen at jauu.net (Hagen Paul Pfeifer)
Date: Wed, 12 May 2010 13:00:42 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEA7820.5000901@web.de>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de> <20100512072850.GA10097@nuttenaction> <4BEA7820.5000901@web.de>
Message-ID: <0078089f6064595372204dd6fa3612ba@localhost>

On Wed, 12 May 2010 11:42:56 +0200, Detlef Bosau wrote:

> However, on terrestrial links TCP sessions doing window scaling are
> simply misbehaved in the vast majority of cases. (Even if _all_ sessions
> did so, because it is not our goal to support hardware vendors but to
> keep the queues small.)

-v, can you explain this more specifically (why misbehaved)? How should a
stack behave correctly?

Cheers, Hagen

From detlef.bosau at web.de Wed May 12 04:17:12 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 12 May 2010 13:17:12 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <20100512.125522.74733250.sthaug@nethelp.no>
References: <20100512072850.GA10097@nuttenaction> <4BEA7820.5000901@web.de> <20100512.125522.74733250.sthaug@nethelp.no>
Message-ID: <4BEA8E38.1080701@web.de>

On 05/12/2010 12:55 PM, sthaug at nethelp.no wrote:
> Why on earth do you say "simply misbehaved in the vast majority of cases"?
> If I have a TCP connection with 35 ms RTT, this connection will be limited
> to 15 Mbps unless I do window scaling.
>
>

What's the problem with your RTT? The RTT is absolutely meaningless in
this context.
The one and only significant question is whether the fair share of the
path capacity for your session is greater than 65536 bytes.

> Also, *your* goal may be to keep queues small. That is not necessarily
> the goal of all operators in all situations.
>
>

Probably you want to read the SIGCOMM 2004 paper by Guido Appenzeller
some years ago and the basic works by John Nagle on this matter.

The problem with some sessions using window scaling while others do not
is that some sessions may in fact utilize all the memory which is put
into intermediate nodes - be it even Gigabytes - while those without
window scaling are restricted to 65536 bytes. This way, the window
scaling flows largely outperform the others while there is absolutely no
advantage for the overall throughput in the network at all. If every
flow would use window scaling in long fat networks, the windows most
likely would converge to reasonable sizes.

Please keep in mind that the central idea of sliding window is to fully
utilize the _link_ capacity and not to support a memory chip vendor's
marketing department.

With sufficient queueing memory, you can even grow a 1 meter Ethernet
link to a link capacity of 5000 Terabytes.

I only don't see any valid reason for doing so.
> Steinar Haug, Nethelp consulting, sthaug at nethelp.no
>

-- 
------------------------------------------------------------------------
Detlef Bosau
Galileistraße 30
70565 Stuttgart
Tel.: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de
------------------------------------------------------------------------

From perfgeek at mac.com Wed May 12 07:55:43 2010
From: perfgeek at mac.com (rick jones)
Date: Wed, 12 May 2010 07:55:43 -0700
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEA7522.9060405@web.de>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de> <4BEA7522.9060405@web.de>
Message-ID: <8FE460E8-27B0-4643-9626-E0E15DE22A92@mac.com>

On May 12, 2010, at 2:30 AM, Detlef Bosau wrote:

> And although I expect M$ to furiously contradict here, I think
> window scaling should be discouraged for terrestrial TCP sessions.
> The case where 65536 bytes are not sufficient here is extremely
> rare - in all other cases the one guy sitting in Berkeley
> maintaining a TCP session to a node in Hamburg will eventually
> exploit the queue memory of intermediate nodes - and will cause
> severe grief for all competing users.

I'm arriving late to the discussion - perhaps data centers and LANs
were not included in your set of terrestrial TCP sessions and I'm but
providing fodder for "TCP as the one true protocol is bad" school of
thought, but it has been my experience thus far that over a 10 Gbit/s
Ethernet LAN, TCP needs 128KB or more of window to achieve reasonable
throughput.

Get much more than 1 ms of delay in the LAN or data center and even
that is insufficient.
rick jones
Wisdom teeth are impacted, people are affected by the effects of events

From sthaug at nethelp.no Wed May 12 08:11:34 2010
From: sthaug at nethelp.no (sthaug@nethelp.no)
Date: Wed, 12 May 2010 17:11:34 +0200 (CEST)
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEA8E38.1080701@web.de>
References: <4BEA7820.5000901@web.de> <20100512.125522.74733250.sthaug@nethelp.no> <4BEA8E38.1080701@web.de>
Message-ID: <20100512.171134.74744878.sthaug@nethelp.no>

> > Why on earth do you say "simply misbehaved in the vast majority of cases"?
> > If I have a TCP connection with 35 ms RTT, this connection will be limited
> > to 15 Mbps unless I do window scaling.
>
> What's the problem with your RTT? The RTT is absolutely meaningless in
> this context.
>
> The one and only significant question is whether the fair share of the
> path capacity for your session is greater than 65536 bytes.

We clearly have different goals. I work for a service provider. If I
have a customer who wants to transmit more than 15 Mbps between end
points with a 35 ms RTT, one of my goals is to make it *possible* for
the customer to do this (assuming the customer has sufficient access
capacity at both ends). To do this with normal TCP, the customer needs
window scaling.

> > Also, *your* goal may be to keep queues small. That is not necessarily
> > the goal of all operators in all situations.
>
> Probably you want to read the SIGCOMM 2004 paper by Guido Appenzeller
> some years ago and the basic works by John Nagle on this matter.

I read this several years ago - but thanks for the reminder. I consider
this paper highly relevant for our backbone links. It is less relevant
for customer access links with only one or a few active flows. Thus I
stand by my words "not necessarily the goal of all operators in all
situations".
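
The 15 Mbps figure for a 35 ms RTT follows from the 16-bit TCP window
field: without the window scale option a sender can have at most 65535
bytes in flight per round trip. The following is a minimal Python sketch
of that arithmetic, not anything posted in the thread; the function names
and the 100 Mbit/s target rate are illustrative assumptions.

```python
"""Window-limited TCP throughput; a back-of-the-envelope sketch."""

MAX_UNSCALED_WINDOW = 65535  # bytes: largest value of the 16-bit window field
MAX_WINDOW_SHIFT = 14        # largest shift count the window scale option allows


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


def min_window_shift(target_bps, rtt_s):
    """Smallest shift whose scaled 16-bit window covers target_bps * rtt_s."""
    needed_bytes = target_bps * rtt_s / 8
    for shift in range(MAX_WINDOW_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= needed_bytes:
            return shift
    raise ValueError("bandwidth-delay product exceeds the largest scaled window")


if __name__ == "__main__":
    rtt = 0.035  # the 35 ms RTT discussed above
    limit = window_limited_bps(MAX_UNSCALED_WINDOW, rtt)
    print(f"no scaling: {limit / 1e6:.1f} Mbit/s")
    # Hypothetical target rate, chosen only for illustration.
    print(f"shift for 100 Mbit/s: {min_window_shift(100e6, rtt)}")
```

Under these assumptions the unscaled limit comes out at roughly
15 Mbit/s, and a hypothetical 100 Mbit/s customer on the same path would
need a shift count of at least 3.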
> The problem with some sessions using window scaling while others do not
> is that some sessions may in fact utilize all the memory which is put
> into intermediate nodes - be it even Gigabytes - while those without
> window scaling are restricted to 65536 bytes. This way, the window
> scaling flows largely outperform the others while there is absolutely no
> advantage for the overall throughput in the network at all. If every
> flow would use window scaling in long fat networks, the windows most
> likely would converge to reasonable sizes.

Real life networks tend to have a range of RTTs and capacities. We need
to accommodate this range. If vendors supplied TCP implementations which
always performed well with window scaling turned on, I would be happy
with that. Unfortunately, I don't believe we're there today...

> With sufficient queueing memory, you can even grow a 1 meter Ethernet
> link to a link capacity of 5000 Terabytes.
>
> I only don't see any valid reason for doing so.

Here we agree.

Steinar Haug, Nethelp consulting, sthaug at nethelp.no

From detlef.bosau at web.de Wed May 12 11:01:49 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 12 May 2010 20:01:49 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <20100512.171134.74744878.sthaug@nethelp.no>
References: <4BEA7820.5000901@web.de> <20100512.125522.74733250.sthaug@nethelp.no> <4BEA8E38.1080701@web.de> <20100512.171134.74744878.sthaug@nethelp.no>
Message-ID: <4BEAED0D.8010603@web.de>

On 05/12/2010 05:11 PM, sthaug at nethelp.no wrote:
>>
> We clearly have different goals. I work for a service provider. If I
> have a customer who wants to transmit more than 15 Mbps between end
> points with a 35 ms RTT, one of my goals is to make it *possible* for
> the customer to do this (assuming the customer has sufficient access
> capacity at both ends). To do this with normal TCP, the customer needs
> window scaling.
>

To do this with normal TCP, you need to know the reason for the 35 ms RTT.

I remember a situation in a lab, where the RTT between two 802.11 nodes
was 90 SECONDS (sic!). This was obviously due to noise/interference and
802.11 retransmissions. Window scaling does not solve this problem -
window scaling worsens problems like these. It was broadly discussed in
this list, that some 802.11 nodes do up to 254 retransmissions. Hence,
the reason for 35 ms is not a large propagation delay or a large pipeline
but simply retransmissions in the presence of noise. It is of no help to
scale up queues and windows here and to make thousands of packets wait
for a service they will never get.

>
> Real life networks tend to have a range of RTTs and capacities. We need
> to accommodate this range. If vendors supplied TCP implementations which
> always performed well with window scaling turned on, I would be happy
> with that. Unfortunately, I don't believe we're there today...
>
>

I well remember a discussion with a colleague who enjoyed a fine
goodput, I think I talked about this before.

Lucky him. Poor others...

That's what my criticism is all about. In order to achieve reasonable
fairness, either _all_ competing users use window scaling - or none.

Up to now, I'm convinced that in the latter case hardly any user will
benefit from window scaling in the majority of scenarios ;-)

>> With sufficient queueing memory, you can even grow a 1 meter Ethernet
>> link to a link capacity of 5000 Terabytes.
>>
>> I only don't see any valid reason for doing so.
>>
> Here we agree.
>

:-)

-- 
------------------------------------------------------------------------
Detlef Bosau
Galileistraße 30
70565 Stuttgart
Tel.: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de
------------------------------------------------------------------------

From detlef.bosau at web.de Wed May 12 11:55:54 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 12 May 2010 20:55:54 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <8FE460E8-27B0-4643-9626-E0E15DE22A92@mac.com>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de> <4BEA7522.9060405@web.de> <8FE460E8-27B0-4643-9626-E0E15DE22A92@mac.com>
Message-ID: <4BEAF9BA.5070306@web.de>

rick jones wrote:
>
> On May 12, 2010, at 2:30 AM, Detlef Bosau wrote:
>
> I'm arriving late to the discussion - perhaps data centers and LANs
> were not included in your set of terrestrial TCP sessions and I'm but
> providing fodder for "TCP as the one true protocol is bad" school of
> thought, but it has been my experience thus far that over a 10 Gbit/s
> Ethernet LAN, TCP needs 128KB or more of window to achieve reasonable
> throughput.

Is this due to the link lengths or due to huge interface buffers?

> Get much more than 1 ms of delay in the LAN or data center and even
> that is insufficient.
>

I left out the consideration that we have to take into account the
number of active flows. Using VJCC, any flow has a minimum window of 1
MSS. Actually, even 1 MSS may not fit on a small link. Hence, we have to
provide a certain minimum of queueing memory to make the system work with
the actual number of flows being active.

May this be the reason for the delays you mentioned?

Actually, I don't mind reasonable window scaling when there are sound
reasons for it. Perhaps, the general term "misbehaved" is too strict and
we should better encourage a reasonable usage of window scaling.
Unfortunately, I read several discussions on this matter where window
scaling was used or encouraged quite carelessly.

> rick jones
> Wisdom teeth are impacted, people are affected by the effects of events

Lucky me, I've only two wisdom teeth left ;-)

Detlef

-- 
Detlef Bosau
Galileistraße 30
70565 Stuttgart
phone: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de

From sthaug at nethelp.no Wed May 12 13:00:38 2010
From: sthaug at nethelp.no (sthaug@nethelp.no)
Date: Wed, 12 May 2010 22:00:38 +0200 (CEST)
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEAED0D.8010603@web.de>
References: <4BEA8E38.1080701@web.de> <20100512.171134.74744878.sthaug@nethelp.no> <4BEAED0D.8010603@web.de>
Message-ID: <20100512.220038.104036934.sthaug@nethelp.no>

> > We clearly have different goals. I work for a service provider. If I
> > have a customer who wants to transmit more than 15 Mbps between end
> > points with a 35 ms RTT, one of my goals is to make it *possible* for
> > the customer to do this (assuming the customer has sufficient access
> > capacity at both ends). To do this with normal TCP, the customer needs
> > window scaling.
>
> To do this with normal TCP, you need to know the reason for the 35 ms RTT.

Please assume that I am aware of this. The 35 ms I am talking about is
mostly due to speed of light in optical fiber. 35 ms RTT corresponds
to a fiber path length of 3000+ km, and a few ms of forwarding delay,
which is perfectly reasonable in my country (Norway).

> That's what my criticism is all about. In order to achieve reasonable
> fairness, either _all_ competing users use window scaling - or none.

Here again we have different goals. For my *core* network links, one
of my primary goals is to ensure sufficient capacity. In a normal
situation there is *no* congestion (throw bandwidth at the problem) -
and as far as I can see fairness doesn't enter the picture at all.
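
The relation between the 35 ms RTT and a fiber path of 3000+ km can be
checked with a rough propagation-delay calculation. This Python sketch is
illustrative only: it assumes light travels through glass fiber at about
two thirds of its vacuum speed (roughly 200,000 km/s), and the 5 ms
forwarding delay is a hypothetical stand-in for "a few ms".

```python
"""Propagation delay in optical fiber; a rough sketch."""

# Assumption: signal speed in fiber is about 2/3 of c, i.e. ~200,000 km/s.
# Real fibers and cable routes vary.
FIBER_SPEED_KM_PER_S = 200_000.0


def fiber_rtt_ms(path_km):
    """Round-trip propagation time in ms for a one-way fiber path of path_km."""
    return 2.0 * path_km * 1000.0 / FIBER_SPEED_KM_PER_S


def path_km_for_rtt(rtt_ms, forwarding_ms=0.0):
    """One-way fiber length that explains rtt_ms once forwarding_ms is removed."""
    return (rtt_ms - forwarding_ms) * FIBER_SPEED_KM_PER_S / (2.0 * 1000.0)


if __name__ == "__main__":
    print(f"3000 km path: {fiber_rtt_ms(3000):.1f} ms RTT from propagation alone")
    # Hypothetical 5 ms of total forwarding delay, for illustration only.
    print(f"35 ms RTT, 5 ms forwarding: {path_km_for_rtt(35, 5):.0f} km")
```

With these assumed numbers, 3000 km of fiber accounts for about 30 ms of
round-trip time, which leaves a few milliseconds for forwarding.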
For my *customer* network links, the customer is paying for a certain
capacity. How the customer shares the capacity between different flows
is none of my business. Fairness is none of my concerns here either.

However, I *do* care about the customer having the possibility of using
the capacity he's paying for - and that means the customer needs to be
told about window scaling if he expects decent TCP throughput on high RTT
paths.

> Up to now, I'm convinced that in the latter case hardly any user will
> benefit from window scaling in the majority of scenarios ;-)

Clearly your "majority of scenarios" are different from mine.

Steinar Haug, Nethelp consulting, sthaug at nethelp.no

From detlef.bosau at web.de Wed May 12 13:29:42 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Wed, 12 May 2010 22:29:42 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <20100512.220038.104036934.sthaug@nethelp.no>
References: <4BEA8E38.1080701@web.de> <20100512.171134.74744878.sthaug@nethelp.no> <4BEAED0D.8010603@web.de> <20100512.220038.104036934.sthaug@nethelp.no>
Message-ID: <4BEB0FB6.5000403@web.de>

sthaug at nethelp.no wrote:
>> To do this with normal TCP, you need to know the reason for the 35 ms RTT.
>>
>
> Please assume that I am aware of this. The 35 ms I am talking about is
> mostly due to speed of light in optical fiber. 35 ms RTT corresponds
> to a fiber path length of 3000+ km, and a few ms of forwarding delay,
> which is perfectly reasonable in my country (Norway).
>

O.k.

Another consideration is the number of flows. Are those long distance
fibers common for backbone connections or for connections with only some
few flows?

>> That's what my criticism is all about. In order to achieve reasonable
>> fairness, either _all_ competing users use window scaling - or none.
>>
>
> Here again we have different goals. For my *core* network links, one
> of my primary goals is to ensure sufficient capacity. In a normal
> situation there is *no* congestion (throw bandwidth at the problem) -
> and as far as I can see fairness doesn't enter the picture at all.
>

I don't see a conflict here. However, when we enable window scaling for
_all_ competing flows, there may be a moderate level of congestion here,
because the flows' windows must achieve an equilibrium somehow.

> For my *customer* network links, the customer is paying for a certain
> capacity. How the customer shares the capacity between different flows
> is none of my business. Fairness is none of my concerns here either.
>

And it's not your business whether or not the customer utilizes or
underutilizes the line, I see.

> However, I *do* care about the customer having the possibility of using
> the capacity he's paying for - and that means the customer needs to be
> told about window scaling if he expects decent TCP throughput on high RTT
> paths.
>
>

I admit that I missed the situation of long distance _customer_ lines.
That's an actual shortcoming in my rationale.

However, can we agree that a good measure to prevent misbehaviour (which
_can_ result from a single flow using window scaling while the competitors
don't) is to enable window scaling actually on _all_ flows or on _no_ flows?
Although this might lead to some moderate level of congestion even in lines
with comparably moderate load?
-- 
Detlef Bosau
Galileistraße 30
70565 Stuttgart
phone: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de

From lachlan.andrew at gmail.com Wed May 12 17:05:02 2010
From: lachlan.andrew at gmail.com (Lachlan Andrew)
Date: Thu, 13 May 2010 10:05:02 +1000
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEB0FB6.5000403@web.de>
References: <4BEA8E38.1080701@web.de> <20100512.171134.74744878.sthaug@nethelp.no> <4BEAED0D.8010603@web.de> <20100512.220038.104036934.sthaug@nethelp.no> <4BEB0FB6.5000403@web.de>
Message-ID:

On 13 May 2010 06:29, Detlef Bosau wrote:
>
> However, can we agree that a good measure to prevent misbehaviour (which
> _can_ result from a single flow using window scaling while the competitors
> don't) is to enable window scaling actually on _all_ flows or on _no_ flows?
> Although this might lead to some moderate level of congestion even in lines
> with comparably moderate load?

I don't think that those conditions are necessary.

Any flow is entitled to send at *less* than their congestion window.
If they choose to do that by limiting the protocol to a legacy mode
(no window scaling), that is their prerogative. Window scaling is an
approved mechanism for allowing the intended AIMD behaviour of Reno.

The fact that a saturated source sends on a link with a source which
chooses not to use its full window doesn't mean the saturated source
is "misbehaving".
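
A rough sketch of the point about a flow sending at less than its congestion window: a TCP sender can keep at most min(cwnd, rwnd) bytes in flight per round trip, and without window scaling the advertised window cannot exceed 65535 bytes. The function name and the example congestion window below are hypothetical, chosen only for illustration; the 35 ms RTT is the figure quoted earlier in the thread. This is a loss-free upper bound, not a model of real TCP dynamics.

```python
# Sketch only: loss-free upper bound on the throughput of one TCP flow.
MAX_UNSCALED_WINDOW = 65535  # largest value a 16-bit window field can carry


def window_limited_throughput_bps(cwnd_bytes, rwnd_bytes, rtt_s):
    """At most min(cwnd, rwnd) bytes can be in flight per round trip."""
    in_flight = min(cwnd_bytes, rwnd_bytes)
    return in_flight * 8 / rtt_s


rtt = 0.035        # 35 ms RTT, as quoted in the thread
cwnd = 1_000_000   # hypothetical congestion window, in bytes

# Legacy flow: receive window capped at 65535 bytes (no window scaling).
legacy = window_limited_throughput_bps(cwnd, MAX_UNSCALED_WINDOW, rtt)
# Scaled flow: a 4 MiB receive window, so cwnd becomes the limit.
scaled = window_limited_throughput_bps(cwnd, 4 * 1024 * 1024, rtt)

print(f"no window scaling:    {legacy / 1e6:.1f} Mbit/s")
print(f"4 MiB receive window: {scaled / 1e6:.1f} Mbit/s")
```

Under these assumptions the legacy flow tops out at about 15 Mbit/s regardless of how large its congestion window grows, which is the sense in which it sends at less than its congestion window would allow.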
$0.02,
Lachlan

-- 
Lachlan Andrew Centre for Advanced Internet Architectures (CAIA)
Swinburne University of Technology, Melbourne, Australia
Ph +61 3 9214 4837

From perfgeek at mac.com Wed May 12 19:45:13 2010
From: perfgeek at mac.com (rick jones)
Date: Wed, 12 May 2010 19:45:13 -0700
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEAF9BA.5070306@web.de>
References: <26468D0323239E4184C3C4B41E6B63B404E050E5DA@HQ-EXCH-7.corp.brocade.com> <4BE986E4.6070105@web.de> <4BEA7522.9060405@web.de> <8FE460E8-27B0-4643-9626-E0E15DE22A92@mac.com> <4BEAF9BA.5070306@web.de>
Message-ID: <60386597-88D8-46B5-A586-51ADB3F196E0@mac.com>

On May 12, 2010, at 11:55 AM, Detlef Bosau wrote:

> rick jones wrote:
>>
>> On May 12, 2010, at 2:30 AM, Detlef Bosau wrote:
>>
>> I'm arriving late to the discussion - perhaps data centers and LANs
>> were not included in your set of terrestrial TCP sessions and I'm
>> but providing fodder for "TCP as the one true protocol is bad"
>> school of thought, but it has been my experience thus far that over
>> a 10 Gbit/s Ethernet LAN, TCP needs 128KB or more of window to
>> achieve reasonable throughput.
>
> Is this due to the link lengths or due to huge interface buffers?

I believe it is the result of the basic bandwidth delay product. 10
billion bits per second does not leave room for much delay. 40 or 100
billion bits per second will leave even less.

To get 9 Gbit/s with a 65535 byte window means the RTT on the LAN must
be less than 0.0000583 seconds - so anything more than about 58
microseconds and 65535 won't cut it. And that will include getting
through the stack on the transmitter, DMA into the NIC, getting through
the NIC, toggling the bits onto the fibre, getting through the
switch(es) and any bit toggling that entails on the inter-switch links,
then bit toggling across the last hop, through that NIC and up through
that stack.

>> Get much more than 1 ms of delay in the LAN or data center and
>> even that is insufficient.
>>
>
> I left out the consideration, that we have to take into account the
> number of active flows.
>
> Using VJCC, any flow has a minimum window of 1 MSS. Actually, even 1
> MSS may not fit on a small link. Hence, we have to provide a certain
> minimum of queueing memory to make the system work with the actual
> number of flows being active.
> May this be the reason for the delays you mentioned?

In the tests I run, the only queueing is consumed by the data of the
connection itself. I'm talking about a single TCP flow, not even when
there are multiple flows attempting to go through a common path.

> Actually, I don't mind reasonable window scaling when there are
> sound reasons for it. Perhaps, the general term "misbehaved" is too
> strict and we should better encourage a reasonable usage of window
> scaling. Unfortunately, I read several discussions on this matter
> where window scaling was used or encouraged quite carelessly.

To be sure, I've seen some crazy things - like 10GbE NIC vendors
suggesting people set their TCP windows to 16 MB, but I do not see that
as condemning TCP window scaling in general.

I actually use a 1MB socket buffer in my 10GbE netperf tests - for
example, in:

ftp://ftp.netperf.org/netperf/misc/dl380g5_2.6.32-3_ad386_1.1.3-ko_T7.4.0_b2b_to_same_1500mtu_20100114.csv

The multiple results are generally when I am shifting the CPU affinity
of netperf/netserver around relative to the core taking interrupts from
the NIC.

You will notice, if you scroll way over to the right, how far Linux
autotuning will take the TCP window/socket buffers - e.g. Column I rows
6-8 for the local send socket buffer final and Column R for the remote
receive socket buffer final (4194304 was the configured limit - the
default in the kernel I was using.) (rows 6-8 were using autotuning,
rows 12-14 were with explicit setsockopt() calls on the socket buffers.)
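
The bandwidth-delay arithmetic and the explicit socket buffer setting discussed in this message can be sketched roughly as follows. This is a minimal illustration, not netperf's code: the helper names are made up here, and the kernel is free to adjust or cap whatever buffer size is requested (Linux, for instance, reports about twice the requested value to account for bookkeeping overhead). For a 65535-byte window at 9 Gbit/s the RTT bound works out to roughly 58 microseconds.

```python
import socket


def max_rtt_for_window(window_bytes, rate_bps):
    """Largest RTT (seconds) at which one window per RTT sustains rate_bps."""
    return window_bytes * 8 / rate_bps


def window_for_rate(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


rtt_limit = max_rtt_for_window(65535, 9e9)
print(f"65535-byte window at 9 Gbit/s: RTT <= {rtt_limit * 1e6:.1f} microseconds")

needed = window_for_rate(10e9, 0.001)
print(f"10 Gbit/s at 1 ms RTT: window >= {needed / 1024:.0f} KiB")

# Explicitly request 1 MB socket buffers instead of relying on autotuning.
# Assumption: the OS honours the request up to its configured limits.
REQUESTED = 1024 * 1024
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, REQUESTED)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
    granted_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    granted_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print("send buffer granted:   ", granted_snd)
    print("receive buffer granted:", granted_rcv)
```

The buffers are set before any connect() or listen() call, since the window scale factor is negotiated during the handshake and depends on the receive buffer size at that point.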
>> rick jones
>> Wisdom teeth are impacted, people are affected by the effects of
>> events
>
> Lucky me, I've only two wisdom teeth left ;-)

I'm not sure if that leaves you one up or one down on me - I have just
the one :)

rick
http://homepage.mac.com/perfgeek

From detlef.bosau at web.de Thu May 13 05:28:56 2010
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 13 May 2010 14:28:56 +0200
Subject: [e2e] TCP implementations in various OS's
In-Reply-To:
References: <4BEA8E38.1080701@web.de> <20100512.171134.74744878.sthaug@nethelp.no> <4BEAED0D.8010603@web.de> <20100512.220038.104036934.sthaug@nethelp.no> <4BEB0FB6.5000403@web.de>
Message-ID: <4BEBF088.9000304@web.de>

Lachlan Andrew wrote:
> On 13 May 2010 06:29, Detlef Bosau wrote:
>
>> However, can we agree that a good measure to prevent misbehaviour (which
>> _can_ result from a single flow using window scaling while the competitors
>> don't) is to enable window scaling actually on _all_ flows or on _no_ flows?
>> Although this might lead to some moderate level of congestion even in lines
>> with comparably moderate load?
>>
>
> I don't think that those conditions are necessary.
>
> Any flow is entitled to send at *less* than their congestion window.
> If they choose to do that by limiting the protocol to a legacy mode
> (no window scaling), that is their prerogative. Window scaling is an
> approved mechanism for allowing the intended AIMD behaviour of Reno.
>

However, the AIMD scheme will simply not converge properly when some
flows do window scaling while others don't. Actually, the prerogative of
doing no window scaling means that a flow intentionally fails to use its
fair share of capacity.

> The fact that a saturated source sends on a link with a source which
> chooses not to use its full window doesn't mean the saturated source
> is "misbehaving".
>
>

;-) It was not my intention to call a decent flow misbehaving, but
rather the greedy one :-)

However, VJCC is a distributed algorithm which implicitly assumes a
compatible implementation or behaviour in all participating nodes to
work as intended.

Detlef

-- 
Detlef Bosau
Galileistraße 30
70565 Stuttgart
phone: +49 711 5208031
mobile: +49 172 6819937
skype: detlef.bosau
ICQ: 566129673
detlef.bosau at web.de
http://www.detlef-bosau.de

From sthaug at nethelp.no Thu May 13 07:17:52 2010
From: sthaug at nethelp.no (sthaug@nethelp.no)
Date: Thu, 13 May 2010 16:17:52 +0200 (CEST)
Subject: [e2e] TCP implementations in various OS's
In-Reply-To: <4BEB0FB6.5000403@web.de>
References: <4BEAED0D.8010603@web.de> <20100512.220038.104036934.sthaug@nethelp.no> <4BEB0FB6.5000403@web.de>
Message-ID: <20100513.161752.74692827.sthaug@nethelp.no>

> Another consideration is the number of flows. Are those long distance
> fibers common for backbone connections or for connections with only some
> few flows?

Long distance fibers pretty much always mean backbone connections of
some sort.

> > Here again we have different goals. For my *core* network links, one
> > of my primary goals is to ensure sufficient capacity. In a normal
> > situation there is *no* congestion (throw bandwidth at the problem) -
> > and as far as I can see fairness doesn't enter the picture at all.
>
> I don't see a conflict here. However, when we enable window scaling for
> _all_ competing flows, there may be a moderate level of congestion here,
> because the flows' windows must achieve an equilibrium somehow.

You're assuming that the equilibrium must be reached because a flow is
competing against other flows. In *my* world it is also very common that
a flow reaches its equilibrium because it manages to saturate the
customer access link.

> > For my *customer* network links, the customer is paying for a certain
> > capacity. How the customer shares the capacity between different flows
> > is none of my business.
> > Fairness is none of my concerns here either.
>
> And it's not your business whether or not the customer utilizes or
> underutilizes the line, I see.

Correct, the customer is free to underutilize the link. Many service
provider networks are based on statistical multiplexing, which implies
the customer is underutilizing the access link at least some of the
time.

> > However, I *do* care about the customer having the possibility of using
> > the capacity he's paying for - and that means the customer needs to be
> > told about window scaling if he expects decent TCP throughput on high RTT
> > paths.
> >
> I admit that I missed the situation of long distance _customer_ lines.
> That's an actual shortcoming in my rationale.

The customer access link is seldom long distance. But the customer may
well wish to communicate (via TCP sessions) with something at the end of
a high-RTT path.

> However, can we agree that a good measure to prevent misbehaviour (which
> _can_ result from a single flow using window scaling while the
> competitors don't) is to enable window scaling actually on _all_ flows
> or on _no_ flows? Although this might lead to some moderate level of
> congestion even in lines with comparably moderate load?

I'm not prepared to agree on that yet. If you say "using window scaling"
then you also need to say something about which scale factor. If the
scale factor is 1, we should (at least in theory) get pretty much the
same behavior as a TCP implementation without window scaling, modulo any
software bugs due to more lines of code, less well tested code etc.

I haven't seen any good comparative study of TCP implementation behavior
with and without window scaling - this would certainly be interesting.
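
To make the scale factor remark concrete, here is a small sketch of how the window scale option works per RFC 1323 (since updated by RFC 7323): the 16-bit window field is left-shifted by a negotiated shift count between 0 and 14, so a shift count of 0 corresponds to a multiplier of 1 and the same 65535-byte ceiling as a stack without the option. The function names and the 1 Gbit/s example rate are assumptions made for illustration; the 35 ms RTT is the figure from earlier in the thread.

```python
MAX_WINDOW_FIELD = 65535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14            # largest shift count the window scale option allows


def effective_window(window_field, shift):
    """Window in bytes that a given field value and shift count represent."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


def min_shift_for(window_bytes):
    """Smallest shift count able to advertise at least window_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if effective_window(MAX_WINDOW_FIELD, shift) >= window_bytes:
            return shift
    raise ValueError("window exceeds what window scaling can express")


# Hypothetical example: fill 1 Gbit/s on a path with 35 ms RTT.
rate_bps = 1_000_000_000
rtt_ms = 35
bdp_bytes = rate_bps * rtt_ms // 1000 // 8  # bandwidth-delay product

print("bandwidth-delay product:", bdp_bytes, "bytes")
print("minimum shift count:", min_shift_for(bdp_bytes))
print("window at shift 0:", effective_window(MAX_WINDOW_FIELD, 0), "bytes")
print("window at shift 14:", effective_window(MAX_WINDOW_FIELD, MAX_SHIFT), "bytes")
```

With these example numbers the bandwidth-delay product is 4,375,000 bytes, which needs a shift count of at least 7; whether two stacks using different shift counts share a bottleneck fairly is exactly the open question raised above.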
Steinar Haug, Nethelp consulting, sthaug at nethelp.no