From simon at limmat.switch.ch Thu Mar 1 07:09:42 2001
From: simon at limmat.switch.ch (Simon Leinen)
Date: Thu Mar 25 11:59:32 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To: <20010228164710.G51394@ted.isi.edu>
References: <13695.982276825@dstc.edu.au> <20010228164710.G51394@ted.isi.edu>
Message-ID:

>>>>> "tf" == Ted Faber writes:

>> [...]
>> TCP 532400385076 95.81 %
>> UDP 21201575665 3.82 %

> Bytes or packets? Does the other unit support your conclusion, too?

Those were bytes. Here's a summary from the same transatlantic links
(Eastbound direction only) which counts flows and packets too (new
version of script appended):

protocol.......flows..............packets...............bytes.........
GRE         7071 ( 0.01 %)    268698 ( 0.02 %)     213346212 ( 0.04 %)
ICMP     3473563 ( 6.09 %)  10420689 ( 0.94 %)    1083751656 ( 0.20 %)
IGMP           4 ( 0.00 %)         8 ( 0.00 %)          7264 ( 0.00 %)
IP         11604 ( 0.02 %)   3724884 ( 0.34 %)     763601220 ( 0.14 %)
IPINIP      4716 ( 0.01 %)     14148 ( 0.00 %)       2589084 ( 0.00 %)
TCP     35155287 (61.62 %) 942530269 (85.00 %)  532400385076 (95.81 %)
UDP     18399711 (32.25 %) 151843881 (13.69 %)   21201575665 ( 3.82 %)

So in terms of number of flows and packets, this particular
transatlantic link does indeed have a higher share of UDP than what
the 1998 "beast" paper observed. Whether this is a general trend, or
just due to different usage patterns between our link and the links
observed by CAIDA, I don't know.

Actually I'd be glad if people could run the script on other routers
which aggregate large numbers of users (especially non-academic users)
and tell me whether the results are wildly different or wildly
similar.

For a comparison, here are the totals from an access router at a
random university:

protocol.......flows..............packets...............bytes.........
GRE            383 ( 0.00 %)       17235 ( 0.00 %)        3602115 ( 0.00 %)
ICMP     101931237 ( 1.75 %)   305793711 ( 0.45 %)    37918420164 ( 0.11 %)
IGMP         34662 ( 0.00 %)      901212 ( 0.00 %)       58578780 ( 0.00 %)
IP         1406788 ( 0.02 %)    15474668 ( 0.02 %)     3528224304 ( 0.01 %)
IPINIP        1297 ( 0.00 %)        1297 ( 0.00 %)         583650 ( 0.00 %)
TCP     4361852662 (74.91 %) 63919315234 (93.39 %) 32455859980970 (96.82 %)
UDP     1357265629 (23.31 %)  4201556174 ( 6.14 %)  1025284993546 ( 3.06 %)

This matches the CAIDA values much better. Maybe the transatlantic
figures are biased because we run authoritative name servers for some
ccTLDs - those can be expected to generate lots and lots of
single-packet UDP port 53 flows.

Note also that the "flow" concept used in the CAIDA work isn't exactly
the same as Cisco NetFlow's, although the numbers may still be
comparable.

In case someone wants to know, we use the default flow timeout values
(30 minutes maximum lifetime or 1 minute(??) maximum inactivity).
-- 
Simon.

#!/usr/local/bin/perl -w
# Summarize flows, packets and bytes per protocol.
# Each input row is split on whitespace into: protocol, flow count,
# and three further columns used below as $fps, $ppf and $bbp.

my (%tb,%tp,%tf);
my ($tb,$tp,$tf) = (0,0,0);

while (<>) {
    my ($proto,$f,$fps,$ppf,$bbp) = split;
    # Skip headers and any row that does not look like a protocol line.
    next unless defined $bbp && $proto =~ /^[A-Z]/ && $f =~ /^[0-9]+$/;
    # Fold sub-categories such as TCP-WWW into their base protocol.
    $proto =~ s/-.*//;
    my $p = $f * $ppf;    # packets = flows * $ppf
    my $b = $p * $bbp;    # bytes = packets * $bbp
    # Accumulate per-protocol and grand totals, ignoring the "Total:" row.
    $tf{$proto} += $f, $tf += $f,
    $tb{$proto} += $b, $tb += $b,
    $tp{$proto} += $p, $tp += $p
        unless $proto eq 'Total:';
}
printf "protocol.......flows..............packets...............bytes.........\n";
foreach (sort keys %tb) {
    printf "%-7s %8.0f (%5.2f %%) %9.0f (%5.2f %%) %13.0f (%5.2f %%)\n",
        $_,
        $tf{$_}, 100.0 * $tf{$_} / $tf,
        $tp{$_}, 100.0 * $tp{$_} / $tp,
        $tb{$_}, 100.0 * $tb{$_} / $tb;
}
1;

From simon at limmat.switch.ch Thu Mar 1 07:32:23 2001
From: simon at limmat.switch.ch (Simon Leinen)
Date: Thu Mar 25 11:59:32 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To: <4591.983412132@dstc.edu.au>
References: <4591.983412132@dstc.edu.au>
Message-ID:

>>>>> "gm" == George Michaelson writes:

> Wow. Completely proved me wrong.

Well...
other networks may be different. For example, we used to charge the
universities by volume (for transatlantic traffic we still do), so
that certainly has some influence on our usage.

>> 98 Also predates an explosion in IP-in-IP and other encapsulated
>> flows (VPNs, IPSEC, PPPoE) so I'd be willing to hazard there are
>> more fragmented flows than shown there.

> And I look to be wrong on IP-in-IP as well.

That may also be different for other ISPs. Our university users don't
do much VPN (yet?).

> It also predates the explosion of applications such as Napster and
> Gnutella (which both run over TCP), whose traffic volume dwarfs
> that of all UDP traffic (at least on our network).

> The application mix that makes TCP predominate.. I didn't expect
> that. I had assumed like FSP these things used UDP layering.

You underestimate people's ability to learn from past mistakes (-:
although TCP is used for concerns other than TCP-friendliness).

> The UDP is going to be NTP and DNS?

Re-running my script with a tiny change:

protocol.............flows..............packets...............bytes.........
GRE            7071 ( 0.01 %)    268698 ( 0.02 %)     213346212 ( 0.04 %)
ICMP        3473563 ( 6.09 %)  10420689 ( 0.94 %)    1083751656 ( 0.20 %)
IGMP              4 ( 0.00 %)         8 ( 0.00 %)          7264 ( 0.00 %)
IP-other      11604 ( 0.02 %)   3724884 ( 0.34 %)     763601220 ( 0.14 %)
IPINIP         4716 ( 0.01 %)     14148 ( 0.00 %)       2589084 ( 0.00 %)
TCP-BGP      154665 ( 0.27 %)    154665 ( 0.01 %)      10053225 ( 0.00 %)
TCP-FTP     1478600 ( 2.59 %)   7393000 ( 0.67 %)    2336188000 ( 0.42 %)
TCP-FTPD     161850 ( 0.28 %)  69433650 ( 6.26 %)   49783927050 ( 8.96 %)
TCP-Frag        285 ( 0.00 %)      3420 ( 0.00 %)        413820 ( 0.00 %)
TCP-NNTP      70222 ( 0.12 %) 113338308 (10.22 %)   21307601904 ( 3.83 %)
TCP-SMTP     968681 ( 1.70 %)  17436258 ( 1.57 %)    7689389778 ( 1.38 %)
TCP-Telnet    75043 ( 0.13 %)   1876075 ( 0.17 %)     324560975 ( 0.06 %)
TCP-WWW    24258155 (42.52 %) 315356015 (28.44 %)  235255587190 (42.34 %)
TCP-X          6509 ( 0.01 %)   2512474 ( 0.23 %)     293959458 ( 0.05 %)
TCP-other   7981277 (13.99 %) 415026404 (37.43 %)  215398703676 (38.76 %)
UDP-DNS     8185178 (14.35 %)  16370356 ( 1.48 %)    2111775924 ( 0.38 %)
UDP-Frag        444 ( 0.00 %)   1053168 ( 0.09 %)     767759472 ( 0.14 %)
UDP-NTP     3676906 ( 6.44 %)   3676906 ( 0.33 %)     279444856 ( 0.05 %)
UDP-TFTP         11 ( 0.00 %)        11 ( 0.00 %)           693 ( 0.00 %)
UDP-other   6537172 (11.46 %) 130743440 (11.79 %)   18042594720 ( 3.25 %)

So NTP is marginal in terms of traffic, DNS too (although not in terms
of number of flows). The bulk of UDP *bytes* does in fact come from
"UDP-other" - I can think of audio/video streaming and gaming,
although the latter may be insignificant on a transatlantic link.

> Are the ssh tunnels looking like TCP and so IPSEC/ip-in-ip doesn't
> figure because grassroots, people use applications tunnels instead?

Maybe. We definitely have customers who use IPSEC for VPN
applications (probably showing up in the "IP-other" category), but I
don't know whether they do this transatlantically, and our users may
do this less than users of commercial networks(?).
-- 
Simon.
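As a rough cross-check of the per-application breakdown above, the following sketch recomputes each UDP class's share of UDP bytes, its mean packet size and its packets per flow from the table. It is a minimal illustration in Python rather than the Perl of the appended script; `UDP_ROWS` and the helper function names are hypothetical and exist only for this example, and the numbers are copied from the table.

```python
# Rows copied from the per-application table above: (flows, packets, bytes).
# UDP_ROWS and the helper names are hypothetical, for illustration only.
UDP_ROWS = {
    "UDP-DNS":   (8185178, 16370356, 2111775924),
    "UDP-Frag":  (444, 1053168, 767759472),
    "UDP-NTP":   (3676906, 3676906, 279444856),
    "UDP-TFTP":  (11, 11, 693),
    "UDP-other": (6537172, 130743440, 18042594720),
}


def byte_share_percent(rows):
    """Return each class's share of the total bytes, in percent."""
    total_bytes = sum(nbytes for _, _, nbytes in rows.values())
    return {name: 100.0 * nbytes / total_bytes
            for name, (_, _, nbytes) in rows.items()}


def mean_packet_size(rows):
    """Return the mean number of bytes per packet for each class."""
    return {name: nbytes / packets
            for name, (_, packets, nbytes) in rows.items()}


def packets_per_flow(rows):
    """Return the mean number of packets per flow for each class."""
    return {name: packets / flows
            for name, (flows, packets, _) in rows.items()}


if __name__ == "__main__":
    shares = byte_share_percent(UDP_ROWS)
    sizes = mean_packet_size(UDP_ROWS)
    ppf = packets_per_flow(UDP_ROWS)
    for name in sorted(UDP_ROWS):
        print(f"{name:<10} {shares[name]:6.2f} % of UDP bytes, "
              f"{sizes[name]:7.1f} bytes/packet, "
              f"{ppf[name]:6.1f} packets/flow")
```

On these figures "UDP-other" comes to roughly 85 % of UDP bytes and DNS to about 10 %, even though DNS is the largest UDP class by flow count.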
From floyd at aciri.org Fri Mar 2 09:12:11 2001 From: floyd at aciri.org (Sally Floyd) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] Message-ID: <200103021712.f22HCB567380@elk.aciri.org> >So the 95% figure for TCP still looks reasonable in 2001, at least for >that particular link Many thanks for posting your results. I have added a pointer to them on a web page on "Measurement Studies of End-to-End Congestion Control in the Internet", at "http://www.aciri.org/floyd/ccmeasure.html", where we are trying to track information from measurement studies about how end-to-end congestion control is actually doing in the Internet. I would be particularly interested if anyone's measurements ever indicated a surge of non-congestion-controlled traffic in the Internet... - Sally -------------------------------- http://www.aciri.org/floyd/ -------------------------------- From kjc at csl.sony.co.jp Fri Mar 2 10:25:35 2001 From: kjc at csl.sony.co.jp (Kenjiro Cho) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <200103021712.f22HCB567380@elk.aciri.org> References: <200103021712.f22HCB567380@elk.aciri.org> Message-ID: <20010303032535C.kjc@csl.sony.co.jp> Sally Floyd wrote: > >So the 95% figure for TCP still looks reasonable in 2001, at least for > >that particular link > > Many thanks for posting your results. I have added a pointer to them > on a web page on "Measurement Studies of End-to-End Congestion Control > in the Internet", at "http://www.aciri.org/floyd/ccmeasure.html", > where we are trying to track information from measurement studies about > how end-to-end congestion control is actually doing in the Internet. We are maintaining trans-Pacific packet traces along with their summary info taken from the WIDE project backbone at http://tracer.csl.sony.co.jp/mawi/ (note that addresses are scrambled in tcpdump binary outputs.) 
> I would be particularly interested if anyone's measurements ever > indicated a surge of non-congestion-controlled traffic in the > Internet... Our data also confirms that TCP is still more than 90% of the traffic under normal situations. But, unfortunately, unusual traffic patterns do happen these days. For example, http://tracer.csl.sony.co.jp/mawi/samplepoint-A/2000/200006171359.html http://tracer.csl.sony.co.jp/mawi/samplepoint-A/2000/200006181359.html -Kenjiro From dpreed at reed.com Fri Mar 2 11:02:32 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <200103021712.f22HCB567380@elk.aciri.org> Message-ID: <5.0.2.1.2.20010302135526.03130a60@mail.reed.com> At 09:12 AM 3/2/01 -0800, Sally Floyd wrote: >I would be particularly interested if anyone's measurements ever >indicated a surge of non-congestion-controlled traffic in the >Internet... Good idea, but I'd caution people to observe that non-TCP traffic is still capable of congestion control. For example, one can do streaming media over UDP with congestion control - the same signals (lost packets, RED, and ECN) can be used to reflect congestion to the endpoints and implement a closed-loop adaptive solution (for video, lowering frame rate, and prioritizing audio, for example). So the actual detection and measurement of "non-congestion-controlled" traffic flows is an end-to-end issue. It isn't strictly observable at router, certainly not by just looking at protocol numbers. - David -------------------------------------------- WWW Page: http://www.reed.com/dpr.html From simon at limmat.switch.ch Fri Mar 2 12:48:03 2001 From: simon at limmat.switch.ch (Simon Leinen) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] 
In-Reply-To: <5.0.2.1.2.20010302135526.03130a60@mail.reed.com> References: <5.0.2.1.2.20010302135526.03130a60@mail.reed.com> Message-ID: >>>>> "dpr" == David P Reed writes: > At 09:12 AM 3/2/01 -0800, Sally Floyd wrote: >> I would be particularly interested if anyone's measurements ever >> indicated a surge of non-congestion-controlled traffic in the >> Internet... > Good idea, but I'd caution people to observe that non-TCP traffic is > still capable of congestion control. For example, one can do > streaming media over UDP with congestion control - the same signals > (lost packets, RED, and ECN) can be used to reflect congestion to > the endpoints and implement a closed-loop adaptive solution (for > video, lowering frame rate, and prioritizing audio, for example). ...or giving up out of frustration, or getting kicked out of a game. The thing that comes closest to incapable of congestion control is probably DNS (except zone transfers). But in terms of bytes, DNS makes up only ~0.3% of all traffic around here (even though we have a couple of ccTLD servers on our network). Unfortunately I cannot look at the "UDP-other" traffic (~90% of UDP traffic or 2.7% of all bytes) very well. I'd venture a guess that most of this is RealMedia/QuickTime/Windows Media Player. Those should use fairly well-defined congestion control mechanisms. Is there any work on characterizing these kinds of transport protocols with respect to their levels of "TCP-friendliness"? > So the actual detection and measurement of "non-congestion-controlled" > traffic flows is an end-to-end issue. It isn't strictly observable at > router, certainly not by just looking at protocol numbers. Absolutely, -- Simon. From floyd at aciri.org Fri Mar 2 21:01:11 2001 From: floyd at aciri.org (Sally Floyd) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] 
Message-ID: <200103030501.f2351B573554@elk.aciri.org> >I'd venture a guess that >most of this is RealMedia/QuickTime/Windows Media Player. The packet traces from the MAWI Working Group Traffic Archive at "http://tracer.csl.sony.co.jp/mawi/" break down the udp traffic into dns, rip, realaud, halflif, everque, quake, and other. E.g., "http://tracer.csl.sony.co.jp/mawi/samplepoint-B/2001/200102251400.html". For the days that I looked, the UDP traffic on this transoceanic link was dominated by DNS, actually. But maybe transoceanic links have different traffic mixes than other ones. >Those >should use fairly well-defined congestion control mechanisms. Is >there any work on characterizing these kinds of transport protocols >with respect to their levels of "TCP-friendliness"? We have just started to look at this. In addition to thinking some about the potential fit of equation-based congestion control (e.g., TFRC) for these kinds of traffic. It turns out that the deployment of ECN in the Internet would add a new interest to some of these issues. - Sally -------------------------------- http://www.aciri.org/floyd/ -------------------------------- From ehall at ehsco.com Fri Mar 2 22:07:58 2001 From: ehall at ehsco.com (Eric A. Hall) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <200103030501.f2351B573554@elk.aciri.org> Message-ID: <3AA08A3E.541D233@ehsco.com> > For the days that I looked, the UDP traffic on this transoceanic > link was dominated by DNS, actually. But maybe transoceanic links > have different traffic mixes than other ones. People don't play action-oriented multi-player games over long-haul networks. Shoot-em-up games are very sensitive to latency and packet loss. 
Playing a shoot-em-up with >200ms RTT will get you killed fast by players with <20ms (client-side events have to wait for server-side messages to arrive so the "closer" player gets a distinct advantage in terms of shorter inter-command gap). After a while, you learn to play on servers that are close. Anyway, trans-oceanic links are radically different in that regard. They will always have lower gaming levels. FWIW, network games are fascinating examples of interactive applications. They are the new TELNET, except that they also have range issues that TELNET didn't often encounter (similar to the annoying remote echo problem but on a larger scale). Also, not all of these games use UDP. Many of them are using TCP for a variety of familiar reasons. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ From akyol at pluris.com Sun Mar 4 00:22:24 2001 From: akyol at pluris.com (Bora Akyol) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <200103030501.f2351B573554@elk.aciri.org> Message-ID: I would expect to see lots of content caching/distribution going on in the transoceanic links such that multimedia traffic probably always gets served from the nearest content server. If that is the case, how is the content getting replicated to these different continents? Do the traffic statistics over the transoceanic links capture this replication or are these being beamed over satellite? Thanks Bora On Fri, 2 Mar 2001, Sally Floyd wrote: > >I'd venture a guess that > >most of this is RealMedia/QuickTime/Windows Media Player. > > The packet traces from the MAWI Working Group Traffic Archive at > "http://tracer.csl.sony.co.jp/mawi/" break down the udp traffic > into dns, rip, realaud, halflif, everque, quake, and other. E.g., > "http://tracer.csl.sony.co.jp/mawi/samplepoint-B/2001/200102251400.html". 
> For the days that I looked, the UDP traffic on this transoceanic > link was dominated by DNS, actually. But maybe transoceanic links > have different traffic mixes than other ones. > > >Those > >should use fairly well-defined congestion control mechanisms. Is > >there any work on characterizing these kinds of transport protocols > >with respect to their levels of "TCP-friendliness"? > > We have just started to look at this. In addition to thinking some > about the potential fit of equation-based congestion control (e.g., > TFRC) for these kinds of traffic. It turns out that the deployment > of ECN in the Internet would add a new interest to some of these issues. > > - Sally > -------------------------------- > http://www.aciri.org/floyd/ > -------------------------------- > From smd at ebone.net Sun Mar 4 02:55:34 2001 From: smd at ebone.net (Sean Doran) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] Message-ID: <20010304105534.B8DE58A3@sean.ebone.net> | I would expect to see lots of content caching/distribution going on in the | transoceanic links such that multimedia traffic probably always gets | served from the nearest content server. I wouldn't. | If that is the case, how is the content getting replicated to these | different continents? Do the traffic statistics over the transoceanic | links capture this replication or are these being beamed over satellite? Some people (Yahoo, CNN, etc.) locate "european-flavour" servers in Europe, "cantonese-style" servers in Hong Kong, and so forth. Some people use Akamai and their competitors, which seem to be locating stuff in various places around the world. Most content just comes from wherever it happens to be hosted, and often enough that's somewhere in California. Works great. Sean. From jstevenson at orblynx.com Sun Mar 4 06:49:55 2001 From: jstevenson at orblynx.com (John Stevenson) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. 
TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010304105534.B8DE58A3@sean.ebone.net> Message-ID: <3AA25613.5678C303@orblynx.com> << Most content just comes from wherever it happens to be hosted, and often enough that's somewhere in California. Works great. >> Not so if the client is in Indonesia, or eastern Europe, or Egypt, etc., a long or an indirect link over any big cable. And recently not quite so even in California (when the resulting brownouts kick in). John Stevenson Sean Doran wrote: > | I would expect to see lots of content caching/distribution going on in the > | transoceanic links such that multimedia traffic probably always gets > | served from the nearest content server. > > I wouldn't. > > | If that is the case, how is the content getting replicated to these > | different continents? Do the traffic statistics over the transoceanic > | links capture this replication or are these being beamed over satellite? > > Some people (Yahoo, CNN, etc.) locate "european-flavour" servers in Europe, > "cantonese-style" servers in Hong Kong, and so forth. Some people use > Akamai and their competitors, which seem to be locating stuff in various > places around the world. Most content just comes from wherever it happens > to be hosted, and often enough that's somewhere in California. Works great. > > Sean. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.postel.org/pipermail/end2end-interest/attachments/20010304/69472035/attachment.html From cerpa at ISI.EDU Sun Mar 4 21:13:02 2001 From: cerpa at ISI.EDU (Alberto Cerpa) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <20010304105534.B8DE58A3@sean.ebone.net> Message-ID: On Sun, 4 Mar 2001, Sean Doran wrote: > | If that is the case, how is the content getting replicated to these > | different continents? 
Do the traffic statistics over the transoceanic
> | links capture this replication or are these being beamed over satellite?
>
> Some people (Yahoo, CNN, etc.) locate "european-flavour" servers in Europe,
> "cantonese-style" servers in Hong Kong, and so forth. Some people use
> Akamai and their competitors, which seem to be locating stuff in various
> places around the world. Most content just comes from wherever it happens
> to be hosted, and often enough that's somewhere in California. Works great.
>

Do you have some measurements to back this up? I would be really
interested to get any pointers to some data available confirming this.

Best regards,

-Al

> Sean.
>

From T.Henderson at cs.ucl.ac.uk Mon Mar 5 07:29:58 2001
From: T.Henderson at cs.ucl.ac.uk (Tristan Henderson)
Date: Thu Mar 25 11:59:32 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To: Message from "Eric A. Hall" of "Fri, 02 Mar 2001 22:07:58 PST." <3AA08A3E.541D233@ehsco.com>
Message-ID: <20010305153003.9C49337D3F@kylie.cs.ucl.ac.uk>

In message <3AA08A3E.541D233@ehsco.com>, "Eric A. Hall" said:
>
>People don't play action-oriented multi-player games over long-haul
>networks. Shoot-em-up games are very sensitive to latency and packet loss.
>Playing a shoot-em-up with >200ms RTT will get you killed fast by players
>with <20ms (client-side events have to wait for server-side messages to
>arrive so the "closer" player gets a distinct advantage in terms of
>shorter inter-command gap). After a while, you learn to play on servers
>that are close.
>

Do you have any data/stats to support these figures? I'm doing some
analysis of shoot-em-up games and haven't been able to find anything
authoritative about the maximum delays for networked games.

I've seen figures of ~200ms being declared as the "maximum" delay
before; e.g. a games designer says that they design for 200-300ms
delays at http://www.gamasutra.com/features/19970905/ng_01.htm.
OTOH, there are plenty of usenet postings from people playing with RTTs of 300-1000ms, e.g. http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=1daccce21a879875 http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=2cd5a305b3152d89 http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=41803c558ae2df07 (apologies if these google links don't work; I haven't quite got used to their usenet archive yet) It would be useful to know the absolute highest delays that gamers can tolerate. > >FWIW, network games are fascinating examples of interactive applications. I agree. I'm particularly interested in the multiuser aspects - for example, as you state, there are dynamics which may force users with similar network characteristics to congregate together. Alas, games seem to have been neglected by the networking research community, but hopefully that is changing. Cheers, Tristan From smd at ebone.net Mon Mar 5 08:10:48 2001 From: smd at ebone.net (Sean Doran) Date: Thu Mar 25 11:59:32 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: (Alberto Cerpa's message of "Sun, 4 Mar 2001 21:13:02 -0800 (PST)") References: Message-ID: <52d7bwgprr.fsf@sean.ebone.net> Alberto Cerpa writes: > Do you have some measurements to back this up? I would be really > interested to get any pointers to some data available confirming this. What type of measurements would you be looking for? My assertion of "works great" is based on some measurements of RTT (absolute and variability) and loss various networks here and there use to construct their SLAs. In the little network in which I play, all approach the minimum most of the time, even in outlying places like Bratislava, Budapest, Bucharest, Prague and so on, that are in the "Eastern Europe" someone suggested was badly connected to the world. 
Yes, TCP's ACK clocking means things farther away from each other are
at a disadvantage, but realistically, it *is* only ~190ms from Romania
to Southern California (which seems like a reasonable selection of
worst case, topologically speaking).

I am open to the argument that these are atypically good numbers, and
that performance to Eastern Europe (for example) tends to be much
worse. In fact, I would love to get such an argument into the hands
of our sales people. :-) :-)

Sean.

- --
robuc101-ta#trace
Protocol [ip]:
Target IP address: cs.ucsd.edu
Source address: 213.174.64.13
Numeric display [n]:
Timeout in seconds [3]:
Probe count [3]: 10
Minimum Time to Live [1]:
Maximum Time to Live [30]:
Port Number [33434]:
Loose, Strict, Record, Timestamp, Verbose[none]:
Type escape sequence to abort.
Tracing the route to cs.ucsd.edu (132.239.51.18)

  1 atvie103-ta-s1-0.ebone.net (213.174.70.81)
      20 msec 20 msec 12 msec 16 msec 12 msec 24 msec 16 msec 12 msec 16 msec 12 msec
  2 atvie101-tc-r6-0.ebone.net (195.158.245.49)
      16 msec 20 msec 12 msec 16 msec 20 msec 20 msec 12 msec 16 msec 12 msec 16 msec
  3 czpra103-tc-p2-0.ebone.net (195.158.242.45)
      20 msec 28 msec 28 msec 28 msec 20 msec 24 msec 24 msec 24 msec 20 msec 20 msec
  4 debln302-tc-p2-0.ebone.net (213.174.70.45)
      32 msec 32 msec 28 msec 36 msec 32 msec 32 msec 36 msec 40 msec 32 msec 28 msec
  5 debln301-tc-p1-0.ebone.net (213.174.70.37)
      32 msec 40 msec 28 msec 36 msec 36 msec 36 msec 32 msec 32 msec 32 msec 32 msec
  6 dedus206-tc-p6-0.ebone.net (213.174.70.41)
      36 msec 44 msec 36 msec 40 msec 40 msec 36 msec 40 msec 40 msec 36 msec 36 msec
  7 dedus205-tc-p7-0.ebone.net (213.174.70.125)
      40 msec 40 msec 40 msec 36 msec 36 msec 40 msec 40 msec 44 msec 48 msec 40 msec
  8 nlams303-tc-p2-0.ebone.net (213.174.70.134)
      40 msec 40 msec 40 msec 40 msec 44 msec 40 msec 40 msec 40 msec 40 msec 44 msec
  9 bebru203-tc-p1-0.ebone.net (213.174.71.1)
      44 msec 44 msec 44 msec 44 msec 44 msec 44 msec 44 msec 44 msec 40 msec 44 msec
 10 bebru204-tc-p2-0.ebone.net (195.158.225.82)
      40 msec 44 msec 44 msec 44 msec 40 msec 48 msec 40 msec 48 msec 44 msec 48 msec
 11 gblon505-tc-p1-0.ebone.net (195.158.232.41)
      48 msec 48 msec 48 msec 48 msec 52 msec 52 msec 48 msec 48 msec 48 msec 48 msec
 12 usnyk105-tc-p1-1.ebone.net (195.158.229.25)
      116 msec 120 msec 116 msec 116 msec 120 msec 120 msec 116 msec 120 msec 120 msec 116 msec
 13 sl-bb11-nyc-5-3.sprintlink.net (144.232.9.229) [AS 1239]
      116 msec 120 msec 116 msec 116 msec 120 msec 120 msec 116 msec 120 msec 124 msec 116 msec
 14 144.232.9.202 [AS 1239]
      140 msec 116 msec 120 msec 116 msec 120 msec 116 msec 116 msec 120 msec 116 msec 116 msec
 15 pos3-0-622M.nyc-bb8.cerf.net (134.24.33.158) [AS 1740]
      120 msec 120 msec 116 msec 120 msec 116 msec 120 msec 116 msec 116 msec 116 msec 120 msec
 16 so6-3-0-622M.chi-bb5.cerf.net (134.24.32.213) [AS 1740]
      136 msec 136 msec 136 msec 136 msec 136 msec 140 msec 136 msec 140 msec 132 msec 148 msec
 17 pos2-0-622M.chi-bb3.cerf.net (134.24.33.197) [AS 1740]
      136 msec 140 msec 136 msec 140 msec 136 msec 140 msec 140 msec 136 msec 140 msec 136 msec
 18 pos0-0-622M.sfo-bb4.cerf.net (134.24.46.58) [AS 1740]
      192 msec 192 msec 188 msec 192 msec 192 msec 188 msec 188 msec 188 msec 188 msec 192 msec
 19 pos7-0-622M.sfo-bb3.cerf.net (134.24.32.78) [AS 1740]
      196 msec 196 msec 196 msec 196 msec 196 msec 192 msec 196 msec 196 msec 200 msec 196 msec
 20 pos3-0-622M.lax-bb4.cerf.net (134.24.29.234) [AS 1740]
      192 msec 188 msec 192 msec 192 msec 188 msec 192 msec 188 msec 192 msec 192 msec 192 msec
 21 so1-0-0-622M.lax-bb7.cerf.net (134.24.33.170) [AS 1740]
      188 msec 192 msec 188 msec 192 msec 192 msec 192 msec 188 msec 192 msec 192 msec 188 msec
 22 so-6-0-0.san-bb4.cerf.net (134.24.29.13) [AS 1740]
      196 msec 192 msec 196 msec 196 msec 192 msec 196 msec 196 msec 196 msec 196 msec 196 msec
 23 pos1-0-0-155M.san-bb1.cerf.net (134.24.29.190) [AS 1740]
      196 msec 196 msec 196 msec 196 msec 204 msec 196 msec 196 msec 192 msec 444 msec *
 24 sdsc-gw.san-bb1.cerf.net (134.24.12.26) [AS 1740]
      204 msec 204 msec 200 msec 200 msec 204 msec 208 msec 200 msec 204 msec 208 msec 200 msec
 25 bigmama.ucsd.edu (192.12.207.5) [AS 195]
      220 msec 220 msec 228 msec 220 msec 292 msec 236 msec 244 msec 248 msec 232 msec 224 msec
 26 cse-rs.ucsd.edu (132.239.254.45) [AS 7377]
      224 msec 224 msec 244 msec 224 msec 224 msec 228 msec 228 msec 228 msec 224 msec 224 msec
 27 cs.ucsd.edu (132.239.51.18) [AS 7377]
      216 msec * 212 msec * 212 msec * 212 msec * 208 msec *

From smd at ebone.net Mon Mar 5 08:25:41 2001
From: smd at ebone.net (Sean Doran)
Date: Thu Mar 25 11:59:32 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
Message-ID: <20010305162541.5E74F8A3@sean.ebone.net>

Mmmm, socio-psychology meets networking. Always fun, never understood fully.
:)

| It would be useful to know the absolute highest delays that gamers can
| tolerate.

Surely this will be somewhat application-dependent?

However, there's probably some literature here and there about
human reflexes and how fast one needs a result back from a "twitch"
in order to feel reasonably interactive. Probably very little
of that will focus on network impact.

| >FWIW, network games are fascinating examples of interactive applications.

They're also fun. I've never been big into shoot-em-up games,
since building the Internet is faster and harder, but some friends
had me over to play Unreal Tournament with their clan the other week,
and my eyes were opened a bit. UT in any event was more sensitive
to loss and "drop outs" than to stable delay -- for me, anyway, choppy
updates and missed action was more important and harder to compensate
for than aiming ahead along the direction the target is seen to be moving.

| I agree. I'm particularly interested in the multiuser aspects - for example,
| as you state, there are dynamics which may force users with similar network
| characteristics to congregate together.
It turns out that LAN parties are pretty common: people drive across
Europe to gather together around a hub or small switch, matching up
as teams in a series of competitions within a broader league.

The social aspect, it turns out, is as important as the locality.
By analogy, although a good SLA can be gotten from a high-quality
Chinese Restaurant's delivery service, 15 people eating the same stuff
and communicating via a telephone bridge or IRC or whatever is not
as much fun as the same 15 people together in the restaurant, even
if the food is no better prepared or presented, and arrives at the
table no more quickly.

In the Internet space, we all know that there is a significant value
in the social aspect of IETF meetings, despite the formalization of
the mailing lists as being the places where real work happens.

| Alas, games seem to have been neglected by the networking
| research community, but hopefully that is changing.

Heh - well, they're sure popular among operators, at least those
on the operations front, as far as I can tell. Perhaps that is
a reflection of a dichotomy between people who are reactive & practical
versus people who like to plan in advance and understand the theory
behind things.

Sean.

From T.Henderson at cs.ucl.ac.uk Mon Mar 5 09:54:38 2001
From: T.Henderson at cs.ucl.ac.uk (Tristan Henderson)
Date: Thu Mar 25 11:59:33 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To: Message from smd@ebone.net (Sean Doran) of "Mon, 05 Mar 2001 17:25:41 +0100." <20010305162541.5E74F8A3@sean.ebone.net>
Message-ID: <20010305175443.BF70537D3F@kylie.cs.ucl.ac.uk>

In message <20010305162541.5E74F8A3@sean.ebone.net>, Sean Doran said:
>Mmmm, socio-psychology meets networking. Always fun, never understood fully.
>:)
>
>| It would be useful to know the absolute highest delays that gamers can
>| tolerate.
>
>Surely this will be somewhat application-dependent?
> Yes, you'd expect (within networked games) that delay requirements would look like shoot-em-up < RPG < chess. It should be possible, however, to come up with some general figures, a G.114 equivalent for shoot-em-ups. I'd just like something more concrete than figures pulled out of a hat, so if anyone knows of any (reasonably) scientific studies please point me at them. >However, there's probably some literature here and there about >human reflexes and how fast one needs a result back from a "twitch" >in order to feel reasonably interactive. Probably very little >of that will focus on network impact. > Precisely. There is stuff in the VR and physiology worlds about reflexes, but it's not clear that this applies to the Internet, where people seem to put up with a lot more than they'll admit to in a lab experiment. >| >FWIW, network games are fascinating examples of interactive applications. > >They're also fun. I've never been big into shoot-em-up games, >since building the Internet is faster and harder, but some friends >had me over to play Unreal Tournament with their clan the other week, >and my eyes were opened a bit. UT in any event was more sensitive >to loss and "drop outs" than to stable delay -- for me, anyway, choppy >updates and missed action was more important and harder to compensate >for than aiming ahead along the direction the target is seen to be moving. > Interesting. I've been concentrating on Half-Life mainly (it seems to be the most widely-played game according to tracking sites such as http://www.theclq.com/games.asp) but I might have to give UT a go as well. >| I agree. I'm particularly interested in the multiuser aspects - for example, > >| as you state, there are dynamics which may force users with similar network >| characteristics to congregate together. 
> >It turns out that LAN parties are pretty common: people drive across >Europe to gather together around a hub or small switch, matching up >as teams in a series of competitions within a broader league. > But this isn't always an option for geographically dispersed groups, so a lot of games server operators allow clans to book servers for private games. That's why I'd quite like to determine the QoS requirements for applications such as these; games players are already spending lots of money on their habit, so they should be quite receptive to paying for QoS. >| Alas, games seem to have been neglected by the networking >| research community, but hopefully that is changing. > >Heh - well, they're sure popular among operators, at least those >on the operations front, as far as I can tell. Perhaps that is >a reflection of a dichotomy between people who are reactive & practical >versus people who like to plan in advance and understand the theory >behind things. > No comment :) Cheers, Tristan From ehall at ehsco.com Mon Mar 5 10:02:38 2001 From: ehall at ehsco.com (Eric A. Hall) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010305153003.9C49337D3F@kylie.cs.ucl.ac.uk> Message-ID: <3AA3D4BE.212F190E@ehsco.com> > Do you have any data/stats to support these figures? Word-of-mouth, casual research. IE, asking the guy that killed me what his ping is. Also, lots of player forums (newsgroups, message boards, etc). I would agree that 200ms RTT seems to be about the max for combat. > design for 200-300ms delays at http://www.gamasutra.com/features/19970905/ > ng_01.htm. Interesting read. Thanks. > OTOH, there are plenty of usenet postings from people playing with RTTs > of 300-1000ms, e.g. Well, not all of them are telling the truth. I'm not sure I'd believe the boastings of nine-year olds in public forums. But there is a lot of skill involved. 
There are people with 5ms RTT that can't win no matter what, and there are people with 300ms RTT that win all of the time. Another issue here is that not all of the games are shooters. UO in particular has a lot of social elements, and it doesn't require any combat at all. A lot of the high-ping players naturally gravitate more towards the role-playing or social elements instead of combat, particularly after getting their clock cleaned consistently by low-pingers. I'm not saying low RTTs are not important, I am saying that there are games which embrace high-RTT players by offering non-combat activities, and this will likely become more important over time. > Alas, games seem to have been neglected by the networking research > community, but hopefully that is changing. It has gone both ways. Developers of new Internet-specific apps are not coming here, either. But I agree that there is a growing separation between the current Internet and the research community in general. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ From ehall at ehsco.com Mon Mar 5 10:14:55 2001 From: ehall at ehsco.com (Eric A. Hall) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010305162541.5E74F8A3@sean.ebone.net> Message-ID: <3AA3D79E.B3BBCD7A@ehsco.com> > However, there's probably some literature here and there about > human reflexes and how fast one needs a result back from a "twitch" > in order to feel reasonably interactive. Probably very little > of that will focus on network impact. I'm sure that has something to do with it but I don't think it's the principal issue. I mean, it might be the primary factor when everybody is on the same LAN, but when you're talking about cross-country RTTs it's not the primary issue. Command queueing is the problem. Longer RTTs mean larger gaps between commands.
This works both ways, in that movement and actions sent from the client take longer to reach the server, but data coming from the server is also rapidly becoming outdated by the time it reaches the client. This puts high RTTs at a distinct disadvantage to low RTTs, regardless of the player's twitch reflex capabilities. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ From jms at central.cis.upenn.edu Mon Mar 5 10:35:29 2001 From: jms at central.cis.upenn.edu (Jonathan M. Smith) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: Your message of "Mon, 05 Mar 2001 17:47:25 GMT." Message-ID: <200103051835.f25IZTj27538@central.cis.upenn.edu> There's a classic book which has a bunch of experiments on timing - very high quality stuff. The authors are Card, Moran and Newell, and it's called "The Psychology of Human-Computer Interaction". I looked at it when I was trying to understand how much queueing delays and jitter of other types "mattered". -JMS From T.Henderson at cs.ucl.ac.uk Mon Mar 5 11:39:22 2001 From: T.Henderson at cs.ucl.ac.uk (Tristan Henderson) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: Message from Lloyd Wood of "Mon, 05 Mar 2001 19:17:32 GMT." Message-ID: <20010305193927.CFD4B37D3F@kylie.cs.ucl.ac.uk> In message , Lloyd Wood said: >On Mon, 5 Mar 2001, Eric A. Hall wrote: > >> But there is a lot of skill involved. There are people with 5ms RTT that >> can't win no matter what, and there are people with 300ms RTT that win all >> of the time. > >hacking your copy of the game for e.g. shooting accuracy has nothing >to do with it. (apropos: there's a rant on security of multiplayer >games under >http://tuxedo.org/~esr/writings/quake-cheats.html >) Deliberately compensating for lag in the game client in some >similar way would be interesting.
Apparently the more delay-tolerant RPGs, e.g. Age of Empires and Warcraft, already do some compensation - they deliberately delay all interactions so that all players have similar delay. Not sure about shoot-em-ups though. Cheers, Tristan From touch at ISI.EDU Mon Mar 5 11:47:13 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010305153003.9C49337D3F@kylie.cs.ucl.ac.uk> Message-ID: <3AA3ED41.E50A300F@isi.edu> Tristan Henderson wrote: > > In message <3AA08A3E.541D233@ehsco.com>, "Eric A. Hall" said: > > > >People don't play action-oriented multi-player games over long-haul > >networks. Shoot-em-up games are very sensitive to latency and packet loss. > >Playing a shoot-em-up with >200ms RTT will get you killed fast by players > >with <20ms (client-side events have to wait for server-side messages to > >arrive so the "closer" player gets a distinct advantage in terms of > >shorter inter-command gap). After a while, you learn to play on servers > >that are close. As Sean indicated, these are application dependent. More precisely, they depend on the level of predictability in the feedback system, and how high in the human the processing occurs. The most basic human feedback loops (single flashing light, hit a switch) are in the 100 ms range. That means the network portion must be in the 20ms range to be 'noise' on the overall system delay. However, it gets longer for things like "multiple lights, hit the switch only if one of the lights is red". The response delay gets larger the more complicated the task. E.g., ask someone for a review of War and Peace, and you're liable to be willing to wait a few days. :-) It's all about expectations. Figures in the 100-200ms range are for maximum auditory delay for telephone echoes, and date back to the early Bell Labs days.
> OTOH, there are plenty of usenet postings from people playing with RTTs of > 300-1000ms, e.g. Many old video games had artificial delays incorporated (e.g., sluggishness in the controls of space invaders, etc). Part of the 'game' is getting acclimated to those delays. > It would be useful to know the absolute highest delays that gamers can > tolerate. People play chess by mail. It's more about expectations than about the inherent delay of the system. ---------------------- Regarding latency papers, there is Stuart Cheshire's from 1996, as well as more recent notes from David Reed. My dissertation was on this stuff, and examined the fundamental limits of latency in communication (pub'd 1992, links on my home page). Joe http://www.isi.edu/touch From ehall at ehsco.com Mon Mar 5 11:48:42 2001 From: ehall at ehsco.com (Eric A. Hall) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010305193927.CFD4B37D3F@kylie.cs.ucl.ac.uk> Message-ID: <3AA3ED9A.F68F1723@ehsco.com> > Apparently the more delay-tolerant RPGs, e.g. Age of Empires and > Warcraft, already do some compensation - they deliberately delay all > interactions so that all players have similar delay. Not sure about > shoot-em-ups though. Some of the shooters are using turn-based play in order to put more of an emphasis on tactics and less on connection. There are still cheat problems of course, but when *everything* goes through a scheduler on the server it really changes the nature of the game. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ From ehall at ehsco.com Mon Mar 5 12:06:27 2001 From: ehall at ehsco.com (Eric A. Hall) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] 
References: Message-ID: <3AA3F1C2.F69E2B94@ehsco.com> > Deliberately compensating for lag in the game client in some > similar way would be interesting. This was [re]introduced as a fairly big problem just a few weeks back, with client-based clock emulators being used to eliminate programmed delays. EG, for those actions which required client-side delay (casting a spell, healing, shooting a paced weapon, etc.), emulating the clock (and running the emulator at very high rates) meant that client-side delays were essentially removed from the game. It has been around in some form or another for a while. UO has long suffered from a "fast walk" hack that allowed players to move at their own pace instead of a rate set by the server. This was eventually fixed with rotating random keys and encrypted commands. The clock hack essentially made these fixes irrelevant, and allowed for many more cheats. Closing the loop, what this is driving is a return to closed communities. Diablo saw it badly (nobody with any sense played on the public servers when the clients had a built-in "God mode"), others are recognizing the problem and are now designing for it. I think it is trending away from massive multiplayer towards multiple-islands. -- Eric A. Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ From demir at usc.edu Mon Mar 5 12:12:58 2001 From: demir at usc.edu (demir) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <3AA3ED41.E50A300F@isi.edu> Message-ID: > As Sean indicated, these are application dependent. > More precisely, they depend on the level of > predicatability in the feedback system, and how > high in the human the processing occurs. I completely agree with the above lines. I interpret this as "engineering human perception/action" where "communication" is also part of this.
Just as "perception/action" differs in real life, so should it differ in applications. As Joe stated, it is all about "expectations", I think, too. Alper K. Demir From ehall at ehsco.com Mon Mar 5 12:30:08 2001 From: ehall at ehsco.com (Eric A. Hall) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010305153003.9C49337D3F@kylie.cs.ucl.ac.uk> <3AA3ED41.E50A300F@isi.edu> Message-ID: <3AA3F74F.C2724ACE@ehsco.com> > > >arrive so the "closer" player gets a distinct advantage in terms of > > >shorter inter-command gap). > As Sean indicated, these are application dependent. > The most basic human feedback loops (single flashing light, > hit a switch) are in the 100 ms range. That means the > network portion must be in the 20ms range to be 'noise' > on the overall system delay. However, it gets longer Not all functions fall in that category. Strafing is holding down a key while turning, for example, not click-click-click. Running/motion is holding down a key. Etc. Whenever a task involves interactive exchange of packets which are not driven by user interaction, then the player with the lower latency gets a distinct advantage. There are also tasks which are user-automated. For example, a user may have practiced a particular sequence of events, and may have developed a timing pattern such that they can execute events without waiting for feedback from the system. Rather than "hit switch when light flashes" it becomes "hit switch every 5ms because that's how often the light flashes" which is fundamentally different, and this model also rewards players who have low RTTs vs high RTTs. The best Player-vs-Player fighters are trained monkeys with well-honed reactionary pathways which allow them to react to macros that fail. -- Eric A.
Hall http://www.ehsco.com/ Internet Core Protocols http://www.oreilly.com/catalog/coreprot/ From touch at ISI.EDU Mon Mar 5 13:00:44 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <20010305153003.9C49337D3F@kylie.cs.ucl.ac.uk> <3AA3ED41.E50A300F@isi.edu> <3AA3F74F.C2724ACE@ehsco.com> Message-ID: <3AA3FE7C.E4C63F99@isi.edu> "Eric A. Hall" wrote: > > > > >arrive so the "closer" player gets a distinct advantage in terms of > > > >shorter inter-command gap). > > > As Sean indicated, these are application dependent. > > > The most basic human feedback loops (single flashing light, > > hit a switch) are in the 100 ms range. That means the > > network portion must be in the 20ms range to be 'noise' > > on the overall system delay. However, it gets longer > > Not all functions fall in that category. Strafing is holding down a key > while turning, for example, not click-click-click. Running/motion is > holding down a key. Etc. Whenever a task involves interactive exchange of > packets which are not driven by user interaction, then the player with the > lower latency gets a distinct advantage. Strafing needs high packet rate, but is latency independent. A better implementation would just send "start strafe" and "end strafe" signals anyway. > There are also tasks which are user-automated. For example, a user may > have practiced a particular sequence of events, and may have developed a > timing patter such that they can execute events without waiting for > feedback from the system. Rather than "hit switch when light flashes" it > becomes "hit switch every 5ms because that's how often the light flashes" > which is fundamentally different, and this model also rewards players who > have low RTTs vs high RTTs. Any such timing pattern should be uploadable. 
If you're forcing the user to input the sequence manually, it's just like the forced delays of the old Space Invaders days. > The best Player-vs-Player fighters are trained monkeys with well-honed > reactionary pathways which allow them to react to macros that fail. Right - all you really need to adjust is the non-predictable part. Joe From J.Crowcroft at cs.ucl.ac.uk Mon Mar 5 14:17:11 2001 From: J.Crowcroft at cs.ucl.ac.uk (Jon Crowcroft) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: Your message of "Mon, 05 Mar 2001 13:00:44 PST." <3AA3FE7C.E4C63F99@isi.edu> Message-ID: <11047.983830631@cs.ucl.ac.uk> what tristan asked for was _evidence_ you've all turned anecdotal or prescriptive since the evidence about relative TCP and UDP traffic fact is there's few facts about any of this, just lots of opinion. go look at the original bell labs papers on interactive audio RTTs : that was just opinion too - when we get to games (pace, Cheriton) same applies in spades j. From touch at ISI.EDU Mon Mar 5 14:51:03 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <11047.983830631@cs.ucl.ac.uk> Message-ID: <3AA41857.9F074B30@isi.edu> Jon Crowcroft wrote: > > what tristan asked for was _evidence_ There's abundant evidence in the human factors community. It's just statistical, like everything involving 'bags of mostly water,' and it's highly domain-specific, because the level of comprehensive and predictive complexity is hard to provide quantitative measures for. Joe From smd at ebone.net Mon Mar 5 15:01:02 2001 From: smd at ebone.net (Sean Doran) Date: Thu Mar 25 11:59:33 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] Message-ID: <20010305230102.1BAF88A3@sean.ebone.net> | fact is theres few facts about any of this, jusdt lots of opinion.
Well, yeah, but Jon, given that the Internet is heterogeneous, anisotropic, expanding, and mutating, it is really hard to be anything but anecdotal, since even the most comprehensive data set (one that defeats the observer problem (i.e., in the absence of isotropism, how do we know what things look like "over there"?)) will quickly grow stale. | go look at the original bell labs papers on interative audio RTTs : | that was just opinion too - when we get to games (pace, Cheriton) same | applies in spades Are you arguing on the question of whether opinion can be "good enough", or on the question of whether something much stronger than opinion or localized (in space and time) measurements can be obtained with an affordable amount of effort? Sean. [in a long-ago CIDRD wg meeting when they were contentious] smd: well, that's just my opinion voice in crowd (tli? postel?): and we're ALL entitled to Sean's opinion From demir at usc.edu Mon Mar 5 15:31:38 2001 From: demir at usc.edu (demir) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <20010305230102.1BAF88A3@sean.ebone.net> Message-ID: I agree with the lines below. However, these are all "chaotic", to me. I think the main challenge is how we could "engineer" this anisotropic, expanding, and mutating world as far as possible so that, maybe, "the Turing test" is achieved in the current state. As Joe Touch indicated, "the levels of comprehensive and complexity is hard to provide quantitative measures". I think searching for "evidence" requires solving the "relativity" problem as a human factor. I assume these are all "philosophical" issues that one might think unimportant. I think an "enhanced architecture" should consider all these and other related factors. Again, these are all about "expectations". Alper K. Demir > | fact is theres few facts about any of this, jusdt lots of opinion.
> > Well, yeah, but Jon, given that the Internet is heterogeneous, > anisotropic, expanding, and mutating, it is really hard to be > anything but anecdotal, since even the most comprehensive data > set (one that defeats the observer problem (i.e., in the absence > of isotropism, how do we know what things look like "over there"?)) > will quickly grow stale. > > | go look at the original bell labs papers on interative audio RTTs : > | that was just opinion too - when we get to games (pace, Cheriton) same > | applies in spades > > Are you arguing on the question of whether opnion can be "good enough", > or on the question of whether something much more strong than opinion > or localized (in space and time) measurements can be obtained with > an affordable amount of effort? > > Sean. > > [in a long-ago CIDRD wg meeting when they were contentious] > smd: well, that's just my opinion > voice in crowd (tli? postel?): and we're ALL entitled to Sean's opinion > From foo at eek.org Mon Mar 5 17:42:52 2001 From: foo at eek.org (foo) Date: Thu Mar 25 11:59:34 2004 Subject: [e2e] TEAR. Message-ID: <20010305194252.L33489@eek.org> Does anyone have any experience with or thoughts about TEAR (TCP Emulation at Receivers) developed by Injong Rhee at NCSU? http://www.csc.ncsu.edu/faculty/rhee/export/tear_page/ -Brian From J.Crowcroft at cs.ucl.ac.uk Tue Mar 6 00:00:35 2001 From: J.Crowcroft at cs.ucl.ac.uk (Jon Crowcroft) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: Your message of "Tue, 06 Mar 2001 00:01:02 +0100." <20010305230102.1BAF88A3@sean.ebone.net> Message-ID: <3377.983865635@cs.ucl.ac.uk> In message <20010305230102.1BAF88A3@sean.ebone.net>, Sean Doran typed: >>| fact is theres few facts about any of this, jusdt lots of opinion. 
>>Well, yeah, but Jon, given that the Internet is heterogeneous, >>anisotropic, expanding, and mutating, it is really hard to be >>anything but anecdotal, since even the most comprehensive data >>set (one that defeats the observer problem (i.e., in the absence >>of isotropism, how do we know what things look like "over there"?)) >>will quickly grow stale. a few of us actually try to do some measurements in the real world - before we do too many, we thought we would see if some other people had some - there is a LOT on web, a LOT on voice over IP now, and a lot of it is done over a fairly well characterised set of IP paths globally, despite what you say about the heterogeneity - sorry, but the fact is that when it comes to games, there isn't, as far as we can tell, but we thought we'd ask. >>| go look at the original bell labs papers on interative audio RTTs : >>| that was just opinion too - when we get to games (pace, Cheriton) same >>| applies in spades >>Are you arguing on the question of whether opnion can be "good enough", >>or on the question of whether something much more strong than opinion >>or localized (in space and time) measurements can be obtained with >>an affordable amount of effort? look at vern's work on characterising end to end paths, look at schulzrinne, and bolot's work on characterising delay jitter and its effect on voice, look at abundant work on zipf law and not for web page download size/time, etc etc etc where is the _equivalent_ _experimental_ data for games, please? the point is that a lot of early work in this area (50s,60s, itu standards definitions for toll quality speech) was based on LAB experiments, often with small, culture specific samples. a LOT of recent internet measurement work is based on real world data, which is NOT magic, not impossible (its hard work, and has to be incremental, painstaking, and very careful, but there is a lot) - we just wanted to see where the work had got to in one more part of the space.....
>>[in a long-ago CIDRD wg meeting when they were contentious] >>smd: well, that's just my opinion >>voice in crowd (tli? postel?): and we're ALL entitled to Sean's opinion thanks, given that sean's comments are heterogeneous, anisotropic and expanding and mutating, i guess we are. cheers jon From craig at aland.bbn.com Tue Mar 6 05:28:32 2001 From: craig at aland.bbn.com (Craig Partridge) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: Your message of "Mon, 05 Mar 2001 22:17:11 GMT." <11047.983830631@cs.ucl.ac.uk> Message-ID: <200103061328.IAA19448@aland.bbn.com> In message <11047.983830631@cs.ucl.ac.uk>, Jon Crowcroft writes: >go look at the original bell labs papers on interative audio RTTs : >that was just opinion too Hi Jon: In defense of the Bell Labs folks. There were some badly done studies in the 1950s and 1960s [not all at Bell Labs if I recall] -- many of which had the property that the people doing the studies didn't understand echo cancellers, with the result that, *surprise surprise*, they all reported that interactive voice couldn't be sustained with a delay of more than something like 100ms (exact number no longer remembered) which was the point at which the lack of echo cancellation started to be a problem. But there were three very good studies, all out of Bell Labs, which did the tests properly. They're still worth reading: Riesz and Klemmer in Bell System Technical Journal of Nov 1963, Klemmer in Bell System Technical Journal of July-August 1967, and P.T. Brady's article in the Bell System Technical Journal of January 1971. Incidentally, Klemmer sent me a note in the early 1990s saying that, in retrospect, his studies didn't account for learned delay sensitivity -- that is, a delay which previously was acceptable will become annoying if you've become used to a much shorter delay.
Side note: the BSTJ studies often used the following test procedure: * every time you picked up your phone handset, a delay was randomly chosen from a pool of possible delay times * on the phone was a button that you could press if you were unhappy with the audio quality, and I think you were rewarded by having the delay eliminated Something like this might work for gaming (though we'd have to get the incentives right -- if pressing the button eliminates the delay, everyone will do it all the time) Craig From smd at ebone.net Tue Mar 6 06:04:00 2001 From: smd at ebone.net (Sean Doran) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] Message-ID: <20010306140400.EFF438A3@sean.ebone.net> | Something like this might work for gaming (though we'd have to get the | incentives right -- if pressing the button eliminates the delay, everyone | will do it all the time) If pressing the button debits the presser's account a couple of euros in our favour, I would be more than happy to see this supported as quickly as possible in my network. Sean. P.S.: Peter Lothberg & I like to argue how QoS stuff can be done in various interrelated networks which generally offer only "platinum" quality service (zero average queue length, zero drops most of the time) can offer lower-quality (gold, silver, bronze, lead, barbed wire) at more market-competitive prices when there is demand. The idea is to put up a web page where the customer can dial a flavour, such as moving a slide-bar between 0-100% drop probability, and one that lengthens and thickens the tail of one-way delays at the interface facing the customer. While turning the knobs, one would also see the list price change -- worse quality -> lower pricing. Press here when satisfied. (We also like the idea of having pre-built "profiles". Click here for network XYZ's observed level of service, price is 5% less than network XYZ's list, that kind of thing).
There is apparently a decades-long history of doing this in the X.25 world. From karir at wam.umd.edu Tue Mar 6 07:38:31 2001 From: karir at wam.umd.edu (Manish Karir) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <20010306140400.EFF438A3@sean.ebone.net> Message-ID: On Tue, 6 Mar 2001, Sean Doran wrote: > > P.S.: Peter Lothberg & I like to argue how QoS stuff can be done in > various interrelated networks which generally offer only "platinum" > quality service (zero average queue length, zero drops most of the time) > can offer lower-quality (gold, silver, bronze, lead, barbed wire) at > more market-competitive prices when there is demand. The idea is to > put up a web page where the customer can dial a flavour, such as moving > a slide-bar between 0-100% drop probability, and one that lengthens > and thickens the tail of one-way delays at the interface facing the > customer. While turning the knobs, one would also see the list > price change -- worse quality -> lower pricing. Prese here when satisfied. > > (We also like the idea of having pre-build "profiles". Click here for > network XYZ's observed level of service, price is 5% less than network > XYZ's list, that kind of thing). > > There is apparently a decades-long history of doing this in the X.25 world. > I think something similar to this was done under the INDEX project at berkeley, the paper is at: http://www.path.berkeley.edu/~varaiya/papers_ps.dir/networkpaper.pdf though the web site for the project itself doesn't seem to exist anymore or has moved... manish karir From kuang at sask.trlabs.ca Sat Mar 3 15:55:01 2001 From: kuang at sask.trlabs.ca (Tianbo kuang) Date: Thu Mar 25 11:59:34 2004 Subject: [e2e] TEAR. In-Reply-To: <20010305194252.L33489@eek.org> Message-ID: Hi, I was just about to ask a question about TEAR.
It seems unclear to me how TEAR calculates RTT in the technical report (TEAR: TCP emulation at receivers - flow control for multimedia streaming). Does the sender calculate it and send it to the receiver, or does the receiver calculate it (and how?)? In section 3.5, it does mention under the title _timeout_ that, "this information is embedded in the packet header by the sender". What does "this information" refer to? Cheers, --Tianbo ------------------------------------------------------ Kuang Tianbo TRlabs 111-116 Research Drive Saskatoon, Saskatchewan S7N 3R3 Tel: (306) 668-9325(office) (306) 343-9747 (home) kuang@sask.trlabs.ca ------------------------------------------------------ On Mon, 5 Mar 2001, foo wrote: > Date: Mon, 5 Mar 2001 19:42:52 -0600 > From: foo > To: end2end-interest@ISI.EDU, tcp-impl@lerc.nasa.gov > Subject: [e2e] TEAR. > Resent-Date: Mon, 5 Mar 2001 19:48:05 -0600 > Resent-From: foo@eek.org > Resent-To: end2end-interest@postel.org > > Does anyone have any experience with or thoughts about TEAR (TCP Emulation > at Receivers) developed by Injong Rhee at NCSU? > > http://www.csc.ncsu.edu/faculty/rhee/export/tear_page/ > > -Brian > From rhee at eos.ncsu.edu Tue Mar 6 09:01:50 2001 From: rhee at eos.ncsu.edu (Injong Rhee) Date: Thu Mar 25 11:59:34 2004 Subject: [e2e] TEAR. In-Reply-To: Message-ID: Hi, I can't help overhearing, and want to drop a few lines. The RTT calculation can be done by the sender through receiving the receiver report about the rate. The receiver sends back the time stamp of the last packet received to the sender with the sequence number which will be used by the sender to compute the RTT. This is one way to do it, and there are many other ways. For instance, you can use the same way that RTP does or use GPS to synchronize the clocks and measure the one-way time. Then use the one-way trip time in place of RTT -- I know in this case that TCP-friendliness may suffer, but it can at least give some bounded fairness. 
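As a rough illustration of the sender-side calculation described above, here is a minimal Python sketch. It is not taken from the TEAR technical report: the names ReceiverReport, RttEstimator, last_seq and hold_time are hypothetical, the hold-time correction is an assumption borrowed from the RTP-style approach mentioned above, and the smoothing gain of 1/8 is an assumption modelled on TCP's smoothed RTT.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReceiverReport:
    """Hypothetical feedback message; the field names are illustrative only."""
    last_seq: int      # sequence number of the last data packet received
    hold_time: float   # seconds the receiver waited before sending this report


@dataclass
class RttEstimator:
    """Sender-side RTT estimate driven by receiver reports (a sketch)."""
    alpha: float = 0.125                  # EWMA gain, assumed as in TCP's SRTT
    srtt: Optional[float] = None          # smoothed RTT in seconds
    _sent: Dict[int, float] = field(default_factory=dict)

    def on_send(self, seq: int, now: float) -> None:
        # Remember when each data packet left the sender.
        self._sent[seq] = now

    def on_report(self, report: ReceiverReport, now: float) -> Optional[float]:
        # Match the echoed sequence number against the recorded send time.
        sent_at = self._sent.pop(report.last_seq, None)
        if sent_at is None:
            return self.srtt              # unknown or duplicate report: ignore
        # Subtract the time the receiver held the packet before reporting.
        sample = now - sent_at - report.hold_time
        if sample < 0:
            return self.srtt              # inconsistent clocks or fields: ignore
        if self.srtt is None:
            self.srtt = sample            # first sample seeds the estimate
        else:
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.srtt
```

Because only the sender's clock is read, this variant needs no clock synchronisation; the one-way-delay alternative would need synchronised clocks such as GPS.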
In fact this removes back-channel concerns completely from flow control such as losses and delays in the back channels. Some of the nice things about TEAR are that (1) it does not use back channels much so suitable for wireless comm; (2) rate control is very smooth; (3) TCP-friendly over various ranges of bandwidth --- TFRC has some problems under very low bandwidth cases. We have improved TEAR quite a bit from the initial work and TEAR is incorporated into an MPEG-4 stream player and stream server, and it seems to give pretty good performance over other existing streaming solutions. Other areas of exploration are multicast and wireless communication. Sorry I have not kept you guys up to date about the progress. I got tired of writing papers and have been digging into writing code.....Maybe it's time to come out and see the light :-) Injong > -----Original Message----- > From: end2end-interest-admin@postel.org > [mailto:end2end-interest-admin@postel.org]On Behalf Of Tianbo kuang > Sent: Saturday, March 03, 2001 6:55 PM > To: foo > Cc: end2end-interest@ISI.EDU; tcp-impl@lerc.nasa.gov; > end2end-interest@postel.org > Subject: Re: [e2e] TEAR. > > > Hi, > > I was just about to ask a question about TEAR. It seems unclear to me how > TEAR calculates RTT in the technical report (TEAR: TCP emulation at > receivers - flow control for multimedia streaming). Does the sender > calculate it and send it to the receiver, or does the receiver calculate > it (and how?)? In section 3.5, it does mention under the title _timeout_ > that, "this information is embedded in the packet header by the sender". > What does "this information" refer to?
> > Cheers, > > --Tianbo > > ------------------------------------------------------ > Kuang Tianbo > TRlabs > 111-116 Research Drive > Saskatoon, Saskatchewan S7N 3R3 > Tel: (306) 668-9325(office) (306) 343-9747 (home) > > kuang@sask.trlabs.ca > ------------------------------------------------------ > On Mon, 5 Mar 2001, foo wrote: > > > Date: Mon, 5 Mar 2001 19:42:52 -0600 > > From: foo > > To: end2end-interest@ISI.EDU, tcp-impl@lerc.nasa.gov > > Subject: [e2e] TEAR. > > Resent-Date: Mon, 5 Mar 2001 19:48:05 -0600 > > Resent-From: foo@eek.org > > Resent-To: end2end-interest@postel.org > > > > Does anyone have any experience with or thoughts about TEAR > (TCP Emulation > > at Receivers) developed by Injong Rhee at NCSU? > > > > http://www.csc.ncsu.edu/faculty/rhee/export/tear_page/ > > > > -Brian > > > > From touch at ISI.EDU Tue Mar 6 11:35:19 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] References: <200103061328.IAA19448@aland.bbn.com> Message-ID: <3AA53BF7.665AC944@isi.edu> Craig Partridge wrote: > > Side note: the BSTJ studies often used the following test procedure: > > * every time you picked up your phone handset, a delay was randomly > chosen from a pool of possible delay times > * on the phone was a button that you could press if you were unhappy > with the audio quality, and I think you were rewarded by having > the delay eliminated > > Something like this might work for gaming (though we'd have to get the > incentives right -- if pressing the button eliminates the delay, everyone > will do it all the time) A variant of that is that when you press the button, the delay goes down, but so does the bandwidth. There's your disincentive. Joe From J.Crowcroft at cs.ucl.ac.uk Wed Mar 7 01:24:26 2001 From: J.Crowcroft at cs.ucl.ac.uk (Jon Crowcroft) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] 
In-Reply-To: Your message of "Tue, 06 Mar 2001 08:00:35 GMT." <3377.983865635@cs.ucl.ac.uk> Message-ID: <7570.983957066@cs.ucl.ac.uk> interesting data- http://www.jisc-tau.ac.uk/linx-access.html has a nice graph of latency improving as local access speed increases and matches the in/out capacity better, but http://www.jisc-tau.ac.uk/usa-access.html shows how it aint that simple and as latent demand tracks supply, long haul latency goes up again...roughly speaking... j. From dpreed at reed.com Wed Mar 7 05:12:04 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <7570.983957066@cs.ucl.ac.uk> References: Message-ID: <5.0.2.1.2.20010307080716.024df030@mail.reed.com> If carriers at all points got "paid" based on average latency, the investment would be there to move latency to a better attractor, which would track latent demand. This is something I've been trying to get started for a long time. The movement to pay carriers based on traffic volume, rather than delay experienced, will always drive the system to its worst case latency. we need a closed loop congestion control that works in the time-scale of fiber deployment and LAN-speed upgrades. We don't have one that does this, and no one (other than me) seems to be even seriously thinking about it. I've even done something about it by advising some of the bandwidth exchanges. At 09:24 AM 3/7/01 +0000, Jon Crowcroft wrote: >interesting data- >http://www.jisc-tau.ac.uk/linx-access.html >has a nice graph of latency improving as local access speed increases >and matches the in/out capacity better, but >http://www.jisc-tau.ac.uk/usa-access.html >shows how it aint that simple and as latent demand tracks supply, long >haul latency goes up again...roughly speaking... > >j. 
- David -------------------------------------------- WWW Page: http://www.reed.com/dpr.html From david_zhang at ins.com Wed Mar 7 06:41:19 2001 From: david_zhang at ins.com (david_zhang@ins.com) Date: Thu Mar 25 11:59:34 2004 Subject: [e2e] (no subject) Message-ID: <00fc01c0a714$ac23b660$df59a4d0@C991473C> unsubscribe -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.postel.org/pipermail/end2end-interest/attachments/20010307/b000bc50/attachment.html From smd at ebone.net Wed Mar 7 08:46:42 2001 From: smd at ebone.net (Sean Doran) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] Message-ID: <20010307164642.B1BCB8A3@sean.ebone.net> [some graphs from www.jisc-tau.ac.uk] well, yes, one expects to move bottlenecks around from time to time, i don't suppose there's any chance of using RED on the US->Europe side to limit the delay, and doing a comparison to actual utilization? that would be very cool (but understandably may not be possible) Sean. From smd at ebone.net Wed Mar 7 08:55:34 2001 From: smd at ebone.net (Sean Doran) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] Message-ID: <20010307165534.C7AA28A3@sean.ebone.net> | If carriers at all points got "paid" based on average latency, the | investment would be there to move latency to a better attractor, which | would track latent demand. This is something I've been trying to get | started for a long time. The movement to pay carriers based on traffic | volume, rather than delay experienced, will always drive the system to its | worst case latency. It's not going to be cheaper to have an empty network than to have one with a bottleneck here and there. 
It's also not like there are that many applications that are so inelastic that latency is worth paying real extra $ to remove, when volume-over-time figures are "good enough" to make the per-available-mbps or 95th-percentile-utilization charges worth it, and there is no obvious killer app that is unamenable to adapting to the Internet's "rough approximation" of good performance, on the grounds that it's cheaper to do that than to do fancy QoS everywhere. | we need a closed loop congestion control that works in the time-scale of | fiber deployment and LAN-speed upgrades. We don't have one that does this, Well, so convince people it's cheaper than what we have now, without eliminating (much) utility. Start with explaining what it takes to have a bounded queueing delay at every potential or real bottleneck. Sean. From djw1005 at cam.ac.uk Wed Mar 7 15:54:57 2001 From: djw1005 at cam.ac.uk (Damon Wischik) Date: Thu Mar 25 11:59:34 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...] In-Reply-To: <20010307165534.C7AA28A3@sean.ebone.net> Message-ID: On Wed, 7 Mar 2001, Sean Doran wrote: > It's not going to be cheaper to have an empty network than to have one > with a bottleneck here and there. It's also not like there are that > many applications that are so inelastic that latency is worth paying > real extra $ to remove, when volume-over-time figures are "good enough" > to make the per-available-mbps or 95th-percentile-utilization charges > worth it > ... Might it be that by reducing latency one can improve the performance even of elastic traffic? TCP, for example, controls its rate using a feedback loop: the lower the latency, the tighter the control loop. There are results to suggest that tighter control loops will improve the stability of the network; this could be worth paying for. 
For references, see http://www.statslab.cam.ac.uk/~frank/int/ particularly

"Stability of distributed congestion control with heterogeneous feedback delays" (L.Massoulie)
"End-to-end congestion control for the Internet: delays and stability" (R.Johari and D.Tan.)

Damon Wischik.

From vijay at umbc.edu Thu Mar 8 08:41:08 2001
From: vijay at umbc.edu (Vijay Gill)
Date: Thu Mar 25 11:59:34 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To:
Message-ID:

On Sun, 4 Mar 2001, Alberto Cerpa wrote:

[snip snip]

Some data regarding flows from the Pacific Rim to the US. Keep in mind
that the sampling rate used was fairly low.

period: 03/04/2001 15:55:18 - 03/05/2001 15:55:19 PST

Protocol          Pkts      Pkts/sec         Bytes      Bits/sec
-------- ------------- ------------- ------------- -------------
tcp            4566586            52    1118739284        103585
icmp             97183             1      59403325          5500
udp             207550             2      33848925          3134
esp               2019             0       1216156           112
skip               673             0        337732            31
gre                298             0         94225             8
ipip                93             0         49844             4
ospf               108             0         33204             3
ipv6                14             0          1327             0
rsvp                 5             0           580             0

the rsvp is an anomaly.

/vijay

From P.Gevros at cs.ucl.ac.uk Thu Mar 8 09:17:49 2001
From: P.Gevros at cs.ucl.ac.uk (Panos GEVROS)
Date: Thu Mar 25 11:59:34 2004
Subject: [e2e] Re: UDP vs. TCP distribution
In-Reply-To: Your message of "Thu, 08 Mar 2001 11:41:08 EST."
Message-ID: <1602.984071869@cs.ucl.ac.uk>

it is true that providers care about the exchanged volume - and that packet statistics are much easier to gather compared to flow statistics - still "flow" statistics would be very interesting (e.g. active flows per T-sec, for some definition of flow)

does anyone know whether such data have been published anywhere

cheers,
Panos

Vijay Gill writes:
|On Sun, 4 Mar 2001, Alberto Cerpa wrote:
|
| [snip snip]
|
|
|Some data regarding flows from the Pacific Rim to the US. Keep in mind
|that the sampling rate used was fairly low.
|
|period: 03/04/2001 15:55:18 - 03/05/2001 15:55:19 PST
|Protocol          Pkts      Pkts/sec         Bytes      Bits/sec
|-------- ------------- ------------- ------------- -------------
|tcp            4566586            52    1118739284        103585
|icmp             97183             1      59403325          5500
|udp             207550             2      33848925          3134
|esp               2019             0       1216156           112
|skip               673             0        337732            31
|gre                298             0         94225             8
|ipip                93             0         49844             4
|ospf               108             0         33204             3
|ipv6                14             0          1327             0
|rsvp                 5             0           580             0
|
|the rsvp is an anomaly.
|
|/vijay

From braden at ISI.EDU Thu Mar 8 09:36:51 2001
From: braden at ISI.EDU (Bob Braden)
Date: Thu Mar 25 11:59:34 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
Message-ID: <200103081736.RAA11946@gra.isi.edu>

 *> ipv6                14             0          1327             0
 *> rsvp                 5             0           580             0
 *>
 *> the rsvp is an anomaly.
 *>
 *> /vijay
 *>

"anomaly"? What does that mean? If you mean statistically, it would appear that IPv6 is also an anomaly.

Bob Braden

From vijay at umbc.edu Thu Mar 8 09:39:16 2001
From: vijay at umbc.edu (Vijay Gill)
Date: Thu Mar 25 11:59:34 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To: <200103081736.RAA11946@gra.isi.edu>
Message-ID:

On Thu, 8 Mar 2001, Bob Braden wrote:

>
> *> ipv6                14             0          1327             0
> *> rsvp                 5             0           580             0
> *>
> *> the rsvp is an anomaly.
>
> "anomaly"? What does that mean? If you mean statistically, it would
> appear that IPv6 is also an anomaly.

Apologies for being cryptic. RSVP was a misconfiguration somewhere; once fixed, it did not come back. There are some people who fervently wish that v6 could also be fixed and not come back, so I'm watching the links with an eagle eye.

/vijay

From dpreed at reed.com Thu Mar 8 09:07:53 2001
From: dpreed at reed.com (David P. Reed)
Date: Thu Mar 25 11:59:35 2004
Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback be generated...]
In-Reply-To: <20010307165534.C7AA28A3@sean.ebone.net> Message-ID: <5.0.2.1.2.20010308120144.02fa35f0@mail.reed.com> At 05:55 PM 3/7/01 +0100, Sean Doran wrote: >It's not going to be cheaper to have an empty network than to have >one with a bottleneck here and there. This presumes that customers want the lowest price regardless of delay. Not true. And in any case, operating a network with queues mostly full (which increases utilization) a lot of the time is great strategy if you want to maximize profit when you are being paid by the byte (or by the portal access rate). Most applications benefit from low queueing delay, so this isn't about QoS differentiations. Only FTPs with no human in the loop want capacity with no delay constraint. From touch at ISI.EDU Thu Mar 8 11:20:45 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:35 2004 Subject: UDP vs. TCP distribution [was: Re: [e2e] Can feedback begenerated...] References: <5.0.2.1.2.20010308120144.02fa35f0@mail.reed.com> Message-ID: <3AA7DB8D.B0A9C8E7@isi.edu> "David P. Reed" wrote: > > At 05:55 PM 3/7/01 +0100, Sean Doran wrote: > >It's not going to be cheaper to have an empty network than to have > >one with a bottleneck here and there. > > This presumes that customers want the lowest price regardless of > delay. Not true. And in any case, operating a network with queues mostly > full (which increases utilization) a lot of the time is great strategy if > you want to maximize profit when you are being paid by the byte (or by the > portal access rate). > > Most applications benefit from low queueing delay, so this isn't about QoS > differentiations. Only FTPs with no human in the loop want capacity with > no delay constraint. NTP wants it too. Capacity can also be used to mask latency, PROVIDED the variability in the feedback loop can be described (if not predicted). E.g., even FTPs with people in the loop work - you send the whole directory when the person does a "cd" (see Infocom 1995). 
(both the above cases are related; the variability is derived from the application, rather than the network). Further, many interactive systems are more sensitive to variability in the latency itself than in the latency value (e.g., NTP, it turns out). Joe From P.Gevros at cs.ucl.ac.uk Thu Mar 8 13:04:13 2001 From: P.Gevros at cs.ucl.ac.uk (Panos GEVROS) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] Re: UDP vs. TCP distribution In-Reply-To: Your message of "Thu, 08 Mar 2001 11:20:45 PST." <3AA7DB8D.B0A9C8E7@isi.edu> Message-ID: <1805.984085453@cs.ucl.ac.uk> Joe Touch writes: | | |"David P. Reed" wrote: |> |> At 05:55 PM 3/7/01 +0100, Sean Doran wrote: |> >It's not going to be cheaper to have an empty network than to have |> >one with a bottleneck here and there. |> |> This presumes that customers want the lowest price regardless of |> delay. Not true. And in any case, operating a network with queues mostly |> full (which increases utilization) a lot of the time is great strategy if |> you want to maximize profit when you are being paid by the byte (or by the |> portal access rate). |> |> Most applications benefit from low queueing delay, so this isn't about QoS |> differentiations. Only FTPs with no human in the loop want capacity with |> no delay constraint. | |NTP wants it too. | |Capacity can also be used to mask latency, PROVIDED the variability |in the feedback loop can be described (if not predicted). E.g., even |FTPs with people in the loop work - you send the whole directory |when the person does a "cd" (see Infocom 1995). 
low delay (or jitter) is good but whether this should be the network design goal is a different matter; if capacity was not a constraint i would be willing to tolerate an extra delay to download the whole "structure" (or a specific subset) of a web site and do all the searching/browsing locally, also store it for future reference in case i find it interesting enough, and save subsequent network accesses

the fact that only a fraction of the information downloaded would be of interest may be irrelevant (because there are no capacity constraints)

of course this does not apply to using the web for transactions or dynamic content, so ftp-style transport may not be completely out of fashion yet,

it all depends on whether need for interactive experience or access to information proves to be the killer app - but if bounded delay is a necessity and customers are prepared to pay for it then we may be better off with something where one directly "dials the web server"

Panos

From touch at ISI.EDU Thu Mar 8 14:01:15 2001
From: touch at ISI.EDU (Joe Touch)
Date: Thu Mar 25 11:59:35 2004
Subject: [e2e] Re: UDP vs. TCP distribution
References: <1805.984085453@cs.ucl.ac.uk>
Message-ID: <3AA8012B.FC1BFE58@isi.edu>

Panos GEVROS wrote:
>
> low delay (or jitter) is good but whether this should be the network design
> goal is a different matter;
...
> it all depends on whether need for interactive experience or access to
> information proves to be the killer app - but if bounded delay is a necessity
> and customers are prepared to pay for it then we may be better of with
> something where one directly "dials the web server"

My concern is optimizing the entire network for a single class of applications as well. There are different goals - maximizing BW, minimizing latency, minimizing jitter. What matters is how flexible the infrastructure is in providing these, hopefully concurrently.
Joe From vern at ee.lbl.gov Fri Mar 9 16:39:19 2001 From: vern at ee.lbl.gov (Vern Paxson) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] paper on "Difficulties in Simulating the Internet" now available Message-ID: <200103100039.f2A0dJN00605@daffy.ee.lbl.gov> The following paper is to appear in IEEE/ACM Transactions on Networking. It's a revision of a previous paper titled "Why We Don't Know How to Simulate the Internet". Vern & Sally Difficulties in Simulating the Internet Sally Floyd & Vern Paxson AT&T Center for Internet Research at ICSI (ACIRI) {floyd,vern}@aciri.org http://www.aciri.org/vern/papers/sim-difficulty.TON.2001.ps.gz http://www.aciri.org/vern/papers/sim-difficulty.TON.2001.pdf Simulating how the global Internet behaves is an immensely challenging undertaking because of the network's great heterogeneity and rapid change. The heterogeneity ranges from the individual links that carry the network's traffic, to the protocols that interoperate over the links, to the "mix" of different applications used at a site, to the levels of congestion seen on different links. We discuss two key strategies for developing meaningful simulations in the face of these difficulties: searching for invariants, and judiciously exploring the simulation parameter space. We finish with a brief look at a collaborative effort within the research community to develop a common network simulator. From vijay at umbc.edu Fri Mar 9 23:00:23 2001 From: vijay at umbc.edu (Vijay Gill) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] Some stats broken down by protocol In-Reply-To: Message-ID: Based on some queries regarding protocols and flows, here are some of the protocol breakdowns. > Some data regarding flows from the Pacific Rim to the US. Keep in mind > that the sampling rate used was fairly low. 
> > period: 03/04/2001 15:55:18 - 03/05/2001 15:55:19 PST > Protocol Pkts Pkts/sec Bytes Bits/sec > -------- ------------- ------------- ------------- ------------- > tcp 4566586 52 1118739284 103585 > icmp 97183 1 59403325 5500 > udp 207550 2 33848925 3134 > esp 2019 0 1216156 112 > skip 673 0 337732 31 > gre 298 0 94225 8 > ipip 93 0 49844 4 > ospf 108 0 33204 3 > ipv6 14 0 1327 0 > rsvp 5 0 580 0 > > the rsvp is an anomaly. Adding to the above. Regarding Panos' query about flows: Here is what the statman sayeth (josh wepman) 1. How hard would it be to quantify the traffic in terms of "flow" (src/dst/port) pair? "Flow" statistics would be very interesting (e.g active flows per T-sec, for some definition of flow). I'm working with some folks regarding TCP ECN and flow data would be very useful. Number flows (N) at time (T) (a snapshot) or Number flows (N) over time T1 -> T2 (counter over time) Realtime data is of course out of the question. We only get "expired" flow information exported to cflowd. Historical values can be gotten with a bit of work. This is not data available in the cflowd tables maintained in ARTS data. #Flows is not an attribute maintained. So we have to view the raw flow files. A snapshot could be obtained by counting all flows in flows files whose start/end time encompass T(snapshot). The latter (counter over time) could be determined by counting all the flows seen from Time1 to time2. The value is NOT real as a representation of flows on a link. They are a value based on "flows" exported from the router as determined by a router. A flow could be terminated and exported because a FIN occurred, because it was idle for time T, or because the total time of the flow exceeded a limit time value. In order to do either of the above, it needs to be clear that the value represented is NOT flows on a link at a given time, but flows seen exported based on flow-export criteria. 
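The two counts described above (a snapshot at time T and a counter over T1 -> T2) can be sketched roughly as follows. This is only an illustration: the `FlowRecord` layout and the function names are hypothetical and are not cflowd or ARTS structures, and "seen from T1 to T2" is interpreted here as "overlaps the window at all". As the text says, the result counts exported flows, not flows truly present on the link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One expired flow as exported by the router (hypothetical layout)."""
    start: float  # time of the first packet of the flow, in seconds
    end: float    # time of the last packet of the flow, in seconds


def flows_at(records, t):
    """Snapshot: exported flows whose start/end times encompass t."""
    return sum(1 for r in records if r.start <= t <= r.end)


def flows_between(records, t1, t2):
    """Counter over time: exported flows that overlap [t1, t2] at all."""
    return sum(1 for r in records if r.start <= t2 and r.end >= t1)


# Made-up records, only to exercise the two functions.
records = [
    FlowRecord(0, 10),
    FlowRecord(5, 6),
    FlowRecord(20, 30),
    FlowRecord(8, 25),
]
snapshot = flows_at(records, 9)
window = flows_between(records, 11, 19)
print(snapshot, window)
```

A long-lived flow that the router splits at its maximum-lifetime timer would show up as several records here, which is one reason the counts are export artifacts rather than link truth.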
It should also be mentioned that the functionality to do this does not currently exist in cflowd, so any efforts here would have to be part of a larger Flow Development effort.

Re: DNS

To or from port 53.

More generically, we can use artsprotos to characterize tcp vs udp vs icmp vs whatever else is seen. The time domain can be manipulated to what you may be looking for. Since protocol is an ARTS stored value, we have historical data to work with. Likewise, DNS (port 53 TCP/UDP) is maintained in ARTS and available via artsportms and artsports.

We can state from T1 -> T2, what was the protocol distribution, and for a set of ports, what were the port distributions. As with the first question, we do not have #Flows, but we do have pkt/byte data.

# /usr/local/arts/bin/artsports daily/arts.20010308.ports
router: blah blah blah
ifIndex: 27
period: 03/07/2001 15:55:20 - 03/08/2001 15:55:18 PST
selected ports: 20-21,53,80,119,443

Port            InPkts       InBytes       OutPkts      OutBytes
-----    ------------- ------------- ------------- -------------
http           5920632     422354154       1350887    1368382195
ELSE           1055537     252207931          3039       2648398
nntp            269884      80456315         32733       2257472
ftp-data         39530       4840976         52149      76430542
domain           69178       4482451         82340      13109974
https             9266       1253745          4470       2222407
ftp              10868        560030          6164        319734

This data was based on a 1:64 packet sampling rate and has not been extrapolated to 1:1 values. An optimal N for sampling has not been determined for this class of link, so the degree of skew in the above numbers cannot be stated with any certainty. If we assumed that 1:64 sampling did correctly represent the true population, then multiplying out the values by 64 would give you the approximate real values.

--end statman

Hope this was useful.
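As a small worked example of the "multiply by 64" step, here is a sketch that scales two rows copied from the artsports output above. The dictionary layout and the `extrapolate` helper are illustrative only, and the result is an estimate that holds only under the stated assumption that 1:64 sampling represents the true population.

```python
SAMPLING_N = 64  # 1:64 packet sampling, as stated in the message

# (InPkts, InBytes, OutPkts, OutBytes), copied from the artsports output
sampled = {
    "http": (5920632, 422354154, 1350887, 1368382195),
    "domain": (69178, 4482451, 82340, 13109974),
}


def extrapolate(counters, n=SAMPLING_N):
    """Scale sampled counters by n to approximate unsampled totals."""
    return tuple(c * n for c in counters)


estimated = {port: extrapolate(c) for port, c in sampled.items()}
for port, values in estimated.items():
    print(port, values)
```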
/vijay From gr224 at hermes.cam.ac.uk Sat Mar 10 03:37:53 2001 From: gr224 at hermes.cam.ac.uk (Gaurav Raina) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] paper on "Difficulties in Simulating the Internet" now available In-Reply-To: <200103100039.f2A0dJN00605@daffy.ee.lbl.gov> Message-ID: On Fri, 9 Mar 2001, Vern Paxson wrote: > Difficulties in Simulating the Internet .... > and judiciously exploring the simulation parameter space. We finish with a > brief look at a collaborative effort within the research community to develop > a common network simulator. Apologies if I am merely stating the obvious, but along with developing simulation tools I think it is *imperative* to try and develop theoretical tools which might give some rules of thumb on how the results could scale to a network as large and complex as the Internet. Apart from the obvious industries involved in Internet Research - *some* of the academic reseach groups are : http://netlab.caltech.edu/ http://www-net.cs.umass.edu/ http://www.statslab.cam.ac.uk/~frank/int/ http://comm.csl.uiuc.edu:80/~srikant/pub.html/ An exhaustive list is not possible... Might it be a good idea to consider having a common database/pool for research papers/preprints dealing with the different research topics? Like the way the physics community has the Los Alamos archive. Gaurav From hgs at cs.columbia.edu Sat Mar 10 06:01:14 2001 From: hgs at cs.columbia.edu (Henning G. Schulzrinne) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] paper on "Difficulties in Simulating the Internet" nowavailable References: Message-ID: <3AAA33AA.5FCAA3B9@cs.columbia.edu> Gaurav Raina wrote: > Might it be a good idea to consider having a common database/pool for > research papers/preprints dealing with the different research topics? Like > the way the physics community has the Los Alamos archive. 
Well, there's netbib, with about 55,000 networking-related papers,
http://www.cs.columbia.edu/~hgs/netbib

>
> Gaurav
>
>

--
Henning Schulzrinne   http://www.cs.columbia.edu/~hgs

From J.Crowcroft at cs.ucl.ac.uk Sun Mar 11 05:58:05 2001
From: J.Crowcroft at cs.ucl.ac.uk (Jon Crowcroft)
Date: Thu Mar 25 11:59:35 2004
Subject: [e2e] paper on "Difficulties in Simulating the Internet" now available
In-Reply-To: Your message of "Sat, 10 Mar 2001 09:01:14 EST." <3AAA33AA.5FCAA3B9@cs.columbia.edu>
Message-ID: <18880.984319085@cs.ucl.ac.uk>

In message <3AAA33AA.5FCAA3B9@cs.columbia.edu>, "Henning G. Schulzrinne" typed:

 >>> Might it be a good idea to consider having a common database/pool for
 >>> research papers/preprints dealing with the different research topics? Like
 >>> the way the physics community has the Los Alamos archive.

there was/is an attempt to do exactly this, but it takes time to build - i'm not sure what its current status is

in the meantime, henning is right - his is the nearest we have, and given the efforts of IEEE Infocom and ACM SIGCOMM and other related conferences to encourage the archival of conference proceedings online for all, you can generally find most timely information via netbib (and citeseer) now without recourse to walking over to your real library (btw, you have quite a good one below that ugly tower in cambridge that dominates the skyline from many approaches:-)

 >>Well, there's netbib, with about 55,000 networking-related papers,
 >>http://www.cs.columbia.edu/~hgs/netbib

& i personally feel that distributed laissez-faire approaches work better for a diverse growing community than the focused approach that math & physics people have enjoyed in their social context...

cheers

jon

From hgs at cs.columbia.edu Sun Mar 11 06:28:19 2001
From: hgs at cs.columbia.edu (Henning G.
Schulzrinne) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] paper on "Difficulties in Simulating the Internet" nowavailable References: <18880.984319085@cs.ucl.ac.uk> Message-ID: <3AAB8B83.E0C0D23A@cs.columbia.edu> Jon Crowcroft wrote: > > > in the meantime, henning is right - his is the nearest we have, and given > the efforts of IEEE Infocom and ACM SIGCOMM and other releated > conferences to encourage the archival of conference proceedings online > for all, you can generally find most timely information via netbib > (and citeseer) now without recourse to walking over the your real > libray (btw, you have quite a good one below that ugly tower in > cambridge that domiantes the skyline from many approaches:-) Also, if your favorite paper (or your own paper) is not yet in netbib, you're encouraged to enter this information (same web site). > > >>Well, there's netbib, with about 55,000 networking-related papers, > >>http://www.cs.columbia.edu/~hgs/netbib > > & i personally freel that distributed lassez-fair approaches work better > for a diverse growing community than the focuessed approach that math > & phsyics people have enjoyed in their social context... There is also the http://arXiv.org service, but from what I can tell, it is being used extremely rarely by this community. > > > cheers > > jon -- Henning Schulzrinne http://www.cs.columbia.edu/~hgs From demir at usc.edu Wed Mar 14 15:36:31 2001 From: demir at usc.edu (demir) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] [Diffserv-interest] A question on Adaptive Protocols vs Expected Service Classes of Diffserv Message-ID: Hi, There has been vast amount of research on how TCP will react on top of services based on AF/AF-alike PHBs. However, I am not aware of a research that elaborates TCP on top of EF/EF-alike PHBs from service perspective (it seems this is unneccessary at all???). I am aware of that TCP is a widely implemented and used protocol for congestion control and avoidance. 
I assume, in a "short" time scale, TCP seems reasonable to be used for AF PHB-based services cause TCP-friendly traffic conditioners would refine the TCP's behavior. It seems to me that, may be, we need different adaptive protocols for different service classes (TCP has been developed for the "best-effort" service class). Any ideas/insights/comments? I appreciate very much. Alper K. Demir From nichols at packetdesign.com Wed Mar 14 16:25:38 2001 From: nichols at packetdesign.com (Kathleen Nichols) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] [Diffserv-interest] A question on Adaptive Protocols vs ExpectedService Classes of Diffserv References: Message-ID: <3AB00C02.A6C2BA33@packetdesign.com> Kedar Poduri carried out some simulations like this quite a while back when we were talking about something called a "virtual leased line" built from the EF PHB described in RFC2598. This is real easy to do (of course, you need a shaper at the edge, but Van's been saying that since the origins of this as his "premium" service). You can look at slide 23 of the talk at: http://www.nren.nasa.gov/CFP/nichols_pres/index.htm, a NASA NREN workshop on QoS. It's supposed to "look like a wire". Kathie demir wrote: > > Hi, > There has been vast amount of research on how TCP will react on top of > services based on AF/AF-alike PHBs. However, I am not aware of a research > that elaborates TCP on top of EF/EF-alike PHBs from service perspective > (it seems this is unneccessary at all???). I am aware of that TCP is a > widely implemented and used protocol for congestion control and > avoidance. I assume, in a "short" time scale, TCP seems reasonable to be > used for AF PHB-based services cause TCP-friendly traffic conditioners > would refine the TCP's behavior. It seems to me that, may be, we need > different adaptive protocols for different service classes (TCP has been > developed for the "best-effort" service class). Any > ideas/insights/comments? I appreciate very much. > > Alper K. 
Demir From bsikdar at networks.ecse.rpi.edu Wed Mar 14 16:42:07 2001 From: bsikdar at networks.ecse.rpi.edu (bsikdar@networks.ecse.rpi.edu) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] Vegas on Linux In-Reply-To: Message-ID: Hi, Could anyone please direct me to any TCP Vegas implementations for Linux? Or for any other platforms? And are there any differences in these implementations? Thanks a lot, Biplab Sikdar Dept. of ECSE Rensselaer Polytechnic Inst., Troy NY 12180 From demir at usc.edu Wed Mar 14 16:44:28 2001 From: demir at usc.edu (demir) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] [Diffserv-interest] A question on Adaptive Protocols vs ExpectedService Classes of Diffserv In-Reply-To: <3AB00C02.A6C2BA33@packetdesign.com> Message-ID: Kathie, From nseddigh at tropicnetworks.com Thu Mar 15 07:53:14 2001 From: nseddigh at tropicnetworks.com (Nabil Seddigh) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] [Diffserv-interest] A question on Adaptive Protocols vs ExpectedService Classes of Diffserv References: Message-ID: <3AB0E56A.1C9DD55C@tropicnetworks.com> There has been work on both TCP modifications as well as "Intelligent" or "TCP-friendly" traffic conditioners to address issues with TCP over the AF PHB. We discovered some interesting results in our experiments with the latter approach - send me email if you're interested. In principle, it should be easier to incorporate changes in edge device traffic conditioners than to affect TCP standard modifications. However, at the same time, there has been limited implementation of intelligent traffic conditioners or policers in deployed products. Best, Nabil Seddigh demir wrote: > My intention > of asking this question was "do we need such an (adaptive) protocol > complexity"? If not, then "do we need to have a different levels of > complexity in the protocols that can used for each service class"? /Or > "One uniform adaptive protocol will suffice cause traffic conditioners can > take care of this"? 
I assume all is a possibility. What would be a proper
> trend to go? Thank you very much
>

From floyd at aciri.org Thu Mar 15 09:08:56 2001
From: floyd at aciri.org (Sally Floyd)
Date: Thu Mar 25 11:59:35 2004
Subject: [e2e] two questions about the Internet
Message-ID: <200103151708.f2FH8u523767@elk.aciri.org>

I maintain a web page
    http://www.aciri.org/floyd/questions.html
of (mostly unanswered) questions about the Internet. I just posted two new questions to that page, and I thought I would also mention them here, in case anyone on this list knows any (partial) answers to any of them.

The new questions:

ROUND-TRIP TIMES (HOPS, NUMBER OF ASes) OF PACKETS?

For packets on a particular link, each packet could be assigned an estimated round-trip time, a number of ASes for the end-to-end path, etc, based on the IP source and destination addresses for that packet. For packets on a particular link, what can we say about the distribution of round-trip times, or of the number of hops traversed, or the number of ASes traversed, or number of continents traversed, or (this is harder) the number of congested links traversed?

[Example: For link X, can we say that most packets/bytes stay on that continent? Or that most packets have a minimum round-trip time of at least S seconds? Or that most packets on that link during this period of time traverse more than one congested link on their path from source to destination?]

PERIODS OF EXTREME CONGESTION AT A ROUTER?

For those routers in the network that do occasionally experience congestion, how can we characterize their rare periods of *extreme* congestion (defining extreme congestion, say, as packet drop rates above 5%)? How frequently do these periods of extreme congestion occur, and how long do they last? What fraction can be attributed to flash crowds? to Denial of Service attacks? to fiber cuts or other routing changes?
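To make the second question concrete, here is one possible way to pull "periods of extreme congestion" out of a router's per-interval drop-rate series, using the 5% threshold suggested above. It is a sketch under assumptions: the function name, the fixed measurement interval, and the sample series are all hypothetical, and a real study would also need to decide how to merge episodes separated by a brief dip.

```python
def extreme_episodes(drop_rates, interval_s, threshold=0.05):
    """Return (start_time_s, duration_s) for each run of consecutive
    measurement intervals whose drop rate exceeds the threshold."""
    episodes = []
    run_start = None
    for i, rate in enumerate(drop_rates):
        if rate > threshold:
            if run_start is None:
                run_start = i  # a new episode begins in this interval
        elif run_start is not None:
            episodes.append((run_start * interval_s,
                             (i - run_start) * interval_s))
            run_start = None
    if run_start is not None:
        # The series ended while an episode was still in progress.
        episodes.append((run_start * interval_s,
                         (len(drop_rates) - run_start) * interval_s))
    return episodes


# Made-up drop rates, one value per 60-second interval.
rates = [0.00, 0.01, 0.07, 0.09, 0.02, 0.00, 0.06]
episodes = extreme_episodes(rates, interval_s=60)
print(episodes)
```

The episode list gives both the frequency (its length over the observation period) and the durations asked about; attributing causes would need outside information.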
Many thanks, - Sally -------------------------------- http://www.aciri.org/floyd/ -------------------------------- From dovrolis at mail.eecis.udel.edu Thu Mar 15 14:33:18 2001 From: dovrolis at mail.eecis.udel.edu (Constantinos Dovrolis) Date: Thu Mar 25 11:59:35 2004 Subject: [e2e] two questions about the Internet In-Reply-To: <200103151708.f2FH8u523767@elk.aciri.org> Message-ID: Sally, I may have a (very partial) answer to the first question, i.e., what is the distribution of round-trip times (RTTs) for the packets on a certain link. A couple of initial "disclaimers": a) an RTT can be only associated to a packet of a closed-loop protocol, and so our measurements only looked at TCP packets b) our measurements do not refer to per-packet RTTs, but to per-connection RTTs. It is likely that there are important differences in these two distributions (short-RTT flows may tend to carry more data, causing the per-packet RTT distribution to be heavier on lower RTT values). So, the measurements that I refer to were done in the summer of 99 at CAIDA. We were processing traffic traces captured from passive monitors on certain links (note that we get two different traces, one for each direction of the link). In a certain trace, we were estimating the RTT of each TCP connection using the following two rules: a) if we observe the flow from the caller to the callee, the RTT is estimated as the time interval from the SYN to the SYN-ACK. b) if we observe the flow from the callee to the caller (which is usually the traffic from the server to the client), the RTT is estimated from the time spacing of the first 2 or 3 slow-start bursts. The code to do this is tricky (I can send to you, or to anyone else, the code if you want to play with this). So, using these tricks we were measuring the distribution of RTTs in the TCP connections that were present in each (unidirectional) trace. 
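Rule (a) above can be sketched roughly as follows. This is only an illustration, not the CAIDA code mentioned in the message: the `Pkt` record layout and the function name `handshake_rtts` are hypothetical. It also assumes that, in a trace holding only the caller-to-callee direction, the handshake reply that can actually be timed is the caller's first pure ACK (sent on receipt of the SYN-ACK), so the SYN-to-ACK spacing stands in for the RTT. Connections whose SYN was retransmitted are skipped, since the sample would be ambiguous.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Pkt(NamedTuple):
    """Hypothetical per-packet record from a one-direction passive trace."""
    ts: float    # capture timestamp, in seconds
    src: str
    sport: int
    dst: str
    dport: int
    syn: bool    # SYN flag set
    ack: bool    # ACK flag set


ConnKey = Tuple[str, int, str, int]


def handshake_rtts(packets: Iterable[Pkt]) -> Dict[ConnKey, float]:
    """Estimate one RTT per connection from the caller's SYN and first ACK."""
    syn_time: Dict[ConnKey, float] = {}
    ambiguous: Set[ConnKey] = set()
    rtts: Dict[ConnKey, float] = {}
    for p in packets:
        key = (p.src, p.sport, p.dst, p.dport)
        if p.syn and not p.ack:
            if key in syn_time:
                # Retransmitted SYN: we cannot tell which SYN the reply
                # answers, so do not produce a sample for this connection.
                ambiguous.add(key)
            else:
                syn_time[key] = p.ts
        elif p.ack and not p.syn:
            if key in syn_time and key not in rtts and key not in ambiguous:
                # First pure ACK from the caller completes the handshake.
                rtts[key] = p.ts - syn_time[key]
    return rtts


if __name__ == "__main__":
    trace = [
        Pkt(0.000, "10.0.0.1", 1025, "10.0.0.2", 80, True, False),
        Pkt(0.080, "10.0.0.1", 1025, "10.0.0.2", 80, False, True),
    ]
    for conn, rtt in handshake_rtts(trace).items():
        print(conn, f"{rtt * 1000:.1f} ms")
```

Rule (b), inferring the RTT from the spacing of slow-start bursts in the callee-to-caller direction, is harder and is not attempted here.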
Just as an example of the distributions that we were getting, take a look at: http://www.cis.udel.edu/~dovrolis/rtt-sdsc.eps The graph shows two RTT distributions, one for each direction of the OC-3 link that used to connect UCSD with CERFnet. A few major points from the graph: - About 35% of the connections have RTT < 50ms - About 60% of the connections have RTT < 100ms - There is a significant fraction of connections (20-30%) with RTT>200ms (which is probably close to the upper bound for any type of interactive applications). - About 10% of the RTTs are quite large (some of them on the order of multiple seconds), which may indicate errors in our measurement methodology. This is why I did not include that fraction of RTTs in the graph. Some very interesting measurements on this subject also appear in Mark Allman's "A Web server's view of the transport layer" published in CCR Oct-2000. Mark's measurements originate from traces of the server's traffic (instead of a passive monitor in the network). Also, he could measure the RTTs more accurately based on the time interval between a non-retransmitted packet and the corresponding ACK. Obviously we cannot do the same, because we don't have the flow of ACKs in the trace. It is interesting that Mark's measurements (see Figure 9) are not *very* different from the graph that I mentioned before. Specifically, his graph shows: - About 35% of the RTTs < 100msec - About 60-70% of the RTTs < 200msec - About 85% of the RTTs < 500msec. Of course Mark's measurements/analysis were much more methodically done (my measurements were only done to get some reasonable values for simulations about other stuff). I hope that this helps. I am also very interested in answers to the rest of your questions.
Constantinos Computer and Information Sciences - University of Delaware http://www.cis.udel.edu/~dovrolis/ On Thu, 15 Mar 2001, Sally Floyd wrote: > I maintain a web page > http://www.aciri.org/floyd/questions.html > of (mostly unanswered) questions about the Internet. > I just posted two new questions to that page, and I thought I would > also mention them here, in case anyone on this list knows any > (partial) answers to any of them. > > The new questions: > > ROUND-TRIP TIMES (HOPS, NUMBER OF ASes) OF PACKETS? > For packets on a particular link, each packet could be assigned an > estimated round-trip time, a number of ASes for the end-to-end > path, etc, based on the IP source and destination addresses for > that packet. For packets on a particular link, what can we say > about the distribution of round-trip times, or of the number of hops > traversed, or the number of ASes traversed, or number of continents > traversed, or (this is harder) the number of congested links traversed? > > [Example: For link X, can we say that most packets/bytes stay on > that continent? Or that most packets have a minimum round-trip > time of at least S seconds? Or that most packets on that link > during this period of time traverse more than one congested link > on their path from source to destination?] > > PERIODS OF EXTREME CONGESTION AT A ROUTER? > For those routers in the network that do occationally experience > congestion, how can we characterize their rare periods of *extreme* > congestion (defining extreme congestion, say, as packet drop rates > above 5%)? How frequently to these periods of extreme congestion > occur, and how long do they last? What fraction can be attributed > to flash crowds? to Denial of Service attacks? to fiber cuts or > other routing changes? 
> > Many thanks, > - Sally > -------------------------------- > http://www.aciri.org/floyd/ > -------------------------------- > From nahum at watson.ibm.com Thu Mar 15 15:51:11 2001 From: nahum at watson.ibm.com (Erich Nahum) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet In-Reply-To: from "Constantinos Dovrolis" at Mar 15, 2001 05:33:18 PM Message-ID: <200103152351.SAA33906@orinoco.watson.ibm.com> Constantinos Dovrolis writes: > > It is interesting that Mark's measurements (see Figure 9) are not *very* > different from the graph that I mentioned before. Specifically, > his graph shows: > - About 35% of the RTTs < 100msec > - About 60-70% of the RTTs < 200msec > - About 85% of the RTTs < 500msec. > Of course Mark's measurements/analysis were much more methodically > done (my measurements were only done to get some reasonable values > for simulations about other stuff). Srini Seshan (when he was here at Watson) had some packet trace data from the 1996 Olympic Web server, but it's a bit old now. The technique was similar to what Mark Allman did. For the record, though, it had: - 25% of the RTTs < 115 ms - 50% of the RTTs < 338 ms - 75% of the RTTs < 778 ms The RTTs are obviously going to vary depending on what kind of connection you have (T3, OC-768) as well as where your clients are (NY, CA, Greece). -Erich -- Erich M. Nahum IBM T.J. Watson Research Center Networking Research P.O. Box 704 nahum@watson.ibm.com Yorktown Heights NY 10598 From ggm at dstc.edu.au Thu Mar 15 16:23:51 2001 From: ggm at dstc.edu.au (George Michaelson) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet In-Reply-To: Your message of "Thu, 15 Mar 2001 18:51:11 EST." <200103152351.SAA33906@orinoco.watson.ibm.com> Message-ID: <12360.984702231@dstc.edu.au> Srini Seshan (when he was here at Watson) had some packet trace data from the 1996 Olympic Web server, but it's a bit old now. The technique was similar to what Mark Allman did. 
For the record, though, it had: - 25% of the RTTs < 115 ms - 50% of the RTTs < 338 ms - 75% of the RTTs < 778 ms The RTTs are obviously going to vary depending on what kind of connection you have (T3, OC-768) as well as where your clients are (NY, CA, Greece). -Erich The 96 Olympics were hosted behind multiple backends, geographically distributed? I thought Nagano was, I went to a seminar by IBM on it. Because if so, there were presumably frontend boxes making decisions on backend server, which would either intuit best-fit path or else map it into some simple model like BGP AS or link-based region and so skew RTT in favour of shorter-hop and/or lighter-load hosts. -George -- George Michaelson | DSTC Pty Ltd Email: ggm@dstc.edu.au | University of Qld 4072 Phone: +61 7 3365 4310 | Australia Fax: +61 7 3365 4311 | http://www.dstc.edu.au From nahum at watson.ibm.com Thu Mar 15 18:04:46 2001 From: nahum at watson.ibm.com (Erich Nahum) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet In-Reply-To: <12360.984702231@dstc.edu.au> from "George Michaelson" at Mar 16, 2001 10:23:51 AM Message-ID: <200103160204.VAA53100@orinoco.watson.ibm.com> George Michaelson writes: > > The 96 Olympics were hosted behind multiple backends, geographically > distributed? I thought Nagano was, I went to a seminar by IBM on it. > > Because if so, there were presumably frontend boxes making decisions > on backend server, which would either intuit best-fit path or else > map it into some simple model like BGP AS or link-based region and > so skew RTT in favour of shorter-hop and/or lighter-load hosts. 96 (Atlanta) was the first olympics that IBM hosted, and I believe it was just one complex in Southbury, CT. 98 (Nagano) and 2000 (Australia) were hosted by 4 main sites: Bethesda (for Europe), Schaumburg IL and someplace in Ohio (for the Americas) and Tokyo (for Asia). The request routing was done on a very coarse-grained level, basically through the routing tables.
E.g., if you were in Europe, olympics.com pointed to Bethesda. I think it was done at the routing layer and not through DNS. The front ends of each cluster were IBM network dispatcher TCP sprayers, which routed to back-end nodes on the same LAN. So I believe the RTT distribution seen by a complex would be the same across nodes within that cluster. -Erich -- Erich M. Nahum IBM T.J. Watson Research Center Networking Research P.O. Box 704 nahum@watson.ibm.com Yorktown Heights NY 10598 From hari at chive.lcs.mit.edu Thu Mar 15 19:02:17 2001 From: hari at chive.lcs.mit.edu (Hari Balakrishnan) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet In-Reply-To: Message from Erich Nahum of "Thu, 15 Mar 2001 21:04:46 EST." <200103160204.VAA53100@orinoco.watson.ibm.com> Message-ID: <200103160302.f2G32Ha28828@chive.lcs.mit.edu> Erich, > George Michaelson writes: > > > > The 96 Olympics were hosted behind multiple backends, geographically > > distributed? I thought Nagano was, I went to a seminar by IBM on it. > > > > Because if so, there were presumably frontend boxes making decisions > > on backend server, which would either intuit best-fit path or else > > map it into some simple model like BGP AS or link-based region and > > so skew RTT in favour of shorter-hop and/or ligher-load hosts. > > 96 (Atlanta) was the first olympics that IBM hosted, and I believe it was > just one complex in Southbury, CT. For the 1996 Atlanta games, IBM actually ran multiple servers for the Olympics, but they weren't transparent (i.e., they had distinct DNS names). The data being referred to here was collected at Southbury, CT. The other sites were, if I recall, at Keio (Japan), Cornell (NY), Karlsruhe (Germany), and Hursley (UK). The Southbury site was connected via T3 links to 4 US NAPs: Chicago (Bellcore & Ameritech), SF Bay Area (Bellcore and PacBell), NY (Sprint), and DC (MFS Datanet). 
> 98 (Nagano) and 2000 (Australia) > were hosted by 4 main sites: Bethesda (for Europe), Schaumburg IL and > someplace in Ohio (for the Americas) and Tokyo (for Asia). The > request routing was done on a very coarse-grained level, basically > through the routing tables. E.g., if you were in Europe, > olympics.com pointed to Bethesda. I think it was done at > the routing layer and not through DNS. > The front ends of each cluster were IBM network dispatcher TCP > sprayers, which routed to back-end nodes on the same LAN. So > I believe the RTT distribution seen by a complex would be the same > across nodes within that cluster. Sounds about right, if you believe the load-balancing was working correctly. :} (I'm not saying it wasn't!) Hari > > -Erich > > -- > Erich M. Nahum IBM T.J. Watson Research Center > Networking Research P.O. Box 704 > nahum@watson.ibm.com Yorktown Heights NY 10598 From widmer at informatik.uni-mannheim.de Fri Mar 16 05:37:48 2001 From: widmer at informatik.uni-mannheim.de (Joerg Widmer) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] SIG on Networked Computer Games Message-ID: <3AB2172C.EA67F19A@informatik.uni-mannheim.de> Hi all, With respect to the recent discussion about networked computer games this SIG might be of interest. Cheers, Jörg/Martin Call for Participation Special Interest Group on Networked Computer Games (SIG NetGame) Over the past three to four years networked computer games have been a tremendous commercial success. Games like Ultima Online, Everquest, Doom, Quake, Diablo II and others have attracted an audience of several million players, worldwide. They are one of the few Internet services for which end users are actually willing to pay money. As the Internet becomes ubiquitous through wireless and/or cheaper Internet access the audience for networked computer games will increase rapidly, creating a mass market with a multi-billion dollar volume.
However, most - if not all - of the successful networked computer games have encountered a large number of technical challenges that are inherent to this application area. These range from inadequate support by network and transport protocols to consistency problems and security breaches (or cheating as players prefer to call it). At the same time scientists have begun to discover networked computer games as an extremely challenging and rewarding area of research. What makes this area of research particularly fascinating is that solutions found for networked computer games tend to solve related problems in other areas such as computer supported collaborative work, distance education and telemedicine. It is the aim of this SIG to bring together developers of commercial and non-commercial networked computer games, service providers, scientists, and interested individuals in order to discuss - and possibly solve - technical challenges of networked computer games. Topics of interest include, but are certainly not limited to: - network and transport protocols - application-level protocol design - architectures for service providers - consistency mechanisms - security / cheating prevention - middle-ware (e.g. Direct Play) - billing and charging ... for networked computer games. You can subscribe to the NetGame mailing list through the NetGame webpage: http://www.informatik.uni-mannheim.de/netgame/index.html or by sending a mail to: netgame-l-request@pi4.informatik.uni-mannheim.de with the following line in the BODY of the message: subscribe I'm looking forward to interesting discussions on netgame-l. Sincerely, Martin Mauve Disclaimer: This SIG is currently not affiliated with any other organization. From oleg at inforocket.com Fri Mar 16 06:46:43 2001 From: oleg at inforocket.com (Oleg Vishnepolsky) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet In-Reply-To: <200103160204.VAA53100@orinoco.watson.ibm.com> Message-ID: Erich M. 
Nahum writes: >96 (Atlanta) was the first olympics that IBM hosted, and I believe it was >just one complex in Southbury, CT. 98 (Nagano) and 2000 (Australia) >were hosted by 4 main sites: Bethesda (for Europe), Shaumberg IL and >someplace in Ohio (for the Americas) and Tokyo (for Asia). The >request routing was done on a very course-grain level, basically >through the routing tables. E.g., if you were in Europe, >olympics.com pointed to Bethesda. I think it was done at >the routing layer and not through DNS. How is it even possible not to involve DNS ? If DNS was giving out the same IP address to olympics.com irrespective of the where requests came from, then routing would have been real tricky, to say the least. Oleg Vishnepolsky From pingpan at juniper.net Fri Mar 16 09:40:48 2001 From: pingpan at juniper.net (Ping Pan) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet References: <200103151708.f2FH8u523767@elk.aciri.org> Message-ID: <3AB25020.99C75551@cs.columbia.edu> Sally Floyd wrote: > > The new questions: > > ROUND-TRIP TIMES (HOPS, NUMBER OF ASes) OF PACKETS? > For packets on a particular link, each packet could be assigned an > estimated round-trip time, a number of ASes for the end-to-end > path, etc, based on the IP source and destination addresses for > that packet. For packets on a particular link, what can we say > about the distribution of round-trip times, or of the number of hops > traversed, or the number of ASes traversed, or number of continents > traversed, or (this is harder) the number of congested links traversed? > Hi, Hop-counters: http://www.nlanr.net/NA/Learn/wingspan.html AS length: http://moat.nlanr.net/ASPL/ (from University of Oregon) BTW, there are several good pages on Internet questions: 1. Henning Schulzrinne: http://www.cs.columbia.edu/~hgs/internet/traffic.html 2. Merit: http://www.merit.edu/ipma/reports/ 3. 
NLANR: http://www.nlanr.net/NA/Learn/ - Ping Pan From pingpan at juniper.net Fri Mar 16 10:07:48 2001 From: pingpan at juniper.net (Ping Pan) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet References: <200103151708.f2FH8u523767@elk.aciri.org> Message-ID: <3AB25674.B3EE78A9@juniper.net> Sally Floyd wrote: > > PERIODS OF EXTREME CONGESTION AT A ROUTER? > For those routers in the network that do occasionally experience > congestion, how can we characterize their rare periods of *extreme* > congestion (defining extreme congestion, say, as packet drop rates > above 5%)? How frequently do these periods of extreme congestion > occur, and how long do they last? What fraction can be attributed > to flash crowds? to Denial of Service attacks? to fiber cuts or > other routing changes? > Almost forgot, please take a look at http://www.nordu.net/stats/. This is one of the better places where you can monitor the link traffic for both average and peak rates, and draw your own conclusion on link congestion and duration. In the past several years, most US providers have stopped showing their networks, and only provide the average bandwidth utilization, which is low anyway. It is believed that the peak/average ratio is around 3-4 or higher, but I have not seen solid evidence on this since NSFNET. I don't think there are too many fiber cuts in the network (well... on the other hand, the China-US undersea cable was cut many days ago, and the link is still down.) But in some networks, providers do shift traffic between links quite often.
- Ping From nahum at watson.ibm.com Fri Mar 16 11:00:51 2001 From: nahum at watson.ibm.com (Erich Nahum) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] two questions about the Internet In-Reply-To: from "Oleg Vishnepolsky" at Mar 16, 2001 09:46:43 AM Message-ID: <200103161900.OAA33888@orinoco.watson.ibm.com> Oleg Vishnepolsky writes: > > >96 (Atlanta) was the first olympics that IBM hosted, and I believe it was > >just one complex in Southbury, CT. 98 (Nagano) and 2000 (Australia) > >were hosted by 4 main sites: Bethesda (for Europe), Shaumberg IL and > >someplace in Ohio (for the Americas) and Tokyo (for Asia). The > >request routing was done on a very course-grain level, basically > >through the routing tables. E.g., if you were in Europe, > >olympics.com pointed to Bethesda. I think it was done at > >the routing layer and not through DNS. > > How is it even possible not to involve DNS ? If DNS was giving out the > same IP address to olympics.com irrespective of the where requests > came from, then routing would have been real tricky, to say the least. I wasn't the one who did the work, so take my recollections with a grain of salt. Hari was one of the authors on the SigMetrics 97 and InfoCom 98 papers that describe this work, so I would trust him on this one about the 96 olympics. As for the later ones, this is what I've been told. It doesn't seem tricky to me, but I'm not a routing person. I'll try to dig up the info and post it here next week. -Erich -- Erich M. Nahum IBM T.J. Watson Research Center Networking Research P.O. Box 704 nahum@watson.ibm.com Yorktown Heights NY 10598 From dino.saija at libero.it Tue Mar 20 01:57:41 2001 From: dino.saija at libero.it (dino.saija@libero.it) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Traffic generator Message-ID: I'd like to use a free TCP traffic generator(i.e in c++).Where can I find? 
thank you From srh at merit.edu Tue Mar 20 05:50:12 2001 From: srh at merit.edu (Susan Harris) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] NANOG 22 CFP Message-ID: * * * * * * * * * * * * * * * * * CALL FOR PRESENTATIONS * * NANOG 22 * * May 20 - 22, 2001 * * * * * * * * * * * * * * * * * The North American Network Operators' Group (NANOG) will hold its 22nd meeting in Scottsdale, Arizona, on May 20-22, 2001. The meeting will be hosted by CenterGate Research Group. NANOG conferences provide a forum for the coordination and dissemination of technical information related to large-scale (i.e., national/international) Internet backbone networking technologies and operational practices. NANOG meetings, held three times each year, include two days of short presentations, plus afternoon/evening tutorial sessions. Meetings are informal, with an emphasis on relevance to current backbone engineering practices. The conference draws over 600 participants, mainly consisting of engineering staff from large national service providers, and members of the research and education community. Now in its sixth year, NANOG evolved from the NSFNET "regional-techs" meetings, where technical staff from the regional networks met to discuss operational issues of common concern. With the emergence of the commercial Internet, NANOG meetings evolved to include a broader base of providers, network operators, and researchers. The meeting will be held at the DoubleTree Paradise Valley. For more information about NANOG meetings, schedules, and logistics, see: http://www.nanog.org ------------------------------------------------------------------------------ CALL FOR PRESENTATIONS NANOG invites presentations on backbone engineering, coordination, and research topics. Presentations should highlight issues relating to technology already deployed or soon to be deployed in core Internet backbones and exchange points.
Previous meetings have included presentations on: - Backbone traffic engineering - Coordination of inter-provider QoS - Deployment experience with queueing disciplines (CAR, RED) - Inter-provider security and routing protocol authentication - Routing scalability in backbone infrastructures - Security issues for the Internet core - Routing policy specification and backbone router configuration - Building large-scale measurement infrastructure - Cooperative inter-provider caching - Alternatives to hot-potato routing - Recommendations on queue management and congestion avoidance - Experience with differentiated services - Reports from next-generation networks (CANARIE, Internet2) - Inter-domain multicast deployment - Backbone network failure analysis Tutorials have covered topics such as: - BGP case studies - MPLS fundamentals - External route selection - IP multicast technologies - Distributed content caching in large IP networks ------------------------------------------------------------------------------ HOW TO PRESENT Submit an informal one- or two-paragraph abstract describing the presentation in email to nanog-support@nanog.org. The deadline for proposals is April 9, 2001. While the majority of speaking slots will be filled by April 9, a limited number of slots will be available after that date for topics that are exceptionally timely and important. Submissions will be reviewed by the NANOG Program Committee, and presenters will be notified of acceptance by April 23, 2001. NANOG also welcomes suggestions/recommendations for tutorials, panels and other presentation topics.
--------------------------------------------------------------------------- From hussein at ee.washington.edu Thu Mar 22 18:40:41 2001 From: hussein at ee.washington.edu (Alhussein Abouzeid) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP timestamping inquiry In-Reply-To: <200103161900.OAA33888@orinoco.watson.ibm.com> Message-ID: Hi all, Does anyone have any pointers to info/measurements regarding noticeable performance issues with/without TCP time-stamping or any deployment issues? Specifically, I am interested in any clear negative, positive or 'no effect' results regarding its use in the Internet, satellite or wireless/ad-hoc. Thanks in advance, Hussein. From Michael.Meyer at eed.ericsson.se Fri Mar 23 02:08:31 2001 From: Michael.Meyer at eed.ericsson.se (Michael Meyer (EED)) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] Initial TCP Window Message-ID: <5E5172B4DE05D311B3AB0008C75DA94106D7B454@edeacnt100.eed.ericsson.se> Does anyone know the current status, i.e. which initial window size is used for TCP in different operating systems? Are most up-to-date with RFC2581 (two segments)? I would be interested in - Windows NT - Windows 95, 98 and 2000 - Linux /Michael From mallman at grc.nasa.gov Fri Mar 23 06:32:36 2001 From: mallman at grc.nasa.gov (Mark Allman) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP timestamping inquiry Message-ID: <200103231432.JAA25090@lawyers.grc.nasa.gov> > Anyone has any pointers to info/measurements regarding noticeable > performance issues with/without TCP time-stamping or any > deployment issues? specifically, I am interested in any clear > negative, positive or 'no effect' results regarding its use in the > Internet, satellite or wireless/ad-hoc. Vern Paxson and I found that using timestamps with the current RTO algorithm doesn't really buy you much. See: Mark Allman, Vern Paxson. On Estimating End-to-End Network Path Properties. ACM SIGCOMM, September 1999.
http://roland.grc.nasa.gov/~mallman/papers/estimation.ps (However, note that timestamps are needed for PAWS if your sending rate is quite high.) As for deployment, I have some measurements on the use of timestamps "in the wild" in the following paper: Mark Allman. A Web Server's View of the Transport Layer. ACM Computer Communication Review, 30(5), October 2000. http://roland.grc.nasa.gov/~mallman/papers/webobs-ccr.ps allman --- Mark Allman -- BBN/NASA GRC -- http://roland.grc.nasa.gov/~mallman/ From michael at tk.uni-linz.ac.at Fri Mar 23 08:19:49 2001 From: michael at tk.uni-linz.ac.at (Michael Welzl) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] draft on IP Fast Option Lookup Message-ID: Hi all, I would really appreciate feedback on this draft, especially from the router vendor folks :) ---------------------------------------------------------------------- A New Internet-Draft is available from the on-line Internet-Drafts directories. Title : IP Fast Option Lookup Author(s) : M. Welzl Filename : draft-welzl-opt-lookup-00.txt Pages : 8 Date : 22-Mar-01 This memo describes a new IP Option type that allows routers to more efficiently check whether the IP header contains options that need further processing. A URL for this Internet-Draft is: http://www.ietf.org/internet-drafts/draft-welzl-opt-lookup-00.txt (..) ---------------------------------------------------------------------- Specific questions include: - What do you think of the idea in general? Is it nice, or plainly a useless waste of space in the IP header? - I am somewhat unsure about the offsets (the second octet of each Option Entry) - should I really leave them in there? They're additionally wasted space and are only useful if an option is found... which probably means that the packet is going to end up in the slow path anyway. An alternate version of the draft (without the offset) is available from ftp://ftp.tk.uni-linz.ac.at/pub/michael/lookup-nooffset/ - Where else should I discuss this? 
Would the NANOG list be appropriate? Cheers, Michael Welzl From dpreed at reed.com Fri Mar 23 09:20:47 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] web100 Message-ID: <5.0.2.1.2.20010323121349.02fd0ab0@mail.reed.com> So, I got a press release on web100.org and its TCP improvement software. The press will probably get this completely wrong (the slant in the press release is that TCP is *the big problem* and that scarce bandwidth is the reason we can't use 100 MB pipes). Has anyone done any studies that would reasonably support the huge investment here? - David -------------------------------------------- WWW Page: http://www.reed.com/dpr.html From rja at inet.org Fri Mar 23 09:48:06 2001 From: rja at inet.org (RJ Atkinson) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] draft on IP Fast Option Lookup In-Reply-To: Message-ID: <5.0.2.1.2.20010323124628.009f6090@10.30.15.2> At 11:19 23/03/01, Michael Welzl wrote: >Hi all, > >I would really appreciate feedback on this draft, >especially from the router vendor folks :) It isn't needed by anyone who has ASIC-based forwarding. Folks building big routers these days generally either already have or are moving to ASIC-based forwarding. I work for a small router vendor with ASIC-based forwarding. This option isn't especially interesting to us at least. Ran rja@inet.org From vjs at calcite.rhyolite.com Fri Mar 23 10:27:56 2001 From: vjs at calcite.rhyolite.com (Vernon Schryver) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] draft on IP Fast Option Lookup Message-ID: <200103231827.f2NIRuX12662@calcite.rhyolite.com> > From: Michael Welzl > I would really appreciate feedback on this draft, especially from > the router vendor folks :) I'm not a router person, but I have designed and implemented commercial stuff that peeked at headers and chose faster or slower paths. > - What do you think of the idea in general? Is it nice, or plainly a > useless waste of space in the IP header?
The history of such speed hints is bad. A fast path wants to deal with simple things, and it is usually trivial to detect things that are not simple. For IP headers, I bet most implementations would do best by noticing whether there are any IP options at all. In this case, given MPLS and other tag-forwarding schemes, what's the point? That this option is variable length means that it is among the most complicated IP options, which is an odd characteristic for something that is intended to help things go faster. That it must not be used if there are fewer than two other IP options confuses me. How many IP packets have more than 2 options? Has the list of possible IP options exploded while I wasn't paying attention? There are other problems with options such as this. For example, why would hosts add them? "To make routers go faster" is not a reason, because the speed of routers doesn't affect a host's SPECMARK or other benchmark value. Then there are the boundary and error conditions. For example, what happens if an option mentioned in this option is absent? What if there are 2 Fast Option Lookup options or if another option precedes it? > ... > - Where else should I discuss this? Would the NANOG list be appropriate? Long ago, there was some overlap between those who buy and operate routers and those who design and implement things. Today, the camps are quite separate. (Never mind those whose resumes say they have "implemented TCP/IP in the Enterprise".) Too many operators tend to be uncritical of sales blarney about the current magic speed pill. Previous pills included "ASCI" and "RISC." "RISC core" and "DSP" are more recent. It is possible to sell such features in forums such as NANOG, but not profitably. Features that cannot be detected by watching the wire are ignored in the long run.
A router that does the obvious and uses a fast path for IP headers with no options and a slow path for IP headers with any option would be indistinguishable from a router that used a hint like this. Sooner instead of later, designers omit those magic sales pills, usually without telling their own salescritters and customers and always without telling the competition. Finally and far more important, if you want to design such things, it's best to start by designing a faster router in private. Only after you have some experience with what makes a router (or anything) faster is it wise to consider publishing an RFC instructing other people. The weaknesses in the RFC's that tried to instruct how to compute the TCP checksum faster are classic examples of that syndrome. Vernon Schryver vjs@rhyolite.com From RACarlson at anl.gov Fri Mar 23 11:08:56 2001 From: RACarlson at anl.gov (Richard Carlson) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] web100 In-Reply-To: <5.0.2.1.2.20010323121349.02fd0ab0@mail.reed.com> Message-ID: <4.3.2.7.2.20010323123011.00aec6b0@atalanta.ctd.anl.gov> David; Can you elaborate on your question? Are you asking if TCP stacks are really a performance bottleneck, if bandwidth is a scarce resource, or if we have any proof of this? From the DOE perspective getting access to high bandwidth pipes is not the major problem scientific applications are running into. There is 'easy' access to OC-3 to OC-48 links both within North America and around the globe. (Take a look at the number of OC-3/12 links coming into the US from Europe.) The problem is getting effective e2e throughput (goodput) between 2 nodes (i.e., moving a GB of data from a storage system at SLAC to a user's desktop at UTK). The BW*delay product requires large windows on both end nodes and almost no loss over SLAC's campus network, ESnet, Abilene, and UTK's campus network.
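To put rough numbers on the BW*delay point, here is a small sketch of the arithmetic. The figures are illustrative assumptions, not measurements from this thread: a 100 Mbit/s path, a 70 ms round-trip time, and a 65535-byte window (the largest a TCP sender can be offered without the window scale option). The function names are made up for the example.

```python
def window_bytes_needed(rate_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this rate full."""
    return rate_bps * rtt_s / 8.0


def max_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8.0 / rtt_s


if __name__ == "__main__":
    rate_bps = 100e6         # assumed path rate: 100 Mbit/s
    rtt_s = 0.070            # assumed round-trip time: 70 ms
    unscaled_window = 65535  # largest window without window scaling

    need = window_bytes_needed(rate_bps, rtt_s)
    cap = max_throughput_bps(unscaled_window, rtt_s)
    print(f"window needed to fill the path: {need:,.0f} bytes")
    print(f"ceiling with a 65535-byte window: {cap / 1e6:.1f} Mbit/s")
```

Under these assumed values the path needs roughly 875,000 bytes in flight, while an unscaled window caps a single connection at about 7.5 Mbit/s no matter how fast the links are. Loss lowers the achievable rate further; that effect is not modelled here.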
The major problem DOE scientists have is determining why the goodput is so low (i.e., 5 Mbps e2e over a 100 Mbps channel). The Web100 activities are designed to answer the question 'is the biggest problem in the local host, the remote host, or the network'. Getting an authoritative answer to this simple question would be of immense value to the DOE scientific community and well worth the investment NSF is making in funding the Web100 activities. Rich At 12:20 PM 3/23/01 -0500, David P. Reed wrote: >So, I got a press release on web100.org and its TCP improvement software. > >The press will probably get this completely wrong (the slant in the press >release is that TCP is *the big problem* and that scarce bandwidth is the >reason we can't use 100 MB pipes). > >Has anyone done any studies that would reasonably support the huge >investment here? > >- David >-------------------------------------------- >WWW Page: http://www.reed.com/dpr.html > > ------------------------------------ Richard A. Carlson e-mail: RACarlson@anl.gov Network Research Section phone: (630) 252-7289 Argonne National Laboratory fax: (630) 252-4021 9700 Cass Ave. S. Argonne, IL 60439 From michael at tk.uni-linz.ac.at Fri Mar 23 11:40:24 2001 From: michael at tk.uni-linz.ac.at (Michael Welzl) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] draft on IP Fast Option Lookup In-Reply-To: Message-ID: > > - What do you think of the idea in general? Is it nice, or plainly a > > useless waste of space in the IP header? > > The history of such speed hints is bad. A fast path wants to deal > with simple things, and it is usually trivial to detect things that > are not simple. For IP headers, I bet most implementations would do > best by noticing whether there are any IP options at all. In this > case, given MPLS and other tag-forwarding schemes, what's the point? 
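The check suggested in the quoted text, noticing whether there are any IP options at all, comes down to one comparison on the IPv4 header-length field. A minimal Python sketch, assuming `packet` holds a raw IPv4 header; `has_ip_options` and `choose_path` are hypothetical names and not taken from the draft or from any router implementation:

```python
def has_ip_options(packet: bytes) -> bool:
    """Return True if the IPv4 header in `packet` carries any options."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    version = packet[0] >> 4
    ihl = packet[0] & 0x0F  # header length in 32-bit words
    if version != 4 or ihl < 5:
        raise ValueError("not a valid IPv4 header")
    # A header without options is exactly 5 words (20 bytes) long.
    return ihl > 5


def choose_path(packet: bytes) -> str:
    """Send option-free headers to the fast path, everything else to the slow path."""
    return "slow" if has_ip_options(packet) else "fast"
```

The decision needs only the first byte of the header, which is the point being made: detecting the "not simple" case is already cheap without an extra index option.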
It is useless for routers which simply ignore IP options; this option is supposed to help routers which DO support options, but only a subset because most are turned off. > That it must not be used if there are fewer than two other IP options > confuses me. How many IP packets have more than 2 options? More than 1. And not many, I suppose. It is a small aid for a rare case :) But it is useless when there is only one option (you don't need to have an index for ONE entry). > Has the list > of possible IP options exploded while I wasn't paying attention? I just registered a few new ones to push this draft :) On a serious note, I DO agree that packets with more than one option will be rare. Still, it's a possibility. > There are other problems with options such as this. For example, > why would hosts add them? "To make routers go faster" is not > a reason, because the speed of routers doesn't affect a host's > SPECMARK or other benchmark value. Right - it's just a recommendation. > Then there are the boundary and error conditions. For example, what > happens if an option is mentioned in this option is absent? This is described in the "security issues" section. > What if there > are 2 Fast Option Lookup options or if another option precedes it? This should not be the case according to the draft. But I agree that it should be discussed (actually, another option preceding it won't cause much trouble - it's just inefficient). > Finally and far more important, if you want to design such things, > it's best to start by designing a faster router in private. > Only after > you have some experience with what makes a router (or > anything) faster is > it wise to consider publishing an RFC instructing other people. Designing a router is not an option for me. 
I trust in the IESG to prevent me from publishing an absolutely pointless RFC, though :) Cheers, Michael Welzl From ddc at lcs.mit.edu Fri Mar 23 11:40:39 2001 From: ddc at lcs.mit.edu (David Clark) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] web100 In-Reply-To: <5.0.2.1.2.20010323121349.02fd0ab0@mail.reed.com> References: <5.0.2.1.2.20010323121349.02fd0ab0@mail.reed.com> Message-ID: At 12:20 PM -0500 3/23/01, David P. Reed wrote: >So, I got a press release on web100.org and its TCP improvement software. > >The press will probably get this completely wrong (the slant in the >press release is that TCP is *the big problem* and that scarce >bandwidth is the reason we can't use 100 MB pipes). > >Has anyone done any studies that would reasonably support the huge >investment here? > >- David Dave, Not sure what you mean by "huge investment". (They just got funded at a rate of just under $1M a year, which is not all that much these day...) I think what these folks are doing is trying to distribute software that is pre-configured so that it actually goes fast, as opposed to what happens today. Guys like Matt Mathis have put a lot of work into understanding the tuning of TCP, and so on, and they have a lot of real world knowledge. The problem today is that the vendors are not shipping stuff that benefits from that knowledge. I think that TCP is the problem, but it is the implementation, not the design. That is what the press may get wrong. Dave From vjs at calcite.rhyolite.com Fri Mar 23 11:40:34 2001 From: vjs at calcite.rhyolite.com (Vernon Schryver) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] draft on IP Fast Option Lookup Message-ID: <200103231940.f2NJeYQ13951@calcite.rhyolite.com> > It isn't needed by anyone who has ASIC-based forwarding. > Folks building big routers these days generally either already > have or are moving to ASIC-based forwarding. I work for a > small router vendor with ASIC-based forwarding. 
This option
> isn't especially interesting to us at least.

"ASIC-based forwarding" might mean "has specialized hardware for the fast path." However, in general "ASIC-based" is as meaningful as "electronic based," no matter that router users and the trade rags have been talking about "ASIC" as magic speed pill for 10 years. From what I've seen, whether you use full-custom, custom with purchased IP (not the protocol) such as RISC cores, ASIC's, only commodity parts, or some other point in the spectrum is no more about speed than other design issues including power, real estate (both board and package), product life, time-to-market, and available design and simulation tools and talent. For example, with high enough volumes, a full custom silicon but rather slow router (e.g. SOHO) might make sense. Well, I am assuming that ASIC means application specific integrated circuit, the less aggressive shore of the full custom swamp. And I've never been involved with router custom silicon, although I have watched fun with ion milling and related wonders in other contexts.

Vernon Schryver vjs@rhyolite.com

From raj at cup.hp.com Fri Mar 23 11:55:27 2001
From: raj at cup.hp.com (Rick Jones)
Date: Thu Mar 25 11:59:36 2004
Subject: [e2e] web100
References: <4.3.2.7.2.20010323123011.00aec6b0@atalanta.ctd.anl.gov>
Message-ID: <3ABBAA2F.785033C@cup.hp.com>

I suspect this could be an issue:

#begin excerpt http://www.web100.org/papers/bdp.discovery.html
... The method proposed herein for automatic BDP discovery and caching is to use a simple mechanism modeled after the ICMP Echo Request and Echo Reply protocol to discover the bandwidth of the least-capable hop between a given source and destination host pair. This new mechanism could be a new type of ICMP Request/Reply pair, or it could be a simple enhancement to the existing Echo Request/Reply, but using a new IP option class/number combination.
The main difference between the new mechanism and the existing ICMP Request/Reply pair is that the router would have to process two new fields in the message. ... Development of this BDP protocol initially requires the cooperation of at least one router vendor, though a crude prototype could be demonstrated with traceroute and SNMP-derived information.
#end

Seems that the ifSpeed fields of the standard SNMP MIBs would be the best way to go here anyhow. It does mean knowing the community string or authentication stuff for the SNMP access. True, that will have "issues" crossing AS (is that the right term?) but then I suspect that AS would not like to have that bandwidth info escaping their sphere anyhow.

As far as driving "supercharged Web" (http://www.web100.org/papers/web100.html) I would have thought that if the commercial types were that keen on it, they would be taking part in the SPECweb9X benchmarks and perhaps the IRC bakeoffs. If the 100 in web100 is supposed to represent 100 Mbit/s, those benchmarks are already demonstrating solutions going far faster.

The stuff about driving demand for fibre to the home was fun to read in the context of long-haul bandwidth prices bottoming-out due to oversupply, and vendors not being able to recoup their investments.

Other interesting things from the concept paper:

#begin
A great deal of fine research has been underway by the Pittsburgh Supercomputer Center's Networking Research Group, the University of Washington's Department of Computer Science & Engineering, and several other groups regarding networking performance tuning and TCP protocol stack improvements. This research needs to be intensified and capitalized upon in terms of application to the TCP protocol stack in the chosen development system. The individual research groups might also be more effective if their various efforts could be utilized in a cohesive fashion.
For instance, no standing TCP-stack improvement forum exists to provide a focal point for the exchange of ideas. Finally, it should be noted that the TCP protocol stack improvement task would be the most complex and most difficult task of all of those listed. #end I guess e2e and tcp-impl don't count... :) #begin Needed TCP-stack improvements are listed below. Include Well-Known Mechanisms Standard mechanisms like per-destination MTU-discovery (RFC 1191) and "Large Windows" extensions to TCP (RFC1323) would certainly be included in the development system. #end is there a commercial stack out there that doesn't already have these things?!? Their target OS - Linux already has them. #begin include Advanced Mechanisms In addition to such standard mechanisms as listed above, more advanced improvements are needed. For instance, TCP Selective Acknowledgment (SACK), defined by RFC 2018, should also be included in the development system. #end hmm, also in the latest (?) linux bits, and in HP-UX 11, and in Solaris, and in WinSomething. Seems that is already done... #begin Furthermore, work needs to be done not just to improve high-performance networking, but to improve short-duration network-flows as well, particularly when congestion is relatively high, as such short-duration high-loss transfers are typical of most current Web transfers. Current end-to-end congestion avoidance and congestion control mechanisms can greatly impede performance in such circumstances. #end I must be missing something - that sounds like the increase in the allowable initial cwnd? #begin The following is a list of needed improvements. Kernel Hooks Currently, operating system kernels generally provide statistics regarding network only in the aggregate. Kernel hooks to monitor individual TCP sessions in real-time need to be added as a foundation for developing a large class of highly needed network diagnostic and performance monitoring tools. 
Such hooks should maintain dynamic counts of important TCP-session parameters, as well as be able to supply TCP-session packet streams upon demand.
#end

OK, per-session stats might be interesting. It will be more overhead in the stack of course :)

#begin
GUI-based TCP-Session Monitoring Tools Based upon the aforementioned kernel hooks, one or more TCP-monitoring tools need to be developed that are capable of concurrent, dynamic, real-time graphing of sets of user-selected real-time TCP-session statistics. Among these statistics are: data rate, window size, round-trip-time, number of packets unacknowledged, number of retransmitted packets, number of out-of-order packets, number of duplicate packets, etc. A variety of display options should be available such as totals, deltas, running-averages, etc.
#end

All nice and wizzy, but to what end? How a GUI for traceroute makes it any better is an open question. (I've not bothered to quote from the article)

Anyhow, it sounds like nice cushy funding if you can get it :)

rick jones
--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...

From vjs at calcite.rhyolite.com Fri Mar 23 12:08:42 2001
From: vjs at calcite.rhyolite.com (Vernon Schryver)
Date: Thu Mar 25 11:59:36 2004
Subject: [e2e] draft on IP Fast Option Lookup
Message-ID: <200103232008.f2NK8gA14461@calcite.rhyolite.com>

> From: Michael Welzl
> It is useless for routers which simply ignore IP options; this option
> is supposed to help routers which DO support options, but only a subset
> because most are turned off.

I said nothing about routers that simply ignore IP options because they are not routers, or at best are broken by design. Please read RFC 1812. There are a lot of MAY's for IP options, but there is at least one MUST.

> ...
> On a serious note, I DO agree that packets with more than one > option will be rare. Still, it's a possibility. Optimizing rare cases is rarely interesting. > ... > Designing a router is not an option for me. If you want to design router optimizations and you're like most of us and don't have a few $10M to fund a new router design, then why not get a job at a router vendor? Participation in the IETF is no more a substitute for experience impliementing routers than participation in the ISO was a substitute for designing and implementing transport protocols. > I trust in the IESG to prevent me from publishing an absolutely > pointless RFC, though :) The last I looked, the IESG is not a router vendor or custom silicon design group. In other words, that is not a reasonable or respectable hope, as demonstrated by plenty of RFC's. Specifying hardware optimizations without benefit of relevant design experience is unlikely to improve one's professional reputation outside the trade rags. The trade rags are something else. Vernon Schryver vjs@rhyolite.com From braden at ISI.EDU Fri Mar 23 12:54:03 2001 From: braden at ISI.EDU (Bob Braden) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing Message-ID: <200103232054.UAA16559@gra.isi.edu> A non-text attachment was scrubbed... Name: not available Type: x-sun-attachment Size: 4895 bytes Desc: not available Url : http://www.postel.org/pipermail/end2end-interest/attachments/20010323/7da5391a/attachment.ksh From campbell at comet.columbia.edu Fri Mar 23 16:02:23 2001 From: campbell at comet.columbia.edu (Andrew T. Campbell) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] Software release of the Columbia IP Micro-Mobility Suite (CIMS) Message-ID: <000801c0b3f5$b47f87b0$3d443b80@SWEETPEA> IP Micro-mobility has been a hot topic over the the last few years. 
We have released the Columbia IP Micro-Mobility Suite (CIMS) http://www.comet.columbia.edu/micromobility/ that includes an ns 2 extension for the following IP micro-mobility protocols: -Cellular IP (draft-ietf-mobileip-cellularip-00.txt) -HAWAII (draft-ietf-mobileip-hawaii-00.txt) -Hierarchical Mobile IP (draft-ietf-mobileip-reg-tunnel-04.txt) The Cellular IP implementation supports hard and semi-soft handoff, and IP paging. The Hawaii implementation supports Unicast Non-Forwarding (UNF) and Multiple Stream Forwarding (MSF) schemes. Hawaii's IP paging capability is currently not supported. In addition, the CIMS implementation of Hierarchical Mobile IP currently does not support IP paging. These and other features will be added in due course - we would be happy to add any extensions worked on by other groups to the next release of CIMS. Best, --- Andrew http://comet.columbia.edu/~campbell From chase at cs.duke.edu Fri Mar 23 13:58:46 2001 From: chase at cs.duke.edu (Jeff Chase) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing References: <200103232054.UAA16559@gra.isi.edu> Message-ID: <3ABBC716.3DA09850@cs.duke.edu> Bob Braden wrote: .... > the wire from what the user sends. (See the following sentences from > RFC 793, for example: > > The TCP is able to transfer a continuous stream of octets in each > direction between its users by packaging some number of octets into > segments for transmission through the internet system. In general, > the TCPs decide when to block and forward data at their own > convenience. Bob, your point is indeed subtle. One could also argue that this snippet of RFC 793 *supports* the "TCP framing" proposal, which does nothing more than to affirm the right of mutually consenting TCPs to block data at their own convenience when speaking amongst themselves (as negotiated by a ULP). > Now for the subtle bit. Generality and optimization are typically > contradictory. 
The Internet protocol suite was designed deliberately > and carefully for generality, at the possible expense of optimization. > It was also designed for simplicity at the expense of optimization. > We recognized that later engineering efforts would rob some of the > simplicity in order to reach greater optimization, and indeed, > this has happened and is probably not a bad thing. On the other > hand, we should be very wary of over-engineering optimal solutions > that cut down the generality. Any TCP implementation that supports this proposal is fully interoperable with any TCP that does not. The wire format does not change, ever, even when the feature is enabled. Thus it is interoperable with any other compliant TCP; indeed the RFC snippet you quoted above ensures this. So how does it compromise generality? You almost seem to be suggesting that a TCP implementor should never add a new locally-selectable policy feature, because any ULP that benefits from it won't benefit any more if you later take the feature away. In this case, the argument is specious because the intended beneficiary (a layered RDMA protocol called WARP that is still under development) can run happily over a TCP that does not support this feature; it just won't be capable of the same degree of hardware acceleration, in the case of an unreliable network. > change. The ULP proposal changes this to tight coupling, since it only > works if the send call units are mapped directly into segments. So > adopting ULP MAY (and note that one cannot ever be sure that it > will/won't) reduce the freedom of TCP to adapt to future changes. > And contrary to what the ULP proponents claim, it is a very fundamental > change in TCP. We should think VERY carefully before making such > a change, and we should be honest about what we are doing. 
No, the ULP proposal does not reduce TCP's freedom of choice, it only assumes that the TCP sender implementation notifies the ULP of its choice, e.g., by upcalling to the ULP buffers to fill each segment, as in many current implementations. No ULP will rely on this behavior, but if the sender TCP provides it, then the RDMA ULP can benefit from hardware acceleration. In most cases (e.g., BSD) this is a superficial change to the TCP *implementation* itself, although I will allow that it may be a fundamental change to the way some might think about the implementation. In any case, the proposal does not affect the TCP wire protocol, it does not affect interoperability, and it does not affect the congestion behavior. That is all that its proponents have claimed. We are honest people, and we are sincerely trying to do this in a way that is responsive to the legitimate but sometimes rather shrill concerns about "changing TCP" among those most experienced with TCP and its history. One worthwhile question to ask is: can an intermediary observing the flow at the TCP level determine for certain if this proposed feature is in use or not? Jeff From dpreed at reed.com Fri Mar 23 13:38:01 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] web100 In-Reply-To: <4.3.2.7.2.20010323123011.00aec6b0@atalanta.ctd.anl.gov> References: <5.0.2.1.2.20010323121349.02fd0ab0@mail.reed.com> Message-ID: <5.0.2.1.2.20010323162458.02fdfd90@mail.reed.com> At 01:08 PM 3/23/01 -0600, Richard Carlson wrote: >David; > >Can you elaborate on your question? Are you asking if TCP stacks are >really a performance bottleneck, if bandwidth is a scarce resource, of if >we have any proof of this? It was genuinely a question to clarify a press release and website that are quite puzzling. Fixing a performance bottleneck is a good thing to do, I just don't understand what the big hoopla is about, or why it takes $3MM. So, none of the above. 
I include the press release here - I've also looked at the website.

Reading the press release and the website, I get the idea that there is an answer that is already being disseminated in the form of software (middleware?), and it has to do with TCP-MIBs and autotuning. So with claims of first distribution of a "solution" implied in the press release, it would be interesting to know if the researchers in the TCP field actually have validated that this is the source of the problem. Crappy application programs and TCP implementations could be the problem, as well, one might think. Or maybe the APIs (Berkeley Sockets?) and file system buffering don't work very well. And the mystery of why the project is called "WEB" 100? We know that web protocols have too much handshaking and parsing to be good bulk transfer vehicles. And what do supercomputer users have to do with the Web?

But what most puzzles me is that this is an NSF research project, not a software development project, yet the press release talks about it as the latter. I'm probably just confused. Maybe this is how science is done these days, but I'd think that one grad student could have figured out where a bottleneck is by just a few measurements, then passed the info off to the community of developers to fix it. Since the project is "open source" according to the website (but I, at least, can't look at the source because I don't have a password), one might think that the fix would simply be posted, at low cost.

------------------------------------------------------------------------------
FOR IMMEDIATE RELEASE
Mar. 19, 2001

Web100 Takes First Step Towards Improving Network Performance

PITTSBURGH -- The Web100 Project has distributed the initial version of software that aims to bring data-transmission rates of 100 megabits per second to users of high-speed networks.
Select researchers at universities and government laboratories are getting a sneak peek at the Web100 software to do real-world testing and provide feedback to developers. "Today's release of the Web100 software promises improved network performance at a time when bandwidth is increasingly precious," said Tom Greene, the Senior Program Director for Infrastructure in the National Science Foundation's Division of Advanced Networking Infrastructure and Research. "This type of middleware can help us use existing resources more efficiently." While most home users still connect to the Internet with a 56K modem, universities, research centers and some businesses today have connections capable of transmitting data at 100 megabits per second (Mbps) or higher. Research has shown, however, that users rarely see performance greater than three Mbps. Web100 researchers traced the problem to software that governs the Transmission Control Protocol (TCP) -- a "language" that computers use to communicate across networks. Networking experts are able to overcome this limit by fine tuning connections with adjustments to TCP. The Web100 software will eventually allow users to take full advantage of available network bandwidth without the help of a networking expert. Web100 programmers are refining TCP software in the Linux operating system to automatically achieve the highest possible transfer rate. "Our goal is to make it easier for everyone to move data across networks at 100 megabits per second or higher," said Matt Mathis, Pittsburgh Supercomputing Center network research coordinator and one of the principal investigators of Web100. Twenty-one researchers at ten institutions -- including Stanford Linear Accelerator Center, Oak Ridge National Laboratory, Lawrence Berkeley Laboratory and Argonne National Laboratory -- will test the initial release of Web100 software. 
At the University of Michigan, for example, Brian Athey will test the Web100 software for use with the Visible Human Project. Athey is working with Art Wetzel at PSC to develop applications that allow students to view large Visible Human data-sets over high-speed networks. "In situations of marginal bandwidth availability," said Athey, "tuning could make the difference between a choppy and unusable 500 Kbps to 1 Mbps stream to a perfectly useful 2 Mbps to 5 Mbps stream." The Web100 Project is a collaboration of Pittsburgh Supercomputing Center, the National Center for Atmospheric Research and the National Center for Supercomputing Applications. More information can be found at: http://www.web100.org/ # # # CONTACT: Sean Fulton sfulton@psc.edu Pittsburgh Supercomputing Center 412-268-4960 [R. Sean Fulton | Public Information Specialist | sfulton@psc.edu] [***** Pittsburgh Supercomputing Center | 412/268-7141 *****] ----------------------------------------------------------------------------- > From the DOE perspective getting access to high bandwidth pipes is not > the major problem scientific applications are running into. There is > 'easy' access to OC-3 to OC-48 links both within North America and around > the globe. (Take a look at the number of OC-3/12 links coming into the > US from Europe.) The problem is getting effective e2e throughput > (goodput) through between 2 nodes (i.e., moving a GB of data from a > storage system at SLAC to a users desktop at UTK). The BW*delay product > requires large windows on both end nodes and almost no loss over SLAC's > campus network, ESnet, Abilene, and UTK's campus network. > >The major problem DOE scientists have is determining why the goodput is so >low (i.e., 5 Mbps e2e over a 100 Mbps channel). The Web100 activities are >designed to answer the question 'is the biggest problem in the local host, >the remote host, or the network'. 
Getting an authoritative answer to this >simple question would be of immense value to the DOE scientific community >and well worth the investment NSF is making in funding the Web100 activities. > >Rich > >At 12:20 PM 3/23/01 -0500, David P. Reed wrote: >>So, I got a press release on web100.org and its TCP improvement software. >> >>The press will probably get this completely wrong (the slant in the press >>release is that TCP is *the big problem* and that scarce bandwidth is the >>reason we can't use 100 MB pipes). >> >>Has anyone done any studies that would reasonably support the huge >>investment here? >> >>- David >>-------------------------------------------- >>WWW Page: http://www.reed.com/dpr.html > >------------------------------------ > >Richard A. Carlson e-mail: RACarlson@anl.gov >Network Research Section phone: (630) 252-7289 >Argonne National Laboratory fax: (630) 252-4021 >9700 Cass Ave. S. >Argonne, IL 60439 - David -------------------------------------------- WWW Page: http://www.reed.com/dpr.html -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.postel.org/pipermail/end2end-interest/attachments/20010323/ffcdc588/attachment.html From rja at inet.org Fri Mar 23 14:23:20 2001 From: rja at inet.org (RJ Atkinson) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103232054.UAA16559@gra.isi.edu> Message-ID: <5.0.2.1.2.20010323171417.00a09380@10.30.15.2> >Moral: Over-engineering may be bad for the Internet, eventually. I'm not known for being subtle. I might have phrased the above more along these lines: "Folks are doing lots of micro-optimisations these days, at various points in the stack and in the end-to-end network system. While a given micro-optimisation might be worthwhile in a micro-network, many micro-optimisations are being deployed in a haphazard manner in the global Internet these days. 
The real result is a significantly less robust network for very little (if any) real measurable benefit." >Finally, an historical irony: the ULP hack is acknowledged to be a >stopgap until SCTP has advanced to save us. Essentially, SCTP is OSI >TP4 with features. TCP's idea of decoupling API from wire protocol >units was, at the time of its development a new idea that was at >variance with the evolving OSI suite. Now, things seem to be >running full circle. The property of SCTP that I am hearing the most interest in is the decoupling of the transport-layer session state from the actual IP address at each end of the connection. Lots of folk seem interested in having a transport-layer session that could have increased robustness simply by multi-homing each end-host for the session onto different networks (providing path diversity). In this regard, I'm influenced by talking with folks who are implementing or deploying SCTP for various purposes. My sample space is certainly not statistically valid. IMHO, if the TCP checksum were bound to some form of host identifier other than the IP address, TCP could provide that particular property quite nicely. Obviously I'm influenced by conversations with jnc on this particular point. It would be an interesting exercise to work out the details for such an approach. All that noted, I could imagine using SCTP underneath a suitably generic API. It isn't clear to me that the API has to necessarily be as tightly coupled as is the case in some SCTP API proposals. Mayhap I'm confused. Cheers, Ran rja@inet.org From dpreed at reed.com Fri Mar 23 14:40:40 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103232054.UAA16559@gra.isi.edu> Message-ID: <5.0.2.1.2.20010323173039.02fd7e60@mail.reed.com> Bob - I agree with you that having TCP control the framing underneath it places a significant burden on future evolution of the TCP and IP. 
You are right, as well, about the intention of the TCP spec to avoid linking segmentation to the sequence of calls - it was considered and dropped. And for that matter, I still don't see the need. If your data should be processable out of order, why not use multiple TCP connections, RTP, or (gasp) UDP? If the data needs to be processed in order, then framing can be embedded in the data stream.

From vjs at calcite.rhyolite.com Fri Mar 23 15:20:00 2001
From: vjs at calcite.rhyolite.com (Vernon Schryver)
Date: Thu Mar 25 11:59:36 2004
Subject: [e2e] TCP Framing
Message-ID: <200103232320.f2NNK0j17500@calcite.rhyolite.com>

> From: Bob Braden
> Filename : draft-williams-tcpulpframe-01.txt
> ...
> Moral: Over-engineering may be bad for the Internet, eventually.
>
> Finally, an historical irony: the ULP hack is acknowledged to be a
> stopgap until SCTP has advanced to save us. Essentially, SCTP is OSI
> TP4 with features. TCP's idea of decoupling API from wire protocol
> units was, at the time of its development a new idea that was at
> variance with the evolving OSI suite. Now, things seem to be
> running full circle. Are we really sure we want to do this?
>
> I will be interested to hear other opinions.

As long as such proposals don't consume finite, non-renewable resources such as protocol or IP option numbers, who cares? They won't be implemented by a significant number of hosts. No one will ask for them, instead of the usual TCP byte stream framing code (surely in some C++ or Java class by now). It will all be forgotten a lot sooner than TP4. If we're wrong about all of that and it becomes wildly popular, then no harm is done.

The fact that the proposal does change TCP could be handled. The easiest way is to observe that like the recently suggested IP option acceleration, the idea is ok but the implementation (protocol) is wrong.
It is easy to change the TCP API without changing the on-wire behavior to get and put data directly to and from application buffers when things are going well (e.g. no retransmissions), and fall back to a slow path in what must be the other, rare cases. If you're doing enough retransmitting to notice, you're not going fast enough to care about such things. In other words, contrary to various claims, no black magic was required to "page flip" on both input and output more than 10 years ago. Vernon Schryver vjs@rhyolite.com From day at std.com Fri Mar 23 15:16:37 2001 From: day at std.com (John Day) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103232054.UAA16559@gra.isi.edu> References: <200103232054.UAA16559@gra.isi.edu> Message-ID: >Hi. At the IETF just completed, I sat through an exposition of >the following Internet Draft: > > "Title : ULP Framing for TCP > Author(s) : J. Williams et al. > Filename : draft-williams-tcpulpframe-01.txt > Pages : 12 > Date : 22-Mar-01 > > This document proposes a framing protocol for TCP which is designed > to be fully compliant with applicable TCP RFC's and fully > interoperable with existing TCP implementations. The framing > mechanism is designed to work as a 'shim' between TCP and higher- > level protocols, preserving the reliable, in-order delivery of TCP > while adding the preservation of higher-level protocol record > boundaries if the record is less than or equal to the path MTU. The > shim is designed to enable hardware acceleration of data movement > operations (e.g. direct placement of receive TCP segments into > higher-level protocol buffers) for the protocols that use it, even > if TCP segments are delivered out-of-order." Bob, let me take a stab at this. I have thoroughly read your note but have not read the draft yet. I will do that next, but it is late in the day here and I don't want to wait until I can get to the draft, which might not be until tomorrow sometime.
I have agonized over this problem for almost 30 years. I have for some time believed that the TCP approach was the proper approach. (This stemming from our early discussions back in the early 70's when Multics did streams and that IBM system you were in charge of did records. I think I remember you arguing hard for record capabilities in some of those early meetings! ;-) Sorry couldn't resist. Streams were elegant. I remember how much I detested those damn half duplex terminals and the push for records in Telnet. Telnet was the most elegant thing we created in that early batch of protocols.) Streams were more general than records. (In those days, records usually meant fixed length as you no doubt remember.) Records were a pain in the neck (or something else). One of the mistakes I always thought the OSI model made (it wasn't just TP4) was the definition that "the identity of SDUs were maintained end-to-end." SDU (Service Data Unit) was the lump of stuff handed across the layer boundary. SDUs could be fragmented or concatenated however the protocol wanted into any number of PDUs for sending to the other end. The only requirement was that the guy on the other end was handed the same SDU that it had been given. Now it is the case that some applications do work with fixed or variable records and can't do anything until they have the whole thing. In fact most of our applications today do this except Telnet and it probably should be. (Telnet processing would be a whole lot more efficient if you could find the next IAC without looking at every byte. But that is a nit and much less a problem now than with the hardware then. Right, it was quite a revelation to me when I realized that the one protocol that I really thought was pure stream would be more efficient not with records but with SDUs.) Now clearly it can be rightly argued that if an application wants to see records, then it should do it.
Of course, the application designer could just as easily argue that "there are many more applications that need to do this than just the one application. And isn't one of the main design principles that if lots of things need the same function that rather than do it n times (potentially differently), it should be done once consistently? TCP already has the machinery to do reassembly, why not have it do it, why should I have to replicate this byte diddling stuff in my application? After all, it is perfectly legal for TCP to deliver my data one byte at a time. If I am lucky, one call will give me what I need but more likely it will take more system calls to TCP before I get something I can use. Why does it need to be that much work?" Well, yea. . . . So I started looking for some other reason to decide which side of the line it should be done on. Frankly, I haven't found any. It appears that the work is the same whether TCP does it or the application does it. It may be replicated code, but that is not a big deciding factor. So no real architectural argument there. The only thing I have really found goes against us and that sort of comes from thinking about these things as objects. At first I thought that the only difference between the OSI SDU and what we thought of as records was that records were fixed length and SDUs could be variable in length. But at some point, I realized that from the application's point of view it was something different: "Here I want you to send this stuff. I don't care what you do to it. Send it all at once, send it with something else, break it up into big or little pieces, I don't care what kind of mess you make of it. Just clean up after yourself when you are done." And I thought hmmmm, that isn't an unreasonable demand by the layer above and it maintains the invariance of the interaction which is always a strong design property for interfaces.
So these days, I am still on the fence, but leaning toward a solution that maintains invariance, i.e. what I put in at one end will come out the other. This is really the orthogonality that all interfaces should exhibit. groan. > >I would like to suggest two things about this, one simple and one >subtle. The simple one is this: to say that the ULP framing is fully >compliant with the applicable TCP RFCs is simply false. For some of >us, at least, such a lack of truth in technical advertising is a red >flag. > >The reason why it is false, and its consequences, form the subtle bit. >It is true that the proposed shim does not change the definition of the >TCP protocol on the wire. However, it does change a more fundamental >principle of TCP, which is the deliberate decoupling of what happens on >the wire from what the user sends. (See the following sentences from Yes, but in some sense we have decoupled the sender but coupled the receiver more tightly. >RFC 793, for example: > > The TCP is able to transfer a continuous stream of octets in each > direction between its users by packaging some number of octets into > segments for transmission through the internet system. In general, > the TCPs decide when to block and forward data at their own > convenience. > >The last sentence may be phrased in a slightly academic manner; >the reader is assumed to understand the the "convenience" of the >transport layer is to provide optimal performance. In an earlier >paragraph the spec says: > > TCP is designed to work in a very general environment of > interconnected networks. > >Now for the subtle bit. Generality and optimization are typically >contradictory. The Internet protocol suite was designed deliberately >and carefully for generality, at the possible expense of optimization. >It was also designed for simplicity at the expense of optimization. 
>We recognized that later engineering efforts would rob some of the >simplicity in order to reach greater optimization, and indeed, >this has happened and is probably not a bad thing. On the other >hand, we should be very wary of over-engineering optimal solutions >that cut down the generality. Above I was trying to point out that this is generality on sending but not for receiving. It does not impact either generality or optimality "to clean up after yourself." Be careful about the generality thing: It was generality as we understood it in the mid-70's. (I have found some aspects of TCP that I thought were done for good performance and design reasons and have recently realized that they were really based on the nature of the traffic at the time and that those conditions no longer hold. ooops.) We weren't and aren't infallible. > >The ULP protocol is a classic example of this issue. The TCP spec >deliberately decoupled the packaging of data across the API (the ADPU, >if you will) from the segmentation that TCP does on the wire. This was >not an accident; it was designed to allow TCPs to be able to adapt to >new and different environments, without requiring that applications >change. The ULP proposal changes this to tight coupling, since it only >works if the send call units are mapped directly into segments. So >adopting ULP MAY (and note that one cannot ever be sure that it >will/won't) reduce the freedom of TCP to adapt to future changes. >And contrary to what the ULP proponents claim, it is a very fundamental >change in TCP. We should think VERY carefully before making such >a change, and we should be honest about what we are doing. You may be right here. I will read the document later and see what I think. IF ULP is really a protocol on top of TCP, I don't see how it can constrain TCP. I don't even see that every application that uses TCP would have to use ULP. But I also would agree with you about whether this is really a necessary addition at this point. 
> >Moral: Over-engineering may be bad for the Internet, eventually. > >Finally, an historical irony: the ULP hack is acknowledged to be a >stopgap until SCTP has advanced to save us. Essentially, SCTP is OSI >TP4 with features. TCP's idea of decoupling API from wire protocol >units was, at the time of its development a new idea that was at >variance with the evolving OSI suite. Now, things seem to be >running full circle. Are we really sure we want to do this? I have browsed through SCTP and it looks like a bell-head's dream. Not sleek and simple at all. Just lots and lots of mechanism and weight. I would also agree with your appraisal of the OSI situation at the time. I think that when the people put the SDU thing in the OSI stuff they were trying to force a fixed record view of the world. The problem was that it got written in general terms, and while you could read that into it (and I did for years), later, as I indicated, I began to realize that it actually said something much more interesting. So now when I talk about this, I say there are three cases: Stream, record and for lack of a better term Idempotent for the literal interpretation of the SDU because it keeps things invariant. Well, there it is something for you to think about too. ;-))) Whaddya think? Now to read the damn thing! Gotta go to dinner, people are hollering at me. Take care, John From mfisk at lanl.gov Fri Mar 23 15:29:20 2001 From: mfisk at lanl.gov (Mike Fisk) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103232054.UAA16559@gra.isi.edu> Message-ID: An abstract TCP byte stream is very similar to the byte or bit stream provided by serial links. It would be useful to know why one wouldn't use normal byte stuffing to denote frame boundaries within ULP data. I assume the argument is that it is inefficient to scan and twiddle bytes and that some out-of-band (a la packet segmentation) framing looks cheaper.
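For concreteness, the byte-stuffing alternative can be sketched in a few lines. This is only a minimal illustration using SLIP-style escaping (the constants are those of RFC 1055); the function names stuff and unstuff are hypothetical, and nothing here is taken from the ULP draft.

```python
from typing import List

# SLIP-style constants (RFC 1055). Everything else is an illustrative
# assumption, not part of draft-williams-tcpulpframe.
END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD


def stuff(record: bytes) -> bytes:
    """Escape END/ESC bytes inside the record, then terminate it with END."""
    out = bytearray()
    for b in record:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def unstuff(stream: bytes) -> List[bytes]:
    """Split an in-order byte stream back into records, undoing the escapes."""
    records, cur, escaped = [], bytearray(), False
    for b in stream:
        if escaped:
            # Lenient: anything other than ESC_END after ESC is read as ESC.
            cur.append(END if b == ESC_END else ESC)
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == END:
            records.append(bytes(cur))
            cur = bytearray()
        else:
            cur.append(b)
    return records
```

Both directions touch every payload byte, which is the scan-and-twiddle cost referred to above.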
But the draft presents this problem in the context of special-purpose NICs which could presumably handle byte stuffing pretty cheaply. With regard to your "subtle problems", the one that first comes to my mind is a new dependence on the end-to-end characteristics of TCP packets. With all of the middleboxes munging TCP packets, this seems dangerous. Even if the draft is correct in assuming that the ULP payload won't contain something that looks like a valid ULP header, any performance gains from using this protocol are lost in these situations. Second, from an upper-layer point of view, I don't know that I want to have to limit my PDUs to the current path MSS or force the protocol to degrade when the MSS falls below 512. It doesn't seem far fetched to me that some future (wireless?) network will only permit very small MTUs. What if I have an application that thinks in fixed-size blocks of, say 1k. Depending on the MSS, this can result in a lot of small packets if one is trying to preserve message boundaries. For good reasons, people have gone to a lot of effort to remove small packets from TCP streams. Again, it would be helpful if there was a good argument against byte-stuffing. On Fri, 23 Mar 2001, Bob Braden wrote: > "Title : ULP Framing for TCP > Author(s) : J. Williams et al. > Filename : draft-williams-tcpulpframe-01.txt > Pages : 12 > Date : 22-Mar-01 > > This document proposes a framing protocol for TCP which is designed > to be fully compliant with applicable TCP RFC's and fully > interoperable with existing TCP implementations. The framing > mechanism is designed to work as a 'shim' between TCP and higher- > level protocols, preserving the reliable, in-order delivery of TCP > while adding the preservation of higher-level protocol record > boundaries if the record is less than or equal to the path MTU. The > shim is designed to enable hardware acceleration of data movement > operations (e.g. 
direct placement of receive TCP segments into > higher-level protocol buffers) for the protocols that use it, even > if TCP segments are delivered out-of-order." > I would like to suggest two things about this, one simple and one > subtle. The simple one is this: to say that the ULP framing is fully > compliant with the applicable TCP RFCs is simply false. For some of > us, at least, such a lack of truth in technical advertising is a red > flag. > > The reason why it is false, and its consequences, form the subtle bit. > It is true that the proposed shim does not change the definition of the > TCP protocol on the wire. However, it does change a more fundamental > principle of TCP, which is the deliberate decoupling of what happens on > the wire from what the user sends. (See the following sentences from > RFC 793, for example: -- Mike Fisk, RADIANT Team, Network Engineering Group, Los Alamos National Lab See http://home.lanl.gov/mfisk/ for contact information From jonathan at DSG.Stanford.EDU Fri Mar 23 16:11:03 2001 From: jonathan at DSG.Stanford.EDU (Jonathan Stone) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: Your message of "Fri, 23 Mar 2001 16:20:00 MST." <200103232320.f2NNK0j17500@calcite.rhyolite.com> Message-ID: <200103240011.QAA10208@champagne.dsg.stanford.edu> In message <200103232320.f2NNK0j17500@calcite.rhyolite.com>, Vernon Schryver writes: >In other words, contrary to various claims, no black magic was required >to "page flip" on both input and output more than 10 years ago. Yes, provided that the MSS is a multiple of the pagesize, (or the sender rounded down to that), and that you already DMAed the packet into memory, aligned such that the TCP (or whatever) payload ended up page-aligned. That is, it works provided the receiver's guess about alignment and preceding header sizes pays off. Reading between the lines, one of the aims of this proposal is to address the cases where such guesses would fail. 
The "RDMA" makes me wonder if this isn't just about preserving record boundaries, but about preserving in-memory alignment of each record, too. Alignment constraints might be why byte-stuffing (or Stuart Cheshire's COBS) was not proposed. More explanation of WARP (the target remote-DMA ULP) might give some insight. Maybe the authors of the draft could comment? That said -- it seems an awful lot of effort, in an Internet where both Ethernet-sized MTUs, and significantly larger alignment constraints -- pagesizes of 4K, 8K, 16K or larger -- are common. From vjs at calcite.rhyolite.com Fri Mar 23 17:40:05 2001 From: vjs at calcite.rhyolite.com (Vernon Schryver) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing Message-ID: <200103240140.f2O1e5p20358@calcite.rhyolite.com> > From: Jonathan Stone > >In other words, contrary to various claims, no black magic was required > >to "page flip" on both input and output more than 10 years ago. > > Yes, provided that the MSS is a multiple of the pagesize, (or the > sender rounded down to that), and that you already DMAed the packet > into memory, aligned such that the TCP (or whatever) payload > ended up page-aligned. > > That is, it works provided the receiver's guess about alignment and > preceding header sizes pays off... The proposal also doesn't work unless its somewhat similar guesses pay off. That the guesses are made explicit and put on the wire doesn't change that problem. I prefer things that just work by default to things that are explicit but work just as (in)frequently. But that's largely a matter of style. I'm not a fan of having an on-the-wire protocol for every cloud on a white board. Many people disagree with me, as demonstrated by enthusiasm for such things as the PPP BOD protocol, which can do and does nothing except put on the wire what both peers either already know or don't care about.
Vernon Schryver vjs@rhyolite.com From day at std.com Fri Mar 23 17:50:03 2001 From: day at std.com (John Day) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103232054.UAA16559@gra.isi.edu> References: <200103232054.UAA16559@gra.isi.edu> Message-ID: At 20:54 +0000 3/23/01, Bob Braden wrote: >Hi. At the IETF just completed, I sat through an exposition of >the following Internet Draft: > > "Title : ULP Framing for TCP > Author(s) : J. Williams et al. > Filename : draft-williams-tcpulpframe-01.txt > Pages : 12 > Date : 22-Mar-01 Okay, I am back from dinner and have glanced through this proposal. What I wrote before was sort of my general view of years of trying to figure out the stream vs record vs SDU thing. Now I could imagine a completely independent shim layer above TCP that inserted some framing around the user's data and gave it to TCP and then simply just took the receiving stream of bytes as they came in in order and put together what had been sent and handed it to the application. So I could imagine some good ways to do what Bob was talking about in his note that would not affect TCP, and might provide a common procedure for applications. BUT THIS ISN'T THAT!!!!!!!!!!!!!! THIS is a real bad idea. Just scanning it I saw many things that set off more than a few red flags. Perhaps this weekend or early next week I can detail my problems with this. But right now is not the time. I have to get up at 0330 tomorrow morning and that is looking sooner than I like. I'm with Bob on this one. Although, I think he was too gentle in his objection.
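The independent shim imagined above, one that puts framing around the user's data, hands it to TCP, and rebuilds the records from the in-order byte stream at the far end, might look roughly like the following. This is a minimal sketch under stated assumptions: the 4-byte length prefix and the names frame and Deframer are hypothetical choices for illustration, and this is not the mechanism of the draft.

```python
import struct

# Hypothetical wire format (an assumption): 4-byte big-endian length,
# followed by the record itself.
HEADER = struct.Struct("!I")


def frame(record: bytes) -> bytes:
    """Prefix one application record with its length before giving it to TCP."""
    return HEADER.pack(len(record)) + record


class Deframer:
    """Rebuilds records from in-order bytes, however TCP happened to segment them."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        """Add newly received bytes; return every record that is now complete."""
        self._buf += chunk
        records = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # record still incomplete; wait for more bytes
            records.append(bytes(self._buf[HEADER.size:end]))
            del self._buf[:end]
        return records
```

Because the receiver works only from the byte stream, the result is the same whether the bytes arrive one at a time or all at once, so TCP segment boundaries never matter to the application.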
Take care, John From Erik.Nordmark at eng.sun.com Fri Mar 23 23:02:43 2001 From: Erik.Nordmark at eng.sun.com (Erik Nordmark) Date: Thu Mar 25 11:59:36 2004 Subject: [e2e] TCP Framing In-Reply-To: "Your message with ID" <200103232054.UAA16559@gra.isi.edu> Message-ID: As much as we might dislike the various middle boxes, I wonder what would happen if one of these TCP connections passed through a middle box. While many middleboxes tweak things on a packet by packet basis, there might be some that are essentially implemented as a read+write loop in application space, i.e. the TCP segment boundaries would not be preserved. Thus trying to make the TCP segment boundaries matter for the ULP is treading into uncharted territory. Erik From conrad at joda.cis.temple.edu Sat Mar 24 09:00:59 2001 From: conrad at joda.cis.temple.edu (Phillip Conrad) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <5.0.2.1.2.20010323173039.02fd7e60@mail.reed.com> References: <200103232054.UAA16559@gra.isi.edu> Message-ID: <5.0.2.1.2.20010324112843.039c5770@155.247.71.60> At 05:40 PM 3/23/2001 -0500, David P. Reed wrote: >Bob - I agree with you that having TCP control the framing underneath it >places a significant burden on future evolution of the TCP and IP. You >are right, as well, about the intention of the TCP spec to avoid linking >segmentation to the sequence of calls - it was considered and dropped. > >And for that matter, I still don't see the need. If your data should be >processable out of order, why not use multiple TCP connections, RTP, or >(gasp) UDP? If the data needs to be processed in order, then framing can >be embedded in the data stream. Without making any comment on the TCP framing question one way or the other, I think I can address David's last question ("why not use...?") Let's take each of the proposed alternatives in turn: >Why not use multiple TCP connections Two reasons: (1) fairness (2) slow start/congestion avoidance.
Fairness: If I use "n" TCP connections for a single flow because I have three logical streams that I want to be processed out-of-order with respect to one another, then I am getting "n" times greater a share of the bandwidth on congested links that I should reasonably be entitled to. Slow-start/congestion avoidance: if I have "n" TCP connections for my packet flow rather than one, there is no communication among them. If one of my "n" TCP connections experiences a packet loss, then I should probably back off my sending rate on all three. My expectation is that having "n" connections all independently doing slow-start/congestion avoidance to find an appropriate sending rate, would mean that each of the flows would converge to an appropriate sending rate more slowly, than if there were a single flow, with the result that the overall goodput of the network is reduced. I may be wrong on this point.. . sometimes intuition leads us astray. If NS-2 work hasn't already been done to investigate this point, it probably should be... but I'd be surprised if it hasn't already. (If someone reading this message knows of such work, could they point it out?) >RTP RTP, it seems to me, is widely misunderstood. While RTP *does* contain some transport layer functionality (e.g. end-to-end delivery, sequence numbers, etc.) RTP is most definitively NOT a transport protocol in the sense that TCP, UDP, or SCTP are transport protocols. Typically, RTP must be layered on top of one of those (TCP, UDP, SCTP, or something else in that category). Thus it is a red herring in this discussion. >or (gasp) UDP? Apart from the issue of reliability, the main reason: flow control/TCP-friendly congestion control. Applications without flow control don't work well, and those without TCP-friendly congestion control are "considered harmful". Building TCP-friendliness *correctly* into an application built on top of UDP is a corner that many developers are inclined to cut. 
In short, applications need a wide variety of qualities of service at the transport layer: total order, partial order, unordered service... reliable, unreliable, or something in between... but what ALL applications need to be good network citizens is flow-control and TCP-friendly congestion control. To my way of thinking, this is why the time is right for SCTP, which offers a choice between reliable/ordered service, reliable/partially ordered service, and now has extensions under development for unreliable and partially reliable service as well. More thinking on this topic can be found at: http://www.cis.udel.edu/~pconrad/thesis and in some tech reports at http://netlab.cis.temple.edu/techrpts. Phill Conrad Asst. Professor, CIS Dept., Temple University From dpreed at reed.com Sat Mar 24 11:54:58 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <5.0.2.1.2.20010324112843.039c5770@155.247.71.60> References: <5.0.2.1.2.20010323173039.02fd7e60@mail.reed.com> <200103232054.UAA16559@gra.isi.edu> Message-ID: <5.0.2.1.2.20010324142828.02fd37a0@mail.reed.com> At 12:00 PM 3/24/01 -0500, Phillip Conrad wrote: >I think I can address David's last question ("why not use...?") Let's >take each of the proposed alternatives in turn: > >>Why not use multiple TCP connections > > Two reasons: (1) fairness (2) slow start/congestion avoidance. > > Fairness: If I use "n" TCP connections for a single flow because I > have three logical streams that I want to be processed out-of-order > with respect to one another, then I am >getting "n" times greater a share of the bandwidth on congested links that >I should reasonably be entitled to. Don't think this is actually true. packet drop rate on the shared link has nothing to do with port numbers - even RED discriminates only on IP address. 
Now ECN might cause one TCP to back off and another to back off less, but the stable state would seem to be the same, whether multiple TCP connections are used or not. (some of the less end-to-endian notions of router fairness might give 3 TCP cnxns better service, by looking deeper into the packets). > Slow-start/congestion avoidance: if I have "n" TCP connections for my > packet flow rather than one, there is no communication among them. If > one of my "n" TCP connections experiences a packet loss, then I should > probably back off my sending rate on all three. I would think that if the total traffic between the two ends is divided among the "n" connections, that slow start would converge just as fast. But it would be an interesting experiment. >>RTP > > RTP, it seems to me, is widely misunderstood. While RTP *does* > contain some transport layer functionality (e.g. end-to-end delivery, > sequence numbers, etc.) RTP is most definitively NOT a transport protocol > in the sense that TCP, UDP, or SCTP are transport protocols. Typically, > RTP must be layered on top of one of those (TCP, UDP, SCTP, or something > else in that category). Thus it is a red herring in this discussion. Don't agree. RTP on UDP is a transport. UDP was invented (I was there) as sugaring of IP layer in order to allow a wide variety of experimental transport protocols, one key particular example being protocols like packet voice that didn't want reliable delivery, but instead timely delivery that allowed the application to decide what to do with out-of-order and lost packets. RTP is pretty much for that example. >>or (gasp) UDP? > >Apart from the issue of reliability, the main reason: flow >control/TCP-friendly congestion control. Applications without flow >control don't work well, and those without TCP-friendly congestion control >are "considered harmful". Building TCP-friendliness *correctly* into an >application built on top of UDP is a corner that many developers are >inclined to cut. 
One can create a TCP-friendly congestion-controlled protocol on top of UDP quite easily. You just use the same mechanisms as TCP to control the aggregate volume of data, respond to the same signals of congestion (unack'ed packets from loss and RED, and ECN if capable). That's what I meant. >To my way of thinking, this is why the time is right for SCTP, which >offers a choice between reliable/ordered service, reliable/partially >ordered service, and now has extensions under development for unreliable >and partially reliable service as well. Maybe. But the problem with SCTP is that it is a "kitchen-sink" protocol, full of options and combinations of options that make it hard to test (for performance and so forth) in all its combinations. I did such a protocol once - it was called DSP, and some of the old-timers like Vint, Bob Kahn, and Bob B. may remember that I was arguing for it to replace TCP, because it was more "general". It did all of the things that SCTP seems to want to do, but was much simpler. After much thought and debate, it became obvious that I was really ignoring my own "end-to-end argument" because most of the functionality was only there as a subroutine library for lazy application programmers, and we had no particular way to argue that the application needs were optimized by the design. If so, then it should have been a subroutine library, not built into the "standard" - instead we created UDP to encourage experimentation in those domains - e.g. RTP, DNS, ... SCTP may seem to do a lot, and it may be fun to deploy, but I'm conservative about throwing features into low level protocols until you can prove they are needed (not just wanted by folks who could experiment on top of UDP). This fetish of opposing UDP is based on a falsehood - that somehow UDP protocols aren't TCP-friendly, or closed-loop congestion-controlling, by definition. Some may be, but that's because no one has thought it through for them. 
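The claim earlier in this message, that a protocol on top of UDP can use the same mechanisms as TCP to control the aggregate volume of data, can be illustrated with a toy additive-increase/multiplicative-decrease window. This is a deliberately simplified sketch: the class name AimdWindow and its parameters are hypothetical, and it omits retransmission timers, fast recovery, and RTT estimation, so it is not a complete or authoritative congestion controller.

```python
class AimdWindow:
    """Toy congestion window, in packets, for a UDP-based sender (illustrative only)."""

    def __init__(self, initial: float = 1.0, ssthresh: float = 64.0) -> None:
        self.cwnd = initial
        self.ssthresh = ssthresh

    def on_ack(self) -> None:
        """Grow the window when the peer acknowledges a packet."""
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: one packet per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about one per RTT

    def on_congestion(self) -> None:
        """React to a congestion signal (a detected loss or an ECN mark)."""
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = self.ssthresh

    def may_send(self, in_flight: int) -> bool:
        """True if another packet fits in the current window."""
        return in_flight < int(self.cwnd)
```

The application would call on_ack and on_congestion from its own acknowledgment handling; only the feedback loop is borrowed from TCP, while delivery order and reliability remain the application's choice.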
- David -------------------------------------------- WWW Page: http://www.reed.com/dpr.html From vjs at calcite.rhyolite.com Sat Mar 24 13:00:47 2001 From: vjs at calcite.rhyolite.com (Vernon Schryver) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing Message-ID: <200103242100.f2OL0ld06539@calcite.rhyolite.com> > From: "David P. Reed" > ... > This fetish of opposing UDP is based on a falsehood - that somehow UDP > protocols aren't TCP-friendly, or closed-loop congestion-controlling, by > definition. Some may be, but that's because no one has thought it through > for them. I think UDP is resisted for reasons like those that give CSMA/CD a bad name. People misunderstand "collision" as something bad that breaks bits or at least uses vast quantities of bandwidth much as a collision on a freeway causes traffic jams in both directions. UDP is misunderstood as a bad thing because its acronym is often expanded as the "unreliable datagram protocol" or at best as the "unreliable user datagram protocol." They hear "unreliable" and think it's not for the precious data of their wonderful application. For proof, ask http://www.google.com/ about "unreliable datagram protocol" (with or without the double-quotes). Vernon Schryver vjs@rhyolite.com From jnc at ginger.lcs.mit.edu Sat Mar 24 14:51:32 2001 From: jnc at ginger.lcs.mit.edu (J. Noel Chiappa) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing Message-ID: <200103242251.RAA07976@ginger.lcs.mit.edu> > From: RJ Atkinson > if the TCP checksum were bound to some form of host identifier other > than the IP address, TCP could provide that particular property quite > nicely. Obviously I'm influenced by conversations with jnc on this > particular point. It would be an interesting exercise to work out the > details for such an approach. As part of the NSRG work, I have a basically-done I-D which worked out in some detail how to do it. (I.e.
down to specifying the mechanics of how you do an upwardly compatible change to the TCP checksum, the format of a TCP option to carry the host identifier in the SYN packet, etc, etc.) It turns out you can do it with zero extra overhead in both (packet) space and processing time - if you keep the same checksum algorithm - more if you upgrade to a better one, as I recall Dave Reed wanted to do originally :-). (In fact, a tiny bit less computing, if you're recomputing the partial checksum of the pseudo-header at the moment.) If people are interested, I can put it up on the web somewhere, or even (gasp) turn it in as an I-D. Noel From djw1005 at cam.ac.uk Sat Mar 24 16:22:19 2001 From: djw1005 at cam.ac.uk (Damon Wischik) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <5.0.2.1.2.20010324112843.039c5770@155.247.71.60> Message-ID: Phillip Conrad wrote: > David P. Reed wrote: > >Why not use multiple TCP connections > > Two reasons: (1) fairness (2) slow start/congestion avoidance. > Fairness: If I use "n" TCP connections for a single flow because I have > three logical streams that I want to be processed out-of-order with > respect to one another, then I am getting "n" times greater a share of > the bandwidth on congested links that I should reasonably be entitled > to. This begs the question: what are you reasonably entitled to? If you have three logically separate streams which can be processed out-of-order, I would have thought there is a case to be made that those are three essentially independent streams (which just happen to be between the same end-nodes), and so together they deserve three times the bandwidth of a single stream. Damon Wischik. 
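The arithmetic behind this disagreement is small enough to write down. The sketch below assumes an idealized bottleneck that ends up dividing its capacity equally among all TCP connections crossing it; that per-connection premise is exactly what is being debated in this thread, and the function name share is hypothetical.

```python
def share(my_connections: int, other_connections: int) -> float:
    """Fraction of the bottleneck one host obtains, ASSUMING capacity is
    split equally per TCP connection (an idealization, not a measurement)."""
    return my_connections / (my_connections + other_connections)


one = share(1, 9)    # a single connection competing with 9 others
three = share(3, 9)  # the same host opening three connections instead
```

Under that assumption the host's share rises from 0.1 to 0.25, a factor of 2.5 rather than exactly 3; the gain approaches "n times" only when the competing connections greatly outnumber one's own.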
From cannara at attglobal.net Sat Mar 24 22:44:08 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <200103242100.f2OL0ld06539@calcite.rhyolite.com> Message-ID: <3ABD93B8.E87D26D9@attglobal.net> Just an added note -- "unreliable" is redundant in UDP. "Datagram" means unreliable, best-effort packet service, which is why Xerox XNS named its equivalent of IP "IDP", for "Internet Datagram Protocol". Further, UDP isn't completely unreliable: if a packet is received, its data have been checksummed. Alex Vernon Schryver wrote: > > > From: "David P. Reed" > > > ... > > This fetish of opposing UDP is based on a falsehood - that somehow UDP > > protocols aren't TCP-friendly, or closed-loop congestion-controlling, by > > definition. Some may be, but that's because no one has thought it through > > for them. > > I think UDP is resisted for reasons like those that give CSMA/CD > a bad name. People misunderstand "collision" as something bad that > breaks bits or at least uses vast quantities of bandwidth much as > a collision on a freeway causes traffic jams in both directions. > UDP is misunderstood as a bad thing because its acronym is often > expanded as the "unreliable datagram protocol" or at best as the > "unreliable user datagram protocol." They hear "unreliable" and > think it's not for the precious data of their wonderful application. > > For proof, ask http://www.google.com/ about "unreliable datagram protocol" > (with or without the double-quotes). > > Vernon Schryver vjs@rhyolite.com From craig at aland.bbn.com Sun Mar 25 09:00:39 2001 From: craig at aland.bbn.com (Craig Partridge) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: Your message of "Sat, 24 Mar 2001 22:44:08 PST."
<3ABD93B8.E87D26D9@attglobal.net> Message-ID: <200103251700.f2PH0dZ51050@aland.bbn.com> In message <3ABD93B8.E87D26D9@attglobal.net>, Cannara writes: >Just an added note -- "unreliable" is redundant in UDP. That's why it isn't in UDP's name. From the RFC index: 0768 User Datagram Protocol. J. Postel. Aug-28-1980. (Format: TXT=5896 bytes) (Also STD0006) (Status: STANDARD) Craig From dotis at sanlight.net Sun Mar 25 15:53:14 2001 From: dotis at sanlight.net (Douglas Otis) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103232054.UAA16559@gra.isi.edu> Message-ID: Bob, You are in effect changing the wire specifications of TCP by insisting on the payload being bound to the TCP frame. This is a change to the wire specifications in that a middle box is likely to re-position these bytes into a non-compatible form forcing a software intervention likely to break your intended application. TCP framing is NOT an interim fix if indeed it is intended to be placed into hardware to perform content directed placement of data. As such hardware will not be able to cope with non-aligned data, you are placing a new requirement on the wire format; that being byte placement with respect to TCP frames. At the same time you attempt to implement a major modification to the TCP API while viewing this modification as unrelated to the TCP standard, there is already an API/Wire Format that provides the exact features that you desire that is documented and agreed upon. It is called RFC 2960 or SCTP. This RFC does the same function as this modified version of TCP is hoped to do. The real desire is not for an interim version pending deployment of SCTP, as anyone knows, protocols built into hardware tend to live much longer than a short period of time as you allude. In reality, you are not happy with SCTP and do not want to use it as it is likely adding features you do not want to deal with. What are those features of SCTP that make it hard to deal with I wonder? 
Is it the stream ID that allows multiple independent flows? Surely a boon to hardware implementations, as it then requires only a single flow control for many independent streams. Is it the multi-homing feature? Also a boon to those wishing hardware to provide highly reliable connections. Is it the ability to prevent spoofing, or DoS attacks? Perhaps it is the ability of SCTP to identify payloads, unlike TCP. Sorry, but any framed version of TCP you create will look like a hack compared to SCTP with its highly desired features. Please, do not tell me SCTP is too hard to implement in hardware and only a mangled version of TCP is something you are willing to attempt. SCTP will be in place years before your mangled version of TCP is even seriously considered. Instead, this is a prelude to something similar to the modem standards wars, where manufacturers either could not wait or could not agree on standards. I think before you reject SCTP out of hand, you should make some effort to explain why a record-based protocol does not suit your needs and yet only a modified byte-stream protocol does. Should I desire to attack your adapter, all I would need to do is to provide you with non-aligned data, something that anyone will agree is a valid TCP data stream. Think twice before using PPP or IP-SEC with your framing equipment. This framing is likely to be the worst thing ever inflicted upon TCP as it corrupts wire specifications and APIs. Doug > -----Original Message----- > From: end2end-interest-admin@postel.org > [mailto:end2end-interest-admin@postel.org]On Behalf Of Bob Braden > Sent: Friday, March 23, 2001 12:54 PM > To: end2end-interest@postel.org > Cc: braden@ISI.EDU > Subject: [e2e] TCP Framing > > > From craig at aland.bbn.com Mon Mar 26 05:39:28 2001 From: craig at aland.bbn.com (Craig Partridge) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] draft on IP Fast Option Lookup In-Reply-To: Your message of "Fri, 23 Mar 2001 11:27:56 MST."
<200103231827.f2NIRuX12662@calcite.rhyolite.com> Message-ID: <200103261339.f2QDdSZ54143@aland.bbn.com> In message <200103231827.f2NIRuX12662@calcite.rhyolite.com>, Vernon Schryver writes: >The weaknesses in the RFC's that tried to instruct how to compute >the TCP checksum faster are classic examples of that syndrome. Really -- my recollection is that those of us who wrote those RFCs actually had done checksum work. We just managed to completely bungle writing down some of the details. (Which is unfortunate but tars us with a different brush...) Craig From craig at aland.bbn.com Mon Mar 26 05:46:00 2001 From: craig at aland.bbn.com (Craig Partridge) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: Your message of "Fri, 23 Mar 2001 15:29:20 PST." Message-ID: <200103261346.f2QDk0Z54200@aland.bbn.com> In message , Mike Fisk writes: >I assume the argument is that it is inefficient to scan and twiddle bytes >and that some out-of-band (ala packet segmentation) framing looks cheaper. COBS (Consistent Overhead Byte Stuffing) is a very efficient byte-stuffing scheme that doesn't require much byte scanning. If you're asking the question, you might go look at Cheshire's SIGCOMM paper and see how COBS might fit. Craig From RShankar at Novell.COM Mon Mar 26 08:04:44 2001 From: RShankar at Novell.COM (Ramesh Shankar) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: Message-ID: <3ABF689C.6060305@Novell.COM> The fairness issue is an interesting angle and seems relevant only when bandwidth is really limited or from an ISP perspective (perhaps). This angle is similar to the "fair share scheduling" approach used in time-sharing UNIX systems. This issue has been discussed in the following Ph.D. thesis: V. N. Padmanabhan Ph.D. Dissertation Computer Science Division, University of California at Berkeley, USA September 1998 (Also published as Technical Report UCB/CSD-98-1016.)
http://www.research.microsoft.com/~padmanab/phd-thesis.html I am not a researcher and cannot make authoritative statements, but I felt that just as the FSS concept is no longer relevant in today's systems, the fairness issue is perhaps not so relevant. I have been curious to understand this issue and perhaps someone can throw some light on this. Thanks, S.R. Damon Wischik wrote: > Phillip Conrad wrote: > >> David P. Reed wrote: >> >>> Why not use multiple TCP connections >> >> Two reasons: (1) fairness (2) slow start/congestion avoidance. >> Fairness: If I use "n" TCP connections for a single flow because I have >> three logical streams that I want to be processed out-of-order with >> respect to one another, then I am getting "n" times greater a share of >> the bandwidth on congested links that I should reasonably be entitled >> to. > > > This begs the question: what are you reasonably entitled to? > > If you have three logically separate streams which can be processed > out-of-order, I would have thought there is a case to be made that those > are three essentially independent streams (which just happen to be between > the same end-nodes), and so together they deserve three times the > bandwidth of a single stream. > > Damon Wischik. From cannara at attglobal.net Mon Mar 26 09:01:45 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <200103251700.f2PH0dZ51050@aland.bbn.com> Message-ID: <3ABF75F9.16C5209A@attglobal.net> Craig, this has been a common test for years, to see how old a "network-knowledgeable" student is. Ask them what UDP means. Prior to the interesting RFC Jeremy produced, the "U" stood for just what it stands for in all other families of protocols that have datagram services -- "unreliable". Somehow some Internet folks seemed to become sensitive, almost ashamed, of that very accurate and truthful engineering label, and turned to seek a "u"-word that had marketability.
I've yet to meet a user who knowingly "uses" a datagram protocol. You're younger than I thought! Alex Craig Partridge wrote: > > In message <3ABD93B8.E87D26D9@attglobal.net>, Cannara writes: > > >Just an added note -- "unreliable" is redundant in UDP. > > That's why it isn't in UDP's name. From the RFC index: > > 0768 User Datagram Protocol. J. Postel. Aug-28-1980. (Format: TXT=5896 > bytes) (Also STD0006) (Status: STANDARD) > > Craig From jim.williams at emulex.com Mon Mar 26 10:43:32 2001 From: jim.williams at emulex.com (Jim Williams) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <200103232054.UAA16559@gra.isi.edu> Message-ID: <00f101c0b624$a8620d50$710e10ac@giganet.com> ----- Original Message ----- From: "Bob Braden" To: Cc: Sent: Friday, March 23, 2001 3:54 PM Subject: [e2e] TCP Framing >Hi. At the IETF just completed, I sat through an exposition of >the following Internet Draft: > > "Title : ULP Framing for TCP > Author(s) : J. Williams et al. > Filename : draft-williams-tcpulpframe-01.txt > Pages : 12 > Date : 22-Mar-01 > > This document proposes a framing protocol for TCP which is designed > to be fully compliant with applicable TCP RFC's and fully > interoperable with existing TCP implementations. The framing > mechanism is designed to work as a 'shim' between TCP and higher- > level protocols, preserving the reliable, in-order delivery of TCP > while adding the preservation of higher-level protocol record > boundaries if the record is less than or equal to the path MTU. The > shim is designed to enable hardware acceleration of data movement > operations (e.g. direct placement of receive TCP segments into > higher-level protocol buffers) for the protocols that use it, even > if TCP segments are delivered out-of-order." > >I would like to suggest two things about this, one simple and one >subtle. The simple one is this: to say that the ULP framing is fully >compliant with the applicable TCP RFCs is simply false. 
For some of >us, at least, such a lack of truth in technical advertising is a red >flag. I hope you are not attacking the honesty of the authors. I may well be the most intellectually dishonest scoundrel to ever roam the internet, but I can assure you that the other authors are fine, honest, upstanding people who would not let me get away with anything underhanded. :-) More seriously, many alternatives had been considered which defined new TCP options or defined currently reserved TCP header bits. The point is that the submitted proposal does not do any of those things, which leads to the claim of full compliance with existing RFCs. >The reason why it is false, and its consequences, form the subtle bit. >It is true that the proposed shim does not change the definition of the >TCP protocol on the wire. However, it does change a more fundamental >principle of TCP, which is the deliberate decoupling of what happens on >the wire from what the user sends. (See the following sentences from >RFC 793, for example: > > The TCP is able to transfer a continuous stream of octets in each > direction between its users by packaging some number of octets into > segments for transmission through the internet system. In general, > the TCPs decide when to block and forward data at their own > convenience. These sentences don't seem to support your point. Stating what TCPs are able to do, or what they generally do, hardly indicates what they MUST do. It seems inconsistent to state on the one hand that APIs are outside the scope of the TCP specification, and on the other hand to claim that a particular implementation is non-compliant because the API doesn't map to the wire in a way that is to your liking. The core of your objections may be that the framing proposal uses TCP in a way different from what was originally intended. I would agree with this. My view is that the point of standards compliance is interoperability, not "original intent".
If two consenting endpoints want to violate "original intent", that should be fine as long as they follow all the rules. >The last sentence may be phrased in a slightly academic manner; >the reader is assumed to understand that the "convenience" of the >transport layer is to provide optimal performance. In an earlier >paragraph the spec says: > > TCP is designed to work in a very general environment of > interconnected networks. > >Now for the subtle bit. Generality and optimization are typically >contradictory. The Internet protocol suite was designed deliberately >and carefully for generality, at the possible expense of optimization. >It was also designed for simplicity at the expense of optimization. >We recognized that later engineering efforts would rob some of the >simplicity in order to reach greater optimization, and indeed, >this has happened and is probably not a bad thing. On the other >hand, we should be very wary of over-engineering optimal solutions >that cut down the generality. These types of arguments tend to be more philosophical than technical, and probably the best that can be done is to clearly state the differences in point of view without presuming to resolve them one way or the other. There are certain things that should be decided by standards organizations and certain things that should be decided by the market place. My view is that the best standards are those that allow the optimization versus generality tradeoffs to be resolved by the market place while still ensuring full interoperability of the various competing design points. From jnc at ginger.lcs.mit.edu Mon Mar 26 11:24:12 2001 From: jnc at ginger.lcs.mit.edu (J. Noel Chiappa) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing Message-ID: <200103261924.OAA11530@ginger.lcs.mit.edu> > From: Cannara > Craig, this has been a common test for years, to see how old a > "network-knowledgeable" student is. Ask them what UDP means.
Prior to > the interesting RFC Jeremy [sic - JNC] produced .. You're younger than > I thought! Well, this is kind of pointless, but in the name of historical accuracy: Actually, those of us who are *really* "older" will remember that UDP was *actually* done by Dave Reed (whose name appears nowhere on RFC-768, alas). RFC-768 is simply a re-packaging of IEN-71, "User Datagram Protocol", by Dave Reed. The reason I recall this is that I seem to remember Dave discussing the design with me, and I've ever since had this bit set that we screwed up the "no-checksum" value. (It should have been all-1's, since the checksum is the 1's complement of the 1's-complement sum, and as no sequence of numbers [except all 0's, which you never see in a real packet] can be 1's-complement summed to 0, all 1's is the value you can never get - and thus should have been the "no checksum" value. Making it 0 requires a check of the complement of the sum, and inversion if it's 0, on all packets.) Perhaps Dave Reed will correct me if my memory is wrong? As to why it now bears Jon's name, that was because he was editing all the TCP/IP standards documents (IP, TCP, ICMP etc), and he edited the UDP document to be part of the set. As for the protocol's name, it was quite deliberately chosen to be "User". At that point, the only user-accessible service was TCP. There were a number of theorized services, including host-name lookup, which didn't want a full-blown bi-directional stream connection. UDP - allowing "users" access to a datagram protocol - was the answer. It deliberately wasn't made reliable i) to allow its use by applications which didn't care about reliability (we'd had experience of this problem, with packet voice and TCP, where the enforced reliability got in the way of the application), and ii) to keep it simple.
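The "no-checksum" argument above can be illustrated with a minimal sketch of the 16-bit 1's-complement checksum and the extra zero test that UDP's choice of 0 forces on every sender. The function names are hypothetical, and `data` is assumed to already hold the pseudo-header, the UDP header with its checksum field zeroed, and the payload; the sketch shows only the arithmetic, not a full UDP implementation.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit 1's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        # End-around carry: fold any overflow back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum_field(data: bytes) -> int:
    """Value to place in the UDP checksum field.

    The checksum is the complement of the sum. Because 0 on the wire
    means "no checksum", a computed 0 is transmitted as 0xFFFF instead
    (RFC 768); this is the per-packet check described in the message.
    """
    checksum = ~ones_complement_sum16(data) & 0xFFFF
    return 0xFFFF if checksum == 0 else checksum
```

For any input containing a non-zero byte the sum is never 0x0000, so the complemented checksum is never 0xFFFF; that is why all 1's would have been free to serve as the "no checksum" marker. The sum can be 0xFFFF, though, which complements to 0 and triggers the substitution above.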
I believe Name Resolution (IEN-116, I think - not DNS, this was long before DNS - it let a client with no disk, such as a terminal server, accept hostnames by querying a time-sharing machine that held a copy of HOSTS.TXT to convert each name to an IP address) was the first service defined to run on UDP. Noel From braden at ISI.EDU Mon Mar 26 11:33:49 2001 From: braden at ISI.EDU (Bob Braden) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing Message-ID: <200103261933.TAA01019@gra.isi.edu> *> *> There are certain things that should be decided *> by standards organizations and certain things that should be *> decided by the market place. My view is that the best standards *> are those that allow the optimization versus generality tradeoffs *> to be resolved by the market place while still ensuring full *> interoperability of the various competing design points. *> *> Jim, I would suggest that the marketplace is most specifically a poor place to make wise high-level technical decisions. One could make the case that TCP/IP has been so successful just because it was allowed to mature in military and academic environments that shielded it from irrelevant marketplace pressures for many years. X.25 is a good example of a technology that did not have that advantage. There are also XNS, WAP, VHS, and lots of other examples of market-driven entries. The marketplace is concerned only with optimization, since it is necessarily very short-term in its outlook. Any "optimization vs. generality tradeoff" performed by the marketplace will certainly end up with generality getting the short end of the stick. Of course, generality is in the long-range best interest of the marketplace, but the marketplace itself is like a 5-year-old child, incapable of seeing its long-range best interest. This is why there are grown-ups in the world.
Bob Braden From lixia at CS.UCLA.EDU Mon Mar 26 12:16:36 2001 From: lixia at CS.UCLA.EDU (Lixia Zhang) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103261933.TAA01019@gra.isi.edu> from Bob Braden at "Mar 26, 2001 07:33:49 pm" Message-ID: <200103262016.MAA23990@aurora.cs.ucla.edu> > Jim, > > I would suggest that the marketplace is most specifically a poor place > to make wise high-level technical decisions. One could make the case > that TCP/IP has been so successful just because it was allowed to > mature in military and academic environments that shielded it from > irrelevant marketplace pressures for many years. X.25 is a good > example of a technology that did not have that advantage. There are > also XNS, WAP, VHS, and lots of other examples of market-driven > entries. I beg to exclude XNS from the rest of the "market-driven" entries. Lixia (unrelated to the fact that I worked for Xerox for 7 years) From dpreed at reed.com Mon Mar 26 11:54:29 2001 From: dpreed at reed.com (David P. Reed) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <3ABF75F9.16C5209A@attglobal.net> References: <200103251700.f2PH0dZ51050@aland.bbn.com> Message-ID: <5.0.2.1.2.20010326141852.023a3310@mail.reed.com> At 09:01 AM 3/26/01 -0800, Cannara wrote: >Craig, this has been a common test for years, to see how old a "network- >knowledgeable" student is. Ask the what UDP means. Prior to the interesting >RFC Jeremy produced the "U" stood for just what it stands for in all other >families of protocols that have datagram services -- "unreliable". Somehow >some Internet folks seemed to become sensitive, almost ashamed, of that very >accurate and truthful engineering label, and turned to seek a "u"-word that >had marketability. I've yet to meet a user who knowingly "uses" a datagram >protocol. You're younger than I thought! Alex - Craig may be young, but then I must be ancient at only 49. 
Anyway, I was there at the meeting where we created UDP (and split TCP into the TCP and IP layers), in Marina del Rey in winter '77/'78. We called it the "User Datagram Protocol" from the first, and the reason was to distinguish it from the IP layer, which was the "datagram protocol" not well tuned for users, since you couldn't demux sensibly on the "protocol" field to the correct "user process" aka "application program instance". (I won't bore you with the radical idea that we had tried to force into TCP of using a 64-bit process-specific address in IP, rather than a machine specific address - but memory cost a few pennies per bit then, so we were viewed as dangerously profligate). Now there may have been some in the years after that that called it "Unreliable ...", but I'd suggest that only those who had fought against the idea that a base datagram function was useful would have stooped to that kind of propaganda. Those of us who fought for a datagram protocol (the PARC people, Danny Cohen and the speech people, and the LAN people like me) used the term "best efforts", not "unreliable", to describe the delivery reliability of IP and UDP. - David -------------------------------------------- WWW Page: http://www.reed.com/dpr.html From touch at ISI.EDU Mon Mar 26 13:03:06 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <3ABF689C.6060305@Novell.COM> Message-ID: <3ABFAE8A.EC47C9F2@isi.edu> Ramesh Shankar wrote: > > The fairness issue is an interesting angle and seems relevant only when > bandwidth is really limited or from an ISP perspective (perhaps). This > angle is similar to the "fair share scheduling" approach used in time > sharing UNIX systems. This issue has been discussed in the following > Ph.D. thesis: > > V. N. Padmanabhan > Ph.D. Dissertation > Computer Science Division, University of California at Berkeley, USA > September 1998 > (Also published as Technical Report UCB/CSD-98-1016.) 
> > http://www.research.microsoft.com/~padmanab/phd-thesis.html FWIW, RFC2140 (April 1997) speaks directly to the issue of how sharing is compliant with TCP and is an extension of T/TCP concepts. Fairness can be completely decoupled from the number of connections between two hosts. Joe > >> David P. Reed wrote: > >> > >>> Why not use multiple TCP connections > >> > >> Two reasons: (1) fairness (2) slow start/congestion avoidance. > >> Fairness: If I use "n" TCP connections for a single flow because I have > >> three logical streams that I want to be processed out-of-order with > >> respect to one another, then I am getting "n" times greater a share of > >> the bandwidth on congested links that I should reasonably be entitled > >> to. > > > > > > This begs the question: what are you reasonably entitled to? > > > > If you have three logically separate streams which can be processed > > out-of-order, I would have thought there is a case to be made that those > > are three essentially independent streams (which just happen to be between > > the same end-nodes), and so together they deserve three times the > > bandwidth of a single stream. > > > > Damon Wischik. From touch at ISI.EDU Mon Mar 26 13:13:18 2001 From: touch at ISI.EDU (Joe Touch) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <200103232054.UAA16559@gra.isi.edu> <00f101c0b624$a8620d50$710e10ac@giganet.com> Message-ID: <3ABFB0EE.832AF0E5@isi.edu> Jim Williams wrote: > > ----- Original Message ----- > From: "Bob Braden" > To: > Cc: > Sent: Friday, March 23, 2001 3:54 PM > Subject: [e2e] TCP Framing > > >Hi. At the IETF just completed, I sat through an exposition of > >the following Internet Draft: > > > > "Title : ULP Framing for TCP > > Author(s) : J. Williams et al. 
> > Filename : draft-williams-tcpulpframe-01.txt > > Pages : 12 > > Date : 22-Mar-01 > > > > This document proposes a framing protocol for TCP which is designed > > to be fully compliant with applicable TCP RFC's and fully > > interoperable with existing TCP implementations. The framing > > mechanism is designed to work as a 'shim' between TCP and higher- > > level protocols, preserving the reliable, in-order delivery of TCP > > while adding the preservation of higher-level protocol record > > boundaries if the record is less than or equal to the path MTU. The > > shim is designed to enable hardware acceleration of data movement > > operations (e.g. direct placement of receive TCP segments into > > higher-level protocol buffers) for the protocols that use it, even > > if TCP segments are delivered out-of-order." > > > >I would like to suggest two things about this, one simple and one > >subtle. The simple one is this: to say that the ULP framing is fully > >compliant with the applicable TCP RFCs is simply false. For some of > >us, at least, such a lack of truth in technical advertising is a red > >flag. > > I hope you are not attacking the honesty of the authors. I may well be > the most intellectually dishonest scoundrel to ever roam the internet, > but I can assure you that the other authors are fine, honest, upstanding > people who would not let me get away with anything underhanded. :-) > > More seriously, many alternatives had been considered which defined new > TCP options or defined currently reserved TCP header bits. The point > being that the submitted proposal does not do any of those things, > which leads to the claim of full compliance with existing RFCs. My primary concern is that this appears to be a stopgap measure until SCTP is available. Stopgap modifications to widely-deployed protocols (e.g., TCP), even optional ones, should be considered only very hesitantly. 
As a stopgap, it might be sufficient to create a new "protocol" which happens to be based on a TCP implementation with the addition of record boundary enforcement, as a new (and somewhat temporary) protocol. Backward compatibility can be achieved by having the server sit on BOTH protocol ports - conventional TCP and this new enhanced-reliable-record-transport. This allows implementers to leverage the current base of silicon-friendly TCP implementations with somewhat minor modifications. ----- The concern with having even an optional modification to the TCP API is that it can creep into the assumptions of the default API. I prefer the freedom of the existing decoupling; anything that even implicitly endorses an optional modification to that API is sliding down the path to a true modification. Given the ephemeral nature of this proposed modification, that seems premature. Joe From mfisk at lanl.gov Mon Mar 26 21:21:09 2001 From: mfisk at lanl.gov (Mike Fisk) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103261346.f2QDk0Z54200@aland.bbn.com> Message-ID: My message was misunderstood; I'm familiar with COBS. I was attempting to ask a leading question of the authors of the draft and other supporters of similar proposals. I was hoping that they could explain why _they_ don't feel that byte stuffing is an appropriate solution. To date, I haven't heard any credible arguments about why byte-stuffing wouldn't be sufficient. On Mon, 26 Mar 2001, Craig Partridge wrote: > > In message , Mike Fis > k writes: > > >I assume the argument is that it is inefficient to scan and twiddle bytes > >and that some out-of-band (ala packet segmentation) framing looks cheaper. > > COBS is a very efficient byte stuffing that doesn't require much byte > scanning. If you're asking the question, you might go looks at Cheshire's > SIGCOMM paper and see how COBS might fit. 
> > Craig > -- Mike Fisk, RADIANT Team, Network Engineering Group, Los Alamos National Lab See http://home.lanl.gov/mfisk/ for contact information From cannara at attglobal.net Tue Mar 27 01:31:43 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <5.0.2.1.2.20010323173039.02fd7e60@mail.reed.com> <200103232054.UAA16559@gra.isi.edu> <5.0.2.1.2.20010324142828.02fd37a0@mail.reed.com> Message-ID: <3AC05DFF.C7030051@attglobal.net> Actually, with current network processors (e.g., Vitesse, IBM, PMCC, Intel...) flows are queued and can be classified for RED or other QoS purposes by 5-tuples, which include ports. This is quite logical, since a conversation on one port pair, especially to a common system (e.g., server) will rightly deserve differing flow treatment from other port pairs. Loss probability under RED then can vary across connections between individual IP pairs. Alex "David P. Reed" wrote: > [clip] > Don't think this is actually true. packet drop rate on the shared link has > nothing to do with port numbers - even RED discriminates only on IP > address. Now ECN might cause one TCP to back off and another to back off > less, but the stable state would seem to be the same, whether multiple TCP > connections are used or not. (some of the less end-to-endian notions of > router fairness might give 3 TCP cnxns better service, by looking deeper > into the packets). > [clip] From cannara at attglobal.net Tue Mar 27 01:32:49 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <200103262016.MAA23990@aurora.cs.ucla.edu> Message-ID: <3AC05E41.FC5EBAF1@attglobal.net> Definitely agree, given Xerox's tradition of 'success' in marketing. XNS was researched rather than marketed. TCP/IP, however, has been subsidized beyond grandest imaginings -- free distribution with Sun, ATT, HP... 
machines for years, untold public $ spent on graduate students, research projects, papers, committees... And, the real hero of the Internet, Bob Kahn, rarely gets the recognition he deserves, for zealously working to maintain the flow of public finances, even when DARPA was ready to cut and run. Even now, millions more are being spent to get back even the basics of a secure, uniformly-addressable internetworking structure that were overlooked in the adolescent design process that has left us with the profoundly hackable Internet protocol family. I only use "adolescent" rather than "bureaucratic" here, because The Economist has an Internet piece out using that modifier. {:o] Alex Lixia Zhang wrote: > > > Jim, > > > > I would suggest that the marketplace is most specifically a poor place > > to make wise high-level technical decisions. One could make the case > > that TCP/IP has been so successful just because it was allowed to > > mature in military and academic environments that shielded it from > > irrelevant marketplace pressures for many years. X.25 is a good > > example of a technology that did not have that advantage. There are > > also XNS, WAP, VHS, and lots of other examples of market-driven > > entries. > > I beg to exclude XNS from the rest of the "market-driven" entries. > > Lixia > (unrelated to the fact that I worked for Xerox for 7 years) From cannara at attglobal.net Tue Mar 27 01:35:00 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <200103251700.f2PH0dZ51050@aland.bbn.com> <5.0.2.1.2.20010326141852.023a3310@mail.reed.com> Message-ID: <3AC05EC4.B720078C@attglobal.net> This is interesting David, having known the people at Parc and being still older, the idea that packet networking began with those meetings is, as you'll agree, incorrect. 
Since "unreliable" was used in packet networking as equivalent to "datagram" for years before those meetings, and books describing UDP even later used "unreliable", perhaps as a matter of ethical choice, I can only say that the choice of "user" as a modifier for a user-invisible protocol component underscores how arbitrary many choices of terms in the TCP/IP family have been. The idea of "best effort" is also a hard one to support, since "best" is very much open to interpretation, especially by a receiver who got nothing, or something trashed. If "best effort" is a euphemism for datagram, then it's no wonder some folks thought the imaginative naming, adopted as you say, was objectionable. The problem that "user" and "best-effort" raise is that they mean nothing and add nothing to pre-existing terms, such as datagram. Actually, since UDP at least checksums a datagram, it could well have been called "CDP", for "checksummed datagram protocol", thus being much clearer to "users" in its purpose and capability. Alex "David P. Reed" wrote: > > At 09:01 AM 3/26/01 -0800, Cannara wrote: > >Craig, this has been a common test for years, to see how old a "network- > >knowledgeable" student is. Ask the what UDP means. Prior to the interesting > >RFC Jeremy produced the "U" stood for just what it stands for in all other > >families of protocols that have datagram services -- "unreliable". Somehow > >some Internet folks seemed to become sensitive, almost ashamed, of that very > >accurate and truthful engineering label, and turned to seek a "u"-word that > >had marketability. I've yet to meet a user who knowingly "uses" a datagram > >protocol. You're younger than I thought! > > Alex - Craig may be young, but then I must be ancient at only 49. Anyway, > I was there at the meeting where we created UDP (and split TCP into the TCP > and IP layers), in Marina del Rey in winter '77/'78.
We called it the > "User Datagram Protocol" from the first, and the reason was to distinguish > it from the IP layer, which was the "datagram protocol" not well tuned for > users, since you couldn't demux sensibly on the "protocol" field to the > correct "user process" aka "application program instance". (I won't bore > you with the radical idea that we had tried to force into TCP of using a > 64-bit process-specific address in IP, rather than a machine specific > address - but memory cost a few pennies per bit then, so we were viewed as > dangerously profligate). > > Now there may have been some in the years after that that called it > "Unreliable ...", but I'd suggest that only those who had fought against > the idea that a base datagram function was useful would have stooped to > that kind of propaganda. Those of us who fought for a datagram protocol > (the PARC people, Danny Cohen and the speech people, and the LAN people > like me) used the term "best efforts", not "unreliable", to describe the > delivery reliability of IP and UDP. > > - David > From cannara at attglobal.net Tue Mar 27 01:36:00 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: Message-ID: <3AC05F00.D2A42706@attglobal.net> Lloyd, as I said to Craig it was late (or early) and Jon or Jeremy were equally good for me. :] Ok, so how old is that? And, is an x.25 datagram now reliable? Alex Lloyd Wood wrote: > > On Mon, 26 Mar 2001, Cannara wrote: > > > Craig, this has been a common test for years, to see how old a "network- > > knowledgeable" student is. Ask the what UDP means. Prior to the interesting > > RFC Jeremy produced > > Okay, just how old do you have to be to know that Jeremy 'Bentham' > Postel later changed his name by deed poll to Jon? > > > the "U" stood for just what it stands for in all other > > families of protocols that have datagram services -- "unreliable". > > such as, oh, X.25 datagram transport? > > L. 
> [clip] From J.Crowcroft at cs.ucl.ac.uk Tue Mar 27 02:02:45 2001 From: J.Crowcroft at cs.ucl.ac.uk (Jon Crowcroft) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: Your message of "Tue, 27 Mar 2001 01:35:00 -0800." <3AC05EC4.B720078C@attglobal.net> Message-ID: <4382.985687365@cs.ucl.ac.uk> >>The idea of "best effort" is also a hard one to support, since "best" is very >>much open to interpretation this isn't rocket science. the reliability semantics of the UDP service are not distinguishable from the IP service that carries a UDP payload. the checksum is optional and UDPlite work is working on making it partially optional:-) so even the bit delivery semantics aren't any better or worse than IP other protocols above IP add different value. SCTP and TCP and RDP and netblt and PGM and so on all add some notion of a lower failure probability, as well as what quaint old iso people used to call "signalled" errors only - i.e. apart from a few corner cases that stone/partridge et al identify in the engineering noise (literally) they attempt to reduce unsignaled errors to as close to zero as acceptable for the application (or Upper Layer Protocol as we used to say).... btw, we used to have several types of datagrams in other networks other than IP ones - for example, in the cambridge distributed system there was a Universe Datagram Protocol (i even did a gateway to ip once for it as well as a layering of IP on it - oh, and it was run on a sort of ATM layer, except we only had 16bit cells:-) and in X25 nets there WAS actually a datagram service - it was called Fast Select, and was rarely implemented.
Some folks in X.25 made mistakes about the X.25 semantics and didn't get edge-to-edge reliability beyond the _interface_ - in this case, while pedantically, they were right, what you actually got was an unreliable data transfer service without signaled errors - for example the UK academic IP on 2Mbps X.25 service in the late 80s suffered interesting performance effects from this... other cases - oh, look at GPRS and the "reliable" (aka window and GBN/retransmit) link layer it offers as an option - the effect depends on the interface spec and how long you are actually prepared to _wait_ for a signaled error too...so it's quite subtle in reality.... oh, let's not get into fragmentation debates too :-)... cheers jon who is 43 From laws at dera.gov.uk Tue Mar 27 02:29:25 2001 From: laws at dera.gov.uk (John Laws) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: References: <3ABF75F9.16C5209A@attglobal.net> Message-ID: <3AC07995.10056.18800C@localhost> Lloyd, On 26 Mar 2001, at 19:09, Lloyd Wood wrote: > Okay, just how old do you have to be to know that Jeremy 'Bentham' > Postel later changed his name by deed poll to Jon? I never knew that (and I'm old), but it's a very "interesting" connection (James Burke, Scientific American style) back to another Jon (Crowcroft) at UCL AND that the major benefactor for the foundation of UCL is a Jeremy Bentham. His mummified body is in a display case within UCL (a condition I think of granting his money over to UCL). John _________________________ John Laws Security & Information Systems Battlespace Management Dept. Integrated Systems Sector, Security Division Defence Evaluation & Research Agency, Malvern Worcs WR14 3PS UK Tel +44 1684 89-4903 (with voice mail), Fax +44 1684 89-6064 DERA Standard Internet Disclaimer "The Information contained in this e-mail and any subsequent correspondence is private and is intended solely for the intended recipient(s).
For those other than the intended recipient any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on such information is prohibited and may be unlawful." From J.Crowcroft at cs.ucl.ac.uk Tue Mar 27 03:37:56 2001 From: J.Crowcroft at cs.ucl.ac.uk (Jon Crowcroft) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: Your message of "Tue, 27 Mar 2001 11:29:25 BST." <3AC07995.10056.18800C@localhost> Message-ID: <4729.985693076@cs.ucl.ac.uk> In message <3AC07995.10056.18800C@localhost>, John Laws typed: >>I never knew that (and I'm old), but it's a very "interesting" >>connection (James Burke, Scientific American style) back to another >>Jon (Crowcroft) at UCL AND that the major benefactor for the >>foundation of UCL is a Jeremy Bentham. His mummified body is in a >>display case within UCL (a condition I think of granting his money >>over to UCL). John technically, Bentham was not a founder - he was the mentor for a group of utilitarians who were the actual founders - his body is in a glass case in the Quad (mummified, sans head) and has to be present at all college council meetings (interesting given he was against all organised religion) - his head is elsewhere (believed to be in a safe since various pranksters stole it and did various dubious things to it, though i have heard the same story about oliver cromwell's head in cambridge (pembroke college?)...)
those of you coming to the London IETF this summer may wish to visit UCL and see for yourselves - see http://www-mice.cs.ucl.ac.uk/ietf/ for a totally informal set of info about this event i believe ip protocol #7 is still assigned as 7 UCL UCL [PK] for those of you interested in history - it was part of a "remote" transport end-point hack that was called "clean and simple" that allowed one to concatenate a variety of "end" to end protocols together and provide transparent protocol translation - the way the different families linked to each other was through a type of "network address translation" in a true sense of translation, and so long as the protocols had the right semantics, the service actually kind of worked.... (modulo signaled error models...:-) one of the protocols went by the name of yellow book, and another was TP4 and another was a Byte Stream Protocol on a Cambridge ring and another was something we came across in 81 called TCP.... of course, the term "clean and simple" was (i believe) ironic cheers jon From cannara at attglobal.net Tue Mar 27 09:35:16 2001 From: cannara at attglobal.net (Cannara) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <4382.985687365@cs.ucl.ac.uk> Message-ID: <3AC0CF54.AC35DA8A@attglobal.net> Exactly the point Jon -- no rocket science. So we should have no need for meaningless adjectives that mislead folks naive to the systems. "DP" could be as sufficient as "IP". Alex Jon Crowcroft wrote: > > >>The idea of "best effort" is also a hard one to support, since "best" is very > >>much open to interpretation > > this isn't rocket science. > > the reliability semantics of the UDP service are not distinguishable from > the IP service that carries a UDP payload.
[clip] From ballardie at dial.pipex.com Tue Mar 27 09:53:37 2001 From: ballardie at dial.pipex.com (Tony Ballardie) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing References: <4729.985693076@cs.ucl.ac.uk> Message-ID: <007501c0b6e6$f928ae20$7f0dbc3e@vaionote> ----- Original Message ----- From: "Jon Crowcroft" To: Cc: "Lloyd Wood" ; Sent: 27 March 2001 12:37 Subject: Re: [e2e] TCP Framing > i believe ip protocol #7 is still assigned as > 7 UCL UCL [PK] > for those of you interested in history - it was part of a "remote" > transport end-point hack that was called "clean and simple" IP ptcl #7 was assigned to CBT in '95 or '96. So, still UCL related, and... "clean and simple" too :-) Tony From braden at ISI.EDU Tue Mar 27 13:24:53 2001 From: braden at ISI.EDU (Bob Braden) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] FYI - Proposal: Real-time transfer protocol Message-ID: <200103272124.VAA01913@gra.isi.edu> From: "Real-time transfer protocol" To: , Subject: Proposal: Real-time transfer protocol Date: Mon, 26 Mar 2001 12:46:44 +0300 Dear Sir/Madam, Excuse me for sending this letter without your permission. I just wanted to introduce to your attention my research, concerning a protocol for real-time data transmission in Internet. It is published at: http://over-ground.net/rttp This is a public research, relying on volunteers for its development. It is a request for comments, though this is not an RFC in the common meaning of this abbreviation. If my work is outside the range of your interests, I would much appreciate informing your colleagues who could be interested about it. Thank you!
Yours faithfully, Dimitar Aleksandrov From mfisk at lanl.gov Tue Mar 27 14:55:36 2001 From: mfisk at lanl.gov (Mike Fisk) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: <200103270610.WAA10577@champagne.dsg.stanford.edu> Message-ID: Once you know where the record boundary is, you can find an upper layer header and use whatever upper-layer logic is necessary to place (DMA) the block. What you don't want is to receive a packet that lacks a header describing where to put it. You can add an optional RDMA header to TCP or IP or you can add it to the TCP payload and make sure that there's one per packet. What seems problematic to me is assuming a 1-1 mapping between upper-layer blocks and TCP segments. To me, this suggests that when building TCP segments you want to insert a header into the byte stream right before each block and at the beginning of each segment. But this header can be generated at the last minute by the TCP output routines. There doesn't seem to be a need to require that segment sizes match upper-layer protocol size. And if you don't want to use something like byte-stuffing to find the header, you can place the header(s) at the beginning of each segment. Assuming that DF is set, and middleboxes are well-behaved (is that an oxymoron?), that segment should be preserved end-to-end. On Mon, 26 Mar 2001, Jonathan Stone wrote: > I suspect its because they want not just to preserve record > boundaries, but to align "records" onto suitable memory boundaries. > > Think of scsi-over-tcp, with the TCP stream carrying a mix of > "scsi CCBs" and "disk blocks." > > Then again, i could be completey wrong... As could I. In particular, the folks designing NICs to do this may have some constraints that I'm not aware of.
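A minimal sketch of the "header at the beginning of each segment" idea described above, in Python. Everything named here is an assumption made for illustration and does not come from any proposal on the list: records are taken to be length-prefixed (4-byte length, then payload), the per-segment header is a hypothetical 2-byte offset to the first record header that starts inside that segment, and `NO_RECORD`, `segmentize` and `records_in_segment` are made-up names. The point is only that segment size need not match record size for a receiver to find a boundary in a segment that arrives on its own.

```python
import struct

NO_RECORD = 0xFFFF  # hypothetical marker: no record header starts in this segment


def segmentize(records, mss):
    """Cut a stream of length-prefixed records into segments of at most
    mss payload bytes (mss must be below 0xFFFF), each prefixed with a
    2-byte offset to the first record header that starts inside it."""
    stream = b"".join(struct.pack("!I", len(r)) + r for r in records)
    starts, pos = [], 0
    for r in records:
        starts.append(pos)          # stream offset of this record's header
        pos += 4 + len(r)
    segments = []
    for base in range(0, len(stream), mss):
        payload = stream[base:base + mss]
        offset = next(
            (s - base for s in starts if base <= s < base + len(payload)),
            NO_RECORD,
        )
        segments.append(struct.pack("!H", offset) + payload)
    return segments


def records_in_segment(segment):
    """Return the records that lie completely inside one segment,
    using only that segment's own header (no earlier segments needed)."""
    (offset,) = struct.unpack("!H", segment[:2])
    payload = segment[2:]
    found = []
    if offset == NO_RECORD:
        return found
    while offset + 4 <= len(payload):
        (length,) = struct.unpack("!I", payload[offset:offset + 4])
        end = offset + 4 + length
        if end > len(payload):
            break                   # record continues in a later segment
        found.append(payload[offset + 4:end])
        offset = end
    return found


if __name__ == "__main__":
    segs = segmentize([b"alpha", b"be", b"gamma!", b"d"], mss=12)
    for seg in segs:
        print(struct.unpack("!H", seg[:2])[0], records_in_segment(seg))
```

Records that span two segments still need ordinary in-order reassembly in this sketch; the per-segment offset only lets the receiver locate and place the records that are wholly contained in whichever segment it is looking at.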
-- Mike Fisk, RADIANT Team, Network Engineering Group, Los Alamos National Lab See http://home.lanl.gov/mfisk/ for contact information From knm at protocol.ece.iisc.ernet.in Tue Mar 27 19:13:35 2001 From: knm at protocol.ece.iisc.ernet.in (K N Manoj) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] (no subject) Message-ID: Hello, We face the following problem at our mail server Mail coming from .iisc.ernet.in reach the server in time, but those coming from outside bounce back. We are unable to track it. Can you help us? Thanks and regards, Manoj. -- K N Manoj, Coding and Modulation Lab, Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore, India. 560 012. Ph: 309 2855 12 Deg 58 Min N, 77 Deg 39 Min E -- From mankin at ISI.EDU Thu Mar 29 05:05:33 2001 From: mankin at ISI.EDU (Allison Mankin) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP Framing In-Reply-To: Your message of Fri, 23 Mar 2001 23:02:43 -0800. Message-ID: <10103291305.AA15081@maia.east.isi.edu> > As much as we might dislike the various middle boxes, > I wonder what would happen if one of these TCP connections > passed through a middle box. While many middleboxes tweak things > on a packet by packet basis, there might be some that are essentially > implemented as a read+write loop in application space, i.e. > the TCP segment boundaries would not be preserved. > > Thus trying to make the TCP segment boundaries matter for the ULP > is threading into unchartered territory. The shim does have a provision for detecting that middleboxes have happened to it (resegmenting) and reverting to normal processing if so. Reviewing to see that detection is as sure as the designers hope would be good. Allison From dino.saija at libero.it Fri Mar 30 07:44:20 2001 From: dino.saija at libero.it (dino.saija@libero.it) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP modeling Message-ID: Is there a recent analytical model for TCP?
thank you From tjo at research.telcordia.com Fri Mar 30 08:56:12 2001 From: tjo at research.telcordia.com (Teunis J Ott) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP modeling In-Reply-To: "dino.saija@libero.it" "[e2e] TCP modeling" (Mar 30, 5:44pm) References: Message-ID: <1010330115611.ZM27735@buzz> On Mar 30, 5:44pm, dino.saija@libero.it wrote: > Subject: [e2e] TCP modeling > Is there a recent analytical model for TCP? > thank you > > >-- End of excerpt from dino.saija@libero.it See ftp://ftp.research.telcordia.com/pub/tjo/TCPwindow.ps It is pretty old, but it makes sense. Teun Ott. From larse at ISI.EDU Fri Mar 30 10:27:59 2001 From: larse at ISI.EDU (Lars Eggert) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP modeling References: Message-ID: <3AC4D02F.8B0106BB@isi.edu> "dino.saija@libero.it" wrote: > > Is there a recent analytical model for TCP? > thank you I think Vishal Misra (http://www-net.cs.umass.edu/~misra/) has some papers on that. He just gave a talk at USC last week. -- Lars Eggert Information Sciences Institute http://www.isi.edu/larse/ University of Southern California From padhye at aciri.org Fri Mar 30 10:34:24 2001 From: padhye at aciri.org (Jitendra Padhye) Date: Thu Mar 25 11:59:37 2004 Subject: [e2e] TCP modeling In-Reply-To: from "dino.saija@libero.it" at "Mar 30, 2001 5:44:20 pm" Message-ID: <200103301834.f2UIYOU80169@moose.aciri.org> http://www.aciri.org/padhye/tcp-model.html Lists some (but certainly not all) of the papers on this topic. If you find any more, please let me know! - Jitu > Is there a recent analytical model for TCP?
> thank you > > > From guol at cs.bu.edu Fri Mar 30 12:32:46 2001 From: guol at cs.bu.edu (Guo, Liang) Date: Thu Mar 25 11:59:38 2004 Subject: [e2e] [ns]: RED treatment to SYN packet from TCP/ECN source In-Reply-To: Message-ID: I'm reading the tcp.cc/red.cc files and am intrigued by the following questions. Since tcp.cc in ns assumes a one-way session, the first packet serves as a SYN packet, although most of the time it is treated as the first data packet. Here comes the problem. For ECN-capable TCP, here's the code from the output() function:

    if (seqno == 0) {
        if (syn_) {
            hdr_cmn::access(p)->size() = tcpip_base_hdr_size_;
        }
        if (ecn_) {
            hf->ecnecho() = 1;
    //      hf->cong_action() = 1;
            hf->ect() = 0;
            ~~~~~~~~~~~~~~~~~~
        }

So the first packet will carry no ECT codepoint. I guess this follows the specification in draft-ietf-tsvwg-ecn-03.txt, which demands "A host MUST NOT set ECT on data packets unless it has sent at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has received at least one ECN-setup SYN or ECN-setup SYN-ACK packet, and has sent no non-ECN-setup SYN or non-ECN-setup SYN-ACK packet." However, at the RED queue, the queue only applies ECN (marking instead of dropping) to packets that carry the ECT bit. Here's the code from red.cc:

    hdr_flags* hf = hdr_flags::access(pickPacketForECN(pkt));
    if (edp_.setbit && hf->ect() && edv_.v_ave < edp_.th_max) {
                       ~~~~~~~~~~~~
        hf->ce() = 1;   // mark Congestion Experienced bit
        return (0);     // no drop
    } else {
        return (1);     // drop
    }

My question is, does this mean that the SYN packet is more likely to be dropped than a data packet? This is horrible because dropping a SYN packet will cause a 6-second timeout even if the RTT is, say, 0.1 msec. Wouldn't it be nice if the RED queue also protected these SYN packets? I'm not sure how it is implemented in real network products. But at least I've seen different implementations of RED on linux machines.
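The mark-or-drop branch quoted from red.cc can be restated as a small stand-alone sketch. This is not ns code: `Packet` and `red_early_action` are hypothetical names, and the function only models the decision taken after RED has already picked a packet for early action. Under those assumptions it shows the asymmetry being asked about: the chance of being picked is the same, but a picked packet without ECT (such as the initial SYN) is dropped, while a picked ECT data packet is only marked.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ect: bool = False  # ECN-Capable Transport codepoint, set by the sender
    ce: bool = False   # Congestion Experienced, set by the queue


def red_early_action(pkt, setbit, v_ave, th_max):
    """Decide what happens to a packet RED has selected for early action.

    Mirrors the quoted condition: mark only when ECN marking is enabled
    (setbit), the packet carries ECT, and the average queue size is still
    below the maximum threshold; otherwise drop.
    """
    if setbit and pkt.ect and v_ave < th_max:
        pkt.ce = True
        return "mark"
    return "drop"


if __name__ == "__main__":
    syn = Packet(ect=False)   # first packet of an ECN connection: no ECT
    data = Packet(ect=True)   # later data packet: ECT set
    print("SYN :", red_early_action(syn, True, 10.0, 15.0))
    print("data:", red_early_action(data, True, 10.0, 15.0))
```

With the average queue between the thresholds, the SYN-like packet falls into the drop branch and the ECT packet into the mark branch, which is exactly the case where the long initial retransmission timeout would be triggered.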
One more thing, why does TCP/ECN allow the congestion window to go below 1 (so if not using double precision, that means cwnd could be 0)? Any special purpose for this to happen? Guo, Liang guol@cs.bu.edu Dept. of Comp. Sci., Boston Univ., (617)353-5222 (O) 111 Cummington St., MCS-217, (617)375-9206 (H) Boston, MA 02215 From csapuntz at stanford.edu Fri Mar 30 22:33:41 2001 From: csapuntz at stanford.edu (Constantine Sapuntzakis) Date: Thu Mar 25 11:59:38 2004 Subject: [e2e] TCP Framing References: Message-ID: <016101c0b9ac$87560950$0f00000a@KEALIACSAPUNTZ> Hi Mike, I hope this e-mail can respond to a couple of your very good and thought-provoking points. I'll use the term upper-layer protocol (ULP) to talk about the protocol riding on top of TCP. Some examples of ULPs include iSCSI, SSL, NFS, and RDMA. There are two properties we were looking to get from TCP: 1) finding ULP message boundaries in segments received out-of-order This involves having some signalling discipline for message boundaries. There are several ways other than the one we proposed of providing this property (including techniques that do not modify the TCP sender). These include having a header periodically in the TCP stream (say every 1000 bytes) or a byte-stuffing technique like COBS. 2) application messages not spanning segments This simplifies the receiver as it does not have to deal with cases where ULP headers span segments or where ULP datagrams are broken across TCP segments. I don't believe that property #2 can be had without modifying the TCP sender a la the proposal presented. ----------- One could question how critical property #2 is. After all, if stuff arrives mostly in order except for the occasional drop, you can keep a bit of application state from packet to packet. I would still argue that property #2 makes life on the fast path a good deal easier for the receiver.
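As an illustration of property #1 via byte stuffing, here is a minimal COBS (Consistent Overhead Byte Stuffing) sketch in Python. It is an assumption-laden example rather than anything from the proposal: the function names are made up, each message is terminated by a single zero byte, and the encoder is a plain (not size-optimal) variant that may emit one extra code byte after a full 254-byte block. Because an encoded message never contains a zero byte, a receiver handed an arbitrary chunk of the stream can scan to the next zero and resynchronize there.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()
    for b in data:
        if b == 0:
            out.append(len(block) + 1)  # code byte: distance to the removed zero
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:       # longest run one code byte can describe
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(enc: bytes) -> bytes:
    """Invert cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i]
        if code == 0:
            raise ValueError("zero byte inside a COBS frame")
        block = enc[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS frame")
        out += block
        i += code
        if code != 0xFF and i < len(enc):
            out.append(0)               # restore the zero this code stood for
    return bytes(out)


def frame(msg: bytes) -> bytes:
    """One message on the wire: COBS body plus a zero delimiter."""
    return cobs_encode(msg) + b"\x00"


def messages_after_resync(chunk: bytes):
    """Decode every complete message in a chunk that may begin mid-message."""
    first = chunk.find(b"\x00")
    if first < 0:
        return []
    frames = chunk[first + 1:].split(b"\x00")
    return [cobs_decode(f) for f in frames[:-1]]  # last piece is incomplete or empty


if __name__ == "__main__":
    stream = frame(b"hdr\x00one") + frame(b"two\x00\x00") + frame(b"three")
    print(messages_after_resync(stream[3:]))      # chunk starts inside message 1
```

This gives boundary detection without touching the TCP sender, at the cost of encoding every byte; it does nothing for property #2, since a message can still straddle two segments.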
-Costa From ggumdol at comis.kaist.ac.kr Sat Mar 31 03:34:15 2001 From: ggumdol at comis.kaist.ac.kr (Jeong-woo Cho) Date: Thu Mar 25 11:59:38 2004 Subject: [e2e] TCP modeling References: Message-ID: <001b01c0b9d6$84208ec0$2992f88f@ggumdol> ----- Original Message ----- From: To: Sent: Saturday, March 31, 2001 12:44 AM Subject: [e2e] TCP modeling > Is there a recent analytical model for TCP? > thank you > > > I think that the following paper is the most excellent paper about TCP modeling. Jitendra Padhye, Victor Firoiu, and Donald F. Towsley, "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation". IEEE/ACM Transactions on Networking, vol. 8, no. 2, April 2000. From ggumdol at comis.kaist.ac.kr Sat Mar 31 03:43:03 2001 From: ggumdol at comis.kaist.ac.kr (Jeong-woo Cho) Date: Thu Mar 25 11:59:38 2004 Subject: [e2e] RED with TFRC Message-ID: <003901c0b9d7$c1f17d30$2992f88f@ggumdol> Although Sally insists that TFRC could achieve smooth sending rates for real-time applications (in fact, TFRC is smoother than TCP), RED is not a good router mechanism for real-time applications which would adopt TFRC as their congestion control mechanism. I think that the dropping strategy of RED is too simplified and it cannot avoid "random drops", which are quite bad for TFRC flows, which use a weighted sum of the last n packet drop intervals to estimate the current fair share. In conclusion, I think that there should be other router mechanisms to avoid these "random packet drops". Are there any discussions on this? From tqbf at sonicity.com Mon Mar 26 00:25:26 2001 From: tqbf at sonicity.com (Thomas H. Ptacek) Date: Thu Mar 25 11:59:46 2004 Subject: [e2e] UDP length field In-Reply-To: <200104191632.f3JGW2I22876@baskerville.CS.Arizona.EDU> References: <200104191632.f3JGW2I22876@baskerville.CS.Arizona.EDU> Message-ID: <985595126.1070.4.camel@tqbf-notebook.int.sonicity.com> > The validity checks is another issue, all UDPs that we've > looked at seems to adhere to Craigs rules.
There is one > exception which is Quake's home-grown UDP. They do some Quake has its own UDP? I can see (evil) reasons for building ones own TCP, but almost no benefit to a custom UDP. Is this an OS issue (ie, they reimplemented sockets for speed) or did they also build their own IP? Does anyone know the answer to this? --- Thomas H. Ptacek