From sm at mirapoint.com Fri Jul 1 01:09:04 2005 From: sm at mirapoint.com (Sam Manthorpe) Date: Fri, 1 Jul 2005 01:09:04 -0700 (PDT) Subject: [e2e] Reacting to corruption based loss Message-ID: <20050701010904.EDJ00828@alpo.mirapoint.com> Hi Alex, Sorry for the delay in replying... ---- Original message ---- >Date: Wed, 29 Jun 2005 11:47:04 -0700 >From: Cannara >Subject: Re: [e2e] Reacting to corruption based loss >To: end2end-interest at postel.org > >Good response Sam. The kind that leads to more thought, in fact. > >How many years ago for the 1st example, you ask. For that one, 6. For the >one this year, 0.25 year. :] > >You say you "spent a long time simulating the BSD stack". That's great, and >part of the problem. Folks do simulations which are based on code written to >simulate someone's ideas on how something works. Then, they believe the >simulations, despite what's actually seen in reality. We all know that >simulators and their use can be very limited in relevance, if not accuracy. Yes I know, which was why I qualified my observation with my practical experience as well. I keep seeing these rathole threads on e2e and, to my shame, dipped my ignorant toe in :-) I think for the benefit of e2e part-time readers like myself, a synopsis of the actual problem with TCP as it stands today for local and global communication would be a good thing. Because I can't perceive any. And let's not do the anecdotal thing; I'm thinking more of a cost-based analysis, including details of how much the alleged problem is costing. >One of the biggest issues is lack of release control for things as important >as Internet protocols (e.g., TCP). Thus the NT server may have a different >version of TCP from that on the user's spanking new PC. No one ever addresses >even the basics of stack parameter settings in their manuals, and network >staffers rarely have the time to go in and check versions, timer settings, >yadda, yadda. This is indeed why many performance problems occur.
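The point just made about unchecked stack parameter settings is easy to act on: many of the per-connection knobs can at least be inspected programmatically rather than hunted down in manuals. A small illustrative Python sketch (the two options shown are just examples; which options exist, and their defaults, vary by OS and stack version, which is rather the point being argued):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Nagle coalescing is on by default (TCP_NODELAY == 0); an application
# that cares about small-write latency must turn it off explicitly.
nagle_disabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# The stack's current maximum segment size for this (unconnected)
# socket; the pre-negotiation default differs between implementations.
mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)

s.close()
```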
You fixed >IRIX 6 years ago. Great. Um, it was a bug. I didn't understand the argument... > >Now, why does the Internet work? Not simply because of TCP, for sure. Your >experiment illustrates the rush to acceptance these points are raised >against: > >"I transferred a largish file to my sluggish corporate ftp server. Took 77 >seconds (over the Internet, from San Francisco to Sunnyvale). I then did the >same thing, this time I unplugged my Ethernet cable 6 times, each time for 4 >seconds. The transfer took 131 seconds." > >So, what is "largish" in more precise terms? What are the RTT and limiting >bit-rate of your "Internet" path from SF to S'vale? As I said, it was "for fun". :-) >The file evidently went >right by our house! But, despite the imprecision, we can use your result: 77 >+ 6 x 4 = 101. Your transfer actually took 131 seconds, fully 30% more than >one would expect on a link that's simply interrupted, not congested. Good >experiment! But the relevant fact is that it worked. And didn't suck too much. And I'm confident that it would still work and not suck too much even if SBC replaced all their hardware with something that had a hypothetical bug in the OS that made my bit-error notification not inform my transport layer that loss was due to congestion and not link flakiness. Sure, you could architect something that utilized every spare bit on a link, but at what cost? And why? What's the justification for all the added points of failure? Again, I don't follow this list much, but reading a few of your postings, you seem to be suggesting that TCP/IP is fundamentally flawed as a layer-3/4 team and think that a replacement of the protocol is in order. Do I understand you correctly? Cheers, -- Sam ------------------------ Sam Manthorpe, Mirapoint From rja at extremenetworks.com Fri Jul 1 02:31:13 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Fri, 1 Jul 2005 05:31:13 -0400 Subject: [e2e] Receiving RST on a MD5 TCP connection.
In-Reply-To: <42C441E8.3080806@isi.edu> References: <42C441E8.3080806@isi.edu> Message-ID: On Jun 30, 2005, at 15:03, Joe Touch wrote: > It does suggest, however, that if new keys are used on both sides, > then > both sides ought to flush their connections entirely (i.e., drop all > TCBs using old keys). This affects TCP/MD5 keying, but that's not > automatically managed, though. I would normally expect that if a reboot triggered rekeying, then some form of automated key management would be in place. In practice, current deployments change keys roughly never. Along that line, it should be quite practical for someone to write a "TCP MD5 Domain of Interpretation" specification to permit the existing ISAKMP/IKE protocol to be used for this purpose. In practice, most current implementations of "TCP MD5 for BGP" only support one key per remote-peer at a time, which is a challenge if one wants to have smooth key rollover (whether manual or via some automated key management). This is just an implementation issue; nothing prevents supporting more than one key at a time. Actually, the bit I find most surprising is that the majority of deployed BGP sessions (including the majority of e-BGP sessions) run without even enabling TCP MD5. Given that folks generally don't deploy TCP MD5 to protect against basic attacks (e.g. TCP RST attacks or TCP session stealing), I don't see why one would think that some form of authentication enhancement within the BGP protocol itself would have rapid or widespread deployment.[1] Ran rja at extremenetworks.com [1] My non-scientific sample of network operators can't find anyone who thinks Kent's S-BGP is deployable. Most think that SO-BGP is deployable, but would be challenging to deploy, and are hoping for something more deployable than either one of those two.
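For readers without RFC 2385 at hand: the option in question carries an MD5 digest computed over the TCP pseudo-header, the TCP header (checksum zeroed, options excluded), the segment data, and finally the connection key, which is why both ends must agree on the key for every segment, RSTs included. A minimal Python sketch of that digest computation (the function name and argument layout are mine, for illustration only, not from any implementation):

```python
import hashlib
import socket
import struct

def tcp_md5_digest(src_ip, dst_ip, fixed_header, options_len, payload, key):
    """Sketch of the RFC 2385 digest: MD5 over the pseudo-header, the
    fixed TCP header (checksum zeroed, options excluded), data, and key."""
    assert len(fixed_header) == 20  # fixed TCP header only, no options
    seg_len = len(fixed_header) + options_len + len(payload)
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) +
              struct.pack("!BBH", 0, socket.IPPROTO_TCP, seg_len))
    hdr = fixed_header[:16] + b"\x00\x00" + fixed_header[18:]  # zero checksum
    return hashlib.md5(pseudo + hdr + payload + key).digest()

# A segment signed under one key fails verification under any other,
# which is the key-rollover problem discussed above.
digest = tcp_md5_digest("192.0.2.1", "192.0.2.2", bytes(20), 0, b"", b"old-key")
```

Since the key itself is hashed into every segment, there is no in-band way to signal "I have a new key now"; hence the interest in negotiating keys out of band via something like ISAKMP/IKE.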
From gaylord at dirtcheapemail.com Fri Jul 1 07:06:47 2005 From: gaylord at dirtcheapemail.com (Clark Gaylord) Date: Fri, 01 Jul 2005 10:06:47 -0400 Subject: [e2e] Reacting to corruption based loss In-Reply-To: <42C473A9.FEE67D3A@attglobal.net> References: <20050626001318.CB7A424D@aland.bbn.com> <42BEE8D7.D2FFC9E6@attglobal.net> <42BF2738.AD082D57@web.de> <42BF814A.C4948522@attglobal.net> <42BFDB1C.2010009@reed.com> <42C473A9.FEE67D3A@attglobal.net> Message-ID: <42C54DF7.2030501@dirtcheapemail.com> Cannara wrote: >1) Indeed "hardware corruption can be detected by software", which is what >MIBs report to us via net mgmnt systems, and all metro distribution systems > > MIBs don't detect, they report. --ckg From detlef.bosau at web.de Fri Jul 1 08:25:19 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 01 Jul 2005 17:25:19 +0200 Subject: [e2e] Reacting to corruption based loss References: <42C095F8.26792D6@web.de> <20050628145330.GD2392@grc.nasa.gov> <42C1A645.F4A563F2@web.de> <42C24148.ECD1AEC4@attglobal.net> <42C28353.6E5E1861@web.de> <42C2A557.7010600@dirtcheapemail.com> <42C2E362.A1F03AC7@attglobal.net> <20050629191853.GA30771@colt.internal> <42C43B15.62EC968D@attglobal.net> <20050630194358.GH30771@colt.internal> <42C479F0.23D6DC19@attglobal.net> Message-ID: <42C5605F.B0326612@web.de> Cannara wrote: > product design. Is that your feeling? It happens to be mine, on TCP/IP as > well, since it's so important nowadays, even for our upcoming robotic > surgeries, where the doc can do them from his/her PC while taking a break from > surfing in Maui. Alex, get a life. The Internet is still basically a best-effort packet-switched network. And as such, it serves its purposes. And quite satisfactorily, in my humble opinion.
When some marketing guys babble about "Internet Telephony" as i) a fully functional and ii) a cheaper replacement for the telephone system, this seems to me at least questionable, though I often have fun when I read all those funny advertisements :-) But when it comes to "robotic surgeries" (I think the first lawsuits are pending) or perhaps the "Internet version" of this, "Internet remote surgery", which is even more lunatic, my personal limit of tolerance is reached. Please excuse me, but it is not a mousepad or a keyboard lying on the table in the operating theater. It's a human being. We should not forget about the purposes _and_ limitations of the Internet. In addition: If you really _need_ some conferencing system for supportive purposes, e.g. during some surgery, it is absolutely no problem to achieve this via highly proven techniques and services offered by any telephone company you prefer. And I really do not see any sense in blaming the Internet when the Internet does not serve a purpose it really was not intended for. I'm new to this list. And I have followed the discussions here only for a few days. And I've learned a lot of things since then. And when there have been different views on a certain issue, we could discuss that. However, it took me only three or four days to get a summary of a large number of your posts: "The Internet is bad." "The Internet is worth nothing." "The Internet is broken." "Nobody understands the Internet." "TCP is bad." "TCP is fully broken." "TCP does not work." "TCP is brought to its knees when a bug enters my kitchen." Why on earth does this list work pretty well? Of course, you will know. Because you know everything. In this respect, you remind me of our German Foreign Minister, Joschka Fischer. In our television news, Joschka Fischer is sometimes simply called "God Father". Because he knows all things, he can do all things, and whoever else knows all and can do all - "GF" knows and does even better.
> > So, your hyperbole, like: "endless prattle that TCP is completely broken, IP > is worthless, it's all done wrong, it will never work, it's insecure, blah > blah blah blah blah" or "...panic it out of existence" are your own figments, Please excuse me, but it basically matches my impression too. > and "endless prattle". The discussion has always been what can be done to > improve matters. I mean that was one thing behind IPv6, wasn't it? :] Alex, there is an, admittedly small, difference between "the whole thing is broken" and "the thing can be improved" or "there is still some flaw and I propose a way to fix it." I sometimes tend to fail on this one myself. E.g., when I said some days ago that the loss differentiation debate does not come to an end, I had to be careful not to offend those hundreds of people who spent years on this issue and who honestly tried to make contributions here. Perhaps I have to learn from that and correct my own attitude towards other people. Honestly: In some respects, we are companions in misfortune here. Perhaps both of us talk a little bit too much - and don't listen enough. And perhaps both of us have a talent for blaming and offending people - instead of appreciating them and their work. > > Oh, since you're impressed by, and love saying: "> P.S.: Somehow your mail got > to me again, and it even traversed the Internet..." Consider the fully Alex, this is exactly the style of formulation which people do not like from _me_. When I talk that way, people simply leave me alone. I can only tell you my own experience, but talking to people like that makes _me_ lonesome and makes _me_ bitter. Other people simply go away. They do not suffer from that. They do not care. > optical, networks across Europe around 1800 -- Napoleon got lots of reliable > mail that way, about as quickly as you get mine. Why, the interfaces even had > compression, encryption and error correction. {:o] You are totally correct there.
And therefore, I am proud to say: I'm from the "Old Europe". The semaphore system was a really great invention. And how much can we learn from it, when we learn to appreciate this work and not only say it is old, it is bad, it is outdated. How many of the basic principles of discrete coding, the store-and-forward principle, overlay networks etc. etc. etc. are adopted from those old examples! Have a look at Wesley's homepage. http://roland.grc.nasa.gov/~weddy/ Wes has a very wise quotation there: "Mathematicians stand on each other's shoulders while computer scientists stand on each other's toes." -- R. W. Hamming There's really a deep wisdom in this sentence. I honestly apologize if I have offended anyone with this post. That is definitely _not_ my intention. However, dear Alex, I really think that you have much to say and we can learn a lot from your knowledge. But the way you share your knowledge, the way you write, is sometimes, excuse me, I'm no native speaker and in German I would know how to say what I mean, it is sometimes a little bit unfortunate. It is sometimes a little bit difficult to follow. Exactly like me - I know. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From perfgeek at mac.com Fri Jul 1 08:23:30 2005 From: perfgeek at mac.com (rick jones) Date: Fri, 1 Jul 2005 08:23:30 -0700 Subject: [e2e] Reacting to corruption based loss In-Reply-To: <42C4425D.18C9EABF@attglobal.net> References: <42C095F8.26792D6@web.de> <20050628145330.GD2392@grc.nasa.gov> <42C1A645.F4A563F2@web.de> <42C24148.ECD1AEC4@attglobal.net> <42C4425D.18C9EABF@attglobal.net> Message-ID: <8fc6006532233700000e4887c9c40bdd@mac.com> >> Why on earth should that have mattered unless perhaps the sending TCP >> had a broken implementation of Nagle that was going segment by segment >> rather than send by send?"
> ...When it takes about 32000/1460 pkts to send one block, and the > sender's > window (not easily found & configured in typical stacks) is less than > that, > more than one send window is needed for each block sent. If the last > send > window needed is odd in pkt count, then the sender is done and > awaiting an > ack, while the receiver is awaiting another pkt (for the ack timer > value). If > the path is running at a reasonable rate, then this wasted ~100mS > every 32k > block in a multi-MB transfer adds up to lots of dead time and a > significant > throughput hit. OK, perhaps I'm being dense, but I still don't see where the TCP specs say that the first segment(s) of the next 32KB chunk to be sent couldn't be sent while awaiting the ack of that last odd segment. I thought (I'll avoid saying assume so I don't have to spell it the way I was taught as ass-u-me :) that if SMB were request/response, the SMB response would have that last odd ACK piggybacked, and even if not, and it was back-to-back 32KB sends, then given there was window into which to send it, the next 32KB block could start flowing. Was the SMB/TCP combination only allowing one 32KB send to be outstanding at a time, and not really making use of sliding window? In a nutshell, I'm having a hard time seeing where the problem wasn't a case of implementation errors.
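The arithmetic behind the claimed throughput hit is easy to reproduce. This sketch assumes the worst case Alex describes, one ~100 ms delayed-ACK stall per 32 KB block, which is exactly the behaviour being questioned here; with a proper sliding window across blocks the stall need not occur at all (the 4 MB transfer size is a hypothetical, picked only to make the numbers concrete):

```python
import math

MSS = 1460                   # bytes per TCP segment
BLOCK = 32 * 1024            # one 32 KB application write
DELACK = 0.100               # typical delayed-ACK timer, seconds
TRANSFER = 4 * 1024 * 1024   # a hypothetical 4 MB transfer

segments_per_block = math.ceil(BLOCK / MSS)  # 23 segments; the last is "odd"
blocks = TRANSFER // BLOCK                   # 128 blocks
dead_time = blocks * DELACK                  # 12.8 s of stall, if every block waits
```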
rick jones Wisdom teeth are impacted, people are affected by the effects of events From sboone at cs.hmc.edu Fri Jul 1 08:45:20 2005 From: sboone at cs.hmc.edu (Scott Boone) Date: Fri, 1 Jul 2005 08:45:20 -0700 Subject: [e2e] Reacting to corruption based loss In-Reply-To: <42C54DF7.2030501@dirtcheapemail.com> References: <20050626001318.CB7A424D@aland.bbn.com> <42BEE8D7.D2FFC9E6@attglobal.net> <42BF2738.AD082D57@web.de> <42BF814A.C4948522@attglobal.net> <42BFDB1C.2010009@reed.com> <42C473A9.FEE67D3A@attglobal.net> <42C54DF7.2030501@dirtcheapemail.com> Message-ID: <47ECA8C4-B200-4840-9173-9646A3E3F46A@cs.hmc.edu> Let's play 'find the active verb': -Scott On 1 Jul 2005, at 7:06 AM, Clark Gaylord wrote: > Cannara wrote: > >> 1) Indeed "hardware corruption can be detected by software", which >> is what >> MIBs report to us via net mgmnt systems, and all metro >> distribution systems >> ^^^^^^ >> > MIBs don't detect, they report. > > --ckg > > From mdalal at cisco.com Fri Jul 1 09:20:10 2005 From: mdalal at cisco.com (Mitesh Dalal) Date: Fri, 1 Jul 2005 09:20:10 -0700 (PDT) Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <42C441E8.3080806@isi.edu> References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> Message-ID: On Thu, 30 Jun 2005, Joe Touch wrote: > > > RJ Atkinson wrote: > > > > On Jun 27, 2005, at 15:52, Tapan Karwa wrote: > > > >> I am wondering if there is any consensus on how we > >> should deal with the problem mentioned in Section 4.1 > >> of RFC 2385. > > > > I don't think this is a significant issue in real world deployments. > > TCP MD5 is designed to prevent acceptance of unauthenticated TCP RST > > message to reduce risk of (D)DOS attacks on the TCP sessions of BGP. > > An adversary could send an unauthenticated RST anytime. If that took > > out BGP, such would be a much larger operational problem. > > > > In practice, if the first (i.e. 
unauthenticated) RST is ignored, the > > router will send another RST a bit later on (e.g. after it is rebooted > > sufficiently to know which MD5 key to use) and that one WILL be > > authenticated and will be accepted rather than ignored. > > > > So it should sort itself out without any spec changes, just taking > > a time period closer to the reboot-time of the router that is > > rebooting rather than some small fraction of that time. No real > > harm done with the current situation at all. > > > > Ran > > rja at extremenetworks.com > > Agreed. > > Another point along these lines - if you had a secure connection with > another host, then the host reboots and 'forgets' the security > altogether (i.e., doesn't reestablish keys), it shouldn't be able to > reset the old connection anyway. And why would that be, Joe? By saying so you show no love for network reliability. Do you know networks can go down if an MD5-enabled LDP connection cannot recover from this problem and must rely on timeouts to recover? Do you know the same thing can happen to BGP? Security shouldn't come at the cost of reliability! Mitesh From touch at ISI.EDU Fri Jul 1 09:29:43 2005 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Jul 2005 09:29:43 -0700 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> Message-ID: <42C56F77.6020604@isi.edu> Mitesh Dalal wrote: ... >>Another point along these lines - if you had a secure connection with >>another host, then the host reboots and 'forgets' the security >>altogether (i.e., doesn't reestablish keys), it shouldn't be able to >>reset the old connection anyway. > > and why would that be, Joe? By saying so you show no love for network > reliability. Do you know networks can go down if an MD5-enabled LDP > connection cannot recover from this problem and must rely on timeouts > to recover?
Do you know the same thing can happen to BGP? > Security shouldn't come at the cost of reliability! New keys should - as I noted later in my post - flush state associated with old keys. Lacking new keys, old state does no harm, since new connections shouldn't occur. Recovering from a problem doesn't mean leaving your doors unlocked. Joe From rja at extremenetworks.com Fri Jul 1 10:12:27 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Fri, 1 Jul 2005 13:12:27 -0400 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> Message-ID: <5CB744DF-844A-49C5-8E68-751134C7CBCA@extremenetworks.com> On Jul 1, 2005, at 12:20, Mitesh Dalal wrote: > On Thu, 30 Jun 2005, Joe Touch wrote: >> Another point along these lines - if you had a secure connection with >> another host, then the host reboots and 'forgets' the security >> altogether (i.e., doesn't reestablish keys), it shouldn't be able to >> reset the old connection anyway. >> > > and why would that be, Joe? By saying so you show no love for network > reliability. Do you know networks can go down if an MD5-enabled LDP > connection cannot recover from this problem and must rely on timeouts > to recover? Do you know the same thing can happen to BGP? > Security shouldn't come at the cost of reliability! > > Mitesh Mitesh, I think the point is that if one wants a reliable network, one should deploy BGP implementations that will not forget the security state across a reboot.
Operating with security turned off is a recipe for intrusions that cause reliability problems (for reasons explained earlier in this thread). Cheers, Ran rja at extremenetworks.com From touch at ISI.EDU Fri Jul 1 10:15:33 2005 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Jul 2005 10:15:33 -0700 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: References: <42C441E8.3080806@isi.edu> Message-ID: <42C57A35.8000906@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 RJ Atkinson wrote: > > On Jun 30, 2005, at 15:03, Joe Touch wrote: > >> It does suggest, however, that if new keys are used on both sides, then >> both sides ought to flush their connections entirely (i.e., drop all >> TCBs using old keys). This affects TCP/MD5 keying, but that's not >> automatically managed, though. > > I would normally expect that if a reboot triggered rekeying, > then some form of automated key management would be in place. Agreed. > In practice, current deployments change keys roughly never. As you note below, in practice deployments don't even use keys ;-) > Along that line, it should be quite practical for someone to > write a "TCP MD5 Domain of Interpretation" specification to > permit the existing ISAKMP/IKE protocol to be used for this > purpose. Agreed, however it's even easier to configure IKE to setup a transport association between BGP peers (which, as below, I presume you are referring to). Joe > In practice, most current implementations of "TCP MD5 for BGP" only > support one key per remote-peer at a time, which is a challange > if one wants to have smooth key rollover (whether manual or > via some automated key management). This is just an > implementation issue; nothing prevents supporting more than one > key at a time. > > Actually, the bit I find most surprising is that the majority of > deployed BGP sessions (including the majority of e-BGP sessions) > run without even enabling TCP MD5. 
> > Given that folks generally don't deploy TCP MD5 to protect against > basic attacks (e.g. TCP RST attacks or TCP session stealing), > I don't see why one would think that some form of authentication > enhancement within the BGP protocol itself would have rapid or > widespread deployment.[1] > > Ran > rja at extremenetworks.com > > [1] My non-scientific sample of network operators can't find anyone > who thinks Kent's S-BGP is deployable. Most think that SO-BGP is > deployable, but would be challenging to deploy, and are hoping for > something more deployable than either one of those two. > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxXo1E5f5cImnZrsRAsX8AKDiSEjhX0AyXgyDALhcA+XXCb0IoACcC8jn 8HzwfrPHRKkpk/wJQwn8Gz8= =TyIA -----END PGP SIGNATURE----- From mdalal at cisco.com Fri Jul 1 11:05:32 2005 From: mdalal at cisco.com (Mitesh Dalal) Date: Fri, 1 Jul 2005 11:05:32 -0700 (PDT) Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <42C56F77.6020604@isi.edu> References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> <42C56F77.6020604@isi.edu> Message-ID: On Fri, 1 Jul 2005, Joe Touch wrote: > > > Mitesh Dalal wrote: > ... > >>Another point along these lines - if you had a secure connection with > >>another host, then the host reboots and 'forgets' the security > >>altogether (i.e., doesn't reestablish keys), it shouldn't be able to > >>reset the old connection anyway. > > > > and why would that be Joe ? By saying so you have no love for network > > reliability. Do you know networks can go down if MD5 enabled LDP > > connection cannot recover from this problem and rely on timeouts > > to recover ? Do you know the same thing can happen to BGP ? > > Security shouldnt come at the cost of reliablity! 
> > New keys should - as I noted later in my post - flush state associated > with old keys. Lacking new keys, old state does no harm, since new > connections shouldn't occur. > what we are discussing is how fast we can detect a stale connection to a rebooted host. New keys come into the picture only if the host is up. For TCP MD5 scenarios we don't ever change keys. > Recovering from a problem doesn't mean leaving your doors unlocked. yes, so let's use a combination lock: the owner does not have to carry a key around (and potentially lose it) and instead simply remembers the right combination (hint: tcpsecure) to gain access :) Mitesh From tapankarwa at yahoo.com Fri Jul 1 11:10:09 2005 From: tapankarwa at yahoo.com (Tapan Karwa) Date: Fri, 1 Jul 2005 11:10:09 -0700 (PDT) Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <42C57A35.8000906@isi.edu> Message-ID: <20050701181009.92903.qmail@web53705.mail.yahoo.com> I would like to point out this simple scenario: Consider XX and YY as BGP peers. They have a TCP connection between them, let's say, with port numbers 179 on XX and 65001 on YY (let's say YY initiated the connection). Let's say YY goes down. XX doesn't know about it and keeps sending keepalives to YY on port 65001. While all this is happening, YY comes up and it will try to make a new connection with XX (port 179) again. Let's say YY chooses 65002 as its port this time. Even if MD5 was being used for both connections, the new connection might work fine, depending on how you implement it. But, for each retransmission that XX sends to port 65001, YY will keep sending RSTs back to XX. And these RSTs won't have MD5 since YY does not have anyone listening on port 65001. Standard TCP on XX will retry 12 times before it finally gives up. Even if we assume that the whole world is using MD5 (thereby not worrying about the security issue as much), XX will not honor the RSTs from YY since they don't have the MD5 digest.
And this behaviour will be independent of keys, i.e., it will happen whether you use new keys or old keys. The old connection will stay for at least 8 minutes until XX is done retransmitting and finally gives up on the old connection. --- Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > RJ Atkinson wrote: > > > > On Jun 30, 2005, at 15:03, Joe Touch wrote: > > > >> It does suggest, however, that if new keys are > used on both sides, then > >> both sides ought to flush their connections > entirely (i.e., drop all > >> TCBs using old keys). This affects TCP/MD5 > keying, but that's not > >> automatically managed, though. > > > > I would normally expect that if a reboot triggered > rekeying, > > then some form of automated key management would > be in place. > > Agreed. > > > In practice, current deployments change keys > roughly never. > > As you note below, in practice deployments don't > even use keys ;-) > > > Along that line, it should be quite practical for > someone to > > write a "TCP MD5 Domain of Interpretation" > specification to > > permit the existing ISAKMP/IKE protocol to be used > for this > > purpose. > > Agreed, however it's even easier to configure IKE to > set up a transport > association between BGP peers (which, as below, I > presume you are > referring to). > > Joe > > > In practice, most current implementations of "TCP > MD5 for BGP" only > > support one key per remote-peer at a time, which > is a challenge > > if one wants to have smooth key rollover (whether > manual or > > via some automated key management). This is just > an > > implementation issue; nothing prevents supporting > more than one > > key at a time. > > > > Actually, the bit I find most surprising is that > the majority of > > deployed BGP sessions (including the majority of > e-BGP sessions) > > run without even enabling TCP MD5. > > > > Given that folks generally don't deploy TCP MD5 to > protect against > > basic attacks (e.g.
TCP RST attacks or TCP session > stealing), > > I don't see why one would think that some form of > authentication > > enhancement within the BGP protocol itself would > have rapid or > > widespread deployment.[1] > > > > Ran > > rja at extremenetworks.com > > > > [1] My non-scientific sample of network operators > can't find anyone > > who thinks Kent's S-BGP is deployable. Most think > that SO-BGP is > > deployable, but would be challenging to deploy, > and are hoping for > > something more deployable than either one of those > two. > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.4 (MingW32) > Comment: Using GnuPG with Thunderbird - > http://enigmail.mozdev.org > > iD8DBQFCxXo1E5f5cImnZrsRAsX8AKDiSEjhX0AyXgyDALhcA+XXCb0IoACcC8jn > 8HzwfrPHRKkpk/wJQwn8Gz8= > =TyIA > -----END PGP SIGNATURE----- > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com From mdalal at cisco.com Fri Jul 1 11:12:07 2005 From: mdalal at cisco.com (Mitesh Dalal) Date: Fri, 1 Jul 2005 11:12:07 -0700 (PDT) Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <5CB744DF-844A-49C5-8E68-751134C7CBCA@extremenetworks.com> References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> <5CB744DF-844A-49C5-8E68-751134C7CBCA@extremenetworks.com> Message-ID: On Fri, 1 Jul 2005, RJ Atkinson wrote: > > On Jul 1, 2005, at 12:20, Mitesh Dalal wrote: > > On Thu, 30 Jun 2005, Joe Touch wrote: > >> Another point along these lines - if you had a secure connection with > >> another host, then the host reboots and 'forgets' the security > >> altogether (i.e., doesn't reestablish keys), it shouldn't be able to > >> reset the old connection anyway. > >> > > > > and why would that be Joe ? By saying so you have no love for network > > reliability. 
Do you know networks can go down if an MD5-enabled LDP > > connection cannot recover from this problem and must rely on timeouts > > to recover? Do you know the same thing can happen to BGP? > > Security shouldn't come at the cost of reliability! > > > > Mitesh > > Mitesh, > > I think the point is that if one wants a reliable network, > one should deploy BGP implementations that will not forget the > security state across a reboot. Operating with security turned > off is a recipe for intrusions that cause reliability problems agreed. It wasn't my intention to turn off security by any means. What I am disagreeing with is Joe's assertion that it's OK for a rebooted legitimate host to not be able to inform the other end of a stale connection, thereby making RSTs useless for the purpose they were meant to serve in RFC 793 (i.e., to sync stale connections). What we need is tcpsecure, which provides reasonable protection and very fast detection. Mitesh From touch at ISI.EDU Fri Jul 1 11:18:41 2005 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Jul 2005 11:18:41 -0700 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <20050701181009.92903.qmail@web53705.mail.yahoo.com> References: <20050701181009.92903.qmail@web53705.mail.yahoo.com> Message-ID: <42C58901.2090007@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tapan Karwa wrote: > I would like to point out this simple scenario: > > Consider XX and YY as BGP peers. They have a TCP > connection between them, let's say, with port numbers > 179 on XX and 65001 on YY (let's say YY initiated the > connection). > > Let's say YY goes down. XX doesn't know about it and > keeps sending keepalives to YY on port 65001. > > While all this is happening, YY comes up and it will > try to make a new connection with XX (port 179) again. > Let's say YY chooses 65002 as its port this time. > > Even if MD5 was being used for both connections, the > new connection might work fine, depending on how you > implement it.
But, for each retransmission that XX > sends to port 65001, YY will keep sending RSTs back to > XX. And these RSTs wont have MD5 since YY does not > have anyone listening on port 65001. > > Standard TCP on XX will retry 12 times before it > finally gives up. TCP doesn't focus on cleaning up old state. This should happen just fine in background. This seems like a problem only if XX and YY try to reuse their respective port pair, but in that case YY owuld have someone listening on 65001 and would issue the RST with the right key. > Even if we assume that the whole world is using MD5 > (thereby not worrying about the security issue as > much), XX will not honor the RSTs from YY since they > dont have the MD5 digest. > > And this behaviour will be independent of keys i.e. > will happen whether you use new keys or old keys. The > old connection will stay for atleast 8 minutes until > XX is done retransmitting and finally gives up on the > old connection. > > --- Joe Touch wrote: > > > > > RJ Atkinson wrote: > >>On Jun 30, 2005, at 15:03, Joe Touch wrote: > > >>>It does suggest, however, that if new keys are > > used on both sides, then > >>>both sides ought to flush their connections > > entirely (i.e., drop all > >>>TCBs using old keys). This affects TCP/MD5 > > keying, but that's not > >>>automatically managed, though. > >>I would normally expect that if a reboot triggered > > rekeying, > >>then some form of automated key management would > > be in place. > > Agreed. > > >>In practice, current deployments change keys > > roughly never. > > As you note below, in practice deployments don't > even use keys ;-) > > >>Along that line, it should be quite practical for > > someone to > >>write a "TCP MD5 Domain of Interpretation" > > specification to > >>permit the existing ISAKMP/IKE protocol to be used > > for this > >>purpose. 
> > Agreed, however it's even easier to configure IKE to > setup a transport > association between BGP peers (which, as below, I > presume you are > referring to). > > Joe > > >>In practice, most current implementations of "TCP > > MD5 for BGP" only > >>support one key per remote-peer at a time, which > > is a challange > >>if one wants to have smooth key rollover (whether > > manual or > >>via some automated key management). This is just > > an > >>implementation issue; nothing prevents supporting > > more than one > >>key at a time. > >>Actually, the bit I find most surprising is that > > the majority of > >>deployed BGP sessions (including the majority of > > e-BGP sessions) > >>run without even enabling TCP MD5. > >>Given that folks generally don't deploy TCP MD5 to > > protect against > >>basic attacks (e.g. TCP RST attacks or TCP session > > stealing), > >>I don't see why one would think that some form of > > authentication > >>enhancement within the BGP protocol itself would > > have rapid or > >>widespread deployment.[1] > >>Ran >>rja at extremenetworks.com > >>[1] My non-scientific sample of network operators > > can't find anyone > >>who thinks Kent's S-BGP is deployable. Most think > > that SO-BGP is > >>deployable, but would be challenging to deploy, > > and are hoping for > >>something more deployable than either one of those > > two. > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam protection around > http://mail.yahoo.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxYkBE5f5cImnZrsRAknHAJ0aeMDFZpgFy4YVwrqVYHAnTiW9tgCg3S+H 0Hj4q4ZiXMZEMT9UU5tpc20= =aTtv -----END PGP SIGNATURE----- From touch at ISI.EDU Fri Jul 1 11:22:09 2005 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Jul 2005 11:22:09 -0700 Subject: [e2e] Receiving RST on a MD5 TCP connection. 
In-Reply-To: References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> <42C56F77.6020604@isi.edu> Message-ID: <42C589D1.10808@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Mitesh Dalal wrote: > > On Fri, 1 Jul 2005, Joe Touch wrote: > > >> >>Mitesh Dalal wrote: >>... >> >>>>Another point along these lines - if you had a secure connection with >>>>another host, then the host reboots and 'forgets' the security >>>>altogether (i.e., doesn't reestablish keys), it shouldn't be able to >>>>reset the old connection anyway. >>> >>>and why would that be Joe ? By saying so you have no love for network >>>reliability. Do you know networks can go down if MD5 enabled LDP >>>connection cannot recover from this problem and rely on timeouts >>>to recover ? Do you know the same thing can happen to BGP ? >>>Security shouldnt come at the cost of reliablity! >> >>New keys should - as I noted later in my post - flush state associated >>with old keys. Lacking new keys, old state does no harm, since new >>connections shouldn't occur. > > what we are discussing is how fast can we detect a stale connections > to a rebooted host. New keys come into picture only if the host is up. > For TCP MD5 scenarios we dont change keys ever. Are you worried about establishing new connections or detecting and cleaning up old ones? The former seems like it already works as per RFC2385; the latter isn't an issue for TCP, as there's no focus on cleaning up state to be space efficient until such state affects new connections. >>Recovering from a problem doesn't mean leaving your doors unlocked. 
> > yes, so lets use a combination lock, the owner does not have to carry > a key around (and potentially loose it) and instead simply remember > the right combination (hint:tcpsecure) to gain access :) I should need to have the key to gain access via a new connection - whether it involves cleaning up the old state or not. There's no utility to just cleaning up old state per se unless the keys change, and we agree that's not the issue here. Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxYnRE5f5cImnZrsRAjNWAKD6ORU5KoUfMcEUAr9sXulyjTB/aQCgoitE j7YHDHbqvJ4LKRRkP2al/ME= =BHHh -----END PGP SIGNATURE----- From tapankarwa at yahoo.com Fri Jul 1 11:45:41 2005 From: tapankarwa at yahoo.com (Tapan Karwa) Date: Fri, 1 Jul 2005 11:45:41 -0700 (PDT) Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <42C58901.2090007@isi.edu> Message-ID: <20050701184541.14796.qmail@web53704.mail.yahoo.com> > TCP doesn't focus on cleaning up old state. This > should happen just fine in background. Consider the 2 cases: 1) NOT using TCP-MD5 for BGP. 2) Using TCP-MD5 for BGP. If I were "not" using MD5 and YY reboots, comes up and chooses a different port (65002), XX would not know that YY has rebooted and it would continue to send a segment on the old connection i.e. on port 65001 to YY. YY would respond with an RST and XX would happily accept it and close the old connection. This is because segments in either direction dont need to have the MD5 digest and so the RST from YY is valid for XX and it will accept it. So, this is the case when I am not using MD5 and things work fine even when YY reboots. The problem case is when I "am" using MD5 and YY reboots and comes up again. XX doesnt know about it and XX sends segments "with" the MD5 digest and YY responds with RSTs "without" the MD5 digest. 
That's when the old connection will stick around until XX has tried 12 retransmissions, since it's going to ignore the RSTs without the MD5 digest from YY. The RFC for TCP-MD5 says this is a problem but does not recommend any solution. Maybe it's OK to let the old connection stick around until XX is done retransmitting and gives up. From touch at ISI.EDU Fri Jul 1 12:49:10 2005 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Jul 2005 12:49:10 -0700 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <20050701184541.14796.qmail@web53704.mail.yahoo.com> References: <20050701184541.14796.qmail@web53704.mail.yahoo.com> Message-ID: <42C59E36.9020700@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tapan Karwa wrote: >>TCP doesn't focus on cleaning up old state. This >>should happen just fine in background. > > Consider the 2 cases: > 1) NOT using TCP-MD5 for BGP. > 2) Using TCP-MD5 for BGP. > > If I were "not" using MD5 and YY reboots, comes up and > chooses a different port (65002), XX would not know > that YY has rebooted and it would continue to send a > segment on the old connection i.e. on port 65001 to > YY. YY would respond with an RST and XX would happily > accept it and close the old connection. This is > because segments in either direction don't need to have > the MD5 digest and so the RST from YY is valid for XX > and it will accept it. So, this is the case when I am > not using MD5 and things work fine even when YY > reboots. > > The problem case is when I "am" using MD5 and YY > reboots and comes up again. XX doesn't know about it > and XX sends segments "with" the MD5 digest and YY > responds with RSTs "without" the MD5 digest.
The RFC for > TCP-MD5 says this is a problem but does not recommend > any solution. Yes, but there are two subcases:
2a) YY wants to use the original port, i.e., there is a conflict with the ports used by the old connection state that XX retains.
2b) YY wants to use a new port.
In 2a, YY will be listening on the original port, and will respond with RSTs with the MD5 digest, and the state will be cleaned up and ready to use quickly. In 2b, the state will take a few RTTs to clean up - but in the meantime, new connections will work just fine. I.e., overall, all we appear to be worried about is the old connection state on XX. While I appreciate that BGP takes cues on connectivity from TCP connection state, that is an error of BGP IMO - TCP is well-defined not to clean up old state until such state interferes with new connections. Joe > > Maybe it's OK to let the old connection stick around > until XX is done retransmitting and gives up. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCxZ42E5f5cImnZrsRAvp/AJ9guSqow+Tvx0iiAv9G5HjEFD+IRQCg0Y09 PEv/Brdu2MthvlIJ9zlibDc= =s+l6 -----END PGP SIGNATURE----- From cannara at attglobal.net Fri Jul 1 13:42:21 2005 From: cannara at attglobal.net (Cannara) Date: Fri, 01 Jul 2005 13:42:21 -0700 Subject: [e2e] Reacting to corruption based loss References: <20050626001318.CB7A424D@aland.bbn.com> <42BEE8D7.D2FFC9E6@attglobal.net> <42BF2738.AD082D57@web.de> <42BF814A.C4948522@attglobal.net> <42BFDB1C.2010009@reed.com> <42C473A9.FEE67D3A@attglobal.net> <42C54DF7.2030501@dirtcheapemail.com> Message-ID: <42C5AAAD.45E315AB@attglobal.net> Ok, Clark, I see we have a problem with system understanding going on here. MIBs are indeed repositories. How does their content get created? Firmware/software, written by the folks who built the box/interface/chip.
For instance, we've been able to buy a complete web server inside an RJ45 module, to solder onto a board, for a couple of years now. How do you think its MIB content about errors on its Ethernet interface gets into its memory? You know how it gets out, of course. Alex Clark Gaylord wrote: > > Cannara wrote: > > >1) Indeed "hardware corruption can be detected by software", which is what > >MIBs report to us via net mgmnt systems, and all metro distribution systems > > > > > MIBs don't detect, they report. > > --ckg From cannara at attglobal.net Fri Jul 1 14:18:15 2005 From: cannara at attglobal.net (Cannara) Date: Fri, 01 Jul 2005 14:18:15 -0700 Subject: [e2e] Reacting to corruption based loss References: <20050701010904.EDJ00828@alpo.mirapoint.com> Message-ID: <42C5B317.23D89C88@attglobal.net> Sam, the issue isn't that I'm "suggesting that TCP/IP is fundamentally flawed as a layer3/4 team and think that a replacement of the protocol is in order". It's that the bigger, elephant-sized issue, alluded to by Cerf himself, has been for years that protocol development for the Internet stopped short. Surely, if you're a corporate CIO and your employees have to spend 30% more time, 8 hours a day passing their data around, you'd be concerned. I mean, if you could get the 30% back that your own experiment showed was lost unnecessarily, then you could do your CIO job better and lay off some folks. :] The fact that our apps must use a transport that's brain dead about whether to slow down because losses are errors or congestion creates an unnecessary inefficiency that could have been resolved years ago, and now requires more effort to move the established bureaucracy and installed base. Of course, we can write our own transports (and L3s), as some have, particularly when RTTs are very large, as the space program has done for decades. Your bottom-line comment that your experiment's 30% slowdown "didn't suck too much" illustrates the problem.
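The 30% figure traded back and forth here is easy to re-derive from Sam's own numbers; a quick check, treating the six four-second unplugs as pure dead time:

```python
# Sam's experiment: 77 s for the undisturbed transfer, then six
# 4-second cable pulls during an otherwise identical transfer that
# took 131 s in total.
baseline = 77                      # seconds, uninterrupted run
dead_time = 6 * 4                  # seconds the link was simply unplugged
expected = baseline + dead_time    # 101 s if only the dead time were lost
actual = 131

overhead = (actual - expected) / expected
print(f"expected {expected} s, actual {actual} s, extra {overhead:.0%}")
```

Whether that extra ~30% (the backoff TCP takes while probing the link back up) is a tolerable price or an indictment is exactly the disagreement between the two posters.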
Any protocol stack (Atalk, Netware, XNS, SNA, DECnet, Vines...) could have been used to get the same, or better, result. That's a pretty low bar to set for something as important as the Internet. I'd not feel good about it, if it had been my responsibility to continue TCP/IP protocol work, even given its non-competitive subsidy. After all, was there ever a bakeoff with other development results? No. TCP/IP development stagnated, yet it was subsidized around the world by free distribution with almost every OS and box being shipped. Can't beat that marketing. But, that marketing, as we know with uSoft, inevitably leads to mediocrity. The other side of the current state of mediocrity is the amazing lack of installation control. This is apart from the umpteen security flaws built into the Internet protocols from the start, costing us billions to ameliorate. Release control is the more fundamental, bigger elephant, because sloppiness in its realm leads to all manner of problems, inevitably expensive to us all. My point is that there's opportunity in all these issues to do better. There's been that opportunity for years. A bureaucacy formed long ago that thwarts addressing it. It'd be great to see folks engage the opportunity and make progress. Alex Sam Manthorpe wrote: > > Hi Alex, > > Sorry for the delay in replying... > > ---- Original message ---- > >Date: Wed, 29 Jun 2005 11:47:04 -0700 > >From: Cannara > >Subject: Re: [e2e] Reacting to corruption based loss > >To: end2end-interest at postel.org > > > >Good response Sam. The kind that leads to more thought, in fact. > > > >How many years ago for the 1st example, you ask. For that one, 6. For the > >one this year, 0.25 year. :] > > > >You say you "spent a long time simulating the BSD stack". That's great, and > >part of the problem. Folks do simulations which are based on code written to > >simulate someone's ideas on how something works. Then, they believe the > >simulations, despite what's actually seen in reality. 
We all know that > >simulators and their use can be very limited in relevance, if not accuracy. > > Yes I know, which was why I qualified my observation with my > practical experience as well. I keep seeing these rathole > threads one e2e and, to my shame, dipped my ignorant toe in :-) > I think for the benefit of e2e part-time readers as myself, a synopsis > of the actual problem with TCP as it stands today for local and > global communication would be a good thing. Because I can't > perceive any. And let's not do the anectodal thing, I'm thinking > more of a cost-based analysis, including details of how much the > alleged problem is costing. > > >One of the biggest issues is lack of release control for things as important > >as Internet protocols (e.g., TCP). Thus the NT server may have a different > >version of TCP from that on the user's spanking new PC. No one ever addresses > >even the basics of stack parameter settings in their manuals, and network > >staffers rarely have the time to go in and check versions, timer settings, > >yadda, yadda. This is indeed why many performance problems occur. You fixed > >IRIX 6 years ago. Great. > > Um, it was a bug. I didn't understand the argument... > > > > >Now, why does the Internet work? Not simply because of TCP, for sure. Your > >experiment illustrates the rush to acceptance these points are raised > >against: > > > >"I transfered a largish file to my sluggish corporate ftp server. Took 77 > >seconds (over the Internet, from San Francisco to Sunnyvale). I then did the > >same thing, this time I unplugged my Ethernet cable 6 times, each time for 4 > >seconds. The transfer took 131 seconds." > > > >So, what is "largish" in more precise terms? What are the RTT and limiting > >bit-rate of your "Internet" path from SF to S'vale? > > As I said, it was "for fun". :-) > > >The file evidently went > >right by our house! But, despite the imprecision, we can use your result: 77 > >+ 6 x 4 = 101. 
Your transfer actually took 131 seconds, fully 30% more than > >one would expect on a link that's simply interrupted, not congested. Good > >experiment! > > But the relevant fact is that it worked. And didn't suck too much. > And I'm confident that it would still work and not suck too much > even if SBC replaced all their hardware with something that had a > hypothetical bug in the OS that made my biterror notification not > inform my transport layer that loss was due to congestion and not > link flakiness. Sure you could architect something that utilized > every spare bit on a link, but at what cost? And why? What's > the justification for all the added points-of-failure? > > Again, I don't follow this list much, but reading a few of your > postings, you seem to be suggesting that TCP/IP is fundamentally > flawed as a layer3/4 team and think that a replacement of the > protocol is in order. Do I understand you correctly? > > Cheers, > -- Sam > ------------------------ > Sam Manthorpe, Mirapoint From gds at best.com Fri Jul 1 16:34:40 2005 From: gds at best.com (Greg Skinner) Date: Fri, 1 Jul 2005 23:34:40 +0000 Subject: [e2e] Reacting to corruption based loss In-Reply-To: <42C5B317.23D89C88@attglobal.net>; from cannara@attglobal.net on Fri, Jul 01, 2005 at 02:18:15PM -0700 References: <20050701010904.EDJ00828@alpo.mirapoint.com> <42C5B317.23D89C88@attglobal.net> Message-ID: <20050701233440.A31811@gds.best.vwh.net> On Fri, Jul 01, 2005 at 02:18:15PM -0700, Cannara wrote: > Sam, the issue isn't that I'm "suggesting that TCP/IP is > fundamentally flawed as a layer3/4 team and think that a replacement > of the protocol is in order". It's that the bigger, elephant-sized > issue, alluded to by Cerf himself, has been for years that protocol > development for the Internet stopped short. It seems to me that there's still quite a bit of Internet protocol development. 
It is not the fault of the developers that these new and/or improved protocols don't get widespread deployment. > I'd not feel good about it, if it had been my responsibility to > continue TCP/IP protocol work, even given its non-competitive > subsidy. After all, was there ever a bakeoff with other development > results? No. TCP/IP development stagnated, yet it was subsidized > around the world by free distribution with almost every OS and box > being shipped. Can't beat that marketing. But, that marketing, as > we know with uSoft, inevitably leads to mediocrity. Again, this is not the fault of the developers of new and/or improved protocols. > My point is that there's opportunity in all these issues to do > better. There's been that opportunity for years. A bureaucacy > formed long ago that thwarts addressing it. It'd be great to see > folks engage the opportunity and make progress. Many people are doing this. The question is beyond the community of developers and other interested parties, are there enough people who wish to use these protocols that they can gain widespread deployment? --gregbo From cannara at attglobal.net Sat Jul 2 22:43:08 2005 From: cannara at attglobal.net (Cannara) Date: Sat, 02 Jul 2005 22:43:08 -0700 Subject: [e2e] Reacting to corruption based loss References: <20050701010904.EDJ00828@alpo.mirapoint.com> <42C5B317.23D89C88@attglobal.net> <20050701233440.A31811@gds.best.vwh.net> Message-ID: <42C77AEC.12C25A3D@attglobal.net> Yes Greg, I agree. The organized effort to continue/coordinate work and end up with deployed, managed protocols is the problem. Alex Greg Skinner wrote: > > On Fri, Jul 01, 2005 at 02:18:15PM -0700, Cannara wrote: > > Sam, the issue isn't that I'm "suggesting that TCP/IP is > > fundamentally flawed as a layer3/4 team and think that a replacement > > of the protocol is in order". 
It's that the bigger, elephant-sized > > issue, alluded to by Cerf himself, has been for years that protocol > > development for the Internet stopped short. > > It seems to me that there's still quite a bit of Internet protocol > development. It is not the fault of the developers that these new > and/or improved protocols don't get widespread deployment. > > > I'd not feel good about it, if it had been my responsibility to > > continue TCP/IP protocol work, even given its non-competitive > > subsidy. After all, was there ever a bakeoff with other development > > results? No. TCP/IP development stagnated, yet it was subsidized > > around the world by free distribution with almost every OS and box > > being shipped. Can't beat that marketing. But, that marketing, as > > we know with uSoft, inevitably leads to mediocrity. > > Again, this is not the fault of the developers of new and/or improved > protocols. > > > My point is that there's opportunity in all these issues to do > > better. There's been that opportunity for years. A bureaucacy > > formed long ago that thwarts addressing it. It'd be great to see > > folks engage the opportunity and make progress. > > Many people are doing this. The question is beyond the community of > developers and other interested parties, are there enough people who > wish to use these protocols that they can gain widespread deployment? 
> > --gregbo From detlef.bosau at web.de Sun Jul 3 05:16:37 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 03 Jul 2005 14:16:37 +0200 Subject: [e2e] Reacting to corruption based loss References: <200506062104.j56L4tKA067656@cougar.icir.org> <1118128952.4771.23camel@lap10-c703.uibk.ac.at> <20050607111809.GA1970@grc.nasa.gov> Message-ID: <42C7D725.7DE1D1C@web.de> (I apologize if this is received twice, but apparently it was not sent properly the first time) On Tue, 7 Jun 2005 07:18:09 -0400, Wesley Eddy wrote > > This idea is sort of discussed in the ETEN paper Craig sent a link to > earlier. One approach that it describes (CETEN_A) adapts beta between > 1/2 and 1 based on the rate of congestion events reported. In the > October 2004 CCR, there is a paper that goes into greater depth on > CETEN; "New Techniques for Making Transport Protocols Robust to > Corruption-Based Loss" by Eddy, Ostermann, and Allman. I think this range for beta illustrates the problem quite well. In cases of low corruption rates, TCP congestion control modified in this way will behave quite similarly to what we know from, e.g., TCP/Reno. The mentioned approach sets beta to (1+e/p)/2, where p = e+c, e is the corruption loss rate and c is the congestion loss rate. However, with large error rates (0.8, 0.9 or even close to 1) the modified beta will practically suspend the congestion control mechanism. Of course, this may increase the network load along the path, which will in turn increase congestion drops and therefore lower beta again. I'm not quite sure about the resulting dynamics. For the moment, I consider the mechanism with error rates of 0.1 or so, which Wesley mentioned some days ago. In that case, let's take for granted that everything is just fine. My question is: how do we deal with _high_ corruption rates like 0.8 or 0.9, typically met in mobile wireless networks?
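The CETEN_A rule quoted above is easy to tabulate; a small sketch (the guard for p = 0 is my own addition, not from the paper):

```python
def ceten_beta(e, c):
    """CETEN_A multiplicative-decrease factor beta = (1 + e/p) / 2,
    with p = e + c, where e is the corruption loss rate and c the
    congestion loss rate."""
    p = e + c
    if p == 0:
        return 0.5  # no loss observed; fall back to conventional halving
    return (1 + e / p) / 2

# Congestion-dominated loss: behaves like Reno, beta near 1/2.
print(ceten_beta(e=0.001, c=0.05))   # ~0.51
# Corruption-dominated loss, e.g. e = 0.8 with a little congestion:
print(ceten_beta(e=0.8, c=0.01))     # ~0.99, i.e. the window is barely cut
```

At e = 0.8 or 0.9 the decrease factor sits so close to 1 that multiplicative decrease is effectively disabled, which is precisely the dynamics being questioned here.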
In other words: it's again my question whether there exists a general consensus on whether we should attempt to handle even those large error rates e2e, or whether local recovery / PEP etc. should be used. Besides the pure congestion dynamics, we should in addition think of possible network load caused by e2e retransmission. Perhaps the question is an old one. But for me, it's simply of interest to know the "common position" here: Should we treat paths with corruption loss larger than, say, 80 % with e2e means, e.g. CETEN? Or should we, in case such corruption loss occurs on a mobile access line to the Internet, make use of proxies here? I'm just curious and unsure about the "common" position. -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From Jon.Crowcroft at cl.cam.ac.uk Sun Jul 3 05:48:20 2005 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sun, 03 Jul 2005 13:48:20 +0100 Subject: [e2e] Reacting to corruption based loss Message-ID: I am glad that e2e is discussing this so thoroughly in the light of Live8. It's clear that corruption based loss is the biggest problem facing the G8 in trying to make poverty history in Africa, and some of the finest minds on the planet are working on it jon. p.s. those of you who saw the London Hyde Park part of the event will have seen that Bill Gates was onside too, and so we can be assured that Windows will feature solutions soon too.
From iam4 at cs.waikato.ac.nz Sun Jul 3 15:10:00 2005 From: iam4 at cs.waikato.ac.nz (Ian McDonald) Date: Mon, 04 Jul 2005 10:10:00 +1200 Subject: [e2e] Reacting to corruption based loss In-Reply-To: <42C7D725.7DE1D1C@web.de> References: <200506062104.j56L4tKA067656@cougar.icir.org> <1118128952.4771.23camel@lap10-c703.uibk.ac.at> <20050607111809.GA1970@grc.nasa.gov> <42C7D725.7DE1D1C@web.de> Message-ID: <42C86238.8010207@cs.waikato.ac.nz> > > Should we treat paths with corruption loss larger than, say, 80 % with > e2e means, e.g. CETEN? Or should we, in case such corruption loss > occurs on a mobile access line to the Internet, make use of proxies > here? > > I'm just curious and unsure about the "common" position. My personal opinion, for what it is worth: with that much loss you definitely do not want to run Reno. I would use Westwood as a bare minimum, and if I had control of both ends I would consider customising the IP stack. Ian McDonald WAND Network Research http://www.wand.net.nz From detlef.bosau at web.de Sun Jul 3 15:18:03 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 04 Jul 2005 00:18:03 +0200 Subject: [e2e] TCP Spoofing and Path Tail Emulation Message-ID: <42C8641B.3000402@web.de> Hi to all! I'm just reading Mark Allman's paper "On the Performance of TCP Spoofing in Satellite Networks" and have some questions. I explicitly say _questions_. As usual, it's a large stack of paper to read and perhaps I missed some details. However, this work directly relates to my own Path Tail Emulation (PTE) approach, and therefore I'm interested in this discussion. Perhaps it may be somewhat surprising that I talked about PTE in the context of loss differentiation during the last few days and now talk about satellite communication. In fact, that list is easily extended because some issues are basically the same. So, let's start here with Martin's post.
However, for those of you who are not familiar with Path Tail Emulation (presumably most of you, because I only published it at a German conference and have now been struggling with CiteSeer for a few months to have it listed), I will give a reference here:

@inproceedings{bosau,
  booktitle = "KiVS Kurzbeiträge und Workshop 2005",
  year = 2005,
  title = "{Path Tail Emulation: An Approach to Enable End--to--End Congestion Control for Split Connections and Performance Enhancing Proxies}",
  author = "D.~Bosau",
  address = "Kaiserslautern, Germany",
  pages = "33-40"
}

http://www.detlef-bosau.de/043.pdf

> > [e2e] TCP spoofing in overlay networks > Martin Swany swany at cis.udel.edu > Wed Mar 2 10:45:16 PST 2005 > > Hi there, > >> I recently had occasion to read a few papers about the practice of >> "TCP spoofing" over satellite links---i.e. inserting a proxy prior to >> the satellite link to provide TCP feedback to the sender, effectively >> splitting into two TCP sessions connected in tandem. I was wondering >> if anyone had ever proposed a similar idea to improve TCP throughput >> in overlay networks over terrestrial links. > > We proposed such an approach in terms of an E2E "session" layer. The > 2001 > tech report (and some newer information) is available here: > http://www.cis.udel.edu/~swany/projects/lsl > > regards, > martin >

The interesting observation is that TCP spoofing is used in the context of
- grid computing (Martin Swany's paper),
- satellite networking (Mark Allman's paper),
- wireless Internet access (I-TCP by Bakre, Badrinath; M-TCP by Brown, Singh)...

The pros and cons of each application are beyond the scope of this mail. In each case, TCP spoofing / using a PEP (both terms are used as synonyms) aims to hide the properties of a network _Path_ _Tail_ behind a PEP.
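Whatever the setting, the core mechanics of a split connection are the same: the proxy terminates the sender's TCP connection locally and carries the bytes onward on a second connection, so each leg runs its own loss recovery and congestion control. A deliberately minimal sketch of such a tandem relay (hostnames and ports are placeholders; a real PEP would add the ACK handling, buffering, and rate control that RFC 3135 discusses):

```python
import socket
import threading

def pipe(src, dst):
    """Copy bytes one way until EOF. Each underlying TCP connection
    does its own retransmission and congestion control - which is the
    entire point of splitting."""
    while (chunk := src.recv(4096)):
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)

def split_tcp_proxy(listen_port, tail_host, tail_port):
    """Accept the sender's connection and open a tandem connection
    toward the path tail (the satellite or wireless segment)."""
    srv = socket.create_server(("", listen_port))
    while True:
        near, _ = srv.accept()                                  # sender <-> PEP leg
        far = socket.create_connection((tail_host, tail_port))  # PEP <-> tail leg
        threading.Thread(target=pipe, args=(near, far), daemon=True).start()
        threading.Thread(target=pipe, args=(far, near), daemon=True).start()
```

The sender then sees only the short, clean leg to the proxy; how the proxy paces the sender down to the tail's real capacity is exactly the rate-control question raised in this thread.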
E.g., in the particular case discussed by Mark Allman, it's the transport latency introduced in a network path by a satellite link. Figure 1 in his paper illustrates the main effect of spoofing: the perceived RTT for the sender is shortened. However, I have not yet seen how appropriate _rate_ control is achieved, i.e. how the sender is made to send at a rate appropriate for the path. I presume this is done using TCP flow control, as it is done in I-TCP. But please correct me if I'm wrong. Sections 3.1.1 and 3.1.2 in RFC 3135 are rather general here. Section 3.1.1 is quite related to what I suggest in my paper. However, it's not my intention to smooth out the effect of packet bursts, but to make the path tail appear as a single, loss-free segment whose bandwidth and capacity refer to the emulated path tail. In the particular case of a mobile wireless path tail, this is done using a smoothed estimate of the wireless path's actual throughput. (I'm currently working on the problem of how much smoothing I need.) However, the mechanism is not restricted to mobile networks but can be applied to any arbitrary PEP. E.g., in Mark Allman's scenario, a throughput estimate could be gained from Padhye's formula in conjunction with the LEAST algorithm. As a perhaps somewhat extreme case, the path tail need not even be packet switched. Think of the Remote Socket Architecture (ReSoA) proposed by Schläger, Wolisz et al. ReSoA does not make any assumptions concerning the network technology behind the PEP. I suggest PTE as an extension to PEP, which is easily adapted to different network scenarios by providing an appropriate throughput estimate. And I would greatly appreciate some comments on this work. I think it is useful to discuss this mechanism, and PEP as well, in a rather general context, because the application of PEP / TCP spoofing is in no way restricted to wireless networks. And the ongoing discussion on this matter makes clear that there is an interest in PEP.
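The smoothed throughput estimate mentioned above can be as simple as an exponentially weighted moving average over per-interval goodput samples. A sketch; the gain alpha = 0.125 is merely borrowed from TCP's SRTT estimator as a placeholder, since how much smoothing is actually needed is the open question here:

```python
def ewma_throughput(samples, alpha=0.125):
    """EWMA over goodput samples (e.g. bytes delivered per measurement
    interval on the wireless leg). Smaller alpha means heavier smoothing.
    Returns None if there are no samples yet."""
    est = None
    for s in samples:
        est = s if est is None else (1 - alpha) * est + alpha * s
    return est

# A bursty, made-up wireless trace: deep fades show up as zeros.
trace = [1200, 0, 900, 1100, 0, 1000]
print(ewma_throughput(trace))
```

The resulting estimate is what the PEP would feed back toward the sender (e.g. via the advertised window) so that the emulated path tail looks like a single loss-free segment of that bandwidth.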
For the end-to-end argument, which is often raised in this context, I refer to section 4 in RFC 3135. This is a discussion particular to the PEP in use. For the "trivial PEP" used as an example in my work, I basically follow the rationale given by Chakravorthy et al.; please find the reference in my paper. DB -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Sun Jul 3 15:24:51 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 04 Jul 2005 00:24:51 +0200 Subject: [e2e] Reacting to corruption based loss In-Reply-To: <42C86238.8010207@cs.waikato.ac.nz> References: <200506062104.j56L4tKA067656@cougar.icir.org> <1118128952.4771.23camel@lap10-c703.uibk.ac.at> <20050607111809.GA1970@grc.nasa.gov> <42C7D725.7DE1D1C@web.de> <42C86238.8010207@cs.waikato.ac.nz> Message-ID: <42C865B3.3070703@web.de> Ian McDonald wrote: >>Should we treat paths with corruption loss larger than, say, 80 % with >>e2e means, e.g. CETEN? Or should we, in case such corruption loss >>occurs on a mobile access line to the Internet, make use of proxies >>here? >> >>I'm just curious and unsure about the "common" position. > > > My personal opinion, for what it is worth: with that much loss you > definitely do not want to run Reno. I would use Westwood as a bare > minimum, and if I had control of both ends I would consider customising > the IP stack. > > Ian McDonald > WAND Network Research > http://www.wand.net.nz To my understanding, Westwood would alleviate the congestion control problem. (I'm just curious: has CETEN been compared to Westwood?) But what about the high number of retransmissions?
DB -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From alokdube at hotpop.com Mon Jul 4 00:39:34 2005 From: alokdube at hotpop.com (alok) Date: Mon, 04 Jul 2005 13:09:34 +0530 Subject: [e2e] Satellite networks latency and data corruption Message-ID: <42C8E7B6.2020702@hotpop.com> Hi, Is there any standard and are there any figures available on: a. latency in satellite-based networks and the associated losses? b. error correction techniques used (those used on a hop-by-hop basis, if any) c. a general measurement of latency in the absence of hop-by-hop error correction techniques (assuming zero message corruption). Is the general deployment of satellite-based networks based on hop-by-hop error correction techniques? -thanks Alok From rja at extremenetworks.com Mon Jul 4 02:17:20 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Mon, 4 Jul 2005 05:17:20 -0400 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <42C57A35.8000906@isi.edu> References: <42C441E8.3080806@isi.edu> <42C57A35.8000906@isi.edu> Message-ID: <9660BA8C-2A2D-4665-A4E6-46F777925D0F@extremenetworks.com> On Jul 1, 2005, at 13:15, Joe Touch wrote: > RJ Atkinson wrote: >> Along that line, it should be quite practical for someone to >> write a "TCP MD5 Domain of Interpretation" specification to >> permit the existing ISAKMP/IKE protocol to be used for this >> purpose. >> > > Agreed, however it's even easier to configure IKE to setup a transport > association between BGP peers (which, as below, I presume you are > referring to). I am unclear what you mean by "setup a transport association". To be clear, I was referring to the prospective use of IKE to provide dynamic key management for the existing TCP MD5 authentication mechanism. As near as I can tell, the only thing missing is a "Domain of Interpretation" specification for how IKE is applied to TCP MD5. 
IKE is nicely modular in this way, so IKE can be extended in a straightforward manner to things well beyond IPsec (which is the main reason I have felt all along that IKE should have been done in a different WG than the IPsec WG). Ran From rja at extremenetworks.com Mon Jul 4 02:31:21 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Mon, 4 Jul 2005 05:31:21 -0400 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: References: <20050627195202.11232.qmail@web53701.mail.yahoo.com> <504E230F-439F-4FF7-BA79-347362AE219F@extremenetworks.com> <42C441E8.3080806@isi.edu> <42C56F77.6020604@isi.edu> Message-ID: On Jul 1, 2005, at 14:05, Mitesh Dalal wrote: > what we are discussing is how fast we can detect a stale connection > to a rebooted host. New keys come into the picture only if the host is up. > For TCP MD5 scenarios we don't change keys ever. Mitesh, Years of operational experience says that at present, using the existing specification, stale connections are detected more than sufficiently quickly for a rebooted BGP router. Deployment of TCP MD5 by ISPs preceded the RFC describing the mechanism, so the age of the RFC represents a lower bound, not an upper bound, on the duration of operational experience in this particular case. In short, you are worrying about something that is not actually a problem in an operational IP network using BGP with TCP MD5. > yes, so let's use a combination lock, the owner does not have to carry > a key around (and potentially lose it) and instead simply remember > the right combination (hint:tcpsecure) to gain access :) I'm sure that I don't understand the above paragraph. If you have some proposal for enhancing BGP, the right place to send that proposal is probably the IETF's IDR Working Group, though putting out an I-D with one's ideas is rarely a bad step to undertake. 
Cheers, Ran rja at extremenetworks.com From rja at extremenetworks.com Mon Jul 4 02:36:27 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Mon, 4 Jul 2005 05:36:27 -0400 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <20050701184541.14796.qmail@web53704.mail.yahoo.com> References: <20050701184541.14796.qmail@web53704.mail.yahoo.com> Message-ID: On Jul 1, 2005, at 14:45, Tapan Karwa wrote: > Maybe it's ok to let the old connection stick around > until XX is done retransmitting and gives up. This is true in any event. Ran From touch at ISI.EDU Mon Jul 4 08:39:52 2005 From: touch at ISI.EDU (Joe Touch) Date: Mon, 04 Jul 2005 08:39:52 -0700 Subject: [e2e] Receiving RST on a MD5 TCP connection. In-Reply-To: <9660BA8C-2A2D-4665-A4E6-46F777925D0F@extremenetworks.com> References: <42C441E8.3080806@isi.edu> <42C57A35.8000906@isi.edu> <9660BA8C-2A2D-4665-A4E6-46F777925D0F@extremenetworks.com> Message-ID: <42C95848.7040904@isi.edu> RJ Atkinson wrote: > > On Jul 1, 2005, at 13:15, Joe Touch wrote: > >> RJ Atkinson wrote: >> >>> Along that line, it should be quite practical for someone to >>> write a "TCP MD5 Domain of Interpretation" specification to >>> permit the existing ISAKMP/IKE protocol to be used for this >>> purpose. >>> >> >> Agreed, however it's even easier to configure IKE to setup a transport >> association between BGP peers (which, as below, I presume you are >> referring to). > > I am unclear what you mean by "setup a transport association". Use IKE to set up a transport-mode security association on TCP on the port used for BGP. > To be clear, I was referring to the prospective use of IKE to provide > dynamic key management for the existing TCP MD5 authentication mechanism. Which would be useful as well; I'd like to have IKE configure IPIP tunnels (not just IPsec), regular firewalls (not just IPsec SAs), and other keys as well. 
> As near as I can tell, the only thing missing is a "Domain of > Interpretation" specification for how IKE is applied to TCP MD5. > IKE is nicely modular in this way, so IKE can be extended in a > straightforward manner to things well beyond IPsec (which is the > main reason I have felt all along that IKE should have been done > in a different WG than the IPsec WG). > > Ran Agreed. Joe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 254 bytes Desc: OpenPGP digital signature Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050704/09d5834b/signature.bin From sboone at cs.hmc.edu Mon Jul 4 09:54:22 2005 From: sboone at cs.hmc.edu (Scott Boone) Date: Mon, 4 Jul 2005 09:54:22 -0700 Subject: [e2e] Satellite networks latency and data corruption In-Reply-To: <42C8E7B6.2020702@hotpop.com> References: <42C8E7B6.2020702@hotpop.com> Message-ID: On 4 Jul 2005, at 12:39 AM, alok wrote: > Hi, > > Is there any standard and are there any figures available on: > > a. latency in satellite based networks and losses associated? Optimal latency can be calculated using transmission time based on the speed of light, depending on how high up the satellite is. There is no loss associated with latency. If you mean loss associated with the wireless SATCOM links, that varies a lot from satellite to satellite based on antenna and beam design. > b. error correction techniques used (those which are used on a hop > by hop basis, if any) Note that these are mostly in the context of dealing with TCP issues, but: See RFC 2488 for a brief discussion of FEC. See RFC 3366 for a discussion of link ARQ. > c. a general measurement of latency in the absence of hop by hop > error correction techniques (assuming zero message corruption). See my answer to (a). 
Scott From arjuna.sathiaseelan at gmail.com Mon Jul 4 13:20:27 2005 From: arjuna.sathiaseelan at gmail.com (Arjuna Sathiaseelan) Date: Mon, 4 Jul 2005 21:20:27 +0100 Subject: [e2e] Satellite networks latency and data corruption Message-ID: <1ef2259005070413209c4d261@mail.gmail.com> Dear Alok, The link delay in a satellite network depends on whether it is a geostationary orbit (GEO) or a low earth orbit (LEO) network. The GEO satellite link has roughly a delay of 300 ms (one way). The LEO satellite link has a one-way delay that varies over [40,400] ms depending on whether the LEO network has one satellite hop or multiple hops, and how far apart these satellite hops are placed. For more details, you can refer to this paper: T. R. Henderson, R. H. Katz, Transport Protocols for Internet-Compatible Satellite Networks, IEEE Journal on Selected Areas in Communications, Vol. 17, No. 2, pp. 345-359, February 1999. Regards, Arjuna From arjuna.sathiaseelan at gmail.com Mon Jul 4 13:45:06 2005 From: arjuna.sathiaseelan at gmail.com (Arjuna Sathiaseelan) Date: Mon, 4 Jul 2005 21:45:06 +0100 Subject: [e2e] Question regarding duplicate threshold values based on cwnd Message-ID: <1ef2259005070413457f5d57ea@mail.gmail.com> Dear All, I have been reading several papers on improving TCP performance in the presence of packet reordering, and all of them say that when a packet is assumed to be reordered, the duplicate threshold is increased up to a cwnd of packets. But is it right to use cwnd as the maximum limit? Assume this scenario: In ns-2 implementations, the cwnd grows bigger than the receiver window; thus if the sender's cwnd is 50 and the receiver window is only 20, the sender can send only 20 packets. Thus if a packet has been reordered, the sender's dupthresh value would be 50 rather than waiting for the 20 packets. That means the sender is waiting for more packets than it has sent. 
So would it be better to have the dupthresh values based on the sending window rather than the cwnd? I am not sure if I am making sense, but it's just a question that's been lingering on my mind for a long time. Please pardon me if this question doesn't make any sense :). Regds, Arjuna From alokdube at hotpop.com Mon Jul 4 22:22:50 2005 From: alokdube at hotpop.com (alok) Date: Tue, 05 Jul 2005 10:52:50 +0530 Subject: [e2e] Satellite networks latency and data corruption In-Reply-To: <1ef2259005070413209c4d261@mail.gmail.com> References: <1ef2259005070413209c4d261@mail.gmail.com> Message-ID: <42CA192A.20200@hotpop.com> Hi, thanks for your response, Arjuna Sathiaseelan wrote: >Dear Alok, > >The link delay in a satellite network depends on whether it is a >geostationary orbit (GEO) or a low earth orbit (LEO) network. The >GEO satellite link has roughly a delay of 300 ms (one way). The LEO >satellite link has a one-way delay that varies over [40,400] ms >depending on whether the LEO network has one satellite hop or multiple >hops and >how far apart these satellite hops are placed. > > My question was specific to ARQ strategies and the latency they induce. I understand that the media is highly error prone but was wondering what the performance difference is if I do: Host1-----Satellite--Satellite---Host2 | | Host3------+ +---------Host4 with a. no ACK/ARQ "per hop". versus b. only endhost to end host ARQs/ACKs model. (basically leaving all flow related and error recovery operations to the end hosts). It would also help to understand the typical errors seen: does the noise tend to impact just one "end to end association", or is it evenly distributed, etc.? >For more details, you can refer to this paper: >T. R. Henderson, R.H. Katz, Transport Protocols for >Internet-Compatible Satellite Networks, IEEE Journal on Selected Areas >in Communications, Vol. 17, No. 2, pp. 345-359, February 1999. 
> > > Thanks, will look into it, >Regards, >Arjuna > > > -thanks again, Alok From huitema at windows.microsoft.com Tue Jul 5 00:03:53 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Tue, 5 Jul 2005 00:03:53 -0700 Subject: [e2e] Satellite networks latency and data corruption Message-ID: > It would also help to understand the typical errors seen, does the noise > tend to impact just one "end to end association", is it evenly > distributed etc? Well, the last time I checked was about 20 years ago, but conditions probably have not changed too much. The error rate depends on the propagation conditions. In the band where most satellites operate (12/14 GHz), these conditions are affected mostly by the weather, and more precisely by the presence of hydrometeors. A large cumulonimbus between antenna and satellite can drop the transmission balance by 3 to 5 dB. Cumulonimbus can be a few kilometers wide, so a typical event can last a few minutes, depending on the speed of the wind. The effect on the error rate depends on the engineering of the system. If the system is "simple" (no FEC), users may see a very low error rate when the sky is clear, and a rate 1000 times higher during a weather event. If the system uses FEC, the effect can be amplified, i.e. quasi no errors in clear sky, and a high error rate during the event. -- Christian Huitema From Jon.Crowcroft at cl.cam.ac.uk Tue Jul 5 01:28:59 2005 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Tue, 05 Jul 2005 09:28:59 +0100 Subject: [e2e] Satellite networks latency and data corruption In-Reply-To: Message from Arjuna Sathiaseelan of "Mon, 04 Jul 2005 21:20:27 BST." 
<1ef2259005070413209c4d261@mail.gmail.com> Message-ID: geo is easy to compute from the eqn in Arthur C Clarke's paper on comsats in Wireless World in 1945 See for example http://lakdiva.org/clarke/1945ww/ a typical british invention/discovery (like steam engines, dna, the genome, jet engines) that was more successfully exploited by other countries... lucky for them we didn't patent it:) oh, it's .72 secs RTT - see for example sigcomm 88 paper on satnet by Seo et al In missive <1ef2259005070413209c4d261 at mail.gmail.com>, Arjuna Sathiaseelan typed: >>Dear Alok, >> >>The link delay in a satellite network depends on whether it is a >>geostationary orbit (GEO) or a low earth orbit (LEO) network. The >>GEO satellite link has roughly a delay of 300 ms (one way). The LEO >>satellite link has a one-way delay that varies over [40,400] ms >>depending on whether the LEO network has one satellite hop or multiple >>hops and >>how far apart these satellite hops are placed. >> >>For more details, you can refer to this paper: >>T. R. Henderson, R.H. Katz, Transport Protocols for >>Internet-Compatible Satellite Networks, IEEE Journal on Selected Areas >>in Communications, Vol. 17, No. 2, pp. 345-359, February 1999. >> >>Regards, >>Arjuna cheers jon From alokdube at hotpop.com Tue Jul 5 01:56:09 2005 From: alokdube at hotpop.com (alok) Date: Tue, 05 Jul 2005 14:26:09 +0530 Subject: [e2e] Satellite networks latency and data corruption In-Reply-To: References: Message-ID: <42CA4B29.5040404@hotpop.com> Hi, What I want to know is: (a) If the retransmission/ARQ is entirely offloaded to the end transmitter and receiver (say my PC and your PC if we are doing a peer to peer), versus (b) each transmitter and receiver pair on an intermediate hop does the same, How is (a) different from (b) in terms of effective utilization? 
Obviously it is true if an end point A is talking to B and C : A----hop1---hop2--hop3---C | B so if the loss is seen in hop2--hop3, in case of (b) we have hop1 and hop2 being utilised again for the same data that A has to send to C. but are there any numbers/simple experiments illustrating (a) and (b) and the corresponding results on the utilization of A--hop1 and hop1--hop2? My other question is as to how the "buffer sizes" on each hop are estimated in case of (a). -thanks Alok Christian Huitema wrote: >>It would also help to understand the typical errors seen, does the >> >> >noise > > >>tend to impact just one "end to end association", is it evenly >>distributed etc? >> >> > >Well, the last time I checked was about 20 years ago, but conditions >probably have not changed too much. > >The error rate depends on the propagation conditions. In the band where >most satellites operate (12/14 GHz), these conditions are affected >mostly by the weather, and more precisely by the presence of >hydrometeors. A large cumulonimbus between antenna and satellite can >drop the transmission balance by 3 to 5 dB. Cumulonimbus can be a few >kilometers wide, so a typical event can last a few minutes, depending on >the speed of the wind. > >The effect on the error rate depends on the engineering of the system. >If the system is "simple" (no FEC), users may see a very low error rate >when the sky is clear, and a rate 1000 times higher during a weather >event. If the system uses FEC, the effect can be amplified, i.e. quasi >no errors in clear sky, and a high error rate during the event. 
> >-- Christian Huitema > > > From alokdube at hotpop.com Tue Jul 5 01:57:15 2005 From: alokdube at hotpop.com (alok) Date: Tue, 05 Jul 2005 14:27:15 +0530 Subject: [e2e] Satellite networks latency and data corruption In-Reply-To: <42CA4B29.5040404@hotpop.com> References: <42CA4B29.5040404@hotpop.com> Message-ID: <42CA4B6B.9070909@hotpop.com> > so if the loss is seen in hop2--hop3, in case of (b) we have hop1 and > hop2 being utilised again for the same data that A has to send to C. > > but are there any numbers/simple experiments illustrating (a) and (b) > and the corresponding results on the utilization of A--hop1 and > hop1--hop2? > > My other question is as to how the "buffer sizes" on each hop are > estimated in case of (a). > a correction: "My other question is as to how the "buffer sizes" on each hop are estimated in case of (b)." From detlef.bosau at web.de Tue Jul 5 05:58:38 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 05 Jul 2005 14:58:38 +0200 Subject: [e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption References: <42CA4B29.5040404@hotpop.com> Message-ID: <42CA83FE.30504@web.de> alok wrote: > Hi, > > What I want to know is: > > (a) > If the retransmission/ARQ is entirely offloaded to the end transmitter > and receiver (say my PC and your PC if we are doing a peer to peer), > > versus > > (b) > each transmitter and receiver pair on intermediate hop does the same, This is the one million-dollar question. O.k. First of all, the standard reference on this matter is: Saltzer, Reed, Clark: End-To-End Arguments in System Design. ACM Transactions on Computer Systems 2(4), Nov. 1984, pp. 277-288. (Hopefully the reference is correct. But I think title and authors are.) However, "commonly accepted general truths" are similar to the bible. Each believer tells you the bible is true. 
Ask two believers - you will hear three truths ;-) Basically, your question meets _exactly_ the point of local recovery in mobile wireless networks (you guessed correctly, it's me again ;-)), however I'm not sure whether the resulting design decision is the same. Some helpful criteria might be: 1.: User perspective: What is the _goodput_ from the user's point of view? 2.: Fairness perspective: Does a user unduly waste network resources? 3.: System perspective: Where could error recovery be done cheapest? Let's start with 3: Where could error recovery be done cheapest? Let me give two propositions, and please correct me here because I do not know the precise values. But as far as I recall, on backbone routers / switching systems we have - about 30000 (3e4) active TCP flows per 100 MBit/s of capacity, - about 100 ns of available processing time per packet on a router. (A colleague told me about this some years ago. I think this is rather dated; I would expect 10 ns or even 1 ns.) For these reasons, the general wisdom is to put complexity on the end systems if possible. This is perhaps a problem for CETEN, where a correct implementation requires a floating point calculation for each IP datagram. (Perhaps one can improve the algorithm in this respect.) To illustrate the importance of this matter, please consider the IPv6 header: Because there was no compelling reason for spending this processing effort, the header checksum was _left_ _out_! For 3G networks, my position is that the gateways between the Internet and the mobile network are typically quite large computer systems, each one serving some few hundreds of flows. In this case, the effort is acceptable. In satellite networks: I don't know. Particularly, the state variables for ARQ in high bandwidth systems may turn out unacceptably high. 2.: Fairness: If ARQ is placed on the end system, the whole network path "enjoys" the necessary retransmissions. 
Particularly, when a packet must be sent 100 times or more to be successfully received at least once, it may increase network performance to place ARQ on intermediate systems. Once again on 3G networks: Typically, 3G networks are only used as an access line. So the major part of the path typically resides in the wirebound Internet. Therefore, it makes sense not to bother the Internet with retransmissions. Even more, ARQ in 3G networks is done at the radio block level, which is more efficient than ARQ at the packet level. However, in satellite networks, I can imagine that the bottleneck is really the satellite link itself. In that case, it would make only a minor difference whether ARQ is placed on IS or ES. 3.: User perspective: How long does it take for a packet to be delivered? Again: On a 3G network, the major transmission time is spent on the Internet, in case a _RAW_ channel _WITHOUT_ ARQ/RLP is used. Let's consider a latency of 50 ms and 100 transmissions; then a user will see 5 s of STT latency for a packet. If the same packet could be successfully delivered via RLP, and STT were increased by 100 ms for that reason, STT would be 150 ms. This is less than 5 s, and this is preferable to the user. Satellite networks: Here the major time is spent on the satellite link. In summary, I'm not quite sure, but I can imagine that in satellite networks error recovery is left to the end systems. I think the error recovery effort for IS can turn out unduly high with not that much benefit for fairness and the user. Basically, high costs (1.) are an argument for (a); utilization and good user performance (2., 3.) are an argument for (b). It is a tradeoff. > > How is (a) different from (b) in terms of effective utilization? > Obviously it is true if an end point A is talking to B and C : This is mainly covered by 2., Fairness. Of course, the utilization of a link decreases if it is fed up with retransmissions only. 
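[Editor's sketch] The latency arithmetic a few paragraphs above (50 ms path latency, 100 transmissions needed end-to-end, 100 ms of added RLP delay) can be written out explicitly. This is a hypothetical back-of-the-envelope model, not anything from the thread; the function names and the simple additive model are illustrative assumptions.

```python
# Toy comparison of end-to-end vs. link-local (RLP-style) recovery.

def e2e_delivery_ms(path_latency_ms, n_transmissions):
    # End-to-end recovery: every one of the n attempts traverses the
    # whole path, so delivery time scales with the attempt count.
    return path_latency_ms * n_transmissions

def local_delivery_ms(path_latency_ms, rlp_overhead_ms):
    # Link-layer (RLP) recovery: the path is traversed once; the lossy
    # link's retransmissions appear only as added link-level delay.
    return path_latency_ms + rlp_overhead_ms

# The numbers from the text: 50 ms latency and 100 transmissions give
# 5 s end-to-end, versus 150 ms when RLP adds 100 ms on the lossy hop.
print(e2e_delivery_ms(50, 100))    # 5000
print(local_delivery_ms(50, 100))  # 150
```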
I think the consideration can turn out quite differently, depending on the actual scenario: E.g., a satellite mobile phone could be attached to the Internet. Or a satellite link could be used for Internet backbone connections, perhaps weather-dependent in combination with a fibre link. As you see, I cannot offer a real "answer" here. My intention is to draw attention to the question. I've got the impression that there are typically strong objections in the TCP community against doing local recovery. Although RLP has been in practical use for about a decade now in mobile networks, I frequently see the position that TCP should be run on raw e2e networks without any local recovery support. Perhaps this impression is wrong. However: I think the decision is not easy to make. DB -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From alokdube at hotpop.com Tue Jul 5 06:27:01 2005 From: alokdube at hotpop.com (alok) Date: Tue, 05 Jul 2005 18:57:01 +0530 Subject: [e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption In-Reply-To: <42CA83FE.30504@web.de> References: <42CA4B29.5040404@hotpop.com> <42CA83FE.30504@web.de> Message-ID: <42CA8AA5.5050704@hotpop.com> Hi, Thank you for the response.. > > This is perhaps a problem for CETEN, where a correct > implementation requires a floating point calculation for each IP > datagram. (Perhaps one can improve the algorithm in this respect.) > > To illustrate the importance of this matter, please consider the IPv6 > header: Because there was no compelling reason for spending this > processing effort, the header checksum was _left_ _out_! > I prefer being silent on the matter :-) > For 3G networks, my position is that the gateways between the Internet and > the mobile network are typically quite large computer systems, each one > serving some few hundreds of flows. 
In this case, the effort is > acceptable. > > In satellite networks: I don't know. Particularly, the state variables > for ARQ in high bandwidth systems may turn out unacceptably high. > > 2.: Fairness: > > If ARQ is placed on the end system, the whole network path "enjoys" > the necessary retransmissions. Particularly, when a packet must be sent > 100 times or more to be successfully received at least once, it may > increase network performance to place ARQ on intermediate systems. > If it is wire line/fibre, I think we do not have to worry about losses. > Once again on 3G networks: Typically, 3G networks are only used as > an access line. So the major part of the path typically resides in the > wirebound Internet. Therefore, it makes sense not to bother the > Internet with retransmissions. Even more, ARQ in 3G networks is done > at the radio block level, which is more efficient than ARQ at the packet level. > Well, I did see your website, but... what is "3G"? It does not use anything but the same fiber and copper or media that was already there, right? > However, in satellite networks, I can imagine that the bottleneck is > really the satellite link itself. In that case, it would make only a > minor difference whether ARQ is placed on IS or ES. > > Yes, I wanted to know if there are numbers when done on ES, and how it impacts the channel utilization on intermediate hops. > 3.: User perspective: > > How long does it take for a packet to be delivered? > > Again: On a 3G network, the major transmission time is spent on the > Internet, in case a _RAW_ channel _WITHOUT_ ARQ/RLP is used. > Let's consider a latency of 50 ms and 100 transmissions; then a user will > see 5 s of STT latency for a packet. > > If the same packet could be successfully delivered via RLP, and STT > were increased by 100 ms for that reason, STT would be 150 ms. > This is less than 5 s, and this is preferable to the user. > > Satellite networks: Here the major time is spent on the satellite link. 
> > In summary, I'm not quite sure, but I can imagine that in satellite > networks error recovery is left to the end systems. I think not, perhaps someone can confirm. > I think the error recovery effort for IS can turn out unduly high with > not that much benefit for fairness and the user. > > Basically, high costs (1.) are an argument for (a); utilization and > good user performance (2., 3.) are an argument for (b). > > It is a tradeoff. > >> >> How is (a) different from (b) in terms of effective utilization? >> Obviously it is true if an end point A is talking to B and C : > > > This is mainly covered by 2., Fairness. > > Of course, the utilization of a link decreases if it is fed up with > retransmissions only. > > I think the consideration can turn out quite differently, depending on > the actual scenario: E.g., a satellite mobile phone could be attached > to the Internet. Or a satellite link could be used for Internet > backbone connections, perhaps weather-dependent in combination with a > fibre link. > By fairness, do you mean that "I use up someone else's > space/time/bandwidth"? No, that is not what I was asking. What on the > Internet is "fairness"? I just wanted to know what the results are, if any, when ARQ is done ES<==>ES. -thanks Alok From craig at aland.bbn.com Tue Jul 5 07:20:38 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Tue, 05 Jul 2005 10:20:38 -0400 Subject: [e2e] Satellite networks latency and data corruption In-Reply-To: Your message of "Tue, 05 Jul 2005 10:52:50 +0530." <42CA192A.20200@hotpop.com> Message-ID: <20050705142038.ED2AF1FF@aland.bbn.com> >My question was specific to ARQ strategies and the latency they >induce. > >I understand that the media is highly error prone but was wondering what >the performance difference is if I do: > >Host1-----Satellite--Satellite---Host2 > | | >Host3------+ +---------Host4 > >with > >a. no ACK/ARQ "per hop". >versus >b. only endhost to end host ARQs/ACKs model. 
(basically leaving all flow >related and error recovery operations to the end hosts). Depends, I think, on your metric of cost/success. The answer, if you're trying to minimize the number of packets sent, is that you'd better do hop-by-hop ACKs if the error rate is at all bad (as well as E2E ACKs). Not sure what you get if you seek to minimize delay or maximize throughput. Craig From detlef.bosau at web.de Tue Jul 5 08:31:40 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 05 Jul 2005 17:31:40 +0200 Subject: [e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption References: <42CA4B29.5040404@hotpop.com> <42CA83FE.30504@web.de> <42CA8AA5.5050704@hotpop.com> Message-ID: <42CAA7DC.8080507@web.de> alok wrote: > >> For 3G networks, my position is that the gateways between the Internet and >> the mobile network are typically quite large computer systems, each one >> serving some few hundreds of flows. In this case, the effort is >> acceptable. >> >> In satellite networks: I don't know. Particularly, the state variables >> for ARQ in high bandwidth systems may turn out unacceptably high. >> >> 2.: Fairness: >> >> If ARQ is placed on the end system, the whole network path "enjoys" >> the necessary retransmissions. Particularly, when a packet must be sent >> 100 times or more to be successfully received at least once, it may >> increase network performance to place ARQ on intermediate systems. >> > If it is wire line/fibre, I think we do not have to worry about losses. > >> Once again on 3G networks: Typically, 3G networks are only used as >> an access line. So the major part of the path typically resides in the >> wirebound Internet. Therefore, it makes sense not to bother the >> Internet with retransmissions. Even more, ARQ in 3G networks is done >> at the radio block level, which is more efficient than ARQ at the packet level. >> > Well, I did see your website, but... what is "3G"? 
http://www.3gpp.org/ I think "Third Generation Wide Area Mobile Networking and Ubiquitous Computing" sounds better - therefore sells better than: UMTS :-) *SCNR* > it does not use anything but the same fiber and copper or media than was > already there, right? > Yes, I think the wireless media was already there ;-) (There must be some papers by James C. Maxwell about this.... and certainly, JC will add: another british invention/discovery, however: JCM was a scot :-)) >> This is mainly covered by 2. Fairness. >> >> Of couse, the utilization of a link decreases if it is fed up with >> retransmissions only. >> >> I think, the consideration can turn out quite different, depending on >> the actual scenario: E.g. a satellite mobile phone could be attached >> to the Internnet. Or a satellite link could be used for Internet >> backbone connections, perhaps wheather dependent in combination with a >> fibre link. >> > By fairness, do you mean that "i use up someone else's > space/time/bandwidth"? no that is not what I was asking. What on the > internet is "fairness"? In this particular case I meant: I do not use more bandwidth for retransmission than others do. Basically, fairness on the Internet means that competing flows use a fair share of common ressources. -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From huitema at windows.microsoft.com Tue Jul 5 09:03:09 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Tue, 5 Jul 2005 09:03:09 -0700 Subject: [e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption Message-ID: There are pros and cons to hop by hop and end to end control. The nature of loss (weather events) implies that there is generally no correlation between the error rates of the multiple hops. In practice, only one hop will have a significant error rate, while the others will be quasi free of errors. 
If you assume a perfect selective ARQ mechanism, you will find that end-to-end is slightly less efficient from a bandwidth point of view, because any retransmitted packet is carried over every hop. However, this is a relatively contained issue, since even with a "large" error rate only a small fraction of packets are in fact retransmitted. You will also find that a typical hop-by-hop ARQ (e.g. HDLC with selective rejects) results in large delays if you assume re-sequencing at each node, because all flows must wait for the retransmission of any error -- in practice, the offending hop delay becomes three times larger than necessary. You could opt to not implement re-sequencing, but then you have to deal with end to end re-ordering. Given the satellite hop delays, the out of sequence packet will arrive about 450 ms after the original packet. In all likelihood, TCP will have triggered an end to end retransmission, thus negating any bandwidth advantage to hop by hop retransmission. The technique that is most useful is variable per hop FEC. If the satellite system can detect that the link is experiencing bad conditions, it could push the FEC system to use more redundancy for the duration of the event. The effect may not be perfect, there will still be some residual errors, but the overall error rate will remain very low and the transmission delays will remain contained. -- Christian Huitema From detlef.bosau at web.de Tue Jul 5 11:28:50 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 05 Jul 2005 20:28:50 +0200 Subject: [e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption References: Message-ID: <42CAD162.9040308@web.de> Christian Huitema wrote: > There are pros and cons to hop by hop and end to end control. > > The nature of loss (weather events) implies that there is generally no > correlation between the error rates of the multiple hops. 
In practice, > only one hop will have a significant error rate, while the others will > be quasi free of errors. Could it be feasible then to insert ARQ IS-IS only around the satellite link, which suffers from weather events? > > You will also find that a typical hop-by-hop ARQ (e.g. HDLC with > selective rejects) results in large delays if you assume re-sequencing > at each node, because all flows must wait for the retransmission of any > error -- in practice, the offending hop delay becomes three times larger > than necessary. O.k. But I think this must be compared to the delay seen by a pure ES-ES approach. If you consider a typical path drain behaviour resulting from packet loss, there may be short pauses in the flow as well. > > You could opt to not implement re-sequencing, but then you have to deal > with end to end re-ordering. Given the satellite hop delays, the out of > sequence packet will arrive about 450 ms after the original packet. In > all likelihood, TCP will have triggered an end to end retransmission, > thus negating any bandwidth advantage to hop by hop retransmission. > Could this behaviour be alleviated by TCP spoofing as discussed e.g. in Mark Allman's paper? DB -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From alokdube at hotpop.com Tue Jul 5 23:14:29 2005 From: alokdube at hotpop.com (alok) Date: Wed, 06 Jul 2005 11:44:29 +0530 Subject: [e2e] local recovery or not local recovery, was: Re: Satellite networks latency and data corruption In-Reply-To: <42CAD162.9040308@web.de> References: <42CAD162.9040308@web.de> Message-ID: <42CB76C5.8060402@hotpop.com> Detlef Bosau wrote: > Christian Huitema wrote: > >> There are pros and cons to hop by hop and end to end control. >> The nature of loss (weather events) implies that there is generally no >> correlation between the error rates of the multiple hops.
In practice, >> only one hop will have a significant error rate, while the others will >> be quasi free of errors. > > > Could it be feasible then to insert ARQ IS-IS only around the > satellite link, which suffers from weather events? A bit closer to what I was looking at; I thought the concept of selective reject around a window would ensure this is how it behaves anyway? However given: A1---hop1----hop2--hop3--hop4---A2 a. does each hop reorder (I think not) b. does each hop *only keep track of the window* and so SREJ and forward what it has received? c. is the window relevant *only* between 2 hops? More like A1 has frames1--frames10 window is 5 between A1 and A2 (do intermediate nodes keep track of window size, or do they keep all the frames locally, which is what I was asking about buffer sizes??) hop3 sees it has got frames1,2,4,5 (in a sequential order) but not 3 does hop3 send a SREJ back to hop2 for that frame3 *but* forward the rest as and when they come in? Of course this would mean hop3 gets a SREJ from hop4 and hop4 from A2 and so on, but it in no way makes each node hold up the rest of the data. So basically each hop knows the window and can induce a SREJ, but it does not in any way "wait for the whole window". Is this different from what it actually behaves like? -thanks Alok From rja at extremenetworks.com Wed Jul 6 23:23:11 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Thu, 7 Jul 2005 02:23:11 -0400 Subject: [e2e] local recovery or not local recovery In-Reply-To: <42CAA7DC.8080507@web.de> References: <42CA4B29.5040404@hotpop.com> <42CA83FE.30504@web.de> <42CA8AA5.5050704@hotpop.com> <42CAA7DC.8080507@web.de> Message-ID: <59C8C535-1053-4B23-A9BA-98617C20556F@extremenetworks.com> On Jul 5, 2005, at 11:31, Detlef Bosau wrote: >> well I did see your website, but ....what is "3G"? > > Third Generation mobile networks. ^ insert 'telephone' there above.
:-) > http://www.3gpp.org/ > > I think "Third Generation Wide Area Mobile Networking and > Ubiquitous Computing" sounds better - therefore sells better than: > UMTS :-) *SCNR* "3GPP" is a European standardisation effort for the next generation of GSM-compatible mobile telephones. There are other competing efforts, also called "3G", for standardising next generations of IS-95 ("CDMA") compatible mobile telephones. However, the key phrase in the quoted text above is actually "mobile telephone", because all of the mobile telephone specs have link properties and mobility properties that are rather different from currently available satellite communications (SATCOM) links, currently standardised wireless Ethernet (i.e. IEEE 802.11*) links, or other LAN/MAN radio links currently being standardised (e.g. IEEE 802.16). Further afield, the military radio communities seem to be moving to software-defined radios supporting link technologies that are a bit different from any of the above. Yours, Ran rja at extremenetworks.com From frasker at hotmail.com Fri Jul 8 08:59:19 2005 From: frasker at hotmail.com (F. Bouy) Date: Fri, 08 Jul 2005 15:59:19 +0000 Subject: [e2e] New Reno [RFC 3782] Message-ID: May I know why step 1B in RFC 3782 is necessary? As far as I am concerned, if the sender is not already in the Fast Recovery procedure, the Cumulative Acknowledgement field will never cover more than "recover" [in step 1] because it will never have the chance to execute step 1A. I may be missing something here. If I am wrong, I would appreciate a counterexample. _________________________________________________________________ Find love online with MSN Personals.
http://match.msn.com.my/match/mt.cfm?pg=channel From swati at seas.upenn.edu Sat Jul 9 12:58:35 2005 From: swati at seas.upenn.edu (Saswati Sarkar) Date: Sat, 9 Jul 2005 15:58:35 -0400 Subject: [e2e] ACM SIGCOMM 2005 Conference Message-ID: <1120939115.42d02c6b7ed95@webmail.seas.upenn.edu> ACM SIGCOMM 2005 Conference August 22-26, 2005 Sheraton Society Hill Hotel Philadelphia, PA http://www.acm.org/sigcomm/sigcomm2005 IMPORTANT DATES: -Early Registration ends July 21 -Hotel Reservation deadline is July 21 -Submissions for Work in Progress due August 1 -Turing Award Lecture by V. Cerf and R. Kahn is August 22 -Welcoming Reception is August 22 - SIGCOMM Conference: Tuesday August 23 - Thursday August 25 - Tutorials: Monday August 22 and Friday August 26 - Workshops: Monday August 22 and Friday August 26 Program Highlights ------------------------- ACM SIGCOMM is pleased to announce that the recipient of the 2005 SIGCOMM Award is Dr. Paul Mockapetris of Nominum, Redwood City, CA, who will give the keynote address in the opening session on Tuesday. The technical program consists of 27 refereed papers and includes four innovative Workshops and three Tutorials. The three half-day Tutorials are: * Infrastructure attack detection and mitigation, Craig Labovitz, Arbor Networks; Danny McPherson, Arbor Networks; Farnam Jahanian, University of Michigan, Ann Arbor *Operations and Management of IP Networks: What Researchers Should Know Aman Shaikh, AT&T Labs Research; Albert Greenberg, AT&T Labs Research *Broadband Wireless Access and High-Speed Wireless Data Applications, Sanjoy Paul, Whenu Inc. 
The four Workshops are: *Workshop on experimental approaches to wireless network design and analysis (E-WIND05) Ed Knightly, Rice University Christophe Diot, Intel Corporation *Workshop on economics of peer-to-peer systems (P2PECON-05) Eric Friedman, Cornell University Emin Gun Sirer, Cornell University *Workshop on mining network data (MineNet-05) Subhabrata Sen, AT&T Labs-Research, Chuanyi Ji, Georgia Tech Debanjan Saha, IBM Research, Joe McCloskey Dept. of Defense *Workshop on delay tolerant networking and related networks (WDTN-05) Kevin Fall, Intel S. Keshav, U. of Waterloo In addition, there will be an interesting poster session, a stimulating work in progress session, an enjoyable banquet, and, of course, a provocative outrageous opinions session! For details about registration and the conference, visit the conference web site at http://www.acm.org/sigcomm/sigcomm2005 Best Regards Saswati Sarkar Saswati Sarkar Assistant Professor Department of Electrical and Systems Engineering University of Pennsylvania Email: swati at seas.upenn.edu Phone: 2155739071 Fax: 2155732068 Webpage: http://www.seas.upenn.edu/~swati Mail: 354 Moore, 200 S. 33rd street Philadelphia, PA 19107 USA From frasker at hotmail.com Mon Jul 11 04:19:08 2005 From: frasker at hotmail.com (F. Bouy) Date: Mon, 11 Jul 2005 11:19:08 +0000 Subject: [e2e] New Reno Message-ID: Why is nobody answering my question? >May I know why step 1B in RFC 3782 is necessary? As far as I am concerned, if >the sender is not already in the Fast Recovery procedure, the Cumulative >Acknowledgement field will never cover more than "recover" [in step 1] >because it will never have the chance to execute step 1A. > >I may be missing something here.
If I am wrong, I would appreciate a >counterexample. I have an extra question. The partial acknowledgement procedure in step 5 requires that the cwnd be deflated by the amount of new data acknowledged. I understand the concern there, but if the amount of new data acknowledged is sufficiently large (say nearly a window of data), wouldn't it make the result negative? Although in this case it can be corrected to zero, we cannot send any segment since the cwnd is completely closed. _________________________________________________________________ Get your mobile ringtones, operator logos and picture messages from MSN Mobile http://msn.smsfactory.no/ From craig at aland.bbn.com Mon Jul 11 05:28:22 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Mon, 11 Jul 2005 08:28:22 -0400 Subject: [e2e] New Reno In-Reply-To: Your message of "Mon, 11 Jul 2005 11:19:08 -0000." Message-ID: <20050711122822.992311FF@aland.bbn.com> In message , "F. Bouy" writes: >Why is nobody answering my question? Because usually one lets the authors do it. >>May I know why step 1B in RFC 3782 is necessary? As far as I am concerned, if >>the sender is not already in the Fast Recovery procedure, the Cumulative >>Acknowledgement field will never cover more than "recover" [in step 1] >>because it will never have the chance to execute step 1A. "recover" gets set in step 1A to the highest sequence number transmitted, which may be well past the current ACK value. The algorithm ensures that cleaning up the segments between the current cumulative ack and the highest value sent doesn't cause you to (re)enter fast retransmit/fast recovery. Craig From frasker at hotmail.com Mon Jul 11 07:04:35 2005 From: frasker at hotmail.com (F. Bouy) Date: Mon, 11 Jul 2005 14:04:35 +0000 Subject: [e2e] New Reno Message-ID: > >Why is nobody answering my question? > >Because usually one lets the authors do it. > > >>May I know why step 1B in RFC 3782 is necessary?
As far as I am concerned, >if > >>the sender is not already in the Fast Recovery procedure, the Cumulative > >>Acknowledgement field will never cover more than "recover" [in step 1] > >>because it will never have the chance to execute step 1A. > >"recover" gets set in step 1A to the highest sequence number transmitted, >which may be well past the current ACK value. The algorithm ensures >that cleaning up the segments between the current cumulative ack >and the highest value sent doesn't cause you to (re)enter fast >retransmit/fast recovery. I thought so in the first place, but step 1 states that the checking is only done for the case when we are not in Fast Recovery. _________________________________________________________________ Get your mobile ringtones, operator logos and picture messages from MSN Mobile http://msn.smsfactory.no/ From craig at aland.bbn.com Mon Jul 11 08:22:32 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Mon, 11 Jul 2005 11:22:32 -0400 Subject: [e2e] New Reno In-Reply-To: Your message of "Mon, 11 Jul 2005 14:04:35 -0000." Message-ID: <20050711152232.62F351FF@aland.bbn.com> In message , "F. Bouy" writes: >>"recover" gets set in step 1A to the highest sequence number transmitted, >>which may be well past the current ACK value. The algorithm ensures >>that cleaning up the segments between the current cumulative ack >>and the highest value sent doesn't cause you to (re)enter fast >>retransmit/fast recovery. > >I thought so in the first place, but step 1 states that the checking is only >done for the case when we are not in Fast Recovery. This isn't my particular expertise, but I assume (because there are a number of situations in which this would be a useful mechanism) that it is possible to exit Fast Recovery without having retransmitted all the lost data (e.g., several losses on retransmissions), right? And then you presumably don't want to re-enter.
If we're going to go further down this path, you probably need to sketch out a proof of the TCP conditions and the value of recover that proves that 1B is a situation that can never happen. Craig From frasker at hotmail.com Tue Jul 12 03:44:51 2005 From: frasker at hotmail.com (F. Bouy) Date: Tue, 12 Jul 2005 10:44:51 +0000 Subject: [e2e] New Reno In-Reply-To: <20050711152232.62F351FF@aland.bbn.com> Message-ID: > >>"recover" gets set in step 1A to the highest sequence number >transmitted, > >>which may be well past the current ACK value. The algorithm ensures > >>that cleaning up the segments between the current cumulative ack > >>and the highest value sent doesn't cause you to (re)enter frast > >>retransmit/fast recovery. > > > >I thought so in the first place, but step 1 states that the checking is >only > >done for the case when we are not in Fast Recovery. > >This isn't my particular expertise, but I assume (because there are a >number of situations in which this would be a useful mechanism) that it >is possible to exit Fast Recovery without having retransmitted all the >lost data (e.g., several losses on retransmissions), right? > >And then you presumably don't want to re-enter. > >If we're going to go further down this path, you probably need to sketch >out a proof of the TCP conditions and the value of recover that proves >that 1B is a situation that can never happen. You are right about it. After re-reading the document again, I picked up points I had missed earlier. Thank you. _________________________________________________________________ Find just what you are after with the more precise, more powerful new MSN Search. http://search.msn.com.my/ Try it now. 
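[Editorial note: for readers following the NewReno exchange above, the step-1 bookkeeping can be sketched in a few lines of Python. This is an illustrative reading of RFC 3782, not its pseudocode; the `SenderState` type and its field names are invented for the sketch.]

```python
# Sketch of RFC 3782 step 1, taken at the third duplicate ACK while the
# sender is NOT already in Fast Recovery. Names here (SenderState, smss,
# flight_size, ...) are illustrative placeholders, not from the RFC.
from dataclasses import dataclass

@dataclass
class SenderState:
    smss: int              # sender maximum segment size
    flight_size: int       # bytes currently outstanding in the network
    highest_seq_sent: int  # highest sequence number transmitted so far
    recover: int           # highest seq sent when recovery last began
    ssthresh: int = 0
    in_fast_recovery: bool = False

def on_third_dup_ack(cum_ack: int, st: SenderState) -> bool:
    """Decide whether the third duplicate ACK starts Fast Recovery."""
    if cum_ack - 1 > st.recover:
        # Step 1A: the cumulative ACK covers more than "recover", so this
        # is a genuinely new loss event -- enter fast retransmit/recovery.
        st.ssthresh = max(st.flight_size // 2, 2 * st.smss)
        st.recover = st.highest_seq_sent
        st.in_fast_recovery = True
    # else Step 1B: these are duplicate ACKs for segments sent before an
    # earlier recovery episode (e.g. cleaned up after a retransmission
    # timeout); re-entering would cause a second, needless fast
    # retransmit of the same window.
    return st.in_fast_recovery
```

As Craig's reply suggests, the 1B branch matters precisely when the sender has already left Fast Recovery (for instance via an RTO) while duplicate ACKs for segments below `recover` are still arriving.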
From zchen at ece.gatech.edu Wed Jul 13 19:26:14 2005 From: zchen at ece.gatech.edu (Zesheng Chen) Date: Wed, 13 Jul 2005 22:26:14 -0400 (EDT) Subject: [e2e] Comments on "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation" Message-ID: <3803.192.168.5.18.1121307974.squirrel@secure2.ece.gatech.edu> We have written a short Comments paper to correct several errors in the paper "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation". Title: Comments on "Modeling TCP Reno Performance: A Simple Model and Its Empirical Validation" Authors: Zesheng Chen, Tian Bu, Mostafa Ammar, and Don Towsley Abstract: In this Comments, several errors in [1] are pointed out. The more serious of these errors results in an over-prediction of the send rate. The expression obtained for send rate in this Comments leads to greater accuracy when compared with the measurement data than the original send rate expression in [1]. A draft of the Comments can be found at: http://users.ece.gatech.edu/~zchen/corrections.pdf We welcome any feedback. Best Regards, Zesheng Chen =================== Ph.D. Candidate School of Electrical & Computer Engineering Georgia Institute of Technology From gianluca.iannaccone at intel.com Fri Jul 15 06:06:50 2005 From: gianluca.iannaccone at intel.com (Iannaccone, Gianluca) Date: Fri, 15 Jul 2005 14:06:50 +0100 Subject: [e2e] CFP -- IEEE JSAC Sampling The Internet: Techniques and Applications Message-ID: <101EBE844CE72E4F8F72686631823B2B02E021AA@swsmsx404> Apologies if you receive multiple copies of this email. All information can be found at http://www.argreenhouse.com/society/J-SAC/Calls/sampling_internet.html. Deadline for manuscript submission: OCTOBER 1, 2005. 
======================================================= CALL FOR PAPERS IEEE Journal on Selected Areas in Communications SAMPLING THE INTERNET: TECHNIQUES AND APPLICATIONS Scope As the Internet continues to grow rapidly in size and complexity, it has become increasingly clear that its evolution is closely tied to a detailed understanding of network traffic. Network traffic measurements are invaluable for a wide range of tasks such as network capacity planning, traffic engineering, fault diagnosis, application and protocol performance profiling, and anomaly detection. This large and diverse set of applications raises the question of how to monitor the Internet in an efficient and scalable way. In the case of active monitoring (where probe packets are sent across the network to infer specific properties) the scalability issue arises from the size of the Internet and the potentially large number of end systems that one needs to instrument, as well as the number of probing experiments that one must conduct. Intuitively, sampling is an essential component of scalable Internet monitoring. Broadly speaking, sampling is the process of making partial observations of a system of interest, and drawing conclusions about the full behaviour of the system from these limited observations. The observation problem is concerned with minimising information loss whilst reducing the volume of collected data. It is this reduction that makes the collection process scalable. The way in which the partial information is transformed into knowledge of the system as a whole is the inversion problem. The inversion is in general imperfect and error-prone. The aim of this issue is to bring together work from researchers and practitioners devoted to the understanding of the practical and theoretical issues related to all aspects of sampling the Internet. In this context, sampling may take various forms. 
A classic example is to observe only a subset of the packets carried over a link, and then estimate traffic parameters which apply to all packets. Alternatively, one could target a subset of routers with packet probes in order to infer network characteristics such as the topology or routing matrix. Examples abound from a wide variety of application areas within Internet measurement, management, and analysis. Independent of subject area, papers will be in scope if they focus substantially on the sampling aspects of the problem under study, for example by exploring the tradeoff between observation and inversion processes, revealing the limitations of inversion techniques, analysing their properties, or proposing new ones, or by providing new insights by explicitly recognizing the impact of implicit sampling in many measurement studies. Topics of interest include (but are not limited to): - Sampling and inverting traffic metrics with passive or active systems. - Internet end-to-end measurements seen from a sampling standpoint. - Sampling aspects of network topology inference. - Impact of sampling on anomaly detection. - Mechanisms for sampling live Internet traffic or collected traces. - Theoretical studies of the sampling/inversion problem (e.g., accuracy, complexity). - Distributed and adaptive sampling techniques. - New sampling methods. Submission guidelines Authors should follow the IEEE J-SAC manuscript format described in the Information for Authors. There will be one round of reviews and acceptance will be limited to papers needing only moderate revisions. 
Prospective authors should submit a PDF version of their complete manuscript via email to jsac-sampling at sophia.inria.fr according to the following timetable: Manuscript submission: October 1, 2005 Acceptance notification: March 1, 2006 Final manuscript due: June 1, 2006 Publication: 4th quarter 2006 Guest Editors Chadi Barakat INRIA Planète group 2004, route des Lucioles 06902 Sophia Antipolis France Chadi.Barakat at sophia.inria.fr Gianluca Iannaccone Intel Research 15 JJ Thomson Avenue Cambridge CB3 0FD United Kingdom gianluca.iannaccone at intel.com Jim Kurose Department of Computer Science University of Massachusetts Amherst MA 01003 United States kurose at cs.umass.edu Darryl Veitch CUBIN (ARC Special Research Ctr) Department of Electrical & Electronic Engineering University of Melbourne Victoria 3010 Australia dveitch at unimelb.edu.au From detlef.bosau at web.de Fri Jul 15 09:58:05 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 15 Jul 2005 18:58:05 +0200 Subject: [e2e] Agility of RTO Estimates Message-ID: <42D7EB1D.8050003@web.de> Hi! Please excuse me for bothering you with this question, but I'm interested in whether there exists material on this question. I'm interested in the "agility" of TCP's retransmission timeout (RTO). To my understanding, the RTO defines a confidence interval for a packet's RTT and therefore maintains a simple yet effective model of the path. This model is continuously adapted by evaluation of actual ACK packets, and immediately yields a confidence test for the zero hypothesis "packet is successfully delivered" with a certain, if implicit, level of significance. Therefore, Spurious Timeouts (refer to Ludwig, Gurtov et al.), mathematically speaking, are errors of the first kind and thus may occur on _any_ network, not only in wireless ones. Thus, if STs are observed "unduly often", this may indicate that the level of confidence has fallen - in other words: RTO has not yet adapted to a larger value.
My question is, with respect to mobile wireless networks such as UMTS or GPRS: How "quickly" does RTO adapt? I expect this is restricted by the ES-ES latency, the packet rate (i.e. "sampling rate"), the burstiness of traffic etc. Can this "RTO model" follow e.g. the latency variations met on the mobile network in "real time"? Or are there basic limitations? (At least, I expect so.) Many thanks! Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From craig at aland.bbn.com Fri Jul 15 10:49:36 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Fri, 15 Jul 2005 13:49:36 -0400 Subject: [e2e] Agility of RTO Estimates In-Reply-To: Your message of "Fri, 15 Jul 2005 18:58:05 +0200." <42D7EB1D.8050003@web.de> Message-ID: <20050715174936.D1CC61FF@aland.bbn.com> In message <42D7EB1D.8050003 at web.de>, Detlef Bosau writes: >My question is, with respect to mobile wireless networks such as UMTS or GPRS: >How "quickly" does RTO adapt? I expect this is restricted by the ES-ES >latency, the packet rate (i.e. "sampling rate"), the burstiness of >traffic etc. >Can this "RTO model" follow e.g. the latency variations met on the >mobile network in "real time"? >Or are there basic limitations? (At least, I expect so.) I'll take a stab at this and be delighted to be corrected by others who know better. I believe the immediate issue is not the "RTO model" but rather the question of what RTO estimator you use. In the late 1980s there was a crisis of confidence in RTO estimators -- a problem we dealt with by developing Karn's algorithm (to deal with retransmission ambiguity) and improving the RSRE estimation algorithm with Van Jacobson's replacement. Van did a bunch of testing of his estimator on real Internet traffic and looked to see how often the estimator failed. (Note that spurious timeouts are only one failure -- delaying a retransmission overly long after the loss is also a failure.)
He picked an estimator that was easy to compute and gave good results in the real world. If there's reason to believe the estimator today is working less well, we could obviously replace it. That doesn't mean the RTO model needs fixing. Second point is that the RTO model now works in concert with other mechanisms. I.e. it used to be that we relied only on RTO to determine if we should retransmit. Now we have Fast Retransmit to catch certain types of loss. Craig From detlef.bosau at web.de Fri Jul 15 12:18:38 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 15 Jul 2005 21:18:38 +0200 Subject: [e2e] Agility of RTO Estimates References: <20050715174936.D1CC61FF@aland.bbn.com> Message-ID: <42D80C0E.9010404@web.de> Craig Partridge wrote: > In message <42D7EB1D.8050003 at web.de>, Detlef Bosau writes: > > >>My question is, with respect to mobile wireless networks such as UMTS or GPRS: >>How "quickly" does RTO adapt? I expect this is restricted by the ES-ES >>latency, the packet rate (i.e. "sampling rate"), the burstiness of >>traffic etc. >>Can this "RTO model" follow e.g. the latency variations met on the >>mobile network in "real time"? >>Or are there basic limitations? (At least, I expect so.) > > > I'll take a stab at this and be delighted to be corrected by others who > know better. > > I believe the immediate issue is not the "RTO model" but rather the > question of what RTO estimator you use. In the late 1980s there was Basically, it's the same question. Maybe I was unclear there. The "RTO model" consists of 1. the RTT estimate, 2. the variation estimate, 3. the "recipe", how you "cook" a confidence interval from those parameters. > a crisis of confidence in RTO estimators -- a problem we dealt with by > developing Karn's algorithm (to deal with retransmission ambiguity) and > improving the RSRE estimation algorithm with Van Jacobson's replacement.
> > Van did a bunch of testing of his estimator on real Internet traffic and > looked to see how often the estimator failed. (Note that spurious > timeouts are only one failure -- delaying a retransmission overly long > after the loss is also a failure.) He picked an estimator that was > easy to compute and gave good results in the real world. > > If there's reason to believe the estimator today is working less well, we > could obviously replace it. That doesn't mean the RTO model needs fixing. I don't want to fix the RTO model itself - just so I am not misunderstood. I only want to understand the basic limitations. E.g.: The RTT estimate ("SRTT") _has_ to rely on a certain time series of RTT observations taken from the flow. Similar to the sentence: "Nehmen Se de Menschen, wie se sind. Andere jibt et nich.", Konrad Adenauer :-) Or in English (hopefully, the translation is not too bad): Adenauer advised us: Take the people as they are. There are no others. BTW: Sp. T. and delaying a retransmission overly long are basically the same problem. In each statistical test, you have two kinds of errors. The ones of the first kind: Falsely reject a correct zero hypothesis. If your z.h. is "The packet is correctly delivered and acknowledged", a sp. t. is an error of the first kind. Then, there are the ones of the second kind: Falsely "accept" (precisely "not reject", because tests make a decision whether or not to reject a z.hyp.) a wrong zero hypothesis. Back to RTT estimators. You have to rely on a certain time series. Depending at least on your throughput, this series is restricted to a certain "sampling rate". From this, the resolution of the estimator, i.e. its ability to follow network property changes in their original bandwidth, is limited. A concrete example: Properties of a UMTS channel may change extremely quickly. The transport latency for a radio block may change several times even _within_ one IP packet (which may be split into several radio blocks for transmission).
Thus the end-to-end latency for a packet will change several times within one packet transmission. It is obvious that an RTT estimate _cannot_ follow these changes, independent of the chosen estimator. (It is a very rough analogy, but I always think of Shannon's sampling theorem here.) > > Second point is that the RTO model now works in concert with other > mechanisms. I.e. it used to be that we relied only on RTO to determine > if we should retransmit. Now we have Fast Retransmit to catch certain > types of loss. > ...which raises other questions of course, e.g. the question whether packet reordering is negligible or not. However, for the moment I don't think about that. The underlying question in fact is: If I could place a bandwidth restriction upon network property changes (don't ask me how ;-), but for the moment, let's assume I could), which restriction would be enough to allow RTT and variation estimators to follow network properties "quickly enough"? I.e. to keep the risk of spurious timeouts etc. at a constant level? Please note: I do not say _avoid_ here, because in a test, the level of significance _is_ the probability of an error of the first kind. Particularly for spurious timeouts, that means these are not restricted to wireless networks but are an inherent (and inevitable!) part of TCP which is met on _all_ networks. In other words: What (bandwidth) restrictions must be enforced on network properties, to maintain a "constant" level of significance for the RTO test here? I have been thinking about this for weeks now and sometimes, I fear that I have to rely only on simulations for this one. And I must reveal a secret here: I hate simulations. Not only that simulations can "prove" everything and nothing - but sometimes I fear that the NS2 is for networks what Google is for reality..... (Not to be misunderstood: A well done simulation may provide useful insight. However, it does not replace a thorough rationale for proposed mechanisms.)
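[Editorial note: to put rough numbers on the adaptation speed discussed above, here is a minimal sketch of the standard SRTT/RTTVAR estimator (Jacobson's algorithm, as codified in RFC 2988). The RFC's one-second minimum RTO and clock-granularity terms are deliberately omitted so the tracking behaviour stays visible.]

```python
# Minimal sketch of the TCP RTO estimator (Jacobson/Karn, RFC 2988 style).
# ALPHA/BETA/K are the standard constants; units here are milliseconds.
ALPHA, BETA, K = 1/8, 1/4, 4

def rto_update(srtt, rttvar, sample):
    """Fold one RTT sample into (srtt, rttvar); return new state and RTO."""
    if srtt is None:                    # first measurement
        srtt, rttvar = sample, sample / 2
    else:                               # subsequent measurements
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar, srtt + K * rttvar

# Converge on a 100 ms path, then hit the estimator with a step to 300 ms.
srtt = rttvar = rto = None
for _ in range(20):
    srtt, rttvar, rto = rto_update(srtt, rttvar, 100.0)
# rto is now barely above 100 ms: a packet already in flight when the
# path latency jumps to 300 ms will time out spuriously...
srtt, rttvar, rto = rto_update(srtt, rttvar, 300.0)
# ...but a single 300 ms sample pushes RTO back above 300 ms, because
# the variance term reacts within one update.
```

The arithmetic supports the point raised in the message above: the estimator cannot widen its confidence interval before at least one ACK has sampled the new latency, so latency changes faster than roughly one RTT are invisible to it, whatever gains are chosen.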
Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Sat Jul 16 09:52:02 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 16 Jul 2005 18:52:02 +0200 Subject: [e2e] Satellite networks latency and data corruption References: Message-ID: <42D93B32.90601@web.de> Christian Huitema wrote: > > Well, last time I checked that was about 20 years ago, but conditions > probably have not changed too much. > > The error rate depends on the propagation conditions. In the band where > most satellites operate (12/14 GHz), these conditions are affected > mostly by the weather, and more precisely by the presence of > hydrometeors. A large cumulonimbus between antenna and satellite can > drop the transmission balance by 3 to 5 dB. Cumulonimbus can be a few > kilometers wide, so a typical event can last a few minutes, depending on > the speed of the wind. > > The effect on the error rate depends on the engineering of the system. > If the system is "simple" (no FEC), users may see a very low error rate > when the sky is clear, and a rate 1000 times higher during a weather > event. If the system uses FEC, the effect can be amplified, i.e. quasi > no error in clear sky, and a high error rate during the event. > > -- Christian Huitema Just another question. I'm trying to understand a paper "Dynamic Congestion Control to Improve Performance of TCP Split-Connections over Satellite Links" by Lijuan Wu, Fei Peng, Victor C.M. Leung, which appeared in Proc. ICCCN 2004, and I'm about to throw in the towel. I somehow feel it's related work for me, but after a couple of days I still do not see which problem is solved in this paper. Therefore, I would like to continue this discussion a little bit and start with an empty sheet of paper. What I would like to know: -Which bandwidths can be achieved by satellite links? -Which packet corruption rates?
(I know, you said this above, but I'm not an electrical engineer, so I have to translate this into my way of thinking.) As I understand it, you say there are basically two states: Sane: packet corruption errors are negligible. Ill: packet corruption rates are...? Are there typical values? 90 %, 9 %, 0.9 %.....? Referring to the aforementioned paper, a satellite link is "lossy". Now, the authors propose a splitting/spoofing architecture: SND----netw.cloud----P1---satellite link----P2-----netw.cloud----RCV P1, P2: "Proxies". Let's assume split-connection gateways (I-TCP, Bakre/Badrinath, 1994 or 1995). SND-P1 and P2-RCV run TCP; P1-P2 may use some other protocol which is appropriate for satellite links, e.g. STP. My first question is: where do I expect congestion in this scenario, particularly congestion which is not handled by existing congestion control mechanisms? One of the authors told me he expects congestion at P1, because several TCP flows may share the same uplink there. Hm. Wouldn't it be correct to say: there is _no_ unhandled congestion? TCP and STP offer congestion handling, thus none of the three parts of the path suffers from unhandled congestion. Is this correct? So, if we assume STP is working just fine (I don't know, I'm not familiar with STP): what's the problem in this scenario? Personally, I see one. And it seems to be exactly one of the problems I want to overcome with PTE. The problem is that there is no synchronization of rates across the proxies. If the bottleneck were P1-P2, this would result in a _flow_ control problem between RCV and P1: P1 would experience frequent buffer shortages and would have to slow down SND. This problem essentially results from the fact that P1 breaks the TCP self-clocking semantics, at least if there is something like that on P1-P2. If the link P1-P2 were rate-controlled, the self-clocking semantics would not be broken - a self-clocking simply would not exist. Is this way of thinking correct?
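[The missing rate synchronization can be made concrete with a toy model - all rates, buffer sizes and the function itself are invented for illustration, not taken from the paper: if the terrestrial SND-P1 leg delivers faster than the P1-P2 satellite leg drains, and no backpressure couples the two, P1's backlog grows linearly until its buffer overflows.]

```python
# Toy model of proxy P1's buffer when the terrestrial leg delivers faster
# than the satellite leg drains and nothing synchronizes the two rates.
# All figures below are illustrative assumptions.

def p1_backlog(arrival_pps, drain_pps, buffer_pkts, seconds):
    """Return (backlog, drops) at P1 after `seconds` of mismatched rates."""
    backlog, drops = 0, 0
    for _ in range(seconds):
        backlog += arrival_pps          # packets handed over by the SND-P1 TCP leg
        sent = min(backlog, drain_pps)  # satellite leg drains at its own rate
        backlog -= sent
        if backlog > buffer_pkts:       # overflow: P1 must drop (or stall SND)
            drops += backlog - buffer_pkts
            backlog = buffer_pkts
    return backlog, drops

backlog, drops = p1_backlog(arrival_pps=120, drain_pps=100,
                            buffer_pkts=200, seconds=30)
```

[In this sample run the 20 packet/s mismatch fills the 200-packet buffer after ten seconds; from then on P1 must drop or stall SND - the flow control problem described above.]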
Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937

From huitema at windows.microsoft.com Sat Jul 16 15:27:05 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Sat, 16 Jul 2005 15:27:05 -0700 Subject: [e2e] Satellite networks latency and data corruption Message-ID: > What I would like to know: > -Which bandwidths can be achieved by satellite links? It depends. Back in the 1980's, the satellite that we were using featured simple transponders, receiving on one frequency, shifting the signal to another, and amplifying it back. The final stage was a 20 W vacuum tube, and the bandwidth was 35 MHz. The signal was sent on a single beam that covered Western Europe. The ground stations used 3 meter antennas. Using a simple modulation, the transponder's capacity was 35 Mbps. All that can change from system to system. Power on board satellites is still limited, but better electronics on board the satellites and in ground stations can reduce the transmitter and receiver noise -- although I am not sure whether solid state electronics are actually less noisy than vacuum tubes. Some satellites use directed beams, and so can concentrate more power towards the expected receiver. With modern DSP, you can certainly implement much more sophisticated reception algorithms. On the other hand, modern systems tend to use smaller antennas, which are much more practical than a 3 meter diameter dish, but which are also less directive and less powerful. We can certainly carry several hundred Mbps through a satellite, and I would not be surprised if some systems were able to carry several Gbps. But the reality is that there is no such thing as a generic satellite link. Different systems will have different characteristics. > -Which packet corruption rates? (I know, you said this above, but I'm > not an electrical engineer, so I have to translate this into my way of > thinking.
As I understand you say there are basically two states: Sane: > Packet corruption errors are negligible. Ill: Packet corruption rates > are...? Are there typical values? 90 %, 9 %, 0.9 %.....? Again, the values depend on the specific characteristics of the system. The designers will aim for some reasonable point of operation, typically expressed as "a bit error rate of less than X for a fraction Y of the time". The fraction Y will depend on the expected usage -- the classic two, three, five nines. This will determine the worst weather condition under which the system can operate, i.e. conditions that are only met 1%, 0.1%, 0.001% of the time. Then, the designers will pick a desired service level. If they expect to serve voice, they will aim for a bit error rate compatible with the voice compression codec -- typically 10^-5. If they aim for data, they will pick a lower rate -- typically 10^-6 or 10^-7. In some cases, they will propose two levels of circuits, using FEC to upgrade from "voice" to "data" quality. Then, there is the difference between the design spec and the actual implementation. The design spec typically allocates a budget to each of the elements in the chain -- stations, antenna, transponders. The engineers in charge of each component strive to match the allocation, and in many cases do a little bit better than expected. These little gains accumulate, and in the end you may well find out that the actual bit error rate is much lower than the initial spec. Bottom line, you need to actually measure the system for which you are designing the protocol. There is no "one size fits all" answer to your question. > Now, the authors propose a splitting/spoofing architecture: > > SND----netw.cloud----P1---satellite link----P2-----netw.cloud----RCV > > P1, P2: "Proxies". In theory, there is no particular performance benefit to the proxy architecture.
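[As a side note, the bit error rates quoted above translate into packet corruption rates as follows, under the simplifying assumption of independent bit errors and no coding - with FEC, real links instead show the bursty behavior described earlier:]

```python
# Translate a design bit error rate (BER) into a packet corruption rate,
# assuming independent bit errors and no FEC (a deliberate simplification).

def packet_error_rate(ber, packet_bytes=1500):
    """Probability that at least one bit of the packet is corrupted."""
    bits = packet_bytes * 8
    return 1 - (1 - ber) ** bits

# A "voice quality" link (BER 10^-5) corrupts roughly 11% of 1500-byte
# packets; a "data quality" link (BER 10^-7) corrupts roughly 0.12%.
```

[This is why the "sane" state looks loss-free to TCP while the "ill" state can make almost every large packet arrive damaged.]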
If the transmission protocol uses selective retransmission and large windows, the differences in performance between SND-RCV and P1-P2 are truly minimal. In practice, there is only an advantage if the end-to-end implementations are not well tuned, e.g. do not allow for large windows. -- Christian Huitema

From detlef.bosau at web.de Sun Jul 17 05:00:26 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 17 Jul 2005 14:00:26 +0200 Subject: [e2e] Satellite networks latency and data corruption References: Message-ID: <42DA485A.8EE487C4@web.de> Christian Huitema wrote: > > Bottom line, you need to actually measure the system for which you are designing the protocol. There is no "one size fits all" answer to your question. > > > Now, the authors propose a splitting/spoofing architecture: > > > > SND----netw.cloud----P1---satellite link----P2-----netw.cloud----RCV > > > > P1, P2: "Proxies". > > In theory, there is no particular performance benefit to the proxy architecture. If the transmission protocol uses selective retransmission and large windows, the differences in performance between SND-RCV and P1-P2 are truly minimal. In practice, there is only an advantage if the end-to-end implementations are not well tuned, e.g. do not allow for large windows. > The authors of the aforementioned paper claim three problems. 1.: Packet loss rate / loss differentiation. 2.: Long RTT. 3.: Restrictions of window size. My remarks on these: For 3: the authors don't mention or discuss window scaling. For 2: a proxy at the transport layer cannot shorten the RTT as perceived by the _application_. For 1: I'm not quite sure whether loss differentiation is really a problem here. From what you say I conclude that this is a borderline case. However, concerning long RTT in combination with window size, there may be a problem left. It may occur that a flow's fair share of capacity along the satellite link becomes quite large (this is of course load dependent).
Is it possible that it then takes an unwanted number of "rounds" for a sender to achieve the appropriate congestion window? I'm not quite sure about this, because proper window scaling should alleviate this problem. Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937

From detlef.bosau at web.de Sun Jul 17 07:44:18 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 17 Jul 2005 16:44:18 +0200 Subject: [e2e] Once again "Dynamic Congestion Control to Improve Performance of TCP Split-Connections over Satellite Links" References: <42D93B32.90601@web.de> Message-ID: <42DA6EC2.203@web.de> Detlef Bosau wrote: Once upon a time, I wrote: > Just another question. I'm trying to understand a paper, "Dynamic > Congestion Control to Improve Performance of TCP Split-Connections over > Satellite Links" by Lijuan Wu, Fei Peng and Victor C.M. Leung, which appeared in > Proc. ICCCN 2004, and I'm about to throw in the towel. I somehow feel it's > related work for me, but after a couple of days I still do not see > which problem is solved in this paper. ... > > Referring to the aforementioned paper, a satellite link is "lossy". > > Now, the authors propose a splitting/spoofing architecture: > > > SND----netw.cloud----P1---satellite link----P2-----netw.cloud----RCV > > P1, P2: "Proxies". > > Let's assume split-connection gateways (I-TCP, Bakre/Badrinath, 1994 or > 1995). I think I've seen the problem. Normally, I would not bother this list with critical remarks on each paper I read. Surely, this would be annoying and distracting. (However, there are some funny papers around. Is there a list for howlers, where one could discuss the best ones?) However, this is an important one, because I guess the authors here are not the only victims of an NS2 peculiarity; I myself have made this mistake more than once: The authors claim a congestion problem at the "MAC queue" from P1 to the satellite link.
It's the good old "source congestion", which occurs in NS2 because the NS2 implementation does not distinguish TCP senders from routers and thus does not implement the "backpressure to wire speed" which should be part of any well done socket implementation. In NS2, a TCP sender enqueues a packet at the link and afterwards continues its work. Normally, this does not result in problems. In NS2, queue lengths default to 20 packets, as does AWND. Thus, in the startup phase, the "sender queue" may fill up a little until the flow is in the congestion avoidance phase and the packet intervals equalize. However, in special cases, this can be a) a pitfall and b) an _important_ difference to reality. Please correct me if I'm wrong. But at the moment, I guess the authors solve a self-made problem here, which results from a programming simplification in NS2. Again: this simplification is justified in most cases, and implementing backpressure to wire speed in NS2 is a bit tricky. However, the authors have found one of the rare cases where "NS2 source congestion" is a problem ;-) And of course, "backpressure through proxies" _is_ a problem as far as I can see. (I know that some people disagree here.) But to my understanding, the paper mentioned above does not tackle this issue. (However, it is really difficult to put this in a "related work" section in a paper. Ignoring it is wrong - a proper discussion is nearly impossible....) Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937

From KSCOTT at mitre.org Mon Jul 18 06:24:58 2005 From: KSCOTT at mitre.org (Scott,Keith L.) Date: Mon, 18 Jul 2005 09:24:58 -0400 Subject: [e2e] Satellite networks latency and data corruption Message-ID: A few comments: 1. A common claim is that if the proxies are proximate to the satellite channel, then one can assume that loss between the proxies is due to corruption and not congestion.
This requires a number of assumptions, including that the proxies are smart enough to NOT over-drive the satellite link. However, this knowledge can sometimes be useful, see 2b below. 2. While it's true that proxies don't shorten the RTT from the point of view of the application, they do shorten the RTT as seen by TCP. Thus end systems can use standard window sizes and not have to allocate a ton of buffer space to all connections, whether they go over the satellite or not. Automatic buffer tuning mitigates this. 2b. Perhaps more importantly though, proxies allow the parameters of the TCP connection (window size, congestion control algorithm and settings, etc.) to change to accommodate the _local_ network characteristics. This can give dramatically better performance, depending on the applications involved. FTP performance can be substantially improved; stop-and-wait application protocols (e.g. CIFS) will still take a hit from the long (e2e) RTT. 3. The only point I would make here is that if one opens up the default window sizes for all endpoints to get good performance 'just in case' connections go over large BDP channels, it may end up consuming memory for small BDP connections as well. On your last point, consider a point in the P2--RCV network where multiple flows, some of them over the satellite and some of them short RTT, share a link. If there is congestion loss on that link, the long-RTT flow will be 'unfairly' penalized in the recovery process, and the short-RTT flows will get more of the link bandwidth. In the proxy environment, the P2--RCV flows that go over the satellite are 'short-RTT' on the P2--RCV part of the path, and so compete on a more even basis with other flows. 
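[The RTT unfairness in the last point can be quantified with the steady-state throughput model of Mathis et al., B ≈ MSS / (RTT * sqrt(2p/3)): at equal loss probability, throughput is inversely proportional to RTT. The RTT and loss figures below are illustrative assumptions, not measurements:]

```python
import math

# Steady-state TCP throughput model (Mathis et al., CCR 1997):
#   B ≈ MSS / (RTT * sqrt(2p/3))
# At the same loss probability p, throughput scales as 1/RTT, so on a
# shared congested link the long-RTT (satellite) flow loses out.

def mathis_throughput(mss_bytes, rtt_s, loss_prob):
    """Approximate steady-state throughput in bytes/second."""
    return mss_bytes / (rtt_s * math.sqrt(2 * loss_prob / 3))

p = 0.01                                     # assumed shared loss rate
short = mathis_throughput(1460, 0.040, p)    # 40 ms terrestrial flow
long_ = mathis_throughput(1460, 0.600, p)    # 600 ms GEO-satellite flow
ratio = short / long_                        # = 600/40 = 15, independent of p
```

[Splitting at P2 turns the satellite flow into a short-RTT flow on the contested P2--RCV segment, which is exactly the fairness argument above.]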
--keith

From detlef.bosau at web.de Mon Jul 18 16:19:01 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 19 Jul 2005 01:19:01 +0200 Subject: [e2e] Satellite networks latency and data corruption References: Message-ID: <42DC38E5.3BD9BB11@web.de> "Scott,Keith L." wrote: > > A few comments: > > 1. A common claim is that if the proxies are proximate to the satellite > channel, then one can assume that loss between the proxies is due to > corruption and not congestion. This requires a number of assumptions, > including that the proxies are smart enough to NOT over-drive the > satellite link. However, this knowledge can sometimes be useful, see 2b > below. > That's the reason why I asked for error rate estimates. When I refer to Christian Huitema's post, I'm not quite sure whether we need proxies around the satellite link or whether FEC will suffice. It is, however, a question. Besides the posts by Christian Huitema, Adrian Hooke just sent me some references to helpful material here. The central question here is indeed whether proxies really make sense. If so, of course a number of questions arise, as you point out in 2b. In particular, I would expect the use of _rate_-controlled protocols on the satellite link to be advantageous over window-based protocols, because in window-based protocols it takes a number of "rounds", and therefore a long time, to achieve equilibrium. One could investigate architectures which couple window-based TCP segments with rate-controlled ones and ask for possible problems and their solutions. However, before I dive into the gory details here, I want to avoid flogging a dead horse. And at least from Christian's posts I learned: the horse could have more vitality.... > 2.
While it's true that proxies don't shorten the RTT from the point of > view of the application, they do shorten the RTT as seen by TCP. Thus This is clear. This is the rationale behind a proxy: to shorten the RTT at the transport layer, to hide a large bandwidth-delay product, etc. However, I often deliberately take a user's perspective. The central question is: what is the benefit to the user if we introduce a proxy? I think any research must never ignore this question. If a proxy shortens the RTT at the transport layer and there is no benefit to the user, that would be maximum effort with minimum effect. And if satellite links won't be used for interactive applications, due to the large RTT, but only for long-term flows, I really wouldn't care about five or ten rounds (i.e. 2.5 or 5 seconds) for a flow to achieve equilibrium, if the subsequent "download of the new RedHat ISO" lasts two hours anyway. > end systems can use standard window sizes and not have to allocate a ton > of buffer space to all connections, whether they go over the satellite > or not. Automatic buffer tuning mitigates this. Should we distinguish between LTN and LFN here? LTN: an Internet backbone, where a flow's fair share of capacity is a few packets. The network is "thin". LFN: a user connected to the Internet via a satellite phone. In that case, the fair share of capacity may indeed become quite large. The network is "fat". In the case of "LTN" (from a user's point of view) I'm not totally convinced that the situation is that much different from intercontinental terrestrial deep-sea cable links.
If there is congestion loss on that link, the > long-RTT flow will be 'unfairly' penalized in the recovery process, and > the short-RTT flows will get more of the link bandwidth. In the proxy Hm. Is this behaviour really _that_ terrible? It's only a question. But to the best of my knowledge, the vast majority of TCP flows are short-term flows (fewer than 20 packets in about 95 % of flows). These are often interactive dialogs where a user requests quick responses from the system. However, if satellite links are used for long-term flows (e.g. large file transfers), who cares about some unfairness which causes a long-term flow to last 10 percent longer? Or 20 percent? I don't know. However, it's a question of relevance, and therefore the question of "the horse". And I have even heard the opinion: everyone is interested in bulk transfers, perhaps we can ignore some short-term flows. My opinion on this one is: the Internet is made for the users and not vice versa. And when a bulk transfer lasts one hour and five minutes instead of one hour, hardly anyone would really mind. However, when a WWW server's response time increases from ten seconds to eleven..... Have you ever worked at a user's help desk? I have....... A purely academic view will of course be quite different from that. At least because long-term flows are much easier to deal with ;-) But again: research must not ignore reality and users' requirements. So, I see bulk transfers as a simplification. The really interesting thing is user interaction and responsiveness, because that defines the "look and feel" of the Internet and therefore its acceptance by the users. Consequently, it may be worthwhile to (re)mount the DiffServ horse here: e-mail is not the same as WWW. A remote login via SSH is not the same as a download of the latest RedHat ISOs. I can very well imagine treating different things in different ways. This may be more obvious in mobile wireless systems than in satellite links.
Particularly in mobile wireless networks you always immediately encounter tradeoffs when you consider the use of PEPs: large buffers vs. varying bandwidth, optimum throughput vs. optimum responsiveness, etc. Whereas DiffServ may be somewhat academic for the wired Internet, I can very well imagine that it may become helpful in systems with any kind of PEP. (Eventually, someone will use the TOS bits =8-)) Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937

From detlef.bosau at web.de Sun Jul 24 02:08:55 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 24 Jul 2005 11:08:55 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulnerabilities References: <20050715174936.D1CC61FF@aland.bbn.com> <42D80C0E.9010404@web.de> Message-ID: <42E35AA7.60301@web.de> I'm somewhat confused that apparently hardly anyone is interested in this topic. Perhaps it's a stupid one. Then please explain to me why. Perhaps I did not pose my questions clearly enough. I will give it another try. Q1: What is the semantics of RTO? Is it correct to see RTO as a confidence interval for the RTT? If so, I'm particularly confused by quite a couple of papers concerning "spurious timeouts". Sometimes I got the impression that spurious timeouts are some "strange phenomenon" which was "detected" by chance or by accident. I don't know. In addition, years ago a professor told me the formula RTO=RTT+2VAR was found by "probing", "experiments". O.k. He is a professor, not me... So he must be right there ;-) So once again: is RTO commonly seen as a confidence interval for RTT or not? Craig wrote: >> I believe the immediate issue is not the "RTO model" but rather the >> question of what RTO estimator you use. In the late 1980s there was >> a crisis of confidence in RTO estimators -- a problem we dealt with by What's the meaning of confidence here?
I use "confidence interval" in its mathematical sense. An interval I is a p-confidence interval for a stochastic variable X if an instance x of X lies in the interval I with probability p. To my understanding, it's important for competing TCP flows to use similar or equal confidence intervals here, otherwise we would hardly achieve fairness. So, we have basically two issues here. The first one is the robustness issue. How robust are RTO/RTT/... estimates? I don't want to discuss this here, because this is basically not a TCP-related question. It's the question whether it is at least _possible_ to estimate an RTT. And this is a requirement for the network itself and its structure. To my knowledge, there are quite a few papers around dealing with "self-similarity". I'm not quite sure, but if e2e latencies were in fact "self-similar" (I use quotation marks here because the term "self-similar" is sometimes used without a satisfactory mathematical definition), we could stop the discussion here. In that case, there would be hardly any chance to have acceptable RTT estimates. (I'm no expert here, but estimators often converge due to the SLLN or similar theorems, and there is an assumption "i.i.d." in them: identically and _independently_ distributed. In a self-similar series of stochastic variables I _strongly_ doubt their independence.) So, at least _one_ assumption about a network is inevitable in order to use sender-initiated, timeout-based retransmission: convergent estimators for the timeout must exist. Unfortunately, a priori we do not know about possible limitations of the RTT; in particular, there is no general upper limit. So it is somewhat cumbersome to derive a 1-alpha confidence interval directly from the sample here. In fact, it is a common approach in statistics to derive confidence intervals from estimates for the expectation and variation of a stochastic variable. Often there is some implicit assumption about the distribution function of this variable, e.g. Gaussian.
So, if we use RTT and VAR (as we do in TCP), we implicitly assume that estimators for RTT and VAR _exist_. But in principle, these estimators are not defined by TCP, they are _assumed_ by TCP. Basically, we _assume_ the existence of RTT/VAR/RTO estimators here and then we use them. And hopefully, we use appropriate ones for the packet-switched network in use. So once again, very briefly: the RTO used in TCP is a confidence interval for the RTT. TCP _assumes_ (if only implicitly) the existence of a reasonable RTO estimator. Is this correct? O.k. Then the next steps are: -Identification of a generic estimator, if possible. -Identification and elimination of vulnerabilities. >> developing Karn's algorithm (to deal with retransmission ambiguity) and >> improving the RSRE estimation algorithm with Van Jacobson's replacement. O.k. Let's ignore the retransmission ambiguity for the moment. (An easy way to overcome this would be to mark each TCP datagram sent with a unique identifier, which is reflected in the corresponding ACK. In particular, if a TCP datagram is sent more than once, it would be given a different identifier each time it is sent. AFAIK this is the rationale behind the "sequence number" in ICMP.) Q2: What are other vulnerabilities and implicit assumptions? -Are there assumptions concerning the latency distribution? -Are there assumptions concerning the latency _stability_? What about latency oscillations? In other words: what is the system model behind the RTT estimators used in TCP? What are the _requirements_ for TCP to work properly? Can we make implicit assumptions explicit? Which requirements must be met by a network so that TCP can work without problems? Is this question stupid? If not: is there existing work on this issue? If so, I would appreciate any hint.
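[Q1 can at least be probed empirically: feed the standard smoothed estimators i.i.d. RTT samples and count how often the next sample exceeds SRTT + 4*RTTVAR, i.e. measure the empirical "level of significance" of the RTO test. The Gaussian RTT model below is purely an assumption for illustration:]

```python
import random

# Empirical check of "RTO as a one-sided confidence interval": with the
# standard smoothed estimators (gains 1/8 and 1/4), how often does the
# next RTT sample exceed SRTT + 4*RTTVAR?  Gaussian RTTs are assumed.

def timeout_fraction(n=100_000, mean=0.2, sigma=0.02, seed=1):
    rng = random.Random(seed)
    srtt, rttvar = mean, mean / 2        # RFC-style initialization
    late = 0
    for _ in range(n):
        rtt = max(0.0, rng.gauss(mean, sigma))
        if rtt > srtt + 4 * rttvar:      # would be a spurious timeout
            late += 1
        rttvar = 0.75 * rttvar + 0.25 * abs(srtt - rtt)
        srtt = 0.875 * srtt + 0.125 * rtt
    return late / n
```

[For a Gaussian RTT the threshold settles at roughly 3.2 standard deviations above the mean (the mean deviation of a Gaussian is about 0.8 sigma), so spurious timeouts are rare but, as argued above, never impossible; for heavier-tailed latency distributions the fraction grows.]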
Detlef Bosau -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937

From fmenard at xittelecom.com Sun Jul 24 04:07:13 2005 From: fmenard at xittelecom.com (Francois Menard) Date: Sun, 24 Jul 2005 06:07:13 -0500 (EST) Subject: [e2e] peer to peer, e2e, PKI authentication, trust chain discovery, management and capabilities exchange In-Reply-To: <42E35AA7.60301@web.de> References: <20050715174936.D1CC61FF@aland.bbn.com> <42D80C0E.9010404@web.de> <42E35AA7.60301@web.de> Message-ID: What is the latest state-of-the-art thinking towards peer-to-peer, e2e, PKI authentication, trust chain discovery, management and capabilities exchange? -=Francois=- -- Francois D. Menard Project Manager/Charge de projet Xit telecom inc. 1350 Place Royale Suite 800 Trois-Rivieres, QC, G9A 4J4 +1 819 692 1383 fmenard at xittelecom.com

From dpreed at reed.com Sun Jul 24 06:03:40 2005 From: dpreed at reed.com (David P. Reed) Date: Sun, 24 Jul 2005 09:03:40 -0400 Subject: [e2e] peer to peer, e2e, PKI authentication, trust chain discovery, management and capabilities exchange In-Reply-To: References: <20050715174936.D1CC61FF@aland.bbn.com> <42D80C0E.9010404@web.de> <42E35AA7.60301@web.de> Message-ID: <42E391AC.4050903@reed.com> Francois - The sarcastic answer is "walled gardens" based on "firewalls". That is, it's not in the interest of anyone but us out-of-favor hackers to focus on security and resiliency solutions that don't involve the operators. The operators are heavily invested in marketing and selling "security" (the latest ridiculous idea is IRAP; the next-most-recent is elimination of the "analog hole" in DRM solutions that involve the transport in every aspect of the application's business). So instead of true resilience and security, we get operators who think they should "provide" security - after all, they "provide" the whole damn Internet, don't they?
I spend a lot of time with operators through the MIT Communications Futures Program. *None* of their management seem interested in peer-to-peer solutions to security issues. However, some of the equipment vendors (those that dare to wave a sharp stick towards the eye of Verizon Wireless, their slavemaster) are exploring edge-based security solutions, which Verizon promptly removes. - David

From s.malik at tuhh.de Sun Jul 24 08:12:28 2005 From: s.malik at tuhh.de (sireen malik) Date: Sun, 24 Jul 2005 17:12:28 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulnerabilities Message-ID: <42E3AFDC.50607@tuhh.de> Hi, My $2/100. Self-similarity (SS) on its own is not a problem - consider Brownian motion. However, the property that the sum of correlations is infinite (long-range dependence) has the potential to make performance deviate very significantly from classical queueing behavior. Note: when people talk about self-similarity in Internet traffic, they are in fact talking about "second-order asymptotic self-similarity". The central limit theorem is applicable under the condition of i.i.d. samples and, more importantly, the "finite variance" assumption. One way of modeling Internet traffic is fractional Brownian motion (fractional Gaussian noise). It has infinite variance and non-summable long-range correlations. So theoretically, the CLT does not apply to Internet traffic. In fact, mathematicians talk about the generalized CLT, i.e., convergence to an alpha-stable distribution. Then TCP's RTO estimator looks to be in trouble under the CLT assumption. However, that is theory. In my opinion, the following are some significant practical considerations: * Buffers are finite, so buffering delays are bounded. So E2E delay and the resultant RTTs will never be infinite.
* If we assume that packets are of fixed size then a fairly utilized queue system will start putting deterministic/constant gaps in the departure process. Recall Prof. Paul Kuhn's (Univ Stuttgart) formula for G/G/1 system under decomposition. It predicts a perfectly deterministic departure process for a 100% utilized queue. Theoretical, but it does make sense! * Buffers operate at small time scales, while SS is a large time asymptotic property...so is it relevant at all to the question question? * The heavy-tailed file-sizes distribution, linked with the LRD in the Internet traffic, is truncated in practice. There is no such thing as "file of infinite size". If so, we will have a heavy-tailed distribution with "finite" variation. So correlations should only be visible for some orders of magnitude of time-scales, and not at "all" times scales. Therefore the process should not be LRD in the the pure sense, rather Short Range Dependent. So my feeling is that E2E and resultant RTT distribution has finite mean and finite variance. Therefore, the assumption of CLT is not a bad one for practical applications. regards, Sireen Malik- Hamburg > I don?t want to discuss this here, because this is bascially no TCP > related question. It?s the question if it is at least _possible_ to > estimate a RTT. And this is a requirement for the network itself and > its structure. To my knowledge, there are quite a few papers around > dealing with "self similarity". I?m not quite sure, but if e2e > latencies were in fact "self similar" (I use "" here because the term > "self similar" is sometimes used without a satisfactory mathematical > definition), we could stop the discussion here. In that case, there > would be hardly any chance to have acceptable RTT estimates. (I?m no > expert here, but estimators often converge due to the SLLN or similar > theorems and there is an assumption "i.i.d." in it. Identically and > _independently_ distributed. 
> In a self similar series of stochastic variables I _strongly_ doubt > their independence.) > So, at least _one_ assumption for a network is inevitable in order to > use sender initiated, timeout based retransmission: Convergent > estimators for the timeout must exist. > Unfortunately, a priori we do not know about possible limitations of RTT, > particularly there is no general upper limit. So it is somewhat > cumbersome to derive a 1-alpha confidence interval directly from the > sample here. In fact, it is a common approach in statistics to derive > confidence intervals from estimates for the expectation and variance of a > stochastic variable. Often there is some implicit assumption about the > distribution function of this variable, e.g. gaussian. So, if we use > RTT and VAR (as we do in TCP), we implicitly assume that estimators for > RTT and VAR _exist_. > > But in principle, these estimators are not defined by TCP, they are > _assumed_ by TCP. Basically, we _assume_ the existence of > RTT/VAR/RTO estimators here and then we use them. And hopefully, we > use appropriate ones for the packet switched network in use. > > So once again and very short: > > The RTO used in TCP is a confidence interval for RTT. > TCP _assumes_ (if implicitly) the existence of a reasonable RTO > estimator. > > Is this correct? > > O.k. > > Then the next steps are: > -Identification of a generic estimator, if possible. > -Identification and elimination of vulnerabilities. > >>> developing Karn's algorithm (to deal with retransmission ambiguity) and >>> improving the RSRE estimation algorithm with Van Jacobson's >>> replacement. >> >> > > O.k. Let's ignore the retransmission ambiguity for the moment. > (An easy way to overcome this would be to mark each TCP datagram sent > with a unique identifier, which is reflected in the corresponding ACK.
> Particularly, if a TCP datagram is sent more than once, it would be > given a different identifier each time it is sent. AFAIK this is the > rationale behind the "sequence number" in ICMP.) > > Q2: What are other vulnerabilities and implicit assumptions? > > -Are there assumptions concerning the latency distribution? > -Are there assumptions concerning the latency _stability_? What about > latency oscillations? > > In other words: What is the system model behind the RTT estimators > used in TCP? > > What are the _requirements_ for TCP to work properly? Can we make > implicit assumptions explicit? > > Which requirements must be met by a network so that TCP can work > without problems? > > Is this question stupid? If not: Is there existing work on this issue? > If so, I would appreciate any hint. > > Detlef Bosau > > From huitema at windows.microsoft.com Sun Jul 24 12:52:52 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Sun, 24 Jul 2005 12:52:52 -0700 Subject: [e2e] peer to peer, e2e, PKI authentication, trust chain discovery, management and capabilities exchange Message-ID: > Francois - The sarcastic answer is "walled gardens" based on > "firewalls". That is, it's not in the interest of anyone but us > out-of-favor hackers to focus on security and resiliency solutions that > don't involve the operators. On a less sarcastic note, you can check the Microsoft efforts for deploying IPSEC (http://www.microsoft.com/ipsec/) and in particular the "domain isolation" scenario for deploying end-to-end protection inside an enterprise. I would personally love to see the scenario extended to Internet-wide protection, but there are practical difficulties to overcome. As the subject header mentions, one of these difficulties is end-to-end authentication, which seems only possible so far within well constrained communities.
-- Christian Huitema From MURSHEDM at uiu.edu Sun Jul 24 17:27:54 2005 From: MURSHEDM at uiu.edu (MURSHEDM) Date: Sun, 24 Jul 2005 19:27:54 -0500 Subject: [e2e] RED simulation problems Message-ID: <5151ECD6856C0F4493E3DD985A64671F01765CA7@uiu-ex1.uiu.uiu.edu> Hi all, Can anybody please explain the following ns-2 simulation result? I have a RED router that serves exactly one packet after receiving 5 packets, because 5 sources are connected to it and they will send a burst of 15 packets. For the first 5 packets (the queue is empty) the router will serve only one packet, so there should be 4 packets in the queue before the next 5 packets arrive. Surprisingly, at the end of serving one packet the trace data shows only 3 packets (but by my calculation and settings there should be 4 packets, and the nam simulation also shows 4 packets in the queue!). It fails only for the first packet arrival; in all other situations it gives the correct, expected result. For example, the last two lines of the result should be (a 0.014 0.01996 Q 0.014 4), not (a 0.047 0.01996 Q 0.047 4). I changed the router service rate but still got the same result. At the beginning the trace is always wrong. Can anybody please explain this to me? How can I fix it? Any thoughts or suggestions? It's really urgent, please help me if you have any clue. Please send responses to murshedm at uiu.edu a 0.014 0.002 Q 0.014 1 a 0.014 0.005996 Q 0.014 2 a 0.014 0.011984 Q 0.014 3 a 0.047 0.01996 Q 0.047 4 Thanks and regards Murshed From Jon.Crowcroft at cl.cam.ac.uk Mon Jul 25 04:22:10 2005 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Mon, 25 Jul 2005 12:22:10 +0100 Subject: [e2e] peer to peer, e2e, PKI authentication, trust chain discovery, management and capabilities exchange In-Reply-To: Message from "Christian Huitema" of "Sun, 24 Jul 2005 12:52:52 PDT."
Message-ID: there's loads of work in the research community on how to do infrastructure-free trust community building (e.g. for p2p and manet) recent PhDs at eurecom and imperial (can't recall the imperial college one, but pietro michiardi's work with refik molva on CORE is worth a look) there are some breakthroughs on how to bootstrap the systems (e.g. preventing sybil attacks undermining the usual reputation based strategic learning systems) there are ideas kicking around for using social networks (based on recommendation but limiting damage to size of community by degree distribution of acquaintances)... i don't know what I would cite as state of the art, but it's a lively area the usual limits on decentralised trust are how many witnesses you have to have to detect bad players - in overlay p2p this is well known but in MANETs or nets where you can triangulate on bad players from multiple points, I think there's probably a much better bound an interesting feature of the p2p trust management systems is that while they only give you statistical trust (and some with do risk evaluation so you can use trust+risk to compute expected gain/loss like eBay etc), when there is an infrastructure too, they are as good as you can get, but configuration free...so much more deployable than traditional PKIs - of course, they have the problem that human understanding of risk is very poor but hey, what's new? i can't wait to see an article on how many unlikely bad things can you experience before breakfast: the net effect from pier-to-peer, Phish _and_ spam?
what a menu:-) From fmenard at xittelecom.com Mon Jul 25 05:51:42 2005 From: fmenard at xittelecom.com (Francois Menard) Date: Mon, 25 Jul 2005 07:51:42 -0500 (EST) Subject: [e2e] peer to peer, e2e, PKI authentication, trust chain discovery, management and capabilities exchange In-Reply-To: References: Message-ID: On Mon, 25 Jul 2005, Jon Crowcroft wrote: > an interesting feature of the p2p trust management systems is that while they only give you statistical trust (and > some with do risk evaluation so you can use trust+risk to compute expected gain/loss like eBay etc), > when there is an infrastructure too, they are as good as you can get, but configuration free...so much more > deployable than traditional PKIs - of course, they have the problem that human understanding of risk is very poor > but hey, whats new ? > > i can't wait to see an article on > how many unlikely bad things can you experience before breakfast: > the net effect from pier-to-peer, Phish _and_ spam? what a menu:-) It seems that it scales in proportion of how much people, food and knowledge transfer you can mix during a breakfast key signing party. -=Francois=- From craig at aland.bbn.com Mon Jul 25 11:21:08 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Mon, 25 Jul 2005 14:21:08 -0400 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: Your message of "Sun, 24 Jul 2005 11:08:55 +0200." <42E35AA7.60301@web.de> Message-ID: <20050725182108.788561FF@aland.bbn.com> In message <42E35AA7.60301 at web.de>, Detlef Bosau writes: >Craig wrote: > >>> I believe the immediate issue is not the "RTO model" but rather the >>> question of what RTO estimator you use. In the late 1980s there was >>> a crisis of confidence in RTO estimators -- a problem we dealt with by > >What´s the meaning of confidence here? 
I wrote in the non-mathematical sense -- if you read Lixia Zhang's "Why TCP Timers Don't Work" or Raj Jain's paper on the failings of all known methods of RTT estimation at the time, you'll see there was a view that said, perhaps, it wasn't possible to measure RTT correctly. >So, at least _one_ assumption for a network is inevitable in order to >use sender initiated, timeout based retransmission: Convergent >estimators for the timeout must exist. > >Unfortunaly, a priori we do not know about possible limitaions of RTT, >particularly there is no general upper limit. So it is somewhat >cumbersome to derive an 1-alpha confidence interval directly from the >sample here. In fact, it is a common approach in statistics, to derive >confidence intervals from estimates for expectation and variation of a >stochastic variable. Often there is some implicit assumption about the >districution function of this varible, e.g. gaussian. Right -- we do not know the distribution function for the RTT. The key issue (and the one that Zhang and Jain raised) is that periodically the path changes such that the RTT changes wildly. More in following note. Craig From craig at aland.bbn.com Mon Jul 25 11:26:20 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Mon, 25 Jul 2005 14:26:20 -0400 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: Your message of "Sun, 24 Jul 2005 11:08:55 +0200." <42E35AA7.60301@web.de> Message-ID: <20050725182620.265801FF@aland.bbn.com> I split out the more formal discussion. In message <42E35AA7.60301 at web.de>, Detlef Bosau writes: >The RTO used in TCP is a confidence interval for RTT. >TCP _assumes_ (if implicitly) the existence of a reasonable RTO estimator. > >Is this correct? Yes. > >Q2: What are other vulnerabilities and implicit assumptions? > >-Are there assumptions concerning the latency distribution? I don't know that the question has been asked in quite this way in the past. 
As best I can answer it, I believe the assumption is the following: * there may be multiple round-trip paths in use at the same time for the same TCP connection. * for a particular path the following is true: + there is a minimum latency (the base RTT) PLUS + some distribution of added delay which is non-gaussian >-Are there assumptions concerning the latency _stability_? What about >latency oscillations? Yes. A path is assumed to be stable only for some (undetermined but non-negligible) period of time after which it may change and the RTT may change by orders of magnitude. It is possible to bounce between two or more paths. >Is this question stupid? If not: Is there existing work on this issue? >If so, I would appreciate any hint. Not much was written down. The literature mostly dates from a flurry of work in the late 1980s. If you want to pursue this formally, I think the field is clear before you. Craig From dpreed at reed.com Mon Jul 25 12:59:29 2005 From: dpreed at reed.com (David P. Reed) Date: Mon, 25 Jul 2005 15:59:29 -0400 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: <20050725182620.265801FF@aland.bbn.com> References: <20050725182620.265801FF@aland.bbn.com> Message-ID: <42E544A1.8020706@reed.com> The most fundamental problem with RTO estimates in the Internet is that the most significant sources of measured variation (queueing delay, for example) are variables that are being used as signalling channels between multiple independent goal-seeking processes at multiple levels. Note that the load distribution cannot be characterized by a stable a priori description, because load is itself responsive at all timescales to behavior of humans (users, app designers, cable plant investors, pricing specialists, arbitrage experts, criminal hackers, terrorists, network installers, e-commerce sites, etc.) 
So you are fooling yourself if you start with a simple a priori model, even if that model passes so-called "peer review" (also called a folie a deux - mutually reinforcing hallucinations about reality) and becomes the common problem statement for a generation of graduate students doing network theory. In my era, the theorists all assumed that Poisson arrival processes were sufficient. These days, "heavy tails" are assumed to be correct. Beware - there's much truth and value, but also a deep and profound lie, in such assertions and conventional wisdoms. Those of you who understand the profound difference between Bayesian and Classical statistical inference will understand ... That said, From craig at aland.bbn.com Mon Jul 25 13:08:49 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Mon, 25 Jul 2005 16:08:49 -0400 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: Your message of "Mon, 25 Jul 2005 15:59:29 EDT." <42E544A1.8020706@reed.com> Message-ID: <20050725200849.CA8061FF@aland.bbn.com> In message <42E544A1.8020706 at reed.com>, "David P. Reed" writes: >The most fundamental problem with RTO estimates in the Internet is that >the most significant sources of measured variation (queueing delay, for >example) are variables that are being used as signalling channels >between multiple independent goal-seeking processes at multiple levels. Hi Dave: Right -- they are variables, not independent activities. However, you need a better example -- queueing is not a major source of delay in most of the Internet these days. And the one place it seems to be present, in wireless networks -- the issue is often channel access delays (which is probably a better illustration of your point anyway). 
Craig From detlef.bosau at web.de Mon Jul 25 13:57:24 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 25 Jul 2005 22:57:24 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> Message-ID: <42E55234.6060100@web.de> O.k., so welcome to the "cage aux folles" :-) *SCNR* And perhaps I'm a real fool, because much of what you say is in fact new to me. David P. Reed wrote: > The most fundamental problem with RTO estimates in the Internet is that > the most significant sources of measured variation (queueing delay, for > example) are variables that are being used as signalling channels > between multiple independent goal-seeking processes at multiple levels. Let me put an example in my own words to see whether I understood your remark correctly. And please be patient, because I'm a beginner here. An example of what you have said could be the congestion signaling between a congested router queue and a TCP source. Is this correct? > > Note that the load distribution cannot be characterized by a stable a > priori description, because load is itself responsive at all timescales > to behavior of humans (users, app designers, cable plant investors, > pricing specialists, arbitrage experts, criminal hackers, terrorists, > network installers, e-commerce sites, etc.) > O.k.. If we leave out Al Quaida for the moment (apparently, I can't, because Al Quaida's target has moved and appears to be now Good Ol' Europe...) the question is: What are the invariants, at least for the lifetime of a TCP connection? E.g.: - Does a path change during that lifetime? - How often do we encounter multipath routing?
> So you are fooling yourself if you start with a simple a priori model, > even if that model passes so-called "peer review" (also called a folie a > deux - mutually reinforcing hallucinations about reality) and becomes > the common problem statement for a generation of graduate students doing > network theory. In my era, the theorists all assumed that Poisson > arrival processes were sufficient. These days, "heavy tails" are > assumed to be correct. Beware - there's much truth and value, but also > a deep and profound lie, in such assertions and conventional wisdoms. As I said, I'm an absolute beginner here. But when you simply look at the assumptions made for the definition of a Poisson process, it's really heavy stuff. And I sometimes wonder where the justification for those assumptions comes from. You can extend this to Markov chains etc. One of the first lessons I've learned from textbooks about stochastic processes is that Markovian processes are really nice - however, reality is not quite as nice; it typically is not Markovian....;-) However, I'm not quite sure whether we really need explicit knowledge of a latency distribution to establish a confidence interval. Of course, it's comfortable if a given statistic is always N(0,1) distributed. If you want a symmetric 0.9975 confidence interval, you can simply look it up in a table. And if your distribution is N(\mu,\sigma), o.k., that's more difficult, left to the reader ;-) However, from what I've seen in measurements from real networks, I'm afraid network latencies won't do us the favour of obeying such simple distributions. And what seems really nasty to me is that the often used "asymptotic" distributions will perhaps not hold here, because although long term flows are perhaps comfortable to deal with, they are rare in reality. I've read statistics and observations that claim that 95% of all TCP flows consist of fewer than 20 packets in their lifetime.
> > Those of you who understand the profound difference between Bayesian and > Classical statistical inference will understand ... O.k., admittedly I do not... So, I have to learn about it. -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Mon Jul 25 14:08:50 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 25 Jul 2005 23:08:50 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites References: <20050725200849.CA8061FF@aland.bbn.com> Message-ID: <42E554E2.5000107@web.de> Craig Partridge wrote: > > > Hi Dave: > > Right -- they are variables, not independent activities. > > However, you need a better example -- queueing is not a major source of > delay in most of the Internet these days. And the one place it But isn't it a source of delay variation? > seems to be present, in wireless networks -- the issue is often > channel access delays (which is probably a better illustration of your > point anyway). Craig, if you told me that RTO estimation is a solved problem in wirebound networks and an open issue in wireless ones, I would love you for that.... (Oh, I apologize, we're not really in the cage of folles here....;-)) In that case, it would be sufficient to hide wireless delay variations from the wirebound network. Right? (I always talk about wide area mobile wireless networks, not WLAN etc., but if you talk about channel access delays, we apparently talk about the same thing.) However: would it be at least _helpful_ to separate latency variations in wirebound and wireless network paths?
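For concreteness, the estimator under discussion is the classic exponentially weighted SRTT/RTTVAR scheme with RTO = SRTT + 4*RTTVAR (constants as in RFC 2988). A minimal sketch in Python; the Gaussian and Pareto RTT samples are purely illustrative stand-ins for a "well-behaved" and a "heavy-tailed" path, not measurements:

```python
import random

def make_rto_estimator(alpha=1/8, beta=1/4, k=4, min_rto=0.0):
    """Jacobson-style SRTT/RTTVAR/RTO update (RFC 2988 constants).
    RFC 2988 additionally clamps the RTO to at least 1 second; min_rto=0
    here so the raw behavior of the estimator stays visible."""
    state = {"srtt": None, "rttvar": None}

    def update(r):
        if state["srtt"] is None:        # first sample initializes the state
            state["srtt"], state["rttvar"] = r, r / 2
        else:
            state["rttvar"] = (1 - beta) * state["rttvar"] + beta * abs(state["srtt"] - r)
            state["srtt"] = (1 - alpha) * state["srtt"] + alpha * r
        return max(min_rto, state["srtt"] + k * state["rttvar"])

    return update

random.seed(1)
rto = make_rto_estimator()
for _ in range(1000):   # well-behaved path: RTT ~ N(200 ms, 20 ms)
    calm = rto(random.gauss(0.2, 0.02))
for _ in range(1000):   # heavy-tailed path: scaled Pareto(1.5), infinite variance
    heavy = rto(0.2 * random.paretovariate(1.5))
```

On the Gaussian samples the RTO settles a little above the mean RTT; on the Pareto samples the occasional huge observations keep jerking SRTT and RTTVAR around, which is exactly the fragility being questioned in this thread.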
Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From s.malik at tuhh.de Tue Jul 26 06:58:16 2005 From: s.malik at tuhh.de (Sireen Habib Malik) Date: Tue, 26 Jul 2005 15:58:16 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: <42E544A1.8020706@reed.com> References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> Message-ID: <42E64178.709@tuhh.de> Hi, A technical discussion on heavy-tailed distributions is perhaps not relevant to this list, however, one gets an impression that these distributions are not relevant/suitable from Internet's point of view. > "Note that the load distribution cannot be characterized by a stable a > priori description, because load is itself responsive at all > timescales to behavior of humans (users, app designers, cable plant > investors, pricing specialists, arbitrage experts, criminal hackers, > terrorists, network installers, e-commerce sites, > etc.)".............................and......."you are fooling yourself > if you start with a simple a priori model, even if that model passes > so-called "peer review" (also called a folie a deux - mutually > reinforcing hallucinations about reality) and becomes the common > problem statement for a generation of graduate students doing network > theory. In my era, the theorists all assumed that Poisson arrival > processes were sufficient. These days, "heavy tails" are assumed to > be correct. Beware - there's much truth and value, but also a deep > and profound lie, in such assertions and conventional wisdoms. " From a network's point of view, the users (of all kinds) generate data - let us call it their on-phase. After downloading/generating data, they go into thinking- or reading-phase. This is the off-phase. The users remains in the on- and off-phase for randomly distributed times. Each user cycles through this On-Off behavior. 
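A toy version of such an on-off source aggregate, assuming Pareto-distributed on/off durations (shape 1.5 gives finite mean but infinite variance; the source count, horizon, and shape are illustrative choices, not measured values):

```python
import random

def on_off_trace(n_sources=20, horizon=5000, shape=1.5, seed=7):
    """Per-slot aggregate load of n_sources on-off sources whose on/off
    durations are Pareto(shape); each source emits 1 unit per slot while on.
    Heavy-tailed durations (shape < 2) are what push the aggregate away
    from Poisson-like behavior toward long-range dependence."""
    random.seed(seed)
    load = [0] * horizon
    for _ in range(n_sources):
        t, on = 0, random.random() < 0.5             # random initial phase
        while t < horizon:
            dur = int(random.paretovariate(shape)) + 1   # heavy-tailed duration
            if on:
                for u in range(t, min(t + dur, horizon)):
                    load[u] += 1
            t, on = t + dur, not on
    return load

trace = on_off_trace()
mean_load = sum(trace) / len(trace)
```

Comparing the burstiness of such an aggregate across time scales against a Poisson aggregate of the same mean is the standard way to make the long-range dependence visible.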
This is the starting point of at least one way of modeling Internet traffic. The Poisson arrival process assumption at the session level is still ok, but the data, or files, these arrivals cause to flow through the net are heavy-tailed distributed. This assumption is correct because empirical studies have shown us that, time and again. There is a proof that says that if either, or both, the on- and off-times of the on-off source are heavy-tailed distributed, then the resultant traffic is LRD in nature. Simply put, LRD is tied to the large/infinite variance of the heavy-tailed distributions. Now the situation gets more complicated when we consider that the packet generation process in the heavy-tailed on-phase is not Poissonian, but rather controlled by TCP. The protocol introduces additional burstiness at small time scales. This is also known as Multifractality. Therefore, the two most significant factors from Internet traffic's point of view are the heavy-tailed distributed file sizes and the congestion control mechanism of TCP. Please note, even if the variance in the real world is not infinite and the LRD is only visible for some orders of time-scale, the queue performance is still significantly different from the one based on the simple assumption of a Poissonian renewal arrival process (of packets on the line). Side note: 90% of Internet traffic is based on TCP. The small-flow model holds for the web traffic; the long-flows model is relevant to the P2P downloads. Traffic measurements show that P2P traffic now makes up almost 50% (or perhaps more) of the TCP traffic. See the Sprint website for traffic traces and analysis. > Those of you who understand the profound difference between Bayesian > and Classical statistical inference will understand ... !!! Sireen Malik Hamburg University of Technology, Germany From dpreed at reed.com Tue Jul 26 08:18:08 2005 From: dpreed at reed.com (David P.
Reed) Date: Tue, 26 Jul 2005 11:18:08 -0400 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: <42E64178.709@tuhh.de> References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> Message-ID: <42E65430.1060704@reed.com> Sireen Habib Malik wrote: > From a network's point of view, the users (of all kinds) generate data > - let us call it their on-phase. After downloading/generating data, > they go into thinking- or reading-phase. This is the off-phase. The > users remains in the on- and off-phase for randomly distributed times. Sireen - models ~ reality, but NOT model == reality. At the risk of belaboring the obvious (at least to me) - there is no "the network", just a collection of pieces loosely joined by agreements to cooperate, and there is no "user from the point of view of the network", there are just users who are not just people, but more and more consist of autonomous algorithms that mediate complex and persistent relationships among people. For example, the Firefox browser does communications based on a model it constructs of what the user might decide to look at next. This model is not in the network, nor is it in the user. It may not even work very well. Some network users are DDOS botnets, or out-of-control buggy communications. It's just plain not reasonable to say that "users" is a unitary concept subject to a stable model, just as it is not reasonable to say that networks (which include traffic shapers, MPLS, ...) are simple collections of queues and links. That's something that mathematicians do to map a real thing into a tractable problem space. But it's a dangerous mapping, as I've learned over and over in my career. On the one hand, you have to abstract in order to gain one sort of understanding. But the abstraction creates a brittle problem structure - knock one of the assumptions and the whole elaborate business falls over dead.
You can't describe why a rainforest behaves as it does by analysis at the level of van der Waals forces, nor can you by studying the genomes as encoded in DNA. You need to understand such things as the sunspot cycle and the path-dependent evolutionary and development cycle as well. My point about "incredibly useful" but containing an "essential lie" can be stated imprecisely as a statement that reductionism is insufficient to understand many very important phenomena that are VERY real-world. Read P.W. Anderson's famous article, "More is Different". Or read an introductory complexity theory text about how heat conduction creates stable vortices in fluids of a stable size. In other words, when you use the word "network" in the paragraph above, you are referring to something that exists *only* in the minds of professors of computer science, because they created it as a modeling tool. The "network" we actually use is quite different. From s.malik at tuhh.de Tue Jul 26 09:14:16 2005 From: s.malik at tuhh.de (Sireen Habib Malik) Date: Tue, 26 Jul 2005 18:14:16 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: <42E65430.1060704@reed.com> References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> Message-ID: <42E66158.3090402@tuhh.de> > ..... something that exists *only* in the minds of professors of > computer science, because they created it as a modeling tool. The > "network" we actually use is quite different. What "network" should one consider when estimating RTO's (as given in the topic)? -- SM From huitema at windows.microsoft.com Tue Jul 26 10:21:35 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Tue, 26 Jul 2005 10:21:35 -0700 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites Message-ID: I think we should just look at a simple question. Does the current algorithm actually work?
I personally did measurements 6 years ago. The measurement of tcp-connect times to various web servers clearly showed a power law distribution. There is in fact a history of finding power laws in measurements of communication systems. Indeed, Mandelbrot's work on fractals started with an analysis of the distribution of errors on a modem link! Based on all that, it is quite reasonable to assume that the distribution of RTT measurements follows a power law. People will immediately mention that it should be a truncated power law, but even that is far from clear. There is at least anecdotal evidence of packets being held up in queues and then transmitted after a very long time, e.g. half an hour... The current RTT estimators are based on exponential averages of consecutive samples of delays and variations. This is an issue, as the exponential average of a heavy tailed distribution also is a heavy tailed distribution. If you plug that into a simulation, you will observe that the estimates behave erratically. My personal feeling is that the current RTT estimators do not actually work. -- Christian Huitema From dpreed at reed.com Tue Jul 26 11:10:09 2005 From: dpreed at reed.com (David P. Reed) Date: Tue, 26 Jul 2005 14:10:09 -0400 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites In-Reply-To: <42E66158.3090402@tuhh.de> References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> <42E66158.3090402@tuhh.de> Message-ID: <42E67C81.2040003@reed.com> Sireen Habib Malik wrote: > > What "network" should one consider when estimating RTO's (as given in > the topic)? > > A discursive answer - from the abstract to the concrete - follows: That depends on why you are doing the measurement - in particular, what you intend to use the measurement for... a corollary of my rant is that measurements acquire their meaning partly from their context of use.
So, if you want to measure RTO as part of an adaptive control algorithm, the relevant evaluation of measurement approach is the one that is most suited for the purposes of that algorithm. (e.g. tuning a tight control loop or deciding when to plan for more capacity are quite different contexts). Of course the algorithm is probably correct based on an argument expressed in terms of assumptions about what the measurement measures. The logic is essentially meta-circular. We won't go into that recursion, but instead assume that one or more non-trivial fixed points exist. In less abstract terms, there are a large class of candidates to be plugged in to the slot in your algorithm called "RTO measurement". A subset of those measurements give outputs that make your algorithm work well. One supposes that candidate measurement algorithms that will work give answer sequences in the near neighborhood of an ideal "correct" answer which is an idealization that may not even be well defined (since the actual RTO exists only when a bit experiences that actual RTO value, the "correct" RTO is an extension of actual real RTOs over a domain of time instants where its value is not even of interest - does a network have an RTO when no packet is actually being sent?). Consider for example a control problem involving noisy and missing data (such as using RTO measurements to control congestion). It's been shown in some cases, for some control algorithms, that the control system can still work quite well with very "large" errors, whereas trying to correct the errors by some strategy that smooths or delays the arrival of measurements actually results in far worse control. So "accuracy" need not be something that is calculated by comparison of the results of the measurement numerically as if there is some well-ordered invariant frame of reference. Confidence intervals are usually defined in terms of numerical quantity, not in terms of effect on a larger system. 
On the other hand if the use of the RTO measurement is getting a paper published, accuracy is best calculated as whatever it takes to get peer reviewers to nominate your paper for publication. One hopes that peer reviewers are quite familiar with the normal needs for which such measurements are done. But for new fields and for "mature fields" where theory has gone a separate way from practice, peers may be just as limited as the author in terms of their perspective. This is the "danger" I refer to. From detlef.bosau at web.de Tue Jul 26 11:24:01 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 26 Jul 2005 20:24:01 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> <42E66158.3090402@tuhh.de> Message-ID: <42E67FC1.3090205@web.de> Sireen Habib Malik wrote: > > >> ..... something that exists *only* in the minds of professors of >> computer science, because they created it as a modeling tool. The >> "network" we actually use is quite different. > > > > What "network" should one consider when estimating RTO's (as given in > the topic)? > > No one =8-) My basic intention behind my original post was _not_ to understand the network between sender and receiver. Frankly speaking, I'm not quite sure whether this discussion will ever come to an end, particularly as more and more complexity is added to the Internet every day. The focus of my question is thus not the network. It's the RTO estimator itself. I don't want to replace it. It's simple, it works. I even don't want to adapt TCP to more and more complex network structures. Why should I? Perhaps one of the best characterisations of the situation may be found here, even for non-German readers: http://portale.web.de/Schlagzeilen/BilddesTages/msg/5911184/5911185/1/ As you see: 1.: Even pigs are not always RFC compliant, so why should networks be?
2.: Unfortunately, I've lost the link to the famous NS2 pig. Surely you remember it.

It's obvious: simulators and models are an _abstraction_ of reality, and there are notable differences.

My intention is to take a "top down" perspective on the network: What does TCP expect from a network? In other words: How must a network behave to make TCP work fine? How must a network _appear_?

Of course, when we talk about TCP, we talk about some transport system which conveys TCP datagrams from here to there - one way or another, not necessarily exclusively. And even that is not true. I talk about TCP senders. So, TCP senders send TCP datagrams. In most cases, they honestly believe there may be a TCP receiver at the other end of the line. Even _that_ is not granted. Think of a mobile receiver. For years I had to deal with a project where streaming media was to be conveyed to mobile receivers via IP! People wasted lots of money conveying speech to a mobile receiver via VoIP, instead of using the obvious solution: the "speech service" offered by _every_ wide area mobile network technology on the market.

If I think of TCP, I may think of file transfer. When there is a simple yet perfect file transfer protocol in ISDN (let me take this example because there _IS_ one), why shouldn't I use it if appropriate?

When you consider my Path Tail Emulation proposal: the idea is to _hide_ networks behind "TCP conforming ones". So, it's not my question what a network offers. It's the question what TCP requires. And then the idea is to make a network _appear_ to conform to TCP's requirements. Basically, I think of mobile networks here. But I'm not restricted to mobile networks.

The question is: What is the ideal behaviour for a network used for TCP? When I can answer this one, I want to make an arbitrary network appear that way.
Detlef

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From detlef.bosau at web.de Tue Jul 26 11:33:41 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Tue, 26 Jul 2005 20:33:41 +0200
Subject: [e2e] The picture
References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> <42E66158.3090402@tuhh.de> <42E67FC1.3090205@web.de>
Message-ID: <42E68205.2030507@web.de>

The picture did not work..... Here it is: http://img.web.de/c/00/5A/35/7C.420

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From s.malik at tuhh.de Wed Jul 27 06:21:01 2005
From: s.malik at tuhh.de (sireen malik)
Date: Wed, 27 Jul 2005 15:21:01 +0200
Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites
In-Reply-To: <42E67C81.2040003@reed.com>
References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> <42E66158.3090402@tuhh.de> <42E67C81.2040003@reed.com>
Message-ID: <42E78A3D.10709@tuhh.de>

David,

I agree with your following statements.

"... the abstraction creates a brittle problem structure - knock one of the assumptions and the whole elaborate business falls over dead."

and

"On the other hand if the use of the .....*****..... is getting a paper published, accuracy is best calculated as whatever it takes to get peer reviewers to nominate your paper for publication. One hopes that peer reviewers are quite familiar with the normal needs for which such measurements are done. But for new fields and for "mature fields" where theory has gone a separate way from practice, peers may be just as limited as the author in terms of their perspective. This is the "danger" I refer to."

So what do you think about the Poissonian assumption now?
Isn't it that one assumption which, when knocked, makes the whole elaborate business look wobbly around the knees, if it does not fall over dead?

I have read a few papers on TCP, and have the impression that this assumption remains at the core of the analysis "for getting papers published" as well as getting them "nominated"! For the vast majority of the TCP people, heavy-tailed distributions are theory. Just that. I think this is the "danger" you refer to.

I do note, however, that a great utility of the Poisson assumption is that no one hurls heat vortices and path-dependent sunspots at you!

--
SM

From detlef.bosau at web.de Wed Jul 27 17:12:09 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 28 Jul 2005 02:12:09 +0200
Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites
References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> <42E66158.3090402@tuhh.de> <42E67C81.2040003@reed.com>
Message-ID: <42E822D9.61067CAC@web.de>

"David P. Reed" wrote:
>
> Sireen Habib Malik wrote:
> >
> > What "network" should one consider when estimating RTO's (as given in
> > the topic)?
> >
> A discursive answer - from the abstract to the concrete - follows:
>
> That depends on why you are doing the measurement - in particular, what
> you intend to use the measurement for... a corollary of my rant is
> that measurements acquire their meaning partly from their context of use.

O.k. Perhaps it is helpful for me to return to the starting point of this discussion. Basically, I want to understand the system model of TCP. Thus, I want to talk about the RTO measurement used in TCP and its inherent limitations. I don't want to fix them. I only want to understand them.

Lucky me, I found Raj Jain's paper on divergence of timeout algorithms online. And now I'm trying to obtain the paper by Lixia Zhang, "Why TCP Timers Don't Work Well".
And primarily, of course, I want to get and read the paper on Edge's algorithm. I think that's what I want and what my next step is: understanding the rationale behind Edge's algorithm, which is used until today AFAIK. The discussion whether to use Edge's algorithm or not has surely been conducted quite thoroughly during the past twenty years, so I simply assume that Edge's algorithm is a good choice.

And once again: when I started this discussion, my interest was not primarily to build the one perfect, fantastic, universal network model. There are numbers of them around; they may be right or wrong, I don't know. And I don't have the possibility to check that.

The rationale behind this is my Path Tail Emulation algorithm, which may or may not be convincing at the moment. The simple question is: when I hide a network's last mile, e.g. a mobile wireless network, behind some kind of PEP and thus provide the sender with a "changed RTT behaviour", which any spoofing or splitting PEP does, what is the ideal RTT behaviour? I.e. an RTT behaviour which makes Edge's algorithm work perfectly?

And I don't even care for any kind of network in between. Neither do I for the often so beloved cross traffic. Whenever I get a paper rejected, at least one reviewer misses cross traffic. Perhaps the most stupid comment on this one was: "You simply must simulate all possible scenarios, Mr. Bosau, it's a matter of industry!" Wow. Perhaps I'm stupid, but to the best of my knowledge, the number of network scenarios is infinite. Now, 1 as an approximation for infinity is not better and not worse than 2 or 10 or 100. So, perhaps this standard excuse for paper rejection should be carefully rethought: of course a reasonable choice of scenarios should be provided - but that's enough.

In addition, I know that cross traffic does not improve network performance. This is similar to bad plugs, a sudden system crash, or a failure of the public utilities. (N.B.
From my experience, this is one of the worst reasons for system failure ever. Thus it should be part of any simulation *SCNR*.)

However, this does not matter for my little playground. I want to make a PEP behave as well as possible. And when there are bad guys who flood a network with cross traffic, or Al Qaida bombs the utilities - sorry, I can't help it. This is far beyond my sphere of influence. And it's far beyond a PEP's sphere of influence. My only intention is to have splitting and spoofing PEPs not worsen the situation.

Therefore my only simulation scenario is:

Sender----------------PTE/PEP-----Receiver

"------------" are some networks in between. For a TCP sender this appears as

Sender---------------"Receiver".

And now, it's my intention to have the "Receiver" as well behaved as possible. Whatever is placed in between could be designed best, that's my firm conviction, if existing systems are well behaved and compliant to _explicit_ requirements. It's some sort of induction argument: any system/node/structure/.... placed into a network must not worsen its performance, at least no more than inevitably necessary.

So, first of all it's my interest: what are the vulnerabilities of Edge's algorithm? (This is more concrete than the original question. But I have learned that Edge's algorithm is the matter of interest here.) When I _know_ them, I (hopefully) can _avoid_ inappropriate behaviour. That's all I want to do. As I said: this is my little playground.

Detlef

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From detlef.bosau at web.de Thu Jul 28 12:14:40 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 28 Jul 2005 21:14:40 +0200
Subject: [e2e] Some simple, perhaps stupid, idea.
was: Re: Agility of RTO Estimates, stability, vulneratibilites
References: <20050725182620.265801FF@aland.bbn.com> <42E544A1.8020706@reed.com> <42E64178.709@tuhh.de> <42E65430.1060704@reed.com> <42E66158.3090402@tuhh.de> <42E67C81.2040003@reed.com> <42E822D9.61067CAC@web.de>
Message-ID: <42E92EA0.90200@web.de>

Detlef Bosau wrote:
>
> Lucky me, I found Raj Jain's paper on divergence of timeout algorithms
> online. And now I'm trying to obtain the paper by Lixia Zhang, Why TCP
> Timers Don't Work Well. And primarily of course I want to get and to
> read the paper on Edge's algorithm. I think that's what I want and what
> is my next step: understanding the rationale behind Edge's algorithm,
> which is used until today AFAIK.
> ...
>
> The simple question is: when I hide a network's last mile, e.g. a mobile
> wireless network, behind some kind of PEP and thus provide the sender
> with a "changed RTT behaviour", which any spoofing or splitting PEP
> does, what is the ideal RTT behaviour? I.e. an RTT behaviour which makes
> Edge's algorithm work perfectly?

Eventually, I got Edge's paper. Perhaps we should turn away from computers and go for good old books. O.k., I got Edge's paper as a PDF file. However: the longer I study the literature on TCP, the more I get the impression: the older it is, the better it is. This may be a stupid prejudice. But many of the somewhat older papers are really carefully thought through. And perhaps they do not blindly follow the "publish or perish" principle which seems fashionable nowadays.

In this post, I simply want to share a very spontaneous idea; presumably it's stone-aged, but it is quite simple and clear. And it illustrates my thoughts.

Edge poses rather weak requirements for the RTT process. E.g., the individual stochastic variables T(n) must share a common expectation and variance. And there is some requirement concerning the covariance. This is far from being "i.i.d." or memoryless or Poissonian or something like that.
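To make concrete why such weak moment assumptions suffice (a sketch in my notation, with mu and sigma the common expectation and standard deviation of the T(n)): Chebyshev's inequality alone gives a distribution-free bound for a timeout of the form mu + k*sigma,

```latex
P\big(\,|T(n) - \mu| \ge k\sigma\,\big) \;\le\; \frac{1}{k^2}
\qquad\Longrightarrow\qquad
P\big(\,T(n) > \mu + k\sigma\,\big) \;\le\; \frac{1}{k^2},
```

so choosing RTO = mu + k*sigma caps the probability of a spurious timeout at 1/k^2, no matter how the T(n) are actually distributed.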
In particular, there is absolutely no assumption that the T(n) obey some specific distribution function. The rationale for the RTO itself is based on Chebyshev's inequality, thus it is absolutely generic. However, Edge wants the T(n) to share a common expectation and variance.

Now, when I think of RED strategies, I remember a strategy where there are two thresholds a, b, a < b, for a queue length q. If q < a, packets are accepted. If b < q, packets are rejected. If a <= q <= b, packets are rejected randomly with a probability p which increases linearly from p=0 at q=a to p=1 at q=b.

Question: would it make sense to choose a and b in such a way that
i) q has a constant expectation and
ii) q has a constant variance
for certain periods of time? Expectation and variance could well be chosen appropriately for the load situation.

When I consider a network as a sequence of links and queues (I know..... but I will do this for the moment), the varying part of the RTT is the queueing delay, as long as the path for a connection does not change. So, if every router on the path would try to maintain a constant expectation and variance for queue lengths, the queueing delays would have a constant expectation and variance. Therefore, the observed T(n) would have a constant expectation and variance, at least for small periods of time.

Would it be possible to achieve this by management of the thresholds a and b? If so, this could be achieved by each router individually. As a consequence, at least the requirement for a common expectation and variance of the T(n) would be met.

So far. It's spontaneous, it's perhaps stupid. But it shall illustrate my way of thinking: that it may be reasonable to make the network meet a protocol's requirements instead of always making the protocol suit the network. However, I expect that someone has discussed this before; it's just too simple.
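The two-threshold rejection rule described above can be sketched in a few lines. This is a generic RED-style drop function with the simplification stated above (p rises all the way to 1 at q=b; real RED averages the queue length and caps p at some max_p); the threshold values in the usage are hypothetical:

```python
import random

def red_drop_probability(q, a, b):
    """Drop probability for instantaneous queue length q with
    thresholds a < b: 0 below a, 1 above b, linear in between."""
    if q < a:
        return 0.0
    if q > b:
        return 1.0
    return (q - a) / (b - a)

def accept_packet(q, a, b, rng=random):
    """Randomly decide whether to accept an arriving packet."""
    return rng.random() >= red_drop_probability(q, a, b)

# Hypothetical thresholds: start dropping at 5 packets, drop all at 15.
queue_length = 10
decision = accept_packet(queue_length, a=5, b=15)
```

Managing a and b over time (the question posed above) would then amount to moving these two parameters so that the resulting queue-length process keeps a roughly constant mean and variance.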
Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937