From s.malik at tuhh.de  Mon Aug  1 04:10:12 2005
From: s.malik at tuhh.de (Sireen Habib Malik)
Date: Mon, 01 Aug 2005 13:10:12 +0200
Subject: [e2e] RTO Estimation... was "Agility..."
In-Reply-To: 
References: 
Message-ID: <42EE0314.6020202@tuhh.de>

Hi all,

I have been thinking about David's emails and some points raised by Detlef
in the background discussion. It's a learning process. So ... there are some
questions which I think are important in the context of this discussion.

We say that the RTT's distribution is heavy-tailed. However, the discussion
on heavy-tailed file sizes, the resultant LRD in the traffic and the
sub-exponential queue occupancy distribution, is based upon "open-loop"
queue analysis. TCP, however, is a "closed-loop" protocol (David's point).
The first set of questions then is, "what impacts the queue occupancy
distribution more, the closed-loop operation or the heavy-tailedness of the
E2E distribution?", or, "under what loads/traffic conditions is one of them
more dominant?", or, "is there a dependency between them?".

Second point: It is clear that present RTO estimation will work in the
frame of assumptions under which it is supposed to work. Like Detlef says,
"nobody will complain that a car does not run if it is out of gas", so
nobody should complain if the RTO estimator does not work when traffic
parameters do not fall inside the space of the relevant assumptions.

If that is true, then one way to resolve this issue is to adjust/shape
traffic in such a way that the RTO works (I think this is what Detlef
is saying), or to make a "general purpose" RTO estimator that
reduces/relaxes the set of assumptions - ideally, one that works whether
the IID assumption holds or not.

I think the work in the second direction is more general, and conducive
to practical environments. How difficult or easy it is, I don't know! A
good way is to first find out whether any work has already been done in
this direction.

Thanks and regards,
Sireen Malik

Christian Huitema wrote:

>I think we should just look at a simple question. Does the current
>algorithm actually works?
>
>I personally did measurements 6 years ago. The measurement of
>tcp-connect times to various web servers clearly showed a power law
>distribution. There is in fact a history of finding power laws in
>measurement of communication systems. In fact, Mandelbrot work on
>fractals started with an analysis of the distribution of errors on a
>modem link! Based on all that, it is quite reasonable to assume that the
>distribution of RTT measurement follows a power law.
>
>People will immediately mention that it should be a truncated power law,
>but even that is far from clear. There is at least anecdotal evidence of
>packets being held up in queues and then transmitted after a very long
>time, e.g. half an hour...
>
>The current RTT estimators are based on exponential averages of
>consecutive samples of delays and variations. This is an issue, as the
>exponential average of a heavy tailed distribution also is a heavy
>tailed distribution. If you plug that in a simulation, you will observe
>that the estimates behave erratically.
>
>My personal feeling is that the current RTT estimators do not actually
>work.
>
>-- Christian Huitema
>
>

-- 
M.Sc.-Ing.
Sireen Malik
Communication Networks
Hamburg University of Technology
FSP 4-06 (room 5.012)
Schwarzenbergstrasse 95 (IVD)
21073-Hamburg, Deutschland
Tel: +49 (40) 42-878-3443
Fax: +49 (40) 42-878-2941
E-Mail: s.malik at tuhh.de

--Everything should be as simple as possible, but no simpler (Albert Einstein)

From detlef.bosau at web.de  Mon Aug  1 11:07:46 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Mon, 01 Aug 2005 20:07:46 +0200
Subject: [e2e] RTO Estimation... was "Agility..."
References: <42EE0314.6020202@tuhh.de>
Message-ID: <42EE64F2.BCD0621E@web.de>

Sireen Habib Malik wrote:
> 
> Second point: It is clear that present RTO estimation will work in the
> frame of assumptions under which it is supposed to work. Like Detlef
> says, "nobody will complain that a car does not run if it is out of
> gas", so nobody should complain if the RTO estimator does not work when
> traffic parameters do not fall inside the space of the relevant assumptions.

And we should well consider the consequences if it's true that RTO
estimators actually don't work, as Christian suggested. For the particular
case of mobile wireless networks, we would have to reconsider the whole work
on "spurious timeouts", because what's called a "spurious timeout" is
perhaps not the problem, but a symptom. Unduly frequent spurious timeouts
are nothing else than a too high probability of unwanted retransmissions, to
stay with the wording chosen e.g. by Edge.

> 
> If that is true, then one way to resolve this issue is to adjust/shape
> traffic in such a way that the RTO works (I think this is what Detlef

Exactly. In my post, I focussed solely on the routers. However, any change
in a router's behaviour directly influences the traffic switched by it. Even
more difficult: influencing the traffic on a router will perhaps not only
affect this router, but the behaviour of other routers as well. So I'm not
quite sure whether we are allowed to consider the router queues as being
decoupled.

> is saying), or to make a "general purpose" RTO estimator that
> reduces/relaxes the set of assumptions - ideally, one that works whether
> the IID assumption holds or not.
> 
> I think the work in the second direction is more general, and conducive
> to practical environments. How difficult or easy it is, I don't know! A

The assumptions made by Edge are extremely general. E.g., for RTT and VAR he
assumes hardly more than their pure existence. "Weakly stationary" means
that all observation variables must share the same E and V, and moreover
that the correlation of the latest observation variable with some arbitrary
other one in a given sample does not depend on the sample size. Bearing in
mind that we look for E and V, these assumptions appear rather weak to me.

I've looked around for other estimators. In fact, there may be estimators
which yield better values for forecasting, however under much stronger
assumptions, e.g. that the forecast process must be stationary and obey a
normal distribution.

> good way is to first find out whether any work has already been done in
> this direction.

In addition, I would be interested in work on the convergence speed of EWMA
filters. I'm not quite sure whether I can access the work on EWMA filters
quoted by Edge; these are older textbooks by Cox (1965) or Box (1976). I
think a signal theory perspective would be helpful here. An EWMA filter is
basically nothing else than a first-order low-pass IIR filter.
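For concreteness, this is the recurrence I mean - a minimal sketch in
Python, where the gain 1/8 is only the customary TCP value for the SRTT
filter, not anything mandated by Edge:

    def ewma(samples, gain=1.0/8):
        """First-order low-pass IIR: s_k = (1 - gain) * s_{k-1} + gain * x_k."""
        s = samples[0]              # crude initialisation from the first sample
        out = []
        for x in samples:
            s = (1.0 - gain) * s + gain * x
            out.append(s)
        return out

    # Step response: the "RTT" jumps from 100 ms to 1 s after 20 samples.
    rtts = [0.1] * 20 + [1.0] * 80
    est = ewma(rtts)
    # With gain 1/8 the error after the step decays as (7/8)^n, so the
    # estimate stays below 0.9 s for about 16 samples after the jump.

The impulse response of this filter is simply gain * (1 - gain)^k, a
geometric decay, so "convergence speed" is just a question of how small the
gain is and how often we actually get a sample.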
So it may be helpful to consider step response and impulse response functions of this one, particularly the step impulse because this will reveal the behaviour in case of sudden steps in the latency. However, if we know the impulse response, we can describe the general behaviour on arbitrary signals here. One difficulty here is that the EWMA filters impulse response is that one of a time discrete system and hence its consequences on a real time (continuous time!) system depend on the sampling freuquency, i.e. on the acknowledgement rate. This becomes extremely important in mobile networks where path characteristics may change due to physical or enviorenmental circumstances which are more or less beyound our influence and where e.g. a filtering of changes, which is of course a time discrete filtering, must be adapted to the flows "sampling frequency", i.e. the ACK rate. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From chris at cs.utexas.edu Tue Aug 2 14:33:59 2005 From: chris at cs.utexas.edu (Chris Edmondson-Yurkanan) Date: Tue, 2 Aug 2005 16:33:59 -0500 Subject: [e2e] Cerf & Kahn's Turing Lecture: Open to all, 8/22/2005 Message-ID: <8934b77371b7e568f412d0ef5104e628@cs.utexas.edu> The Turing Lecture by Vint Cerf and Bob Kahn is OPEN TO ALL! SIGCOMM 2005 is the host for this year's ACM Turing Lecture, and has opened the Lecture beyond the conference attendees to ALL who are interested. In addition, SIGCOMM will stream it live over the Internet that Cerf & Kahn helped create. * You are invited to attend the ACM Turing Lecture in Philadelphia, PA, US, August 22nd: 6:00-7:30 EDT (and join the reception which begins at 4:30) at the Irvine Auditorium, University of Pennsylvania (free-of-charge) (with thanks to Penn's School of Engineering & Applied Science) * Bring your colleagues, guests, students, advisors... and help Vint & Bob celebrate the first time that networking researchers have received this prestigious award, in the 39 years of the ACM Turing Award. * The Lecture will be a moderated discussion between Vint and Bob, with the title: Assessing the Internet: Lessons Learned, Strategies for Evolution, and Future Possibilities Afterwards, there will be a Q&A session with the audience. * To reserve one of 600 seats set aside for the public, please sign up via the Turing Lecture web page: http://www.acm.org/sigcomm/sigcomm2005/turinglecture.html That same web page has details, directions, ticket reservations, and info on how-to access the live webcast and the eventual archived webcast. Reservations will be filled on a first-come, first-served basis. --------------------- If you have not heard about this year's Turing Award or have not heard about Cerf and Kahn, here's a little background. The A.M. Turing Award is often recognized as the "Nobel Prize of Computing". The citation for Cerf and Kahn reads: "For pioneering work on internetworking, including the design and implementation of the Internet's basic communications protocols, TCP/IP, and for inspired leadership in networking." Their first paper on "internetworking" was published in IEEE Transactions on Communications, May 1974: A Protocol for Packet Network Intercommunication. If you haven't read their first paper, add it to your summer reading list! Bob Kahn and Vint Cerf started in 1973 to solve the problem of how to interconnect a network of networks, i.e. an "internetwork", or "internet". 
For Bob, new at DARPA, his interest was in building and connecting a packet radio network to the existing ARPA network along with a packet satellite network. Bob invited Vint to work with him, and they jointly designed TCP, which included an internetwork header and a process header (but the two headers didn't start to split into IP and TCP until 5 years later). In 1973 Vint was already the chair of the International Network Working Group, so he was interested as well in interconnecting the ARPA network to the French network Cyclades & the British network at National Physics Laboratory. The following link has a small bio on each: http://www.acm.org/awards/turing_citations/cerf_kahn.html At a reception at the Computer History Museum June 9th, Vint and Bob "cited the collaborative nature of their work, acknowledging the contributions from many in the room who had made their achievements possible." For more information on a few of their collaborators, see: http://campus.acm.org/public/membernet/storypage_2.cfm? ci=July_2005&story=2&CFID=48919977&CFTOKEN=16561738 --------------------- PS: if you cannot attend the lecture, then please do watch the live webcast or the archived lecture. Check out the Turing Lecture website for all details: http://www.acm.org/sigcomm/sigcomm2005/turinglecture.html --------------------- Chris Edmondson-Yurkanan (chris at cs.utexas.edu) Contact info: www.cs.utexas.edu/~chris/ From detlef.bosau at web.de Wed Aug 3 12:54:50 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 03 Aug 2005 21:54:50 +0200 Subject: [e2e] Agility of RTO Estimates, stability, vulneratibilites References: Message-ID: <42F1210A.4030508@web.de> Christian Huitema wrote: > I think we should just look at a simple question. Does the current > algorithm actually works? > > I personally did measurements 6 years ago. The measurement of > tcp-connect times to various web servers clearly showed a power law > distribution. There is in fact a history of finding power laws in > measurement of communication systems. In fact, Mandelbrot work on > fractals started with an analysis of the distribution of errors on a > modem link! Based on all that, it is quite reasonable to assume that the > distribution of RTT measurement follows a power law. > Hm. I believe I remember some newspaper article, where the origin for the work on "the fractal geometry of nature" was the question: How long is the coast of England? Shortly afterwards, we typically learn that a butterly in the Himalaya may cause a tornado in Europe. When I attendet lessons in stochastics, I was told: When we think there may be a stochastic behaviour, we must consider where the stochastic behaviour is supposed to come from. Do we really _expect_ this behaviour? And why do we? It?s the same with the whole thing of chaos theory, self similarity and its variations. 1.: What does it describe exactly? (I frequently miss precise definitions.) 2.: Where does chaotic/self-similar/.... behaviour come from? (It?s not enough to list up occasional observations. Why it?s reasonable to assume a hehaviour like that? Is there hard evidence, that e.g. latencies are self similar? 3.: What do we learn from that behaviour? Does it end in itself? Or can we really tell about "lessons learned" from the self similarity debate? > People will immediately mention that it should be a truncated power law, > but even that is far from clear. There is at least anecdotal evidence of > packets being held up in queues and then transmitted after a very long > time, e.g. 
half an hour...
> 

Does not sound like a solid basis.

> The current RTT estimators are based on exponential averages of
> consecutive samples of delays and variations. This is an issue, as the
> exponential average of a heavy tailed distribution also is a heavy
> tailed distribution. If you plug that in a simulation, you will observe
> that the estimates behave erratically.

O.k. After having played around for a few minutes with EWMA filters in
Octave, I've seen that even the settling behaviour is simply disastrous.

When we keep in mind that Internet latencies vary from some microseconds
(10^-6 s) in an Ethernet segment to some hundred _seconds_ (sic!) in some
mobile wireless networks (10^2 s), then we see that Internet latencies vary
on a scale covering at least eight orders of magnitude. When we further keep
in mind that the Internet is dominated by short-term flows (20 packets or
so), then we must conclude that an ordinary TCP flow is quite unlikely to
see even _one_ correct RTT estimate in its whole lifetime. Is this correct?

Now, to my knowledge, we use an initial value of about 2 seconds, which is a
reasonable upper limit for quite a few Internet connections, and therefore,
during a flow's lifetime, a few distracting RTT measurements do not really
matter. So, from a "practitioner's view", TCP "works". "Somehow". However,
as soon as we are confronted with latencies larger than this initial value,
or subject to variation on a large scale, the situation deteriorates.

> 
> My personal feeling is that the current RTT estimators do not actually
> work.
> 

That should be considered bad news ...

However, I would like to focus the problem a little bit more on a one hop
scenario. The reason for doing so is that, after having read the works by
Zhang, Jain and Edge, two problems become evident:

1. RTT estimators suffer from poor convergence and a problem with their
   initial value.
2. RTT estimators suffer from a poor forecast capability.

There are numerous other difficulties, e.g. Edge's assumptions; however, I
think these can be handled, as can 1. The hard problem may be 2.

Let's consider one hop. Two routers, r1 and r2, one link in between.
(Routers: these systems may well be intermediate systems (IS) in an
arbitrary network path.)

r1--------------------------r2

Consider one packet.

t1: packet's arrival time on r1.
t2: packet's arrival time on r2.

If a packet is yet to arrive at r1 no later than time now + delta, and delta
is known, can we forecast an estimate of (t2 - t1)?

-- 
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From keshav at uwaterloo.ca  Wed Aug  3 13:02:35 2005
From: keshav at uwaterloo.ca (S. Keshav)
Date: Wed, 03 Aug 2005 16:02:35 -0400
Subject: [e2e] end2end-interest Digest, Vol 17, Issue 26
In-Reply-To: 
Message-ID: 

> I think of RED strategies, I remember a strategy where there
> are two thresholds a, b, a < b, for a queue length q. If q < a, packets
> are accepted. If b < q, packets are rejected. If a <= q <= b, packets are
> rejected randomly with a probability p which is linearly increased from
> p=0 if q=a to p=1 if q=b.
> 
> Question: Would it make sense to choose a and b in such a way that
> i) q has a constant expectation and
> ii) q has a constant variance
> for certain periods of time?
> 
> 
> However, I expect that someone has discussed this before, it's just too
> simple.
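As a side note: written out, the rejection rule you describe is just a
linear ramp between the two thresholds. A minimal sketch (Python; the
threshold values are arbitrary placeholders, and note that RED proper
applies the ramp to an averaged queue length, not the instantaneous one):

    import random

    def accept(q, a=20, b=60):
        """Return True if a packet arriving at queue length q is accepted."""
        if q < a:
            return True
        if q > b:
            return False
        p_drop = (q - a) / float(b - a)   # rises linearly from 0 at q=a to 1 at q=b
        return random.random() >= p_drop

Whether a and b can then be chosen so that E[q] and Var[q] stay constant is
really a question about the closed loop this rule forms with the sources,
not about the rule in isolation.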
> The easiest way to make the queueing delay constant, or nearly so, is to introduce wait times where the link is idle even though there are packets in the queue. This reduces delay jitter in the system and makes the whole network more circuit-like. By introducing new 'work', the system is what is called 'non-work-conserving'. Such systems were studied extensively in the early 90's. For more details, you should look up Hui Zhang's comprehensive survey on scheduling: "Service disciplines for guaranteed performance service in packet-switching networks" Proceedings of the IEEE, Volume 83, Issue 10, Oct. 1995 Page(s):1374 - 1396. hope this helps keshav From detlef.bosau at web.de Wed Aug 3 15:20:53 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 04 Aug 2005 00:20:53 +0200 Subject: [e2e] end2end-interest Digest, Vol 17, Issue 26 References: Message-ID: <42F14345.7000804@web.de> S. Keshav wrote: > > The easiest way to make the queueing delay constant, or nearly so, is to > introduce wait times where the link is idle even though there are packets in > the queue. This reduces delay jitter in the system and makes the whole > network more circuit-like. By introducing new 'work', the system is what is Exactly. And I?m not quite sure whether it?s that what I want to do. > called 'non-work-conserving'. Such systems were studied extensively in the > early 90's. For more details, you should look up Hui Zhang's comprehensive > survey on scheduling: "Service disciplines for guaranteed performance > service in packet-switching networks" Proceedings of the IEEE, Volume 83, > Issue 10, Oct. 1995 Page(s):1374 - 1396. > > hope this helps It exactly marks the problem. The more circuit like a network is, the less are the economical advantages for typical "packet switching users". When we make a delay?s _expectation_ constant for a certain amount of time, we can well accept a large variation. Jitter is not the problem. So, this could be overkill here. However, I don?t know of a "weaker" way. In my other post from today (Augst, 3rd) I tried to weaken the problem that way, that I only ask for a limited forecast capability. It is not necessary to keep a queueing delay constant or makeing it obey a certain distribution. It would be sufficient to forecast its expectation, and if possible its variance, for a limited period of time, e.g. 200 ms. Do you think, there?s a way to do so, thereby maintaining the typical "packet-switching best effort" nature of the Internet? Perhaps, this is a borderline between "best effort" traffic shaping (if this even exists) and some kind of guaranteed service. I really don?t know yet. -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Thu Aug 4 05:50:28 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 04 Aug 2005 14:50:28 +0200 Subject: [e2e] Expected latency for a single hop Message-ID: <42F20F14.3000007@web.de> I posted this in another context yesterday, but perhaps, I should isolate the problem to state it more clearly. Consider an arbitrary packet-switching network. Consider two adjacent nodes n1, n2 with link l in between n1--------------------------n2 l Consider a packet traveling the network, it?s path shall contain n1 and n2 subsequently. Now, let t1: packet?s arrival time on r1. t2: packet?s arrival time on r2. Can we forecast expectaition and variance (if only for the _near_ future!) for the "one hop latency" t2 - t1 ? 
I explicitely focus on a "best effort" context. For link l I assume, that expectation and variance of the transport latency exist. Is there any work in this direction? -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From craig at aland.bbn.com Thu Aug 4 07:09:22 2005 From: craig at aland.bbn.com (Craig Partridge) Date: Thu, 04 Aug 2005 10:09:22 -0400 Subject: [e2e] Expected latency for a single hop In-Reply-To: Your message of "Thu, 04 Aug 2005 14:50:28 +0200." <42F20F14.3000007@web.de> Message-ID: <20050804140922.821FF1FF@aland.bbn.com> Is l a physical link, an IPsec or IP-in-IP tunnel, or ...? Note that if it is a tunnel, the answer is that the expectation and variance of latency is potentially the same as any random multi-hop Internet path.... Craig In message <42F20F14.3000007 at web.de>, Detlef Bosau writes: >I posted this in another context yesterday, but perhaps, I should >isolate the problem to state it more clearly. > >Consider an arbitrary packet-switching network. > >Consider two adjacent nodes n1, n2 with link l in between > >n1--------------------------n2 > l > > >Consider a packet traveling the network, it´s path shall contain n1 and >n2 subsequently. > >Now, let > t1: packet´s arrival time on r1. > t2: packet´s arrival time on r2. > >Can we forecast expectaition and variance (if only for the _near_ >future!) for the "one hop latency" t2 - t1 ? > >I explicitely focus on a "best effort" context. > >For link l I assume, that expectation and variance of the transport >latency exist. > >Is there any work in this direction? > > >-- >Detlef Bosau >Galileistrasse 30 >70565 Stuttgart >Mail: detlef.bosau at web.de >Web: http://www.detlef-bosau.de >Mobile: +49 172 681 9937 From keshav at uwaterloo.ca Thu Aug 4 07:30:35 2005 From: keshav at uwaterloo.ca (S. Keshav) Date: Thu, 04 Aug 2005 10:30:35 -0400 Subject: [e2e] end2end-interest Digest, Vol 17, Issue 26 In-Reply-To: <42F14345.7000804@web.de> Message-ID: Detlef, In general, what you are asking for is difficult. Consider the following scenario. Suppose a router forecasts that the queueing delays at a particular interface are small at time t and expects this forecast to hold until t+200ms. Now, suddenly, a burst of packets from multiple input ports destined to that interface arrive at time t+epsilon. This builds up the queue, increasing delays. You have two choices: 1. violate the forecast or 2. drop packets in order to meet the forecast. Neither one is a good alternative. If you violate the forecast, then what use is it? If you drop packets to meet the forecast, that's a waste, because adequate buffers exist. I do not think that dropping packets in order to make RTO computations sane is a good tradeoff. A similar situation holds if traffic is generally high, so that queue lengths are large, and you forecast a large delay. Now, if the traffic dies down, you have to either violate the forecast or add new work to the system. Adding new work delays all subsequent packets, so if you now get a burst, you are in trouble. As such, I believe that any sort of forecast is only possible if there is a way to bound the total incoming traffic, both in terms of rate and burstiness. keshav > > In my other post from today (Augst, 3rd) I tried to weaken the problem > that way, that I only ask for a limited forecast capability. It is not > necessary to keep a queueing delay constant or makeing it obey a certain > distribution. 
It would be sufficient to forecast its expectation, and if > possible its variance, for a limited period of time, e.g. 200 ms. > > Do you think, there?s a way to do so, thereby maintaining the typical > "packet-switching best effort" nature of the Internet? > > Perhaps, this is a borderline between "best effort" traffic shaping (if > this even exists) and some kind of guaranteed service. I really don?t > know yet. > From dpreed at reed.com Thu Aug 4 08:19:08 2005 From: dpreed at reed.com (David P. Reed) Date: Thu, 04 Aug 2005 11:19:08 -0400 Subject: [e2e] Expected latency for a single hop In-Reply-To: <42F20F14.3000007@web.de> References: <42F20F14.3000007@web.de> Message-ID: <42F231EC.1060100@reed.com> Detlef - Though it seems simple, your statement is about as complex as a problem can be. This is the kind of problem statement that creates the definitional trap I was referring to in earlier discussions. By construing the "latency" as being a propery of the "link" rather than of the network as a whole, the statement acquires a misleading simplicity The latency only is well defined for real packets that actually arrive and traverse the link. Expectation and variance are properties of distributions, not packets. There is no random process at all on the link itself (at least in the common case - there are links where the link itself has a random delay, but that usually arises where the link's physical characteristics vary faster and larger than the queue management and link pacing mechanisms). The random process is the network environment that provides competing packets. So the latency is everywhere but the link itself. The other issue is that prediction is more reliable over a collection of packets, but a sufficient collection cannot happen in an instant. The first order predictor is the queue size at the entry to the link. That's a very reliable predictor of latency for the next event. But it provides very little input about variance (which depends entirely on packets arriving from elsewhere at "light speed"). I think there might be a much better (i.e. less complex to state) approach in NOT trying to start with the link and go by induction to the multilink case. Instead, perhaps start with an end-to-end flow (over a path) and reason about what happens as you add flows that superpose themselves on the existing paths. Detlef Bosau wrote: > I posted this in another context yesterday, but perhaps, I should > isolate the problem to state it more clearly. > > Consider an arbitrary packet-switching network. > > Consider two adjacent nodes n1, n2 with link l in between > > n1--------------------------n2 > l > > > Consider a packet traveling the network, it?s path shall contain n1 > and n2 subsequently. > > Now, let > t1: packet?s arrival time on r1. > t2: packet?s arrival time on r2. > > Can we forecast expectaition and variance (if only for the _near_ > future!) for the "one hop latency" t2 - t1 ? > > I explicitely focus on a "best effort" context. > > For link l I assume, that expectation and variance of the transport > latency exist. > > Is there any work in this direction? 
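PS: to make the "first order predictor" concrete: for a FIFO link with a
fixed rate, the backlog at the instant a packet is enqueued already gives
the deterministic part of its one-hop delay; everything that arrives from
elsewhere afterwards is the random part. A toy sketch (all numbers made up):

    def predicted_one_hop_delay(queued_bytes, pkt_bytes, link_rate_bps, prop_delay_s):
        """Deterministic part of the next packet's one-hop latency on a FIFO link."""
        backlog = queued_bytes + pkt_bytes      # bytes ahead of and including this packet
        return 8.0 * backlog / link_rate_bps + prop_delay_s

    # Example: 40 kB already queued, 1500-byte packet, 2 Mbit/s link, 10 ms
    # propagation: predicted_one_hop_delay(40000, 1500, 2e6, 0.010) ~ 0.176 s.

The variance is precisely the part this predictor cannot see.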
From nicolasc at andrew.cmu.edu  Thu Aug  4 07:59:50 2005
From: nicolasc at andrew.cmu.edu (Nicolas Christin)
Date: Thu, 4 Aug 2005 10:59:50 -0400
Subject: [e2e] end2end-interest Digest, Vol 17, Issue 26
In-Reply-To: <42F14345.7000804@web.de>
References: <42F14345.7000804@web.de>
Message-ID: <20050804145950.GA18305@lithium.ini.cmu.edu>

Detlef,

On Wed Aug 03, 2005, Detlef Bosau wrote:
> 
> Do you think, there's a way to do so, thereby maintaining the typical
> "packet-switching best effort" nature of the Internet?
> 
> Perhaps, this is a borderline between "best effort" traffic shaping (if
> this even exists) and some kind of guaranteed service. I really don't
> know yet.

I actually studied a very related problem in the good ol' days of my Ph.D.
dissertation. I basically tried to combine buffer management and packet
scheduling to provide service differentiation without admission control.
The trick is essentially that the bounds that you give to traffic classes
are actually soft, in that they can be violated. (Keshav is completely right
- when traffic is really bursty, it is very difficult to do any type of
intelligent prediction, and you might end up with something that is not much
better than best effort.) The good news is that you can do quite a lot if
you combine scheduling with packet dropping, and even more when you start
looking at ways of playing with TCP congestion control to do essentially
endpoint admission control for you.

If you are interested, a summary of my dissertation is in:

N. Christin and J. Liebeherr. A QoS Architecture for Quantitative Service
Differentiation. In IEEE Communications Magazine 41(6), Special Issue on
Scalability in IP-Oriented Networks, pages 38-45. June 2003.
http://www.comsoc.org/livepubs/ci1/DLPREVIEW/christin.pdf

Best,
Nicolas

From detlef.bosau at web.de  Thu Aug  4 11:27:27 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 04 Aug 2005 20:27:27 +0200
Subject: [e2e] Expected latency for a single hop
References: <42F20F14.3000007@web.de> <42F231EC.1060100@reed.com>
Message-ID: <42F25E0F.4000601@web.de>

David P. Reed wrote:
> Detlef - Though it seems simple, your statement is about as complex as a
> problem can be.
> This is the kind of problem statement that creates the definitional trap
> I was referring to in earlier discussions. By construing the "latency"
> as being a propery of the "link" rather than of the network as a whole,
> the statement acquires a misleading simplicity
> 

I know. However, the rationale behind my question is quite obvious: if you
place a TCP sender at n1 and the according receiver at n2, the adaptive RTO
mechanism in TCP relies exactly upon estimated mean and variance of (t2-t1).

If I had written: "Can we provide an adaptive RTO for a single hop TCP
connection?", I would surely have been directed to the relevant literature.
Perhaps it would have been considered a stupid question. Thus, I thought it
might be useful to state the same problem (sic!) which TCP claims to solve
(even for n hops!) in somewhat different words >:-)

Honestly, I believe that if we cannot estimate mean and variance for a
_single_ hop, it's perhaps not that much easier to do the job for an
arbitrary number of hops (which of course includes the nasty case of a
single hop).
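Just to fix what I mean by "the adaptive RTO mechanism": roughly the usual
estimator pair sketched below (the gains 1/8 and 1/4 and the factor 4 are
only the customary defaults, they are not taken from Edge's paper):

    def update_rto(srtt, rttvar, sample):
        """One update of the usual smoothed-mean / smoothed-deviation estimator.

        srtt   - smoothed RTT (the mean estimate)
        rttvar - smoothed mean deviation (a cheap stand-in for the std deviation)
        sample - the new measurement, here (t2 - t1) for the single hop
        """
        alpha, beta, k = 1.0 / 8, 1.0 / 4, 4
        rttvar = (1 - beta) * rttvar + beta * abs(sample - srtt)   # deviation first
        srtt = (1 - alpha) * srtt + alpha * sample
        return srtt, rttvar, srtt + k * rttvar                     # rto = srtt + k*rttvar

The factor k only buys a Chebyshev-like tail bound if the deviation estimate
is meaningful for the distribution at hand - which is exactly what is in
question here.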
> The latency only is well defined for real packets that actually arrive
> and traverse the link. Expectation and variance are properties of
> distributions, not packets.
> 

Yes. The intention of the EWMA filtering used in TCP is an attempt to do a
parameterless forecast of mean and variance of the actual latency
distribution. (Some weeks ago someone referred me to this well known saying
by Niels Bohr: "Prediction is hard, especially of the future.") Thus, of
course we estimate properties of an unknown (!) distribution and in turn
derive an RTO by application of an inequality similar to Chebyshev's
inequality. However, the basic assumption is that we can provide estimates
for mean and variance of a packet's round trip time.

> There is no random process at all on the link itself (at least in the
> common case - there are links where the link itself has a random delay,
> but that usually arises where the link's physical characteristics vary
> faster and larger than the queue management and link pacing

My assumptions on l are tough. I totally agree with Craig here. In general,
we know nothing about l. It may be a tunnel, it may be a mobile wireless
link. E.g. for mobile wireless links, I do not even know whether a finite
variance of the link's latency distribution exists.

> mechanisms). The random process is the network environment that
> provides competing packets. So the latency is everywhere but the link
> itself.
> 
> The other issue is that prediction is more reliable over a collection of
> packets, but a sufficient collection cannot happen in an instant.
> 

I did not make any assumptions here; especially, I did not assume that the
estimation should be based upon the observation of one single packet.
Perhaps my formulation was somewhat misleading here. We could use test
packets sent from n1 to n2, or observe traffic from several flows. If n1 and
n2 are routers and we can observe a large number of flows, the job should be
much easier than if it is done at a TCP flow's source, which has to rely on
a _very_ rough sample.

My favourite example is always a TCP path including a 2400 bps link.
(Nowadays forgotten; two years ago known from GSM - and we all know it from
the good old modem times.) Depending on the MSS, the sender gets a sample
every second or so. For the wirebound systems in between, one second is
_ages_. In one of my posts, I claimed (please correct me if I'm wrong) that
in contemporary networks even link bandwidths cover a range of eight orders
of magnitude. While our single packet crawls along a GSM link, a Tier 1
backbone link may convey the whole Encyclopedia Britannica within the same
period of time. However, in a scenario

Sender----Tier1/Enc.Brit. Link--------router----GSM---receiver

the sender estimates mean and variance of the round trip time using EWMA
filters and the extremely rough time series gained from the ACK packets.

Question: Is there a justification for doing so?

I looked at Edge's paper, especially at the assumptions for the observation
variables, i.e. the time series Tn (Tn: stochastic variables, tn:
instances). One sufficient assumption is that all Tn share the same mean and
variance. Some "drift" is accepted, as are "occasional steps" (put in my own
words). When I look at my "Britannica example" and consider "sane mean and
variance", I do not feel comfortable with these assumptions.

> The first order predictor is the queue size at the entry to the link.
> That's a very reliable predictor of latency for the next event.
But it > provides very little input about variance (which depends entirely on > packets arriving from elsewhere at "light speed"). > > I think there might be a much better (i.e. less complex to state) > approach in NOT trying to start with the link and go by induction to the > multilink case. Instead, perhaps start with an end-to-end flow (over a > path) and reason about what happens as you add flows that superpose > themselves on the existing paths. > Is this really that more promising? Admittedly, this a rhetorical question. Basically, this is already being done. So I sharpened it a little bit by omitting n-1 links in the n link case ;-) Or is it a matter of how coarse or fine grained we look at the problem? I?m thinking about this problem - and at the same time, I use TCP and everything seems just to be fine :-) Then I read Raj Jain?s paper about the divergence of RTO estimators and Lixia Zhang?s paper on TCP timers. And I understand that we already addressed a number, perhaps nearly all, issues in these papers. But one issue which I do not yet understand is the use of EWMA filters. - Do they hold for arbitrary TCP connections? Can we reasonable assume the necesseary conditions given by Edge? Or alternative ones? - Do they converge fast enough in case of a sudden step in latency? Do they follow drifting latencies? How must we set the gain? I sometimes here something about "agility" and "stability". Basically, we should minimize the forecast error by proper choice of the gain. Can we use the same gain for all flows? - Is the temporal resolution of an ACK clocked TCP flow sufficient to provide reasonable estimates? Or is the time series? resolution obtained from that too coarse? (Nilsson, Martin and Rhee do so claim in there paper on lateny change / congestion correlation in June, 2003. One central point there was that the temporal resolution of observed round trip times in most cases is by far too coarse to derive reasonable conclusions concerning path properties.) I get no "feeling" for this situation. I see lots of scenarios and individual papers there, but I don?t see the big picture yet. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Thu Aug 4 13:19:25 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 04 Aug 2005 22:19:25 +0200 Subject: [e2e] end2end-interest Digest, Vol 17, Issue 26 References: Message-ID: <42F2784D.5020809@web.de> S. Keshav wrote: > Detlef, > In general, what you are asking for is difficult. Consider the following > scenario. Suppose a router forecasts that the queueing delays at a > particular interface are small at time t and expects this forecast to hold > until t+200ms. Now, suddenly, a burst of packets from multiple input ports > destined to that interface arrive at time t+epsilon. This builds up the > queue, increasing delays. You have two choices: > > 1. violate the forecast > or > 2. drop packets in order to meet the forecast. > > Neither one is a good alternative. If you violate the forecast, then what > use is it? If you drop packets to meet the forecast, that's a waste, because > adequate buffers exist. I do not think that dropping packets in order to > make RTO computations sane is a good tradeoff. > Perhaps, we talk a litte bit cross purposes here. What I?m trying to understand is the estimation of mean and variation of RTT in TCP flows. I don?t want to give any guarantees. 
So the purpose of a forecast is only to estimate latencies for the near future. If there is a traffic burst, then the forecast may be violated. So what? It?s an _estimate_. Moreover, it?s an estimate for a _mean_. An actual latency may well be greater or less. Basically, there are two objectives: 1. provide an RTO estimator with _less_ assumptions than e.g. Edge?s algorithm. 2. alleviate the settling behaviour and the consequences of the sometimes quite rough sampling done by the usual RTT observation. Perhaps, this could be helpful, I don?t know yet. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Mon Aug 8 09:27:23 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 08 Aug 2005 18:27:23 +0200 Subject: [e2e] Expected latency for a single hop: What about 802.11 networks? References: <42F20F14.3000007@web.de> <42F231EC.1060100@reed.com> <42F25E0F.4000601@web.de> Message-ID: <42F787EB.3040006@web.de> I just had a very first glance on a paper by Christoph Lindemann et al., MobiHoc 05. The paper deals with TCP in multihop wireless networks, as far as I see particularly 802.11 networks. The paper mentions the typical consideration: In wireless networks, corruption based loss happens more often than corruption based drop. Now, first of all: What is the MAC algorithm in 802.11 ad hoc (not infrastructure!) networks / MANETs? To the best of my knowledge, this is ALOHA. (BTW: I would greatly appreciate a copy of Abramsons Paper. It?s on my reading list, but I could not find it yet.) AFAIK, ALOHA does _not_ detect collisions but relys upon positive acknowledments: A packet is sent, repeated if necessary, until it is acknowledged by the receiver. Q: Is this correct? If so, we have implict retransmissions on the MAC layer here. Particularly, we would observe transport latencies as the temporal distance between the first sending attempt and the final reception. This seems to be similar to the latency estimation used in the ARPAnet in the 80s and which is proven to be insufficient / divergent according to Jains paper "Divergence of Timeout Algorithms....", refer to the discussion concerning "Round Trip Delay with Retransmissions" in that paper. Q: Does this mean, it is difficult to obtain correct latency estimates by pure TCP/ACK observation in case of networks where local recovery is implicit/compulsory? Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From dpreed at reed.com Mon Aug 8 10:31:25 2005 From: dpreed at reed.com (David P. Reed) Date: Mon, 08 Aug 2005 13:31:25 -0400 Subject: [e2e] Expected latency for a single hop: What about 802.11 networks? In-Reply-To: <42F787EB.3040006@web.de> References: <42F20F14.3000007@web.de> <42F231EC.1060100@reed.com> <42F25E0F.4000601@web.de> <42F787EB.3040006@web.de> Message-ID: <42F796ED.7040401@reed.com> The MAC protocol in 802.11 is not ALOHA. You'd best get the spec if you really want to understand it, because it's pretty complex. It doesn't detect collisions, however. Nor does it depend on positive acks. It relies on collision avoidance techniques to reduce collision losses to a low enough level, and end-to-end acks to clean up the rest. There is a "polled" mode (point coordination function) that is hardly ever implemented. 
Instead, the "distributed coordination function" (DCF) is typically employed, but modified in many cases by RTS/CTS exchanges, this latter being the means to reduce collisions in most cases (CTS is a positive ack for RTS). Many networks are set up so that CTS/RTS applies only to long frames (i.e. file transfers). Ultimately, it means that what TCP/ACK observation sees when an 802.11 link is involved depends on how well the CTS/RTS works. From rja at extremenetworks.com Mon Aug 8 11:44:45 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Mon, 8 Aug 2005 14:44:45 -0400 Subject: [e2e] Expected latency for a single hop: What about 802.11 networks? In-Reply-To: <42F796ED.7040401@reed.com> References: <42F20F14.3000007@web.de> <42F231EC.1060100@reed.com> <42F25E0F.4000601@web.de> <42F787EB.3040006@web.de> <42F796ED.7040401@reed.com> Message-ID: <8F6DF257-8277-47FC-92A5-13EB5793E349@extremenetworks.com> On Aug 8, 2005, at 13:31, David P. Reed wrote: > The MAC protocol in 802.11 is not ALOHA. You'd best get the spec > if you really want to understand it, because it's pretty complex. > By the way, most IEEE 802.* standards are available in PDF at no cost from this URL: http://standards.ieee.org/getieee802/ From detlef.bosau at web.de Mon Aug 8 14:46:35 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 08 Aug 2005 23:46:35 +0200 Subject: [e2e] Expected latency for a single hop: What about 802.11 networks? References: <42F20F14.3000007@web.de> <42F231EC.1060100@reed.com> <42F25E0F.4000601@web.de> <42F787EB.3040006@web.de> <42F796ED.7040401@reed.com> Message-ID: <42F7D2BB.5040009@web.de> David P. Reed wrote: > The MAC protocol in 802.11 is not ALOHA. You'd best get the spec if you > really want to understand it, because it's pretty complex. > > It doesn't detect collisions, however. Nor does it depend on positive > acks. It relies on collision avoidance techniques to reduce collision > losses to a low enough level, and end-to-end acks to clean up the rest. > Oh :-( You just have destroyed my view of life.......... I knew about the CA stuff before, but not that 802.11 in fact does not care, when collision actually _occurs_. (Call me lazybones, call me coward, but I avoid reading IEEE standards whenever possible =8-0 It?s nevertheless inevitable sometimes, but I rather read 20 RFCs than 1 IEEE stanard. O.k., it?s a standard, not a cartoon.....) However, what you say here totally changes my way of thinking. I typically compare WLAN and Ethernet, which is still possible for low loads and when single, independent segments are compared. I.e., collusion does hardly occur and in a single segment e2e recovery should not behave that different than ALOHA, moreover there is hardly any network capacity at all and CWND etc. is small. In case of increasing load, and therefore an increasing number of collisions), and if the 802.11 network is the last link in a number of subsequent links, there should be quite a difference to Ethernet when all collision losses must be cured end to end.... O.k., bearing this in mind, local recovery protocols like snoop appear totally different to me than before. I think, it will take some days for me to understand all the consequences. Thanks a lot. I?ve learned something new today. BTW: (Of course I will find it in the standards, it?s only I fear it?s on page 345 of 800....) What is the _reason_ for this decision _not_ to handle actual collisions locally but leave it to the e2e protocol? To my understanding (up to now...) 
CA does _avoid_ collisions but does not totally prevent them. Or is CA that successfull that actual collisions can nearly be neglected? Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From crk at research.att.com Mon Aug 8 18:43:26 2005 From: crk at research.att.com (crk@research.att.com) Date: Mon, 8 Aug 2005 21:43:26 -0400 Subject: [e2e] Expected latency for a single hop: What about 802.11networks? Message-ID: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> QoS for 802.11 was actually fairly recently defined in the 802.11e specifications. Since the scheduled "HCCA" mode is a new addition, it is true that it's not widely deployed, but that will change. The unscheduled QoS mode uses randomized backoff timers with the max value determined by traffic class; the scheduled HCCA mode allows clients to provide a Tspec to an Access Point, which can then provide bounded and predictable delay. Since HCCA "parameterizes" QoS via scheduled "polls", the latency is normally max'd at the superframe beacon interval, but can be less. A typical example is 20 msec in one implementation that we've worked on with several vendors. These kinds of guarantees will be needed if you ever want to use 802.11 to provide WVoIP in an enterprise environment... I believe we have some good model outputs from simulations that we could share if there is an interest... Regards, chuck -----Original Message----- From: end2end-interest-bounces at postel.org [mailto:end2end-interest-bounces at postel.org] On Behalf Of David P. Reed Sent: Monday, August 08, 2005 1:31 PM To: Detlef Bosau Cc: Michael.kochte at gmx.net; end2end-interest at postel.org Subject: Re: [e2e] Expected latency for a single hop: What about 802.11networks? The MAC protocol in 802.11 is not ALOHA. You'd best get the spec if you really want to understand it, because it's pretty complex. It doesn't detect collisions, however. Nor does it depend on positive acks. It relies on collision avoidance techniques to reduce collision losses to a low enough level, and end-to-end acks to clean up the rest. There is a "polled" mode (point coordination function) that is hardly ever implemented. Instead, the "distributed coordination function" (DCF) is typically employed, but modified in many cases by RTS/CTS exchanges, this latter being the means to reduce collisions in most cases (CTS is a positive ack for RTS). Many networks are set up so that CTS/RTS applies only to long frames (i.e. file transfers). Ultimately, it means that what TCP/ACK observation sees when an 802.11 link is involved depends on how well the CTS/RTS works. From detlef.bosau at web.de Tue Aug 16 01:54:36 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 16 Aug 2005 10:54:36 +0200 Subject: [e2e] Latency Variation and Contention. References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> Message-ID: <4301A9CC.8090103@web.de> Hi to all. Recently, I found the following paper by Sherif M. ElRakabawy, Alexander Klemm and Christoph Lindemann: http://mobicom.cs.uni-dortmund.de/publications/TCP-AP_MobiHoc05.pdf The paper proposes a congestion control algorithm for ad hoc networks. Perhaps, this paper is interesting within the context of our latency discussion. However, I?m not yet convinced of this work. 
If I leave out some sheets of paper, some simulations and many words, the paper basically assumes that in ad hoc networks a TCP sender can measurethe degree of network contention using the variance of (recently seen) round trip times: -If the variance is close to zero, the network is hardly loaded. -If the variance is "high" (of course "high" is to be defined) there is a high degree of contention on this network. Afterwards the authors propose a sender pacing scheme, where a TCP flow?s rate is decreased with respect to the so measured "degree of contention". What I do not yet understand is basic assumption: variance 0 <=> no load; variance high <=> heavy load. Perhaps the main difficulty is that I believed this myself for years and it was an admittedly difficult task to convince me that I was wrong %-) However, @article{martin, journal = " IEEE/ACM TRANSACTIONS ON NETWORKING", volume ="11", number = "3", month = "June", year = "2003", title = "Delay--Based Congestion Avoidance for TCP", author = "Jim Martin and Arne Nilsson and Injong Rhee", } eventually did the job. More precisely, I looked at the latencies themselves, not the variances. Let?s consider a simple example. A network B "network" is some shared media packet switching network. Let?s place a TCP sender on A and the according sink on B. The simple question is (and I thought about this years ago without really coming to an end - I?m afraid I didn?t want to): Is a variance close to zero really equivalent for a low load situation? And does increasing variance indicate increasing load? Isn?t it possible that a variance close to zero is a consequence of a fully loaded network? And _decreasing_ load in that situation would cause the latencies to vary? If we could reliably identify a low load situation from a varaince close to zero, we could use the latencies themselves as a load indicator because we could reliably identify a "no load latency" and thus could identify imminent congestion by latency observation. One could even think of a "latency-congestion scale" which is calibrated first by variance observation in order to get the "unloaded" mark and second by drop observation and some loss differentation technique to get the "imminent congestion" mark. To my knowledge, this is extensively discussed in literature - until Martin, Nilsson and Rhee found the mentioned results. Now, back to my example and the basic question: Does the assumption, latency variations indicate the degree of contention in an ad hoch network, really hold? I admit, I personally do not yet see an evidence for this. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From s.malik at tuhh.de Tue Aug 16 03:22:55 2005 From: s.malik at tuhh.de (Sireen Habib Malik) Date: Tue, 16 Aug 2005 12:22:55 +0200 Subject: [e2e] Latency Variation and Contention. In-Reply-To: <4301A9CC.8090103@web.de> References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> Message-ID: <4301BE7F.9080107@tuhh.de> Hi, Have not read the paper, however, I think that if, RTT = Round Trip Time, and dRTT = variations in RTT, then "dRTT" is a weak/poor indicator of congestion. A congestion signal based upon "dRTT/RTT" would give a much better idea, relatively speaking. -- Sireen Detlef Bosau wrote: > Hi to all. > > Recently, I found the following paper by Sherif M. 
ElRakabawy, > Alexander Klemm and Christoph Lindemann: > > http://mobicom.cs.uni-dortmund.de/publications/TCP-AP_MobiHoc05.pdf > > The paper proposes a congestion control algorithm for ad hoc networks. > Perhaps, this paper is interesting within the context of our latency > discussion. > > However, I?m not yet convinced of this work. > > If I leave out some sheets of paper, some simulations and many words, > the paper basically assumes that in ad hoc networks a TCP sender can > measurethe degree of network contention using the variance of > (recently seen) round trip times: > > -If the variance is close to zero, the network is hardly loaded. > -If the variance is "high" (of course "high" is to be defined) there > is a high degree of contention on this network. > > Afterwards the authors propose a sender pacing scheme, where a TCP > flow?s rate is decreased with respect to the so measured "degree of > contention". > > What I do not yet understand is basic assumption: variance 0 <=> no > load; variance high <=> heavy load. > > Perhaps the main difficulty is that I believed this myself for years > and it was an admittedly difficult task to convince me that I was > wrong %-) > However, > > @article{martin, > journal = " IEEE/ACM TRANSACTIONS ON NETWORKING", > volume ="11", > number = "3", > month = "June", > year = "2003", > title = "Delay--Based Congestion Avoidance for TCP", > author = "Jim Martin and Arne Nilsson and Injong Rhee", > } > eventually did the job. > > More precisely, I looked at the latencies themselves, not the variances. > > > Let?s consider a simple example. > > A network B > > "network" is some shared media packet switching network. > Let?s place a TCP sender on A and the according sink on B. > > The simple question is (and I thought about this years ago without > really coming to an end - I?m afraid I didn?t want to): > > Is a variance close to zero really equivalent for a low load situation? > And does increasing variance indicate increasing load? > > Isn?t it possible that a variance close to zero is a consequence of a > fully loaded network? And _decreasing_ load in that situation would > cause the latencies to vary? > > If we could reliably identify a low load situation from a varaince > close to zero, we could use the latencies themselves as a load > indicator because we could reliably identify a "no load latency" and > thus could identify imminent congestion by latency observation. > > One could even think of a "latency-congestion scale" which is > calibrated first by variance observation in order to get the > "unloaded" mark and second by drop observation and some loss > differentation technique to get the "imminent congestion" mark. > > To my knowledge, this is extensively discussed in literature - until > Martin, Nilsson and Rhee found the mentioned results. > > Now, back to my example and the basic question: Does the assumption, > latency variations indicate the degree of contention in an ad hoch > network, really hold? > > I admit, I personally do not yet see an evidence for this. > > Detlef -- M.Sc.-Ing. 
Sireen Malik Communication Networks Hamburg University of Technology FSP 4-06 (room 5.012) Schwarzenbergstrasse 95 (IVD) 21073-Hamburg, Deutschland Tel: +49 (40) 42-878-3443 Fax: +49 (40) 42-878-2941 E-Mail: s.malik at tuhh.de --Everything should be as simple as possible, but no simpler (Albert Einstein) From detlef.bosau at web.de Tue Aug 16 03:57:16 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 16 Aug 2005 12:57:16 +0200 Subject: [e2e] Latency Variation and Contention. References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> Message-ID: <4301C68B.7018257A@web.de> Sireen Habib Malik wrote: > > Hi, > > Have not read the paper, however, I think that if, > > RTT = Round Trip Time, and > dRTT = variations in RTT, > > then "dRTT" is a weak/poor indicator of congestion. > > A congestion signal based upon "dRTT/RTT" would give a much better idea, > relatively speaking. Hm. At least, it looks more complex ;-) However, it does not really affect the "hi-lo-quest". As far as I see, the basic question is: Can we detect / react upon network congestion by latency observation? It is no big deal whether we look at the RTT or variance. We can even look at higher moments of RTT (skewness, curtosis), we can introduce quantiles and thresholds, we can use any formula TeX is able to print :-) The question is: Can we distinguish a loaded network from an unloaded one by (pure) latency observation / evaluation. Detlef > > -- > Sireen > -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From touch at ISI.EDU Tue Aug 16 16:29:25 2005 From: touch at ISI.EDU (Joe Touch) Date: Tue, 16 Aug 2005 16:29:25 -0700 Subject: [e2e] Latency Variation and Contention. In-Reply-To: <4301BE7F.9080107@tuhh.de> References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> Message-ID: <430276D5.1010706@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Sireen Habib Malik wrote: > Hi, > > Have not read the paper, however, I think that if, > > RTT = Round Trip Time, and > dRTT = variations in RTT, > > then "dRTT" is a weak/poor indicator of congestion. but a good indicator that congestion control will be hard to compute ;-) stability is f(dRTT), not f(RTT) RTT is a function of distance, in general dRTT is a function of the number of hops, in general Changes in the two - relative or absolute - don't seem to tell you much more than that, though. > A congestion signal based upon "dRTT/RTT" would give a much better idea, > relatively speaking. relative variance = variance/mean but noise is more closely correlated to variance than to relative variance, which makes sense if dRTT = variance what you are aiming at is SNR, i.e., 10log10(RTT/dRTT) Joe > > -- > Sireen > > > > > > > > > > > Detlef Bosau wrote: > >> Hi to all. >> >> Recently, I found the following paper by Sherif M. ElRakabawy, >> Alexander Klemm and Christoph Lindemann: >> >> http://mobicom.cs.uni-dortmund.de/publications/TCP-AP_MobiHoc05.pdf >> >> The paper proposes a congestion control algorithm for ad hoc networks. >> Perhaps, this paper is interesting within the context of our latency >> discussion. >> >> However, I?m not yet convinced of this work. 
>> >> If I leave out some sheets of paper, some simulations and many words, >> the paper basically assumes that in ad hoc networks a TCP sender can >> measurethe degree of network contention using the variance of >> (recently seen) round trip times: >> >> -If the variance is close to zero, the network is hardly loaded. >> -If the variance is "high" (of course "high" is to be defined) there >> is a high degree of contention on this network. >> >> Afterwards the authors propose a sender pacing scheme, where a TCP >> flow?s rate is decreased with respect to the so measured "degree of >> contention". >> >> What I do not yet understand is basic assumption: variance 0 <=> no >> load; variance high <=> heavy load. >> >> Perhaps the main difficulty is that I believed this myself for years >> and it was an admittedly difficult task to convince me that I was >> wrong %-) >> However, >> >> @article{martin, >> journal = " IEEE/ACM TRANSACTIONS ON NETWORKING", >> volume ="11", >> number = "3", >> month = "June", >> year = "2003", >> title = "Delay--Based Congestion Avoidance for TCP", >> author = "Jim Martin and Arne Nilsson and Injong Rhee", >> } >> eventually did the job. >> >> More precisely, I looked at the latencies themselves, not the variances. >> >> >> Let?s consider a simple example. >> >> A network B >> >> "network" is some shared media packet switching network. >> Let?s place a TCP sender on A and the according sink on B. >> >> The simple question is (and I thought about this years ago without >> really coming to an end - I?m afraid I didn?t want to): >> >> Is a variance close to zero really equivalent for a low load situation? >> And does increasing variance indicate increasing load? >> >> Isn?t it possible that a variance close to zero is a consequence of a >> fully loaded network? And _decreasing_ load in that situation would >> cause the latencies to vary? >> >> If we could reliably identify a low load situation from a varaince >> close to zero, we could use the latencies themselves as a load >> indicator because we could reliably identify a "no load latency" and >> thus could identify imminent congestion by latency observation. >> >> One could even think of a "latency-congestion scale" which is >> calibrated first by variance observation in order to get the >> "unloaded" mark and second by drop observation and some loss >> differentation technique to get the "imminent congestion" mark. >> >> To my knowledge, this is extensively discussed in literature - until >> Martin, Nilsson and Rhee found the mentioned results. >> >> Now, back to my example and the basic question: Does the assumption, >> latency variations indicate the degree of contention in an ad hoch >> network, really hold? >> >> I admit, I personally do not yet see an evidence for this. >> >> Detlef > > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDAnbVE5f5cImnZrsRAtC9AKDYkULbaAz4y93+Ym5iIuv/rVZEWgCfW5vy MELJpDvHjw5QDGjl4dDUtLU= =lMcl -----END PGP SIGNATURE----- From s.malik at tuhh.de Wed Aug 17 02:36:41 2005 From: s.malik at tuhh.de (Sireen Habib Malik) Date: Wed, 17 Aug 2005 11:36:41 +0200 Subject: [e2e] Latency Variation and Contention. 
In-Reply-To: <430276D5.1010706@isi.edu> References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> <430276D5.1010706@isi.edu> Message-ID: <43030529.3050506@tuhh.de> Hi, >>what you are aiming at is SNR, i.e., 10log10(RTT/dRTT) So we are getting somewhere now :-) Right. SNR is the signal strength normalized to the noise strength. For dRTT=0, SNR=f(RTT/dRTT)=infinite. I considered "congestion" as the noise strength normalized to the signal strength. For dRTT=0, congestion signal based upon dRTT/RTT= f(dRTT/RTT)= zero = no congestion. So I reckon a congestion signal that looks like 1/(10log10(RTT/dRTT)) should do the trick. -- Sireen Joe Touch wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA1 > > > >Sireen Habib Malik wrote: > > >>Hi, >> >>Have not read the paper, however, I think that if, >> >>RTT = Round Trip Time, and >>dRTT = variations in RTT, >> >>then "dRTT" is a weak/poor indicator of congestion. >> >> > >but a good indicator that congestion control will be hard to compute ;-) > >stability is f(dRTT), not f(RTT) > >RTT is a function of distance, in general >dRTT is a function of the number of hops, in general > >Changes in the two - relative or absolute - don't seem to tell you much >more than that, though. > > > >>A congestion signal based upon "dRTT/RTT" would give a much better idea, >>relatively speaking. >> >> > >relative variance = variance/mean > >but noise is more closely correlated to variance than to relative >variance, which makes sense if dRTT = variance > >what you are aiming at is SNR, i.e., 10log10(RTT/dRTT) > >Joe > > > >>-- >>Sireen >> >> >> >> >> >> >> >> >> >> >>Detlef Bosau wrote: >> >> >> >>>Hi to all. >>> >>>Recently, I found the following paper by Sherif M. ElRakabawy, >>>Alexander Klemm and Christoph Lindemann: >>> >>>http://mobicom.cs.uni-dortmund.de/publications/TCP-AP_MobiHoc05.pdf >>> >>>The paper proposes a congestion control algorithm for ad hoc networks. >>>Perhaps, this paper is interesting within the context of our latency >>>discussion. >>> >>>However, I?m not yet convinced of this work. >>> >>>If I leave out some sheets of paper, some simulations and many words, >>>the paper basically assumes that in ad hoc networks a TCP sender can >>>measurethe degree of network contention using the variance of >>>(recently seen) round trip times: >>> >>>-If the variance is close to zero, the network is hardly loaded. >>>-If the variance is "high" (of course "high" is to be defined) there >>>is a high degree of contention on this network. >>> >>>Afterwards the authors propose a sender pacing scheme, where a TCP >>>flow?s rate is decreased with respect to the so measured "degree of >>>contention". >>> >>>What I do not yet understand is basic assumption: variance 0 <=> no >>>load; variance high <=> heavy load. >>> >>>Perhaps the main difficulty is that I believed this myself for years >>>and it was an admittedly difficult task to convince me that I was >>>wrong %-) >>>However, >>> >>> @article{martin, >>> journal = " IEEE/ACM TRANSACTIONS ON NETWORKING", >>> volume ="11", >>> number = "3", >>> month = "June", >>> year = "2003", >>> title = "Delay--Based Congestion Avoidance for TCP", >>> author = "Jim Martin and Arne Nilsson and Injong Rhee", >>> } >>>eventually did the job. >>> >>>More precisely, I looked at the latencies themselves, not the variances. >>> >>> >>>Let?s consider a simple example. >>> >>> A network B >>> >>>"network" is some shared media packet switching network. 
>>>Let?s place a TCP sender on A and the according sink on B. >>> >>>The simple question is (and I thought about this years ago without >>>really coming to an end - I?m afraid I didn?t want to): >>> >>>Is a variance close to zero really equivalent for a low load situation? >>>And does increasing variance indicate increasing load? >>> >>>Isn?t it possible that a variance close to zero is a consequence of a >>>fully loaded network? And _decreasing_ load in that situation would >>>cause the latencies to vary? >>> >>>If we could reliably identify a low load situation from a varaince >>>close to zero, we could use the latencies themselves as a load >>>indicator because we could reliably identify a "no load latency" and >>>thus could identify imminent congestion by latency observation. >>> >>>One could even think of a "latency-congestion scale" which is >>>calibrated first by variance observation in order to get the >>>"unloaded" mark and second by drop observation and some loss >>>differentation technique to get the "imminent congestion" mark. >>> >>>To my knowledge, this is extensively discussed in literature - until >>>Martin, Nilsson and Rhee found the mentioned results. >>> >>>Now, back to my example and the basic question: Does the assumption, >>>latency variations indicate the degree of contention in an ad hoch >>>network, really hold? >>> >>>I admit, I personally do not yet see an evidence for this. >>> >>>Detlef >>> >>> >> >> >> >> >-----BEGIN PGP SIGNATURE----- >Version: GnuPG v1.2.4 (MingW32) >Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > >iD8DBQFDAnbVE5f5cImnZrsRAtC9AKDYkULbaAz4y93+Ym5iIuv/rVZEWgCfW5vy >MELJpDvHjw5QDGjl4dDUtLU= >=lMcl >-----END PGP SIGNATURE----- > > -- M.Sc.-Ing. Sireen Malik Communication Networks Hamburg University of Technology FSP 4-06 (room 5.012) Schwarzenbergstrasse 95 (IVD) 21073-Hamburg, Deutschland Tel: +49 (40) 42-878-3443 Fax: +49 (40) 42-878-2941 E-Mail: s.malik at tuhh.de --Everything should be as simple as possible, but no simpler (Albert Einstein) From detlef.bosau at web.de Wed Aug 17 04:13:32 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 17 Aug 2005 13:13:32 +0200 Subject: [e2e] Latency Variation and Contention. References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> <430276D5.1010706@isi.edu> <43030529.3050506@tuhh.de> Message-ID: <43031BDC.1000608@web.de> Your comments are both, helpful and enlightning. Nevertheless, please, allow me to re-focus the discussion. The assertion made by ElRakbawy, Klemm and Lindemann is: Ass.1: Network contention can be measured by measuring the RTT variance. A small variance is equivalent to a low degree of contention and a high variance is equivalent to a high degree of contention. Assertions like these can be met in literature several times and it?s simply the question whether this assertion is true or not. Personally, I am in great doubt at this. It?s exactly what David P. Reed pointed out some weeks ago. Before building brittle constructs upon questionable assertions, it is important to have a solid _basis_. Here in Germany, we have a saying: "Das Fundament ist die Grundlage jeglicher Basis." I don?t know whether there exists an english equivalent, but this makes the very difference whether a space shuttle pilot coming home is busy with landing or busy with prayer. 
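A quick Python sketch of the quantities being juggled in this sub-thread -- mean RTT, dRTT (read here as the standard deviation of recent samples), Sireen's relative figure dRTT/RTT, and Joe's SNR-style 10*log10(RTT/dRTT). The sample lists are invented purely for illustration; nothing below comes from the paper under discussion.

    import math

    def rtt_figures(samples):
        """Summarize a window of RTT samples (seconds)."""
        n = len(samples)
        mean = sum(samples) / n
        var = sum((x - mean) ** 2 for x in samples) / n
        drtt = math.sqrt(var)              # "dRTT" taken as the std deviation
        rel = drtt / mean                  # the dRTT/RTT figure
        snr = float("inf") if drtt == 0 else 10 * math.log10(mean / drtt)  # 10log10(RTT/dRTT)
        return mean, drtt, rel, snr

    # Illustrative samples only: a quiet path versus a jittery one.
    for label, s in [("quiet", [0.050, 0.051, 0.050, 0.052]),
                     ("jittery", [0.050, 0.120, 0.060, 0.200])]:
        m, d, r, snr = rtt_figures(s)
        print(f"{label}: RTT={m*1000:.1f} ms  dRTT={d*1000:.1f} ms  dRTT/RTT={r:.2f}  SNR={snr:.1f} dB")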
Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From touch at ISI.EDU Wed Aug 17 07:34:08 2005 From: touch at ISI.EDU (Joe Touch) Date: Wed, 17 Aug 2005 07:34:08 -0700 Subject: [e2e] Latency Variation and Contention. In-Reply-To: <43030529.3050506@tuhh.de> References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> <430276D5.1010706@isi.edu> <43030529.3050506@tuhh.de> Message-ID: <43034AE0.7070000@isi.edu> Sireen Habib Malik wrote: > Hi, > > >>>what you are aiming at is SNR, i.e., 10log10(RTT/dRTT) > > So we are getting somewhere now :-) > > Right. SNR is the signal strength normalized to the noise strength. For > dRTT=0, SNR=f(RTT/dRTT)=infinite. > > I considered "congestion" as the noise strength normalized to the signal > strength. For dRTT=0, congestion signal based upon dRTT/RTT= > f(dRTT/RTT)= zero = no congestion. You can consider it the noise ratio, but why? There are other reasons that RTT can vary - multipath routing, in particular. All SNR does here is tell you how noisy the RTT is, which tells you how good you can run your feedback control (which is RTT-dependent). It doesn't tell you whether there is congestion, though. There may be a correlation in some systems, but it's not cause-effect. There are too many other causes for noisy RTTs. Joe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 254 bytes Desc: OpenPGP digital signature Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050817/c433745c/signature.bin From touch at ISI.EDU Wed Aug 17 07:37:06 2005 From: touch at ISI.EDU (Joe Touch) Date: Wed, 17 Aug 2005 07:37:06 -0700 Subject: [e2e] Latency Variation and Contention. In-Reply-To: <43031BDC.1000608@web.de> References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> <430276D5.1010706@isi.edu> <43030529.3050506@tuhh.de> <43031BDC.1000608@web.de> Message-ID: <43034B92.9040206@isi.edu> Detlef Bosau wrote: > Your comments are both, helpful and enlightning. > > Nevertheless, please, allow me to re-focus the discussion. > > The assertion made by ElRakbawy, Klemm and Lindemann is: > > Ass.1: Network contention can be measured by measuring the RTT > variance. A small variance is equivalent to a low degree of contention > and a high variance is equivalent to a high degree of contention. > > Assertions like these can be met in literature several times and it?s > simply the question whether this assertion is true or not. > > Personally, I am in great doubt at this. As am I. Multipath routing can cause it, i.e. All you know when the RTT is noisy is that the RTT is noisy, and then that anything that depends on the RTT (e.g., the window size) is necessarily imprecise. > It?s exactly what David P. Reed pointed out some weeks ago. > Before building brittle constructs upon questionable assertions, it is > important to have a solid _basis_. > > Here in Germany, we have a saying: "Das Fundament ist die Grundlage > jeglicher Basis." I don?t know whether there exists an english > equivalent, but this makes the very difference whether a space shuttle > pilot coming home is busy with landing or busy with prayer. > > Detlef Ours is "correlation != cause & effect". Gets at the same point, at the end of the day. 
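To make the doubt (and the multipath remark) concrete, a deliberately artificial Python sketch: a bottleneck queue pinned near its limit produces almost constant, i.e. near-zero-variance, RTTs under heavy load, while an idle connection whose packets alternate over two paths of different length produces large variance with no contention at all. Every number here is invented; the point is only that variance by itself does not read off load.

    import statistics

    base = 0.040  # propagation component, seconds (invented)

    # Case 1: bottleneck queue held near its limit -> queueing delay almost constant.
    saturated = [base + 0.100 + jitter for jitter in (0.000, 0.001, 0.000, 0.001, 0.000)]

    # Case 2: idle network, but packets alternate over two paths (multipath routing).
    multipath = [base if i % 2 == 0 else base + 0.030 for i in range(6)]

    for name, rtts in [("saturated FIFO", saturated), ("idle, two paths", multipath)]:
        print(name, "mean %.1f ms" % (statistics.mean(rtts) * 1e3),
              "stdev %.1f ms" % (statistics.stdev(rtts) * 1e3))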
Joe -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 254 bytes Desc: OpenPGP digital signature Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050817/1127d32f/signature.bin From faber at ISI.EDU Thu Aug 18 08:59:09 2005 From: faber at ISI.EDU (Ted Faber) Date: Thu, 18 Aug 2005 08:59:09 -0700 Subject: [e2e] Latency Variation and Contention. In-Reply-To: <43034B92.9040206@isi.edu> References: <387B5A9BF31B5D43A2B18DD9F326B8E1DA68AB@NJFPSRVEXG2KCL.research.att.com> <4301A9CC.8090103@web.de> <4301BE7F.9080107@tuhh.de> <430276D5.1010706@isi.edu> <43030529.3050506@tuhh.de> <43031BDC.1000608@web.de> <43034B92.9040206@isi.edu> Message-ID: <20050818155909.GC14126@pun.isi.edu> On Wed, Aug 17, 2005 at 07:37:06AM -0700, Joe Touch wrote: > Detlef Bosau wrote: > > Here in Germany, we have a saying: "Das Fundament ist die Grundlage > > jeglicher Basis." I don?t know whether there exists an english > > equivalent, but this makes the very difference whether a space shuttle > > pilot coming home is busy with landing or busy with prayer. > > Ours is "correlation != cause & effect". Gets at the same point, at the > end of the day. You're basically right, but lets be a little more precise. In any network that queues packets, in the absence of any other effects, the onset of congestion will result in an increase in the RTT of a given connection sampled over an RTT. This is a causal relationship: congestion causes RTT increases. There are at least 3 problems with using that observation to detect congestion: 1. Lots of other things (OS artifacts, route changes, wireless delays, ARQ) cause RTT variation. Just as with using packet loss as a congestion indication, a mistaken inference can cause a source to slow when unnecessary or speed up when unwarranted. All congestion causes RTT increases; not all increased RTTs indicate congestion. 2. Sometimes the change caused by congestion is too small to be reliably detected, even without the noise sources above. This can be because there are a lot of sources in a net near capacity or a lot of fixed delay on the path (queueing delay Earth to Mars might be hard to detect). Small queues also make this difficult, and if the recent SIGCOMM work on sizing routers is to be believed, small buffer sizes may become more common. 3. The queueing discipline in use can make detection of congestion related RTT increases, even without confounding noise in that signal and when the change is detectable, a matter of statistics. The amount of change in RTT that a source sees will be affected by how other packets are interleaved. A source can detect a small change in RTT in a byte-fair WFQ system much more quickly and reliably than in a FIFO system with varying packet sizes. Having to sample and analyze increases the work the sender does and slows the reaction time of sources. Certainly many systems have been proposed that ise RTT as a congestion indication, from Vegas through FAST to a bunch I've certainly lost track of. To use it as the only indication, successfully, in a rich network environment, requires addressing at least the problems above. There are also cases where the network environment is less rich and you can rule one or more of these out. Congestion causes RTT increases. Finding those RTT increases that are due to congestion can be tricky. -- Ted Faber http://www.isi.edu/~faber PGP: http://www.isi.edu/~faber/pubkeys.asc Unexpected attachment on this mail? 
See http://www.isi.edu/~faber/FAQ.html#SIG -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 187 bytes Desc: not available Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050818/4c027bfe/attachment.bin From keshav at uwaterloo.ca Thu Aug 18 09:15:46 2005 From: keshav at uwaterloo.ca (S. Keshav) Date: Thu, 18 Aug 2005 12:15:46 -0400 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 In-Reply-To: Message-ID: Detlef, >> The assertion made by ElRakbawy, Klemm and Lindemann is: >> >> Ass.1: Network contention can be measured by measuring the RTT >> variance. A small variance is equivalent to a low degree of contention >> and a high variance is equivalent to a high degree of contention. ... >> Personally, I am in great doubt at this. > RTT delay is influenced by the following factors: 1. Speed of light delay in the path 2. Retransmissions in the underlay 3. Queues in buffers due to a. self queueing (queueing behind your own packets) b. queueing due to cross traffic 4. The service rate of within a switch fabric in a router 5. The size of the packet whose RTT is measured Variance in the RTT can be due to variation in any of the above. So, if you want to measure contention, you have to do some things cleverly at the sender: keep packet size fixed send at a `slow' rate and also assume that paths are pinned there are no retransmissions in the underlay If these hold, then you can link RTT variation to contention. keshav From alokdube at hotpop.com Thu Aug 18 10:29:06 2005 From: alokdube at hotpop.com (Alok) Date: Thu, 18 Aug 2005 22:59:06 +0530 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: Message-ID: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> > >> Personally, I am in great doubt at this. > > > > RTT delay is influenced by the following factors: > > 1. Speed of light delay in the path > 2. Retransmissions in the underlay > 3. Queues in buffers due to > a. self queueing (queueing behind your own packets) > b. queueing due to cross traffic Do routers/ATM switches use queues for "congestion control" or because most of their cards and backplanes are asynchronous? From touch at ISI.EDU Thu Aug 18 13:23:42 2005 From: touch at ISI.EDU (Joe Touch) Date: Thu, 18 Aug 2005 13:23:42 -0700 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 In-Reply-To: References: Message-ID: <4304EE4E.7070804@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 S. Keshav wrote: > Detlef, > > >>>The assertion made by ElRakbawy, Klemm and Lindemann is: >>> >>>Ass.1: Network contention can be measured by measuring the RTT >>>variance. A small variance is equivalent to a low degree of contention >>>and a high variance is equivalent to a high degree of contention. > > ... > >>>Personally, I am in great doubt at this. >> > > RTT delay is influenced by the following factors: > > 1. Speed of light delay in the path > 2. Retransmissions in the underlay > 3. Queues in buffers due to > a. self queueing (queueing behind your own packets) > b. queueing due to cross traffic > 4. The service rate of within a switch fabric in a router > 5. The size of the packet whose RTT is measured > > Variance in the RTT can be due to variation in any of the above. 
> So, if you want to measure contention, you have to do some things cleverly > at the sender: > keep packet size fixed > send at a `slow' rate > and also assume that > paths are pinned > there are no retransmissions in the underlay and that the underlay hops have stable RTTs; non-geosync satellites have varying RTTs and the points about pinning and retransmissions apply to the link layers as well as to the network. > If these hold, then you can link RTT variation to contention. Yes - but when RTT variance goes up, it means that contention increased or decreased. It seems more useful to use the first derivative of the RTT than to use the variance, in that case. Joe > > keshav > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDBO5OE5f5cImnZrsRAkxsAJ4hffBVMajFvgKyj/3wEiSD/pEcSgCg8QNr jG//2Tuzz/lXCY6ZgMt2XWM= =Gmt9 -----END PGP SIGNATURE----- From touch at ISI.EDU Thu Aug 18 13:25:41 2005 From: touch at ISI.EDU (Joe Touch) Date: Thu, 18 Aug 2005 13:25:41 -0700 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 In-Reply-To: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> Message-ID: <4304EEC5.4050300@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alok wrote: >>>>Personally, I am in great doubt at this. >>> >>RTT delay is influenced by the following factors: >> >>1. Speed of light delay in the path >>2. Retransmissions in the underlay >>3. Queues in buffers due to >> a. self queueing (queueing behind your own packets) >> b. queueing due to cross traffic > > Do routers/ATM switches use queues for "congestion control" or because most > of their cards and backplanes are asynchronous? it depends on where the queues are: input queues help more for asynch backplanes/cards, as well as forwarding-based congestion (limits to header processing, e.g., for VPNs terminating IPsec) output queues are needed for output port contention congestion control, i.e., where the output link is the limiting factor Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDBO7FE5f5cImnZrsRAmdoAJ45G5GBo1JicYaRFo6ZQaAMm2eCOACgq66y zryzGiA8BDVfnDi//zugQM0= =ydER -----END PGP SIGNATURE----- From detlef.bosau at web.de Thu Aug 18 14:15:51 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 18 Aug 2005 23:15:51 +0200 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> Message-ID: <4304FA87.70108@web.de> Alok wrote: >>>>Personally, I am in great doubt at this. >>> >>RTT delay is influenced by the following factors: >> >>1. Speed of light delay in the path >>2. Retransmissions in the underlay >>3. Queues in buffers due to >> a. self queueing (queueing behind your own packets) >> b. queueing due to cross traffic > > > Do routers/ATM switches use queues for "congestion control" or because most > of their cards and backplanes are asynchronous? > > > To my understanding, queues have two purposes. 1. Rate adaptation, this includes adaptation of a flow to possible MAC delays. 2. Interleaving/Mixing of flows. Basically, these two are 3a and 3b in Keshav?s post. So, to answer your question: In a packet switching system congestion takes place in queues of store & forward nodes, especially when incoming and outgoing lines are asynchronous. I?m hesitant to make too much words here, because each word may be wrong. 
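For the textbook case the link between load and delay statistics can at least be written down. Under M/M/1 assumptions (Poisson arrivals, exponential service, one FIFO queue -- assumptions, not a claim about real routers), the sojourn time is exponentially distributed with rate mu - lambda, so its mean and its standard deviation both grow like 1/(mu - lambda) as utilisation rises. A tiny Python sketch:

    MU = 1000.0  # service rate, packets/s (illustrative)

    def mm1_sojourn(rho):
        """Mean and std deviation of time in system for an M/M/1 queue."""
        lam = rho * MU
        mean = 1.0 / (MU - lam)   # E[T] = 1/(mu - lambda)
        std = mean                # exponential distribution: std equals the mean
        return mean, std

    for rho in (0.1, 0.5, 0.9, 0.99):
        mean, std = mm1_sojourn(rho)
        print(f"rho={rho:4.2f}  E[T]={mean*1e3:6.2f} ms  std={std*1e3:6.2f} ms")

Note that this supports the intuition behind Ass.1 only under exactly these assumptions; a finite buffer pinned at its limit, multipath routing or link-layer ARQ all break the correspondence, which is what the thread is arguing about.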
A very helpful rerefence is Raj Jain?s paper "A Delay-Based Approach for Congestion Avoidance in Interconnected Heterogenous Computer Networks". I always found this work helpful to understand the role of switches/routers. For congestion control itself, there are two "extreme positions" and, as in most cases where extreme positions exist, combinationes and middle courses. The first position is a strict End to End approach: Routers don?t care about congestion. If a queue runs out of space, there?s no alternative left for a router than to discard a packet. In this extreme view: _Silently_ discard a packet. Consequently, end systems must react upon packet loss / congestion notification appropriately. Look at the congavoid paper for this approach. The second position is a continous control of each flow hop by hop. Spoken very simplified: We do traffic shaping on each node, in a well controlled manner. I think (I must be careful here, I had a glance at this quite a long time ago, so forgive me if I?m wrong or unprecise here) this appproach is discussed in Keshav?s PhD thesis. If we take the second position: Yes, routers and switches use queues for congestion control. For middle courses and approaches "in between" think of active queue managemet and RED. And of course quite a number of PEP approaches, which often interconnect packet switching networks where congestion control is difficult to achieve using identical algorithms, e.g. (error-)loss free networks and lossy networks as for example 802.11 networks. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From alokdube at hotpop.com Thu Aug 18 23:59:03 2005 From: alokdube at hotpop.com (Alok) Date: Fri, 19 Aug 2005 12:29:03 +0530 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> <4304FA87.70108@web.de> Message-ID: <009e01c5a48b$7eb6b640$070218ac@rs.riverstonenet.com> Inline => ----- Original Message ----- From: "Detlef Bosau" To: Sent: Friday, August 19, 2005 2:45 AM Subject: Re: [e2e] end2end-interest Digest, Vol 18, Issue 9 > Alok wrote: > >>>>Personally, I am in great doubt at this. > >>> > >>RTT delay is influenced by the following factors: > >> > >>1. Speed of light delay in the path > >>2. Retransmissions in the underlay > >>3. Queues in buffers due to > >> a. self queueing (queueing behind your own packets) > >> b. queueing due to cross traffic > > > > > > Do routers/ATM switches use queues for "congestion control" or because most > > of their cards and backplanes are asynchronous? > > > > > > > > > To my understanding, queues have two purposes. > > 1. Rate adaptation, this includes adaptation of a flow to possible MAC > delays. Which means you have a bandwidth gradient and you buffer to handle the gradient. Which again means you have to "work on windows" and "buffer for windows" as far as TCP is concerned Simply put: ------->10Mbps---->R1----->1Mbps---> Implies R1 has to buffer , and the buffer size can be *finite* only if the traffic has a window/burst size is finite. > > 2. Interleaving/Mixing of flows. > Let me put the question in a simpler manner, assume no TOS/DSCP, why does one need queues at all???? The only time you can do a buffer is if there is an window on top capping ur burst For example, if the 10Meg guy pumps UDP at 10Meg continuously, no amount of buffering is going to help you on the 1Meg link. 
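The 10 Mbps -> R1 -> 1 Mbps picture can be put into numbers. For one window-limited flow, whatever part of the window does not fit into the slow link's pipe stands in R1's queue, so a buffer of roughly W minus the bottleneck's bandwidth-delay product suffices -- finite exactly because W is finite. A back-of-the-envelope Python sketch; all figures are assumed for illustration:

    C_out = 1e6          # bottleneck rate, bit/s
    rtt_prop = 0.05      # propagation RTT, s (assumed)
    bdp = C_out * rtt_prop / 8.0          # bytes the 1 Mbps pipe itself can hold

    for window in (8_000, 64_000, 256_000):          # sender window, bytes
        standing_queue = max(0.0, window - bdp)      # bytes parked at R1
        queue_delay = standing_queue * 8.0 / C_out   # extra delay they cause
        print(f"W={window/1000:5.0f} kB  queue at R1={standing_queue/1000:6.1f} kB"
              f"  added delay={queue_delay*1e3:6.0f} ms")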
As far as I understand, queues are only as the inherent architecture is async, Say I have 1M----| |--------1M 1M----| switching element |--------1M 1M----| |--------1M all my switching element needs to be able to do is to switch at round robin at 6*1M ...right? Now where and why do I need the queues? Only reason that comes to mind is the async. nature (each 1M is not clocked by the same clock etc), but the queue size still does not need to be that high, does it? -thanks Alok From detlef.bosau at web.de Fri Aug 19 06:27:31 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 19 Aug 2005 15:27:31 +0200 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> <4304FA87.70108@web.de> <009e01c5a48b$7eb6b640$070218ac@rs.riverstonenet.com> Message-ID: <4305DE43.6060305@web.de> Alok wrote: >>To my understanding, queues have two purposes. >> >>1. Rate adaptation, this includes adaptation of a flow to possible MAC >>delays. > > > Which means you have a bandwidth gradient and you buffer to handle the > gradient. Yes. > Which again means you have to "work on windows" and "buffer for windows" as > far as TCP is concerned > > Simply put: > ------->10Mbps---->R1----->1Mbps---> > > Implies R1 has to buffer , and the buffer size can be *finite* only if the > traffic has a window/burst size is finite. > > Yes. It?s interesting. Some weeks ago, I got criticism why I beat the conservation principle drum here. taram! taram! tataram! I beat the con-principle drum! I can only repeat it again and again: Exactly _this_ is the purporse of ACK pacing and the conservation principle in TCP. A flow must not have more packets in transit than the congestion window allows (the "equilibrium window") and a packet must not be sent to the network until some other packet was taken away. _This_ and nothing else limits the "energy" put into the network (the analogy to physics is obvious: We talk about energy conservation, impulse conservation, sometimes I think, Van Jacobson and Sir Isaac are best friends :-)) and hence bursts, oscillation etc. are limited. Recall the Takoma bridge disaster, make the wind to stop blowing - the Takoma bridge may oscillate to eternity, but at least it was still there. > >>2. Interleaving/Mixing of flows. >> > > > Let me put the question in a simpler manner, > > assume no TOS/DSCP, why does one need queues at all???? The simple answer is: We do not need them. The more complex answer can be found e.g. in Jains "Delay" paper: Limited queues with a length thoroughly thought through can improve network performance. > The only time you can do a buffer is if there is an window on top capping ur > burst > Not quite. Think of RED. > For example, if the 10Meg guy pumps UDP at 10Meg continuously, no amount of > buffering is going to help you on the 1Meg link. > But this guy is really misbehaved: He is not responsive. Responsiveness is no part of UDP. Therefore, the application is responsible for responsiveness here. Admittedly, people forget about this quite often. It?s not an academic example, but on the support newsgroup of my ISP some guys recently detected ping. Ping. PING. PIIIIIIIIIIIIIIIIIIIIII............... ..................................... ................................................................................... Oh, you miss the rest of my post? The reason is simple: "NG" is yet to come. So, once again I take my drum, taram, taram, tataram..... Perhaps I can join a parade? 
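Since RED was mentioned a few lines up: the core of the idea is an EWMA of the queue length plus a drop (or mark) probability that ramps up between two thresholds, so senders are nudged before the queue actually overflows. A stripped-down Python sketch -- the parameters are common illustrative values, and the count-based probability correction of the original RED paper is left out:

    import random

    class SimpleRED:
        """Stripped-down RED: EWMA of the queue length plus a ramped drop probability."""
        def __init__(self, min_th=5, max_th=15, max_p=0.1, w_q=0.002):
            self.min_th, self.max_th, self.max_p, self.w_q = min_th, max_th, max_p, w_q
            self.avg = 0.0

        def arrive(self, queue_len):
            """Return True if the arriving packet should be dropped (or marked)."""
            self.avg = (1 - self.w_q) * self.avg + self.w_q * queue_len
            if self.avg < self.min_th:
                return False                  # short on average: accept
            if self.avg >= self.max_th:
                return True                   # persistently long: drop
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < p        # in between: drop with ramped probability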
The Internet still remains a well behaved community. Some administrators block ping. The guys on my ISP?s newsgroup call those administrators bad guys. I recall: "Good fences make good neighbours". > As far as I understand, queues are only as the inherent architecture is > async, Even that would not _require_ a queue. Think of Ethernet. What else is a "congestion" than a "collision", when there is no queueing on the router? So, if we had no queues, the Internet would run. Perhaps the throughput could be somewhat higher, perhaps the way the Internet runs would be more similar to a turtle than to Achilles - but who cares? Isn?t there still snail mail delievered sent by soldiers who served with General Custer? However, too large a queue can have the same effect. > > Say I have > > 1M----| |--------1M > 1M----| switching element |--------1M > 1M----| |--------1M > > > all my switching element needs to be able to do is to switch at round robin > at 6*1M ...right? Right. > Now where and why do I need the queues? Only reason that comes to mind is > the async. nature (each 1M is not clocked by the same clock etc), but the > queue size still does not need to be that high, does it? Exactly. And even no queuing (called "cut through switching" in the good old days from the past) would work. But then, packets arriving at the switch at the same time would result in the same effect as collisions. However, this debate was conducted in the eighties. So, I?m curious why some people buried tons of queueing memory in routers during the last ten years (perhaps the disaster in Cobe was overcome and now there was some amount of memory chips to be sold) and recently, researchers detect that small queues could be useful. Queues should be small. IIRC, this is exactly what John Nagle, Raj Jain and perhaps countless others told us twenty years ago. However, in extremely asnchronous situations, think of mobile wireless networks connected to the Internet, a reasonable amount of queuing is unevitable. I got a paper submission rejected this year with the enlightning comment "overqueing is bad, refer to Reiner Ludwigs PhD dissertation". I know Reiner Ludwigs PhD dissertation. When he claims, overqueueing is bad, he is perfectly right as all the researchers before. It?s really an old story. However, when service times oscillate from milliseconds to _minutes_(!) at the last mile (refer to the relevant ETSI/ITU standards for GPRS before calling me nuts), traffic might happen to be a little bursty if not equalized by queues and appropriate techniques. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From alokdube at hotpop.com Fri Aug 19 11:22:46 2005 From: alokdube at hotpop.com (Alok) Date: Fri, 19 Aug 2005 23:52:46 +0530 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> <4304FA87.70108@web.de> <009e01c5a48b$7eb6b640$070218ac@rs.riverstonenet.com> <4305DE43.6060305@web.de> Message-ID: <010f01c5a4eb$020132f0$6401a8c0@rs.riverstonenet.com> ok..either u have been smoking stuff.. or this is one helluva email :o) ----- Original Message ----- From: "Detlef Bosau" To: "Alok" Cc: Sent: Friday, August 19, 2005 6:57 PM Subject: Re: [e2e] end2end-interest Digest, Vol 18, Issue 9 Alok wrote: >>To my understanding, queues have two purposes. >> >>1. Rate adaptation, this includes adaptation of a flow to possible MAC >>delays. 
> > > Which means you have a bandwidth gradient and you buffer to handle the > gradient. Yes. > Which again means you have to "work on windows" and "buffer for windows" as > far as TCP is concerned > > Simply put: > ------->10Mbps---->R1----->1Mbps---> > > Implies R1 has to buffer , and the buffer size can be *finite* only if the > traffic has a window/burst size is finite. > > Yes. It?s interesting. Some weeks ago, I got criticism why I beat the conservation principle drum here. taram! taram! tataram! I beat the con-principle drum! I can only repeat it again and again: Exactly _this_ is the purporse of ACK pacing and the conservation principle in TCP. Alok=> okie! A flow must not have more packets in transit than the congestion window allows (the "equilibrium window") and a packet must not be sent to the network until some other packet was taken away. Alok=> ahh!! and how do we "know that"?? _This_ and nothing else limits the "energy" put into the network (the analogy to physics is obvious: We talk about energy conservation, impulse conservation, sometimes I think, Van Jacobson and Sir Isaac are best friends :-)) and hence bursts, oscillation etc. are limited. Alok=> ? so? Recall the Takoma bridge disaster, make the wind to stop blowing - the Takoma bridge may oscillate to eternity, but at least it was still there. Alok=> :-) if u can find the freq, it will still beat! > >>2. Interleaving/Mixing of flows. >> > > > Let me put the question in a simpler manner, > > assume no TOS/DSCP, why does one need queues at all???? The simple answer is: We do not need them. The more complex answer can be found e.g. in Jains "Delay" paper: Limited queues with a length thoroughly thought through can improve network performance. Alok=> My ability to read is limited. > The only time you can do a buffer is if there is an window on top capping ur > burst > Not quite. Think of RED. Alok==> how so? > For example, if the 10Meg guy pumps UDP at 10Meg continuously, no amount of > buffering is going to help you on the 1Meg link. > But this guy is really misbehaved: He is not responsive. Responsiveness is no part of UDP. Therefore, the application is responsible for responsiveness here. Admittedly, people forget about this quite often. It?s not an academic example, but on the support newsgroup of my ISP some guys recently detected ping. Ping. PING. PIIIIIIIIIIIIIIIIIIIIII............... ..................................... ............................................................................ ....... Oh, you miss the rest of my post? The reason is simple: "NG" is yet to come. So, once again I take my drum, taram, taram, tataram..... Perhaps I can join a parade? The Internet still remains a well behaved community. Alok=> no doubts about that ;-) Some administrators block ping. The guys on my ISP?s newsgroup call those administrators bad guys. I recall: "Good fences make good neighbours". Alok=> good chics too... > As far as I understand, queues are only as the inherent architecture is > async, Even that would not _require_ a queue. Think of Ethernet. What else is a "congestion" than a "collision", when there is no queueing on the router? Alok=> depends. A collision is the inablity to send something due to a media limitation, and *note*, the end host "orginiating" the packet experinces it in the case of collision So, if we had no queues, the Internet would run. Perhaps the throughput could be somewhat higher, perhaps the way the Internet runs would be more similar to a turtle than to Achilles - but who cares? 
Isn?t there still snail mail delievered sent by soldiers who served with General Custer? However, too large a queue can have the same effect. Alok=> define "too large" > > Say I have > > 1M----| |--------1M > 1M----| switching element |--------1M > 1M----| |--------1M > > > all my switching element needs to be able to do is to switch at round robin > at 6*1M ...right? Right. > Now where and why do I need the queues? Only reason that comes to mind is > the async. nature (each 1M is not clocked by the same clock etc), but the > queue size still does not need to be that high, does it? Exactly. And even no queuing (called "cut through switching" in the good old days from the past) would work. But then, packets arriving at the switch at the same time would result in the same effect as collisions. Alok=> ok. where would you "drop" them is the fundamental question. on an IS? then uve already wasted b/w and queues of an IS for no reason (remember...everything is e2e) However, this debate was conducted in the eighties. So, I?m curious why some people buried tons of queueing memory in routers during the last ten years (perhaps the disaster in Cobe was overcome and now there was some amount of memory chips to be sold) and recently, researchers detect that small queues could be useful. Alok=>$$ is a good reason ;-) Queues should be small. IIRC, this is exactly what John Nagle, Raj Jain and perhaps countless others told us twenty years ago. Alok=> Yep except i lost a bit on nagle's theorem when he kinda didnt wrap around the window. However, in extremely asnchronous situations, think of mobile wireless networks connected to the Internet, a reasonable amount of queuing is unevitable. Alok=> :-) they are good to steal other's passwds when sitting at an airport with nothing to do :-) I got a paper submission rejected this year with the enlightning comment "overqueing is bad, refer to Reiner Ludwigs PhD dissertation". I know Reiner Ludwigs PhD dissertation When he claims, overqueueing is bad, he is perfectly right as all the researchers before. It?s really an old story. Alok=> yep................ but wrap around the window..right? However, when service times oscillate from milliseconds to _minutes_(!) at the last mile (refer to the relevant ETSI/ITU standards for GPRS before calling me nuts), traffic might happen to be a little bursty if not equalized by queues and appropriate techniques. Alok=> My inability to read does wonders... ;-) Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Fri Aug 19 14:05:45 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 19 Aug 2005 23:05:45 +0200 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> <4304FA87.70108@web.de> <009e01c5a48b$7eb6b640$070218ac@rs.riverstonenet.com> <4305DE43.6060305@web.de> <010f01c5a4eb$020132f0$6401a8c0@rs.riverstonenet.com> Message-ID: <430649A9.3030007@web.de> Alok wrote: > > > A flow must not have more packets in transit than the congestion window > allows (the "equilibrium window") and a packet must not be sent to the > network until some other packet was taken away. > > Alok=> ahh!! and how do we "know that"?? A sender knows this from the acknowledgements. 
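The "how do we know that" / "from the acknowledgements" exchange is self-clocking, and it fits in a few lines. A toy Python sender that refuses to put a new packet into the network until an acknowledgement has taken an old one out; cwnd and the transmit callback are placeholders, not an implementation of TCP:

    class SelfClockedSender:
        """Toy illustration of the conservation principle / ACK clocking."""
        def __init__(self, cwnd):
            self.cwnd = cwnd          # equilibrium window, in packets
            self.in_flight = 0        # packets sent but not yet acknowledged
            self.next_seq = 0

        def try_send(self, transmit):
            # Send only while the window has room: one in, one out.
            while self.in_flight < self.cwnd:
                transmit(self.next_seq)
                self.next_seq += 1
                self.in_flight += 1

        def on_ack(self, transmit):
            # An ACK removes a packet from the network and frees one slot.
            self.in_flight -= 1
            self.try_send(transmit)

    sender = SelfClockedSender(cwnd=4)
    sender.try_send(lambda seq: print("send", seq))   # bursts out the initial window
    sender.on_ack(lambda seq: print("send", seq))     # each ACK clocks out one more packet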
> > _This_ and nothing else limits the "energy" put into the network (the > analogy to physics is obvious: We talk about energy conservation, > impulse conservation, sometimes I think, Van Jacobson and Sir Isaac are > best friends :-)) and hence bursts, oscillation etc. are limited. > > Alok=> ? so? > > > Recall the Takoma bridge disaster, make the wind to stop blowing - the > Takoma bridge may oscillate to eternity, but at least it was still there. > > Alok=> :-) if u can find the freq, it will still beat! > > So what? As long as it does not _break_ it may beat! As long as we have no congestion collapse, there is no problem with queue oscillation. Of course, there may be a problem with RTT estimation, which was the original topic of this thread. However, when we have small queues and perhaps queueing delays turn to be neglectible compared to propagation delays, RTT estimation becomes easier than today. > > The more complex answer can be found e.g. in Jains "Delay" paper: > Limited queues with a length thoroughly thought through can improve > network performance. > > > Alok=> My ability to read is limited. I apologize. Perhaps we should send you posts in mp3 format? =8-) I admit, I often write too long posts. However, the issue is extremely difficult. So, i can?t put too short. (Recall Sireens signature and the Einstein quote.) > > > Not quite. Think of RED. > > > Alok==> how so? Some RED disciplines randomly discard packets even when there is no actual queue overun in order to limit oscillation and increase stability. > > Even that would not _require_ a queue. Think of Ethernet. What else is a > "congestion" than a "collision", when there is no queueing on the router? > > Alok=> depends. A collision is the inablity to send something due to a media > limitation, and *note*, the end host "orginiating" the packet experinces it > in the case of collision Not quite. Recall Davids recent post. In 802.11 ad hoc nets a collision results in a silent "discard" exactly as a congestion. This perfectly makes sense: Both, a media limitation and a queue limitation, is a limitation. Some part of the network can not convey the incoming packet. > > > So, if we had no queues, the Internet would run. Perhaps the throughput > could be somewhat higher, perhaps the way the Internet runs would be > more similar to a turtle than to Achilles - but who cares? Isn?t there > still snail mail delievered sent by soldiers who served with General Custer? > > However, too large a queue can have the same effect. > > Alok=> define "too large" > That?s the million dollar question. Especially as a TCP window is limited to 64 kBytes by default. However, if one would follow the "advice" of some "bright" network consultant I read recently, we should play around with window scaling in LANs to improve performance (God in Heaven!). Imagine a TCP sender scaled to AWND units of 1 Megabyte (we will _really_ imrpove performance). So imagine, a TCP sender has an actual window of 2 Megabyte and a router would support this. We would introduce a single trip e2e latency of nearly one second here - from one floor in a building to the other. This is not really what we want to do. In addition, in practical networks the vast majority of flows are short timed flows, so a routers memory is not occupied because there is not enough data in the flow. Hoever, theoretically (refer e.g to Jains paper) too large a buffer can simply bring down a flow?s throughput to _zero_. 
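The window-scaling example is easy to redo numerically: whatever part of the window exceeds the path's bandwidth-delay product becomes a standing queue, and the delay it adds is that excess divided by the bottleneck rate. A sketch with assumed rates (the 2 MB window is the figure from the example above):

    def added_delay(window_bytes, rate_bps, rtt_prop_s):
        """Extra queueing delay caused by a window larger than the pipe."""
        bdp = rate_bps * rtt_prop_s / 8.0               # bytes the path itself holds
        excess = max(0.0, window_bytes - bdp)           # bytes forced into the queue
        return excess * 8.0 / rate_bps                  # seconds of standing delay

    W = 2 * 1024 * 1024                                 # 2 MB window, as in the example
    for rate in (10e6, 100e6):                          # assumed bottleneck rates
        d = added_delay(W, rate, rtt_prop_s=0.001)      # ~1 ms propagation, e.g. a LAN
        print(f"{rate/1e6:.0f} Mb/s: about {d:.2f} s of standing queueing delay")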
This is extremely hard to imagine: A sender?s window may increase beyond all limits, so does a bottleneck queue and so the time for a packet to stay in the queue may increase beyound all limits as well. I must correct the above. It?s not the infinite queueing space which brings the flow to the ground but the _window_ size. But this exactly results from unlimited queueing space if you don?t put an upper limit to a TCP sender?s window. To put Jain and Nagle short: They investigated the behaviour of packet switching networks with unlimited queues - and came to the advice: Make the queues short. > > I got a paper submission rejected this year with the enlightning comment > "overqueing is bad, refer to Reiner Ludwigs PhD dissertation". > I know Reiner Ludwigs PhD dissertation > When he claims, overqueueing is bad, he is perfectly right as all the > researchers before. It?s really an old story. > > > Alok=> yep................ but wrap around the window..right? I lost you. BTW: I do not talk about "Nagles algorithm" here but primarily of papers like: "On Packet Switches With Infinite Storage" from 1987. So basically, we do not even talk about TCP here. > > However, when service times oscillate from milliseconds to _minutes_(!) > at the last mile (refer to the relevant ETSI/ITU standards for GPRS > before calling me nuts), traffic might happen to be a little bursty if > not equalized by queues and appropriate techniques. > > > Alok=> My inability to read does wonders... ;-) I see. But my posts are a good practice. ITU standards are _much_ longer :-) Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From alokdube at hotpop.com Fri Aug 19 14:26:50 2005 From: alokdube at hotpop.com (Alok) Date: Sat, 20 Aug 2005 02:56:50 +0530 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: <022c01c5a41a$5878e4b0$6401a8c0@rs.riverstonenet.com> <4304FA87.70108@web.de> <009e01c5a48b$7eb6b640$070218ac@rs.riverstonenet.com> <4305DE43.6060305@web.de> <010f01c5a4eb$020132f0$6401a8c0@rs.riverstonenet.com> <430649A9.3030007@web.de> Message-ID: <000801c5a504$b80fc2a0$6401a8c0@rs.riverstonenet.com> > > Alok=> My inability to read does wonders... ;-) I see. But my posts are a good practice. ITU standards are _much_ longer :-) Alok==> U win! From detlef.bosau at web.de Sat Aug 20 08:57:18 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 20 Aug 2005 17:57:18 +0200 Subject: [e2e] end2end-interest Digest, Vol 18, Issue 9 References: Message-ID: <430752DE.DFEA69BF@web.de> "S. Keshav" wrote: > > > RTT delay is influenced by the following factors: > > 1. Speed of light delay in the path > 2. Retransmissions in the underlay > 3. Queues in buffers due to > a. self queueing (queueing behind your own packets) > b. queueing due to cross traffic > 4. The service rate of within a switch fabric in a router > 5. The size of the packet whose RTT is measured > > Variance in the RTT can be due to variation in any of the above. > So, if you want to measure contention, you have to do some things cleverly > at the sender: > keep packet size fixed > send at a `slow' rate > and also assume that > paths are pinned > there are no retransmissions in the underlay > > If these hold, then you can link RTT variation to contention. > > keshav Just to see, whether I understood you correctly. The packet size is fixed => serialization delay is constand and hopefully (nearly) the service times. No retransmissions and pinned paths are clear. 
Slow rate => There is no self queueing, any queuing is due to cross traffic. In other terms: You make sure that any RTT variation is only due to cross traffic. Right? Now, even the "low rate" requires explicit knowledge of the network and can hardly achieved along an unknown path. In addition, "cross traffic" may not be "cross traffic" but in fact _traffic_. On the street. Thinks like cars, motorcycles. As traffic signs, buildings etc., this influences the properties of a wireless channel. At least in a mobile wireless network, this is the reason why error recovery in the underlay is inevitable. So, I presume you basically agree that using RTT variation as a universal means for contention estimation is at least questionable. Is this correct? IIRC, the paper from Lindemann?s group does not mention mobility. However, I don?t remember a paper or talk, where ad hoc net users are supposed to stay in quiet and motionless medidation. Anybody is interesed in mobile ad hoc networks today. So, I would like to sharpen my question a bit: Can this approach be made to work with reasonable effort? Or should it be abandoned, beause it is not really promising? This is a hard question, I know. But for horse?s and rider?s benefit still the old saying holds true: "If you discover that you?re riding a dead horse, dismount." Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From david.hagel at gmail.com Sun Aug 21 15:15:08 2005 From: david.hagel at gmail.com (David Hagel) Date: Sun, 21 Aug 2005 18:15:08 -0400 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: Message-ID: I was wondering what are the typical coast-to-coast propagation and queuing delays observed by today's backbone networks in North America. Is there any data/study which provides a breakdown of different components of such end-to-end delays in today's backbone networks? Thanks, David From dpreed at reed.com Sun Aug 21 20:44:54 2005 From: dpreed at reed.com (David P. Reed) Date: Sun, 21 Aug 2005 23:44:54 -0400 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: Message-ID: <43094A36.1040402@reed.com> I can repeatably easily measure 40 msec. coast-to-coast (Boston-LA), of which around 25 msec. is accounted for by speed of light in fiber (which is 2/3 of speed of light in vacuum, *299,792,458 m s^-1 *, because the refractive index of fiber is approximately 1.5 or 3/2). So assume 2e8 m/s as the speed of light in fiber, 1.6e3 m/mile, and you get 1.25e5 mi/sec. The remaining 15 msec. can be accounted for by the fiber path not being straight line, or by various "buffering delays" (which include queueing delays, and scheduling delays in the case where frames are scheduled periodically and you have to wait for the next frame time to launch your frame). Craig Partridge and I have debated (offline) what the breakdown might actually turn out to be (he thinks the total buffering delay is only 2-3 msec., I think it's more like 10-12), and it would be quite interesting to get more details, but that would involve delving into the actual equipment deployed and its operating modes. From mallman at icir.org Fri Aug 19 08:46:49 2005 From: mallman at icir.org (Mark Allman) Date: Fri, 19 Aug 2005 11:46:49 -0400 Subject: [e2e] pam 2006 cfp Message-ID: <20050819154649.CD7B3335C1D@lawyers.icir.org> An embedded and charset-unspecified text was scrubbed... 
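David Reed's budget, redone as a few lines of Python. The fibre speed (about 2e8 m/s, i.e. 1.25e5 mi/s) and the 40 ms figure are taken from his post; the 3,000-mile route length is an assumption, and with it the split comes out close to the 25 ms / 15 ms he quotes:

    SPEED_MI_PER_S = 1.25e5     # mi/s of light in fibre, as derived in the post
    route_miles = 3_000         # assumed Boston-LA fibre route length (not from the post)
    measured_ms = 40.0          # the coast-to-coast figure quoted in the post

    light_ms = route_miles / SPEED_MI_PER_S * 1e3
    print(f"speed-of-light share: {light_ms:.0f} ms")
    print(f"left for path stretch, store-and-forward, buffering: {measured_ms - light_ms:.0f} ms")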
Name: not available Url: http://www.postel.org/pipermail/end2end-interest/attachments/20050819/15aa9649/attachment.ksh -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 185 bytes Desc: not available Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050819/15aa9649/attachment.bin From mbuddhikot at lucent.com Mon Aug 22 06:59:54 2005 From: mbuddhikot at lucent.com (Milind M. Buddhikot) Date: Mon, 22 Aug 2005 09:59:54 -0400 Subject: [e2e] Call for Participation, IEEE ICNP 2005, Nov 6-9, Boston Message-ID: <4309DA5A.6050504@lucent.com> Call For Participation (and Call for Student Posters) http://csr.bu.edu/icnp2005 13th IEEE International Conference on Network Protocols Boston, Massachusetts, USA November 6-9, 2005 Sponsored by: IEEE Computer Society, IEEE TCDP, NSF CISE/CNS, IBM Research, Boston University Important Dates =============== Early Registration: October 12, 2005 Student Travel Award Application: September 23, 2005 Minority Travel Award Application: September 15, 2005 Student Poster Submission: September 15, 2005 Highlights of ICNP 2005 ======================= * Keynote speech by Professor Larry Peterson (Princeton University) on "A Strategy for Continually Reinventing the Internet" * Invited talk by Darleen Fisher (National Science Foundation) on "NSF NeTS Initiatives on New Architectures and Protocols" * Presentations of peer-reviewed technical papers organized into ten sessions: + Interdomain Routing + Sensor & Ad-hoc Protocols + Peer-to-Peer Protocols + Geographic Routing in Ad-hoc Networks + Overlay Protocols + Dimensioning & Traffic Engineering + Security & Safety + Congestion Control + Protocol Implementation & Analysis + Wireless Transport * Workshop on Secure Network Protocols (NPSec) * Three timely tutorials: + Survivable Routing: Algorithms and Protocols + Wireless Mesh Networking + Session Initiation Protocol (SIP): A Protocol for Managing Next Generation Networks * Student work-in-progress poster session ================= -------------- next part -------------- A non-text attachment was scrubbed... Name: mbuddhikot.vcf Type: text/x-vcard Size: 350 bytes Desc: not available Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050822/ae160dfe/mbuddhikot.vcf From huitema at windows.microsoft.com Mon Aug 22 09:04:23 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Mon, 22 Aug 2005 09:04:23 -0700 Subject: [e2e] Question about propagation and queuing delays Message-ID: > The remaining 15 msec. can be accounted for by the fiber path not being > straight line, or by various "buffering delays" (which include queueing > delays, and scheduling delays in the case where frames are scheduled > periodically and you have to wait for the next frame time to launch your > frame). > > Craig Partridge and I have debated (offline) what the breakdown might > actually turn out to be (he thinks the total buffering delay is only 2-3 > msec., I think it's more like 10-12), and it would be quite interesting > to get more details, but that would involve delving into the actual > equipment deployed and its operating modes. One way to find out is to collect a large set of samples, and then look at the minimum value. As long as the route does not change, the propagation delay is the sum of the transmission times, which are supposed constant, and a set of positive random values. 
The minimum of a large sample is the sum of the transmission times and the minimum of the random values, which tends towards zero. Obviously, you have to verify the "stable route" hypothesis... -- Christian Huitema From david.hagel at gmail.com Mon Aug 22 09:13:41 2005 From: david.hagel at gmail.com (David Hagel) Date: Mon, 22 Aug 2005 09:13:41 -0700 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <43094A36.1040402@reed.com> References: <43094A36.1040402@reed.com> Message-ID: Thanks, this is interesting. I asked the same question on nanog and got similar responses: that queuing delay is negligible on todays backbone networks compared to other fixed delay components (propagation, store-and-forward, transmission etc). Response on nanog seems to indicate that queuing delay is almost irrelevant today. This may sound like a naive question. But if queuing delays are so insignificant in comparison to other fixed delay components then what does it say about the usefulness of all the extensive techniques for queue management and congestion control (including TCP congestion control, RED and so forth) in the context of today's backbone networks? Any thoughts? Are the congestion control researchers out of touch with reality? - Dave On 8/21/05, David P. Reed wrote: > I can repeatably easily measure 40 msec. coast-to-coast (Boston-LA), of > which around 25 msec. is accounted for by speed of light in fiber (which > is 2/3 of speed of light in vacuum, *299,792,458 m s^-1 *, because the > refractive index of fiber is approximately 1.5 or 3/2). So assume 2e8 > m/s as the speed of light in fiber, 1.6e3 m/mile, and you get 1.25e5 > mi/sec. > > The remaining 15 msec. can be accounted for by the fiber path not being > straight line, or by various "buffering delays" (which include queueing > delays, and scheduling delays in the case where frames are scheduled > periodically and you have to wait for the next frame time to launch your > frame). > > Craig Partridge and I have debated (offline) what the breakdown might > actually turn out to be (he thinks the total buffering delay is only 2-3 > msec., I think it's more like 10-12), and it would be quite interesting > to get more details, but that would involve delving into the actual > equipment deployed and its operating modes. > From detlef.bosau at web.de Mon Aug 22 11:26:07 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 22 Aug 2005 20:26:07 +0200 Subject: [e2e] Question about propagation and queuing delays References: <43094A36.1040402@reed.com> Message-ID: <430A18BF.5030202@web.de> David Hagel wrote: > > This may sound like a naive question. But if queuing delays are so > insignificant in comparison to other fixed delay components then what > does it say about the usefulness of all the extensive techniques for > queue management and congestion control (including TCP congestion > control, RED and so forth) in the context of today's backbone > networks? Any thoughts? Are the congestion control researchers out of > touch with reality? > > - Dave It depends. One answer is: Yes, they are. A more cynical answer is: If a lucky guy joins a PhD program, he must find a topic to write about. In earlier centuries one wrote a doctoral thesis about: "Was Maria virgin until her first intercourse with..., excuse me, before she became pregnant?" O.k., now we know about that. Next thesis. "Was Maria virgin _during_ her pregnancy?". Even that is clear. 
Next thesis, and this is anatomically interesting, perhaps one could not only achieve a D.D. but an M.D. with this: "Was Maria, anatomically correct, virgin after she gave birth to Jesus?" And now the most difficult one: "Was Maria virgin _during_ the birth of Jesus?" No, this is no politically incorrect offense to the readers; these are topics which were discussed extensively in the Middle Ages, here in Germany and in Italy and in other locations of the Roman Catholic church. Nowadays, we are rationalists. We don't debate Maria's virginity. We discuss the importance of timers for congestion control. Some months ago, some people from the group around Christoph Lindemann published about "TCP with Adaptive Pacing for Multihop Wireless Networks" and recognized latency observation as a new crystal ball for congestion forecast and avoidance. If this sounds too cynical: I apologize. During the last weeks, I became mad about Edge's paper about adaptive retransmission timeouts. And the more I'm thinking about that paper and how TCP timers work, the more I become convinced that the insignificance of queueing delays, and the consequence that the Internet latency as perceived by a flow is nearly constant during the lifetime of a flow, is the reason why TCP timers work at all. As soon as latencies are subject to large and sudden change, prominent example: mobile wide area networks, we talk about "spurious timeouts" and other urban legends, which miss the problem. The more often I read Edge's paper and think about it, and the more I play around with the actual RTT estimators, the more I am in doubt whether these will work in a network with highly unstable and quickly changing latencies.
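For concreteness: the "actual RTT estimators" referred to here are essentially Jacobson's EWMA scheme as codified in RFC 2988 - SRTT and RTTVAR are exponentially weighted averages of the measured RTT and of its deviation, and RTO = SRTT + 4*RTTVAR with a lower bound of one second. A minimal stand-alone sketch in Python follows; the RTT trace at the end is invented purely to show how the estimator reacts to a sudden latency jump, it is not a measurement.

    # Sketch of TCP's standard RTO computation (Jacobson / RFC 2988).
    def rto_trace(rtt_samples, alpha=0.125, beta=0.25, min_rto=1.0):
        srtt = rttvar = None
        for r in rtt_samples:
            if srtt is None:
                srtt, rttvar = r, r / 2.0          # first measurement
            else:
                rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
                srtt = (1 - alpha) * srtt + alpha * r
            yield r, srtt, rttvar, max(min_rto, srtt + 4 * rttvar)

    # 20 samples around 50 ms, then the path suddenly jumps to 300 ms
    # (values in seconds, made up for illustration only).
    for r, srtt, rttvar, rto in rto_trace([0.05] * 20 + [0.30] * 5):
        print("rtt=%.3f  srtt=%.3f  rttvar=%.4f  rto=%.3f" % (r, srtt, rttvar, rto))

With the one-second lower bound, the jump in this toy trace is absorbed; remove the bound, or make the jump larger than the current RTO, and the retransmission timer fires before SRTT and RTTVAR have caught up - the "spurious timeout" situation discussed here.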
This is all the more true in mobile wireless networks, where latencies are due to retransmissions and error recovery (without error recovery, TCP flows would break down due to retransmission collapse in those networks) and are therefore subject to changes of path properties beyond our control. It's not Edge's approach which causes the problem. It's our actual approach to RTT estimation, which is as useful as casting dice. So, with respect to your last sentence: we often are out of touch with reality, because with our actual TCP timers we use an unsound basis for TCP congestion control which, by some chance and lucky circumstances, holds in the contemporary Internet. And when some "strange effects" appear in mobile networks, we are glad about it: "Hurray! An effect! A topic for my PhD thesis!" Honestly, if you detect fire in your house, you certainly will not be glad because you can spare fuel; you will call the fire brigade. Using unstable and inappropriate estimators for mean and variance of RTT leads to a number of "strange effects"; "spurious timeouts" are only one of them. However: it's a symptom. Not the reason. A cure must focus on the reason. Not on the symptom. Once again: in the contemporary Internet with negligible queueing delays and almost constant paths, this is absolutely no problem and everything works fine. But falling asleep safe and sound, knowing Kah is around, is perhaps not the best strategy to solve the imminent problem. It's similar to our German welfare system, where politicians ignored (well known!) problems for decades - and now we face a disaster. I'm not even convinced that everything is fine in wirebound networks. Due to some "interesting" discussions here in Germany concerning "fastpath" (some new buzzword with ADSL) I had a first glance at the ITU recommendation for g.dmt. In fact, we do not _yet_ use automatic retransmission here. But if we continue to exploit extremely noisy lines for high speed data transmission, which appears to be promising when you look at the market and which allows me as an unemployed person to use the Internet (with my old ISDN dialup account it was far too expensive), things can turn out differently. Perhaps ARQ might be useful for some lines. Perhaps not only at the last mile, which can be hidden behind a PEP. We discussed ARQ for satellite links recently on this list. And then? Will we complain about "spurious timeouts" then? I apologize if this sounds extremely upset. It's my honest intention not to offend anybody. And if I can contribute an approach here, I will do my very best. I sent some rough ideas to some people; perhaps I will get some feedback on them. But either I am too stupid to understand TCP and its assumptions, or there is real danger of getting into severe trouble if we keep ignoring the timer issue. O.k., I think I will apply for asylum on the Falklands or in the Antarctic now, since I expect to receive evil criticism now. I don't mind. If I'm wrong, I will learn my lesson. But at the moment, I'm simply discouraged. If I'm wrong, I would appreciate somebody correcting me. If not, perhaps I can think about a way out. But every time I start my editor on my dated, ten-year-old P160 with 128 MByte memory, I think: it does not matter whatever I write. As long as I do not provide billions of simulations (AKA repeated assertions) with the NS2, where I would have to change great amounts of code, which would require man-years of work, even with equipment on which even a single _link_ run for the NS2 takes about half an hour, no one would believe me. And as an unemployed person who is, as one minor problem of course, in need of a job and of making a living here in Germany with an admitted 5 million unemployed people (in reality we probably have about 8 to 10 million unemployed people here), I cannot rewrite the whole NS2 and insert layer 2 models, which I do not have because no one gives _real_ channel traces to an unknown guy from Germany, and I cannot implement all the necessary changes _and_ produce convincing traces (which again no one would ever believe) on my own. So, I write one or two lines, and then I shut down the editor and give up. To blather about "TCP with Adaptive Pacing...." is obviously more successful. And to ignore the problem is perhaps the best strategy. Excuse me for writing this, but as I said, I'm _really_ discouraged. And maybe I'm completely wrong. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From marc.herbert at free.fr Mon Aug 22 11:43:37 2005 From: marc.herbert at free.fr (Marc Herbert) Date: Mon, 22 Aug 2005 20:43:37 +0200 (CEST) Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: <43094A36.1040402@reed.com> Message-ID: On Mon, 22 Aug 2005, David Hagel wrote: > Thanks, this is interesting. I asked the same question on nanog and > got similar responses: that queuing delay is negligible on todays > backbone networks compared to other fixed delay components > (propagation, store-and-forward, transmission etc). Response on nanog > seems to indicate that queuing delay is almost irrelevant today. > > This may sound like a naive question.
But if queuing delays are so > insignificant in comparison to other fixed delay components then what > does it say about the usefulness of all the extensive techniques for > queue management and congestion control (including TCP congestion > control, RED and so forth) in the context of today's backbone > networks? Any thoughts? Are the congestion control researchers out of > touch with reality? The delay-based congestion control techniques you are talking about are not based on a ratio but on a delta between instant and constant delay. This still does not mean the measure is easy; but IMHO not for the reason you give. From dpreed at reed.com Mon Aug 22 14:08:58 2005 From: dpreed at reed.com (David P. Reed) Date: Mon, 22 Aug 2005 17:08:58 -0400 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: Message-ID: <430A3EEA.80004@reed.com> Christian Huitema wrote: > >One way to find out is to collect a large set of samples, and then look >at the minimum value. As long as the route does not change, the >propagation delay is the sum of the transmission times, which are >supposed constant, and a set of positive random values. The minimum of a >large sample is the sum of the transmission times and the minimum of the >random values, which tends towards zero. > >Obviously, you have to verify the "stable route" hypothesis... > > This assumes the buffering is elastic. If it includes a fixed delay independent of load in the particular equipment (e.g. a "slotted" multiplexed rate adapter) you could have a long buffer delay without variation. Not all queues are elastic. (i.e. a pair of scheduled train routes with a transfer point can have a constant queueing delay that is the skew in arrival vs. departures at the transfer point.). From fred at cisco.com Mon Aug 22 14:50:52 2005 From: fred at cisco.com (Fred Baker) Date: Tue, 23 Aug 2005 05:50:52 +0800 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: <43094A36.1040402@reed.com> Message-ID: <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> no, but there are different realities, and how one measures them is also relevant. In large fiber backbones, within the backbone we generally run 10:1 overprovisioned or more. within those backbones, as you note, the discussion is moot. But not all traffic stays within the cores of large fiber backbones - much of it is originated and terminates in end systems located in homes and offices. The networks that connect homes and offices to the backbones are often constrained differently. For example, my home (in an affluent community in California) is connected by Cable Modem, and the service that I buy (business service that in its AUP accepts a VPN, unlike the same company's residential service) guarantees a certain amount of bandwidth, and constrains me to that bandwidth - measured in KBPS. I can pretty easily fill that, and when I do certain services like VoIP don't work anywhere near as well. So I wind up playing with the queuing of traffic in the router in my home to work around the service rate limit in my ISP. As I type this morning (in a hotel in Taipei), the hotel provides an access network that I share with the other occupants of the hotel. It's not uncommon for the entire hotel to share a single path for all of its occupants, and that single path is not necessarily in MBPS. 
And, they tell me that the entire world is not connected by large fiber cores - as soon as you step out of the affluent industrialized countries, VSAT, 64 KBPS links, and even 9.6 access over GSM become the access paths available. As to measurement, note that we generally measure that overprovisioning by running MRTG and sampling throughput rates every 300 seconds. When you're discussing general service levels for an ISP, that is probably reasonable. When you're measuring time variations on the order of milliseconds, that's a little like running a bump counter cable across a busy intersection in your favorite downtown, reading the counter once a day, and drawing inferences about the behavior of traffic during light changes during rush hour... http://www.ieee-infocom.org/2004/Papers/37_4.PDF has an interesting data point. They used a much better measurement methodology, and one of the large networks gave them some pretty cool access in order to make those tests. Basically, queuing delays within that particular very-well-engineered large fiber core were on the order of 1 ms or less during the study, with very high confidence. But the same data flows frequently jumped into the 10 ms range even within the 90% confidence interval, and a few times jumped to 100 ms or so. The jumps to high delays would most likely relate to correlated high volume data flows, I suspect, either due to route changes or simple high traffic volume. The people on NANOG and the people in the NRENs live in a certain ivory tower, and have little patience with those who don't. They also measure the world in a certain way that is easy for them. On Aug 23, 2005, at 12:13 AM, David Hagel wrote: > Thanks, this is interesting. I asked the same question on nanog and > got similar responses: that queuing delay is negligible on todays > backbone networks compared to other fixed delay components > (propagation, store-and-forward, transmission etc). Response on > nanog seems to indicate that queuing delay is almost irrelevant today. > > This may sound like a naive question. But if queuing delays are so > insignificant in comparison to other fixed delay components then > what does it say about the usefulness of all the extensive > techniques for queue management and congestion control (including > TCP congestion control, RED and so forth) in the context of today's > backbone networks? Any thoughts? Are the congestion control > researchers out of touch with reality? > > - Dave > > On 8/21/05, David P. Reed wrote: >> I can repeatably easily measure 40 msec. coast-to-coast (Boston- >> LA), of which around 25 msec. is accounted for by speed of light >> in fiber (which is 2/3 of speed of light in vacuum, *299,792,458 m >> s^-1 *, because the refractive index of fiber is approximately 1.5 >> or 3/2). So assume 2e8 m/s as the speed of light in fiber, >> 1.6e3 m/mile, and you get 1.25e5 mi/sec. >> >> The remaining 15 msec. can be accounted for by the fiber path not >> being straight line, or by various "buffering delays" (which >> include queueing delays, and scheduling delays in the case where >> frames are scheduled periodically and you have to wait for the >> next frame time to launch your frame). >> >> Craig Partridge and I have debated (offline) what the breakdown >> might actually turn out to be (he thinks the total buffering delay >> is only 2-3 msec., I think it's more like 10-12), and it would be >> quite interesting to get more details, but that would involve >> delving into the actual equipment deployed and its operating modes. 
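Fred's point about 300-second counters can be illustrated with a toy queue simulation (all numbers below are invented, not measurements): a link whose 5-minute average looks comfortably overprovisioned can still be driven above line rate for a few hundred milliseconds at a time, and those are exactly the intervals in which queueing delay appears.

    # Toy illustration: 300 s of traffic on a ~1 Gbit/s link, mostly at 30%
    # load but with a 200 ms above-line-rate burst roughly every 30 s.
    import random

    random.seed(1)
    capacity = 125000.0                    # bytes the link can serve per millisecond
    window_ms = 300 * 1000                 # one 300-second MRTG-style window
    queue = sent = 0.0
    worst_delay_ms = 0.0
    burst_left = 0

    for _ in range(window_ms):
        if burst_left == 0 and random.random() < 1.0 / 30000:
            burst_left = 200               # start of a 200 ms burst
        load = 1.2 if burst_left else 0.3  # bursts arrive at 120% of line rate
        burst_left = max(0, burst_left - 1)
        queue += load * capacity           # bytes arriving this millisecond
        tx = min(queue, capacity)          # the link drains at most line rate
        queue -= tx
        sent += tx
        worst_delay_ms = max(worst_delay_ms, queue / capacity)

    print("300 s average utilization: %.1f %%" % (100.0 * sent / (capacity * window_ms)))
    print("worst queueing delay seen: %.1f ms" % worst_delay_ms)

With these made-up numbers the averaged counter should report utilization around 30% while the worst packets still waited tens of milliseconds - the kind of behaviour that per-millisecond measurements such as those in the INFOCOM paper cited above make visible and that 300-second counters cannot.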
From detlef.bosau at web.de Mon Aug 22 15:39:28 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 23 Aug 2005 00:39:28 +0200 Subject: [e2e] Question about propagation and queuing delays References: Message-ID: <430A5420.9ACB8F52@web.de> Christian Huitema wrote: > > > One way to find out is to collect a large set of samples, and then look > at the minimum value. As long as the route does not change, the > propagation delay is the sum of the transmission times, which are > supposed constant, and a set of positive random values. The minimum of a > large sample is the sum of the transmission times and the minimum of the > random values, which tends towards zero. the minimum of the random values, which tends towards zero..... Is there evidence for this? I think this is similar to the rationale given in the "Adaptive Pacing..." paper, where delay variation indicates congestion: if the random values represent e.g. queueing delays, why does the sum of these tend towards zero? Why not to an average value? If the sum tended to zero, once again we would have a possibility to calibrate a "congestion level = f(latency)" function. Of course, when you can observe networks in unloaded periods of time, you may be right as long as you take samples for a long enough period, with a sufficiently high sampling rate etc. However, from my own experience with all this "congestion level = f(latency)" magic, I became rather reluctant. It's appealing at first glance - however, it does not look promising at second glance. Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From iam4 at cs.waikato.ac.nz Mon Aug 22 16:40:08 2005 From: iam4 at cs.waikato.ac.nz (Ian McDonald) Date: Tue, 23 Aug 2005 11:40:08 +1200 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <430A18BF.5030202@web.de> References: <43094A36.1040402@reed.com> <430A18BF.5030202@web.de> Message-ID: <430A6258.9090603@cs.waikato.ac.nz> Detlef Bosau wrote: > David Hagel wrote: > >> >> This may sound like a naive question. But if queuing delays are so >> insignificant in comparison to other fixed delay components then what >> does it say about the usefulness of all the extensive techniques for >> queue management and congestion control (including TCP congestion >> control, RED and so forth) in the context of today's backbone >> networks? Any thoughts? Are the congestion control researchers out of >> touch with reality? >> >> - Dave > > > > It depends. > > One answer is: Yes, they are. > > A more cynical answer is: If a lucky guy joins a PhD program, he must > find a topic to write about. As a lucky guy doing a PhD on congestion control I couldn't resist the bait :-) I may be missing something but we need congestion control as long as we have networks. In the USA and in Europe you may all have unlimited bandwidth available at virtually no cost to you but in the rest of the "real world" it doesn't quite work like that. So as long as you are bandwidth constrained you will need congestion control. I think others are out of touch with reality.... Seriously traffic can be constrained for many different reasons apart from backbones: - link at other end (e.g.
web server) is on a "slow" link - mobile networks - link between ISP and upstream ISP (a particular problem in NZ at the moment) - slow speed link at consumer premises Most backbones are over provisioned in the developed world but less so in more remote corners and even less so in developing countries. I have seen presentations showing >50% packet loss in parts of Asia and Africa on this list in the last few months - surely you need congestion control for that! Remember congestion control is also about fairness on your own equipment as well - you want competing flows to share nicely (unless you specify otherwise). Regards, Ian From huitema at windows.microsoft.com Mon Aug 22 17:34:06 2005 From: huitema at windows.microsoft.com (Christian Huitema) Date: Mon, 22 Aug 2005 17:34:06 -0700 Subject: [e2e] Question about propagation and queuing delays Message-ID: > the minimum of the random values, which tends towards zero..... > > Is there evidence for this? Yep. Assuming independent samples, P(min(X1, X2,...,Xn) > y) = P(X > y) to the power N, which tends towards 0 when N increases, except for the value y=0. > If the random values represent e.g. queueing delays, why does the sum of > these tends towards zero? Why not to an average value? Min, not sum. -- Christian Huitema From vgill at vijaygill.com Mon Aug 22 19:45:14 2005 From: vgill at vijaygill.com (vijay gill) Date: Mon, 22 Aug 2005 22:45:14 -0400 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> References: <43094A36.1040402@reed.com> <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> Message-ID: <430A8DBA.2000007@vijaygill.com> Fred Baker wrote: > no, but there are different realities, and how one measures them is > also relevant. > > In large fiber backbones, within the backbone we generally run 10:1 > overprovisioned or more. within those backbones, as you note, the > discussion is moot. But not all traffic stays within the cores of large > fiber backbones - much of it is originated and terminates in end > systems located in homes and offices. We don't run 10:1 overprovisioning or n:1 overprovisioning in the backbone because we simply do not know how. I am provisioning a backbone interface, where do I get the 10 to 1 figure from. I have worked at very large backbones for most of my career and in every case, the backbone bandwidth provisioning was simply kicked off when certain paths got to a steady 50% or more utilization. The saving factor is that large macroflows between places are fairly tractacble and we can watch the link utilization and upgrade as needed (I speak to well funded north american networks, if you're running a country over a VSAT link and dialup modem, disregard this). > > The networks that connect homes and offices to the backbones are often > constrained differently. For example, my home (in an affluent community > in California) is connected by Cable Modem, and the service that I buy > (business service that in its AUP accepts a VPN, unlike the same > company's residential service) guarantees a certain amount of > bandwidth, and constrains me to that bandwidth - measured in KBPS. Here is where overprovisioning is common. Normally most cable plants allocate 20 kbps or 25 kbps per paying sub for capacity planning purposes and build the physical plant to support that. > in MBPS. 
And, they tell me that the entire world is not connected by > large fiber cores - as soon as you step out of the affluent > industrialized countries, VSAT, 64 KBPS links, and even 9.6 access over > GSM become the access paths available. > As to measurement, note that we generally measure that overprovisioning > by running MRTG and sampling throughput rates every 300 seconds. When > you're discussing general service levels for an ISP, that is probably > reasonable. When you're measuring time variations on the order of > milliseconds, that's a little like running a bump counter cable across > a busy intersection in your favorite downtown, reading the counter once > a day, and drawing inferences about the behavior of traffic during > light changes during rush hour... Which is why I've been pushing my vendors to implement high watermark counters that measure the maximum queue depth reached. The EWMA counters used in most routers might as well be a random number in terms of finding out microburst caused congestion. It is however, perfectly valid for cap planning for large city-pair flows. > > http://www.ieee-infocom.org/2004/Papers/37_4.PDF has an interesting > data point. They used a much better measurement methodology, and one of > the large networks gave them some pretty cool access in order to make > those tests. Basically, queuing delays within that particular > very-well-engineered large fiber core were on the order of 1 ms or less > during the study, with very high confidence. But the same data flows > frequently jumped into the 10 ms range even within the 90% confidence > interval, and a few times jumped to 100 ms or so. The jumps to high > delays would most likely relate to correlated high volume data flows, I > suspect, either due to route changes or simple high traffic volume. That burstiness occurs more frequently if your customers are connected at links that are on the same bandwidth as the core. Lots of ds3/t1/e3 type customers are not going to cause significant microburstiness issues on a 10 gig backbone. > The people on NANOG and the people in the NRENs live in a certain ivory > tower, and have little patience with those who don't. They also measure > the world in a certain way that is easy for them. > No comment. /vijay From randy at psg.com Mon Aug 22 23:40:13 2005 From: randy at psg.com (Randy Bush) Date: Mon, 22 Aug 2005 23:40:13 -0700 Subject: [e2e] Question about propagation and queuing delays References: <43094A36.1040402@reed.com> <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> Message-ID: <17162.50381.753858.916232@roam.psg.com> > In large fiber backbones, within the backbone we generally run 10:1 > overprovisioned or more. while i am quite ready to believe that in the backbones that you run, this is the case. in the backbones which are run by the large and medium isps, this is not. in the real world, it's driven by provisioning time. i.e., if one can provision in a matter of weeks, then traffic usually grows sufficiently slowly that utilization of well over 50% can be tolerated. in the more realistic situation where provisioning takes months, 50-66% is more the norm. but as i said, it also depends on rate of traffic growth. > The people on NANOG and the people in the NRENs live in a certain > ivory tower, and have little patience with those who don't. They also > measure the world in a certain way that is easy for them. unfortunately, what is 'easy' is that which is provided by the broken vendor(s). 
these tools are so gross as to only be useful when the law of large numbers is in play in highly aggregated traffic. when small spiky flows are at issue, we're left in what i might term as dirt, not an ivory tower. randy From puddinghead_wilson007 at yahoo.co.uk Tue Aug 23 00:47:55 2005 From: puddinghead_wilson007 at yahoo.co.uk (Puddinhead Wilson) Date: Tue, 23 Aug 2005 08:47:55 +0100 (BST) Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <430A18BF.5030202@web.de> Message-ID: <20050823074755.3014.qmail@web25701.mail.ukl.yahoo.com> How is this for a thesis in older times: if the equator and the latitudes were totally aligned/in parallel with the plane of the revolution of the earth, is daylight saving time needed? (never mind that latitudes may be ellipses) ;-) --- Detlef Bosau wrote: > David Hagel wrote: > > This may sound like a naive question. But if queuing delays are so > > insignificant in comparison to other fixed delay components then what > > does it say about the usefulness of all the extensive techniques for > > queue management and congestion control (including TCP congestion > > control, RED and so forth) in the context of today's backbone > > networks? Any thoughts? Are the congestion control researchers out of > > touch with reality? > > - Dave > It depends. > One answer is: Yes, they are. > A more cynical answer is: If a lucky guy joins a PhD program, he must > find a topic to write about. === message truncated ===
From puddinghead_wilson007 at yahoo.co.uk Tue Aug 23 01:15:58 2005 From: puddinghead_wilson007 at yahoo.co.uk (Puddinhead Wilson) Date: Tue, 23 Aug 2005 09:15:58 +0100 (BST) Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <20050823074755.3014.qmail@web25701.mail.ukl.yahoo.com> Message-ID: <20050823081558.50775.qmail@web25702.mail.ukl.yahoo.com> --- Puddinhead Wilson wrote: > How is this for a thesis in older times: > if the equator and the latitudes were totally > aligned/in parallel with the plane of the > revolution > of the earth, is daylight saving time needed? > (never mind that latitudes may be ellipses) > > ;-) foolish me!! how will I get an ellipse from a cross section of a sphere :-)) From detlef.bosau at web.de Tue Aug 23 06:25:47 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 23 Aug 2005 15:25:47 +0200 Subject: [e2e] Question about propagation and queuing delays References: <43094A36.1040402@reed.com> <430A18BF.5030202@web.de> <430A6258.9090603@cs.waikato.ac.nz> Message-ID: <430B23DB.8050101@web.de> Ian McDonald wrote: > As a lucky guy doing a PhD on congestion control I couldn't resist the bait :-) > > I may be missing something but we need congestion control as long as we have networks. In the USA I'm totally with you. > and in Europe you may all have unlimited bandwidth available at virtually no cost to you but in the > rest of the "real world" it doesn't quite work like that. So as long as you are bandwidth > constrained you will need congestion control. I think others are out of touch with reality.... > > Excuse me, where is the flame bait? What you say is absolutely correct and I totally agree with you! I will only give one example (I always tend to write too much...). When I worked as a network administrator in northern Germany, we had to attach some points of sale, situated in the Czech Republic, to a company network. It is interesting to observe people who always talk about ISDN, DSL, backbones with large bandwidth, and then you are informed: "We do not yet know whether 9k6 can be achieved, we have to check the old POTS line, it may be too noisy." Depending on where you are, you may well encounter different realities! Even here in Germany you may encounter stone-age POTS lines in some rural areas. When I read what you say, I would like to invite you into my ISP's support newsgroup; I think many of the readers could learn a lot from you! Just to give one example from there: we recently had a discussion about "Fastpath". On DSL lines, you need error recovery on the last mile. Now, to save overhead, you do code spreading/interleaving. Some "well informed guys" want the ISP to turn interleaving off in order to spare some "ping time". First of all, it's simply ridiculous that individual customers without any technical knowledge prescribe to the provider the appropriate line coding for one individual wire pair.
Second: not only these customers may be affected by increasing error rates: these guys flood large portions of the network with defective frames, more precisely with defective ATM cells with corrupted payload, which is eventually detected at the customer's AAL 5 peer. (At least AFAIK.) This is a thoughtless waste of bandwidth, but it is nearly impossible to convince those guys that this is malicious in quite a number of cases! What is even more disastrous: in fact, in DSL, TCP appears to be based upon AAL5/UBR. Unspecified bit rate. Hence, all congestion control is done at the TCP endpoints. I'm totally with you that this requires well behaved participants in a network. IIRC, LANE works with ABR, and that would alleviate the problem. > > Seriously traffic can be constrained for many different reasons apart from backbones: > - link at other end (e.g. web server) is on a "slow" link > - mobile networks > - link between ISP and upstream ISP (a particular problem in NZ at the moment) > - slow speed link at consumer premises Could you _please_ join this newsgroup :-) > > Most backbones are over provisioned in the developed world but less so in more remote corners and > even less so in developing countries. I have seen presentations showing >50% packet loss in parts of > Asia and Africa on this list in the last few months - surely you need congestion control for that! Excuse me, but I have nothing against congestion control! Of course we need it! Perhaps my command of the English language is rather poor. But I sincerely hope that no one has misunderstood me in such a way that I denied the necessity of congestion control! The problem _I_ expect is that congestion control, and even proper retransmission control, can run into severe problems when TCP timers don't work. And when I talked about the Internet as it is perceived in Europe and the US, I concluded that in _this_ area TCP works fine. Whether this holds true all over the world and in all kinds of networks is highly questionable. So, I really don't see a flame bait here. Perhaps you understood me in a different way, but from what you wrote, I couldn't agree with you more! Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937 From detlef.bosau at web.de Tue Aug 23 07:20:16 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 23 Aug 2005 16:20:16 +0200 Subject: [e2e] Question about propagation and queuing delays References: Message-ID: <430B30A0.1030202@web.de> Christian Huitema wrote: >>the minimum of the random values, which tends towards zero..... >> >>Is there evidence for this? > > Yep. Assuming independent samples, P(min(X1, X2,...,Xn) > y) = P(X > y) > to the power N, which tends towards 0 when N increases, except for the > value y=0. O.k. >>If the random values represent e.g. queueing delays, why does the sum of >>these tend towards zero? Why not to an average value? But can you observe this min? If Xi, i = 1..n, are queueing delays, the minimum tends to zero. However, if you observe a packet traveling the network (I think you obtain your samples this way?), the queueing delays will sum up? Detlef -- Detlef Bosau Galileistrasse 30 70565 Stuttgart Mail: detlef.bosau at web.de Web: http://www.detlef-bosau.de Mobile: +49 172 681 9937
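The min-versus-sum question above can be checked numerically with a toy model (the parameters are invented): each end-to-end sample is a fixed propagation delay plus the sum of independent per-hop queueing delays, where a hop's queue is assumed to be empty half of the time and otherwise adds an exponentially distributed wait.

    # Toy check: does the minimum of end-to-end samples reach the
    # propagation floor even though each sample is a *sum* of hop delays?
    import random

    random.seed(1)
    propagation_ms = 25.0
    hops = 10

    def one_sample():
        queueing = sum(0.0 if random.random() < 0.5 else random.expovariate(0.5)
                       for _ in range(hops))       # expovariate(0.5): mean 2 ms per busy hop
        return propagation_ms + queueing

    for n in (10, 100, 1000, 10000, 100000):
        samples = [one_sample() for _ in range(n)]
        print("n=%6d  min=%7.3f ms  mean=%7.3f ms" % (n, min(samples), sum(samples) / n))

The mean stays around 35 ms, but the minimum creeps down towards the 25 ms floor once the sample is large enough that at least one packet found every queue empty (probability 2^-10 per sample here). With more hops, or with queues that are rarely empty, the convergence is correspondingly slower - which seems to be exactly the caveat behind the question above, and the reason the "stable route, long enough observation" qualification matters.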
From fla at inescporto.pt Tue Aug 23 09:35:40 2005 From: fla at inescporto.pt (Filipe Abrantes) Date: Tue, 23 Aug 2005 17:35:40 +0100 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: <43094A36.1040402@reed.com> Message-ID: <430B505C.3000303@inescporto.pt> Hello David, David Hagel wrote: > Thanks, this is interesting. I asked the same question on nanog and > got similar responses: that queuing delay is negligible on todays > backbone networks compared to other fixed delay components > (propagation, store-and-forward, transmission etc). Response on nanog > seems to indicate that queuing delay is almost irrelevant today. > > This may sound like a naive question. But if queuing delays are so > insignificant in comparison to other fixed delay components then what > does it say about the usefulness of all the extensive techniques for > queue management and congestion control (including TCP congestion > control, RED and so forth) in the context of today's backbone > networks? Any thoughts? Are the congestion control researchers out of > touch with reality? > The latencies mentioned by David Reed are for the case of a non-congested path, and, as was already mentioned here, nowadays the most common case is for our access link (xDSL/cable...) at home/office to be the bottleneck (the ping would struggle to fill the link, right?). So, to get an approximate value for the maximum queueing delays, you should try a ping while you have background traffic that fully utilizes your access link. Congestion control only plays an active role when there is a bottleneck in the path... (well, not totally true, as the guys from the high bandwidth-delay and lossy paths may tell you). As to queue management, one of its goals is also to promote fairness between flows (TCP is not that good at it), so I can see some usefulness in it too. Whether the final result is actually good enough I still don't know (I haven't gone too deep into this issue). I just did a ping to my home, which is on a 2Mb-dl/128kbit-ul cable connection, from work (where I am) to exemplify this. At home I started a P2P program which had the upload capped at 6KByte/s (capped by the application, so there could be instantaneous overloads, I think). I got this (the upload link was the bottleneck, as my download was well below the dl limit): $ ping xxxxxxxx.no-ip.org PING xxxxxxx.no-ip.org (83.132.76.xx) 56(84) bytes of data.
64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=1 ttl=52 time=71.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=2 ttl=52 time=109 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=3 ttl=52 time=88.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=4 ttl=52 time=29.5 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=5 ttl=52 time=399 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=6 ttl=52 time=307 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=7 ttl=52 time=131 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=8 ttl=52 time=78.6 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=9 ttl=52 time=87.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=10 ttl=52 time=54.2 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=11 ttl=52 time=93.7 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=12 ttl=52 time=22.4 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=13 ttl=52 time=21.8 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=14 ttl=52 time=45.2 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=16 ttl=52 time=251 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=17 ttl=52 time=22.1 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=18 ttl=52 time=297 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=19 ttl=52 time=290 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=20 ttl=52 time=280 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=21 ttl=52 time=21.0 ms --- xxxxxxxx.no-ip.org ping statistics --- 21 packets transmitted, 20 received, 4% packet loss, time 20020ms rtt min/avg/max/mdev = 21.046/135.229/399.044/117.287 ms Then I capped the upload at 3Kbyte/s and got this: $ ping xxxxxxxx.no-ip.org PING xxxxxxx.no-ip.org (83.132.76.xx) 56(84) bytes of data. 
64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=1 ttl=52 time=22.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=2 ttl=52 time=22.2 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=3 ttl=52 time=88.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=4 ttl=52 time=34.3 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=5 ttl=52 time=23.3 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=6 ttl=52 time=24.6 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=7 ttl=52 time=25.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=8 ttl=52 time=22.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=9 ttl=52 time=20.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=10 ttl=52 time=52.5 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=11 ttl=52 time=21.4 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=12 ttl=52 time=30.9 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=13 ttl=52 time=21.4 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=14 ttl=52 time=42.8 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=15 ttl=52 time=20.5 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=16 ttl=52 time=22.0 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=17 ttl=52 time=24.7 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=18 ttl=52 time=24.4 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=19 ttl=52 time=21.3 ms 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=20 ttl=52 time=20.6 ms --- xxxxxxxx.no-ip.org ping statistics --- 20 packets transmitted, 20 received, 0% packet loss, time 19018ms rtt min/avg/max/mdev = 20.593/29.482/88.992/15.843 ms As you can see, queueing delays are noticeable. Best Regards Filipe Abrantes > - Dave > > > On 8/21/05, David P. Reed wrote: > >>I can repeatably easily measure 40 msec. coast-to-coast (Boston-LA), of >>which around 25 msec. is accounted for by speed of light in fiber (which >>is 2/3 of speed of light in vacuum, *299,792,458 m s^-1 *, because the >>refractive index of fiber is approximately 1.5 or 3/2). So assume 2e8 >>m/s as the speed of light in fiber, 1.6e3 m/mile, and you get 1.25e5 >>mi/sec. >> >>The remaining 15 msec. can be accounted for by the fiber path not being >>straight line, or by various "buffering delays" (which include queueing >>delays, and scheduling delays in the case where frames are scheduled >>periodically and you have to wait for the next frame time to launch your >>frame). >> >>Craig Partridge and I have debated (offline) what the breakdown might >>actually turn out to be (he thinks the total buffering delay is only 2-3 >>msec., I think it's more like 10-12), and it would be quite interesting >>to get more details, but that would involve delving into the actual >>equipment deployed and its operating modes. >> > > -- Filipe Lameiro Abrantes INESC Porto Campus da FEUP Rua Dr. 
Roberto Frias, 378 4200-465 Porto Portugal Phone: +351 22 209 4266 E-mail: fla at inescporto.pt From garmitage at swin.edu.au Tue Aug 23 11:27:55 2005 From: garmitage at swin.edu.au (grenville armitage) Date: Tue, 23 Aug 2005 14:27:55 -0400 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <430B23DB.8050101@web.de> References: <43094A36.1040402@reed.com> <430A18BF.5030202@web.de> <430A6258.9090603@cs.waikato.ac.nz> <430B23DB.8050101@web.de> Message-ID: <430B6AAB.9040804@swin.edu.au> Detlef Bosau wrote: [..] > Just to give one example from there: we recently had a discussion about > "Fastpath". On DSL lines, you need error recovery on the last mile. Now, > to save overhead, you do code spreading/interleaving. Some "well informed > guys" want the ISP to turn interleaving off in order to spare some "ping > time". First of all, it's simply ridiculous that individual customers > without any technical knowledge prescribe to the provider the > appropriate line coding for one individual wire pair. I'm curious if you have any stats on the typical error rates that will be experienced by the people who switch from Interleave to Fastpath mode on their DSL links. It is certainly true that e.g. gamers find Interleave mode to be a pain (at least 20ms additional latency), with good reason. Stats on how much packet loss the gamer will experience (in order to gain the latency improvement of Fastpath) would be interesting to know. If the error rates are low, then in fact it seems like an entirely reasonable thing for a customer to desire Fastpath. > Second: not only > these customers may be affected by increasing error rates: these guys > flood large portions of the network with defective frames, more > precisely with defective ATM cells with corrupted payload, which is > eventually detected at the customer's AAL 5 peer. (At least AFAIK.) > This is a thoughtless waste of bandwidth, but it is nearly impossible to > convince those guys that this is malicious in quite a number of cases! I'm also curious about this "large portions of the network" which is carrying useless ATM AAL5 cells. If the bit error rate is so bad that a noticeable fraction of ATM cells are useless, then gamers are going to have a bad packet loss rate and fairly quickly go back to interleave mode (despite the higher latency). Yet if the gamers are happy with the typical loss rate using Fastpath, then there's probably not that many wasted/useless ATM cells floating around. (Naturally, if the actual AAL_PDU loss rate starts to become more than a percent or so, the gamer's use of TCP for p2p, web surfing and email becomes problematic. But it is hard to argue this by hand-waving - we need stats on the likely bit error rates a typical DSL customer is likely to see using Fastpath.) cheers, gja From rja at extremenetworks.com Tue Aug 23 11:30:36 2005 From: rja at extremenetworks.com (RJ Atkinson) Date: Tue, 23 Aug 2005 14:30:36 -0400 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: References: <43094A36.1040402@reed.com> Message-ID: <91A5CE60-4E8C-475F-9B02-371A0D3EC1BB@extremenetworks.com> On Aug 22, 2005, at 12:13, David Hagel wrote: > Thanks, this is interesting. I asked the same question on nanog and > got similar responses: that queuing delay is negligible on todays > backbone networks compared to other fixed delay components > (propagation, store-and-forward, transmission etc). Response on nanog > seems to indicate that queuing delay is almost irrelevant today.
> > This may sound like a naive question. But if queuing delays are so > insignificant in comparison to other fixed delay components then what > does it say about the usefulness of all the extensive techniques for > queue management and congestion control (including TCP congestion > control, RED and so forth) in the context of today's backbone > networks? Any thoughts? Are the congestion control researchers out of > touch with reality? Congestion still exists today. However, it tends to exist not inside the network core, but instead in the access link (i.e. the link between the campus network and the upstream ISP). In many cases, this congestion is a policy choice on the part of the end site (e.g. pay for NxT1 uplink rather than T3 uplink in order to save money). Ran From marc.herbert at free.fr Tue Aug 23 11:49:09 2005 From: marc.herbert at free.fr (Marc Herbert) Date: Tue, 23 Aug 2005 20:49:09 +0200 (CEST) Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <430B23DB.8050101@web.de> References: <43094A36.1040402@reed.com> <430A18BF.5030202@web.de> <430A6258.9090603@cs.waikato.ac.nz> <430B23DB.8050101@web.de> Message-ID: On Tue, 23 Aug 2005, Detlef Bosau wrote: > When I read what you say, I would like to invite you into my ISP?s > support newsgroup, I think much of the readers can learn a lot from you! > > Just to give one example from there: We recently had a discussion about > "Fastpath". In DSL lines, you need error recovery on the last mile. Now, > to save overhead you do codespreading/interleaing. Some "well informed > guys" want the ISP to turn interleaving off in order to spare some "ping > time". First of all, it?s simply ridiculous, theat individual customers > without any technical knowledge will prescribe the provider the > appropriate line coding for one individual wire pair. Second: Not only > these customers may be affected by increasing error rates: These guys > flood large portions of the network with defictive frames, more > precisely with defective ATM cells with corrupted payload, which is > eventually being detected at the customers AAL 5 peer. (At least AFAIK.) > This is thoughtless waste of bandwidth, but it is nearly impossible to > convince those guys that this is malicious in quite a number of cases! > > What is even more disastrous: In fact, in DSL TCP appears to be based > upon AAL5/UBR. Unspecified bitrate. Hence, all congestion control is > done at the TCP endpoints. I?m totally with you that this requires well > behaved participants in a network. IIRC, LANE works with ABR and that > will alleviate the problem. FYI the second biggest ISP in France (about 1.2M subscribers) gives its subscribers a write access to this "fastpath" interleave level, through a simple web interface. http://translate.google.com/translate?u=http%3A%2F%2Fadsl.free.fr%2Fadmin%2Ffast_path.html&langpair=fr%7Cen&hl=en Of course you also have access to error stats on the DSL line. All gamers know about and love this feature. It helps them gain about 30ms, a huge benefit for "real-time" games. And they don't care much about the rest. Other subscribers don't care and use the default, conservative setting. So everyone is happy with this well-designed feature... I guess that if this feature was "flooding the network with malicious frames" or something, the ISP would obviously not have offered it. 
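In the spirit of Filipe's ping experiment earlier in the thread, the queueing component of an access link's RTT can be estimated crudely by taking the smallest observed sample as the propagation-plus-transmission floor and looking at how far the other samples sit above it. The small script below is only a sketch (the script name and host are placeholders); it reads ordinary ping output on stdin and parses the "time=" fields:

    # Sketch: split ping RTTs into a fixed floor (the minimum) and a
    # queueing component, following the minimum-filtering idea
    # discussed earlier in this thread.
    # Usage (host is a placeholder):  ping -c 20 example.org | python rtt_queue.py
    import re
    import sys

    rtts = [float(m.group(1)) for m in re.finditer(r"time=([\d.]+) ms", sys.stdin.read())]
    if rtts:
        floor = min(rtts)
        excess = [r - floor for r in rtts]
        print("samples: %d   floor: %.1f ms" % (len(rtts), floor))
        print("queueing component  avg: %.1f ms   max: %.1f ms"
              % (sum(excess) / len(excess), max(excess)))

Applied to the two traces Filipe posted, this attributes an average of roughly 110 ms of queueing (with peaks near 380 ms) to the 6 KByte/s upload case, versus under 10 ms for the 3 KByte/s case - consistent with his min/avg/max lines.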
From detlef.bosau at web.de Tue Aug 23 15:09:11 2005 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 24 Aug 2005 00:09:11 +0200 Subject: [e2e] Question about propagation and queuing delays References: <43094A36.1040402@reed.com> <430A18BF.5030202@web.de> <430A6258.9090603@cs.waikato.ac.nz> <430B23DB.8050101@web.de> <430B6AAB.9040804@swin.edu.au> Message-ID: <430B9E87.6070504@web.de> grenville armitage wrote: > Detlef Bosau wrote: > [..] > >> Just to give one example from there: we recently had a discussion >> about "Fastpath". On DSL lines, you need error recovery on the last >> mile. Now, to save overhead, you do code spreading/interleaving. Some >> "well informed guys" want the ISP to turn interleaving off in order to >> spare some "ping time". First of all, it's simply ridiculous that >> individual customers without any technical knowledge prescribe >> to the provider the appropriate line coding for one individual wire pair. > > I'm curious if you have any stats on the typical error rates that will > be experienced by the people who switch from Interleave to Fastpath mode No. I haven't. And I expect they will depend heavily on where you are. I don't think an ISP will reveal this to customers. > on their DSL links. It is certainly true that e.g. gamers find Interleave > mode to be a pain (at least 20ms additional latency), with good reason. > Stats on how much packet loss the gamer will experience (in order to gain > the latency improvement of Fastpath) would be interesting to know. If the > error rates are low, then in fact it seems like an entirely reasonable > thing for a customer to desire Fastpath. If. The problem - besides the fact that this may be off topic on this list, so perhaps we should continue this discussion via PM - is that it is hardly possible to make a decision here without any knowledge of the quality of the line. In addition, from my own professional experience, it is necessary to have a clear service description and service level agreement in any sort of contract. So, ISP and customer agree upon _WHAT_ is offered and not upon _HOW_ it's implemented. Particularly the "ping times" are often wonderful "promises", based upon "there was an interview with a famous guy on this topic in the tabloids....", and then a customer expects a certain QoS. So "fastpath" may end up in a hidden, unspoken "unilateral QoS contract". What happens if, for some reason, the latency increases? As a provider, you must not promise anything you might not be able to keep. It's always a liability issue. And if one customer is located in Markl and the other one in Flensburg, it doesn't matter where these locations are, but there may be a distance of 1000 kilometers in between; this is of course different from a situation where both people are located in Flensburg. Customers typically aren't aware of this. They have read in the tabloids: "Fastpath will guarantee you ping times of 20 ms." It does not say from where to where. It only says: "20 ms". So, when NASA starts its mission to Mars, you should enable fastpath. Then you will have ping times to the spacecraft of 20 ms. > >> Second: not only these customers may be affected by increasing error >> rates: these guys flood large portions of the network with defective >> frames, more precisely with defective ATM cells with corrupted >> payload, which is eventually detected at the customer's AAL 5 >> peer. (At least AFAIK.)
>> This is a thoughtless waste of bandwidth, but it is nearly impossible to >> convince those guys that this is malicious in quite a number of cases! > > I'm also curious about this "large portions of the network" which is > carrying > useless ATM AAL5 cells. If the bit error rate is so bad that a noticeable > fraction > of ATM cells are useless, then gamers are going to have a bad packet loss > rate and > fairly quickly go back to interleave mode (despite the higher latency). When they become aware of this. Until yesterday, DSL 1000 was sufficient for online gamers. As of today, customers threaten the providers with lawsuits in order to get "DSL 6000", because otherwise the online game won't work any longer. It was written in the tabloids, you know. It's like computer worms and viruses. These are typical issues in the evening news on TV here in Germany. > Yet if > the gamers are happy with the typical loss rate using Fastpath, then > there's > probably not that many wasted/useless ATM cells floating around. > > (Naturally, if the actual AAL_PDU loss rate starts to become more than a > percent or so, the gamer's use of TCP for p2p, web surfing and email > becomes > problematic. But it is hard to argue this by hand-waving - we need stats on > the likely bit error rates a typical DSL customer is likely to see using > Fastpath.) > I am totally with you. And even that's the reason why line coding should be left to the provider, who _has_ statistics and can make a decision based upon them. It's what I said before: ISP and customer shall agree upon _what_ is provided. Not _how_ it's provided. However, to get on topic again: basically, I talked about DSL as an _example_ of why things can get more complex and more complicated than they perhaps were in the mid-1980s between UCLA, UCSD and UCBE. Basically, I started some years ago with the question: why do we expect difficulties with TCP in mobile wireless networks? Then, some thousands of papers later and having read tons of paper (is there any forest left?) about varying bandwidth, spurious timeouts, adverse interactions, scheduling problems and other problems which are of course scary - and occur on each and every company LAN with even no wireless component in it - I got into the details of TCP timer estimation. As one would expect, this ended up in the question: why does the Internet work at all? Obviously, it does. _I_ want to understand _if_, and if so _why_, there are problems with TCP in mobile wireless networks. It's not convincing that there are dozens of PhD theses around which claim there are problems here, as long as there is no convincing reason for this. Simulations are not convincing (they prove anything and nothing - whatever you prefer) and "occasional observations" (recall the "cold fusion") aren't either. When we talk about problems with TCP in mobile wireless networks, we must give _reasons_ why there could be problems. Anything else is playing. And not science. When you look at my homepage, you'll find my Path Tail Emulation paper there. I did not write a second paper yet, so a number of objections must be discussed and a number of corrections must be made in later versions. However, at the moment, it is not the question _how_ to solve the "problem" with TCP in mobile wireless networks. It is the question _IF_ there is a problem at all. And the fact that Ludwig, Gurtov, Chakravorty and thousands of others have written tons of papers does not by itself mean that there is one.
I have read a great deal of this stuff, and there are problems in NS2 caused by HICCUP, and this shall make me believe there is a problem in reality, without NS2 and HICCUP. Bash me, beat me, excuse me, but that is not convincing. Either we identify where TCP is vulnerable, or where the "system model" assumed by TCP is violated by mobile wireless networks - or this is all guesswork and hand-waving. That is the reason why I talked about an "urban legend" here yesterday.

You may well say that I question quite a couple of PhD theses and whether it was justified to award the candidates the degree. If you do so, you have understood perfectly what I mean. What I have read so far on this issue is not convincing - but simply sloppy. And if I couldn't do it significantly better, I wouldn't write a second paper. But _if_ I do, and I'm trying to do so, there must be a sound basis in this. And not these "irreproducible observations" and (by NS2) "repeated assertions" I've read so far.

And an embarrassing example of sloppiness is the "Adaptive Pacing" paper by ElRakabawy, Klemm and Lindemann at Mobihoc this year. Taking two or three repeated assertions ("simulations") as a proof of a fundamental, but questionable, theorem cannot be accepted, and I wonder why the paper was accepted by the reviewers. The paper is nicely written, there are nice figures and tables.... But for the benefit of a conference's reputation, there should be some content in there as well, and if the content is correct this is even better.

Detlef

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From david.hagel at gmail.com Tue Aug 23 15:06:41 2005
From: david.hagel at gmail.com (David Hagel)
Date: Tue, 23 Aug 2005 18:06:41 -0400
Subject: [e2e] Question about propagation and queuing delays
In-Reply-To: <430B505C.3000303@inescporto.pt>
References: <43094A36.1040402@reed.com> <430B505C.3000303@inescporto.pt>
Message-ID: 

From all that I hear, access links seem like the main culprits that cause most of the congestion today. But so many of the current congestion control evaluations focus on alleviating congestion in the network core. Perhaps a simpler network model, in which only access links can be the bottlenecks, might yield much simpler congestion control solutions? Has there been any work in this direction? With all the non-TCP applications (like VoIP) emerging on the horizon, does relying on TCP alone for congestion control make much sense?

(List moderators -- in case I am stirring some old soup of discussion on this list, please feel free to kill this thread. I am new to this list.)

- Dave

On 8/23/05, Filipe Abrantes wrote:
> Hello David,
>
> David Hagel wrote:
> > Thanks, this is interesting. I asked the same question on nanog and
> > got similar responses: that queuing delay is negligible on todays
> > backbone networks compared to other fixed delay components
> > (propagation, store-and-forward, transmission etc). Response on nanog
> > seems to indicate that queuing delay is almost irrelevant today.
> >
> > This may sound like a naive question. But if queuing delays are so
> > insignificant in comparison to other fixed delay components then what
> > does it say about the usefulness of all the extensive techniques for
> > queue management and congestion control (including TCP congestion
> > control, RED and so forth) in the context of today's backbone
> > networks? Any thoughts? Are the congestion control researchers out of
> > touch with reality?
> > > > The latencies mentioned by David Reed are in the case of a > non-congestioned path, and how it was already referred here, nowadays > the most common case is to have our access link (xDSL/cable...) at > home/office to be the bottleneck (the ping would struggle to fill the > link right?). So, to get an approximate value for the maximum queueing > delays you should try a ping when you have background traffic that fully > utilizes your access link. > > Congestion Control only plays an active role when there is a bottleneck > in the path... (well not totally true as the guys from the > high-bandwidth delay and lossy paths may tell you). > > As to queue management, one of it's goals is also to promote fairness > between flows (TCP is not that good at it), so i can see some usefulness > in them too. If the final result is actually good enough I still don't > know (I havent' gone too deep into this issue). > > I just did a ping to my home which is on a 2Mb-dl/128kbit-ul cable > connection from work (where I am) to exemplify this. At home I started a > P2P program which had the upload capped at 6KByte/s (capped by the > application, so there could be instantaneous overloads i think) I got this: > (the upload link was the bottleneck as my download was well below the dl > limit) > > $ ping xxxxxxxx.no-ip.org > PING xxxxxxx.no-ip.org (83.132.76.xx) 56(84) bytes of data. > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=1 > ttl=52 time=71.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=2 > ttl=52 time=109 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=3 > ttl=52 time=88.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=4 > ttl=52 time=29.5 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=5 > ttl=52 time=399 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=6 > ttl=52 time=307 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=7 > ttl=52 time=131 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=8 > ttl=52 time=78.6 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=9 > ttl=52 time=87.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=10 > ttl=52 time=54.2 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=11 > ttl=52 time=93.7 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=12 > ttl=52 time=22.4 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=13 > ttl=52 time=21.8 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=14 > ttl=52 time=45.2 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=16 > ttl=52 time=251 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=17 > ttl=52 time=22.1 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=18 > ttl=52 time=297 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=19 > ttl=52 time=290 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=20 > ttl=52 time=280 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=21 > ttl=52 time=21.0 ms > > --- xxxxxxxx.no-ip.org ping statistics --- > 21 packets transmitted, 20 received, 4% packet loss, time 20020ms > rtt min/avg/max/mdev = 21.046/135.229/399.044/117.287 ms > > > Then I capped the upload at 3Kbyte/s and got this: > > $ ping xxxxxxxx.no-ip.org > PING 
xxxxxxx.no-ip.org (83.132.76.xx) 56(84) bytes of data. > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=1 > ttl=52 time=22.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=2 > ttl=52 time=22.2 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=3 > ttl=52 time=88.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=4 > ttl=52 time=34.3 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=5 > ttl=52 time=23.3 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=6 > ttl=52 time=24.6 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=7 > ttl=52 time=25.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=8 > ttl=52 time=22.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=9 > ttl=52 time=20.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=10 > ttl=52 time=52.5 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=11 > ttl=52 time=21.4 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=12 > ttl=52 time=30.9 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=13 > ttl=52 time=21.4 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=14 > ttl=52 time=42.8 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=15 > ttl=52 time=20.5 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=16 > ttl=52 time=22.0 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=17 > ttl=52 time=24.7 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=18 > ttl=52 time=24.4 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=19 > ttl=52 time=21.3 ms > 64 bytes from a83-132-76-xx.cpe.netcabo.pt (83.132.76.xx): icmp_seq=20 > ttl=52 time=20.6 ms > > --- xxxxxxxx.no-ip.org ping statistics --- > 20 packets transmitted, 20 received, 0% packet loss, time 19018ms > rtt min/avg/max/mdev = 20.593/29.482/88.992/15.843 ms > > > As you can see, queueing delays are noticeable. > > Best Regards > > Filipe Abrantes > > > > - Dave > > > > > > On 8/21/05, David P. Reed wrote: > > > >>I can repeatably easily measure 40 msec. coast-to-coast (Boston-LA), of > >>which around 25 msec. is accounted for by speed of light in fiber (which > >>is 2/3 of speed of light in vacuum, *299,792,458 m s^-1 *, because the > >>refractive index of fiber is approximately 1.5 or 3/2). So assume 2e8 > >>m/s as the speed of light in fiber, 1.6e3 m/mile, and you get 1.25e5 > >>mi/sec. > >> > >>The remaining 15 msec. can be accounted for by the fiber path not being > >>straight line, or by various "buffering delays" (which include queueing > >>delays, and scheduling delays in the case where frames are scheduled > >>periodically and you have to wait for the next frame time to launch your > >>frame). > >> > >>Craig Partridge and I have debated (offline) what the breakdown might > >>actually turn out to be (he thinks the total buffering delay is only 2-3 > >>msec., I think it's more like 10-12), and it would be quite interesting > >>to get more details, but that would involve delving into the actual > >>equipment deployed and its operating modes. > >> > > > > > > -- > Filipe Lameiro Abrantes > INESC Porto > Campus da FEUP > Rua Dr. 
Roberto Frias, 378 > 4200-465 Porto > Portugal > > Phone: +351 22 209 4266 > E-mail: fla at inescporto.pt > From alexkr at cisco.com Tue Aug 23 15:21:43 2005 From: alexkr at cisco.com (Alex Krivonosov (alexkr)) Date: Tue, 23 Aug 2005 15:21:43 -0700 Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP Message-ID: Hi, Can anybody help me to solve this issue? I have a TCP connection handled by the completion port IO model. What is happening is in case I specify a large buffer for receiving (WSARecv), the operation completes only after the buffer is full, not after receiving about 500 bytes (a packet), so a significant delay is introduced. In case of small buffers, performance degrades. Any advice on this? Completion port model is a must. Thank you Alex Krivonosov -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.postel.org/pipermail/end2end-interest/attachments/20050823/ba58f2c7/attachment.html From lars.eggert at netlab.nec.de Wed Aug 24 01:06:42 2005 From: lars.eggert at netlab.nec.de (Lars Eggert) Date: Wed, 24 Aug 2005 10:06:42 +0200 Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP In-Reply-To: References: Message-ID: <0EC343A4-094D-4EE6-9428-7D3FE03CB83E@netlab.nec.de> On Aug 24, 2005, at 0:21, Alex Krivonosov (alexkr) wrote: > I have a TCP connection handled by the completion port IO model. > What is happening is in case I specify a large buffer for receiving > (WSARecv), the operation completes only after the buffer is full, > not after receiving about 500 bytes (a packet), so a significant > delay is introduced. In case of small buffers, performance > degrades. Any advice on this? Completion port model is a must. Please understand that TCP doesn't deliver "packets" to the application, it provides a byte stream. You may want to look into using non-blocking I/O for the receive call. (I don't know what you mean by "completion port model.") Lars -- Lars Eggert NEC Network Laboratories -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3686 bytes Desc: not available Url : http://www.postel.org/pipermail/end2end-interest/attachments/20050824/9ed62f66/smime-0001.bin From braden at ISI.EDU Wed Aug 24 05:43:01 2005 From: braden at ISI.EDU (Bob Braden) Date: Wed, 24 Aug 2005 05:43:01 -0700 Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP In-Reply-To: <0EC343A4-094D-4EE6-9428-7D3FE03CB83E@netlab.nec.de> References: Message-ID: <5.1.0.14.2.20050824053938.024567a0@boreas.isi.edu> BTW, the designers of TCP included a protocol mechanism to deal with this problem. It is called "Push". The Berkeley folks chose to ignore it when they developed the socket interface, because it did not fit into their model of a connection as a virtual file. Interesting example of an atrophied protocol mechanism. Bob Braden At 10:06 AM 8/24/2005 +0200, Lars Eggert wrote: >On Aug 24, 2005, at 0:21, Alex Krivonosov (alexkr) wrote: >>I have a TCP connection handled by the completion port IO model. >>What is happening is in case I specify a large buffer for receiving >>(WSARecv), the operation completes only after the buffer is full, >>not after receiving about 500 bytes (a packet), so a significant >>delay is introduced. In case of small buffers, performance >>degrades. Any advice on this? Completion port model is a must. 
> >Please understand that TCP doesn't deliver "packets" to the
> >application, it provides a byte stream. You may want to look into
> >using non-blocking I/O for the receive call. (I don't know what you
> >mean by "completion port model.")
> >
> >Lars
> >--
> >Lars Eggert NEC Network Laboratories

From s.malik at tuhh.de Wed Aug 24 06:30:18 2005
From: s.malik at tuhh.de (Sireen Habib Malik)
Date: Wed, 24 Aug 2005 15:30:18 +0200
Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP
In-Reply-To: 
References: 
Message-ID: <430C766A.4000801@tuhh.de>

> >I have a TCP connection handled by the completion port IO model. What is
> >happening is in case I specify a large buffer for receiving (WSARecv),
> >the operation completes only after the buffer is full, not after
> >receiving about 500 bytes (a packet), so a significant delay is
> >introduced. In case of small buffers, performance degrades. Any advice
> >on this? Completion port model is a must.

No idea what a "completion port model" is! Here are some hints for your question.

Large delay for large buffer is intuitively clear.

For the small buffers, consider the relation:

  Maximum Window (Wmax) = BufferSize + Capacity*RTT

(assuming an error-free medium, and that Capacity is the speed of the ONLY bottleneck). For the more common TCP versions, Wmax should not be so small that TCP does not get a chance to get out of the Slow-Start phase. For triple duplicate ACKs to arrive, there should be at least 3 packets successfully delivered after the lost packet. Otherwise, TCP times out. If your client access speed is small, give your connection enough buffer space to at least get to the saw-tooth behavior.

The other possibility is that with a large buffer the connection can operate at the Maximum Congestion Window; however, with a small buffer you are forcing it to go into the saw-tooth congestion control. So poorer performance, relatively speaking.

Hope that helps.

--
SM

From Anil.Agarwal at viasat.com Wed Aug 24 08:01:44 2005
From: Anil.Agarwal at viasat.com (Agarwal, Anil)
Date: Wed, 24 Aug 2005 11:01:44 -0400
Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP
Message-ID: 

All,

"Completion Port Model" is a Windows-specific mechanism. I suspect this question requires some Windows expertise, rather than TCP expertise.

See links below for some info on this topic -
http://www.nevelsteen.com/coding/completion_ports_in_delphi.htm
http://www.sysinternals.com/Information/IoCompletionPorts.html

Anil Agarwal
Viasat Inc.
22300 Comsat Dr.
Clarksburg, MD 20871
(W) 301-428-4655
Anil.Agarwal at viasat.com

-----Original Message-----
From: end2end-interest-bounces at postel.org [mailto:end2end-interest-bounces at postel.org] On Behalf Of Sireen Habib Malik
Sent: Wednesday, August 24, 2005 9:30 AM
To: Alex Krivonosov (alexkr)
Cc: end2end-interest at postel.org
Subject: Re: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP

> >I have a TCP connection handled by the completion port IO model. What is
> >happening is in case I specify a large buffer for receiving (WSARecv),
> >the operation completes only after the buffer is full, not after
> >receiving about 500 bytes (a packet), so a significant delay is
> >introduced. In case of small buffers, performance degrades. Any advice
> >on this? Completion port model is a must.

No idea what a "completion port model" is! Here are some hints for your question.

Large delay for large buffer is intuitively clear.
For the small buffers, consider the relation: Maximum Window (Wmax) = BufferSize+ Capacity*RTT (assuming error-free medium, and that Capacity is the speed of the ONLY bottleneck). For the more common TCP versions, Wmax should not be so small that TCP does not get a chance to get out of the Slow-Start phase. For triple-duplicates to arrive, there should be atleast 3 packets successfully delievered after the lost packet. Otherwise, TCP time-outs. If your client access-speed is small, give your connection enough buffer-space to "atleast" get to the saw-tooth behavior. The other possibility is that with a large buffer the connection can operate at the Maximum Congestion Window, however, with a small buffer you are forcing it go into the saw-tooth congestion control. So poorer performance, relatively speaking. Hope that helps. -- SM From fred at cisco.com Wed Aug 24 02:36:47 2005 From: fred at cisco.com (Fred Baker) Date: Wed, 24 Aug 2005 17:36:47 +0800 Subject: [e2e] Question about propagation and queuing delays In-Reply-To: <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> References: <43094A36.1040402@reed.com> <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> Message-ID: <0D1E327F-5E78-49ED-A132-D048995E144A@cisco.com> So I am sitting in a meeting room at APAN, which is meeting in Taipei. I happen to be VPN'd into Cisco in San Jose, but I shut that down to develop a traceroute for your benefit. The traceroute from here to Cisco is: traceroute to irp-view7.cisco.com (171.70.65.144), 64 hops max, 40 byte packets 1 ip-242-001 (140.109.242.1) 8.177 ms 10.311 ms 16.018 ms 2 ae-0-10.br0.tpe.tw.rt.ascc.net (140.109.251.50) 2.096 ms 66.035 ms 49.755 ms 3 s4-1-1-0.br0.pax.us.rt.ascc.net (140.109.251.105) 206.316 ms 162.307 ms 259.891 ms 4 so-5-1.hsa4.sanjose1.level3.net (64.152.81.9) 130.915 ms 274.471 ms 304.699 ms 5 so-2-1-0.bbr2.sanjose1.level3.net (4.68.114.157) 132.229 ms 176.587 ms 135.330 ms 6 ge-11-0.ipcolo1.sanjose1.level3.net (4.68.123.41) 134.507 ms ge-11-2.ipcolo1.sanjose1.level3.net (4.68.123.169) 131.669 ms ge-11-0.ipcolo1.sanjose1.level3.net (4.68.123.41) 134.544 ms 7 p1-0.cisco.bbnplanet.net (4.0.26.14) 130.734 ms 131.757 ms 140.291 ms 8 sjck-dmzbb-gw1.cisco.com (128.107.239.9) 146.848 ms 132.394 ms 168.201 ms ... I ran a ping (through the VPN) to a server inside Cisco. While I did that, I downloaded a number of files. The variation in ping delay is: 225 packets transmitted, 222 packets received, 1% packet loss round-trip min/avg/max/stddev = 132.565/571.710/2167.062/441.876 ms The peak rate sftp reported was about 141.3 KB/s, and the least rate was 34.2 KB/s. The difference most likely relates to the effects of packet loss (1.3% loss is non-negligible), delay variation (a standard deviation in ping RTT of 442 ms and an absolute variation in delay of 2034 ms are also non-negligible), the effects of slow-start and fast-retransmit procedures, or the bandwidth remaining while other users also made use of the link. What this demonstrates is the variation in delay that happens around bottlenecks in the Internet, and why folks that worry about TCP/SCTP congestion management procedures are not playing with recreational pharmaceuticals. I won't speculate where this bottleneck is beyond saying I'll bet it's in one of the first few hops of that traceroute - the access path. On Aug 23, 2005, at 5:50 AM, Fred Baker wrote: > no, but there are different realities, and how one measures them is > also relevant. 
> > In large fiber backbones, within the backbone we generally run 10:1 > overprovisioned or more. within those backbones, as you note, the > discussion is moot. But not all traffic stays within the cores of > large fiber backbones - much of it is originated and terminates in > end systems located in homes and offices. > > The networks that connect homes and offices to the backbones are > often constrained differently. For example, my home (in an affluent > community in California) is connected by Cable Modem, and the > service that I buy (business service that in its AUP accepts a VPN, > unlike the same company's residential service) guarantees a certain > amount of bandwidth, and constrains me to that bandwidth - measured > in KBPS. I can pretty easily fill that, and when I do certain > services like VoIP don't work anywhere near as well. So I wind up > playing with the queuing of traffic in the router in my home to > work around the service rate limit in my ISP. As I type this > morning (in a hotel in Taipei), the hotel provides an access > network that I share with the other occupants of the hotel. It's > not uncommon for the entire hotel to share a single path for all of > its occupants, and that single path is not necessarily in MBPS. > And, they tell me that the entire world is not connected by large > fiber cores - as soon as you step out of the affluent > industrialized countries, VSAT, 64 KBPS links, and even 9.6 access > over GSM become the access paths available. > > As to measurement, note that we generally measure that > overprovisioning by running MRTG and sampling throughput rates > every 300 seconds. When you're discussing general service levels > for an ISP, that is probably reasonable. When you're measuring time > variations on the order of milliseconds, that's a little like > running a bump counter cable across a busy intersection in your > favorite downtown, reading the counter once a day, and drawing > inferences about the behavior of traffic during light changes > during rush hour... > > http://www.ieee-infocom.org/2004/Papers/37_4.PDF has an interesting > data point. They used a much better measurement methodology, and > one of the large networks gave them some pretty cool access in > order to make those tests. Basically, queuing delays within that > particular very-well-engineered large fiber core were on the order > of 1 ms or less during the study, with very high confidence. But > the same data flows frequently jumped into the 10 ms range even > within the 90% confidence interval, and a few times jumped to 100 > ms or so. The jumps to high delays would most likely relate to > correlated high volume data flows, I suspect, either due to route > changes or simple high traffic volume. > > The people on NANOG and the people in the NRENs live in a certain > ivory tower, and have little patience with those who don't. They > also measure the world in a certain way that is easy for them. > > > On Aug 23, 2005, at 12:13 AM, David Hagel wrote: > > >> Thanks, this is interesting. I asked the same question on nanog >> and got similar responses: that queuing delay is negligible on >> todays backbone networks compared to other fixed delay components >> (propagation, store-and-forward, transmission etc). Response on >> nanog seems to indicate that queuing delay is almost irrelevant >> today. >> >> This may sound like a naive question. 
But if queuing delays are so >> insignificant in comparison to other fixed delay components then >> what does it say about the usefulness of all the extensive >> techniques for queue management and congestion control (including >> TCP congestion control, RED and so forth) in the context of >> today's backbone networks? Any thoughts? Are the congestion >> control researchers out of touch with reality? >> >> - Dave >> >> On 8/21/05, David P. Reed wrote: >> >>> I can repeatably easily measure 40 msec. coast-to-coast (Boston- >>> LA), of which around 25 msec. is accounted for by speed of light >>> in fiber (which is 2/3 of speed of light in vacuum, *299,792,458 >>> m s^-1 *, because the refractive index of fiber is approximately >>> 1.5 or 3/2). So assume 2e8 m/s as the speed of light in fiber, >>> 1.6e3 m/mile, and you get 1.25e5 mi/sec. >>> >>> The remaining 15 msec. can be accounted for by the fiber path not >>> being straight line, or by various "buffering delays" (which >>> include queueing delays, and scheduling delays in the case where >>> frames are scheduled periodically and you have to wait for the >>> next frame time to launch your frame). >>> >>> Craig Partridge and I have debated (offline) what the breakdown >>> might actually turn out to be (he thinks the total buffering >>> delay is only 2-3 msec., I think it's more like 10-12), and it >>> would be quite interesting to get more details, but that would >>> involve delving into the actual equipment deployed and its >>> operating modes. >>> > > From sampad_m at rediffmail.com Wed Aug 24 09:06:51 2005 From: sampad_m at rediffmail.com (sampad mishra) Date: 24 Aug 2005 16:06:51 -0000 Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP Message-ID: <20050824160651.789.qmail@webmail8.rediffmail.com> On Wed, 24 Aug 2005 Lars Eggert wrote : >On Aug 24, 2005, at 0:21, Alex Krivonosov (alexkr) wrote: >>I have a TCP connection handled by the completion port IO model. What is happening is in case I specify a large buffer for receiving (WSARecv), the operation completes only after the buffer is full, not after receiving about 500 bytes (a packet), so a significant delay is introduced. In case of small buffers, performance degrades. Any advice on this? Completion port model is a must. > >Please understand that TCP doesn't deliver "packets" to the application, it provides a byte stream. You may want to look into using non-blocking I/O for the receive call. (I don't know what you mean by "completion port model.") > >Lars What Lars said is right, TCP doen't deliver "packets" to the application. Now in your case I think it is going into the blocking mode. One way to verify is, check the return value, Result = WSARecv(....) If the socket is non blocking , it would return WSAEWOULDBLOCK. You have to handle this case using WSAAsyncSelect(SOCKET id , HWND , uint msg,combination of events(like FD_READ,FD_WRITE , etc) Now handle these messages(FD_READ for reading,....) in ur WindowProc of the window specified. You have to go through the MSDN document to get a clear picture... u can use the chunk of code illustarted below: Result = WSARecv(....) 
if (Result == SOCKET_ERROR)
{
    Error = WSAGetLastError();
    switch (Error)
    {
    case WSAENETRESET:        // fall through
    case WSAECONNRESET:
        return FALSE;
    case WSAEWOULDBLOCK:
        // no data yet: ask Windows to post the application-defined
        // window message when the socket becomes readable/writable
        WSAAsyncSelect(SOCKID, HWND, WM_TCP_NET_MESSAGE,
                       FD_CONNECT | FD_READ | FD_WRITE | FD_CLOSE);
        return FALSE;
    default:
        return FALSE;
    }
}

Well, I'm not sure whether this is what you wanted; nevertheless this might still help.

Regards,
Sampad Mishra.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.postel.org/pipermail/end2end-interest/attachments/20050824/3e3fe90e/attachment.html

From sommerfeld at sun.com Wed Aug 24 09:23:01 2005
From: sommerfeld at sun.com (Bill Sommerfeld)
Date: Wed, 24 Aug 2005 12:23:01 -0400
Subject: [e2e] Question about propagation and queuing delays
In-Reply-To: <0D1E327F-5E78-49ED-A132-D048995E144A@cisco.com>
References: <43094A36.1040402@reed.com> <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> <0D1E327F-5E78-49ED-A132-D048995E144A@cisco.com>
Message-ID: <1124900580.7308.36.camel@thunk>

On Wed, 2005-08-24 at 05:36, Fred Baker wrote:
> The peak rate sftp reported was about 141.3 KB/s, and the least rate
> was 34.2 KB/s.
> The difference most likely relates to the effects of
> packet loss (1.3% loss is non-negligible), delay variation (a
> standard deviation in ping RTT of 442 ms and an absolute variation in
> delay of 2034 ms are also non-negligible), the effects of slow-start
> and fast-retransmit procedures, or the bandwidth remaining while
> other users also made use of the link.

If by sftp you mean the protocol described in some version of draft-ietf-secsh-filexfer running over ssh, I'd be cautious in using it as a path benchmark.

ssh muxes multiple stream channels over a single tcp connection, and as a result does its own flow control on a per-channel basis so that implementations aren't forced to choose between unlimited buffer usage within implementations or deadlock.

There have been anecdotal reports to the secure shell wg that default channel window sizes in commonly used implementations are far from optimal, but I haven't heard any updates in a while. But then, I'm just the cat-herder for that WG and not an implementor of that protocol family...

- Bill

From alexkr at cisco.com Wed Aug 24 09:22:59 2005
From: alexkr at cisco.com (Alex Krivonosov (alexkr))
Date: Wed, 24 Aug 2005 09:22:59 -0700
Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP
Message-ID: 

Sireen,

This issue has nothing to do with the protocol operation; this is clearly an internal Windows problem. While using blocking sockets, it does not happen, independent of the buffer size.

Alex

-----Original Message-----
From: Sireen Habib Malik [mailto:s.malik at tuhh.de]
Sent: Wednesday, August 24, 2005 6:30 AM
To: Alex Krivonosov (alexkr)
Cc: end2end-interest at postel.org
Subject: Re: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP

> >I have a TCP connection handled by the completion port IO model. What
> >is happening is in case I specify a large buffer for receiving
> >(WSARecv), the operation completes only after the buffer is full, not
> >after receiving about 500 bytes (a packet), so a significant delay is
> >introduced. In case of small buffers, performance degrades. Any advice
> >on this? Completion port model is a must.

No idea what a "completion port model" is! Here are some hints for your question.

Large delay for large buffer is intuitively clear.

For the small buffers, consider the relation:

  Maximum Window (Wmax) = BufferSize + Capacity*RTT

(assuming an error-free medium, and that Capacity is the speed of the ONLY bottleneck). For the more common TCP versions, Wmax should not be so small that TCP does not get a chance to get out of the Slow-Start phase. For triple duplicate ACKs to arrive, there should be at least 3 packets successfully delivered after the lost packet. Otherwise, TCP times out. If your client access speed is small, give your connection enough buffer space to at least get to the saw-tooth behavior.

The other possibility is that with a large buffer the connection can operate at the Maximum Congestion Window; however, with a small buffer you are forcing it to go into the saw-tooth congestion control. So poorer performance, relatively speaking.

Hope that helps.
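To put some illustrative numbers on the Wmax relation above, here is a minimal C sketch. The link speed, RTT, buffer size and segment size below are assumed values for illustration only, not measurements from this thread:

  /* Wmax = BufferSize + Capacity*RTT; Capacity*RTT is the bandwidth-delay
   * product of the bottleneck. All numbers are made up for illustration. */
  #include <stdio.h>

  int main(void)
  {
      double capacity_bps = 128e3;   /* assumed bottleneck: 128 kbit/s uplink */
      double rtt_s        = 0.2;     /* assumed round-trip time: 200 ms       */
      double buffer_bytes = 8192.0;  /* assumed bottleneck queue: 8 KB        */
      double mss_bytes    = 500.0;   /* segment size mentioned in the thread  */

      double bdp_bytes  = capacity_bps / 8.0 * rtt_s;   /* Capacity*RTT */
      double wmax_bytes = buffer_bytes + bdp_bytes;

      printf("BDP  = %.0f bytes (%.1f segments)\n", bdp_bytes, bdp_bytes / mss_bytes);
      printf("Wmax = %.0f bytes (%.1f segments)\n", wmax_bytes, wmax_bytes / mss_bytes);

      /* Fast retransmit needs three duplicate ACKs, i.e. roughly four or more
       * segments in flight after a loss; below that, recovery falls back to
       * the retransmission timer. */
      if (wmax_bytes / mss_bytes < 4.0)
          printf("Window too small for triple-duplicate-ACK recovery\n");
      return 0;
  }

With these assumed numbers the BDP is only 3200 bytes, so the buffer term dominates Wmax; the point of the relation is simply that shrinking the buffer shrinks the window the connection can sustain.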
-- SM From randy at psg.com Wed Aug 24 10:49:42 2005 From: randy at psg.com (Randy Bush) Date: Wed, 24 Aug 2005 10:49:42 -0700 Subject: [e2e] Question about propagation and queuing delays References: <43094A36.1040402@reed.com> <13B1CFCA-3291-4B04-8CC4-D711D4423486@cisco.com> <0D1E327F-5E78-49ED-A132-D048995E144A@cisco.com> Message-ID: <17164.45878.715907.892524@roam.psg.com> > The traceroute from here to Cisco is: > > traceroute to irp-view7.cisco.com (171.70.65.144), 64 hops max, 40 > byte packets > 1 ip-242-001 (140.109.242.1) 8.177 ms 10.311 ms 16.018 ms one can not measure to you as there is blockage traceroute to 140.109.242.1 (140.109.242.1), 64 hops max, 40 byte packets 1 psg2 (147.28.0.5) 0.268 ms 0.283 ms 0.355 ms 2 ... 10 s4-4-3-0.br0.tpe.tw.rt.ascc.net (140.109.251.106) 167.259 ms 166.778 ms 166.749 ms 11 * * * but one can measure up to the block and pathchar to 140.109.251.106 (140.109.251.106) mtu limited to 1500 bytes at local host doing 32 probes at each of 45 sizes (64 to 1500 by 32) 0 rip (147.28.0.39) | 33 Mb/s, 133 us (631 us) 1 psg2 (147.28.0.5) | 42 Mb/s, 103 us (1.12 ms) 2 e2.psg1.psg.com (147.28.1.5) | ?? b/s, -93 us (0.91 ms) 3 sl-gw11-sea-0-1.sprintlink.net (144.232.9.61) | 182 Mb/s, 16 us (1.01 ms) 4 sl-bb20-sea-9-2.sprintlink.net (144.232.6.125) | 337 Mb/s, 135 us (1.31 ms) 5 so-3-0-0.gar1.Seattle1.Level3.net (209.0.227.133) | 1138 Mb/s, 28 us (1.38 ms) 6 so-7-0-0.mp1.Seattle1.Level3.net (64.159.1.81) | ?? b/s, 8.61 ms (18.5 ms) 7 as-0-0.bbr2.SanJose1.Level3.net (64.159.0.218) | 1873497444986126 Mb/s, 77 us (18.7 ms) 8 so-14-0.hsa4.SanJose1.Level3.net (4.68.114.158) | 256 Mb/s, 69 us (18.9 ms) 9 REACH-SERVIC.hsa4.Level3.net (64.152.81.10) | 32 Mb/s, 63.7 ms (147 ms), +q 21.3 ms (84.9 KB) 10 s4-4-3-0.br0.tpe.tw.rt.ascc.net (140.109.251.106) 10 hops, rtt 145 ms (147 ms), bottleneck 32 Mb/s, pipe 583768 bytes observe the queue on the 9/10 hop randy From dpreed at reed.com Thu Aug 25 07:37:34 2005 From: dpreed at reed.com (David P. Reed) Date: Thu, 25 Aug 2005 10:37:34 -0400 Subject: [e2e] Need help: setting winsock receive low watermark while using completion port and TCP In-Reply-To: <0EC343A4-094D-4EE6-9428-7D3FE03CB83E@netlab.nec.de> References: <0EC343A4-094D-4EE6-9428-7D3FE03CB83E@netlab.nec.de> Message-ID: <430DD7AE.5060101@reed.com> Lars Eggert wrote: > On Aug 24, 2005, at 0:21, Alex Krivonosov (alexkr) wrote: > >> I have a TCP connection handled by the completion port IO model. >> What is happening is in case I specify a large buffer for receiving >> (WSARecv), the operation completes only after the buffer is full, >> not after receiving about 500 bytes (a packet), so a significant >> delay is introduced. In case of small buffers, performance degrades. >> Any advice on this? Completion port model is a must. > > > Please understand that TCP doesn't deliver "packets" to the > application, it provides a byte stream. You may want to look into > using non-blocking I/O for the receive call. (I don't know what you > mean by "completion port model.") The definition of I/O completion in Winsock *is* buffer full. Size of buffer on receive is not a major performance problem (system calls aren't slow compared to processing), so if you want notification on 500 bytes, use 500 byte buffers. A thought you might not have considered: Perhaps you are sending your 500 byte messages, one per call, on the sender with TCP_NODELAY set? 
This could cause some performance problems if the source end has a fast link, but the receiving node has a slow absorption rate (the packets on the source will not combine into larger frames until the window fills up.) Of course that is exactly what TCP_NODELAY is for (minimizing message latency, but increasing network overhead) - if you don't care so much about latency, don't set TCP_NODELAY. (Or you can get very complex by using I/O completion based app-level output management on the send side to control the latency/efficiency tradeoff, using WSASendMessage to gather multiple frames adaptively and "delaying" sends at the app level until precise conditions hold related to your desired latency goal, and trying to gather 1-3 of your messages into single sends.)

From detlef.bosau at web.de Thu Aug 25 12:13:38 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 25 Aug 2005 21:13:38 +0200
Subject: [e2e] Retransmission Timouts revisited
Message-ID: <430E1862.23935C68@web.de>

I intentionally do not write "TCP", because this matter is not restricted to TCP: retransmission timeouts are required in _any_ protocol which has to cope with packet losses. It doesn't matter whether a timeout is detected on the sender or on the receiver. As long as there is no other mechanism to detect packet loss, we must rely on timeouts.

I just compared the rto algorithms given in Edge's paper from 1984 and the congavoid paper (obviously in some newer version, where rto is set to mean + 4*variation instead of mean + 2*variation, but this does not matter for the discussion). For simplicity, let's ignore Karn's algorithm and focus on the math for rto. We further assume the preconditions for Edge's work hold. (In fact, we must have a very close look at this, but in this discussion we assume they will hold.)

Then the difference is the choice of the variance estimator. Is this correct?

Both Edge and Jacobson/Karels estimate the RTT mean using an EWMA filter. Edge estimates the variance using an EWMA filter as well, unlike Jacobson, who uses an estimator which gives an approximation of the variance and is easier to calculate.

What makes me curious about that is that the rto given by Edge _essentially_ relies on (a one-tailed version of) Chebyshev's inequality. That's why I ranted the last few days when it came to spurious timeouts. There is much written about the multiplicative factor k in rto = mean + k*variation. However, instead of a qualitative "guess", Edge's paper derives an RTO which respects a _prescribed_ upper limit for an "unwanted retransmission probability", AKA spurious timeout probability.

When we consider Edge's formula

  RTO = mean + e * sqrt(sigma^2 * (1-Y)/Y)

and set e to 1 (which is appropriate because "e" does not appear in the derivation of the formula), then Y is the spurious timeout probability. (Refer to Edge's paper, formulae 20 ff. for details.) Practically, this means that if we choose

  rto = mean + 2*sigma = mean + sqrt(4*sigma^2),

then 4 = (1-Y)/Y, hence Y = 1/5. Hence, the spurious timeout probability has an upper limit of 1/5, _independent_ of the actual RTT distribution. As I said above, a newer version of the congavoid paper uses k = 4; then the spurious timeout probability is bounded by 1/17. This matter was even discussed in a paper by Leung, Klein, Mooney and Haner, who proposed to further increase the RTO to avoid spurious timeouts.

Of course, one can have a religious discussion here.
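For reference, the arithmetic behind these bounds is just the one-sided Chebyshev relation Y = 1/(1 + k^2); a tiny C sketch of it (my illustration, not code from either paper):

  /* One-sided Chebyshev bound used in Edge's derivation:
   *   P(rtt > mean + k*sigma) <= 1/(1 + k^2) = Y   <=>   k = sqrt((1-Y)/Y)  */
  #include <stdio.h>
  #include <math.h>

  static double y_from_k(double k) { return 1.0 / (1.0 + k * k); }
  static double k_from_y(double y) { return sqrt((1.0 - y) / y); }

  int main(void)
  {
      printf("k = 2 -> Y <= %.4f (= 1/5)\n",  y_from_k(2.0));   /* 0.2000 */
      printf("k = 4 -> Y <= %.4f (= 1/17)\n", y_from_k(4.0));   /* 0.0588 */
      printf("Y = 1/5 -> k = %.2f\n",         k_from_y(0.2));   /* 2.00   */
      return 0;
  }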
However, if we take Edge's formula, we prescribe an upper limit for the spurious timeout probability, e.g. Y should be 1/63, and then we have:

  k = sqrt((1-Y)/Y) = sqrt((62/63) * 63) = sqrt(62) = ca. 7.87

Hence, we can define an upper limit for the spurious timeout probability and derive the necessary k.

NB: We did not discuss delayed retransmissions here, which may result from a too large rto.

All this holds true in Edge's paper, at least asymptotically. However, the congavoid paper chooses a different variance estimator, which is easier to calculate.

Q1: Is there evidence that the rto equations by Edge hold true? Particularly: what is the precise relationship between the "mdev" used in the congavoid paper and sigma? There are some vague remarks on this one. However, this is a central issue, as it directly affects the applicability of Edge's formula. The very strength of Edge's formula is that we have a _generic_ estimation for the spurious timeout probability, which is especially independent of the actual RTT distribution.

Q2: As we have more powerful computing machinery now than in 1988, did anybody think of using Edge's original formulae again?

If we recall Edge's formula, that's why I talked about an "urban legend" here recently as far as spurious timeouts are concerned: there _are_ spurious timeouts. Anywhere, anytime. And assuming Edge's formula were applicable, we could define an upper limit for the spurious timeout probability, and hence spurious timeouts would not occur unduly often. It doesn't matter whether we run TCP over Ethernet, GPRS or even with flying pigs.

Whether the assumptions for Edge's formulae will hold on an arbitrary network is a different story. Some problems are:

1. A few years ago, I read a paper that packet order distortions cannot be neglected in the Internet. Thus, the Internet no longer performs "Sequencing Positive Acknowledgement Retransmission (SPAR)". I did not yet understand all the details, but Edge explicitly makes use of the SPAR assumption several times in his paper. Thus, I'm not yet convinced that his rationale will hold if this assumption is violated.

2. The EWMA estimators used by Edge require the observation variables to be independent. In fact, RTT observations are gained from ACK packets, and these are sent by the receiver as TCP packets arrive. (Let's ignore delayed ACK here.) Hence, the latency experienced by a packet directly affects the sampling times of following RTT samples. In other terms: the random variables used for rtt observation directly affect each other, and I'm not sure whether they are really independent.

3. In addition to 2, we must review the weakly stationary assumption. Basically, in Edge's paper this assumption results in convergence statements:
   i) The mean estimator is in fact an asymptotically unbiased estimator for the mean.
   ii) The variance estimator converges to var(t(n+1) - T(n)), where t(n+1) is the (n+1)-th rtt observation and T(n) the n-th estimate of the mean.
   Both convergences hold for n -> inf., i.e. "in the long run".

4. "In the long run": as we know from practical statistics, short-term flows are by far more frequent in the Internet than long-term flows. In other terms: when the estimators "start to converge", the flow of interest may be history.

In any case, the formulae given by Edge make it easier to do proper analysis, as they explicitly state assumptions and preconditions. At the moment, I do not quite see under which circumstances the rto formula in the congavoid paper will hold.
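For comparison, the two estimator/RTO schemes under discussion can be written as per-ACK update rules. The sketch below uses the common textbook gains (1/8 and 1/4) and made-up RTT samples; it is only my illustration of the difference, not code from either paper:

  #include <math.h>
  #include <stdio.h>

  /* Jacobson/Karels (congavoid style): EWMA of the mean and of the
   * absolute deviation ("mdev"), rto = srtt + 4*mdev.                 */
  struct jk   { double srtt, mdev; };

  /* Edge style, as described above: EWMA of the mean and of the
   * squared deviation, rto = mean + k*sqrt(var), k from the target Y. */
  struct edge { double mean, var; };

  static double jk_update(struct jk *e, double sample)
  {
      double err = sample - e->srtt;
      e->srtt += 0.125 * err;                    /* gain 1/8 */
      e->mdev += 0.25 * (fabs(err) - e->mdev);   /* gain 1/4 */
      return e->srtt + 4.0 * e->mdev;
  }

  static double edge_update(struct edge *e, double sample, double k)
  {
      double err = sample - e->mean;
      e->mean += 0.125 * err;
      e->var  += 0.25 * (err * err - e->var);
      return e->mean + k * sqrt(e->var);
  }

  int main(void)
  {
      double samples[] = { 0.10, 0.12, 0.09, 0.30, 0.11 };  /* made-up RTTs, in seconds */
      struct jk   j = { 0.10, 0.0 };
      struct edge g = { 0.10, 0.0 };
      for (int i = 0; i < 5; i++)
          printf("rto_jk = %.3f  rto_edge = %.3f\n",
                 jk_update(&j, samples[i]),
                 edge_update(&g, samples[i], 4.0));
      return 0;
  }

This mainly makes Q1 concrete: mdev tracks the mean absolute deviation while sqrt(var) tracks the standard deviation, and the two are not the same quantity in general.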
In particular, unduly frequent spurious timeouts raise the question whether the problem is in fact the network technology in use, or whether the problem is a violation of the requirements for the rto estimator to work properly.

From the aforementioned problem list, I conclude that there are a number of vulnerabilities in the actual rto estimation scheme. In addition, this is not only a problem for TCP, but for any protocol which requires timeouts and must rely upon estimators for mean and variance to obtain them.

Detlef

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From detlef.bosau at web.de Thu Aug 25 13:23:08 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Thu, 25 Aug 2005 22:23:08 +0200
Subject: [e2e] Retransmission Timouts revisited
References: <430E1862.23935C68@web.de>
Message-ID: <430E28AC.7E6F320A@web.de>

Detlef Bosau wrote:
> Some problems are: ....

I missed one important problem: the forecast capacity.

Particularly, the congavoid paper recommends to choose the gain in the mean estimator according to noise etc. Edge clearly points out that the gain should be chosen such that the forecast error is minimized. The problem is that RTO is applied to a packet which has not yet been sent. Consider an estimated rtt of 5 seconds; then we forecast a network path's properties for a duration of 5 seconds.

At the moment, we have a "one size fits all" gain in TCP, which practically ignores that actual RTTs span a range of up to eight orders of magnitude. When I read the "coast to coast" latencies posted by several authors here, I was highly interested. Unfortunately, I don't live in the US; I'm an ordinary private Internet user in Germany, connected with a DSL line. Admittedly, RTTs of up to one or two _seconds_ to servers in the US are extremely rare, but they _exist_. And when 3rd generation mobile networks reach the public, we will face much larger round trip times.

This leads to a settling time problem as well. This is particularly a problem for short-term flows.

Just to say that.

Detlef

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From detlef.bosau at web.de Thu Aug 25 15:15:51 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 26 Aug 2005 00:15:51 +0200
Subject: [e2e] SPAR, comment to: Re: Retransmission Timouts revisited
References: <430E1862.23935C68@web.de> <430E28AC.7E6F320A@web.de>
Message-ID: <430E4316.2502DB8F@web.de>

I just received a mail from Wes:

> On Thu, Aug 25, 2005 at 09:13:38PM +0200, Detlef Bosau wrote:
> >
> > Some problems are:
> >
> > 1. A few years ago, I read a paper that packet order distortions cannot
> > be neglected in the Internet. Thus, the Internet no longer
> > performs "Sequencing Positive Acknowledgement Retransmission (SPAR)". I
> > did not yet understand all the details, but Edge explicitly makes
> > use of the SPAR assumption several times in his paper. Thus, I'm not yet
> > convinced that his rationale will hold if this assumption
> > is violated.
>
> My interpretation of SPAR is "TCP without SACK", i.e. only the last
> in-sequence segment is acknowledged, whereas PAR is "TCP with SACK" where
> segments are acked as they arrive. So these are properties of the
> transport protocol, not of the network that delivers segments to the
> transport, so network reordering does not violate any assumption.
> Edge's paper states that his algorithm works for both SPAR and PAR
> anyways, so I'm not sure you can call this a "problem".
>
> -Wes

OK, the central point here is packet reordering. I just read his remarks on SPAR and PAR, and in fact it seems Wes is right here. It's even right on the first page of Edge's paper :-|

However, I did not yet completely understand the rationale here; it's not easy. When I stated some "possible problems", it may be that I just did not understand these details. Therefore, please allow me to ask.

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From arjuna.sathiaseelan at gmail.com Fri Aug 26 12:06:50 2005
From: arjuna.sathiaseelan at gmail.com (Arjuna Sathiaseelan)
Date: Fri, 26 Aug 2005 20:06:50 +0100
Subject: [e2e] Link Level Retransmissions
Message-ID: <1ef2259005082612063e071e5d@mail.gmail.com>

Dear All,

I would like to know the maximum number of link-level retransmissions allowed in an ARQ protocol, especially in satellite networks. Is this based on a timer, i.e. when a timeout happens, does the link layer leave the error recovery to the TCP layer? Please let me know.

Regds,
Arjuna

From detlef.bosau at web.de Fri Aug 26 12:44:28 2005
From: detlef.bosau at web.de (Detlef Bosau)
Date: Fri, 26 Aug 2005 21:44:28 +0200
Subject: [e2e] SPAR, comment to: Re: Retransmission Timouts revisited
References: <430E1862.23935C68@web.de> <430E28AC.7E6F320A@web.de> <430E4316.2502DB8F@web.de>
Message-ID: <430F711C.4020403@web.de>

Detlef Bosau wrote:
> I just received a mail from Wes:
>
> (mail snipped.)

Although I thought it would be justified to quote this mail without asking Wes, because there is no personal or confidential content in it, it was wrong to do so. I apologize for that.

Detlef

--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937

From ikob at koganei.wide.ad.jp Wed Aug 31 03:39:22 2005
From: ikob at koganei.wide.ad.jp (Katsushi Kobayashi)
Date: Wed, 31 Aug 2005 19:39:22 +0900
Subject: [e2e] PFLDnet2006 CFP
Message-ID: 

Call For Papers
===============

Fourth International Workshop on Protocols for Fast Long-Distance Networks
PFLDnet2006
February 2-3, 2006
Nara, Japan - An ancient capital city older than Kyoto
------------------------------------------------------------------------
http://www.hpcc.jp/pfldnet2006/
------------------------------------------------------------------------

Fast long-distance networks (i.e., networks operating at 622 Mbit/s, 2.5 Gbit/s, or 10 Gbit/s, and soon 40 Gbit/s, spanning several countries or states) are now becoming commonplace. Increasing numbers of researchers now routinely transfer between 10 GB and multi-TB datasets over gigabit networks. Application domains for such massive transfers include data-intensive Grids (e.g., in Particle Physics, Earth Observation, Bioinformatics, and Radio Astronomy), database mirroring for Web sites (e.g., in e-commerce), and push-based Web cache updates.

Although the connectivity infrastructure is now in place, or will soon be, the transport and application protocols available to date are proving inadequate for fast transfer of large volumes of data over such networks. Current versions of TCP cannot fully exploit the network capacity. For instance, recovery time from a congestion event grows at a super-linear rate, and can easily exceed 10 minutes in very high bandwidth-delay product networks.
It also requires a large congestion window for high throughput, consuming valuable system resources. A number of research teams have begun investigating advanced protocols for domain-specific and general applications. The International Workshop on Protocols for Fast Long-Distance Networks in CERN (http://datatag.web.cern.ch/datatag/pfldnet2003/), in Argonne (http://www-didc.lbl.gov/PFLDnet2004/), and in Lyon (http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/) were very successful in bringing together many researchers from all over the world including North America, Europe and Asia who are working on these problems. This workshop will continue this tradition, and provide a perfect setting for researchers in this area to exchange ideas and experience. This single-track workshop will provide researchers and technologists with a focused, highly interactive opportunity to present, discuss and exchange experience on leading research, development and future directions in high performance transport and application protocols (TCP, UDP, HTTP, FTP, etc.) over fast long-distance networks. In order to facilitate discussions, attendance will be limited to 60 participants. Please register early to ensure your participation. Depending on the number of people who register, we may need to restrict the number of people from a given organization to allow for a broader representation of the research community. Registration will open late 2005. Call For Papers --------------- Participants wishing to present a paper should upload a four- pages extended abstract to http://www.hpcc.jp/pfldnet2006/ by October 14 2005. Authors whose abstracts are selected for presentation will have the option to submit a full paper, to be published on the PFLDnet 2006 web site and in the PFLDnet 2006 proceedings. Scope ----- The PFLDnet2006 workshop will focus on research issues and challenges as well as lessons learned from experience. Topics of interest include and are not limited to: - Protocol issues in fast long-distance networks - Enhancements of TCP and its variants - Novel data transport protocols designed for new application services - Transport over optical networks - RDMA over WANs - Shaping on TCP and UDP traffic - QoS and scalability issues - Parallel transfers and multistreaming - Multicast over fast long-distance networks - Modeling and simulation-based results - Experiments on real networks and actual measurements - Protocol benchmarking - Protocol implementation and hardware issues (PCs, NICs, TOEs, routers, switches, etc.) - Data replications and striping - Requirements and experience from bandwidth demanding applications - Bulk-data transfer applications both TCP and non-TCP based - Transport service for Grids Important Dates --------------- Extended Abstract Submission Deadline: October 14 Acceptance Notification: December 2 Final Paper Submission: January 20 Workshop: February 2-3 Committees ---------- Co-Chairs: Richard Hughes-Jones (Univ. Manchester - UK) Kei Hiraki (Univ. of Tokyo - JP) Jason Leigh (UIC - USA) Steering Committee: Pascale Vicat-Blanc Primet (INRIA - FR) Tomohiro Kudoh (AIST - JP) Katsushi Kobayashi (NICT - JP) Technical Program Committee : Brian L Tierney (LBL - USA) R. 
Les Cottrell (SLAC - USA) Bill Allcock (ANL - USA) Eitan Altman (INRIA - FR) Richard Carlson (Internet 2 - USA) Sally Floyd (ICIR - USA) Pascale Vicat-Blanc Primet (INRIA - FR) Tomohiro Kudoh (AIST - JP) Douglas Leith (Hamilton Institute - IR) Steven Low (CALTECH - USA) Medy Sanadidi (UCLA - USA) Robin Tasker (CCLRC - UK) Hideyuki Shimonishi (NEC - JP) Kenjiro Cho (IIJ - JP) Injong Rhee (NCSU - USA) Andrew Chien (UCSD - USA) Aaron Falk (ISI - USA) Katsushi Kobayashi (NICT - JP) Local Organization Committee: Noritoshi Demizu (NICT - JP) Sponsors: --------- NICT, JAPAN TBD. Contact: ikob at koganei.wide.ad.jp