From touch at ISI.EDU Fri Dec 1 15:29:42 2006 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Dec 2006 15:29:42 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <1164699864.2453.31.camel@strangepork> References: <45687401.7020308@web.de> <1164699864.2453.31.camel@strangepork> Message-ID: <4570BAE6.7000303@isi.edu> Christian Kreibich wrote: > Hi Detlef, > > On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >> just a very spontaneous, perhaps stupid, question: What is the >> difference between "packet symmetry" and the well known principle of >> packet conservation here? Aren?t these ideas at least quite similar? > > the packet conservation principle states that in a steady-state TCP > flow, a new packet is not to enter the network before another one has > left That "packet conservation principle" already has a (perhaps not as well-known, but certainly worth knowing) name: 'isarithmic', and was proposed by Davies in 1972. "Packet symmetry" appears to be per-NIC isarithmic. The "packet conservation principle" is single-protocol isarithmic. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061201/c141ada7/signature.bin From Jon.Crowcroft at cl.cam.ac.uk Sat Dec 2 03:26:02 2006 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sat, 02 Dec 2006 11:26:02 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: Message from Joe Touch of "Fri, 01 Dec 2006 15:29:42 PST." <4570BAE6.7000303@isi.edu> Message-ID: the isarithmic flow control stuff was very nice - i remmeber reading it, but have never recently mamnaged to find the actual reference - do you have the proper citation for davies' idea? - we should make sure we know it! In missive <4570BAE6.7000303 at isi.edu>, Joe Touch typed: >>This is an OpenPGP/MIME signed message (RFC 2440 and 3156) >>--------------enig4E3DED50FCEAFA2E99B7EC8E >>Content-Type: text/plain; charset=ISO-8859-1 >>Content-Transfer-Encoding: quoted-printable >> >>Christian Kreibich wrote: >>> Hi Detlef, >>>=20 >>> On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >>>> just a very spontaneous, perhaps stupid, question: What is the=20 >>>> difference between "packet symmetry" and the well known principle of=20 >>>> packet conservation here? Aren=B4t these ideas at least quite similar?= >> >>>=20 >>> the packet conservation principle states that in a steady-state TCP >>> flow, a new packet is not to enter the network before another one has >>> left >> >>That "packet conservation principle" already has a (perhaps not as >>well-known, but certainly worth knowing) name: 'isarithmic', and was >>proposed by Davies in 1972. >> >>"Packet symmetry" appears to be per-NIC isarithmic. >> >>The "packet conservation principle" is single-protocol isarithmic. >> >>Joe >> >>--=20 >>---------------------------------------- >>Joe Touch >>Sr. 
Network Engineer, USAF TSAT Space Segment >> >> >> >> >>--------------enig4E3DED50FCEAFA2E99B7EC8E >>Content-Type: application/pgp-signature; name="signature.asc" >>Content-Description: OpenPGP digital signature >>Content-Disposition: attachment; filename="signature.asc" >> >>-----BEGIN PGP SIGNATURE----- >>Version: GnuPG v1.4.3 (MingW32) >>Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >>iD8DBQFFcLrmE5f5cImnZrsRAvyMAKDE4E/HA6a3dE6V6tIQ/EQosrFd7wCg7nMZ >>l++51oFUPmw9COfh/Sz9tdA= >>=Sdoq >>-----END PGP SIGNATURE----- >> >>--------------enig4E3DED50FCEAFA2E99B7EC8E-- >> cheers jon From svp+ at cs.cmu.edu Sat Dec 2 09:29:14 2006 From: svp+ at cs.cmu.edu (Swapnil V. Patil) Date: Sat, 2 Dec 2006 12:29:14 -0500 (EST) Subject: [e2e] trading acks...TRACKS In-Reply-To: Message-ID: On Sat, 2 Dec 2006, Jon Crowcroft wrote: > the isarithmic flow control stuff was very nice - i remmeber reading it, but have never > recently mamnaged to find the actual reference - do you have the proper citation for > davies' idea? - we should make sure we know it! > I think the main paper is ... "The control of congestion in packet switching networks" by Donald W. Davies http://portal.acm.org/citation.cfm?id=811052 Thanks -swapnil > In missive <4570BAE6.7000303 at isi.edu>, Joe Touch typed: > > >>This is an OpenPGP/MIME signed message (RFC 2440 and 3156) > >>--------------enig4E3DED50FCEAFA2E99B7EC8E > >>Content-Type: text/plain; charset=ISO-8859-1 > >>Content-Transfer-Encoding: quoted-printable > >> > >>Christian Kreibich wrote: > >>> Hi Detlef, > >>>=20 > >>> On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: > >>>> just a very spontaneous, perhaps stupid, question: What is the=20 > >>>> difference between "packet symmetry" and the well known principle of=20 > >>>> packet conservation here? Aren=B4t these ideas at least quite similar?= > >> > >>>=20 > >>> the packet conservation principle states that in a steady-state TCP > >>> flow, a new packet is not to enter the network before another one has > >>> left > >> > >>That "packet conservation principle" already has a (perhaps not as > >>well-known, but certainly worth knowing) name: 'isarithmic', and was > >>proposed by Davies in 1972. > >> > >>"Packet symmetry" appears to be per-NIC isarithmic. > >> > >>The "packet conservation principle" is single-protocol isarithmic. > >> > >>Joe > >> > >>--=20 > >>---------------------------------------- > >>Joe Touch > >>Sr. Network Engineer, USAF TSAT Space Segment > >> > >> > >> > >> > >>--------------enig4E3DED50FCEAFA2E99B7EC8E > >>Content-Type: application/pgp-signature; name="signature.asc" > >>Content-Description: OpenPGP digital signature > >>Content-Disposition: attachment; filename="signature.asc" > >> > >>-----BEGIN PGP SIGNATURE----- > >>Version: GnuPG v1.4.3 (MingW32) > >>Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > >> > >>iD8DBQFFcLrmE5f5cImnZrsRAvyMAKDE4E/HA6a3dE6V6tIQ/EQosrFd7wCg7nMZ > >>l++51oFUPmw9COfh/Sz9tdA= > >>=Sdoq > >>-----END PGP SIGNATURE----- > >> > >>--------------enig4E3DED50FCEAFA2E99B7EC8E-- > >> > > cheers > > jon > > > From L.Wood at surrey.ac.uk Sat Dec 2 18:53:19 2006 From: L.Wood at surrey.ac.uk (L.Wood@surrey.ac.uk) Date: Sun, 3 Dec 2006 02:53:19 -0000 Subject: [e2e] trading acks...TRACKS Message-ID: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> If a packet can't enter the network until one has left, how do you ever get started in an empty totally quiet network? 
Simple reductio ad absurdum suggests that the packet conservation principle as expressed below is bogus. Not so much isarithmic, as isacrock. However, packet conservation through a router is something that can be aspired to, under limited conditions - thinking about a networking analogue of Kirchoff's electrical laws through the router as a point can actually be useful, too. L. odd to see someone actually mention they're working on TSAT... -----Original Message----- From: end2end-interest-bounces at postel.org on behalf of Joe Touch Sent: Fri 2006-12-01 23:29 To: Christian Kreibich Cc: Jon Crowcroft; end2end-interest at postel.org Subject: Re: [e2e] trading acks...TRACKS Christian Kreibich wrote: > Hi Detlef, > > On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >> just a very spontaneous, perhaps stupid, question: What is the >> difference between "packet symmetry" and the well known principle of >> packet conservation here? Aren?t these ideas at least quite similar? > > the packet conservation principle states that in a steady-state TCP > flow, a new packet is not to enter the network before another one has > left That "packet conservation principle" already has a (perhaps not as well-known, but certainly worth knowing) name: 'isarithmic', and was proposed by Davies in 1972. "Packet symmetry" appears to be per-NIC isarithmic. The "packet conservation principle" is single-protocol isarithmic. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061203/b9b0feee/attachment.html From touch at ISI.EDU Sat Dec 2 19:07:12 2006 From: touch at ISI.EDU (Joe Touch) Date: Sat, 02 Dec 2006 19:07:12 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> Message-ID: <45723F60.5020903@isi.edu> L.Wood at surrey.ac.uk wrote: > If a packet can't enter the network until one has left, how do > you ever get started in an empty totally quiet network? Simple > reductio ad absurdum suggests that the packet conservation > principle as expressed below is bogus. Not so much isarithmic, > as isacrock. I didn't say it was a great idea; just that it had a name. ;-) But you can bootstrap such a situation; token rings do it all the time. > However, packet conservation through a router is something that > can be aspired to, under limited conditions - thinking about > a networking analogue of Kirchoff's electrical laws through the > router as a point can actually be useful, too. I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense to create/destroy electrons without a source/sink; the same is not true for packets. > L. > > odd to see someone actually mention they're working on TSAT... > > > > > > -----Original Message----- > From: end2end-interest-bounces at postel.org on behalf of Joe Touch > Sent: Fri 2006-12-01 23:29 > To: Christian Kreibich > Cc: Jon Crowcroft; end2end-interest at postel.org > Subject: Re: [e2e] trading acks...TRACKS > > Christian Kreibich wrote: >> Hi Detlef, >> >> On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >>> just a very spontaneous, perhaps stupid, question: What is the >>> difference between "packet symmetry" and the well known principle of >>> packet conservation here? Aren?t these ideas at least quite similar? 
>> >> the packet conservation principle states that in a steady-state TCP >> flow, a new packet is not to enter the network before another one has >> left > > That "packet conservation principle" already has a (perhaps not as > well-known, but certainly worth knowing) name: 'isarithmic', and was > proposed by Davies in 1972. > > "Packet symmetry" appears to be per-NIC isarithmic. > > The "packet conservation principle" is single-protocol isarithmic. > > Joe > > -- > ---------------------------------------- > Joe Touch > Sr. Network Engineer, USAF TSAT Space Segment > > > > -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061202/d66fcc47/signature.bin From m.musolesi at cs.ucl.ac.uk Sun Dec 3 01:47:18 2006 From: m.musolesi at cs.ucl.ac.uk (Mirco Musolesi) Date: Sun, 03 Dec 2006 09:47:18 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45723F60.5020903@isi.edu> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> Message-ID: <45729D26.4010505@cs.ucl.ac.uk> >> However, packet conservation through a router is something that >> can be aspired to, under limited conditions - thinking about >> a networking analogue of Kirchoff's electrical laws through the >> router as a point can actually be useful, too. > > I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense > to create/destroy electrons without a source/sink; the same is not true > for packets. You may think to a representation of the network with connections to the "ground" or a "voltage source" for each router to represent/quantify packets that are created/lost in it. Mirco -- Mirco Musolesi Dept. of Computer Science, University College London Gower Street London WC1E 6BT United Kingdom Phone: +44 20 7679 0391 Fax: +44 20 7387 1397 Web: http://www.cs.ucl.ac.uk/staff/m.musolesi From touch at ISI.EDU Sun Dec 3 11:45:27 2006 From: touch at ISI.EDU (Joe Touch) Date: Sun, 03 Dec 2006 11:45:27 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45729D26.4010505@cs.ucl.ac.uk> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> Message-ID: <45732957.1050908@isi.edu> Mirco Musolesi wrote: > >>> However, packet conservation through a router is something that >>> can be aspired to, under limited conditions - thinking about >>> a networking analogue of Kirchoff's electrical laws through the >>> router as a point can actually be useful, too. >> >> I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense >> to create/destroy electrons without a source/sink; the same is not true >> for packets. > > You may think to a representation of the network with connections to the > "ground" or a "voltage source" for each router to represent/quantify > packets that are created/lost in it. Right - but then Kirchoff's laws don't apply unless that connection to ground has some impedence (otherwise the entire net is grounded). What's the impedence of a router? :-) I.e., it's dynamic (which is OK) and content-sensitive (which seems hard to model). Joe -- ---------------------------------------- Joe Touch Sr. 
Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061203/f01802a6/signature.bin From L.Wood at surrey.ac.uk Sun Dec 3 14:10:02 2006 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Sun, 03 Dec 2006 22:10:02 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45732957.1050908@isi.edu> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> <45732957.1050908@isi.edu> Message-ID: <200612032210.WAA06009@cisco.com> At Sunday 03/12/2006 11:45 -0800, Joe Touch wrote: >Mirco Musolesi wrote: >> >>>> However, packet conservation through a router is something that >>>> can be aspired to, under limited conditions - thinking about >>>> a networking analogue of Kirchoff's electrical laws through the >>>> router as a point can actually be useful, too. >>> >>> I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense >>> to create/destroy electrons without a source/sink; the same is not true >>> for packets. >> >> You may think to a representation of the network with connections to the >> "ground" or a "voltage source" for each router to represent/quantify >> packets that are created/lost in it. > >Right - but then Kirchoff's laws don't apply unless that connection to >ground has some impedence resistance! (The impedance has to have a real part, otherwise everything's at ground.) >(otherwise the entire net is grounded). What's >the impedence of a router? :-) I.e., it's dynamic (which is OK) and >content-sensitive (which seems hard to model). The analogy would be that the router's input and output impedances are frequency-sensitive, and the content's sent at different frequencies (ports/addresses/QoS/whatever). Talking of "impedance mismatches" between fat and thin pipes is quite common. L. From avg at kotovnik.com Sun Dec 3 14:32:18 2006 From: avg at kotovnik.com (Vadim Antonov) Date: Sun, 3 Dec 2006 14:32:18 -0800 (PST) Subject: [e2e] trading acks...TRACKS In-Reply-To: <200612032210.WAA06009@cisco.com> Message-ID: > >>>> However, packet conservation through a router is something that > >>>> can be aspired to, under limited conditions - thinking about > >>>> a networking analogue of Kirchoff's electrical laws through the > >>>> router as a point can actually be useful, too. This is all nice and dandy but don't forget that the network has N kinds of "electrons", which are attracted to N end-points independently of each other. The reason why Kirchoff's laws work for steady-state electrical networks (or networks of pipes carrying gasses) is that all particles are the same, and repel each other. Packets can be made to "repel" each other, but making them drift in any direction where the "pressure" (or potential) is lower makes the network useless - as it's whole purpose to move the packets to the destinations, and not to the places where there's no congestion. 
--vadim From christian.kreibich at cl.cam.ac.uk Sun Dec 3 15:21:02 2006 From: christian.kreibich at cl.cam.ac.uk (Christian Kreibich) Date: Sun, 03 Dec 2006 15:21:02 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: References: Message-ID: <1165188062.16726.39.camel@strangepork> On Sun, 2006-12-03 at 14:32 -0800, Vadim Antonov wrote: > Packets can be made to "repel" each other, but making them drift in any > direction where the "pressure" (or potential) is lower makes the network > useless - as it's whole purpose to move the packets to the destinations, > and not to the places where there's no congestion. This is converging on field theory again... http://www.cl.cam.ac.uk/~jac22/talks/fields.pdf -- Cheers, Christian. From L.Wood at surrey.ac.uk Sun Dec 3 16:47:39 2006 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Mon, 04 Dec 2006 00:47:39 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: References: <200612032210.WAA06009@cisco.com> Message-ID: <200612040048.AAA12128@cisco.com> At Sunday 03/12/2006 14:32 -0800, Vadim Antonov wrote: >Packets can be made to "repel" each other, that's the norm (packets occupying discrete space/time), unless you use network coding to 'attract' and 'entangle' packets. L. From m.musolesi at cs.ucl.ac.uk Sun Dec 3 17:59:33 2006 From: m.musolesi at cs.ucl.ac.uk (Mirco Musolesi) Date: Mon, 04 Dec 2006 01:59:33 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45732957.1050908@isi.edu> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> <45732957.1050908@isi.edu> Message-ID: <45738105.3040803@cs.ucl.ac.uk> >> You may think to a representation of the network with connections to the >> "ground" or a "voltage source" for each router to represent/quantify >> packets that are created/lost in it. > > Right - but then Kirchoff's laws don't apply unless that connection to > ground has some impedence (otherwise the entire net is grounded). What's > the impedence of a router? :-) I.e., it's dynamic (which is OK) and > content-sensitive (which seems hard to model). Yes, I agree, I was implicitly assuming that you have some resistence to model the packet loss/creation (that can change over time). You may also think to model a router with a sort of building block composed of more complex circuitry to deal with different types of traffic... Mirco -- Mirco Musolesi Dept. of Computer Science, University College London Gower Street London WC1E 6BT United Kingdom Phone: +44 20 7679 0391 Fax: +44 20 7387 1397 Web: http://www.cs.ucl.ac.uk/staff/m.musolesi From touch at ISI.EDU Sun Dec 3 19:07:29 2006 From: touch at ISI.EDU (Joe Touch) Date: Sun, 03 Dec 2006 19:07:29 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <200612032210.WAA06009@cisco.com> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> <45732957.1050908@isi.edu> <200612032210.WAA06009@cisco.com> Message-ID: <457390F1.1080608@isi.edu> Lloyd Wood wrote: > At Sunday 03/12/2006 11:45 -0800, Joe Touch wrote: >> Mirco Musolesi wrote: >>>>> However, packet conservation through a router is something that >>>>> can be aspired to, under limited conditions - thinking about >>>>> a networking analogue of Kirchoff's electrical laws through the >>>>> router as a point can actually be useful, too. >>>> I'm not sure Kirchoff's laws are applicable here. 
It wouldn't make sense >>>> to create/destroy electrons without a source/sink; the same is not true >>>> for packets. >>> You may think to a representation of the network with connections to the >>> "ground" or a "voltage source" for each router to represent/quantify >>> packets that are created/lost in it. >> Right - but then Kirchoff's laws don't apply unless that connection to >> ground has some impedence > > resistance! (The impedance has to have a real part, otherwise everything's at ground.) Kirchoff's laws work for impedance too ;-) >> (otherwise the entire net is grounded). What's >> the impedence of a router? :-) I.e., it's dynamic (which is OK) and >> content-sensitive (which seems hard to model). > > The analogy would be that the router's input and output impedances are frequency-sensitive, and the content's sent at different frequencies (ports/addresses/QoS/whatever). > > Talking of "impedance mismatches" between fat and thin pipes is quite common. Agreed, which is why impedance is what I thought of. Resistive Kirchoff's nets don't make sense here, IMO. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061203/44047b3a/signature.bin From detlef.bosau at web.de Mon Dec 4 05:39:52 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 04 Dec 2006 14:39:52 +0100 Subject: [e2e] trading acks...TRACKS In-Reply-To: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> Message-ID: <45742528.8030007@web.de> O.k., let?s carry cowls to Newcastle :-) (I know, I better should arrange for a trip to the north pole for the next six weeks after sending this post because it?s so stupid.) (BTW: Kind regards from "Rockin? Rudy", the little McDonald?s reindeer, I bought some years ago.) L.Wood at surrey.ac.uk wrote: > > If a packet can't enter the network until one has left, how do > you ever get started in an empty totally quiet network? Simple > reductio ad absurdum suggests that the packet conservation > principle as expressed below is bogus. Not so much isarithmic, > as isacrock. > I personally compare this whole thing to energy as it is kept in a dynamic system. So, the conservation principle basically means nothing else then the energy in this system should be kept constant. So, you have two issues here: 1. Keep the amount of engery constant => don?t add energy to the system before the system has completed some work, i.e. energy has left the system. 2. The question is: How much energy can a system keep? The second issue is addressed by a) probing which yields b) an estimator for a path?s capacity, i.e. CWND. So, you don?t have a "strong" isarithmic system: You can add workload (=emergy) as long as it can be kept and is not dropped by some router. However, it?s a problem to have exact system theoretical model of the Internet or even a single TCP connection. And I even don?t really know where this should be good for. Perhaps for some interesting calculus calisthenics which are interesting for some papers or even some PhD theses. But at least the models I know of are far too much away from a real packet switching network to be really useful. 
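The two points above - release new "energy" into the system only as old energy leaves, and let probing produce an estimate (CWND) of how much the system can hold - can be captured in a toy model. The following sketch is purely illustrative: it simulates nothing real, the path capacity and round count are invented, and it stands in for neither TCP nor Davies' isarithmic scheme; it only shows the conservation-plus-AIMD-probing behaviour being discussed.

# Toy model of the behaviour described above: a sender that keeps at most
# cwnd packets of "energy" in flight per round and probes the path's
# capacity with AIMD. PATH_CAPACITY and ROUNDS are invented for the example.

PATH_CAPACITY = 20          # packets the path can hold before it drops (assumed)
ROUNDS = 30                 # number of round-trip times to simulate

def aimd_rounds(capacity=PATH_CAPACITY, rounds=ROUNDS):
    cwnd = 1.0              # current estimate of how much the path can hold
    history = []
    for rtt in range(rounds):
        in_flight = int(cwnd)          # conservation: never more than cwnd outstanding
        lost = in_flight > capacity    # the "energy" exceeded what the system can keep
        if lost:
            cwnd = max(1.0, cwnd / 2.0)  # multiplicative decrease: shed energy
        else:
            cwnd += 1.0                  # additive increase: keep probing upward
        history.append((rtt, in_flight, lost, cwnd))
    return history

if __name__ == "__main__":
    for rtt, in_flight, lost, cwnd in aimd_rounds():
        print("rtt=%2d in_flight=%2d loss=%s next_cwnd=%.1f" % (rtt, in_flight, lost, cwnd))

The printed sawtooth is the familiar picture: the window climbs until the capacity estimate turns out too large, is halved, and climbs again - which is all the Newton's cradle image in the next paragraph amounts to.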
In my opinion, the most basic reasons for the Internet to work acceptably are in fact 1. the conservation principle, which ensures that the workload in the net is not increased in an "unreasonable" way, but 2. there is some reasonable probing (basically the AIMD probing) and particularly, if the path's capacity estimation turns out to be too large, it is decreased - and everything is fine. So, the Internet works fine with no real congestion collapse. (The most prominent oscillating system suffering from some special case of congestion collapse is perhaps the Tacoma bridge disaster http://www.ketchum.org/bridgecollapse.html) O.k., it's not much wisdom in what I write here. Mainly, I doubt these extremely sophisticated models. Personally, I think mostly of the Tacoma bridge - which would still be there if only someone had limited the energy ;-) - and Newton's cradle when I try to understand stability issues in the Internet. The latter is particularly descriptive as the number of balls visualizes the workload. You can imagine adding a ball as long as there is room in the cradle or taking away a ball, it's fun :-) Detlef From anil at cmmacs.ernet.in Tue Dec 12 20:32:07 2006 From: anil at cmmacs.ernet.in (V Anil Kumar) Date: Wed, 13 Dec 2006 10:02:07 +0530 (IST) Subject: [e2e] Extracting No. of packets or bytes in a router buffer Message-ID: We are searching for any known techniques to continuously sample (say at every 100 msec interval) the buffer occupancy of router interfaces. The requirement is to extract or estimate the instantaneous value of the number of packets or bytes in the router buffer from another machine in the network, and not the maximum possible router buffer size. Any suggestion, advice or pointer to literature on this? Thanks in advance. Anil From craig at aland.bbn.com Wed Dec 13 10:54:41 2006 From: craig at aland.bbn.com (Craig Partridge) Date: Wed, 13 Dec 2006 13:54:41 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Your message of "Wed, 13 Dec 2006 10:02:07 +0530." Message-ID: <20061213185441.34AFF64@aland.bbn.com> Queue sizes are standard SNMP variables and thus could be sampled at these intervals. But it looks as if you want the queues on a per host basis? Craig In message , V Anil Kumar writes: > >We are searching for any known techniques to continuously sample (say at >every 100 msec interval) the buffer occupancy of router interfaces. The >requirement is to extract or estimate the instantaneous value of the >number of packets or bytes in the router buffer from another machine in >the network, and not the maximum possible router buffer size. > >Any suggestion, advice or pointer to literature on this? > >Thanks in advance. > >Anil From detlef.bosau at web.de Wed Dec 13 11:50:21 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 13 Dec 2006 20:50:21 +0100 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213185441.34AFF64@aland.bbn.com> References: <20061213185441.34AFF64@aland.bbn.com> Message-ID: <4580597D.7060901@web.de> Craig Partridge wrote: > Queue sizes are standard SNMP variables and thus could be sampled at > Hm. Do I get sampled SNMP variables in a timely fashion? Perhaps the poor samples shall undergo some nasty treatment (a wavelet transformation or something similar), and I'm not quite sure whether SNMP queries might cause too large a jitter in the sampled time series?
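Craig's suggestion (the queue length is an SNMP variable) and Detlef's worry about sampling jitter can both be tried out with a few lines of script. The sketch below is only an outline: it shells out to the net-snmp snmpget command-line tool, the router address, community string and interface index are placeholders, and - as the rest of this thread makes clear - what the returned gauge actually means depends heavily on the platform. Each sample is timestamped locally so the poller's own jitter can be inspected afterwards.

# Sketch: sample ifOutQLen (ifEntry 21, quoted later in this thread) every
# 100 ms and timestamp each sample locally, so both the gauge value and the
# polling jitter can be examined afterwards.
# ROUTER, COMMUNITY and IF_INDEX are placeholders; assumes the net-snmp
# "snmpget" tool is installed on the measurement host.

import subprocess
import time

ROUTER = "192.0.2.1"        # hypothetical router address
COMMUNITY = "public"        # hypothetical read community
IF_INDEX = 2                # hypothetical interface index
OID = "1.3.6.1.2.1.2.2.1.21.%d" % IF_INDEX   # ifOutQLen.<ifIndex>
INTERVAL = 0.1              # target sampling period: 100 ms

def sample_once():
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Ovq", ROUTER, OID],
        capture_output=True, text=True, timeout=1.0)
    return out.stdout.strip()

if __name__ == "__main__":
    next_due = time.time()
    for _ in range(100):
        t0 = time.time()
        value = sample_once()
        t1 = time.time()
        # (t1 - t0) is the per-query latency; the drift of t0 from next_due
        # is the scheduling jitter of the poller itself
        print("%.6f qlen=%s query_latency=%.1f ms" % (t0, value, (t1 - t0) * 1e3))
        next_due += INTERVAL
        time.sleep(max(0.0, next_due - time.time()))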
Detlef From crovella at cs.bu.edu Wed Dec 13 12:06:09 2006 From: crovella at cs.bu.edu (Mark Crovella) Date: Wed, 13 Dec 2006 15:06:09 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213185441.34AFF64@aland.bbn.com> Message-ID: <0511C607B17F804EBE96FFECD1FD9859F608CC@cs-exs2.cs-nt.bu.edu> Hi Craig, What MIB provides queue sizes? I am not sure that 'standard' is the right word when talking about MIBs :). Maybe some MIBs provide queue sizes but the one most commonly used in backbone routers (MIB-II, RFC 1213) doesn't provide instantaneous queue lengths as far as I know. If I am wrong, please correct me. People have used measures of queuing delay to infer queue lengths. I can't think of a paper that focuses on this, but the general idea is that queue length in bytes has a relationship to queueing delay and link bandwidth (factoring in other sources of delays inside routers). On a sort-of related topic we had a paper a while ago that tried to estimate queue lengths during packet loss events (ie, to estimate buffer sizes or RED parameters). It is Jun Liu, Mark E. Crovella (2001). Using Loss Pairs to Discover Network Properties. In: Proceedings of the ACM SIGCOMM Internet Measurement Workshop 2001. pp. 127--138. http://www.cs.bu.edu/faculty/crovella/paper-archive/imw-losspairs.pdf - Mark > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org] On Behalf Of > Craig Partridge > Sent: Wednesday, December 13, 2006 1:55 PM > To: V Anil Kumar > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Extracting No. of packets or bytes in a > router buffer > > > Queue sizes are standard SNMP variables and thus could be > sampled at these intervals. But it looks as if you want the > queues on a per host basis? > > Craig > > In message > >, V A nil Kumar writes: > > > > >We are searching for any known techniques to continuously > sample (say > >at every 100 msec interval) the buffer occupancy of router > interfaces. > >The requirement is to extract or estimate the instantaneous value of > >the number of packets or bytes in the router buffer from another > >machine in the network, and not the maximum possible router > buffer size. > > > >Any suggestion, advice or pointer to literature on this? > > > >Thanks in advance. > > > >Anil > > From craig at aland.bbn.com Wed Dec 13 12:38:56 2006 From: craig at aland.bbn.com (Craig Partridge) Date: Wed, 13 Dec 2006 15:38:56 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Your message of "Wed, 13 Dec 2006 15:06:09 EST." <0511C607B17F804EBE96FFECD1FD9859F608CC@cs-exs2.cs-nt.bu.edu> Message-ID: <20061213203856.2941664@aland.bbn.com> In message <0511C607B17F804EBE96FFECD1FD9859F608CC at cs-exs2.cs-nt.bu.edu>, "Mark >What MIB provides queue sizes? I am not sure that 'standard' is the >right word when talking about MIBs :). Maybe some MIBs provide queue >sizes but the one most commonly used in backbone routers (MIB-II, RFC >1213) doesn't provide instantaneous queue lengths as far as I know. If >I am wrong, please correct me. Hi Mark: >From MIB-II, p. 23: ifOutQLen OBJECT-TYPE SYNTAX Gauge ACCESS read-only STATUS mandatory DESCRIPTION "The length of the output packet queue (in packets)." ::= { ifEntry 21 } If I remember correctly, we put it there in MIB-I. Note this is per output interface. [chair, IETF MIB-I Working Group... 
not something I advertise widely] Craig From jon.kare.hellan at uninett.no Thu Dec 14 04:35:52 2006 From: jon.kare.hellan at uninett.no (Jon K Hellan) Date: Thu, 14 Dec 2006 13:35:52 +0100 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213203856.2941664@aland.bbn.com> References: <20061213203856.2941664@aland.bbn.com> Message-ID: <45814528.4070901@uninett.no> Craig Partridge wrote: > ifOutQLen OBJECT-TYPE > SYNTAX Gauge > ACCESS read-only > STATUS mandatory > DESCRIPTION > "The length of the output packet queue (in > packets)." > ::= { ifEntry 21 } > > If I remember correctly, we put it there in MIB-I. Note this is per > output interface. The problem is scheduling of snmp polling in the router. It is not at all unlikely that the router will defer this task until it doesn't have anything more important to do, like forwarding packets! If so, the counters are going to report empty queues most of the time. Jon K?re Hellan From Olav.Kvittem at uninett.no Thu Dec 14 05:45:19 2006 From: Olav.Kvittem at uninett.no (Olav.Kvittem@uninett.no) Date: Thu, 14 Dec 2006 14:45:19 +0100 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Message from V Anil Kumar of "Wed, 13 Dec 2006 10:02:07 +0530." Message-ID: <20061214134519.525A387C6A@tyholt.uninett.no> anil at cmmacs.ernet.in said: > We are searching for any known techniques to continuously sample (say at > every 100 msec interval) the buffer occupancy of router interfaces. We did make a tool that could poll that fast and do accurate timestamps a few years ago and discovered that the routers did not update theirs MIB's that often. Some platforms would not copy statistics from the interface cards more often than a few (5) seconds. Single processor architectures though seemed to have subsecond resolution. >The > requirement is to extract or estimate the instantaneous value of the number > of packets or bytes in the router buffer from another machine in the network, > and not the maximum possible router buffer size. > Any suggestion, advice or pointer to literature on this? did not publish, but the perl-scripts are intact. cheers Olav From jsommers at cs.wisc.edu Thu Dec 14 06:36:02 2006 From: jsommers at cs.wisc.edu (Joel Sommers) Date: Thu, 14 Dec 2006 08:36:02 -0600 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061214134519.525A387C6A@tyholt.uninett.no> References: <20061214134519.525A387C6A@tyholt.uninett.no> Message-ID: <4C50B963-502E-4379-8790-20F261717B18@cs.wisc.edu> > anil at cmmacs.ernet.in said: >> We are searching for any known techniques to continuously sample >> (say at >> every 100 msec interval) the buffer occupancy of router interfaces. > > We did make a tool that could poll that fast and do accurate > timestamps > a few years ago and discovered that the routers > did not update theirs MIB's that often. Some platforms would not copy > statistics from the interface cards more often than a few (5) > seconds. Single processor architectures though seemed to have > subsecond > resolution. For example, with a Cisco GSR the update interval (line card to router processor) is about 10 seconds (at least for line cards I've measured). For the measurements we did for "Sizing Router Buffers" (Appenzeller, et al., SIGCOMM 2004), we read the queue length values directly from the line card (i.e., we opened an IOS session on the line card itself and periodically polled the counter). 
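Since, as noted above, several platforms only refresh these counters every few seconds, Mark Crovella's earlier remark - that queue length in bytes relates to queueing delay and link bandwidth - is often the more practical route from outside the box: subtract the minimum observed delay through the hop from each delay sample and multiply the excess by the link rate. A minimal sketch, assuming you already have a stream of delay samples and know the bottleneck rate (both are placeholder inputs here):

# Sketch of inferring queue backlog from delay samples, per the
# "queue length ~ queueing delay x link bandwidth" observation above.
# delay_samples_s and LINK_RATE_BPS are assumed inputs, not measured here.

LINK_RATE_BPS = 155_000_000      # assumed bottleneck rate, e.g. an OC-3

def backlog_from_delays(delay_samples_s, link_rate_bps=LINK_RATE_BPS):
    """delay_samples_s: list of (timestamp, delay_in_seconds) through the hop."""
    base = min(d for _, d in delay_samples_s)    # propagation + service floor
    estimates = []
    for ts, d in delay_samples_s:
        queueing = max(0.0, d - base)            # the variable part is queueing delay
        backlog_bytes = queueing * link_rate_bps / 8.0
        estimates.append((ts, backlog_bytes))
    return estimates

if __name__ == "__main__":
    fake = [(0.0, 0.0102), (0.1, 0.0131), (0.2, 0.0165), (0.3, 0.0109)]
    for ts, b in backlog_from_delays(fake):
        print("t=%.1f backlog ~ %.0f bytes" % (ts, b))

The weak points are the usual ones: the minimum delay is only an estimate of the empty-queue baseline, and ICMP-based delay probes are de-prioritised on many routers, a point raised again at the end of this thread.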
Joel -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2380 bytes Desc: not available Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061214/24c22e4a/smime.bin From fred at cisco.com Wed Dec 13 12:16:41 2006 From: fred at cisco.com (Fred Baker) Date: Wed, 13 Dec 2006 12:16:41 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213185441.34AFF64@aland.bbn.com> References: <20061213185441.34AFF64@aland.bbn.com> Message-ID: <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> You're talking about ifOutQLen. It was originally proposed in RFC 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 1994). The reason it was deprecated is not documented, but the fundamental issue is that it is non-trivial to calculate and is very ephemeral. The big issue in calculating it is that it is rarely exactly one queue. Consider a simple case on simple hardware available in 1994. +----------+ | | | | | CPU +-+ | | | +----------+ | BUS | +----------+ | +---------+ | | +-+ LANCE | | | | +---------+ | DRAM +-+ | | | +---------+ | | +-+ LANCE | +----------+ | +---------+ I'm using the term "bus" in the most general possible sense - some way for the various devices to get to the common memory. This gets implemented many ways. The AMD 7990 LANCE chip was and is a common Ethernet implementation. It has in front of it a ring in which one can describe up to 2^N messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea at any given time how many messages are waiting - it only knows whether it is working on one right now or is idle, and when switching from message to message it knows whether the next slot it considers contains a message. So it can't keep such a counter. The device driver similarly has a limited view; it might know how many it has put in and how many it has taken out again, but it doesn't know whether the LANCE has perhaps completed some of the messages it hasn't taken out yet. So in the sense of the definition ("The length of the output packet queue (in packets)."), it doesn't know how many are still waiting. In addition, it is common for such queues or rings to be configured pretty small, with excess going into a diffserv- described set of software queues. There are far more general problems. Cisco has a fast forwarding technology that we use on some of our midrange products that calculates when messages should be sent and schedules them in a common calendar queue. Every mumble time units, the traffic that should be sent during THIS time interval are picked up and dispersed to the various interfaces they need to go out. Hence, there isn't a single "output queue", but rather a commingled output schedule that shifts traffic to other output queues at various times - which in turn do something akin to what I described above. Also, in modern equipment one often has forwarders and drivers on NIC cards rather than having some central processor do that. For management purposes, the drivers maintain their counts locally and periodically (perhaps once a second) upload the contents of those counters to a place where management can see them. So when you ask "what is the current queue depth", I have to ask what the hardware has, what of that has already been spent but isn't cleaned up yet, what is in how many software queues, how they are organized, and whether that number has been put somewhere that management can see it. 
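Fred's LANCE example can be made concrete with a toy model of the driver's bookkeeping: the driver knows how many descriptors it has handed to the chip and how many it has reclaimed, but not how many of the outstanding ones the hardware has already transmitted, so any "queue length" it reports is at best an upper bound sampled at cleanup time. The class below is an illustration only - invented names, not driver code for any real NIC.

# Toy model of the descriptor-ring bookkeeping described above: the driver
# tracks what it produced and what it reclaimed; the NIC's real progress is
# invisible between cleanups, so the reported "queue length" is only a bound.

RING_SIZE = 128              # 2^N descriptors, as in the LANCE description

class ToyTxRing:
    def __init__(self, size=RING_SIZE):
        self.size = size
        self.produced = 0    # descriptors handed to the NIC
        self.reclaimed = 0   # descriptors the driver has cleaned up

    def enqueue(self):
        if self.produced - self.reclaimed >= self.size:
            return False     # ring full: excess spills into software queues
        self.produced += 1
        return True

    def reclaim(self, completed_by_nic):
        # called from the transmit-done path; until then the driver has no
        # idea how many of the outstanding descriptors are actually sent
        self.reclaimed += completed_by_nic

    def apparent_backlog(self):
        # what a MIB-style gauge could report: an upper bound, not the truth
        return self.produced - self.reclaimed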
Oh - did I mention encrypt/decrypt units, compressors, and other inline services that might have their own queues associated with them? Yes, there is a definition on the books. I don't know that it answers the question. On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > Queue sizes are standard SNMP variables and thus could be sampled at > these intervals. But it looks as if you want the queues on a per host > basis? > > Craig > > In message 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > nil Kumar writes: > >> >> We are searching for any known techniques to continuously sample >> (say at >> every 100 msec interval) the buffer occupancy of router >> interfaces. The >> requirement is to extract or estimate the instantaneous value of the >> number of packets or bytes in the router buffer from another >> machine in >> the network, and not the maximum possible router buffer size. >> >> Any suggestion, advice or pointer to literature on this? >> >> Thanks in advance. >> >> Anil From algold at rnp.br Thu Dec 14 12:05:01 2006 From: algold at rnp.br (Alexandre Grojsgold) Date: Thu, 14 Dec 2006 18:05:01 -0200 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> References: <20061213185441.34AFF64@aland.bbn.com> <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> Message-ID: <007c01c71fbb$245b3110$6d119330$@br> > -----Original Message----- > From: end2end-interest-bounces at postel.org [mailto:end2end-interest- > bounces at postel.org] On Behalf Of Fred Baker > Sent: quarta-feira, 13 de dezembro de 2006 18:17 > To: Craig Partridge > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > You're talking about ifOutQLen. It was originally proposed in RFC > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > 1994). The reason it was deprecated is not documented, but the > fundamental issue is that it is non-trivial to calculate and is very > ephemeral. > Sorry, but... I really did not understand. Of course, it?s ephemeral. The link occupancy is also an ephemeral information. During the sending of a packet, it?s 100% busy. Between two packets, it?s utterly idle. It doesn?t mean it's not possible to get some statistics out of the link usage. Like mean byte rate, 5 minute mean byte rate, variance, and so on. The same way it should be possible to get queue statistics out of each outgoing interface in the router. I am really impressed to know it is so difficult to grab this kind of information, since router manufacturers claim they can do magic with queue managing, like diffser, traffic shaping, priority queueing, etc... all of this looking at the queues and making tricks with them. It?s amazing. -- Alexandre. From fred at cisco.com Thu Dec 14 13:26:27 2006 From: fred at cisco.com (Fred Baker) Date: Thu, 14 Dec 2006 13:26:27 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <007c01c71fbb$245b3110$6d119330$@br> References: <20061213185441.34AFF64@aland.bbn.com> <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> <007c01c71fbb$245b3110$6d119330$@br> Message-ID: On Dec 14, 2006, at 12:05 PM, Alexandre Grojsgold wrote: > Of course, it?s ephemeral. I point that out because it is a fundamental criterion that the SNMP community has been using for a while. 
It is one thing to enable an NMS to read the configuration of a device (largely static) or read a counter (monotonically increasing, so that subsequent reads tell you what happened between the reads). ifOutQLen is a gauge, which is to say that it looks a lot like a random number in this context. In such a case, the SNMP community will generally suggest that the number is not all that meaningful. > I am really impressed to know it is so difficult to grab this kind > of information, since router manufacturers claim they can do magic > with queue managing, like diffser, traffic shaping, priority > queueing, etc... all of this looking at the queues and making > tricks with them. Do I detect a note of sarcasm? The point is what is known by whom at a particular time. A bit of code looking at a choice of queuing something locally or handing it to the next widget makes a pretty simple determination - when it tries to hand the datagram off the next widget accepts it or not, and if not, it does the local thing. "accepts" can have various meanings - it may actively reject it, or (more probably) has given permission to send some quantum and the quantum is used up. Looking at individual queues, one can do a lot of things such as you mention. The hard part is in a distributed system (a system that has functionality on a variety of cards managed by a variety of communicating processes) to have a single overall view of the entire state of the process at exactly the time one wants to find the answer to the overall question. From denio at gprt.ufpe.br Fri Dec 15 05:42:40 2006 From: denio at gprt.ufpe.br (Denio Mariz) Date: Fri, 15 Dec 2006 11:42:40 -0200 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <007c01c71fbb$245b3110$6d119330$@br> References: <20061213185441.34AFF64@aland.bbn.com> <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> <007c01c71fbb$245b3110$6d119330$@br> Message-ID: > > I am really impressed to know it is so difficult to grab this kind of > information, since router manufacturers claim they can do magic with queue > managing, like diffser, traffic shaping, priority queueing, etc... all of > this looking at the queues and making tricks with them. > I'm supposing that all these magic are done through accessing the information directly, internally in the router. So, I think the real issue here is how frequently the counters are updated into the MIB for external access. Denio. From limkt_2 at hotmail.com Fri Dec 15 16:12:33 2006 From: limkt_2 at hotmail.com (Lim Kong Teong) Date: Sat, 16 Dec 2006 00:12:33 +0000 Subject: [e2e] FLID_DL Simulation In-Reply-To: Message-ID: Hi, I conduct experiments on Flid-Dl, which I got from Digital Fountain. We set the experiment as below: 1 Flid-DL session compete with 4 TCP sessions for 2.5Mb bottleneck link. We use dumb bell topology. The experiments run smoothly, however I got some peculiar results. 1) When I check trace file, I find out Flid-Dl receiver receive empty packet with zero packet size as below: r 3.739743 1 3 fliddl 0 ------- 0 2.0 -2147483608.8888 -1 6094 I make one modification as below, but still get the same result. Change: Packet* p = allocpkt(); to Packet* p = allocpkt(packet_payload_); 2) I calculate the packet received by the 4 TCP receivers using the trace file, and suprisingly the throughput suggest that TCP flows consumed all of the bottleneck bandwidth with total throughput 2.5 Mb. Then, I calculate the bottleneck link utilization, and got the link utilization of 2.5 Mb. 
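For reference, the kind of per-flow tally described above can be done with a short script over the trace file. The sketch below assumes the standard 12-column ns-2 trace format of the line quoted earlier (event, time, from, to, type, size, flags, fid, src, dst, seq, id); the file name, receive node and time window are placeholders. Note that packets recorded with size 0, like the fliddl line quoted above, add nothing to such a byte tally, which is worth keeping in mind when comparing against the sender's own statistics.

# Sketch: per-flow received throughput from an ns-2 trace, assuming the
# standard 12-column format of the line quoted above.
# TRACE_FILE, RECV_NODE and the measurement window are placeholders.

from collections import defaultdict

TRACE_FILE = "out.tr"        # hypothetical trace file name
RECV_NODE = "3"              # node id on the receiver side of the bottleneck
T_START, T_END = 1.0, 100.0  # measurement window in seconds

def per_flow_throughput(path=TRACE_FILE):
    bytes_per_fid = defaultdict(int)
    with open(path) as trace:
        for line in trace:
            f = line.split()
            if len(f) < 12 or f[0] != "r":
                continue
            t, to_node, size, fid = float(f[1]), f[3], int(f[5]), f[7]
            if to_node == RECV_NODE and T_START <= t <= T_END:
                bytes_per_fid[fid] += size
    span = T_END - T_START
    return {fid: 8.0 * b / span for fid, b in bytes_per_fid.items()}  # bits/s

if __name__ == "__main__":
    for fid, bps in sorted(per_flow_throughput().items()):
        print("flow %s: %.2f Mb/s" % (fid, bps / 1e6))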
However, the statistics from Flid suggest Flid session used approximately 0.5 Mb of bottleneck link. Any explanation or suggestion please! TQ. Lim _________________________________________________________________ Talk now to your Hotmail contacts with Windows Live Messenger. http://clk.atdmt.com/MSN/go/msnnkwme0020000001msn/direct/01/?href=http://get.live.com/messenger/overview From L.Wood at surrey.ac.uk Sat Dec 16 01:39:10 2006 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Sat, 16 Dec 2006 09:39:10 +0000 Subject: [e2e] FLID_DL Simulation In-Reply-To: References: Message-ID: <200612160939.JAA03848@cisco.com> ask the ns-users list. At Saturday 16/12/2006 00:12 +0000, Lim Kong Teong wrote: >Hi, > >I conduct experiments on Flid-Dl, which I got from Digital >Fountain. We set the experiment as below: > >1 Flid-DL session compete with 4 TCP sessions for 2.5Mb >bottleneck link. We use dumb bell topology. > >The experiments run smoothly, however I got some >peculiar results. > >1) When I check trace file, I find out Flid-Dl receiver >receive empty packet with zero packet size as below: > >r 3.739743 1 3 fliddl 0 ------- 0 2.0 -2147483608.8888 -1 6094 > >I make one modification as below, but still get the same result. >Change: Packet* p = allocpkt(); to Packet* p = allocpkt(packet_payload_); > >2) I calculate the packet received by the 4 TCP receivers using the >trace file, and suprisingly the throughput suggest that TCP flows >consumed all of the bottleneck bandwidth with total throughput >2.5 Mb. Then, I calculate the bottleneck link utilization, and got >the link utilization of 2.5 Mb. >However, the statistics from Flid suggest Flid session used approximately >0.5 Mb of bottleneck link. > >Any explanation or suggestion please! > >TQ. > >Lim > >_________________________________________________________________ >Talk now to your Hotmail contacts with Windows Live Messenger. http://clk.atdmt.com/MSN/go/msnnkwme0020000001msn/direct/01/?href=http://get.live.com/messenger/overview > From cjs at cs.ucc.ie Wed Dec 20 01:15:30 2006 From: cjs at cs.ucc.ie (Cormac J. Sreenan) Date: Wed, 20 Dec 2006 09:15:30 +0000 Subject: [e2e] CFP: Workshop on Embedded Networked Sensors (EmNets'07) Message-ID: <4588FF32.20406@cs.ucc.ie> Our apologies if you receive multiple copies of the EmNets CFP. ------------------------------------------------------------------------ *********************************************************** CALL FOR PAPERS Fourth Workshop on Embedded Networked Sensors (EmNets 2007) Cork, Ireland 25-26 June 2007 www.cs.ucc.ie/emnets2007 *********************************************************** The Fourth Workshop on Embedded Networked Sensors (EmNets 2007) brings together wireless sensor network researchers from academic and industrial backgrounds to present groundbreaking results that will shed light on present and future research challenges. The workshop emphasises results from experiments or deployments that quantify the challenges in the wireless sensor systems of today as well as early results from new ideas that introduce promising approaches that will define the challenges in the wireless sensor systems of tomorrow. We especially welcome papers reporting on results that refute common assumptions, deployment experiences, novel and original approaches, and, more generally, papers that will help inform and guide research. 
The EmNets Program Committee discourages submissions that are short versions of papers that will be submitted to other conferences in the near future, since its goal is to engage the research community in a discussion of future challenges and issues. Topics of interest include, but are not limited to: Validation/refutation of prior results Application experiences: measurements, successes and failures Future applications: requirements and challenges Hardware platforms, tradeoffs, and trends Data and network storage Delay-tolerant networking Management, debugging, and troubleshooting Network and software reliability Network and system architectures Software bug detection and tools Energy sources, scavenging, and low-power operation Human-Computer interfaces for sensornets Benchmarks and evaluation suites All papers will be subject to peer review. Accepted papers will appear in a formal published proceedings. IMPORTANT DATES: Submission Deadline: March 9, 2007 (5 pages) Notification: April 30, 2007 Camera Ready Due: May 21, 2007 Workshop: June 25-6, 2007 ORGANIZATION: General Chair: Cormac J. Sreenan, University College Cork cjs at cs.ucc.ie Program Co-Chairs: Philip Levis, Stanford University pal at cs.stanford.edu Joe Paradiso, MIT joep at media.mit.edu Technical Program Committee: Jan Beutel, ETH Zurich Kieren Delaney, Cork Institute of Technology Terry Dishongh, Intel Corporation Henri Dubois-Ferriere, EPFL Deborah Estrin, UCLA David Gay, Intel Research Berkeley Michel Goraczo, Microsoft Research Margaret Martonosi, Princeton University Mike Masquelier, Motorola G.Q. Maguire Jr., KTH Sweden Paddy Nixon, University College Dublin Robert Poor, Adozu, Inc. Frank Schmidt, EnOcean John Regehr, University of Utah Frank Schmidt, EnOcean Randy Smith, Sun Microsystems Jack Stankovic, University of Virginia Robert Szewczyk, Moteiv Inc. Henry Tirri, Nokia Peter van der Stok, Philips, Eindhoven University of Technology Guang-Zhong Yang, Imperial College London Kazuo Yano, Hitachi ------------------------------------------------------------------ From lynne at telemuse.net Wed Dec 20 13:33:50 2006 From: lynne at telemuse.net (Lynne Jolitz) Date: Wed, 20 Dec 2006 13:33:50 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> Message-ID: <002901c7247e$89e4c920$6e8944c6@telemuse.net> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. 
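Lynne's point that the outbound counters are always visible even when buffer depth is not can be put to work directly: two samples of a monotonically increasing counter such as MIB-II ifOutOctets give the interface's output rate over the interval, with no queue visibility required. A minimal sketch; get_out_octets() is a placeholder for whatever poller is already in use, and the 32-bit counter wrap is the only subtlety handled:

# Sketch of the "rates are easy even when depth is not" observation above:
# difference two samples of a monotonic byte counter to get the output rate.
# get_out_octets() is a placeholder hook, not a real SNMP binding.

import time

def get_out_octets():
    """Placeholder: return the current ifOutOctets value for the interface."""
    raise NotImplementedError("hook up SNMP, NETCONF or driver stats here")

def output_rate_bps(interval_s=1.0, counter_bits=32):
    c0, t0 = get_out_octets(), time.time()
    time.sleep(interval_s)
    c1, t1 = get_out_octets(), time.time()
    delta = (c1 - c0) % (2 ** counter_bits)   # tolerate one counter wrap
    return 8.0 * delta / (t1 - t0)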
It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). Lynne Jolitz ---- We use SpamQuiz. If your ISP didn't make the grade try http://lynne.telemuse.net > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker > Sent: Wednesday, December 13, 2006 12:17 PM > To: Craig Partridge > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > > You're talking about ifOutQLen. It was originally proposed in RFC > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > 1994). The reason it was deprecated is not documented, but the > fundamental issue is that it is non-trivial to calculate and is very > ephemeral. > > The big issue in calculating it is that it is rarely exactly one > queue. Consider a simple case on simple hardware available in 1994. > > +----------+ | > | | | > | CPU +-+ > | | | > +----------+ | BUS > | > +----------+ | +---------+ > | | +-+ LANCE | > | | | +---------+ > | DRAM +-+ > | | | +---------+ > | | +-+ LANCE | > +----------+ | +---------+ > > I'm using the term "bus" in the most general possible sense - some > way for the various devices to get to the common memory. This gets > implemented many ways. > > The AMD 7990 LANCE chip was and is a common Ethernet implementation. > It has in front of it a ring in which one can describe up to 2^N > messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea > at any given time how many messages are waiting - it only knows > whether it is working on one right now or is idle, and when switching > from message to message it knows whether the next slot it considers > contains a message. So it can't keep such a counter. The device > driver similarly has a limited view; it might know how many it has > put in and how many it has taken out again, but it doesn't know > whether the LANCE has perhaps completed some of the messages it > hasn't taken out yet. So in the sense of the definition ("The length > of the output packet queue (in packets)."), it doesn't know how many > are still waiting. In addition, it is common for such queues or rings > to be configured pretty small, with excess going into a diffserv- > described set of software queues. > > There are far more general problems. Cisco has a fast forwarding > technology that we use on some of our midrange products that > calculates when messages should be sent and schedules them in a > common calendar queue. Every mumble time units, the traffic that > should be sent during THIS time interval are picked up and dispersed > to the various interfaces they need to go out. Hence, there isn't a > single "output queue", but rather a commingled output schedule that > shifts traffic to other output queues at various times - which in > turn do something akin to what I described above. > > Also, in modern equipment one often has forwarders and drivers on NIC > cards rather than having some central processor do that. For > management purposes, the drivers maintain their counts locally and > periodically (perhaps once a second) upload the contents of those > counters to a place where management can see them. 
> > So when you ask "what is the current queue depth", I have to ask what > the hardware has, what of that has already been spent but isn't > cleaned up yet, what is in how many software queues, how they are > organized, and whether that number has been put somewhere that > management can see it. > > Oh - did I mention encrypt/decrypt units, compressors, and other > inline services that might have their own queues associated with them? > > Yes, there is a definition on the books. I don't know that it answers > the question. > > On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > > > > Queue sizes are standard SNMP variables and thus could be sampled at > > these intervals. But it looks as if you want the queues on a per host > > basis? > > > > Craig > > > > In message > 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > > nil Kumar writes: > > > >> > >> We are searching for any known techniques to continuously sample > >> (say at > >> every 100 msec interval) the buffer occupancy of router > >> interfaces. The > >> requirement is to extract or estimate the instantaneous value of the > >> number of packets or bytes in the router buffer from another > >> machine in > >> the network, and not the maximum possible router buffer size. > >> > >> Any suggestion, advice or pointer to literature on this? > >> > >> Thanks in advance. > >> > >> Anil > From mathis at psc.edu Fri Dec 22 11:09:43 2006 From: mathis at psc.edu (Matt Mathis) Date: Fri, 22 Dec 2006 14:09:43 -0500 (EST) Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <002901c7247e$89e4c920$6e8944c6@telemuse.net> References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> Message-ID: Another approach is to get accurate time stamps of ingress/egress packets and use the difference in the time stamps to compute effective queue depths. The NLANR PMA team was building a "router clamp", an "octopus" designed to get traces from all interfaces of a busy Internet2 core router. I have since lost track of the details. Google "router clamp pma" for clues. I basically don't believe queue depths measured by any other means, because there are so many cascaded queues in a typical modern router. I point out that most NIC's have short queues right at the wire, along with every DMA engine and bus arbitrator, etc. Claiming that an internal software instrument accurately represents the true aggregate queue depth for the router is equivalent to asserting that none of the other potential bottlenecks in the router have any queued packets. If they never have queued packets, why did the HW people bother with the silicon? I conclude there is always potential for packets to be queued out of scope of the software instruments. It's a long story, but I have first hand experience with one of these cases: my external measurement of maximum queues size was only half of the design size, because the "wrong" bottleneck dominated. Good luck, --MM-- ------------------------------------------- Matt Mathis http://www.psc.edu/~mathis Work:412.268.3319 Home/Cell:412.654.7529 ------------------------------------------- Evil is defined by mortals who think they know "The Truth" and use force to apply it to others. On Wed, 20 Dec 2006, Lynne Jolitz wrote: > Fred has very accurately and enjoyably answered the hardware question. 
But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. > > Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. > > The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). > > Lynne Jolitz > > ---- > We use SpamQuiz. > If your ISP didn't make the grade try http://lynne.telemuse.net > > > -----Original Message----- > > From: end2end-interest-bounces at postel.org > > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker > > Sent: Wednesday, December 13, 2006 12:17 PM > > To: Craig Partridge > > Cc: end2end-interest at postel.org > > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > > > > > You're talking about ifOutQLen. It was originally proposed in RFC > > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > > 1994). The reason it was deprecated is not documented, but the > > fundamental issue is that it is non-trivial to calculate and is very > > ephemeral. > > > > The big issue in calculating it is that it is rarely exactly one > > queue. Consider a simple case on simple hardware available in 1994. > > > > +----------+ | > > | | | > > | CPU +-+ > > | | | > > +----------+ | BUS > > | > > +----------+ | +---------+ > > | | +-+ LANCE | > > | | | +---------+ > > | DRAM +-+ > > | | | +---------+ > > | | +-+ LANCE | > > +----------+ | +---------+ > > > > I'm using the term "bus" in the most general possible sense - some > > way for the various devices to get to the common memory. This gets > > implemented many ways. > > > > The AMD 7990 LANCE chip was and is a common Ethernet implementation. > > It has in front of it a ring in which one can describe up to 2^N > > messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea > > at any given time how many messages are waiting - it only knows > > whether it is working on one right now or is idle, and when switching > > from message to message it knows whether the next slot it considers > > contains a message. So it can't keep such a counter. The device > > driver similarly has a limited view; it might know how many it has > > put in and how many it has taken out again, but it doesn't know > > whether the LANCE has perhaps completed some of the messages it > > hasn't taken out yet. So in the sense of the definition ("The length > > of the output packet queue (in packets)."), it doesn't know how many > > are still waiting. 
In addition, it is common for such queues or rings > > to be configured pretty small, with excess going into a diffserv- > > described set of software queues. > > > > There are far more general problems. Cisco has a fast forwarding > > technology that we use on some of our midrange products that > > calculates when messages should be sent and schedules them in a > > common calendar queue. Every mumble time units, the traffic that > > should be sent during THIS time interval are picked up and dispersed > > to the various interfaces they need to go out. Hence, there isn't a > > single "output queue", but rather a commingled output schedule that > > shifts traffic to other output queues at various times - which in > > turn do something akin to what I described above. > > > > Also, in modern equipment one often has forwarders and drivers on NIC > > cards rather than having some central processor do that. For > > management purposes, the drivers maintain their counts locally and > > periodically (perhaps once a second) upload the contents of those > > counters to a place where management can see them. > > > > So when you ask "what is the current queue depth", I have to ask what > > the hardware has, what of that has already been spent but isn't > > cleaned up yet, what is in how many software queues, how they are > > organized, and whether that number has been put somewhere that > > management can see it. > > > > Oh - did I mention encrypt/decrypt units, compressors, and other > > inline services that might have their own queues associated with them? > > > > Yes, there is a definition on the books. I don't know that it answers > > the question. > > > > On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > > > > > > > Queue sizes are standard SNMP variables and thus could be sampled at > > > these intervals. But it looks as if you want the queues on a per host > > > basis? > > > > > > Craig > > > > > > In message > > 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > > > nil Kumar writes: > > > > > >> > > >> We are searching for any known techniques to continuously sample > > >> (say at > > >> every 100 msec interval) the buffer occupancy of router > > >> interfaces. The > > >> requirement is to extract or estimate the instantaneous value of the > > >> number of packets or bytes in the router buffer from another > > >> machine in > > >> the network, and not the maximum possible router buffer size. > > >> > > >> Any suggestion, advice or pointer to literature on this? > > >> > > >> Thanks in advance. > > >> > > >> Anil > > > From mathis at psc.edu Sat Dec 23 08:01:42 2006 From: mathis at psc.edu (Matt Mathis) Date: Sat, 23 Dec 2006 11:01:42 -0500 (EST) Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <1166832054.9009.171.camel@officepc-junliu> References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> <1166832054.9009.171.camel@officepc-junliu> Message-ID: ICMP has been dead as a measurement protocol for about 10 years now. The problem is that nearly all implementations process ICMP at substantially lower priority than other protocols, so the measurements are far worse than reality. I think you are looking for something more along the lines of IPMP, the IP measurement protocol. Look for the expired Internet drafts: draft-bennett-ippm-ipmp-01 2003-03-05 Expired draft-mcgregor-ipmp-04 2004-02-04 Expired There is also a report by several people including Fred Baker and me, analyzing these two conflicting drafts, and proposing yet another variant. 
I couldn't find the report quickly. Perhaps Fred has a copy.....? If you want to follow this thread, be sure to engage the router vendors/large ISP's early and listen to them carefully, because the academic and industrial agendas clash very badly. (You should read the report first.) Thanks, --MM-- ------------------------------------------- Matt Mathis http://www.psc.edu/~mathis Work:412.268.3319 Home/Cell:412.654.7529 ------------------------------------------- Evil is defined by mortals who think they know "The Truth" and use force to apply it to others. On Fri, 22 Dec 2006, Jun Liu wrote: > I am amazed by this thread of discussion. The key issue of correctly > estimating the queuing delay at a particular router is to make the > queuing delay of interest distinct from the delays caused by other > factors. I agree with Matt Mathis' opinion that the difference of a pair > of timestamps experienced by an IP packet at a router > closely characterizes the queuing delay of this packet at this router. > However, it is inconvenient for an end system to obtain the values of > the difference of time-stamp pairs. The NLANR PMA Router Clamp has only > been installed surrounding one core router and relies > on special measurement circuits. The data measured by Clamp is suitable > for statistics analysis rather than providing dynamic indications to end > hosts. > > I have been working on estimating the maximum queuing delay at the > outbound queue of the slowest link along an end-to-end path. Here, a > slowest link refers to a link with the longest maximum queuing delay > along the path. The queuing delay at the slowest link can be estimated > from measured RTTs along the path. If the histogram of a set of measured > RTTs has a single mode, then the maximum queuing delay at the slowest > link can be approximated by the delay value at the mode less the value > of the minimum RTT. The estimation of the maximum queuing delay at the > slowest link is largely affected by the non-ignorable queuing delays at > other routers. For example, a histogram of measured RTTs can have > multiple modes when there are two or more identical slowest links in a > path. Hence, appropriate technique of filtering noises is necessary. > However, multimodal based estimation issues remain unsolved. > > I am thinking of modifying the ICMP protocol to serve for carrying > dynamic delay information at routers to end hosts. The reason of > considering ICMP is due to two concerns. First, ICMP should have been > implemented at all routers and end hosts. "ICMP, uses the basic support > of IP as if it were a higher level protocol, however, ICMP is actually > an integral part of IP, and must be implemented by every IP > module." [RFC 792] Second, a lot of active probing based network > measurement methods were developed based on ICMP. > > Currently, an ICMP error reporting message is sent by a router upon > processing an erroneous IP packet and is routed back to the sender of > this IP packet. When this happens, the IP packet is dropped at the > router. Let's call an erroneous IP packet an echo, and the corresponding > ICMP packet an echo reply. The proposed modification is to make a pair > of echo and echo reply packets co-exist in the network. Namely, an echo > packet is kept routed to its destination after it has triggered an echo > reply which will be sent back to the sender of this echo. 
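(A toy sketch of the single-mode estimate described a few paragraphs above: histogram the measured RTTs, take the RTT value at the mode and subtract the minimum RTT. The samples and the 1 ms bin width below are invented for the example.)

from collections import Counter

def slowest_link_queue_estimate(rtts_ms, bin_ms=1.0):
    # histogram the RTTs, take the value at the mode, subtract the minimum RTT
    bins = Counter(round(r / bin_ms) for r in rtts_ms)
    mode_bin, _ = bins.most_common(1)[0]
    return mode_bin * bin_ms - min(rtts_ms)

rtts = [20.1, 20.3, 24.9, 25.0, 25.1, 25.2, 25.0, 24.8, 30.7, 25.1]
print(slowest_link_queue_estimate(rtts))   # ~4.9 ms with these made-up samples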
When we assume > that another echo reply will be sent by the destination of this echo > packet, the sender will obtain two echo reply packets on one echo. The > RTTs of the two echo reply packets share delays on the common links they > both traversed. > > Consider a simple network shown below. We denote by d(x,y) the delay > from network node x to y. d(x,y) consists of the link latency, > transmission delay on link (x,y), and the delay in node x (which is a > sum of queuing and processing delays within x). We are about to estimate > the queuing delay at router B (either dynamic delays or the maximum > delay). We consider a worst case scenario by assuming that d(A,B) and > d(B,D) always have similar dynamic values. This scenario happens when > the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of > the two router have the same size, and the same traffic pattern is on > routers A and B. > > d(S,A) d(A,B) d(B,D) > Sender -------------------> R_A ----------------> R_B > ------------------> Destination > <------------------- <---------------- > <------------------ > d(A,S) d(B,A) d(D,B) > > If the sender can make both router B and the destination send an echo > reply on every echo packet it sends, then the difference of the RTTs > between the two echo reply packets offers us a value of (d(B,D)+d(D,B)). > This value much closely characterizes the queuing delay at router B than > using pure RTTs. This method makes queuing delay information timely > delivered to an end node---the sender of the echo packets. > > The method described here is somewhat similar to the idea adopted in van > Jacobson's work of pathchar which incrementally measures the link > bandwidth hop-by-hop from the link next to the source to the link next > to the destination. However, there are two differences. First, in > pathchar, only one echo reply can be triggered by an echo, and a pair of > echo and echo reply can not co-exist in the network. Second, in > pathchar, the RTTs of echo reply packets taking different path lengths > do not necessarily share common delay portions. > > Two obvious side effects of this modified ICMP protocol are the overhead > and the security issues. Higher overhead is made because of the > co-existence of echo and echo reply packets in the network. One echo > packet can potentially trigger as many echo reply packets as the number > of intermediate routers between a pair of sender and destination. Thus, > the security issue deserves consideration. > > My question here is that whether such modification on ICMP is > acceptable, or it simply introduces a new evil. > > Jun Liu > > On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote: > > Another approach is to get accurate time stamps of ingress/egress packets and > > use the difference in the time stamps to compute effective queue depths. The > > NLANR PMA team was building a "router clamp", an "octopus" designed to get > > traces from all interfaces of a busy Internet2 core router. I have since lost > > track of the details. Google "router clamp pma" for clues. > > > > I basically don't believe queue depths measured by any other means, because > > there are so many cascaded queues in a typical modern router. I point out > > that most NIC's have short queues right at the wire, along with every DMA > > engine and bus arbitrator, etc. 
> > > > Claiming that an internal software instrument accurately represents the true > > aggregate queue depth for the router is equivalent to asserting that none of > > the other potential bottlenecks in the router have any queued packets. If they > > never have queued packets, why did the HW people bother with the silicon? I > > conclude there is always potential for packets to be queued out of scope of > > the software instruments. > > > > It's a long story, but I have first hand experience with one of these cases: > > my external measurement of maximum queues size was only half of the design size, > > because the "wrong" bottleneck dominated. > > > > Good luck, > > --MM-- > > ------------------------------------------- > > Matt Mathis http://www.psc.edu/~mathis > > Work:412.268.3319 Home/Cell:412.654.7529 > > ------------------------------------------- > > Evil is defined by mortals who think they know > > "The Truth" and use force to apply it to others. > > > > On Wed, 20 Dec 2006, Lynne Jolitz wrote: > > > > > Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. > > > > > > Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. > > > > > > The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). > > > > > > Lynne Jolitz > > > > > > ---- > > > We use SpamQuiz. > > > If your ISP didn't make the grade try http://lynne.telemuse.net > > > > > > > -----Original Message----- > > > > From: end2end-interest-bounces at postel.org > > > > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker > > > > Sent: Wednesday, December 13, 2006 12:17 PM > > > > To: Craig Partridge > > > > Cc: end2end-interest at postel.org > > > > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > > > > > > > > > > > You're talking about ifOutQLen. It was originally proposed in RFC > > > > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > > > > 1994). The reason it was deprecated is not documented, but the > > > > fundamental issue is that it is non-trivial to calculate and is very > > > > ephemeral. > > > > > > > > The big issue in calculating it is that it is rarely exactly one > > > > queue. Consider a simple case on simple hardware available in 1994. 
> > > > > > > > +----------+ | > > > > | | | > > > > | CPU +-+ > > > > | | | > > > > +----------+ | BUS > > > > | > > > > +----------+ | +---------+ > > > > | | +-+ LANCE | > > > > | | | +---------+ > > > > | DRAM +-+ > > > > | | | +---------+ > > > > | | +-+ LANCE | > > > > +----------+ | +---------+ > > > > > > > > I'm using the term "bus" in the most general possible sense - some > > > > way for the various devices to get to the common memory. This gets > > > > implemented many ways. > > > > > > > > The AMD 7990 LANCE chip was and is a common Ethernet implementation. > > > > It has in front of it a ring in which one can describe up to 2^N > > > > messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea > > > > at any given time how many messages are waiting - it only knows > > > > whether it is working on one right now or is idle, and when switching > > > > from message to message it knows whether the next slot it considers > > > > contains a message. So it can't keep such a counter. The device > > > > driver similarly has a limited view; it might know how many it has > > > > put in and how many it has taken out again, but it doesn't know > > > > whether the LANCE has perhaps completed some of the messages it > > > > hasn't taken out yet. So in the sense of the definition ("The length > > > > of the output packet queue (in packets)."), it doesn't know how many > > > > are still waiting. In addition, it is common for such queues or rings > > > > to be configured pretty small, with excess going into a diffserv- > > > > described set of software queues. > > > > > > > > There are far more general problems. Cisco has a fast forwarding > > > > technology that we use on some of our midrange products that > > > > calculates when messages should be sent and schedules them in a > > > > common calendar queue. Every mumble time units, the traffic that > > > > should be sent during THIS time interval are picked up and dispersed > > > > to the various interfaces they need to go out. Hence, there isn't a > > > > single "output queue", but rather a commingled output schedule that > > > > shifts traffic to other output queues at various times - which in > > > > turn do something akin to what I described above. > > > > > > > > Also, in modern equipment one often has forwarders and drivers on NIC > > > > cards rather than having some central processor do that. For > > > > management purposes, the drivers maintain their counts locally and > > > > periodically (perhaps once a second) upload the contents of those > > > > counters to a place where management can see them. > > > > > > > > So when you ask "what is the current queue depth", I have to ask what > > > > the hardware has, what of that has already been spent but isn't > > > > cleaned up yet, what is in how many software queues, how they are > > > > organized, and whether that number has been put somewhere that > > > > management can see it. > > > > > > > > Oh - did I mention encrypt/decrypt units, compressors, and other > > > > inline services that might have their own queues associated with them? > > > > > > > > Yes, there is a definition on the books. I don't know that it answers > > > > the question. > > > > > > > > On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > > > > > > > > > > > > > Queue sizes are standard SNMP variables and thus could be sampled at > > > > > these intervals. But it looks as if you want the queues on a per host > > > > > basis? 
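(For completeness, a minimal sketch of the SNMP sampling suggested in the quoted text, assuming the Net-SNMP command-line tools are installed, the community string is "public" and the interface of interest is ifIndex 2, all of which are placeholders; and keeping in mind that ifOutQLen is deprecated and, for the reasons given above, rarely means what one hopes.)

import subprocess, time

ROUTER, COMMUNITY, IFINDEX = "192.0.2.1", "public", 2     # placeholders
OID = "IF-MIB::ifOutQLen.%d" % IFINDEX                    # deprecated object

while True:
    out = subprocess.run(["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", ROUTER, OID],
                         capture_output=True, text=True)
    print(time.time(), out.stdout.strip() or out.stderr.strip())
    time.sleep(0.1)   # roughly a 100 ms sampling interval, ignoring command runtime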
> > > > > > > > > > Craig > > > > > > > > > > In message > > > > 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > > > > > nil Kumar writes: > > > > > > > > > >> > > > > >> We are searching for any known techniques to continuously sample > > > > >> (say at > > > > >> every 100 msec interval) the buffer occupancy of router > > > > >> interfaces. The > > > > >> requirement is to extract or estimate the instantaneous value of the > > > > >> number of packets or bytes in the router buffer from another > > > > >> machine in > > > > >> the network, and not the maximum possible router buffer size. > > > > >> > > > > >> Any suggestion, advice or pointer to literature on this? > > > > >> > > > > >> Thanks in advance. > > > > >> > > > > >> Anil > > > > > > > > From fred at cisco.com Fri Dec 22 14:54:03 2006 From: fred at cisco.com (Fred Baker) Date: Fri, 22 Dec 2006 14:54:03 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> Message-ID: <8A57DB74-D3CE-4FD4-9E34-0FAFC4D1BFAE@cisco.com> On Dec 22, 2006, at 11:09 AM, Matt Mathis wrote: > Another approach is to get accurate time stamps of ingress/egress > packets and > use the difference in the time stamps to compute effective queue > depths. I'm not sure I believe that approach. A TDM drop-and-insert- multiplexor has a constant queue depth and 100% utilization, while a statistical multiplexor has a variable queue depth when it has 100% utilization. The depth of the queue becomes a question of your model of the arrival and departure processes. From dpreed at reed.com Sat Dec 23 13:01:08 2006 From: dpreed at reed.com (David P. Reed) Date: Sat, 23 Dec 2006 16:01:08 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> <1166832054.9009.171.camel@officepc-junliu> Message-ID: <458D9914.3020608@reed.com> I find the first sentence here very odd. ICMP is used every day. It is hardly dead. Perhaps you meant that it doesn't work very well? The real point you are making here is that *any* measurement protocol that can be distinguished from regular traffic by routers is at high risk of generating completely *wrong* answers, for two reasons: 1. Router vendors find it convenient to make their routers privilege real traffic over measurement overhead. 2. There is a constant temptation to "game" any benchmarking tests that vendors tend to accede to. Academics do the same thing when they are proposing great new ideas that they want to sell - so this isn't a statement that says commercial is bad and academic has the moral high ground. (the benchmarking game in the database business (TP1) or the processor business (MIPS or FLOPS according to standard benchmarks) or the 3D graphics business are all unfortunately gamed every day). Purveyors of ideas are tempted to lie or spin performance numbers. That's the high-tech industry version of I.F.Stone's: "governments lie". Why would a router vendor offer to report a reliable number over SNMP? So the general conclusion one should draw from this is that performance measurements should be done without the help of vendors or proposers (call them purveyors), with a great deal of effort put into measuring "real" cases that cannot be detected and distorted by purveyor interpretations that either: a. 
allow the purveyor to claim that the measurement is bogus (ICMP should never have been broken by vendor optimizations, but it was in their interest to do so as noted above) or b. allow the purveyor to generate much better numbers than will ever be seen in practice, either by special casing measurement packets, or putting the definition of the measurement being made in the hands of the purveyor. Matt Mathis wrote: > ICMP has been dead as a measurement protocol for about 10 years now. The > problem is that nearly all implementations process ICMP at substantially lower > priority than other protocols, so the measurements are far worse than reality. > > I think you are looking for something more along the lines of IPMP, the IP > measurement protocol. Look for the expired Internet drafts: > draft-bennett-ippm-ipmp-01 2003-03-05 Expired > draft-mcgregor-ipmp-04 2004-02-04 Expired > > There is also a report by several people including Fred Baker and me, > analyzing these two conflicting drafts, and proposing yet another variant. I > couldn't find the report quickly. Perhaps Fred has a copy.....? > > If you want to follow this thread, be sure to engage the router vendors/large > ISP's early and listen to them carefully, because the academic and industrial > agendas clash very badly. (You should read the report first.) > > Thanks, > --MM-- > ------------------------------------------- > Matt Mathis http://www.psc.edu/~mathis > Work:412.268.3319 Home/Cell:412.654.7529 > ------------------------------------------- > Evil is defined by mortals who think they know > "The Truth" and use force to apply it to others. > > On Fri, 22 Dec 2006, Jun Liu wrote: > > >> I am amazed by this thread of discussion. The key issue of correctly >> estimating the queuing delay at a particular router is to make the >> queuing delay of interest distinct from the delays caused by other >> factors. I agree with Matt Mathis' opinion that the difference of a pair >> of timestamps experienced by an IP packet at a router >> closely characterizes the queuing delay of this packet at this router. >> However, it is inconvenient for an end system to obtain the values of >> the difference of time-stamp pairs. The NLANR PMA Router Clamp has only >> been installed surrounding one core router and relies >> on special measurement circuits. The data measured by Clamp is suitable >> for statistics analysis rather than providing dynamic indications to end >> hosts. >> >> I have been working on estimating the maximum queuing delay at the >> outbound queue of the slowest link along an end-to-end path. Here, a >> slowest link refers to a link with the longest maximum queuing delay >> along the path. The queuing delay at the slowest link can be estimated >> from measured RTTs along the path. If the histogram of a set of measured >> RTTs has a single mode, then the maximum queuing delay at the slowest >> link can be approximated by the delay value at the mode less the value >> of the minimum RTT. The estimation of the maximum queuing delay at the >> slowest link is largely affected by the non-ignorable queuing delays at >> other routers. For example, a histogram of measured RTTs can have >> multiple modes when there are two or more identical slowest links in a >> path. Hence, appropriate technique of filtering noises is necessary. >> However, multimodal based estimation issues remain unsolved. >> >> I am thinking of modifying the ICMP protocol to serve for carrying >> dynamic delay information at routers to end hosts. 
The reason of >> considering ICMP is due to two concerns. First, ICMP should have been >> implemented at all routers and end hosts. "ICMP, uses the basic support >> of IP as if it were a higher level protocol, however, ICMP is actually >> an integral part of IP, and must be implemented by every IP >> module." [RFC 792] Second, a lot of active probing based network >> measurement methods were developed based on ICMP. >> >> Currently, an ICMP error reporting message is sent by a router upon >> processing an erroneous IP packet and is routed back to the sender of >> this IP packet. When this happens, the IP packet is dropped at the >> router. Let's call an erroneous IP packet an echo, and the corresponding >> ICMP packet an echo reply. The proposed modification is to make a pair >> of echo and echo reply packets co-exist in the network. Namely, an echo >> packet is kept routed to its destination after it has triggered an echo >> reply which will be sent back to the sender of this echo. When we assume >> that another echo reply will be sent by the destination of this echo >> packet, the sender will obtain two echo reply packets on one echo. The >> RTTs of the two echo reply packets share delays on the common links they >> both traversed. >> >> Consider a simple network shown below. We denote by d(x,y) the delay >> from network node x to y. d(x,y) consists of the link latency, >> transmission delay on link (x,y), and the delay in node x (which is a >> sum of queuing and processing delays within x). We are about to estimate >> the queuing delay at router B (either dynamic delays or the maximum >> delay). We consider a worst case scenario by assuming that d(A,B) and >> d(B,D) always have similar dynamic values. This scenario happens when >> the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of >> the two router have the same size, and the same traffic pattern is on >> routers A and B. >> >> d(S,A) d(A,B) d(B,D) >> Sender -------------------> R_A ----------------> R_B >> ------------------> Destination >> <------------------- <---------------- >> <------------------ >> d(A,S) d(B,A) d(D,B) >> >> If the sender can make both router B and the destination send an echo >> reply on every echo packet it sends, then the difference of the RTTs >> between the two echo reply packets offers us a value of (d(B,D)+d(D,B)). >> This value much closely characterizes the queuing delay at router B than >> using pure RTTs. This method makes queuing delay information timely >> delivered to an end node---the sender of the echo packets. >> >> The method described here is somewhat similar to the idea adopted in van >> Jacobson's work of pathchar which incrementally measures the link >> bandwidth hop-by-hop from the link next to the source to the link next >> to the destination. However, there are two differences. First, in >> pathchar, only one echo reply can be triggered by an echo, and a pair of >> echo and echo reply can not co-exist in the network. Second, in >> pathchar, the RTTs of echo reply packets taking different path lengths >> do not necessarily share common delay portions. >> >> Two obvious side effects of this modified ICMP protocol are the overhead >> and the security issues. Higher overhead is made because of the >> co-existence of echo and echo reply packets in the network. One echo >> packet can potentially trigger as many echo reply packets as the number >> of intermediate routers between a pair of sender and destination. Thus, >> the security issue deserves consideration. 
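(The subtraction quoted above works out as follows with invented per-hop delays; the point is only that the legs common to both echo replies cancel, leaving d(B,D) + d(D,B).)

# Invented one-way delays, in ms; d[x, y] follows the notation in the quoted text.
d = {("S", "A"): 2.0, ("A", "B"): 5.0, ("B", "D"): 9.0,
     ("D", "B"): 3.0, ("B", "A"): 4.0, ("A", "S"): 1.0}

rtt_via_router_b = d["S", "A"] + d["A", "B"] + d["B", "A"] + d["A", "S"]
rtt_via_dest = (d["S", "A"] + d["A", "B"] + d["B", "D"]
                + d["D", "B"] + d["B", "A"] + d["A", "S"])

# The common legs S->A->B and B->A->S cancel, leaving d(B,D) + d(D,B) = 12.0 ms.
print(rtt_via_dest - rtt_via_router_b)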
>> >> My question here is that whether such modification on ICMP is >> acceptable, or it simply introduces a new evil. >> >> Jun Liu >> >> On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote: >> >>> Another approach is to get accurate time stamps of ingress/egress packets and >>> use the difference in the time stamps to compute effective queue depths. The >>> NLANR PMA team was building a "router clamp", an "octopus" designed to get >>> traces from all interfaces of a busy Internet2 core router. I have since lost >>> track of the details. Google "router clamp pma" for clues. >>> >>> I basically don't believe queue depths measured by any other means, because >>> there are so many cascaded queues in a typical modern router. I point out >>> that most NIC's have short queues right at the wire, along with every DMA >>> engine and bus arbitrator, etc. >>> >>> Claiming that an internal software instrument accurately represents the true >>> aggregate queue depth for the router is equivalent to asserting that none of >>> the other potential bottlenecks in the router have any queued packets. If they >>> never have queued packets, why did the HW people bother with the silicon? I >>> conclude there is always potential for packets to be queued out of scope of >>> the software instruments. >>> >>> It's a long story, but I have first hand experience with one of these cases: >>> my external measurement of maximum queues size was only half of the design size, >>> because the "wrong" bottleneck dominated. >>> >>> Good luck, >>> --MM-- >>> ------------------------------------------- >>> Matt Mathis http://www.psc.edu/~mathis >>> Work:412.268.3319 Home/Cell:412.654.7529 >>> ------------------------------------------- >>> Evil is defined by mortals who think they know >>> "The Truth" and use force to apply it to others. >>> >>> On Wed, 20 Dec 2006, Lynne Jolitz wrote: >>> >>> >>>> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. >>>> >>>> Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. >>>> >>>> The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). >>>> >>>> Lynne Jolitz >>>> >>>> ---- >>>> We use SpamQuiz. 
>>>> If your ISP didn't make the grade try http://lynne.telemuse.net >>>> >>>> >>>>> -----Original Message----- >>>>> From: end2end-interest-bounces at postel.org >>>>> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker >>>>> Sent: Wednesday, December 13, 2006 12:17 PM >>>>> To: Craig Partridge >>>>> Cc: end2end-interest at postel.org >>>>> Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer >>>>> >>>>> >>>>> You're talking about ifOutQLen. It was originally proposed in RFC >>>>> 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 >>>>> 1994). The reason it was deprecated is not documented, but the >>>>> fundamental issue is that it is non-trivial to calculate and is very >>>>> ephemeral. >>>>> >>>>> The big issue in calculating it is that it is rarely exactly one >>>>> queue. Consider a simple case on simple hardware available in 1994. >>>>> >>>>> +----------+ | >>>>> | | | >>>>> | CPU +-+ >>>>> | | | >>>>> +----------+ | BUS >>>>> | >>>>> +----------+ | +---------+ >>>>> | | +-+ LANCE | >>>>> | | | +---------+ >>>>> | DRAM +-+ >>>>> | | | +---------+ >>>>> | | +-+ LANCE | >>>>> +----------+ | +---------+ >>>>> >>>>> I'm using the term "bus" in the most general possible sense - some >>>>> way for the various devices to get to the common memory. This gets >>>>> implemented many ways. >>>>> >>>>> The AMD 7990 LANCE chip was and is a common Ethernet implementation. >>>>> It has in front of it a ring in which one can describe up to 2^N >>>>> messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea >>>>> at any given time how many messages are waiting - it only knows >>>>> whether it is working on one right now or is idle, and when switching >>>>> from message to message it knows whether the next slot it considers >>>>> contains a message. So it can't keep such a counter. The device >>>>> driver similarly has a limited view; it might know how many it has >>>>> put in and how many it has taken out again, but it doesn't know >>>>> whether the LANCE has perhaps completed some of the messages it >>>>> hasn't taken out yet. So in the sense of the definition ("The length >>>>> of the output packet queue (in packets)."), it doesn't know how many >>>>> are still waiting. In addition, it is common for such queues or rings >>>>> to be configured pretty small, with excess going into a diffserv- >>>>> described set of software queues. >>>>> >>>>> There are far more general problems. Cisco has a fast forwarding >>>>> technology that we use on some of our midrange products that >>>>> calculates when messages should be sent and schedules them in a >>>>> common calendar queue. Every mumble time units, the traffic that >>>>> should be sent during THIS time interval are picked up and dispersed >>>>> to the various interfaces they need to go out. Hence, there isn't a >>>>> single "output queue", but rather a commingled output schedule that >>>>> shifts traffic to other output queues at various times - which in >>>>> turn do something akin to what I described above. >>>>> >>>>> Also, in modern equipment one often has forwarders and drivers on NIC >>>>> cards rather than having some central processor do that. For >>>>> management purposes, the drivers maintain their counts locally and >>>>> periodically (perhaps once a second) upload the contents of those >>>>> counters to a place where management can see them. 
>>>>> >>>>> So when you ask "what is the current queue depth", I have to ask what >>>>> the hardware has, what of that has already been spent but isn't >>>>> cleaned up yet, what is in how many software queues, how they are >>>>> organized, and whether that number has been put somewhere that >>>>> management can see it. >>>>> >>>>> Oh - did I mention encrypt/decrypt units, compressors, and other >>>>> inline services that might have their own queues associated with them? >>>>> >>>>> Yes, there is a definition on the books. I don't know that it answers >>>>> the question. >>>>> >>>>> On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: >>>>> >>>>> >>>>>> Queue sizes are standard SNMP variables and thus could be sampled at >>>>>> these intervals. But it looks as if you want the queues on a per host >>>>>> basis? >>>>>> >>>>>> Craig >>>>>> >>>>>> In message >>>>> 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A >>>>>> nil Kumar writes: >>>>>> >>>>>> >>>>>>> We are searching for any known techniques to continuously sample >>>>>>> (say at >>>>>>> every 100 msec interval) the buffer occupancy of router >>>>>>> interfaces. The >>>>>>> requirement is to extract or estimate the instantaneous value of the >>>>>>> number of packets or bytes in the router buffer from another >>>>>>> machine in >>>>>>> the network, and not the maximum possible router buffer size. >>>>>>> >>>>>>> Any suggestion, advice or pointer to literature on this? >>>>>>> >>>>>>> Thanks in advance. >>>>>>> >>>>>>> Anil >>>>>>> > > > From detlef.bosau at web.de Sat Dec 23 13:45:36 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 23 Dec 2006 22:45:36 +0100 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? Message-ID: <458DA380.1070207@web.de> I apologize if this is a stupid question. However, I would like to know how we shall deal with this scenario. Consider a client maintaing TCP sessions to different servers. (My PC is actually doing so, so there is at least one specimen.) Consider the following topology ---------------(FE)--------------- Server 1 Client ----- (E)------router ------------(FE)----------router ----------------(E)------------------Server 2 E = Ethernet (10 Mbit/s) FE = Fast Ethernt (100 MBit/s) The link in the middle represents some transport network / path, e.g. the Internet. The common bottleneck is actually the link between client and router. Consider there were two greedy TCP flows to the client, one originates from server 1 and the other from server 2. My feeling is that the flow server 1 - client should achieve more throughput than the other. From what I see in a simulation, the ratio in the secnario above is roughly 2:1. (I did this simulation this evening, so admittedly there might be errors.) Is there a general opinion how the throughput ratio should be in a scenario like this? Thanks. Detlef From Jon.Crowcroft at cl.cam.ac.uk Sat Dec 23 13:47:44 2006 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sat, 23 Dec 2006 21:47:44 +0000 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Message from "David P. Reed" of "Sat, 23 Dec 2006 16:01:08 EST." 
<458D9914.3020608@reed.com> Message-ID: The obvious solution is for everyone everywhere to run time wget www.google.com once a minute and then put the answer on a web page called, say 'hostname`-`date`.txt and wait for google to index it (or we could set up a gmail account with a public pasword and just email answers there) and then run traceroute for same set to find intersection of sub-paths, then we'd have a huge oracle of rtts from every to everywhere (that matters), pretty much, and no amount of icmp would be needed at all, de-prioritised or otherwize. btw, why do americans call queues "lines", except when talking about networks? surely we should have line theory, and active line management and fair lining and so on? now where's that d**n martini...? In missive <458D9914.3020608 at reed.com>, "David P. Reed" typed: >>I find the first sentence here very odd. ICMP is used every day. It >>is hardly dead. >> >>Perhaps you meant that it doesn't work very well? >> >>The real point you are making here is that *any* measurement protocol >>that can be distinguished from regular traffic by routers is at high >>risk of generating completely *wrong* answers, for two reasons: >> >>1. Router vendors find it convenient to make their routers privilege >>real traffic over measurement overhead. >> >>2. There is a constant temptation to "game" any benchmarking tests that >>vendors tend to accede to. Academics do the same thing when they are >>proposing great new ideas that they want to sell - so this isn't a >>statement that says commercial is bad and academic has the moral high >>ground. (the benchmarking game in the database business (TP1) or the >>processor business (MIPS or FLOPS according to standard benchmarks) or >>the 3D graphics business are all unfortunately gamed every day). >> >>Purveyors of ideas are tempted to lie or spin performance numbers. >>That's the high-tech industry version of I.F.Stone's: "governments lie". >> >>Why would a router vendor offer to report a reliable number over SNMP? >> >>So the general conclusion one should draw from this is that performance >>measurements should be done without the help of vendors or proposers >>(call them purveyors), with a great deal of effort put into measuring >>"real" cases that cannot be detected and distorted by purveyor >>interpretations that either: >> >>a. allow the purveyor to claim that the measurement is bogus (ICMP >>should never have been broken by vendor optimizations, but it was in >>their interest to do so as noted above) or >> >>b. allow the purveyor to generate much better numbers than will ever be >>seen in practice, either by special casing measurement packets, or >>putting the definition of the measurement being made in the hands of the >>purveyor. >> >>Matt Mathis wrote: >>> ICMP has been dead as a measurement protocol for about 10 years now. The >>> problem is that nearly all implementations process ICMP at substantially lower >>> priority than other protocols, so the measurements are far worse than reality. >>> >>> I think you are looking for something more along the lines of IPMP, the IP >>> measurement protocol. Look for the expired Internet drafts: >>> draft-bennett-ippm-ipmp-01 2003-03-05 Expired >>> draft-mcgregor-ipmp-04 2004-02-04 Expired >>> >>> There is also a report by several people including Fred Baker and me, >>> analyzing these two conflicting drafts, and proposing yet another variant. I >>> couldn't find the report quickly. Perhaps Fred has a copy.....? 
>>> >>> If you want to follow this thread, be sure to engage the router vendors/large >>> ISP's early and listen to them carefully, because the academic and industrial >>> agendas clash very badly. (You should read the report first.) >>> >>> Thanks, >>> --MM-- >>> ------------------------------------------- >>> Matt Mathis http://www.psc.edu/~mathis >>> Work:412.268.3319 Home/Cell:412.654.7529 >>> ------------------------------------------- >>> Evil is defined by mortals who think they know >>> "The Truth" and use force to apply it to others. >>> >>> On Fri, 22 Dec 2006, Jun Liu wrote: >>> >>> >>>> I am amazed by this thread of discussion. The key issue of correctly >>>> estimating the queuing delay at a particular router is to make the >>>> queuing delay of interest distinct from the delays caused by other >>>> factors. I agree with Matt Mathis' opinion that the difference of a pair >>>> of timestamps experienced by an IP packet at a router >>>> closely characterizes the queuing delay of this packet at this router. >>>> However, it is inconvenient for an end system to obtain the values of >>>> the difference of time-stamp pairs. The NLANR PMA Router Clamp has only >>>> been installed surrounding one core router and relies >>>> on special measurement circuits. The data measured by Clamp is suitable >>>> for statistics analysis rather than providing dynamic indications to end >>>> hosts. >>>> >>>> I have been working on estimating the maximum queuing delay at the >>>> outbound queue of the slowest link along an end-to-end path. Here, a >>>> slowest link refers to a link with the longest maximum queuing delay >>>> along the path. The queuing delay at the slowest link can be estimated >>>> from measured RTTs along the path. If the histogram of a set of measured >>>> RTTs has a single mode, then the maximum queuing delay at the slowest >>>> link can be approximated by the delay value at the mode less the value >>>> of the minimum RTT. The estimation of the maximum queuing delay at the >>>> slowest link is largely affected by the non-ignorable queuing delays at >>>> other routers. For example, a histogram of measured RTTs can have >>>> multiple modes when there are two or more identical slowest links in a >>>> path. Hence, appropriate technique of filtering noises is necessary. >>>> However, multimodal based estimation issues remain unsolved. >>>> >>>> I am thinking of modifying the ICMP protocol to serve for carrying >>>> dynamic delay information at routers to end hosts. The reason of >>>> considering ICMP is due to two concerns. First, ICMP should have been >>>> implemented at all routers and end hosts. "ICMP, uses the basic support >>>> of IP as if it were a higher level protocol, however, ICMP is actually >>>> an integral part of IP, and must be implemented by every IP >>>> module." [RFC 792] Second, a lot of active probing based network >>>> measurement methods were developed based on ICMP. >>>> >>>> Currently, an ICMP error reporting message is sent by a router upon >>>> processing an erroneous IP packet and is routed back to the sender of >>>> this IP packet. When this happens, the IP packet is dropped at the >>>> router. Let's call an erroneous IP packet an echo, and the corresponding >>>> ICMP packet an echo reply. The proposed modification is to make a pair >>>> of echo and echo reply packets co-exist in the network. Namely, an echo >>>> packet is kept routed to its destination after it has triggered an echo >>>> reply which will be sent back to the sender of this echo. 
When we assume >>>> that another echo reply will be sent by the destination of this echo >>>> packet, the sender will obtain two echo reply packets on one echo. The >>>> RTTs of the two echo reply packets share delays on the common links they >>>> both traversed. >>>> >>>> Consider a simple network shown below. We denote by d(x,y) the delay >>>> from network node x to y. d(x,y) consists of the link latency, >>>> transmission delay on link (x,y), and the delay in node x (which is a >>>> sum of queuing and processing delays within x). We are about to estimate >>>> the queuing delay at router B (either dynamic delays or the maximum >>>> delay). We consider a worst case scenario by assuming that d(A,B) and >>>> d(B,D) always have similar dynamic values. This scenario happens when >>>> the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of >>>> the two router have the same size, and the same traffic pattern is on >>>> routers A and B. >>>> >>>> d(S,A) d(A,B) d(B,D) >>>> Sender -------------------> R_A ----------------> R_B >>>> ------------------> Destination >>>> <------------------- <---------------- >>>> <------------------ >>>> d(A,S) d(B,A) d(D,B) >>>> >>>> If the sender can make both router B and the destination send an echo >>>> reply on every echo packet it sends, then the difference of the RTTs >>>> between the two echo reply packets offers us a value of (d(B,D)+d(D,B)). >>>> This value much closely characterizes the queuing delay at router B than >>>> using pure RTTs. This method makes queuing delay information timely >>>> delivered to an end node---the sender of the echo packets. >>>> >>>> The method described here is somewhat similar to the idea adopted in van >>>> Jacobson's work of pathchar which incrementally measures the link >>>> bandwidth hop-by-hop from the link next to the source to the link next >>>> to the destination. However, there are two differences. First, in >>>> pathchar, only one echo reply can be triggered by an echo, and a pair of >>>> echo and echo reply can not co-exist in the network. Second, in >>>> pathchar, the RTTs of echo reply packets taking different path lengths >>>> do not necessarily share common delay portions. >>>> >>>> Two obvious side effects of this modified ICMP protocol are the overhead >>>> and the security issues. Higher overhead is made because of the >>>> co-existence of echo and echo reply packets in the network. One echo >>>> packet can potentially trigger as many echo reply packets as the number >>>> of intermediate routers between a pair of sender and destination. Thus, >>>> the security issue deserves consideration. >>>> >>>> My question here is that whether such modification on ICMP is >>>> acceptable, or it simply introduces a new evil. >>>> >>>> Jun Liu >>>> >>>> On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote: >>>> >>>>> Another approach is to get accurate time stamps of ingress/egress packets and >>>>> use the difference in the time stamps to compute effective queue depths. The >>>>> NLANR PMA team was building a "router clamp", an "octopus" designed to get >>>>> traces from all interfaces of a busy Internet2 core router. I have since lost >>>>> track of the details. Google "router clamp pma" for clues. >>>>> >>>>> I basically don't believe queue depths measured by any other means, because >>>>> there are so many cascaded queues in a typical modern router. I point out >>>>> that most NIC's have short queues right at the wire, along with every DMA >>>>> engine and bus arbitrator, etc. 
>>>>> >>>>> Claiming that an internal software instrument accurately represents the true >>>>> aggregate queue depth for the router is equivalent to asserting that none of >>>>> the other potential bottlenecks in the router have any queued packets. If they >>>>> never have queued packets, why did the HW people bother with the silicon? I >>>>> conclude there is always potential for packets to be queued out of scope of >>>>> the software instruments. >>>>> >>>>> It's a long story, but I have first hand experience with one of these cases: >>>>> my external measurement of maximum queues size was only half of the design size, >>>>> because the "wrong" bottleneck dominated. >>>>> >>>>> Good luck, >>>>> --MM-- >>>>> ------------------------------------------- >>>>> Matt Mathis http://www.psc.edu/~mathis >>>>> Work:412.268.3319 Home/Cell:412.654.7529 >>>>> ------------------------------------------- >>>>> Evil is defined by mortals who think they know >>>>> "The Truth" and use force to apply it to others. >>>>> >>>>> On Wed, 20 Dec 2006, Lynne Jolitz wrote: >>>>> >>>>> >>>>>> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. >>>>>> >>>>>> Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. >>>>>> >>>>>> The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). >>>>>> >>>>>> Lynne Jolitz >>>>>> >>>>>> ---- >>>>>> We use SpamQuiz. >>>>>> If your ISP didn't make the grade try http://lynne.telemuse.net >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: end2end-interest-bounces at postel.org >>>>>>> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker >>>>>>> Sent: Wednesday, December 13, 2006 12:17 PM >>>>>>> To: Craig Partridge >>>>>>> Cc: end2end-interest at postel.org >>>>>>> Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer >>>>>>> >>>>>>> >>>>>>> You're talking about ifOutQLen. It was originally proposed in RFC >>>>>>> 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 >>>>>>> 1994). The reason it was deprecated is not documented, but the >>>>>>> fundamental issue is that it is non-trivial to calculate and is very >>>>>>> ephemeral. >>>>>>> >>>>>>> The big issue in calculating it is that it is rarely exactly one >>>>>>> queue. Consider a simple case on simple hardware available in 1994. 
>>>>>>> >>>>>>> +----------+ | >>>>>>> | | | >>>>>>> | CPU +-+ >>>>>>> | | | >>>>>>> +----------+ | BUS >>>>>>> | >>>>>>> +----------+ | +---------+ >>>>>>> | | +-+ LANCE | >>>>>>> | | | +---------+ >>>>>>> | DRAM +-+ >>>>>>> | | | +---------+ >>>>>>> | | +-+ LANCE | >>>>>>> +----------+ | +---------+ >>>>>>> >>>>>>> I'm using the term "bus" in the most general possible sense - some >>>>>>> way for the various devices to get to the common memory. This gets >>>>>>> implemented many ways. >>>>>>> >>>>>>> The AMD 7990 LANCE chip was and is a common Ethernet implementation. >>>>>>> It has in front of it a ring in which one can describe up to 2^N >>>>>>> messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea >>>>>>> at any given time how many messages are waiting - it only knows >>>>>>> whether it is working on one right now or is idle, and when switching >>>>>>> from message to message it knows whether the next slot it considers >>>>>>> contains a message. So it can't keep such a counter. The device >>>>>>> driver similarly has a limited view; it might know how many it has >>>>>>> put in and how many it has taken out again, but it doesn't know >>>>>>> whether the LANCE has perhaps completed some of the messages it >>>>>>> hasn't taken out yet. So in the sense of the definition ("The length >>>>>>> of the output packet queue (in packets)."), it doesn't know how many >>>>>>> are still waiting. In addition, it is common for such queues or rings >>>>>>> to be configured pretty small, with excess going into a diffserv- >>>>>>> described set of software queues. >>>>>>> >>>>>>> There are far more general problems. Cisco has a fast forwarding >>>>>>> technology that we use on some of our midrange products that >>>>>>> calculates when messages should be sent and schedules them in a >>>>>>> common calendar queue. Every mumble time units, the traffic that >>>>>>> should be sent during THIS time interval are picked up and dispersed >>>>>>> to the various interfaces they need to go out. Hence, there isn't a >>>>>>> single "output queue", but rather a commingled output schedule that >>>>>>> shifts traffic to other output queues at various times - which in >>>>>>> turn do something akin to what I described above. >>>>>>> >>>>>>> Also, in modern equipment one often has forwarders and drivers on NIC >>>>>>> cards rather than having some central processor do that. For >>>>>>> management purposes, the drivers maintain their counts locally and >>>>>>> periodically (perhaps once a second) upload the contents of those >>>>>>> counters to a place where management can see them. >>>>>>> >>>>>>> So when you ask "what is the current queue depth", I have to ask what >>>>>>> the hardware has, what of that has already been spent but isn't >>>>>>> cleaned up yet, what is in how many software queues, how they are >>>>>>> organized, and whether that number has been put somewhere that >>>>>>> management can see it. >>>>>>> >>>>>>> Oh - did I mention encrypt/decrypt units, compressors, and other >>>>>>> inline services that might have their own queues associated with them? >>>>>>> >>>>>>> Yes, there is a definition on the books. I don't know that it answers >>>>>>> the question. >>>>>>> >>>>>>> On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: >>>>>>> >>>>>>> >>>>>>>> Queue sizes are standard SNMP variables and thus could be sampled at >>>>>>>> these intervals. But it looks as if you want the queues on a per host >>>>>>>> basis? 
>>>>>>>> >>>>>>>> Craig >>>>>>>> >>>>>>>> In message >>>>>>> 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A >>>>>>>> nil Kumar writes: >>>>>>>> >>>>>>>> >>>>>>>>> We are searching for any known techniques to continuously sample >>>>>>>>> (say at >>>>>>>>> every 100 msec interval) the buffer occupancy of router >>>>>>>>> interfaces. The >>>>>>>>> requirement is to extract or estimate the instantaneous value of the >>>>>>>>> number of packets or bytes in the router buffer from another >>>>>>>>> machine in >>>>>>>>> the network, and not the maximum possible router buffer size. >>>>>>>>> >>>>>>>>> Any suggestion, advice or pointer to literature on this? >>>>>>>>> >>>>>>>>> Thanks in advance. >>>>>>>>> >>>>>>>>> Anil >>>>>>>>> >>> >>> >>> cheers jon From detlef.bosau at web.de Sat Dec 23 14:52:32 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 23 Dec 2006 23:52:32 +0100 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? In-Reply-To: <458DA380.1070207@web.de> References: <458DA380.1070207@web.de> Message-ID: <458DB330.80504@web.de> My goodness, I did not see my disastrous "ASCII-art". I apologize. Let?s give it another try: ----(FE)---- Server 1 C - (E)---router --(FE)---router ----(E)------Server 2 C = Client, rest as before. Hopefully, it is better now. From detlef.bosau at web.de Sun Dec 24 14:52:56 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 24 Dec 2006 23:52:56 +0100 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? In-Reply-To: <458DA380.1070207@web.de> References: <458DA380.1070207@web.de> Message-ID: <458F04C8.30100@web.de> Detlef Bosau wrote: > I apologize if this is a stupid question. I admit, it was a *very* stupid question :-) Because my ASCII arts were terrible, I add a nam-screenshot here (hopefully, I?m allowed to send this mail in HTML): NAM screenshot Links: 0-2: 100 Mbit/s, 1 ms 1-2: 10 Mbit/s, 1 ms 2-3: 100 Mbit/s, 10 ms 3-4: 10 MBit/s, 1 ms Sender: 0,1 Receiver: 4 > > > My feeling is that the flow server 1 - client should achieve more > throughput than the other. From what I see in a simulation, the ratio > in the secnario above is roughly 2:1. (I did this simulation this > evening, so admittedly there might be errors.) > > Is there a general opinion how the throughput ratio should be in a > scenario like this? Obviously, my feeling is wrong. Perhaps, I should consider reality more than my feelings :-[ AIMD distributes the *path capacity (i.e. "memory") *in equal shares. So, in case of two flows sharing a path, each flow is assigned an equal window. Hence, the rates should be equal as they depend on the window (= estimate of path capaciyt) and RTT. (Well known rule of thumb: rate = cwnd/RTT) However, the scenario depicted above is an interesting one: Apparently, the sender at node 1 is paced "ideally" by the link 1-2. So, packets sent by node 0 are dropped at node 3 unuduly often. In consequence, the flow from 0 to 4 hardly achieves any throughput whereas the flow from 1 to 4 runs as if there was no competitor. If the bandwdith 1-2 is changed a little bit, the bevaviour returns to the expected one. I?m still not quite sure whether this behaviour matches reality or whether it is an NS2 artifact. Detlef -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061224/3033e94c/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: bild.png Type: image/png Size: 21858 bytes Desc: not available Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061224/3033e94c/bild-0001.png From Anil.Agarwal at viasat.com Mon Dec 25 08:35:18 2006 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Mon, 25 Dec 2006 11:35:18 -0500 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? References: <458DA380.1070207@web.de> <458F04C8.30100@web.de> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3547@VGAEXCH01.hq.corp.viasat.com> Detlef, Here is a possible explanation for the results in your scenario - Take the case when both connections are active and the queue at router 2 remains non-empty. Every T seconds, there will be a packet departure at router 2, resulting in the queue size decreasing by 1 packet at time T. If a packet from node 1 departs at time n*T, then at time (n+1)*T + ta1, another packet will arrive at router 2 from node 1. ta1 is the time taken by the Ack to reach node 1. If a packet from node 0 departs at time n*T, then at time n*T + ta0 + t0, another packet will arrive at router 2 from node 0. ta0 is the time taken by the Ack to reach node 0. t0 is the transmission time of a packet at 100 Mbps. Another packet from node 0 may arrive at time n*T + ta0 + 2 * t0. In the scenario, ta0 << T, ta1 << T, and t0 = T / 10, ta0 + t0 > ta1. I am assuming that propagation delays were set to 0 in the simulations. It can be seen, that when a node 1 packet arrives at node 2, the queue is never full - a packet departure takes place ta1 seconds before its arrival, and no node 0 packet arrive during the ta1 seconds. No such property holds for node 0 packets - hence node 0 packets are selectively dropped. Changing bandwidths a bit or introducing real-life factors such as propagation delays, variable processing delays and/or variable Ethernet switching delays will probably break this synchronized relationship. Regards, Anil Anil Agarwal ViaSat Inc. Germantown, MD ________________________________ From: end2end-interest-bounces at postel.org on behalf of Detlef Bosau Sent: Sun 12/24/2006 5:52 PM To: end2end-interest at postel.org Cc: Michael Kochte; Daniel Minder; Martin Reisslein; Frank Duerr Subject: Re: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? Detlef Bosau wrote: I apologize if this is a stupid question. I admit, it was a very stupid question :-) Because my ASCII arts were terrible, I add a nam-screenshot here (hopefully, I?m allowed to send this mail in HTML): Links: 0-2: 100 Mbit/s, 1 ms 1-2: 10 Mbit/s, 1 ms 2-3: 100 Mbit/s, 10 ms 3-4: 10 MBit/s, 1 ms Sender: 0,1 Receiver: 4 My feeling is that the flow server 1 - client should achieve more throughput than the other. From what I see in a simulation, the ratio in the secnario above is roughly 2:1. (I did this simulation this evening, so admittedly there might be errors.) Is there a general opinion how the throughput ratio should be in a scenario like this? Obviously, my feeling is wrong. Perhaps, I should consider reality more than my feelings :-[ AIMD distributes the path capacity (i.e. "memory") in equal shares. So, in case of two flows sharing a path, each flow is assigned an equal window. 
Hence, the rates should be equal as they depend on the window (= estimate of path capaciyt) and RTT. (Well known rule of thumb: rate = cwnd/RTT) However, the scenario depicted above is an interesting one: Apparently, the sender at node 1 is paced "ideally" by the link 1-2. So, packets sent by node 0 are dropped at node 3 unuduly often. In consequence, the flow from 0 to 4 hardly achieves any throughput whereas the flow from 1 to 4 runs as if there was no competitor. If the bandwdith 1-2 is changed a little bit, the bevaviour returns to the expected one. I?m still not quite sure whether this behaviour matches reality or whether it is an NS2 artifact. Detlef -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/1d136a43/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: bild.png Type: image/png Size: 21858 bytes Desc: bild.png Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/1d136a43/bild-0001.png From Anil.Agarwal at viasat.com Mon Dec 25 15:38:44 2006 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Mon, 25 Dec 2006 18:38:44 -0500 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? References: <458DA380.1070207@web.de> <458F04C8.30100@web.de> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3547@VGAEXCH01.hq.corp.viasat.com> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3549@VGAEXCH01.hq.corp.viasat.com> Detlef, In my earlier description, I had incorrectly assumed that link 2-3 was at 10 Mbps. The nature of the problem is similar whether link 2-3 is at 10 Mbps or 100 Mbps. Here is a corrected description for your network scenario - Take the case when both connections are active and the queue at router 3 remains non-empty. Every T seconds, there will be a packet departure at router 3, resulting in the queue size decreasing by 1 packet. At router 3, if a packet from node 1 departs at time n*T, then at time (n+1)*T + ta1 + t0, another packet will arrive from node 1. ta1 is the time taken by the Ack to reach node 1 from node 4. t0 is the transmission time of a packet at 100 Mbps. At router 3, if a packet from node 0 departs at time n*T, then at time n*T + ta0 + 2 * t0, another packet will arrive from node 0. ta0 is the time taken by the Ack to reach node 0 from node 4. t0 is the transmission time of a packet at 100 Mbps. Another packet (of a packet pair) from node 0 may arrive at time n*T + ta0 + 3 * t0. In the scenario, ta0 << T, ta1 << T, and t0 = T / 10, ta0 + 2 * t0 > ta1 + t0. I am assuming that propagation delays were set to 0 in the simulations. It can be seen, that when a node 1 packet arrives at node 3, the queue is never full - a packet departure takes place ta1 + t0 seconds before its arrival, and no node 0 packets arrive during ths interval. No such property holds for node 0 packets - hence node 0 packets are selectively dropped. Changing bandwidths a bit or introducing real-life factors such as propagation delays, variable processing delays and/or variable Ethernet switch delays will probably break this synchronized relationship. RED will also help. One can construct many other similar scenarios, where one connection is selectively favored over another. Perhaps, one more reason to use RED. 
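Anil's timing argument can be reproduced in a few lines. The following is a rough, self-contained sketch, not ns-2 and not Detlef's script, and the constants are invented: a drop-tail queue is served once per interval T and fed by two flows that each offer one packet per T at a fixed phase. The flow whose packet lands just after the departure always finds a free slot; the other always finds the queue full. A little random jitter on the arrival phases is enough to break the lockout.

import random

def run(jitter=0.0, T=1.0, capacity=5, rounds=10000, seed=1):
    # Drop-tail queue: one departure per interval, one arrival per flow per interval.
    random.seed(seed)
    queue = capacity                      # bottleneck queue starts (and stays) full
    drops = {"A": 0, "B": 0}
    for _ in range(rounds):
        events = [(0.0, "depart"),
                  (0.2 * T + random.uniform(-jitter, jitter), "A"),
                  (0.6 * T + random.uniform(-jitter, jitter), "B")]
        for _, ev in sorted(events):      # play this interval's events in time order
            if ev == "depart":
                if queue > 0:
                    queue -= 1
            elif queue < capacity:
                queue += 1                # arriving packet accepted
            else:
                drops[ev] += 1            # arriving packet dropped at the tail
    return drops

print("deterministic:", run(jitter=0.0))   # one flow takes every single drop
print("with jitter:  ", run(jitter=0.3))   # drops are now shared by both flows

Pacing one flow exactly at the bottleneck's service interval is what creates the lockout; the jittered run is closer to what real propagation and processing delays do, which is the point of Anil's closing paragraph.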
Anil ________________________________ From: end2end-interest-bounces at postel.org on behalf of Agarwal, Anil Sent: Mon 12/25/2006 11:35 AM To: Detlef Bosau; end2end-interest at postel.org Cc: Michael Kochte; Martin Reisslein; Frank Duerr; Daniel Minder Subject: Re: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? Detlef, Here is a possible explanation for the results in your scenario - Take the case when both connections are active and the queue at router 2 remains non-empty. Every T seconds, there will be a packet departure at router 2, resulting in the queue size decreasing by 1 packet at time T. If a packet from node 1 departs at time n*T, then at time (n+1)*T + ta1, another packet will arrive at router 2 from node 1. ta1 is the time taken by the Ack to reach node 1. If a packet from node 0 departs at time n*T, then at time n*T + ta0 + t0, another packet will arrive at router 2 from node 0. ta0 is the time taken by the Ack to reach node 0. t0 is the transmission time of a packet at 100 Mbps. Another packet from node 0 may arrive at time n*T + ta0 + 2 * t0. In the scenario, ta0 << T, ta1 << T, and t0 = T / 10, ta0 + t0 > ta1. I am assuming that propagation delays were set to 0 in the simulations. It can be seen, that when a node 1 packet arrives at node 2, the queue is never full - a packet departure takes place ta1 seconds before its arrival, and no node 0 packet arrive during the ta1 seconds. No such property holds for node 0 packets - hence node 0 packets are selectively dropped. Changing bandwidths a bit or introducing real-life factors such as propagation delays, variable processing delays and/or variable Ethernet switching delays will probably break this synchronized relationship. Regards, Anil Anil Agarwal ViaSat Inc. Germantown, MD ________________________________ From: end2end-interest-bounces at postel.org on behalf of Detlef Bosau Sent: Sun 12/24/2006 5:52 PM To: end2end-interest at postel.org Cc: Michael Kochte; Daniel Minder; Martin Reisslein; Frank Duerr Subject: Re: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? Detlef Bosau wrote: I apologize if this is a stupid question. I admit, it was a very stupid question :-) Because my ASCII arts were terrible, I add a nam-screenshot here (hopefully, I?m allowed to send this mail in HTML): Links: 0-2: 100 Mbit/s, 1 ms 1-2: 10 Mbit/s, 1 ms 2-3: 100 Mbit/s, 10 ms 3-4: 10 MBit/s, 1 ms Sender: 0,1 Receiver: 4 My feeling is that the flow server 1 - client should achieve more throughput than the other. From what I see in a simulation, the ratio in the secnario above is roughly 2:1. (I did this simulation this evening, so admittedly there might be errors.) Is there a general opinion how the throughput ratio should be in a scenario like this? Obviously, my feeling is wrong. Perhaps, I should consider reality more than my feelings :-[ AIMD distributes the path capacity (i.e. "memory") in equal shares. So, in case of two flows sharing a path, each flow is assigned an equal window. Hence, the rates should be equal as they depend on the window (= estimate of path capaciyt) and RTT. (Well known rule of thumb: rate = cwnd/RTT) However, the scenario depicted above is an interesting one: Apparently, the sender at node 1 is paced "ideally" by the link 1-2. So, packets sent by node 0 are dropped at node 3 unuduly often. 
In consequence, the flow from 0 to 4 hardly achieves any throughput whereas the flow from 1 to 4 runs as if there was no competitor. If the bandwdith 1-2 is changed a little bit, the bevaviour returns to the expected one. I?m still not quite sure whether this behaviour matches reality or whether it is an NS2 artifact. Detlef -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/d26abde2/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: bild.png Type: image/png Size: 21858 bytes Desc: bild.png Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/d26abde2/bild-0001.png From fred at cisco.com Sat Dec 23 13:15:32 2006 From: fred at cisco.com (Fred Baker) Date: Sat, 23 Dec 2006 13:15:32 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> <1166832054.9009.171.camel@officepc-junliu> Message-ID: On Dec 23, 2006, at 8:01 AM, Matt Mathis wrote: > ICMP has been dead as a measurement protocol for about 10 years > now. The problem is that nearly all implementations process ICMP at > substantially lower priority than other protocols, so the > measurements are far worse than reality. > > I think you are looking for something more along the lines of IPMP, > the IP measurement protocol. Look for the expired Internet drafts: > draft-bennett-ippm-ipmp-01 2003-03-05 Expired > draft-mcgregor-ipmp-04 2004-02-04 Expired http://tools.ietf.org/html/draft-bennett-ippm-ipmp http://tools.ietf.org/html/draft-mcgregor-ipmp > There is also a report by several people including Fred Baker and > me, analyzing these two conflicting drafts, and proposing yet > another variant. I couldn't find the report quickly. Perhaps Fred > has a copy.....? > > If you want to follow this thread, be sure to engage the router > vendors/large ISP's early and listen to them carefully, because the > academic and industrial agendas clash very badly. (You should > read the report first.) > > Thanks, > --MM-- > ------------------------------------------- > Matt Mathis http://www.psc.edu/~mathis > Work:412.268.3319 Home/Cell:412.654.7529 > ------------------------------------------- Is this what you're thinking of? Let me reiterate your point - if you want features in routers and switches that will help you be able to determine what is happening in various networks along the way between here and there, you have two avenues. One is that you can measure externally and make inferences about the total end to end path that may not tell you much about any specific point. The other is that you can know specifics of the path and perhaps individual nodes on the path by asking them questions. If you want supporting features to be available to you in the routers, convince the ISPs that *they* want them. Reason: they will have to turn them on, and they will have to allow you access, and it will be their dollars that convince the vendors to build them. So think hard about how these will help the ISPs do what they do. Note I am not in saying this throwing cold water on it. The ISPs are in fact looking for ways to deliver SLAs that involve multiple ISPs on an end to end path - that's part of the ITU NGN effort. Help them solve that problem and you might get a fair bit of interest. 
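One cheap way to see Matt's point about ICMP being handled off the fast path is to time something that routers and end hosts treat as ordinary traffic, such as a TCP handshake, and compare it with ping to the same place. The sketch below is an illustration only: the target host and port are placeholders, it measures the SYN/SYN-ACK exchange plus end-host processing, and it says nothing about where along the path any difference arises.

import socket
import time

def tcp_handshake_rtt(host, port=80, samples=5, timeout=2.0):
    # Time TCP connection setup as an application-path RTT sample.
    rtts = []
    for _ in range(samples):
        t0 = time.perf_counter()
        try:
            s = socket.create_connection((host, port), timeout=timeout)
            rtts.append(time.perf_counter() - t0)
            s.close()
        except OSError:
            pass                     # timeout or refusal; just skip the sample
        time.sleep(0.2)
    return rtts

if __name__ == "__main__":
    r = tcp_handshake_rtt("www.example.net")   # placeholder target
    if r:
        print("TCP handshake RTT, min %.1f ms, median %.1f ms"
              % (min(r) * 1e3, sorted(r)[len(r) // 2] * 1e3))

If the numbers ping reports wander far above figures like these, that is consistent with Matt's observation that the echo handling, not the forwarding path, is adding the delay.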
At the same time, if your solution is "interesting research" but doesn't help the ISPs solve a problem they want solved, expect results to be spotty at best. Begin forwarded message: > From: Mark Allman > Date: September 7, 2004 6:33:19 AM PDT > To: imrg-ipmp-review at guns.icir.org > Subject: Mark Allman: [IMRG] ipmp review team report > Reply-To: mallman at icir.org > > *** PGP SIGNATURE VERIFICATION *** > > *** Status: Good Signature > *** Signer: Mark Allman (0xCE3222CE) > *** Signed: 09/07/04 6:33:19 AM > *** Verified: 09/07/04 7:13:11 AM > *** BEGIN PGP VERIFIED MESSAGE *** > > > Folks- > > FYI, here is what I sent to the IMRG mailing list (for those who do > not > track it). > > Thanks again for all your hard work! > > allman > > > > ------- Forwarded Message > > To: imrg at irtf.org > From: Mark Allman > Organization: ICSI Center for Internet Research (ICIR) > Song-of-the-Day: Paradise City > Date: Tue, 07 Sep 2004 09:03:02 -0400 > Subject: [IMRG] ipmp review team report > > > Folks- > > A while back you might remember that we had some discussion on the IP > Measurement Protocol. In order to try to gain some traction I asked a > few folks to act as a review team to look over the two proposals on > the > table (and the ancillary information). The team has completed their > work and did a very nice job of debating the issues involved in > IPMP and > coming up with a summary of their feelings. > > The team members are listed at the bottom of the report and I wish to > thank them for their diligence in reviewing these documents. > > The report from the group is below. Please feel free to discuss the > ideas enumerated in the report on this mailing list all you want. The > team is not the final word. I convened the team to get some focued > energy thinking about these issues. The report is in no way > binding nor > the community's final judgement. So, please feel very free to > continue > the discussion. > > allman > > > > > > IMRG IPMP Review Team Report > - ---------------------------- > > The Internet Measurement Research Group (within the IRTF) convened a > small team to review the materials related to the IP Measurement > Protocol (IPMP). The members of the group (listed at the end of > this report) discussed IPMP and several larger issues. In > particular, the team reviewed the following two Internet-Drafts: > > draft-mcgregor-ipmp-03.txt > draft-bennett-ippm-ipmp-01.txt > > The goal of this effort was to chart a strawman course for moving > forward with some sort of measurement protocol (if possible). > > Note: This message represents the group's consensus. However, that > does not mean that each member of the team agrees with each point in > this note. The group reached rough agreement, not unaminity. > > The following are the high-order bits from the discussion. > > The fundamental challenge that measurement protocols attempt to > address is to provide a means to measure the network characteristics > researchers and operators want to understand in a way that provides > fine grained information about the network in a lightweight fashion. 
> To this end, we would suggest that IPMP wants to develop tools that > are: > > - implementable in reasonable timeframes on existing equipment, > which means that they should not depend on ASIC development or > new equipment purchase > > - deployable; ISPs would ideally want them, and at minimum not > turn them off > > - useful to the ISPs in terms of their business rules and the > questions they ask about their own networks > > If the procedures or protocols are useful to the ISPs, one can > expect that they will be willing to collect the data, and may under > some appropriate rules also allow researchers to collect data or > share collected data with researchers. > > In the above context, the team found the motivation for IPMP given > in both documents to be lacking --- to the point where the team did > not feel the current proposals are viable. Several > related/supporting points were discussed: > > * From the perspective of a vendor developing equipment and > protocols or an ISP deploying them, the IPMP proposals on the > table do not look viable. The fundamental goal of IPMP is to > display the structure of a network and many of its fine-scale > characteristics. This is information that a service provider > does not share with anyone else except - maybe - under > NDA. Given that the protocols to obtain the information are > fairly complex and involve a fair level of memory writes, the > vendor will do this if and only if its ISP customers ask for it, > and they are not asking for this. > > * Making a better ping or traceroute is, on the one hand, too > narrow and mechanistic a focus and yet also too focused on what > researchers might find compelling rather than what operators > would. > > * A tool to reverse engineer a network isn't needed by the ISPs. > They already know the structure of their own networks. > > That said, the team **strongly** believes that there is much room > for improvement in the state of network troubleshooting and > debugging. In particular: > > * Some service providers are asking for a solution to a problem > that may yield data that researchers may find valuable. Within > its own network, a service provider is generally interested in > locating the links that introduce variability into their > network. It may view them as under-provisioned for offered > load, as inappropriately routed, or whatever, but they are in > fact interested in locating links that require upgrading in some > form. > > * Some service providers are asking (in TIA and related fora) how > they can deploy SLAs that cross ISP boundaries. These may be > among ISPs that form business coalitions, such as Teleglobe has > tried to set up with its transit network customers, or among > regional networks such as US RBOCs that view transitive SLAs as > a rational approach. The watchword in such consortia is "trust > but verify"; it is in their interest to have a procedure or > protocol that will allow them to isolate issues that may prevent > them from meeting SLA guarantees in something resembling real > time. Since those SLAs are one-way, this means accurate one-way > delay and jitter measurements host to host, POP to POP, or CPE > to CPE. > > In addition, in looking at the protocols themselves, we found > ourselves wondering how much could be learned by clever inference from > fairly simple data collection and black box measurement, as opposed > to explicit reporting of values. 
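The "simple data collection" mentioned above can already cover the one-way delay and jitter numbers the SLA discussion asks for. As a hedged sketch, independent of IPMP and of any product: if probe packets carry a sender timestamp, the receiver can track relative one-way delay and a smoothed delay-variation estimate in the style of the RTP (RFC 3550) interarrival jitter calculation, without synchronized clocks, so long as only changes in delay matter.

def jitter_stream(probes):
    """probes: (send_ts, recv_ts) pairs in seconds, in arrival order.
    A fixed clock offset between the two hosts cancels out of the differences."""
    jitter = 0.0
    prev_transit = None
    for send_ts, recv_ts in probes:
        transit = recv_ts - send_ts            # one-way delay plus fixed skew
        if prev_transit is not None:
            d = abs(transit - prev_transit)    # delay variation between probes
            jitter += (d - jitter) / 16.0      # RFC 3550-style smoothing
        prev_transit = transit
        yield transit, jitter

# toy input: constant 50 ms transit, with one probe delayed an extra 30 ms
probes = [(t, t + 0.050 + (0.030 if t == 3 else 0.0)) for t in range(8)]
for transit, j in jitter_stream(probes):
    print("transit %.3f s   jitter %.4f s" % (transit, j))

Absolute one-way delay, as opposed to its variation, still needs a common time reference at both ends, which is exactly the part a measurement protocol or a stable counter inside the routers would have to supply.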
> > As another example, we note that the intention of such procedures as > CalTech's FAST and MIT's XCP protocols is to detect and measure > variable delays in the network and cause traffic to be sent in such > a way as to maximize throughput while minimizing such delays. This > fundamental question is a direct corollary to that raised in > http://www.nwfusion.com/research/2002/1216isptestside1.html, and > that raised in the context of transitive Tier 2 network SLAs. These > would like to be able to identify the existence of an SLA failure or > other disturbance in the Force on a route, report its magnitude, and > isolate the disturbing device. To that end, we wonder can be done > with the numbers measured by Dina Katabi's XCP protocol. > > Finally, the team wondered if a protocol that carries less global > information but more precision would be more deployable. For > example if the stamps just consisted of an opaque ID, TTL and simple > 32 bit counter running on "the most stable local frequency source", > then the ISP (w/ the engineering documentation for their own gear) > can use database techniques to compute everything carried by the > current protocol. The stamps are simple enough where we can, with a > straight face, ask for them in multiple places within one box: input > and output framers, bus DMA engines, etc. We can envision that this > would be an extremely valuable tool for an ISP to understand (and > diagnose) certain QoS properties of their own network. Note that, > globally parsable metadata in the stamps probably has negative value > to most ISPs because it reduces an ISP's ability to keep it's assets > private. The barrier to deployment in not so much the cost of the > implementation, but the indirect cost of the leaking proprietary > topology information. > > At the same time, external researchers could use inference > techniques to get some of the same information, including most > dynamic properties such as queue depths etc. The external users get > much less topology information, unless they make an explicit > arrangement with the ISP to get the annotations associated with the > opaque IDs. > > In summary, the team came to two points of consensus: 1) that the > protocol is inadequately motivated by the proposals, even though > ISPs would like to be able to measure their and their neighbors' > networks; 2) that the protocol's complexity and intrusiveness are > inadequately justified with respect to other, potentially more > lightweight approaches that may be easier to deploy. The main point > is that to get a protocol deployed, ISPs need to ask for it loudly > enough and router vendors need to be able to implement it easily > enough, and neither is argued by these proposals. > > Review team members: Guy Almes (Internet2), Fred Baker (Cisco), Paul > Barford (UWisc), Chistophe Diot (Intel Research), Ralph Droms > (Cisco), Larry Dunn (Cisco), Matt Mathis (PSC), David Moore > (CAIDA), Jennifer Rexford (AT&T Research), Neil Spring (Univ. of > Washington) > Scribe / team shepherd: Mark Allman (ICIR) > > ------- End of Forwarded Message > > > > > > > *** END PGP VERIFIED MESSAGE *** From detlef.bosau at web.de Tue Dec 26 08:39:36 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 26 Dec 2006 17:39:36 +0100 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? 
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3549@VGAEXCH01.hq.corp.viasat.com> References: <458DA380.1070207@web.de> <458F04C8.30100@web.de> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3547@VGAEXCH01.hq.corp.viasat.com> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3549@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45915048.2030908@web.de> Agarwal, Anil wrote: > Detlef, > > In my earlier description, I had incorrectly assumed that link 2-3 was > at 10 Mbps. The nature of the problem is similar whether link 2-3 is > at 10 Mbps or 100 Mbps. Admittedly, I didn?t understand it yesterday... However, eventually you say: > Changing bandwidths a bit or introducing real-life factors such as > propagation delays, variable processing delays and/or variable > Ethernet switch delays will probably break this synchronized > relationship. RED will also help. > In fact, the behaviour disappears when I randomize the delays. > One can construct many other similar scenarios, where one connection > is selectively favored over another. Perhaps, one more reason to use RED. > I?m not quite sure about the relationship to RED here. (In fact, I still have no personal opinion to RED, however I have numerous questions to RED, but I think that?s not the question here.) What I try to understand is basically, whether the observed behaviour is an artifact or not. It might be as well a behaviour which happens only under rare circumstances, think of the capture effect in Ethernet. In consequence, my basic doubt aganist all kind of *mulation (simulatin, emulation etc.) rises again. I personally make no difference between simulation and emulation. To my knowledge, emulation is used synonymously for "real time simulation" and as such is prone for the same artifacts and errors for any other kind of simulation. Particularly the synchronicity between the two links 1-2 and 3-4 is basically artificial. Even with quartz-controlled timers I severely doubt hat two NICs will ever run perfeclty synchronous - and this is not even necessary as long as data sent by one NIC is read error free by the other. Like all other kinds of *mulation the NS2 is nothing than a set of difference equations put in "some strange form". It thus represents our hopes, fantasy and religious beliefs. Unfortunately, reality doesn?t care about any of them ;-) Hover, I did not consider this scenario by pure chance. The question behind this scenario is a very precise one. Let?s draw a network, somewhat more simple this time. Sender(i) ------------(some network)-----------Splitter------(some network)---------Receiver Sender(i) denotes several senders. Now, a stone aged question of mine arises: Can it be guaranteed that there is no overload, neither on the network before the splitter nor on the network behind it? In some off list discussion last year, Mark Allman pointed out that overload on both network paths (before and behind the splitter) is prevented by TCP congestion control and in split connections the splitter prevents overload by TCP flow control. E.g.: Consider one sender and the path before the splitter a 100 Mbps path, the path behind the splitter a 10 Mbps path. In fact (and that?s why I added an experimental flow control to TCP in my simulation), when the buffer at the splitter is sufficiently large after some settling time the window on the sender is correctly curtailed that way that the sender achieves an average rate of 10 Mbps. 
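The settling behaviour described here can be sketched without a full simulator. Below is a rough round-based model, not ns-2 and not any particular PEP product, with invented numbers: the splitter owns a finite buffer that drains at the 10 Mbps tail rate and advertises its free space as the receive window, so a sender on the 100 Mbps head path ends up clamped by flow control to about one tail-rate-worth of data per round trip, i.e. roughly 10 Mbps, without any loss.

# Round-based sketch of receive-window backpressure at a split connection.
# Illustrative numbers: 100 Mbps head path, 10 Mbps tail path, 10 ms RTT.
MSS       = 1500 * 8           # bits per segment
BUF       = 64 * MSS           # splitter buffer size, in bits
TAIL_RATE = 10e6               # bits/s drained towards the client
RTT       = 0.01               # seconds, sender <-> splitter

buffered = 0.0                 # bits currently held at the splitter
cwnd     = 2 * MSS             # sender congestion window, in bits

for rnd in range(40):
    rwnd = BUF - buffered                   # advertised window = free buffer space
    sent = min(cwnd, rwnd)                  # sender limited by min(cwnd, rwnd)
    buffered += sent
    buffered = max(0.0, buffered - TAIL_RATE * RTT)   # tail drains for one RTT
    cwnd += MSS                             # keep probing; no loss in this toy
    if rnd % 5 == 0:
        print("round %2d   send rate ~ %4.1f Mbps   rwnd %7.0f bits"
              % (rnd, sent / RTT / 1e6, rwnd))

With a sufficiently large buffer nothing is ever dropped in this toy, which matches the observation that flow control alone can do the clamping; how well that carries over to many senders with very different head-path bandwidths is exactly the question raised below.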
NB: I think I have to re-read the thesis by Rajiv Chakaravorthy on this issue because it?s the question whether we need window clamping techniques. It seems that any necessary clamping can be achieved by the existing flow control and congestion control mechanisms of TCP. Now, if there is an array of senders, denoted as sender(i), we have basically three scenarios. 1.: The common bottleneck is _before_ the splitter. Perfect. The splitter is fed slower than it is served. We?re lucky. 2.: The common bottleneck is _behind_ the splitter. Perfect. The splitter assigns an equal share of bandwidth to each flow and throttles the senders by means of flow control if necessary. We are lucky again. (I don?t know why we need Christmas in the presence of so much luck.) 3.: There is no "common bottleneck". I have to explain this because at a first glance, this appears to be nonsense: Either the path head, i.e. the part before the splitter, or the path tail, i.e. the part behind the splitter, should be the common bottleneck if there is one. Consider the path tail running with 10 Mbps. Consider sender(0) being capable of sending at 100 Mbps. Consider sender(1) being capable of sending at 2 Mbps. If sender(0) runs alone, the bottleneck is the path tail. If sender(1) runs alone, the bottleneck is the path head. Consider both senders are running in parallel. What will happen in the presence of a splitter? Again we have two cases. 1.: The path tail runs TCP or some other window controlled protocol which maintains available ressource. Hopefully, the flow from sender(1) will achieve something about 2 Mbps and the flow from sender(0) will get the rest. I don?t know. I?m actually trying to find out. 2.: The path tail runs some rate controlled protocol as it would make sense e.g. in satellite connections where the startup behaviour of TCP is extremely annoying. Now: How will this protocol distrute the availabe ressources among the flows? One could give equal shares to them, i.e. 5 Mbps to each. However, because the flow from sender(1) cannot send faster than 2 Mbps, 3 Mbps would remain unused. Particularly the latter scenario seems to be some kind of "end to end" problem: The splitter node does not know the end to end ressource situation and thus have to leave the distribution of ressources to the end nodes. Do I go wrong here? Any comments are highly appreciated. From Anil.Agarwal at viasat.com Wed Dec 27 07:55:49 2006 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Wed, 27 Dec 2006 10:55:49 -0500 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> Detlef, You wrote > > One can construct many other similar scenarios, where one > connection > > is selectively favored over another. Perhaps, one more > reason to use RED. > > > I?m not quite sure about the relationship to RED here. (In > fact, I still have no personal opinion to RED, however I have > numerous questions to RED, but I think that?s not the question here.) RED will probabilistically discard packets before the queue gets full, resulting in packet discards for both connections. RED will be biased against the larger connection. Eventually, both TCP connections will reach the same (statistical) rate and experience the same packet drop rate. > > Again we have two cases. > > 1.: The path tail runs TCP or some other window controlled protocol > which maintains available ressource. 
Hopefully, the flow from > sender(1) > will achieve something about 2 Mbps and the flow from > sender(0) will get > the rest. I don?t know. I?m actually trying to find out. > > 2.: The path tail runs some rate controlled protocol as it would make > sense e.g. in satellite connections where the startup > behaviour of TCP > is extremely annoying. Now: How will this protocol distrute > the availabe > ressources among the flows? One could give equal shares to > them, i.e. 5 > Mbps to each. > However, because the flow from sender(1) cannot send faster > than 2 Mbps, > 3 Mbps would remain unused. > > Particularly the latter scenario seems to be some kind of > "end to end" > problem: The splitter node does not know the end to end ressource > situation and thus have to leave the distribution of > ressources to the > end nodes. > > Do I go wrong here? A "good" TCP-splitter should produce correct (desired) results in this scenario and many others. It should produce correct results with 2 or 200 connections, with 2 or 200 different network segments and bottlenecks, some before and some after the TCP-splitter; it should produce correct results when the amount of bandwidth available over the various network segments (especially over the satellite network segment) is variable and not known a priori; it should produce correct results when there is cross traffic on the bottleneck links in the network segments, which does not traverse the TCP-splitter. A TCP-splitter that "splits" bandwidth "equally" among various connections will not make the cut. Hint: TCP by itself does a commendable, although not perfect, job of meeting the above scenarios. I don't know how well other commercial TCP-splitters perform in these scenarios, but I can speak for one - we use our (my) own TCP PEP product over our VSAT networks; it works fairly well over many such scenarios (not quite 200 network segments :)). I tried out a few more scenarios, including yours, since your last email. Regards, Anil From detlef.bosau at web.de Wed Dec 27 09:47:52 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 27 Dec 2006 18:47:52 +0100 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> Message-ID: <4592B1C8.5040601@web.de> Agarwal, Anil wrote: >> >> I?m not quite sure about the relationship to RED here. (In >> fact, I still have no personal opinion to RED, however I have >> numerous questions to RED, but I think that?s not the question here.) >> > > RED will probabilistically discard packets before the queue gets full, resulting in packet discards for both connections. RED will be biased against the larger connection. Eventually, both TCP connections will reach the same (statistical) rate and experience the same packet drop rate. > However, the latter is no consequence of RED but the basic goal of AIMD. > >> Again we have two cases. >> >> 1.: The path tail runs TCP or some other window controlled protocol >> which maintains available ressource. Hopefully, the flow from >> sender(1) >> will achieve something about 2 Mbps and the flow from >> sender(0) will get >> the rest. I don?t know. I?m actually trying to find out. >> >> 2.: The path tail runs some rate controlled protocol as it would make >> sense e.g. in satellite connections where the startup >> behaviour of TCP >> is extremely annoying. 
Now: How will this protocol distrute >> the availabe >> ressources among the flows? One could give equal shares to >> them, i.e. 5 >> Mbps to each. >> However, because the flow from sender(1) cannot send faster >> than 2 Mbps, >> 3 Mbps would remain unused. >> >> Particularly the latter scenario seems to be some kind of >> "end to end" >> problem: The splitter node does not know the end to end ressource >> situation and thus have to leave the distribution of >> ressources to the >> end nodes. >> >> Do I go wrong here? >> > > A "good" TCP-splitter should produce correct (desired) results in this scenario and many others. It should produce correct results with 2 or 200 connections, with 2 or 200 different network segments and bottlenecks, some before and some after the TCP-splitter; it should produce correct results when the amount of bandwidth available over the various network segments (especially over the satellite network segment) is variable and not known a priori; it should produce correct results when there is cross traffic on the bottleneck links in the network segments, which does not traverse the TCP-splitter. > It should. However I find it not easy to understand how this is achieved in each individual case. I even do not know that much literature on this issue. I know the works of Rajiv Chakravorthy and Rami Mukhtar, but I?m not aware of more works in this area. If you know some additional work, I would appreciate any hint. > A TCP-splitter that "splits" bandwidth "equally" among various connections will not make the cut. > > Hint: TCP by itself does a commendable, although not perfect, job of meeting the above scenarios. > > Yes, of course. However, introducing a splitter into a path has consequences for the end-to-end semantics of ACK ackets and thus for TCP selfclocking, the RTT, the path capacity as seen by the sender. So, if TCP does a good job in the absence of a splitter it is not clear by itself, that id did a good job in the presence of a splitter ;-) > I don't know how well other commercial TCP-splitters perform in these scenarios, but I can speak for one - we use our (my) own TCP PEP product over our VSAT networks; it works fairly well over many such scenarios (not quite 200 network segments :)). I tried out a few more scenarios, including yours, since your last email. > > Do you have any literature on this one? It?s sometimes a pity that commercial products are often poorly documented. This might be understood from a commercial point of view where it is always a concern to get an advantage about possible competitors. But it?s quite insatisfactory for a scientific discussion. In addition: I?m generally quite reluctant towards simulation. I think it?s always better to understand why a middlebox/splitter does or does not behave in a certain way. I often read papers where it is not quite clear why certain scenarios are chosen for simulations and other ones are left out. However, simulation can help to understand a certain behaviour. So, what I?m mostly interested in is the rationale behind the statement that a certain scenario will work. Detlef From detlef.bosau at web.de Thu Dec 28 11:10:32 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 28 Dec 2006 20:10:32 +0100 Subject: [e2e] Commercial splitters and end to end issues. 
In-Reply-To: <4592B1C8.5040601@web.de> References: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> <4592B1C8.5040601@web.de> Message-ID: <459416A8.6020005@web.de> In our recent discussion it was said: > >> I don't know how well other commercial TCP-splitters perform in these >> scenarios, but I can speak for one - we use our (my) own TCP PEP >> product over our VSAT networks; it works fairly well over many such >> scenarios (not quite 200 network segments :)). I tried out a few more >> scenarios, including yours, since your last email. >> Admittedly, I?m somewhat disappointed. It is well possible that my scenario was nonsense. And it is well possible that my question was not clear. However, I always enjoy discussions where I can learn from. And now, I?m given the hint "our commercial product does this". It is not clear whether this is the case or not. Neither do I have a descriptions of the mechanisms in use. (I did not find any at the ViaSat Homepage. Only the typical whitepaper material.) Yesterday, I thought I had written a paper on this issue - and it were rejected with the simple statement: "Our commercial product does this." With no further description or hint. Although it might be no big deal to set up a simulation for connection splitting, I spent some work on it. And my questions might be stupid - at least theire are honest. And when I read the comment, a commercial product did this all without further hint, this made me sad and it made me angry. Particularly as I know similar comments from paper rejects. Perhaps, it?s my personal problem that I cannot deal with situations like this very well. Detlef From detlef.bosau at web.de Sun Dec 31 11:15:44 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 31 Dec 2006 20:15:44 +0100 Subject: [e2e] Are we doing sliding window in the Internet? Message-ID: <45980C60.9020405@web.de> Happy New Year, Miss Sophy My Dear! (Although this sketch is in Englisch, it is hardly known outside Germay to my knowledge.) I wonder whether we?re really doing sliding window in TCP connections all the time or whether a number of connections have congestion windows of only one segment, i.e. behave like stop?n wait in reality. When I assume an Ethernet like MTU, i.e. 1500 byte = 12000 bit, and 10 ms RTT the throughput is roughly 12000 bit / 10 ms = 1.2 Mbps. From this I would expect that in quite a few cases a TCP connection will have a congestion window of 1 MSS or even less. In addition, some weeks ago I read a paper, I don?t remember were, that we should reconsider and perhaps resize our MTUs to larger values for networks with large bandwidth. The rationale was simply as follows: The MTU size is always a tradeoff between overhead and jitter. From Ethernet we know that we can accept a maximum packet duration of 12000 bit / (10 Mbps) = 1.2 ms and the resultig jitter. For Gigabit Ethernet a maximum packet duration of 1.2 ms would result in a MTU size of 1500 kbyte = 1.5 Mbyte. If so, we would see "stop?n wait like" connections much more frequently than today. Is this view correct? From DMedhi at umkc.edu Sun Dec 31 14:50:59 2006 From: DMedhi at umkc.edu (Medhi, Deep) Date: Sun, 31 Dec 2006 16:50:59 -0600 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45980C60.9020405@web.de> Message-ID: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu> See John Heidemann, Katia Obraczka, and Joe Touch. "Modeling the Performance of HTTP Over Several Transport Protocols." ACM/IEEE Transactions on Networking, vol. 
5, pp. 616-630, October, 1997. This covers maximum usable window size for different transmission media. -- Deep > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org] On Behalf Of Detlef Bosau > Sent: Sunday, December 31, 2006 1:16 PM > To: end2end-interest at postel.org > Cc: Daniel Minder; frank.duerr > Subject: [e2e] Are we doing sliding window in the Internet? > > Happy New Year, Miss Sophy My Dear! > > (Although this sketch is in Englisch, it is hardly known > outside Germay to my knowledge.) > > I wonder whether we?re really doing sliding window in TCP > connections all the time or whether a number of connections > have congestion windows of only one segment, i.e. behave like > stop?n wait in reality. > > When I assume an Ethernet like MTU, i.e. 1500 byte = 12000 > bit, and 10 ms RTT the throughput is roughly 12000 bit / 10 > ms = 1.2 Mbps. > > From this I would expect that in quite a few cases a TCP > connection will have a congestion window of 1 MSS or even less. > > In addition, some weeks ago I read a paper, I don?t remember > were, that we should reconsider and perhaps resize our MTUs > to larger values for networks with large bandwidth. The > rationale was simply as follows: The MTU size is always a > tradeoff between overhead and jitter. From Ethernet we know > that we can accept a maximum packet duration of 12000 bit / (10 > Mbps) = 1.2 ms and the resultig jitter. For Gigabit Ethernet > a maximum packet duration of 1.2 ms would result in a MTU > size of 1500 kbyte = 1.5 Mbyte. > > If so, we would see "stop?n wait like" connections much more > frequently than today. > > Is this view correct? > > > > From fred at cisco.com Sun Dec 31 16:29:00 2006 From: fred at cisco.com (Fred Baker) Date: Sun, 31 Dec 2006 16:29:00 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45980C60.9020405@web.de> References: <45980C60.9020405@web.de> Message-ID: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> yes and no. A large percentage of sessions are very short - count the bytes in this email and consider how many TCP segments are required to carry it, for example, or look through your web cache to see the sizes of objects it stores. We are doing the sliding window algorithm, but it cuts very short when the TCP session abruptly closes. For longer exchanges - p2p and many others - yes, we indeed do sliding window. I don't see any reason to believe that TCPs tune themselves to have exactly RTT/MSS segments outstanding. That would be the optimal number to have ourstanding, but generally they will have the smallest of { the offered window, the sender's maximum window, and the used window at which they start dropping traffic }. If they never see loss, they can keep an incredibly large amount of data outstanding regardless of the values of RTT and MSS. I wonder where you got the notion that a typical session had a 10 ms RTT. In a LAN environment where the servers are in the same building, that is probably the case. 
But consider these rather more typical examples: across my VPN to a machine at work, across the US to MIT, and across the Atlantic to you: [stealth-10-32-244-218:~] fred% traceroute irp-view7 traceroute to irp-view7.cisco.com (171.70.65.144), 64 hops max, 40 byte packets 1 fred-vpn (10.32.244.217) 1.486 ms 1.047 ms 1.034 ms 2 n003-000-000-000.static.ge.com (3.7.12.1) 22.360 ms 20.962 ms 22.194 ms 3 10.34.251.137 (10.34.251.137) 23.559 ms 22.586 ms 22.236 ms 4 sjc20-a5-gw2 (10.34.250.78) 21.465 ms 22.544 ms 20.748 ms 5 sjc20-sbb5-gw1 (128.107.180.105) 22.294 ms 22.351 ms 22.803 ms 6 sjc20-rbb-gw5 (128.107.180.22) 21.583 ms 22.517 ms 24.190 ms 7 sjc12-rbb-gw4 (128.107.180.2) 22.115 ms 23.143 ms 21.478 ms 8 sjc5-sbb4-gw1 (171.71.241.253) 26.550 ms 23.122 ms 21.569 ms 9 sjc12-dc5-gw2 (171.71.241.66) 22.115 ms 22.435 ms 22.185 ms 10 sjc5-dc3-gw2 (171.71.243.46) 22.031 ms 21.846 ms 22.185 ms 11 irp-view7 (171.70.65.144) 22.760 ms 22.912 ms 21.941 ms [stealth-10-32-244-218:~] fred% traceroute www.mit.edu traceroute to www.mit.edu (18.7.22.83), 64 hops max, 40 byte packets 1 fred-vpn (10.32.244.217) 1.468 ms 1.108 ms 1.083 ms 2 172.16.16.1 (172.16.16.1) 11.994 ms 10.351 ms 10.858 ms 3 cbshost-68-111-47-251.sbcox.net (68.111.47.251) 9.238 ms 19.517 ms 9.857 ms 4 12.125.98.101 (12.125.98.101) 11.849 ms 11.913 ms 12.086 ms 5 gbr1-p100.la2ca.ip.att.net (12.123.28.130) 12.348 ms 11.736 ms 12.891 ms 6 tbr2-p013502.la2ca.ip.att.net (12.122.11.145) 15.071 ms 13.462 ms 13.453 ms 7 12.127.3.221 (12.127.3.221) 12.643 ms 13.761 ms 14.345 ms 8 br1-a3110s9.attga.ip.att.net (192.205.33.230) 13.842 ms 12.414 ms 12.647 ms 9 ae-32-54.ebr2.losangeles1.level3.net (4.68.102.126) 16.651 ms ae-32-56.ebr2.losangeles1.level3.net (4.68.102.190) 20.154 ms * 10 * * * 11 ae-2.ebr1.sanjose1.level3.net (4.69.132.9) 28.222 ms 24.319 ms ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2) 35.417 ms 12 ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2) 25.640 ms 22.567 ms * 13 ae-3.ebr1.denver1.level3.net (4.69.132.58) 52.275 ms 60.821 ms 54.384 ms 14 ae-3.ebr1.chicago1.level3.net (4.69.132.62) 68.285 ms ae-1-100.ebr2.denver1.level3.net (4.69.132.38) 59.113 ms 68.779 ms 15 * * * 16 * ae-7-7.car1.boston1.level3.net (4.69.132.241) 94.977 ms * 17 ae-7-7.car1.boston1.level3.net (4.69.132.241) 95.821 ms ae-11-11.car2.boston1.level3.net (4.69.132.246) 93.856 ms ae-7-7.car1.boston1.level3.net (4.69.132.241) 96.735 ms 18 ae-11-11.car2.boston1.level3.net (4.69.132.246) 91.093 ms 92.125 ms 4.79.2.2 (4.79.2.2) 95.802 ms 19 4.79.2.2 (4.79.2.2) 93.945 ms 95.336 ms 97.301 ms 20 w92-rtr-1-backbone.mit.edu (18.168.0.25) 98.246 ms www.mit.edu (18.7.22.83) 93.657 ms w92-rtr-1-backbone.mit.edu (18.168.0.25) 92.610 ms [stealth-10-32-244-218:~] fred% traceroute web.de traceroute to web.de (217.72.195.42), 64 hops max, 40 byte packets 1 fred-vpn (10.32.244.217) 1.482 ms 1.078 ms 1.093 ms 2 172.16.16.1 (172.16.16.1) 12.131 ms 9.318 ms 8.140 ms 3 cbshost-68-111-47-251.sbcox.net (68.111.47.251) 10.790 ms 9.051 ms 10.564 ms 4 12.125.98.101 (12.125.98.101) 13.580 ms 21.643 ms 12.206 ms 5 gbr2-p100.la2ca.ip.att.net (12.123.28.134) 12.446 ms 12.914 ms 12.006 ms 6 tbr2-p013602.la2ca.ip.att.net (12.122.11.149) 13.463 ms 12.711 ms 12.187 ms 7 12.127.3.213 (12.127.3.213) 185.324 ms 11.845 ms 12.189 ms 8 192.205.33.226 (192.205.33.226) 12.008 ms 11.665 ms 25.390 ms 9 ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65) 13.695 ms ae-1-51.bbr1.losangeles1.level3.net (4.68.102.1) 11.645 ms ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65) 12.517 ms 10 
ae-1-0.bbr1.frankfurt1.level3.net (212.187.128.30) 171.886 ms as-2-0.bbr2.frankfurt1.level3.net (4.68.128.169) 167.640 ms 168.895 ms 11 ge-10-0.ipcolo1.frankfurt1.level3.net (4.68.118.9) 170.336 ms ge-11-1.ipcolo1.frankfurt1.level3.net (4.68.118.105) 174.211 ms ge-10-1.ipcolo1.frankfurt1.level3.net (4.68.118.73) 169.730 ms 12 gw-megaspace.frankfurt.eu.level3.net (212.162.44.158) 169.276 ms 170.110 ms 168.099 ms 13 te-2-3.gw-backbone-d.bs.ka.schlund.net (212.227.120.17) 171.412 ms 171.820 ms 170.265 ms 14 a0kac2.gw-distwe-a.bs.ka.schlund.net (212.227.121.218) 175.416 ms 173.653 ms 174.007 ms 15 ha-42.web.de (217.72.195.42) 174.908 ms 174.921 ms 175.821 ms On Dec 31, 2006, at 11:15 AM, Detlef Bosau wrote: > Happy New Year, Miss Sophy My Dear! > > (Although this sketch is in Englisch, it is hardly known outside > Germay to my knowledge.) > > I wonder whether we?re really doing sliding window in TCP > connections all the time or whether a number of connections have > congestion windows of only one segment, i.e. behave like stop?n > wait in reality. > > When I assume an Ethernet like MTU, i.e. 1500 byte = 12000 bit, > and 10 ms RTT the throughput is roughly 12000 bit / 10 ms = 1.2 Mbps. > > From this I would expect that in quite a few cases a TCP connection > will have a congestion window of 1 MSS or even less. > > In addition, some weeks ago I read a paper, I don?t remember were, > that we should reconsider and perhaps resize our MTUs to larger > values for networks with large bandwidth. The rationale was simply > as follows: The MTU size is always a tradeoff between overhead and > jitter. From Ethernet we know that we can accept a maximum packet > duration of 12000 bit / (10 Mbps) = 1.2 ms and the resultig > jitter. For Gigabit Ethernet > a maximum packet duration of 1.2 ms would result in a MTU size of > 1500 kbyte = 1.5 Mbyte. > > If so, we would see "stop?n wait like" connections much more > frequently than today. > > Is this view correct? >
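Putting numbers on both halves of this exchange: the sketch below computes the window needed to keep a path full (bandwidth x RTT / MSS) and the rate a one-segment, stop-and-wait style connection would reach, for Detlef's 10 ms example and for RTTs of roughly the size measured in the traceroutes above (about 22 ms, 95 ms and 175 ms). It also redoes the jumbo-MTU arithmetic: holding the 1.2 ms Ethernet serialization time constant at 1 Gbps gives about 1.2 Mbit, i.e. roughly 150 kbyte per frame. Everything here is plain arithmetic on figures already quoted in the thread; the 10 Mbps example rate is an arbitrary choice.

# Back-of-the-envelope numbers for the "sliding window or stop-and-wait" question.
MSS_BITS = 1500 * 8                       # one Ethernet-sized segment

def window_in_segments(rate_bps, rtt_s):
    return rate_bps * rtt_s / MSS_BITS    # segments in flight needed to fill the pipe

def one_segment_rate(rtt_s):
    return MSS_BITS / rtt_s               # throughput of a 1-MSS window, bits/s

for label, rtt in [("LAN/VPN       ~22 ms", 0.022),
                   ("cross-US      ~95 ms", 0.095),
                   ("transatlantic ~175 ms", 0.175)]:
    print("%s : 1 MSS -> %4.2f Mbps, filling 10 Mbps needs %5.1f segments"
          % (label, one_segment_rate(rtt) / 1e6, window_in_segments(10e6, rtt)))

# Detlef's example: one 12000-bit segment every 10 ms is 1.2 Mbps.
print("10 ms RTT, 1 MSS window -> %.2f Mbps" % (one_segment_rate(0.010) / 1e6))

# MTU that keeps the 1.2 ms serialization time of a 1500-byte frame at 10 Mbps
# when the link instead runs at 1 Gbps:
print("1.2 ms at 1 Gbps        -> %.0f kbyte per frame" % (1e9 * 0.0012 / 8 / 1e3))

Which of the limits in Fred's list actually binds (the offered window, the sender's maximum window, or the loss point) is a separate matter; the arithmetic only says how large the window would have to be for the path, rather than the window, to be the limit.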