From skandor at gmail.com Thu Sep 3 13:23:52 2009 From: skandor at gmail.com (A.B.) Date: Thu, 3 Sep 2009 17:23:52 -0300 Subject: [e2e] a future for circuits? Message-ID: <25f9e2130909031323i1e4c1042xd7f907804fd85f51@mail.gmail.com> Hi, There has been a lot of attention in the last years, within the academic networks community, for hybrid networks that can carry both IP (layer3) traffic as layer2 (Ethernet VLANs) ?pseudo-circuits?. Can we see this as a sign that a future public internet will widely provide both kinds of service? Regards, - a.b. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090903/07d2d815/attachment.html From Jon.Crowcroft at cl.cam.ac.uk Fri Sep 4 03:09:49 2009 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Fri, 04 Sep 2009 11:09:49 +0100 Subject: [e2e] a future for circuits? In-Reply-To: <25f9e2130909031323i1e4c1042xd7f907804fd85f51@mail.gmail.com> References: <25f9e2130909031323i1e4c1042xd7f907804fd85f51@mail.gmail.com> Message-ID: well, its clear that 100% isolation properties are needed between some global organisations using a shared infrastructure both in performance and in access control - the claim is that i) sharing the underlying transmission and switching is a gain (e.g. for operations&management) even when you don't get ii) the statistical multiplexing gain from all the elastic applications (I'm not sure the claim is 100% proven) certainly I can see a _massive_ number of virtual private enclaves in terms of access control (i'd like to manage all the devices in my house from wherever I am but not have anyone else get at any of them - so I have an android phone on IP - I should be able to do this from anywhere on the planet - i don't need hard multiplexing though - just secure VPNs will do fine-) one vision (e.g. BT's 21C network) has every household having a swiched ethernets worth of capacity to anywhere (unspecified wither its 10, 100, 1G or whatever), and there are management simplifications if this maps to a sub-lambda mux on the optical core and (slowly being deployed) access networks, purely in hardware... but i dont see soggy, fluffy, packet switching going away.... In missive <25f9e2130909031323i1e4c1042xd7f907804fd85f51 at mail.gmail.com>, "A .B." typed: >>--0015175cd0788b0a1c0472b2294d >>Content-Type: text/plain; charset=windows-1252 >>Content-Transfer-Encoding: quoted-printable >> >>Hi, >> >>There has been a lot of attention in the last years, within the academic >>networks community, for hybrid networks that can carry both IP (layer3) >>traffic as layer2 (Ethernet VLANs) =93pseudo-circuits=94. >> >>Can we see this as a sign that a future public internet will widely provid= >>e >>both kinds of service? >> >> >> >>Regards, >> >> >> >>- a.b. >> >>--0015175cd0788b0a1c0472b2294d >>Content-Type: text/html; charset=windows-1252 >>Content-Transfer-Encoding: quoted-printable >> >> >> >>

Hi,
>>
>>There has been a lot of attention in the last years, within the academic >>networks community, for hybrid networks that can carry both IP (layer3) tra= >>ffic >>as layer2 (Ethernet VLANs) =93pseudo-circuits=94.

>> >>

Can we see >>this as a sign that =A0a future public internet >>will widely provide both kinds of service?

>> >>

=A0

>> >>

Regards,

>> >>

=A0

>> >>

->Roman"; font-style: normal; font-variant: normal; font-weight: normal;= >> font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch:= >> normal;">=A0=A0=A0=A0=A0=A0=A0=A0=A0 >>a.b. >>

>> >> >>--0015175cd0788b0a1c0472b2294d-- cheers jon From dpreed at reed.com Sun Sep 6 18:00:16 2009 From: dpreed at reed.com (David P. Reed) Date: Sun, 06 Sep 2009 21:00:16 -0400 Subject: [e2e] What's wrong with this picture? Message-ID: <4AA45B20.6030705@reed.com> For those who have some idea of how TCP does congestion control, I ask "what's wrong with this picture?" And perhaps those who know someone responsible at the Internet Access Provider involved, perhaps we could organize some consulting help... (Hint: the problem relates to a question, "why are there no lost IP datagrams?", and a second hint is that the ping time this morning was about 193 milliseconds.) Van Jacobsen, Scott Shenker, and Sally Floyd are not allowed to answer the question. (they used to get funding from the IAP involved, but apparently that company does not listen to them). $ ping lcs.mit.edu PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 time=6330 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 time=6005 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 time=8509 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 time=9310 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 time=8586 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 time=7765 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 time=7168 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 time=10261 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 time=10624 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 time=9625 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 time=9725 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=12 ttl=44 time=8725 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=13 ttl=44 time=9306 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=14 ttl=44 time=8306 ms ^C --- lcs.mit.edu ping statistics --- 24 packets transmitted, 14 received, 41% packet loss, time 33174ms rtt min/avg/max/mdev = 6005.237/8589.365/10624.776/1334.140 ms, pipe 11 $ traceroute lcs.mit.edu traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets 1 * * * 2 172.26.248.2 (172.26.248.2) 693.585 ms 693.415 ms 712.282 ms 3 * * * 4 172.16.192.18 (172.16.192.18) 712.700 ms 1356.680 ms 1359.469 ms 5 12.88.7.205 (12.88.7.205) 1361.306 ms 673.642 ms 673.541 ms 6 cr84.cgcil.ip.att.net (12.122.152.134) 673.442 ms 673.371 ms 673.742 ms 7 cr2.cgcil.ip.att.net (12.123.7.250) 655.126 ms 654.186 ms 554.690 ms 8 * * ggr2.cgcil.ip.att.net (12.122.132.133) 912.385 ms 9 192.205.33.210 (192.205.33.210) 909.925 ms 911.335 ms 911.204 ms 10 ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94) 569.740 ms 569.605 ms 907.409 ms 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 369.680 ms 344.495 ms 345.252 ms 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 355.645 ms 641.866 ms 641.367 ms 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 636.598 ms 636.797 ms 635.755 ms 14 B24-RTR-2-BACKBONE-2.MIT.EDU (18.168.1.23) 635.766 ms 634.794 ms 866.430 ms 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 758.305 ms 822.244 ms 821.202 ms 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 833.699 ms 1055.548 ms 1116.813 ms 17 zermatt.csail.mit.edu (128.30.2.121) 1114.838 ms 539.951 ms 620.681 ms [david at whimsy ~]$ ping 172.26.248.2 PING 172.26.248.2 (172.26.248.2) 56(84) bytes of data. 64 bytes from 172.26.248.2: icmp_seq=1 ttl=254 time=1859 ms 64 bytes from 172.26.248.2: icmp_seq=2 ttl=254 time=1363 ms 64 bytes from 172.26.248.2: icmp_seq=3 ttl=254 time=1322 ms 64 bytes from 172.26.248.2: icmp_seq=4 ttl=254 time=1657 ms 64 bytes from 172.26.248.2: icmp_seq=5 ttl=254 time=1725 ms 64 bytes from 172.26.248.2: icmp_seq=6 ttl=254 time=1740 ms 64 bytes from 172.26.248.2: icmp_seq=7 ttl=254 time=1838 ms 64 bytes from 172.26.248.2: icmp_seq=8 ttl=254 time=1738 ms 64 bytes from 172.26.248.2: icmp_seq=9 ttl=254 time=1517 ms 64 bytes from 172.26.248.2: icmp_seq=10 ttl=254 time=978 ms 64 bytes from 172.26.248.2: icmp_seq=11 ttl=254 time=715 ms 64 bytes from 172.26.248.2: icmp_seq=12 ttl=254 time=678 ms 64 bytes from 172.26.248.2: icmp_seq=13 ttl=254 time=638 ms 64 bytes from 172.26.248.2: icmp_seq=14 ttl=254 time=761 ms ^C --- 172.26.248.2 ping statistics --- 15 packets transmitted, 14 received, 6% packet loss, time 14322ms rtt min/avg/max/mdev = 638.651/1324.002/1859.725/455.200 ms, pipe 2 $ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090906/5019446c/attachment.html From xaixili at live.com Sun Sep 6 23:43:56 2009 From: xaixili at live.com (Xai Xi) Date: Mon, 7 Sep 2009 06:43:56 +0000 Subject: [e2e] load balancing Message-ID: Hello e2e veterans, Do you recall any early attempts to combine traffic load balancing and routing in the same distributed process? Were there any deployments? Thanks, Xai. _________________________________________________________________ Drag n? drop?Get easy photo sharing with Windows Live? Photos. http://www.microsoft.com/windows/windowslive/products/photos.aspx -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/2f6939d1/attachment.html From sgros at zemris.fer.hr Mon Sep 7 00:21:37 2009 From: sgros at zemris.fer.hr (Stjepan Gros) Date: Mon, 07 Sep 2009 09:21:37 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA45B20.6030705@reed.com> References: <4AA45B20.6030705@reed.com> Message-ID: <1252308097.6257.9.camel@fedora> On Sun, 2009-09-06 at 21:00 -0400, David P. Reed wrote: > For those who have some idea of how TCP does congestion control, I ask > "what's wrong with this picture?" And perhaps those who know someone > responsible at the Internet Access Provider involved, perhaps we could > organize some consulting help... > > (Hint: the problem relates to a question, "why are there no lost IP > datagrams?", and a second hint is that the ping time this morning was > about 193 milliseconds.) > > Van Jacobsen, Scott Shenker, and Sally Floyd are not allowed to answer > the question. (they used to get funding from the IAP involved, but > apparently that company does not listen to them). > > $ ping lcs.mit.edu > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > time=6330 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > time=6005 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 > time=8509 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 > time=9310 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 > time=8586 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 > time=7765 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 > time=7168 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 > time=10261 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 > time=10624 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 > time=9625 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 > time=9725 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=12 ttl=44 > time=8725 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=13 ttl=44 > time=9306 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=14 ttl=44 > time=8306 ms > ^C > --- lcs.mit.edu ping statistics --- > 24 packets transmitted, 14 received, 41% packet loss, time 33174ms > rtt min/avg/max/mdev = 6005.237/8589.365/10624.776/1334.140 ms, pipe > 11 > $ traceroute lcs.mit.edu > traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets > 1 * * * > 2 172.26.248.2 (172.26.248.2) 693.585 ms 693.415 ms 712.282 ms > 3 * * * > 4 172.16.192.18 (172.16.192.18) 712.700 ms 1356.680 ms 1359.469 > ms > 5 12.88.7.205 (12.88.7.205) 1361.306 ms 673.642 ms 673.541 ms > 6 cr84.cgcil.ip.att.net (12.122.152.134) 673.442 ms 673.371 ms > 673.742 ms > 7 cr2.cgcil.ip.att.net (12.123.7.250) 655.126 ms 654.186 ms > 554.690 ms > 8 * * ggr2.cgcil.ip.att.net (12.122.132.133) 912.385 ms > 9 192.205.33.210 (192.205.33.210) 909.925 ms 911.335 ms 911.204 > ms > 10 ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94) 569.740 ms > 569.605 ms 907.409 ms > 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 369.680 ms 344.495 > ms 345.252 ms > 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 355.645 ms 641.866 > ms 641.367 ms > 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 636.598 ms > 636.797 ms 635.755 ms > 14 B24-RTR-2-BACKBONE-2.MIT.EDU (18.168.1.23) 635.766 ms 634.794 ms > 866.430 ms > 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 758.305 ms 822.244 ms > 821.202 ms > 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 833.699 ms 1055.548 > ms 1116.813 ms > 17 zermatt.csail.mit.edu (128.30.2.121) 1114.838 ms 539.951 ms > 620.681 ms > [david at whimsy ~]$ ping 172.26.248.2 > PING 172.26.248.2 (172.26.248.2) 56(84) bytes of data. > 64 bytes from 172.26.248.2: icmp_seq=1 ttl=254 time=1859 ms > 64 bytes from 172.26.248.2: icmp_seq=2 ttl=254 time=1363 ms > 64 bytes from 172.26.248.2: icmp_seq=3 ttl=254 time=1322 ms > 64 bytes from 172.26.248.2: icmp_seq=4 ttl=254 time=1657 ms > 64 bytes from 172.26.248.2: icmp_seq=5 ttl=254 time=1725 ms > 64 bytes from 172.26.248.2: icmp_seq=6 ttl=254 time=1740 ms > 64 bytes from 172.26.248.2: icmp_seq=7 ttl=254 time=1838 ms > 64 bytes from 172.26.248.2: icmp_seq=8 ttl=254 time=1738 ms > 64 bytes from 172.26.248.2: icmp_seq=9 ttl=254 time=1517 ms > 64 bytes from 172.26.248.2: icmp_seq=10 ttl=254 time=978 ms > 64 bytes from 172.26.248.2: icmp_seq=11 ttl=254 time=715 ms > 64 bytes from 172.26.248.2: icmp_seq=12 ttl=254 time=678 ms > 64 bytes from 172.26.248.2: icmp_seq=13 ttl=254 time=638 ms > 64 bytes from 172.26.248.2: icmp_seq=14 ttl=254 time=761 ms > ^C > --- 172.26.248.2 ping statistics --- > 15 packets transmitted, 14 received, 6% packet loss, time 14322ms > rtt min/avg/max/mdev = 638.651/1324.002/1859.725/455.200 ms, pipe 2 > $ My guess: the large RTT values cause the network path to have large bandwidth delay product. TCP, in the absence of the packet loss, tries to fill it, causing even more congestion? SG From Jon.Crowcroft at cl.cam.ac.uk Mon Sep 7 01:38:06 2009 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Mon, 07 Sep 2009 09:38:06 +0100 Subject: [e2e] load balancing In-Reply-To: References: Message-ID: the boldest attempt at this (I think) that was sane was http://tools.ietf.org/html/draft-ietf-ospf-omp-02 but it is 10 years ago, and i dont think people have done much with it since (but it was cute!) the idea of resource pooling is part of the trilogy project and is now being discussed in the ietf a lot - some background is at https://fit.nokia.com/lars/talks/2009-keio-trilogy.pdf and www.ietf.org/proceedings/72/slides/tsvarea-2.pdf modeling results by kelly et al, and key/massoulie et al show that this is more robust since relatively small numbers of pseudo-random choices of disjoint or partially link disjoint paths can get most of the gain of a pure optimisation approach... In missive , Xai Xi typed: >> >>Hello e2e veterans=2C >> >>Do you recall any early attempts to combine traffic load balancing and rout= >>ing in the same distributed process? Were there any deployments? >> >>Thanks=2C >>Xai. >> >> >>_________________________________________________________________ >>Drag n=92 drop=97Get easy photo sharing with Windows Live=99 Photos. >> >>http://www.microsoft.com/windows/windowslive/products/photos.aspx= >> >>--_ad54f11f-0e1a-418c-8930-a725f8990c78_ >>Content-Type: text/html; charset="Windows-1252" >>Content-Transfer-Encoding: quoted-printable >> >> >> >> >> >> >>Hello e2e veterans=2C

Do you recall any early attempts to combine tr= >>affic load balancing and routing in the same distributed process? Were ther= >>e any deployments?

Thanks=2C
Xai.



What can you= >> do with the new Windows Live? >windowslive/default.aspx' target=3D'_new'>Find out >>= >> >>--_ad54f11f-0e1a-418c-8930-a725f8990c78_-- cheers jon From Jon.Crowcroft at cl.cam.ac.uk Mon Sep 7 01:39:11 2009 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Mon, 07 Sep 2009 09:39:11 +0100 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA45B20.6030705@reed.com> References: <4AA45B20.6030705@reed.com> Message-ID: so you've got a slow uplink and fast downlink - it doesn't look like xDSL - is it a really overloaded 3G or 2.5G link? (if there's congestion in the "air" interface, you might get this sort of thing...) In missive <4AA45B20.6030705 at reed.com>, "David P. Reed" typed: >>This is a multi-part message in MIME format. >>--------------020400030709090500010400 >>Content-Type: text/plain; charset=ISO-8859-1; format=flowed >>Content-Transfer-Encoding: 7bit >> >>For those who have some idea of how TCP does congestion control, I ask >>"what's wrong with this picture?" And perhaps those who know someone >>responsible at the Internet Access Provider involved, perhaps we could >>organize some consulting help... >> >>(Hint: the problem relates to a question, "why are there no lost IP >>datagrams?", and a second hint is that the ping time this morning was >>about 193 milliseconds.) >> >>Van Jacobsen, Scott Shenker, and Sally Floyd are not allowed to answer >>the question. (they used to get funding from the IAP involved, but >>apparently that company does not listen to them). >> >>$ ping lcs.mit.edu >>PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 >>time=6330 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 >>time=6005 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 >>time=8509 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 >>time=9310 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 >>time=8586 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 >>time=7765 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 >>time=7168 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 >>time=10261 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 >>time=10624 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 >>time=9625 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 >>time=9725 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=12 ttl=44 >>time=8725 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=13 ttl=44 >>time=9306 ms >>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=14 ttl=44 >>time=8306 ms >>^C >>--- lcs.mit.edu ping statistics --- >>24 packets transmitted, 14 received, 41% packet loss, time 33174ms >>rtt min/avg/max/mdev = 6005.237/8589.365/10624.776/1334.140 ms, pipe 11 >>$ traceroute lcs.mit.edu >>traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets >> 1 * * * >> 2 172.26.248.2 (172.26.248.2) 693.585 ms 693.415 ms 712.282 ms >> 3 * * * >> 4 172.16.192.18 (172.16.192.18) 712.700 ms 1356.680 ms 1359.469 ms >> 5 12.88.7.205 (12.88.7.205) 1361.306 ms 673.642 ms 673.541 ms >> 6 cr84.cgcil.ip.att.net (12.122.152.134) 673.442 ms 673.371 ms >>673.742 ms >> 7 cr2.cgcil.ip.att.net (12.123.7.250) 655.126 ms 654.186 ms 554.690 ms >> 8 * * ggr2.cgcil.ip.att.net (12.122.132.133) 912.385 ms >> 9 192.205.33.210 (192.205.33.210) 909.925 ms 911.335 ms 911.204 ms >>10 ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94) 569.740 ms 569.605 >>ms 907.409 ms >>11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 369.680 ms 344.495 >>ms 345.252 ms >>12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 355.645 ms 641.866 >>ms 641.367 ms >>13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 636.598 ms >>636.797 ms 635.755 ms >>14 B24-RTR-2-BACKBONE-2.MIT.EDU (18.168.1.23) 635.766 ms 634.794 ms >>866.430 ms >>15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 758.305 ms 822.244 ms >>821.202 ms >>16 trantor.kalgan.csail.mit.edu (128.30.0.246) 833.699 ms 1055.548 >>ms 1116.813 ms >>17 zermatt.csail.mit.edu (128.30.2.121) 1114.838 ms 539.951 ms >>620.681 ms >>[david at whimsy ~]$ ping 172.26.248.2 >>PING 172.26.248.2 (172.26.248.2) 56(84) bytes of data. >>64 bytes from 172.26.248.2: icmp_seq=1 ttl=254 time=1859 ms >>64 bytes from 172.26.248.2: icmp_seq=2 ttl=254 time=1363 ms >>64 bytes from 172.26.248.2: icmp_seq=3 ttl=254 time=1322 ms >>64 bytes from 172.26.248.2: icmp_seq=4 ttl=254 time=1657 ms >>64 bytes from 172.26.248.2: icmp_seq=5 ttl=254 time=1725 ms >>64 bytes from 172.26.248.2: icmp_seq=6 ttl=254 time=1740 ms >>64 bytes from 172.26.248.2: icmp_seq=7 ttl=254 time=1838 ms >>64 bytes from 172.26.248.2: icmp_seq=8 ttl=254 time=1738 ms >>64 bytes from 172.26.248.2: icmp_seq=9 ttl=254 time=1517 ms >>64 bytes from 172.26.248.2: icmp_seq=10 ttl=254 time=978 ms >>64 bytes from 172.26.248.2: icmp_seq=11 ttl=254 time=715 ms >>64 bytes from 172.26.248.2: icmp_seq=12 ttl=254 time=678 ms >>64 bytes from 172.26.248.2: icmp_seq=13 ttl=254 time=638 ms >>64 bytes from 172.26.248.2: icmp_seq=14 ttl=254 time=761 ms >>^C >>--- 172.26.248.2 ping statistics --- >>15 packets transmitted, 14 received, 6% packet loss, time 14322ms >>rtt min/avg/max/mdev = 638.651/1324.002/1859.725/455.200 ms, pipe 2 >>$ >> >> >>--------------020400030709090500010400 >>Content-Type: text/html; charset=ISO-8859-1 >>Content-Transfer-Encoding: 7bit >> >> >> >> >> >> >> >> >>For those who have some idea >>of how TCP does congestion control, I ask "what's wrong with this >>picture?"  And perhaps those who know someone responsible at the >>Internet Access Provider involved, perhaps we could organize some >>consulting help...
>>
>>(Hint: the problem relates to a question, "why are there no lost IP >>datagrams?", and a second hint is that the ping time this morning was >>about 193 milliseconds.)
>>
>>Van Jacobsen, Scott Shenker, and Sally Floyd are not allowed to answer >>the question.  (they used to get funding from the IAP involved, but >>apparently that company does not listen to them).
>>
>>$ ping lcs.mit.edu
>>PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data.
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 >>time=6330 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 >>time=6005 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 >>time=8509 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 >>time=9310 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 >>time=8586 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 >>time=7765 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 >>time=7168 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 >>time=10261 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 >>time=10624 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 >>time=9625 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 >>time=9725 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=12 ttl=44 >>time=8725 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=13 ttl=44 >>time=9306 ms
>>64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=14 ttl=44 >>time=8306 ms
>>^C
>>--- lcs.mit.edu ping statistics ---
>>24 packets transmitted, 14 received, 41% packet loss, time 33174ms
>>rtt min/avg/max/mdev = 6005.237/8589.365/10624.776/1334.140 ms, pipe 11
>>$ traceroute lcs.mit.edu
>>traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets
>> 1  * * *
>> 2  172.26.248.2 (172.26.248.2)  693.585 ms  693.415 ms  712.282 ms
>> 3  * * *
>> 4  172.16.192.18 (172.16.192.18)  712.700 ms  1356.680 ms  1359.469 ms
>> 5  12.88.7.205 (12.88.7.205)  1361.306 ms  673.642 ms  673.541 ms
>> 6  cr84.cgcil.ip.att.net (12.122.152.134)  673.442 ms  673.371 ms  >>673.742 ms
>> 7  cr2.cgcil.ip.att.net (12.123.7.250)  655.126 ms  654.186 ms  >>554.690 ms
>> 8  * * ggr2.cgcil.ip.att.net (12.122.132.133)  912.385 ms
>> 9  192.205.33.210 (192.205.33.210)  909.925 ms  911.335 ms  911.204 ms
>>10  ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94)  569.740 ms  >>569.605 ms  907.409 ms
>>11  ae-1-5.bar1.Boston1.Level3.net (4.69.140.93)  369.680 ms  344.495 >>ms  345.252 ms
>>12  ae-7-7.car1.Boston1.Level3.net (4.69.132.241)  355.645 ms  641.866 >>ms  641.367 ms
>>13  MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98)  636.598 ms  >>636.797 ms  635.755 ms
>>14  B24-RTR-2-BACKBONE-2.MIT.EDU (18.168.1.23)  635.766 ms  634.794 ms  >>866.430 ms
>>15  MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65)  758.305 ms  822.244 ms  >>821.202 ms
>>16  trantor.kalgan.csail.mit.edu (128.30.0.246)  833.699 ms  1055.548 >>ms  1116.813 ms
>>17  zermatt.csail.mit.edu (128.30.2.121)  1114.838 ms  539.951 ms  >>620.681 ms
>>[david at whimsy ~]$ ping 172.26.248.2
>>PING 172.26.248.2 (172.26.248.2) 56(84) bytes of data.
>>64 bytes from 172.26.248.2: icmp_seq=1 ttl=254 time=1859 ms
>>64 bytes from 172.26.248.2: icmp_seq=2 ttl=254 time=1363 ms
>>64 bytes from 172.26.248.2: icmp_seq=3 ttl=254 time=1322 ms
>>64 bytes from 172.26.248.2: icmp_seq=4 ttl=254 time=1657 ms
>>64 bytes from 172.26.248.2: icmp_seq=5 ttl=254 time=1725 ms
>>64 bytes from 172.26.248.2: icmp_seq=6 ttl=254 time=1740 ms
>>64 bytes from 172.26.248.2: icmp_seq=7 ttl=254 time=1838 ms
>>64 bytes from 172.26.248.2: icmp_seq=8 ttl=254 time=1738 ms
>>64 bytes from 172.26.248.2: icmp_seq=9 ttl=254 time=1517 ms
>>64 bytes from 172.26.248.2: icmp_seq=10 ttl=254 time=978 ms
>>64 bytes from 172.26.248.2: icmp_seq=11 ttl=254 time=715 ms
>>64 bytes from 172.26.248.2: icmp_seq=12 ttl=254 time=678 ms
>>64 bytes from 172.26.248.2: icmp_seq=13 ttl=254 time=638 ms
>>64 bytes from 172.26.248.2: icmp_seq=14 ttl=254 time=761 ms
>>^C
>>--- 172.26.248.2 ping statistics ---
>>15 packets transmitted, 14 received, 6% packet loss, time 14322ms
>>rtt min/avg/max/mdev = 638.651/1324.002/1859.725/455.200 ms, pipe 2
>>$
>>
>>
>> >> >> >>--------------020400030709090500010400-- cheers jon From jeroen at unfix.org Mon Sep 7 01:53:12 2009 From: jeroen at unfix.org (Jeroen Massar) Date: Mon, 07 Sep 2009 10:53:12 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA45B20.6030705@reed.com> References: <4AA45B20.6030705@reed.com> Message-ID: <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> David P. Reed wrote: > For those who have some idea of how TCP does congestion control, I ask > "what's wrong with this picture?" TCP != ICMP ;) But for the rest, most likely your pipes are very very very full but have proper buffering. Greets, Jeroen -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 196 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/78e0588b/signature.bin From ihsanqazi at gmail.com Mon Sep 7 03:20:37 2009 From: ihsanqazi at gmail.com (Ihsan Ayyub Qazi) Date: Mon, 7 Sep 2009 06:20:37 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> References: <4AA45B20.6030705@reed.com> <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> Message-ID: > > TCP != ICMP ;) > Sure but the large delay (and variation) is probably due to large buffers getting filled up by TCP traffic; ping just measured the delay rather than causing it. Ihsan -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/28cb6ab7/attachment.html From jeroen at unfix.org Mon Sep 7 03:46:44 2009 From: jeroen at unfix.org (Jeroen Massar) Date: Mon, 07 Sep 2009 12:46:44 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <4AA45B20.6030705@reed.com> <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> Message-ID: <4AA4E494.40102@spaghetti.zurich.ibm.com> Ihsan Ayyub Qazi wrote: > TCP != ICMP ;) > > > Sure but the large delay (and variation) is probably due to large > buffers getting filled up by TCP traffic; ping just measured the delay > rather than causing it. That is not what I meant. Just think of the case where there is a nice ISP involved, or for that matter several, who are doing "QoS" and prioritizes ICMP a lot lower than TCP... you do know that all bittorrent is evil and that everything over port 80 is 'good' don't you? :) Also think of that little fact that traceroute is unidirectional, you only see the return path from your vantage point, not the path that the packets are taking from those points, you know that the RTT is but you don't know if that is a symmetric path or not, let alone where the delay is happening. This is why there was this feature called source-routing which allowed one to partially make that happen, but unfortunately that feature has a lot of abuse added to it. David: a cooler question would involve some 6to4 hosts, then one really does not know where the packets are truly going ;) As long as one does not own/control/operate the complete path it will be really quite hard to tell what happens to packets outside ones control. Greets, Jeroen -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 196 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/515388fc/signature.bin From sgros at zemris.fer.hr Mon Sep 7 05:37:17 2009 From: sgros at zemris.fer.hr (Stjepan Gros) Date: Mon, 07 Sep 2009 14:37:17 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA4E494.40102@spaghetti.zurich.ibm.com> References: <4AA45B20.6030705@reed.com> <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> Message-ID: <1252327037.4357.39.camel@fedora> On Mon, 2009-09-07 at 12:46 +0200, Jeroen Massar wrote: > Ihsan Ayyub Qazi wrote: > > TCP != ICMP ;) > > > > > > Sure but the large delay (and variation) is probably due to large > > buffers getting filled up by TCP traffic; ping just measured the delay > > rather than causing it. > > That is not what I meant. Just think of the case where there is a nice > ISP involved, or for that matter several, who are doing "QoS" and > prioritizes ICMP a lot lower than TCP... you do know that all bittorrent > is evil and that everything over port 80 is 'good' don't you? :) In the presence of QoS, it's hard to conclude something about one protocol (TCP) from the measurements performed with another protocol (ICMP) if at least some assumptions are made. But, in this case, it _could_ be that the conclusions are similar: 1. For a start, I would always give higher priority to ICMP and traceroute, simply to be able to detect when something is wrong in the network, even when it is congested. But I do not have any operational experience, so I'm speculating. Anyway, in that case, ICMP traffic is a lower bound on delay TCP will get. 2. If, on the other hand, ICMP has lower priority than TCP, then the delay can be from around 0 up to the given value, but since ICMP is going only after TCP has been sent than it represents upper bound on TCP's delay which means that, at least some connections, experience approximately equal delay to ICMP packets, bringing us again to the high delay in network. > Also think of that little fact that traceroute is unidirectional, you > only see the return path from your vantage point, not the path that the > packets are taking from those points, you know that the RTT is but > you don't know if that is a symmetric path or not, let alone where the > delay is happening. This is why there was this feature called > source-routing which allowed one to partially make that happen, but > unfortunately that feature has a lot of abuse added to it. I don't get this one, i.e. that we see "return path"? Maybe I didn't understand what you meant, but as I understand it we could see different interfaces on the routers when the packet is going _to_ the destination, but in any case, we see routers that the packet is going through to the destination, not the path the response is sent back. But I agree that traceroute could be misleading... BTW let me try additional guess, again, under the assumption of (almost) no QoS. If I assume that ping times to the node lcs.mit.edu and to the node 172.26.248.2 are measured at the same level of congestion, then did this unnamed ISP decided to do aggressive buffering in the node 172.26.248.2 (or something similar that has the same effect)? That would be one possible explanation what's happening. Namely, traffic that goes through node 172.26.248.2 is blocked on the outgoing line which incurs high RTT, but the traffic to the node itself goes directly to the CPU and response is sent immediately back yielding lower RTT? Stjepan P.S. With all the recent hype of ISPs doing traffic limiting and similar, I doubt that this has nothing to do with QoS, making my guesses a bit wrong, but it's interesting nevertheless... :) From sthaug at nethelp.no Mon Sep 7 05:44:14 2009 From: sthaug at nethelp.no (sthaug@nethelp.no) Date: Mon, 07 Sep 2009 14:44:14 +0200 (CEST) Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA4E494.40102@spaghetti.zurich.ibm.com> References: <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> Message-ID: <20090907.144414.74741371.sthaug@nethelp.no> > That is not what I meant. Just think of the case where there is a nice > ISP involved, or for that matter several, who are doing "QoS" and > prioritizes ICMP a lot lower than TCP... you do know that all bittorrent > is evil and that everything over port 80 is 'good' don't you? :) Do you know of *any* ISPs that explicitly prioritize ICMP a lot lower than TCP for traffic *through* the ISP's routers? Note that this is different from ICMP to/from the router itself. Steinar Haug, Nethelp consulting, sthaug at nethelp.no From jeroen at unfix.org Mon Sep 7 05:57:55 2009 From: jeroen at unfix.org (Jeroen Massar) Date: Mon, 07 Sep 2009 14:57:55 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <20090907.144414.74741371.sthaug@nethelp.no> References: <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> <20090907.144414.74741371.sthaug@nethelp.no> Message-ID: <4AA50353.6000207@spaghetti.zurich.ibm.com> sthaug at nethelp.no wrote: >> That is not what I meant. Just think of the case where there is a nice >> ISP involved, or for that matter several, who are doing "QoS" and >> prioritizes ICMP a lot lower than TCP... you do know that all bittorrent >> is evil and that everything over port 80 is 'good' don't you? :) > > Do you know of *any* ISPs that explicitly prioritize ICMP a lot lower > than TCP for traffic *through* the ISP's routers? I know several ISPs with heavily misconfigured "QoS" or rather "prioritization" setups. They tend to configure "TCP port 80" (unless it goes to some streaming site like Youtube) as good traffic, and thus high-prio that traffic, everything else is dropped in the bad bucket. ICMP is generally forgotten or directly classified as 'only used for DDoS' and thus make it very low-priority. It might sound logical for the person who configures it, but it is actually not... > Note that this is > different from ICMP to/from the router itself. That is generally because of the hardware doing queuing and depending on platform because the CPU of the box has to handle it instead of just forwarding the packets. As the latencies where traceroute, the packets would hit the router itself and it would have to handle it. (if TTL=0 oeeeh, it is me ;) Greets, Jeroen -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 196 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/10591d6f/signature.bin From dpreed at reed.com Mon Sep 7 06:30:49 2009 From: dpreed at reed.com (David P. Reed) Date: Mon, 07 Sep 2009 09:30:49 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA4E494.40102@spaghetti.zurich.ibm.com> References: <4AA45B20.6030705@reed.com> <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> Message-ID: <4AA50B09.4030308@reed.com> You are overthinking this. I can share with you the delays using a personal hack "tcp ping" tool that sends a few bytes over a TCPNODELAY socket. They are consistent with this measure. Measurements are never perfect, but that doesn't mean they can't tell us a lot. I used to diagnose the Multics operating system performance problems by studying the panel register display. Rarely needed to write special profiling code On 09/07/2009 06:46 AM, Jeroen Massar wrote: > Ihsan Ayyub Qazi wrote: > >> TCP != ICMP ;) >> >> >> Sure but the large delay (and variation) is probably due to large >> buffers getting filled up by TCP traffic; ping just measured the delay >> rather than causing it. >> > That is not what I meant. Just think of the case where there is a nice > ISP involved, or for that matter several, who are doing "QoS" and > prioritizes ICMP a lot lower than TCP... you do know that all bittorrent > is evil and that everything over port 80 is 'good' don't you? :) > > Also think of that little fact that traceroute is unidirectional, you > only see the return path from your vantage point, not the path that the > packets are taking from those points, you know that the RTT is but > you don't know if that is a symmetric path or not, let alone where the > delay is happening. This is why there was this feature called > source-routing which allowed one to partially make that happen, but > unfortunately that feature has a lot of abuse added to it. > > > David: a cooler question would involve some 6to4 hosts, then one really > does not know where the packets are truly going ;) > > As long as one does not own/control/operate the complete path it will be > really quite hard to tell what happens to packets outside ones control. > > Greets, > Jeroen > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/84e744a8/attachment-0001.html From jnc at mercury.lcs.mit.edu Mon Sep 7 07:32:57 2009 From: jnc at mercury.lcs.mit.edu (Noel Chiappa) Date: Mon, 7 Sep 2009 10:32:57 -0400 (EDT) Subject: [e2e] load balancing Message-ID: <20090907143257.878A66BE56B@mercury.lcs.mit.edu> > From: Xai Xi > Do you recall any early attempts to combine traffic load balancing and > routing in the same distributed process? Were there any deployments? The ARPANet did this, in both versions of the routing it ran. Note that including load in path selection can reduce the stability of the routing; at worst, there can be feedback loops which cause oscillations, and other unstable behaviour. The original ARPANet routing was subject to numerous problems of this sort; see BBN Report 3803, which discusses the problems in some detail. Note that the new ARPANet routing didn't completely cure the problem, merely ameliorated it. That's because the fundamental problem, a feedback loop between path selection and load, was still there. There are ways to break the feedback loop and still have path selection dependent on load, but not in a routing architecture that looks anything like that (e.g. hop-by-hop path selection). Noel From algold at rnp.br Mon Sep 7 07:37:48 2009 From: algold at rnp.br (Alexandre Grojsgold) Date: Mon, 7 Sep 2009 11:37:48 -0300 Subject: [e2e] RES: What's wrong with this picture? In-Reply-To: <4AA45B20.6030705@reed.com> References: <4AA45B20.6030705@reed.com> Message-ID: <002e01ca2fc8$c56166e0$502434a0$@br> Hi, Looking at the first ping command, it seems that the first packet was lost (icmp_seq=0), as well as a series of packets after the 15th, before David decided to ^C the ping execution. The 14 packets that went through experienced a huge delay, really hard to explain. So, my first guess on a "wrong thing" - no network should hold a packet for long 8 or 9 seconds, and yet deliver it to somewhere. No buffer should be big enough to hold packets so long . But I still cannot imagine where the packets where sitting for such a long time. Looking at the traceroute: the measured times do not grow monotonously, and show hi variance. I would say that the delay came from the first hop (172.26.248.2?). A new ping, now targeted at 172.26.248.2, shows a decreasing round trip delay - a buffer getting empty? - and still shows 1 packet loss. It seems also that David is behind a NAT, since 172.26.248.2 is a RFC1918 reserved address. In any case, such long delays cannot be good for stable functioning of TCP congestion control. --a.l.g. De: end2end-interest-bounces at postel.org [mailto:end2end-interest-bounces at postel.org] Em nome de David P. Reed Enviada em: domingo, 6 de setembro de 2009 22:00 Para: end2end-interest list Assunto: [e2e] What's wrong with this picture? For those who have some idea of how TCP does congestion control, I ask "what's wrong with this picture?" And perhaps those who know someone responsible at the Internet Access Provider involved, perhaps we could organize some consulting help... (Hint: the problem relates to a question, "why are there no lost IP datagrams?", and a second hint is that the ping time this morning was about 193 milliseconds.) Van Jacobsen, Scott Shenker, and Sally Floyd are not allowed to answer the question. (they used to get funding from the IAP involved, but apparently that company does not listen to them). $ ping lcs.mit.edu PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 time=6330 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 time=6005 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 time=8509 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 time=9310 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 time=8586 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 time=7765 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 time=7168 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 time=10261 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 time=10624 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 time=9625 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 time=9725 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=12 ttl=44 time=8725 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=13 ttl=44 time=9306 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=14 ttl=44 time=8306 ms ^C --- lcs.mit.edu ping statistics --- 24 packets transmitted, 14 received, 41% packet loss, time 33174ms rtt min/avg/max/mdev = 6005.237/8589.365/10624.776/1334.140 ms, pipe 11 $ traceroute lcs.mit.edu traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets 1 * * * 2 172.26.248.2 (172.26.248.2) 693.585 ms 693.415 ms 712.282 ms 3 * * * 4 172.16.192.18 (172.16.192.18) 712.700 ms 1356.680 ms 1359.469 ms 5 12.88.7.205 (12.88.7.205) 1361.306 ms 673.642 ms 673.541 ms 6 cr84.cgcil.ip.att.net (12.122.152.134) 673.442 ms 673.371 ms 673.742 ms 7 cr2.cgcil.ip.att.net (12.123.7.250) 655.126 ms 654.186 ms 554.690 ms 8 * * ggr2.cgcil.ip.att.net (12.122.132.133) 912.385 ms 9 192.205.33.210 (192.205.33.210) 909.925 ms 911.335 ms 911.204 ms 10 ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94) 569.740 ms 569.605 ms 907.409 ms 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 369.680 ms 344.495 ms 345.252 ms 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 355.645 ms 641.866 ms 641.367 ms 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 636.598 ms 636.797 ms 635.755 ms 14 B24-RTR-2-BACKBONE-2.MIT.EDU (18.168.1.23) 635.766 ms 634.794 ms 866.430 ms 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 758.305 ms 822.244 ms 821.202 ms 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 833.699 ms 1055.548 ms 1116.813 ms 17 zermatt.csail.mit.edu (128.30.2.121) 1114.838 ms 539.951 ms 620.681 ms [david at whimsy ~]$ ping 172.26.248.2 PING 172.26.248.2 (172.26.248.2) 56(84) bytes of data. 64 bytes from 172.26.248.2: icmp_seq=1 ttl=254 time=1859 ms 64 bytes from 172.26.248.2: icmp_seq=2 ttl=254 time=1363 ms 64 bytes from 172.26.248.2: icmp_seq=3 ttl=254 time=1322 ms 64 bytes from 172.26.248.2: icmp_seq=4 ttl=254 time=1657 ms 64 bytes from 172.26.248.2: icmp_seq=5 ttl=254 time=1725 ms 64 bytes from 172.26.248.2: icmp_seq=6 ttl=254 time=1740 ms 64 bytes from 172.26.248.2: icmp_seq=7 ttl=254 time=1838 ms 64 bytes from 172.26.248.2: icmp_seq=8 ttl=254 time=1738 ms 64 bytes from 172.26.248.2: icmp_seq=9 ttl=254 time=1517 ms 64 bytes from 172.26.248.2: icmp_seq=10 ttl=254 time=978 ms 64 bytes from 172.26.248.2: icmp_seq=11 ttl=254 time=715 ms 64 bytes from 172.26.248.2: icmp_seq=12 ttl=254 time=678 ms 64 bytes from 172.26.248.2: icmp_seq=13 ttl=254 time=638 ms 64 bytes from 172.26.248.2: icmp_seq=14 ttl=254 time=761 ms ^C --- 172.26.248.2 ping statistics --- 15 packets transmitted, 14 received, 6% packet loss, time 14322ms rtt min/avg/max/mdev = 638.651/1324.002/1859.725/455.200 ms, pipe 2 $ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090907/2d3549c8/attachment.html From Jon.Crowcroft at cl.cam.ac.uk Mon Sep 7 08:19:13 2009 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Mon, 07 Sep 2009 16:19:13 +0100 Subject: [e2e] What's wrong with this picture? In-Reply-To: <1252327037.4357.39.camel@fedora> References: <4AA45B20.6030705@reed.com> <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> <1252327037.4357.39.camel@fedora> Message-ID: yes, its clear that "ping" is a higher freqency (just say "ping" out loud) than the gutteral trrracerroute and abhorrent bittorrent is basso profundo, thus imperfect band pass filters will impact their mileage differentially... and of course commercially minded ISPs may be carrying deep pocket inspection so more seriously: of course, in a sane world, QoS only takes away from customers willing to pay less when there are customers willing to pay more (c.f. work conservation #101) and not away from everyone in the working class when there's nothing else going on in the upper echelons; but it is problematical for the user to tell if this is what is going on reproduceably. In missive <1252327037.4357.39.camel at fedora>, Stjepan Gros typed: >>On Mon, 2009-09-07 at 12:46 +0200, Jeroen Massar wrote: >>> Ihsan Ayyub Qazi wrote: >>> > TCP != ICMP ;) >>> > >>> > >>> > Sure but the large delay (and variation) is probably due to large >>> > buffers getting filled up by TCP traffic; ping just measured the delay >>> > rather than causing it. >>> >>> That is not what I meant. Just think of the case where there is a nice >>> ISP involved, or for that matter several, who are doing "QoS" and >>> prioritizes ICMP a lot lower than TCP... you do know that all bittorrent >>> is evil and that everything over port 80 is 'good' don't you? :) >> >>In the presence of QoS, it's hard to conclude something about one >>protocol (TCP) from the measurements performed with another protocol >>(ICMP) if at least some assumptions are made. But, in this case, it >>_could_ be that the conclusions are similar: >> >>1. For a start, I would always give higher priority to ICMP and >>traceroute, simply to be able to detect when something is wrong in the >>network, even when it is congested. But I do not have any operational >>experience, so I'm speculating. Anyway, in that case, ICMP traffic is a >>lower bound on delay TCP will get. >> >>2. If, on the other hand, ICMP has lower priority than TCP, then the >>delay can be from around 0 up to the given value, but since ICMP is >>going only after TCP has been sent than it represents upper bound on >>TCP's delay which means that, at least some connections, experience >>approximately equal delay to ICMP packets, bringing us again to the high >>delay in network. >> >>> Also think of that little fact that traceroute is unidirectional, you >>> only see the return path from your vantage point, not the path that the >>> packets are taking from those points, you know that the RTT is but >>> you don't know if that is a symmetric path or not, let alone where the >>> delay is happening. This is why there was this feature called >>> source-routing which allowed one to partially make that happen, but >>> unfortunately that feature has a lot of abuse added to it. >> >>I don't get this one, i.e. that we see "return path"? Maybe I didn't >>understand what you meant, but as I understand it we could see different >>interfaces on the routers when the packet is going _to_ the destination, >>but in any case, we see routers that the packet is going through to the >>destination, not the path the response is sent back. But I agree that >>traceroute could be misleading... >> >>BTW let me try additional guess, again, under the assumption of (almost) >>no QoS. If I assume that ping times to the node lcs.mit.edu and to the >>node 172.26.248.2 are measured at the same level of congestion, then did >>this unnamed ISP decided to do aggressive buffering in the node >>172.26.248.2 (or something similar that has the same effect)? That would >>be one possible explanation what's happening. Namely, traffic that goes >>through node 172.26.248.2 is blocked on the outgoing line which incurs >>high RTT, but the traffic to the node itself goes directly to the CPU >>and response is sent immediately back yielding lower RTT? >> >>Stjepan >> >>P.S. With all the recent hype of ISPs doing traffic limiting and >>similar, I doubt that this has nothing to do with QoS, making my guesses >>a bit wrong, but it's interesting nevertheless... :) >> cheers jon From perfgeek at mac.com Mon Sep 7 08:36:04 2009 From: perfgeek at mac.com (rick jones) Date: Mon, 07 Sep 2009 08:36:04 -0700 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA45B20.6030705@reed.com> References: <4AA45B20.6030705@reed.com> Message-ID: <4ACAF650-1FCE-4442-B560-CF9B2F20317A@mac.com> On Sep 6, 2009, at 6:00 PM, David P. Reed wrote: > For those who have some idea of how TCP does congestion control, I > ask "what's wrong with this picture?" > And perhaps those who know someone responsible at the Internet > Access Provider involved, perhaps we could organize some consulting > help... > > (Hint: the problem relates to a question, "why are there no lost IP > datagrams?", and a second hint is that the ping time this morning > was about 193 milliseconds.) Perhaps the IAP in question believes that all ICMP traffic is equal, but some is more equal than others, with destination unreachable having rather higher priority than echo? rick jones http://homepage.mac.com/perfgeek From sgros at zemris.fer.hr Mon Sep 7 23:02:22 2009 From: sgros at zemris.fer.hr (Stjepan Gros) Date: Tue, 08 Sep 2009 08:02:22 +0200 Subject: [e2e] RES: What's wrong with this picture? In-Reply-To: <002e01ca2fc8$c56166e0$502434a0$@br> References: <4AA45B20.6030705@reed.com> <002e01ca2fc8$c56166e0$502434a0$@br> Message-ID: <1252389742.3077.7.camel@fedora> On Mon, 2009-09-07 at 11:37 -0300, Alexandre Grojsgold wrote: > Hi, > > > > Looking at the first ping command, it seems that the first packet was > lost (icmp_seq=0), as well as a series of packets after the 15th, > before David decided to ^C the ping execution. The 14 packets that > went through experienced a huge delay, really hard to explain. ping (at least on linux/fedora) starts with icmp_seq=1, so no packet was lost. For example: $ ping 192.168.3.20 PING 192.168.3.20 (192.168.3.20) 56(84) bytes of data. 64 bytes from 192.168.3.20: icmp_seq=1 ttl=63 time=0.687 ms 64 bytes from 192.168.3.20: icmp_seq=2 ttl=63 time=0.837 ms ^C --- 192.168.3.20 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1429ms rtt min/avg/max/mdev = 0.687/0.762/0.837/0.075 ms But it is interesting that the first ping command has been running for slightly more than 33 seconds and during that time it only sent 24 probes, instead of at least 30 (one per second)? S From dpreed at reed.com Tue Sep 8 04:58:19 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 07:58:19 -0400 Subject: [e2e] RES: What's wrong with this picture? In-Reply-To: <1252389742.3077.7.camel@fedora> References: <4AA45B20.6030705@reed.com> <002e01ca2fc8$c56166e0$502434a0$@br> <1252389742.3077.7.camel@fedora> Message-ID: <4AA646DB.9040602@reed.com> On 09/08/2009 02:02 AM, Stjepan Gros wrote: > But it is interesting that the first ping command has been running for > slightly more than 33 seconds and during that time it only sent 24 > probes, instead of at least 30 (one per second)? > > Good observation - I missed this data point. I can only suggest that the "ping" equivalent called "DNS lookup" took a full roundtrip (many seconds), and the once per second rate starts after the name is resolved. I didn't do a confirming experiment, because I didn't notice that. It was the first time I named "lcs.mit.edu" in my session, so I doubt it was in my DNS cache. From gdt at gdt.id.au Tue Sep 8 05:29:21 2009 From: gdt at gdt.id.au (Glen Turner) Date: Tue, 08 Sep 2009 21:59:21 +0930 Subject: [e2e] What's wrong with this picture? In-Reply-To: <20090907.144414.74741371.sthaug@nethelp.no> References: <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> <20090907.144414.74741371.sthaug@nethelp.no> Message-ID: <4AA64E21.5070803@gdt.id.au> Steinar Haug wrote: > Do you know of *any* ISPs that explicitly prioritize ICMP a lot lower > than TCP for traffic *through* the ISP's routers? Note that this is > different from ICMP to/from the router itself. Yes, I did the QoS policy for one -- AARNet. We had a lot of ping flooding from our fast network of smaller ISPs we peer with. So into the Scavenger class went ICMP Echo and Echo Reply. There's no need for diagnostic features to run in the Best Effort class. Can't say that anyone has noticed, yet alone complained. We did send out a warning about the use of ICMP Echo as a heartbeat protocol for STONITH. Alexandre Grojsgold wrote: > no network should hold a packet for long 8 or 9 seconds, and yet deliver it to somewhere Actually, lots of hosts and routers will deliver queued packets that were present when the interface went offline when the interface comes back online. Operationally speaking, cyclical bursts of packets O(10s) old are indicative of a clocking issue or a forever-negotiating ethernet interface, we've also seen a variant of delay behaviour on devices suffering RAM ECC errors. Best wishes, Glen -- Glen Turner From dpreed at reed.com Tue Sep 8 07:05:54 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 10:05:54 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA64C2C.4040707@freedesktop.org> References: <4AA4C9F8.2030602@spaghetti.zurich.ibm.com> <4AA4E494.40102@spaghetti.zurich.ibm.com> <20090907.144414.74741371.sthaug@nethelp.no> <4AA64C2C.4040707@freedesktop.org> Message-ID: <4AA664C2.1000009@reed.com> Jim - I suspect your Comcast support person was partly right. ICMP *echoing* is sidelined. However, IP packets that contain ICMP messages destined farther down the line are NOT dropped by routers and switches. That would be dumb, though I'm sure some networks that don't want to monitor their own congestion might be so dumb as to imagine that ICMP mice will somehow overload a network. I don't think such people are members of NANOG). It turns out that Comcast's problem (extensively investigated by technologists rather than their PR dept., only after the Harvard FCC hearing) was that DOCSIS modems they had bought actually had multiple-seconds worth of buffering on their upstream-facing interfaces, and did not under any circumstances drop packets in a way that would allow TCP to know enough to slow down the AI part of AIMD. Given the sidelining of *echoing* yes, pinging a router might not give much info about that router. But pinging the next, unloaded router down the route will tell you a lot. In any case, it's easy to open up a TCP connection and carry out an end-to-end ping without ever using ICMP. Just wait a few seconds after a sync, send a few bytes, and have a responder echo them. If you use TCPNODELAY option, you will get a reliable result. I have a python program on my server that handles such things. In this particular measurement, the data from this "TCP ping" gave consistent RTT's with the ICMP ping. It's fascinating to me that people REALLY WANT to call this "measurement error". As opposed to *operator* misconfiguration (or router-designer-error). Perhaps someone might actually be able to guess what manufacturer sells the equipment that routinely buffers 8 seconds of outgoing packets on a link without a hint of backpressure that would allow TCP's congestion control to kick in? I just want to see it fixed before Sandvine sells some more TCP-RST-injectors and DPI spies to that vendor, and starts accusing people with some very cool handsets of "attacking the network". Maybe the handset vendor would be interested in having interactions take less than 8-20 seconds between gesture and response from a server? One thing that is clear: the spate of news stories about "spectrum shortage" has missed a fundamental technical problem that has NOTHING to do with spectrum. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/218e1d8d/attachment.html From ljorgenson at apparentnetworks.com Tue Sep 8 09:29:08 2009 From: ljorgenson at apparentnetworks.com (Loki Jorgenson) Date: Tue, 8 Sep 2009 09:29:08 -0700 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: Message-ID: Satellite link? Probably geostationary from the looks of it (around 600ms RTT) - possibly more than one bounce in certain cases. The buffers on satellite link modems can be quite large to boot. Little to no loss is not surprising given their retransmit capabilities. Apparently part of a fail-over if it returned to terrestrial values in the morning. That accounts for anything up to about 2 seconds. The really high 5-10 second RTTs still seem pretty extreme even for satellites.... Loki Jorgenson Apparent Networks t 604 433 2333 ext 105 m 604 250-4642 -----Original Message----- Message: 1 Date: Sun, 06 Sep 2009 21:00:16 -0400 From: "David P. Reed" Subject: [e2e] What's wrong with this picture? To: end2end-interest list Message-ID: <4AA45B20.6030705 at reed.com> Content-Type: text/plain; charset="iso-8859-1" For those who have some idea of how TCP does congestion control, I ask "what's wrong with this picture?" And perhaps those who know someone responsible at the Internet Access Provider involved, perhaps we could organize some consulting help... (Hint: the problem relates to a question, "why are there no lost IP datagrams?", and a second hint is that the ping time this morning was about 193 milliseconds.) $ ping lcs.mit.edu PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 time=6330 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 time=6005 ms From Anil.Agarwal at viasat.com Tue Sep 8 10:14:16 2009 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Tue, 8 Sep 2009 13:14:16 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA664C2.1000009@reed.com> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> David, Another possible explanation would be that the IAP equipment (or DOCSIS modem) provides some level of packet differentiation and queuing/prioritization (e.g., real-time v/s non-real-time) and during your test, there was a lot of real-time traffic, reducing the amount of bandwidth available to TCP-traffic. Even if queues are "properly" sized based on access link speed, multiple queues and prioritization can create havoc with packet delays. To some extent, we are all to blame, since we (the IETF community) have not really offered any strong guidelines on how to size router buffers. There are plenty of papers on the subject. The most we seem to suggest is that it should be some fraction of the bandwidth-delay product, with little agreement on the fraction value. Some critical questions remain unanswered - 1. What does bandwidth-delay product mean for a poor access router, probably pre-configured at the factory? Is the bandwidth based on its own access link speed? What delay value should be used? - delay can vary dynamically across a wide range. 2. How about links with variable speeds (wireless, satellite)? Should buffer size be computed dynamically. 3. What happens with use of Differentiated services and queues? The bandwidth available to a specific queue can dynamically vary across a wide range. Best Effort queue probably suffers the most. 4. Should queues be sized based on maximum queuing delay instead of a fixed/computed amount of buffer space? 5. Similar questions apply to use of RED/ECN; how do we compute (all) RED parameters for such links and queues? Anil Anil Agarwal ViaSat Inc. 20511 Seneca Meadows Parkway Germantown, MD 20876 ________________________________ From: end2end-interest-bounces at postel.org [mailto:end2end-interest-bounces at postel.org] On Behalf Of David P. Reed Sent: Tuesday, September 08, 2009 10:06 AM To: Jim Gettys Cc: jeroen at unfix.org; sthaug at nethelp.no; end2end-interest at postel.org Subject: Re: [e2e] What's wrong with this picture? Jim - I suspect your Comcast support person was partly right. ICMP *echoing* is sidelined. However, IP packets that contain ICMP messages destined farther down the line are NOT dropped by routers and switches. That would be dumb, though I'm sure some networks that don't want to monitor their own congestion might be so dumb as to imagine that ICMP mice will somehow overload a network. I don't think such people are members of NANOG). It turns out that Comcast's problem (extensively investigated by technologists rather than their PR dept., only after the Harvard FCC hearing) was that DOCSIS modems they had bought actually had multiple-seconds worth of buffering on their upstream-facing interfaces, and did not under any circumstances drop packets in a way that would allow TCP to know enough to slow down the AI part of AIMD. Given the sidelining of *echoing* yes, pinging a router might not give much info about that router. But pinging the next, unloaded router down the route will tell you a lot. In any case, it's easy to open up a TCP connection and carry out an end-to-end ping without ever using ICMP. Just wait a few seconds after a sync, send a few bytes, and have a responder echo them. If you use TCPNODELAY option, you will get a reliable result. I have a python program on my server that handles such things. In this particular measurement, the data from this "TCP ping" gave consistent RTT's with the ICMP ping. It's fascinating to me that people REALLY WANT to call this "measurement error". As opposed to *operator* misconfiguration (or router-designer-error). Perhaps someone might actually be able to guess what manufacturer sells the equipment that routinely buffers 8 seconds of outgoing packets on a link without a hint of backpressure that would allow TCP's congestion control to kick in? I just want to see it fixed before Sandvine sells some more TCP-RST-injectors and DPI spies to that vendor, and starts accusing people with some very cool handsets of "attacking the network". Maybe the handset vendor would be interested in having interactions take less than 8-20 seconds between gesture and response from a server? One thing that is clear: the spate of news stories about "spectrum shortage" has missed a fundamental technical problem that has NOTHING to do with spectrum. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/72cca325/attachment-0001.html From arthurcallado at gmail.com Tue Sep 8 10:54:28 2009 From: arthurcallado at gmail.com (Arthur Callado) Date: Tue, 8 Sep 2009 14:54:28 -0300 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: Message-ID: And 5-10 seconds RTT seems too little for IPN (http://www.ipnsig.org), no matter what planet. Or are you pinging from the Earth's Moon? That RTT would be reasonable (without equipment failure) in a moon-to-earth scenario, but I wonder what reason you gave for an agency to fund such an endeavor. Arthur Callado. Networking and Telecommunications Research Group Federal University of Pernambuco, Brazil On Tue, Sep 8, 2009 at 1:29 PM, Loki Jorgenson wrote: > Satellite link? ?Probably geostationary from the looks of it (around > 600ms RTT) - possibly more than one bounce in certain cases. ?The > buffers on satellite link modems can be quite large to boot. ?Little to > no loss is not surprising given their retransmit capabilities. > Apparently part of a fail-over if it returned to terrestrial values in > the morning. > > That accounts for anything up to about 2 seconds. ?The really high 5-10 > second RTTs still seem pretty extreme even for satellites.... > > Loki Jorgenson > Apparent Networks > t ? 604 433 2333 ext 105 > m ? 604 250-4642 > > -----Original Message----- > > Message: 1 > Date: Sun, 06 Sep 2009 21:00:16 -0400 > From: "David P. Reed" > Subject: [e2e] What's wrong with this picture? > To: end2end-interest list > Message-ID: <4AA45B20.6030705 at reed.com> > Content-Type: text/plain; charset="iso-8859-1" > > For those who have some idea of how TCP does congestion control, I ask > "what's wrong with this picture?" ?And perhaps those who know someone > responsible at the Internet Access Provider involved, perhaps we could > organize some consulting help... > > (Hint: the problem relates to a question, "why are there no lost IP > datagrams?", and a second hint is that the ping time this morning was > about 193 milliseconds.) > > $ ping lcs.mit.edu > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > time=6330 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > time=6005 ms > > > From dokaspar.ietf at gmail.com Tue Sep 8 10:54:49 2009 From: dokaspar.ietf at gmail.com (Dominik Kaspar) Date: Tue, 8 Sep 2009 19:54:49 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: Message-ID: <2a3692de0909081054u4fc1ca0x6423bc0f9037aaee@mail.gmail.com> If there is a lot of congestion, ICMP packets should get dropped at intermediate routers. Maybe some "smart" router takes the term "best effort" too seriously and tries not to lose a single packet by buffering everything on secondary storage...? Dominik On Tue, Sep 8, 2009 at 6:29 PM, Loki Jorgenson wrote: > Satellite link? ?Probably geostationary from the looks of it (around > 600ms RTT) - possibly more than one bounce in certain cases. ?The > buffers on satellite link modems can be quite large to boot. ?Little to > no loss is not surprising given their retransmit capabilities. > Apparently part of a fail-over if it returned to terrestrial values in > the morning. > > That accounts for anything up to about 2 seconds. ?The really high 5-10 > second RTTs still seem pretty extreme even for satellites.... > > Loki Jorgenson > Apparent Networks > t ? 604 433 2333 ext 105 > m ? 604 250-4642 > > -----Original Message----- > > Message: 1 > Date: Sun, 06 Sep 2009 21:00:16 -0400 > From: "David P. Reed" > Subject: [e2e] What's wrong with this picture? > To: end2end-interest list > Message-ID: <4AA45B20.6030705 at reed.com> > Content-Type: text/plain; charset="iso-8859-1" > > For those who have some idea of how TCP does congestion control, I ask > "what's wrong with this picture?" ?And perhaps those who know someone > responsible at the Internet Access Provider involved, perhaps we could > organize some consulting help... > > (Hint: the problem relates to a question, "why are there no lost IP > datagrams?", and a second hint is that the ping time this morning was > about 193 milliseconds.) > > $ ping lcs.mit.edu > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > time=6330 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > time=6005 ms > > > From dpreed at reed.com Tue Sep 8 10:56:20 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 13:56:20 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> Message-ID: <4AA69AC4.1090507@reed.com> I should not have been so cute - I didn't really want to pick on the operator involved, because I suspect that other 3G operators around the world probably use the same equipment and same rough configuration. The ping and traceroute were from Chicago, using an ATT Mercury data modem, the same channel as the Apple iPhones use, but it's much easier to run test suites from my netbook. Here's the same test from another time of day, early Sunday morning, when things were working well. Note that I ran the test over the entire labor day weekend at intervals. The end-to-end ping time was bimodal. Either it pegged at over 5000 milliseconds, or happily sat at under 200 milliseconds. Exactly what one would expect if TCP congestion control were disabled by overbuffering in a router preceding the bottleneck link shared by many users. ------------------------------ $ ping lcs.mit.edu PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 time=209 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 time=118 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 time=166 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 time=165 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 time=224 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 time=183 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 time=224 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 time=181 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 time=220 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 time=179 ms 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 time=219 ms ^C --- lcs.mit.edu ping statistics --- 11 packets transmitted, 11 received, 0% packet loss, time 10780ms rtt min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms $ traceroute lcs.mit.edu traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets 1 * * * 2 172.26.248.2 (172.26.248.2) 178.725 ms 178.568 ms 179.500 ms 3 * * * 4 172.16.192.34 (172.16.192.34) 187.794 ms 187.677 ms 207.527 ms 5 12.88.7.205 (12.88.7.205) 207.416 ms 208.325 ms 69.630 ms 6 cr84.cgcil.ip.att.net (12.122.152.134) 79.425 ms 89.227 ms 90.083 ms 7 cr2.cgcil.ip.att.net (12.123.7.250) 98.679 ms 90.727 ms 91.576 ms 8 ggr2.cgcil.ip.att.net (12.122.132.137) 72.728 ms 89.628 ms 88.825 ms 9 192.205.33.186 (192.205.33.186) 89.787 ms 89.794 ms 80.918 ms 10 ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) 79.895 ms 70.927 ms 78.817 ms 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 107.820 ms 156.892 ms 140.711 ms 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 139.638 ms 139.764 ms 129.853 ms 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 149.595 ms 154.366 ms 152.225 ms 14 B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) 146.808 ms 129.801 ms 89.659 ms 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 109.463 ms 118.818 ms 91.727 ms 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 91.541 ms 88.768 ms 85.837 ms 17 zermatt.csail.mit.edu (128.30.2.121) 117.581 ms 116.564 ms 103.569 ms $ From pganti at gmail.com Tue Sep 8 11:04:15 2009 From: pganti at gmail.com (Paddy Ganti) Date: Tue, 8 Sep 2009 11:04:15 -0700 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA45B20.6030705@reed.com> References: <4AA45B20.6030705@reed.com> Message-ID: <2ff1f08a0909081104u1c2bcf16pe054c6aa603890b6@mail.gmail.com> The fact that the TTL value is 44 in the response makes me hazard a guess that there are routing issues with the IAP (sounds like MPLS may be involved). On Sun, Sep 6, 2009 at 6:00 PM, David P. Reed wrote: > For those who have some idea of how TCP does congestion control, I ask > "what's wrong with this picture?" And perhaps those who know someone > responsible at the Internet Access Provider involved, perhaps we could > organize some consulting help... > > (Hint: the problem relates to a question, "why are there no lost IP > datagrams?", and a second hint is that the ping time this morning was about > 193 milliseconds.) > > Van Jacobsen, Scott Shenker, and Sally Floyd are not allowed to answer the > question. (they used to get funding from the IAP involved, but apparently > that company does not listen to them). > > $ ping lcs.mit.edu > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > time=6330 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > time=6005 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 > time=8509 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 > time=9310 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 > time=8586 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 > time=7765 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 > time=7168 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 > time=10261 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 > time=10624 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 > time=9625 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 > time=9725 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=12 ttl=44 > time=8725 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=13 ttl=44 > time=9306 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=14 ttl=44 > time=8306 ms > ^C > --- lcs.mit.edu ping statistics --- > 24 packets transmitted, 14 received, 41% packet loss, time 33174ms > rtt min/avg/max/mdev = 6005.237/8589.365/10624.776/1334.140 ms, pipe 11 > $ traceroute lcs.mit.edu > traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets > 1 * * * > 2 172.26.248.2 (172.26.248.2) 693.585 ms 693.415 ms 712.282 ms > 3 * * * > 4 172.16.192.18 (172.16.192.18) 712.700 ms 1356.680 ms 1359.469 ms > 5 12.88.7.205 (12.88.7.205) 1361.306 ms 673.642 ms 673.541 ms > 6 cr84.cgcil.ip.att.net (12.122.152.134) 673.442 ms 673.371 ms > 673.742 ms > 7 cr2.cgcil.ip.att.net (12.123.7.250) 655.126 ms 654.186 ms 554.690 > ms > 8 * * ggr2.cgcil.ip.att.net (12.122.132.133) 912.385 ms > 9 192.205.33.210 (192.205.33.210) 909.925 ms 911.335 ms 911.204 ms > 10 ae-31-53.ebr1.Chicago1.Level3.net (4.68.101.94) 569.740 ms 569.605 > ms 907.409 ms > 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 369.680 ms 344.495 ms > 345.252 ms > 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 355.645 ms 641.866 ms > 641.367 ms > 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 636.598 ms 636.797 > ms 635.755 ms > 14 B24-RTR-2-BACKBONE-2.MIT.EDU (18.168.1.23) 635.766 ms 634.794 ms > 866.430 ms > 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 758.305 ms 822.244 ms > 821.202 ms > 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 833.699 ms 1055.548 ms > 1116.813 ms > 17 zermatt.csail.mit.edu (128.30.2.121) 1114.838 ms 539.951 ms 620.681 > ms > [david at whimsy ~]$ ping 172.26.248.2 > PING 172.26.248.2 (172.26.248.2) 56(84) bytes of data. > 64 bytes from 172.26.248.2: icmp_seq=1 ttl=254 time=1859 ms > 64 bytes from 172.26.248.2: icmp_seq=2 ttl=254 time=1363 ms > 64 bytes from 172.26.248.2: icmp_seq=3 ttl=254 time=1322 ms > 64 bytes from 172.26.248.2: icmp_seq=4 ttl=254 time=1657 ms > 64 bytes from 172.26.248.2: icmp_seq=5 ttl=254 time=1725 ms > 64 bytes from 172.26.248.2: icmp_seq=6 ttl=254 time=1740 ms > 64 bytes from 172.26.248.2: icmp_seq=7 ttl=254 time=1838 ms > 64 bytes from 172.26.248.2: icmp_seq=8 ttl=254 time=1738 ms > 64 bytes from 172.26.248.2: icmp_seq=9 ttl=254 time=1517 ms > 64 bytes from 172.26.248.2: icmp_seq=10 ttl=254 time=978 ms > 64 bytes from 172.26.248.2: icmp_seq=11 ttl=254 time=715 ms > 64 bytes from 172.26.248.2: icmp_seq=12 ttl=254 time=678 ms > 64 bytes from 172.26.248.2: icmp_seq=13 ttl=254 time=638 ms > 64 bytes from 172.26.248.2: icmp_seq=14 ttl=254 time=761 ms > ^C > --- 172.26.248.2 ping statistics --- > 15 packets transmitted, 14 received, 6% packet loss, time 14322ms > rtt min/avg/max/mdev = 638.651/1324.002/1859.725/455.200 ms, pipe 2 > $ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/6b03fdca/attachment-0001.html From william.allen.simpson at gmail.com Tue Sep 8 14:40:46 2009 From: william.allen.simpson at gmail.com (William Allen Simpson) Date: Tue, 08 Sep 2009 17:40:46 -0400 Subject: [e2e] 64-bit timestamps? Message-ID: <4AA6CF5E.7080707@gmail.com> While I'm experimenting and daydreaming, back in the day there was an expressed interest in 64-bit timestamps. Since it would be harder to expand the existing 32-bit sequence space, would it be of some utility to expand the PAWS timestamps intead? From dpreed at reed.com Tue Sep 8 15:05:05 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 18:05:05 -0400 Subject: [e2e] congestion collapse definition Message-ID: <4AA6D511.7000300@reed.com> Folks, I tend not to use the term "congestion collapse", though it is in common use in the Internet community. The phenomenon I've been experiencing on the other thread about AT&T 3G data access network configuration on this list, if I'm correct (as I'm pretty sure I am) should probably be called "congestion collapse", or else we need a new term. The phenomenon observed in Comcast's debacle with DOCSIS upstream buffering should be called by the same term - again, buffering is allowed to build on a shared queue carrying diverse traffic, without providing any feedback that can be recognized by TCP's rate control loop, leading to positive feedback and uncontrolled delay. If I look at Wikipedia, for example, at the definition of congestion collapse there, it says that CC is characterized by large buffering delays AND lost packets. However, in the Comcast and ATT cases here, the queues get so obnoxiously long (5-10 seconds) that users presumably give up running apps long before packet *loss* sets in due to overflow. This appears to be because all the TCP stacks are doing their job: new connections slow-start, then AI accelerates at a rate that is gradual enough (and over short-enough connections) that the huge buffers can stabilize at the point where human pain is the congestion control algorithm. Human pain was the load control algorithm in early overloaded TimeSharingSystems. On the original Multics system, people realized that in the middle of the day it was *foolish* to start a program that ran more than one second, because priority given to line editors over compute jobs meant that compute jobs would NEVER complete (unless one did an obscure thing called "quit-starting" the program to interact once a second by stopping and restarting the compile - some hackers rigged up terminals to automatically send interrupt/restart commands once per second to get their work done, but the rest of us coders worked mostly between 11 pm and 6 am). Of course another part of fixing ATT's problem is to fix the *upstream* capacity of the network. The bottleneck wouldn't occur if the output queue of the bottleneck router could drain as fast as users can generate demand. Back to my question: should this phenomenon be included in "congestion collapse" (I believe so), or should we invent a new more specific name (Buffer Madness?). -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/c592ad67/attachment.html From dpreed at reed.com Tue Sep 8 15:24:37 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 18:24:37 -0400 Subject: [e2e] 64-bit timestamps? In-Reply-To: <4AA6CF5E.7080707@gmail.com> References: <4AA6CF5E.7080707@gmail.com> Message-ID: <4AA6D9A5.6040601@reed.com> In regard to DNS security issues, I suggest reading Appendix B of RFC 1323 on whether PAWS helps. (I quote B.2 below). Since DNS queries do not *require* the full duplex TCP close to be reliable (the close provides no correctness to the DNS app), only the second point in B.2 really ought to matter. But PAWS may not be useful, since DNS itself might be made to maintain state across connections, moving the problem out of TCP and into the app (DNS) layer where it probably belongs. --------------- from appendix B of RFC 1323: ---------------- B.2 Closing and Reopening a Connection When a TCP connection is closed, a delay of 2*MSL in TIME-WAIT state ties up the socket pair for 4 minutes (see Section 3.5 of [Postel81]. Applications built upon TCP that close one connection and open a new one (e.g., an FTP data transfer connection using Stream mode) must choose a new socket pair each time. The TIME- WAIT delay serves two different purposes: 1. Implement the full-duplex reliable close handshake of TCP. The proper time to delay the final close step is not really related to the MSL; it depends instead upon the RTO for the FIN segments and therefore upon the RTT of the path. (It could be argued that the side that is sending a FIN knows what degree of reliability it needs, and therefore it should be able to determine the length of the TIME-WAIT delay for the FIN's recipient. This could be accomplished with an appropriate TCP option in FIN segments.) Although there is no formal upper-bound on RTT, common network engineering practice makes an RTT greater than 1 minute very unlikely. Thus, the 4 minute delay in TIME-WAIT state works satisfactorily to provide a reliable full-duplex TCP close. Note again that this is independent of MSL enforcement and network speed. The TIME-WAIT state could cause an indirect performance problem if an application needed to repeatedly close one connection and open another at a very high frequency, since the number of available TCP ports on a host is less than 2**16. However, high network speeds are not the major contributor to this problem; the RTT is the limiting factor in how quickly connections can be opened and closed. Therefore, this problem will be no worse at high transfer speeds. 2. Allow old duplicate segments to expire. To replace this function of TIME-WAIT state, a mechanism would have to operate across connections. PAWS is defined strictly within a single connection; the last timestamp is TS.Recent is kept in the connection control block, and discarded when a connection is closed. An additional mechanism could be added to the TCP, a per-host cache of the last timestamp received from any connection. This value could then be used in the PAWS mechanism to reject old duplicate segments from earlier incarnations of the connection, if the timestamp clock can be guaranteed to have ticked at least once since the old connection was open. This would require that the TIME-WAIT delay plus the RTT together must be at least one tick of the sender's timestamp clock. Such an extension is not part of the proposal of this RFC. Note that this is a variant on the mechanism proposed by Garlick, Rom, and Postel [Garlick77], which required each host to maintain connection records containing the highest sequence numbers on every connection. Using timestamps instead, it is only necessary to keep one quantity per remote host, regardless of the number of simultaneous connections to that host. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/03944ac4/attachment.html From william.allen.simpson at gmail.com Tue Sep 8 15:42:55 2009 From: william.allen.simpson at gmail.com (William Allen Simpson) Date: Tue, 08 Sep 2009 18:42:55 -0400 Subject: [e2e] 64-bit timestamps? In-Reply-To: <4AA6D9A5.6040601@reed.com> References: <4AA6CF5E.7080707@gmail.com> <4AA6D9A5.6040601@reed.com> Message-ID: <4AA6DDEF.4080907@gmail.com> David P. Reed wrote: > In regard to DNS security issues, > ... But PAWS may not be useful, since DNS itself > might be made to maintain state across connections, moving the problem > out of TCP and into the app (DNS) layer where it probably belongs. > This has no relation to the question that I asked, which has no mention what-so-ever about DNS security. Nor did I find the cut and paste of an old familiar RFC appendix particularly informative, not even in fancy multi-part alternative html (instead of the native format).... In any case, I've been paying attention to the more recent 1323bis. From dokaspar.ietf at gmail.com Tue Sep 8 15:47:40 2009 From: dokaspar.ietf at gmail.com (Dominik Kaspar) Date: Wed, 9 Sep 2009 00:47:40 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA69AC4.1090507@reed.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> Message-ID: <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> Hello David, You mentioned the bimodal behaviour of your 3G connection. I recently noticed the same thing but have not yet been able to explain why this happens. I also ran Ping tests over multiple days using an HSDPA modem (with both the client and server located in Oslo, Norway). The experienced RTTs were very stable over short periods of time, but sometimes they averaged around 80ms, while at other times the average was at about 300ms. A CDF illustration of the results is available here: http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png What is the reason of these two modes? Is it caused by adaptive modulation and coding on the physical layer? If so, why does it affect the delay so much? I would only expect a reduced bandwidth, but not much change in delay... Greetings, Dominik On Tue, Sep 8, 2009 at 7:56 PM, David P. Reed wrote: > I should not have been so cute - I didn't really want to pick on the > operator involved, because I suspect that other 3G operators around the > world probably use the same equipment and same rough configuration. > > The ping and traceroute were from Chicago, using an ATT Mercury data modem, > the same channel as the Apple iPhones use, but it's much easier to run test > suites from my netbook. > > Here's the same test from another time of day, early Sunday morning, when > things were working well. > > Note that I ran the test over the entire labor day weekend at intervals. > The end-to-end ping time was bimodal. ?Either it pegged at over 5000 > milliseconds, or happily sat at under 200 milliseconds. ? Exactly what one > would expect if TCP congestion control were disabled by overbuffering in a > router preceding the bottleneck link shared by many users. > > ------------------------------ > > $ ping lcs.mit.edu > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > time=209 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > time=118 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 > time=166 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 > time=165 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 > time=224 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 > time=183 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 > time=224 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 > time=181 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 > time=220 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 > time=179 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 > time=219 ms > ^C > --- lcs.mit.edu ping statistics --- > 11 packets transmitted, 11 received, 0% packet loss, time 10780ms > rtt min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms > $ traceroute lcs.mit.edu > traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets > ?1 ?* * * > ?2 ?172.26.248.2 (172.26.248.2) ?178.725 ms ?178.568 ms ?179.500 ms > ?3 ?* * * > ?4 ?172.16.192.34 (172.16.192.34) ?187.794 ms ?187.677 ms ?207.527 ms > ?5 ?12.88.7.205 (12.88.7.205) ?207.416 ms ?208.325 ms ?69.630 ms > ?6 ?cr84.cgcil.ip.att.net (12.122.152.134) ?79.425 ms ?89.227 ms ?90.083 ms > ?7 ?cr2.cgcil.ip.att.net (12.123.7.250) ?98.679 ms ?90.727 ms ?91.576 ms > ?8 ?ggr2.cgcil.ip.att.net (12.122.132.137) ?72.728 ms ?89.628 ms ?88.825 ms > ?9 ?192.205.33.186 (192.205.33.186) ?89.787 ms ?89.794 ms ?80.918 ms > 10 ?ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) ?79.895 ms ?70.927 ms > ?78.817 ms > 11 ?ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) ?107.820 ms ?156.892 ms > ?140.711 ms > 12 ?ae-7-7.car1.Boston1.Level3.net (4.69.132.241) ?139.638 ms ?139.764 ms > ?129.853 ms > 13 ?MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) ?149.595 ms ?154.366 ms > ?152.225 ms > 14 ?B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) ?146.808 ms ?129.801 ms ?89.659 > ms > 15 ?MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) ?109.463 ms ?118.818 ms ?91.727 > ms > 16 ?trantor.kalgan.csail.mit.edu (128.30.0.246) ?91.541 ms ?88.768 ms > ?85.837 ms > 17 ?zermatt.csail.mit.edu (128.30.2.121) ?117.581 ms ?116.564 ms ?103.569 ms > $ > > > From dpreed at reed.com Tue Sep 8 16:07:24 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 19:07:24 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> Message-ID: <4AA6E3AC.9060404@reed.com> I'm willing to bet that you are seeing the same problem I am, and that it has nothing to do with the modem or wireless protocol. Instead you are seeing what would happen if you simulate in ns2 the following system structure: -------------------------\ --------------------------\ ---------------------------\ wireless medium [WIRELESS HUB]------[ROUTER]-----------backbone ISP ---------------------------/ --------------------------/ When the link between the ROUTER and backbone ISP is of lower bitrate B than the sum of all the realizable simultaneous uplink demand from devices on the left, the outbound queue of the router is of size M > BT where T is the observed stable long delay, and the ROUTER does nothing to signal congestion until the entire M bytes (now very large) of memory are exhausted. Memory is now very cheap, and not-very-clueful network layer 2 designers (who don't study TCP or the Internet) are likely to throw too much at the problem without doing the right thing in their firmware. On 09/08/2009 06:47 PM, Dominik Kaspar wrote: > Hello David, > > You mentioned the bimodal behaviour of your 3G connection. I recently > noticed the same thing but have not yet been able to explain why this > happens. > > I also ran Ping tests over multiple days using an HSDPA modem (with > both the client and server located in Oslo, Norway). The experienced > RTTs were very stable over short periods of time, but sometimes they > averaged around 80ms, while at other times the average was at about > 300ms. > > A CDF illustration of the results is available here: > http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png > > What is the reason of these two modes? Is it caused by adaptive > modulation and coding on the physical layer? If so, why does it affect > the delay so much? I would only expect a reduced bandwidth, but not > much change in delay... > > Greetings, > Dominik > > > On Tue, Sep 8, 2009 at 7:56 PM, David P. Reed wrote: > >> I should not have been so cute - I didn't really want to pick on the >> operator involved, because I suspect that other 3G operators around the >> world probably use the same equipment and same rough configuration. >> >> The ping and traceroute were from Chicago, using an ATT Mercury data modem, >> the same channel as the Apple iPhones use, but it's much easier to run test >> suites from my netbook. >> >> Here's the same test from another time of day, early Sunday morning, when >> things were working well. >> >> Note that I ran the test over the entire labor day weekend at intervals. >> The end-to-end ping time was bimodal. Either it pegged at over 5000 >> milliseconds, or happily sat at under 200 milliseconds. Exactly what one >> would expect if TCP congestion control were disabled by overbuffering in a >> router preceding the bottleneck link shared by many users. >> >> ------------------------------ >> >> $ ping lcs.mit.edu >> PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 >> time=209 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 >> time=118 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 >> time=166 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 >> time=165 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 >> time=224 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 >> time=183 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 >> time=224 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 >> time=181 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 >> time=220 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 >> time=179 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 >> time=219 ms >> ^C >> --- lcs.mit.edu ping statistics --- >> 11 packets transmitted, 11 received, 0% packet loss, time 10780ms >> rtt min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms >> $ traceroute lcs.mit.edu >> traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets >> 1 * * * >> 2 172.26.248.2 (172.26.248.2) 178.725 ms 178.568 ms 179.500 ms >> 3 * * * >> 4 172.16.192.34 (172.16.192.34) 187.794 ms 187.677 ms 207.527 ms >> 5 12.88.7.205 (12.88.7.205) 207.416 ms 208.325 ms 69.630 ms >> 6 cr84.cgcil.ip.att.net (12.122.152.134) 79.425 ms 89.227 ms 90.083 ms >> 7 cr2.cgcil.ip.att.net (12.123.7.250) 98.679 ms 90.727 ms 91.576 ms >> 8 ggr2.cgcil.ip.att.net (12.122.132.137) 72.728 ms 89.628 ms 88.825 ms >> 9 192.205.33.186 (192.205.33.186) 89.787 ms 89.794 ms 80.918 ms >> 10 ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) 79.895 ms 70.927 ms >> 78.817 ms >> 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 107.820 ms 156.892 ms >> 140.711 ms >> 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 139.638 ms 139.764 ms >> 129.853 ms >> 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 149.595 ms 154.366 ms >> 152.225 ms >> 14 B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) 146.808 ms 129.801 ms 89.659 >> ms >> 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 109.463 ms 118.818 ms 91.727 >> ms >> 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 91.541 ms 88.768 ms >> 85.837 ms >> 17 zermatt.csail.mit.edu (128.30.2.121) 117.581 ms 116.564 ms 103.569 ms >> $ >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/f14ceebe/attachment.html From dpreed at reed.com Tue Sep 8 17:58:09 2009 From: dpreed at reed.com (David P. Reed) Date: Tue, 08 Sep 2009 20:58:09 -0400 Subject: [e2e] 64-bit timestamps? In-Reply-To: <4AA6DDEF.4080907@gmail.com> References: <4AA6CF5E.7080707@gmail.com> <4AA6D9A5.6040601@reed.com> <4AA6DDEF.4080907@gmail.com> Message-ID: <4AA6FDA1.9000307@reed.com> List moderator - please suspend Simpson's privileges; the rules suggest that his obnoxious behavior towards a helpful comment demand moderation until he stops behaving this way. On 09/08/2009 06:42 PM, William Allen Simpson wrote: > David P. Reed wrote: >> In regard to DNS security issues, ... But PAWS may not be useful, >> since DNS itself might be made to maintain state across connections, >> moving the problem out of TCP and into the app (DNS) layer where it >> probably belongs. >> > This has no relation to the question that I asked, which has no mention > what-so-ever about DNS security. Nor did I find the cut and paste of an > old familiar RFC appendix particularly informative, not even in fancy > multi-part alternative html (instead of the native format).... > > In any case, I've been paying attention to the more recent 1323bis. > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090908/dc8810e6/attachment.html From perfgeek at mac.com Tue Sep 8 18:41:42 2009 From: perfgeek at mac.com (rick jones) Date: Tue, 08 Sep 2009 18:41:42 -0700 Subject: [e2e] congestion collapse definition In-Reply-To: <4AA6D511.7000300@reed.com> References: <4AA6D511.7000300@reed.com> Message-ID: It was my understanding that congestion collapse was defined based on the effective throughput (goodput?) dropping to epsilon while the link utilization was 100%. I would assert, without any additional substantiation, that unless the effective throughput goes to epsilon, it isn't "congestion collapse" and one would indeed need another term. rick jones there is no rest for the wicked, yet the virtuous have no pillows From jnc at mercury.lcs.mit.edu Tue Sep 8 19:58:05 2009 From: jnc at mercury.lcs.mit.edu (Noel Chiappa) Date: Tue, 8 Sep 2009 22:58:05 -0400 (EDT) Subject: [e2e] congestion collapse definition Message-ID: <20090909025805.D6ACB6BE609@mercury.lcs.mit.edu> > From: "David P. Reed" > If I look at Wikipedia, for example, at the definition of congestion > collapse there, it says that CC is characterized by large buffering > delays AND lost packets. I was responsible for the original version of that page, so any faults in it can probably be laid at my door... :-) When writing it, I was probably thinking of the examples of congestive collapse I had seen (e.g. the ARPANet TCP email meltdown, TFTP Sorcerer's Apprentice Syndrome, etc) in which high packet drop rates were seen. Of course, that was 'back in the day', when routers had a _lot_ less buffering than they do now, so the symptoms naturally differed somewhat. > should this phenomenon be included in "congestion collapse" (I believe > so), or should we invent a new more specific name (Buffer Madness?). Both, I think. In general terms, they are both examples of the same basic concept; people give the network a lot more traffic than it can handle, and as a result things turn to total alimentary byproduct. To me, the term 'congestive collapse' is a perfect term for that general situation, because the network is, well, congested; so congested it effectively ceases to function. (Ceases to function effectively?) Congestion + failure -> congestive collapse. (Somehow 'congestive failure' doesn't have the same ring - probably the alliteration.) At the same time, they are interestingly different (in terms of the symptoms), so this subvariant (which we will probably be seeing more of, until the lessons gets into the global conciousness) could probably use a specific term. 'Buffer Madness' sounds good to me; another possibility is some term related to blockage of the alimentary canal, where stuff just keeps building up - but my brain won't cough up, right at this moment, the term I'm thinking of. Noel From jnc at mercury.lcs.mit.edu Tue Sep 8 20:24:22 2009 From: jnc at mercury.lcs.mit.edu (Noel Chiappa) Date: Tue, 8 Sep 2009 23:24:22 -0400 (EDT) Subject: [e2e] congestion collapse definition Message-ID: <20090909032422.F38C56BE60D@mercury.lcs.mit.edu> > From: rick jones > It was my understanding that congestion collapse was defined based on > the effective throughput (goodput?) dropping to epsilon while the link > utilization was 100%. Well, I think that general concept may be a good thing to have in there (the link getting little real work done, even though its utilization is high), but two caveats. First, I do think the definition of CC needs to say something about what's _causing_ that situation (because it's probably possible to have other causes of epsilon goodput with 100% utilization) - and for CC, it is 'excess offered load' that is the cause. Second, I'm not sure it has to be as low as epsilon; I'd say that a link which which is 100% used, but only 20% goodput, is in in something very close to congestive collapse. OK, so maybe 'total congestive collapse' is epsilon goodput, but then what is, say, only 20% goodput? Maybe we should just throw a generic term at it, rather than a specific level; perhaps something like say 'little or no productive throughput with high utilization'. > I would assert, without any additional substantiation, that unless the > effective throughput goes to epsilon, it isn't "congestion collapse" > and one would indeed need another term. Now that I think about it, David's scenario is quite interesting, because it's not clear that any packets are being uselessly retransmitted or discarded. In other words, in David's case, goodput may actually be quite high (as in, all the packets will eventually be accepted and processed at their destination); it's just that the packets are _so_ delayed that the human (or automaton) users have given up. So once again we circle back to the definition of 'the offered load is so high, above and beyond the capacity of the network, that the network can't provide a useful level of service'. But is that 'congestive collapse', or just plain old 'way overloaded'? So maybe we need to have some definition in terms of queue lengths and/or drop rates: 'when because of excess offered load, either i) queue lengths get ridiculously long, or ii) drop rates get very high, the network is in congestive collapse'. Noel From Jon.Crowcroft at cl.cam.ac.uk Tue Sep 8 21:34:04 2009 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Wed, 09 Sep 2009 05:34:04 +0100 Subject: [e2e] congestion collapse definition In-Reply-To: <20090909025805.D6ACB6BE609@mercury.lcs.mit.edu> References: <20090909025805.D6ACB6BE609@mercury.lcs.mit.edu> Message-ID: yes ++ i recall being taught about this pre-internet and when (just pre 1988 tcp fixen) load increase leads to fall in throughput (as in aloha, csma, or any contended media) due to more time being spent contending, and colliding, than in resolving conention and getting a "slot" - whether the slot is on media, or in a buffer the original religion was that under such loads, one should switch to reservation schemes... but then the control theory approach that kk/raj jain, van et al introduced, led to graceful asymptote to some operating point with maximum goodput, rather than collapse, was a shift in religion - some design rules emerged for parameters (like buffer sizing, packet sizes, media loss due to non congestive reasons) that would then allow these control feedback schemes to work - otherwise you'd still get collapse the "buffer madness" turned out to be a new religion...plus there are assumptions about the balance of traffic matrix and the rate of change of number of flows (i.e. what had been 2nd order effects) that are violated in real world scenarios so that you still get congestive collapse - i.e. correlated flows (flash crowds) and accidental topological ddos attacks (which the underprvisioned 3g access might be yet another example of?) and all compounded by mis-construing the desgn rules... i.e. we've moved on to the next set of problems...:) In missive <20090909025805.D6ACB6BE609 at mercury.lcs.mit.edu>, Noel Chiappa typed: >> > From: "David P. Reed" >> >> > If I look at Wikipedia, for example, at the definition of congestion >> > collapse there, it says that CC is characterized by large buffering >> > delays AND lost packets. >> >>I was responsible for the original version of that page, so any faults in it >>can probably be laid at my door... :-) >> >>When writing it, I was probably thinking of the examples of congestive >>collapse I had seen (e.g. the ARPANet TCP email meltdown, TFTP Sorcerer's >>Apprentice Syndrome, etc) in which high packet drop rates were seen. Of >>course, that was 'back in the day', when routers had a _lot_ less buffering >>than they do now, so the symptoms naturally differed somewhat. >> >> > should this phenomenon be included in "congestion collapse" (I believe >> > so), or should we invent a new more specific name (Buffer Madness?). >> >>Both, I think. >> >>In general terms, they are both examples of the same basic concept; people >>give the network a lot more traffic than it can handle, and as a result things >>turn to total alimentary byproduct. To me, the term 'congestive collapse' is a >>perfect term for that general situation, because the network is, well, >>congested; so congested it effectively ceases to function. (Ceases to function >>effectively?) Congestion + failure -> congestive collapse. (Somehow >>'congestive failure' doesn't have the same ring - probably the alliteration.) >> >>At the same time, they are interestingly different (in terms of the >>symptoms), so this subvariant (which we will probably be seeing more of, >>until the lessons gets into the global conciousness) could probably use a >>specific term. 'Buffer Madness' sounds good to me; another possibility is >>some term related to blockage of the alimentary canal, where stuff just keeps >>building up - but my brain won't cough up, right at this moment, the term I'm >>thinking of. >> >> Noel cheers jon From william.allen.simpson at gmail.com Wed Sep 9 01:00:46 2009 From: william.allen.simpson at gmail.com (William Allen Simpson) Date: Wed, 09 Sep 2009 04:00:46 -0400 Subject: [e2e] 64-bit timestamps? In-Reply-To: <4AA6FDA1.9000307@reed.com> References: <4AA6CF5E.7080707@gmail.com> <4AA6D9A5.6040601@reed.com> <4AA6DDEF.4080907@gmail.com> <4AA6FDA1.9000307@reed.com> Message-ID: <4AA760AE.3030806@gmail.com> David P. Reed wrote: > On 09/08/2009 06:42 PM, William Allen Simpson wrote: >> David P. Reed wrote: >>> In regard to DNS security issues, ... But PAWS may not be useful, >>> since DNS itself might be made to maintain state across connections, >>> moving the problem out of TCP and into the app (DNS) layer where it >>> probably belongs. >>> >> This has no relation to the question that I asked, which has no mention >> what-so-ever about DNS security. Nor did I find the cut and paste of an >> old familiar RFC appendix particularly informative, not even in fancy >> multi-part alternative html (instead of the native format).... >> >> In any case, I've been paying attention to the more recent 1323bis. >> > List moderator - please suspend Simpson's privileges; the rules suggest > that his obnoxious behavior towards a helpful comment demand moderation > until he stops behaving this way. > Or vice versa. Must Mr. Reed "poison the well" of every discussion? I've fixed his top-posting (again), and his multipart alternative (again). The comment was *not* helpful, as it bore no relation to the query, and his proof by assertion is not good argument. Nor is his condescension appropriate behavior. Going forward, I'll do my best to ignore his posts, as long as they don't interfere with discussion on the merits. From william.allen.simpson at gmail.com Wed Sep 9 01:18:50 2009 From: william.allen.simpson at gmail.com (William Allen Simpson) Date: Wed, 09 Sep 2009 04:18:50 -0400 Subject: [e2e] 64-bit timestamps? In-Reply-To: <5D7E69F3-8AD2-4DBF-9A65-ABF6B8482A52@nokia.com> References: <4AA6CF5E.7080707@gmail.com> <4AA6D9A5.6040601@reed.com> <5D7E69F3-8AD2-4DBF-9A65-ABF6B8482A52@nokia.com> Message-ID: <4AA764EA.7030505@gmail.com> Lars Eggert wrote: > On 2009-9-9, at 1:24, David P. Reed wrote: >> In regard to DNS security issues, I suggest reading Appendix B of RFC >> 1323 on whether PAWS helps. (I quote B.2 below). > > FYI, the TCPM working group is currently working on an update to RFC > 1323 (http://tools.ietf.org/html/draft-ietf-tcpm-1323bis) and would be > interested in receiving feedback on the current draft. > Thank you, I've already indicated that I'm aware of that draft (although the other poster apparently was not). Nothing there discusses 64-bit timestamps. Anyway, looking at the existing code, it seems relatively easy expanding to 64-bit timestamps by zeroing the first 32 bits. Perhaps in the future somebody will find the extension useful. The negotiation is relatively straightforward. carries 32-bits plus another 32-bits of zero (as usual). carries a full 64-bit timestamp, and the original sender merely calculates RTT from its own saved timestamp in the old-fashioned way (Karn's algorithm). +data carries two full 64-bit timestamps, easily distinguished. From dpreed at reed.com Wed Sep 9 06:04:22 2009 From: dpreed at reed.com (David P. Reed) Date: Wed, 09 Sep 2009 09:04:22 -0400 Subject: [e2e] [unclassified] Re: 64-bit timestamps? In-Reply-To: <4AA760AE.3030806@gmail.com> References: <4AA6CF5E.7080707@gmail.com> <4AA6D9A5.6040601@reed.com> <4AA6DDEF.4080907@gmail.com> <4AA6FDA1.9000307@reed.com> <4AA760AE.3030806@gmail.com> Message-ID: <4AA7A7D6.5010104@reed.com> [Top posting because it makes sense in this case, so one does not have to dig through intercalated replies as one follows the top level of a thread. And multi-part miming because it is the most polite to the most diverse crowd of mail readers, as well as being a *standard* (Dave Crocker will verify this).] Clearly I made a mistake in guessing what makes the introduction of 64-bit TSOPT worthy of a query. So let me avoid guessing why this was asked, since the only explanation offered by the author of the query was Instead, I will ask the question that should be asked of every proposal: what problem would this solve, does it solve it fully, and is it a problem that is best answered by leaving it to the application in question? (this is the end-to-end argument's core question - one that should be considered in every network protocol design). AFAICT from the author's brief posting of his query, the rationale was a reference to "back in the day there was an expressed interest" and "would it be of some utility". Neither of these are problem statements sufficient to justify a change that affects all TCP stacks on all platforms. If the author is seeking help, perhaps he will grace us with an explanation of a problem that this proposal would be the best approach to solve? I would point out that the sequence space and PAWS tend to fall into the category of "functions" that do not fully solve applications problems, but are merely "optimizations". The text in RFC 1323 alludes to exactly this point in the cited Appendix B (which I will not requote for the readers' convenience here, given the authors' rage at my use of an email to cite a key section of design rationale). On 09/09/2009 04:00 AM, William Allen Simpson wrote: > David P. Reed wrote: >> On 09/08/2009 06:42 PM, William Allen Simpson wrote: >>> David P. Reed wrote: >>>> In regard to DNS security issues, ... But PAWS may not be useful, >>>> since DNS itself might be made to maintain state across >>>> connections, moving the problem out of TCP and into the app (DNS) >>>> layer where it probably belongs. >>>> >>> This has no relation to the question that I asked, which has no mention >>> what-so-ever about DNS security. Nor did I find the cut and paste >>> of an >>> old familiar RFC appendix particularly informative, not even in fancy >>> multi-part alternative html (instead of the native format).... >>> >>> In any case, I've been paying attention to the more recent 1323bis. >>> >> List moderator - please suspend Simpson's privileges; the rules >> suggest that his obnoxious behavior towards a helpful comment demand >> moderation until he stops behaving this way. >> > Or vice versa. Must Mr. Reed "poison the well" of every discussion? > > I've fixed his top-posting (again), and his multipart alternative > (again). > > The comment was *not* helpful, as it bore no relation to the query, and > his proof by assertion is not good argument. Nor is his condescension > appropriate behavior. > > Going forward, I'll do my best to ignore his posts, as long as they don't > interfere with discussion on the merits. > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090909/cfb343f2/attachment.html From detlef.bosau at web.de Wed Sep 9 07:09:08 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 09 Sep 2009 16:09:08 +0200 Subject: [e2e] congestion collapse definition In-Reply-To: <4AA6D511.7000300@reed.com> References: <4AA6D511.7000300@reed.com> Message-ID: <4AA7B704.5030606@web.de> David P. Reed wrote: > Folks, I tend not to use the term "congestion collapse", though it is > in common use in the Internet community. > > The phenomenon I've been experiencing on the other thread about AT&T > 3G data access network configuration on this list, if I'm correct (as > I'm pretty sure I am) should probably be called "congestion collapse", > or else we need a new term. > > The phenomenon observed in Comcast's debacle with DOCSIS upstream > buffering should be called by the same term - again, buffering is > allowed to build on a shared queue carrying diverse traffic, without > providing any feedback that can be recognized by TCP's rate control > loop, leading to positive feedback and uncontrolled delay. > Hm. I think, you encounter two problems. 1.: Overbuffering. (Which you frequently mention yourself in the context of wireless networks.) 2.: And I have to guess here because I don't have any first hand experience with DOCSIS, the MAC algorithm used in DOCSIS or by Comcast respectively. > If I look at Wikipedia, for example, at the definition of congestion > collapse there, it says that CC is characterized by large buffering > delays AND lost packets. However, in the Comcast and ATT cases here, > the queues get so obnoxiously long (5-10 seconds) that users > presumably give up running apps long before packet *loss* sets in due > to overflow. Isn't this a clear symptom of overbuffering? Particularly, as TCP is likely to see quite a number of RTO backoffs when starting a session? This reminds me of the difficulty of TCP in establishing a proper ACK clock in wireless networks in case of "short time disconnections", which Lachlan mentioned some weeks ago. However, when a queuing delay in wirebound networs grows to 5 or 10 seconds, the queue is obviously dimensioned far too large. > Back to my question: should this phenomenon be included in "congestion > collapse" (I believe so), or should we invent a new more specific name > (Buffer Madness?). It is surely not congestion collapse, because the traffic growth in CC is caused by extensive retransmissions. So, CC is basically self induced. More precisely: The traffic growth is self induced and the resulting packet loss is self induced. When, if, there is something self induced in your problem, this is the growth of the RTO. However, this is hardly a real self induction. I think "buffer madness" or "extensive buffering syndrome" would be suitable. Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From detlef.bosau at web.de Wed Sep 9 07:11:51 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 09 Sep 2009 16:11:51 +0200 Subject: [e2e] congestion collapse definition In-Reply-To: <4AA7B704.5030606@web.de> References: <4AA6D511.7000300@reed.com> <4AA7B704.5030606@web.de> Message-ID: <4AA7B7A7.30304@web.de> "excessive buffering syndrome" would be better. -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From perfgeek at mac.com Wed Sep 9 08:19:25 2009 From: perfgeek at mac.com (rick jones) Date: Wed, 09 Sep 2009 08:19:25 -0700 Subject: [e2e] congestion collapse definition In-Reply-To: <4AA7B7A7.30304@web.de> References: <4AA6D511.7000300@reed.com> <4AA7B704.5030606@web.de> <4AA7B7A7.30304@web.de> Message-ID: <80194711-BE01-42F2-9A37-9C717845EB4A@mac.com> On Sep 9, 2009, at 7:11 AM, Detlef Bosau wrote: > "excessive buffering syndrome" would be better. I'll add my stripe to the bikeshed and suggest: congestive queueing which conveys the queueing, strikes fear of congestion, but stops short of declaring collapse since, if I understand correctly, we are in a situation where the queue does not overflow (much). rick jones there is no rest for the wicked, yet the virtuous have no pillows From detlef.bosau at web.de Wed Sep 9 08:41:18 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 09 Sep 2009 17:41:18 +0200 Subject: [e2e] congestion collapse definition In-Reply-To: <80194711-BE01-42F2-9A37-9C717845EB4A@mac.com> References: <4AA6D511.7000300@reed.com> <4AA7B704.5030606@web.de> <4AA7B7A7.30304@web.de> <80194711-BE01-42F2-9A37-9C717845EB4A@mac.com> Message-ID: <4AA7CC9E.7040106@web.de> rick jones wrote: > > On Sep 9, 2009, at 7:11 AM, Detlef Bosau wrote: >> "excessive buffering syndrome" would be better. > > I'll add my stripe to the bikeshed and suggest: > > congestive queueing Not quite, because filling available buffer space does not cause congestion. However, excessive buffering may cause extremely large round trip times. And particularly, it makes the user believe in large "bandwidth delay products". With all the known consequences, e.g. receiver window scaling and long times for competitive flows to converge to equilibrium etc. Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From detlef.bosau at web.de Thu Sep 10 09:23:37 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 10 Sep 2009 18:23:37 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> Message-ID: <4AA92809.8010004@web.de> Dominik Kaspar wrote: > > A CDF illustration of the results is available here: > http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png > > What is the reason of these two modes? Is it caused by adaptive > modulation and coding on the physical layer? If so, why does it affect > the delay so much? I would only expect a reduced bandwidth, but not > much change in delay... > > I'm a bit curious about this discussion. I really thought, these things were understood, and it's only me, who doen't know the literature. A remarkable property of HSDPA is that service times may vary on an _extremely_ large range. This is due to variations in - line coding, - channel coding, - transport block length, i.e. code usage and puncturing and - MAC delay. For quite a few days now, I'm thinking on whether this delay variation may even affect the algorithms for TCP RTO calculation (refer to Edge's paper and its assumptions). I'm not surprised about a multimodal behaviour here. If, I would be surprised to see _only_ two modes here. To my knowledge, WLAN uses only two line codings (HSDPA and the like may use three, is this correct? QPSK, 16 QAM and sometimes even 64 QAM?), however, there is less variation in channel coding and puncturing etc. like in HSDPA. I gathered some results from the EURANE project and related research projects on http://www.detlef-bosau.de/index.php?select=symbols in order to have an overview, at least for myself, about the delay variation in HSDPA. Please note, that even a transport block's "Payload" may vary from 176 to 21576 bits. And HSDPA may repeat a transport block up to three times. So, without MAC and propagation latency, the HSDPA throughput (for _code_ bits, not even for _information_ bits) may vary from 21576 bits / 2 ms = 10798000 bits/second downto 176 bits / 6 ms (eq. three sending attemps) = 29333 bits/s. This is a factor of nearly 369. And I neglected - any propagation delay, - the times used for ACKs/NAKs on L2, - MAC delays. If you think about up to 8 terminals in a cell, you may well see a certain throughput (gross) (? I always mix up gross and net bit rate here, I'm German, our chancelor always gets confused with gross and net....) at one time, and a 2500 times larger one at another time. And of course, you're likely to see anything in between. So, it's basically somewhat astonishing to see _only_ two modes here.... Do we have more precise information about the scenario? Detlef > Greetings, > Dominik > > > On Tue, Sep 8, 2009 at 7:56 PM, David P. Reed wrote: > >> I should not have been so cute - I didn't really want to pick on the >> operator involved, because I suspect that other 3G operators around the >> world probably use the same equipment and same rough configuration. >> >> The ping and traceroute were from Chicago, using an ATT Mercury data modem, >> the same channel as the Apple iPhones use, but it's much easier to run test >> suites from my netbook. >> >> Here's the same test from another time of day, early Sunday morning, when >> things were working well. >> >> Note that I ran the test over the entire labor day weekend at intervals. >> The end-to-end ping time was bimodal. Either it pegged at over 5000 >> milliseconds, or happily sat at under 200 milliseconds. Exactly what one >> would expect if TCP congestion control were disabled by overbuffering in a >> router preceding the bottleneck link shared by many users. >> >> ------------------------------ >> >> $ ping lcs.mit.edu >> PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 >> time=209 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 >> time=118 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 >> time=166 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 >> time=165 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 >> time=224 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 >> time=183 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 >> time=224 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 >> time=181 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 >> time=220 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 >> time=179 ms >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 >> time=219 ms >> ^C >> --- lcs.mit.edu ping statistics --- >> 11 packets transmitted, 11 received, 0% packet loss, time 10780ms >> rtt min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms >> $ traceroute lcs.mit.edu >> traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets >> 1 * * * >> 2 172.26.248.2 (172.26.248.2) 178.725 ms 178.568 ms 179.500 ms >> 3 * * * >> 4 172.16.192.34 (172.16.192.34) 187.794 ms 187.677 ms 207.527 ms >> 5 12.88.7.205 (12.88.7.205) 207.416 ms 208.325 ms 69.630 ms >> 6 cr84.cgcil.ip.att.net (12.122.152.134) 79.425 ms 89.227 ms 90.083 ms >> 7 cr2.cgcil.ip.att.net (12.123.7.250) 98.679 ms 90.727 ms 91.576 ms >> 8 ggr2.cgcil.ip.att.net (12.122.132.137) 72.728 ms 89.628 ms 88.825 ms >> 9 192.205.33.186 (192.205.33.186) 89.787 ms 89.794 ms 80.918 ms >> 10 ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) 79.895 ms 70.927 ms >> 78.817 ms >> 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 107.820 ms 156.892 ms >> 140.711 ms >> 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 139.638 ms 139.764 ms >> 129.853 ms >> 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 149.595 ms 154.366 ms >> 152.225 ms >> 14 B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) 146.808 ms 129.801 ms 89.659 >> ms >> 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 109.463 ms 118.818 ms 91.727 >> ms >> 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 91.541 ms 88.768 ms >> 85.837 ms >> 17 zermatt.csail.mit.edu (128.30.2.121) 117.581 ms 116.564 ms 103.569 ms >> $ >> >> >> >> > > -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From dokaspar.ietf at gmail.com Thu Sep 10 17:37:35 2009 From: dokaspar.ietf at gmail.com (Dominik Kaspar) Date: Fri, 11 Sep 2009 02:37:35 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA6E3AC.9060404@reed.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> Message-ID: <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> Hi David, Thanks for the explanations about the bottleneck link to the backbone ISP. The illustrated system architecture and the overuse of buffers certainly sounds like reasonable cause for those huge delays you have posted at the beginning of this thread. The "bimodal" behaviour of delays > 5000 ms and delays < 200 ms that you have measured is really extreme and it seems to differ somewhat from what I have observed. In my experiments, the delay abruptly switches between two rather stable "modes"... sometimes every few minutes, sometimes just once a day. It is completely unpredictable and I have not yet found _the_ explanation for its cause. I doubt it has anything to do with TCP... it seems much more likely to be one of the HSDPA-specific properties that Detlef has pointed out (line coding, MAC-layer ACKs, ...). Here is the entire 24h ping log that clearly illustrates the two "modes": http://home.simula.no/~kaspar/static/ping-hsdpa-24h-bimodal-00.txt Greetings, Dominik On Wed, Sep 9, 2009 at 1:07 AM, David P. Reed wrote: > I'm willing to bet that you are seeing the same problem I am, and that it > has nothing to do with the modem or wireless protocol. > > Instead you are seeing what would happen if you simulate in ns2 the > following system structure: > > -------------------------\ > --------------------------\ > ---------------------------\ > ??? ? wireless medium?? [WIRELESS HUB]------[ROUTER]-----------backbone ISP > ---------------------------/ > --------------------------/ > > When the link between the ROUTER and backbone ISP is of lower bitrate B than > the sum of all the realizable simultaneous uplink demand from devices on the > left, the outbound queue of the router is of size M > BT where T is the > observed stable long delay, and the ROUTER does nothing to signal congestion > until the entire M bytes (now very large) of memory are exhausted. > > Memory is now very cheap, and not-very-clueful network layer 2 designers > (who don't study TCP or the Internet) are likely to throw too much at the > problem without doing the right thing in their firmware. > > On 09/08/2009 06:47 PM, Dominik Kaspar wrote: > > Hello David, > > You mentioned the bimodal behaviour of your 3G connection. I recently > noticed the same thing but have not yet been able to explain why this > happens. > > I also ran Ping tests over multiple days using an HSDPA modem (with > both the client and server located in Oslo, Norway). The experienced > RTTs were very stable over short periods of time, but sometimes they > averaged around 80ms, while at other times the average was at about > 300ms. > > A CDF illustration of the results is available here: > http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png > > What is the reason of these two modes? Is it caused by adaptive > modulation and coding on the physical layer? If so, why does it affect > the delay so much? I would only expect a reduced bandwidth, but not > much change in delay... > > Greetings, > Dominik > > > On Tue, Sep 8, 2009 at 7:56 PM, David P. Reed wrote: > > > I should not have been so cute - I didn't really want to pick on the > operator involved, because I suspect that other 3G operators around the > world probably use the same equipment and same rough configuration. > > The ping and traceroute were from Chicago, using an ATT Mercury data modem, > the same channel as the Apple iPhones use, but it's much easier to run test > suites from my netbook. > > Here's the same test from another time of day, early Sunday morning, when > things were working well. > > Note that I ran the test over the entire labor day weekend at intervals. > The end-to-end ping time was bimodal. ?Either it pegged at over 5000 > milliseconds, or happily sat at under 200 milliseconds. ? Exactly what one > would expect if TCP congestion control were disabled by overbuffering in a > router preceding the bottleneck link shared by many users. > > ------------------------------ > > $ ping lcs.mit.edu > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > time=209 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > time=118 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 > time=166 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 > time=165 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 > time=224 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 > time=183 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 > time=224 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 > time=181 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 > time=220 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 > time=179 ms > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 > time=219 ms > ^C > --- lcs.mit.edu ping statistics --- > 11 packets transmitted, 11 received, 0% packet loss, time 10780ms > rtt min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms > $ traceroute lcs.mit.edu > traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets > ?1 ?* * * > ?2 ?172.26.248.2 (172.26.248.2) ?178.725 ms ?178.568 ms ?179.500 ms > ?3 ?* * * > ?4 ?172.16.192.34 (172.16.192.34) ?187.794 ms ?187.677 ms ?207.527 ms > ?5 ?12.88.7.205 (12.88.7.205) ?207.416 ms ?208.325 ms ?69.630 ms > ?6 ?cr84.cgcil.ip.att.net (12.122.152.134) ?79.425 ms ?89.227 ms ?90.083 ms > ?7 ?cr2.cgcil.ip.att.net (12.123.7.250) ?98.679 ms ?90.727 ms ?91.576 ms > ?8 ?ggr2.cgcil.ip.att.net (12.122.132.137) ?72.728 ms ?89.628 ms ?88.825 ms > ?9 ?192.205.33.186 (192.205.33.186) ?89.787 ms ?89.794 ms ?80.918 ms > 10 ?ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) ?79.895 ms ?70.927 ms > ?78.817 ms > 11 ?ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) ?107.820 ms ?156.892 ms > ?140.711 ms > 12 ?ae-7-7.car1.Boston1.Level3.net (4.69.132.241) ?139.638 ms ?139.764 ms > ?129.853 ms > 13 ?MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) ?149.595 ms ?154.366 ms > ?152.225 ms > 14 ?B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) ?146.808 ms ?129.801 ms ?89.659 > ms > 15 ?MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) ?109.463 ms ?118.818 ms ?91.727 > ms > 16 ?trantor.kalgan.csail.mit.edu (128.30.0.246) ?91.541 ms ?88.768 ms > ?85.837 ms > 17 ?zermatt.csail.mit.edu (128.30.2.121) ?117.581 ms ?116.564 ms ?103.569 ms > $ > > > > > > From zartash at lums.edu.pk Thu Sep 10 20:43:09 2009 From: zartash at lums.edu.pk (zartash) Date: Fri, 11 Sep 2009 09:43:09 +0600 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AA92809.8010004@web.de> Message-ID: <2e798b0c-c885-43fd-be7e-68625de0d00d@exchht01.lums.net> > -----Original Message----- > From: end2end-interest-bounces at postel.org [mailto:end2end-interest- > bounces at postel.org] On Behalf Of Detlef Bosau > Sent: Thursday, September 10, 2009 10:24 PM > To: end2end-interest at postel.org > Cc: David P. Reed > Subject: Re: [e2e] What's wrong with this picture? > > Dominik Kaspar wrote: > > > > A CDF illustration of the results is available here: > > http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png > > > > What is the reason of these two modes? Is it caused by adaptive > > modulation and coding on the physical layer? If so, why does it affect > > the delay so much? I would only expect a reduced bandwidth, but not > > much change in delay... > > > > > > I'm a bit curious about this discussion. I really thought, these things > were understood, and it's only me, who doen't know the literature. > > A remarkable property of HSDPA is that service times may vary on an > _extremely_ large range. > > This is due to variations in > - line coding, [ZAU] You mean the modulation, and not line coding? In any case, I am surprised that a change in modulation would change the delay (or service time, for that matter), by so much. By a variation in modulation, I would expect a change in data rate but not in the delay. Is there a study that you are referring to? > - channel coding, > - transport block length, i.e. code usage and puncturing and > - MAC delay. > > For quite a few days now, I'm thinking on whether this delay variation > may even affect the algorithms for TCP RTO calculation (refer to Edge's > paper and its assumptions). > > I'm not surprised about a multimodal behaviour here. If, I would be > surprised to see _only_ two modes here. [ZAU] If I assume that the change in modulation does, in fact, impact the delay (for whatever reason!), then I will NOT be surprised to see only two modes here. There are only three possible modulations in HSDPA and it is not unusual that only two of them are used. This is because HSDPA uses a particular modulation based on channel conditions, and it is okay to not observe extreme variations in the channel conditions. It would be nice if the experiment is repeated when the laptop with HSDPA card moved around -- but still no guarantee that all the three modulations will get a chance to be utlized. This all assumes that modulation affects the delay significantly, something I am still unable to make peace with. > To my knowledge, WLAN uses only two line codings (HSDPA and the like may > use three, is this correct? QPSK, 16 QAM and sometimes even 64 QAM?), > however, there is less variation in channel coding and puncturing etc. > like in > HSDPA. [ZAU] At one time, WLAN uses one of the two modulation schemes (well, 11g to be precise!) which are DSSS and OFDM but a number of connection speeds on each of these modulations. And yes, HSDPA uses one of the three modulation schemes. In both cases, the modulation scheme and data rates at a given point in time are selected based on the channel conditions. > I gathered some results from the EURANE project and related research > projects on http://www.detlef-bosau.de/index.php?select=symbols > in order to have an overview, at least for myself, about the delay > variation in HSDPA. > > Please note, that even a transport block's "Payload" may vary from 176 > to 21576 bits. And HSDPA may repeat a transport block up to three times. > So, without MAC and propagation latency, the HSDPA throughput (for > _code_ bits, not even for _information_ bits) may vary from 21576 bits / > 2 ms = 10798000 bits/second downto 176 bits / 6 ms (eq. three sending > attemps) = 29333 bits/s. This is a factor of nearly 369. > > And I neglected > - any propagation delay, > - the times used for ACKs/NAKs on L2, > - MAC delays. > > If you think about up to 8 terminals in a cell, you may well see a > certain throughput (gross) (? I always mix up gross and net bit rate > here, I'm German, our chancelor always gets confused with gross and > net....) at one time, and a 2500 times larger one at another time. > > And of course, you're likely to see anything in between. So, it's > basically somewhat astonishing to see _only_ two modes here.... [ZAU] I am a bit confused here, are we mixing up throughput and delay here? Best regards, Zartash > Do we have more precise information about the scenario? > > Detlef > > > > Greetings, > > Dominik > > > > > > On Tue, Sep 8, 2009 at 7:56 PM, David P. Reed wrote: > > > >> I should not have been so cute - I didn't really want to pick on the > >> operator involved, because I suspect that other 3G operators around the > >> world probably use the same equipment and same rough configuration. > >> > >> The ping and traceroute were from Chicago, using an ATT Mercury data > modem, > >> the same channel as the Apple iPhones use, but it's much easier to run > test > >> suites from my netbook. > >> > >> Here's the same test from another time of day, early Sunday morning, > when > >> things were working well. > >> > >> Note that I ran the test over the entire labor day weekend at > intervals. > >> The end-to-end ping time was bimodal. Either it pegged at over 5000 > >> milliseconds, or happily sat at under 200 milliseconds. Exactly what > one > >> would expect if TCP congestion control were disabled by overbuffering > in a > >> router preceding the bottleneck link shared by many users. > >> > >> ------------------------------ > >> > >> $ ping lcs.mit.edu > >> PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=1 ttl=44 > >> time=209 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=2 ttl=44 > >> time=118 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=3 ttl=44 > >> time=166 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=4 ttl=44 > >> time=165 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=5 ttl=44 > >> time=224 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=6 ttl=44 > >> time=183 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=7 ttl=44 > >> time=224 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=8 ttl=44 > >> time=181 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=9 ttl=44 > >> time=220 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=10 ttl=44 > >> time=179 ms > >> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): icmp_seq=11 ttl=44 > >> time=219 ms > >> ^C > >> --- lcs.mit.edu ping statistics --- > >> 11 packets transmitted, 11 received, 0% packet loss, time 10780ms > >> rtt min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms > >> $ traceroute lcs.mit.edu > >> traceroute to lcs.mit.edu (128.30.2.121), 30 hops max, 60 byte packets > >> 1 * * * > >> 2 172.26.248.2 (172.26.248.2) 178.725 ms 178.568 ms 179.500 ms > >> 3 * * * > >> 4 172.16.192.34 (172.16.192.34) 187.794 ms 187.677 ms 207.527 ms > >> 5 12.88.7.205 (12.88.7.205) 207.416 ms 208.325 ms 69.630 ms > >> 6 cr84.cgcil.ip.att.net (12.122.152.134) 79.425 ms 89.227 ms > 90.083 ms > >> 7 cr2.cgcil.ip.att.net (12.123.7.250) 98.679 ms 90.727 ms 91.576 > ms > >> 8 ggr2.cgcil.ip.att.net (12.122.132.137) 72.728 ms 89.628 ms > 88.825 ms > >> 9 192.205.33.186 (192.205.33.186) 89.787 ms 89.794 ms 80.918 ms > >> 10 ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) 79.895 ms 70.927 > ms > >> 78.817 ms > >> 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 107.820 ms 156.892 > ms > >> 140.711 ms > >> 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 139.638 ms 139.764 > ms > >> 129.853 ms > >> 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 149.595 ms > 154.366 ms > >> 152.225 ms > >> 14 B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) 146.808 ms 129.801 ms > 89.659 > >> ms > >> 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 109.463 ms 118.818 ms > 91.727 > >> ms > >> 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 91.541 ms 88.768 ms > >> 85.837 ms > >> 17 zermatt.csail.mit.edu (128.30.2.121) 117.581 ms 116.564 ms > 103.569 ms > >> $ > >> > >> > >> > >> > > > > > > > -- > Detlef Bosau Galileistra?e 30 70565 Stuttgart > phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau > ICQ: 566129673 http://detlef.bosau at web.de > From stiliadi at alcatel-lucent.com Fri Sep 11 06:24:15 2009 From: stiliadi at alcatel-lucent.com (Stiliadis, Dimitrios (Dimitri)) Date: Fri, 11 Sep 2009 08:24:15 -0500 Subject: [e2e] What's wrong with this picture? In-Reply-To: <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> Message-ID: btw, the bimodal can be also explained if your card is attaching to two different cells (or bands), depending on signal conditions/fading etc. (i.e. it oscillates between cell A and cell B or band A and band B). Also, it is not rare to see that the device will switch between HSDPA and EDGE if the signal conditions on HSDPA are marginal. Several tests have shown that EDGE delays of 500ms are normal. In order to fully understand the issue, you need to also log all the modem information (such as cell ID, band that it is attached etc.etc.), or you need to perform the test in a anechoic chamber with test equipment. Usually you can do that, by issuing the right AT commands to the modem. David: As for the uplink bandwidth bottleneck, the problem there other than larger router buffers is that ARQ interprets losses on the wire as losses on the air, and tries retransmissions. Your picture is not accurate without an RNC in the middle that implements some form of ARQ/RLP There are just two many things going on in the background, that pings alone cannot show. Cheers, Dimitri > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org] On Behalf Of > Dominik Kaspar > Sent: Thursday, September 10, 2009 8:38 PM > To: David P. Reed > Cc: end2end-interest at postel.org > Subject: Re: [e2e] What's wrong with this picture? > > Hi David, > > Thanks for the explanations about the bottleneck link to the > backbone ISP. The illustrated system architecture and the > overuse of buffers certainly sounds like reasonable cause for > those huge delays you have posted at the beginning of this thread. > > The "bimodal" behaviour of delays > 5000 ms and delays < 200 > ms that you have measured is really extreme and it seems to > differ somewhat from what I have observed. In my experiments, > the delay abruptly switches between two rather stable > "modes"... sometimes every few minutes, sometimes just once a > day. It is completely unpredictable and I have not yet found > _the_ explanation for its cause. I doubt it has anything to > do with TCP... it seems much more likely to be one of the > HSDPA-specific properties that Detlef has pointed out (line > coding, MAC-layer ACKs, ...). > > Here is the entire 24h ping log that clearly illustrates the > two "modes": > http://home.simula.no/~kaspar/static/ping-hsdpa-24h-bimodal-00.txt > > Greetings, > Dominik > > > On Wed, Sep 9, 2009 at 1:07 AM, David P. Reed wrote: > > I'm willing to bet that you are seeing the same problem I > am, and that > > it has nothing to do with the modem or wireless protocol. > > > > Instead you are seeing what would happen if you simulate in ns2 the > > following system structure: > > > > -------------------------\ > > --------------------------\ > > ---------------------------\ > > ??? ? wireless medium?? [WIRELESS > > HUB]------[ROUTER]-----------backbone ISP > ---------------------------/ > > --------------------------/ > > > > When the link between the ROUTER and backbone ISP is of > lower bitrate > > B than the sum of all the realizable simultaneous uplink > demand from > > devices on the left, the outbound queue of the router is of > size M > > > BT where T is the observed stable long delay, and the ROUTER does > > nothing to signal congestion until the entire M bytes (now > very large) of memory are exhausted. > > > > Memory is now very cheap, and not-very-clueful network layer 2 > > designers (who don't study TCP or the Internet) are likely to throw > > too much at the problem without doing the right thing in > their firmware. > > > > On 09/08/2009 06:47 PM, Dominik Kaspar wrote: > > > > Hello David, > > > > You mentioned the bimodal behaviour of your 3G connection. > I recently > > noticed the same thing but have not yet been able to > explain why this > > happens. > > > > I also ran Ping tests over multiple days using an HSDPA modem (with > > both the client and server located in Oslo, Norway). The > experienced > > RTTs were very stable over short periods of time, but > sometimes they > > averaged around 80ms, while at other times the average was at about > > 300ms. > > > > A CDF illustration of the results is available here: > > http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png > > > > What is the reason of these two modes? Is it caused by adaptive > > modulation and coding on the physical layer? If so, why > does it affect > > the delay so much? I would only expect a reduced bandwidth, but not > > much change in delay... > > > > Greetings, > > Dominik > > > > > > On Tue, Sep 8, 2009 at 7:56 PM, David P. > Reed wrote: > > > > > > I should not have been so cute - I didn't really want to > pick on the > > operator involved, because I suspect that other 3G operators around > > the world probably use the same equipment and same rough > configuration. > > > > The ping and traceroute were from Chicago, using an ATT > Mercury data > > modem, the same channel as the Apple iPhones use, but it's > much easier > > to run test suites from my netbook. > > > > Here's the same test from another time of day, early Sunday > morning, > > when things were working well. > > > > Note that I ran the test over the entire labor day weekend > at intervals. > > The end-to-end ping time was bimodal. ?Either it pegged at > over 5000 > > milliseconds, or happily sat at under 200 milliseconds. ? > Exactly what > > one would expect if TCP congestion control were disabled by > > overbuffering in a router preceding the bottleneck link > shared by many users. > > > > ------------------------------ > > > > $ ping lcs.mit.edu > > PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=1 ttl=44 > > time=209 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=2 ttl=44 > > time=118 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=3 ttl=44 > > time=166 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=4 ttl=44 > > time=165 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=5 ttl=44 > > time=224 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=6 ttl=44 > > time=183 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=7 ttl=44 > > time=224 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=8 ttl=44 > > time=181 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=9 ttl=44 > > time=220 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=10 ttl=44 > > time=179 ms > > 64 bytes from zermatt.csail.mit.edu (128.30.2.121): > icmp_seq=11 ttl=44 > > time=219 ms > > ^C > > --- lcs.mit.edu ping statistics --- > > 11 packets transmitted, 11 received, 0% packet loss, time > 10780ms rtt > > min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms $ traceroute > > lcs.mit.edu traceroute to lcs.mit.edu (128.30.2.121), 30 > hops max, 60 > > byte packets > > ?1 ?* * * > > ?2 ?172.26.248.2 (172.26.248.2) ?178.725 ms ?178.568 ms ?179.500 ms > > ?3 ?* * * > > ?4 ?172.16.192.34 (172.16.192.34) ?187.794 ms ?187.677 ms ? > 207.527 ms > > ?5 ?12.88.7.205 (12.88.7.205) ?207.416 ms ?208.325 ms ?69.630 ms > > ?6 ?cr84.cgcil.ip.att.net (12.122.152.134) ?79.425 ms ?89.227 ms ? > > 90.083 ms > > ?7 ?cr2.cgcil.ip.att.net (12.123.7.250) ?98.679 ms ?90.727 > ms ?91.576 > > ms > > ?8 ?ggr2.cgcil.ip.att.net (12.122.132.137) ?72.728 ms ?89.628 ms ? > > 88.825 ms > > ?9 ?192.205.33.186 (192.205.33.186) ?89.787 ms ?89.794 ms ? > 80.918 ms > > 10 ?ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) ?79.895 ms ? > > 70.927 ms > > ?78.817 ms > > 11 ?ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) ?107.820 > ms ?156.892 > > ms > > ?140.711 ms > > 12 ?ae-7-7.car1.Boston1.Level3.net (4.69.132.241) ?139.638 > ms ?139.764 > > ms > > ?129.853 ms > > 13 ?MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) ?149.595 ms ? > > 154.366 ms > > ?152.225 ms > > 14 ?B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) ?146.808 ms ? > 129.801 ms ? > > 89.659 ms > > 15 ?MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) ?109.463 ms ? > 118.818 ms ? > > 91.727 ms > > 16 ?trantor.kalgan.csail.mit.edu (128.30.0.246) ?91.541 ms ? > 88.768 ms > > ?85.837 ms > > 17 ?zermatt.csail.mit.edu (128.30.2.121) ?117.581 ms ?116.564 ms ? > > 103.569 ms $ > > > > > > > > > > > > > > From dpreed at reed.com Fri Sep 11 08:00:36 2009 From: dpreed at reed.com (David P. Reed) Date: Fri, 11 Sep 2009 11:00:36 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> Message-ID: <4AAA6614.9040402@reed.com> I appreciate the your explanation, Dimitrios, is an attempt to explain why this might happen at layer 2. However, the service I am using is offered as an IP (layer 3) service. It is a *requirement* that layer 3 drop packets to signal congestion buildup. Otherwise, the binding of this particular layer 2 transport (with elastic 10 second queues) is something that is just WRONG to claim as an high-speed Internet service. (except for the case where the other end is 10 light seconds away). The entire congestion control mechanism (and approximate fairness mechanism) of the Internet works on the assumption that congestion is signaled to those who are congesting the network - so they can back off, which they will in fact do. I don't know who designed this system, but I can say that any serious ISP should NOT buy such equipment if it functions this way. On 09/11/2009 09:24 AM, Stiliadis, Dimitrios (Dimitri) wrote: > btw, the bimodal can be also explained if your card is attaching > to two different cells (or bands), depending on signal conditions/fading etc. > (i.e. it oscillates between cell A and cell B or band A and band B). > Also, it is not rare to see that the device will switch > between HSDPA and EDGE if the signal conditions on HSDPA are marginal. > Several tests have shown that EDGE delays of 500ms are normal. > > In order to fully understand the issue, you need to also log all the > modem information (such as cell ID, band that it is attached etc.etc.), > or you need to perform the test in a anechoic chamber with test > equipment. Usually you can do that, by issuing the right AT commands > to the modem. > > David: > > As for the uplink bandwidth bottleneck, the problem there other than > larger router buffers is that ARQ interprets losses on the wire > as losses on the air, and tries retransmissions. Your picture > is not accurate without an RNC in the middle that implements some form > of ARQ/RLP > > There are just two many things going on in the background, that pings > alone cannot show. > > Cheers, > > Dimitri > > >> -----Original Message----- >> From: end2end-interest-bounces at postel.org >> [mailto:end2end-interest-bounces at postel.org] On Behalf Of >> Dominik Kaspar >> Sent: Thursday, September 10, 2009 8:38 PM >> To: David P. Reed >> Cc: end2end-interest at postel.org >> Subject: Re: [e2e] What's wrong with this picture? >> >> Hi David, >> >> Thanks for the explanations about the bottleneck link to the >> backbone ISP. The illustrated system architecture and the >> overuse of buffers certainly sounds like reasonable cause for >> those huge delays you have posted at the beginning of this thread. >> >> The "bimodal" behaviour of delays> 5000 ms and delays< 200 >> ms that you have measured is really extreme and it seems to >> differ somewhat from what I have observed. In my experiments, >> the delay abruptly switches between two rather stable >> "modes"... sometimes every few minutes, sometimes just once a >> day. It is completely unpredictable and I have not yet found >> _the_ explanation for its cause. I doubt it has anything to >> do with TCP... it seems much more likely to be one of the >> HSDPA-specific properties that Detlef has pointed out (line >> coding, MAC-layer ACKs, ...). >> >> Here is the entire 24h ping log that clearly illustrates the >> two "modes": >> http://home.simula.no/~kaspar/static/ping-hsdpa-24h-bimodal-00.txt >> >> Greetings, >> Dominik >> >> >> On Wed, Sep 9, 2009 at 1:07 AM, David P. Reed wrote: >> >>> I'm willing to bet that you are seeing the same problem I >>> >> am, and that >> >>> it has nothing to do with the modem or wireless protocol. >>> >>> Instead you are seeing what would happen if you simulate in ns2 the >>> following system structure: >>> >>> -------------------------\ >>> --------------------------\ >>> ---------------------------\ >>> wireless medium [WIRELESS >>> HUB]------[ROUTER]-----------backbone ISP >>> >> ---------------------------/ >> >>> --------------------------/ >>> >>> When the link between the ROUTER and backbone ISP is of >>> >> lower bitrate >> >>> B than the sum of all the realizable simultaneous uplink >>> >> demand from >> >>> devices on the left, the outbound queue of the router is of >>> >> size M> >> >>> BT where T is the observed stable long delay, and the ROUTER does >>> nothing to signal congestion until the entire M bytes (now >>> >> very large) of memory are exhausted. >> >>> Memory is now very cheap, and not-very-clueful network layer 2 >>> designers (who don't study TCP or the Internet) are likely to throw >>> too much at the problem without doing the right thing in >>> >> their firmware. >> >>> On 09/08/2009 06:47 PM, Dominik Kaspar wrote: >>> >>> Hello David, >>> >>> You mentioned the bimodal behaviour of your 3G connection. >>> >> I recently >> >>> noticed the same thing but have not yet been able to >>> >> explain why this >> >>> happens. >>> >>> I also ran Ping tests over multiple days using an HSDPA modem (with >>> both the client and server located in Oslo, Norway). The >>> >> experienced >> >>> RTTs were very stable over short periods of time, but >>> >> sometimes they >> >>> averaged around 80ms, while at other times the average was at about >>> 300ms. >>> >>> A CDF illustration of the results is available here: >>> http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png >>> >>> What is the reason of these two modes? Is it caused by adaptive >>> modulation and coding on the physical layer? If so, why >>> >> does it affect >> >>> the delay so much? I would only expect a reduced bandwidth, but not >>> much change in delay... >>> >>> Greetings, >>> Dominik >>> >>> >>> On Tue, Sep 8, 2009 at 7:56 PM, David P. >>> >> Reed wrote: >> >>> >>> I should not have been so cute - I didn't really want to >>> >> pick on the >> >>> operator involved, because I suspect that other 3G operators around >>> the world probably use the same equipment and same rough >>> >> configuration. >> >>> The ping and traceroute were from Chicago, using an ATT >>> >> Mercury data >> >>> modem, the same channel as the Apple iPhones use, but it's >>> >> much easier >> >>> to run test suites from my netbook. >>> >>> Here's the same test from another time of day, early Sunday >>> >> morning, >> >>> when things were working well. >>> >>> Note that I ran the test over the entire labor day weekend >>> >> at intervals. >> >>> The end-to-end ping time was bimodal. Either it pegged at >>> >> over 5000 >> >>> milliseconds, or happily sat at under 200 milliseconds. >>> >> Exactly what >> >>> one would expect if TCP congestion control were disabled by >>> overbuffering in a router preceding the bottleneck link >>> >> shared by many users. >> >>> ------------------------------ >>> >>> $ ping lcs.mit.edu >>> PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=1 ttl=44 >> >>> time=209 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=2 ttl=44 >> >>> time=118 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=3 ttl=44 >> >>> time=166 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=4 ttl=44 >> >>> time=165 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=5 ttl=44 >> >>> time=224 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=6 ttl=44 >> >>> time=183 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=7 ttl=44 >> >>> time=224 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=8 ttl=44 >> >>> time=181 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=9 ttl=44 >> >>> time=220 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=10 ttl=44 >> >>> time=179 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=11 ttl=44 >> >>> time=219 ms >>> ^C >>> --- lcs.mit.edu ping statistics --- >>> 11 packets transmitted, 11 received, 0% packet loss, time >>> >> 10780ms rtt >> >>> min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms $ traceroute >>> lcs.mit.edu traceroute to lcs.mit.edu (128.30.2.121), 30 >>> >> hops max, 60 >> >>> byte packets >>> 1 * * * >>> 2 172.26.248.2 (172.26.248.2) 178.725 ms 178.568 ms 179.500 ms >>> 3 * * * >>> 4 172.16.192.34 (172.16.192.34) 187.794 ms 187.677 ms >>> >> 207.527 ms >> >>> 5 12.88.7.205 (12.88.7.205) 207.416 ms 208.325 ms 69.630 ms >>> 6 cr84.cgcil.ip.att.net (12.122.152.134) 79.425 ms 89.227 ms >>> 90.083 ms >>> 7 cr2.cgcil.ip.att.net (12.123.7.250) 98.679 ms 90.727 >>> >> ms 91.576 >> >>> ms >>> 8 ggr2.cgcil.ip.att.net (12.122.132.137) 72.728 ms 89.628 ms >>> 88.825 ms >>> 9 192.205.33.186 (192.205.33.186) 89.787 ms 89.794 ms >>> >> 80.918 ms >> >>> 10 ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) 79.895 ms >>> 70.927 ms >>> 78.817 ms >>> 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 107.820 >>> >> ms 156.892 >> >>> ms >>> 140.711 ms >>> 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 139.638 >>> >> ms 139.764 >> >>> ms >>> 129.853 ms >>> 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 149.595 ms >>> 154.366 ms >>> 152.225 ms >>> 14 B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) 146.808 ms >>> >> 129.801 ms >> >>> 89.659 ms >>> 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 109.463 ms >>> >> 118.818 ms >> >>> 91.727 ms >>> 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 91.541 ms >>> >> 88.768 ms >> >>> 85.837 ms >>> 17 zermatt.csail.mit.edu (128.30.2.121) 117.581 ms 116.564 ms >>> 103.569 ms $ >>> >>> >>> >>> >>> >>> >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090911/0d818ab5/attachment.html From dpreed at reed.com Fri Sep 11 08:02:50 2009 From: dpreed at reed.com (David P. Reed) Date: Fri, 11 Sep 2009 11:02:50 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> Message-ID: <4AAA669A.6010300@reed.com> Signal condtions were excellent, 5 bars, line of sight out the window of a 6th floor hotel room, clear weather. On 09/11/2009 09:24 AM, Stiliadis, Dimitrios (Dimitri) wrote: > btw, the bimodal can be also explained if your card is attaching > to two different cells (or bands), depending on signal conditions/fading etc. > (i.e. it oscillates between cell A and cell B or band A and band B). > Also, it is not rare to see that the device will switch > between HSDPA and EDGE if the signal conditions on HSDPA are marginal. > Several tests have shown that EDGE delays of 500ms are normal. > > In order to fully understand the issue, you need to also log all the > modem information (such as cell ID, band that it is attached etc.etc.), > or you need to perform the test in a anechoic chamber with test > equipment. Usually you can do that, by issuing the right AT commands > to the modem. > > David: > > As for the uplink bandwidth bottleneck, the problem there other than > larger router buffers is that ARQ interprets losses on the wire > as losses on the air, and tries retransmissions. Your picture > is not accurate without an RNC in the middle that implements some form > of ARQ/RLP > > There are just two many things going on in the background, that pings > alone cannot show. > > Cheers, > > Dimitri > > >> -----Original Message----- >> From: end2end-interest-bounces at postel.org >> [mailto:end2end-interest-bounces at postel.org] On Behalf Of >> Dominik Kaspar >> Sent: Thursday, September 10, 2009 8:38 PM >> To: David P. Reed >> Cc: end2end-interest at postel.org >> Subject: Re: [e2e] What's wrong with this picture? >> >> Hi David, >> >> Thanks for the explanations about the bottleneck link to the >> backbone ISP. The illustrated system architecture and the >> overuse of buffers certainly sounds like reasonable cause for >> those huge delays you have posted at the beginning of this thread. >> >> The "bimodal" behaviour of delays> 5000 ms and delays< 200 >> ms that you have measured is really extreme and it seems to >> differ somewhat from what I have observed. In my experiments, >> the delay abruptly switches between two rather stable >> "modes"... sometimes every few minutes, sometimes just once a >> day. It is completely unpredictable and I have not yet found >> _the_ explanation for its cause. I doubt it has anything to >> do with TCP... it seems much more likely to be one of the >> HSDPA-specific properties that Detlef has pointed out (line >> coding, MAC-layer ACKs, ...). >> >> Here is the entire 24h ping log that clearly illustrates the >> two "modes": >> http://home.simula.no/~kaspar/static/ping-hsdpa-24h-bimodal-00.txt >> >> Greetings, >> Dominik >> >> >> On Wed, Sep 9, 2009 at 1:07 AM, David P. Reed wrote: >> >>> I'm willing to bet that you are seeing the same problem I >>> >> am, and that >> >>> it has nothing to do with the modem or wireless protocol. >>> >>> Instead you are seeing what would happen if you simulate in ns2 the >>> following system structure: >>> >>> -------------------------\ >>> --------------------------\ >>> ---------------------------\ >>> wireless medium [WIRELESS >>> HUB]------[ROUTER]-----------backbone ISP >>> >> ---------------------------/ >> >>> --------------------------/ >>> >>> When the link between the ROUTER and backbone ISP is of >>> >> lower bitrate >> >>> B than the sum of all the realizable simultaneous uplink >>> >> demand from >> >>> devices on the left, the outbound queue of the router is of >>> >> size M> >> >>> BT where T is the observed stable long delay, and the ROUTER does >>> nothing to signal congestion until the entire M bytes (now >>> >> very large) of memory are exhausted. >> >>> Memory is now very cheap, and not-very-clueful network layer 2 >>> designers (who don't study TCP or the Internet) are likely to throw >>> too much at the problem without doing the right thing in >>> >> their firmware. >> >>> On 09/08/2009 06:47 PM, Dominik Kaspar wrote: >>> >>> Hello David, >>> >>> You mentioned the bimodal behaviour of your 3G connection. >>> >> I recently >> >>> noticed the same thing but have not yet been able to >>> >> explain why this >> >>> happens. >>> >>> I also ran Ping tests over multiple days using an HSDPA modem (with >>> both the client and server located in Oslo, Norway). The >>> >> experienced >> >>> RTTs were very stable over short periods of time, but >>> >> sometimes they >> >>> averaged around 80ms, while at other times the average was at about >>> 300ms. >>> >>> A CDF illustration of the results is available here: >>> http://home.simula.no/~kaspar/static/cdf-hsdpa-rtt-00.png >>> >>> What is the reason of these two modes? Is it caused by adaptive >>> modulation and coding on the physical layer? If so, why >>> >> does it affect >> >>> the delay so much? I would only expect a reduced bandwidth, but not >>> much change in delay... >>> >>> Greetings, >>> Dominik >>> >>> >>> On Tue, Sep 8, 2009 at 7:56 PM, David P. >>> >> Reed wrote: >> >>> >>> I should not have been so cute - I didn't really want to >>> >> pick on the >> >>> operator involved, because I suspect that other 3G operators around >>> the world probably use the same equipment and same rough >>> >> configuration. >> >>> The ping and traceroute were from Chicago, using an ATT >>> >> Mercury data >> >>> modem, the same channel as the Apple iPhones use, but it's >>> >> much easier >> >>> to run test suites from my netbook. >>> >>> Here's the same test from another time of day, early Sunday >>> >> morning, >> >>> when things were working well. >>> >>> Note that I ran the test over the entire labor day weekend >>> >> at intervals. >> >>> The end-to-end ping time was bimodal. Either it pegged at >>> >> over 5000 >> >>> milliseconds, or happily sat at under 200 milliseconds. >>> >> Exactly what >> >>> one would expect if TCP congestion control were disabled by >>> overbuffering in a router preceding the bottleneck link >>> >> shared by many users. >> >>> ------------------------------ >>> >>> $ ping lcs.mit.edu >>> PING lcs.mit.edu (128.30.2.121) 56(84) bytes of data. >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=1 ttl=44 >> >>> time=209 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=2 ttl=44 >> >>> time=118 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=3 ttl=44 >> >>> time=166 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=4 ttl=44 >> >>> time=165 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=5 ttl=44 >> >>> time=224 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=6 ttl=44 >> >>> time=183 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=7 ttl=44 >> >>> time=224 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=8 ttl=44 >> >>> time=181 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=9 ttl=44 >> >>> time=220 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=10 ttl=44 >> >>> time=179 ms >>> 64 bytes from zermatt.csail.mit.edu (128.30.2.121): >>> >> icmp_seq=11 ttl=44 >> >>> time=219 ms >>> ^C >>> --- lcs.mit.edu ping statistics --- >>> 11 packets transmitted, 11 received, 0% packet loss, time >>> >> 10780ms rtt >> >>> min/avg/max/mdev = 118.008/190.547/224.960/31.772 ms $ traceroute >>> lcs.mit.edu traceroute to lcs.mit.edu (128.30.2.121), 30 >>> >> hops max, 60 >> >>> byte packets >>> 1 * * * >>> 2 172.26.248.2 (172.26.248.2) 178.725 ms 178.568 ms 179.500 ms >>> 3 * * * >>> 4 172.16.192.34 (172.16.192.34) 187.794 ms 187.677 ms >>> >> 207.527 ms >> >>> 5 12.88.7.205 (12.88.7.205) 207.416 ms 208.325 ms 69.630 ms >>> 6 cr84.cgcil.ip.att.net (12.122.152.134) 79.425 ms 89.227 ms >>> 90.083 ms >>> 7 cr2.cgcil.ip.att.net (12.123.7.250) 98.679 ms 90.727 >>> >> ms 91.576 >> >>> ms >>> 8 ggr2.cgcil.ip.att.net (12.122.132.137) 72.728 ms 89.628 ms >>> 88.825 ms >>> 9 192.205.33.186 (192.205.33.186) 89.787 ms 89.794 ms >>> >> 80.918 ms >> >>> 10 ae-31-55.ebr1.Chicago1.Level3.net (4.68.101.158) 79.895 ms >>> 70.927 ms >>> 78.817 ms >>> 11 ae-1-5.bar1.Boston1.Level3.net (4.69.140.93) 107.820 >>> >> ms 156.892 >> >>> ms >>> 140.711 ms >>> 12 ae-7-7.car1.Boston1.Level3.net (4.69.132.241) 139.638 >>> >> ms 139.764 >> >>> ms >>> 129.853 ms >>> 13 MASSACHUSET.car1.Boston1.Level3.net (4.53.48.98) 149.595 ms >>> 154.366 ms >>> 152.225 ms >>> 14 B24-RTR-2-BACKBONE.MIT.EDU (18.168.0.23) 146.808 ms >>> >> 129.801 ms >> >>> 89.659 ms >>> 15 MITNET.TRANTOR.CSAIL.MIT.EDU (18.4.7.65) 109.463 ms >>> >> 118.818 ms >> >>> 91.727 ms >>> 16 trantor.kalgan.csail.mit.edu (128.30.0.246) 91.541 ms >>> >> 88.768 ms >> >>> 85.837 ms >>> 17 zermatt.csail.mit.edu (128.30.2.121) 117.581 ms 116.564 ms >>> 103.569 ms $ >>> >>> >>> >>> >>> >>> >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090911/0a2b37f9/attachment-0001.html From detlef.bosau at web.de Fri Sep 11 08:20:19 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 11 Sep 2009 17:20:19 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <2e798b0c-c885-43fd-be7e-68625de0d00d@exchht01.lums.net> References: <2e798b0c-c885-43fd-be7e-68625de0d00d@exchht01.lums.net> Message-ID: <4AAA6AB3.5070307@web.de> zartash wrote: > [ZAU] You mean the modulation, and not line coding? In any case, I am > To my knowledge, "modulation" and "line coding" are used synonymously. > surprised that a change in modulation would change the delay (or service > time, for that matter), by so much. Perhaps not that much, but in Germany we say: "Kleinvieh macht auch Mist". The translation "many a mickle makes a muckle" seems to be quite useful here ;-) In addition, it's not only the line coding but typically you adapt the channel coding and puncturing as well. > By a variation in modulation, I would > expect a change in data rate but not in the delay. No. I refer to common sense ;-) What is "data rate" at all? In wireless networks, you do not have a "serialization delay" which spreads, say, 1000 bit into a block of 1 ms temporal duration. Instead, this 1000 bits are often separated into quite a number of "transport blocks" of, say, 200 bit or less (depending on the channel's condition of course) and each of these blocks is serviced then. Some may be sent once, others twice, some even three times until they are successfully read at the receiver - if this happens at all. You cannot tell the necessary number of transmissions in advance. So, you cannot predict the time needed to convey this amount of data along a given line. This holds particularly true in systems with variable line/channel-coding. For this reason, you may well talk about the time needed to convey a transport block. Or, eventualy, the time needed to convey a packet. But it does not really make sense to talk about a "rate" here, because this "rate" may change every few bits in a packet. >> For quite a few days now, I'm thinking on whether this delay variation >> may even affect the algorithms for TCP RTO calculation (refer to Edge's >> paper and its assumptions). >> >> I'm not surprised about a multimodal behaviour here. If, I would be >> surprised to see _only_ two modes here. >> > > [ZAU] If I assume that the change in modulation does, in fact, impact the > delay (for whatever reason!), It affects the time needed to convey a certain amount of data from the sender to the receiver. Hence, it directly affects the RTT in TCP. > then I will NOT be surprised to see only two > modes here. There are only three possible modulations in HSDPA and it is not > Three modulations, actually one channel coding (to my knowlege: Turbo Code 1/3), quite a few transport lengths and appropriate puncturing schemes and code occupations. Please keep in mind that HSDPA transport blocks are "wrapped around" a number of channelization codes. In addition, the number of transmission attempts for a block may vary. > [ZAU] At one time, WLAN uses one of the two modulation schemes (well, 11g to > be precise!) which are DSSS DSSS is a spreading scheme, not coding. Coding / Modulation would be QPSK here or 16 QAM respectively. > and OFDM but a number of connection speeds on > each of these modulations. And yes, HSDPA uses one of the three modulation > neither DSSS nor OFDM ;-) (I'm not quite sure here, but I think, the channelization is achieved using Walsh-Hadamard Codes here.) > schemes. In both cases, the modulation scheme and data rates at a given > point in time are selected based on the channel conditions. > > Basically yes. However, this is the rate data is _sent_ with. The rate, data is _successfully_ read with, may be different as we have to take into account retransmissions here. > [ZAU] I am a bit confused here, are we mixing up throughput and delay here? > > Hm. Don't worry, I don't ;-) Regards Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From dpreed at reed.com Fri Sep 11 08:29:38 2009 From: dpreed at reed.com (David P. Reed) Date: Fri, 11 Sep 2009 11:29:38 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> Message-ID: <4AAA6CE2.70701@reed.com> Perhaps I was not clear. I REALLY think Dimitri's comments about why the layer 2 stuff might have huge delays with perfect reliable delivery are really good starting points for exploring how to design a good implementation of HSPA packet transport at layer 2, where "good" means - works well for an Internet Access Network. I want to thank him for putting in that useful information (I have no ability to reach inside ATT's corporate proprietary wall to do those experiments. But on the other hand, the phenomenon is observable and reliable, with more experiments possible even without ATT cooperation). At the same time, it's important to realize that layer 2 is not "unto itself" the system requirement specifier. Scott Bradner told me that it is common for ISPs to write contracts to layer 2 equipment vendors that state "Never drop packets". If it's in the contract, the blame switches back to the ISP that wrote the contract for buying the wrong stuff. (though I'd suggest that if the layer 2 vendor knows that his customer will be running IP over it, he might *counterpropose* a *better* system). Ultimately, I hope that AT&T resolves this problem somehow. It may cost a lot of money. Intel eventually fixed the floating point bug in the Pentium, rather than try to cover it up by a PR blitz. That was the right decision - but since I was involved, I know from personal experience that engineers at Intel and its customers were under enormous pressure to agree with the Intel PR people that "no user would ever experience the bug". It took a Lotus 2-cell spreadsheet, with simple formulas any ordinary person could understand, that gave an answer off by a factor of 10, to demonstrate that "any user could see and demonstrate the bug". From L.Wood at surrey.ac.uk Fri Sep 11 09:26:00 2009 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Fri, 11 Sep 2009 17:26:00 +0100 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AAA6CE2.70701@reed.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6CE2.70701@reed.com> Message-ID: <212866B7-D590-4CF6-AF91-A6200F032FBA@surrey.ac.uk> On 11 Sep 2009, at 16:29, David P. Reed wrote: > > Scott Bradner told me that it is common for ISPs to write contracts > to layer 2 equipment vendors that state "Never drop packets". We wrote RFC3366, which says "don't do that." L. DTN work: http://info.ee.surrey.ac.uk/Personal/L.Wood/saratoga/ From detlef.bosau at web.de Fri Sep 11 11:07:03 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 11 Sep 2009 20:07:03 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: <212866B7-D590-4CF6-AF91-A6200F032FBA@surrey.ac.uk> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6CE2.70701@reed.com> <212866B7-D590-4CF6-AF91-A6200F032FBA@surrey.ac.uk> Message-ID: <4AAA91C7.1020805@web.de> Lloyd Wood wrote: > > On 11 Sep 2009, at 16:29, David P. Reed wrote: >> >> Scott Bradner told me that it is common for ISPs to write contracts >> to layer 2 equipment vendors that state "Never drop packets". > > We wrote RFC3366, which says "don't do that." > > L. > Hm. I think, particularly Dave will agree when I say we should pursue _short_ transmission times and rather should leave a packet undelivered than do excessive retransmissions. This is one reason why I think that the (often claimed "strong") difference between congestion drop and corruption drop is a bit academic and theoretic. Is it possible, that we encounter a combination of excessive buffering, excessive retransmission and volative path conditions here? Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From lachlan.andrew at gmail.com Fri Sep 11 14:41:24 2009 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Fri, 11 Sep 2009 14:41:24 -0700 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AAA6614.9040402@reed.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> Message-ID: Greetings David, 2009/9/11 David P. Reed : > I appreciate the your explanation, Dimitrios, is an attempt to explain why > this might happen at layer 2.? However, the service I am using is offered as > an IP (layer 3) service.?? It is a *requirement* that layer 3 drop packets > to signal congestion buildup. No, IP is claimed to run over a "best effort" network. That means that the router *may* discard packets, but doesn't mean that it *must*. If the delay is less than the IP lifetime (3 minutes?) then the router is within spec (from the E2E point of view). The dominance of IP was exactly that it doesn't place heavy "requirements" on the forwarding behaviour of the routers. > Otherwise, the binding of this particular layer 2 transport (with elastic 10 > second queues) is something that is just WRONG to claim as an high-speed > Internet service.??? (except for the case where the other end is 10 light > seconds away). No, it is not "WRONG" to claim a high bit-rate service is high-speed, even if it has high latency. High-speed is not equivalent to low-delay. However, I agree that it is a very bad design to have such long delays. > The entire congestion control mechanism (and approximate fairness mechanism) > of the Internet works on the assumption that congestion is signaled to those > who are congesting the network - so they can back off, which they will in > fact do. If the networks have changed, we should change the assumptions that TCP makes. In the old days, VJ changed TCP so that it would run over congested un-reliable networks. If TCP is now being asked to run over congested reliable networks, shouldn't we update TCP? There are many methods which use delay as an indicator of congestion, as well as using loss. (Should I plug Steven Low's FAST here?) We don't need anything very fine-tuned in a case like this; just something very basic. Of course, fixing TCP to work over any IP connection (as it was intended) does not mean that the underlying networks should not be optimised. As Lloyd said, we already have recommendations. > I don't know who designed this system, but I can say that any serious ISP > should NOT buy such equipment if it functions this way. I agree, but that shouldn't stop the E2E principle from being applied, and trying to design E2E protocols to cope with it. Cheers, Lachlan -- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia Ph +61 3 9214 4837 From detlef.bosau at web.de Sat Sep 12 04:22:39 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 12 Sep 2009 13:22:39 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> Message-ID: <4AAB847F.7010906@web.de> Lachlan Andrew wrote: > No, IP is claimed to run over a "best effort" network. That means > To run? Or to hobble? ;-) > that the router *may* discard packets, but doesn't mean that it > *must*. This is not even a theoretical debate. A router's storage capacity is finite, hence the buffer cannot keep an infinite number of packets. However, we're talking about TCP here. And TCP simply does not work properly without any kind of congestion control. Hence, there is a strong need to have a sender informed about a congested network. > If the delay is less than the IP lifetime (3 minutes?) then > the router is within spec (from the E2E point of view). I don't know a router who checks a packet's lifetime with the clock, although some stone-aged specification proposes a temporal interpretation of lifetime ;-) Practically, in IPv4 the lifetime is a maximum hop count. In IPv6, this is even through for the specs. > The dominance > of IP was exactly that it doesn't place heavy "requirements" on the > forwarding behaviour of the routers. > > Which does not mean, that there would be no need to do so. Particularly, when any kind of recovery layer comes in effect, we should carefully consider requirements for router behaviour. >> Otherwise, the binding of this particular layer 2 transport (with elastic 10 >> second queues) is something that is just WRONG to claim as an high-speed >> Internet service. (except for the case where the other end is 10 light >> seconds away). >> > > No, it is not "WRONG" to claim a high bit-rate service is high-speed, > even if it has high latency. Hm. I think, the problem is the misconception mentioned by zartash yesterday: Do we talk about rates? Or do we talk about delays? Or do we talk about service times? In packet switching networks, we generally talk about service times and nothing else. Any kind of "rate" or "throughput" is a derived quantity. One particular consequence of this is that we well may consider to _restrict_ particular service times and to discard packets which cannot be serviced in a certain amount of time in order to keep queues stable and to avoid infinite head of line blocking etc. A side effect from doing so is that a sender is, if implicitly, informed about a "lost packet" and reacts accordingly: It does congestion handling. When a door is congested, it is sometimes an academic debate, whether it's simply overcrowded or temporarily closed. I cannot pass the door anyway. And the appropriate action is to either find another door, or to give it another try some time later. > High-speed is not equivalent to > low-delay. However, a high speed network will necessarily have small service times. > >> The entire congestion control mechanism (and approximate fairness mechanism) >> of the Internet works on the assumption that congestion is signaled to those >> who are congesting the network - so they can back off, which they will in >> fact do. >> > > If the networks have changed, we should change the assumptions that TCP makes. > > In the old days, VJ changed TCP so that it would run over congested > un-reliable networks. If TCP is now being asked to run over congested > reliable networks, shouldn't we update TCP? I don't see the point. From the days, when VJ wrote the congavoid paper, up to now TCP works fine about congested reliable networks ;-) (What, if not "reliable", are wirebound links?) Dave's problem arises from unreliable networks (yes, I intendedly use a somewhat strange definition of reliability here ;-)), i.e. from wireless ones. One possibility to turn those "unreliable networks" into reliable ones is a strict recovery layer which offers arbitrarily high probabilities for successful packet delivery. If I would read Lloyds post in a black-hearted manner, I could interpret RFC 3366 in exactly that way. However, I don't really think, that this is the intended interpretation. > There are many methods > which use delay as an indicator of congestion, as well as using loss. > > (Should I plug Steven Low's FAST here?) We don't need anything very > fine-tuned in a case like this; just something very basic. > I dealt with ideas like this myself because it is appealing at a first glance - and I got several papers rejected. Although, this was disappointing to me in the first, I had to understand that delay is one of the worst indicators for network congestion one could even imagine. One of the first criticisms, I've got, was that there is usually no reference delay which is related to a "sane" network. (This is different in network management scenarios, where one does _intentionally_ some "baselining" in order to obtain exactly that.) However, this is not the hardest one. The really problem with delay and delay variations is, that there are several possible causes for them: - congestion. - MAC latencies. (similar to congestion.) - high recovery latencies due to large numbers of retransmissions. - route changes. - changes in path properties / line coding / channel coding / puncturing etc. Without any particular knowledge of the path, you may not be able to determine "the one" (if it is one at all) reason for a delay variation. So, the consequence is that we should abandon the use of delays as congestion indicator. The original meaning of "congestion" is that a path is full and cannot accept more data. And the by far most compelling indication for "this path cannot accept mor data" is that this "more data" is discarded. > Of course, fixing TCP to work over any IP connection (as it was > intended) does not mean that the underlying networks should not be > optimised. As Lloyd said, we already have recommendations. > > And we have the end-to-end recommendations which tell us, that underlying networks should not attempt to solve all problems on their own. And a proper trade off, when a problem can be solved at a "low layer" and when it should be forwarded to an upper layer, or upper layers are at least involved, is _always_ a concern, not only in networking but in every kind of all days life - including the bankruptcy of Lehman brothers ;-) -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From dpreed at reed.com Sat Sep 12 12:12:01 2009 From: dpreed at reed.com (David P. Reed) Date: Sat, 12 Sep 2009 15:12:01 -0400 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> Message-ID: <4AABF281.1020300@reed.com> On 09/11/2009 05:41 PM, Lachlan Andrew wrote > No, IP is claimed to run over a "best effort" network. That means > that the router *may* discard packets, but doesn't mean that it > *must*. If the delay is less than the IP lifetime (3 minutes?) then > the router is within spec (from the E2E point of view). The dominance > of IP was exactly that it doesn't place heavy "requirements" on the > forwarding behaviour of the routers. I disagree with this paragraph. No one ever claimed that IP would run over *any* best efforts network. One could argue that routers that take pains to deliver packets at *any* cost (including buffering them for 10 seconds when the travel time over the link between points is on the order of 1 microsecond, and the signalling rate is > 1 Megabit/sec) are not "best efforts" but "heroic efforts" networks. In any case, research topics for future networks aside, the current IP network was, is, and has been developed with the goal of minimizing buffering and queueing delay in the network. The congestion control and fairness mechanism developed by Van Jacobson and justified by Kelly (on game theoretic grounds, which actually makes a great deal of sense, because it punishes non-compliance to some extent) is both standardized and dependent on tight control loops, which means no substantial queueing delay. It's not the buffer capacity that is the problem. It's the lack of signalling congestion. And the introduction of "persistent traffic jams" in layer 2 elements, since the drainage rate of a queue is crucial to recovery time. One can dream of an entirely different network. But this is NOT a political problem where there is some weird idea that layer 2 networks offering layer 3 transit should have political rights to just do what they please. It's merely a matter of what actually *works*. Your paragraph sounds like the statements of what my seagoing ancestors called "sea-lawyers" people who make some weird interpretation of a "rule book" that seems to be based on the idea that the design came from "god" or the "king". Nope - the design came from figuring out what worked best. Now, I welcome a fully proven research activity that works as well as the Internet does when operators haven't configured their layer 2 components to signal congestion and limit buildup of slow-to-drain queues clogged with packets. You are welcome to develop and convince us to replace the Internet with it, once it *works*. From lachlan.andrew at gmail.com Sat Sep 12 13:54:08 2009 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Sat, 12 Sep 2009 13:54:08 -0700 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AAB847F.7010906@web.de> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AAB847F.7010906@web.de> Message-ID: Greetings, I'm glad my provocative comments stimulated a response. I'm happy to agree that they're not a sound way to make incremental changes to the current network, but they reflect a particular hobby horse of mine, that the "inter" has been forgotten in the internet. An internet is not a network, but a collection of networks. There has been much confusion because people think that Ethernet networks are "link" instead of networks. That has caused people to misinterpret topology data and all sorts of problems. 2009/9/12 Detlef Bosau : > Lachlan Andrew wrote: >> >> No, IP is claimed to run over a "best effort" network. ?That means > > To run? Or to hobble? ;-) Definitely to hobble... >> that the router *may* discard packets, but doesn't mean that it >> *must*. > > This is not even a theoretical debate. > > A router's storage capacity is finite, hence the buffer cannot keep an > infinite number of packets. I never mentioned an infinite number of packets. If links have token-based congestion control then it is possible to have a lossless network. Token-based control was proposed for ATM, and is (correct me if I'm wrong) being reintroduced for "Data Centre Ethernet". If we say "IP will not run over Data Centre Ethernet, because it doesn't drop packets, therefore Data Centre Ethernet is wrong", then we are in serious trouble. *That* sounds like we're taking our IETF rule-book as god-given. Granted, Data Centre Ethernet isn't being offered as a public "internet" service, but surely we should be looking at the issue of TCP-over-reliable-networks. > However, we're talking about TCP here. And TCP simply does not work properly > without any kind of congestion control. RFC 793 does. The current TCP congestion control mechanism was a very useful hack to solve an immediate problem. Why are we so keen to defend a 20 year old kludge? (I know: "because it works". But what if there are cases where it *doesn't* work. Why say "Those cases shouldn't exist. Next problem"?) VJ's radical insight was "the network is telling us something" when it drops a packet. That brought about a radical improvement. Why is it so hard to say "the network is telling us something" when it delays packets? Perhaps this debate should be in the IRTF/ICCRG (Cc'd) instead of the IETF, but > Hence, there is a strong need to have a sender informed about a congested > network. Doesn't an 8-second delay inform the sender? Of course, by the time it has reached 8 seconds it is too late. However, we can detect 100ms of delay after a mere 100ms... Delay gives *instant* per-packet feedback, whereas loss only gives feedback every few packets (or every few thousand packets on high BDP links). Pure delay-based TCP has many problems, since we don't understand all of the possible causes of delay, but delay should definitely be considered as information. >> If the delay is less than the IP lifetime (3 minutes?) then >> the router is within spec (from the E2E point of view). > > I don't know a router who checks a packet's lifetime with the clock, > although some stone-aged specification proposes a temporal interpretation of > lifetime ;-) Practically, in IPv4 the lifetime is a maximum hop count. In > IPv6, this is even through for the specs. I'm not talking about TTL. I'm talking about the maximum segment lifetime (MSL), which, as recently as 2008 (draft-touch-intarea-ipv4-unique-id-00.txt) was "typically interpreted as two minutes". >> No, it is not "WRONG" to claim a high bit-rate service is high-speed, >> even if it has high latency. > > Do we talk about rates? Or do we talk about delays? Or do we talk > about service times? "speed" = "rate". A concord is fast, even if I have to book my ticket a day in advance. > In packet switching networks, we generally talk about service times and > nothing else. > Any kind of "rate" or "throughput" is a derived quantity. True, but it is the derived quantity which was being advertised. The service was not being advertised as a low-delay service, but as a high-speed service. I'm not trying to defend a service which has these appalling delays; just to get us out of the mindset that they're doing something "wrong" as distinct from "stupid". >> ?High-speed is not equivalent to low-delay. > > However, a high speed network will necessarily have small service times. Yes, but not small latency. See the concord example. >>> The entire congestion control mechanism (and approximate fairness > From the days, when VJ wrote the congavoid paper, up to now TCP works fine > about congested reliable networks ;-) > (What, if not "reliable", are wirebound links?) Wirebound IP links are "best effort", not "reliable". "Reliable" (when applied to a protocol) means that it guarantees to deliver each packet exactly once. That is what is meant by RFC 3366's "TCP provides a reliable byte-stream transport service, building upon the best-effort datagram delivery service provided by the Internet Protocol" It has nothing to do with whether we can actually rely on the protocol to get the job done usefully... > Dave's problem arises from unreliable networks (yes, I intendedly use a > somewhat strange definition of reliability here ;-)), i.e. from wireless > ones. No, it is a "reliable" network running over an unreliable physical link. The problem is that it tries too hard to be reliable. >> ?There are many methods >> which use delay as an indicator of congestion, as well as using loss. >> ?(Should I plug Steven Low's FAST here?) ?We don't need anything very >> fine-tuned in a case like this; just something very basic. > > delay is one of the worst indicators for network congestion one could > even imagine. Oh? In that case, why are we assuming that this 8-second delay was caused by congestion? I'm not saying we should *only* consider delay, but that the problem here is that TCP is *ignoring* delay, like RFC793 ignored loss. > there is usually no > reference delay which is related to a "sane" network. True, that is a challenge, but we know that these delays are not sane, and should define appropriate responses. > The really problem with delay and delay variations is, that there are > several possible causes for them: > - congestion. > - MAC latencies. (similar to congestion.) > - high recovery latencies due to large numbers of retransmissions. > - route changes. > - changes in path properties / line coding / channel coding / puncturing > etc. > > Without any particular knowledge of the path, you may not be able to > determine "the one" (if it is one at all) reason for a delay variation. Similarly, we don't know that "the one" reason for a packet loss is congestion. However, we can use the available information. Take a look at Doug Leith's recent work on debunking the myths about why we can't use delay for congestion estimation. > So, the consequence is that we should abandon the use of delays as > congestion indicator. No, we shouldn't rely on delay as the only source of information. I don't see why we should ever decide a priori to ignore information. > The original meaning of "congestion" is that a path is full and cannot > accept more data. > And the by far most compelling indication for "this path cannot accept mor > data" is that this "more data" is discarded. True, it is compelling. However, it isn't the only indication. I'm not saying that we should reject loss as an indicator of congestion. I'm just saying that we shouldn't ignore other indicators. >> Of course, fixing TCP to work over any IP connection (as it was >> intended) does not mean that the underlying networks should not be >> optimised. ?As Lloyd said, we already have recommendations. > > And we have the end-to-end recommendations which tell us, that underlying > networks should not attempt to solve all problems on their own. "Be liberal in what you accept, and conservative in what you send" (RFC1122). Once again, I'm not saying that the link layer is good. I'm just saying that we should be open to improving TCP to make it accept that there are bad links. Of course, a valid argument against making TCP robust to bad links is that it hides the badness of those links and makes link-designers lazy. However, I'm not going to argue against Moore's law just because it makes programmers lazy. Cheers, Lachlan -- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia Ph +61 3 9214 4837 From lachlan.andrew at gmail.com Sat Sep 12 15:12:24 2009 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Sat, 12 Sep 2009 15:12:24 -0700 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AABF281.1020300@reed.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> Message-ID: Greetings David, 2009/9/12 David P. Reed : > > > On 09/11/2009 05:41 PM, Lachlan Andrew wrote >> >> No, IP is claimed to run over a "best effort" network. ?That means >> that the router *may* discard packets, but doesn't mean that it >> *must*. ?If the delay is less than the IP lifetime (3 minutes?) then >> the router is within spec (from the E2E point of view). > > I disagree with this paragraph. No one ever claimed that IP would run over > *any* best efforts network. ?One could argue that routers that take pains to > deliver packets at *any* cost (including buffering them for 10 seconds when > the travel time over the link between points is on the order of 1 > microsecond, and the signalling rate is > 1 Megabit/sec) are not "best > efforts" but "heroic efforts" networks. You are right that "good" isn't one-dimensional. I have always found it odd that people use "best effort" to mean something less that "trying as hard as is possible" (because best is a superlative -- nothing is better than the best). It was only in formulating a reply that I realised "best" could also mean "most appropriate". Still, a quick search for "discard" in the "Requirements for Internet Hosts" standard doesn't say that they *have* to discard packets. Again, my main motivation for writing a provocative email was that I'm frustrated at people saying "We're allowed to specify TCP to behave badly on highly-buffered links, but link designers aren't failing if they design links that behave badly with highly-aggressive E2E protocols". TCP congestion control was only ever intended as an emergency hack. It is remarkable that it has worked as well as it has, but why do we have to keep designing networks to support the hack? As I said in reply to Detlef, a good TCP can make link designers lazy. However, we shouldn't let good links make us as TCP / E2E designers lazy. > In any case, research topics for future networks aside, the current IP > network was, is, and has been developed with the goal of minimizing > buffering and queueing delay in the network. The congestion control and > fairness mechanism developed by Van Jacobson and justified by Kelly (on game > theoretic grounds, which actually makes a great deal of sense, because it > punishes non-compliance to some extent) is both standardized and dependent > on tight control loops, which means no substantial queueing delay. The IETF may have standardised TCP, but what if the IEEE "standardises" a reliable link protocol (like data centre ethernet), or the ITU standardises high-reliability ATM (used by DSL modems, which also get the blame for TCP's excessive aggressiveness)? Should we change their standards, or ours? The IETF isn't the only standards body, or even the supreme one. If there are standards that don't interact well, we should revisit all standards, starting with the ones we can control. > It's not the buffer capacity that is the problem. ?It's the lack of > signalling congestion. And the introduction of "persistent traffic jams" in > layer 2 elements, since the drainage rate of a queue is crucial to recovery > time. Perhaps it is the lack of the IETF protocol paying attention to the congestion signals. As I mentioned, VJ's breakthrough was realising that TCP should listen more closely to what the network was telling us. Why should we not keep doing that? When the link is screaming with high delay, why don't we back off? > One can dream of an entirely different network. ?But this is NOT a political > problem where there is some weird idea that layer 2 networks offering layer > 3 transit should have political rights to just do what they please. ?It's > merely a matter of what actually *works*. It is exactly a political problem, between different standards bodies. But closer to your analogy, who gives TCP the right to send just what it pleases? I'm not talking about "an entirely different network", but one which exists and on which you took measurements. The network has problems, caused by poor interaction between an IETF protocol and the hardware one which it runs. One has to change. Why close our eyes to the possibility of changing the protocol? > Your paragraph sounds like the statements of what my seagoing ancestors > called "sea-lawyers" people who make some weird interpretation of a "rule > book" that seems to be based on the idea that the design came from "god" or > the "king". ?Nope - the design came from figuring out what worked best. At the risk of being repetitive, I see the same thing in reverse: I'm hearing "we can't change TCP to work over the current network, because TCP is standardized, and it used to work". I'm not saying that we should have links with excessive buffers. I'm not even saying that they shouldn't unnecessarily drop packets (although it sounds odd). I'm just saying that we should *also* be open to changing TCP to work over the links that people build. > Now, I welcome a fully proven research activity that works as well as the > Internet does when operators haven't configured their layer 2 components to > signal congestion and limit buildup of slow-to-drain queues clogged with > packets. Great. I agree that my mindset is more IRTF than IETF, and so I'm Cc'ing this to ICCRG too. However, I'm arguing that the layer 2 links *are* signalling congestion very strongly, if only we'll listen. Links with slow-to-drain queues are certainly a problem if there is a high load of traffic which doesn't have proper congestion control, but that isn't a reason not to design proper congestion control which doesn't fill all available buffers. > You are welcome to develop and convince us to replace the Internet with it, > once it *works*. I'm not talking about replacing the internet, any more than RFC2851 / RFC5681 replace RFC753. I'm only suggesting that we design protocols which work on the network that is out there, and that you measured. If the link you mention is an isolated case, then we can simply call it misconfigured. However, I don't believe it is an isolated case, and we should take responsibility for TCP's poor behaviour on such links. Cheers, Lachlan -- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia Ph +61 3 9214 4837 From calvert at netlab.uky.edu Sat Sep 12 19:37:25 2009 From: calvert at netlab.uky.edu (Ken Calvert) Date: Sat, 12 Sep 2009 22:37:25 -0400 (EDT) Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> Message-ID: What I think is interesting about this discussion is that the original "framers" [:-)] of TCP saw nothing wrong with an MSL denominated in minutes, and now delivering a datagram after it spends 10 seconds in the network is considered harmful. In between came VJCC -- but we've had that all these years and this is the first time I've heard anyone suggest it's a problem that packets can survive in the network for one-sixth of a minute. An MSL is required so TCP (or any *practical* reliable transport) can have certain safety properties. This discussion shows that MSL has implications for the CC control loop as well. TCP's correctness would be fine if the MSL were 10 seconds, so *if* the consensus is that multiple seconds of buffering is broken, why not acknowledge that the world has changed and make that an "official" IETF policy? Indeed, as has been noted here recently (in a different discussion), some TCP implementations already assume a small MSL for other performance reasons. But Reed's experiment shows there are real dangers in doing so. KC > Greetings David, > > 2009/9/12 David P. Reed : >> >> >> On 09/11/2009 05:41 PM, Lachlan Andrew wrote >>> >>> No, IP is claimed to run over a "best effort" network. ?That means >>> that the router *may* discard packets, but doesn't mean that it >>> *must*. ?If the delay is less than the IP lifetime (3 minutes?) then >>> the router is within spec (from the E2E point of view). >> >> I disagree with this paragraph. No one ever claimed that IP would run over >> *any* best efforts network. ?One could argue that routers that take pains to >> deliver packets at *any* cost (including buffering them for 10 seconds when >> the travel time over the link between points is on the order of 1 >> microsecond, and the signalling rate is > 1 Megabit/sec) are not "best >> efforts" but "heroic efforts" networks. > > You are right that "good" isn't one-dimensional. I have always found > it odd that people use "best effort" to mean something less that > "trying as hard as is possible" (because best is a superlative -- > nothing is better than the best). It was only in formulating a reply > that I realised "best" could also mean "most appropriate". > > Still, a quick search for "discard" in the "Requirements for Internet > Hosts" standard doesn't say that they *have* to discard packets. > > Again, my main motivation for writing a provocative email was that I'm > frustrated at people saying "We're allowed to specify TCP to behave > badly on highly-buffered links, but link designers aren't failing if > they design links that behave badly with highly-aggressive E2E > protocols". > > TCP congestion control was only ever intended as an emergency hack. > It is remarkable that it has worked as well as it has, but why do we > have to keep designing networks to support the hack? As I said in > reply to Detlef, a good TCP can make link designers lazy. However, we > shouldn't let good links make us as TCP / E2E designers lazy. > >> In any case, research topics for future networks aside, the current IP >> network was, is, and has been developed with the goal of minimizing >> buffering and queueing delay in the network. The congestion control and >> fairness mechanism developed by Van Jacobson and justified by Kelly (on game >> theoretic grounds, which actually makes a great deal of sense, because it >> punishes non-compliance to some extent) is both standardized and dependent >> on tight control loops, which means no substantial queueing delay. > > The IETF may have standardised TCP, but what if the IEEE > "standardises" a reliable link protocol (like data centre ethernet), > or the ITU standardises high-reliability ATM (used by DSL modems, > which also get the blame for TCP's excessive aggressiveness)? Should > we change their standards, or ours? The IETF isn't the only > standards body, or even the supreme one. If there are standards that > don't interact well, we should revisit all standards, starting with > the ones we can control. > >> It's not the buffer capacity that is the problem. ?It's the lack of >> signalling congestion. And the introduction of "persistent traffic jams" in >> layer 2 elements, since the drainage rate of a queue is crucial to recovery >> time. > > Perhaps it is the lack of the IETF protocol paying attention to the > congestion signals. As I mentioned, VJ's breakthrough was realising > that TCP should listen more closely to what the network was telling > us. Why should we not keep doing that? When the link is screaming > with high delay, why don't we back off? > >> One can dream of an entirely different network. ?But this is NOT a political >> problem where there is some weird idea that layer 2 networks offering layer >> 3 transit should have political rights to just do what they please. ?It's >> merely a matter of what actually *works*. > > It is exactly a political problem, between different standards bodies. > > But closer to your analogy, who gives TCP the right to send just what > it pleases? I'm not talking about "an entirely different network", > but one which exists and on which you took measurements. The network > has problems, caused by poor interaction between an IETF protocol and > the hardware one which it runs. One has to change. Why close our > eyes to the possibility of changing the protocol? > >> Your paragraph sounds like the statements of what my seagoing ancestors >> called "sea-lawyers" people who make some weird interpretation of a "rule >> book" that seems to be based on the idea that the design came from "god" or >> the "king". ?Nope - the design came from figuring out what worked best. > > At the risk of being repetitive, I see the same thing in reverse: I'm > hearing "we can't change TCP to work over the current network, because > TCP is standardized, and it used to work". I'm not saying that we > should have links with excessive buffers. I'm not even saying that > they shouldn't unnecessarily drop packets (although it sounds odd). > I'm just saying that we should *also* be open to changing TCP to work > over the links that people build. > >> Now, I welcome a fully proven research activity that works as well as the >> Internet does when operators haven't configured their layer 2 components to >> signal congestion and limit buildup of slow-to-drain queues clogged with >> packets. > > Great. I agree that my mindset is more IRTF than IETF, and so I'm > Cc'ing this to ICCRG too. > > However, I'm arguing that the layer 2 links *are* signalling > congestion very strongly, if only we'll listen. > > Links with slow-to-drain queues are certainly a problem if there is a > high load of traffic which doesn't have proper congestion control, but > that isn't a reason not to design proper congestion control which > doesn't fill all available buffers. > >> You are welcome to develop and convince us to replace the Internet with it, >> once it *works*. > > I'm not talking about replacing the internet, any more than RFC2851 / > RFC5681 replace RFC753. I'm only suggesting that we design protocols > which work on the network that is out there, and that you measured. > If the link you mention is an isolated case, then we can simply call > it misconfigured. However, I don't believe it is an isolated case, > and we should take responsibility for TCP's poor behaviour on such > links. > > Cheers, > Lachlan > Ken Calvert, Professor University of Kentucky Computer Science Lexington, KY USA 40506 Lab for Advanced Networking Tel: +1.859.257.6745 calvert at netlab.uky.edu Fax: +1.859.323.1971 http://protocols.netlab.uky.edu/~calvert/ From Jon.Crowcroft at cl.cam.ac.uk Sun Sep 13 02:43:39 2009 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sun, 13 Sep 2009 10:43:39 +0100 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AAB847F.7010906@web.de> Message-ID: this made me think- the problem with the ill-defined "best effort" is that IP as a service falls victim to the dictatorship of the majority - the internet is a network of networks as you rightly remind us, and those networks constantly expand to include new things - some of the new things are great (greater speed, lower latency) but some are bad (lower speed, higher latency, interference induced packet loss) and some are even worse (re-definitions of service to include throttling of some flows by port, allegedly for TE or policy, and firewall filtering of default legacy setings making positive re-definitions of field use (e.g. ECN) hard to deploy...) so what is seen by IP (and therefore has to be coped with by TCP and most applications) has to expand to include the worst as well as the better (democracy in action).... its probably too late to do this, but a formal definition of best effort semantics (including the postel principle) would be really good in stopping mission creep... on a more specific note, I implemented an IP forwarding box at least twice and one time, actually decremented TTL on a 1 second timer as well as once per hop, (but that was back when an output link was slow enough that a queue could persist that long) but fixing time based discard isn't, for me the crucial thing - fixing buffer sizes in routers (as per damon wischik and others' work) seems more like repairing the root cause... In missive , Lachlan Andrew typed: >>I'm glad my provocative comments stimulated a response. >>I'm happy to agree that they're not a sound way to make incremental >>changes to the current network, but they reflect a particular hobby >>horse of mine, that the "inter" has been forgotten in the internet. From detlef.bosau at web.de Sun Sep 13 04:43:31 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 13 Sep 2009 13:43:31 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> Message-ID: <4AACDAE3.7090608@web.de> Lachlan Andrew wrote: > You are right that "good" isn't one-dimensional. I have always found > it odd that people use "best effort" to mean something less that > "trying as hard as is possible" (because best is a superlative -- > nothing is better than the best). It was only in formulating a reply > that I realised "best" could also mean "most appropriate". > > To my understanding, "best effort" is a term from the business world. When I send you a book and ship the item "best effort delivery", this means: I will take the item to the parcel service. And I don't care for the rest. Hence, best effort is not a synonym for "taking responsibility" for something. Quite the opposite is true: "best effort" is a synsnom for SNMP: "Sorry, not my problem." > Still, a quick search for "discard" in the "Requirements for Internet > Hosts" standard doesn't say that they *have* to discard packets. > When I throw a bottle of milk to the ground and the bottle breaks, there is no standard which says that the milk MUST spill out. (However, it's best current practice to wipe it away, because the milk will become sour otherwise and will smell really nasty.) > Again, my main motivation for writing a provocative email was that I'm > frustrated at people saying "We're allowed to specify TCP to behave > badly on highly-buffered links, but link designers aren't failing if > they design links that behave badly with highly-aggressive E2E > protocols". > We should keep in mind the reason for buffering. There was a discussion of this issue some weeks ago. In a private mail, I once was told: The reason for buffering is to cope with asynchronous packet arrival. Period. That was the most concise statement I've ever heard on this issue. And of course, some people refer to the book of Len Kleinrock and the drawings found there and quoted by Raj Jain and many others which deal with the "power" of a queuing system and optimum throughput and a knee.... Some weeks ago, I found a wonderful article which recognized a delay larger than the "knee" delay as congestion indicator This is an appealing idea and I love it. There is only one minor question left: What's the delay for the "knee"? And how can this be determined without any knowledge about the path and the traffic structure? So, the major purpose of buffering is to cope with asynchronous arrival. And when there is some minor benefit for the achieved throughput from properly designed buffers, I don't mind. > TCP congestion control was only ever intended as an emergency hack. > When each thoroughly designed optimal solution for a problem would at least be half as successful as VJ's "hack", the world would be a better one. In my opinion, the congavoid paper is not a "quick hack" but simply a stroke of genius. I don't know whether this is common sense, but I'm convinced that this is certainly one of the most important papers ever. (With respect to my remark on Lehman Bro's: I wish, some of the decision makers from that "very weekend" would have proposed an emergency hack for the problem like TCP congestion control. This would have spared us much problems. However, we should think positive: The financial market is now delivered from congestion for quite a long period of time...) > It is remarkable that it has worked as well as it has, but why do we > have to keep designing networks to support the hack? As I said in > First: Because it works. Second: Up to know, nothing better is known. > reply to Detlef, a good TCP can make link designers lazy. However, we > shouldn't let good links make us as TCP / E2E designers lazy. > > For me, I can say, I'm not going to get lazy ;-) The real concern is the proper separation of concerns: Who carries the burden of reliable delivery? Is this due to the link? Or is this due to the end points? That's the reason why I think, that this mailing list here is highly appropriate for this issue: The proper distribution of the "reliability concern" among the OSI layers 1 to 4 and among links and nodes respectively is a typical end-to-end issue. > The IETF may have standardised TCP, but what if the IEEE > "standardises" a reliable link protocol (like data centre ethernet), > or the ITU standardises high-reliability ATM (used by DSL modems, > which also get the blame for TCP's excessive aggressiveness)? Should > we change their standards, or ours? The IETF isn't the only > standards body, or even the supreme one. If there are standards that > don't interact well, we should revisit all standards, starting with > the ones we can control. > > We should avoid mixing up different goals. TCP/IP is a generic protocol suite with hardly any assumptions at all. When, e.g. for a data center, there is some proprietary solution much more appropriate than a generic one, it may be of course reasonable to use it. > It is exactly a political problem, between different standards bodies. > > But closer to your analogy, who gives TCP the right to send just what > it pleases? I'm not talking about "an entirely different network", > but one which exists and on which you took measurements. The network > has problems, caused by poor interaction between an IETF protocol and > the hardware one which it runs. One has to change. Why close our > eyes to the possibility of changing the protocol? > > As I said above: The better we know our network, the more assumptions we can make and the better we know our requirements, the more precise and useful our design, both of network components and protocols, will be. TCP/IP is "one size fits all", and this is both, the most important reason for its success and its severest limitation as well. > At the risk of being repetitive, I see the same thing in reverse: I'm > hearing "we can't change TCP to work over the current network, because > TCP is standardized, and it used to work". When I review the proposals for TCP changes made in the last decade, I'm not convinced that no one is willing to consider changes to TCP. However, a recent paper submission of mine was rejected, amongst others, with the remark: "Ouch! You're going to change TCP here!". When there are valid reasons to change protocols, we should consider doing so. > I'm not saying that we > should have links with excessive buffers. I'm not even saying that > they shouldn't unnecessarily drop packets (although it sounds odd). > I'm just saying that we should *also* be open to changing TCP to work > over the links that people build. > > That's what many of us are actually doing. -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From detlef.bosau at web.de Sun Sep 13 04:57:27 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 13 Sep 2009 13:57:27 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> Message-ID: <4AACDE27.5060101@web.de> Ken Calvert wrote: > > What I think is interesting about this discussion is that the original > "framers" [:-)] of TCP saw nothing wrong with an MSL denominated in > minutes, and now delivering a datagram after it spends 10 seconds in > the network is considered harmful. In between came VJCC -- but we've > had that all these years and this is the first time I've heard anyone > suggest it's a problem that packets can survive in the network for > one-sixth of a minute. > > An MSL is required so TCP (or any *practical* reliable transport) can > have certain safety properties. This discussion shows that MSL has > implications for the CC control loop as well. TCP's correctness would > be fine if the MSL were 10 seconds, so *if* the consensus is that > multiple seconds of buffering is broken, why not acknowledge that the > world has changed and make that an "official" IETF policy? Did I get you right: MSL = Maximum Segment Length? Now, then the question is: what's the reason for a MSL of 10 seconds? Is this the maximum segment _size_ of perhaps 50 Gigabytes, which requires this temporal extension? Or is it a heroic effort link which is busy with the error free delivery of one or two bytes? In the latter case, I wouldn't call that MSL but head of line blocking... -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From calvert at netlab.uky.edu Sun Sep 13 05:23:02 2009 From: calvert at netlab.uky.edu (Ken Calvert) Date: Sun, 13 Sep 2009 08:23:02 -0400 (EDT) Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AACDE27.5060101@web.de> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> <4AACDE27.5060101@web.de> Message-ID: > Did I get you right: MSL = Maximum Segment Length? Sorry - Maximum Segment Lifetime. KC From detlef.bosau at web.de Sun Sep 13 07:04:48 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 13 Sep 2009 16:04:48 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <4AA69AC4.1090507@reed.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> <4AACDE27.5060101@web.de> Message-ID: <4AACFC00.9040707@web.de> Ken Calvert wrote: > >> Did I get you right: MSL = Maximum Segment Length? > > Sorry - Maximum Segment Lifetime. As ist was said before... sorry.... However, the story remains the same. What is the reason to keep a segment in the network that long? Is it a wireless link between the earth and the moon, where it takes some time for a packet to travel the whole distance? Or is it a recovery layer who does endless retransmissions on link layer? Now, when a packet gets dropped/corrupted on the way, the packet needs retransmission. There's no discussion about that. The question is, whether the retransmission is to be done locally or end-to-end. Perhaps, this does not make really a difference for a _single_ TCP connection on a _dedicated_ path / link. However, for mobile terminals in a cellular network, it make a huge difference whether a base station deals quite some minutes with one packet (as we see this in GPRS) and several terminals are not even serviced, or whether the effort on L2 remains in reasonable limits. -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From lachlan.andrew at gmail.com Sun Sep 13 15:57:00 2009 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Mon, 14 Sep 2009 08:57:00 +1000 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AACDAE3.7090608@web.de> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> <4AACDAE3.7090608@web.de> Message-ID: 2009/9/13 Detlef Bosau : > Lachlan Andrew wrote: >> >> I have always found >> it odd that people use "best effort" to mean something less that >> "trying as hard as is possible" > > To my understanding, "best effort" is a term from the business world. > > When I send you a book and ship the item "best effort delivery", this means: > I will take the item to the parcel service. And I don't care for the rest. Yep. I find that an odd use of "best" too... >> Still, a quick search for "discard" in the "Requirements for Internet >> Hosts" standard doesn't say that they *have* to discard packets. >> > > When I throw a bottle of milk to ?the ground and the bottle breaks, there is > no standard which says that the milk MUST spill out. > > (However, it's best current practice to wipe it away, because the milk will > become sour otherwise and will smell really nasty.) >> >> Again, my main motivation for writing a provocative email was that I'm >> frustrated at people saying "We're allowed to specify TCP to behave >> badly on highly-buffered links, but link designers aren't failing if >> they design links that behave badly with highly-aggressive E2E >> protocols". >> > > We should keep in mind the reason for buffering. There was a discussion of > this issue some weeks ago. > > In a private mail, I once was told: The reason for buffering is to cope with > asynchronous packet arrival. > > Period. > > That was the most concise statement I've ever heard on this issue. > > And of course, some people refer to the book of Len Kleinrock and the > drawings found there and quoted by Raj Jain and many others which deal with > the "power" of a queuing system and optimum throughput and a knee.... > > Some weeks ago, I found a wonderful article which recognized a delay larger > than the "knee" delay as congestion indicator > > This is an appealing idea and I love it. There is only one minor question > left: What's the delay for the "knee"? > And how can this be determined without any knowledge about the path and the > traffic structure? > > So, the major purpose of buffering is to cope with asynchronous arrival. And > when there is some minor benefit for the achieved throughput from properly > designed buffers, I don't mind. >> TCP congestion control was only ever intended as an emergency ?hack. > > When each thoroughly designed optimal solution for a problem would at least > be half as successful as VJ's "hack", the world would be a better one. > In my opinion, the congavoid paper is not a "quick hack" but simply a stroke > of genius. Absolutely! It was brilliant to realise that loss was telling us something about the network. He also proposed a very robust response to it, over a wide range of conditions. The only reason I call it a hack is to counter the view that it is a carefully engineered solution, and that networks should be designed to show a particular undesirable symptom of congestion just because "TCP needs it". >> It is remarkable that it has worked as well as it has, but why do we >> have to keep designing networks to support the hack? > > First: Because it works. > Second: Up to know, nothing better is known. It works, except on links with large buffers (which exist, whether or not they "should") or for large BDP flows (which exist, and will become more widespread), or for links with non-congestion losses (which exist, and will continue to without "heroic" ARQ). Someone has pointed out that simply the binary backoff of the RTO may be enough to prevent congestion collapse. Who knows what aspect of VJ's algorithm is really responsible for making the internet "work", and how much is simply that we don't see all the details? >> The IETF may have standardised TCP, but what if the IEEE >> "standardises" a reliable link protocol (like data centre ethernet), >> or the ITU standardises high-reliability ATM (used by DSL modems, >> which also get the blame for TCP's excessive aggressiveness)? ?Should >> we change their standards, or ours? ?The IETF isn't the only >> standards body, or even the supreme one. ?If there are standards that >> don't interact well, we should revisit all standards, starting with >> the ones we can control. > > We should avoid mixing up different goals. > > TCP/IP is a generic protocol suite with hardly any assumptions at all. Exactly my point. I don't think TCP should assume that routers drop packets instead of buffering them. We can still use VJ's insight (that we should look for symptoms of congestion, and then back off) without that assumption. >> At the risk of being repetitive, I see the same thing in reverse: ?I'm >> hearing "we can't change TCP to work over the current network, because >> TCP is standardized, and it used to work". > > When I review the proposals for TCP changes made in the last decade, I'm not > convinced that no one is willing to consider changes to TCP. > > However, a recent paper submission of mine was rejected, amongst others, > with the remark: "Ouch! You're going to change TCP here!". > > When there are valid reasons to change protocols, we should consider doing > so. Absolutely. Many in the academic research community are (too?) willing to change TCP. However, it is hard for the academics to make the changes without the IETF's support. 2009/9/14 Detlef Bosau : >> Maximum Segment Lifetime. > > However, the story remains the same. What is the reason to keep a segment in > the network that long? The MSL is not that we should try to keep segments in the network that long, but that protocols should still work if, by mistake, a packet does survive that long. We don't want a misconfigured router somewhere to cause ambiguity between two IP fragments, for example. It was perhaps misleading of me to bring the MSL into the discussion in the first place... (We want the network to be "safe" under those conditions, but shouldn't optimise for them.) The point was that a few seconds of delay is not "wrong", even though it is undesirable. Cheers, Lachlan -- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia Ph +61 3 9214 4837 From detlef.bosau at web.de Sun Sep 13 18:58:02 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 14 Sep 2009 03:58:02 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <2a3692de0909081547s1695adffmb39e804ccc31e6ce@mail.gmail.com> <4AA6E3AC.9060404@reed.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> <4AACDAE3.7090608@web.de> Message-ID: <4AADA32A.4010106@web.de> Lachlan Andrew wrote: >> When I send you a book and ship the item "best effort delivery", this means: >> I will take the item to the parcel service. And I don't care for the rest. >> > > Yep. I find that an odd use of "best" too... > It's a realistic one. Next week, I'm going to have an appointment at the employment exchange. And they offer "best effort" care for me. I.e.: I'm aged 46, so I will get no job offers. When I apply for a job, I'm considered as too old - and I will not get an answer. My diploma is too old, I could not do my job etc. etc. etc. I'm simply tired to hear these excuses, while in Stuttgart some "then thousands" (no joke but what's said by our local government) IT experts are sought. However, I'm too old, my diploma is too old, my knowledge is too bad etc. etc. etc. That's best effort care - and that's the reason, why I'm answering to your post hat half past two in the morning - I cannot sleep. :-( I apologize for this personal remark, but it's a practical example for "best effort care". >> When each thoroughly designed optimal solution for a problem would at least >> be half as successful as VJ's "hack", the world would be a better one. >> In my opinion, the congavoid paper is not a "quick hack" but simply a stroke >> of genius. >> > > Absolutely! It was brilliant to realise that loss was telling us > something about the network. He also proposed a very robust response > to it, over a wide range of conditions. > Actually, the network tells us quite a few things. Of course, I'm too old and my diploma is too old, so I'm too stupid to listen at this.... (I cannot ignore my bitterness here....) and that was the reason for my proposal, I wrote some days ago. E.g. the network tells us something about possible throughput, network congestion, network conditions, even when packets are not delivered successfully. Or, to be precise, one can tell the network tell us these things. So, we can of course consider TCP modifications which make TCP run better over lossy networks. And perhaps, we can discuss TCP flavours for particular network conditions. This must be done carefully - however, it may be reasonable in some cases. > The only reason I call it a hack is to counter the view that it is a > carefully engineered solution, What are you missing there for a "careful engineered" solution? Would the congavoid algorithm be more compelling, if Van had added ten pages with formulae and greek symbols to his work? ;-) The idea was in fact shamelessly simple :-) The network tells us that it cannot carry the amount of data we put into it - and we simply halve the amount of data, which is put in the network. The bucket is not large enough to keep all the water, we fill into it - and we simply fill less water into the poor thing ;-) Of course, Van could have added one page in Greek, one page in Latin and one page in Hebrew to his work - however, this would not cause a substantial change. Quite contrary, the work would be worse than, because it's extremely hard to find an analytical, or at least any formal, description of what is happening here. And that can even be seen in Kelly's paper "Charging and Rate Control for Elastic Traffic" from 1997, which is often referred to in this context. It is extremely difficult to apply this work to wireless networks. The difficulty is in the description of - throughput, - rate, - link capacity, - service time in wireless networks. It's extremely hard, to find analytical, _quantitative_, descriptions for that. So, in that case, it's simply a great idea to have _qualitative_ approaches, which offer a simple solution when the network, and I'm using your words here, cries for help :-) > and that networks should be designed to > show a particular undesirable symptom of congestion just because "TCP > needs it". > > I don't agree here. We do not intentionally introduce packet drops because we need it for TCP. It's quite the other way round: Because there is, generally spoken, no scheduling and no central congestion control, please refer to Keshav's PhD Thesis for an alternative approach, which is definitely _not_ the best effort approach IIRC, we encounter drops. So, in the beginning, there were the drops. And TCP was introduced later. And now, as you say, we've seen that a network may cry for help by dropping packets - and we made a virtue of necessity then and used these drops for congestion control. And drops are outstandingly well suited for this purpose! There is no concern that "drop signaling" may get lost - because loss cannot get loss - so we have a reliable signaling mechanism without any further ado. We can simply use it. > >>> It is remarkable that it has worked as well as it has, but why do we >>> have to keep designing networks to support the hack? >>> >> First: Because it works. >> Second: Up to know, nothing better is known. >> > > I forgot a third reason: We do not even design networks that way that they produce drops. The truth is: Packets are dropped - and we can't help it! (Except by use of a central scheduling and rate allocation, see Keshav's work.) > It works, except on links with large buffers (which exist, whether or > not they "should") or for large BDP flows (which exist, and will > become more widespread), or for links with non-congestion losses > (which exist, and will continue to without "heroic" ARQ). > > And that's the problem of "one size fits all". You're right, that in LFNs the startup behaviour may be nasty. And there are quite some approaches around, Westwood, FAST, only to mention these two. Most of them are more or less rate controlled. The problem is to obtain the correct rate for a - typically - unknown path. As soon as you employ some kind of probing for this purpose, the problems are similar to the ones you'll encounter with VJCC. On the other hand: If you have a certain knowledge of the appropriate rates, there's no reason not to use it. The problem is "one size fits all". When I buy a t-shirt, it's most likely to be too short and everyone get's upset. However, it's the largest size available in the store - and once, an acquaintance of mine said she could use a t-shirt in my size as an evening gown.... So, perhaps, I should visit a store for evening gowns to get something appropriate.... However, there's some difficulty with my unemployment compensation, because even _that_ is calculated "one size fits all". No matter, whether you are an unemployed mouse - or an unemployed elephant. One size fits all, and so, everybody gets a t-shirt in "the size" - and everybody gets upset, when the t-shirt is too small for an elephant or too large for a mouse. > Someone has pointed out that simply the binary backoff of the RTO may > be enough to prevent congestion collapse. Certainly it is. And certainly a lobotomy is enough to cure some kinds of mental illness. Unfortunately, there are these regrettable side effects. Binary backoff is a drastic measure. And sometimes _too_ drastic. If you encounter some transient link outage with your mobile, the RTO rapidly increases into ranges of minutes. And it takes quite some time, and I think you mentioned the problem yourself some weeks ago, to reestablish a proper ACK clock when the link is available again. > Who knows what aspect of > VJ's algorithm is really responsible for making the internet "work", > and how much is simply that we don't see all the details? > Who _cares_? I sometimes referred to the well known "Galloping Gertie", i.e. the Tacoma bridge disaster. How do we prevent this kind of disasters? With thousands of sheets of paper with formulae? No. Unfortunately, we could not stop the wind from blowing, when Gertie started its gallop, but actually, this is quite usually the way to go: When a system starts oscillating, try to get energy out of it. In some buildings, there are compensators for this purpose. And even if they don't exactly match the building's Eigenfrequency, the main thing is that they kill energy. This may be not an elegant mathematical solution, but it protects life. (Oh, Wikipedia is nice :-) I found the correct English word for it: Tuned mass damper. http://www.popularmechanics.com/technology/industry/1612252.html ) And be assured: No one counts the people who actually stay in the Taipei 101 tower, because the number of persons who actually stay in the building, might shift the tower's eigenfrequency. And that's the same in computer networks. If they start oscillating or are getting instable (i.e. the queues grow too large) - you kill energy, i.e. drop packets. >> TCP/IP is a generic protocol suite with hardly any assumptions at all. >> > > Exactly my point. I don't think TCP should assume that routers drop > packets instead of buffering them. We can still use VJ's insight > (that we should look for symptoms of congestion, and then back off) > without that assumption. > > O.k., so you don't kill energy but tune the system ;-) (The analogy to an oscillating system may become more obvious, if you recall newton's cradle. I'm not quite sure about VJ's education but from that analogy and the "conservation principle", I strongly presume that he is a physicist.) >> When I review the proposals for TCP changes made in the last decade, I'm not >> convinced that no one is willing to consider changes to TCP. >> >> However, a recent paper submission of mine was rejected, amongst others, >> with the remark: "Ouch! You're going to change TCP here!". >> >> When there are valid reasons to change protocols, we should consider doing >> so. >> > > Absolutely. Many in the academic research community are (too?) > willing to change TCP. However, it is hard for the academics to make > the changes without the IETF's support. > > Now: In the academic research community, one can hardly be too willing to change TCP. It's simply our job to ask questions and assess answers and solutions. Whether some change or some protocol will be deployed, is a different story. I don't know how "large" the Internet was, when VJ proposed VJCC. 40 nodes? 100 nodes? So, it wasn't a big deal to simply try this "hack" and see what happens. Nowadays, there are a little more nodes than then - therefore, deployment of new protocols may be a bit more difficult. > 2009/9/14 Detlef Bosau : > >>> Maximum Segment Lifetime. >>> >> However, the story remains the same. What is the reason to keep a segment in >> the network that long? >> > > The MSL is not that we should try to keep segments in the network that > long, but that protocols should still work if, by mistake, a packet > does survive that long. We don't want a misconfigured router > somewhere to cause ambiguity between two IP fragments, for example. > > It was perhaps misleading of me to bring the MSL into the discussion > in the first place... (We want the network to be "safe" under those > conditions, but shouldn't optimise for them.) The point was that a > few seconds of delay is not "wrong", even though it is undesirable. > > Cheers, > Lachlan > > -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From lachlan.andrew at gmail.com Sun Sep 13 20:04:18 2009 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Mon, 14 Sep 2009 13:04:18 +1000 Subject: [e2e] What's wrong with this picture? In-Reply-To: <4AADA32A.4010106@web.de> References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> <4AACDAE3.7090608@web.de> <4AADA32A.4010106@web.de> Message-ID: 2009/9/14 Detlef Bosau : > >> The only reason I call it a hack is to counter the view that it is a >> carefully engineered solution, > > What are you missing there for a "careful engineered" solution? > > Would the congavoid algorithm be more compelling, if Van had added ten pages > with formulae and greek symbols to his work? ;-) It would have been more compelling as a carefully engineered solution (to resource allocation, rather than congestion avoidance) if his paper hadn't said "While algorithms at the transport endpoints can insure the network capacity isn?t exceeded, they cannot insure fair sharing of that capacity". This thread isn't really about TCP-friendliness, but that has been a stumbling block in implementing any other form of congestion control, which might work on highly-buffered links. > The idea was in fact shamelessly simple :-) The network tells us that it > cannot carry the amount of data we put into it - and we simply halve the > amount of data, which is put in the network. Yep, I agree it was a great solution for the network at the time. >> ?and that networks should be designed to >> show a particular undesirable symptom of congestion just because "TCP >> needs it". > > I don't agree here. > > We do not intentionally introduce packet drops because we need it for TCP. Then why is having a large buffer bad? > And now, as you say, we've seen that a network may cry for help by dropping > packets - and we made a virtue of necessity then and used these drops for > congestion control. I agree. We should listen to drops. We should also listen to delay. > I forgot a third reason: We do not even design networks that way that they > produce drops. The truth is: Packets are dropped - and we can't help it! > (Except by use of a central scheduling and rate allocation, see Keshav's > work.) a) We can, with backpressure. b) The issue is not whether we should *ever* drop packets. David's point was that we should drop them even if we can pay for enough buffer space to keep them. (Given the current TCP, he is right.) >> It works, except on... > > And that's the problem of "one size fits all". We could have a one-size-fits-all solution which also responds to excessive delay. >> Someone has pointed out that simply the binary backoff of the RTO may >> be enough to prevent congestion collapse. > > Binary backoff is a drastic measure. > > And sometimes _too_ drastic. If you encounter some transient link outage > with your mobile, the RTO rapidly increases into ranges of minutes. I agree. We should have *something* else. However, many things other than AIMD are good enough for that "something", and safe enough to try. My point was that I don't think anyone has done a large-scale trial of any other congestion control, and found that it doesn't "work". (For transient outage, binary backoff only over-estimates the duration of the outage by a factor of 2. It takes minutes to increase to the range of minutes. Binary backoff is more of a problem if we happen to get a large number of "random" losses.) >> Who knows what aspect of >> VJ's algorithm is really responsible for making the internet "work", >> and how much is simply that we don't see all the details? >> > > Who _cares_? Anyone who says "we use it rather than scheme A because it works" should care, especially when looking at a case where it doesn't work. > In some buildings, there are compensators for this purpose. And even if they > don't exactly match the building's Eigenfrequency, the main thing is that > they kill energy. > > This may be not an elegant mathematical solution, but it protects life. The issue of this thread wasn't whether modelling is good. However, since you bring it up: The reason people know that they need damping at all is because they understand the mathematics behind the dynamics. Without that insight, people would say "Let's just build stronger walls". > And that's the same in computer networks. If they start oscillating or are > getting instable (i.e. the queues grow too large) - you kill energy, i.e. > drop packets. Alternatively, if queues grow too large, you can reduce the rate at which you inject them into the network. That is what congestion control is all about. >> I don't think TCP should assume that routers drop >> packets instead of buffering them. ?We can still use VJ's insight >> (that we should look for symptoms of congestion, and then back off) >> without that assumption. > > O.k., so you don't kill energy but tune the system ;-) If packets are energy, the amount of energy removed by reducing the send rate is much more than that removed by dropping any reasonable fraction of packets. I'm not saying how we should design buffers. I'm just suggesting that we should design TCP to listen to all available congestion signals, rather than saying that the link is "bad" if sends the packets that *we* have sent it (including others using the same algorithm as us). Cheers, Lachlan -- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia Ph +61 3 9214 4837 From detlef.bosau at web.de Mon Sep 14 09:05:45 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 14 Sep 2009 18:05:45 +0200 Subject: [e2e] What's wrong with this picture? In-Reply-To: References: <0B0A20D0B3ECD742AA2514C8DDA3B0650295F874@VGAEXCH01.hq.corp.viasat.com> <2a3692de0909101737q63abdecbi95ee34892798c9ba@mail.gmail.com> <4AAA6614.9040402@reed.com> <4AABF281.1020300@reed.com> <4AACDAE3.7090608@web.de> <4AADA32A.4010106@web.de> Message-ID: <4AAE69D9.5080108@web.de> Lachlan Andrew wrote: > It would have been more compelling as a carefully engineered solution > (to resource allocation, rather than congestion avoidance) if his > paper hadn't said > "While algorithms at the transport endpoints can insure the network > capacity isn?t exceeded, they cannot insure fair sharing of that > capacity". > > Unfortunately, VJ is correct here. As long as paths are lossy, either due to congestion or due to corruption, you cannot predict how much of the sent data is actually kept in the network path. Two flows may have sent 10000 bytes each - and actually, 1000 bytes from the first flow are dropped due to congestion and 800 bytes from the second one are lost due to packet corruption, so these two flows don't share the available resources equally. (BTW: IIRC, Kelly's paper does not pay attention to this problem.) >>> and that networks should be designed to >>> show a particular undesirable symptom of congestion just because "TCP >>> needs it". >>> >> I don't agree here. >> >> We do not intentionally introduce packet drops because we need it for TCP. >> > > Then why is having a large buffer bad? > > Large buffers may introduce long service times, sometimes packet bursts and some authors write about long term dependencies which are the reason for self similarity of network traffic. >> And now, as you say, we've seen that a network may cry for help by dropping >> packets - and we made a virtue of necessity then and used these drops for >> congestion control. >> > > I agree. We should listen to drops. We should also listen to delay. > > One problem with delay is that the observed delay itself is a stochastic variable. The observed values may spread around some expectation. Actually, it is extremely hard to make a significant observation with only one experiment. So, an extreme outlier could be mistaken as a congestion indication. So, you have "false positive" results to some extent. >> I forgot a third reason: We do not even design networks that way that they >> produce drops. The truth is: Packets are dropped - and we can't help it! >> (Except by use of a central scheduling and rate allocation, see Keshav's >> work.) >> > > a) We can, with backpressure. > ....and infinite backlog. IIRC, in the BSD kernel, there is a function tcpquench() (?) which is called when a packet cannot be enqueued at an outgoing interface, so the sending attempt is postponed and cwnd is halved. This works at the sender, unfortunately it doesn't work along the path because a router "in between" cannot postpone an already sent packet. Actually, this mechanism ensures fairness between two TCP flows sharing the same sender ad receiver. If you didn't have this mechanism, this fairness issue would be left to the OS's scheduler and you could not provide for resource fairness for TCP flows on single tasking systems, e.g. MS-DOS (excuse me ;-)) and the KA9Q stack. > b) The issue is not whether we should *ever* drop packets. David's > point was that we should drop them even if we can pay for enough > buffer space to keep them. (Given the current TCP, he is right.) > > Yes, of course. Or should we accept infinite head of line blocking for _all_ competing flows when only _one_ listener e.g. in a cellular network has a problem? > We could have a one-size-fits-all solution which also responds to > excessive delay. > > So we're looking for a "one size fits all significance test" for delays.... ;-) _That's_ the very problem. >> Binary backoff is a drastic measure. >> >> And sometimes _too_ drastic. If you encounter some transient link outage >> with your mobile, the RTO rapidly increases into ranges of minutes. >> > > I agree. We should have *something* else. However, many things other > than AIMD are good enough for that "something", and safe enough to > try. My point was that I don't think anyone has done a large-scale > trial of any other congestion control, and found that it doesn't > "work". > > (For transient outage, binary backoff only over-estimates the duration > of the outage by a factor of 2. ^n. 2^n ;-) n is the "time out counter". There is an exponential growth. > It takes minutes to increase to the > range of minutes. Binary backoff is more of a problem if we happen to > get a large number of "random" losses.) > > Absolutely. And that's the scenario where the exponential growth becomes a problem. >> >> Who _cares_? >> > > Anyone who says "we use it rather than scheme A because it works" > should care, especially when looking at a case where it doesn't > work. > That's always true: We have to pay a close look to a scenario, where a scheme fails. > >> In some buildings, there are compensators for this purpose. And even if they >> don't exactly match the building's Eigenfrequency, the main thing is that >> they kill energy. >> >> This may be not an elegant mathematical solution, but it protects life. >> > > The issue of this thread wasn't whether modelling is good. > However, since you bring it up: The reason people know that they need > damping at all is because they understand the mathematics behind the > dynamics. Without that insight, people would say "Let's just build > stronger walls". > > It's both. They understand the mathematics and see, why stronger walls alone wouldn't work. However, they understand the mathematics and see, that there is a simple and practical solution to the problem. >> And that's the same in computer networks. If they start oscillating or are >> getting instable (i.e. the queues grow too large) - you kill energy, i.e. >> drop packets. >> > > Alternatively, if queues grow too large, you can reduce the rate at > which you inject them into the network. That is what congestion > control is all about. > > It's both. Of course: Reducing the rate heals the _reason_ for the problem. Dropping packets alleviates the _symptom_ of the problem. It's a bit like our secretary of treasury after the Lehman's crash: "When a house is burning, we have to extinguish the fire. No matter whether it is malicious arson or not." > I'm not saying how we should design buffers. I'm just suggesting that > we should design TCP to listen to all available congestion signals, > rather than saying that the link is "bad" if sends the packets that > *we* have sent it (including others using the same algorithm as us). > > I agree. However: The problem is not dealing with dedicated links. The problem is dealing with shared ones. Because on shared links (e.g. one base station, eight mobiles and therefore eight logical links) one bad link can usurp all the resources in the cell and a head of line blocking for _one_ link can cause severe harm to all the others. Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From touch at ISI.EDU Mon Sep 14 11:43:24 2009 From: touch at ISI.EDU (Joe Touch) Date: Mon, 14 Sep 2009 11:43:24 -0700 Subject: [e2e] 64-bit timestamps? In-Reply-To: <4AA6FDA1.9000307@reed.com> References: <4AA6CF5E.7080707@gmail.com> <4AA6D9A5.6040601@reed.com> <4AA6DDEF.4080907@gmail.com> <4AA6FDA1.9000307@reed.com> Message-ID: <4AAE8ECC.4060100@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, all, David P. Reed wrote: > List moderator - please suspend Simpson's privileges; the rules suggest > that his obnoxious behavior towards a helpful comment demand moderation > until he stops behaving this way. Posts directed at the list moderator need to be sent to one of the following addresses: end2end-interest-owner at postel.org touch at isi.edu (even better) Requests sent to the list itself may be detected late (as in this case) or not at all (I don't check every post for such requests). I do appreciate the efforts of all involved to try to get back to the discussion at hand. Joe (as list admin) > On 09/08/2009 06:42 PM, William Allen Simpson wrote: >> David P. Reed wrote: >>> In regard to DNS security issues, ... But PAWS may not be useful, >>> since DNS itself might be made to maintain state across connections, >>> moving the problem out of TCP and into the app (DNS) layer where it >>> probably belongs. >>> >> This has no relation to the question that I asked, which has no mention >> what-so-ever about DNS security. Nor did I find the cut and paste of an >> old familiar RFC appendix particularly informative, not even in fancy >> multi-part alternative html (instead of the native format).... >> >> In any case, I've been paying attention to the more recent 1323bis. >> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) iEYEARECAAYFAkqujswACgkQE5f5cImnZrvNwACeLJScAqT8BvEKI3cfP/NHjN3n c0YAnjJtpHvCqCXeE4HTS/F5vBpHz4YE =zCmz -----END PGP SIGNATURE----- From silvestris at di.uniroma1.it Wed Sep 16 00:48:43 2009 From: silvestris at di.uniroma1.it (Simone Silvestri) Date: Wed, 16 Sep 2009 09:48:43 +0200 Subject: [e2e] PerNEM 2010 - Deadline approaching Message-ID: <4AB0985B.9000001@di.uniroma1.it> We apologize if you receive multiple copies of this Call for Papers. ****************************************************************** PerNEM 2010 Call for Papers The First Annual Workshop on Pervasive Networks for Emergency Management (In conjunction with IEEE PerCom 2010) http://san.ee.ic.ac.uk/pernem2010/ Mannheim, Germany, March 29 - April 2, 2010 Scope ----- The events that took place on 11 September 2001 have brought to the forefront the unique challenges that occur during a crisis, which require effective sensing, communications and decision making with demanding time constraints in highly dynamic environments. Pervasive systems address these requirements by providing decision support to rescuers and evacuees, guaranteeing communications and collecting information that is vital for planning and organising the emergency operation. This workshop focuses on pervasive networked sensing and decision making, both wired and wireless, geared towards emergency management. PerNEM 2010 addresses leading edge research in these areas through the use of sensing, communication, decision support, simulation tools and modelling methods with focus on system design, optimisation and experimental evaluation. Topics ------ PerNEM will bring together contributions which include but are not limited to the following areas: * Networked sensors for emergency management * Pervasive middleware for emergency management * Decentralised algorithms for pervasive systems * Self-aware and self- adaptive network design and evaluation * Network self-healing, security and self-defence * Wireless Networks for emergency support * Mobile sensors for disaster monitoring * Networked robotics for wireless communications * Pervasive emergency management systems * QoS in critical communications Registration and Submission Details ----------------------------------- Accepted papers will be included and indexed in the IEEE digital libraries (Xplore), showing their affiliation with IEEE PerCom. Submitted papers should be no longer than 6 pages in length, and formatted to 2 columns, 10pt fonts, using the IEEE Computer Society 8.5" x 11" authors kit. Papers should be submitted via the EasyChair PerNEM 2010 page: http://www.easychair.org/conferences/?conf=pernem2010 Paper submission: September 28, 2009 Author notification: December 21, 2009 Camera-ready due: January 29, 2010 Organising Committee -------------------- Erol Gelenbe Intelligent Systems & Networks Group, Imperial College London, UK Georgia Sakellari Intelligent Systems & Networks Group, Imperial College London, UK Avgoustinos Filippoupolitis Intelligent Systems & Networks Group, Imperial College London, UK Programme Committee ------------------- Christoforos Anagnostopoulos Institute for Mathematical Sciences, Imperial College London, UK Gokce Gorbil Intelligent Systems & Networks Group, Imperial College London, UK Alex Healing Centre for Information and Security Systems Research, British Telecom,UK Laurence Hey Intelligent Systems & Networks Group, Imperial College London, UK Eleni Karatza Department of Informatics, Aristotle University of Thessaloniki, Greece Georgios Loukas Intelligent Systems & Networks Group, Imperial College London, UK Gulay Oke Istanbul Technical University, Turkey Alex Rogers Intelligence, Agents, Multimedia Group, University of Southampton, UK Simone Silvestri University of Rome "La Sapienza", Italy Oliver Smith General Dynamics UK Ltd From detlef.bosau at web.de Fri Sep 18 06:59:27 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 18 Sep 2009 15:59:27 +0200 Subject: [e2e] How many transmission attempts should be done on wireless networks? Message-ID: <4AB3923F.9040902@web.de> Hi to all, this debate continues to appear here now and then in the list ;-) and to my understanding, the positions are quite extreme here. On the one hand, there are some standards, who do "heroic effort" (credits to DPR ;-)) and allow to configure even a SDU corruption ratio 10?9. On the other hand, there is e.g. DPR, who does hardly any effort and says, there should be typically not more than three transmission attempts. I would like to understand these positions a bit better than now and perhaps, there are some strong facts, which support the one or the other view. Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From dpreed at reed.com Fri Sep 18 08:38:50 2009 From: dpreed at reed.com (David P. Reed) Date: Fri, 18 Sep 2009 11:38:50 -0400 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: <4AB3923F.9040902@web.de> References: <4AB3923F.9040902@web.de> Message-ID: <4AB3A98A.1010105@reed.com> To me it comes down to a simple thing: end-to-end packet latency matters to the higher layers, controlling the combined incoming rate of packets must be done at the endpoints, since they are the only place to back off, Internal bottlenecks cause large queueing delays to build up, and absent packet drops, those queues will drain only as fast as the outgoing links that service them can carry packets, The *application* control loops to manage application layer backoff share the same paths and the same queueing delays as do all the rest of the packets, (this could be ameliorated by making a separate flow control channel, but there are a wide range of applications that would prefer to back off intelligently - e.g. use lower rate codecs or decide to go drink coffee and come back when the net is more lightly loaded - so the control channel cannot be based on some fixed theory that hides control from apps). In any case, in today's TCP, we should be seeking Kleinrock's optimum steady state - a pipelining state such that no router *ever* has more than one packet in each outgoing queue (on the average). On links that are not bottlenecks, the average is << 1 packet in the outgoing queue, on links that are bottlenecks, the combined flows traversing that link are such that the average queue length is 1 packet. This is the "minimal latency" state that sustains the maximum overall achievable throughput of a steady state network. Analogous to a single-queue being "double buffered", where a new packet arrives just as the old packet completes. Because of the delays in the control loops at the endpoints that cannot measure without long delay, entry and exit of new flows, and other non-steady-state things, we actually are able to tolerate queue averages a few packets (2-3) long on the average, in service of keeping throughput up in the worst case without resonances and other problems that relate to control instabilities. It's not the size of the buffers that matters - they handle transients better. What matters is not dropping packets feeding into a bottleneck aggressively enough to keep the outgoing queue drained, so it doesn't build up sustained backups of obsolete packets. On 09/18/2009 09:59 AM, Detlef Bosau wrote: > Hi to all, > > this debate continues to appear here now and then in the list ;-) and > to my understanding, the positions are quite extreme here. > > On the one hand, there are some standards, who do "heroic effort" > (credits to DPR ;-)) and allow to configure even a SDU corruption > ratio 10?9. > > On the other hand, there is e.g. DPR, who does hardly any effort and > says, there should be typically not more than three transmission > attempts. > > I would like to understand these positions a bit better than now and > perhaps, there are some strong facts, which support the one or the > other view. > > Detlef > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090918/be35c767/attachment.html From detlef.bosau at web.de Fri Sep 18 09:06:28 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Fri, 18 Sep 2009 18:06:28 +0200 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: <4AB3A98A.1010105@reed.com> References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> Message-ID: <4AB3B004.4000400@web.de> O.k., so that's the case for small queues: small queuing latencies and endpoints, which back off intelligently. But what's the reason for doing only a small number of retransmissions locally on a lossy channel? One point, you mentioned, is that an application may want to pause and wait for better channel conditions (you talked about load conditions, but this is comparable). Basically, my question is to which extent recovery on lossy links should be done locally and to which extent recovery should be left to the end points. Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From dpreed at reed.com Fri Sep 18 10:49:57 2009 From: dpreed at reed.com (David P. Reed) Date: Fri, 18 Sep 2009 13:49:57 -0400 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: <4AB3B004.4000400@web.de> References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> Message-ID: <4AB3C845.3000304@reed.com> Queues and retransmissions are inseparable. You have to maintain a queue while retransmitting. Ideally (in coding theory) you never would retransmit the same thing twice. You would instead transmit a smaller and different piece of information the second time, so that the combination at the receiver would regenerate the lost information. For an erasure channel, you could use low rate Reed-Solomon codes or other kinds of things over the link. For other kinds of channels, you could do other kinds of things. But the key thing here regarding latency and throughput is: don't create a queue behind your retransmission by focusing on moving the current packet (becoming older and older) at "heroic cost". This is an "end-to-end argument" about placing function (reliable delivery) in the wrong place (at the link layer in wireless). Larry Roberts fumes (and I support him) about 802.11 systems that won't stop retransmitting one Ethernet packet until 255 tries have been made. That means that congestion is not signaled and routing is not changed for 255 times too long. What the link should do is fragment packets to tinier deliverable units, reassemble them, and stop creating a backup in the path. On 09/18/2009 12:06 PM, Detlef Bosau wrote: > > O.k., so that's the case for small queues: small queuing latencies and > endpoints, which back off intelligently. > > But what's the reason for doing only a small number of retransmissions > locally on a lossy channel? > > One point, you mentioned, is that an application may want to pause and > wait for better channel conditions (you talked about load conditions, > but this is comparable). > > Basically, my question is to which extent recovery on lossy links > should be done locally and to which extent recovery should be left to > the end points. > > Detlef > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090918/5d72a9e6/attachment.html From L.Wood at surrey.ac.uk Fri Sep 18 11:20:07 2009 From: L.Wood at surrey.ac.uk (L.Wood@surrey.ac.uk) Date: Fri, 18 Sep 2009 19:20:07 +0100 Subject: [e2e] How many transmission attempts should be done on wireless networks? References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> Message-ID: <4835AFD53A246A40A3B8DA85D658C4BE01368A55@EVS-EC1-NODE4.surrey.ac.uk> > Basically, my question is to which extent recovery on lossy links should > be done locally and to which extent recovery should be left to the end > points. It depends. See RFC3366. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090918/87e5afe9/attachment.html From dpreed at reed.com Fri Sep 18 18:13:22 2009 From: dpreed at reed.com (David P. Reed) Date: Fri, 18 Sep 2009 21:13:22 -0400 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: <4AB3B004.4000400@web.de> References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> Message-ID: <4AB43032.2020003@reed.com> Let me clarify that in making my comment about 802.11 systems that won't stop retransmitting until 255 tries have been made, I was citing Larry Roberts. I have not done such measurements. If Larry is right and common chipsets retry up to 255 tries, I support his point. Not everything Larry says is something I agree with, and I have not independently done a test of current chipsets myself. If no one else has done so, I may do those tests myself.... after the observations of ATT's 8-10 second backlogs, I'm starting to believe there are likely to be lots of vendor supplied equipment that may fail to signal congestion backlogs properly. Has anyone surveyed currently-on-the-market 802.11 implementations for overly heroic retransmission strategies? Hari B. did a number of them, but I haven't seen a published dataset. Note, if a shared channel like "listen-before-talk" wireless Ethernet is buffered among all competing flows, merely holding on to a few packets per interface can destroy end-to-end congestion control by creating a slow-to-drain backlog, if the number of interfaces sharing a common channel is large. From lachlan.andrew at gmail.com Fri Sep 18 18:44:20 2009 From: lachlan.andrew at gmail.com (Lachlan Andrew) Date: Sat, 19 Sep 2009 11:44:20 +1000 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: <4AB3C845.3000304@reed.com> References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> <4AB3C845.3000304@reed.com> Message-ID: 2009/9/19 David P. Reed : > Queues and retransmissions are inseparable.? You have to maintain a queue > while retransmitting.?? Ideally (in coding theory) you never would > retransmit the same thing twice.?? You would instead transmit a smaller and > different piece of information the second time, so that the combination at > the receiver would regenerate the lost information. Very true. I believe Detlef is interested in HSPA systems which already do hybrid ARQ, rather than retransmitting the same thing twice. His question still applies. In information-theoretic terms, Detlef's could have asked "how much water-filling-over-time" should there be? If a channel is bad, information theory tells us to delay sending information until the channel becomes good. That is essentially what heroic hybrid ARQ repeated retransmissions (with increasing delays) does. However, this causes obvious problems to applications, which are ignored by traditional information theory. It would be interesting to separate this question into: a) What is the optimum for running VJ's TCP? b) What is the optimum for more general congestion control? The answers may be very different. One issue which I couldn't see in RFC3366 is the effect of the fading rate. If fading causes outages much shorter than a RTT (unknown at the link -- another story), then AIMD benefits from aggressive retransmission so that its window is the right size for the average rate achieved over a whole RTT. That is the fastest granularity that AIMD can effectively control. However, if the outages and resulting queueing become significant compared to the propagation component of the RTT, then perhaps fewer ARQs should be performed. (This is better than having aggressive retransmission and a short queue, since it is drop-from-front, which is better than drop-tail.) Cheers, Lachlan -- Lachlan Andrew Centre for Advanced Internet Architectures (CAIA) Swinburne University of Technology, Melbourne, Australia Ph +61 3 9214 4837 From L.Wood at surrey.ac.uk Sat Sep 19 02:02:02 2009 From: L.Wood at surrey.ac.uk (L.Wood@surrey.ac.uk) Date: Sat, 19 Sep 2009 10:02:02 +0100 Subject: [e2e] How many transmission attempts should be done on wireless networks? References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> <4AB43032.2020003@reed.com> Message-ID: <4835AFD53A246A40A3B8DA85D658C4BE01368A5A@EVS-EC1-NODE4.surrey.ac.uk> David P. Reed writes: > Let me clarify that in making my comment about 802.11 systems that won't > stop retransmitting until 255 tries have been made, I was citing Larry > Roberts. I have not done such measurements. If Larry is right and > common chipsets retry up to 255 tries, I support his point. Not > everything Larry says is something I agree with, and I have not > independently done a test of current chipsets myself. In 802.11 DCF, failure to receive an expected link-layer ack causes the sender to wait a random number of (50us) slot times between 0 to to J. J is calculated based on previous # of access attempts plus a bias factor. The first retransmission may go out after up to 15 slot times (750us); subsequent retransmissions may go out after up to 255 slot times (12.75ms) - the (channel) contention window size. This increasing DCF backoff delay before sending is intended to work around interference. Number of slots waited before attempting a resend is not number of resends. L. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090919/a6d8b97c/attachment.html From detlef.bosau at web.de Sat Sep 19 03:37:23 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 19 Sep 2009 12:37:23 +0200 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: <4AB43032.2020003@reed.com> References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> <4AB43032.2020003@reed.com> Message-ID: <4AB4B463.8020903@web.de> David P. Reed wrote: > Let me clarify that in making my comment about 802.11 systems that > won't stop retransmitting until 255 tries have been made, I was citing > Larry Roberts. I just got a notice by Manu Lochin, who described an even worse scenario in some 802.11b implementation. However, it was a PM, so perhaps Manu will perhaps provide a comment to the list as well? Detlef -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From detlef.bosau at web.de Sat Sep 19 03:54:38 2009 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 19 Sep 2009 12:54:38 +0200 Subject: [e2e] How many transmission attempts should be done on wireless networks? In-Reply-To: References: <4AB3923F.9040902@web.de> <4AB3A98A.1010105@reed.com> <4AB3B004.4000400@web.de> <4AB3C845.3000304@reed.com> Message-ID: <4AB4B86E.5030604@web.de> Lachlan Andrew wrote: > 2009/9/19 David P. Reed : > >> Queues and retransmissions are inseparable. You have to maintain a queue >> while retransmitting. Ideally (in coding theory) you never would >> retransmit the same thing twice. You would instead transmit a smaller and >> different piece of information the second time, so that the combination at >> the receiver would regenerate the lost information. >> > > Very true. I believe Detlef is interested in HSPA systems which > already do hybrid ARQ, rather than retransmitting the same thing > twice. His question still applies. > No, at the moment I'm definitely more generic. The point is: When ARQ can guarantee a maximum SDU corruption ratio of, say, 10^-3 or 10^-4, we could treat wireless links similar to wired once - at a _very_ first glance. There would be no loss differentiation problem any more - and the world were fine =8-) "Always look on the bright side of life!" *whistle* > In information-theoretic terms, Detlef's could have asked "how much > water-filling-over-time" should there be? This is already a second glance ;-) And an important one too. VJCC attempts fair distribution of _capacity_, wrt to your terms: pipes and buckets. Or fair water-filling-over-time done by the individual sources. And actually, retransmissions appear as water-filling by an "unknown", yet not controlled, source. > If a channel is bad, > information theory tells us to delay sending information until the > channel becomes good. That's the approach pursued by opportunistic scheduling and a result of IIRC Stephen Hanly's PhD thesis. And I think Hari has done some work in this direction as well for 802.11. > That is essentially what heroic hybrid ARQ > repeated retransmissions (with increasing delays) does. Really? Particularly in HSDPA, I don't see e.g. an adaptation to changing channel conditions. Actually, a CQI is chosen once, and then a transport block is sent in not more than three attempts. (Of course with HARQ, i.e. e.g. chase combining or, I think this is done more often, incremental redundancy.) The limitation to a maximum of three attempts makes sense, because the channel estimation does not provide a forecast of the long term channel conditions but only for its "actual state", which may be sufficient for the next, say, 10 ms or so. > However, this > causes obvious problems to applications, which are ignored by > traditional information theory. > > Yes. And I think, this is hard stuff! Because it's extremely difficult to forecast the number of sending attempts actually being needed to guarantee a certain SDU corruption probability. -- Detlef Bosau Galileistra?e 30 70565 Stuttgart phone: +49 711 5208031 mobile: +49 172 6819937 skype: detlef.bosau ICQ: 566129673 http://detlef.bosau at web.de From end2end-interest at postel.org Tue Sep 22 04:22:33 2009 From: end2end-interest at postel.org (VIAGRA Inc.) Date: Tue, 22 Sep 2009 03:22:33 -0800 Subject: [e2e] Pharmacy Online Sale 80% OFF! Message-ID: <20090922-52233.5258.qmail@OzzyCell> An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20090922/69902ef3/attachment.html -------------- next part -------------- New from WebMD: Dear end2end-interest at postel.org!. Sign-up today! You are subscribed as end2end-interest at postel.org. View and manage your WebMD newsletter preferences. Subscribe to more newsletters. Change/update your email address. WebMD Privacy Policy WebMD Office of Privacy 1175 Peachtree Street, Suite 2400, Atlanta, GA 30361 ? 2009 WebMD, LLC. All rights reserved. From silvestris at di.uniroma1.it Thu Sep 24 01:10:33 2009 From: silvestris at di.uniroma1.it (Simone Silvestri) Date: Thu, 24 Sep 2009 10:10:33 +0200 Subject: [e2e] PerNEM 2010: few days left! Message-ID: <4ABB2979.9020507@di.uniroma1.it> We apologize if you receive multiple copies of this Call for Papers. ****************************************************************** PerNEM 2010 Call for Papers The First Annual Workshop on Pervasive Networks for Emergency Management (In conjunction with IEEE PerCom 2010) http://san.ee.ic.ac.uk/pernem2010/ Mannheim, Germany, March 29 - April 2, 2010 Scope ----- The events that took place on 11 September 2001 have brought to the forefront the unique challenges that occur during a crisis, which require effective sensing, communications and decision making with demanding time constraints in highly dynamic environments. Pervasive systems address these requirements by providing decision support to rescuers and evacuees, guaranteeing communications and collecting information that is vital for planning and organising the emergency operation. This workshop focuses on pervasive networked sensing and decision making, both wired and wireless, geared towards emergency management. PerNEM 2010 addresses leading edge research in these areas through the use of sensing, communication, decision support, simulation tools and modelling methods with focus on system design, optimisation and experimental evaluation. Topics ------ PerNEM will bring together contributions which include but are not limited to the following areas: * Networked sensors for emergency management * Pervasive middleware for emergency management * Decentralised algorithms for pervasive systems * Self-aware and self- adaptive network design and evaluation * Network self-healing, security and self-defence * Wireless Networks for emergency support * Mobile sensors for disaster monitoring * Networked robotics for wireless communications * Pervasive emergency management systems * QoS in critical communications Registration and Submission Details ----------------------------------- Accepted papers will be included and indexed in the IEEE digital libraries (Xplore), showing their affiliation with IEEE PerCom. Submitted papers should be no longer than 6 pages in length, and formatted to 2 columns, 10pt fonts, using the IEEE Computer Society 8.5" x 11" authors kit. Papers should be submitted via the EasyChair PerNEM 2010 page: http://www.easychair.org/conferences/?conf=pernem2010 Paper submission: September 28, 2009 Author notification: December 21, 2009 Camera-ready due: January 29, 2010 Organising Committee -------------------- Erol Gelenbe Intelligent Systems & Networks Group, Imperial College London, UK Georgia Sakellari Intelligent Systems & Networks Group, Imperial College London, UK Avgoustinos Filippoupolitis Intelligent Systems & Networks Group, Imperial College London, UK Programme Committee ------------------- Christoforos Anagnostopoulos Institute for Mathematical Sciences, Imperial College London, UK Gokce Gorbil Intelligent Systems & Networks Group, Imperial College London, UK Alex Healing Centre for Information and Security Systems Research, British Telecom,UK Laurence Hey Intelligent Systems & Networks Group, Imperial College London, UK Eleni Karatza Department of Informatics, Aristotle University of Thessaloniki, Greece Georgios Loukas Intelligent Systems & Networks Group, Imperial College London, UK Gulay Oke Istanbul Technical University, Turkey Alex Rogers Intelligence, Agents, Multimedia Group, University of Southampton, UK Simone Silvestri University of Rome "La Sapienza", Italy Oliver Smith General Dynamics UK Ltd From fu at cs.uni-goettingen.de Tue Sep 29 21:20:41 2009 From: fu at cs.uni-goettingen.de (Xiaoming Fu) Date: Tue, 29 Sep 2009 21:20:41 -0700 Subject: [e2e] CCW'09 call for participation In-Reply-To: <4AC12DC2.6080807@cs.uni-goettingen.de> References: <4AC12DC2.6080807@cs.uni-goettingen.de> Message-ID: <4AC2DC99.7070508@cs.uni-goettingen.de> [We apologize if you receive multiple copies of this message.] Dear colleagues, We cordially invite you to join us in Cranwell Resort Lenox, MA, USA, 10/18/2009 to 10/21/2009 for the 23rd IEEE Computer Communications Workshop (CCW 2009). CCW is the annual flagship workshop of the IEEE Communications Society's Technical Committee on Computer Communications (TCCC). It is a panel-based workshop with informal, interactive sessions exploring emerging issues and trends in networking and computer communications. Please kindly note: - The workshop program spans 2.5 days from Monday to Wednesday noon: see http://www.comsoc.org/~tccc/ccw/2009/program.htm - Early registration deadline: Oct 1, 2009 see http://www.comsoc.org/~tccc/ccw/2009/registration.htm - Hotel reservation cut-off due: Oct 5, 2009 see http://www.comsoc.org/~tccc/ccw/2009/venue.htm TCCC welcomes new members and encourages anyone interested to join and participate in our technical activities. Joining TCCC only requires that you subscribe to the TCCC mailing list. Instructions on how to join our mailing list can be found under the Mailing List heading of the TCCC Home Page: http://www.comsoc.org/~tccc/ In case you need more information on TCCC, feel free to contact any of the current TCCC officers. We look forward to seeing you soon in Lenox, MA! Best regards, Xiaoming