From touch at ISI.EDU Fri Dec 1 15:29:42 2006 From: touch at ISI.EDU (Joe Touch) Date: Fri, 01 Dec 2006 15:29:42 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <1164699864.2453.31.camel@strangepork> References: <45687401.7020308@web.de> <1164699864.2453.31.camel@strangepork> Message-ID: <4570BAE6.7000303@isi.edu> Christian Kreibich wrote: > Hi Detlef, > > On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >> just a very spontaneous, perhaps stupid, question: What is the >> difference between "packet symmetry" and the well known principle of >> packet conservation here? Aren?t these ideas at least quite similar? > > the packet conservation principle states that in a steady-state TCP > flow, a new packet is not to enter the network before another one has > left That "packet conservation principle" already has a (perhaps not as well-known, but certainly worth knowing) name: 'isarithmic', and was proposed by Davies in 1972. "Packet symmetry" appears to be per-NIC isarithmic. The "packet conservation principle" is single-protocol isarithmic. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061201/c141ada7/signature.bin From Jon.Crowcroft at cl.cam.ac.uk Sat Dec 2 03:26:02 2006 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sat, 02 Dec 2006 11:26:02 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: Message from Joe Touch of "Fri, 01 Dec 2006 15:29:42 PST." <4570BAE6.7000303@isi.edu> Message-ID: the isarithmic flow control stuff was very nice - i remmeber reading it, but have never recently mamnaged to find the actual reference - do you have the proper citation for davies' idea? - we should make sure we know it! In missive <4570BAE6.7000303 at isi.edu>, Joe Touch typed: >>This is an OpenPGP/MIME signed message (RFC 2440 and 3156) >>--------------enig4E3DED50FCEAFA2E99B7EC8E >>Content-Type: text/plain; charset=ISO-8859-1 >>Content-Transfer-Encoding: quoted-printable >> >>Christian Kreibich wrote: >>> Hi Detlef, >>>=20 >>> On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >>>> just a very spontaneous, perhaps stupid, question: What is the=20 >>>> difference between "packet symmetry" and the well known principle of=20 >>>> packet conservation here? Aren=B4t these ideas at least quite similar?= >> >>>=20 >>> the packet conservation principle states that in a steady-state TCP >>> flow, a new packet is not to enter the network before another one has >>> left >> >>That "packet conservation principle" already has a (perhaps not as >>well-known, but certainly worth knowing) name: 'isarithmic', and was >>proposed by Davies in 1972. >> >>"Packet symmetry" appears to be per-NIC isarithmic. >> >>The "packet conservation principle" is single-protocol isarithmic. >> >>Joe >> >>--=20 >>---------------------------------------- >>Joe Touch >>Sr. 
Network Engineer, USAF TSAT Space Segment >> >> >> >> >>--------------enig4E3DED50FCEAFA2E99B7EC8E >>Content-Type: application/pgp-signature; name="signature.asc" >>Content-Description: OpenPGP digital signature >>Content-Disposition: attachment; filename="signature.asc" >> >>-----BEGIN PGP SIGNATURE----- >>Version: GnuPG v1.4.3 (MingW32) >>Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >>iD8DBQFFcLrmE5f5cImnZrsRAvyMAKDE4E/HA6a3dE6V6tIQ/EQosrFd7wCg7nMZ >>l++51oFUPmw9COfh/Sz9tdA= >>=Sdoq >>-----END PGP SIGNATURE----- >> >>--------------enig4E3DED50FCEAFA2E99B7EC8E-- >> cheers jon From svp+ at cs.cmu.edu Sat Dec 2 09:29:14 2006 From: svp+ at cs.cmu.edu (Swapnil V. Patil) Date: Sat, 2 Dec 2006 12:29:14 -0500 (EST) Subject: [e2e] trading acks...TRACKS In-Reply-To: Message-ID: On Sat, 2 Dec 2006, Jon Crowcroft wrote: > the isarithmic flow control stuff was very nice - i remmeber reading it, but have never > recently mamnaged to find the actual reference - do you have the proper citation for > davies' idea? - we should make sure we know it! > I think the main paper is ... "The control of congestion in packet switching networks" by Donald W. Davies http://portal.acm.org/citation.cfm?id=811052 Thanks -swapnil > In missive <4570BAE6.7000303 at isi.edu>, Joe Touch typed: > > >>This is an OpenPGP/MIME signed message (RFC 2440 and 3156) > >>--------------enig4E3DED50FCEAFA2E99B7EC8E > >>Content-Type: text/plain; charset=ISO-8859-1 > >>Content-Transfer-Encoding: quoted-printable > >> > >>Christian Kreibich wrote: > >>> Hi Detlef, > >>>=20 > >>> On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: > >>>> just a very spontaneous, perhaps stupid, question: What is the=20 > >>>> difference between "packet symmetry" and the well known principle of=20 > >>>> packet conservation here? Aren=B4t these ideas at least quite similar?= > >> > >>>=20 > >>> the packet conservation principle states that in a steady-state TCP > >>> flow, a new packet is not to enter the network before another one has > >>> left > >> > >>That "packet conservation principle" already has a (perhaps not as > >>well-known, but certainly worth knowing) name: 'isarithmic', and was > >>proposed by Davies in 1972. > >> > >>"Packet symmetry" appears to be per-NIC isarithmic. > >> > >>The "packet conservation principle" is single-protocol isarithmic. > >> > >>Joe > >> > >>--=20 > >>---------------------------------------- > >>Joe Touch > >>Sr. Network Engineer, USAF TSAT Space Segment > >> > >> > >> > >> > >>--------------enig4E3DED50FCEAFA2E99B7EC8E > >>Content-Type: application/pgp-signature; name="signature.asc" > >>Content-Description: OpenPGP digital signature > >>Content-Disposition: attachment; filename="signature.asc" > >> > >>-----BEGIN PGP SIGNATURE----- > >>Version: GnuPG v1.4.3 (MingW32) > >>Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > >> > >>iD8DBQFFcLrmE5f5cImnZrsRAvyMAKDE4E/HA6a3dE6V6tIQ/EQosrFd7wCg7nMZ > >>l++51oFUPmw9COfh/Sz9tdA= > >>=Sdoq > >>-----END PGP SIGNATURE----- > >> > >>--------------enig4E3DED50FCEAFA2E99B7EC8E-- > >> > > cheers > > jon > > > From L.Wood at surrey.ac.uk Sat Dec 2 18:53:19 2006 From: L.Wood at surrey.ac.uk (L.Wood@surrey.ac.uk) Date: Sun, 3 Dec 2006 02:53:19 -0000 Subject: [e2e] trading acks...TRACKS Message-ID: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> If a packet can't enter the network until one has left, how do you ever get started in an empty totally quiet network? 
Simple reductio ad absurdum suggests that the packet conservation principle as expressed below is bogus. Not so much isarithmic, as isacrock. However, packet conservation through a router is something that can be aspired to, under limited conditions - thinking about a networking analogue of Kirchoff's electrical laws through the router as a point can actually be useful, too. L. odd to see someone actually mention they're working on TSAT... -----Original Message----- From: end2end-interest-bounces at postel.org on behalf of Joe Touch Sent: Fri 2006-12-01 23:29 To: Christian Kreibich Cc: Jon Crowcroft; end2end-interest at postel.org Subject: Re: [e2e] trading acks...TRACKS Christian Kreibich wrote: > Hi Detlef, > > On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >> just a very spontaneous, perhaps stupid, question: What is the >> difference between "packet symmetry" and the well known principle of >> packet conservation here? Aren?t these ideas at least quite similar? > > the packet conservation principle states that in a steady-state TCP > flow, a new packet is not to enter the network before another one has > left That "packet conservation principle" already has a (perhaps not as well-known, but certainly worth knowing) name: 'isarithmic', and was proposed by Davies in 1972. "Packet symmetry" appears to be per-NIC isarithmic. The "packet conservation principle" is single-protocol isarithmic. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061203/b9b0feee/attachment.html From touch at ISI.EDU Sat Dec 2 19:07:12 2006 From: touch at ISI.EDU (Joe Touch) Date: Sat, 02 Dec 2006 19:07:12 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> Message-ID: <45723F60.5020903@isi.edu> L.Wood at surrey.ac.uk wrote: > If a packet can't enter the network until one has left, how do > you ever get started in an empty totally quiet network? Simple > reductio ad absurdum suggests that the packet conservation > principle as expressed below is bogus. Not so much isarithmic, > as isacrock. I didn't say it was a great idea; just that it had a name. ;-) But you can bootstrap such a situation; token rings do it all the time. > However, packet conservation through a router is something that > can be aspired to, under limited conditions - thinking about > a networking analogue of Kirchoff's electrical laws through the > router as a point can actually be useful, too. I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense to create/destroy electrons without a source/sink; the same is not true for packets. > L. > > odd to see someone actually mention they're working on TSAT... > > > > > > -----Original Message----- > From: end2end-interest-bounces at postel.org on behalf of Joe Touch > Sent: Fri 2006-12-01 23:29 > To: Christian Kreibich > Cc: Jon Crowcroft; end2end-interest at postel.org > Subject: Re: [e2e] trading acks...TRACKS > > Christian Kreibich wrote: >> Hi Detlef, >> >> On Sat, 2006-11-25 at 17:49 +0100, Detlef Bosau wrote: >>> just a very spontaneous, perhaps stupid, question: What is the >>> difference between "packet symmetry" and the well known principle of >>> packet conservation here? Aren?t these ideas at least quite similar? 
>> >> the packet conservation principle states that in a steady-state TCP >> flow, a new packet is not to enter the network before another one has >> left > > That "packet conservation principle" already has a (perhaps not as > well-known, but certainly worth knowing) name: 'isarithmic', and was > proposed by Davies in 1972. > > "Packet symmetry" appears to be per-NIC isarithmic. > > The "packet conservation principle" is single-protocol isarithmic. > > Joe > > -- > ---------------------------------------- > Joe Touch > Sr. Network Engineer, USAF TSAT Space Segment > > > > -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061202/d66fcc47/signature.bin From m.musolesi at cs.ucl.ac.uk Sun Dec 3 01:47:18 2006 From: m.musolesi at cs.ucl.ac.uk (Mirco Musolesi) Date: Sun, 03 Dec 2006 09:47:18 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45723F60.5020903@isi.edu> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> Message-ID: <45729D26.4010505@cs.ucl.ac.uk> >> However, packet conservation through a router is something that >> can be aspired to, under limited conditions - thinking about >> a networking analogue of Kirchoff's electrical laws through the >> router as a point can actually be useful, too. > > I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense > to create/destroy electrons without a source/sink; the same is not true > for packets. You may think to a representation of the network with connections to the "ground" or a "voltage source" for each router to represent/quantify packets that are created/lost in it. Mirco -- Mirco Musolesi Dept. of Computer Science, University College London Gower Street London WC1E 6BT United Kingdom Phone: +44 20 7679 0391 Fax: +44 20 7387 1397 Web: http://www.cs.ucl.ac.uk/staff/m.musolesi From touch at ISI.EDU Sun Dec 3 11:45:27 2006 From: touch at ISI.EDU (Joe Touch) Date: Sun, 03 Dec 2006 11:45:27 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45729D26.4010505@cs.ucl.ac.uk> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> Message-ID: <45732957.1050908@isi.edu> Mirco Musolesi wrote: > >>> However, packet conservation through a router is something that >>> can be aspired to, under limited conditions - thinking about >>> a networking analogue of Kirchoff's electrical laws through the >>> router as a point can actually be useful, too. >> >> I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense >> to create/destroy electrons without a source/sink; the same is not true >> for packets. > > You may think to a representation of the network with connections to the > "ground" or a "voltage source" for each router to represent/quantify > packets that are created/lost in it. Right - but then Kirchoff's laws don't apply unless that connection to ground has some impedence (otherwise the entire net is grounded). What's the impedence of a router? :-) I.e., it's dynamic (which is OK) and content-sensitive (which seems hard to model). Joe -- ---------------------------------------- Joe Touch Sr. 
Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061203/f01802a6/signature.bin From L.Wood at surrey.ac.uk Sun Dec 3 14:10:02 2006 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Sun, 03 Dec 2006 22:10:02 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45732957.1050908@isi.edu> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> <45732957.1050908@isi.edu> Message-ID: <200612032210.WAA06009@cisco.com> At Sunday 03/12/2006 11:45 -0800, Joe Touch wrote: >Mirco Musolesi wrote: >> >>>> However, packet conservation through a router is something that >>>> can be aspired to, under limited conditions - thinking about >>>> a networking analogue of Kirchoff's electrical laws through the >>>> router as a point can actually be useful, too. >>> >>> I'm not sure Kirchoff's laws are applicable here. It wouldn't make sense >>> to create/destroy electrons without a source/sink; the same is not true >>> for packets. >> >> You may think to a representation of the network with connections to the >> "ground" or a "voltage source" for each router to represent/quantify >> packets that are created/lost in it. > >Right - but then Kirchoff's laws don't apply unless that connection to >ground has some impedence resistance! (The impedance has to have a real part, otherwise everything's at ground.) >(otherwise the entire net is grounded). What's >the impedence of a router? :-) I.e., it's dynamic (which is OK) and >content-sensitive (which seems hard to model). The analogy would be that the router's input and output impedances are frequency-sensitive, and the content's sent at different frequencies (ports/addresses/QoS/whatever). Talking of "impedance mismatches" between fat and thin pipes is quite common. L. From avg at kotovnik.com Sun Dec 3 14:32:18 2006 From: avg at kotovnik.com (Vadim Antonov) Date: Sun, 3 Dec 2006 14:32:18 -0800 (PST) Subject: [e2e] trading acks...TRACKS In-Reply-To: <200612032210.WAA06009@cisco.com> Message-ID: > >>>> However, packet conservation through a router is something that > >>>> can be aspired to, under limited conditions - thinking about > >>>> a networking analogue of Kirchoff's electrical laws through the > >>>> router as a point can actually be useful, too. This is all nice and dandy but don't forget that the network has N kinds of "electrons", which are attracted to N end-points independently of each other. The reason why Kirchoff's laws work for steady-state electrical networks (or networks of pipes carrying gasses) is that all particles are the same, and repel each other. Packets can be made to "repel" each other, but making them drift in any direction where the "pressure" (or potential) is lower makes the network useless - as it's whole purpose to move the packets to the destinations, and not to the places where there's no congestion. 
--vadim From christian.kreibich at cl.cam.ac.uk Sun Dec 3 15:21:02 2006 From: christian.kreibich at cl.cam.ac.uk (Christian Kreibich) Date: Sun, 03 Dec 2006 15:21:02 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: References: Message-ID: <1165188062.16726.39.camel@strangepork> On Sun, 2006-12-03 at 14:32 -0800, Vadim Antonov wrote: > Packets can be made to "repel" each other, but making them drift in any > direction where the "pressure" (or potential) is lower makes the network > useless - as it's whole purpose to move the packets to the destinations, > and not to the places where there's no congestion. This is converging on field theory again... http://www.cl.cam.ac.uk/~jac22/talks/fields.pdf -- Cheers, Christian. From L.Wood at surrey.ac.uk Sun Dec 3 16:47:39 2006 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Mon, 04 Dec 2006 00:47:39 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: References: <200612032210.WAA06009@cisco.com> Message-ID: <200612040048.AAA12128@cisco.com> At Sunday 03/12/2006 14:32 -0800, Vadim Antonov wrote: >Packets can be made to "repel" each other, that's the norm (packets occupying discrete space/time), unless you use network coding to 'attract' and 'entangle' packets. L. From m.musolesi at cs.ucl.ac.uk Sun Dec 3 17:59:33 2006 From: m.musolesi at cs.ucl.ac.uk (Mirco Musolesi) Date: Mon, 04 Dec 2006 01:59:33 +0000 Subject: [e2e] trading acks...TRACKS In-Reply-To: <45732957.1050908@isi.edu> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> <45732957.1050908@isi.edu> Message-ID: <45738105.3040803@cs.ucl.ac.uk> >> You may think to a representation of the network with connections to the >> "ground" or a "voltage source" for each router to represent/quantify >> packets that are created/lost in it. > > Right - but then Kirchoff's laws don't apply unless that connection to > ground has some impedence (otherwise the entire net is grounded). What's > the impedence of a router? :-) I.e., it's dynamic (which is OK) and > content-sensitive (which seems hard to model). Yes, I agree, I was implicitly assuming that you have some resistence to model the packet loss/creation (that can change over time). You may also think to model a router with a sort of building block composed of more complex circuitry to deal with different types of traffic... Mirco -- Mirco Musolesi Dept. of Computer Science, University College London Gower Street London WC1E 6BT United Kingdom Phone: +44 20 7679 0391 Fax: +44 20 7387 1397 Web: http://www.cs.ucl.ac.uk/staff/m.musolesi From touch at ISI.EDU Sun Dec 3 19:07:29 2006 From: touch at ISI.EDU (Joe Touch) Date: Sun, 03 Dec 2006 19:07:29 -0800 Subject: [e2e] trading acks...TRACKS In-Reply-To: <200612032210.WAA06009@cisco.com> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> <45723F60.5020903@isi.edu> <45729D26.4010505@cs.ucl.ac.uk> <45732957.1050908@isi.edu> <200612032210.WAA06009@cisco.com> Message-ID: <457390F1.1080608@isi.edu> Lloyd Wood wrote: > At Sunday 03/12/2006 11:45 -0800, Joe Touch wrote: >> Mirco Musolesi wrote: >>>>> However, packet conservation through a router is something that >>>>> can be aspired to, under limited conditions - thinking about >>>>> a networking analogue of Kirchoff's electrical laws through the >>>>> router as a point can actually be useful, too. >>>> I'm not sure Kirchoff's laws are applicable here. 
It wouldn't make sense >>>> to create/destroy electrons without a source/sink; the same is not true >>>> for packets. >>> You may think to a representation of the network with connections to the >>> "ground" or a "voltage source" for each router to represent/quantify >>> packets that are created/lost in it. >> Right - but then Kirchoff's laws don't apply unless that connection to >> ground has some impedence > > resistance! (The impedance has to have a real part, otherwise everything's at ground.) Kirchoff's laws work for impedance too ;-) >> (otherwise the entire net is grounded). What's >> the impedence of a router? :-) I.e., it's dynamic (which is OK) and >> content-sensitive (which seems hard to model). > > The analogy would be that the router's input and output impedances are frequency-sensitive, and the content's sent at different frequencies (ports/addresses/QoS/whatever). > > Talking of "impedance mismatches" between fat and thin pipes is quite common. Agreed, which is why impedance is what I thought of. Resistive Kirchoff's nets don't make sense here, IMO. Joe -- ---------------------------------------- Joe Touch Sr. Network Engineer, USAF TSAT Space Segment -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 250 bytes Desc: OpenPGP digital signature Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061203/44047b3a/signature.bin From detlef.bosau at web.de Mon Dec 4 05:39:52 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Mon, 04 Dec 2006 14:39:52 +0100 Subject: [e2e] trading acks...TRACKS In-Reply-To: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> References: <603BF90EB2E7EB46BF8C226539DFC20701316A8D@EVS-EC1-NODE1.surrey.ac.uk> Message-ID: <45742528.8030007@web.de> O.k., let?s carry cowls to Newcastle :-) (I know, I better should arrange for a trip to the north pole for the next six weeks after sending this post because it?s so stupid.) (BTW: Kind regards from "Rockin? Rudy", the little McDonald?s reindeer, I bought some years ago.) L.Wood at surrey.ac.uk wrote: > > If a packet can't enter the network until one has left, how do > you ever get started in an empty totally quiet network? Simple > reductio ad absurdum suggests that the packet conservation > principle as expressed below is bogus. Not so much isarithmic, > as isacrock. > I personally compare this whole thing to energy as it is kept in a dynamic system. So, the conservation principle basically means nothing else then the energy in this system should be kept constant. So, you have two issues here: 1. Keep the amount of engery constant => don?t add energy to the system before the system has completed some work, i.e. energy has left the system. 2. The question is: How much energy can a system keep? The second issue is addressed by a) probing which yields b) an estimator for a path?s capacity, i.e. CWND. So, you don?t have a "strong" isarithmic system: You can add workload (=emergy) as long as it can be kept and is not dropped by some router. However, it?s a problem to have exact system theoretical model of the Internet or even a single TCP connection. And I even don?t really know where this should be good for. Perhaps for some interesting calculus calisthenics which are interesting for some papers or even some PhD theses. But at least the models I know of are far too much away from a real packet switching network to be really useful. 
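The two points above - release new "energy" into the system only as old energy leaves, and let probing produce an estimate (CWND) of how much the system can hold - can be captured in a toy model. The following sketch is purely illustrative: it simulates nothing real, the path capacity and round count are invented, and it stands in for neither TCP nor Davies' isarithmic scheme; it only shows the conservation-plus-AIMD-probing behaviour being discussed.

# Toy model of the behaviour described above: a sender that keeps at most
# cwnd packets of "energy" in flight per round and probes the path's
# capacity with AIMD. PATH_CAPACITY and ROUNDS are invented for the example.

PATH_CAPACITY = 20          # packets the path can hold before it drops (assumed)
ROUNDS = 30                 # number of round-trip times to simulate

def aimd_rounds(capacity=PATH_CAPACITY, rounds=ROUNDS):
    cwnd = 1.0              # current estimate of how much the path can hold
    history = []
    for rtt in range(rounds):
        in_flight = int(cwnd)          # conservation: never more than cwnd outstanding
        lost = in_flight > capacity    # the "energy" exceeded what the system can keep
        if lost:
            cwnd = max(1.0, cwnd / 2.0)  # multiplicative decrease: shed energy
        else:
            cwnd += 1.0                  # additive increase: keep probing upward
        history.append((rtt, in_flight, lost, cwnd))
    return history

if __name__ == "__main__":
    for rtt, in_flight, lost, cwnd in aimd_rounds():
        print("rtt=%2d in_flight=%2d loss=%s next_cwnd=%.1f" % (rtt, in_flight, lost, cwnd))

The printed sawtooth is the familiar picture: the window climbs until the capacity estimate turns out too large, is halved, and climbs again - which is all the Newton's cradle image in the next paragraph amounts to.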
In my opinion, the most basic reasons for the Internet to work acceptably are in fact 1. the conservation principle, which ensures that the workload in the net is not increased in an "unreasonable" way, but 2. there is some reasonable probing (basically the AIMD probing) and particularly, if the path's capacity estimation turns out to be too large, it is decreased - and everything is fine. So, the Internet works fine with no real congestion collapse. (The most prominent oscillating system suffering from some special case of congestion collapse is perhaps the Tacoma bridge disaster http://www.ketchum.org/bridgecollapse.html) O.k., it's not much wisdom in what I write here. Mainly, I doubt these extremely sophisticated models. Personally, I think mostly of the Tacoma bridge - which would still be there if only someone had limited the energy ;-) - and Newton's cradle when I try to understand stability issues in the Internet. The latter is particularly descriptive as the number of balls visualizes the workload. You can imagine adding a ball as long as there is room in the cradle or taking away a ball, it's fun :-) Detlef From anil at cmmacs.ernet.in Tue Dec 12 20:32:07 2006 From: anil at cmmacs.ernet.in (V Anil Kumar) Date: Wed, 13 Dec 2006 10:02:07 +0530 (IST) Subject: [e2e] Extracting No. of packets or bytes in a router buffer Message-ID: We are searching for any known techniques to continuously sample (say at every 100 msec interval) the buffer occupancy of router interfaces. The requirement is to extract or estimate the instantaneous value of the number of packets or bytes in the router buffer from another machine in the network, and not the maximum possible router buffer size. Any suggestion, advice or pointer to literature on this? Thanks in advance. Anil From craig at aland.bbn.com Wed Dec 13 10:54:41 2006 From: craig at aland.bbn.com (Craig Partridge) Date: Wed, 13 Dec 2006 13:54:41 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Your message of "Wed, 13 Dec 2006 10:02:07 +0530." Message-ID: <20061213185441.34AFF64@aland.bbn.com> Queue sizes are standard SNMP variables and thus could be sampled at these intervals. But it looks as if you want the queues on a per host basis? Craig In message , V Anil Kumar writes: > >We are searching for any known techniques to continuously sample (say at >every 100 msec interval) the buffer occupancy of router interfaces. The >requirement is to extract or estimate the instantaneous value of the >number of packets or bytes in the router buffer from another machine in >the network, and not the maximum possible router buffer size. > >Any suggestion, advice or pointer to literature on this? > >Thanks in advance. > >Anil From detlef.bosau at web.de Wed Dec 13 11:50:21 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 13 Dec 2006 20:50:21 +0100 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213185441.34AFF64@aland.bbn.com> References: <20061213185441.34AFF64@aland.bbn.com> Message-ID: <4580597D.7060901@web.de> Craig Partridge wrote: > Queue sizes are standard SNMP variables and thus could be sampled at > Hm. Do I get sampled SNMP variables in a timely fashion? Perhaps the poor samples shall undergo some nasty treatment (a wavelet transformation or something similar), and I'm not quite sure whether SNMP queries might cause too large a jitter in the sampled time series?
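Craig's suggestion (the queue length is an SNMP variable) and Detlef's worry about sampling jitter can both be tried out with a few lines of script. The sketch below is only an outline: it shells out to the net-snmp snmpget command-line tool, the router address, community string and interface index are placeholders, and - as the rest of this thread makes clear - what the returned gauge actually means depends heavily on the platform. Each sample is timestamped locally so the poller's own jitter can be inspected afterwards.

# Sketch: sample ifOutQLen (ifEntry 21, quoted later in this thread) every
# 100 ms and timestamp each sample locally, so both the gauge value and the
# polling jitter can be examined afterwards.
# ROUTER, COMMUNITY and IF_INDEX are placeholders; assumes the net-snmp
# "snmpget" tool is installed on the measurement host.

import subprocess
import time

ROUTER = "192.0.2.1"        # hypothetical router address
COMMUNITY = "public"        # hypothetical read community
IF_INDEX = 2                # hypothetical interface index
OID = "1.3.6.1.2.1.2.2.1.21.%d" % IF_INDEX   # ifOutQLen.<ifIndex>
INTERVAL = 0.1              # target sampling period: 100 ms

def sample_once():
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", COMMUNITY, "-Ovq", ROUTER, OID],
        capture_output=True, text=True, timeout=1.0)
    return out.stdout.strip()

if __name__ == "__main__":
    next_due = time.time()
    for _ in range(100):
        t0 = time.time()
        value = sample_once()
        t1 = time.time()
        # (t1 - t0) is the per-query latency; the drift of t0 from next_due
        # is the scheduling jitter of the poller itself
        print("%.6f qlen=%s query_latency=%.1f ms" % (t0, value, (t1 - t0) * 1e3))
        next_due += INTERVAL
        time.sleep(max(0.0, next_due - time.time()))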
Detlef From crovella at cs.bu.edu Wed Dec 13 12:06:09 2006 From: crovella at cs.bu.edu (Mark Crovella) Date: Wed, 13 Dec 2006 15:06:09 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213185441.34AFF64@aland.bbn.com> Message-ID: <0511C607B17F804EBE96FFECD1FD9859F608CC@cs-exs2.cs-nt.bu.edu> Hi Craig, What MIB provides queue sizes? I am not sure that 'standard' is the right word when talking about MIBs :). Maybe some MIBs provide queue sizes but the one most commonly used in backbone routers (MIB-II, RFC 1213) doesn't provide instantaneous queue lengths as far as I know. If I am wrong, please correct me. People have used measures of queuing delay to infer queue lengths. I can't think of a paper that focuses on this, but the general idea is that queue length in bytes has a relationship to queueing delay and link bandwidth (factoring in other sources of delays inside routers). On a sort-of related topic we had a paper a while ago that tried to estimate queue lengths during packet loss events (ie, to estimate buffer sizes or RED parameters). It is Jun Liu, Mark E. Crovella (2001). Using Loss Pairs to Discover Network Properties. In: Proceedings of the ACM SIGCOMM Internet Measurement Workshop 2001. pp. 127--138. http://www.cs.bu.edu/faculty/crovella/paper-archive/imw-losspairs.pdf - Mark > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org] On Behalf Of > Craig Partridge > Sent: Wednesday, December 13, 2006 1:55 PM > To: V Anil Kumar > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Extracting No. of packets or bytes in a > router buffer > > > Queue sizes are standard SNMP variables and thus could be > sampled at these intervals. But it looks as if you want the > queues on a per host basis? > > Craig > > In message > >, V A nil Kumar writes: > > > > >We are searching for any known techniques to continuously > sample (say > >at every 100 msec interval) the buffer occupancy of router > interfaces. > >The requirement is to extract or estimate the instantaneous value of > >the number of packets or bytes in the router buffer from another > >machine in the network, and not the maximum possible router > buffer size. > > > >Any suggestion, advice or pointer to literature on this? > > > >Thanks in advance. > > > >Anil > > From craig at aland.bbn.com Wed Dec 13 12:38:56 2006 From: craig at aland.bbn.com (Craig Partridge) Date: Wed, 13 Dec 2006 15:38:56 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Your message of "Wed, 13 Dec 2006 15:06:09 EST." <0511C607B17F804EBE96FFECD1FD9859F608CC@cs-exs2.cs-nt.bu.edu> Message-ID: <20061213203856.2941664@aland.bbn.com> In message <0511C607B17F804EBE96FFECD1FD9859F608CC at cs-exs2.cs-nt.bu.edu>, "Mark >What MIB provides queue sizes? I am not sure that 'standard' is the >right word when talking about MIBs :). Maybe some MIBs provide queue >sizes but the one most commonly used in backbone routers (MIB-II, RFC >1213) doesn't provide instantaneous queue lengths as far as I know. If >I am wrong, please correct me. Hi Mark: >From MIB-II, p. 23: ifOutQLen OBJECT-TYPE SYNTAX Gauge ACCESS read-only STATUS mandatory DESCRIPTION "The length of the output packet queue (in packets)." ::= { ifEntry 21 } If I remember correctly, we put it there in MIB-I. Note this is per output interface. [chair, IETF MIB-I Working Group... 
not something I advertise widely] Craig From jon.kare.hellan at uninett.no Thu Dec 14 04:35:52 2006 From: jon.kare.hellan at uninett.no (Jon K Hellan) Date: Thu, 14 Dec 2006 13:35:52 +0100 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213203856.2941664@aland.bbn.com> References: <20061213203856.2941664@aland.bbn.com> Message-ID: <45814528.4070901@uninett.no> Craig Partridge wrote: > ifOutQLen OBJECT-TYPE > SYNTAX Gauge > ACCESS read-only > STATUS mandatory > DESCRIPTION > "The length of the output packet queue (in > packets)." > ::= { ifEntry 21 } > > If I remember correctly, we put it there in MIB-I. Note this is per > output interface. The problem is scheduling of snmp polling in the router. It is not at all unlikely that the router will defer this task until it doesn't have anything more important to do, like forwarding packets! If so, the counters are going to report empty queues most of the time. Jon K?re Hellan From Olav.Kvittem at uninett.no Thu Dec 14 05:45:19 2006 From: Olav.Kvittem at uninett.no (Olav.Kvittem@uninett.no) Date: Thu, 14 Dec 2006 14:45:19 +0100 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Message from V Anil Kumar of "Wed, 13 Dec 2006 10:02:07 +0530." Message-ID: <20061214134519.525A387C6A@tyholt.uninett.no> anil at cmmacs.ernet.in said: > We are searching for any known techniques to continuously sample (say at > every 100 msec interval) the buffer occupancy of router interfaces. We did make a tool that could poll that fast and do accurate timestamps a few years ago and discovered that the routers did not update theirs MIB's that often. Some platforms would not copy statistics from the interface cards more often than a few (5) seconds. Single processor architectures though seemed to have subsecond resolution. >The > requirement is to extract or estimate the instantaneous value of the number > of packets or bytes in the router buffer from another machine in the network, > and not the maximum possible router buffer size. > Any suggestion, advice or pointer to literature on this? did not publish, but the perl-scripts are intact. cheers Olav From jsommers at cs.wisc.edu Thu Dec 14 06:36:02 2006 From: jsommers at cs.wisc.edu (Joel Sommers) Date: Thu, 14 Dec 2006 08:36:02 -0600 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061214134519.525A387C6A@tyholt.uninett.no> References: <20061214134519.525A387C6A@tyholt.uninett.no> Message-ID: <4C50B963-502E-4379-8790-20F261717B18@cs.wisc.edu> > anil at cmmacs.ernet.in said: >> We are searching for any known techniques to continuously sample >> (say at >> every 100 msec interval) the buffer occupancy of router interfaces. > > We did make a tool that could poll that fast and do accurate > timestamps > a few years ago and discovered that the routers > did not update theirs MIB's that often. Some platforms would not copy > statistics from the interface cards more often than a few (5) > seconds. Single processor architectures though seemed to have > subsecond > resolution. For example, with a Cisco GSR the update interval (line card to router processor) is about 10 seconds (at least for line cards I've measured). For the measurements we did for "Sizing Router Buffers" (Appenzeller, et al., SIGCOMM 2004), we read the queue length values directly from the line card (i.e., we opened an IOS session on the line card itself and periodically polled the counter). 
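Since, as noted above, several platforms only refresh these counters every few seconds, Mark Crovella's earlier remark - that queue length in bytes relates to queueing delay and link bandwidth - is often the more practical route from outside the box: subtract the minimum observed delay through the hop from each delay sample and multiply the excess by the link rate. A minimal sketch, assuming you already have a stream of delay samples and know the bottleneck rate (both are placeholder inputs here):

# Sketch of inferring queue backlog from delay samples, per the
# "queue length ~ queueing delay x link bandwidth" observation above.
# delay_samples_s and LINK_RATE_BPS are assumed inputs, not measured here.

LINK_RATE_BPS = 155_000_000      # assumed bottleneck rate, e.g. an OC-3

def backlog_from_delays(delay_samples_s, link_rate_bps=LINK_RATE_BPS):
    """delay_samples_s: list of (timestamp, delay_in_seconds) through the hop."""
    base = min(d for _, d in delay_samples_s)    # propagation + service floor
    estimates = []
    for ts, d in delay_samples_s:
        queueing = max(0.0, d - base)            # the variable part is queueing delay
        backlog_bytes = queueing * link_rate_bps / 8.0
        estimates.append((ts, backlog_bytes))
    return estimates

if __name__ == "__main__":
    fake = [(0.0, 0.0102), (0.1, 0.0131), (0.2, 0.0165), (0.3, 0.0109)]
    for ts, b in backlog_from_delays(fake):
        print("t=%.1f backlog ~ %.0f bytes" % (ts, b))

The weak points are the usual ones: the minimum delay is only an estimate of the empty-queue baseline, and ICMP-based delay probes are de-prioritised on many routers, a point raised again at the end of this thread.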
Joel -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2380 bytes Desc: not available Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061214/24c22e4a/smime.bin From fred at cisco.com Wed Dec 13 12:16:41 2006 From: fred at cisco.com (Fred Baker) Date: Wed, 13 Dec 2006 12:16:41 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <20061213185441.34AFF64@aland.bbn.com> References: <20061213185441.34AFF64@aland.bbn.com> Message-ID: <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> You're talking about ifOutQLen. It was originally proposed in RFC 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 1994). The reason it was deprecated is not documented, but the fundamental issue is that it is non-trivial to calculate and is very ephemeral. The big issue in calculating it is that it is rarely exactly one queue. Consider a simple case on simple hardware available in 1994. +----------+ | | | | | CPU +-+ | | | +----------+ | BUS | +----------+ | +---------+ | | +-+ LANCE | | | | +---------+ | DRAM +-+ | | | +---------+ | | +-+ LANCE | +----------+ | +---------+ I'm using the term "bus" in the most general possible sense - some way for the various devices to get to the common memory. This gets implemented many ways. The AMD 7990 LANCE chip was and is a common Ethernet implementation. It has in front of it a ring in which one can describe up to 2^N messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea at any given time how many messages are waiting - it only knows whether it is working on one right now or is idle, and when switching from message to message it knows whether the next slot it considers contains a message. So it can't keep such a counter. The device driver similarly has a limited view; it might know how many it has put in and how many it has taken out again, but it doesn't know whether the LANCE has perhaps completed some of the messages it hasn't taken out yet. So in the sense of the definition ("The length of the output packet queue (in packets)."), it doesn't know how many are still waiting. In addition, it is common for such queues or rings to be configured pretty small, with excess going into a diffserv- described set of software queues. There are far more general problems. Cisco has a fast forwarding technology that we use on some of our midrange products that calculates when messages should be sent and schedules them in a common calendar queue. Every mumble time units, the traffic that should be sent during THIS time interval are picked up and dispersed to the various interfaces they need to go out. Hence, there isn't a single "output queue", but rather a commingled output schedule that shifts traffic to other output queues at various times - which in turn do something akin to what I described above. Also, in modern equipment one often has forwarders and drivers on NIC cards rather than having some central processor do that. For management purposes, the drivers maintain their counts locally and periodically (perhaps once a second) upload the contents of those counters to a place where management can see them. So when you ask "what is the current queue depth", I have to ask what the hardware has, what of that has already been spent but isn't cleaned up yet, what is in how many software queues, how they are organized, and whether that number has been put somewhere that management can see it. 
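Fred's LANCE example can be made concrete with a toy model of the driver's bookkeeping: the driver knows how many descriptors it has handed to the chip and how many it has reclaimed, but not how many of the outstanding ones the hardware has already transmitted, so any "queue length" it reports is at best an upper bound sampled at cleanup time. The class below is an illustration only - invented names, not driver code for any real NIC.

# Toy model of the descriptor-ring bookkeeping described above: the driver
# tracks what it produced and what it reclaimed; the NIC's real progress is
# invisible between cleanups, so the reported "queue length" is only a bound.

RING_SIZE = 128              # 2^N descriptors, as in the LANCE description

class ToyTxRing:
    def __init__(self, size=RING_SIZE):
        self.size = size
        self.produced = 0    # descriptors handed to the NIC
        self.reclaimed = 0   # descriptors the driver has cleaned up

    def enqueue(self):
        if self.produced - self.reclaimed >= self.size:
            return False     # ring full: excess spills into software queues
        self.produced += 1
        return True

    def reclaim(self, completed_by_nic):
        # called from the transmit-done path; until then the driver has no
        # idea how many of the outstanding descriptors are actually sent
        self.reclaimed += completed_by_nic

    def apparent_backlog(self):
        # what a MIB-style gauge could report: an upper bound, not the truth
        return self.produced - self.reclaimed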
Oh - did I mention encrypt/decrypt units, compressors, and other inline services that might have their own queues associated with them? Yes, there is a definition on the books. I don't know that it answers the question. On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > Queue sizes are standard SNMP variables and thus could be sampled at > these intervals. But it looks as if you want the queues on a per host > basis? > > Craig > > In message 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > nil Kumar writes: > >> >> We are searching for any known techniques to continuously sample >> (say at >> every 100 msec interval) the buffer occupancy of router >> interfaces. The >> requirement is to extract or estimate the instantaneous value of the >> number of packets or bytes in the router buffer from another >> machine in >> the network, and not the maximum possible router buffer size. >> >> Any suggestion, advice or pointer to literature on this? >> >> Thanks in advance. >> >> Anil From algold at rnp.br Thu Dec 14 12:05:01 2006 From: algold at rnp.br (Alexandre Grojsgold) Date: Thu, 14 Dec 2006 18:05:01 -0200 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> References: <20061213185441.34AFF64@aland.bbn.com> <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> Message-ID: <007c01c71fbb$245b3110$6d119330$@br> > -----Original Message----- > From: end2end-interest-bounces at postel.org [mailto:end2end-interest- > bounces at postel.org] On Behalf Of Fred Baker > Sent: quarta-feira, 13 de dezembro de 2006 18:17 > To: Craig Partridge > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > You're talking about ifOutQLen. It was originally proposed in RFC > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > 1994). The reason it was deprecated is not documented, but the > fundamental issue is that it is non-trivial to calculate and is very > ephemeral. > Sorry, but... I really did not understand. Of course, it?s ephemeral. The link occupancy is also an ephemeral information. During the sending of a packet, it?s 100% busy. Between two packets, it?s utterly idle. It doesn?t mean it's not possible to get some statistics out of the link usage. Like mean byte rate, 5 minute mean byte rate, variance, and so on. The same way it should be possible to get queue statistics out of each outgoing interface in the router. I am really impressed to know it is so difficult to grab this kind of information, since router manufacturers claim they can do magic with queue managing, like diffser, traffic shaping, priority queueing, etc... all of this looking at the queues and making tricks with them. It?s amazing. -- Alexandre. From fred at cisco.com Thu Dec 14 13:26:27 2006 From: fred at cisco.com (Fred Baker) Date: Thu, 14 Dec 2006 13:26:27 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <007c01c71fbb$245b3110$6d119330$@br> References: <20061213185441.34AFF64@aland.bbn.com> <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> <007c01c71fbb$245b3110$6d119330$@br> Message-ID: On Dec 14, 2006, at 12:05 PM, Alexandre Grojsgold wrote: > Of course, it?s ephemeral. I point that out because it is a fundamental criterion that the SNMP community has been using for a while. 
It is one thing to enable an NMS to read the configuration of a device (largely static) or read a counter (monotonically increasing, so that subsequent reads tell you what happened between the reads). ifOutQLen is a gauge, which is to say that it looks a lot like a random number in this context. In such a case, the SNMP community will generally suggest that the number is not all that meaningful. > I am really impressed to know it is so difficult to grab this kind > of information, since router manufacturers claim they can do magic > with queue managing, like diffser, traffic shaping, priority > queueing, etc... all of this looking at the queues and making > tricks with them. Do I detect a note of sarcasm? The point is what is known by whom at a particular time. A bit of code looking at a choice of queuing something locally or handing it to the next widget makes a pretty simple determination - when it tries to hand the datagram off the next widget accepts it or not, and if not, it does the local thing. "accepts" can have various meanings - it may actively reject it, or (more probably) has given permission to send some quantum and the quantum is used up. Looking at individual queues, one can do a lot of things such as you mention. The hard part is in a distributed system (a system that has functionality on a variety of cards managed by a variety of communicating processes) to have a single overall view of the entire state of the process at exactly the time one wants to find the answer to the overall question. From denio at gprt.ufpe.br Fri Dec 15 05:42:40 2006 From: denio at gprt.ufpe.br (Denio Mariz) Date: Fri, 15 Dec 2006 11:42:40 -0200 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <007c01c71fbb$245b3110$6d119330$@br> References: <20061213185441.34AFF64@aland.bbn.com> <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> <007c01c71fbb$245b3110$6d119330$@br> Message-ID: > > I am really impressed to know it is so difficult to grab this kind of > information, since router manufacturers claim they can do magic with queue > managing, like diffser, traffic shaping, priority queueing, etc... all of > this looking at the queues and making tricks with them. > I'm supposing that all these magic are done through accessing the information directly, internally in the router. So, I think the real issue here is how frequently the counters are updated into the MIB for external access. Denio. From limkt_2 at hotmail.com Fri Dec 15 16:12:33 2006 From: limkt_2 at hotmail.com (Lim Kong Teong) Date: Sat, 16 Dec 2006 00:12:33 +0000 Subject: [e2e] FLID_DL Simulation In-Reply-To: Message-ID: Hi, I conduct experiments on Flid-Dl, which I got from Digital Fountain. We set the experiment as below: 1 Flid-DL session compete with 4 TCP sessions for 2.5Mb bottleneck link. We use dumb bell topology. The experiments run smoothly, however I got some peculiar results. 1) When I check trace file, I find out Flid-Dl receiver receive empty packet with zero packet size as below: r 3.739743 1 3 fliddl 0 ------- 0 2.0 -2147483608.8888 -1 6094 I make one modification as below, but still get the same result. Change: Packet* p = allocpkt(); to Packet* p = allocpkt(packet_payload_); 2) I calculate the packet received by the 4 TCP receivers using the trace file, and suprisingly the throughput suggest that TCP flows consumed all of the bottleneck bandwidth with total throughput 2.5 Mb. Then, I calculate the bottleneck link utilization, and got the link utilization of 2.5 Mb. 
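For reference, the kind of per-flow tally described above can be done with a short script over the trace file. The sketch below assumes the standard 12-column ns-2 trace format of the line quoted earlier (event, time, from, to, type, size, flags, fid, src, dst, seq, id); the file name, receive node and time window are placeholders. Note that packets recorded with size 0, like the fliddl line quoted above, add nothing to such a byte tally, which is worth keeping in mind when comparing against the sender's own statistics.

# Sketch: per-flow received throughput from an ns-2 trace, assuming the
# standard 12-column format of the line quoted above.
# TRACE_FILE, RECV_NODE and the measurement window are placeholders.

from collections import defaultdict

TRACE_FILE = "out.tr"        # hypothetical trace file name
RECV_NODE = "3"              # node id on the receiver side of the bottleneck
T_START, T_END = 1.0, 100.0  # measurement window in seconds

def per_flow_throughput(path=TRACE_FILE):
    bytes_per_fid = defaultdict(int)
    with open(path) as trace:
        for line in trace:
            f = line.split()
            if len(f) < 12 or f[0] != "r":
                continue
            t, to_node, size, fid = float(f[1]), f[3], int(f[5]), f[7]
            if to_node == RECV_NODE and T_START <= t <= T_END:
                bytes_per_fid[fid] += size
    span = T_END - T_START
    return {fid: 8.0 * b / span for fid, b in bytes_per_fid.items()}  # bits/s

if __name__ == "__main__":
    for fid, bps in sorted(per_flow_throughput().items()):
        print("flow %s: %.2f Mb/s" % (fid, bps / 1e6))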
However, the statistics from Flid suggest Flid session used approximately 0.5 Mb of bottleneck link. Any explanation or suggestion please! TQ. Lim _________________________________________________________________ Talk now to your Hotmail contacts with Windows Live Messenger. http://clk.atdmt.com/MSN/go/msnnkwme0020000001msn/direct/01/?href=http://get.live.com/messenger/overview From L.Wood at surrey.ac.uk Sat Dec 16 01:39:10 2006 From: L.Wood at surrey.ac.uk (Lloyd Wood) Date: Sat, 16 Dec 2006 09:39:10 +0000 Subject: [e2e] FLID_DL Simulation In-Reply-To: References: Message-ID: <200612160939.JAA03848@cisco.com> ask the ns-users list. At Saturday 16/12/2006 00:12 +0000, Lim Kong Teong wrote: >Hi, > >I conduct experiments on Flid-Dl, which I got from Digital >Fountain. We set the experiment as below: > >1 Flid-DL session compete with 4 TCP sessions for 2.5Mb >bottleneck link. We use dumb bell topology. > >The experiments run smoothly, however I got some >peculiar results. > >1) When I check trace file, I find out Flid-Dl receiver >receive empty packet with zero packet size as below: > >r 3.739743 1 3 fliddl 0 ------- 0 2.0 -2147483608.8888 -1 6094 > >I make one modification as below, but still get the same result. >Change: Packet* p = allocpkt(); to Packet* p = allocpkt(packet_payload_); > >2) I calculate the packet received by the 4 TCP receivers using the >trace file, and suprisingly the throughput suggest that TCP flows >consumed all of the bottleneck bandwidth with total throughput >2.5 Mb. Then, I calculate the bottleneck link utilization, and got >the link utilization of 2.5 Mb. >However, the statistics from Flid suggest Flid session used approximately >0.5 Mb of bottleneck link. > >Any explanation or suggestion please! > >TQ. > >Lim > >_________________________________________________________________ >Talk now to your Hotmail contacts with Windows Live Messenger. http://clk.atdmt.com/MSN/go/msnnkwme0020000001msn/direct/01/?href=http://get.live.com/messenger/overview > From cjs at cs.ucc.ie Wed Dec 20 01:15:30 2006 From: cjs at cs.ucc.ie (Cormac J. Sreenan) Date: Wed, 20 Dec 2006 09:15:30 +0000 Subject: [e2e] CFP: Workshop on Embedded Networked Sensors (EmNets'07) Message-ID: <4588FF32.20406@cs.ucc.ie> Our apologies if you receive multiple copies of the EmNets CFP. ------------------------------------------------------------------------ *********************************************************** CALL FOR PAPERS Fourth Workshop on Embedded Networked Sensors (EmNets 2007) Cork, Ireland 25-26 June 2007 www.cs.ucc.ie/emnets2007 *********************************************************** The Fourth Workshop on Embedded Networked Sensors (EmNets 2007) brings together wireless sensor network researchers from academic and industrial backgrounds to present groundbreaking results that will shed light on present and future research challenges. The workshop emphasises results from experiments or deployments that quantify the challenges in the wireless sensor systems of today as well as early results from new ideas that introduce promising approaches that will define the challenges in the wireless sensor systems of tomorrow. We especially welcome papers reporting on results that refute common assumptions, deployment experiences, novel and original approaches, and, more generally, papers that will help inform and guide research. 
The EmNets Program Committee discourages submissions that are short versions of papers that will be submitted to other conferences in the near future, since its goal is to engage the research community in a discussion of future challenges and issues. Topics of interest include, but are not limited to: Validation/refutation of prior results Application experiences: measurements, successes and failures Future applications: requirements and challenges Hardware platforms, tradeoffs, and trends Data and network storage Delay-tolerant networking Management, debugging, and troubleshooting Network and software reliability Network and system architectures Software bug detection and tools Energy sources, scavenging, and low-power operation Human-Computer interfaces for sensornets Benchmarks and evaluation suites All papers will be subject to peer review. Accepted papers will appear in a formal published proceedings. IMPORTANT DATES: Submission Deadline: March 9, 2007 (5 pages) Notification: April 30, 2007 Camera Ready Due: May 21, 2007 Workshop: June 25-6, 2007 ORGANIZATION: General Chair: Cormac J. Sreenan, University College Cork cjs at cs.ucc.ie Program Co-Chairs: Philip Levis, Stanford University pal at cs.stanford.edu Joe Paradiso, MIT joep at media.mit.edu Technical Program Committee: Jan Beutel, ETH Zurich Kieren Delaney, Cork Institute of Technology Terry Dishongh, Intel Corporation Henri Dubois-Ferriere, EPFL Deborah Estrin, UCLA David Gay, Intel Research Berkeley Michel Goraczo, Microsoft Research Margaret Martonosi, Princeton University Mike Masquelier, Motorola G.Q. Maguire Jr., KTH Sweden Paddy Nixon, University College Dublin Robert Poor, Adozu, Inc. Frank Schmidt, EnOcean John Regehr, University of Utah Frank Schmidt, EnOcean Randy Smith, Sun Microsystems Jack Stankovic, University of Virginia Robert Szewczyk, Moteiv Inc. Henry Tirri, Nokia Peter van der Stok, Philips, Eindhoven University of Technology Guang-Zhong Yang, Imperial College London Kazuo Yano, Hitachi ------------------------------------------------------------------ From lynne at telemuse.net Wed Dec 20 13:33:50 2006 From: lynne at telemuse.net (Lynne Jolitz) Date: Wed, 20 Dec 2006 13:33:50 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <41C5B1AE-E6FF-432A-8D79-1610C026FC50@cisco.com> Message-ID: <002901c7247e$89e4c920$6e8944c6@telemuse.net> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. 
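Lynne's point that the outbound counters are always visible even when buffer depth is not can be put to work directly: two samples of a monotonically increasing counter such as MIB-II ifOutOctets give the interface's output rate over the interval, with no queue visibility required. A minimal sketch; get_out_octets() is a placeholder for whatever poller is already in use, and the 32-bit counter wrap is the only subtlety handled:

# Sketch of the "rates are easy even when depth is not" observation above:
# difference two samples of a monotonic byte counter to get the output rate.
# get_out_octets() is a placeholder hook, not a real SNMP binding.

import time

def get_out_octets():
    """Placeholder: return the current ifOutOctets value for the interface."""
    raise NotImplementedError("hook up SNMP, NETCONF or driver stats here")

def output_rate_bps(interval_s=1.0, counter_bits=32):
    c0, t0 = get_out_octets(), time.time()
    time.sleep(interval_s)
    c1, t1 = get_out_octets(), time.time()
    delta = (c1 - c0) % (2 ** counter_bits)   # tolerate one counter wrap
    return 8.0 * delta / (t1 - t0)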
It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). Lynne Jolitz ---- We use SpamQuiz. If your ISP didn't make the grade try http://lynne.telemuse.net > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker > Sent: Wednesday, December 13, 2006 12:17 PM > To: Craig Partridge > Cc: end2end-interest at postel.org > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > > You're talking about ifOutQLen. It was originally proposed in RFC > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > 1994). The reason it was deprecated is not documented, but the > fundamental issue is that it is non-trivial to calculate and is very > ephemeral. > > The big issue in calculating it is that it is rarely exactly one > queue. Consider a simple case on simple hardware available in 1994. > > +----------+ | > | | | > | CPU +-+ > | | | > +----------+ | BUS > | > +----------+ | +---------+ > | | +-+ LANCE | > | | | +---------+ > | DRAM +-+ > | | | +---------+ > | | +-+ LANCE | > +----------+ | +---------+ > > I'm using the term "bus" in the most general possible sense - some > way for the various devices to get to the common memory. This gets > implemented many ways. > > The AMD 7990 LANCE chip was and is a common Ethernet implementation. > It has in front of it a ring in which one can describe up to 2^N > messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea > at any given time how many messages are waiting - it only knows > whether it is working on one right now or is idle, and when switching > from message to message it knows whether the next slot it considers > contains a message. So it can't keep such a counter. The device > driver similarly has a limited view; it might know how many it has > put in and how many it has taken out again, but it doesn't know > whether the LANCE has perhaps completed some of the messages it > hasn't taken out yet. So in the sense of the definition ("The length > of the output packet queue (in packets)."), it doesn't know how many > are still waiting. In addition, it is common for such queues or rings > to be configured pretty small, with excess going into a diffserv- > described set of software queues. > > There are far more general problems. Cisco has a fast forwarding > technology that we use on some of our midrange products that > calculates when messages should be sent and schedules them in a > common calendar queue. Every mumble time units, the traffic that > should be sent during THIS time interval are picked up and dispersed > to the various interfaces they need to go out. Hence, there isn't a > single "output queue", but rather a commingled output schedule that > shifts traffic to other output queues at various times - which in > turn do something akin to what I described above. > > Also, in modern equipment one often has forwarders and drivers on NIC > cards rather than having some central processor do that. For > management purposes, the drivers maintain their counts locally and > periodically (perhaps once a second) upload the contents of those > counters to a place where management can see them. 
> > So when you ask "what is the current queue depth", I have to ask what > the hardware has, what of that has already been spent but isn't > cleaned up yet, what is in how many software queues, how they are > organized, and whether that number has been put somewhere that > management can see it. > > Oh - did I mention encrypt/decrypt units, compressors, and other > inline services that might have their own queues associated with them? > > Yes, there is a definition on the books. I don't know that it answers > the question. > > On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > > > > Queue sizes are standard SNMP variables and thus could be sampled at > > these intervals. But it looks as if you want the queues on a per host > > basis? > > > > Craig > > > > In message > 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > > nil Kumar writes: > > > >> > >> We are searching for any known techniques to continuously sample > >> (say at > >> every 100 msec interval) the buffer occupancy of router > >> interfaces. The > >> requirement is to extract or estimate the instantaneous value of the > >> number of packets or bytes in the router buffer from another > >> machine in > >> the network, and not the maximum possible router buffer size. > >> > >> Any suggestion, advice or pointer to literature on this? > >> > >> Thanks in advance. > >> > >> Anil > From mathis at psc.edu Fri Dec 22 11:09:43 2006 From: mathis at psc.edu (Matt Mathis) Date: Fri, 22 Dec 2006 14:09:43 -0500 (EST) Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <002901c7247e$89e4c920$6e8944c6@telemuse.net> References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> Message-ID: Another approach is to get accurate time stamps of ingress/egress packets and use the difference in the time stamps to compute effective queue depths. The NLANR PMA team was building a "router clamp", an "octopus" designed to get traces from all interfaces of a busy Internet2 core router. I have since lost track of the details. Google "router clamp pma" for clues. I basically don't believe queue depths measured by any other means, because there are so many cascaded queues in a typical modern router. I point out that most NIC's have short queues right at the wire, along with every DMA engine and bus arbitrator, etc. Claiming that an internal software instrument accurately represents the true aggregate queue depth for the router is equivalent to asserting that none of the other potential bottlenecks in the router have any queued packets. If they never have queued packets, why did the HW people bother with the silicon? I conclude there is always potential for packets to be queued out of scope of the software instruments. It's a long story, but I have first hand experience with one of these cases: my external measurement of maximum queues size was only half of the design size, because the "wrong" bottleneck dominated. Good luck, --MM-- ------------------------------------------- Matt Mathis http://www.psc.edu/~mathis Work:412.268.3319 Home/Cell:412.654.7529 ------------------------------------------- Evil is defined by mortals who think they know "The Truth" and use force to apply it to others. On Wed, 20 Dec 2006, Lynne Jolitz wrote: > Fred has very accurately and enjoyably answered the hardware question. 
But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. > > Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. > > The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). > > Lynne Jolitz > > ---- > We use SpamQuiz. > If your ISP didn't make the grade try http://lynne.telemuse.net > > > -----Original Message----- > > From: end2end-interest-bounces at postel.org > > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker > > Sent: Wednesday, December 13, 2006 12:17 PM > > To: Craig Partridge > > Cc: end2end-interest at postel.org > > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > > > > > You're talking about ifOutQLen. It was originally proposed in RFC > > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > > 1994). The reason it was deprecated is not documented, but the > > fundamental issue is that it is non-trivial to calculate and is very > > ephemeral. > > > > The big issue in calculating it is that it is rarely exactly one > > queue. Consider a simple case on simple hardware available in 1994. > > > > +----------+ | > > | | | > > | CPU +-+ > > | | | > > +----------+ | BUS > > | > > +----------+ | +---------+ > > | | +-+ LANCE | > > | | | +---------+ > > | DRAM +-+ > > | | | +---------+ > > | | +-+ LANCE | > > +----------+ | +---------+ > > > > I'm using the term "bus" in the most general possible sense - some > > way for the various devices to get to the common memory. This gets > > implemented many ways. > > > > The AMD 7990 LANCE chip was and is a common Ethernet implementation. > > It has in front of it a ring in which one can describe up to 2^N > > messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea > > at any given time how many messages are waiting - it only knows > > whether it is working on one right now or is idle, and when switching > > from message to message it knows whether the next slot it considers > > contains a message. So it can't keep such a counter. The device > > driver similarly has a limited view; it might know how many it has > > put in and how many it has taken out again, but it doesn't know > > whether the LANCE has perhaps completed some of the messages it > > hasn't taken out yet. So in the sense of the definition ("The length > > of the output packet queue (in packets)."), it doesn't know how many > > are still waiting. 
In addition, it is common for such queues or rings > > to be configured pretty small, with excess going into a diffserv- > > described set of software queues. > > > > There are far more general problems. Cisco has a fast forwarding > > technology that we use on some of our midrange products that > > calculates when messages should be sent and schedules them in a > > common calendar queue. Every mumble time units, the traffic that > > should be sent during THIS time interval are picked up and dispersed > > to the various interfaces they need to go out. Hence, there isn't a > > single "output queue", but rather a commingled output schedule that > > shifts traffic to other output queues at various times - which in > > turn do something akin to what I described above. > > > > Also, in modern equipment one often has forwarders and drivers on NIC > > cards rather than having some central processor do that. For > > management purposes, the drivers maintain their counts locally and > > periodically (perhaps once a second) upload the contents of those > > counters to a place where management can see them. > > > > So when you ask "what is the current queue depth", I have to ask what > > the hardware has, what of that has already been spent but isn't > > cleaned up yet, what is in how many software queues, how they are > > organized, and whether that number has been put somewhere that > > management can see it. > > > > Oh - did I mention encrypt/decrypt units, compressors, and other > > inline services that might have their own queues associated with them? > > > > Yes, there is a definition on the books. I don't know that it answers > > the question. > > > > On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > > > > > > > Queue sizes are standard SNMP variables and thus could be sampled at > > > these intervals. But it looks as if you want the queues on a per host > > > basis? > > > > > > Craig > > > > > > In message > > 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > > > nil Kumar writes: > > > > > >> > > >> We are searching for any known techniques to continuously sample > > >> (say at > > >> every 100 msec interval) the buffer occupancy of router > > >> interfaces. The > > >> requirement is to extract or estimate the instantaneous value of the > > >> number of packets or bytes in the router buffer from another > > >> machine in > > >> the network, and not the maximum possible router buffer size. > > >> > > >> Any suggestion, advice or pointer to literature on this? > > >> > > >> Thanks in advance. > > >> > > >> Anil > > > From mathis at psc.edu Sat Dec 23 08:01:42 2006 From: mathis at psc.edu (Matt Mathis) Date: Sat, 23 Dec 2006 11:01:42 -0500 (EST) Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: <1166832054.9009.171.camel@officepc-junliu> References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> <1166832054.9009.171.camel@officepc-junliu> Message-ID: ICMP has been dead as a measurement protocol for about 10 years now. The problem is that nearly all implementations process ICMP at substantially lower priority than other protocols, so the measurements are far worse than reality. I think you are looking for something more along the lines of IPMP, the IP measurement protocol. Look for the expired Internet drafts: draft-bennett-ippm-ipmp-01 2003-03-05 Expired draft-mcgregor-ipmp-04 2004-02-04 Expired There is also a report by several people including Fred Baker and me, analyzing these two conflicting drafts, and proposing yet another variant. 
I couldn't find the report quickly. Perhaps Fred has a copy.....? If you want to follow this thread, be sure to engage the router vendors/large ISP's early and listen to them carefully, because the academic and industrial agendas clash very badly. (You should read the report first.) Thanks, --MM-- ------------------------------------------- Matt Mathis http://www.psc.edu/~mathis Work:412.268.3319 Home/Cell:412.654.7529 ------------------------------------------- Evil is defined by mortals who think they know "The Truth" and use force to apply it to others. On Fri, 22 Dec 2006, Jun Liu wrote: > I am amazed by this thread of discussion. The key issue of correctly > estimating the queuing delay at a particular router is to make the > queuing delay of interest distinct from the delays caused by other > factors. I agree with Matt Mathis' opinion that the difference of a pair > of timestamps experienced by an IP packet at a router > closely characterizes the queuing delay of this packet at this router. > However, it is inconvenient for an end system to obtain the values of > the difference of time-stamp pairs. The NLANR PMA Router Clamp has only > been installed surrounding one core router and relies > on special measurement circuits. The data measured by Clamp is suitable > for statistics analysis rather than providing dynamic indications to end > hosts. > > I have been working on estimating the maximum queuing delay at the > outbound queue of the slowest link along an end-to-end path. Here, a > slowest link refers to a link with the longest maximum queuing delay > along the path. The queuing delay at the slowest link can be estimated > from measured RTTs along the path. If the histogram of a set of measured > RTTs has a single mode, then the maximum queuing delay at the slowest > link can be approximated by the delay value at the mode less the value > of the minimum RTT. The estimation of the maximum queuing delay at the > slowest link is largely affected by the non-ignorable queuing delays at > other routers. For example, a histogram of measured RTTs can have > multiple modes when there are two or more identical slowest links in a > path. Hence, appropriate technique of filtering noises is necessary. > However, multimodal based estimation issues remain unsolved. > > I am thinking of modifying the ICMP protocol to serve for carrying > dynamic delay information at routers to end hosts. The reason of > considering ICMP is due to two concerns. First, ICMP should have been > implemented at all routers and end hosts. "ICMP, uses the basic support > of IP as if it were a higher level protocol, however, ICMP is actually > an integral part of IP, and must be implemented by every IP > module." [RFC 792] Second, a lot of active probing based network > measurement methods were developed based on ICMP. > > Currently, an ICMP error reporting message is sent by a router upon > processing an erroneous IP packet and is routed back to the sender of > this IP packet. When this happens, the IP packet is dropped at the > router. Let's call an erroneous IP packet an echo, and the corresponding > ICMP packet an echo reply. The proposed modification is to make a pair > of echo and echo reply packets co-exist in the network. Namely, an echo > packet is kept routed to its destination after it has triggered an echo > reply which will be sent back to the sender of this echo. 
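(A toy sketch of the single-mode estimate described a few paragraphs above: histogram the measured RTTs, take the RTT value at the mode and subtract the minimum RTT. The samples and the 1 ms bin width below are invented for the example.)

from collections import Counter

def slowest_link_queue_estimate(rtts_ms, bin_ms=1.0):
    # histogram the RTTs, take the value at the mode, subtract the minimum RTT
    bins = Counter(round(r / bin_ms) for r in rtts_ms)
    mode_bin, _ = bins.most_common(1)[0]
    return mode_bin * bin_ms - min(rtts_ms)

rtts = [20.1, 20.3, 24.9, 25.0, 25.1, 25.2, 25.0, 24.8, 30.7, 25.1]
print(slowest_link_queue_estimate(rtts))   # ~4.9 ms with these made-up samples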
When we assume > that another echo reply will be sent by the destination of this echo > packet, the sender will obtain two echo reply packets on one echo. The > RTTs of the two echo reply packets share delays on the common links they > both traversed. > > Consider a simple network shown below. We denote by d(x,y) the delay > from network node x to y. d(x,y) consists of the link latency, > transmission delay on link (x,y), and the delay in node x (which is a > sum of queuing and processing delays within x). We are about to estimate > the queuing delay at router B (either dynamic delays or the maximum > delay). We consider a worst case scenario by assuming that d(A,B) and > d(B,D) always have similar dynamic values. This scenario happens when > the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of > the two router have the same size, and the same traffic pattern is on > routers A and B. > > d(S,A) d(A,B) d(B,D) > Sender -------------------> R_A ----------------> R_B > ------------------> Destination > <------------------- <---------------- > <------------------ > d(A,S) d(B,A) d(D,B) > > If the sender can make both router B and the destination send an echo > reply on every echo packet it sends, then the difference of the RTTs > between the two echo reply packets offers us a value of (d(B,D)+d(D,B)). > This value much closely characterizes the queuing delay at router B than > using pure RTTs. This method makes queuing delay information timely > delivered to an end node---the sender of the echo packets. > > The method described here is somewhat similar to the idea adopted in van > Jacobson's work of pathchar which incrementally measures the link > bandwidth hop-by-hop from the link next to the source to the link next > to the destination. However, there are two differences. First, in > pathchar, only one echo reply can be triggered by an echo, and a pair of > echo and echo reply can not co-exist in the network. Second, in > pathchar, the RTTs of echo reply packets taking different path lengths > do not necessarily share common delay portions. > > Two obvious side effects of this modified ICMP protocol are the overhead > and the security issues. Higher overhead is made because of the > co-existence of echo and echo reply packets in the network. One echo > packet can potentially trigger as many echo reply packets as the number > of intermediate routers between a pair of sender and destination. Thus, > the security issue deserves consideration. > > My question here is that whether such modification on ICMP is > acceptable, or it simply introduces a new evil. > > Jun Liu > > On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote: > > Another approach is to get accurate time stamps of ingress/egress packets and > > use the difference in the time stamps to compute effective queue depths. The > > NLANR PMA team was building a "router clamp", an "octopus" designed to get > > traces from all interfaces of a busy Internet2 core router. I have since lost > > track of the details. Google "router clamp pma" for clues. > > > > I basically don't believe queue depths measured by any other means, because > > there are so many cascaded queues in a typical modern router. I point out > > that most NIC's have short queues right at the wire, along with every DMA > > engine and bus arbitrator, etc. 
> > > > Claiming that an internal software instrument accurately represents the true > > aggregate queue depth for the router is equivalent to asserting that none of > > the other potential bottlenecks in the router have any queued packets. If they > > never have queued packets, why did the HW people bother with the silicon? I > > conclude there is always potential for packets to be queued out of scope of > > the software instruments. > > > > It's a long story, but I have first hand experience with one of these cases: > > my external measurement of maximum queues size was only half of the design size, > > because the "wrong" bottleneck dominated. > > > > Good luck, > > --MM-- > > ------------------------------------------- > > Matt Mathis http://www.psc.edu/~mathis > > Work:412.268.3319 Home/Cell:412.654.7529 > > ------------------------------------------- > > Evil is defined by mortals who think they know > > "The Truth" and use force to apply it to others. > > > > On Wed, 20 Dec 2006, Lynne Jolitz wrote: > > > > > Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. > > > > > > Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. > > > > > > The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). > > > > > > Lynne Jolitz > > > > > > ---- > > > We use SpamQuiz. > > > If your ISP didn't make the grade try http://lynne.telemuse.net > > > > > > > -----Original Message----- > > > > From: end2end-interest-bounces at postel.org > > > > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker > > > > Sent: Wednesday, December 13, 2006 12:17 PM > > > > To: Craig Partridge > > > > Cc: end2end-interest at postel.org > > > > Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer > > > > > > > > > > > > You're talking about ifOutQLen. It was originally proposed in RFC > > > > 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 > > > > 1994). The reason it was deprecated is not documented, but the > > > > fundamental issue is that it is non-trivial to calculate and is very > > > > ephemeral. > > > > > > > > The big issue in calculating it is that it is rarely exactly one > > > > queue. Consider a simple case on simple hardware available in 1994. 
> > > > > > > > +----------+ | > > > > | | | > > > > | CPU +-+ > > > > | | | > > > > +----------+ | BUS > > > > | > > > > +----------+ | +---------+ > > > > | | +-+ LANCE | > > > > | | | +---------+ > > > > | DRAM +-+ > > > > | | | +---------+ > > > > | | +-+ LANCE | > > > > +----------+ | +---------+ > > > > > > > > I'm using the term "bus" in the most general possible sense - some > > > > way for the various devices to get to the common memory. This gets > > > > implemented many ways. > > > > > > > > The AMD 7990 LANCE chip was and is a common Ethernet implementation. > > > > It has in front of it a ring in which one can describe up to 2^N > > > > messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea > > > > at any given time how many messages are waiting - it only knows > > > > whether it is working on one right now or is idle, and when switching > > > > from message to message it knows whether the next slot it considers > > > > contains a message. So it can't keep such a counter. The device > > > > driver similarly has a limited view; it might know how many it has > > > > put in and how many it has taken out again, but it doesn't know > > > > whether the LANCE has perhaps completed some of the messages it > > > > hasn't taken out yet. So in the sense of the definition ("The length > > > > of the output packet queue (in packets)."), it doesn't know how many > > > > are still waiting. In addition, it is common for such queues or rings > > > > to be configured pretty small, with excess going into a diffserv- > > > > described set of software queues. > > > > > > > > There are far more general problems. Cisco has a fast forwarding > > > > technology that we use on some of our midrange products that > > > > calculates when messages should be sent and schedules them in a > > > > common calendar queue. Every mumble time units, the traffic that > > > > should be sent during THIS time interval are picked up and dispersed > > > > to the various interfaces they need to go out. Hence, there isn't a > > > > single "output queue", but rather a commingled output schedule that > > > > shifts traffic to other output queues at various times - which in > > > > turn do something akin to what I described above. > > > > > > > > Also, in modern equipment one often has forwarders and drivers on NIC > > > > cards rather than having some central processor do that. For > > > > management purposes, the drivers maintain their counts locally and > > > > periodically (perhaps once a second) upload the contents of those > > > > counters to a place where management can see them. > > > > > > > > So when you ask "what is the current queue depth", I have to ask what > > > > the hardware has, what of that has already been spent but isn't > > > > cleaned up yet, what is in how many software queues, how they are > > > > organized, and whether that number has been put somewhere that > > > > management can see it. > > > > > > > > Oh - did I mention encrypt/decrypt units, compressors, and other > > > > inline services that might have their own queues associated with them? > > > > > > > > Yes, there is a definition on the books. I don't know that it answers > > > > the question. > > > > > > > > On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: > > > > > > > > > > > > > > Queue sizes are standard SNMP variables and thus could be sampled at > > > > > these intervals. But it looks as if you want the queues on a per host > > > > > basis? 
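(For completeness, a minimal sketch of the SNMP sampling suggested in the quoted text, assuming the Net-SNMP command-line tools are installed, the community string is "public" and the interface of interest is ifIndex 2, all of which are placeholders; and keeping in mind that ifOutQLen is deprecated and, for the reasons given above, rarely means what one hopes.)

import subprocess, time

ROUTER, COMMUNITY, IFINDEX = "192.0.2.1", "public", 2     # placeholders
OID = "IF-MIB::ifOutQLen.%d" % IFINDEX                    # deprecated object

while True:
    out = subprocess.run(["snmpget", "-v2c", "-c", COMMUNITY, "-Oqv", ROUTER, OID],
                         capture_output=True, text=True)
    print(time.time(), out.stdout.strip() or out.stderr.strip())
    time.sleep(0.1)   # roughly a 100 ms sampling interval, ignoring command runtime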
> > > > > > > > > > Craig > > > > > > > > > > In message > > > > 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A > > > > > nil Kumar writes: > > > > > > > > > >> > > > > >> We are searching for any known techniques to continuously sample > > > > >> (say at > > > > >> every 100 msec interval) the buffer occupancy of router > > > > >> interfaces. The > > > > >> requirement is to extract or estimate the instantaneous value of the > > > > >> number of packets or bytes in the router buffer from another > > > > >> machine in > > > > >> the network, and not the maximum possible router buffer size. > > > > >> > > > > >> Any suggestion, advice or pointer to literature on this? > > > > >> > > > > >> Thanks in advance. > > > > >> > > > > >> Anil > > > > > > > > From fred at cisco.com Fri Dec 22 14:54:03 2006 From: fred at cisco.com (Fred Baker) Date: Fri, 22 Dec 2006 14:54:03 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> Message-ID: <8A57DB74-D3CE-4FD4-9E34-0FAFC4D1BFAE@cisco.com> On Dec 22, 2006, at 11:09 AM, Matt Mathis wrote: > Another approach is to get accurate time stamps of ingress/egress > packets and > use the difference in the time stamps to compute effective queue > depths. I'm not sure I believe that approach. A TDM drop-and-insert- multiplexor has a constant queue depth and 100% utilization, while a statistical multiplexor has a variable queue depth when it has 100% utilization. The depth of the queue becomes a question of your model of the arrival and departure processes. From dpreed at reed.com Sat Dec 23 13:01:08 2006 From: dpreed at reed.com (David P. Reed) Date: Sat, 23 Dec 2006 16:01:08 -0500 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> <1166832054.9009.171.camel@officepc-junliu> Message-ID: <458D9914.3020608@reed.com> I find the first sentence here very odd. ICMP is used every day. It is hardly dead. Perhaps you meant that it doesn't work very well? The real point you are making here is that *any* measurement protocol that can be distinguished from regular traffic by routers is at high risk of generating completely *wrong* answers, for two reasons: 1. Router vendors find it convenient to make their routers privilege real traffic over measurement overhead. 2. There is a constant temptation to "game" any benchmarking tests that vendors tend to accede to. Academics do the same thing when they are proposing great new ideas that they want to sell - so this isn't a statement that says commercial is bad and academic has the moral high ground. (the benchmarking game in the database business (TP1) or the processor business (MIPS or FLOPS according to standard benchmarks) or the 3D graphics business are all unfortunately gamed every day). Purveyors of ideas are tempted to lie or spin performance numbers. That's the high-tech industry version of I.F.Stone's: "governments lie". Why would a router vendor offer to report a reliable number over SNMP? So the general conclusion one should draw from this is that performance measurements should be done without the help of vendors or proposers (call them purveyors), with a great deal of effort put into measuring "real" cases that cannot be detected and distorted by purveyor interpretations that either: a. 
allow the purveyor to claim that the measurement is bogus (ICMP should never have been broken by vendor optimizations, but it was in their interest to do so as noted above) or b. allow the purveyor to generate much better numbers than will ever be seen in practice, either by special casing measurement packets, or putting the definition of the measurement being made in the hands of the purveyor. Matt Mathis wrote: > ICMP has been dead as a measurement protocol for about 10 years now. The > problem is that nearly all implementations process ICMP at substantially lower > priority than other protocols, so the measurements are far worse than reality. > > I think you are looking for something more along the lines of IPMP, the IP > measurement protocol. Look for the expired Internet drafts: > draft-bennett-ippm-ipmp-01 2003-03-05 Expired > draft-mcgregor-ipmp-04 2004-02-04 Expired > > There is also a report by several people including Fred Baker and me, > analyzing these two conflicting drafts, and proposing yet another variant. I > couldn't find the report quickly. Perhaps Fred has a copy.....? > > If you want to follow this thread, be sure to engage the router vendors/large > ISP's early and listen to them carefully, because the academic and industrial > agendas clash very badly. (You should read the report first.) > > Thanks, > --MM-- > ------------------------------------------- > Matt Mathis http://www.psc.edu/~mathis > Work:412.268.3319 Home/Cell:412.654.7529 > ------------------------------------------- > Evil is defined by mortals who think they know > "The Truth" and use force to apply it to others. > > On Fri, 22 Dec 2006, Jun Liu wrote: > > >> I am amazed by this thread of discussion. The key issue of correctly >> estimating the queuing delay at a particular router is to make the >> queuing delay of interest distinct from the delays caused by other >> factors. I agree with Matt Mathis' opinion that the difference of a pair >> of timestamps experienced by an IP packet at a router >> closely characterizes the queuing delay of this packet at this router. >> However, it is inconvenient for an end system to obtain the values of >> the difference of time-stamp pairs. The NLANR PMA Router Clamp has only >> been installed surrounding one core router and relies >> on special measurement circuits. The data measured by Clamp is suitable >> for statistics analysis rather than providing dynamic indications to end >> hosts. >> >> I have been working on estimating the maximum queuing delay at the >> outbound queue of the slowest link along an end-to-end path. Here, a >> slowest link refers to a link with the longest maximum queuing delay >> along the path. The queuing delay at the slowest link can be estimated >> from measured RTTs along the path. If the histogram of a set of measured >> RTTs has a single mode, then the maximum queuing delay at the slowest >> link can be approximated by the delay value at the mode less the value >> of the minimum RTT. The estimation of the maximum queuing delay at the >> slowest link is largely affected by the non-ignorable queuing delays at >> other routers. For example, a histogram of measured RTTs can have >> multiple modes when there are two or more identical slowest links in a >> path. Hence, appropriate technique of filtering noises is necessary. >> However, multimodal based estimation issues remain unsolved. >> >> I am thinking of modifying the ICMP protocol to serve for carrying >> dynamic delay information at routers to end hosts. 
The reason of >> considering ICMP is due to two concerns. First, ICMP should have been >> implemented at all routers and end hosts. "ICMP, uses the basic support >> of IP as if it were a higher level protocol, however, ICMP is actually >> an integral part of IP, and must be implemented by every IP >> module." [RFC 792] Second, a lot of active probing based network >> measurement methods were developed based on ICMP. >> >> Currently, an ICMP error reporting message is sent by a router upon >> processing an erroneous IP packet and is routed back to the sender of >> this IP packet. When this happens, the IP packet is dropped at the >> router. Let's call an erroneous IP packet an echo, and the corresponding >> ICMP packet an echo reply. The proposed modification is to make a pair >> of echo and echo reply packets co-exist in the network. Namely, an echo >> packet is kept routed to its destination after it has triggered an echo >> reply which will be sent back to the sender of this echo. When we assume >> that another echo reply will be sent by the destination of this echo >> packet, the sender will obtain two echo reply packets on one echo. The >> RTTs of the two echo reply packets share delays on the common links they >> both traversed. >> >> Consider a simple network shown below. We denote by d(x,y) the delay >> from network node x to y. d(x,y) consists of the link latency, >> transmission delay on link (x,y), and the delay in node x (which is a >> sum of queuing and processing delays within x). We are about to estimate >> the queuing delay at router B (either dynamic delays or the maximum >> delay). We consider a worst case scenario by assuming that d(A,B) and >> d(B,D) always have similar dynamic values. This scenario happens when >> the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of >> the two router have the same size, and the same traffic pattern is on >> routers A and B. >> >> d(S,A) d(A,B) d(B,D) >> Sender -------------------> R_A ----------------> R_B >> ------------------> Destination >> <------------------- <---------------- >> <------------------ >> d(A,S) d(B,A) d(D,B) >> >> If the sender can make both router B and the destination send an echo >> reply on every echo packet it sends, then the difference of the RTTs >> between the two echo reply packets offers us a value of (d(B,D)+d(D,B)). >> This value much closely characterizes the queuing delay at router B than >> using pure RTTs. This method makes queuing delay information timely >> delivered to an end node---the sender of the echo packets. >> >> The method described here is somewhat similar to the idea adopted in van >> Jacobson's work of pathchar which incrementally measures the link >> bandwidth hop-by-hop from the link next to the source to the link next >> to the destination. However, there are two differences. First, in >> pathchar, only one echo reply can be triggered by an echo, and a pair of >> echo and echo reply can not co-exist in the network. Second, in >> pathchar, the RTTs of echo reply packets taking different path lengths >> do not necessarily share common delay portions. >> >> Two obvious side effects of this modified ICMP protocol are the overhead >> and the security issues. Higher overhead is made because of the >> co-existence of echo and echo reply packets in the network. One echo >> packet can potentially trigger as many echo reply packets as the number >> of intermediate routers between a pair of sender and destination. Thus, >> the security issue deserves consideration. 
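(The subtraction quoted above works out as follows with invented per-hop delays; the point is only that the legs common to both echo replies cancel, leaving d(B,D) + d(D,B).)

# Invented one-way delays, in ms; d[x, y] follows the notation in the quoted text.
d = {("S", "A"): 2.0, ("A", "B"): 5.0, ("B", "D"): 9.0,
     ("D", "B"): 3.0, ("B", "A"): 4.0, ("A", "S"): 1.0}

rtt_via_router_b = d["S", "A"] + d["A", "B"] + d["B", "A"] + d["A", "S"]
rtt_via_dest = (d["S", "A"] + d["A", "B"] + d["B", "D"]
                + d["D", "B"] + d["B", "A"] + d["A", "S"])

# The common legs S->A->B and B->A->S cancel, leaving d(B,D) + d(D,B) = 12.0 ms.
print(rtt_via_dest - rtt_via_router_b)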
>> >> My question here is that whether such modification on ICMP is >> acceptable, or it simply introduces a new evil. >> >> Jun Liu >> >> On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote: >> >>> Another approach is to get accurate time stamps of ingress/egress packets and >>> use the difference in the time stamps to compute effective queue depths. The >>> NLANR PMA team was building a "router clamp", an "octopus" designed to get >>> traces from all interfaces of a busy Internet2 core router. I have since lost >>> track of the details. Google "router clamp pma" for clues. >>> >>> I basically don't believe queue depths measured by any other means, because >>> there are so many cascaded queues in a typical modern router. I point out >>> that most NIC's have short queues right at the wire, along with every DMA >>> engine and bus arbitrator, etc. >>> >>> Claiming that an internal software instrument accurately represents the true >>> aggregate queue depth for the router is equivalent to asserting that none of >>> the other potential bottlenecks in the router have any queued packets. If they >>> never have queued packets, why did the HW people bother with the silicon? I >>> conclude there is always potential for packets to be queued out of scope of >>> the software instruments. >>> >>> It's a long story, but I have first hand experience with one of these cases: >>> my external measurement of maximum queues size was only half of the design size, >>> because the "wrong" bottleneck dominated. >>> >>> Good luck, >>> --MM-- >>> ------------------------------------------- >>> Matt Mathis http://www.psc.edu/~mathis >>> Work:412.268.3319 Home/Cell:412.654.7529 >>> ------------------------------------------- >>> Evil is defined by mortals who think they know >>> "The Truth" and use force to apply it to others. >>> >>> On Wed, 20 Dec 2006, Lynne Jolitz wrote: >>> >>> >>>> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. >>>> >>>> Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. >>>> >>>> The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). >>>> >>>> Lynne Jolitz >>>> >>>> ---- >>>> We use SpamQuiz. 
>>>> If your ISP didn't make the grade try http://lynne.telemuse.net >>>> >>>> >>>>> -----Original Message----- >>>>> From: end2end-interest-bounces at postel.org >>>>> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker >>>>> Sent: Wednesday, December 13, 2006 12:17 PM >>>>> To: Craig Partridge >>>>> Cc: end2end-interest at postel.org >>>>> Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer >>>>> >>>>> >>>>> You're talking about ifOutQLen. It was originally proposed in RFC >>>>> 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 >>>>> 1994). The reason it was deprecated is not documented, but the >>>>> fundamental issue is that it is non-trivial to calculate and is very >>>>> ephemeral. >>>>> >>>>> The big issue in calculating it is that it is rarely exactly one >>>>> queue. Consider a simple case on simple hardware available in 1994. >>>>> >>>>> +----------+ | >>>>> | | | >>>>> | CPU +-+ >>>>> | | | >>>>> +----------+ | BUS >>>>> | >>>>> +----------+ | +---------+ >>>>> | | +-+ LANCE | >>>>> | | | +---------+ >>>>> | DRAM +-+ >>>>> | | | +---------+ >>>>> | | +-+ LANCE | >>>>> +----------+ | +---------+ >>>>> >>>>> I'm using the term "bus" in the most general possible sense - some >>>>> way for the various devices to get to the common memory. This gets >>>>> implemented many ways. >>>>> >>>>> The AMD 7990 LANCE chip was and is a common Ethernet implementation. >>>>> It has in front of it a ring in which one can describe up to 2^N >>>>> messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea >>>>> at any given time how many messages are waiting - it only knows >>>>> whether it is working on one right now or is idle, and when switching >>>>> from message to message it knows whether the next slot it considers >>>>> contains a message. So it can't keep such a counter. The device >>>>> driver similarly has a limited view; it might know how many it has >>>>> put in and how many it has taken out again, but it doesn't know >>>>> whether the LANCE has perhaps completed some of the messages it >>>>> hasn't taken out yet. So in the sense of the definition ("The length >>>>> of the output packet queue (in packets)."), it doesn't know how many >>>>> are still waiting. In addition, it is common for such queues or rings >>>>> to be configured pretty small, with excess going into a diffserv- >>>>> described set of software queues. >>>>> >>>>> There are far more general problems. Cisco has a fast forwarding >>>>> technology that we use on some of our midrange products that >>>>> calculates when messages should be sent and schedules them in a >>>>> common calendar queue. Every mumble time units, the traffic that >>>>> should be sent during THIS time interval are picked up and dispersed >>>>> to the various interfaces they need to go out. Hence, there isn't a >>>>> single "output queue", but rather a commingled output schedule that >>>>> shifts traffic to other output queues at various times - which in >>>>> turn do something akin to what I described above. >>>>> >>>>> Also, in modern equipment one often has forwarders and drivers on NIC >>>>> cards rather than having some central processor do that. For >>>>> management purposes, the drivers maintain their counts locally and >>>>> periodically (perhaps once a second) upload the contents of those >>>>> counters to a place where management can see them. 
>>>>> >>>>> So when you ask "what is the current queue depth", I have to ask what >>>>> the hardware has, what of that has already been spent but isn't >>>>> cleaned up yet, what is in how many software queues, how they are >>>>> organized, and whether that number has been put somewhere that >>>>> management can see it. >>>>> >>>>> Oh - did I mention encrypt/decrypt units, compressors, and other >>>>> inline services that might have their own queues associated with them? >>>>> >>>>> Yes, there is a definition on the books. I don't know that it answers >>>>> the question. >>>>> >>>>> On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: >>>>> >>>>> >>>>>> Queue sizes are standard SNMP variables and thus could be sampled at >>>>>> these intervals. But it looks as if you want the queues on a per host >>>>>> basis? >>>>>> >>>>>> Craig >>>>>> >>>>>> In message >>>>> 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A >>>>>> nil Kumar writes: >>>>>> >>>>>> >>>>>>> We are searching for any known techniques to continuously sample >>>>>>> (say at >>>>>>> every 100 msec interval) the buffer occupancy of router >>>>>>> interfaces. The >>>>>>> requirement is to extract or estimate the instantaneous value of the >>>>>>> number of packets or bytes in the router buffer from another >>>>>>> machine in >>>>>>> the network, and not the maximum possible router buffer size. >>>>>>> >>>>>>> Any suggestion, advice or pointer to literature on this? >>>>>>> >>>>>>> Thanks in advance. >>>>>>> >>>>>>> Anil >>>>>>> > > > From detlef.bosau at web.de Sat Dec 23 13:45:36 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 23 Dec 2006 22:45:36 +0100 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? Message-ID: <458DA380.1070207@web.de> I apologize if this is a stupid question. However, I would like to know how we shall deal with this scenario. Consider a client maintaing TCP sessions to different servers. (My PC is actually doing so, so there is at least one specimen.) Consider the following topology ---------------(FE)--------------- Server 1 Client ----- (E)------router ------------(FE)----------router ----------------(E)------------------Server 2 E = Ethernet (10 Mbit/s) FE = Fast Ethernt (100 MBit/s) The link in the middle represents some transport network / path, e.g. the Internet. The common bottleneck is actually the link between client and router. Consider there were two greedy TCP flows to the client, one originates from server 1 and the other from server 2. My feeling is that the flow server 1 - client should achieve more throughput than the other. From what I see in a simulation, the ratio in the secnario above is roughly 2:1. (I did this simulation this evening, so admittedly there might be errors.) Is there a general opinion how the throughput ratio should be in a scenario like this? Thanks. Detlef From Jon.Crowcroft at cl.cam.ac.uk Sat Dec 23 13:47:44 2006 From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft) Date: Sat, 23 Dec 2006 21:47:44 +0000 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: Message from "David P. Reed" of "Sat, 23 Dec 2006 16:01:08 EST." 
<458D9914.3020608@reed.com> Message-ID: The obvious solution is for everyone everywhere to run time wget www.google.com once a minute and then put the answer on a web page called, say 'hostname`-`date`.txt and wait for google to index it (or we could set up a gmail account with a public pasword and just email answers there) and then run traceroute for same set to find intersection of sub-paths, then we'd have a huge oracle of rtts from every to everywhere (that matters), pretty much, and no amount of icmp would be needed at all, de-prioritised or otherwize. btw, why do americans call queues "lines", except when talking about networks? surely we should have line theory, and active line management and fair lining and so on? now where's that d**n martini...? In missive <458D9914.3020608 at reed.com>, "David P. Reed" typed: >>I find the first sentence here very odd. ICMP is used every day. It >>is hardly dead. >> >>Perhaps you meant that it doesn't work very well? >> >>The real point you are making here is that *any* measurement protocol >>that can be distinguished from regular traffic by routers is at high >>risk of generating completely *wrong* answers, for two reasons: >> >>1. Router vendors find it convenient to make their routers privilege >>real traffic over measurement overhead. >> >>2. There is a constant temptation to "game" any benchmarking tests that >>vendors tend to accede to. Academics do the same thing when they are >>proposing great new ideas that they want to sell - so this isn't a >>statement that says commercial is bad and academic has the moral high >>ground. (the benchmarking game in the database business (TP1) or the >>processor business (MIPS or FLOPS according to standard benchmarks) or >>the 3D graphics business are all unfortunately gamed every day). >> >>Purveyors of ideas are tempted to lie or spin performance numbers. >>That's the high-tech industry version of I.F.Stone's: "governments lie". >> >>Why would a router vendor offer to report a reliable number over SNMP? >> >>So the general conclusion one should draw from this is that performance >>measurements should be done without the help of vendors or proposers >>(call them purveyors), with a great deal of effort put into measuring >>"real" cases that cannot be detected and distorted by purveyor >>interpretations that either: >> >>a. allow the purveyor to claim that the measurement is bogus (ICMP >>should never have been broken by vendor optimizations, but it was in >>their interest to do so as noted above) or >> >>b. allow the purveyor to generate much better numbers than will ever be >>seen in practice, either by special casing measurement packets, or >>putting the definition of the measurement being made in the hands of the >>purveyor. >> >>Matt Mathis wrote: >>> ICMP has been dead as a measurement protocol for about 10 years now. The >>> problem is that nearly all implementations process ICMP at substantially lower >>> priority than other protocols, so the measurements are far worse than reality. >>> >>> I think you are looking for something more along the lines of IPMP, the IP >>> measurement protocol. Look for the expired Internet drafts: >>> draft-bennett-ippm-ipmp-01 2003-03-05 Expired >>> draft-mcgregor-ipmp-04 2004-02-04 Expired >>> >>> There is also a report by several people including Fred Baker and me, >>> analyzing these two conflicting drafts, and proposing yet another variant. I >>> couldn't find the report quickly. Perhaps Fred has a copy.....? 
>>> >>> If you want to follow this thread, be sure to engage the router vendors/large >>> ISP's early and listen to them carefully, because the academic and industrial >>> agendas clash very badly. (You should read the report first.) >>> >>> Thanks, >>> --MM-- >>> ------------------------------------------- >>> Matt Mathis http://www.psc.edu/~mathis >>> Work:412.268.3319 Home/Cell:412.654.7529 >>> ------------------------------------------- >>> Evil is defined by mortals who think they know >>> "The Truth" and use force to apply it to others. >>> >>> On Fri, 22 Dec 2006, Jun Liu wrote: >>> >>> >>>> I am amazed by this thread of discussion. The key issue of correctly >>>> estimating the queuing delay at a particular router is to make the >>>> queuing delay of interest distinct from the delays caused by other >>>> factors. I agree with Matt Mathis' opinion that the difference of a pair >>>> of timestamps experienced by an IP packet at a router >>>> closely characterizes the queuing delay of this packet at this router. >>>> However, it is inconvenient for an end system to obtain the values of >>>> the difference of time-stamp pairs. The NLANR PMA Router Clamp has only >>>> been installed surrounding one core router and relies >>>> on special measurement circuits. The data measured by Clamp is suitable >>>> for statistics analysis rather than providing dynamic indications to end >>>> hosts. >>>> >>>> I have been working on estimating the maximum queuing delay at the >>>> outbound queue of the slowest link along an end-to-end path. Here, a >>>> slowest link refers to a link with the longest maximum queuing delay >>>> along the path. The queuing delay at the slowest link can be estimated >>>> from measured RTTs along the path. If the histogram of a set of measured >>>> RTTs has a single mode, then the maximum queuing delay at the slowest >>>> link can be approximated by the delay value at the mode less the value >>>> of the minimum RTT. The estimation of the maximum queuing delay at the >>>> slowest link is largely affected by the non-ignorable queuing delays at >>>> other routers. For example, a histogram of measured RTTs can have >>>> multiple modes when there are two or more identical slowest links in a >>>> path. Hence, appropriate technique of filtering noises is necessary. >>>> However, multimodal based estimation issues remain unsolved. >>>> >>>> I am thinking of modifying the ICMP protocol to serve for carrying >>>> dynamic delay information at routers to end hosts. The reason of >>>> considering ICMP is due to two concerns. First, ICMP should have been >>>> implemented at all routers and end hosts. "ICMP, uses the basic support >>>> of IP as if it were a higher level protocol, however, ICMP is actually >>>> an integral part of IP, and must be implemented by every IP >>>> module." [RFC 792] Second, a lot of active probing based network >>>> measurement methods were developed based on ICMP. >>>> >>>> Currently, an ICMP error reporting message is sent by a router upon >>>> processing an erroneous IP packet and is routed back to the sender of >>>> this IP packet. When this happens, the IP packet is dropped at the >>>> router. Let's call an erroneous IP packet an echo, and the corresponding >>>> ICMP packet an echo reply. The proposed modification is to make a pair >>>> of echo and echo reply packets co-exist in the network. Namely, an echo >>>> packet is kept routed to its destination after it has triggered an echo >>>> reply which will be sent back to the sender of this echo. 
When we assume >>>> that another echo reply will be sent by the destination of this echo >>>> packet, the sender will obtain two echo reply packets on one echo. The >>>> RTTs of the two echo reply packets share delays on the common links they >>>> both traversed. >>>> >>>> Consider a simple network shown below. We denote by d(x,y) the delay >>>> from network node x to y. d(x,y) consists of the link latency, >>>> transmission delay on link (x,y), and the delay in node x (which is a >>>> sum of queuing and processing delays within x). We are about to estimate >>>> the queuing delay at router B (either dynamic delays or the maximum >>>> delay). We consider a worst case scenario by assuming that d(A,B) and >>>> d(B,D) always have similar dynamic values. This scenario happens when >>>> the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of >>>> the two router have the same size, and the same traffic pattern is on >>>> routers A and B. >>>> >>>> d(S,A) d(A,B) d(B,D) >>>> Sender -------------------> R_A ----------------> R_B >>>> ------------------> Destination >>>> <------------------- <---------------- >>>> <------------------ >>>> d(A,S) d(B,A) d(D,B) >>>> >>>> If the sender can make both router B and the destination send an echo >>>> reply on every echo packet it sends, then the difference of the RTTs >>>> between the two echo reply packets offers us a value of (d(B,D)+d(D,B)). >>>> This value much closely characterizes the queuing delay at router B than >>>> using pure RTTs. This method makes queuing delay information timely >>>> delivered to an end node---the sender of the echo packets. >>>> >>>> The method described here is somewhat similar to the idea adopted in van >>>> Jacobson's work of pathchar which incrementally measures the link >>>> bandwidth hop-by-hop from the link next to the source to the link next >>>> to the destination. However, there are two differences. First, in >>>> pathchar, only one echo reply can be triggered by an echo, and a pair of >>>> echo and echo reply can not co-exist in the network. Second, in >>>> pathchar, the RTTs of echo reply packets taking different path lengths >>>> do not necessarily share common delay portions. >>>> >>>> Two obvious side effects of this modified ICMP protocol are the overhead >>>> and the security issues. Higher overhead is made because of the >>>> co-existence of echo and echo reply packets in the network. One echo >>>> packet can potentially trigger as many echo reply packets as the number >>>> of intermediate routers between a pair of sender and destination. Thus, >>>> the security issue deserves consideration. >>>> >>>> My question here is that whether such modification on ICMP is >>>> acceptable, or it simply introduces a new evil. >>>> >>>> Jun Liu >>>> >>>> On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote: >>>> >>>>> Another approach is to get accurate time stamps of ingress/egress packets and >>>>> use the difference in the time stamps to compute effective queue depths. The >>>>> NLANR PMA team was building a "router clamp", an "octopus" designed to get >>>>> traces from all interfaces of a busy Internet2 core router. I have since lost >>>>> track of the details. Google "router clamp pma" for clues. >>>>> >>>>> I basically don't believe queue depths measured by any other means, because >>>>> there are so many cascaded queues in a typical modern router. I point out >>>>> that most NIC's have short queues right at the wire, along with every DMA >>>>> engine and bus arbitrator, etc. 
>>>>> >>>>> Claiming that an internal software instrument accurately represents the true >>>>> aggregate queue depth for the router is equivalent to asserting that none of >>>>> the other potential bottlenecks in the router have any queued packets. If they >>>>> never have queued packets, why did the HW people bother with the silicon? I >>>>> conclude there is always potential for packets to be queued out of scope of >>>>> the software instruments. >>>>> >>>>> It's a long story, but I have first hand experience with one of these cases: >>>>> my external measurement of maximum queues size was only half of the design size, >>>>> because the "wrong" bottleneck dominated. >>>>> >>>>> Good luck, >>>>> --MM-- >>>>> ------------------------------------------- >>>>> Matt Mathis http://www.psc.edu/~mathis >>>>> Work:412.268.3319 Home/Cell:412.654.7529 >>>>> ------------------------------------------- >>>>> Evil is defined by mortals who think they know >>>>> "The Truth" and use force to apply it to others. >>>>> >>>>> On Wed, 20 Dec 2006, Lynne Jolitz wrote: >>>>> >>>>> >>>>>> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols. >>>>>> >>>>>> Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge. >>>>>> >>>>>> The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering). >>>>>> >>>>>> Lynne Jolitz >>>>>> >>>>>> ---- >>>>>> We use SpamQuiz. >>>>>> If your ISP didn't make the grade try http://lynne.telemuse.net >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: end2end-interest-bounces at postel.org >>>>>>> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker >>>>>>> Sent: Wednesday, December 13, 2006 12:17 PM >>>>>>> To: Craig Partridge >>>>>>> Cc: end2end-interest at postel.org >>>>>>> Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer >>>>>>> >>>>>>> >>>>>>> You're talking about ifOutQLen. It was originally proposed in RFC >>>>>>> 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573 >>>>>>> 1994). The reason it was deprecated is not documented, but the >>>>>>> fundamental issue is that it is non-trivial to calculate and is very >>>>>>> ephemeral. >>>>>>> >>>>>>> The big issue in calculating it is that it is rarely exactly one >>>>>>> queue. Consider a simple case on simple hardware available in 1994. 
>>>>>>> >>>>>>> +----------+ | >>>>>>> | | | >>>>>>> | CPU +-+ >>>>>>> | | | >>>>>>> +----------+ | BUS >>>>>>> | >>>>>>> +----------+ | +---------+ >>>>>>> | | +-+ LANCE | >>>>>>> | | | +---------+ >>>>>>> | DRAM +-+ >>>>>>> | | | +---------+ >>>>>>> | | +-+ LANCE | >>>>>>> +----------+ | +---------+ >>>>>>> >>>>>>> I'm using the term "bus" in the most general possible sense - some >>>>>>> way for the various devices to get to the common memory. This gets >>>>>>> implemented many ways. >>>>>>> >>>>>>> The AMD 7990 LANCE chip was and is a common Ethernet implementation. >>>>>>> It has in front of it a ring in which one can describe up to 2^N >>>>>>> messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea >>>>>>> at any given time how many messages are waiting - it only knows >>>>>>> whether it is working on one right now or is idle, and when switching >>>>>>> from message to message it knows whether the next slot it considers >>>>>>> contains a message. So it can't keep such a counter. The device >>>>>>> driver similarly has a limited view; it might know how many it has >>>>>>> put in and how many it has taken out again, but it doesn't know >>>>>>> whether the LANCE has perhaps completed some of the messages it >>>>>>> hasn't taken out yet. So in the sense of the definition ("The length >>>>>>> of the output packet queue (in packets)."), it doesn't know how many >>>>>>> are still waiting. In addition, it is common for such queues or rings >>>>>>> to be configured pretty small, with excess going into a diffserv- >>>>>>> described set of software queues. >>>>>>> >>>>>>> There are far more general problems. Cisco has a fast forwarding >>>>>>> technology that we use on some of our midrange products that >>>>>>> calculates when messages should be sent and schedules them in a >>>>>>> common calendar queue. Every mumble time units, the traffic that >>>>>>> should be sent during THIS time interval are picked up and dispersed >>>>>>> to the various interfaces they need to go out. Hence, there isn't a >>>>>>> single "output queue", but rather a commingled output schedule that >>>>>>> shifts traffic to other output queues at various times - which in >>>>>>> turn do something akin to what I described above. >>>>>>> >>>>>>> Also, in modern equipment one often has forwarders and drivers on NIC >>>>>>> cards rather than having some central processor do that. For >>>>>>> management purposes, the drivers maintain their counts locally and >>>>>>> periodically (perhaps once a second) upload the contents of those >>>>>>> counters to a place where management can see them. >>>>>>> >>>>>>> So when you ask "what is the current queue depth", I have to ask what >>>>>>> the hardware has, what of that has already been spent but isn't >>>>>>> cleaned up yet, what is in how many software queues, how they are >>>>>>> organized, and whether that number has been put somewhere that >>>>>>> management can see it. >>>>>>> >>>>>>> Oh - did I mention encrypt/decrypt units, compressors, and other >>>>>>> inline services that might have their own queues associated with them? >>>>>>> >>>>>>> Yes, there is a definition on the books. I don't know that it answers >>>>>>> the question. >>>>>>> >>>>>>> On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote: >>>>>>> >>>>>>> >>>>>>>> Queue sizes are standard SNMP variables and thus could be sampled at >>>>>>>> these intervals. But it looks as if you want the queues on a per host >>>>>>>> basis? 
>>>>>>>> >>>>>>>> Craig >>>>>>>> >>>>>>>> In message >>>>>>> 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A >>>>>>>> nil Kumar writes: >>>>>>>> >>>>>>>> >>>>>>>>> We are searching for any known techniques to continuously sample >>>>>>>>> (say at >>>>>>>>> every 100 msec interval) the buffer occupancy of router >>>>>>>>> interfaces. The >>>>>>>>> requirement is to extract or estimate the instantaneous value of the >>>>>>>>> number of packets or bytes in the router buffer from another >>>>>>>>> machine in >>>>>>>>> the network, and not the maximum possible router buffer size. >>>>>>>>> >>>>>>>>> Any suggestion, advice or pointer to literature on this? >>>>>>>>> >>>>>>>>> Thanks in advance. >>>>>>>>> >>>>>>>>> Anil >>>>>>>>> >>> >>> >>> cheers jon From detlef.bosau at web.de Sat Dec 23 14:52:32 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sat, 23 Dec 2006 23:52:32 +0100 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? In-Reply-To: <458DA380.1070207@web.de> References: <458DA380.1070207@web.de> Message-ID: <458DB330.80504@web.de> My goodness, I did not see my disastrous "ASCII-art". I apologize. Let?s give it another try: ----(FE)---- Server 1 C - (E)---router --(FE)---router ----(E)------Server 2 C = Client, rest as before. Hopefully, it is better now. From detlef.bosau at web.de Sun Dec 24 14:52:56 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 24 Dec 2006 23:52:56 +0100 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? In-Reply-To: <458DA380.1070207@web.de> References: <458DA380.1070207@web.de> Message-ID: <458F04C8.30100@web.de> Detlef Bosau wrote: > I apologize if this is a stupid question. I admit, it was a *very* stupid question :-) Because my ASCII arts were terrible, I add a nam-screenshot here (hopefully, I?m allowed to send this mail in HTML): NAM screenshot Links: 0-2: 100 Mbit/s, 1 ms 1-2: 10 Mbit/s, 1 ms 2-3: 100 Mbit/s, 10 ms 3-4: 10 MBit/s, 1 ms Sender: 0,1 Receiver: 4 > > > My feeling is that the flow server 1 - client should achieve more > throughput than the other. From what I see in a simulation, the ratio > in the secnario above is roughly 2:1. (I did this simulation this > evening, so admittedly there might be errors.) > > Is there a general opinion how the throughput ratio should be in a > scenario like this? Obviously, my feeling is wrong. Perhaps, I should consider reality more than my feelings :-[ AIMD distributes the *path capacity (i.e. "memory") *in equal shares. So, in case of two flows sharing a path, each flow is assigned an equal window. Hence, the rates should be equal as they depend on the window (= estimate of path capaciyt) and RTT. (Well known rule of thumb: rate = cwnd/RTT) However, the scenario depicted above is an interesting one: Apparently, the sender at node 1 is paced "ideally" by the link 1-2. So, packets sent by node 0 are dropped at node 3 unuduly often. In consequence, the flow from 0 to 4 hardly achieves any throughput whereas the flow from 1 to 4 runs as if there was no competitor. If the bandwdith 1-2 is changed a little bit, the bevaviour returns to the expected one. I?m still not quite sure whether this behaviour matches reality or whether it is an NS2 artifact. Detlef -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061224/3033e94c/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: bild.png Type: image/png Size: 21858 bytes Desc: not available Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061224/3033e94c/bild-0001.png From Anil.Agarwal at viasat.com Mon Dec 25 08:35:18 2006 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Mon, 25 Dec 2006 11:35:18 -0500 Subject: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? References: <458DA380.1070207@web.de> <458F04C8.30100@web.de> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3547@VGAEXCH01.hq.corp.viasat.com> Detlef, Here is a possible explanation for the results in your scenario - Take the case when both connections are active and the queue at router 2 remains non-empty. Every T seconds, there will be a packet departure at router 2, resulting in the queue size decreasing by 1 packet at time T. If a packet from node 1 departs at time n*T, then at time (n+1)*T + ta1, another packet will arrive at router 2 from node 1. ta1 is the time taken by the Ack to reach node 1. If a packet from node 0 departs at time n*T, then at time n*T + ta0 + t0, another packet will arrive at router 2 from node 0. ta0 is the time taken by the Ack to reach node 0. t0 is the transmission time of a packet at 100 Mbps. Another packet from node 0 may arrive at time n*T + ta0 + 2 * t0. In the scenario, ta0 << T, ta1 << T, and t0 = T / 10, ta0 + t0 > ta1. I am assuming that propagation delays were set to 0 in the simulations. It can be seen, that when a node 1 packet arrives at node 2, the queue is never full - a packet departure takes place ta1 seconds before its arrival, and no node 0 packet arrive during the ta1 seconds. No such property holds for node 0 packets - hence node 0 packets are selectively dropped. Changing bandwidths a bit or introducing real-life factors such as propagation delays, variable processing delays and/or variable Ethernet switching delays will probably break this synchronized relationship. Regards, Anil Anil Agarwal ViaSat Inc. Germantown, MD ________________________________ From: end2end-interest-bounces at postel.org on behalf of Detlef Bosau Sent: Sun 12/24/2006 5:52 PM To: end2end-interest at postel.org Cc: Michael Kochte; Daniel Minder; Martin Reisslein; Frank Duerr Subject: Re: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? Detlef Bosau wrote: I apologize if this is a stupid question. I admit, it was a very stupid question :-) Because my ASCII arts were terrible, I add a nam-screenshot here (hopefully, I?m allowed to send this mail in HTML): Links: 0-2: 100 Mbit/s, 1 ms 1-2: 10 Mbit/s, 1 ms 2-3: 100 Mbit/s, 10 ms 3-4: 10 MBit/s, 1 ms Sender: 0,1 Receiver: 4 My feeling is that the flow server 1 - client should achieve more throughput than the other. From what I see in a simulation, the ratio in the secnario above is roughly 2:1. (I did this simulation this evening, so admittedly there might be errors.) Is there a general opinion how the throughput ratio should be in a scenario like this? Obviously, my feeling is wrong. Perhaps, I should consider reality more than my feelings :-[ AIMD distributes the path capacity (i.e. "memory") in equal shares. So, in case of two flows sharing a path, each flow is assigned an equal window. 
Hence, the rates should be equal as they depend on the window (= estimate of path capaciyt) and RTT. (Well known rule of thumb: rate = cwnd/RTT) However, the scenario depicted above is an interesting one: Apparently, the sender at node 1 is paced "ideally" by the link 1-2. So, packets sent by node 0 are dropped at node 3 unuduly often. In consequence, the flow from 0 to 4 hardly achieves any throughput whereas the flow from 1 to 4 runs as if there was no competitor. If the bandwdith 1-2 is changed a little bit, the bevaviour returns to the expected one. I?m still not quite sure whether this behaviour matches reality or whether it is an NS2 artifact. Detlef -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/1d136a43/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: bild.png Type: image/png Size: 21858 bytes Desc: bild.png Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/1d136a43/bild-0001.png From Anil.Agarwal at viasat.com Mon Dec 25 15:38:44 2006 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Mon, 25 Dec 2006 18:38:44 -0500 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? References: <458DA380.1070207@web.de> <458F04C8.30100@web.de> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3547@VGAEXCH01.hq.corp.viasat.com> Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3549@VGAEXCH01.hq.corp.viasat.com> Detlef, In my earlier description, I had incorrectly assumed that link 2-3 was at 10 Mbps. The nature of the problem is similar whether link 2-3 is at 10 Mbps or 100 Mbps. Here is a corrected description for your network scenario - Take the case when both connections are active and the queue at router 3 remains non-empty. Every T seconds, there will be a packet departure at router 3, resulting in the queue size decreasing by 1 packet. At router 3, if a packet from node 1 departs at time n*T, then at time (n+1)*T + ta1 + t0, another packet will arrive from node 1. ta1 is the time taken by the Ack to reach node 1 from node 4. t0 is the transmission time of a packet at 100 Mbps. At router 3, if a packet from node 0 departs at time n*T, then at time n*T + ta0 + 2 * t0, another packet will arrive from node 0. ta0 is the time taken by the Ack to reach node 0 from node 4. t0 is the transmission time of a packet at 100 Mbps. Another packet (of a packet pair) from node 0 may arrive at time n*T + ta0 + 3 * t0. In the scenario, ta0 << T, ta1 << T, and t0 = T / 10, ta0 + 2 * t0 > ta1 + t0. I am assuming that propagation delays were set to 0 in the simulations. It can be seen, that when a node 1 packet arrives at node 3, the queue is never full - a packet departure takes place ta1 + t0 seconds before its arrival, and no node 0 packets arrive during ths interval. No such property holds for node 0 packets - hence node 0 packets are selectively dropped. Changing bandwidths a bit or introducing real-life factors such as propagation delays, variable processing delays and/or variable Ethernet switch delays will probably break this synchronized relationship. RED will also help. One can construct many other similar scenarios, where one connection is selectively favored over another. Perhaps, one more reason to use RED. 
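Anil's timing argument can be reproduced in a few lines. The following is a rough, self-contained sketch, not ns-2 and not Detlef's script, and the constants are invented: a drop-tail queue is served once per interval T and fed by two flows that each offer one packet per T at a fixed phase. The flow whose packet lands just after the departure always finds a free slot; the other always finds the queue full. A little random jitter on the arrival phases is enough to break the lockout.

import random

def run(jitter=0.0, T=1.0, capacity=5, rounds=10000, seed=1):
    # Drop-tail queue: one departure per interval, one arrival per flow per interval.
    random.seed(seed)
    queue = capacity                      # bottleneck queue starts (and stays) full
    drops = {"A": 0, "B": 0}
    for _ in range(rounds):
        events = [(0.0, "depart"),
                  (0.2 * T + random.uniform(-jitter, jitter), "A"),
                  (0.6 * T + random.uniform(-jitter, jitter), "B")]
        for _, ev in sorted(events):      # play this interval's events in time order
            if ev == "depart":
                if queue > 0:
                    queue -= 1
            elif queue < capacity:
                queue += 1                # arriving packet accepted
            else:
                drops[ev] += 1            # arriving packet dropped at the tail
    return drops

print("deterministic:", run(jitter=0.0))   # one flow takes every single drop
print("with jitter:  ", run(jitter=0.3))   # drops are now shared by both flows

Pacing one flow exactly at the bottleneck's service interval is what creates the lockout; the jittered run is closer to what real propagation and processing delays do, which is the point of Anil's closing paragraph.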
Anil ________________________________ From: end2end-interest-bounces at postel.org on behalf of Agarwal, Anil Sent: Mon 12/25/2006 11:35 AM To: Detlef Bosau; end2end-interest at postel.org Cc: Michael Kochte; Martin Reisslein; Frank Duerr; Daniel Minder Subject: Re: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? Detlef, Here is a possible explanation for the results in your scenario - Take the case when both connections are active and the queue at router 2 remains non-empty. Every T seconds, there will be a packet departure at router 2, resulting in the queue size decreasing by 1 packet at time T. If a packet from node 1 departs at time n*T, then at time (n+1)*T + ta1, another packet will arrive at router 2 from node 1. ta1 is the time taken by the Ack to reach node 1. If a packet from node 0 departs at time n*T, then at time n*T + ta0 + t0, another packet will arrive at router 2 from node 0. ta0 is the time taken by the Ack to reach node 0. t0 is the transmission time of a packet at 100 Mbps. Another packet from node 0 may arrive at time n*T + ta0 + 2 * t0. In the scenario, ta0 << T, ta1 << T, and t0 = T / 10, ta0 + t0 > ta1. I am assuming that propagation delays were set to 0 in the simulations. It can be seen, that when a node 1 packet arrives at node 2, the queue is never full - a packet departure takes place ta1 seconds before its arrival, and no node 0 packet arrive during the ta1 seconds. No such property holds for node 0 packets - hence node 0 packets are selectively dropped. Changing bandwidths a bit or introducing real-life factors such as propagation delays, variable processing delays and/or variable Ethernet switching delays will probably break this synchronized relationship. Regards, Anil Anil Agarwal ViaSat Inc. Germantown, MD ________________________________ From: end2end-interest-bounces at postel.org on behalf of Detlef Bosau Sent: Sun 12/24/2006 5:52 PM To: end2end-interest at postel.org Cc: Michael Kochte; Daniel Minder; Martin Reisslein; Frank Duerr Subject: Re: [e2e] How shall we deal with servers with different bandwidths and a common bottleneck to the client? Detlef Bosau wrote: I apologize if this is a stupid question. I admit, it was a very stupid question :-) Because my ASCII arts were terrible, I add a nam-screenshot here (hopefully, I?m allowed to send this mail in HTML): Links: 0-2: 100 Mbit/s, 1 ms 1-2: 10 Mbit/s, 1 ms 2-3: 100 Mbit/s, 10 ms 3-4: 10 MBit/s, 1 ms Sender: 0,1 Receiver: 4 My feeling is that the flow server 1 - client should achieve more throughput than the other. From what I see in a simulation, the ratio in the secnario above is roughly 2:1. (I did this simulation this evening, so admittedly there might be errors.) Is there a general opinion how the throughput ratio should be in a scenario like this? Obviously, my feeling is wrong. Perhaps, I should consider reality more than my feelings :-[ AIMD distributes the path capacity (i.e. "memory") in equal shares. So, in case of two flows sharing a path, each flow is assigned an equal window. Hence, the rates should be equal as they depend on the window (= estimate of path capaciyt) and RTT. (Well known rule of thumb: rate = cwnd/RTT) However, the scenario depicted above is an interesting one: Apparently, the sender at node 1 is paced "ideally" by the link 1-2. So, packets sent by node 0 are dropped at node 3 unuduly often. 
In consequence, the flow from 0 to 4 hardly achieves any throughput whereas the flow from 1 to 4 runs as if there was no competitor. If the bandwdith 1-2 is changed a little bit, the bevaviour returns to the expected one. I?m still not quite sure whether this behaviour matches reality or whether it is an NS2 artifact. Detlef -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/d26abde2/attachment-0001.html -------------- next part -------------- A non-text attachment was scrubbed... Name: bild.png Type: image/png Size: 21858 bytes Desc: bild.png Url : http://mailman.postel.org/pipermail/end2end-interest/attachments/20061225/d26abde2/bild-0001.png From fred at cisco.com Sat Dec 23 13:15:32 2006 From: fred at cisco.com (Fred Baker) Date: Sat, 23 Dec 2006 13:15:32 -0800 Subject: [e2e] Extracting No. of packets or bytes in a router buffer In-Reply-To: References: <002901c7247e$89e4c920$6e8944c6@telemuse.net> <1166832054.9009.171.camel@officepc-junliu> Message-ID: On Dec 23, 2006, at 8:01 AM, Matt Mathis wrote: > ICMP has been dead as a measurement protocol for about 10 years > now. The problem is that nearly all implementations process ICMP at > substantially lower priority than other protocols, so the > measurements are far worse than reality. > > I think you are looking for something more along the lines of IPMP, > the IP measurement protocol. Look for the expired Internet drafts: > draft-bennett-ippm-ipmp-01 2003-03-05 Expired > draft-mcgregor-ipmp-04 2004-02-04 Expired http://tools.ietf.org/html/draft-bennett-ippm-ipmp http://tools.ietf.org/html/draft-mcgregor-ipmp > There is also a report by several people including Fred Baker and > me, analyzing these two conflicting drafts, and proposing yet > another variant. I couldn't find the report quickly. Perhaps Fred > has a copy.....? > > If you want to follow this thread, be sure to engage the router > vendors/large ISP's early and listen to them carefully, because the > academic and industrial agendas clash very badly. (You should > read the report first.) > > Thanks, > --MM-- > ------------------------------------------- > Matt Mathis http://www.psc.edu/~mathis > Work:412.268.3319 Home/Cell:412.654.7529 > ------------------------------------------- Is this what you're thinking of? Let me reiterate your point - if you want features in routers and switches that will help you be able to determine what is happening in various networks along the way between here and there, you have two avenues. One is that you can measure externally and make inferences about the total end to end path that may not tell you much about any specific point. The other is that you can know specifics of the path and perhaps individual nodes on the path by asking them questions. If you want supporting features to be available to you in the routers, convince the ISPs that *they* want them. Reason: they will have to turn them on, and they will have to allow you access, and it will be their dollars that convince the vendors to build them. So think hard about how these will help the ISPs do what they do. Note I am not in saying this throwing cold water on it. The ISPs are in fact looking for ways to deliver SLAs that involve multiple ISPs on an end to end path - that's part of the ITU NGN effort. Help them solve that problem and you might get a fair bit of interest. 
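One cheap way to see Matt's point about ICMP being handled off the fast path is to time something that routers and end hosts treat as ordinary traffic, such as a TCP handshake, and compare it with ping to the same place. The sketch below is an illustration only: the target host and port are placeholders, it measures the SYN/SYN-ACK exchange plus end-host processing, and it says nothing about where along the path any difference arises.

import socket
import time

def tcp_handshake_rtt(host, port=80, samples=5, timeout=2.0):
    # Time TCP connection setup as an application-path RTT sample.
    rtts = []
    for _ in range(samples):
        t0 = time.perf_counter()
        try:
            s = socket.create_connection((host, port), timeout=timeout)
            rtts.append(time.perf_counter() - t0)
            s.close()
        except OSError:
            pass                     # timeout or refusal; just skip the sample
        time.sleep(0.2)
    return rtts

if __name__ == "__main__":
    r = tcp_handshake_rtt("www.example.net")   # placeholder target
    if r:
        print("TCP handshake RTT, min %.1f ms, median %.1f ms"
              % (min(r) * 1e3, sorted(r)[len(r) // 2] * 1e3))

If the numbers ping reports wander far above figures like these, that is consistent with Matt's observation that the echo handling, not the forwarding path, is adding the delay.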
At the same time, if your solution is "interesting research" but doesn't help the ISPs solve a problem they want solved, expect results to be spotty at best. Begin forwarded message: > From: Mark Allman > Date: September 7, 2004 6:33:19 AM PDT > To: imrg-ipmp-review at guns.icir.org > Subject: Mark Allman: [IMRG] ipmp review team report > Reply-To: mallman at icir.org > > *** PGP SIGNATURE VERIFICATION *** > > *** Status: Good Signature > *** Signer: Mark Allman (0xCE3222CE) > *** Signed: 09/07/04 6:33:19 AM > *** Verified: 09/07/04 7:13:11 AM > *** BEGIN PGP VERIFIED MESSAGE *** > > > Folks- > > FYI, here is what I sent to the IMRG mailing list (for those who do > not > track it). > > Thanks again for all your hard work! > > allman > > > > ------- Forwarded Message > > To: imrg at irtf.org > From: Mark Allman > Organization: ICSI Center for Internet Research (ICIR) > Song-of-the-Day: Paradise City > Date: Tue, 07 Sep 2004 09:03:02 -0400 > Subject: [IMRG] ipmp review team report > > > Folks- > > A while back you might remember that we had some discussion on the IP > Measurement Protocol. In order to try to gain some traction I asked a > few folks to act as a review team to look over the two proposals on > the > table (and the ancillary information). The team has completed their > work and did a very nice job of debating the issues involved in > IPMP and > coming up with a summary of their feelings. > > The team members are listed at the bottom of the report and I wish to > thank them for their diligence in reviewing these documents. > > The report from the group is below. Please feel free to discuss the > ideas enumerated in the report on this mailing list all you want. The > team is not the final word. I convened the team to get some focued > energy thinking about these issues. The report is in no way > binding nor > the community's final judgement. So, please feel very free to > continue > the discussion. > > allman > > > > > > IMRG IPMP Review Team Report > - ---------------------------- > > The Internet Measurement Research Group (within the IRTF) convened a > small team to review the materials related to the IP Measurement > Protocol (IPMP). The members of the group (listed at the end of > this report) discussed IPMP and several larger issues. In > particular, the team reviewed the following two Internet-Drafts: > > draft-mcgregor-ipmp-03.txt > draft-bennett-ippm-ipmp-01.txt > > The goal of this effort was to chart a strawman course for moving > forward with some sort of measurement protocol (if possible). > > Note: This message represents the group's consensus. However, that > does not mean that each member of the team agrees with each point in > this note. The group reached rough agreement, not unaminity. > > The following are the high-order bits from the discussion. > > The fundamental challenge that measurement protocols attempt to > address is to provide a means to measure the network characteristics > researchers and operators want to understand in a way that provides > fine grained information about the network in a lightweight fashion. 
> To this end, we would suggest that IPMP wants to develop tools that > are: > > - implementable in reasonable timeframes on existing equipment, > which means that they should not depend on ASIC development or > new equipment purchase > > - deployable; ISPs would ideally want them, and at minimum not > turn them off > > - useful to the ISPs in terms of their business rules and the > questions they ask about their own networks > > If the procedures or protocols are useful to the ISPs, one can > expect that they will be willing to collect the data, and may under > some appropriate rules also allow researchers to collect data or > share collected data with researchers. > > In the above context, the team found the motivation for IPMP given > in both documents to be lacking --- to the point where the team did > not feel the current proposals are viable. Several > related/supporting points were discussed: > > * From the perspective of a vendor developing equipment and > protocols or an ISP deploying them, the IPMP proposals on the > table do not look viable. The fundamental goal of IPMP is to > display the structure of a network and many of its fine-scale > characteristics. This is information that a service provider > does not share with anyone else except - maybe - under > NDA. Given that the protocols to obtain the information are > fairly complex and involve a fair level of memory writes, the > vendor will do this if and only if its ISP customers ask for it, > and they are not asking for this. > > * Making a better ping or traceroute is, on the one hand, too > narrow and mechanistic a focus and yet also too focused on what > researchers might find compelling rather than what operators > would. > > * A tool to reverse engineer a network isn't needed by the ISPs. > They already know the structure of their own networks. > > That said, the team **strongly** believes that there is much room > for improvement in the state of network troubleshooting and > debugging. In particular: > > * Some service providers are asking for a solution to a problem > that may yield data that researchers may find valuable. Within > its own network, a service provider is generally interested in > locating the links that introduce variability into their > network. It may view them as under-provisioned for offered > load, as inappropriately routed, or whatever, but they are in > fact interested in locating links that require upgrading in some > form. > > * Some service providers are asking (in TIA and related fora) how > they can deploy SLAs that cross ISP boundaries. These may be > among ISPs that form business coalitions, such as Teleglobe has > tried to set up with its transit network customers, or among > regional networks such as US RBOCs that view transitive SLAs as > a rational approach. The watchword in such consortia is "trust > but verify"; it is in their interest to have a procedure or > protocol that will allow them to isolate issues that may prevent > them from meeting SLA guarantees in something resembling real > time. Since those SLAs are one-way, this means accurate one-way > delay and jitter measurements host to host, POP to POP, or CPE > to CPE. > > In addition, in looking at the protocols themselves, we found > ourselves wondering how much could be learned by clever inference from > fairly simple data collection and black box measurement, as opposed > to explicit reporting of values. 
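The "simple data collection" mentioned above can already cover the one-way delay and jitter numbers the SLA discussion asks for. As a hedged sketch, independent of IPMP and of any product: if probe packets carry a sender timestamp, the receiver can track relative one-way delay and a smoothed delay-variation estimate in the style of the RTP (RFC 3550) interarrival jitter calculation, without synchronized clocks, so long as only changes in delay matter.

def jitter_stream(probes):
    """probes: (send_ts, recv_ts) pairs in seconds, in arrival order.
    A fixed clock offset between the two hosts cancels out of the differences."""
    jitter = 0.0
    prev_transit = None
    for send_ts, recv_ts in probes:
        transit = recv_ts - send_ts            # one-way delay plus fixed skew
        if prev_transit is not None:
            d = abs(transit - prev_transit)    # delay variation between probes
            jitter += (d - jitter) / 16.0      # RFC 3550-style smoothing
        prev_transit = transit
        yield transit, jitter

# toy input: constant 50 ms transit, with one probe delayed an extra 30 ms
probes = [(t, t + 0.050 + (0.030 if t == 3 else 0.0)) for t in range(8)]
for transit, j in jitter_stream(probes):
    print("transit %.3f s   jitter %.4f s" % (transit, j))

Absolute one-way delay, as opposed to its variation, still needs a common time reference at both ends, which is exactly the part a measurement protocol or a stable counter inside the routers would have to supply.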
> > As another example, we note that the intention of such procedures as > CalTech's FAST and MIT's XCP protocols is to detect and measure > variable delays in the network and cause traffic to be sent in such > a way as to maximize throughput while minimizing such delays. This > fundamental question is a direct corollary to that raised in > http://www.nwfusion.com/research/2002/1216isptestside1.html, and > that raised in the context of transitive Tier 2 network SLAs. These > would like to be able to identify the existence of an SLA failure or > other disturbance in the Force on a route, report its magnitude, and > isolate the disturbing device. To that end, we wonder can be done > with the numbers measured by Dina Katabi's XCP protocol. > > Finally, the team wondered if a protocol that carries less global > information but more precision would be more deployable. For > example if the stamps just consisted of an opaque ID, TTL and simple > 32 bit counter running on "the most stable local frequency source", > then the ISP (w/ the engineering documentation for their own gear) > can use database techniques to compute everything carried by the > current protocol. The stamps are simple enough where we can, with a > straight face, ask for them in multiple places within one box: input > and output framers, bus DMA engines, etc. We can envision that this > would be an extremely valuable tool for an ISP to understand (and > diagnose) certain QoS properties of their own network. Note that, > globally parsable metadata in the stamps probably has negative value > to most ISPs because it reduces an ISP's ability to keep it's assets > private. The barrier to deployment in not so much the cost of the > implementation, but the indirect cost of the leaking proprietary > topology information. > > At the same time, external researchers could use inference > techniques to get some of the same information, including most > dynamic properties such as queue depths etc. The external users get > much less topology information, unless they make an explicit > arrangement with the ISP to get the annotations associated with the > opaque IDs. > > In summary, the team came to two points of consensus: 1) that the > protocol is inadequately motivated by the proposals, even though > ISPs would like to be able to measure their and their neighbors' > networks; 2) that the protocol's complexity and intrusiveness are > inadequately justified with respect to other, potentially more > lightweight approaches that may be easier to deploy. The main point > is that to get a protocol deployed, ISPs need to ask for it loudly > enough and router vendors need to be able to implement it easily > enough, and neither is argued by these proposals. > > Review team members: Guy Almes (Internet2), Fred Baker (Cisco), Paul > Barford (UWisc), Chistophe Diot (Intel Research), Ralph Droms > (Cisco), Larry Dunn (Cisco), Matt Mathis (PSC), David Moore > (CAIDA), Jennifer Rexford (AT&T Research), Neil Spring (Univ. of > Washington) > Scribe / team shepherd: Mark Allman (ICIR) > > ------- End of Forwarded Message > > > > > > > *** END PGP VERIFIED MESSAGE *** From detlef.bosau at web.de Tue Dec 26 08:39:36 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Tue, 26 Dec 2006 17:39:36 +0100 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? 
In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B0650A3549@VGAEXCH01.hq.corp.viasat.com> References: <458DA380.1070207@web.de> <458F04C8.30100@web.de> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3547@VGAEXCH01.hq.corp.viasat.com> <0B0A20D0B3ECD742AA2514C8DDA3B0650A3549@VGAEXCH01.hq.corp.viasat.com> Message-ID: <45915048.2030908@web.de> Agarwal, Anil wrote: > Detlef, > > In my earlier description, I had incorrectly assumed that link 2-3 was > at 10 Mbps. The nature of the problem is similar whether link 2-3 is > at 10 Mbps or 100 Mbps. Admittedly, I didn?t understand it yesterday... However, eventually you say: > Changing bandwidths a bit or introducing real-life factors such as > propagation delays, variable processing delays and/or variable > Ethernet switch delays will probably break this synchronized > relationship. RED will also help. > In fact, the behaviour disappears when I randomize the delays. > One can construct many other similar scenarios, where one connection > is selectively favored over another. Perhaps, one more reason to use RED. > I?m not quite sure about the relationship to RED here. (In fact, I still have no personal opinion to RED, however I have numerous questions to RED, but I think that?s not the question here.) What I try to understand is basically, whether the observed behaviour is an artifact or not. It might be as well a behaviour which happens only under rare circumstances, think of the capture effect in Ethernet. In consequence, my basic doubt aganist all kind of *mulation (simulatin, emulation etc.) rises again. I personally make no difference between simulation and emulation. To my knowledge, emulation is used synonymously for "real time simulation" and as such is prone for the same artifacts and errors for any other kind of simulation. Particularly the synchronicity between the two links 1-2 and 3-4 is basically artificial. Even with quartz-controlled timers I severely doubt hat two NICs will ever run perfeclty synchronous - and this is not even necessary as long as data sent by one NIC is read error free by the other. Like all other kinds of *mulation the NS2 is nothing than a set of difference equations put in "some strange form". It thus represents our hopes, fantasy and religious beliefs. Unfortunately, reality doesn?t care about any of them ;-) Hover, I did not consider this scenario by pure chance. The question behind this scenario is a very precise one. Let?s draw a network, somewhat more simple this time. Sender(i) ------------(some network)-----------Splitter------(some network)---------Receiver Sender(i) denotes several senders. Now, a stone aged question of mine arises: Can it be guaranteed that there is no overload, neither on the network before the splitter nor on the network behind it? In some off list discussion last year, Mark Allman pointed out that overload on both network paths (before and behind the splitter) is prevented by TCP congestion control and in split connections the splitter prevents overload by TCP flow control. E.g.: Consider one sender and the path before the splitter a 100 Mbps path, the path behind the splitter a 10 Mbps path. In fact (and that?s why I added an experimental flow control to TCP in my simulation), when the buffer at the splitter is sufficiently large after some settling time the window on the sender is correctly curtailed that way that the sender achieves an average rate of 10 Mbps. 
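The settling behaviour described here can be sketched without a full simulator. Below is a rough round-based model, not ns-2 and not any particular PEP product, with invented numbers: the splitter owns a finite buffer that drains at the 10 Mbps tail rate and advertises its free space as the receive window, so a sender on the 100 Mbps head path ends up clamped by flow control to about one tail-rate-worth of data per round trip, i.e. roughly 10 Mbps, without any loss.

# Round-based sketch of receive-window backpressure at a split connection.
# Illustrative numbers: 100 Mbps head path, 10 Mbps tail path, 10 ms RTT.
MSS       = 1500 * 8           # bits per segment
BUF       = 64 * MSS           # splitter buffer size, in bits
TAIL_RATE = 10e6               # bits/s drained towards the client
RTT       = 0.01               # seconds, sender <-> splitter

buffered = 0.0                 # bits currently held at the splitter
cwnd     = 2 * MSS             # sender congestion window, in bits

for rnd in range(40):
    rwnd = BUF - buffered                   # advertised window = free buffer space
    sent = min(cwnd, rwnd)                  # sender limited by min(cwnd, rwnd)
    buffered += sent
    buffered = max(0.0, buffered - TAIL_RATE * RTT)   # tail drains for one RTT
    cwnd += MSS                             # keep probing; no loss in this toy
    if rnd % 5 == 0:
        print("round %2d   send rate ~ %4.1f Mbps   rwnd %7.0f bits"
              % (rnd, sent / RTT / 1e6, rwnd))

With a sufficiently large buffer nothing is ever dropped in this toy, which matches the observation that flow control alone can do the clamping; how well that carries over to many senders with very different head-path bandwidths is exactly the question raised below.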
NB: I think I have to re-read the thesis by Rajiv Chakaravorthy on this issue because it?s the question whether we need window clamping techniques. It seems that any necessary clamping can be achieved by the existing flow control and congestion control mechanisms of TCP. Now, if there is an array of senders, denoted as sender(i), we have basically three scenarios. 1.: The common bottleneck is _before_ the splitter. Perfect. The splitter is fed slower than it is served. We?re lucky. 2.: The common bottleneck is _behind_ the splitter. Perfect. The splitter assigns an equal share of bandwidth to each flow and throttles the senders by means of flow control if necessary. We are lucky again. (I don?t know why we need Christmas in the presence of so much luck.) 3.: There is no "common bottleneck". I have to explain this because at a first glance, this appears to be nonsense: Either the path head, i.e. the part before the splitter, or the path tail, i.e. the part behind the splitter, should be the common bottleneck if there is one. Consider the path tail running with 10 Mbps. Consider sender(0) being capable of sending at 100 Mbps. Consider sender(1) being capable of sending at 2 Mbps. If sender(0) runs alone, the bottleneck is the path tail. If sender(1) runs alone, the bottleneck is the path head. Consider both senders are running in parallel. What will happen in the presence of a splitter? Again we have two cases. 1.: The path tail runs TCP or some other window controlled protocol which maintains available ressource. Hopefully, the flow from sender(1) will achieve something about 2 Mbps and the flow from sender(0) will get the rest. I don?t know. I?m actually trying to find out. 2.: The path tail runs some rate controlled protocol as it would make sense e.g. in satellite connections where the startup behaviour of TCP is extremely annoying. Now: How will this protocol distrute the availabe ressources among the flows? One could give equal shares to them, i.e. 5 Mbps to each. However, because the flow from sender(1) cannot send faster than 2 Mbps, 3 Mbps would remain unused. Particularly the latter scenario seems to be some kind of "end to end" problem: The splitter node does not know the end to end ressource situation and thus have to leave the distribution of ressources to the end nodes. Do I go wrong here? Any comments are highly appreciated. From Anil.Agarwal at viasat.com Wed Dec 27 07:55:49 2006 From: Anil.Agarwal at viasat.com (Agarwal, Anil) Date: Wed, 27 Dec 2006 10:55:49 -0500 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? Message-ID: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> Detlef, You wrote > > One can construct many other similar scenarios, where one > connection > > is selectively favored over another. Perhaps, one more > reason to use RED. > > > I?m not quite sure about the relationship to RED here. (In > fact, I still have no personal opinion to RED, however I have > numerous questions to RED, but I think that?s not the question here.) RED will probabilistically discard packets before the queue gets full, resulting in packet discards for both connections. RED will be biased against the larger connection. Eventually, both TCP connections will reach the same (statistical) rate and experience the same packet drop rate. > > Again we have two cases. > > 1.: The path tail runs TCP or some other window controlled protocol > which maintains available ressource. 
Hopefully, the flow from > sender(1) > will achieve something about 2 Mbps and the flow from > sender(0) will get > the rest. I don?t know. I?m actually trying to find out. > > 2.: The path tail runs some rate controlled protocol as it would make > sense e.g. in satellite connections where the startup > behaviour of TCP > is extremely annoying. Now: How will this protocol distrute > the availabe > ressources among the flows? One could give equal shares to > them, i.e. 5 > Mbps to each. > However, because the flow from sender(1) cannot send faster > than 2 Mbps, > 3 Mbps would remain unused. > > Particularly the latter scenario seems to be some kind of > "end to end" > problem: The splitter node does not know the end to end ressource > situation and thus have to leave the distribution of > ressources to the > end nodes. > > Do I go wrong here? A "good" TCP-splitter should produce correct (desired) results in this scenario and many others. It should produce correct results with 2 or 200 connections, with 2 or 200 different network segments and bottlenecks, some before and some after the TCP-splitter; it should produce correct results when the amount of bandwidth available over the various network segments (especially over the satellite network segment) is variable and not known a priori; it should produce correct results when there is cross traffic on the bottleneck links in the network segments, which does not traverse the TCP-splitter. A TCP-splitter that "splits" bandwidth "equally" among various connections will not make the cut. Hint: TCP by itself does a commendable, although not perfect, job of meeting the above scenarios. I don't know how well other commercial TCP-splitters perform in these scenarios, but I can speak for one - we use our (my) own TCP PEP product over our VSAT networks; it works fairly well over many such scenarios (not quite 200 network segments :)). I tried out a few more scenarios, including yours, since your last email. Regards, Anil From detlef.bosau at web.de Wed Dec 27 09:47:52 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Wed, 27 Dec 2006 18:47:52 +0100 Subject: [e2e] How shall we deal with servers with different bandwidthsand a common bottleneck to the client? In-Reply-To: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> References: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> Message-ID: <4592B1C8.5040601@web.de> Agarwal, Anil wrote: >> >> I?m not quite sure about the relationship to RED here. (In >> fact, I still have no personal opinion to RED, however I have >> numerous questions to RED, but I think that?s not the question here.) >> > > RED will probabilistically discard packets before the queue gets full, resulting in packet discards for both connections. RED will be biased against the larger connection. Eventually, both TCP connections will reach the same (statistical) rate and experience the same packet drop rate. > However, the latter is no consequence of RED but the basic goal of AIMD. > >> Again we have two cases. >> >> 1.: The path tail runs TCP or some other window controlled protocol >> which maintains available ressource. Hopefully, the flow from >> sender(1) >> will achieve something about 2 Mbps and the flow from >> sender(0) will get >> the rest. I don?t know. I?m actually trying to find out. >> >> 2.: The path tail runs some rate controlled protocol as it would make >> sense e.g. in satellite connections where the startup >> behaviour of TCP >> is extremely annoying. 
Now: How will this protocol distrute >> the availabe >> ressources among the flows? One could give equal shares to >> them, i.e. 5 >> Mbps to each. >> However, because the flow from sender(1) cannot send faster >> than 2 Mbps, >> 3 Mbps would remain unused. >> >> Particularly the latter scenario seems to be some kind of >> "end to end" >> problem: The splitter node does not know the end to end ressource >> situation and thus have to leave the distribution of >> ressources to the >> end nodes. >> >> Do I go wrong here? >> > > A "good" TCP-splitter should produce correct (desired) results in this scenario and many others. It should produce correct results with 2 or 200 connections, with 2 or 200 different network segments and bottlenecks, some before and some after the TCP-splitter; it should produce correct results when the amount of bandwidth available over the various network segments (especially over the satellite network segment) is variable and not known a priori; it should produce correct results when there is cross traffic on the bottleneck links in the network segments, which does not traverse the TCP-splitter. > It should. However I find it not easy to understand how this is achieved in each individual case. I even do not know that much literature on this issue. I know the works of Rajiv Chakravorthy and Rami Mukhtar, but I?m not aware of more works in this area. If you know some additional work, I would appreciate any hint. > A TCP-splitter that "splits" bandwidth "equally" among various connections will not make the cut. > > Hint: TCP by itself does a commendable, although not perfect, job of meeting the above scenarios. > > Yes, of course. However, introducing a splitter into a path has consequences for the end-to-end semantics of ACK ackets and thus for TCP selfclocking, the RTT, the path capacity as seen by the sender. So, if TCP does a good job in the absence of a splitter it is not clear by itself, that id did a good job in the presence of a splitter ;-) > I don't know how well other commercial TCP-splitters perform in these scenarios, but I can speak for one - we use our (my) own TCP PEP product over our VSAT networks; it works fairly well over many such scenarios (not quite 200 network segments :)). I tried out a few more scenarios, including yours, since your last email. > > Do you have any literature on this one? It?s sometimes a pity that commercial products are often poorly documented. This might be understood from a commercial point of view where it is always a concern to get an advantage about possible competitors. But it?s quite insatisfactory for a scientific discussion. In addition: I?m generally quite reluctant towards simulation. I think it?s always better to understand why a middlebox/splitter does or does not behave in a certain way. I often read papers where it is not quite clear why certain scenarios are chosen for simulations and other ones are left out. However, simulation can help to understand a certain behaviour. So, what I?m mostly interested in is the rationale behind the statement that a certain scenario will work. Detlef From detlef.bosau at web.de Thu Dec 28 11:10:32 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Thu, 28 Dec 2006 20:10:32 +0100 Subject: [e2e] Commercial splitters and end to end issues. 
In-Reply-To: <4592B1C8.5040601@web.de> References: <0B0A20D0B3ECD742AA2514C8DDA3B06517CAD4@VGAEXCH01.hq.corp.viasat.com> <4592B1C8.5040601@web.de> Message-ID: <459416A8.6020005@web.de> In our recent discussion it was said: > >> I don't know how well other commercial TCP-splitters perform in these >> scenarios, but I can speak for one - we use our (my) own TCP PEP >> product over our VSAT networks; it works fairly well over many such >> scenarios (not quite 200 network segments :)). I tried out a few more >> scenarios, including yours, since your last email. >> Admittedly, I?m somewhat disappointed. It is well possible that my scenario was nonsense. And it is well possible that my question was not clear. However, I always enjoy discussions where I can learn from. And now, I?m given the hint "our commercial product does this". It is not clear whether this is the case or not. Neither do I have a descriptions of the mechanisms in use. (I did not find any at the ViaSat Homepage. Only the typical whitepaper material.) Yesterday, I thought I had written a paper on this issue - and it were rejected with the simple statement: "Our commercial product does this." With no further description or hint. Although it might be no big deal to set up a simulation for connection splitting, I spent some work on it. And my questions might be stupid - at least theire are honest. And when I read the comment, a commercial product did this all without further hint, this made me sad and it made me angry. Particularly as I know similar comments from paper rejects. Perhaps, it?s my personal problem that I cannot deal with situations like this very well. Detlef From detlef.bosau at web.de Sun Dec 31 11:15:44 2006 From: detlef.bosau at web.de (Detlef Bosau) Date: Sun, 31 Dec 2006 20:15:44 +0100 Subject: [e2e] Are we doing sliding window in the Internet? Message-ID: <45980C60.9020405@web.de> Happy New Year, Miss Sophy My Dear! (Although this sketch is in Englisch, it is hardly known outside Germay to my knowledge.) I wonder whether we?re really doing sliding window in TCP connections all the time or whether a number of connections have congestion windows of only one segment, i.e. behave like stop?n wait in reality. When I assume an Ethernet like MTU, i.e. 1500 byte = 12000 bit, and 10 ms RTT the throughput is roughly 12000 bit / 10 ms = 1.2 Mbps. From this I would expect that in quite a few cases a TCP connection will have a congestion window of 1 MSS or even less. In addition, some weeks ago I read a paper, I don?t remember were, that we should reconsider and perhaps resize our MTUs to larger values for networks with large bandwidth. The rationale was simply as follows: The MTU size is always a tradeoff between overhead and jitter. From Ethernet we know that we can accept a maximum packet duration of 12000 bit / (10 Mbps) = 1.2 ms and the resultig jitter. For Gigabit Ethernet a maximum packet duration of 1.2 ms would result in a MTU size of 1500 kbyte = 1.5 Mbyte. If so, we would see "stop?n wait like" connections much more frequently than today. Is this view correct? From DMedhi at umkc.edu Sun Dec 31 14:50:59 2006 From: DMedhi at umkc.edu (Medhi, Deep) Date: Sun, 31 Dec 2006 16:50:59 -0600 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45980C60.9020405@web.de> Message-ID: <032EC4F75A527A4FA58C5B1B5DECFBB301F249E6@KC-MSX1.kc.umkc.edu> See John Heidemann, Katia Obraczka, and Joe Touch. "Modeling the Performance of HTTP Over Several Transport Protocols." ACM/IEEE Transactions on Networking, vol. 
5, pp. 616-630, October, 1997. This covers maximum usable window size for different transmission media. -- Deep > -----Original Message----- > From: end2end-interest-bounces at postel.org > [mailto:end2end-interest-bounces at postel.org] On Behalf Of Detlef Bosau > Sent: Sunday, December 31, 2006 1:16 PM > To: end2end-interest at postel.org > Cc: Daniel Minder; frank.duerr > Subject: [e2e] Are we doing sliding window in the Internet? > > Happy New Year, Miss Sophy My Dear! > > (Although this sketch is in Englisch, it is hardly known > outside Germay to my knowledge.) > > I wonder whether we?re really doing sliding window in TCP > connections all the time or whether a number of connections > have congestion windows of only one segment, i.e. behave like > stop?n wait in reality. > > When I assume an Ethernet like MTU, i.e. 1500 byte = 12000 > bit, and 10 ms RTT the throughput is roughly 12000 bit / 10 > ms = 1.2 Mbps. > > From this I would expect that in quite a few cases a TCP > connection will have a congestion window of 1 MSS or even less. > > In addition, some weeks ago I read a paper, I don?t remember > were, that we should reconsider and perhaps resize our MTUs > to larger values for networks with large bandwidth. The > rationale was simply as follows: The MTU size is always a > tradeoff between overhead and jitter. From Ethernet we know > that we can accept a maximum packet duration of 12000 bit / (10 > Mbps) = 1.2 ms and the resultig jitter. For Gigabit Ethernet > a maximum packet duration of 1.2 ms would result in a MTU > size of 1500 kbyte = 1.5 Mbyte. > > If so, we would see "stop?n wait like" connections much more > frequently than today. > > Is this view correct? > > > > From fred at cisco.com Sun Dec 31 16:29:00 2006 From: fred at cisco.com (Fred Baker) Date: Sun, 31 Dec 2006 16:29:00 -0800 Subject: [e2e] Are we doing sliding window in the Internet? In-Reply-To: <45980C60.9020405@web.de> References: <45980C60.9020405@web.de> Message-ID: <2C63D9E0-9738-44A9-8A7F-C59D36276EF4@cisco.com> yes and no. A large percentage of sessions are very short - count the bytes in this email and consider how many TCP segments are required to carry it, for example, or look through your web cache to see the sizes of objects it stores. We are doing the sliding window algorithm, but it cuts very short when the TCP session abruptly closes. For longer exchanges - p2p and many others - yes, we indeed do sliding window. I don't see any reason to believe that TCPs tune themselves to have exactly RTT/MSS segments outstanding. That would be the optimal number to have ourstanding, but generally they will have the smallest of { the offered window, the sender's maximum window, and the used window at which they start dropping traffic }. If they never see loss, they can keep an incredibly large amount of data outstanding regardless of the values of RTT and MSS. I wonder where you got the notion that a typical session had a 10 ms RTT. In a LAN environment where the servers are in the same building, that is probably the case. 
But consider these rather more typical examples: across my VPN to a machine at work, across the US to MIT, and across the Atlantic to you: [stealth-10-32-244-218:~] fred% traceroute irp-view7 traceroute to irp-view7.cisco.com (171.70.65.144), 64 hops max, 40 byte packets 1 fred-vpn (10.32.244.217) 1.486 ms 1.047 ms 1.034 ms 2 n003-000-000-000.static.ge.com (3.7.12.1) 22.360 ms 20.962 ms 22.194 ms 3 10.34.251.137 (10.34.251.137) 23.559 ms 22.586 ms 22.236 ms 4 sjc20-a5-gw2 (10.34.250.78) 21.465 ms 22.544 ms 20.748 ms 5 sjc20-sbb5-gw1 (128.107.180.105) 22.294 ms 22.351 ms 22.803 ms 6 sjc20-rbb-gw5 (128.107.180.22) 21.583 ms 22.517 ms 24.190 ms 7 sjc12-rbb-gw4 (128.107.180.2) 22.115 ms 23.143 ms 21.478 ms 8 sjc5-sbb4-gw1 (171.71.241.253) 26.550 ms 23.122 ms 21.569 ms 9 sjc12-dc5-gw2 (171.71.241.66) 22.115 ms 22.435 ms 22.185 ms 10 sjc5-dc3-gw2 (171.71.243.46) 22.031 ms 21.846 ms 22.185 ms 11 irp-view7 (171.70.65.144) 22.760 ms 22.912 ms 21.941 ms [stealth-10-32-244-218:~] fred% traceroute www.mit.edu traceroute to www.mit.edu (18.7.22.83), 64 hops max, 40 byte packets 1 fred-vpn (10.32.244.217) 1.468 ms 1.108 ms 1.083 ms 2 172.16.16.1 (172.16.16.1) 11.994 ms 10.351 ms 10.858 ms 3 cbshost-68-111-47-251.sbcox.net (68.111.47.251) 9.238 ms 19.517 ms 9.857 ms 4 12.125.98.101 (12.125.98.101) 11.849 ms 11.913 ms 12.086 ms 5 gbr1-p100.la2ca.ip.att.net (12.123.28.130) 12.348 ms 11.736 ms 12.891 ms 6 tbr2-p013502.la2ca.ip.att.net (12.122.11.145) 15.071 ms 13.462 ms 13.453 ms 7 12.127.3.221 (12.127.3.221) 12.643 ms 13.761 ms 14.345 ms 8 br1-a3110s9.attga.ip.att.net (192.205.33.230) 13.842 ms 12.414 ms 12.647 ms 9 ae-32-54.ebr2.losangeles1.level3.net (4.68.102.126) 16.651 ms ae-32-56.ebr2.losangeles1.level3.net (4.68.102.190) 20.154 ms * 10 * * * 11 ae-2.ebr1.sanjose1.level3.net (4.69.132.9) 28.222 ms 24.319 ms ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2) 35.417 ms 12 ae-1-100.ebr2.sanjose1.level3.net (4.69.132.2) 25.640 ms 22.567 ms * 13 ae-3.ebr1.denver1.level3.net (4.69.132.58) 52.275 ms 60.821 ms 54.384 ms 14 ae-3.ebr1.chicago1.level3.net (4.69.132.62) 68.285 ms ae-1-100.ebr2.denver1.level3.net (4.69.132.38) 59.113 ms 68.779 ms 15 * * * 16 * ae-7-7.car1.boston1.level3.net (4.69.132.241) 94.977 ms * 17 ae-7-7.car1.boston1.level3.net (4.69.132.241) 95.821 ms ae-11-11.car2.boston1.level3.net (4.69.132.246) 93.856 ms ae-7-7.car1.boston1.level3.net (4.69.132.241) 96.735 ms 18 ae-11-11.car2.boston1.level3.net (4.69.132.246) 91.093 ms 92.125 ms 4.79.2.2 (4.79.2.2) 95.802 ms 19 4.79.2.2 (4.79.2.2) 93.945 ms 95.336 ms 97.301 ms 20 w92-rtr-1-backbone.mit.edu (18.168.0.25) 98.246 ms www.mit.edu (18.7.22.83) 93.657 ms w92-rtr-1-backbone.mit.edu (18.168.0.25) 92.610 ms [stealth-10-32-244-218:~] fred% traceroute web.de traceroute to web.de (217.72.195.42), 64 hops max, 40 byte packets 1 fred-vpn (10.32.244.217) 1.482 ms 1.078 ms 1.093 ms 2 172.16.16.1 (172.16.16.1) 12.131 ms 9.318 ms 8.140 ms 3 cbshost-68-111-47-251.sbcox.net (68.111.47.251) 10.790 ms 9.051 ms 10.564 ms 4 12.125.98.101 (12.125.98.101) 13.580 ms 21.643 ms 12.206 ms 5 gbr2-p100.la2ca.ip.att.net (12.123.28.134) 12.446 ms 12.914 ms 12.006 ms 6 tbr2-p013602.la2ca.ip.att.net (12.122.11.149) 13.463 ms 12.711 ms 12.187 ms 7 12.127.3.213 (12.127.3.213) 185.324 ms 11.845 ms 12.189 ms 8 192.205.33.226 (192.205.33.226) 12.008 ms 11.665 ms 25.390 ms 9 ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65) 13.695 ms ae-1-51.bbr1.losangeles1.level3.net (4.68.102.1) 11.645 ms ae-1-53.bbr1.losangeles1.level3.net (4.68.102.65) 12.517 ms 10 
ae-1-0.bbr1.frankfurt1.level3.net (212.187.128.30) 171.886 ms as-2-0.bbr2.frankfurt1.level3.net (4.68.128.169) 167.640 ms 168.895 ms 11 ge-10-0.ipcolo1.frankfurt1.level3.net (4.68.118.9) 170.336 ms ge-11-1.ipcolo1.frankfurt1.level3.net (4.68.118.105) 174.211 ms ge-10-1.ipcolo1.frankfurt1.level3.net (4.68.118.73) 169.730 ms 12 gw-megaspace.frankfurt.eu.level3.net (212.162.44.158) 169.276 ms 170.110 ms 168.099 ms 13 te-2-3.gw-backbone-d.bs.ka.schlund.net (212.227.120.17) 171.412 ms 171.820 ms 170.265 ms 14 a0kac2.gw-distwe-a.bs.ka.schlund.net (212.227.121.218) 175.416 ms 173.653 ms 174.007 ms 15 ha-42.web.de (217.72.195.42) 174.908 ms 174.921 ms 175.821 ms On Dec 31, 2006, at 11:15 AM, Detlef Bosau wrote: > Happy New Year, Miss Sophy My Dear! > > (Although this sketch is in Englisch, it is hardly known outside > Germay to my knowledge.) > > I wonder whether we?re really doing sliding window in TCP > connections all the time or whether a number of connections have > congestion windows of only one segment, i.e. behave like stop?n > wait in reality. > > When I assume an Ethernet like MTU, i.e. 1500 byte = 12000 bit, > and 10 ms RTT the throughput is roughly 12000 bit / 10 ms = 1.2 Mbps. > > From this I would expect that in quite a few cases a TCP connection > will have a congestion window of 1 MSS or even less. > > In addition, some weeks ago I read a paper, I don?t remember were, > that we should reconsider and perhaps resize our MTUs to larger > values for networks with large bandwidth. The rationale was simply > as follows: The MTU size is always a tradeoff between overhead and > jitter. From Ethernet we know that we can accept a maximum packet > duration of 12000 bit / (10 Mbps) = 1.2 ms and the resultig > jitter. For Gigabit Ethernet > a maximum packet duration of 1.2 ms would result in a MTU size of > 1500 kbyte = 1.5 Mbyte. > > If so, we would see "stop?n wait like" connections much more > frequently than today. > > Is this view correct? >
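Putting numbers on both halves of this exchange: the sketch below computes the window needed to keep a path full (bandwidth x RTT / MSS) and the rate a one-segment, stop-and-wait style connection would reach, for Detlef's 10 ms example and for RTTs of roughly the size measured in the traceroutes above (about 22 ms, 95 ms and 175 ms). It also redoes the jumbo-MTU arithmetic: holding the 1.2 ms Ethernet serialization time constant at 1 Gbps gives about 1.2 Mbit, i.e. roughly 150 kbyte per frame. Everything here is plain arithmetic on figures already quoted in the thread; the 10 Mbps example rate is an arbitrary choice.

# Back-of-the-envelope numbers for the "sliding window or stop-and-wait" question.
MSS_BITS = 1500 * 8                       # one Ethernet-sized segment

def window_in_segments(rate_bps, rtt_s):
    return rate_bps * rtt_s / MSS_BITS    # segments in flight needed to fill the pipe

def one_segment_rate(rtt_s):
    return MSS_BITS / rtt_s               # throughput of a 1-MSS window, bits/s

for label, rtt in [("LAN/VPN       ~22 ms", 0.022),
                   ("cross-US      ~95 ms", 0.095),
                   ("transatlantic ~175 ms", 0.175)]:
    print("%s : 1 MSS -> %4.2f Mbps, filling 10 Mbps needs %5.1f segments"
          % (label, one_segment_rate(rtt) / 1e6, window_in_segments(10e6, rtt)))

# Detlef's example: one 12000-bit segment every 10 ms is 1.2 Mbps.
print("10 ms RTT, 1 MSS window -> %.2f Mbps" % (one_segment_rate(0.010) / 1e6))

# MTU that keeps the 1.2 ms serialization time of a 1500-byte frame at 10 Mbps
# when the link instead runs at 1 Gbps:
print("1.2 ms at 1 Gbps        -> %.0f kbyte per frame" % (1e9 * 0.0012 / 8 / 1e3))

Which of the limits in Fred's list actually binds (the offered window, the sender's maximum window, or the loss point) is a separate matter; the arithmetic only says how large the window would have to be for the path, rather than the window, to be the limit.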