[e2e] Extracting No. of packets or bytes in a router buffer

David P. Reed dpreed at reed.com
Sat Dec 23 13:01:08 PST 2006


I find the first sentence here very odd.   ICMP is used every day.   It 
is hardly dead.

Perhaps you meant that it doesn't work very well?

The real point you are making here is that *any* measurement protocol 
that can be distinguished from regular traffic by routers is at high 
risk of generating completely *wrong* answers, for two reasons:

1. Router vendors find it convenient to make their routers privilege 
real traffic over measurement overhead.

2. There is a constant temptation to "game" any benchmarking tests that 
vendors tend to accede to.   Academics do the same thing when they are 
proposing great new ideas that they want to sell - so this isn't a 
statement that says commercial is bad and academic has the moral high 
ground.   (the benchmarking game in the database business (TP1) or the 
processor business (MIPS or FLOPS according to standard benchmarks) or 
the 3D graphics business are all unfortunately gamed every day).

Purveyors of ideas are tempted to lie or spin performance numbers.   
That's the high-tech industry version of I.F.Stone's: "governments lie".

Why would a router vendor offer to report a reliable number over SNMP?

So the general conclusion one should draw from this is that performance 
measurements should be done without the help of vendors or proposers 
(call them purveyors), with a great deal of effort put into measuring 
"real" cases that cannot be detected and distorted by purveyor 
interpretations that either:

a. allow the purveyor to claim that the measurement is bogus (ICMP 
should never have been broken by vendor optimizations, but it was in 
their interest to do so as noted above) or

b. allow the purveyor to generate much better numbers than will ever be 
seen in practice, either by special casing measurement packets, or 
putting the definition of the measurement being made in the hands of the 
purveyor.

Matt Mathis wrote:
> ICMP has been dead as a measurement protocol for about 10 years now.   The
> problem is that nearly all implementations process ICMP at substantially lower
> priority than other protocols, so the measurements are far worse than reality.
>
> I think you are looking for something more along the lines of IPMP, the IP
> measurement protocol.   Look for the expired Internet drafts:
> draft-bennett-ippm-ipmp-01      2003-03-05      Expired
> draft-mcgregor-ipmp-04  2004-02-04      Expired
>
> There is also a report by several people including Fred Baker and me,
> analyzing these two conflicting drafts, and proposing yet another variant.  I
> couldn't find the report quickly.  Perhaps Fred has a copy.....?
>
> If you want to follow this thread, be sure to engage the router vendors/large
> ISP's early and listen to them carefully, because the academic and industrial
> agendas clash very badly.   (You should read the report first.)
>
> Thanks,
> --MM--
> -------------------------------------------
> Matt Mathis      http://www.psc.edu/~mathis
> Work:412.268.3319    Home/Cell:412.654.7529
> -------------------------------------------
> Evil is defined by mortals who think they know
> "The Truth" and use force to apply it to others.
>
> On Fri, 22 Dec 2006, Jun Liu wrote:
>
>   
>> I am amazed by this thread of discussion. The key issue of correctly
>> estimating the queuing delay at a particular router is to make the
>> queuing delay of interest distinct from the delays caused by other
>> factors. I agree with Matt Mathis' opinion that the difference of a pair
>> of <ingress, egress> timestamps experienced by an IP packet at a router
>> closely characterizes the queuing delay of this packet at this router.
>> However, it is inconvenient for an end system to obtain the values of
>> the difference of time-stamp pairs. The NLANR PMA Router Clamp has only
>> been installed surrounding one core router and relies
>> on special measurement circuits. The data measured by Clamp is suitable
>> for statistics analysis rather than providing dynamic indications to end
>> hosts.
>>
>> I have been working on estimating the maximum queuing delay at the
>> outbound queue of the slowest link along an end-to-end path. Here, a
>> slowest link refers to a link with the longest maximum queuing delay
>> along the path. The queuing delay at the slowest link can be estimated
>> from measured RTTs along the path. If the histogram of a set of measured
>> RTTs has a single mode, then the maximum queuing delay at the slowest
>> link can be approximated by the delay value at the mode less the value
>> of the minimum RTT. The estimation of the maximum queuing delay at the
>> slowest link is largely affected by the non-ignorable queuing delays at
>> other routers. For example, a histogram of measured RTTs can have
>> multiple modes when there are two or more identical slowest links in a
>> path. Hence, appropriate technique of filtering noises is necessary.
>> However, multimodal based estimation issues remain unsolved.
>>
>> I am thinking of modifying the ICMP protocol to serve for carrying
>> dynamic delay information at routers to end hosts. The reason of
>> considering ICMP is due to two concerns. First, ICMP should have been
>> implemented at all routers and end hosts. "ICMP, uses the basic support
>> of IP as if it were a higher level protocol, however, ICMP is actually
>> an integral part of IP, and must be implemented by every IP
>> module." [RFC 792] Second, a lot of active probing based network
>> measurement methods were developed based on ICMP.
>>
>> Currently, an ICMP error reporting message is sent by a router upon
>> processing an erroneous IP packet and is routed back to the sender of
>> this IP packet. When this happens, the IP packet is dropped at the
>> router. Let's call an erroneous IP packet an echo, and the corresponding
>> ICMP packet an echo reply. The proposed modification is to make a pair
>> of echo and echo reply packets co-exist in the network. Namely, an echo
>> packet is kept routed to its destination after it has triggered an echo
>> reply which will be sent back to the sender of this echo. When we assume
>> that another echo reply will be sent by the destination of this echo
>> packet, the sender will obtain two echo reply packets on one echo. The
>> RTTs of the two echo reply packets share delays on the common links they
>> both traversed.
>>
>> Consider a simple network shown below. We denote by d(x,y) the delay
>> from network node x to y. d(x,y) consists of the link latency,
>> transmission delay on link (x,y), and the delay in node x (which is a
>> sum of queuing and processing delays within x). We are about to estimate
>> the queuing delay at router B (either dynamic delays or the maximum
>> delay). We consider a worst case scenario by assuming that d(A,B) and
>> d(B,D) always have similar dynamic values. This scenario happens when
>> the bandwidths of link (A,B) and (B,D) are same, the outgoing queues of
>> the two router have the same size, and the same traffic pattern is on
>> routers A and B.
>>
>>             d(S,A)                  d(A,B)               d(B,D)
>> Sender -------------------> R_A ----------------> R_B
>> ------------------> Destination
>>        <-------------------     <----------------
>> <------------------
>>             d(A,S)                  d(B,A)               d(D,B)
>>
>> If the sender can make both router B and the destination send an echo
>> reply on every echo packet it sends, then the difference of the RTTs
>> between the two echo reply packets offers us a value of (d(B,D)+d(D,B)).
>> This value much closely characterizes the queuing delay at router B than
>> using pure RTTs. This method makes queuing delay information timely
>> delivered to an end node---the sender of the echo packets.
>>
>> The method described here is somewhat similar to the idea adopted in van
>> Jacobson's work of pathchar which incrementally measures the link
>> bandwidth hop-by-hop from the link next to the source to the link next
>> to the destination. However, there are two differences. First, in
>> pathchar, only one echo reply can be triggered by an echo, and a pair of
>> echo and echo reply can not co-exist in the network. Second, in
>> pathchar, the RTTs of echo reply packets taking different path lengths
>> do not necessarily share common delay portions.
>>
>> Two obvious side effects of this modified ICMP protocol are the overhead
>> and the security issues. Higher overhead is made because of the
>> co-existence of echo and echo reply packets in the network. One echo
>> packet can potentially trigger as many echo reply packets as the number
>> of intermediate routers between a pair of sender and destination. Thus,
>> the security issue deserves consideration.
>>
>> My question here is that whether such modification on ICMP is
>> acceptable, or it simply introduces a new evil.
>>
>> Jun Liu
>>
>> On Fri, 2006-12-22 at 14:09 -0500, Matt Mathis wrote:
>>     
>>> Another approach is to get accurate time stamps of ingress/egress packets and
>>> use the difference in the time stamps to compute effective queue depths.  The
>>> NLANR PMA team was building a "router clamp", an "octopus" designed to get
>>> traces from all interfaces of a busy Internet2 core router.  I have since lost
>>> track of the details. Google "router clamp pma" for clues.
>>>
>>> I basically don't believe queue depths measured by any other means, because
>>> there are so many cascaded queues in a typical modern router.  I point out
>>> that most NIC's have short queues right at the wire, along with every DMA
>>> engine and bus arbitrator, etc.
>>>
>>> Claiming that an internal software instrument accurately represents the true
>>> aggregate queue depth for the router is equivalent to asserting that none of
>>> the other potential bottlenecks in the router have any queued packets. If they
>>> never have queued packets, why did the HW people bother with the silicon?   I
>>> conclude there is always potential for packets to be queued out of scope of
>>> the software instruments.
>>>
>>> It's a long story, but I have first hand experience with one of these cases:
>>> my external measurement of maximum queues size was only half of the design size,
>>> because the "wrong" bottleneck dominated.
>>>
>>> Good luck,
>>> --MM--
>>> -------------------------------------------
>>> Matt Mathis      http://www.psc.edu/~mathis
>>> Work:412.268.3319    Home/Cell:412.654.7529
>>> -------------------------------------------
>>> Evil is defined by mortals who think they know
>>> "The Truth" and use force to apply it to others.
>>>
>>> On Wed, 20 Dec 2006, Lynne Jolitz wrote:
>>>
>>>       
>>>> Fred has very accurately and enjoyably answered the hardware question. But it gets more complicated when you consider transport-level in hardware, because the staging of the data from the bus and application memory involves buffering too, as well as contention reordering buffers used in the processing of transport-level protocols.
>>>>
>>>> Even more complicated is multiple transport interfaces in say, a blade server, where the buffering of the blade server's frame may be significant - you might be combining blade elements with different logic that stages them to a very high bandwidth 10 Gbit or greater output technology, where there is a bit of blurring between where switching and where channels from the transport layer merge.
>>>>
>>>> The upshot is given all the elements involved, it is hard to tell when something leaves the buffer, but it is always possible to tell when something *enters* the output buffer. All stacks track the outbound packet count, and obviously you can determine the rate by sampling the counters. But confirming how much has yet to hit the depth of buffering will be s very difficult exercise as Fred notes. It may be the case that the rules are very different from one packet to the next (e.g. very different dwell times in the buffers - we don't always have non-preemptive buffering).
>>>>
>>>> Lynne Jolitz
>>>>
>>>> ----
>>>> We use SpamQuiz.
>>>> If your ISP didn't make the grade try http://lynne.telemuse.net
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: end2end-interest-bounces at postel.org
>>>>> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Fred Baker
>>>>> Sent: Wednesday, December 13, 2006 12:17 PM
>>>>> To: Craig Partridge
>>>>> Cc: end2end-interest at postel.org
>>>>> Subject: Re: [e2e] Extracting No. of packets or bytes in a router buffer
>>>>>
>>>>>
>>>>> You're talking about ifOutQLen. It was originally proposed in RFC
>>>>> 1066 (1988) and deprecated in the Interfaces Group MIB (RFC 1573
>>>>> 1994). The reason it was deprecated is not documented, but the
>>>>> fundamental issue is that it is non-trivial to calculate and is very
>>>>> ephemeral.
>>>>>
>>>>> The big issue in calculating it is that it is rarely exactly one
>>>>> queue. Consider a simple case on simple hardware available in 1994.
>>>>>
>>>>>     +----------+ |
>>>>>     |          | |
>>>>>     |  CPU     +-+
>>>>>     |          | |
>>>>>     +----------+ | BUS
>>>>>                  |
>>>>>     +----------+ | +---------+
>>>>>     |          | +-+ LANCE   |
>>>>>     |          | | +---------+
>>>>>     |  DRAM    +-+
>>>>>     |          | | +---------+
>>>>>     |          | +-+ LANCE   |
>>>>>     +----------+ | +---------+
>>>>>
>>>>> I'm using the term "bus" in the most general possible sense - some
>>>>> way for the various devices to get to the common memory. This gets
>>>>> implemented many ways.
>>>>>
>>>>> The AMD 7990 LANCE chip was and is a common Ethernet implementation.
>>>>> It has in front of it a ring in which one can describe up to 2^N
>>>>> messages (0 <= N <= 7) awaiting transmission. The LANCE has no idea
>>>>> at any given time how many messages are waiting - it only knows
>>>>> whether it is working on one right now or is idle, and when switching
>>>>> from message to message it knows whether the next slot it considers
>>>>> contains a message. So it can't keep such a counter. The device
>>>>> driver similarly has a limited view; it might know how many it has
>>>>> put in and how many it has taken out again, but it doesn't know
>>>>> whether the LANCE has perhaps completed some of the messages it
>>>>> hasn't taken out yet. So in the sense of the definition ("The length
>>>>> of the output packet queue (in packets)."), it doesn't know how many
>>>>> are still waiting. In addition, it is common for such queues or rings
>>>>> to be configured pretty small, with excess going into a diffserv-
>>>>> described set of software queues.
>>>>>
>>>>> There are far more general problems. Cisco has a fast forwarding
>>>>> technology that we use on some of our midrange products that
>>>>> calculates when messages should be sent and schedules them in a
>>>>> common calendar queue. Every mumble time units, the traffic that
>>>>> should be sent during THIS time interval are picked up and dispersed
>>>>> to the various interfaces they need to go out. Hence, there isn't a
>>>>> single "output queue", but rather a commingled output schedule that
>>>>> shifts traffic to other output queues at various times - which in
>>>>> turn do something akin to what I described above.
>>>>>
>>>>> Also, in modern equipment one often has forwarders and drivers on NIC
>>>>> cards rather than having some central processor do that. For
>>>>> management purposes, the drivers maintain their counts locally and
>>>>> periodically (perhaps once a second) upload the contents of those
>>>>> counters to a place where management can see them.
>>>>>
>>>>> So when you ask "what is the current queue depth", I have to ask what
>>>>> the hardware has, what of that has already been spent but isn't
>>>>> cleaned up yet, what is in how many software queues, how they are
>>>>> organized, and whether that number has been put somewhere that
>>>>> management can see it.
>>>>>
>>>>> Oh - did I mention encrypt/decrypt units, compressors, and other
>>>>> inline services that might have their own queues associated with them?
>>>>>
>>>>> Yes, there is a definition on the books. I don't know that it answers
>>>>> the question.
>>>>>
>>>>> On Dec 13, 2006, at 10:54 AM, Craig Partridge wrote:
>>>>>
>>>>>           
>>>>>> Queue sizes are standard SNMP variables and thus could be sampled at
>>>>>> these intervals.  But it looks as if you want the queues on a per host
>>>>>> basis?
>>>>>>
>>>>>> Craig
>>>>>>
>>>>>> In message <Pine.LNX.
>>>>>> 4.44.0612130958100.28208-100000 at cmm2.cmmacs.ernet.in>, V A
>>>>>> nil Kumar writes:
>>>>>>
>>>>>>             
>>>>>>> We are searching for any known techniques to continuously sample
>>>>>>> (say at
>>>>>>> every 100 msec interval) the buffer occupancy of router
>>>>>>> interfaces. The
>>>>>>> requirement is to extract or estimate the instantaneous value of the
>>>>>>> number of packets or bytes in the router buffer from another
>>>>>>> machine in
>>>>>>> the network, and not the maximum possible router buffer size.
>>>>>>>
>>>>>>> Any suggestion, advice or pointer to literature on this?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> Anil
>>>>>>>               
>
>
>   


More information about the end2end-interest mailing list