From Black_David at emc.com  Fri Apr  1 06:11:04 2005
From: Black_David at emc.com (Black_David@emc.com)
Date: Fri, 1 Apr 2005 09:11:04 -0500 
Subject: [e2e] 911 and cell phones
Message-ID: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>

David,

It's too bad that California screwed this up.  I know from
actual experience that calling 911 from at least my cell
phone in Mass. enables one to reach the state police in 
short order.

As for "exact position" of a cell phone, I think that's a red
herring, because the only way to get it accurately appears to
involve a GPS receiver in the cell phone, which most cell
phones don't have.  While I'm not an expert, my impression
from what I've seen is that triangulation based on location
of cell site antennas has not been sufficiently workable
in practice.  Even GPS has its limits - if the receiver can't
see enough satellites, the result is a 2-D fix instead
of 3-D, which can be a problem in a multi-story building.
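
A minimal sketch of the 2-D versus 3-D distinction mentioned above. It assumes the usual rule of thumb: a receiver needs four satellites to solve for three position coordinates plus its own clock bias, and with only three it can produce a 2-D fix by assuming an altitude. The function name and return strings are illustrative, not taken from any real receiver API.

```python
def gps_fix_type(visible_satellites: int) -> str:
    """Classify the fix a receiver can compute from the satellites it sees.

    Assumption (rule of thumb): 4+ satellites give latitude, longitude,
    altitude and clock bias; exactly 3 give latitude and longitude only,
    with altitude assumed; fewer than 3 give no position at all.
    """
    if visible_satellites < 0:
        raise ValueError("satellite count cannot be negative")
    if visible_satellites >= 4:
        return "3-D"
    if visible_satellites == 3:
        return "2-D"
    return "none"


if __name__ == "__main__":
    for count in (2, 3, 4, 8):
        print(count, "satellites ->", gps_fix_type(count))
```

A 2-D fix carries no usable altitude, which is why it cannot tell a dispatcher which floor of a multi-story building the caller is on.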

OTOH, if you want to trust your life to a global SLP
infrastructure (Uh, where can I find one of those?),
that's your choice ...

Thanks,
--David
----------------------------------------------------
David L. Black, Senior Technologist
EMC Corporation, 176 South St., Hopkinton, MA  01748
+1 (508) 293-7953             FAX: +1 (508) 293-7786
black_david at emc.com        Mobile: +1 (978) 394-7754
----------------------------------------------------


> -----Original Message-----
> From: end2end-interest-bounces at postel.org 
> [mailto:end2end-interest-bounces at postel.org] On Behalf Of 
> David P. Reed
> Sent: Thursday, March 31, 2005 1:13 PM
> To: Alex Cannara
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] Skype and congestion collapse.
> 
> Alex - the underlying assumption is that traditional telephony delivers 
> 911 functionality best.   Well, the word on the street is that in 
> California, if you call 911 on your basic, non IP cell phone, your exact 
> position is delivered to ... well, no one knows where, but it's a place 
> that has no capability to actually transfer that information to anyone 
> who actually can help you in an emergency.   Better to call directory 
> assistance for the phone number of your local police dept. and hope they 
> don't tell you "911", because that will guarantee 30-90 minutes of 
> screwing around.
> 
> OK, maybe wired phones still do 911 OK, but do PBXes?   I doubt it - so 
> the point about bosses may be bogus as well.
> 
> Here the argument that 911 should be "in the network" fails.  I'd much 
> rather have my actual physical telephone be smart enough to figure out 
> how to summon emergency services (perhaps finding the doctor who is in 
> the next cubicle over if the SLP emergency service existed), I think.
> 
> Dale Hatfield points out that the phone companies have made it 
> *impossible* to deploy a "911-like" service over the WWW, because who 
> can trust that a person would actually tell the truth that they have a 
> life or death situation on their hands.   But of course we can ALL trust 
> Verizon Wireless with our lives... yeah right.

From dpreed at reed.com  Fri Apr  1 06:53:27 2005
From: dpreed at reed.com (David P. Reed)
Date: Fri, 01 Apr 2005 09:53:27 -0500
Subject: [e2e] 911 and cell phones
In-Reply-To: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
References: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
Message-ID: <424D6067.9040401@reed.com>

David - exact position may not matter in most cases, but that's what 
Vonage is being beaten up about (I have 911 on the Vonage line 
activated, and it gets through to my local emergency services just fine 
because I told the system when I set it up where that was.)

I note that getting the Massachusetts "state police" is rarely useful 
unless you are driving on the Massachusetts Turnpike (they might as well 
be a call center in Bangalore).   They cannot by law assist you, and do 
not have the best means to pass on calls to localities, who might help 
you if you observe someone being mugged or raped on the street in (say) 
downtown Brockton.

As far as I know, every CDMA cell phone being sold today (the vast 
majority in the market) has GPS in it (in the form of A-GPS, a 
proprietary technology from Qualcomm that uses a GPS receiver 
in the phone, plus an assist from towers that gets the autonomous GPS 
re-locked fast when it goes out of satellite coverage).   I think that 
GSM phones all have GPS onboard as well.

You are right that tower triangulation has failed, but the E911 mandate 
for cell phones still holds, and GPS is the technology that has been 
universally adopted, and works pretty well, as far as getting location.

But as I said, knowing approximate or exact location isn't very good if 
the system design actually routes calls away from local responders to a 
single point of failure in some remote, windowless building that has no 
direct local presence.


From hgs at cs.columbia.edu  Fri Apr  1 07:11:31 2005
From: hgs at cs.columbia.edu (Henning Schulzrinne)
Date: Fri, 01 Apr 2005 10:11:31 -0500
Subject: [e2e] 911 and cell phones
In-Reply-To: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
References: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
Message-ID: <424D64A3.5050607@cs.columbia.edu>

People interested in this topic might want to follow the work of the 
ECRIT working group in the IETF (and, for location delivery, the GEOPRIV 
working group). Some related material can be found at 
http://www.cs.columbia.edu/sip/emergency.html

Short summary: Emergency calling ("911") is undergoing a radical 
technical transformation, motivated by the difficulty of supporting mobile 
devices, number portability, telematics services and VoIP in the 
traditional, 1960s-era technology that is currently being used. As usual, 
this transition will take a decade or more.

Black_David at emc.com wrote:
> David,
> 
> It's too bad that California screwed this up.  I know from
> actual experience that calling 911 from at least my cell
> phone in Mass. enables one to reach the state police in 
> short order.
> 
> As for "exact position" of a cell phone, I think that's a red
> herring, because the only way to get it accurately appears to
> involve a GPS receiver in the cell phone, which most cell
> phones don't have.  While I'm not an expert, my impression
> from what I've seen is that triangulation based on location
> of cell site antennas has not been sufficiently workable
> in practice.  Even GPS has its limits - if the receiver can't
> see enough satellites, the result is a 2-D fix instead
> of 3-D, which can be a problem in a multi-story building.
> 
> OTOH, if you want to trust your life to a global SLP
> infrastructure (Uh, where can I find one of those?),
> that's your choice ...
> 
> Thanks,
> --David
> ----------------------------------------------------
> David L. Black, Senior Technologist
> EMC Corporation, 176 South St., Hopkinton, MA  01748
> +1 (508) 293-7953             FAX: +1 (508) 293-7786
> black_david at emc.com        Mobile: +1 (978) 394-7754
> ----------------------------------------------------
> 
> 
> 
>>-----Original Message-----
>>From: end2end-interest-bounces at postel.org 
>>[mailto:end2end-interest-bounces at postel.org] On Behalf Of 
>>David P. Reed
>>Sent: Thursday, March 31, 2005 1:13 PM
>>To: Alex Cannara
>>Cc: end2end-interest at postel.org
>>Subject: Re: [e2e] Skype and congestion collapse.
>>
>>Alex - the underlying assumption is that traditional telephony delivers 
>>911 functionality best.   Well, the word on the street is that in 
>>California, if you call 911 on your basic, non IP cell phone, your exact 
>>position is delivered to ... well, no one knows where, but it's a place 
>>that has no capability to actually transfer that information to anyone 
>>who actually can help you in an emergency.   Better to call directory 
>>assistance for the phone number of your local police dept. and hope they 
>>don't tell you "911", because that will guarantee 30-90 minutes of 
>>screwing around.
>>
>>OK, maybe wired phones still do 911 OK, but do PBXes?   I doubt it - so 
>>the point about bosses may be bogus as well.
>>
>>Here the argument that 911 should be "in the network" fails.  I'd much 
>>rather have my actual physical telephone be smart enough to figure out 
>>how to summon emergency services (perhaps finding the doctor who is in 
>>the next cubicle over if the SLP emergency service existed), I think.
>>
>>Dale Hatfield points out that the phone companies have made it 
>>*impossible* to deploy a "911-like" service over the WWW, because who 
>>can trust that a person would actually tell the truth that they have a 
>>life or death situation on their hands.   But of course we can ALL trust 
>>Verizon Wireless with our lives... yeah right.

From dpreed at reed.com  Fri Apr  1 07:34:38 2005
From: dpreed at reed.com (David P. Reed)
Date: Fri, 01 Apr 2005 10:34:38 -0500
Subject: [e2e] 911 and cell phones
In-Reply-To: <424D64A3.5050607@cs.columbia.edu>
References: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
	<424D64A3.5050607@cs.columbia.edu>
Message-ID: <424D6A0E.4050904@reed.com>

Henning, it's great that there's an ECRIT working group with a 10-year 
transition plan.   The best is always worth waiting for.   It's kind of 
like the plan to deal with human-caused climate change.   First we need 
to study it, so we can figure out the optimum theoretical answer (or for 
that matter, whether there's a problem at all).

However, the Internet started based on a quite different approach.   We 
didn't start by creating an IETF to study every problem to death and 
dole out money to academics doing theoretical studies.

I follow ECRIT from afar, but frankly, I wonder if the group has the 
guts to take any technical leadership role in actually "doing" something.

The IETF in many cases, and in my personal opinion, has become a 
pointless technocracy that talks mostly to itself.   In this case, it 
will be irrelevant if it views the issue as a "smooth" ten-year 
transition, to be centrally managed and planned.

Like most cases of innovation in technology that is rapidly changing, 
innovation in location will come from the edge, from pragmatic 
experimental labs, from open source, from entrepreneurs doing 
proprietary things that the public likes; it will be about working code 
and rough consensus (what the IETF used to be, before the current 
Stepford academics took it over and got rid of any hope of doing good 
work, instead focusing on MUST and MAY and WILL and SHOULD).

And the IETF bureaucrats will travel the world, enjoying junkets in 
fancy hotels, running BOFs and if they are slightly lucky, maybe having 
an influence over a tiny part of the future.


From Jon.Crowcroft at cl.cam.ac.uk  Fri Apr  1 07:45:03 2005
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Fri, 01 Apr 2005 16:45:03 +0100
Subject: [e2e] information superhighway finally realized.
Message-ID: <E1DHOKW-00045O-00@mta1.cl.cam.ac.uk>

Thinking about this, what Al Gore really meant has just sunk in:

What you need is a person and vehicle tracking system - this can be
multi-modal - if a person carries a device that has gps (or cell/tower
or wifi, or other triangulation based) location services, then it's
easy - but it's also easy if you have plenty of surveillance cameras -
these have two benefits
1/ you can then implement car registration recognition, and charge for
road usage (and congestion charge) fairly and efficiently, reducing 
pollution, accidents and delays and permitting the police to catch 
folks that break laws (bad and fast driving etc) with hi resolution
instead of notoriously inaccurate human witnesses
2/ you can also use this to recognize and catch terrorists, since the
recognition system can be plugged into a traffic anomaly detection
system and automatically detect people renting cars in airports and
driving them full of gas into tall buildings

Looking further afield, one could put in automatic speed control in
the cars, and even immobilizers so that if the camera at the roadside
shows a person who doesn't have the visage of one of the recognized
(safe, approved) drivers of the said car, it either goes very slowly
or not at all  - even further afield, the vehicle could be
automatically routed to a county jail - this could also apply to people
who haven't paid their tax, or are on the run.

It could prevent soldiers awol from iraq driving over the border to 
canada or way down south to mexico

3/ M-ad hoc
think of all the benefits - if all the cars are fitted with 802.11
devices, we could also use them to provide a network - we could cause
cars to route to places where there is a gap in the connectivity at
the moment - as per the grossglauser/tse result, and using network
coding, this would provide for arbitrary capacity almost unbounded in
gas rich countries.

4/ creative accountancy
at the same time, one could have creative CRISPS (Connectivity Rich
ISPs) that have ingenious billing schemes - your wireless ad hoc
broadband bill could be rolled into the tax on your gas at the gas
station - you go fill up with 10 gallons of gas and 100Gbytes of
download  - if you run a multi-occupancy vehicle, and you also
run peer-to-peer file sharing you would get a double discount.

5/ border routing considerations
Of course one would need to consider the regulatory problems - if the
US were to build a lot of roads just south of the canadian border to
offer "offshore network capacity", but the canadians used Hydro to
re-charge lots of fuel-cell and electric/hybrid cars, then someone is
going to think about power-line broadband - then there could be
interference unless one deploys OFDM (Oil For Download Mobility)

6/ denial of service, and other security problems
of course it is easy to jam radio, and it's easy to jam on the radio
too. so we need to worry about this, but not too much - if someone
blocks your download, you just drive to blockbuster and pick up the
DVD there anyhow...

7/ network management considerations

The system should be just as manageable as the internet and the road
system. congestion will be rare (there will be no packet loss in my
car), and resilience will be provided by fast oregon bypasses.
overlay routing (put that bike in the pickup, put that laptop on the
bike, put that USB memory stick in the laptop) will naturally occur
and is a matter for further study.

Now, back to your normal service... 

In missive <424D6067.9040401 at reed.com>, "David P. Reed" typed:

 >>David - exact position may not matter in most cases, but that's what 
 >>Vonage is being beaten up about (I have 911 on the Vonage line 
 >>activated, and it gets through to my local emergency services just fine 
 >>because I told the system when I set it up where that was.)
 >>
 >>I note that getting the Massachusetts "state police" is rarely useful 
 >>unless you are driving on the Massachusetts Turnpike (they might as well 
 >>be a call center in Bangalore).   They cannot by law assist you, and do 
 >>not have the best means to pass on calls to localities, who might help 
 >>you if you observe someone being mugged or raped on the street in (say) 
 >>downtown Brockton.
 >>
 >>As far as I know, every CDMA cell phone being sold today (the vast 
 >>majority in the market) has GPS in it (in the form of A-GPS, a 
 >>proprietary technology from Qualcomm that uses a GPS receiver 
 >>in the phone, plus an assist from towers that gets the autonomous GPS 
 >>re-locked fast when it goes out of satellite coverage).   I think that 
 >>GSM phones all have GPS onboard as well.
 >>
 >>You are right that tower triangulation has failed, but the E911 mandate 
 >>for cell phones still holds, and GPS is the technology that has been 
 >>universally adopted, and works pretty well, as far as getting location.
 >>
 >>But as I said, knowing approximate or exact location isn't very good if 
 >>the system design actually routes calls away from local responders to a 
 >>single point of failure in some remote, windowless building that has no 
 >>direct local presence.
 >>
 >>

 cheers

   jon


From hgs at cs.columbia.edu  Fri Apr  1 07:45:43 2005
From: hgs at cs.columbia.edu (Henning Schulzrinne)
Date: Fri, 01 Apr 2005 10:45:43 -0500
Subject: [e2e] 911 and cell phones
In-Reply-To: <424D6A0E.4050904@reed.com>
References: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
	<424D64A3.5050607@cs.columbia.edu> <424D6A0E.4050904@reed.com>
Message-ID: <424D6CA7.1080000@cs.columbia.edu>

None of the active participants in the IETF is arguing for a ten-year 
plan; I think everyone involved would be happy to ditch the existing 
system tomorrow - and certainly keep people from spending gobs of money 
on patching it. Limitations of available public funding, industry 
structures (and, in some cases, lack of technical skills in PSAPs) make 
a slow deployment likely. It's obviously more fun to write jeremiads 
about the IETF than deal with the complicated reality of large deployed, 
safety-critical systems.

David P. Reed wrote:
> Henning, it's great that there's an ECRIT working group with a 10-year 
> transition plan.   The best is always worth waiting for.   It's kind of 
> like the plan to deal with human-caused climate change.   First we need 
> to study it, so we can figure out the optimum theoretical answer (or for 
> that matter, whether there's a problem at all).

From dpreed at reed.com  Fri Apr  1 08:10:56 2005
From: dpreed at reed.com (David P. Reed)
Date: Fri, 01 Apr 2005 11:10:56 -0500
Subject: [e2e] 911 and cell phones
In-Reply-To: <424D6CA7.1080000@cs.columbia.edu>
References: <B459CE1AFFC52D4688B2A5B842CA35EA07E5D08E@corpmx14.corp.emc.com>
	<424D64A3.5050607@cs.columbia.edu> <424D6A0E.4050904@reed.com>
	<424D6CA7.1080000@cs.columbia.edu>
Message-ID: <424D7290.1010501@reed.com>

Henning Schulzrinne wrote:

> It's obviously more fun to write jeremiads about the IETF than deal 
> with the complicated reality of large deployed, safety-critical systems.

Indeed it is!  :-)   The difference is that I have worked with such 
systems in the past, and work directly with people who do have to deal 
with such things.   The IETF has no accountability whatsoever.

From braden at ISI.EDU  Fri Apr  1 09:06:04 2005
From: braden at ISI.EDU (Bob Braden)
Date: Fri, 1 Apr 2005 09:06:04 -0800 (PST)
Subject: [e2e] 911 and cell phones
Message-ID: <200504011706.JAA25472@gra.isi.edu>


What does this topic have to do with the end-to-end principle/practice?

Bob Braden

From jtw at lcs.mit.edu  Fri Apr  1 09:35:54 2005
From: jtw at lcs.mit.edu (John Wroclawski)
Date: Fri, 1 Apr 2005 12:35:54 -0500
Subject: [e2e] 911 and cell phones
In-Reply-To: <200504011706.JAA25472@gra.isi.edu>
References: <200504011706.JAA25472@gra.isi.edu>
Message-ID: <p06210207be7335d96b47@[128.30.5.151]>

At 9:06 AM -0800 4/1/05, Bob Braden wrote:
>What does this topic have to do with the end-to-end principle/practice?
>
>Bob Braden

Actually, it does seem to - when you strip lots of detail away a key 
question seems to be the architectural choice of "end system knows 
where it is and tells the dispatcher" vs "infrastructure expected to 
know where the thing using it is". Interesting that both VoIP and 
cellular tech are apparently pushing the architecture towards a more 
e2e model.
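
The two architectural choices described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, the LINE_DATABASE table, the "line-0001" identifier and the coordinates are all hypothetical, and no real 911, E911 or VoIP interface is being modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Location:
    latitude: float
    longitude: float


@dataclass
class EmergencyCall:
    caller_id: str
    # Filled in by the end system in the e2e model; None if the device
    # does not know (or does not say) where it is.
    reported_location: Optional[Location] = None


# Hypothetical infrastructure-side table: internal line identifier -> location.
LINE_DATABASE: Dict[str, Location] = {
    "line-0001": Location(42.23, -71.52),
}


def locate_in_network(call: EmergencyCall) -> Optional[Location]:
    """Infrastructure model: the network looks up where the line is."""
    return LINE_DATABASE.get(call.caller_id)


def locate_end_to_end(call: EmergencyCall) -> Optional[Location]:
    """End-to-end model: the device tells the dispatcher where it is."""
    return call.reported_location


def locate(call: EmergencyCall) -> Optional[Location]:
    """Prefer the device's own report, fall back to the infrastructure."""
    reported = locate_end_to_end(call)
    if reported is not None:
        return reported
    return locate_in_network(call)


if __name__ == "__main__":
    wired = EmergencyCall(caller_id="line-0001")
    mobile = EmergencyCall(caller_id="unknown", reported_location=Location(37.77, -122.42))
    print(locate(wired))
    print(locate(mobile))
```

The fallback order in locate() is itself a design choice; a system that distrusts end devices could reverse it or cross-check the two answers.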

--john

From Farooq.Bari at cingular.com  Fri Apr  1 14:41:47 2005
From: Farooq.Bari at cingular.com (Bari, Farooq)
Date: Fri, 1 Apr 2005 14:41:47 -0800
Subject: [e2e] 911 and cell phones
Message-ID: <F9753E41A179D7438C42C6A834654434015DE2A7@wa-msg10-bth.wireless.attws.com>

Can the end device be trusted in such matters? Can someone sitting in,
say, Europe make a VoIP emergency call in the US? Shouldn't it be more like
"trust, but verify"?

> -----Original Message-----
> From: end2end-interest-bounces at postel.org [mailto:end2end-interest-
> bounces at postel.org] On Behalf Of John Wroclawski
> Sent: Friday, April 01, 2005 9:36 AM
> To: Bob Braden; Black_David at emc.com; dpreed at reed.com
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] 911 and cell phones
> 
> At 9:06 AM -0800 4/1/05, Bob Braden wrote:
> >What does this topic have to do with the end-to-end principle/practice?
> >
> >Bob Braden
> 
> Actually, it does seem to - when you strip lots of detail away a key
> question seems to be the architectural choice of "end system knows
> where it is and tells the dispatcher" vs "infrastructure expected to
> know where the thing using it is". Interesting that both VoIP and
> cellular tech are apparently pushing the architecture towards a more
> e2e model.
> 
> --john

From cannara at attglobal.net  Fri Apr  1 20:19:26 2005
From: cannara at attglobal.net (Cannara)
Date: Fri, 01 Apr 2005 20:19:26 -0800
Subject: [e2e] Skype and congestion collapse.
References: <026F8EEDAD2C4342A993203088C1FC051C270E@esealmw109.eemea.ericsson.se>
	<42360909.6090509@attglobal.net> <424C3D96.3000402@reed.com>
Message-ID: <424E1D4E.3BDC7AFF@attglobal.net>

Actually, David, I wasn't talking about cell phones, but wired lines. 
However, I did need to use 911 via a cell some months back, when an idiot kid
was shooting BBs at cars passing near where we live.  He was dumb enough to
shoot at my wife's side of the car as we cruised back past, trying to identify
his house.  So I called 911, which in Calif goes to the CHP.  The CHP called
the Sheriff and within minutes 2 cars were there.  The officers said they'd
give the kid a real scare and put him in one car, while talking about going to
Juvenile Hall.  They then impressed on him the seriousness of what he'd done,
and he also gave up his friend, who had hidden in the garage.  Then the absent
parent returned home in his obligatory SUV to witness how well he'd brought up
his kid.  As the officers explained to all of them the illegality of such
weapons in our county, they also let them twist in the wind a bit wondering if
we'd press charges.  After some time, we let the officers know we wouldn't and
they said they'd make the kids write letters of apology.  They did a good job
of that, cutting short potential lives of crime -- all thanks to a cellphone
and 911.  :]

At some point, laws and trust can and do work together to provide reliable
emergency services.  IP phone isn't there yet.

Alex

"David P. Reed" wrote:
> 
> Alex - the underlying assumption is that traditional telephony delivers
> 911 functionality best.   Well, the word on the street is that in
> California, if you call 911 on your basic, non IP cell phone, your exact
> position is delivered to ... well, no one knows where, but it's a place
> that has no capability to actually transfer that information to anyone
> who actually can help you in an emergency.   Better to call directory
> assistance for the phone number of your local police dept. and hope they
> don't tell you "911", because that will guarantee 30-90 minutes of
> screwing around.
> 
> OK, maybe wired phones still do 911 OK, but do PBXes?   I doubt it - so
> the point about bosses may be bogus as well.
> 
> Here the argument that 911 should be "in the network" fails.   I'd much
> rather have my actual physical telephone be smart enough to figure out
> how to summon emergency services (perhaps finding the doctor who is in
> the next cubicle over if the SLP emergency service existed), I think.
> 
> Dale Hatfield points out that the phone companies have made it
> *impossible* to deploy a "911-like" service over the WWW, because who
> can trust that a person would actually tell the truth that they have a
> life or death situation on their hands.   But of course we can ALL trust
> Verizon Wireless with our lives... yeah right.


From cannara at attglobal.net  Fri Apr  1 21:13:24 2005
From: cannara at attglobal.net (Cannara)
Date: Fri, 01 Apr 2005 21:13:24 -0800
Subject: [e2e] TFRC vs UDP
References: <A3863F3136CBC546A40A61BA9CBA9D93079C9D@sonusmail03.sonusnet.com>
Message-ID: <424E29F4.2DC4DA98@attglobal.net>

The problem with all of these "The transport does what the network should do"
bandaids is that they're just that.  Combine dealing with congestion at the
wrong layer, in naive ways, with the traditional lack of standards review and
source control, and that's given us our wonderfully insecure, spam-ridden
Internet.  What we have today is barely a networking existence proof, after
spending many millions of taxpayers' $, over about 30 years for certain
'researchers' and their grad students to write papers, convene around the
world and 'invent' rather than engineer protocols.  This is not meant as a
criticism of DCCP, TFRC, etc., just a commentary on the vast difference
between a subsidized, poorly-managed, self-involved, bureaucracy-stunted
Internet and something elegant, economical, efficient and worthy of general
pride, like Ethernet quickly became and has remained.  :]

The just out "E2E Vision" draft says this, perhaps inadvertently, by
suggesting that we need a "vision" for data communications for the next "10-15
years".  Well, yes, we needed it 10-15 years ago, when serious protocol R&D
was stopped -- without properly addressing 'minor' details, like congestion,
or even node addressing.  The Vision doc also exhibits the Internet
bureaucracy's traditional NIH approach, as in:

"The older members of the data communications research community spent some of
their formative years in the time when data communications was being
revolutionized by the creation of a new paradigm: packet switching. While
packet switching is now an accepted, indeed, lauded way to think about data
communications, into the early 1980s it was still a radical idea and into the
1990s required periodic justification."

leading newer readers to think the Internet folks invented packet switching.
This self-serving comment isn't even appropriate in a "vision" document, but
it reflects the tradition of maintaining ignorance that any bureaucracy
depends on to avoid scrutiny.  After all, TCP/IP folks then apparently never
even knew what a MAC was, because the IMPs on their Unix boxes somehow used
phone lines to get to other IMPs on other Unix boxes.  They apparently didn't
even understand why they should have lauded things like Ethernet, Znet,
CromemcoNet, CorvusNet, XNS, SNA, yadda, yadda, which in the '70s depended not
on the public dole from DARPA, but real corporate investment in efficient
networking systems.  Though not being a great IBM fan, I do have to admit to
knowing no one who ever hacked an SNA network, but someone probably did
something, once, and it was likely harder than getting Sendmail to execute
remote code.

What a "Vision" doc could say, to be more trustable than a Karl Rove memo, is
maybe:  "Today's Internet has demonstrated that open, international network
communication, at will, is a realizable goal.  Unfortunately, lack of proper
engineering attention to certain areas has also exposed shortsightedness on
the part of its designers.  For example, a mistaken trust in human nature has
left our Internet and all its users exposed to extremely serious issues:
insecurity, denial of service, unwanted traffic, lack of efficiency, economic
barriers, all manner of traditional scams, and even heightened potential for
identity loss.  The vision for the Internet now should be to move beyond its
existence-proof phase and into the realm of a safe, reliable, economical and
responsible utility."

Anyway, it's interesting and encouraging that we at least have some folks
concerned about the future.  I hope rocking the boat is on their agenda.

Alex

"Phelan, Tom" wrote:
> 
> Hi Syed,
> 
> DCCP includes TFRC as one of its congestion control algorithms, and there has been quite a bit of discussion in the group of the impact of TFRC on streaming media applications.  The DCCP User Guide contains an extensive discussion of the issue.  Unfortunately, it's timed out of the drafts archive as we work out the future course of the guide, but it's available at http://www.phelan-4.com/dccp/draft-ietf-dccp-user-guide-02.txt.
> 
> Tom Phelan
> 
> > -----Original Message-----
> > From: end2end-interest-bounces at postel.org
> > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Syed Faisal
> > Hasan
> > Sent: Thursday, March 31, 2005 8:37 AM
> > To: gdc at iki.fi
> > Cc: end2end-interest at postel.org
> > Subject: Re: [e2e] TFRC vs UDP
> >
> >
> > Hi Dado,
> >
> >
> > >Syed Faisal Hasan wrote:
> > >>
> > >>To whom it may concern,
> > >>
> > >>TFRC was designed for use by the Continuous Media (CM) applications. But
> > >>why will a CM application which is performing well using UDP use TFRC if
> > >>there is a performance gap (more latency, fewer packets transmitted
> > >>in the same time, high rate fluctuations in the beginning) between UDP and
> > >>TFRC? Maybe that's the reason we haven't seen any applications using TFRC.
> > >>On the other hand there is no (I haven't found) research which analyzes
> > >>the performance difference between UDP and TFRC. It is clear that TFRC
> > >>will not perform exactly like UDP (due to TFRC's friendliness with TCP),
> > >>but how much can we expect from TFRC?
> > >
> > >Hi Syed,
> > >
> > >the motivation is that although your application would work fine and you (a
> > >single person in a society) would have a good welfare using UDP, it is your
> > >fellow citizens that would potentially suffer from your actions. The
> > >network resource should be distributed fairly - whatever it means. If we
> > >all started to disregard other users, the network might stop working
> > >properly - equivalent to anarchy in a society. Mechanisms are needed to
> > >guarantee fair distribution of the resource. TFRC attempts to provide
> > >mechanisms to distribute the bandwidth resource in the same way TCP does.
> > >
> > >I think performance issues depend more on competing TCP flows and the
> > >network than on the TFRC control algorithm.
> >
> > I understand what you are talking about. But I want to know the performance
> > difference of UDP and TFRC in the same scenario. Is there any published
> > research on this?
> >
> > Faisal
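
For readers following the UDP versus TFRC question in the quoted thread: TFRC caps its sending rate with the TCP throughput equation from RFC 3448, whereas an unresponsive UDP sender has no such cap. The sketch below evaluates that equation; the function name, the default b = 1 and the default t_RTO = 4 * RTT (the simplification the RFC suggests) are choices made here for illustration, and the example numbers are made up.

```python
import math
from typing import Optional


def tfrc_allowed_rate(packet_size: float, rtt: float, loss_event_rate: float,
                      b: float = 1.0, t_rto: Optional[float] = None) -> float:
    """Upper bound on the sending rate in bytes per second (RFC 3448, sec. 3.1).

    packet_size      s, in bytes
    rtt              R, round-trip time in seconds
    loss_event_rate  p, between 0 (exclusive) and 1 (inclusive)
    b                packets acknowledged by a single TCP ACK
    t_rto            TCP retransmission timeout; defaults to 4 * rtt
    """
    if not 0.0 < loss_event_rate <= 1.0:
        raise ValueError("loss_event_rate must be in (0, 1]")
    if packet_size <= 0 or rtt <= 0:
        raise ValueError("packet_size and rtt must be positive")
    if t_rto is None:
        t_rto = 4.0 * rtt
    p = loss_event_rate
    denominator = (rtt * math.sqrt(2.0 * b * p / 3.0)
                   + t_rto * (3.0 * math.sqrt(3.0 * b * p / 8.0))
                   * p * (1.0 + 32.0 * p ** 2))
    return packet_size / denominator


if __name__ == "__main__":
    # Made-up example path: 1000-byte packets, 100 ms RTT.
    for p in (0.001, 0.01, 0.05):
        rate = tfrc_allowed_rate(1000, 0.1, p)
        print(f"p={p}: about {rate / 1000:.0f} kB/s")
```

The equation only bounds what TFRC may send for a given loss event rate and round-trip time; it says nothing about latency or start-up behaviour, which the original question also asks about.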

From cannara at attglobal.net  Fri Apr  1 21:20:44 2005
From: cannara at attglobal.net (Cannara)
Date: Fri, 01 Apr 2005 21:20:44 -0800
Subject: [e2e] 911 and cell phones
References: <F9753E41A179D7438C42C6A834654434015DE2A7@wa-msg10-bth.wireless.attws.com>
Message-ID: <424E2BAC.3E57AF8D@attglobal.net>

In fact, both infrastructure and end nodes know where the caller is.  Which is
where the trust & security of the wired POTS system originates -- the physical
line has a trusted, internal identifier no one but the telco ever knows, and
our cellphones each have unique identifiers we never see, but which the
services must know in order to locate us.  While we may think our phone
numbers are what identify us and our locations, they do not.  They're just
names on proprietary tables of unique internal system identifiers.

Alex

"Bari, Farooq" wrote:
> 
> Can the end device be trusted in such matters? Can someone sitting in,
> say, Europe make a VoIP emergency call in the US? Shouldn't it be more like
> "trust, but verify"?
> 
> > -----Original Message-----
> > From: end2end-interest-bounces at postel.org [mailto:end2end-interest-
> > bounces at postel.org] On Behalf Of John Wroclawski
> > Sent: Friday, April 01, 2005 9:36 AM
> > To: Bob Braden; Black_David at emc.com; dpreed at reed.com
> > Cc: end2end-interest at postel.org
> > Subject: Re: [e2e] 911 and cell phones
> >
> > At 9:06 AM -0800 4/1/05, Bob Braden wrote:
> > >What does this topic have to do with the end-to-end principle/practice?
> > >
> > >Bob Braden
> >
> > Actually, it does seem to - when you strip lots of detail away a key
> > question seems to be the architectural choice of "end system knows
> > where it is and tells the dispatcher" vs "infrastructure expected to
> > know where the thing using it is". Interesting that both VoIP and
> > cellular tech are apparently pushing the architecture towards a more
> > e2e model.
> >
> > --john

From cannara at attglobal.net  Fri Apr  1 21:31:42 2005
From: cannara at attglobal.net (Cannara)
Date: Fri, 01 Apr 2005 21:31:42 -0800
Subject: [e2e] information superhighway finally realized.
References: <E1DHOKW-00045O-00@mta1.cl.cam.ac.uk>
Message-ID: <424E2E3E.EAA691B5@attglobal.net>

Good Jon!  But, Connecticut and several other states are already passing laws
that prevent lots of info from car transponders from being used by anyone but
the drivers -- thank goodness!  You may have heard of the rental agency in
Conn. that was billing customers per mile when they exceeded speed limits. 
Nasty!  :]

Alex

Jon Crowcroft wrote:
> 
> Thinking about this, what Al Gore really meant has just sunk in:
> 
> What you need is a person and vehicle tracking system - this can be
> multi-modal - if a person carries a device that has gps (or cell/tower
> or wifi, or other triangulation based) location services, then its
> easy - but its also easy if you have plenty of surveillance cameras -
> these have two benefits
> 1/ you can then implement car registration recognition, and charge for
> road usage (and congestion charge) fairly and efficiently, reducing
> pollution, accidents and delays and permitting the police to catch
> folks that break laws (bad and fast driving etc) with hi resolution
> instead of notoriously inaccurate human witnesses
> 2/ you can also use this to recognize and catch terrorists, since the
> recognition system can be plugged into a traffic anomaly detection
> system and automatically detect people renting cars in airports and
> driving them full of gas into tall buildings
> 
> Looking further afield, one could put in automatic speed control in
> the cars, and even immobilizers so that if the camera at the roadside
> shows a person who doesn't have the visage of one of the recognized
> (safe, approved) drivers of the said car, it either goes very slowly
> or not at all  - even further afield, the vehicle could be
> automatically routed to a county jail - this could also apply to people
> who haven't paid their tax, or are on the run.
> 
> It could prevent soldiers awol from iraq driving over the border to
> canada or way down south to mexico
> 
> 3/ M-ad hoc
> think of all the benefits - if all the cars are fitted with 802.11
> devices, we could also use them to provide a network - we could cause
> cars to route to places where there is a gap in the connectivity at
> the moment - as per the grossglauser/tse result, and using network
> coding, this would provide for arbitrary capacity almost unbounded in
> gas rich countries.
> 
> 4/ creative accountancy
> at the same time, one could have creative CRISPS (Connectivity Rich
> ISPs) that have ingenious billing schemes - your wireless ad hoc
> broadband bill could be rolled into the tax on your gas at the gas
> station - you go fill up with 10 gallons of gas and 100Gbytes of
> download  - if you run a multi-occupancy vehicle, and you also
> run peer-to-peer file sharing you would get a double discount.
> 
> 5/ border routing considerations
> Of course one would need to consider the regulatory problems - if the
> US were to build a lot of roads just south of the canadian border to
> offer "offshore network capacity", but the canadians used Hydro to
> re-charge lots of fuel-cell and electric/hybrid cars, then someone is
> going to think about power-line broadband - then there could be
> interference unless one deploys OFDM (Oil For Download Mobility)
> 
> 6/ denial of service, and other security problems
> of course it is easy to jam radio, and its easy to jam on the radio
> too. so we need to worry about this, but not too much - if someone
> blocks your download, you just drive to blockbuster and pick up the
> DVD there anyhow...
> 
> 7/ network management considerations
> 
> The system should be just as manageable as the internet and the road
> system. congestion will be rare (there will be no packet loss in my
> car), and resilience will be provided by fast oregon bypasses.
> overlay routing (put that bike in the pickup, put that laptop on the
> bike, put that USB memory stick in the laptop) will naturally occur
> and is a matter for further study.
> 
> Now, back to your normal service...
> 
> In missive <424D6067.9040401 at reed.com>, "David P. Reed" typed:
> 
>  >>David - exact position may not matter in most cases, but that's what
>  >>Vonage is being beaten up about (I have 911 on the Vonage line
>  >>activated, and it gets through to my local emergency services just fine
>  >>because I told the system when I set it up where that was.)
>  >>
>  >>I note that getting the Massachusetts "state police" is rarely useful
>  >>unless you are driving on the Massachusetts Turnpike (they might as well
>  >>be a call center in Bangalore).   They cannot by law assist you, and do
>  >>not have the best means to pass on calls to localities, who might help
>  >>you if you observe someone being mugged or raped on the street in (say)
>  >>downtown Brockton.
>  >>
>  >>As far as I know, every CDMA cell phone being sold today (the vast
>  >>majority in the market) have GPS in them (in the form of A-GPS, a
>  >>proprietary technology that comes from qualcomm, which used GPS receiver
>  >>in the phone, plus an assist from towers that gets the autonomous GPS
>  >>re-locked fast when it goes out of satellite coverage).   I think that
>  >>GSM phones also all have GPS onboard as well.
>  >>
>  >>You are right that tower triangulation has failed, but the E911 mandate
>  >>for cell phones still holds, and GPS is the technology that has been
>  >>universally adopted, and works pretty well, as far as getting location.
>  >>
>  >>But as I said, knowing approximate or exact location isn't very good if
>  >>the system design actually routes calls away from local responders to a
>  >>single point of failure in some remote, windowless building that has no
>  >>direct local presence.
>  >>
>  >>
> 
>  cheers
> 
>    jon

From cannara at attglobal.net  Fri Apr  1 22:15:19 2005
From: cannara at attglobal.net (Cannara)
Date: Fri, 01 Apr 2005 22:15:19 -0800
Subject: [e2e] Frivolity, was Re:  Skype and congestion collapse.
References: <026F8EEDAD2C4342A993203088C1FC051C270E@esealmw109.eemea.ericsson.se>
	<42360909.6090509@attglobal.net> <424C3D96.3000402@reed.com>
	<Pine.GSO.4.50.0503311919110.2484-100000@argos.ee.surrey.ac.uk>
Message-ID: <424E3877.C199C480@attglobal.net>

Good!  We had to do the same thing yesterday with our garbage collector, but
she actually was only 5 miles away.  You realize that McDonald's & Jack in the
Box are outsourcing their drive-up order taking too?  Will Indians not want
the work because of the cow meat?  :]

Alex

Lloyd Wood wrote:
> 
> On Thu, 31 Mar 2005, David P. Reed wrote:
> 
> > Date: Thu, 31 Mar 2005 13:12:38 -0500
> > From: David P. Reed <dpreed at reed.com>
> > To: Alex Cannara <cannara at attglobal.net>
> > Cc: end2end-interest at postel.org
> > Subject: Re: [e2e] Skype and congestion collapse.
> >
> > Alex - the underlying assumption is that traditional telephony delivers
> > 911 functionality best.   Well, the word on the street is that in
> > California, if you call 911 on your basic, non IP cell phone, your exact
> > position is delivered to ...
> 
> India.
> 
> http://www.doonesbury.com/strip/dailydose/index.html?uc_full_date=20050320
> 
> L.

From Jon.Crowcroft at cl.cam.ac.uk  Sat Apr  2 01:06:46 2005
From: Jon.Crowcroft at cl.cam.ac.uk (Jon Crowcroft)
Date: Sat, 02 Apr 2005 10:06:46 +0100
Subject: [e2e] 911 and cell phones
In-Reply-To: Message from John Wroclawski <jtw@lcs.mit.edu> of "Fri,
	01 Apr 2005 12:35:54 CDT." <p06210207be7335d96b47@[128.30.5.151]> 
Message-ID: <E1DHead-00078r-00@mta1.cl.cam.ac.uk>

yes...

what could be more end-to-end than 
knowing where the end is?

and the means justifies the end
especially if it helps you with
prevention of theft and denial of service too...

In missive <p06210207be7335d96b47@[128.30.5.151]>, John Wroclawski typed:

 >>At 9:06 AM -0800 4/1/05, Bob Braden wrote:
 >>>What does this topic have to do with the end-to-end principle/practice?
 >>>
 >>>Bob Braden
 >>
 >>Actually, it does seem to - when you strip lots of detail away a key 
 >>question seems to be the architectural choice of "end system knows 
 >>where it is and tells the dispatcher" vs "infrastructure expected to 
 >>know where the thing using it is". Interesting that both VoIP and 
 >>cellular tech are apparently pushing the architecture towards a more 
 >>e2e model.
 >>
 >>--john

 cheers

   jon


From faisal at lums.edu.pk  Sun Apr  3 03:18:38 2005
From: faisal at lums.edu.pk (Faisal Aslam)
Date: Sun, 3 Apr 2005 15:18:38 +0500
Subject: [e2e] UDP checksum field?
Message-ID: <8C128AD85EEA5747B9F81C6230BA12F6544BB3@jhelum.lumsnet.edu.pk>

Hi,
 
Why do we have a checksum field in the UDP header, given that UDP does not provide data retransmission etc.?
I think it is used only to silently discard a packet with a wrong checksum (is that it?). Is there any other application of the checksum field?

Sorry if the question is too naive.
 
Thanks
Faisal
 

From cannara at attglobal.net  Sun Apr  3 20:06:09 2005
From: cannara at attglobal.net (Cannara)
Date: Sun, 03 Apr 2005 20:06:09 -0700
Subject: [e2e] UDP checksum field?
References: <8C128AD85EEA5747B9F81C6230BA12F6544BB3@jhelum.lumsnet.edu.pk>
Message-ID: <4250AF21.552F7D9A@attglobal.net>

Faisal, yes indeed, the checksum lets the receiver discard garbage, just as
the CRC at the frame level does.  UDP can be asked to not use the checksum,
for devil-may-care applications. :]
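
To make the mechanism concrete, here is a minimal sketch (not taken from any
particular stack) of the 16-bit one's-complement Internet checksum in the
style of RFC 1071, applied to a UDP datagram over IPv4 with the RFC 768
pseudo-header.  The function names and sample addresses are illustrative
assumptions.  A checksum field of zero on the wire means the sender did not
compute one, which is the devil-may-care mode.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def udp_checksum(src_ip: bytes, dst_ip: bytes,
                 src_port: int, dst_port: int, payload: bytes) -> int:
    """Checksum over IPv4 pseudo-header, UDP header and payload (RFC 768)."""
    length = 8 + len(payload)                      # UDP header is 8 bytes
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = internet_checksum(pseudo + header + payload)
    return csum or 0xFFFF  # a computed 0 is sent as all ones


def udp_verify(src_ip: bytes, dst_ip: bytes, src_port: int, dst_port: int,
               payload: bytes, wire_checksum: int) -> bool:
    """Return True if the datagram should be accepted."""
    if wire_checksum == 0:
        return True  # sender disabled the checksum: nothing to verify
    expected = udp_checksum(src_ip, dst_ip, src_port, dst_port, payload)
    return expected == wire_checksum


# Illustrative addresses (documentation range) and ports.
SRC = bytes([192, 0, 2, 1])
DST = bytes([192, 0, 2, 2])
good = b"hello, world"
csum = udp_checksum(SRC, DST, 5000, 2049, good)
corrupted = b"hellp, world"  # one byte damaged in transit
```

With the checksum enabled, udp_verify(SRC, DST, 5000, 2049, corrupted, csum)
rejects the damaged datagram; with the field set to zero, the same damaged
payload is accepted silently.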

Alex

Faisal Aslam wrote:
> 
> Hi,
> 
> Why we have checksum field is in UDP header, as UDP does not provide data retransmission etc?
> I think it is used only to silently discarding a packet with wrong checksum (thats it?). Is there any other application of checksum field?
> 
> Sorry if the question is too naive.
> 
> Thanks
> Faisal
>

From philippe.gentric at philips.com  Mon Apr  4 03:02:06 2005
From: philippe.gentric at philips.com (Philippe Gentric)
Date: Mon, 4 Apr 2005 12:02:06 +0200
Subject: [e2e] 911 and cell phones
In-Reply-To: <200504011706.JAA25472@gra.isi.edu>
Message-ID: <OF812570D0.2D438266-ONC1256FD9.0034207F-C1256FD9.00372F6F@philips.com>

>What does this topic have to do with the end-to-end principle/practice?

imagine anyone could fake ten emergency calls in ten minutes, across an 
ocean, and get away with it...
don't you think this would be a major end-to-end [user-to-police] 
*principle* problem?


Philippe.


Bob Braden <braden at ISI.EDU> wrote on 2005-04-01 19:06
(sent by end2end-interest-bounces at postel.org):

        To:     Black_David at emc.com, dpreed at reed.com
        cc:     end2end-interest at postel.org
        Subject:        Re: [e2e] 911 and cell phones


What does this topic have to do with the end-to-end principle/practice?

Bob Braden

From lynne at telemuse.net  Mon Apr  4 09:30:47 2005
From: lynne at telemuse.net (Lynne Jolitz)
Date: Mon, 4 Apr 2005 09:30:47 -0700
Subject: [e2e] UDP checksum field?
In-Reply-To: <Pine.GSO.4.50.0504041043320.3455-100000@argos.ee.surrey.ac.uk>
Message-ID: <001701c53933$a7be2960$6e8944c6@telemuse.net>

Yes, Lloyd is exactly right here. It is often the case that people turn off UDP checksums to "buy" more performance by relying on the CRC of the ethernet packet. It's not a stupid question - it's a very smart question, and a lot of smart people get fooled by this.

For example, the Sun datacenter back in the early 1990s had an NFS cluster project called Sunbox - an array of workstation CPUs that did divide and conquer to build a massive file server. It used an ethernet multiplexer to dynamically split the load. To buy back performance, they turned off the UDP checksum. It worked fine until they had a bad lot of ethernet boards with substandard memories - this wasn't picked up in tests because the test units were resending the occasionally corrupted packets (UDP checksums were usually turned on), and in TCP a failed checksum would trigger a resend as well. It was also a fairly rare problem, and the test periods were too short to pick up on the nature of this problem easily. 

But when UDP checksums were turned off in normal use, the resulting NFS requests were corrupting the filesystem (which in this case were database files), forcing rebuilds and manual repairs of database tables. 

As they were about to announce and release it, they suddenly discovered this problem - they noticed the corruption and in order to determine whether it was in the high level (stack or above) or lower levels, they turned on checksums and it worked immediately. 

They then examined the packets that failed the checksum to trace back down the lower levels of the stack, through the link layer, to discover where the corruption occurred. With logic analyzers, they were able to observe that the contents going into memory from the NIC on reception were different from the contents going out of the memory and traveling across the bus to the processor. 
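
The failure mode is easy to reproduce in miniature. The following is a sketch under stated assumptions, not a model of the actual Sunbox hardware: zlib.crc32 stands in for the Ethernet frame check sequence, a 16-bit one's-complement sum stands in for the UDP checksum, and the payload and the flipped bit are made up for illustration. The link CRC passes because the frame really did cross the wire intact; the damage happens afterwards, in board memory, where only an end-to-end check can see it.

```python
import zlib


def checksum16(data: bytes) -> int:
    """16-bit one's-complement checksum (same family as the UDP checksum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated bad memory)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Hypothetical payload; any bytes would do.
payload = b"NFS WRITE block 42: account balance = 1000"

end_to_end = checksum16(payload)   # computed by the sending host
link_crc = zlib.crc32(payload)     # computed by the sending NIC

# 1. The frame crosses the wire undamaged; the receiving NIC checks the CRC
#    and then discards it.
received = payload
link_ok = zlib.crc32(received) == link_crc

# 2. Faulty memory on the board flips the lowest bit of the last byte
#    after the CRC has already been checked.
in_host_memory = flip_bit(received, 8 * (len(received) - 1))

# 3. Only a check carried end to end can still notice.
caught_by_link_crc = not link_ok
caught_by_end_to_end = checksum16(in_host_memory) != end_to_end
```

Here link_ok is true and caught_by_end_to_end is true: the link layer reports a clean frame while the data handed to the file system has changed.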

This is a surprisingly common problem in datacenters - sometimes the problem would be a switch, sometimes a configuration error, sometimes a programming error in the application, and so forth. I most recently experienced this problem with an overheated ethernet switch passing video on an internal network.

I also ran into this at an Internet portal company where I was a manager. We were using NetApps file servers to mirror the daily information - NetApps at the time encouraged staff to turn off checksums to increase performance. The DBAs noticed problems and ended up doing frequent rebuilds, but couldn't figure out why. It took me a lot of time to convince my staff to turn on the checksums because they were told "they don't have to" by NetApps. Most datacenter staff work by cookbook, and this wasn't in the cookbook. When they finally tried it, it worked. This little problem cost us a lot of time and aggravation for very little (if any) performance gain. 

Performance gain by turning off checksums now can be obviated through the use of intelligent NIC technologies like SiliconTCP (http://jolitz.telemuse.net/pubs/pt2001_01/item) and TOE that calculate the checksum as the packet is being received. But we don't have this in commodity switches yet, so check that switch if you're having problems.

Higher level checksums are worth it every time. Don't leave the server without them. :-)

Lynne Jolitz.

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net


> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Lloyd Wood
> Sent: Monday, April 04, 2005 2:48 AM
> To: Faisal Aslam
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] UDP checksum field?
> 
> 
> On Sun, 3 Apr 2005, Faisal Aslam wrote:
> 
> > Why we have checksum field is in UDP header, as UDP does not provide
> > data retransmission etc? I think it is used only to silently
> > discarding a packet with wrong checksum (thats it?).
> 
> yes - you need an end-to-end check against a corrupted packet. UDP
> could have the checksum turned off, which proved disastrous for a
> number of applications, subtly corrupted filing systems which didn't
> have higher-level end2end checks etc.
> 
> > Is there any  other application of checksum field?
> 
> For other applications
> http://www.faqs.org/rfcs/rfc3828.html
> 
> UDP Lite originally sprang out of the observation that UDP has
> redundant length information, and that this information could be
> combined with the checksum (as in TCP/UDP) to give partial coverage.
> 
> L.
> 
> >
> > Sorry if the question is too naive.
> >
> > Thanks
> > Faisal
> >
> >
> 
> <http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood at eim.surrey.ac.uk>
> 

From cannara at attglobal.net  Mon Apr  4 10:02:32 2005
From: cannara at attglobal.net (Cannara)
Date: Mon, 04 Apr 2005 10:02:32 -0700
Subject: [e2e] UDP checksum field?
References: <001701c53933$a7be2960$6e8944c6@telemuse.net>
Message-ID: <42517328.77FD2664@attglobal.net>

I'll add a funny (if you're not using Oracle TNS gateways) SQL transport
example that still exists today, despite being pointed out to Oracle about a
decade ago.  When Network General was adding more SQL decodes to the
Sniffer(r), in the '90s, we had a presentation on the Oracle transport (TNS)
underlying SQL Net traffic.  TNS rode on Netware SPP, or TCP, etc.  The fellow
went into packet fields in detail and explained how Oracle also made gateway
software available for Sun boxes to go from an Oracle system to an IBM SNA db
system.  The gateway received SQL on TNS on TCP on IP on Ethernet (for
instance) and spit out SQL on TNS or whatever IBM wanted.  

As he expounded on TNS pkt fields, a few hands went up -- "What's the checksum
field for if it's always 0?" asked a few experienced network folks. The
presenter turned back to the slide show and said: "It's unimplemented for
now".  Without malice, another question was posed:  "Well if it's unused and
your gateway has bad memory, how do you know the data going into the db on the
other side will be good?"  The presenter, a highly lauded Oracle techy, looked
at the screen for a bit, looked back at the audience, shuffled his feet,
looked again at the screen, and finally said words like:  "I don't know".  

After the presentation, a letter was written to Oracle, copied to Ellison,
explaining exactly the problem and urging the TNS checksum be implemented.  No
response ever came back, and, if you look at a TNS packet today, the checksum
is still zero.  I guess no one has used the gateway software who cares about
their data.  :]
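
For what an implemented check could look like, here is a minimal sketch of an
application-level integrity field that survives a trip through a gateway.
The framing (4-byte length, payload, 4-byte CRC-32) and the names seal and
unseal are hypothetical and are not the TNS packet format; the point is only
that the receiver recomputes the check over what it actually got.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Frame a payload as: length (4 bytes) | payload | CRC-32 (4 bytes)."""
    return (struct.pack("!I", len(payload))
            + payload
            + struct.pack("!I", zlib.crc32(payload)))


def unseal(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the frame is damaged."""
    if len(message) < 8:
        raise ValueError("frame too short")
    (length,) = struct.unpack_from("!I", message, 0)
    if len(message) != 8 + length:
        raise ValueError("length field does not match frame size")
    payload = message[4:4 + length]
    (expected,) = struct.unpack_from("!I", message, 4 + length)
    if zlib.crc32(payload) != expected:
        raise ValueError("payload checksum mismatch")
    return payload


def faulty_gateway(message: bytes) -> bytes:
    """Simulated gateway with bad memory: inverts one bit of the payload."""
    buf = bytearray(message)
    buf[6] ^= 0x01  # byte 6 lies inside the payload for payloads > 2 bytes
    return bytes(buf)
```

A receiver calling unseal(faulty_gateway(seal(query))) gets a ValueError
instead of silently writing altered data into the database on the far side.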

Alex

PS Note that "gateway" here is used in the proper sense, not for "router".


From braden at ISI.EDU  Mon Apr  4 10:33:29 2005
From: braden at ISI.EDU (Bob Braden)
Date: Mon, 4 Apr 2005 10:33:29 -0700 (PDT)
Subject: [e2e] UDP checksum field?
Message-ID: <200504041733.KAA26987@gra.isi.edu>


  *> explaining exactly the problem and urging the TNS checksum be implemented.  No
  *> response ever came back, and, if you look at a TNS packet today, the checksum
  *> is still zero.  I guess no one has used the gateway software who cares about
  *> their data.  :]
  *> 
  *> Alex
  *> 

Or, the incidence of (detected) failures is so low that no one cares.

Bob Braden

From lynne at telemuse.net  Mon Apr  4 10:46:29 2005
From: lynne at telemuse.net (Lynne Jolitz)
Date: Mon, 4 Apr 2005 10:46:29 -0700
Subject: [e2e] UDP checksum field?
In-Reply-To: <42517328.77FD2664@attglobal.net>
Message-ID: <002201c5393e$3b629840$6e8944c6@telemuse.net>

(With no apologies to Microsoft...) - If the Oracle tech guy had gone to the Microsoft Research school of obfuscation, he would have said "The probability of this event occurring such that the reliability of the underlying link layer is impaired by an improbably low memory bit error at ten to the minus 12 excluding thermal radiative factors and charge displacement is so low as to be impossible, hence the question is irrelevant". :-)
Lynne Jolitz

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net



From craig at aland.bbn.com  Mon Apr  4 11:32:10 2005
From: craig at aland.bbn.com (Craig Partridge)
Date: Mon, 04 Apr 2005 14:32:10 -0400
Subject: [e2e] UDP checksum field?
In-Reply-To: Your message of "Mon, 04 Apr 2005 10:33:29 PDT."
	<200504041733.KAA26987@gra.isi.edu> 
Message-ID: <20050404183210.1AAF324B@aland.bbn.com>


In message <200504041733.KAA26987 at gra.isi.edu>, Bob Braden writes:

>  *> explaining exactly the problem and urging the TNS checksum be implemented.  No
>  *> response ever came back, and, if you look at a TNS packet today, the checksum
>  *> is still zero.  I guess no one has used the gateway software who cares about
>  *> their data.  :]
>  *> 
>  *> Alex
>  *> 
>
>Or, the incidence of (detected) failures is so low that no one cares.

I vaguely recall that some part of BBN had experience with the NFS
checksum problem and that it took a while for the corruption of the
filesystem to become visible.  That is, errors are infrequent enough
that NIC (or switch, or whatever, ...) testing doesn't typically catch
them.  So bit rot is slow and subtle -- and when you find it, much has
been trashed (especially if one ignores early warning signs, such as
large compilations occasionally failing with unrepeatable loading/compilation
errors).

Craig

From lynne at telemuse.net  Mon Apr  4 11:58:12 2005
From: lynne at telemuse.net (Lynne Jolitz)
Date: Mon, 4 Apr 2005 11:58:12 -0700
Subject: [e2e] UDP checksum field?
In-Reply-To: <20050404183210.1AAF324B@aland.bbn.com>
Message-ID: <003301c53948$3fdcf140$6e8944c6@telemuse.net>

Absolutely right Craig - this was exactly the case with the Sunbox project I described earlier, as well as the datacenter mirror example. Too much damage too late.

As implicit dependence on reliability increases, the value of checksums becomes very clear - in the early deep space probes they learned the hard way the importance of always providing enough redundancy and error correction, because a single bit error might be the one that leads to the destruction of the communications ability of the spacecraft. One spacecraft had a corruption error like this that destroyed it for precisely this reason. They optimized out reliability to get a slightly greater data rate, and lost the spacecraft (this has happened more than once).

We're reaching a point where you have to seriously think about whether an "optimization" is really valuable - since as Craig notes, you may not notice a problem until too late. In this age of ubiquitous computing, with plentiful processor, memory, and network bandwidth, we should be focussed on increased reliability and integrity, but old habits of a more parsimonious age die hard.

Another very recent example of ignoring the value of checksums is reflected in the recent 'fasttrack' problems of incorrect billing of tolls.
Lynne.
> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Craig Partridge
> Sent: Monday, April 04, 2005 11:32 AM
...
> I vaguely recall that some part of BBN had experience with the NFS
> checksum problem and that it took a while for the corruption of the
> filesystem to become visible.  That is, errors are infrequent enough
> that NIC (or switch, or whatever, ...) testing doesn't typically catch
> them.  So bit rot is slow and subtle -- and when you find it, much has
> been trashed (especially if one ignores early warning signs, such as
> large compilations occasionally failing with unrepeatable 
> loading/compilation
> errors).
> 
> Craig
> 

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net

From jonathan at dsg.stanford.edu  Mon Apr  4 12:46:04 2005
From: jonathan at dsg.stanford.edu (Jonathan Stone)
Date: Mon, 04 Apr 2005 12:46:04 -0700
Subject: [e2e] UDP checksum field?
In-Reply-To: Your message of "Mon, 04 Apr 2005 14:32:10 EDT."
	<20050404183210.1AAF324B@aland.bbn.com> 
Message-ID: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>


In message <20050404183210.1AAF324B at aland.bbn.com>,
Craig Partridge writes:

>In message <200504041733.KAA26987 at gra.isi.edu>, Bob Braden writes:
>
>>  *> explaining exactly the problem and urging the TNS checksum be implemented.  No
>>  *> response ever came back, and, if you look at a TNS packet today, the checksum
>>  *> is still zero.  I guess no one has used the gateway software who cares about
>>  *> their data.  :]
>>  *> 
>>  *> Alex
>>  *> 
>>
>>Or, the incidence of (detected) failures is so low that no one cares.
>
>I vaguely recall that some part of BBN had experience with the NFS
>checksum problem and that it took a while for the corruption of the
>filesystem to become visible.  That is, errors are infrequent enough
>that NIC (or switch, or whatever, ...) testing doesn't typically catch
>them.  So bit rot is slow and subtle -- and when you find it, much has
>been trashed (especially if one ignores early warning signs, such as
>large compilations occasionally failing with unrepeatable loading/compilation
>errors).

Hi Craig,

I believe Steve Crocker mentioned this point after I presented one of
our papers on e2e checksums.  This instance was, again, a large NFS
server (don't know if it was BBN or elsewhere), where the data
corruption was not detected until after several backup cycles. So even
the backup tapes were corrupted.  I was told people working on key
projects had to go back to hardcopy print-outs and retype them.

Whether it's safe to trust outboard checksum offload is a whole
other story.

From dpreed at reed.com  Mon Apr  4 14:19:58 2005
From: dpreed at reed.com (David P. Reed)
Date: Mon, 04 Apr 2005 17:19:58 -0400
Subject: [e2e] UDP checksum field?
In-Reply-To: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>
References: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>
Message-ID: <4251AF7E.9050002@reed.com>

When all is said and done, the UDP checksum isn't, and never was, fully 
end-to-end protection, since there are few, if any, applications where 
the correctness of the application data can be *fully assured* by making 
sure that a single datagram gets delivered correctly.  It's an optional 
standardized way to help deal with a common risk that can arise due to 
bugs and other issues that show up in engineered systems, not a 
guarantee of any particular property.

Since UDP datagrams can still be duplicated and modified by a 
checksum-preserving modification in the network (such modifications are 
now common, given middleboxes that discard the checksum and compute a 
new one in many cases), there is no way to assure by a mere checksum 
field that data has not been corrupted somewhere in the network.   
Assurance is not the benefit, applications still need to do truly 
end-to-end checking - UDP's ability to help in detecting incipient 
problems is very useful, however.
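
As a small illustration of why a checksum alone cannot give assurance
(this sketch is not from the original message): the 16-bit ones'
complement sum used by UDP, in the style of RFC 1071, does not depend on
word order, so swapping two aligned 16-bit words leaves it unchanged.
The function name and the sample bytes are made up for the example, and
the UDP pseudo-header is left out for brevity.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payloads: the first two 16-bit words are swapped.
original = b"\x12\x34\x56\x78\xab\xcd"
swapped = b"\x56\x78\x12\x34\xab\xcd"
# Hypothetical payload with a single flipped bit in the first octet.
flipped = b"\x13\x34\x56\x78\xab\xcd"

same = internet_checksum(original) == internet_checksum(swapped)
caught = internet_checksum(original) != internet_checksum(flipped)
print("word swap goes unnoticed:", same)
print("single bit flip is caught:", caught)
```

A single flipped bit is caught, the reordering is not, which is one
reason an application-level check is still needed.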

I won't elaborate here on the more subtle issues of TCP's lack of true 
end-to-end reliability.   Suffice it to say that there is a difficult 
issue in a definition of reliability that must depend on the difference 
between "design errors" and "random errors".


From jag8719 at vip.sina.com  Mon Apr  4 17:58:18 2005
From: jag8719 at vip.sina.com (Jason Gao)
Date: Tue, 5 Apr 2005 08:58:18 +0800
Subject: [e2e] Paper on ATP;
	end to end security provided by ATP: where SHA1-80 is enough
Message-ID: <200504050052.j350qv620435@boreas.isi.edu>

The draft on Asymmetric Transport Protocol:
http://219.232.1.66/attached/info/atp-2004.pdf

It has a security feature: an encrypted transport mode which combines AES and
SHA1. It is suggested that the mode be optional when ATP is over UDP and
mandatory when over IPv6. The algorithm (rewritten and clarified recently,
renamed to AES-SHA1-CV) is as follows:

Problem Space:
To ensure that successive packets come from the same source (identity of the
source), and at the same time to protect the confidentiality of the payload.

(It is not in the same problem space as AES-CCM or AES-OCB.)

It is assumed that a shared secret of at least 283 bits has been
established (using the Elliptic Curve Diffie-Hellman key-agreement process,
elliptic group sect283k1).

In encrypted transport mode, if there is a non-null payload, no extension
header may sit between the ATP fixed header and the payload. The structure
of the ATP packet is:

0                   1                   2                   3
|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|2|3|4|5|6|7|8|9|0|1|
|---------------------------------------------------------------|
|     OpCode    |         Data Segment Length                   |
|---------------------------------------------------------------|
|                         Sequence Number                       |0
|---------------------------------------------------------------|
|                         Connection                            | 
|                             Key                               |1
|---------------------------------------------------------------|
|                  Next Expected Sequence Number                |
|---------------------------------------------------------------|
| Stack Pointer |       Flags   |           Identity            |2
|--------------------------------          /Integrity           |
|                             Check                             |
|                             Code                              |3
|---------------------------------------------------------------|
~                                                               ~
~                         Payload                               ~
~                                                               ~
|---------------------------------------------------------------|

The ATP fixed header is 192 bits. When ATP runs over UDP, the full ATP fixed
header is stored next to the UDP header. When it runs over native IPv6, the
ATP fixed header is apparently 128 bits.

Sender's behavior
-----------------
Encryption, Composition of ICC:

Step 1, Get the high-order 16 bits of ICC and the comparison vector (CV):
    IV80 = SHA1-80(Fixed Header Excluding ICC, Shared Secret)
Namely, replace the ICC field with the Shared Secret and apply SHA1-80.
The high-order 16 bits of IV80 SHALL be stored in the high-order 16 bits of
the ICC field, while the low-order 64 bits are taken as the CV.

Note that here ',' denotes concatenation.
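
Step 1 can be sketched as follows. This is only an illustration under
two assumptions that the text above does not state: that "SHA1-80" means
the leftmost 80 bits of the SHA-1 digest, and that the ICC is the last
field of the fixed header, so replacing it with the shared secret
amounts to appending the secret. The function names are hypothetical.

```python
import hashlib


def sha1_80(data: bytes) -> bytes:
    # Assumption: SHA1-80 is SHA-1 truncated to its leftmost 80 bits.
    return hashlib.sha1(data).digest()[:10]


def icc_high16_and_cv(header_without_icc: bytes, shared_secret: bytes):
    """Return (high-order 16 bits for the ICC field, 64-bit CV)."""
    # ',' in the text denotes concatenation; here it is bytes '+'.
    iv80 = sha1_80(header_without_icc + shared_secret)
    icc_high16 = iv80[:2]  # high-order 16 bits of IV80
    cv = iv80[2:]          # low-order 64 bits of IV80
    return icc_high16, cv


# Hypothetical inputs, for illustration only.
high16, cv = icc_high16_and_cv(bytes(14), b"example shared secret")
print(high16.hex(), cv.hex())
```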

Step 2, Padding
The length of the cipher text, which is determined by the length of the
original clear text according to the padding method stated hereafter, is
stored in the data segment length field. It equals the length of the padded
clear text minus 8.

The original clear text is first padded with a sequence of zero or more
octets. The length of the octet sequence is 15 if the original clear text
is already 128-bit aligned; otherwise it is the number of octets required
to make the clear text 128-bit aligned, minus 1. The length of the octet
sequence, represented by a single octet, is then appended as the last
octet. (The idea is borrowed from ESP, slightly modified.)

Then the clear text is padded with the initial 64 bits of the ATP packet,
which include the OpCode, the data segment length and the sequence number,
and lastly with the 64-bit CV.
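
The padding rule can be sketched like this. It is a reading of the
description above, not code from the draft: the filler octets are
assumed to be zero (the text does not say what they contain), lengths
are counted in octets, and the function and parameter names are
hypothetical.

```python
BLOCK = 16  # AES block size in octets (128 bits)


def pad_clear_text(clear_text: bytes, first64: bytes, cv: bytes) -> bytes:
    """Pad as in Step 2: filler, length octet, first 64 header bits, CV."""
    if len(first64) != 8 or len(cv) != 8:
        raise ValueError("first64 and cv must each be 8 octets")
    # 15 filler octets if already aligned, otherwise one fewer than the
    # number of octets needed to reach the next 128-bit boundary.
    pad_len = 15 - (len(clear_text) % BLOCK)
    filler = bytes(pad_len)  # assumption: filler octets are zero
    padded = clear_text + filler + bytes([pad_len])
    # The final 128-bit block is the first 64 header bits plus the CV.
    return padded + first64 + cv


def data_segment_length(padded: bytes) -> int:
    # Per the text: length of the padded clear text minus 8.
    return len(padded) - 8


padded = pad_clear_text(b"hello", bytes(8), bytes(8))
print(len(padded), data_segment_length(padded))
```

With this rule the padded clear text is always a whole number of AES
blocks, and the last block is exactly the value that Step 3 uses as IV.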

Step 3, AES-CBC encryption
The last 128-bit block of the padded clear text is taken as the
initialization vector (IV). The IV and the full padded clear text are fed
into the AES-CBC encryption module. The key SHOULD be installed by the
ULA. By default the key is derived from the shared secret. The key
derivation function MUST conform to ANSI-X9.63-KDF [KDF].

The first 64 bits of the cipher text are stored in the low-order 64-bit
position of the ICC field. The following bits are stored in the payload
field of the ATP packet.


Receiver's behavior
-------------------
Decryption, Verification of ICC

Step 1, Rebuild IV and Preliminary check
Again, IV80 = SHA1-80(Fixed Header Excluding ICC, Shared Secret).
The high-order 16 bits are compared with the high-order 16 bits of the ICC
field of the ATP packet received to preliminarily check whether the packet
came from the same source.

The initial 64 bits of the ATP packet received and the low-order 64 bits of
IV80 form the IV'.
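
A sketch of the receiver's Step 1, under the same assumptions as on the
sender side (SHA1-80 taken as the leftmost 80 bits of SHA-1, the shared
secret appended in place of the ICC). The function name and its return
convention (None meaning "discard") are hypothetical.

```python
import hashlib
import hmac


def rebuild_iv(header_without_icc: bytes, shared_secret: bytes,
               received_icc_high16: bytes, first64: bytes):
    """Return IV' (16 octets), or None if the preliminary check fails."""
    # Assumption: SHA1-80 is SHA-1 truncated to its leftmost 80 bits.
    iv80 = hashlib.sha1(header_without_icc + shared_secret).digest()[:10]
    # Preliminary check on the high-order 16 bits of the ICC field.
    if not hmac.compare_digest(iv80[:2], received_icc_high16):
        return None
    # IV' = initial 64 bits of the packet + low-order 64 bits of IV80.
    return first64 + iv80[2:]


# Hypothetical values, for illustration only.
header, secret = bytes(14), b"example shared secret"
good = hashlib.sha1(header + secret).digest()[:2]
print(rebuild_iv(header, secret, good, bytes(8)) is not None)
```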

Step 2, AES-CBC decryption
The cipher text, taken from the low-order 64-bit position of the ICC field
and the payload field, together with the IV and the AES key are fed into the
AES-CBC decryption module.

Step 3, Verifying IV
The last block of the decryption result is compared with IV'. If they are
equal, the packet SHALL be accepted. Otherwise it MUST be silently discarded.

Finally, padding is removed and the clear text payload is delivered to the
ULA.


----
It is straightforward to modify the algorithm to use SHA1-144 to obtain the
high-order 16 bits of ICC and the 128-bit initialization vector.  We choose
SHA1-80 because the secure hash algorithm applied used to be MD5, and we
believe that entropy space of 64 bit for the initial word of the IV is
enough.

The algorithm is actually a combination of AES-CBC and partial HMAC. Partial
HMAC protects the packet header, which is very short (effectively at most 112
bits take part in the partial HMAC). AES-CBC provides both encryption and
message authentication service for the payload.

The problem space for the attacker is:
Provided that the low-order 64 bits of the MAC value are known (the
high-order 64 bits of the MAC value are encrypted and unknown to the
attacker, and it is easy to modify the algorithm to make the whole MAC value
confidential), and the IV is the clear text of the last block, find a
sequence of octets which yields the same AES-CBC-MAC value, where the
high-order 64 bits of the MAC value must equal the unknown partial HMAC
result of the fixed header while the low-order 64 bits equal the initial 64
bits of the ATP fixed header. Four fields of the fixed header may be
modified, to a limited extent: the data segment length, the sequence number,
the next expected sequence number and the flags.

We believe that the problem is so hard that SHA1-80 is enough here.

-------------------
Acknowledgement

Thanks to Stephen Sprunk. He made us aware that the original name of the
algorithm, AES-IV-SHA1-80 is misleading and the algorithm itself is obscure.
We hope that it is corrected and clarified.


From marc.herbert at free.fr  Tue Apr  5 02:41:59 2005
From: marc.herbert at free.fr (Marc Herbert)
Date: Tue, 5 Apr 2005 11:41:59 +0200 (CEST)
Subject: [e2e] very simple IP QoS for the bottleneck access link ? (was Skype
	and congestion collapse.)
In-Reply-To: <9531abdc241f450e15fa92b84fe74310@extremenetworks.com>
References: <11ad0fa8050304053342514f51@mail.gmail.com>
	<200503041318.37290.don@dhoffman.net>
	<4228E595.9030407@dirtcheapemail.com>
	<9531abdc241f450e15fa92b84fe74310@extremenetworks.com>
Message-ID: <Pine.LNX.4.58.0503081136580.12032@fcat>

On Fri, 4 Mar 2005, RJ Atkinson wrote:

> On Mar 4, 2005, at 17:47, Clark Gaylord wrote:
> > This is why we really do need some notion of QoS other than The Fat
> > Pipe.  It doesn't have to be as elaborate as RSVP-disciplined CAC, but
> > you need to be able to prioritize traffic that matters and limit the
> > amount of traffic that gets prioritized.  It doesn't have to be more
> > complex than that, but it has to do at least that.  [Ergo ... left as
> > an exercise to the reader.]
>
> I don't know that the "network" needs to have a more sophisticated
> notion of QoS than best effort.  It can sometimes be useful for the
> network device connected directly to a congested link (e.g. access
> link between a site and its upstream provider) to have some
> internal-to-the-box QoS configuration.
>
> It is not uncommon these days for the access router at the customer
> premise to have some ACL ruleset that prefers some traffic over
> other traffic or rate-limits certain kinds of traffic -- and
> equivalent configuration of the aggregation router on the ISP side
> of the same link is also not uncommon these days.

OK, so why not generalize, extend, standardize, promote and sell this
technique?  To the point of creating an extremely simple QoS API
allowing latency-sensitive applications (assumed to be CBR, as most
are) to register their traffic with both ends of the access link. This
API would just reliably replace ugly hacks like guessing about
"well-known" UDP ports or tedious manual configurations.

Let's assume a network overprovisioned at the core, where the
bottleneck is the access link for a significant number of nodes
(_significant_, not even "majority"). This looks a lot like the
current Internet to me. Looking at current technology trends, this
looks like it's gonna stay like this for long. OK, maybe some
revolutions in transmissions and economics we can't envision today
would make the assumptions above wrong in the end. But in the end, we
are all dead anyway.

For nodes whose access link is not the bottleneck, then this does not
apply, and they have to solve this latency issue by some other
means, assuming they want to solve it. That's all. Simple.

The implementation looks simple. The latency-sensitive application
regularly sends to both access link halves (up- and down- stream) some
way to identify their packets (for instance: dst UDP port 27015
belongs to the higher class). The access link implements strict priority
for those latency-sensitive packets.  Elastic traffic takes the rest.
Only two traffic classes, can be implemented cheaply by a DSLAM and by
a consumer device. No complex configurations. For those customers who
only have a poor USB DSL modem, this could be implemented in the PC
itself.
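
To make the idea concrete, here is a minimal sketch of one end of such
an access link: applications register a destination UDP port, and the
link serves registered traffic strictly before everything else. The
class and method names are hypothetical, and the periodic refresh
("regularly sends") is left out; a real device would expire
registrations that are not renewed.

```python
from collections import deque


class AccessLinkScheduler:
    """Two-class strict-priority queue for one end of an access link."""

    def __init__(self):
        self.registered_ports = set()  # ports declared latency-sensitive
        self.high = deque()            # latency-sensitive packets
        self.low = deque()             # elastic traffic

    def register(self, dst_udp_port: int) -> None:
        self.registered_ports.add(dst_udp_port)

    def enqueue(self, dst_udp_port: int, packet) -> None:
        if dst_udp_port in self.registered_ports:
            self.high.append(packet)
        else:
            self.low.append(packet)

    def dequeue(self):
        """Next packet to transmit, or None if both queues are empty."""
        if self.high:
            return self.high.popleft()  # strict priority: high first
        if self.low:
            return self.low.popleft()
        return None


link = AccessLinkScheduler()
link.register(27015)  # the example port from the text above
link.enqueue(80, "bulk-1")
link.enqueue(27015, "game-1")
print(link.dequeue())  # the registered packet leaves first
```

Legacy applications need no change: anything unregistered simply lands
in the lower class.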

Since it's local it's scalable. No need to perform QoS at lightning
speed, the load is spread across numerous network ends, etc.

Since it's local it's incremental. It's incremental in the sense you
can deploy it for one customer and not the other without any issue.
It's incremental in the sense some ISP can start offering it without
caring about the other ISPs. It's incremental in the sense you can
deploy it for some applications and not the others _on the same access
link_. Legacy applications just get the lower class. It's incremental
in the sense you can deploy it first for the upstream access link (the
biggest issue today because of the "A" in ADSL)  before the downstream
link.

It's also incremental in the sense you can make it peacefully co-exist
with a more primitive and less reliable "guess well-known UDP ports"
approach. It's incremental in the sense that, once started,
applications will have a strong incentive to move to this API.

What about the user registering too much traffic in the upper priority
class?  Well, it fails. No worse than today. Most internet users now
know how to solve this congestion issue (observed immediately): they
shut some applications down. No computations, the simple try and fix
approach known today, only better. The only added complexity is the
two classes. Since most elastic applications report the currently used
throughput, users would not have a hard time understanding that
shutting down an application that is left with zero kb/s will not
solve their congestion issue in this case.

Since it's local I hardly see any security issue. Well you can imagine
some rogue application running in your home and stealing bandwidth, but
then I would say you have a much bigger issue anyway.

From the point of view of the end to end argument, you can think of it
as the definition of "the end" having been extended to include the access
link. Is this too heretical? IMHO there have been much worse
deviations from The Argument in network history (firewalls anyone?).

Do you think it could have any economic viability? I think that if
just one ISP and one CBR killer app (Skype, a game, whatever) would
start to package it then it would sell. "No more lag thanks to our
brand new low-ping advanced technology. Now you can download and play
at the same time". You can even give to power users an advanced link
access controller allowing them to prioritize most legacy applications
and widening the market potential, attracting all geeks.

Any issues I missed ?  There must be some. This looks too good to
be true :-) Thanks a lot in advance for your comments.


-- 
So einfach wie möglich. Aber nicht einfacher -- Albert Einstein


From s.malik at tuhh.de  Tue Apr  5 05:34:46 2005
From: s.malik at tuhh.de (Sireen Habib Malik)
Date: Tue, 05 Apr 2005 14:34:46 +0200
Subject: [e2e] very simple IP QoS for the bottleneck access link ? (was
	Skype and congestion collapse.)
In-Reply-To: <Pine.LNX.4.58.0503081136580.12032@fcat>
References: <11ad0fa8050304053342514f51@mail.gmail.com>	<200503041318.37290.don@dhoffman.net>	<4228E595.9030407@dirtcheapemail.com>	<9531abdc241f450e15fa92b84fe74310@extremenetworks.com>
	<Pine.LNX.4.58.0503081136580.12032@fcat>
Message-ID: <425285E6.6020905@tuhh.de>


Marc Herbert wrote:

>The implementation looks simple. The latency-sensitive application
>regularly sends to both access link halves (up- and down- stream) some
>way to identify their packets (for instance: dst UDP port 27015
>belongs to the higher class). The access link implements strict priority
>for those latency-sensitive packets.  Elastic traffic takes the rest.
>Only two traffic classes, can be implemented cheaply by a DSLAM and by
>a consumer device. No complex configurations. For those customers who
>only have a poor USB DSL modem, this could be implemented in the PC
>itself.
>
>  
>
Here is my understanding of how it is done today. The end node marks the
Layer-2 CoS and/or Layer-3 DSCP fields of the IP/UDP/RTP/Voice packet.
Voice traffic is given the top priority and is sent into a Priority 
Queue (PQ). The low priority queue could be RED or Weighted-RED, WFQ, etc.

In order for this to work, the end must be in the "trust" region, i.e. the
CoS/DSCP fields should not be reset by the downstream routers/switches 
in the path.

The presence of the other, lower priority, queue adds to the 
"variations" of the departing voice traffic from the PQ. Studies have 
shown that packet delay for this type of queue can be well bounded by the
M/D/1 delay + residual time of the lower priority packets.  It is to be 
noted that if an MPLS type of tunnel is used for "voice only" then delay 
is modeled with a SUM(D)/D/1 type of system which has significantly lower 
mean packet delay. So there are trade-offs.
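
One textbook way to put numbers on "M/D/1 delay + residual time" is the
mean waiting time of the top class in a non-preemptive priority queue
with Poisson arrivals: W = R / (1 - rho_voice), where R is the mean
residual service time seen on arrival. The sketch below assumes
fixed-size packets in both classes and Poisson arrivals; these are
modelling assumptions for the example, not results taken from the
studies mentioned above, and the function name and sample figures are
made up.

```python
def voice_mean_wait(lam_voice: float, d_voice: float,
                    lam_low: float, d_low: float) -> float:
    """Mean queueing wait (seconds) of voice packets in the priority queue.

    lam_* are arrival rates in packets per second; d_* are the fixed
    transmission times in seconds of one packet of each class.
    """
    rho_voice = lam_voice * d_voice
    rho_low = lam_low * d_low
    if rho_voice + rho_low >= 1.0:
        raise ValueError("total offered load must be below 1")
    # Mean residual service time, contributed by both classes.
    residual = 0.5 * (lam_voice * d_voice ** 2 + lam_low * d_low ** 2)
    return residual / (1.0 - rho_voice)


# Voice alone: this reduces to the plain M/D/1 waiting time.
alone = voice_mean_wait(50.0, 0.01, 0.0, 0.0)
# Same voice load sharing the link with low priority bulk packets.
shared = voice_mean_wait(50.0, 0.01, 20.0, 0.012)
print(alone, shared)
```

The gap between the two results is the extra wait caused by a low
priority packet that is already on the wire when a voice packet arrives.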

Please note, QoS for VoIP is a "Mouth-To-Ear" issue, so many other 
factors get involved.
--
SM


>Since it's local it's scalable. No need to perform QoS at lightning
>speed, the load is spread across numerous network ends, etc.
>
>Since it's local it's incremental. It's incremental in the sense you
>can deploy it for one customer and not the other without any issue.
>It's incremental in the sense some ISP can start offering it without
>caring about the other ISPs. It's incremental in the sense you can
>deploy it for some applications and not the others _on the same access
>link_. Legacy applications just get the lower class. It's incremental
>in the sense you can deploy it first for the upstream access link (the
>biggest issue today because of the "A" in ADSL)  before the downstream
>link.
>
>It's also incremental in the sense you can make it peacefully co-exist
>with a more primitive and less reliable "guess well-known UDP ports"
>approach. It's incremental in the sense that, once started,
>applications will have a strong incentive to move to this API.
>
>What about the user registering too much traffic in the upper priority
>class?  Well, it fails. No worse than today. Most internet users now
>know how to solve this congestion issue (observed immediately): they
>shut some applications down. No computations, the simple try and fix
>approach known today, only better. The only added complexity is the
>two classes. Since most elastic applications report the currently used
>throughput, users would not have a hard time understanding that
>shutting down an application that is left with zero kb/s will not
>solve their congestion issue in this case.
>
>Since it's local I hardly see any security issue. Well you can imagine
>some rogue application running in your home and stealing bandwidth, but
>then I would say you have a much bigger issue anyway.
>
>From the point of view of the end to end argument, you can think of it
>as the definition of "the end" having been extended to include the access
>link. Is this too heretical? IMHO there have been much worse
>deviations from The Argument in network history (firewalls anyone?).
>
>Do you think it could have any economic viability? I think that if
>just one ISP and one CBR killer app (Skype, a game, whatever) would
>start to package it then it would sell. "No more lag thanks to our
>brand new low-ping advanced technology. Now you can download and play
>at the same time". You can even give to power users an advanced link
>access controller allowing them to prioritize most legacy applications
>and widening the market potential, attracting all geeks.
>
>Any issues I missed ?  There must be some. This looks too good to
>be true :-) Thanks a lot in advance for your comments.
>
>
>
>  
>


-- 

Sireen Malik, M.Sc.
PhD. Candidate,

Communication Networks
Hamburg University of  Technology,
FSP 4-06 (room 3008)
Denickestr. 17
21073 Hamburg, Deutschland

Tel: +49 (40) 42-878-3387
Fax: +49 (40) 42-878-2941
E-Mail: s.malik at tuhh.de

--Everything should be as simple as possible, but no simpler (Albert Einstein)


From marc.herbert at free.fr  Tue Apr  5 08:46:39 2005
From: marc.herbert at free.fr (Marc Herbert)
Date: Tue, 5 Apr 2005 17:46:39 +0200 (CEST)
Subject: [e2e] very simple IP QoS for the bottleneck access link ?
In-Reply-To: <425285E6.6020905@tuhh.de>
References: <11ad0fa8050304053342514f51@mail.gmail.com>
	<200503041318.37290.don@dhoffman.net>
	<4228E595.9030407@dirtcheapemail.com>
	<9531abdc241f450e15fa92b84fe74310@extremenetworks.com>
	<Pine.LNX.4.58.0503081136580.12032@fcat> <425285E6.6020905@tuhh.de>
Message-ID: <Pine.LNX.4.58.0504051738390.8713@fcat>

On Tue, 5 Apr 2005, Sireen Habib Malik wrote:

> Marc Herbert wrote:
>
> >The implementation looks simple. The latency-sensitive application
> >regularly sends to both access link halves (up- and down- stream) some
> >way to identify their packets (for instance: dst UDP port 27015
> >belongs to the higher class). The access link implements strict priority
> >for those latency-sensitive packets.  Elastic traffic takes the rest.
> >Only two traffic classes, can be implemented cheaply by a DSLAM and by
> >a consumer device. No complex configurations. For those customers who
> >only have a poor USB DSL modem, this could be implemented in the PC
> >itself.

> Here is my understanding of how it is done today. End node marks Layer-2
> CoS and/or Layer-3 DSCP fields of the IP/UDP/RTP/Voice packet.
> Voice traffic is given the top priority and is sent into a Priority
> Queue (PPQ). The low priority queue could be RED or Weighted-RED, WFQ, etc.
>
> In order for this to work, the end must be in the "trust" region i.e
> CoS/DSCP fields should not be reset by the downstream routers/switches
> in the path.

> It is to be noted that if an MPLS type of tunnel is used...

Now I am not sure I made myself clear... I am talking about a very
simple solution _local_, _private_ to the access link, and to solve
only the bottleneck issue at the access link. You get what you paid
for. But it could still be very interesting IMHO.

So no tunnels, no routers involved at all.

Concerning VoIP for instance, each end would have to implement this
trick on its own access link _independently_ from the other end. If
only one end does, well only this end can abuse its access link with
P2P traffic while phoning simultaneously. The other end has to stop
eMule as usual.


-- 
So einfach wie möglich. Aber nicht einfacher -- Albert Einstein


From cannara at attglobal.net  Tue Apr  5 09:31:01 2005
From: cannara at attglobal.net (Cannara)
Date: Tue, 05 Apr 2005 09:31:01 -0700
Subject: [e2e] UDP checksum field?
References: <002201c5393e$3b629840$6e8944c6@telemuse.net>
Message-ID: <4252BD45.572A0A55@attglobal.net>

Or, as Steve Ballmer, Prince of OS/2 LanManager, King of Faulty Releases, would
glare:  "WAD, so stifle".  (WAD = works as designed)

:]

Alex

Lynne Jolitz wrote:
> 
> (With no apologies to Microsoft...) - If the Oracle tech guy had gone to the Microsoft Research school of obfuscation, he would have said "The probability of this event occurring such that the reliability of the underlying link layer is impaired by an improbably low memory bit error at ten to the minus 12 excluding thermal radiative factors and charge displacement is so low as to be impossible, hence the question is irrelevant". :-)
> Lynne Jolitz
> 
> ----
> We use SpamQuiz.
> If your ISP didn't make the grade try http://lynne.telemuse.net
> 
> > -----Original Message-----
> > From: end2end-interest-bounces at postel.org
> > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Cannara
> > Sent: Monday, April 04, 2005 10:03 AM
> > To: end2end-interest at postel.org
> > Subject: Re: [e2e] UDP checksum field?
> >
> >
> > I'll add a funny (if you're not using Oracle TNS gateways) SQL transport
> > example that still exists today, despite being pointed out to Oracle
> > about a decade ago.  When Network General was adding more SQL decodes to
> > the Sniffer(r), in the '90s, we had a presentation on the Oracle
> > transport (TNS) underlying SQL Net traffic.  TNS rode on Netware SPP, or
> > TCP, etc.  The fellow went into packet fields in detail and explained how
> > Oracle also made gateway software available for Sun boxes to go from an
> > Oracle system to an IBM SNA db system.  The gateway received SQL on TNS
> > on TCP on IP on Ethernet (for instance) and spit out SQL on TNS or
> > whatever IBM wanted.
> >
> > As he expounded on TNS pkt fields, a few hands went up -- "What's the
> > checksum field for if it's always 0?" asked a few experienced network
> > folks. The presenter turned back to the slide show and said: "It's
> > unimplemented for now".  Without malice, another question was posed:
> > "Well if it's unused and your gateway has bad memory, how do you know the
> > data going into the db on the other side will be good?"  The presenter, a
> > highly lauded Oracle techy, looked at the screen for a bit, looked back
> > at the audience, shuffled his feet, looked again at the screen, and
> > finally said words like:  "I don't know".
> >
> > After the presentation, a letter was written to Oracle, copied to
> > Ellison, explaining exactly the problem and urging the TNS checksum be
> > implemented.  No response ever came back, and, if you look at a TNS
> > packet today, the checksum is still zero.  I guess no one has used the
> > gateway software who cares about their data.  :]
> >
> > Alex
> >
> > PS Note that "gateway" here is used in the proper sense, not for "router".
> >
> > Lynne Jolitz wrote:
> > >
> > > Yes, Lloyd is exactly right here. It is often the case that
> > > people turn off UDP checksums to "buy" more performance by
> > > relying on the CRC of the ethernet packet. It's not a stupid
> > > question - it's a very smart question, and a lot of smart people
> > > get fooled by this.
> > >
> > > For example, the Sun datacenter back in the early 1990's had an
> > > NFS cluster project called Sunbox - an array of workstation CPUs
> > > that did divide and conquer to build a massive file server. It
> > > used an ethernet multiplexer to dynamically split the load. To
> > > buy back performance, they turned off the UDP checksum. It worked
> > > fine until they had a bad lot of ethernet boards with substandard
> > > memories - this wasn't picked up in tests because the test units
> > > were doing resends of the occasionally corrupted packets (UDP
> > > checksums usually were turned on), and in TCP the checksums would
> > > do resends as well. It was also a fairly rare problem, and the
> > > test periods were too short to pick up on the nature of this
> > > problem easily.
> > >
> > > But when UDP checksums were turned off in normal use, the
> > > resulting NFS requests were corrupting the filesystem (which in
> > > this case were database files), forcing rebuilds and manual
> > > repairs of database tables.
> > >
> > > As they were about to announce and release it, they suddenly
> > > discovered this problem - they noticed the corruption and in
> > > order to determine whether it was in the high level (stack or
> > > above) or lower levels, they turned on checksums and it worked
> > > immediately.
> > >
> > > They then examined the failed checksum packets to trace back in
> > > the lower level stack-down through the link layer to discover
> > > where the corruption occurred. With logic analyzers, they were
> > > able to observe that the contents going into memory from the NIC on
> > > reception were different from the contents going out of the memory
> > > and traveling across the bus to the processor.
> > >
> > > This is a surprisingly common problem in datacenters -
> > > sometimes the problem would be a switch, sometimes a
> > > configuration error, sometimes a programming error in the
> > > application, and so forth. I most recently experienced this
> > > problem with an overheated ethernet switch passing video on an
> > > internal network.
> > >
> > > I also ran into this at an Internet portal company where I was
> > > a manager. We were using NetApps file servers to mirror the daily
> > > information - NetApps at the time encouraged staff to turn off
> > > checksums to increase performance. The DBAs noticed problems and
> > > ended up doing frequent rebuilds, but couldn't figure out why. It
> > > took me a lot of time to convince my staff to turn on the
> > > checksums because they were told "they don't have to" by NetApps.
> > > Most datacenter staff work by cookbook, and this wasn't in the
> > > cookbook. When they finally tried it, it worked. This little
> > > problem cost us a lot of time and aggravation for very little (if
> > > any) performance gain.
> > >
> > > Performance gain by turning off checksums now can be obviated
> > > through the use of intelligent NIC technologies like SiliconTCP
> > > (http://jolitz.telemuse.net/pubs/pt2001_01/item) and TOE that
> > > calculate the checksum as the packet is being received. But we
> > > don't have this in commodity switches yet, so check that switch
> > > if you're having problems.
> > >
> > > Higher level checksums are worth it every time. Don't leave the
> > > server without them. :-)
> > >
> > > Lynne Jolitz.
> > >
> > > ----
> > > We use SpamQuiz.
> > > If your ISP didn't make the grade try http://lynne.telemuse.net
> > >
> > > > -----Original Message-----
> > > > From: end2end-interest-bounces at postel.org
> > > > [mailto:end2end-interest-bounces at postel.org]On Behalf Of Lloyd Wood
> > > > Sent: Monday, April 04, 2005 2:48 AM
> > > > To: Faisal Aslam
> > > > Cc: end2end-interest at postel.org
> > > > Subject: Re: [e2e] UDP checksum field?
> > > >
> > > >
> > > > On Sun, 3 Apr 2005, Faisal Aslam wrote:
> > > >
> > > > > Why we have checksum field is in UDP header, as UDP does not provide
> > > > > data retransmission etc? I think it is used only to silently
> > > > > discarding a packet with wrong checksum (thats it?).
> > > >
> > > > yes - you need an end-to-end check against a corrupted packet. UDP
> > > > could have the checksum turned off, which proved disastrous for a
> > > > number of applications, subtly corrupted filing systems which didn't
> > > > have higher-level end2end checks etc.
> > > >
> > > > > Is there any  other application of checksum field?
> > > >
> > > > For other applications
> > > > http://www.faqs.org/rfcs/rfc3828.html
> > > >
> > > > UDP Lite originally sprang out of the observation that UDP has
> > > > redundant length information, and that this information could be
> > > > combined with the checksum (as in TCP/UDP) to give partial coverage.
> > > >
> > > > L.
> > > >
> > > > >
> > > > > Sorry if the question is too naive.
> > > > >
> > > > > Thanks
> > > > > Faisal
> >

From cannara at attglobal.net  Tue Apr  5 09:32:23 2005
From: cannara at attglobal.net (Cannara)
Date: Tue, 05 Apr 2005 09:32:23 -0700
Subject: [e2e] UDP checksum field?
References: <200504041733.KAA26987@gra.isi.edu>
Message-ID: <4252BD97.7E66A28E@attglobal.net>

When your Social Security check is off by a binary point, Bob, someone we all
know will care.  {:o]

Alex

Bob Braden wrote:
> 
> 
>   *> explaining exactly the problem and urging the TNS checksum be implemented.  No
>   *> response ever came back, and, if you look at a TNS packet today, the checksum
>   *> is still zero.  I guess no one has used the gateway software who cares about
>   *> their data.  :]
>   *>
>   *> Alex
>   *>
> 
> Or, the incidence of (detected) failures is so low that no one cares.
> 
> Bob Braden

From cannara at attglobal.net  Tue Apr  5 10:06:43 2005
From: cannara at attglobal.net (Cannara)
Date: Tue, 05 Apr 2005 10:06:43 -0700
Subject: [e2e] UDP checksum field?
References: <200504041733.KAA26987@gra.isi.edu>
	<Pine.GSO.4.50.0504042224570.9453-100000@argos.ee.surrey.ac.uk>
Message-ID: <4252C5A3.24AE19D7@attglobal.net>

Note that many manufacturers of bridges & routers over the years have had the
intelligence to include error-detection & correction in memory.  However, when
the marketing decisions are made about test and default configuration, that
feature is usually turned off, so performance will be better.  Check your
system manuals for those options!  

One of my personal experiences with this mistake was at a major Wall St.
investment house, where their Sun jockeys wrote trading programs that the firm
obviously depended on to make $ every second of every day in every market for
every commodity around the world.  They called us at Net Gen because their
programs were changing unpredictably and they thought "it's the network" (the
usual guess).  So, flew to NYC with a Sniffer(r) and discussed the problem: 
"m" was changing to "n", "C" to "D", "6" to "7" every once in a while in their
sources, so compilations would fail despite no changes by the programmers.  I
told them a Sniffer won't be able to see changing source files on the net, so
we sat down to draw exactly where the bodies were buried in their systems.

The short story was, debug the server that holds the sources.  Since they had
huge disc & RAM in the server, and programs were written to disc but often sat
in cache RAM for a while (even days), we decided to test disc, but especially
RAM.  No tests showed anything.  Then one of their network guys (a VP, because
banks always have only VPs access data :) said he'd heard of a special,
extremely rough pattern test.  He downloaded it, ran it, and sure enough one
small group of bits in one RAM chip was a little flakey.  If EDC RAM had been
used, it would not have been an issue.  Hey, it wasn't the network, but it was
end-end!
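
As a rough illustration of why EDC/ECC memory would have masked that
fault, here is a minimal sketch of single-error correction using the
textbook Hamming(7,4) code.  It is not what any particular memory
controller implements (real ECC memory typically protects much wider
words, and the function names below are made up for the example); the
flipped bit is chosen to mirror the "6" to "7" corruption described
above.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Return (corrected nibble, 1-based position of the flipped bit or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # syndrome names the bad position
    if position:
        c[position - 1] ^= 1          # flip it back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, position


# The low nibble of ASCII "6" (0x36) is 0x6; flipping its lowest data bit
# would turn the character into "7" (0x37).
stored = hamming74_encode(ord("6") & 0x0F)
stored[2] ^= 1                        # simulate a flaky cell: data bit d1 flips
recovered, flipped_at = hamming74_decode(stored)
repaired_char = chr((ord("6") & 0xF0) | recovered)
```

With the check bits in place, the single flipped bit is located and
corrected on read, so the source file would still have said "6".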

Alex


Lloyd Wood wrote:
> 
> On Mon, 4 Apr 2005, Bob Braden wrote:
> 
> >   *> explaining exactly the problem and urging the TNS checksum be implemented.  No
> >   *> response ever came back, and, if you look at a TNS packet today, the checksum
> >   *> is still zero.  I guess no one has used the gateway software who cares about
> >   *> their data.  :]
> >   *>
> >   *> Alex
> >   *>
> >
> > Or, the incidence of (detected) failures is so low that no one cares.
> 
> This is arguably currently the state with RAM. If you write to a
> memory subsystem, you would like some confidence that when you read it
> back the value is correct. This is often assumed.
> 
> You can write a paranoid application to write to memory locations
> multiple times (and those sticking computers in orbit do), read back
> and compare and check all of memory for reliability periodically, but
> having a checksum on each memory location can be a better safeguard,
> though it decreases memory density somewhat.
> 
> There's been much furore of late about 'bad RAM' in Apple Macintoshes;
> many computers have moved to ECC RAM, but Apple (bar its
> commercially-focused XServe) has not. (A decade ago, people were
> grumbling about Apple not using parity RAM.)
> 
> The end-to-end argument remains as valid inside the computer too.
> 
> L.

From cannara at attglobal.net  Tue Apr  5 15:18:35 2005
From: cannara at attglobal.net (Cannara)
Date: Tue, 05 Apr 2005 15:18:35 -0700
Subject: [e2e] UDP checksum field?
References: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>
	<4251AF7E.9050002@reed.com>
Message-ID: <42530EBB.C90343E5@attglobal.net>

Of course, David, but the opposite is: no checksum = no chance of
correctness.  And, the way NAT and other boxes have been intended and
deployed, many people consider them as "ends", making the mythical End-End
Principle even more of a fantasy.

Alex

"David P. Reed" wrote:
> 
> When all is said and done, the UDP checksum isn't, and never was, fully
> end-to-end protection, since there are few, if any, applications where
> the correctness of the application data can be *fully assured* by making
> sure that a single datagram gets delivered correctly.  It's an optional
> standardized way to help deal with a common risk that can arise due to
> bugs and other issues that show up in engineered systems, not a
> guarantee of any particular property.
> 
> Since UDP datagrams can still be duplicated and modified by a
> checksum-preserving modification in the network (such modifications are
> now common, given middleboxes that discard the checksum and compute a
> new one in many cases), there is no way to assure by a mere checksum
> field that data has not been corrupted somewhere in the network.
> Assurance is not the benefit, applications still need to do truly
> end-to-end checking - UDP's ability to help in detecting incipient
> problems is very useful, however.
> 
> I won't elaborate here on the more subtle issues of TCP's lack of true
> end-to-end reliability.   Suffice it to say that there is a difficult
> issue in a definition of reliability that must depend on the difference
> between "design errors" and "random errors".

From eblanton at cs.ohiou.edu  Tue Apr  5 16:48:36 2005
From: eblanton at cs.ohiou.edu (Ethan Blanton)
Date: Tue, 5 Apr 2005 18:48:36 -0500
Subject: [e2e] UDP checksum field?
In-Reply-To: <42530EBB.C90343E5@attglobal.net>
References: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>
	<4251AF7E.9050002@reed.com> <42530EBB.C90343E5@attglobal.net>
Message-ID: <20050405234836.GJ32194@colt.internal>

Cannara spake unto us the following wisdom:
> Of course, David, but the opposite is: no checksum = no chance of
> correctness.  And, the way NAT and other boxes have been intended and
> deployed, many people consider them as "ends", making the mythical End-End
> Principle even more of a fantasy.

I'm not sure exactly what you're trying to say here (I seldom am), but I
think it misses a very important point.  There are in fact a _very_
large number of applications which obey the end-to-end principle
extensively.  Take as an example class of such applications all SSL or
TLS streams over TCP.

If [heh] you have a particular axe to grind, you can probably come up
with some little semantic corner where this is not end-to-end in every
respect, but it will be just that -- a semantic little corner.  SSL over
TCP performs end-to-end flow control, end-to-end congestion control,
weak end-to-end integrity checking at the transport layer, and extremely
robust end-to-end integrity checking (possibly as well as
authentication) at the application layer.  Note that, in this example,
each layer of the stack provides the largest reasonable set of
guarantees it can provide, and the ultimate "end-to-end" integrity and
authentication checks are performed at the _true_ ends of the
connection -- the application.
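
To make the weak-versus-robust distinction concrete, here is a minimal
sketch that compares a 16-bit ones' complement checksum (the style of
sum TCP and UDP use) with a cryptographic digest.  Several things here
are assumptions for illustration only: the message bytes are invented,
the real transport checksum also covers a pseudo-header, and SHA-256
merely stands in for whatever integrity mechanism an SSL/TLS session
actually negotiates.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"PAY 0100 TO ALEX"
bit_flip = b"PAY 0101 TO ALEX"          # a single bit changed ("0" -> "1")
# Exchange the first two 16-bit words: the bytes differ, the sum does not.
swapped = original[2:4] + original[0:2] + original[4:]

checksum_catches_flip = internet_checksum(original) != internet_checksum(bit_flip)
checksum_catches_swap = internet_checksum(original) != internet_checksum(swapped)
digest_catches_swap = (hashlib.sha256(original).digest()
                       != hashlib.sha256(swapped).digest())
```

The transport-style sum notices the single flipped bit but not the
reordered words, because addition is commutative; the digest notices
both.  That is roughly the gap between the "weak" and "extremely
robust" checks described above.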

I realize this message is probably futile, but I hope it will end the
bickering over semantics in this particular thread, and provide some
food for thought for future such threads.  No, the end-to-end principle
isn't practiced everywhere, but it is far from a "fantasy". And yes, I'm
sure Ma Bell provided perfect end-to-end service via POTS in 1908 and
the Internet is so far behind we might as well not even bother talking
about it, no need to tell me that.  Since I use the Internet every day
(and, miraculously, it works), I'll leave mailing-list theories about
how it can't possibly work on the shelf for now.

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils].  They disarm only those who are neither inclined nor
determined to commit crimes.
		-- Cesare Beccaria, "On Crimes and Punishments", 1764

From perfgeek at mac.com  Tue Apr  5 18:48:01 2005
From: perfgeek at mac.com (rick jones)
Date: Tue, 5 Apr 2005 18:48:01 -0700
Subject: [e2e] UDP checksum field?
In-Reply-To: <20050405234836.GJ32194@colt.internal>
References: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>
	<4251AF7E.9050002@reed.com> <42530EBB.C90343E5@attglobal.net>
	<20050405234836.GJ32194@colt.internal>
Message-ID: <fb3b6b98561c183e109dd94d570f5109@mac.com>

> If [heh] you have a particular axe to grind, you can probably come up
> with some little semantic corner where this is not end-to-end in every
> respect, but it will be just that -- a semantic little corner.  SSL over
> TCP performs end-to-end flow control, end-to-end congestion control,
> weak end-to-end integrity checking at the transport layer, and extremely
> robust end-to-end integrity checking (possibly as well as
> authentication) at the application layer.  Note that, in this example,
> each layer of the stack provides the largest reasonable set of
> guarantees it can provide, and the ultimate "end-to-end" integrity and
> authentication checks are performed at the _true_ ends of the
> connection -- the application.

Would that semantic corner include SSL offload NICs like Britestream, 
and/or SSL offload boxes/blades we see advertised from time to time?-)

rick jones
there is no rest for the wicked, yet the virtuous have no pillows


From cannara at attglobal.net  Wed Apr  6 20:20:10 2005
From: cannara at attglobal.net (Cannara)
Date: Wed, 06 Apr 2005 20:20:10 -0700
Subject: [e2e] UDP checksum field?
References: <E1DIXWP-0005dr-00@smeg.dsg.stanford.edu>
	<4251AF7E.9050002@reed.com> <42530EBB.C90343E5@attglobal.net>
	<20050405234836.GJ32194@colt.internal>
Message-ID: <4254A6EA.9825DE58@attglobal.net>

Well, long Erudite responses are always welcome Ethan, but rather than
Beccaria, even I, as an Italian American, actually prefer Mao: "All political
power stems from the barrel of a gun".  :]

Alex

Ethan Blanton wrote:
> 
> Cannara spake unto us the following wisdom:
> > Of course, David, but the opposite is: no checksum = no chance of
> > correctness.  And, the way NAT and other boxes have been intended and
> > deployed, many people consider them as "ends", making the mythical End-End
> > Principle even more of a fantasy.
> 
> I'm not sure exactly what you're trying to say here (I seldom am), but I
> think it misses a very important point.  There  are  in  fact  a  _very_
> large  number  of applications which obey the end-to-end principle
> extensively.  Take as an example class of such applications all SSL or TLS
> streams over TCP.
> 
> If  [heh]  you  have a particular axe to grind, you can probably come up
> with some little semantic corner where this is not end-to-end  in  every
> respect, but it will be just that -- a semantic little corner.  SSL over
> TCP performs end-to-end flow  control,  end-to-end  congestion  control,
> weak end-to-end integrity checking at the transport layer, and extremely
> robust end-to-end integrity checking (possibly as  well  as  authentica-
> tion)  at the application layer.  Note that, in this example, each layer
> of the stack provides the largest reasonable set of  guarantees  it  can
> provide,  and  the  ultimate  "end-to-end"  integrity and authentication
> checks are performed at the _true_ ends of the connection -- the  appli-
> cation.
> 
> I  realize  this  message is probably futile, but I hope it will end the
> bickering over semantics in this particular  thread,  and  provide  some
> food  for thought for future such threads.  No, the end-to-end principle
> isn't practiced everywhere, but it is far from a "fantasy". And yes, I'm
> sure  Ma  Bell  provided perfect end-to-end service via POTS in 1908 and
> the Internet is so far behind we might as well not even  bother  talking
> about  it,  no need to tell me that.  Since I use the Internet every day
> (and, miraculously, it works), I'll leave  mailing-list  theories  about
> how it can't possibly work on the shelf for now.
> 
> Ethan
> 
> --
> The laws that forbid the carrying of arms are laws [that have no remedy
> for evils].  They disarm only those who are neither inclined nor
> determined to commit crimes.
>                 -- Cesare Beccaria, "On Crimes and Punishments", 1764
> 

From Farooq.Bari at cingular.com  Thu Apr  7 00:03:37 2005
From: Farooq.Bari at cingular.com (Bari, Farooq)
Date: Thu, 7 Apr 2005 00:03:37 -0700
Subject: [e2e] e2e QoS
Message-ID: <F9753E41A179D7438C42C6A8346544340174A27B@wa-msg10-bth.wireless.attws.com>


This may be an old topic, but with the recent drive for network
convergence it seems to be popular again. There are several seemingly
overlapping industry efforts on it. What do folks on this forum think
of on-path and off-path mechanisms for e2e QoS?

Farooq


From rony3000us at hotmail.com  Thu Apr  7 01:35:31 2005
From: rony3000us at hotmail.com (Syed Faisal Hasan)
Date: Thu, 07 Apr 2005 08:35:31 +0000
Subject: [e2e] e2e QoS
In-Reply-To: <F9753E41A179D7438C42C6A8346544340174A27B@wa-msg10-bth.wireless.attws.com>
Message-ID: <BAY15-F15A67CC1915D33DE03C927C03E0@phx.gbl>

Farooq, perhaps you can have a look at "Revisiting IP QoS: why do we
care, what have we learned? ACM SIGCOMM 2003 RIPQOS workshop report",
which can be found at
http://portal.acm.org/citation.cfm?id=963995

Faisal

>From: "Bari, Farooq" <Farooq.Bari at cingular.com>
>To: <end2end-interest at postel.org>
>Subject: [e2e] e2e QoS
>Date: Thu, 7 Apr 2005 00:03:37 -0700
>
>
>This maybe an old topic but with recent drive for network convergence
>this topic seems to be popular again. There are several and seemingly
>overlapping efforts by the industry on it. What do folks on this forum
>think of on path mechanisms and off path mechanisms for e2e QoS.
>
>Farooq
>
>



From nspring at cs.umd.edu  Tue Apr  5 10:18:42 2005
From: nspring at cs.umd.edu (Neil Spring)
Date: Tue, 5 Apr 2005 13:18:42 -0400
Subject: [e2e] CFP: HotNets-IV
Message-ID: <16ee210f82566d5bea3c599845843c0a@cs.umd.edu>


CALL FOR PAPERS

Fourth Workshop on
Hot Topics in Networks
HotNets-IV
http://www.acm.org/sigs/sigcomm/HotNets-IV
November 14-15, 2005
College Park, MD USA

The Fourth Workshop on Hot Topics in Networks, HotNets-IV, will bring 
together researchers in the networking and distributed systems 
community to debate emerging research directions. The goal of the 
workshop is to promote community-wide discussion of ideas that will 
influence and foster continued research in the field. The workshop will 
provide a venue for researchers to present new ideas that have the 
potential to significantly impact the community in the long term, 
especially those that are architectural or design-oriented in nature.

Each potential participant should submit a short paper describing such 
an idea; the paper could, for example, expose a new problem, advocate a 
new solution, or debunk existing work. Attendance is limited to around 
60 participants, by invitation based primarily on paper submissions. 
HotNets-IV is sponsored by ACM SIGCOMM.

We encourage submissions across the broad range of networking and 
distributed systems research, not limited to those topics covered by 
the SIGCOMM conference. Submissions may be on topics traditionally 
published at SIGCOMM, NSDI, SOSP/OSDI, SenSys, or MobiCom, or they may 
be on topics that have yet to find a home in an established conference. 
Topics of interest include, but are by no means limited to:

     * Internet and non-Internet architectures, past, present, and future
     * Overlay, peer-to-peer, and programmable network infrastructures
     * Sensor networks, storage area networks, and other examples of
       "extreme" networking
     * Wireless networks, mobility, and pervasive computing
     * Network failures, vulnerabilities, and exploits: detection,
       analysis and defenses
     * Network management and control
     * Novel distributed applications and services, including systems
       for content distribution and real-time media
     * Lessons drawn from failed research, and controversial or
       disruptive topics
     * Architectural insights or understanding of network behaviors

The selection of HotNets papers will be based primarily on their 
potential to influence future research. This influence can be exercised 
in many ways, exemplified by but not limited to the following:

     * Describing a novel approach to an old problem that promises to
       influence future research
     * Describing a new problem that requires our attention
     * Articulating a new perspective about networking and distributed
       systems
     * Debunking an old perspective about networking and distributed
       systems

Copies of the accepted papers will be made publicly available via the 
Web prior to the workshop. Proceedings will be distributed at the 
workshop and will be made available through ACM's digital library. 
Examples of papers from past HotNets workshops can be found at: 
http://www.acm.org/sigs/sigcomm/hotnets. The Program Committee will 
write short New York Times Book Review-style reviews of accepted 
papers, for inclusion in the proceedings, to provide the broader 
community with an additional perspective on future directions in the 
field.  Unlike other workshops and conferences, rejected papers will 
only receive a very short review.

The acceptance of a paper to the HotNets workshop does not preclude the 
later acceptance of a related paper to the ACM Sigcomm 2006 conference. 
However, any derived Sigcomm submission must provide a significantly 
more in-depth treatment of the idea, for example, by providing a more 
complete evaluation. Assuming that there is sufficient new material in 
a Sigcomm submission, the existence of a prior publication at HotNets 
will be ignored during the evaluation for acceptance to Sigcomm. 
Further details about this policy and its application to other 
conferences will be posted on the HotNets IV Web page 
(http://www.acm.org/sigs/sigcomm/HotNets-IV).

Submission Instructions

Submitted papers must be no longer than 6 pages (10 pt font, 1 inch 
margins). The review process is not blind; each contributing author 
should be included on the first page. Only electronic submissions in 
PostScript or PDF will be accepted. Submissions must be written in 
English, render without error using standard tools (Ghostview or 
Acrobat Reader) and print on US-Letter sized paper. Following standard 
academic practice, HotNets requests that its reviewers hold submitted 
papers in confidence. Only accepted papers will be published in 
conference proceedings. Submission information will be posted at: 
http://www.acm.org/sigs/sigcomm/HotNets-IV

Important Dates

      Submissions due:    1 August 2005 (11:59PM Eastern Daylight Time)

      Notification of Acceptance:    10 October 2005

      Camera-ready copy due:    31 October 2005

      Workshop:    14-15 November 2005


Organizers

General Chair:

     * Neil Spring (UMD)

Program Committee:

     * Jon Crowcroft (Cambridge) (Co-chair)
     * Srinivasan Seshan (CMU) (Co-chair)
     * Bengt Ahlgren (SICS)
     * Paul Barford (UWisc)
     * John Byers (BU)
     * Deborah Estrin (UCLA)
     * Tim Griffin (Cambridge)
     * Venkata Padmanabhan (Microsoft Research)
     * Jen Rexford (Princeton)
     * Ion Stoica (UCB)


From braden at ISI.EDU  Fri Apr  8 12:28:08 2005
From: braden at ISI.EDU (Bob Braden)
Date: Fri, 8 Apr 2005 12:28:08 -0700 (PDT)
Subject: [e2e] CFP: First IEEE ICNP Workshop on Secure Network Protocols
	(NPSec)
Message-ID: <200504081928.MAA28433@gra.isi.edu>


                           CALL FOR PAPERS

       First IEEE ICNP Workshop on Secure Network Protocols (NPSec)
                      Boston, Massachusetts, USA
                          November 6, 2005

                 http://www.cerias.purdue.edu/npsec/


                    (In conjunction with ICNP 2005:
       The 13th IEEE International Conference on Network Protocols)


SCOPE:

The first IEEE ICNP workshop on Secure Network Protocols (NPSec) is a
one-day event held in conjunction with ICNP 2005. NPSec focuses on two
general areas.  The first focus is on the development and analysis of
secure or hardened protocols for the operation (establishment and
maintenance) of network infrastructure, including such targets as
secure multidomain, ad-hoc, sensor or overlay networks, or other
related target areas.  This can include new protocols, enhancements to
existing protocols, protocol analysis, and new attacks on existing
protocols.  The second focus is on employing such secure network
protocols to create or enhance network applications.  Examples include
collaborative firewalls, incentive strategies for multiparty networks,
and deployment strategies to enable secure applications.

TOPICS OF INTEREST:

* secure or hardened protocols for operation of networks including (but
  not limited to):
    - internetworking, e.g., BGP, DNS
    - MANETs
    - LANs and WLANs
    - cellular data networks
    - p2p and other overlay networks
    - federated trust systems
    - sensor networks
* vulnerability analysis of existing protocols and applications (both
  theoretical and case studies), including novel attacks
* key distribution
* collaborative intrusion detection and response, such as
  collaborative firewalling
* incentive systems for multiparty networks, such as for p2p and MANET
  routing
* protocol configuration and deployment strategies enabling secure
  applications, e.g., e-commerce


IMPORTANT DATES:
  Paper submission: June 3, 2005
  Notification of acceptance: July 15, 2005
  Camera ready version: August 5, 2005


ORGANIZING COMMITTEE:

General Chair:
    Sonia Fahmy, Purdue University

Technical Program Committee Chairs:
    George Kesidis, Pennsylvania State University
    Nicholas Weaver, International Computer Science Institute

Publicity Chair:
    James Minseok Kwon, Rochester Institute of Technology

Web Chair:
    Cristina Nita-Rotaru, Purdue University

TECHNICAL PROGRAM COMMITTEE:
    Ehab Al-Shaer, DePaul University
    David Brumley, Carnegie Mellon University
    Guohong Cao, Pennsylvania State University
    Joseph Evans, U.S. National Science Foundation
    Lixin Gao, University of Massachusetts, Amherst
    Carl A. Gunter, University of Illinois at Urbana-Champaign
    George Kesidis, Pennsylvania State University
    Edward Knightly, Rice University
    Iordanis Koutsopoulos, University of Thessaly
    Carl Landwehr, University of Maryland
    Marco Ajmone Marsan, Politecnico di Torino, Italy
    Douglas Maughan, Department of Homeland Security
    Patrick McDaniel, Pennsylvania State University
    Jelena Mirkovic, University of Delaware
    Peng Ning, North Carolina State University
    Cristina Nita-Rotaru, Purdue University
    Phil Porras, SRI
    Saswati Sarkar, University of Pennsylvania
    Lakshminarayanan Subramanian, University of California at Berkeley
    Nina Taft, Intel Research
    Nicholas Weaver, International Computer Science Institute
    Felix Wu, University of California at Davis
    Jun Xu, Georgia Institute of Technology
    Bulent Yener, Rensselaer Polytechnic Institute

SUBMISSION GUIDELINES:

Submissions must be in electronic form, as Postscript or PDF
documents.  Papers can be up to 6 two-column pages, and can convey
work-in-progress that is not completely mature but shows promise.


For more information, please see:
          http://www.cerias.purdue.edu/npsec/




From pb at cs.wisc.edu  Tue Apr 12 14:28:05 2005
From: pb at cs.wisc.edu (Paul Barford)
Date: Tue, 12 Apr 2005 16:28:05 -0500 (CDT)
Subject: [e2e] Wisconsin network research lab now openly available
Message-ID: <Pine.LNX.4.58.0504121624100.15693@tpol.cs.wisc.edu>

All,

It is our pleasure to announce the availability of the Wisconsin Advanced
Internet Laboratory (WAIL) for open use by the network research community.
With support from our partners at Cisco, Intel, University of Utah, NSF,
and Internet2 we have extended the Emulab user interface to enable remote
access and use of 80 PC's and 34 IP routers (see list below).  The remote
interface - called Schooner - enables users to connect PC's to fixed
configurations of routers (or in PC-only configurations like traditional
Emulab) thereby creating testbeds suitable for a range of experiments.
Like Emulab, the PC's come with a basic set of tools and can be modified
by users with their own experimental code.  At present, we offer a library
of fixed router configurations, principally composed of simple topologies
such as dumbbells. We can offer limited support in terms of creating
customized topologies and are in the process of expanding the topology
library to make all systems generally available.  Schooner has
documentation which should enable users to get up and running with basic
configurations, but we emphasize that the environment is a work in
progress.  We look forward to supporting projects to the extent that our
resources allow and hope you will find this environment useful in your
work.  Please feel free to access the lab via:

http://www.schooner.wail.wisc.edu

Best,

Paul Barford - director
Chris Alfeld
Ana Bizarro
Dave Plonka


Current WAIL Equipment List (new equipment is added on a regular basis - if
there is something you need, let us know - we may have it):

80 PC's:  Intel 2GHz Pentium 4, 1GB RAM, Intel 1Gbps NIC

6 Cisco GSR 12000:  OC48, OC12, OC3, Gig, FE interfaces
4 Cisco 7500: OC3, GE, FE, Serial interfaces
10 Cisco 7300: GE interfaces
5 Cisco 7200: GE, FE, OC3, Serial interfaces
5 Cisco 3600: FE interfaces
4 Cisco 2600: FE interfaces

From dima at krioukov.net  Wed Apr 13 14:38:58 2005
From: dima at krioukov.net (Dmitri Krioukov)
Date: Wed, 13 Apr 2005 14:38:58 -0700
Subject: [e2e] E2E research visions
In-Reply-To: <20050329203837.1AC9A24D@aland.bbn.com>
Message-ID: <000101c54071$38f883a0$2fe2acc0@zurich>

interesting text. few questions:

in section 6, do you want to say that
"local anti-scale" will somehow be a
solution to "global scale", or do you
simply want to attract our attention to
the former *in addition* to the latter?
in any case, you don't have a separate
section on global scalability: do you
think it's no longer an issue today and
it won't be one in the future?
--
dima.
http://www.caida.org/~dima/


> -----Original Message-----
> From: end2end-interest-bounces at postel.org 
> [mailto:end2end-interest-bounces at postel.org] On Behalf Of 
> Craig Partridge
> Sent: Tuesday, March 29, 2005 12:39 PM
> To: end2end-interest at postel.org
> Subject: [e2e] E2E research visions
> 
> 
> 
> Hi folks:
> 
> At the most recent meeting of the End2End Research Group, the 
> group, along
> with some attendees, had a discussion of possible research 
> visions that
> could inspire innovative communications research over the next ten
> years or so.  Dave Clark and I, with help from several other 
> participants
> in the discussion, have written up the ideas from that 
> discussion and a copy
> is available on my website
> 
>     http://www.ir.bbn.com/~craig/e2e-vision.pdf
> 
> for anyone who is interested.
> 
> Craig
> 
> E-mail: craig at aland.bbn.com or craig at bbn.com


From tolizhi at gmail.com  Fri Apr 15 10:09:54 2005
From: tolizhi at gmail.com (Zhi Li)
Date: Fri, 15 Apr 2005 10:09:54 -0700
Subject: [e2e] Resilient UDP
Message-ID: <7d6098f405041510095c4c73be@mail.gmail.com>

Hello,

I recently came across a term called "resilient UDP".
I couldn't find any related document on the web.
Does anyone know the details of how it operates?  Or could you please
suggest some references or papers about it?

Thanks a lot and have a nice weekend!

Regards,
Zhi

From huitema at windows.microsoft.com  Fri Apr 15 16:52:09 2005
From: huitema at windows.microsoft.com (Christian Huitema)
Date: Fri, 15 Apr 2005 16:52:09 -0700
Subject: [e2e] Resilient UDP
Message-ID: <DAC3FCB50E31C54987CD10797DA511BA0E3D5601@WIN-MSG-10.wingroup.windeploy.ntdev.microsoft.com>

> I recently came across a term called "resilient UDP".
> I couldn't find any related documents on the web.
> Does anyone know how it operates in detail?  Or could you please suggest
> some references or papers about it?

It is a version of UDP used with military intelligence.

-- Christian Huitema

From rony3000us at hotmail.com  Sat Apr 16 03:12:52 2005
From: rony3000us at hotmail.com (Syed Faisal Hasan)
Date: Sat, 16 Apr 2005 10:12:52 +0000
Subject: [e2e] Can TCP's congestion window go beyond receiver's maximum
	advertised window?
Message-ID: <BAY15-F211369C54E8EAE65ADD342C0370@phx.gbl>

Dear Folks,

I was trying to do a simple simulation using NS-2.27 and I found
something interesting.

The topology is as follows:
[n0]---------------[n1]

n0 is running an FTP application on top of TCP.
The TCP receiver's maximum advertised window size is set to 20 (that is
the default in NS).

The congestion window (cwnd) should never go beyond the receive window
(rwnd), right?
Then why does cwnd grow beyond rwnd in the simulation? cwnd reaches
24, while rwnd is fixed at 20.

If this is a silly question, I'm sorry for asking, but I'd like to
have an explanation.

Faisal
#==========================================
#The NS script is below
#===========================================
set ns [new Simulator]

set file1 [open testout.tr w]
$ns trace-all $file1

set file2 [open ./temp/namtest.nam w]
$ns namtrace-all $file2
set windowfile [ open ./temp/WindowFile w]

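# Flush and close all output files, then launch the viewers.
proc finish {} {

global ns file1 file2 windowfile
$ns flush-trace
close $file1
close $file2
# Close the cwnd log too, so xgraph reads the complete data.
close $windowfile

exec nam ./temp/namtest.nam &
exec xgraph ./temp/WindowFile -geometry 800x600 &
exit 0

}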

set n0 [$ns node]
set n1 [$ns node]
$ns duplex-link $n0 $n1 0.2Mb 500ms DropTail
$ns duplex-link-op $n0 $n1 orient right

set tcp [new Agent/TCP/Sack1]
$ns attach-agent $n0 $tcp
$tcp set window_ 20

set tcpsink [new Agent/TCPSink]
$ns attach-agent $n1 $tcpsink
$ns connect $tcp $tcpsink

set ftp [new Application/FTP]
$ftp attach-agent $tcp

# Log the sender's cwnd_ every 0.1 s of simulated time.
proc getwindow {source file } {

global ns
set now [$ns now]
set time 0.1
set cwnd [$source set cwnd_]
puts $file "$now $cwnd"
$ns at [expr $now+$time] "getwindow $source $file"

}

$ns at 0.1 "getwindow $tcp $windowfile"

$ns at 0.0 "$ftp start"
$ns at 9.0 "$ftp stop"
$ns at 10 "finish"
$ns run
#===================================



From arjuna.sathiaseelan at gmail.com  Sat Apr 16 13:16:09 2005
From: arjuna.sathiaseelan at gmail.com (Arjuna Sathiaseelan)
Date: Sat, 16 Apr 2005 21:16:09 +0100
Subject: [e2e] end2end-interest Digest, Vol 14, Issue 16
In-Reply-To: <mailman.1.1113678000.5206.end2end-interest@postel.org>
References: <mailman.1.1113678000.5206.end2end-interest@postel.org>
Message-ID: <1ef2259005041613166d1575a1@mail.gmail.com>

Dear Faisal,
  Even though this question should be directed to the ns-2 list :) -
yes, the cwnd can grow beyond the rwnd, but the amount of data that is
actually sent, i.e. the sending window, is always the min of the cwnd and
the rwnd. So if you want to utilize the link to its fullest, set the
window equal to the bandwidth-delay product.
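
To put numbers on this, here is a minimal Python sketch of the
bandwidth-delay product for the posted link. It assumes ns-2's default
TCP packet size of 1000 bytes and an RTT of twice the 500ms one-way
delay (ignoring queueing and transmission time); the function name is
made up for illustration.

```python
def bdp_packets(bandwidth_bps, rtt_s, packet_bytes):
    """Bandwidth-delay product, expressed in packets."""
    return bandwidth_bps * rtt_s / (8 * packet_bytes)

# Values taken from the posted script: 0.2Mb link, 500ms one-way delay.
# packet_bytes=1000 is an assumption (it is not set in the script).
bdp = bdp_packets(200_000, 2 * 0.5, 1000)
window = 20  # window_ from the script, in packets

print(bdp)                     # packets needed to fill the pipe
print(min(window, bdp) / bdp)  # fraction of the pipe a window of 20 fills
```

Under these assumptions a window of about 25 packets would be needed
to keep the link busy, so a window of 20 leaves part of it idle.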

Regds,
Arjuna


From stelios at dcs.gla.ac.uk  Sun Apr 17 11:09:18 2005
From: stelios at dcs.gla.ac.uk (Stylianos Papanastasiou)
Date: Sun, 17 Apr 2005 19:09:18 +0100
Subject: [e2e] Can TCP's congestion window go beyond receiver's
	maximum	advertised window?
In-Reply-To: <BAY15-F211369C54E8EAE65ADD342C0370@phx.gbl>
References: <BAY15-F211369C54E8EAE65ADD342C0370@phx.gbl>
Message-ID: <1113761358.7745.6.camel@bioko>

I think you should direct similar questions to the ns-users mailing
list.
The short answer is:

The window() (and windowd()) functions return the available cwnd, which is
the minimum of cwnd_ and window_ (or wnd_ in C++ space). This is the usable
congestion window for all purposes in NS2. Hence, even though your trace says
that cwnd_ reaches the value 24 (and it does), when ns halves the congestion
window, for instance, it computes min(24,20)/2, and so you get a value of 10
for the cwnd. Your traces will verify this.
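
The behaviour described above can be sketched in a few lines of Python.
This is only an illustration of the min() rule, not the ns-2 source;
the function names are hypothetical.

```python
def usable_window(cwnd, window):
    """Window the sender actually uses: the smaller of cwnd_ and window_."""
    return min(cwnd, window)

def window_after_halving(cwnd, window):
    """Halve the usable window, as in the loss example described above."""
    return usable_window(cwnd, window) / 2

print(usable_window(24, 20))         # 20
print(window_after_halving(24, 20))  # 10.0
```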

Stelios


On Sat, 2005-04-16 at 10:12 +0000, Syed Faisal Hasan wrote:
> Dear Folks,
> 
> I was trying to do a simple simulation using NS-2.27 and  I found
> something interesting.
> 
> The topology is as follows
> [n0]---------------[n1]
> 
> n0 is running a ftp application on top of tcp.
> TCP receiver's advertised maximum window size is set to 20 (that is
> the default in NS)
> 
> Congestion window (cwnd) should never go beyond receive window (rwnd), 
> right?
> Then why in the simulation, cwnd grows beyond rwnd? cwnd reaches to
> 24, while rwnd is fixed at 20.
> 
> If this is a silly question, I 'm sorry for asking. But I 'ld like to
> have an explanation.
> 
> Faisal
> #==========================================
> #The NS script is below
> #===========================================
> set ns [new Simulator]
> 
> set file1 [open testout.tr w]
> $ns trace-all $file1
> 
> set file2 [open ./temp/namtest.nam w]
> $ns namtrace-all $file2
> set windowfile [ open ./temp/WindowFile w]
> 
> proc finish {} {
> 
> global ns file1 file2
> $ns flush-trace
> close $file1
> close $file2
> 
> exec nam ./temp/namtest.nam &
> exec xgraph ./temp/WindowFile -geometry 800x600 &
> exit 0
> 
> }
> 
> set n0 [$ns node]
> set n1 [$ns node]
> $ns duplex-link $n0 $n1 0.2Mb 500ms DropTail
> $ns duplex-link-op $n0 $n1 orient right
> 
> set tcp [new Agent/TCP/Sack1]
> $ns attach-agent $n0 $tcp
> $tcp set window_ 20
> 
> set tcpsink [new Agent/TCPSink]
> $ns attach-agent $n1 $tcpsink
> $ns connect $tcp $tcpsink
> 
> set ftp [new Application/FTP]
> $ftp attach-agent $tcp
> 
> proc getwindow {source file } {
> 
> global ns
> set now [$ns now]
> set time 0.1
> set cwnd [$source set cwnd_]
> puts $file "$now $cwnd"
> $ns at [expr $now+$time] "getwindow $source $file"
> 
> }
> 
> $ns at 0.1 "getwindow $tcp $windowfile"
> 
> $ns at 0.0 "$ftp start"
> $ns at 9.0 "$ftp stop"
> $ns at 10 "finish"
> $ns run
> #===================================
> 
> 


From craig at aland.bbn.com  Mon Apr 18 09:47:27 2005
From: craig at aland.bbn.com (Craig Partridge)
Date: Mon, 18 Apr 2005 12:47:27 -0400
Subject: [e2e] Can TCP's congestion window go beyond receiver's maximum
	advertised window?
In-Reply-To: Your message of "Sat, 16 Apr 2005 10:12:52 -0000."
	<BAY15-F211369C54E8EAE65ADD342C0370@phx.gbl> 
Message-ID: <20050418164727.913981FF@aland.bbn.com>


I don't know why this happens, but it is clear that you have to track
the two values (cwnd and rwnd) separately, as the receiver can open
its window (and the sender probably never knows for sure how big the
fully-open receiver window would be).

Craig

From rbeverly at rbeverly.net  Wed Apr 20 11:26:40 2005
From: rbeverly at rbeverly.net (Robert Beverly)
Date: Wed, 20 Apr 2005 14:26:40 -0400
Subject: [e2e] Internet email performance study
Message-ID: <20050420182640.GA26116@rbeverly.net>


Hi all,

We're looking for operational-types lurking on the list with experience 
running large mail servers.  In particular, we have collected a large 
amount of data as part of an Internet email performance study that we 
cannot entirely explain. If you can help us or are simply curious about our 
findings, we'd love to hear from you.

WHAT WE DID: Briefly, we used SMTP bounce-backs as the basis of an email 
active measurement survey.  Using random addresses as unique identifiers, 
we measure latency, loss, paths, etc. to a large set of Internet 
MTAs.   Approximately 1/3 of all servers we've surveyed respond with 
bounce-backs.  We've found some interesting results, for example latencies
of days (30 days in one instance).

WHAT WE DON'T UNDERSTAND:  Most servers behave as we expect, either always 
replying with bounce-backs or never replying.  However, some exhibit odd 
and seemingly non-deterministic behavior.  For example, a server will 
respond to all emails for weeks, and then reply to only a fraction (e.g., 
25-75%) of the emails in a seemingly random pattern for some period of time 
(e.g., 4 hours).  Further, we often see these patterns correlated within a
domain (e.g., a subset of the MTAs will enter and exit this loss mode at
the same time).  We are fairly certain that the loss is an artifact of the 
MTA behavior or local administration.  While we can guess reasons this 
might occur, we have yet to find an administrator who can explain this 
behavior with an architecture used in practice.

More details on the project including our exact methodology, plausible 
explanations for the loss and a FAQ are available on our web site:
    http://ana.lcs.mit.edu/emailtester

Thanks!

Rob Beverly / Mike Afergan


From arjuna.sathiaseelan at gmail.com  Thu Apr 21 00:42:20 2005
From: arjuna.sathiaseelan at gmail.com (Arjuna Sathiaseelan)
Date: Thu, 21 Apr 2005 08:42:20 +0100
Subject: [e2e] Question on MTU
Message-ID: <1ef2259005042100424feef544@mail.gmail.com>

Dear All,
  I would be very much obliged if you could let me know the following:

As MTU is the maximum amount of information per packet that can be
sent on the wire, does it include the MSS + TCP header + IP header +
DL header (with error correction codes) or is it just the MSS + TCP
header + IP header?

Because for Ethernet, which has an MTU of 1500 bytes, we usually
have 1460 bytes as MSS + 20 bytes TCP header + 20 bytes IP header.
What about the header that would be added at the link layer?

Please clarify.

Regds,
Arjuna

From touch at ISI.EDU  Thu Apr 21 09:28:28 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 09:28:28 -0700
Subject: [e2e] Question on MTU
In-Reply-To: <1ef2259005042100424feef544@mail.gmail.com>
References: <1ef2259005042100424feef544@mail.gmail.com>
Message-ID: <4267D4AC.8090503@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Arjuna Sathiaseelan wrote:
> Dear All,
>   I would be very much obliged if you could let me know the following:
> 
> As MTU is the maximum amount of information per packet that can be
> sent on the wire, does it include the MSS + TCP header + IP header +
> DL header (with error correction codes) or is it just the MSS + TCP
> header + IP header?
> 
> Because for the Ethernet - which has a MTU of 1500 bytes - we usually
> have 1460 bytes as MSS + 20 bytes TCP header + 20 bytes IP header.
> What about the header that would be added in link layer?
> 
> Please do clarify me.
> 
> Regds,
> Arjuna

MSS and MTU both omit headers, i.e., they are payload sizes.

MTU usually refers to a link layer, and denotes the maximum link payload
size, excluding link header/trailer info. For Ethernet, such
header/trailers include:

	- 14 byte header
	- 4 byte 802.1q (VLAN) tag
	- 4 byte CRC

Standard ethernet has 1518 byte frames, but 802.1q ethernet has 1522
byte frames. From the link frame size, subtract the link header/trailer
to get the MTU. Standard ethernet has an MTU of 1500 bytes, but there
are jumbo frames of 9,000 bytes in extended ethernet implementations.

MSS usually refers to a transport protocol, e.g., TCP, and denotes the
max payload size there too. It is also relative to the network (IPv4,
IPv6) protocol _and_ link layer used.

And just as link layer overhead sizes vary, so do network layer overhead
sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
included, e.g., 48 for IPv6 with jumbogram option).
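
The arithmetic above can be checked with a short Python sketch. The
constants are the Ethernet overheads listed in this message; the
function names are made up, and the 20-byte TCP header assumes no TCP
options.

```python
ETH_HEADER = 14  # destination + source address + type
VLAN_TAG = 4     # 802.1q tag, if present
ETH_CRC = 4      # frame check sequence

def mtu_from_frame(frame_bytes, vlan=False):
    """Link payload size: frame size minus link header/trailer."""
    overhead = ETH_HEADER + ETH_CRC + (VLAN_TAG if vlan else 0)
    return frame_bytes - overhead

def tcp_payload(mtu, ip_header=20, tcp_header=20):
    """TCP data that fits in one unfragmented packet on that link."""
    return mtu - ip_header - tcp_header

print(mtu_from_frame(1518))             # 1500
print(mtu_from_frame(1522, vlan=True))  # 1500
print(tcp_payload(1500))                # 1460 (IPv4)
print(tcp_payload(1500, ip_header=40))  # 1440 (IPv6)
```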

Joe


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCZ9SsE5f5cImnZrsRAlPPAJ42GssC74fPcWXKtjS0pvA+7K5mbwCgnaPz
u8ahwcXwaxH7K2anV7oik0Y=
=bIR5
-----END PGP SIGNATURE-----

From mtariq at cc.gatech.edu  Thu Apr 21 10:39:08 2005
From: mtariq at cc.gatech.edu (Muhammad Mukarram Bin Tariq)
Date: Thu, 21 Apr 2005 13:39:08 -0400
Subject: [e2e] study on 'NAT'ed hosts
Message-ID: <4267E53C.4060802@cc.gatech.edu>

Hello,

I was wondering whether there is a study on estimating the fraction of hosts
that are connected to the Internet from behind a NAT, or share globally
routable IP addresses in some time-multiplexed fashion.

-- Mukarram

From tvest at pch.net  Thu Apr 21 11:52:06 2005
From: tvest at pch.net (Tom Vest)
Date: Thu, 21 Apr 2005 14:52:06 -0400
Subject: [e2e] study on 'NAT'ed hosts
In-Reply-To: <4267E53C.4060802@cc.gatech.edu>
References: <4267E53C.4060802@cc.gatech.edu>
Message-ID: <4569df0b0d481cf1247f31f9d299c388@pch.net>


On Apr 21, 2005, at 1:39 PM, Muhammad Mukarram Bin Tariq wrote:

> Hello,
>
> I was wondering whether there is a study on estimating fraction of 
> hosts that are connected to Internet from behind a NAT, or share 
> globally routable IP addresses in some time-multiplexed fashion.
>
> -- Mukarram

I would be especially interested in anything that might suggest the 
degree to which NAPT is used in ways that break the association/ratio 
between access-related address utilization, e.g., and a peak 
simultaneous usage rate. If most RIRs/NIRs/LIRs use such ratios as a 
component of their IP address request validation process (and 
conversely, most ISPs use it in their IP address requests), doesn't 
this mean that, practically speaking, NAPT does not in fact break this 
association?

Thanks -- Tom


From svp.mailman at gmail.com  Thu Apr 21 12:03:29 2005
From: svp.mailman at gmail.com (Swapnil Patil)
Date: Thu, 21 Apr 2005 15:03:29 -0400
Subject: [e2e] study on 'NAT'ed hosts
In-Reply-To: <4267E53C.4060802@cc.gatech.edu>
References: <4267E53C.4060802@cc.gatech.edu>
Message-ID: <dacda533050421120322371dea@mail.gmail.com>

See "A Technique for Counting NATted Hosts" by Steve Bellovin, which
appeared in the Internet Measurement Workshop 2002.

regards
-swapnil

On 4/21/05, Muhammad Mukarram Bin Tariq <mtariq at cc.gatech.edu> wrote:
> Hello,
> 
> I was wondering whether there is a study on estimating fraction of hosts
> that are connected to Internet from behind a NAT, or share globally
> routable IP addresses in some time-multiplexed fashion.
> 
> -- Mukarram
> 


-- 
This is Swapnil Patil's listserv address.

From ljorgenson at apparentnetworks.com  Thu Apr 21 12:27:42 2005
From: ljorgenson at apparentnetworks.com (Loki Jorgenson)
Date: Thu, 21 Apr 2005 12:27:42 -0700
Subject: [e2e] MTU - IP layer
Message-ID: <F09324DCDD2F5D488EAC603D6B299DC7D2C30D@jsrvr8.jaalam.net>


Minor note - MTU is technically Layer 3 (as opposed to link layer or
layer 2).  So it is quite correct to describe the MTU as the link layer
payload size.  So, as noted, 1518 bytes is the frame size at layer 2.

However, it is very important to keep in mind that MTU and path MTU
discovery operate at Layer 3.  For example, boundaries between differing
MTUs should be handled by Layer 3 devices (not switches) to avoid
end-to-end issues that can arise.

Loki

----

"Joe Wrote:"


Date: Thu, 21 Apr 2005 09:28:28 -0700
From: Joe Touch <touch at ISI.EDU>
Subject: Re: [e2e] Question on MTU
To: Arjuna Sathiaseelan <arjuna.sathiaseelan at gmail.com>
Cc: end2end-interest at postel.org
Message-ID: <4267D4AC.8090503 at isi.edu>
Content-Type: text/plain; charset=ISO-8859-1


MTU usually refers to a link layer, and denotes the maximum link payload
size, excluding link header/trailer info. For Ethernet, such
header/trailers include:

        - 14 byte header
        - 4 byte 802.1q (VLAN) tag
        - 4 byte CRC

Standard ethernet has 1518 byte frames, but 802.1q ethernet has 1522
byte frames. From the link frame size, subtract the link header/trailer
to get the MTU. Standard ethernet has an MTU of 1500 bytes, but there
are jumbograms of 9,000 bytes in the extended ethernet spec.

MSS usually refers to a transport protocol, e.g., TCP, and denotes the
max payload size there too. It is also relative to the network (IPv4,
IPv6) protocol _and_ link layer used.

And just as link layer overhead sizes vary, so do network layer overhead
sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
included, e.g., 48 for IPv6 with jumbogram option).



From am.amir at gmail.com  Thu Apr 21 12:54:26 2005
From: am.amir at gmail.com (Aamir Mehmood)
Date: Fri, 22 Apr 2005 00:54:26 +0500
Subject: [e2e] Jitter Calculations in IP networks.
Message-ID: <12a3f40805042112543dd601c2@mail.gmail.com>

Hi all,
We are doing an analysis of a core IP backbone. Can someone please let me
know how jitter is calculated in IP networks? Is there any software
other than Ethereal that can calculate the jitter from a captured RTP
stream?
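
For what it is worth, one common definition is the interarrival jitter
estimator in the RTP specification (RFC 3550). Below is a minimal Python
sketch, assuming you already have per-packet send timestamps (from the
RTP header) and arrival times in the same units; the function name and
the sample values are made up.

```python
def rtp_jitter(send_times, recv_times):
    """Running interarrival jitter estimate in the style of RFC 3550."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Change in transit time between consecutive packets.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Smooth with gain 1/16, as the RTP specification does.
        jitter += (abs(d) - jitter) / 16.0
    return jitter

# Packets sent every 20 ms; the third one arrives 16 ms late.
print(rtp_jitter([0, 20, 40], [5, 25, 61]))  # 1.0
```

A stream with constant delay gives an estimate of 0; the value only
grows when packet spacing at the receiver differs from the spacing at
the sender.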

Regards

Amir

From david.borman at windriver.com  Thu Apr 21 12:58:05 2005
From: david.borman at windriver.com (David Borman)
Date: Thu, 21 Apr 2005 14:58:05 -0500
Subject: [e2e] Question on MTU
In-Reply-To: <4267D4AC.8090503@isi.edu>
References: <1ef2259005042100424feef544@mail.gmail.com>
	<4267D4AC.8090503@isi.edu>
Message-ID: <7bf770bf3d525c13130f6408e21788b7@windriver.com>


On Apr 21, 2005, at 11:28 AM, Joe Touch wrote:

> MSS usually refers to a transport protocol, e.g., TCP, and denotes the
> max payload size there too. It is also relative to the network (IPv4,
> IPv6) protocol _and_ link layer used.
>
> And just as link layer overhead sizes vary, so do network layer 
> overhead
> sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
> included, e.g., 48 for IPv6 with jumbogram option).

But the advertised MSS in the TCP MSS option should not be adjusted to 
reflect any options or intermediary headers, just the fixed IP and TCP 
header sizes; 40 bytes for IPv4/TCP, and 60 bytes for IPv6/TCP.  When 
the sender generates the packet, he is responsible for reducing the TCP 
data to allow room for any additional options or headers.

			-David Borman


From touch at ISI.EDU  Thu Apr 21 13:29:33 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 13:29:33 -0700
Subject: [e2e] MTU - IP layer
In-Reply-To: <F09324DCDD2F5D488EAC603D6B299DC7D2C30D@jsrvr8.jaalam.net>
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C30D@jsrvr8.jaalam.net>
Message-ID: <42680D2D.1070309@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

L3 packet size isn't referred to as MTU, esp. in IP (rfc791); it is
datagram length (or total length).

Fragments in IP must be less than or equal to the MTU, which there (791)
refers to the max payload of the L2.

path MTU discovery is equivalent to path "max link payload" discovery,
rather than path "max network payload" discovery.

IMO, therefore, MTU really refers to the L2 payload size, which is not
the same as the L3 'frame' size (size of the total IP packet), but is
related to the size of an L3 fragment.
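
To make the distinction concrete, here is a rough Python sketch of how
an IPv4 datagram larger than the MTU is split into fragments that each
fit the link payload. It assumes a 20-byte IP header with no options in
every fragment; the function name is hypothetical.

```python
def fragment_sizes(total_len, mtu, ip_header=20):
    """Total lengths of the IPv4 fragments of one datagram."""
    if total_len <= mtu:
        return [total_len]  # fits in one link payload, no fragmentation
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_frag = (mtu - ip_header) // 8 * 8
    if per_frag <= 0:
        raise ValueError("MTU too small to carry any data")
    payload = total_len - ip_header
    sizes = []
    while payload > 0:
        chunk = min(per_frag, payload)
        sizes.append(chunk + ip_header)
        payload -= chunk
    return sizes

print(fragment_sizes(4000, 1500))  # [1500, 1500, 1040]
```

The datagram length (4000) is an L3 quantity; each fragment is bounded
by the L2 payload size (1500).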

Joe

Loki Jorgenson wrote:
> 
> Minor note - MTU is technically Layer 3 (as opposed to link layer or
> layer 2).  So it is quite correct to describe the MTU as the link layer
> payload size.  So, as noted, 1518 bytes is the frame size at layer 2.
> 
> However, it is very important to keep in mind that MTU and path MTU
> discovery operate at Layer 3.  For example, boundaries between differing
> MTUs should be handled by Layer 3 devices (not switches) to avoid
> end-to-end issues that can arise.
> 
> Loki
> 
> ----
> 
> "Joe Wrote:"
> 
> 
> Date: Thu, 21 Apr 2005 09:28:28 -0700
> From: Joe Touch <touch at ISI.EDU>
> Subject: Re: [e2e] Question on MTU
> To: Arjuna Sathiaseelan <arjuna.sathiaseelan at gmail.com>
> Cc: end2end-interest at postel.org
> Message-ID: <4267D4AC.8090503 at isi.edu>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> 
> MTU usually refers to a link layer, and denotes the maximum link payload
> size, excluding link header/trailer info. For Ethernet, such
> header/trailers include:
> 
>         - 14 byte header
>         - 4 byte 802.1q (VLAN) tag
>         - 4 byte CRC
> 
> Standard ethernet has 1518 byte frames, but 802.1q ethernet has 1522
> byte frames. From the link frame size, subtract the link header/trailer
> to get the MTU. Standard ethernet has an MTU of 1500 bytes, but there
> are jumbograms of 9,000 bytes in the extended ethernet spec.
> 
> MSS usually refers to a transport protocol, e.g., TCP, and denotes the
> max payload size there too. It is also relative to the network (IPv4,
> IPv6) protocol _and_ link layer used.
> 
> And just as link layer overhead sizes vary, so do network layer overhead
> sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
> included, e.g., 48 for IPv6 with jumbogram option).
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCaA0tE5f5cImnZrsRAow8AJ4pWCIAqdCRFbQDAhbm4+z1SaZzbACfSvb/
XZXMcs7Veyt+qS6RdSEzzeU=
=kDKI
-----END PGP SIGNATURE-----

From braden at ISI.EDU  Thu Apr 21 13:55:26 2005
From: braden at ISI.EDU (Bob Braden)
Date: Thu, 21 Apr 2005 13:55:26 -0700 (PDT)
Subject: [e2e] Question on MTU
Message-ID: <200504212055.NAA02998@gra.isi.edu>


  *> 
  *> But the advertised MSS in the TCP MSS option should not be adjusted to 
  *> reflect any options or intermediary headers, just the fixed IP and TCP 
  *> header sizes; 40 bytes for IPv4/TCP, and 60 bytes for IPv6/TCP.  When 
  *> the sender generates the packet, he is responsible for reducing the TCP 
  *> data to allow room for any additional options or headers.
  *> 
  *> 			-David Borman
  *> 

Right.

Please see Section 4.2.2.6 of RFC 1122 "Requirements for Internet
Hosts - Communication Layers" for the details.

Bob Braden

From ljorgenson at apparentnetworks.com  Thu Apr 21 13:55:29 2005
From: ljorgenson at apparentnetworks.com (Loki Jorgenson)
Date: Thu, 21 Apr 2005 13:55:29 -0700
Subject: [e2e] MTU - IP layer
Message-ID: <F09324DCDD2F5D488EAC603D6B299DC7D2C342@jsrvr8.jaalam.net>

Hmmmmmm - that's an interesting reading of RFC 791 - and the distinction
of fragments over datagrams could be made in that way.

My observation remains that MTU is conceptually defined and implemented
at Layer 3.  Taking pains to define it in Layer 2 terms in order to
ensure its scope includes all valid cases makes sense - and yet I find
it challenged.  Promoting the subtle distinction of "Frame payload" over
"packet/datagram" doesn't seem beneficial.

Perhaps I'm favouring the pragmatic over the precise....

Loki


From touch at ISI.EDU  Thu Apr 21 13:59:54 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 13:59:54 -0700
Subject: [e2e] Question on MTU
In-Reply-To: <7bf770bf3d525c13130f6408e21788b7@windriver.com>
References: <1ef2259005042100424feef544@mail.gmail.com>	<4267D4AC.8090503@isi.edu>
	<7bf770bf3d525c13130f6408e21788b7@windriver.com>
Message-ID: <4268144A.6080002@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

See RFC1122 Section 4.2.2.6 on calculating the MSS advertised in the TCP
MSS option. Condensed from that section:

	The eff.snd.MSS takes options at both IP and TCP layers
	into account - this is the size of the largest segment
	actually sent.

	The MSS value sent in the MSS option must be <= MMS_R - 20,
	where MMS_R is from GET_MAXSIZES in sec 3.4.

Sec 3.4 refers to 3.3.3, which defines:

            MMS_S = EMTU_S - <IP header size>

         and EMTU_S must be less than or equal to the MTU of the network
         interface corresponding to the source address of the datagram.
         Note that <IP header size> in this equation will be 20, unless
         the IP reserves space to insert IP options for its own purposes
         in addition to any options inserted by the transport layer.

I.e., IP options ARE accounted for in the advertised MSS.
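
The relations quoted above can be written out as a small Python sketch.
The Eff.snd.MSS formula is the one given in RFC 1122 section 4.2.2.6;
the function names, and the assumption that the effective MTU in both
directions is a 1500-byte link MTU, are mine.

```python
def mms(emtu, ip_header=20):
    """MMS_S or MMS_R: largest transport message for a given EMTU."""
    return emtu - ip_header

def advertised_mss(mms_r):
    """Upper bound on the value sent in the MSS option (MMS_R - 20)."""
    return mms_r - 20

def eff_snd_mss(send_mss, mms_s, tcp_hdr=20, ip_opt=0):
    """Largest segment payload actually sent (RFC 1122, 4.2.2.6)."""
    return min(send_mss + 20, mms_s) - tcp_hdr - ip_opt

limit = mms(1500)                                # 1480
print(advertised_mss(limit))                     # 1460
print(eff_snd_mss(1460, limit))                  # 1460, no options
print(eff_snd_mss(1460, limit, tcp_hdr=32))      # 1448, 12 bytes of TCP options
```

If IP reserves space for its own options, passing a larger ip_header to
mms() lowers both the advertised value and the effective send size.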

As you noted, intermediate headers (shims like IPsec and HIP) are harder
to handle because they aren't treated as options, and may not
necessarily be known to either IP or TCP. My understanding is that most
implementations adjust the IP MSS accordingly, so it gets passed up to
TCP as per secs 3.3.3 and 3.4 of 1122 above.

Joe

David Borman wrote:
> 
> On Apr 21, 2005, at 11:28 AM, Joe Touch wrote:
> 
>> MSS usually refers to a transport protocol, e.g., TCP, and denotes the
>> max payload size there too. It is also relative to the network (IPv4,
>> IPv6) protocol _and_ link layer used.
>>
>> And just as link layer overhead sizes vary, so do network layer overhead
>> sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
>> included, e.g., 48 for IPv6 with jumbogram option).
> 
> 
> But the advertised MSS in the TCP MSS option should not be adjusted to
> reflect any options or intermediary headers, just the fixed IP and TCP
> header sizes; 40 bytes for IPv4/TCP, and 60 bytes for IPv6/TCP.  When
> the sender generates the packet, he is responsible for reducing the TCP
> data to allow room for any additional options or headers.
> 
>             -David Borman
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCaBRKE5f5cImnZrsRAuVHAJ9eaIBHKXMZxhzcMgldOvFVphYRIACffqGL
qWTwK4RCNc/QWYLQxi4tYOU=
=SChT
-----END PGP SIGNATURE-----

From touch at ISI.EDU  Thu Apr 21 14:02:11 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 14:02:11 -0700
Subject: [e2e] MTU - IP layer
In-Reply-To: <F09324DCDD2F5D488EAC603D6B299DC7D2C342@jsrvr8.jaalam.net>
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C342@jsrvr8.jaalam.net>
Message-ID: <426814D3.7030908@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Loki Jorgenson wrote:
> Hmmmmmm - that's an interesting reading of RFC 791 - and the distinction
> of fragments over datagrams could be made in that way.
> 
> My observation remains that MTU is conceptually defined and implemented
> at Layer 3.  Making pains to define it in Layer 2 terms in order to
> ensure its scope includes all valid cases makes sense - and yet I find
> it challenged.  Promoting the subtle distinction of "Frame payload" over
> "packet/datagram" doesn't seem beneficial.

But that's exactly the difference between a datagram fragment and the
entire datagram, when the datagram is larger than the MTU. Fragments are
smaller than the L2 MTU, but datagrams are smaller than the L3
'framesize' - whatever we want to call that. ;-)

Joe

> 
> Perhaps I'm favouring the pragmatic over the precise....
> 
> Loki
> 
> -----Original Message-----
> From: Joe Touch [mailto:touch at ISI.EDU] 
> Sent: Thursday, April 21, 2005 1:30 PM
> To: Loki Jorgenson
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] MTU - IP layer
> 
> L3 packet size isn't referred to as MTU, esp. in IP (rfc791); it is
> datagram length (or total length).
> 
> Fragments in IP must be less than or equal to the MTU, which there (791)
> refers to the max payload of the L2.
> 
> path MTU discovery is equivalent to path "max link payload" discovery,
> rather than path "max network payload" discovery.
> 
> IMO, therefore, MTU really refers to the L2 payload size, which is not
> the same as the L3 'frame' size (size of the total IP packet), but is
> related to the size of an L3 fragment.
> 
> Joe
> 
> Loki Jorgenson wrote:
> 
>>>Minor note - MTU is technically Layer 3 (as opposed to link layer or
>>>layer 2).  So it is quite correct to describe the MTU as the link layer
>>>payload size.  So, as noted, 1518 bytes is the frame size at layer 2.
>>>
>>>However, it is very important to keep in mind that MTU and path MTU
>>>discovery operate at Layer 3.  For example, boundaries between differing
>>>MTUs should be handled by Layer 3 devices (not switches) to avoid
>>>end-to-end issues that can arise.
>>>
>>>Loki
>>>
>>>----
>>>
>>>"Joe Wrote:"
>>>
>>>
>>>Date: Thu, 21 Apr 2005 09:28:28 -0700
>>>From: Joe Touch <touch at ISI.EDU>
>>>Subject: Re: [e2e] Question on MTU
>>>To: Arjuna Sathiaseelan <arjuna.sathiaseelan at gmail.com>
>>>Cc: end2end-interest at postel.org
>>>Message-ID: <4267D4AC.8090503 at isi.edu>
>>>Content-Type: text/plain; charset=ISO-8859-1
>>>
>>>
>>>MTU usually refers to a link layer, and denotes the maximum link payload
>>>size, excluding link header/trailer info. For Ethernet, such
>>>header/trailers include:
>>>
>>>        - 14 byte header
>>>        - 4 byte 802.1q (VLAN) tag
>>>        - 4 byte CRC
>>>
>>>Standard ethernet has 1518 byte frames, but 802.1q ethernet has 1522
>>>byte frames. From the link frame size, subtract the link header/trailer
>>>to get the MTU. Standard ethernet has an MTU of 1500 bytes, but there
>>>are jumbograms of 9,000 bytes in the extended ethernet spec.
>>>
>>>MSS usually refers to a transport protocol, e.g., TCP, and denotes the
>>>max payload size there too. It is also relative to the network (IPv4,
>>>IPv6) protocol _and_ link layer used.
>>>
>>>And just as link layer overhead sizes vary, so do network layer overhead
>>>sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
>>>included, e.g., 48 for IPv6 with jumbogram option).
>>>
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCaBTTE5f5cImnZrsRAn7wAJ0Qx8njXuW53Z6biPzVrgkFecROngCeOcPk
hQSXcr8aQWhVwgYWlDqjVhw=
=rnwt
-----END PGP SIGNATURE-----

From braden at ISI.EDU  Thu Apr 21 14:26:00 2005
From: braden at ISI.EDU (Bob Braden)
Date: Thu, 21 Apr 2005 14:26:00 -0700 (PDT)
Subject: [e2e] MTU - IP layer
Message-ID: <200504212126.OAA03031@gra.isi.edu>


  *> 
  *> Hmmmmmm - that's an interesting reading of RFC 791 - and the distinction
  *> of fragments over datagrams could be made in that way.
  *> 

"Interesting"?  Joe is correct and completely precise here.

Bob Braden

From braden at ISI.EDU  Thu Apr 21 14:32:22 2005
From: braden at ISI.EDU (Bob Braden)
Date: Thu, 21 Apr 2005 14:32:22 -0700 (PDT)
Subject: [e2e] MTU - IP layer
Message-ID: <200504212132.OAA03036@gra.isi.edu>


  *> 
  *> My observation remains that MTU is conceptually defined and implemented
  *> at Layer 3.  Making pains to define it in Layer 2 terms in order to
  *> ensure its scope includes all valid cases makes sense - and yet I find
  *> it challenged.  Promoting the subtle distinction of "Frame payload" over
  *> "packet/datagram" doesn't seem beneficial.

You are not getting the point.  This is a completely correct
distinction, and it is not subtle.  IP permits a link layer frame to be
longer (but not shorter) than the IP datagram it contains.  There is
NOT a necessary equality between layer 2 frame size and layer 3
datagram size. That is (one reason) why an IP header contains a length
field; it cannot just assume the length field provided by the link
layer (the way TCP inherits the length from IP).
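
The point about the length field can be illustrated with a short sketch.
It assumes an Ethernet-like link that pads short payloads up to 46 bytes;
the helper names and the addresses are hypothetical, and the header
checksum is left at zero for brevity. The receiver trims the frame payload
to the IP Total Length rather than trusting the link-layer length.

```python
import struct

ETHERNET_MIN_PAYLOAD = 46  # bytes; shorter payloads are padded by the link layer


def build_ipv4_header(total_length: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum not computed in this sketch)."""
    version_ihl = (4 << 4) | 5           # IPv4, header length 5 * 4 = 20 bytes
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,    # version/IHL, TOS, Total Length
        0, 0,                            # identification, flags/fragment offset
        64, 17, 0,                       # TTL, protocol (UDP), checksum placeholder
        bytes([192, 0, 2, 1]),           # source address (documentation range)
        bytes([192, 0, 2, 2]),           # destination address
    )


def pad_to_link_minimum(datagram: bytes) -> bytes:
    """Emulate a link layer that pads short payloads up to its minimum size."""
    padding = max(0, ETHERNET_MIN_PAYLOAD - len(datagram))
    return datagram + b"\x00" * padding


def extract_datagram(frame_payload: bytes) -> bytes:
    """Recover the datagram using the IP Total Length field, not the frame length."""
    (total_length,) = struct.unpack("!H", frame_payload[2:4])
    if total_length > len(frame_payload):
        raise ValueError("frame is shorter than the IP datagram it claims to carry")
    return frame_payload[:total_length]


datagram = build_ipv4_header(total_length=28) + b"\x00" * 8  # 20-byte header + 8 data bytes
frame_payload = pad_to_link_minimum(datagram)
recovered = extract_datagram(frame_payload)
print(len(datagram), len(frame_payload), len(recovered))
```

Here the frame payload is longer than the datagram it carries, and only
the Total Length field lets the receiver discard the padding.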

We thrashed this point out in 1978 when TCP/IP was being designed.

On another issue in this thread, MSS is specific to TCP, because the
definition of a "segment" is TCP-specific.  (I once tried to convince
Jon Postel that "segment" was a superfluous term, but he was not
buying... ;-))

  *> 
  *> Perhaps I'm favouring the pragmatic over the precise....
  *> 

Quite the opposite, in fact.

Bob Braden

  *> Loki
  *> 
  *> -----Original Message-----
  *> From: Joe Touch [mailto:touch at ISI.EDU] 
  *> Sent: Thursday, April 21, 2005 1:30 PM
  *> To: Loki Jorgenson
  *> Cc: end2end-interest at postel.org
  *> Subject: Re: [e2e] MTU - IP layer
  *> 
  *> -----BEGIN PGP SIGNED MESSAGE-----
  *> Hash: SHA1
  *> 
  *> L3 packet size isn't referred to as MTU, esp. in IP (rfc791); it is
  *> datagram length (or total length).
  *> 
  *> Fragments in IP must be less than or equal to the MTU, which there (791)
  *> refers to the max payload of the L2.
  *> 
  *> path MTU discovery is equivalent to path "max link payload" discovery,
  *> rather than path "max network payload" discovery.
  *> 
  *> IMO, therefore, MTU really refers to the L2 payload size, which is not
  *> the same as the L3 'frame' size (size of the total IP packet), but is
  *> related to the size of an L3 fragment.
  *> 
  *> Joe
  *> 
  *> Loki Jorgenson wrote:
  *> > 
  *> > Minor note - MTU is technically Layer 3 (as opposed to link layer or
  *> > layer 2).  So it is quite correct to describe the MTU as the link layer
  *> > payload size.  So, as noted, 1518 bytes is the frame size at layer 2.
  *> > 
  *> > However, it is very important to keep in mind that MTU and path MTU
  *> > discovery operate at Layer 3.  For example, boundaries between differing
  *> > MTUs should be handled by Layer 3 devices (not switches) to avoid
  *> > end-to-end issues that can arise.
  *> > 
  *> > Loki
  *> > 
  *> > ----
  *> > 
  *> > "Joe Wrote:"
  *> > 
  *> > 
  *> > Date: Thu, 21 Apr 2005 09:28:28 -0700
  *> > From: Joe Touch <touch at ISI.EDU>
  *> > Subject: Re: [e2e] Question on MTU
  *> > To: Arjuna Sathiaseelan <arjuna.sathiaseelan at gmail.com>
  *> > Cc: end2end-interest at postel.org
  *> > Message-ID: <4267D4AC.8090503 at isi.edu>
  *> > Content-Type: text/plain; charset=ISO-8859-1
  *> > 
  *> > 
  *> > MTU usually refers to a link layer, and denotes the maximum link payload
  *> > size, excluding link header/trailer info. For Ethernet, such
  *> > header/trailers include:
  *> > 
  *> >         - 14 byte header
  *> >         - 4 byte 802.1q (VLAN) tag
  *> >         - 4 byte CRC
  *> > 
  *> > Standard ethernet has 1518 byte frames, but 802.1q ethernet has 1522
  *> > byte frames. From the link frame size, subtract the link header/trailer
  *> > to get the MTU. Standard ethernet has an MTU of 1500 bytes, but there
  *> > are jumbograms of 9,000 bytes in the extended ethernet spec.
  *> > 
  *> > MSS usually refers to a transport protocol, e.g., TCP, and denotes the
  *> > max payload size there too. It is also relative to the network (IPv4,
  *> > IPv6) protocol _and_ link layer used.
  *> > 
  *> > And just as link layer overhead sizes vary, so do network layer overhead
  *> > sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
  *> > included, e.g., 48 for IPv6 with jumbogram option).
  *> > 
  *> -----BEGIN PGP SIGNATURE-----
  *> Version: GnuPG v1.2.4 (MingW32)
  *> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
  *> 
  *> iD8DBQFCaA0tE5f5cImnZrsRAow8AJ4pWCIAqdCRFbQDAhbm4+z1SaZzbACfSvb/
  *> XZXMcs7Veyt+qS6RdSEzzeU=
  *> =kDKI
  *> -----END PGP SIGNATURE-----
  *> 

From david.borman at windriver.com  Thu Apr 21 15:13:22 2005
From: david.borman at windriver.com (David Borman)
Date: Thu, 21 Apr 2005 17:13:22 -0500
Subject: [e2e] Question on MTU
In-Reply-To: <4268144A.6080002@isi.edu>
References: <1ef2259005042100424feef544@mail.gmail.com>	<4267D4AC.8090503@isi.edu>
	<7bf770bf3d525c13130f6408e21788b7@windriver.com>
	<4268144A.6080002@isi.edu>
Message-ID: <53edeea330e7ab135170e4d17ee59c68@windriver.com>

Joe,

The "effective send MSS" takes into account options, but the MSS value 
put into the TCP MSS option should not.  In section 4.2.2.6 of RFC 
1122:

             The MSS value to be sent in an MSS option must be less than
             or equal to:

                MMS_R - 20

             where MMS_R is the maximum size for a transport-layer
             message that can be received (and reassembled).  TCP obtains
             MMS_R and MMS_S from the IP layer; see the generic call
             GET_MAXSIZES in Section 3.4.

And in section 3.3.2:

          There MUST be a mechanism by which the transport layer can
          learn MMS_R, the maximum message size that can be received and
          reassembled in an IP datagram (see GET_MAXSIZES calls in
          Section 3.4).  If EMTU_R is not indefinite, then the value of
          MMS_R is given by:

             MMS_R = EMTU_R - 20

          since 20 is the minimum size of an IP header.

The receiver can't reliably predict what IP or TCP options the sender 
is going to put into the packets, so it doesn't include them in the MSS 
option.  The sender then does take those options into account when 
calculating the "effective send MSS", because it knows exactly what 
options are going into the packet.

			-David Borman
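
The split described above can be written out as a minimal sketch, assuming
IPv4 and the Eff.snd.MSS formula from RFC 1122 Section 4.2.2.6. The
function names are hypothetical; the example values (a 1500-byte EMTU and a
12-byte TCP timestamp option) are illustrative assumptions, not taken from
the thread.

```python
IP_MIN_HEADER = 20   # minimum IPv4 header size, as in the quoted Section 3.3.2
TCP_MIN_HEADER = 20  # minimum TCP header size


def advertised_mss(emtu_r: int) -> int:
    """Value for the MSS option: MMS_R - 20, where MMS_R = EMTU_R - 20.

    Options are deliberately ignored, because the receiver cannot know
    which options the sender will use.
    """
    mms_r = emtu_r - IP_MIN_HEADER
    return mms_r - TCP_MIN_HEADER


def effective_send_mss(send_mss: int, mms_s: int,
                       tcp_header_size: int = TCP_MIN_HEADER,
                       ip_option_size: int = 0) -> int:
    """Eff.snd.MSS = min(SendMSS + 20, MMS_S) - TCPhdrsize - IPoptionsize.

    send_mss is the MSS option value received from the peer; the sender
    subtracts its own TCP header (including TCP options) and IP options.
    """
    return min(send_mss + 20, mms_s) - tcp_header_size - ip_option_size


# Receiver side: a 1500-byte EMTU_R yields an advertised MSS of 1460.
peer_mss = advertised_mss(1500)

# Sender side: MMS_S = 1500 - 20 = 1480; a TCP header carrying a 12-byte
# timestamp option is 32 bytes, so less data fits in each segment.
data_per_segment = effective_send_mss(peer_mss, mms_s=1480, tcp_header_size=32)
print(peer_mss, data_per_segment)
```

The advertised value stays fixed at 1460 whatever options are in use; only
the sender's per-segment data size shrinks.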


On Apr 21, 2005, at 3:59 PM, Joe Touch wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> See RFC1122 Section 4.2.2.6 on calculating the MSS advertised in the TCP
> MSS option. Condensed from that section:
>
> 	The eff.snd.MSS takes options at both IP and TCP layers
> 	into account - this is the size of the largest segment
> 	actually sent.
>
> 	The MSS value sent in the MSS option must be <= MMS_R - 20,
> 	where MMS_R is from GET_MAXSIZES in sec 3.4.
>
> Sec 3.4 refers to 3.3.3, which defines:
>
>             MMS_S = EMTU_S - <IP header size>
>
>          and EMTU_S must be less than or equal to the MTU of the network
>          interface corresponding to the source address of the datagram.
>          Note that <IP header size> in this equation will be 20, unless
>          the IP reserves space to insert IP options for its own purposes
>          in addition to any options inserted by the transport layer.
>
> I.e., IP options ARE accounted for in the advertised MSS.
>
> As you noted, intermediate headers (shims like IPsec and HIP) are harder
> to handle because they aren't treated as options, and may not
> necessarily be known to either IP or TCP. My understanding is that most
> implementations adjust the IP MSS accordingly, so it gets passed up to
> TCP as per secs 3.3.3 and 3.4 of 1122 above.
>
> Joe
>
> David Borman wrote:
>>
>> On Apr 21, 2005, at 11:28 AM, Joe Touch wrote:
>>
>>> MSS usually refers to a transport protocol, e.g., TCP, and denotes the
>>> max payload size there too. It is also relative to the network (IPv4,
>>> IPv6) protocol _and_ link layer used.
>>>
>>> And just as link layer overhead sizes vary, so do network layer overhead
>>> sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
>>> included, e.g., 48 for IPv6 with jumbogram option).
>>
>>
>> But the advertised MSS in the TCP MSS option should not be adjusted to
>> reflect any options or intermediary headers, just the fixed IP and TCP
>> header sizes; 40 bytes for IPv4/TCP, and 60 bytes for IPv6/TCP.  When
>> the sender generates the packet, he is responsible for reducing the TCP
>> data to allow room for any additional options or headers.
>>
>>             -David Borman
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (MingW32)
> Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
>
> iD8DBQFCaBRKE5f5cImnZrsRAuVHAJ9eaIBHKXMZxhzcMgldOvFVphYRIACffqGL
> qWTwK4RCNc/QWYLQxi4tYOU=
> =SChT
> -----END PGP SIGNATURE-----


From ljorgenson at apparentnetworks.com  Thu Apr 21 15:39:11 2005
From: ljorgenson at apparentnetworks.com (Loki Jorgenson)
Date: Thu, 21 Apr 2005 15:39:11 -0700
Subject: [e2e] MTU - IP layer
Message-ID: <F09324DCDD2F5D488EAC603D6B299DC7D2C3AC@jsrvr8.jaalam.net>


OK - I'm convinced that the language is accurate.  Thanks for the
clarification.  So it may simply be the difference between the
conceptual and the applied.

What I've been struggling with are the conflicting requirements to
resolve MTU as an end-to-end value and to handle framesize/MTU at the
interface/link layer.  If the reality of IP is such that MTU is
essentially defined in terms of the link layer, but all the pMTU
processes operate at the network layer, how does one avoid, for example,
the problems associated with black holes?

Where this comes up in our experience is when the confusion of MTU with
framesize leads to human mistakes being made at mixed MTU boundaries.
Either switches are put into place to manage the MTU constriction, or
constrictions are accidentally created by miscalculation (9000 byte
frames instead of 9018).  There is no (effective) mechanism to ensure
that pMTU is a well-defined entity based on link layer implementation -
it tends to be fragile.

At least by keeping MTU conceptually Layer 3, some of the major pitfalls
can be avoided, at least at a human level ..... thoughts?
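
The 9000-versus-9018 miscalculation is easy to reproduce. A minimal sketch,
assuming the Ethernet overheads listed earlier in the thread (14-byte
header, 4-byte CRC, optional 4-byte 802.1q tag); the function names are
hypothetical.

```python
ETH_HEADER = 14  # destination MAC, source MAC, EtherType
ETH_CRC = 4      # frame check sequence trailer
VLAN_TAG = 4     # optional 802.1q tag


def frame_size_for_mtu(mtu: int, vlan_tagged: bool = False) -> int:
    """Largest link frame needed to carry a payload of `mtu` bytes."""
    return mtu + ETH_HEADER + ETH_CRC + (VLAN_TAG if vlan_tagged else 0)


def mtu_for_frame_size(frame_size: int, vlan_tagged: bool = False) -> int:
    """Largest payload (MTU) that fits in a frame of `frame_size` bytes."""
    return frame_size - ETH_HEADER - ETH_CRC - (VLAN_TAG if vlan_tagged else 0)


print(frame_size_for_mtu(1500))                    # standard Ethernet frame
print(frame_size_for_mtu(1500, vlan_tagged=True))  # 802.1q frame
print(frame_size_for_mtu(9000))                    # frame size a 9000-byte MTU needs
print(mtu_for_frame_size(9000))                    # MTU left if 9000 is entered as the frame size
```

A device configured with a 9000-byte frame limit can carry only 8982 bytes
of payload, which is the accidental constriction described above.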

Loki Jorgenson wrote:
> Hmmmmmm - that's an interesting reading of RFC 791 - and the distinction
> of fragments over datagrams could be made in that way.
> 
> My observation remains that MTU is conceptually defined and implemented
> at Layer 3.  Making pains to define it in Layer 2 terms in order to
> ensure its scope includes all valid cases makes sense - and yet I find
> it challenged.  Promoting the subtle distinction of "Frame payload" over
> "packet/datagram" doesn't seem beneficial.

But that's exactly the difference between a datagram fragment and the
entire datagram, when the datagram is larger than the MTU. Fragments are
smaller than the L2 MTU, but datagrams are smaller than the L3
'framesize' - whatever we want to call that. ;-)

Joe


From cannara at attglobal.net  Thu Apr 21 16:42:25 2005
From: cannara at attglobal.net (Cannara)
Date: Thu, 21 Apr 2005 16:42:25 -0700
Subject: [e2e] MTU - IP layer
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C3AC@jsrvr8.jaalam.net>
Message-ID: <42683A61.C812DBFE@attglobal.net>

This is interesting for its parochial nature, behind TCP/IP blinders.  MTU is
a defined term from way, way back and has nothing specifically to do with any
protocol.  It simply indicates the Maximum Transfer Unit any layer's
implementation can handle.  In other words, each layer's PDU requires an
advertisement of MTU upward and an acceptance of the MTU offered from below.
For the IP world, where 512B seemed to be an important limit from below far
longer than it should have been, this meant implementing Fragmentation of IP
Datagrams.  At higher or lower layers of various protocols, this has been a
reality for many years.  

Alex

Loki Jorgenson wrote:
> 
> OK - I'm convinced that the language is accurate.  Thanks for the
> clarification.  So it may simply be the difference between the
> conceptual and the applied.
> 
> What I've been struggling with are the conflicting requirements to
> resolve MTU as an end-to-end value and to handle framesize/MTU at the
> interface/link layer.  If the reality of IP is such that MTU is
> essentially defined in terms of the link layer, but all the pMTU
> processes operate at the network layer, how does one avoid, for example,
> the problems associated with black holes?
> 
> Where this comes up in our experience is when the confusion of MTU with
> framesize leads to human mistakes being made at mixed MTU boundaries.
> Either switches are put into place to manage the MTU constriction or
> constrictions being accidentally created by miscalculation (9000 byte
> frames instead of 9018).  There is no (effective) mechanism to ensure
> that pMTU is a well-defined entity based on link layer implementation -
> it tends to be fragile.
> 
> At least by keeping MTU conceptually Layer 3, some of the major pitfalls
> can be avoided, at least at a human level ..... thoughts?
> 
> Loki Jorgenson wrote:
> > Hmmmmmm - that's an interesting reading of RFC 791 - and the distinction
> > of fragments over datagrams could be made in that way.
> >
> > My observation remains that MTU is conceptually defined and implemented
> > at Layer 3.  Making pains to define it in Layer 2 terms in order to
> > ensure its scope includes all valid cases makes sense - and yet I find
> > it challenged.  Promoting the subtle distinction of "Frame payload" over
> > "packet/datagram" doesn't seem beneficial.
> 
> But that's exactly the difference between a datagram fragment and the
> entire datagram, when the datagram is larger than the MTU. Fragments are
> smaller than the L2 MTU, but datagrams are smaller than the L3
> 'framesize' - whatever we want to call that. ;-)
> 
> Joe

From touch at ISI.EDU  Thu Apr 21 16:41:42 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 16:41:42 -0700
Subject: [e2e] Question on MTU
In-Reply-To: <53edeea330e7ab135170e4d17ee59c68@windriver.com>
References: <1ef2259005042100424feef544@mail.gmail.com>	<4267D4AC.8090503@isi.edu>
	<7bf770bf3d525c13130f6408e21788b7@windriver.com>
	<4268144A.6080002@isi.edu>
	<53edeea330e7ab135170e4d17ee59c68@windriver.com>
Message-ID: <42683A36.7030600@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi, Dave,

David Borman wrote:
> Joe,
> 
> The "effective send MSS" takes into account options, but the MSS value
> put into the TCP MSS option should not.  In section 4.2.2.6 of RFC 1122:
> 
>             The MSS value to be sent in an MSS option must be less than
>             or equal to:
> 
>                MMS_R - 20
> 
>             where MMS_R is the maximum size for a transport-layer
>             message that can be received (and reassembled).  TCP obtains
>             MMS_R and MMS_S from the IP layer; see the generic call
>             GET_MAXSIZES in Section 3.4.

The MSS you send to the other side is the size you can receive, which
has nothing to do with your options - TCP or IP.

As you correctly note, this is related to MMS_R - 20 (sorry - I used the
MMS_S value).

> And in section 3.3.2:
> 
>          There MUST be a mechanism by which the transport layer can
>          learn MMS_R, the maximum message size that can be received and
>          reassembled in an IP datagram (see GET_MAXSIZES calls in
>          Section 3.4).  If EMTU_R is not indefinite, then the value of
>          MMS_R is given by:
> 
>             MMS_R = EMTU_R - 20
> 
>          since 20 is the minimum size of an IP header.
> 
> The receiver can't reliably predict what IP or TCP options the sender is
> going to put into the packets, so it doesn't include them in the MSS
> option.  The sender then does take those options into account when
> calculating the "effective send MSS", because it knows exactly what
> options are going into the packet.

Agreed. The primary issue to me was that the options - both IP and TCP -
are taken into account in computing the MSS TCP uses, whether obtained
by looking at the local interface or learned by the PMTUD mechanisms.

FWIW, the shims sometimes cobble things by setting the interface MTU
down by the amount they add, effectively 'adding' space for it as a
result. (sometimes; sometimes it's not so easy to point to which
interface will be used, at which point I don't know if they decrement
all interfaces or try to do anything more context-dependent)

Joe

> On Apr 21, 2005, at 3:59 PM, Joe Touch wrote:
> 
> See RFC1122 Section 4.2.2.6 on calculating the MSS advertised in the TCP
> MSS option. Condensed from that section:
> 
>     The eff.snd.MSS takes options at both IP and TCP layers
>     into account - this is the size of the largest segment
>     actually sent.
> 
>     The MSS value sent in the MSS option must be <= MMS_R - 20,
>     where MMS_R is from GET_MAXSIZES in sec 3.4.
> 
> Sec 3.4 refers to 3.3.3, which defines:
> 
>             MMS_S = EMTU_S - <IP header size>
> 
>          and EMTU_S must be less than or equal to the MTU of the network
>          interface corresponding to the source address of the datagram.
>          Note that <IP header size> in this equation will be 20, unless
>          the IP reserves space to insert IP options for its own purposes
>          in addition to any options inserted by the transport layer.
> 
> I.e., IP options ARE accounted for in the advertised MSS.
> 
> As you noted, intermediate headers (shims like IPsec and HIP) are harder
> to handle because they aren't treated as options, and may not
> necessarily be known to either IP or TCP. My understanding is that most
> implementations adjust the IP MSS accordingly, so it gets passed up to
> TCP as per secs 3.3.3 and 3.4 of 1122 above.
> 
> Joe
> 
> David Borman wrote:
> 
>>>>
>>>> On Apr 21, 2005, at 11:28 AM, Joe Touch wrote:
>>>>
>>>>> MSS usually refers to a transport protocol, e.g., TCP, and denotes the
>>>>> max payload size there too. It is also relative to the network (IPv4,
>>>>> IPv6) protocol _and_ link layer used.
>>>>>
>>>>> And just as link layer overhead sizes vary, so do network layer
>>>>> overhead
>>>>> sizes (minimums of 20 for IPv4, 40 for IPv6 - larger if options are
>>>>> included, e.g., 48 for IPv6 with jumbogram option).
>>>>
>>>>
>>>>
>>>> But the advertised MSS in the TCP MSS option should not be adjusted to
>>>> reflect any options or intermediary headers, just the fixed IP and TCP
>>>> header sizes; 40 bytes for IPv4/TCP, and 60 bytes for IPv6/TCP.  When
>>>> the sender generates the packet, he is responsible for reducing the TCP
>>>> data to allow room for any additional options or headers.
>>>>
>>>>             -David Borman
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCaDo2E5f5cImnZrsRAv95AKCMo1Tn9unqDs30y0+fLbqFmWlq7wCgj6TU
kyYj0EwJ72DRqmH2Y5/90gU=
=wIau
-----END PGP SIGNATURE-----

From touch at ISI.EDU  Thu Apr 21 17:20:22 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 17:20:22 -0700
Subject: [e2e] MTU - IP layer
In-Reply-To: <F09324DCDD2F5D488EAC603D6B299DC7D2C3AC@jsrvr8.jaalam.net>
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C3AC@jsrvr8.jaalam.net>
Message-ID: <42684346.1020105@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Loki Jorgenson wrote:
> OK - I'm convinced that the language is accurate.  Thanks for the
> clarification.  So it may simply be the difference between the
> conceptual and the applied.

What Bob said ;-)

> What I've been struggling with are the conflicting requirements to
> resolve MTU as an end-to-end value and to handle framesize/MTU at the
> interface/link layer.

MTU is a link payload issue. Path MTU is the min of the MTUs on the
path; it is path MTU that is defined E2E.

> If the reality of IP is such that MTU is
> essentially defined in terms of the link layer, but all the pMTU
> processes operate at the network layer,

yes...

> how does one avoid, for example,
> the problems associated with black holes?

I'm not sure one has anything to do with the other. The only way to
_know_ you will avoid a black hole is to send the smallest IP packets
possible - 68 bytes. You can do this by sending small datagrams (28
bytes), or by sending larger datagrams and fragmenting them.
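
As a rough sketch of the second option, the following splits a datagram's
payload into fragments that fit a 68-byte MTU. It assumes a 20-byte IPv4
header with no options on every fragment; the function name is
hypothetical. Fragment data lengths other than the last must be multiples
of 8, because the IPv4 fragment offset is counted in 8-byte units.

```python
def fragment_payload(payload_len: int, mtu: int, header_len: int = 20):
    """Return a list of (offset_in_8_byte_units, data_len, more_fragments)."""
    max_data = (mtu - header_len) // 8 * 8   # round down to a multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any fragment data")
    fragments = []
    offset = 0
    while offset < payload_len:
        data_len = min(max_data, payload_len - offset)
        more_fragments = offset + data_len < payload_len
        fragments.append((offset // 8, data_len, more_fragments))
        offset += data_len
    return fragments


# A datagram with 100 bytes of payload sent over a path limited to 68 bytes.
for frag_offset, data_len, more in fragment_payload(100, mtu=68):
    print(frag_offset, data_len + 20, more)  # offset, total fragment length, MF flag
```

Each fragment here is at most 68 bytes on the wire, so none can be dropped
for exceeding the minimum MTU that IPv4 guarantees.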

Short of that, the only other way is POSITIVE feedback - try larger
packets and see what gets through. If it does, report back and use that
size. That's already under consideration in the IETF "pmtud" working group.

Using NEGATIVE feedback - the absence of error messages bouncing large
packets - is what is currently used, and that is what is susceptible to
black holes, because black holes look like a successful transmission.

> Where this comes up in our experience is when the confusion of MTU with
> framesize leads to human mistakes being made at mixed MTU boundaries.

That's what automated PMTUD is supposed to fix ;-)

> Either switches are put into place to manage the MTU constriction or
> constrictions being accidentally created by miscalculation (9000 byte
> frames instead of 9018).  There is no (effective) mechanism to ensure
> that pMTU is a well-defined entity based on link layer implementation -
> it tends to be fragile.

That, again, is because paths are not link concepts, so pMTU isn't
defined at the link layer.

> At least by keeping MTU conceptually Layer 3, some of the major pitfalls
> can be avoided, at least at a human level ..... thoughts?

IMO, there's no benefit to human management possible; automated systems
are the key. The major pitfall, IMO, is trying to track this with brain
cells ;-)

Joe


> Loki Jorgenson wrote:
> 
>>Hmmmmmm - that's an interesting reading of RFC 791 - and the distinction
>>of fragments over datagrams could be made in that way.
>>
>>My observation remains that MTU is conceptually defined and implemented
>>at Layer 3.  Making pains to define it in Layer 2 terms in order to
>>ensure its scope includes all valid cases makes sense - and yet I find
>>it challenged.  Promoting the subtle distinction of "Frame payload" over
>>"packet/datagram" doesn't seem beneficial.
> 
> 
> But that's exactly the difference between a datagram fragment and the
> entire datagram, when the datagram is larger than the MTU. Fragments are
> smaller than the L2 MTU, but datagrams are smaller than the L3
> 'framesize' - whatever we want to call that. ;-)
> 
> Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCaENGE5f5cImnZrsRAje5AJ9glKM5wN1vJ2G9NtPqpdV4XbH45ACeLLr+
igUDU1KTiBnn+xc0qp20bk8=
=+5lW
-----END PGP SIGNATURE-----

From ljorgenson at apparentnetworks.com  Thu Apr 21 17:46:36 2005
From: ljorgenson at apparentnetworks.com (Loki Jorgenson)
Date: Thu, 21 Apr 2005 17:46:36 -0700
Subject: [e2e] MTU - IP layer
Message-ID: <F09324DCDD2F5D488EAC603D6B299DC7D2C41A@jsrvr8.jaalam.net>

Joe wrote:

> MTU is a link payload issue. Path MTU is the min of the MTUs on the
> path; it is path MTU that is defined E2E.

  And that doesn't seem like a problem?  I guess if RFC 1191 was
reliably implemented and Layer 2 fed back to the end-to-end....

> 
> Short of that, the only other way is POSITIVE feedback - try larger
> packets and see what gets through. If it does, report back and use that
> size. That's already under consideration in the IETF "pmtud" working
> group.

From the early drafts I looked at they were proposing, as you suggest, a
"probing for packet loss by size defines pMTU" mechanism  - is that
still the case then?  That doesn't sound like positive feedback per se.
The idea of ICMP DF Set probing (a la RFC1191) at least seemed like
positive feedback, if only best-effort....

Loki

From touch at ISI.EDU  Thu Apr 21 21:47:58 2005
From: touch at ISI.EDU (Joe Touch)
Date: Thu, 21 Apr 2005 21:47:58 -0700
Subject: [e2e] MTU - IP layer
In-Reply-To: <F09324DCDD2F5D488EAC603D6B299DC7D2C41A@jsrvr8.jaalam.net>
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C41A@jsrvr8.jaalam.net>
Message-ID: <426881FE.7030408@isi.edu>


Loki Jorgenson wrote:
> Joe wrote:
> 
> 
>>MTU is a link payload issue. Path MTU is the min of the MTUs on the
>>path; it is path MTU that is defined E2E.
> 
>   And that doesn't seem like a problem?  I guess if RFC 1191 was
> reliably implemented and Layer 2 fed back to the end-to-end....

See the new PMTUD WG below ;-)

>>Short of that, the only other way is POSITIVE feedback - try larger
>>packets and see what gets through. If it does, report back and use that
>>size. That's already under consideration in the IETF "pmtud" working
>>group.
> 
> From the early drafts I looked at they were proposing, as you suggest, a
> "probing for packet loss by size defines pMTU" mechanism  - is that
> still the case then?  That doesn't sound like positive feedback per se.
> The idea of ICMP DF Set probing (a la RFC1191) at least seemed like
> positive feedback, if only best-effort....
> 
> Loki

My use of 'negative' and 'positive' may have been confusing.

I meant more like "the absence of feedback" and "the presence of
feedback". Positive/negative can be confused with the kind of
information you get, _when_ you get feedback (yes it got through, or no
it failed).

So, current pmtud is based on the absence of "no, it failed" feedback.
I.e., if the source gets the ICMP errors back, it knows that particular
attempt failed. The algorithm says "try large until you get told NOT
to", which is _why_ it is susceptible to black holes - because black
holes behave like a working large-mtu path - you do NOT get the feedback
that anything failed.

The new pmtud is based on the presence of "yes, it got through"
feedback. The loss isn't what matters; it's what gets through that does
(successful probes). The algorithm says "stay small, and try large
(disposable) probes; if the probes work, THEN get larger". This is not
susceptible to black holes - it works only when both the probes get
through _and_ the feedback makes it back successfully.
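
The contrast between the two approaches can be shown with a toy model.
This is only a sketch under stated assumptions, not the algorithm in the
pmtud working group draft: the Link class, the function names, the 576-byte
starting size and the candidate probe sizes are all hypothetical. A link
with sends_icmp=False stands in for a black hole.

```python
from dataclasses import dataclass


@dataclass
class Link:
    mtu: int
    sends_icmp: bool = True  # False models a black hole: oversize packets vanish silently


def deliver(path, size):
    """Return ('ok', None), ('icmp', mtu_of_tight_link) or ('lost', None)."""
    for link in path:
        if size > link.mtu:
            return ("icmp", link.mtu) if link.sends_icmp else ("lost", None)
    return ("ok", None)


def classic_pmtud(path):
    """Start large and shrink only when an error message says to."""
    estimate = path[0].mtu
    while True:
        result, hint = deliver(path, estimate)
        if result == "icmp":
            estimate = hint
        else:
            # 'ok' and 'lost' are indistinguishable here: no error arrived.
            return estimate


def probing_pmtud(path, base=576, candidates=(1500, 4352, 9000)):
    """Stay small; grow only when a larger probe is confirmed to get through."""
    estimate = base
    for size in sorted(candidates):
        if size <= estimate:
            continue
        result, _ = deliver(path, size)
        if result != "ok":
            break               # no confirmation, so keep the last proven size
        estimate = size
    return estimate


black_hole_path = [Link(9000), Link(1500, sends_icmp=False), Link(9000)]
true_path_mtu = min(link.mtu for link in black_hole_path)

print(true_path_mtu, classic_pmtud(black_hole_path), probing_pmtud(black_hole_path))
```

On this path the error-driven search settles on 9000 bytes even though
packets of that size never arrive, while the probe-driven search stops at
1500, the true minimum of the link MTUs.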

Joe

From ljorgenson at apparentnetworks.com  Thu Apr 21 23:57:53 2005
From: ljorgenson at apparentnetworks.com (Loki Jorgenson)
Date: Thu, 21 Apr 2005 23:57:53 -0700
Subject: [e2e] MTU - IP layer
Message-ID: <F09324DCDD2F5D488EAC603D6B299DC7D2C434@jsrvr8.jaalam.net>

OK - I've skimmed the latest draft of the RFC and I see better how they
are planning to make this work.  It represents a much more significant
change than I had gleaned from earlier readings. 

I'm still wondering about how the loss of these (disposable) probes will
be distinguished from congestion and other forms of loss, how the local
host's record of PMTU will change if the actual PMTU changes (for
example route change), and various other scenarios involving interplay
between loss, TCP and PLPMTUD .... but I'll just finish reading the
current draft and find out how it all turns out.

If others are as curious:
http://www.ietf.org/internet-drafts/draft-ietf-pmtud-method-04.txt

In any case, thanks for your insights Joe.

-----Original Message-----
From: Joe Touch [mailto:touch at ISI.EDU] 
Sent: Thursday, April 21, 2005 9:48 PM

The new pmtud is based on the presence of "yes, it got through"
feedback. The loss isn't what matters; it's what gets through that does
(successful probes). The algorithm says "stay small, and try large
(disposable) probes; if the probes work, THEN get larger". This is not
susceptible to black holes - it works only when both the probes get
through _and_ the feedback makes it back successfully.

Joe

From dpreed at reed.com  Fri Apr 22 06:54:09 2005
From: dpreed at reed.com (David P. Reed)
Date: Fri, 22 Apr 2005 09:54:09 -0400
Subject: [e2e] MTU - IP layer
In-Reply-To: <F09324DCDD2F5D488EAC603D6B299DC7D2C434@jsrvr8.jaalam.net>
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C434@jsrvr8.jaalam.net>
Message-ID: <42690201.8080405@reed.com>

As a pragmatic architect, it seems to me that pmtud is focusing on 
micro-optimizing whatever problem turns out to be their motivating 
problem (FTP, I suspect), and worse yet, binding in narrow assumptions 
about the underlying *inter* network architecture (like the idea that 
there is one path, it is slowly changing, and that packet structure is 
preserved on the path, rather than being tunable to manage 
latency/jitter).  We'll never be able to exploit concurrency in the 
transport or link layers if we continue binding highly specific low 
level assumptions into high-level protocols (also known as optimizing for 
the narrow domain of the present).  So I offer this as a suggestion...

It would seem to me that a small-packet network is free to implement 
large packets by intra-AS fragmentation and reassembly, for example.  
The objection to same was that reassembly was hard if packets took 
different paths.  But the PMTUD model implies they *Don't*!   Reductio 
ad absurdum.   So a much more practical separation of concerns would be 
to use a small number of end-to-end maximum packet sizes, and perhaps a 
notion of a much simpler f/r.  To cope with the long-term trend towards 
supporting larger and larger end-to-end datagrams, why not allow any 
size datagram, but cut it only on power-of-2 or power-of-4 boundaries 
(like the old "buddy" memory allocator, which simplified the reassembly 
of "free blocks").
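
One possible reading of the power-of-2 idea, as a rough sketch only. It
assumes cut points fall at multiples of the largest power of two that
fits the local size limit, so a later network with a smaller limit cuts
on boundaries that nest inside the earlier ones. All function names and
the (offset, size) representation are hypothetical.

```python
def largest_pow2_at_most(n: int) -> int:
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def fragment(length: int, limit: int):
    """Cut [0, length) into (offset, size) pieces at power-of-2 boundaries."""
    block = largest_pow2_at_most(limit)
    return [(off, min(block, length - off)) for off in range(0, length, block)]


def refragment(frags, smaller_limit: int):
    """A network with a smaller limit cuts each piece again, still aligned."""
    block = largest_pow2_at_most(smaller_limit)
    out = []
    for off, size in frags:
        end = off + size
        out.extend((o, min(block, end - o)) for o in range(off, end, block))
    return out


def coalesce(frags, block: int):
    """Merge contiguous pieces lying inside the same aligned block.

    This can run anywhere on the path, or only at the target.
    """
    merged = []
    for off, size in sorted(frags):
        if merged:
            last_off, last_size = merged[-1]
            same_block = last_off // block == off // block
            if last_off + last_size == off and same_block:
                merged[-1] = (last_off, last_size + size)
                continue
        merged.append((off, size))
    return merged
```

For example, a 3000-byte datagram cut for a 1480-byte limit uses
1024-byte blocks; cutting those again for a 576-byte limit gives
512-byte pieces, and coalesce(pieces, 1024) restores the first cut.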

Let reassembly occur wherever it is possible to do so (worst case at 
the target).   Make the end-to-end error check/error correct more robust 
(perhaps an adapted erasure code implemented at the endpoint would be 
effective at reducing round-trip overhead for fragment recovery).

Note that this *does* follow the end-to-end principle making the network 
simple and moving the work to the endpoints, while allowing the 
underlying network to be simply specified.

This is only a proposal, as usual.  Sent in hopes of inspiring useful 
research by grad student architects and thoughtful systems designers who 
need to simplify complex tradeoffs.  Perhaps cleaning up f/r is a lot 
more useful than making the "perfect" pmtud algorithm and then ruling 
out network innovations that can't support it.

In anticipation of the usual fiery reaction to end-to-end proposals 
from the cross-layer optimizers (routerheads) on this list, I'd ask 
those of you who are allergic to such solutions, please spout your 
annoyance at me directly, rather than doing a Cannara-like blast of rage 
and annoyance at past injustices and current bêtes noires to the whole list.

From mathis at psc.edu  Fri Apr 22 14:21:17 2005
From: mathis at psc.edu (Matt Mathis)
Date: Fri, 22 Apr 2005 17:21:17 -0400 (EDT)
Subject: [e2e] Question on MTU
In-Reply-To: <42683A36.7030600@isi.edu>
References: <1ef2259005042100424feef544@mail.gmail.com>
	<4267D4AC.8090503@isi.edu>
	<7bf770bf3d525c13130f6408e21788b7@windriver.com>
	<4268144A.6080002@isi.edu>
	<53edeea330e7ab135170e4d17ee59c68@windriver.com>
	<42683A36.7030600@isi.edu>
Message-ID: <Pine.LNX.4.58.0504221654360.5086@tesla.psc.edu>

There is another issue here, which I think is more germane to the original
question.   I quote from -pmtud-method-

   MTU, Maximum Transmission Unit, the size in bytes of the largest IP
      packet, including the IP header and payload, that can be
      transmitted on a link or path.  Note that this could more properly
      be called the IP MTU, to be consistent with how other standards
      organizations use the acronym MTU.

   link MTU, The Maximum Transmission Unit, i.e., maximum IP packet size
      in bytes, that can be conveyed in one piece over a link.  Beware
      that this definition differs from the definition used by other
      standards organizations.

      For IETF documents, link MTU is uniformly defined as the IP MTU
      over the link.  This includes the IP header, but excludes link
      layer headers and other framing which is not part of IP or the IP
      payload.

      Be aware that other standards organizations generally define link
      MTU to include the link layer headers.

So to make it concrete:  To the IETF, Ethernet has a 1500-byte MTU; to the
IEEE, it has a 1518-byte MTU.
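
The arithmetic behind those two numbers can be sketched as follows. The
constant and function names are hypothetical, and the sketch assumes
plain untagged Ethernet framing (no VLAN tag): a 14-byte header plus a
4-byte frame check sequence around the IP packet.

```python
ETHERNET_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERNET_FCS = 4      # frame check sequence (CRC-32) trailer


def frame_size_from_ip_mtu(ip_mtu: int) -> int:
    """Total frame size for a given IP MTU on untagged Ethernet."""
    return ip_mtu + ETHERNET_HEADER + ETHERNET_FCS


def ip_mtu_from_frame_size(frame_size: int) -> int:
    """IP MTU that fits in a given total frame size."""
    return frame_size - ETHERNET_HEADER - ETHERNET_FCS
```

So an interface documented with the frame-size convention as 1518
carries IP packets of at most 1500 bytes.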

This causes endless confusion and errors when people are configuring router
interfaces that have selectable MTUs, and other situations where both
communities might have to share documentation.  I seriously considered trying
to pick a new term to replace "IP MTU", but nothing is as crisp or sufficiently
motivating to re-train everyone who never thinks about layers below IP.

When you read a piece of documentation you can usually tell which MTU the
author meant, however once in a while a new product pops up where the HW
engineer failed to realize that IP MTU is not the total frame size and did it
wrong......

Peace,
--MM--
-------------------------------------------
Matt Mathis      http://www.psc.edu/~mathis
Work:412.268.3319    Home/Cell:412.654.7529
-------------------------------------------
Evil is defined by people who think they know
"The Truth" and use force to apply it to others.

From touch at ISI.EDU  Fri Apr 22 15:53:45 2005
From: touch at ISI.EDU (Joe Touch)
Date: Fri, 22 Apr 2005 15:53:45 -0700
Subject: [e2e] MTU - IP layer
In-Reply-To: <42690201.8080405@reed.com>
References: <F09324DCDD2F5D488EAC603D6B299DC7D2C434@jsrvr8.jaalam.net>
	<42690201.8080405@reed.com>
Message-ID: <42698079.9060608@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

David,

David P. Reed wrote:
> As a pragmatic architect, it seems to me that pmtud is focusing on
> micro-optimizing whatever problem turns out to be their motivating
> problem (FTP, I suspect), and worse yet, binding in narrow assumptions
> about the underlying *inter* network architecture (like the idea that
> there is one path, it is slowly changing, and that packet structure is
> preserved on the path, rather than being tunable to manage
> latency/jitter).  We'll never be able to exploit concurrency in the
> transport or link layers if we continue binding highly specific low
> level assumptions into high-level protocols (also known as optimizing for
> the narrow domain of the present).

FWIW, I agree completely. Much as the 'positive feedback of positive
evidence' variant of the new version of pmtud is a step in the right
direction, I disagree with the way the current proposal is entangled
with the transport layer. I would be more comfortable if it were just
part of the network layer - where the path necessarily lies - and where
current PMTUD is basically implemented.

...
>  So I offer this as a suggestion...
> 
> It would seem to me that a small-packet network is free to implement
> large packets by intra-AS fragmentation and reassembly, for example. 
> The objection to same was that reassembly was hard if packets took
> different paths.  But the PMTUD model implies they *Don't*!   Reductio
> ad absurdum.   So a much more practical separation of concerns would be
> to use a small number of end-to-end maximum packet sizes, and perhaps a
> notion of a much simpler f/r.  To cope with the long-term trend towards
> supporting larger and larger end-to-end datagrams, why not allow any
> size datagram, but cut it only on power-of-2 or power-of-4 boundaries
> (like the old "buddy" memory allocator, which simplified the reassembly
> of "free blocks").
> 
> Let reassembly occur wherever it is possible to do so (worst case at
> the target).   Make the end-to-end error check/error correct more robust
> (perhaps an adapted erasure code implemented at the endpoint would be
> effective at reducing round-trip overhead for fragment recovery).

I agree that the basic idea should allow layers not to need to be aware
of each other beyond direct interface - IP fragments on link MTU, TCP
segments only on IP datagram limits rather than how IP fragments.

PMTUD is just an optimization, and it should never be the case that an
optimization disables functionality (as with black holes on
negative-info based current PMTUD).

One of the problems is that the optimization turns out to be
significant. The unit of loss in the network is an IP fragment, but the
unit of congestion control is a TCP MSS. When the two aren't the same,
things don't work as expected.
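
A small worked example of why the mismatch matters, under the
simplifying assumption (not from the original message) that fragments
are lost independently with the same probability. A segment is useful
only if every one of its fragments arrives. The function name is
hypothetical.

```python
def segment_loss_probability(fragment_loss: float, fragments: int) -> float:
    """Probability that a segment split into `fragments` pieces is lost.

    Assumes each fragment is dropped independently with probability
    `fragment_loss`; losing any one fragment loses the whole segment.
    """
    return 1.0 - (1.0 - fragment_loss) ** fragments
```

With 1% fragment loss, an unfragmented segment is lost 1% of the time,
while a segment carried in three fragments is lost about 3% of the
time, yet congestion control still counts that as one segment loss.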

PMTUD (old or new) -does- move the work to the endpoints; new PMTUD even
more so, because it doesn't rely on ICMP errors from inside the network
but rather E2E feedback. Why isn't that consistent with the E2E
principle of making the network simpler while moving the work to the
endpoints?

Joe

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCaYB5E5f5cImnZrsRAgGrAJsGYpZNcPFnnqIaYcvcQkkUjkHuXACfRrX2
OPbT8mn5PV+oACi4Hjo0X/A=
=unoO
-----END PGP SIGNATURE-----

From kkrama at research.att.com  Fri Apr 22 21:54:17 2005
From: kkrama at research.att.com (K. K. Ramakrishnan)
Date: Sat, 23 Apr 2005 00:54:17 -0400
Subject: [e2e] Call for Papers: LANMAN 2005 (Deadline May 16, 2005)
Message-ID: <4269D4F9.6040007@research.att.com>

(Our apologies if you receive multiple copies of this message)
Note: updated deadline of May 16, 2005, 5 pm EDT.
======
                                    Call for Papers
  14th IEEE Workshop on Local and Metropolitan Area Networks (LANMAN 2005)
            September 18-21, 2005, Chania, Island of Crete, Greece
                     http://www.ieee-lanman.org
       Sponsored by: IEEE Communications Society

-- 
K. K. Ramakrishnan                     Email: kkrama at research.att.com
AT&T Labs-Research, Rm. A117           Tel: (973)360-8764
180 Park Ave, Florham Park, NJ 07932   Fax: (973) 360-8050
      URL: http://www.research.att.com/info/kkrama


From mwd24 at thompson.cl.cam.ac.uk  Sun Apr 24 05:46:28 2005
From: mwd24 at thompson.cl.cam.ac.uk (Michael Dales)
Date: 24 Apr 2005 13:46:28 +0100
Subject: [e2e] Jitter Calculations in IP networks.
In-Reply-To: <12a3f40805042112543dd601c2@mail.gmail.com>
References: <12a3f40805042112543dd601c2@mail.gmail.com>
Message-ID: <yqmr7h0bavv.fsf@thompson.cl.cam.ac.uk>

Aamir Mehmood <am.amir at gmail.com> writes:

> Hi all,
> We are doing analysis of core ip backbone. Can some one please let me
> know how jitter is calculated  in ip networks. Is there any software
> except ethereal which can calculate the jitter from the captured RTP
> stream.

You might want to look at the work the IETF working group IPPM (IP
Performance Metrics) have been doing. Their charter can be found here:

http://www.ietf.org/html.charters/ippm-charter.html

Specifically they have an RFC specifying how to measure delay variance:

http://www.ietf.org/rfc/rfc3393.txt
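
As a rough sketch of the two common calculations: RFC 3393 defines
delay variation (IPDV) as the difference between the one-way delays of
a selected pair of packets, and the RTP specification (RFC 3550, not
mentioned above) defines a smoothed interarrival jitter estimate with a
gain of 1/16. The function names are hypothetical, and the inputs are
assumed to be per-packet send and receive timestamps in the same unit,
taken from a capture.

```python
def ipdv(send_times, recv_times):
    """Delay variation between consecutive packets (RFC 3393 style).

    Each value is the one-way delay of a packet minus the delay of the
    packet before it; a constant clock offset cancels out.
    """
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


def rtp_jitter(send_times, recv_times):
    """Smoothed interarrival jitter as in RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D compares the spacing at the receiver with the spacing at the sender.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter
```

For packets sent at 0, 20, 40, 60 ms and received at 5, 25, 47, 65 ms,
ipdv gives 0, 2 and -2 ms.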

Hope that's of some use.

-- 
Michael Dales


From jshen_cad at yahoo.com.cn  Wed Apr 27 00:37:02 2005
From: jshen_cad at yahoo.com.cn (Jing Shen)
Date: Wed, 27 Apr 2005 15:37:02 +0800 (CST)
Subject: [e2e] VoIP traffic characteristics
Message-ID: <20050427073703.2658.qmail@web15408.mail.cnb.yahoo.com>

Hi,

Is there any work on VoIP traffic characteristics in
the current Internet? E.g. protocol distribution, packet
size distribution, flow size distribution,
communication patterns of gatekeepers, etc.

regards

Jing


From jussara at dcc.ufmg.br  Thu Apr 28 13:04:57 2005
From: jussara at dcc.ufmg.br (Jussara Marques de Almeida)
Date: Thu, 28 Apr 2005 17:04:57 -0300 (BRT)
Subject: [e2e] SIGMETRICS 2005 - early registration and hotel deadlines
	approaching
Message-ID: <Pine.GSO.4.58.0504281703490.22344@pantera.dcc.ufmg.br>


                   Call for Participation

              ****** ACM SIGMETRICS 2005 ******

                 International Conference on
        Measurement and Modeling of Computer Systems

                      June 6-10, 2005
                    Banff, Alberta, Canada
            http://www.cse.cuhk.edu.hk/~sigm2005


**** Early Registration Deadline: May 5, 2005 ****
**** Hotel Reservation Deadline: May 5, 2005 ****

ACM SIGMETRICS 2005, the International Conference on Measurement and
Modeling of Computer Systems, will be held June 6-10, 2005 in Banff,
Alberta, Canada.  The main conference (June 8-10) features eight paper
sessions and a poster session, as well as a keynote talk by Urs Hoelzle
of Google, Inc., and a hot topics session on Optimization of Communication
Networks.  Preceding the main conference are two workshops (June 6):

  -  MAthematical Modeling and Analysis (MAMA)
  -  Large Scale Network Inference (LSNI): Methods, Validation
     and Applications

and a full day of tutorials (June 7):

  -  Introduction to Control Theory for Computer Scientists
  -  Mathematical Optimization Techniques for Computer System Design
  -  Statistical Techniques for Performance Engineers
  -  Internet Routing: Measurement, Modeling, and Analysis
  -  Job Fairness in Queue Scheduling
  -  Using the Open Network Laboratory

Paper session topics include:
  -  Peer-to-Peer Networks
  -  Traffic Measurement and Classification
  -  Wireless Networks
  -  Caching and File Systems
  -  Bandwidth Sharing and Scheduling
  -  Network and Server Performance Measurement and Evaluation
  -  Traffic Estimation and Topology Inference.

For program details see the conference web site
    http://www.cse.cuhk.edu.hk/~sigm2005

Student travel grants are available to encourage student participation.
See the website for application details.  A Ph.D. student forum and
dinner are planned for June 6.

Banff is a world-class vacation spot in Banff National Park, in the heart
of the Canadian Rocky Mountains.  Conference attendees are encouraged to
take advantage of the potential sightseeing and leisure (or not so "leisure")
activities.  A group hiking expedition is being organized for June 5th.

Organizing Committee
====================

General Co-Chairs:
  Derek Eager, University of Saskatchewan, (eager at cs.usask.ca)
  Carey Williamson, University of Calgary, (carey at cpsc.ucalgary.ca)

Program Co-Chairs:
  Sem Borst, Bell Labs and CWI, (sem at research.bell-labs.com, sem.borst at cwi.nl)
  John C.S. Lui, Chinese University of Hong Kong, (cslui at cse.cuhk.edu.hk)

Tutorials Co-Chairs:
  Kimberly Keeton, HP Labs, (kkeeton at hpl.hp.com)
  Vishal Misra, Columbia University, (misra at cs.columbia.edu)

Finance Chair:
  Martin Arlitt, U. Calgary, (arlitt at cpsc.ucalgary.ca)

Proceedings Chair:
  Anirban Mahanti, U. Calgary, (mahanti at cpsc.ucalgary.ca)

Registration Chair and Local Arrangements Chair:
  Camille Sinanan, U. Calgary, (camille at cpsc.ucalgary.ca)

Publicity Co-Chairs:
  Jussara Almeida, UFMG, Brazil, (jussara at dcc.ufmg.br)
  Thomas Bonald, France Telecom, (thomas.bonald at francetelecom.com)

Technical Program Committee:
  Vikram Adve, UIUC
  Marco Ajmone-Marsan, Politecnico di Torino
  Mostafa Ammar, Georgia Tech
  Francois Baccelli, INRIA/ENS
  Ernst Biersack, Institut Eurecom
  Thomas Bonald, France Telecom
  Edmundo De Souza e Silva, Fed U Rio de Janeiro
  Christophe Diot, Intel
  Allen Downey, Olin College
  Nick Duffield, AT&T
  Ashish Goel, Stanford
  Leana Golubchik, USC
  Albert Greenberg, AT&T
  Matthias Grossglauser, EPFL
  Mor Harchol-Balter, CMU
  Jennifer Hou, UIUC
  R.K. Iyer, UIUC
  Shivkumar Kalyanaraman, RPI
  Kimberly Keeton, HP Labs
  Peter Key, Microsoft
  Anurag Kumar, IISC Bangalore
  Jim Kurose, UMass at Amherst
  T.V. Lakshman, Bell Labs
  Simon Lam, U Texas at Austin
  Jean-Yves Le Boudec, EPFL
  Kai Li, Princeton
  Zhen Liu, IBM
  Laurent Massoulie, Microsoft
  Rob van der Mei, CWI/Vrije U
  Arif Merchant, HP Labs
  Vishal Misra, Columbia
  Sue Moon, KAIST
  Dick Muntz, UCLA
  Erich Nahum, IBM
  Philippe Nain, INRIA
  Antonio Nucci, Sprint
  Banu Ozden, USC
  Keith Ross, Polytechnic U
  Matthew Roughan, Adelaide U
  Dan Rubenstein, Columbia
  Sanjay Shakkottai, U Texas Austin
  Evgenia Smirni, College of William & Mary
  Daniel Sorin, Duke U
  Mark Squillante, IBM
  R. Srikant, UIUC
  Y.C. Tay, NUS
  Don Towsley, UMass at Amherst
  Phuoc Tran-Gia, U Wurzburg
  Jeffrey Vetter, Oak Ridge National Laboratory
  Geoff Voelker, UCSD
  Jia Wang, AT&T
  Randy Wang, Princeton
  Jun Xu, Georgia Tech
  David K.Y. Yau, Purdue U
  Pen-Chung Yew, U Minnesota
  Philip S. Yu, IBM
  Zhi-Li Zhang, U Minnesota


From antonio.pinizzotto at iit.cnr.it  Sat Apr 30 16:29:03 2005
From: antonio.pinizzotto at iit.cnr.it (Antonio Pinizzotto)
Date: Sun, 01 May 2005 01:29:03 +0200
Subject: [e2e] How to read the TCP congestion window value on Linux?
Message-ID: <427414BF.3040503@iit.cnr.it>


Hi everybody.
Do you know about any way to read the TCP cwnd value (congestion window) 
on Linux?

I have read that on Linux it is not possible to enable a socket option 
(to read the cwnd using the program trpt).

thanks

Antonio


From kaber at trash.net  Sat Apr 30 16:53:44 2005
From: kaber at trash.net (Patrick McHardy)
Date: Sun, 01 May 2005 01:53:44 +0200
Subject: [e2e] How to read the TCP congestion window value on Linux?
In-Reply-To: <427414BF.3040503@iit.cnr.it>
References: <427414BF.3040503@iit.cnr.it>
Message-ID: <42741A88.5000809@trash.net>

Antonio Pinizzotto wrote:
> 
> Hi everybody.
> Do you know about any way to read the TCP cwnd value (congestion window) 
> on Linux?

I guess one of the linux networking lists would be a better place
to ask this. Anyway, you can get cwnd through the TCP socket monitoring
interface using the "ss"-tool from iproute2
(http://developer.osdl.org/dev/iproute2) or by getsockopt(TCP_INFO).
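
A minimal sketch of the getsockopt(TCP_INFO) route, written in Python
for brevity. The byte offset of tcpi_snd_cwnd is an assumption derived
from the layout of struct tcp_info in <linux/tcp.h> (eight one-byte
fields followed by 32-bit counters); check it against your kernel
headers. The function names are hypothetical.

```python
import socket
import struct

# Assumed layout: 8 one-byte fields, then 32-bit fields in which
# tcpi_snd_cwnd is the 19th (index 18).
SND_CWND_OFFSET = 8 + 18 * 4
TCP_INFO_LEN = 104  # enough bytes to cover the fields up to tcpi_total_retrans


def parse_snd_cwnd(tcp_info: bytes) -> int:
    """Extract tcpi_snd_cwnd (counted in segments) from a raw tcp_info buffer."""
    return struct.unpack_from("=I", tcp_info, SND_CWND_OFFSET)[0]


def read_cwnd(sock: socket.socket) -> int:
    """Return the congestion window of a connected TCP socket (Linux only)."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, TCP_INFO_LEN)
    return parse_snd_cwnd(raw)
```

read_cwnd(sock) is meant to be called on a connected TCP socket; the
value is in segments, not bytes.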

Regards
Patrick