[e2e] New approach to diffserv...

Mon Jun 17 08:40:24 PDT 2002

| 1. Why is routing done with middleboxes? 

Convenience.  The packets have to flow through the middle-box
anyway, and in a hop-by-hop next-hop-forwarding model, it's
easier to have anything in the forwarding path tightly coupled
to the routing "brain".   

(A router could upload static routes at intervals, or whatnot,
and this is effectively what would happen in an integration between an
edge router and the sort of architecture the Cisco 12xxx series uses).

| 2. Where is the financial incentive to build networks if the basic
| network architecture FORCES you to deliver a commodity with no value-added?

How to get (almost) thrown out of a taxi:

	1. Get asked this question by a brilliant end-to-end researcher
	2. Answer:

		packet-mile charging

	   Where "mile" is shorthand for topological distance in 
	   money terms, roughly speaking.

	   Costs are closely related to geographical mileage
	   in terrestrial networks; value is generally believed to
	   be coupled to geographical footprint.   However, 
	   the abstract mile can be fitted to a combination of some
	   sort of distance and the set of equipment in the path.

	   Users (singular or in aggregate) are ALWAYS in control of 
	   how much traffic they send, and where they send it to.

	   We like MTU-sized packets; they let us focus on handling
	   fewer headers per second, which are harder to cope with
	   right now than bits per second.

	   Next, offer a simple implementational outline:

		a. routers talking a link-state protocol know the topology
		   of the local wan, and can have several metrics associated
		   with each link; there is no reason why physical distance
		   could not be discovered this way

		b. when you build your forwarding table ("FIB") you 
		   look at your BGP data and assign each prefix a
		   next-hop; the next-hops, if not directly connected,
		   are "recursively" looked up in the table mapping
		   potential BGP next-hops to the next router closer
		   to that BGP next-hop.  That table is constructed
		   via the IGP.

		   Thus, from IGP and BGP you know:

			prefix BGP-next-hop next-router-info metricA metricB ...

		   where the next-router-info is essentially an 
		   (interface, subinterface, MAC, label, ...) vector
		   and metricA is the one fed into the SPF algorithm.

		   metricB could be, for example, the dollar cost of
		   forwarding a packet from "here" to the BGP-next-hop,
		   and is really only important at the edge of your
		   network.

		c. you could work out a settlement mechanism by 
		   propagating your "metricB" from a BGP router
		   towards your peers, suppliers, or customers

		d. you don't want micropayments.  you probably don't
		   want to argue about accounting irregularities,
		   so offer a prepayment mechanism and implement
		   it as a simple token bucket

		e. your implementation then offers you a service such that

			i. you offer a tariff (you can be as dynamic as needed)
			   indicating the cost in tokens of sending one packet 
			   to destination X from interface Y

			ii. you refill the token bucket at some interval
			    (you can do this as often as needed; perhaps
			     monthly, perhaps every tenth of a second...)
			    with a fixed number of tokens

			iii. overpayments can add extra tokens as needed;
			     oversubscription can be fixed by adding fewer
			     tokens per interval while adjusting the bill down

		f. Observe that the mechanism exists ALREADY, and that
		   the sole difference is contractual; Cisco's CAR and
		   rate-shape/rate-limit, together with routing protocols,
		   would trivially support this implementation, and
		   effectively do, except that any given packet is typically
		   tariffed at one token, independent of destination.

		   That is, if you have a 2Gbps connection to network X,
		   but only want to pay for a 500Mbps one, a bucket 
		   mechanism is used to keep the amount of traffic you
		   can move into the network at 500Mbps (+/- epsilon).

		   This already enjoys widespread deployment, and is,
		   in Internet terms, a venerable product.

		g. Market this as a cost saving for people who send
		   mostly local traffic out from their networks: they
		   can buy fewer tokens over time, and these silly NAPs
		   and exchange points can finally be done away with.

		h. Help customers who want to buy exactly enough tokens.
		   There are various ways of doing this.
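
For what it's worth, steps (a) and (b) fit in a few lines of Python.
This is a toy under loud assumptions, not how any real router builds
its FIB: the topology, the prefixes, the per-link prices (in
micro-dollars) and the names spf and build_fib are all invented for
illustration.  metricA drives the shortest-path computation; metricB
is simply summed along whichever path metricA picks.

```python
import heapq

# Hypothetical IGP link-state database: (router, neighbour) -> (metricA, metricB).
# metricA is the SPF cost; metricB is an invented per-packet cost in micro-dollars.
LINKS = {
    ("here", "r1"): (10, 2000),
    ("here", "r2"): (5, 4000),
    ("r1", "edge-east"): (10, 3000),
    ("r2", "edge-east"): (20, 1000),
    ("r2", "edge-west"): (5, 6000),
}

# Hypothetical BGP data: prefix -> BGP next-hop (documentation prefixes).
BGP = {
    "192.0.2.0/24": "edge-east",
    "198.51.100.0/24": "edge-west",
}


def neighbours(node):
    """Yield (neighbour, metricA, metricB) for every link touching node."""
    for (a, b), (m_a, m_b) in LINKS.items():
        if a == node:
            yield b, m_a, m_b
        elif b == node:
            yield a, m_a, m_b


def spf(source):
    """Dijkstra on metricA; returns {router: (metricA, metricB, first_hop)}."""
    best = {}
    heap = [(0, 0, source, None)]
    while heap:
        cost_a, cost_b, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue
        best[node] = (cost_a, cost_b, first_hop)
        for nbr, m_a, m_b in neighbours(node):
            if nbr not in best:
                # The first hop is fixed once we leave the source.
                hop = nbr if first_hop is None else first_hop
                heapq.heappush(heap, (cost_a + m_a, cost_b + m_b, nbr, hop))
    return best


def build_fib(source):
    """Recursively resolve each prefix's BGP next-hop through the IGP table."""
    igp = spf(source)
    fib = {}
    for prefix, bgp_next_hop in BGP.items():
        metric_a, metric_b, next_router = igp[bgp_next_hop]
        fib[prefix] = (bgp_next_hop, next_router, metric_a, metric_b)
    return fib


# Rows come out as: prefix BGP-next-hop next-router metricA metricB
for prefix, row in build_fib("here").items():
    print(prefix, *row)
```

In this sketch next-router-info is reduced to a router name; the
(interface, subinterface, MAC, label, ...) vector would hang off that.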

	    Note that we do not charge for INCOMING traffic, since 
	    there is no practical way to control that, and we don't
	    want victims paying for being the target of a DDoS attack.
	    On the other hand, we do want DDoS magnifiers to fix their
	    networks or pay for infrastructure (and other) improvements
	    to support floods, so fully charging magnifiers by letting
	    their tokens run out is a nice thing.
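
Steps (d) through (f) and the note about incoming traffic can be
sketched the same way.  Again this is only an illustration: the class
PrepaidBucket, its parameters and the prices in the tariff are
hypothetical, and nothing here is Cisco's CAR or anyone's product.
Setting every tariff entry to one token gives the flat-rate behaviour
described in (f).

```python
class PrepaidBucket:
    """Token bucket in which a packet costs tariff[destination] tokens."""

    def __init__(self, refill_tokens, capacity, tariff, default_cost=1):
        self.refill_tokens = refill_tokens  # tokens added per interval
        self.capacity = capacity            # bucket depth (assumed cap)
        self.tariff = tariff                # destination -> tokens per packet
        self.default_cost = default_cost    # flat rate for unlisted places
        self.tokens = refill_tokens

    def refill(self):
        """Called once per interval, however long that interval is."""
        self.tokens = min(self.capacity, self.tokens + self.refill_tokens)

    def send(self, destination):
        """Charge an OUTGOING packet; True means it may be forwarded."""
        cost = self.tariff.get(destination, self.default_cost)
        if cost > self.tokens:
            return False                    # tokens exhausted
        self.tokens -= cost
        return True

    def receive(self, source):
        """Incoming packets are never charged."""
        return True


tariff = {"local-exchange": 1, "other-continent": 5}  # invented prices
bucket = PrepaidBucket(refill_tokens=10, capacity=10, tariff=tariff)

sent_far = 0
while bucket.send("other-continent"):
    sent_far += 1
print("far packets this interval:", sent_far)    # 2

bucket.refill()
sent_near = 0
while bucket.send("local-exchange"):
    sent_near += 1
print("near packets this interval:", sent_near)  # 10
```

The same ten tokens move two packets a long way or ten packets a
short way, which is the whole of the packet-mile idea.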

The objection from the researcher I was in the taxi with is that
one of the major strengths of the Internet is that you do not care
where the entity you are talking to is located.   I agree that this
is a strength -- I enjoy communicating with people far away from me,
but I would certainly think about paying by the packet-mile rather
than paying a flat rate (which is point (f) above), and would use
the cheaper option.

Indeed, flat-rate won't go away even if there is widespread acceptance
of this kind of charging mechanism.   The price-point will move though,
depending on the demographics.

However, this is more likely to be a virtuous circle than not -- 
congestion suddenly becomes something that a provider will be DESPERATE
to avoid, because it would mean an immediate drop in revenue as TCPs
back down.   Senders will refine techniques summarized in RFC 2001 to
minimize their token consumption.  They will also send more data per 
overhead byte, and finally take advantage of native multicast.
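
A quick worked example of the "more data per overhead byte" point,
assuming (my assumption, not a given) that the tariff is charged per
packet regardless of size, and that each packet carries a minimal
20-byte IPv4 header and a 20-byte TCP header with no options.  The
function name packets_needed is made up.

```python
HEADER_BYTES = 40  # minimal IPv4 header (20) + minimal TCP header (20)


def packets_needed(payload_bytes, mtu):
    """Packets (and so per-packet charges) needed to move the payload."""
    per_packet = mtu - HEADER_BYTES
    # Ceiling division in integer arithmetic.
    return -(-payload_bytes // per_packet)


for mtu in (576, 1500):
    count = packets_needed(1_000_000, mtu)
    print(f"MTU {mtu}: {count} packets")
# MTU 576: 1866 packets
# MTU 1500: 685 packets
```

Under that assumption a sender filling 1500-byte packets pays for
roughly a third as many packets as one stuck at 576.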

Finally, in direct answer to your question: the value is in
transporting a datagram from input interface to the correct
output interface with essentially zero loss, very low delay 
(and little delay variance, ideally), and in making places
far away maximally reachable.   

| 3. Why is the network engineered in isolation from applications?

Almost all applications are likely to be able to take advantage
of a well-engineered IP network; the only upper layer consideration
is likely to be the ratio of congestion-avoiding vs
non-congestion-avoiding traffic, and what to do about in-network
traffic replication.

If someone comes up with an application which REALLY can't
be done across the Internet, well, there are other network
technologies on offer (SDH/SONET, for example) which have
not yet gone the way of the dinosaur.  

Essentially the answer is that this approach distributes
the engineering cost of supporting applications away from
the teams which are busily trying to cope with ongoing
huge traffic growth in the "core".

| 4. Isn't e2e just a clever logical deception? It's of course obvious that
| an engineered artifact will have the maximum longevity if it avoids
| any concession to current needs, but very few of us buy wheels, chassis
| and motor, instead opting for value-added services such as seats and a 
| roof. E2e is just rhetorical "argumentum ad absurdum" wrapped up like
| some engineering mystique, no?

No, because what is being bought is a large set of distributed
computations.  In one of those distributed computations, the
goal is to render a web page on the screen in front of you.
Some of the computation is done locally; some is done within
the network (decrementing TTLs, header-sanity checksumming,
doing forwarding table lookups, constructing the tables, ...);
some is done across the network (fetching the content).

The end2end argument claims that the most effective approach
is to move a maximal amount of work in this distributed computation
to the hosts on either side.  In particular, the reliable transfer
of the data, free from errors caused by in-flight data corruption, 
duplication or loss, ordering the data, handling mismatches 
between different endiannesses or software implementations, and
so on, is -- according to the end2end argument -- best done in the
hosts themselves.   
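
To make that division of labour concrete, here is a toy sketch.  The
"network" only moves datagrams and is free to duplicate, corrupt and
reorder them; the hosts do the sequencing, the integrity check and
the byte-order handling.  Everything named here (make_segments,
lossy_network, reassemble) is invented for illustration, CRC32 merely
stands in for whatever check a real transport uses, and retransmission
is reduced to asking again for what is missing.

```python
import random
import struct
import zlib


def make_segments(data, size):
    """Sending host: add a sequence number and CRC32 to each chunk."""
    segments = []
    for seq, start in enumerate(range(0, len(data), size)):
        chunk = data[start:start + size]
        # "!" packs in network byte order, settling endianness at the hosts.
        header = struct.pack("!II", seq, zlib.crc32(chunk))
        segments.append(header + chunk)
    return segments


def lossy_network(segments, rng):
    """The middle: forwards datagrams and promises nothing else."""
    in_flight = list(segments)
    in_flight.append(in_flight[0])      # duplicate one segment
    damaged = bytearray(in_flight[1])
    damaged[-1] ^= 0xFF                 # corrupt one payload byte
    in_flight[1] = bytes(damaged)
    rng.shuffle(in_flight)              # deliver out of order
    return in_flight


def reassemble(received, expected_count):
    """Receiving host: verify, de-duplicate, reorder, list what is missing."""
    good = {}
    for segment in received:
        seq, crc = struct.unpack("!II", segment[:8])
        chunk = segment[8:]
        if zlib.crc32(chunk) == crc:    # silently drop corrupted segments
            good[seq] = chunk           # the dict absorbs duplicates
    missing = [s for s in range(expected_count) if s not in good]
    data = b"".join(good[s] for s in sorted(good))
    return data, missing


message = b"the hosts do the hard work, the network just forwards"
segments = make_segments(message, 8)
arrived = lossy_network(segments, random.Random(1))

partial, first_missing = reassemble(arrived, len(segments))
print("missing after first pass:", first_missing)   # [1]

# The receiving host asks again for what it lacks; the sender resends.
arrived += [segments[s] for s in first_missing]
data, missing = reassemble(arrived, len(segments))
print(data == message, missing)                     # True []
```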

You can certainly buy a distributed computation which does a lot
more of this in the middle of the network.  People still use X.25
and datakit...

| Coherent, rational discussion only please - all rants will be ignored.

Oh damn.  That's tricky.

	Sean.