[e2e] Open the floodgate - back to 1st principles

Jon Crowcroft Jon.Crowcroft at cl.cam.ac.uk
Sun Apr 25 06:11:32 PDT 2004


so going back to the original posting about Yet Another Faster TCP
(and I recommend people go look at
http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/bicfaq.htm
for the technical background),
it would be nice to think through some of the economics here

the motive in almost all the Go Faster Stripe TCPs
is often cited as the time it takes TCP AIMD
to get up to near line rate on today's ultra-fast networks
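
to put a rough number on that complaint, here's a back-of-the-envelope sketch - the parameters (10Gbps, 100ms RTT,
1500 byte packets) are purely illustrative, not anyone's measured figures:

  # how long does stock TCP take to reach line rate on a long fat pipe?
  # illustrative numbers only: 10 Gbps, 100 ms RTT, 1500-byte packets
  import math

  capacity_bps = 10e9
  rtt_s        = 0.1
  pkt_bits     = 1500 * 8

  W = capacity_bps * rtt_s / pkt_bits       # pipe-filling window, in packets

  slowstart_rtts = math.ceil(math.log2(W))  # loss-free slow start doubles each RTT
  aimd_rtts      = W                        # worst case: linear climb at 1 packet/RTT

  print(f"window to fill the pipe: {W:,.0f} packets")
  print(f"loss-free slow start   : ~{slowstart_rtts} RTTs ~ {slowstart_rtts * rtt_s:.1f} s")
  print(f"pure additive increase : ~{aimd_rtts:,.0f} RTTs ~ {aimd_rtts * rtt_s / 3600:.1f} hours")

so the pain is entirely in the additive-increase phase - slow start on its own is over in a couple of seconds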

but why are we considering this so important? how many times does a TCP session
actually witness a link with the characteristics cited (and tested against) in the real world, e.g.
100ms, 1-10Gbps, without ANY OTHER sessions present? in the test cases, we often see people spending some time at
places like CERN, SLAC, and so forth, waiting for a new optical link to be commissioned, before they get to be
able to run their experiment - how long does one have to wait before the link routinely has 100s or 1000s of flows
on it? at which point, why are we trying to get 100% of the capacity in less than the normal time?

another side to this motivational chasm seems to me to be: if we have a really, really large file to transfer, does
it matter if we have to wait 100s of RTTs before we get to near line rate? frankly, if it's a matter of a GRID FTP
to move a bunch of astro, or HEP, or genome data, then there's going to be hours if not days of CPU time at the far
end anyhow, so a few 10s of seconds to get up to line rate is really neither here nor there (and there are of
course more than one HEP physicist going to be waiting for LHC data, and more than one geneticist looking at genome
data, so again, what is the SHARE of the link we are targeting to get to?)

so of course then there's the argument that even with fiber optic loss rates, TCP on its own, on
a truly super duper fast link with sufficient RTT, will never even get to line rate, because the
time to get from half rate to full rate (i.e. 1 packet per RTT, so W/2 RTTs, where W is the bandwidth-delay
product in packets, capacity * RTT / packet size) is long enough to always see a random loss which fools the
TCP - this last point is fixed simply by running SACK properly, and admitting there might be merely TWO TCPs,
and that random losses, although bursty at the bit level, are hardly likely to correlate at the packet level,
especially not across the large MTUs we might use on these links.
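
for concreteness, the same illustrative parameters as above, plus an assumed residual bit error rate of 1e-12 on
the optical path (again, just an assumption for the sake of the arithmetic), give roughly this:

  # how many random (non-congestion) losses hit a single W/2-RTT recovery?
  # illustrative: 10 Gbps, 100 ms RTT, 1500-byte packets, BER = 1e-12 (assumed)

  capacity_bps = 10e9
  rtt_s        = 0.1
  pkt_bits     = 1500 * 8
  ber          = 1e-12

  W = capacity_bps * rtt_s / pkt_bits           # pipe-filling window, in packets

  # packets sent while the window climbs linearly from W/2 back to W:
  # average window ~ 3W/4, sustained over W/2 RTTs
  pkts_in_recovery = (3 * W / 4) * (W / 2)

  p_loss_per_pkt  = pkt_bits * ber              # chance a packet is hit by a bit error
  expected_losses = pkts_in_recovery * p_loss_per_pkt

  print(f"packets sent in one recovery: {pkts_in_recovery:,.0f}")
  print(f"expected random losses      : {expected_losses:,.1f}")

so yes, at those rates a single recovery will essentially always see a handful of random losses - the point above
is just that SACK, plus the lack of correlation at the packet level, makes them survivable rather than fatal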

[note also that none of the Big Science TCP users with these types of data rates pretend to have humans at the end
points - while there are people connecting CAVEs and other VR systems to physics simulation systems, the data rates
involved are things we managed to do rather a long time back... usually - often one can move a lot of the computer
model to the right end of the net to achieve the right mix of throughput/latency even there, so I am doubtful people
need more than the 250Mbps or so that HDTV types ask for]

So, back to economics - in general, we have a network which is speeding up everywhere - in the core it is
speeding up for 2 reasons:
1/ number of users - the primary reason, I believe
2/ access link speed up - secondary (but I could be wrong)

access links speed up in 2 general ways:
i) we replace site 10baseT with 100baseT with GigE with 10GigE etc - this is really corporate or server-side stuff
ii) we (on a long, logistics-driven time scale) replace individual users' or SMEs' lines with something a bit better
(modem -> DSL, dialup -> cable modem, etc)

I guess someone will have the historical data on this, but taking the UK - we were doubling the number of dialup
users each year, yet it took 10 years to go from 0 to 2M DSL lines - so the contribution from raw per-user (browser)
demand cannot be nearly as significant as the sheer weight of numbers.

Hence, going back to the TCP argument above, we might expect the number of TCP sessions on large fat pipes to
always be high - 

so while there is an increase in the rate TCP sessions would like to run at, I believe it is much slower overall
than we are anticipating - it's probably worth being smarter about random losses, but what I am arguing is that we
should shift the concentration of work from highspeed/scalable/fast/bic to the behaviour of large numbers
of medium speed flows
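
to put a number on "medium speed" (same illustrative 10Gbps / 100ms path as before, and a made-up figure of 1000
concurrent flows):

  # what a fair share looks like on a busy fat pipe
  # illustrative: 10 Gbps link, 100 ms RTT, 1500-byte packets, 1000 flows

  capacity_bps = 10e9
  rtt_s        = 0.1
  pkt_bits     = 1500 * 8
  n_flows      = 1000

  share_bps    = capacity_bps / n_flows
  share_window = share_bps * rtt_s / pkt_bits   # packets needed to hold that share

  print(f"per-flow share: {share_bps / 1e6:.0f} Mbps")
  print(f"window needed : {share_window:.0f} packets - entirely ordinary TCP territory")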


back to the physics example - after processing, the total cross section of data from the LHC at CERN is 300Mbps.
that is NOT hard. in the genome database case, a typical search can (sometimes) result in around 300Gbyte of
intermediate data - however, this is usually input to something that takes a few days to process (some protein
expression model or whatever) - no problem.
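
just to make "no problem" concrete (assuming, purely for illustration, that the transfer sustains the quoted sort
of rates end to end):

  # transfer time vs processing time for the genome example
  # the 300 Gbyte figure is from the text above; the rates are assumptions

  dataset_bits = 300e9 * 8                  # ~300 Gbyte of intermediate data

  for rate_bps in (300e6, 1e9):             # assumed sustained transfer rates
      hours = dataset_bits / rate_bps / 3600
      print(f"at {rate_bps / 1e6:>5.0f} Mbps: ~{hours:.1f} hours to move the data")

  print("versus a few days (~48-72+ hours) of processing at the far end")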

I'd love to see a paper where a 10Gbps link has say 1000 flows of varying duration on it...

cheers

   jon


