[e2e] TCP "experiments"

Matt Mathis mattmathis at google.com
Mon Jul 29 22:26:59 PDT 2013


Good point about potential unintended consequences of shipping new features
with the instrumentation stripped....  Perhaps we should rethink that one
little detail.

Global-scale experiments are always geometric spirals, iterating between
developing and testing across an exponentially growing pool of users.  As
the pool grows you keep looking for unexpected behaviors and move on only
when you have convinced yourselves that they have been addressed and that
all old features have been properly regression tested (e.g. with
packetdrill).  The really critical question is: at what point do you (the
researcher/developer) lose control of your own legacy, so that we all
become forced to maintain compatibility with code that has escaped?
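
To make the "exponentially growing pool" concrete, here is a
back-of-the-envelope sketch (the stage sizes are made up for illustration,
not a real rollout schedule).  By the "rule of three", observing zero
failures across n users only bounds the true failure rate below roughly
3/n with 95% confidence:

    # "Rule of three": after n failure-free trials, the ~95% upper
    # confidence bound on the true failure rate is roughly 3/n.
    # Stage sizes below are made up for illustration.

    def failure_rate_bound(n_users: int) -> float:
        """95% upper bound on failure probability after n failure-free users."""
        return 3.0 / n_users

    for users in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
        print(f"{users:>10,} users -> failure rate below "
              f"{failure_rate_bound(users):.5%} (95% confidence)")

Ruling out even a 0.01% failure rate takes on the order of 30,000
failure-free users, and anything rarer takes proportionally more - which
is why the pool has to keep growing before you can convince yourselves.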

If you look at all of the stuff that we have done (both in transport and
above), I think you will find that we have a pretty good handle on making
sure that obsolete code actually gets deprecated and does not persist.

And that is what justifies calling it an experiment.

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

Privacy matters!  We know from recent events that people are using our
services to speak in defiance of unjust governments.   We treat privacy and
security as matters of life and death, because for some users, they are.


On Mon, Jul 29, 2013 at 10:25 AM, Joe Touch <touch at isi.edu> wrote:

>
>
> On 7/27/2013 7:20 PM, Matt Mathis wrote:
>
>> The real issue is the diversity of implementations in the Internet that
>> allege to be standard IP and TCP NAT, but contain undocumented
>> "features".  No level of simulation has any hope of predicting how well
>> a new protocol feature or congestion control algorithm will actually
>> function in the real Internet - you have to measure it.
>>
>> Furthermore: given that Google gets most of its revenue from clicks, how
>> much might it cost us to "deploy" a protocol feature that caused a 0.01%
>> failure rate?  If you were Google management, how large a sample size
>> would you want to have before you might be willing to actually deploy
>> something globally?
>>
>
> I don't understand the logic above, summarized IMO as:
>
>         - big deployment is required to find issues
>         - issues are found by measurement
>         - Google doesn't want to deploy protocols that cause failures
>
> Take that with the current actions:
>
>         - Google management is comfortable deploying protocols
>         that are not instrumented
>
> Do you see why the rest of us are concerned?
>
> Besides, there are different kinds of failures. I doubt Google wants a
> 0.01% failure of its current base, but if the feature increases its base by
> 0.02%, do you really think it wouldn't go ahead and deploy?
>
> And worse, what if the 0.01% failure wasn't to Google connections, but to
> a competitor? Or just to the rest of the Internet?
>
> My view has been that we need to protect the Internet for *when* there is
> no more Google. I don't trust Google (or any company) to do that.
>
> Joe
>
>
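
Coming back to the 0.01% number quoted above: a rough sketch of the sample
size needed just to detect a change that small in an A/B-style experiment.
The 1% baseline failure rate, 95% confidence, and 80% power are assumptions
for illustration, not measured figures:

    # Approximate sample size to detect a 0.01 percentage-point increase in
    # a failure rate, using a two-sided two-proportion z-test.  Baseline
    # rate, confidence, and power are illustrative assumptions.
    from statistics import NormalDist

    def users_per_arm(p_base: float, delta: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
        """Approximate users needed in each arm of an A/B experiment."""
        nd = NormalDist()
        z_alpha = nd.inv_cdf(1 - alpha / 2)   # two-sided confidence level
        z_beta = nd.inv_cdf(power)            # statistical power
        p_new = p_base + delta
        variance = p_base * (1 - p_base) + p_new * (1 - p_new)
        return int((z_alpha + z_beta) ** 2 * variance / delta ** 2)

    # Detecting a 0.01 point increase over an assumed 1% baseline:
    print(users_per_arm(0.01, 0.0001))   # on the order of 15 million per arm

Even noticing a 0.01% regression takes tens of millions of users per arm,
which is the scale argument for measuring in the real Internet rather than
relying on simulation.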

