[e2e] TCP ex Machina

John Wroclawski jtw at isi.edu
Sun Jul 21 17:09:53 PDT 2013

Well, it seems to me that the key question is what's being traded off. Think of it this way:

Imagine hypothetically for the moment that TCP is "optimal", or at least effective, over some very wide and somewhat unspecified range of dynamic operating conditions, in the sense that _if you require that full range_ then TCP does about the best you can do.

Now, here we have some work that basically says "if I narrow the range of conditions, I can do better within that range". (Figure 11 in the paper ..)

So the first-level interesting result is essentially pragmatic: that use of the ML algorithm appears to let them, plus or minus, choose any particular manifold within the overall "desired range of conditions" space, and then find a CC algorithm that works "optimally", or at least "really well", within that manifold. That in and of itself is a neat result.

But it seems to me that a fairly fundamental question is "is the performance improvement intrinsically coupled to narrowing the range, or did it come from the ML approach actually finding a "better" algorithm overall?"

That is

1) hypothetically, you can imagine an "operating-range/performance product", which says that if I accept a narrower range of possible operating conditions I can do better within that range, and this is basically a conservation law;

2) and if that's true, then hypothetically

    - some portion of the performance gain shown by Remy is because they narrowed the range, and thus rode the R/P product, while

    - some other portion of the performance gain is because the ML is producing "better" algorithms overall, that is, more optimal than TCP given the same range objective for operating conditions.

However, it doesn't seem clear at the moment that we have any idea what the relative portions are, or even whether the overall hypothesis (the nature & existence of a relatively tractable R/P product in this space) is right.
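To make the R/P intuition concrete, here's a toy numeric sketch (entirely my own construction, nothing from the paper; the utility function and all the numbers are made up): pick one fixed sending rate offline to maximize average utility over an *assumed* range of link capacities, and compare a narrow-range design against a wide-range one.

```python
# Toy sketch (hypothetical, not from the Remy paper) of the hypothesized
# range/performance trade-off: tune a single fixed sending rate offline
# for an assumed range of link capacities, then compare designs.
import random

random.seed(0)

def utility(rate, capacity):
    # Crude proxy: achieved throughput, minus a heavy penalty when the
    # rate exceeds capacity (standing queues / loss).
    return min(rate, capacity) - 5.0 * max(0.0, rate - capacity)

def mean_utility(rate, capacities):
    return sum(utility(rate, c) for c in capacities) / len(capacities)

def tune(capacities, candidates):
    # Stand-in for the offline optimization: choose the rate that does
    # best on average over the *assumed* operating range.
    return max(candidates, key=lambda r: mean_utility(r, capacities))

candidates = [i / 10 for i in range(1, 101)]               # rates 0.1 .. 10.0
narrow = [random.uniform(4.5, 5.5) for _ in range(2000)]   # tightly bounded range
wide   = [random.uniform(0.5, 10.0) for _ in range(2000)]  # broad, TCP-like range

r_narrow = tune(narrow, candidates)
r_wide   = tune(wide, candidates)

# The narrow-range design wins comfortably inside its assumed range...
print(mean_utility(r_narrow, narrow), mean_utility(r_wide, narrow))
# ...but is fragile once conditions drift outside that range.
print(mean_utility(r_narrow, wide), mean_utility(r_wide, wide))
```

Even this trivial setup exhibits both halves of the hypothesis: the narrow-range design beats the wide-range one in-range (riding the R/P product), and does markedly worse once conditions fall outside the range it was tuned for - which is exactly the fragility worry.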

But in some sense this is the crucial question for a couple of reasons - both a) because it gets to the question of whether Remy is actually creating "better" algorithms, or just ones with narrower scope, and b), crucially IMO, because it gets to the question of whether the algorithms it creates are "fragile" and prone to rare-event catastrophic failure and "works today but not tomorrow" syndrome.

(To be clear, the authors have made this observation too, as expressed in the discussion at the end of pg. 11 of the paper.)

Another way to say this is that if you think of Remy as a pragmatic tool that offers a way to design algorithms that operate "closer to the limits" than current TCP - if only because it lets you pick your limits and then tunes for what you picked - then it becomes even more motivating than today to back it up with a fundamental understanding of complex-system CC dynamics and fragility. This for the same basic reason that once people started to try and "optimize" bridge design using finite element analysis, rather than just intuitive and empirical overkill, it became really important to be good at it. (And there was that troubling intermediate period, when bridges kept falling down..)

In fact one might argue that TCP is as robust as it is today precisely _because_ no-one actually tried a priori to define some specific "operating condition boundaries" and push right out to those limits, and to the extent that that's what Remy is doing it raises some pretty interesting big-scale questions about how you keep complex systems like the Internet robust in the face of ever-increasing optimization. If, on the other hand, a Remy-like approach could produce higher-performance algorithms over the _same_ range of operating conditions as current TCP, but with that range more explicitly specified, that'd be even more interesting..


On Jul 21, 2013, at 2:14 PM, Jon Crowcroft <jon.crowcroft at cl.cam.ac.uk> wrote:

> it is a tiny bit cleverer than that - the work is the moral equivalent of
> the Axelrod experiment in emergent cooperation, but neater because it is
> quantitative rather than just qualitative selection of strategies - what is
> important (imho) is that they use many many simulation runs to evaluate a
> "fitness" of a given protocol...this is heavy lifting, but pays off - so it
> will be nice to see empirical follow up work but this isn't some naive
> "overfitting" undergrad work - it is rather different and requires a
> considered response
> On Sun, Jul 21, 2013 at 9:28 PM, Detlef Bosau <detlef.bosau at web.de> wrote:
>> To my understanding, you write down the whole communication (who wants to
>> send what to whom) and afterwards you calculate an optimized schedule.
>> Reminds me of undergraduate homework in operating systems, gantt diagrams
>> and that funny stuff.
>> You cannot predict your link's properties (e.g. in the case of wireless
>> links), you cannot predict your user's behaviour, so you conjecture a lot
>> from presumptions which hardly ever will hold.
>> Frankly speaking: This route leads to nowhere.
>> --
>> ------------------------------------------------------------------
>> Detlef Bosau
>> Galileistraße 30
>> 70565 Stuttgart                            Tel.:   +49 711 5208031
>>                                           mobile: +49 172 6819937
>>                                           skype:     detlef.bosau
>>                                           ICQ:          566129673
>> detlef.bosau at web.de                     http://www.detlef-bosau.de
