[e2e] end to end arguments in systems design

Thu Dec 6 19:14:37 PST 2007

At Thursday 06/12/2007 12:38 -0800, Lars Eggert wrote:
>Lloyd,
>
>On 2007-12-6, at 7:17, ext end2end-interest-bounces at postel.org wrote:
>>I've recently noticed that RFCs can get published without any
>>reference to how end-to-end reliability is ensured, even when
>>it's extremely relevant to the protocol being described and the
>>design decisions made for that protocol. This is not good -
>>particularly when detailing a new transport protocol or
>>entire architecture. Error detection and reliability can't just
>>be ignored.
>
>it sounds like you have a specific protocol/RFC in mind - can I ask  
>which one?

Lars,

Since you asked:

Two examples that spring to mind are in the output of the IRTF Delay Tolerant Networking (DTN) research group - the DTN bundle protocol (now published as RFC 5050) and the Licklider Transmission Protocol (pending).

Yes, these protocols are being published as IRTF experimental - which deliberately means 'anything goes' with good reason; no IETF review, to prevent limiting new ideas and approaches, nice big warning boilerplate at the top saying as much, caveat lector. (Though that boilerplate doesn't mention reliability as one example review criterion, it should imo, to encourage readers to bear reliability in mind. We take reliability of computers and transmission far too much for granted these days, and we shouldn't. All those IETFers beavering away on MacBooks which don't even have ECC RAM, not thinking about cosmic rays...)

Still, one would expect any IRTF group to recognise reliability and error detection and handling as important (nay, fundamental! as is layering!) very early on in its initial design discussions...

Once the lack of error detection in these DTN protocols was agreed to be an oversight, late in the design process and after much lobbying and discussion, the DTNRG chairs made the call not to disrupt the then-ongoing publication process by altering the protocol designs. In retrospect they saw the absence of error detection, or of any way to ensure reliability, as a missing piece that needed to be added, but not as a fundamental or showstopping oversight. [*]

(The DTNRG has been imo overly focused on security above all else, though it lacks a threat analysis of any kind to work from, and attempts have since been made to add in reliability checks by reusing complex security mechanisms - which doesn't quite work for the bundle protocol. The interested can see all the caveats we laid out for these approaches in various drafts, including draft-irtf-dtnrg-bundle-checksum-00.txt, particularly in the security considerations at the end. Reusing security protocols in this way also happily allows for more time to be spent on security protocol design, which is the raison d'etre of DTNRG. Still, at least DTN's reliability problem is now under scrutiny, though rather late, and any kludged fix will be far less than ideal within the constraints of the published protocols.)

This is just IRTF experimental stuff, likely just an interesting thought exercise worked out on paper, and nobody in their right mind would ever choose to deploy first-cut IRTF experimental protocols in real operational scenarios, right? But, should they do so, discovering errors and discussing what to do about them within the limits of the existing protocol designs and implementations could be grist for paper mills for years to come... who knows, the end-to-end principle and the reasons for it may even be rediscovered.

Now, here's the chilling thought that Jon's email prompted: when the IRTF _itself_ clearly doesn't regard the implications of the end-to-end principle, and of getting stuff from A to B without detecting introduced errors, as fundamentally important to bear in mind during initial designs in its research groups, and much effort has to be expended just to get reliability considered as an issue and accepted as worth looking at, then "who cares?" wins, and we may as well close down the IRTF e2e mailing list as an obviously long-lost cause.

Mandating an 'Implications for end-to-end reliability' section in drafts to encourage writers to think about reliability is a last line of defence. (Perhaps we should also have an 'Implications for layering' section, which might act as a useful bozo filter.) In many ways, we've already built the flaky network infrastructure that we so richly deserve, and focusing on security alone as a panacea can't fix that.

If you wanted to see arguments about implications of basic end-to-end reliability on design in 2007,
http://maillists.intel-research.net/pipermail/dtn-interest/
was where it was at.

Anyhow, season's greetings and a happy new year to all and sundry.

L.

[*] To put it in perspective, it's rather like leaving checksums out of early UDP and TCP RFCs, and saying you'll fix it later. What could possibly go wrong?

And since Jon prompted this and I've just been rereading his Dec '94 posting decrying ATM, which echoes down the years, it's tempting to simply say "J'Accuse, DTN."

<http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood at surrey.ac.uk>