[e2e] Port numbers, SRV records or...?

Wed Aug 9 11:34:13 PDT 2006

Keith Moore wrote:
>>> NFS doesn't work this way (so it's kind of a strained example), but
>>>  for a more reasonable file sharing protocol it would make sense to
>>>  have different sets of file systems available for mounting and
>>> access from different ports.  So for example, a default set of file
>>>  systems to be exported to clients might be on default port A, with
>>>  alternate sets of file systems exported on ports B and C.
>>
>> That's a really good example of what NOT to do with ports. Picking a
>>  subset of mount points isn't a service; it's a decision within a
>> service. NFS (correctly) already provides that.
> 
> It's been a while since I read the specifications but I do not recall
> any facility within NFS that defines sets of exported file systems
> provided by a host - in all of the NFS systems of which I'm aware file
> systems have uniquely named mount points and there is no way for a
> server to export multiple sets of file systems with overlapping mount
> point names between the sets. 

Right. NFS provides mount points. You want sets of mount points. Call
the NFS protocol designers and add that. That's not a transport demux or
protocol choice issue; that's internal to NFS. If it's not there, then
add it there.

> And I disagree with your characterization.  The nice thing about
> grouping several mount points together under one port is that it's very
> easy to change the set of mount points by changing the port.

That's called a "hack". It works, but let's not design our protocols
around them.

...
> Perhaps it would be better if every protocol were explicitly designed to
> support multiple named service instances on a single port, but that
> would be difficult given that different service instances sometimes
> require different processes and different server codes.

Difficult? IMO, it's appropriate and necessary.

> But the wider point is that ports don't really identify protocols -
> fundamentally they're just demultiplexing tokens. 

Not today. Today they're both; we can decouple the two in many ways
which I tried to explain in the ID.

> They can be used for
> multiple purposes, and this is a good thing.  The convention for mapping
> between ports and protocols/services is just that - a convention.

Conventions that are required on both ends of a connection are called
protocols ;-)

...
> Agreed that it's a DNS name and agreed that the prefix "www" doesn't
> imply anything semantically (though it's useful for illustrative
> purposes in an example).
> 
> But there is still a subtle distinction between DNS name and service
> name (as I was using the term here), as there are DNS labels and RRsets
> that don't correspond to any AAAA or A record.

Sure, agreed.

>> It only means a web server when a) you connect to it on a well-known
>> port (HTTP the service) b) you issue an HTTP request on that port
> 
> It's not necessarily a web server even then, because HTTP doesn't really
> imply the web.

Right - to most people, HTML does. HTTP is one way to transport HTML,
but we could use FTP for many things (except forms, e.g.).

> Nor does port 80.  There are uses of HTTP (as in IP over
> HTTP) that don't resemble anything most people would recognize as the
> web. IMHO, something becomes part of the web when it is linked to from
> the web.  (What's the root of the web?  Mu. :)

The web doesn't need a root, any more than the Internet has one (root IP
address?). The web is any place that reaches the rest of the web - it's
an application layer form of the Internet in that sense.

>> You can layer DNS on top of HTTP if you want, but as far as the port
>> is concerned, you're doing HTTP, period. If you want to do DNS on port
>> 80, you issue DNS requests (native, i.e.) on that port.
> 
> Right, but then you'd be running a different service than providing web
> pages even if you were running HTTP on port 80. 

If you want to define 'service' as HTML, then fine. But note that port
80 means HTTP - it does NOT mean HTML. You're still running HTTP when
you run DNS over HTTP. Never mind that the output wouldn't be what most
people want to see on their screens.

I think I now understand what you want - you want "application" in the
ISO sense, whereas HTTP is "application" in the Internet stack sense
(the thing that lives over TCP).

If you want an 'application location service', fine. That's not what TCP
demuxes on, nor is it what ports indicate. Ports indicate protocols on
top of TCP.

>>> That's distinct from HTTP, which is a protocol, in two ways.  One is
>>> that HTTP can be (and is) used for purposes other than providing
>>>  web pages for use by web browsers.
>>
>> That's completely hidden from the use of the well-known HTTP port to
>>  indicate the HTTP protocol and web service, as per above, so that's
>> not relevant at this level.
> 
> But port 80 does not (reliably) either indicate the HTTP protocol or the
> web service.

In 'well known ports' it does. I've already discussed that the real
meaning is an agreement that's private to the endpoints - either agreed
a-priori (well known) or indicated explicitly.

...
> Port number (host demultiplexing token), protocol, and service provided
> are largely orthogonal.  It's just that in practice, and by convention,
> we tend to associate them.

Where 'service' is ISO application, yes, they're different. It's not
practice or convention - it's part of the way well-known ports in TCP
are defined, and that's part of the protocol (pick what layer you want
to define that at - port within TCP, or service over a port -- both are
fixed by well-knowns).

>>> While the same HTTP instance can be used to provide both web pages
>>> and other kinds of data, to be used by web browsers and/or other
>>> kinds of clients, in practice it often makes sense to have different
>>> instances of HTTP do the different jobs.
>>
>> That argues for a way to demux things inside HTTP based on "I'm doing
>>  DNS over HTTP" indicators.
> 
> No it doesn't, because that would imply (for instance) that all apps
> need to define their own demuxing protocols to support different service
> instances.

YES!

> It would also imply that all application codes need to
> interface through a general purpose application-specific demuxer (rather
> than just listening to a specific port) so that multiple instances of
> the same application (using different codes) could each share a single
> port.

YES!

I see nothing wrong with either; in fact, I see both as being necessary
and appropriate, even if not currently used.

>> However, (incorrectly), some use a-priori knowledge of which HTTP
>> server is running which layered service to argue that they need
>> multiple HTTP servers. They DO NOT.
> 
> As a practical matter, that is simply incorrect.  Try getting two
> different HTTP protocol engines (either from different vendors and/or
> built to serve different purposes) to run on the same port on the same
> host.

That's an implementation problem, partly due to the lack of demuxing
info in HTTP (although some servers do support this 'internally',
calling it virtual-hosting), and partly due to inter-vendor competition.
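
As a rough illustration of demuxing inside HTTP rather than by port, the sketch below routes requests arriving on one listener to different back ends by the Host header, which is the information virtual hosting keys on. The host names and the two back-end functions are hypothetical, and the parser is deliberately minimal (it does not strip a ":port" suffix from the header value).

```python
def parse_host(request: bytes) -> str:
    """Return the lowercased Host header value from a raw HTTP/1.1 request."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:          # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "host":
            return value.strip().lower()
    return ""                                    # no Host header present


# Hypothetical back ends that share a single listening port.
def web_pages(request: bytes) -> str:
    return "web"


def tunnel(request: bytes) -> str:
    return "tunnel"


# Assumed names, for illustration only.
BACKENDS = {"www.example.com": web_pages, "tunnel.example.com": tunnel}


def demux(request: bytes) -> str:
    """Pick a back end from in-protocol information, not from the port."""
    backend = BACKENDS.get(parse_host(request), web_pages)
    return backend(request)
```

Here the choice of engine happens after the request is parsed, so two differently built back ends could sit behind one port if something in front of them does this triage.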

>>> The other difference between service name and protocol is that you
>>>  don't necessarily want to tie them together too closely because
>>> someday you might like to have a different (hopefully better)
>>> protocol provide the same service.
>>
>> That's what version numbers inside protocols are for - demuxing
>> versions of a protocol.
> 
> Disagree.  An in-band protocol version number isn't very useful unless
> the two protocols are similar enough that you can feasibly negotiate
> version from the same protocol engines. 

If you put the version number in the front of the stream/packet (like
you're supposed to), that's sufficiently similar, and you can redefine
the rest.

> For minor protocol changes this
> is fine, but for major protocol changes (or to migrate to a very
> different kind of protocol) you need a way of distinguishing between one
> protocol/version and another before you've chosen a particular protocol
> engine. 

You need to parse a packet to do this. Parse NFS and figure out what
version it is *then*.

> Ports are a good way of doing this on an occasional basis.

Ports are a hack for doing this at TCP because NFS didn't do it inside NFS.

>> It would be useful if the IETF and IEEE would a) require them, and b)
>>  use them
> 
> In my experience, protocol version numbers are very rarely useful.
> Either the changes you want to make to a protocol are incremental (in
> which case other kinds of feature negotiation, such as capability lists,
> or tagged options, tend to work better) or the changes you want to make
> to a protocol are so significant that you really want to feed the new
> protocol to a completely different protocol engine.

Nobody said you need a monolithic implementation; if you want, make a
parser (at the NFS level, e.g.,) that hands off streams to NFSv3 or
NFSv4 as needed.
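
A minimal sketch of such a hand-off, assuming a made-up framing in which every message starts with a 4-byte big-endian version number. This is not the real NFS or ONC RPC wire format, and the engine functions are placeholders.

```python
import struct


# Placeholder version-specific engines; real ones could be separate
# code bases or even separate processes.
def handle_v3(payload: bytes) -> str:
    return f"v3 engine got {len(payload)} bytes"


def handle_v4(payload: bytes) -> str:
    return f"v4 engine got {len(payload)} bytes"


ENGINES = {3: handle_v3, 4: handle_v4}


def triage(message: bytes) -> str:
    """Read the leading version field, then hand the rest to that engine."""
    if len(message) < 4:
        raise ValueError("message too short to carry a version field")
    (version,) = struct.unpack("!I", message[:4])
    engine = ENGINES.get(version)
    if engine is None:
        raise ValueError(f"unsupported version {version}")
    return engine(message[4:])
```

Only the first four bytes are shared between versions in this sketch; everything after them can be redefined freely by each engine.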

>> (they do not - e.g., using a different 802 type for IPv6 was an
>> error, motivated only by short-term desire to make cheaper ethernet
>> switches).
> 
> Seems fairly harmless, but maybe I'm not aware of the downsides.
> 
> IPv4 and IPv6 are of course distinguishable, but it's unclear how the
> numerous existing implementations of IPv4 would have treated a version
> field of 6 (most would probably discard it, some would probably ignore
> it, a few would probably barf because that code was buggy and the path
> had never been tested).

Unclear? What are we here for? How they're supposed to react is quietly,
by dumping the segments. Period. See RFC 1122 Sec 3.2.1.1 and RFC 1812
Sec 5.2.2.
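
As a sketch of that behavior, the check below reads the version field (the high four bits of the first header byte) and drops anything that is not version 4 without raising or replying. The function name is hypothetical, and a real receive path does much more validation than this.

```python
from typing import Optional


def accept_ipv4(datagram: bytes) -> Optional[bytes]:
    """Return the datagram if its version field is 4, else None (silent drop)."""
    if not datagram:
        return None
    version = datagram[0] >> 4   # high nibble of the first header byte
    if version != 4:
        return None              # discard quietly: no error goes back to the sender
    return datagram
```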

...
> But from another angle: should the name of a protocol be embedded into,
> and tied to, a reference to a resource?  That would imply that any
> resource named by an http: URL is inherently tied to HTTP and can only
> be accessed via the HTTP protocol.  In a future day where there is a
> significantly better alternative, this would be a pessimal choice.  As
> it is we feel very constrained to try to improve HTTP only in ways that
> can easily be done over TCP on port 80 by the same protocol engines that
> parse HTTP 1.0 and 1.1, which is IMHO probably unfortunate.

So you want to push versioning, instances, and everything else -
including data subsets, etc. - to the port level.

Let's move hosts there too. We can all be IP address 10.0.0.1, and
differentiate who's who, etc. - everything - on port numbers.

--

Either that, or layering is useful (IMO, yes), and we need to decouple
things that are not REQUIRED to be coupled.

>>>> Ports really indicate which instance of a protocol at a host, IMO
>>>>  - but supporting that in TCP requires redefining the 'socket pair'
>>>> to be a pair of triples: "host, protocol, port" (rather than the
>>>> current "host, port").
>>> Why do we need to expose the protocol in TCP?  Why isn't the port
>>> selector sufficient?
>>
>> I discuss this in the ID I noted before (draft-touch-tcp-portnames).
>> The port selector is used to demux connections and attach them to
>> processes on a host; the protocol (portname in my version, well-known
>>  port in the current version) indicates the protocol.
> 
> "The protocol...indicates the protocol" 

Sorry - the protocol (TCP) indicates (via portname) the protocol (at the
next layer).

> doesn't answer the question of
> why it is needed to expose the protocol to the network, and the
> reasoning in the document seems muddy.

You don't. You need to expose it to the endpoint.

> I recognize that there is a danger of port space exhaustion, but there
> are lots of ways to solve this problem that don't require exposing the
> name of the protocol to the network.

Example, please. (I discuss alternatives in the ID; is there some aspect
that is not covered, or an example missed?)

> I also recognize that there is a demand (perhaps a naive one) by network
> operators to be able to identify and filter traffic based on various
> criteria including probably both protocol (as when specific protocol
> engines are known to be broken) and service (as when the network
> operator wants to prohibit certain kinds of traffic on its network). But
>  it's not immediately clear that explicit labels are actually useful, or
> what it would take to make them useful.  (some of the arguments against
>  definition of the "evil bit" might apply here also).  Even if explicit
> labels were useful for filtering, should they be based on protocol (e.g.
> "http") or service ("http for web pages" vs. "http for IP tunneling")?

Again, the label is truly meaningful only to the other end. It's not
there for filtering. It WILL be used for that, just like well known
ports are now. And just as (in)effectively - anyone with a brain will
circumvent that filtering by changing the meaning of the strings on the
subset of hosts they configure out-of-band.

>>> The destination host knows which protocol is being used, the source
>>>  host presumably also knows (it has some reason for choosing that
>>> port, whether it's because it's a well-known port or a SRV record or
>>> configuration data or whatever), and I wonder whether anyone else
>>> needs to know.
>>
>> That's only for well-known ports;
> 
> no, it's true regardless of how the initiator chose which destination
> port to use.

So I use the string "APPLESAUCE" - what protocol does that mean, please?

I can either scramble meaning (use HTTP for DNS) or just use nonsense
strings (as above). Only well-known ports assume global a-priori
agreement of meaning.

>>> Exposing the protocol in TCP, making it explicit rather than just a
>>>  convention, further encourages intermediaries to try to interpret
>>> it....
>>
>> Not more or less than well-known ports. The string "NFS" could mean
>> HTTP to you and me, NFS to some other pair, or DNS to another. The
>> meaning of transport information is local to the transport endpoints.
>>
> Then why make it a string and increase the chance for misinterpretation?

I like strings because they're nearly as compact, don't need a two-level
IANA registration, and are more inherently extensible (does 0123 mean
the same as 123? - not as strings). Otherwise, no difference.
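
The 0123 versus 123 point can be shown in a few lines. The helper names are made up for the illustration.

```python
def same_as_numbers(a: str, b: str) -> bool:
    # Numeric identifiers: leading zeros vanish once the value is parsed.
    return int(a) == int(b)


def same_as_strings(a: str, b: str) -> bool:
    # String identifiers: every character counts, so there is no ambiguity.
    return a == b


print(same_as_numbers("0123", "123"))  # True
print(same_as_strings("0123", "123"))  # False
```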

The portnames ID argues primarily for another ID for the protocol
separate from the port. The use of strings is my personal preference,
but not required.

>>>> However, although there are many who want to consider multiple
>>>> instances of the same protocol, it's not clear how a source would
>>>>  know which instance to talk to. IMO, instances are basically bound
>>>> to the destination IP address, and if you want multiple instances,
>>>> get multiple addresses - because the same resolution that determines
>>>> host determines instance, IMO.
>>> I don't immediately understand why the IP address is a better
>>> instance selector than a port, except perhaps for protocols that use
>>> multiple ports.
>>
>> That's why.
> 
> well, the notion of 'host' is a lot fuzzier than it used to be, so the
> assumption that A:P1 and A:P2 are really on the same host (in the sense,
> say, of being able to access the same private data) is a lot more
> dubious than it was in the mid-1970s.

It's not dubious; it's required by the Internet architecture as
specified by RFCs 1122, 1812, etc. NAPT users do cartwheels to restore
that correlation.

> in addition to NAPTs, we have
> large distributed memory clusters, we have big-IP boxes.  these days I'd
> regard use of multiple well-known (well-known or preassigned) ports in a
> single protocol as bad protocol design.  I'd also probably regard any
> assumption that A:P1 and A:P2 were inherently on the same host as bad
> protocol design.  That doesn't mean that a protocol can't use
> multiple ports, but that (e.g.) it should have at most one well-known
> port with the other ports (and perhaps IP addresses) negotiated in-band.

Being able to reach the same data at different hosts isn't the same as
the same data. Forms are an example of that. If you want to have a
big-IP box, then front end it with a single IP address, or you'll need
to do cartwheels to emulate that (e.g., send all messages from a single
source IP address to the same backend server, share state via a locking
dbase, etc.)
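
One of those cartwheels, pinning each source IP address to one backend, might look like the sketch below. The backend names are hypothetical, and hashing the source address is just one assumed way to get the affinity; it says nothing about how any particular load balancer does it.

```python
import hashlib

# Hypothetical backend servers hidden behind one front-end address.
BACKENDS = ["backend-a", "backend-b", "backend-c"]


def pick_backend(source_ip: str) -> str:
    """Map a source IP address to the same backend every time."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(BACKENDS)
    return BACKENDS[index]
```

Every connection from a given source lands on the same backend regardless of destination port, which is the correlation a single host would have provided for free.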

>>> And it seems that for better or worse NAPTs have made this less
>>> feasible because they mean that A:P1 and A:P2 might not actually
>>> reach the same host.
>>
>> They do reach the same host - the NAPT emulates exactly one host.
> 
> To the extent that it does so (and I think that's a stretch), it does so
> very poorly.  They're not the same host in any reasonable sense that an
> application can reliably make use of, only in an abstract sense that is
> meaningless from a practical perspective.

They are a single host in every sense that the Internet expects (as per
below).

>> To the extent that they do not reach the same host, protocols break
>> (e.g., FTP, H.323, etc.)
> 
> Well, we all know that NATs break things.  But NATs aren't the only
> reason that it's dubious to assume that A:P1 and A:P2 are on the same host.

Right. But anything that breaks the assumption that A:P1 and A:P2 are
the same host is broken by Internet standards (literally) - including
port-specific policy routing, etc. There are reasons to use many of
these things, but as far as the Internet architecture is concerned,
they've busted it.

>>> Also (as we're seeing with IPv6), assigning multiple IP addresses to
>>> a host can be problematic - which ones does the host use to source
>>> new traffic?
>>
>> The problem there is different. When you assign multiple addresses to
>>  a host, you're making a single host into multiple virtual hosts.
>> When that's not what you're doing, things break. That's not a
>> surprise either.
> 
> Tell that to the IPv6 architects.

I have ;-)

Actually, most of what they do makes the host into a router+host
combination. There's still the issue you raise - which address "is" the
host, and when you don't pick a single one, you break things. Again,
unsurprising.

What IPv6 wants is a router+host. What it gave us is multihoming without
routing, which doesn't work.

>>>>>> The key question is "what is late bound". IMO, we could really use
>>>>>> something that decouples protocol identifier from instance (e.g.,
>>>>>> process demultiplexing) identifier.
>>>>> We could also use something that decouples service from protocol. 
>>>>> (do we really want to be stuck with HTTP forever as the only way to
>>>>> get web pages?  SMTP as the only way to transmit mail?)  How many
>>>>> layers do we want?
>>>> We do in HTTP.
>>> Strongly disagree.   And I think that's a very shortsighted view.
>>
>> I've shown above that it's layered and flexible. What's shortsighted
>>  about that?
> 
> Maybe I've lost track of what you were arguing here, but you seemed to
> be saying that we should be stuck with HTTP forever, and constrained
> from now to the end of time to change HTTP only in a way that is
> compatible with existing protocol engines. 

Nope - all I'm saying is that if you want to change HTTP, use its
existing versioning. Show me something you cannot do that way (e.g.,
consider protocols that currently lack version numbers that you want to
add), and then maybe it's really a different protocol and deserves a
different 'portname'. Otherwise, it's a single protocol and demuxing
goes inside it (logically).

And please, let's not discuss poorly designed implementations further.
Protocol engines are an implementation issue; there are good ones and
bad. A good one first triages messages based on version. A good one
allows back-end version-specific engines to be hooked together.

>> You're still overloading protocol with process demuxing; IMO, it is
>> that which was shortsighted and we have an opportunity to correct.
> 
> No, I'm reiterating that the demuxing token we call a port number is not
> inherently tied to either protocol or service.

And I am reiterating that they are.

> And before we try to
> explicitly expose either protocol or service to the network, we need to
> be much clearer about what this actually means (protocol name is not at
> all adequate), how this affects operations such as protocol agility, how
> reliable this field can actually be in practice, its potential for
> misuse, its likely effect on network transparency and the resulting
> ability of the network to support new applications, and several other
> considerations.

Huh? So we can't define the next layer up until we define all the
layers? I don't agree.

The whole point of layering is that demuxing is local. TCP has two kinds
of demuxing: instance of a connection, and protocol above it; thus the
two IDs (port and portname, as I suggest). The rest (service, etc.) is
inside the data stream at that point.

> IMHO we'd also need to be clear (for the sake of backward compatibility)
> that the protocol name is not a demultiplexing token that can be used in
> addition to ports, that it doesn't change the way that ports work.

It overrides the a-priori well-known ports list deployed on hosts. It
changes behavior only at connection establishment; thereafter, both
existing TCP and TCP-portnames demux on just ports.

The problem is that connection establishment is more than just demuxing;
it's attaching to the next layer up.
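
A toy model of that split, under the assumptions stated in this message only: the name is consulted once, at connection establishment, to attach the connection to whatever sits at the next layer up; afterwards, delivery looks only at the address/port tuple. The class and method names are invented for the illustration and are not taken from draft-touch-tcp-portnames.

```python
from typing import Callable, Dict, Tuple

# (source host, source port, destination host, destination port)
FourTuple = Tuple[str, int, str, int]
Handler = Callable[[bytes], str]


class Endpoint:
    def __init__(self) -> None:
        self.by_portname: Dict[str, Handler] = {}        # used only at setup
        self.connections: Dict[FourTuple, Handler] = {}  # used for every delivery

    def listen(self, portname: str, handler: Handler) -> None:
        """Attach a next-layer handler to a name whose meaning is local."""
        self.by_portname[portname] = handler

    def open(self, conn: FourTuple, portname: str) -> bool:
        """Connection establishment: the only step that looks at the name."""
        handler = self.by_portname.get(portname)
        if handler is None:
            return False          # nothing attached under that name: refuse
        self.connections[conn] = handler
        return True

    def deliver(self, conn: FourTuple, data: bytes) -> str:
        """After setup, demux on the address/port tuple alone."""
        return self.connections[conn](data)
```

A name such as "APPLESAUCE" works as well as any other here, because only the two endpoints need to agree on what it attaches to.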

> If
> we need more ports we need to define a straightforward way of extending
> port space, which is orthogonal to providing a way of exposing protocol
> and/or service names to the network.

That's what portnames are - orthogonal. Service names belong at the next
layer up, inside HTTP, NFS, DNS, etc.

Joe
