[e2e] Port numbers, SRV records or...?

Wed Aug 9 08:43:34 PDT 2006

Keith Moore wrote:
>> Existing well-known port allocations indicate both protocol and version;
>> that means that there are multiple 'default instances' in that case
>> (e.g., NFS).
> 
> Version and instance (in the sense that I was using the word) are
> orthogonal.
...
> NFS doesn't work this way (so it's kind of a strained example), but for
> a more reasonable file sharing protocol it would make sense to have
> different sets of file systems available for mounting and access from
> different ports.  So for example, a default set of file systems to be
> exported to clients might be on default port A, with alternate sets of
> file systems exported on ports B and C.  

That's a really good example of what NOT to do with ports. Picking a
subset of mount points isn't a service; it's a decision within a
service. NFS (correctly) already provides that.

>>> A host can also use HTTP to provide things other than web servers, and a
>>> host can have web servers running other protocols such as FTP.  So we
>>> have service names, host names, services, protocols, and ports - each
>>> subtly different than those next to it.
>> A few questions:
>>
>> - how are service names different from services?
> 
> It's hard to nail down the terminology.  I was actually using service
> names in two different ways.  One can think of www.example.com as a
> service name, a name of a service that provides a set of web pages.

It's a DNS name. It could indicate a place to FTP, a place to NFS mount,
etc. That's ALL it is. There's no semantic value to "www." in the prefix.

It only means a web server when
a) you connect to it on a well-known port (HTTP the service)
b) you issue an HTTP request on that port

You can layer DNS on top of HTTP if you want, but as far as the port is
concerned, you're doing HTTP, period. If you want to do DNS on port 80,
you issue native DNS requests on that port.

> The other kind of service name I was wanting to talk about was the
> notion of service independent of protocol.  "web server" captures the
> notion of a service that exists for the purpose of providing web pages
> to web browsers. 

It's not the name of a service, but of a group of applications run on
different machines. If they interact, it's a distributed service. If
not, it's an aggregate service.

> That's distinct from HTTP, which is a protocol, in two
> ways.  One is that HTTP can be (and is) used for purposes other than
> providing web pages for use by web browsers.

That's completely hidden from the use of the well-known HTTP port to
indicate the HTTP protocol and web service, as per above, so that's not
relevant at this level.

If you want to talk about THAT level, you need to indicate how the
endpoint apps decide to talk, e.g., DNS over HTTP. That is NOT a
transport ID at the layer HTTP is working, and so is irrelevant to the
transport port.

> While the same HTTP
> instance can be used to provide both web pages and other kinds of data,
> to be used by web browsers and/or other kinds of clients, in practice
> it often makes sense to have different instances of HTTP do the
> different jobs.

That argues for a way to demux things inside HTTP based on "I'm doing
DNS over HTTP" indicators. However, some (incorrectly) use a priori
knowledge of which HTTP server is running which layered service to argue
that they need multiple HTTP servers. They DO NOT.
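
A minimal sketch of demuxing inside one HTTP server, rather than running
several. The "/dns/" path prefix and the handler names are made up for
this illustration; any in-band indicator (a header, a media type) would
serve the same purpose.

```python
def handle_web(path):
    # Default layered service: ordinary web pages.
    return "web page"

def handle_dns(path):
    # Hypothetical layered service: DNS carried over HTTP.
    return "dns answer"

# Hypothetical indicator: a path prefix chosen only for this sketch.
LAYERED_SERVICES = {"/dns/": handle_dns}

def dispatch(path):
    """Pick the layered service from the request itself, not from the port."""
    for prefix, handler in LAYERED_SERVICES.items():
        if path.startswith(prefix):
            return handler(path)
    return handle_web(path)
```

Both kinds of request arrive at the same HTTP instance on the same port;
only the content of the request selects the layered service.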

> The other difference between service name and protocol is that you
> don't necessarily want to tie them together too closely because someday
> you might like to have a different (hopefully better) protocol provide
> the same service. 

That's what version numbers inside protocols are for - demuxing versions
of a protocol. It would be useful if the IETF and IEEE would a) require
them, and b) use them (they do not - e.g., using a different 802 type
for IPv6 was an error, motivated only by short-term desire to make
cheaper Ethernet switches).
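
As a rough illustration of demuxing on an in-band version, here is a
sketch that reads the version from an HTTP-style request line and picks a
handler from it. The handler table and its string values are
placeholders invented for the example.

```python
def parse_version(request_line):
    """Split 'METHOD target NAME/major.minor' and return (name, major, minor)."""
    _method, _target, version = request_line.split(" ")
    name, _, number = version.partition("/")
    major, _, minor = number.partition(".")
    return name, int(major), int(minor)

# Hypothetical per-version handlers, selected by the version in the message.
HANDLERS = {(1, 0): "http/1.0 parser", (1, 1): "http/1.1 parser"}

def select_handler(request_line):
    """Return the handler for the version named in the request, or None."""
    _name, major, minor = parse_version(request_line)
    return HANDLERS.get((major, minor))
```

The same port and the same protocol identifier serve every version; the
version field alone tells the receiver how to proceed.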

>> 	transport should indicate how to parse the next layer,
>> 	e.g., to indicate "HTTP". HTTP already provides for ways
>> 	to indicate the next layer, which is similar to what
>> 	others call 'semantics', e.g.: ftp:, http:, etc. 
> 
> That's a very confusing example.  If you send a URL prefix in an HTTP
> request it's because you're talking to a web proxy,

See RFC 2616, sec. 5.1.2. For now, in 1.1, the absolute URL is split
across the request line and the Host header as follows:

       GET /pub/WWW/TheProject.html HTTP/1.1
       Host: www.w3.org

This is just to allow backward compatibility with previous HTTP versions
until the absolute URL form is required. They're semantically
equivalent, though.
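
A small sketch of that equivalence: both request forms resolve to the
same absolute URL. The function name and the default "http" scheme are
assumptions for the example, not anything taken from the RFC.

```python
from urllib.parse import urlsplit

def effective_url(request_target, host_header, scheme="http"):
    """Return the absolute URL a request refers to, for either request form."""
    parts = urlsplit(request_target)
    if parts.scheme and parts.netloc:
        # Absolute form: the target already names the scheme and host.
        return request_target
    # Path form: combine the Host header with the path on the request line.
    return f"{scheme}://{host_header}{request_target}"
```

Called with "/pub/WWW/TheProject.html" and Host "www.w3.org", or with the
full "http://www.w3.org/pub/WWW/TheProject.html", it yields the same URL.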

> and in this case
> (to the web proxy) the URL prefix is really just a way of
> distinguishing one kind of resource from another and indicating what
> protocol is to be used by the proxy to access that resource.  I
> wouldn't use the word "semantics" to describe this because the
> semantics of a resource accessed by ftp are no different than the
> semantics of the same resource accessed by http.

FTP supports modes that HTTP does not, and vice versa. The semantics are
not the same, except in the trivial one-file case.

>> Ports really indicate which instance of a protocol at a host, IMO - but
>> supporting that in TCP requires redefining the 'socket pair' to be a
>> pair of triples: "host, protocol, port" (rather than the current "host,
>> port").
> 
> Why do we need to expose the protocol in TCP?  Why isn't the port
> selector sufficient? 

I discuss this in the ID I noted before (draft-touch-tcp-portnames). The
port selector is used to demux connections and attach them to processes
on a host; the protocol (portname in my version, well-known port in the
current version) indicates the protocol.
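
To make the triple concrete, here is a minimal sketch of a listener table
keyed by (host, protocol, port). It is not the mechanism from the ID; the
table, the function names, and the example addresses and port numbers are
all made up for illustration.

```python
# Hypothetical listener table keyed by (host, protocol, port).
listeners = {}

def listen(host, protocol, port, handler):
    """Bind a handler to a triple; the port only selects the instance."""
    key = (host, protocol, port)
    if key in listeners:
        raise ValueError(f"already bound: {key}")
    listeners[key] = handler

def demux(host, protocol, port):
    """Return the handler for this triple, or None if nothing is bound."""
    return listeners.get((host, protocol, port))

# Two instances of one protocol, plus a different protocol reusing a number.
listen("192.0.2.1", "http", 5000, "web instance A")
listen("192.0.2.1", "http", 5001, "web instance B")
listen("192.0.2.1", "dns", 5000, "resolver")
```

With the protocol carried separately, the port number no longer has to
say what is being spoken; it only picks which process gets the connection.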

> The destination host knows which protocol is
> being used, the source host presumably also knows (it has some reason
> for choosing that port, whether it's because it's a well-known port or
> a SRV record or configuration data or whatever), and I wonder whether
> anyone else needs to know.

That's only for well-known ports; you're saying that anything the
endpoints agree to a priori need not be in the packet. That's true, but
limiting (see the ID for reasons).

> Exposing the protocol in TCP, making it
> explicit rather than just a convention, further encourages
> intermediaries to try to interpret it....

No more or less than well-known ports do. The string "NFS" could mean HTTP
to you and me, NFS to some other pair, or DNS to another. The meaning of
transport information is local to the transport endpoints.

>> However, although there are many who want to consider multiple instances
>> of the same protocol, it's not clear how a source would know which
>> instance to talk to. IMO, instances are basically bound to the
>> destination IP address, and if you want multiple instances, get multiple
>> addresses - because the same resolution that determines host determines
>> instance, IMO.
> 
> I don't immediately understand why the IP address is a better instance
> selector than a port, except perhaps for protocols that use multiple
> ports.

That's why.

> And it seems that for better or worse NAPTs have made this less
> feasible because they mean that A:P1 and A:P2 might not actually reach
> the same host. 

They do reach the same host - the NAPT emulates exactly one host. To the
extent that they do not reach the same host, protocols break (e.g., FTP,
H.323).

> Also (as we're seeing with IPv6), assigning multiple IP
> addresses to a host can be problematic - which ones does the host use
> to source new traffic? 

The problem there is different. When you assign multiple addresses to a
host, you're making a single host into multiple virtual hosts. When
that's not what you're doing, things break. That's not a surprise either.

>>>> The key question is "what is late bound". IMO, we could really use
>>>> something that decouples protocol identifier from instance (e.g.,
>>>> process demultiplexing) identifier.
>>> We could also use something that decouples service from protocol.  (do
>>> we really want to be stuck with HTTP forever as the only way to get web
>>> pages?  SMTP as the only way to transmit mail?)  How many layers do we
>>> want?
>> We do in HTTP. 
> 
> Strongly disagree.   And I think that's a very shortsighted view.

I've shown above that it's layered and flexible. What's shortsighted
about that?

>> We might be able to use that in other protocols, but
>> that's a decision for those protocols, not TCP, IMO.
> 
> Well, sure, the decision about whether to upgrade or replace one
> protocol must be made independently of the decisions for other
> protocols.  And I don't see offhand why TCP needs to change to
> facilitate that, except maybe to expand the port space beyond 16 bits.

You're still overloading protocol with process demuxing; IMO, it is that
which was shortsighted and we have an opportunity to correct.

Joe
