[e2e] Port numbers, SRV records or...?

John Day day at std.com
Fri Aug 11 07:15:17 PDT 2006


At 2:02 -0400 2006/08/10, Keith Moore wrote:
>>>>  First of all, be careful.  Again, the infatuation with names of hosts
>>>>  is an artifact of the current tradition.  They are not necessary for
>>>>  naming and addressing for communications.  TCP can be used for any
>>>>  application which may or may not use one of the standard protocols.
>>>
>>>I'm not sure what point you are trying to make here.  Applications care
>>>about names of hosts, services, things in DNS, etc. I don't think
>>>that's infatuation, but recognition of a need, as apps have to
>>>interface with humans.  TCP doesn't, and IMHO shouldn't, care about
>>>such things.
>>
>>No.  Applications care about names of other applications.  TCPs care 
>>about names of other TCPs.  Host names were something that came in 
>>very early and were a bit of sloppiness that we got very attached 
>>to. I used "infatuation" because I have seen the look of abject 
>>terror on some people's faces when I suggested it was irrelevant to 
>>the naming and addressing problem.
>
>Applications care about the names of lots of different things, if 
>for no other reason than they interface with humans that care about 
>names of lots of different things.  Humans care about things that 
>they call computers, and which more-or-less correspond to things 
>that network geeks call hosts, which tend to be boxes with wires 
>attached to them that have lots of components inside them that 
>manipulate or transmit or store data and permit it to be retrieved 
>again.  Applications that help humans deal with computers need to 
>know about names of those computers and how to deal with those 
>names.  They might not need to know much about the structure of 
>those names, they may just treat those names as opaque or they might 
>need to use those names as query strings, but they do need to deal 
>with them on some level.
>
>Now the abstraction we call a host today is somewhat fuzzier than 
>the abstraction we called a host in the mid 1970s.   When a host had 
>clear boundaries, when there were resources like disk drives that 
>were clearly owned by a host (i.e. were private to that host, could 
>only be accessed from that particular host), when hosts were fairly 
>modest in terms of resources and tended to have a single network 
>interface, IP address, and identity; one set of users, one set of 
>files with ownership tied to those users - a lot of uses for "host 
>names" made more sense than they do now.  It made sense to telnet to 
>a host, rather than an instance of a telnet server that happens to 
>sit at an IP address.  It made sense to ftp to a host and be able to 
>access the set of files on that host.  It made sense to treat the 
>users of that host as a group of users for email, such that every 
>user on that host could be reached by sending mail to that host's 
>mail server.
>
>These days, as I said, boundaries are much fuzzier, and the host 
>abstraction makes less sense.  Hosts tend to be single-user in 
>practice, though we also have DNS names (think aol.com, yahoo.com, 
>gmail.com) that correspond to huge user communities.  We have 
>virtual hosts that sit on one box (granted, OS/VM existed a long 
>time ago, but now such things are more commonplace).  We have 
>"hosts" that are really computing clusters of individual processing 
>elements, where one "host" abstraction corresponds to the whole 
>cluster and separate "host" abstractions correspond to each of the 
>nodes, so that we have hosts contained within hosts.  We have 
>multiple hosts that all share resources (disks, sometimes memory) 
>such that any of them is more-or-less equivalent to another, and yet 
>there may or may not be effective IPC or memory sharing between 
>different processes on that host if they reside on different PEs.  A 
>user community is no longer likely to be defined by virtue of having 
>logins and sharing access to resources on a single host.  These days 
>the notion of telnet'ing to a host, FTPing to a host's file system, 
>talking about a host's users, or in general treating the host as an 
>abstraction that has a network interface and an IP address and a 
>collection of services that allow that host's resources to be 
>accessed and the host's users communicated with - these things are 
>still useful and the concept of "host" is still useful to us, but we 
>know of enough exceptions that we no longer treat it as the general 
>case.
>
>So we no longer expect that what used to be called "host names" are 
>inherently names of hosts - we recognize that a name might be the 
>name of something we used to think of as a host, or it might just be 
>some other name that is used to access some resource or collection 
>of resources, and we can't really tell the difference by looking at 
>either the name or its DNS RRset.  The mapping between names and 
>hosts (and for that matter addresses) is essentially arbitrary as 
>the set of services intended to be associated with a DNS name can 
>exist on zero, one, or any number of hosts/addresses/network 
>interfaces; and the set of hosts (etc.) supporting the services 
>associated with DNS name X can be identical to, disjoint from, or 
>intersect with the set of hosts supporting the services associated 
>with DNS name Y.   And with MX and SRV records the set of 
>hosts/addresses at which one service resides can be different from 
>the set of hosts/addresses at which another service resides, even if 
>both services share a common (DNS) name.
>
>So while it's a bit of a stretch to say that host names aren't 
>useful anymore, these days it's pretty difficult to make any 
>reliable statement about the characteristics of a host name.
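>
>[A minimal sketch of the MX/SRV point above, in Python using the
>third-party dnspython package.  The example.com zone and both service
>labels are placeholders; the lookups only succeed if such records are
>actually published.]
>
>    import dns.resolver
>
>    for service in ("_submission._tcp.example.com", "_imap._tcp.example.com"):
>        try:
>            answers = dns.resolver.resolve(service, "SRV")
>        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
>            print(service, "has no SRV record published")
>            continue
>        for rr in answers:
>            # Each service can point at a different host and port,
>            # independent of the A/AAAA records for example.com itself.
>            print(service, "->", rr.target, rr.port)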

Methinks he doth protest too much.  ;-)  That was quite a dissertation.

As many have noted, getting the concepts and the terminology right is 
half the battle.  Being sloppy in our terminology causes us to make 
mistakes and leads others not as closely involved to think we mean 
what we say.

Leaving aside the band-aids and years of commentary and 
interpretation (this really does begin to sound more like 
commentaries on the I-Ching than science), if one carefully works 
through an abstract model of a network to see who has to talk to whom 
about what, one never refers to a host.  One is concerned with the 
names of the various protocol machines and the bindings between them, 
but the name of the container on which they reside never comes up.

Host names may be a convenient crutch for humans to use when thinking 
about this, and that is nice.  But it is not relevant to the science.


>>The only place I have found a host-name useful is for network 
>>management when you want to know about things you are managing that 
>>are in the same system.  But for communication it is pretty much 
>>irrelevant.  Yes, there are special cases where you want to know 
>>that an application is on a particular system but they are just 
>>that: special cases.
>
>Yes, they're useful but they're special cases.  If you have an 
>application that manages the hardware on a host, you still need a 
>way to specifically name that hardware.
>
>>>There are lots of different ways to solve a problem.  TCP could have
>>>been designed to specify the protocol instead of a port.  Then we would
>>>need some sort of kludge to allow multiple instances of a protocol on a
>>>host.  Or it could have been designed to specify both a protocol and an
>>>instance, and applications designed to run on top of TCP would have
>>>needed to specify protocol instance when connecting (much as they
>>>specify port #s now).
>>
>>Actually it shouldn't have been at all.  This is really none of 
>>TCP's business.  TCP implements mechanisms to create a reliable 
>>channel and the pair of port-ids are there to be a 
>>connection-identifier, i.e., to identify an instance.  Binding that 
>>channel to a pair of applications is a separate problem.  It was 
>>done in NCP as a short cut, partly because we didn't know any 
>>better and partly because we had bigger problems to solve.  TCP 
>>just did what NCP did.
>
>Well, the OS needs to deliver the data somewhere, and it needs to 
>mediate access to that channel so that (for instance) not just any 
>random process can write to it, not just any random process can read 
>from it, if multiple processes can write to it there are ways of 
>preventing their writes from getting mixed up with one another, etc. 
>Binding the channel to a process isn't the only possible way of 
>doing this, but it's a fairly obvious one.  (Note that UNIX network 
>sockets don't bind a channel to a process - the same socket can be 
>accessed by multiple processes if the socket is explicitly passed, 
>or if the original process forks and its child inherits the socket 
>file descriptor.  But the default arrangement is for a single 
>process to have access to the socket, and therefore the channel.)
>
>IMHO if we really expected multiple processes to share TCP channels 
>then we'd need some sort of structure beyond an octet stream for TCP 
>channels, so we'd know (at the very least) message boundaries within 
>TCP and a stack wouldn't deliver half of a message to one process 
>and half to another.
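>
>[The parenthetical above about UNIX network sockets is easy to see in
>a few lines of Python.  POSIX-only, and purely illustrative: after
>fork(), parent and child both hold the same socket descriptor, so the
>channel is not tied to a single process.]
>
>    import os, socket
>
>    listener = socket.socket()               # AF_INET, SOCK_STREAM by default
>    listener.bind(("127.0.0.1", 0))          # any free port will do
>    listener.listen(1)
>
>    client = socket.create_connection(listener.getsockname())
>    conn, _ = listener.accept()
>
>    if os.fork() == 0:                       # child inherits the descriptor for `conn`
>        conn.sendall(b"written by the child process")
>        os._exit(0)
>
>    print(client.recv(64))                   # the peer cannot tell which process wrote
>    os.wait()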

Again, if you analyze what goes on in a single system, you will find 
that the mechanism for finding the destination application and 
determining whether the requesting application can access it is 
distinct and should be distinct from the problem of providing a 
channel with specific performance properties. I completely agree with 
everything you said above, but TCP is not and should not be the whole 
IPC. It just provides the reliable channel.

Harkening back to the historical discussion, this was all clear 35 
years ago, but it was a lot of work to build a network on machines 
with much less compute power than your cell phone, and we didn't have 
time for all of the niceties.  We had to show that this was useful 
for something.  So we (like all engineers) took a few shortcuts, 
knowing they weren't the final answer.  It is just unfortunate that 
the people who came later have not put it right.



>
>>>  > >(Though I'll admit that it has become easier to do so since the URL
>>>>  >name:port convention became popular.  Before that, the port number
>>>>  >was often wired into applications, and it still is often wired into
>>>>  >some applications, such as SMTP.  But a lot of the need to be able
>>>>  >to use alternate port numbers has resulted from the introduction of
>>>>  >NAPTs. Before NAPTs we could get away with assigning multiple IP
>>>>  >addresses (and multiple DNS names) to a host if we needed to run
>>>>  >multiple instances of a service on that host.  And we still find it
>>>>  >convenient to do that for hosts not behind NAPTs, and for hosts
>>>>  >behind firewalls that restrict which ports can be used.)
>>>>
>>>>  URLs try to do what needs to be done.  But we really only have them for
>>>>  HTTP.  They are not easy to use in general.
>>>
>>>URLs are becoming more and more popular.  They're just more visible in
>>>HTTP than in other apps.  And even apps that don't use URLs are often
>>>now able to specify ports.  Every MUA I know of lets you specify ports
>>>for mail submission, POP, and IMAP.  (I think it would be even better
>>>if they let people type in smtp:, pop:, and imap: URLs)
>>>
>>
>>As I said before, the great thing about software is that you can 
>>heap band-aid upon band-aid and call it a system.
>
>Which is a good thing, since we generally do not have the luxury of 
>either doing things right the first time (anticipating all future 
>needs) or scrapping an old architecture and starting again from 
>scratch. Software will always be working around architectural 
>deficiencies. Similarly we have to keep compatibility and transition 
>issues in mind when considering architectural changes.

Be careful.  We do and we don't.  I have known many companies that 
over time have, step by step, made wholesale *replacements* of major 
parts of their products as they transition.  Sometimes maintaining 
backward compatibility, sometimes not.  But new releases come out 
with completely new designs for parts of the system.  You are arguing 
that nothing is ever replaced and that all change comes from modifying 
what is there.  This is how evolution works.  And 99% of its cases end 
as dead-ends in extinction.  With evolution, that doesn't matter: 
there are hundreds of billions of cases.  But when there is one case, 
the odds aren't too good.  (And don't tell me not to worry because 
the actions of the IETF are not random mutations.  There are those 
who would dispute that! ;-))

>But when I look at ports I think "hey, it's a good thing that they 
>didn't nail down the meaning of hosts or ports too much back in the 
>1970s, because we need them to be a bit more flexible today than 
>they needed to be back then."  We don't need significant 
>architectural changes or any protocol or API changes for apps to be 
>able to specify ports, and that gives us useful flexibility today. 
>If ports had been defined differently - say they had been defined as 
>protocols and there were deep assumptions in hosts that (say) port 
>80 would always be HTTP and only port 80 could be HTTP - we'd be 
>having to kludge around it in other, probably more cumbersome, ways.
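>
>[A small illustration of that flexibility, using only the Python
>standard library; the host name and port below are placeholders.  The
>name:port URL convention hands an alternate port to the application
>with no protocol or API change.]
>
>    from urllib.parse import urlsplit
>
>    parts = urlsplit("http://foo.example.com:8080/status")
>    print(parts.hostname, parts.port or 80)  # -> foo.example.com 8080
>
>    # An HTTP client then simply connects to whatever port the URL names, e.g.
>    # http.client.HTTPConnection(parts.hostname, parts.port or 80)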

I am not arguing to nail down ports more.  I am arguing to nail them 
down less.  Suppose there had been no well-known ports at all.  I 
have never known an IPC mechanism in an OS to require anything like 
that.  Why should the 'Net?
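
For comparison, here is a minimal Python sketch (POSIX-only, 
illustrative) of how local OS IPC does its rendezvous: a UNIX-domain 
socket binds to a pathname chosen by the application, so the 
rendezvous is by an application name and nothing like a registry of 
well-known numbers is involved.

    import os, socket, tempfile

    path = os.path.join(tempfile.mkdtemp(), "my-service.sock")   # the application's "name"

    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)                     # rendezvous by name, no port number
    conn, _ = server.accept()
    conn.sendall(b"hello")
    print(client.recv(16))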

>
>I suppose one measure of an architecture is how complex the kludges 
>have to be in order to make things work.  Allowing apps to specify 
>port #s seems like a fairly minor kludge, compared to say having 
>apps build their own p2p networks to work around NAT and firewall 
>limitations.

An architecture that requires kludges has bugs that should be fixed.

>
>>>  > >>>Actually, if you do it right, no one standard value is necessary at
>>>>  >>>all.  You do have to know the name of the application you want to
>>>>  >>>communicate with, but you needed to know that anyway.
>>>>  >
>>>>  >To me this looks like overloading because you might want more than
>>>>  >one instance of the same application on the host.  I'd say that you
>>>>  >need to know the name of the instance of the service you want to
>>>>  >talk to.  Now in an alternate universe that name might be something
>>>>  >like "foo.example.com:web:http:1" - encompassing server name
>>>>  >(foo.example.com), the name of the service (web) protocol name
>>>>  >(http), and an identifier (1) to distinguish one instance of the
>>>>  >same service/protocol from another. But we might not actually need
>>>>  >that much complexity, and it would expose it to traffic analysis
>>>>  >which is good or bad depending on your point-of-view.
>>>>
>>>>  Indeed. The name would have to allow for both type and instance.  As
>>>>  well as applications with multiple application protocols and multiple
>>>>  instances of them.  But this was worked out years ago.
>>>
>>>I won't claim that it can't be done, but is it really needed?  or worth
>>>it?
>>
>>Only for those that need it.  Remember there are lots of people out 
>>there developing applications for the net that will never see the 
>>IETF or the "common" protocols.  These people are struggling to 
>>solve their problems because they are being forced to use the 
>>network equivalent of DOS, because the "new bell-heads" see no need 
>>to have a "Unix".  Our job is to provide the complete tool set in 
>>such a way that if they don't need it, it doesn't get in their way, and 
>>if they do, they have it.  We aren't even debating wrenches vs. 
>>sockets; we are debating whether nails can't be used for everything.
>
>Okay fine.  But when I try to understand what a good set of tools 
>for these applications developers looks like, the limitations of 
>ports (or even well known ports) seem like fairly minor problems 
>compared to the limitations of NATs, scoped addresses, IPv6 address 
>selection, firewalls that block traffic in arbitrary ways, and 
>interception proxies that alter traffic.  DNS naming looks fairly 
>ugly

They are all part and parcel of the same problem:  The 'Net only has 
half an addressing architecture.

>(especially if we also consider ad hoc and disconnected networks) 
>but not as bad as the above.    Security is really difficult and the 
>tools we have to implement it are the equivalent of stone knives and 
>bearskins.  Network management and host/network configuration seem 
>downright abysmal.  As architectural deficiencies go, ports seem way 
>down on the list.

Ports are a symptom of the disease.  Curing the disease will fix 
ports as well as much of what you listed above.

>>>>  Don't think it is a question of want; it is a question of need.  25 years
>>>>  ago we figured out how much naming and addressing we need, but we
>>>>  chose to ignore the answer.
>>>
>>>Care to supply a pointer?
>>
>>RFC 1498
>
>Oh, that.  Rereading it, I think its concept of nodes is a bit 
>dated. But otherwise it's still prescient, it's still useful, and 
>nothing we've built in the Internet really gets this.  It's 
>saddening to read this and realize that we're still conflating 
>concepts that need to be kept separate (like nodes and attachment 
>points, and occasionally nodes and services).
>
>Of course, RFC 1498 does not describe an architecture.  It makes 
>good arguments for what kinds of naming we need in a network 
>protocol suite (applications would need still more kinds of naming, 
>because users do), but it doesn't explain how to implement those 
>bindings and make them robust, scalable, secure.  It's all well and 
>good to say that a node needs to be able to keep its identity when 
>it changes attachment points but that doesn't explain how to 
>efficiently route traffic to that node across changes in attachment 
>points, etc.

Gee, you want Jerry to do *all* the work for you! ;-)  Given that we 
haven't done it, maybe that is the problem:  No one in the IETF knows 
how to do it.

>Also, there are valid reasons why we sometimes (occasionally) need 
>to conflate those concepts.  Sometimes we need to send traffic to a 
>service or service instance, sometimes we need to send traffic to a 
>node, sometimes we need to send traffic to an attachment point (the 
>usual reasons involve network or hardware management).

I don't believe it.  If you think we need to conflate these concepts 
then you haven't thought it through carefully enough.  We always send 
traffic to an application.  Now there may be some things you weren't 
thinking of as applications, but they are.  It is true, as I said 
earlier, that any application naming must allow for both type and 
instance.  But that is an implementation nit.
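
Purely as a hypothetical sketch of what "type and instance" might 
mean in an application name (the field names and example values below 
are invented for illustration, not a proposal from this thread):

    from collections import namedtuple

    AppName = namedtuple("AppName", ["application_type", "instance"])

    a = AppName("mail-transfer-agent", 1)
    b = AppName("mail-transfer-agent", 2)

    # Same type, different instances: each must be independently nameable.
    print(a != b, a.application_type == b.application_type)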

>It's interesting to reflect on how a new port extension mechanism, 
>or replacement for ports as a demux token, would work if it took RFC 
>1498 into account.  I think it would be a service (instance) selector 
>(not the same thing as a protocol selector) rather than a port 
>number relative to some IP address.  The service selectors would 
>need to be globally unique so that a service could migrate

You still need ports.  But you need to not conflate ports and 
application naming.

>from one node or attachment point to another.  There would need to 
>be some way of doing distributed assignment of service selectors 
>with a reasonable expectation of global uniqueness,

Service selectors?  No.  Application names, yes.

>without collisions.  That wouldn't solve the problem of making 
>services mobile, but it would make certain things feasible. Service 
>instances could register themselves with the equivalent of NAPTs 
>(maybe advertise them periodically using multicast) and those NAPTs 
>would know how to route traffic to them.  This wouldn't begin to 
>solve the problems associated with scoped addresses  - an app would 
>still have to know which address to use from which realm in order to 
>reach the desired service instance - but it would solve the problem 
>we now have with port collisions in NAPTs by apps that insist on 
>using well known ports.

Unnecessary.

>>>>  >In summary: Port numbers are sufficient for the endpoints, and well
>>>>  >known ports are a useful convention for a default or distinguished
>>>>  >instance of a service as long as we don't expect them to be rigidly
>>>>  >adhered to.  The question is: how much information about what
>>>>  >services/protocols are being used should be exposed to the network?
>>>>  >And if we had such a convention for exposing services/protocols to
>>>>  >the network are we in a position to demand that hosts rigidly
>>>>  >enforce that convention?
>>>>
>>>>  Why would you want to?
>>>
>>>I'm not sure we do.  But I'm also not sure why network-visible service
>>>or protocol identifiers are useful if the network can't take them as a
>>>reliable indicator of content.  So absent some means to police them, I
>>>doubt that they are useful at all.
>>
>>Ahhh, then we are in agreement?
>
>on this point at least, I guess so.

Sorry to be so tardy in responding.  Have to find the time.

Take care,
John

