[rbridge] TRILL Header/Tag
Silvano Gai
sgai at nuovasystems.com
Wed Jun 27 08:17:33 PDT 2007
Donald,
The information contained in your email was known and discussed at the
time of the consensus on the header format. I don't see any reason to
change and I remain strongly in favor of the current format.
-- Silvano
> -----Original Message-----
> From: rbridge-bounces at postel.org [mailto:rbridge-bounces at postel.org]
> On Behalf Of Eastlake III Donald-LDE008
> Sent: Monday, June 25, 2007 11:09 AM
> To: rbridge at postel.org
> Subject: [rbridge] TRILL Header/Tag
>
> Hi,
>
> The current TRILL data frame on an 802.3 link looks like this (the
> Outer VLAN Tag is not always present):
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                 Outer Destination MAC Address                 |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Outer Destination MAC Address |   Outer Source MAC Address    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                   Outer Source MAC Address                    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |    Ethertype = IEEE 802.1Q    |  Outer.VLAN Tag Information   |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> Start "TRILL Header":
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Ethertype = TRILL       |  V  |M|R|Op-Length| Hop Count |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |     Egress (RB2) Nickname     |    Ingress (RB1) Nickname     |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> :End "TRILL Header"
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                 Inner Destination MAC Address                 |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Inner Destination MAC Address |   Inner Source MAC Address    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                   Inner Source MAC Address                    |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |    Ethertype = IEEE 802.1Q    |  Inner.VLAN Tag Information   |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |       Payload Ethertype       |                               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+            Payload            |
> |                                                               |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> |                     FCS (Frame CheckSum)                      |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>
> Where the V field is two zero bits, for Version 0, followed by another
> zero bit for a TRILL data frame (or a 1 bit for a TRILL IS-IS frame).
> That is, V=0 (or V=1). See draft-ietf-trill-rbridge-protocol-04.txt.
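As a rough illustration of the current 64-bit TRILL Header described above, here is a minimal parsing sketch. The bit widths are an assumption inferred from the field order in the diagram and the description of V (V=3, M=1, R=1, Op-Length=5, Hop Count=6); they are not stated in the message. The Ethertype value and the name parse_trill_header are hypothetical placeholders.

```python
import struct

# Hypothetical placeholder; the message does not give a TRILL Ethertype value.
TRILL_ETHERTYPE = 0xFFFF

def parse_trill_header(data: bytes) -> dict:
    """Split the 8-byte TRILL Header into its fields.

    Assumed widths (not stated in the message): V=3 bits, M=1, R=1,
    Op-Length=5, Hop Count=6, then two 16-bit nicknames.
    """
    ethertype, flags, egress, ingress = struct.unpack("!HHHH", data[:8])
    return {
        "ethertype": ethertype,
        "v": flags >> 13,                 # two version bits plus the data/IS-IS bit
        "m": (flags >> 12) & 0x1,         # multi-destination flag
        "r": (flags >> 11) & 0x1,
        "op_length": (flags >> 6) & 0x1F,
        "hop_count": flags & 0x3F,
        "egress_nickname": egress,
        "ingress_nickname": ingress,
    }

# Example: V=0, M=1, R=0, Op-Length=2, Hop Count=5
flags = (0 << 13) | (1 << 12) | (0 << 11) | (2 << 6) | 5
sample = struct.pack("!HHHH", TRILL_ETHERTYPE, flags, 0x0002, 0x0001)
fields = parse_trill_header(sample)
```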
>
> (Side note: there should be a separate discussion about Q-tags
> versus/and/or S-tags but I think that is pretty orthogonal to what I
> want to talk about here.)
>
> There is no question that this format "works". But I don't see that as
> the point. After all, 802.3 Ethernet would "work" if you moved the
> Destination Address to the end of the frame. But it wouldn't be good
> engineering, because cut-through switching or similar optimizations
> would be impossible.
>
> What I am primarily worried about is the fast path at transit
> Rbridges. There doesn't seem to be any way to avoid a fair amount of
> work at the ingress Rbridge. And it could be that some implementations
> wouldn't mind having to grovel deep into a frame at every transit
> Rbridge. But I think most implementations will want transit Rbridge
> processing of TRILL data frames to be as simple as possible. And I
> should think that anyone who is worrying about Rbridging 10Gbps
> Ethernet or even fast optical channels should be having problems
> looking at the current TRILL protocol specification.
>
> First, let's look at unicast. It seems really nice and efficient, even
> with the present design. You just decrement the hop count and, if it
> is non-zero, use the egress Rbridge nickname to look up the next hop
> destination MAC address, output port, and Outer VLAN ID if any, and
> off you go. Well, I think it is just a little more complex than that.
> In particular, what do you do about the Outer VLAN Tag priority field?
> An Outer VLAN Tag added by the previous Rbridge could have been
> stripped or changed by bridges. You could imagine wanting to configure
> various strategies but I think the right zero configuration default is
> to simply get the priority from the Inner VLAN Tag. (Of course, if
> that turns out to be the default priority and no Outer VLAN is needed
> for connectivity to the next hop Rbridge, no Outer VLAN Tag may be
> needed at all.) And then there are options. They start right after the
> ingress nickname and, even if you can get around the priority problem,
> if there are options present, you probably have to look at the first
> bit or two of the options area to determine that none of them apply to
> you...
>
> So unicast isn't too bad, but even there I think you have to check a
> couple of bits beyond the header if options are present and the best
> default strategy requires you to delve into the inner frame to check
> the data priority.
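The unicast transit steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table layout and the names NextHop and forward_unicast are hypothetical, and option handling is left out. The only external fact used is that the 802.1Q priority is the top three bits of the 16-bit tag information field.

```python
from typing import Dict, NamedTuple, Optional, Tuple

class NextHop(NamedTuple):
    """Hypothetical forwarding entry keyed by egress Rbridge nickname."""
    dest_mac: str
    port: int
    outer_vlan: Optional[int]  # None when no Outer VLAN is needed

def forward_unicast(
    hop_count: int,
    egress_nickname: int,
    inner_vlan_tag_info: int,
    table: Dict[int, NextHop],
) -> Optional[Tuple[NextHop, int, int]]:
    """Return (next hop, new hop count, priority), or None to drop."""
    hop_count -= 1
    if hop_count <= 0:
        return None                      # hop count exhausted
    hop = table.get(egress_nickname)
    if hop is None:
        return None                      # unknown egress Rbridge
    # Zero-configuration default from the message: take the priority
    # from the Inner VLAN Tag (top 3 bits of the 802.1Q tag information).
    priority = (inner_vlan_tag_info >> 13) & 0x7
    return hop, hop_count, priority

table = {0x0002: NextHop("00:11:22:33:44:55", 3, 10)}
result = forward_unicast(5, 0x0002, (5 << 13) | 100, table)
```

Note that the priority lookup is exactly the step that forces a transit Rbridge to read past the TRILL Header in the current format.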
>
> It's multi-destination frames that are the real problem with the
> current design for a high speed transit Rbridge that wants to prune
> the distribution tree. And, based on comments on this working group
> mailing list, many developers will want to prune.
>
> So, say we receive a multi-destination data frame. You know the
> distribution tree from the "egress" nickname field. But if you want to
> do any pruning, you have to look beyond the TRILL Header. To get the
> information for pruning by VLAN, you need to look at the Inner VLAN
> ID, which may require skipping over options.
>
> Then we get to multicast pruning. To do that, you might first look at
> the Destination MAC Address. That comes even before the Inner VLAN Tag
> that you had to look at for VLAN pruning, so this doesn't sound too
> hard. But it also turns out that it is not adequate. RFC 4541 on IGMP
> snooping and the like is now out and it points out that there are
> ranges of special IPv4 and IPv6 multicast addresses for which hosts
> don't, or at least can not be depended on to, issue IGMPs (or MLDs for
> IPv6). Therefore, frames sent to those IP multicast addresses have to
> be treated as broadcast and distribution can't be pruned. But these IP
> multicast addresses are translated ambiguously to MAC multicast
> addresses just like other IP multicast addresses. So, if we look at
> the Destination MAC address and find it is an IP derived multicast, we
> are not done.
>
> For IPv4 derived multicast MAC addresses, there is a range of such
> multicast addresses for which you have to dig even deeper into the
> frame and look at the actual IPv4 address to see if you can prune or
> have to broadcast. True, for IPv4 it is only a small fraction of the
> possible IP derived multicast addresses, and an alternative might be
> to always broadcast frames to those IPv4 derived MAC multicast
> addresses; but if some group just happens to be sending their
> streaming video over a multicast group that just happens to map into
> this area, it may not work out so well for your network that those
> frames are getting broadcast everywhere in that VLAN.
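To make the IPv4 ambiguity concrete, here is a small sketch of the standard IPv4 multicast to MAC mapping (prefix 01:00:5e plus the low 23 bits of the group address). The function name is hypothetical, and taking 224.0.0.x as the special range is an assumption about which range the message means. It shows a link-local group and two ordinary groups landing on the same MAC address.

```python
import ipaddress

def ipv4_multicast_mac(addr: str) -> str:
    """Map an IPv4 multicast address to its derived Ethernet MAC address."""
    ip = int(ipaddress.IPv4Address(addr))
    low23 = ip & 0x7FFFFF                 # only 23 of the 28 group bits survive
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

# 224.0.0.1 is link-local (hosts are not expected to report membership);
# 225.0.0.1 and 239.128.0.1 are ordinary groups with the same low 23 bits.
macs = {a: ipv4_multicast_mac(a) for a in ("224.0.0.1", "225.0.0.1", "239.128.0.1")}
```

All three addresses map to 01:00:5e:00:00:01, so a transit Rbridge that sees that MAC cannot tell from the MAC alone whether pruning is safe.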
>
> The situation is much worse for IPv6. In the case of IPv6 derived MAC
> multicast addresses, 100% are potentially for special IPv6 multicast
> addresses that you must broadcast. As a result, if you are going to
> prune IPv6 derived multicast, you *always* have to go look at the
> actual IPv6 destination address deeper in the frame to decide whether
> or not to prune.
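A similar sketch for IPv6, using the standard mapping (prefix 33:33 plus the low 32 bits of the group address). The function name is hypothetical. Because the flags and scope fields of the IPv6 address are discarded by the mapping, the link-scope all-nodes address and a global-scope group with the same low 32 bits are indistinguishable at the MAC level.

```python
import ipaddress

def ipv6_multicast_mac(addr: str) -> str:
    """Map an IPv6 multicast address to its derived Ethernet MAC address."""
    low32 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)

all_nodes_mac = ipv6_multicast_mac("ff02::1")   # link-scope all-nodes
global_mac = ipv6_multicast_mac("ff0e::1")      # global scope, same low 32 bits
```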
>
> Finally, we come to the worst case, detecting IGMP, MLD, etc.
> messages. Here you have to go beyond the IPv4 or IPv6 destination
> address, delve deeper into the IP header and, in some cases, into the
> IP packet content just to figure out whether you need to prune the
> distribution tree to branches with IP multicast routers on them for
> IGMP and MLD (and RFC 4541 specifically recommends, due to confusions
> that can happen between IGMPv3 and earlier versions of IGMP, not
> sending such messages to links unless they have an IP multicast router
> on them).
>
> I really don't think you want to have to look into the content of IP
> packets to make forwarding decisions at transit Rbridges.
>
> So, what I suggest is something more like the following:
>
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Outer Destination MAC Address |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Outer Destination MAC Address | Outer Source MAC Address |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Outer Source MAC Address (last four bytes) |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Ethertype = IEEE 802.1Q | Outer.VLAN Tag Information |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> Start "TRILL Tag":
> = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> = | TRILL Ethertype | V |M|Op-Length| Hop Count |
> = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> = | Egress RBridge Nickname | Inner VLAN Tag information |
> = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> = | Inner Destination MAC Address (first four bytes) |
> = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> = | Inner Dest MAC Adr | Ingress RBridge Nickname |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Inner Source MAC Address (first four bytes) |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Rest of Inner Src MAC Adr |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> End "TRILL Tag": +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Payload Ethertype |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Payload ...
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> Final Checksum:
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> | Frame Check Sum |
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> where
> V = 0 for TRILL IS-IS frames
> V = 1 for TRILL data pruned only on VLAN
> V = 2 for TRILL data pruned on VLAN and IP multicast MAC address
> V = 3 for TRILL data pruned on VLAN and IP multicast router location
>
> This puts everything that is needed for a transit Rbridge to forward
> frames in the first 128 bits of what I have labeled the TRILL Tag (see
> "=" in left margin). Why did I label it "Tag" rather than "Header"?
> Well, the header paradigm seems to connote that we are adding a header
> in front of an almost complete Ethernet frame. Tag seems like a better
> word if we are constructing a block that goes in front of a frame's
> contents, not a frame. In some ways, it just depends on how you look
> at it.
>
> For multicast optimization, the investigation into the frame as to
> whether the distribution tree should be (1) only VLAN pruned, (2)
> pruned on VLAN and based on multicast destination address, or (3)
> pruned on VLAN and branches that have multicast routers, needs to be
> made only once, at the ingress Rbridge, and is then encoded into V.
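As a sketch of how a transit Rbridge might act on the proposed V values without looking past the TRILL Tag, consider the following. The V values come from the list above; the per-VLAN, per-group, and per-router port tables and all names are hypothetical illustrations, not part of the proposal.

```python
from enum import IntEnum
from typing import Dict, Set, Tuple

class V(IntEnum):
    ISIS = 0                       # TRILL IS-IS frame
    PRUNE_VLAN = 1                 # prune only on VLAN
    PRUNE_VLAN_MCAST_MAC = 2       # prune on VLAN and IP multicast MAC address
    PRUNE_VLAN_MCAST_ROUTERS = 3   # prune on VLAN and multicast router location

def transit_ports(
    v: int,
    vlan_id: int,
    dest_mac: str,
    tree_ports: Set[int],
    vlan_ports: Dict[int, Set[int]],
    group_ports: Dict[Tuple[int, str], Set[int]],
    router_ports: Dict[int, Set[int]],
) -> Set[int]:
    """Choose output ports on a distribution tree from tag fields alone."""
    mode = V(v)
    if mode is V.ISIS:
        raise ValueError("not a TRILL data frame")
    ports = tree_ports & vlan_ports.get(vlan_id, set())
    if mode is V.PRUNE_VLAN_MCAST_MAC:
        ports &= group_ports.get((vlan_id, dest_mac), set())
    elif mode is V.PRUNE_VLAN_MCAST_ROUTERS:
        ports &= router_ports.get(vlan_id, set())
    return ports

tree = {1, 2, 3, 4}
vlans = {10: {1, 2, 3}}
groups = {(10, "01:00:5e:01:02:03"): {2}}
routers = {10: {3}}
```

Every input to the decision (V, Inner VLAN ID, Inner Destination MAC) sits inside the first 128 bits of the proposed tag.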
>
> The current header is 64 bits and 64-bit aligned (starting after the
> Outer VLAN tag) which is nice, but you always have to look beyond it
> for every frame. Sometimes way beyond it, including skipping over any
> TRILL options and IP options, to look into the content of IP packets.
>
> The suggested TRILL tag is 176 bits long but, unless you are an egress
> Rbridge, you only have to look at the first 128 bits which is 128-bit
> aligned (ignoring the outer VLAN tag like I did for the current
> header). But since fields are mostly just being moved around, at worst
> you gain or lose two bytes total length over previous proposals.
> Options would start after the destination MAC address so, even if
> options are present, you can check the first few bits of the options
> area without going outside this 128-bit area.
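The bit counts quoted above can be checked by adding up the field widths shown in the two diagrams. This is just arithmetic over the diagrams; the list and variable names are illustrative.

```python
# Fields of the proposed TRILL Tag, in order, with widths in bits.
TAG_FIELDS = [
    ("TRILL Ethertype", 16),
    ("V / M / Op-Length / Hop Count", 16),
    ("Egress RBridge Nickname", 16),
    ("Inner VLAN Tag information", 16),
    ("Inner Destination MAC Address", 48),
    ("Ingress RBridge Nickname", 16),
    ("Inner Source MAC Address", 48),
]

# Same information in the current format: the 64-bit TRILL Header followed
# by the inner addresses and a full inner 802.1Q tag (Ethertype included).
CURRENT_FIELDS = [
    ("TRILL Header", 64),
    ("Inner Destination MAC Address", 48),
    ("Inner Source MAC Address", 48),
    ("Ethertype = IEEE 802.1Q", 16),
    ("Inner VLAN Tag information", 16),
]

tag_bits = sum(width for _, width in TAG_FIELDS)
transit_bits = sum(width for _, width in TAG_FIELDS[:6])  # rows marked "="
current_bits = sum(width for _, width in CURRENT_FIELDS)
saved_bytes = (current_bits - tag_bits) // 8
```

The totals come to 176 bits for the tag, 128 bits for the part a transit Rbridge reads, and a two-byte saving from the suppressed inner 802.1Q Ethertype.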
>
> This proposal does suppress the Ethertype for the inner VLAN Tag
> information but there is some precedent for that sort of thing. For
> example, 802.16 has a header suppression feature where you can set
> some flag bits and then leave out Q-tag and/or IPv4 or IPv6
> Ethertypes.
>
> So, while various minor changes could be made, the above is my
> suggestion for significantly improving the simplicity and cut-through
> switching latency of handling TRILL data frames at transit Rbridges.
>
> Since there was a consensus determination in favor of the current
> TRILL Header, there needs to be a consensus to re-open the question.
> If there isn't, my fall back position would be to suggest expanding
> the values of V in the TRILL Header to encode the correct pruning
> strategy for a frame, as listed with the TRILL Tag proposal above.
> This would not eliminate the need to delve into the inner frame
> whenever a TRILL frame is handled by a transit Rbridge, but would
> reduce the depth and complexity of such delving...
>
> Thanks,
> Donald
>
>
> -----Original Message-----
> From: rbridge-bounces at postel.org [mailto:rbridge-bounces at postel.org]
> On Behalf Of Anoop Ghanwani
> Sent: Friday, June 22, 2007 6:44 PM
> To: Silvano Gai; Radia Perlman
> Cc: rbridge at postel.org
> Subject: Re: [rbridge] per-VLAN instances of IS-IS
>
>
> I don't see much of an advantage to changing things because
> it doesn't change the fact that rBridges rely on the
> "inner frame" to get correct pruning behavior for multicasts.
>
> However, I agree with Silvano. _If_ something is going
> to change, let's try and settle it as soon as we possibly can.
>
> Anoop
>
> > -----Original Message-----
> > From: Silvano Gai [mailto:sgai at nuovasystems.com]
> > Sent: Thursday, June 21, 2007 7:09 AM
> > To: Radia Perlman; Anoop Ghanwani
> > Cc: Caitlin Bestler; Eric Gray (LO/EUS); rbridge at postel.org
> > Subject: RE: [rbridge] per-VLAN instances of IS-IS
> >
> > Radia,
> >
> > This information was known at the time of the consensus.
> > I still remain in favor of the header we selected.
> > If somebody wants to change his/her mind they can email this
> > list (SOON).
> >
> >
> > -- Silvano
> >
> > > -----Original Message-----
> > > From: Radia Perlman [mailto:Radia.Perlman at sun.com]
> > > Sent: Wednesday, June 20, 2007 12:57 PM
> > > To: Anoop Ghanwani
> > > Cc: Caitlin Bestler; Eric Gray (LO/EUS); Silvano Gai;
> > > rbridge at postel.org
> > > Subject: Re: [rbridge] per-VLAN instances of IS-IS
> > >
> > > As Anoop said, core RBridges need to look inside the tunneled
> > > packet at the inner packet in order to do multicast pruning as
> > > well as VLAN pruning on multicast packets.
> > >
> > > Which is why Don was proposing moving both the VLAN tag and the
> > > destination multicast address to the TRILL header (and removing
> > > the VLAN tag and DA from the inner packet so that it wouldn't mean
> > > adding an extra 6 bytes to an encapsulated frame).
> > >
> > > Seemed like people didn't like doing that, but I want to make sure
> > > if the WG really is deciding not to do that, that they really
> > > understand what they are deciding against.
> > > Often in the email threads so many different things are kind of
> > > discussed at the same time that it's obvious that people
> > > (definitely including me) are getting confused.
> > >
> > > Radia
> > >
> > >
> > >
> > >
> > > Anoop Ghanwani wrote:
> > > >
> > > > They're snooping because they too care about pruning their
> > > > trees for a given multicast group and that would be one of the
> > > > ways to achieve it.
> > > > Otherwise, all multicasts would have to be broadcast on the
> > > > spanning tree interconnecting the bridged network that sits
> > > > between the two rBridges.
> > > >
> > > >
> > > > If we didn't care about multicast pruning, we wouldn't need to
> > > > worry about any of this.
>
> _______________________________________________
> rbridge mailing list
> rbridge at postel.org
> http://mailman.postel.org/mailman/listinfo/rbridge