[rbridge] TRILL Header/Tag

Silvano Gai sgai at nuovasystems.com
Wed Jun 27 08:17:33 PDT 2007


Donald,

The information contained in your email was known and discussed at the
time of the consensus on the header format. I don't see any reason to
change and I remain strongly in favor of the current format.

-- Silvano



> -----Original Message-----
> From: rbridge-bounces at postel.org [mailto:rbridge-bounces at postel.org] On
> Behalf Of Eastlake III Donald-LDE008
> Sent: Monday, June 25, 2007 11:09 AM
> To: rbridge at postel.org
> Subject: [rbridge] TRILL Header/Tag
> 
> Hi,
> 
> The current TRILL data frame on an 802.3 link looks like this (the Outer
> VLAN Tag is not always present):
> 
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                  Outer Destination MAC Address                |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Outer Destination MAC Address | Outer Source MAC Address      |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                    Outer Source MAC Address                   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Ethertype = IEEE 802.1Q       |  Outer.VLAN Tag Information   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> Start "TRILL Header":
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Ethertype = TRILL             |  V  |M|R|Op-Length| Hop Count |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |    Egress (RB2) Nickname      |    Ingress (RB1) Nickname     |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> :End "TRILL Header"
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |               Inner Destination MAC Address                   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Inner Destination MAC Address |  Inner Source MAC Address     |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                    Inner Source MAC Address                   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Ethertype = IEEE 802.1Q       |  Inner.VLAN Tag Information   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |       Payload Ethertype       |                               |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+       Payload                 |
>    |                                                               |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                      FCS (Frame CheckSum)                     |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 
> Where the V field is two zero bits, for Version 0, followed by another
> zero bit for a TRILL data frame (or a 1 bit for a TRILL IS-IS frame).
> That is, V=0 (or V=1). See draft-ietf-trill-rbridge-protocol-04.txt.
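
As a rough sketch of the current header layout, the 6 bytes that follow the TRILL Ethertype could be unpacked as below. The bit widths (V=3, M=1, R=1, Op-Length=5, Hop Count=6) are read off the diagram above; the reading of M as a multi-destination flag, and the names TrillHeader and parse_trill_header, are assumptions made for illustration and are not taken from the draft.

```python
import struct
from typing import NamedTuple


class TrillHeader(NamedTuple):
    v: int                 # 3 bits as drawn in the diagram
    multi_dest: bool       # M bit (assumed to mean multi-destination)
    reserved: int          # R bit
    op_length: int         # 5 bits
    hop_count: int         # 6 bits
    egress_nickname: int   # 16 bits
    ingress_nickname: int  # 16 bits


def parse_trill_header(data: bytes) -> TrillHeader:
    """Parse the 6 bytes that follow the TRILL Ethertype."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes after the Ethertype")
    flags, egress, ingress = struct.unpack("!HHH", data[:6])
    return TrillHeader(
        v=(flags >> 13) & 0x7,
        multi_dest=bool((flags >> 12) & 0x1),
        reserved=(flags >> 11) & 0x1,
        op_length=(flags >> 6) & 0x1F,
        hop_count=flags & 0x3F,
        egress_nickname=egress,
        ingress_nickname=ingress,
    )
```

Everything a transit RBridge wants beyond these fields (inner VLAN, inner destination MAC, IP header) lies after this point in the frame, which is the concern raised in the rest of the message.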
> 
> (Side note: there should be a separate discussion about Q-tags
> versus/and/or S-tags but I think that is pretty orthogonal to what I
> want to talk about here.)
> 
> There is no question that this format "works". But I don't see that as
> the point. After all, 802.3 Ethernet would "work" if you moved the
> Destination Address to the end of the frame. But it wouldn't be good
> engineering, because cut through switching or similar optimizations
> would be impossible.
> 
> What I am primarily worried about is the fast path at transit Rbridges.
> There doesn't seem to be any way to avoid a fair amount of work at the
> ingress Rbridge. And it could be that there would be some
> implementations that wouldn't mind having to grovel deep into a frame at
> every transit Rbridge. But I think most implementations will want
> transit Rbridge processing of TRILL data frames to be as simple as
> possible. And I should think that anyone who is worrying about Rbridging
> 10Gbps Ethernet or even fast optical channels should be having problems
> looking at the current TRILL protocol specification.
> 
> First, let's look at unicast. Seems really nice and efficient, even with
> the present design. You just decrement the hop count and, if it is
> non-zero, use the egress Rbridge nickname to look up the next hop
> destination MAC address, output port, and Outer VLAN ID if any, and off
> you go. Well, I think it is just a little more complex than that. In
> particular, what do you do about the Outer VLAN Tag priority field? An
> Outer VLAN Tag added by the previous Rbridge could have been stripped or
> changed by bridges. You could imagine wanting to configure various
> strategies but I think the right zero configuration default is to simply
> get the priority from the Inner VLAN Tag. (Of course, if that turns out
> to be the default priority and no Outer VLAN is needed for connectivity
> to the next hop Rbridge, no Outer VLAN Tag may be needed at all.) And
> then there are options. They start right after ingress nickname and,
> even if you can get around the priority problem, if there are options
> present, you probably have to look at the first bit or two of the
> options area to determine that none of them apply to you...
> 
> So unicast isn't too bad, but even there I think you have to check a
> couple of bits beyond the header if options are present and the best
> default strategy requires you to delve into the inner frame to check the
> data priority.
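
The unicast transit step described above could be sketched roughly as follows. The forwarding table shape, the NextHop tuple, and the function name are hypothetical; the only behavior taken from the message is "decrement the hop count, drop at zero, look up the egress nickname, and take the priority from the Inner VLAN Tag".

```python
from typing import Dict, NamedTuple, Optional, Tuple


class NextHop(NamedTuple):
    dest_mac: str               # outer destination MAC of the next-hop RBridge
    port: int                   # output port
    outer_vlan: Optional[int]   # Outer VLAN ID, if one is needed at all


def forward_unicast(
    hop_count: int,
    egress_nickname: int,
    inner_priority: int,
    table: Dict[int, NextHop],
) -> Optional[Tuple[int, NextHop, int]]:
    """Return (new hop count, next hop, outer priority), or None to drop."""
    hop_count -= 1
    if hop_count <= 0:
        return None  # hop count exhausted
    next_hop = table.get(egress_nickname)
    if next_hop is None:
        return None  # unknown egress nickname
    # Zero-configuration default suggested in the message: reuse the
    # priority carried in the Inner VLAN Tag for the outer tag.
    return hop_count, next_hop, inner_priority
```

Note that inner_priority is the one input here that does not come from the TRILL Header itself; obtaining it is what forces the look into the inner frame.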
> 
> It's multi-destination frames that are the real problem with the current
> design for a high speed transit Rbridge that wants to prune the
> distribution tree. And, based on comments on this working group mailing
> list, many developers will want to prune.
> 
> So, say we receive a multi-destination data frame. You know the
> distribution tree from the "egress" nickname field. But if you want to
> do any pruning, you have to look beyond the TRILL Header. To get the
> information for pruning by VLAN, you need to look at the Inner VLAN ID,
> which may require skipping over options.
> 
> Then we get to multicast pruning. To do that, you might first look at
> the Destination MAC Address. That comes even before the Inner VLAN Tag
> that you had to look at for VLAN pruning, so this doesn't sound too
> hard. But it also turns out that it is not adequate. RFC 4541 on IGMP
> snooping and the like is now out and it points out that there are ranges
> of special IPv4 and IPv6 multicast addresses for which hosts don't, or
> at least cannot be depended on to, issue IGMPs (or MLDs for IPv6).
> Therefore, frames sent to those IP multicast addresses have to be
> treated as broadcast and distribution can't be pruned. But, these IP
> multicast addresses are translated ambiguously to MAC multicast
> addresses just like other IP multicast addresses. So, if we look at the
> Destination MAC address and find it is an IP derived multicast, we are
> not done.
> 
> For IPv4 derived multicast MAC addresses, there is a range of such
> multicast addresses for which you have to dig even deeper into the frame
> and look at the actual IPv4 address to see if you can prune or have to
> broadcast. True, for IPv4 it is only a small fraction of the possible IP
> derived multicast addresses, and an alternative might be to always
> broadcast frames to those IPv4 derived MAC multicast addresses; but if
> some group just happens to be sending their streaming video over a
> multicast group that just happens to map into this area, it may not work
> out so well for your network that those frames are getting broadcast
> everywhere in that VLAN.
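
The ambiguity can be seen from the standard IPv4 mapping, which copies only the low 23 bits of the group address into a MAC under 01:00:5e. The sketch below assumes the special range is 224.0.0.X, which is my reading of RFC 4541 and should be checked against the RFC; the function names are made up for illustration.

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its derived MAC (low 23 bits kept)."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    mac = 0x01005E000000 | (int(addr) & 0x7FFFFF)
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


def may_be_special_ipv4(mac: str) -> bool:
    """True if the MAC could have come from 224.0.0.X (assumed special range).

    A True result is not conclusive: other groups map here too, so the
    actual IPv4 destination address has to be inspected.
    """
    return mac.lower().startswith("01:00:5e:00:00:")


# 225.0.0.1 is an ordinary group, yet it shares a MAC with 224.0.0.1.
print(ipv4_multicast_mac("224.0.0.1"))
print(ipv4_multicast_mac("225.0.0.1"))
```

Both print statements give the same MAC address, so a switch that sees only the MAC cannot tell a prunable group from one that must be flooded.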
> 
> The situation is much worse for IPv6. In the case of IPv6 derived MAC
> multicast addresses, 100% are potentially for special IPv6 multicast
> addresses that you must broadcast. As a result, if you are going to
> prune IPv6 derived multicast, you *always* have to go look at the actual
> IPv6 destination address deeper in the frame to decide whether or not to
> prune.
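
For IPv6 the derived MAC is 33:33 followed by the low 32 bits of the group address, so the scope and flag bits never reach the MAC. A small sketch of that mapping follows; which IPv6 addresses count as "special" is left open here, with ff02::1 (all nodes, link scope) used only as an assumed example.

```python
import ipaddress


def ipv6_multicast_mac(group: str) -> str:
    """Map an IPv6 multicast group to its derived MAC (low 32 bits kept)."""
    addr = ipaddress.IPv6Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    mac = 0x333300000000 | (int(addr) & 0xFFFFFFFF)
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


# A link-scope and a global-scope group with the same low 32 bits collide.
print(ipv6_multicast_mac("ff02::1"))
print(ipv6_multicast_mac("ff0e::1"))
```

Because a link-scope address can carry any low 32 bits, every derived MAC has at least one link-scope preimage, which is one way to read the "100%" above.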
> 
> Finally, we come to the worst case, detecting IGMP, MLD, etc. messages.
> Here you have to go beyond the IPv4 or IPv6 destination address, delve
> deeper into the IP header and, in some cases, into the IP packet content
> just to figure out whether you need to prune the distribution tree to
> branches with IP multicast routers on them for IGMP and MLD (and RFC
> 4541 specifically recommends, due to confusions that can happen between
> IGMPv3 and earlier versions of IGMP, not sending such messages to links
> unless they have an IP multicast router on them).
> 
> I really don't think you want to have to look into the content of IP
> packets to make forwarding decisions at transit Rbridges.
> 
> So, what I suggest is something more like the following:
> 
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                  Outer Destination MAC Address                |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Outer Destination MAC Address | Outer Source MAC Address      |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |          Outer Source MAC Address  (last four bytes)          |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    | Ethertype = IEEE 802.1Q       |  Outer.VLAN Tag Information   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> Start "TRILL Tag":
>  = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  = |         TRILL Ethertype       |   V   |M|Op-Length| Hop Count |
>  = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  = |   Egress RBridge Nickname     |  Inner VLAN Tag information   |
>  = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  = |        Inner Destination MAC Address (first four bytes)       |
>  = +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  = |   Inner Dest MAC Adr          |  Ingress RBridge Nickname     |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |           Inner Source MAC Address (first four bytes)         |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |   Rest of Inner Src MAC Adr   |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> End "TRILL Tag":                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>                                    |       Payload Ethertype       |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |   Payload ...
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>  Final Checksum:
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>    |                       Frame Check Sum                         |
>    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> where
>    V = 0 for TRILL IS-IS frames
>    V = 1 for TRILL data pruned only on VLAN
>    V = 2 for TRILL data pruned on VLAN and IP multicast MAC address
>    V = 3 for TRILL data pruned on VLAN and IP multicast router location
> 
> This puts everything that is needed for a transit Rbridge to forward
> frames in the first 128 bits of what I have labeled the TRILL Tag (see
> "=" in left margin). Why did I label it "Tag" rather than "Header"?
> Well, the header paradigm seems to connote that we are adding a header
> in front of an almost complete Ethernet frame. Tag seems like a better
> word if we are constructing a block that goes in front of a frame's
> contents, not a frame. In some ways, it just depends on how you look at
> it.
> 
> For multicast optimization, the investigation into the frame as to
> whether the distribution tree should be (1) only VLAN pruned, (2) pruned
> on VLAN and based on multicast destination address, or (3) pruned on
> VLAN and branches that have multicast routers, needs to be made only
> once, at the ingress Rbridge, and is then encoded into V.
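
One way to picture the "classify once at ingress" idea is the sketch below. The V values are the ones listed in the proposal; the enum member names, the three boolean inputs, and the decision order are assumptions for illustration. The deep inspection that would produce those booleans is not shown.

```python
from enum import IntEnum


class TrillV(IntEnum):
    ISIS = 0                    # TRILL IS-IS frame
    PRUNE_VLAN = 1              # data, pruned only on VLAN
    PRUNE_VLAN_AND_GROUP = 2    # data, pruned on VLAN and IP multicast MAC
    PRUNE_VLAN_AND_ROUTERS = 3  # data, pruned on VLAN and router location


def classify_at_ingress(
    is_ip_multicast: bool,
    is_special_group: bool,
    is_membership_msg: bool,
) -> TrillV:
    """Pick V for a multi-destination data frame at the ingress RBridge."""
    if is_membership_msg:
        # IGMP/MLD messages go only toward IP multicast routers.
        return TrillV.PRUNE_VLAN_AND_ROUTERS
    if is_ip_multicast and not is_special_group:
        return TrillV.PRUNE_VLAN_AND_GROUP
    # Broadcast, non-IP multicast, and special groups: VLAN pruning only.
    return TrillV.PRUNE_VLAN
```

A transit RBridge would then branch on V alone instead of repeating the inspection of the inner frame.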
> 
> The current header is 64 bits and 64-bit aligned (starting after the
> Outer VLAN tag) which is nice, but you always have to look beyond it for
> every frame. Sometimes way beyond it, including skipping over any TRILL
> options and IP options, to look into the content of IP packets.
> 
> The suggested TRILL tag is 176 bits long but, unless you are an egress
> Rbridge, you only have to look at the first 128 bits which is 128-bit
> aligned (ignoring the outer VLAN tag like I did for the current header).
> But since fields are mostly just being moved around, at worst you gain
> or lose two bytes total length over previous proposals. Options would
> start after the destination MAC address so, even if options are present,
> you can check the first few bits of the options area without going
> outside this 128-bit area.
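
The size arithmetic and the layout of the first 128 bits of the proposed TRILL Tag can be checked with a short sketch. Field widths (V=4, M=1, Op-Length=5, Hop Count=6) are read off the proposed diagram; the names are hypothetical, and the split of the Inner VLAN Tag information into priority and VLAN ID follows the usual 802.1Q layout rather than anything stated in the message.

```python
import struct
from typing import NamedTuple

# Ethertype, flags word, egress nickname, inner VLAN tag information,
# inner destination MAC, ingress nickname: 16 bytes = 128 bits.
TAG_PREFIX = struct.Struct("!HHHH6sH")
INNER_SRC_MAC_LEN = 6  # the remaining 48 bits of the 176-bit tag


class TrillTagPrefix(NamedTuple):
    ethertype: int
    v: int
    multi_dest: bool
    op_length: int
    hop_count: int
    egress_nickname: int
    inner_priority: int
    inner_vlan_id: int
    inner_dest_mac: bytes
    ingress_nickname: int


def parse_tag_prefix(data: bytes) -> TrillTagPrefix:
    """Parse the 128 bits a transit RBridge would need under the proposal."""
    if len(data) < TAG_PREFIX.size:
        raise ValueError("need at least 16 bytes")
    ethertype, flags, egress, tci, dmac, ingress = TAG_PREFIX.unpack_from(data)
    return TrillTagPrefix(
        ethertype=ethertype,
        v=(flags >> 12) & 0xF,
        multi_dest=bool((flags >> 11) & 0x1),
        op_length=(flags >> 6) & 0x1F,
        hop_count=flags & 0x3F,
        egress_nickname=egress,
        inner_priority=tci >> 13,
        inner_vlan_id=tci & 0x0FFF,
        inner_dest_mac=dmac,
        ingress_nickname=ingress,
    )
```

Here TAG_PREFIX.size * 8 is 128, and adding the 48-bit Inner Source MAC Address gives the 176 bits quoted above.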
> 
> This proposal does suppress the Ethertype for the inner VLAN Tag
> information but there is some precedent for that sort of thing. For
> example, 802.16 has a header suppression feature where you can set some
> flag bits and then leave out Q-tag and/or IPv4 or IPv6 Ethertypes.
> 
> So, while various minor changes could be made, the above is my
> suggestion for significantly improving the simplicity and cut through
> switching latency of handling TRILL data frames at transit Rbridges.
> 
> Since there was a consensus determination in favor of the current TRILL
> Header, there needs to be a consensus to re-open the question. If there
> isn't, my fall back position would be to suggest expanding the values of
> V in the TRILL Header to encode the correct pruning strategy for a
> frame, as listed with the TRILL Tag proposal above. This would not
> eliminate the need to delve into the inner frame whenever a TRILL frame
> is handled by a transit Rbridge, but would reduce the depth and
> complexity of such delving...
> 
> Thanks,
> Donald
> 
> 
> -----Original Message-----
> From: rbridge-bounces at postel.org [mailto:rbridge-bounces at postel.org] On
> Behalf Of Anoop Ghanwani
> Sent: Friday, June 22, 2007 6:44 PM
> To: Silvano Gai; Radia Perlman
> Cc: rbridge at postel.org
> Subject: Re: [rbridge] per-VLAN instances of IS-IS
> 
> 
> I don't see much of an advantage to changing things because
> it doesn't change the fact that rBridges rely on the
> "inner frame" to get correct pruning behavior for multicasts.
> 
> However, I agree with Silvano.  _If_ something is going
> to change, let's try and settle it as soon as we possibly can.
> 
> Anoop
> 
> > -----Original Message-----
> > From: Silvano Gai [mailto:sgai at nuovasystems.com]
> > Sent: Thursday, June 21, 2007 7:09 AM
> > To: Radia Perlman; Anoop Ghanwani
> > Cc: Caitlin Bestler; Eric Gray (LO/EUS); rbridge at postel.org
> > Subject: RE: [rbridge] per-VLAN instances of IS-IS
> >
> > Radia,
> >
> > This information was known at the time of the consensus.
> > I still remain in favor of the header we selected.
> > If somebody wants to change his/her mind they can email this
> > list (SOON).
> >
> >
> > -- Silvano
> >
> > > -----Original Message-----
> > > From: Radia Perlman [mailto:Radia.Perlman at sun.com]
> > > Sent: Wednesday, June 20, 2007 12:57 PM
> > > To: Anoop Ghanwani
> > > Cc: Caitlin Bestler; Eric Gray (LO/EUS); Silvano Gai;
> > rbridge at postel.org
> > > Subject: Re: [rbridge] per-VLAN instances of IS-IS
> > >
> > > As Anoop said, core RBridges need to look inside the
> > tunneled packet
> > > at the inner packet in order to do multicast pruning as
> > well as VLAN
> > > pruning on multicast packets.
> > >
> > > Which is why Don was proposing moving both the VLAN tag and the
> > > destination multicast address to the TRILL header. (and
> > removing the
> > > VLAN tag and DA from the inner packet so that it wouldn't
> > mean adding
> > > an extra 6 bytes to an encapsulated frame).
> > >
> > > Seemed like people didn't like doing that, but I want to
> > make sure if
> > > the WG really is deciding not to do that, that they really
> > understand
> > > what they are deciding against.
> > > Often in the email threads so many different things are kind of
> > > discussed at the same time that it's obvious that people
> > (definitely
> > > including me) are getting confused.
> > >
> > > Radia
> > >
> > >
> > >
> > >
> > > Anoop Ghanwani wrote:
> > > >
> > > > They're snooping because they too care about pruning
> > their trees for
> > > > a given multicast group and that would be one of the ways
> > to achieve
> > > > it.
> > > > Otherwise, all multicasts would have to be broadcast on
> > the spanning
> > > > tree interconnecting the bridged network that sits
> > between the two
> > > > rBridges.
> > > >
> > > >
> > > > If we didn't care about multicast pruning, we wouldn't
> > need to worry
> > > > about any of this.
> 
> _______________________________________________
> rbridge mailing list
> rbridge at postel.org
> http://mailman.postel.org/mailman/listinfo/rbridge


