From james.d.carlson at Sun.COM Mon Jun 1 04:49:47 2009
From: james.d.carlson at Sun.COM (James Carlson)
Date: Mon, 1 Jun 2009 07:49:47 -0400
Subject: [rbridge] Hop Count processing
In-Reply-To: <4A2075E1.7050801@isi.edu>
References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com>
    <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com>
    <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com>
    <4A205283.80607@sun.com>
    <18976.22789.315890.979662@gargle.gargle.HOWL>
    <4A2075E1.7050801@isi.edu>
Message-ID: <18979.49243.746839.801203@gargle.gargle.HOWL>

Joe Touch writes:
> > Much better. The "forwarding" part is the qualifier most likely to
> > confuse, and it doesn't actually change the meaning.
>
> First, anyone who doesn't know what forwarding means is going to have a
> lot of problems implementing an rbridge.

There are at least two possible senses of "forwarding" here. One
would be examining the TRILL header's destination nickname, finding
that it's not equal to the local system's nickname, and sending the
message on its merry way. Another would be decapsulating the TRILL
header and then _forwarding_ the packet inside to the bridge links.

That's why I said it's likely to confuse.

And it doesn't change the meaning with respect to what Dinesh, Radia,
and I (among others) have been suggesting for TRILL. That would be:

  - Check the Hop Count value on input, and drop the packet if
    0, *regardless* of the destination nickname.

  - When you find that the destination nickname is not your own,
    and that (due to IS-IS) you have a next hop and link for the
    destination nickname, decrement the Hop Count and forward
    the packet along without checking the new Hop Count value.

> Third, it not only changes the meaning, it changes the behavior. See the
> walkthrough in my previous mail.

The change is intentional. The existing Hop Count behavior (as
pointed out by Dinesh) would not work if we wanted to trace the path
of packets through the network.
Altering it as suggested does support that goal. I agree that it's not at all like IPv4. "Normal" hop count / TTL processing involves ignoring the value completely at input time, and checking it only when attempting to forward -- after determining that the packet isn't intended for the local system. -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From touch at ISI.EDU Mon Jun 1 07:03:45 2009 From: touch at ISI.EDU (Joe Touch) Date: Mon, 01 Jun 2009 07:03:45 -0700 Subject: [rbridge] Hop Count processing In-Reply-To: <18979.49243.746839.801203@gargle.gargle.HOWL> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> Message-ID: <4A23DFC1.9070302@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 James Carlson wrote: > Joe Touch writes: >>> Much better. The "forwarding" part is the qualifier most likely to >>> confuse, and it doesn't actually change the meaning. >> First, anyone who doesn't know what forwarding means is going to have a >> lot of problems implementing an rbridge. > > There are at least two possible senses of "forwarding" here. One > would be examining the TRILL header's destination nickname, finding > that it's not equal to the local system's nickname, and sending the > message on its merry way. Another would be decapsulating the TRILL > header and then _forwarding_ the packet inside to the bridge links. That's solved easily by referring to the former as forwarding the TRILL packet. 
Further, bridges don't forward segments; they switch them (i.e., forwarding is usually a L3 term), so the final step would be a "decapsulate and switch" operation, but as I noted, that is solved above. > That's why I said it's likely to confuse. > > And it doesn't change the meaning with respect to what Dinesh, Radia, > and I (among others) have been suggesting for TRILL. That would be: > > - Check the Hop Count value on input, and drop the packet if > 0, *regardless* of the destination nickname. > > - When you find that the destination nickname is not your own, > and that (due to IS-IS) you have a next hop and link for the > destination nickname, decrement the Hop Count and forward > the packet along without checking the new Hop Count value. By removing 'forwarding' from the language I proposed, Radia removed behavior that happens only at TRILL forwarding steps - which is what you've just described. >> Third, it not only changes the meaning, it changes the behavior. See the >> walkthrough in my previous mail. > > The change is intentional. Well, you said (and I disagreed) that: " The "forwarding" part is the qualifier most likely to confuse, and it doesn't actually change the meaning." My point is that it DOES change the meaning. It is clearly important to include. > The existing Hop Count behavior (as > pointed out by Dinesh) would not work if we wanted to trace the path > of packets through the network. Please explain that, and use an example that shows hopcounts for ingress-egress paths that include 0, 1, and 2 rbridges (as I have > I agree that it's not at all like IPv4. "Normal" hop count / TTL > processing involves ignoring the value completely at input time, and > checking it only when attempting to forward -- after determining that > the packet isn't intended for the local system. That second step is required anyway. Why is it important to consider hopcount for packets that have already been received? 
That seems to say "OK, we limit the hops in a network, but drop you if
your hopcount is wrong even if you've made it to the destination".

Why is that useful for traceroute?

(I'm guessing that one of you thinks that we can get away without a
"ping" message, i.e., that we can trace regular packets. That won't
work, however, if the egress rbridge has more than one egress address,
as I discussed.)

Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoj38EACgkQE5f5cImnZrueHgCguf2IjMMCi4Y59uUCGBcMg019
/LcAnRDVXX0R+qTqNb1g25qahbRhMYq1
=jgI5
-----END PGP SIGNATURE-----

From ddutt at cisco.com Mon Jun 1 08:02:36 2009
From: ddutt at cisco.com (Dinesh G Dutt)
Date: Mon, 01 Jun 2009 08:02:36 -0700
Subject: [rbridge] Hop Count processing
In-Reply-To: <4A23DFC1.9070302@isi.edu>
References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com>
    <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com>
    <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com>
    <4A205283.80607@sun.com>
    <18976.22789.315890.979662@gargle.gargle.HOWL>
    <4A2075E1.7050801@isi.edu>
    <18979.49243.746839.801203@gargle.gargle.HOWL>
    <4A23DFC1.9070302@isi.edu>
Message-ID: <4A23ED8C.7050700@cisco.com>

Using regular frames was my intention. Since end stations are not
involved, I can test an actual customer frame and thereby know the exact
path for a frame.

Joe Touch wrote:
> Why is that useful for traceroute?
>
> (I'm guessing that one of you thinks that we can get away without a
> "ping" message, i.e., that we can trace regular packets. That won't
> work, however, if the egress rbridge has more than one egress address,
> as I discussed.)
>
>
Do you mean for multi-dst addresses?

Dinesh
--
We make our world significant by the courage of our questions and by the
depth of our answers.
- Carl Sagan From touch at ISI.EDU Mon Jun 1 08:30:01 2009 From: touch at ISI.EDU (Joe Touch) Date: Mon, 01 Jun 2009 08:30:01 -0700 Subject: [rbridge] Hop Count processing In-Reply-To: <4A23ED8C.7050700@cisco.com> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> <4A23DFC1.9070302@isi.edu> <4A23ED8C.7050700@cisco.com> Message-ID: <4A23F3F9.4060902@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dinesh G Dutt wrote: > Using regular frames was my intention. Since end stations are not > invlved, I can test actual customer frame and thereby know the exact > path for a frame. That won't work as per below. > Joe Touch wrote: >> Why is that useful for traceroute? >> >> (I'm guessing that one of you thinks that we can get away without a >> "ping" message, i.e., that we can trace regular packets. That won't >> work, however, if the egress rbridge has more than one egress address, >> as I discussed.) >> >> > Do you mean for multi-dst addresses ? I believe I can have an rbridge that sits on two different ethernet segments, i.e., that has two different egress addresses. When I get something with hopcount=0, I respond back with an "error hopcount exceeded". I need to pick one of the egress tags to use as the message source address. I'm presuming I pick the "canonical address" of that rbridge, for which there would be only one. Consider a frame sent to the non-canonical egress. You would receive back a sequence of error messages: rbridge1 rbridge2 rbridge3 ... rbridge_last Now one of two things has happened. Either the last rbridge was the correct one that decapsulated the packet, or it wasn't. The received source address of the error message will not match in both cases. 
So how do you know you reached the last hop? Why is it even important to be able to do a traceroute with a regular packet? Why not just require something like PING, i.e., a TRILL message that, once received at the destination address in the TRILL header, responds with a "success" response? Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkoj8/kACgkQE5f5cImnZrtJxACfWRpCTqUO4X6z8o9AW/i07NoC UK4An1CMJT3BYM9F2/cFrjF7bhZDFbuH =efIl -----END PGP SIGNATURE----- From ddutt at cisco.com Mon Jun 1 08:46:41 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Mon, 01 Jun 2009 08:46:41 -0700 Subject: [rbridge] Hop Count processing In-Reply-To: <4A23F3F9.4060902@isi.edu> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> <4A23DFC1.9070302@isi.edu> <4A23ED8C.7050700@cisco.com> <4A23F3F9.4060902@isi.edu> Message-ID: <4A23F7E1.808@cisco.com> When I send a unicast traceroute, I put in an address of the egress Rbridge I expect is the final destination (known by looking at my local MAC table, for example). Then, why do I have a problem ? I guess I don't understand what you're saying below, Dinesh Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > Dinesh G Dutt wrote: > >> Using regular frames was my intention. Since end stations are not >> invlved, I can test actual customer frame and thereby know the exact >> path for a frame. >> > > That won't work as per below. > > >> Joe Touch wrote: >> >>> Why is that useful for traceroute? >>> >>> (I'm guessing that one of you thinks that we can get away without a >>> "ping" message, i.e., that we can trace regular packets. 
That won't >>> work, however, if the egress rbridge has more than one egress address, >>> as I discussed.) >>> >>> >>> >> Do you mean for multi-dst addresses ? >> > > I believe I can have an rbridge that sits on two different ethernet > segments, i.e., that has two different egress addresses. > > When I get something with hopcount=0, I respond back with an "error > hopcount exceeded". I need to pick one of the egress tags to use as the > message source address. I'm presuming I pick the "canonical address" of > that rbridge, for which there would be only one. > > Consider a frame sent to the non-canonical egress. You would receive > back a sequence of error messages: > > rbridge1 > rbridge2 > rbridge3 > ... > rbridge_last > > Now one of two things has happened. Either the last rbridge was the > correct one that decapsulated the packet, or it wasn't. The received > source address of the error message will not match in both cases. > > So how do you know you reached the last hop? > > Why is it even important to be able to do a traceroute with a regular > packet? Why not just require something like PING, i.e., a TRILL message > that, once received at the destination address in the TRILL header, > responds with a "success" response? > > Joe > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkoj8/kACgkQE5f5cImnZrtJxACfWRpCTqUO4X6z8o9AW/i07NoC > UK4An1CMJT3BYM9F2/cFrjF7bhZDFbuH > =efIl > -----END PGP SIGNATURE----- > > -- We make our world significant by the courage of our questions and by the depth of our answers. 
- Carl Sagan

From james.d.carlson at Sun.COM Mon Jun 1 08:50:40 2009
From: james.d.carlson at Sun.COM (James Carlson)
Date: Mon, 1 Jun 2009 11:50:40 -0400
Subject: [rbridge] Hop Count processing
In-Reply-To: <4A23F3F9.4060902@isi.edu>
References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com>
    <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com>
    <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com>
    <4A205283.80607@sun.com>
    <18976.22789.315890.979662@gargle.gargle.HOWL>
    <4A2075E1.7050801@isi.edu>
    <18979.49243.746839.801203@gargle.gargle.HOWL>
    <4A23DFC1.9070302@isi.edu> <4A23ED8C.7050700@cisco.com>
    <4A23F3F9.4060902@isi.edu>
Message-ID: <18979.63696.516353.386890@gargle.gargle.HOWL>

Joe Touch writes:
> When I get something with hopcount=0, I respond back with an "error
> hopcount exceeded". I need to pick one of the egress tags to use as the
> message source address. I'm presuming I pick the "canonical address" of
> that rbridge, for which there would be only one.

You'd pick the MAC address of the interface over which you received
the frame.

> So how do you know you reached the last hop?

Because frames with higher Hop Count never return.

> Why is it even important to be able to do a traceroute with a regular
> packet? Why not just require something like PING, i.e., a TRILL message
> that, once received at the destination address in the TRILL header,
> responds with a "success" response?

That's a good question. I have exactly the same concern -- basically,
that we're designing the 'traceroute' infrastructure before we know
anything about how the functionality will be implemented -- but Dinesh
convinced me that it doesn't matter.

If we do this as Dinesh has argued, then we'll always have the drop
information from that last hop. If we don't, then we won't. That
means that the method proposed provides additional information without
significant expense.
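The probing behaviour described above (a probe that runs out of Hop Count at a node draws a reply from that node, while a probe with a large enough Hop Count is delivered and nothing returns) can be sketched as a small simulation. This is an illustration only: no traceroute-for-TRILL mechanism is defined in this thread, and the names `Probe`, `receive`, `trace`, and the `next_hop` table are hypothetical. The sketch also assumes that replies are never lost.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Probe:
    egress: str      # egress RBridge nickname carried in the TRILL header
    hop_count: int   # Hop Count as seen by the receiving node


def receive(node: str, probe: Probe, next_hop: Dict[str, str]) -> Tuple[str, str]:
    """Apply the proposed input-side rule at `node`.

    `next_hop` is a hypothetical table mapping each transit node to the
    next node toward the single egress used in this sketch.
    """
    if probe.hop_count == 0:
        # Checked on input, regardless of the destination nickname.
        return ("dropped", node)
    if node == probe.egress:
        # Reached the egress with a non-zero count: decapsulate, no reply.
        return ("delivered", node)
    # Transit: decrement and forward without checking the new value.
    onward = Probe(probe.egress, probe.hop_count - 1)
    return receive(next_hop[node], onward, next_hop)


def trace(first_hop: str, egress: str, next_hop: Dict[str, str],
          max_hops: int = 16) -> List[str]:
    """Send probes with Hop Count 0, 1, 2, ... and collect who reports a drop."""
    path: List[str] = []
    for hop_count in range(max_hops):
        outcome, node = receive(first_hop, Probe(egress, hop_count), next_hop)
        if outcome == "delivered":
            break  # nothing comes back; the tracer only sees silence
        path.append(node)
    return path


print(trace("E", "E", {}))          # ingress adjacent to egress E
print(trace("T", "E", {"T": "E"}))  # one transit RBridge T before E
```

With the ingress adjacent to the egress, the only reporter is `E`; with one transit node the reporters are `T` and then `E`. In both cases the egress shows up as the last reporter because the Hop Count is tested before the destination nickname is examined.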
I agree that a decent 'traceroute' feature should use a special packet that allows us to send back "and here's what I'd do with that once I've decapsulated" information from the last (destination) TRILL node. -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From touch at ISI.EDU Mon Jun 1 08:55:22 2009 From: touch at ISI.EDU (Joe Touch) Date: Mon, 01 Jun 2009 08:55:22 -0700 Subject: [rbridge] Hop Count processing In-Reply-To: <4A23F7E1.808@cisco.com> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> <4A23DFC1.9070302@isi.edu> <4A23ED8C.7050700@cisco.com> <4A23F3F9.4060902@isi.edu> <4A23F7E1.808@cisco.com> Message-ID: <4A23F9EA.5080107@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dinesh G Dutt wrote: > When I send a unicast traceroute, I put in an address of the egress > Rbridge I expect is the final destination (known by looking at my local > MAC table, for example). Then, why do I have a problem ? I guess I don't > understand what you're saying below, I said it below fairly directly: 1) let there be an rbridge, Y, with two egress addresses C and D let there be an rbridge, X, with ingress address A Note: C and D are different so that Y can be attached to two different segments; packets decapsulated with C go onto one segment, packets decapsulated with D go to the other. 2) send a packet through an rbridge encapsulated with the TRILL header "A->C" (A is ingress; C is egress) Send the packet through with a hopcount such that it reaches Y with hopcount=0. At that point, Y will send a "hopcount exceeded" packet back to X. What is the source address of that packet? 
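The scenario in steps 1) and 2) can be made concrete with a few lines of Python. This is a sketch only: the `canonical` policy is the presumption stated earlier in the thread, `echo_probed` is a hypothetical alternative included purely for contrast, and the choice of D as Y's canonical address is an assumption made for the example.

```python
from typing import Set


def reply_source(policy: str, probed_egress: str, canonical: str,
                 owned: Set[str]) -> str:
    """Source address Y places in its "hopcount exceeded" message."""
    if policy == "echo_probed" and probed_egress in owned:
        # Hypothetical: answer from the address the probe was sent to.
        return probed_egress
    # Presumed default: always answer from the single canonical address.
    return canonical


def reply_matches_probe(source: str, probed_egress: str) -> bool:
    """Naive test at X: did the reply come from the address being traced?"""
    return source == probed_egress


Y_ADDRESSES = {"C", "D"}   # Y's two egress addresses from step 1
Y_CANONICAL = "D"          # assumption: D is canonical, so C is not

for policy in ("canonical", "echo_probed"):
    src = reply_source(policy, "C", Y_CANONICAL, Y_ADDRESSES)
    print(policy, src, reply_matches_probe(src, "C"))
```

Under the `canonical` policy, X probes C but hears back from D, so a comparison of addresses alone does not tell X that the replying rbridge is the intended egress.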
- ----

Note: I'm very confused about the purpose of wanting to see the path of
a "user packet". TRILL doesn't examine anything but the TRILL headers
anyway. Users can't control the TRILL hopcount by the segments they
send. Traceroute works only if the segment is resent with increasing
hopcounts anyway - which users can't do.

Joe

> Joe Touch wrote:
>
>
> Dinesh G Dutt wrote:
>
>>>> Using regular frames was my intention. Since end stations are not
>>>> involved, I can test actual customer frame and thereby know the exact
>>>> path for a frame.
>>>>
>
> That won't work as per below.
>
>
>>>> Joe Touch wrote:
>>>>
>>>>> Why is that useful for traceroute?
>>>>>
>>>>> (I'm guessing that one of you thinks that we can get away without a
>>>>> "ping" message, i.e., that we can trace regular packets. That won't
>>>>> work, however, if the egress rbridge has more than one egress address,
>>>>> as I discussed.)
>>>>>
>>>>>
>>>> Do you mean for multi-dst addresses ?
>>>>
>
> I believe I can have an rbridge that sits on two different ethernet
> segments, i.e., that has two different egress addresses.
>
> When I get something with hopcount=0, I respond back with an "error
> hopcount exceeded". I need to pick one of the egress tags to use as the
> message source address. I'm presuming I pick the "canonical address" of
> that rbridge, for which there would be only one.
>
> Consider a frame sent to the non-canonical egress. You would receive
> back a sequence of error messages:
>
>     rbridge1
>     rbridge2
>     rbridge3
>     ...
>     rbridge_last
>
> Now one of two things has happened. Either the last rbridge was the
> correct one that decapsulated the packet, or it wasn't. The received
> source address of the error message will not match in both cases.
>
> So how do you know you reached the last hop?
>
> Why is it even important to be able to do a traceroute with a regular
> packet?
Why not just require something like PING, i.e., a TRILL message > that, once received at the destination address in the TRILL header, > responds with a "success" response? > > Joe >> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkoj+eoACgkQE5f5cImnZrsIiQCeJbXrqJVlwM8rkat5CF42Itcd kr4An1d6owEJCN0Ks4QgHNNJxSmaeTGx =Oucu -----END PGP SIGNATURE----- From james.d.carlson at Sun.COM Mon Jun 1 08:56:19 2009 From: james.d.carlson at Sun.COM (James Carlson) Date: Mon, 1 Jun 2009 11:56:19 -0400 Subject: [rbridge] Hop Count processing In-Reply-To: <4A23DFC1.9070302@isi.edu> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> <4A23DFC1.9070302@isi.edu> Message-ID: <18979.64035.861405.382919@gargle.gargle.HOWL> Joe Touch writes: > That's solved easily by referring to the former as forwarding the TRILL > packet. > > Further, bridges don't forward segments; they switch them (i.e., > forwarding is usually a L3 term), so the final step would be a > "decapsulate and switch" operation, but as I noted, that is solved above. Not per the standards. 802.1D calls this "forwarding" and "switching" sounds like marketing to me ... but I guess I don't care to argue the point. > By removing 'forwarding' from the language I proposed, Radia removed > behavior that happens only at TRILL forwarding steps - which is what > you've just described. Yes; just explaining the idea. > >> Third, it not only changes the meaning, it changes the behavior. See the > >> walkthrough in my previous mail. > > > > The change is intentional. 
> > Well, you said (and I disagreed) that: > > " The "forwarding" part is the qualifier most likely to > confuse, and it doesn't actually change the meaning." > > My point is that it DOES change the meaning. It is clearly important to > include. If you include it, and if it does mean something, then it breaks the proposal that Radia, Dinesh, and others are putting forward. > > The existing Hop Count behavior (as > > pointed out by Dinesh) would not work if we wanted to trace the path > > of packets through the network. > > Please explain that, and use an example that shows hopcounts for > ingress-egress paths that include 0, 1, and 2 rbridges (as I have For ingress sending directly to egress, you'd send with a Hop Count of 0, and the target would detect this as a drop condition on input and reply. Higher hop counts would not return. For ingress sending through one TRILL hop to the egress, you'd send with Hop Count of 0, get a response from the hop node, then send with 1, and get a response from the destination (final) node as above. For higher numbers, prove by induction. > > I agree that it's not at all like IPv4. "Normal" hop count / TTL > > processing involves ignoring the value completely at input time, and > > checking it only when attempting to forward -- after determining that > > the packet isn't intended for the local system. > > That second step is required anyway. Why is it important to consider > hopcount for packets that have already been received? That seems to say > "OK, we limit the hops in a network, but drop you if your hopcount is > wrong even if you've made it to the destination". Yes, that's exactly what Dinesh is proposing. > Why is that useful for traceroute? It allows us to capture a drop at the last hop. (Dinesh insists that it's also how routers have "always" worked, but I have my doubts about that.) > (I'm guessing that one of you thinks that we can get away without a > "ping" message, i.e., that we can trace regular packets. Exactly. 
(Of course, without an actual traceroute-for-TRILL proposal in front of us, it's hard for me to argue that this is in fact how it should work.) > That won't > work, however, if the egress rbridge has more than one egress address, > as I discussed.) I don't follow. A TRILL "traceroute" can't see anything past the TRILL destination node, so who cares how L2 is forwarded^Wswitched past there? -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From touch at ISI.EDU Mon Jun 1 09:00:29 2009 From: touch at ISI.EDU (Joe Touch) Date: Mon, 01 Jun 2009 09:00:29 -0700 Subject: [rbridge] Hop Count processing In-Reply-To: <18979.63696.516353.386890@gargle.gargle.HOWL> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> <4A23DFC1.9070302@isi.edu> <4A23ED8C.7050700@cisco.com> <4A23F3F9.4060902@isi.edu> <18979.63696.516353.386890@gargle.gargle.HOWL> Message-ID: <4A23FB1D.5000907@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 James Carlson wrote: > Joe Touch writes: >> When I get something with hopcount=0, I respond back with an "error >> hopcount exceeded". I need to pick one of the egress tags to use as the >> message source address. I'm presuming I pick the "canonical address" of >> that rbridge, for which there would be only one. > > You'd pick the MAC address of the interface over which you received > the frame. What if there are multiple such addresses (i.e., MAC overloading)? >> So how do you know you reached the last hop? > > Because frames with higher Hop Count never return. 
That could also mean: - the return error message was lost - routing had an incomplete path (never reaches the dest) >> Why is it even important to be able to do a traceroute with a regular >> packet? Why not just require something like PING, i.e., a TRILL message >> that, once received at the destination address in the TRILL header, >> responds with a "success" response? > > That's good question. I have exactly the same concern -- basically, > that we're designing the 'traceroute' infrastructure before we know > anything about how the functionality will be implemented -- but Dinesh > convinced me that it doesn't matter. > > If we do this as Dinesh has argued, then we'll always have the drop > information from that last hop. But you don't know it's the last hop, as per above. > If we don't, then we won't. That > means that the method proposed provides additional information without > significant expense. As in my other message, this doesn't correlate to user packet behavior anyway. We need to keep sending the same packet repeatedly with increasing hopcount, which is a separate function that needs to be implemented. That function has no benefit from using a user packet as payload, since the TRILL header hides that content anyway. > I agree that a decent 'traceroute' feature should use a special packet > that allows us to send back "and here's what I'd do with that once > I've decapsulated" information from the last (destination) TRILL node. That'd be "PING" ('echo request') (i.e., that's how traceroute works in IPv4/IPv6). 
Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoj+x0ACgkQE5f5cImnZruAygCdFyYo9NzptZoMXtYRiP1icVzL
j4EAoJ9t0pyobWvGFsz6xF2wlkI0Oz2J
=e7fW
-----END PGP SIGNATURE-----

From touch at ISI.EDU Mon Jun 1 09:18:09 2009
From: touch at ISI.EDU (Joe Touch)
Date: Mon, 01 Jun 2009 09:18:09 -0700
Subject: [rbridge] Hop Count processing
In-Reply-To: <18979.64035.861405.382919@gargle.gargle.HOWL>
References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com>
    <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com>
    <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com>
    <4A205283.80607@sun.com>
    <18976.22789.315890.979662@gargle.gargle.HOWL>
    <4A2075E1.7050801@isi.edu>
    <18979.49243.746839.801203@gargle.gargle.HOWL>
    <4A23DFC1.9070302@isi.edu>
    <18979.64035.861405.382919@gargle.gargle.HOWL>
Message-ID: <4A23FF41.1000305@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Carlson wrote:
...
>> Well, you said (and I disagreed) that:
>>
>> " The "forwarding" part is the qualifier most likely to
>> confuse, and it doesn't actually change the meaning."
>>
>> My point is that it DOES change the meaning. It is clearly important to
>> include.
>
> If you include it, and if it does mean something, then it breaks the
> proposal that Radia, Dinesh, and others are putting forward.

There are different things here:

1) what hopcount does a packet have that successfully reaches its
destination?

    we can make that 0 (as with IPv4 and IPv6)

    we can make that 1

Making it 1 has the effect of making a traceroute send a "hopcount
exceeded" from the rbridge at the destination, but unfortunately there
are ways in which that message might not be interpreted as "it got
there" (i.e., it could be "this is the last rbridge I saw, but I have no
idea where it got").

2) whether we need to refer to rbridge forwarding in the description of
hopcount processing

    we absolutely do.
I've seen three agreements that claim we don't, but I've demonstrated this is not possible. We need to differentiate how we handle packets that have arrived at their destination (which are decapsulated and their payload is sent out an interface) from packets that are forwarded (which are sent out an interface themselves). I.e., we need to distinguish between receiving and forwarding *TRILL* frames. When we receive a TRILL frame, we forward (if you like that term) its payload, but not the TRILL frame. >>> The existing Hop Count behavior (as >>> pointed out by Dinesh) would not work if we wanted to trace the path >>> of packets through the network. >> Please explain that, and use an example that shows hopcounts for >> ingress-egress paths that include 0, 1, and 2 rbridges (as I have > > For ingress sending directly to egress, you'd send with a Hop Count of > 0, and the target would detect this as a drop condition on input and > reply. Higher hop counts would not return. > > For ingress sending through one TRILL hop to the egress, you'd send > with Hop Count of 0, get a response from the hop node, then send with > 1, and get a response from the destination (final) node as above. Agreed, though as I've shown: a) we still need active participation by the source rbridge (this cannot be initiated by a user's packet) - to set the hopcount - to repeatedly send the same packet b) we still need to differentiate when a packet has reached its destination in how hopcounts are processed. you can't just say "remove the word forwarding" from hopcount processing; it needs to be there > For higher numbers, prove by induction. > >>> I agree that it's not at all like IPv4. "Normal" hop count / TTL >>> processing involves ignoring the value completely at input time, and >>> checking it only when attempting to forward -- after determining that >>> the packet isn't intended for the local system. >> That second step is required anyway. 
Why is it important to consider >> hopcount for packets that have already been received? That seems to say >> "OK, we limit the hops in a network, but drop you if your hopcount is >> wrong even if you've made it to the destination". > > Yes, that's exactly what Dinesh is proposing. > >> Why is that useful for traceroute? > > It allows us to capture a drop at the last hop. > > (Dinesh insists that it's also how routers have "always" worked, but I > have my doubts about that.) That may be how some routers incorrectly implement the requirements for IPv4 or IPv6, but it fails to explain the success of sending TTL=0 between hosts on the same LAN. >> (I'm guessing that one of you thinks that we can get away without a >> "ping" message, i.e., that we can trace regular packets. > > Exactly. > > (Of course, without an actual traceroute-for-TRILL proposal in front > of us, it's hard for me to argue that this is in fact how it should > work.) That'd be very useful. >> That won't >> work, however, if the egress rbridge has more than one egress address, >> as I discussed.) > > I don't follow. A TRILL "traceroute" can't see anything past the > TRILL destination node, so who cares how L2 is forwarded^Wswitched > past there? I'm talking about an rbridge that has more than one egress address, e.g., via overloading on a single interface or with multiple interfaces. The problem is that your "hopcount exceeded" message won't know what address to use, and if it picks one that isn't the packet's destination then you will not know the difference between a packet that arrives and one that hits a black hole. 
Joe
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoj/0EACgkQE5f5cImnZrvdoQCfe4rIJvFOJbyzCBACZtF84CGc
7rQAoPN1wgGwhaTS/ai8q4Au5UeUmcaP
=M7Gf
-----END PGP SIGNATURE-----

From james.d.carlson at Sun.COM Mon Jun 1 10:11:09 2009
From: james.d.carlson at Sun.COM (James Carlson)
Date: Mon, 1 Jun 2009 13:11:09 -0400
Subject: [rbridge] Hop Count processing
In-Reply-To: <4A23FB1D.5000907@isi.edu>
References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com>
    <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com>
    <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com>
    <4A205283.80607@sun.com>
    <18976.22789.315890.979662@gargle.gargle.HOWL>
    <4A2075E1.7050801@isi.edu>
    <18979.49243.746839.801203@gargle.gargle.HOWL>
    <4A23DFC1.9070302@isi.edu>
    <18979.64035.861405.382919@gargle.gargle.HOWL>
    <4A23FF41.1000305@isi.edu> <4A23ED8C.7050700@cisco.com>
    <4A23F3F9.4060902@isi.edu>
    <18979.63696.516353.386890@gargle.gargle.HOWL>
    <4A23FB1D.5000907@isi.edu>
Message-ID: <18980.2989.877068.764784@gargle.gargle.HOWL>

Joe Touch writes:
> > You'd pick the MAC address of the interface over which you received
> > the frame.
>
> What if there are multiple such addresses (i.e., MAC overloading)?

Who cares? How is that any different from having potentially multiple
addresses on a link at _any_ point in the middle of the path with
multiple TRILL hops?

I would argue that -- if this is actually an issue for any real
implementation (I don't know that it is) -- then it's a generic issue.
The TRILL implementation in that system would need to know how to deal
with multiple MAC addresses on each port, and the current document
does not deal with that.

It's not merely an issue with traceroute-like functionality, nor is it
a special issue with the last hop. Thus I don't understand why you're
bringing it up here.

> >> So how do you know you reached the last hop?
> >
> > Because frames with higher Hop Count never return.
> > That could also mean: > > - the return error message was lost > > - routing had an incomplete path (never reaches the dest) Sure. The other way you know that your path is complete is that the system that sends back the last message happens to have the ID of the original TRILL destination. > > If we do this as Dinesh has argued, then we'll always have the drop > > information from that last hop. > > But you don't know it's the last hop, as per above. You can. See above. :-/ > > If we don't, then we won't. That > > means that the method proposed provides additional information without > > significant expense. > > As in my other message, this doesn't correlate to user packet behavior > anyway. We need to keep sending the same packet repeatedly with > increasing hopcount, which is a separate function that needs to be > implemented. That function has no benefit from using a user packet as > payload, since the TRILL header hides that content anyway. Correct; this would need to be initiated from a TRILL node. > > I agree that a decent 'traceroute' feature should use a special packet > > that allows us to send back "and here's what I'd do with that once > > I've decapsulated" information from the last (destination) TRILL node. > > That'd be "PING" ('echo request') (i.e., that's how traceroute works in > IPv4/IPv6). It could be *anything*. We're designing a new protocol. We're not constrained to the hackery used in ping. (E.g., having an undefined payload format that is used by each node in proprietary ways.) Note that _real_ traceroute doesn't use ping, but instead uses UDP datagrams sent to an arbitrary (unused) port. ICMP Echo for traceroute is a Windowism. 
;-} Joe Touch writes: > Making it 1 has the effect of making a traceroute send a "hopcount > exceeded" from the rbridge at the destination, but unfortunately there > are ways in which that message might not be interpreted as "it got > there" (i.e., it could be "this is the last rbridge I saw, but I have no > idea where it got). So far, that seems to be "good enough." Plus, since we have a database of IDs for the RBridges (the nickname database), it's easy to figure out what you're looking at. It's worth noting that IPv4 traceroute has the same ambiguity. Not all nodes bother responding to arbitrary packets, nor do they all allow ICMP errors to get through. The result is that some traceroute attempts just trail off in a bunch of stars. (Try tracerouting to www.sun.com ...) > 2) whether we need to refer to rbridge forwarding in the description of > hopcount processing > > we absolutely do. > > I've seen three agreements that claim we don't, but I've demonstrated > this is not possible. I don't believe you have. > We need to differentiate how we handle packets > that have arrived at their destination (which are decapsulated and their > payload is sent out an interface) from packets that are forwarded (which > are sent out an interface themselves). I.e., we need to distinguish > between receiving and forwarding *TRILL* frames. When we receive a TRILL > frame, we forward (if you like that term) its payload, but not the TRILL > frame. Actually, I don't believe that's necessary. I agree that it'd be nice to have a distinguished message for the "last" hop, and to have important information sent back from all hops. Without a definition of how traceroute is supposed to function in TRILL, we'll need to defer that. > a) we still need active participation by the source rbridge (this cannot > be initiated by a user's packet) Yep; agreed. I don't think anyone was suggesting otherwise. 
> b) we still need to differentiate when a packet has reached its > destination in how hopcounts are processed. you can't just say "remove > the word forwarding" from hopcount processing; it needs to be there Actually, it works without it. > > (Dinesh insists that it's also how routers have "always" worked, but I > > have my doubts about that.) > > That may be how some routers incorrectly implement the requirements for > IPv4 or IPv6, but it fails to explain the success of sending TTL=0 > between hosts on the same LAN. At least for the proposal under discussion, that particular router bug is immaterial. > > I don't follow. A TRILL "traceroute" can't see anything past the > > TRILL destination node, so who cares how L2 is forwarded^Wswitched > > past there? > > I'm talking about an rbridge that has more than one egress address, > e.g., via overloading on a single interface or with multiple interfaces. > The problem is that your "hopcount exceeded" message won't know what > address to use, and if it picks one that isn't the packet's destination > then you will not know the difference between a packet that arrives and > one that hits a black hole. It must have had some way of receiving the message in the first place. If it's unicast, then the original destination on the 'trace' message should be the source for the reply. If it's multicast (a frightening concept), then the source should be whatever source MAC address is usually used for TRILL messages from that node when sending over the given link on which the original message was received. Note, of course, that we're arguing about the number of angels on the head of a pin at this point, as the traceroute protocol itself *HAS NOT BEEN PROPOSED*. How it might choose MAC addresses to use seems quite well off-topic and doesn't particularly help illuminate the issue. 
--
James Carlson, Solaris Networking
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

From touch at ISI.EDU Mon Jun 1 11:08:31 2009
From: touch at ISI.EDU (Joe Touch)
Date: Mon, 01 Jun 2009 11:08:31 -0700
Subject: [rbridge] Hop Count processing
In-Reply-To: <18980.2989.877068.764784@gargle.gargle.HOWL>
References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com>
    <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com>
    <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com>
    <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL>
    <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL>
    <4A23DFC1.9070302@isi.edu> <18979.64035.861405.382919@gargle.gargle.HOWL>
    <4A23FF41.1000305@isi.edu> <4A23ED8C.7050700@cisco.com>
    <4A23F3F9.4060902@isi.edu> <18979.63696.516353.386890@gargle.gargle.HOWL>
    <4A23FB1D.5000907@isi.edu> <18980.2989.877068.764784@gargle.gargle.HOWL>
Message-ID: <4A24191F.70501@isi.edu>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Carlson wrote:
> Joe Touch writes:
>>> You'd pick the MAC address of the interface over which you received
>>> the frame.
>> What if there are multiple such addresses (i.e., MAC overloading)?
>
> Who cares?
>
> How is that any different from having potentially multiple addresses
> on a link at _any_ point in the middle of the path with multiple TRILL
> hops?

The difference is that you'll get a set of responses back, where you
send a packet from X->Y and you see:

    P
    Q
    R
    S
    T

And that's it. At that point, you have no idea if T is the same node as
Y, or whether T is just a dead end in the routing (a black hole, e.g.).

If your point is to know you reached the end, you basically need:

    failed - hopcount exceeded
    failed - hopcount exceeded
    succeeded

Another "failed - hopcount exceeded" won't cut it.
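A minimal sketch of the distinction drawn above, assuming two hypothetical reply kinds (no TRILL trace message has actually been defined, so the names and the `classify_trace` helper are illustrative only). It shows why a run of errors alone leaves the end of the path undetermined, while a distinct success reply settles it:

```python
from typing import List, Optional, Tuple

# Hypothetical reply kinds; no TRILL trace message has been defined.
HOPCOUNT_EXCEEDED = "failed - hopcount exceeded"
SUCCEEDED = "succeeded"

# (responder label, reply kind), or None when nothing came back.
Reply = Optional[Tuple[str, str]]


def classify_trace(replies: List[Reply]) -> str:
    """Summarize a trace where replies[i] answers the probe sent with hop count i + 1."""
    for reply in replies:
        if reply is not None and reply[1] == SUCCEEDED:
            return "reached destination at " + reply[0]
    answered = [r for r in replies if r is not None]
    if not answered:
        return "no replies"
    # Only errors came back: the last responder may be the destination
    # or a dead end, and the error by itself cannot tell the two apart.
    return "ambiguous after " + answered[-1][0]
```

With errors from P, Q, R, S, T followed by silence, the sketch reports "ambiguous after T"; replacing the final error with a success reply makes the outcome definite.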
> I would argue that -- if this is actually an issue for any real
> implementation (I don't know that it is) -- then it's a generic
> issue. The TRILL implementation in that system would need to know how
> to deal with multiple MAC addresses on each port, and the current
> document does not deal with that.

You only need to know whether the multiple MAC addresses are the same
rbridge when you are doing this sort of thing; for pings or anything
direct that you get "positive feedback" on, you have no problem.

> It's not merely an issue with traceroute-like functionality, nor is a
> special issue with the last hop. Thus I don't understand why you're
> bringing it up here.

Because it *is* both an issue with only this sort of traceroute solution
and on the last hop, as noted above.

>>>> So how do you know you reached the last hop?
>>> Because frames with higher Hop Count never return.
>> That could also mean:
>>
>> - the return error message was lost
>>
>> - routing had an incomplete path (never reaches the dest)
>
> Sure. The other way you know that your path is complete is that the
> system that sends back the last message happens to have the ID of the
> original TRILL destination.

See above; you could easily get back an ID that is for the last hop, but
NOT the original destination.

>>> If we don't, then we won't. That
>>> means that the method proposed provides additional information without
>>> significant expense.
>> As in my other message, this doesn't correlate to user packet behavior
>> anyway. We need to keep sending the same packet repeatedly with
>> increasing hopcount, which is a separate function that needs to be
>> implemented. That function has no benefit from using a user packet as
>> payload, since the TRILL header hides that content anyway.
>
> Correct; this would need to be initiated from a TRILL node.
> >>> I agree that a decent 'traceroute' feature should use a special packet >>> that allows us to send back "and here's what I'd do with that once >>> I've decapsulated" information from the last (destination) TRILL node. >> That'd be "PING" ('echo request') (i.e., that's how traceroute works in >> IPv4/IPv6). > > It could be *anything*. We're designing a new protocol. We're not > constrained to the hackery used in ping. (E.g., having an undefined > payload format that is used by each node in proprietary ways.) Ping is a useful protocol regardless - it tells you the other end is up. We probably can use that for diagnostics anyway. The "hack" here, if anything, is trying to infer a packet has arrived when you get an error message from the place where the TTL goes to zero, rather than by getting a "success" message. > Note that _real_ traceroute doesn't use ping, but instead uses UDP > datagrams sent to an arbitrary (unused) port. The first version of traceroute used PING. It was changed because PING isn't required. We don't have ports, but we can require the PING be implemented. > ICMP Echo for > traceroute is a Windowism. ;-} See above. Windows uses the original definition. Using ICMP pings remains an option in most traceroute implementations. > Joe Touch writes: >> Making it 1 has the effect of making a traceroute send a "hopcount >> exceeded" from the rbridge at the destination, but unfortunately there >> are ways in which that message might not be interpreted as "it got >> there" (i.e., it could be "this is the last rbridge I saw, but I have no >> idea where it got). > > So far, that seems to be "good enough." Plus, since we have a > database of IDs for the RBridges (the nickname database), it's easy to > figure out what you're looking at. > > It's worth noting that IPv4 traceroute has the same ambiguity. Not > all nodes bother responding to arbitrary packets, nor do they all > allow ICMP errors to get through. 
The result is that some traceroute > attempts just trail off in a bunch of stars. (Try tracerouting to > www.sun.com ...) This matters only for the destination; at the destination, you get a different message - THAT is what makes it work, and that's not what we're doing yet. >> 2) whether we need to refer to rbridge forwarding in the description of >> hopcount processing >> >> we absolutely do. >> >> I've seen three agreements that claim we don't, but I've demonstrated >> this is not possible. > > I don't believe you have. The definition you gave had it. I showed how all other definitions proposed failed to give sensible results. What other sort of proof do you need? Or do you have another description of the processing that does not refer to forwarding that you would like checked? >> We need to differentiate how we handle packets >> that have arrived at their destination (which are decapsulated and their >> payload is sent out an interface) from packets that are forwarded (which >> are sent out an interface themselves). I.e., we need to distinguish >> between receiving and forwarding *TRILL* frames. When we receive a TRILL >> frame, we forward (if you like that term) its payload, but not the TRILL >> frame. > > Actually, I don't believe that's necessary. > > I agree that it'd be nice to have a distinguished message for the > "last" hop, and to have important information sent back from all > hops. Without a definition of how traceroute is supposed to function > in TRILL, we'll need to defer that. If you aren't going to define traceroute, then how can you argue what we need in terms of hopcount to support it? >> a) we still need active participation by the source rbridge (this cannot >> be initiated by a user's packet) > > Yep; agreed. I don't think anyone was suggesting otherwise. > >> b) we still need to differentiate when a packet has reached its >> destination in how hopcounts are processed. 
you can't just say "remove >> the word forwarding" from hopcount processing; it needs to be there > > Actually, it works without it. Again, show me a description that works. ... >>> I don't follow. A TRILL "traceroute" can't see anything past the >>> TRILL destination node, so who cares how L2 is forwarded^Wswitched >>> past there? >> I'm talking about an rbridge that has more than one egress address, >> e.g., via overloading on a single interface or with multiple interfaces. >> The problem is that your "hopcount exceeded" message won't know what >> address to use, and if it picks one that isn't the packet's destination >> then you will not know the difference between a packet that arrives and >> one that hits a black hole. > > It must have had some way of receiving the message in the first place. > > If it's unicast, then the original destination on the 'trace' message > should be the source for the reply. If it's multicast (a frightening > concept), then the source should be whatever source MAC address is > usually used for TRILL messages from that node when sending over the > given link on which the original message was received. > > Note, of course, that we're arguing about the number of angels on the > head of a pin at this point, as the traceroute protocol itself *HAS > NOT BEEN PROPOSED*. How it might choose MAC addresses to use seems > quite well off-topic and doesn't particularly help illuminate the > issue. Until someone has a description of traceroute, it's premature to use traceroute as an argument for how to do hopcount processing. 
Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkokGR8ACgkQE5f5cImnZrt+SACgtMO3CzGG2t6A0mIWdmkGVi6M SGUAoN0qAg46izO7FN9dSTbCuCAJmRJL =Na75 -----END PGP SIGNATURE----- From james.d.carlson at Sun.COM Mon Jun 1 12:26:28 2009 From: james.d.carlson at Sun.COM (James Carlson) Date: Mon, 1 Jun 2009 15:26:28 -0400 Subject: [rbridge] Hop Count processing In-Reply-To: <4A24191F.70501@isi.edu> References: <4A14C56A.1000408@isi.edu> <4A14DE61.4090502@cisco.com> <4A155EE2.2040306@isi.edu> <4A1F1211.9080001@sun.com> <4A1F2DD9.5000909@isi.edu> <4A20496F.9010409@sun.com> <4A205283.80607@sun.com> <18976.22789.315890.979662@gargle.gargle.HOWL> <4A2075E1.7050801@isi.edu> <18979.49243.746839.801203@gargle.gargle.HOWL> <4A23DFC1.9070302@isi.edu> <18979.64035.861405.382919@gargle.gargle.HOWL> <4A23FF41.1000305@isi.edu> <4A23ED8C.7050700@cisco.com> <4A23F3F9.4060902@isi.edu> <18979.63696.516353.386890@gargle.gargle.HOWL> <4A23FB1D.5000907@isi.edu> <18980.2989.877068.764784@gargle.gargle.HOWL> <4A24191F.70501@isi.edu> Message-ID: <18980.11108.381614.813318@gargle.gargle.HOWL> Joe Touch writes: > Until someone has a description of traceroute, it's premature to use > traceroute as an argument for how to do hopcount processing. I don't believe that there is another reasonable argument that supports it. (I've heard privately that some implementations may have unspecified "performance problems" if they have to check after decrementing, but that's just baffling to me, and certainly not a good justification for changing the way this feature operates.) 
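For readers following the thread, the two orderings under debate can be sketched as below. This is an illustration only: the function names and the `(action, hop_count)` return shape are assumptions, not text from the spec. `proposed_trill` follows the rule suggested on the list (check on input regardless of destination, then decrement and forward without rechecking); `ipv4_style` follows the conventional rule (ignore on input, check only when forwarding, after decrementing).

```python
from typing import Tuple


def proposed_trill(hop_count: int, dest_is_local: bool) -> Tuple[str, int]:
    """Ordering suggested on the list: check on input, do not recheck on output."""
    # Checked on input, regardless of the destination nickname.
    if hop_count == 0:
        return ("drop", hop_count)
    if dest_is_local:
        return ("decapsulate", hop_count)
    # Decrement and forward without checking the new value.
    return ("forward", hop_count - 1)


def ipv4_style(hop_count: int, dest_is_local: bool) -> Tuple[str, int]:
    """Conventional ordering: ignore on input, check only when forwarding."""
    if dest_is_local:
        return ("decapsulate", hop_count)
    # Checked only on the forwarding path, after decrementing.
    remaining = hop_count - 1
    if remaining <= 0:
        return ("drop", hop_count)
    return ("forward", remaining)
```

The visible difference is at the edges: a frame arriving at its destination with a count of 0 is dropped under the first ordering and accepted under the second, while a transit frame arriving with a count of 1 is forwarded (carrying 0) under the first and dropped under the second.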
-- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From Radia.Perlman at sun.com Thu Jun 4 13:58:33 2009 From: Radia.Perlman at sun.com (Radia Perlman) Date: Thu, 04 Jun 2009 13:58:33 -0700 Subject: [rbridge] # of tree issues: default # of distribution trees, minimum acceptable supported # of trees Message-ID: <4A283579.9080908@sun.com> Easy question: what do you prefer the default # of trees to calculate be? Harder question: What is the minimum acceptable # of trees an implementation is willing to compute? (this is currently not specified in the spec. Should it be?) Ugly question: What happens if R3 can only support, say, 3 trees and the highest priority guy specifies, say, 5 trees? There are various potential answers to this: a) if we ignore the question, it will go away b) we should put "number of trees I support" into the LSP, and the highest priority RBridge is not allowed to specify a number greater than the minimum supported by any RBridge c) require RBridges to support as many trees as the highest priority RBridge demands From Radia.Perlman at sun.com Thu Jun 4 14:05:03 2009 From: Radia.Perlman at sun.com (Radia Perlman) Date: Thu, 04 Jun 2009 14:05:03 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? Message-ID: <4A2836FF.60801@sun.com> There's currently a flag in an LSP for an RBridge to say "don't send me layer 2 multicasts that aren't derived from IP multicasts". Would this ever be used? Can all layer 2 non-IP-derived multicasts really be lumped into the same bucket and turned on and off as a unit? It seems safer to me to remove that flag, since maybe there are some important layer 2 multicasts? Maybe it's intended as a safety thing, because IP-derived-multicasts have to be explicitly requested (unless you are an IP multicast router), whereas the other layer 2 multicasts would go everywhere? 
And if we do have that flag, and someone says "I don't want non-IP-derived layer 2 multicasts", does that include ESADI information? Radia From touch at ISI.EDU Thu Jun 4 16:18:56 2009 From: touch at ISI.EDU (Joe Touch) Date: Thu, 04 Jun 2009 16:18:56 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A2836FF.60801@sun.com> References: <4A2836FF.60801@sun.com> Message-ID: <4A285660.7090505@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Radia Perlman wrote: > There's currently a flag in an LSP for an RBridge to say "don't send me > layer 2 multicasts that aren't > derived from IP multicasts". > > Would this ever be used? Can all layer 2 non-IP-derived multicasts > really be lumped into the same > bucket and turned on and off as a unit? It might be useful not to make any assumptions about future use of L2 non-IP-derived mcast. Someone might come up with something that will then break over rbridges, and that doesn't sound like a win... Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkooVl8ACgkQE5f5cImnZrt3OQCfeFr388KqsxmyZ7LJ/KRvvDJ2 xHoAnRI5EW8dPYLtGEZzC1iQTVSYR6gq =Elnj -----END PGP SIGNATURE----- From Radia.Perlman at sun.com Thu Jun 4 17:47:48 2009 From: Radia.Perlman at sun.com (Radia Perlman) Date: Thu, 04 Jun 2009 17:47:48 -0700 Subject: [rbridge] Tie breaking in trees Message-ID: <4A286B34.3080703@sun.com> (I just reread the spec, which is why I have these last few questions). There's a section in the spec that is about tie-breaking, which has to be somewhat revised since it conflicts somewhat with a recent feature we put in, which is to allow path splitting multicast rooted at the same root. 
The paragraph in the spec that I think needs to change is as follows: " If there are two or more equal lowest cost adjacencies between two RBridges, then between adjacencies established by P2P Hellos and adjacencies established by TRILL-Hellos, the P2P adjacencies are preferred; between TRILL-Hello links, the adjacency with the lowest Designated IS LAN ID (pseudonode) is preferred; and between P2P links, the adjacency with the lowest Extended Circuit ID is preferred. Such tie breaking only affects the two RBridges connected by such equal cost adjacencies. The tie breaking determines which of the tied links to send multi-destination traffic on and on which of them to permit receipt of such TRILL frames. " First, to review tree-building, especially now that we've added the feature of multiple trees from the same root (using different nicknames). To build a tree from a particular nickname, all RBridges need to build the same tree. The tree will be a shortest-path tree from the root, using different tie-breakers for equal cost links based on "tree number", so that multiple trees from the same root will choose different links. The tie-breaker is based on the 7-byte ID of the parent. That means that there is no tie-breaker for links that have no pseudonode. No-pseudonode links include pt-to-pt links and pseudonode-suppressed LAN links. The spec currently treats all those links between R1 and R2 as a single link from the point of view of RBridges other than R1 and R2. I believe that for "real" pt-to-pt links, there is no reason to do a specified tie-breaker. For traffic being forwarded from R1 to R2, it seems as though R1 should be able to send on any of the links, just like it would for unicast. But I think there was some controversy over doing that, so the wording I will suggest below is selecting a single link for all multicast that selects that pair of RBridges. 
I'd suggest the following wording instead for the paragraph at the
top of the note, which would be discussing a packet arriving from
neighbor R2, on a tree, say T, for which the non-pseudonode link to R1
is in that tree, and the ingress RBridge, say Ri, passes the RPF check
for tree T on link (R1-R2). This is only relevant to the endnodes of
the R1-R2 link.

"If the tree-building and tie-breaking for a particular tree selects a
non-pseudonode link between R1 and R2, that "R1-R2" link
might consist of multiple links. These parallel links would be
visible to R1 and R2, but not to the rest of the campus (because
the links are not represented by pseudonodes). If this bundle of
parallel links is included in a tree, it is important for R1 and R2
to decide which link to use, but is irrelevant to other RBridges,
and therefore, the tie-breaking algorithm need not be visible
to any RBridges other than R1 and R2. In this case,
R1-R2 adjacencies are ordered as follows, with the
one "most preferred" adjacency being the one that R1 transmits
to R2 on, and the one that R2 accepts traffic from R1 on:

a) most preferred are those established by P2P Hellos,
with tie-breaking among those based on preferring the
one with the numerically highest Extended Circuit ID.

b) next considered are those established through TRILL-Hellos,
with suppressed pseudonodes. Note that the pseudonode is
suppressed in LSPs, but still appears in the TRILL-Hello,
and therefore is available for this tie-breaking. Among these
links, the one with the numerically largest pseudonode ID is preferred".

**********
An alternative would be to replace a) with:

a) most preferred are those established by P2P Hellos. If there
are one or more of those, R1 is allowed to transmit on any of those,
and R2 is required to accept from any of those.
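The ordering in (a) and (b), and the alternative to (a), can be sketched as follows. This is only an illustration under stated assumptions: the `Adjacency` type, the `kind` labels, and the idea that each adjacency carries a single integer ID that both ends see identically are hypothetical simplifications, not something the proposed wording specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Adjacency:
    kind: str         # "p2p" or "trill_hello"; labels are illustrative
    tiebreak_id: int  # Extended Circuit ID (p2p) or pseudonode ID (trill_hello)


def preferred_adjacency(adjacencies: List[Adjacency]) -> Adjacency:
    """Rules (a) then (b): P2P first, numerically highest ID within a class."""
    p2p = [a for a in adjacencies if a.kind == "p2p"]
    hello = [a for a in adjacencies if a.kind == "trill_hello"]
    pool = p2p or hello
    if not pool:
        raise ValueError("no adjacency between R1 and R2")
    return max(pool, key=lambda a: a.tiebreak_id)


def acceptable_adjacencies_alternative(adjacencies: List[Adjacency]) -> List[Adjacency]:
    """Alternative (a): any P2P adjacency may be used; otherwise fall back to (b)."""
    p2p = [a for a in adjacencies if a.kind == "p2p"]
    if p2p:
        return p2p
    return [preferred_adjacency(adjacencies)]
```

Because both functions depend only on information R1 and R2 already share, the two ends reach the same answer without involving any other RBridge, which is the property the paragraph above relies on.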
From d3e3e3 at gmail.com Fri Jun 5 14:53:24 2009
From: d3e3e3 at gmail.com (Donald Eastlake)
Date: Fri, 5 Jun 2009 17:53:24 -0400
Subject: [rbridge] Tie breaking in trees
In-Reply-To: <4A286B34.3080703@sun.com>
References: <4A286B34.3080703@sun.com>
Message-ID: <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com>

On Thu, Jun 4, 2009 at 8:47 PM, Radia Perlman wrote:
>
> ...
>
> **********
> An alternative would be to replace a) with:
>
> a) most preferred are those established by P2P Hellos. If there
> are one or more of those, R1 is allowed to transmit on any of those,
> and R2 is required to accept from any of those.

I prefer this alternative. It is safe to multipath multi-destination
frames across parallel one-hop links that are configured to be P2P. So
I see no reason to add complexity at the receiver by requiring it to
reject such frames on all but one of such P2P links.

Thanks,
Donald

From d3e3e3 at gmail.com Sat Jun 6 10:53:38 2009
From: d3e3e3 at gmail.com (Donald Eastlake)
Date: Sat, 6 Jun 2009 13:53:38 -0400
Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"?
In-Reply-To: <4A285660.7090505@isi.edu>
References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu>
Message-ID: <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com>

Hi Joe,

I'm assuming you think that we don't need this bit?

Thanks,
Donald

On Thu, Jun 4, 2009 at 7:18 PM, Joe Touch wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Radia Perlman wrote:
>> There's currently a flag in an LSP for an RBridge to say "don't send me
>> layer 2 multicasts that aren't
>> derived from IP multicasts".
>>
>> Would this ever be used? Can all layer 2 non-IP-derived multicasts
>> really be lumped into the same
>> bucket and turned on and off as a unit?
>
> It might be useful not to make any assumptions about future use of L2
> non-IP-derived mcast. Someone might come up with something that will
> then break over rbridges, and that doesn't sound like a win...
> > Joe > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkooVl8ACgkQE5f5cImnZrt3OQCfeFr388KqsxmyZ7LJ/KRvvDJ2 > xHoAnRI5EW8dPYLtGEZzC1iQTVSYR6gq > =Elnj > -----END PGP SIGNATURE----- > _______________________________________________ From touch at ISI.EDU Sat Jun 6 11:22:23 2009 From: touch at ISI.EDU (Joe Touch) Date: Sat, 06 Jun 2009 11:22:23 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> Message-ID: <4A2AB3DF.6020305@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Donald Eastlake wrote: > Hi Joe, > > I'm assuming you think that we don't need this bit? It seems like even having it around is asking for trouble, AFAICT. Joe > On Thu, Jun 4, 2009 at 7:18 PM, Joe Touch wrote: > Radia Perlman wrote: >>>> There's currently a flag in an LSP for an RBridge to say "don't send me >>>> layer 2 multicasts that aren't >>>> derived from IP multicasts". >>>> >>>> Would this ever be used? Can all layer 2 non-IP-derived multicasts >>>> really be lumped into the same >>>> bucket and turned on and off as a unit? > It might be useful not to make any assumptions about future use of L2 > non-IP-derived mcast. Someone might come up with something that will > then break over rbridges, and that doesn't sound like a win... 
>
> Joe
_______________________________________________
> _______________________________________________
> rbridge mailing list
> rbridge at postel.org
> http://mailman.postel.org/mailman/listinfo/rbridge
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoqs98ACgkQE5f5cImnZrtRlwCginXsRCpwmP1EIqr/xAGsjsPL
Ku4AoJ2eROAbzwNoMM9Bhzc1V530svXo
=HRnL
-----END PGP SIGNATURE-----

From ddutt at cisco.com Sun Jun 7 16:38:55 2009
From: ddutt at cisco.com (Dinesh G Dutt)
Date: Sun, 07 Jun 2009 16:38:55 -0700
Subject: [rbridge] Tie breaking in trees
In-Reply-To: <4A286B34.3080703@sun.com>
References: <4A286B34.3080703@sun.com>
Message-ID: <4A2C4F8F.50605@cisco.com>

Radia,

Radia Perlman wrote:
> a) most preferred are those established by P2P Hellos,
> with tie-breaking among those based on preferring the
> one with the numerically highest Extended Circuit ID.
>
> b) next considered are those established through TRILL-Hellos,
> with suppressed pseudonodes. Note that the pseudonode is
> suppressed in LSPs, but still appears in the TRILL-Hello,
> and therefore is available for this tie-breaking. Among these
> links, the one with the numerically largest pseudonode ID is preferred".
>

I want this, i.e., (a) followed by (b). The alternative that you suggest
below for (a) will make IIC inordinately expensive to implement in
hardware. The alternate check will have to allow frames for not only
the adjacent Rbridge but all other Rbridges that are upstream from
multiple interfaces.

I strongly disagree with the alternate text below,

Dinesh
> **********
> An alternative would be to replace a) with:
>
> a) most preferred are those established by P2P Hellos. If there
> are one or more of those, R1 is allowed to transmit on any of those,
> and R2 is required to accept from any of those.
> > > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From ddutt at cisco.com Sun Jun 7 16:41:50 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Sun, 07 Jun 2009 16:41:50 -0700 Subject: [rbridge] Tie breaking in trees In-Reply-To: <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> References: <4A286B34.3080703@sun.com> <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> Message-ID: <4A2C503E.2050506@cisco.com> Strongly disagree. See my earlier response to Radia's original email, Dinesh Donald Eastlake wrote: > On Thu, Jun 4, 2009 at 8:47 PM, Radia Perlman wrote: > >> ... >> >> ********** >> An alternative would be to replace a) with: >> >> a) most preferred are those established by P2P Hellos. If there >> are one or more of those, R1 is allowed to transmit on any of those, >> and R2 is required to accept from any of those. >> > > I prefer this alternative. It is safe to multipath multi-destination > frames across parallel one-hop links that are configured to be P2P. So > I see no reason to add complexity at the receiver by requiring to it > reject such frames on all but one of such P2P links. > It is not safe because IIC will have to be allowed on multiple interfaces. What you're suggesting is adding more complexity. Dinesh > Thanks, > Donald > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > -- We make our world significant by the courage of our questions and by the depth of our answers. 
- Carl Sagan

From ddutt at cisco.com Sun Jun 7 19:00:57 2009
From: ddutt at cisco.com (Dinesh G Dutt)
Date: Sun, 07 Jun 2009 19:00:57 -0700
Subject: [rbridge] Tie breaking in trees
In-Reply-To: <4A286B34.3080703@sun.com>
References: <4A286B34.3080703@sun.com>
Message-ID: <4A2C70D9.6010805@cisco.com>

I used what may have been unfamiliar terminology. IIC means incoming
interface check, which the spec refers to as the RPF check. So, please
replace IIC in my previous emails with RPF check.

Apologies for the confusion,

Dinesh

Radia Perlman wrote:
> (I just reread the spec, which is why I have these last few questions).
>
>
> There's a section in the spec that is about tie-breaking, which has to
> be somewhat
> revised since it conflicts somewhat with a recent feature we put in,
> which is
> to allow path splitting multicast rooted at the same root. The paragraph in
> the spec that I think needs to change is as follows:
>
> " If there are two or more equal lowest cost adjacencies between two
> RBridges, then between adjacencies established by P2P Hellos and
> adjacencies established by TRILL-Hellos, the P2P adjacencies are
> preferred; between TRILL-Hello links, the adjacency with the lowest
> Designated IS LAN ID (pseudonode) is preferred; and between P2P
> links, the adjacency with the lowest Extended Circuit ID is
> preferred. Such tie breaking only affects the two RBridges connected
> by such equal cost adjacencies. The tie breaking determines which of
> the tied links to send multi-destination traffic on and on which of
> them to permit receipt of such TRILL frames.
> "
>
> First, to review tree-building, especially now that we've added the feature
> of multiple trees from the same root (using different nicknames).
>
> To build a tree from a particular nickname, all RBridges need to
> build the same tree.
The tree will be a shortest-path tree from the > root, using different tie-breakers for equal cost links based on "tree > number", > so that multiple trees from the same root will choose different links. > > The tie-breaker is based on the 7-byte ID of the parent. That means that > there is no tie-breaker for links that have no pseudonode. No-pseudonode > links include > pt-to-pt links and pseudonode-suppressed LAN links. The spec currently > treats all those links between R1 and R2 as a single link from the > point of view of RBridges other than R1 and R2. > > I believe that for "real" pt-to-pt links, there is no reason to do a > specified tie-breaker. > For traffic being forwarded from R1 to R2, it seems as though R1 should be > able to send on any of the links, just like it would for unicast. But I > think > there was some controversy over doing that, so the wording I will suggest > below is selecting a single link for all multicast that selects that > pair of RBridges. > > I'd suggest the following wording instead for the paragraph at the > top of the note, which > would be discussing a packet arriving from neighbor R2, on a tree, say > T, for which > the non-pseudonode link to R1 is in that tree, and the ingress RBridge, > say Ri, passes > the RPF check for tree T on link (R1-R2). This is only relevant to > the endnodes of the R1-R2 link. > > "If the tree-building and tie-breaking for a particular tree selects a > non-pseudonode link between R1 and R2, that "R1-R2" link > might consist of multiple links. These parallel links would be > visible to R1 and R2, but not to the rest of the campus (because > the links are not represented by pseudonodes). If this bundle of > parallel links is included in a tree, it is important for R1 and R2 > to decide which link to use, but is irrelevant to other RBridges, > and therefore, the tie-breaking algorithm need not be visible > to any RBridges other than R1 and R2. 
In this case, > R1-R2 adjacencies are ordered as follows, with the > one "most preferred" adjacency being the one that R1 transmits > to R2 on, and the one that R2 accepts traffic from R1 on: > > a) most preferred are those established by P2P Hellos, > with tie-breaking among those based on preferring the > one with the numerically highest Extended Circuit ID. > > b) next considered are those established through TRILL-Hellos, > with suppressed pseudonodes. Note that the pseudonode is > suppressed in LSPs, but still appears in the TRILL-Hello, > and therefore is available for this tie-breaking. Among these > links, the one with the numerically largest pseudonode ID is preferred". > > ********** > An alternative would be to replace a) with: > > a) most preferred are those established by P2P Hellos. If there > are one or more of those, R1 is allowed to transmit on any of those, > and R2 is required to accept from any of those. > > > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From james.d.carlson at sun.com Mon Jun 8 05:59:27 2009 From: james.d.carlson at sun.com (James Carlson) Date: Mon, 8 Jun 2009 08:59:27 -0400 Subject: [rbridge] Tie breaking in trees In-Reply-To: <4A2C503E.2050506@cisco.com> References: <4A286B34.3080703@sun.com> <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> <4A2C503E.2050506@cisco.com> Message-ID: <18989.2863.515852.713148@gargle.gargle.HOWL> Dinesh G Dutt writes: > It is not safe because IIC will have to be allowed on multiple > interfaces. What you're suggesting is adding more complexity. It's actually not hard to do in hardware. All you need is a settable input interface identifier (on each physical interface) for the RPF check that defaults to the ID assigned to that interface.
Allow the routing protocol to set the ID to be the same on each link when parallel links are detected by Hellos. Breaking multiple parallel links so that they don't work by default seems like a bad thing to me. I'd rather have us just rely on implementors: if you can't get parallel link behavior right due to hardware limitations, then exercise some discretion in your implementation by shutting down extra links automatically, and notifying the administrator that you're unable to handle the requested configuration. In fact, that should be true for all obscure implementation limitations, and we shouldn't have to write it into any specification. If you detect that you're in a situation you can't support, then disable and/or shut yourself down in some suitable implementation-specific manner. That's just "network equipment design 101." Roping that area as off-bounds for all _other_ implementors who don't have the same problems seems a bit too limiting to me. -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From jeffpick at broadcom.com Mon Jun 8 11:53:30 2009 From: jeffpick at broadcom.com (Jeff Pickering) Date: Mon, 8 Jun 2009 11:53:30 -0700 Subject: [rbridge] mib discussions Message-ID: <9793EC0A42D76D4EB2A8F94D77E2138893C4E55656@SJEXCHCCR02.corp.ad.broadcom.com> Hi all, Have any discussions started yet in the MIB arena? If not, is anyone ready to start talking? Jeff From d3e3e3 at gmail.com Wed Jun 10 08:18:19 2009 From: d3e3e3 at gmail.com (Donald Eastlake) Date: Wed, 10 Jun 2009 11:18:19 -0400 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"?
In-Reply-To: <4A2AB3DF.6020305@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> Message-ID: <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> The Other Multicast bit was added after Dino Farinacci advocated it at the last Chicago meeting. (See minutes: http://www.ietf.org/proceedings/07jul/minutes/trill.txt) This was put through a consensus call on the working group mailing list resulting in the formal consensus determination here: http://www.postel.org/pipermail/rbridge/2007-September/002470.html. This bit has been in the draft since version -06. It defaults to "on" so, unless you go to some effort to configure it to off, all RBridges do get all the non-IP derived multicast traffic for the VLANs they advertise they are connected to. It doesn't have any effect on layer 2 or TRILL control frames. In light of all this, I am very reluctant to consider changing this part of the design unless there is a clear consensus to re-open the issue. Thanks, Donald ============================= Donald E. Eastlake 3rd +1-508-634-2066 (home) 155 Beaver Street Milford, MA 01757 USA d3e3e3 at gmail.com On Sat, Jun 6, 2009 at 2:22 PM, Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > Donald Eastlake wrote: >> Hi Joe, >> >> I'm assuming you think that we don't need this bit? > > It seems like even having it around is asking for trouble, AFAICT. > > Joe > >> On Thu, Jun 4, 2009 at 7:18 PM, Joe Touch wrote: >> Radia Perlman wrote: >>>>> There's currently a flag in an LSP for an RBridge to say "don't send me >>>>> layer 2 multicasts that aren't >>>>> derived from IP multicasts". >>>>> >>>>> Would this ever be used? Can all layer 2 non-IP-derived multicasts >>>>> really be lumped into the same >>>>> bucket and turned on and off as a unit? >> It might be useful not to make any assumptions about future use of L2 >> non-IP-derived mcast.
Someone might come up with something that will >> then break over rbridges, and that doesn't sound like a win... >> >> Joe > _______________________________________________ >> _______________________________________________ >> rbridge mailing list >> rbridge at postel.org >> http://mailman.postel.org/mailman/listinfo/rbridge > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkoqs98ACgkQE5f5cImnZrtRlwCginXsRCpwmP1EIqr/xAGsjsPL > Ku4AoJ2eROAbzwNoMM9Bhzc1V530svXo > =HRnL > -----END PGP SIGNATURE----- > From ddutt at cisco.com Wed Jun 10 09:38:53 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Wed, 10 Jun 2009 09:38:53 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> Message-ID: <4A2FE19D.7090102@cisco.com> I agree, Dinesh Donald Eastlake wrote: > The Other Multicast bit was added after Dino Farinacci advocated it at > the last Chicago meeting. (See minutes: > http://www.ietf.org/proceedings/07jul/minutes/trill.txt) This was put > through a consensus call on the working group mailing list resulting > in the formal consensus determination here: > http://www.postel.org/pipermail/rbridge/2007-September/002470.html. > > This bit has been in the draft since verison -06. It defaults to "on" > so, unless you go to some effort to configure it to off, all RBridges > do get all the non-IP derived multicast traffic for the VLANs they > advertise they are connected to. It doesn't have any effect on layer 2 > or TRILL control frames. 
> > In light of all this, I am very reluctant to consider changing this > part of the design unless there is a clear consensus to re-open the > issue. > > Thanks, > Donald > ============================= > Donald E. Eastlake 3rd +1-508-634-2066 (home) > 155 Beaver Street > Milford, MA 01757 USA > d3e3e3 at gmail.com > > > > On Sat, Jun 6, 2009 at 2:22 PM, Joe Touch wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> >> >> Donald Eastlake wrote: >> >>> Hi Joe, >>> >>> I'm assuming you think that we don't need this bit? >>> >> It seems like even having it around is asking for trouble, AFAICT. >> >> Joe >> >> >>> On Thu, Jun 4, 2009 at 7:18 PM, Joe Touch wrote: >>> Radia Perlman wrote: >>> >>>>>> There's currently a flag in an LSP for an RBridge to say "don't send me >>>>>> layer 2 multicasts that aren't >>>>>> derived from IP multicasts". >>>>>> >>>>>> Would this ever be used? Can all layer 2 non-IP-derived multicasts >>>>>> really be lumped into the same >>>>>> bucket and turned on and off as a unit? >>>>>> >>> It might be useful not to make any assumptions about future use of L2 >>> non-IP-derived mcast. Someone might come up with something that will >>> then break over rbridges, and that doesn't sound like a win... 
>>> >>> Joe >>> >> _______________________________________________ >> >>> _______________________________________________ >>> rbridge mailing list >>> rbridge at postel.org >>> http://mailman.postel.org/mailman/listinfo/rbridge >>> >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.9 (MingW32) >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >> iEYEARECAAYFAkoqs98ACgkQE5f5cImnZrtRlwCginXsRCpwmP1EIqr/xAGsjsPL >> Ku4AoJ2eROAbzwNoMM9Bhzc1V530svXo >> =HRnL >> -----END PGP SIGNATURE----- >> >> > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From touch at ISI.EDU Wed Jun 10 10:49:44 2009 From: touch at ISI.EDU (Joe Touch) Date: Wed, 10 Jun 2009 10:49:44 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> Message-ID: <4A2FF238.50501@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Donald Eastlake wrote: > The Other Multicast bit was added after Dino Farinacci advocated it at > the last Chicago meeting. (See minutes: > http://www.ietf.org/proceedings/07jul/minutes/trill.txt) The minutes indicate a suggestion to put the bit in, but not a clear reason why it's needed. The notes only indicate that this was a "suggestion", then declare "consensus to be confirmed on the list".
The post you made on the email list in July indicates consensus at the meeting, even though that wasn't indicated in the minutes: http://www.postel.org/pipermail/rbridge/2007-July/002399.html > This was put > through a consensus call on the working group mailing list resulting > in the formal consensus determination here: > http://www.postel.org/pipermail/rbridge/2007-September/002470.html. Besides your mail, there was only one post from James Carlson endorsing the idea: http://www.postel.org/pipermail/rbridge/2007-July/002400.html Two months later you declare consensus based on "plenty of support and almost no opposition". There was no discussion on the mailing list. There was no "plenty of support" on the mailing list. http://www.postel.org/pipermail/rbridge/2007-September/002470.html > This bit has been in the draft since verison -06. It defaults to "on" > so, unless you go to some effort to configure it to off, all RBridges > do get all the non-IP derived multicast traffic for the VLANs they > advertise they are connected to. It doesn't have any effect on layer 2 > or TRILL control frames. The question is "why do we need this complexity"? I appreciate it won't cause problems if defaulted 'on', but having a bit means making sure it's implemented and tested for vendors. > In light of all this, I am very reluctant to consider changing this > part of the design unless there is a clear consensus to re-open the > issue. 
Well, there's just as much consensus on the mailing list to open the issue (my post) as there was to reach consensus to insert the bit in the first place ;-) Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkov8jgACgkQE5f5cImnZrv3ugCgp+dv2QL9QrSo7QfGgH4F8ofg p7kAnRcEQuYPUxzNFFWA9kZbjRNQRkGO =soN/ -----END PGP SIGNATURE----- From touch at ISI.EDU Wed Jun 10 10:52:24 2009 From: touch at ISI.EDU (Joe Touch) Date: Wed, 10 Jun 2009 10:52:24 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> Message-ID: <4A2FF2D8.4070902@isi.edu> Donald Eastlake wrote: > The Other Multicast bit was added after Dino Farinacci advocated it at > the last Chicago meeting. (See minutes: > http://www.ietf.org/proceedings/07jul/minutes/trill.txt) The minutes indicate a suggestion to put the bit in, but not a clear reason why it's needed. The notes only indicate that this was a "suggestion", then declare "consensus to be confirmed on the list. The post you made on the email list in July indicates consensus at the meeting, even though that wasn't indicated in the minutes: http://www.postel.org/pipermail/rbridge/2007-July/002399.html > This was put > through a consensus call on the working group mailing list resulting > in the formal consensus determination here: > http://www.postel.org/pipermail/rbridge/2007-September/002470.html. Besides your mail, there was only one post from James Carlson endorsing the idea: http://www.postel.org/pipermail/rbridge/2007-July/002400.html Two months later you declare consensus based on "plenty of support and almost no opposition". 
There was no discussion on the mailing list. There was not really "plenty of support" on the mailing list. http://www.postel.org/pipermail/rbridge/2007-September/002470.html > This bit has been in the draft since verison -06. It defaults to "on" > so, unless you go to some effort to configure it to off, all RBridges > do get all the non-IP derived multicast traffic for the VLANs they > advertise they are connected to. It doesn't have any effect on layer 2 > or TRILL control frames. The question is "why do we need this complexity"? I appreciate it won't cause problems if defaulted 'on', but having a bit means making sure it's implemented and tested for vendors. > In light of all this, I am very reluctant to consider changing this > part of the design unless there is a clear consensus to re-open the > issue. Well, there's just as much consensus on the mailing list to open the issue (my post) as there was to reach consensus to insert the bit in the first place ;-) Maybe what I'm asking is at least to clarify the issue. Is it really needed as a bit, and if so, what is the impact of changing it from the default value? Joe From james.d.carlson at sun.com Wed Jun 10 12:40:23 2009 From: james.d.carlson at sun.com (James Carlson) Date: Wed, 10 Jun 2009 15:40:23 -0400 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A2FF2D8.4070902@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> Message-ID: <18992.3111.509016.143173@gargle.gargle.HOWL> Joe Touch writes: > Donald Eastlake wrote: > > This was put > > through a consensus call on the working group mailing list resulting > > in the formal consensus determination here: > > http://www.postel.org/pipermail/rbridge/2007-September/002470.html. 
> > Besides your mail, there was only one post from James Carlson endorsing > the idea: > http://www.postel.org/pipermail/rbridge/2007-July/002400.html Just to make clear (which itself might be impossible at this point): the reason I supported it was for symmetry with the other multicast- optimizing bits already defined. If the implementation has some reason to know that it has useful local information about non-IP multicasts in use (e.g., the subnet in question runs only IP or perhaps is known to use GMRP for all multicast addresses), then it can set or reset the flag as needed. If it doesn't (or can't) know about non-IP multicast usage, then it should set it to 1 with the rest of those who aren't snooping the multicast control protocols. I somewhat doubt it's going to see much use, but it's also fairly cheap -- as long as we already have to support IPv4 and IPv6 control bits. (And since, if you're lazy, you can just ignore it and let the downstream discard the unwanted packets.) -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From touch at ISI.EDU Wed Jun 10 13:06:04 2009 From: touch at ISI.EDU (Joe Touch) Date: Wed, 10 Jun 2009 13:06:04 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? 
In-Reply-To: <18992.3111.509016.143173@gargle.gargle.HOWL> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> Message-ID: <4A30122C.9020609@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 James Carlson wrote: > Joe Touch writes: >> Donald Eastlake wrote: >>> This was put >>> through a consensus call on the working group mailing list resulting >>> in the formal consensus determination here: >>> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. >> Besides your mail, there was only one post from James Carlson endorsing >> the idea: >> http://www.postel.org/pipermail/rbridge/2007-July/002400.html > > Just to make clear (which itself might be impossible at this point): > the reason I supported it was for symmetry with the other multicast- > optimizing bits already defined. If the implementation has some > reason to know that it has useful local information about non-IP > multicasts in use (e.g., the subnet in question runs only IP or > perhaps is known to use GMRP for all multicast addresses), then it can > set or reset the flag as needed. If it doesn't (or can't) know about > non-IP multicast usage, then it should set it to 1 with the rest of > those who aren't snooping the multicast control protocols. > > I somewhat doubt it's going to see much use, but it's also fairly > cheap -- as long as we already have to support IPv4 and IPv6 control > bits. (And since, if you're lazy, you can just ignore it and let the > downstream discard the unwanted packets.) I'm seeing a "something operators can set as desired", but not a reason they would ever want to set it. I particularly dislike the idea of filtering multicasts based on upper layer info (i.e., whether it's IP or not). 
IGMP is an optimization, but it seems like this bit could break things when its use wasn't needed - and I still don't see a clear need. Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkowEiwACgkQE5f5cImnZrvfzgCg6QgncAxHZVc5xThyFXFreWJS ndQAoIf7/c6vdfhkjCe93tilC8cMPJIe =0quk -----END PGP SIGNATURE----- From anoop at brocade.com Wed Jun 10 14:24:16 2009 From: anoop at brocade.com (Anoop Ghanwani) Date: Wed, 10 Jun 2009 14:24:16 -0700 Subject: [rbridge] Tie breaking in trees In-Reply-To: <18989.2863.515852.713148@gargle.gargle.HOWL> References: <4A286B34.3080703@sun.com> <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> <4A2C503E.2050506@cisco.com> <18989.2863.515852.713148@gargle.gargle.HOWL> Message-ID: <54E40085E26FCB45AAB9D3DF3F9DD3DDCFB97CDD7A@HQ-EXCH-7.corp.brocade.com> I think the more common case would be to run LACP and establish a LAG (link agg group) so that TRILL/IS-IS are only aware of a single link between the two RBridges. Today, for multicast with a single tree, we pretty much have to have loop-free connectivity. 2 links between 2 Rbridges is effectively a loop. STP would have broken that by blocking one of those. Likewise, since we compute a tree for multicast, it would make sense to only allow one of those. If someone really wants to have the multiple links active, then they can use link aggregation. (I don't feel strongly about this one way or another, though.) Anoop > -----Original Message----- > From: rbridge-bounces at postel.org > [mailto:rbridge-bounces at postel.org] On Behalf Of James Carlson > Sent: Monday, June 08, 2009 5:59 AM > To: Dinesh G Dutt > Cc: Donald Eastlake; TRILL/RBridge Working Group > Subject: Re: [rbridge] Tie breaking in trees > > Dinesh G Dutt writes: > > It is not safe because IIC will have to be allowed on multiple > > interfaces. What you're suggesting is adding more complexity. 
> > It's actually not hard to do in hardware. All you need is a settable > input interface identifier (on each physical interface) for the RPF > check that defaults to the ID assigned to that interface. Allow the > routing protocol to set the ID to be the same on each link when > parallel links are detected by Hellos. > > Breaking multiple parallel links so that they don't work by default > seems like a bad thing to me. I'd rather have us just rely on > implementors: if you can't get parallel link behavior right due to > hardware limitations, then exercise some discretion in your > implementation by shutting down extra links automatically, and > notifying the administrator that you're unable to handle the requested > configuration. > > In fact, that should be true for all obscure implementation > limitations, and we shouldn't have to write it into any specification. > If you detect that you're in a sitation you can't support, then > disable and/or shut yourself down in some suitable implementation- > specific manner. That's just "network equipment design 101." > > Roping that area as off-bounds for all _other_ implementors who don't > have the same problems seems a bit too limiting to me. > > -- > James Carlson, Solaris Networking > > Sun Microsystems / 35 Network Drive 71.232W Vox +1 > 781 442 2084 > MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 > 781 442 1677 > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > From anoop at brocade.com Wed Jun 10 14:40:11 2009 From: anoop at brocade.com (Anoop Ghanwani) Date: Wed, 10 Jun 2009 14:40:11 -0700 Subject: [rbridge] # of tree issues: default # of distribution trees, minimum acceptable supported # of trees In-Reply-To: <4A283579.9080908@sun.com> References: <4A283579.9080908@sun.com> Message-ID: <54E40085E26FCB45AAB9D3DF3F9DD3DDCFB97CDD81@HQ-EXCH-7.corp.brocade.com> The default number of trees should be 1. 
I don't like (c) at all. I would prefer that we go with the least capable device's request; i.e. if even one RBridge says to compute only one tree, then only one tree should get computed. Anoop > -----Original Message----- > From: rbridge-bounces at postel.org > [mailto:rbridge-bounces at postel.org] On Behalf Of Radia Perlman > Sent: Thursday, June 04, 2009 1:59 PM > To: TRILL/RBridge Working Group > Subject: [rbridge] # of tree issues: default # of > distribution trees, minimum acceptable supported # of trees > > Easy question: what do you prefer the default # of trees to > calculate be? > > Harder question: What is the minimum acceptable # of trees an > implementation > is willing to compute? (this is currently not specified in the spec. > Should it be?) > > Ugly question: What happens if R3 can only support, say, 3 trees and > the highest priority guy specifies, say, 5 trees? There are various > potential answers to this: > > a) if we ignore the question, it will go away > b) we should put "number of trees I support" into the LSP, and > the highest priority RBridge is not allowed to specify a number > greater than the minimum supported by any RBridge > c) require RBridges to support as many trees as the highest > priority RBridge demands > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > From ddutt at cisco.com Wed Jun 10 15:00:47 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Wed, 10 Jun 2009 15:00:47 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? 
In-Reply-To: <4A30122C.9020609@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> Message-ID: <4A302D0F.8000100@cisco.com> There is a L2 protocol called MMRP (used to be called GMRP) that allows me to do the equivalent of IGMP but for L2-only multicast (actually it can also be used for IP multicast). So, it should be possible for an Rbridge to say "don't send me any L2 multicast that isn't derived from IP multicast". Dinesh Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > James Carlson wrote: > >> Joe Touch writes: >> >>> Donald Eastlake wrote: >>> >>>> This was put >>>> through a consensus call on the working group mailing list resulting >>>> in the formal consensus determination here: >>>> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. >>>> >>> Besides your mail, there was only one post from James Carlson endorsing >>> the idea: >>> http://www.postel.org/pipermail/rbridge/2007-July/002400.html >>> >> Just to make clear (which itself might be impossible at this point): >> the reason I supported it was for symmetry with the other multicast- >> optimizing bits already defined. If the implementation has some >> reason to know that it has useful local information about non-IP >> multicasts in use (e.g., the subnet in question runs only IP or >> perhaps is known to use GMRP for all multicast addresses), then it can >> set or reset the flag as needed. If it doesn't (or can't) know about >> non-IP multicast usage, then it should set it to 1 with the rest of >> those who aren't snooping the multicast control protocols. 
>> >> I somewhat doubt it's going to see much use, but it's also fairly >> cheap -- as long as we already have to support IPv4 and IPv6 control >> bits. (And since, if you're lazy, you can just ignore it and let the >> downstream discard the unwanted packets.) >> > > I'm seeing a "something operators can set as desired", but not a reason > they would ever want to set it. I particularly dislike the idea of > filtering multicasts based on upper layer info (i.e., whether it's IP or > not). IGMP is an optimization, but it seems like this bit could break > things when its use wasn't needed - and I still don't see a clear need. > > Joe > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkowEiwACgkQE5f5cImnZrvfzgCg6QgncAxHZVc5xThyFXFreWJS > ndQAoIf7/c6vdfhkjCe93tilC8cMPJIe > =0quk > -----END PGP SIGNATURE----- > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From ddutt at cisco.com Wed Jun 10 15:02:24 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Wed, 10 Jun 2009 15:02:24 -0700 Subject: [rbridge] # of tree issues: default # of distribution trees, minimum acceptable supported # of trees In-Reply-To: <4A283579.9080908@sun.com> References: <4A283579.9080908@sun.com> Message-ID: <4A302D70.3030800@cisco.com> I don't like (c) at all. (b) is a fine solution and I believe it is already there. I don't have a strong opinion on the min number of multicast trees that an Rbridge MUST support. But asking for anything more than 1 seems over-zealous, Dinesh Radia Perlman wrote: > Easy question: what do you prefer the default # of trees to calculate be? > > Harder question: What is the minimum acceptable # of trees an implementation > is willing to compute? 
(this is currently not specified in the spec. > Should it be?) > > Ugly question: What happens if R3 can only support, say, 3 trees and > the highest priority guy specifies, say, 5 trees? There are various > potential answers to this: > > a) if we ignore the question, it will go away > b) we should put "number of trees I support" into the LSP, and > the highest priority RBridge is not allowed to specify a number > greater than the minimum supported by any RBridge > c) require RBridges to support as many trees as the highest > priority RBridge demands > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From touch at ISI.EDU Wed Jun 10 15:06:46 2009 From: touch at ISI.EDU (Joe Touch) Date: Wed, 10 Jun 2009 15:06:46 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A302D0F.8000100@cisco.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> <4A302D0F.8000100@cisco.com> Message-ID: <4A302E76.4060405@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dinesh G Dutt wrote: > There is a L2 protocol called MMRP (used to be called GMRP) that allows > me to do the equivalent of IGMP but for L2-only multicast (actually it > can also be used for IP multicast). So, it should be possible for an > Rbridge to say "don't send me any L2 multicast that isn't derived from > IP multicast". I don't see why these two statements are related. If MMRP can also be used for IP multicast or not, then how would an operator ever know when to set this bit to "off"? 
I'm suggesting that if we can't give clear advice to an operator on when to set each value of a bit, we should not include it. Joe > James Carlson wrote: > >>>> Joe Touch writes: >>>> >>>>> Donald Eastlake wrote: >>>>> >>>>>> This was put >>>>>> through a consensus call on the working group mailing list resulting >>>>>> in the formal consensus determination here: >>>>>> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. >>>>>> >>>>> Besides your mail, there was only one post from James Carlson endorsing >>>>> the idea: >>>>> http://www.postel.org/pipermail/rbridge/2007-July/002400.html >>>>> >>>> Just to make clear (which itself might be impossible at this point): >>>> the reason I supported it was for symmetry with the other multicast- >>>> optimizing bits already defined. If the implementation has some >>>> reason to know that it has useful local information about non-IP >>>> multicasts in use (e.g., the subnet in question runs only IP or >>>> perhaps is known to use GMRP for all multicast addresses), then it can >>>> set or reset the flag as needed. If it doesn't (or can't) know about >>>> non-IP multicast usage, then it should set it to 1 with the rest of >>>> those who aren't snooping the multicast control protocols. >>>> >>>> I somewhat doubt it's going to see much use, but it's also fairly >>>> cheap -- as long as we already have to support IPv4 and IPv6 control >>>> bits. (And since, if you're lazy, you can just ignore it and let the >>>> downstream discard the unwanted packets.) >>>> > > I'm seeing a "something operators can set as desired", but not a reason > they would ever want to set it. I particularly dislike the idea of > filtering multicasts based on upper layer info (i.e., whether it's IP or > not). IGMP is an optimization, but it seems like this bit could break > things when its use wasn't needed - and I still don't see a clear need. 
> > Joe _______________________________________________ rbridge mailing list rbridge at postel.org http://mailman.postel.org/mailman/listinfo/rbridge >> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEUEARECAAYFAkowLnYACgkQE5f5cImnZru4jACXRpWDmbsqx36tYQ71mAgNAPQF mQCfWccsjsmdv6jylLSIteO+s3M0Hto= =VPUF -----END PGP SIGNATURE----- From ddutt at cisco.com Wed Jun 10 22:04:23 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Wed, 10 Jun 2009 22:04:23 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A302E76.4060405@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> <4A302D0F.8000100@cisco.com> <4A302E76.4060405@isi.edu> Message-ID: <4A309057.3070909@cisco.com> Joe, I see your point. I don't know the basis for this requirement. If an operator wanted to turn off all non-IP multicast, wouldn't turning on this bit be a good idea ? I guess your question is in what situations can an operator make this decision. Furthermore, I don't believe existing 802.1Q bridges provide such a funtionality. So, maybe getting rid of this bit is a good idea as you suggest. Let me check with Dino and see if he remembers the need for this bit, Dinesh Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > Dinesh G Dutt wrote: > >> There is a L2 protocol called MMRP (used to be called GMRP) that allows >> me to do the equivalent of IGMP but for L2-only multicast (actually it >> can also be used for IP multicast). So, it should be possible for an >> Rbridge to say "don't send me any L2 multicast that isn't derived from >> IP multicast". 
>> > > I don't see why these two statements are related. If MMRP can also be > used for IP multicast or not, then how would an operator ever know when > to set this bit to "off"? > > I'm suggesting that if we can't give clear advice to an operator on when > to set each value of a bit, we should not include it. > > Joe > > >> James Carlson wrote: >> >> >>>>> Joe Touch writes: >>>>> >>>>> >>>>>> Donald Eastlake wrote: >>>>>> >>>>>> >>>>>>> This was put >>>>>>> through a consensus call on the working group mailing list resulting >>>>>>> in the formal consensus determination here: >>>>>>> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. >>>>>>> >>>>>>> >>>>>> Besides your mail, there was only one post from James Carlson endorsing >>>>>> the idea: >>>>>> http://www.postel.org/pipermail/rbridge/2007-July/002400.html >>>>>> >>>>>> >>>>> Just to make clear (which itself might be impossible at this point): >>>>> the reason I supported it was for symmetry with the other multicast- >>>>> optimizing bits already defined. If the implementation has some >>>>> reason to know that it has useful local information about non-IP >>>>> multicasts in use (e.g., the subnet in question runs only IP or >>>>> perhaps is known to use GMRP for all multicast addresses), then it can >>>>> set or reset the flag as needed. If it doesn't (or can't) know about >>>>> non-IP multicast usage, then it should set it to 1 with the rest of >>>>> those who aren't snooping the multicast control protocols. >>>>> >>>>> I somewhat doubt it's going to see much use, but it's also fairly >>>>> cheap -- as long as we already have to support IPv4 and IPv6 control >>>>> bits. (And since, if you're lazy, you can just ignore it and let the >>>>> downstream discard the unwanted packets.) >>>>> >>>>> >> I'm seeing a "something operators can set as desired", but not a reason >> they would ever want to set it. 
I particularly dislike the idea of >> filtering multicasts based on upper layer info (i.e., whether it's IP or >> not). IGMP is an optimization, but it seems like this bit could break >> things when its use wasn't needed - and I still don't see a clear need. >> >> Joe >> > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEUEARECAAYFAkowLnYACgkQE5f5cImnZru4jACXRpWDmbsqx36tYQ71mAgNAPQF > mQCfWccsjsmdv6jylLSIteO+s3M0Hto= > =VPUF > -----END PGP SIGNATURE----- > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From james.d.carlson at sun.com Thu Jun 11 05:30:45 2009 From: james.d.carlson at sun.com (James Carlson) Date: Thu, 11 Jun 2009 08:30:45 -0400 Subject: [rbridge] Tie breaking in trees In-Reply-To: <54E40085E26FCB45AAB9D3DF3F9DD3DDCFB97CDD7A@HQ-EXCH-7.corp.brocade.com> References: <4A286B34.3080703@sun.com> <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> <4A2C503E.2050506@cisco.com> <18989.2863.515852.713148@gargle.gargle.HOWL> <54E40085E26FCB45AAB9D3DF3F9DD3DDCFB97CDD7A@HQ-EXCH-7.corp.brocade.com> Message-ID: <18992.63733.200516.460641@gargle.gargle.HOWL> Anoop Ghanwani writes: > Likewise, since we compute a tree for multicast, it > would make sense to only allow one of those. Yep; that's what has to happen. > If someone really wants to have the multiple links > active, then they can use link aggregation. The only significant problem with that assertion is that it assumes that the links are of a sort that can be aggregated. I know that the initial deployments (ours included) will all be TRILL-over-Ethernet, but I do think it's a bad idea to assume that this will _always_ be true. Protocols (good ones at least) tend to have long lives, and need to adapt. 
(A minor problem is that handling aggregation by LACP is a pain. It requires manual configuration and maintenance. If we detect and handle aggregates appropriately in TRILL, then that's one more selling point for RBridges over regular bridges: you just wire them up, and they work. No questions asked.) -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From james.d.carlson at Sun.COM Thu Jun 11 06:37:18 2009 From: james.d.carlson at Sun.COM (James Carlson) Date: Thu, 11 Jun 2009 09:37:18 -0400 Subject: [rbridge] # of tree issues: default # of distribution trees, minimum acceptable supported # of trees In-Reply-To: <4A302D70.3030800@cisco.com> References: <4A283579.9080908@sun.com> <4A302D70.3030800@cisco.com> Message-ID: <18993.2190.754702.354521@gargle.gargle.HOWL> Dinesh G Dutt writes: > I don't like (c) at all. (b) is a fine solution and I believe it is > already there. > > I don't have a strong opinion on the min number of multicast trees that > an Rbridge MUST support. But asking for anything more than 1 seems > over-zealous, (b) sounds like the right answer: the number specified can be no greater than the *minimum* supported by any RBridge. -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From touch at ISI.EDU Thu Jun 11 07:37:51 2009 From: touch at ISI.EDU (Joe Touch) Date: Thu, 11 Jun 2009 07:37:51 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"?
In-Reply-To: <4A309057.3070909@cisco.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> <4A302D0F.8000100@cisco.com> <4A302E76.4060405@isi.edu> <4A309057.3070909@cisco.com> Message-ID: <4A3116BF.8090201@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dinesh G Dutt wrote: > Joe, > > I see your point. I don't know the basis for this requirement. If an > operator wanted to turn off all non-IP multicast, wouldn't turning on > this bit be a good idea ? I guess your question is in what situations > can an operator make this decision. Exactly. > Furthermore, I don't believe > existing 802.1Q bridges provide such a funtionality. So, maybe getting > rid of this bit is a good idea as you suggest. > > Let me check with Dino and see if he remembers the need for this bit, Thanks. That'd be useful. Joe > > Dinesh > Joe Touch wrote: > > > Dinesh G Dutt wrote: > >>>> There is a L2 protocol called MMRP (used to be called GMRP) that allows >>>> me to do the equivalent of IGMP but for L2-only multicast (actually it >>>> can also be used for IP multicast). So, it should be possible for an >>>> Rbridge to say "don't send me any L2 multicast that isn't derived from >>>> IP multicast". >>>> > > I don't see why these two statements are related. If MMRP can also be > used for IP multicast or not, then how would an operator ever know when > to set this bit to "off"? > > I'm suggesting that if we can't give clear advice to an operator on when > to set each value of a bit, we should not include it. 
> > Joe > > >>>> James Carlson wrote: >>>> >>>> >>>>>>> Joe Touch writes: >>>>>>> >>>>>>>> Donald Eastlake wrote: >>>>>>>> >>>>>>>>> This was put >>>>>>>>> through a consensus call on the working group mailing list >>>>>>>>> resulting >>>>>>>>> in the formal consensus determination here: >>>>>>>>> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. >>>>>>>>> >>>>>>>> Besides your mail, there was only one post from James Carlson >>>>>>>> endorsing >>>>>>>> the idea: >>>>>>>> http://www.postel.org/pipermail/rbridge/2007-July/002400.html >>>>>>>> >>>>>>> Just to make clear (which itself might be impossible at this point): >>>>>>> the reason I supported it was for symmetry with the other multicast- >>>>>>> optimizing bits already defined. If the implementation has some >>>>>>> reason to know that it has useful local information about non-IP >>>>>>> multicasts in use (e.g., the subnet in question runs only IP or >>>>>>> perhaps is known to use GMRP for all multicast addresses), then it >>>>>>> can >>>>>>> set or reset the flag as needed. If it doesn't (or can't) know about >>>>>>> non-IP multicast usage, then it should set it to 1 with the rest of >>>>>>> those who aren't snooping the multicast control protocols. >>>>>>> >>>>>>> I somewhat doubt it's going to see much use, but it's also fairly >>>>>>> cheap -- as long as we already have to support IPv4 and IPv6 control >>>>>>> bits. (And since, if you're lazy, you can just ignore it and let the >>>>>>> downstream discard the unwanted packets.) >>>>>>> >>>> I'm seeing a "something operators can set as desired", but not a reason >>>> they would ever want to set it. I particularly dislike the idea of >>>> filtering multicasts based on upper layer info (i.e., whether it's IP or >>>> not). IGMP is an optimization, but it seems like this bit could break >>>> things when its use wasn't needed - and I still don't see a clear need. 
>>>> >>>> Joe >>>> > _______________________________________________ > rbridge mailing list > rbridge at postel.org > http://mailman.postel.org/mailman/listinfo/rbridge > >> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkoxFr8ACgkQE5f5cImnZrutDwCbBuwVDzPZgxdE/AB0ZcS+m0SQ ALgAn39Af0EuNqlfya38OGbfuJowMqGE =s9As -----END PGP SIGNATURE----- From ddutt at cisco.com Thu Jun 11 07:58:24 2009 From: ddutt at cisco.com (Dinesh G Dutt) Date: Thu, 11 Jun 2009 07:58:24 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A3116BF.8090201@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> <4A302D0F.8000100@cisco.com> <4A302E76.4060405@isi.edu> <4A309057.3070909@cisco.com> <4A3116BF.8090201@isi.edu> Message-ID: <4A311B90.9000902@cisco.com> I checked and we don't need this bit. It was something that was put in for use in a network where MMRP was widely deployed, kind of thing. But as you pointed out, we can prune multicast effectively using the normal multicast pruning and don't need this bit. Dinesh Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > > > Dinesh G Dutt wrote: > >> Joe, >> >> I see your point. I don't know the basis for this requirement. If an >> operator wanted to turn off all non-IP multicast, wouldn't turning on >> this bit be a good idea ? I guess your question is in what situations >> can an operator make this decision. >> > > Exactly. > > >> Furthermore, I don't believe >> existing 802.1Q bridges provide such a funtionality. So, maybe getting >> rid of this bit is a good idea as you suggest. 
>> >> Let me check with Dino and see if he remembers the need for this bit, >> > > Thanks. That'd be useful. > > Joe > > >> Dinesh >> Joe Touch wrote: >> >> >> Dinesh G Dutt wrote: >> >> >>>>> There is a L2 protocol called MMRP (used to be called GMRP) that allows >>>>> me to do the equivalent of IGMP but for L2-only multicast (actually it >>>>> can also be used for IP multicast). So, it should be possible for an >>>>> Rbridge to say "don't send me any L2 multicast that isn't derived from >>>>> IP multicast". >>>>> >>>>> >> I don't see why these two statements are related. If MMRP can also be >> used for IP multicast or not, then how would an operator ever know when >> to set this bit to "off"? >> >> I'm suggesting that if we can't give clear advice to an operator on when >> to set each value of a bit, we should not include it. >> >> Joe >> >> >> >>>>> James Carlson wrote: >>>>> >>>>> >>>>> >>>>>>>> Joe Touch writes: >>>>>>>> >>>>>>>> >>>>>>>>> Donald Eastlake wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> This was put >>>>>>>>>> through a consensus call on the working group mailing list >>>>>>>>>> resulting >>>>>>>>>> in the formal consensus determination here: >>>>>>>>>> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Besides your mail, there was only one post from James Carlson >>>>>>>>> endorsing >>>>>>>>> the idea: >>>>>>>>> http://www.postel.org/pipermail/rbridge/2007-July/002400.html >>>>>>>>> >>>>>>>>> >>>>>>>> Just to make clear (which itself might be impossible at this point): >>>>>>>> the reason I supported it was for symmetry with the other multicast- >>>>>>>> optimizing bits already defined. If the implementation has some >>>>>>>> reason to know that it has useful local information about non-IP >>>>>>>> multicasts in use (e.g., the subnet in question runs only IP or >>>>>>>> perhaps is known to use GMRP for all multicast addresses), then it >>>>>>>> can >>>>>>>> set or reset the flag as needed. 
If it doesn't (or can't) know about >>>>>>>> non-IP multicast usage, then it should set it to 1 with the rest of >>>>>>>> those who aren't snooping the multicast control protocols. >>>>>>>> >>>>>>>> I somewhat doubt it's going to see much use, but it's also fairly >>>>>>>> cheap -- as long as we already have to support IPv4 and IPv6 control >>>>>>>> bits. (And since, if you're lazy, you can just ignore it and let the >>>>>>>> downstream discard the unwanted packets.) >>>>>>>> >>>>>>>> >>>>> I'm seeing a "something operators can set as desired", but not a reason >>>>> they would ever want to set it. I particularly dislike the idea of >>>>> filtering multicasts based on upper layer info (i.e., whether it's IP or >>>>> not). IGMP is an optimization, but it seems like this bit could break >>>>> things when its use wasn't needed - and I still don't see a clear need. >>>>> >>>>> Joe >>>>> >>>>> >> _______________________________________________ >> rbridge mailing list >> rbridge at postel.org >> http://mailman.postel.org/mailman/listinfo/rbridge >> >> > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (MingW32) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iEYEARECAAYFAkoxFr8ACgkQE5f5cImnZrutDwCbBuwVDzPZgxdE/AB0ZcS+m0SQ > ALgAn39Af0EuNqlfya38OGbfuJowMqGE > =s9As > -----END PGP SIGNATURE----- > > -- We make our world significant by the courage of our questions and by the depth of our answers. - Carl Sagan From james.d.carlson at sun.com Thu Jun 11 08:02:18 2009 From: james.d.carlson at sun.com (James Carlson) Date: Thu, 11 Jun 2009 11:02:18 -0400 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? 
In-Reply-To: <4A311B90.9000902@cisco.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> <4A302D0F.8000100@cisco.com> <4A302E76.4060405@isi.edu> <4A309057.3070909@cisco.com> <4A3116BF.8090201@isi.edu> <4A311B90.9000902@cisco.com> Message-ID: <18993.7290.146276.137183@gargle.gargle.HOWL> Dinesh G Dutt writes: > I checked and we don't need this bit. It was something that was put in > for use in a network where MMRP was widely deployed, kind of thing. But > as you pointed out, we can prune multicast effectively using the normal > multicast pruning and don't need this bit. ... and I'm happy with or without the bit. It's not important. -- James Carlson, Solaris Networking Sun Microsystems / 35 Network Drive 71.232W Vox +1 781 442 2084 MS UBUR02-212 / Burlington MA 01803-2757 42.496N Fax +1 781 442 1677 From touch at ISI.EDU Thu Jun 11 14:23:24 2009 From: touch at ISI.EDU (Joe Touch) Date: Thu, 11 Jun 2009 14:23:24 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <18993.7290.146276.137183@gargle.gargle.HOWL> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <18992.3111.509016.143173@gargle.gargle.HOWL> <4A30122C.9020609@isi.edu> <4A302D0F.8000100@cisco.com> <4A302E76.4060405@isi.edu> <4A309057.3070909@cisco.com> <4A3116BF.8090201@isi.edu> <4A311B90.9000902@cisco.com> <18993.7290.146276.137183@gargle.gargle.HOWL> Message-ID: <4A3175CC.3020406@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 AOK - I'd like to make a motion to remove the bit. 
I'd like to encourage a quick check that all other bits have clear indications as to when an administrator would want to set/clear them, based on either existing practice or specific guidance. I.e., it should be more than just saying "here's what the bit does". Thanks, all, for the exchange on this. Joe James Carlson wrote: > Dinesh G Dutt writes: >> I checked and we don't need this bit. It was something that was put in >> for use in a network where MMRP was widely deployed, kind of thing. But >> as you pointed out, we can prune multicast effectively using the normal >> multicast pruning and don't need this bit. > > ... and I'm happy with or without the bit. It's not important. > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkoxdcwACgkQE5f5cImnZrvORgCgsVyQ9YqKzTS+NFQ8zR1rOUO9 mqUAoO1vS3uNDjcWgzk/NzWG1aPYLElL =9mB0 -----END PGP SIGNATURE----- From d3e3e3 at gmail.com Thu Jun 11 20:50:54 2009 From: d3e3e3 at gmail.com (Donald Eastlake) Date: Thu, 11 Jun 2009 23:50:54 -0400 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A2FF2D8.4070902@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> Message-ID: <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com> Hi Joe, Sorry for the delay in reply. See below... On Wed, Jun 10, 2009 at 1:52 PM, Joe Touch wrote: > Donald Eastlake wrote: >> The Other Multicast bit was added after Dino Farinacci advocated it at >> the last Chicago meeting. (See minutes: >> http://www.ietf.org/proceedings/07jul/minutes/trill.txt) > > The minutes indicate a suggestion to put the bit in, but not a clear > reason why it's needed. 
The notes only indicate that this was a > "suggestion", then declare "consensus to be confirmed on the list". > > The post you made on the email list in July indicates consensus at the > meeting, even though that wasn't indicated in the minutes: > http://www.postel.org/pipermail/rbridge/2007-July/002399.html The minutes are not a transcript or blow-by-blow description. As I recall, Dino Farinacci was at the meeting. There was discussion of the previously single multicast router attachment bit and how it should be split into separate IPv4 and IPv6 multicast router attachment bits, which attract all IPv4 derived and IPv6 derived multicast frames, respectively. Dino spoke of how you should do whatever you can to control the flooding of multicast traffic and asked if a similar bit could be added for the non-IP derived frames. As the minutes say: "This met with general agreement." As I recall, there appeared to be virtually unanimous agreement among those in the room. In fact, agreement was so strong that someone, Radia if I remember correctly, asked Dino if he thought we should add a fourth configuration bit that would indicate whether the RBridge wanted Broadcast frames. Dino replied that he thought that would be a bad idea as, by definition, Broadcast frames are supposed to go to all stations. (All of this is actually per VLAN.) >> This was put >> through a consensus call on the working group mailing list resulting >> in the formal consensus determination here: >> http://www.postel.org/pipermail/rbridge/2007-September/002470.html. > > Besides your mail, there was only one post from James Carlson endorsing > the idea: > http://www.postel.org/pipermail/rbridge/2007-July/002400.html > > Two months later you declare consensus based on "plenty of support and > almost no opposition". There was no discussion on the mailing list. > There was not really "plenty of support" on the mailing list.
> http://www.postel.org/pipermail/rbridge/2007-September/002470.html Consensus determination is not based solely on the working group mailing list. And, of course, it cannot be based solely on opinions at a face-to-face working group meeting. But chairs are entitled to take both into account. Even the strongest support at a meeting must be tested on the mailing list. In this case, what appeared to be quite strong support at the meeting was confirmed on the mailing list by the lack of any opposition on the mailing list and it is just icing on the cake that there was additional support on the mailing list. I stand by the consensus determination that Erik and I made and that, overall, there was plenty of support in the working group. >> This bit has been in the draft since version -06. It defaults to "on" >> so, unless you go to some effort to configure it to off, all RBridges >> do get all the non-IP derived multicast traffic for the VLANs they >> advertise they are connected to. It doesn't have any effect on layer 2 >> or TRILL control frames. > > The question is "why do we need this complexity"? I appreciate it won't > cause problems if defaulted 'on', but having a bit means making sure > it's implemented and tested for vendors. It provides another tool for controlling the burden of multicast traffic with a per VLAN granularity. Having it in the protocol imposes some costs. Different people may differ in judgement as to the balance of these factors and whether the bit should be in the protocol. >> In light of all this, I am very reluctant to consider changing this >> part of the design unless there is a clear consensus to re-open the >> issue. > > Well, there's just as much consensus on the mailing list to open the > issue (my post) as there was to reach consensus to insert the bit in the > first place ;-) The consensus to add this bit was based on overwhelming support at a meeting as confirmed by no opposition and further support on the mailing list.
> Maybe what I'm asking is at least to clarify the issue. Is it really > needed as a bit, and if so, what is the impact of changing it from the > default value? No, it's not "needed" in a strict sense. Neither is unicast. You could just handle everything as if it were broadcast. Like almost everything, the bit has pluses and minuses. When you ask the impact of changing it from its default value of "on" to "off", for one or more VLANs, are you saying that the protocol specification is ambiguous? The optional multicast optimization in TRILL applies only to IP derived multicast. The IPv4 and IPv6 multicast router attachment bits attract (per VLAN) all IPv4 and IPv6 derived multicast traffic. The Other Multicast bit attracts all other multicast traffic and defaults to on. In the absence of this bit, there would be no way, within the TRILL protocol, to control non-IP derived multicast and all of it would have to be sent to all RBridges in the same VLAN as the traffic. With the bit, it is possible to clear the bit and stop non-IP derived multicast from being decapsulated at the RBridge or even getting to the RBridge if there are no other RBridges downstream in the distribution tree being used that has the bit set. > Joe Thanks, Donald From touch at ISI.EDU Thu Jun 11 21:22:49 2009 From: touch at ISI.EDU (Joe Touch) Date: Thu, 11 Jun 2009 21:22:49 -0700 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com> Message-ID: <4A31D819.4000801@isi.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Donald Eastlake wrote: ...
> In the absence of this bit, there would be no way, within the TRILL > protocol, to control non-IP derived multicast and all of it would have > to be sent to all RBridges in the same VLAN as the traffic. With the > bit, it is possible to clear the bit and stop non-IP derived multicast > from being decapsulated at the RBridge or even getting to the RBridge > if there are no other RBridges downstream in the distribution tree > being used that has the bit set. I have heard repeatedly what this does. The question we've had no response to is: why would an operator want to do this? In the absence of this bit, there would be no way to do something in TRILL that isn't supported in existing L2 bridges (reportedly). If the bit were set by an operator, there's the potential for silent malfunction - where the operator thought there were no rbridges downstream, but then that changed. If this sort of thing isn't needed in existing L2s, why should it be part of rbridges? Joe -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkox2BkACgkQE5f5cImnZruTxgCgxo5/rWy05RL1kdaBSVdvCohS uq8An1YcA3fWskN2fywv2dobfgK/YtPN =J/Dy -----END PGP SIGNATURE----- From d3e3e3 at gmail.com Thu Jun 11 21:23:35 2009 From: d3e3e3 at gmail.com (Donald Eastlake) Date: Fri, 12 Jun 2009 00:23:35 -0400 Subject: [rbridge] Tie breaking in trees In-Reply-To: <18992.63733.200516.460641@gargle.gargle.HOWL> References: <4A286B34.3080703@sun.com> <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> <4A2C503E.2050506@cisco.com> <18989.2863.515852.713148@gargle.gargle.HOWL> <54E40085E26FCB45AAB9D3DF3F9DD3DDCFB97CDD7A@HQ-EXCH-7.corp.brocade.com> <18992.63733.200516.460641@gargle.gargle.HOWL> Message-ID: <1028365c0906112123l516389d6n4b535afab61a1eea@mail.gmail.com> It seems to me that supporting the ability to accept multi-destination frames on more than one parallel link, under conditions when this is safe (basically that the 
links are point-to-point), can be an option. RBridges would have to support transmission and reception on the one link the tie-breaking criterion chooses. So it would always be safe to send on that one link. If an RBridge supports reception on all the parallel links for which it is safe, this can be announced as a supported option (an option that would not have to be expressed in the frames that might be multi-pathed over those links). Thanks, Donald ============================= Donald E. Eastlake 3rd +1-508-634-2066 (home) 155 Beaver Street Milford, MA 01757 USA d3e3e3 at gmail.com From d3e3e3 at gmail.com Sun Jun 14 13:03:28 2009 From: d3e3e3 at gmail.com (Donald Eastlake) Date: Sun, 14 Jun 2009 16:03:28 -0400 Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"? In-Reply-To: <4A31D819.4000801@isi.edu> References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com> <4A31D819.4000801@isi.edu> Message-ID: <1028365c0906141303y7ecafe1cu3d1cc6d3a3b5d9ca@mail.gmail.com> On Fri, Jun 12, 2009 at 12:22 AM, Joe Touch wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > ... > > In the absence of this bit, there would be no way to do something in > TRILL that isn't supported in existing L2 bridges (reportedly). The L2 multicast registration protocols (GMRP/MMRP), as far as I can see, allow you to register individual multicast addresses or to register for all multicast traffic. All of the TRILL RBridge per VLAN multicast bits (IPv4 and IPv6 derived and Other) do something that you can't do with GMRP/MMRP in requesting huge blocks of multicast addresses.
> If the bit were set by an operator, there's the potential for silent
> malfunction - where the operator thought there were no rbridges
> downstream, but then that changed.

I don't think any of these three bits has any effect on multicast
frames flowing downstream in a distribution tree. If an RBridge is
doing multicast pruning, these bits just make that RBridge act so as
to assure that all multicast in the three categories gets to any
RBridges it is directly or indirectly feeding...

> ...
>
> Joe

Thanks,
Donald

From Radia.Perlman at sun.com Sun Jun 14 13:05:56 2009
From: Radia.Perlman at sun.com (Radia Perlman)
Date: Sun, 14 Jun 2009 13:05:56 -0700
Subject: [rbridge] Tie breaking in trees
In-Reply-To: <1028365c0906112123l516389d6n4b535afab61a1eea@mail.gmail.com>
References: <4A286B34.3080703@sun.com> <1028365c0906051453hc0d3cf4m3af089f0ba56a9e5@mail.gmail.com> <4A2C503E.2050506@cisco.com> <18989.2863.515852.713148@gargle.gargle.HOWL> <54E40085E26FCB45AAB9D3DF3F9DD3DDCFB97CDD7A@HQ-EXCH-7.corp.brocade.com> <18992.63733.200516.460641@gargle.gargle.HOWL> <1028365c0906112123l516389d6n4b535afab61a1eea@mail.gmail.com>
Message-ID: <4A355824.9090202@sun.com>

So it seems to me that there is some discomfort with requiring
acceptance on multiple links, but as Anoop and Donald pointed out,
this feature could be implemented outside of the spec by
a) doing pairwise negotiation for link aggregation, or
b) negotiating the option of load splitting multicast on multiple
pt-to-pt links with the neighbor, since it doesn't affect any RBridges
other than the pair attached to the multiple pt-to-pt links.

So, it seems like the wording I suggested in the original post on this
thread would be acceptable, since I don't think the wording precludes
either of the above ways of doing multicast load splitting.
So to review, the wording I'd suggest for replacing that one paragraph is:

"If the tree-building and tie-breaking for a particular tree selects a
non-pseudonode link between R1 and R2, that "R1-R2" link might consist
of multiple links. These parallel links would be visible to R1 and R2,
but not to the rest of the campus (because the links are not
represented by pseudonodes). If this bundle of parallel links is
included in a tree, it is important for R1 and R2 to decide which link
to use, but is irrelevant to other RBridges, and therefore, the
tie-breaking algorithm need not be visible to any RBridges other than
R1 and R2. In this case, R1-R2 adjacencies are ordered as follows,
with the one "most preferred" adjacency being the one that R1
transmits to R2 on, and the one that R2 accepts traffic from R1 on:

a) most preferred are those established by P2P Hellos, with
tie-breaking among those based on preferring the one with the
numerically highest Extended Circuit ID.

b) next considered are those established through TRILL-Hellos, with
suppressed pseudonodes. Note that the pseudonode is suppressed in
LSPs, but still appears in the TRILL-Hello, and therefore is available
for this tie-breaking. Among these links, the one with the numerically
largest pseudonode ID is preferred".

Donald Eastlake wrote:
> It seems to me that supporting the ability to accept multi-destination
> frames on more than one parallel link, under conditions when this is
> safe (basically that the links are point-to-point), can be an option.
> RBridges would have to support transmission and reception on the one
> link the tie-breaking criterion choose. So it would always be safe to
> send on that one link. If an RBridge supports reception on all the
> parallel links for which it is safe, this can be announced as a
> supported option (an option that would not have to be expressed in the
> frames that might be multi-pathed over those links).
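The adjacency ordering in Radia's proposed wording (rules a and b) could be sketched roughly as below. This is only an illustration, not text from any draft: the `Adjacency` class, its field names, and the port labels are hypothetical, and the sketch assumes both ends compare the same numeric ID for a given adjacency, which the wording itself does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    kind: str          # "p2p" (P2P Hellos) or "lan" (TRILL-Hellos, suppressed pseudonode)
    tie_break_id: int  # Extended Circuit ID for "p2p", pseudonode ID for "lan"
    port: str          # hypothetical local port label


def preference_key(adj):
    # Rule a) P2P adjacencies sort ahead of TRILL-Hello adjacencies.
    # Within a class, the numerically larger ID is preferred.
    return (0 if adj.kind == "p2p" else 1, -adj.tie_break_id)


def most_preferred(adjacencies):
    """Return the single adjacency used for this R1-R2 tree link."""
    return min(adjacencies, key=preference_key)


parallel = [
    Adjacency("lan", 900, "eth0"),
    Adjacency("p2p", 5, "eth1"),
    Adjacency("p2p", 17, "eth2"),
]
print(most_preferred(parallel).port)
```

Since R1 and R2 would run the same ordering over the same adjacency set, they would land on the same link without any other RBridge needing to know about the choice.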
>
> Thanks,
> Donald
> =============================
> Donald E. Eastlake 3rd +1-508-634-2066 (home)
> 155 Beaver Street
> Milford, MA 01757 USA
> d3e3e3 at gmail.com
> _______________________________________________
> rbridge mailing list
> rbridge at postel.org
> http://mailman.postel.org/mailman/listinfo/rbridge
>

From Radia.Perlman at sun.com Sun Jun 14 13:10:17 2009
From: Radia.Perlman at sun.com (Radia Perlman)
Date: Sun, 14 Jun 2009 13:10:17 -0700
Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"?
In-Reply-To: <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com>
References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com>
Message-ID: <4A355929.2080700@sun.com>

Again, trying to close up these last little issues.

Having the bit certainly isn't expensive, but it would be really good
to understand why it's there and when it would be set, so we should
probably wait for Dinesh to ask Dino.

I worry that if it's needed, it might need to be more granular than
"all" or "no" layer 2 multicast. For instance, there might wind up
being something IGMP-like in layer 2 requesting specific multicast
addresses or ranges. (There might already be such a thing.)

Radia

From d3e3e3 at gmail.com Sun Jun 14 20:36:47 2009
From: d3e3e3 at gmail.com (Donald Eastlake)
Date: Sun, 14 Jun 2009 23:36:47 -0400
Subject: [rbridge] When would an RBridge say "I don't want layer 2 multicast"?
In-Reply-To: <4A355929.2080700@sun.com>
References: <4A2836FF.60801@sun.com> <4A285660.7090505@isi.edu> <1028365c0906061053x7d427855pe8718a30fb3f052e@mail.gmail.com> <4A2AB3DF.6020305@isi.edu> <1028365c0906100818w1c458fdeu272674baac56201e@mail.gmail.com> <4A2FF2D8.4070902@isi.edu> <1028365c0906112050y46f78537y69ab347592dd234@mail.gmail.com> <4A355929.2080700@sun.com>
Message-ID: <1028365c0906142036y7a8678b4pa234446b74df784b@mail.gmail.com>

On Sun, Jun 14, 2009 at 4:10 PM, Radia Perlman wrote:
> Again, trying to close up these last little issues.
>
> Having the bit certainly isn't expensive, but it would be really good to
> understand why it's there and
> when it would be set, so we should probably wait for Dinesh to ask Dino.

I believe he has, and reports that Dino is no longer pushing for this bit.

> I worry that if it's needed, it might need to be more granular than
> "all" or "no" layer 2 multicast. For instance,
> there might wind up being something IGMP-like in layer 2 requesting
> specific multicast addresses or ranges.
> (there might already be such a thing).

As far as I can tell, there is no facility for giving ranges in the
layer-2 multicast registration protocols, only individual multicast
addresses or all multicast addresses.

> Radia

Thanks,
Donald

From erik.nordmark at sun.com Wed Jun 17 10:23:13 2009
From: erik.nordmark at sun.com (Erik Nordmark)
Date: Wed, 17 Jun 2009 10:23:13 -0700
Subject: [rbridge] Consensus to remove "Other Multicast" bit?
Message-ID: <4A392681.4070207@sun.com>

There has been some discussion about removing the Other Multicast bit
on the mailing list, with a few people wanting to see it removed and
no-one arguing to keep it.

Since the Other Multicast bit was added to the base protocol
specification based on working group consensus, it makes sense to
formally ask for WG consensus regarding its removal.

If you are in favor of keeping this bit, please speak up now.
Otherwise, we will conclude that the WG consensus has changed.
Erik and Donald

From jpickering at nc.rr.com Mon Jun 22 08:55:24 2009
From: jpickering at nc.rr.com (jeff pickering)
Date: Mon, 22 Jun 2009 08:55:24 -0700
Subject: [rbridge] dist tree parallel links - asymetric
Message-ID: <4A3FA96C.80509@nc.rr.com>

I have a distribution tree parallel links question I was hoping
someone could clarify for me.

Let's say you have 2 RBs downstream from the root, with parallel links
between them.

RB-A has cost 10 on link 1, cost 20 on link 2.
RB-B has cost 20 on link 1, cost 10 on link 2.

Let's say RB-B is downstream from RB-A. When computing its paths, RB-A
will clearly use link 1. Whether it shows 1 or 2 links in its LSPs is
irrelevant; RB-B still needs to know somehow to choose its adjacency
over link 1 for its path to the root. But in the asymmetric cost case,
I don't see how it can do this. Is there something new in either the
LSP or hello that would resolve this?

Or should RB-B use its lower cost, and if so how does the RPF issue
get addressed?

Thanks to anyone who can enlighten me.

Jeff

From ayabaner at cisco.com Mon Jun 22 10:05:27 2009
From: ayabaner at cisco.com (Ayan Banerjee)
Date: Mon, 22 Jun 2009 10:05:27 -0700
Subject: [rbridge] dist tree parallel links - asymetric
In-Reply-To: <4A3FA96C.80509@nc.rr.com>
Message-ID:

Jeff,

In TRILL, the 3-way handshake is mandatory for P2P links. Hence, each
node will get a unique extended circuit identifier after adjacency
formation. On LAN links, the source MAC can be used as an identifier
for uniqueness of each link.

I believe that P2P links are preferred over LAN links. Among the set,
in the event there are multiple "identifiers", the highest (or lowest -
I remember that there was some discussion on the list about
highest/lowest, do not recall what we ended up with) one will be
chosen for the tree.

Thanks,
Ayan

On 6/22/09 8:55 AM, "jeff pickering" wrote:
>
> I have a distribution tree parallel links question I was hoping someone
> could clarify
> for me.
>
> Let's say you have 2 RBs downstream from the root, with parallel links
> between them.
>
> RB-A has cost 10 on link 1, cost 20 on link 2.
> RB-B has cost 20 on link 1, cost 10 on link 2.
>
> Let's say RB-B is downstream from RB-A. When computing its paths, RB-A
> will clearly
> use link 1. Whether it shows 1 or 2 links in its LSPs is irrelevant,
> RB-B still needs to know
> somehow to choose its adjacency over link 1 for its path to the root.
> But in the asymmetric cost
> case, I don't see how it can do this. Is there something new in either
> the LSP or hello that
> would resolve this?
>
> Or should RB-B use its lower cost and if so how does the RPF issue get
> addressed?
>
> Thanks to anyone who can enlighten me.
>
> Jeff
>
>
> _______________________________________________
> rbridge mailing list
> rbridge at postel.org
> http://mailman.postel.org/mailman/listinfo/rbridge

From jpickering at nc.rr.com Tue Jun 23 06:15:32 2009
From: jpickering at nc.rr.com (jeff pickering)
Date: Tue, 23 Jun 2009 06:15:32 -0700
Subject: [rbridge] dist tree parallel links - proposed solution
In-Reply-To:
References:
Message-ID: <4A40D574.9090204@nc.rr.com>

Ayan,

Thanks much for all the help, but I still think there is a problem.
I'll try to rearticulate:

Scenario: RB-B is computing dist tree spf and has neighbor RB-A
towards root. RB-A has multiple ptpt links to RB-B (ptpt critical to
issue) and those links are UNEQUAL COST. RB-B must choose RB-A's
lowest cost link (RB-B's link costs are irrelevant). RB-A announces in
its LSP only the links to RB-B and their costs (unlike a bcast subnet,
where pnode ID would also be included). It does not include any info
that might enable RB-B to identify which link the announced cost is
associated with. Likewise, the 3way hello contains nothing that allows
mapping remote cost to adjacency. Tiebreaking rules based on circuit
ID don't apply because there is no tie.

(note that the wide metric tlv contains only system ID in its 7 octet
link info).
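As a rough illustration of the gap described above, the following sketch models what RB-B can and cannot match. It is not any implementation's data model: the class names, field names, and the "RB-A.01" style pseudonode labels are hypothetical, and the only grounded assumption is that an advertised neighbor entry carries a neighbor ID (system ID plus pseudonode octet) and a metric, with no circuit identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdvertisedLink:
    advertiser: str   # system ID of the RBridge that originated the LSP
    neighbor_id: str  # system ID plus pseudonode octet, e.g. "RB-B.00"
    metric: int


@dataclass(frozen=True)
class LocalAdjacency:
    port: str
    neighbor: str          # neighbor's system ID, learned from Hellos
    lan_id: Optional[str]  # pseudonode ID on a broadcast link, None on ptpt


def ports_matching(link, local_id, adjacencies):
    """Return the local ports that an advertised link could correspond to."""
    ports = []
    for adj in adjacencies:
        if adj.lan_id is not None:
            # Broadcast: the entry names the pseudonode, which is unique per LAN.
            if link.neighbor_id == adj.lan_id:
                ports.append(adj.port)
        elif adj.neighbor == link.advertiser and link.neighbor_id == local_id + ".00":
            # ptpt: the entry names only this system, not a particular circuit.
            ports.append(adj.port)
    return ports


# RB-B's view of two parallel ptpt links to RB-A.
ptpt_adjs = [LocalAdjacency("port1", "RB-A", None),
             LocalAdjacency("port2", "RB-A", None)]
cheap = AdvertisedLink("RB-A", "RB-B.00", 10)
print(ports_matching(cheap, "RB-B", ptpt_adjs))

# The same two links modelled as broadcast links with distinct pseudonodes.
lan_adjs = [LocalAdjacency("port1", "RB-A", "RB-A.01"),
            LocalAdjacency("port2", "RB-A", "RB-A.02")]
lan_link = AdvertisedLink("RB-A", "RB-A.01", 10)
print(ports_matching(lan_link, "RB-B", lan_adjs))
```

In the ptpt case the cost-10 entry matches both ports, so RB-B cannot tell which of its adjacencies RB-A considers cheapest; in the broadcast case the pseudonode ID pins the entry to one port.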
RB-B could make a choice based on its own link costs, but that would
be a problem for RPF. So it seems additional information is needed. I
see three options:

1) add some circuit ID info to the subtlv of the wide metric ISN link.
2) add cost info to the new link capability tlv.
3) enforce some relationship between circuit ID and cost such that
   higher cost means less preferred circuit ID. Could then use circuit
   ID for a decision.

3) seems kind of hokey to me, and since the issue is purely local to
RB-A and RB-B, I would prefer option 2.

Of course if I'm out to lunch, I'd love to hear why.

Regards,
Jeff

Ayan Banerjee wrote:
> Jeff,
>
> At nodes RB-A and RB-B, where this information is necessary during the SPF
> run, LSP link information, adjacency information, metric information etc all
> values are present.
>
> Agree that other nodes do not have the complete information, however, they
> only care about the cost of the link between RB-A and RB-B. The link (among
> the parallel links) to be used between RB-A and RB-B is needed locally and
> complete information is present at those two nodes.
>
> Thanks,
> Ayan
>
> On 6/22/09 1:09 PM, "jeff pickering" wrote:
>
>> Ayan,
>>
>> How? There is no cost info in the 3way hello/handshake and no 3way cid
>> related info
>> in the LSP ISN link (wide narrow or otherwise).
>>
>> Thanks,
>> Jeff
>>
>> Ayan Banerjee wrote:
>>
>>> Jeff,
>>>
>>> Agreed. However, from the adjacency state machine (since 3-way is mandatory
>>> in TRILL) one can figure this information.
>>>
>>> Thanks,
>>> Ayan
>>>
>>> On 6/22/09 12:34 PM, "jeff pickering" wrote:
>>>
>>>> For the local matter of RB-B making its adjacency set, it has 2
>>>> adjacencies to RB-A and it
>>>> must choose one. The one it must choose is that represented by the
>>>> lowest cost of link of
>>>> the upstream node (RB-A).
But for a ptpt link, there is nothing in the >>>> LSP that would allow >>>> RB-B to determine which one of its adjacencies corresponded/mapped to >>>> that link. >>>> >>>> Thanks, >>>> Jeff >>>> >>>> >>>> >>>> Ayan Banerjee wrote: >>>> >>>> >>>>> Jeff, >>>>> >>>>> On re-reading your email, I see that you have unequal costs. The costs from >>>>> the upstream node will be honored for the bi-directional tree. >>>>> >>>>> On a separate note, for equal cost multi-path, the link that needs to be >>>>> used between nodes RB-A and RB-B are a local matter. Other nodes will just >>>>> "view" the tree as being connected between RB-A and RB-B. I was trying to >>>>> state how the "local" algorithm is made deterministic. >>>>> >>>>> Thanks, >>>>> Ayan >>>>> >>>>> >>>>> On 6/22/09 11:49 AM, "jeff pickering" wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> Im sorry if Im being obtuse. This is a non-equal cost issue. RB-A will >>>>>> advertise in its >>>>>> LSP ISN entry a link to RB-B and its associated cost, but no (ext/cid >>>>>> for ptpt) info >>>>>> which allows RB-B on the other end to determine which one of its >>>>>> adjacencies (and therefore >>>>>> ports) the advertised link corresponds to. I understand for bcast, you >>>>>> can determine this >>>>>> from the pnode id part on the ISN link, but dont see what info you have >>>>>> have for ptpt. >>>>>> (unless I misread the wide metrics spec 5305). >>>>>> >>>>>> Jeff >>>>>> >>>>>> >>>>>> Ayan Banerjee wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Jeff, >>>>>>> >>>>>>> In TRILL 3-way handshake is mandatory for P2P links. Hence, each node >>>>>>> will >>>>>>> get an unique extended circuit identifier after adjacency formation. On >>>>>>> LAN >>>>>>> links, the source mac can be used an identifier for uniqueness of each >>>>>>> link. >>>>>>> >>>>>>> I believe that P2P links are preferred over LAN links. 
Among the set, in >>>>>>> the >>>>>>> event there are multiple "identifiers" then the highest (or lowest - I >>>>>>> remember that there was some discussion on the list about highest/lowest, >>>>>>> do >>>>>>> not recall what we ended up with) one will be chosen for the tree. >>>>>>> >>>>>>> Thanks, >>>>>>> Ayan >>>>>>> >>>>>>> On 6/22/09 8:55 AM, "jeff pickering" wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> I have a distribution tree parallel links question I was hoping someone >>>>>>>> could clarify >>>>>>>> for me. >>>>>>>> >>>>>>>> Let's say you have 2 RBs downstream from the root, with parallel links >>>>>>>> between them. >>>>>>>> >>>>>>>> RB-A has cost 10 on link 1, cost 20 on link 2. >>>>>>>> RB-B has cost 20 on link 1, cost 10 on link 2. >>>>>>>> >>>>>>>> Lets say RB-B is downstream from RB-A. When computing its paths, RB-A >>>>>>>> will clearly >>>>>>>> use link 1. Whether it shows 1 or 2 links in its LSPs is irrelevant, >>>>>>>> RB-B still needs to know >>>>>>>> somehow to choose its adjacency over link 1 for its path to the root. >>>>>>>> But in the asymetric cost >>>>>>>> case, I dont see how its can do this. Is there something new in either >>>>>>>> the LSP or hello that >>>>>>>> would resolve this? >>>>>>>> >>>>>>>> Or should RB-B use its lower cost and if so how does the RPF issue get >>>>>>>> addressed? >>>>>>>> >>>>>>>> Thanks to anyone who can enlighten me. 
>>>>>>>> >>>>>>>> Jeff >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> rbridge mailing list >>>>>>>> rbridge at postel.org >>>>>>>> http://mailman.postel.org/mailman/listinfo/rbridge >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>> >>> >>> > > > > From Radia.Perlman at sun.com Tue Jun 23 09:49:23 2009 From: Radia.Perlman at sun.com (Radia Perlman) Date: Tue, 23 Jun 2009 09:49:23 -0700 Subject: [rbridge] dist tree parallel links - proposed solution In-Reply-To: <4A40D574.9090204@nc.rr.com> References: <4A40D574.9090204@nc.rr.com> Message-ID: <4A410793.9020100@sun.com> Let me see if I understand your question by restating it. You are concerned, when calculating the multicast distribution tree from root R, that all RBridges calculate the same tree, and think there might be a problem if someplace in the tree there is a link with asymmetric costs in the two directions. Is that what you are concerned about? I don't think there's a problem with having different bridges calculate the tree rooted at R differently, because they all use the same tree-building algorithm, starting with R, searching for paths shortest path first. But there is an interesting issue. I believe, when building the tree, looking at the newly treed node, say R1, that it is R1's reported cost to each of R1's neighbors that are used, not neighbor's cost to R1 (other than the "two way connectivity check", which I assume only checks that they both report the link exists, but not whether the costs are the same). So...... Suppose the link costs really are very different in the two directions for some link. For unicast, when R is calculating a shortest path tree from itself to each destination, it makes sense to use the link cost outwards from R, and the cost in the reverse direction is irrelevant. However, for multicast, it is a bidirectional tree being built. 
So, for instance, if some node R7 might get put into the tree as a child of R3, where the link cost R3-R7 is, say cost 2, but the reverse cost is, say, 4.5 gazillion, or R7 might get put into the tree as a child of R4, where the link cost R4-R7 and R7-R4 is a reasonable cost in both directions of, say, 18, then placing R7 as a child of R4 would be a better choice than placing R7 as a child of R3. I don't think we really want to worry about this, but it is interesting. If we were worried about it, we could use, as the cost of a link between R1 and R2, the *average* of the costs in each direction, for building the multicast tree. If there really were links with wildly different costs in the two directions, it might even be the right thing to do. Radia jeff pickering wrote: > Ayan, > > Thanks much for all the help, but I still think there is a problem. I'll > try to rearticulate: > > Scenario: RB-B is computing dist tree spf and has neighbor RB-A towards > root. > RB-A has multiple ptpt links to RB-B (ptpt critical to issue) and those > links are UNEQUAL COST. > RB-B must choose RB-A's lowest cost link (RB-Bs links costs are irrelevant.) > RB-A announces in its LSP only the links to RB-B and their costs. > (unlike a bcast subnet > where pnode ID would also be included). It does not include any info > that might enable > RB-B to identify which link the announced cost is associated with. > Likewise, the 3way hello > contains nothing that allows mapping remote cost to adjacency. > Tiebreaking rules based on > circuit ID dont apply because there is no tie. > > (note that the wide metric tlv contains only system ID in its 7 octet > link info). > > RB-B could make a choice based on its own link costs, but that would be > a problem for RPF. > So its seems additional information is needed. I see two options: > > 1) add some ciruict ID info to the subtlv of the wide metric ISN link. > 2) add cost info to the new link capability tlv. 
> 3) enforce some relationship between circuit ID and cost such that
> higher cost
> means less preferred circuit ID. Could then use circuit ID for a
> decision.
>
> 3) seems kink of hokey to me.
> and since the issue is purely local to RB-A and RB-B, I would prefer
> option 2.
>
> Of course if Im out to lunch, I'd love to hear why.
>
> Regards,
> Jeff

From jpickering at nc.rr.com Tue Jun 23 10:54:03 2009
From: jpickering at nc.rr.com (jeff pickering)
Date: Tue, 23 Jun 2009 10:54:03 -0700
Subject: [rbridge] dist tree parallel links - proposed solution
In-Reply-To: <4A410793.9020100@sun.com>
References: <4A40D574.9090204@nc.rr.com> <4A410793.9020100@sun.com>
Message-ID: <4A4116BB.7050602@nc.rr.com>

Actually, I was primarily concerned about RPF check failures.

Let's say R7 just got put into the multicast tree as a child of R4 (R7
is the one running the SPF here). There are multiple PTPT links
between R4 and R7. If the link costs differ, R4 will perform its RPF
check when receiving multicast packets from R7 and toss packets that
don't arrive on its view of the lowest cost link between the two. That
implies that R7 must know what R4's lowest cost link is when
determining its upstream adjacency for the multicast tree. And for a
PTPT link, I just don't see how R7 makes that decision.

Clearly things are different if the link is broadcast, because the
link in the LSP contains a pnode ID which can be used by R7 to
determine exactly which link is implied when R4 puts a cost/ISN pair
in its LSP.

Please note that this has nothing to do with bcast links, or any
router other than R4 and R7. All routers, including R4 and R7, compute
the same tree as far as how nodes are interconnected in the tree. It's
just that R7 may pick the wrong link over which to forward to R4 and
therefore the RPF could fail.
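The failure mode described above can be sketched in a few lines. This is an illustration under stated assumptions only: the function names are hypothetical, the costs reuse the 10/20 values from the first message in this thread, and breaking ties by link name is just there to make the sketch deterministic (it is not the tie-breaker discussed on the list).

```python
def lowest_cost_link(costs):
    """Return the link with the lowest cost; ties go to the lowest link name."""
    return min(sorted(costs), key=costs.get)


def rpf_accepts(expected_link, arrival_link):
    """R4 accepts a multi-destination frame only on the link it expects."""
    return expected_link == arrival_link


# Hypothetical per-direction costs on two parallel ptpt links between R4 and R7.
r4_costs = {"link1": 10, "link2": 20}  # as seen (and advertised) by R4
r7_costs = {"link1": 20, "link2": 10}  # as seen by R7

r4_expected = lowest_cost_link(r4_costs)

# R7 choosing by its own costs picks a different link than R4 expects.
r7_by_own_costs = lowest_cost_link(r7_costs)
print(r7_by_own_costs, rpf_accepts(r4_expected, r7_by_own_costs))

# If R7 could map R4's advertised costs onto its adjacencies, the choice would agree.
r7_by_r4_costs = lowest_cost_link(r4_costs)
print(r7_by_r4_costs, rpf_accepts(r4_expected, r7_by_r4_costs))
```

The second case only works if R7 can tell which of its ports carries R4's cost-10 link, which is exactly the information argued to be missing for ptpt links.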

Jeff
>>>>>>>>>>
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> rbridge mailing list
>>>>>>>>>> rbridge at postel.org
>>>>>>>>>> http://mailman.postel.org/mailman/listinfo/rbridge
>>
>> _______________________________________________
>> rbridge mailing list
>> rbridge at postel.org
>> http://mailman.postel.org/mailman/listinfo/rbridge

From d3e3e3 at gmail.com Tue Jun 23 13:02:37 2009
From: d3e3e3 at gmail.com (Donald Eastlake)
Date: Tue, 23 Jun 2009 16:02:37 -0400
Subject: [rbridge] dist tree parallel links - proposed solution
In-Reply-To: <4A4116BB.7050602@nc.rr.com>
References: <4A40D574.9090204@nc.rr.com> <4A410793.9020100@sun.com> <4A4116BB.7050602@nc.rr.com>
Message-ID: <1028365c0906231302n423280e0q10068861b697c084@mail.gmail.com>

Hi Jeff,

Draft version -13, which should be posted in a few days, removes the
qualifier "least cost". The preferred path among a set of parallel P2P
links is chosen based only on extended circuit ID.

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-634-2066 (home)
 155 Beaver Street
 Milford, MA 01757 USA
 d3e3e3 at gmail.com

On Tue, Jun 23, 2009 at 1:54 PM, jeff pickering wrote:
>
> Actually, I was primarily concerned about RPF check failures. Let's
> say R7 just got put into the multicast tree as a child of R4 (R7 is
> the one running the SPF here). There are multiple PTPT links between
> R4 and R7. If the link costs differ, R4 will perform its RPF check
> when receiving multicast packets from R7 and toss packets that don't
> arrive on its view of the lowest cost link between the two. That
> implies that R7 must know what R4's lowest cost link is when
> determining its upstream adjacency for the multicast tree. And for a
> PTPT link, I just don't see how R7 makes that decision. Clearly things
> are different if the link is broadcast, because the link in the LSP
> contains a pnode ID which can be used by R7 to determine exactly which
> link is implied when R4 puts a cost/ISN pair in its LSP.
>
> Please note that this has nothing to do with bcast links, or any
> router other than R4 and R7. All routers, including R4 and R7, compute
> the same tree as far as how nodes are interconnected in the tree. It's
> just that R7 may pick the wrong link over which to forward to R4 and
> therefore the RPF could fail.
>
> Jeff

From jpickering at nc.rr.com Tue Jun 23 15:04:03 2009
From: jpickering at nc.rr.com (jeff pickering)
Date: Tue, 23 Jun 2009 15:04:03 -0700
Subject: [rbridge] dist tree parallel links - proposed solution
In-Reply-To: <1028365c0906231302n423280e0q10068861b697c084@mail.gmail.com>
References: <4A40D574.9090204@nc.rr.com> <4A410793.9020100@sun.com> <4A4116BB.7050602@nc.rr.com> <1028365c0906231302n423280e0q10068861b697c084@mail.gmail.com>
Message-ID: <4A415153.2090900@nc.rr.com>

Excellent. Thanks for the clarification.

Jeff

Donald Eastlake wrote:
> Hi Jeff,
>
> Draft version -13, which should be posted in a few days, removes the
> qualifier "least cost". The preferred path among a set of parallel P2P
> links is chosen based only on extended circuit ID.
>
> Thanks,
> Donald

From d3e3e3 at gmail.com Sun Jun 28 20:08:52 2009
From: d3e3e3 at gmail.com (Donald Eastlake)
Date: Sun, 28 Jun 2009 23:08:52 -0400
Subject: [rbridge] I-D ACTION:draft-eastlake-nlpid-iana-considerations-00.txt
Message-ID: <1028365c0906282008p3cafec69ndfbb2ccc66f82ba6@mail.gmail.com>

You may
be interested that the draft below has been posted and includes the
allocation of an NLPID for TRILL for such uses as in the IS-IS
Protocols Supported TLV.

Thanks,
Donald

=============================

A New Internet-Draft is available from the on-line Internet-Drafts
directories.

        Title           : IANA Considerations for NLPIDs
        Author(s)       : D. Eastlake 3rd
        Filename        : draft-eastlake-nlpid-iana-considerations-00.txt
        Pages           : 11
        Date            : 2009-6-24

Some protocols being developed or extended by the IETF make use of the
ISO/IEC Network Layer Protocol Identifier (NLPID). This document
provides NLPID IANA Considerations.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-eastlake-nlpid-iana-considerations-00.txt

From Internet-Drafts at ietf.org Mon Jun 29 14:45:01 2009
From: Internet-Drafts at ietf.org (Internet-Drafts@ietf.org)
Date: Mon, 29 Jun 2009 14:45:01 -0700 (PDT)
Subject: [rbridge] I-D ACTION:draft-ietf-trill-rbridge-protocol-13.txt
Message-ID: <20090629214501.E3B673A6C25@core3.amsl.com>

From d3e3e3 at gmail.com Tue Jun 30 06:14:44 2009
From: d3e3e3 at gmail.com (Donald Eastlake)
Date: Tue, 30 Jun 2009 09:14:44 -0400
Subject: [rbridge] Working Group Last Call: draft-ietf-trill-rbridge-protocol-13.txt
Message-ID: <1028365c0906300614x144cbb59xf1387ee0e92aa9a@mail.gmail.com>

The changes between the -12 and -13 versions of the base protocol
draft were sufficiently extensive that we have decided that another
Working Group Last Call is required. This message starts a two week
WGLC which ends July 15th.

Thanks,
Donald and Erik
=============================
 Donald E. Eastlake 3rd   +1-508-634-2066 (home)
 155 Beaver Street
 Milford, MA 01757 USA
 d3e3e3 at gmail.com

On Mon, Jun 29, 2009 at 5:45 PM, wrote:
> A New Internet-Draft is available from the on-line Internet-Drafts
> directories.
> This draft is a work item of the Transparent Interconnection of Lots
> of Links Working Group of the IETF.
>
>        Title           : Rbridges: Base Protocol Specification
>        Author(s)       : D. Eastlake 3rd, D. Dutt, S. Gai, A. Ghanwani, R. Perlman
>        Filename        : draft-ietf-trill-rbridge-protocol-13.txt
>        Pages           : 102
>        Date            : 2009-6-29
>
>   RBridges provide optimal pair-wise forwarding with zero
>   configuration, safe forwarding even during periods of temporary
>   loops, and support for multipathing of both unicast and multicast
>   traffic. They achieve these goals using IS-IS routing and
>   encapsulation of traffic with a header that includes a hop count.
>
>   RBridges are compatible with previous IEEE 802.1 customer bridges as
>   well as IPv4 and IPv6 routers and end nodes. They are as invisible to
>   current IP routers as bridges are and, like routers, they terminate
>   the bridge spanning tree protocol.
>
>   The design supports VLANs and optimization of the distribution of
>   multi-destination frames based on VLAN and IP derived multicast
>   groups. It also allows forwarding tables to be sized according to
>   the number of RBridges (rather than the number of end nodes), which
>   allows internal forwarding tables to be substantially smaller than in
>   conventional bridges.
>
> A URL for this Internet-Draft is:
> http://www.ietf.org/internet-drafts/draft-ietf-trill-rbridge-protocol-13.txt
>
> Internet-Drafts are also available by anonymous FTP at:
> ftp://ftp.ietf.org/internet-drafts/