[rbridge] Last Call comment on: http://www.ietf.org/internet-drafts/draft-ietf-trill-prob-01.txt
Silvano Gai
sgai at nuovasystems.com
Fri Oct 27 10:33:56 PDT 2006
These are my comments:
Sgai 1> The document assumes that spanning tree has transient loops.
THIS IS ABSOLUTELY FALSE. Spanning tree never causes a loop, not even
during a transition. The document assumes that, since ST has transient
loops, it is OK to have transient loops in TRILL and that they only
need to be mitigated through a TTL.
The TTL solution is OK for unicast traffic, since unicast traffic does
not replicate while in the loop and eventually the TTL will drop it or
the network will converge and the frame will be delivered.
The TTL solution is NOT OK for multicast/broadcast traffic, since this
traffic replicates while in the loop causing a broadcast/multicast
storm.
Because switches replicate in HW and have low latency, in a meshed
network, even with a moderate TTL, billions of frames will be part of
the storm within a few hundred microseconds.
These billions of frames will be queued everywhere, causing hosts to
crash, but above all they will saturate the queue of the switch CPU.
The CPU quickly becomes incapable of dealing with its queue and
incapable of receiving control frames to break the loop.
In any case, IS-IS will react in hundreds of milliseconds, while the
storm will reach its peak in hundreds of microseconds.
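A rough sketch of the replication argument, under loudly stated assumptions: a full mesh of `num_switches` switches, nothing (no tree, no pruning) limiting the flooding while the loop lasts, and every switch flooding a broadcast out of every port except the arrival port. It counts frame transmissions, not time, and the function names are illustrative only.

```python
def unicast_transmissions(ttl: int) -> int:
    """A looping unicast frame is never copied: it is sent once per hop until its TTL runs out."""
    return ttl


def broadcast_transmissions(num_switches: int, ttl: int) -> int:
    """Total broadcast copies sent in a full mesh where every switch floods
    out of all ports except the arrival port, until the TTL runs out.

    Assumption: nothing (no tree, no pruning) limits the flooding during the loop.
    """
    if num_switches < 3:
        raise ValueError("a full mesh with a loop needs at least 3 switches")
    in_flight = num_switches - 1  # hop 1: the source floods to every neighbor
    fanout = num_switches - 2     # later hops: every port except the arrival port
    total = 0
    for _ in range(ttl):
        total += in_flight
        in_flight *= fanout
    return total


if __name__ == "__main__":
    for switches, ttl in [(4, 20), (6, 16)]:
        print(f"{switches} switches, TTL {ttl}: "
              f"unicast {unicast_transmissions(ttl)}, "
              f"broadcast {broadcast_transmissions(switches, ttl):,}")
```

With 4 switches and a TTL of 20 the sketch counts 3,145,725 broadcast transmissions against 20 for unicast; with 6 switches and a TTL of 16 it counts 7,158,278,825. The TTL bounds how far each copy travels, but the number of copies grows geometrically with it.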
Customers that have seen a broadcast storm, due to a bogus ST
implementation or a misconfiguration, don't want to see a second one.
A solution based on TTL must therefore have a strong requirement for
dedicated buffers/paths for the control frames to reach the CPU, so that
it is guaranteed that control frames will eventually break the loop.
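One way to meet the dedicated-buffer requirement, sketched under assumptions: the `CpuPuntQueues` class and its capacities are hypothetical, not taken from any product. Control frames get their own bounded queue and strict priority, so a data storm can only fill the data queue.

```python
from collections import deque
from typing import Deque, Optional


class CpuPuntQueues:
    """Hypothetical switch-to-CPU path with separate, bounded queues.

    Control frames (e.g., IS-IS PDUs) have their own buffer and are always
    served first, so a storm of data frames can fill only the data queue.
    """

    def __init__(self, control_capacity: int, data_capacity: int) -> None:
        self._control: Deque[bytes] = deque()
        self._data: Deque[bytes] = deque()
        self._control_capacity = control_capacity
        self._data_capacity = data_capacity

    def enqueue(self, frame: bytes, is_control: bool) -> bool:
        """Tail-drop only within the frame's own class; return False if dropped."""
        queue = self._control if is_control else self._data
        capacity = self._control_capacity if is_control else self._data_capacity
        if len(queue) >= capacity:
            return False
        queue.append(frame)
        return True

    def dequeue(self) -> Optional[bytes]:
        """Strict priority: serve control frames before any data frame."""
        if self._control:
            return self._control.popleft()
        if self._data:
            return self._data.popleft()
        return None


if __name__ == "__main__":
    cpu = CpuPuntQueues(control_capacity=64, data_capacity=1024)
    dropped = sum(not cpu.enqueue(b"storm", is_control=False) for _ in range(100_000))
    accepted = cpu.enqueue(b"isis-pdu", is_control=True)
    print(dropped, accepted, cpu.dequeue())
```

Strict priority is the simplest choice that guarantees control frames get through; it can starve data frames punted to the CPU during a storm, which is the intended trade-off here.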
I don't think it is acceptable to have temporary loops for
broadcast/multicast traffic, even if they are mitigated by TTL. An
interlock mechanism similar to ST must be used for multicast/broadcast.
I ask for a strong requirement that says: "TRILL MUST avoid
multicast/broadcast storms"
Sgai 2> ST provides symmetrical forwarding, i.e. the path from A to B is
the reverse of the path from B to A. Is this a requirement for TRILL?
Sgai 3> The terminology used in this draft is not the one used in IEEE
standards. This makes it difficult to understand what certain sentences
really mean. Concepts like autolearning and caches are not IEEE
concepts.
Sgai 4> There is no mention of the applicability of other important IEEE
standards/WG/Study Groups, e.g.
- 802.3ad-2000 - Link Aggregation
- 802.1ah - Provider Backbone Bridges
- 802.1aq - Shortest Path Bridging
- 802.1au - Congestion Notification
- 802.1ad - Provider Bridges
- 802.1AE - MAC Security
- 802.3ar - Congestion Management Task Force
- 802.3as - Frame Expansion Task Force
I think this document needs to clearly state the position of the WG with
respect to these projects.
Sgai 5> I also think there needs to be a mention of the applicability of
important industry efforts:
- NIC Teaming
- uplinkfast
- split-MLT
- Q in Q
All these are widely deployed in datacenters/enterprises. I think
this document needs to clearly state the position of the WG with respect
to these de facto standards.
Sgai 6> Many customers look at TRILL as a backbone network. They would
like to connect their current switches to the TRILL backbone using
Etherchannel, with the member links connected to different RBridges for
high availability. Is this a requirement? More generally, what is the
relationship between Etherchannel and TRILL?
Sgai 7> Does TRILL work properly if Ethernet is deployed with Pause
enabled?
Additional comments are in the text, marked as "sgai N>", where N is
the number of the comment.
For all these reasons, but in particular for <sgai 1> I think this
document needs another major revision before it can complete the WG last
call.
-- Silvano
----------------------------------------------------------
2.5. Problems Not Addressed
There are other challenges to deploying Ethernet subnets that are not
addressed in this document. These include:
o increased Ethernet link subnet scale
o increased node relocation
o Ethernet link subnet management protocol security
o flooding attacks on an Ethernet link subnet
Solutions to TRILL are not intended to support deployment of
increasingly larger scales of Ethernet link subnets than current
broadcast domains can support (e.g., around 1,000 end-hosts in a
single bridged LAN of 100 bridges, or 100,000 end-hosts inside 1,000
VLANs served by 10,000 bridges).
Sgai 8> I don't know where these numbers come from, but with 256/512-port
Ethernet switches available, it does not take 10,000 bridges to reach
100,000 nodes. I also don't understand whether the mention of 1,000
VLANs is intended as a limit. As I mentioned in previous emails, for
many customers 4,000 VLANs are not enough, and they deploy private
VLANs. All the implementations I know about hash the pair (MAC
address, VLAN) into the filtering database, and the only limitation is
the size of the filtering database.
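A minimal sketch of the (MAC address, VLAN) point: with a filtering database keyed on the pair, the number of VLANs is not a limit by itself; only the total number of entries is. The class name and the `capacity` parameter are illustrative. A real hardware table hashes the key into buckets and can suffer collisions, which this dict-based sketch does not model.

```python
from typing import Dict, Optional, Tuple

FdbKey = Tuple[str, int]  # (MAC address, VLAN ID)


class FilteringDatabase:
    """Minimal sketch of a filtering database keyed on (MAC address, VLAN).

    The number of distinct VLANs is not a separate limit: the only bound is
    the total number of entries, `capacity` (an illustrative parameter).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: Dict[FdbKey, int] = {}

    def learn(self, mac: str, vlan: int, port: int) -> bool:
        """Record (mac, vlan) -> port; return False if the table is full."""
        key = (mac.lower(), vlan)
        if key not in self._entries and len(self._entries) >= self.capacity:
            return False  # table full: the address is simply not learned
        self._entries[key] = port
        return True

    def lookup(self, mac: str, vlan: int) -> Optional[int]:
        """Return the learned port, or None (meaning: flood within the VLAN)."""
        return self._entries.get((mac.lower(), vlan))


if __name__ == "__main__":
    fdb = FilteringDatabase(capacity=4)
    fdb.learn("00:11:22:33:44:55", 10, port=1)
    fdb.learn("00:11:22:33:44:55", 4000, port=2)  # same MAC, different VLAN
    print(fdb.lookup("00:11:22:33:44:55", 10), fdb.lookup("00:11:22:33:44:55", 4000))
```

The same MAC address in two VLANs occupies two independent entries, so VLAN IDs and addresses share one pool of table space.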
Similarly, solutions to TRILL are not intended to address link layer
node migration, which can complicate the caches in learning bridges.
Sgai 9> IEEE 802.1D does not contain the word "cache". Are you referring
to the filtering database? Why are filtering databases complicated by
node migration? I think that TRILL should provide a solution to node
migration that is as good as IEEE 802.1D or better.
Similar challenges exist in the ARP protocol, where link layer
forwarding is not updated appropriately when nodes move to ports on
other bridges. Again, the compartmentalization available in network
routing, like that of network layer ASes, can help hide the effect of
migration. That is a side effect, however, and not a primary focus of
this work.
Sgai 10> I am not sure what the previous sentence means; I would
remove it.
Current link control plane protocols, including Ethernet link subnet
management (STP) and link/network integration (ARP), are vulnerable
to a variety of attacks. Solutions to TRILL are not intended to
directly address these vulnerabilities. Similar attacks exist in the
data plane, e.g., source address spoofing, single address traffic
attacks, traffic snooping, and broadcast flooding. TRILL solutions do
not address any of these issues, although it is critical that they do
not introduce new vulnerabilities in the process (see Section 5).
3. Desired Properties of Solutions to TRILL
This section describes some of the desirable or required properties
of any system that would solve the TRILL problems, independent of the
details of such an architecture. Most of these are based on retaining
useful properties of bridges, or maintaining those properties while
solving the problems listed in Section 2.
3.1. No Change to Link Capabilities
There must be no change to the service that Ethernet subnets already
provide as a result of deploying a TRILL solution. Ethernet supports
unicast, broadcast, and multicast natively. Although network
protocols, notably IP, can tolerate link layers that do not provide
all three, it would be useful to retain the support already in place
[7].
Sgai 11> This requirement needs to be a "must". It also needs to say
that TRILL needs to work for non-IP protocols as well.
Zeroconf, as well as existing bridge autoconfiguration, are
dependent on broadcast as well.
Current Ethernet ensures in-order delivery and no duplicated packets
under normal operation (excepting transients during reconfiguration).
Sgai 12> Outside a marginal corner case in RSTP that affects only
in-order delivery, these two properties are also guaranteed during
reconfiguration. There are no transient loops in ST; see <sgai 1>.
These criteria apply in varying degrees to the different variants of
Ethernet, e.g., basic Ethernet up through basic VLAN (802.1Q) ensures
Sgai 13> IEEE 802.1Q is not involved in this; it is a property of IEEE
802.1D.
that all packets between two link addresses have both properties, but
protocol/port VLAN (802.1V) ensures this only for packets with the
same protocol and port. [JUST CHECKING - OR AM I MISREADING WHAT
802.1V DOES?]
Sgai 14> this needs to be resolved
Touch & Perlman Expires April 22, 2007 [Page 8]
Internet-Draft TRILL: Problem and Applicability October 2006
There are subtle implications to such a requirement. Bridge
autolearning
sgai 15> Autolearning is not a well-known concept and is not present in
IEEE 802.1D.
already is susceptible to moving nodes between ports,
because previously learned associations between port and link address
change. A TRILL solution could be similarly susceptible to such
changes.
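To make the node-migration point concrete, a sketch of learning and ageing in a filtering database (what the draft calls autolearning). Names are illustrative, and 300 s is the 802.1D default ageing time. The stale entry persists only until the moved station sends a frame or the entry ages out.

```python
from typing import Dict, Optional, Tuple

AGEING_TIME_S = 300.0  # 802.1D default ageing time; configurable on real bridges


class LearningTable:
    """Sketch of source-address learning and ageing, showing node migration."""

    def __init__(self, ageing_time: float = AGEING_TIME_S) -> None:
        self.ageing_time = ageing_time
        self._entries: Dict[str, Tuple[int, float]] = {}  # MAC -> (port, last seen)

    def on_receive(self, src_mac: str, in_port: int, now: float) -> None:
        # Every received frame refreshes its source entry; if the station has
        # moved, this overwrites the stale port immediately.
        self._entries[src_mac] = (in_port, now)

    def egress_port(self, dst_mac: str, now: float) -> Optional[int]:
        """Return the learned port, or None if unknown or aged out (flood)."""
        entry = self._entries.get(dst_mac)
        if entry is None or now - entry[1] > self.ageing_time:
            self._entries.pop(dst_mac, None)
            return None
        return entry[0]


if __name__ == "__main__":
    table = LearningTable()
    table.on_receive("00:aa:bb:cc:dd:ee", in_port=1, now=0.0)
    # The station moves to port 2 but stays silent: frames still go to port 1.
    print(table.egress_port("00:aa:bb:cc:dd:ee", now=20.0))
    table.on_receive("00:aa:bb:cc:dd:ee", in_port=2, now=30.0)
    print(table.egress_port("00:aa:bb:cc:dd:ee", now=31.0))
```

The window between the move and the station's next transmission is where frames are sent to the old port; a TRILL solution could face the same window at its edges.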
3.2. Zero Configuration and Zero Assumption
Both bridges and hubs are zero configuration devices; hubs having no
configuration at all, and bridges being automatically self-
configured. Bridges are further zero-assumption devices, unlike hubs.
Bridges can be interconnected in arbitrary topologies, without regard
for cycles or even self-attachment. STP removes the impact of cycles
automatically, and port autolearning reduces unnecessary broadcast of
unicast traffic.
Sgai 16> port autolearning is not an IEEE concept.
A TRILL solution should strive to have similar zero configuration,
zero assumption operation. This includes having TRILL solution
components automatically discover other TRILL solution components and
organize themselves, as well as to configure that organization for
proper operation (plug-and-play). It also includes zero configuration
backward compatibility with existing bridges and hubs, which may
include interacting with some of the bridge protocols, such as STP.
VLANs add a caveat to zero configuration; a TRILL solution should
support automatic use of a default VLAN (like non-VLAN bridges), but
should require explicit configuration where the VLANs require them as
well.
Sgai 17> The discussion about VLANs needs to be much more extensive. It
is clear from the mailing list discussion that VLANs can be used inside
the packet or in the Ethernet encapsulation of TRILL. These are two
different kinds of VLANs, and their requirements need to be stated
separately. Q in Q also needs to be discussed. See also <sgai 26>.
Autoconfiguration extends to optional services, such as multicast
support via IGMP snooping, broadcast support via serial copy, and
supporting multiple VLANs.
Sgai 18> what about VLAN pruning?
3.3. Forwarding Loop Mitigation
Spanning tree avoids forwarding loops by construction, although
transient loops can occur, e.g., via the appearance of a new link.
Sgai 19> this statement is incorrect. ST does not have transient loops.
See <sgai 1>
Solutions to TRILL are intended to use adapted network layer routing
protocols which may introduce transient loops during routing
convergence. TRILL solutions thus need support for mitigating the
effect of such routing loops.
In the Internet, loop mitigation is provided by a decrementing
hop count (TTL); in other networks, packets include a trace
(serialized or unioned) of visited nodes [1]. These mechanisms
(respectively) limit the impact of loops or detect them explicitly. A
mechanism with similar effect should be included in TRILL solutions.
Sgai 20> see <sgai 1>
[QUESTION: anyone have a good reference for serialized or union
traces - or better names for them?]
sgai 21> this needs to be resolved
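The two mechanisms named in Section 3.3 can be contrasted in a short sketch, assuming a persistent loop of nodes and a "unioned" (unordered set) trace; the `Frame` type and function names are hypothetical. A hop count bounds how far each copy travels; a trace drops the frame at the first revisit. Neither limits the replication of multicast/broadcast copies, which is the concern in <sgai 1>.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Frame:
    hop_count: int                                # decremented per hop, like IP TTL
    trace: Set[str] = field(default_factory=set)  # IDs of nodes already visited


def hop_count_permits(frame: Frame, node_id: str) -> bool:
    """Forward only while hops remain; limits the impact of a loop."""
    if frame.hop_count == 0:
        return False
    frame.hop_count -= 1
    return True


def trace_permits(frame: Frame, node_id: str) -> bool:
    """Drop on the first revisit of any node; detects the loop explicitly."""
    if node_id in frame.trace:
        return False
    frame.trace.add(node_id)
    return True


def hops_forwarded(loop: List[str],
                   permits: Callable[[Frame, str], bool],
                   hop_count: int) -> int:
    """Send one frame around a persistent loop and count how often it is forwarded."""
    if not loop:
        raise ValueError("loop must contain at least one node")
    frame = Frame(hop_count=hop_count)
    hops = 0
    for node_id in itertools.cycle(loop):
        if not permits(frame, node_id):
            break
        hops += 1
    return hops


if __name__ == "__main__":
    ring = ["A", "B", "C"]
    print(hops_forwarded(ring, hop_count_permits, hop_count=8))
    print(hops_forwarded(ring, trace_permits, hop_count=8))
```

The design trade-off is header size: a hop count is fixed-size, while a trace grows with the path length, in exchange for stopping the frame after one trip around the loop.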
3.4. Spanning Tree Management
In order to address convergence under reconfiguration and robustness
to link interruption (Sections 2.2 and 2.3), participation in the STP
must be carefully managed. The goal is to provide the desired
stability of the TRILL solution and of the entire Ethernet link
subnet while not interfering with the operation of STP of the
Ethernet on which the TRILL solution resides. This may involve TRILL
solutions participating in the STP, where the protocol used for TRILL
might dampen interactions with STP, or it may involve severing the STP into
separate STPs on 'stub' external Ethernet link subnet segments.
A requirement is that a TRILL solution must not require modifications
or exceptions to the existing spanning tree protocols (STP, MSTP).
Sgai 22> Does this include RSTP? More generally, this document does not
describe requirements for the interaction of TRILL with ST.
[we need pictures here; to appear]
Sgai 23> this needs to be resolved
3.5. Multiple Attachments
In STP, a single NIC with multiple attachments to a single spanning
tree will always only get traffic over one of the two attachment
points,
sgai 24> It is not clear how a NIC in the host can have multiple
attachments. If you are referring to NIC teaming, what you say is false.
TRILL allows load sharing between the attachment points.
Further, TRILL must manage multicast and broadcast traffic so as not
to create feedback loops on Ethernet segments which are attached at
multiple TRILL access points.
[NOTE: this might be omitted, as it has not been shown to be a
problem with STP].
Sgai 25> this needs to be resolved
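A sketch of the two behaviors Section 3.5 asks for, under assumptions: per-flow hashing for load sharing (NIC teaming and Etherchannel use similar hashing, but hash fields and functions vary by implementation; CRC-32 over the MAC pair is an arbitrary choice here), and a hypothetical lowest-ID rule so that only one TRILL access point delivers broadcast/multicast onto a multiply-attached segment.

```python
import zlib
from typing import Sequence


def pick_link(src_mac: bytes, dst_mac: bytes, links: Sequence[str]) -> str:
    """Per-flow load sharing: hash the address pair so a flow always uses one
    link (preserving in-order delivery) while different flows spread out."""
    return links[zlib.crc32(src_mac + dst_mac) % len(links)]


def designated_egress(access_point_ids: Sequence[int]) -> int:
    """Hypothetical rule: only the lowest-ID TRILL access point attached to a
    segment delivers broadcast/multicast onto it, so copies never come back in."""
    return min(access_point_ids)


if __name__ == "__main__":
    src = bytes.fromhex("001122334455")
    dst = bytes.fromhex("66778899aabb")
    print(pick_link(src, dst, ["link-a", "link-b"]))
    print(designated_egress([7, 3, 9]))
```

Hashing per flow rather than per frame is what keeps a single conversation in order; the price is that one heavy flow cannot use more than one link.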
3.6. VLAN Issues
A TRILL solution should support multiple VLANs (802.1Q, 802.1V, and
802.1S). This may involve ignorance, just as many bridge devices do
not participate in the VLAN protocols. It may alternately support
direct VLAN support, e.g., by the use of separate TRILL routing
protocol instances to separate traffic for each VLAN traversing a
TRILL solution.
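Whether a VLAN tag is "outer" or "inner" is purely a matter of its position in the frame. A sketch that lists the tags outermost first, assuming the 802.1ad S-tag TPID 0x88A8 and the 802.1Q C-tag TPID 0x8100; which of these a TRILL solution would use for per-VLAN routing instances is exactly what needs to be specified.

```python
import struct
from typing import List, Tuple

S_TAG_TPID = 0x88A8  # 802.1ad service (outer) tag
C_TAG_TPID = 0x8100  # 802.1Q customer (inner) tag


def vlan_tags(frame: bytes) -> List[Tuple[int, int]]:
    """Return (TPID, VLAN ID) for each tag, outermost first.

    Tags start right after the 6-byte destination and 6-byte source MAC addresses.
    """
    tags = []
    offset = 12
    while offset + 4 <= len(frame):
        tpid, tci = struct.unpack_from("!HH", frame, offset)
        if tpid not in (S_TAG_TPID, C_TAG_TPID):
            break  # reached the EtherType of the payload
        tags.append((tpid, tci & 0x0FFF))  # low 12 bits of the TCI are the VID
        offset += 4
    return tags


if __name__ == "__main__":
    q_in_q = (bytes(12)
              + struct.pack("!HH", S_TAG_TPID, 100)  # outer tag, VLAN 100
              + struct.pack("!HH", C_TAG_TPID, 200)  # inner tag, VLAN 200
              + struct.pack("!H", 0x0800) + b"payload")
    print(vlan_tags(q_in_q))
```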
Sgai 26> See also <sgai 17>. I am not sure what the first two sentences
are trying to say; the last part needs to be expanded and clearly
differentiated from the discussion related to Section 3.2. I propose
to call these VLANs the "outer VLANs" and the VLANs discussed in 3.2 the
"inner VLANs" (with reference to the position of the tag in the frame).
3.7. Equivalence
As with any extension to an existing architecture, it would be useful
- though not strictly necessary - to be able to describe or consider
a TRILL solution as a model of an existing link layer component. Such
equivalence provides a validation model for the architecture, and a
way for users to predict the effect of the use of a TRILL solution on
a deployed Ethernet. In this case, 'user' refers to users of the
Ethernet protocol, whether at the host (data segments), bridge (ST
control segments), or VLAN (VLAN control).
This provides a sanity check, i.e., "we got it right if we can
replace a TRILL solution with an X" (where "X" might be a single
bridge, a hub, or some other link layer abstraction). It does not
matter whether "X" can be implemented on the same scale as the
corresponding TRILL solution. It also does not matter if it can -
there may be utility to deploying the TRILL solution components
incrementally, in ways that a single "X" could not be installed.
For example, if a TRILL solution were equivalent to a single 802.1D
bridge, it would mean that the TRILL solution would - as a whole -
participate in the STP. This need not require that the TRILL solution
would propagate STP, any more than a bridge need do so in its on-
board control. It would mean that the solution would interact with
BPDUs at the edge, where the solution would - again, as a whole -
participate as if a single node in the spanning tree. Note that this
equivalence is not required; a solution may act as if an 802.1 hub,
or may not have a corresponding equivalent link layer component at
all.
3.8. Optimizations
There are a number of optimizations that may be applied to TRILL
solutions. These must be applied in a way that does not affect
functionality as a tradeoff for increased performance. Such
optimizations address broadcast and multicast frame distribution,
VLAN support, and snooping of ARP and IPv6 neighbor discovery.
[NOTE: need to say more here.]
Sgai 27> this needs to be resolved
3.9. Internet Architecture Issues
TRILL solutions are intended to have no impact on the Internet
network layer architecture. In particular, the Internet and higher
layer headers should remain intact when traversing a TRILL solution,
just as they do when traversing any other link subnet technologies.
This means that the IP TTL field cannot be co-opted for forwarding
loop mitigation, as it would interfere with the Internet layer
assuming that the link subnet was reachable with no changes in TTL
(Internet TTLs are changed only at routers, as per RFC 1812, and even
if IP TTL were considered, TRILL is expected to support non-IP
payloads, and so requires a separate solution anyway) [1].
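A sketch of why a separate hop count also covers non-IP payloads: the count lives in a hypothetical 1-byte outer header, so forwarding never touches the IP TTL or any other byte of the payload. The header layout and function names are illustrative only and do not describe any TRILL encapsulation.

```python
from typing import Optional


def encapsulate(payload: bytes, hop_count: int) -> bytes:
    """Prepend a hypothetical 1-byte hop count; the payload (IP or not) is untouched."""
    if not 0 <= hop_count <= 255:
        raise ValueError("hop count must fit in one byte")
    return bytes([hop_count]) + payload


def forward(frame: bytes) -> Optional[bytes]:
    """Decrement only the outer hop count; return None when the frame must be dropped."""
    hop_count = frame[0]
    if hop_count == 0:
        return None
    return bytes([hop_count - 1]) + frame[1:]


def decapsulate(frame: bytes) -> bytes:
    """Strip the outer header at the egress edge."""
    return frame[1:]


if __name__ == "__main__":
    # Minimal 20-byte IPv4 header; byte 8 is the IP TTL (64).
    ip_packet = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64]) + bytes(11)
    frame = forward(forward(encapsulate(ip_packet, hop_count=2)))
    print(decapsulate(frame)[8])  # the IP TTL is still 64
```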
Sgai 28> The requirement must be: "TRILL must support non-IP
payloads".
TRILL solutions should also have no impact on Internet routing or
signaling, which also means that broadcast and multicast, both of
which can pervade an entire Ethernet link subnet, must be able to
transparently pervade a TRILL solution. Changing how either of these
capabilities behaves would have significant effects on a variety of
protocols, including RIP (broadcast), RIPv2 (multicast), ARP
(broadcast), IPv6 neighbor discovery (multicast), etc.
Note that snooping of network layer packets may be useful, especially
for certain optimizations. These include snooping multicast control
plane packets (IGMP) to tune link multicast to match the network
multicast topology, as is already done in existing smart switches
[2]. This also includes snooping IPv6 neighbor discovery messages to
assist with governing TRILL solution edge configuration, as is the
case in some smart learning bridges [9]. Other layers may similarly
be snooped, notably ARP packets, for similar reasons for IPv4 [13].
[Need a ref for the router-router 'igmp' protocol]
Sgai 29> this needs to be resolved
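A sketch of IGMP snooping as described in Section 3.9: membership is learned from snooped reports and leaves, and multicast goes only to member ports plus multicast-router ports. The class is hypothetical, and flooding groups with no known members is an assumption; real switches differ in how they treat unregistered groups.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class IgmpSnoopingTable:
    """Sketch: learn group membership from snooped IGMP reports/leaves and
    forward multicast only to member ports plus multicast-router ports."""

    def __init__(self, all_ports: Set[int], router_ports: Set[int]) -> None:
        self.all_ports = set(all_ports)
        self.router_ports = set(router_ports)
        self._members: DefaultDict[str, Set[int]] = defaultdict(set)

    def on_report(self, group: str, port: int) -> None:
        self._members[group].add(port)

    def on_leave(self, group: str, port: int) -> None:
        self._members[group].discard(port)

    def egress_ports(self, group: str, in_port: int) -> Set[int]:
        members = self._members.get(group)
        if not members:
            # Assumption: groups with no known members are flooded.
            targets = set(self.all_ports)
        else:
            targets = members | self.router_ports
        return targets - {in_port}


if __name__ == "__main__":
    table = IgmpSnoopingTable(all_ports={1, 2, 3, 4}, router_ports={4})
    table.on_report("239.1.1.1", port=2)
    print(sorted(table.egress_ports("239.1.1.1", in_port=1)))
```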
4. Applicability
As might be expected, TRILL solutions are intended to be used to
solve the problems described in Section 2. However, not all such
installations are appropriate environments for such solutions. This
section outlines the issues in the appropriate use of these
solutions.
TRILL solutions are intended to address problems of path efficiency
and stability within a single Ethernet link subnet. Like bridges,
individual TRILL solution components may find other TRILL solution
components within a single Ethernet link subnet and aggregate into a
single TRILL solution.
TRILL solutions are not intended to span separate Ethernet link
subnets where interconnected by network layer (e.g., router) devices,
except via link layer tunnels that are in place prior to their
deployment, where such tunnels render the distinct subnets
indistinguishable from a single Ethernet link subnet.
A currently open question is whether a single Ethernet link subnet
should contain only one TRILL solution instance, either of necessity
of architecture or utility.
Sgai 30> this needs to be resolved
Multiple TRILL solutions, like Internet
ASes, may allow TRILL routing protocols to be partitioned in ways
that help their stability, but this may come at the price of needing
the TRILL solutions to participate more fully as nodes (each modeling
a bridge) in the Ethernet link subnet STP. Each architecture solution
should decide whether multiple TRILL solutions are supported within a
single Ethernet link subnet and mechanisms should be included to
enforce whatever decision is made.
TRILL solutions are not intended to address scalability limitations
in bridged subnets. Although there may be scale benefits of other
aspects of solving TRILL problems, e.g., of using network layer
routing to provide stability under link changes or intermittent
outages, this is not a focus of this work.
As also noted earlier, TRILL solutions are not intended to address
security vulnerabilities in either the data plane or control plane of
the link layer. This means that TRILL solutions should not limit
broadcast frames, ARP requests, or spanning tree protocol messages
(if such are interpreted by the TRILL solution or solution edge).
5. Security Considerations
TRILL solutions should not introduce new vulnerabilities compared to
traditional bridged subnets.
TRILL solutions are not intended to be a solution to Ethernet link
subnet vulnerabilities, including spoofing, flooding, snooping, and
attacks on the link control plane (STP, flooding the learning cache)
and link-network control plane (ARP). Although TRILL solutions are
intended to provide more stable routing than STP, this stability is
limited to performance, and the subsequent robustness is intended to
address non-malicious events.
There may be some side-effects to the use of TRILL solutions that can
provide more robust operation under certain attacks, such as those
interrupting or adding link service, but TRILL solutions should not
be relied upon for such capabilities.
Finally, TRILL solutions should not interfere with other protocols
intended to address these vulnerabilities, such as those under
development to secure IPv6 neighbor discovery.
[need a ref for secure ipv6 nd]
Sgai 31> this needs to be resolved
6. IANA Considerations
This document has no IANA considerations.
This section should be removed by the RFC Editor prior to final
publication.
7. Conclusions
(TBA)
Sgai 32> this needs to be resolved