As I was studying for the Troubleshooting Cisco Data Center Unified Fabric (DCUFT) exam, I came across a couple of low-level NX-OS commands that can help determine whether the Data Center Bridging eXchange (DCBX) protocol is functioning correctly. Being able to verify the operation of DCBX is important when troubleshooting FCoE, because the proper operation of the Data Center Bridging (DCB) extensions is a prerequisite for FCoE.
Unfortunately, the output of these commands is rather cryptic, because it essentially shows the content of the DCBX TLVs as raw hex dumps, rather than nicely decoding the fields in the output of the command. Because I still wanted to understand how to read the DCBX information contained in these commands, I decided to dive a bit deeper into the DCBX protocol.
While researching DCBX on the web I bumped into a nice introduction to DCBX in the Juniper technical documentation. This document also conveniently links to the reference documents for the two common flavors of DCBX, version 1.01 (aka the CEE version) and the official IEEE version defined as part of the 802.1Qaz Enhanced Transmission Selection standard.
In order to start my dive into the bits and bytes of the protocol, all I needed was a DCBX packet capture to decode. Luckily, I had a packet capture in my personal collection that I had taken a couple of weeks earlier using the NX-OS ethanalyzer tool. I filtered the capture file until I found a DCBX exchange between a QLogic CNA and the Nexus 5000. For those of you who would like to examine the DCBX packet capture yourselves, the pcap file is available at https://s3-eu-west-1.amazonaws.com/layerzero-public/dcbx.pcap. So let’s fire up Wireshark and have a look at the packets.
As you can see in the screenshot, Wireshark decodes the packets properly as LLDP, but does not recognize the DCBX specific TLVs. I was a bit surprised by this, because Wireshark tends to have a very large set of dissectors, but apparently DCBX is still too much of a niche protocol to be included.
Note: I actually found a Wireshark dissector for DCBX published by the University of New Hampshire InterOperability Lab. However, it seems like this is simply a patched version of a specific Wireshark release. It looks like this patch was never contributed back to the Wireshark project and integrated into the regular Wireshark release updates. Unfortunately, my programming skills are too limited to be able to reverse engineer the patch and commit it myself. I would be very grateful if somebody with more software engineering know-how than me would volunteer to make this happen.
As Wireshark doesn’t have a dissector for the DCBX TLVs, it flags them as “IntelCor – Unknown”. Because the OUI used in the CEE version of DCBX (0x001b21) was contributed by Intel to the DCB working group for use in the pre-standard version of DCBX, Wireshark simply flags the TLV as an Intel TLV.
As per section 2.4.1 of the DCBX 1.01 specification, DCBX TLVs are identified by TLV Type 127, OUI 0x001b21, and Subtype 2, so we know we are dealing with a CEE version DCBX TLV here. Unfortunately, the actual content of the TLV is simply dumped as a long hex string:
020a000000000001000000000606000080000808080a000080008906001b210804110000800000010000323200000000000002
So let’s have a look and see what we can make of this. The diagram below outlines the general structure of the CEE DCBX TLV.
As we can see, the hex string actually represents a DCB protocol control sub-TLV, followed by a series of DCB feature sub-TLVs. Each of these sub-TLVs starts with a 7-bit type field and a 9-bit length field, similar to the regular LLDP TLVs. This should help us break down the long string into its sub-TLVs. Let’s first lay out the hex string in offset hex, to make it a bit easier to interpret:
0000 02 0a 00 00 00 00 00 01 00 00 00 00 06 06 00 00
0010 80 00 08 08 08 0a 00 00 80 00 89 06 00 1b 21 08
0020 04 11 00 00 80 00 00 01 00 00 32 32 00 00 00 00
0030 00 00 02
The hex string starts with “02 0a”, which indicates sub-type 1 and length 10. So the first 12 octets of the string represent the control protocol sub-TLV (2 octets for type and length and 10 octets for the value). This means that the next sub-TLV starts with “06 06”, which indicates sub-type 3 and length 6. So the second block of 8 bytes represents a DCB feature sub-TLV of type 3. Following this method we see that the next sub-TLV starts with “08 0a” on the second line and represents another DCB feature sub-TLV with sub-type 4 and length 10. Finally, this is followed by another sub-TLV starting with “04 11” on the third line. This part of the string represents a feature sub-TLV with sub-type 2 and length 17, which brings us to the end of the string. Now that we have separated the string into its components we can start analyzing the individual sub-TLVs.
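The splitting procedure described above can be sketched in a few lines of Python. This is only an illustration of the 7-bit type / 9-bit length header: the function name `split_sub_tlvs` is my own invention, and the input is the hex string from the capture, broken up per sub-TLV purely for readability.

```python
import struct


def split_sub_tlvs(payload: bytes):
    """Yield (sub_type, value) pairs from the payload of a CEE DCBX TLV."""
    offset = 0
    while offset + 2 <= len(payload):
        # The 16-bit header holds a 7-bit type followed by a 9-bit length.
        (header,) = struct.unpack_from("!H", payload, offset)
        sub_type = header >> 9
        length = header & 0x1FF
        value = payload[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-TLV at offset %d" % offset)
        yield sub_type, value
        offset += 2 + length


payload = bytes.fromhex(
    "020a00000000000100000000"                # starts with 02 0a
    "0606000080000808"                        # starts with 06 06
    "080a000080008906001b2108"                # starts with 08 0a
    "04110000800000010000323200000000000002"  # starts with 04 11
)

for sub_type, value in split_sub_tlvs(payload):
    print(sub_type, len(value), value.hex())
```

Run against the captured string, this should report sub-types 1, 3, 4 and 2 with lengths 10, 6, 10 and 17, matching the manual breakdown.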
Let’s start with the control protocol sub-TLV, which is identified by the sub-type value of 1. The structure of the TLV is shown in the following diagram.
The corresponding piece of the hex-string is:
0000 02 0a 00 00 00 00 00 01 00 00 00 00
Comparing these values to the header structure this gives us the following breakdown of the fields:
- 02 = Sub-type 1 (control protocol sub-TLV)
- 0a = Length 10
- 00 = Operational version 0
- 00 = Maximum version 0
- 00 00 00 01 = Sequence number 1
- 00 00 00 00 = Acknowledgement number 0
The control protocol sub-TLV contains the basic parameters that control versioning and reliable transmission of the DCB feature TLVs. This sub-TLV is a mandatory element for every DCBX TLV.
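As a small sketch, the ten value octets of the control sub-TLV can be unpacked with Python's `struct` module. The class and function names are hypothetical; the field order simply follows the breakdown listed above.

```python
import struct
from typing import NamedTuple


class ControlSubTlv(NamedTuple):
    oper_version: int
    max_version: int
    seq_no: int
    ack_no: int


def decode_control(value: bytes) -> ControlSubTlv:
    """Decode the 10 value octets that follow the type/length header."""
    if len(value) != 10:
        raise ValueError("control sub-TLV value must be 10 octets")
    # Two 1-octet versions, then two 32-bit numbers in network byte order.
    return ControlSubTlv(*struct.unpack("!BBII", value))


control = decode_control(bytes.fromhex("00 00 00 00 00 01 00 00 00 00"))
print(control)
```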
Now let’s move on to the next sub-TLV, which is the Priority Flow Control (PFC) feature sub-TLV, identified by the sub-type value of 3. All feature sub-TLVs have the same header fields, followed by fields that are specific to the feature. The common header fields for the feature sub-TLVs are shown in the diagram below.
Now let’s compare this again to the piece of the hex-string that corresponds to the PFC feature TLV:
0000 06 06 00 00 80 00 08 08
Comparing these values to the header structure this gives us the following breakdown of the fields:
- 06 = Sub-type 3 (Priority-based Flow Control sub-TLV)
- 06 = Length 6
- 00 = Operational version 0
- 00 = Maximum version 0
- 80 = Enabled, Not Willing, No Error
- 00 = Sub-type 0
From a troubleshooting standpoint, the most relevant fields here are the three bits, “Enable”, “Willing”, and “Error”. In this case, the “Enable” bit is set, signifying that the PFC feature is enabled on the switch. The “Willing” bit is set to 0, which indicates that the switch will not accept DCB configuration from the CNA. This makes sense, because the idea is that the switch will push its DCB feature configurations to the CNA, not the other way around. The “Error” bit is set to 0, which is normal, because this bit will only be set in case of negotiation failures. The setting of these three bits in the DCBX TLVs sent by the CNA and the switch can give you a good indication if the feature is working properly on both sides.
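To make the three flag bits easier to check, here is a minimal Python sketch of the common feature header. It assumes that Enable, Willing and Error occupy the three most significant bits of the third octet, in that order, which is consistent with 0x80 decoding as Enabled, Not Willing, No Error; the function name is made up for this example.

```python
def decode_feature_header(value: bytes) -> dict:
    """Decode the 4-octet header shared by all DCB feature sub-TLVs."""
    if len(value) < 4:
        raise ValueError("feature sub-TLV value shorter than its header")
    oper_version, max_version, flags, sub_type = value[:4]
    return {
        "oper_version": oper_version,
        "max_version": max_version,
        "enable": bool(flags & 0x80),   # most significant bit
        "willing": bool(flags & 0x40),
        "error": bool(flags & 0x20),
        "sub_type": sub_type,
    }


# Header octets of the PFC sub-TLV sent by the switch: 00 00 80 00
header = decode_feature_header(bytes.fromhex("00 00 80 00"))
print(header)
```

A peer that is willing to accept configuration would show 0xc0 in the flags octet instead of 0x80.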
This leaves the final two octets in this part of the hex string “08 08”. In order to decode these fields we need to have a closer look at the feature-specific content for the PFC feature sub-TLV. The configuration of the PFC feature is always mapped into two octets, which are laid out as follows:
The eight bits of the first octet each map to a priority (CoS) value: if priority flow control is enabled for that priority the bit is set to 1, otherwise it is set to 0. Bit 0 corresponds to CoS 0 and bit 7 to CoS 7, so a value of 0x08 (binary 00001000) selects CoS 3. The second octet lists the number of traffic classes (0-8) that PFC can be enabled for simultaneously.
So if we map this to the “08 08” value in our hex string, this can be translated as follows:
- 08 = PFC is enabled for CoS 3
- 08 = PFC is supported for 8 traffic classes
This matches the default FCoE QoS policies on the Nexus 5000, which enable priority flow control for FCoE, which by default uses CoS 3.
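The bitmap logic is easy to get wrong by eye, so here is a short Python sketch of the PFC payload decode. The helper name `decode_pfc` is hypothetical, and it assumes bit 0 of the first octet corresponds to CoS 0, which is what makes 0x08 come out as CoS 3.

```python
def decode_pfc(payload: bytes):
    """Return (list of PFC-enabled CoS values, number of TCs supported)."""
    if len(payload) != 2:
        raise ValueError("PFC feature payload must be 2 octets")
    enable_map, num_tcs = payload
    # Bit n of the first octet set means PFC is enabled for CoS n.
    enabled = [cos for cos in range(8) if enable_map & (1 << cos)]
    return enabled, num_tcs


pfc_cos, pfc_tcs = decode_pfc(bytes.fromhex("08 08"))
print("PFC enabled for CoS", pfc_cos, "on up to", pfc_tcs, "traffic classes")
```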
So let’s move on to the next feature sub-TLV. The hex string for the TLV is the following:
0000 08 0a 00 00 80 00 89 06 00 1b 21 08
The sub-type 4 identifies this sub-TLV as a DCB Application Protocol sub-TLV. This sub-TLV is used to indicate which protocol(s) should be mapped to a specific priority (CoS) value. Based on the structure of the feature sub-TLV header, which is common for all feature sub-TLVs, we can decode the first six octets:
- 08 = Sub-type 4 (Application protocol sub-TLV)
- 0a = Length 10
- 00 = Operational version 0
- 00 = Maximum version 0
- 80 = Enabled, Not Willing, No Error
- 00 = Sub-type 0
For the final six octets, we need to look at the structure of the application protocol sub-TLV. The structure of this sub-TLV is laid out in the following diagram:
The structure of this TLV is pretty flexible, but the recommended way to use this field is the following. For the OUI, the Intel DCBX OUI 0x001b21 is used and mapped into the fields labeled OUI_23-18_bits, OUI_15-8 bits, and OUI_7-0_bits. Bits 16 and 17 of the OUI (which in a MAC address are the Individual/Group and Universal/Local bits) are not mapped into the TLV; their position in the octet carries the Selector Field instead. The selector field determines how the application protocol is identified. If the selector field is set to “00”, the Application Protocol ID is an EtherType; if it is set to “01”, the Application Protocol ID is a TCP/UDP port number. The User Priority Map indicates to which priority values (CoS) this protocol is mapped, by setting the bit corresponding to the priority value to 1.
So based on this information we can decode the final 6 octets of this sub-TLV:
- 89 06 = Application Proto ID FCoE
- 00 1b 21 = Intel DCBX OUI and Selector Field indicating that the application protocol is defined by an EtherType
- 08 = User Priority Map maps to CoS 3
Through this feature sub-TLV, the switch tells the CNA that FCoE traffic, identified by EtherType 0x8906, should be mapped to CoS 3.
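The following Python sketch decodes the six-octet application protocol entries along the lines described above. It assumes the selector field sits in the two least significant bits of the first OUI octet; the function name and the dictionary keys are my own choices for illustration.

```python
import struct

SELECTORS = {0: "EtherType", 1: "TCP/UDP port"}


def decode_app_entries(payload: bytes) -> list:
    """Decode the 6-octet entries of a DCB Application Protocol sub-TLV."""
    if len(payload) % 6:
        raise ValueError("payload is not a multiple of 6 octets")
    entries = []
    for offset in range(0, len(payload), 6):
        proto_id, oui_hi_sf, oui_lo, prio_map = struct.unpack_from(
            "!HBHB", payload, offset
        )
        selector = oui_hi_sf & 0x03  # low 2 bits replace OUI bits 17 and 16
        oui = ((oui_hi_sf & 0xFC) << 16) | oui_lo
        entries.append({
            "protocol_id": proto_id,
            "selector": SELECTORS.get(selector, "reserved"),
            "oui": oui,
            "priorities": [cos for cos in range(8) if prio_map & (1 << cos)],
        })
    return entries


apps = decode_app_entries(bytes.fromhex("89 06 00 1b 21 08"))
for app in apps:
    print(hex(app["protocol_id"]), app["selector"],
          hex(app["oui"]), app["priorities"])
```

Looping in steps of six octets means the same function should also cope with a sub-TLV that carries more than one application entry.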
Now let’s move on to the final sub-TLV. This TLV is represented by the following part of the hex-string:
0000 04 11 00 00 80 00 00 01 00 00 32 32 00 00 00 00
0010 00 00 02
The sub-type 2 identifies this as a Priority Group Feature sub-TLV. This feature sub-TLV is used to communicate the parameters related to ETS, which define the queuing and scheduling of the different traffic classes. Again, we can start by decoding the first six octets based on the feature sub-TLV header:
- 04 = Sub-type 2 (Priority group sub-TLV)
- 11 = Length 17
- 00 = Operational version 0
- 00 = Maximum version 0
- 80 = Enabled, Not Willing, No Error
- 00 = Sub-type 0
Next, we need to take a look at the feature-specific portion of the TLV to decode the final 13 octets:
Based on this structure we can decode the last 13 octets as follows:
- 0 = CoS 0 is mapped to priority group 0
- 0 = CoS 1 is mapped to priority group 0
- 0 = CoS 2 is mapped to priority group 0
- 1 = CoS 3 is mapped to priority group 1
- 0 = CoS 4 is mapped to priority group 0
- 0 = CoS 5 is mapped to priority group 0
- 0 = CoS 6 is mapped to priority group 0
- 0 = CoS 7 is mapped to priority group 0
- 32 = Percentage for priority group 0 is 50%
- 32 = Percentage for priority group 1 is 50%
- 00 = Percentage for priority group 2 is 0%
- 00 = Percentage for priority group 3 is 0%
- 00 = Percentage for priority group 4 is 0%
- 00 = Percentage for priority group 5 is 0%
- 00 = Percentage for priority group 6 is 0%
- 00 = Percentage for priority group 7 is 0%
- 02 = Number of priority groups supported is 2
So the first eight hex characters (four octets, one 4-bit value per CoS) identify the priority group to which each of the eight traffic classes (CoS values) is mapped. In our example, CoS value 3 is mapped to priority group 1, while all other CoS values are mapped to priority group 0. In other words, two queues are defined, one for CoS 3 and one for all other CoS values. The next eight octets define the bandwidth percentage allocated to each of the priority groups. Priority group 0 (CoS 0, 1, 2, 4, 5, 6, 7) gets 50%, while priority group 1 (CoS 3) also gets 50%.
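Here is a minimal Python sketch of the same decode for the 13 feature-specific octets of the priority group sub-TLV. The function name is hypothetical, and the layout (eight 4-bit group IDs, eight percentage octets, one count octet) is taken from the breakdown above.

```python
def decode_priority_groups(payload: bytes):
    """Return (group ID per CoS, bandwidth % per group, groups supported)."""
    if len(payload) != 13:
        raise ValueError("priority group payload must be 13 octets")
    pgids = []
    for octet in payload[:4]:
        # Each octet carries two 4-bit group IDs, lower CoS value first.
        pgids.extend((octet >> 4, octet & 0x0F))
    percentages = list(payload[4:12])
    num_groups = payload[12]
    return pgids, percentages, num_groups


pgids, percentages, num_groups = decode_priority_groups(
    bytes.fromhex("00 01 00 00 32 32 00 00 00 00 00 00 02")
)
for cos, group in enumerate(pgids):
    print("CoS", cos, "-> priority group", group)
print("bandwidth per group:", percentages, "groups supported:", num_groups)
```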
So finally, we have a full decode of our original hex string. For reference, the full string was:
0000 02 0a 00 00 00 00 00 01 00 00 00 00 06 06 00 00
0010 80 00 08 08 08 0a 00 00 80 00 89 06 00 1b 21 08
0020 04 11 00 00 80 00 00 01 00 00 32 32 00 00 00 00
0030 00 00 02
Now we know that this translates to the following DCBX parameters:
- 02 = Sub-type 1 (control protocol sub-TLV)
- 0a = Length 10
- 00 = Operational version 0
- 00 = Maximum version 0
- 00 00 00 01 = Sequence number 1
- 00 00 00 00 = Acknowledgement number 0
- 06 = Sub-type 3 (Priority-based Flow Control sub-TLV)
- 06 = Length 6
- 00 = Operational version 0
- 00 = Maximum version 0
- 80 = Enabled, Not Willing, No Error
- 00 = Sub-type 0
- 08 = PFC is enabled for CoS 3
- 08 = PFC is supported for 8 traffic classes
- 08 = Sub-type 4 (Application protocol sub-TLV)
- 0a = Length 10
- 00 = Operational version 0
- 00 = Maximum version 0
- 80 = Enabled, Not Willing, No Error
- 00 = Sub-type 0
- 89 06 = Application Proto ID FCoE
- 00 1b 21 = Intel DCBX OUI and Selector Field indicating that the application protocol is defined by an EtherType
- 08 = User Priority Map maps to CoS 3
- 04 = Sub-type 2 (Priority group sub-TLV)
- 11 = Length 17
- 00 = Operational version 0
- 00 = Maximum version 0
- 80 = Enabled, Not Willing, No Error
- 00 = Sub-type 0
- 0 = CoS 0 is mapped to priority group 0
- 0 = CoS 1 is mapped to priority group 0
- 0 = CoS 2 is mapped to priority group 0
- 1 = CoS 3 is mapped to priority group 1
- 0 = CoS 4 is mapped to priority group 0
- 0 = CoS 5 is mapped to priority group 0
- 0 = CoS 6 is mapped to priority group 0
- 0 = CoS 7 is mapped to priority group 0
- 32 = Percentage for priority group 0 is 50%
- 32 = Percentage for priority group 1 is 50%
- 00 = Percentage for priority group 2 is 0%
- 00 = Percentage for priority group 3 is 0%
- 00 = Percentage for priority group 4 is 0%
- 00 = Percentage for priority group 5 is 0%
- 00 = Percentage for priority group 6 is 0%
- 00 = Percentage for priority group 7 is 0%
- 02 = Number of priority groups supported is 2
It would have been really nice if Wireshark could have decoded this for me, so I wouldn’t have to do it myself. However, by digging into DCBX at the bits and bytes level, I gained a better understanding of the exact parameters that are being negotiated by the DCBX protocol and which values to zoom in on when troubleshooting DCBX. Now when I see command output such as the following, at least I know how to extract the relevant information from it:
switch# show system internal dcbx info interface ethernet 1/4
<snip>
Peer's DCX TLV:
DCBX TLV Proto(1) type: 1(Control) DCBX TLV Length: 10 DCBX TLV Value
00 00 02 00 00 00 01 00 00 00
sub_type 0, error 0, willing 0, enable 0, max_version 0, oper_version 0
DCBX TLV Proto(1) type: 2(PriGrp) DCBX TLV Length: 17 DCBX TLV Value
00 00 c0 00 00 01 00 00 32 32 00 00 00 00 00 00 02
sub_type 0, error 0, willing 1, enable 1, max_version 0, oper_version 0
DCBX TLV Proto(1) type: 3(PFC) DCBX TLV Length: 6 DCBX TLV Value
00 00 c0 00 08 01
sub_type 0, error 0, willing 1, enable 1, max_version 0, oper_version 0
DCBX TLV Proto(1) type: 4(App(Fcoe)) DCBX TLV Length: 16 DCBX TLV Value
00 00 c0 00 89 06 00 1b 21 08 89 14 00 1b 21 08
sub_type 0, error 0, willing 1, enable 1, max_version 0, oper_version 0
For those who made it all the way to the end of this article and enjoyed this deep dive into DCBX, it is left as an exercise to decode the DCBX sub-TLVs in the command output above. Enjoy!
Thanks, it's very helpful 🙂
Very helpful.
A pcap capture file would be awesome!
Hi Nitin,
The capture file that I used for this decode is actually linked in the article, right above the first Wireshark screenshot. The download URL is https://s3-eu-west-1.amazonaws.com/layerzero-public/dcbx.pcap.
Tom
Hi Tom,
Very interesting article. I would be interested to see your decode of an Application Protocol sub-TLV recommending to the CNA to use a specific TCP port for classification. Also, am I right in saying that there is no way to recommend the CNA classify based on VLAN ID?
Regards,
Peter
Hi Peter,
Unfortunately I don’t have an actual decode of a CNA using DCBX to negotiate for iSCSI, which I think would be the most typical example of using a TCP port instead of an EtherType to classify the traffic.
With regard to your second question: It looks like the current standard only allows for classification based on EtherType (selector field 00) or TCP/UDP port (selector field 01). However, that leaves two additional values for the selector field (10 and 11) for future use, so if there were a requirement for VLAN-based class selection, I guess this could be implemented in future versions of the DCBX standard.
Regards,
Tom