PSA: X11SBA-F rev 1.01 secondary LAN port timeouts

Hopefully this post will save someone time, money and aggravation. If you’re considering buying a Supermicro X11SBA-F or X11SBA-LN4F, be aware that at least some boards (and possibly all) with hardware revision 1.01 have a serious and seemingly uncorrectable problem with the secondary PCI-E bridge that feeds the secondary LAN ports. DO NOT BUY an X11SBA rev 1.01 if you need the secondary NICs, and always make sure the hardware revision is at least 1.02.

The X11SBA series are very nice, low-cost, ridiculously low-power, feature-packed boards that let you build 3 firewalls/domain controllers/PBX appliances for under $1000 total (if you shop around), with growth capability that should last you a decade (assuming you’re using Linux/BSD and not Windows, of course). One of the key features, in addition to the multi-core CPU, is that the -F model comes with 2 GbE NICs, while the -LN4F has 4.

Here’s the problem: NIC #2 on the -F and NICs #2, #3 and #4 on the -LN4F sit behind a different PCI-E bridge than the 1st NIC. On X11SBA hardware rev 1.01 that secondary bridge seems to have a hardware problem that causes transfer failures and forces a LAN PHY reset.

The problem manifests itself in the following way. The LAN comes up, responds to pings and passes traffic happily. As the load increases, you suddenly lose all connectivity. On FreeBSD 10/11 the failure looks like this:

igb1: Watchdog timeout -- resetting
igb1: Queue(846295657) tdh = -1249464976, hw tdt = 589450809
igb1: TX(846295657) desc avail = 0,Next TX to Clean = 0
igb1: link state changed to DOWN
igb1: link state changed to UP
igb1: Watchdog timeout -- resetting
igb1: Queue(846295657) tdh = -1249464976, hw tdt = 589450809
igb1: TX(846295657) desc avail = 0,Next TX to Clean = 0
igb1: link state changed to DOWN
igb1: link state changed to UP
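If you want to see how often this happens on each port, the watchdog messages can be counted with a few lines of Python. This is only a minimal sketch: the pattern is an assumption based on the FreeBSD messages shown above, and `count_watchdog_resets` is a made-up helper name, not part of any tool.

```python
import re
from collections import Counter

# Matches e.g. "igb1: Watchdog timeout -- resetting", with or without a
# syslog prefix in front. The pattern is an assumption derived only from
# the log excerpt in this post.
WATCHDOG_RE = re.compile(r"\b(?P<iface>[a-z]+\d+): Watchdog timeout")


def count_watchdog_resets(log_text):
    """Return a Counter mapping interface name -> number of watchdog resets."""
    return Counter(m.group("iface") for m in WATCHDOG_RE.finditer(log_text))


# Small demo on lines copied from the log above.
demo_log = (
    "igb1: Watchdog timeout -- resetting\n"
    "igb1: link state changed to DOWN\n"
    "igb1: link state changed to UP\n"
    "igb1: Watchdog timeout -- resetting\n"
)
for iface, resets in count_watchdog_resets(demo_log).items():
    print(f"{iface}: {resets} watchdog reset(s)")
```

On a board with the hardware problem you would expect the resets to pile up on the secondary port(s) only, never on the 1st NIC.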

The issue is easy to reproduce with a bidirectional iperf run:

$ iperf -c <your reliable iperf server> -d -t600 -l1m -e -i1

[  5] local <client IP> port 60094 connected with <server IP> port 5001
[  4] local <server IP> port 5001 connected with <client IP> port 58122
[ ID] Interval        Transfer    Bandwidth       Write/Err  Rtry    Cwnd
[  5] 0.00-1.00 sec   112 MBytes   940 Mbits/sec  112/0         0   1377K
[  4] 0.00-1.00 sec  31.6 MBytes   265 Mbits/sec  3662    3659:1:1:1:0:0:0:0
[  5] 1.00-2.00 sec   104 MBytes   872 Mbits/sec  104/0        78    858K
[  4] 1.00-2.00 sec  32.4 MBytes   271 Mbits/sec  4160    4157:0:2:0:0:0:0:1
...
[  5] 49.00-50.00 sec  41.0 MBytes   344 Mbits/sec  41/0         0    718K
[  5] 50.00-51.00 sec  39.7 MBytes   333 Mbits/sec  40/0         1      1K
[  4] 50.00-51.00 sec  86.4 MBytes   724 Mbits/sec  3617    3611:3:2:0:0:0:0:1
[  5] 51.00-52.00 sec   445 KBytes  3.65 Mbits/sec  1/1         1      1K
[  4] 51.00-52.00 sec  0.00 Bytes  0.00 bits/sec  0    0:0:0:0:0:0:0:0
[  5] 52.00-53.00 sec  0.00 Bytes  0.00 bits/sec  0/2         1      1K
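For long unattended runs it helps to find the moment the link dies automatically. Below is a rough sketch that scans saved iperf output for the first interval whose bandwidth drops to zero. It assumes the interval line layout shown above and decimal bit-rate prefixes (K = 1e3, M = 1e6, G = 1e9); `first_stall` is a hypothetical helper, not an iperf feature.

```python
import re

# Interval line, e.g.
# "[  5] 51.00-52.00 sec   445 KBytes  3.65 Mbits/sec  1/1 ..."
# The layout is assumed from the iperf output pasted in this post.
INTERVAL_RE = re.compile(
    r"^\[\s*(?P<id>\d+)\]\s+(?P<start>[\d.]+)-(?P<end>[\d.]+) sec\s+"
    r"[\d.]+ [KMG]?Bytes\s+(?P<rate>[\d.]+) (?P<unit>[KMG]?)bits/sec",
    re.MULTILINE,
)

# Assumption: bit rates use decimal prefixes.
UNIT = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}


def first_stall(iperf_output, min_bps=1.0):
    """Return (stream_id, start_sec, bits_per_sec) for the first interval
    whose bandwidth is below min_bps, or None if there is no such interval."""
    for m in INTERVAL_RE.finditer(iperf_output):
        bps = float(m.group("rate")) * UNIT[m.group("unit")]
        if bps < min_bps:
            return int(m.group("id")), float(m.group("start")), bps
    return None


# Demo on two lines copied from the run above.
sample = (
    "[  5] 50.00-51.00 sec  39.7 MBytes   333 Mbits/sec  40/0         1      1K\n"
    "[  5] 52.00-53.00 sec  0.00 Bytes  0.00 bits/sec  0/2         1      1K\n"
)
print(first_stall(sample))  # (5, 52.0, 0.0)
```

Raising `min_bps` (say to `10e6`) also catches the "almost dead" intervals just before the link goes fully silent.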

What To Do to Fix X11SBA

What will NOT work to fix the issue:

  • Disabling ACPI
  • Disabling MSI-X
  • Disabling MSI
  • Increasing mbufs (or whatever the network stack memory buffers are called on your OS)
  • Disabling PCI-E power management (ASPM)
  • Tuning sysctl
  • Updating BIOS
  • Updating LAN EEPROM
  • Messing with BIOS settings, power management etc.

If any of the manipulations above solves your problem, you’re not experiencing the hardware issue I’m describing; the cause is likely software, settings or both.

If you do have the hardware problem, there is nothing you can do short of returning the board to the retailer or RMAing it with Supermicro and asking for a replacement with board revision 1.02 or higher. If you request an RMA, describe the problem in exhaustive detail and/or give them a link to this article. It took me two RMAs to get my problem solved: the first time, the tech only tested for “ping OK” and declared the board functional after updating the LAN EEPROM.
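When a replacement board arrives, the revision check can be scripted too. The sketch below assumes the board reports its hardware revision in the DMI baseboard `Version:` field, as printed by `dmidecode -t baseboard`. That may not hold for every board or BIOS, so if the field is empty or looks wrong, trust the revision printed on the board itself. All function names here are made up for illustration.

```python
import re

MIN_GOOD_REV = (1, 2)  # rev 1.02, the minimum recommended in this post


def parse_rev(text):
    """Turn a revision string such as '1.01' into a comparable tuple (1, 1)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def baseboard_rev(dmidecode_output):
    """Extract the Version field from `dmidecode -t baseboard` output.

    Returns a (major, minor) tuple, or None if no numeric version is found.
    """
    m = re.search(r"^\s*Version:\s*(\d+\.\d+)\s*$", dmidecode_output, re.MULTILINE)
    return parse_rev(m.group(1)) if m else None


def is_affected(dmidecode_output):
    """True if the reported revision is older than 1.02, False if it is
    1.02 or newer, None if the revision could not be determined."""
    rev = baseboard_rev(dmidecode_output)
    return None if rev is None else rev < MIN_GOOD_REV


# Demo on a hand-written sample in dmidecode's output style (not real output).
sample = (
    "Base Board Information\n"
    "\tManufacturer: Supermicro\n"
    "\tProduct Name: X11SBA-F\n"
    "\tVersion: 1.01\n"
)
print(is_affected(sample))  # True
```

To use it on a live system, save the output of `dmidecode -t baseboard` (run as root) to a file and pass the file contents to `is_affected`.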

Links:

  1. pfSense forum describing the watchdoggate in detail.
  2. Watchdoggate described from the Linux side of the playground.
