[pfSense] Frequent "bge0: watchdog timeout -- resetting" problems
Paul Mather
2013-05-13 14:07:47 UTC
I'm running pfSense 2.0.3-RELEASE (i386) on a Dell 2650 rack-mount server. I'm using the built-in Broadcom gigabit ethernet NICs for WAN and LAN:

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x000105> mem 0xfca10000-0xfca1ffff irq 28 at device 6.0 on pci4
miibus0: <MII bus> on bge0
brgphy0: <BCM5701 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x000105> mem 0xfca00000-0xfca0ffff irq 29 at device 8.0 on pci4
miibus1: <MII bus> on bge1
brgphy1: <BCM5701 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: [ITHREAD]

***@pci0:4:6:0: class=0x020000 card=0x01211028 chip=0x164514e4 rev=0x15 hdr=0x00
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 512 burst read, 1 split transaction
    cap 01[48] = powerspec 2 supports D0 D3 current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit
***@pci0:4:8:0: class=0x020000 card=0x01211028 chip=0x164514e4 rev=0x15 hdr=0x00
    class      = network
    subclass   = ethernet
    cap 07[40] = PCI-X 64-bit supports 133MHz, 512 burst read, 1 split transaction
    cap 01[48] = powerspec 2 supports D0 D3 current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit


I am having severe problems with these NICs, particularly on the WAN side (bge0). Under traffic (not necessarily heavy load), I lose connectivity for some time until the NIC is apparently reset by a watchdog. It is typical to see this repeated in dmesg:

bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
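
As a rough way to quantify how often the resets occur, a short Python sketch along these lines could count watchdog resets per interface in saved dmesg output. The function name and the sample string are illustrative assumptions; only the message text is taken from the log above.

```python
import re

# Message format copied from the dmesg lines above; everything else here
# (names, sample text) is a hypothetical illustration.
WATCHDOG_RE = re.compile(r"^([a-z]+\d+): watchdog timeout -- resetting$")


def count_watchdog_resets(dmesg_text):
    """Return a dict mapping interface name to its number of watchdog resets."""
    counts = {}
    for line in dmesg_text.splitlines():
        match = WATCHDOG_RE.match(line.strip())
        if match:
            iface = match.group(1)
            counts[iface] = counts.get(iface, 0) + 1
    return counts


# Two reset cycles in the same shape as the log above.
sample = """\
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
"""

print(count_watchdog_resets(sample))
```

Running this against dmesg output saved at intervals would give a reset count per period, which may help when comparing settings or NICs.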


In System -> Advanced -> Networking, I have disabled hardware checksum offload, hardware TCP segmentation offload, and hardware large receive offload, but this doesn't seem to have helped. I have found references via Google to problems with Broadcom 57XX-based NICs under FreeBSD, and there are indications that some work has been done in FreeBSD 9-STABLE to improve matters, but that does not help pfSense 2.0.3, which runs on FreeBSD 8.1-RELEASE-p13.

I have checked the state table usage when this problem occurs and it is low (with ample free state entries available).

I have heard that disabling MSI can sometimes help, but the bge driver does not appear to use it (there is no bge entry among the MSI-related sysctls):

sysctl -a | grep msi
hw.bce.msi_enable: 1
hw.cxgb.msi_allowed: 2
hw.em.enable_msix: 1
hw.igb.enable_msix: 1
hw.malo.pci.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
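
The same check can be done mechanically. The Python sketch below lists which drivers have an MSI-related knob in output like the above and tests whether bge is among them. The function name is a hypothetical helper, and it assumes knob names of the form hw.<driver>.<...>. Note that a missing knob only shows there is nothing to toggle via sysctl here; it does not by itself prove the driver is not using MSI.

```python
def drivers_with_msi_knobs(sysctl_text):
    """Collect driver names from 'hw.<driver>...<knob>: value' lines whose knob mentions msi."""
    drivers = set()
    for line in sysctl_text.splitlines():
        name, sep, _value = line.partition(":")
        if not sep:
            continue  # not a 'name: value' line (e.g. the command itself)
        parts = name.strip().split(".")
        if len(parts) >= 3 and parts[0] == "hw" and "msi" in parts[-1]:
            drivers.add(parts[1])
    return drivers


# The sysctl output quoted above, verbatim.
sysctl_output = """\
hw.bce.msi_enable: 1
hw.cxgb.msi_allowed: 2
hw.em.enable_msix: 1
hw.igb.enable_msix: 1
hw.malo.pci.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
"""

found = drivers_with_msi_knobs(sysctl_output)
print(sorted(found))
print("bge" in found)
```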


Has anyone run into this problem? Can anyone offer a possible solution or workaround?

I have a dual-port expansion card in the same machine whose NICs use the fxp driver, and right now I am tempted to switch to those, on the view that stable 100BaseT is probably better than flaky 1000BaseT. But I'm hoping something can be done to make the bge ports stable. Any thoughts?

Cheers,

Paul.
Giles Coochey
2013-05-13 14:40:41 UTC
Post by Paul Mather
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
I had something similar with a VM implementation; it seemed to go away
when I increased the memory on the system.
--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
***@coochey.net
Paul Mather
2013-05-13 15:09:45 UTC
Post by Giles Coochey
I had something similar with a VM implementation; it seemed to go away when I increased the memory on the system.
How much memory did that system have after the increase? The hardware I am using has 2 GB of RAM, which should be plenty for pfSense. According to the RRD graphs, active+wired+cached memory usage on this system normally stays below 5% of total RAM.

Cheers,

Paul.