igc Adapter Resets #74

Open
opened 2023-07-07 23:16:32 +02:00 by simon · 1 comment

Problem

Sometimes, the NIC resets and logs the following messages to dmesg (all at the same time):

igc 0000:03:00.0 enp3s0: NIC Link is Down
br-lan: port 2(enp3s0) entered disabled state
igc 0000:03:00.0 enp3s0: Register Dump
igc 0000:03:00.0 enp3s0: Register Name   Value
igc 0000:03:00.0 enp3s0: CTRL            181c0641
igc 0000:03:00.0 enp3s0: STATUS          40680681
igc 0000:03:00.0 enp3s0: CTRL_EXT        10000040
igc 0000:03:00.0 enp3s0: MDIC            18017969
igc 0000:03:00.0 enp3s0: ICR             00000001
igc 0000:03:00.0 enp3s0: RCTL            0440803a
igc 0000:03:00.0 enp3s0: RDLEN[0-3]      00001000 00001000 00001000 00001000
igc 0000:03:00.0 enp3s0: RDH[0-3]        00000013 000000c4 0000006e 0000009d
igc 0000:03:00.0 enp3s0: RDT[0-3]        00000012 000000c3 0000006d 0000009c
igc 0000:03:00.0 enp3s0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
igc 0000:03:00.0 enp3s0: RDBAL[0-3]      3839f000 3a39b000 3a398000 3eae0000
igc 0000:03:00.0 enp3s0: RDBAH[0-3]      00000001 00000001 00000001 00000001
igc 0000:03:00.0 enp3s0: TCTL            a503f0fa
igc 0000:03:00.0 enp3s0: TDBAL[0-3]      38131000 383ac000 383a7000 383a2000
igc 0000:03:00.0 enp3s0: TDBAH[0-3]      00000001 00000001 00000001 00000001
igc 0000:03:00.0 enp3s0: TDLEN[0-3]      00001000 00001000 00001000 00001000
igc 0000:03:00.0 enp3s0: TDH[0-3]        000000d0 00000073 000000ec 000000d4
igc 0000:03:00.0 enp3s0: TDT[0-3]        000000d0 00000073 000000ec 000000d4
igc 0000:03:00.0 enp3s0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
igc 0000:03:00.0 enp3s0: Reset adapter

(In this case the cable had been unplugged, so the link going down is expected here, but that is not always the case.)

On Linux 6.4.1, I also observed a kernel warning with a call trace (it is not printed every time the issue occurs):

[Sun Jul  9 09:39:59 2023] ------------[ cut here ]------------
[Sun Jul  9 09:39:59 2023] NETDEV WATCHDOG: enp3s0 (igc): transmit queue 3 timed out 9252 ms
[Sun Jul  9 09:39:59 2023] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x235/0x240
[Sun Jul  9 09:39:59 2023] Modules linked in: af_packet ctr ccm wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel msr iwlmvm snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_cadence snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp mac80211 snd_sof snd_sof_utils snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_hda_codec_hdmi libarc4 snd_soc_core snd_compress btusb ac97_bus snd_pcm_dmaengine btrtl snd_hda_intel btbcm btintel snd_intel_dspcfg btmtk snd_intel_sdw_acpi snd_hda_codec snd_hda_core bluetooth iwlwifi intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hwdep iTCO_wdt cmdlinepart intel_pmc_bxt snd_pcm nls_iso8859_1 coretemp spi_nor watchdog crc32_pclmul mfd_core polyval_generic nls_cp437 gf128mul evdev ecdh_generic snd_timer vfat ecc ghash_clmulni_intel fat mtd mac_hid cfg80211 crc16 i2c_i801 mei_me
[Sun Jul  9 09:39:59 2023]  intel_cstate igc snd soundcore spi_intel_pci spi_intel rfkill i2c_smbus ptp pps_core mei intel_pmc_core tiny_power_button intel_scu_pltdrv pinctrl_elkhartlake button nft_chain_nat xt_MASQUERADE xt_mark xt_conntrack xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat sch_fq_codel nf_tables nfnetlink nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 atkbd libps2 serio vivaldi_fmap loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass uinput fuse deflate efi_pstore configfs efivarfs dmi_sysfs ip_tables x_tables autofs4 dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core hid_generic usbhid hid sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic mmc_block xhci_pci xhci_pci_renesas ahci xhci_hcd libahci libata crct10dif_pclmul crct10dif_common sha512_ssse3 sha512_generic usbcore sdhci_pci aesni_intel cqhci sdhci libaes crypto_simd scsi_mod cryptd mmc_core led_class scsi_common usb_common rtc_cmos dm_mod dax btrfs blake2b_generic xor
[Sun Jul  9 09:39:59 2023]  libcrc32c crc32c_generic crc32c_intel raid6_pq i915 i2c_algo_bit drm_buddy cec intel_gtt video wmi drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt ttm agpgart drm i2c_core backlight
[Sun Jul  9 09:39:59 2023] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.4.1 #1-NixOS
[Sun Jul  9 09:39:59 2023] Hardware name: Protectli VP2420/VP2420, BIOS Dasharo (coreboot+UEFI) v1.1.0 04/12/2023
[Sun Jul  9 09:39:59 2023] RIP: 0010:dev_watchdog+0x235/0x240
[Sun Jul  9 09:39:59 2023] Code: ff ff ff 48 89 df c6 05 85 f8 11 01 01 e8 83 2d fa ff 45 89 f8 44 89 f1 48 89 de 48 89 c2 48 c7 c7 a8 b0 61 95 e8 8b 0d 7c ff <0f> 0b e9 2a ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
[Sun Jul  9 09:39:59 2023] RSP: 0018:ffffab8700108e78 EFLAGS: 00010286
[Sun Jul  9 09:39:59 2023] RAX: 0000000000000000 RBX: ffff8d65fa84e000 RCX: 0000000000000027
[Sun Jul  9 09:39:59 2023] RDX: ffff8d67388a14c8 RSI: 0000000000000001 RDI: ffff8d67388a14c0
[Sun Jul  9 09:39:59 2023] RBP: ffff8d65fa84e4c8 R08: 0000000000000000 R09: ffffab8700108d20
[Sun Jul  9 09:39:59 2023] R10: 0000000000000003 R11: ffffffff95d38f68 R12: ffff8d66000aabc0
[Sun Jul  9 09:39:59 2023] R13: ffff8d65fa84e41c R14: 0000000000000003 R15: 0000000000002424
[Sun Jul  9 09:39:59 2023] FS:  0000000000000000(0000) GS:ffff8d6738880000(0000) knlGS:0000000000000000
[Sun Jul  9 09:39:59 2023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Jul  9 09:39:59 2023] CR2: 0000000002367048 CR3: 0000000102dc6000 CR4: 0000000000350ee0
[Sun Jul  9 09:39:59 2023] Call Trace:
[Sun Jul  9 09:39:59 2023]  <IRQ>
[Sun Jul  9 09:39:59 2023]  ? dev_watchdog+0x235/0x240
[Sun Jul  9 09:39:59 2023]  ? __warn+0x81/0x130
[Sun Jul  9 09:39:59 2023]  ? dev_watchdog+0x235/0x240
[Sun Jul  9 09:39:59 2023]  ? report_bug+0x171/0x1a0
[Sun Jul  9 09:39:59 2023]  ? handle_bug+0x41/0x70
[Sun Jul  9 09:39:59 2023]  ? exc_invalid_op+0x17/0x70
[Sun Jul  9 09:39:59 2023]  ? asm_exc_invalid_op+0x1a/0x20
[Sun Jul  9 09:39:59 2023]  ? dev_watchdog+0x235/0x240
[Sun Jul  9 09:39:59 2023]  ? dev_watchdog+0x235/0x240
[Sun Jul  9 09:39:59 2023]  ? __pfx_dev_watchdog+0x10/0x10
[Sun Jul  9 09:39:59 2023]  call_timer_fn+0x24/0x130
[Sun Jul  9 09:39:59 2023]  ? __pfx_dev_watchdog+0x10/0x10
[Sun Jul  9 09:39:59 2023]  __run_timers+0x222/0x2c0
[Sun Jul  9 09:39:59 2023]  run_timer_softirq+0x1d/0x40
[Sun Jul  9 09:39:59 2023]  __do_softirq+0xc7/0x2ae
[Sun Jul  9 09:39:59 2023]  __irq_exit_rcu+0xab/0xe0
[Sun Jul  9 09:39:59 2023]  sysvec_apic_timer_interrupt+0x72/0x90
[Sun Jul  9 09:39:59 2023]  </IRQ>
[Sun Jul  9 09:39:59 2023]  <TASK>
[Sun Jul  9 09:39:59 2023]  asm_sysvec_apic_timer_interrupt+0x1a/0x20
[Sun Jul  9 09:39:59 2023] RIP: 0010:cpuidle_enter_state+0xcc/0x440
[Sun Jul  9 09:39:59 2023] Code: 6a b9 65 ff e8 15 f2 ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 03 cd 64 ff 45 84 ff 0f 85 57 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d
[Sun Jul  9 09:39:59 2023] RSP: 0018:ffffab87000bbe90 EFLAGS: 00000246
[Sun Jul  9 09:39:59 2023] RAX: ffff8d67388b2840 RBX: ffff8d67388bd200 RCX: 0000000000000000
[Sun Jul  9 09:39:59 2023] RDX: 0000000000000001 RSI: fffffffd0464e3d0 RDI: 0000000000000000
[Sun Jul  9 09:39:59 2023] RBP: 0000000000000001 R08: 0000000000000000 R09: 00000000401a41a4
[Sun Jul  9 09:39:59 2023] R10: 0000000000000008 R11: 0000000000000085 R12: ffffffff95dacf80
[Sun Jul  9 09:39:59 2023] R13: 0000004cab8f133d R14: 0000000000000001 R15: 0000000000000000
[Sun Jul  9 09:39:59 2023]  cpuidle_enter+0x2d/0x40
[Sun Jul  9 09:39:59 2023]  do_idle+0x1d8/0x230
[Sun Jul  9 09:39:59 2023]  cpu_startup_entry+0x1d/0x20
[Sun Jul  9 09:39:59 2023]  start_secondary+0x12b/0x150
[Sun Jul  9 09:39:59 2023]  secondary_startup_64_no_verify+0x10b/0x10b
[Sun Jul  9 09:39:59 2023]  </TASK>
[Sun Jul  9 09:39:59 2023] ---[ end trace 0000000000000000 ]---
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: Register Dump
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: Register Name   Value
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: CTRL            181c0641
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: STATUS          40680683
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: CTRL_EXT        10000040
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: MDIC            1805dde1
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: ICR             00000081
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RCTL            0440803a
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RDLEN[0-3]      00001000 00001000 00001000 00001000
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RDH[0-3]        0000000d 00000081 0000001e 00000010
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RDT[0-3]        0000000c 00000080 0000001d 00000003
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RDBAL[0-3]      35106000 35109000 3510c000 3510f000
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: RDBAH[0-3]      00000001 00000001 00000001 00000001
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TCTL            a503f0fa
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDBAL[0-3]      350f4000 350f9000 350fe000 35103000
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDBAH[0-3]      00000001 00000001 00000001 00000001
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDLEN[0-3]      00001000 00001000 00001000 00001000
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDH[0-3]        0000002b 000000e9 000000d2 000000a7
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDT[0-3]        0000002b 000000ea 000000d2 000000a7
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: Reset adapter
[Sun Jul  9 09:39:59 2023] br-lan: port 3(enp3s0) entered disabled state
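
To compare dumps like the two above more quickly, the TX ring pointers can be extracted and diffed per queue. This is a minimal sketch, assuming that TDH and TDT are the transmit descriptor head and tail pointers (the usual Intel register naming) and that a queue whose head and tail differ still had descriptors outstanding at dump time. The function name `tx_ring_backlog` is made up for illustration and is not part of any tool.

```python
import re

# Matches lines such as
#   "igc 0000:03:00.0 enp3s0: TDH[0-3]        0000002b 000000e9 ..."
# with or without a leading "[timestamp]" prefix.
REG_LINE = re.compile(r"igc \S+ \S+: (TDH|TDT)\[0-3\]\s+((?:[0-9a-f]{8}\s*){4})")


def tx_ring_backlog(dump_lines):
    """Return {queue: (head, tail)} for TX queues whose TDH and TDT differ."""
    regs = {}
    for line in dump_lines:
        m = REG_LINE.search(line)
        if m:
            regs[m.group(1)] = [int(v, 16) for v in m.group(2).split()]
    if "TDH" not in regs or "TDT" not in regs:
        return {}
    return {
        q: (head, tail)
        for q, (head, tail) in enumerate(zip(regs["TDH"], regs["TDT"]))
        if head != tail
    }


# The two TX pointer lines from the 6.4.1 dump above.
dump_641 = [
    "[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDH[0-3]        0000002b 000000e9 000000d2 000000a7",
    "[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: TDT[0-3]        0000002b 000000ea 000000d2 000000a7",
]
print(tx_ring_backlog(dump_641))  # {1: (233, 234)}
```

Applied to the first dump (cable unplugged), the same function returns an empty dict because all four head/tail pairs match. In the 6.4.1 dump only queue 1 differs, while the watchdog message names queue 3. I have not drawn any conclusion from that.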

Scope

This affects all devices with Intel i225-V NICs, namely shinobu (4 NICs) and fuuko (1 external NIC).

shinobu: lspci -vnn -d 8086:15f3
01:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
    Subsystem: Intel Corporation Device [8086:0000]
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at 7f400000 (32-bit, non-prefetchable) [size=1M]
    Memory at 7f600000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at 7f500000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 64-62-66-ff-ff-21-f5-63
    Capabilities: [1c0] Latency Tolerance Reporting
    Capabilities: [1f0] Precision Time Measurement
    Capabilities: [1e0] L1 PM Substates
    Kernel driver in use: igc
    Kernel modules: igc

02:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
    Subsystem: Intel Corporation Device [8086:0000]
    Flags: bus master, fast devsel, latency 0, IRQ 17
    Memory at 7f700000 (32-bit, non-prefetchable) [size=1M]
    Memory at 7f900000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at 7f800000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 64-62-66-ff-ff-21-f5-64
    Capabilities: [1c0] Latency Tolerance Reporting
    Capabilities: [1f0] Precision Time Measurement
    Capabilities: [1e0] L1 PM Substates
    Kernel driver in use: igc
    Kernel modules: igc

03:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
    Subsystem: Intel Corporation Device [8086:0000]
    Flags: bus master, fast devsel, latency 0, IRQ 18
    Memory at 7fa00000 (32-bit, non-prefetchable) [size=1M]
    Memory at 7fc00000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at 7fb00000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 64-62-66-ff-ff-21-f5-65
    Capabilities: [1c0] Latency Tolerance Reporting
    Capabilities: [1f0] Precision Time Measurement
    Capabilities: [1e0] L1 PM Substates
    Kernel driver in use: igc
    Kernel modules: igc

04:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
    Subsystem: Intel Corporation Device [8086:0000]
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at 7fd00000 (32-bit, non-prefetchable) [size=1M]
    Memory at 7ff00000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at 7fe00000 [disabled] [size=1M]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 64-62-66-ff-ff-21-f5-66
    Capabilities: [1c0] Latency Tolerance Reporting
    Capabilities: [1f0] Precision Time Measurement
    Capabilities: [1e0] L1 PM Substates
    Kernel driver in use: igc
    Kernel modules: igc

fuuko: lspci -vnn -d 8086:15f3
0a:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 03)
    Subsystem: Intel Corporation Device [8086:0000]
    Flags: bus master, fast devsel, latency 0, IRQ 41, IOMMU group 15
    Memory at fb800000 (32-bit, non-prefetchable) [size=1M]
    Memory at fb900000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 02-76-c6-ff-ff-00-a2-c0
    Capabilities: [1c0] Latency Tolerance Reporting
    Capabilities: [1f0] Precision Time Measurement
    Capabilities: [1e0] L1 PM Substates
    Kernel driver in use: igc
    Kernel modules: igc

Firmware version

shinobu # ethtool -i enp1s0
driver: igc
version: 6.1.35
firmware-version: 1057:8754
expansion-rom-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Kernel version

shinobu # uname -a
Linux shinobu 6.1.35 #1-NixOS SMP PREEMPT_DYNAMIC Wed Jun 21 14:01:03 UTC 2023 x86_64 GNU/Linux

Sometimes it happens spontaneously, without any action on my end, but the following factors make it more likely:

  • Being connected to a 2.5GbE port (without this it is very unlikely)
  • Changing the link status (by unplugging cables/power cycling machines etc.) often directly triggers it
  • High throughput: Running e.g. iperf3 for a longer time (multiple runs or a long run) will often cause it
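
To check how often each of these factors actually coincides with a reset, the kernel log can be scanned for the two messages shown in the Problem section. The following is only a sketch: the function name `summarize_resets` is hypothetical, the patterns are derived solely from the log lines quoted in this issue, and the timeout duration is treated as optional in case a kernel omits it.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this issue.
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (\S+) \((\w+)\): transmit queue (\d+) timed out(?: (\d+) ms)?"
)
RESET = re.compile(r"igc \S+ (\S+): Reset adapter")


def summarize_resets(log_text):
    """Count 'Reset adapter' lines per interface and list watchdog timeouts.

    Returns (resets, timeouts), where resets is a Counter keyed by interface
    name and timeouts is a list of (interface, queue, milliseconds or None).
    """
    resets = Counter()
    timeouts = []
    for line in log_text.splitlines():
        m = RESET.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = WATCHDOG.search(line)
        if m:
            ms = int(m.group(4)) if m.group(4) else None
            timeouts.append((m.group(1), int(m.group(3)), ms))
    return resets, timeouts


sample = """\
[Sun Jul  9 09:39:59 2023] NETDEV WATCHDOG: enp3s0 (igc): transmit queue 3 timed out 9252 ms
[Sun Jul  9 09:39:59 2023] igc 0000:03:00.0 enp3s0: Reset adapter
[Sun Jul  9 09:39:59 2023] br-lan: port 3(enp3s0) entered disabled state
"""
resets, timeouts = summarize_resets(sample)
print(dict(resets), timeouts)  # {'enp3s0': 1} [('enp3s0', 3, 9252)]
```

The input can be any saved copy of the kernel log, for example the text output of `dmesg`, collected once per test run (2.5GbE vs. 1GbE link partner, with and without iperf3 load).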

Upstream issues

There seem to be problems with the specific integration of the i225-V on ASUS mainboards, which are related to power-saving measures and cause the device to disappear from the PCI bus. From what I can see, this is not what happens in this case.

  • https://bugzilla.kernel.org/show_bug.cgi?id=216257 However, this report also has a "Detected Tx Unit Hang" error message, which I have never seen.
  • https://www.intel.co.uk/content/www/uk/en/support/articles/000057261/ethernet-products/gigabit-ethernet-controllers-up-to-2-5gbe.html I do not know if this refers to the same problem, but the fix it describes (updating the firmware) might solve it. However, it does not lay out how to do this on Linux.

Solution

?

simon added the type bug, blocked by/testing needed, affects/hardware, and affects/usability labels 2023-07-07 23:16:32 +02:00
Poster
Owner

I reached out to Protectli, since 4 out of 5 affected ports are on their device (and they might have other customers with the same problem).
Reference: simon/nixos-config#74