Virtualize hitagi #81

Closed
opened 2024-03-06 22:15:33 +01:00 by simon · 1 comment
Owner

This is an experiment; this issue tracks my findings.

Motivation

TBD

Current progress

TBD

Issues

Reset not working with Windows guest

Problem

When shutting down the Windows guest, it stays in the libvirt state “shutting down”. The host system becomes more and more unresponsive and eventually hangs.

Log

Mar 10 13:52:56 hyper kernel: vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
Mar 10 13:52:58 hyper kernel: vfio-pci 0000:08:00.0: not ready 2047ms after FLR; waiting
Mar 10 13:53:01 hyper kernel: vfio-pci 0000:08:00.0: not ready 4095ms after FLR; waiting
Mar 10 13:53:06 hyper kernel: vfio-pci 0000:08:00.0: not ready 8191ms after FLR; waiting
Mar 10 13:53:16 hyper kernel: vfio-pci 0000:08:00.0: not ready 16383ms after FLR; waiting
Mar 10 13:53:33 hyper kernel: vfio-pci 0000:08:00.0: not ready 32767ms after FLR; waiting
Mar 10 13:54:10 hyper kernel: vfio-pci 0000:08:00.0: not ready 65535ms after FLR; giving up
Mar 10 13:54:22 hyper kernel: vfio-pci 0000:08:00.0: not ready 1023ms after bus reset; waiting
Mar 10 13:54:24 hyper kernel: vfio-pci 0000:08:00.0: not ready 2047ms after bus reset; waiting
Mar 10 13:54:27 hyper kernel: vfio-pci 0000:08:00.0: not ready 4095ms after bus reset; waiting
Mar 10 13:54:32 hyper kernel: vfio-pci 0000:08:00.0: not ready 8191ms after bus reset; waiting
Mar 10 13:54:42 hyper kernel: vfio-pci 0000:08:00.0: not ready 16383ms after bus reset; waiting
Mar 10 13:54:59 hyper kernel: vfio-pci 0000:08:00.0: not ready 32767ms after bus reset; waiting
Mar 10 13:55:37 hyper kernel: vfio-pci 0000:08:00.0: not ready 65535ms after bus reset; giving up
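
The intervals in the log look like a doubling back-off. The sketch below reproduces the reported numbers under two assumptions that are not stated in this issue: the wait starts at 1 ms and doubles each round, and the kernel reports once the wait exceeds 1 s and gives up once it exceeds 60 s. `reset_wait_messages` is a hypothetical helper for illustration, not kernel code.

```python
def reset_wait_messages(timeout_ms=60_000, report_after_ms=1_000):
    """Model a doubling wait loop (assumed behaviour, inferred from the log)."""
    messages = []
    delay = 1  # assumed initial wait in milliseconds
    while True:
        if delay > timeout_ms:
            # The device never became ready within the assumed timeout.
            messages.append((delay - 1, "giving up"))
            return messages
        if delay > report_after_ms:
            # Only longer waits are reported, matching the first log line.
            messages.append((delay - 1, "waiting"))
        delay *= 2


for waited_ms, action in reset_wait_messages():
    print(f"not ready {waited_ms}ms; {action}")
```

If this model is right, each reset attempt (FLR, then bus reset) blocks for more than a minute before giving up, which fits the timestamps above.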

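Resources

  • Similar issue: https://forum.proxmox.com/threads/issues-with-intel-arc-a770m-gpu-passthrough-on-nuc12snki72-vfio-pci-not-ready-after-flr-or-bus-reset.130667/ (workaround does not work)
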
Additional findings

  • Does not happen with Linux guests

Audio latency

Problem

  • Audio is delayed by over 350 ms.

Additional findings

  • The audio device does not influence it (by much). Tested with:
    • MOTU M2 (USB)
    • cheap USB adapter
    • Intel Arc (HDMI)
    • AMD PCIe (mainboard header/case port)
  • osu lazer is barely functional: the playback speed varies between very fast and almost regular, which makes it unusable. Sometimes the audio cuts out completely after a short time.
  • On mayushii, using a MOTU M2, the full round trip time (audacity → pipewire → M2 → Speaker → Microphone → M2 → pipewire → audacity) is ~70 ms
  • Setting clock.force-quantum to 512 or lower fixes the osu behaviour
  • Achieving barely noticeable latency is possible by forcing the quantum to 16 and the sample rate to 192k (not feasible in practice, as this has a very audible negative impact on consistency)
  • A fedora VM on mayushii (without tuning) also has >300 ms latency
  • pulseaudio fixes the osu issue but still has high delay
  • plain alsa (on virtualized hitagi!) has an impressive latency of 3 ms
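
To put the quantum values above into perspective, here is a rough calculation of how much audio one quantum holds. It is only a sketch: `quantum_latency_ms` is a hypothetical helper, the 48 kHz rate for the 512 case is an assumption (the rate used there is not recorded in this issue), and real end-to-end latency includes further buffering that this ignores.

```python
def quantum_latency_ms(quantum, sample_rate_hz):
    """Duration of one quantum (buffer of `quantum` frames) in milliseconds."""
    return quantum / sample_rate_hz * 1000.0


# (quantum, sample rate); the 48 kHz rate is an assumed default.
settings = [(512, 48_000), (16, 192_000)]

for quantum, rate in settings:
    latency = quantum_latency_ms(quantum, rate)
    print(f"quantum {quantum} at {rate} Hz: {latency:.3f} ms per quantum")
```

Even a quantum of 512 only accounts for a few milliseconds per buffer, so the quantum size alone does not explain a delay of several hundred milliseconds.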

Guesses

  • osu (which needs accurate timing) being this broken leads me to suspect that the problem is not the passthrough of the device itself, but rather an inaccurate clock in the kernel or pipewire

TODO

  • [x] Test latency on mayushii
  • [x] Test latency on fedora VM on mayushii
  • [x] Find an objective way of measuring the delay: audacity beat track → record to new track; measure the delay between the first impulses
  • Dropped: Test latency on bare metal hitagi
  • Dropped: Test latency on Windows VM
  • Dropped: Test other demanding audio applications to see whether they exhibit behaviour similar to osu’s
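
The measurement idea from the list above (compare the first impulse of the beat track with the first impulse of the recording) can also be done outside of audacity. The following is a minimal sketch, assuming both tracks are available as lists of samples normalised to the range -1.0 to 1.0 and share the same sample rate. The function names and the threshold of 0.5 are made up for illustration.

```python
def first_impulse_index(samples, threshold=0.5):
    """Return the index of the first sample whose magnitude reaches the threshold."""
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            return index
    return None


def impulse_delay_ms(reference, recording, sample_rate_hz, threshold=0.5):
    """Delay between the first impulses of two tracks, in milliseconds."""
    ref_index = first_impulse_index(reference, threshold)
    rec_index = first_impulse_index(recording, threshold)
    if ref_index is None or rec_index is None:
        raise ValueError("no impulse above the threshold in one of the tracks")
    return (rec_index - ref_index) / sample_rate_hz * 1000.0


# Synthetic example: the recorded impulse arrives 3360 samples late.
reference = [0.0] * 100 + [1.0] + [0.0] * 10
recording = [0.0] * 3460 + [0.9] + [0.0] * 10
print(f"{impulse_delay_ms(reference, recording, 48_000):.1f} ms")
```

A fixed threshold is the simplest choice. With a noisy microphone signal it would probably need tuning, or a cross-correlation instead.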

Resolution

By default, pipewire applies tuning for VMs. However, that tuning assumes an emulated sound card, which is not the case for me. By setting PIPEWIRE_VM, a specific VM type can be forced, though I think the detection can’t be disabled once a VM is detected. A workaround is to copy the host’s system information into the guest, which tricks pipewire’s VM detection.

<!-- … -->
  <os>
    <!-- … -->
    <smbios mode="host"/>
  </os>
<!-- … -->
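
To check from inside the guest whether the workaround took effect, one can look at the DMI strings the guest sees. The sketch below assumes that the VM detection relies on DMI fields under /sys/class/dmi/id, which would fit the workaround but is not confirmed here. The field list and the marker strings are illustrative guesses, not pipewire’s actual table.

```python
from pathlib import Path

# Assumed DMI fields and hypervisor markers; both lists are illustrative.
DMI_FILES = ("product_name", "sys_vendor", "board_vendor", "bios_vendor")
VM_MARKERS = ("KVM", "QEMU", "VMware", "VirtualBox", "Xen", "Bochs")


def read_dmi(dmi_dir="/sys/class/dmi/id"):
    """Read the DMI fields that exist and are readable."""
    values = {}
    for name in DMI_FILES:
        try:
            text = (Path(dmi_dir) / name).read_text(errors="replace")
        except OSError:
            continue  # field missing or not readable as this user
        values[name] = text.strip()
    return values


def vm_markers_found(dmi_values):
    """Map each DMI field to the hypervisor markers found in its value."""
    hits = {}
    for name, value in dmi_values.items():
        matched = [m for m in VM_MARKERS if m.lower() in value.lower()]
        if matched:
            hits[name] = matched
    return hits


if __name__ == "__main__":
    dmi = read_dmi()
    print("DMI values:", dmi)
    print("VM markers:", vm_markers_found(dmi) or "none")
```

With smbios mode="host", the guest should report the host’s vendor strings, so no markers are expected to show up.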
simon added the type tracking label 2024-03-06 22:15:33 +01:00
simon self-assigned this 2024-03-06 22:15:33 +01:00
Author
Owner

This caused many more issues than it solved. It also introduced quite a bit of complexity. Therefore, I abandoned the idea for now.

The progress can be found in the hyper branch (f2ecb958ab).

simon closed this issue 2024-04-01 21:49:12 +02:00
simon added the resolution wontfix label 2024-04-01 21:49:33 +02:00
Reference: simon/nixos-config#81