Skip to content
  • David Woodhouse's avatar
    54ad31fb
    hw/intc/ioapic: Update KVM routes before redelivering IRQ, on RTE update · 54ad31fb
    David Woodhouse authored
    
    
    A Linux guest will perform IRQ migration after the IRQ has happened,
    updating the RTE to point to the new destination CPU and then unmasking
    the interrupt.
    
    However, when the guest updates the RTE, ioapic_mem_write() calls
    ioapic_service(), which redelivers the pending level interrupt via
    kvm_set_irq(), *before* calling ioapic_update_kvm_routes() which sets
    the new target CPU.
    
    Thus, the IRQ which is supposed to go to the new target CPU is instead
    misdelivered to the previous target. An example where the guest kernel
    is attempting to migrate from CPU#2 to CPU#0 shows:
    
    xenstore_read tx 0 path control/platform-feature-xs_reset_watches
    ioapic_set_irq vector: 11 level: 1
    ioapic_set_remote_irr set remote irr for pin 11
    ioapic_service: trigger KVM IRQ 11
    [    0.523627] The affinity mask was 0-3 and the handler is on 2
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
    ioapic_update_kvm_routes: update KVM route for IRQ 11: fee02000 8021
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
    xenstore_reset_watches
    ioapic_set_irq vector: 11 level: 1
    ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x1c021
    [    0.524569] ioapic_ack_level IRQ 11 moveit = 1
    ioapic_eoi_broadcast EOI broadcast for vector 33
    ioapic_clear_remote_irr clear remote irr for pin 11 vector 33
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
    ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x18021
    [    0.525235] ioapic_finish_move IRQ 11 calls irq_move_masked_irq()
    [    0.526147] irq_do_set_affinity for IRQ 11, 0
    [    0.526732] ioapic_set_affinity for IRQ 11, 0
    [    0.527330] ioapic_setup_msg_from_msi for IRQ11 target 0
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x27
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x27 size 0x4 val 0x0
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
    [    0.527623] ioapic_set_affinity returns 0
    [    0.527623] ioapic_finish_move IRQ 11 calls unmask_ioapic_irq()
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x8021
    ioapic_set_remote_irr set remote irr for pin 11
    ioapic_service: trigger KVM IRQ 11
    ioapic_update_kvm_routes: update KVM route for IRQ 11: fee00000 8021
    [    0.529571] The affinity mask was 0 and the handler is on 2
    [    xenstore_watch path memory/target token FFFFFFFF92847D40
    
    There are no other code paths in ioapic_mem_write() which need the KVM
    IRQ routing table to be updated, so just shift the call from the end
    of the function to happen right before the call to ioapic_service()
    and thus deliver the re-enabled IRQ to the right place.
    
    Alternative fixes might have been just to remove the part in
    ioapic_service() which delivers the IRQ via kvm_set_irq() because
    surely delivering as MSI ought to work just fine anyway in all cases?
    That code lacks a comment justifying its existence.
    
    Or maybe in the specific case shown in the above log, it would have
    sufficed for ioapic_update_kvm_routes() to update the route *even*
    when the IRQ is masked. It's not like it's actually going to get
    triggered unless QEMU deliberately does so, anyway? But that only
    works because the target CPU happens to be in the high word of the
    RTE; if something in the *low* word (vector, perhaps) was changed
    at the same time as the unmask, we'd still trigger with stale data.
    
    Fixes: 15eafc2e "kvm: x86: add support for KVM_CAP_SPLIT_IRQCHIP"
    Signed-off-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
    Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
    Message-Id: <20230308111952.2728440-2-dwmw2@infradead.org>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
    54ad31fb
    hw/intc/ioapic: Update KVM routes before redelivering IRQ, on RTE update
    David Woodhouse authored
    
    
    A Linux guest will perform IRQ migration after the IRQ has happened,
    updating the RTE to point to the new destination CPU and then unmasking
    the interrupt.
    
    However, when the guest updates the RTE, ioapic_mem_write() calls
    ioapic_service(), which redelivers the pending level interrupt via
    kvm_set_irq(), *before* calling ioapic_update_kvm_routes() which sets
    the new target CPU.
    
    Thus, the IRQ which is supposed to go to the new target CPU is instead
    misdelivered to the previous target. An example where the guest kernel
    is attempting to migrate from CPU#2 to CPU#0 shows:
    
    xenstore_read tx 0 path control/platform-feature-xs_reset_watches
    ioapic_set_irq vector: 11 level: 1
    ioapic_set_remote_irr set remote irr for pin 11
    ioapic_service: trigger KVM IRQ 11
    [    0.523627] The affinity mask was 0-3 and the handler is on 2
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
    ioapic_update_kvm_routes: update KVM route for IRQ 11: fee02000 8021
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
    xenstore_reset_watches
    ioapic_set_irq vector: 11 level: 1
    ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x1c021
    [    0.524569] ioapic_ack_level IRQ 11 moveit = 1
    ioapic_eoi_broadcast EOI broadcast for vector 33
    ioapic_clear_remote_irr clear remote irr for pin 11 vector 33
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
    ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x18021
    [    0.525235] ioapic_finish_move IRQ 11 calls irq_move_masked_irq()
    [    0.526147] irq_do_set_affinity for IRQ 11, 0
    [    0.526732] ioapic_set_affinity for IRQ 11, 0
    [    0.527330] ioapic_setup_msg_from_msi for IRQ11 target 0
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x27
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x27 size 0x4 val 0x0
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
    [    0.527623] ioapic_set_affinity returns 0
    [    0.527623] ioapic_finish_move IRQ 11 calls unmask_ioapic_irq()
    ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
    ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x8021
    ioapic_set_remote_irr set remote irr for pin 11
    ioapic_service: trigger KVM IRQ 11
    ioapic_update_kvm_routes: update KVM route for IRQ 11: fee00000 8021
    [    0.529571] The affinity mask was 0 and the handler is on 2
    [    xenstore_watch path memory/target token FFFFFFFF92847D40
    
    There are no other code paths in ioapic_mem_write() which need the KVM
    IRQ routing table to be updated, so just shift the call from the end
    of the function to happen right before the call to ioapic_service()
    and thus deliver the re-enabled IRQ to the right place.
    
    Alternative fixes might have been just to remove the part in
    ioapic_service() which delivers the IRQ via kvm_set_irq() because
    surely delivering as MSI ought to work just fine anyway in all cases?
    That code lacks a comment justifying its existence.
    
    Or maybe in the specific case shown in the above log, it would have
    sufficed for ioapic_update_kvm_routes() to update the route *even*
    when the IRQ is masked. It's not like it's actually going to get
    triggered unless QEMU deliberately does so, anyway? But that only
    works because the target CPU happens to be in the high word of the
    RTE; if something in the *low* word (vector, perhaps) was changed
    at the same time as the unmask, we'd still trigger with stale data.
    
    Fixes: 15eafc2e "kvm: x86: add support for KVM_CAP_SPLIT_IRQCHIP"
    Signed-off-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
    Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
    Message-Id: <20230308111952.2728440-2-dwmw2@infradead.org>
    Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
Loading