  Jul 31, 2023
    • kvm: Fix crash due to access uninitialized kvm_state · fe6bda58
      Gavin Shan authored
      
      QEMU runs into a core dump on arm64; the backtrace extracted from the
      core dump is shown below. The crash is caused by accessing the
      uninitialized @kvm_state in kvm_flush_coalesced_mmio_buffer(), due to
      commit 176d0730 ("hw/arm/virt: Use machine_memory_devices_init()"),
      which adds the machine's memory region earlier than before.
      
          main
          qemu_init
          configure_accelerators
          qemu_opts_foreach
          do_configure_accelerator
          accel_init_machine
          kvm_init
          virt_kvm_type
          virt_set_memmap
          machine_memory_devices_init
          memory_region_add_subregion
          memory_region_add_subregion_common
          memory_region_update_container_subregions
          memory_region_transaction_begin
          qemu_flush_coalesced_mmio_buffer
          kvm_flush_coalesced_mmio_buffer
      
      Fix it by bailing out early in kvm_flush_coalesced_mmio_buffer() when
      @kvm_state is still uninitialized. With this applied, no crash is
      observed on arm64.
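
      A minimal sketch of the bail-out, following the description above (the
      surrounding flush logic is elided):

          void kvm_flush_coalesced_mmio_buffer(void)
          {
              KVMState *s = kvm_state;

              if (!s) {
                  /* Accelerator not initialized yet: nothing to flush. */
                  return;
              }

              /* ... existing flush logic dereferencing s ... */
          }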
      
      Fixes: 176d0730 ("hw/arm/virt: Use machine_memory_devices_init()")
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Message-id: 20230731125946.2038742-1-gshan@redhat.com
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  Jun 26, 2023
    • kvm: reuse per-vcpu stats fd to avoid vcpu interruption · 3b6f4852
      Marcelo Tosatti authored
      
      A regression has been detected in latency testing of KVM guests. More
      specifically, the cyclictest numbers inside an isolated vcpu (running
      on an isolated pcpu) exceed the acceptable maximum of 50us.
      
      The implementation of KVM_GET_STATS_FD uses run_on_cpu to query
      per-vcpu statistics, which interrupts the vcpu (and is unnecessary).
      
      To fix this, open the per-vcpu stats fd at vcpu initialization and read
      from that fd in QEMU's main thread, as sketched below.
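
      A hedged sketch of the approach: fetch the fd once during vcpu setup
      and cache it on the CPUState (the field name follows the commit's
      intent and is illustrative):

          /* During vcpu initialization (in the vcpu thread): */
          cpu->kvm_vcpu_stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);

          /* Later, from QEMU's main thread, read the cached fd directly
           * instead of bouncing through run_on_cpu(): */
          ret = read(cpu->kvm_vcpu_stats_fd, buf, len);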
      
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Jan 11, 2023
    • kvm: Atomic memslot updates · f39b7d2b
      David Hildenbrand authored
      
      If we update an existing memslot (e.g., resize, split), we temporarily
      remove the memslot to re-add it immediately afterwards. These updates
      are not atomic, especially not for KVM VCPU threads, such that we can
      get spurious faults.
      
      Let's inhibit most KVM ioctls while performing relevant updates, such
      that we can perform the update just as if it would happen atomically
      without additional kernel support.
      
      We capture the add/del changes and apply them in the notifier commit
      stage instead. There, we can check for overlaps and perform the ioctl
      inhibition only if it is really required (i.e., on overlap).

      To keep things simple, we don't perform additional checks for cases
      that couldn't actually result in an overlap, such as !RAM memory
      regions in some cases (see kvm_set_phys_mem()).

      To minimize cache-line bouncing, use a separate indicator
      (in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock
      while performing both actions (removing + re-adding), as in the sketch
      below.
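
      A condensed sketch of the resulting update flow, using the
      accel-blocker API from the companion patch in this series (not the
      verbatim QEMU code):

          if (need_inhibit) {
              accel_ioctl_inhibit_begin(); /* drain in-flight ioctls, block new ones */
          }
          kvm_slots_lock();                /* held across both remove and re-add */
          /* ... delete the stale memslots, then register the replacements ... */
          kvm_slots_unlock();
          if (need_inhibit) {
              accel_ioctl_inhibit_end();   /* vCPU threads may issue ioctls again */
          }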
      
      We have to wait until all in-flight ioctls have exited and block new
      ones from getting executed.
      
      This approach cannot result in a deadlock as long as the inhibitor does
      not hold any locks that might hinder an IOCTL from getting finished and
      exited - something fairly unusual. The inhibitor will always hold the BQL.
      
      AFAIK, one possible candidate would be userfaultfd. If a page cannot be
      placed (e.g., during postcopy), because we're waiting for a lock, or if the
      userfaultfd thread cannot process a fault, because it is waiting for a
      lock, there could be a deadlock. However, the BQL is not applicable here,
      because any other guest memory access while holding the BQL would already
      result in a deadlock.
      
      Nothing else in the kernel should block forever and wait for userspace
      intervention.
      
      Note: pause_all_vcpus()/resume_all_vcpus() or
      start_exclusive()/end_exclusive() cannot be used, as they either drop
      the BQL or require being called without the BQL, something inhibitors
      cannot handle. We need a low-level locking mechanism that is
      deadlock-free even when the BQL is not released.
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Message-Id: <20221111154758.1372674-4-eesposit@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: keep track of running ioctls · a27dd2de
      Emanuele Giuseppe Esposito authored
      
      Using the new accel-blocker API, mark where ioctls are being called in
      KVM, as sketched below. Next, we will implement the critical section
      that takes care of performing memslot modifications atomically, thereby
      preventing any new ioctl from running and allowing the running ones to
      finish.
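
      A sketch of what the marking looks like inside QEMU's per-vcpu ioctl
      wrapper (condensed; va_arg handling elided):

          int kvm_vcpu_ioctl(CPUState *cpu, int type, ...)
          {
              int ret;
              void *arg = NULL; /* extracted from the va_list in the real code */

              accel_cpu_ioctl_begin(cpu); /* mark this vCPU as inside an ioctl */
              ret = ioctl(cpu->kvm_fd, type, arg);
              accel_cpu_ioctl_end(cpu);   /* unblock pending inhibitors */

              return ret == -1 ? -errno : ret;
          }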
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Message-Id: <20221111154758.1372674-3-eesposit@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Oct 11, 2022
    • i386: add notify VM exit support · e2e69f6b
      Chenyi Qiang authored
      
      There are cases where a malicious virtual machine can cause the CPU to
      get stuck (because event windows don't open up), e.g., an infinite loop
      in microcode when a nested #AC is hit (CVE-2015-5307). No event window
      means no event (NMI, SMI or IRQ) can be delivered, which leaves the CPU
      unavailable to the host or other VMs. Notify VM exit is introduced to
      mitigate such attacks: it generates a VM exit if no event window occurs
      in VM non-root mode for a specified amount of time (the notify window).
      
      A new KVM capability, KVM_CAP_X86_NOTIFY_VMEXIT, is exposed to user
      space so that the user can query the capability and set the expected
      notify window when creating VMs. The format of the argument when
      enabling this capability is as follows (see the sketch after this
      list):
        Bits 63:32 - notify window specified on the QEMU command line
        Bits 31:0  - flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
                     enable the feature)
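
      A hedged sketch of packing that argument, assuming QEMU's
      kvm_vm_enable_cap() helper and Linux's KVM_X86_NOTIFY_VMEXIT_ENABLED
      flag (the notify_window variable is illustrative):

          uint64_t arg = KVM_X86_NOTIFY_VMEXIT_ENABLED |
                         ((uint64_t)notify_window << 32); /* bits 63:32 */
          ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0, arg);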
      
      Users can configure the feature by a new (x86 only) accel property:
          qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n
      
      The default option of notify-vmexit is run, which enables the
      capability and does nothing when the exit happens. The internal-error
      option raises a KVM internal error when it happens. The disable option
      does not enable the capability. The default value of notify-window is
      0, and it is only valid when notify-vmexit is not disabled. The valid
      range of notify-window is non-negative; it is even safe to set it to
      zero, since an internal hardware threshold is added to ensure no false
      positives.
      
      A notify VM exit may happen with VM_CONTEXT_INVALID set in the exit
      qualification (although no cases are anticipated that would set this
      bit), which means the VM context is corrupted. This is reflected in the
      flags of the KVM_EXIT_NOTIFY exit. If the KVM_NOTIFY_CONTEXT_INVALID
      bit is set, raise a KVM internal error unconditionally, as in the
      sketch below.
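
      A sketch of the corresponding handling in the arch exit handler (the
      notify_vmexit option values shown are illustrative names):

          case KVM_EXIT_NOTIFY:
              if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
                  ret = -1; /* corrupted VM context: internal error, always */
              } else if (notify_vmexit == NOTIFY_VMEXIT_OPTION_INTERNAL_ERROR) {
                  ret = -1; /* user chose to treat any notify exit as fatal */
              } else {
                  ret = 0;  /* "run": resume the guest as if nothing happened */
              }
              break;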
      
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: expose struct KVMState · 5f8a6bce
      Chenyi Qiang authored
      
      Expose struct KVMState outside of kvm-all.c so that its fields can be
      accessed when defining target-specific accelerator properties, as in
      the illustrative sketch below.
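
      Illustrative only: once the struct definition is visible (e.g. via an
      internal header), target code can define accelerator properties that
      touch its fields. The setter below is hypothetical:

          #include "sysemu/kvm_int.h" /* now carries struct KVMState */

          static void kvm_set_notify_window(Object *obj, Visitor *v,
                                            const char *name, void *opaque,
                                            Error **errp)
          {
              KVMState *s = KVM_STATE(obj);
              uint32_t value;

              if (!visit_type_uint32(v, name, &value, errp)) {
                  return;
              }
              s->notify_window = value;
          }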
      
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20220929072014.20705-4-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Sep 18, 2022
    • kvm: fix memory leak on failure to read stats descriptors · 21adec30
      Paolo Bonzini authored
      
      Reported by Coverity as CID 1490142. Since the size is constant and the
      lifetime is the same as that of the StatsDescriptors struct, embed the
      struct directly instead of using a separate allocation, as sketched
      below.
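
      A sketch of the shape of the fix (field names illustrative): the
      constant-size header becomes a member, so it is freed together with the
      descriptor and cannot leak on an error path:

          typedef struct StatsDescriptors {
              const char *ident;
              struct kvm_stats_header kvm_stats_header; /* embedded, not malloc'd */
              struct kvm_stats_desc *kvm_stats_desc;
              QTAILQ_ENTRY(StatsDescriptors) next;
          } StatsDescriptors;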
      
      Suggested-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: use store-release to mark dirty pages as harvested · 52281c6d
      Paolo Bonzini authored
      
      The following scenario can happen if QEMU sets more RESET flags while
      the KVM_RESET_DIRTY_RINGS ioctl is ongoing on another host CPU:
      
          CPU0                     CPU1               CPU2
          ------------------------ ------------------ ------------------------
                                                      fill gfn0
                                                      store-rel flags for gfn0
                                                      fill gfn1
                                                      store-rel flags for gfn1
          load-acq flags for gfn0
          set RESET for gfn0
          load-acq flags for gfn1
          set RESET for gfn1
          do ioctl! ----------->
                                   ioctl(RESET_RINGS)
                                                      fill gfn2
                                                      store-rel flags for gfn2
          load-acq flags for gfn2
          set RESET for gfn2
                                   process gfn0
                                   process gfn1
                                   process gfn2
          do ioctl!
          etc.
      
      The three load-acquires in CPU0 synchronize with the three
      store-releases in CPU2, but CPU0 and CPU1 are only synchronized up to
      gfn1, so CPU1 may miss gfn2's fields other than flags.
      
      The kernel must be able to cope with invalid values of the fields, and
      userspace *will* invoke the ioctl once more.  However, once the RESET flag
      is cleared on gfn2, it is lost forever, therefore in the above scenario
      CPU1 must read the correct value of gfn2's fields.
      
      Therefore, RESET must be set with a store-release that synchronizes
      with KVM's load-acquire in CPU1, as in the sketch below.
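
      A minimal sketch of the fix, using QEMU's qatomic_store_release() and
      the kernel's KVM_DIRTY_GFN_F_RESET flag:

          static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
          {
              /*
               * Release semantics: the kernel's load-acquire of flags now
               * observes every field written to the entry before this store.
               */
              qatomic_store_release(&gfn->flags, KVM_DIRTY_GFN_F_RESET);
          }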
      
      Cc: Gavin Shan <gshan@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>