  1. Jul 31, 2023
    • kvm: Fix crash due to access uninitialized kvm_state · fe6bda58
      Gavin Shan authored
      
      QEMU runs into a core dump on arm64; the backtrace extracted from
      the core dump is shown below. The crash is caused by accessing the
      uninitialized @kvm_state in kvm_flush_coalesced_mmio_buffer(), a
      consequence of commit 176d0730 ("hw/arm/virt: Use
      machine_memory_devices_init()"), which adds the machine's memory
      region earlier than before.
      
          main
          qemu_init
          configure_accelerators
          qemu_opts_foreach
          do_configure_accelerator
          accel_init_machine
          kvm_init
          virt_kvm_type
          virt_set_memmap
          machine_memory_devices_init
          memory_region_add_subregion
          memory_region_add_subregion_common
          memory_region_update_container_subregions
          memory_region_transaction_begin
          qemu_flush_coalesced_mmio_buffer
          kvm_flush_coalesced_mmio_buffer
      
      Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the
      uninitialized @kvm_state. With this applied, no crash is observed on
      arm64.
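
      A minimal sketch of the described fix (assuming the field name
      coalesced_flush_in_progress from the surrounding code; this mirrors
      the bail-out described above, not necessarily the exact upstream
      diff):

          void kvm_flush_coalesced_mmio_buffer(void)
          {
              KVMState *s = kvm_state;

              /* The machine's memory region can now be added before
               * kvm_init() has completed, so kvm_state may still be
               * NULL here; bail out early in that case. */
              if (!s || s->coalesced_flush_in_progress) {
                  return;
              }

              /* ... existing flush logic runs only with a valid s ... */
          }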
      
      Fixes: 176d0730 ("hw/arm/virt: Use machine_memory_devices_init()")
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Message-id: 20230731125946.2038742-1-gshan@redhat.com
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
  2. Jun 26, 2023
    • kvm: reuse per-vcpu stats fd to avoid vcpu interruption · 3b6f4852
      Marcelo Tosatti authored
      
      A regression has been detected in latency testing of KVM guests.
      More specifically, it was observed that the cyclictest latency
      numbers inside an isolated vcpu (running on an isolated pcpu)
      exceed the acceptable maximum of 50us.
      
      The implementation of KVM_GET_STATS_FD uses run_on_cpu to query
      per vcpu statistics, which interrupts the vcpu (and is unnecessary).
      
      To fix this, open the per vcpu stats fd on vcpu initialization,
      and read from that fd from QEMU's main thread.
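
      For reference, a hedged standalone illustration of the kernel
      interface relied upon here (not QEMU code; error handling mostly
      trimmed): KVM_GET_STATS_FD returns a file descriptor whose binary
      contents any thread can pread() without interrupting the vcpu.

          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <unistd.h>
          #include <linux/kvm.h>

          int main(void)
          {
              int kvm  = open("/dev/kvm", O_RDWR);
              int vm   = ioctl(kvm, KVM_CREATE_VM, 0);
              int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

              /* Obtain the per-vcpu stats fd once, analogous to what
               * QEMU now does at vcpu initialization. */
              int stats_fd = ioctl(vcpu, KVM_GET_STATS_FD, NULL);

              /* Any thread (e.g. QEMU's main thread) can now read the
               * binary stats header without a run_on_cpu() round trip. */
              struct kvm_stats_header hdr;
              if (pread(stats_fd, &hdr, sizeof(hdr), 0) != sizeof(hdr)) {
                  perror("pread");
                  return 1;
              }
              printf("stat descriptors: %u\n", hdr.num_desc);
              return 0;
          }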
      
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. Jan 11, 2023
    • kvm: Atomic memslot updates · f39b7d2b
      David Hildenbrand authored
      
      If we update an existing memslot (e.g., resize or split), we
      temporarily remove the memslot and re-add it immediately afterwards.
      These updates are not atomic, especially not for KVM vCPU threads,
      so we can get spurious faults.
      
      Let's inhibit most KVM ioctls while performing the relevant updates,
      so that we can perform the update as if it happened atomically,
      without additional kernel support.
      
      We capture the add/del changes and apply them in the notifier commit
      stage instead. There, we can check for overlaps and perform the ioctl
      inhibiting only if really required (-> overlap).
      
      To keep things simple we don't perform additional checks that wouldn't
      actually result in an overlap -- such as !RAM memory regions in some
      cases (see kvm_set_phys_mem()).
      
      To minimize cache-line bouncing, use a separate indicator
      (in_ioctl_lock) per CPU.  Also, make sure to hold the kvm_slots_lock
      while performing both actions (removing+re-adding).
      
      We have to wait until all ioctls have exited and block new ones
      from getting executed.
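
      In rough outline (a sketch: kvm_slots_lock()/unlock() and
      accel_ioctl_inhibit_begin()/end() are the primitives named in this
      series, while KVMChangeList, changes_overlap() and
      apply_memslot_changes() are hypothetical stand-ins for the real
      commit logic):

          static void kvm_commit_memslot_changes(KVMChangeList *adds,
                                                 KVMChangeList *dels)
          {
              bool inhibit = changes_overlap(adds, dels);

              if (inhibit) {
                  /* Waits until all running ioctls have exited and
                   * blocks new ones from starting. */
                  accel_ioctl_inhibit_begin();
              }
              kvm_slots_lock();
              /* Remove and immediately re-add the affected memslots;
               * vcpus cannot observe the intermediate state, so no
               * spurious faults. */
              apply_memslot_changes(dels);
              apply_memslot_changes(adds);
              kvm_slots_unlock();
              if (inhibit) {
                  accel_ioctl_inhibit_end();
              }
          }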
      
      This approach cannot result in a deadlock as long as the inhibitor does
      not hold any locks that might hinder an IOCTL from getting finished and
      exited - something fairly unusual. The inhibitor will always hold the BQL.
      
      AFAIK, one possible candidate would be userfaultfd. If a page cannot be
      placed (e.g., during postcopy), because we're waiting for a lock, or if the
      userfaultfd thread cannot process a fault, because it is waiting for a
      lock, there could be a deadlock. However, the BQL is not applicable here,
      because any other guest memory access while holding the BQL would already
      result in a deadlock.
      
      Nothing else in the kernel should block forever and wait for userspace
      intervention.
      
      Note: pause_all_vcpus()/resume_all_vcpus() or
      start_exclusive()/end_exclusive() cannot be used, as they either drop
      the BQL or require to be called without the BQL - something inhibitors
      cannot handle. We need a low-level locking mechanism that is
      deadlock-free even when not releasing the BQL.
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Message-Id: <20221111154758.1372674-4-eesposit@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: keep track of running ioctls · a27dd2de
      Emanuele Giuseppe Esposito authored
      
      Using the new accel-blocker API, mark where ioctls are called in
      KVM. Next, we will implement the critical section that performs
      memslot modifications atomically, preventing any new ioctl from
      running while allowing the running ones to finish.
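
      The bracketing pattern looks roughly like this (a simplified
      sketch; the real wrappers also handle varargs and tracing):

          int kvm_vcpu_ioctl(CPUState *cpu, int type, void *arg)
          {
              int ret;

              accel_cpu_ioctl_begin(cpu);  /* mark this ioctl as running */
              ret = ioctl(cpu->kvm_fd, type, arg);
              accel_cpu_ioctl_end(cpu);    /* let inhibitors proceed */

              return ret;
          }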
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Message-Id: <20221111154758.1372674-3-eesposit@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. Oct 11, 2022
    • i386: add notify VM exit support · e2e69f6b
      Chenyi Qiang authored
      
      There are cases where a malicious virtual machine can cause the CPU
      to get stuck (because event windows do not open up), e.g., an
      infinite loop in microcode when a nested #AC is triggered
      (CVE-2015-5307). No event window means no event (NMI, SMI or IRQ)
      can be delivered, leaving the CPU unavailable to the host or other
      VMs. Notify VM exit is introduced to mitigate such attacks: it
      generates a VM exit if no event window occurs in VM non-root mode
      for a specified amount of time (the notify window).
      
      A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
      so that the user can query the capability and set the expected notify
      window when creating VMs. The format of the argument when enabling this
      capability is as follows:
        Bit 63:32 - notify window specified on the QEMU command line
        Bit 31:0  - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to
                    enable the feature.)
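
      A hedged sketch of enabling the capability with this layout
      (kvm_vm_enable_cap is QEMU's helper; the exact flag plumbing in
      the commit may differ):

          /* High 32 bits carry the notify window, low 32 bits the
           * flags. */
          uint64_t arg = ((uint64_t)notify_window << 32) |
                         KVM_X86_NOTIFY_VMEXIT_ENABLED;

          kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0, arg);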
      
      Users can configure the feature by a new (x86 only) accel property:
          qemu -accel kvm,notify-vmexit=run|internal-error|disable,notify-window=n
      
      The default option of notify-vmexit is run, which enables the
      capability and does nothing if the exit happens. The internal-error
      option raises a KVM internal error if it happens. The disable option
      does not enable the capability. The default value of notify-window
      is 0; it is valid only when notify-vmexit is not disabled. The valid
      range of notify-window is non-negative. It is even safe to set it to
      zero, since an internal hardware threshold is added to ensure no
      false positives.
      
      A notify VM exit may happen with VM_CONTEXT_INVALID set in the exit
      qualification (no cases are anticipated that would set this bit),
      which means the VM context is corrupted. This is reflected in the
      flags of the KVM_EXIT_NOTIFY exit; if the KVM_NOTIFY_CONTEXT_INVALID
      bit is set, raise a KVM internal error unconditionally.
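
      In the exit handler this translates, roughly (a hedged sketch, not
      the exact upstream code; the notify-vmexit option names are
      illustrative), to:

          case KVM_EXIT_NOTIFY:
              if (run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID) {
                  /* VM context corrupted: always an internal error. */
                  ret = -1;
              } else if (notify_vmexit == NOTIFY_VMEXIT_INTERNAL_ERROR) {
                  ret = -1;   /* user asked for internal-error */
              } else {
                  ret = 0;    /* "run": just resume the guest */
              }
              break;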
      
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>