  1. Sep 19, 2023
    • memory: avoid updating ioeventfds for some address_space · 544cff46
      hongmianquan authored
      
      When updating ioeventfds, we need to iterate over all address spaces,
      but some address spaces have no listeners that registered
      eventfd_add/eventfd_del callbacks via memory_listener_register(), so
      they do nothing when updating ioeventfds. We can therefore skip these
      address spaces in address_space_update_ioeventfds().
      
      This significantly reduces the overhead of
      memory_region_transaction_commit(). For example, a VM with 8 vhost-net
      devices, each with 64 vectors, spends 20% less time in
      memory_region_transaction_commit().
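
      The idea, as a minimal standalone sketch (struct and field names are
      illustrative, not the actual QEMU definitions):

          /* Skip address spaces whose listeners never registered
           * eventfd_add/eventfd_del callbacks. */
          #include <stdio.h>
          #include <stddef.h>

          struct address_space {
              const char *name;
              unsigned ioeventfd_notifiers; /* bumped when a listener with
                                             * eventfd callbacks registers */
          };

          static void update_ioeventfds(struct address_space *as)
          {
              if (!as->ioeventfd_notifiers) {
                  return; /* nothing would be notified, skip the walk */
              }
              printf("updating ioeventfds for %s\n", as->name);
          }

          int main(void)
          {
              struct address_space spaces[] = {
                  { "memory", 2 },
                  { "io", 0 }, /* no eventfd listeners: skipped */
              };
              for (size_t i = 0; i < sizeof(spaces) / sizeof(spaces[0]); i++) {
                  update_ioeventfds(&spaces[i]);
              }
              return 0;
          }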
      
      Message-ID: <20230830032906.12488-1-hongmianquan@bytedance.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • softmmu/physmem: Hint that "readonly=on,rom=off" exists when opening file R/W for private mapping fails · 6da4b1c2
      David Hildenbrand authored
      
      It's easy to miss that memory-backend-file with "share=off" (the
      default) will always try to open the file R/W by default, and fail if
      we don't have write permissions on the file.

      In that case, the user has to explicitly specify "readonly=on,rom=off"
      to get usable RAM, for example, for VM templating.
      
      Let's hint that '-object memory-backend-file,readonly=on,rom=off,...'
      exists to consume R/O files in a private mapping to create writable RAM,
      but only if we have permissions to open the file read-only.
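
      For example (a hypothetical invocation; "template.img" stands for an
      existing, pre-populated file we can only open read-only):

          $ ./qemu-system-x86_64 \
              -object memory-backend-file,id=ram0,mem-path=template.img,size=1g,share=off,readonly=on,rom=off \
              -machine memory-backend=ram0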
      
      Message-ID: <20230906120503.359863-11-david@redhat.com>
      Suggested-by: ThinerLogoer <logoerthiner1@163.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • softmmu/physmem: Never return directories from file_ram_open() · ca01f1b8
      David Hildenbrand authored
      
      open() does not fail on directories when opening them readonly (O_RDONLY).
      
      Currently, we succeed opening such directories and fail later during
      mmap(), resulting in a misleading error message.
      
      $ ./qemu-system-x86_64 \
          -object memory-backend-file,id=ram0,mem-path=tmp,readonly=true,size=1g
       qemu-system-x86_64: unable to map backing store for guest RAM: No such device
      
      To identify directories and handle them accordingly in file_ram_open(),
      also when readonly=true was specified, detect whether we just opened a
      directory using fstat() instead. Then, fail file_ram_open() right away,
      similarly to how we now fail if the file does not exist and we want to
      open the file read-only.
      
      With this change, we get a nicer error message:
       qemu-system-x86_64: can't open backing store tmp for guest RAM: Is a directory
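
      A standalone sketch of the detection (not the actual file_ram_open()
      code):

          #include <errno.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/stat.h>
          #include <unistd.h>

          /* Refuse directories right after open(), using fstat(). */
          static int open_backing_store(const char *path)
          {
              int fd = open(path, O_RDONLY);
              if (fd < 0) {
                  return -errno;
              }
              struct stat st;
              if (fstat(fd, &st) == 0 && S_ISDIR(st.st_mode)) {
                  close(fd);
                  return -EISDIR; /* "Is a directory" */
              }
              return fd;
          }

          int main(int argc, char **argv)
          {
              int ret = open_backing_store(argc > 1 ? argv[1] : "/tmp");
              if (ret < 0) {
                  fprintf(stderr, "can't open backing store: %s\n",
                          strerror(-ret));
                  return 1;
              }
              close(ret);
              return 0;
          }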
      
      Note that only memory-backend-file will end up calling
      memory_region_init_ram_from_file() -> qemu_ram_alloc_from_file() ->
      file_ram_open().
      
      Message-ID: <20230906120503.359863-8-david@redhat.com>
      Reported-by: Thiner Logoer <logoerthiner1@163.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Tested-by: Mario Casquero <mcasquer@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • softmmu/physmem: Fail creation of new files in file_ram_open() with readonly=true · 4d6b23f7
      David Hildenbrand authored
      
      Currently, if a file does not exist yet, file_ram_open() will create a
      new empty file and open it writable. However, it does that even when
      readonly=true was specified.
      
      Specifying O_RDONLY instead to create a new read-only file would
      theoretically work; however, ftruncate() will refuse to resize the new
      empty file and we'll get a warning:
          ftruncate: Invalid argument
      and, eventually, more problems when actually mmap'ing that file and
      accessing it.
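
      A standalone illustration (not QEMU code) of why that path is a dead
      end; ftruncate() on a descriptor opened O_RDONLY fails with EINVAL:

          #include <errno.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          int main(void)
          {
              /* Create a new, empty file, but open it read-only. */
              int fd = open("/tmp/ro-backing-demo", O_RDONLY | O_CREAT, 0644);
              if (fd < 0) {
                  perror("open");
                  return 1;
              }
              /* Resizing a read-only descriptor is refused. */
              if (ftruncate(fd, 1024 * 1024) < 0) {
                  printf("ftruncate: %s\n", strerror(errno));
              }
              close(fd);
              unlink("/tmp/ro-backing-demo");
              return 0;
          }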
      
      If someone intends to let QEMU open+mmap a file read-only, better
      create+resize+fill that file ahead of time outside of QEMU context.
      
      We'll now fail with:
      ./qemu-system-x86_64 \
          -object memory-backend-file,id=ram0,mem-path=tmp,readonly=true,size=1g
      qemu-system-x86_64: can't open backing store tmp for guest RAM: No such file or directory
      
      All use cases of readonly files (R/O NVDIMMs, VM templating) work on
      existing files, so silently creating new files might just hide user
      errors when accidentally specifying a non-existent file.
      
      Note that only memory-backend-file will end up calling
      memory_region_init_ram_from_file() -> qemu_ram_alloc_from_file() ->
      file_ram_open().
      
      Move error reporting to the single caller.
      
      Message-ID: <20230906120503.359863-7-david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • softmmu/physmem: Bail out early in ram_block_discard_range() with readonly files · b2cccb52
      David Hildenbrand authored
      
      fallocate() will fail; let's print a nicer error message.
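
      For context, a standalone Linux illustration (not QEMU code): punching
      a hole through a read-only descriptor is rejected by the kernel, so
      discarding RAM backed by a readonly file cannot work:

          #define _GNU_SOURCE
          #include <errno.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          int main(int argc, char **argv)
          {
              /* Any existing file we can only read. */
              const char *path = argc > 1 ? argv[1] : "/etc/hostname";
              int fd = open(path, O_RDONLY);
              if (fd < 0) {
                  perror("open");
                  return 1;
              }
              /* Discarding (hole punching) needs a writable descriptor. */
              if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                            0, 4096) < 0) {
                  printf("fallocate: %s\n", strerror(errno));
              }
              close(fd);
              return 0;
          }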
      
      Message-ID: <20230906120503.359863-6-david@redhat.com>
      Suggested-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • softmmu/physmem: Remap with proper protection in qemu_ram_remap() · 9e6b9f37
      David Hildenbrand authored
      
      Let's remap with the proper protection that we can derive from
      RAM_READONLY.
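
      A minimal sketch of the idea (the flag value and helper are
      illustrative, not the actual qemu_ram_remap() code):

          #include <stdio.h>
          #include <sys/mman.h>

          #define RAM_READONLY (1 << 0) /* illustrative flag value */

          /* Derive the mmap protection from the block's flags. */
          static int prot_from_flags(unsigned flags)
          {
              return PROT_READ | ((flags & RAM_READONLY) ? 0 : PROT_WRITE);
          }

          int main(void)
          {
              printf("writable block: prot=%d\n", prot_from_flags(0));
              printf("readonly block: prot=%d\n", prot_from_flags(RAM_READONLY));
              return 0;
          }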
      
      Message-ID: <20230906120503.359863-5-david@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: David Hildenbrand <david@redhat.com>
    • softmmu/physmem: Distinguish between file access mode and mmap protection · 5c52a219
      David Hildenbrand authored
      
      There is a difference between how we open a file and how we mmap it,
      and we want to support writable private mappings of readonly files. Let's
      define RAM_READONLY and RAM_READONLY_FD flags, to replace the single
      "readonly" parameter for file-related functions.
      
      In memory_region_init_ram_from_fd() and memory_region_init_ram_from_file(),
      initialize mr->readonly based on the new RAM_READONLY flag.
      
      While at it, add some RAM_* flags that were missing from the list of
      accepted flags in the documentation of some functions.
      
      No change in functionality intended. We'll make use of both flags next
      and start setting them independently for memory-backend-file.
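
      A sketch of the distinction (helper names and flag values are
      illustrative): RAM_READONLY_FD governs how the file is opened,
      RAM_READONLY governs the mmap protection, so a file opened read-only
      can still back writable RAM via a private (copy-on-write) mapping:

          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/mman.h>

          #define RAM_READONLY    (1 << 0) /* mapping must not be writable */
          #define RAM_READONLY_FD (1 << 1) /* file is opened O_RDONLY */

          static int open_flags(unsigned ram_flags)
          {
              return (ram_flags & RAM_READONLY_FD) ? O_RDONLY : O_RDWR;
          }

          static int mmap_prot(unsigned ram_flags)
          {
              return PROT_READ | ((ram_flags & RAM_READONLY) ? 0 : PROT_WRITE);
          }

          int main(void)
          {
              /* R/O file, but writable guest RAM via MAP_PRIVATE. */
              unsigned flags = RAM_READONLY_FD;
              printf("open: %s, mmap prot: %d (use MAP_PRIVATE)\n",
                     open_flags(flags) == O_RDONLY ? "O_RDONLY" : "O_RDWR",
                     mmap_prot(flags));
              return 0;
          }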
      
      Message-ID: <20230906120503.359863-3-david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
  2. Sep 11, 2023
    • sysemu: Add prepare callback to struct VMChangeStateEntry · 9d3103c8
      Avihai Horon authored
      
      Add prepare callback to struct VMChangeStateEntry.
      
      The prepare callback is optional and can be set by the new function
      qemu_add_vm_change_state_handler_prio_full() that allows setting this
      callback in addition to the main callback.
      
      The prepare callbacks and main callbacks are called in two separate
      phases: First all prepare callbacks are called and only then all main
      callbacks are called.
      
      The purpose of the new prepare callback is to allow all devices to run a
      preliminary task before calling the devices' main callbacks.
      
      This will facilitate adding P2P support for VFIO migration where all
      VFIO devices need to be put in an intermediate P2P quiescent state
      before being stopped or started by the main callback.
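
      The two-phase dispatch, as a self-contained sketch (struct and function
      names are simplified stand-ins for the actual sysemu API):

          #include <stdio.h>

          typedef void (*handler_fn)(void *opaque, int running);

          struct change_state_entry {
              handler_fn prepare; /* optional, may be NULL */
              handler_fn cb;      /* main callback */
              void *opaque;
          };

          /* Phase 1: all prepare callbacks; phase 2: all main callbacks. */
          static void state_change(struct change_state_entry *e, int n,
                                   int running)
          {
              for (int i = 0; i < n; i++) {
                  if (e[i].prepare) {
                      e[i].prepare(e[i].opaque, running);
                  }
              }
              for (int i = 0; i < n; i++) {
                  e[i].cb(e[i].opaque, running);
              }
          }

          static void vfio_prepare(void *opaque, int running)
          {
              printf("%s: enter P2P quiescent state\n", (char *)opaque);
          }

          static void vfio_cb(void *opaque, int running)
          {
              printf("%s: %s device\n", (char *)opaque,
                     running ? "start" : "stop");
          }

          int main(void)
          {
              struct change_state_entry entries[] = {
                  { vfio_prepare, vfio_cb, "vfio0" },
                  { NULL,         vfio_cb, "vfio1" },
              };
              state_change(entries, 2, 0);
              return 0;
          }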
      
      Signed-off-by: Avihai Horon <avihaih@nvidia.com>
      Reviewed-by: Cédric Le Goater <clg@redhat.com>
      Tested-by: YangHang Liu <yanghliu@redhat.com>
      Signed-off-by: Cédric Le Goater <clg@redhat.com>
  3. Jul 07, 2023
    • qemu_cleanup: begin drained section after vm_shutdown() · ca2a5e63
      Fiona Ebner authored
      
      in order to avoid requests being stuck in a BlockBackend's request
      queue during cleanup. Having such requests can lead to a deadlock [0]
      with a virtio-scsi-pci device using iothread that's busy with IO when
      initiating a shutdown with QMP 'quit'.
      
      There is a race where such a queued request can continue at some point
      (maybe after bdrv_child_free()?) during bdrv_root_unref_child() [1].
      The completion will hold the AioContext lock and wait for the BQL
      during SCSI completion, but the main thread will hold the BQL and
      wait for the AioContext as part of bdrv_root_unref_child(), leading to
      the deadlock [0].
      
      [0]:
      
      > Thread 3 (Thread 0x7f3bbd87b700 (LWP 135952) "qemu-system-x86"):
      > #0  __lll_lock_wait (futex=futex@entry=0x564183365f00 <qemu_global_mutex>, private=0) at lowlevellock.c:52
      > #1  0x00007f3bc1c0d843 in __GI___pthread_mutex_lock (mutex=0x564183365f00 <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:80
      > #2  0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x564183365f00 <qemu_global_mutex>, file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../util/qemu-thread-posix.c:94
      > #3  0x000056418247cc2a in qemu_mutex_lock_iothread_impl (file=0x564182b7f774 "../softmmu/physmem.c", line=2593) at ../softmmu/cpus.c:504
      > #4  0x00005641826d5325 in prepare_mmio_access (mr=0x5641856148a0) at ../softmmu/physmem.c:2593
      > #5  0x00005641826d6fe7 in address_space_stl_internal (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0, endian=DEVICE_LITTLE_ENDIAN) at /home/febner/repos/qemu/memory_ldst.c.inc:318
      > #6  0x00005641826d7154 in address_space_stl_le (as=0x56418679b310, addr=4276113408, val=16418, attrs=..., result=0x0) at /home/febner/repos/qemu/memory_ldst.c.inc:357
      > #7  0x0000564182374b07 in pci_msi_trigger (dev=0x56418679b0d0, msg=...) at ../hw/pci/pci.c:359
      > #8  0x000056418237118b in msi_send_message (dev=0x56418679b0d0, msg=...) at ../hw/pci/msi.c:379
      > #9  0x0000564182372c10 in msix_notify (dev=0x56418679b0d0, vector=8) at ../hw/pci/msix.c:542
      > #10 0x000056418243719c in virtio_pci_notify (d=0x56418679b0d0, vector=8) at ../hw/virtio/virtio-pci.c:77
      > #11 0x00005641826933b0 in virtio_notify_vector (vdev=0x5641867a34a0, vector=8) at ../hw/virtio/virtio.c:1985
      > #12 0x00005641826948d6 in virtio_irq (vq=0x5641867ac078) at ../hw/virtio/virtio.c:2461
      > #13 0x0000564182694978 in virtio_notify (vdev=0x5641867a34a0, vq=0x5641867ac078) at ../hw/virtio/virtio.c:2473
      > #14 0x0000564182665b83 in virtio_scsi_complete_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:115
      > #15 0x00005641826670ce in virtio_scsi_complete_cmd_req (req=0x7f3bb000e5d0) at ../hw/scsi/virtio-scsi.c:641
      > #16 0x000056418266736b in virtio_scsi_command_complete (r=0x7f3bb0010560, resid=0) at ../hw/scsi/virtio-scsi.c:712
      > #17 0x000056418239aac6 in scsi_req_complete (req=0x7f3bb0010560, status=2) at ../hw/scsi/scsi-bus.c:1526
      > #18 0x000056418239e090 in scsi_handle_rw_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:242
      > #19 0x000056418239e13f in scsi_disk_req_check_error (r=0x7f3bb0010560, ret=-123, acct_failed=false) at ../hw/scsi/scsi-disk.c:265
      > #20 0x000056418239e482 in scsi_dma_complete_noio (r=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:340
      > #21 0x000056418239e5d9 in scsi_dma_complete (opaque=0x7f3bb0010560, ret=-123) at ../hw/scsi/scsi-disk.c:371
      > #22 0x00005641824809ad in dma_complete (dbs=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:107
      > #23 0x0000564182480a72 in dma_blk_cb (opaque=0x7f3bb000d9d0, ret=-123) at ../softmmu/dma-helpers.c:127
      > #24 0x00005641827bf78a in blk_aio_complete (acb=0x7f3bb00021a0) at ../block/block-backend.c:1563
      > #25 0x00005641827bfa5e in blk_aio_write_entry (opaque=0x7f3bb00021a0) at ../block/block-backend.c:1630
      > #26 0x000056418295638a in coroutine_trampoline (i0=-1342102448, i1=32571) at ../util/coroutine-ucontext.c:177
      > #27 0x00007f3bc0caed40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
      > #28 0x00007f3bbd8757f0 in ?? ()
      > #29 0x0000000000000000 in ?? ()
      >
      > Thread 1 (Thread 0x7f3bbe3e9280 (LWP 135944) "qemu-system-x86"):
      > #0  __lll_lock_wait (futex=futex@entry=0x5641856f2a00, private=0) at lowlevellock.c:52
      > #1  0x00007f3bc1c0d8d1 in __GI___pthread_mutex_lock (mutex=0x5641856f2a00) at ../nptl/pthread_mutex_lock.c:115
      > #2  0x0000564182939f2e in qemu_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
      > #3  0x000056418293a140 in qemu_rec_mutex_lock_impl (mutex=0x5641856f2a00, file=0x564182c0e319 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
      > #4  0x00005641829532d5 in aio_context_acquire (ctx=0x5641856f29a0) at ../util/async.c:728
      > #5  0x000056418279d5df in bdrv_set_aio_context_commit (opaque=0x5641856e6e50) at ../block.c:7493
      > #6  0x000056418294e288 in tran_commit (tran=0x56418630bfe0) at ../util/transactions.c:87
      > #7  0x000056418279d880 in bdrv_try_change_aio_context (bs=0x5641856f7130, ctx=0x56418548f810, ignore_child=0x0, errp=0x0) at ../block.c:7626
      > #8  0x0000564182793f39 in bdrv_root_unref_child (child=0x5641856f47d0) at ../block.c:3242
      > #9  0x00005641827be137 in blk_remove_bs (blk=0x564185709880) at ../block/block-backend.c:914
      > #10 0x00005641827bd689 in blk_remove_all_bs () at ../block/block-backend.c:583
      > #11 0x0000564182798699 in bdrv_close_all () at ../block.c:5117
      > #12 0x000056418248a5b2 in qemu_cleanup () at ../softmmu/runstate.c:821
      > #13 0x0000564182738603 in qemu_default_main () at ../softmmu/main.c:38
      > #14 0x0000564182738631 in main (argc=30, argv=0x7ffd675a8a48) at ../softmmu/main.c:48
      >
      > (gdb) p *((QemuMutex*)0x5641856f2a00)
      > $1 = {lock = {__data = {__lock = 2, __count = 2, __owner = 135952, ...
      > (gdb) p *((QemuMutex*)0x564183365f00)
      > $2 = {lock = {__data = {__lock = 2, __count = 0, __owner = 135944, ...
      
      [1]:
      
      > Thread 1 "qemu-system-x86" hit Breakpoint 5, bdrv_drain_all_end () at ../block/io.c:551
      > #0  bdrv_drain_all_end () at ../block/io.c:551
      > #1  0x00005569810f0376 in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:156
      > #2  0x00005569810bd3e0 in bdrv_replace_child_noperm (child=0x556982e2d7d0, new_bs=0x0) at ../block.c:2897
      > #3  0x00005569810bdef2 in bdrv_root_unref_child (child=0x556982e2d7d0) at ../block.c:3227
      > #4  0x00005569810e8137 in blk_remove_bs (blk=0x556982e42880) at ../block/block-backend.c:914
      > #5  0x00005569810e7689 in blk_remove_all_bs () at ../block/block-backend.c:583
      > #6  0x00005569810c2699 in bdrv_close_all () at ../block.c:5117
      > #7  0x0000556980db45b2 in qemu_cleanup () at ../softmmu/runstate.c:821
      > #8  0x0000556981062603 in qemu_default_main () at ../softmmu/main.c:38
      > #9  0x0000556981062631 in main (argc=30, argv=0x7ffd7a82a418) at ../softmmu/main.c:48
      > [Switching to Thread 0x7fe76dab2700 (LWP 103649)]
      >
      > Thread 3 "qemu-system-x86" hit Breakpoint 4, blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
      > #0  blk_inc_in_flight (blk=0x556982e42880) at ../block/block-backend.c:1505
      > #1  0x00005569810e8f36 in blk_wait_while_drained (blk=0x556982e42880) at ../block/block-backend.c:1312
      > #2  0x00005569810e9231 in blk_co_do_pwritev_part (blk=0x556982e42880, offset=3422961664, bytes=4096, qiov=0x556983028060, qiov_offset=0, flags=0) at ../block/block-backend.c:1402
      > #3  0x00005569810e9a4b in blk_aio_write_entry (opaque=0x556982e2cfa0) at ../block/block-backend.c:1628
      > #4  0x000055698128038a in coroutine_trampoline (i0=-2090057872, i1=21865) at ../util/coroutine-ucontext.c:177
      > #5  0x00007fe770f50d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
      > #6  0x00007ffd7a829570 in ?? ()
      > #7  0x0000000000000000 in ?? ()
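
      The shape of the fix, as a self-contained sketch with stubs standing in
      for the real QEMU calls (the actual change reorders calls inside
      qemu_cleanup()):

          #include <stdio.h>

          /* Stubs standing in for the real functions. */
          static void vm_shutdown(void)          { puts("vm_shutdown"); }
          static void bdrv_drain_all_begin(void) { puts("bdrv_drain_all_begin"); }
          static void bdrv_close_all(void)       { puts("bdrv_close_all"); }

          /* Shut the VM down (flushing device I/O) before entering the
           * drained section, so no request gets queued in a BlockBackend
           * behind the drain. */
          static void qemu_cleanup_sketch(void)
          {
              vm_shutdown();
              bdrv_drain_all_begin();
              bdrv_close_all();
          }

          int main(void)
          {
              qemu_cleanup_sketch();
              return 0;
          }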
      
      Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
      Message-ID: <20230706131418.423713-1-f.ebner@proxmox.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. Jun 13, 2023
    • exec/memory: Introduce RAM_NAMED_FILE flag · b0182e53
      Steve Sistare authored
      
      migrate_ignore_shared() is an optimization that avoids copying memory
      that is visible and can be mapped on the target.  However, a
      memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED
      flag set is not migrated when migrate_ignore_shared() is true.  This is
      wrong, because the block has no named backing store, and its contents will
      be lost.  To fix, ignore shared memory iff it is a named file.  Define a
      new flag RAM_NAMED_FILE to distinguish this case.
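
      The resulting condition, as an illustrative sketch (field names are
      simplified; the real code uses RAMBlock flags and helpers):

          #include <stdbool.h>
          #include <stdio.h>

          struct ram_block {
              const char *id;
              bool shared;     /* RAM_SHARED */
              bool named_file; /* RAM_NAMED_FILE: backed by a named file */
          };

          /* Skip copying a block only if it is shared AND backed by a named
           * file, i.e. the target can map the same contents itself. */
          static bool skip_for_ignore_shared(const struct ram_block *b)
          {
              return b->shared && b->named_file;
          }

          int main(void)
          {
              struct ram_block file_ram  = { "backend-file",  true, true };
              struct ram_block memfd_ram = { "backend-memfd", true, false };
              printf("%s: %s\n", file_ram.id,
                     skip_for_ignore_shared(&file_ram) ? "skip" : "migrate");
              printf("%s: %s\n", memfd_ram.id,
                     skip_for_ignore_shared(&memfd_ram) ? "skip" : "migrate");
              return 0;
          }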
      
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Message-Id: <1686151116-253260-1-git-send-email-steven.sistare@oracle.com>
      Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  5. Jun 06, 2023
    • atomics: eliminate mb_read/mb_set · 06831001
      Paolo Bonzini authored
      
      qatomic_mb_read and qatomic_mb_set were the very first atomic primitives
      introduced for QEMU; their semantics are unclear and they provide a false
      sense of safety.
      
      The last use of qatomic_mb_read() has been removed, so delete it.
      qatomic_mb_set() instead can survive as an optimized
      qatomic_set()+smp_mb(), similar to Linux's smp_store_mb(), but
      rename it to qatomic_set_mb() to match the order of the two
      operations.
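
      The equivalence described above, sketched with the GCC/Clang __atomic
      builtins (illustrative stand-ins, not QEMU's actual macros):

          #include <stdio.h>

          #define qatomic_set(ptr, val) \
              __atomic_store_n((ptr), (val), __ATOMIC_RELAXED)
          #define smp_mb() __atomic_thread_fence(__ATOMIC_SEQ_CST)

          /* qatomic_set_mb(): a store followed by a full barrier, like
           * Linux's smp_store_mb(). */
          #define qatomic_set_mb(ptr, val) \
              do { qatomic_set(ptr, val); smp_mb(); } while (0)

          int main(void)
          {
              int flag = 0;
              qatomic_set_mb(&flag, 1);
              printf("flag = %d\n", flag);
              return 0;
          }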
      
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>