  1. Jan 14, 2022
    • block: drop BLK_PERM_GRAPH_MOD · 64631f36
      Vladimir Sementsov-Ogievskiy authored
      
      First, this permission never protected a node from being changed, as
      generic child-replacing functions don't check it.
      
      Second, it is a strange permission: it represents a parent node's
      permission to change its child. But in general, children are replaced
      by other mechanisms, such as jobs or QMP commands, not by parent nodes.
      
      The graph-mod permission is hard to understand. All other permissions
      describe operations that a parent node performs on its child: read,
      write, resize. Graph modification operations are something completely
      different.
      
      The only place where BLK_PERM_GRAPH_MOD is used as a "perm" (not a
      shared perm) is mirror_start_job(), for s->target. Still, modern code
      should use bdrv_freeze_backing_chain() to protect against graph
      modification; if we fail to do so somewhere, that may be considered a
      bug. So dropping GRAPH_MOD is a bit risky, and analyzing the possible
      loss of protection is hard. But we should do it some day, so let's do
      it now.
      
      One more bit of information is that locking the corresponding byte in
      file-posix doesn't make sense at all.
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • vvfat: Fix vvfat_write() for writes before the root directory · b9b8860d
      Kevin Wolf authored
      
      The calculation in sector2cluster() is done relative to the offset of
      the root directory. Any writes to blocks before the start of the root
      directory (in particular, writes to the FAT) result in negative values,
      which are not handled correctly in vvfat_write().
      
      This patch changes sector2cluster() to return a signed value and makes
      sure that vvfat_write() doesn't try to find mappings for negative
      cluster numbers. It also restructures the code in vvfat_write() to
      make it more obvious that cluster numbers can be negative.
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-Id: <20211209152231.23756-1-kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • vvfat: Fix size of temporary qcow file · 2db9b9e9
      Kevin Wolf authored
      
      The size of the temporary qcow file was calculated so that only the
      FAT partition would fit on it, but not the whole disk. However,
      offsets relative to the whole disk are used to access it, so increase
      its size to be large enough for that.
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-Id: <20211209151815.23495-1-kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block-backend: prevent dangling BDS pointers across aio_poll() · 1e3552db
      Stefan Hajnoczi authored
      The BlockBackend root child can change when aio_poll() is invoked. This
      happens when a temporary filter node is removed upon blockjob
      completion, for example.
      
      Functions in block/block-backend.c must be aware of this when using a
      blk_bs() pointer across aio_poll() because the BlockDriverState refcnt
      may reach 0, resulting in a stale pointer.
      
      One example is scsi_device_purge_requests(), which calls blk_drain() to
      wait for in-flight requests to cancel. If the backup blockjob is active,
      then the BlockBackend root child is a temporary filter BDS owned by the
      blockjob. The blockjob can complete during bdrv_drained_begin() and the
      last reference to the BDS is released when the temporary filter node is
      removed. This results in a use-after-free when blk_drain() calls
      bdrv_drained_end(bs) on the dangling pointer.
      
      Explicitly hold a reference to bs across block APIs that invoke
      aio_poll().
      
      Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2021778
      Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2036178
      
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-Id: <20220111153613.25453-2-stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • include/sysemu/blockdev.h: remove drive_mark_claimed_by_board and inline drive_def · cc67f28e
      Emanuele Giuseppe Esposito authored
      
      drive_def() is only a particular use case of qemu_opts_parse_noisily(),
      so it can be inlined.
      
      Also remove drive_mark_claimed_by_board(), as it is only declared but
      never implemented (nor used) anywhere.
      
      Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Message-Id: <20211215121140.456939-3-eesposit@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  2. Jan 12, 2022
    • aio-posix: split poll check from ready handler · 826cc324
      Stefan Hajnoczi authored
      
      Adaptive polling measures the execution time of the polling check plus
      handlers called when a polled event becomes ready. Handlers can take a
      significant amount of time, making it look like polling was running for
      a long time when in fact the event handler was running for a long time.
      
      For example, on Linux the io_submit(2) syscall invoked when a virtio-blk
      device's virtqueue becomes ready can take 10s of microseconds. This
      can exceed the default polling interval (32 microseconds) and cause
      adaptive polling to stop polling.
      
      By excluding the handler's execution time from the polling check we make
      the adaptive polling calculation more accurate. As a result, the event
      loop now stays in polling mode where previously it would have fallen
      back to file descriptor monitoring.
      
      The following data was collected with virtio-blk num-queues=2
      event_idx=off using an IOThread. Before:
      
      168k IOPS, IOThread syscalls:
      
        9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0)    = 16
        9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8)                         = 8
        9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8)                         = 8
        9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3
        9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512)                        = 8
        9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512)                        = 8
        9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512)                        = 8
        9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0)    = 32
      
      174k IOPS (+3.6%), IOThread syscalls:
      
        9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0)    = 32
        9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8)                         = 8
        9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8)                         = 8
        9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50)    = 32
      
      Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because
      the IOThread stays in polling mode instead of falling back to file
      descriptor monitoring.
      
      As usual, polling is not implemented on Windows so this patch ignores
      the new io_poll_read() callback in aio-win32.c.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
      Message-id: 20211207132336.36627-2-stefanha@redhat.com
      
      [Fixed up aio_set_event_notifier() calls in
      tests/unit/test-fdmon-epoll.c added after this series was queued.
      --Stefan]
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block/file-posix: Simplify the XFS_IOC_DIOINFO handling · a5730b8b
      Thomas Huth authored
      
      The handling for the XFS_IOC_DIOINFO ioctl is currently quite excessive:
      this is not a "real" feature like the other features that we gate behind
      the "--enable-xxx" and "--disable-xxx" switches of the configure script,
      since it does not influence much code (it's only about one call to
      xfsctl() in file-posix.c), so people don't gain much from the ability to
      disable it with "--disable-xfsctl".
      It's also unfortunate that the ioctl gets disabled on Linux if the user
      did not install the right xfsprogs-devel package before running
      configure. Thus, let's simplify this by providing the ioctl definition
      on our own, so that we can completely get rid of the header dependency
      and the related code in the configure script.
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-Id: <20211215125824.250091-1-thuth@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. Dec 28, 2021
    • blockjob: drop BlockJob.blk field · 985cac8f
      Vladimir Sementsov-Ogievskiy authored
      
      It's unused now (except for permission handling)[*]. The only
      reasonable user of it was the block-stream job, which was recently
      updated to use its own blk, and the other block jobs prefer to use
      objects related to their own source nodes.
      
      So, the arguments for dropping the field are:
      
       - block jobs prefer not to use it
       - block jobs usually have more than one node to operate on, and it is
         better to operate symmetrically (for example, keep both source and
         target blk's in the job-specific state structure)
      
      *: BlockJob.blk is used to keep some permissions. We simply move these
      permissions to the block-job child created in block_job_create()
      together with blk.
      
      In mirror, we simply should not care anymore about restoring the state
      of blk. Most probably this code could have been dropped long ago, after
      the bs->job pointer was removed. Now it finally goes away together with
      BlockJob.blk itself.
      
      The output of iotest 141 is updated, as the "bdrv_has_blk(bs)" check in
      qmp_blockdev_del() no longer fails (we don't have a blk now). Still,
      the new error message looks even better.
      
      In iotest 283 we need to add a job ID, because otherwise "Invalid job
      ID" now happens earlier than the permission check (as the permissions
      moved from the blk to the block-job node).
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
    • block/stream: add own blk · 048954e2
      Vladimir Sementsov-Ogievskiy authored
      
      block-stream is the only block job that makes reasonable use of
      BlockJob.blk. We are going to drop BlockJob.blk soon, so let
      block-stream have its own blk.
      
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
  5. Dec 09, 2021
    • block/nvme: fix infinite loop in nvme_free_req_queue_cb() · cf4fbc30
      Stefan Hajnoczi authored
      
      When the request free list is exhausted the coroutine waits on
      q->free_req_queue for the next free request. Whenever a request is
      completed a BH is scheduled to invoke nvme_free_req_queue_cb() and wake
      up waiting coroutines.
      
      1. nvme_get_free_req() waits for a free request:
      
          while (q->free_req_head == -1) {
              ...
                  trace_nvme_free_req_queue_wait(q->s, q->index);
                  qemu_co_queue_wait(&q->free_req_queue, &q->lock);
              ...
          }
      
      2. nvme_free_req_queue_cb() wakes up the coroutine:
      
          while (qemu_co_enter_next(&q->free_req_queue, &q->lock)) {
             ^--- infinite loop when free_req_head == -1
          }
      
      nvme_free_req_queue_cb() and the coroutine form an infinite loop when
      q->free_req_head == -1. Fix this by checking q->free_req_head in
      nvme_free_req_queue_cb(). If the free request list is exhausted, don't
      wake waiting coroutines. Eventually an in-flight request will complete
      and the BH will be scheduled again, guaranteeing forward progress.
      
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
      Message-id: 20211208152246.244585-1-stefanha@redhat.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  6. Nov 23, 2021
    • block/vvfat.c fix leak when failure occurs · 22c36b75
      Daniella Lee authored
      
      The function vvfat_open() calls enable_write_target() and
      init_directories(), which allocate new memory for
      BDRVVVFATState::qcow_filename, BDRVVVFATState::used_clusters, and
      BDRVVVFATState::cluster_buffer.
      
      When the specified folder does not exist, this can leak memory:
      after init_directories() is executed, vvfat_open() returns -EIO,
      bdrv_open_driver() jumps to the open_failed label, and
      g_free(bs->opaque) releases the BDRVVVFATState struct without freeing
      the members mentioned above.
      
      command line:
      qemu-system-x86_64 -hdb <vdisk qcow file>  -usb -device usb-storage,drive=fat16
      -drive file=fat:rw:fat-type=16:"<path of a host folder does not exist>",
      id=fat16,format=raw,if=none
      
      enable_write_target called:
      (gdb) bt
          at ../block/vvfat.c:3114
          flags=155650, errp=0x7fffffffd780) at ../block/vvfat.c:1236
          node_name=0x0, options=0x555556fa45d0, open_flags=155650,
          errp=0x7fffffffd890) at ../block.c:1558
          errp=0x7fffffffd890) at ../block.c:1852
          reference=0x0, options=0x555556fa45d0, flags=40962, parent=0x555556f98cd0,
          child_class=0x555556b1d6a0 <child_of_bds>, child_role=19,
          errp=0x7fffffffda90) at ../block.c:3779
          options=0x555556f9cfc0, bdref_key=0x555556239bb8 "file",
          parent=0x555556f98cd0, child_class=0x555556b1d6a0 <child_of_bds>,
          child_role=19, allow_none=true, errp=0x7fffffffda90) at ../block.c:3419
          reference=0x0, options=0x555556f9cfc0, flags=8194, parent=0x0,
          child_class=0x0, child_role=0, errp=0x555556c98c40 <error_fatal>)
          at ../block.c:3726
          options=0x555556f757b0, flags=0, errp=0x555556c98c40 <error_fatal>)
          at ../block.c:3872
          options=0x555556f757b0, flags=0, errp=0x555556c98c40 <error_fatal>)
          at ../block/block-backend.c:436
          bs_opts=0x555556f757b0, errp=0x555556c98c40 <error_fatal>)
          at ../blockdev.c:608
          errp=0x555556c98c40 <error_fatal>) at ../blockdev.c:992
      ......
      
      Signed-off-by: Daniella Lee <daniellalee111@gmail.com>
      Message-Id: <20211119112553.352222-1-daniellalee111@gmail.com>
      [hreitz: Took commit message from v1]
      Signed-off-by: Hanna Reitz <hreitz@redhat.com>