  1. Jan 26, 2024
    • block/blklogwrites: Fix a bug when logging "write zeroes" operations. · cf709665
      Ari Sundholm authored
      
      There is a bug in the blklogwrites driver pertaining to logging "write
      zeroes" operations, causing log corruption. This can be easily observed
      by setting detect-zeroes to something other than "off" for the driver.
      
      The issue is caused by a concurrency bug pertaining to the fact that
      "write zeroes" operations have to be logged in two parts: first the log
      entry metadata, then the zeroed-out region. While the log entry
      metadata is being written by bdrv_co_pwritev(), another operation may
      begin in the meantime and modify the state of the blklogwrites driver.
      This is as intended by the coroutine-driven I/O model in QEMU, of
      course.
      
      Unfortunately, this specific scenario is mishandled. A short example:
          1. Initially, in the current operation (#1), the current log sector
      number in the driver state is only incremented by the number of sectors
      taken by the log entry metadata, after which the log entry metadata is
      written. The current operation yields.
          2. Another operation (#2) may start while the log entry metadata is
      being written. It uses the current log position as the start offset for
      its log entry. This is in the sector right after the operation #1 log
      entry metadata, which is bad!
          3. After bdrv_co_pwritev() returns (#1), the current log sector
      number is reread from the driver state in order to find out the start
      offset for bdrv_co_pwrite_zeroes(). This is an obvious blunder, as the
      offset will be the sector right after the (misplaced) operation #2 log
      entry, which means that the zeroed-out region begins at the wrong
      offset.
          4. As a result of the above, the log is corrupt.
      
      Fix this by only reading the driver metadata once, computing the
      offsets and sizes in one go (including the optional zeroed-out region)
      and setting the log sector number to the appropriate value for the next
      operation in line.
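      The fixed pattern can be sketched roughly as follows. This is only an
      illustration of the idea described above, not the actual
      block/blklogwrites.c code: the function name and the
      write_entry_metadata()/write_zeroed_region() helpers are placeholders,
      while the state fields mirror the driver's log cursor bookkeeping.
      
      static int coroutine_fn
      blk_log_write_zeroes_sketch(BDRVBlkLogWritesState *s,
                                  uint64_t entry_sectors, uint64_t zero_sectors)
      {
          /* Read the shared cursor exactly once, before any yield point. */
          uint64_t entry_sector = s->cur_log_sector;
          uint64_t zero_sector = entry_sector + entry_sectors;
          int ret;
      
          /*
           * Advance the cursor past both the metadata and the zeroed-out
           * region up front, so that a request starting while we are yielded
           * in bdrv_co_pwritev() sees a consistent next log position.
           */
          s->cur_log_sector = zero_sector + zero_sectors;
          s->nr_entries++;
      
          ret = write_entry_metadata(s, entry_sector);    /* may yield */
          if (ret < 0) {
              return ret;
          }
          return write_zeroed_region(s, zero_sector);     /* may yield */
      }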
      
      Signed-off-by: Ari Sundholm <ari@tuxera.com>
      Cc: qemu-stable@nongnu.org
      Message-ID: <20240109184646.1128475-1-megari@gmx.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      (cherry picked from commit a9c8ea95470c27a8a02062b67f9fa6940e828ab6)
      Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
  2. Jan 25, 2024
    • block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status · 99dd4a15
      Fiona Ebner authored
      
      Using fleecing backup like in [0] on a qcow2 image (with metadata
      preallocation) can lead to the following assertion failure:
      
      > bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.
      
      In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
      will be set by the qcow2 driver, so the caller will recursively check
      the file child. Then the BDRV_BLOCK_ZERO flag gets set too. Later up the call
      chain, in bdrv_co_do_block_status() for the snapshot-access driver,
      the assertion failure will happen, because both flags are set.
      
      To fix it, clear the recurse flag after the recursive check was done.
      
      In detail:
      
      > #0  qcow2_co_block_status
      
      Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
      BDRV_BLOCK_OFFSET_VALID.
      
      > #1  bdrv_co_do_block_status
      
      Because of the data flag, bdrv_co_do_block_status() will now also set
      BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
      bdrv_co_do_block_status() for the bdrv_file child will be called,
      which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
      BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.
      
      Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
      BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
      
      > #2  bdrv_co_common_block_status_above
      > #3  bdrv_co_block_status_above
      > #4  bdrv_co_block_status
      > #5  cbw_co_snapshot_block_status
      > #6  bdrv_co_snapshot_block_status
      > #7  snapshot_access_co_block_status
      > #8  bdrv_co_do_block_status
      
      Return value is propagated all the way up to here, where the assertion
      failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
      both set.
      
      > #9  bdrv_co_common_block_status_above
      > #10 bdrv_co_block_status_above
      > #11 block_copy_block_status
      > #12 block_copy_dirty_clusters
      > #13 block_copy_common
      > #14 block_copy_async_co_entry
      > #15 coroutine_trampoline
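      The flag arithmetic in this walkthrough can be checked with a small
      standalone program. The numeric values below mirror QEMU's
      BDRV_BLOCK_* definitions (they reproduce the 0x45, 0x16 and 0x57
      results quoted above); the merging logic is a simplification of what
      bdrv_co_do_block_status() does, not the actual code:
      
      #include <assert.h>
      #include <stdio.h>
      
      #define BDRV_BLOCK_DATA          0x01
      #define BDRV_BLOCK_ZERO          0x02
      #define BDRV_BLOCK_OFFSET_VALID  0x04
      #define BDRV_BLOCK_ALLOCATED     0x10
      #define BDRV_BLOCK_RECURSE       0x40
      
      int main(void)
      {
          /* #0: what qcow2_co_block_status() reports */
          int ret = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
          assert(ret == 0x45);
      
          /* #1: DATA implies ALLOCATED; then the file child is queried */
          ret |= BDRV_BLOCK_ALLOCATED;
          int child = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ZERO;
          assert(child == 0x16);
          ret |= child & BDRV_BLOCK_ZERO;
          assert(ret == 0x57);    /* RECURSE and ZERO both set: trips the assertion above */
      
          /* The fix: drop RECURSE once the recursive check has been folded in. */
          ret &= ~BDRV_BLOCK_RECURSE;
          printf("status after fix: 0x%x\n", ret);    /* prints 0x17 */
          return 0;
      }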
      
      [0]:
      
      > #!/bin/bash
      > rm /tmp/disk.qcow2
      > ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
      > ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
      > ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
      > ./qemu-system-x86_64 --qmp stdio \
      > --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
      > --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
      > --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
      > <<EOF
      > {"execute": "qmp_capabilities"}
      > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
      > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
      > {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
      > EOF
      
      Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
      Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
      Message-id: 20240116154839.401030-1-f.ebner@proxmox.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      (cherry picked from commit 8a9be7992426c8920d4178e7dca59306a18c7a3a)
      Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
  3. Dec 22, 2023
    • block: Fix crash when loading snapshot on inactive node · e2e01b3a
      Kevin Wolf authored
      
      bdrv_is_read_only() only checks if the node is configured to be
      read-only eventually, but even if it returns false, writing to the node
      may not be permitted at the moment (because it's inactive).
      
      bdrv_is_writable() checks that the node can be written to right now, and
      this is what the snapshot operations really need.
      
      Change bdrv_can_snapshot() to use bdrv_is_writable() to fix crashes like
      the following:
      
      $ ./qemu-system-x86_64 -hda /tmp/test.qcow2 -loadvm foo -incoming defer
      qemu-system-x86_64: ../block/io.c:1990: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
      
      The resulting error message after this patch isn't perfect yet, but at
      least it doesn't crash any more:
      
      $ ./qemu-system-x86_64 -hda /tmp/test.qcow2 -loadvm foo -incoming defer
      qemu-system-x86_64: Device 'ide0-hd0' is writable but does not support snapshots
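      The change itself boils down to using the stricter predicate in
      bdrv_can_snapshot(). A paraphrased sketch of the check (not the
      verbatim patch; the exact surrounding condition is an assumption):
      
      /* before: only the configured read-only flag is consulted */
      if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
          return false;
      }
      
      /* after: also reject nodes that cannot be written to right now,
       * e.g. because they are still inactive */
      if (!drv || !bdrv_is_inserted(bs) || !bdrv_is_writable(bs)) {
          return false;
      }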
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20231201142520.32255-2-kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      (cherry picked from commit d3007d348adaaf04ee8b099a475282034a662414)
      Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
  4. Nov 28, 2023
    • export/vhost-user-blk: Fix consecutive drains · 411132c9
      Kevin Wolf authored
      The vhost-user-blk export implements AioContext switches in its drain
      implementation. This means that on drain_begin, it detaches the server
      from its AioContext and on drain_end, attaches it again and schedules
      the server->co_trip coroutine in the updated AioContext.
      
      However, nothing guarantees that server->co_trip is even safe to be
      scheduled. Not only is it unclear that the coroutine is actually in a
      state where it can be reentered externally without causing problems, but
      with two consecutive drains, it is possible that the scheduled coroutine
      didn't have a chance yet to run and trying to schedule an already
      scheduled coroutine a second time crashes with an assertion failure.
      
      Following the model of NBD, this commit makes the vhost-user-blk export
      shut down server->co_trip during drain so that resuming the export means
      creating and scheduling a new coroutine, which is always safe.
      
      There is one exception: If the drain call didn't poll (for example, this
      happens in the context of bdrv_graph_wrlock()), then the coroutine
      didn't have a chance to shut down. However, in this case the AioContext
      can't have changed; changing the AioContext always involves a polling
      drain. So in this case we can simply assert that the AioContext is
      unchanged and just leave the coroutine running or wake it up if it has
      yielded to wait for the AioContext to be attached again.
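      As a rough illustration of the resume logic described above (heavily
      simplified; the handler name and the co_trip_yielded flag are made up,
      and this is not the actual util/vhost-user-server.c code):
      
      static void vu_blk_drained_end_sketch(VuServer *server)
      {
          if (!server->co_trip) {
              /*
               * The drain polled, so the old coroutine could shut down.
               * Resuming means creating and scheduling a fresh coroutine,
               * which is always safe, even in a new AioContext.
               */
              server->co_trip = qemu_coroutine_create(vu_client_trip, server);
              aio_co_schedule(server->ctx, server->co_trip);
          } else {
              /*
               * The drain didn't poll (e.g. inside bdrv_graph_wrlock()), so
               * the coroutine is still there. The AioContext cannot have
               * changed in that case, because changing it always involves a
               * polling drain. Leave the coroutine running, or wake it if it
               * yielded while waiting to be attached again.
               */
              assert(server->ctx == qemu_get_current_aio_context());
              if (server->co_trip_yielded) {    /* hypothetical flag */
                  aio_co_wake(server->co_trip);
              }
          }
      }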
      
      Fixes: e1054cd4
      Fixes: https://issues.redhat.com/browse/RHEL-1708
      
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20231127115755.22846-1-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • vmdk: Don't corrupt desc file in vmdk_write_cid · 9fb7b350
      Fam Zheng authored
      If the text description file is larger than DESC_SIZE, we force the last
      byte in the buffer to be 0 and write it out.
      
      This corrupts the descriptor file.
      
      Try to allocate a big buffer in this case.
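      The idea can be sketched as follows (illustrative only, with simplified
      error handling; this is not the actual block/vmdk.c change, and
      everything except bdrv_getlength(), bdrv_co_pread() and DESC_SIZE is an
      assumption):
      
      int64_t desc_len = bdrv_getlength(bs->file->bs);
      if (desc_len < 0) {
          return desc_len;
      }
      
      /* Size the buffer to the real descriptor file instead of truncating
       * it to DESC_SIZE; the +1 leaves room for a NUL terminator. */
      size_t buf_size = MAX(desc_len + 1, DESC_SIZE);
      char *desc = g_malloc0(buf_size);
      
      int ret = bdrv_co_pread(bs->file, 0, desc_len, desc, 0);
      if (ret < 0) {
          g_free(desc);
          return ret;
      }
      /* desc is now NUL-terminated without overwriting descriptor bytes */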
      
      Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1923
      
      
      
      Signed-off-by: Fam Zheng <fam@euphon.net>
      Message-ID: <20231124115654.3239137-1-fam@euphon.net>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  5. Nov 21, 2023
    • stream: Fix AioContext locking during bdrv_graph_wrlock() · 1dbc7d34
      Kevin Wolf authored
      
      In stream_prepare(), we need to temporarily drop the AioContext lock
      that job_prepare_locked() took for us while calling the graph write lock
      functions which can poll.
      
      All block nodes related to this block job are in the same AioContext, so
      we can pass any of them to bdrv_graph_wrlock()/bdrv_graph_wrunlock().
      Unfortunately, the one that we picked is base, which can be NULL - and
      in this case the AioContext lock is not released and deadlocks can
      occur.
      
      Fix this by passing s->target_bs, which is never NULL.
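      In stream_prepare() this amounts to something like the following
      (paraphrased sketch, not the verbatim hunk):
      
      /* base may be NULL, so use a node that always exists for dropping and
       * reacquiring the AioContext lock around the polling calls: */
      bdrv_graph_wrlock(s->target_bs);
      /* ... graph changes (backing file update, node removal) ... */
      bdrv_graph_wrunlock(s->target_bs);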
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20231115172012.112727-4-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Fix deadlocks in bdrv_graph_wrunlock() · 6bc0bcc8
      Kevin Wolf authored
      
      bdrv_graph_wrunlock() calls aio_poll(), which may run callbacks that
      have a nested event loop. Nested event loops can depend on other
      iothreads making progress, so in order to allow them to make progress it
      must not hold the AioContext lock of another thread while calling
      aio_poll().
      
      This introduces a @bs parameter to bdrv_graph_wrunlock() whose
      AioContext is temporarily dropped (which matches bdrv_graph_wrlock()),
      and a bdrv_graph_wrunlock_ctx() that can be used if the BlockDriverState
      doesn't necessarily exist any more when unlocking.
      
      This also requires a change to bdrv_schedule_unref(), which was relying
      on the incorrectly taken lock. It needs to take the lock itself now.
      While this is a separate bug, it can't be fixed in a separate patch because
      otherwise the intermediate state would either deadlock or try to release
      a lock that we don't even hold.
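      The resulting interface is roughly the following (paraphrased, not the
      verbatim header declarations):
      
      /* Temporarily drops the AioContext lock of @bs around the internal
       * aio_poll(), matching bdrv_graph_wrlock(): */
      void bdrv_graph_wrunlock(BlockDriverState *bs);
      
      /* Same, for callers where the BlockDriverState may no longer exist
       * and only its former AioContext is known: */
      void bdrv_graph_wrunlock_ctx(AioContext *ctx);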
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20231115172012.112727-3-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      [kwolf: Fixed up bdrv_schedule_unref()]
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Fix bdrv_graph_wrlock() call in blk_remove_bs() · bb092d6d
      Kevin Wolf authored
      While not all callers of blk_remove_bs() are correct in this respect,
      the assumption in the function is that callers hold the AioContext lock
      of the BlockBackend (this is required by the drain calls in it).
      
      In order to avoid deadlock in the nested event loop, bdrv_graph_wrlock()
      then has to be called with the root BlockDriverState as its parameter
      instead of NULL, so that this AioContext lock is temporarily dropped.
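      Concretely, the locking call in blk_remove_bs() now receives the root
      node (paraphrased sketch, not the verbatim hunk; blk_bs() is the real
      accessor for a BlockBackend's root BlockDriverState):
      
      /* Pass the root node instead of NULL so that its AioContext lock is
       * temporarily dropped while bdrv_graph_wrlock() polls: */
      bdrv_graph_wrlock(blk_bs(blk));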
      
      Fixes: https://issues.redhat.com/browse/RHEL-1761
      
      
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20231115172012.112727-2-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  6. Nov 13, 2023
  7. Nov 08, 2023
  8. Nov 07, 2023
  9. Nov 06, 2023
  10. Nov 03, 2023
  11. Nov 01, 2023
    • cpr: relax blockdev migration blockers · e0ee3a8f
      Steve Sistare authored
      
      Some blockdevs block migration because they do not support sharing across
      hosts and/or do not support dirty bitmaps.  These prohibitions do not apply
      if the old and new qemu processes do not run concurrently, and if new qemu
      starts on the same host as old, which is the case for cpr.  Narrow the scope
      of these blockers so they only apply to normal mode.  They will not block
      cpr modes when they are added in subsequent patches.
      
      No functional change until a new mode is added.
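      In code terms this means switching the affected block drivers from the
      generic blocker call to a normal-mode-only variant (sketch; the
      migrate_add_blocker_normal() name is taken from the related migration
      series and is an assumption here, and the call site is paraphrased):
      
      /* before: blocks migration in every mode */
      migrate_add_blocker(&s->migration_blocker, errp);
      
      /* after: only blocks normal-mode migration, leaving cpr modes free */
      migrate_add_blocker_normal(&s->migration_blocker, errp);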
      
      Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
      Reviewed-by: Juan Quintela <quintela@redhat.com>
      Signed-off-by: Juan Quintela <quintela@redhat.com>
      Message-ID: <1698263069-406971-4-git-send-email-steven.sistare@oracle.com>
  12. Oct 31, 2023