Commits · 64ebbb7d62cfe3e435b1ce87f839398b75bbc35a · Anton / libtcg

Dec 28, 2021

blockjob: drop BlockJob.blk field · 985cac8f

Vladimir Sementsov-Ogievskiy authored 3 years ago


It's unused now (except for permission handling)[*]. The only reasonable
user of it was block-stream job, recently updated to use own blk. And
other block jobs prefer to use own source node related objects.

So, the arguments of dropping the field are:

 - block jobs prefer not to use it
 - block jobs usually has more then one node to operate on, and better
   to operate symmetrically (for example has both source and target
   blk's in specific block-job state structure)

*: BlockJob.blk is used to keep some permissions. We simply move
permissions to block-job child created in block_job_create() together
with blk.

In mirror, we just should not care anymore about restoring state of
blk. Most probably this code could be dropped long ago, after dropping
bs->job pointer. Now it finally goes away together with BlockJob.blk
itself.

iotest 141 output is updated, as "bdrv_has_blk(bs)" check in
qmp_blockdev_del() doesn't fail (we don't have blk now). Still, new
error message looks even better.

In iotest 283 we need to add a job id, otherwise "Invalid job ID"
happens now earlier than permission check (as permissions moved from
blk to block-job node).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>

985cac8f

block/stream: add own blk · 048954e2

Vladimir Sementsov-Ogievskiy authored 3 years ago


block-stream is the only block-job, that reasonably use BlockJob.blk.
We are going to drop BlockJob.blk soon. So, let block-stream have own
blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>

048954e2

Dec 21, 2021

nbd: allow reconnect on open, with corresponding new options · be16b8bf

Vladimir Sementsov-Ogievskiy authored 3 years ago


It is useful when start of vm and start of nbd server are not
simple to sync.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

be16b8bf

Dec 09, 2021

block/nvme: fix infinite loop in nvme_free_req_queue_cb() · cf4fbc30

Stefan Hajnoczi authored 3 years ago


When the request free list is exhausted the coroutine waits on
q->free_req_queue for the next free request. Whenever a request is
completed a BH is scheduled to invoke nvme_free_req_queue_cb() and wake
up waiting coroutines.

1. nvme_get_free_req() waits for a free request:

    while (q->free_req_head == -1) {
        ...
            trace_nvme_free_req_queue_wait(q->s, q->index);
            qemu_co_queue_wait(&q->free_req_queue, &q->lock);
        ...
    }

2. nvme_free_req_queue_cb() wakes up the coroutine:

    while (qemu_co_enter_next(&q->free_req_queue, &q->lock)) {
       ^--- infinite loop when free_req_head == -1
    }

nvme_free_req_queue_cb() and the coroutine form an infinite loop when
q->free_req_head == -1. Fix this by checking q->free_req_head in
nvme_free_req_queue_cb(). If the free request list is exhausted, don't
wake waiting coroutines. Eventually an in-flight request will complete
and the BH will be scheduled again, guaranteeing forward progress.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20211208152246.244585-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

cf4fbc30

Nov 23, 2021

block/vvfat.c fix leak when failure occurs · 22c36b75

Daniella Lee authored 3 years ago


Function vvfat_open called function enable_write_target and init_directories,
and these functions malloc new memory for BDRVVVFATState::qcow_filename,
BDRVVVFATState::used_clusters, and BDRVVVFATState::cluster_buff.

When the specified folder does not exist ,it may contains memory leak.
After init_directories function is executed, the vvfat_open return -EIO,
and bdrv_open_driver goto label open_failed,
the program use g_free(bs->opaque) to release BDRVVVFATState struct
without members mentioned.

command line:
qemu-system-x86_64 -hdb <vdisk qcow file>  -usb -device usb-storage,drive=fat16
-drive file=fat:rw:fat-type=16:"<path of a host folder does not exist>",
id=fat16,format=raw,if=none

enable_write_target called:
(gdb) bt
    at ../block/vvfat.c:3114
    flags=155650, errp=0x7fffffffd780) at ../block/vvfat.c:1236
    node_name=0x0, options=0x555556fa45d0, open_flags=155650,
    errp=0x7fffffffd890) at ../block.c:1558
    errp=0x7fffffffd890) at ../block.c:1852
    reference=0x0, options=0x555556fa45d0, flags=40962, parent=0x555556f98cd0,
    child_class=0x555556b1d6a0 <child_of_bds>, child_role=19,
    errp=0x7fffffffda90) at ../block.c:3779
    options=0x555556f9cfc0, bdref_key=0x555556239bb8 "file",
    parent=0x555556f98cd0, child_class=0x555556b1d6a0 <child_of_bds>,
    child_role=19, allow_none=true, errp=0x7fffffffda90) at ../block.c:3419
    reference=0x0, options=0x555556f9cfc0, flags=8194, parent=0x0,
    child_class=0x0, child_role=0, errp=0x555556c98c40 <error_fatal>)
    at ../block.c:3726
    options=0x555556f757b0, flags=0, errp=0x555556c98c40 <error_fatal>)
    at ../block.c:3872
    options=0x555556f757b0, flags=0, errp=0x555556c98c40 <error_fatal>)
    at ../block/block-backend.c:436
    bs_opts=0x555556f757b0, errp=0x555556c98c40 <error_fatal>)
    at ../blockdev.c:608
    errp=0x555556c98c40 <error_fatal>) at ../blockdev.c:992
......

Signed-off-by: Daniella Lee <daniellalee111@gmail.com>
Message-Id: <20211119112553.352222-1-daniellalee111@gmail.com>
[hreitz: Took commit message from v1]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

22c36b75

Nov 16, 2021

file-posix: Fix alignment after reopen changing O_DIRECT · 5dbd0ce1

Kevin Wolf authored 3 years ago


At the end of a reopen, we already call bdrv_refresh_limits(), which
should update bs->request_alignment according to the new file
descriptor. However, raw_probe_alignment() relies on s->needs_alignment
and just uses 1 if it isn't set. We neglected to update this field, so
starting with cache=writeback and then reopening with cache=none means
that we get an incorrect bs->request_alignment == 1 and unaligned
requests fail instead of being automatically aligned.

Fix this by recalculating s->needs_alignment in raw_refresh_limits()
before calling raw_probe_alignment().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211104113109.56336-1-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-13-kwolf@redhat.com>
[hreitz: Fix iotest 142 for block sizes greater than 512 by operating on
         a file with a size of 1 MB]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211116101431.105252-1-hreitz@redhat.com>

5dbd0ce1

stream: Traverse graph after modification · 8d3dd037

Hanna Reitz authored 3 years ago


bdrv_cor_filter_drop() modifies the block graph.  That means that other
parties can also modify the block graph before it returns.  Therefore,
we cannot assume that the result of a graph traversal we did before
remains valid afterwards.

We should thus fetch `base` and `unfiltered_base` afterwards instead of
before.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211111120829.81329-2-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211115145409.176785-2-kwolf@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

8d3dd037

Nov 02, 2021

block/nvme: Extract nvme_free_queue() from nvme_free_queue_pair() · a8951438

Philippe Mathieu-Daudé authored 3 years ago


Instead of duplicating code, extract the common helper to free
a single queue.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006164931.172349-4-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

a8951438

block/nvme: Display CQ/SQ pointer in nvme_free_queue_pair() · 53cedeaa

Philippe Mathieu-Daudé authored 3 years ago


For debugging purpose it is helpful to know the CQ/SQ pointers.
We already have a trace event in nvme_free_queue_pair(), extend
it to report these pointer addresses.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006164931.172349-3-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

53cedeaa

block/nvme: Automatically free qemu_memalign() with QEMU_AUTO_VFREE · 4a613bd8

Philippe Mathieu-Daudé authored 3 years ago


Since commit 4d324c0b ("introduce QEMU_AUTO_VFREE") buffers
allocated by qemu_memalign() can automatically freed when using
the QEMU_AUTO_VFREE macro. Use it to simplify a bit.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211006164931.172349-2-philmd@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

4a613bd8

block-backend: Silence clang -m32 compiler warning · 73d4a113

Hanna Reitz authored 3 years ago


Similarly to e7e588d4, there is a
warning in block/block-backend.c that qiov->size <= INT64_MAX is always
true on machines where size_t is narrower than a uint64_t.  In said
commit, we silenced this warning by casting to uint64_t.

The commit introducing this warning here
(a93d81c8) anticipated it and so tried
to address it the same way.  However, it only did so in one of two
places where this comparison occurs, and so we still need to fix up the
other one.

Fixes: a93d81c8
       ("block-backend: convert blk_aio_ functions to int64_t bytes
       paramter")
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211026090745.30800-1-hreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

73d4a113

linux-aio: add `dev_max_batch` parameter to laio_io_unplug() · 68d79466

Stefano Garzarella authored 3 years ago


Between the submission of a request and the unplug, other devices
with larger limits may have been queued new requests without flushing
the batch.

Using the new `dev_max_batch` parameter, laio_io_unplug() can check
if the batch exceeds the device limit to flush the current batch.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20211026162346.253081-4-sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

68d79466

linux-aio: add `dev_max_batch` parameter to laio_co_submit() · 512da211

Stefano Garzarella authored 3 years ago


This new parameter can be used by block devices to limit the
Linux AIO batch size more than the limit set by the AIO context.

file-posix backend supports this, passing its `aio-max-batch` option
previously added.

Add an helper function to calculate the maximum batch size.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20211026162346.253081-3-sgarzare@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

512da211

file-posix: add `aio-max-batch` option · 684960d4

Stefano Garzarella authored 3 years ago


Commit d7ddd0a1 ("linux-aio: limit the batch size using
`aio-max-batch` parameter") added a way to limit the batch size
of Linux AIO backend for the entire AIO context.

The same AIO context can be shared by multiple devices, so
latency-sensitive devices may want to limit the batch size even
more to avoid increasing latency.

For this reason we add the `aio-max-batch` option to the file
backend, which will be used by the next commits to limit the size of
batches including requests generated by this device.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20211026162346.253081-2-sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

684960d4

block/export/fuse.c: fix musl build · 30433203

Fabrice Fontaine authored 3 years ago

Include linux/falloc.h if CONFIG_FALLOCATE_ZERO_RANGE is defined to fix
https://gitlab.com/qemu-project/qemu/-/commit/50482fda98bd62e072c30b7ea73c985c4e9d9bbb
and avoid the following build failure on musl:

../block/export/fuse.c: In function 'fuse_fallocate':
../block/export/fuse.c:643:21: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function)
  643 |     else if (mode & FALLOC_FL_ZERO_RANGE) {
      |                     ^~~~~~~~~~~~~~~~~~~~

Fixes:
 - http://autobuild.buildroot.org/results/be24433a429fda681fb66698160132c1c99bc53b



Fixes: 50482fda ("block/export/fuse.c: fix musl build")
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Message-Id: <20211022095209.1319671-1-fontaine.fabrice@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

30433203

block/rbd: implement bdrv_co_block_status · 0347a8fd

Peter Lieven authored 3 years ago


the qemu rbd driver currently lacks support for bdrv_co_block_status.
This results mainly in incorrect progress during block operations (e.g.
qemu-img convert with an rbd image as source).

This patch utilizes the rbd_diff_iterate2 call from librbd to detect
allocated and unallocated (all zero areas).

To avoid querying the ceph OSDs for the answer this is only done if
the image has the fast-diff feature which depends on the object-map and
exclusive-lock features. In this case it is guaranteed that the information
is present in memory in the librbd client and thus very fast.

If fast-diff is not available all areas are reported to be allocated
which is the current behaviour if bdrv_co_block_status is not implemented.

Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <20211012152231.24868-1-pl@kamp.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

0347a8fd

block/file-posix: Fix return value translation for AIO discards · 13a02833

Ari Sundholm authored 3 years ago


AIO discards regressed as a result of the following commit:
	0dfc7af2 block/file-posix: Optimize for macOS

When trying to run blkdiscard within a Linux guest, the request would
fail, with some errors in dmesg:

---- [ snip ] ----
[    4.010070] sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[    4.011061] sd 2:0:0:0: [sda] tag#0 Sense Key : Aborted Command
[current]
[    4.011061] sd 2:0:0:0: [sda] tag#0 Add. Sense: I/O process
terminated
[    4.011061] sd 2:0:0:0: [sda] tag#0 CDB: Unmap/Read sub-channel 42
00 00 00 00 00 00 00 18 00
[    4.011061] blk_update_request: I/O error, dev sda, sector 0
---- [ snip ] ----

This turns out to be a result of a flaw in changes to the error value
translation logic in handle_aiocb_discard(). The default return value
may be left untranslated in some configurations, and the wrong variable
is used in one translation.

Fix both issues.

Fixes: 0dfc7af2 ("block/file-posix: Optimize for macOS")
Cc: qemu-stable@nongnu.org
Signed-off-by: Ari Sundholm <ari@tuxera.com>
Signed-off-by: Emil Karlson <jkarlson@tuxera.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20211019110954.4170931-1-ari@tuxera.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

13a02833

block/vpc: Add a sanity check that fixed-size images have the right type · 7da9623c

Thomas Huth authored 3 years ago


The code in vpc.c uses BDRVVPCState->footer.type in various places
to decide whether the image is a fixed-size (VHD_FIXED) or a dynamic
(VHD_DYNAMIC) image. However, we never check that this field really
contains VHD_FIXED if we detected a fixed size image in vpc_open(),
so a wrong value here could cause quite some trouble during runtime.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20211012082702.792259-1-thuth@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

7da9623c

vmdk: allow specification of tools version · f3d43dfd

Thomas Weißschuh authored 3 years ago


VMDK files support an attribute that represents the version of the guest
tools that are installed on the disk.
This attribute is used by vSphere before a machine has been started to
determine if the VM has the guest tools installed.
This is important when configuring "Operating system customizations" in
vSphere, as it checks for the presence of the guest tools before
allowing those customizations.
Thus when the VM has not yet booted normally it would be impossible to
customize it, therefore preventing a customized first-boot.

The attribute should not hurt on disks that do not have the guest tools
installed and indeed the VMware tools also unconditionally add this
attribute.
(Defaulting to the value "2147483647", as is done in this patch)

Signed-off-by: Thomas Weißschuh <thomas.weissschuh.ext@zeiss.com>
Message-Id: <20210913130419.13241-1-thomas.weissschuh.ext@zeiss.com>
[hreitz: Added missing '#' in block-core.json]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

f3d43dfd

Oct 15, 2021

block-backend: drop INT_MAX restriction from blk_check_byte_request() · aa78b825

Vladimir Sementsov-Ogievskiy authored 3 years ago


blk_check_bytes_request is called from blk_co_do_preadv,
blk_co_do_pwritev_part, blk_co_do_pdiscard and blk_co_copy_range
before (maybe) calling throttle_group_co_io_limits_intercept() (which
has int64_t argument) and then calling corresponding bdrv_co_ function.
bdrv_co_ functions are OK with int64_t bytes as well.

So dropping the check for INT_MAX we just get same restrictions as in
bdrv_ layer: discard and write-zeroes goes through
bdrv_check_qiov_request() and are allowed to be 64bit. Other requests
go through bdrv_check_request32() and still restricted by INT_MAX
boundary.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-13-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

aa78b825

block-backend: blk_pread, blk_pwrite: rename count parameter to bytes · 14149710

Vladimir Sementsov-Ogievskiy authored 3 years ago


To be consistent with declarations in include/sysemu/block-backend.h.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

14149710

block-backend: convert blk_aio_ functions to int64_t bytes paramter · a93d81c8

Vladimir Sementsov-Ogievskiy authored 3 years ago


1. Convert bytes in BlkAioEmAIOCB:
  aio->bytes is only passed to already int64_t interfaces, and set in
  blk_aio_prwv, which is updated here.

2. For all updated functions the parameter type becomes wider so callers
   are safe.

3. In blk_aio_prwv we only store bytes to BlkAioEmAIOCB, which is
   updated here.

4. Other updated functions are wrappers on blk_aio_prwv.

Note that blk_aio_preadv and blk_aio_pwritev become safer: before this
commit, it's theoretically possible to pass qiov with size exceeding
INT_MAX, which than converted to int argument of blk_aio_prwv. Now it's
converted to int64_t which is a lot better. Still add assertions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak assertion and grammar]
Signed-off-by: Eric Blake <eblake@redhat.com>

a93d81c8

block-backend: convert blk_co_copy_range to int64_t bytes · e192179b

Vladimir Sementsov-Ogievskiy authored 3 years ago


Function is updated so that parameter type becomes wider, so all
callers should be OK with it.

Look at blk_co_copy_range() itself: bytes is passed only to
blk_check_byte_request() and bdrv_co_copy_range(), which already have
int64_t bytes parameter, so we are OK.

Note that requests exceeding INT_MAX are still restricted by
blk_check_byte_request().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-10-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

e192179b

block-backend: convert blk_foo wrappers to use int64_t bytes parameter · 06f0325c

Vladimir Sementsov-Ogievskiy authored 3 years ago


Convert blk_pdiscard, blk_pwrite_compressed, blk_pwrite_zeroes.
These are just wrappers for functions with int64_t argument, so allow
passing int64_t as well. Parameter type becomes wider so all callers
should be OK with it.

Note that requests exceeding INT_MAX are still restricted by
blk_check_byte_request().

Note also that we don't (and are not going to) convert blk_pwrite and
blk_pread: these functions return number of bytes on success, so to
update them, we should change return type to int64_t as well, which
will lead to investigating and updating all callers which is too much.

So, blk_pread and blk_pwrite remain unchanged.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-9-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

06f0325c

block-backend: drop blk_prw, use block-coroutine-wrapper · 16d36e29

Vladimir Sementsov-Ogievskiy authored 3 years ago


Let's drop hand-made coroutine wrappers and use coroutine wrapper
generation like in block/io.c.

Now, blk_foo() functions are written in same way as blk_co_foo() ones,
but wrap blk_do_foo() instead of blk_co_do_foo().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: spelling fix]
Signed-off-by: Eric Blake <eblake@redhat.com>

16d36e29

block-coroutine-wrapper.py: support BlockBackend first argument · 7d55a3bb

Vladimir Sementsov-Ogievskiy authored 3 years ago


Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

7d55a3bb

block-backend: rename _do_ helper functions to _co_do_ · 70e8775e

Vladimir Sementsov-Ogievskiy authored 3 years ago


This is a preparation to the following commit, to use automatic
coroutine wrapper generation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

70e8775e

block-backend: convert blk_co_pdiscard to int64_t bytes · 2800637a

Vladimir Sementsov-Ogievskiy authored 3 years ago


We updated blk_do_pdiscard() and its wrapper blk_co_pdiscard(). Both
functions are updated so that the parameter type becomes wider, so all
callers should be OK with it.

Look at blk_do_pdiscard(): bytes is passed only to
blk_check_byte_request() and bdrv_co_pdiscard(), which already have
int64_t bytes parameter, so we are OK.

Note that requests exceeding INT_MAX are still restricted by
blk_check_byte_request().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

2800637a

block-backend: convert blk_co_pwritev_part to int64_t bytes · 34460feb

Vladimir Sementsov-Ogievskiy authored 3 years ago


We convert blk_do_pwritev_part() and some wrappers:
blk_co_pwritev_part(), blk_co_pwritev(), blk_co_pwrite_zeroes().

All functions are converted so that the parameter type becomes wider, so
all callers should be OK with it.

Look at blk_do_pwritev_part() body:
bytes is passed to:

 - trace_blk_co_pwritev (we update it here)
 - blk_check_byte_request, throttle_group_co_io_limits_intercept,
   bdrv_co_pwritev_part - all already have int64_t argument.

Note that requests exceeding INT_MAX are still restricted by
blk_check_byte_request().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

34460feb

block-backend: make blk_co_preadv() 64bit · 95479077

Vladimir Sementsov-Ogievskiy authored 3 years ago


For both updated functions, the type of bytes becomes wider, so all callers
should be OK with it.

blk_co_preadv() only passes its arguments to blk_do_preadv().

blk_do_preadv() passes bytes to:

 - trace_blk_co_preadv, which is updated too
 - blk_check_byte_request, throttle_group_co_io_limits_intercept,
   bdrv_co_preadv, which are already int64_t.

Note that requests exceeding INT_MAX are still restricted by
blk_check_byte_request().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

95479077

block-backend: blk_check_byte_request(): int64_t bytes · 7242db63

Vladimir Sementsov-Ogievskiy authored 3 years ago


Rename size and make it int64_t to correspond to modern block layer,
which always uses int64_t for offset and bytes (not in blk layer yet,
which is a task for following commits).

All callers pass int or unsigned int.

So, for bytes in [0, INT_MAX] nothing is changed, for negative bytes we
now fail on "bytes < 0" check instead of "bytes > INT_MAX" check.

Note, that blk_check_byte_request() still doesn't allow requests
exceeding INT_MAX.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006131718.214235-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

7242db63

qcow2: Silence clang -m32 compiler warning · e7e588d4

Hanna Reitz authored 3 years ago


With -m32, size_t is generally only a uint32_t.  That makes clang
complain that in the assertion

  assert(qiov->size <= INT64_MAX);

the range of the type of qiov->size (size_t) is too small for any of its
values to ever exceed INT64_MAX.

Cast qiov->size to uint64_t to silence clang.

Fixes: f7ef38dd
       ("block: use int64_t instead of uint64_t in driver read
       handlers")
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20211011155031.149158-1-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

e7e588d4

Oct 14, 2021

configure, meson: move libaio check to meson.build · ff66f3e5

Paolo Bonzini authored 3 years ago


Message-Id: <20211007130829.632254-10-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ff66f3e5

Oct 07, 2021

mirror: Do not clear .cancelled · a640fa0e

Hanna Reitz authored 3 years ago


Clearing .cancelled before leaving the main loop when the job has been
soft-cancelled is no longer necessary since job_is_cancelled() only
returns true for jobs that have been force-cancelled.

Therefore, this only makes a differences in places that call
job_cancel_requested().  In block/mirror.c, this is done only before
.cancelled was cleared.

In job.c, there are two callers:
- job_completed_txn_abort() asserts that .cancelled is true, so keeping
  it true will not affect this place.

- job_complete() refuses to let a job complete that has .cancelled set.
  It is correct to refuse to let the user invoke job-complete on mirror
  jobs that have already been soft-cancelled.

With this change, there are no places that reset .cancelled to false and
so we can be sure that .force_cancel can only be true if .cancelled is
true as well.  Assert this in job_is_cancelled().

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-13-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

a640fa0e

mirror: Stop active mirroring after force-cancel · 9b230ef9

Hanna Reitz authored 3 years ago


Once the mirror job is force-cancelled (job_is_cancelled() is true), we
should not generate new I/O requests.  This applies to active mirroring,
too, so stop it once the job is cancelled.

(We must still forward all I/O requests to the source, though, of
course, but those are not really I/O requests generated by the job, so
this is fine.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-12-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

9b230ef9

mirror: Check job_is_cancelled() earlier · 4feeec7e

Hanna Reitz authored 3 years ago

We must check whether the job is force-cancelled early in our main loop,
most importantly before any `continue` statement.  For example, we used
to have `continue`s before our current checking location that are
triggered by `mirror_flush()` failing.  So, if `mirror_flush()` kept
failing, force-cancelling the job would not terminate it.

Jobs can be cancelled while they yield, and once they are
(force-cancelled), they should not generate new I/O requests.
Therefore, we should put the check after the last yield before
mirror_iteration() is invoked.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462


Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-11-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

4feeec7e

mirror: Use job_is_cancelled() · 20ad4d20

Hanna Reitz authored 3 years ago

mirror_drained_poll() returns true whenever the job is cancelled,
because "we [can] be sure that it won't issue more requests". However,
this is only true for force-cancelled jobs, so use job_is_cancelled().

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-10-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

20ad4d20

job: Add job_cancel_requested() · 08b83bff

Hanna Reitz authored 3 years ago

Most callers of job_is_cancelled() actually want to know whether the job
is on its way to immediate termination.  For example, we refuse to pause
jobs that are cancelled; but this only makes sense for jobs that are
really actually cancelled.

A mirror job that is cancelled during READY with force=false should
absolutely be allowed to pause.  This "cancellation" (which is actually
a kind of completion) may take an indefinite amount of time, and so
should behave like any job during normal operation.  For example, with
on-target-error=stop, the job should stop on write errors.  (In
contrast, force-cancelled jobs should not get write errors, as they
should just terminate and not do further I/O.)

Therefore, redefine job_is_cancelled() to only return true for jobs that
are force-cancelled (which as of HEAD^ means any job that interprets the
cancellation request as a request for immediate termination), and add
job_cancel_requested() as the general variant, which returns true for
any jobs which have been requested to be cancelled, whether it be
immediately or after an arbitrarily long completion phase.

Finally, here is a justification for how different job_is_cancelled()
invocations are treated by this patch:

- block/mirror.c (mirror_run()):
  - The first invocation is a while loop that should loop until the job
    has been cancelled or scheduled for completion.  What kind of cancel
    does not matter, only the fact that the job is supposed to end.

  - The second invocation wants to know whether the job has been
    soft-cancelled.  Calling job_cancel_requested() is a bit too broad,
    but if the job were force-cancelled, we should leave the main loop
    as soon as possible anyway, so this should not matter here.

  - The last two invocations already check force_cancel, so they should
    continue to use job_is_cancelled().

- block/backup.c, block/commit.c, block/stream.c, anything in tests/:
  These jobs know only force-cancel, so there is no difference between
  job_is_cancelled() and job_cancel_requested().  We can continue using
  job_is_cancelled().

- job.c:
  - job_pause_point(), job_yield(), job_sleep_ns(): Only force-cancelled
    jobs should be prevented from being paused.  Continue using job_is_cancelled().

  - job_update_rc(), job_finalize_single(), job_finish_sync(): These
    functions are all called after the job has left its main loop.  The
    mirror job (the only job that can be soft-cancelled) will clear
    .cancelled before leaving the main loop if it has been
    soft-cancelled.  Therefore, these functions will observe .cancelled
    to be true only if the job has been force-cancelled.  We can
    continue to use job_is_cancelled().
    (Furthermore, conceptually, a soft-cancelled mirror job should not
    report to have been cancelled.  It should report completion (see
    also the block-job-cancel QAPI documentation).  Therefore, it makes
    sense for these functions not to distinguish between a
    soft-cancelled mirror job and a job that has completed as normal.)

  - job_completed_txn_abort(): All jobs other than @job have been
    force-cancelled.  job_is_cancelled() must be true for them.
    Regarding @job itself: job_completed_txn_abort() is mostly called
    when the job's return value is not 0.  A soft-cancelled mirror has a
    return value of 0, and so will not end up here then.
    However, job_cancel() invokes job_completed_txn_abort() if the job
    has been deferred to the main loop, which is mostly the case for
    completed jobs (which skip the assertion), but not for sure.
    To be safe, use job_cancel_requested() in this assertion.

  - job_complete(): This is function eventually invoked by the user
    (through qmp_block_job_complete() or qmp_job_complete(), or
    job_complete_sync(), which comes from qemu-img).  The intention here
    is to prevent a user from invoking job-complete after the job has
    been cancelled.  This should also apply to soft cancelling: After a
    mirror job has been soft-cancelled, the user should not be able to
    decide otherwise and have it complete as normal (i.e. pivoting to
    the target).

  - job_cancel(): Both functions are equivalent (see comment there), but
    we want to use job_is_cancelled(), because this shows that we call
    job_completed_txn_abort() only for force-cancelled jobs.  (As
    explained for job_update_rc(), soft-cancelled jobs should be treated
    as if they have completed as normal.)

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462


Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-9-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

08b83bff

jobs: Give Job.force_cancel more meaning · 73895f38

Hanna Reitz authored 3 years ago


We largely have two cancel modes for jobs:

First, there is actual cancelling.  The job is terminated as soon as
possible, without trying to reach a consistent result.

Second, we have mirror in the READY state.  Technically, the job is not
really cancelled, but it just is a different completion mode.  The job
can still run for an indefinite amount of time while it tries to reach a
consistent result.

We want to be able to clearly distinguish which cancel mode a job is in
(when it has been cancelled).  We can use Job.force_cancel for this, but
right now it only reflects cancel requests from the user with
force=true, but clearly, jobs that do not even distinguish between
force=false and force=true are effectively always force-cancelled.

So this patch has Job.force_cancel signify whether the job will
terminate as soon as possible (force_cancel=true) or whether it will
effectively remain running despite being "cancelled"
(force_cancel=false).

To this end, we let jobs that provide JobDriver.cancel() tell the
generic job code whether they will terminate as soon as possible or not,
and for jobs that do not provide that method we assume they will.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20211006151940.214590-7-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

73895f38

job: @force parameter for job_cancel_sync() · 4cfb3f05

Hanna Reitz authored 3 years ago

Callers should be able to specify whether they want job_cancel_sync() to
force-cancel the job or not.

In fact, almost all invocations do not care about consistency of the
result and just want the job to terminate as soon as possible, so they
should pass force=true.  The replication block driver is the exception,
specifically the active commit job it runs.

As for job_cancel_sync_all(), all callers want it to force-cancel all
jobs, because that is the point of it: To cancel all remaining jobs as
quickly as possible (generally on process termination).  So make it
invoke job_cancel_sync() with force=true.

This changes some iotest outputs, because quitting qemu while a mirror
job is active will now lead to it being cancelled instead of completed,
which is what we want.  (Cancelling a READY mirror job with force=false
may take an indefinite amount of time, which we do not want when
quitting.  If users want consistent results, they must have all jobs be
done before they quit qemu.)

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462


Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20211006151940.214590-6-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

4cfb3f05