- May 12, 2022
-
-
Paolo Bonzini authored
qemu_co_queue_restart_all is basically the same as qemu_co_enter_all but without a QemuLockable argument. That's perfectly fine, but only as long as the function is marked coroutine_fn. If used outside coroutine context, qemu_co_queue_wait will attempt to take the lock and that is just broken: if you are calling qemu_co_queue_restart_all outside coroutine context, the lock is going to be a QemuMutex which cannot be taken twice by the same thread. The patch adds the marker to qemu_co_queue_restart_all and to its sole non-coroutine_fn caller; it then reimplements the function in terms of qemu_co_enter_all_impl, to remove duplicated code and to clarify that the latter also works in coroutine context. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Message-Id: <20220427130830.150180-4-pbonzini@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- May 11, 2022
-
-
Markus Armbruster authored
Leading underscores are ill-advised because such identifiers are reserved. Trailing underscores are merely ugly. Strip both. Our header guards commonly end in _H. Normalize the exceptions. Macros should be ALL_CAPS. Normalize the exception. Done with scripts/clean-header-guards.pl. include/hw/xen/interface/ and tools/virtiofsd/ left alone, because these were imported from Xen and libfuse respectively. Signed-off-by:
Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-3-armbru@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org>
-
Markus Armbruster authored
Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by:
Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]
-
- May 04, 2022
-
-
Hanna Reitz authored
VMDK disk data is stored in extents, which may or may not be separate from bs->file. VmdkExtent.file points to where they are stored. Each that is stored in bs->file will simply reuse the exact pointer value of bs->file. (That is why vmdk_free_extents() will unref VmdkExtent.file (e->file) only if e->file != bs->file.) Reopen operations can change bs->file (they will replace the whole BdrvChild object, not just the BDS stored in that BdrvChild), and then we will need to change all .file pointers of all such VmdkExtents to point to the new BdrvChild. In vmdk_reopen_prepare(), we have to check which VmdkExtents are affected, and in vmdk_reopen_commit(), we can modify them. We have to split this because: - The new BdrvChild is created only after prepare, so we can change VmdkExtent.file only in commit - In commit, there no longer is any (valid) reference to the old BdrvChild object, so there would be nothing to compare VmdkExtent.file against to see whether it was equal to bs->file before reopening (There is BDRVReopenState.old_file_bs, but the old bs->file BdrvChild's .bs pointer will be NULL-ed when the new BdrvChild is created, and so we cannot compare VmdkExtent.file->bs against BDRVReopenState.old_file_bs) Signed-off-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220314162719.65384-2-hreitz@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Hanna Reitz authored
qcow2_co_invalidate_cache() closes and opens the qcow2 file, by calling qcow2_close() and qcow2_do_open(). These two functions must thus be usable from both a global-state and an I/O context. As they are, they are not safe to call in an I/O context, because they use bdrv_unref_child() and bdrv_open_child() to close/open the data_file child, respectively, both of which are global-state functions. When used from qcow2_co_invalidate_cache(), we do not need to close/open the data_file child, though (we do not do this for bs->file or bs->backing either), and so we should skip it in the qcow2_co_invalidate_cache() path. To do so, add a parameter to qcow2_do_open() and qcow2_close() to make them skip handling s->data_file, and have qcow2_co_invalidate_cache() exempt it from the memset() on the BDRVQcow2State. (Note that the QED driver similarly closes/opens the QED image by invoking bdrv_qed_close()+bdrv_qed_do_open(), but both functions seem safe to use in an I/O context.) Fixes: https://gitlab.com/qemu-project/qemu/-/issues/945 Signed-off-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220427114057.36651-3-hreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- May 03, 2022
-
-
Marc-André Lureau authored
It is only used by block/file-posix.c, move it there. Signed-off-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org>
-
- Apr 26, 2022
-
-
Paolo Bonzini authored
Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-10-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
requests[].receiving is set by nbd_receive_replies() under the receive_mutex; Read it under the same mutex as well. Waking up receivers on errors happens after each reply finishes processing, in nbd_co_receive_one_chunk(). If there is no currently-active reply, there are two cases: * either there is no active request at all, in which case no element of request[] can have .receiving = true * or nbd_receive_replies() must be running and owns receive_mutex; in that case it will get back to nbd_co_receive_one_chunk() because the socket has been shutdown, and all waiting coroutines will wake up in turn. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-9-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
Remove the confusing, and most likely wrong, atomics. The only function that used to be somewhat in a hot path was nbd_client_connected(), but it is not anymore after the previous patches. The same logic is used both to check if a request had to be reissued and also in nbd_reconnecting_attempt(). The former cases are outside requests_lock, while nbd_reconnecting_attempt() does have the lock, therefore the two have been separated in the previous commit. nbd_client_will_reconnect() can simply take s->requests_lock, while nbd_reconnecting_attempt() can inline the access now that no complicated atomics are involved. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-8-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
Prepare for the next patch, so that the diff is less confusing. nbd_client_connecting is moved closer to the definition point. nbd_client_connecting_wait() is kept only for the reconnection logic; when it is used to check if a request has to be reissued, use the renamed function nbd_client_will_reconnect(). In the next patch, the two cases will have different locking requirements. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-7-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
The condition for waiting on the s->free_sema queue depends on both s->in_flight and s->state. The latter is currently using atomics, but this is quite dubious and probably wrong. Because s->state is written in the main thread too, for example by the yank callback, it cannot be protected by a CoMutex. Introduce a separate lock that can be used by nbd_co_send_request(); later on this lock will also be used for s->state. There will not be any contention on the lock unless there is a yank or reconnect, so this is not performance sensitive. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-6-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
Elevate s->in_flight early so that other incoming requests will wait on the CoQueue in nbd_co_send_request; restart them after getting back from nbd_reconnect_attempt. This could be after the reconnect timer or nbd_cancel_in_flight have cancelled the attempt, so there is no need anymore to cancel the requests there. nbd_co_send_request now handles both stopping and restarting pending requests after a successful connection, and there is no need to hold send_mutex in nbd_co_do_establish_connection. The current setup is confusing because nbd_co_do_establish_connection is called both with send_mutex taken and without it. Before the patch it uses free_sema which (at least in theory...) is protected by send_mutex, after the patch it does not anymore. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-5-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> [eblake: wrap long line] Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
It is unnecessary to check nbd_client_connected() because every time s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down and all coroutines are resumed. The only case where it was actually needed is when the NBD server disconnects and there is no reconnect-delay. In that case, nbd_receive_replies() does not set s->reply.handle and nbd_co_do_receive_one_chunk() cannot continue. For that one case, check the return value of nbd_receive_replies(). As to the others: * nbd_receive_replies() can put the current coroutine to sleep if another reply is ongoing; then it will be woken by nbd_channel_error(), called by the ongoing reply. Or it can try itself to read a reply header and fail, thus calling nbd_channel_error() itself. * nbd_co_send_request() will write the body of the request and fail * nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk() and then nbd_co_do_receive_one_chunk(), which will handle the failure as above; or it will just detect a previous call to nbd_iter_channel_error() via iter->ret < 0. Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-4-pbonzini@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
Several coroutine functions in block/nbd.c are not marked as such. This patch adds a few more markers; it is not exhaustive, but it focuses especially on: - places that wake other coroutines, because aio_co_wake() has very different semantics inside a coroutine (queuing after yield vs. entering immediately); - functions with _co_ in their names, to avoid confusion Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-3-pbonzini@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Paolo Bonzini authored
The .reply_possible field of s->requests is never set to false. This is not a problem as it is only a safeguard to detect protocol errors, but it's sloppy. In fact, the field is actually not necessary at all, because .coroutine is set to NULL in NBD_FOREACH_REPLY_CHUNK after receiving the last chunk. Thus, replace .reply_possible with .coroutine and move the check before deciding the fate of this request. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220414175756.671165-2-pbonzini@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by:
Lukas Straub <lukasstraub2@web.de> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
Rename the type to be reused. Old name is "what is it for". To be natively reused for other needs, let's name it exactly "what is it". Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Message-Id: <20220314213226.362217-2-v.sementsov-og@mail.ru> [eblake: Adjust S-o-b to Vladimir's new email, with permission] Reviewed-by:
Eric Blake <eblake@redhat.com> Acked-by:
John Snow <jsnow@redhat.com> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
- Apr 25, 2022
-
-
Denis V. Lunev authored
'blockdev-change-medium' is a convinient wrapper for the following sequence of commands: * blockdev-open-tray * blockdev-remove-medium * blockdev-insert-medium * blockdev-close-tray and should be used f.e. to change ISO image inside the CD-ROM tray. Though the guest could lock the tray and some linux guests like CentOS 8.5 actually does that. In this case the execution if this command results in the error like the following: Device 'scsi0-0-1-0' is locked and force was not specified, wait for tray to open and try again. This situation is could be resolved 'blockdev-open-tray' by passing flag 'force' inside. Thus is seems reasonable to add the same capability for 'blockdev-change-medium' too. Signed-off-by:
Denis V. Lunev <den@openvz.org> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Acked-by:
"Dr. David Alan Gilbert" <dgilbert@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Hanna Reitz <hreitz@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: Markus Armbruster <armbru@redhat.com> Message-Id: <20220412221846.280723-1-den@openvz.org> Reviewed-by:
Daniel P. Berrangé <berrange@redhat.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
- Apr 20, 2022
-
-
Hanna Reitz authored
Instead of fprint()-ing error messages in rebuild_refcount_structure() and its rebuild_refcounts_write_refblocks() helper, pass them through an Error object to qcow2_check_refcounts() (which will then print it). Suggested-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220405134652.19278-4-hreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com>
-
Hanna Reitz authored
When rebuilding the refcount structures (when qemu-img check -r found errors with refcount = 0, but reference count > 0), the new refcount table defaults to being put at the image file end[1]. There is no good reason for that except that it means we will not have to rewrite any refblocks we already wrote to disk. Changing the code to rewrite those refblocks is not too difficult, though, so let us do that. That is beneficial for images on block devices, where we cannot really write beyond the end of the image file. Use this opportunity to add extensive comments to the code, and refactor it a bit, getting rid of the backwards-jumping goto. [1] Unless there is something allocated in the area pointed to by the last refblock, so we have to write that refblock. In that case, we try to put the reftable in there. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071 Closes: https://gitlab.com/qemu-project/qemu/-/issues/941 Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220405134652.19278-2-hreitz@redhat.com>
-
- Apr 06, 2022
-
-
Marc-André Lureau authored
Signed-off-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Marc-André Lureau authored
Signed-off-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-26-marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Marc-André Lureau authored
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
- Mar 29, 2022
-
-
Hanna Reitz authored
When the stream block job cuts out the nodes between top and base in stream_prepare(), it does not drain the subtree manually; it fetches the base node, and tries to insert it as the top node's backing node with bdrv_set_backing_hd(). bdrv_set_backing_hd() however will drain, and so the actual base node might change (because the base node is actually not part of the stream job) before the old base node passed to bdrv_set_backing_hd() is installed. This has two implications: First, the stream job does not keep a strong reference to the base node. Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g. because some other block job is drained to finish), we will get a use-after-free. We should keep a strong reference to that node. Second, even with such a strong reference, the problem remains that the base node might change before bdrv_set_backing_hd() actually runs and as a result the wrong base node is installed. Both effects can be seen in 030's TestParallelOps.test_overlapping_5() case, which has five nodes, and simultaneously streams from the middle node to the top node, and commits the middle node down to the base node. As it is, this will sometimes crash, namely when we encounter the above-described use-after-free. Taking a strong reference to the base node, we no longer get a crash, but the resuling block graph is less than ideal: The expected result is obviously that all middle nodes are cut out and the base node is the immediate backing child of the top node. However, if stream_prepare() takes a strong reference to its base node (the middle node), and then the commit job finishes in bdrv_set_backing_hd(), supposedly dropping that middle node, the stream job will just reinstall it again. Therefore, we need to keep the whole subtree drained in stream_prepare(), so that the graph modification it performs is effectively atomic, i.e. that the base node it fetches is still the base node when bdrv_set_backing_hd() sets it as the top node's backing node. Verify this by asserting in said 030's test case that the base node is always the top node's immediate backing child when both jobs are done. Signed-off-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220324140907.17192-1-hreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Acked-by:
Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
-
- Mar 24, 2022
-
-
Philippe Mathieu-Daudé authored
"0x%u" format is very misleading, replace by "0x%x". Found running: $ git grep -E '0x%[0-9]*([lL]*|" ?PRI)[dDuU]' block/ Inspired-by:
Richard Henderson <richard.henderson@linaro.org> Signed-off-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by:
Hanna Reitz <hreitz@redhat.com> Reviewed-by:
Daniel P. Berrangé <berrange@redhat.com> Reviewed-by:
Denis V. Lunev <den@openvz.org> Message-id: 20220323114718.58714-2-philippe.mathieu.daude@gmail.com Signed-off-by:
Stefan Hajnoczi <stefanha@redhat.com>
-
- Mar 22, 2022
-
-
Marc-André Lureau authored
One less qemu-specific macro. It also helps to make some headers/units only depend on glib, and thus moved in standalone projects eventually. Signed-off-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org>
-
Marc-André Lureau authored
One less qemu-specific macro. It also helps to make some headers/units only depend on glib, and thus moved in standalone projects eventually. Signed-off-by:
Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by:
Richard W.M. Jones <rjones@redhat.com>
-
Stefano Garzarella authored
Commit d24f8023 ("block/rbd: increase dynamically the image size") added a workaround to support growing images (eg. qcow2), resizing the image before write operations that exceed the current size. We recently added support for write zeroes and without the workaround we can have problems with qcow2. So let's move the resize into qemu_rbd_start_co() and do it when the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993 Fixes: c56ac27d ("block/rbd: add write zeroes support") Signed-off-by:
Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220317162638.41192-1-sgarzare@redhat.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
- Mar 21, 2022
-
-
Rao Lei authored
During the IO stress test, the IO request coroutine has a probability that is can't be awakened when the NBD server is killed. The GDB stack is as follows: (gdb) bt 0 0x00007f2ff990cbf6 in __ppoll (fds=0x55575de85000, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 1 0x000055575c302e7c in qemu_poll_ns (fds=0x55575de85000, nfds=1, timeout=599999603140) at ../util/qemu-timer.c:348 2 0x000055575c2d3c34 in fdmon_poll_wait (ctx=0x55575dc480f0, ready_list=0x7ffd9dd1dae0, timeout=599999603140) at ../util/fdmon-poll.c:80 3 0x000055575c2d350d in aio_poll (ctx=0x55575dc480f0, blocking=true) at ../util/aio-posix.c:655 4 0x000055575c16eabd in bdrv_do_drained_begin(bs=0x55575dee7fe0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true)at ../block/io.c:474 5 0x000055575c16eba6 in bdrv_drained_begin (bs=0x55575dee7fe0) at ../block/io.c:480 6 0x000055575c1aff33 in quorum_del_child (bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block/quorum.c:1130 7 0x000055575c14239b in bdrv_del_child (parent_bs=0x55575dee7fe0, child=0x55575dcea690, errp=0x7ffd9dd1dd08) at ../block.c:7705 8 0x000055575c12da28 in qmp_x_blockdev_change(parent=0x55575df404c0 "colo-disk0", has_child=true, child=0x55575de867f0 "children.1", has_node=false, no de=0x0, errp=0x7ffd9dd1dd08) at ../blockdev.c:3676 9 0x000055575c258435 in qmp_marshal_x_blockdev_change (args=0x7f2fec008190, ret=0x7f2ff7b0bd98, errp=0x7f2ff7b0bd90) at qapi/qapi-commands-block-core.c :1675 10 0x000055575c2c6201 in do_qmp_dispatch_bh (opaque=0x7f2ff7b0be30) at ../qapi/qmp-dispatch.c:129 11 0x000055575c2ebb1c in aio_bh_call (bh=0x55575dc429c0) at ../util/async.c:141 12 0x000055575c2ebc2a in aio_bh_poll (ctx=0x55575dc480f0) at ../util/async.c:169 13 0x000055575c2d2d96 in aio_dispatch (ctx=0x55575dc480f0) at ../util/aio-posix.c:415 14 0x000055575c2ec07f in aio_ctx_dispatch (source=0x55575dc480f0, callback=0x0, user_data=0x0) at ../util/async.c:311 15 0x00007f2ff9e7cfbd in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0 16 0x000055575c2fd581 in glib_pollfds_poll () at ../util/main-loop.c:232 17 0x000055575c2fd5ff in os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:255 18 0x000055575c2fd710 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 19 0x000055575bfa7588 in qemu_main_loop () at ../softmmu/runstate.c:726 20 0x000055575bbee57a in main (argc=60, argv=0x7ffd9dd1e0e8, envp=0x7ffd9dd1e2d0) at ../softmmu/main.c:50 (gdb) qemu coroutine 0x55575e16aac0 0 0x000055575c2ee7dc in qemu_coroutine_switch (from_=0x55575e16aac0, to_=0x7f2ff830fba0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302 1 0x000055575c2fe2a9 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:195 2 0x000055575c2fe93c in qemu_co_queue_wait_impl (queue=0x55575dc46170, lock=0x7f2b32ad9850) at ../util/qemu-coroutine-lock.c:56 3 0x000055575c17ddfb in nbd_co_send_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, qiov=0x55575dfc15d8) at ../block/nbd.c:478 4 0x000055575c17f931 in nbd_co_request (bs=0x55575ebfaf20, request=0x7f2b32ad9920, write_qiov=0x55575dfc15d8) at ../block/nbd.c:1182 5 0x000055575c17fe14 in nbd_client_co_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/nbd.c:1284 6 0x000055575c170d25 in bdrv_driver_pwritev (bs=0x55575ebfaf20, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1264 7 0x000055575c1733b4 in bdrv_aligned_pwritev (child=0x55575dff6890, req=0x7f2b32ad9ad0, offset=403487858688, bytes=4538368, align=1, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 8 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 9 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dff6890, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233 10 0x000055575c1ee506 in replication_co_writev (bs=0x55575e9824f0, sector_num=788062224, remaining_sectors=8864, qiov=0x55575dfc15d8, flags=0) at ../block/replication.c:270 11 0x000055575c170eed in bdrv_driver_pwritev (bs=0x55575e9824f0, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:1297 12 0x000055575c1733b4 in bdrv_aligned_pwritev (child=0x55575dcea690, req=0x7f2b32ad9e00, offset=403487858688, bytes=4538368, align=512, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2126 13 0x000055575c173c67 in bdrv_co_pwritev_part (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, qiov_offset=0, flags=0) at ../block/io.c:2314 14 0x000055575c17391b in bdrv_co_pwritev (child=0x55575dcea690, offset=403487858688, bytes=4538368, qiov=0x55575dfc15d8, flags=0) at ../block/io.c:2233 15 0x000055575c1aeffa in write_quorum_entry (opaque=0x7f2fddaf8c50) at ../block/quorum.c:699 16 0x000055575c2ee4db in coroutine_trampoline (i0=1578543808, i1=21847) at ../util/coroutine-ucontext.c:173 17 0x00007f2ff9855660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 When we do failover in COLO mode, QEMU will hang while it is waiting for the in-flight IO. From the call trace, we can see the IO request coroutine has yielded in nbd_co_send_request(). When we kill the NBD server, it will never be wake up. Actually, when we do IO stress test, it will have a lot of requests in free_sema queue. When the NBD server is killed, current MAX_NBD_REQUESTS finishes with errors but they wake up at most MAX_NBD_REQEUSTS from the queue. So, let's move qemu_co_queue_next out to fix this issue. Signed-off-by:
Lei Rao <lei.rao@intel.com> Message-Id: <20220309074844.275450-1-lei.rao@intel.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
- Mar 15, 2022
-
-
Philippe Mathieu-Daudé authored
When building on macOS 12 we get: block/file-posix.c:3335:18: warning: 'IOMasterPort' is deprecated: first deprecated in macOS 12.0 [-Wdeprecated-declarations] kernResult = IOMasterPort( MACH_PORT_NULL, &masterPort ); ^~~~~~~~~~~~ IOMainPort Replace by IOMainPort, redefining it to IOMasterPort if not available. Suggested-by:
Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by:
Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed by: Cameron Esfahani <dirty@apple.com> Reviewed-by:
Akihiko Odaki <akihiko.odaki@gmail.com> Tested-by:
Akihiko Odaki <akihiko.odaki@gmail.com> Signed-off-by:
Philippe Mathieu-Daudé <f4bug@amsat.org>
-
- Mar 07, 2022
-
-
Daniel P. Berrangé authored
The TLS usage for NBD was restricted to IP sockets because validating x509 certificates requires knowledge of the hostname that the client is connecting to. TLS does not have to use x509 certificates though, as PSK (pre-shared keys) provide an alternative credential option. These have no requirement for a hostname and can thus be trivially used for UNIX sockets. Furthermore, with the ability to overide the default hostname for TLS validation in the previous patch, it is now also valid to want to use x509 certificates with FD passing and UNIX sockets. Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220304193610.3293146-6-berrange@redhat.com> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Daniel P. Berrangé authored
When connecting to an NBD server with TLS and x509 credentials, the client must validate the hostname it uses for the connection, against that published in the server's certificate. If the client is tunnelling its connection over some other channel, however, the hostname it uses may not match the info reported in the server's certificate. In such a case, the user needs to explicitly set an override for the hostname to use for certificate validation. This is achieved by adding a 'tls-hostname' property to the NBD block driver. Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220304193610.3293146-4-berrange@redhat.com> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Daniel P. Berrangé authored
In commit a71d597b Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Thu Jun 10 13:08:00 2021 +0300 block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() the use of the 'hostname' field from the BDRVNBDState struct was lost, and 'nbd_connect' just hardcoded it to match the IP socket address. This was a harmless bug at the time since we block use with anything other than IP sockets. Shortly though, we want to allow the caller to override the hostname used in the TLS certificate checks. This is to allow for TLS when doing port forwarding or tunneling. Thus we need to reinstate the passing along of the 'hostname'. Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220304193610.3293146-3-berrange@redhat.com> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Peter Maydell authored
Move the various memalign-related functions out of osdep.h and into their own header, which we include only where they are used. While we're doing this, add some brief documentation comments. Signed-off-by:
Peter Maydell <peter.maydell@linaro.org> Reviewed-by:
Richard Henderson <richard.henderson@linaro.org> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-10-peter.maydell@linaro.org
-
Vladimir Sementsov-Ogievskiy authored
Current scheme of image fleecing looks like this: [guest] [NBD export] | | |root | root v v [copy-before-write] -----> [temp.qcow2] | target | |file |backing v | [active disk] <-------------+ - On guest writes copy-before-write filter copies old data from active disk to temp.qcow2. So fleecing client (NBD export) when reads changed regions from temp.qcow2 image and unchanged from active disk through backing link. This patch makes possible new image fleecing scheme: [guest] [NBD export] | | | root | root v file v [copy-before-write]<------[snapshot-access] | | | file | target v v [active-disk] [temp.img] - copy-before-write does CBW operations and also provides snapshot-access API. The API may be accessed through snapshot-access driver. Benefits of new scheme: 1. Access control: if remote client try to read data that not covered by original dirty bitmap used on copy-before-write open, client gets -EACCES. 2. Discard support: if remote client do DISCARD, this additionally to discarding data in temp.img informs block-copy process to not copy these clusters. Next read from discarded area will return -EACCES. This is significant thing: when fleecing user reads data that was not yet copied to temp.img, we can avoid copying it on further guest write. 3. Synchronisation between client reads and block-copy write is more efficient. In old scheme we just rely on BDRV_REQ_SERIALISING flag used for writes to temp.qcow2. New scheme is less blocking: - fleecing reads are never blocked: if data region is untouched or in-flight, we just read from active-disk, otherwise we read from temp.img - writes to temp.img are not blocked by fleecing reads - still, guest writes of-course are blocked by in-flight fleecing reads, that currently read from active-disk - it's the minimum necessary blocking 4. Temporary image may be of any format, as we don't rely on backing feature. 5. Permission relation are simplified. With old scheme we have to share write permission on target child of copy-before-write, otherwise backing link conflicts with copy-before-write file child write permissions. With new scheme we don't have backing link, and copy-before-write node may have unshared access to temporary node. (Not realized in this commit, will be in future). 6. Having control on fleecing reads we'll be able to implement alternative behavior on failed copy-before-write operations. Currently we just break guest request (that's a historical behavior of backup). But in some scenarios it's a bad behavior: better is to drop the backup as failed but don't break guest request. With new scheme we can simply unset some bits in a bitmap on CBW failure and further fleecing reads will -EACCES, or something like this. (Not implemented in this commit, will be in future) Additional application for this is implementing timeout for CBW operations. Iotest 257 output is updated, as two more bitmaps now live in copy-before-write filter. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20220303194349.2304213-13-vsementsov@virtuozzo.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
The new block driver simply utilizes snapshot-access API of underlying block node. In further patches we want to use it like this: [guest] [NBD export] | | | root | root v file v [copy-before-write]<------[snapshot-access] | | | file | target v v [active-disk] [temp.img] This way, NBD client will be able to read snapshotted state of active disk, when active disk is continued to be written by guest. This is known as "fleecing", and currently uses another scheme based on qcow2 temporary image which backing file is active-disk. New scheme comes with benefits - see next commit. The other possible application is exporting internal snapshots of qcow2, like this: [guest] [NBD export] | | | root | root v file v [qcow2]<---------[snapshot-access] For this, we'll need to implement snapshot-access API handlers in qcow2 driver, and improve snapshot-access block driver (and API) to make it possible to select snapshot by name. Another thing to improve is size of snapshot. Now for simplicity we just use size of bs->file, which is OK for backup, but for qcow2 snapshots export we'll need to imporve snapshot-access API to get size of snapshot. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20220303194349.2304213-12-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
Add new block driver handlers and corresponding generic wrappers. It will be used to allow copy-before-write filter to provide reach fleecing interface in further commit. In future this approach may be used to allow reading qcow2 internal snapshots, for example to export them through NBD. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-11-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
Add function to wait for all intersecting requests. To be used in the further commit. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-10-vsementsov@virtuozzo.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
Add a convenient function similar with bdrv_block_status() to get status of dirty bitmap. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-9-vsementsov@virtuozzo.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
Let's reuse convenient helper. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-8-vsementsov@virtuozzo.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
Split intersecting-requests functionality out of block-copy to be reused in copy-before-write filter. Note: while being here, fix tiny typo in MAINTAINERS. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-7-vsementsov@virtuozzo.com> Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
-