- Nov 02, 2021
-
-
Thomas Weißschuh authored
VMDK files support an attribute that represents the version of the guest tools that are installed on the disk. vSphere uses this attribute, before a machine has ever been started, to determine whether the VM has the guest tools installed. This is important when configuring "Operating system customizations" in vSphere, as it checks for the presence of the guest tools before allowing those customizations; without the attribute, a VM that has not yet booted normally could not be customized, which prevents a customized first boot. The attribute should not hurt on disks that do not have the guest tools installed, and indeed the VMware tools themselves add this attribute unconditionally (defaulting to the value "2147483647", as this patch does). Signed-off-by:
Thomas Weißschuh <thomas.weissschuh.ext@zeiss.com> Message-Id: <20210913130419.13241-1-thomas.weissschuh.ext@zeiss.com> [hreitz: Added missing '#' in block-core.json] Signed-off-by:
Hanna Reitz <hreitz@redhat.com>
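For illustration, the attribute ends up as one more ddb.* key/value line in the VMDK descriptor text. Below is a minimal standalone sketch of emitting such a line; the helper name and the plain-stdio approach are invented here, not QEMU's actual descriptor generation code.

    #include <stdio.h>

    /* Hypothetical helper (not QEMU's real descriptor generator): emit the
     * guest-tools version entry into the disk data base (ddb) section of a
     * VMDK descriptor. */
    static void append_tools_version(FILE *descriptor, long tools_version)
    {
        /* 2147483647 (INT32_MAX) is the default value used by the patch: it
         * marks the tools as present without claiming a specific version. */
        fprintf(descriptor, "ddb.toolsVersion = \"%ld\"\n", tools_version);
    }

    int main(void)
    {
        append_tools_version(stdout, 2147483647L);
        return 0;
    }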
-
- Sep 29, 2021
-
-
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths. The main motivation is the realization of a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk. We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the driver write_zeroes handlers' bytes parameter to int64_t.

The only caller of all the updated functions is bdrv_co_do_pwrite_zeroes(), which is of course fine with the widening of the callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX, so the updated functions are all safe: they will not get a "bytes" value larger than before.

Still, let's look through all the updated functions and add assertions to the ones that are actually unprepared for values larger than INT_MAX. For these drivers also set an explicit max_pwrite_zeroes limit. Let's go:

blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in the generic layer. rule_check() and bdrv_co_pwrite_zeroes() both take 64-bit arguments.

blklogwrites: passes to blk_log_writes_co_log(), which takes a 64-bit argument.

blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes(), which is OK.

copy-before-write: calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes(), both with 64-bit arguments.

file-posix: both handlers call raw_do_pwrite_zeroes(), which is updated. In raw_do_pwrite_zeroes() the calculations are OK due to bdrv_check_qiov_request(), and bytes goes into RawPosixAIOData::aio_nbytes, which is uint64_t. Check also where that uint64_t is handed on: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT); handle_aiocb_write_zeroes() calls do_fallocate(), which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap(). All look safe.

gluster: bytes goes to GlusterAIOCB::size, which is int64_t, and to glfs_zerofill_async(), which works with off_t.

iscsi: Aha, here we deal with iscsi_writesame16_task(), which has a uint32_t num_blocks argument, and iscsi_writesame10_task(), which has a uint16_t one. Add comments and assertions and clarify the max_pwrite_zeroes calculation. The iscsi_allocmap_*() functions already take int64_t arguments; is_byte_request_lun_aligned() is simple to update, so do it.

mirror_top: passes to bdrv_mirror_top_do_write(), which has a uint64_t argument.

nbd: Aha, here we have a protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to a 32-bit value, so we are OK for now.

nvme: Again a protocol limitation, and no inherent limit for write-zeroes at all. But from the code that calculates cdw12 it is obvious that we do have a limit and an alignment; let's clarify them. Also, the code is obviously not prepared to handle bytes=0, so handle that case too. Trace events are already 64-bit.

preallocate: passes to handle_write() and bdrv_co_pwrite_zeroes(), both 64-bit.

rbd: passes to qemu_rbd_start_co(), which is 64-bit.

qcow2: offset + bytes and the alignment handling still work (thanks to bdrv_check_qiov_request()), so the tail calculation is OK. qcow2_subcluster_zeroize() has a 64-bit argument, so it should be OK. Trace events updated.

qed: qed_co_request() wants an int nb_sectors. The code also uses size_t for the request length, which may be 32-bit. So just keep INT_MAX as a limit (aligned down to pwrite_zeroes_alignment) and don't care.

raw-format: OK; raw_adjust_offset() and bdrv_co_pwrite_zeroes() are both 64-bit.

throttle: both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64-bit.

vmdk: passes to vmdk_pwritev(), which is 64-bit.

quorum: passes to quorum_co_pwritev(), which is 64-bit.

Hooray! At this point all block drivers are prepared to support 64-bit write-zero requests, or have explicitly set max_pwrite_zeroes. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by:
Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by:
Eric Blake <eblake@redhat.com>
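As a rough illustration of the pattern described in the commit above (a widened int64_t bytes parameter, an explicit max_pwrite_zeroes limit, and a <= assertion), here is a standalone sketch; the struct and function names are stand-ins, not QEMU's real prototypes.

    #include <assert.h>
    #include <inttypes.h>
    #include <limits.h>
    #include <stdio.h>

    /* Stand-in for the limits a driver advertises (cf. max_pwrite_zeroes
     * above); not QEMU's real BlockLimits. */
    struct fake_limits {
        int64_t max_pwrite_zeroes;
    };

    /* A driver whose backend only copes with 32-bit lengths keeps working
     * after the widening: it advertises an explicit limit and asserts
     * (with <=, as the final patch does) that the generic layer honoured it. */
    static int fake_co_pwrite_zeroes(const struct fake_limits *bl,
                                     int64_t offset, int64_t bytes, int flags)
    {
        assert(bytes <= bl->max_pwrite_zeroes);
        printf("zeroing %" PRId64 " bytes at offset %" PRId64 " (flags %d)\n",
               bytes, offset, flags);
        return 0;
    }

    int main(void)
    {
        struct fake_limits bl = { .max_pwrite_zeroes = INT_MAX };
        return fake_co_pwrite_zeroes(&bl, 0, 1 << 20, 0);
    }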
-
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths. The main motivation is the realization of a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk. We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the driver write handlers' parameters which are already 64-bit to the signed type. While at it, also convert the flags parameter to BdrvRequestFlags.

Now let's consider all callers. A simple

    git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'

shows that there are three callers of the driver functions:

bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative.

qcow2_save_vmstate() does bdrv_check_qiov_request().

Still, the functions may be called directly, not only through drv->... Let's check:

    git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
        awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
        while read func; do git grep "$func(" | \
        grep -v "$func(BlockDriverState"; done

This shows several callers:

qcow2: qcow2_co_truncate() writes at most up to @offset, which is checked in qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() passes a request (or part of a request) that already went through the normal write path, so it should be OK.

qcow: qcow_co_pwritev_compressed() passes int64_t; it's updated by this patch.

quorum: quorum_co_pwrite_zeroes() passes int64_t and int - OK.

throttle: throttle_co_pwritev_compressed() passes int64_t; it's updated by this patch.

vmdk: vmdk_co_pwritev_compressed() passes int64_t; it's updated by this patch. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Eric Blake <eblake@redhat.com>
-
Vladimir Sementsov-Ogievskiy authored
We are generally moving to int64_t for both offset and bytes parameters on all I/O paths. The main motivation is the realization of a 64-bit write_zeroes operation for fast zeroing of large disk chunks, up to the whole disk. We chose a signed type to be consistent with off_t (which is signed) and to allow a signed return type (where a negative value means error).

So, convert the driver read handlers' parameters which are already 64-bit to the signed type. While at it, also convert the flags parameter to BdrvRequestFlags.

Now let's consider all callers. A simple

    git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'

shows that there are three callers of the driver functions:

bdrv_driver_preadv() in block/io.c passes int64_t, checked by bdrv_check_qiov_request() to be non-negative.

qcow2_load_vmstate() does bdrv_check_qiov_request().

do_perform_cow_read() has a uint64_t argument, and a lot of things in the qcow2 driver are uint64_t, so converting it is a big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so just assert that here.

Still, the functions may be called directly, not only through drv->... Let's check:

    git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
        awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
        while read func; do git grep "$func(" | \
        grep -v "$func(BlockDriverState"; done

The only such caller is

    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
    ...
    ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);

in tests/unit/test-bdrv-drain.c, and it is obviously OK. Signed-off-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by:
Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by:
Eric Blake <eblake@redhat.com>
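The reasoning in these conversion patches leans on the generic layer rejecting bad requests before drivers see them. Here is a standalone sketch of that kind of check, with an invented function name rather than QEMU's actual bdrv_check_qiov_request().

    #include <errno.h>
    #include <stdint.h>

    /* Illustrative stand-in for the generic-layer request check: with signed
     * 64-bit offset/bytes, negative values and overflow past INT64_MAX are
     * rejected with a negative errno-style code, which is why a signed type
     * is convenient for both parameters and return values. */
    static int check_request(int64_t offset, int64_t bytes)
    {
        if (offset < 0 || bytes < 0) {
            return -EINVAL;
        }
        if (bytes > INT64_MAX - offset) {
            return -EINVAL;          /* offset + bytes would overflow */
        }
        return 0;
    }

    int main(void)
    {
        return check_request(1 << 16, 4096);   /* 0: a valid request */
    }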
-
- Jan 28, 2021
-
-
Eric Blake authored
The easiest spots to use QAPI_LIST_APPEND are where we already have an obvious pointer to the tail of a list. While at it, consistently use the variable name 'tail' for that purpose. Signed-off-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Markus Armbruster <armbru@redhat.com> Message-Id: <20210113221013.390592-5-eblake@redhat.com> Signed-off-by:
Markus Armbruster <armbru@redhat.com>
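QAPI_LIST_APPEND builds a singly linked QAPI list through a pointer to the current tail. Below is a simplified, self-contained rendering of that idiom in plain C, without the real qapi/util.h macro or the generated list types.

    #include <stdio.h>
    #include <stdlib.h>

    /* Simplified stand-in for a QAPI-generated list-of-int type. */
    typedef struct IntList {
        struct IntList *next;
        int value;
    } IntList;

    int main(void)
    {
        IntList *head = NULL;
        IntList **tail = &head;   /* always points at the next link to fill in */

        for (int i = 0; i < 3; i++) {
            /* Roughly what QAPI_LIST_APPEND(tail, ...) does: allocate a node,
             * hook it in at *tail, then advance tail to its next field. */
            IntList *node = calloc(1, sizeof(*node));
            node->value = i;
            *tail = node;
            tail = &node->next;
        }

        for (IntList *p = head; p; p = p->next) {
            printf("%d\n", p->value);
        }
        return 0;
    }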
-
- Oct 13, 2020
-
-
Christian Borntraeger authored
Fedora 32 gcc 10 seems to give false positives:

    Compiling C object libblock.fa.p/block_vmdk.c.o
    ../block/vmdk.c: In function ‘vmdk_parse_extents’:
    ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      587 |     g_free(extent->l1_table);
          |     ^~~~~~~~~~~~~~~~~~~~~~~~
    ../block/vmdk.c:754:17: note: ‘extent’ was declared here
      754 |     VmdkExtent *extent;
          |                 ^~~~~~
    ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      620 |     ret = vmdk_init_tables(bs, extent, errp);
          |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../block/vmdk.c:598:17: note: ‘extent’ was declared here
      598 |     VmdkExtent *extent;
          |                 ^~~~~~
    ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
     1178 |     extent->flat_start_offset = flat_offset << 9;
          |     ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
    ../block/vmdk.c: In function ‘vmdk_open_vmdk4’:
    ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      581 |     extent->l2_cache =
          |     ~~~~~~~~~~~~~~~~~^
      582 |         g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
          |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../block/vmdk.c:872:17: note: ‘extent’ was declared here
      872 |     VmdkExtent *extent;
          |                 ^~~~~~
    ../block/vmdk.c: In function ‘vmdk_open’:
    ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
      620 |     ret = vmdk_init_tables(bs, extent, errp);
          |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ../block/vmdk.c:598:17: note: ‘extent’ was declared here
      598 |     VmdkExtent *extent;
          |                 ^~~~~~
    cc1: all warnings being treated as errors
    make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1

fix them by assigning a default value. Signed-off-by:
Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by:
Fam Zheng <fam@euphon.net> Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com> Signed-off-by:
Laurent Vivier <laurent@vivier.eu>
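The fix itself is a one-liner; a minimal sketch of the idea, with an opaque stand-in type:

    #include <stddef.h>

    struct VmdkExtent;   /* opaque here; the real definition lives in block/vmdk.c */

    static void example(void)
    {
        /* Before: "VmdkExtent *extent;" assigned only on some branches trips
         * gcc 10's -Wmaybe-uninitialized.  After: a default value keeps the
         * warning quiet on every path. */
        struct VmdkExtent *extent = NULL;
        (void)extent;
    }

    int main(void)
    {
        example();
        return 0;
    }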
-
- Sep 07, 2020
-
-
Hanna Reitz authored
Before HEAD^, we needed this because bdrv_co_flush() by itself would only flush bs->file. With HEAD^, bdrv_co_flush() will flush all children on which a WRITE or WRITE_UNCHANGED permission has been taken. Thus, vmdk no longer needs to do it itself. Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Kevin Wolf <kwolf@redhat.com>
-
- Sep 01, 2020
-
-
Liao Pingfang authored
Remove superfluous breaks, as there is a "return" before them. Signed-off-by:
Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by:
Yi Wang <wang.yi59@zte.com.cn> Reviewed-by:
Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by:
Thomas Huth <thuth@redhat.com> Message-Id: <1594631107-36574-1-git-send-email-wang.yi59@zte.com.cn> Signed-off-by:
Laurent Vivier <laurent@vivier.eu>
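A tiny invented example of the cleanup:

    #include <stdio.h>

    /* Once a case ends in return, a following break can never be reached,
     * so the patch simply drops it. */
    static int describe(int kind)
    {
        switch (kind) {
        case 0:
            return 1;
            /* break;   <- superfluous, removed */
        default:
            return 0;
        }
    }

    int main(void)
    {
        printf("%d\n", describe(0));
        return 0;
    }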
-
- Jul 14, 2020
-
-
Eric Blake authored
vmdk already requires that if backing_file is present, that it be another vmdk image (see vmdk_co_do_create). Meanwhile, we want to move towards always being explicit about the backing format for other drivers where it matters. So for convenience, make qemu-img create -F vmdk work, while rejecting all other explicit formats (note that this is only for QemuOpts usage; there is no change to the QAPI to allow a format through -blockdev). Signed-off-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200706203954.341758-5-eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- Jul 10, 2020
-
-
Markus Armbruster authored
When migrate_add_blocker(blocker, &errp) is followed by error_propagate(errp, err), we can often just as well do migrate_add_blocker(..., errp). Do that with this Coccinelle script:

    @@
    expression blocker, err, errp;
    expression ret;
    @@
    -    ret = migrate_add_blocker(blocker, &err);
    -    if (err) {
    +    ret = migrate_add_blocker(blocker, errp);
    +    if (ret < 0) {
             ... when != err
    -        error_propagate(errp, err);
             ...
         }

    @@
    expression blocker, err, errp;
    @@
    -    migrate_add_blocker(blocker, &err);
    -    if (err) {
    +    if (migrate_add_blocker(blocker, errp) < 0) {
             ... when != err
    -        error_propagate(errp, err);
             ...
         }

Double-check @err is not used afterwards. Dereferencing it would be use after free, but checking whether it's null would be legitimate. Signed-off-by:
Markus Armbruster <armbru@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-43-armbru@redhat.com>
-
Markus Armbruster authored
When all we do with an Error we receive into a local variable is propagate it to somewhere else, we can just as well receive it there right away. The previous commit did that with a Coccinelle script I consider fairly trustworthy. This commit uses the same script with the matching of return taken out, i.e. we convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
    }

This is unsound: @err could still be read afterwards. I don't know how to express "no read of @err without an intervening write" in Coccinelle. Instead, I manually double-checked for uses of @err. Suboptimal line breaks tweaked manually. qdev_realize() simplified further to placate scripts/checkpatch.pl. Signed-off-by:
Markus Armbruster <armbru@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-36-armbru@redhat.com>
-
Markus Armbruster authored
When all we do with an Error we receive into a local variable is propagate it to somewhere else, we can just as well receive it there right away. Convert

    if (!foo(..., &err)) {
        ...
        error_propagate(errp, err);
        ...
        return ...
    }

to

    if (!foo(..., errp)) {
        ...
        ...
        return ...
    }

where nothing else needs @err. Coccinelle script:

    @rule1 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
     if (
    (
    -    fun(args, &err, args2)
    +    fun(args, errp, args2)
    |
    -    !fun(args, &err, args2)
    +    !fun(args, errp, args2)
    |
    -    fun(args, &err, args2) op c1
    +    fun(args, errp, args2) op c1
    )
        )
     {
         ... when != err
             when != lbl:
             when strict
    -    error_propagate(errp, err);
         ... when != err
    (
         return;
    |
         return c2;
    |
         return false;
    )
     }

    @rule2 forall@
    identifier fun, err, errp, lbl;
    expression list args, args2;
    expression var;
    binary operator op;
    constant c1, c2;
    symbol false;
    @@
    -    var = fun(args, &err, args2);
    +    var = fun(args, errp, args2);
         ... when != err
     if (
    (
         var
    |
         !var
    |
         var op c1
    )
        )
     {
         ... when != err
             when != lbl:
             when strict
    -    error_propagate(errp, err);
         ... when != err
    (
         return;
    |
         return c2;
    |
         return false;
    |
         return var;
    )
     }

    @depends on rule1 || rule2@
    identifier err;
    @@
    -    Error *err = NULL;
         ... when != err

Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming

    if (fun(args, &err)) {
        goto out;
    }
    ...
    out:
        error_propagate(errp, err);

even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by:
Markus Armbruster <armbru@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>
-
- May 18, 2020
-
-
Hanna Reitz authored
bdrv_default_perms() can decide which permission profile to use based on the BdrvChildRole, so block drivers do not need to select it explicitly. The blkverify driver now no longer shares the WRITE permission for the image to verify. We thus have to adjust two places in test-block-iothread not to take it. (Note that in theory, blkverify should behave like quorum in this regard and share neither WRITE nor RESIZE for both of its children. In practice, it does not really matter, because blkverify is used only for debugging, so we might as well keep its permissions rather liberal.) Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-30-mreitz@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Hanna Reitz authored
Commonly, they need to pass the BDRV_CHILD_IMAGE set as the BdrvChildRole; but there are exceptions for drivers with external data files (qcow2 and vmdk). Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-26-mreitz@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Hanna Reitz authored
For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Hanna Reitz authored
We want to unify child_format and child_file at some point. One of the important things that set format drivers apart from other drivers is that they do not expect other format nodes under them (except in the backing chain), i.e. we must not probe formats inside of formats. That means we need something on which to distinguish format drivers from others, and hence this flag. Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-3-mreitz@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- May 08, 2020
-
-
Kevin Wolf authored
If we have a backup L2 table, we currently flush once after writing to the active L2 table and again after writing to the backup table. A single flush is enough and makes things a little less slow. Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-6-kwolf@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Kevin Wolf authored
If a cluster is already zeroed, we don't have to call vmdk_L2update(), which is rather slow because it flushes the image file. Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-5-kwolf@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Kevin Wolf authored
When overwriting a zero cluster, we must not perform copy-on-write from the backing file, but from a zeroed buffer. Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-4-kwolf@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
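A standalone sketch of that rule, with invented helper names rather than vmdk's real allocation path:

    #include <stdbool.h>
    #include <string.h>

    enum { GRAIN_SIZE = 65536 };

    /* Invented stand-in for reading the old contents from the backing file. */
    static void read_from_backing(unsigned char *buf, size_t len)
    {
        memset(buf, 0xaa, len);   /* pretend the backing file holds 0xaa bytes */
    }

    /* Copy-on-write source selection as described above: a grain marked as
     * zeroed must be filled from a zeroed buffer, never re-read from the
     * backing file. */
    static void fill_cow_buffer(unsigned char *buf, size_t len, bool zeroed)
    {
        if (zeroed) {
            memset(buf, 0, len);
        } else {
            read_from_backing(buf, len);
        }
    }

    int main(void)
    {
        static unsigned char buf[GRAIN_SIZE];
        fill_cow_buffer(buf, sizeof(buf), true);
        return buf[0];   /* 0: the zeroed grain stayed zero */
    }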
-
Kevin Wolf authored
m_data must contain valid data even for zero clusters when no cluster was allocated in the image file. Without this, zero writes segfault with images that have zeroed_grain=on. For zero writes, we don't want to allocate a cluster in the image file even in compressed files. Fixes: 524089bc Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-3-kwolf@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Kevin Wolf authored
m_data is used for zero clusters even though valid == 0. It really only means that a new cluster was allocated in the image file. Rename it to reflect this. While at it, change it from int to bool, too. Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Message-Id: <20200430133007.170335-2-kwolf@redhat.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- May 05, 2020
-
-
Eric Blake authored
There are several callers that need to create a new block backend from an existing BDS; make the task slightly easier with a common helper routine. Suggested-by:
Max Reitz <mreitz@redhat.com> Signed-off-by:
Eric Blake <eblake@redhat.com> Message-Id: <20200424190903.522087-2-eblake@redhat.com> [mreitz: Set @ret only in error paths, see https://lists.nongnu.org/archive/html/qemu-block/2020-04/msg01216.html ] Signed-off-by:
Max Reitz <mreitz@redhat.com> Message-Id: <20200428192648.749066-2-eblake@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
- Apr 30, 2020
-
-
Kevin Wolf authored
Now that node level interface bdrv_truncate() supports passing request flags to the block driver, expose this on the BlockBackend level, too. Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Alberto Garcia <berto@igalia.com> Reviewed-by:
Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-4-kwolf@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
Kevin Wolf authored
Now that block drivers can support flags for .bdrv_co_truncate, expose the parameter in the node level interfaces bdrv_co_truncate() and bdrv_truncate(). Signed-off-by:
Kevin Wolf <kwolf@redhat.com> Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
Alberto Garcia <berto@igalia.com> Reviewed-by:
Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-3-kwolf@redhat.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- Mar 26, 2020
-
-
Maxim Levitsky authored
This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by:
Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by:
Denis V. Lunev <den@openvz.org> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
- Mar 16, 2020
-
-
Philippe Mathieu-Daudé authored
Description copied from the Linux kernel commit from Gustavo A. R. Silva (see [3]):

--v-- description start --v--

The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member [1], introduced in C99:

    struct foo {
        int stuff;
        struct boo array[];
    };

By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced [2] to the Linux codebase from now on.

--^-- description end --^--

Do the similar housekeeping in the QEMU codebase (which uses C99 since commit 7be41675). All these instances of code were found with the help of the following command (then manual analysis, without modifying structures that only have a single flexible array member, such as QEDTable in block/qed.h):

    git grep -F '[0];'

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=76497732932f
[3] https://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git/commit/?id=17642a2fbd2c1

Inspired-by:
Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
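For comparison, the old and new declarations side by side, in a compilable sketch with invented type names:

    #include <stdlib.h>

    /* Old style: zero-length array, a GNU C extension to C90. */
    struct foo_old {
        int stuff;
        int array[0];
    };

    /* New style: C99 flexible array member; unlike the extension, the
     * compiler can warn if such a member is not the last one in the struct. */
    struct foo_new {
        int stuff;
        int array[];
    };

    int main(void)
    {
        /* Allocation looks the same either way: the header plus n elements. */
        struct foo_new *p = malloc(sizeof(*p) + 4 * sizeof(p->array[0]));
        free(p);
        return 0;
    }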
-
- Oct 28, 2019
-
-
Hanna Reitz authored
We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by:
Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
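A standalone sketch of the behaviour the new parameter controls, with invented names and a simplified model rather than the real bdrv_co_truncate() signature:

    #include <stdbool.h>
    #include <stdio.h>

    /* Illustrative model: without 'exact', a device that is already at least
     * 'offset' bytes large but cannot shrink may report success; with
     * 'exact', only hitting 'offset' precisely counts. */
    static int fake_truncate(long long *size, long long offset, bool exact)
    {
        if (*size == offset) {
            return 0;
        }
        if (*size > offset) {
            return exact ? -1 : 0;   /* cannot shrink; lax mode calls it success */
        }
        *size = offset;              /* growing is always fine here */
        return 0;
    }

    int main(void)
    {
        long long size = 1 << 20;
        printf("lax: %d, exact: %d\n",
               fake_truncate(&size, 4096, false),
               fake_truncate(&size, 4096, true));
        return 0;
    }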
-
- Sep 03, 2019
-
-
Hanna Reitz authored
Compressed writes generally have to write full clusters, not just in theory but also in practice when it comes to vmdk's streamOptimized subformat. It currently is just silently broken for writes with non-zero in-cluster offsets:

    $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
    $ qemu-io -c 'write 4k 4k' -c 'read 4k 4k' foo.vmdk
    wrote 4096/4096 bytes at offset 4096
    4 KiB, 1 ops; 00.01 sec (443.724 KiB/sec and 110.9309 ops/sec)
    read failed: Invalid argument

(The technical reason is that vmdk_write_extent() just writes the incomplete compressed data actually to offset 4k. When reading the data, vmdk_read_extent() looks at offset 0 and finds the compressed data size to be 0, because that is what it reads from there. This yields an error.)

For incomplete writes with zero in-cluster offsets, the error path when reading the rest of the cluster is a bit different, but the result is the same:

    $ qemu-img create -f vmdk -o subformat=streamOptimized foo.vmdk 1M
    $ qemu-io -c 'write 0k 4k' -c 'read 4k 4k' foo.vmdk
    wrote 4096/4096 bytes at offset 0
    4 KiB, 1 ops; 00.01 sec (362.641 KiB/sec and 90.6603 ops/sec)
    read failed: Invalid argument

(Here, vmdk_read_extent() finds the data and then sees that the uncompressed data is short.)

It is better to reject invalid writes than to make the user believe they might have succeeded and then fail when trying to read it back. Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
John Snow <jsnow@redhat.com> Message-id: 20190815153638.4600-5-mreitz@redhat.com Reviewed-by:
John Snow <jsnow@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
Hanna Reitz authored
This makes iotest 033 pass with e.g. subformat=monolithicFlat. It also turns a former error in 059 into success. Signed-off-by:
Max Reitz <mreitz@redhat.com> Message-id: 20190815153638.4600-3-mreitz@redhat.com Reviewed-by:
John Snow <jsnow@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
- Aug 19, 2019
-
-
Hanna Reitz authored
Fixes: 69f47505 Signed-off-by:
Max Reitz <mreitz@redhat.com> Message-id: 20190725155512.9827-3-mreitz@redhat.com Reviewed-by:
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by:
John Snow <jsnow@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
- Jun 24, 2019
-
-
Sam Eiderman authored
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in QEMU). This format was lacking in the following:

* Grain directory (L1) and grain table (L2) entries were 32-bit, allowing access to only 2TB (slightly less) of data.
* The grain size (default) was 512 bytes - leading to data fragmentation and many grain tables.
* For space reclamation purposes, it was necessary to find all the grains which are not pointed to by any grain table - so a reverse mapping of "offset of grain in vmdk" to "grain table" must be constructed - which takes large amounts of CPU/RAM.

The format specification can be found in VMware's documentation: https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf

In ESXi 6.5, to support snapshot files larger than 2TB, a new format was introduced: SESparse (Space Efficient). This format fixes the above issues:

* All entries are now 64-bit.
* The grain size (default) is 4KB.
* Grain directory and grain tables are now located at the beginning of the file.
  + seSparse format reserves space for all grain tables.
  + Grain tables can be addressed using an index.
  + Grains are located in the end of the file and can also be addressed with an index.
  - seSparse vmdks of large disks (64TB) have huge preallocated headers - mainly due to L2 tables, even for empty snapshots.
* The header contains a reverse mapping ("backmap") of "offset of grain in vmdk" to "grain table" and a bitmap ("free bitmap") which specifies for each grain whether it is allocated or not. Using these data structures we can implement space reclamation efficiently.
* Due to the fact that the header now maintains two mappings:
  * The regular one (grain directory & grain tables)
  * A reverse one (backmap and free bitmap)
  these data structures can lose consistency upon crash and result in a corrupted VMDK. Therefore, a journal is also added to the VMDK and is replayed when VMware reopens the file after a crash.

Since ESXi 6.7 - SESparse is the only snapshot format available.

Unfortunately, VMware does not provide documentation regarding the new seSparse format. This commit is based on black-box research of the seSparse format. Various in-guest block operations and their effect on the snapshot file were tested. The only VMware-provided source of information (regarding the underlying implementation) was a log file on the ESXi: /var/log/hostd.log

Whenever an seSparse snapshot is created, the log is populated with seSparse records. Relevant log records are of the form:

    [...] Const Header:
    [...]  constMagic     = 0xcafebabe
    [...]  version        = 2.1
    [...]  capacity       = 204800
    [...]  grainSize      = 8
    [...]  grainTableSize = 64
    [...]  flags          = 0
    [...] Extents:
    [...]  Header         : <1 : 1>
    [...]  JournalHdr     : <2 : 2>
    [...]  Journal        : <2048 : 2048>
    [...]  GrainDirectory : <4096 : 2048>
    [...]  GrainTables    : <6144 : 2048>
    [...]  FreeBitmap     : <8192 : 2048>
    [...]  BackMap        : <10240 : 2048>
    [...]  Grain          : <12288 : 204800>
    [...] Volatile Header:
    [...] volatileMagic     = 0xcafecafe
    [...] FreeGTNumber      = 0
    [...] nextTxnSeqNumber  = 0
    [...] replayJournal     = 0

The sizes that are seen in the log file are in sectors. Extents are of the following format: <offset : size>

This commit is a strict implementation which enforces:
* magics
* version number 2.1
* grain size of 8 sectors (4KB)
* grain table size of 64 sectors
* zero flags
* extent locations

Additionally, this commit provides only a subset of the functionality offered by seSparse's format:
* Read-only
* No journal replay
* No space reclamation
* No unmap support

Hence, journal header, journal, free bitmap and backmap extents are unused, only the "classic" (L1 -> L2 -> data) grain access is implemented. However there are several differences in the grain access itself.

Grain directory (L1):
* Grain directory entries are indexes (not offsets) to grain tables.
* Valid grain directory entries have their highest nibble set to 0x1.
* Since grain tables are always located in the beginning of the file, the index can fit into 32 bits, so we can use its low part if it's valid.

Grain table (L2):
* Grain table entries are indexes (not offsets) to grains.
* If the highest nibble of the entry is:
  0x0: The grain is not allocated. The rest of the bytes are 0.
  0x1: The grain is unmapped - the guest sees a zero grain. The rest of the bits point to the previously mapped grain, see the 0x3 case.
  0x2: The grain is zero.
  0x3: The grain is allocated - to get the index calculate:
       ((entry & 0x0fff000000000000) >> 48) | ((entry & 0x0000ffffffffffff) << 12)
* The difference between 0x1 and 0x2 is that 0x1 is an unallocated grain which results from the guest using sg_unmap to unmap the grain - but the grain itself still exists in the grain extent - a space reclamation procedure should delete it. Unmapping a zero grain has no effect (0x2 will not change to 0x1) but unmapping an unallocated grain will (0x0 to 0x1) - naturally.

In order to implement seSparse some fields had to be changed to support both 32-bit and 64-bit entry sizes. Reviewed-by:
Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by:
Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by:
Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by:
Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com Signed-off-by:
Max Reitz <mreitz@redhat.com>
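The allocated (0x3) case above can be decoded as follows; this standalone sketch uses the formula quoted in the commit message, while the function name and the sample entry value are made up.

    #include <inttypes.h>
    #include <stdio.h>

    /* Decode a seSparse grain table (L2) entry whose highest nibble is 0x3
     * ("allocated"), per the description above: the grain index is split
     * across the low 48 bits (shifted up by 12) and bits 48..59. */
    static uint64_t sesparse_l2_grain_index(uint64_t entry)
    {
        return ((entry & 0x0fff000000000000ULL) >> 48) |
               ((entry & 0x0000ffffffffffffULL) << 12);
    }

    int main(void)
    {
        uint64_t entry = 0x3001000000000002ULL;   /* made-up allocated entry */
        printf("type nibble 0x%" PRIx64 ", grain index %" PRIu64 "\n",
               entry >> 60, sesparse_l2_grain_index(entry));
        return 0;
    }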
-
Sam Eiderman authored
512M of L1 entries is a very loose bound; only 32M are required to store the maximal supported VMDK file size of 2TB. Fixed qemu-iotest 059: the failure now occurs earlier, on the impossible L1 table size. Reviewed-by:
Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by:
Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by:
Liran Alon <liran.alon@oracle.com> Reviewed-by:
Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by:
Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com Reviewed-by:
Max Reitz <mreitz@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
Sam Eiderman authored
Commit b0651b8c ("vmdk: Move l1_size check into vmdk_add_extent") extended the l1_size check from VMDK4 to VMDK3 but did not update the default coverage in the moved comment.

The previous vmdk4 calculation:

    (512 * 1024 * 1024) * 512 (l2 entries) * 65536 (grain) = 16PB

The added vmdk3 calculation:

    (512 * 1024 * 1024) * 4096 (l2 entries) * 512 (grain) = 1PB

Adding the calculation of vmdk3 to the comment. In any case, VMware does not offer virtual disks of more than 2TB for vmdk4/vmdk3, or 64TB for the new undocumented seSparse format, which is not implemented yet in qemu. Reviewed-by:
Karl Heubaum <karl.heubaum@oracle.com> Reviewed-by:
Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by:
Liran Alon <liran.alon@oracle.com> Reviewed-by:
Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by:
Sam Eiderman <shmuel.eiderman@oracle.com> Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com Reviewed-by:
yuchenlin <yuchenlin@synology.com> Reviewed-by:
Max Reitz <mreitz@redhat.com> Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
- Jun 04, 2019
-
-
Kevin Wolf authored
This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- Apr 30, 2019
-
-
Sam Eiderman authored
Commit b69864e5 ("vmdk: Support version=3 in VMDK descriptor files") fixed the probe function to correctly guess vmdk descriptors with version=3. This solves the issue where a vmdk snapshot with a parent vmdk descriptor containing "version=3" would be treated as raw instead of vmdk. If a new vmdk version is introduced in the future, we will experience this issue again; even if the user provides "-f vmdk", it will only apply to the tip image and not to the underlying "misprobed" parent image. The code in vmdk.c already assumes that the backing file of a vmdk must be vmdk (see vmdk_is_cid_valid, which returns 0 if the backing file is not vmdk). So let's make it official by supplying the backing_format as vmdk. Reviewed-by:
Mark Kanda <mark.kanda@oracle.com> Reviewed-By:
Liran Alon <liran.alon@oracle.com> Reviewed-by:
Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by:
Shmuel Eiderman <shmuel.eiderman@oracle.com> Reviewed-by:
Eric Blake <eblake@redhat.com> Reviewed-by:
Fam Zheng <fam@euphon.net> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- Mar 19, 2019
-
-
Sam Eiderman authored
Commit 509d39aa added support for read-only VMDKs of version 3. This commit fixes the probe function to correctly handle descriptors of version 3. It has two effects:

1. We no longer need to supply '-f vmdk' when pointing to descriptor files of version 3 in qemu/qemu-img command line arguments.

2. It fixes the scenario where a VMDK points to a parent version 3 descriptor file which is being probed as "raw" instead of "vmdk". Reviewed-by:
Arbel Moshe <arbel.moshe@oracle.com> Reviewed-by:
Mark Kanda <mark.kanda@oracle.com> Signed-off-by:
Shmuel Eiderman <shmuel.eiderman@oracle.com> Signed-off-by:
Kevin Wolf <kwolf@redhat.com>
-
- Feb 25, 2019
-
-
Yu-Chen Lin authored
In vmdk_co_create_opts, when it finds hw_version is undefined, it sets it to 4, which misleads the compat6 and hwversion handling in vmdk_co_do_create. Simply set hw_version to NULL after freeing it, and let the logic in vmdk_co_do_create decide the value of hw_version. This bug can be reproduced by:

    $ qemu-img convert -O vmdk -o subformat=streamOptimized,compat6 /home/yuchenlin/syno.qcow2 /home/yuchenlin/syno.vmdk
    qemu-img: /home/yuchenlin/syno.vmdk: error while converting vmdk: compat6 cannot be enabled with hwversion set

Signed-off-by:
yuchenlin <yuchenlin@synology.com> Message-id: 20190221110805.28239-1-yuchenlin@synology.com Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
Hanna Reitz authored
Some follow-up patches will rework the way bs->full_open_options is refreshed in bdrv_refresh_filename(). The new implementation will remove the need for the block drivers' bdrv_refresh_filename() implementations to set bs->full_open_options; instead, it will be generic and use static information from each block driver. However, by implementing bdrv_gather_child_options(), block drivers will still be able to override the way the full_open_options of their children are incorporated into their own. We need to implement this function for VMDK because we have to prevent the generic implementation from gathering the options of all children: It is not possible to specify options for the extents through the runtime options. For quorum, the child names that would be used by the generic implementation and the ones that we actually (currently) want to use differ. See quorum_gather_child_options() for more information. Note that both of these are cases which are not ideal: In case of VMDK it would probably be nice to be able to specify options for all extents. In case of quorum, the current runtime option structure is simply broken and needs to be fixed (but that is left for another patch). Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-23-mreitz@redhat.com Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
Hanna Reitz authored
Make bdrv_get_full_backing_filename_from_filename() return an allocated string instead of placing the result in a caller-provided buffer. Signed-off-by:
Max Reitz <mreitz@redhat.com> Message-id: 20190201192935.18394-11-mreitz@redhat.com Signed-off-by:
Max Reitz <mreitz@redhat.com>
-
Hanna Reitz authored
Besides being safe for arbitrary path lengths, after some follow-up patches all callers will want a freshly allocated buffer anyway. In the meantime, path_combine_deprecated() is added which has the same interface as path_combine() had before this patch. All callers to that function will be converted in follow-up patches. Signed-off-by:
Max Reitz <mreitz@redhat.com> Reviewed-by:
Alberto Garcia <berto@igalia.com> Reviewed-by:
Kevin Wolf <kwolf@redhat.com> Message-id: 20190201192935.18394-10-mreitz@redhat.com Signed-off-by:
Max Reitz <mreitz@redhat.com>
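The general shape of both this change and the previous one (returning an allocated string instead of filling a caller-provided buffer), as a generic standalone sketch rather than the actual QEMU helpers:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Instead of writing into a fixed-size buffer supplied by the caller,
     * return a freshly allocated string that is correct for any path length;
     * the caller frees it. */
    static char *join_path(const char *dir, const char *name)
    {
        size_t len = strlen(dir) + 1 + strlen(name) + 1;
        char *result = malloc(len);

        if (result) {
            snprintf(result, len, "%s/%s", dir, name);
        }
        return result;
    }

    int main(void)
    {
        char *full = join_path("/very/long/image/directory", "backing.vmdk");
        if (full) {
            printf("%s\n", full);
            free(full);
        }
        return 0;
    }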
-